Re: WIP: Avoid creation of the free space map for small tables

2019-03-17 Thread Amit Kapila
On Fri, Mar 15, 2019 at 3:25 PM Amit Kapila  wrote:
>
> I have committed the latest version of this patch.  I think we can
> wait for a day or two see if there is any complain from buildfarm or
> in general and then we can close this CF entry.
>

Closed this CF entry.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-15 Thread Amit Kapila
On Fri, Mar 15, 2019 at 3:40 PM John Naylor  wrote:
>
> On Fri, Mar 15, 2019 at 5:55 PM Amit Kapila  wrote:
> > I have committed the latest version of this patch.  I think we can
> > wait for a day or two see if there is any complain from buildfarm or
> > in general and then we can close this CF entry.  IIRC, this was the
> > last patch in the series, right?
>
> Great, thanks! I'll keep an eye on the buildfarm as well.
>
> I just spotted two comments in freespace.c that were true during
> earlier patch revisions, but are no longer true, so I've attached a
> fix for those.
>

LGTM, so committed.

> There are no other patches in the series.
>

Okay.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-15 Thread John Naylor
On Fri, Mar 15, 2019 at 5:55 PM Amit Kapila  wrote:
> I have committed the latest version of this patch.  I think we can
> wait for a day or two see if there is any complain from buildfarm or
> in general and then we can close this CF entry.  IIRC, this was the
> last patch in the series, right?

Great, thanks! I'll keep an eye on the buildfarm as well.

I just spotted two comments in freespace.c that were true during
earlier patch revisions, but are no longer true, so I've attached a
fix for those. There are no other patches in the series.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


correct-local-map-comments.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-03-15 Thread Amit Kapila
On Thu, Mar 14, 2019 at 7:46 PM Amit Kapila  wrote:
>
> On Thu, Mar 14, 2019 at 12:37 PM John Naylor
>  wrote:
> >
> > On Thu, Mar 14, 2019 at 2:17 PM Amit Kapila  wrote:
> > >
> > > 1. Added an Assert in new_cluster_needs_fsm() that old cluster version
> > > should be >= 804 as that is where fsm support has been added.
> >
> > There is already an explicit check for 804 in the caller,
> >
>
> Yeah, I know that, but I have added it to prevent this function being
> used elsewhere.  OTOH, maybe you are right that as per current code it
> is superfluous, so we shouldn't add this assert.
>

I have committed the latest version of this patch.  I think we can
wait for a day or two see if there is any complain from buildfarm or
in general and then we can close this CF entry.  IIRC, this was the
last patch in the series, right?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-14 Thread Amit Kapila
On Thu, Mar 14, 2019 at 12:37 PM John Naylor
 wrote:
>
> On Thu, Mar 14, 2019 at 2:17 PM Amit Kapila  wrote:
> >
> > 1. Added an Assert in new_cluster_needs_fsm() that old cluster version
> > should be >= 804 as that is where fsm support has been added.
>
> There is already an explicit check for 804 in the caller,
>

Yeah, I know that, but I have added it to prevent this function being
used elsewhere.  OTOH, maybe you are right that as per current code it
is superfluous, so we shouldn't add this assert.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-14 Thread John Naylor
On Thu, Mar 14, 2019 at 2:17 PM Amit Kapila  wrote:
>
> 1. Added an Assert in new_cluster_needs_fsm() that old cluster version
> should be >= 804 as that is where fsm support has been added.

There is already an explicit check for 804 in the caller, and the HEAD
code is already resilient to FSMs not existing, so I think this is
superfluous.

> 2. Reverted the old cluster version check to <= 1100.  There was
> nothing wrong in the way you have written a check, but I find most of
> the other places in the code uses that way.
> 3. At one place changed the function header to be consistent with the
> nearby code, run pgindent on the code and changed the commit message.

Looks good to me, thanks.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-03-14 Thread Amit Kapila
On Thu, Mar 14, 2019 at 7:08 AM John Naylor  wrote:
>
> > [segfault problems]
>
> This now seems spurious. I ran make distclean, git pull, reapplied the
> patch (leaving out the gettimeofday() calls), and now my upgrade perf
> test works with default compiler settings. Not sure what happened, but
> hopefully we can move forward.
>

Yeah, I took another pass through the patch and change minor things as
you can see in the attached patch.

1. Added an Assert in new_cluster_needs_fsm() that old cluster version
should be >= 804 as that is where fsm support has been added.
2. Reverted the old cluster version check to <= 1100.  There was
nothing wrong in the way you have written a check, but I find most of
the other places in the code uses that way.
3. At one place changed the function header to be consistent with the
nearby code, run pgindent on the code and changed the commit message.

Let me know what you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v24-0001-During-pg_upgrade-conditionally-skip-transfer-of-FSM.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-03-13 Thread John Naylor
> [segfault problems]

This now seems spurious. I ran make distclean, git pull, reapplied the
patch (leaving out the gettimeofday() calls), and now my upgrade perf
test works with default compiler settings. Not sure what happened, but
hopefully we can move forward.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-03-13 Thread Amit Kapila
On Wed, Mar 13, 2019 at 7:42 PM John Naylor  wrote:
>
> On Wed, Mar 13, 2019 at 8:18 PM Amit Kapila  wrote:
> > > First, I had a problem: On MacOS with their "gcc" wrapper around
> > > clang, I got a segfault 11 when compiled with no debugging symbols.
> > >
> >
> > Did you get this problem with the patch or both with and without the
> > patch?  If it is only with patch, then we definitely need to
> > investigate.
>
> Only with the patch.
>

If the problem is reproducible, then I think we can try out a few
things to narrow down or get some clue about the problem:
(a) modify function new_cluster_needs_fsm() such that it always
returns true as its first statement.  This will tell us if the code in
that function has some problem.
(b) run with Valgrind

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-13 Thread John Naylor
On Wed, Mar 13, 2019 at 8:18 PM Amit Kapila  wrote:
> > First, I had a problem: On MacOS with their "gcc" wrapper around
> > clang, I got a segfault 11 when compiled with no debugging symbols.
> >
>
> Did you get this problem with the patch or both with and without the
> patch?  If it is only with patch, then we definitely need to
> investigate.

Only with the patch.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-03-13 Thread Amit Kapila
On Wed, Mar 13, 2019 at 4:57 PM John Naylor  wrote:
>
> On Fri, Mar 8, 2019 at 7:43 PM Amit Kapila  wrote:
>
> > Have you done any performance testing of this patch?  I mean to say
> > now that we added a new stat call for each table, we should see if
> > that has any impact.  Ideally, that should be compensated by the fact
> > that we are now not transferring *fsm files for small relations.
>
> To be precise, it will only call stat if pg_class.relpages is below
> the threshold. I suppose I could hack a database where all the
> relpages values are wrong, but that seems like a waste of time.
>

Right.

> > How
> > about constructing a test where all relations are greater than 4 pages
> > and then try to upgrade them.  We can check for a cluster with a
> > different number of relations say 10K, 20K, 50K, 100K.
>
> I did both greater and less than 4 pages for 10k relations. Since
> pg_upgrade is O(# relations), I don't see a point in going higher.
>
> First, I had a problem: On MacOS with their "gcc" wrapper around
> clang, I got a segfault 11 when compiled with no debugging symbols.
>

Did you get this problem with the patch or both with and without the
patch?  If it is only with patch, then we definitely need to
investigate.

> I
> added "CFLAGS=-O0" and it worked fine. Since this doesn't happen in a
> debugging build, I'm not sure how to investigate this. IIRC, this
> doesn't happen for me on Linux gcc.
>
> Since it was at least running now, I measured by putting
> gettimeofday() calls around transfer_all_new_tablespaces(). I did 10
> runs each and took the average, except for patch/1-page case since it
> was obviously faster after a couple runs.
>
> 5 pages:
> masterpatch
> 5.59s 5.64s
>
> The variation within the builds is up to +/- 0.2s, so there is no
> difference, as expected.
>
> 1 page:
> masterpatch
> 5.62s 4.25s
>
> Clearly, linking is much slower than stat.
>

The results are fine.  Thanks for doing the tests.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-13 Thread John Naylor
On Fri, Mar 8, 2019 at 7:43 PM Amit Kapila  wrote:

> Have you done any performance testing of this patch?  I mean to say
> now that we added a new stat call for each table, we should see if
> that has any impact.  Ideally, that should be compensated by the fact
> that we are now not transferring *fsm files for small relations.

To be precise, it will only call stat if pg_class.relpages is below
the threshold. I suppose I could hack a database where all the
relpages values are wrong, but that seems like a waste of time.

> How
> about constructing a test where all relations are greater than 4 pages
> and then try to upgrade them.  We can check for a cluster with a
> different number of relations say 10K, 20K, 50K, 100K.

I did both greater and less than 4 pages for 10k relations. Since
pg_upgrade is O(# relations), I don't see a point in going higher.

First, I had a problem: On MacOS with their "gcc" wrapper around
clang, I got a segfault 11 when compiled with no debugging symbols. I
added "CFLAGS=-O0" and it worked fine. Since this doesn't happen in a
debugging build, I'm not sure how to investigate this. IIRC, this
doesn't happen for me on Linux gcc.

Since it was at least running now, I measured by putting
gettimeofday() calls around transfer_all_new_tablespaces(). I did 10
runs each and took the average, except for patch/1-page case since it
was obviously faster after a couple runs.

5 pages:
masterpatch
5.59s 5.64s

The variation within the builds is up to +/- 0.2s, so there is no
difference, as expected.

1 page:
masterpatch
5.62s 4.25s

Clearly, linking is much slower than stat.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-03-12 Thread Amit Kapila
On Sun, Mar 10, 2019 at 7:47 PM John Naylor  wrote:
>
> On Fri, Mar 8, 2019 at 7:43 PM Amit Kapila  wrote:
>
> > Have you done any performance testing of this patch?  I mean to say
> > now that we added a new stat call for each table, we should see if
> > that has any impact.  Ideally, that should be compensated by the fact
> > that we are now not transferring *fsm files for small relations.  How
> > about constructing a test where all relations are greater than 4 pages
> > and then try to upgrade them.  We can check for a cluster with a
> > different number of relations say 10K, 20K, 50K, 100K.
>
> I have not, but I agree it should be done. I will try to do so soon.
>

Thanks, I will wait for your test results.  I believe this is the last
patch in this project and we should try to get it done soon.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-10 Thread John Naylor
On Fri, Mar 8, 2019 at 7:43 PM Amit Kapila  wrote:
> Few minor comments:
> 1.
> warning C4715: 'new_cluster_needs_fsm': not all control paths return a value
>
> Getting this new warning in the patch.

Hmm, I don't get that in a couple versions of gcc. Your compiler must
not know that pg_fatal() cannot return. I blindly added a fix.

> 2.
>
> This comment line doesn't seem to be related to this patch.  If so, I
> think we can avoid having any additional change which is not related
> to the functionality of this patch.  Feel free to submit it
> separately, if you think it is an improvement.

> +
> + /* Transfer any VM files if we can trust their contents. */
>   if (vm_crashsafe_match)

Well, I guess the current comment is still ok, so reverted. If I were
to do a separate cleanup patch, I would rather remove the
vm_must_add_frozenbit parameter -- there's no reason I can see for
calls that transfer the heap and FSM to know about this.

I also changed references to the 'first segment of the main fork'
where there will almost always only be one segment. This was a vestige
of the earlier algorithm I had.

> 3. Can we add a note about this in the Notes section of pg_upgrade
> documentation [1]?

Done.

> Have you done any performance testing of this patch?  I mean to say
> now that we added a new stat call for each table, we should see if
> that has any impact.  Ideally, that should be compensated by the fact
> that we are now not transferring *fsm files for small relations.  How
> about constructing a test where all relations are greater than 4 pages
> and then try to upgrade them.  We can check for a cluster with a
> different number of relations say 10K, 20K, 50K, 100K.

I have not, but I agree it should be done. I will try to do so soon.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From b65084daf15d198c9fdd986992b0147290c8e690 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sun, 10 Mar 2019 22:01:21 +0800
Subject: [PATCH v23] During pg_upgrade, conditionally skip transfer of FSMs.

If a heap on the old cluster has 4 pages or fewer, and the old cluster
was PG v11 or earlier, don't copy or link the FSM. This will shrink
space usage for installations with large numbers of small tables.
---
 doc/src/sgml/ref/pgupgrade.sgml  |  7 
 src/bin/pg_upgrade/info.c| 16 +++--
 src/bin/pg_upgrade/pg_upgrade.h  |  6 
 src/bin/pg_upgrade/relfilenode.c | 58 +++-
 4 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 7e1afaf0a5..c896882dd1 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -792,6 +792,13 @@ psql --username=postgres --file=script.sql postgres
is down.
   
 
+  
+   In PostgreSQL 12 and later small tables by
+   default don't have a free space map, as a space optimization.  If you are
+   upgrading a pre-12 cluster, the free space maps of small tables will
+   likewise not be transferred to the new cluster.  
+  
+
  
 
  
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 2f925f086c..902bfc647e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -200,6 +200,8 @@ create_rel_filename_map(const char *old_data, const char *new_data,
 
 	map->old_db_oid = old_db->db_oid;
 	map->new_db_oid = new_db->db_oid;
+	map->relpages = old_rel->relpages;
+	map->relkind = old_rel->relkind;
 
 	/*
 	 * old_relfilenode might differ from pg_class.oid (and hence
@@ -418,6 +420,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	char	   *nspname = NULL;
 	char	   *relname = NULL;
 	char	   *tablespace = NULL;
+	char	   *relkind = NULL;
 	int			i_spclocation,
 i_nspname,
 i_relname,
@@ -425,7 +428,9 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 i_indtable,
 i_toastheap,
 i_relfilenode,
-i_reltablespace;
+i_reltablespace,
+i_relpages,
+i_relkind;
 	char		query[QUERY_ALLOC];
 	char	   *last_namespace = NULL,
 			   *last_tablespace = NULL;
@@ -494,7 +499,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	 */
 	snprintf(query + strlen(query), sizeof(query) - strlen(query),
 			 "SELECT all_rels.*, n.nspname, c.relname, "
-			 "  c.relfilenode, c.reltablespace, %s "
+			 "  c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s "
 			 "FROM (SELECT * FROM regular_heap "
 			 "  UNION ALL "
 			 "  SELECT * FROM toast_heap "
@@ -525,6 +530,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	i_relname = PQfnumber(res, "relname");
 	i_relfilenode = PQfnumber(res, "relfilenode");
 	i_reltablespace = PQfnumber(res, "reltablespace");
+	i_relpages = PQfnumber(res, "relpages");
+	i_relkind = PQfnumber(res, "relkind");
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	for (relnum = 0; relnum < ntups; relnum++)
@@ -556,6 +563,11 @@ get_rel_infos(ClusterInfo *cluster, 

Re: WIP: Avoid creation of the free space map for small tables

2019-03-08 Thread Amit Kapila
On Fri, Mar 8, 2019 at 5:13 PM Amit Kapila  wrote:
>
>
> Few minor comments:
..
>
> 2.
> +
> + /* Transfer any VM files if we can trust their
> contents. */
>   if (vm_crashsafe_match)
>
> 3. Can we add a note about this in the Notes section of pg_upgrade
> documentation [1]?
>
> This comment line doesn't seem to be related to this patch.  If so, I
> think we can avoid having any additional change which is not related
> to the functionality of this patch.  Feel free to submit it
> separately, if you think it is an improvement.
>

oops, I have messed up the above comments.  The paragraph starting
with "This comment line doesn't ..." is for comment number-2.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-08 Thread Amit Kapila
On Wed, Mar 6, 2019 at 5:19 PM John Naylor  wrote:
>
> On Fri, Jan 25, 2019 at 9:50 AM Amit Kapila  wrote:
> > Once we agree on the code, we need to test below scenarios:
> > (a) upgrade from all supported versions to the latest version
> > (b) upgrade standby with and without using rsync.
>
> Although the code hasn't been reviewed yet, I went ahead and tested
> (a) on v21 of the pg_upgrade patch [1]. To do this I dumped out a 9.4
> instance with the regression database and restored it to all supported
> versions. To make it work with pg_upgrade, I first had to drop tables
> with oids, drop functions referring to C libraries, and drop the
> later-removed '=>' operator. Then I pg_upgrade'd in copy mode from all
> versions to HEAD with the patch applied. pg_upgrade worked without
> error, and the following query returned 0 bytes on all the new
> clusters:
>
> select sum(pg_relation_size(oid, 'fsm')) as total_fsm_size
> from pg_class where relkind in ('r', 't')
> and pg_relation_size(oid, 'main') <= 4 * 8192;
>
> The complementary query (> 4 * 8192) returned the same number of bytes
> as in the original 9.4 instance.
>

Thanks, the tests done by you are quite useful.  I have given a few
comments as a response to your previous email.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-08 Thread Amit Kapila
On Mon, Jan 28, 2019 at 2:33 AM John Naylor  wrote:
>
> On Sat, Jan 26, 2019 at 2:14 PM Amit Kapila  wrote:
> >
> > I think there is some value in using the information from
> > this function to skip fsm files, but the code doesn't appear to fit
> > well, how about moving this check to new function
> > new_cluster_needs_fsm()?
>
> For v21, new_cluster_needs_fsm() has all responsibility for obtaining
> the info it needs. I think this is much cleaner,
>

Right, now the code looks much better.

> but there is a small
> bit of code duplication since it now has to form the file name. One
> thing we could do is form the the base old/new file names in
> transfer_single_new_db() and pass those to transfer_relfile(), which
> will only add suffixes and segment numbers. We could then pass the
> base old file name to new_cluster_needs_fsm() and use it as is. Not
> sure if that's worthwhile, though.
>

I don't think it is worth.

Few minor comments:
1.
warning C4715: 'new_cluster_needs_fsm': not all control paths return a value

Getting this new warning in the patch.

2.
+
+ /* Transfer any VM files if we can trust their
contents. */
  if (vm_crashsafe_match)

3. Can we add a note about this in the Notes section of pg_upgrade
documentation [1]?

This comment line doesn't seem to be related to this patch.  If so, I
think we can avoid having any additional change which is not related
to the functionality of this patch.  Feel free to submit it
separately, if you think it is an improvement.

Have you done any performance testing of this patch?  I mean to say
now that we added a new stat call for each table, we should see if
that has any impact.  Ideally, that should be compensated by the fact
that we are now not transferring *fsm files for small relations.  How
about constructing a test where all relations are greater than 4 pages
and then try to upgrade them.  We can check for a cluster with a
different number of relations say 10K, 20K, 50K, 100K.

In general, the patch looks okay to me.  I would like to know if
anybody else has any opinion whether pg_upgrade should skip
transferring fsm files for small relations or not?  I think both me
and John thinks that it is good to have feature and now that patch
turns out to be simpler, I feel we can go ahead with this optimization
in pg_upgrade.

[1] - https://www.postgresql.org/docs/devel/pgupgrade.html

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-07 Thread Amit Kapila
On Sat, Feb 23, 2019 at 1:24 PM John Naylor  wrote:
>
> On Thu, Feb 21, 2019 at 9:27 PM Alvaro Herrera  
> wrote:
> >
> > I think this test is going to break on nonstandard block sizes.  While
> > we don't promise that all tests work on such installs (particularly
> > planner ones), it seems fairly easy to cope with this one -- just use a
> > record size expressed as a fraction of current_setting('block_size').
> > So instead of "1024" you'd write current_setting('block_size') / 8.
> > And then display the relation size in terms of pages, not bytes, so
> > divide pg_relation_size by block size.
>
> I've done this for v6, tested on 16k block size.
>

Thanks, the patch looks good to me.  I have additionally tested it 32K
and 1K sized blocks and the test passes.  I will commit this early
next week.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-03-06 Thread John Naylor
On Fri, Jan 25, 2019 at 9:50 AM Amit Kapila  wrote:
> Once we agree on the code, we need to test below scenarios:
> (a) upgrade from all supported versions to the latest version
> (b) upgrade standby with and without using rsync.

Although the code hasn't been reviewed yet, I went ahead and tested
(a) on v21 of the pg_upgrade patch [1]. To do this I dumped out a 9.4
instance with the regression database and restored it to all supported
versions. To make it work with pg_upgrade, I first had to drop tables
with oids, drop functions referring to C libraries, and drop the
later-removed '=>' operator. Then I pg_upgrade'd in copy mode from all
versions to HEAD with the patch applied. pg_upgrade worked without
error, and the following query returned 0 bytes on all the new
clusters:

select sum(pg_relation_size(oid, 'fsm')) as total_fsm_size
from pg_class where relkind in ('r', 't')
and pg_relation_size(oid, 'main') <= 4 * 8192;

The complementary query (> 4 * 8192) returned the same number of bytes
as in the original 9.4 instance.

To make it easy to find, the latest regression test patch, which is
intended to be independent of block-size, was in [2].

[1] 
https://www.postgresql.org/message-id/CACPNZCu4cOdm3uGnNEGXivy7Gz8UWyQjynDpdkPGabQ18_zK6g%40mail.gmail.com

[2] 
https://www.postgresql.org/message-id/CACPNZCsWa%3Ddd0K%2BFiODwM%3DLEsepAHVJCoSx2Avew%3DxBEX3Ywjw%40mail.gmail.com

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-26 Thread Petr Jelinek
On 26/02/2019 16:20, Alvaro Herrera wrote:
> On 2019-Feb-23, John Naylor wrote:
> 
>> On Fri, Feb 22, 2019 at 3:59 AM Amit Kapila  wrote:
>>> The reason for not pushing much on making the test pass for
>>> nonstandard block sizes is that when I tried existing tests, there
>>> were already some failures.
>>
>> FWIW, I currently see 8 failures (attached).
> 
> Hmm, not great -- even the strings test fails, which seems to try to handle
> the case explicitly.  I did expect the plan shape ones to fail, but I'm
> surprised about the tablesample one.
> 

The SYSTEM table sampling is basically per-page sampling so it depends
heavily on which rows are on which page.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-26 Thread Alvaro Herrera
On 2019-Feb-23, John Naylor wrote:

> On Fri, Feb 22, 2019 at 3:59 AM Amit Kapila  wrote:
> > The reason for not pushing much on making the test pass for
> > nonstandard block sizes is that when I tried existing tests, there
> > were already some failures.
> 
> FWIW, I currently see 8 failures (attached).

Hmm, not great -- even the strings test fails, which seems to try to handle
the case explicitly.  I did expect the plan shape ones to fail, but I'm
surprised about the tablesample one.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-23 Thread John Naylor
On Fri, Feb 22, 2019 at 3:59 AM Amit Kapila  wrote:
> The reason for not pushing much on making the test pass for
> nonstandard block sizes is that when I tried existing tests, there
> were already some failures.

FWIW, I currently see 8 failures (attached).

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


jcn-16blk-regress.diffs
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-02-22 Thread John Naylor
On Thu, Feb 21, 2019 at 9:27 PM Alvaro Herrera  wrote:
>
> I think this test is going to break on nonstandard block sizes.  While
> we don't promise that all tests work on such installs (particularly
> planner ones), it seems fairly easy to cope with this one -- just use a
> record size expressed as a fraction of current_setting('block_size').
> So instead of "1024" you'd write current_setting('block_size') / 8.
> And then display the relation size in terms of pages, not bytes, so
> divide pg_relation_size by block size.

I've done this for v6, tested on 16k block size.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 157f74155beb53c455b0a85b6eebdd22a60ce5f5 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 23 Feb 2019 08:48:31 +0100
Subject: [PATCH v6] Add more tests for FSM.

In commit b0eaa4c51bb, we left out a test that used a vacuum to remove dead
rows as the behavior of test was not predictable.  This test has been
rewritten to use fillfactor instead to control free space.  Since we no
longer need to remove dead rows as part of the test, put the fsm regression
test in a parallel group.
---
 src/test/regress/expected/fsm.out  | 59 +-
 src/test/regress/parallel_schedule |  8 +---
 src/test/regress/serial_schedule   |  2 +-
 src/test/regress/sql/fsm.sql   | 41 +
 4 files changed, 77 insertions(+), 33 deletions(-)

diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index b02993188c..698d4b0be5 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -1,15 +1,40 @@
 --
 -- Free Space Map test
 --
+SELECT current_setting('block_size')::integer AS blocksize,
+current_setting('block_size')::integer / 8 AS strsize
+\gset
 CREATE TABLE fsm_check_size (num int, str text);
--- With one block, there should be no FSM
-INSERT INTO fsm_check_size VALUES(1, 'a');
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
+INSERT INTO fsm_check_size SELECT i, rpad('', :strsize, 'a')
+FROM generate_series(1,3) i;
+-- There should be no FSM
 VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
-pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- heap_size | fsm_size 
+--
-  8192 |0
+SELECT pg_relation_size('fsm_check_size', 'main') / :blocksize AS heap_nblocks,
+pg_relation_size('fsm_check_size', 'fsm') / :blocksize AS fsm_nblocks;
+ heap_nblocks | fsm_nblocks 
+--+-
+3 |   0
+(1 row)
+
+-- The following operations are for testing the functionality of the local
+-- in-memory map. In particular, we want to be able to insert into some
+-- other block than the one at the end of the heap, without using a FSM.
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', :strsize, 'a')
+FROM generate_series(101,105) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
+-- Insert large record and make sure it does not cause the relation to extend
+INSERT INTO fsm_check_size VALUES (111, rpad('', :strsize, 'a'));
+VACUUM fsm_check_size;
+SELECT pg_relation_size('fsm_check_size', 'main') / :blocksize AS heap_nblocks,
+pg_relation_size('fsm_check_size', 'fsm') / :blocksize AS fsm_nblocks;
+ heap_nblocks | fsm_nblocks 
+--+-
+3 |   0
 (1 row)
 
 -- Extend table with enough blocks to exceed the FSM threshold
@@ -26,23 +51,23 @@ num = 11;
 END;
 $$;
 VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-24576
+SELECT pg_relation_size('fsm_check_size', 'fsm') / :blocksize AS fsm_nblocks;
+ fsm_nblocks 
+-
+   3
 (1 row)
 
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size
 VALUES(0, (SELECT string_agg(md5(chr(i)), '')
-		   FROM generate_series(1,100) i));
+		   FROM generate_series(1, :blocksize / 100) i));
 VACUUM fsm_check_size;
-SELECT pg_relation_size(reltoastrelid, 'main') AS toast_size,
-pg_relation_size(reltoastrelid, 'fsm') AS toast_fsm_size
+SELECT pg_relation_size(reltoastrelid, 'main') / :blocksize AS toast_nblocks,
+pg_relation_size(reltoastrelid, 'fsm') / :blocksize AS toast_fsm_nblocks
 FROM pg_class WHERE relname = 'fsm_check_size';
- toast_size | toast_fsm_size 
-+
-   8192 |  0
+ toast_nblocks | toast_fsm_nblocks 
+---+---
+ 1 | 0
 (1 row)
 
 DROP TABLE fsm_check_size;
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4051a4ad4e..a956775dd1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -23,7 +23,7 @@ 

Re: WIP: Avoid creation of the free space map for small tables

2019-02-22 Thread Alvaro Herrera
On 2019-Feb-22, Peter Geoghegan wrote:

> I find it suspicious that there is another crash in pageinspect's
> brin_page_items(), since like amcheck, pageinspect is a contrib module
> that relies on BLCKSZ when allocating a local temp buffer.

Ah.  Maybe they just weren't rebuilt.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-22 Thread Peter Geoghegan
On Fri, Feb 22, 2019 at 8:04 AM Alvaro Herrera  wrote:
> Wow, there's a lot less tests failing there than I thought there would
> be.  That increases hope that we can someday have them pass.  +1 on not
> making things worse.
>
> I think the crash in the amcheck test should be studied, one way or
> another; CCing Peter.

I built Postgres with "--with-blocksize=16" and "--with-blocksize=32",
and tested amcheck with both builds. All tests passed.

I have a hard time imagining what the problem could be here. If there
was a problem with amcheck relying on there being an 8KiB block size
specifically, then it would almost certainly have been there since the
initial commit from March 2017. Not much has changed since then, and
the crash that Amit reported occurs at the earliest possible point.

I find it suspicious that there is another crash in pageinspect's
brin_page_items(), since like amcheck, pageinspect is a contrib module
that relies on BLCKSZ when allocating a local temp buffer.

-- 
Peter Geoghegan



Re: WIP: Avoid creation of the free space map for small tables

2019-02-22 Thread Alvaro Herrera
On 2019-Feb-22, Amit Kapila wrote:

> On Fri, Feb 22, 2019 at 1:57 AM Alvaro Herrera  
> wrote:
> >
> > I think this test is going to break on nonstandard block sizes.  While
> > we don't promise that all tests work on such installs (particularly
> > planner ones),
> 
> The reason for not pushing much on making the test pass for
> nonstandard block sizes is that when I tried existing tests, there
> were already some failures.  For example, see the failures in the
> attached regression diff files (for block_size as 16K and 32K
> respectively).  I saw those failures during the previous
> investigation, the situation on HEAD might or might not be exactly the
> same.  Whereas I see the value in trying to make sure that tests pass
> for nonstandard block sizes, but that doesn't seem to be followed for
> all the tests.

Wow, there's a lot less tests failing there than I thought there would
be.  That increases hope that we can someday have them pass.  +1 on not
making things worse.

I think the crash in the amcheck test should be studied, one way or
another; CCing Peter.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-22 Thread Robert Haas
On Thu, Feb 21, 2019 at 9:59 PM Amit Kapila  wrote:
> The reason for not pushing much on making the test pass for
> nonstandard block sizes is that when I tried existing tests, there
> were already some failures.

Sure, but let's not make things worse.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: WIP: Avoid creation of the free space map for small tables

2019-02-21 Thread Amit Kapila
On Fri, Feb 22, 2019 at 1:57 AM Alvaro Herrera  wrote:
>
> I think this test is going to break on nonstandard block sizes.  While
> we don't promise that all tests work on such installs (particularly
> planner ones),
>

The reason for not pushing much on making the test pass for
nonstandard block sizes is that when I tried existing tests, there
were already some failures.  For example, see the failures in the
attached regression diff files (for block_size as 16K and 32K
respectively).  I saw those failures during the previous
investigation, the situation on HEAD might or might not be exactly the
same.  Whereas I see the value in trying to make sure that tests pass
for nonstandard block sizes, but that doesn't seem to be followed for
all the tests.

> it seems fairly easy to cope with this one -- just use a
> record size expressed as a fraction of current_setting('block_size').
> So instead of "1024" you'd write current_setting('block_size') / 8.
> And then display the relation size in terms of pages, not bytes, so
> divide pg_relation_size by block size.
>

The idea sounds good.  John, would you like to give it a try?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


regression.16.diffs
Description: Binary data


regression.32.diffs
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-02-21 Thread Alvaro Herrera
I think this test is going to break on nonstandard block sizes.  While
we don't promise that all tests work on such installs (particularly
planner ones), it seems fairly easy to cope with this one -- just use a
record size expressed as a fraction of current_setting('block_size').
So instead of "1024" you'd write current_setting('block_size') / 8.
And then display the relation size in terms of pages, not bytes, so
divide pg_relation_size by block size.

(I see that there was already a motion for this, but was dismissed
because of lack of interest.)

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-21 Thread Amit Kapila
On Thu, Feb 21, 2019 at 6:39 PM Alvaro Herrera  wrote:
>
> On 2019-Feb-21, Amit Kapila wrote:
>
> > On Wed, Feb 20, 2019 at 8:08 PM Alvaro Herrera  
> > wrote:
> > >
> > > Please remember to keep serial_schedule in sync.
> >
> > I don't understand what you mean by this?  It is already present in
> > serial_schedule.  In parallel_schedule, we are just moving this test
> > to one of the parallel groups.  Do we need to take care of something
> > else?
>
> Just to make sure it's in the same relative position.
>

Okay, thanks for the input.  Attached, find the updated patch, will
try to commit it tomorrow unless you or someone else has any other
comments.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v5-0001-Add-more-tests-for-FSM.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-02-21 Thread Alvaro Herrera
On 2019-Feb-21, Amit Kapila wrote:

> On Wed, Feb 20, 2019 at 8:08 PM Alvaro Herrera  
> wrote:
> >
> > Please remember to keep serial_schedule in sync.
> 
> I don't understand what you mean by this?  It is already present in
> serial_schedule.  In parallel_schedule, we are just moving this test
> to one of the parallel groups.  Do we need to take care of something
> else?

Just to make sure it's in the same relative position.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-20 Thread John Naylor
On Thu, Feb 21, 2019 at 7:58 AM Amit Kapila  wrote:
> So here you are inserting 4-byte integer and 1024-bytes variable
> length record.  So the tuple length will be tuple_header (24-bytes) +
> 4-bytes for integer + 4-bytes header for variable length data + 1024
> bytes of actual data.  So, the length will be 1056 which is already
> MAXALIGN.  I took the new comments added in your latest version of the
> patch and added them to the previous version of the patch.   Kindly
> see if I have not missed anything while merging the patch-versions?

OK, that makes sense. Looks fine to me.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-20 Thread Amit Kapila
On Wed, Feb 20, 2019 at 8:08 PM Alvaro Herrera  wrote:
>
> Please remember to keep serial_schedule in sync.
>

I don't understand what you mean by this?  It is already present in
serial_schedule.  In parallel_schedule, we are just moving this test
to one of the parallel groups.  Do we need to take care of something
else?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-02-20 Thread Alvaro Herrera
Please remember to keep serial_schedule in sync.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-20 Thread Amit Kapila
On Mon, Feb 11, 2019 at 10:48 PM John Naylor
 wrote:
>
> On 2/9/19, Amit Kapila  wrote:
> > On Tue, Feb 5, 2019 at 3:25 PM John Naylor 
> > wrote:
> >>
> >> On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila 
> >> wrote:
> > This is certainly a good test w.r.t code coverage of new code, but I
> > have few comments:
> > 1. The size of records in test still depends on alignment (MAXALIGN).
> > Though it doesn't seem to be a problematic case, I still suggest we
> > can avoid using records whose size depends on alignment.  If you
> > change the schema as CREATE TABLE fsm_check_size (num1 int, num2 int,
> > str text);, then you can avoid alignment related issues for the
> > records being used in test.
>
> Done.
>
> > 2.
> > +-- Fill most of the last block
> > ..
> > +-- Make sure records can go into any block but the last one
> > ..
> > +-- Insert large record and make sure it does not cause the relation to
> > extend
> >
> > The comments in some part of the test seems too focussed towards the
> > algorithm used for in-memory map.  I think we can keep these if we
> > want, but it is required to write a more generic comment stating what
> > is the actual motive of additional tests (basically we are testing the
> > functionality of in-memory map (LSM) for the heap, so we should write
> > about it.).
>
> Done.
>

Thanks, the modification looks good.  I have slightly changed the
commit message in the attached patch.  I will spend some more time
tomorrow morning on this and will commit unless I see any new problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v3-0001-Add-more-tests-for-FSM.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-02-20 Thread John Naylor
On Wed, Feb 20, 2019 at 6:09 PM Amit Kapila  wrote:
> I have modified the patch for the above observations and added a
> commit message as well, see if it looks okay to you.

Looks good to me, thanks.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-20 Thread Amit Kapila
On Mon, Feb 11, 2019 at 10:48 PM John Naylor
 wrote:
>
> On 2/9/19, Amit Kapila  wrote:
>
> > Shall we add a note to the docs of pg_freespacemap and
> > pgstattuple_approx indicating that for small relations, FSM won't be
> > created, so these functions won't give appropriate value?
>
> I've given this a try in 0002.
>

This looks mostly correct, but I have a few observations:
1.
-  tuples.
+  tuples. Small tables don't have a free space map, so in that case
+  this function will report zero free space, likewise inflating the
+  estimated number of live tuples.

The last part of the sentence "likewise inflating the estimated number
of live tuples." seems incorrect to me because live tuples are
computed based on the pages scanned, live tuples in them and total
blocks in the relation.  So, I think it should be "likewise inflating
the approximate tuple length".

2.
+   In addition, small tables don't have a free space map, so this function
+   will return zero even if free space is available.

Actually, the paragraph you have modified applies to both the
functions mentioned on that page.  So instead of saying "this function
..", we can say "these functions .."

3.
* space from the FSM and move on.
+ * Note: If a relation has no FSM, GetRecordedFreeSpace() will report
+ * zero free space.  This is fine for the purposes of approximation.
  */

It is better to have an empty line before Note: ...

I have modified the patch for the above observations and added a
commit message as well, see if it looks okay to you.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v3-0002-Document-that-functions-that-use-the-FSM-for-will.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-02-11 Thread John Naylor
On 2/9/19, Amit Kapila  wrote:
> On Tue, Feb 5, 2019 at 3:25 PM John Naylor 
> wrote:
>>
>> On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila 
>> wrote:
> This is certainly a good test w.r.t code coverage of new code, but I
> have few comments:
> 1. The size of records in test still depends on alignment (MAXALIGN).
> Though it doesn't seem to be a problematic case, I still suggest we
> can avoid using records whose size depends on alignment.  If you
> change the schema as CREATE TABLE fsm_check_size (num1 int, num2 int,
> str text);, then you can avoid alignment related issues for the
> records being used in test.

Done.

> 2.
> +-- Fill most of the last block
> ..
> +-- Make sure records can go into any block but the last one
> ..
> +-- Insert large record and make sure it does not cause the relation to
> extend
>
> The comments in some part of the test seems too focussed towards the
> algorithm used for in-memory map.  I think we can keep these if we
> want, but it is required to write a more generic comment stating what
> is the actual motive of additional tests (basically we are testing the
> functionality of in-memory map (LSM) for the heap, so we should write
> about it.).

Done.

> Shall we add a note to the docs of pg_freespacemap and
> pgstattuple_approx indicating that for small relations, FSM won't be
> created, so these functions won't give appropriate value?

I've given this a try in 0002.

> Or other
> possibility could be that we return an error if the block number is
> less than the threshold value, but not sure if that is a good
> alternative as that can happen today also if the vacuum hasn't run on
> the table.

Yeah, an error doesn't seem helpful.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 7a7eb0b17aa55dbec2032bf7cc5189e22bb2ca0c Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 11 Feb 2019 18:08:01 +0100
Subject: [PATCH v2 1/2] Improve FSM regression test

In the interest of caution, in commit b0eaa4c51bb we left out a
test that used vacuum to remove dead rows. This test has been
rewritten to use fillfactor instead to control free space. Since
we no longer need to remove dead rows as part of the test, put
the fsm regression test in a parallel group.
---
 src/test/regress/expected/fsm.out  | 36 --
 src/test/regress/parallel_schedule |  8 +--
 src/test/regress/sql/fsm.sql   | 34 +++-
 3 files changed, 58 insertions(+), 20 deletions(-)

diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index b02993188c..f5f62429ac 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -1,15 +1,37 @@
 --
 -- Free Space Map test
 --
-CREATE TABLE fsm_check_size (num int, str text);
--- With one block, there should be no FSM
-INSERT INTO fsm_check_size VALUES(1, 'a');
+CREATE TABLE fsm_check_size (num1 int, num2 int, str text);
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
+INSERT INTO fsm_check_size SELECT i, i, rpad('', 1024, 'a')
+FROM generate_series(1,3) i;
+-- There should be no FSM
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size 
 ---+--
-  8192 |0
+ 24576 |0
+(1 row)
+
+-- The following operations are for testing the functionality of the local
+-- in-memory map. In particular, we want to be able to insert into some
+-- other block than the one at the end of the heap, without using a FSM.
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, i, rpad('', 1024, 'a')
+FROM generate_series(101,105) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
+-- Insert large record and make sure it does not cause the relation to extend
+INSERT INTO fsm_check_size VALUES (111, 111, rpad('', 1024, 'a'));
+VACUUM fsm_check_size;
+SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
+pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
+ heap_size | fsm_size 
+---+--
+ 24576 |0
 (1 row)
 
 -- Extend table with enough blocks to exceed the FSM threshold
@@ -19,7 +41,7 @@ num int;
 BEGIN
 num = 11;
   LOOP
-INSERT INTO fsm_check_size VALUES (num, 'b') RETURNING ctid INTO curtid;
+INSERT INTO fsm_check_size VALUES (num, num, 'b') RETURNING ctid INTO curtid;
 EXIT WHEN curtid >= tid '(4, 0)';
 num = num + 1;
   END LOOP;
@@ -34,8 +56,8 @@ SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size
-VALUES(0, (SELECT string_agg(md5(chr(i)), '')
-		   FROM generate_series(1,100) i));
+VALUES(0, 0, (SELECT string_agg(md5(chr(i)), '')
+			  

Re: WIP: Avoid creation of the free space map for small tables

2019-02-09 Thread Amit Kapila
On Tue, Feb 5, 2019 at 3:25 PM John Naylor  wrote:
>
> On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila  wrote:
> >
> > On Mon, Feb 4, 2019 at 2:27 PM John Naylor  
> > wrote:
> > >
> > > 1. Earlier, I had a test to ensure that free space towards the front
> > > of the relation was visible with no FSM. In [1], I rewrote it without
> > > using vacuum, so we can consider adding it back now if desired. I can
> > > prepare a patch for this.
> > >
> >
> > Yes, this is required.  It is generally a good practise to add test (unless 
> > it takes a lot of time) which covers new code/functionality.
> >
> > > 2. As a follow-on, since we don't rely on vacuum to remove dead rows,
> > > we could try putting the fsm.sql test in some existing group in the
> > > parallel schedule, rather than its own group is it is now.
> > >
> >
> > +1.
>
> This is done in 0001.
>

This is certainly a good test w.r.t code coverage of new code, but I
have few comments:
1. The size of records in test still depends on alignment (MAXALIGN).
Though it doesn't seem to be a problematic case, I still suggest we
can avoid using records whose size depends on alignment.  If you
change the schema as CREATE TABLE fsm_check_size (num1 int, num2 int,
str text);, then you can avoid alignment related issues for the
records being used in test.
2.
+-- Fill most of the last block
..
+-- Make sure records can go into any block but the last one
..
+-- Insert large record and make sure it does not cause the relation to extend

The comments in some part of the test seems too focussed towards the
algorithm used for in-memory map.  I think we can keep these if we
want, but it is required to write a more generic comment stating what
is the actual motive of additional tests (basically we are testing the
functionality of in-memory map (LSM) for the heap, so we should write
about it.).


> > > 3. While looking at 0d1fe9f74e, it occurred to me that I ignored this
> > > patch's effects on GetRecordedFreeSpace(), which will return zero for
> > > tables with no FSM.
> > >
> >
> > Right, but what exactly we want to do for it?  Do you want to add a comment 
> > atop of this function?
>
> Hmm, the comment already says "according to the FSM", so maybe it's
> already obvious. I was thinking more about maybe commenting the
> callsite where it's helpful, as in 0002.
>
> > > The other callers are in:
> > > contrib/pg_freespacemap/pg_freespacemap.c
> > > contrib/pgstattuple/pgstatapprox.c
> > >
> > > For pg_freespacemap, this doesn't matter, since it's just reporting
> > > the facts. For pgstattuple_approx(), it might under-estimate the free
> > > space and over-estimate the number of live tuples.
> > >
> >
> > Sure, but without patch also, it can do so, if the vacuum hasn't updated 
> > freespace map.
>
> Okay, then maybe we don't need to do anything else here.
>

Shall we add a note to the docs of pg_freespacemap and
pgstattuple_approx indicating that for small relations, FSM won't be
created, so these functions won't give appropriate value?  Or other
possibility could be that we return an error if the block number is
less than the threshold value, but not sure if that is a good
alternative as that can happen today also if the vacuum hasn't run on
the table.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-02-05 Thread John Naylor
On Tue, Feb 5, 2019 at 4:04 AM Amit Kapila  wrote:
>
> On Mon, Feb 4, 2019 at 2:27 PM John Naylor  
> wrote:
> >
> > 1. Earlier, I had a test to ensure that free space towards the front
> > of the relation was visible with no FSM. In [1], I rewrote it without
> > using vacuum, so we can consider adding it back now if desired. I can
> > prepare a patch for this.
> >
>
> Yes, this is required.  It is generally a good practise to add test (unless 
> it takes a lot of time) which covers new code/functionality.
>
> > 2. As a follow-on, since we don't rely on vacuum to remove dead rows,
> > we could try putting the fsm.sql test in some existing group in the
> > parallel schedule, rather than its own group is it is now.
> >
>
> +1.

This is done in 0001.

> > 3. While looking at 0d1fe9f74e, it occurred to me that I ignored this
> > patch's effects on GetRecordedFreeSpace(), which will return zero for
> > tables with no FSM.
> >
>
> Right, but what exactly we want to do for it?  Do you want to add a comment 
> atop of this function?

Hmm, the comment already says "according to the FSM", so maybe it's
already obvious. I was thinking more about maybe commenting the
callsite where it's helpful, as in 0002.

> > The other callers are in:
> > contrib/pg_freespacemap/pg_freespacemap.c
> > contrib/pgstattuple/pgstatapprox.c
> >
> > For pg_freespacemap, this doesn't matter, since it's just reporting
> > the facts. For pgstattuple_approx(), it might under-estimate the free
> > space and over-estimate the number of live tuples.
> >
>
> Sure, but without patch also, it can do so, if the vacuum hasn't updated 
> freespace map.

Okay, then maybe we don't need to do anything else here.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ff821d820630313e4dcea65aa1b35d47a0736c64 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Tue, 5 Feb 2019 10:32:19 +0100
Subject: [PATCH v1 1/2] Improve FSM regression test

In the interest of caution, in commit b0eaa4c51bb we left out a
test that used vacuum to remove dead rows. This test has been
rewritten to use fillfactor instead to control free space. Since
we no longer need to remove dead rows as part of the test, put
the fsm regression test in a parallel group.
---
 src/test/regress/expected/fsm.out  | 25 ++---
 src/test/regress/parallel_schedule |  8 +---
 src/test/regress/sql/fsm.sql   | 22 --
 3 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index b02993188c..d747c1a7dd 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,14 +2,33 @@
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- With one block, there should be no FSM
-INSERT INTO fsm_check_size VALUES(1, 'a');
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(1,3) i;
+-- There should be no FSM
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size 
 ---+--
-  8192 |0
+ 24576 |0
+(1 row)
+
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(101,105) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
+-- Insert large record and make sure it does not cause the relation to extend
+INSERT INTO fsm_check_size VALUES (111, rpad('', 1024, 'a'));
+VACUUM fsm_check_size;
+SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
+pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
+ heap_size | fsm_size 
+---+--
+ 24576 |0
 (1 row)
 
 -- Extend table with enough blocks to exceed the FSM threshold
diff --git a/src/test/regress/parallel_schedule b/src/test/regress/parallel_schedule
index 4051a4ad4e..a956775dd1 100644
--- a/src/test/regress/parallel_schedule
+++ b/src/test/regress/parallel_schedule
@@ -23,7 +23,7 @@ test: numerology
 # --
 # The second group of parallel tests
 # --
-test: point lseg line box path polygon circle date time timetz timestamp timestamptz interval inet macaddr macaddr8 tstypes
+test: point lseg line box path polygon circle date time timetz timestamp timestamptz interval inet macaddr macaddr8 tstypes fsm
 
 # --
 # Another group of parallel tests
@@ -68,12 +68,6 @@ test: create_aggregate create_function_3 create_cast constraints triggers inheri
 # --
 test: sanity_check
 
-# --
-# fsm does a delete followed by vacuum, and running it in parallel can prevent
-# removal of rows.
-# --
-test: fsm
-
 # --
 # 

Re: WIP: Avoid creation of the free space map for small tables

2019-02-04 Thread Amit Kapila
On Mon, Feb 4, 2019 at 2:27 PM John Naylor 
wrote:
>
> On Mon, Feb 4, 2019 at 8:41 AM Amit Kapila 
wrote:
> >
> > The change seems to have worked.  All the buildfarm machines that were
> > showing the failure are passed now.
>
> Excellent!
>
> Now that the buildfarm is green as far as this patch goes,
>

There is still one recent failure which I don't think is related to commit
of this patch:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=bowerbird=2019-02-04%2016%3A38%3A48

==
pgsql.build/src/bin/pg_ctl/tmp_check/log/004_logrotate_primary.log
===
TRAP: FailedAssertion("!(UsedShmemSegAddr != ((void *)0))", File:
"g:\prog\bf\root\head\pgsql.build\src\backend\port\win32_shmem.c", Line:
513)

I think we need to do something about this random failure, but not as part
of this thread/patch.


> I will
> touch on a few details to round out development in this area:
>
> 1. Earlier, I had a test to ensure that free space towards the front
> of the relation was visible with no FSM. In [1], I rewrote it without
> using vacuum, so we can consider adding it back now if desired. I can
> prepare a patch for this.
>

Yes, this is required.  It is generally a good practise to add test (unless
it takes a lot of time) which covers new code/functionality.

> 2. As a follow-on, since we don't rely on vacuum to remove dead rows,
> we could try putting the fsm.sql test in some existing group in the
> parallel schedule, rather than its own group is it is now.
>

+1.

> 3. While looking at 0d1fe9f74e, it occurred to me that I ignored this
> patch's effects on GetRecordedFreeSpace(), which will return zero for
> tables with no FSM.
>

Right, but what exactly we want to do for it?  Do you want to add a comment
atop of this function?

> The other callers are in:
> contrib/pg_freespacemap/pg_freespacemap.c
> contrib/pgstattuple/pgstatapprox.c
>
> For pg_freespacemap, this doesn't matter, since it's just reporting
> the facts. For pgstattuple_approx(), it might under-estimate the free
> space and over-estimate the number of live tuples.
>

Sure, but without patch also, it can do so, if the vacuum hasn't updated
freespace map.

> This might be fine,
> since it is approximate after all, but maybe a comment would be
> helpful. If this is a problem, we could tweak it to be more precise
> for tables without FSMs.
>

Sounds reasonable to me.

> Thoughts?
>
> 4. The latest patch for the pg_upgrade piece was in [2]
>

It will be good if we get this one as well.  I will look into it once we
are done with the other points you have mentioned.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: WIP: Avoid creation of the free space map for small tables

2019-02-04 Thread John Naylor
On Mon, Feb 4, 2019 at 8:41 AM Amit Kapila  wrote:
> > Yeah that can also work, but we still need to be careful about the
> > alignment of that one tuple, otherwise, there will could be different
> > free space on the fifth page.  The probably easier way could be to use
> > an even number of integers in the table say(int, int).  Anyway, for
> > now, I have avoided the dependency on FSM contents without losing on
> > coverage of test.  I have pushed my latest suggestion in the previous
> > email.
> >
>
> The change seems to have worked.  All the buildfarm machines that were
> showing the failure are passed now.

Excellent!

Now that the buildfarm is green as far as this patch goes, I will
touch on a few details to round out development in this area:

1. Earlier, I had a test to ensure that free space towards the front
of the relation was visible with no FSM. In [1], I rewrote it without
using vacuum, so we can consider adding it back now if desired. I can
prepare a patch for this.

2. As a follow-on, since we don't rely on vacuum to remove dead rows,
we could try putting the fsm.sql test in some existing group in the
parallel schedule, rather than its own group is it is now.

3. While looking at 0d1fe9f74e, it occurred to me that I ignored this
patch's effects on GetRecordedFreeSpace(), which will return zero for
tables with no FSM. The other callers are in:
contrib/pg_freespacemap/pg_freespacemap.c
contrib/pgstattuple/pgstatapprox.c

For pg_freespacemap, this doesn't matter, since it's just reporting
the facts. For pgstattuple_approx(), it might under-estimate the free
space and over-estimate the number of live tuples. This might be fine,
since it is approximate after all, but maybe a comment would be
helpful. If this is a problem, we could tweak it to be more precise
for tables without FSMs. Thoughts?

4. The latest patch for the pg_upgrade piece was in [2]

Anything else?

[1] 
https://www.postgresql.org/message-id/CACPNZCvEXLUx10pFvNcOs88RvqemMEjOv7D9MhL3ac86EzjAOA%40mail.gmail.com

[2] 
https://www.postgresql.org/message-id/CACPNZCu4cOdm3uGnNEGXivy7Gz8UWyQjynDpdkPGabQ18_zK6g%40mail.gmail.com

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-03 Thread Amit Kapila
On Mon, Feb 4, 2019 at 10:29 AM Amit Kapila  wrote:
>
> On Mon, Feb 4, 2019 at 10:18 AM John Naylor  
> wrote:
> >
> > On Mon, Feb 4, 2019 at 4:17 AM Amit Kapila  wrote:
> > > This one seems to be FSM test portability issue (due to different page
> > > contents, maybe).  Looking into it, John, see if you are around and
> > > have some thoughts on it.
> >
> > Maybe we can use the same plpgsql loop as fsm.sql that exits after 1
> > tuple has inserted into the 5th page.
> >
>
> Yeah that can also work, but we still need to be careful about the
> alignment of that one tuple, otherwise, there will could be different
> free space on the fifth page.  The probably easier way could be to use
> an even number of integers in the table say(int, int).  Anyway, for
> now, I have avoided the dependency on FSM contents without losing on
> coverage of test.  I have pushed my latest suggestion in the previous
> email.
>

The change seems to have worked.  All the buildfarm machines that were
showing the failure are passed now.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-02-03 Thread Amit Kapila
On Mon, Feb 4, 2019 at 10:18 AM John Naylor  wrote:
>
> On Mon, Feb 4, 2019 at 4:17 AM Amit Kapila  wrote:
> > This one seems to be FSM test portability issue (due to different page
> > contents, maybe).  Looking into it, John, see if you are around and
> > have some thoughts on it.
>
> Maybe we can use the same plpgsql loop as fsm.sql that exits after 1
> tuple has inserted into the 5th page.
>

Yeah that can also work, but we still need to be careful about the
alignment of that one tuple, otherwise, there will could be different
free space on the fifth page.  The probably easier way could be to use
an even number of integers in the table say(int, int).  Anyway, for
now, I have avoided the dependency on FSM contents without losing on
coverage of test.  I have pushed my latest suggestion in the previous
email.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-02-03 Thread John Naylor
On Mon, Feb 4, 2019 at 4:17 AM Amit Kapila  wrote:
> This one seems to be FSM test portability issue (due to different page
> contents, maybe).  Looking into it, John, see if you are around and
> have some thoughts on it.

Maybe we can use the same plpgsql loop as fsm.sql that exits after 1
tuple has inserted into the 5th page.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-03 Thread Amit Kapila
On Mon, Feb 4, 2019 at 9:24 AM Amit Kapila  wrote:
> On Mon, Feb 4, 2019 at 8:47 AM Amit Kapila  wrote:
>
> One more similar failure:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing=2019-02-04%2003%3A20%3A01
>
> So, basically, this is due to difference in the number of tuples that
> can fit on a page.  The freespace in FSM for the page is shown
> different because of available space on a particular page.  This can
> vary due to alignment.  It seems to me we can't rely on FSM contents
> if there are many tuples in a relation.  One idea is to get rid of
> dependency on FSM contents in this test, can you think of any better
> way to have consistent FSM contents across different platforms?
>

One more idea could be that we devise a test (say with a char/varchar)
such that it always consume same space in a page irrespective of its
alignment.  Yet another way could be we use explain (costs off,
analyze on, timing off, summary off) ..., this will ensure that we
will have test coverage for function fsm_page_contents, but we don't
rely on its contents.   What do you think?  I will go with last option
to stablize the buildfarm tests unless anyone thinks otherwise or has
better idea.  I willprobably wait for 20 minutes or so to see if
anyone has inputs.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-02-03 Thread Amit Kapila
On Mon, Feb 4, 2019 at 8:47 AM Amit Kapila  wrote:
>
> On Mon, Feb 4, 2019 at 12:39 AM John Naylor  
> wrote:
> >
> > On Sun, Feb 3, 2019 at 2:06 PM Amit Kapila  wrote:
> > >
> > > On Thu, Jan 31, 2019 at 6:03 PM Amit Kapila  
> > > wrote:
> > > This doesn't get applied cleanly after recent commit 0d1fe9f74e.
> > > Attached is a rebased version.  I have checked once that the changes
> > > done by 0d1fe9f74e don't impact this patch.  John, see if you can also
> > > once confirm whether the recent commit (0d1fe9f74e) has any impact.  I
> > > am planning to push this tomorrow morning (IST) unless you or anyone
> > > see any problem with this.
> >
> > Since that commit changes RelationAddExtraBlocks(), which can be
> > induces by your pgbench adjustment upthread, I ran make check with
> > that adjustment in the pgbench dir 300 times without triggering
> > asserts.
> >
> > I also tried to follow the logic in 0d1fe9f74e, and I believe it will
> > be correct without a FSM.
> >
>
> I have just pushed it and buildfarm has shown two failures:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary=2019-02-04%2002%3A27%3A26
>
> --- 
> /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/expected/page.out
> 2019-02-03 21:27:29.0 -0500
> +++ 
> /Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/results/page.out
> 2019-02-03 21:41:32.0 -0500
> @@ -38,19 +38,19 @@
>  SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
>   fsm_page_contents
>  ---
> - 0: 39+
> - 1: 39+
> - 3: 39+
> - 7: 39+
> - 15: 39   +
> - 31: 39   +
> - 63: 39   +
> ..
>
> This one seems to be FSM test portability issue (due to different page
> contents, maybe).  Looking into it, John, see if you are around and
> have some thoughts on it.
>

One more similar failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing=2019-02-04%2003%3A20%3A01

So, basically, this is due to difference in the number of tuples that
can fit on a page.  The freespace in FSM for the page is shown
different because of available space on a particular page.  This can
vary due to alignment.  It seems to me we can't rely on FSM contents
if there are many tuples in a relation.  One idea is to get rid of
dependency on FSM contents in this test, can you think of any better
way to have consistent FSM contents across different platforms?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-02-03 Thread Amit Kapila
On Mon, Feb 4, 2019 at 12:39 AM John Naylor  wrote:
>
> On Sun, Feb 3, 2019 at 2:06 PM Amit Kapila  wrote:
> >
> > On Thu, Jan 31, 2019 at 6:03 PM Amit Kapila  wrote:
> > This doesn't get applied cleanly after recent commit 0d1fe9f74e.
> > Attached is a rebased version.  I have checked once that the changes
> > done by 0d1fe9f74e don't impact this patch.  John, see if you can also
> > once confirm whether the recent commit (0d1fe9f74e) has any impact.  I
> > am planning to push this tomorrow morning (IST) unless you or anyone
> > see any problem with this.
>
> Since that commit changes RelationAddExtraBlocks(), which can be
> induces by your pgbench adjustment upthread, I ran make check with
> that adjustment in the pgbench dir 300 times without triggering
> asserts.
>
> I also tried to follow the logic in 0d1fe9f74e, and I believe it will
> be correct without a FSM.
>

I have just pushed it and buildfarm has shown two failures:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary=2019-02-04%2002%3A27%3A26

--- 
/Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/expected/page.out
2019-02-03 21:27:29.0 -0500
+++ 
/Users/buildfarm/bf-data/HEAD/pgsql.build/contrib/pageinspect/results/page.out
2019-02-03 21:41:32.0 -0500
@@ -38,19 +38,19 @@
 SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
  fsm_page_contents
 ---
- 0: 39+
- 1: 39+
- 3: 39+
- 7: 39+
- 15: 39   +
- 31: 39   +
- 63: 39   +
..

This one seems to be FSM test portability issue (due to different page
contents, maybe).  Looking into it, John, see if you are around and
have some thoughts on it.

2.
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dory=2019-02-04%2002%3A30%3A25

select explain_parallel_append('execute ab_q5 (33, 44, 55)');
-explain_parallel_append

- Finalize Aggregate (actual rows=1 loops=1)
-   ->  Gather (actual rows=3 loops=1)
- Workers Planned: 2
- Workers Launched: 2
- ->  Partial Aggregate (actual rows=1 loops=3)
-   ->  Parallel Append (actual rows=0 loops=N)
- Subplans Removed: 8
- ->  Parallel Seq Scan on ab_a1_b1 (never executed)
-   Filter: ((b < 4) AND (a = ANY (ARRAY[$1, $2, $3])))
-(9 rows)
-
+ERROR:  lost connection to parallel worker
+CONTEXT:  PL/pgSQL function explain_parallel_append(text) line 5 at
FOR over EXECUTE statement
 -- Test Parallel Append with PARAM_EXEC Params
 select explain_parallel_append('select count(*) from ab where (a =
(select 1) or a = (select 3)) and b = 2');
  explain_parallel_append

Failure is something like:

2019-02-03 21:44:42.456 EST [2812:327] pg_regress/partition_prune LOG:
 statement: select explain_parallel_append('execute ab_q5 (1, 1, 1)');
2019-02-03 21:44:42.493 EST [2812:328] pg_regress/partition_prune LOG:
 statement: select explain_parallel_append('execute ab_q5 (2, 3, 3)');
2019-02-03 21:44:42.531 EST [2812:329] pg_regress/partition_prune LOG:
 statement: select explain_parallel_append('execute ab_q5 (33, 44,
55)');
2019-02-04 02:44:42.552 GMT [4172] FATAL:  could not reattach to
shared memory (key=01B4, addr=0198): error
code 487
2019-02-03 21:44:42.555 EST [5116:6] LOG:  background worker "parallel
worker" (PID 4172) exited with exit code 1
2019-02-03 21:44:42.560 EST [2812:330] pg_regress/partition_prune
ERROR:  lost connection to parallel worker

I don't think this is related to this commit.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-02-03 Thread John Naylor
On Sun, Feb 3, 2019 at 2:06 PM Amit Kapila  wrote:
>
> On Thu, Jan 31, 2019 at 6:03 PM Amit Kapila  wrote:
> This doesn't get applied cleanly after recent commit 0d1fe9f74e.
> Attached is a rebased version.  I have checked once that the changes
> done by 0d1fe9f74e don't impact this patch.  John, see if you can also
> once confirm whether the recent commit (0d1fe9f74e) has any impact.  I
> am planning to push this tomorrow morning (IST) unless you or anyone
> see any problem with this.

Since that commit changes RelationAddExtraBlocks(), which can be
induces by your pgbench adjustment upthread, I ran make check with
that adjustment in the pgbench dir 300 times without triggering
asserts.

I also tried to follow the logic in 0d1fe9f74e, and I believe it will
be correct without a FSM.

[1] 
https://www.postgresql.org/message-id/CAA4eK1KRByXY03qR2JvUjUxKBzpBnCSO5H19oAC%3D_v4r5dzTwQ%40mail.gmail.com

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-02-03 Thread Amit Kapila
On Thu, Jan 31, 2019 at 6:03 PM Amit Kapila  wrote:
> On Thu, Jan 31, 2019 at 2:02 PM John Naylor  
> wrote:
> >
> >
> > FYI, the second comment is still present in v20.
> >
>
> oops, forgot to include in commit after making a change, done now.
>

This doesn't get applied cleanly after recent commit 0d1fe9f74e.
Attached is a rebased version.  I have checked once that the changes
done by 0d1fe9f74e don't impact this patch.  John, see if you can also
once confirm whether the recent commit (0d1fe9f74e) has any impact.  I
am planning to push this tomorrow morning (IST) unless you or anyone
see any problem with this.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v22-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-02-02 Thread Amit Kapila
On Sat, Feb 2, 2019 at 7:30 AM Amit Kapila  wrote:
>
> On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila  wrote:
> >
> > On Mon, Jan 28, 2019 at 10:03 AM John Naylor
> >  wrote:
>
> In the past few days, we have done a further analysis of each problem
> and tried to reproduce it.  We are successful in generating some form
> of reproducer for 3 out of 4 problems in the same way as it was failed
> in the buildfarm.   For the fourth symptom, we have tried a lot (even
> Andrew Dunstan has helped us to run the regression tests with the
> faulty commit on Jacana for many hours, but it didn't got reproduced)
> but not able to regenerate a failure in a similar way.  However, I
> have a theory as mentioned below why the particular test could fail
> and the fix for the same is done in the patch.  I am planning to push
> the latest version of the patch [1] which has fixes for all the
> symptoms.
>

Today, I have spent some more time to generate a test which can
reproduce the failure though it is with the help of breakpoints.  See
below:

> >
> > 4.  Failure on jacana:
> > --- 
> > c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/box.out
> > 2018-09-26
> > 17:53:33 -0400
> > +++ 
> > c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/results/box.out
> > 2019-01-27 23:14:35
> > -0500
> > @@ -252,332 +252,7 @@
> >  ('(0,100)(0,infinity)'),
> >  ('(-infinity,0)(0,infinity)'),
> >  ('(-infinity,-infinity)(infinity,infinity)');
> > -SET enable_seqscan = false;
> > -SELECT * FROM box_temp WHERE f1 << '(10,20),(30,40)';
> > ..
> > ..
> > TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File:
> > "c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c",
> > Line:
> > 1118)
> > ..
> > 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG:  server process
> > (PID 14388) exited with exit code 3
> > 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL:  Failed process
> > was running: INSERT INTO box_temp
> > VALUES (NULL),
> >
> > I think the reason for this failure is same as previous (as mentioned
> > in point-3), but this can happen in a different way.  Say, we have
> > searched the local map and then try to extend a relation 'X' and in
> > the meantime, another backend has extended such that it creates FSM.
> > Now, we will reuse that page and won't clear local map.  Now, say we
> > try to insert in relation 'Y' which doesn't have FSM.  It will try to
> > set the local map and will find that it already exists, so will fail.
> > Now, the question is how it can happen in this box.sql test.  I guess
> > that is happening for some system table which is being populated by
> > Create Index statement executed just before the failing Insert.
> >

Based on the above theory, the test is as below:

Session-1
---
postgres=# create table test_1(c1 int, c2 char(1500));
CREATE TABLE
postgres=# create table test_2(c1 int, c2 char(1500));
CREATE TABLE
postgres=# insert into test_1 values(generate_series(1,20),'');
INSERT 0 20
postgres=# insert into test_2 values(1,'');
INSERT 0 1

Session-2

postgres=# analyze test_1;
ANALYZE
postgres=# analyze test_2;
ANALYZE
postgres=# select oid, relname, relpages from pg_class where relname
like 'test%';
  oid  |relname | relpages
---++--
 41835 | test_1 |4
 41838 | test_2 |1

Till here we can see that test_1 has 4 pages and test2_1 has 1 page.
At this stage, even one more record insertion in test_1 will create a
new page.  So, now we have to hit the scenario with the help of
debugger.   For session-1, attach debugger and put breakpoint in
RelationGetBufferForTuple().

Session-1

postgres=# insert into test_1 values(21,'');

It will hit the breakpoint in RelationGetBufferForTuple().  Now, add
one more breakpoint on line 542 in hio.c, aka below line:
RelationGetBufferForTuple
{
..
else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))

Press continue and stop the debugger at line 542.  Attach the debugger
for session-2 and add a breakpoint on line 580 in hio.c, aka below
line:
RelationGetBufferForTuple
{
..
buffer = ReadBufferBI(relation, P_NEW, bistate);

Session-2
---
postgres=# insert into test_1 values(22,'');

It will hit the breakpoint in RelationGetBufferForTuple().   Now
proceed with debugging on session-1 by one step.  This is to ensure
that session-1 doesn't get a conditional lock. Now, continue the
debugger in session-2 and after that run vacuum test_1 from session-2,
this will ensure that FSM is created for relation test_1.  So the
state of session-2 will be as below:

Session-2

postgres=# insert into test_1 values(22,'');
INSERT 0 1
postgres=# vacuum test_1;
VACUUM
postgres=# select oid, relname, relpages from pg_class where relname
like 'test%';
  oid  |relname | relpages

Re: WIP: Avoid creation of the free space map for small tables

2019-02-01 Thread Amit Kapila
On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila  wrote:
>
> On Mon, Jan 28, 2019 at 10:03 AM John Naylor
>  wrote:
> >
> > On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila  wrote:
> > > There are a few buildfarm failures due to this commit, see my email on
> > > pgsql-committers.  If you have time, you can also once look into
> > > those.
> >
> > I didn't see anything in common with the configs of the failed
> > members. None have a non-default BLCKSZ that I can see.
> >
>
> I have done an analysis of the different failures on buildfarm.
>

In the past few days, we have done a further analysis of each problem
and tried to reproduce it.  We are successful in generating some form
of reproducer for 3 out of 4 problems in the same way as it was failed
in the buildfarm.   For the fourth symptom, we have tried a lot (even
Andrew Dunstan has helped us to run the regression tests with the
faulty commit on Jacana for many hours, but it didn't got reproduced)
but not able to regenerate a failure in a similar way.  However, I
have a theory as mentioned below why the particular test could fail
and the fix for the same is done in the patch.  I am planning to push
the latest version of the patch [1] which has fixes for all the
symptoms. Does anybody have any opinion here?

>
> 4.  Failure on jacana:
> --- 
> c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/box.out
> 2018-09-26
> 17:53:33 -0400
> +++ 
> c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/results/box.out
> 2019-01-27 23:14:35
> -0500
> @@ -252,332 +252,7 @@
>  ('(0,100)(0,infinity)'),
>  ('(-infinity,0)(0,infinity)'),
>  ('(-infinity,-infinity)(infinity,infinity)');
> -SET enable_seqscan = false;
> -SELECT * FROM box_temp WHERE f1 << '(10,20),(30,40)';
> ..
> ..
> TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File:
> "c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c",
> Line:
> 1118)
> ..
> 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG:  server process
> (PID 14388) exited with exit code 3
> 2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL:  Failed process
> was running: INSERT INTO box_temp
> VALUES (NULL),
>
> I think the reason for this failure is same as previous (as mentioned
> in point-3), but this can happen in a different way.  Say, we have
> searched the local map and then try to extend a relation 'X' and in
> the meantime, another backend has extended such that it creates FSM.
> Now, we will reuse that page and won't clear local map.  Now, say we
> try to insert in relation 'Y' which doesn't have FSM.  It will try to
> set the local map and will find that it already exists, so will fail.
> Now, the question is how it can happen in this box.sql test.  I guess
> that is happening for some system table which is being populated by
> Create Index statement executed just before the failing Insert.
>
> I think both 3 and 4 are timing issues, so we didn't got in our local
> regression runs.
>

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2B3ajhRPC0jvUi6p_aMrTUpB568OBH10LrbHtvOLNTgqQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread Amit Kapila
On Thu, Jan 31, 2019 at 9:18 PM John Naylor  wrote:
>
> On Thu, Jan 31, 2019 at 4:06 PM Amit Kapila  wrote:
> > I don't think that moving fsm tests to brin would be a good approach.
> > We want to have a separate test for each access method.  I think if we
> > want to do something to avoid portability issues, maybe we can do what
> > Masahiko San has just suggested.
>
> We could also use the same plpgsql loop as in fsm.sql to check the ctid, 
> right?
>

Yes, however, I feel we should leave it as it is for now unless we see
any risk of portability issues.  The only reason to do that way is to
avoid any failure for bigger block size (say BLCKSZ is 16KB or 32KB).
 Does anyone else have any opinion on whether we should try to write
tests which should care for bigger block size?  I see that existing
regression tests fail if we configure with bigger block size, so not
sure if we should try to avoid that here.  In an ideal scenario,  I
think it would be good if we can write tests which pass on all kind of
block sizes, that will make the life easier if tomorrow one wants to
set up a buildfarm or do the testing for bigger block sizes.

> > OTOH, I think we are just good w.r.t
> > this issue with the last patch I sent.  I think unless we see some
> > problem here, we should put energy into having a reproducible test for
> > the fourth problem mentioned in my mail up thread [1].  Do you think
> > it makes sense to run make check in loop for multiple times or do you
> > have any idea how we can have a reproducible test?
>
> Okay. Earlier I tried running make installcheck with
> force_parallel_mode='regress', but didn't get a failure.
>

AFAICS, this failure was not for force_parallel_mode='regress'.  See
the config at [1].

> I may not
> have run enough times, though.
>

Yeah, probably running make check or make installcheck many times
would help, but not sure.

> I'll have to think about how to induce
> it.
>

Thanks!

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jacana=2019-01-28%2004%3A00%3A23

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread John Naylor
On Thu, Jan 31, 2019 at 4:06 PM Amit Kapila  wrote:
> I don't think that moving fsm tests to brin would be a good approach.
> We want to have a separate test for each access method.  I think if we
> want to do something to avoid portability issues, maybe we can do what
> Masahiko San has just suggested.

We could also use the same plpgsql loop as in fsm.sql to check the ctid, right?

> OTOH, I think we are just good w.r.t
> this issue with the last patch I sent.  I think unless we see some
> problem here, we should put energy into having a reproducible test for
> the fourth problem mentioned in my mail up thread [1].  Do you think
> it makes sense to run make check in loop for multiple times or do you
> have any idea how we can have a reproducible test?

Okay. Earlier I tried running make installcheck with
force_parallel_mode='regress', but didn't get a failure. I may not
have run enough times, though. I'll have to think about how to induce
it.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread Amit Kapila
On Thu, Jan 31, 2019 at 7:23 PM John Naylor  wrote:
>
> On Thu, Jan 31, 2019 at 1:52 PM John Naylor  
> wrote:
> >
> > On Thu, Jan 31, 2019 at 1:43 PM Amit Kapila  wrote:
> > > > I have an idea -- instead of adding a bunch of records and hoping that
> > > > the relation size and free space is consistent across platforms, how
> > > > about we revert to the original test input, and add a BRIN index? That
> > > > should have a FSM even with one record.
> > > >
> > >
> > > Why would BRIN index allow having FSM for heap relation?
> >
> > Oops, I forgot this file is for testing heaps only. That said, we
> > could possibly put most of the FSM tests such as
> >
> > SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
> >
> > into brin.sql since we know a non-empty BRIN index will have a FSM.
>
> As in the attached. Applies on top of v20. First to revert to HEAD,
> second to move FSM tests to brin.sql. This is a much less invasive and
> more readable patch, in addition to being hopefully more portable.
>

I don't think that moving fsm tests to brin would be a good approach.
We want to have a separate test for each access method.  I think if we
want to do something to avoid portability issues, maybe we can do what
Masahiko San has just suggested.  OTOH, I think we are just good w.r.t
this issue with the last patch I sent.  I think unless we see some
problem here, we should put energy into having a reproducible test for
the fourth problem mentioned in my mail up thread [1].  Do you think
it makes sense to run make check in loop for multiple times or do you
have any idea how we can have a reproducible test?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1L%3DqWp_bJ5aTc9%2Bfy4Ewx2LPaLWY-RbR4a60g_rupCKnQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread Masahiko Sawada
On Thu, Jan 31, 2019 at 6:41 AM Amit Kapila  wrote:
>
> On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada  
> wrote:
> >
> > On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
> > >
> > > On Tue, Jan 29, 2019 at 8:12 PM John Naylor  
> > > wrote:
> > > >
> > > > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  
> > > > wrote:
> > > >
> > > > > You can find this change in attached patch.  Then, I ran the make
> > > > > check in src/bin/pgbench multiple times using test_conc_insert.sh.
> > > > > You can vary the number of times the test should run, if you are not
> > > > > able to reproduce it with this.
> > > > >
> > > > > The attached patch (clear_local_map_if_exists_1.patch) atop the main
> > > > > patch fixes the issue for me.  Kindly verify the same.
> > > >
> > > > I got one failure in 50 runs. With the new patch, I didn't get any
> > > > failures in 300 runs.
> > > >
> > >
> > > Thanks for verification.  I have included it in the attached patch and
> > > I have also modified the page.sql test to have enough number of pages
> > > in relation so that FSM will get created irrespective of alignment
> > > boundaries.  Masahiko San, can you verify if this now works for you?
> > >
> >
> > Thank you for updating the patch!
> >
> > The modified page.sql test could fail if the block size is more than
> > 8kB?
>
> That's right, but I don't think current regression tests will work for
> block size greater than 8KB.  I have tried with 16 and 32 as block
> size, there were few failures on the head itself.

Understood. That means that no build farm configures other block size than 8kB.

>
> > We can ensure the number of pages are more than 4 by checking it
> > and adding more data if no enough but I'm really not sure we should
> > care the bigger-block size cases.
> >
>
> Yeah, I am not sure either.  I think as this is an existing test, we
> should not try to change it too much.  However, if both you and John
> feel it is better to change, we can go with that.
>

So I think the patch you proposed looks good to me but how about
adding the check whether the table is more than 4 pages? For example,

SELECT (pg_relation_size('test_rel_forks') /
current_setting('block_size')::int) > 4;


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread John Naylor
On Thu, Jan 31, 2019 at 1:52 PM John Naylor  wrote:
>
> On Thu, Jan 31, 2019 at 1:43 PM Amit Kapila  wrote:
> > > I have an idea -- instead of adding a bunch of records and hoping that
> > > the relation size and free space is consistent across platforms, how
> > > about we revert to the original test input, and add a BRIN index? That
> > > should have a FSM even with one record.
> > >
> >
> > Why would BRIN index allow having FSM for heap relation?
>
> Oops, I forgot this file is for testing heaps only. That said, we
> could possibly put most of the FSM tests such as
>
> SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
>
> into brin.sql since we know a non-empty BRIN index will have a FSM.

As in the attached. Applies on top of v20. First to revert to HEAD,
second to move FSM tests to brin.sql. This is a much less invasive and
more readable patch, in addition to being hopefully more portable.

> And in page.sql we could just have a test that the table has no FSM.

This is not possible, since we don't know the relfilenode for the
error text, and it's not important. Better to have everything in
brin.sql.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
commit 18b444cdfc2c50502389b84f4504ec9c7f2ab831
Author: John Naylor 
Date:   Thu Jan 31 14:20:14 2019 +0100

revert contrib so I can redo

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 89b73ca991..3fcd9fbe6d 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,69 +1,48 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test_rel_forks (a int);
--- Make sure there are enough blocks in the heap for the FSM to be created.
-INSERT INTO test_rel_forks SELECT i from generate_series(1,2000) i;
--- set up FSM and VM
-VACUUM test_rel_forks;
+CREATE TABLE test1 (a int, b int);
+INSERT INTO test1 VALUES (16777217, 131584);
+VACUUM test1;  -- set up FSM
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
  main_0 
 
8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
-ERROR:  block number 100 is out of range for relation "test_rel_forks"
-SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
+ERROR:  block number 1 is out of range for relation "test1"
+SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
  fsm_0 
 ---
   8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 20)) AS fsm_20;
-ERROR:  block number 20 is out of range for relation "test_rel_forks"
-SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
+ fsm_1 
+---
+  8192
+(1 row)
+
+SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
  vm_0 
 --
  8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;
-ERROR:  block number 1 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1;
+ERROR:  block number 1 is out of range for relation "test1"
 SELECT octet_length(get_raw_page('xxx', 'main', 0));
 ERROR:  relation "xxx" does not exist
-SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0));
+SELECT octet_length(get_raw_page('test1', 'xxx', 0));
 ERROR:  invalid fork name
 HINT:  Valid fork names are "main", "fsm", "vm", and "init".
-SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
- fsm_page_contents 

- 0: 39+
- 1: 39+
- 3: 39+
- 7: 39+
- 15: 39   +
- 31: 39   +
- 63: 39   +
- 127: 39  +
- 255: 39  +
- 511: 39  +
- 1023: 39 +
- 2047: 39 +
- 4095: 39 +
- fp_next_slot: 0  +
- 
-(1 row)
-
-SELECT get_raw_page('test_rel_forks', 0) = get_raw_page('test_rel_forks', 'main', 0);
+SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0);
  ?column? 
 --
  t
 (1 row)
 
-DROP TABLE test_rel_forks;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
 SELECT pagesize, version FROM page_header(get_raw_page('test1', 0));
  pagesize | version 
 --+-
@@ -83,6 +62,26 @@ SELECT tuple_data_split('test1'::regclass, t_data, t_infomask, t_infomask2, t_bi
  {"\\x0101","\\x00020200"}
 (1 row)
 
+SELECT * FROM fsm_page_contents(get_raw_page('test1', 'fsm', 0));
+ fsm_page_contents 
+---
+ 0: 254   +
+ 1: 254   +
+ 3: 254   +
+ 7: 254   +
+ 15: 254  +
+ 31: 254  +
+ 63: 254  +
+ 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread John Naylor
On Thu, Jan 31, 2019 at 1:43 PM Amit Kapila  wrote:
> > I have an idea -- instead of adding a bunch of records and hoping that
> > the relation size and free space is consistent across platforms, how
> > about we revert to the original test input, and add a BRIN index? That
> > should have a FSM even with one record.
> >
>
> Why would BRIN index allow having FSM for heap relation?

Oops, I forgot this file is for testing heaps only. That said, we
could possibly put most of the FSM tests such as

SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));

into brin.sql since we know a non-empty BRIN index will have a FSM.
And in page.sql we could just have a test that the table has no FSM.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread Amit Kapila
On Thu, Jan 31, 2019 at 2:12 PM John Naylor  wrote:
>
> On Thu, Jan 31, 2019 at 6:41 AM Amit Kapila  wrote:
> >
> > On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada  
> > wrote:
> > >
> > > The modified page.sql test could fail if the block size is more than
> > > 8kB?
> >
> > That's right, but I don't think current regression tests will work for
> > block size greater than 8KB.  I have tried with 16 and 32 as block
> > size, there were few failures on the head itself.
> >
> > > We can ensure the number of pages are more than 4 by checking it
> > > and adding more data if no enough but I'm really not sure we should
> > > care the bigger-block size cases.
> > >
> >
> > Yeah, I am not sure either.  I think as this is an existing test, we
> > should not try to change it too much.  However, if both you and John
> > feel it is better to change, we can go with that.
>
> I have an idea -- instead of adding a bunch of records and hoping that
> the relation size and free space is consistent across platforms, how
> about we revert to the original test input, and add a BRIN index? That
> should have a FSM even with one record.
>

Why would BRIN index allow having FSM for heap relation?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread Amit Kapila
On Thu, Jan 31, 2019 at 2:02 PM John Naylor  wrote:
>
> On Thu, Jan 31, 2019 at 6:37 AM Amit Kapila  wrote:
> >
> > On Wed, Jan 30, 2019 at 8:11 PM John Naylor  
> > wrote:
> > >
> > > That's probably a good idea to limit risk. I just very basic tests
> > > now, and vacuum before every relation size check to make sure any FSM
> > > extension (whether desired or not) is invoked. Also, in my last patch
> > > I forgot to implement explicit checks of the block number instead of
> > > assuming how many rows will fit on a page. I've used a plpgsql code
> > > block to do this.
> > >
> >
> > -- Extend table with enough blocks to exceed the FSM threshold
> >  -- FSM is created and extended to 3 blocks
> >
> > The second comment line seems redundant to me, so I have removed that
> > and integrated it in the main patch.
>
> FYI, the second comment is still present in v20.
>

oops, forgot to include in commit after making a change, done now.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v21-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread John Naylor
On Thu, Jan 31, 2019 at 6:41 AM Amit Kapila  wrote:
>
> On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada  
> wrote:
> >
> > The modified page.sql test could fail if the block size is more than
> > 8kB?
>
> That's right, but I don't think current regression tests will work for
> block size greater than 8KB.  I have tried with 16 and 32 as block
> size, there were few failures on the head itself.
>
> > We can ensure the number of pages are more than 4 by checking it
> > and adding more data if no enough but I'm really not sure we should
> > care the bigger-block size cases.
> >
>
> Yeah, I am not sure either.  I think as this is an existing test, we
> should not try to change it too much.  However, if both you and John
> feel it is better to change, we can go with that.

I have an idea -- instead of adding a bunch of records and hoping that
the relation size and free space is consistent across platforms, how
about we revert to the original test input, and add a BRIN index? That
should have a FSM even with one record.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-31 Thread John Naylor
On Thu, Jan 31, 2019 at 6:37 AM Amit Kapila  wrote:
>
> On Wed, Jan 30, 2019 at 8:11 PM John Naylor  
> wrote:
> >
> > That's probably a good idea to limit risk. I just very basic tests
> > now, and vacuum before every relation size check to make sure any FSM
> > extension (whether desired or not) is invoked. Also, in my last patch
> > I forgot to implement explicit checks of the block number instead of
> > assuming how many rows will fit on a page. I've used a plpgsql code
> > block to do this.
> >
>
> -- Extend table with enough blocks to exceed the FSM threshold
>  -- FSM is created and extended to 3 blocks
>
> The second comment line seems redundant to me, so I have removed that
> and integrated it in the main patch.

FYI, the second comment is still present in v20.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread Amit Kapila
On Wed, Jan 30, 2019 at 10:41 PM Masahiko Sawada  wrote:
>
> On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
> >
> > On Tue, Jan 29, 2019 at 8:12 PM John Naylor  
> > wrote:
> > >
> > > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  
> > > wrote:
> > >
> > > > You can find this change in attached patch.  Then, I ran the make
> > > > check in src/bin/pgbench multiple times using test_conc_insert.sh.
> > > > You can vary the number of times the test should run, if you are not
> > > > able to reproduce it with this.
> > > >
> > > > The attached patch (clear_local_map_if_exists_1.patch) atop the main
> > > > patch fixes the issue for me.  Kindly verify the same.
> > >
> > > I got one failure in 50 runs. With the new patch, I didn't get any
> > > failures in 300 runs.
> > >
> >
> > Thanks for verification.  I have included it in the attached patch and
> > I have also modified the page.sql test to have enough number of pages
> > in relation so that FSM will get created irrespective of alignment
> > boundaries.  Masahiko San, can you verify if this now works for you?
> >
>
> Thank you for updating the patch!
>
> The modified page.sql test could fail if the block size is more than
> 8kB?

That's right, but I don't think current regression tests will work for
block size greater than 8KB.  I have tried with 16 and 32 as block
size, there were few failures on the head itself.

> We can ensure the number of pages are more than 4 by checking it
> and adding more data if no enough but I'm really not sure we should
> care the bigger-block size cases.
>

Yeah, I am not sure either.  I think as this is an existing test, we
should not try to change it too much.  However, if both you and John
feel it is better to change, we can go with that.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread Amit Kapila
On Wed, Jan 30, 2019 at 8:11 PM John Naylor  wrote:
>
> On Wed, Jan 30, 2019 at 2:11 PM Amit Kapila  wrote:
> > This is much better than the earlier version of test and there is no
> > dependency on the vacuum.  However, I feel still there is some
> > dependency on how the rows will fit in a page and we have seen some
> > related failures due to alignment stuff.  By looking at the test, I
> > can't envision any such problem, but how about if we just write some
> > simple tests where we can check that the FSM won't be created for very
> > small number of records say one or two and then when we increase the
> > records FSM gets created, here if we want, we can even use vacuum to
> > ensure FSM gets created.  Once we are sure that the main patch passes
> > all the buildfarm tests, we can extend the test to something advanced
> > as you are proposing now.  I think that will reduce the chances of
> > failure, what do you think?
>
> That's probably a good idea to limit risk. I just very basic tests
> now, and vacuum before every relation size check to make sure any FSM
> extension (whether desired or not) is invoked. Also, in my last patch
> I forgot to implement explicit checks of the block number instead of
> assuming how many rows will fit on a page. I've used a plpgsql code
> block to do this.
>

-- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks

The second comment line seems redundant to me, so I have removed that
and integrated it in the main patch.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v20-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread Masahiko Sawada
On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
>
> On Tue, Jan 29, 2019 at 8:12 PM John Naylor  
> wrote:
> >
> > On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  
> > wrote:
> >
> > > You can find this change in attached patch.  Then, I ran the make
> > > check in src/bin/pgbench multiple times using test_conc_insert.sh.
> > > You can vary the number of times the test should run, if you are not
> > > able to reproduce it with this.
> > >
> > > The attached patch (clear_local_map_if_exists_1.patch) atop the main
> > > patch fixes the issue for me.  Kindly verify the same.
> >
> > I got one failure in 50 runs. With the new patch, I didn't get any
> > failures in 300 runs.
> >
>
> Thanks for verification.  I have included it in the attached patch and
> I have also modified the page.sql test to have enough number of pages
> in relation so that FSM will get created irrespective of alignment
> boundaries.  Masahiko San, can you verify if this now works for you?
>

Thank you for updating the patch!

The modified page.sql test could fail if the block size is more than
8kB? We can ensure the number of pages are more than 4 by checking it
and adding more data if no enough but I'm really not sure we should
care the bigger-block size cases. However maybe it's good to check the
number of pages after insertion so that we can break down the issue in
case the test failed again.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread John Naylor
On Wed, Jan 30, 2019 at 2:11 PM Amit Kapila  wrote:
> This is much better than the earlier version of test and there is no
> dependency on the vacuum.  However, I feel still there is some
> dependency on how the rows will fit in a page and we have seen some
> related failures due to alignment stuff.  By looking at the test, I
> can't envision any such problem, but how about if we just write some
> simple tests where we can check that the FSM won't be created for very
> small number of records say one or two and then when we increase the
> records FSM gets created, here if we want, we can even use vacuum to
> ensure FSM gets created.  Once we are sure that the main patch passes
> all the buildfarm tests, we can extend the test to something advanced
> as you are proposing now.  I think that will reduce the chances of
> failure, what do you think?

That's probably a good idea to limit risk. I just very basic tests
now, and vacuum before every relation size check to make sure any FSM
extension (whether desired or not) is invoked. Also, in my last patch
I forgot to implement explicit checks of the block number instead of
assuming how many rows will fit on a page. I've used a plpgsql code
block to do this.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index df6b15b9d2..eb2c3842e2 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,46 +2,30 @@
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 3 blocks with as many large records as will fit
--- No FSM
-INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
+-- With one block, there should be no FSM
+INSERT INTO fsm_check_size VALUES(1, 'a');
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size 
 ---+--
- 24576 |0
-(1 row)
-
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
--- Insert large record and make sure it goes in block 0 rather than
--- causing the relation to extend
-INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
-SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
-pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- heap_size | fsm_size 
+--
- 24576 |0
+  8192 |0
 (1 row)
 
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
-INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-24576
-(1 row)
-
--- Truncate heap to 1 block
--- No change in FSM
-DELETE FROM fsm_check_size WHERE num > 7;
+DO $$
+DECLARE curtid tid;
+num int;
+BEGIN
+num = 11;
+  LOOP
+INSERT INTO fsm_check_size VALUES (num, 'b') RETURNING ctid INTO curtid;
+EXIT WHEN curtid >= tid '(4, 0)';
+num = num + 1;
+  END LOOP;
+END;
+$$;
 VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  fsm_size 
@@ -49,16 +33,6 @@ SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 24576
 (1 row)
 
--- Truncate heap to 0 blocks
--- FSM now truncated to 2 blocks
-DELETE FROM fsm_check_size;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-16384
-(1 row)
-
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size
 VALUES(0, (SELECT string_agg(md5(chr(i)), '')
diff --git a/src/test/regress/sql/fsm.sql b/src/test/regress/sql/fsm.sql
index 07f505591a..b1debedea1 100644
--- a/src/test/regress/sql/fsm.sql
+++ b/src/test/regress/sql/fsm.sql
@@ -4,42 +4,28 @@
 
 CREATE TABLE fsm_check_size (num int, str text);
 
--- Fill 3 blocks with as many large records as will fit
--- No FSM
-INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
-pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
+-- With one block, there should be no FSM
+INSERT INTO fsm_check_size VALUES(1, 'a');
 
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
 VACUUM fsm_check_size;
-
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
-
--- Insert large record and make sure it goes in block 0 rather than
--- causing the relation to extend
-INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
 SELECT pg_relation_size('fsm_check_size', 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread Amit Kapila
On Wed, Jan 30, 2019 at 3:26 PM John Naylor  wrote:
>
> On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
> > There are two more failures which we need to something about.
> > 1. Make fsm.sql independent of vacuum without much losing on coverage
> > of newly added code.  John, I guess you have an idea, see if you can
> > take care of it, otherwise, I will see what I can do for it.
>
> I've attached a patch that applies on top of v19 that uses Andrew
> Gierth's idea to use fillfactor to control free space. I've also
> removed tests that relied on truncation and weren't very useful to
> begin with.
>

This is much better than the earlier version of test and there is no
dependency on the vacuum.  However, I feel still there is some
dependency on how the rows will fit in a page and we have seen some
related failures due to alignment stuff.  By looking at the test, I
can't envision any such problem, but how about if we just write some
simple tests where we can check that the FSM won't be created for very
small number of records say one or two and then when we increase the
records FSM gets created, here if we want, we can even use vacuum to
ensure FSM gets created.  Once we are sure that the main patch passes
all the buildfarm tests, we can extend the test to something advanced
as you are proposing now.  I think that will reduce the chances of
failure, what do you think?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-30 Thread John Naylor
On Wed, Jan 30, 2019 at 4:33 AM Amit Kapila  wrote:
> There are two more failures which we need to something about.
> 1. Make fsm.sql independent of vacuum without much losing on coverage
> of newly added code.  John, I guess you have an idea, see if you can
> take care of it, otherwise, I will see what I can do for it.

I've attached a patch that applies on top of v19 that uses Andrew
Gierth's idea to use fillfactor to control free space. I've also
removed tests that relied on truncation and weren't very useful to
begin with.

--
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/test/regress/expected/fsm.out b/src/test/regress/expected/fsm.out
index df6b15b9d2..f92d454042 100644
--- a/src/test/regress/expected/fsm.out
+++ b/src/test/regress/expected/fsm.out
@@ -2,11 +2,10 @@
 -- Free Space Map test
 --
 CREATE TABLE fsm_check_size (num int, str text);
--- Fill 3 blocks with as many large records as will fit
--- No FSM
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
 INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
+FROM generate_series(1,3) i;
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size 
@@ -14,11 +13,12 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  24576 |0
 (1 row)
 
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(11,15) i;
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
 -- Insert large record and make sure it goes in block 0 rather than
 -- causing the relation to extend
 INSERT INTO fsm_check_size VALUES (101, rpad('', 1024, 'a'));
@@ -32,38 +32,16 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
 INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-24576
-(1 row)
-
--- Truncate heap to 1 block
--- No change in FSM
-DELETE FROM fsm_check_size WHERE num > 7;
-VACUUM fsm_check_size;
 SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  fsm_size 
 --
 24576
 (1 row)
 
--- Truncate heap to 0 blocks
--- FSM now truncated to 2 blocks
-DELETE FROM fsm_check_size;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
- fsm_size 
---
-16384
-(1 row)
-
 -- Add long random string to extend TOAST table to 1 block
 INSERT INTO fsm_check_size
 VALUES(0, (SELECT string_agg(md5(chr(i)), '')
 		   FROM generate_series(1,100) i));
-VACUUM fsm_check_size;
 SELECT pg_relation_size(reltoastrelid, 'main') AS toast_size,
 pg_relation_size(reltoastrelid, 'fsm') AS toast_fsm_size
 FROM pg_class WHERE relname = 'fsm_check_size';
diff --git a/src/test/regress/sql/fsm.sql b/src/test/regress/sql/fsm.sql
index 07f505591a..542bd6e203 100644
--- a/src/test/regress/sql/fsm.sql
+++ b/src/test/regress/sql/fsm.sql
@@ -4,20 +4,21 @@
 
 CREATE TABLE fsm_check_size (num int, str text);
 
--- Fill 3 blocks with as many large records as will fit
--- No FSM
+-- Fill 3 blocks with one record each
+ALTER TABLE fsm_check_size SET (fillfactor=15);
 INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
-FROM generate_series(1,7*3) i;
-VACUUM fsm_check_size;
+FROM generate_series(1,3) i;
+
 SELECT pg_relation_size('fsm_check_size', 'main') AS heap_size,
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 
--- Clear some space on block 0
-DELETE FROM fsm_check_size WHERE num <= 5;
-VACUUM fsm_check_size;
+-- Fill most of the last block
+ALTER TABLE fsm_check_size SET (fillfactor=100);
+INSERT INTO fsm_check_size SELECT i, rpad('', 1024, 'a')
+FROM generate_series(11,15) i;
 
--- Insert small record in block 2 to set the cached smgr targetBlock
-INSERT INTO fsm_check_size VALUES(99, 'b');
+-- Make sure records can go into any block but the last one
+ALTER TABLE fsm_check_size SET (fillfactor=30);
 
 -- Insert large record and make sure it goes in block 0 rather than
 -- causing the relation to extend
@@ -28,26 +29,12 @@ pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
 -- Extend table with enough blocks to exceed the FSM threshold
 -- FSM is created and extended to 3 blocks
 INSERT INTO fsm_check_size SELECT i, 'c' FROM generate_series(200,1200) i;
-VACUUM fsm_check_size;
-SELECT pg_relation_size('fsm_check_size', 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-29 Thread Amit Kapila
On Tue, Jan 29, 2019 at 8:12 PM John Naylor  wrote:
>
> On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  wrote:
>
> > You can find this change in attached patch.  Then, I ran the make
> > check in src/bin/pgbench multiple times using test_conc_insert.sh.
> > You can vary the number of times the test should run, if you are not
> > able to reproduce it with this.
> >
> > The attached patch (clear_local_map_if_exists_1.patch) atop the main
> > patch fixes the issue for me.  Kindly verify the same.
>
> I got one failure in 50 runs. With the new patch, I didn't get any
> failures in 300 runs.
>

Thanks for verification.  I have included it in the attached patch and
I have also modified the page.sql test to have enough number of pages
in relation so that FSM will get created irrespective of alignment
boundaries.  Masahiko San, can you verify if this now works for you?

There are two more failures which we need to something about.
1. Make fsm.sql independent of vacuum without much losing on coverage
of newly added code.  John, I guess you have an idea, see if you can
take care of it, otherwise, I will see what I can do for it.
2. I still could not figure out how to verify if the failure on Jacana
will be fixed.  I have posted some theory above and the attached patch
has a solution for it, but I think it would be better if find out some
way to verify the same.

Note - you might see some cosmetic changes in freespace.c due to pgindent.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v19-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-29 Thread John Naylor
On Tue, Jan 29, 2019 at 11:56 AM Amit Kapila  wrote:

> You can find this change in attached patch.  Then, I ran the make
> check in src/bin/pgbench multiple times using test_conc_insert.sh.
> You can vary the number of times the test should run, if you are not
> able to reproduce it with this.
>
> The attached patch (clear_local_map_if_exists_1.patch) atop the main
> patch fixes the issue for me.  Kindly verify the same.

I got one failure in 50 runs. With the new patch, I didn't get any
failures in 300 runs.


-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-29 Thread Amit Kapila
On Tue, Jan 29, 2019 at 5:20 PM Masahiko Sawada  wrote:
>
> On Tue, Jan 29, 2019 at 9:29 AM Amit Kapila  wrote:
> >
>
> I'd suspect the alignment of integer. In my environemnt, the tuple
> actual size is 28 bytes but the aligned size is 32 bytes (=
> MAXALIGN(28)), so we can store 226 tuples to single page. But if
> MAXALIGN(28) = 28 then we can store 255 tuples and 1000 tuples fits
> within 4 pages. The MAXALIGN of four buildfarms seem 4 accroding to
> the configure script so MAXALIGN(28) might be 28 on these buildfarms.
>

Good finding.  I was also wondering along these lines and wanted to
verify.  Thanks a lot.  So, this clearly states why we have a second
failure in my email above [1].  I think this means for the fsm test
also we have to be careful when relying on the number of pages in the
test.  I think now we have found the reasons and solutions for the
first three problems mentioned in my email [1].  For the problem-4
(Failure on jacana:), I have speculated some theory, but not sure how
can we confirm?  Can we try the patch on Jacana before considering the
patch for commit?  Is there any other way we can replicate that error?

[1] - 
https://www.postgresql.org/message-id/CAA4eK1L%3DqWp_bJ5aTc9%2Bfy4Ewx2LPaLWY-RbR4a60g_rupCKnQ%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-29 Thread Amit Kapila
On Tue, Jan 29, 2019 at 5:59 AM Amit Kapila  wrote:
>
> On Tue, Jan 29, 2019 at 12:37 AM John Naylor
>  wrote:
> > > I think here you need to clear the map if it exists or clear it
> > > unconditionally, the earlier one would be better.
> >
> > Ok, maybe all callers should call it unconditonally, but within the
> > function, check "if (FSM_LOCAL_MAP_EXISTS)"?
> >
>
> Sounds sensible.  I think we should try to reproduce these failures,
> for ex. for pgbench failure, we can try the same test with more
> clients.
>

I am able to reproduce this by changing pgbench test as below:

--- a/src/bin/pgbench/t/001_pgbench_with_server.pl
+++ b/src/bin/pgbench/t/001_pgbench_with_server.pl
@@ -56,9 +56,9 @@ $node->safe_psql('postgres',
  'CREATE UNLOGGED TABLE insert_tbl (id serial primary key); ');

 pgbench(
- '--no-vacuum --client=5 --protocol=prepared --transactions=25',
+ '--no-vacuum --client=10 --protocol=prepared --transactions=25',
  0,
- [qr{processed: 125/125}],
+ [qr{processed: 250/250}],

You can find this change in attached patch.  Then, I ran the make
check in src/bin/pgbench multiple times using test_conc_insert.sh.
You can vary the number of times the test should run, if you are not
able to reproduce it with this.

The attached patch (clear_local_map_if_exists_1.patch) atop the main
patch fixes the issue for me.  Kindly verify the same.


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


change_pgbench_test_1.patch
Description: Binary data
for ((i = 1 ; i < 25; i++))
do
	make check
done;


clear_local_map_if_exists_1.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-28 Thread Amit Kapila
On Tue, Jan 29, 2019 at 12:37 AM John Naylor
 wrote:
>
> On Mon, Jan 28, 2019 at 12:10 PM Amit Kapila  wrote:
>
> > 2.
> > @@ -15,13 +15,9 @@
> >  SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS 
> > main_100;
> >  ERROR:  block number 100 is out of range for relation "test_rel_forks"
> >  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
> > - fsm_0
> > 
> > -  8192
> > -(1 row)
> > -
> > +ERROR:  could not open file "base/50769/50798_fsm": No such file or 
> > directory
> >  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> > -ERROR:  block number 10 is out of range for relation "test_rel_forks"
> > +ERROR:  could not open file "base/50769/50798_fsm": No such file or 
> > directory
> >
> > This indicates that even though the Vacuum is executed, but the FSM
> > doesn't get created.  This could be due to different BLCKSZ, but the
> > failed machines don't seem to have a non-default value of it.  I am
> > not sure why this could happen, maybe we need to check once in the
> > failed regression database to see the size of relation?
>
> I'm also having a hard time imagining why this failed. Just in case,
> we could return ctid in a plpgsql loop and stop as soon as we see the
> 5th block. I've done that for some tests during development and is a
> safer method anyway.
>

I think we can devise some concrete way, but it is better first we try
to understand why it failed, otherwise there is always a chance that
we will repeat the mistake in some other case.  I think we have no
other choice, but to request the buildfarm owners to either give us
the access to see what happens or help us in investigating the
problem. The four buildfarms where it failed were lapwing, locust,
dromedary, prairiedog.   Among these, the owner of last two is Tom
Lane and others I don't recognize.  Tom, Andrew, can you help us in
getting the access of one of those four?  Yet another alternative is
the owner can apply the patch attached (this is same what got
committed) or reset to commit ac88d2962a and execute below statements
and share the results:

CREATE EXTENSION pageinspect;

CREATE TABLE test_rel_forks (a int);
INSERT INTO test_rel_forks SELECT i from generate_series(1,1000) i;
VACUUM test_rel_forks;
SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;

SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;

SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;

If the above statements give error: "ERROR:  could not open file ...", then run:
Analyze test_rel_forks;
Select oid, relname, relpages, reltuples from pg_class where relname
like 'test%';

The result of the above tests will tell us whether there are 5 pages
in the table or not.  If the table contains 5 pages and throws an
error, then there is some bug in our code, otherwise, there is
something specific to those systems where the above insert doesn't
result in 5 pages.

> > I think here you need to clear the map if it exists or clear it
> > unconditionally, the earlier one would be better.
>
> Ok, maybe all callers should call it unconditonally, but within the
> function, check "if (FSM_LOCAL_MAP_EXISTS)"?
>

Sounds sensible.  I think we should try to reproduce these failures,
for ex. for pgbench failure, we can try the same test with more
clients.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v18-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-28 Thread John Naylor
On Mon, Jan 28, 2019 at 12:10 PM Amit Kapila  wrote:
>
> On Mon, Jan 28, 2019 at 10:03 AM John Naylor
>  wrote:
> >
> 1.
> @@ -26,7 +26,7 @@
>  pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
>   heap_size | fsm_size
>  ---+--
> - 24576 |0
> + 32768 |0
>  (1 row)
>
>  -- Extend table with enough blocks to exceed the FSM threshold
> @@ -56,7 +56,7 @@
>  SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
>   fsm_size
>  --
> -16384
> +24576
>  (1 row)
>
>
> As discussed on another thread, this seems to be due to the reason
> that a parallel auto-analyze doesn't allow vacuum to remove dead-row
> versions.  To fix this, I think we should avoid having a dependency on
> vacuum to remove dead rows.

Ok, to make the first test here more reliable I will try Andrew's idea
to use fillfactor to save free space. As I said earlier, I think that
second test isn't helpful and can be dropped.

> 2.
> @@ -15,13 +15,9 @@
>  SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
>  ERROR:  block number 100 is out of range for relation "test_rel_forks"
>  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
> - fsm_0
> 
> -  8192
> -(1 row)
> -
> +ERROR:  could not open file "base/50769/50798_fsm": No such file or directory
>  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> -ERROR:  block number 10 is out of range for relation "test_rel_forks"
> +ERROR:  could not open file "base/50769/50798_fsm": No such file or directory
>
> This indicates that even though the Vacuum is executed, but the FSM
> doesn't get created.  This could be due to different BLCKSZ, but the
> failed machines don't seem to have a non-default value of it.  I am
> not sure why this could happen, maybe we need to check once in the
> failed regression database to see the size of relation?

I'm also having a hard time imagining why this failed. Just in case,
we could return ctid in a plpgsql loop and stop as soon as we see the
5th block. I've done that for some tests during development and is a
safer method anyway.

> 
>
> @@ -377,20 +383,9 @@ RelationGetBufferForTuple(Relation relation, Size len,
> +
> + /*
> + * In case we used an in-memory map of available blocks, reset it
> + * for next use.
> + */
> + if (targetBlock < HEAP_FSM_CREATION_THRESHOLD)
> + FSMClearLocalMap();
> +
>
> I think here you need to clear the map if it exists or clear it
> unconditionally, the earlier one would be better.

Ok, maybe all callers should call it unconditonally, but within the
function, check "if (FSM_LOCAL_MAP_EXISTS)"?

Thanks for investigating the failures -- I'm a bit pressed for time this week.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-28 Thread Andrew Gierth
> "Amit" == Amit Kapila  writes:

 Amit> All of these seems to run with fsync=off. Is it possible that
 Amit> vacuum has updated FSM, but the same is not synced to disk and
 Amit> when we try to read it, we didn't get the required page?

No.

fsync never affects what programs see while the system is running, only
what happens after an OS crash.

-- 
Andrew (irc:RhodiumToad)



Re: WIP: Avoid creation of the free space map for small tables

2019-01-28 Thread Amit Kapila
On Mon, Jan 28, 2019 at 4:40 PM Amit Kapila  wrote:
>
> On Mon, Jan 28, 2019 at 10:03 AM John Naylor
>  wrote:
> >
> > On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila 
wrote:
> > > There are a few buildfarm failures due to this commit, see my email on
> > > pgsql-committers.  If you have time, you can also once look into
> > > those.
> >
> > I didn't see anything in common with the configs of the failed
> > members. None have a non-default BLCKSZ that I can see.
> >
>
> I have done an analysis of the different failures on buildfarm.
>
>
> 2.
> @@ -15,13 +15,9 @@
>  SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS
main_100;
>  ERROR:  block number 100 is out of range for relation "test_rel_forks"
>  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
> - fsm_0
> 
> -  8192
> -(1 row)
> -
> +ERROR:  could not open file "base/50769/50798_fsm": No such file or
directory
>  SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> -ERROR:  block number 10 is out of range for relation "test_rel_forks"
> +ERROR:  could not open file "base/50769/50798_fsm": No such file or
directory
>
> This indicates that even though the Vacuum is executed, but the FSM
> doesn't get created.  This could be due to different BLCKSZ, but the
> failed machines don't seem to have a non-default value of it.  I am
> not sure why this could happen, maybe we need to check once in the
> failed regression database to see the size of relation?
>

This symptom is shown in the below buildfarm critters:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=prairiedog=2019-01-28%2005%3A05%3A22
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=lapwing=2019-01-28%2003%3A20%3A02
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust=2019-01-28%2003%3A13%3A47
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dromedary=2019-01-28%2003%3A07%3A39

All of these seems to run with fsync=off.  Is it possible that vacuum has
updated FSM, but the same is not synced to disk and when we try to read it,
we didn't get the required page?  This is just a guess.

I have checked all the buildfarm failures and I see only 4 symptoms for
which I have sent some initial analysis.  I think you can also once
cross-verify the same.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: WIP: Avoid creation of the free space map for small tables

2019-01-28 Thread Amit Kapila
On Mon, Jan 28, 2019 at 10:03 AM John Naylor
 wrote:
>
> On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila  wrote:
> > There are a few buildfarm failures due to this commit, see my email on
> > pgsql-committers.  If you have time, you can also once look into
> > those.
>
> I didn't see anything in common with the configs of the failed
> members. None have a non-default BLCKSZ that I can see.
>

I have done an analysis of the different failures on buildfarm.

1.
@@ -26,7 +26,7 @@
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size
 ---+--
- 24576 |0
+ 32768 |0
 (1 row)

 -- Extend table with enough blocks to exceed the FSM threshold
@@ -56,7 +56,7 @@
 SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  fsm_size
 --
-16384
+24576
 (1 row)


As discussed on another thread, this seems to be due to the reason
that a parallel auto-analyze doesn't allow vacuum to remove dead-row
versions.  To fix this, I think we should avoid having a dependency on
vacuum to remove dead rows.

2.
@@ -15,13 +15,9 @@
 SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
 ERROR:  block number 100 is out of range for relation "test_rel_forks"
 SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
- fsm_0

-  8192
-(1 row)
-
+ERROR:  could not open file "base/50769/50798_fsm": No such file or directory
 SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
-ERROR:  block number 10 is out of range for relation "test_rel_forks"
+ERROR:  could not open file "base/50769/50798_fsm": No such file or directory

This indicates that even though the Vacuum is executed, but the FSM
doesn't get created.  This could be due to different BLCKSZ, but the
failed machines don't seem to have a non-default value of it.  I am
not sure why this could happen, maybe we need to check once in the
failed regression database to see the size of relation?

3.  Failure on 'mantid'
2019-01-28 00:13:55.191 EST [123979] 001_pgbench_with_server.pl LOG:
statement: CREATE UNLOGGED TABLE insert_tbl (id
serial primary key);
2019-01-28 00:13:55.218 EST [123982] 001_pgbench_with_server.pl LOG:
execute P0_0: INSERT INTO insert_tbl SELECT
FROM generate_series(1,1000);
2019-01-28 00:13:55.219 EST [123983] 001_pgbench_with_server.pl LOG:
execute P0_0: INSERT INTO insert_tbl SELECT
FROM generate_series(1,1000);
2019-01-28 00:13:55.220 EST [123984] 001_pgbench_with_server.pl LOG:
execute P0_0: INSERT INTO insert_tbl SELECT
FROM generate_series(1,1000);
..
..
TRAP: FailedAssertion("!((rel->rd_rel->relkind == 'r' ||
rel->rd_rel->relkind == 't') && fsm_local_map.map[oldPage] == 0x01)",
File: "freespace.c", Line: 223)

I think this can happen if we forget to clear the local map after we
get the block with space in function RelationGetBufferForTuple().  I
see the race condition in the code where that can happen.  Say, we
tried all the blocks in the local map and then tried to extend the
relation and we didn't get ConditionalLockRelationForExtension, in the
meantime, another backend has extended the relation and updated the
FSM (via RelationAddExtraBlocks).  Now, when the backend that didn't
get the extension lock will get the target block from FSM which will
be greater than HEAP_FSM_CREATION_THRESHOLD.  Next, it will find that
the block can be used to insert a new row and return the buffer, but
won't clear the local map due to below condition in code:

@@ -377,20 +383,9 @@ RelationGetBufferForTuple(Relation relation, Size len,
+
+ /*
+ * In case we used an in-memory map of available blocks, reset it
+ * for next use.
+ */
+ if (targetBlock < HEAP_FSM_CREATION_THRESHOLD)
+ FSMClearLocalMap();
+

I think here you need to clear the map if it exists or clear it
unconditionally, the earlier one would be better.

This test gets executed concurrently by 5 clients, so it can hit the
above race condition.

4.  Failure on jacana:
--- 
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/test/regress/expected/box.out
2018-09-26
17:53:33 -0400
+++ 
c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/src/test/regress/results/box.out
2019-01-27 23:14:35
-0500
@@ -252,332 +252,7 @@
 ('(0,100)(0,infinity)'),
 ('(-infinity,0)(0,infinity)'),
 ('(-infinity,-infinity)(infinity,infinity)');
-SET enable_seqscan = false;
-SELECT * FROM box_temp WHERE f1 << '(10,20),(30,40)';
..
..
TRAP: FailedAssertion("!(!(fsm_local_map.nblocks > 0))", File:
"c:/mingw/msys/1.0/home/pgrunner/bf/root/HEAD/pgsql.build/../pgsql/src/backend/storage/freespace/freespace.c",
Line:
1118)
..
2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:4] LOG:  server process
(PID 14388) exited with exit code 3
2019-01-27 23:14:35.495 EST [5c4e81a0.2e28:5] DETAIL:  Failed process
was running: INSERT INTO box_temp
VALUES (NULL),

I think the reason for this failure is same as previous (as mentioned
in point-3), but this can happen in a different way.  Say, we 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-27 Thread Amit Kapila
On Mon, Jan 28, 2019 at 10:03 AM John Naylor
 wrote:
>
> On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila  wrote:
> > There are a few buildfarm failures due to this commit, see my email on
> > pgsql-committers.  If you have time, you can also once look into
> > those.
>
> I didn't see anything in common with the configs of the failed
> members. None have a non-default BLCKSZ that I can see.
>
> Looking at this typical example from woodlouse:
>
> == pgsql.build/src/test/regress/regression.diffs
> ==
> --- C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/expected/fsm.out
> 2019-01-28 04:43:09.031456700 +0100
> +++ C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/results/fsm.out
> 2019-01-28 05:06:20.351100400 +0100
> @@ -26,7 +26,7 @@
>  pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
>   heap_size | fsm_size
>  ---+--
> - 24576 |0
> + 32768 |0
>  (1 row)
>
> ***It seems like the relation extended when the new records should
> have gone into block 0.
>
>  -- Extend table with enough blocks to exceed the FSM threshold
> @@ -56,7 +56,7 @@
>  SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
>   fsm_size
>  --
> -16384
> +24576
>  (1 row)
>
> ***And here it seems vacuum didn't truncate the FSM. I wonder if the
> heap didn't get truncated either.
>

Yeah, it seems to me that vacuum is not able to truncate the relation,
see my latest reply on another thread [1].

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JntHd7X6dLJVPGYV917HejjhbMKXn9m_RnnCE162LbLA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-27 Thread John Naylor
On Mon, Jan 28, 2019 at 4:53 AM Amit Kapila  wrote:
> There are a few buildfarm failures due to this commit, see my email on
> pgsql-committers.  If you have time, you can also once look into
> those.

I didn't see anything in common with the configs of the failed
members. None have a non-default BLCKSZ that I can see.

Looking at this typical example from woodlouse:

== pgsql.build/src/test/regress/regression.diffs
==
--- C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/expected/fsm.out
2019-01-28 04:43:09.031456700 +0100
+++ C:/buildfarm/buildenv/HEAD/pgsql.build/src/test/regress/results/fsm.out
2019-01-28 05:06:20.351100400 +0100
@@ -26,7 +26,7 @@
 pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  heap_size | fsm_size
 ---+--
- 24576 |0
+ 32768 |0
 (1 row)

***It seems like the relation extended when the new records should
have gone into block 0.

 -- Extend table with enough blocks to exceed the FSM threshold
@@ -56,7 +56,7 @@
 SELECT pg_relation_size('fsm_check_size', 'fsm') AS fsm_size;
  fsm_size
 --
-16384
+24576
 (1 row)

***And here it seems vacuum didn't truncate the FSM. I wonder if the
heap didn't get truncated either.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-27 Thread Amit Kapila
On Mon, Jan 28, 2019 at 9:16 AM John Naylor  wrote:
>
> On Mon, Jan 28, 2019 at 3:53 AM Amit Kapila  wrote:
> > On Thu, Jan 24, 2019 at 9:14 AM Amit Kapila  wrote:
> > > Sure, apart from this I have run pgindent on the patches and make some
> > > changes accordingly.  Latest patches attached (only second patch has
> > > some changes).  I will take one more pass on Monday morning (28th Jan)
> > > and will commit unless you or others see any problem.
> >
> > Pushed these two patches.
>
> Thank you for your input and detailed review! Thank you Mithun for testing!
>

There are a few buildfarm failures due to this commit, see my email on
pgsql-committers.  If you have time, you can also once look into
those.



-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-27 Thread John Naylor
On Mon, Jan 28, 2019 at 3:53 AM Amit Kapila  wrote:
> On Thu, Jan 24, 2019 at 9:14 AM Amit Kapila  wrote:
> > Sure, apart from this I have run pgindent on the patches and make some
> > changes accordingly.  Latest patches attached (only second patch has
> > some changes).  I will take one more pass on Monday morning (28th Jan)
> > and will commit unless you or others see any problem.
>
> Pushed these two patches.

Thank you for your input and detailed review! Thank you Mithun for testing!

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-27 Thread Amit Kapila
On Thu, Jan 24, 2019 at 9:14 AM Amit Kapila  wrote:
>
> Sure, apart from this I have run pgindent on the patches and make some
> changes accordingly.  Latest patches attached (only second patch has
> some changes).  I will take one more pass on Monday morning (28th Jan)
> and will commit unless you or others see any problem.
>

Pushed these two patches.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-27 Thread John Naylor
On Sat, Jan 26, 2019 at 2:14 PM Amit Kapila  wrote:
>
> On Sat, Jan 26, 2019 at 5:05 AM John Naylor  
> wrote:
> >
> > So, in v19 we check pg_class.relpages and if it's
> > a heap and less than or equal the threshold we call stat on the 0th
> > segment to verify.
> >
>
> Okay, but the way logic is implemented appears clumsy to me.

> The function transfer_relfile has no clue about skipping of FSM stuff,
> but it contains comments about it.

Yeah, I wasn't entirely happy with how that turned out.

> I think there is some value in using the information from
> this function to skip fsm files, but the code doesn't appear to fit
> well, how about moving this check to new function
> new_cluster_needs_fsm()?

For v21, new_cluster_needs_fsm() has all responsibility for obtaining
the info it needs. I think this is much cleaner, but there is a small
bit of code duplication since it now has to form the file name. One
thing we could do is form the the base old/new file names in
transfer_single_new_db() and pass those to transfer_relfile(), which
will only add suffixes and segment numbers. We could then pass the
base old file name to new_cluster_needs_fsm() and use it as is. Not
sure if that's worthwhile, though.

> The order in which relkind and relpages is used in the above code is
> different from the order in which it is mentioned in the query, it
> won't matter, but keeping in order will make look code consistent.  I
> have made this and some more minor code adjustments in the attached
> patch.  If you like those, you can include them in the next version of
> your patch.

Okay, done.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 2b70b65f43c05234b81e939b0ed49e34b97d5667 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sun, 27 Jan 2019 21:42:07 +0100
Subject: [PATCH v21 3/3] During pg_upgrade, conditionally skip transfer of
 FSMs.

If a heap on the old cluster has 4 pages or fewer, and the old cluster
was PG v11 or earlier, don't copy or link the FSM. This will shrink
space usage for installations with large numbers of small tables.
---
 src/bin/pg_upgrade/info.c| 16 +++--
 src/bin/pg_upgrade/pg_upgrade.h  |  6 
 src/bin/pg_upgrade/relfilenode.c | 58 ++--
 3 files changed, 76 insertions(+), 4 deletions(-)

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 2f925f086c..902bfc647e 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -200,6 +200,8 @@ create_rel_filename_map(const char *old_data, const char *new_data,
 
 	map->old_db_oid = old_db->db_oid;
 	map->new_db_oid = new_db->db_oid;
+	map->relpages = old_rel->relpages;
+	map->relkind = old_rel->relkind;
 
 	/*
 	 * old_relfilenode might differ from pg_class.oid (and hence
@@ -418,6 +420,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	char	   *nspname = NULL;
 	char	   *relname = NULL;
 	char	   *tablespace = NULL;
+	char	   *relkind = NULL;
 	int			i_spclocation,
 i_nspname,
 i_relname,
@@ -425,7 +428,9 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 i_indtable,
 i_toastheap,
 i_relfilenode,
-i_reltablespace;
+i_reltablespace,
+i_relpages,
+i_relkind;
 	char		query[QUERY_ALLOC];
 	char	   *last_namespace = NULL,
 			   *last_tablespace = NULL;
@@ -494,7 +499,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	 */
 	snprintf(query + strlen(query), sizeof(query) - strlen(query),
 			 "SELECT all_rels.*, n.nspname, c.relname, "
-			 "  c.relfilenode, c.reltablespace, %s "
+			 "  c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s "
 			 "FROM (SELECT * FROM regular_heap "
 			 "  UNION ALL "
 			 "  SELECT * FROM toast_heap "
@@ -525,6 +530,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	i_relname = PQfnumber(res, "relname");
 	i_relfilenode = PQfnumber(res, "relfilenode");
 	i_reltablespace = PQfnumber(res, "reltablespace");
+	i_relpages = PQfnumber(res, "relpages");
+	i_relkind = PQfnumber(res, "relkind");
 	i_spclocation = PQfnumber(res, "spclocation");
 
 	for (relnum = 0; relnum < ntups; relnum++)
@@ -556,6 +563,11 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 		curr->relname = pg_strdup(relname);
 
 		curr->relfilenode = atooid(PQgetvalue(res, relnum, i_relfilenode));
+		curr->relpages = atoi(PQgetvalue(res, relnum, i_relpages));
+
+		relkind = PQgetvalue(res, relnum, i_relkind);
+		curr->relkind = relkind[0];
+
 		curr->tblsp_alloc = false;
 
 		/* Is the tablespace oid non-default? */
diff --git a/src/bin/pg_upgrade/pg_upgrade.h b/src/bin/pg_upgrade/pg_upgrade.h
index 2f67eee22b..baeb8ff0f8 100644
--- a/src/bin/pg_upgrade/pg_upgrade.h
+++ b/src/bin/pg_upgrade/pg_upgrade.h
@@ -147,6 +147,8 @@ typedef struct
 	char	   *tablespace;		/* tablespace path; "" for cluster default */
 	bool		nsp_alloc;		/* should nspname be freed? */
 	bool		tblsp_alloc;	/* should tablespace be freed? */
+	

Re: WIP: Avoid creation of the free space map for small tables

2019-01-26 Thread Amit Kapila
On Sat, Jan 26, 2019 at 5:05 AM John Naylor  wrote:
>
> On Thu, Jan 24, 2019 at 5:19 AM Amit Kapila  wrote:
> >
> Performance testing is probably a good idea anyway, but I went ahead
> and implemented your next idea:
>
> > The other alternative is we can fetch pg_class.relpages and rely on
> > that to take this decision, but again if that is not updated, we might
> > take the wrong decision.
>
> We can think of it this way: Which is worse,
> 1. Transferring a FSM we don't need, or
> 2. Skipping a FSM we need
>
> I'd say #2 is worse.
>

Agreed.

> So, in v19 we check pg_class.relpages and if it's
> a heap and less than or equal the threshold we call stat on the 0th
> segment to verify.
>

Okay, but the way logic is implemented appears clumsy to me.

@@ -234,16 +243,40 @@ transfer_relfile(FileNameMap *map, const char
*type_suffix, bool vm_must_add_fro
  {
  /* File does not exist?  That's OK, just return */
  if (errno == ENOENT)
- return;
+ return first_seg_size;
  else
- pg_fatal("error while checking for file existence \"%s.%s\" (\"%s\"
to \"%s\"): %s\n",
- map->nspname, map->relname, old_file, new_file,
- strerror(errno));
+ goto fatal;
  }

  /* If file is empty, just return */
  if (statbuf.st_size == 0)
- return;
+ return first_seg_size;
+ }
+
+ /* Save size of the first segment of the main fork. */
+
+ else if (map->relpages <= HEAP_FSM_CREATION_THRESHOLD &&
+ (map->relkind == RELKIND_RELATION ||
+   map->relkind == RELKIND_TOASTVALUE))
+ {
+ /*
+ * In this case, if pg_class.relpages is wrong, it's possible
+ * that a FSM will be skipped when we actually need it.  To guard
+ * against this, we verify the size of the first segment.
+ */
+ if (stat(old_file, ) != 0)
+ goto fatal;
+ else
+ first_seg_size = statbuf.st_size;
+ }
+ else
+ {
+ /*
+ * For indexes etc., we don't care if pg_class.relpages is wrong,
+ * since we always transfer their FSMs.  For heaps, we might
+ * transfer a FSM when we don't need to, but this is harmless.
+ */
+ first_seg_size = Min(map->relpages, RELSEG_SIZE) * BLCKSZ;
  }

The function transfer_relfile has no clue about skipping of FSM stuff,
but it contains comments about it.  The check "if (map->relpages <=
HEAP_FSM_CREATION_THRESHOLD ..." will needlessly be executed for each
segment.  I think there is some value in using the information from
this function to skip fsm files, but the code doesn't appear to fit
well, how about moving this check to new function
new_cluster_needs_fsm()?


> In the common case, the cost of the stat call is
> offset by not linking the FSM.
>

Agreed.

> Despite needing another pg_class field,
> I think this code is actually easier to read than my earlier versions.
>

Yeah, the code appears cleaner from the last version, but I think we
can do more in that regards.

One more minor comment:
snprintf(query + strlen(query), sizeof(query) - strlen(query),
  "SELECT all_rels.*, n.nspname, c.relname, "
- "  c.relfilenode, c.reltablespace, %s "
+ "  c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s "
  "FROM (SELECT * FROM regular_heap "
  "  UNION ALL "
  "  SELECT * FROM toast_heap "
@@ -525,6 +530,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
  i_relname = PQfnumber(res, "relname");
  i_relfilenode = PQfnumber(res, "relfilenode");
  i_reltablespace = PQfnumber(res, "reltablespace");
+ i_relpages = PQfnumber(res, "relpages");
+ i_relkind = PQfnumber(res, "relkind");
  i_spclocation = PQfnumber(res, "spclocation");

The order in which relkind and relpages is used in the above code is
different from the order in which it is mentioned in the query, it
won't matter, but keeping in order will make look code consistent.  I
have made this and some more minor code adjustments in the attached
patch.  If you like those, you can include them in the next version of
your patch.


--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v20-0003-During-pg_upgrade-conditionally-skip-transfer-of-FSM.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-25 Thread John Naylor
On Thu, Jan 24, 2019 at 9:50 PM Amit Kapila  wrote:
>
> On Fri, Jan 25, 2019 at 1:03 AM John Naylor  
> wrote:
> >
> > On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila  
> > wrote:
> > > I think what doc means to say is
> > > that it copies any unlinked files present in primary's new cluster
> > > (which in your case will be data2).
> >
> > In that case, I'm still confused why that doc says, "Unfortunately,
> > rsync needlessly copies files associated with temporary and unlogged
> > tables because these files don't normally exist on standby servers."
> > I fail to see why the primary's new cluster would have these if they
> > weren't linked.
> >
>
> Why unlogged files won't be in primary's new cluster?  After the
> upgrade, they should be present in a new cluster if they were present
> in the old cluster.

I assume they would be linked, however (I haven't checked this). I did
think rewritten VM files would fall under this, but I was confused
about unlogged files.

> > And in the case we're discussing here, the skipped
> > FSMs won't be on data2, so won't end up in standby/data2.
> >
>
> Right.  I think we are safe with respect to rsync because I have seen
> that we do rewrite the vm files in link mode and rsync will copy them
> from primary's new cluster.

Okay.

> I think you can try to address my other comments on your pg_upgrade
> patch.   Once we agree on the code, we need to test below scenarios:
> (a) upgrade from all supported versions to the latest version
> (b) upgrade standby with and without using rsync.

Sounds good.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-25 Thread John Naylor
On Thu, Jan 24, 2019 at 5:19 AM Amit Kapila  wrote:
>
> 1.
> + if ((maps[mapnum].relkind != RELKIND_RELATION &&
> + maps[mapnum].relkind != RELKIND_TOASTVALUE) ||
> + first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ ||
> + GET_MAJOR_VERSION(new_cluster.major_version) <= 1100)
> + (void) transfer_relfile([mapnum], "_fsm", vm_must_add_frozenbit);
>
> I think this check will needlessly be performed for future versions as
> well, say when wants to upgrade from PG12 to PG13.  That might not
> create any problem, but let's try to be more precise.  Can you try to
> rewrite this check?  You might want to encapsulate it inside a
> function.  I have thought of doing something similar to what we do for
> vm, see checks relate to VISIBILITY_MAP_FROZEN_BIT_CAT_VER, but I
> guess for this patch it is not important to check catalog version as
> even if someone tries to upgrade to the same version.

Agreed, done for v19 (I've only attached the pg_upgrade patch).

> 2.
> transfer_relfile()
> {
> ..
> - /* Is it an extent, fsm, or vm file? */
> - if (type_suffix[0] != '\0' || segno != 0)
> + /* Did file open fail? */
> + if (stat(old_file, ) != 0)
> ..
> }
>
> So from now onwards, we will call stat for even 0th segment which
> means there is one additional system call for each relation, not sure
> if that matters, but I think there is no harm in once testing with a
> large number of relations say 10K to 50K relations which have FSM.

Performance testing is probably a good idea anyway, but I went ahead
and implemented your next idea:

> The other alternative is we can fetch pg_class.relpages and rely on
> that to take this decision, but again if that is not updated, we might
> take the wrong decision.

We can think of it this way: Which is worse,
1. Transferring a FSM we don't need, or
2. Skipping a FSM we need

I'd say #2 is worse. So, in v19 we check pg_class.relpages and if it's
a heap and less than or equal the threshold we call stat on the 0th
segment to verify. In the common case, the cost of the stat call is
offset by not linking the FSM. Despite needing another pg_class field,
I think this code is actually easier to read than my earlier versions.

> 3.
> -static void
> +static Size
>  transfer_relfile(FileNameMap *map, const char *type_suffix, bool
> vm_must_add_frozenbit)
>
> If we decide to go with the approach proposed by you, we should add
> some comments atop this function for return value change?

Done, as well as other comment edits.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 277d969c31349b22e2c7726935d681e72aacaece Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 25 Jan 2019 17:18:28 -0500
Subject: [PATCH v19 3/3] During pg_upgrade, conditionally skip transfer of
 FSMs.

If a heap on the old cluster has 4 pages or fewer, don't copy or
link the FSM. This will reduce space usage for installations with
large numbers of small tables.
---
 src/bin/pg_upgrade/info.c| 18 ++-
 src/bin/pg_upgrade/pg_upgrade.h  |  6 +++
 src/bin/pg_upgrade/relfilenode.c | 84 +++-
 3 files changed, 93 insertions(+), 15 deletions(-)

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 2f925f086c..55d4911d10 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -200,6 +200,8 @@ create_rel_filename_map(const char *old_data, const char *new_data,
 
 	map->old_db_oid = old_db->db_oid;
 	map->new_db_oid = new_db->db_oid;
+	map->relpages = old_rel->relpages;
+	map->relkind = old_rel->relkind;
 
 	/*
 	 * old_relfilenode might differ from pg_class.oid (and hence
@@ -415,9 +417,11 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	int			ntups;
 	int			relnum;
 	int			num_rels = 0;
+	int			relpages;
 	char	   *nspname = NULL;
 	char	   *relname = NULL;
 	char	   *tablespace = NULL;
+	char	   *relkind = NULL;
 	int			i_spclocation,
 i_nspname,
 i_relname,
@@ -425,7 +429,9 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 i_indtable,
 i_toastheap,
 i_relfilenode,
-i_reltablespace;
+i_reltablespace,
+i_relpages,
+i_relkind;
 	char		query[QUERY_ALLOC];
 	char	   *last_namespace = NULL,
 			   *last_tablespace = NULL;
@@ -494,7 +500,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	 */
 	snprintf(query + strlen(query), sizeof(query) - strlen(query),
 			 "SELECT all_rels.*, n.nspname, c.relname, "
-			 "  c.relfilenode, c.reltablespace, %s "
+			 "  c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s "
 			 "FROM (SELECT * FROM regular_heap "
 			 "  UNION ALL "
 			 "  SELECT * FROM toast_heap "
@@ -526,6 +532,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	i_relfilenode = PQfnumber(res, "relfilenode");
 	i_reltablespace = PQfnumber(res, "reltablespace");
 	i_spclocation = PQfnumber(res, "spclocation");
+	i_relpages = PQfnumber(res, "relpages");
+	i_relkind = PQfnumber(res, "relkind");
 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-24 Thread Amit Kapila
On Fri, Jan 25, 2019 at 1:03 AM John Naylor  wrote:
>
> On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila  wrote:
> > I think what doc means to say is
> > that it copies any unlinked files present in primary's new cluster
> > (which in your case will be data2).
>
> In that case, I'm still confused why that doc says, "Unfortunately,
> rsync needlessly copies files associated with temporary and unlogged
> tables because these files don't normally exist on standby servers."
> I fail to see why the primary's new cluster would have these if they
> weren't linked.
>

Why unlogged files won't be in primary's new cluster?  After the
upgrade, they should be present in a new cluster if they were present
in the old cluster.

> And in the case we're discussing here, the skipped
> FSMs won't be on data2, so won't end up in standby/data2.
>

Right.  I think we are safe with respect to rsync because I have seen
that we do rewrite the vm files in link mode and rsync will copy them
from primary's new cluster.

I think you can try to address my other comments on your pg_upgrade
patch.   Once we agree on the code, we need to test below scenarios:
(a) upgrade from all supported versions to the latest version
(b) upgrade standby with and without using rsync.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-24 Thread John Naylor
On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila  wrote:
>
> On Thu, Jan 24, 2019 at 3:39 AM John Naylor  
> wrote:
> > mkdir -p data1 data2 standby
> >
> > echo 'heap' > data1/foo
> > echo 'fsm' > data1/foo_fsm
> >
> > # simulate streaming replication
> > rsync --archive data1 standby
> >
> > # simulate pg_upgrade, skipping FSM
> > ln data1/foo -t data2/
> >
> > rsync --archive --delete --hard-links --size-only --no-inc-recursive
> > data1 data2 standby
> >
> > # result
> > ls standby/data1
> > ls standby/data2
> >
> >
> > The result is that foo_fsm is not copied to standby/data2, contrary to
> > what the docs above imply for other unlinked files. Can anyone shed
> > light on this?
> >
>
> Is foo_fsm present in standby/data1?

Yes it is.

> I think what doc means to say is
> that it copies any unlinked files present in primary's new cluster
> (which in your case will be data2).

In that case, I'm still confused why that doc says, "Unfortunately,
rsync needlessly copies files associated with temporary and unlogged
tables because these files don't normally exist on standby servers."
I fail to see why the primary's new cluster would have these if they
weren't linked. And in the case we're discussing here, the skipped
FSMs won't be on data2, so won't end up in standby/data2.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-24 Thread Amit Kapila
On Thu, Jan 24, 2019 at 9:46 AM Amit Kapila  wrote:
>
> On Thu, Jan 24, 2019 at 3:39 AM John Naylor  
> wrote:
> >

Few comments related to pg_upgrade patch:

1.
+ if ((maps[mapnum].relkind != RELKIND_RELATION &&
+ maps[mapnum].relkind != RELKIND_TOASTVALUE) ||
+ first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ ||
+ GET_MAJOR_VERSION(new_cluster.major_version) <= 1100)
+ (void) transfer_relfile([mapnum], "_fsm", vm_must_add_frozenbit);

I think this check will needlessly be performed for future versions as
well, say when wants to upgrade from PG12 to PG13.  That might not
create any problem, but let's try to be more precise.  Can you try to
rewrite this check?  You might want to encapsulate it inside a
function.  I have thought of doing something similar to what we do for
vm, see checks relate to VISIBILITY_MAP_FROZEN_BIT_CAT_VER, but I
guess for this patch it is not important to check catalog version as
even if someone tries to upgrade to the same version.

2.
transfer_relfile()
{
..
- /* Is it an extent, fsm, or vm file? */
- if (type_suffix[0] != '\0' || segno != 0)
+ /* Did file open fail? */
+ if (stat(old_file, ) != 0)
..
}

So from now onwards, we will call stat for even 0th segment which
means there is one additional system call for each relation, not sure
if that matters, but I think there is no harm in once testing with a
large number of relations say 10K to 50K relations which have FSM.
The other alternative is we can fetch pg_class.relpages and rely on
that to take this decision, but again if that is not updated, we might
take the wrong decision.

Anyone else has any thoughts on this point?

3.
-static void
+static Size
 transfer_relfile(FileNameMap *map, const char *type_suffix, bool
vm_must_add_frozenbit)

If we decide to go with the approach proposed by you, we should add
some comments atop this function for return value change?

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-23 Thread Amit Kapila
On Thu, Jan 24, 2019 at 3:39 AM John Naylor  wrote:
>
> On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila  wrote:
> > Also, another case to think in this regard is the upgrade for standby
> > servers, if you read below paragraph from the user manual [1], you
> > will see what I am worried about?
> >
> > "What this does is to record the links created by pg_upgrade's link
> > mode that connect files in the old and new clusters on the primary
> > server. It then finds matching files in the standby's old cluster and
> > creates links for them in the standby's new cluster. Files that were
> > not linked on the primary are copied from the primary to the standby.
> > (They are usually small.)"
> >
> > [1] - https://www.postgresql.org/docs/devel/pgupgrade.html
>
> I am still not able to get the upgraded standby to go into recovery
> without resorting to pg_basebackup, but in another attempt to
> investigate your question I tried the following (data1 = old cluster,
> data2 = new cluster):
>
>
> mkdir -p data1 data2 standby
>
> echo 'heap' > data1/foo
> echo 'fsm' > data1/foo_fsm
>
> # simulate streaming replication
> rsync --archive data1 standby
>
> # simulate pg_upgrade, skipping FSM
> ln data1/foo -t data2/
>
> rsync --archive --delete --hard-links --size-only --no-inc-recursive
> data1 data2 standby
>
> # result
> ls standby/data1
> ls standby/data2
>
>
> The result is that foo_fsm is not copied to standby/data2, contrary to
> what the docs above imply for other unlinked files. Can anyone shed
> light on this?
>

Is foo_fsm present in standby/data1?  I think what doc means to say is
that it copies any unlinked files present in primary's new cluster
(which in your case will be data2).

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-23 Thread Amit Kapila
On Wed, Jan 23, 2019 at 9:18 PM John Naylor  wrote:
>
> On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila  wrote:
> > I think the first two patches (a) removal of dead code in bootstrap
> > and (b) the core patch to avoid creation of FSM file for the small
> > table are good now.  I have prepared the patches along with commit
> > message.  There is no change except for some changes in README and
> > commit message of the second patch.  Kindly let me know what you think
> > about them?
>
> Good to hear! The additional language is fine. In "Once the FSM is
> created for heap", I would just change that to "...for a heap".
>

Sure, apart from this I have run pgindent on the patches and make some
changes accordingly.  Latest patches attached (only second patch has
some changes).  I will take one more pass on Monday morning (28th Jan)
and will commit unless you or others see any problem.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch
Description: Binary data


v18-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-23 Thread John Naylor
On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila  wrote:
> Also, another case to think in this regard is the upgrade for standby
> servers, if you read below paragraph from the user manual [1], you
> will see what I am worried about?
>
> "What this does is to record the links created by pg_upgrade's link
> mode that connect files in the old and new clusters on the primary
> server. It then finds matching files in the standby's old cluster and
> creates links for them in the standby's new cluster. Files that were
> not linked on the primary are copied from the primary to the standby.
> (They are usually small.)"
>
> [1] - https://www.postgresql.org/docs/devel/pgupgrade.html

I am still not able to get the upgraded standby to go into recovery
without resorting to pg_basebackup, but in another attempt to
investigate your question I tried the following (data1 = old cluster,
data2 = new cluster):


mkdir -p data1 data2 standby

echo 'heap' > data1/foo
echo 'fsm' > data1/foo_fsm

# simulate streaming replication
rsync --archive data1 standby

# simulate pg_upgrade, skipping FSM
ln data1/foo -t data2/

rsync --archive --delete --hard-links --size-only --no-inc-recursive
data1 data2 standby

# result
ls standby/data1
ls standby/data2


The result is that foo_fsm is not copied to standby/data2, contrary to
what the docs above imply for other unlinked files. Can anyone shed
light on this?

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-23 Thread John Naylor
On Wed, Jan 23, 2019 at 7:09 AM Amit Kapila  wrote:
> I think the first two patches (a) removal of dead code in bootstrap
> and (b) the core patch to avoid creation of FSM file for the small
> table are good now.  I have prepared the patches along with commit
> message.  There is no change except for some changes in README and
> commit message of the second patch.  Kindly let me know what you think
> about them?

Good to hear! The additional language is fine. In "Once the FSM is
created for heap", I would just change that to "...for a heap".

> I think these two patches can go even without the upgrade patch
> (during pg_upgrade, conditionally skip transfer of FSMs.) which is
> still under discussion.  However, I am not in a hurry if you or other
> thinks that upgrade patch must be committed along with the second
> patch.  I think the upgrade patch is generally going on track but
> might need some more review.

The pg_upgrade piece is a nice-to-have feature and not essential, so
can go in later. Additional review is also welcome.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-23 Thread Amit Kapila
On Sun, Jan 20, 2019 at 5:19 AM John Naylor  wrote:
>
> I have a test for in-range and out-of-range for each relation fork.
>

I think the first two patches (a) removal of dead code in bootstrap
and (b) the core patch to avoid creation of FSM file for the small
table are good now.  I have prepared the patches along with commit
message.  There is no change except for some changes in README and
commit message of the second patch.  Kindly let me know what you think
about them?

I think these two patches can go even without the upgrade patch
(during pg_upgrade, conditionally skip transfer of FSMs.) which is
still under discussion.  However, I am not in a hurry if you or other
thinks that upgrade patch must be committed along with the second
patch.  I think the upgrade patch is generally going on track but
might need some more review.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v02-0001-In-bootstrap-mode-don-t-allow-the-creation-of-files-.patch
Description: Binary data


v17-0002-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-21 Thread John Naylor
On Mon, Jan 21, 2019 at 6:32 AM Amit Kapila  wrote:
> So we won't allow transfer of FSM files if their size is below
> HEAP_FSM_CREATION_THRESHOLD.  What will be its behavior in link mode?
> It seems that the old files will remain there. Will it create any
> problem when we try to create the files via the new server, can you
> once test this case?

I tried upgrading in --link mode, and on the new cluster, enlarging
the table past the threshold causes a new FSM to be created as
expected.

> Also, another case to think in this regard is the upgrade for standby
> servers, if you read below paragraph from the user manual [1], you
> will see what I am worried about?
>
> "What this does is to record the links created by pg_upgrade's link
> mode that connect files in the old and new clusters on the primary
> server. It then finds matching files in the standby's old cluster and
> creates links for them in the standby's new cluster. Files that were
> not linked on the primary are copied from the primary to the standby.
> (They are usually small.)"
>
> [1] - https://www.postgresql.org/docs/devel/pgupgrade.html

Trying this, I ran into a couple problems. I'm probably doing
something wrong, but I can't help but think there's a pg_upgrade
bug/feature I'm unaware of:

I set up my test to have primary directory data1 and for the secondary
standby/data1. I instructed pg_upgrade to upgrade data1 into data1u,
and I tried the rsync recipe in the docs quoted above, and the
upgraded standby wouldn't go into recovery. While debugging that, I
found surprisingly that pg_upgrade also went further and upgraded
standby/data1 into standby/data1u. I tried deleting standby/data1u
before running the rsync command and still nothing. Because the
upgraded secondary is non-functional, I can't really answer your
question.

Not sure if this is normal, but the pg_upgraded new cluster no longer
had the replication slot. Re-adding it didn't allow my upgraded
secondary to go into recovery, either. (I made sure to copy the
recovery settings, so that can't be the problem)


--
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-21 Thread Amit Kapila
On Sun, Jan 20, 2019 at 5:19 AM John Naylor  wrote:
>

Review of v16-0002-During-pg_upgrade-conditionally-skip-transfer-of:

- * Copy/link any fsm and vm files, if they exist
+ *
Copy/link any fsm and vm files, if they exist and if they would
+ * be created in the
new cluster.
  */
- transfer_relfile([mapnum], "_fsm",
vm_must_add_frozenbit);
+ if ((maps[mapnum].relkind != RELKIND_RELATION &&
+
maps[mapnum].relkind != RELKIND_TOASTVALUE) ||
+
first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ ||
+ GET_MAJOR_VERSION
(new_cluster.major_version) <= 1100)
+ (void) transfer_relfile([mapnum],
"_fsm", vm_must_add_frozenbit);

So we won't allow transfer of FSM files if their size is below
HEAP_FSM_CREATION_THRESHOLD.  What will be its behavior in link mode?
It seems that the old files will remain there. Will it create any
problem when we try to create the files via the new server, can you
once test this case?

Also, another case to think in this regard is the upgrade for standby
servers, if you read below paragraph from the user manual [1], you
will see what I am worried about?

"What this does is to record the links created by pg_upgrade's link
mode that connect files in the old and new clusters on the primary
server. It then finds matching files in the standby's old cluster and
creates links for them in the standby's new cluster. Files that were
not linked on the primary are copied from the primary to the standby.
(They are usually small.)"

[1] - https://www.postgresql.org/docs/devel/pgupgrade.html

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: WIP: Avoid creation of the free space map for small tables

2019-01-19 Thread John Naylor
On Sat, Jan 19, 2019 at 8:06 AM Amit Kapila  wrote:
>
> On Thu, Jan 17, 2019 at 11:13 PM John Naylor
> Few more comments:
> 1.
> I think we should not allow to create FSM for toast tables as well
> till there size reaches HEAP_FSM_CREATION_THRESHOLD.  If you try below
> test, you can see that FSM will be created for the toast table even if
> the size of toast relation is 1 page.
...
> I have fixed this in the attached patch, kindly verify it once and see
> if you can add the test for same as well.

Works for me. For v16, I've added and tested similar logic to
pg_upgrade and verified that toast tables work the same as normal
tables in recovery. I used a slightly different method to generate the
long random string to avoid creating a function. Also, some cosmetic
adjustments -- I changed the regression test to use 'i' instead of 'g'
to match the use of generate_series in most other tests, and made
capitalization more consistent.

> 2.
> -CREATE TABLE test1 (a int, b int);
> -INSERT INTO test1 VALUES (16777217, 131584);
> +CREATE TABLE test_rel_forks (a
> int);
> +-- Make sure there are enough blocks in the heap for the FSM to be created.
> +INSERT INTO test_rel_forks SELECT g
> from generate_series(1,1) g;
>
> -VACUUM test1;  -- set up FSM
> +-- set up FSM and VM
> +VACUUM test_rel_forks;
>
> This test will create 45 pages instead of 1.  I know that to create
> FSM, we now need more than 4 pages, but 45 seems to be on the higher
> side.  I think we should not unnecessarily populate more data if there
> is no particular need for it, let's restrict the number of pages to 5
> if possible.

Good idea, done here and in the fsm regression test.

> 3.
> -SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
> - fsm_1
> 
> -  8192
> -(1 row)
> -
> -SELECT octet_length
> (get_raw_page('test1', 'vm', 0)) AS vm_0;
> +SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
> +ERROR:  block number 10 is out of range for relation "test_rel_forks"
>
> Why have you changed the test definition here?  Previously test checks
> the existing FSM page, but now it tries to access out of range page.

The patch is hard to read here, but I still have a test for the
existing FSM page:

-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0
 ---
   8192
 (1 row)

I have a test for in-range and out-of-range for each relation fork.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From ecf0cecee8b0adf8529ddb1fbdabd041172de5e3 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 19 Jan 2019 18:39:39 -0500
Subject: [PATCH v16 1/2] Avoid creation of the free space map for small heap
 relations.

Previously, all heaps had FSMs. For very small tables, this means that
the FSM took up more space than the heap did. This is wasteful, so now
we refrain from creating the FSM for heaps with 4 pages or fewer. If the
last known target block has insufficient space, we still try to insert
into some other page before before giving up and extending the relation,
since doing otherwise leads to table bloat. Testing showed that trying
every page penalized performance slightly, so we compromise and try
every other page. This way, we visit at most two pages. Any pages with
wasted free space become visible at next relation extension, so we still
control table bloat. As a bonus, directly attempting one or two pages
can even be faster than consulting the FSM would have been.
---
 contrib/pageinspect/expected/page.out |  77 +++---
 contrib/pageinspect/sql/page.sql  |  33 ++-
 doc/src/sgml/storage.sgml |  13 +-
 src/backend/access/brin/brin.c|   2 +-
 src/backend/access/brin/brin_pageops.c|  10 +-
 src/backend/access/heap/hio.c |  47 ++--
 src/backend/access/heap/vacuumlazy.c  |  17 +-
 src/backend/access/transam/xact.c |  14 ++
 src/backend/storage/freespace/README  |  33 ++-
 src/backend/storage/freespace/freespace.c | 275 +-
 src/backend/storage/freespace/indexfsm.c  |   6 +-
 src/include/storage/freespace.h   |   9 +-
 src/test/regress/expected/fsm.out |  75 ++
 src/test/regress/parallel_schedule|   6 +
 src/test/regress/serial_schedule  |   1 +
 src/test/regress/sql/fsm.sql  |  55 +
 16 files changed, 571 insertions(+), 102 deletions(-)
 create mode 100644 src/test/regress/expected/fsm.out
 create mode 100644 src/test/regress/sql/fsm.sql

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..11d56f49a4 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-19 Thread Amit Kapila
On Thu, Jan 17, 2019 at 11:13 PM John Naylor
 wrote:
>
> On Wed, Jan 16, 2019 at 10:35 PM Amit Kapila  wrote:
> > Yes, I think it would be good if you can explain the concept of
> > local-map with the help of this example.
>
> > Then let's not add a reference to the version number in this case.  I
>
> Okay, done in v14. I kept your spelling of the new macro. One minor
> detail added: use uint8 rather than char for the local map array. This
> seems to be preferred, especially in this file.
>

I am fine with your change.

Few more comments:
1.
I think we should not allow to create FSM for toast tables as well
till there size reaches HEAP_FSM_CREATION_THRESHOLD.  If you try below
test, you can see that FSM will be created for the toast table even if
the size of toast relation is 1 page.

CREATE OR REPLACE FUNCTION random_text(length INTEGER)
RETURNS TEXT
LANGUAGE SQL
 AS $$ select string_agg(chr
(32+(random()*96)::int), '') from generate_series(1,length); $$;

create table tt(c1 int, c2 text);
insert into tt values(1, random_text(2500));
Vacuum tt;

I have fixed this in the attached patch, kindly verify it once and see
if you can add the test for same as well.

2.
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
+CREATE TABLE test_rel_forks (a
int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g
from generate_series(1,1) g;

-VACUUM test1;  -- set up FSM
+-- set up FSM and VM
+VACUUM test_rel_forks;

This test will create 45 pages instead of 1.  I know that to create
FSM, we now need more than 4 pages, but 45 seems to be on the higher
side.  I think we should not unnecessarily populate more data if there
is no particular need for it, let's restrict the number of pages to 5
if possible.

3.
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1

-  8192
-(1 row)
-
-SELECT octet_length
(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"

Why have you changed the test definition here?  Previously test checks
the existing FSM page, but now it tries to access out of range page.

Apart from the above, I have changed one sentence in README.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v15-0001-Avoid-creation-of-the-free-space-map-for-small-heap-.patch
Description: Binary data


Re: WIP: Avoid creation of the free space map for small tables

2019-01-17 Thread John Naylor
On Wed, Jan 16, 2019 at 10:35 PM Amit Kapila  wrote:
> Yes, I think it would be good if you can explain the concept of
> local-map with the help of this example.

> Then let's not add a reference to the version number in this case.  I

Okay, done in v14. I kept your spelling of the new macro. One minor
detail added: use uint8 rather than char for the local map array. This
seems to be preferred, especially in this file.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 38bc2d9b4ae3f0d1ab6d21c54ed638c5aee6d637 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 17 Jan 2019 12:25:30 -0500
Subject: [PATCH v14 1/2] Avoid creation of the free space map for small heap 
 relations.

Previously, all heaps had FSMs. For very small tables, this means that
the FSM took up more space than the heap did. This is wasteful, so now
we refrain from creating the FSM for heaps with 4 pages or fewer. If the
last known target block has insufficient space, we still try to insert
into some other page before before giving up and extending the relation,
since doing otherwise leads to table bloat. Testing showed that trying
every page penalized performance slightly, so we compromise and try
every other page. This way, we visit at most two pages. Any pages with
wasted free space become visible at next relation extension, so we still
control table bloat. As a bonus, directly attempting one or two pages
can even be faster than consulting the FSM would have been.
---
 contrib/pageinspect/expected/page.out |  77 +++---
 contrib/pageinspect/sql/page.sql  |  33 +--
 doc/src/sgml/storage.sgml |  13 +-
 src/backend/access/brin/brin.c|   2 +-
 src/backend/access/brin/brin_pageops.c|  10 +-
 src/backend/access/heap/hio.c |  47 ++--
 src/backend/access/heap/vacuumlazy.c  |  17 +-
 src/backend/access/transam/xact.c |  14 ++
 src/backend/storage/freespace/README  |  32 ++-
 src/backend/storage/freespace/freespace.c | 272 +-
 src/backend/storage/freespace/indexfsm.c  |   6 +-
 src/include/storage/freespace.h   |   9 +-
 src/test/regress/expected/fsm.out |  62 +
 src/test/regress/parallel_schedule|   6 +
 src/test/regress/serial_schedule  |   1 +
 src/test/regress/sql/fsm.sql  |  46 
 16 files changed, 545 insertions(+), 102 deletions(-)
 create mode 100644 src/test/regress/expected/fsm.out
 create mode 100644 src/test/regress/sql/fsm.sql

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..83e5910453 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 +1,69 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1;  -- set up FSM
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;
+-- set up FSM and VM
+VACUUM test_rel_forks;
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
  main_0 
 
8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
-ERROR:  block number 1 is out of range for relation "test1"
-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0 
 ---
   8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1 

-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
  vm_0 
 --
  8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1;
-ERROR:  block number 1 is out of range for relation "test1"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;
+ERROR:  block number 1 is out of range for relation "test_rel_forks"
 SELECT octet_length(get_raw_page('xxx', 'main', 0));
 ERROR:  relation "xxx" does not exist
-SELECT octet_length(get_raw_page('test1', 'xxx', 0));
+SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0));
 ERROR:  invalid fork name
 HINT:  Valid fork names are "main", "fsm", "vm", and "init".
-SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0);
+SELECT * FROM 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-16 Thread Amit Kapila
On Wed, Jan 16, 2019 at 10:10 PM John Naylor
 wrote:
>
> On Wed, Jan 16, 2019 at 8:41 AM Amit Kapila  wrote:
> >
> > On Fri, Jan 11, 2019 at 3:54 AM John Naylor  
> > wrote:
> > 1.
> > Commit message:
> > > Any pages with wasted free space become visible at next relation 
> > > extension, so we still control table bloat.
> >
> > I think the free space will be available after the next pass of
> > vacuum, no?  How can relation extension make it available?
>
> To explain, this diagram shows the map as it looks for different small
> table sizes:
>
> 0123
> A
> NA
> ANA
> NANA
>
> So for a 3-block table, the alternating strategy never checks block 1.
> Any free space block 1 has acquired via delete-and-vacuum will become
> visible if it extends to 4 blocks. We are accepting a small amount of
> bloat for improved performance, as discussed. Would it help to include
> this diagram in the README?
>

Yes, I think it would be good if you can explain the concept of
local-map with the help of this example.

> > 2.
> > +2. For very small heap relations, the FSM would be relatively large and
> > +wasteful, so as of PostgreSQL 12 we refrain from creating the FSM for
> > +heaps with HEAP_FSM_CREATION_THRESHOLD pages or fewer, both to save space
> > +and to improve performance.  To locate free space in this case, we simply
> > +iterate over the heap, trying alternating pages in turn.  There may be some
> > +wasted free space in this case, but it becomes visible again upon next
> > +relation extension.
> >
> > a. Again, how space becomes available at next relation extension.
> > b. I think there is no use of mentioning the version number in the
> > above comment, this code will be present from PG-12, so one can find
> > out from which version this optimization is added.
>
> It fits with the reference to PG 8.4 earlier in the document. I chose
> to be consistent, but to be honest, I'm not much in favor of a lot of
> version references in code/READMEs.
>

Then let's not add a reference to the version number in this case.  I
also don't see much advantage of adding version number at least in
this case.

>
> I'll include these with some grammar corrections in the next version.
>

Okay, thanks!

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



  1   2   >