Re: Performance improvements for src/port/snprintf.c

2018-10-07 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> However, it seems like it should still be on the table to look at
 Tom> other code that just does sprintf's job faster (such as the stb
 Tom> code Alexander mentioned).

The stb sprintf is indeed a lot faster for floats than other sprintfs,
but (a) it's still quite a lot slower than Ryu (COPY of my test table is
4.2 seconds with stb, vs 2.7 seconds with Ryu), and (b) it also produces
changes in the insignificant digits, so while (it claims) the values are
still round-trip convertible, they are neither the shortest
representation nor the exact representation.

For example, consider 1.9, which is 0x3FFE666666666666:

exact value: 1.899999999999999911182158029987476766109466552734375
accepted input range:
  min: 1.89999999999999980015985556747182272374629974365234375
  max: 1.90000000000000002220446049250313080847263336181640625

exact value rounded to 18 SF: 1.89999999999999991

Ryu output: 1.9E0
STB (%*.18g) output: 1.89999999999999992
sprintf (%*.18g) output: 1.89999999999999991

So while STB's output is in the acceptable range, it's not the result of
rounding the exact value to 18 digits (as sprintf does, on my system at
least), nor is it the shortest. Testing a bunch of random values, it
usually seems to be off from the rounded exact result by +/- 1 in the
last digit.
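
To make "shortest representation that round-trips" concrete, here is a
brute-force sketch of the rule in plain C (only an illustration of the
definition -- Ryu arrives at the same answer without trying successive
precisions):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        double  d = 1.9;
        char    buf[32];
        int     prec;

        /* find the fewest significant digits that still round-trip */
        for (prec = 1; prec <= 17; prec++)
        {
            snprintf(buf, sizeof(buf), "%.*g", prec, d);
            if (strtod(buf, NULL) == d)
                break;
        }
        printf("shortest round-trip form: %s (%d digits)\n", buf, prec);
        return 0;
    }

For 1.9 this finds "1.9" (2 digits), the same value Ryu's "1.9E0" encodes.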

-- 
Andrew (irc:RhodiumToad)



Re: Small performance tweak to run-time partition pruning

2018-10-07 Thread David Rowley
On 7 September 2018 at 19:29, David Rowley  wrote:
> While reviewing some other patches to improve partitioning performance
> I noticed that one of the loops in ExecFindInitialMatchingSubPlans()
> could be coded a bit more efficiently.  The current code loops over
> all the original subplans checking if the subplan is newly pruned; if
> it is, the code sets the new_subplan_indexes array element to -1,
> otherwise it assigns the new subplan index.  This can be done more
> efficiently if we make this array 1-based, initialise the whole
> thing to 0, and then just loop over the non-pruned subplans instead of
> all subplans.  Pruning all but 1 subplan is quite common.
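
To make the quoted scheme concrete, here is a rough sketch with
illustrative names -- not the actual patch:

    /*
     * Build a map from old subplan index to new subplan index.  Making it
     * 1-based means palloc0's zeroes already mean "pruned", so we only
     * need to visit the subplans that survived pruning.
     */
    static int *
    build_subplan_map(Bitmapset *validsubplans, int nsubplans)
    {
        int        *new_subplan_indexes;
        int         newidx = 1;
        int         i = -1;

        new_subplan_indexes = (int *) palloc0(sizeof(int) * nsubplans);
        while ((i = bms_next_member(validsubplans, i)) >= 0)
            new_subplan_indexes[i] = newidx++;

        /* new_subplan_indexes[i] == 0 now means subplan i was pruned */
        return new_subplan_indexes;
    }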

I was looking at this again and realised that we can completely skip
the re-sequencing of the subplan map when we're not going to perform any
further pruning during execution.  We could possibly also skip making a
copy of the subplan_map in this case in
ExecCreatePartitionPruneState(), and just take the planner's copy
verbatim, as we do for the subpart_map.  However, I was unable to see
any performance gains from doing this, so I've left it for now.

Currently, this improves performance by about 2% with prepared queries
and 300 partitions.

Patched:

tps = 5169.169452 (excluding connections establishing)
tps = 5155.914286 (excluding connections establishing)

Unpatched:
tps = 5059.511370 (excluding connections establishing)
tps = 5082.851062 (excluding connections establishing)

However with other patches to remove partitioning bottlenecks in the
executor, the TPS goes to about 25,000, so 2% becomes 10%, which seems
more meaningful.

I've attached an updated patch which skips the re-sequencing work when
it is not required for anything.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v2-0001-Improve-performance-of-run-time-partition-pruning.patch
Description: Binary data


Re: executor relation handling

2018-10-07 Thread David Rowley
On 8 October 2018 at 13:13, Tom Lane  wrote:
> The idea I had in mind was to allow hard pruning of any leaf that's
> been excluded *at plan time* based on partition constraints seen in
> its parent rel(s).  That should be safe enough as long as we take
> locks on all the non-leaf rels --- then we'd know about any change
> in the constraint situation.
>
> Rather than coping with renumbering the RTEs, it might be easiest
> to invent an "RTE_DUMMY" RTE type that a hard-pruned RTE could be
> changed to.

The problem with that is that, if we get [1] done in PG12, then the
RTE_DUMMYs would not exist, as we'd only have RTEs in the range table
for partitions that survived plan-time pruning.

It also leaves unsolved the problem of unneeded locks being taken on
partitions for PREPAREd queries that use run-time pruning.

[1] https://commitfest.postgresql.org/20/1778/

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: executor relation handling

2018-10-07 Thread Tom Lane
David Rowley  writes:
> On 8 October 2018 at 12:18, Tom Lane  wrote:
>> So ISTM that the *real* win for this scenario is going to come from
>> teaching the system to prune unwanted relations from the query
>> altogether, not just from the PlanRowMark list.

> Idle thought:  I wonder if we could add another field to the
> RangeTblEntry; "delaylock". Set that to true in the planner for all
> other_member rels that are partitions, then skip taking locks on those
> during AcquireExecutorLocks(). Instead, grab the lock in
> ExecGetRangeTableRelation() the first time through.

Hmm, I'm afraid that's not terribly safe, unless there are ALTER
TABLE restrictions on partitions that I'm not aware of.  For instance,
a leaf partition might have an index it didn't inherit from the parent,
and the query plan might be intending to use that index.  If you don't
take lock on the leaf, you might not know that the index has been dropped
so that a new plan is needed.

The idea I had in mind was to allow hard pruning of any leaf that's
been excluded *at plan time* based on partition constraints seen in
its parent rel(s).  That should be safe enough as long as we take
locks on all the non-leaf rels --- then we'd know about any change
in the constraint situation.

Rather than coping with renumbering the RTEs, it might be easiest
to invent an "RTE_DUMMY" RTE type that a hard-pruned RTE could be
changed to.

regards, tom lane



Re: executor relation handling

2018-10-07 Thread David Rowley
On 8 October 2018 at 12:18, Tom Lane  wrote:
> However, we should keep in mind that without partitioning overhead
> (ie "select * from lt_999 where b = 999 for share"), the TPS rate
> is over 25800 tps.  Most of the overhead in the partitioned case seems
> to be from acquiring locks on rangetable entries that we won't ever
> use, and none of these patch variants are touching that problem.
> So ISTM that the *real* win for this scenario is going to come from
> teaching the system to prune unwanted relations from the query
> altogether, not just from the PlanRowMark list.

Idle thought:  I wonder if we could add another field to the
RangeTblEntry; "delaylock". Set that to true in the planner for all
other_member rels that are partitions, then skip taking locks on those
during AcquireExecutorLocks(). Instead, grab the lock in
ExecGetRangeTableRelation() the first time through.

We'd still obtain the lock for the table named in the query at the
normal time, so cached plans could properly be invalidated.  We'd need
to ensure that any change to a partition that could invalidate a plan
also obtains a lock on the partitioned table, all the way to the top
of the hierarchy.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: executor relation handling

2018-10-07 Thread Tom Lane
I wrote:
> Still need to think a bit more about whether we want 0005 in
> anything like its current form.

So I poked at that for a bit, and soon realized that the *main* problem
there is that ExecFindRowMark() eats O(N^2) time due to repeated searches
of the es_rowMarks list.  While the patch as stated would improve that
for cases where most of the partitions can be pruned at plan time, it
does nothing much for cases where they can't.  However, it's pretty
trivial to fix that: let's just use an array not a list.  Patch 0001
attached does that.

A further improvement we could consider is to avoid opening the relcache
entries for pruned-away relations.  I could not find a way to do that
that was less invasive than removing ExecRowMark.relation and requiring
callers to call a new function to get the relation if they need it.
Patch 0002 attached is a delta on top of 0001 that does that.

Replicating your select-lt-for-share test case as best I can (you never
actually specified it carefully), I find that the TPS rate on my
workstation goes from about 250 tps with HEAD to 920 tps with patch 0001
or 1130 tps with patch 0002.  This compares to about 1600 tps for the
non-FOR-SHARE version of the query.

However, we should keep in mind that without partitioning overhead
(ie "select * from lt_999 where b = 999 for share"), the TPS rate
is over 25800 tps.  Most of the overhead in the partitioned case seems
to be from acquiring locks on rangetable entries that we won't ever
use, and none of these patch variants are touching that problem.
So ISTM that the *real* win for this scenario is going to come from
teaching the system to prune unwanted relations from the query
altogether, not just from the PlanRowMark list.

Keeping that comparison in mind, I'm inclined to think that 0001
is the best thing to do for now.  The incremental win from 0002
is not big enough to justify the API break it creates, while your
0005 is not really attacking the problem the right way.

regards, tom lane

diff --git a/src/backend/executor/execCurrent.c b/src/backend/executor/execCurrent.c
index 7480cf5..aadf749 100644
--- a/src/backend/executor/execCurrent.c
+++ b/src/backend/executor/execCurrent.c
@@ -91,21 +91,22 @@ execCurrentOf(CurrentOfExpr *cexpr,
 	 * the other code can't, while the non-FOR-UPDATE case allows use of WHERE
 	 * CURRENT OF with an insensitive cursor.
 	 */
-	if (queryDesc->estate->es_rowMarks)
+	if (queryDesc->estate->es_rowmarks)
 	{
 		ExecRowMark *erm;
-		ListCell   *lc;
+		Index		i;
 
 		/*
 		 * Here, the query must have exactly one FOR UPDATE/SHARE reference to
 		 * the target table, and we dig the ctid info out of that.
 		 */
 		erm = NULL;
-		foreach(lc, queryDesc->estate->es_rowMarks)
+		for (i = 0; i < queryDesc->estate->es_range_table_size; i++)
 		{
-			ExecRowMark *thiserm = (ExecRowMark *) lfirst(lc);
+			ExecRowMark *thiserm = queryDesc->estate->es_rowmarks[i];
 
-			if (!RowMarkRequiresRowShareLock(thiserm->markType))
+			if (thiserm == NULL ||
+				!RowMarkRequiresRowShareLock(thiserm->markType))
 				continue;		/* ignore non-FOR UPDATE/SHARE items */
 
 			if (thiserm->relid == table_oid)
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index b6abad5..0a766ef 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -909,61 +909,68 @@ InitPlan(QueryDesc *queryDesc, int eflags)
 	}
 
 	/*
-	 * Next, build the ExecRowMark list from the PlanRowMark(s), if any.
+	 * Next, build the ExecRowMark array from the PlanRowMark(s), if any.
 	 */
-	estate->es_rowMarks = NIL;
-	foreach(l, plannedstmt->rowMarks)
+	if (plannedstmt->rowMarks)
 	{
-		PlanRowMark *rc = (PlanRowMark *) lfirst(l);
-		Oid			relid;
-		Relation	relation;
-		ExecRowMark *erm;
+		estate->es_rowmarks = (ExecRowMark **)
+			palloc0(estate->es_range_table_size * sizeof(ExecRowMark *));
+		foreach(l, plannedstmt->rowMarks)
+		{
+			PlanRowMark *rc = (PlanRowMark *) lfirst(l);
+			Oid			relid;
+			Relation	relation;
+			ExecRowMark *erm;
 
-		/* ignore "parent" rowmarks; they are irrelevant at runtime */
-		if (rc->isParent)
-			continue;
+			/* ignore "parent" rowmarks; they are irrelevant at runtime */
+			if (rc->isParent)
+				continue;
 
-		/* get relation's OID (will produce InvalidOid if subquery) */
-		relid = exec_rt_fetch(rc->rti, estate)->relid;
+			/* get relation's OID (will produce InvalidOid if subquery) */
+			relid = exec_rt_fetch(rc->rti, estate)->relid;
 
-		/* open relation, if we need to access it for this mark type */
-		switch (rc->markType)
-		{
-			case ROW_MARK_EXCLUSIVE:
-			case ROW_MARK_NOKEYEXCLUSIVE:
-			case ROW_MARK_SHARE:
-			case ROW_MARK_KEYSHARE:
-			case ROW_MARK_REFERENCE:
-				relation = ExecGetRangeTableRelation(estate, rc->rti);
-				break;
-			case ROW_MARK_COPY:
-				/* no physical table access is required */
-				relation = NULL;
-				break;
-			default:
-				elog(ERROR, "unrecognized markType: %d", rc->markType);
-

Re: [HACKERS] Secondary index access optimizations

2018-10-07 Thread David Rowley
On 5 October 2018 at 04:45, Konstantin Knizhnik wrote:
> Will the following test be enough:
>
> -- check that columns for parent table are correctly mapped to child
> partition if their order doesn't match
> create table paren (a int, b text) partition by range(a);
> create table child_1 partition of paren for values from (0) to (10);
> create table child_2 (b text, a int);
> alter table paren attach partition child_2 for values from (10) to (20);
> insert into paren values (generate_series(0,19), generate_series(100,119));
>
> explain (costs off) select * from paren where a between 0 and 9;
> explain (costs off) select * from paren where a between 10 and 20;
> explain (costs off) select * from paren where a >= 5;
> explain (costs off) select * from paren where a <= 15;
>
> select count(*) from paren where a >= 5;
> select count(*) from paren where a < 15;
>
> drop table paren cascade;

I started looking at this to see if this test would cause a crash with
the original code, but it does not. The closest I can get is:

drop table parent;
create table parent (a bytea, b int) partition by range(a);
create table child_1 (b int, a bytea);
alter table parent attach partition child_1 for values from ('a') to ('z');
explain (costs off) select * from parent where b = 1;

But despite the varattnos of the bytea and int accidentally matching,
there's no crash due to the way operator_predicate_proof() requires
more than just the varno and varattno to match. It requires the Vars
to be equal(), which includes vartype, and those are not the same. So
the proof just fails.
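
For instance, something along these lines would hold -- an illustrative
sketch, not code from the tests:

    /*
     * Two Vars with matching varno/varattno but different vartypes are
     * not equal(), so operator_predicate_proof() can't treat them as
     * the same column.
     */
    Var *v_bytea = makeVar(1, 1, BYTEAOID, -1, InvalidOid, 0);
    Var *v_int4 = makeVar(1, 1, INT4OID, -1, InvalidOid, 0);

    Assert(!equal(v_bytea, v_int4));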

In short, this test probably does nothing in addition to what various
other tests already do.  Given that it's unable to crash the unfixed
code, I think it's likely not a worthwhile test to add.

I wrote:
> create table listp (a int, b int) partition by list(a);
> create table listp1 partition of listp for values in(1);
> create index listp_a_b_idx on listp (a,b);
>
> and a query:
>
> select * from listp where a = 1 order by b;
>
> if we remove the "a = 1" qual, then listp_a_b_idx can't be used.

I had a look at what allows this query to still use the index, and it's
down to pathkey_is_redundant() returning true because there's still an
equivalence class containing {a,1}.  I don't quite see any reason why
it would not be okay to rely on that working, but it only works for
pathkeys.  If you have a case like:

set max_parallel_workers_per_gather=0;
create table listp (a int, b int) partition by list(a);
create table listp1 partition of listp for values in(1);
insert into listp select 1,x from generate_series(1,1000000) x;
create index listp_a_b_idx on listp (a,b);
vacuum analyze listp;
explain analyze select * from listp where a = 1 and b = 1;

the "a = 1" will be dropped and the index on (a,b) does not get used.

Patched results in:

postgres=# explain analyze select * from listp where a = 1 and b = 1;
 QUERY PLAN
---------------------------------------------------------------------------
 Append  (cost=0.00..16925.01 rows=1 width=8) (actual
time=0.019..169.231 rows=1 loops=1)
   ->  Seq Scan on listp1  (cost=0.00..16925.00 rows=1 width=8)
(actual time=0.017..169.228 rows=1 loops=1)
         Filter: (b = 1)
         Rows Removed by Filter: 999999
 Planning Time: 0.351 ms
 Execution Time: 169.257 ms
(6 rows)

Whereas unpatched gets:

postgres=# explain analyze select * from listp where a = 1 and b = 1;
QUERY PLAN
---------------------------------------------------------------------------
 Append  (cost=0.42..4.45 rows=1 width=8) (actual time=0.657..0.660
rows=1 loops=1)
   ->  Index Only Scan using listp1_a_b_idx on listp1
(cost=0.42..4.44 rows=1 width=8) (actual time=0.653..0.655 rows=1
loops=1)
 Index Cond: ((a = 1) AND (b = 1))
 Heap Fetches: 0
 Planning Time: 32.303 ms
 Execution Time: 0.826 ms
(6 rows)

so I was probably wrong to suggest set_append_rel_size() as a good
place to remove these quals.  It should perhaps be done later, or maybe
we could add some sort of marker to the qual to say it does not need to
be enforced during execution.  The former would probably be best, as we
don't want to show these quals in EXPLAIN.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Performance improvements for src/port/snprintf.c

2018-10-07 Thread Tom Lane
Andrew Gierth  writes:
> "Tom" == Tom Lane  writes:
>  Tom> Now, "shortest value that converts back exactly" is technically
>  Tom> cool, but I am not sure that it solves any real-world problem that
>  Tom> we have.

> Well, it seems to me that it is perfect for pg_dump.

Perhaps.  I was hoping for something we could slot into snprintf.c;
not being able to select the number of digits to output is clearly
a deal-breaker for that usage.  But perhaps it's reasonable to allow
"extra_float_digits = 3" to be redefined as meaning "use the shortest
value that converts back exactly" in float[48]out.

However, it seems like it should still be on the table to look at
other code that just does sprintf's job faster (such as the stb
code Alexander mentioned).  If anything like that is acceptable
for the general case, then we have to ask whether ryu is enough
faster than *that* code, not faster than what we have now, to
justify carrying another umpteen KB of independent code path
for the dump-and-restore case.

> Also it's kind of a problem that our default float output is not
> round-trip safe - people do keep wondering why they can select a row and
> it'll show a certain value, but then doing WHERE col = 'xxx' on that
> value does not find the row.

Unfortunately, I do not think it's going to be acceptable for default
float output (as opposed to the dump/restore case) to become round-trip
safe.  The number of people complaining today would be dwarfed by the
number of people complaining about extra garbage digits in their results.
There isn't any compromise that will make things "just work" for people
who are unaware of the subtleties of float arithmetic.

regards, tom lane



msys2

2018-10-07 Thread Andrew Dunstan


Disclaimer: I might not have done things in the best way - I'll happily
accept corrections.



Here is some information about building on msys2, the rewrite of msys, 
plus a proposal for a couple of tiny changes to support it.



The simplest way to install msys2 is via the chocolatey package manager
(see https://chocolatey.org/):



    choco install -y msys2


Alternatively, download the installer and run it.


Once that's done, fire up an Msys2 shell and use its package manager to 
install the required tools:



    pacman -S msys/bison \
        msys/flex \
        msys/make \
        msys/git \
        msys/perl \
        msys/ccache \
        msys/diffutils \
        mingw-w64-x86_64-toolchain \
        mingw-w64-i686-toolchain


If you want to run a buildfarm animal, there are a few other things you 
will also need:



    pacman -S msys/perl-libwww msys/perl-Crypt-SSLeay msys/tar


If you want to run TAP tests, you need to install IPC::Run somewhere and 
point PERL5LIB at it. There is no pacman package available. You will 
also need:



    pacman -S msys/perl-Test-Simple msys/perl-Test-Harness


And there are a few very useful things it makes sense to install:


    pacman -S msys/vim msys/patch


Pre-build setup:


For 64-bit builds:


    unset MSYSTEM_CHOST # or export MSYSTEM_CHOST=x86_64-w64-mingw32
    export MSYSTEM=MINGW64
    export PATH=/mingw64/bin:$PATH


For 32-bit builds:


    unset MSYSTEM_CHOST # or export MSYSTEM_CHOST=i686-w64-mingw32
    export MSYSTEM=MINGW32
    export PATH=/mingw32/bin:$PATH


build:


    ./configure --with-template=win32 ...


Things that fail:


 * configure without a template - there is a simple fix for this,
   included in the attached patch
 * pg_upgrade test - I was clearly not thorough enough with my fix in
   commit 608a71095. It occurred to me that rather than using `uname
   -s` and putting "MINGW*|MSYS*" everywhere, it might be better to
   use the "host_os" value from Makefile.global. The patch does it that
   way, falling back to uname if necessary. But I'm fine with doing it
   the other way if people prefer. Yes, I know I should use backticks
   instead of $().
 * the pg_dump TAP test 010_dump_connstr.pl chokes badly on $dbname3 and
   $dbname4. The commands complain about too many arguments. It seems
   weird because jacana handles this just fine.
 * 32-bit builds fail only the circle regression test, like this:

   *** 
C:/tools/msys64/home/Administrator/bf/root/HEAD/pgsql/src/test/regress/expected/circle.out
   2018-10-03 18:52:44.137775400 +
   --- 
C:/tools/msys64/home/administrator/bf/root/HEAD/pgsql.build/src/test/regress/results/circle.out
  2018-10-05 20:59:45.424014300 +
   ***
   *** 109,116 
WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0)
ORDER BY distance, area(c1.f1), area(c2.f1);
   five |  one   |  two   | distance
   ! --+++--
   !   | <(3,5),0>  | <(1,2),3>  | 0.60555127546399
| <(3,5),0>  | <(5,1),3>  | 1.47213595499958
| <(100,200),10> | <(100,1),115>  |   74
| <(100,200),10> | <(1,2),100>| 111.370729772479
   --- 109,116 
WHERE (c1.f1 < c2.f1) AND ((c1.f1 <-> c2.f1) > 0)
ORDER BY distance, area(c1.f1), area(c2.f1);
   five |  one   |  two   | distance
   ! --+++---
   !   | <(3,5),0>  | <(1,2),3>  | 0.605551275463989
| <(3,5),0>  | <(5,1),3>  |  1.47213595499958
| <(100,200),10> | <(100,1),115>  |74
| <(100,200),10> | <(1,2),100>|  111.370729772479


cheers

andrew


diff --git a/configure b/configure
index 0448c6b..a0a6989 100755
--- a/configure
+++ b/configure
@@ -2945,6 +2945,7 @@ dragonfly*) template=netbsd ;;
  linux*|gnu*|k*bsd*-gnu)
template=linux ;;
mingw*) template=win32 ;;
+msys*) template=win32 ;;
   netbsd*) template=netbsd ;;
  openbsd*) template=openbsd ;;
  solaris*) template=solaris ;;
diff --git a/configure.in b/configure.in
index 23b5bb8..11be746 100644
--- a/configure.in
+++ b/configure.in
@@ -67,6 +67,7 @@ dragonfly*) template=netbsd ;;
  linux*|gnu*|k*bsd*-gnu)
template=linux ;;
mingw*) template=win32 ;;
+msys*) template=win32 ;;
   netbsd*) template=netbsd ;;
  openbsd*) template=openbsd ;;
  solaris*) template=solaris ;;
diff --git a/src/bin/pg_upgrade/Makefile b/src/bin/pg_upgrade/Makefile
index adb0d5d..8b76650 100644
--- a/src/bin/pg_upgrade/Makefile
+++ b/src/bin/pg_upgrade/Makefile
@@ -37,7 +37,7 @@ clean distclean maintainer-clean:
 	   pg_upgrade_dump_*.custom pg_upgrade_*.log
 
 check: test.sh all
-	MAKE=$(MAKE) bindir=$(bindir) libdir=$(libdir) EXTRA_REGRESS_OPTS="$(EXTRA_REGRESS_OPTS)" $(SHELL) $< --install
+	MAKE=$(MAKE) bindir=$(bindir) libdir=$(libdir) 

Re: executor relation handling

2018-10-07 Thread Tom Lane
Amit Langote  writes:
> 0004: removes useless fields from certain planner nodes whose only purpose
> has been to assist the executor lock relations in proper order

I've pushed most of 0004 now; obviously, not the parts removing
PlannedStmt.rowMarks, since that's not possible without rearrangement
of the executor's RowMark handling.

I didn't like the idea of unifying ModifyTable.nominalRelation with
the partition root info.  Those fields serve different masters ---
nominalRelation, at least in its original intent, is only meant for
use of EXPLAIN and might have nothing to do with what happens at
execution.  So even though unifying them would work today, we might
regret it down the line.  Instead I left that field alone and added
a separate rootRelation field to carry the partition root RT index,
which ends up being the same number of fields anyway since we don't
need a flag for is-the-nominal-relation-a-partition-root.

Still need to think a bit more about whether we want 0005 in
anything like its current form.

regards, tom lane



Re: Defaulting to password_encryption = scram-sha-256

2018-10-07 Thread Andres Freund
Hi,

On 2018-10-07 11:37:20 -0400, Tom Lane wrote:
> Michael Paquier  writes:
> > On Sat, Oct 06, 2018 at 11:43:06PM -0700, Andres Freund wrote:
> >> Now that we probably have shaken the worst issues out of scram,
> >> shouldn't we change the default password_encryption to something that
> >> doesn't scare people?   The only reason I could think of not wanting to
> >> do that for is that we don't necessarily guarantee that we have a strong
> >> random generator, but if that's the issue, we should change initdb to
> >> default it to something safe if the platform provides something. Which
> >> is just about any sane one, no?
> 
> > In short, +1.
> 
> > The random function issue would apply to any platform in need of
> > --disable-strong-random, but this applies mainly to some old HP-UX stuff
> > if my memory serves me well, so I'd like to think that we should be safe
> > to just switch the default and not complicate initdb.
> 
> Yeah, I don't see why that should affect anything.  SCRAM with a poor
> random function is probably still better than MD5.

Cool.


> As I recall, the reason for not defaulting to SCRAM right away had
> nothing to do with that; it was worry about how many clients would
> get locked out for lack of SCRAM support.

Right, but two releases should be enough of a warning window.


> But the list at https://wiki.postgresql.org/wiki/List_of_drivers looks
> pretty positive, and another year would probably be enough to give the
> stragglers time to catch up ... especially if they know this is
> coming.

I've updated the list, and I think it looks a bit better now. Go/pq and
Node/node-postgres seem to be the only somewhat important ones without
support. The former has had open PRs for it for quite a while.

Greetings,

Andres Freund



Re: [HACKERS] proposal: schema variables

2018-10-07 Thread Pavel Stehule
Hi

On Sun, 30 Sep 2018 at 0:19, Pavel Stehule wrote:

> On Sat, 29 Sep 2018 at 10:34, Pavel Stehule wrote:
>
>> On Sat, 22 Sep 2018 at 8:00, Pavel Stehule wrote:
>>
>>> rebased against yesterday's changes in tab-complete.c
>>
>> rebased against the latest changes in master
>
> + using content of schema variable for estimation
> + subtransaction support
>
> I hope the functionality is now almost complete. Please check it.

new update:

minor whitespace issue fixed
one more regression test and 2 pg_dump tests added

Regards

Pavel


schema-variables-20181007-01.patch.gz
Description: application/gzip


Re: WIP: Avoid creation of the free space map for small tables

2018-10-07 Thread Tom Lane
John Naylor  writes:
> On 10/6/18, Thomas Munro  wrote:
>> On Sat, Oct 6, 2018 at 7:47 AM John Naylor  wrote:
>>> A while back, Robert Haas noticed that the space taken up by very
>>> small tables is dominated by the FSM [1]. Tom suggested that we could
>>> prevent creation of the FSM until the heap has reached a certain
>>> threshold size [2]. Attached is a WIP patch to implement that.

BTW, don't we need a similar hack for visibility maps?

regards, tom lane



Re: Defaulting to password_encryption = scram-sha-256

2018-10-07 Thread Tom Lane
Michael Paquier  writes:
> On Sat, Oct 06, 2018 at 11:43:06PM -0700, Andres Freund wrote:
>> Now that we probably have shaken the worst issues out of scram,
>> shouldn't we change the default password_encryption to something that
>> doesn't scare people?   The only reason I could think of not wanting to
>> do that for is that we don't necessarily guarantee that we have a strong
>> random generator, but if that's the issue, we should change initdb to
>> default it to something safe if the platform provides something. Which
>> is just about any sane one, no?

> In short, +1.

> The random function issue would apply to any platform in need of
> --disable-strong-random, but this applies mainly to some old HP-UX stuff
> if my memory serves me well, so I'd like to think that we should be safe
> to just switch the default and not complicate initdb.

Yeah, I don't see why that should affect anything.  SCRAM with a poor
random function is probably still better than MD5.

As I recall, the reason for not defaulting to SCRAM right away had
nothing to do with that; it was worry about how many clients would
get locked out for lack of SCRAM support.  But the list at
https://wiki.postgresql.org/wiki/List_of_drivers
looks pretty positive, and another year would probably be enough
to give the stragglers time to catch up ... especially if they know
this is coming.

regards, tom lane



Re: WIP: Avoid creation of the free space map for small tables

2018-10-07 Thread John Naylor
On 10/6/18, Thomas Munro  wrote:
> On Sat, Oct 6, 2018 at 7:47 AM John Naylor  wrote:
>> A while back, Robert Haas noticed that the space taken up by very
>> small tables is dominated by the FSM [1]. Tom suggested that we could
>> prevent creation of the FSM until the heap has reached a certain
>> threshold size [2]. Attached is a WIP patch to implement that. I've
>> also attached a SQL script to demonstrate the change in behavior for
>> various scenarios.
>
> Hi John,
>
> You'll need to tweak the test in contrib/pageinspect/sql/page.sql,
> because it's currently asserting that there is an FSM on a small table
> so make check-world fails.

Whoops, sorry about that; the attached patch passes make check-world.
While looking into that, I also found a regression: If the cached
target block is the last block in the relation and there is no free
space, that block will be tried twice. That's been fixed as well.
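
In outline, the no-FSM fallback looks something like this -- an
illustrative sketch under invented names, not the patch itself:

    /*
     * Without an FSM to consult, check each existing block for enough
     * free space before deciding to extend the relation.
     */
    static BlockNumber
    find_free_block_without_fsm(Relation relation, Size needed)
    {
        BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
        BlockNumber blkno;

        for (blkno = 0; blkno < nblocks; blkno++)
        {
            Buffer      buf = ReadBuffer(relation, blkno);
            Size        freespace;

            LockBuffer(buf, BUFFER_LOCK_SHARE);
            freespace = PageGetHeapFreeSpace(BufferGetPage(buf));
            UnlockReleaseBuffer(buf);

            if (freespace >= needed)
                return blkno;
        }
        return InvalidBlockNumber;  /* caller must extend the relation */
    }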

Thanks,
-John Naylor
From 49c80647737edfaae82014f2c94a84114f3688f6 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sun, 7 Oct 2018 21:34:23 +0700
Subject: [PATCH v2] Avoid creation of the free space map for small tables.

The FSM isn't created if the heap has fewer than 10 blocks. If the last
known good block has insufficient space, try every block before extending
the heap.

If a heap with a FSM is truncated back to below the threshold, the FSM
stays around and can be used as usual.

If the heap tuples are all deleted, the FSM stays but has no leaf blocks
(same as before). Although it exists, it won't be re-extended until
the heap passes the threshold again.
---
 contrib/pageinspect/expected/page.out | 77 +-
 contrib/pageinspect/sql/page.sql  | 33 +---
 src/backend/access/heap/hio.c | 96 +--
 src/backend/storage/freespace/freespace.c | 35 -
 src/include/storage/freespace.h   |  4 +
 5 files changed, 169 insertions(+), 76 deletions(-)

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..83e5910453 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 +1,69 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1;  -- set up FSM
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,10000) g;
+-- set up FSM and VM
+VACUUM test_rel_forks;
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
  main_0 
 
8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
-ERROR:  block number 1 is out of range for relation "test1"
-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0 
 ---
   8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1 

-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
  vm_0 
 --
  8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1;
-ERROR:  block number 1 is out of range for relation "test1"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;
+ERROR:  block number 1 is out of range for relation "test_rel_forks"
 SELECT octet_length(get_raw_page('xxx', 'main', 0));
 ERROR:  relation "xxx" does not exist
-SELECT octet_length(get_raw_page('test1', 'xxx', 0));
+SELECT octet_length(get_raw_page('test_rel_forks', 'xxx', 0));
 ERROR:  invalid fork name
 HINT:  Valid fork names are "main", "fsm", "vm", and "init".
-SELECT get_raw_page('test1', 0) = get_raw_page('test1', 'main', 0);
+SELECT * FROM fsm_page_contents(get_raw_page('test_rel_forks', 'fsm', 0));
+ fsm_page_contents 
+---
+ 0: 192   +
+ 1: 192   +
+ 3: 192   +
+ 7: 192   +
+ 15: 192  +
+ 31: 192  +
+ 63: 192  +
+ 127: 192 +
+ 255: 192 +
+ 511: 192 +
+ 1023: 192+
+ 2047: 192+
+ 4095: 192+
+ fp_next_slot: 0  +
+ 
+(1 row)
+
+SELECT get_raw_page('test_rel_forks', 0) = get_raw_page('test_rel_forks', 'main', 0);
  ?column? 
 --
  t
 (1 row)
 
+DROP TABLE test_rel_forks;
+CREATE TABLE test1 (a int, b int);
+INSERT INTO test1 VALUES 

Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error.

2018-10-07 Thread Michael Paquier
On Sun, Oct 07, 2018 at 11:13:19AM -0300, Alvaro Herrera wrote:
> Good point.  I cannot do that today, but if you want to, please go
> ahead.

Okay, done.
--
Michael


signature.asc
Description: PGP signature


Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error.

2018-10-07 Thread Alvaro Herrera
On 2018-Oct-07, Michael Paquier wrote:

> On Sat, Oct 06, 2018 at 10:15:32PM -0300, Alvaro Herrera wrote:
> > Pushed to 11 and master.  I'm not convinced that it's a good idea to
> > mess with 10 at this point -- the partitioning code is rather completely
> > different already ...
> 
> While I definitely agree with you about not messing with this portion of
> the partitioning code in REL_10_STABLE, I think we have a good set of
> arguments for getting the regression test into this branch.  This
> particular case was broken while developing v11, so it would be nice to
> have insurance that any future bug fix applied to REL_10_STABLE does not
> break it again.

Good point.  I cannot do that today, but if you want to, please go
ahead.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



DSM segment handle generation in background workers

2018-10-07 Thread Thomas Munro
Hello,

I noticed that every parallel worker generates the same sequence of
handles here:

    /*
     * Loop until we find an unused identifier for the new control segment.
     * We sometimes use 0 as a sentinel value indicating that no control
     * segment is known to exist, so avoid using that value for a real
     * control segment.
     */
    for (;;)
    {
        Assert(dsm_control_address == NULL);
        Assert(dsm_control_mapped_size == 0);
        dsm_control_handle = random();
        if (dsm_control_handle == DSM_HANDLE_INVALID)
            continue;
        if (dsm_impl_op(DSM_OP_CREATE, dsm_control_handle, segsize,
                        &dsm_control_impl_private, &dsm_control_address,
                        &dsm_control_mapped_size, ERROR))
            break;
    }

It's harmless AFAICS, but it produces sequences of syscalls like this
when Parallel Hash is building the hash table:

shm_open("/PostgreSQL.240477264",O_RDWR|O_CREAT|O_EXCL,0600) ERR#17
'File exists'
shm_open("/PostgreSQL.638747851",O_RDWR|O_CREAT|O_EXCL,0600) ERR#17
'File exists'
shm_open("/PostgreSQL.1551053007",O_RDWR|O_CREAT|O_EXCL,0600) = 5 (0x5)

That's because the bgworker startup path doesn't contain a call to
srandom(...distinguishing stuff...), unlike BackendRun().  I suppose
do_start_bgworker() could gain a similar call... or perhaps that call
should be moved into InitPostmasterChild().  If we put it in there
right after MyStartTime is assigned a new value, we could use the same
incantation that PostmasterMain() uses.
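Something like this, right after MyStartTime is set -- the exact mixing
below is a guess at the incantation, not the real code:

    /* hypothetical addition to InitPostmasterChild() */
    srandom(((unsigned int) MyProcPid) ^ ((unsigned int) MyStartTime));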

I noticed that the comment in PostmasterMain() refers to
PostmasterRandom(), which is gone.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Performance improvements for src/port/snprintf.c

2018-10-07 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 Tom> Now, "shortest value that converts back exactly" is technically
 Tom> cool, but I am not sure that it solves any real-world problem that
 Tom> we have.

Well, it seems to me that it is perfect for pg_dump.

Also it's kind of a problem that our default float output is not
round-trip safe - people do keep wondering why they can select a row and
it'll show a certain value, but then doing WHERE col = 'xxx' on that
value does not find the row. Yes, testing equality of floats is bad, but
there's no reason to put in extra landmines.

 Tom> I'm also worried that introducing it would result in complaints like
 Tom> 
https://www.postgresql.org/message-id/CANaXbVjw3Y8VmapWuZahtcRhpE61hsSUcjquip3HuXeuN8y4sg%40mail.gmail.com

Frankly for a >20x performance improvement in float8out I don't think
that's an especially big deal.

 Tom> As for #2, my *very* short once-over of the code led me to think
 Tom> that the speed win comes mostly from use of wide integer
 Tom> arithmetic,

Data point: forcing it to use 64-bit only (#define RYU_ONLY_64_BIT_OPS)
makes negligible difference on my test setup.

 Tom> and maybe from throwing big lookup tables at the problem. If so,
 Tom> it's very likely possible that we could adopt those techniques
 Tom> without necessarily buying into the shortest-exact rule for how
 Tom> many digits to print.

If you read the ACM paper (linked from the upstream github repo), it
explains how the algorithm works by combining the radix conversion step
with (the initial iterations of) the operation of finding the shortest
representation. This allows limiting the number of bits needed for the
intermediate results so that it can all be done in fixed-size integers,
rather than using an arbitrary-precision approach.

I do not see any obvious way to use this code to generate the same
output in the final digits that we currently do (in the sense of
overly-exact values like outputting 1.89999999999999991 for 1.9 when
extra_float_digits=3).

 >> One option would be to stick with snprintf if extra_float_digits is
 >> less than 0 (or less than or equal to 0 and make the default 1) and
 >> use ryu otherwise, so that the option to get rounded floats is still
 >> there. (Apparently some people do use negative values of
 >> extra_float_digits.) Unlike other format-changing GUCs, this one
 >> already exists and is already used by people who want more or less
 >> precision, including by pg_dump where round-trip conversion is the
 >> requirement.

 Tom> I wouldn't necessarily object to having some value of
 Tom> extra_float_digits that selects the shortest-exact rule, but I'm
 Tom> thinking maybe it should be a value we don't currently accept.

Why would anyone currently set extra_float_digits > 0 if not to get
round-trip-safe values?

-- 
Andrew (irc:RhodiumToad)



Re: Unclear error message

2018-10-07 Thread Michael Paquier
On Sun, Oct 07, 2018 at 05:14:30PM +0900, Michael Paquier wrote:
> Here is a counter-proposal:
> "cannot use ONLY for foreign key on partitioned table \"%s\" referencing
> relation \"%s\""
> 
> +-- also, adding a NOT VALID foreign key should fail
> +ALTER TABLE fk_partitioned_fk ADD FOREIGN KEY (a, b) REFERENCES 
> fk_notpartitioned_pk NOT VALID;
> +ERROR:  cannot add NOT VALID foreign key to relation "fk_notpartitioned_pk"
> +DETAIL:  This feature is not yet supported on partitioned tables.
> 
> This error should mention "fk_partitioned_fk", and not
> "fk_notpartitioned_pk", no?  The foreign key is added to the former, not
> the latter.

And after some more study, I finished with the attached.  Thoughts?
--
Michael
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 0a5eb68eb6..8554c0a3cb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7365,12 +7365,14 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		if (!recurse)
 			ereport(ERROR,
 	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-	 errmsg("foreign key referencing partitioned table \"%s\" must not be ONLY",
+	 errmsg("cannot use ONLY for foreign key on partitioned table \"%s\" referencing relation \"%s\"",
+			RelationGetRelationName(rel),
 			RelationGetRelationName(pkrel)));
 		if (fkconstraint->skip_validation && !fkconstraint->initially_valid)
 			ereport(ERROR,
 	(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-	 errmsg("cannot add NOT VALID foreign key to relation \"%s\"",
+	 errmsg("cannot add NOT VALID foreign key on partitioned relation \"%s\" referencing relation \"%s\"",
+			RelationGetRelationName(rel),
 			RelationGetRelationName(pkrel)),
 	 errdetail("This feature is not yet supported on partitioned tables.")));
 	}
diff --git a/src/test/regress/expected/foreign_key.out b/src/test/regress/expected/foreign_key.out
index fc3bbe4deb..a1dbfabdd6 100644
--- a/src/test/regress/expected/foreign_key.out
+++ b/src/test/regress/expected/foreign_key.out
@@ -1465,6 +1465,13 @@ CREATE TABLE fk_partitioned_fk_3_0 PARTITION OF fk_partitioned_fk_3 FOR VALUES W
 CREATE TABLE fk_partitioned_fk_3_1 PARTITION OF fk_partitioned_fk_3 FOR VALUES WITH (MODULUS 5, REMAINDER 1);
 ALTER TABLE fk_partitioned_fk ATTACH PARTITION fk_partitioned_fk_3
   FOR VALUES FROM (2000,2000) TO (3000,3000);
+-- but creating a foreign key with ONLY on a partitioned table doesn't work
+ALTER TABLE ONLY fk_partitioned_fk ADD FOREIGN KEY (a, b) REFERENCES fk_notpartitioned_pk;
+ERROR:  cannot use ONLY for foreign key on partitioned table "fk_partitioned_fk" referencing relation "fk_notpartitioned_pk"
+-- also, adding a NOT VALID foreign key should fail
+ALTER TABLE fk_partitioned_fk ADD FOREIGN KEY (a, b) REFERENCES fk_notpartitioned_pk NOT VALID;
+ERROR:  cannot add NOT VALID foreign key on partitioned relation "fk_partitioned_fk" referencing relation "fk_notpartitioned_pk"
+DETAIL:  This feature is not yet supported on partitioned tables.
 -- these inserts, targetting both the partition directly as well as the
 -- partitioned table, should all fail
 INSERT INTO fk_partitioned_fk (a,b) VALUES (500, 501);
diff --git a/src/test/regress/sql/foreign_key.sql b/src/test/regress/sql/foreign_key.sql
index d2cecdf4eb..85540a0fae 100644
--- a/src/test/regress/sql/foreign_key.sql
+++ b/src/test/regress/sql/foreign_key.sql
@@ -1106,6 +1106,11 @@ CREATE TABLE fk_partitioned_fk_3_1 PARTITION OF fk_partitioned_fk_3 FOR VALUES W
 ALTER TABLE fk_partitioned_fk ATTACH PARTITION fk_partitioned_fk_3
   FOR VALUES FROM (2000,2000) TO (3000,3000);
 
+-- but creating a foreign key with ONLY on a partitioned table doesn't work
+ALTER TABLE ONLY fk_partitioned_fk ADD FOREIGN KEY (a, b) REFERENCES fk_notpartitioned_pk;
+-- also, adding a NOT VALID foreign key should fail
+ALTER TABLE fk_partitioned_fk ADD FOREIGN KEY (a, b) REFERENCES fk_notpartitioned_pk NOT VALID;
+
 -- these inserts, targetting both the partition directly as well as the
 -- partitioned table, should all fail
 INSERT INTO fk_partitioned_fk (a,b) VALUES (500, 501);


signature.asc
Description: PGP signature


Re: Calculate total_table_pages after set_base_rel_sizes()

2018-10-07 Thread David Rowley
On 6 October 2018 at 18:20, Edmund Horner  wrote:
> David Rowley said:
>> I am considering this a bug fix, but I'm proposing this for PG12 only
>> as I don't think destabilising plans in the back branches is a good
>> idea. I'll add this to the September commitfest.
>
> I played with the new patch today with a set of large partitions.
> It seems to work, though the effect is subtle.  The expected behaviour
> of index_pages_fetched() is a bit hard to fathom when the cache share
> pushes it between cases A, B and C of the formula, but that's not
> really down to this patch.
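
(From memory, the Mackert-Lohman approximation behind index_pages_fetched()
has three cases -- T = table pages, N = tuples, s = selectivity, b =
effective cache size; treat this as a paraphrase, not the exact costsize.c
comment:)

    PF = min(2TNs/(2T+Ns), T)              when T <= b
    PF = 2TNs/(2T+Ns)                      when T > b, Ns <= 2Tb/(2T-b)
    PF = b + (Ns - 2Tb/(2T-b))*(T-b)/T     when T > b, Ns > 2Tb/(2T-b)

The cache share determines b, and hence which case applies, which is what
makes the behaviour hard to predict by eye.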

Thanks for looking at this and testing it too.

The primary purpose of this patch was step 1 in delaying the creation
of RangeTblEntrys for partitions until after partition pruning has
taken place.  The code I had to do this caused problems around the
total_table_pages calculation due to the lack of a RangeTblEntry for the
partitions at the time it was being calculated.  But regardless of
that, I still believe the place where we currently calculate this number
is subtly broken, as it counts all partitions, even ones that will later
be pruned, thus decreasing the likelihood of an index being used, since
random_page_cost would be applied to a higher number of index pages.

Amit Langote has since posted a patch to delay the RangeTblEntry
creation until after pruning. His patch happens to also move the
total_table_pages calculation, but I believe this change should be
made as a commit independent of anything else.  I've kept it in the
commitfest for that reason.

> Basically I think it's ready for a committer to look at.  Should I
> update the CF entry?

That sounds good, please do.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Unclear error message

2018-10-07 Thread Michael Paquier
On Sat, Oct 06, 2018 at 11:16:09PM +0200, Laurenz Albe wrote:
> True; how about the attached?
> 
> I think this should go in before v11.

Thanks for adding regression tests for that.

-  errmsg("foreign key referencing partitioned table \"%s\" must not be ONLY",
-  RelationGetRelationName(pkrel)));

Here is a counter-proposal:
"cannot use ONLY for foreign key on partitioned table \"%s\" referencing
relation \"%s\""

+-- also, adding a NOT VALID foreign key should fail
+ALTER TABLE fk_partitioned_fk ADD FOREIGN KEY (a, b) REFERENCES 
fk_notpartitioned_pk NOT VALID;
+ERROR:  cannot add NOT VALID foreign key to relation "fk_notpartitioned_pk"
+DETAIL:  This feature is not yet supported on partitioned tables.

This error should mention "fk_partitioned_fk", and not
"fk_notpartitioned_pk", no?  The foreign key is added to the former, not
the latter.
--
Michael


signature.asc
Description: PGP signature


Re: Defaulting to password_encryption = scram-sha-256

2018-10-07 Thread Michael Paquier
On Sat, Oct 06, 2018 at 11:43:06PM -0700, Andres Freund wrote:
> Now that we probably have shaken the worst issues out of scram,
> shouldn't we change the default password_encryption to something that
> doesn't scare people?  The only reason I could think of for not wanting
> to do that is that we don't necessarily guarantee that we have a strong
> random generator, but if that's the issue, we should change initdb to
> default it to something safe if the platform provides something. Which
> is just about any sane one, no?

In short, +1.

The random function issue would apply to any platform in need of
--disable-strong-random, but this applies mainly to some old HP-UX stuff
if my memory serves me well, so I'd like to think that we should be safe
to just switch the default and not complicate initdb.
--
Michael


signature.asc
Description: PGP signature


Re: pg_upgrade failed with ERROR: null relpartbound for relation 18159 error.

2018-10-07 Thread Michael Paquier
On Sat, Oct 06, 2018 at 10:15:32PM -0300, Alvaro Herrera wrote:
> Pushed to 11 and master.  I'm not convinced that it's a good idea to
> mess with 10 at this point -- the partitioning code is rather completely
> different already ...

While I definitely agree with you about not messing with this portion of
the partitioning code in REL_10_STABLE, I think we have a good set of
arguments for getting the regression test into this branch.  This
particular case was broken while developing v11, so it would be nice to
have insurance that any future bug fix applied to REL_10_STABLE does not
break it again.
--
Michael


signature.asc
Description: PGP signature


Defaulting to password_encryption = scram-sha-256

2018-10-07 Thread Andres Freund
Hi,

Now that we probably have shaken the worst issues out of scram,
shouldn't we change the default password_encryption to something that
doesn't scare people?  The only reason I could think of for not wanting
to do that is that we don't necessarily guarantee that we have a strong
random generator, but if that's the issue, we should change initdb to
default it to something safe if the platform provides something. Which
is just about any sane one, no?

Greetings,

Andres Freund



Re: now() vs transaction_timestamp()

2018-10-07 Thread Tom Lane
Amit Kapila  writes:
> On Sun, Oct 7, 2018 at 10:36 AM Tom Lane  wrote:
>> That state is restored at least two transactions too late.

> That is right, but I think we can perform shm_toc_lookup for
> PARALLEL_KEY_TRANSACTION_STATE at some earlier point and then use the
> variables from it to restore the respective state at different
> points in time.

That hardly seems cleaner.

I think this is actually the right way, because
PARALLEL_KEY_TRANSACTION_STATE is (at least by the name) state associated
with the main transaction the worker is going to run.  But given what
I did to xact.c just now, xactStartTimestamp and stmtStartTimestamp are
*not* transaction-local state so far as the worker is concerned.  They
will persist throughout the lifetime of that process, just as the database
ID, user ID, etc, will.  So you might as well argue that none of
FixedParallelState should exist because it should all be in
PARALLEL_KEY_TRANSACTION_STATE, and that doesn't seem like much of
an improvement.

regards, tom lane