Re: Postgres with pthread

2017-12-06 Thread Simon Riggs
> But it is a theory. The main idea of this prototype was to prove or disprove
> this expectation at practice.

> But please notice that it is very raw prototype. A lot of stuff is not
> working yet.

> And supporting all of the existing Postgres functionality requires
> much more effort (and even more effort is needed for optimizing Postgres
> for this architecture).
>
> I just want to receive some feedback and know if community is interested in
> any further work in this direction.

Looks good. You are right, it is a theory. If your prototype does
actually show what we think it does then it is a good and interesting
result.

I think we need careful analysis to show where these exact gains come
from. The actual benefit is likely not evenly distributed across the
list of possible benefits. Did they arise because you produced a
stripped down version of Postgres? Or did they arise from using
threads?

It would not be the first time a result shown in a prototype did not show
real gains on a completed project.

I might also read your results to show that connection concentrators
would be a better area of work, since 100 connections perform better
than 1000 in both cases, so why bother optimising for 1000 connections
at all? In which case we should read the benefit at the 100
connections line, where it shows the lower 28% gain, closer to the
gain your colleague reported.

So I think we don't yet have enough to make a decision.

-- 
Simon Riggs    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Runtime Partition Pruning

2017-12-06 Thread Beena Emerson
On Wed, Dec 6, 2017 at 1:21 PM, David Rowley wrote:
> On 2 December 2017 at 08:04, Robert Haas  wrote:
>> On Fri, Dec 1, 2017 at 6:20 AM, Beena Emerson wrote:
>>> David Q1:
>>> postgres=#  explain analyse execute ab_q1 (3,3); --const
>>>QUERY PLAN
>>> -
>>>  Append  (cost=0.00..43.90 rows=1 width=8) (actual time=0.006..0.006
>>> rows=0 loops=1)
>>>->  Seq Scan on ab_a3_b3  (cost=0.00..43.90 rows=1 width=8) (actual
>>> time=0.005..0.005 rows=0 loops=1)
>>>  Filter: ((a = 3) AND (b = 3))
>>>  Planning time: 0.588 ms
>>>  Execution time: 0.043 ms
>>> (5 rows)
>>
>> I think the EXPLAIN ANALYZE input should show something attached to
>> the Append node so that we can tell that partition pruning is in use.
>> I'm not sure if that is as simple as "Run-Time Partition Pruning: Yes"
>> or if we can give a few more useful details.
>
> It already does. Any subnode with "(never executed)" was pruned
> at runtime. Do we really need anything else to tell us that?

I have added the partition quals that are used for pruning.

PFA the updated patch. I have changed the names of variables to make
it more appropriate, along with adding more code comments and doing
some refactoring and other code cleanups.

A few cases:

1. Only runtime pruning - David's case1
explain analyse execute ab_q1 (2,3);
   QUERY PLAN
-
 Append  (cost=0.00..395.10 rows=9 width=8) (actual time=0.101..0.101
rows=0 loops=1)
   Runtime Partition Pruning: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
 Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
 Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a1_b3  (cost=0.00..43.90 rows=1 width=8) (never executed)
 Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
 Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
 Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a2_b3  (cost=0.00..43.90 rows=1 width=8) (actual
time=0.007..0.007 rows=0 loops=1)
 Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b1  (cost=0.00..43.90 rows=1 width=8) (never executed)
 Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b2  (cost=0.00..43.90 rows=1 width=8) (never executed)
 Filter: ((a = $1) AND (b = $2))
   ->  Seq Scan on ab_a3_b3  (cost=0.00..43.90 rows=1 width=8) (never executed)
 Filter: ((a = $1) AND (b = $2))
 Planning time: 0.780 ms
 Execution time: 0.220 ms
(22 rows)

2. Runtime pruning after optimizer pruning - David's case 2.
((a >= 4) AND (a <= 5)) is used during optimizer pruning and only (a =
$1) is used for runtime pruning.
=#  explain (analyse, costs off, summary off) execute ab_q1 (4);
QUERY PLAN
---
 Append (actual time=0.062..0.062 rows=0 loops=1)
   Runtime Partition Pruning: (a = $1)
   ->  Seq Scan on ab_a4 (actual time=0.005..0.005 rows=0 loops=1)
 Filter: ((a >= 4) AND (a <= 5) AND (a = $1))
   ->  Seq Scan on ab_a5 (never executed)
 Filter: ((a >= 4) AND (a <= 5) AND (a = $1))
(6 rows)

3. Nestloop Join
tbl1.col1 only has values from 1 to 10.

=# \d+ tprt
                                   Table "public.tprt"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 col1   | integer |           |          |         | plain   |              |
 col2   | integer |           |          |         | plain   |              |
Partition key: RANGE (col1)
Partitions: tprt_1 FOR VALUES FROM (1) TO (5001),
tprt_2 FOR VALUES FROM (5001) TO (10001),
tprt_3 FOR VALUES FROM (10001) TO (20001)


=# explain (analyse, costs off, summary off) SELECT * FROM tbl1 JOIN
tprt ON tbl1.col1 > tprt.col1;
 QUERY PLAN

 Nested Loop (actual time=0.053..0.192 rows=45 loops=1)
   ->  Seq Scan on tbl1 (actual time=0.007..0.009 rows=10 loops=1)
   ->  Append (actual time=0.003..0.004 rows=4 loops=10)
 Runtime Partition Pruning Join Filter: (tbl1.col1 > col1)
 ->  Index Scan using tprt1_idx on tprt_1 (actual
time=0.002..0.004 rows=5 loops=9)
   Index Cond: (tbl1.col1 > col1)
 ->  Index Scan using 

Re: Speeding up pg_upgrade

2017-12-06 Thread Bruce Momjian
On Tue, Dec  5, 2017 at 09:30:53AM -0500, Stephen Frost wrote:
> > > The other concern is if there's changes made to the catalogs by non-DDL
> > > activity that needs to be addressed too (logical replication?); nothing
> > > definite springs to mind off-hand for me, but perhaps others will think
> > > of things.
> > 
> > Yes, it could extend to many parts of the system, which is why I am
> > asking if it is worth it.
> 
> My initial reaction is that it's worth it, but then I also wonder about
> other issues (having to get an ANALYZE done on the new cluster before
> opening it up, for example..) and it makes me wonder if perhaps it'll
> end up being too much risk for too little gain.

Yes, dump/reload of analyze statistics seems like a better use of time. 
I have avoided it since it locks us into supporting the text
representation of data types, but at this point it might be worth it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Add PGDLLIMPORT lines to some variables

2017-12-06 Thread Craig Ringer
On 5 December 2017 at 22:49, Robert Haas  wrote:


>
> Committed with these additions.  Please check that I haven't messed
> anything up.
>
>
Looks good to me.

For the record the commit is

commit c572599c65bfe0387563233faabecd2845073538
Author: Robert Haas 
Date:   Tue Dec 5 09:23:57 2017 -0500

Mark assorted variables PGDLLIMPORT.

This makes life easier for extension authors who wish to support
Windows.

Brian Cloutier, slightly amended by me.

Discussion:
http://postgr.es/m/cajcy68fscdnhmzfps4kyo00cadkvxvea-28h-otenk-pa2o...@mail.gmail.com

plus back branches.

I was going to pipe up here to add ReplicationSlotCtl to the list.
Otherwise the only way to access slot information is via the SPI and
pg_stat_replication_slots, which isn't super fun. And it's not like
ReplicationSlotCtl is any more internal than MyReplicationSlot.

I missed the boat on your commit, but ... please?

Patches attached. MyReplicationSlot was only made PGDLLIMPORT in 9.6, so
there's one for 9.4/9.5 and one for 9.6, 10, and master. Personally I don't
care about 9.4/9.5 in the slightest for this, but that's where c572599c is
backpatched to.

--
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From de25c6aff0e07b6acf761a3b939baf2be0fa6389 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 7 Dec 2017 13:40:13 +0800
Subject: [PATCH v1] Make MyReplicationSlot and ReplicationSlotCtl PGDLLIMPORT

---
 src/include/replication/slot.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index c129a4a..0fee63c 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -136,8 +136,8 @@ typedef struct ReplicationSlotCtlData
 /*
  * Pointers to shared memory
  */
-extern ReplicationSlotCtlData *ReplicationSlotCtl;
-extern ReplicationSlot *MyReplicationSlot;
+extern PGDLLIMPORT ReplicationSlotCtlData *ReplicationSlotCtl;
+extern PGDLLIMPORT ReplicationSlot *MyReplicationSlot;
 
 /* GUCs */
 extern PGDLLIMPORT int max_replication_slots;
-- 
2.9.5

From 9e7331ecb3889cbdccc3f8a3a62ac71f215bb3f6 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 7 Dec 2017 13:32:11 +0800
Subject: [PATCH v1] Mark ReplicationSlotCtl as PGDLLIMPORT

---
 src/include/replication/slot.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h
index 0c44233..c3b7cc8 100644
--- a/src/include/replication/slot.h
+++ b/src/include/replication/slot.h
@@ -156,7 +156,7 @@ typedef struct ReplicationSlotCtlData
 /*
  * Pointers to shared memory
  */
-extern ReplicationSlotCtlData *ReplicationSlotCtl;
+extern PGDLLIMPORT ReplicationSlotCtlData *ReplicationSlotCtl;
 extern PGDLLIMPORT ReplicationSlot *MyReplicationSlot;
 
 /* GUCs */
-- 
2.9.5



Re: [HACKERS] INSERT ON CONFLICT and partitioned tables

2017-12-06 Thread Amit Langote
On 2017/12/02 2:57, Robert Haas wrote:
> On Fri, Dec 1, 2017 at 2:44 AM, Amit Langote wrote:
>> I forgot to consider the fact that mtstate could be NULL in
>> ExecSetupPartitionTupleRouting(), so it would result in dereferencing a
>> NULL pointer when called from CopyFrom(), which is fixed in the attached
>> updated patch.
> 
> a ? b : false can more simply be spelled a && b.
> 
> Committed after changing it like that, fixing the broken documentation
> build, and making minor edits to the comments and documentation.

Thanks for committing.

Regards,
Amit




Re: Postgres with pthread

2017-12-06 Thread Craig Ringer
On 7 December 2017 at 11:44, Tsunakawa, Takayuki <tsunakawa.ta...@jp.fujitsu.com> wrote:

> From: Craig Ringer [mailto:cr...@2ndquadrant.com]
> >   I'd personally expect that an immediate conversion would result
> > in very
> >   little speedup, a bunch of code deleted, a bunch of complexity
> >   added. And it'd still be massively worthwhile, to keep medium to
> > long
> >   term complexity and feature viability in control.
>
> +1
> I hope for things like:
>


> * More performance statistics like system-wide LWLock waits, without the
> concern about fixed shared memory size
>
> * Dynamic memory sizing, such as shared_buffers, work_mem,
> maintenance_work_mem
>

I'm not sure how threaded operations would help us much there. If we could
split shared_buffers into extents we could do this with something like dsm
already. Without the ability to split it into extents, we can't do it with
locally malloc'd memory in a threaded system either.

Re performance diagnostics though, you can already get a lot of useful data
from PostgreSQL's SDT tracepoints, which are usable with perf and DTrace
amongst other tools. Dynamic userspace 'perf' probes can tell you a lot too.

I'm confident you could collect some seriously useful data with perf
tracepoints and 'perf script' these days. (BTW, I extended the
https://wiki.postgresql.org/wiki/Profiling_with_perf article a bit
yesterday with some tips on this).

Of course better built-in diagnostics would be nice. But I really don't see
how it'd have much to do with threaded vs forked model of execution; we can
allocate chunks of memory with dsm now, after all.


> * Running multi-threaded components in postgres extension (is it really
> safe to run JVM for PL/Java in a single-threaded postgres?)
>

PL/Java is a giant mess for so many more reasons than that. The JVM is a
heavyweight-startup, lightweight-thread-model system. It doesn't play at
all well with postgres's lightweight process fork()-based CoW model. You
can't fork() the JVM because fork() doesn't play nice with threads, at all.
So you have to start it in each backend individually, which is just awful.

One of the nice things if Pg got a threaded model would be that you could
embed a JVM, Mono/.NET runtime, etc and have your sessions work together in
ways you cannot currently sensibly do. Folks using MS SQL, Oracle, etc are
pretty used to being able to do this, and while it should be done with
caution it can offer huge benefits for some complex workloads.

Right now if PostgreSQL users want to do anything involving IPC, shared
data, etc, they pretty much have to write quite complex C extensions to do it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Postgres with pthread

2017-12-06 Thread Craig Ringer
On 7 December 2017 at 05:58, Thomas Munro wrote:

>
> Using a ton of thread local variables may be a useful stepping stone,
> but if we want to be able to separate threads/processes from sessions
> eventually then I guess we'll want to model sessions as first class
> objects and pass them around explicitly or using a single TLS variable
> current_session.
>
>
Yep.

This is the real reason I'm excited by the idea of a threading conversion.

PostgreSQL's architecture conflates "connection", "session" and "executor"
into one somewhat muddled mess. I'd love to be able to untangle that to the
point where we can pool executors amongst active queries, while retaining
idle sessions' state properly even while they're in a transaction.

Yeah, that's a long way off, but it'd be a whole lot more practical if we
didn't have to serialize and deserialize the entire session state to do it.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Postgres with pthread

2017-12-06 Thread Craig Ringer
On 7 December 2017 at 01:17, Andres Freund  wrote:


>
> > I think you've done us a very substantial service by pursuing
> > this far enough to get some quantifiable performance results.
> > But now that we have some results in hand, I think we're best
> > off sticking with the architecture we've got.
>
> I don't agree.
>
> I'd personally expect that an immediate conversion would result in very
> little speedup, a bunch of code deleted, a bunch of complexity
> added. And it'd still be massively worthwhile, to keep medium to long
> term complexity and feature viability in control.
>

Personally I think it's a pity we didn't land up here before the
foundations for parallel query went in - DSM, shm_mq, DSA, etc. I know the
EDB folks at least looked into it though, and presumably there were good
reasons to go in this direction. Maybe that was just "community will never
accept threaded conversion" at the time, though.

Now we have quite a lot of homebrew infrastructure to consider if we do a
conversion.

That said, it might in some ways make it easier. shm_mq, for example, would
likely convert to a threaded backend with minimal changes to callers, and
probably only limited changes to shm_mq itself. So maybe these
abstractions will prove to have been a win in some ways. Except DSA, and
even then it could serve as a transitional API...

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-06 Thread Michael Paquier
On Thu, Dec 7, 2017 at 1:21 AM, Alvaro Herrera  wrote:
> Put together, I propose the attached delta for 0001.

I have been looking at Andres' 0001 and your tweaks here for some time
since yesterday...

I have also performed sanity checks using all the scripts that have
accumulated on my archives for this stuff. This looks solid to me. I
have not seen failures with broken hot chains, REINDEX, etc.

> This way, an xmax that has exactly the OldestXmin value will return
> RECENTLY_DEAD rather DEAD, which seems reasonable to me (since
> OldestXmin value itself is supposed to be still possibly visible to
> somebody).  Also, this way it is consistent with the other comparison to
> OldestXmin at the bottom of the function.  There is no reason for the
> "else" or the extra braces.

+1. It would be nice to add a comment in the patched portion
mentioning that the new code had better match what is at the bottom of
the function.

+   else if (!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
+   {
+   /*
+* Not in Progress, Not Committed, so either Aborted or crashed.
+* Mark the Xmax as invalid.
+*/
+   SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
}

-   /*
-* Not in Progress, Not Committed, so either Aborted or crashed.
-* Remove the Xmax.
-*/
-   SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, InvalidTransactionId);
return HEAPTUPLE_LIVE;

I would find it cleaner if the last "else if" were put into its own
separate if condition, and if it were noted that for a multixact still
running this refers to an updating transaction that aborted, so hint bits
are not set.

> Your commit message does a poor job of acknowledging prior work on
> diagnosing the problem starting from Dan's initial test case and patch.

(Nit: I have extracted an isolation test from Dan's test case,
which Andres has reduced to a subset of permutations. Peter G. also
complained about what is visibly the same bug we are discussing here,
but without a test case.)
-- 
Michael



Re: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i

2017-12-06 Thread Robert Haas
On Wed, Dec 6, 2017 at 4:31 PM, Tom Lane  wrote:
> Well, I think it's a minute per query not per whole test script.  But in
> any case, if it's taking a longer time than any other isolation test on
> the CLOBBER_CACHE_ALWAYS critters, then it's also taking a proportionately
> longer time than any other test on every other platform, and is therefore
> costing every developer precious time today and indefinitely far into the
> future.  I continue to say that this test ain't worth it.

Sure.  But, you also continue to not respond to my arguments about why
it IS worth it.  I don't want to spend a lot of time fighting about
this, but it looks to me like your preferences here are purely
arbitrary.  Yesterday, you added - without discussion - a test that I
had "obviously" left out by accident.  Today, you want a test removed
that I added on purpose but which you assert has insufficient value.
So, sometimes you think it's worth adding tests that make the test
suite longer, and other times you think it isn't.  That's fair enough
-- everyone comes down in different places on this at different times
-- but the only actual reason you've offered is that the script
contains a command that runs for over a minute on very slow machines
that have been artificially slowed down 100x.  That's a silly reason:
it means that on real machines we're talking less than a second of
runtime even without modifying the test case, and if we do modify the
test, it can probably be made much less.

Please give me a little time to see if I can speed up this test enough
to fix this problem.  If that doesn't work out, then we can rip this
out.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: compress method for spgist - 2

2017-12-06 Thread Nikita Glukhov



On 06.12.2017 21:49, Alexander Korotkov wrote:
On Wed, Dec 6, 2017 at 6:08 PM, Nikita Glukhov wrote:


On 05.12.2017 23:59, Alexander Korotkov wrote:


On Tue, Dec 5, 2017 at 1:14 PM, Darafei Praliaskouski wrote:

The following review has been posted through the commitfest
application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            tested, passed

I've read the updated patch and see my concerns addressed.

I'm looking forward to SP-GiST compress method support, as it
will allow usage of SP-GiST index infrastructure for PostGIS.

The new status of this patch is: Ready for Committer


I went through this patch.  I found documentation changes to be
not sufficient.  And I've made some improvements.

In particular, I didn't understand why is reconstructedValue
claimed to be of spgConfigOut.leafType while it should be of
spgConfigIn.attType both from general logic and code.  I've fixed
that.  Nikita, correct me if I'm wrong.


I think we are reconstructing a leaf datum, so documentation was
correct but the code in spgWalk() and freeScanStackEntry() wrongly
used attType instead of attLeafType. Fixed patch is attached.


The reconstructed datum is used for index-only scan.  Thus, it should be
the original indexed datum of attType, unless we have a decompress method
and pass the reconstructed datum through it.
But if we have a compress method and do not have a decompress method then
an index-only scan seems to be impossible.



Also, I wonder should we check for existence of compress method
when attType and leafType are not the same in spgvalidate()
function?  We don't do this for now.

I've added compress method existence check to spgvalidate().

It would be nice to avoid double calling of the config method.  A possible
option could be to memorize the difference between attribute type and leaf
type in the high bit of functionset, which is guaranteed to be free.
I decided to simply set the compress method's bit in functionset when this
method is not required.


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 139c8ed..b4a8be4 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,20 +240,22 @@
 
  
   There are five user-defined methods that an index operator class for
-  SP-GiST must provide.  All five follow the convention
-  of accepting two internal arguments, the first of which is a
-  pointer to a C struct containing input values for the support method,
-  while the second argument is a pointer to a C struct where output values
-  must be placed.  Four of the methods just return void, since
-  all their results appear in the output struct; but
+  SP-GiST must provide, and one is optional.  All five
+  mandatory methods follow the convention of accepting two internal
+  arguments, the first of which is a pointer to a C struct containing input
+  values for the support method, while the second argument is a pointer to a
+  C struct where output values must be placed.  Four of the mandatory methods just
+  return void, since all their results appear in the output struct; but
   leaf_consistent additionally returns a boolean result.
   The methods must not modify any fields of their input structs.  In all
   cases, the output struct is initialized to zeroes before calling the
-  user-defined method.
+  user-defined method.  Optional sixth method compress
+  accepts datum to be indexed as the only argument and returns value suitable
+  for physical storage in leaf tuple.
  
 
  
-  The five user-defined methods are:
+  The five mandatory user-defined methods are:
  
 
  
@@ -283,6 +285,7 @@ typedef struct spgConfigOut
 {
 Oid prefixType; /* Data type of inner-tuple prefixes */
 Oid labelType;  /* Data type of inner-tuple node labels */
+Oid leafType;   /* Data type of leaf-tuple values */
 bool        canReturnData;  /* Opclass can reconstruct original data */
 bool        longValuesOK;   /* Opclass can cope with values > 1 page */
 } spgConfigOut;
@@ -305,6 +308,22 @@ typedef struct spgConfigOut
   class is capable of segmenting long values by repeated suffixing
   (see ).
  
+
+ 
+  leafType is typically the same as
+  attType.  For the reasons of backward
+  compatibility, method config can
+  leave leafType uninitialized; that would
+  give the same effect as setting leafType equal
+  to attType.  When attType
+  and leafType are different, then optional
+  method compress must be provided.
+  Method compress is responsible
+  for transformation of datums to 

Logical replication without a Primary Key

2017-12-06 Thread Joshua D. Drake

-Hackers,

In the docs it says:

"
If the table does not have any suitable key, then it can be set to 
replica identity "full", which means the entire row becomes the key.
"

How does that work? Is it using one of the hidden columns on a row?


Thanks,


JD


--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.org
* Unless otherwise stated, opinions are my own.   *




Re: es_query_dsa is broken

2017-12-06 Thread Thomas Munro
On Wed, Dec 6, 2017 at 10:16 PM, Amit Kapila  wrote:
> On Wed, Dec 6, 2017 at 1:14 AM, Robert Haas  wrote:
>> On Tue, Dec 5, 2017 at 7:24 AM, Amit Kapila  wrote:
>>> + EState *estate = gatherstate->ps.state;
>>> +
>>> + /* Install our DSA area while executing the plan. */
>>> + estate->es_query_dsa = gatherstate->pei->area;
>>>   outerTupleSlot = ExecProcNode(outerPlan);
>>> + estate->es_query_dsa = NULL;
>>>
>>> Won't the above coding pattern create a problem, if ExecProcNode
>>> throws an error and outer block catches it and continues execution
>>> (consider the case of execution inside PL blocks)?
>>
>> I don't see what the problem is.  The query that got aborted by the
>> error wouldn't be sharing an EState with one that didn't.
>
> That's right.  Ignore my comment, I got confused.   Other than that, I
> don't see any problem with the code as such apart from that it looks
> slightly hacky.  I think Thomas or someone needs to develop a patch on
> the lines you have mentioned or what Thomas was trying to describe in
> his email and see how it comes out.

Yeah, it is a bit hacky, but I can't see another way to fix it
without changing released APIs and it's only for one release and will
certainly be unhackified in v11.  For v11 I think we need to decide
between:

1.  Removing es_query_dsa and injecting the right context into the
executor tree as discussed.

2.  Another idea mentioned by Robert in an off-list chat:  We could
consolidate all DSM segments in a multi-gather plan into one.  See the
nearby thread where someone had over a hundred Gather nodes and had to
crank up max_connections to reserve enough DSM slots.  Of course,
optimising for that case doesn't make too much sense (I suspect
multi-gather plans will become less likely with Parallel Append and
Parallel Hash in the picture anyway), but it would reduce a bunch of
duplicated work we do when it happens as well as scarce slot
consumption.  If we did that, then all of a sudden es_query_dsa would
make sense again (ie it'd be whole-query scoped), and we could revert
that hacky change.

Or we could do both things anyway.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: [HACKERS] Transaction control in procedures

2017-12-06 Thread Merlin Moncure
On Wed, Dec 6, 2017 at 8:41 AM, Peter Eisentraut wrote:
> On 12/5/17 13:33, Robert Haas wrote:
>> On Tue, Dec 5, 2017 at 1:25 PM, Peter Eisentraut
>>  wrote:
>>> I think ROLLBACK in a cursor loop might not make sense, because the
>>> cursor query itself could have side effects, so a rollback would have to
>>> roll back the entire loop.  That might need more refined analysis before
>>> it could be allowed.
>>
>> COMMIT really has the same problem; if the cursor query has side
>> effects, you can't commit those side effects piecemeal as the loop
>> executed and have things behave sanely.
>
> The first COMMIT inside the loop would commit the cursor query.  This
> isn't all that different from what you'd get now if you coded this
> manually using holdable cursors or just plain client code.  Clearly, you
> can create a mess if the loop body interacts with the loop expression,
> but that's already the case.
>
> But if you coded something like this yourself now and ran a ROLLBACK
> inside the loop, the holdable cursor would disappear (unless previously
> committed), so you couldn't proceed with the loop.
>
> The SQL standard for persistent stored modules explicitly prohibits
> COMMIT and ROLLBACK in cursor loop bodies.  But I think people will
> eventually want it.

They may want it, but silently promoting all cursors to held ones is
not the way to give it to them, unless we narrow it down to the
'for-loop derived cursor' only.  Even then we should consider syntax
decoration:

FOR x IN SELECT  WITH HOLD
LOOP

END LOOP;

merlin



Re: Postgres with pthread

2017-12-06 Thread Thomas Munro
On Thu, Dec 7, 2017 at 6:08 AM, Andres Freund  wrote:
> On 2017-12-06 19:40:00 +0300, Konstantin Knizhnik wrote:
>> As far as I remember, several years ago when implementation of intra-query
>> parallelism was just started there was discussion whether to use threads or
>> leave traditional Postgres process architecture. The decision was made to
>> leave processes. So now we have bgworkers, shared message queue, DSM, ...
>> The main argument for such decision was that switching to threads will
>> require rewriting of most of Postgres code.
>
>> It seems to be quite a reasonable argument and until now I agreed with it.
>>
>> But recently I wanted to check it myself.
>
> I think that's something pretty important to play with. There've been
> several discussions lately, both on and off list / in person, that we're
> taking on more-and-more technical debt just because we're using
> processes. Besides the above, we've grown:
> - a shared memory allocator
> - a shared memory hashtable
> - weird looking thread aware pointers
> - significant added complexity in various projects due to addresses not
>   being mapped to the same address etc.

Yes, those are all workarounds for an ancient temporary design choice.
To quote from a 1989 paper[1] "Currently, POSTGRES runs as one process
for each active user. This was done as an expedient to get a system
operational as quickly as possible. We plan on converting POSTGRES to
use lightweight processes [...]".  +1 for sticking to the plan.

While personally contributing to the technical debt items listed
above, I always imagined that all that machinery could become
compile-time options controlled with --with-threads and
dsa_get_address() would melt away leaving only raw pointers, and
dsa_area would forward to the MemoryContext + ResourceOwner APIs, or
something like that.  It's unfortunate that we lose type safety along
the way though.  (If only there were some way we could write
dsa_pointer<T>.  In fact it was also a goal of the original
project to adopt C++, based on a comment in 4.2's nodes.h: "Eventually
this code should be transmogrified into C++ classes, and this is more
or less compatible with those things.")

If there were a good way to reserve (but not map) a large address
range before forking, there could also be an intermediate build mode
that keeps the multi-process model but where DSA behaves as above,
which might an interesting way to decouple the
DSA-go-faster-and-reduce-tech-debt project from the threading project.
We could manage the reserved address space ourselves and map DSM
segments with MAP_FIXED, so dsa_get_address() address decoding could
be compiled away.  One way would be to mmap a huge range backed with
/dev/zero, and then map-with-MAP_FIXED segments over the top of it and
then remap /dev/zero back into place when finished, but that sucks
because it gives you that whole mapping in your core files and relies
on overcommit which we don't like, hence my interest in a way to
reserve but not map.
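
To make that concrete, here is a minimal standalone sketch of that
/dev/zero + MAP_FIXED dance (sizes and names are made up, and it keeps all
the core-file and overcommit downsides just mentioned):

#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>

/*
 * Sketch of the /dev/zero + MAP_FIXED approach described above.
 * Illustration only.
 */
int
main(void)
{
	size_t	range = (size_t) 1 << 30;	/* "reserve" 1 GB of address space */
	size_t	seg = (size_t) 1 << 20;		/* a 1 MB "segment" */
	int		zfd = open("/dev/zero", O_RDWR);
	char   *base;

	if (zfd < 0)
		return 1;

	/* Back the whole range with private /dev/zero pages. */
	base = mmap(NULL, range, PROT_READ | PROT_WRITE, MAP_PRIVATE, zfd, 0);
	if (base == MAP_FAILED)
		return 1;

	/* Overlay a shared "segment" at a known offset inside the range. */
	if (mmap(base + seg, seg, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_FIXED, zfd, 0) == MAP_FAILED)
		return 1;
	base[seg] = 42;				/* no address translation needed */

	/* "Unmap" it again by restoring the private /dev/zero backing. */
	mmap(base + seg, seg, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_FIXED, zfd, 0);

	printf("reserved %zu bytes at %p\n", range, (void *) base);
	return 0;
}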

>> The first problem with porting Postgres to pthreads is static variables
>> widely used in Postgres code.
>> Most modern compilers support thread-local variables; for example GCC
>> provides the __thread keyword.
>> Such variables are placed in a separate segment which is addressed through
>> a segment register (on Intel).
>> So access time to such variables is the same as to normal static variables.
>
> I experimented similarly. Although I'm not 100% sure that if were to go
> for it, we wouldn't instead want to abstract our session concept
> further, or well, at all.

Using a ton of thread local variables may be a useful stepping stone,
but if we want to be able to separate threads/processes from sessions
eventually then I guess we'll want to model sessions as first class
objects and pass them around explicitly or using a single TLS variable
current_session.
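
As a sketch of what I mean (every name and member below is invented,
purely to show the shape, not an actual proposal):

/*
 * Sketch only: a first-class session object plus a single thread-local
 * pointer to it.
 */
typedef struct Session
{
	void	   *memory_context_state;	/* per-session memory management */
	void	   *guc_state;				/* per-session GUC values */
	void	   *catalog_caches;			/* relcache/syscache, plans, ... */
} Session;

/* The single TLS variable: which session this thread is currently serving. */
static __thread Session *current_session = NULL;

/*
 * Anything that is a file-level static today would instead be reached
 * through current_session, which is what lets a session be detached from
 * one thread/process and picked up by another later.
 */
static inline void *
current_guc_state(void)
{
	return current_session->guc_state;
}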

> I think the biggest problem with doing this for real is that it's a huge
> project, and that it'll take a long time.
>
> Thanks for working on this!

+1

[1] http://db.cs.berkeley.edu/papers/ERL-M90-34.pdf

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Accessing base table relational names via RelOptInfo

2017-12-06 Thread David Rowley
On 7 December 2017 at 09:26, Walter Cai  wrote:
> I want to be able to programmatically access the relation names from
> inside the calc_joinrel_size_estimate method (in costsize.c) using the
> RelOptInfo *outer_rel, RelOptInfo *inner_rel arguments. I'm pretty sure I
> need to use the Relids relids variables in the RelOptInfo struct but I'm not
> sure where to go from there.

If you loop through the RelOptInfo->relids with bms_next_member() and
lookup the RangeTblEntry from root->simple_rte_array[] you can get the
"relid", which is the Oid of the relation. You can then call
get_rel_name() on that Oid. You'll also need to ensure the relid
actually belongs to a relation, for this check that the
RangeTblEntry->rtekind is RTE_RELATION. You might also want to think
about schema names since two relations can share the same name in
different schemas.
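
Roughly like this, as an untested sketch meant to be dropped into
costsize.c or similar:

#include "postgres.h"

#include "nodes/bitmapset.h"
#include "nodes/relation.h"
#include "utils/lsyscache.h"

/*
 * Untested sketch: log the schema-qualified names of the base relations
 * behind a join rel, e.g. from inside calc_joinrel_size_estimate().
 */
static void
report_joinrel_base_rels(PlannerInfo *root, RelOptInfo *rel)
{
	int			x = -1;

	while ((x = bms_next_member(rel->relids, x)) >= 0)
	{
		RangeTblEntry *rte = root->simple_rte_array[x];

		/* Only plain relations have a relation name to look up. */
		if (rte != NULL && rte->rtekind == RTE_RELATION)
			elog(DEBUG1, "join input: %s.%s",
				 get_namespace_name(get_rel_namespace(rte->relid)),
				 get_rel_name(rte->relid));
	}
}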

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: When VACUUM or ANALYZE skips a concurrently dropped table, log i

2017-12-06 Thread Robert Haas
On Wed, Dec 6, 2017 at 12:57 PM, Tom Lane  wrote:
> What appears to be happening is that a database-wide ANALYZE takes more
> than a minute under CLOBBER_CACHE_ALWAYS, causing isolationtester.c's
> hardwired one-minute timeout to trigger.
>
> While you could imagine doing something to get around that, I do not
> believe that this test is worth memorializing in perpetuity to begin
> with.  I'd recommend just taking it out again.

Mumble.  I don't really mind that, but I'll bet $0.05 that this will
get broken at some point and we won't notice right away without the
isolation test.

Is it really our policy that no isolation test can take more than a
minute on the slowest buildfarm critter?  If somebody decides to start
running CLOBBER_CACHE_ALWAYS on an even-slower critter, will we just
nuke isolation tests from orbit until the tests pass there?  I have
difficulty seeing that as a sound approach.

Another thought is that it might not be necessary to have a
database-wide ANALYZE to trigger this.  I managed to reproduce it
locally by doing VACUUM a, b while alternately locking a and b, so
that I let the name lookups complete, but then blocked trying to
vacuum a, and then at that point dropped b, then released the VACUUM.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Accessing base table relational names via RelOptInfo

2017-12-06 Thread Walter Cai
Hi,

I hope this is the appropriate list to send this to.

*Context:*

I (grad student) am trying to insert my own cardinality estimates into the
optimizer

*What I need to do to get there:*

I want to be able to programmatically access the relation names from
inside the calc_joinrel_size_estimate method (in costsize.c) using the
RelOptInfo
*outer_rel, RelOptInfo *inner_rel arguments. I'm pretty sure I need to use
the Relids relids variables in the RelOptInfo struct but I'm not sure where
to go from there.

Any help would be much appreciated

Best, Walter


Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-06 Thread Alvaro Herrera
Andres Freund wrote:

> I've played around quite some with the attached patch. So far, after
> applying the second patch, neither VACUUM nor VACUUM FULL / CLUSTER make
> the situation worse for already existing corruption. HOT pruning can
> change the exact appearance of existing corruption a bit, but I don't
> think it can make the corruption meaningfully worse.  It's a bit
> annoying and scary to add so many checks to backbranches but it kinda
> seems required.  The error message texts aren't perfect, but these are
> "should never be hit" type elog()s so I'm not too worried about that.

Looking at 0002: I agree with the stuff being done here.  I think a
couple of these checks could be moved one block outwards in terms of
scope; I don't see any reason why the check should not apply in that
case.  I didn't catch any place missing additional checks.

Despite these being "shouldn't happen" conditions, I think we should
turn these up all the way to ereports with an errcode and all, and also
report the XIDs being complained about.  No translation required,
though.  Other than those changes and minor copy editing a commit
(attached), 0002 looks good to me.

I started thinking it'd be good to report block number whenever anything
happened while scanning the relation.  The best way to go about this
seems to be to add an errcontext callback to lazy_scan_heap, so I attach
a WIP untested patch to add that.  (I'm not proposing this for
back-patch for now, mostly because I don't have the time/energy to push
for it right now.)
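
For the record, the shape I have in mind is roughly this (untested sketch;
the struct and function names are placeholders, not the attached WIP patch):

/* Placeholder names, for illustration of the errcontext idea only. */
typedef struct LVScanErrInfo
{
	Relation	rel;
	BlockNumber blkno;
} LVScanErrInfo;

static void
lazy_scan_error_callback(void *arg)
{
	LVScanErrInfo *info = (LVScanErrInfo *) arg;

	errcontext("while scanning block %u of relation \"%s\"",
			   info->blkno, RelationGetRelationName(info->rel));
}

/* ... and in lazy_scan_heap(), around the existing per-block loop: */
	LVScanErrInfo errinfo;
	ErrorContextCallback errcallback;

	errinfo.rel = onerel;
	errcallback.callback = lazy_scan_error_callback;
	errcallback.arg = (void *) &errinfo;
	errcallback.previous = error_context_stack;
	error_context_stack = &errcallback;

	for (blkno = 0; blkno < nblocks; blkno++)
	{
		errinfo.blkno = blkno;	/* keep the reported block current */
		/* ... existing body of the loop ... */
	}

	error_context_stack = errcallback.previous;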

It appears that you got all the places that seem to reasonably need
additional checks.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 9ac665638c86460f0b767628203f8bf28df35e49 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 6 Dec 2017 16:44:25 -0300
Subject: [PATCH 1/2] tweaks for 0002

---
 src/backend/access/heap/heapam.c  | 67 +++
 src/backend/commands/vacuumlazy.c |  6 ++--
 2 files changed, 50 insertions(+), 23 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index eb93718baa..5c284a4c32 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -6387,17 +6387,23 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
else if (MultiXactIdPrecedes(multi, cutoff_multi))
{
if (MultiXactIdPrecedes(multi, relminmxid))
-   elog(ERROR, "encountered multixact from before 
horizon");
+   ereport(ERROR,
+   (errcode(ERRCODE_DATA_CORRUPTED),
+errmsg_internal("found multixact %u 
from before relminmxid %u",
+multi, 
relminmxid)));
 
/*
-* This old multi cannot possibly have members still running.  
If it
-* was a locker only, it can be removed without any further
-* consideration; but if it contained an update, we might need 
to
-* preserve it.
+* This old multi cannot possibly have members still running, 
but
+* verify just in case.  If it was a locker only, it can be 
removed
+* without any further consideration; but if it contained an 
update, we
+* might need to preserve it.
 */
if (MultiXactIdIsRunning(multi,
 
HEAP_XMAX_IS_LOCKED_ONLY(t_infomask)))
-   elog(ERROR, "multixact from before cutoff found to be 
still running");
+   ereport(ERROR,
+   (errcode(ERRCODE_DATA_CORRUPTED),
+errmsg_internal("multixact %u from 
before cutoff %u found to be still running",
+multi, 
cutoff_multi)));
 
if (HEAP_XMAX_IS_LOCKED_ONLY(t_infomask))
{
@@ -6419,9 +6425,14 @@ FreezeMultiXactId(MultiXactId multi, uint16 t_infomask,
if (TransactionIdPrecedes(xid, cutoff_xid))
{
if (TransactionIdPrecedes(xid, relfrozenxid))
-   elog(ERROR, "encountered xid from 
before horizon");
+   ereport(ERROR,
+   
(errcode(ERRCODE_DATA_CORRUPTED),
+errmsg_internal("found 
xid %u from before relfrozenxid %u",
+   
 xid, relfrozenxid)));
if 

Re: compress method for spgist - 2

2017-12-06 Thread Alexander Korotkov
On Wed, Dec 6, 2017 at 6:08 PM, Nikita Glukhov wrote:

> On 05.12.2017 23:59, Alexander Korotkov wrote:
>
> On Tue, Dec 5, 2017 at 1:14 PM, Darafei Praliaskouski wrote:
>
>> The following review has been posted through the commitfest application:
>> make installcheck-world:  not tested
>> Implements feature:   not tested
>> Spec compliant:   not tested
>> Documentation:tested, passed
>>
>> I've read the updated patch and see my concerns addressed.
>>
>> I'm looking forward to SP-GiST compress method support, as it will allow
>> usage of SP-GiST index infrastructure for PostGIS.
>>
>> The new status of this patch is: Ready for Committer
>>
>
> I went through this patch.  I found documentation changes to be not
> sufficient.  And I've made some improvements.
>
> In particular, I didn't understand why is reconstructedValue claimed to be
> of spgConfigOut.leafType while it should be of spgConfigIn.attType both
> from general logic and code.  I've fixed that.  Nikita, correct me if I'm
> wrong.
>
>
> I think we are reconstructing a leaf datum, so documentation was correct
> but the code in spgWalk() and freeScanStackEntry() wrongly used attType
> instead of attLeafType. Fixed patch is attached.
>

The reconstructed datum is used for index-only scan.  Thus, it should be
the original indexed datum of attType, unless we have a decompress method
and pass the reconstructed datum through it.

> Also, I wonder should we check for existence of compress method when
> attType and leafType are not the same in spgvalidate() function?  We don't
> do this for now.
>
> I've added compress method existence check to spgvalidate().
>

It would be nice to avoid double calling of the config method.  A possible
option could be to memorize the difference between attribute type and leaf
type in the high bit of functionset, which is guaranteed to be free.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: pow support for pgbench

2017-12-06 Thread Chapman Flack
Raúl Marín Rodríguez  wrote:

>  I don't want to go too deep into it, but you get stuff like this:
>
> Select pow(2.0, -3)::text = pow(2, -3)::text;
> ?column?
> --
> f

Indeed, to me, that has turned out to be the most intriguing part of
the whole thread.  Needs to be in some SQL subtleties exam somewhere:

select pow(2.0,-3), pow(2,-3);
        pow         |  pow
--------------------+-------
 0.1250000000000000 | 0.125


Looks like the first call resolves to the numeric version, while
the second (with integer arguments) resolves to the double one:

select pow(2.0,-3) is of (numeric), pow(2,-3) is of (double precision);
 ?column? | ?column?
----------+----------
 t        | t


Still, a numeric 0.125 doesn't always have those trailing zeros:

select pow(2.0,-3), pow(2,-3)::numeric;
        pow         |  pow
--------------------+-------
 0.1250000000000000 | 0.125


What's going on in the representation?

select numeric_send(pow(2.0,-3)), numeric_send(pow(2,-3)::numeric);
      numeric_send      |      numeric_send
------------------------+------------------------
 \x0001ffff0000001004e2 | \x0001ffff0000000304e2


I assume the 10 vs. 03 hex in the tails of those things represent
either 'precision' or 'scale' of 16 vs. 3?  I don't get much help
from IS OF (numeric(p,s)), which seems to ignore any p,s and just
be true for any numeric. But here, this matches:

select numeric_send(0.125::numeric(16,16));
      numeric_send
------------------------
 \x0001ffff0000001004e2



How does numeric_power choose the precision and scale of its result?
Is that something the standard dictates?

Given that 0.125 is exact for this answer, at first I wanted to
ask if numeric_power could be made to produce the result with
precision 3, but then I realized that's backwards. A result with
precision 3 would be like saying, eh, it's somewhere between
0.1245 and 0.1255. If a result is known to be exact, it would be
better to go the other way and return it as numeric(huge).

That then led me to wonder if the cast float8_numeric is really
doing the right thing. Is it turning 0.125 (an exact representation
as float8) into numeric(3,3), again hedging as if it might be anything
from 0.1245 to 0.1255? Would it be better for float8_numeric to
produce a numeric with the precision/scale reflecting the actual
limits of float8?

Ok, now I've been driven to UTSL. It looks as if the intent of
the snprintf(..., "%.*g", DBL_DIG, val) in float8_numeric could
have been to accomplish that. It doesn't, though, as (at least
on my platform), %g drops trailing zeros, though there
is a documented 'alternate form' flag # that prevents that.
It works in bash:

bash-4.2$ printf '%.*g\n' 15 0.125
0.125
bash-4.2$ printf '%#.*g\n' 15 0.125
0.125000000000000
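
And the same check as a throwaway C program, for the record (DBL_DIG is 15
on common platforms):

#include <float.h>
#include <stdio.h>

/* Compare %g and the alternate-form %#g discussed above. */
int
main(void)
{
	char		buf[64];

	snprintf(buf, sizeof(buf), "%.*g", DBL_DIG, 0.125);
	printf("%%.*g  -> %s\n", buf);		/* prints 0.125 */

	snprintf(buf, sizeof(buf), "%#.*g", DBL_DIG, 0.125);
	printf("%%#.*g -> %s\n", buf);		/* prints 0.125000000000000 */

	return 0;
}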

Does the standard prescribe how cast(float8 as numeric) ought to
select the precision/scale?

Sorry to drift OT, as this is more about the SQL functions than
pgbench, but it was too puzzling to ignore. :)

-Chap



Re: Usage of epoch in txid_current

2017-12-06 Thread Andres Freund
Hi,

On 2017-12-06 17:39:09 +0530, Amit Kapila wrote:
> We are using ShmemVariableCache->nextXid at many places so always
> converting/truncating it to 32-bit number before using seems slightly
> awkward, so we can think of using a separate nextBigXid 64bit number
> as well.

-many

It's not actually that many places. And a lot of them should be
updated anyway, to be epoch aware. Let's not add this kind of crumminess
to avoid a few truncating casts here and there.
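
To be concrete, the truncating casts in question are nothing more than this
(the FullXid name and helpers are made up here, just for illustration):

#include <stdint.h>

typedef uint32_t TransactionId;		/* mirrors postgres_ext.h */

/*
 * Sketch only: shows what the casts look like if nextXid were a 64-bit
 * counter carrying the epoch in its high half.
 */
typedef uint64_t FullXid;

static inline TransactionId
FullXidGetXid(FullXid fxid)
{
	return (TransactionId) fxid;		/* low 32 bits */
}

static inline uint32_t
FullXidGetEpoch(FullXid fxid)
{
	return (uint32_t) (fxid >> 32);		/* wraparound count */
}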


> Either way, it is not clear to me how we will keep it
> updated after recovery.  Right now, the mechanism is quite simple, at
> the beginning of a recovery we take the value of nextXid from
> checkpoint record and then if any xlog record indicates xid that
> follows nextXid, we advance it.  Here, the point to note is that we
> take the xid from the WAL record (which means that it assumes xids are
> non-contiguous or some xids are consumed without being logged) and
> increment it.  Unless we plan to change something in that logic (like
> storing 64-bit xids in WAL records), it is not clear to me how to make
> it work.  OTOH, recovering value of epoch which increments only at
> wraparound seems fairly straightforward as described in my previous
> email.

I think it should be fairly simple if we simply add the 64bit xid to the
existing clog extension WAL records.

Greetings,

Andres Freund



Re: Postgres with pthread

2017-12-06 Thread Andres Freund
Hi,

On 2017-12-06 12:28:29 -0500, Robert Haas wrote:
> > Possibly we even want to continue having various
> > processes around besides that, the most interesting cases involving
> > threads are around intra-query parallelism, and pooling, and for both a
> > hybrid model could be beneficial.
> 
> I think if we only use threads for intra-query parallelism we're
> leaving a lot of money on the table.  For example, if all
> shmem-connected backends are using the same process, then we can make
> max_locks_per_transaction PGC_SIGHUP.  That would be sweet, and there
> are probably plenty of similar things.  Moreover, if threads are this
> thing that we only use now and then for parallel query, then our
> support for them will probably have bugs.  If we use them all the
> time, we'll actually find the bugs and fix them.  I hope.

I think it'd make a lot of sense to go there gradually. I agree that we
probably want to move to more and more use of threads, but we also want
our users not to kill us ;). Initially we'd surely continue to use
partitioned dynahash for locks, which'd make resizing infeasible
anyway. Similar for shared buffers (which I find a hell of a lot more
interesting to change at runtime than max_locks_per_transaction), etc...

- Andres



Re: Postgres with pthread

2017-12-06 Thread Andreas Karlsson

On 12/06/2017 06:08 PM, Andres Freund wrote:

I think the biggest problem with doing this for real is that it's a huge
project, and that it'll take a long time.


An additional issue is that this could break a lot of extensions and in 
a way that it is not apparent at compile time. This means we may need to 
break all extensions to force extension authors to check if they are 
thread safe.


I do not like making life hard for our extension community, but if the 
gains are big enough it might be worth it.



Thanks for working on this!


Seconded.

Andreas



Re: Postgres with pthread

2017-12-06 Thread Adam Brusselback
> "barely a 50% speedup" - Hah. I don't believe the numbers, but that'd be
> huge.
They are numbers derived from a benchmark that any sane person would
be using a connection pool for in a production environment, but
impressive if true nonetheless.



Re: [HACKERS] Walsender timeouts and large transactions

2017-12-06 Thread Petr Jelinek
On 05/12/17 21:07, Robert Haas wrote:
> On Mon, Dec 4, 2017 at 10:59 PM, Craig Ringer  wrote:
>> To me it looks like it's time to get this committed, marking as such.
> 
> This version has noticeably more code rearrangement than before, and
> I'm not sure that is actually buying us anything.  Why not keep the
> changes minimal?
> 

Yeah, we moved things around in the end; the main reason is that it now
works closer to how it was originally designed to work, meaning that the
slow path is not so slow when !pq_is_send_pending(), which seems to have
been the reasoning for the original coding.

It's not completely necessary to do it for fixing the bug, but why make
things slower than they need to be.

> Also, TBH, this doesn't seem to have been carefully reviewed for style:
> 
> -if (!pq_is_send_pending())
> -return;
> +/* Try taking fast path unless we get too close to walsender timeout. */
> +if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
> +  wal_sender_timeout / 2))
> +{
> +if (!pq_is_send_pending())
> +return;
> +}
> 
> Generally we write if (a && b) { ... } not if (a) { if (b) .. }
> 

It's rather ugly with && because one of the conditions is two lines, but
okay, here you go. I am keeping the brackets even if we normally don't for
one-liners because it's completely unreadable without them IMHO.

> -}
> +};
> 

Oops.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services
From 61a23cad58a0016876e3d6c829f6353ee491b4c7 Mon Sep 17 00:00:00 2001
From: Petr Jelinek 
Date: Tue, 12 Sep 2017 17:31:28 +0900
Subject: [PATCH] Fix walsender timeouts when decoding large transaction

The logical slots have fast code path for sending data in order to not
impose too high per message overhead. The fast path skips checks for
interrupts and timeouts. However, the existing coding failed to consider
the fact that a transaction with a large number of changes may take a very
long time to be processed and sent to the client. This causes walsender to
ignore interrupts for a potentially long time and, more importantly, it
will cause walsender to be killed due to timeout at the end of such a
transaction.

This commit changes the fast path to also check for interrupts and only
allows calling the fast path when the last keepalive check happened less
than half of walsender timeout ago, otherwise the slower code path will
be taken.
---
 src/backend/replication/walsender.c | 66 +
 1 file changed, 37 insertions(+), 29 deletions(-)

diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index fa1db748b5..e015870a4e 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -1151,6 +1151,8 @@ static void
 WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 bool last_write)
 {
+	TimestampTz	now;
+
 	/* output previously gathered data in a CopyData packet */
 	pq_putmessage_noblock('d', ctx->out->data, ctx->out->len);
 
@@ -1160,23 +1162,54 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 	 * several releases by streaming physical replication.
 	 */
 	resetStringInfo(&tmpbuf);
-	pq_sendint64(&tmpbuf, GetCurrentTimestamp());
+	now = GetCurrentTimestamp();
+	pq_sendint64(&tmpbuf, now);
 	memcpy(&ctx->out->data[1 + sizeof(int64) + sizeof(int64)],
 		   tmpbuf.data, sizeof(int64));
 
-	/* fast path */
+	CHECK_FOR_INTERRUPTS();
+
 	/* Try to flush pending output to the client */
 	if (pq_flush_if_writable() != 0)
 		WalSndShutdown();
 
-	if (!pq_is_send_pending())
+	/* Try taking fast path unless we get too close to walsender timeout. */
+	if (now < TimestampTzPlusMilliseconds(last_reply_timestamp,
+		  wal_sender_timeout / 2) &&
+		!pq_is_send_pending())
+	{
 		return;
+	}
 
+	/* If we have pending write here, go to slow path */
 	for (;;)
 	{
 		int			wakeEvents;
 		long		sleeptime;
-		TimestampTz now;
+
+		/* Check for input from the client */
+		ProcessRepliesIfAny();
+
+		now = GetCurrentTimestamp();
+
+		/* die if timeout was reached */
+		WalSndCheckTimeOut(now);
+
+		/* Send keepalive if the time has come */
+		WalSndKeepaliveIfNecessary(now);
+
+		if (!pq_is_send_pending())
+			break;
+
+		sleeptime = WalSndComputeSleeptime(now);
+
+		wakeEvents = WL_LATCH_SET | WL_POSTMASTER_DEATH |
+			WL_SOCKET_WRITEABLE | WL_SOCKET_READABLE | WL_TIMEOUT;
+
+		/* Sleep until something happens or we time out */
+		WaitLatchOrSocket(MyLatch, wakeEvents,
+		  MyProcPort->sock, sleeptime,
+		  WAIT_EVENT_WAL_SENDER_WRITE_DATA);
 
 		/*
 		 * Emergency bailout if postmaster has died.  This is to avoid the
@@ -1198,34 +1231,9 @@ WalSndWriteData(LogicalDecodingContext *ctx, XLogRecPtr lsn, TransactionId xid,
 			SyncRepInitConfig();
 		}
 
-		/* Check for input from the client */
-		

Re: Postgres with pthread

2017-12-06 Thread Andres Freund
Hi,

On 2017-12-06 11:53:21 -0500, Tom Lane wrote:
> Konstantin Knizhnik  writes:
> However, if I guess at which numbers are supposed to be what,
> it looks like even the best case is barely a 50% speedup.

"barely a 50% speedup" - Hah. I don't believe the numbers, but that'd be
huge.


> That would be worth pursuing if it were reasonably low-hanging
> fruit, but converting PG to threads seems very far from being that.

I don't think immediate performance gains are the interesting part about
using threads. It's rather that their absence adds a lot of complexity to
existing / submitted code, and makes some very commonly requested
features a lot harder to implement:

- we've a lot of duplicated infrastructure around dynamic shared
  memory. dsm.c dsa.c, dshash.c etc. A lot of these, especially dsa.c,
  are going to become a lot more complicated over time, just look at how
  complicated good multi threaded allocators are.

- we're adding a lot of slowness to parallelism, just because we have
  different memory layouts in different processes. Instead of just
  passing pointers through queues, we put entire tuples in there. We
  deal with dsm aware pointers.

- a lot of features have been a lot harder (parallelism!), and a lot of
  frequently requested ones are so hard due to processes that they never
  got off the ground (in-core pooling, process reuse, parallel worker reuse)

- due to the statically sized shared memory a lot of our configuration
  is pretty fundamentally PGC_POSTMASTER, even though that presents a lot
  of administrative problems.

...


> I think you've done us a very substantial service by pursuing
> this far enough to get some quantifiable performance results.
> But now that we have some results in hand, I think we're best
> off sticking with the architecture we've got.

I don't agree.

I'd personally expect that an immediate conversion would result in very
little speedup, a bunch of code deleted, a bunch of complexity
added. And it'd still be massively worthwhile, to keep medium to long
term complexity and feature viability in control.

Greetings,

Andres Freund



Re: Postgres with pthread

2017-12-06 Thread Robert Haas
On Wed, Dec 6, 2017 at 11:53 AM, Tom Lane  wrote:
> barely a 50% speedup.

I think that's an awfully strange choice of adverb.  This is, by its
author's own admission, a rough cut at this, probably with very little
of the optimization that could ultimately be done, and it's already
buying 50% on some test cases?  That sounds phenomenally good to me.
A 50% speedup is huge, and chances are that it can be made quite a bit
better with more work, or that it already is quite a bit better with
the right test case.

TBH, based on previous discussion, I expected this to initially be
*slower* but still worthwhile in the long run because of optimizations
that it would let us do eventually with parallel query and other
things.  If it's this much faster out of the gate, that's really
exciting.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Postgres with pthread

2017-12-06 Thread Andres Freund
Hi!

On 2017-12-06 19:40:00 +0300, Konstantin Knizhnik wrote:
> As far as I remember, several years ago when implementation of intra-query
> parallelism was just started there was discussion whether to use threads or
> leave traditional Postgres process architecture. The decision was made to
> leave processes. So now we have bgworkers, shared message queue, DSM, ...
> The main argument for such decision was that switching to threads will
> require rewriting of most of Postgres code.

> It seems to be quite a reasonable argument, and until now I agreed with it.
> 
> But recently I wanted to check it myself.

I think that's something pretty important to play with. There've been
several discussions lately, both on and off list / in person, that we're
taking on more-and-more technical debt just because we're using
processes. Besides the above, we've grown:
- a shared memory allocator
- a shared memory hashtable
- weird looking thread aware pointers
- significant added complexity in various projects due to addresses not
  being mapped to the same address etc.


> The first problem with porting Postgres to pthreads is the static variables
> widely used in the Postgres code.
> Most modern compilers support thread-local variables; for example, GCC
> provides the __thread keyword.
> Such variables are placed in a separate segment which is addressed through
> a segment register (on Intel).
> So access time to such variables is the same as to normal static variables.

I experimented similarly. Although I'm not 100% sure that if we were to go
for it, we wouldn't instead want to abstract our session concept
further, or well, at all.


> Certainly, maybe not all compilers have built-in support for TLS, and maybe
> it is not implemented as efficiently on all hardware platforms as it is on
> Intel.
> So certainly such an approach decreases the portability of Postgres. But IMHO
> it is not so critical.

I'd agree there, but I don't think the project necessarily does.


> What I have done:
> 1. Add session_local (defined as __thread) to the definition of most static
> and global variables.
> I left some variables that point to shared memory as static. Also I had to
> change the initialization of some static variables,
> because the address of a TLS variable cannot be used in static initializers.
> 2. Change implementation of GUCs to make them thread specific.
> 3. Replace fork() with pthread_create
> 4. Rewrite file descriptor cache to be global (shared by all threads).

That one I'm very unconvinced of, that's going to add a ton of new
contention.


> What are the advantages of using threads instead of processes?
> 
> 1. No need to use shared memory. So there is no static limit on the amount of
> memory which can be used by Postgres. No need for distributed shared memory
> and other stuff designed to share memory between backends and
> bgworkers.

This imo is the biggest part. We can stop duplicating OS and our own
implementations in a shmem aware way.


> 2. Threads significantly simplify implementation of parallel algorithms:
> interaction and transferring data between threads can be done easily and
> more efficiently.

That's imo the same as 1.


> 3. It is possible to use more efficient/lightweight synchronization
> primitives. Postgres now mostly relies on its own low-level sync primitives,
> whose user-level implementation
> uses spinlocks and atomics and then falls back to OS semaphores/poll. I am
> not sure how much gain we can get by replacing these primitives with ones
> optimized for threads.
> My colleague from the Firebird community told me that just replacing processes
> with threads can yield a 20% increase in performance, but it is just the first
> step, and replacing the sync primitives
> can give a much greater advantage. But maybe for Postgres with its low-level
> primitives this is not true.

I don't believe that that's actually the case to any significant degree.


> 6. Faster backend startup. Certainly, starting a backend for each user's request
> is a bad thing in any case. Some kind of connection pooling should be used in
> any case to provide acceptable performance. But in any case, the start of a new
> backend process in Postgres causes a lot of page faults, which have a
> dramatic impact on performance. And there is no such problem with threads.

I don't buy this in itself. The connection establishment overhead isn't
largely the fork, it's all the work afterwards. I do think it makes
connection pooling etc easier.


> I just want to receive some feedback and know if community is interested in
> any further work in this direction.

I personally am. I think it's beyond high time that we move to take
advantage of threads.

That said, I don't think just replacing processes with threads is the right
thing. I'm pretty sure we'd still want to have postmaster as a separate
process, for robustness. Possibly we even want to continue having various
processes around besides that; the most interesting cases involving
threads are around intra-query parallelism, and pooling, and for both a
hybrid 

Re: [HACKERS] parallel.c oblivion of worker-startup failures

2017-12-06 Thread Robert Haas
On Wed, Dec 6, 2017 at 9:54 AM, Tom Lane  wrote:
> Maybe track "worker is known to have launched" in the leader's state
> someplace?

That's probably one piece of it, yes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: views

2017-12-06 Thread Tom Lane
Andrzej Barszcz  writes:
> I would be happy if checkRuleResultList in rewriteDefine.c lost its strict
> conditions for the RETURNING clause. Are there any reasons to have such a
> restriction for views?

Uh, so that the results of a query that invokes the rule are well-defined?

If you think that for your application, it's okay for the RETURNING rule
to not bother providing useful data for some columns, you could just have
it return nulls for those.  But I don't think it's appropriate for the
system itself to make a value judgement like that.  If the columns don't
match up, it could very well be user error, and we should report that.

regards, tom lane



Re: Postgres with pthread

2017-12-06 Thread Adam Brusselback
Here it is formatted a little better.



So a little over 50% performance improvement for a couple of the test cases.



On Wed, Dec 6, 2017 at 11:53 AM, Tom Lane  wrote:

> Konstantin Knizhnik  writes:
> > Below are some results (1000xTPS) of select-only (-S) pgbench with scale
> > 100 at my desktop with quad-core i7-4770 3.40GHz and 16Gb of RAM:
>
> > ConnectionsVanilla/default   Vanilla/prepared
> > pthreads/defaultpthreads/prepared
> > 10100 191
> > 106 207
> > 100  67 131
> > 105 168
> > 100041 65
> > 55   102
>
> This table is so mangled that I'm not very sure what it's saying.
> Maybe you should have made it an attachment?
>
> However, if I guess at which numbers are supposed to be what,
> it looks like even the best case is barely a 50% speedup.
> That would be worth pursuing if it were reasonably low-hanging
> fruit, but converting PG to threads seems very far from being that.
>
> I think you've done us a very substantial service by pursuing
> this far enough to get some quantifiable performance results.
> But now that we have some results in hand, I think we're best
> off sticking with the architecture we've got.
>
> regards, tom lane
>
>


views

2017-12-06 Thread Andrzej Barszcz
Dear hackers,

I would be happy if checkRuleResultList in rewriteDefine.c lost its strict
conditions for the RETURNING clause. Are there any reasons to have such a
restriction for views?
What I mean in short:

 create view ... as select MAIN_TABLE ... joins ...

1. Updating the view means updating MAIN_TABLE
2. I want to return at least the primary key after an insert

Neither the rule system nor an INSTEAD OF trigger gives me this possibility,
but I know that it's possible after a modification of checkRuleResultList.


Postgres with pthread

2017-12-06 Thread Konstantin Knizhnik

Hi hackers,

As far as I remember, several years ago when implementation of 
intra-query parallelism was just started there was discussion whether to 
use threads or leave traditional Postgres process architecture. The 
decision was made to leave processes. So now we have bgworkers, shared 
message queue, DSM, ...
The main argument for such decision was that switching to threads will 
require rewriting of most of Postgres code.

It seems to be quite a reasonable argument, and until now I agreed with it.

But recently I wanted to check it myself.
The first problem with porting Postgres to pthreads is the static variables 
widely used in the Postgres code.
Most modern compilers support thread-local variables; for example, GCC 
provides the __thread keyword.
Such variables are placed in a separate segment which is addressed through 
a segment register (on Intel).

So access time to such variables is the same as to normal static variables.
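
To illustrate the kind of change involved, here is a minimal sketch (only the
session_local macro name comes from the description below; the variable and
struct names are made up for the example):

    /* Sketch only: converting a backend-private static variable to TLS.
     * session_local is just a macro over the compiler's TLS keyword. */
    #define session_local __thread

    /* before: static int MyCounter = 0; */
    static session_local int MyCounter = 0;

    /* The address of a TLS variable is not a compile-time constant, so a
     * static initializer that takes such an address has to move to run time. */
    typedef struct
    {
        int        *counter;
    } SessionState;

    /* before: static SessionState state = { &MyCounter };  -- now rejected */
    static session_local SessionState state;    /* zero-initialized */

    static void
    InitSessionState(void)
    {
        state.counter = &MyCounter;     /* done once per thread instead */
    }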

Certainly, maybe not all compilers have built-in support for TLS, and maybe 
it is not implemented as efficiently on all hardware platforms as it is on 
Intel.
So certainly such an approach decreases the portability of Postgres. But IMHO 
it is not so critical.


What I have done:
1. Add session_local (defined as __thread) to the definition of most 
static and global variables.
I left some variables that point to shared memory as static. Also I had 
to change the initialization of some static variables, because the address 
of a TLS variable cannot be used in static initializers.
2. Change implementation of GUCs to make them thread specific.
3. Replace fork() with pthread_create
4. Rewrite file descriptor cache to be global (shared by all threads).

I have not changed all Postgres synchronization primitives and shared 
memory.

It took me about one week of work.

What is not done yet:
1. Handling of signals (I expect that Win32 code can be somehow reused 
here).

2. Deallocation of memory and closing files on backend (thread) termination.
3. Interaction of postmaster and backends with PostgreSQL auxiliary 
processes (threads), such as autovacuum, bgwriter, checkpointer, stat 
collector,...


What are the advantages of using threads instead of processes?

1. No need to use shared memory. So there is no static limit on the amount 
of memory which can be used by Postgres. No need for distributed shared 
memory and other stuff designed to share memory between backends and 
bgworkers.
2. Threads significantly simplify implementation of parallel algorithms: 
interaction and transferring data between threads can be done easily and 
more efficiently.
3. It is possible to use more efficient/lightweight synchronization 
primitives. Postgres now mostly relies on its own low-level sync primitives, 
whose user-level implementation uses spinlocks and atomics and then falls 
back to OS semaphores/poll. I am not sure how much gain we can get by 
replacing these primitives with ones optimized for threads.
My colleague from the Firebird community told me that just replacing 
processes with threads can yield a 20% increase in performance, but it is 
just the first step, and replacing the sync primitives can give a much 
greater advantage. But maybe for Postgres with its low-level primitives 
this is not true.
4. Threads are more lightweight entities than processes. A context switch 
between threads takes less time than between processes. And they consume 
less memory. It is usually possible to spawn more threads than processes.
5. More efficient access to virtual memory. Since all threads share the 
same memory space, the TLB is used much more efficiently in this case.
6. Faster backend startup. Certainly, starting a backend for each user's 
request is a bad thing in any case. Some kind of connection pooling should 
be used in any case to provide acceptable performance. But in any case, 
the start of a new backend process in Postgres causes a lot of page faults, 
which have a dramatic impact on performance. And there is no such 
problem with threads.


Certainly, processes also have some advantages compared with threads:
1. Better isolation and error protection
2. Easier error handling
3. Easier control of used resources

But it is a theory. The main idea of this prototype was to prove or 
disprove this expectation at practice.
I didn't expect large differences in performance because the synchronization 
primitives are not changed and I performed my experiments on Linux, where 
threads and processes are implemented in a similar way.


Below are some results (1000xTPS) of select-only (-S) pgbench with scale 
100 at my desktop with quad-core i7-4770 3.40GHz and 16Gb of RAM:


Connections    Vanilla/default    Vanilla/prepared    pthreads/default    pthreads/prepared
10             100                191                 106                 207
100            67                 131                 105                 168
1000           41                 65                  55                  102


As you can see, for small number of connection results 

Re: ALTER TABLE ADD COLUMN fast default

2017-12-06 Thread Andrew Dunstan


On 12/06/2017 11:05 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 12/06/2017 10:16 AM, Tom Lane wrote:
>>> I'm quite disturbed both by the sheer invasiveness of this patch
>>> (eg, changing the API of heap_attisnull()) and by the amount of logic
>>> it adds to fundamental tuple-deconstruction code paths.  I think
>>> we'll need to ask some hard questions about whether the feature gained
>>> is worth the distributed performance loss that will ensue.
>> I'm sure we can reduce the footprint some.
>> w.r.t. heap_attisnull, only a handful of call sites actually need the
>> new functionality, 8 or possibly 10 (I need to check more on those in
>> relcache.c). So instead of changing the function we could provide a new
>> one to be used at those sites. I'll work on that and submit a new patch
>> fairly shortly.
> Meh, I'm not at all sure that's an improvement.  A version of
> heap_attisnull that ignores the possibility of a "missing" attribute value
> will give the *wrong answer* if the other version should have been used
> instead.  Furthermore it'd be an attractive nuisance because people would
> fail to notice if an existing call were now unsafe.  If we're to do this
> I think we have no choice but to impose that compatibility break on
> third-party code.


Fair point.

>
> In general, you need to be thinking about this in terms of making sure
> that it's impossible to accidentally not update code that needs to be
> updated.  Another example is that I noticed that some places the patch
> has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
> the former will fail to copy the missing-attribute support data.
> I think that's an astonishingly bad idea.  There is basically no situation
> in which CreateTupleDescCopy defined in that way will ever be safe to use.
> The missing-attribute info is now a fundamental part of a tupdesc and it
> has to be copied always, just as much as e.g. atttypid.
>
> I would opine that it's a mistake to design TupleDesc as though the
> missing-attribute data had anything to do with default values.  It may
> have originated from the same place but it's a distinct thing.  Better
> to store it in a separate sub-structure.


OK, will work on all that. Thanks again.

cheers

andrew



-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-06 Thread Alvaro Herrera
I think you've done a stellar job of identifying what the actual problem
was.  I like the new (simpler) coding of that portion of
HeapTupleSatisfiesVacuum.

freeze-the-dead is not listed in isolation_schedule; an easy fix.
I confirm that the test crashes with an assertion failure without the
code fix, and that it doesn't with it.

I think the comparison to OldestXmin should be reversed:

if (!TransactionIdPrecedes(xmax, OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;

return HEAPTUPLE_DEAD;

This way, an xmax that has exactly the OldestXmin value will return
RECENTLY_DEAD rather than DEAD, which seems reasonable to me (since
OldestXmin value itself is supposed to be still possibly visible to
somebody).  Also, this way it is consistent with the other comparison to
OldestXmin at the bottom of the function.  There is no reason for the
"else" or the extra braces.

Put together, I propose the attached delta for 0001.

Your commit message does a poor job of acknowledging prior work on
diagnosing the problem starting from Dan's initial test case and patch.

I haven't looked at your 0002 yet.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/utils/time/tqual.c b/src/backend/utils/time/tqual.c
index 8be2980116..aab03835d1 100644
--- a/src/backend/utils/time/tqual.c
+++ b/src/backend/utils/time/tqual.c
@@ -1324,25 +1324,23 @@ HeapTupleSatisfiesVacuum(HeapTuple htup, TransactionId 
OldestXmin,
else if (TransactionIdDidCommit(xmax))
{
/*
-* The multixact might still be running due to lockers. 
If the
-* updater is below the horizon we have to return DEAD 
regardless
-* - otherwise we could end up with a tuple where the 
updater has
-* to be removed due to the horizon, but is not pruned 
away. It's
-* not a problem to prune that tuple because all the 
lockers will
-* also be present in the newer tuple version.
+* The multixact might still be running due to lockers. 
 If the
+* updater is below the xid horizon, we have to return 
DEAD
+* regardless -- otherwise we could end up with a tuple 
where the
+* updater has to be removed due to the horizon, but is 
not pruned
+* away.  It's not a problem to prune that tuple, 
because any
+* remaining lockers will also be present in newer 
tuple versions.
 */
-   if (TransactionIdPrecedes(xmax, OldestXmin))
-   {
-   return HEAPTUPLE_DEAD;
-   }
-   else
+   if (!TransactionIdPrecedes(xmax, OldestXmin))
return HEAPTUPLE_RECENTLY_DEAD;
+
+   return HEAPTUPLE_DEAD;
}
else if 
(!MultiXactIdIsRunning(HeapTupleHeaderGetRawXmax(tuple), false))
{
/*
 * Not in Progress, Not Committed, so either Aborted or 
crashed.
-* Remove the Xmax.
+* Mark the Xmax as invalid.
 */
SetHintBits(tuple, buffer, HEAP_XMAX_INVALID, 
InvalidTransactionId);
}
diff --git a/src/test/isolation/isolation_schedule 
b/src/test/isolation/isolation_schedule
index e41b9164cd..eb566ebb6c 100644
--- a/src/test/isolation/isolation_schedule
+++ b/src/test/isolation/isolation_schedule
@@ -44,6 +44,7 @@ test: update-locked-tuple
 test: propagate-lock-delete
 test: tuplelock-conflict
 test: tuplelock-update
+test: freeze-the-dead
 test: nowait
 test: nowait-2
 test: nowait-3


Re: ALTER TABLE ADD COLUMN fast default

2017-12-06 Thread Tom Lane
Andrew Dunstan  writes:
> On 12/06/2017 10:16 AM, Tom Lane wrote:
>> I'm quite disturbed both by the sheer invasiveness of this patch
>> (eg, changing the API of heap_attisnull()) and by the amount of logic
>> it adds to fundamental tuple-deconstruction code paths.  I think
>> we'll need to ask some hard questions about whether the feature gained
>> is worth the distributed performance loss that will ensue.

> I'm sure we can reduce the footprint some.

> w.r.t. heap_attisnull, only a handful of call sites actually need the
> new functionality, 8 or possibly 10 (I need to check more on those in
> relcache.c). So instead of changing the function we could provide a new
> one to be used at those sites. I'll work on that and submit a new patch
> fairly shortly.

Meh, I'm not at all sure that's an improvement.  A version of
heap_attisnull that ignores the possibility of a "missing" attribute value
will give the *wrong answer* if the other version should have been used
instead.  Furthermore it'd be an attractive nuisance because people would
fail to notice if an existing call were now unsafe.  If we're to do this
I think we have no choice but to impose that compatibility break on
third-party code.

In general, you need to be thinking about this in terms of making sure
that it's impossible to accidentally not update code that needs to be
updated.  Another example is that I noticed that some places the patch
has s/CreateTupleDescCopy/CreateTupleDescCopyConstr/, evidently because
the former will fail to copy the missing-attribute support data.
I think that's an astonishingly bad idea.  There is basically no situation
in which CreateTupleDescCopy defined in that way will ever be safe to use.
The missing-attribute info is now a fundamental part of a tupdesc and it
has to be copied always, just as much as e.g. atttypid.

I would opine that it's a mistake to design TupleDesc as though the
missing-attribute data had anything to do with default values.  It may
have originated from the same place but it's a distinct thing.  Better
to store it in a separate sub-structure.

regards, tom lane



Re: ALTER TABLE ADD COLUMN fast default

2017-12-06 Thread Andrew Dunstan


On 12/06/2017 10:16 AM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> Attached is a patch for $subject. It's based on Serge Reilau's patch of
>> about a year ago, taking into account comments made at the time, with
>> bitrot removed and other enhancements such as documentation.
>> Essentially it stores a value in pg_attribute that is used when the
>> stored tuple is missing the attribute. This works unless the default
>> expression is volatile, in which case a table rewrite is forced as
>> happens now.
> I'm quite disturbed both by the sheer invasiveness of this patch
> (eg, changing the API of heap_attisnull()) and by the amount of logic
> it adds to fundamental tuple-deconstruction code paths.  I think
> we'll need to ask some hard questions about whether the feature gained
> is worth the distributed performance loss that will ensue.
>
>   


Thanks for prompt comments.

I'm sure we can reduce the footprint some.

w.r.t. heap_attisnull, only a handful of call sites actually need the
new functionality, 8 or possibly 10 (I need to check more on those in
relcache.c). So instead of changing the function we could provide a new
one to be used at those sites. I'll work on that and submit a new patch
fairly shortly.

cheers

andrew


-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: ALTER TABLE ADD COLUMN fast default

2017-12-06 Thread Tom Lane
Andrew Dunstan  writes:
> Attached is a patch for $subject. It's based on Serge Reilau's patch of
> about a year ago, taking into account comments made at the time, with
> bitrot removed and other enhancements such as documentation.

> Essentially it stores a value in pg_attribute that is used when the
> stored tuple is missing the attribute. This works unless the default
> expression is volatile, in which case a table rewrite is forced as
> happens now.

I'm quite disturbed both by the sheer invasiveness of this patch
(eg, changing the API of heap_attisnull()) and by the amount of logic
it adds to fundamental tuple-deconstruction code paths.  I think
we'll need to ask some hard questions about whether the feature gained
is worth the distributed performance loss that will ensue.

regards, tom lane



Re: compress method for spgist - 2

2017-12-06 Thread Nikita Glukhov

On 05.12.2017 23:59, Alexander Korotkov wrote:

On Tue, Dec 5, 2017 at 1:14 PM, Darafei Praliaskouski  wrote:


The following review has been posted through the commitfest
application:
make installcheck-world:  not tested
Implements feature:       not tested
Spec compliant:           not tested
Documentation:            tested, passed

I've read the updated patch and see my concerns addressed.

I'm looking forward to SP-GiST compress method support, as it will
allow usage of SP-GiST index infrastructure for PostGIS.

The new status of this patch is: Ready for Committer


I went through this patch.  I found the documentation changes to be 
insufficient, and I've made some improvements.


In particular, I didn't understand why reconstructedValue is claimed 
to be of spgConfigOut.leafType while it should be of 
spgConfigIn.attType, both from general logic and from the code.  I've fixed 
that.  Nikita, correct me if I'm wrong.


I think we are reconstructing a leaf datum, so the documentation was correct, 
but the code in spgWalk() and freeScanStackEntry() wrongly used attType 
instead of attLeafType. A fixed patch is attached.
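
To make the attType/leafType distinction concrete, a compress support function
receives the datum to be indexed and returns the value actually stored in the
leaf tuple. A hypothetical sketch (the function name and the
polygon-to-bounding-box reduction are illustrative only, not taken from the
attached patch):

    #include "postgres.h"
    #include "fmgr.h"
    #include "utils/geo_decls.h"

    PG_FUNCTION_INFO_V1(spg_compress_poly_sketch);

    /*
     * Input is of spgConfigIn.attType (polygon); the result is of
     * spgConfigOut.leafType (box): only the bounding box is kept as the
     * leaf datum stored in the index.
     */
    Datum
    spg_compress_poly_sketch(PG_FUNCTION_ARGS)
    {
        POLYGON    *polygon = PG_GETARG_POLYGON_P(0);
        BOX        *box = (BOX *) palloc(sizeof(BOX));

        *box = polygon->boundbox;

        PG_RETURN_BOX_P(box);
    }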


Also, I wonder whether we should check for the existence of the compress 
method when attType and leafType are not the same in the spgvalidate() 
function?  We don't do this for now.

I've added compress method existence check to spgvalidate().


0002-spgist-polygon-8.patch is OK for me so soon.


--
Nikita Glukhov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
diff --git a/doc/src/sgml/spgist.sgml b/doc/src/sgml/spgist.sgml
index 139c8ed..b4a8be4 100644
--- a/doc/src/sgml/spgist.sgml
+++ b/doc/src/sgml/spgist.sgml
@@ -240,20 +240,22 @@
 
  
   There are five user-defined methods that an index operator class for
-  SP-GiST must provide.  All five follow the convention
-  of accepting two internal arguments, the first of which is a
-  pointer to a C struct containing input values for the support method,
-  while the second argument is a pointer to a C struct where output values
-  must be placed.  Four of the methods just return void, since
-  all their results appear in the output struct; but
+  SP-GiST must provide, and one is optional.  All five
+  mandatory methods follow the convention of accepting two internal
+  arguments, the first of which is a pointer to a C struct containing input
+  values for the support method, while the second argument is a pointer to a
+  C struct where output values must be placed.  Four of the mandatory methods just
+  return void, since all their results appear in the output struct; but
   leaf_consistent additionally returns a boolean result.
   The methods must not modify any fields of their input structs.  In all
   cases, the output struct is initialized to zeroes before calling the
-  user-defined method.
+  user-defined method.  Optional sixth method compress
+  accepts datum to be indexed as the only argument and returns value suitable
+  for physical storage in leaf tuple.
  
 
  
-  The five user-defined methods are:
+  The five mandatory user-defined methods are:
  
 
  
@@ -283,6 +285,7 @@ typedef struct spgConfigOut
 {
 Oid prefixType; /* Data type of inner-tuple prefixes */
 Oid labelType;  /* Data type of inner-tuple node labels */
+Oid leafType;   /* Data type of leaf-tuple values */
 boolcanReturnData;  /* Opclass can reconstruct original data */
 boollongValuesOK;   /* Opclass can cope with values  1 page */
 } spgConfigOut;
@@ -305,6 +308,22 @@ typedef struct spgConfigOut
   class is capable of segmenting long values by repeated suffixing
   (see ).
  
+
+ 
+  leafType is typically the same as
+  attType.  For the reasons of backward
+  compatibility, method config can
+  leave leafType uninitialized; that would
+  give the same effect as setting leafType equal
+  to attType.  When attType
+  and leafType are different, then optional
+  method compress must be provided.
+  Method compress is responsible
+  for transformation of datums to be indexed from attType
+  to leafType.
+  Note: both consistent functions will get scankeys
+  unchanged, without transformation using compress.
+ 
  
 
 
@@ -380,10 +399,16 @@ typedef struct spgChooseOut
 } spgChooseOut;
 
 
-   datum is the original datum that was to be inserted
-   into the index.
-   leafDatum is initially the same as
-   datum, but can change at lower levels of the tree
+   datum is the original datum of
+   spgConfigIn.attType
+   type that was to be inserted into the index.
+   leafDatum is a value of
+   spgConfigOut.leafType
+   type which is initially an result of method
+   compress applied to datum
+   when method compress is provided, or same value as
+   datum otherwise.
+   leafDatum 

Re: [HACKERS] Custom compression methods

2017-12-06 Thread Ildus Kurbangaliev
On Fri, 1 Dec 2017 21:47:43 +0100
Tomas Vondra  wrote:
 
> 
> +1 to do the rewrite, just like for other similar ALTER TABLE commands

Ok. What about the following syntax:

ALTER COLUMN DROP COMPRESSION - removes compression from the column
with a rewrite and removes the related compression options, so the user
can then drop the compression method.

ALTER COLUMN SET COMPRESSION NONE - for the cases when
the users want to just disable compression for future tuples. After
that they can keep the compressed tuples, or in the case of a
large table they can decompress tuples partially using e.g. UPDATE,
and then use ALTER COLUMN DROP COMPRESSION, which will then be much
faster.

ALTER COLUMN SET COMPRESSION  WITH  will change
compression for new tuples but will not touch old ones. If the users
want the recompression they can use DROP/SET COMPRESSION combination.

I don't think that SET COMPRESSION with a rewrite of the whole table
will be useful enough on any somewhat big table, and at the same time big
tables are where the user needs compression the most.

I understand that ALTER with the rewrite sounds logical and much easier
to implement (and it doesn't require Oids in tuples), but it could be
unusable.

-- 

Regards,
Ildus Kurbangaliev



Re: [HACKERS] Transaction control in procedures

2017-12-06 Thread Peter Eisentraut
On 12/5/17 13:33, Robert Haas wrote:
> On Tue, Dec 5, 2017 at 1:25 PM, Peter Eisentraut
>  wrote:
>> I think ROLLBACK in a cursor loop might not make sense, because the
>> cursor query itself could have side effects, so a rollback would have to
>> roll back the entire loop.  That might need more refined analysis before
>> it could be allowed.
> 
> COMMIT really has the same problem; if the cursor query has side
> effects, you can't commit those side effects piecemeal as the loop
> executed and have things behave sanely.

The first COMMIT inside the loop would commit the cursor query.  This
isn't all that different from what you'd get now if you coded this
manually using holdable cursors or just plain client code.  Clearly, you
can create a mess if the loop body interacts with the loop expression,
but that's already the case.

But if you coded something like this yourself now and ran a ROLLBACK
inside the loop, the holdable cursor would disappear (unless previously
committed), so you couldn't proceed with the loop.

The SQL standard for persistent stored modules explicitly prohibits
COMMIT and ROLLBACK in cursor loop bodies.  But I think people will
eventually want it.

>>> - COMMIT or ROLLBACK inside a procedure with a SET clause attached,
>>
>> That also needs to be prohibited because of the way the GUC nesting
>> currently works.  It's probably possible to fix it, but it would be a
>> separate effort.
>>
>>> and/or while SET LOCAL is in effect either at the inner or outer
>>> level.
>>
>> That seems to work fine.
> 
> These two are related -- if you don't permit anything that makes
> temporary changes to GUCs at all, like SET clauses attached to
> functions, then SET LOCAL won't cause any problems.  The problem is if
> you do a transaction operation when something set locally is in the
> stack of values, but not at the top.

Yes, that's exactly the problem.  So right now I'm just preventing the
problematic scenario.  To fix that, one would possibly have to replace
the stack with something not quite a stack.


New patch attached.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 18aaf292fbb22647e09c38cc21b56ff98643e518 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 6 Dec 2017 09:29:25 -0500
Subject: [PATCH v4] Transaction control in PL procedures

In each of the supplied procedural languages (PL/pgSQL, PL/Perl,
PL/Python, PL/Tcl), add language-specific commit and rollback
functions/commands to control transactions in procedures in that
language.  Add similar underlying functions to SPI.  Some additional
cleanup so that transaction commit or abort doesn't blow away data
structures still used by the procedure call.  Add execution context
tracking to CALL and DO statements so that transaction control commands
can only be issued in top-level procedure and block calls, not function
calls or other procedure or block calls.
---
 doc/src/sgml/plperl.sgml  |  49 +
 doc/src/sgml/plpgsql.sgml |  91 
 doc/src/sgml/plpython.sgml|  38 
 doc/src/sgml/pltcl.sgml   |  41 
 doc/src/sgml/ref/call.sgml|   7 +
 doc/src/sgml/ref/create_procedure.sgml|   7 +
 doc/src/sgml/ref/do.sgml  |   7 +
 doc/src/sgml/spi.sgml | 219 
 src/backend/commands/functioncmds.c   |  45 +++-
 src/backend/executor/spi.c| 112 --
 src/backend/tcop/utility.c|   6 +-
 src/backend/utils/mmgr/portalmem.c|  58 +++---
 src/include/commands/defrem.h |   4 +-
 src/include/executor/spi.h|   5 +
 src/include/executor/spi_priv.h   |   4 +
 src/include/nodes/nodes.h |   3 +-
 src/include/nodes/parsenodes.h|   7 +
 src/include/utils/portal.h|   1 +
 src/pl/plperl/GNUmakefile |   2 +-
 src/pl/plperl/SPI.xs  |  12 ++
 src/pl/plperl/expected/plperl_transaction.out |  94 +
 src/pl/plperl/plperl.c|   6 +
 src/pl/plperl/sql/plperl_transaction.sql  |  88 
 src/pl/plpgsql/src/pl_exec.c  |  66 +-
 src/pl/plpgsql/src/pl_funcs.c |  44 
 src/pl/plpgsql/src/pl_gram.y  |  34 +++
 src/pl/plpgsql/src/pl_handler.c   |   8 +
 src/pl/plpgsql/src/pl_scanner.c   |   2 +
 src/pl/plpgsql/src/plpgsql.h  |  22 +-
 src/pl/plpython/Makefile  |   1 +
 src/pl/plpython/expected/plpython_test.out|   4 +-
 src/pl/plpython/expected/plpython_transaction.out | 104 

ALTER TABLE ADD COLUMN fast default

2017-12-06 Thread Andrew Dunstan

Attached is a patch for $subject. It's based on Serge Reilau's patch of
about a year ago, taking into account comments made at the time, with
bitrot removed and other enhancements such as documentation.

Essentially it stores a value in pg_attribute that is used when the
stored tuple is missing the attribute. This works unless the default
expression is volatile, in which case a table rewrite is forced as
happens now.

Comments welcome.


cheers


andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 3f02202..d5dc14a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1150,6 +1150,18 @@
  
 
  
+  atthasmissing
+  bool
+  
+  
+This column has a value which is used where the column is entirely
+missing from the row, as happens when a column is added after the
+row is created. The actual value used is stored in the
+attmissingval  column.
+  
+ 
+
+ 
   attidentity
   char
   
@@ -1229,6 +1241,18 @@
   
  
 
+ 
+  attmissingval
+  bytea
+  
+  
+This column has a binary representation of the value used when the column
+is entirely missing from the row, as happens when the column is added after
+the row is created. The value is only used when
+atthasmissing is true.
+  
+ 
+
 

   
diff --git a/doc/src/sgml/ref/alter_table.sgml b/doc/src/sgml/ref/alter_table.sgml
index 7bcf242..780bead 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -1115,26 +1115,28 @@ ALTER TABLE [ IF EXISTS ] name

 

-When a column is added with ADD COLUMN, all existing
-rows in the table are initialized with the column's default value
-(NULL if no DEFAULT clause is specified).
-If there is no DEFAULT clause, this is merely a metadata
-change and does not require any immediate update of the table's data;
-the added NULL values are supplied on readout, instead.
+ When a column is added with ADD COLUMN and a
+ non-volatile DEFAULT is specified, the default
+ is evaluated at the time of the statement and the result stored in the
+ table's metadata. That value will be used for the column for
+ all existing rows. If no DEFAULT is specified
+ NULL is used. In neither case is a rewrite of the table required.

 

-Adding a column with a DEFAULT clause or changing the type of
-an existing column will require the entire table and its indexes to be
-rewritten.  As an exception when changing the type of an existing column,
-if the USING clause does not change the column
-contents and the old type is either binary coercible to the new type or
-an unconstrained domain over the new type, a table rewrite is not needed;
-but any indexes on the affected columns must still be rebuilt.  Adding or
-removing a system oid column also requires rewriting the entire
-table.  Table and/or index rebuilds may take a significant amount of time
-for a large table; and will temporarily require as much as double the disk
-space.
+ Adding a column with a volatile DEFAULT or
+ changing the type of
+ an existing column will require the entire table and its indexes to be
+ rewritten.  As an exception when changing the type of an existing column,
+ if the USING clause does not change the column
+ contents and the old type is either binary coercible to the new type or
+ an unconstrained domain over the new type, a table rewrite is not needed;
+ but any indexes on the affected columns must still be rebuilt.  Adding or
+ removing a system oid column also requires rewriting
+ the entire table.
+ Table and/or index rebuilds may take a significant amount of time
+ for a large table; and will temporarily require as much as double the disk
+ space.

 

diff --git a/src/backend/access/common/heaptuple.c b/src/backend/access/common/heaptuple.c
index a1a9d99..8188acf 100644
--- a/src/backend/access/common/heaptuple.c
+++ b/src/backend/access/common/heaptuple.c
@@ -76,6 +76,105 @@
  * 
  */
 
+/*
+ * Return the missing value of an attribute, or NULL if it
+ * there is no missing value.
+ */
+static Datum
+getmissingattr(TupleDesc tupleDesc,
+			   int attnum, bool *isnull)
+{
+	int			missingnum;
+	Form_pg_attribute att;
+	AttrDefault *attrdef;
+
+	Assert(attnum <= tupleDesc->natts);
+	Assert(attnum > 0);
+
+	att = TupleDescAttr(tupleDesc, attnum - 1);
+
+	if (att->atthasmissing)
+	{
+		Assert(tupleDesc->constr);
+		Assert(tupleDesc->constr->num_defval > 0);
+		Assert(tupleDesc->constr->has_missingval);
+
+		attrdef = tupleDesc->constr->defval;
+
+		/*
+		 * Look through the 

Re: [HACKERS] parallel.c oblivion of worker-startup failures

2017-12-06 Thread Robert Haas
On Wed, Dec 6, 2017 at 1:43 AM, Amit Kapila  wrote:
> Looks good to me.

Committed and back-patched to 9.4 where dynamic background workers
were introduced.

>> Barring objections, I'll commit and
>> back-patch this; then we can deal with the other part of this
>> afterward.
>
> Sure, I will rebase on top of the commit unless you have some comments
> on the remaining part.

I'm not in love with that part of the fix; the other parts of that if
statement are just testing variables, and a function call that takes
and releases an LWLock is a lot more expensive.  Furthermore, that
test will often be hit in practice, because we'll often arrive at this
function before workers have actually finished.  On top of that, we'll
typically arrive here having already communicated with the worker in
some way, such as by receiving tuples, which means that in most cases
we already know that the worker was alive at least at some point, and
therefore the extra test isn't necessary.  We only need that test, if
I understand correctly, to cover the failure-to-launch case, which is
presumably very rare.

All that having been said, I don't quite know how to do it better.  It
would be easy enough to modify this function so that if it ever
received a result other than BGWH_NOT_YET_STARTED for a worker, it
wouldn't call this function again for that worker.  However, that's
only mitigating the problem, not really fixing it.  We'll still end up
frequently inquiring - admittedly only once - about the status of a
worker with which we've previously communicated via the tuple queue.
Maybe we just have to live with that problem, but do you (or does
anyone else) have an idea?
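
A minimal sketch of that mitigation, using GetBackgroundWorkerPid() as the
status probe for illustration (the WorkerLaunchState struct and function name
are hypothetical, and the actual check in the patch under discussion may
differ):

    #include "postgres.h"
    #include "postmaster/bgworker.h"

    /* Per-worker bookkeeping kept by the leader. */
    typedef struct WorkerLaunchState
    {
        BackgroundWorkerHandle *handle;
        bool        known_launched; /* saw something other than NOT_YET_STARTED */
    } WorkerLaunchState;

    /*
     * Returns true once the worker is known to have launched (or to have
     * already stopped).  After the first positive answer the status probe is
     * skipped entirely, which mitigates -- but, as noted above, does not
     * eliminate -- the cost of the extra failure-to-launch check.
     */
    static bool
    WorkerKnownLaunched(WorkerLaunchState *ws)
    {
        pid_t       pid;
        BgwHandleStatus status;

        if (ws->known_launched)
            return true;

        status = GetBackgroundWorkerPid(ws->handle, &pid);
        if (status != BGWH_NOT_YET_STARTED)
            ws->known_launched = true;

        return ws->known_launched;
    }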

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [COMMITTERS] pgsql: Fix freezing of a dead HOT-updated tuple

2017-12-06 Thread Michael Paquier
On Wed, Dec 6, 2017 at 5:03 PM, Andres Freund  wrote:
> Ping.  I'm a bit surprised that a bug fixing a significant data
> corruption issue has gotten no reviews at all.

Note that I was planning to look at this problem today and tomorrow my
time, having been stuck with CF handling last week and a conference this
week. If you think that helps, I'll be happy to help to the extent of
what I can do.
-- 
Michael