Re: [HACKERS] Modifying returning value of PQgetvalue.
Peter Eisentraut writes:
> On 6/24/17 06:31, Dmitry Igrishin wrote:
>> PQgetvalue returns a value of type char* (without const). But the
>> documentation says: "The pointer returned by PQgetvalue points to
>> storage that is part of the PGresult structure. /One should not modify
>> the data it points to/" (my italics). Could someone tell me please,
>> what is wrong with modifying an arbitrary character of the data pointed
>> to by PQgetvalue's return value? Or why is this restriction documented?
>> Thanks.
> This is just how the API is defined. It could be defined differently,
> but it is not.

To enlarge on that: it might well work today. But if we change the code in the future so that it does not work, we will make no apologies.

There's at least one case where you could have a problem today: all null entries in a PGresult share the same pointer to an empty string, so if you were to modify that string, it would affect what's seen for other "null" values. If the API allowed modification of data values, that would be a bug. But it doesn't, and so it isn't, and we wouldn't look kindly on complaints about it.

Ideally, PQgetvalue would be redefined to return "const char *", but I don't really see us doing that anytime soon; it would result in a lot of compile warnings for perfectly safe client code. I'm sure this API would look different if it weren't older than the universal availability of "const" :-(

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgjdbc logical replication client throwing exception
Hi,

> What does the server log say? If nothing interesting, turn up debugging.

I see the following in the server log:

LOG: could not send data to client: Broken pipe

Thanks,
Sanyam Jain
Re: [HACKERS] Proposal : For Auto-Prewarm.
On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown wrote:
> I also think pg_prewarm.dump_interval should be renamed to
> pg_prewarm.autoprewarm_interval.

Thanks, I have changed it to pg_prewarm.autoprewarm_interval.

>> * In the documentation, don't say "This is a SQL callable function
>> to". This is a list of SQL-callable functions, so each thing in
>> the list is one. Just delete this from the beginning of each
>> sentence.

> One thing I couldn't quite make sense of is:
>
> "The autoprewarm process will start loading blocks recorded in
> $PGDATA/autoprewarm.blocks until there is a free buffer left in the
> buffer pool."
>
> Is this saying "until there is a single free buffer remaining in
> shared buffers"? I haven't corrected or clarified this as I don't
> understand it.

Sorry, that was a typo; I meant to say "until there is no free buffer left". Fixed in autoprewarm_16.patch.

> Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
> detect an autoprewarm process already running. I'd want this to
> return NULL or an error if called for a 2nd time.

We log instead of raising an error because we only perform the check after launching the worker, from within the worker itself. One solution, similar to autoprewarm_dump_now(), would be for autoprewarm_start_worker() to initialize shared memory and check in the backend itself whether we can launch the worker. I will try to fix this.

--
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Proposal : For Auto-Prewarm.
Thanks, Robert. I have tried to address all of your comments and merged the fixes suggested by Thom into patch 15.

On Fri, Jun 23, 2017 at 3:22 AM, Robert Haas wrote:
> * I suggest renaming launch_autoprewarm_dump() to
> autoprewarm_start_worker(). I think that will be clearer. Remember
> that user-visible names, internal names, and the documentation should
> all match.

-- Fixed as suggested.

> * I think the GUC could be pg_prewarm.autoprewarm rather than
> pg_prewarm.enable_autoprewarm. It's shorter and, I think, no less
> clear.

-- I have named the GUC autoprewarm.

> * In the documentation, don't say "This is a SQL callable function
> to". This is a list of SQL-callable functions, so each thing in
> the list is one. Just delete this from the beginning of each
> sentence.

-- Fixed; Thom provided the fix and I have merged it into my patch.

> * The reason for the AT_PWARM_* naming is not very obvious. Does AT
> mean "at" or "auto" or something else? How about
> AUTOPREWARM_INTERVAL_DISABLED, AUTOPREWARM_INTERVAL_SHUTDOWN_ONLY,
> AUTOPREWARM_INTERVAL_DEFAULT?

-- Fixed as suggested. AUTOPREWARM_INTERVAL_DISABLED is removed now, as suggested by the comments below.

> * Instead of defining apw_sigusr1_handler, I think you could just use
> procsignal_sigusr1_handler. Instead of defining apw_sigterm_handler,
> perhaps you could just use die(). got_sigterm would go away, and
> you'd just CHECK_FOR_INTERRUPTS().

-- I have registered procsignal_sigusr1_handler instead of apw_sigusr1_handler. But I have some doubts about using die() instead of apw_sigterm_handler in the main autoprewarm worker: on shutdown (SIGTERM) we should dump and then exit, so with a plain CHECK_FOR_INTERRUPTS() we might miss dumping the buffer contents. I think I would need to modify some server code in ProcessInterrupts to handle this; please let me know if I am wrong about this. For the per-database prewarm worker this seems right, so there I am registering die() for SIGTERM and calling CHECK_FOR_INTERRUPTS().
Also for autoprewarm_dump_now().

> * The PG_TRY()/PG_CATCH() block in autoprewarm_dump_now() could reuse
> reset_apw_state(), which might be better named detach_apw_shmem().
> Similarly, init_apw_state() could be init_apw_shmem().

-- Fixed.

> * Instead of load_one_database(), I suggest
> autoprewarm_database_main(). That is more parallel to
> autoprewarm_main(), which you have elsewhere, and makes it more
> obvious that it's the main entrypoint for some background worker.

-- Fixed.

> * Instead of launch_and_wait_for_per_database_worker(), I suggest
> autoprewarm_one_database(), and instead of prewarm_buffer_pool(), I
> suggest autoprewarm_buffers(). The motivation for changing prewarm
> to autoprewarm is that we want the names here to be clearly distinct
> from the other parts of pg_prewarm that are not related to
> autoprewarm. The motivation for changing buffer_pool to buffers is
> just that it's a little shorter. Personally I also like the sound
> of it better, but YMMV.

-- Fixed; I have renamed as suggested.

> * prewarm_buffer_pool() ends with a useless return statement. I
> suggest removing it.

-- Sorry, fixed.

> * Instead of creating our own buffering system via buffer_file_write()
> and buffer_file_flush(), why not just use the facilities provided by
> the operating system? fopen() et al. provide buffering, and we have
> AllocateFile() to provide a FILE *; it's just like
> OpenTransientFile(), which you are using, but you'll get the buffering
> stuff for free. Maybe there's some reason why this won't work out
> nicely, but off-hand it seems like it might. It looks like you are
> already using AllocateFile() to read the dump, so using it to write
> the dump as well seems like it would be logical.

-- Now using AllocateFile().
> * I think that it would be cool if, when autoprewarm completed, it
> printed a message at LOG rather than DEBUG1, and with a few more
> details, like "autoprewarm successfully prewarmed %d of %d
> previously-loaded blocks". This would require some additional
> tracking that you haven't got right now; you'd have to keep track not
> only of the number of blocks read from the file but how many of those
> some worker actually loaded. You could do that with an extra counter
> in the shared memory area that gets incremented by the per-database
> workers.

> * dump_block_info_periodically() calls ResetLatch() immediately before
> WaitLatch; that's backwards. See the commit message for commit
> 887feefe87b9099c2967ec31ce20df4dfa9b and the comments it added to
> the top of latch.h for details on how to do this correctly.

-- Sorry, fixed.

> * dump_block_info_periodically()'s main loop is a bit confusing. I
> think that after calling dump_now(true) it should just "continue",
> which will automatically retest got_sigterm. You could rightly object
> to that plan on the grounds that we then wouldn't recheck got_sighup
> promptly, but you can fix that by moving the got_si
[HACKERS] Misleading comment in slru.h
Hi hackers,

As mentioned in another thread[1], slru.h says:

 * Note: slru.c currently assumes that segment file names will be four hex
 * digits. This sets a lower bound on the segment size (64K transactions
 * for 32-bit TransactionIds).

That comment is out of date: commit 638cf09e extended SLRUs to 5-character segment names for pg_multixact, and commit 73c986ad extended support to 6-character SLRU file names for pg_commit_ts. Should we just remove that comment?

[1] https://www.postgresql.org/message-id/CAEepm%3D1UdqnmHTikNBsBYsSDuk3nc9rXFPbeWYrbA2iM6K9q2w%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

remove-misleading-comment.patch
Description: Binary data
[HACKERS] Incorrect mentions to pg_xlog in walmethods.c/h
Hi all,

I have noticed $subject. A patch is attached. Those comments are not completely wrong either, since pg_basebackup can generate pg_xlog as well; still, I would recommend just mentioning "pg_wal".

Thanks,
--
Michael

walmethods-comments.patch
Description: Binary data
Re: [HACKERS] Logical decoding on standby
On 21 June 2017 at 17:30, sanyam jain wrote:
> Hi,
> After changing
> sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
> to
> sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;
>
> I was facing another issue. On promotion of a cascaded server,
> ThisTimeLineID in the standby server having the logical slot becomes 0.
> Then I added a function call to GetStandbyFlushRecPtr in
> StartLogicalReplication, which updates ThisTimeLineID.
>
> After the above two changes timeline following is working, but I'm not
> sure whether this is correct or not. In any case, could someone please
> clarify?

That's a reasonable thing to do, and again, I thought I did it in a later revision, but apparently not (?). I've been working on other things and have lost track of progress here a bit. I'll check more closely.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Logical decoding on standby
On 21 June 2017 at 13:28, sanyam jain wrote:
> Hi,
>
> In this patch in walsender.c, sendTimeLineIsHistoric is set to true when
> state->currTLI and ThisTimeLineID are equal:
>
> sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
>
> Shouldn't sendTimeLineIsHistoric be true when state->currTLI is less
> than ThisTimeLineID?

Correct, that was a bug. I thought it got fixed upthread though.

--
Craig Ringer
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
[HACKERS] Out of date comment in predicate.c
Hi hackers,

Commit ea9df812d8502fff74e7bc37d61bdc7d66d77a7f got rid of FirstPredicateLockMgrLock, but it's still referred to in a comment in predicate.c where the locking protocol is documented. I think it's probably best to use the name of the macro that's usually used to access the lock array in the code. Please see attached.

--
Thomas Munro
http://www.enterprisedb.com

fix-comments.patch
Description: Binary data
Re: [HACKERS] transition table behavior with inheritance appears broken
On Sat, Jun 24, 2017 at 10:57:49PM -0700, Noah Misch wrote:
> On Fri, Jun 23, 2017 at 02:39:48PM +0100, Andrew Gierth wrote:
>>> "Noah" == Noah Misch writes:
>> Noah> This PostgreSQL 10 open item is past due for your status update.
>> Noah> Kindly send a status update within 24 hours,
>>
>> oops, sorry! I forgot to include a date in the last one, and in fact a
>> personal matter delayed things anyway. I expect to have this wrapped up
>> by 23:59 BST on the 24th.
>
> This PostgreSQL 10 open item is again past due for your status update.
> Kindly send a status update within 24 hours, and include a date for your
> subsequent status update.

IMMEDIATE ATTENTION REQUIRED. This PostgreSQL 10 open item is long past due for your status update. Please reacquaint yourself with the policy on open item ownership[1] and then reply immediately. If I do not hear from you by 2017-06-28 06:00 UTC, I will transfer this item to release management team ownership without further notice.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
Re: [HACKERS] Logical decoding on standby
Hi,

> After changing
> sendTimeLineIsHistoric = state->currTLI == ThisTimeLineID;
> to
> sendTimeLineIsHistoric = state->currTLI != ThisTimeLineID;
>
> I was facing another issue. On promotion of a cascaded server,
> ThisTimeLineID in the standby server having the logical slot becomes 0.
> Then I added a function call to GetStandbyFlushRecPtr in
> StartLogicalReplication, which updates ThisTimeLineID.
>
> After the above two changes timeline following is working, but I'm not
> sure whether this is correct or not. In any case, could someone please
> clarify?

Could anyone with experience please confirm whether the steps I have taken are correct?

Thanks,
Sanyam Jain
Re: [HACKERS] Modifying returning value of PQgetvalue.
On 6/24/17 06:31, Dmitry Igrishin wrote:
> PQgetvalue returns a value of type char* (without const). But the
> documentation says: "The pointer returned by PQgetvalue points to
> storage that is part of the PGresult structure. /One should not modify
> the data it points to/" (my italics). Could someone tell me please,
> what is wrong with modifying an arbitrary character of the data pointed
> to by PQgetvalue's return value? Or why is this restriction documented?
> Thanks.

This is just how the API is defined. It could be defined differently, but it is not.

--
Peter Eisentraut
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] pg_basebackup fails on Windows when using tablespace mapping
On Tue, Jun 27, 2017 at 12:13 PM, Michael Paquier wrote:
> At quick glance, I think that this should definitely be a client-only
> fix. One reason is that pg_basebackup should be able to work with past
> servers. A second is that this impacts the backend code as well, and
> readlink may not return an absolute path. At least that's true for
> POSIX's version; Postgres uses its own wrapper with junction points.

The problem is in pg_basebackup.c's get_tablespace_mapping(), which fails to perform a correct comparison for the directory given by the caller. In your case the caller of get_tablespace_mapping() uses backslashes in the directory value (the path value received from the backend), while the tablespace mapping uses slashes, as canonicalize_path has been called on it once. Because of this inconsistency, the directory of the original tablespace is used, causing the backup to fail as you reported.

A simple fix is to make the comparison consistent by calling canonicalize_path on the directory value given by the caller as well. Attached is a patch. I am parking it in the next CF so that this bug fix does not get forgotten. Perhaps a committer will show up before. Or not.

--
Michael

basebackup-win-tbspace.patch
Description: Binary data
Re: [HACKERS] distinct estimate of a hard-coded VALUES list
On Tue, Aug 23, 2016 at 5:28 AM, Tom Lane wrote: > Tomas Vondra writes: > > On 08/22/2016 07:42 PM, Alvaro Herrera wrote: > >> Also, if we patch it this way and somebody has a slow query because of a > >> lot of duplicate values, it's easy to solve the problem by > >> de-duplicating. But with the current code, people that have the > >> opposite problem has no way to work around it. > > > I certainly agree it's better when a smart user can fix his query plan > > by deduplicating the values than when we end up generating a poor plan > > due to assuming some users are somewhat dumb. > > Well, that seems to be the majority opinion, so attached is a patch > to do it. Do we want to sneak this into 9.6, or wait? > > > I wonder how expensive would it be to actually count the number of > > distinct values - there certainly are complex data types where the > > comparisons are fairly expensive, but I would not expect those to be > > used in explicit VALUES lists. > > You'd have to sort the values before de-duping, and deal with VALUES > expressions that aren't simple literals. And you'd have to do it a > lot --- by my count, get_variable_numdistinct is invoked six times > on the Var representing the VALUES output in Jeff's example query. > Maybe you could cache the results somewhere, but that adds even > more complication. Given the lack of prior complaints about this, > it's hard to believe it's worth the trouble. > > This patch still applies, and I think the argument for it is still valid. So I'm going to make a commit-fest entry for it. Is there additional evidence we should gather? Cheers, Jeff
Re: [HACKERS] pg_basebackup fails on Windows when using tablespace mapping
On Tue, Jun 27, 2017 at 1:57 AM, nb wrote:
> Trying to take a `pg_basebackup -T OLDDIR=NEWDIR [etc]` on master (server
> running the cluster) fails on Windows with error "pg_basebackup: directory
> "OLDDIR" exists but is not empty".

Yes, I can see this error easily.

> This fixed the issue, but as a side effect, pg_tablespace_location displays
> the path in canonicalized format, which might look weird to most Windows
> users.
>
> There was a suggestion to fix this in the client instead, but this fix was
> the simplest one to implement.

At quick glance, I think that this should definitely be a client-only fix. One reason is that pg_basebackup should be able to work with past servers. A second is that this impacts the backend code as well, and readlink may not return an absolute path. At least that's true for POSIX's version; Postgres uses its own wrapper with junction points.

> This is my first post to Hackers, so please let me know if I missed
> something or can provide any additional info to help further investigate
> this issue.

Thanks for the report! You are giving enough details to have a look at the problem. No need for -X stream in the command. I'll see if I can produce a patch soon.

--
Michael
Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
On Mon, Jun 26, 2017 at 8:09 PM, Andrew Dunstan wrote:
> On 06/26/2017 10:36 AM, Amit Kapila wrote:
>> On Fri, Jun 23, 2017 at 9:12 AM, Andrew Dunstan wrote:
>>> On 06/22/2017 10:24 AM, Tom Lane wrote:
>>>> Andrew Dunstan writes:
>>>>> Please let me know if there are tests I can run. I missed your earlier
>>>>> request in this thread, sorry about that.
>>>> That earlier request is still valid. Also, if you can reproduce the
>>>> symptom that lorikeet just showed and get a stack trace from the
>>>> (hypothetical) postmaster core dump, that would be hugely valuable.
>>> See attached log and stacktrace
>> Is this all the log contents or is there something else? The attached
>> log looks strange to me in the sense that the first worker that gets
>> invoked is parallel worker number 2, and it finds that somebody else
>> has already set the sender handle.
> No, it's the end of the log file. I can rerun and get the whole log file
> if you like.

Okay, if possible, please share it. Another way to get better information is to change the code of shm_mq_set_sender so that it hangs when we hit the assertion condition. Once it hangs, we can attach a debugger and try to get some information. Basically, just change shm_mq_set_sender as below, or something like that:

void
shm_mq_set_sender(shm_mq *mq, PGPROC *proc)
{
    volatile shm_mq *vmq = mq;
    PGPROC     *receiver;

    SpinLockAcquire(&mq->mq_mutex);
    if (vmq->mq_sender != NULL)
    {
        while (1)
        {
        }
    }

If we are able to hit the above code, we can print the values of mq_sender — especially pid — and see if the pid is that of the current backend. In theory, it should be a different backend, as this is the first time we are setting mq_sender.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Jun 26, 2017 at 4:11 PM, Masahiko Sawada wrote:
> Thank you for the patches! I checked the additional patches for brin and
> spgist. They look good to me.

The last versions are still missing something: brin_mask() and spg_mask() can be updated so that mask_unused_space() is called for meta pages. Other than that, the patches look to be on the right track.

By the way, as this is an optimization and not an actual bug fix, could you add this patch to the next commit fest? I don't think that this should get into 10; the focus lately is on stabilizing things.

--
Michael
Re: [HACKERS] Causal reads take II
On Sun, Jun 25, 2017 at 2:36 AM, Simon Riggs wrote: > I'm very happy that you are addressing this topic. > > I noticed you didn't put in links my earlier doubts about this > specific scheme, though I can see doubts from myself and Heikki at > least in the URLs. I maintain those doubts as to whether this is the > right way forwards. One technical problem was raised in the earlier thread by Ants Aasma that I concede may be fatal to this design (see note below about read-follows-read below), but I'm not sure. All the other discussion seemed to be about trade-offs between writer-waits and reader-waits schemes, both of which I still view as reasonable options for an end user to have in the toolbox. Your initial reaction was: > What we want is to isolate the wait only to people performing a write-read > sequence, so I think it should be readers that wait. I agree with you 100%, up to the comma. The difficulty is identifying which transactions are part of a write-read sequence. An application-managed LSN tracking system allows for waits to occur strictly in reads that are part of a write-read sequence because the application links them explicitly, and nobody is arguing that we shouldn't support that for hard working expert users. But to support applications that don't deal with LSNs (or some kind of "causality tokens") explicitly I think we'll finish up having to make readers wait for incidental later transactions too, not just the write that your read is dependent on, as I'll show below. When you can't identify write-read sequences perfectly, it comes down to a choice between taxing writers or taxing readers, and I'm not sure that one approach is universally better. Let me summarise my understanding of that trade-off. I'm going to use this terminology: synchronous replay = my proposal: ask the primary server to wait until standbys have applied tx1, a bit like 9.6 synchronous_commit = remote_apply, but with a system of leases for graceful failure. 
causality tokens = the approach Heikki mentioned: provide a way for tx1 to report its commit LSN to the client, then provide a way for a client to wait for the LSN to be replayed before taking a snapshot for tx2. tx1, tx2 = a pair of transactions with a causal dependency; we want tx2 to see tx1 because tx1 caused tx2 in some sense so must be seen to precede it. A poor man's causality token system can be cobbled together today with pg_current_wal_lsn() and a polling loop that checks pg_last_wal_replay_lsn(). It's a fairly obvious thing to want to do. Several people including Heikki Linnakangas, Craig Ringer, Ants Aasma, Ivan Kartyshov and probably many others have discussed better ways to do that[1], and a patch for the wait-for-LSN piece appeared in a recent commitfest[2]. I reviewed Ivan's patch and voted -1 only because it didn't work for higher isolation levels. If he continues to develop that I will be happy to review and help get it into committable shape, and if he doesn't I may try to develop it myself. In short, I think this is a good tool to have in the toolbox and PostgreSQL 11 should have it! But I don't think it necessarily invalidates my synchronous replay proposal: they represent different sets of trade-offs and might appeal to different users. Here's why: To actually use a causality token system you either need a carefully designed application that keeps track of causal dependencies and tokens, in which case the developer works harder but can benefit from from an asynchronous pipelining effect (by the time tx2 runs we hope that tx1 has been applied, so neither transaction had to wait). Let's call that "application-managed causality tokens". That's great for those users -- let's make that possible -- but most users don't want to write code like that. So I see approximately three choices for transparent middleware (or support built into standbys), which I'll name and describe as follows: 1. 
"Panoptic" middleware: Sees all queries so that it can observe commit LSNs and inject wait directives into all following read-only transactions. Since all queries now need to pass through a single process, you have a new bottleneck, an extra network hop, and a single point of failure so you'll probably want a failover system with split-brain defences. 2. "Hybrid" middleware: The middleware (or standby itself) listens to the replication stream so that it knows about commit LSNs (rather than spying on commits). The primary server waits until all connected middleware instances acknowledge commit LSNs before it releases commits, and then the middleware inserts wait-for-LSN directives into read-only transactions. Now there is no single point problem, but writers are impacted too. I mention this design because I believe this is conceptually similar to how Galera wsrep_sync_wait (AKA wsrep_causal_reads) works. (I call this "hybrid" because it splits the waiting between tx1 and tx2. Since it's synchronous, dealing with failure gracefully is tri
Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests
On Tue, Jun 27, 2017 at 3:44 AM, Tom Lane wrote:
> Looks good as far as it goes, but I wonder whether any of the other
> get_slot_xmins calls need polling too. Don't feel a need to add such
> calls until someone exhibits a failure there, but I won't be very
> surprised if someone does.

I had the same thought, wondering as well whether get_slot_xmins should be renamed check_slot_xmins, with the is() tests moved inside it. Not sure if that's worth the API ugliness though.

> Anyway, pushed this patch with minor editing. Thanks!

Thanks.

--
Michael
Re: [HACKERS] Reducing pg_ctl's reaction time
Andres Freund writes:
> On 2017-06-26 17:38:03 -0400, Tom Lane wrote:
>> Hm. Take that a bit further, and we could drop the connection probes
>> altogether --- just put the whole responsibility on the postmaster to
>> show in the pidfile whether it's ready for connections or not.
> Yea, that seems quite appealing, both from an architectural, simplicity,
> and log noise perspective. I wonder if there's some added reliability by
> the connection probe though? Essentially wondering if it'd be worthwhile
> to keep a single connection test at the end. I'm somewhat disinclined
> though.

I agree --- part of the appeal of this idea is that there could be a net subtraction of code from pg_ctl. (I think it wouldn't have to link libpq anymore at all, though maybe I forgot something.) And you get rid of a bunch of can't-connect failure modes, e.g. a kernel packet filter in the way, which probably outweighs any hypothetical reliability gain from confirming the postmaster state the old way.

This would mean that v10 pg_ctl could not be used to start/stop older postmasters, which is flexibility we tried to preserve in the past. But I see that we already gave that up in quite a few code paths because of their attempts to read pg_control, so I think it's a concern we can ignore.

I'll draft something up later.

regards, tom lane
Re: [HACKERS] psql's \d and \dt are sending their complaints to different output files
> On 19 Jun 2017, at 17:32, Tom Lane wrote:
>
> So, if we're getting into enforcing consistency in describe.c, there's
> lots to do.
>
> * listDbRoleSettings does this for a server too old to have the desired
> feature:
>
>     fprintf(pset.queryFout,
>             _("No per-database role settings support in this server version.\n"));
>
> Just about every other function handles a too-old server like this:
>
>     psql_error("The server (version %s) does not support full text search.\n",

Addressed in the attached patch; see the list of patches below.

> * listTables and listDbRoleSettings do this for zero matches:
>
>     if (PQntuples(res) == 0 && !pset.quiet)
>     {
>         if (pattern)
>             fprintf(pset.queryFout, _("No matching relations found.\n"));
>         else
>             fprintf(pset.queryFout, _("No relations found.\n"));
>     }
>     else
>         ... print the result normally
>
> Note the test on quiet mode, as well as the choice of two messages
> depending on whether the command had a pattern argument or not.
>
> Other functions in describe.c mostly follow this pattern:
>
>     if (PQntuples(res) == 0)
>     {
>         if (!pset.quiet)
>             psql_error("Did not find any relation named \"%s\".\n", pattern);
>         PQclear(res);
>         return false;
>     }
>
> So this has a different behavior in quiet mode, which is to print
> *nothing* to queryFout rather than a result table with zero entries.
> That's probably bogus.

There are two cases in verbose metacommands: we either print a generic "List of XXX" title with a table following it, or multiple tables (with additional non-table data) with a per-table title. When nothing matches, the former case prints the title and an empty table in verbose mode; the latter case prints a "Did not find XXX" message. When QUIET is set to on, the latter case mostly doesn't print anything.

Not printing anything is clearly not helpful, but choosing what to print requires some thinking, so before hacking on that I figured I'd solicit opinions.
We can either keep printing a "Did not find ..." message; change a per-table title to a generic one and include an empty table; a mix, as today; or something completely different. What preferences on output are there? Personally I'm in favour of trying to add an empty table to all verbose commands, simply because that's what I expect to see when using psql.

> It will also crash, on some machines, if pattern is NULL, although no
> doubt nobody's noticed because there would always be a match. (One call
> site does defend itself against a null pattern, and it uses two messages
> like the previous example.)

Addressed in the attached patch; see the list of patches below.

> So we've got at least three things to normalize:
>
> * fprintf(pset.queryFout) vs. psql_error()
>
> * message wording inconsistencies
>
> * behavior with -q and no matches.
>
> Anybody want to draft a patch?

I poked around the code with an eye to improving consistency, and included some more things that caught my eye. Rather than attaching one patch with everything, I pulled out the various proposals as separate patches:

0001 - Not really a consistency thing, but included here since it's sort of related: a small memleak on pattern2 spotted while reading code, unless I'm missing where it's freed.

0002 - While on the topic of consistency, a tiny function comment reformat on the metacommand function, because I have comment-OCD; feel free to ignore.

0003 - Apply the same server version check pattern in listDbRoleSettings() as elsewhere in other functions.

0004 - Most verbose metacommands include additional information on top of what is in the normal output, while \dRp+ dropped two columns (moving one to the title). This includes the information from \dRp in \dRp+. Having the pubname in the title as well as the table is perhaps superfluous, but consistent with other commands, so I opted for it.

0005 - Most table titles were created using a PQExpBuffer, with two using a char buffer and snprintf().
Moved to using a PQExpBuffer instead in all cases since it’s consistent and clean (not that the buffers risked overflowing or anything like that, but I like the PQExpBuffer API).
0006 - Moves to psql_error() for all not-found messages and the same language for all messages. I don’t think these are errors per se, but psql_error() was the most prevalent option used, so went there for consistency as a starting point for discussion. Also adds an appropriate NULL deref check on pattern and adds a not-found message in describePublications() which previously didn’t print anything at all on not-found. Hope there is something of interest in there. cheers ./daniel 0001-Free-allocated-memory-when-2-patterns-used.patch Description: Binary data 0002-Use-consistent-function-comments
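The consolidated zero-matches behavior under discussion can be reduced to a small sketch. The function below and its int "quiet" flag are hypothetical stand-ins for psql's pset.quiet and the psql_error() call in describe.c, not psql's actual API:

```c
#include <stdio.h>
#include <stddef.h>
#include <string.h>

static char msgbuf[128];

/*
 * Sketch of the consolidated not-found handling: honor quiet mode,
 * defend against a NULL pattern before feeding it to "%s", and use
 * one message wording throughout. Returns the message to print, or
 * NULL when quiet mode suppresses output entirely.
 */
const char *
not_found_message(const char *pattern, int quiet)
{
    if (quiet)
        return NULL;            /* historical behavior: print nothing */

    if (pattern == NULL)        /* NULL-pattern defense, per the thread */
        return "No relations found.\n";

    snprintf(msgbuf, sizeof(msgbuf),
             "Did not find any relation named \"%s\".\n", pattern);
    return msgbuf;
}
```

The wording mirrors the existing "Did not find any relation named ..." message; the NULL branch is the two-message variant used when the command had no pattern argument.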
Re: [HACKERS] Reducing pg_ctl's reaction time
On 2017-06-26 17:38:03 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-06-26 17:30:30 -0400, Tom Lane wrote: > >> No, I don't like that at all. Has race conditions against updates > >> coming from the startup process. > > > You'd obviously have to take the appropriate locks. I think the issue > > here is less race conditions, and more that architecturally we'd > > interact with shmem too much. > > Uh, we are *not* taking any locks in the postmaster. I'm not sure why you're 'Uh'ing, when my point pretty much is that we do not want to do so? > Hm. Take that a bit further, and we could drop the connection probes > altogether --- just put the whole responsibility on the postmaster to > show in the pidfile whether it's ready for connections or not. Yea, that seems quite appealing, both from an architectural, simplicity, and log noise perspective. I wonder if there's some added reliability by the connection probe though? Essentially wondering if it'd be worthwhile to keep a single connection test at the end. I'm somewhat disinclined though. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
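The pidfile-based handshake being discussed amounts to: the postmaster records a status word in postmaster.pid, and pg_ctl keys off that instead of probing with connections. A minimal sketch of pg_ctl's side of it; the status words ("starting", "ready", "standby") are illustrative assumptions, not an actual on-disk format:

```c
#include <string.h>

/*
 * Sketch of pg_ctl reading a readiness status word from a
 * postmaster.pid-style file. The words used here are made up for
 * illustration; the real format would be whatever such a patch defines.
 */
int
pidfile_status_means_up(const char *status)
{
    if (strcmp(status, "ready") == 0)
        return 1;               /* primary accepting connections */
    if (strcmp(status, "standby") == 0)
        return 1;               /* hot standby accepting read-only connections */
    return 0;                   /* "starting" or unknown: keep waiting */
}
```

This captures the appeal Andres mentions: pg_ctl never has to connect, so there is no per-probe log noise on the server side.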
Re: [HACKERS] Reducing pg_ctl's reaction time
Andres Freund writes: > On 2017-06-26 17:30:30 -0400, Tom Lane wrote: >> No, I don't like that at all. Has race conditions against updates >> coming from the startup process. > You'd obviously have to take the appropriate locks. I think the issue > here is less race conditions, and more that architecturally we'd > interact with shmem too much. Uh, we are *not* taking any locks in the postmaster. >> Yeah, that would be a different way to go at it. The postmaster would >> probably just write the state of the hot_standby GUC to the file, and >> pg_ctl would have to infer things from there. > I'd actually say we should just mirror the existing > #ifdef USE_SYSTEMD > if (!EnableHotStandby) > sd_notify(0, "READY=1"); > #endif > with corresponding pidfile updates - doesn't really seem necessary for > pg_ctl to do more? Hm. Take that a bit further, and we could drop the connection probes altogether --- just put the whole responsibility on the postmaster to show in the pidfile whether it's ready for connections or not. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
On 2017-06-26 17:30:30 -0400, Tom Lane wrote: > Andres Freund writes: > > It'd be quite possible to address the race-condition by moving the > > updating of the control file to postmaster, to the > > CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) block. That'd require > > updating the control file from postmaster, which'd be somewhat ugly. > > No, I don't like that at all. Has race conditions against updates > coming from the startup process. You'd obviously have to take the appropriate locks. I think the issue here is less race conditions, and more that architecturally we'd interact with shmem too much. > > Perhaps that indicates that field shouldn't be in pg_control, but in the > > pid file? > > Yeah, that would be a different way to go at it. The postmaster would > probably just write the state of the hot_standby GUC to the file, and > pg_ctl would have to infer things from there. I'd actually say we should just mirror the existing #ifdef USE_SYSTEMD if (!EnableHotStandby) sd_notify(0, "READY=1"); #endif with corresponding pidfile updates - doesn't really seem necessary for pg_ctl to do more? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
Andres Freund writes: > It'd be quite possible to address the race-condition by moving the > updating of the control file to postmaster, to the > CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) block. That'd require > updating the control file from postmaster, which'd be somewhat ugly. No, I don't like that at all. Has race conditions against updates coming from the startup process. > Perhaps that indicates that field shouldn't be in pg_control, but in the > pid file? Yeah, that would be a different way to go at it. The postmaster would probably just write the state of the hot_standby GUC to the file, and pg_ctl would have to infer things from there. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
Hi, On 2017-06-26 16:49:07 -0400, Tom Lane wrote: > Andres Freund writes: > > Arguably we could and should improve the logic when the server has > > started, right now it's pretty messy because we never treat a standby as > > up if hot_standby is disabled... > > True. If you could tell the difference between "HS disabled" and "HS not > enabled yet" from pg_control, that would make pg_ctl's behavior with > cold-standby servers much cleaner. Maybe it *is* worth messing with the > contents of pg_control at this late hour. I'm +0.5. > My inclination for the least invasive fix is to leave the DBState > enum alone and add a separate hot-standby state field with three > values (disabled/not-yet-enabled/enabled). Yea, that seems sane. > Then pg_ctl would start > probing the postmaster when it saw either DB_IN_PRODUCTION DBstate > or hot-standby-enabled. (It'd almost not have to probe the postmaster > at all, except there's a race condition that the startup process > will probably change the field a little before the postmaster gets > the word to open the gates.) On the other hand, if it saw > DB_IN_ARCHIVE_RECOVERY with hot standby disabled, it'd stop waiting. It'd be quite possible to address the race-condition by moving the updating of the control file to postmaster, to the CheckPostmasterSignal(PMSIGNAL_BEGIN_HOT_STANDBY) block. That'd require updating the control file from postmaster, which'd be somewhat ugly. Perhaps that indicates that field shouldn't be in pg_control, but in the pid file? Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
Andres Freund writes: > Arguably we could and should improve the logic when the server has > started, right now it's pretty messy because we never treat a standby as > up if hot_standby is disabled... True. If you could tell the difference between "HS disabled" and "HS not enabled yet" from pg_control, that would make pg_ctl's behavior with cold-standby servers much cleaner. Maybe it *is* worth messing with the contents of pg_control at this late hour. My inclination for the least invasive fix is to leave the DBState enum alone and add a separate hot-standby state field with three values (disabled/not-yet-enabled/enabled). Then pg_ctl would start probing the postmaster when it saw either DB_IN_PRODUCTION DBstate or hot-standby-enabled. (It'd almost not have to probe the postmaster at all, except there's a race condition that the startup process will probably change the field a little before the postmaster gets the word to open the gates.) On the other hand, if it saw DB_IN_ARCHIVE_RECOVERY with hot standby disabled, it'd stop waiting. Any objections to that design sketch? Do we need to distinguish between master and slave servers in the when-to-stop-waiting logic? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
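That design sketch can be written down as a decision table. All names below are invented for illustration (the real field would live alongside DBState in pg_control), but the logic matches the sketch: probe on DB_IN_PRODUCTION or hot-standby-enabled, stop waiting on archive recovery with hot standby disabled, otherwise keep waiting:

```c
/* Hypothetical three-valued hot-standby field for pg_control. */
typedef enum
{
    HS_DISABLED,                /* hot_standby = off */
    HS_NOT_YET_ENABLED,         /* recovery running, connections not allowed yet */
    HS_ENABLED                  /* read-only connections allowed */
} HotStandbyState;

typedef enum
{
    WAIT_MORE,                  /* keep watching pg_control */
    PROBE_POSTMASTER,           /* start connection probes */
    GIVE_UP                     /* cold standby: will never accept connections */
} WaitDecision;

/* pg_ctl's wait decision, per the design sketch above. */
WaitDecision
pg_ctl_decide(int in_production, int in_archive_recovery, HotStandbyState hs)
{
    if (in_production || hs == HS_ENABLED)
        return PROBE_POSTMASTER;
    if (in_archive_recovery && hs == HS_DISABLED)
        return GIVE_UP;
    return WAIT_MORE;
}
```

The probe step covers the race Tom mentions: the startup process flips the field slightly before the postmaster actually opens the gates, so a short window of connection probing remains.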
Re: [HACKERS] Pluggable storage
On Mon, Jun 26, 2017 at 10:55 PM, Tomas Vondra wrote: > On 06/26/2017 05:18 PM, Alexander Korotkov wrote: > >> I see that design question for PostgreSQL pluggable storages is very hard. >> > > IMHO it's mostly expected to be hard. > > Firstly, PostgreSQL is a mature product with many advanced features, and > reworking a low-level feature without breaking something on top of it is > hard by definition. > > Secondly, project policies and code quality requirements set the bar very > high too, I think. Sure. > BTW, I think it worth analyzing existing use-cases of pluggable > >> storages. I think that the most famous DBMS with pluggable storage API >> is MySQL. This why I decided to start with it. I've added >> MySQL/MariaDB section on wiki page. >> https://wiki.postgresql.org/wiki/Future_of_storage#MySQL.2FMariaDB >> It appears that significant part of MySQL storage engines are misuses. >> MySQL lacks of various features like FDWs or writable views and so on. >> This is why developers created a lot of pluggable storages for that >> purposes. We definitely don't want something like this in PostgreSQL now. >> I created special resume column where I expressed whether it >> would be nice to have something like this table engine in PostgreSQL. >> > > I don't want to discourage you, but I'm not sure how valuable this is. > > I agree it's valuable to have a an over-view of use cases for pluggable > storage, but I don't think we'll get that from looking at MySQL. As you > noticed, most of the storage engines are misuses, so it's difficult to > learn anything valuable from them. You can argue that using FDWs to > implement alternative storage engines is a misuse too, but at least that > gives us a valid use case (columnar storage implemented using FDW). 
> > If anything, the MySQL storage engines should serve as a cautionary tale > how not to do things - there's also a plenty of references in the MySQL > "Restrictions and Limitations" section of the manual: > > https://downloads.mysql.com/docs/mysql-reslimits-excerpt-5.7-en.pdf Just to clarify: I don't propose any adoption of the MySQL pluggable storage API into PostgreSQL or anything like that. I just wrote this table for completeness of vision. It may turn out that somebody draws some valuable conclusions from it, or it may not. "Yes" in the third column means only that there are positive user-visible effects which are *nice to have* in PostgreSQL. Also, I remember there was a table comparing different designs of pluggable storages and their use-cases at the PGCon 2017 unconference. Could somebody reproduce it? -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] Reducing pg_ctl's reaction time
I wrote: > Andres Freund writes: >> It'd not be unreasonble to check pg_control first, and only after that >> indicates readyness check via the protocol. > Hm, that's a thought. The problem here isn't the frequency of checks, > but the log spam. Actually, that wouldn't help much as things stand, because you can't tell from pg_control whether hot standby is active. Assuming that we want "pg_ctl start" to come back as soon as connections are allowed, it'd have to start probing when it sees DB_IN_ARCHIVE_RECOVERY, which means Jeff still has a problem with long recovery sessions. We could maybe address that by changing the set of states in pg_control (or perhaps simpler, adding a "hot standby active" flag there). That might have wider consequences than we really want to deal with post-beta1 though. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
On 2017-06-26 16:26:00 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-06-26 16:19:16 -0400, Tom Lane wrote: > >> Sure, what do you think an appropriate behavior would be? > > > It'd not be unreasonble to check pg_control first, and only after that > > indicates readyness check via the protocol. > > Hm, that's a thought. The problem here isn't the frequency of checks, > but the log spam. Right. I think to deal with hot-standby we'd probably have to add new state to the control file however. We don't just want to treat the server as ready once DB_IN_PRODUCTION is reached. Arguably we could and should improve the logic when the server has started, right now it's pretty messy because we never treat a standby as up if hot_standby is disabled... - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
Andres Freund writes: > On 2017-06-26 16:19:16 -0400, Tom Lane wrote: >> Sure, what do you think an appropriate behavior would be? > It'd not be unreasonable to check pg_control first, and only after that > indicates readiness, check via the protocol. Hm, that's a thought. The problem here isn't the frequency of checks, but the log spam. > Doesn't quite seem like something backpatchable tho. I didn't back-patch the pg_ctl change anyway, so that's no issue. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] \set AUTOROLLBACK ON
On Mon, Jun 26, 2017 at 12:35:47PM -0700, David G. Johnston wrote: > On Mon, Jun 26, 2017 at 12:19 PM, David Fetter wrote: > > > On Mon, Jun 26, 2017 at 04:00:55PM +0200, Joel Jacobson wrote: > > > Hi hackers, > > > > > > A colleague of mine wondered if there is a way to always run > > > everything you type into psql in a db txn and automatically rollback > > > it as soon as it finish. > > > I couldn't think of any way to do so, but thought it would be a nice > > > feature and probably quite easy to add to psql, so I thought I should > > > suggest it here. > > > > > > The typical use-case is you are doing something in production that you > > > just want to > > > a) test if some query works like expected and then rollback > > > or, > > > b) read-only queries that should not commit any changes anyway, so > > > here the rollback would just be an extra layer of security, since your > > > SELECT might call volatile functions that are actually not read-only > > > > > > Thoughts? > > > > Multi-statement transactions: > > > > Would flavor of BEGIN TRANSACTION undo the feature? > > If not, would it auto-munge COMMIT into a ROLLBACK? > > > > We already have SET TRANSACTION READ ONLY. Now there's an interesting and potentially fruitful idea. How about exposing GUCs to psql? That way, it'd be possible to put (some transformation of) them in the prompt, etc. > > Side effects: > > > > Let's imagine you have a function called > > ddos_the_entire_internet(message TEXT), or something less drastic > > which nevertheless has side effects the DB can't control. > > > > How should this mode handle it? Should it try to detect calls to > > volatile functions, or should it just silently fail to do what > > it's promised to do? > > > > It doesn't need to promise anything more than what happens today if > someone manually keys in > > BEGIN; > [...] > ROLLBACK; > > using psql prompts. Seems reasonable :) Best, David. 
-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
On 2017-06-26 16:19:16 -0400, Tom Lane wrote: > Jeff Janes writes: > > The 10 fold increase in log spam during long PITR recoveries is a bit > > unfortunate. > > 9153 2017-06-26 12:55:40.243 PDT FATAL: the database system is starting up > > 9154 2017-06-26 12:55:40.345 PDT FATAL: the database system is starting up > > 9156 2017-06-26 12:55:40.447 PDT FATAL: the database system is starting up > > 9157 2017-06-26 12:55:40.550 PDT FATAL: the database system is starting up > > ... > > I can live with it, but could we use an escalating wait time so it slows > > back down to once a second after a while? > Sure, what do you think an appropriate behavior would be? It'd not be unreasonable to check pg_control first, and only after that indicates readiness, check via the protocol. Doesn't quite seem like something backpatchable tho. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
Jeff Janes writes: > The 10 fold increase in log spam during long PITR recoveries is a bit > unfortunate. > 9153 2017-06-26 12:55:40.243 PDT FATAL: the database system is starting up > 9154 2017-06-26 12:55:40.345 PDT FATAL: the database system is starting up > 9156 2017-06-26 12:55:40.447 PDT FATAL: the database system is starting up > 9157 2017-06-26 12:55:40.550 PDT FATAL: the database system is starting up > ... > I can live with it, but could we use an escalating wait time so it slows > back down to once a second after a while? Sure, what do you think an appropriate behavior would be? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
On Mon, Jun 26, 2017 at 12:15 PM, Tom Lane wrote: > Michael Paquier writes: > > On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane wrote: > >> The attached proposed patch adjusts pg_ctl to check every 100msec, > >> instead of every second, for the postmaster to be done starting or > >> stopping. > > >> +#define WAITS_PER_SEC 10 /* should divide 100 evenly */ > > > As a matter of style, you could define 100 as well in a variable > > and refer to the variable for the division. > > Good idea, done that way. (My initial thought was to use USECS_PER_SEC > from timestamp.h, but that's declared as int64 which would have > complicated matters, so I just made a new symbol.) > > > This also pops up more easily failures with 001_stream_rep.pl without > > a patch applied from the other recent thread, so this patch had better > > not get in before anything from > > https://www.postgresql.org/message-id/8962.1498425...@sss.pgh.pa.us. > > Check. I pushed your fix for that first. > > Thanks for the review! > > regards, tom lane > The 10 fold increase in log spam during long PITR recoveries is a bit unfortunate. 9153 2017-06-26 12:55:40.243 PDT FATAL: the database system is starting up 9154 2017-06-26 12:55:40.345 PDT FATAL: the database system is starting up 9156 2017-06-26 12:55:40.447 PDT FATAL: the database system is starting up 9157 2017-06-26 12:55:40.550 PDT FATAL: the database system is starting up ... I can live with it, but could we use an escalating wait time so it slows back down to once a second after a while? Cheers, Jeff
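One shape the requested escalation could take, sketched with assumed numbers (a 100 ms floor doubling per failed probe, capped at the old 1-second cadence — the specific constants are illustrative, not a proposal the thread has settled on):

```c
/*
 * Sketch of an escalating probe interval for pg_ctl: probe fast at
 * first so a quick startup is noticed promptly, then back off toward
 * the old once-per-second cadence so a long recovery stops flooding
 * the server log with "database system is starting up" rejections.
 */
long
next_probe_interval_ms(long current_ms)
{
    long next = current_ms * 2;     /* double after each failed probe */

    return (next > 1000) ? 1000 : next; /* cap at the old 1 s cadence */
}
```

Starting at 100 ms, the sequence is 200, 400, 800, then 1000 ms thereafter, so the startup-latency benefit of the patch is kept while steady-state log volume returns to roughly the pre-patch rate.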
Re: [HACKERS] Pluggable storage
Hi, On 06/26/2017 05:18 PM, Alexander Korotkov wrote: Hackers, I see that design question for PostgreSQL pluggable storages is very hard. IMHO it's mostly expected to be hard. Firstly, PostgreSQL is a mature product with many advanced features, and reworking a low-level feature without breaking something on top of it is hard by definition. Secondly, project policies and code quality requirements set the bar very high too, I think. > BTW, I think it worth analyzing existing use-cases of pluggable storages. I think that the most famous DBMS with pluggable storage API is MySQL. This why I decided to start with it. I've added MySQL/MariaDB section on wiki page. https://wiki.postgresql.org/wiki/Future_of_storage#MySQL.2FMariaDB It appears that significant part of MySQL storage engines are misuses. MySQL lacks of various features like FDWs or writable views and so on. This is why developers created a lot of pluggable storages for that purposes. We definitely don't want something like this in PostgreSQL now. I created special resume column where I expressed whether it would be nice to have something like this table engine in PostgreSQL. I don't want to discourage you, but I'm not sure how valuable this is. I agree it's valuable to have a an over-view of use cases for pluggable storage, but I don't think we'll get that from looking at MySQL. As you noticed, most of the storage engines are misuses, so it's difficult to learn anything valuable from them. You can argue that using FDWs to implement alternative storage engines is a misuse too, but at least that gives us a valid use case (columnar storage implemented using FDW). 
If anything, the MySQL storage engines should serve as a cautionary tale how not to do things - there's also a plenty of references in the MySQL "Restrictions and Limitations" section of the manual: https://downloads.mysql.com/docs/mysql-reslimits-excerpt-5.7-en.pdf regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] \set AUTOROLLBACK ON
On Mon, Jun 26, 2017 at 12:19 PM, David Fetter wrote: > On Mon, Jun 26, 2017 at 04:00:55PM +0200, Joel Jacobson wrote: > > Hi hackers, > > > > A colleague of mine wondered if there is a way to always run > > everything you type into psql in a db txn and automatically rollback > > it as soon as it finish. > > I couldn't think of any way to do so, but thought it would be a nice > > feature and probably quite easy to add to psql, so I thought I should > > suggest it here. > > > > The typical use-case is you are doing something in production that you > > just want to > > a) test if some query works like expected and then rollback > > or, > > b) read-only queries that should not commit any changes anyway, so > > here the rollback would just be an extra layer of security, since your > > SELECT might call volatile functions that are actually not read-only > > > > Thoughts? > > Multi-statement transactions: > > Would some flavor of BEGIN TRANSACTION undo the feature? > If not, would it auto-munge COMMIT into a ROLLBACK? > We already have SET TRANSACTION READ ONLY. If you begin a transaction and do not issue an explicit commit, the default action when the session closes is ROLLBACK. At some point, if you want to use SQL features you need to write SQL - not pass command-line arguments to the client. See also ".psqlrc" and shell functions/aliases. This doesn't seem like material to build into psql, but since the proposal lacks an envisioned usage it's hard to say conclusively. Interplay with the various ways to source SQL, and existing arguments, is a prime area of concern. Side effects: > > Let's imagine you have a function called > ddos_the_entire_internet(message TEXT), or something less drastic > which nevertheless has side effects the DB can't control. > > How should this mode handle it? Should it try to detect calls to > volatile functions, or should it just silently fail to do what > it's promised to do? 
> It doesn't need to promise anything more than what happens today if someone manually keys in BEGIN; [...] ROLLBACK; using psql prompts. David J.
Re: [HACKERS] \set AUTOROLLBACK ON
On Mon, Jun 26, 2017 at 04:00:55PM +0200, Joel Jacobson wrote: > Hi hackers, > > A colleague of mine wondered if there is a way to always run > everything you type into psql in a db txn and automatically rollback > it as soon as it finish. > I couldn't think of any way to do so, but thought it would be a nice > feature and probably quite easy to add to psql, so I thought I should > suggest it here. > > The typical use-case is you are doing something in production that you > just want to > a) test if some query works like expected and then rollback > or, > b) read-only queries that should not commit any changes anyway, so > here the rollback would just be an extra layer of security, since your > SELECT might call volatile functions that are actually not read-only > > Thoughts? Multi-statement transactions: Would some flavor of BEGIN TRANSACTION undo the feature? If not, would it auto-munge COMMIT into a ROLLBACK? Side effects: Let's imagine you have a function called ddos_the_entire_internet(message TEXT), or something less drastic which nevertheless has side effects the DB can't control. How should this mode handle it? Should it try to detect calls to volatile functions, or should it just silently fail to do what it's promised to do? Best, David. -- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Reducing pg_ctl's reaction time
Michael Paquier writes: > On Mon, Jun 26, 2017 at 7:13 AM, Tom Lane wrote: >> The attached proposed patch adjusts pg_ctl to check every 100msec, >> instead of every second, for the postmaster to be done starting or >> stopping. >> +#define WAITS_PER_SEC 10 /* should divide 100 evenly */ > As a matter of style, you could define 100 as well in a variable > and refer to the variable for the division. Good idea, done that way. (My initial thought was to use USECS_PER_SEC from timestamp.h, but that's declared as int64 which would have complicated matters, so I just made a new symbol.) > This also pops up more easily failures with 001_stream_rep.pl without > a patch applied from the other recent thread, so this patch had better > not get in before anything from > https://www.postgresql.org/message-id/8962.1498425...@sss.pgh.pa.us. Check. I pushed your fix for that first. Thanks for the review! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
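The style point being agreed on here amounts to deriving the sleep from two named constants rather than a bare 100 and 1000000. The names below are stand-ins (USEC_PER_SEC here is a plain int, since timestamp.h's USECS_PER_SEC is int64 and awkward for this use, as Tom notes):

```c
/*
 * Per-check sleep derived from named constants, per the review comment.
 * The symbol names are illustrative stand-ins for whatever the patch
 * actually defines.
 */
#define USEC_PER_SEC 1000000
#define WAITS_PER_SEC 10            /* should divide USEC_PER_SEC evenly */

int
wait_interval_usec(void)
{
    return USEC_PER_SEC / WAITS_PER_SEC;    /* 100 ms between checks */
}
```

Keeping the divisibility comment next to WAITS_PER_SEC documents the only constraint on changing the cadence later.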
Re: [HACKERS] Timing-sensitive case in src/test/recovery TAP tests
Michael Paquier writes: > On Mon, Jun 26, 2017 at 12:12 PM, Craig Ringer wrote: >> I'm not sure I understand this. > The patch attached may explain that better. Your patch added 3 poll > phases. I think that a 4th is needed. At the same time I have found > cleaner to put the poll calls into a small wrapper. Looks good as far as it goes, but I wonder whether any of the other get_slot_xmins calls need polling too. Don't feel a need to add such calls until someone exhibits a failure there, but I won't be very surprised if someone does. Anyway, pushed this patch with minor editing. Thanks! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another reason why the recovery tests take a long time
On 26 June 2017 at 19:06, Tom Lane wrote: > I wrote: >> So this looks like a pretty obvious race condition in the postmaster, >> which should be resolved by having it set a flag on receipt of >> PMSIGNAL_START_WALRECEIVER that's cleared only when it does start a >> new walreceiver. > > Concretely, I propose the attached patch. Together with reducing > wal_retrieve_retry_interval to 500ms, which I propose having > PostgresNode::init do in its standard postgresql.conf adjustments, > this takes the runtime of the recovery TAP tests down from 2m50s > (after the patches I posted yesterday) to 1m30s. Patch looks good > I think there's still gold to be mined, because "top" is still > showing pretty low CPU load over most of the run, but this is > lots better than 4m30s. Thanks for looking into this -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another reason why the recovery tests take a long time
I wrote: > So this looks like a pretty obvious race condition in the postmaster, > which should be resolved by having it set a flag on receipt of > PMSIGNAL_START_WALRECEIVER that's cleared only when it does start a > new walreceiver. Concretely, I propose the attached patch. Together with reducing wal_retrieve_retry_interval to 500ms, which I propose having PostgresNode::init do in its standard postgresql.conf adjustments, this takes the runtime of the recovery TAP tests down from 2m50s (after the patches I posted yesterday) to 1m30s. I think there's still gold to be mined, because "top" is still showing pretty low CPU load over most of the run, but this is lots better than 4m30s. regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 83e99b7..6c79c36 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** static volatile sig_atomic_t start_autov
*** 357,362 ****
--- 357,365 ----
  /* the launcher needs to be signalled to communicate some condition */
  static volatile bool avlauncher_needs_signal = false;
  
+ /* received START_WALRECEIVER signal */
+ static volatile sig_atomic_t WalReceiverRequested = false;
+ 
  /* set when there's a worker that needs to be started up */
  static volatile bool StartWorkerNeeded = true;
  static volatile bool HaveCrashedWorker = false;
*************** static void maybe_start_bgworkers(void);
*** 426,431 ****
--- 429,435 ----
  static bool CreateOptsFile(int argc, char *argv[], char *fullprogname);
  static pid_t StartChildProcess(AuxProcType type);
  static void StartAutovacuumWorker(void);
+ static void MaybeStartWalReceiver(void);
  static void InitPostmasterDeathWatchHandle(void);
  
  /*
*************** ServerLoop(void)
*** 1810,1815 ****
--- 1814,1823 ----
              kill(AutoVacPID, SIGUSR2);
          }
  
+         /* If we need to start a WAL receiver, try to do that now */
+         if (WalReceiverRequested)
+             MaybeStartWalReceiver();
+ 
          /* Get other worker processes running, if needed */
          if (StartWorkerNeeded || HaveCrashedWorker)
              maybe_start_bgworkers();
*************** reaper(SIGNAL_ARGS)
*** 2958,2964 ****
      /*
       * Was it the wal receiver?  If exit status is zero (normal) or one
       * (FATAL exit), we assume everything is all right just like normal
!      * backends.
       */
      if (pid == WalReceiverPID)
      {
--- 2966,2973 ----
      /*
       * Was it the wal receiver?  If exit status is zero (normal) or one
       * (FATAL exit), we assume everything is all right just like normal
!      * backends.  (If we need a new wal receiver, we'll start one at the
!      * next iteration of the postmaster's main loop.)
       */
      if (pid == WalReceiverPID)
      {
*************** sigusr1_handler(SIGNAL_ARGS)
*** 5066,5079 ****
          StartAutovacuumWorker();
      }
  
!     if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER) &&
!         WalReceiverPID == 0 &&
!         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
!          pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
!         Shutdown == NoShutdown)
      {
          /* Startup Process wants us to start the walreceiver process. */
!         WalReceiverPID = StartWalReceiver();
      }
  
      if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE) &&
--- 5075,5086 ----
          StartAutovacuumWorker();
      }
  
!     if (CheckPostmasterSignal(PMSIGNAL_START_WALRECEIVER))
      {
          /* Startup Process wants us to start the walreceiver process. */
!         /* Start immediately if possible, else remember request for later. */
!         WalReceiverRequested = true;
!         MaybeStartWalReceiver();
      }
  
      if (CheckPostmasterSignal(PMSIGNAL_ADVANCE_STATE_MACHINE) &&
*************** StartAutovacuumWorker(void)
*** 5410,5415 ****
--- 5417,5440 ----
  }
  
  /*
+  * MaybeStartWalReceiver
+  *		Start the WAL receiver process, if requested and our state allows.
+  */
+ static void
+ MaybeStartWalReceiver(void)
+ {
+     if (WalReceiverPID == 0 &&
+         (pmState == PM_STARTUP || pmState == PM_RECOVERY ||
+          pmState == PM_HOT_STANDBY || pmState == PM_WAIT_READONLY) &&
+         Shutdown == NoShutdown)
+     {
+         WalReceiverPID = StartWalReceiver();
+         WalReceiverRequested = false;
+     }
+ }
+ 
+ 
+ /*
   * Create the opts file
   */
  static bool

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another reason why the recovery tests take a long time
On 2017-06-26 13:42:52 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2017-06-26 12:32:00 -0400, Tom Lane wrote: > >> ... But I wonder whether it's intentional that the old > >> walreceiver dies in the first place. That FATAL exit looks suspiciously > >> like it wasn't originally-designed-in behavior. > > > It's quite intentional afaik - I've complained about the bad error > > message recently (we really shouldn't say "no COPY in progress"), but > > exiting seems quite reasonable. Otherwise we'd have to add separate > > retry logic into the walsender, that reconnects without a new walsender > > being started. > > Ah, I see. I'd misinterpreted the purpose of the infinite loop in > WalReceiverMain() --- now I see that's for receiving requests from the > startup proc for different parts of the WAL stream, not for reconnecting > to the master. Right. And if the connection fails, we intentionally (whether that's good or bad is another question) switch to restore_command (or pg_xlog...) based recovery, in which case we certainly do not want the walsender around. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Another reason why the recovery tests take a long time
Andres Freund writes: > On 2017-06-26 12:32:00 -0400, Tom Lane wrote: >> ... But I wonder whether it's intentional that the old >> walreceiver dies in the first place. That FATAL exit looks suspiciously >> like it wasn't originally-designed-in behavior. > It's quite intentional afaik - I've complained about the bad error > message recently (we really shouldn't say "no COPY in progress"), but > exiting seems quite reasonable. Otherwise we'd have to add separate > retry logic into the walsender, that reconnects without a new walsender > being started. Ah, I see. I'd misinterpreted the purpose of the infinite loop in WalReceiverMain() --- now I see that's for receiving requests from the startup proc for different parts of the WAL stream, not for reconnecting to the master. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fix empty array expression in get_qual_for_list()
On Mon, Jun 26, 2017 at 8:14 PM, Tom Lane wrote: > Jeevan Ladhe writes: > > In case of list partitioned table: > > 1. If there is a partition accepting only null values and nothing else, > then > > currently the partition constraints for such a partition are constructed > as > > "((a IS NULL) OR (a = ANY (ARRAY[]::integer[])))". > > I think there it is better to avoid constructing an empty array to avoid > > executing ANY expression. > > Looks like a good idea, pushed with trivial stylistic adjustments. > Thanks Tom for taking care of this. Regards, Jeevan Ladhe
[HACKERS] pg_basebackup fails on Windows when using tablespace mapping
Summary: Trying to take a `pg_basebackup -T OLDDIR=NEWDIR [etc]` on master (the server running the cluster) fails on Windows with the error "pg_basebackup: directory "OLDDIR" exists but is not empty".

Version: 9.6.2, installed from the standard EDB package (built with MSVC).

Repro steps:
1. Have a cluster running on Windows (you'll need max_wal_senders at least 2 and wal_level replica for pg_basebackup -X stream)
2. create tablespace testspc location 'somedisk:\some\location'; -- Slash direction is irrelevant
3. run `pg_basebackup -T somedisk:\some\location=somedisk:\new\location -X stream -D somedisk:\some\other_location -h 127.0.0.1 -U postgres`

The error should read: pg_basebackup: directory "somedisk:\some\location" exists but is not empty

---

This was discussed today in IRC. As a temporary solution it was suggested to add 'canonicalize_path(buf);' before the return in /src/port/dirmod.c:336

** DISCLAIMER: note that the value of r is not adjusted, so this is not a production-ready fix in any way. **

This fixed the issue, but as a side effect, pg_tablespace_location displays the path in canonicalized format, which might look weird to most Windows users. There was a suggestion to fix this in the client instead, but this fix was the simplest one to implement. This is my first post to Hackers, so please let me know if I missed something or can provide any additional info to help further investigate this issue. Kind regards, Nick.
Re: [HACKERS] Another reason why the recovery tests take a long time
Hi, On 2017-06-26 12:32:00 -0400, Tom Lane wrote: > I've found another edge-case bug through investigation of unexpectedly > slow recovery test runs. It goes like this: > > * While streaming from master to slave, test script shuts down master > while slave is left running. We soon restart the master, but meanwhile: > > * slave's walreceiver process fails, reporting > > 2017-06-26 16:06:50.209 UTC [13209] LOG: replication terminated by primary > server > 2017-06-26 16:06:50.209 UTC [13209] DETAIL: End of WAL reached on timeline 1 > at 0/398. > 2017-06-26 16:06:50.209 UTC [13209] FATAL: could not send end-of-streaming > message to primary: no COPY in progress > > * slave's startup process observes that walreceiver is gone and sends > PMSIGNAL_START_WALRECEIVER to ask for a new one > > * more often than you would guess, in fact nearly 100% reproducibly for > me, the postmaster receives/services the PMSIGNAL before it receives > SIGCHLD for the walreceiver. In this situation sigusr1_handler just > throws away the walreceiver start request, reasoning that the walreceiver > is already running. Yuck. I've recently seen a bunch of symptoms vaguely around this, e.g. I can atm frequently reconnect to a new backend after a PANIC/segfault/whatnot, before the postmaster gets the message. I've not yet figured out whether that's a kernel change, or whether some of the more recent tinkering in postmaster.c changed this. > * eventually, it dawns on the startup process that the walreceiver > isn't starting, and it asks for a new one. But that takes ten seconds > (WALRCV_STARTUP_TIMEOUT). > > So this looks like a pretty obvious race condition in the postmaster, > which should be resolved by having it set a flag on receipt of > PMSIGNAL_START_WALRECEIVER that's cleared only when it does start a > new walreceiver. But I wonder whether it's intentional that the old > walreceiver dies in the first place. That FATAL exit looks suspiciously > like it wasn't originally-designed-in behavior. 
It's quite intentional afaik - I've complained about the bad error message recently (we really shouldn't say "no COPY in progress"), but exiting seems quite reasonable. Otherwise we'd have to add separate retry logic into the walsender, that reconnects without a new walsender being started. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] GSoC 2017: Foreign Key Arrays
On Mon, Jun 26, 2017 at 2:26 AM, Mark Rofail wrote: > *What I did:* > > > >- read into the old patch but couldn't apply it since it's quite old. >It needs to be rebased and that's what I am working on. It's a lot of >work. > - incomplete patch can be found attached here > > Have you run into any particular problem here? Or is it just a lot of mechanical work? *Bugs* > >- problem with the @>(anyarray, anyelement) operator: if, for example, >you apply the operator as follows '{AA646}' @> 'AA646', it >maps to @>(anyarray, anyarray) since 'AA646' is interpreted as >char[] instead of Text > > I don't think it is a bug. When types are not specified explicitly, the optimizer does its best at guessing them. Sometimes the results are counterintuitive to the user. But that is not a bug; it's probably room for improvement. And I don't think this improvement should be the subject of this GSoC. Anyway, the array FK code should use an explicit type cast, and then you wouldn't hit this problem. On the other hand, you could just choose another operator name for arraycontainselem. Then such a problem probably wouldn't occur. *Suggestion:* > >- since I needed to check if the Datum was null and its type, I had to >do it in the arraycontainselem and pass it as a parameter to the underlying >function array_contains_elem. I'm proposing to introduce a new struct like >ArrayType, but ElementType, along with brand new MACROs to make dealing >with anyelement easier in any polymorphic context. > > You don't need to do an explicit check for nulls, because arraycontainselem is marked as a strict function. The executor never passes null inputs to your function if it's declared as strict. See evaluate_function(). Also, during query planning it's checked that all polymorphic types are consistent between each other. See https://www.postgresql.org/docs/devel/static/extend-type-system.html#extend-types-polymorphic and check_generic_type_consistency() for details. 
-- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] Another reason why the recovery tests take a long time
I've found another edge-case bug through investigation of unexpectedly slow recovery test runs. It goes like this: * While streaming from master to slave, test script shuts down master while slave is left running. We soon restart the master, but meanwhile: * slave's walreceiver process fails, reporting 2017-06-26 16:06:50.209 UTC [13209] LOG: replication terminated by primary server 2017-06-26 16:06:50.209 UTC [13209] DETAIL: End of WAL reached on timeline 1 at 0/398. 2017-06-26 16:06:50.209 UTC [13209] FATAL: could not send end-of-streaming message to primary: no COPY in progress * slave's startup process observes that walreceiver is gone and sends PMSIGNAL_START_WALRECEIVER to ask for a new one * more often than you would guess, in fact nearly 100% reproducibly for me, the postmaster receives/services the PMSIGNAL before it receives SIGCHLD for the walreceiver. In this situation sigusr1_handler just throws away the walreceiver start request, reasoning that the walreceiver is already running. * eventually, it dawns on the startup process that the walreceiver isn't starting, and it asks for a new one. But that takes ten seconds (WALRCV_STARTUP_TIMEOUT). So this looks like a pretty obvious race condition in the postmaster, which should be resolved by having it set a flag on receipt of PMSIGNAL_START_WALRECEIVER that's cleared only when it does start a new walreceiver. But I wonder whether it's intentional that the old walreceiver dies in the first place. That FATAL exit looks suspiciously like it wasn't originally-designed-in behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pluggable storage
Hackers, I see that the design question for PostgreSQL pluggable storages is very hard. BTW, I think it is worth analyzing existing use-cases of pluggable storages. I think that the most famous DBMS with a pluggable storage API is MySQL. This is why I decided to start with it. I've added a MySQL/MariaDB section on the wiki page. https://wiki.postgresql.org/wiki/Future_of_storage#MySQL.2FMariaDB It appears that a significant part of MySQL storage engines are misuses. MySQL lacks various features like FDWs or writable views and so on. This is why developers created a lot of pluggable storages for those purposes. We definitely don't want something like this in PostgreSQL now. I created a special resume column where I expressed whether it would be nice to have something like this table engine in PostgreSQL. Any comments and additions are welcome. I'm planning to write a similar table for MongoDB. -- Alexander Korotkov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
On 06/26/2017 10:45 AM, Tom Lane wrote: > Andrew Dunstan writes: >> On 06/23/2017 07:47 AM, Andrew Dunstan wrote: >>> Rerunning with some different settings to see if I can get separate cores. >> Numerous attempts to get core dumps following methods suggested in >> Google searches have failed. The latest one is just hanging. > Well, if it's hung, maybe you could attach to the processes with gdb and > get stack traces manually? > > In theory what I have set up is supposed to be doing that, but I'll try. cheers andrew -- Andrew Dunstan  https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
Andrew Dunstan writes: > On 06/23/2017 07:47 AM, Andrew Dunstan wrote: >> Rerunning with some different settings to see if I can get separate cores. > Numerous attempts to get core dumps following methods suggested in > Google searches have failed. The latest one is just hanging. Well, if it's hung, maybe you could attach to the processes with gdb and get stack traces manually? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fix empty array expression in get_qual_for_list()
Jeevan Ladhe writes: > In case of list partitioned table: > 1. If there is a partition accepting only null values and nothing else, then > currently the partition constraints for such a partition are constructed as > "((a IS NULL) OR (a = ANY (ARRAY[]::integer[])))". > I think it is better to avoid constructing an empty array, and thereby avoid > executing the ANY expression. Looks like a good idea, pushed with trivial stylistic adjustments. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
On 06/26/2017 10:36 AM, Amit Kapila wrote: > On Fri, Jun 23, 2017 at 9:12 AM, Andrew Dunstan > wrote: >> >> On 06/22/2017 10:24 AM, Tom Lane wrote: >>> Andrew Dunstan writes: Please let me know if there are tests I can run. I missed your earlier request in this thread, sorry about that. >>> That earlier request is still valid. Also, if you can reproduce the >>> symptom that lorikeet just showed and get a stack trace from the >>> (hypothetical) postmaster core dump, that would be hugely valuable. >>> >>> >> >> See attached log and stacktrace >> > Is this all the log contents or is there something else? The attached > log looks strange to me in the sense that the first worker that gets > invoked is Parallel worker number 2 and it finds that somebody else > has already set the sender handle. > No, it's the end of the log file. I can rerun and get the whole log file if you like. cheers andrew -- Andrew Dunstan  https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optional message to user when terminating/cancelling backend
On 06/26/2017 07:15 AM, Joel Jacobson wrote: +1 On Tue, Jun 20, 2017 at 8:54 PM, Alvaro Herrera wrote: Unless you have a lot of users running psql manually, I don't see how this is actually very useful or actionable. What would the user do with the information? Hopefully your users already trust that you'd keep the downtime to the minimum possible. I think this feature would be useful for PgTerminator (https://github.com/trustly/pgterminator) a tool which automatically kills unprotected processes that could potentially be the reason why X number of protected important processes have been waiting for >Y seconds. When I'm guilty of locking this in the production DB and get killed by PgTerminator, it would be nice to know the reason, e.g. that it was PgTerminator that killed me and what process I was blocking. And not just the pid but literally "what". jD -- Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc PostgreSQL Centered full stack support, consulting and development. Advocate: @amplifypostgres || Learn: https://pgconf.us * Unless otherwise stated, opinions are my own. * -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
On Fri, Jun 23, 2017 at 9:12 AM, Andrew Dunstan wrote: > > > On 06/22/2017 10:24 AM, Tom Lane wrote: >> Andrew Dunstan writes: >>> Please let me know if there are tests I can run. I missed your earlier >>> request in this thread, sorry about that. >> That earlier request is still valid. Also, if you can reproduce the >> symptom that lorikeet just showed and get a stack trace from the >> (hypothetical) postmaster core dump, that would be hugely valuable. >> >> > > > See attached log and stacktrace > Is this all the log contents or is there something else? The attached log looks strange to me in the sense that the first worker that gets invoked is Parallel worker number 2 and it finds that somebody else has already set the sender handle. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optional message to user when terminating/cancelling backend
+1 On Tue, Jun 20, 2017 at 8:54 PM, Alvaro Herrera wrote: > Unless you have a lot of users running psql manually, I don't see how > this is actually very useful or actionable. What would the user do with > the information? Hopefully your users already trust that you'd keep the > downtime to the minimum possible. I think this feature would be useful for PgTerminator (https://github.com/trustly/pgterminator) a tool which automatically kills unprotected processes that could potentially be the reason why >X number of protected important processes have been waiting for >Y seconds. When I'm guilty of locking this in the production DB and get killed by PgTerminator, it would be nice to know the reason, e.g. that it was PgTerminator that killed me and what process I was blocking. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] intermittent failures in Cygwin from select_parallel tests
On 06/23/2017 07:47 AM, Andrew Dunstan wrote: > > On 06/23/2017 12:11 AM, Tom Lane wrote: >> Andrew Dunstan writes: >>> On 06/22/2017 10:24 AM, Tom Lane wrote: That earlier request is still valid. Also, if you can reproduce the symptom that lorikeet just showed and get a stack trace from the (hypothetical) postmaster core dump, that would be hugely valuable. >>> See attached log and stacktrace >> The stacktrace seems to be from the parallel-query-leader backend. >> Was there another core file? >> >> The lack of any indication in the postmaster log that the postmaster saw >> the parallel worker's crash sure looks the same as lorikeet's failure. >> But if the postmaster didn't dump core, then we're still at a loss as to >> why it didn't respond. >> >> > > > Rerunning with some different settings to see if I can get separate cores. > Numerous attempts to get core dumps following methods suggested in Google searches have failed. The latest one is just hanging. cheers andrew -- Andrew Dunstan  https://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] \set AUTOROLLBACK ON
Hi hackers, A colleague of mine wondered if there is a way to always run everything you type into psql in a db txn and automatically roll it back as soon as it finishes. I couldn't think of any way to do so, but thought it would be a nice feature and probably quite easy to add to psql, so I thought I should suggest it here. The typical use-case is you are doing something in production where you just want to a) test if some query works as expected and then roll back, or b) run read-only queries that should not commit any changes anyway, so here the rollback would just be an extra layer of security, since your SELECT might call volatile functions that are actually not read-only. Thoughts? /Joel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] A mistake in a comment
Victor Drobny writes: > I believe that I have found a mistake in a comment on the > parse_phrase_operator function. The comment has the following line: > a b (distance is no greater than X) > which is not the case. According to the documentation and practical results, this > line should be changed to something like: > a b (distance is equal to X) Ah, this comment got missed when we changed the definition of . > Patch in the attachments fixes the issue. Will apply, thanks. Looks to me like there's an outright bug here as well: if errno happened to already be ERANGE when we reach the strtol() call, the code will incorrectly report an error. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] fix empty array expression in get_qual_for_list()
Here is an example:

create table t1 ( a int) partition by list (a);
create table t11 partition of t1 for values in (null);

*Current constraints:*

postgres=# \d+ t11;
                                   Table "public.t11"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
Partition of: t1 FOR VALUES IN (NULL)
Partition constraint: ((a IS NULL) OR (a = ANY (ARRAY[]::integer[])))

*Constraints after fix:*

postgres=# \d+ t11;
                                   Table "public.t11"
 Column |  Type   | Collation | Nullable | Default | Storage | Stats target | Description
--------+---------+-----------+----------+---------+---------+--------------+-------------
 a      | integer |           |          |         | plain   |              |
Partition of: t1 FOR VALUES IN (NULL)
Partition constraint: (a IS NULL)

Regards,
Jeevan Ladhe

On Mon, Jun 26, 2017 at 4:53 PM, Jeevan Ladhe wrote:
> Hi,
>
> In case of list partitioned table:
> 1. If there is a partition accepting only null values and nothing else, then
> currently the partition constraints for such a partition are constructed as
> "((a IS NULL) OR (a = ANY (ARRAY[]::integer[])))".
> I think it is better to avoid constructing an empty array, and thereby avoid
> executing the ANY expression.
>
> 2. Also, we are constructing an expression using index 0 of arrays in
> PartitionKey since currently we have only one column for list partition in key,
> added an assert for this.
>
> PFA.
>
> Regards,
> Jeevan Ladhe
[HACKERS] fix empty array expression in get_qual_for_list()
Hi, In case of list partitioned table: 1. If there is a partition accepting only null values and nothing else, then currently the partition constraints for such a partition are constructed as "((a IS NULL) OR (a = ANY (ARRAY[]::integer[])))". I think it is better to avoid constructing an empty array, and thereby avoid executing the ANY expression. 2. Also, we are constructing an expression using index 0 of arrays in PartitionKey since currently we have only one column for list partition in key, added an assert for this. PFA. Regards, Jeevan Ladhe fix_empty_arry_get_qual_for_list.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hello, At Mon, 26 Jun 2017 18:16:42 +0900, Amit Langote wrote in <7f5fe522-f328-3c40-565f-5e1ce3745...@lab.ntt.co.jp> > Hello, > > On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote: > > Hello. > > > > I had a case of an unexpected error caused by ALTER TABLE NO > > INHERIT. > > > > =# CREATE TABLE p (a int); > > =# CREATE TABLE c1 () INHERITS (p); > > > > session A=# BEGIN; > > session A=# ALTER TABLE c1 NO INHERIT p; > > > > session B=# EXPLAIN ANALYZE SELECT * FROM p; > > (blocked) > > > > session A=# COMMIT; > > > > session B: ERROR: could not find inherited attribute "a" of relation "c1" > > > > This happens at least from 9.1 through master and doesn't seem to > > be a designed behavior. > > I recall I had proposed a fix for the same thing some time ago [1]. Wow. About 1.5 years ago, and stuck? I have a concrete case where this hurts, so I'd like to fix it this time. How can we move on? > > The cause is that NO INHERIT doesn't take an exclusive lock on the > > parent. This allows expand_inherited_rtentry to add the child > > relation into the appendrel after its removal from the inheritance > > hierarchy while it still exists. > > Right. > > > I see two ways to fix this. > > > > The first patch adds a recheck of the inheritance relationship if the > > corresponding attribute is missing in the child in > > make_inh_translation_list(). The recheck is a bit complex but it > > is not performed unless the sequence above happens. It checks for > > duplication of relids (or cycles in inheritance) following > > find_all_inheritors (but doing it a bit differently) but I'm not sure > > it is really useful. > > I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but > I guess your hash table based solution will do the job as far as > performance of this check is concerned, although I haven't checked the > code closely. The hash table is not crucial in the patch. Essentially it just recurses down, looking up pg_inherits for the child. 
I considered the new index but abandoned the idea because I thought that such cases won't occur frequently. > > The second patch lets ALTER TABLE NO INHERIT acquire locks on > > the parent first. > > > > Since the latter has a larger impact on the current behavior and > > we already treat the "DROP TABLE child" case in a similar way, I > > suppose that the first approach would be preferable. > > That makes sense. > > BTW, in the partitioned table case, the parent is always locked first > using an AccessExclusiveLock. There are other considerations in that case such as needing to recreate the partition descriptor upon termination of inheritance (both the DETACH PARTITION and also DROP TABLE child cases). Apart from the degree of concurrency, if we keep the parent->children locking order, such recreation does not seem to be needed. Maybe I'm missing something. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hello, On 2017/06/26 17:46, Kyotaro HORIGUCHI wrote: > Hello. > > I had a case of an unexpected error caused by ALTER TABLE NO > INHERIT. > > =# CREATE TABLE p (a int); > =# CREATE TABLE c1 () INHERITS (p); > > session A=# BEGIN; > session A=# ALTER TABLE c1 NO INHERIT p; > > session B=# EXPLAIN ANALYZE SELECT * FROM p; > (blocked) > > session A=# COMMIT; > > session B: ERROR: could not find inherited attribute "a" of relation "c1" > > This happens at least from 9.1 through master and doesn't seem to > be a designed behavior. I recall I had proposed a fix for the same thing some time ago [1]. > The cause is that NO INHERIT doesn't take an exclusive lock on the > parent. This allows expand_inherited_rtentry to add the child > relation into the appendrel after its removal from the inheritance > hierarchy while it still exists. Right. > I see two ways to fix this. > > The first patch adds a recheck of the inheritance relationship if the > corresponding attribute is missing in the child in > make_inh_translation_list(). The recheck is a bit complex but it > is not performed unless the sequence above happens. It checks for > duplication of relids (or cycles in inheritance) following > find_all_inheritors (but doing it a bit differently) but I'm not sure > it is really useful. I had proposed adding a syscache on pg_inherits(inhrelid, inhparent), but I guess your hash table based solution will do the job as far as performance of this check is concerned, although I haven't checked the code closely. > The second patch lets ALTER TABLE NO INHERIT acquire locks on > the parent first. > > Since the latter has a larger impact on the current behavior and > we already treat the "DROP TABLE child" case in a similar way, I > suppose that the first approach would be preferable. That makes sense. BTW, in the partitioned table case, the parent is always locked first using an AccessExclusiveLock. 
There are other considerations in that case such as needing to recreate the partition descriptor upon termination of inheritance (both the DETACH PARTITION and also DROP TABLE child cases). Thanks, Amit [1] https://www.postgresql.org/message-id/565EB1F2.7000201%40lab.ntt.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] shift_sjis_2004 related autority files are remaining
Hi, At Sun, 25 Jun 2017 09:20:10 +0900 (JST), Tatsuo Ishii wrote in <20170625.092010.542143642647288693.t-is...@sraoss.co.jp> > > I don't have a clear opinion on this particular issue, but I think we > > should have clarity on why particular files or code exist. So why do > > these files exist and the others don't? Is it just the license? > > I think so. > > Many of those files are from http://ftp.unicode.org. There's no > license description there, so I think we should not copy those files > for safety reasons. (I vaguely recall that they explicitly prohibited > distributing the files before, but I could not find such a statement at > this moment). The license for the files can be seen in "EXHIBIT 1" at the following URL. http://www.unicode.org/copyright.html Roughly, it claims that copies of the files, or software containing copies of the files, should be accompanied by the same copyright notice, or that the notice should appear in associated documentation. So we could include the files by adding such a notice, but finally we decided not to include them in the repository, I think. > gb-18030-2000.xml and windows-949-2000.xml are from > https://ssl.icu-project.org/. I do not know what licenses those files > use (maybe Apache). > > Regarding euc-jis-2004-std.txt and sjis-0213-2004-std.txt, they are from > http://x0213.org. The licenses are described in the files. I'm not intending to insist on removing them if someone strongly wants to preserve them, since their existence doesn't harm anything. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Race between SELECT and ALTER TABLE NO INHERIT
Hello. I had a case of an unexpected error caused by ALTER TABLE NO INHERIT.

=# CREATE TABLE p (a int);
=# CREATE TABLE c1 () INHERITS (p);

session A=# BEGIN;
session A=# ALTER TABLE c1 NO INHERIT p;

session B=# EXPLAIN ANALYZE SELECT * FROM p;
(blocked)

session A=# COMMIT;

session B: ERROR: could not find inherited attribute "a" of relation "c1"

This happens at least from 9.1 through master and doesn't seem to be a designed behavior. The cause is that NO INHERIT doesn't take an exclusive lock on the parent. This allows expand_inherited_rtentry to add the child relation into the appendrel after its removal from the inheritance hierarchy while it still exists. I see two ways to fix this. The first patch adds a recheck of the inheritance relationship if the corresponding attribute is missing in the child in make_inh_translation_list(). The recheck is a bit complex but it is not performed unless the sequence above happens. It checks for duplication of relids (or cycles in inheritance) following find_all_inheritors (but doing it a bit differently) but I'm not sure it is really useful. The second patch lets ALTER TABLE NO INHERIT acquire locks on the parent first. Since the latter has a larger impact on the current behavior and we already treat the "DROP TABLE child" case in a similar way, I suppose that the first approach would be preferable. Any comments or thoughts? 
regards,

--
Kyotaro Horiguchi
NTT Open Source Software Center

diff --git a/src/backend/catalog/pg_inherits.c b/src/backend/catalog/pg_inherits.c
index e5fb52c..1670d11 100644
--- a/src/backend/catalog/pg_inherits.c
+++ b/src/backend/catalog/pg_inherits.c
@@ -42,6 +42,8 @@ typedef struct SeenRelsEntry
 	ListCell   *numparents_cell;	/* corresponding list cell */
 } SeenRelsEntry;
 
+static bool is_descendent_of_internal(Oid parentId, Oid childId,
+						  HTAB *seen_rels);
 /*
  * find_inheritance_children
  *
@@ -376,3 +378,71 @@ typeInheritsFrom(Oid subclassTypeId, Oid superclassTypeId)
 
 	return result;
 }
+
+/*
+ * Check if the child is really a descendent of the parent
+ */
+bool
+is_descendent_of(Oid parentId, Oid childId)
+{
+	HTAB	   *seen_rels;
+	HASHCTL		ctl;
+	bool		ischild = false;
+
+	memset(&ctl, 0, sizeof(ctl));
+	ctl.keysize = sizeof(Oid);
+	ctl.entrysize = sizeof(Oid);
+	ctl.hcxt = CurrentMemoryContext;
+
+	seen_rels = hash_create("is_descendent_of temporary table",
+							32, /* start small and extend */
+							&ctl,
+							HASH_ELEM | HASH_BLOBS | HASH_CONTEXT);
+
+	ischild = is_descendent_of_internal(parentId, childId, seen_rels);
+
+	hash_destroy(seen_rels);
+
+	return ischild;
+}
+
+static bool
+is_descendent_of_internal(Oid parentId, Oid childId, HTAB *seen_rels)
+{
+	Relation	inhrel;
+	SysScanDesc scan;
+	ScanKeyData key[1];
+	bool		ischild = false;
+	HeapTuple	inheritsTuple;
+
+	inhrel = heap_open(InheritsRelationId, AccessShareLock);
+	ScanKeyInit(&key[0], Anum_pg_inherits_inhparent,
+				BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(parentId));
+	scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
+							  NULL, 1, key);
+
+	while ((inheritsTuple = systable_getnext(scan)) != NULL)
+	{
+		bool		found;
+		Oid			inhrelid = ((Form_pg_inherits) GETSTRUCT(inheritsTuple))->inhrelid;
+
+		hash_search(seen_rels, &inhrelid, HASH_ENTER, &found);
+
+		/*
+		 * Recursively check into children. Although there can't theoretically
+		 * be any cycles in the inheritance graph, check the cycles following
+		 * find_all_inheritors.
+		 */
+		if (inhrelid == childId ||
+			(!found && is_descendent_of_internal(inhrelid, childId, seen_rels)))
+		{
+			ischild = true;
+			break;
+		}
+	}
+
+	systable_endscan(scan);
+	heap_close(inhrel, AccessShareLock);
+
+	return ischild;
+}
diff --git a/src/backend/optimizer/prep/prepunion.c b/src/backend/optimizer/prep/prepunion.c
index cf46b74..f1c582a 100644
--- a/src/backend/optimizer/prep/prepunion.c
+++ b/src/backend/optimizer/prep/prepunion.c
@@ -99,10 +99,9 @@ static List *generate_append_tlist(List *colTypes, List *colCollations,
 static List *generate_setop_grouplist(SetOperationStmt *op, List *targetlist);
 static void expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte,
 						 Index rti);
-static void make_inh_translation_list(Relation oldrelation,
+static List *make_inh_translation_list(Relation oldrelation,
 						  Relation newrelation,
-						  Index newvarno,
-						  List **translated_vars);
+						  Index newvarno);
 static Bitmapset *translate_col_privs(const Bitmapset *parent_privs,
 					List *translated_vars);
 static Node *adjust_appendrel_attrs_mutator(Node *node,
@@ -1502,14 +1501,28 @@ expand_inherited_rtentry(PlannerInfo *root, RangeTblEntry *rte, Index rti)
 		 */
 		if (childrte->relkind != RELKIND_PARTITIONED_TABLE)
 		{
+			List	   *translated_vars =
+				make_inh_translation_list(oldrelation, newrelation,
+										  childRTindex);
+
+			if (!translated_vars)
+			{
+				/*
+				 * This childrel is no longer a child of the parent.
+				 * Close the child relation releas
[HACKERS] A mistake in a comment
Hello,

I believe that I have found a mistake in a comment on the
parse_phrase_operator function. The comment has the following line:

    a <X> b (distance is no greater than X)

which is incorrect. According to the documentation and to practical
results, this line should be changed to something like:

    a <X> b (distance is equal to X)

The attached patch fixes the issue. Thank you for your attention!

Best,
Victor

diff --git a/src/backend/utils/adt/tsquery.c b/src/backend/utils/adt/tsquery.c
index ee047bd..260d780 100644
--- a/src/backend/utils/adt/tsquery.c
+++ b/src/backend/utils/adt/tsquery.c
@@ -113,7 +113,7 @@ get_modifiers(char *buf, int16 *weight, bool *prefix)
  * Parse phrase operator. The operator
  * may take the following forms:
  *
- *		a <X> b (distance is no greater than X)
+ *		a <X> b (distance is equal to X)
  *		a <-> b (default distance = 1)
  *
  * The buffer should begin with '<' char
Re: [HACKERS] Optional message to user when terminating/cancelling backend
> On 22 Jun 2017, at 17:52, Dilip Kumar wrote:
>
> On Thu, Jun 22, 2017 at 7:18 PM, Daniel Gustafsson wrote:
>> Good point. I've attached a new version which issues a NOTICE on
>> truncation and also addresses all other comments so far in this
>> thread. Thanks a lot for the early patch reviews!
>>
>> cheers ./daniel
>
> I have done an initial review of the patch. I have some
> comments/suggestions.

Thanks!

> +int
> +GetCancelMessage(char **buffer, size_t buf_len)
> +{
> +	volatile BackendCancelShmemStruct *slot = MyCancelSlot;
> +	int			msg_length = 0;
> +
>
> The returned value of this function is never used; better to convert
> it to just void.

You're probably right. I was thinking that someone might be interested
in knowing about truncation when extracting the message, but I can't
really think of a call site which wouldn't just pass in a large enough
buffer in the first place.

> +bool
> +HasCancelMessage(void)
> +{
> +	volatile BackendCancelShmemStruct *slot = MyCancelSlot;
>
> +/*
> + * Return the configured cancellation message and its length. If the
> + * returned length is greater than the size of the passed buffer,
> + * truncation has been performed. The message is cleared on reading.
> + */
> +int
> +GetCancelMessage(char **buffer, size_t buf_len)
>
> I think it would be good to merge these two functions, where
> GetCancelMessage first checks whether it has a message or not; if it
> does, it allocates a buffer of size slot->len and copies the slot
> message to the allocated buffer, otherwise it just returns NULL.
>
> So it would look like "char *GetCancelMessage()"

It doesn't seem like a good idea to perform memory allocation inside a
spinlock in a signalled backend; that would probably at least require an
LWLock, wouldn't it? It seems safer to leave memory management to the
signalled backend, but that may be paranoia on my part.
> +	SpinLockAcquire(&slot->mutex);
> +	strlcpy(*buffer, (const char *) slot->message, buf_len);
>
> strlcpy(*buffer, (const char *) slot->message, slot->len);
>
> Isn't it better to copy only up to slot->len? It seems like it will
> always be <= buf_len, and if not then we can do min(buf_len,
> slot->len).

strlcpy(3) is defined as taking the size of the passed buffer, not the
size of the copied string. Since we guarantee that slot->message is
NUL-terminated, it seems wise to stick to that API. Or did I
misunderstand your comment?

cheers ./daniel
[HACKERS] Remove old comments in dependencies.c and README.dependencies
Hi,

I found some comments describing behavior that is not implemented. As
far as I have examined, these comments refer to min_group_size, but
min_group_size was removed after it was decided not to adopt it[1], so
it seems these comments should also be removed.

[1] https://www.postgresql.org/message-id/cakjs1f_vuck+qbxqsh4m_fns+d4xa2kpyvvh0zymjvg-eeh...@mail.gmail.com

  1) DEPENDENCY_MIN_GROUP_SIZE

  I'm not sure we still need the min_group_size when evaluating
  dependencies. It was meant to deal with 'noisy' data, but I think
  after switching to the 'degree' it might actually be a bad idea.

  Yeah, I'd wondered about this when I first started testing the patch.
  I failed to get any functional dependencies because my values were
  too unique. Seems I'd gotten a bit used to it, and in the end thought
  that if the values are unique enough then they won't suffer as much
  from the underestimation problem you're trying to solve here. I've
  removed that part of the code now.

The attached patch removes the comments about min_group_size.

Regards,

--
Atsushi Torikoshi
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

diff --git a/src/backend/statistics/README.dependencies b/src/backend/statistics/README.dependencies
index 702d34e..6c446bd 100644
--- a/src/backend/statistics/README.dependencies
+++ b/src/backend/statistics/README.dependencies
@@ -71,10 +71,6 @@ To count the rows consistent with the dependency (a => b):
 
 (c) If there's a single distinct value in 'b', the rows are consistent with
     the functional dependency, otherwise they contradict it.
 
-The algorithm also requires a minimum size of the group to consider it
-consistent (currently 3 rows in the sample). Small groups make it less likely
-to break the consistency.
-
 
 Clause reduction (planner/optimizer)
 
diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 89dece3..2e7c0ad 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -286,14 +286,6 @@ dependency_degree(int numrows, HeapTuple *rows, int k, AttrNumber *dependency,
 	 * first (k-1) columns. If there's a single value in the last column, we
 	 * count the group as 'supporting' the functional dependency. Otherwise we
 	 * count it as contradicting.
-	 *
-	 * We also require a group to have a minimum number of rows to be
-	 * considered useful for supporting the dependency. Contradicting groups
-	 * may be of any size, though.
-	 *
-	 * XXX The minimum size requirement makes it impossible to identify case
-	 * when both columns are unique (or nearly unique), and therefore
-	 * trivially functionally dependent.
 	 */
 
 	/* start with the first row forming a group */
Re: [HACKERS] Setting pd_lower in GIN metapage
On Mon, Jun 26, 2017 at 3:54 PM, Amit Langote wrote:
> On 2017/06/26 10:54, Michael Paquier wrote:
>> On Fri, Jun 23, 2017 at 11:17 AM, Amit Langote wrote:
>>> That was it, thanks for the pointer.
>>
>> GinInitMetabuffer() sets up pd_lower and pd_upper anyway using
>> PageInit, so the check in PageIsVerified is guaranteed to work in any
>> case. Upgraded pages will still have their pd_lower set to the
>> previous values, and new pages will have the optimization. So this
>> patch is actually harmless for past pages, while newer ones are seen
>> as more compressible.
>
> Right.
>
>>> Attached updated patch, which I confirmed passes
>>> wal_consistency_checking = gin.
>>
>> I have spent some time looking at this patch, playing with pg_upgrade
>> to check the state of the upgraded pages. And this looks good to me.
>
> Thanks for the review.
>
>> One thing that I noticed is that this optimization could as well
>> happen for spgist meta pages. What do others think?
>
> I agree. As Sawada-san mentioned, brin metapage code can use a similar
> patch.
>
> So attached are three patches for gin, brin, and sp-gist respectively.
> Neither the brin nor the sp-gist case required any special
> consideration for passing wal_consistency_checking, as the patch
> didn't cause brin and sp-gist metapages to become invalid when
> recreated on a standby (remember that patch 0001 needed to be updated
> for that).

Thank you for the patches! I checked the additional patches for brin and
spgist. They look good to me.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
Re: [HACKERS] pg_filedump doesn't compile with v10 sources
Hi,

On Mon, Jun 26, 2017 at 12:25 PM, tushar wrote:
> Hi,
>
> While trying to run make for pg_filedump against v10 sources, I get
> the following error:
>
> [centos@centos-cpula pg_filedump]$ make
> cc -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions
> -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic
> -DLINUX_OOM_ADJ=0 -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Wendif-labels -fno-strict-aliasing
> -fwrapv -I/home/centos/pg10_/postgresql/src/include/ pg_filedump.c -c
> pg_filedump.c: In function 'FormatControl':
> pg_filedump.c:1650: error: 'ControlFileData' has no member named
> 'enableIntTimes'
> make: *** [pg_filedump.o] Error 1
> [centos@centos-cpula pg_filedump]$

That's because the following git commit in v10 removed the
'enableIntTimes' member from the 'ControlFileData' structure:

commit d28aafb6dda326688e2f042c95c93ea57963c03c
Author: Tom Lane
Date:   Thu Feb 23 12:23:12 2017 -0500

    Remove pg_control's enableIntTimes field.

    We don't need it any more.

    pg_controldata continues to report that date/time type storage is
    "64-bit integers", but that's now a hard-wired behavior not
    something it sees in the data.  This avoids breaking pg_upgrade,
    and perhaps other utilities that inspect pg_control this way.
    Ditto for pg_resetwal.

    I chose to remove the "bigint_timestamps" output column of
    pg_control_init(), though, as that function hasn't been around
    long and probably doesn't have ossified users.

    Discussion: https://postgr.es/m/26788.1487455...@sss.pgh.pa.us

I think we will have to change pg_filedump so that it no longer refers
to 'enableIntTimes'.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com