Re: [HACKERS] Race conditions with WAL sender PID lookups

2017-07-02 Thread Michael Paquier
On Sat, Jul 1, 2017 at 7:20 AM, Alvaro Herrera  wrote:
> Michael Paquier wrote:
>> Considering how crazy the conditions to make the information fetched
>> by users inconsistent are met, I agree with that.
>
> Pushed.

Thanks Álvaro for pushing the patch. I had a second look and the
result looks good to me.

-   SpinLockAcquire(&walsnd->mutex);
+   }
+   pid = walsnd->pid;
The WAL receiver code used a cast to (int) in
pg_stat_get_wal_receiver(). I don't recall adding it. Why not make both
consistent by removing the cast in the WAL receiver code?

> In passing, clean up some leftover braces which were used to create
> unconditional blocks.  Once upon a time these were used for
> volatile-izing accesses to those shmem structs, which is no longer
> required.  Many other occurrences of this pattern remain.

Here are the places where a cleanup can happen:
- WalSndSetState
- ProcessStandbyReplyMessage
- XLogRead, 2 places
- XLogSendLogical
- WalRcvWaitForStartPosition
- WalRcvDie
- XLogWalRcvFlush
- ProcessWalSndrMessage
In most of the WAL sender places, the braces could be removed to
improve the style. For the WAL receiver, the extra declarations are not
necessary. As a matter of style, why not clean up just the WAL sender
code? Changing the WAL receiver code just to remove some declarations
would not improve readability, and it would make back-patching more
difficult.
-- 
Michael


Re: [HACKERS] Multi column range partition table

2017-07-02 Thread Amit Langote
On 2017/07/03 2:15, Dean Rasheed wrote:
> On 30 June 2017 at 10:04, Ashutosh Bapat
>  wrote:
>> On Fri, Jun 30, 2017 at 1:36 PM, Amit Langote
>>  wrote:
>>>
>>> Alright, I spent some time implementing a patch to allow specifying
>>> -infinity and +infinity in arbitrary ways.  Of course, it prevents
>>> nonsensical inputs with appropriate error messages.
>>
>> I don't think -infinity and +infinity are the right terms. For a
>> string or character data type there is no -infinity and +infinity.
>> Similarly for enums. We need to extend UNBOUNDED somehow to indicate
>> the end of a given type in the given direction. I thought about
>> UNBOUNDED LEFT/RIGHT but then whether LEFT indicates -ve side or +side
>> would cause confusion. Also LEFT/RIGHT may work for a single
>> dimensional datatype but not for multi-dimensional spaces. How about
>> MINIMUM/MAXIMUM or UNBOUNDED MIN/MAX to indicate the extremities.
>>
> 
> Yes, I think you're right. Also, some datatypes include values that
> are equal to +/-infinity, which would then behave differently from
> unbounded as range bounds, so it wouldn't be a good idea to overload
> that term.

I agree with you both that using (+/-) infinity may not be a good idea
after all.

> My first thought was UNBOUNDED ABOVE/BELOW, because that matches the
> terminology already in use of upper and lower bounds.

I was starting to like Ashutosh's suggested UNBOUNDED MIN/MAX syntax,
but could you clarify your comment that ABOVE/BELOW is the terminology
already in use for upper and lower bounds?  I couldn't find ABOVE/BELOW
anywhere in our existing syntax that uses the upper/lower bound notion,
so I was a little confused.

Also, I assume UNBOUNDED ABOVE signifies positive infinity and vice versa.

Thanks,
Amit



Re: [HACKERS] Broken hint bits (freeze)

2017-07-02 Thread Amit Kapila
On Thu, Jun 29, 2017 at 6:57 AM, Bruce Momjian  wrote:
> On Sat, Jun 24, 2017 at 09:24:21AM +0530, Amit Kapila wrote:
>> > I was not clear.  I was not saying there can be only one extra WAL file.
>> > I am saying the "Latest checkpoint location" should be one WAL file
>> > farther on the master.  I think the big problem is that we need a full
>> > replay of that WAL file, not just having it one less than the master.
>> >
>>
>> If the user has properly shutdown, then that last file should only
>> have checkpoint record, is it safe to proceed with upgrade without
>> actually copying that file?
>
> Yes, but how do we know they processed all the records in the
> second-to-last WAL file (in WAL shipping mode).
>

I don't see any straightforward way to know that, except for the user to
get the latest WAL location (replay or flush) and then verify it
against the last WAL file (maybe by using something like pg_waldump).  I
think another problem, as pointed out by Sergey upthread, is how to
ensure that all the buffers containing changes are flushed to disk.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


[HACKERS] hash index on unlogged tables doesn't behave as expected

2017-07-02 Thread Amit Kapila
While discussing the behavior of hash indexes with Bruce in the nearby
thread [1], it was noticed that a hash index on an unlogged table
doesn't behave as expected.  Prior to 10, it had a different set of
problems (mainly because hash indexes were not WAL-logged), which were
discussed on that thread [1]; however, when I checked, it doesn't work
correctly even in 10.  Below are the steps to reproduce the problem.

1. Setup master and standby
2. On the master, create unlogged table and hash index.
   2A.  Create unlogged table t1(c1 int);
   2B.  Create hash index idx_t1_hash on t1 using hash(c1);
3. On Standby, try selecting data,
   select * from t1;
   ERROR:  cannot access temporary or unlogged relations during recovery
---Till here everything works as expected
4. Promote the standby to master (I simply stopped both the standby and
the master and removed the recovery.conf file from the standby's data
directory) and try starting the new master; it gives the error below and
doesn't start.  (A TAP-style sketch of these steps follows the error
message.)
FATAL:  could not create file "base/12700/16387": File exists
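
For convenience, here is a rough TAP-style sketch of the same steps using
PostgresNode (node, table, and index names are only examples; this is a
reproduction aid, not part of the attached patch):

use strict;
use warnings;
use PostgresNode;

# 1. Set up master and standby.
my $master = get_new_node('master');
$master->init(allows_streaming => 1);
$master->start;
$master->backup('bkp');
my $standby = get_new_node('standby');
$standby->init_from_backup($master, 'bkp', has_streaming => 1);
$standby->start;

# 2. Unlogged table and hash index on the master.
$master->safe_psql('postgres',
	'CREATE UNLOGGED TABLE t1 (c1 int);
	 CREATE INDEX idx_t1_hash ON t1 USING hash (c1);');

# 3. Selecting on the standby errors out, as expected during recovery.
$standby->psql('postgres', 'SELECT * FROM t1;');

# 4. Stop both nodes, remove recovery.conf from the standby, and start it
#    as the new master; before the fix this fails with
#    FATAL:  could not create file ... File exists
$master->stop;
$standby->stop;
unlink $standby->data_dir . '/recovery.conf';
$standby->start;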

The basic issue was that the WAL logging for the CREATE INDEX operation
was oblivious to the fact that for unlogged tables only the INIT fork
needs to be logged.  Another point we need to consider is that while
replaying the WAL for the create index operation, we need to flush the
buffer if it is for the init fork.  This needs to be done only for pages
that can be part of the init fork file (like the metapage and bitmap
page).  The attached patch fixes the issue.

Another approach to fix the issue could be to always log complete
pages for the create index operation on unlogged tables (in
hashbuildempty).  However, the logic for the initial hash index pages
requires us to perform certain other actions (like updating the metapage
after the creation of the bitmap page), which makes that approach
difficult to follow.

I think this should be considered a PostgreSQL 10 open item.


[1] - https://www.postgresql.org/message-id/20170630005634.GA4448%40momjian.us

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


fix_unlogged_hash_index_issue_v1.patch
Description: Binary data

Re: [HACKERS] "SELECT *" vs hidden columns and logical column order

2017-07-02 Thread Thomas Munro
On Sat, Jul 1, 2017 at 6:09 AM, Peter Eisentraut
 wrote:
> On 6/28/17 23:52, Thomas Munro wrote:
>> 2.  SQL:2011 temporal tables track system time and/or valid time with
>> columns that users create and then declare to be temporal control
>> columns.  I don't think they show up unless you name them directly (I
>> didn't check the standard but I noticed that it's that way in another
>> product), so I guess that's basically the same as (1).
>
> In my reading of the standard, those start/end time columns would show
> up normally in SELECT *.

Oh, well that's surprising.  I don't have a recent enough SQL standard
to hand, but I saw that a couple of popular RDBMSs with temporal
support understand GENERATED ALWAYS AS ROW { START | END } [[ IMPLICITLY ]
HIDDEN ] where that last bit controls whether SELECT * sees it.

-- 
Thomas Munro
http://www.enterprisedb.com


Re: [BUGS] [HACKERS] Segmentation fault in libpq

2017-07-02 Thread Craig Ringer
On 3 July 2017 at 03:12, Andres Freund  wrote:
> Hi,
>
> On 2017-07-02 20:58:52 +0200, Michal Novotný wrote:
>> thank you all for your advice. I've been investigating this a little more
>> and finally it turned out it's not a bug in libpq although I got confused
>> by going deep as several libpq functions. The bug was really on our side
>> after trying to use connection pointer after calling PQfinish(). The code
>> is pretty complex so it took some time to investigate however I would like
>> to apologize for "blaming" libpq instead of our code.
>
> Usually using a tool like valgrind is quite helpful to find issues like
> that, because it'll show you the call-stack accessing the memory and
> *also* the call-stack that lead to the memory being freed.

Yep, huge help.

BTW, on Windows, the free tool DrMemory (now 64-bit too, yay) or
commercial Purify work great.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Multi column range partition table

2017-07-02 Thread Amit Langote
Hi Dean,

Thanks a lot for the review.

On 2017/07/03 1:59, Dean Rasheed wrote:
> On 30 June 2017 at 09:06, Amit Langote  wrote:
>> When testing the patch, I realized that the current code in
>> check_new_partition_bound() that checks for range partition overlap had a
>> latent bug that resulted in false positives for the new cases that the new
>> less restrictive syntax allowed.  I spent some time simplifying that code
>> while also fixing the aforementioned bug.  It's implemented in the
>> attached patch 0001.
>>
> 
> I haven't had time to look at 0002 yet, but looking at 0001, I'm not
> convinced that this really represents much of a simplification, but I
> do prefer the way it now consistently reports the first overlapping
> partition in the error message.
> 
> I'm not entirely convinced by this change either:
> 
> -if (equal || off1 != off2)
> +if (off2 > off1 + 1 || ((off2 == off1 + 1) && !equal))
> 
> Passing probe_is_bound = true to partition_bound_bsearch() will I
> think cause it to return equal = false when the upper bound of one
> partition equals the lower bound of another, so relying on the "equal"
> flag here seems a bit dubious. I think I can just about convince
> myself that it works, but not for the reasons stated in the comments.

You are right.  What's actually happening in the case where I was thinking
equal would be set to true is that off2 ends up being equal to off1, so
the second arm of that || is not checked at all.

> It also seems unnecessary for this code to be doing 2 binary searches.
> I think a better simplification would be to just do one binary search
> to find the gap that the lower bound fits in, and then test the upper
> bound of the new partition against the lower bound of the next
> partition (if there is one), as in the attached patch.

I agree.  The patch looks good to me.

Thanks again.

Regards,
Amit



Re: [HACKERS] Get stuck when dropping a subscription during synchronizing table

2017-07-02 Thread Masahiko Sawada
On Wed, Jun 28, 2017 at 2:13 PM, Masahiko Sawada  wrote:
> On Wed, Jun 28, 2017 at 1:47 AM, Petr Jelinek
>  wrote:
>>
>> On 27/06/17 10:51, Masahiko Sawada wrote:
>>> On Mon, Jun 26, 2017 at 12:12 PM, Masahiko Sawada  
>>> wrote:
>>>
>>> I've reviewed this patch briefly.
>>
>> Thanks!
>>
>>>
>>> @@ -515,6 +533,31 @@ logicalrep_worker_stop(Oid subid, Oid relid)
>>>  }
>>>
>>>  /*
>>> + * Request worker to be stopped on commit.
>>> + */
>>> +void
>>> +logicalrep_worker_stop_at_commit(Oid subid, Oid relid)
>>> +{
>>> +   LogicalRepWorkerId *wid;
>>> +   MemoryContext old;
>>> +
>>> +   old = MemoryContextSwitchTo(TopTransactionContext);
>>> +
>>> +   wid = (LogicalRepWorkerId *) palloc(sizeof(LogicalRepWorkerId));
>>> +
>>> +   /*
>>> +   wid = MemoryContextAlloc(TopTransactionContext,
>>> +                            sizeof(LogicalRepWorkerId));
>>> +   */
>>> +   wid->subid = subid;
>>> +   wid->relid = relid;
>>> +
>>> +   on_commit_stop_workers = lappend(on_commit_stop_workers, wid);
>>> +
>>> +   MemoryContextSwitchTo(old);
>>> +}
>>>
>>> logicalrep_worker_stop_at_commit() has a problem that new_list()
>>> called by lappend() allocates the memory from current memory context.
>>> It should switch the memory context and then allocate the memory for
>>> wid and append it to the list.
>>>
>
> Thank you for updating the patch!
>
>>
>> Right, fixed (I see you did that locally as well based on the above
>> excerpt ;) ).
>
> Oops, yeah that's my test code.
>
>>> 
>>> @@ -754,9 +773,25 @@ ApplyLauncherShmemInit(void)
>>>  void
>>>  AtEOXact_ApplyLauncher(bool isCommit)
>>>  {
>>> -   if (isCommit && on_commit_launcher_wakeup)
>>> -   ApplyLauncherWakeup();
>>> +   ListCell *lc;
>>> +
>>> +   if (isCommit)
>>> +   {
>>> +   foreach (lc, on_commit_stop_workers)
>>> +   {
>>> +   LogicalRepWorkerId *wid = lfirst(lc);
>>> +   logicalrep_worker_stop(wid->subid, wid->relid);
>>> +   }
>>> +
>>> +   if (on_commit_launcher_wakeup)
>>> +   ApplyLauncherWakeup();
>>>
>>> Stopping the logical rep worker in AtEOXact_ApplyLauncher doesn't
>>> support prepared transactions. Since we allocate the list
>>> on_commit_stop_workers in TopTransactionContext, postgres crashes
>>> if we execute any query after a prepared transaction that removes
>>> a subscription relation state. Also, even after fixing this issue, we
>>> still need to do something: the list of on_commit_stop_workers is not
>>> associated with the prepared transaction.  A query following "prepare
>>> transaction" tries to stop workers at the commit. There was a similar
>>> discussion before.
>>>
>>
>> Hmm, good point. I think for now it makes sense to simply don't allow
>> PREPARE for transactions that manipulate workers similar to what we do
>> when there are exported snapshots. Done it that way in attached.
>
> Agreed. Should we note that in docs?
>
>>
>>> 
>>> +
>>> +   ensure_transaction();
>>> +   UpdateSubscriptionRelState(MyLogicalRepWorker->subid,
>>> +                              rstate->relid, rstate->state,
>>> +                              rstate->lsn);
>>>
>>>
>>> Should we commit the transaction if we started a new transaction
>>> before update the subscription relation state, or it could be deadlock
>>> risk.
>>
>> We only lock the whole subscription (and only conflicting things are
>> DROP and ALTER SUBSCRIPTION), not individual subscription-relation
>> mapping so it doesn't seem to me like there is any higher risk of
>> deadlocks than anything else which works with multiple tables (or
>> compared to previous behavior).
>>
>
> I'm concerned that a lock for whole subscription could be conflicted
> between ALTER SUBSCRIPTION, table sync worker and apply worker:
>
> Please imagine the following case.
>
> 1. The apply worker updates a subscription relation state R1 of
> subscription S1.
> -> It acquires AccessShareLock on S1, and keep holding.
> 2. ALTER SUBSCRIPTION SET PUBLICATION tries to acquire
> AccessExclusiveLock on S1.
> -> But it waits for the apply worker to release the lock.
> 3. The apply worker calls wait_for_relation_state_change(relid,
> SUBREL_STATE_SYNCDONE) and waits for the table sync worker
> for R2 to change its status.
> -> Note that the apply worker is still holding AccessShareLock on S1
> 4. The table sync worker tries to update its status to SYNCDONE
> -> In UpdateSubscriptionRelState(), it tires to acquire AccessShareLock
>on S1 but waits for it. *deadlock*
>
> To summary, because the apply worker keeps holding AccessShareLock on
> S1, the acquiring AccessExclusiveLock by ALTER SUBSCRIPTION waits for
> the apply worker, and then the table sync worker also waits for the
> ALTER SUBSCRIPTION in order to change its status. And the apply worker
> waits for the 

Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Tom Lane
Craig Ringer  writes:
> That's my bad.
> (Insert dark muttering about Perl here).

Yeah, pretty much the only good thing about Perl is it's ubiquitous.
But you could say the same of C.  Or SQL.  For a profession that's
under 70 years old, we sure spend a lot of time dealing with legacy
stuff.

regards, tom lane


Re: [HACKERS] PostgresNode::poll_query_until hacking

2017-07-02 Thread Tom Lane
Michael Paquier  writes:
> If you would like to get some feedback from me, waiting until Monday
> morning my time (Sunday evening yours) before pushing patches would be
> better.

If you have ideas for improvement, it's always possible to amend the
code later.  I've been pushing this stuff to see what happens in the
buildfarm, not to foreclose discussion.

regards, tom lane


Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Craig Ringer
On 3 July 2017 at 05:10, Tom Lane  wrote:
> I wrote:
>> Any ideas what's wrong there?
>
> Hah: the answer is that query_hash's split() call is broken.
> "man perlfunc" quoth
>
>split   Splits the string EXPR into a list of strings and returns that
>list.  By default, empty leading fields are preserved, and
>empty trailing ones are deleted.
>^^^
>
> In the case at hand, the SQL query returns something like
> "|physical|||t|10338|||0/302B7E8" with normal timing, but with
> enough delay in there, you get "|physical|||t|11542|||" which
> triggers split's default behavior of ignoring the trailing empty
> fields.  It looks like the way to get split to not do that is
> to pass it a "limit" of -1.

That's my bad.

(Insert dark muttering about Perl here).

Thanks for spotting it.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Typo in comment in xlog.c: ReadRecord

2017-07-02 Thread Amit Langote
On 2017/07/01 3:49, Peter Eisentraut wrote:
> On 6/27/17 20:54, Amit Langote wrote:
>> Attached fixes $SUBJECT.
>>
>> s/fetch_ckpt/fetching_ckpt/g
> 
> committed

Thanks.

Regards,
Amit



Re: [HACKERS] Fix a typo in aclchk.c

2017-07-02 Thread Masahiko Sawada
On Sat, Jul 1, 2017 at 4:55 AM, Peter Eisentraut
 wrote:
> On 6/30/17 03:58, Masahiko Sawada wrote:
>> Attached patch for $subject.
>>
>> s/entires/entries/
>
> fixed
>

Thanks.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Tom Lane
Michael Paquier  writes:
> On Mon, Jul 3, 2017 at 7:02 AM, Tom Lane  wrote:
>> Anyone have a different view of what to fix here?

> No, this sounds like a good plan. What do you think about the attached?

Oh, that's a good way.  I just finished testing a fix that involved
not turning on the second server's sync commit until later (it seems
that only the first action on "paris" is really at risk currently).
But disabling sync commit for individual transactions is clearly cleaner
and more extensible to future test script changes.

FWIW, I just got done doing a few check-world cycles with the delay in
WalReceiverMain plus speeding up pg_ctl.c to WAITS_PER_SEC = 1000.
No other problems seem to be revealed this way.

regards, tom lane


Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Michael Paquier
(catching up test threads)

On Mon, Jul 3, 2017 at 7:02 AM, Tom Lane  wrote:
> I'm now inclined to think that the correct fix is to ensure that we
> run synchronous rep in both directions, rather than to insert delays
> to substitute for that.  Just setting synchronous_standby_names for
> node paris at the top of the script doesn't work, because there's
> at least one place where the script intentionally issues commands
> to paris while london is stopped.

I bet that using syncrep in both directions will likely avoid
inconsistencies in the future if the test suite is extended one way or
another.

> But we could turn off sync rep for that step, perhaps.

Yeah, by using synchronous_commit = off.
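
For illustration, a rough sketch of the idea (not the attached patch; the
table and transaction names are made up): the few steps that talk to the
surviving node while the other one is stopped would simply do something like

$cur_master->psql(
	'postgres', "
	SET synchronous_commit = off;
	BEGIN;
	INSERT INTO t_009_tbl VALUES (42);
	PREPARE TRANSACTION 'xact_009_sketch';");

so that those transactions don't wait on the unreachable synchronous standby.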

> Anyone have a different view of what to fix here?

No, this sounds like a good plan. What do you think about the attached?
-- 
Michael


2pc-test-fix.patch
Description: Binary data

Re: [HACKERS] PostgresNode::poll_query_until hacking

2017-07-02 Thread Michael Paquier
On Sun, Jul 2, 2017 at 4:53 AM, Tom Lane  wrote:
> The attached proposed patch changes the TAP test infrastructure's
> poll_query_until function in two ways:
>
> 1. An optional argument is added to allow specifying the query result
> value we're waiting for, overriding the normal "t".  This allows
> folding a handwritten delay loop in 007_sync_rep.pl into the
> poll_query_until ecosystem.

True that in this test the expected output can be quite complicated,
so turning that into a query that returned 't' would make the code
unreadable.
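
For instance, a call with the new optional argument could look like this
(the query and names here are just examples, not taken from the actual
tests):

# Wait until standby_1 is reported as a synchronous standby.
$node_master->poll_query_until('postgres',
	"SELECT sync_state FROM pg_stat_replication WHERE application_name = 'standby_1';",
	'sync')
  or die "timed out waiting for standby_1 to become sync";

# Without the third argument the behavior is unchanged: poll until 't'.
$node_master->poll_query_until('postgres',
	"SELECT pg_current_wal_lsn() <= flush_lsn FROM pg_stat_replication WHERE application_name = 'standby_1';");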

> As far as I've found, there are no other
> handwritten delay loops in the TAP tests.

Good catch. Yes here using a poll_query_until call makes sense.

> 2. poll_query_until is modified to probe 10X per second not once
> per second, in keeping with the changes I've been making elsewhere
> to remove not-carefully-analyzed 1s delays in the regression tests.
>
> On my workstation, the reduced probe delay shaves a useful amount
> of time off the recovery and subscription regression tests.  I also
> tried it on dromedary, which is about the slowest hardware I'd care
> to run the TAP tests on regularly, and it seems to be about a wash
> there --- some tests get faster, but some get slower, presumably due
> to more overhead from the probe queries.

Check.

> Thoughts?

-   is($result, $expected, $msg);
+   ok( $self->poll_query_until('postgres', $check_sql, $expected), $msg);
A matter of taste here: using a space after "ok(".

If you would like to get some feedback from me, waiting until Monday
morning my time (Sunday evening yours) before pushing patches would be
better.
-- 
Michael


[HACKERS] Re: pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)

2017-07-02 Thread Michael Paquier
On Sat, Jul 1, 2017 at 4:47 AM, Peter Eisentraut
 wrote:
> On 5/1/17 12:19, Peter Eisentraut wrote:
>> However: Failure to complete promotion within the waiting time does not
>> lead to an error exit, so you will not get a failure if the promotion
>> does not finish.  This is probably a mistake.  Looking around pg_ctl, I
>> found that this was handled seemingly inconsistently in do_start(), but
>> do_stop() errors when it does not complete.

This inconsistency could be treated as a bug, though changing such
an old behavior in back-branches would be risky. So +1 for making such a
change only on HEAD; and pg_ctl promote -w is new in 10 anyway.

>> Possible patches for this attached.
>>
>> Perhaps we need a separate exit code in pg_ctl to distinguish general
>> errors from did not finish within timeout?

I would treat that as a separate item for 11, but that's as far as my
opinion goes. Per this link in pg_ctl.c the error code ought to be 4:
https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

> I was going to hold this back for PG11, but since we're now doing some
> other tweaks in pg_ctl, it might be useful to add this too.  Thoughts?

The use of 0 as the exit code for the new promote -w when the timeout is
reached looks like an open item to me. Cleaning up the poll queries after
promotion would be nice to see as well.
-- 
Michael


[HACKERS] More race conditions in logical replication

2017-07-02 Thread Tom Lane
I noticed a recent failure that looked suspiciously like a race condition:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hornet=2017-07-02%2018%3A02%3A07

The critical bit in the log file is

error running SQL: 'psql::1: ERROR:  could not drop the replication slot 
"tap_sub" on publisher
DETAIL:  The error was: ERROR:  replication slot "tap_sub" is active for PID 
3866790'
while running 'psql -XAtq -d port=59543 host=/tmp/QpCJtafT7R dbname='postgres' 
-f - -v ON_ERROR_STOP=1' with sql 'DROP SUBSCRIPTION tap_sub' at 
/home/nm/farm/xlc64/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm
 line 1198.

After poking at it a bit, I found that I can cause several different
failures of this ilk in the subscription tests by injecting delays at
the points where a slot's active_pid is about to be cleared, as in the
attached patch (which also adds some extra printouts for debugging
purposes; none of that is meant for commit).  It seems clear that there
is inadequate interlocking going on when we kill and restart a logical
rep worker: we're trying to start a new one before the old one has
gotten out of the slot.

I'm not particularly interested in fixing this myself, so I'm just
going to add it to the open items list.

regards, tom lane

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index dc7de20..78cd416 100644
*** a/src/backend/replication/slot.c
--- b/src/backend/replication/slot.c
*** ReplicationSlotAcquire(const char *name)
*** 358,364 
  	if (active_pid != MyProcPid)
  		ereport(ERROR,
  (errcode(ERRCODE_OBJECT_IN_USE),
!  errmsg("replication slot \"%s\" is active for PID %d",
  		name, active_pid)));
  
  	/* We made this slot active, so it's ours now. */
--- 358,364 
  	if (active_pid != MyProcPid)
  		ereport(ERROR,
  (errcode(ERRCODE_OBJECT_IN_USE),
!  errmsg("replication slot \"%s\" is active for PID %d (acq)",
  		name, active_pid)));
  
  	/* We made this slot active, so it's ours now. */
*** ReplicationSlotRelease(void)
*** 391,396 
--- 391,398 
  		 * Mark persistent slot inactive.  We're not freeing it, just
  		 * disconnecting.
  		 */
+ 		pg_usleep(10);
+ 		elog(LOG, "ReplicationSlotRelease: unmarking persistent slot");
  		SpinLockAcquire(&slot->mutex);
  		slot->active_pid = 0;
  		SpinLockRelease(&slot->mutex);
*** ReplicationSlotDropPtr(ReplicationSlot *
*** 523,528 
--- 525,532 
  	{
  		bool		fail_softly = slot->data.persistency != RS_PERSISTENT;
  
+ 		pg_usleep(10);
+ 		elog(LOG, "ReplicationSlotDropPtr: unmarking slot after rename fail");
  		SpinLockAcquire(&slot->mutex);
  		slot->active_pid = 0;
  		SpinLockRelease(&slot->mutex);
*** ReplicationSlotDropPtr(ReplicationSlot *
*** 540,545 
--- 544,551 
  	 * and nobody can be attached to this slot and thus access it without
  	 * scanning the array.
  	 */
+ 	pg_usleep(10);
+ 	elog(LOG, "ReplicationSlotDropPtr: unmarking slot");
  	LWLockAcquire(ReplicationSlotControlLock, LW_EXCLUSIVE);
  	slot->active_pid = 0;
  	slot->in_use = false;
*** restart:
*** 876,882 
  		if (active_pid)
  			ereport(ERROR,
  	(errcode(ERRCODE_OBJECT_IN_USE),
! 	 errmsg("replication slot \"%s\" is active for PID %d",
  			slotname, active_pid)));
  
  		/*
--- 882,888 
  		if (active_pid)
  			ereport(ERROR,
  	(errcode(ERRCODE_OBJECT_IN_USE),
! 	 errmsg("replication slot \"%s\" is active for PID %d (drop)",
  			slotname, active_pid)));
  
  		/*

Re: protocol version negotiation (Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility)

2017-07-02 Thread Robert Haas
On Thu, Jun 29, 2017 at 7:29 PM, Satyanarayana Narlapuram
 wrote:
> -Original Message-

The formatting of this message differs from the style normally used on
this mailing list, and is hard to read.

> 2. If the client version is anything other than 3.0, the server responds with 
> a ServerProtocolVersion indicating the highest version it supports, and 
> ignores any pg_protocol. options not known to it as being either 
> third-party extensions or something from a future version.  If the initial 
> response to the startup message is anything other than a 
> ServerProtocolVersion message, the client should assume it's talking to a 3.0 
> server.  (To make this work, we would back-patch a change into existing 
> releases to allow any 3.x protocol version and ignore any 
> pg_protocol. options that were specified.)
>
> We can avoid one round trip if the server accepts the startupmessage as is 
> (including understanding all the parameters supplied by the client), and in 
> the cases where server couldn’t accept the startupmessage / require 
> negotiation, it should send ServerProtocolVersion message that contains both 
> MIN and MAX versions it can support. Providing Min version helps server 
> enforce the client Min protocol version, and provides a path to deprecate 
> older versions. Thoughts?

With this latest proposal, there are no extra round-trips anyway.  I
don't much see the point of having the server advertise a minimum
supported version.  The idea of new minor protocol versions is to add
*optional* features, so there shouldn't be an issue with the client
being too old to talk to the server altogether.  Of course, the server
might be configured to reject the client unless some particular new
feature is in use, but that's best handled by a message about the
specific problem at hand rather than a generic complaint.

> Does the proposal also include the client can negotiate the protocol version 
> on the same connection rather than going through connection setup process 
> again? The state machine may not sound simple with this proposal but helps 
> bringing down total time taken for the login.

Nothing in that proposal involved an extra connection setup process;
if that's not clear, you might want to reread it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Tom Lane
I wrote:
> Anyway, having vented about that ... it's not very clear to me whether the
> test script is at fault for not being careful to let the slave catch up to
> the master before we promote it (and then deem the master to be usable as
> a slave without rebuilding it first), or whether we actually imagine that
> should work, in which case there's a replication logic bug here someplace.

OK, now that I can make some sense of what's going on in the 009 test
script ... it seems like that test script is presuming synchronous
replication behavior, but it's only actually set up sync rep in one
direction, namely the london->paris direction.  The failure occurs
when we lose data in the paris->london direction.  Specifically,
with the delay hack in place, I find this in the log before things
go south completely:

# Now london is master and paris is slave
ok 11 - Restore prepared transactions from files with master down
### Enabling streaming replication for node "paris"
### Starting node "paris"
# Running: pg_ctl -D 
/home/postgres/pgsql/src/test/recovery/tmp_check/data_paris_xSFF/pgdata -l 
/home/postgres/pgsql/src/test/recovery/tmp_check/log/009_twophase_paris.log 
start
waiting for server to start done
server started
# Postmaster PID for node "paris" is 30930
psql::1: ERROR:  prepared transaction with identifier "xact_009_11" does 
not exist

That ERROR is being reported by the london node, at line 267 of the
current script:
$cur_master->psql('postgres', "COMMIT PREPARED 'xact_009_11'");
So london is missing a prepared transaction that was created while
paris was master, a few lines earlier.  (It's not real good that
the test script isn't bothering to check the results of any of
these queries, although the end-state test I just added should close
the loop on that.)  london has no idea that it's missing data, but
when we restart the paris node a little later, it notices that its
WAL is past where london is.

I'm now inclined to think that the correct fix is to ensure that we
run synchronous rep in both directions, rather than to insert delays
to substitute for that.  Just setting synchronous_standby_names for
node paris at the top of the script doesn't work, because there's
at least one place where the script intentionally issues commands
to paris while london is stopped.  But we could turn off sync rep
for that step, perhaps.

Anyone have a different view of what to fix here?

regards, tom lane


Re: [HACKERS] Using postgres planner as standalone component

2017-07-02 Thread Álvaro Hernández Tortosa



On 01/07/17 22:48, Ricky Stevens wrote:

Hi,

For one of my personal projects I am interested in using the 
PostgreSQL planner as a standalone library. However, I would like to 
run this as an embedded library instead of actually creating anything 
on disk.


I've realized that postgres has several pg_operator, pg_class etc. 
tables which it uses for query planning purposes. Is there any 
PostgreSQL component interface whose implementation could be 
overridden to not actually try to read these tables from disk but 
instead read it from a custom memory region that is managed by my code.


Thanks!



Maybe you'd like to consider gporca 
https://github.com/greenplum-db/gporca as an alternative. You may also 
want to look at Calcite https://calcite.apache.org/docs/ if you were 
more into the Java world.



Álvaro

--

Álvaro Hernández Tortosa





Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Tom Lane
I wrote:
> Any ideas what's wrong there?

Hah: the answer is that query_hash's split() call is broken.
"man perlfunc" quoth

   split   Splits the string EXPR into a list of strings and returns that
   list.  By default, empty leading fields are preserved, and
   empty trailing ones are deleted.
   ^^^

In the case at hand, the SQL query returns something like
"|physical|||t|10338|||0/302B7E8" with normal timing, but with
enough delay in there, you get "|physical|||t|11542|||" which
triggers split's default behavior of ignoring the trailing empty
fields.  It looks like the way to get split to not do that is
to pass it a "limit" of -1.
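
To illustrate with the strings above:

my $full  = "|physical|||t|10338|||0/302B7E8";
my $early = "|physical|||t|11542|||";

my @a = split /\|/, $full;        # 9 fields, as expected
my @b = split /\|/, $early;       # only 6 fields: trailing empties dropped
my @c = split /\|/, $early, -1;   # 9 fields again, with the -1 limit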

regards, tom lane


Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Tom Lane
I wrote:
> The reporting critters are all on the slow side, so I suspected
> a timing problem, especially since it only started to show up
> after changing pg_ctl's timing behavior.  I can't reproduce it
> locally on unmodified sources, but I could after putting my thumb
> on the scales like this:
> 
> diff --git a/src/backend/replication/walreceiver.c 
> b/src/backend/replication/walreceiver.c
> index ea9d21a..7ef22c1 100644
> *** a/src/backend/replication/walreceiver.c
> --- b/src/backend/replication/walreceiver.c
> *** WalReceiverMain(void)
> *** 446,451 
> --- 446,454 
>   * Something was received from master, so reset
>   * timeout
>   */
> + 
> + pg_usleep(10); /* be slow about processing */
> + 
>   last_recv_timestamp = GetCurrentTimestamp();
>   ping_sent = false;
>   XLogWalRcvProcessMsg(buf[0], &buf[1], len - 1);


BTW, I neglected to mention that this hack also produces a second failure
in the recovery test suite:

[20:28:31] t/001_stream_rep.pl .. 11/28 
#   Failed test 'xmin of cascaded slot null with no hs_feedback'
#   at t/001_stream_rep.pl line 178.
#  got: undef
# expected: ''

#   Failed test 'catalog xmin of cascaded slot null with no hs_feedback'
#   at t/001_stream_rep.pl line 179.
#  got: undef
# expected: ''
[20:28:31] t/001_stream_rep.pl .. 28/28 # Looks like you failed 
2 tests of 28.
[20:28:31] t/001_stream_rep.pl .. Dubious, test returned 2 
(wstat 512, 0x200)
Failed 2/28 subtests 

I had supposed last night that this was merely a matter of needing
to insert a wait_slot_xmins() call before those tests, but doing so
doesn't help.  And, looking closer, the report is complaining that
$xmin and $catalog_xmin are coming back as undef, not empty strings.
My Perl-fu is too weak to figure out how that could possibly be ---
the documentation for PostgresNode::query_hash certainly states
flatly that it will never happen.  And it even more certainly
shouldn't happen if we already successfully probed the
pg_replication_slots row in wait_slot_xmins().

The corresponding entries in the postmaster's log don't look abnormal:

2017-07-02 20:07:29.313 UTC [30862] LOG:  database system is ready to accept 
read only connections
2017-07-02 20:07:29.318 UTC [30867] LOG:  started streaming WAL from primary at 
0/300 on timeline 1
2017-07-02 20:07:29.415 UTC [30870] t/001_stream_rep.pl LOG:  statement: SELECT 
pg_create_physical_replication_slot('standby_2');
2017-07-02 20:07:29.619 UTC [30863] LOG:  redo starts at 0/3029458
2017-07-02 20:07:29.665 UTC [30882] t/001_stream_rep.pl LOG:  statement: 
SELECT xmin IS NULL AND catalog_xmin IS NULL
FROM pg_catalog.pg_replication_slots
WHERE slot_name = 'standby_2';

2017-07-02 20:07:29.679 UTC [30884] t/001_stream_rep.pl LOG:  statement: SELECT 
plugin, slot_type, datoid, database, active, active_pid, xmin, catalog_xmin, 
restart_lsn FROM pg_catalog.pg_replication_slots WHERE slot_name = 'standby_2'

(This is from a run with a wait_slot_xmins() call added, else the
statement logged by PID 30882 would not have been issued.)

Any ideas what's wrong there?

regards, tom lane


Re: [BUGS] [HACKERS] Segmentation fault in libpq

2017-07-02 Thread Michal Novotný
Hi all,
thank you all for your advice. I've been investigating this a little more,
and it finally turned out it's not a bug in libpq, although I got confused
by going deep into several libpq functions. The bug was really on our side:
we were using the connection pointer after calling PQfinish(). The code
is pretty complex so it took some time to investigate; however, I would like
to apologize for "blaming" libpq instead of our code.

Anyway, thank you all for valuable advice.
Have a great time,
Michal

2017-06-29 16:30 GMT+02:00 Merlin Moncure :

> On Thu, Jun 29, 2017 at 9:12 AM, Tom Lane  wrote:
> > Merlin Moncure  writes:
> >> On Thu, Jun 29, 2017 at 8:23 AM, Michal Novotny
> >>  wrote:
> >>> Could you please help me based on information provided above?
> >
> >> You might want to run your code through some analysis tools (for
> >> example, valgrind).
> >
> > valgrind is not a perfect tool for finding that kind of problem,
> > especially if you can't reproduce the crash reliably; but at least
> > valgrind is readily available and easy to use, so you might as
> > well start there and see if it finds anything.  If you have access
> > to any sort of static analysis tool (eg, Coverity), that might be
> > more likely to help.  Or you could fall back on manual code
> > auditing, if the program isn't very big.
>
> clang static analyzer is another good tool to check out
>
> https://clang-analyzer.llvm.org/
>
> merlin
>



-- 
Michal Novotny
System Development Lead
michal.novo...@greycortex.com

*GREYCORTEX s.r.o.*
Purkynova 127, 61200 Brno
Czech Republic
www.greycortex.com


Re: [BUGS] [HACKERS] Segmentation fault in libpq

2017-07-02 Thread Andres Freund
Hi,

On 2017-07-02 20:58:52 +0200, Michal Novotný wrote:
> thank you all for your advice. I've been investigating this a little more
> and finally it turned out it's not a bug in libpq although I got confused
> by going deep as several libpq functions. The bug was really on our side
> after trying to use connection pointer after calling PQfinish(). The code
> is pretty complex so it took some time to investigate however I would like
> to apologize for "blaming" libpq instead of our code.

Usually using a tool like valgrind is quite helpful to find issues like
that, because it'll show you the call-stack accessing the memory and
*also* the call-stack that led to the memory being freed.

- Andres


Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> * Some effort should be put into emitting text to the log showing
>> what's going on, eg print("Now london is master."); as appropriate.

> Check.  Not "print" though; I think using note(" .. ") (from Test::More)
> is more appropriate.

Will do, thanks for the suggestion.

regards, tom lane


[HACKERS] WIP patch for avoiding duplicate initdb runs during "make check"

2017-07-02 Thread Tom Lane
Yesterday I spent a bit of time on an idea that we've talked about
before, which is to not run initdb over and over again in contexts like
"make check-world", or even just during "make check" in contrib or in the
recovery or subscription tests.  The idea would be to do it once and
then copy the created data directory tree into place for each test
run.  Thus we might trade this:

$ time initdb -D $PGDATA -N
...
real0m1.235s
user0m0.896s
sys 0m0.341s

for this:

$ time cp -a $PGDATA junk
real0m0.146s
user0m0.008s
sys 0m0.136s

The attached patch is far from being committable, but it's complete enough
to get an idea of what sort of overall performance gain we might expect.
On my workstation, I've lately been doing check-world like this:
time make -s check-world -j4 PROVE_FLAGS='-j4'
and what I see is that HEAD takes about this long:
real2m1.514s
user2m8.113s
sys 1m15.993s
and with this patch I get
real1m42.549s
user0m42.888s
sys 0m46.982s
(These are median-of-3-runs; the real time bounces around a fair bit,
the CPU times not very much.)

On my laptop, a simple non-parallelized "time make check-world" takes
real9m12.813s
user2m15.113s
sys 1m25.631s
and with patch
real8m3.785s
user1m19.024s
sys 1m13.707s

So those are respectable gains, but not earth-shattering.  I'm not
sure if it's worth putting more work into it right now.  The main
thing that's needed to make it committable is to get rid of the
system("cp -a ...") call in favor of something more portable.
I looked longingly at our existing copydir() function but it seems
pretty tied to the backend environment.  It wouldn't be hard to
make a version for frontend, but I'm not going to expend that work
right now unless people are excited enough to want to commit this
now rather than later (or not at all).

(Other unfinished work: teaching the MSVC scripts to use this,
and teaching pg_upgrade's test script to use it.)

regards, tom lane

diff --git a/src/Makefile.global.in b/src/Makefile.global.in
index dc8a89a..a28a03a 100644
*** a/src/Makefile.global.in
--- b/src/Makefile.global.in
*** ifeq ($(MAKELEVEL),0)
*** 332,337 
--- 332,338 
  	rm -rf '$(abs_top_builddir)'/tmp_install
  	$(MKDIR_P) '$(abs_top_builddir)'/tmp_install/log
  	$(MAKE) -C '$(top_builddir)' DESTDIR='$(abs_top_builddir)'/tmp_install install >'$(abs_top_builddir)'/tmp_install/log/install.log 2>&1
+ 	'$(abs_top_builddir)/tmp_install$(bindir)/initdb' -D '$(abs_top_builddir)/tmp_install/proto_pgdata' --no-clean --no-sync -A trust >'$(abs_top_builddir)'/tmp_install/log/initdb.log 2>&1
  endif
  	$(if $(EXTRA_INSTALL),for extra in $(EXTRA_INSTALL); do $(MAKE) -C '$(top_builddir)'/$$extra DESTDIR='$(abs_top_builddir)'/tmp_install install >>'$(abs_top_builddir)'/tmp_install/log/install.log || exit; done)
  endif
*** $(if $(filter $(PORTNAME),darwin),DYLD_L
*** 355,361 
  endef
  
  define with_temp_install
! PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir))
  endef
  
  ifeq ($(enable_tap_tests),yes)
--- 356,362 
  endef
  
  define with_temp_install
! PATH="$(abs_top_builddir)/tmp_install$(bindir):$$PATH" $(call add_to_path,$(ld_library_path_var),$(abs_top_builddir)/tmp_install$(libdir)) PG_PROTO_DATADIR='$(abs_top_builddir)/tmp_install/proto_pgdata'
  endef
  
  ifeq ($(enable_tap_tests),yes)
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index f4fa600..22c1a72 100644
*** a/src/test/perl/PostgresNode.pm
--- b/src/test/perl/PostgresNode.pm
*** sub init
*** 404,411 
  	mkdir $self->backup_dir;
  	mkdir $self->archive_dir;
  
! 	TestLib::system_or_bail('initdb', '-D', $pgdata, '-A', 'trust', '-N',
! 		@{ $params{extra} });
  	TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);
  
  	open my $conf, '>>', "$pgdata/postgresql.conf";
--- 404,426 
  	mkdir $self->backup_dir;
  	mkdir $self->archive_dir;
  
! 	# If we're using default initdb parameters, and the top-level "make check"
! 	# created a prototype data directory for us, just copy that.
! 	if (!defined($params{extra}) &&
! 		defined($ENV{PG_PROTO_DATADIR}) &&
! 		-d $ENV{PG_PROTO_DATADIR})
! 	{
! 		rmdir($pgdata);
! 		RecursiveCopy::copypath($ENV{PG_PROTO_DATADIR}, $pgdata);
! 		chmod(0700, $pgdata);
! 	}
! 	else
! 	{
! 		TestLib::system_or_bail('initdb', '-D', $pgdata,
! '--no-clean', '--no-sync', '-A', 'trust',
! @{ $params{extra} });
! 	}
! 
  	TestLib::system_or_bail($ENV{PG_REGRESS}, '--config-auth', $pgdata);
  
  	open my $conf, '>>', "$pgdata/postgresql.conf";
diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index abb742b..04d7fb9 100644
*** a/src/test/regress/pg_regress.c
--- b/src/test/regress/pg_regress.c

Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Alvaro Herrera
Tom Lane wrote:

> I'd kind of like to fix it now, so I can reason in a less confused way
> about the actual problem.

OK, no objections here.

> Last night I didn't have a clear idea of how
> to make it better, but what I'm thinking this morning is:
> 
> * Naming the underlying server objects "master" and "slave" is just
> wrong, because the script makes them swap those roles repeatedly.
> Better to choose neutral names, like "alice" and "bob" or "london"
> and "paris".

Check.

> * We could simply make the test script refer directly to the appropriate
> server at each step, ie s/node_master/node_london/ in relevant parts of
> the script and s/node_slave/node_london/ elsewhere.  Maybe that's the
> best way, but there is some value in identifying commands as to whether
> we're issuing them to the current master or the current slave.  Plan B
> might be to do
>   ($cur_master, $cur_slave) = ($node_paris, $node_london);
> at the swap points and use those names where it seems clearer.

Hmm, yeah, using role-based aliases seems like a good idea too.

> * Some effort should be put into emitting text to the log showing
> what's going on, eg print("Now london is master."); as appropriate.

Check.  Not "print" though; I think using note(" .. ") (from Test::More)
is more appropriate.

> * Another reason why I had the feeling of being lost in a maze of
> twisty little passages all alike was the relentless sameness of the
> commands being sent to the servers.  There is no good reason for the
> prepared transactions to be all alike; they should be all different
> so that you can match up postmaster log entries to points in the
> script.

Check.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] Multi column range partition table

2017-07-02 Thread Dean Rasheed
On 30 June 2017 at 10:04, Ashutosh Bapat
 wrote:
> On Fri, Jun 30, 2017 at 1:36 PM, Amit Langote
>  wrote:
>>
>> Alright, I spent some time implementing a patch to allow specifying
>> -infinity and +infinity in arbitrary ways.  Of course, it prevents
>> nonsensical inputs with appropriate error messages.
>
> I don't think -infinity and +infinity are the right terms. For a
> string or character data type there is no -infinity and +infinity.
> Similarly for enums. We need to extend UNBOUNDED somehow to indicate
> the end of a given type in the given direction. I thought about
> UNBOUNDED LEFT/RIGHT but then whether LEFT indicates -ve side or +side
> would cause confusion. Also LEFT/RIGHT may work for a single
> dimensional datatype but not for multi-dimensional spaces. How about
> MINIMUM/MAXIMUM or UNBOUNDED MIN/MAX to indicate the extremities.
>

Yes, I think you're right. Also, some datatypes include values that
are equal to +/-infinity, which would then behave differently from
unbounded as range bounds, so it wouldn't be a good idea to overload
that term.

My first thought was UNBOUNDED ABOVE/BELOW, because that matches the
terminology already in use of upper and lower bounds.

Regards,
Dean


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-07-02 Thread Mithun Cy
On Tue, Jun 27, 2017 at 11:41 AM, Mithun Cy  wrote:
> On Fri, Jun 23, 2017 at 5:45 AM, Thom Brown  wrote:
>>
>> Also, I find it a bit messy that launch_autoprewarm_dump() doesn't
>> detect an autoprewarm process already running.  I'd want this to
>> return NULL or an error if called for a 2nd time.
>
> We log instead of error as we try to check only after launching the
> worker and inside worker. One solution could be as similar to
> autoprewam_dump_now(), the autoprewarm_start_worker() can init shared
> memory and check if we can launch worker in backend itself. I will try
> to fix same.

I have fixed it now as follows

+Datum
+autoprewarm_start_worker(PG_FUNCTION_ARGS)
+{
+   pid_t   pid;
+
+   init_apw_shmem();
+   pid = apw_state->bgworker_pid;
+   if (pid != InvalidPid)
+   ereport(ERROR,
+   (errmsg("autoprewarm worker is already running under PID %d",
+   pid)));
+
+   autoprewarm_dump_launcher();
+   PG_RETURN_VOID();
+}

In the backend itself, we now check whether an autoprewarm worker is
already running and only then launch one. There is a possibility that,
if this function is executed concurrently when no worker is already
running (which I think is not a normal usage), both calls will claim to
have successfully launched the worker even though only one of them could
actually have done so (the other will log and silently die). I think
that is okay, as the objective was to get one worker up and running.

I have changed the return value to void. The worker could be restarted
when there is an error, so the returned PID is not going to be the same
as the worker's PID in such cases. Also, I do not see any use for the
PID.  I have made the corresponding documentation changes.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


autoprewarm_17.patch
Description: Binary data

Re: [HACKERS] Multi column range partition table

2017-07-02 Thread Dean Rasheed
On 30 June 2017 at 09:06, Amit Langote  wrote:
> When testing the patch, I realized that the current code in
> check_new_partition_bound() that checks for range partition overlap had a
> latent bug that resulted in false positives for the new cases that the new
> less restrictive syntax allowed.  I spent some time simplifying that code
> while also fixing the aforementioned bug.  It's implemented in the
> attached patch 0001.
>

I haven't had time to look at 0002 yet, but looking at 0001, I'm not
convinced that this really represents much of a simplification, but I
do prefer the way it now consistently reports the first overlapping
partition in the error message.

I'm not entirely convinced by this change either:

-if (equal || off1 != off2)
+if (off2 > off1 + 1 || ((off2 == off1 + 1) && !equal))

Passing probe_is_bound = true to partition_bound_bsearch() will I
think cause it to return equal = false when the upper bound of one
partition equals the lower bound of another, so relying on the "equal"
flag here seems a bit dubious. I think I can just about convince
myself that it works, but not for the reasons stated in the comments.

It also seems unnecessary for this code to be doing 2 binary searches.
I think a better simplification would be to just do one binary search
to find the gap that the lower bound fits in, and then test the upper
bound of the new partition against the lower bound of the next
partition (if there is one), as in the attached patch.

Regards,
Dean
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
new file mode 100644
index 7da2058..96760a0
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -745,78 +745,62 @@ check_new_partition_bound(char *relname,
 if (partdesc->nparts > 0)
 {
 	PartitionBoundInfo boundinfo = partdesc->boundinfo;
-	int			off1,
-off2;
-	bool		equal = false;
+	int			offset;
+	bool		equal;
 
 	Assert(boundinfo && boundinfo->ndatums > 0 &&
 		   boundinfo->strategy == PARTITION_STRATEGY_RANGE);
 
 	/*
-	 * Firstly, find the greatest range bound that is less
-	 * than or equal to the new lower bound.
+	 * Test whether the new lower bound (which is treated
+	 * inclusively as part of the new partition) lies inside an
+	 * existing partition, or in a gap.
+	 *
+	 * If it's in a gap, the next index value will be -1 (the
+	 * lower bound of the next partition).  This is also true
+	 * if there is no next partition, since the index array is
+	 * initialised with an extra -1 at the end.
+	 *
+	 * Note that this also allows for the possibility that the
+	 * new lower bound equals an existing upper bound.
 	 */
-	off1 = partition_bound_bsearch(key, boundinfo, lower, true,
-   &equal);
+	offset = partition_bound_bsearch(key, boundinfo, lower,
+	 true, &equal);
 
-	/*
-	 * off1 == -1 means that all existing bounds are greater
-	 * than the new lower bound.  In that case and the case
-	 * where no partition is defined between the bounds at
-	 * off1 and off1 + 1, we have a "gap" in the range that
-	 * could be occupied by the new partition.  We confirm if
-	 * so by checking whether the new upper bound is confined
-	 * within the gap.
-	 */
-	if (!equal && boundinfo->indexes[off1 + 1] < 0)
+	if (boundinfo->indexes[offset + 1] < 0)
 	{
-		off2 = partition_bound_bsearch(key, boundinfo, upper,
-	   true, &equal);
-
 		/*
-		 * If the new upper bound is returned to be equal to
-		 * the bound at off2, the latter must be the upper
-		 * bound of some partition with which the new
-		 * partition clearly overlaps.
-		 *
-		 * Also, if bound at off2 is not same as the one
-		 * returned for the new lower bound (IOW, off1 !=
-		 * off2), then the new partition overlaps at least one
-		 * partition.
+		 * Check that the new partition will fit in the gap.
+		 * For it to fit, the new upper bound must be less than
+		 * or equal to the lower bound of the next partition,
+		 * if there is one.
 		 */
-		if (equal || off1 != off2)
+		if (offset + 1 < boundinfo->ndatums)
 		{
-			overlap = true;
+			int32		cmpval;
 
-			/*
-			 * The bound at off2 could be the lower bound of
-			 * the partition with which the new partition
-			 * overlaps.  In that case, use the upper bound
-			 * (that is, the bound at off2 + 1) to get the
-			 * index of that partition.
-			 */
-			if (boundinfo->indexes[off2] < 0)
-with = boundinfo->indexes[off2 + 1];
-			else
-with = boundinfo->indexes[off2];
+			cmpval = partition_bound_cmp(key, boundinfo,
+		 offset + 1, upper,
+		 true);
+			if (cmpval < 0)
+			{
+/*
+ * The new 
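
To make the approach described above concrete outside the backend, here is
a toy, self-contained C sketch using plain integer bounds.  It is an
illustration only, not the attached patch, and it glosses over the
lower-vs-upper bound tie-breaking that probe_is_bound and the "equal" flag
handle in the real code.

#include <stdbool.h>
#include <stdio.h>

/* Largest offset such that bounds[offset] <= probe, or -1 if none. */
static int
bound_bsearch(const int *bounds, int nbounds, int probe)
{
    int     lo = -1;
    int     hi = nbounds - 1;

    while (lo < hi)
    {
        int     mid = hi - (hi - lo) / 2;   /* upper-biased midpoint */

        if (bounds[mid] <= probe)
            lo = mid;
        else
            hi = mid - 1;
    }
    return lo;
}

/*
 * bounds[]  - all distinct partition bounds, sorted ascending
 * indexes[] - nbounds + 1 entries; indexes[i] is the partition occupying
 *             the range between bounds[i-1] and bounds[i], or -1 for a gap
 * Returns true (and sets *with) if the new range [lower, upper) overlaps
 * an existing partition.
 */
static bool
overlaps_existing(const int *bounds, const int *indexes, int nbounds,
                  int lower, int upper, int *with)
{
    int     offset = bound_bsearch(bounds, nbounds, lower);

    if (indexes[offset + 1] >= 0)
    {
        /* The new lower bound falls inside an existing partition. */
        *with = indexes[offset + 1];
        return true;
    }

    /* Lower bound is in a gap; compare against the next lower bound. */
    if (offset + 1 < nbounds && upper > bounds[offset + 1])
    {
        *with = indexes[offset + 2];
        return true;
    }
    return false;
}

int
main(void)
{
    /* Existing partitions: 0 covers [10, 20), 1 covers [30, 40). */
    int     bounds[] = {10, 20, 30, 40};
    int     indexes[] = {-1, 0, -1, 1, -1};
    int     with;

    printf("[20, 30) overlaps? %d\n",
           overlaps_existing(bounds, indexes, 4, 20, 30, &with)); /* 0 */
    printf("[25, 35) overlaps? %d\n",
           overlaps_existing(bounds, indexes, 4, 25, 35, &with)); /* 1 */
    return 0;
}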

Re: [HACKERS] Using postgres planner as standalone component

2017-07-02 Thread Tom Lane
Ricky Stevens  writes:
> For one of my personal projects I am interested in using the PostgreSQL
> planner as a standalone library. However, I would like to run this as an
> embedded library instead of actually creating anything on disk.

I'm not really clear on what value that would have.  Aside from the
problem you mentioned that lots of information comes from the PG
system catalogs, there are a lot of other issues large and small:

* the code depends extensively on the PG backend programming environment
(palloc and elog, for instance);

* the input data structure is a PG-specific query representation, and
the output structure is a PG-specific plan representation;

* the knowledge that it has is all about the behavior of PG-specific
operators and execution plan types.

By the time you got done dealing with all that, either you'd have imported
pretty much the entire Postgres system into your "standalone library",
or you'd have done so much rewrite work that you might as well have
started from scratch.
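
For concreteness, here is a minimal sketch of an in-backend caller,
assuming the PostgreSQL 10 era planner API.  It only compiles and runs
inside a backend, which is exactly the point: both the argument and the
result are backend node trees, and planning presumes catalog access,
palloc memory contexts, and elog error handling are already set up.

#include "postgres.h"

#include "optimizer/planner.h"

/* Plan an already-parsed, already-rewritten query tree. */
static PlannedStmt *
plan_one_query(Query *parse)
{
    /* 0 = no cursor options; NULL = no bound parameters */
    return planner(parse, 0, NULL);
}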

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Using postgres planner as standalone component

2017-07-02 Thread Ricky Stevens
Hi,

For one of my personal projects I am interested in using the PostgreSQL
planner as a standalone library. However, I would like to run this as an
embedded library instead of actually creating anything on disk.

I've realized that postgres has several pg_operator, pg_class etc. tables
which it uses for query planning purposes. Is there any PostgreSQL
component interface whose implementation could be overridden to not
actually try to read these tables from disk, but instead read them from a
custom memory region managed by my code?

Thanks!


Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Tom Lane
Alvaro Herrera  writes:
> Tom Lane wrote:
>> Part of the reason I'm confused is that the programming technique
>> being used in 009_twophase.pl, namely doing
>> ($node_master, $node_slave) = ($node_slave, $node_master);
>> and then working with the reversed variable names, is ENTIRELY TOO CUTE
>> FOR ITS OWN GOOD.

> This is my fault.  I noticed this in the submitted test (and was pretty
> confused about it too) when I reviewed it for commit, but somehow it
> didn't reach my threshold to require a rewrite.  I'll fix it sometime
> during the week.

I'd kind of like to fix it now, so I can reason in a less confused way
about the actual problem.  Last night I didn't have a clear idea of how
to make it better, but what I'm thinking this morning is:

* Naming the underlying server objects "master" and "slave" is just
wrong, because the script makes them swap those roles repeatedly.
Better to choose neutral names, like "alice" and "bob" or "london"
and "paris".

* We could simply make the test script refer directly to the appropriate
server at each step, ie s/node_master/node_london/ in relevant parts of
the script and s/node_slave/node_london/ elsewhere.  Maybe that's the
best way, but there is some value in identifying commands as to whether
we're issuing them to the current master or the current slave.  Plan B
might be to do
($cur_master, $cur_slave) = ($node_paris, $node_london);
at the swap points and use those names where it seems clearer.

* Some effort should be put into emitting text to the log showing
what's going on, eg print("Now london is master."); as appropriate.

* Another reason why I had the feeling of being lost in a maze of
twisty little passages all alike was the relentless sameness of the
commands being sent to the servers.  There is no good reason for the
prepared transactions to be all alike; they should be all different
so that you can match up postmaster log entries to points in the
script.

If anyone has other ideas please speak up.

Barring objection I'll go do something about this shortly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Race-like failure in recovery/t/009_twophase.pl

2017-07-02 Thread Alvaro Herrera
Tom Lane wrote:

> Part of the reason I'm confused is that the programming technique
> being used in 009_twophase.pl, namely doing
> 
>   ($node_master, $node_slave) = ($node_slave, $node_master);
> 
> and then working with the reversed variable names, is ENTIRELY TOO CUTE
> FOR ITS OWN GOOD.

This is my fault.  I noticed this in the submitted test (and was pretty
confused about it too) when I reviewed it for commit, but somehow it
didn't reach my threshold to require a rewrite.  I'll fix it sometime
during the week.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] UPDATE of partition key

2017-07-02 Thread Robert Haas
On Fri, Jun 30, 2017 at 4:20 PM, Robert Haas  wrote:
> I don't think the approach of building a hash table to figure out
> which result rels have already been created is a good one.  That too
> feels like something that the planner should be figuring out and the
> executor should just be implementing what the planner decided.  I
> haven't figured out exactly how that should work yet, but it seems
> like it ought to be doable.

I was imagining when I wrote the above that the planner should somehow
compute a list of relations that it has excluded so that the executor
can skip building ResultRelInfos for exactly those relations, but on
further study, that's not particularly easy to achieve and wouldn't
really save anything anyway, because the list of OIDs is coming
straight out of the partition descriptor, so it's pretty much free.
However, I still think it would be a nifty idea if we could avoid
needing the hash table to deduplicate.  The reason we need that is, I
think, that expand_inherited_rtentry() is going to expand the
inheritance hierarchy in whatever order the scan(s) of pg_inherits
return the descendant tables, whereas the partition descriptor is
going to put them in a canonical order.

But that seems like it wouldn't be too hard to fix: let's have
expand_inherited_rtentry() expand the partitioned table in the same
order that will be used by ExecSetupPartitionTupleRouting().  That
seems pretty easy to do - just have expand_inherited_rtentry() notice
that it's got a partitioned table and call
RelationGetPartitionDispatchInfo() instead of find_all_inheritors() to
produce the list of OIDs.  Then - I think -
ExecSetupPartitionTupleRouting() doesn't need the hash table; it can
just scan through the return value of RelationGetPartitionDispatchInfo()
and the list of already-created ResultRelInfo structures in parallel -
the order must be the same, but the latter can be missing some
elements, so it can just create the missing ones.
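
A hedged sketch of that lockstep merge follows; make_leaf_result_rel_info()
and the surrounding names are hypothetical stand-ins, not anything in the
tree, and it assumes both inputs really are in the same canonical order.

#include "postgres.h"

#include "nodes/execnodes.h"
#include "nodes/pg_list.h"
#include "utils/rel.h"

/* Hypothetical helper that builds a ResultRelInfo for one leaf partition. */
extern ResultRelInfo *make_leaf_result_rel_info(Oid leaf_oid);

static void
assign_leaf_result_rels(List *leaf_part_oids,
                        ResultRelInfo **existing, int nexisting,
                        ResultRelInfo **result)
{
    ListCell   *lc;
    int         next_existing = 0;
    int         i = 0;

    foreach(lc, leaf_part_oids)
    {
        Oid     leaf_oid = lfirst_oid(lc);

        if (next_existing < nexisting &&
            RelationGetRelid(existing[next_existing]->ri_RelationDesc) == leaf_oid)
        {
            /* Already built (e.g. as an UPDATE result rel); just reuse it. */
            result[i++] = existing[next_existing++];
        }
        else
        {
            /* Not covered yet; create a ResultRelInfo for this leaf. */
            result[i++] = make_leaf_result_rel_info(leaf_oid);
        }
    }
}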

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_stop_backup(wait_for_archive := true) on standby server

2017-07-02 Thread Michael Paquier
On Sat, Jul 1, 2017 at 3:59 AM, Stephen Frost  wrote:
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
>> On 6/30/17 04:08, Masahiko Sawada wrote:
>> >> I'm not sure. I think this can be considered a bug in the implementation 
>> >> for
>> >> 10, and as such is "open for fixing". However, it's not a very critical 
>> >> bug
>> >> so I doubt it should be a release blocker, but if someone wants to work 
>> >> on a
>> >> fix I think we should commit it.
>> >
>> > I agree with you. I'd like to hear opinions from other hackers as well.
>>
>> It's preferable to make it work.  If it's not easily possible, then we
>> should prohibit it.
>>
>> Comments from Stephen (original committer)?
>
> I agree that it'd be preferable to make it work, but I'm not sure I can
> commit to having it done in short order.  I'm happy to work to prohibit
> it, but if someone has a few spare cycles to make it actually work,
> that'd be great.

Fixing the limitation instead of prohibiting it looks like a better
way of doing things to me. It would be hard to explain to users why
the implementation does not consider archive_mode = always. Blocking
it is just four lines of code, but that still feels wrong.
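
For scale, a hedged sketch of roughly what that four-line prohibition could
look like inside do_pg_stop_backup() (not a committed patch;
backup_started_in_recovery and waitforarchive are the existing names in
that function as of PG 10):

    /* Inside do_pg_stop_backup(), before waiting on the archiver. */
    if (backup_started_in_recovery && waitforarchive)
        ereport(ERROR,
                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
                 errmsg("wait_for_archive is not supported when taking a backup from a standby")));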

> In short, I agree with Magnus and feel like I'm more-or-less in the same
> boat as he is (though slightly jealous as that's not actually physically
> the case, for I hear he has a rather nice boat...).

That means a PG-EU in Sweden at some point?!
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers