Re: [HACKERS] Rework the way multixact truncations work

2015-12-02 Thread Noah Misch
On Tue, Dec 01, 2015 at 05:07:15PM -0500, Robert Haas wrote:
> I think it's weird to back a commit out only to put a bunch of very similar
> stuff back in.

I agree with that.  If the original patches and their replacements shared 95%
of diff lines in common, we wouldn't be having this conversation.  These
replacements redo closer to 50% of the lines, so the patches are not very
similar by verbatim line comparison.  Even so, let's stipulate that my
proposal is weird.  I'd rather be weird than lose one of the benefits I
articulated upthread, let alone all of them.

My post-commit review of RLS woke me up to the problems of gradually finishing
work in-tree.  By the time I started that review in 2015-06, the base RLS
feature already spanned twenty-two commits.  (That count has since more than
doubled.)  Reviewing each commit threatened to be wasteful, because I would
presumably find matters already fixed later.  I tried to cherry-pick the
twenty-two commits onto a branch, hoping to review the overall diff as "git
diff master...squash-rls", but that yielded complex merge conflicts.  Where
the conflicts were too much, I reviewed entire files instead.  (Granted, no
matter how this thread ends, I do not expect an outcome that opaque.)  Hackers
have been too reticent to revert and redo defect-laden commits.  If doing that
is weird today, let it be normal.

> Even figuring out what you've actually changed here seems
> rather hard.  I couldn't get github to give me a diff showing your
> changes vs. master.

If you, an expert in the 2015-09-26 commits, want to study the key changes I
made, I recommend perusing these outputs:

git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch nmisch_github
git diff nmisch_github/mxt0-revert nmisch_github/mxt1-disk-independent
git log -p --reverse --no-merges nmisch_github/mxt2-cosmetic..nmisch_github/mxt3-main

For the overall diff vs. master that you sought:

git remote add community git://git.postgresql.org/git/postgresql.git
git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch --multiple community nmisch_github
git diff community/master...nmisch_github/mxt4-rm-legacy

If anyone not an author or reviewer of the 2015-09-26 commits wishes to review
the work, don't read the above diffs; the intermediate states are
uninteresting.  Read these four:

git remote add nmisch_github https://github.com/nmisch/postgresql.git
git fetch nmisch_github
git diff nmisch_github/mxt0-revert nmisch_github/mxt1-disk-independent
git diff nmisch_github/mxt1-disk-independent nmisch_github/mxt2-cosmetic
git diff nmisch_github/mxt2-cosmetic nmisch_github/mxt3-main
git diff nmisch_github/mxt3-main nmisch_github/mxt4-rm-legacy

Thanks,
nm




Re: [HACKERS] Remaining 9.5 open items

2015-12-02 Thread Andres Freund
On 2015-12-02 08:25:13 -0800, Joshua D. Drake wrote:
> A feature does not exist without documentation.

Uh, you do realize there's actually documentation about RLS? The issues
mentioned here are some small adjustments, not entirely new docs.




[HACKERS] psql ignores failure to open -o target file

2015-12-02 Thread Tom Lane
I just noticed that parse_psql_options() ignores the result of setQFout(),
meaning that if the argument of a -o command line option is bogus, we'll
ignore the switch entirely after printing an error report.  For example

$ psql -o /dev/foo -c 'select 1'
/dev/foo: Permission denied
 ?column? 
----------
        1
(1 row)

$

This seems surprising to me: any other program in the world would do
exit(1) after discovering that it couldn't write where it had been
told to.  Should we change this?
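
As a minimal sketch of one possible fix (not a finished patch), the -o
handling in startup.c's parse_psql_options() could simply stop ignoring
the result:

		case 'o':
			if (!setQFout(optarg))		/* setQFout() already printed the error */
				exit(EXIT_FAILURE);
			break;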

regards, tom lane




Re: [HACKERS] Remaining 9.5 open items

2015-12-02 Thread Joshua D. Drake

On 12/02/2015 05:27 AM, Robert Haas wrote:

On Mon, Nov 30, 2015 at 4:55 PM, Stephen Frost  wrote:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Well, it's December nearly, and we don't seem to be making much progress
towards pushing out 9.5.0.  I see the following items on
https://wiki.postgresql.org/wiki/PostgreSQL_9.5_Open_Items

* Open Row-Level Security Issues

Seems like what's left here is only documentation fixes, but they still
need to get done.


These are mainly just documentation improvements which I'm working on,
though the docs were recently updated and I need to incorporate Peter's
changes which I wasn't exactly anticipating.


So, when do you foresee this documentation stuff getting taken care
of?  We kinda need to do a release here.  Is this really a blocker?



A feature does not exist without documentation.

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] [PoC] Asynchronous execution again (which is not parallel)

2015-12-02 Thread Amit Kapila
On Wed, Dec 2, 2015 at 7:45 AM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
> Thank you for picking this up.
>
> At Tue, 1 Dec 2015 20:33:02 +0530, Amit Kapila 
wrote in 
> > On Mon, Nov 30, 2015 at 6:17 PM, Kyotaro HORIGUCHI <
> > horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > > == TODO or random thoughts, not restricted on this patch.
> > >
> > > - This patch doesn't contain planner part, it must be aware of
> > >   async execution in order that this can be effective.
> > >
> >
> > How will you decide whether sync-execution is cheaper than parallel
> > execution.  Do you have some specific cases in mind where async
> > execution will be more useful than parallel execution?
>
> Mmm.. Some confusion in wording? Sync-async is a discrimination
> about when to start execution of a node (and its
> descendents). Parallel-serial(sequential) is that of whether
> multiple nodes can execute simultaneously. Async execution
> premises parallel execution in any terms, bgworker or FDW.
>
> As I wrote in the previous mail, async execution reduces startup
> time of execution of parallel execution.
>

Could you please explain in more detail how async execution reduces
the startup time for parallel execution?  Can you share both execution
plans (with and without async execution)?  That might help in
understanding the reason.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Rework the way multixact truncations work

2015-12-02 Thread Andres Freund
On 2015-12-02 09:57:19 -0500, Noah Misch wrote:
> On Tue, Dec 01, 2015 at 05:07:15PM -0500, Robert Haas wrote:
> > I think it's weird to back a commit out only to put a bunch of very similar
> > stuff back in.
>
> I agree with that.  If the original patches and their replacements shared 95%
> of diff lines in common, we wouldn't be having this conversation. These
> replacements redo closer to 50% of the lines, so the patches are not very
> similar by verbatim line comparison.

Which is a huge problem, because it makes it very hard to see what your
changes actually do. And a significant portion of the changes relative
to master aren't particularly critical. Which is easy to see if a
commit only changes comments, but harder if you see one commit reverting
things, and another redoing most of the same things.

> Hackers have been too reticent to revert and redo defect-laden
> commits.  If doing that is weird today, let it be normal.

Why? Especially if reverting and redoing includes conflicts, which mainly
increases the chance of accidental bugs.

> git remote add community git://git.postgresql.org/git/postgresql.git
> git remote add nmisch_github https://github.com/nmisch/postgresql.git
> git fetch --multiple community nmisch_github
> git diff community/master...nmisch_github/mxt4-rm-legacy

That's a nearly useless diff, because it includes a lot of other changes
(218 files changed, 2828 insertions(+), 8742 deletions(-)) made since
you published the changes. What kinda works is
git diff $(git merge-base community/master nmisch_github/mxt4-rm-legacy)..nmisch_github/mxt4-rm-legacy
which shows the diff to the version of master you start off from.

Review of the above diff:
> @@ -2013,7 +2017,7 @@ TrimMultiXact(void)
>  {
>   MultiXactId nextMXact;
>   MultiXactOffset offset;
> - MultiXactId oldestMXact;
> + MultiXactId oldestMXact;

That's a bit weird, given that nextMXact isn't indented...


> @@ -2190,7 +2194,8 @@ MultiXactSetNextMXact(MultiXactId nextMulti,
> - /*
> -  * Computing the actual limits is only possible once the data directory 
> is
> -  * in a consistent state. There's no need to compute the limits while
> -  * still replaying WAL - no decisions about new multis are made even
> -  * though multixact creations might be replayed. So we'll only do 
> further
> -  * checks after TrimMultiXact() has been called.
> -  */
> + /* Before the TrimMultiXact() call at end of recovery, skip the rest. */
>   if (!MultiXactState->finishedStartup)
>   return;
> -
>   Assert(!InRecovery);
> 
> - /* Set limits for offset vacuum. */
> + /*
> +  * Setting MultiXactState->oldestOffset entails a find_multixact_start()
> +  * call, which is only possible once the data directory is in a 
> consistent
> +  * state. There's no need for an offset limit while still replaying WAL;
> +  * no decisions about new multis are made even though multixact 
> creations
> +  * might be replayed.
> +  */
>   needs_offset_vacuum = SetOffsetVacuumLimit();

I don't really see the benefit of this change. The previous location of
the comment is where we return early, so it seems appropriate to
document the reason there?

>   /*
> @@ -2354,6 +2356,12 @@ MultiXactAdvanceNextMXact(MultiXactId minMulti,
>   debug_elog3(DEBUG2, "MultiXact: setting next multi to %u", 
> minMulti);
>   MultiXactState->nextMXact = minMulti;
>   }
> +
> + /*
> +  * MultiXactOffsetPrecedes() gives the wrong answer if nextOffset would
> +  * advance more than 2^31 between calls.  Since we get a call for each
> +  * XLOG_MULTIXACT_CREATE_ID, that should never happen.
> +  */

Independent comment improvement. Good idea though.
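
For reference, MultiXactOffsetPrecedes() is, roughly, a signed modulo-2^32
comparison along these lines, which is why it can only distinguish distances
below 2^31:

static bool
MultiXactOffsetPrecedes(MultiXactOffset offset1, MultiXactOffset offset2)
{
	int32		diff = (int32) (offset1 - offset2);

	return (diff < 0);
}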


>  /*
> - * Update our oldestMultiXactId value, but only if it's more recent than what
> - * we had.
> - *
> - * This may only be called during WAL replay.
> + * Update our oldestMultiXactId value, but only if it's more recent than
> + * what we had.  This may only be called during WAL replay.
>   */

Whatever?

>  void
>  MultiXactAdvanceOldest(MultiXactId oldestMulti, Oid oldestMultiDB)
> @@ -2544,14 +2550,13 @@ GetOldestMultiXactId(void)
>  static bool
>  SetOffsetVacuumLimit(void)
>  {
> - MultiXactId oldestMultiXactId;
> + MultiXactId oldestMultiXactId;
>   MultiXactId nextMXact;
> - MultiXactOffset oldestOffset = 0;   /* placate compiler */
> - MultiXactOffset prevOldestOffset;
> + MultiXactOffset oldestOffset = 0;   /* placate compiler */
>   MultiXactOffset nextOffset;
>   boololdestOffsetKnown = false;
> + MultiXactOffset prevOldestOffset;
>   boolprevOldestOffsetKnown;
> - MultiXactOffset offsetStopLimit = 0;

I don't see the benefit of the order changes here.

> @@ -2588,40 +2590,50 @@ SetOffsetVacuumLimit(void)
>   else
>   {
>   /*
> -  * Figure out where 

Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2015-12-02 Thread Robert Haas
On Fri, Nov 27, 2015 at 2:32 AM, Amit Kapila  wrote:
> Okay, as discussed I have handled the case of sub-transactions without
> additional shmem in the attached patch.  Apart from that, I have tried
> to apply this optimization for Prepared transactions as well, but as
> the dummy proc used for such transactions doesn't have semaphore like
> backend proc's, so it is not possible to use such a proc in group status
> updation as each group member needs to wait on semaphore.  It is not tad
> difficult to add the support for that case if we are okay with creating
> additional
> semaphore for each such dummy proc which I was not sure, so I have left
> it for now.

"updation" is not a word.  "acquirations" is not a word.  "penality"
is spelled wrong.

I think the approach this patch takes is pretty darned strange, and
almost certainly not what we want.  What you're doing here is putting
every outstanding CLOG-update request into a linked list, and then the
leader goes and does all of those CLOG updates.  But there's no
guarantee that the pages that need to be updated are even present in a
CLOG buffer.  If it turns out that all of the batched CLOG updates are
part of resident pages, then this is going to work great, just like
the similar ProcArrayLock optimization.  But if the pages are not
resident, then you will get WORSE concurrency and SLOWER performance
than the status quo.  The leader will sit there and read every page
that is needed, and to do that it will repeatedly release and
reacquire CLogControlLock (inside SimpleLruReadPage).  If you didn't
have a leader, the reads of all those pages could happen at the same
time, but with this design, they get serialized.  That's not good.

My idea for how this could possibly work is that you could have a list
of waiting backends for each SLRU buffer page.  Pages with waiting
backends can't be evicted without performing the updates for which
backends are waiting.  Updates to non-resident pages just work as they
do now.  When a backend acquires CLogControlLock to perform updates to
a given page, it also performs all other pending updates to that page
and releases those waiters.  When a backend acquires CLogControlLock
to evict a page, it must perform any pending updates and write the
page before completing the eviction.
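
A rough sketch of the bookkeeping that idea implies (names and sizes are
invented purely for illustration; this is not working code):

/* Hypothetical sketch only -- not actual PostgreSQL code. */
#include "postgres.h"
#include "access/clog.h"		/* XidStatus */
#include "storage/proc.h"		/* PGPROC */

#define N_CLOG_BUFFER_SLOTS 128	/* illustrative; the real count is computed at startup */

typedef struct ClogPendingUpdate
{
	TransactionId xid;			/* transaction whose status to set */
	XidStatus	status;			/* status to record for it */
	PGPROC	   *waiter;			/* backend to wake once the update is applied */
	struct ClogPendingUpdate *next;
} ClogPendingUpdate;

/*
 * One pending-update list per SLRU buffer slot, protected by
 * CLogControlLock.  A page with a non-empty list cannot be evicted until
 * its pending updates have been applied and the page written out.
 */
static ClogPendingUpdate *clog_pending[N_CLOG_BUFFER_SLOTS];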

I agree with Simon that it's probably a good idea for this
optimization to handle cases where a backend has a non-overflowed list
of subtransactions.  That seems doable.  Handling the case where the
subxid list has overflowed seems unimportant; it should happen rarely
and is therefore not performance-critical.  Also, handling the case
where the XIDs are spread over multiple pages seems far too
complicated to be worth the effort of trying to fit into a "fast
path".  Optimizing the case where there are 1+ XIDs that need to be
updated but all on the same page should cover well over 90% of commits
on real systems, very possibly over 99%.  That should be plenty good
enough to get whatever contention-reduction benefit is possible here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Logical replication and multimaster

2015-12-02 Thread Robert Haas
On Mon, Nov 30, 2015 at 11:20 AM, Konstantin Knizhnik
 wrote:
> We have implemented ACID multimaster based on logical replication and our
> DTM (distributed transaction manager) plugin.
> Good news is that it works and no inconsistency is detected.
> But unfortunately it is very very slow...
>
> At standalone PostgreSQL I am able to achieve about 30k TPS with 10
> clients performing simple debit-credit transactions.
> And with multimaster consisting of three nodes spawned at the same system I
> got about 100 (one hundred) TPS.
> There are two main reasons of such awful performance:
>
> 1. Logical replication serializes all transactions:  there is single
> connection between wal-sender and receiver BGW.
> 2. 2PC synchronizes transaction commit at all nodes.
>
> None of these two reasons are show stoppers themselves.
> If we remove DTM and do asynchronous logical replication then performance of
> multimaster is increased to 6000 TPS
> (please notice that in this test all multimaster node are spawned at the
> same system, sharing its resources,
> so 6k is not bad result comparing with 30k at standalone system).
> And according to 2ndquadrant results, BDR performance is very close to hot
> standby.

Logical decoding only begins decoding a transaction once the
transaction is complete.  So I would guess that the sequence of
operations here is something like this - correct me if I'm wrong:

1. Do the transaction.
2. PREPARE.
3. Replay the transaction.
4. PREPARE the replay.
5. COMMIT PREPARED on original machine.
6. COMMIT PREPARED on replica.

Step 3 introduces latency proportional to the amount of work the
transaction did, which could be a lot.   If you were doing synchronous
physical replication, the replay of the COMMIT record would only need
to wait for the replay of the commit record itself.  But with
synchronous logical replication, you've got to wait for the replay of
the entire transaction.  That's a major bummer, especially if replay
is single-threaded and there are a large number of backends generating
transactions.  Of course, the 2PC dance itself can also add latency -
that's most likely to be the issue if the transactions are each very
short.

What I'd suggest is trying to measure where the latency is coming
from.  You should be able to measure how much time each transaction
spends (a) executing, (b) preparing itself, (c) waiting for the replay
thread to begin replaying it, (d) waiting for the replay thread to
finish replaying it, and (e) committing.  Separating (c) and (d) might
be a little bit tricky, but I bet it's worth putting some effort in,
because the answer is probably important to understanding what sort of
change will help here.  If (c) is the problem, you might be able to
get around it by having multiple processes, though that only helps if
applying is slower than decoding.  But if (d) is the problem, then the
only solution is probably to begin applying the transaction
speculatively before it's prepared/committed.  I think.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-02 Thread Robert Haas
On Tue, Dec 1, 2015 at 2:25 AM, Kyotaro HORIGUCHI
 wrote:
> Yeah, it is actually restricted in that length. But if we allow
> the buffer to store the whole qualified names, it will need 64 *
> 2 + 1 + 1 = 130 bytes, * 10 = 1300 bytes for each beentry... It might
> be acceptable to others, but I don't think that is preferable.

There's no such thing as a free lunch here, but we probably don't need
room for 10 strings.  If we allowed say 4 strings per beentry and
limited each one to, say, 140 characters for Twitter-compatibility,
that's 560 bytes per backend.  Throw in some int8 counters and you're
up to maybe 600 bytes per backend.  So that's ~60kB of memory for 100
backends.  That doesn't sound like a huge problem to me, but it
wouldn't be stupid to have a PGC_POSTMASTER GUC to turn this feature
on and off, for the benefit of people who may want to run this in
low-memory environments.
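
To make that arithmetic concrete, a hypothetical per-backend layout (field
names invented, not taken from the actual patch) would look something like:

/* Hypothetical sketch; sizes match the estimate above. */
#include "postgres.h"

#define N_PROGRESS_STRINGS		4
#define PROGRESS_STRING_LEN		140		/* excluding the NUL terminator */
#define N_PROGRESS_COUNTERS		4

typedef struct BackendProgress
{
	char		strings[N_PROGRESS_STRINGS][PROGRESS_STRING_LEN + 1];	/* 564 bytes */
	int64		counters[N_PROGRESS_COUNTERS];							/* 32 bytes */
} BackendProgress;	/* ~600 bytes per backend, ~60kB for 100 backends */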

> As a more drastic change in design, since these fields are
> written/read in sequential manner, providing one free buffer of
> the size of.. so.. about 128 bytes for each beentry and storing
> strings delimiting with '\0' and numbers in binary format, as an
> example, would do. Additional functions to write into/read from
> this buffer safely would be needed but this gives both the
> ability to store longer messages and relatively short total
> buffer size, and allows arbitrary number of parameters limited
> only by the length of the free buffer.
>
> What do you think about this?

I think it sounds like a mess with uncertain benefits.  Now instead of
having individual fields that maybe don't fit and have to be
truncated, you have to figure out what to leave out when the overall
message doesn't fit.  That's likely to lead to a lot of messy logic on
the server side, and even messier logic for any clients that read the
data and try to parse it programmatically.

> By the way, how about giving separate columns for relname and
> namespace? I think it is more usual way to designate a relation
> in this kind of view and it makes the snprintf to concatenate
> name and schema unnecessary(it's not significant, though). (The
> following example is after pg_stat_all_tables)

I could go either way on this.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Remaining 9.5 open items

2015-12-02 Thread Joshua D. Drake

On 12/02/2015 08:39 AM, Andres Freund wrote:

On 2015-12-02 08:25:13 -0800, Joshua D. Drake wrote:

A feature does not exist without documentation.


Uh, you do realize there's actually documentation about RLS? The issues
mentioned here are some small adjustments, not entirely new docs.


No I didn't. I apologize for the noise.

JD


--
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.




Re: [HACKERS] Errors in our encoding conversion tables

2015-12-02 Thread Robert Haas
On Fri, Nov 27, 2015 at 8:54 PM, Tatsuo Ishii  wrote:
> I explain why the manual editing is necessary.
>
> One of the most famous problems with Unicode is "wave dash"
> (U+301C). According to the Unicode consortium's Unicode/SJIS map, it
> corresponds to 0x8160 of Shift_JIS. Unfortunately this was a mistake
> in Unicode (the glyph of Shift_JIS and Unicode is slightly different -
> looks like to be rotated in 90 degrees of wave dash in vertical
> scripting. Probably they did not understand the Japanese vertical
> writing at that time). So later on the Unicode consortium decided to
> add another "wave dash" as U+FF5E which has a correct glyph of "wave
> dash". However since Unicode already decided that U+301C corresponds
> to 0x8160 of Shift_JIS, there's no Shift_JIS code corresponding to
> U+FF5E. Unlike Unicode's definition, Microsoft defines that 0x8160
> (wave dash) corresponds to U+FF5E. This is widely used in Japan. So I
> decided to hire this for "wave dash". i.e.
>
> 0x8160 -> U+FF5E (sjis_to_utf8.map)
>
> U+301C -> 0x8160 (utf_to_sjis.map)
> U+FF5E -> 0x8160 (utf_to_sjis.map)
>
> Another problem is vendor extension.
>
> There are several standards for SJIS and EUC_JP in Japan. There is a
> standard "Shift_JIS" defined by Japanese Government (probably the
> Unicode consortium's map can be based on this, but I need to
> verify). However several major vendors include IBM, NEC added their
> own additional characters to Shift_JIS and they are widely used in
> Japan. Unfortunately they are not compatible. So as a compromise I and
> other developers decided to "merge" NEC and IBM extension part and
> added to Shift_JIS. Same thing can be said to EUC_JP.
>
> In short, there are a number of reasons we cannot simply import the
> consortium's mapping regarding SJIS (and EUC_JP).

I haven't seen a response to this point, but it seems important.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] psql: add \pset true/false

2015-12-02 Thread Jim Nasby

On 11/15/15 7:37 PM, Peter Eisentraut wrote:

On 11/15/15 3:20 PM, Jim Nasby wrote:

As to the argument about displaying a check or an X, why should that
capability only exist for boolean types? For example, why not allow psql
to convert a numeric value into a bar of varying sizes? I've frequently
emulated that with something like SELECT repeat( '*', blah * 30 /
max_of_blah ). I'm sure there's other examples people could think of.


Well, why not?  The question there is only how many marginal features
you want to stuff into psql, not whether it's the right place to stuff them.


I was more thinking it would be nice to be able to temporarily 
over-ride/wrap what an output function is doing. AFAIK that would allow 
this to work everywhere (row(), copy, etc). I don't know of any remotely 
practical way to do that, though.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com




Re: [HACKERS] psql ignores failure to open -o target file

2015-12-02 Thread David G. Johnston
On Wed, Dec 2, 2015 at 11:04 AM, Robert Haas  wrote:

> On Wed, Dec 2, 2015 at 11:07 AM, Tom Lane  wrote:
> > I just noticed that parse_psql_options() ignores the result of
> setQFout(),
> > meaning that if the argument of a -o command line option is bogus, we'll
> > ignore the switch entirely after printing an error report.  For example
> >
> > $ psql -o /dev/foo -c 'select 1'
> > /dev/foo: Permission denied
> >  ?column?
> > ----------
> >         1
> > (1 row)
> >
> > $
> >
> > This seems surprising to me: any other program in the world would do
> > exit(1) after discovering that it couldn't write where it had been
> > told to.  Should we change this?
>
> I assume this is a rhetorical question.
>
>
How about this one: do we change this behavior in the back branches?

Given other instances of in-script errors not causing psql to exit (hence
ON_ERROR_STOP), this doesn't seem as surprising as it is being made out to
be.

David J.


Re: [HACKERS] psql ignores failure to open -o target file

2015-12-02 Thread Tom Lane
"David G. Johnston"  writes:
> On Wed, Dec 2, 2015 at 11:04 AM, Robert Haas  wrote:
>> On Wed, Dec 2, 2015 at 11:07 AM, Tom Lane  wrote:
>>> This seems surprising to me: any other program in the world would do
>>> exit(1) after discovering that it couldn't write where it had been
>>> told to.  Should we change this?

>> I assume this is a rhetorical question.

> How about this one: do we change this behavior in the back branches?

I don't think we should change this in stable branches.  I would vote
for fixing it in 9.5, though, mainly because the fix is going to interact
with the extended-mode-wrap fixes I'm also working on.

regards, tom lane




Re: [HACKERS] psql ignores failure to open -o target file

2015-12-02 Thread Tom Lane
I wrote:
> I just noticed that parse_psql_options() ignores the result of setQFout(),
> meaning that if the argument of a -o command line option is bogus, we'll
> ignore the switch entirely after printing an error report.

There's more silliness in the same area.  \o with an invalid target spec
is treated like \o with no argument, ie, revert the target to stdout:

regression=# \o somefile
regression=# select 1;  -- output goes to somefile, as expected
regression=# \o /dev/bogus
/dev/bogus: Permission denied
regression=# select 1;
 ?column? 
----------
        1
(1 row)

Surely that should have no effect beyond printing the error message.

Also, I notice that we try to set SIGPIPE handling to SIG_IGN whenever
the \o target is a pipe, which is good; but places that transiently
change SIGPIPE handling will unconditionally reset it back to SIG_DFL.
One such place is ClosePager().  That bug seems only latent because we
only invoke a pager when queryFout == stdout, so we would never get
there while the target is a pipe.  But another such place is do_copy(),
and we definitely can call that while \o is set to a pipe.
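
A sketch of the kind of change that would avoid the SIGPIPE problem (not
tested; the helper function is invented for illustration) is to save and
restore the previous disposition instead of assuming SIG_DFL:

#include "postgres_fe.h"
#include <signal.h>

static void
run_with_sigpipe_ignored(void (*callback) (void))
{
#ifndef WIN32
	pqsigfunc	oldhandler = pqsignal(SIGPIPE, SIG_IGN);
#endif

	callback();					/* work that may write to a pipe */

#ifndef WIN32
	pqsignal(SIGPIPE, oldhandler);	/* restore, rather than assume SIG_DFL */
#endif
}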

In short, a bit of refactoring is needed around setQFout...

regards, tom lane




Re: [HACKERS] psql ignores failure to open -o target file

2015-12-02 Thread Robert Haas
On Wed, Dec 2, 2015 at 1:24 PM, Tom Lane  wrote:
> "David G. Johnston"  writes:
>> On Wed, Dec 2, 2015 at 11:04 AM, Robert Haas  wrote:
>>> On Wed, Dec 2, 2015 at 11:07 AM, Tom Lane  wrote:
 This seems surprising to me: any other program in the world would do
 exit(1) after discovering that it couldn't write where it had been
 told to.  Should we change this?
>
>>> I assume this is a rhetorical question.
>
>> How about this one: do we change this behavior in the back branches?
>
> I don't think we should change this in stable branches.  I would vote
> for fixing it in 9.5, though, mainly because the fix is going to interact
> with the extended-mode-wrap fixes I'm also working on.

Oh.  I would assume this was an obvious back-patch back to the stone
age.  It seems like a flagrant bug.  Let's revive 7.4 just to fix this.
:-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Errors in our encoding conversion tables

2015-12-02 Thread Tom Lane
Robert Haas  writes:
> On Fri, Nov 27, 2015 at 8:54 PM, Tatsuo Ishii  wrote:
>> In short, there are a number of reasons we cannot simply import the
>> consortium's mapping regarding SJIS (and EUC_JP).

> I haven't seen a response to this point, but it seems important.

I'll defer to Tatsuo-san concerning whether the Far Eastern conversions
should act the way they do.  However, I still think the Cyrillic and
Latin-2 conversions are broken.  There is no reason to question the
Unicode consortium's mappings in those cases AFAIK, and even if somebody
wants to, our current tables fail to round-trip some characters, which
is surely wrong.  (See the "inconsistent reverse conversion" complaints
in the test output in <32464.1448742...@sss.pgh.pa.us>.)

Regardless of that, it's dismaying that we have files in our tree that
claim to produce our mapping tables from authoritative sources, when in
fact those tables were not produced in that way.  This is a documentation
failure even if you consider the actual conversion behavior valid.

regards, tom lane




Re: [HACKERS] Making the C collation less inclined to abort abbreviation

2015-12-02 Thread Robert Haas
On Sun, Nov 29, 2015 at 4:02 PM, Peter Geoghegan  wrote:
> The C collation is treated exactly the same as other collations when
> considering whether the generation of abbreviated keys for text should
> continue. This doesn't make much sense. With text, the big cost that
> we are concerned about going to waste should abbreviated keys not
> capture sufficient entropy is the cost of n strxfrm() calls. However,
> the C collation doesn't use strxfrm() -- it uses memcmp(), which is
> far cheaper.
>
> With other types, like numeric and now UUID, the cost of generating an
> abbreviated key is significantly lower than text when using collations
> other than the C collation. Their cost models reflect this, and abort
> abbreviation far less aggressively than text's, even though the
> trade-off is very similar when text uses the C collation.
>
> Attached patch fixes this inconsistency by making it significantly
> less likely that abbreviation will be aborted when the C collation is
> in use. The behavior with other collations is unchanged. This should
> be backpatched to 9.5 as a bugfix, IMV.

Could you provide a test case where this change is a winner for the C
locale but a loser for some other locale?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Using quicksort for every external sort run

2015-12-02 Thread Robert Haas
On Sat, Nov 28, 2015 at 7:05 PM, Peter Geoghegan  wrote:
> On Sat, Nov 28, 2015 at 2:04 PM, Jeff Janes  wrote:
>> For me very large sorts (100,000,000 ints) with work_mem below 4MB do
>> better with unpatched than with your patch series, by about 5%.  Not a
>> big deal, but also if it is easy to keep the old behavior then I think
>> we should.  Yes, it is dumb to do large sorts with work_mem below 4MB,
>> but if you have canned apps which do a mixture of workloads it is not
>> so easy to micromanage their work_mem.  Especially as there are no
>> easy tools that let me as the DBA say "if you connect from this IP
>> address, you get this work_mem".
>
> I'm not very concerned about a regression that is only seen when
> work_mem is set below the (very conservative) postgresql.conf default
> value of 4MB when sorting 100 million integers.

Perhaps surprisingly, I tend to agree.  I'm cautious of regressions
here, but large sorts in queries are relatively uncommon.  You're
certainly not going to want to return a 100 million tuples to the
client.  If you're trying to do a merge join with 100 million tuples,
well, 100 million integers @ 32 bytes per tuple is 3.2GB, and that's
the size of a tuple with a 4 byte integer and at most 4 bytes of other
data being carried along with it.  So in practice you'd probably need
to have at least 5-10GB of data, which means you are trying to sort
data over a million times larger than the amount of memory you allowed
for the sort.   With or without that patch, you should really consider
raising work_mem.  And maybe create some indexes so that the planner
doesn't choose a merge join any more.  The aggregate case is perhaps
worth a little more thought: maybe you are sorting 100 million tuples
so that you can GroupAggregate them.  But, there again, the benefits
of raising work_mem are quite large with or without this patch.  Heck,
if you're lucky, a little more work_mem might switch you to a
HashAggregate.  I'm not sure it's worth complicating the code to cater
to those cases.

While large sorts are uncommon in queries, they are much more common
in index builds.  Therefore, I think we ought to be worrying more
about regressions at 64MB than at 4MB, because we ship with
maintenance_work_mem = 64MB and a lot of people probably don't change
it before trying to build an index.  If we make those index builds go
faster, users will be happy.  If we make them go slower, users will be
sad.  So I think it's worth asking the question "are there any CREATE
INDEX commands that someone might type on a system on which they've
done no other configuration that will be slower with this patch"?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-02 Thread Robert Haas
On Mon, Nov 30, 2015 at 9:10 PM, Vinayak  wrote:
> Thanks for the v7.
> Please check the comment below.
> -Table name in the vacuum progress
>
> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s.%s",
> schemaname,relname);

Uh, I hope that line doesn't appear in the patch.  We're scarcely
likely to commit anything that has such an obvious SQL-injection risk
built into it.

https://xkcd.com/327/

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] psql ignores failure to open -o target file

2015-12-02 Thread Robert Haas
On Wed, Dec 2, 2015 at 11:07 AM, Tom Lane  wrote:
> I just noticed that parse_psql_options() ignores the result of setQFout(),
> meaning that if the argument of a -o command line option is bogus, we'll
> ignore the switch entirely after printing an error report.  For example
>
> $ psql -o /dev/foo -c 'select 1'
> /dev/foo: Permission denied
>  ?column?
> ----------
>         1
> (1 row)
>
> $
>
> This seems surprising to me: any other program in the world would do
> exit(1) after discovering that it couldn't write where it had been
> told to.  Should we change this?

I assume this is a rhetorical question.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] broken tests

2015-12-02 Thread Alvaro Herrera
Pavel Stehule wrote:
> Hi
> 
> Today I have problem with regress tests on my laptop.

Maybe this is because of the libxml version?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-02 Thread Alvaro Herrera
Noah Misch wrote:
> On Tue, Dec 01, 2015 at 08:11:21PM -0300, Alvaro Herrera wrote:
> > Finally, I ran perltidy on all the files, which strangely changed stuff
> > that I didn't expect it to change.  I wonder if this is related to the
> > perltidy version.
> 
> The last pgindent run (commit 807b9e0) used perltidy v20090616, and perltidy
> behavior has changed slightly over time.  Install that version to do your own
> perltidy runs.

I tried that version, but it seems to emit the same.  How did you figure
that that was the version used, anyway?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Logical replication and multimaster

2015-12-02 Thread Konstantin Knizhnik

Thank you for reply.

On 12/02/2015 08:30 PM, Robert Haas wrote:


Logical decoding only begins decoding a transaction once the
transaction is complete.  So I would guess that the sequence of
operations here is something like this - correct me if I'm wrong:

1. Do the transaction.
2. PREPARE.
3. Replay the transaction.
4. PREPARE the replay.
5. COMMIT PREPARED on original machine.
6. COMMIT PREPARED on replica.


Logical decoding is started after execution of the XLogFlush method.
So actually the transaction is not yet completed at this moment:
- it is not marked as committed in clog
- it is marked as in-progress in procarray
- locks are not released

We are not using PostgreSQL two-phase commit here.
Instead, our DTM catches control in TransactionIdCommitTree and sends a request
to the arbiter, which in turn waits for the status of committing transactions
on the replicas.
The problem is that transactions are delivered to the replica through a single
channel: the logical replication slot.
And while such a transaction is waiting for acknowledgement from the arbiter,
it is blocking the replication channel, preventing other (parallel)
transactions from being replicated and applied.

I have implemented a pool of background workers. Maybe it will be useful not
only for me.
It consists of a single-producer/multiple-consumer queue implemented using a
buffer in shared memory, a spinlock and two semaphores.
The API is very simple:

typedef void(*BgwPoolExecutor)(int id, void* work, size_t size);
typedef BgwPool*(*BgwPoolConstructor)(void);

extern void BgwPoolStart(int nWorkers, BgwPoolConstructor constructor);
extern void BgwPoolInit(BgwPool* pool, BgwPoolExecutor executor, char const* dbname, size_t queueSize);
extern void BgwPoolExecute(BgwPool* pool, void* work, size_t size);

You just place some chunk of bytes (work, size) in this queue, and the first
available worker will dequeue and execute it.
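
For example, a receiver could use it roughly like this (a minimal sketch based
only on the declarations above; the executor body, database name, queue size
and worker count are invented):

/* Hypothetical usage of the pool API shown above. */
static BgwPool apply_pool;

/* Executed by a background worker for each dequeued chunk of work. */
static void
apply_executor(int id, void *work, size_t size)
{
	/* decode and apply the replicated transaction contained in 'work' */
}

static BgwPool *
apply_pool_constructor(void)
{
	BgwPoolInit(&apply_pool, apply_executor, "mydb", 4 * 1024 * 1024);
	return &apply_pool;
}

/* At receiver startup: */
static void
receiver_startup(void)
{
	BgwPoolStart(16, apply_pool_constructor);	/* 16 background workers */
}

/* For each transaction received from the replication slot: */
static void
transaction_received(void *txn_data, size_t txn_size)
{
	/* enqueue it; the first free worker will dequeue and apply it */
	BgwPoolExecute(&apply_pool, txn_data, txn_size);
}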

Using this pool and a larger number of accounts (reducing the probability of
conflict), I get better results.
So now the receiver of logical replication is not executing transactions
directly; instead, the receiver places them in the queue and they are executed
concurrently by the pool of background workers.

On a cluster with three nodes the results of our debit-credit benchmark are
the following:

                                      TPS
Multimaster (ACID transactions)     12500
Multimaster (async replication)     34800
Standalone PostgreSQL               44000



We tested two modes: one where the client randomly distributes queries between
cluster nodes, and one where the client works with only one master node and
the others are just used as replicas. Performance is slightly better in the
second case, but the difference is not very large (about 11000 TPS in the
first case).


The number of workers in the pool has a significant impact on performance:
with 8 workers we get about 7800 TPS and with 16 workers - 12500.
Performance also greatly depends on the number of accounts (and so the
probability of lock conflicts). In case of 100 accounts the speed is less than
1000 TPS.



Step 3 introduces latency proportional to the amount of work the
transaction did, which could be a lot.   If you were doing synchronous
physical replication, the replay of the COMMIT record would only need
to wait for the replay of the commit record itself.  But with
synchronous logical replication, you've got to wait for the replay of
the entire transaction.  That's a major bummer, especially if replay
is single-threaded and there are a large number of backends generating
transactions.  Of course, the 2PC dance itself can also add latency -
that's most likely to be the issue if the transactions are each very
short.

What I'd suggest is trying to measure where the latency is coming
from.  You should be able to measure how much time each transaction
spends (a) executing, (b) preparing itself, (c) waiting for the replay
thread to begin replaying it, (d) waiting for the replay thread to
finish replaying it, and (e) committing.  Separating (c) and (d) might
be a little bit tricky, but I bet it's worth putting some effort in,
because the answer is probably important to understanding what sort of
change will help here.  If (c) is the problem, you might be able to
get around it by having multiple processes, though that only helps if
applying is slower than decoding.  But if (d) is the problem, then the
only solution is probably to begin applying the transaction
speculatively before it's prepared/committed.  I think.





Re: [HACKERS] Another little thing about psql wrapped expanded output

2015-12-02 Thread Josh Berkus
On 12/01/2015 11:20 AM, Tom Lane wrote:
> Notice that the dashed lines go all the way to the right margin of my
> 80-column terminal window, even though the data requires no more than
> 22 columns.  While this doesn't look so awful as-is, when I'm working
> in a very wide window it starts to look a little silly.
> 
> The behavior I'd have expected is that if the data is narrower than
> the window, the lines only go to the right margin of the data.  This
> is a trivial change to the logic in print_aligned_vertical, but before
> I go make it, does anyone want to argue that the current behavior is
> preferable to that?

If you're fixing the dashed-line code, is there a way to say that we
never have more than a reasonable number of dashes (ideally, the width
of the terminal) no matter how wide the data is?  Having 4000 dashes
because of large text on one row is kinda painful, and not at all useful.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] Using quicksort for every external sort run

2015-12-02 Thread Peter Geoghegan
On Wed, Dec 2, 2015 at 10:03 AM, Robert Haas  wrote:
>> I'm not very concerned about a regression that is only seen when
>> work_mem is set below the (very conservative) postgresql.conf default
>> value of 4MB when sorting 100 million integers.
>
> Perhaps surprisingly, I tend to agree.  I'm cautious of regressions
> here, but large sorts in queries are relatively uncommon.  You're
> certainly not going to want to return a 100 million tuples to the
> client.

Right. The fact that it was only a 5% regression is also a big part of
what made me unconcerned. I am glad that we've characterized the
regression that I assumed was there, though -- I certainly knew that
Knuth and so on were not wrong to emphasize increasing run size in the
1970s. Volume 3 of The Art of Computer Programming literally has a
pull-out chart showing the timing of external sorts. This includes the
time it takes for a human operator to switch magnetic tapes, and
rewind those tapes. The underlying technology has changed rather a lot
since, of course.

> While large sorts are uncommon in queries, they are much more common
> in index builds.  Therefore, I think we ought to be worrying more
> about regressions at 64MB than at 4MB, because we ship with
> maintenance_work_mem = 64MB and a lot of people probably don't change
> it before trying to build an index.  If we make those index builds go
> faster, users will be happy.  If we make them go slower, users will be
> sad.  So I think it's worth asking the question "are there any CREATE
> INDEX commands that someone might type on a system on which they've
> done no other configuration that will be slower with this patch"?

I certainly agree that that's a good place to focus. I think that it's
far, far less likely that anything will be slowed down when you take
this as a cut-off point. I don't want to overemphasize it, but the
analysis of how many more passes are needed because of lack of a
replacement selection heap (the "quadratic growth" thing) gives me
confidence. A case with less than 4MB of work_mem is where we actually
saw *some* regression.

-- 
Peter Geoghegan




Re: [HACKERS] proposal: function parse_ident

2015-12-02 Thread Pavel Stehule
Hi

2015-11-17 1:49 GMT+01:00 Marko Tiikkaja :

> On 9/11/15 12:25 PM, Pavel Stehule wrote:
>
>> new update of parse_ident function patch
>>
>
> Nice!  I've certainly wanted something like this a number of times.
>
> Some comments about the v2 of the patch:
>
>- The patch doesn't apply anymore, so it should be rebased.
>

done


>- The docs don't even try and explain what the "strictmode" parameter
> does.


fixed

   - The comment before the parse_ident function is not up to date anymore,
> since "the rest" was removed from the interface.


fixed

   - I can't immediately grep for any uses of  do { .. } while (true) from
> our code base.  AFAICT the majority look like  for (;;);  I see no reason
> not to be consistent here.
>

fixed

   - What should happen if the input is a string like
> 'one.two.three.four.five.six'?  Do we want to accept input like that?
>

I don't see the reason why not. It is pretty simple to count the fields in
the result array and raise an error later. The application has better
information about the expected and valid numbers. But any opinion on this
question is valid. I don't have a strong position here.


>- I haven't reviewed the actual parsing code carefully, but didn't we
> already have a function which splits identifiers up?  I of course can't
> find one with my grepping right now, so I might be wrong.
>

There is: SplitIdentifierString or textToQualifiedNameList in varlena.c. My
first patch was based on these functions, but I cannot use them.

1. Major reason: the built-in parser is based on searching for the dot "."
and doesn't look for any disallowed identifier chars, so it is not possible
to implement non-strict mode - find the last char of the last identifier and
ignore the rest.
2. Minor reason: slightly more precise diagnostics - the built-in routines
return only true (valid) and false (invalid).

Regards

Pavel


>
>
> .m
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 60b9a09..7b65ef4
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
***
*** 1707,1712 
--- 1707,1729 

 
  
+  parse_ident
+ 
+ parse_ident(str text,
+[ strictmode boolean DEFAULT true ] )
+
+text[]
+Split qualified identifier to array parts.
+When second parameter is true, then no any chars after last identifier is allowed. When
+second parameter is false, then chars after last identifier are ignored.
+
+parse_ident('"SomeSchema".someTable')
+"SomeSchema,sometable"
+   
+ 
+   
+
+ 
   pg_client_encoding
  
  pg_client_encoding()
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
new file mode 100644
index ccc030f..9a7e89d
*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
*** RETURNS jsonb
*** 940,942 
--- 940,949 
  LANGUAGE INTERNAL
  STRICT IMMUTABLE
  AS 'jsonb_set';
+ 
+ CREATE OR REPLACE FUNCTION
+   parse_ident(str text, strictmode boolean DEFAULT true)
+ RETURNS text[]
+ LANGUAGE INTERNAL
+ STRICT IMMUTABLE
+ AS 'parse_ident';
diff --git a/src/backend/utils/adt/misc.c b/src/backend/utils/adt/misc.c
new file mode 100644
index 3ef6e43..2540dac
*** a/src/backend/utils/adt/misc.c
--- b/src/backend/utils/adt/misc.c
***
*** 21,32 
--- 21,35 
  #include 
  
  #include "access/sysattr.h"
+ #include "access/htup_details.h"
  #include "catalog/catalog.h"
+ #include "catalog/namespace.h"
  #include "catalog/pg_tablespace.h"
  #include "catalog/pg_type.h"
  #include "commands/dbcommands.h"
  #include "funcapi.h"
  #include "miscadmin.h"
+ #include "parser/scansup.h"
  #include "parser/keywords.h"
  #include "postmaster/syslogger.h"
  #include "rewrite/rewriteHandler.h"
***
*** 38,43 
--- 41,47 
  #include "utils/ruleutils.h"
  #include "tcop/tcopprot.h"
  #include "utils/acl.h"
+ #include "utils/array.h"
  #include "utils/builtins.h"
  #include "utils/timestamp.h"
  
*** pg_column_is_updatable(PG_FUNCTION_ARGS)
*** 598,600 
--- 602,755 
  
  	PG_RETURN_BOOL((events & REQ_EVENTS) == REQ_EVENTS);
  }
+ 
+ 
+ /*
+  * This simple parser utility are compatible with lexer implementation,
+  * used only in parse_ident function
+  */
+ static bool
+ is_ident_start(unsigned char c)
+ {
+ 	if (c == '_')
+ 		return true;
+ 	if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'))
+ 		return true;
+ 
+ 	if (c >= 0200 && c <= 0377)
+ 		return true;
+ 
+ 	return false;
+ }
+ 
+ static bool
+ is_ident_cont(unsigned char c)
+ {
+ 	if (c >= '0' && c <= '9')
+ 		return true;
+ 
+ 	return is_ident_start(c);
+ }
+ 
+ /*
+  * parse_ident - parse SQL composed identifier to separate identifiers.
+  * When strict mode is active (second parameter), then any chars after
+  * last identifiers are disallowed.
+  */
+ Datum
+ 

Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-02 Thread Alvaro Herrera
Vinayak wrote:

> In the vacuum progress, column table_name is showing first 30 characters of
> table name.
> postgres=# create table test_vacuum_progress_in_postgresql(c1 int,c2 text);
> postgres=# select * from pg_stat_vacuum_progress ;
> -[ RECORD 1 ]---+--
> pid | 12284
> table_name  | public.test_vacuum_progress_i

Actually, do we really need to have the table name as a string at all
here?  Why not just report the table OID?  Surely whoever wants to check
the progress can connect to the database in question to figure out the
table name.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] broken tests

2015-12-02 Thread Pavel Stehule
2015-12-02 20:08 GMT+01:00 Alvaro Herrera :

> Pavel Stehule wrote:
> > Hi
> >
> > Today I have problem with regress tests on my laptop.
>
> Maybe this is because of the libxml version?


100%, the same issue occurs with 9.4.5.

After downgrading to 2.9.2 (from 2.9.3) the issue went away.

So it looks like a Fedora fault.

Regards

Pavel



>
> --
> Álvaro Herrerahttp://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] snapshot too old, configured by time

2015-12-02 Thread Kevin Grittner
On Wed, Dec 2, 2015 at 12:39 AM, Michael Paquier
 wrote:
> On Mon, Nov 9, 2015 at 8:07 AM, Steve Singer  wrote:

>> In snapmgr.c
>>
>>
>> + * XXX: If we can trust a read of an int64 value to be atomic, we can skip 
>> the
>> + * spinlock here.
>> + */
>> +int64
>> +GetOldSnapshotThresholdTimestamp(void)
>>
>>
>> Was your intent with the XXX for it to be a TODO to only aquire the lock on
>> platforms without the atomic 64bit operations?

I'm not sure whether we can safely assume a read of an int64 to be
atomic on any platform; if we actually can on some platforms, and
we have a #define to tell us whether we are in such an environment,
we could condition the spinlock calls on that.  Are we there yet?
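
Just to illustrate what the XXX comment contemplates (the #define, structure
and field names below are hypothetical, purely for the sake of the sketch):

#include "postgres.h"
#include "storage/spin.h"

typedef struct OldSnapshotControlData
{
	slock_t		mutex;					/* protects threshold_timestamp */
	int64		threshold_timestamp;
} OldSnapshotControlData;

static volatile OldSnapshotControlData *oldSnapshotControl;

int64
GetOldSnapshotThresholdTimestamp(void)
{
	int64		ts;

#ifdef PG_HAVE_8BYTE_SINGLE_COPY_ATOMICITY	/* hypothetical symbol */
	ts = oldSnapshotControl->threshold_timestamp;
#else
	SpinLockAcquire(&oldSnapshotControl->mutex);
	ts = oldSnapshotControl->threshold_timestamp;
	SpinLockRelease(&oldSnapshotControl->mutex);
#endif

	return ts;
}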

>> On a more general note:
>>
>> I've tried various manual tests of this feature and it sometimes works as
>> expected and sometimes doesn't.
>> I'm getting the feeling that how I expect it to work isn't quite in sync
>> with how it does work.
>>
>> I'd expect the following to be sufficient to generate the test
>>
>> T1: Obtains a snapshot that can see some rows
>> T2: Waits 60 seconds and performs an update on those rows
>> T2: Performs a vacuum
>> T1: Waits 60 seconds, tries to select from the table.  The snapshot should
>> be too old
>>
>>
>> For example it seems that in test 002 the select_ok on conn1 following the
>> vacuum but right before the final sleep is critical to the snapshot too old
>> error showing up (ie if I remove that select_ok but leave in the sleep I
>> don't get the error)
>>
>> Is this intended and if so is there a better way we can explain how things
>> work?

At every phase I took a conservative approach toward deferring
pruning of tuples still visible to any snapshot -- often reducing
the overhead of tracking by letting things go to the next minute
boundary.  The idea is that an extra minute or two probably isn't
going to be a big deal in terms of bloat, so if we can save any
effort on the bookkeeping by letting things go a little longer, it
is a worthwhile trade-off.  That does make it hard to give a
precise statement of exactly when a transaction *will* be subject
to cancellation based on this feature, so I have emphasized the
minimum guaranteed time that a transaction will be *safe*.  In
reviewing what you describe, I think I still don't have it as
aggressive as I can (and probably should).  My biggest concern is
that a long-running transaction which gets a transaction ID
matching the xmin on a snapshot it will hold for a long time may
not be subject to cancellation.  That's probably not too hard to
fix, but the bigger problem is the testing.

People have said that issuing SQL commands directly from a TAP test
via DBD::Pg is not acceptable for a core feature, and (despite
assertions to the contrary) I see no way to test this feature with
existing testing mechanisms.  The bigger set of work here, if we
don't want this feature to go in without any testing scripts (which
is not acceptable IMO), is to enhance the isolation tester or
hybridize TAP testing with the isolation tester.

>> Also is 0 intended to be an acceptable value for old_snapshot_threshold and
>> if so what should we expect to see then?

The docs in the patch say this:

+
+ A value of -1 disables this feature, and is the default.
+ Useful values for production work probably range from a small number
+ of hours to a few days.  The setting will be coerced to a granularity
+ of minutes, and small numbers (such as 0 or
+ 1min) are only allowed because they may sometimes be
+ useful for testing.  While a setting as high as 60d is
+ allowed, please note that in many workloads extreme bloat or
+ transaction ID wraparound may occur in much shorter time frames.
+

Once we can agree on a testing methodology I expect that I will be
adding a number of tests based on a cluster started with
old_snapshot_threshold = 0, but as I said in my initial post of the
patch I was keeping the tests in the patch thin until it was
confirmed whether this testing methodology was acceptable.  Since
it isn't, that was just as well.  The time put into learning enough
about perl and TAP tests to create those tests already exceeds the
time to develop the actual patch, and it looks like even more will
be needed for a test methodology that doesn't require adding a
package or downloading a CPAN module.  C'est la vie.  I did expand
my perl and TAP knowledge considerably, for what that's worth.

> There has been a review but no replies for more than 1 month. Returned
> with feedback?

I do intend to post another version of the patch to tweak the
calculations again, after I can get a patch in to expand the
testing capabilities to allow an acceptable way to test the patch
-- so I put it into the next CF instead.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] broken tests

2015-12-02 Thread Pavel Stehule
Hi

Today I have problem with regress tests on my laptop.

I did fresh clone of git

Is it working elsewhere?

Regards

Pavel


regression.diffs
Description: Binary data


regression.out
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Postgres_FDW optimizations

2015-12-02 Thread cevian
Hi all,

I have a question about postgres_fdw optimizations/pushdown:

I have the following code running on 9.5beta2 (same format as
previous/related message for consistency)
CREATE EXTENSION postgres_fdw; 
CREATE SERVER loop foreign data wrapper postgres_fdw 
  OPTIONS (port '5432', dbname 'testdb'); 
CREATE USER MAPPING FOR PUBLIC SERVER loop; 

create table onemillion ( 
id serial primary key, 
inserted timestamp default clock_timestamp(), 
data text 
); 

insert into onemillion(data) select random() from
generate_series(1,1000000);

CREATE FOREIGN TABLE onemillion_pgfdw ( 
id int, 
inserted timestamp, 
data text 
) SERVER loop 
OPTIONS (table_name 'onemillion', 
 use_remote_estimate 'true'); 

explain verbose select * from onemillion_pgfdw order by id limit 1;
 QUERY PLAN

 Limit  (cost=43434.00..43434.00 rows=1 width=30)
   Output: id, inserted, data
   ->  Sort  (cost=43434.00..45934.00 rows=1000000 width=30)
 Output: id, inserted, data
 Sort Key: onemillion_pgfdw.id
 ->  Foreign Scan on public.onemillion_pgfdw  (cost=100.00..38434.00 rows=1000000 width=30)
   Output: id, inserted, data
   Remote SQL: SELECT id, inserted, data FROM public.onemillion

This is obviously highly inefficient. The sort and limit should be pushed
down to the foreign node, especially on such a simple query. I have 3
questions:

1) Is this the expected state of the fdw optimizations for now, or is it a
bug?
2) Is anybody working on this type of pushdown right now (I would be more
than willing to collaborate on a patch)?
3) Is it possible to fix this with views/rules/triggers/a different query? I
couldn't find a way. Relatedly, is there a way to explicitly specify the
remote query to run through the fdw?

Thanks,
Matvey Arye
Iobeam, Inc.




--
View this message in context: 
http://postgresql.nabble.com/Postgres-FDW-optimizations-tp5875911.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-12-02 Thread Merlin Moncure
On Tue, Dec 1, 2015 at 8:46 AM, YUriy Zhuravlev
 wrote:
> On Tuesday 01 December 2015 08:38:21 you wrote:
>> it (zero
>> based indexing support) doesn't meet the standard of necessity for
>> adding to the core API and as stated it's much to magical.
>
> We do not touch the arrays, we simply create a function to access them with a
> comfortable behavior. Creating separate array types in the form of an extension is
> very difficult IMHO.

Correct; what I'm saying is that we don't need core API support for
zero based array indexing.  A hypothetical extension could give 100%
function based support for that so that equivalents to all functions
are given: array_upper, array_lower, etc etc etc.  You are correct
that it could not implement alternative syntactical array features.
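
To make that concrete, here is a rough sketch of one such C-level function.
The name array_fetch0 and its SQL declaration (CREATE FUNCTION
array_fetch0(anyarray, int) RETURNS anyelement ... LANGUAGE C) are invented
for the example; this is not from any existing extension:

#include "postgres.h"
#include "fmgr.h"
#include "utils/array.h"
#include "utils/lsyscache.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(array_fetch0);

/*
 * Fetch the element at a zero-based position in a one-dimensional array,
 * ignoring the array's declared lower bound.
 */
Datum
array_fetch0(PG_FUNCTION_ARGS)
{
    ArrayType  *arr = PG_GETARG_ARRAYTYPE_P(0);
    int32       pos = PG_GETARG_INT32(1);
    int16       elmlen;
    bool        elmbyval;
    char        elmalign;
    int         subscript;
    bool        isnull;
    Datum       result;

    if (ARR_NDIM(arr) != 1)
        ereport(ERROR,
                (errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
                 errmsg("array_fetch0 supports only one-dimensional arrays")));

    get_typlenbyvalalign(ARR_ELEMTYPE(arr), &elmlen, &elmbyval, &elmalign);

    /* translate the zero-based position into the array's native subscript */
    subscript = ARR_LBOUND(arr)[0] + pos;

    result = array_get_element(PointerGetDatum(arr), 1, &subscript,
                               -1 /* varlena array */ ,
                               elmlen, elmbyval, elmalign, &isnull);
    if (isnull)
        PG_RETURN_NULL();
    PG_RETURN_DATUM(result);
}

An extension could layer the rest of the zero-based API (array_upper,
array_lower equivalents and so on) on top of the same approach, without any
core changes.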

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] proposal: add 'waiting for replication' to pg_stat_activity.state

2015-12-02 Thread Julian Schauder
Hello Hackers,


I recently analyzed an incident where a major lag in synchronous replication
blocked a number of synchronous backends. I found myself looking at backends
that, according to pg_stat_activity, were neither waiting nor idle, yet they
didn't finish their work.

As it turns out, the major waiting loop for syncrep updates the process title,
but is silent within postgres and pg_stat_activity. It seems misleading that
committed but waiting backends are 'active' although there is little being done
apart from waiting.

> # select pid, waiting, state, substr(query,1,6) from pg_stat_activity ;
>   pid  | waiting | state  | substr
> ---+-++
>  26294 | f   | active | END;
>  26318 | f   | active | create
>  26323 | f   | active | insert
>  26336 | f   | active | insert
(output of waiting statements [vanilla])

While 'active' is technically correct for a backend that is committed but waiting
for replication, in terms of 'not being available for new tasks', it also
implies that the backend is dealing with the issue at hand. The remote host,
however, is out of our cluster's control, hence all signs should be pointing to
the standby host.


I suggest adding a new state to pg_stat_activity.state for backends that are
waiting for their synchronous commit to be flushed on the remote host.
I chose 'waiting for synchronous replication' for now.

One should refrain from setting the waiting flag at this point, as there is no
waiting on internal processes. Instead the backend waits for factors beyond our
cluster's control to change.


> # select pid, waiting, state, substr(query,1,6) from pg_stat_activity ;
>  pid  | waiting |state| substr
> --+-+-+
>  3360 | f   | waiting for synchronous replication | END;
>  3465 | f   | waiting for synchronous replication | create
>  3477 | f   | waiting for synchronous replication | insert
>  3489 | f   | waiting for synchronous replication | insert
(output of waiting statements [patched])


patch attached


regards,

Julian Schauder
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e64b7ef..458ae0f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -642,6 +642,11 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   
11:34   0:00 postgres: ser
  
  
   
+   waiting for synchronous replication: The backend is 
waiting for its transaction to be flushed on a synchronous standby.
+  
+ 
+ 
+  
idle: The backend is waiting for a new client command.
   
  
diff --git a/src/backend/replication/syncrep.c 
b/src/backend/replication/syncrep.c
index 325239d..b6ee1c3 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -45,7 +45,7 @@
 #include "postgres.h"
 
 #include 
-
+#include 
 #include "access/xact.h"
 #include "miscadmin.h"
 #include "replication/syncrep.h"
@@ -151,6 +151,16 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
set_ps_display(new_status, false);
new_status[len] = '\0'; /* truncate off " waiting ..." */
}
+   /*
+* Alter state in pg_stat before entering the loop.
+* As with updating the ps display it is safe to assume that we'll wait
+* at least for a short time. Hence updating to a waiting state seems
+* appropriate even without exactly checking if waiting is required.
+* However, we avoid using the waiting-flag at this point as there is
+* no lock to wait for.
+*/
+
+   pgstat_report_activity(STATE_WAITINGFORREPLICATION,NULL);
 
/*
 * Wait for specified LSN to be confirmed.
diff --git a/src/backend/utils/adt/pgstatfuncs.c 
b/src/backend/utils/adt/pgstatfuncs.c
index f7c9bf6..84d67e0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -663,6 +663,9 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
case STATE_IDLEINTRANSACTION_ABORTED:
values[4] = CStringGetTextDatum("idle 
in transaction (aborted)");
break;
+   case STATE_WAITINGFORREPLICATION:
+   values[4] = 
CStringGetTextDatum("waiting for synchronous replication");
+   break;
case STATE_DISABLED:
values[4] = 
CStringGetTextDatum("disabled");
break;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9ecc163..ab1befc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -692,6 +692,7 @@ typedef enum BackendState
STATE_IDLEINTRANSACTION,
STATE_FASTPATH,
STATE_IDLEINTRANSACTION_ABORTED,
+  

Re: [HACKERS] More stable query plans via more predictable column statistics

2015-12-02 Thread Shulgin, Oleksandr
On Tue, Dec 1, 2015 at 7:00 PM, Tom Lane  wrote:

> "Shulgin, Oleksandr"  writes:
> > This post summarizes a few weeks of research of ANALYZE statistics
> > distribution on one of our bigger production databases with some
> real-world
> > data and proposes a patch to rectify some of the oddities observed.
>
> Please add this to the 2016-01 commitfest ...
>

Added: https://commitfest.postgresql.org/8/434/


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-02 Thread Etsuro Fujita

On 2015/12/02 1:54, Robert Haas wrote:

On Fri, Nov 27, 2015 at 1:25 AM, Kouhei Kaigai  wrote:

Sorry, I don't understand this.  In my understanding, fdw_recheck_quals
can be defined for a foreign join, regardless of the join type,



Yes, "can be defined", but will not be workable if either side of
joined tuple is NULL because of outer join. SQL functions returns
NULL prior to evaluation, then ExecQual() treats this result as FALSE.
However, a joined tuple that has NULL fields may be a valid tuple.

We don't need to care about unmatched tuple if INNER JOIN.



This is a really good point, and a very strong argument for the design
KaiGai has chosen here.


Maybe my explanation was not enough.  Sorry about that.  But I mean that 
we define fdw_recheck_quals for a foreign-join as quals that 1) were 
extracted by extract_actual_join_clauses as "otherclauses" 
(rinfo->is_pushed_down=true) and that 2) were pushed down to the remote 
server, not scan quals relevant to all the base tables invoved in the 
foreign-join.  So in this definition, I think fdw_recheck_quals for a 
foreign-join will be workable, regardless of the join type.


Best regards,
Etsuro Fujita




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-02 Thread Etsuro Fujita

On 2015/12/02 14:54, Kouhei Kaigai wrote:

On 2015/12/02 1:41, Robert Haas wrote:

On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
 wrote:

The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
FDW driver can set arbitrary but one path-node here.
After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.



I understand this, as I also did the same thing in my patches, but actually,
that seems a bit complicated to me.  Instead, could we keep the
fdw_outerpath in the fdw_private of a ForeignPath node when creating the
path node during GetForeignPaths, and then create an outerplan accordingly
from the fdw_outerpath stored into the fdw_private during GetForeignPlan, by
using create_plan_recurse there?  I think that that would make the core
involvment much simpler.



I can't see how it's going to get much simpler than this.  The core
core is well under a hundred lines, and it all looks pretty
straightforward to me.  All of our existing path and plan types keep
lists of paths and plans separate from other kinds of data, and I
don't think we're going to win any awards for deviating from that
principle here.



One thing I can think of is that we can keep both the structure of a
ForeignPath node and the API of create_foreignscan_path as-is.  The
latter is a good thing for FDW authors.  And IIUC the patch you posted
today, I think we could make create_foreignscan_plan a bit simpler too.
   Ie, in your patch, you modified that function as follows:

@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root,
ForeignPath *best_path,
 */
scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,

best_path,
-
tlist, scan_clauses);
+
tlist,
+
scan_clauses);
+   outerPlan(scan_plan) = fdw_outerplan;

I think that would be OK, but I think we would have to do a bit more
here about the fdw_outerplan's targetlist and qual; I think that the
targetlist needs to be changed to fdw_scan_tlist, as in the patch [1],



Hmm... you are right. The sub-plan shall generate a tuple according to
the fdw_scan_tlist, if valid. Do you think the surgical operation is best
to apply alternative target-list than build_path_tlist()?


Sorry, I'm not sure about that.  I thought of changing it to fdw_scan_tlist
just because that's simple.



and that it'd be better to change the qual to remote conditions, ie,
quals not in the scan_plan's scan.plan.qual, to avoid duplicate
evaluation of local conditions.  (In the patch [1], I didn't do anything
about the qual because the current postgres_fdw join pushdown patch
assumes that all the the scan_plan's scan.plan.qual are pushed down.)
Or, FDW authors might want to do something about fdw_recheck_quals for a
foreign-join while creating the fdw_outerplan.  So if we do that during
GetForeignPlan, I think we could make create_foreignscan_plan a bit
simpler, or provide flexibility to FDW authors.



So, you suggest it is better to pass fdw_outerplan on the GetForeignPlan
callback, to allow FDW to adjust target-list and quals of sub-plans.


I think that is one option for us.  Another option, which I proposed 
above, is 1) store an fdw_outerpath in the fdw_private when creating the 
ForeignPath node in GetForeignPaths, and then 2) create an fdw_outerplan 
from the fdw_outerpath stored into the fdw_private when creating the 
ForeignScan node in GetForeignPlan, by using create_plan_recurse in 
GetForeignPlan.  (To do so, I was thinking to make that function 
extern.)  One good point about that is that we can keep the API of 
create_foreignscan_path as-is, which I think would be a good thing for 
FDW authors that don't care about join pushdown.



I think that is a reasonable argument. Only the FDW knows which qualifiers are
executable on the remote side, so it is not easy to remove the qualifiers to be
executed on the host side only from the sub-plan tree.


Yeah, we could provide the flexibility to FDW authors.


@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot
*slot)

  ResetExprContext(econtext);

+   /*
+* FDW driver has to recheck visibility of EPQ tuple towards
+* the scan qualifiers once it gets pushed down.
+* In addition, if this node represents a join sub-tree, not
+* a scan, FDW driver is also responsible to reconstruct
+* a joined tuple according to the primitive EPQ tuples.
+*/
+   if (fdwroutine->RecheckForeignScan)
+   {
+   if (!fdwroutine->RecheckForeignScan(node, slot))
+   return false;
+   }

Maybe I'm missing something, but I think we should let FDW do the work if
scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
abort the transaction.)



That 

Re: [HACKERS] psql: add \pset true/false

2015-12-02 Thread Daniel Verite
Michael Paquier wrote:

> So, looking at this thread, here is the current status:
> - Tom Lane: -1
> - Michael Paquier: -1
> - Peter Geoghegan: +1?
> - Peter Eisentraut: +1
> - the author: surely +1.
> Any other opinions? Feel free to correct this list if needed, and then
> let's try to move on on a consensus if need be to decide the destiny
> of this patch.

In the positive case, the doc should warn somehow about the setting
not being effective when the boolean is inside:
- a field of ROW() type
- an array
- \copy or COPY output
- a field of composite type

(personally this list of exceptions makes me side with the "-1" team).


Other than that,  minor typos to fix (s/prined/printed):

+   fprintf(output, _("  true   set the string to be prined
in place of a TRUE value\n"));
+   fprintf(output, _("  false  set the string to be prined
in place of a FALSE value\n"));



Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-12-02 Thread Amit Kapila
On Wed, Dec 2, 2015 at 12:06 PM, Michael Paquier 
wrote:
>
> On Sun, Nov 22, 2015 at 3:25 PM, Amit Kapila 
wrote:
> > On Fri, Nov 20, 2015 at 11:34 PM, Robert Haas 
wrote:
> >>
> >> On Thu, Nov 19, 2015 at 11:59 PM, Amit Kapila 
> >> wrote:
> >> > Isn't it better to destroy the memory for readers array as that gets
> >> > allocated
> >> > even if there are no workers available for execution?
> >> >
> >> > Attached patch fixes the issue by just destroying readers array.
> >>
> >> Well, then you're making ExecGatherShutdownWorkers() not a no-op any
> >> more.  I'll go commit a combination of your two patches.
> >>
> >
> > Thanks!
>
> There is still an entry in the CF app for this thread as "Parallel Seq
> scan". The basic infrastructure has been committed, and I understand
> that this is a never-ending tasks and that there will be many
> optimizations. Still, are you guys fine to switch this entry as
> committed for now?
>

I am fine with it.  I think the further optimizations can be done
separately.



With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-12-02 Thread Kouhei Kaigai
> On Thu, Nov 26, 2015 at 5:27 AM, Kouhei Kaigai  wrote:
> > I'm now implementing. The above design perfectly works on ForeignScan.
> > On the other hands, I'd like to have deeper consideration for CustomScan.
> >
> > My recent patch adds LibraryName and SymbolName on CustomScanMethods
> > to lookup the method table even if library is not loaded yet.
> > However, this ExtensibleNodeMethods relies custom scan provider shall
> > be loaded, by parallel infrastructure, prior to the deserialization.
> > It means extension has a chance to register itself as well.
> >
> > My idea is, redefine CustomScanMethod as follows:
> >
> > typedef struct ExtensibleNodeMethods
> > {
> > const char *extnodename;
> > Sizenode_size;
> > Node *(*nodeCopy)(const Node *from);
> > bool  (*nodeEqual)(const Node *a, const Node *b);
> > void  (*nodeOut)(struct StringInfoData *str, const Node *node);
> > void  (*nodeRead)(Node *node);
> > } ExtensibleNodeMethods;
> >
> > typedef struct CustomScanMethods
> > {
> > union {
> > const char *CustomName;
> > ExtensibleNodeMethods  xnode;
> > };
> > /* Create execution state (CustomScanState) from a CustomScan plan node
> */
> > Node   *(*CreateCustomScanState) (struct CustomScan *cscan);
> > } CustomScanMethods;
> >
> > It allows to pull CustomScanMethods using GetExtensibleNodeMethods
> > by CustomName; that is alias of extnodename in ExtensibleNodeMethods.
> > Thus, we don't need to care about LibraryName and SymbolName.
> >
> > This kind of trick is not needed for ForeignScan because all the pointer
> > stuff are reproducible by catalog, however, method table of CustomScan
> > is not.
> >
> > How about your opinion?
> 
> Anonymous unions are not C89, so this is a no-go.
> 
> I think we should just drop CustomName and replace it with
> ExtensibleNodeMethods.  That will break things for third-party code
> that is using the existing CustomScan stuff, but there's a good chance
> that nobody other than you has written any such code.  And even if
> someone has, updating it for this change will not be very difficult.
>
Thanks, I have the same opinion.
At this moment, CustomName is not used much except for EXPLAIN
and debug output, so it is not hard to replace this field with
extnodename, even if a custom scan provider does not have its own structure
and thus defines no callbacks for node operations.

The attached patch adds 'extnodename' fields on ForeignPath and
ForeignScan, also embeds ExtensibleNodeMethods in CustomPathMethods,
CustomScanMethods and CustomExecMethods.

Its handlers in copyfuncs.c were enhanced to use the callback
to allocate a larger structure and copy the private fields.
The handlers in outfuncs.c and readfuncs.c have to be symmetric. The
core backend writes out 'extnodename' prior to any private fields,
so we can identify the callback that processes the rest of the tokens
for the private fields.

RegisterExtensibleNodeMethods() tracks a pair of 'extnodename'
and the ExtensibleNodeMethods itself. It saves the pointer to the
method table, but does not duplicate it, because _readCustomScan()
casts the method table pulled by 'extnodename'; the registered table
therefore has to be a static variable in the extension, which means
the extension must never update the ExtensibleNodeMethods once registered.

The other patch is my test using PG-Strom; however, I don't
have a test for FDW extensions yet.

Thanks,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 


pgstrom-custom-private.test.patch
Description: pgstrom-custom-private.test.patch


pgsql-v9.6-custom-private.v2.patch
Description: pgsql-v9.6-custom-private.v2.patch

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-02 Thread Dave Page
On Tue, Dec 1, 2015 at 9:55 PM, Bruce Momjian  wrote:
> On Tue, Dec  1, 2015 at 06:40:09PM -0300, Alvaro Herrera wrote:
>> Bruce Momjian wrote:
>>
>> > Do we still have licensing issues if we ship Postgres and OpenSSL
>> > together?
>>
>> See
>> https://www.postgresql.org/message-id/20150801151410.GA28344%40awork2.anarazel.de
>
> True, but the current license is unchanged and has the advertising
> clause, which I think we have to honor if we ship OpenSSL:
>
> https://www.openssl.org/source/license.html
>
> I assume Windows has to ship OpenSSL with the installer and has to abide
> by this, for example.  OSX might have to do the same.  It might be good
> to see what we do for Windows packages.

We already do it for all our installers - Windows, OSX and Linux. We
have to, otherwise we wouldn't be able to ensure the same binaries
would run on all the different supported versions.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] find_inheritance_children() and ALTER TABLE NO INHERIT

2015-12-02 Thread Amit Langote
Currently find_inheritance_children() is smart enough to skip a child
table that it finds has been dropped concurrently after it gets a lock on
the same. It does so by looking up the child relid in syscache. It seems
it should also check if the table is still in the list of children of the
parent. Doing so by scanning the pg_inherits(inhparent) index may likely
be inefficient. So, how about adding that syscache on
pg_inherits(inhrelid, inhparent) [1]?

I was prompted by a report sometime ago on -general [2] about the "could
not find inherited attribute..." error. Also, it was reported as a bug a few
years ago, where locking the parent exclusively in ALTER TABLE NO INHERIT as a
solution was dismissed for being prone to deadlock issues [3].

Would it be worth the trouble?

Thanks,
Amit

[1] http://www.postgresql.org/message-id/25515.1149652...@sss.pgh.pa.us
[2] http://www.postgresql.org/message-id/55ba1a06.1000...@gmail.com
[3] http://www.postgresql.org/message-id/19666.1213709...@sss.pgh.pa.us




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)

2015-12-02 Thread Andres Freund
On 2015-12-02 08:52:20 +, Kouhei Kaigai wrote:
> diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
> index 26264cb..c4bb76e 100644
> --- a/src/backend/nodes/copyfuncs.c
> +++ b/src/backend/nodes/copyfuncs.c
> @@ -635,8 +635,12 @@ _copyWorkTableScan(const WorkTableScan *from)
>  static ForeignScan *
>  _copyForeignScan(const ForeignScan *from)
>  {
> - ForeignScan *newnode = makeNode(ForeignScan);
> -
> + const ExtensibleNodeMethods *methods
> + = GetExtensibleNodeMethods(from->extnodename, true);
> + ForeignScan *newnode = (ForeignScan *) newNode(!methods
> + 
>? sizeof(ForeignScan)
> + 
>: methods->node_size,
> + 
>T_ForeignScan);

Changing the FDW API seems to put this squarely into 9.6
territory. Agreed?

>   /*
>* copy node superclass fields
>*/
> @@ -645,6 +649,7 @@ _copyForeignScan(const ForeignScan *from)
>   /*
>* copy remainder of node
>*/
> + COPY_STRING_FIELD(extnodename);
>   COPY_SCALAR_FIELD(fs_server);
>   COPY_NODE_FIELD(fdw_exprs);
>   COPY_NODE_FIELD(fdw_private);
> @@ -652,6 +657,8 @@ _copyForeignScan(const ForeignScan *from)
>   COPY_NODE_FIELD(fdw_recheck_quals);
>   COPY_BITMAPSET_FIELD(fs_relids);
>   COPY_SCALAR_FIELD(fsSystemCol);
> + if (methods && methods->nodeCopy)
> + methods->nodeCopy((Node *) newnode, (const Node *) from);

I wonder if we shouldn't instead "recurse" into a generic routine for
extensible nodes here.
> +void
> +RegisterExtensibleNodeMethods(const ExtensibleNodeMethods *methods)
> +{
> + uint32  hash;
> + int index;
> + ListCell   *lc;
> + MemoryContext   oldcxt;
> +
> + if (!xnodes_methods_slots)
> + xnodes_methods_slots = MemoryContextAllocZero(TopMemoryContext,
> + 
>   sizeof(List *) * XNODES_NUM_SLOTS);
> +
> + hash = hash_any(methods->extnodename, strlen(methods->extnodename));
> + index = hash % XNODES_NUM_SLOTS;
> +
> + /* duplication check */
> + foreach (lc, xnodes_methods_slots[index])
> + {
> + const ExtensibleNodeMethods *curr = lfirst(lc);
> +
> + if (strcmp(methods->extnodename, curr->extnodename) == 0)
> + ereport(ERROR,
> + (errcode(ERRCODE_DUPLICATE_OBJECT),
> +  errmsg("ExtensibleNodeMethods \"%s\" 
> already exists",
> + methods->extnodename)));
> + }
> + /* ok, register this entry */
> + oldcxt = MemoryContextSwitchTo(TopMemoryContext);
> + xnodes_methods_slots[index] = lappend(xnodes_methods_slots[index],
> + 
>   (void *)methods);
> + MemoryContextSwitchTo(oldcxt);
> +}

Why aren't you using dynahash here, instead of reimplementing a hashtable?
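
For comparison, a registry built on dynahash could be roughly this small.
The entry layout, table name and initial size are invented for illustration;
only RegisterExtensibleNodeMethods() and ExtensibleNodeMethods come from the
patch:

typedef struct ExtensibleNodeEntry
{
    char        extnodename[NAMEDATALEN];   /* hash key */
    const ExtensibleNodeMethods *methods;
} ExtensibleNodeEntry;

static HTAB *extensible_node_methods = NULL;

void
RegisterExtensibleNodeMethods(const ExtensibleNodeMethods *methods)
{
    ExtensibleNodeEntry *entry;
    bool        found;

    if (extensible_node_methods == NULL)
    {
        HASHCTL     ctl;

        MemSet(&ctl, 0, sizeof(ctl));
        ctl.keysize = NAMEDATALEN;
        ctl.entrysize = sizeof(ExtensibleNodeEntry);
        ctl.hcxt = TopMemoryContext;
        extensible_node_methods =
            hash_create("Extensible node methods", 64, &ctl,
                        HASH_ELEM | HASH_CONTEXT);
    }

    entry = (ExtensibleNodeEntry *)
        hash_search(extensible_node_methods, methods->extnodename,
                    HASH_ENTER, &found);
    if (found)
        ereport(ERROR,
                (errcode(ERRCODE_DUPLICATE_OBJECT),
                 errmsg("extensible node methods \"%s\" already exist",
                        methods->extnodename)));
    entry->methods = methods;
}

Lookup is then a single hash_search(..., HASH_FIND, NULL) call, and dynahash
takes care of resizing.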


>  static ForeignScan *
>  _readForeignScan(void)
>  {
> + const ExtensibleNodeMethods *methods;
>   READ_LOCALS(ForeignScan);
>  
>   ReadCommonScan(_node->scan);
>  
> + READ_STRING_FIELD(extnodename);
>   READ_OID_FIELD(fs_server);
>   READ_NODE_FIELD(fdw_exprs);
>   READ_NODE_FIELD(fdw_private);
> @@ -1804,6 +1806,19 @@ _readForeignScan(void)
>   READ_BITMAPSET_FIELD(fs_relids);
>   READ_BOOL_FIELD(fsSystemCol);
>  
> + methods = GetExtensibleNodeMethods(local_node->extnodename, true);
> + if (methods && methods->nodeRead)
> + {
> + ForeignScan*local_temp = palloc0(methods->node_size);
> +
> + Assert(methods->node_size >= sizeof(ForeignScan));
> +
> + memcpy(local_temp, local_node, sizeof(ForeignScan));
> + methods->nodeRead((Node *) local_temp);
> +
> + pfree(local_node);
> + local_node = local_temp;
> + }
>   READ_DONE();
>  }

This imo pretty clearly shows why the 'extensible node' handling should
be delegated to separate routines.


I wonder if we shouldn't, before committing this, at least write a PoC
implementation of actual extensible nodes. I.e. new node types that are
registered at runtime by extensions. ISTM those are relatively closely
linked, and the above doesn't yet support them nicely afaics.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel joins, and better parallel explain

2015-12-02 Thread Robert Haas
On Tue, Dec 1, 2015 at 7:21 AM, Amit Kapila  wrote:
> Above and changes in add_path() makes planner not to select parallel path
> for seq scan where earlier it was possible. I think you want to change the
> costing of parallel plans based on rows selected instead of total_cost,
> but there seems to be some problem in the logic (I think gather node is not
> taking into account the reduced cost).

Oops.  The new version I've attached should fix this.  The reason why
I needed to make a change there is because previously the number of
rows estimated for the Parallel Seq Scan was the total number of rows,
not the number of rows per worker.  That doesn't really matter when
we're only doing Parallel Seq Scan, but if you push a join below the
Gather, then the cost of the join won't be computed correctly unless
the row count is the number of rows per worker.
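
To make the arithmetic concrete, a simplified sketch follows; the divisor and
variable names are illustrative only, and the patch's actual formula may
weight the leader's contribution differently:

/* Illustration only: divide the relation's row estimate among processes. */
double      total_rows = rel->rows;         /* say, 1000000 */
int         nprocesses = nworkers + 1;      /* workers plus the leader */
double      rows_per_worker = clamp_row_est(total_rows / nprocesses);

/*
 * A join pushed below the Gather has to be costed with rows_per_worker;
 * using total_rows would overstate the work each participant does and
 * make the cost of the join, and of the Gather above it, come out wrong.
 */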

> - There seems to be some inconsistency in Explain's output when
> multiple workers are used.

What is going on here is a bit confusing, but in fact I believe it to
be more correct than what we get with unpatched master.  The problem
has to do with the way that the instrumentation counts loops, and the
way that EXPLAIN displays that information.  In unpatched master,
InstrEndLoop() is not called before the worker instrumentation data is
aggregated to the leader.  Therefore, each node under the Gather ends
up with a loop count of 1.  Unless, of course, it was executed
multiple times in one of the workers, for example because it was on
the inside of a nested loop.  In that case, it ends up with a loop
count equal to the number of times it was executed *minus the number
of workers*.  Therefore, if there are 4 workers and a leader, and
between those 5 processes they  executed the inner side of a nested
loop 1000 times, the final loop count is 996.

With the patch, the loop count is equal to the number of times that
the nodes were actually executed.  Typically, this ends up being equal
to one more than the number of workers, because the leader executes it
and so do all the workers, but it can end up being less if not all
workers execute a particular node.  Of course, it can also be more.
If the node is executed repeatedly, the final loop count is equal to
the total number of times that the node was executed across the leader
and all workers.  So, in the above example, the inner side of a nested
loop would be 1000, not 996, which has the noteworthy advantage of
being correct.

What makes the output a tad confusing is that some but not all fields
in EXPLAIN output are shown as per loop values.  The startup cost,
total cost, and row counts are divided by the number of iterations.  I
have always thought this was a terrible idea: when EXPLAIN tells me
about a nested loop with an inner index scan, I want to know the TOTAL
time spent on that index scan and the TOTAL number of rows returned,
but what I get is the result of dividing those values by the number of
loops and rounded off to a number of decimal places that almost
entirely eliminate the possibility of extracting useful information
from the results.  However, I expect to be told that other people
(especially Tom Lane) don't want to change this, and in any case if we
were going to change it I think that would properly be a separate
patch.

So the net result of this is that the times and row counts are
*averages* across all of the loop iterations.  In the case of the
inner side of a nested loop, this means you can read the data just as
you would in a non-parallel plan.  For nodes executed exactly once per
worker plus once in the master, the value displayed ends up being a
per-process average of the amount of time spent, and a per-process
average of the number of rows.  On the other hand, values for buffers
are NOT divided by the loop count, so those values are absolute
totals.  Once you understand this, I think the data is pretty easy to
read.

>->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual
> time=33.983..3
> 3592.030 rows= loops=1)
>  Output: c1, c2
>  Number of Workers: 4
>  Buffers: shared hit=548 read=142506
>  ->  Parallel Seq Scan on public.tbl_parallel_test
> (cost=0.00..44245.93
>  rows=2129 width=0) (actual time=13.447..33354.099 rows=2000 loops=5)
>Output: c1, c2
>Filter: (tbl_parallel_test.c1 < 1)
>Rows Removed by Filter: 198000
>Buffers: shared hit=352 read=142506
>Worker 0: actual time=18.422..33322.132 rows=2170 loops=1
>  Buffers: shared hit=4 read=30765
>Worker 1: actual time=0.803..33283.979 rows=1890 loops=1
>  Buffers: shared hit=1 read=26679
>Worker 2: actual time=0.711..33360.007 rows=1946 loops=1
>  Buffers: shared hit=197 read=30899
>Worker 3: actual time=15.057..33252.605 rows=2145 loops=1
>  Buffers: shared hit=145 

Re: [HACKERS] proposal: function parse_ident

2015-12-02 Thread Jim Nasby

On 9/15/15 11:49 PM, Pavel Stehule wrote:

1. processing user input with little bit more comfort - the user doesn't
need to separate schema and table


This is especially useful if you're doing anything that needs to 
dynamically work with different objects. I'd say about 80% of the time 
I'm doing this ::regclass is good enough, but obviously the object has 
to exist then, and ::regclass doesn't separate schema from name.


There's a number of other handy convenience functions/views you can 
create to interface with the catalog, none of which are rocket science. 
But you're pretty screwed if what you want isn't in the catalog yet. (On 
a side note, something on my TODO is to restart pg_newsysviews as an 
extension, and then add a bunch of convenience functions on top of 
that... things like relation_info(regclass) RETURNS (everything in 
pg_class, plus other useful bits like nspname), and 
relation_schema(regclass) RETURNS regnamespace).


FWIW, the other function I've wanted in the past that's difficult to 
implement externally is parsing the arguments of a function definition. 
::regprocedure kinda works for this, but it blows up on certain things 
(defaults being one, iirc). I've specifically wanted that capability for 
a function I wrote that made it easy to specify *everything* about a 
function in a single call, including it's permissions and a comment on 
the function. That may sound trivial, but it's a PITA to cut and paste 
the whole argument list into multiple REVOKE/GRANT/COMMENT on 
statements. Even worse, not all the options of CREATE FUNCTION are 
supported in those other commands, so often you can't even just cut and 
paste.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Fwd: [HACKERS] Another little thing about psql wrapped expanded output

2015-12-02 Thread Jeff Janes
Sorry, I initially responded only to Josh.  Forwarding to list:



On Wed, Dec 2, 2015 at 12:37 PM, Josh Berkus  wrote:
> On 12/01/2015 11:20 AM, Tom Lane wrote:
>> Notice that the dashed lines go all the way to the right margin of my
>> 80-column terminal window, even though the data requires no more than
>> 22 columns.  While this doesn't look so awful as-is, when I'm working
>> in a very wide window it starts to look a little silly.
>>
>> The behavior I'd have expected is that if the data is narrower than
>> the window, the lines only go to the right margin of the data.  This
>> is a trivial change to the logic in print_aligned_vertical, but before
>> I go make it, does anyone want to argue that the current behavior is
>> preferable to that?
>
> If you're fixing the dashed-line code, is there a way to say that we
> never have more than a reasonable number of dashes (ideally, the width
> of the terminal) no matter how wide the data is?  Having 4000 dashes
> because of large text on one row is kinda painful, and not at all useful.

If you use \pset format wrapped, then this automatically happens as part of
the data wrapping.

If you use the default format (\pset format aligned) in expanded mode, then
I agree with you we shouldn't print a half screen full of dashes to
separate every tuple.   However, a very simple patch to do exactly what you
want is what was originally submitted, and was rejected in favor of instead
implementing this new feature of \pset format wrapped for expanded mode.
Since it was rejected I was reluctant to bring it up again, but like you, I
do still think it is a good idea.

You could argue that it is the pager's job to deal with this.  If the pager
chooses to wrap the parts that don't fit on the screen, it should truncate
the separators after one line's width rather than wrapping them.  But if it
instead shows you a sliding slice of an infinitely-wide screen, then it
should keep the separators.  But since the pager has no way of knowing that
the dashes are separators and not actual data, the pager can't reasonably
do a good job of that.

For those who can't follow my hand-waving, try this:

psql -c 'select * from pg_stats' -x

Cheers,

Jeff


Re: [HACKERS] broken tests

2015-12-02 Thread Tom Lane
Pavel Stehule  writes:
> Today I have problem with regress tests on my laptop.

Looks like

https://bugzilla.redhat.com/show_bug.cgi?id=1286692

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: Fwd: [HACKERS] Another little thing about psql wrapped expanded output

2015-12-02 Thread Tom Lane
Jeff Janes  writes:
> On Wed, Dec 2, 2015 at 12:37 PM, Josh Berkus  wrote:
>> If you're fixing the dashed-line code, is there a way to say that we
>> never have more than a reasonable number of dashes (ideally, the width
>> of the terminal) no matter how wide the data is?  Having 4000 dashes
>> because of large text on one row is kinda painful, and not at all useful.

> If you use the default format (\pset format aligned) in expanded mode, then
> I agree with you we shouldn't print a half screen full of dashes to
> separate every tuple.

Don't think I agree.  Suppose that you have a wider-than-screen table
and you use a pager to scroll left and right in that.  If we shorten the
dashed lines, then once you scroll to the right of wherever they stop,
you lose that visual cue separating the rows.  This matters a lot if
only a few of the column values are very wide: everywhere else, there's
gonna be lots of whitespace.

In the situation you are describing, you've already pretty much lost
user-friendliness of the display, and the only way to get it back is
to use a suitable pager (or make the window bigger, but that only goes
so far).  So I don't think we should optimize the non-pager case at
the expense of the pager case.

It's possible that it'd be worth the trouble to give psql two operating
modes, one for pagers with left-right scroll ability and one for those
without.  But that would be a good deal more work than what I propose
to do at the moment.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: add 'waiting for replication' to pg_stat_activity.state

2015-12-02 Thread Craig Ringer
On 3 December 2015 at 04:22, Julian Schauder 
wrote:


> I suggest adding a new state to pg_stat_activity.state for backends that
> are
>
waiting for their synchronous commit to be flushed on the remote host.
>
>
Excellent idea. Anything that improves management and visibility into what
the system is doing like this is really valuable.

I notice that you don't set the 'waiting' flag.  'waiting' is presently
documented as:

   True if this backend is currently waiting on a lock

... but I'm inclined to just widen its definition and set it here, since we
most certainly are waiting, and the column isn't named 'waiting_on_a_lock'.
It shouldn't upset various canned lock monitoring queries people have since
they generally do an inner join on pg_locks anyway.

There are no test changes in this patch, but that's reasonable because we
don't currently have a way to automate tests of sync rep.

I've attached a patch with minor wording/formatting changes, but I think
I'd like to change 'waiting' to true as well. Opinions?

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From e0bcab8ec199e92aaf51ac732275f1a2a5f1eb9f Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Thu, 3 Dec 2015 07:57:59 +0800
Subject: [PATCH] Add 'waiting for replication' state to pg_stat_activity

---
 doc/src/sgml/monitoring.sgml|  5 +
 src/backend/replication/syncrep.c   | 14 +-
 src/backend/utils/adt/pgstatfuncs.c |  3 +++
 src/include/pgstat.h|  1 +
 4 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index e64b7ef..458ae0f 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -642,6 +642,11 @@ postgres   27093  0.0  0.0  30096  2752 ?Ss   11:34   0:00 postgres: ser
  
  
   
+   waiting for synchronous replication: The backend is waiting for its transaction to be flushed on a synchronous standby.
+  
+ 
+ 
+  
idle: The backend is waiting for a new client command.
   
  
diff --git a/src/backend/replication/syncrep.c b/src/backend/replication/syncrep.c
index 325239d..f2bf5e1 100644
--- a/src/backend/replication/syncrep.c
+++ b/src/backend/replication/syncrep.c
@@ -45,7 +45,7 @@
 #include "postgres.h"
 
 #include 
-
+#include 
 #include "access/xact.h"
 #include "miscadmin.h"
 #include "replication/syncrep.h"
@@ -153,6 +153,18 @@ SyncRepWaitForLSN(XLogRecPtr XactCommitLSN)
 	}
 
 	/*
+	 * Alter 'state' in pg_stat before entering the loop.
+	 *
+	 * As with updating the ps display it is safe to assume that we'll wait
+	 * at least for a short time so we just set the state without bothering
+	 * to check if we're really going to have to wait for the standby.
+	 *
+	 * We don't set the 'waiting' flag because that's documented as waiting
+	 * on a lock.
+	 */
+	pgstat_report_activity(STATE_WAITINGFORREPLICATION,NULL);
+
+	/*
 	 * Wait for specified LSN to be confirmed.
 	 *
 	 * Each proc has its own wait latch, so we perform a normal latch
diff --git a/src/backend/utils/adt/pgstatfuncs.c b/src/backend/utils/adt/pgstatfuncs.c
index f7c9bf6..84d67e0 100644
--- a/src/backend/utils/adt/pgstatfuncs.c
+++ b/src/backend/utils/adt/pgstatfuncs.c
@@ -663,6 +663,9 @@ pg_stat_get_activity(PG_FUNCTION_ARGS)
 case STATE_IDLEINTRANSACTION_ABORTED:
 	values[4] = CStringGetTextDatum("idle in transaction (aborted)");
 	break;
+case STATE_WAITINGFORREPLICATION:
+	values[4] = CStringGetTextDatum("waiting for synchronous replication");
+	break;
 case STATE_DISABLED:
 	values[4] = CStringGetTextDatum("disabled");
 	break;
diff --git a/src/include/pgstat.h b/src/include/pgstat.h
index 9ecc163..ab1befc 100644
--- a/src/include/pgstat.h
+++ b/src/include/pgstat.h
@@ -692,6 +692,7 @@ typedef enum BackendState
 	STATE_IDLEINTRANSACTION,
 	STATE_FASTPATH,
 	STATE_IDLEINTRANSACTION_ABORTED,
+	STATE_WAITINGFORREPLICATION,
 	STATE_DISABLED
 } BackendState;
 
-- 
2.1.0


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2015-12-02 Thread Michael Paquier
On Thu, Dec 3, 2015 at 5:48 AM, Kevin Grittner wrote:
>> There has been a review but no replies for more than 1 month. Returned
>> with feedback?
>
> I do intend to post another version of the patch to tweak the
> calculations again, after I can get a patch in to expand the
> testing capabilities to allow an acceptable way to test the patch
> -- so I put it into the next CF instead.

OK, thanks.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: add \pset true/false

2015-12-02 Thread Michael Paquier
On Thu, Dec 3, 2015 at 3:10 AM, Jim Nasby  wrote:
> On 11/15/15 7:37 PM, Peter Eisentraut wrote:
>>
>> On 11/15/15 3:20 PM, Jim Nasby wrote:
>>>
>>> As to the argument about displaying a check or an X, why should that
>>> capability only exist for boolean types? For example, why not allow psql
>>> to convert a numeric value into a bar of varying sizes? I've frequently
>>> emulated that with something like SELECT repeat( '*', blah * 30 /
>>> max_of_blah ). I'm sure there's other examples people could think of.
>>
>>
>> Well, why not?  The question there is only how many marginal features
>> you want to stuff into psql, not whether it's the right place to stuff
>> them.
>
>
> I was more thinking it would be nice to be able to temporarily
> over-ride/wrap what an output function is doing. AFAIK that would allow this
> to work everywhere (row(), copy, etc). I don't know of any remotely
> practical way to do that, though.

You can basically do that with a custom data type and at worst a
custom GUC, no? It does not seem worth bothering the backend with an
extra layer to manage the output of a data type.
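
As a sketch of that idea (everything here is hypothetical: the type is assumed
to share bool's internal representation, and only the output function and the
GUCs are shown; the input function and CREATE TYPE boilerplate are omitted):

#include "postgres.h"
#include "fmgr.h"
#include "utils/guc.h"

PG_MODULE_MAGIC;

static char *mybool_true_string = "true";
static char *mybool_false_string = "false";

void
_PG_init(void)
{
    /* per-session strings controlling how the custom type is printed */
    DefineCustomStringVariable("mybool.true_string",
                               "String printed for true values.", NULL,
                               &mybool_true_string, "true",
                               PGC_USERSET, 0, NULL, NULL, NULL);
    DefineCustomStringVariable("mybool.false_string",
                               "String printed for false values.", NULL,
                               &mybool_false_string, "false",
                               PGC_USERSET, 0, NULL, NULL, NULL);
}

PG_FUNCTION_INFO_V1(mybool_out);

Datum
mybool_out(PG_FUNCTION_ARGS)
{
    bool        value = PG_GETARG_BOOL(0);

    PG_RETURN_CSTRING(pstrdup(value ? mybool_true_string
                              : mybool_false_string));
}

A SET mybool.true_string = 'X' then changes how the value prints wherever the
type's output function runs (row(), COPY, and so on), which is the attraction
over teaching psql about it.
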
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS]WIP: Covering + unique indexes.

2015-12-02 Thread Robert Haas
On Tue, Dec 1, 2015 at 7:53 AM, Anastasia Lubennikova
 wrote:
> If we don't need c4 as an index scankey, we don't need any btree opclass on
> it.
> But we still want to have it in covering index for queries like
>
> SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan
> SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan
>
> The patch "optional_opclass" completely ignores opclasses of included
> attributes.

OK, I don't get it.  Why have an opclass here at all, even optionally?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication and multimaster

2015-12-02 Thread Craig Ringer
On 1 December 2015 at 00:20, Konstantin Knizhnik 
wrote:

> We have implemented ACID multimaster based on logical replication and our
DTM (distributed transaction manager) plugin.

What are you using for an output plugin and for replay?

I'd really like to collaborate using pglogical_output if at all possible.
Petr's working really hard to get the pglogical downstream out too, with me
helping where I can.

I'd hate to be wasting time and effort working in parallel on overlapping
functionality. I did a LOT of work to make pglogical_output extensible and
reusable for different needs, with hooks used heavily instead of making
things specific to the pglogical downstream. A protocol documented in
detail. A json output mode as an option. Parameters for clients to
negotiate options. etc.

Would a different name for the upstream output plugin help?


> And according to 2ndquadrant results, BDR performance is very close to hot
> standby.
>

Yes... but it's asynchronous multi-master. Very different to what you're
doing.


> I wonder if it is a principal limitation of the logical replication approach,
> which is efficient only for asynchronous replication, or whether it can be somehow
> tuned/extended to efficiently support synchronous replication?
>

I'm certain there are improvements to be made for synchronous replication.

We have also considered alternative approaches:
> 1. Statement based replication.
>

Just don't go there. Really.


> It seems to be better to have one connection between nodes, but provide
> parallel execution of received transactions at destination side.


I agree. This is something I'd like to be able to do through logical
decoding. As far as I can tell there's no fundamental barrier to doing so,
though there are a few limitations when streaming logical xacts:

- We can't avoid sending transactions that get rolled back

- We can't send the commit timestamp, commit LSN, etc at BEGIN time, so
last-update-wins
  conflict resolution can't be done based on commit timestamp

- When streaming, the xid must be in each message, not just in begin/commit.

- The apply process can't use the SPI to apply changes directly since we
can't multiplex transactions. It'll need to use
  shmem to communicate with a pool of workers, dispatching messages to
workers as they arrive. Or it can multiplex
  a set of libpq connections in async mode, which I suspect may prove to be
better.

I've made provision for streaming support in the pglogical_output
extension. It'll need core changes to allow logical decoding to stream
changes though.

Separately, I'd also like to look at decoding and sending sequence
advances, which are something that happens outside transaction boundaries.



> We now have in PostgreSQL some infrastructure for background workers, but
> there is still no abstraction of a worker pool and job queue which can
> provide a simple way to organize parallel execution of some jobs. I wonder if
> somebody is working on it now or whether we should try to propose our solution?
>

I think a worker pool would be quite useful to have.

For BDR and for pglogical we had to build an infrastructure on top of
static and dynamic bgworkers. A static worker launches a dynamic bgworker
for each database. The dynamic bgworker for the database looks at
extension-provided user catalogs to determine whether it should launch more
dynamic bgworkers for each connection to a peer node.

Because the bgworker argument is a single by-value Datum the argument
passed is an index into a static shmem array of structs. The struct is
populated with the target database oid (or name, for 9.4, due to bgworker
API limitations) and other info needed to start the worker.

Because registered static and dynamic bgworkers get restarted by the
postmaster after a crash/restart cycle, and the restarted static worker
will register new dynamic workers after restart, we have to jump through
some annoying hoops to avoid duplicate bgworkers. A generation counter is
stored in postmaster memory and incremented on crash recovery then copied
to shmem. The high bits of the Datum argument to the workers embed the
generation counter. They compare their argument's counter to the one in
shmem and exit if the counter differs, so the relaunched old generation of
workers exits after a crash/restart cycle. See the thread on
BGW_NO_RESTART_ON_CRASH for details.
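
A stripped-down sketch of that trick, with invented names, an invented
MyShmemState structure, and a 16/16 bit split chosen purely for illustration:

/* high 16 bits: crash/restart generation; low 16 bits: shmem slot index */
#define WORKER_ARG_GENERATION(arg)  (DatumGetUInt32(arg) >> 16)
#define WORKER_ARG_SLOT(arg)        (DatumGetUInt32(arg) & 0xFFFF)

static void
launch_per_db_worker(uint16 generation, uint16 slot)
{
    BackgroundWorker worker;

    memset(&worker, 0, sizeof(worker));
    snprintf(worker.bgw_name, BGW_MAXLEN, "my per-db worker %u", slot);
    worker.bgw_flags = BGWORKER_SHMEM_ACCESS |
        BGWORKER_BACKEND_DATABASE_CONNECTION;
    worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
    worker.bgw_restart_time = 5;    /* postmaster restarts us after a crash,
                                     * hence the generation check below */
    snprintf(worker.bgw_library_name, BGW_MAXLEN, "my_extension");
    snprintf(worker.bgw_function_name, BGW_MAXLEN, "my_worker_main");
    worker.bgw_main_arg = UInt32GetDatum(((uint32) generation << 16) | slot);

    RegisterDynamicBackgroundWorker(&worker, NULL);
}

void
my_worker_main(Datum main_arg)
{
    /* bail out quietly if we belong to a pre-crash generation */
    if (WORKER_ARG_GENERATION(main_arg) != MyShmemState->current_generation)
        proc_exit(0);

    /* ... attach to MyShmemState->slots[WORKER_ARG_SLOT(main_arg)] ... */
}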

In pglogical we're instead using BGW_NEVER_RESTART workers and doing
restarts ourselves when needed, ignoring the postmaster's ability to
restart bgworkers when the worker crashes.

It's likely that most projects using bgworkers for this sort of thing will
need similar functionality, so generalizing it into a worker pool API makes
a lot of sense. In the process we could really use API to examine currently
registered and running bgworkers. Interested in collaborating on that?

Another thing I've wanted as part of this work is a way to get a one-time
authentication cookie from the server that can be passed as a 

Re: [HACKERS] proposal: add 'waiting for replication' to pg_stat_activity.state

2015-12-02 Thread Craig Ringer
On 3 December 2015 at 04:22, Julian Schauder 
wrote:

>
> I suggest adding a new state to pg_stat_activity.state for backends that
> are
> waiting for their synchronous commit to be flushed on the remote host.
> I chose 'waiting for synchronous replication' for now.
>
>
I've added this to the next CF:

https://commitfest.postgresql.org/8/436/


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] psql: add \pset true/false

2015-12-02 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 3 Dec 2015 09:24:35 +0900, Michael Paquier  
wrote in 
> On Thu, Dec 3, 2015 at 3:10 AM, Jim Nasby  wrote:
> > On 11/15/15 7:37 PM, Peter Eisentraut wrote:
> > I was more thinking it would be nice to be able to temporarily
> > over-ride/wrap what an output function is doing. AFAIK that would allow this
> > to work everywhere (row(), copy, etc). I don't know of any remotely
> > practical way to do that, though.
> 
> You can basically do that with a custom data type and at worse a
> custom GUC, no? It does not seem worth bothering the backend with an
> extra layer to manage the output of a data type.

How about plugins on the psql side? Calling a hooked function in
printQuery could do that in psql. The impact on psql itself is
minimized. (Of course the code for loading is omitted below,
but it would also be small...)

> --- a/src/bin/psql/print.c
> @@ -3210,6 +3210,10 @@ printQuery(const PGresult *result, const printQueryOpt 
> *opt, FILE *fout, FILE *f
>else
>{
>  cell = PQgetvalue(result, r, c);
> +if (outputplugin)
> +  char *cell = outputplugin(cell, PQftype(result, c),
> +);
>  if (cont.aligns[c] == 'r' && opt->topt.numericLocale)
>  {
>cell = format_numeric_locale(cell);



One problem of this is who wants to do this must write a
program. But someone might write general purpose plugin.


=$ loadlib 'outputhook.so';
=$ select 't'::bool;
 bool 
--
 X
(1 row)

=== outputhook.so
char *
outputhook(char *origcell, Oid type, bool *mustfree)
{
   char *retcell;

   switch (type)
   {
   case BOOLOID:
  retcell = (*origcell == 't' ? "TUEEE" : "FAAALSE");
 if (*mustfree) free(origcell);
 *mustfree = false;
 break;
   default:
 retcell = origcell;
 break;
   }
   return retcell;
}
=

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical replication and multimaster

2015-12-02 Thread Craig Ringer
On 3 December 2015 at 04:18, Konstantin Knizhnik 
wrote:


> The problem is that transactions are delivered to the replica through a single
> channel: the logical replication slot.
> And while such a transaction is waiting for acknowledgement from the arbiter, it is
> blocking the replication channel, preventing other (parallel) transactions from
> being replicated and applied.
>

Streaming interleaved xacts from the slot as discussed in the prior mail
would help there.

You'd continue to apply concurrent work from other xacts, and just handle
commit messages as they arrive, sending the confirmations back through the
DTM API.


> I have implemented a pool of background workers. Maybe it will be useful
> not only for me.
>

Excellent.

It should be possible to make that a separate extension. You can use C
functions from other extensions by exposing a single pg_proc function with
'internal' return type that populates a struct of function pointers for the
API. A single DirectFunctionCall lets you get the API struct. That's how
pglogical_output handles hooks. The main downside is that you can't do that
without a connection to a database with the extension installed so the
pg_proc entry is exposed.
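
A bare-bones illustration of the pattern (names invented; this variant is
written with an 'internal' argument that the provider fills in, rather than an
'internal' return value, but the idea is the same):

/* --- in the providing extension --- */
typedef struct WorkerPoolApi
{
    void        (*submit_job) (const void *data, Size len);
    int         (*active_workers) (void);
} WorkerPoolApi;

static void pool_submit_job(const void *data, Size len);
static int  pool_active_workers(void);

/* SQL: CREATE FUNCTION worker_pool_get_api(internal) RETURNS void ... */
PG_FUNCTION_INFO_V1(worker_pool_get_api);

Datum
worker_pool_get_api(PG_FUNCTION_ARGS)
{
    WorkerPoolApi *api = (WorkerPoolApi *) PG_GETARG_POINTER(0);

    api->submit_job = pool_submit_job;
    api->active_workers = pool_active_workers;

    PG_RETURN_VOID();
}

/* --- in a consuming extension --- */
static WorkerPoolApi pool_api;

static void
load_worker_pool_api(void)
{
    Oid         argtypes[1] = {INTERNALOID};
    Oid         fnoid;

    /* needs a connection to a database where the provider is installed */
    fnoid = LookupFuncName(list_make1(makeString(pstrdup("worker_pool_get_api"))),
                           1, argtypes, false);
    OidFunctionCall1(fnoid, PointerGetDatum(&pool_api));
}

The consumer keeps its own copy of the struct, so after that one lookup no
further catalog access is needed.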

So it could make more sense to just keep it as a separate .c / .h file
that's copied into trees that use it. Simpler and easier, but uglier.


> It consists of a single-producer, multiple-consumer queue implemented using
> a buffer in shared memory, a spinlock and two semaphores.
>
[snip]

> You just place some chunk of bytes (work, size) in this queue; it is placed
> in the queue and then the first available worker will dequeue and execute it.
>

Very nice.

To handle xact streaming I think you're likely to need a worker dispatch
key too, where the dispatch keys are "sticky" to a given worker. So you
assign xid 1231 to a worker at BEGIN. Send all work to the pool and
everything with xid 1231 goes to that worker. At commit you clear the
assignment of xid 1231.

Alternately a variant of the Execute method that lets you dispatch to a
specific worker would do the job.
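
A rough sketch of such sticky dispatch, purely for illustration (the slot
table, its size, and choose_idle_worker() are invented names; a real
implementation would more likely use a shared-memory hash table):

#include "postgres.h"
#include "access/transam.h"

#define MAX_STREAMED_XACTS  64

typedef struct XidWorkerSlot
{
    TransactionId xid;      /* InvalidTransactionId when the slot is free */
    int           worker;   /* pool worker bound to this xid */
} XidWorkerSlot;

static XidWorkerSlot xid_slots[MAX_STREAMED_XACTS];

extern int  choose_idle_worker(void);   /* invented helper */

/* Return the worker bound to this xid, binding one at BEGIN and
 * dropping the binding again at COMMIT. */
static int
worker_for_xid(TransactionId xid, bool at_begin, bool at_commit)
{
    int     free_slot = -1;
    int     i;

    for (i = 0; i < MAX_STREAMED_XACTS; i++)
    {
        if (TransactionIdEquals(xid_slots[i].xid, xid))
        {
            int     worker = xid_slots[i].worker;

            if (at_commit)
                xid_slots[i].xid = InvalidTransactionId;
            return worker;
        }
        if (free_slot < 0 && !TransactionIdIsValid(xid_slots[i].xid))
            free_slot = i;
    }

    /* first message for this xid: bind it to a worker */
    Assert(at_begin && free_slot >= 0);
    xid_slots[free_slot].xid = xid;
    xid_slots[free_slot].worker = choose_idle_worker();
    return xid_slots[free_slot].worker;
}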

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] find_inheritance_children() and ALTER TABLE NO INHERIT

2015-12-02 Thread Tom Lane
Amit Langote  writes:
> Currently find_inheritance_children() is smart enough to skip a child
> table that it finds has been dropped concurrently after it gets a lock on
> the same. It does so by looking up the child relid in syscache. It seems
> it should also check if the table is still in the list of children of the
> parent. Doing so by scanning the pg_inherits(inhparent) index may likely
> be inefficient. So, how about adding that syscache on
> pg_inherits(inhrelid, inhparent) [1]?

I doubt that a syscache would fix the performance issue there; it wouldn't
get referenced enough to be likely to have the desired tuple in cache.

I wonder whether we could improve matters by rechecking validity of the
pg_inherits tuple (which we saw already and could presumably retain the
TID of).  There is at least one place where we do something like that now,
IIRC.

regards, tom lane




Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)

2015-12-02 Thread Etsuro Fujita

Hi Ashutosh,

On 2015/12/02 20:45, Ashutosh Bapat wrote:

It's been a long time since the last patch on this thread was posted. I have
started to work on supporting join pushdown for postgres_fdw.


Thanks for the work!


Generating paths




A join between two foreign relations is considered safe to push down if:
1. The joining sides are pushable.
2. The type of join is OUTER or INNER (LEFT/RIGHT/FULL/INNER). SEMI and ANTI
   joins are not considered right now, because of difficulties in constructing
   the queries involving those. The join clauses of SEMI/ANTI joins are not in
   a form that can be readily converted to an IN/EXISTS/NOT EXISTS kind of
   expression. We might consider this as a future optimization.
3. The joining sides do not have clauses which cannot be pushed down to the
   foreign server. For an OUTER join this is important since those clauses
   need to be applied before performing the join, and thus the join cannot be
   pushed to the foreign server. An example is
       SELECT * FROM ft1 LEFT JOIN (SELECT * FROM ft2 WHERE local_cond) ft2
       ON (join clause)
   Here the local_cond on ft2 needs to be executed before performing the LEFT
   JOIN between ft1 and ft2. This condition can be relaxed for an INNER join
   by pulling the local clauses up the join tree. But this needs more
   investigation and is not considered in this version.
4. The join conditions (e.g. conditions in the ON clause) are all safe to push
   down. This is important for OUTER joins, as pushing down join clauses
   partially and applying the rest locally changes the result. There are ways
   [1] by which a partial OUTER join can be completed by applying unpushable
   clauses locally and then nullifying the nullable side and eliminating
   duplicate non-nullable side rows. But that's again out of scope of the
   first version of postgres_fdw join pushdown.


As for 4, as commented in the patch, we could relax the requirement that 
all the join conditions (given by JoinPathExtraData's restrictlist) need 
to be safe to push down to the remote server;

* In the case of an inner join, not all the conditions would need to be safe.
* In the case of an outer join, not all the "otherclauses" would need to be 
safe, while I think all the "joinclauses" need to be safe to get the 
right results (where "joinclauses" and "otherclauses" are defined by 
extract_actual_join_clauses).  And I think we should do this relaxation 
to some extent for 9.6, to allow more joins to be pushed down.  I don't 
know about [1].  Could I see more information about [1]?



Generating plan
===



The rest of this section describes the logic to construct the SQL for a join;
the logic is implemented as the function deparseSelectSqlForRel().

deparseSelectSqlForRel() builds the SQL for a given joinrel (and now for a
baserel as well) recursively. For joinrels:
1. It constructs the SQL representing either side of the join, by calling
   itself in recursive fashion.
2. These SQLs are converted into subqueries and become part of the FROM clause
   with the appropriate JOIN type and clauses. The left and right subqueries
   are given the aliases "l" and "r" respectively. The columns in each
   subquery are aliased as "a1", "a2", "a3" and so on. Thus the third column
   on the left side can be referenced as "l.a3" at any recursion level.
3. A targetlist is added representing the columns in the join result expected
   at that level.
4. The join clauses are added as part of the ON clause.
5. Any clauses that the planner has deemed fit to be evaluated at that level
   of the join are added as part of the WHERE clause.


Honestly, I'm not sure that that is a good idea.  One reason is that a query 
string constructed by this procedure is difficult to read, especially when 
the procedure is applied recursively.  So, I'm thinking of revising the 
procedure so as to construct a query string with a flattened FROM clause, 
as discussed in, e.g., [2].



TODOs
=
This patch is very much a WIP patch to showcase the approach and invite early
comments. I will continue to improve the patch, and some of the areas that
will be improved are
1. Costing of foreign join paths.
2. Various TODOs in the patch, making it more readable, finishing it, etc.
3. Tests
4. Any comments/suggestions on approach or the attached patch.


That would be great!


In another thread Robert, Fujita-san and Kaigai-san are discussing about
EvalPlanQual support for foreign joins. Corresponding changes to
postgres_fdw
will need to be added once those changes get committed.


Yeah, we would need those changes including helper functions to create a 
local join execution plan for that support.  I'd like to add those 
changes to your updated patch if it's okay.


Best regards,
Etsuro Fujita

[2] 
http://www.postgresql.org/message-id/ca+tgmozh9pb8bc+z3re7wo8cwuxaf7vp3066isg39qfr1jj...@mail.gmail.com







Re: [HACKERS] Parallel Aggregate

2015-12-02 Thread David Rowley
On 3 December 2015 at 19:24, Haribabu Kommi 
wrote:

> On Thu, Dec 3, 2015 at 4:18 PM, David Rowley
>  wrote:
> >
> > Hi,
> >
> > I just wanted to cross post here to mark that I've posted an updated
> patch
> > for combining aggregate states:
> >
> http://www.postgresql.org/message-id/CAKJS1f9wfPKSYt8CG=t271xbymzjrzwqbjeixiqrf-olh_u...@mail.gmail.com
> >
> > I also wanted to check if you've managed to make any progress on Parallel
> > Aggregation? I'm very interested in this myself and would like to
> progress
> > with it, if you're not already doing so.
>
> Yes, the basic parallel aggregate patch is almost ready.
> This patch is based on your earlier combine state patch.
> I will post it to the community within a week or so.
>

That's great news!

Also note that there's some bug fixes in the patch I just posted on the
other thread for combining aggregate states:

For example:  values[Anum_pg_aggregate_aggcombinefn - 1] =
ObjectIdGetDatum(combinefn);
was missing from AggregateCreate().

It might be worth diffing against the updated patch just to pull in anything
else that's changed.

 --
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: Fwd: [HACKERS] Another little thing about psql wrapped expanded output

2015-12-02 Thread David Fetter
On Wed, Dec 02, 2015 at 10:36:56PM -0500, Josh Berkus wrote:
> On 12/02/2015 05:24 PM, Tom Lane wrote:
> > Don't think I agree.  Suppose that you have a wider-than-screen table
> > and you use a pager to scroll left and right in that.  If we shorten the
> > dashed lines, then once you scroll to the right of wherever they stop,
> > you lose that visual cue separating the rows.  This matters a lot if
> > only a few of the column values are very wide: everywhere else, there's
> > gonna be lots of whitespace.
> 
> What pager lets me scroll right infinitely?  Because I wanna install that.

I don't know about infinitely, but at least with the -S (no wrap)
option, less lets you use left- and right-arrow, prefixed by
multipliers if you like, to scroll horizontally.

And now I have learned something new about a pager I've used every day
for decades.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-02 Thread Kyotaro HORIGUCHI
Hello,

At Thu, 3 Dec 2015 14:18:50 +0900, Amit Langote  
wrote in <565fd0ba.5020...@lab.ntt.co.jp>
> On 2015/12/03 13:47, Kyotaro HORIGUCHI wrote:
> > At Wed, 2 Dec 2015 15:48:20 -0300, Alvaro Herrera 
> >  wrote
> >>
> >> Actually, do we really need to have the table name as a string at all
> >> here?  Why not just report the table OID?  Surely whoever wants to check
> >> the progress can connect to the database in question to figure out the
> >> table name.
> > 
> > I thought the same thing but found that the same kind of view
> > (say, pg_stat_user_tables) has separate relname and schemaname in
> > string form (not a qualified name, though).
> > 
> > Apart from the representation of the relation, OID would be
> > better as a field in beentry.
> 
> I wonder if the field should be a standalone field or as yet another
> st_progress_* array?
> 
> IMHO, there are some values that a command would report that should not be
> mixed with pgstat_report_progress()'s interface. That is, things like
> command ID/name, command target (table name or OID) should not be mixed
> with actual progress parameters like num_pages, num_indexes (integers),
> processing "phase" (string) that are shared via st_progress_* fields. The
> first of them  already has its own reporting interface in proposed patch
> in the form of pgstat_report_activity_flag(). Although, we must be careful
> to choose these interfaces carefully.

Sorry I misunderstood the patch.

Agreed. The patch already separates integer values and texts.
And re-reviewing the patch, there are no fields that need to be
passed as strings.

total_heap_pages, scanned_heap_pages, total_index_pages,
scanned_index_pages, vacrelstats->num_index_scans are currently
in int32.

Phase can be in integer, and schema_name and relname can be
represented by one OID, uint32.

Finally, *NO* text field is needed, at least for this usage. So
progress_message is totally useless, regardless of other usages
unknown to us.

Am I missing something?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-02 Thread Michael Paquier
On Thu, Dec 3, 2015 at 6:59 AM, Alvaro Herrera wrote:
> I didn't push the change for config_default you requested a few
> messages upthread; it's not clear to me how setting it to undef affects
> the whole thing.  If setting it to undef makes the MSVC toolchain run
> the tap tests in the default config, then I can do it; let's be clear
> about what branch to backpatch this to.  Also the "1;" at the end of
> RewindTest.


Setting it to undef will prevent the tests from running, per vcregress.pl:
die "Tap tests not enabled in configuration"
  unless $config->{tap_tests};
Also, setting it to undef will match the existing behavior on
platforms where ./configure is used because the switch
--enable-tap-tests needs to be used there. And I would believe that in
most cases Windows environments are not going to have IPC::Run
deployed.

I have also rebased the recovery test suite as attached, using the
infrastructure that has been committed recently.
Regards,
--
Michael
From de2121eeb50c5ae49b29a2ac21a16579eae2de98 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 3 Dec 2015 13:48:48 +0900
Subject: [PATCH 1/2] Fix tap_test configuration in MSVC builds

---
 src/tools/msvc/config_default.pl | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/src/tools/msvc/config_default.pl b/src/tools/msvc/config_default.pl
index b9f2ff4..e50be7e 100644
--- a/src/tools/msvc/config_default.pl
+++ b/src/tools/msvc/config_default.pl
@@ -3,7 +3,7 @@ use strict;
 use warnings;
 
 our $config = {
-	asserts => 0,# --enable-cassert
+	asserts  => 0,# --enable-cassert
 	  # integer_datetimes=>1,   # --enable-integer-datetimes - on is now default
 	  # float4byval=>1, # --disable-float4-byval, on by default
 
@@ -13,18 +13,19 @@ our $config = {
 	# blocksize => 8, # --with-blocksize, 8kB by default
 	# wal_blocksize => 8, # --with-wal-blocksize, 8kB by default
 	# wal_segsize => 16,  # --with-wal-segsize, 16MB by default
-	ldap => 1,# --with-ldap
-	extraver => undef,# --with-extra-version=
-	nls  => undef,# --enable-nls=
-	tcl  => undef,# --with-tls=
-	perl => undef,# --with-perl
-	python   => undef,# --with-python=
-	openssl  => undef,# --with-openssl=
-	uuid => undef,# --with-ossp-uuid
-	xml  => undef,# --with-libxml=
-	xslt => undef,# --with-libxslt=
-	iconv=> undef,# (not in configure, path to iconv)
-	zlib => undef # --with-zlib=
+	ldap  => 1,# --with-ldap
+	extraver  => undef,# --with-extra-version=
+	nls   => undef,# --enable-nls=
+	tap_tests => undef,# --enable-tap-tests
+	tcl   => undef,# --with-tls=
+	perl  => undef,# --with-perl
+	python=> undef,# --with-python=
+	openssl   => undef,# --with-openssl=
+	uuid  => undef,# --with-ossp-uuid
+	xml   => undef,# --with-libxml=
+	xslt  => undef,# --with-libxslt=
+	iconv => undef,# (not in configure, path to iconv)
+	zlib  => undef # --with-zlib=
 };
 
 1;
-- 
2.6.3

From 4ee2a99db2a414f85e961110279f4b97309cd927 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 3 Dec 2015 15:41:12 +0900
Subject: [PATCH 2/2] Add recovery test suite

This includes basic tests manipulating standbys, be they archiving or
streaming nodes, and some basic sanity checks around them.
---
 src/test/Makefile   |   2 +-
 src/test/perl/RecoveryTest.pm   | 167 
 src/test/recovery/.gitignore|   3 +
 src/test/recovery/Makefile  |  17 +++
 src/test/recovery/README|  19 
 src/test/recovery/t/001_stream_rep.pl   |  58 ++
 src/test/recovery/t/002_archiving.pl|  43 +++
 src/test/recovery/t/003_recovery_targets.pl | 123 
 src/test/recovery/t/004_timeline_switch.pl  |  66 +++
 src/test/recovery/t/005_replay_delay.pl |  41 +++
 10 files changed, 538 insertions(+), 1 deletion(-)
 create mode 100644 src/test/perl/RecoveryTest.pm
 create mode 100644 src/test/recovery/.gitignore
 create mode 100644 src/test/recovery/Makefile
 create mode 100644 src/test/recovery/README
 create mode 100644 src/test/recovery/t/001_stream_rep.pl
 create mode 100644 src/test/recovery/t/002_archiving.pl
 create mode 100644 src/test/recovery/t/003_recovery_targets.pl
 create mode 100644 src/test/recovery/t/004_timeline_switch.pl
 create mode 100644 src/test/recovery/t/005_replay_delay.pl

diff --git a/src/test/Makefile b/src/test/Makefile
index b713c2c..7f7754f 100644
--- a/src/test/Makefile
+++ b/src/test/Makefile
@@ -12,7 +12,7 @@ subdir = src/test
 top_builddir = ../..
 include $(top_builddir)/src/Makefile.global
 
-SUBDIRS = regress isolation modules
+SUBDIRS = regress isolation modules recovery
 
 # We don't build or execute examples/, locale/, or thread/ by 

Re: [HACKERS] Logical replication and multimaster

2015-12-02 Thread konstantin knizhnik

On Dec 3, 2015, at 4:09 AM, Craig Ringer wrote:

> On 1 December 2015 at 00:20, Konstantin Knizhnik  
> wrote:
> 
> > We have implemented ACID multimaster based on logical replication and our 
> > DTM (distributed transaction manager) plugin.
> 
> What are you using for an output plugin and for replay?

I have implemented an output plugin for multimaster based on Michael's 
decoder_raw+receiver_raw.
Right now it decodes WAL into corresponding SQL insert/update statements.
Certainly this is a very inefficient way and in the future I will replace it with some 
binary protocol, as is used for example in BDR
(but the BDR plugin contains a lot of stuff related to detecting and handling 
conflicts which is not relevant for multimaster).
But right now the performance of multimaster is not limited by the logical replication 
protocol - if I remove the DTM and use asynchronous replication (a lightweight 
version of BDR:)
then I get 38k TPS instead of 12k.


> 
> I'd really like to collaborate using pglogical_output if at all possible. 
> Petr's working really hard to get the pglogical downstream out too, with me 
> helping where I can.
> 
> I'd hate to be wasting time and effort working in parallel on overlapping 
> functionality. I did a LOT of work to make pglogical_output extensible and 
> reusable for different needs, with hooks used heavily instead of making 
> things specific to the pglogical downstream. A protocol documented in detail. 
> A json output mode as an option. Parameters for clients to negotiate options. 
> etc.
> 
> Would a different name for the upstream output plugin help?


And where can I get the pglogical_output plugin? Sorry, but I can't quickly find 
a reference with Google...
Also I wonder if this plugin performs DDL replication (most likely not). But 
then a naive question - why was DDL excluded from the logical replication protocol?
Are there some fundamental problems with it? In BDR it was handled in an alternative 
way, using an executor callback. It would be much easier if DDL could be replicated 
in the same way as normal SQL statements.


>  
> And according to 2ndquadrant results, BDR performance is very close to hot 
> standby.
> 
> Yes... but it's asynchronous multi-master. Very different to what you're 
> doing.
>  
> I wonder if it is principle limitation of logical replication approach which 
> is efficient only for asynchronous replication or it can be somehow 
> tuned/extended to efficiently support synchronous replication?
> 
> I'm certain there are improvements to be made for synchronous replication.
> 
> We have also considered alternative approaches:
> 1. Statement based replication.
> 
> Just don't go there. Really.
>  
> It seems to be better to have one connection between nodes, but provide 
> parallel execution of received transactions at destination side.
> 
> I agree. This is something I'd like to be able to do through logical 
> decoding. As far as I can tell there's no fundamental barrier to doing so, 
> though there are a few limitations when streaming logical xacts:
> 
> - We can't avoid sending transactions that get rolled back
> 
> - We can't send the commit timestamp, commit LSN, etc at BEGIN time, so 
> last-update-wins
>   conflict resolution can't be done based on commit timestamp
> 
> - When streaming, the xid must be in each message, not just in begin/commit.
> 
> - The apply process can't use the SPI to apply changes directly since we 
> can't multiplex transactions. It'll need to use
>   shmem to communicate with a pool of workers, dispatching messages to 
> workers as they arrive. Or it can multiplex
>   a set of libpq connections in async mode, which I suspect may prove to be 
> better.
> 
> I've made provision for streaming support in the pglogical_output extension. 
> It'll need core changes to allow logical decoding to stream changes though.
> 
> Separately, I'd also like to look at decoding and sending sequence advances, 
> which are something that happens outside transaction boundaries.
> 
>  
> We have now in PostgreSQL some infrastructure for background workers, but there 
> is still no abstraction of a worker pool and job queue which can provide a 
> simple way to organize parallel execution of some jobs. I wonder if somebody 
> is working now on it or we should try to propose our solution?
> 
> I think a worker pool would be quite useful to have.
> 
> For BDR and for pglogical we had to build an infrastructure on top of static 
> and dynamic bgworkers. A static worker launches a dynamic bgworker for each 
> database. The dynamic bgworker for the database looks at extension-provided 
> user catalogs to determine whether it should launch more dynamic bgworkers 
> for each connection to a peer node.
> 
> Because the bgworker argument is a single by-value Datum the argument passed 
> is an index into a static shmem array of structs. The struct is populated 
> with the target database oid (or name, for 9.4, due to bgworker API 
> limitations) and other info needed to 

Re: [HACKERS] proposal: add 'waiting for replication' to pg_stat_activity.state

2015-12-02 Thread Craig Ringer
On 3 December 2015 at 09:32, Peter Eisentraut  wrote:

> On 12/2/15 7:00 PM, Craig Ringer wrote:
> > I notice that you don't set the 'waiting' flag.  'waiting' is presently
> > documented as:
> >
> >True if this backend is currently waiting on a lock
> >
> > ... but I'm inclined to just widen its definition and set it here, since
> > we most certainly are waiting, and the column isn't named
> > 'waiting_on_a_lock'. It shouldn't upset various canned lock monitoring
> > queries people have since they generally do an inner join on pg_locks
> > anyway.
>
> I'm not so sure about that assumption.
>

Even if it's an outer join, the worst that'll happen is that they'll get
entries with nulls in pg_locks. I don't think it's worth worrying about too
much.

We could always mitigate it by adding a pg_lock_status view to the system
catalogs with a decent canned query over pg_stat_activity and pg_locks, so
people can stop copying & pasting from the wiki or using buggy homebrew
queries ;)

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-02 Thread Kyotaro HORIGUCHI
Hello,

At Wed, 2 Dec 2015 15:48:20 -0300, Alvaro Herrera  
wrote in <20151202184820.GL2763@alvherre.pgsql>
> Vinayak wrote:
> 
> > In the vacuum progress, column table_name is showing first 30 characters of
> > table name.
> > postgres=# create table test_vacuum_progress_in_postgresql(c1 int,c2 text);
> > postgres=# select * from pg_stat_vacuum_progress ;
> > -[ RECORD 1 ]---+--
> > pid | 12284
> > table_name  | public.test_vacuum_progress_i
> 
> Actually, do we really need to have the table name as a string at all
> here?  Why not just report the table OID?  Surely whoever wants to check
> the progress can connect to the database in question to figure out the
> table name.

I thought the same thing but found that the same kind of view
(say, pg_stat_user_tables) has separate relname and schemaname in
string form (not a qualified name, though).

Apart from the representation of the relation, OID would be
better as a field in beentry.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Parallel Aggregate

2015-12-02 Thread David Rowley
On 20 October 2015 at 23:23, David Rowley 
wrote:

> On 13 October 2015 at 20:57, Haribabu Kommi 
> wrote:
>
>> On Tue, Oct 13, 2015 at 5:53 PM, David Rowley
>>  wrote:
>> > On 13 October 2015 at 17:09, Haribabu Kommi 
>> > wrote:
>> >>
>> >> On Tue, Oct 13, 2015 at 12:14 PM, Robert Haas 
>> >> wrote:
>> >> > Also, I think the path for parallel aggregation should probably be
>> >> > something like FinalizeAgg -> Gather -> PartialAgg -> some partial
>> >> > path here.  I'm not clear whether that is what you are thinking or
>> >> > not.
>> >>
>> >> No. I am thinking of the following way.
>> >> Gather->partialagg->some partial path
>> >>
>> >> I want the Gather node to merge the results coming from all workers,
>> >> otherwise
>> >> it may be difficult to merge at parent of gather node. Because in case
>> >> the partial
>> >> group aggregate is under the Gather node, if any of two workers are
>> >> returning
>> >> same group key data, we need to compare them and combine it to make it
>> a
>> >> single group. If we are at Gather node, it is possible that we can
>> >> wait till we get
>> >> slots from all workers. Once all workers returns the slots we can
>> compare
>> >> and merge the necessary slots and return the result. Am I missing
>> >> something?
>> >
>> >
>> > My assumption is the same as Robert's here.
>> > Unless I've misunderstood, it sounds like you're proposing to add logic
>> into
>> > the Gather node to handle final aggregation? That sounds like a
>> modularity
>> > violation of the whole node concept.
>> >
>> > The handling of the final aggregate stage is not all that different
>> from the
>> > initial aggregate stage. The primary difference is just that your
>> calling
>> > the combine function instead of the transition function, and the values
>>
>> Yes, you are correct, till now i am thinking of using transition types as
>> the
>> approach, because of that reason only I proposed it as Gather node to
>> handle
>> the finalize aggregation.
>>
>> > being aggregated are aggregates states rather than the type of the
>> values
>> > which were initially aggregated. The handling of GROUP BY is all the
>> same,
>> > yet you only apply the HAVING clause during final aggregation. This is
>> why I
>> > ended up implementing this in nodeAgg.c instead of inventing some new
>> node
>> > type that's mostly a copy and paste of nodeAgg.c [1]
>>
>> After going through your Partial Aggregation / GROUP BY before JOIN patch,
>> Following is my understanding of parallel aggregate.
>>
>> Finalize [hash] aggregate
>> -> Gather
>>   -> Partial [hash] aggregate
>>
>> The data that comes from the Gather node contains the group key and
>> grouping results.
>> Based on these we can generate another hash table in case of hash
>> aggregate at
>> finalize aggregate and return the final results. This approach works
>> for both plain and
>> hash aggregates.
>>
>> For group aggregate support of parallel aggregate, the plan should be
>> as follows.
>>
>> Finalize Group aggregate
>> ->sort
>> -> Gather
>>   -> Partial group aggregate
>>->sort
>>
>> The data that comes from Gather node needs to be sorted again based on
>> the grouping key,
>> merge the data and generates the final grouping result.
>>
>> With this approach, we no need to change anything in Gather node. Is
>> my understanding correct?
>>
>>
> Our understandings are aligned.
>
>
Hi,

I just wanted to cross post here to mark that I've posted an updated patch
for combining aggregate states:
http://www.postgresql.org/message-id/CAKJS1f9wfPKSYt8CG=t271xbymzjrzwqbjeixiqrf-olh_u...@mail.gmail.com

I also wanted to check if you've managed to make any progress on Parallel
Aggregation? I'm very interested in this myself and would like to progress
with it, if you're not already doing so.

My current thinking is that most of the remaining changes required for
parallel aggregation, after applying the combine aggregate state patch,
will be in the exact area that Tom will be making changes for the upper
planner path-ification work. I'm not all that certain if we should hold off
for that or not.

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] Postgres_FDW optimizations

2015-12-02 Thread Ashutosh Bapat
On Thu, Dec 3, 2015 at 12:55 AM, cevian  wrote:

> Hi all,
>
> I have a question about postgres_fdw optimizations/pushdown:
>
> I have the following code running on 9.5beta2 (same format as
> previous/related message for consistency)
> CREATE EXTENSION postgres_fdw;
> CREATE SERVER loop foreign data wrapper postgres_fdw
>   OPTIONS (port '5432', dbname 'testdb');
> CREATE USER MAPPING FOR PUBLIC SERVER loop;
>
> create table onemillion (
> id serial primary key,
> inserted timestamp default clock_timestamp(),
> data text
> );
>
> insert into onemillion(data) select random() from
> generate_series(1,1000000);
>
> CREATE FOREIGN TABLE onemillion_pgfdw (
> id int,
> inserted timestamp,
> data text
> ) SERVER loop
> OPTIONS (table_name 'onemillion',
>  use_remote_estimate 'true');
>
> explain verbose select * from onemillion_pgfdw order by id limit 1;
>  QUERY PLAN
>
> 
>  Limit  (cost=43434.00..43434.00 rows=1 width=30)
>Output: id, inserted, data
>->  Sort  (cost=43434.00..45934.00 rows=1000000 width=30)
>  Output: id, inserted, data
>  Sort Key: onemillion_pgfdw.id
>  ->  Foreign Scan on public.onemillion_pgfdw
> (cost=100.00..38434.00
> rows=1000000 width=30)
>Output: id, inserted, data
>Remote SQL: SELECT id, inserted, data FROM public.onemillion
>
> This is obviously highly inefficient. The sort and limit should be pushed
> down to the foreign node, especially on such a simple query. I have 3
> questions:
>
>
The patch for sort pushdown was committed a few weeks ago and will be
available in 9.6. Let me know what you see when you execute the query on
the current development branch. Limit support will be added eventually, but
the timeline is not clear yet.


> 1) Is this the expected state of the fdw optimizations for now, or is it a
> bug?
>

For 9.5 this is expected behaviour and not a bug.


> 2) Is anybody working on this type of pushdown right now (I would be more
> than willing to collaborate on a patch)
>

There are a few people working on this. You will see Hanada-san,
Horiguchi-san, Fujita-san, myself and Robert working on it mostly. But
there are other contributors too, so forgive me if I have missed anyone. You
are welcome to join in. Right now we are concentrating on join pushdown,
DML pushdown and asynchronous query execution.


> 3) Is this possible to fix with with views/rules/triggers/different query.
> I
> couldn't find a way. Relatedly, is there a way to explicitly specify an
> explicit remote query to run through the fdw?
>

dblink module can be of help here.


>
> Thanks,
> Matvey Arye
> Iobeam, Inc.
>
>
>
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company


Re: [HACKERS] Logical replication and multimaster

2015-12-02 Thread Craig Ringer
On 3 December 2015 at 14:54, konstantin knizhnik 
wrote:

>
> I'd really like to collaborate using pglogical_output if at all possible.
> Petr's working really hard to get the pglogical downstream out too, with me
> helping where I can.
>
> And where can I get the pglogical_output plugin? Sorry, but I can't quickly
> find a reference with Google...
>

It's been submitted to this CF.

https://commitfest.postgresql.org/7/418/

https://github.com/2ndQuadrant/postgres/tree/dev/pglogical-output

Any tests and comments would be greatly appreciated.

I have a version compatible with 9.4 and older in a separate tree I want to
make public. I'll get back to you on that later today. It's the same code
with a few more ifdefs and an uglier structure for the example hooks module
(because it can be a separate contrib), so it's not that exciting.

You should be able to just "git remote add" that repo, "git fetch" and "git
merge dev/pglogical-output" into your working tree.


> Also I wonder if this plugin performs DDL replication (most likely not).
>

No, it doesn't. The way it's done in BDR is too intrusive and has to be
reworked before it can be made more generally re-usable.

How I envision DDL replication working for pglogical (or anything else) is
to take the DDL hooks added in 9.5 and use them with a separate DDL deparse
extension based on Álvaro's deparse work. If you want to replicate DDL you
make sure this extension is loaded then use it from your event triggers to
capture DDL in a useful form and write it to a queue table where your
downstream client can find it and consume it. That way the deparse code
doesn't have to be embedded in the Pg backend like it is in BDR, and
instead can be a reusable extension.

But then a naive question - why was DDL excluded from the logical replication
> protocol?
>

logical decoding can't handle DDL because all it sees is the effects of
that DDL in the xlog as a series of changes to catalog tables, relfilenode
changes, etc. It can't turn that back into the original DDL in any kind of
reliable way. A downstream can't do very much with "rename relfilenode 1231
to 1241".

There are a few cases we might want to handle through decoding - in
particular I'd like to be able to decode changes to rows in shared catalogs
like pg_authid, since we can't handle that with DDL deparse. For things
like DROP TABLE, CREATE TABLE, etc we really need DDL hooks. At least as I
currently understand things.

So we try to capture DDL at a higher level. That's why event triggers were
added (http://www.postgresql.org/docs/current/static/event-triggers.html)
and why DDL deparse was implemented (
https://commitfest-old.postgresql.org/action/patch_view?id=1610).

You can't just capture the raw DDL statement since there are issues with
search_path normalization, etc. Similar problems to statement based
replication exist. Deparse is required to get the DDL after it's converted
to a utility statement so we can obtain it in an unambiguous form.

I'll add some explanation in pglogical_output's DESIGN.md for why DDL is
not currently handled.

TRUNCATE _is_ handled, by the way. In pglogical we use regular TRUNCATE
triggers (marked tgisinternal) for that. There are some significant
complexities around foreign keys, sequence reset, etc, which are not fully
handled yet.

Are there some fundamental problems with it? In BDR it was handled in
> an alternative way, using an executor callback. It would be much easier if
> DDL could be replicated in the same way as normal SQL statements.
>

It can't. I wish it could.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Fwd: [HACKERS] Another little thing about psql wrapped expanded output

2015-12-02 Thread Josh Berkus
On 12/02/2015 05:24 PM, Tom Lane wrote:
> Don't think I agree.  Suppose that you have a wider-than-screen table
> and you use a pager to scroll left and right in that.  If we shorten the
> dashed lines, then once you scroll to the right of wherever they stop,
> you lose that visual cue separating the rows.  This matters a lot if
> only a few of the column values are very wide: everywhere else, there's
> gonna be lots of whitespace.

What pager lets me scroll right infinitely?  Because I wanna install that.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] psql: add \pset true/false

2015-12-02 Thread Michael Paquier
On Thu, Dec 3, 2015 at 2:53 PM, Kyotaro HORIGUCHI
 wrote:
> The attached patch adds a function to load output filter DLL.
> The second file is an example filter module. The following
> commandline with the file can create a test filter module. I
> suppose preload feature only needs additional few lines.

 #include "print.h"
+#include "dlload.h"

I guess that your environment is on Windows... My guess is that you
would need a facility similar to what is in
src/backend/port/dynloader/ but for frontends, and this is not really
worth the move just for this particular type enforcement in psql.
-- 
Michael




Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers

2015-12-02 Thread Amit Kapila
On Wed, Dec 2, 2015 at 8:59 PM, Robert Haas  wrote:
>
>
> I think the approach this patch takes is pretty darned strange, and
> almost certainly not what we want.  What you're doing here is putting
> every outstanding CLOG-update request into a linked list, and then the
> leader goes and does all of those CLOG updates.  But there's no
> guarantee that the pages that need to be updated are even present in a
> CLOG buffer.  If it turns out that all of the batched CLOG updates are
> part of resident pages, then this is going to work great, just like
> the similar ProcArrayLock optimization.  But if the pages are not
> resident, then you will get WORSE concurrency and SLOWER performance
> than the status quo.  The leader will sit there and read every page
> that is needed, and to do that it will repeatedly release and
> reacquire CLogControlLock (inside SimpleLruReadPage).  If you didn't
> have a leader, the reads of all those pages could happen at the same
> time, but with this design, they get serialized.  That's not good.
>

I think the way to address this is to not add a backend to the group list if
it does not intend to update the same page as the group leader.  For
transactions to be on different pages, they have to be 32768 transaction IDs
apart, and I don't see much possibility of that happening for concurrent
transactions that are going to be grouped.
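
A rough sketch of that check, just to illustrate the idea (the macro below
mirrors clog.c's private TransactionIdToPage()/CLOG_XACTS_PER_PAGE; the helper
name is invented):

#include "postgres.h"

/* Mirrors the private macros in clog.c: with the default BLCKSZ there are
 * 32768 transaction status entries per CLOG page. */
#define XidToClogPage(xid)  ((xid) / (TransactionId) 32768)

/* A backend joins the group update only when its xid falls on the same
 * CLOG page as the leader's, so the leader never has to read extra pages
 * on behalf of group members. */
static bool
can_join_clog_group(TransactionId my_xid, TransactionId leader_xid)
{
    return XidToClogPage(my_xid) == XidToClogPage(leader_xid);
}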

> My idea for how this could possibly work is that you could have a list
> of waiting backends for each SLRU buffer page.
>

Won't this mean that we first need to ensure that the page exists in one of
the buffers, and once we have the page in an SLRU buffer, we can form the
list and ensure that the list is processed before eviction?
If my understanding is right, then for this to work we probably need to
acquire CLogControlLock in Shared mode in addition to acquiring it
in Exclusive mode for updating the status on the page and performing
pending updates for other backends.

>
> I agree with Simon that it's probably a good idea for this
> optimization to handle cases where a backend has a non-overflowed list
> of subtransactions.  That seems doable.
>

Agreed, and I have already handled it in the last version of the patch I
posted.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] psql: add \pset true/false

2015-12-02 Thread Kyotaro HORIGUCHI
Hello, it's disappointing.

At Thu, 3 Dec 2015 15:48:50 +0900, Michael Paquier  
wrote in 
> On Thu, Dec 3, 2015 at 2:53 PM, Kyotaro HORIGUCHI
>  wrote:
> > The attached patch adds a function to load output filter DLL.
> > The second file is an example filter module. The following
> > commandline with the file can create a test filter module. I
> > suppose preload feature only needs additional few lines.
> 
>  #include "print.h"
> +#include "dlload.h"
> 
> I guess that your environment is on Windows... My guess is that you
> would need a facility similar to what is in
> src/backend/port/dynloader/ but for frontends, and this is not really
> worth the move just for this particularly type enforcement in psql.

My environment is CentOS 7, but I completely forgot about the Windows
environment (and other platforms). I agree with you. It would indeed
be too much, considering all possible platforms.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] Parallel Aggregate

2015-12-02 Thread Haribabu Kommi
On Thu, Dec 3, 2015 at 4:18 PM, David Rowley
 wrote:
>
> Hi,
>
> I just wanted to cross post here to mark that I've posted an updated patch
> for combining aggregate states:
> http://www.postgresql.org/message-id/CAKJS1f9wfPKSYt8CG=t271xbymzjrzwqbjeixiqrf-olh_u...@mail.gmail.com
>
> I also wanted to check if you've managed to make any progress on Parallel
> Aggregation? I'm very interested in this myself and would like to progress
> with it, if you're not already doing so.

Yes, the basic parallel aggregate patch is almost ready.
This patch is based on your earlier combine state patch.
I will post it to the community within a week or so.

Regards,
Hari Babu
Fujitsu Australia




Re: [HACKERS] [COMMITTERS] pgsql: Refactor Perl test code

2015-12-02 Thread Michael Paquier
On Thu, Dec 3, 2015 at 12:50 PM, Tom Lane  wrote:
> Michael Paquier  writes:
>> On Thu, Dec 3, 2015 at 12:19 PM, Tom Lane  wrote:
>>> BTW, not the fault of this patch in particular, but this example points
>>> up the complaint I've had right along about how opaque TAP test failures
>>> are.  How did you dig down to see that error message?
>
>> Well, it showed up on my terminal...
>
> Not on mine, as per the extract I showed.  Probably a Perl version
> difference, but I don't think we can exactly write off RHEL6 as an
> uninteresting obsolete distribution.  (The Perl here is 5.10.1.)

Possible. On OSX 10.8:
$ perl --version

This is perl 5, version 16, subversion 2 (v5.16.2) built for
darwin-thread-multi-2level
(with 3 registered patches, see perl -V for more detail)
-- 
Michael




Re: [HACKERS] broken tests

2015-12-02 Thread Pavel Stehule
2015-12-02 23:00 GMT+01:00 Tom Lane :

> Pavel Stehule  writes:
> > Today I have problem with regress tests on my laptop.
>
> Looks like
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1286692


sure

Pavel

>
>
> regards, tom lane
>


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-02 Thread Kyotaro HORIGUCHI
Hello, sorry for the clobbered CC list.

# I restored it manually from upthread..

At Wed, 2 Dec 2015 13:42:01 -0500, Robert Haas  wrote in 

> On Tue, Dec 1, 2015 at 2:25 AM, Kyotaro HORIGUCHI
>  wrote:
> > Yeah, it is actually restricted in that length. But if we allow
> > the buffer to store the whole qualified names, it will need 64 *
> > 2 + 1 + 1 = 130 bytes * 10 = 1300 bytes for each beentry... It might
> > be acceptable to others, but I don't think that is preferable...
> 
> There's no such thing as a free lunch here, but we probably don't need
> room for 10 strings.  If we allowed say 4 strings per beentry and
> limited each one to, say, 140 characters for Twitter-compatibility,
> that's 560 bytes per backend.  Throw in some int8 counters and you're
> up to maybe 600 bytes per backend.  So that's ~60kB of memory for 100
> backends.  That doesn't sound like a huge problem to me, but it
> wouldn't be stupid to have a PGC_POSTMASTER GUC to turn this feature
> on and off, for the benefit of people who may want to run this in
> low-memory environments.

This is similar to Amit-L's proposal, and either sounds fair to me.

> > As a more dractic change in design, since these fields are
> > written/read in sequential manner, providing one free buffer of
> > the size of.. so.. about 128 bytes for each beentry and storing
> > strings delimiting with '\0' and numbers in binary format, as an
> > example, would do. Additional functions to write into/read from
> > this buffer safely would be needed but this gives both the
> > ability to store longer messages and relatively short total
> > buffer size, and allows arbitrary number of parameters limited
> > only by the length of the free buffer.
> >
> > What do you think about this?
> 
> I think it sounds like a mess with uncertain benefits.  Now instead of
> having individual fields that maybe don't fit and have to be
> truncated, you have to figure out what to leave out when the overall
> message doesn't fit.  That's likely to lead to a lot of messy logic on
> the server side, and even messier logic for any clients that read the
> data and try to parse it programmatically.

Ok, I understood that the packed format itself is unacceptable.

> > By the way, how about giving separate columns for relname and
> > namespace? I think it is more usual way to designate a relation
> > in this kind of view and it makes the snprintf to concatenate
> > name and schema unnecessary(it's not significant, though). (The
> > following example is after pg_stat_all_tables)
> 
> I could go either way on this.

It would depend on the field length, but 140 bytes can hold a
whole qualified name.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center






Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-02 Thread Amit Langote

Hi,

On 2015/12/03 15:27, Kyotaro HORIGUCHI wrote:
> At Thu, 3 Dec 2015 14:18:50 +0900, Amit Langote wrote
>> On 2015/12/03 13:47, Kyotaro HORIGUCHI wrote:
>>>
>>> Apart from the representation of the relation, OID would be
>>> better as a field in beentry.
>>
>> I wonder if the field should be a standalone field or as yet another
>> st_progress_* array?
>>
>> IMHO, there are some values that a command would report that should not be
>> mixed with pgstat_report_progress()'s interface. That is, things like
>> command ID/name, command target (table name or OID) should not be mixed
>> with actual progress parameters like num_pages, num_indexes (integers),
>> processing "phase" (string) that are shared via st_progress_* fields. The
>> first of them  already has its own reporting interface in proposed patch
>> in the form of pgstat_report_activity_flag(). Although, we must be careful
>> to choose these interfaces carefully.
> 
> Sorry I misunderstood the patch.
> 
> Agreed. The patch already separates integer values and texts.
> And re-reviewing the patch, there's no fields necessary to be
> passed as string.
> 
> total_heap_pages, scanned_heap_pages, total_index_pages,
> scanned_index_pages, vacrelstats->num_index_scans are currently
> in int32.
> 
> Phase can be in integer, and schema_name and relname can be
> represented by one OID, uint32.

AIUI, st_progress_message (strings) are to be used to share certain
messages as progress information. I think the latest vacuum-progress patch
uses it to report which phase lazy_scan_heap() is in, for example,
"Scanning heap" for phase 1 of its processing and "Vacuuming index and
heap" for phase 2. Those values are shown to the user in a text column
named "phase" of the pg_stat_vacuum_progress view. That said, reporting
phase as an integer value may also be worth a consideration. Some other
command might choose to do that.

> Finally, *NO* text field is needed at least this usage. So
> progress_message is totally useless regardless of other usages
> unknown to us.

I think it may be okay at this point to add just those st_progress_*
fields which are required by lazy vacuum progress reporting. If someone
comes up with instrumentation ideas for some other command, they could
post patches to add more st_progress_* fields and to implement
instrumentation and a progress view for that command. This is essentially
what Robert said in [1] in relation to my suggestion of taking out
st_progress_param_float from this patch.

By the way, there are some non-st_progress_* fields that I was talking
about in my previous message. For example, st_activity_flag (which I have
suggested to rename to st_command instead). It needs to be set once at the
beginning of the command processing and need not be touched again. I think
it may be a better idea to do the same for table name or OID. It also
won't change over the duration of the command execution. So, we could set
it once at the beginning where that becomes known. Currently in the patch,
it's reported in st_progress_message[0] by every pgstat_report_progress()
invocation. So, the table name will be strcpy()'d to shared memory for
every scanned block that's reported.

Thanks,
Amit

[1]
http://www.postgresql.org/message-id/CA+TgmoYdZk9nPDtS+_kOt4S6fDRQLW+1jnJBmG0pkRT4ynxO=q...@mail.gmail.com






Re: [HACKERS] [COMMITTERS] pgsql: Refactor Perl test code

2015-12-02 Thread Michael Paquier
On Thu, Dec 3, 2015 at 12:19 PM, Tom Lane  wrote:
> BTW, not the fault of this patch in particular, but this example points
> up the complaint I've had right along about how opaque TAP test failures
> are.  How did you dig down to see that error message?

Well, it showed up on my terminal...

> Is it even possible to diagnose such a failure from what the buildfarm logs?

Yes. If you look at TestLib.pm, stderr/stdout redirection is done in
an INIT block, not BEGIN block, and INIT gets executed *after* the
code is compiled. FWIW, I recall arguing in favor of adding this
redirection logic in BEGIN so as we could get compilation errors
directly in the log files... The reason why it is done this way is
that it has been argued as well that we should not change the FS in
the BEGIN block, but in the INIT block when TestLib.pm is loaded.
-- 
Michael




Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-02 Thread Haribabu Kommi
On Wed, Nov 25, 2015 at 7:18 PM, Pavel Stehule  wrote:
>
>
> 2015-11-25 8:05 GMT+01:00 Haribabu Kommi :
>>
>>
>> Thanks. Here I attached the poc patch that returns authentication method
>> of the
>> first matched hba entry in pg_hba.conf with the given input values.
>> Currently these
>> functions returns text type. Based on the details required to be
>> printed, it can
>> be changed.
>>
>> postgres=# select pg_hba_lookup('all', 'all');
>>  pg_hba_lookup
>> ---
>>  trust
>> (1 row)
>>
>> comments for the approach?
>
>
> From my perspective, it shows too little information.
>
> What I am expecting:
>
> 1. line number of the chosen rule
> 2. some tracing - via NOTICE, of what and why some rules were skipped.

Here I have attached the patch with the suggested changes.
Along with the line number, I also kept an options column with the
authentication options as a jsonb datatype.

Example output:

postgres=# select pg_hba_lookup('test','all','::1');
NOTICE:  Skipped 84 Hba line, because of non matching IP.
NOTICE:  Skipped 86 Hba line, because of non matching database.
NOTICE:  Skipped 87 Hba line, because of non matching role.
 pg_hba_lookup
---
 (89,trust,{})
(1 row)

comments?

Regards,
Hari Babu
Fujitsu Australia


pg_hba_lookup_poc_v3.patch
Description: Binary data



Re: [HACKERS] pg_hba_lookup function to get all matching pg_hba.conf entries

2015-12-02 Thread Pavel Stehule
2015-12-03 5:00 GMT+01:00 Haribabu Kommi :

> On Wed, Nov 25, 2015 at 7:18 PM, Pavel Stehule 
> wrote:
> >
> >
> > 2015-11-25 8:05 GMT+01:00 Haribabu Kommi :
> >>
> >>
> >> Thanks. Here I attached the poc patch that returns authentication method
> >> of the
> >> first matched hba entry in pg_hba.conf with the given input values.
> >> Currently these
> >> functions returns text type. Based on the details required to be
> >> printed, it can
> >> be changed.
> >>
> >> postgres=# select pg_hba_lookup('all', 'all');
> >>  pg_hba_lookup
> >> ---
> >>  trust
> >> (1 row)
> >>
> >> comments for the approach?
> >
> >
> > From my perspective, it shows too little information.
> >
> > What I am expecting:
> >
> > 1. line number of the chosen rule
> > 2. some tracing - via NOTICE, of what and why some rules were skipped.
>
> Here I have attached the patch with the suggested changes.
> Along with the line number, I also kept an options column with the
> authentication options as a jsonb datatype.
>
> Example output:
>
> postgres=# select pg_hba_lookup('test','all','::1');
> NOTICE:  Skipped 84 Hba line, because of non matching IP.
> NOTICE:  Skipped 86 Hba line, because of non matching database.
> NOTICE:  Skipped 87 Hba line, because of non matching role.
>  pg_hba_lookup
> ---
>  (89,trust,{})
> (1 row)
>
> comments?
>

I like it.

The text of the notice can be reduced: "Skipped xx line, ..." - it has to say
pg_hba.

Pavel


>
> Regards,
> Hari Babu
> Fujitsu Australia
>


Re: [HACKERS] Combining Aggregates

2015-12-02 Thread David Rowley
On 27 July 2015 at 04:58, Heikki Linnakangas  wrote:

>
> This patch seems sane to me, as far as it goes. However, there's no
> planner or executor code to use the aggregate combining for anything. I'm
> not a big fan of dead code, I'd really like to see something to use this.
>

I've attached an updated version of the patch. The main change from last
time is that I've added executor support and exposed this to the planner
via two new parameters in make_agg().

I've also added EXPLAIN support, this will display "Partial
[Hash|Group]Aggregate" for cases where the final function won't be called
and displays "Finalize [Hash|Group]Aggregate" when combining states and
finalizing aggregates.

This patch is currently intended as foundation work for parallel
aggregation.

--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services


combine_aggregate_state_6ea1aad_2015-12-03.patch
Description: Binary data



Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-02 Thread Amit Langote
On 2015/12/03 13:47, Kyotaro HORIGUCHI wrote:
> At Wed, 2 Dec 2015 15:48:20 -0300, Alvaro Herrera  
> wrote
>>
>> Actually, do we really need to have the table name as a string at all
>> here?  Why not just report the table OID?  Surely whoever wants to check
>> the progress can connect to the database in question to figure out the
>> table name.
> 
> I thought the same thing but found that the same kind of view
> (say, pg_stat_user_tables) has separate relanme and shcemaname in
> string (not a qualified name, though).
> 
> Apart from the representation of the relation, OID would be
> better as a field in beentry.

I wonder if the field should be a standalone field or as yet another
st_progress_* array?

IMHO, there are some values that a command would report that should not be
mixed with pgstat_report_progress()'s interface. That is, things like
command ID/name, command target (table name or OID) should not be mixed
with actual progress parameters like num_pages, num_indexes (integers),
processing "phase" (string) that are shared via st_progress_* fields. The
first of them  already has its own reporting interface in proposed patch
in the form of pgstat_report_activity_flag(). Although, we must be careful
to choose these interfaces carefully.

Thanks,
Amit






Re: [HACKERS] find_inheritance_children() and ALTER TABLE NO INHERIT

2015-12-02 Thread Amit Langote
On 2015/12/03 13:09, Tom Lane wrote:
> Amit Langote  writes:
>> Currently find_inheritance_children() is smart enough to skip a child
>> table that it finds has been dropped concurrently after it gets a lock on
>> the same. It does so by looking up the child relid in syscache. It seems
>> it should also check if the table is still in the list of children of the
>> parent. Doing so by scanning the pg_inherits(inhparent) index may likely
>> be inefficient. So, how about adding that syscache on
> pg_inherits(inhrelid, inhparent) [1]?
> 
> I doubt that a syscache would fix the performance issue there; it wouldn't
> get referenced enough to be likely to have the desired tuple in cache.

Ah, right.

> I wonder whether we could improve matters by rechecking validity of the
> pg_inherits tuple (which we saw already and could presumably retain the
> TID of).  There is at least one place where we do something like that now,
> IIRC.

Given that the generation of child OID list and locking of child tables
occur independently, do you mean to collect catalog tuple TIDs along with
corresponding OIDs during the catalog scan and recheck them during the
locking step?

Not sure whether this is sane, but how about performing an ordered scan on
pg_inherits (systable_getnext_ordered()) and using systable_recheck_tuple()
in step with it? Would an ordered catalog scan provide the same protection
against deadlocks as the existing approach of locking the child tables in a
fixed order? Perhaps I'm missing something.
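
To make the idea concrete, a rough sketch of locking each child inside the
scan and rechecking its pg_inherits tuple afterwards (a fragment only:
parentrelId and lockmode are assumed to be in scope, OID collection and
error handling are elided, and this is not the actual patch):

	Relation	inhrel = heap_open(InheritsRelationId, AccessShareLock);
	ScanKeyData skey;
	SysScanDesc scan;
	HeapTuple	tuple;

	ScanKeyInit(&skey, Anum_pg_inherits_inhparent, BTEqualStrategyNumber,
				F_OIDEQ, ObjectIdGetDatum(parentrelId));
	scan = systable_beginscan(inhrel, InheritsParentIndexId, true,
							  NULL, 1, &skey);

	while ((tuple = systable_getnext(scan)) != NULL)
	{
		Oid			childOid = ((Form_pg_inherits) GETSTRUCT(tuple))->inhrelid;

		LockRelationOid(childOid, lockmode);

		/* dropped, or no longer a child of this parent? */
		if (!SearchSysCacheExists1(RELOID, ObjectIdGetDatum(childOid)) ||
			!systable_recheck_tuple(scan, tuple))
		{
			UnlockRelationOid(childOid, lockmode);
			continue;
		}

		/* childOid is still a valid, locked child at this point */
	}

	systable_endscan(scan);
	heap_close(inhrel, AccessShareLock);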

Thanks,
Amit






[HACKERS] W-TinyLfu for cache eviction

2015-12-02 Thread Vladimir Sitnikov
I've recently noticed the W-TinyLfu cache admission policy (see [1]) being
used in Caffeine, a "high performance caching library for Java 8".
It demonstrates high cache hit ratios (see [2]) and makes it possible to
build high-throughput caches (see caffeine in [3]).
The authors explicitly allow implementations of the algorithm (see [4]).

Does it make sense to evaluate the algorithm for buffer replacement?

[1]: http://arxiv.org/pdf/1512.00727v1.pdf
[2]: https://github.com/ben-manes/caffeine/wiki/Efficiency
[3]: https://github.com/ben-manes/caffeine/wiki/Benchmarks
[4]: https://github.com/ben-manes/caffeine/issues/23#issuecomment-161536706
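
For what it's worth, the admission decision itself is small: keep an
approximate frequency sketch and let a candidate replace the eviction
victim only if it looks more frequently used. A toy sketch of just that
part, independent of PostgreSQL's buffer manager (all names invented; the
window cache and the periodic counter aging that W-TinyLfu also requires
are omitted):

#include <stdbool.h>
#include <stdint.h>

#define SKETCH_WIDTH	(1 << 16)	/* must be a power of two */
#define SKETCH_DEPTH	4

static uint8_t counters[SKETCH_DEPTH][SKETCH_WIDTH];

static uint32_t
slot(uint32_t key, uint32_t seed)
{
	uint32_t	h = (key ^ seed) * 0x9E3779B1u;

	return (h ^ (h >> 15)) & (SKETCH_WIDTH - 1);
}

/* record one access, with saturating counters (count-min style) */
static void
sketch_record(uint32_t key)
{
	for (int i = 0; i < SKETCH_DEPTH; i++)
	{
		uint8_t    *c = &counters[i][slot(key, i)];

		if (*c < UINT8_MAX)
			(*c)++;
	}
}

/* estimated access frequency = minimum over the rows */
static uint8_t
sketch_estimate(uint32_t key)
{
	uint8_t		min = UINT8_MAX;

	for (int i = 0; i < SKETCH_DEPTH; i++)
	{
		uint8_t		c = counters[i][slot(key, i)];

		if (c < min)
			min = c;
	}
	return min;
}

/* TinyLFU admission: replace the victim only if the candidate is "hotter" */
static bool
admit(uint32_t candidate, uint32_t victim)
{
	return sketch_estimate(candidate) > sketch_estimate(victim);
}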

Vladimir Sitnikov




Re: [HACKERS] Logical replication and multimaster

2015-12-02 Thread konstantin knizhnik

On Dec 3, 2015, at 4:18 AM, Craig Ringer wrote:

> Excellent.
> 
> It should be possible to make that a separate extension. You can use C 
> functions from other extensions by exposing a single pg_proc function with 
> 'internal' return type that populates a struct of function pointers for the 
> API. A single DirectFunctionCall lets you get the API struct. That's how 
> pglogical_output handles hooks. The main downside is that you can't do that 
> without a connection to a database with the extension installed so the 
> pg_proc entry is exposed.
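
A bare-bones sketch of that pattern (all names such as MyExtApi and
myext_get_api are invented for illustration and are not pglogical_output's
actual API; the consumer here resolves the function by name instead of
using DirectFunctionCall):

#include "postgres.h"

#include "fmgr.h"
#include "nodes/pg_list.h"
#include "nodes/value.h"
#include "parser/parse_func.h"

PG_MODULE_MAGIC;

typedef struct MyExtApi
{
	int			api_version;
	void		(*do_something) (int arg);
} MyExtApi;

static void
do_something_impl(int arg)
{
	elog(LOG, "do_something(%d)", arg);
}

/*
 * In the providing extension, declared at SQL level as:
 *   CREATE FUNCTION myext_get_api() RETURNS internal
 *     AS 'MODULE_PATHNAME' LANGUAGE C;
 */
PG_FUNCTION_INFO_V1(myext_get_api);
Datum
myext_get_api(PG_FUNCTION_ARGS)
{
	static MyExtApi api = {1, do_something_impl};

	PG_RETURN_POINTER(&api);
}

/* In the consuming extension: look the function up by name and call it. */
static MyExtApi *
lookup_myext_api(void)
{
	Oid			fnoid = LookupFuncName(list_make1(makeString(pstrdup("myext_get_api"))),
									   0, NULL, false);

	return (MyExtApi *) DatumGetPointer(OidFunctionCall0(fnoid));
}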


While working on a cluster extension and a columnar storage extension I ran
into several questions about PostgreSQL infrastructure.
I have always found workarounds, but maybe it is better to ask the community
about them :)

1. Why is there no "condition variable"-style synchronization primitive in
PostgreSQL? There is the latch, but it is implemented using sockets and I am
afraid it is not very fast.
It would be nice to have a fast primitive like pthread condition variables.
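
For reference, the latch-based equivalent of a condition-variable wait
looks roughly like this (a sketch against the 9.5 API; "shared" is assumed
to be a structure in shared memory with a spinlock-protected flag):

	for (;;)
	{
		bool		work_available;

		ResetLatch(&MyProc->procLatch);

		SpinLockAcquire(&shared->mutex);
		work_available = shared->work_available;
		SpinLockRelease(&shared->mutex);

		if (work_available)
			break;

		/* a real caller should also react to WL_POSTMASTER_DEATH */
		WaitLatch(&MyProc->procLatch,
				  WL_LATCH_SET | WL_POSTMASTER_DEATH, 0);
	}

	/*
	 * The notifying process sets the flag under the spinlock and then calls
	 * SetLatch(&waiting_backend->procLatch).
	 */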

2. PostgreSQL semaphores seem not to be intended for use outside the
PostgreSQL core (for example in extensions).
There is no way to request an additional number of semaphores; right now
semaphores are allocated based on the maximal number of backends and
spinlocks. Yet a semaphore, like an event, is a very popular and convenient
synchronization primitive that is needed in many cases.

3. What is the right way to create a background worker that requires access
to shared memory, i.e. one whose control structure lives in shared memory?
As far as I understand, background workers have to be registered either in
_PG_init() or outside the postmaster environment.
If an extension requires access to shared memory, it should be listed in
shared_preload_libraries and should initialize that memory from a
shmem_startup hook.
Something like this:

void _PG_init(void)
{
	if (!process_shared_preload_libraries_in_progress)
		return;
	...
	prev_shmem_startup_hook = shmem_startup_hook;
	shmem_startup_hook = My_shmem_startup;
}

My_shmem_startup is needed because it is not possible to allocate shared
memory in _PG_init. So if I need to allocate some control structure for
background workers in shared memory, I should do it in My_shmem_startup.
But I cannot register background workers in My_shmem_startup! I will get a
"must be registered in shared_preload_libraries" error:

void
RegisterBackgroundWorker(BackgroundWorker *worker)
{
	if (!process_shared_preload_libraries_in_progress)
	{
		if (!IsUnderPostmaster)
			ereport(LOG,
					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
					 errmsg("background worker \"%s\": must be registered in shared_preload_libraries",
							worker->bgw_name)));
		return;
	}
}

So I have to register background workers in _PG_init while the control
structure for them is not yet ready.
When I implemented a pool of background workers, I solved this problem by
providing a function that returns the address of the control structure
later - when it has actually been allocated.
But it seems to be a design flaw in the BGW API, isn't it?
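
One workable arrangement (a sketch against the 9.5 API; MyWorkerCtl,
my_worker_main and friends are placeholders) is to do all the registration
in _PG_init() and let both the shmem_startup_hook and the worker attach to
the structure by name once shared memory actually exists:

#include "postgres.h"

#include "miscadmin.h"
#include "postmaster/bgworker.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

PG_MODULE_MAGIC;

typedef struct MyWorkerCtl
{
	int			nworkers;		/* placeholder contents */
} MyWorkerCtl;

static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
static MyWorkerCtl *workerctl = NULL;

static void my_shmem_startup(void);
static void my_worker_main(Datum arg);

void
_PG_init(void)
{
	BackgroundWorker worker;

	if (!process_shared_preload_libraries_in_progress)
		return;

	/* reserve space now; the segment is created by the postmaster later */
	RequestAddinShmemSpace(MAXALIGN(sizeof(MyWorkerCtl)));

	prev_shmem_startup_hook = shmem_startup_hook;
	shmem_startup_hook = my_shmem_startup;

	memset(&worker, 0, sizeof(worker));
	snprintf(worker.bgw_name, BGW_MAXLEN, "my worker");
	worker.bgw_flags = BGWORKER_SHMEM_ACCESS;
	worker.bgw_start_time = BgWorkerStart_RecoveryFinished;
	worker.bgw_restart_time = BGW_NEVER_RESTART;
	worker.bgw_main = my_worker_main;	/* 9.5; bgw_library_name works too */
	RegisterBackgroundWorker(&worker);
}

static void
my_shmem_startup(void)
{
	bool		found;

	if (prev_shmem_startup_hook)
		prev_shmem_startup_hook();

	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
	workerctl = ShmemInitStruct("my worker control",
								sizeof(MyWorkerCtl), &found);
	if (!found)
		memset(workerctl, 0, sizeof(MyWorkerCtl));
	LWLockRelease(AddinShmemInitLock);
}

static void
my_worker_main(Datum arg)
{
	bool		found;

	BackgroundWorkerUnblockSignals();

	/* attach to the same struct; it exists by the time the worker runs */
	workerctl = ShmemInitStruct("my worker control",
								sizeof(MyWorkerCtl), &found);

	/* ... worker loop would go here ... */
	proc_exit(0);
}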







Re: [HACKERS] [COMMITTERS] pgsql: Refactor Perl test code

2015-12-02 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Dec 3, 2015 at 8:38 AM, Tom Lane wrote:
>> Test Summary Report
>> ---
>> t/001_initdb.pl (Wstat: 6400 Tests: 8 Failed: 0)
>> Non-zero exit status: 25
>> Parse errors: Bad plan.  You planned 14 tests but ran 8.
>> Files=1, Tests=8,  0 wallclock secs ( 0.02 usr  0.00 sys +  0.08 cusr  0.01 
>> csys =  0.11 CPU)
>> Result: FAIL
>> make[2]: *** [check] Error 1
>> make[2]: Leaving directory `/home/postgres/pgsql/src/bin/initdb'
>> make[1]: *** [check-initdb-recurse] Error 2
>> make[1]: Leaving directory `/home/postgres/pgsql/src/bin'
>> make: *** [check-world-src/bin-recurse] Error 2
>> 
>> The buildfarm looks pretty unhappy too, though I've not checked to see if
>> it's exactly the same symptom everywhere.

> Or in more details:
> Undefined subroutine ::run called at
> /Users/mpaquier/git/postgres/src/bin/initdb/../../../src/test/perl/TestLib.pm
> line 146.

BTW, not the fault of this patch in particular, but this example points
up the complaint I've had right along about how opaque TAP test failures
are.  How did you dig down to see that error message?  Is it even possible
to diagnose such a failure from what the buildfarm logs?

regards, tom lane




Re: [HACKERS] [COMMITTERS] pgsql: Refactor Perl test code

2015-12-02 Thread Tom Lane
Michael Paquier  writes:
> On Thu, Dec 3, 2015 at 12:19 PM, Tom Lane  wrote:
>> BTW, not the fault of this patch in particular, but this example points
>> up the complaint I've had right along about how opaque TAP test failures
>> are.  How did you dig down to see that error message?

> Well, it showed up on my terminal...

Not on mine, as per the extract I showed.  Probably a Perl version
difference, but I don't think we can exactly write off RHEL6 as an
uninteresting obsolete distribution.  (The Perl here is 5.10.1.)

regards, tom lane




Re: [HACKERS] psql: add \pset true/false

2015-12-02 Thread Kyotaro HORIGUCHI
Hello, attached is an example implementation of an output-filter
dynamic loading feature for psql.

At Thu, 3 Dec 2015 10:41:11 +0900, Michael Paquier  
wrote in 
> > How about plugins on psql side? Calling hooked function in
> > printQuery could do that on psql. Impact on psql itself is
> > minimized. (Of course code for loading is omitted in the below
> > but it would also small...)
> 
> That's interesting, and crazy. You would basically need to have the
> equivalent of \load, or an environment variable like PSQL_SHARED_LIBS
> that is similar to shared_preload_libraries on client-side. In short I
> guess that's actually a clone of LD_PRELOAD.

It is bothersome to replicate the server-side preload feature, so I made
this a minimal implementation. Some safety measures should be added, though
they need not be as strict as on the backend. I hope this is an
acceptable mess.

The attached patch adds a function to load an output-filter DLL.
The second file is an example filter module; the following command line
builds a test filter module from it. I suppose a preload feature would only
need a few additional lines.

gcc -I -fPIC -shared outfunctest.c -o /tmp/outfunctest.so

Then, on psql command line.

=# select 't'::bool;
 bool 
--
 t
=# \load_dll /tmp/outfunctest.so
=# select 't'::bool;
   bool
---
 TUEEE
(1 row)

This lets anyone tweak the output of any data type.

Opinions, suggestions or rejections are welcome.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


From ac5e917a1bd802c37e56dc0861adb8807126d2eb Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Thu, 3 Dec 2015 14:38:03 +0900
Subject: [PATCH] Test implement of dynamic-loadable output filter.

This is a minimal implementation so that psql can dynamically load an output
filter module. As test code, it can load only via a slash command; there is
no unload, reload or preload feature.
---
 src/bin/psql/Makefile  | 2 +-
 src/bin/psql/command.c | 8 
 src/bin/psql/print.c   | 6 ++
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/Makefile b/src/bin/psql/Makefile
index f1336d5..2fb1fd1 100644
--- a/src/bin/psql/Makefile
+++ b/src/bin/psql/Makefile
@@ -23,7 +23,7 @@ override CPPFLAGS := -I. -I$(srcdir) -I$(libpq_srcdir) -I$(top_srcdir)/src/bin/p
 OBJS=	command.o common.o help.o input.o stringutils.o mainloop.o copy.o \
 	startup.o prompt.o variables.o large_obj.o print.o describe.o \
 	tab-complete.o mbprint.o dumputils.o keywords.o kwlookup.o \
-	sql_help.o \
+	sql_help.o dlload.o\
 	$(WIN32RES)
 
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 438a4ec..4101d99 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -40,6 +40,7 @@
 #include "common.h"
 #include "copy.h"
 #include "describe.h"
+#include "dlload.h"
 #include "help.h"
 #include "input.h"
 #include "large_obj.h"
@@ -1001,6 +1002,13 @@ exec_command(const char *cmd,
 		free(opt2);
 	}
 
+	else if (strcmp(cmd, "load_dll") == 0)
+	{
+		char *opt1 = psql_scan_slash_option(scan_state,
+			OT_NORMAL, NULL, true);
+		expand_tilde(&opt1);
+		success = do_dlload(opt1);
+	}
 
 	/* \o -- set query output */
 	else if (strcmp(cmd, "o") == 0 || strcmp(cmd, "out") == 0)
diff --git a/src/bin/psql/print.c b/src/bin/psql/print.c
index ad4350e..37febb2 100644
--- a/src/bin/psql/print.c
+++ b/src/bin/psql/print.c
@@ -27,6 +27,7 @@
 #include "common.h"
 #include "mbprint.h"
 #include "print.h"
+#include "dlload.h"
 
 /*
  * We define the cancel_pressed flag in this file, rather than common.c where
@@ -3210,6 +3211,11 @@ printQuery(const PGresult *result, const printQueryOpt *opt, FILE *fout, FILE *f
 			else
 			{
 				cell = PQgetvalue(result, r, c);
+
+				/* Call output filter if registered */
+				if (outputfilter)
+					cell = outputfilter(cell, PQftype(result, c), &mustfree);
+
 				if (cont.aligns[c] == 'r' && opt->topt.numericLocale)
 				{
 					cell = format_numeric_locale(cell);
-- 
1.8.3.1

#include "postgres_fe.h"
#include <stdlib.h>
#include "catalog/pg_type.h"

char *
outputfilter(char *origcell, int type, bool *mustfree)
{
	char	   *retcell;

	switch (type)
	{
		case BOOLOID:
			retcell = (*origcell == 't' ? "TUEEE" : "FAAALSE");
			if (*mustfree)
				free(origcell);
			*mustfree = false;
			break;
		default:
			retcell = origcell;
			break;
	}
	return retcell;
}



Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-12-02 Thread Catalin Iacob
On Tue, Dec 1, 2015 at 2:17 AM, Pavel Stehule  wrote:
> here is complete patch - regress tests for all supported Python branches

I had a look at what changed in v10 since my last reviewed version and
indeed most of it is straightforward: renames from SPIError to Error.
The patch also changes plpy.Fatal and plpy.SPIError to inherit from
the existing plpy.Error which is a backward incompatibility: catching
a plpy.Error will now also catch SPIError and Fatal.

But by doing this I noticed plpy.Error already existed without the
patch and raising plpy.Error(msg) is documented as equivalent to
calling plpy.error(msg), similar for plpy.Fatal and plpy.fatal(). This
patch makes it possible to raise plpy.Error with more arguments
including keyword ones but doesn't change plpy.error(msg). And Fatal
is not touched so it becomes inconsistent with Error.

So I now think the approach this patch took is wrong. We should instead
enhance the existing error and fatal functions, and the Error and Fatal
exceptions, to accept the other arguments that ereport accepts (hint,
sqlstate, etc.) and allow passing all of those as keyword parameters.
SPIError should be left unchanged, as that is for errors raised by query
execution, not by the PL/Python user. This is also close to Pavel's
original ereport proposal, but it extends existing functionality instead of
adding a new function, and it addresses Peter's observation that in Python
the ereport function should simply be "raise an exception", since that is
already an alternative way of doing it.




Re: [HACKERS] proposal: PL/Pythonu - function ereport

2015-12-02 Thread Pavel Stehule
2015-12-03 7:04 GMT+01:00 Catalin Iacob :

> On Tue, Dec 1, 2015 at 2:17 AM, Pavel Stehule 
> wrote:
> > here is complete patch - regress tests for all supported Python branches
>
> I had a look at what changed in v10 since my last reviewed version and
> indeed most of it is straightforward: renames from SPIError to Error.
> The patch also changes plpy.Fatal and plpy.SPIError to inherit from
> the existing plpy.Error which is a backward incompatibility: catching
> a plpy.Error will now also catch SPIError and Fatal.
>
> But by doing this I noticed plpy.Error already existed without the
> patch and raising plpy.Error(msg) is documented as equivalent to
> calling plpy.error(msg), similar for plpy.Fatal and plpy.fatal(). This
> patch makes it possible to raise plpy.Error with more arguments
> including keyword ones but doesn't change plpy.error(msg). And Fatal
> is not touched so it becomes inconsistent with Error.
>
> So I now think the approach this patch took is wrong. We should
> instead enhance the existing error and fatal functions and Error and
> Fatal exceptions to accept other arguments that ereport accepts (hint,
> sqlstate) and be able to pass all those as keyword parameters.
> SPIError should be left unchanged as that's for errors raised by query
> execution not by the PL/Python user. This is also close to Pavel's
> original ereport proposal but by extending existing functionality
> instead of adding a new function and it covers Peter's observation
> that in Python the ereport function should be "raise an exception" as
> that's already an alternative way of doing it.
>


I am sorry, I don't understand. Now, due to the inheritance, plpy.Fatal and
plpy.SPIError can also take keyword parameters.

postgres=# do $$ raise plpy.Fatal('AHOJ','NAZDAR');
$$ language plpythonu;
FATAL:  38000: plpy.Fatal: AHOJ
DETAIL:  NAZDAR
CONTEXT:  Traceback (most recent call last):
  PL/Python anonymous code block, line 1, in 
raise plpy.Fatal('AHOJ','NAZDAR');
PL/Python anonymous code block

Regards

Pavel


Re: [HACKERS] Postgres_FDW optimizations

2015-12-02 Thread Julien Rouhaud
On 02/12/2015 20:25, cevian wrote:
> Hi all,
> 

Hello,

> I have a question about postgres_fdw optimizations/pushdown:
> 
> I have the following code running on 9.5beta2 (same format as
> previous/related message for consistency)
> CREATE EXTENSION postgres_fdw; 
> CREATE SERVER loop foreign data wrapper postgres_fdw 
>   OPTIONS (port '5432', dbname 'testdb'); 
> CREATE USER MAPPING FOR PUBLIC SERVER loop; 
> 
> create table onemillion ( 
> id serial primary key, 
> inserted timestamp default clock_timestamp(), 
> data text 
> ); 
> 
> insert into onemillion(data) select random() from 
> generate_series(1,100); 
> 
> CREATE FOREIGN TABLE onemillion_pgfdw ( 
> id int, 
> inserted timestamp, 
> data text 
> ) SERVER loop 
> OPTIONS (table_name 'onemillion', 
>  use_remote_estimate 'true'); 
> 
> explain verbose select * from onemillion_pgfdw order by id limit 1;
>  QUERY PLAN
> 
>  Limit  (cost=43434.00..43434.00 rows=1 width=30)
>Output: id, inserted, data
>->  Sort  (cost=43434.00..45934.00 rows=100 width=30)
>  Output: id, inserted, data
>  Sort Key: onemillion_pgfdw.id
>  ->  Foreign Scan on public.onemillion_pgfdw  (cost=100.00..38434.00
> rows=100 width=30)
>Output: id, inserted, data
>Remote SQL: SELECT id, inserted, data FROM public.onemillion
> 
> This is obviously highly inefficient. The sort and limit should be pushed
> down to the foreign node, especially on such a simple query. I have 3
> questions:
> 
> 1) Is this the expected state of the fdw optimizations for now, or is it a
> bug?
> 2) Is anybody working on this type of pushdown right now? (I would be more
> than willing to collaborate on a patch.)

The sort pushdown for postgres_fdw was committed a few weeks ago
for 9.6; see
http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f18c944b6137329ac4a6b2dce5745c5dc21a8578

> 3) Is it possible to fix this with views/rules/triggers/a different query? I
> couldn't find a way. Relatedly, is there a way to explicitly specify the
> remote query to run through the fdw?
> 

For now, I don't see any other solution than executing a remote query
with the dblink extension:
http://www.postgresql.org/docs/current/static/contrib-dblink-function.html

Regards.

> Thanks,
> Matvey Arye
> Iobeam, Inc.
> 
> 
> 
> 
> --
> View this message in context: 
> http://postgresql.nabble.com/Postgres-FDW-optimizations-tp5875911.html
> Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.
> 
> 


-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org




Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns

2015-12-02 Thread Robert Haas
On Mon, Nov 30, 2015 at 11:00 PM, David Rowley
 wrote:
> There are in fact also two queries in TPC-H (Q10 and Q18) which are written
> to include all of the non-aggregated columns in the GROUP BY list. During a
> recent test I witnessed a 50% gain in performance in Q10 by removing the
> unneeded columns from the GROUP BY clause.

Wow, that's pretty impressive.  +1 for trying to do something about this.

(Full disclosure: I have not read your patch.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin

2015-12-02 Thread Petr Jelinek

Hi,

I can't really do a huge review considering I wrote half of the code, but 
I have a couple of things I noticed.


First, I wonder if it would be useful to mention somewhere, even if it's 
only here on the mailing list, how the protocol can be extended in a 
non-breaking way in the future for transaction streaming, if we ever get 
that. I am mainly asking because the protocol does not currently send an 
xid for every change, as that's not necessary, but for streaming it will 
be. I know of a couple of ways myself, but I think they should be 
described publicly.


The other thing is that I think we don't need the "forward_changesets" 
parameter, which currently decides whether to forward changes that didn't 
originate on the local node. There is already a hook for origin filtering 
which provides the same functionality in a more flexible way, so it seems 
redundant to also have a special boolean option for it.


--
 Petr Jelinek  http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-02 Thread Etsuro Fujita

On 2015/12/02 1:53, Robert Haas wrote:

On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita wrote:

 	Plan	   *plan = &node->scan.plan;

@@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
 	/* cost will be filled in by create_foreignscan_plan */
 	plan->targetlist = qptlist;
 	plan->qual = qpqual;
-	plan->lefttree = NULL;
+	plan->lefttree = fdw_outerplan;
 	plan->righttree = NULL;
 	node->scan.scanrelid = scanrelid;
I think that that would break the EXPLAIN output.



In what way?  EXPLAIN recurses into the left and right trees of every
plan node regardless of what type it is, so superficially I feel like
this ought to just work. What problem do you foresee?

I do think that ExecInitForeignScan ought to be changed to call
ExecInitNode on its outer plan if present, rather than leaving that to
the FDW's BeginForeignScan method.
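
Concretely, that would amount to something like this near the end of
ExecInitForeignScan() (a sketch, not quoted from any committed patch):

	/* initialize any FDW-supplied outer subplan here, rather than leaving
	 * it to BeginForeignScan */
	if (outerPlan(node))
		outerPlanState(scanstate) =
			ExecInitNode(outerPlan(node), estate, eflags);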


IIUC, I think the EXPLAIN output for, e.g.,

select localtab.* from localtab, ft1, ft2 where localtab.a = ft1.a and 
ft1.a = ft2.a for update


would be something like this:

 LockRows
   ->  Nested Loop
 Join Filter: (ft1.a = localtab.a)
 ->  Seq Scan on localtab
 ->  Foreign Scan on ft1/ft2-foreign-join
   -> Nested Loop
Join Filter: (ft1.a = ft2.a)
->  Foreign Scan on ft1
->  Foreign Scan on ft2

The subplan below the Foreign Scan on the foreign-join seems odd to me. 
 One option to avoid that is to handle the subplan as in my patch [2], 
which I created to address your comment that we should not break the 
equivalence discussed below.  I'm not sure that the patch's handling of 
chgParam for the subplan is a good idea, though.



One option to avoid that
is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or
BeginForeignScan as you proposed.  That breaks the equivalence that the Plan
tree and the PlanState tree should be mirror images of each other, but I
think that that break would be harmless.



I'm not sure how many times I have to say this, but we are not doing
that.  I will not commit any patch that does that, and I will
vigorously argue against anyone else committing such a patch either.
That *would* break EXPLAIN, because EXPLAIN relies on being able to
walk the PlanState tree and find all the Plan nodes from the
corresponding PlanState nodes.  Now you might think that it would be
OK to omit a plan node that we decided we weren't ever going to
execute, but today we don't do that, and I don't think we should.  I
think it could be very confusing if EXPLAIN and EXPLAIN ANALYZE show
different sets of plan nodes for the same query.  Quite apart from
EXPLAIN, there are numerous other places that assume that they can
walk the PlanState tree and find all the Plan nodes.  Breaking that
assumption would be bad news.


Agreed.  Thanks for the explanation!

Best regards,
Etsuro Fujita

[2] http://www.postgresql.org/message-id/5624d583.10...@lab.ntt.co.jp






Re: [HACKERS] Remaining 9.5 open items

2015-12-02 Thread Andres Freund
On 2015-12-02 12:25:37 +0100, Joel Jacobson wrote:
> On Wed, Dec 2, 2015 at 12:19 PM, Andres Freund  wrote:
> > The significant changes are in 9.5.
>
> Will multixact truncations be WAL logged in 9.5?

Yes.

C.f. the release notes:

* Rework truncation of the multixact commit log to be properly WAL-logged
  (Andres Freund)

  This makes things substantially simpler and more robust.

The relevant commits are

commit aa29c1ccd9f785f9365809f5133e5491acc7ae53
Author: Andres Freund 
Date:   2015-09-26 19:04:25 +0200

Remove legacy multixact truncation support.
...

Backpatch: 9.5

and
commit 4f627f897367f15702d59973f75f6391d5d3e06f
Author: Andres Freund 
Date:   2015-09-26 19:04:25 +0200

Rework the way multixact truncations work.
...

Backpatch: 9.5




Re: [HACKERS] Getting sorted data from foreign server for merge join

2015-12-02 Thread Rushabh Lathia
Thanks Ashutosh.

I re-reviewed and re-verified the patch; pg_sort_all_pd_v5.patch
looks good to me.


On Fri, Nov 27, 2015 at 3:02 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> Thanks Rushabh for your review and comments.
>
> On Thu, Nov 26, 2015 at 5:39 PM, Rushabh Lathia 
> wrote:
>
>> Hi Ashutosh,
>>
>> I reviewed your latest version of patch and over all the implementation
>> and other details look good to me.
>>
>> Here are few cosmetic issues which I found:
>>
>> 1) Patch not getting applied cleanly - white space warning
>>
>>
> Done.
>
>
>> 2)
>>
>> -List   *usable_pathkeys = NIL;
>> +List*useful_pathkeys_list = NIL;/* List of all pathkeys
>> */
>>
>> Code alignment is not correct with other declared variables.
>>
>>
> Incorporated the change in the patch.
>
> 3)
>>
>> +{
>> +PathKey*pathkey;
>> +List*pathkeys;
>> +
>> +pathkey = make_canonical_pathkey(root, cur_ec,
>> +
>> linitial_oid(cur_ec->ec_opfamilies),
>> +BTLessStrategyNumber,
>> +false);
>> +pathkeys = list_make1(pathkey);
>> +useful_pathkeys_list = lappend(useful_pathkeys_list,
>> pathkeys);
>> +}
>>
>> Code alignment need to fix at make_canonical_pathkey().
>>
>
> Incorporated the change in the patch.
>
> I have also removed the TODO item in the prologue of this function, since
> none has objected to externalization of make_canonical_pathkeys till now
> and it's not expected to be part of the final commit.
>
>
>>
>> 4)
>>
>> I don't understand the meaning of the following test case added to
>> postgres_fdw.
>>
>> +++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
>> @@ -171,20 +171,35 @@ SELECT COUNT(*) FROM ft1 t1;
>>  -- join two tables
>>  SELECT t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3,
>> t1.c1 OFFSET 100 LIMIT 10;
>>  -- subquery
>>  SELECT * FROM ft1 t1 WHERE t1.c3 IN (SELECT c3 FROM ft2 t2 WHERE c1 <=
>> 10) ORDER BY c1;
>>  -- subquery+MAX
>>  SELECT * FROM ft1 t1 WHERE t1.c3 = (SELECT MAX(c3) FROM ft2 t2) ORDER BY
>> c1;
>>  -- used in CTE
>>  WITH t1 AS (SELECT * FROM ft1 WHERE c1 <= 10) SELECT t2.c1, t2.c2,
>> t2.c3, t2.c4 FROM t1, ft2 t2 WHERE t1.c1 = t2.c1 ORDER BY t1.c1;
>>  -- fixed values
>>  SELECT 'fixed', NULL FROM ft1 t1 WHERE c1 = 1;
>> +-- getting data sorted from the foreign table for merge join
>> +-- Since we are interested in merge join, disable other joins
>> +SET enable_hashjoin TO false;
>> +SET enable_nestloop TO false;
>> +-- inner join, expressions in the clauses appear in the equivalence
>> class list
>> +EXPLAIN (VERBOSE, COSTS false)
>> +SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 =
>> t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
>> +SELECT t1.c1, t2."C 1" FROM ft2 t1 JOIN "S 1"."T 1" t2 ON (t1.c1 = t2."C
>> 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
>> +-- outer join, expression in the clauses do not appear in equivalence
>> class list
>> +-- but no output change as compared to the previous query
>> +EXPLAIN (VERBOSE, COSTS false)
>> +SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON
>> (t1.c1 = t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
>> +SELECT t1.c1, t2."C 1" FROM ft2 t1 LEFT JOIN "S 1"."T 1" t2 ON (t1.c1 =
>> t2."C 1") ORDER BY t1.c1 OFFSET 100 LIMIT 10;
>> +SET enable_hashjoin TO true;
>> +SET enable_nestloop TO true;
>>
>> Because, I removed the code changes of the patch and then I run the test
>> seem like it has nothing to do with the code changes. Above set of test
>> giving
>> same result with/without patch.
>>
>> Am I missing something ?
>>
>
> Actually, the output of a merge join is always ordered by the pathkeys used
> for the merge join. That output, routed through the LIMIT node, remains
> ordered. So we actually do not need the ORDER BY t1.c1 clause in the above
> queries. Without that clause, the tests will show different output with and
> without the patch. I have changed the attached patch accordingly.
>
>
>>
>> Apart from this I debugged the patch for each scenario (query pathkeys and
>> pathkeys arising out of the equivalence classes) and so far patch looks
>> good
>> to me.
>>
>>
> Thanks.
>
>
>> Attaching update version of patch by fixing the cosmetic changes.
>>
>>
> Attached version of patch contains your changes.
> --
> Best Wishes,
> Ashutosh Bapat
> EnterpriseDB Corporation
> The Postgres Database Company
>
>
>
>


-- 
Rushabh Lathia


Re: [HACKERS] Remaining 9.5 open items

2015-12-02 Thread Joel Jacobson
On Mon, Nov 30, 2015 at 8:43 PM, Tom Lane  wrote:
> * Finish multixact truncation rework
>
> We're not seriously going to push something this large into 9.5 at this
> point, are we?

I don't know all the details here, so my apologies if any of this is
incorrect/stupid/misinformed.

Given all the quite serious data corruption issues related to
multixacts, wouldn't it be justified to delay releasing 9.5.0 until this
much-needed multixact truncation rework has been finished?

This is of course not an issue for users already on >=9.3, since the
problems started in 9.3 and those already on 9.3 and 9.4 already face
the risk.

But users considering an upgrade from previous versions will have to ask
themselves whether all the new features in 9.5 are worth the extra risk of
data corruption due to the multixact issues.

We at Trustly are still running 9.1 and have been waiting for the
multixact problems to be fixed, which is why we didn't upgrade to 9.4,
and now that I read this I feel really sad that we probably have to wait
for 9.6, unless we can accept some increased risk of data corruption.




Re: [HACKERS] Remaining 9.5 open items

2015-12-02 Thread Andres Freund
On 2015-12-02 12:14:42 +0100, Joel Jacobson wrote:
> On Mon, Nov 30, 2015 at 8:43 PM, Tom Lane  wrote:
> > * Finish multixact truncation rework
> >
> > We're not seriously going to push something this large into 9.5 at this
> > point, are we?
> 
> I don't know all the details here, so my apologies if any of this is
> incorrect/stupid/misinformed.
> 
> Given all the quite serious data corruption issues related to
> multixacts, wouldn't it be justified to delay releasing 9.5.0 until this
> much-needed multixact truncation rework has been finished?

The significant changes are in 9.5.




Re: [HACKERS] Remaining 9.5 open items

2015-12-02 Thread Joel Jacobson
On Wed, Dec 2, 2015 at 12:19 PM, Andres Freund  wrote:
> The significant changes are in 9.5.

Will multixact truncations be WAL logged in 9.5?




Re: [HACKERS] Remaining 9.5 open items

2015-12-02 Thread Joel Jacobson
On Wed, Dec 2, 2015 at 12:36 PM, Andres Freund  wrote:
>
> On 2015-12-02 12:25:37 +0100, Joel Jacobson wrote:
> > On Wed, Dec 2, 2015 at 12:19 PM, Andres Freund  wrote:
> > > The significant changes are in 9.5.
> >
> > Will multixact truncations be WAL logged in 9.5?
>
> Yes.
>
> C.f. the release notes

Excellent! :-) Many many thanks for your efforts! By far the most
important change in 9.5. So much looking forward to upgrade.



