Re: [HACKERS] Declarative partitioning - another take

2017-01-24 Thread Amit Langote
Hi Ashutosh,

On 2017/01/25 14:54, Ashutosh Bapat wrote:
> The documentation available at
> https://www.postgresql.org/docs/devel/static/sql-createtable.html,
> does not make it clear that the lower bound of a range partition is
> always inclusive and the higher one is exclusive. I think a note in
> section " PARTITION OF parent_table FOR VALUES partition_bound_spec"
> would be helpful.

Yes, I'm working on a documentation patch for that and a few other things
such as revising the Partitioning chapter.

Thanks,
Amit






Re: [HACKERS] macaddr 64 bit (EUI-64) datatype support

2017-01-24 Thread Vitaly Burovoy
On 1/23/17, Haribabu Kommi  wrote:
> The patch is split into two parts.
> 1. Macaddr8 datatype support
> 2. Contrib module support.

Hello,

I'm sorry for the delay.
The patch is almost done, but I have two requests since the last review.

1.
src/backend/utils/adt/mac8.c:
+   int a,
+   b,
+   c,
+   d = 0,
+   e = 0,
...

There is no reason to set them to 0. For EUI-48 they will be
reassigned in the "if (count != 8)" block; for EUI-64, in one of the
sscanf calls.
They could be set to "d = 0xFF, e = 0xFE," to avoid the "if" block
mentioned above, but that makes the code much less readable.

Oh, I see. In the current version they must be assigned, because for
EUI-48 the uninitialized variables could hold arbitrary values (<0 or >255).
It is better to avoid the initialization by merging two parts of the code:
+   if (count != 6)
+   {
+   /* May be a 8-byte MAC address */
...
+   if (count != 8)
+   {
+   d = 0xFF;
+   e = 0xFE;
+   }

to a single one:
+   if (count == 6)
+   {
+   d = 0xFF;
+   e = 0xFE;
+   }
+   else
+   {
+   /* May be a 8-byte MAC address */
...
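
Purely as an illustration, here is a self-contained sketch of the merged
control flow (the function name, the out[] parameter and the error handling
are invented for this example; the real patch operates on PostgreSQL's
internal types):

#include <stdio.h>

/*
 * Sketch only: parse "a:b:c:d:e:f[:g:h]" hex octets, expanding EUI-48 to
 * EUI-64 with the standard FF:FE stuffing.  The trailing %1s catches
 * leftover input, so an 8-group string cannot match the 6-group pattern.
 * Returns 0 on success, -1 on parse failure.
 */
static int
parse_macaddr8(const char *str, unsigned char out[8])
{
	int			a, b, c, d, e, f, g, h;
	int			count;
	char		junk[2];

	count = sscanf(str, "%x:%x:%x:%x:%x:%x%1s",
				   &a, &b, &c, &f, &g, &h, junk);
	if (count == 6)
	{
		/* EUI-48: insert FF:FE in the middle, no dummy initializers needed */
		d = 0xFF;
		e = 0xFE;
	}
	else
	{
		/* May be an 8-byte (EUI-64) MAC address */
		count = sscanf(str, "%x:%x:%x:%x:%x:%x:%x:%x%1s",
					   &a, &b, &c, &d, &e, &f, &g, &h, junk);
		if (count != 8)
			return -1;
	}

	out[0] = a; out[1] = b; out[2] = c; out[3] = d;
	out[4] = e; out[5] = f; out[6] = g; out[7] = h;
	return 0;
}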

2.
src/backend/utils/adt/network.c:
+   res = (mac->a << 24) | (mac->b << 16) | (mac->c << 8) | (mac->d);
+   res = (double)((uint64)res << 32);
+   res += (mac->e << 24) | (mac->f << 16) | (mac->g << 8) | (mac->h);

Khm... I trust that modern compilers can do a lot of optimization, but
to me it looks terrible because of the needless conversions.
The reason earlier versions had two "res *= 256 * 256" lines was
integer overflow with four multipliers, but that can be solved by
defining the first multiplier as a double:
+   res = (mac->a << 24) | (mac->b << 16) | (mac->c << 8) | (mac->d);
+   res *= (double)256 * 256 * 256 * 256;
+   res += (mac->e << 24) | (mac->f << 16) | (mac->g << 8) | (mac->h);

In this case the right-hand side argument of the "*=" operator is
computed at compile time as a single constant.
The second line could also be written as "res *= 256. * 256 * 256 * 256;"
(note the dot in the first multiplier), but that is not readable at all
(and produces the same code).
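
As a self-contained illustration of that point (field names follow the
quoted fragment; the uint32_t casts are my addition, to keep the shifts of
high-bit octets well defined — they are not in the quoted code):

#include <stdint.h>

typedef struct
{
	unsigned char a, b, c, d, e, f, g, h;
} macaddr8;

/*
 * Sketch: collapse an 8-octet MAC into a double, scaling the high half
 * by 2^32 with a multiplier the compiler folds into a single constant.
 */
static double
macaddr8_to_scalar(const macaddr8 *mac)
{
	double		res;

	res = (double) (((uint32_t) mac->a << 24) | ((uint32_t) mac->b << 16) |
					((uint32_t) mac->c << 8) | (uint32_t) mac->d);
	res *= (double) 256 * 256 * 256 * 256;	/* == 4294967296.0, computed at
											 * compile time */
	res += (double) (((uint32_t) mac->e << 24) | ((uint32_t) mac->f << 16) |
					 ((uint32_t) mac->g << 8) | (uint32_t) mac->h);
	return res;
}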

-- 
Best regards,
Vitaly Burovoy




Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread Fabien COELHO



I agree, and I think that's pretty much what we all said back in
October, and the patch hasn't been revised since then to match those
comments.


Hmmm. It really had been revised: although I forgot to remove the "if"
documentation, I had removed the "if" function.


--
Fabien.




Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-01-24 Thread Andrew Borodin
2017-01-24 22:29 GMT+05:00 Jeff Davis :
> On Tue, Jan 24, 2017 at 3:18 AM, Andrew Borodin  wrote:
>> Technically, the approach of locking a subtree is not novel. Lehman and
>> Yao focused on ensuring "that any process for manipulating the tree uses
>> only a small (constant) number of locks at any time." We are locking an
>> unknown and possibly large number of pages.
>
> By the way, can you show me where the Lehman and Yao paper addresses
> page recycling?
>
> It says that one approach is to allow fewer than K entries on a leaf
> node; presumably as few as zero. But it doesn't seem to show how to
> remove all references to the page and recycle it in a new place in the
> tree.
>
> Regards,
>  Jeff Davis

Here J. Hellerstein comments on the L&Y paper [1], saying that effectively
no deletion algorithm is provided.
Here [2] is the paper referenced from the nbtree docs. I'll check whether
it is applicable to the GIN B-tree.

[1] http://db.cs.berkeley.edu/jmh/cs262b/treeCCR.html
[2] http://dl.acm.org/citation.cfm?id=324589




Re: [HACKERS] I could not see any row in audit table

2017-01-24 Thread Craig Ringer
On 25 January 2017 at 13:28, Shailesh Singh
 wrote:
> Dear Group Member ,
>
>
> I had configured the audit trigger for my database following the
> document URL below:
>
> https://wiki.postgresql.org/wiki/Audit_trigger_91plus

That is not part of PostgreSQL proper.

Please use the pgsql-general mailing list or Stack Overflow to discuss
issues not related to the core code of the PostgreSQL project.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-01-24 Thread Craig Ringer
On 25 January 2017 at 13:44, Craig Ringer  wrote:
> On 24 January 2017 at 23:49, Robert Haas  wrote:
>
>> I think it's fairly surprising that TruncateCLOG() has responsibility
>> for writing the xlog record that protects advancing
>> ShmemVariableCache->oldestXid, but not the responsibility for actually
>> advancing that value.  In other words, I think the AdvanceOldestXid()
>> call in vac_truncate_clog() should be moved into TruncateClog().
>> (Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the
>> responsibility of TruncateCommitTs().)
>
> Agreed, and done in attached.
>
> I haven't done the same for commit_ts but will do so on the thread for
> the commit_ts race fix if we go ahead with this change here.
>
>> I think it is not correct to advance oldestXid but not oldestXidDB.
>> Otherwise, GetNewTransactionId() might complain about the wrong
>> database.
>
> That's a clear oversight. Fixed in attached.

The new attached version also records it in xlog and applies it on the standby.

If we want to save the 4 bytes per xmin advance (probably not worth
caring about) we could instead skip setting it on the standby, in which
case it'd be potentially wrong until the next checkpoint. I'd rather make
sure it stays correct.


-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From ce69d2824f89f675a25e098b3980a304baca7ade Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 23 Jan 2017 13:25:30 +0800
Subject: [PATCH 1/2] Fix race between clog truncation and lookup

There was previously no way to look up an arbitrary xid without
running the risk of having clog truncated out from under you.

This hasn't been a problem because anything looking up xids in clog knows
they're protected by datminxid, but that's not the case for arbitrary
user-supplied XIDs. clog was truncated before we advanced oldestXid under
XidGenLock, so holding XidGenLock during a clog lookup was insufficient to
prevent the race. There's no way to look up a SLRU with soft-failure;
attempting a lookup produces an I/O error. There's also no safe way to trap and
swallow the SLRU lookup error due mainly to locking issues.

To address this, advance oldestXid under XidGenLock before we truncate clog
rather than after, so concurrent lookups of arbitrary XIDs are safe if they are
done under XidGenLock. The rest of the xid limits are still advanced after clog
truncation to ensure there's no chance of a new xid trying to access an
about-to-be-truncated clog page. In practice this can't happen anyway since we
use only half the xid space at any time, but additional guards against future
change are warranted with something this crucial.

This race also exists in a worse form on standby servers. On a standby we only
advance oldestXid when we replay the next checkpoint, so there's a much larger
window between clog truncation and subsequent updating of the limit. Fix this
by recording the oldest xid in clog truncation records and applying the
oldestXid under XidGenLock before replaying the clog truncation.

Note that there's no need to take XidGenLock for normal clog lookups protected
by datfrozenxid, only if accepting arbitrary XIDs that might not be protected by
vacuum thresholds.
---
 src/backend/access/rmgrdesc/clogdesc.c | 12 ++--
 src/backend/access/transam/clog.c  | 50 +++---
 src/backend/access/transam/varsup.c| 49 ++---
 src/backend/access/transam/xlog.c  | 22 ++-
 src/backend/commands/vacuum.c  |  4 +--
 src/include/access/clog.h  |  8 +-
 src/include/access/transam.h   |  4 +--
 7 files changed, 115 insertions(+), 34 deletions(-)

diff --git a/src/backend/access/rmgrdesc/clogdesc.c b/src/backend/access/rmgrdesc/clogdesc.c
index 352de48..ef268c5 100644
--- a/src/backend/access/rmgrdesc/clogdesc.c
+++ b/src/backend/access/rmgrdesc/clogdesc.c
@@ -23,12 +23,20 @@ clog_desc(StringInfo buf, XLogReaderState *record)
 	char	   *rec = XLogRecGetData(record);
 	uint8		info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;
 
-	if (info == CLOG_ZEROPAGE || info == CLOG_TRUNCATE)
+	if (info == CLOG_ZEROPAGE)
 	{
 		int			pageno;
 
		memcpy(&pageno, rec, sizeof(int));
-		appendStringInfo(buf, "%d", pageno);
+		appendStringInfo(buf, "page %d", pageno);
+	}
+	else if (info == CLOG_TRUNCATE)
+	{
+		xl_clog_truncate xlrec;
+
		memcpy(&xlrec, rec, sizeof(xl_clog_truncate));
+		appendStringInfo(buf, "page %d; oldestXact %u",
+			xlrec.pageno, xlrec.oldestXact);
 	}
 }
 
diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c
index 1a43819..cfd1c91 100644
--- a/src/backend/access/transam/clog.c
+++ b/src/backend/access/transam/clog.c
@@ -83,7 +83,8 @@ static SlruCtlData ClogCtlData;
 static int	ZeroCLOGPage(int pageno, bool writeXlog);
 static bool CLOGPagePrecedes(int page1, int page2);
 static void 

Re: [HACKERS] \h tab-completion

2017-01-24 Thread Beena Emerson
On Wed, Jan 25, 2017 at 11:43 AM, Michael Paquier  wrote:

> On Wed, Jan 25, 2017 at 3:03 PM, Beena Emerson 
> wrote:
> > I think the following change in tab-complete.c would do the trick.
> >
> > -   else if (Matches1("ALTER"))
> > +   else if (TailMatches1("ALTER"))
>
> Nope. This change breaks a bunch of subcommands. Take for example
> ALTER TABLE foo ALTER, which with your patch would be completed with all
> the potential objects of ALTER commands, but in this case we just need
> to look at the column names, CONSTRAINT and COLUMN. CREATE is not part
> of any subcommand, so it's easy to make it work with \h. What I think
> you should do is make the code path of \h smarter, with some exceptions,
> by using TailMatchesCS2() for ALTER. The DROP commands should be treated
> the same way as well.
>
>
I feared the same.
I will check this.

-- 
Thank you,

Beena Emerson

Have a Great Day!


Re: [HACKERS] \h tab-completion

2017-01-24 Thread Michael Paquier
On Wed, Jan 25, 2017 at 3:03 PM, Beena Emerson  wrote:
> I think the following change in tab-complete.c would do the trick.
>
> -   else if (Matches1("ALTER"))
> +   else if (TailMatches1("ALTER"))

Nope. This change breaks a bunch of subcommands. Take for example
ALTER TABLE foo ALTER, which with your patch would be completed with all
the potential objects of ALTER commands, but in this case we just need
to look at the column names, CONSTRAINT and COLUMN. CREATE is not part
of any subcommand, so it's easy to make it work with \h. What I think
you should do is make the code path of \h smarter, with some exceptions,
by using TailMatchesCS2() for ALTER. The DROP commands should be treated
the same way as well.
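
For what it's worth, a rough sketch of what such an exception could look
like in tab-complete.c's else-if chain (hypothetical — the list names are
invented here; only the TailMatchesCS2() macro is from the suggestion
above):

	/* Complete "\h ALTER" / "\h DROP" with the targets of those commands,
	 * without touching normal ALTER/DROP subcommand completion. */
	else if (TailMatchesCS2("\\h|\\help", "ALTER"))
		COMPLETE_WITH_LIST(alter_command_targets);	/* hypothetical list */
	else if (TailMatchesCS2("\\h|\\help", "DROP"))
		COMPLETE_WITH_LIST(drop_command_targets);	/* hypothetical list */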
-- 
Michael




Re: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2017-01-24 Thread Michael Paquier
On Wed, Jan 25, 2017 at 2:58 PM, Ideriha, Takeshi
 wrote:
> Hi,
>
> I'm sorry, but I don't think I have time to work on this patch in this CF.
> I've gotten busy with other work.
>
> So I moved this patch to next CF with "waiting on author" status.

Thanks for doing so, that's a time-saver for me!
-- 
Michael




Re: [HACKERS] \h tab-completion

2017-01-24 Thread Beena Emerson
On Wed, Jan 25, 2017 at 9:03 AM, Stephen Frost  wrote:

> All,
>
> I'm not really inclined to do it myself right now, but it'd be awful
> nice if we had better table completion for \h.
>
> Right now, '\h alter<tab>' returns nothing, and '\h alter <tab>' returns
> a *bunch* of stuff.
>
> Yet, we happily support '\h alter view' and friends, returning just the
> info relevant for that particular command.
>
> This would be a good starter project for someone new to work on, imv,
> tho it could also go on the to-do list.
>
> Thanks!
>
>

I think the following change in tab-complete.c would do the trick.

-   else if (Matches1("ALTER"))
+   else if (TailMatches1("ALTER"))


postgres=# \h ALTER
AGGREGATE           DOMAIN                FUNCTION      MATERIALIZED VIEW  RULE          SYSTEM       TYPE
COLLATION           EVENT TRIGGER         GROUP         OPERATOR           SCHEMA        TABLE        USER
CONVERSION          EXTENSION             INDEX         POLICY             SEQUENCE      TABLESPACE   USER MAPPING FOR
DATABASE            FOREIGN DATA WRAPPER  LANGUAGE      PUBLICATION        SERVER        TEXT SEARCH  VIEW
DEFAULT PRIVILEGES  FOREIGN TABLE         LARGE OBJECT  ROLE               SUBSCRIPTION  TRIGGER


-- 
Thank you,

Beena Emerson

Have a Great Day!


tab-complete-hALTER.patch
Description: Binary data



Re: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in ECPG

2017-01-24 Thread Ideriha, Takeshi
Hi, 

I'm sorry, but I don't think I have time to work on this patch in this CF.
I've gotten busy with other work.

So I moved this patch to next CF with "waiting on author" status.

Regards, 
Ideriha Takeshi

> -Original Message-
> From: Ideriha, Takeshi 
> Sent: Friday, January 20, 2017 6:12 PM
> To: 'Michael Paquier' 
> Cc: Michael Meskes ; pgsql-hackers@postgresql.org
> Subject: RE: [WIP] RE: [HACKERS] DECLARE STATEMENT setting up a connection in
> ECPG
> 
> > Ideriha-san, this is a very intrusive patch... It really needs a
> > specific reviewer to begin with, but really it would be nice if you
> > could review someone else's patch, with a difficulty rather equivalent to
> what we have here.
> 
> Michael, thank you for taking a look at my patch and giving me an advice.
> And sorry that I didn't follow the PostgreSQL development process.
> I think I should have sent a design idea or a small patch first and grown
> it bigger step by step instead of dumping a huge patch suddenly.
> 
> Yeah, I'm going to review hackers' patches.
> 
> > By the way, I have been able to crash your patch when running the
> > regression
> > tests:
> 
> Thank you for checking and I'm going to handle this.
> 
> Regards,
> Ideriha Takeshi



Re: [HACKERS] multivariate statistics (v19)

2017-01-24 Thread Michael Paquier
On Wed, Jan 4, 2017 at 11:35 AM, Tomas Vondra
 wrote:
> On 01/03/2017 05:22 PM, Tomas Vondra wrote:
>>
>> On 01/03/2017 02:42 PM, Dilip Kumar wrote:
>
> ...
>>>
>>> I think it should be easily reproducible, in case it's not I can send
>>> call stack or core dump.
>>>
>>
>> Thanks for the report. It was trivial to reproduce and it turned out to
>> be a fairly simple bug. Will send a new version of the patch soon.
>>
>
> Attached is v22 of the patch series, rebased to current master and fixing
> the reported bug. I haven't made any other changes - the issues reported by
> Petr are mostly minor, so I've decided to wait a bit more for (hopefully)
> other reviews.

And nothing has happened since. Are there people willing to review
this patch and help it proceed? As this patch is quite large, I am not
sure if it is fit to join the last CF. Thoughts?
-- 
Michael




Re: [HACKERS] Declarative partitioning - another take

2017-01-24 Thread Ashutosh Bapat
The documentation available at
https://www.postgresql.org/docs/devel/static/sql-createtable.html,
does not make it clear that the lower bound of a range partition is
always inclusive and the higher one is exclusive. I think a note in
section " PARTITION OF parent_table FOR VALUES partition_bound_spec"
would be helpful.

On Wed, Jan 25, 2017 at 7:11 AM, Amit Langote
 wrote:
> On 2017/01/25 5:55, Robert Haas wrote:
>> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote
>>  wrote:
>>> [ new patches ]
>>
>> Committed 0001 and 0002.  See my earlier email for comments on 0003.
>
> It seems patches for all the issues mentioned in this thread so far,
> except 0003 (I just sent an updated version in another email), have been
> committed.  Thanks a lot for your time.  I will create new threads for any
> more patches from here on.
>
> Regards,
> Amit
>
>



-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] Pinning a buffer in TupleTableSlot is unnecessary

2017-01-24 Thread Michael Paquier
On Mon, Dec 5, 2016 at 11:32 AM, Haribabu Kommi
 wrote:
>
>
> On Tue, Nov 15, 2016 at 1:17 PM, Noah Misch  wrote:
>>
>> On Mon, Nov 14, 2016 at 10:21:53AM -0800, Peter Geoghegan wrote:
>> > BTW, I recently noticed that the latest version of Valgrind, 3.12,
>> > added this new feature:
>> >
>> > * Memcheck:
>> >
>> >   - Added meta mempool support for describing a custom allocator which:
>> >      - Auto-frees all chunks, assuming that destroying a pool destroys
>> >        all objects in the pool
>> >      - Uses itself to allocate other memory blocks
>> >
>> > It occurred to me that we might be able to make good use of this.
>>
>> PostgreSQL memory contexts don't acquire blocks from other contexts;
>> they all draw from malloc() directly.  Therefore, I don't see this
>> feature giving us something that the older Memcheck MEMPOOL support
>> does not give us.
>
> Moved to next CF with "needs review" status.
> Please feel free to update the status if the current status doesn't reflect
> the status of the patch.

The latest patch available still applies; one person (John G) added his
name in October, though there have been no reviews. There have been a
couple of arguments against this patch, and the thread has had no activity
for the last month, so I would be inclined to mark it as returned with
feedback. Thoughts?
-- 
Michael




Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread Fabien COELHO


I'm spending time trying to make something useful of pgbench, which 
requires a bunch of patches that work together to improve it for new 
use cases, including not being limited to the current set of operators.


This decision is both illogical and arbitrary.


I disagree.  I think his decision was probably based on this email from me:


https://www.postgresql.org/message-id/ca+tgmoa0zp4a+s+kosav4qfdz-wa56vlph8me86rmpsnkvw...@mail.gmail.com



Nobody responded to that,


The answer was a direct reply sent the same day, which you can check here:

https://www.postgresql.org/message-id/alpine.DEB.2.20.1610041941150.24533%40lancre

The short version is: I have removed XOR and replaced "if" with the SQL 
CASE syntax, and provided justification for the added operators in a 
benchmarking context, i.e. some kind of test is necessary for TPC-B 2.0.0. 
For conditions, logical expressions are needed. Bitwise operators are used 
to skew the distribution in some benchmarks (TPC-C as I recall). Functions 
ln/exp could be used for the same purpose, but I can remove those two if 
this is a blocker.


--
Fabien.




Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-01-24 Thread Craig Ringer
On 24 January 2017 at 23:49, Robert Haas  wrote:

> I think it's fairly surprising that TruncateCLOG() has responsibility
> for writing the xlog record that protects advancing
> ShmemVariableCache->oldestXid, but not the responsibility for actually
> advancing that value.  In other words, I think the AdvanceOldestXid()
> call in vac_truncate_clog() should be moved into TruncateClog().
> (Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the
> responsibility of TruncateCommitTs().)

Agreed, and done in attached.

I haven't done the same for commit_ts but will do so on the thread for
the commit_ts race fix if we go ahead with this change here.

> I think it is not correct to advance oldestXid but not oldestXidDB.
> Otherwise, GetNewTransactionId() might complain about the wrong
> database.

That's a clear oversight. Fixed in attached.

> The way that SetTransactionIdLimit() now works looks a bit dangerous.
> xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the
> passed-in oldestXid value and written straight into shared memory.
> But the shared memory copy of oldestXid could have a different value.
> I'm not sure if that breaks anything, but it certainly weakens any
> confidence callers might have had that all those values are consistent
> with each other.

This was my main hesitation with the whole thing too.

It's necessary to advance oldestXmin before we xlog the advance and
truncate clog, and necessary to advance the vacuum limits only
afterwards.

I thought it sensible to be conservative about the xid limit to
minimise the change from current behaviour, i.e. advance only up to
xid limits calculated from the oldestXid determined by the current
vac_truncate_clog. But the current one is actually wrong; in the
(unlikely, if it's possible at all) case where two vac_truncate_clog()s
A and B run with ordering A(advanceOldestXmin) B(advanceOldestXmin)
A(truncate) B(truncate) B(advanceLimits) A(advanceLimits), A's advance
of the limits would clobber B's. Not too bad, but it'd delay
advancing the limits until the next vacuum, and some xid could get
allocated before the limits go backwards.

It's safer to take XidGenLock for slightly longer in order to capture
the oldestXid from shmem and calculate using it. That ensures we never
have any risk of going backwards. The attached updated patch does so.

BTW, I find it quite amusing that this was intended to be a small,
unintrusive patch, and now it's messing with xid limits and clog
truncation. I'm almost tempted to say we should commit it with the
(tiny) race with clog truncation in place. We certainly haven't had
any complaints about the same race from people using commit_ts. But
the race window on standby is quite large and I'd rather not knowingly
introduce new bugs.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From ca892924e859cdac455fd175e1361c5e08ebee47 Mon Sep 17 00:00:00 2001
From: Craig Ringer 
Date: Mon, 23 Jan 2017 13:25:30 +0800
Subject: [PATCH 1/2] Fix race between clog truncation and lookup

There was previously no way to look up an arbitrary xid without
running the risk of having clog truncated out from under you.

This hasn't been a problem because anything looking up xids in clog knows
they're protected by datminxid, but that's not the case for arbitrary
user-supplied XIDs. clog was truncated before we advanced oldestXid under
XidGenLock, so holding XidGenLock during a clog lookup was insufficient to
prevent the race. There's no way to look up a SLRU with soft-failure;
attempting a lookup produces an I/O error. There's also no safe way to trap and
swallow the SLRU lookup error due mainly to locking issues.

To address this, advance oldestXid under XidGenLock before we truncate clog
rather than after, so concurrent lookups of arbitrary XIDs are safe if they are
done under XidGenLock. The rest of the xid limits are still advanced after clog
truncation to ensure there's no chance of a new xid trying to access an
about-to-be-truncated clog page. In practice this can't happen anyway since we
use only half the xid space at any time, but additional guards against future
change are warranted with something this crucial.

This race also exists in a worse form on standby servers. On a standby we only
advance oldestXid when we replay the next checkpoint, so there's a much larger
window between clog truncation and subsequent updating of the limit. Fix this
by recording the oldest xid in clog truncation records and applying the
oldestXid under XidGenLock before replaying the clog truncation.

Note that there's no need to take XidGenLock for normal clog lookups protected
by datfrozenxid, only if accepting arbitrary XIDs that might not be protected by
vacuum thresholds.
---
 src/backend/access/rmgrdesc/clogdesc.c | 12 ++--
 src/backend/access/transam/clog.c  | 54 +++---
 

Re: [HACKERS] COPY as a set returning function

2017-01-24 Thread Michael Paquier
On Mon, Dec 5, 2016 at 2:10 PM, Haribabu Kommi  wrote:
> On Tue, Nov 1, 2016 at 7:45 AM, Corey Huinker 
> wrote:
>>
>> Attached is a patch that implements copy_srf().
>
> Moved to next CF with "needs review" status.

This patch is still waiting for review. David, are you planning to
look at it by the end of the CF?
-- 
Michael




Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-24 Thread Michael Paquier
On Wed, Jan 25, 2017 at 2:28 PM, Ashutosh Bapat
 wrote:
> On Wed, Jan 25, 2017 at 10:54 AM, Michael Paquier
>  wrote:
>> On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane  wrote:
>>> I wrote:
 Michael Paquier  writes:
> The table of Pseudo-Types needs to be updated as well with unknown in
> datatype.sgml.
>>>
 Check.
>>>
>>> Here's an updated patch with doc changes.  Aside from that one,
>>> I tried to spell "pseudo-type" consistently, and I changed one
>>> place where we were calling something a pseudo-type that isn't.
>>
>> Thanks for the updated version; all the comments have been addressed.
>> You have left a lot of code comments using "pseudotype" instead of
>> "pseudo-type" in the code. I am guessing that this is on purpose,
>> which is fine for me. There is no point in making back-patching more
>> complicated just for that.
>>
>> The CF app has been updated to ready for committer for the record.
>
> I have listed myself as reviewer for this commitfest entry and I am
> yet to review the patch. Can you please wait for the listed reviewers,
> esp. when those reviewers are active, for changing the commitfest
> entry status?

If you want to have an extra look, please be my guest. To be
consistent, I have added my name as well in the list of reviewers.
-- 
Michael




[HACKERS] I could not see any row in audit table

2017-01-24 Thread Shailesh Singh
Dear Group Member ,


I had configured the audit trigger for my database following the
document URL below:




https://wiki.postgresql.org/wiki/Audit_trigger_91plus

Now my audit table:

CREATE TABLE audit.logged_actions (
event_id bigserial PRIMARY KEY,
schema_name text NOT NULL,
TABLE_NAME text NOT NULL,
relid oid NOT NULL,
session_user_name text,
action_tstamp_tx TIMESTAMP WITH TIME ZONE NOT NULL,
action_tstamp_stm TIMESTAMP WITH TIME ZONE NOT NULL,
action_tstamp_clk TIMESTAMP WITH TIME ZONE NOT NULL,
transaction_id BIGINT,
application_name text,
client_addr inet,
client_port INTEGER,
client_query text NOT NULL,
action CHAR(1) NOT NULL CHECK (action IN ('I','D','U', 'T')),
row_data hstore,
changed_fields hstore,
statement_only BOOLEAN NOT NULL);


Now this table contains 50 GB of data. But after taking a backup using
pg_dump and restoring it, the table shows zero rows.


How can I see the restored data?

-- 
With Regards!
Shailesh Singh


Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-24 Thread Ashutosh Bapat
On Wed, Jan 25, 2017 at 10:54 AM, Michael Paquier
 wrote:
> On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane  wrote:
>> I wrote:
>>> Michael Paquier  writes:
 The table of Pseudo-Types needs to be updated as well with unknown in
 datatype.sgml.
>>
>>> Check.
>>
>> Here's an updated patch with doc changes.  Aside from that one,
>> I tried to spell "pseudo-type" consistently, and I changed one
>> place where we were calling something a pseudo-type that isn't.
>
> Thanks for the updated version; all the comments have been addressed.
> You have left a lot of code comments using "pseudotype" instead of
> "pseudo-type" in the code. I am guessing that this is on purpose,
> which is fine for me. There is no point in making back-patching more
> complicated just for that.
>
> The CF app has been updated to ready for committer for the record.

I have listed myself as reviewer for this commitfest entry and I am
yet to review the patch. Can you please wait for the listed reviewers,
esp. when those reviewers are active, for changing the commitfest
entry status?

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread Michael Paquier
On Wed, Jan 25, 2017 at 6:38 AM, Robert Haas  wrote:
> Let's mark this Returned with Feedback and move on.  We've only got a
> week left in the CommitFest anyhow and there are lots of other things
> that still need work (and which actually have been revised to match
> previous feedback).

Done as such, let's move on.
-- 
Michael




Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-24 Thread Michael Paquier
On Wed, Jan 25, 2017 at 12:46 AM, Tom Lane  wrote:
> I wrote:
>> Michael Paquier  writes:
>>> The table of Pseudo-Types needs to be updated as well with unknown in
>>> datatype.sgml.
>
>> Check.
>
> Here's an updated patch with doc changes.  Aside from that one,
> I tried to spell "pseudo-type" consistently, and I changed one
> place where we were calling something a pseudo-type that isn't.

Thanks for the updated version; all the comments have been addressed.
You have left a lot of code comments using "pseudotype" instead of
"pseudo-type" in the code. I am guessing that this is on purpose,
which is fine for me. There is no point in making back-patching more
complicated just for that.

The CF app has been updated to ready for committer for the record.
-- 
Michael




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-24 Thread Beena Emerson
On Wed, Jan 25, 2017 at 10:36 AM, Jim Nasby 
wrote:

> On 1/24/17 2:26 AM, Mithun Cy wrote:
>
>> Thanks for looking into this patch. I just downloaded the patch and
>> applied it to the latest code. I can see the file "autoprewarm.save" in
>> $PGDATA, which is created and dumped at shutdown time, and the activity
>> is logged as below:
>> 2017-01-24 13:22:25.012 IST [91755] LOG:  Buffer Dump: saved metadata
>> of 59 blocks.
>>
>
> Yeah, I wasn't getting that at all, though I did see the shared library
> being loaded. If I get a chance I'll try it again.



I hope you added the following to the conf file:

shared_preload_libraries = 'pg_autoprewarm' # (change requires restart)
pg_autoprewarm.buff_dump_interval=20

If you still cannot see the message after this, that's strange.

-- 
Thank you,

Beena Emerson

Have a Great Day!


Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-24 Thread Jim Nasby

On 1/24/17 2:26 AM, Mithun Cy wrote:

Thanks for looking into this patch. I just downloaded the patch and
applied it to the latest code. I can see the file "autoprewarm.save" in
$PGDATA, which is created and dumped at shutdown time, and the activity
is logged as below:
2017-01-24 13:22:25.012 IST [91755] LOG:  Buffer Dump: saved metadata
of 59 blocks.


Yeah, I wasn't getting that at all, though I did see the shared library 
being loaded. If I get a chance I'll try it again.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-24 Thread Jim Nasby

On 1/24/17 4:56 AM, Beena Emerson wrote:

2. buff_dump_interval could be renamed to just dump_interval or
buffer_dump_interval. Also, since it is now part of pg_prewarm. I think
it makes sense to have the conf parameter be: pg_prewarm.xxx instead of
pg_autoprewarm.xxx


I'd really like to find terminology other than "buffer dump", because 
that makes it sound like we're dumping the contents of the buffers 
themselves.


Maybe block_map? Buffer_map?
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] Checksums by default?

2017-01-24 Thread Jim Nasby

On 1/24/17 10:30 AM, Joshua D. Drake wrote:


Tom is correct here. They are not a net win for the average user. We
tend to forget that although we collectively have a lot of enterprise
installs where this does matter, we collectively do not come near the
number of average-user installs.

From an advocacy perspective, the average-user install is the one that
we tend most, because that tending (in theory) will grow into something
more fruitful, e.g. the enterprise install, over time, because we
constantly and consistently provided a reasonable and expected
experience to the average user.


I'm not completely grokking your second paragraph, but I would think 
that an average user would love to get a heads-up that their hardware 
is failing.

--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] patch: function xmltable

2017-01-24 Thread Pavel Stehule
2017-01-25 5:45 GMT+01:00 Tom Lane :

> Andres Freund  writes:
> > On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> >> XMLTABLE is specified by the standard to return multiple rows ... but
> >> then as far as my reading goes, it is only supposed to be supported in
> >> the range table (FROM clause) not in the target list.  I wonder if
> >> this would end up better if we only tried to support it in RT.  I asked
> >> Pavel to implement it like that a few weeks ago, but ...
>
> > Right - it makes sense in the FROM list - but then it should be an
> > executor node, instead of some expression thingy.
>
> +1 --- we're out of the business of having simple expressions that
> return rowsets.
>

If the decision is that this kind of function should behave differently
from other SRF functions, then I will remove support for this.

There is no technical reason for it (or maybe I don't see one) - Andres's
latest changes support my code well.

Regards

Pavel


>
> regards, tom lane
>


Re: [HACKERS] pg_hba_file_settings view patch

2017-01-24 Thread Haribabu Kommi
On Tue, Jan 24, 2017 at 6:17 PM, Michael Paquier 
wrote:

> On Mon, Jan 23, 2017 at 5:13 PM, Haribabu Kommi
>  wrote:
> > On Sat, Jan 21, 2017 at 8:01 AM, Tom Lane  wrote:
> >> * I'm not exactly convinced that the way you approached the error
> message
> >> reporting, ie duplicating the logged message, is good.  In particular
> >> this results in localizing the strings reported in pg_hba_rules.error,
> >> which is exactly opposite to the decision we reached for the
> >> pg_file_settings view.  What's the reasoning for deciding that this
> >> view should contain localized strings?  (More generally, we found in
> >> the pg_file_settings view that we didn't always want to use exactly
> >> the same string that was logged, anyway.)
> >
> > Actually there is no particular reason to display the localized strings,
> > Just thought that it may be useful to the user if it get displayed in
> their
> > own language. And also doing this way will reduce the error message
> > duplicate in the code that is used for display in the view and writing it
> > into the log file.
>
> Perhaps consistency would not hurt and something like
> record_config_file_error() could be done to save the parsing error.
> What's actually the problem with localized strings exposed in a
> system view? Encoding conflicts?
>
> >> * Also, there seems to be a lot of ereports remaining unconverted,
> >> eg the "authentication file token too long" error.  One of the things
> >> we wanted pg_file_settings to be able to do was finger pretty much any
> >> mistake in the config file, including syntax errors.  It seems like
> >> it'd be a shame if pg_hba_rules is unable to help with that.  You
> >> should be able to fill in line number and error even if the line is
> >> too mangled to be able to populate the other fields sanely.
> >
> > The two errors that are missed are, "could not open secondary
> authentication
> > file"
> > and "authentication file token too long" errors. For these two cases, the
> > server
> > is not throwing any error, it just logs the message and continues. Is it
> > fine to add
> > these these two cases as errors in the view?
>
> Missed those ones during the initial review... It would be a good idea
> to include them to track problems.
>

The two error logs mentioned above occur in the tokenize_file function.
Currently, while loading the pg_hba.conf file, the server just logs these
two problems and continues to load the file.

Currently, I added errors for the cases where the server stops proceeding
because of them; those are mostly in the parse_hba_line function.

To also enhance error reporting of failures in tokenize_file, it should
return errors along with line_nums, and those lines should be skipped when
processing parse_hba_line. To do that, tokenize_file should return as soon
as it encounters either of those two errors, but only in the pg_hba_rules
case, not in the normal scenario.

Is it fine to proceed with these changes?

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_hba_file_settings view patch

2017-01-24 Thread Ashutosh Bapat
On Wed, Jan 25, 2017 at 9:58 AM, Haribabu Kommi
 wrote:
>
>
> On Wed, Jan 25, 2017 at 2:50 PM, Ashutosh Bapat
>  wrote:
>>
>> On Wed, Jan 25, 2017 at 6:34 AM, Michael Paquier
>>  wrote:
>> > On Tue, Jan 24, 2017 at 11:19 PM, Ashutosh Bapat
>> >  wrote:
>> >> On Mon, Jan 23, 2017 at 1:43 PM, Haribabu Kommi
>> >>  wrote:
>> >>>
>> >>>
>> >>> On Sat, Jan 21, 2017 at 8:01 AM, Tom Lane  wrote:
>> 
>>  Haribabu Kommi  writes:
>>  > [ pg_hba_rules_10.patch ]
>> 
>>  I took a quick look over this.
>> >>>
>> >>>
>> >>> Thanks for the review.
>> >>>
>> 
>>  * I'm not exactly convinced that the way you approached the error
>>  message
>>  reporting, ie duplicating the logged message, is good.  In particular
>>  this results in localizing the strings reported in
>>  pg_hba_rules.error,
>>  which is exactly opposite to the decision we reached for the
>>  pg_file_settings view.  What's the reasoning for deciding that this
>>  view should contain localized strings?  (More generally, we found in
>>  the pg_file_settings view that we didn't always want to use exactly
>>  the same string that was logged, anyway.)
>> >>>
>> >>>
>> >>> Actually there is no particular reason to display the localized
>> >>> strings,
>> >>> Just thought that it may be useful to the user if it get displayed in
>> >>> their
>> >>> own language. And also doing this way will reduce the error message
>> >>> duplicate in the code that is used for display in the view and writing
>> >>> it
>> >>> into the log file.
>> >>>
>> >>
>> >> Would it be better, if we could parse each HBA line within
>> >> PG_TRY()/PG_CATCH() and read errmsg from errordata stack in
>> >> PG_CATCH()? We do that only when errcode is ERRCODE_CONFIG_FILE_ERROR,
>> >> PG_THROWing otherwise. That's probably a bad idea but wanted to put it
>> >> out as it came to me. It would eliminate a lot of changes in this
>> >> patch.
>> >
>> > It still needs to save the error message string somewhere. So I am not
>> > sure that it would save much patch size.
>>
>> My understanding is that an ereport call (and some other calls included
>> in that statement) saves it on the errordata stack before jumping to the
>> handler.
>
>
> All the ereport messages are of level LOG; because of this, even if we
> use TRY/CATCH it doesn't work, as the messages just get printed to the
> logfile and processing continues with the next statement.

Right. Sorry for failing to mention this change in the patch.
Originally the messages were at level ERROR, so TRY/CATCH would be able
to catch them. We would then need to somehow turn the ERROR into a LOG
and re-throw it. I haven't tried it myself though.
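
For the archives, a rough sketch of how that capture could look
(hypothetical: it assumes parse_hba_line() reports at ERROR and that
oldcontext and err_msg exist in the caller):

	PG_TRY();
	{
		parsed_line = parse_hba_line(tok_line, ERROR);
	}
	PG_CATCH();
	{
		ErrorData  *edata;

		/* CopyErrorData() must not run in ErrorContext */
		MemoryContextSwitchTo(oldcontext);
		edata = CopyErrorData();
		FlushErrorState();

		if (edata->sqlerrcode != ERRCODE_CONFIG_FILE_ERROR)
			ReThrowError(edata);	/* not ours, propagate */

		err_msg = pstrdup(edata->message);	/* expose in pg_hba_rules.error */
		FreeErrorData(edata);
	}
	PG_END_TRY();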

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




Re: [HACKERS] patch: function xmltable

2017-01-24 Thread Pavel Stehule
Hi

2017-01-25 1:35 GMT+01:00 Andres Freund :

> On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> > Andres Freund wrote:
> > > Hi,
> > >
> > > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext
> *econtext,
> > > > +   bool *isnull);
> > > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate,
> ExprContext *econtext,
> > > > +   bool *isNull);
> > > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext
> *econtext,
> > > > + bool *isNull);
> > > > +static void tabexprInitialize(TableExprState *tstate, ExprContext
> *econtext,
> > > > +   Datum doc);
> > > > +static void ShutdownTableExpr(Datum arg);
> > >
> > > To me this (and a lot of the other code) hints quite strongly that
> > > expression evalution is the wrong approach to implementing this.  What
> > > you're essentially doing is building a vulcano style scan node.  Even
> if
> > > we can this, we shouldn't double down on the bad decision to have these
> > > magic expressions that return multiple rows.  There's historical reason
> > > for tSRFs, but we shouldn't add more weirdness like this.
> >
> > Thanks for giving it a look.  I have long thought that this patch would
> > be at odds with your overall executor work.
>
> Not fundamentally, but it makes it harder.
>

If you plan to keep supporting SRFs in the target list, then nothing is
different. In the last patch it is executed under nodeProjectSet.


>
>
> > XMLTABLE is specified by the standard to return multiple rows ... but
> > then as far as my reading goes, it is only supposed to be supported in
> > the range table (FROM clause) not in the target list.  I wonder if
> > this would end up better if we only tried to support it in RT.  I asked
> > Pavel to implement it like that a few weeks ago, but ...
>
> Right - it makes sense in the FROM list - but then it should be an
> executor node, instead of some expression thingy.
>

The XMLTABLE function is, from the user's perspective as well as from the
implementation perspective, a form of SRF. I use my own executor node
because fcinfo is complex already and not big enough to hold all the
information about result columns.

Implementing it as a range-table entry doesn't reduce code - it just moves
it to a different file.

I'll try to explain my motivation. Please check it and correct me if I am
wrong. I am not attached to my implementation - I am just trying to make
XMLTABLE consistent with existing behaviour, so it can be used anywhere
without any surprise.

1. Any function that produces content can be used in a target list. We
support SRFs in the target list and in the FROM part. Why should XMLTABLE
be an exception?

2. In the standard, XMLTABLE is placed only in the FROM part - but the
standard doesn't need to solve my question, since it does not allow SRF
functions in the target list.

If there is a common decision that this inconsistency (in the behaviour of
this kind of function) is expected and required, then I have no problem
removing this support from XMLTABLE.

At this moment I don't see a technical reason for this step - with Andres's
latest changes, support for XMLTABLE in the target list needs fewer than 40
lines, and there is no special path for XMLTABLE only. Andres wrote support
for SRF functions and SRF operators; TableExpr is a third category.

Regards

Pavel


> Greetings,
>
> Andres Freund
>


Re: [HACKERS] patch: function xmltable

2017-01-24 Thread Tom Lane
Andres Freund  writes:
> On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
>> XMLTABLE is specified by the standard to return multiple rows ... but
>> then as far as my reading goes, it is only supposed to be supported in
>> the range table (FROM clause) not in the target list.  I wonder if
>> this would end up better if we only tried to support it in RT.  I asked
>> Pavel to implement it like that a few weeks ago, but ...

> Right - it makes sense in the FROM list - but then it should be an
> executor node, instead of some expression thingy.

+1 --- we're out of the business of having simple expressions that
return rowsets.

regards, tom lane




Re: [HACKERS] Faster methods for getting SPI results (460% improvement)

2017-01-24 Thread Jim Nasby

On 1/23/17 10:36 PM, Craig Ringer wrote:

which is currently returned as

[ {"a":1, "b":10}, {"a":2, "b":20} ]

returned instead as

{ "a": [1, 2], "b": [10, 20] }


Correct.


If so I see that as a lot more of a niche thing. I can see why it'd be
useful and would help performance, but it seems much more disruptive.
It requires users to discover it exists, actively adopt a different
style of ingesting data, etc. For a 10%-ish gain in a PL.


In data science, what we're doing now is actually the niche. All real 
analytics happens with something like a Pandas DataFrame, which is 
organized as a dict of lists.


This isn't just idle nomenclature either: organizing results in what 
amounts to a column store provides a significant speed improvement for 
most analytics, because you're working on an array of contiguous memory 
(at least, when you're using more advanced types like DataFrames and 
Series).
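
To make the memory-layout point concrete, here is a toy C illustration
(nothing to do with the patch itself): summing one "column" in a
struct-of-arrays layout walks contiguous memory, while the row-oriented
layout strides over whole rows.

#include <stddef.h>

typedef struct
{
	int			a;
	int			b;
} Row;							/* row store: [{a,b}, {a,b}, ...] */

typedef struct
{
	int		   *a;
	int		   *b;
} Columns;						/* column store: {a: [...], b: [...]} */

static long
sum_b_rows(const Row *rows, size_t n)
{
	long		s = 0;

	for (size_t i = 0; i < n; i++)
		s += rows[i].b;			/* stride sizeof(Row): half of each cache
								 * line is the unused "a" column */
	return s;
}

static long
sum_b_columns(const Columns *cols, size_t n)
{
	long		s = 0;

	for (size_t i = 0; i < n; i++)
		s += cols->b[i];		/* contiguous ints: cache- and SIMD-friendly */
	return s;
}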



I strongly suggest making this design effort a separate thread, and
focusing on the SPI improvements that give "free" no-user-action
performance boosts here.


Fair enough. I posted the SPI portion of that yesterday. That should be 
useful for pl/R and possibly pl/perl. pl/tcl could make use of it, but 
it would end up executing arbitrary tcl code in the middle of portal 
execution, which doesn't strike me as a great idea. Unfortunately, I 
don't think plpgsql could make much use of this for similar reasons.


I'll post a plpython patch that doesn't add the output format control.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)




Re: [HACKERS] pg_hba_file_settings view patch

2017-01-24 Thread Haribabu Kommi
On Wed, Jan 25, 2017 at 2:50 PM, Ashutosh Bapat <
ashutosh.ba...@enterprisedb.com> wrote:

> On Wed, Jan 25, 2017 at 6:34 AM, Michael Paquier
>  wrote:
> > On Tue, Jan 24, 2017 at 11:19 PM, Ashutosh Bapat
> >  wrote:
> >> On Mon, Jan 23, 2017 at 1:43 PM, Haribabu Kommi
> >>  wrote:
> >>>
> >>>
> >>> On Sat, Jan 21, 2017 at 8:01 AM, Tom Lane  wrote:
> 
>  Haribabu Kommi  writes:
>  > [ pg_hba_rules_10.patch ]
> 
>  I took a quick look over this.
> >>>
> >>>
> >>> Thanks for the review.
> >>>
> 
>  * I'm not exactly convinced that the way you approached the error
> message
>  reporting, ie duplicating the logged message, is good.  In particular
>  this results in localizing the strings reported in pg_hba_rules.error,
>  which is exactly opposite to the decision we reached for the
>  pg_file_settings view.  What's the reasoning for deciding that this
>  view should contain localized strings?  (More generally, we found in
>  the pg_file_settings view that we didn't always want to use exactly
>  the same string that was logged, anyway.)
> >>>
> >>>
> >>> Actually there is no particular reason to display the localized
> strings,
> >>> Just thought that it may be useful to the user if it get displayed in
> their
> >>> own language. And also doing this way will reduce the error message
> >>> duplicate in the code that is used for display in the view and writing
> it
> >>> into the log file.
> >>>
> >>
> >> Would it be better, if we could parse each HBA line within
> >> PG_TRY()/PG_CATCH() and read errmsg from errordata stack in
> >> PG_CATCH()? We do that only when errcode is ERRCODE_CONFIG_FILE_ERROR,
> >> PG_THROWing otherwise. That's probably a bad idea but wanted to put it
> >> out as it came to me. It would eliminate a lot of changes in this
> >> patch.
> >
> > It still needs to save the error message string somewhere. So I am not
> > sure that it would save much patch size.
>
> My understanding is that ereport (and some other calls included in
> that statement) call saves it on errordata stack before jumping to the
> handler.


All the ereport messages are of level LOG; because of this, even if we
use TRY/CATCH it doesn't work, as the messages just get printed to the
logfile and processing continues with the next statement.

Regards,
Hari Babu
Fujitsu Australia


Re: [HACKERS] pg_hba_file_settings view patch

2017-01-24 Thread Ashutosh Bapat
On Wed, Jan 25, 2017 at 6:34 AM, Michael Paquier
 wrote:
> On Tue, Jan 24, 2017 at 11:19 PM, Ashutosh Bapat
>  wrote:
>> On Mon, Jan 23, 2017 at 1:43 PM, Haribabu Kommi
>>  wrote:
>>>
>>>
>>> On Sat, Jan 21, 2017 at 8:01 AM, Tom Lane  wrote:

 Haribabu Kommi  writes:
 > [ pg_hba_rules_10.patch ]

 I took a quick look over this.
>>>
>>>
>>> Thanks for the review.
>>>

 * I'm not exactly convinced that the way you approached the error message
 reporting, ie duplicating the logged message, is good.  In particular
 this results in localizing the strings reported in pg_hba_rules.error,
 which is exactly opposite to the decision we reached for the
 pg_file_settings view.  What's the reasoning for deciding that this
 view should contain localized strings?  (More generally, we found in
 the pg_file_settings view that we didn't always want to use exactly
 the same string that was logged, anyway.)
>>>
>>>
>>> Actually there is no particular reason to display the localized strings,
>>> Just thought that it may be useful to the user if it get displayed in their
>>> own language. And also doing this way will reduce the error message
>>> duplicate in the code that is used for display in the view and writing it
>>> into the log file.
>>>
>>
>> Would it be better, if we could parse each HBA line within
>> PG_TRY()/PG_CATCH() and read errmsg from errordata stack in
>> PG_CATCH()? We do that only when errcode is ERRCODE_CONFIG_FILE_ERROR,
>> PG_THROWing otherwise. That's probably a bad idea but wanted to put it
>> out as it came to me. It would eliminate a lot of changes in this
>> patch.
>
> It still needs to save the error message string somewhere. So I am not
> sure that it would save much patch size.

My understanding is that an ereport call (and some other calls included
in that statement) saves it on the errordata stack before jumping to the
handler.

-- 
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company




[HACKERS] \h tab-completion

2017-01-24 Thread Stephen Frost
All,

I'm not really inclined to do it myself right now, but it'd be awful
nice if we had better table completion for \h.

Right now, '\h alter<tab>' returns nothing, and '\h alter <tab>' returns
a *bunch* of stuff.

Yet, we happily support '\h alter view' and friends, returning just the
info relevant for that particular command.

This would be a good starter project for someone new to work on, imv,
tho it could also go on the to-do list.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] increasing the default WAL segment size

2017-01-24 Thread Michael Paquier
On Wed, Jan 25, 2017 at 6:58 AM, Robert Haas  wrote:
> On Fri, Jan 20, 2017 at 7:00 PM, Michael Paquier
>  wrote:
>>> No, because the output of SHOW is always of type text, regardless of
>>> the type of the GUC.
>>
>> Thinking about that over night, that looks pretty nice. What would be
>> nicer though would be to add INT8OID and BYTEAOID in the list, and
>> convert as well the other replication commands. At the end, I think
>> that we should finish by being able to remove all pq_* routine
>> dependencies in walsender.c, saving quite a couple of lines.
>
> Might be worth investigating, but I don't feel any obligation to do
> that right now.  Thanks for the review; committed.

OK, I have done this refactoring effort as attached because I think
that's really worth it. And here are the diff numbers:
 3 files changed, 113 insertions(+), 162 deletions(-)
That's a bit less than I first thought because of all the singularities
of bytea output and the way TIMELINE_HISTORY takes advantage of the
message-level routines. Still, for IDENTIFY_SYSTEM, START_REPLICATION
and CREATE_REPLICATION_SLOT the gains in readability are there.

What do you think?
-- 
Michael


refactor-repl-cmd-output.patch
Description: application/download



Re: [HACKERS] pageinspect: Hash index support

2017-01-24 Thread Mithun Cy
On Thu, Jan 19, 2017 at 5:08 PM, Ashutosh Sharma  wrote:

Thanks, Ashutosh and Jesper. I have tested the patch and I do not have
any more comments, so I am marking it ready for committer.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com




[HACKERS] simplify sequence test

2017-01-24 Thread Peter Eisentraut
We maintain a separate test output file sequence_1.out because the
log_cnt value can vary if there is a checkpoint happening at the right
time.  So we have to maintain two files because of a one character
difference.  I propose the attached patch to restructure the test a bit
to avoid this, without loss of test coverage.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 3d28ea1b6684fb00591526a79dfaeafbff459a44 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 24 Jan 2017 09:44:40 -0500
Subject: [PATCH] Simplify sequence test

We maintained two separate expected files because log_cnt could be one
of two values.  Rewrite the test so that we only need one file.
---
 src/test/regress/expected/sequence.out   |  10 +-
 src/test/regress/expected/sequence_1.out | 559 ---
 src/test/regress/sql/sequence.sql|   4 +-
 3 files changed, 9 insertions(+), 564 deletions(-)
 delete mode 100644 src/test/regress/expected/sequence_1.out

diff --git a/src/test/regress/expected/sequence.out b/src/test/regress/expected/sequence.out
index a2bdd3002b..ad03a31a4e 100644
--- a/src/test/regress/expected/sequence.out
+++ b/src/test/regress/expected/sequence.out
@@ -190,10 +190,12 @@ SELECT nextval('foo_seq_new');
2
 (1 row)
 
-SELECT * FROM foo_seq_new;
- last_value | log_cnt | is_called 
-------------+---------+-----------
-          2 |      31 | t
+-- log_cnt can be higher if there is a checkpoint just at the right
+-- time, so just test for the expected range
+SELECT last_value, log_cnt IN (31, 32) AS log_cnt_ok, is_called FROM foo_seq_new;
+ last_value | log_cnt_ok | is_called 
+------------+------------+-----------
+          2 | t          | t
 (1 row)
 
 DROP SEQUENCE foo_seq_new;
diff --git a/src/test/regress/expected/sequence_1.out b/src/test/regress/expected/sequence_1.out
deleted file mode 100644
index 5d7ab72944..0000000000
--- a/src/test/regress/expected/sequence_1.out
+++ /dev/null
@@ -1,559 +0,0 @@
---
--- test creation of SERIAL column
---
-CREATE TABLE serialTest (f1 text, f2 serial);
-INSERT INTO serialTest VALUES ('foo');
-INSERT INTO serialTest VALUES ('bar');
-INSERT INTO serialTest VALUES ('force', 100);
-INSERT INTO serialTest VALUES ('wrong', NULL);
-ERROR:  null value in column "f2" violates not-null constraint
-DETAIL:  Failing row contains (wrong, null).
-SELECT * FROM serialTest;
-  f1   | f2  
--------+-----
- foo   |   1
- bar   |   2
- force | 100
-(3 rows)
-
--- test smallserial / bigserial
-CREATE TABLE serialTest2 (f1 text, f2 serial, f3 smallserial, f4 serial2,
-  f5 bigserial, f6 serial8);
-INSERT INTO serialTest2 (f1)
-  VALUES ('test_defaults');
-INSERT INTO serialTest2 (f1, f2, f3, f4, f5, f6)
-  VALUES ('test_max_vals', 2147483647, 32767, 32767, 9223372036854775807,
-  9223372036854775807),
- ('test_min_vals', -2147483648, -32768, -32768, -9223372036854775808,
-  -9223372036854775808);
--- All these INSERTs should fail:
-INSERT INTO serialTest2 (f1, f3)
-  VALUES ('bogus', -32769);
-ERROR:  smallint out of range
-INSERT INTO serialTest2 (f1, f4)
-  VALUES ('bogus', -32769);
-ERROR:  smallint out of range
-INSERT INTO serialTest2 (f1, f3)
-  VALUES ('bogus', 32768);
-ERROR:  smallint out of range
-INSERT INTO serialTest2 (f1, f4)
-  VALUES ('bogus', 32768);
-ERROR:  smallint out of range
-INSERT INTO serialTest2 (f1, f5)
-  VALUES ('bogus', -9223372036854775809);
-ERROR:  bigint out of range
-INSERT INTO serialTest2 (f1, f6)
-  VALUES ('bogus', -9223372036854775809);
-ERROR:  bigint out of range
-INSERT INTO serialTest2 (f1, f5)
-  VALUES ('bogus', 9223372036854775808);
-ERROR:  bigint out of range
-INSERT INTO serialTest2 (f1, f6)
-  VALUES ('bogus', 9223372036854775808);
-ERROR:  bigint out of range
-SELECT * FROM serialTest2 ORDER BY f2 ASC;
-      f1       |     f2      |   f3   |   f4   |          f5          |          f6          
----------------+-------------+--------+--------+----------------------+----------------------
- test_min_vals | -2147483648 | -32768 | -32768 | -9223372036854775808 | -9223372036854775808
- test_defaults |   1 |  1 |  1 |1 |1
- test_max_vals |  2147483647 |  32767 |  32767 |  9223372036854775807 |  9223372036854775807
-(3 rows)
-
-SELECT nextval('serialTest2_f2_seq');
- nextval 
----------
-       2
-(1 row)
-
-SELECT nextval('serialTest2_f3_seq');
- nextval 
----------
-       2
-(1 row)
-
-SELECT nextval('serialTest2_f4_seq');
- nextval 
----------
-       2
-(1 row)
-
-SELECT nextval('serialTest2_f5_seq');
- nextval 
----------
-       2
-(1 row)
-
-SELECT nextval('serialTest2_f6_seq');
- nextval 
----------
-       2
-(1 row)
-
--- basic sequence operations using both text and oid references
-CREATE SEQUENCE sequence_test;
-CREATE SEQUENCE IF NOT EXISTS sequence_test;
-NOTICE:  relation "sequence_test" already exists, skipping

Re: [HACKERS] Declarative partitioning - another take

2017-01-24 Thread Amit Langote
On 2017/01/25 5:55, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote
>  wrote:
>> [ new patches ]
> 
> Committed 0001 and 0002.  See my earlier email for comments on 0003.

It seems patches for all the issues mentioned in this thread so far,
except 0003 (I just sent an updated version in another email), have been
committed.  Thanks a lot for your time.  I will create new threads for any
more patches from here on.

Regards,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2017-01-24 Thread Amit Langote
On 2017/01/25 2:56, Robert Haas wrote:
> On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote
>  wrote:
>>> But I wonder why we don't instead just change this function to
>>> consider tdhasoid rather than tdtypeid.  I mean, if the only point of
>>> comparing the type OIDs is to find out whether the table-has-OIDs
>>> setting matches, we could instead test that directly and avoid needing
>>> to pass an extra argument.  I wonder if there's some other reason this
>>> code is there which is not documented in the comment...
>>
>> With the following patch, regression tests run fine:
>>
>>   if (indesc->natts == outdesc->natts &&
>> - indesc->tdtypeid == outdesc->tdtypeid)
>> + indesc->tdhasoid != outdesc->tdhasoid)
>>  {
>>
>> If checking tdtypeid (instead of tdhasoid directly) has some other
>> consideration, I'd would have seen at least some tests broken by this
>> change.  So, if we are to go with this, I too prefer it over my previous
>> proposal to add an argument to convert_tuples_by_name().  Attached 0003
>> implements the above approach.
> 
> I think this is not quite right.  First, the patch compares the
> tdhasoid status with != rather than ==, which would have the effect of
> saying that we can skip conversion of the has-OID statuses do NOT
> match.  That can't be right.

You're right.

> Second, I believe that the comments
> imply that conversion should be done if *either* tuple has OIDs.  I
> believe that's because whoever wrote this comment thought that we
> needed to replace the OID if the tuple already had one, which is what
> do_convert_tuple would do.  I'm not sure whether that's really
> necessary, but we're less likely to break anything if we preserve the
> existing behavior, and I don't think we lose much from doing so
> because few user tables will have OIDs.  So I would change this test
> to if (indesc->natts == outdesc->natts && !indesc->tdhasoid &&
> !outdesc->tdhasoid), and I'd revise the one in
> convert_tuples_by_position() to match.  Then I think it's much clearer
> that we're just optimizing what's there already, not changing the
> behavior.

Agreed.  Updated patch attached.

Thanks,
Amit
>From c1fa4b9f04f328b8df54ef487ee9d46f5978e0de Mon Sep 17 00:00:00 2001
From: amit 
Date: Mon, 26 Dec 2016 17:44:14 +0900
Subject: [PATCH] Avoid tuple conversion in common partitioning cases

Currently, the tuple conversion is performed after a tuple is routed,
even if the attributes of a target leaf partition map one-to-one with
those of the root table, which is wasteful.  Avoid that by making
convert_tuples_by_name() return a NULL map for such cases.

Reported by: n/a
Patch by: Amit Langote
Reports: n/a
---
 src/backend/access/common/tupconvert.c | 18 ++
 src/backend/catalog/partition.c|  3 +--
 2 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/src/backend/access/common/tupconvert.c b/src/backend/access/common/tupconvert.c
index b17ceafa6e..a4012525d8 100644
--- a/src/backend/access/common/tupconvert.c
+++ b/src/backend/access/common/tupconvert.c
@@ -138,12 +138,13 @@ convert_tuples_by_position(TupleDesc indesc,
 		   nincols, noutcols)));
 
 	/*
-	 * Check to see if the map is one-to-one and the tuple types are the same.
-	 * (We check the latter because if they're not, we want to do conversion
-	 * to inject the right OID into the tuple datum.)
+	 * Check to see if the map is one-to-one, in which case we need not do
+	 * the tuple conversion.  That's not enough though if either source or
+	 * destination (tuples) contains OIDs; we'd need conversion in that case
+	 * to inject the right OID into the tuple datum.
 	 */
 	if (indesc->natts == outdesc->natts &&
-		indesc->tdtypeid == outdesc->tdtypeid)
+		!indesc->tdhasoid && !outdesc->tdhasoid)
 	{
 		for (i = 0; i < n; i++)
 		{
@@ -214,12 +215,13 @@ convert_tuples_by_name(TupleDesc indesc,
 	attrMap = convert_tuples_by_name_map(indesc, outdesc, msg);
 
 	/*
-	 * Check to see if the map is one-to-one and the tuple types are the same.
-	 * (We check the latter because if they're not, we want to do conversion
-	 * to inject the right OID into the tuple datum.)
+	 * Check to see if the map is one-to-one, in which case we need not do
+	 * the tuple conversion.  That's not enough though if either source or
+	 * destination (tuples) contains OIDs; we'd need conversion in that case
+	 * to inject the right OID into the tuple datum.
 	 */
 	if (indesc->natts == outdesc->natts &&
-		indesc->tdtypeid == outdesc->tdtypeid)
+		!indesc->tdhasoid && !outdesc->tdhasoid)
 	{
 		same = true;
 		for (i = 0; i < n; i++)
diff --git a/src/backend/catalog/partition.c b/src/backend/catalog/partition.c
index 7be60529c5..4bcef58763 100644
--- a/src/backend/catalog/partition.c
+++ b/src/backend/catalog/partition.c
@@ -1700,12 +1700,11 @@ get_partition_for_tuple(PartitionDispatch *pd,
 			return -1;
 		}
 
-		if (myslot != NULL)
+		if (myslot != NULL && map != NULL)
 		{

Re: [HACKERS] pg_hba_file_settings view patch

2017-01-24 Thread Michael Paquier
On Tue, Jan 24, 2017 at 11:19 PM, Ashutosh Bapat
 wrote:
> On Mon, Jan 23, 2017 at 1:43 PM, Haribabu Kommi
>  wrote:
>>
>>
>> On Sat, Jan 21, 2017 at 8:01 AM, Tom Lane  wrote:
>>>
>>> Haribabu Kommi  writes:
>>> > [ pg_hba_rules_10.patch ]
>>>
>>> I took a quick look over this.
>>
>>
>> Thanks for the review.
>>
>>>
>>> * I'm not exactly convinced that the way you approached the error message
>>> reporting, ie duplicating the logged message, is good.  In particular
>>> this results in localizing the strings reported in pg_hba_rules.error,
>>> which is exactly opposite to the decision we reached for the
>>> pg_file_settings view.  What's the reasoning for deciding that this
>>> view should contain localized strings?  (More generally, we found in
>>> the pg_file_settings view that we didn't always want to use exactly
>>> the same string that was logged, anyway.)
>>
>>
>> Actually there is no particular reason to display the localized strings,
>> Just thought that it may be useful to the user if it get displayed in their
>> own language. And also doing this way will reduce the error message
>> duplicate in the code that is used for display in the view and writing it
>> into the log file.
>>
>
> Would it be better, if we could parse each HBA line within
> PG_TRY()/PG_CATCH() and read errmsg from errordata stack in
> PG_CATCH()? We do that only when errcode is ERRCODE_CONFIG_FILE_ERROR,
> PG_THROWing otherwise. That's probably a bad idea but wanted to put it
> out as it came to me. It would eliminate a lot of changes in this
> patch.

It still needs to save the error message string somewhere. So I am not
sure that it would save much patch size.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2017-01-24 Thread Amit Langote
Hi Keith,

On 2017/01/20 12:40, Keith Fiske wrote:
> So testing things out in pg_partman for native sub-partitioning and ran
> into what is a bug for me that I know I have to fix, but I'm curious if
> this can be prevented in the first place within the native partitioning
> code itself. The below shows a sub-partitioning set where the sub-partition
> has a constraint range that is outside of the range of its parent. If the
> columns were different I could see where this would be allowed, but the
> same column is used throughout the levels of sub-partitioning.
> Understandable if that may be too complex to check for, but figured I'd
> bring it up as something I accidentally ran into in case you see an easy
> way to prevent it.

This was discussed.  See Robert's response (2nd part of):
https://www.postgresql.org/message-id/CA%2BTgmoaQABrsLQK4ms_4NiyavyJGS-b6ZFkZBBNC%2B-P5DjJNFA%40mail.gmail.com

In short, defining partitions across different levels such that data the
user intended to insert into the table (rows in the part of the
sub-partition's range that doesn't overlap with its parent's) can never be
inserted is an operator error.  It's like adding contradictory check
constraints to the
table:

create table foo (a int check (a > 0 and a < 0));
insert into foo values (1);
ERROR:  new row for relation "foo" violates check constraint "foo_a_check"
DETAIL:  Failing row contains (1).

One (perhaps the only) thing that could be done is to warn users through
the documentation so they can avoid this kind of mistake.  Trying to do
anything else in the core partitioning code would make it too complicated
for not much benefit (also see Robert's last line :)).
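
To make that concrete, a hypothetical sketch (table and bound values
invented) of a sub-partition whose range extends past its parent's:

create table measurements (logdate date) partition by range (logdate);

create table measurements_2017 partition of measurements
    for values from ('2017-01-01') to ('2018-01-01')
    partition by range (logdate);

-- accepted, even though the upper part of this range is unreachable:
-- the parent's implicit constraint already excludes logdate >= '2018-01-01'
create table measurements_2017_h2 partition of measurements_2017
    for values from ('2017-07-01') to ('2018-07-01');

-- fails at the top level: no partition of measurements covers 2018
insert into measurements values ('2018-02-01');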

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-24 Thread Andres Freund
On 2017-01-24 21:32:56 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > Hi,
> > 
> > On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext 
> > > *econtext,
> > > +   bool *isnull);
> > > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, 
> > > ExprContext *econtext,
> > > +   bool *isNull);
> > > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext 
> > > *econtext,
> > > + bool *isNull);
> > > +static void tabexprInitialize(TableExprState *tstate, ExprContext 
> > > *econtext,
> > > +   Datum doc);
> > > +static void ShutdownTableExpr(Datum arg);
> > 
> > To me this (and a lot of the other code) hints quite strongly that
> > expression evaluation is the wrong approach to implementing this.  What
> > you're essentially doing is building a Volcano-style scan node.  Even if
> > we can do this, we shouldn't double down on the bad decision to have these
> > magic expressions that return multiple rows.  There's a historical reason
> > for tSRFs, but we shouldn't add more weirdness like this.
> 
> Thanks for giving it a look.  I have long thought that this patch would
> be at odds with your overall executor work.

Not fundamentally, but it makes it harder.


> XMLTABLE is specified by the standard to return multiple rows ... but
> then as far as my reading goes, it is only supposed to be supported in
> the range table (FROM clause) not in the target list.  I wonder if
> this would end up better if we only tried to support it in RT.  I asked
> Pavel to implement it like that a few weeks ago, but ...

Right - it makes sense in the FROM list - but then it should be an
executor node, instead of some expression thingy.
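
For reference, the standard's FROM-clause form - the shape that would map
naturally onto an executor node - looks like this (document and columns
invented):

SELECT t.id, t.name
FROM docs,
     XMLTABLE('/rows/row' PASSING docs.data
              COLUMNS id   int  PATH '@id',
                      name text PATH 'name') AS t;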

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-24 Thread Alvaro Herrera
Andres Freund wrote:
> Hi,
> 
> On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> > +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext 
> > *econtext,
> > + bool *isnull);
> > +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext 
> > *econtext,
> > + bool *isNull);
> > +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> > +   bool *isNull);
> > +static void tabexprInitialize(TableExprState *tstate, ExprContext 
> > *econtext,
> > + Datum doc);
> > +static void ShutdownTableExpr(Datum arg);
> 
> To me this (and a lot of the other code) hints quite strongly that
> expression evaluation is the wrong approach to implementing this.  What
> you're essentially doing is building a Volcano-style scan node.  Even if
> we can do this, we shouldn't double down on the bad decision to have these
> magic expressions that return multiple rows.  There's a historical reason
> for tSRFs, but we shouldn't add more weirdness like this.

Thanks for giving it a look.  I have long thought that this patch would
be at odds with your overall executor work.

XMLTABLE is specified by the standard to return multiple rows ... but
then as far as my reading goes, it is only supposed to be supported in
the range table (FROM clause) not in the target list.  I wonder if
this would end up better if we only tried to support it in RT.  I asked
Pavel to implement it like that a few weeks ago, but ...

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] patch: function xmltable

2017-01-24 Thread Andres Freund
Hi,

On 2017-01-24 17:38:49 -0300, Alvaro Herrera wrote:
> +static Datum ExecEvalTableExpr(TableExprState *tstate, ExprContext *econtext,
> +   bool *isnull);
> +static Datum ExecEvalTableExprFast(TableExprState *exprstate, ExprContext 
> *econtext,
> +   bool *isNull);
> +static Datum tabexprFetchRow(TableExprState *tstate, ExprContext *econtext,
> + bool *isNull);
> +static void tabexprInitialize(TableExprState *tstate, ExprContext *econtext,
> +   Datum doc);
> +static void ShutdownTableExpr(Datum arg);

To me this (and a lot of the other code) hints quite strongly that
expression evaluation is the wrong approach to implementing this.  What
you're essentially doing is building a Volcano-style scan node.  Even if
we can do this, we shouldn't double down on the bad decision to have these
magic expressions that return multiple rows.  There's a historical reason
for tSRFs, but we shouldn't add more weirdness like this.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PATCH: recursive json_populate_record()

2017-01-24 Thread Nikita Glukhov

On 22.01.2017 21:58, Tom Lane wrote:

In the meantime, we are *not* going to have attndims be semantically
significant in just this one function.  Therefore, please remove this patch's
dependence on attndims.


Ok, I've removed the patch's dependence on attndims.  But I still believe that
someday PostgreSQL's type system will be fixed.



I looked through the patch briefly and have some other comments:


Thank you very much for your review.



1. It's pretty bizarre that you need dummy forward struct declarations
for ColumnIOData and RecordIOData.  The reason evidently is that somebody
ignored project style and put a bunch of typedefs after the local function
declarations in jsonfuncs.c.  Project style of course is to put the
typedefs first, precisely because they may be needed in the local function
declarations.  I will go move those declarations right now, as a single-
purpose patch, so that you have something a bit cleaner to modify.


These forward struct declarations were moved to the type declaration section.



2. The business with isstring, saved_scalar_is_string, etc makes me itch.
I'm having a hard time explaining exactly why, but it just seems like a
single-purpose kluge.  Maybe it would seem less so if you saved the
original JsonTokenType instead.


Original JsonTokenType is saved now.  Also "isnull" fields of several structures
were replaced by it.


But also I'm wondering why the ultimate consumers are concerned with
string-ness as such.


saved_scalar_is_string was necessary for the case when json string is converted
to json/jsonb type.  json lexer returns strings with stripped quotes and we
must recover them before passing to json_in() or jsonb_in().  There were
examples for this case in my first message:

[master]=# select * from json_to_record('{"js": "a"}') as rec(js json);
ERROR:  invalid input syntax for type json
DETAIL:  Token "a" is invalid.
CONTEXT:  JSON data, line 1: a

[master]=# select * from json_to_record('{"js": "true"}') as rec(js json);
   js
--
true

[patched]=# select * from json_to_record('{"js": "a"}') as rec(js json);
   js
-
  "a"

[patched]=# select * from json_to_record('{"js": "true"}') as rec(js json);
js

  "true"



It seems like mainly what they're trying to do is forbid converting a string
into a non-scalar Postgres type, but (a) why, and (b) if you are going to
restrict it, wouldn't it be more sensible to frame it as you can't convert any
JSON scalar to a non-scalar type?  As to (a), it seems like you're just
introducing unnecessary compatibility breakage if you insist on that.
As to (b), really it seems like an appropriate restriction of that sort
would be like "if target type is composite, source must be a JSON object,
and if target type is array, source must be a JSON array".  Personally
I think what you ought to do here is not change the semantics of cases
that are allowed today, but only change the semantics in the cases of
JSON object being assigned to PG composite and JSON array being assigned
to PG array; both of those fail today, so there's nobody depending on the
existing behavior.  But if you reject string-to-composite cases then you
will be breaking existing apps, and I doubt people will thank you for it.


I've removed compatibility-breaking restrictions and now string-to-non-scalar
conversions through the input function are allowed.  Also I've added
corresponding regression test cases.
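
For example (types and values invented), the object-to-composite and
array-to-array cases now look like:

create type point2d as (x int, y int);
create type rec2 as (p point2d, arr int[]);

-- JSON object into a composite column, JSON array into an array column
select * from json_populate_record(null::rec2,
    '{"p": {"x": 1, "y": 2}, "arr": [1, 2, 3]}');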



3. I think you'd be better off declaring ColumnIOData.type_category as an
actual enum type, not "char" with some specific values documented only
by a comment.  Also, could you fold the domain case in as a fourth
type_category?


I've introduced an enum type TypeCat for type categories, with the domain
category as its fourth member.  (The names TypeCategory and JsonTypeCategory
conflict with existing ones; you might offer a better name.)


Also, why does ColumnIOData have both an ndims field and another one buried
in io.array.ndims?


Now there are no ndims fields at all, but earlier their values could differ:
ColumnIOData.ndims was typically copied from attndims, but ArrayIOData.ndims
could be copied from typndims for domain types.



4. populate_array_report_expected_array() violates our error message
guidelines, first by using elog rather than ereport for a user-facing
error, and second by assembling the message from pieces, which would
make it untranslatable even if it were being reported through ereport.
I'm not sure if this needs to be fixed though; will the function even
still be needed once you remove the attndims dependency?  Even if there
are still callers, you wouldn't necessarily need such a function if
you scale back your ambitions a bit as to the amount of detail required.
I'm not sure you really need to report what you think the dimensions
are when complaining that an array is nonrectangular.


It was my mistake that I was not familiar with the error-message guidelines.
Now ereport() is used and there is no message assembling, but I'm also not 

Re: [HACKERS] [COMMITTERS] pgsql: Add pg_sequence system catalog

2017-01-24 Thread Peter Eisentraut
On 1/24/17 10:23 AM, Stephen Frost wrote:
> It wouldn't hurt to add a comment as to why things are so different in
> PG10 than other versions, ie:
> 
> /*
>  * In PG10, sequence meta-data was moved into pg_sequence, so switch to
>  * the pg_catalog schema instead of operating in a user schema and pull
>  * the necessary meta-data from there.
>  */

I've committed this, but I've put the comment in the old sections and
explained how they are different from the new style, instead of vice versa.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] pgpassfile connection option

2017-01-24 Thread Tom Lane
Fabien COELHO  writes:
> I've switched it in the CF to "ready for committer", and we'll see what the 
> next level thinks about it :-)

Pushed with a few adjustments:

* It seemed to me that the appropriate parameter name is "passfile"
not "pgpassfile".  In all but a couple of legacy cases, the environment
variable's name is the parameter name plus a PG prefix; therefore,
with the environment variable already named PGPASSFILE, we have to use
"passfile" for consistency.  Putting a PG prefix on a connection parameter
name is rather pointless anyway, since there's no larger namespace that
it might have a conflict in.

* The error handling in the submitted patch was pretty shoddy; it didn't
check for malloc failure at all, and it didn't report a suitable error on
pqGetHomeDirectory failure either (which is worth doing, if it's going to
be a reason for connection failure).  I ended up concluding that the
easiest way to fix that involved just inlining getPgPassFilename into the
one remaining call site.

* I also got rid of the assignment of DefaultPassword to conn->pgpass;
that wasn't doing anything very useful anymore.  AFAICS, the only visible
effect was to make PQpass() return an empty string not NULL, but we can
make that happen without paying a malloc/free cycle for it.

* Minor comment and docs cleanups.
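
For the record, usage with the committed name looks like this (host and
file names invented):

    psql "host=db.example.com dbname=app user=alice passfile=/srv/app/pgpass"

which is equivalent to pointing the PGPASSFILE environment variable at the
same file.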

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [bug fix] Cascading standby cannot catch up and get stuck emitting the same message repeatedly

2017-01-24 Thread Fujii Masao
On Thu, Nov 24, 2016 at 4:24 PM, Amit Kapila  wrote:
> On Thu, Nov 24, 2016 at 10:29 AM, Tsunakawa, Takayuki
>  wrote:
>> From: pgsql-hackers-ow...@postgresql.org
>>> [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Amit Kapila
>>> Thanks for the clarification, I could reproduce the issue and confirms that
>>> patch has fixed it.  Find logs of cascading standby at  PG9.2 Head and Patch
>>> attached (I have truncated few lines at end of server log generated in Head
>>> as those were repetitive).  I think the way you have directly explained
>>> the bug steps in code comments is not right (think if we start writing bug
>>> steps for each bug fix, how the code will look like).  So I have modified
>>> the comment to explain the situation and reason of check,  see if you find
>>> that as okay?
>>
>> Thank you, I'm happy with your comment.
>>
>
> Okay, I have marked the patch as 'Ready for Committer'.

Pushed. Thanks!

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] increasing the default WAL segment size

2017-01-24 Thread Robert Haas
On Fri, Jan 20, 2017 at 7:00 PM, Michael Paquier
 wrote:
>> No, because the output of SHOW is always of type text, regardless of
>> the type of the GUC.
>
> Thinking about that over night, that looks pretty nice. What would be
> nicer though would be to add INT8OID and BYTEAOID in the list, and
> convert as well the other replication commands. At the end, I think
> that we should finish by being able to remove all pq_* routine
> dependencies in walsender.c, saving quite a couple of lines.

Might be worth investigating, but I don't feel any obligation to do
that right now.  Thanks for the review; committed.
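
As a quick illustration of the quoted point about SHOW - whatever the GUC's
internal type, what comes back is text:

postgres=# SELECT pg_typeof(current_setting('max_connections'));
 pg_typeof 
-----------
 text
(1 row)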

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Index Scans

2017-01-24 Thread Robert Haas
On Mon, Jan 23, 2017 at 1:03 AM, Amit Kapila  wrote:
> In spite of being careful, I missed reorganizing the functions in
> genam.h which I have done in attached patch.

Cool.  Committed parallel-generic-index-scan.2.patch.  Thanks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-24 Thread Robert Haas
On Mon, Jan 23, 2017 at 1:55 PM, Peter Eisentraut
 wrote:
> For the record, I don't like the name "xlog" either.  It would be nice
> if we could have more consistent and intuitive naming.

Great!

> But I don't see any proposals to actually change all uses of "xlog" to
> "wal".  What about program names, command line options, etc.?  If the
> argument is, we changed one thing, we should change the rest, then let's
> see that.  I think that argument itself is flawed, but if that's what
> we're going with, let's see the whole plan.

I'm happy to go change every last bit of it.  I was expecting after I
committed the initial rename that somebody would provide a follow-on
patch to do the rest of it in short order.  Instead, months went by
and we still don't have a complete patch.  But I don't see why that
has to take more than a day's work, probably just a few hours.  I'd
like to do that and move on.

> Moreover, I see we still have the pg_clog directory.  I thought that was
> supposed to be renamed as well, to avoid confusing it with a "log"
> directory.  Surely, we should at least conclude that original chapter
> before going further.

I'm not excited about starting to change pg_clog before we finish with
xlog -> wal.  Then we just have two half-done things, IMO.  But I'm
also not the only one with a commit bit.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread Robert Haas
On Tue, Jan 24, 2017 at 12:58 PM, Tom Lane  wrote:
> I'd be okay with the parts of this that duplicate existing backend syntax
> and semantics, but I don't especially want to invent further than that.

I agree, and I think that's pretty much what we all said back in
October, and the patch hasn't been revised since then to match those
comments.  Perhaps I'm in a grumpy mood today, but I've got enough
patches to review from people who are willing to revise their patches
in response to feedback to spend very much time on patches from people
who aren't.  I realize that it can be frustrating to have to defer to
a committer when it's a 1-1 tie between you and the person with git
access - is that really a consensus?  But in this case, 3 separate
committers weighed in on this thread to very politely give essentially
the same feedback and that is certainly a consensus.  Not only was the
patch not revised, but the very idea that the patch might need to be
revised before further consideration was greeted with indignation.

Let's mark this Returned with Feedback and move on.  We've only got a
week left in the CommitFest anyhow and there are lots of other things
that still need work (and which actually have been revised to match
previous feedback).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2017-01-24 Thread Robert Haas
On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote
 wrote:
> [ new patches ]

Committed 0001 and 0002.  See my earlier email for comments on 0003.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PoC plpgsql - possibility to force custom or generic plan

2017-01-24 Thread Pavel Stehule
2017-01-24 6:38 GMT+01:00 Pavel Stehule :

> Hi
>
> 2017-01-23 21:59 GMT+01:00 Jim Nasby :
>
>> On 1/23/17 2:10 PM, Pavel Stehule wrote:
>>
>>> Comments, notes?
>>>
>>
>> +1 on the idea. It'd also be nice if we could expose control of plans for
>> dynamic SQL, though I suspect that's not terribly useful without some kind
>> of global session storage.
>>
>> A couple notes on a quick read-through:
>>
>> Instead of paralleling all the existing namespace stuff, I wonder if it'd
>> be better to create explicit block infrastructure. AFAIK PRAGMAs are going
>> to have a lot of the same requirements (certainly the nesting is the same),
>> and we might want more of this kind of stuff in the future. (I've certainly
>> wished I could set a GUC in a plpgsql block and have its settings revert
>> when exiting the block...)
>>
>
> I am not sure I understand. Setting a GUC by PRAGMA can work - the
> syntax supports it and the GUC API supports nesting. I am not sure about
> exception handling - but it should probably not be a problem.
>
> Please, can you show some examples.
>
>
>> Perhaps that's as simple as renaming all the existing _ns_* functions to
>> _block_ and then adding support for pragmas...
>>
>> Since you're adding cursor_options to PLpgSQL_expr it should probably be
>> removed as an option to exec_*.
>>
>
> I have to recheck it. Some cursor options come from dynamic cursor
> variables and are related to the dynamic query - not the query that creates
> the query string.
>

hmm .. so the current state is better, due to using options like
CURSOR_OPT_PARALLEL_OK:

    if (expr->plan == NULL)
        exec_prepare_plan(estate, expr,
                          (parallelOK ? CURSOR_OPT_PARALLEL_OK : 0) |
                          expr->cursor_options);

This option is not a permanent feature of the expression - so I cannot
remove the cursor_options argument from exec_*.

I did some minor cleanup - removed cursor_options from plpgsql_var.
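
For context, the kind of per-function plan control this PoC is after would
look roughly like this - a sketch only; the pragma name and argument are
assumptions, not the committed syntax:

CREATE OR REPLACE FUNCTION lookup(a int) RETURNS bigint AS $$
DECLARE
  PRAGMA plan_cache(force_custom_plan);  -- hypothetical directive
BEGIN
  -- the embedded query would be planned per-call instead of generically
  RETURN (SELECT count(*) FROM t WHERE col = a);
END;
$$ LANGUAGE plpgsql;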

Regards

Pavel



>> finit_ would be better named free_.
>
>
> good idea
>
> Regards
>
> Pavel
>
>
>>
>> --
>> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
>> Experts in Analytics, Data Architecture and PostgreSQL
>> Data in Trouble? Get it in Treble! http://BlueTreble.com
>> 855-TREBLE2 (855-873-2532)
>>
>
>
diff --git a/src/pl/plpgsql/src/pl_comp.c b/src/pl/plpgsql/src/pl_comp.c
index b25b3f1..304fc91 100644
--- a/src/pl/plpgsql/src/pl_comp.c
+++ b/src/pl/plpgsql/src/pl_comp.c
@@ -51,6 +51,8 @@ bool		plpgsql_check_syntax = false;
 
 PLpgSQL_function *plpgsql_curr_compile;
 
+PLpgSQL_directives *plpgsql_directives;
+
 /* A context appropriate for short-term allocs during compilation */
 MemoryContext plpgsql_compile_tmp_cxt;
 
@@ -83,6 +85,11 @@ static const ExceptionLabelMap exception_label_map[] = {
 	{NULL, 0}
 };
 
+PLpgSQL_directives default_directives = {
+	NULL,
+	true,		/* is_function_scope */
+	0			/* no special cursor option */
+};
 
 /* --
  * static prototypes
@@ -374,6 +381,9 @@ do_compile(FunctionCallInfo fcinfo,
 	plpgsql_DumpExecTree = false;
 	plpgsql_start_datums();
 
+	/* set default compile directives */
+	plpgsql_directives = &default_directives;
+
 	switch (function->fn_is_trigger)
 	{
 		case PLPGSQL_NOT_TRIGGER:
@@ -852,6 +862,9 @@ plpgsql_compile_inline(char *proc_source)
 	plpgsql_DumpExecTree = false;
 	plpgsql_start_datums();
 
+	/* set default compile directives */
+	plpgsql_directives = &default_directives;
+
 	/* Set up as though in a function returning VOID */
 	function->fn_rettype = VOIDOID;
 	function->fn_retset = false;
diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index b48146a..9971ed2 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2335,7 +2335,7 @@ exec_stmt_forc(PLpgSQL_execstate *estate, PLpgSQL_stmt_forc *stmt)
 	Assert(query);
 
 	if (query->plan == NULL)
-		exec_prepare_plan(estate, query, curvar->cursor_options);
+		exec_prepare_plan(estate, query, query->cursor_options);
 
 	/*
 	 * Set up short-lived ParamListInfo
@@ -3625,7 +3625,7 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 	{
 		ListCell   *l;
 
-		exec_prepare_plan(estate, expr, 0);
+		exec_prepare_plan(estate, expr, expr->cursor_options);
 		stmt->mod_stmt = false;
 		foreach(l, SPI_plan_get_plan_sources(expr->plan))
 		{
@@ -4096,7 +4096,8 @@ exec_stmt_open(PLpgSQL_execstate *estate, PLpgSQL_stmt_open *stmt)
 		 */
 		query = stmt->query;
 		if (query->plan == NULL)
-			exec_prepare_plan(estate, query, stmt->cursor_options);
+			exec_prepare_plan(estate, query,
+			  query->cursor_options | stmt->cursor_options);
 	}
 	else if (stmt->dynquery != NULL)
 	{
@@ -4167,7 +4168,7 @@ exec_stmt_open(PLpgSQL_execstate *estate, PLpgSQL_stmt_open *stmt)
 
 		query = curvar->cursor_explicit_expr;
 		if (query->plan == NULL)
-			exec_prepare_plan(estate, query, curvar->cursor_options);
+			exec_prepare_plan(estate, query, query->cursor_options);
 	}
 
 	/*
@@ -4366,7 +4367,7 @@ exec_assign_expr(PLpgSQL_execstate 

Re: [HACKERS] Improvements in psql hooks for variables

2017-01-24 Thread Tom Lane
"Daniel Verite"  writes:
> Here's an update with these changes:

I scanned this patch very quickly and think it addresses my previous
stylistic objections.  Rahila is the reviewer of record though, so
I'll wait for his comments before going further.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-24 Thread Corey Huinker
Revised patch, with one caveat: It contains copy/pasted code from
variable.c intended to bridge the gap until
https://commitfest.postgresql.org/12/799/ (changing ParseVariableBool to
detect invalid boolean-ish strings) is merged. We may want to pause full
review of this patch pending resolution of that one. I'm happy to continue
with the stop-gap in place.

Changes made:
- \elseif is now \elif
- Invalid boolean values now return an error
- ON_ERROR_STOP is respected in all errors raised by \if, \elif, \else,
\endif commands.
- Documentation gives a more real-world example of usage.
- Documentation gives a more explicit list of valid boolean values
- Regression tests for out-of-place \endif, \else, and \endif
- Regression test for invalid boolean values
- Removal of debug detritus.

Changes not (yet) made:
- No TAP test for errors respecting ON_ERROR_STOP
- function comments in psqlscan.l follow the style found in other comments
there, which goes counter to global style.


On Tue, Jan 24, 2017 at 3:58 AM, Fabien COELHO  wrote:

>
> I would suggest to assume false on everything else, and/or maybe to ignore
>>> the whole if/endif section in such cases.
>>>
>>
>> +1, it also halves the number of values we have to support later.
>>
>
> After giving it some thought, I revise a little bit my opinion:
>
>
> I think that if the value is evaluated to TRUE or FALSE, then fine. If it
> is anything else, then an error is raised (error message shown), which
> should also stop the script on "ON_ERROR_STOP", and if not the script
> continues with assuming the value was FALSE.
>
> --
> Fabien.
>
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 9915731..20091e5 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -2007,6 +2007,78 @@ hello 10
 
   
 
+  
+\if expr
+\elif expr
+\else
+\endif
+
+
+This group of commands implements nestable conditional blocks, like
+this:
+
+
+SELECT
+EXISTS(SELECT 1 FROM customer) as has_customers,
+EXISTS(SELECT 1 FROM employee) as has_employees
+\gset
+\if :has_customers
+SELECT * FROM customer ORDER BY creation_date LIMIT 5;
+\elif :has_employees
+\echo 'no customers found'
+SELECT * FROM employee ORDER BY creation_date LIMIT 5;
+\else
+\if yes
+\echo 'No customers or employees'
+\else
+\echo 'this should never print'
+\endif
+\endif
+
+
+Conditional blocks must begin with a \if and end
+with an \endif, and the pairs must be found in
+the same source file. If an EOF is reached on the main file or an
+\include-ed file before all 
+\if-\endif are matched, then
+psql will raise an error.
+
+
+The \if and \elif commands
+read the rest of the line and evaluate it as a boolean expression.
+Currently, expressions are limited to a single unquoted string
+which is evaluated like other options booleans, so the valid values
+are any unabiguous case insensitive matches for one of:
+true, false, 
1,
+0, on, off,
+yes, no.  So 
+t, T, and tR
+will all match true.
+
+
+Queries within a false branch of a conditional block will not be
+sent to the server.
+
+
+Non-conditional \-commands within a false branch
+of a conditional block will not be evaluated for correctness. The
+command will be ignored along with all remaining input to the end
+of the line.
+
+
+Expressions on \if and \elif
+commands within a false branch of a conditional block will not be
+evaluated.
+
+
+A conditional block can contain at most one \else command.
+
+
+The \elif command cannot follow the
+\else command.
+
+
+  
 
   
 \ir or \include_relative 
filename
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4139b77..feb9ddc 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -49,6 +49,7 @@
 #include "psqlscanslash.h"
 #include "settings.h"
 #include "variables.h"
+#include "fe_utils/psqlscan_int.h"
 
 /*
  * Editable database object types.
@@ -132,7 +133,7 @@ HandleSlashCmds(PsqlScanState scan_state,
status = PSQL_CMD_ERROR;
}
 
-   if (status != PSQL_CMD_ERROR)
+   if (status != PSQL_CMD_ERROR && psqlscan_branch_active(scan_state))
{
/* eat any remaining arguments after a valid command */
/* note we suppress evaluation of backticks here */
@@ -194,6 +195,68 @@ read_connect_arg(PsqlScanState scan_state)
return result;
 }
 
+/*
+ * Read and interpret argument as a boolean expression.
+ * Return true if a boolean value was successfully parsed.
+ */
+static bool

Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-24 Thread Tom Lane
"Daniel Verite"  writes:
> ISTM that it's important that eventually ParseVariableBool()
> and \if agree on what evaluates to true and false (and the
> more straightforward way to achieve that is by \if calling
> directly ParseVariableBool), but that it's not productive that we
> discuss \if issues relative to the behavior of ParseVariableBool()
> in HEAD at the moment, as it's likely to change.

AFAIK we do have consensus on changing its behavior to disallow
assignment of invalid values.  It's just a matter of getting the
patch to be stylistically nice.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread Tom Lane
"David G. Johnston"  writes:
> Maybe consider writing a full patch series that culminates with this
> additional builtin option being added as the final patch - breaking out
> this (and probably other) "requirements" patches separately to aid in
> review.  The presentation of just "add new stuff that seems useful" opens
> too much room for isolated discussion without knowing what the big picture
> requires.  At worst you'll at least have a working version of a forked
> pgbench that can do what you want.

> As it stands right now you haven't provided enough context for this patch
> and only the social difficulty of actually marking a patch rejected has
> prevented its demise in its current form - because while it has interesting
> ideas, it's added maintenance burden for -core without any in-core usage.
> But if you make it the first patch in a 3-patch series that implements the
> per-spec tpc-b the discussion moves away from these support functions and
> into the broader framework in which they are made useful.

I think Fabien already did post something of the sort, or at least
discussion towards it, and there was immediately objection as to whether
his idea of TPC-B compliance was actually right.  I remember complaining
that he had a totally artificial idea of what "fetching a data value"
requires.  So a full patch series would be a good idea in that we could
discuss whether the requirements are correct before we get into the nitty
gritty of implementing them.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Andres Freund
On 2017-01-24 19:25:52 +0100, Tobias Oberstein wrote:
> Hi,
> 
> > >  pid |           syscall            |   cnt   | cnt_per_sec
> > > -----+------------------------------+---------+-------------
> > >      | syscalls:sys_enter_lseek     | 4091584 |      136386
> > >      | syscalls:sys_enter_newfstat  | 2054988 |       68500
> > >      | syscalls:sys_enter_read      |  767990 |       25600
> > >      | syscalls:sys_enter_close     |  503803 |       16793
> > >      | syscalls:sys_enter_newstat   |  434080 |       14469
> > >      | syscalls:sys_enter_open      |  380382 |       12679
> > > 
> > > Note: there isn't a lot of load currently (this is from production).
> > 
> > That doesn't really mean that much - sure it shows that lseek is
> > frequent, but it doesn't tell you how much impact this has to the
> 
> Above is on a mostly idle system ("idle" for our loads) .. when things get
> hot, lseek calls can reach into the millions/sec.
> 
> Doing 5 million syscalls per sec comes with overhead no matter how
> lightweight the syscall is, doesn't it?

> Using pread instead of lseek+read halves the syscalls.
> 
> I really don't understand what you are fighting here ..

Sure, there's some overhead. And as I said upthread, I'm much less
against this change than Tom.  What I'm saying is that your benchmarks
haven't shown a benefit in a meaningful way, so I don't think I can
agree with

> "Well, my point remains that I see little value in messing with
> long-established code if you can't demonstrate a benefit that's clearly
> above the noise level."
> 
> I have done lots of benchmarking over the last days on a massive box, and I
> can provide numbers that I think show that the impact can be significant.

since you've not actually shown that the impact is above the noise level
when measured with an actual postgres workload.

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Rename pg_switch_xlog to pg_switch_wal

2017-01-24 Thread Stephen Frost
* Vladimir Rusinov (vrusi...@google.com) wrote:
> On Mon, Jan 23, 2017 at 6:59 PM, Stephen Frost  wrote:
> > I don't have any problem with asking for a summary of the exact set of
> > changes that he's planning to make though.  My understanding is that it
> > includes changing program names, command line options, etc.
> 
> Here's what I currently have in mind:
> 
> - sql function names (current patch)
> - binaries in bin: pg_receivexlog, pg_xlogdump, pg_resetxlog

Command-line options for other binaries in bin should also be changed
(eg: initdb's --xlogdir).

Looking at the other binaries in src/bin:

pg_archivecleanup - seems ok as-is (I'll note that its Usage mentions
'OLDESTKEPTWALFILE' ...)
pg_basebackup - --xlog/--xlog-method/--xlogdir
pgbench - looks fine
pg_config - looks fine
pg_controldata - looks fine
pg_ctl - looks fine
pgevent - DLL, not a binary, looks ok
pg_rewind - looks fine
pg_test_fsync - looks fine
pg_test_timing - looks fine
pg_upgrade - looks fine
psql - looks fine

> - (maybe) public/exported symbols from libpq,
> e.g. libpqGetCurrentXlogInsertLocation

Hm..?  That function is just a local function in pg_rewind.  I'd be kind
of surprised if we actually had anything in libpq that needed changing
due to this and I don't see anything there in a quick look.

> - at this time I don't target internal function and filenames, but that
> could also be done if there's desire for this.

No, I don't think that'll be necessary.

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Andres Freund
On 2017-01-24 15:36:13 -0300, Alvaro Herrera wrote:
> Tobias Oberstein wrote:
> 
> > I am benchmarking IOPS, and while doing so, it becomes apparent that at
> > these scales it does matter _how_ IO is done.
> > 
> > The most efficient way is libaio. I get 9.7 million/sec IOPS with low CPU
> > load. Using any synchronous IO engine is slower and produces higher load.
> > 
> > I do understand that switching to libaio isn't going to fly for PG
> > (completely different approach).
> 
> Maybe it is possible to write a new f_smgr implementation (parallel to
> md.c) that uses libaio.  There is no "seek" in that interface, at least,
> though the interface does assume that the implementation is blocking.

For it to be beneficial we'd need to redesign the IO stack above that so
much that it'd be basically not recognizable (since we'd need to
actually use async io for it to be beneficial). Using libaio IIRC still
requires O_DIRECT, so we'd have to take more care with ordering of writeback
etc too - we got closer with 9.6, but we're still far away from it.
Besides that, it's also not always that clear when AIO would be
beneficial, since a lot of the synchronous IO is actually synchronous
for a reason.

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread David G. Johnston
On Wed, Oct 5, 2016 at 2:03 AM, Fabien COELHO  wrote:

>
> I've got no objection to a more-nearly-TPC-B script as an option.
>>
>
> Good, because adding a "per-spec" tpc-b as an additional builtin option is
> one of my intentions, once pgbench is capable of it.


Trying to scan the thread as an interested third-party - without hacker
skills...

Maybe consider writing a full patch series that culminates with this
additional builtin option being added as the final patch - breaking out
this (and probably other) "requirements" patches ​separately to aid in
review.  The presentation of just "add new stuff that seems useful" opens
too much room for isolated discussion without knowing what the big picture
requires.  At worse you'll at least have a working version of a forked
pgbench that can do what you want.

As it stands right now you haven't provided enough context for this patch
and only the social difficulty of actually marking a patch rejected has
prevented its demise in its current form - because while it has interesting
ideas, it's added maintenance burden for -core without any in-core usage.
But if you make it the first patch in a 3-patch series that implements the
per-spec tpc-b the discussion moves away from these support functions and
into the broader framework in which they are made useful.

My $0.02

David J.


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-24 Thread Corey Huinker
On Tue, Jan 24, 2017 at 1:25 PM, Corey Huinker 
wrote:

> This might be asking a lot, but could we make a "strict" mode for
> ParseVariableBool() that returns a success boolean, and have the existing
> ParseVariableBool() signature call that new function, and issue the
> "assuming " warning if the strict function failed?


Nevermind. Looking at the v7 patch I see that it does everything I need and
more. I should have looked first.


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Alvaro Herrera
Tobias Oberstein wrote:

> I am benchmarking IOPS, and while doing so, it becomes apparent that at
> these scales it does matter _how_ IO is done.
> 
> The most efficient way is libaio. I get 9.7 million/sec IOPS with low CPU
> load. Using any synchronous IO engine is slower and produces higher load.
> 
> I do understand that switching to libaio isn't going to fly for PG
> (completely different approach).

Maybe it is possible to write a new f_smgr implementation (parallel to
md.c) that uses libaio.  There is no "seek" in that interface, at least,
though the interface does assume that the implementation is blocking.
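
For reference, the libaio calls such a hypothetical smgr would be built
around - a minimal sketch with no error handling, assuming fd, buf (aligned
for O_DIRECT), len and offset are set up elsewhere:

#include <libaio.h>

io_context_t ctx = 0;
struct iocb cb;
struct iocb *cbs[1] = { &cb };
struct io_event ev;

io_setup(32, &ctx);                       /* create a kernel AIO context */
io_prep_pread(&cb, fd, buf, len, offset); /* fill in a positioned-read iocb */
io_submit(ctx, 1, cbs);                   /* returns without waiting */
/* ... overlap other work here ... */
io_getevents(ctx, 1, 1, &ev, NULL);       /* reap the completion */
io_destroy(ctx);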

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Tobias Oberstein

Hi,


 pid |           syscall            |   cnt   | cnt_per_sec
-----+------------------------------+---------+-------------
     | syscalls:sys_enter_lseek     | 4091584 |      136386
     | syscalls:sys_enter_newfstat  | 2054988 |       68500
     | syscalls:sys_enter_read      |  767990 |       25600
     | syscalls:sys_enter_close     |  503803 |       16793
     | syscalls:sys_enter_newstat   |  434080 |       14469
     | syscalls:sys_enter_open      |  380382 |       12679

Note: there isn't a lot of load currently (this is from production).


That doesn't really mean that much - sure it shows that lseek is
frequent, but it doesn't tell you how much impact this has to the


Above is on a mostly idle system ("idle" for our loads) .. when things 
get hot, lseek calls can reach into the millions/sec.


Doing 5 million syscalls per sec comes with overhead no matter how 
lightweight the syscall is, doesn't it?


Using pread instead of lseek+read halves the syscalls.

I really don't understand what you are fighting here ..


overall workload.  For that'd you'd need a generic (i.e. not syscall
tracepoint, but cpu cycle) perf profile, and look in the call graph (via
perf report --children) how much of that is below the lseek syscall.


I see. I might find time to extend our helper function f_perf_syscalls.


I'm much less against this change than Tom, but doing artificial syscall
microbenchmark seems unlikely to make a big case for using it in


This isn't a syscall benchmark, but FIO.


There's not really a difference between those, when you use fio to
benchmark seek vs pseek.


Sorry, I don't understand what you are talking about.


Fio as you appear to have used is a microbenchmark benchmarking
individual syscalls.


I am benchmarking IOPS, and while doing so, it becomes apparent that at 
these scales it does matter _how_ IO is done.


The most efficient way is libaio. I get 9.7 million/sec IOPS with low 
CPU load. Using any synchronous IO engine is slower and produces higher 
load.


I do understand that switching to libaio isn't going to fly for PG 
(completely different approach). But doing pread instead of lseek+read 
seems simple enough. But then, I don't know about the PG codebase ..
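
For the archives, a minimal sketch of the difference (plain POSIX, not PG
code):

#include <unistd.h>

/* two syscalls per random read */
static ssize_t
read_at_seek(int fd, void *buf, size_t len, off_t offset)
{
    if (lseek(fd, offset, SEEK_SET) < 0)
        return -1;
    return read(fd, buf, len);
}

/* one syscall; also doesn't disturb the shared file offset */
static ssize_t
read_at_pread(int fd, void *buf, size_t len, off_t offset)
{
    return pread(fd, buf, len, offset);
}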


Among the synchronous methods of doing IO, psync is much better than sync.

pvsync, pvsync2 and pvsync2 + hipri (busy polling, no interrupts) are 
better, but the gain is smaller, and all of them are inferior to libaio.



Glad to hear it.


With 3TB RAM, huge pages are absolutely essential (otherwise, the system bogs
down in TLB etc. overhead).


I was one of the people working on adding hugepage support to pg, that's
why I was glad ;)


Ahh;) Sorry, wasn't aware. This is really invaluable. Thanks for that!
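
For anyone following along, the relevant knobs are roughly these - the
sizing values below are invented, not recommendations:

postgresql.conf:
    huge_pages = on            # refuse to start if huge pages can't be mapped
    shared_buffers = 64GB

Linux side, e.g. /etc/sysctl.conf:
    vm.nr_hugepages = 33000    # 2MB pages covering shared_buffers plus overhead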

Cheers,
/Tobias



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-24 Thread Corey Huinker
>
> ISTM that it's important that eventually ParseVariableBool()
> and \if agree on what evaluates to true and false (and the
> more straightforward way to achieve that is by \if calling
> directly ParseVariableBool), but that it's not productive that we
> discuss \if issues relative to the behavior of ParseVariableBool()
> in HEAD at the moment, as it's likely to change.
>

I'd like to keep in sync with ParseVariableBool(), but ...

Also, Fabien has made a good case for invalid parsed values being an
ON_ERROR_STOP-able error, and not being defaulted to either true or false.

This might be asking a lot, but could we make a "strict" mode for
ParseVariableBool() that returns a success boolean, and have the existing
ParseVariableBool() signature call that new function, and issue the
"assuming " warning if the strict function failed?


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Andres Freund
Hi,

On 2017-01-24 18:57:47 +0100, Tobias Oberstein wrote:
> Am 24.01.2017 um 18:41 schrieb Andres Freund:
> > On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:
> > > The syscall overhead is visible in production too .. I watched PG using 
> > > perf
> > > live, and lseeks regularly appear at the top of the list.
> > 
> > Could you show such perf profiles? That'll help us.
> 
> oberstet@bvr-sql18:~$ psql -U postgres -d adr
> psql (9.5.4)
> Type "help" for help.
> 
> adr=# select * from svc_sqlbalancer.f_perf_syscalls();
> NOTICE:  starting Linux perf syscalls sampling - be patient, this can take
> some time ..
> NOTICE:  sudo /usr/bin/perf stat -e "syscalls:sys_enter_*" -x ";" -a sleep 30 2>&1
>  pid |syscall|   cnt   | cnt_per_sec
> -+---+-+-
>  | syscalls:sys_enter_lseek  | 4091584 |  136386
>  | syscalls:sys_enter_newfstat   | 2054988 |   68500
>  | syscalls:sys_enter_read   |  767990 |   25600
>  | syscalls:sys_enter_close  |  503803 |   16793
>  | syscalls:sys_enter_newstat|  434080 |   14469
>  | syscalls:sys_enter_open   |  380382 |   12679
>  | syscalls:sys_enter_mmap   |  301491 |   10050
>  | syscalls:sys_enter_munmap |  182313 |6077
>  | syscalls:sys_enter_getdents   |  162443 |5415
>  | syscalls:sys_enter_rt_sigaction   |  158947 |5298
>  | syscalls:sys_enter_openat |   85325 |2844
>  | syscalls:sys_enter_readlink   |   77439 |2581
>  | syscalls:sys_enter_rt_sigprocmask |   60929 |2031
>  | syscalls:sys_enter_mprotect   |   58372 |1946
>  | syscalls:sys_enter_futex  |   49726 |1658
>  | syscalls:sys_enter_access |   40845 |1362
>  | syscalls:sys_enter_write  |   39513 |1317
>  | syscalls:sys_enter_brk|   33656 |1122
>  | syscalls:sys_enter_epoll_wait |   23776 | 793
>  | syscalls:sys_enter_ioctl  |   19764 | 659
>  | syscalls:sys_enter_wait4  |   17371 | 579
>  | syscalls:sys_enter_newlstat   |   13008 | 434
>  | syscalls:sys_enter_exit_group |   10135 | 338
>  | syscalls:sys_enter_recvfrom   |8595 | 286
>  | syscalls:sys_enter_sendto |8448 | 282
>  | syscalls:sys_enter_poll   |7200 | 240
>  | syscalls:sys_enter_lgetxattr  |6477 | 216
>  | syscalls:sys_enter_dup2   |5790 | 193
> 
> 
> 
> Note: there isn't a lot of load currently (this is from production).

That doesn't really mean that much - sure it shows that lseek is
frequent, but it doesn't tell you how much impact this has on the
overall workload.  For that you'd need a generic (i.e. not syscall
tracepoint, but cpu cycle) perf profile, and look in the call graph (via
perf report --children) how much of that is below the lseek syscall.


> > > > I'm much less against this change than Tom, but doing artificial syscall
> > > > microbenchmark seems unlikely to make a big case for using it in
> > > 
> > > This isn't a syscall benchmark, but FIO.
> > 
> > There's not really a difference between those, when you use fio to
> > benchmark seek vs pseek.
> 
> Sorry, I don't understand what you are talking about.

Fio as you appear to have used is a microbenchmark benchmarking
individual syscalls.


> > > > postgres, where it's part of vastly more expensive operations (like
> > > > actually reading data afterwards, exclusive locks, ...).
> > > 
> > > PG is very CPU hungry, yes.
> > 
> > Indeed - working on it ;)
> > 
> > 
> > > But there are quite some system related effects
> > > too .. eg we've managed to get down the system load with huge pages (big
> > > improvement).
> > 
> > Glad to hear it.
> 
> With 3TB RAM, huge pages are absolutely essential (otherwise, the system bogs
> down in TLB etc overhead).

I was one of the people working on adding hugepage support to pg, that's
why I was glad ;)


Regards,

Andres


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: \if, \elseif, \else, \endif (was Re: [HACKERS] PSQL commands: \quit_if, \quit_unless)

2017-01-24 Thread Daniel Verite
Corey Huinker wrote:

> >
> > 1: unrecognized value "whatever" for "\if"; assuming "on"
> >
> > I do not think that the script should continue with such an assumption.
> >
> 
> I agree, and this means we can't use ParseVariableBool() as-is

The patch at https://commitfest.postgresql.org/12/799/
in the ongoing CF already changes ParseVariableBool()
to not assume that unrecognizable values should be set to
"on".

There's also the fact that ParseVariableBool() in HEAD assumes
that an empty value is valid and true, which I think leads to this
inconsistency in the current patch:

\set empty
\if :empty
select 1 as result \gset
\else
select 2 as result \gset
\endif
\echo 'result is' :result

produces: result is 1 (so an empty string evaluates to true)

Yet this sequence:

\if
select 1 as result \gset
\else
select 2 as result \gset
\endif
\echo 'result is' :result

produces: result is 2 (so an empty \if evaluates to false)

The equivalence between empty value and true in
ParseVariableBool() is also suppressed in the above-mentioned
patch.

ISTM that it's important that eventually ParseVariableBool()
and \if agree on what evaluates to true and false (and the
more straightforward way to achieve that is by \if calling
directly ParseVariableBool), but that it's not productive that we
discuss \if issues relative to the behavior of ParseVariableBool()
in HEAD at the moment, as it's likely to change.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread Tom Lane
Robert Haas  writes:
>> On Fri, Dec 2, 2016 at 1:28 AM, Fabien COELHO  wrote:
>>> This decision is both illogical and arbitrary.

>> I disagree.  I think his decision was probably based on this email from me:

> (argh, sent too soon)
> https://www.postgresql.org/message-id/ca+tgmoa0zp4a+s+kosav4qfdz-wa56vlph8me86rmpsnkvw...@mail.gmail.com

> Nobody responded to that, and I have not seen any committer say that
> they are in favor of this.  Against that, three committers (Tom,
> Stephen, me) have taken the view that they are opposed to at least
> some parts of it.  No changes to the patch have resulted from those
> complaints.  So this is just submitting the same thing over and over
> again and hoping that the fourth or fifth committer who looks at this
> is going to have a different opinion than the first three, or fail to
> notice the previous discussion.

I concur that this is expanding pgbench's expression language well beyond
what anybody has shown a need for.  I'm also concerned that there's an
opportunity cost here, in that the patch establishes a precedence ordering
for its new operators, which we'd have a hard time changing later.  That's
significant because the proposed precedence is different from what you'd
get for similarly-named operators on the backend side.  I realize that
it's trying to follow the C standard instead, but do we really want to
open that can of worms right now?  That's a decision I'd much rather put
off if we need not make it now.

I'd be okay with the parts of this that duplicate existing backend syntax
and semantics, but I don't especially want to invent further than that.
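To make the divergence concrete (my illustration, not from the patch):
under C rules == binds tighter than &, while under backend rules any
generic operator binds tighter than a comparison, so

    x & y == z    parses as    x & (y == z)    (C-style precedence)
    x & y = z     parses as    (x & y) = z     (backend-style precedence)

and whichever set pgbench adopts, expressions that look alike on the two
sides can nest differently.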

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Tobias Oberstein

Hi,

Am 24.01.2017 um 18:41 schrieb Andres Freund:

Hi,

On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:

assume that it'd get more than swamped with doing actualy work, and with
buffering the frequently accessed stuff in memory.



What I am trying to say is: the syscall overhead of doing lseek/read/write
instead of pread/pwrite does become visible and hurt at a certain point.


Sure - but the question is whether it's measurable when you do actual
work.


The syscall overhead is visible in production too .. I watched PG using perf
live, and lseeks regularly appear at the top of the list.


Could you show such perf profiles? That'll help us.


oberstet@bvr-sql18:~$ psql -U postgres -d adr
psql (9.5.4)
Type "help" for help.

adr=# select * from svc_sqlbalancer.f_perf_syscalls();
NOTICE:  starting Linux perf syscalls sampling - be patient, this can 
take some time ..
NOTICE:  sudo /usr/bin/perf stat -e "syscalls:sys_enter_*" -x ";" -a sleep 30 2>&1

 pid |syscall|   cnt   | cnt_per_sec
-+---+-+-
 | syscalls:sys_enter_lseek  | 4091584 |  136386
 | syscalls:sys_enter_newfstat   | 2054988 |   68500
 | syscalls:sys_enter_read   |  767990 |   25600
 | syscalls:sys_enter_close  |  503803 |   16793
 | syscalls:sys_enter_newstat|  434080 |   14469
 | syscalls:sys_enter_open   |  380382 |   12679
 | syscalls:sys_enter_mmap   |  301491 |   10050
 | syscalls:sys_enter_munmap |  182313 |6077
 | syscalls:sys_enter_getdents   |  162443 |5415
 | syscalls:sys_enter_rt_sigaction   |  158947 |5298
 | syscalls:sys_enter_openat |   85325 |2844
 | syscalls:sys_enter_readlink   |   77439 |2581
 | syscalls:sys_enter_rt_sigprocmask |   60929 |2031
 | syscalls:sys_enter_mprotect   |   58372 |1946
 | syscalls:sys_enter_futex  |   49726 |1658
 | syscalls:sys_enter_access |   40845 |1362
 | syscalls:sys_enter_write  |   39513 |1317
 | syscalls:sys_enter_brk|   33656 |1122
 | syscalls:sys_enter_epoll_wait |   23776 | 793
 | syscalls:sys_enter_ioctl  |   19764 | 659
 | syscalls:sys_enter_wait4  |   17371 | 579
 | syscalls:sys_enter_newlstat   |   13008 | 434
 | syscalls:sys_enter_exit_group |   10135 | 338
 | syscalls:sys_enter_recvfrom   |8595 | 286
 | syscalls:sys_enter_sendto |8448 | 282
 | syscalls:sys_enter_poll   |7200 | 240
 | syscalls:sys_enter_lgetxattr  |6477 | 216
 | syscalls:sys_enter_dup2   |5790 | 193



Note: there isn't a lot of load currently (this is from production).


I'm much less against this change than Tom, but doing artificial syscall
microbenchmark seems unlikely to make a big case for using it in


This isn't a syscall benchmark, but FIO.


There's not really a difference between those, when you use fio to
benchmark seek vs pseek.


Sorry, I don't understand what you are talking about.


postgres, where it's part of vastly more expensive operations (like
actually reading data afterwards, exclusive locks, ...).


PG is very CPU hungry, yes.


Indeed - working on it ;)



But there are quite some system related effects
too .. eg we've managed to get down the system load with huge pages (big
improvement).


Glad to hear it.


With 3TB RAM, huge pages are absolutely essential (otherwise, the system 
bogs down in TLB etc overhead).



I'd welcome seeing profiles of that - I'm working quite heavily on
speeding up analytics workloads for pg.


Here:

https://github.com/oberstet/scratchbox/raw/master/cruncher/adr_stats/ADR-PostgreSQL-READ-Statistics.pdf

https://github.com/oberstet/scratchbox/tree/master/cruncher/adr_stats


Thanks, unfortunately those appear to mostly have io / cache hit ratio
related stats?


Yep, this was just to prove that we are really running a DWH workload at 
scale;)


Cheers,
/Tobias



Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning - another take

2017-01-24 Thread Robert Haas
On Thu, Jan 19, 2017 at 9:58 PM, Amit Langote
 wrote:
>> But I wonder why we don't instead just change this function to
>> consider tdhasoid rather than tdtypeid.  I mean, if the only point of
>> comparing the type OIDs is to find out whether the table-has-OIDs
>> setting matches, we could instead test that directly and avoid needing
>> to pass an extra argument.  I wonder if there's some other reason this
>> code is there which is not documented in the comment...
>
> With the following patch, regression tests run fine:
>
>   if (indesc->natts == outdesc->natts &&
> - indesc->tdtypeid == outdesc->tdtypeid)
> + indesc->tdhasoid != outdesc->tdhasoid)
>  {
>
> If checking tdtypeid (instead of tdhasoid directly) has some other
> consideration, I'd would have seen at least some tests broken by this
> change.  So, if we are to go with this, I too prefer it over my previous
> proposal to add an argument to convert_tuples_by_name().  Attached 0003
> implements the above approach.

I think this is not quite right.  First, the patch compares the
tdhasoid status with != rather than ==, which would have the effect of
saying that we can skip conversion if the has-OID statuses do NOT
match.  That can't be right.  Second, I believe that the comments
imply that conversion should be done if *either* tuple has OIDs.  I
believe that's because whoever wrote this comment thought that we
needed to replace the OID if the tuple already had one, which is what
do_convert_tuple would do.  I'm not sure whether that's really
necessary, but we're less likely to break anything if we preserve the
existing behavior, and I don't think we lose much from doing so
because few user tables will have OIDs.  So I would change this test
to if (indesc->natts == outdesc->natts && !indesc->tdhasoid &&
!outdesc->tdhasoid), and I'd revise the one in
convert_tuples_by_position() to match.  Then I think it's much clearer
that we're just optimizing what's there already, not changing the
behavior.
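For clarity, the resulting test would look roughly like this in
convert_tuples_by_name() (a sketch of the suggested lines, not a full
patch):

    if (indesc->natts == outdesc->natts &&
        !indesc->tdhasoid && !outdesc->tdhasoid)
    {
        /* ... then check the attributes pairwise, and if they all
         * match, return NULL meaning "no conversion map needed" ... */
    }

so conversion is skipped only when neither tuple carries OIDs, preserving
do_convert_tuple()'s existing OID-replacing behavior.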

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] ICU integration

2017-01-24 Thread Peter Eisentraut
On 1/9/17 3:45 PM, Peter Geoghegan wrote:
> * I think it's worth looking into ucol_nextSortKeyPart(), and using
> that as an alternative to ucol_getSortKey(). It doesn't seem any
> harder, and when I tested it it was clearly faster. (I think that
> ucol_nextSortKeyPart() is more or less intended to be used for
> abbreviated keys.)

I will try to look into that.
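For anyone following along, the difference is that ucol_getSortKey()
always materializes the whole key, while the iterator-based variant can
stop after the first few bytes.  A minimal sketch of the latter (real ICU
C API, but simplified, with error handling elided):

    #include <unicode/ucol.h>
    #include <unicode/uiter.h>

    /* fill dest with at most destsize leading sort-key bytes for s */
    static int32_t
    prefix_sort_key(const UCollator *coll, const char *s, int32_t len,
                    uint8_t *dest, int32_t destsize)
    {
        UCharIterator iter;
        uint32_t      state[2] = {0, 0};
        UErrorCode    status = U_ZERO_ERROR;

        uiter_setUTF8(&iter, s, len);
        /* unlike ucol_getSortKey(), this can stop after destsize bytes */
        return ucol_nextSortKeyPart(coll, &iter, state, dest, destsize, &status);
    }

which is the shape an abbreviated-key datum wants.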

> * I think that it's not okay that convert_string_datum() still uses
> strxfrm() without considering if it's an ICU build. That's why I
> raised the idea of a pg_strxfrm() wrapper at one point.

That code works in a locale-agnostic way now, which might be
questionable, but it's not the fault of this patch, I think.

> * Similarly, I think that check_strxfrm_bug() should have something
> about ICU. It's looking for a particular bug in some very old version
> of Solaris 8. At a minimum, check_strxfrm_bug() should now not run at
> all (a broken OS strxfrm() shouldn't be a problem with ICU).

But we'll still be using strxfrm for the database default locale, so we
need to keep that check.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Andres Freund
Hi,

On 2017-01-24 18:37:14 +0100, Tobias Oberstein wrote:
> > assume that it'd get more than swamped with doing actualy work, and with
> > buffering the frequently accessed stuff in memory.
> > 
> > 
> > > What I am trying to say is: the syscall overhead of doing lseek/read/write
> > > instead of pread/pwrite does become visible and hurt at a certain point.
> > 
> > Sure - but the question is whether it's measurable when you do actual
> > work.
> 
> The syscall overhead is visible in production too .. I watched PG using perf
> live, and lseeks regularly appear at the top of the list.

Could you show such perf profiles? That'll help us.


> > I'm much less against this change than Tom, but doing artificial syscall
> > microbenchmark seems unlikely to make a big case for using it in
> 
> This isn't a syscall benchmark, but FIO.

There's not really a difference between those, when you use fio to
benchmark seek vs pseek.


> > postgres, where it's part of vastly more expensive operations (like
> > actually reading data afterwards, exclusive locks, ...).
> 
> PG is very CPU hungry, yes.

Indeed - working on it ;)


> But there are quite some system related effects
> too .. eg we've managed to get down the system load with huge pages (big
> improvement).

Glad to hear it.


> > I'd welcome seeing profiles of that - I'm working quite heavily on
> > speeding up analytics workloads for pg.
> 
> Here:
> 
> https://github.com/oberstet/scratchbox/raw/master/cruncher/adr_stats/ADR-PostgreSQL-READ-Statistics.pdf
> 
> https://github.com/oberstet/scratchbox/tree/master/cruncher/adr_stats

Thanks, unfortunately those appear to mostly have io / cache hit ratio
related stats?

Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Tobias Oberstein

Hi,


Switching to sync engine, it drops to 9.1 mio - but the system load then is
also much higher!


I doubt those have very much to do with postgres - I'd quite strongly


On the machine in production, we see 8kB reads in the 300k-650k/s range. 
In spikes, because, yes, due to the 3TB RAM, we have high buffer hit 
ratios as well.



assume that it'd get more than swamped with doing actualy work, and with
buffering the frequently accessed stuff in memory.



What I am trying to say is: the syscall overhead of doing lseek/read/write
instead of pread/pwrite does become visible and hurt at a certain point.


Sure - but the question is whether it's measurable when you do actual
work.


The syscall overhead is visible in production too .. I watched PG using 
perf live, and lseeks regularly appear at the top of the list.



I'm much less against this change than Tom, but doing artificial syscall
microbenchmark seems unlikely to make a big case for using it in


This isn't a syscall benchmark, but FIO.


postgres, where it's part of vastly more expensive operations (like
actually reading data afterwards, exclusive locks, ...).


PG is very CPU hungry, yes. But there are quite some system related 
effects too .. eg we've managed to get down the system load with huge 
pages (big improvement).



This isn't academic, as we have experience (in prod) with a similarly
designed box and PostgreSQL used as a data-warehouse.

We are using an internal tool to parallelize via sessions and this box is
completely CPU bound (same NVMes, 3TB RAM as the new one, but only 48 cores
and no HT).


I'd welcome seeing profiles of that - I'm working quite heavily on
speeding up analytics workloads for pg.


Here:

https://github.com/oberstet/scratchbox/raw/master/cruncher/adr_stats/ADR-PostgreSQL-READ-Statistics.pdf

https://github.com/oberstet/scratchbox/tree/master/cruncher/adr_stats

Cheers,
/Tobias




Greetings,

Andres Freund





--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: GIN non-intrusive vacuum of posting tree

2017-01-24 Thread Jeff Davis
On Tue, Jan 24, 2017 at 3:18 AM, Andrew Borodin  wrote:
> Technically, approach of locking a subtree is not novel. Lehman and
> Yao focused on "that any process for manipulating the tree uses only a
> small (constant) number of locks at any time." We are locking unknown
> and possibly large amount of pages.

By the way, can you show me where the Lehman and Yao paper addresses
page recycling?

It says that one approach is to allow fewer than K entries on a leaf
node; presumably as few as zero. But it doesn't seem to show how to
remove all references to the page and recycle it in a new place in the
tree.

Regards,
 Jeff Davis


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance improvement for joins where outer side is unique

2017-01-24 Thread Tom Lane
[ getting back to this at long last ]

David Rowley  writes:
> However, having said that, I'm not sure why we'd need outer_unique
> available so we'd know that we could skip mark/restore. I think
> inner_unique is enough for this purpose. Take the comment from
> nodeMergejoin.c:

> * outer: (0 ^1 1 2 5 5 5 6 6 7) current tuple: 1
>  * inner: (1 ^3 5 5 5 5 6) current tuple: 3
>  ...
> *
>  * Consider the above relations and suppose that the executor has
>  * just joined the first outer "5" with the last inner "5". The
>  * next step is of course to join the second outer "5" with all
>  * the inner "5's". This requires repositioning the inner "cursor"
>  * to point at the first inner "5". This is done by "marking" the
>  * first inner 5 so we can restore the "cursor" to it before joining
>  * with the second outer 5. The access method interface provides
>  * routines to mark and restore to a tuple.

> If only one inner "5" can exist (unique_inner), then isn't the inner
> side already in the correct place, and no restore is required?

Hmm ... let me see whether I have my head wrapped around this correctly.

What I had in mind is a different optimization.  When the inner is not
unique, we can't know we're done joining the first outer "5" until we
reach the inner "6".  At that point, what we normally do is advance the
outer by one row and back up the inner to the mark point (the first inner
"5").  But the only reason to back up is that the new outer row might join
to the same inner rows the previous one did.  If we know the outer side is
unique, then those inner rows can't join to the new outer row so no need
to back up.  So this requires no real change to the mergejoin algorithm,
we just skip the mark and restore steps.

I think what you're saying is that, if we know the inner side is unique,
we can also skip mark/restore overhead; but it would work a bit
differently.  After joining two rows with equal keys, instead of advancing
the inner as per standard algorithm, we'd need to advance the outer side.
(This is OK because we know the next inner can't join to this same outer.
But we don't know if the next outer can join to this inner.)  We advance
inner only when current inner < current outer, so we're done with that
inner and need never rewind.  So this is a more fundamental algorithm
change but it gets the same performance benefit.
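A rough sketch of that inner_unique loop (pseudocode with hypothetical
helpers, not executor code):

    for (;;)
    {
        int cmp = compare_keys(outer, inner);

        if (cmp == 0)
        {
            emit_join_row(outer, inner);
            outer = next_outer();   /* next outer may match this same inner */
        }
        else if (cmp < 0)
            outer = next_outer();   /* this outer has no partner */
        else
            inner = next_inner();   /* inner < outer: done with this inner */

        if (outer == NULL || inner == NULL)
            break;
    }

The inner cursor only ever moves forward, so no mark/restore is needed.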

So the question is, if we can skip mark/restore overhead when we know that
either input is unique, is it necessary for the planner to account for
both ways explicitly?  Given the general symmetry of mergejoins, it might
be okay for the planner to preferentially generate plans with the unique
input on the inside, and not worry about optimizing in this way when it's
on the outside.

Now, JOIN_SEMI and JOIN_ANTI cases are *not* symmetric, since we don't
implement reverse-semi or reverse-anti join modes.  But I don't know that
there's anything to be won by noticing that the outer side is unique in
those cases.  I think probably we can apply the same technique of
advancing outer not inner, and never rewinding, in those join modes
whether or not either input is known unique.

Another asymmetry is that if one input is likely to be empty, it's better
to put that one on the outside, because if it is empty we don't have to
fetch anything from the other input (thus saving its startup cost).  But
the planner doesn't account for that anyway because it never believes
non-dummy relations are truly empty; so even if it's getting this right
today, it's purely by chance.

I can't think of any other asymmetries offhand.

In short, I think you are right that it might be enough to account for
inner uniqueness only, and not worry about it for the outer side, even
for mergejoin.  This means my previous objection is wrong and we don't
really have to allow for a future extension of that kind while choosing
the notation the planner uses.

So ... would you rather go back to the previous notation (extra
JoinTypes), or do you like the separate boolean better anyway?

Sorry for jerking you back and forth like this, but sometimes the
correct path isn't apparent from the start.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Andres Freund
Hi,


On 2017-01-24 18:11:09 +0100, Tobias Oberstein wrote:
> I have done lots of benchmarking over the last days on a massive box, and I
> can provide numbers that I think show that the impact can be significant.

> Above number was using psync FIO engine .. with libaio, it's at 9.7 mio with
> much lower CPU load - but this doesn't apply to PG of course.

> Switching to sync engine, it drops to 9.1 mio - but the system load then is
> also much higher!

I doubt those have very much to do with postgres - I'd quite strongly
assume that it'd get more than swamped with doing actualy work, and with
buffering the frequently accessed stuff in memory.


> What I am trying to say is: the syscall overhead of doing lseek/read/write
> instead of pread/pwrite does become visible and hurt at a certain point.

Sure - but the question is whether it's measurable when you do actual
work.


I'm much less against this change than Tom, but doing artificial syscall
microbenchmark seems unlikely to make a big case for using it in
postgres, where it's part of vastly more expensive operations (like
actually reading data afterwards, exclusive locks, ...).


> This isn't academic, as we have experience (in prod) with a similarly
> designed box and PostgreSQL used as a data-warehouse.
> 
> We are using an internal tool to parallelize via sessions and this box is
> completely CPU bound (same NVMes, 3TB RAM as the new one, but only 48 cores
> and no HT).

I'd welcome seeing profiles of that - I'm working quite heavily on
speeding up analytics workloads for pg.


Greetings,

Andres Freund


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] lseek/read/write overhead becomes visible at scale ..

2017-01-24 Thread Tobias Oberstein

Hi guys,

please bear with me, this is my first post here. Please also excuse the length 
.. I was trying to do all my homework before posting here;)


The overhead of lseek/read/write vs pread/pwrite (or even 
preadv/pwritev) was previously discussed here


https://www.postgresql.org/message-id/flat/CABUevEzZ%3DCGdmwSZwW9oNuf4pQZMExk33jcNO7rseqrAgKzj5Q%40mail.gmail.com#CABUevEzZ=cgdmwszww9onuf4pqzmexk33jcno7rseqragkz...@mail.gmail.com

The thread ends with

"Well, my point remains that I see little value in messing with
long-established code if you can't demonstrate a benefit that's clearly
above the noise level."

I have done lots of benchmarking over the last days on a massive box, 
and I can provide numbers that I think show that the impact can be 
significant.


Our storage tops out at 9.4 million random 4kB read IOPS.

Storage consists of 8 x Intel P3608 4TB NVMe (which logically is 16 NVMe 
block devices).


Above number was using psync FIO engine .. with libaio, it's at 9.7 mio 
with much lower CPU load - but this doesn't apply to PG of course.


Switching to sync engine, it drops to 9.1 mio - but the system load then 
is also much higher!


In a way, our massive CPU (4 x E7 8880 with 88 cores / 176 threads) hides 
the impact of sync vs psync.


So, with less CPU, the syscall overhead kicks in (we are CPU bound).

It also becomes much more visible with Linux MD in the mix, because MD 
comes with its own overhead/bottleneck, and then our CPU cannot hide 
the overhead of sync vs psync anymore:


sync on MD: IOPS=1619k
psync on MD: IOPS=4289k
sync on non-MD: IOPS=9165k
psync on non-MD: IOPS=9410k

Please find all the details here

https://github.com/oberstet/scratchbox/tree/master/cruncher/sync-engines

Note: MD has lock contention (lock_qsc) - I am going down that rabbit 
hole too. But this is only related to PG in that the negative impacts 
multiply.


What I am trying to say is: the syscall overhead of doing 
lseek/read/write instead of pread/pwrite does become visible and hurt at a 
certain point.


I totally agree with the entry citation ("show up numbers first!"), but 
I think I have shown numbers;)


I'd love to get the 9.4 mio IOPS right through MD and XFS up to PG 
(yeah, I know, PG does 8kB, but it'll be similar).


Cheers,
/Tobias

PS:
This isn't academic, as we have experience (in prod) with a similarly 
designed box and PostgreSQL used as a data-warehouse.


We are using an internal tool to parallelize via sessions and this box 
is completely CPU bound (same NVMes, 3TB RAM as the new one, but only 48 
cores and no HT).


Squeezing out CPU and improving CPU usage efficiency is hence very 
important for us.



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-24 Thread Daniel Verite
  I just wrote:

> - add enum-style suggestions on invalid input for \pset x, \pset pager,
>  and \set of ECHO, ECHO_HIDDEN, ON_ERROR_ROLLBACK, COMP_KEYWORD_CASE,
>  HISTCONTROL, VERBOSITY, SHOW_CONTEXT, \x, \pager

There's no such thing as \pager, I meant to write:

  \pset x, \pset pager,
  and \set of ECHO, ECHO_HIDDEN, ON_ERROR_ROLLBACK, COMP_KEYWORD_CASE,
  HISTCONTROL, VERBOSITY, SHOW_CONTEXT


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-24 Thread Daniel Verite
Tom Lane wrote:

> I took a quick look through this.  It seems to be going in generally
> the right direction, but here's a couple of thoughts:

Here's an update with these changes:

per Tom's suggestions upthread:

- change ParseVariableBool() signature to return validity as bool.

- remove ParseCheckVariableNum() in favor of using tightened up
  ParseVariableNum() and GetVariableNum().

- updated header comments in variables.h

other changes:

- autocommit_hook rejects transitions from OFF to ON when inside a
transaction, per suggestion of Rahila Syed (which was the original
motivation for the set of changes of this patch).

- slight doc update for HISTCONTROL (values outside of the enum no longer
allowed)

- add enum-style suggestions on invalid input for \pset x, \pset pager,
  and \set of ECHO, ECHO_HIDDEN, ON_ERROR_ROLLBACK, COMP_KEYWORD_CASE,
  HISTCONTROL, VERBOSITY, SHOW_CONTEXT, \x, \pager

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 9915731..042d277 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3213,9 +3213,8 @@ bar
  list. If set to a value of ignoredups, lines
  matching the previous history line are not entered. A value of
  ignoreboth combines the two options. If
- unset, or if set to none (or any other value
- than those above), all lines read in interactive mode are
- saved on the history list.
+ unset, or if set to none, all lines read
+ in interactive mode are saved on the history list.
 
 
 
diff --git a/src/bin/psql/command.c b/src/bin/psql/command.c
index 4139b77..091a138 100644
--- a/src/bin/psql/command.c
+++ b/src/bin/psql/command.c
@@ -253,26 +253,31 @@ exec_command(const char *cmd,
         opt1 = read_connect_arg(scan_state);
         if (opt1 != NULL && strncmp(opt1, prefix, sizeof(prefix) - 1) == 0)
         {
-            reuse_previous =
-                ParseVariableBool(opt1 + sizeof(prefix) - 1, prefix) ?
-                TRI_YES : TRI_NO;
-
-            free(opt1);
-            opt1 = read_connect_arg(scan_state);
+            bool        on_off;
+            success = ParseVariableBool(opt1 + sizeof(prefix) - 1, prefix, &on_off);
+            if (success)
+            {
+                reuse_previous = on_off ? TRI_YES : TRI_NO;
+                free(opt1);
+                opt1 = read_connect_arg(scan_state);
+            }
         }
         else
             reuse_previous = TRI_DEFAULT;
 
-        opt2 = read_connect_arg(scan_state);
-        opt3 = read_connect_arg(scan_state);
-        opt4 = read_connect_arg(scan_state);
+        if (success)    /* do not attempt to connect if reuse_previous argument was invalid */
+        {
+            opt2 = read_connect_arg(scan_state);
+            opt3 = read_connect_arg(scan_state);
+            opt4 = read_connect_arg(scan_state);
 
-        success = do_connect(reuse_previous, opt1, opt2, opt3, opt4);
+            success = do_connect(reuse_previous, opt1, opt2, opt3, opt4);
 
+            free(opt2);
+            free(opt3);
+            free(opt4);
+        }
         free(opt1);
-        free(opt2);
-        free(opt3);
-        free(opt4);
     }
 
     /* \cd */
@@ -1548,7 +1553,7 @@ exec_command(const char *cmd,
                                      OT_NORMAL, NULL, false);
 
         if (opt)
-            pset.timing = ParseVariableBool(opt, "\\timing");
+            success = ParseVariableBool(opt, "\\timing", &pset.timing);
         else
             pset.timing = !pset.timing;
         if (!pset.quiet)
@@ -2660,7 +2665,16 @@ do_pset(const char *param, const char *value, printQueryOpt *popt, bool quiet)
         if (value && pg_strcasecmp(value, "auto") == 0)
             popt->topt.expanded = 2;
         else if (value)
-            popt->topt.expanded = ParseVariableBool(value, param);
+        {
+            bool        on_off;
+            if (ParseVariableBool(value, NULL, &on_off))
+                popt->topt.expanded = on_off ? 1 : 0;
+            else
+            {
+                PsqlVarEnumError(param, value, "on, off, auto");
+                return false;
+            }
+        }
         else

Re: [HACKERS] Proposal : For Auto-Prewarm.

2017-01-24 Thread Mithun Cy
On Fri, Oct 28, 2016 at 6:36 AM, Tsunakawa, Takayuki
 wrote:
> I welcome this feature!  I remember pg_hibernate did this.   I wonder what 
> happened to pg_hibernate.  Did you check it?
Thanks. When I checked pg_hibernate, there were two things people
complained about. One: buffer loading starts only after recovery is
finished and the database has reached a consistent state. Two: it can
replace existing buffers that were loaded due to recovery or by newly
connected clients. This solution tries to solve both.

-- 
Thanks and Regards
Mithun C Y
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Improvements in psql hooks for variables

2017-01-24 Thread Daniel Verite
Tom Lane wrote:

> However, it only really works if psql never overwrites the values
> after startup, whereas I believe all of these are overwritten by
> a \c command.

Yes, they are reset to reflect the properties of the new connection.

> So maybe it's more user-friendly to make these variables fully
> reserved, even at the risk of breaking existing scripts.  But
> I don't think it's exactly an open-and-shut question.

You mean if we make that fail:  \set ENCODING UTF8
it's going to make that fail too:
  SELECT something AS "ENCODING"[,...]  \gset
and I agree it's not obvious that this trade-off has to be
made. Personally I'm fine with the status quo and will
not add that hook into the patch unless pressed to.

Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical-replication.sgml improvements

2017-01-24 Thread Erik Rijkers

On 2017-01-24 17:29, Robert Haas wrote:

On Fri, Jan 20, 2017 at 11:00 AM, Erik Rijkers  wrote:

logical-replication.sgml.diff changes


I had a look at this but these are not all open-and-shut cases.  In
many cases I think your proposed wording is better, but it's probably
a good idea for the people who were involved in the development of the
logical replication feature to have a look at it just to see if there
is anything with which they'd like to disagree.



Fair enough.

I agree that most changes are arguable - it really fixes only one actual 
error:


   
Subscriptions are not dumped by pg_dump by 
default but

-   can be requested using the command-line
-   option --subscriptions.
+   this can be requested using the command-line
+   option --include-subscriptions.
   


thanks,

Erik Rijkers


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread Robert Haas
On Tue, Jan 24, 2017 at 11:32 AM, Robert Haas  wrote:
> On Fri, Dec 2, 2016 at 1:28 AM, Fabien COELHO  wrote:
>>> Closed in 2016-11 commitfest with "returned with feedback" status.
>>> Please feel free to update the status once you submit the updated patch.
>>
>> Given the thread discussions, I do not understand why this "ready for
>> committer" patch is switched to "return with feedback", as there is nothing
>> actionable, and I've done everything required to improve the syntax and
>> implementation, and to justify why these functions are useful.
>>
>> I'm spending time to try to make something useful of pgbench, which requires
>> a bunch of patches that work together to improve it for new use cases,
>> including not being limited to the current set of operators.
>>
>> This decision is both illogical and arbitrary.
>
> I disagree.  I think his decision was probably based on this email from me:

(argh, sent too soon)

https://www.postgresql.org/message-id/ca+tgmoa0zp4a+s+kosav4qfdz-wa56vlph8me86rmpsnkvw...@mail.gmail.com

Nobody responded to that, and I have not seen any committer say that
they are in favor of this.  Against that, three committers (Tom,
Stephen, me) have taken the view that they are opposed to at least
some parts of it.  No changes to the patch have resulted from those
complaints.  So this is just submitting the same thing over and over
again and hoping that the fourth or fifth committer who looks at this
is going to have a different opinion than the first three, or fail to
notice the previous discussion.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgbench more operators & functions

2017-01-24 Thread Robert Haas
On Fri, Dec 2, 2016 at 1:28 AM, Fabien COELHO  wrote:
>> Closed in 2016-11 commitfest with "returned with feedback" status.
>> Please feel free to update the status once you submit the updated patch.
>
> Given the thread discussions, I do not understand why this "ready for
> committer" patch is switched to "return with feedback", as there is nothing
> actionable, and I've done everything required to improve the syntax and
> implementation, and to justify why these functions are useful.
>
> I'm spending time to try to make something useful of pgbench, which requires
> a bunch of patches that work together to improve it for new use cases,
> including not being limited to the current set of operators.
>
> This decision is both illogical and arbitrary.

I disagree.  I think his decision was probably based on this email from me:


-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-24 Thread Joshua D. Drake

On 01/21/2017 09:09 AM, Tom Lane wrote:

Stephen Frost  writes:

As for checksums, I do see value in them and I'm pretty sure that the
author of that particular feature did as well, or we wouldn't even have
it as an option.  You seem to be of the opinion that we might as well
just rip all of that code and work out as being useless.


Not at all; I just think that it's not clear that they are a net win
for the average user,


Tom is correct here. They are not a net win for the average user. We 
tend to forget that although we collectively have a lot of enterprise 
installs where this does matter, those do not come anywhere near the 
number of average user installs.


From an advocacy perspective, the average user install is the one we 
tend most, because that tending (in theory) will grow into something 
more fruitful, e.g. an enterprise install over time, because we 
constantly and consistently provided a reasonable and expected 
experience to the average user.


Sincerely,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Unless otherwise stated, opinions are my own.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical-replication.sgml improvements

2017-01-24 Thread Robert Haas
On Fri, Jan 20, 2017 at 11:00 AM, Erik Rijkers  wrote:
> logical-replication.sgml.diff changes

I had a look at this but these are not all open-and-shut cases.  In
many cases I think your proposed wording is better, but it's probably
a good idea for the people who were involved in the development of the
logical replication feature to have a look at it just to see if there
is anything with which they'd like to disagree.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deadlock in XLogInsert at AIX

2017-01-24 Thread Konstantin Knizhnik

More information about the problem - Postgres log contains several records:

2017-01-24 19:15:20.272 MSK [19270462] LOG:  request to flush past end 
of generated WAL; request 6/AAEBE000, currpos 6/AAEBC2B0


and they correspond to the times when the deadlock happens.
There is the following comment in xlog.c concerning this message:

/*
 * No-one should request to flush a piece of WAL that hasn't even been
 * reserved yet. However, it can happen if there is a block with a bogus
 * LSN on disk, for example. XLogFlush checks for that situation and
 * complains, but only after the flush. Here we just assume that to mean
 * that all WAL that has been reserved needs to be finished. In this
 * corner-case, the return value can be smaller than 'upto' argument.
 */

So it looks like it should not happen.
The first thing to suspect is the spinlock implementation, which is different 
for GCC and XLC.
But ... if I rebuild Postgres without spinlocks, then the problem is 
still reproduced.


On 24.01.2017 17:47, Konstantin Knizhnik wrote:

Hi Hackers,

We are running Postgres on AIX and encountered two strange problems: 
active zombie processes and a deadlock in the XLOG writer.
The first problem I will explain in a separate mail; now I am mostly 
concerned about the deadlock.
It is irregularly reproduced with standard pgbench launched with 100 
connections.


It sometimes happens with the 9.6 stable version of Postgres, but only when 
it is compiled with the xlc compiler.
We failed to reproduce the problem with GCC. So it looks like a bug 
in the compiler or the xlc-specific atomics implementation...

But there are few moments which contradicts with this hypothesis:

1. The problem is reproduced with Postgres built without optimization. 
Usually compiler bugs affect only optimized code.

2. Disabling atomics doesn't help.
3. Without optimization and with LOCK_DEBUG defined, the time to 
reproduce the problem increases significantly. With optimized code 
it is almost always reproduced in a few minutes.

With debug version it usually takes much more time.

But the most confusing thing is the stack trace:

(dbx) where
semop(??, ??, ??) at 0x91f5790
PGSemaphoreLock(sema = 0x0a0044b95928), line 387 in "pg_sema.c"
unnamed block in LWLockWaitForVar(lock = 0x0a00d980, valptr = 
0x0a00d9a8, oldval = 102067874256, newval = 
0x0fff9c10), line 1666 in "lwlock.c"
LWLockWaitForVar(lock = 0x0a00d980, valptr = 
0x0a00d9a8, oldval = 102067874256, newval = 
0x0fff9c10), line 1666 in "lwlock.c"
unnamed block in WaitXLogInsertionsToFinish(upto = 102067874328), line 
1583 in "xlog.c"

WaitXLogInsertionsToFinish(upto = 102067874328), line 1583 in "xlog.c"
AdvanceXLInsertBuffer(upto = 102067874256, opportunistic = '\0'), line 
1916 in "xlog.c"

unnamed block in GetXLogBuffer(ptr = 102067874256), line 1697 in "xlog.c"
GetXLogBuffer(ptr = 102067874256), line 1697 in "xlog.c"
CopyXLogRecordToWAL(write_len = 70, isLogSwitch = '\0', rdata = 
0x00011007ce10, StartPos = 102067874256, EndPos = 102067874328), 
line 1279 in "xlog.c"
XLogInsertRecord(rdata = 0x00011007ce10, fpw_lsn = 102067718328), 
line 1011 in "xlog.c"
unnamed block in XLogInsert(rmid = '\n', info = '@'), line 453 in 
"xloginsert.c"

XLogInsert(rmid = '\n', info = '@'), line 453 in "xloginsert.c"
log_heap_update(reln = 0x000110273540, oldbuf = 40544, newbuf = 
40544, oldtup = 0x0fffa2a0, newtup = 0x0001102bb958, 
old_key_tuple = (nil), all_visible_cleared = '\0', 
new_all_visible_cleared = '\0'), line 7708 in "heapam.c"
unnamed block in heap_update(relation = 0x000110273540, otid = 
0x0fffa6f8, newtup = 0x0001102bb958, cid = 1, crosscheck = 
(nil), wait = '^A', hufd = 0x0fffa5b0, lockmode = 
0x0fffa5c8), line 4212 in "heapam.c"
heap_update(relation = 0x000110273540, otid = 0x0fffa6f8, 
newtup = 0x0001102bb958, cid = 1, crosscheck = (nil), wait = '^A', 
hufd = 0x0fffa5b0, lockmode = 0x0fffa5c8), line 4212 
in "heapam.c"
unnamed block in ExecUpdate(tupleid = 0x0fffa6f8, oldtuple = 
(nil), slot = 0x0001102bb308, planSlot = 0x0001102b4630, 
epqstate = 0x0001102b2cd8, estate = 0x0001102b29e0, canSetTag 
= '^A'), line 937 in "nodeModifyTable.c"
ExecUpdate(tupleid = 0x0fffa6f8, oldtuple = (nil), slot = 
0x0001102bb308, planSlot = 0x0001102b4630, epqstate = 
0x0001102b2cd8, estate = 0x0001102b29e0, canSetTag = '^A'), 
line 937 in "nodeModifyTable.c"
ExecModifyTable(node = 0x0001102b2c30), line 1516 in 
"nodeModifyTable.c"

ExecProcNode(node = 0x0001102b2c30), line 396 in "execProcnode.c"
ExecutePlan(estate = 0x0001102b29e0, planstate = 
0x0001102b2c30, use_parallel_mode = '\0', operation = CMD_UPDATE, 
sendTuples = '\0', numberTuples = 0, direction = ForwardScanDirection, 
dest = 0x0001102b7520), line 1569 in "execMain.c"
standard_ExecutorRun(queryDesc = 0x0001102b25c0, direction = 

Re: [HACKERS] [ patch ] pg_dump: new --custom-fetch-table and --custom-fetch-value parameters

2017-01-24 Thread Robert Haas
On Fri, Jan 20, 2017 at 12:52 AM, Andrea Urbani  wrote:
> I have used "custom" parameters because I want to decrease the fetch size 
> only on the tables with big blob fields. If we remove the 
> "custom-fetch-table" parameter and we provide only the "fetch-size" parameter 
> all the tables will use the new fetch size and the execution time will be 
> slower (according to my few tests). But just "fetch-size" will be faster to 
> use and maybe more clear.
> Well, how to go on? I add it to the commitfest and somebody will decide and 
> fix it?

OK, so I think the idea is that --custom-fetch-size affects only the
tables mentioned in --custom-fetch-table.  I understand why you want
to do it that way but it's kind of messy.  Suppose somebody else comes
along and wants to customize some other thing for some other set of
tables.  Then we'll have --custom2-otherthing and --custom2-tables?
Blech.

Interestingly, this isn't the first attempt to solve a problem of this
type.  Kyotaro Horiguchi ran into a similar issue with postgres_fdw
trying to fetch too much data at once from a remote server:

https://www.postgresql.org/message-id/20150122.192739.164180273.horiguchi.kyotaro%40lab.ntt.co.jp

In the end, all that got done there was a table-level-configurable
fetch limit, and we could do the same thing here (e.g. by adding a
dummy storage parameter that only pg_dump uses).  But I think what we
really ought to do is what Kyotaro Horiguchi proposed originally: have
a way to limit the FETCH command to a certain number of bytes.  If
that number of bytes is exceeded, the FETCH stops after that row even
if the number of rows the user requested isn't fulfilled yet.  The
user can FETCH again if they want more.

Tom wasn't a big fan of this idea, but I thought it was clever and
still do.  And it's undeniable that it provides a much better solution
to this problem than forcing the user to manually tweak the fetch size
based on their installation-specific knowledge of which tables have
blobs large enough that returning 100 rows at a time will exhaust the
local server's memory.
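To illustrate the difference (a hedged sketch with libpq, not pg_dump's
actual code; result cleanup and error checks are elided, and the
MAX_BYTES clause is the hypothetical syntax under discussion, which does
not exist today):

    #include <libpq-fe.h>

    static void
    fetch_in_chunks(PGconn *conn)
    {
        PQexec(conn, "BEGIN");
        PQexec(conn, "DECLARE c NO SCROLL CURSOR FOR SELECT * FROM bigtable");
        for (;;)
        {
            /* today: a fixed row count, which can exhaust memory on rows
             * with huge blobs */
            PGresult   *res = PQexec(conn, "FETCH 100 FROM c");

            /* proposed: additionally stop after ~N bytes, e.g. something
             * like "FETCH 100 FROM c MAX_BYTES 16777216" (hypothetical) */
            if (PQntuples(res) == 0)
            {
                PQclear(res);
                break;
            }
            /* ... write the rows out ... */
            PQclear(res);
        }
        PQexec(conn, "CLOSE c");
        PQexec(conn, "COMMIT");
    }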

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-24 Thread Torsten Zuehlsdorff

On 21.01.2017 19:35, Tom Lane wrote:

Andres Freund  writes:

Sure, it might be easy, but we don't have it.  Personally I think
checksums just aren't even ready for prime time. If we had:
- ability to switch on/off at runtime (early patches for that have IIRC
  been posted)
- *builtin* tooling to check checksums for everything
- *builtin* tooling to compute checksums after changing setting
- configurable background sweeps for checksums


Yeah, and there's a bunch of usability tooling that we don't have,
centered around "what do you do after you get a checksum error?".
AFAIK there's no way to check or clear such an error; but without
such tools, I'm afraid that checksums are as much of a foot-gun
as a benefit.


I wanted to raise the same issue. A "something is broken" flag is fine 
for keeping more things from getting broken. But if you can't repair 
them, it's not very useful.


Since I'm a heavy user of ZFS: it has checksums, and if you enable 
shadow copies or use a RAID, checksums are helpful, since they allow 
you to recover from the problems.


I personally would prefer to enable checksums manually and then get the 
possibility to repair damage. Manually, because this would at least 
double the needed space.


Greetings,
Torsten


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Checksums by default?

2017-01-24 Thread Torsten Zuehlsdorff



On 21.01.2017 19:37, Stephen Frost wrote:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

Stephen Frost  writes:

Because I see having checksums as, frankly, something we always should
have had (as most other databases do, for good reason...) and because
they will hopefully prevent data loss.  I'm willing to give us a fair
bit to minimize the risk of losing data.


To be perfectly blunt, that's just magical thinking.  Checksums don't
prevent data loss in any way, shape, or form.  In fact, they can *cause*
data loss, or at least make it harder for you to retrieve your data,
in the event of bugs causing false-positive checksum failures.


This is not a new argument, at least to me, and I don't agree with it.


I don't agree either. Yes, statistically it is more likely that checksums 
cause data loss. The I/O is greater, therefore the disk has more to do 
and fails sooner.
But the same is true for RAID: adding more disks increases the odds of a 
disk failure.


So: yes, if you use checksums on a single disk, they are more likely to 
cause problems. But if you manage it right (like ZFS, for example) it's 
an overall gain.



What checksums can do for you, perhaps, is notify you in a reasonably
timely fashion if you've already lost data due to storage-subsystem
problems.  But in a pretty high percentage of cases, that fact would
be extremely obvious anyway, because of visible data corruption.


Exactly, and that awareness will allow a user to prevent further data
loss or corruption.  Slow corruption over time is a very much known and
accepted real-world case that people do experience, as well as bit
flipping enough for someone to write a not-that-old blog post about
them:

https://blogs.oracle.com/ksplice/entry/attack_of_the_cosmic_rays1

A really nice property of checksums on pages is that they also tell you
what data you *didn't* lose, which can be extremely valuable.


Indeed!

Greetings,
Torsten


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Reindent table partitioning code.

2017-01-24 Thread Robert Haas
On Tue, Jan 24, 2017 at 10:28 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> Reindent table partitioning code.
>
> Oh, thank you, I was starting to get annoyed with that too.

Glad you like.  The pgindent damage introduced by the logical
replication commits is even more extensive, but I didn't want to touch
that without discussion.  Peter might want to look for a time when not
too many patches are pending to do something similar, though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Transaction traceability - txid_status(bigint)

2017-01-24 Thread Robert Haas
On Mon, Jan 23, 2017 at 1:32 AM, Craig Ringer  wrote:
> Rebased patch attached. I've split the clog changes out from
> txid_status() its self.

I think it's fairly surprising that TruncateCLOG() has responsibility
for writing the xlog record that protects advancing
ShmemVariableCache->oldestXid, but not the responsibility for actually
advancing that value.  In other words, I think the AdvanceOldestXid()
call in vac_truncate_clog() should be moved into TruncateClog().
(Similarly, one wonders why AdvanceOldestCommitTsXid() isn't the
responsibility of TruncateCommitTs().)
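In other words, something shaped like this (hypothetical code sketching
the suggestion, not the patch; AdvanceOldestXid() stands in for whatever
the patch names that step):

    void
    TruncateCLOG(TransactionId oldestXact, Oid oldestxid_datoid)
    {
        int     cutoffPage = TransactionIdToPage(oldestXact);

        /* WAL-log the truncation together with the new oldestXid ... */
        WriteTruncateXlogRec(cutoffPage, oldestXact, oldestxid_datoid);

        /* ... advance both oldestXid and oldestXidDB in shared memory ... */
        AdvanceOldestXid(oldestXact, oldestxid_datoid);

        /* ... and only then drop the old clog segments */
        SimpleLruTruncate(ClogCtl, cutoffPage);
    }

so the WAL record and the shared-memory update can't drift apart.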

I think it is not correct to advance oldestXid but not oldestXidDB.
Otherwise, GetNewTransactionId() might complain about the wrong
database.

The way that SetTransactionIdLimit() now works looks a bit dangerous.
xidWrapLimit, xidStopLimit, and xidWarnLimit are computed based on the
passed-in oldestXid value and written straight into shared memory.
But the shared memory copy of oldestXid could have a different value.
I'm not sure if that breaks anything, but it certainly weakens any
confidence callers might have had that all those values are consistent
with each other.
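For reference, all of those limits are pure functions of a single
oldestXid (simplified from SetTransactionIdLimit(); the extra
adjustments that skip over FirstNormalTransactionId are abbreviated):

    xidWrapLimit = oldestXid + (MaxTransactionId >> 1);
    if (xidWrapLimit < FirstNormalTransactionId)
        xidWrapLimit += FirstNormalTransactionId;   /* skip special XIDs */

    xidStopLimit = xidWrapLimit - 1000000;      /* refuse new XIDs past this */
    xidWarnLimit = xidStopLimit - 10000000;     /* start warning here */

which is why pairing them with a different shared-memory oldestXid leaves
the set internally inconsistent.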

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Assignment of valid collation for SET operations on queries with UNKNOWN types.

2017-01-24 Thread Tom Lane
I wrote:
> Michael Paquier  writes:
>> The table of Pseudo-Types needs to be updated as well with unknown in
>> datatype.sgml.

> Check.

Here's an updated patch with doc changes.  Aside from that one,
I tried to spell "pseudo-type" consistently, and I changed one
place where we were calling something a pseudo-type that isn't.

regards, tom lane

diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
index 3bc6854..9ef7b4a 100644
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*** SELECT * FROM pg_attribute
*** 4661,4666 
--- 4661,4670 
 
  
 
+ unknown
+
+ 
+
  opaque
 
  
*** SELECT * FROM pg_attribute
*** 4782,4789 
 
  
 
  opaque
! An obsolete type name that formerly served all the above purposes.
 

   
--- 4786,4800 
 
  
 
+ unknown
+ Identifies a not-yet-resolved type, e.g. of an undecorated
+  string literal.
+
+ 
+
  opaque
! An obsolete type name that formerly served many of the above
!  purposes.
 

   
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index d7117cb..aebe898 100644
*** a/doc/src/sgml/ddl.sgml
--- b/doc/src/sgml/ddl.sgml
*** WHERE c.altitude > 500 AND c.tableoid
*** 2579,2585 
  

 Another way to get the same effect is to use the regclass
!pseudo-type, which will print the table OID symbolically:
  
  
  SELECT c.tableoid::regclass, c.name, c.altitude
--- 2579,2585 
  

 Another way to get the same effect is to use the regclass
!alias type, which will print the table OID symbolically:
  
  
  SELECT c.tableoid::regclass, c.name, c.altitude
diff --git a/doc/src/sgml/plhandler.sgml b/doc/src/sgml/plhandler.sgml
index 0fc5d7b..57a2a05 100644
*** a/doc/src/sgml/plhandler.sgml
--- b/doc/src/sgml/plhandler.sgml
***
*** 26,32 
  language such as C, using the version-1 interface, and registered
  with PostgreSQL as taking no arguments
  and returning the type language_handler.  This
! special pseudotype identifies the function as a call handler and
  prevents it from being called directly in SQL commands.
  For more details on C language calling conventions and dynamic loading,
  see .
--- 26,32 
  language such as C, using the version-1 interface, and registered
  with PostgreSQL as taking no arguments
  and returning the type language_handler.  This
! special pseudo-type identifies the function as a call handler and
  prevents it from being called directly in SQL commands.
  For more details on C language calling conventions and dynamic loading,
  see .
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 5f89db5..e315548 100644
*** a/doc/src/sgml/protocol.sgml
--- b/doc/src/sgml/protocol.sgml
***
*** 668,674 
   number of parameter symbols ($n)
   used in the query string.  Another special case is that a parameter's
   type can be specified as void (that is, the OID of the
!  void pseudotype).  This is meant to allow parameter symbols
   to be used for function parameters that are actually OUT parameters.
   Ordinarily there is no context in which a void parameter
   could be used, but if such a parameter symbol appears in a function's
--- 668,674 
   number of parameter symbols ($n)
   used in the query string.  Another special case is that a parameter's
   type can be specified as void (that is, the OID of the
!  void pseudo-type).  This is meant to allow parameter symbols
   to be used for function parameters that are actually OUT parameters.
   Ordinarily there is no context in which a void parameter
   could be used, but if such a parameter symbol appears in a function's
diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index ef623d5..30792f4 100644
*** a/doc/src/sgml/queries.sgml
--- b/doc/src/sgml/queries.sgml
*** SELECT * FROM vw_getfoo;
*** 762,768 
   In some cases it is useful to define table functions that can
   return different column sets depending on how they are invoked.
   To support this, the table function can be declared as returning
!  the pseudotype record.  When such a function is used in
   a query, the expected row structure must be specified in the
   query itself, so that the system can know how to parse and plan
   the query.  This syntax looks like:
--- 762,768 
   In some cases it is useful to define table functions that can
   return different column sets depending on how they are invoked.
   To support this, the table function can be declared as returning
!  the pseudo-type record.  When such a function is used in
  a query, the expected row structure must be specified in the
  query itself, so that the system can know how to parse and plan
  the query.  This syntax looks like:

Re: [HACKERS] Active zombies at AIX

2017-01-24 Thread Konstantin Knizhnik



On 24.01.2017 18:26, Tom Lane wrote:

Konstantin Knizhnik  writes:

But ps shows that the status of the process is 
[14:46:02]root@postgres:~ # ps -elk | grep 25691556
   * A - 25691556 - - - - - 

As far as I could find by googling, this means that the process is
not actually a zombie yet, so it's not the postmaster's fault.

Apparently it's possible in some versions of AIX for an exiting process to
get stuck while releasing its reference to a socket, though I couldn't
find much detail about that.  I wonder how old your AIX is ...


It is AIX 7.1 (I expect that it is the most recent version of AIX).




regards, tom lane


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Active zombies at AIX

2017-01-24 Thread Tom Lane
Konstantin Knizhnik  writes:
> But ps shows that the status of the process is 
> [14:46:02]root@postgres:~ # ps -elk | grep 25691556
>   * A - 25691556 - - - - - 

As far as I could find by googling, this means that the process is
not actually a zombie yet, so it's not the postmaster's fault.

Apparently it's possible in some versions of AIX for an exiting process to
get stuck while releasing its reference to a socket, though I couldn't
find much detail about that.  I wonder how old your AIX is ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Add pg_sequence system catalog

2017-01-24 Thread Stephen Frost
Peter,

* Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> On 1/19/17 11:03 AM, Stephen Frost wrote:
> > I'd suggest using our usual approach in pg_dump, which is matching based
> > on the OID, like so:
> > 
> > WHERE c.oid = '%u'::oid
> > 
> > The OID is in: tbinfo->dobj.catId.oid
> > 
> > Also, you should move the selectSourceSchema() into the per-version
> > branches and set it to 'pg_catalog' for PG10 and up, which would allow
> > you to avoid having to qualify the table names, et al.
> 
> Does the attached patch look correct to you?

On a one-over, yes, that looks correct and avoids the issue of running
in a user's schema.

It wouldn't hurt to add a comment as to why things are so different in
PG10 than other versions, ie:

/*
 * In PG10, sequence meta-data was moved into pg_sequence, so switch to
 * the pg_catalog schema instead of operating in a user schema and pull
 * the necessary meta-data from there.
 */
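
For illustration, the PG10 branch could then look roughly like this (a
sketch under the assumptions above, not the actual patch; the column
list is abbreviated):

if (fout->remoteVersion >= 100000)
{
    /* Sequence metadata lives in pg_sequence as of PG10. */
    selectSourceSchema(fout, "pg_catalog");

    appendPQExpBuffer(query,
                      "SELECT seqstart, seqincrement, seqmax, seqmin, "
                      "seqcache, seqcycle "
                      "FROM pg_sequence "
                      "WHERE seqrelid = '%u'::oid",
                      tbinfo->dobj.catId.oid);
}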

Thanks!

Stephen




[HACKERS] Active zombies at AIX

2017-01-24 Thread Konstantin Knizhnik

Hi hackers,

Yet another story about AIX. For some reason AIX is very slow at
cleaning up zombie processes.
If we launch pgbench with the -C parameter, the limit on the maximal
number of connections is exhausted very soon.
If the maximal number of connections is set to 1000, then after ten
seconds of pgbench activity we get about 900 zombie processes, and it
takes about 100 seconds (!) before all of them are terminated.

proctree shows a lot of defunct processes:

[14:44:41]root@postgres:~ # proctree 26084446
26084446 /opt/postgresql/xlc/9.6/bin/postgres -D /postg_fs/postgresql/xlc
4784362 
4980786 
11403448 
11468930 
11993176 
12189710 
12517390 
13238374 
13565974 
13893826 postgres: wal writer process
14024716 
15401000 
...
25691556 

But ps shows that the status of the process is 

[14:46:02]root@postgres:~ # ps -elk | grep 25691556

 * A - 25691556 - - - - - 


A breakpoint set in the reaper() function in the postmaster shows that
each invocation of this function (called by the SIGCHLD handler)
processes 5-10 PIDs.
So there are two hypotheses: either AIX is very slow at delivering
SIGCHLD to the parent, or the exit of a process takes too much time.
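
For reference, reaper() follows the classic waitpid() loop sketched
below (simplified; the real postmaster version does much more), which
is why a single SIGCHLD delivery can collect several children at once:

#include <sys/wait.h>

static void
reaper(int signum)
{
    pid_t       pid;
    int         exitstatus;

    /* Signals are not queued, so one invocation must loop. */
    while ((pid = waitpid(-1, &exitstatus, WNOHANG)) > 0)
    {
        /* release per-child bookkeeping for "pid" here */
    }
}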


The fact that the backends are in the exiting state makes the second
hypothesis more plausible.
We have tried different Postgres configurations with local and TCP
sockets, with different amounts of shared buffers, built both with gcc
and xlc.

In all cases the behavior is similar: zombies do not want to die.
Since it is not possible to attach a debugger to a defunct process, it
is not clear how to understand what's going on.


I wonder if somebody has encountered similar problems on AIX and can
maybe suggest a solution to this problem.

Thanks in advance

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] [PATCH] Change missleading comment for tm_mon field of pg_tm structure

2017-01-24 Thread Robert Haas
On Thu, Jan 19, 2017 at 2:52 PM, Dmitry Fedin  wrote:
> According to its usage, the tm_mon (month) field has origin 1, not 0.
> This is different from the C-library version of struct tm, which
> really has origin 0.

Hmm, thanks for noticing.  Looks like it has been wrong for a really long time.
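
To spell out the two conventions (a hypothetical snippet, not part of
the patch; struct pg_tm is declared in src/include/pgtime.h):

#include <time.h>
#include "pgtime.h"

/* struct tm counts months 0-11; pg_tm counts them 1-12. */
static void
set_month_from_libc(struct pg_tm *ptm, const struct tm *ctm)
{
    ptm->tm_mon = ctm->tm_mon + 1;
}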

Your patch got garbled; it had two-space indents instead of tabs, so
it didn't apply.  It's best to attach patches rather than including
them inline.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


  1   2   >