Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread Alvaro Herrera
Bruce Momjian wrote:

> Do we still have licensing issues if we ship Postgres and OpenSSL
> together?

See
https://www.postgresql.org/message-id/20150801151410.GA28344%40awork2.anarazel.de

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread Bruce Momjian
On Tue, Dec  1, 2015 at 03:35:39PM -0500, Robert Haas wrote:
> > Well, you'd have to use MacPorts' version of the openssl libraries,
> > too, since there'd be no certainty that their headers match the
> > Apple-provided libraries (in fact, I'd bet a lot that they don't).
> > This would be a pain if you wanted to put your compiled PG executables
> > on some other Mac.
> 
> Yeah, I guess it means that people building for MacOS X will probably
> have to ship OpenSSL as a dependency, which also means that they will
> need to update it when new versions are released.  That is already a
> pretty obnoxious disease on Windows, and it's unfortunate to see it
> spreading.  It would save us a good deal of staff time here at
> EnterpriseDB if we didn't have to do new releases of everything on
> Windows every time there is an OpenSSL update.

Do we still have licensing issues if we ship Postgres and OpenSSL
together?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Rework the way multixact truncations work

2015-12-01 Thread Peter Geoghegan
On Tue, Dec 1, 2015 at 2:07 PM, Robert Haas  wrote:
> Hmm.  I read Peter's message as agreeing with Andres rather than with
> you.  And I have to say I agree with Andres as well.  I think it's
> weird to back a commit out only to put a bunch of very similar stuff
> back in.

Your interpretation was correct. I think it's surprising to structure
things this way, especially since we haven't done things this way in
the past.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: Multixact slru doesn't force WAL flushes in SlruPhysicalWritePage()

2015-12-01 Thread Robert Haas
On Sat, Nov 28, 2015 at 11:15 PM, Noah Misch  wrote:
> On Tue, Nov 10, 2015 at 11:22:47PM -0500, Noah Misch wrote:
>> On Mon, Nov 09, 2015 at 10:40:07PM +0100, Andres Freund wrote:
>> > /*
>> >  * Optional array of WAL flush LSNs associated with entries in the SLRU
>> >  * pages.  If not zero/NULL, we must flush WAL before writing pages (true
>> >  * for pg_clog, false for multixact, pg_subtrans, pg_notify).  group_lsn[]
>> >  * has lsn_groups_per_page entries per buffer slot, each containing the
>> >  * highest LSN known for a contiguous group of SLRU entries on that slot's
>> >  * page.
>> >  */
>> > XLogRecPtr *group_lsn;
>> > int lsn_groups_per_page;
>> >
>
>> Here's the multixact.c comment justifying it:
>>
>>  * XLOG interactions: this module generates an XLOG record whenever a new
>>  * OFFSETs or MEMBERs page is initialized to zeroes, as well as an XLOG record
>>  * whenever a new MultiXactId is defined.  This allows us to completely
>>  * rebuild the data entered since the last checkpoint during XLOG replay.
>>  * Because this is possible, we need not follow the normal rule of
>>  * "write WAL before data"; the only correctness guarantee needed is that
>>  * we flush and sync all dirty OFFSETs and MEMBERs pages to disk before a
>>  * checkpoint is considered complete.  If a page does make it to disk ahead
>>  * of corresponding WAL records, it will be forcibly zeroed before use anyway.
>>  * Therefore, we don't need to mark our pages with LSN information; we have
>>  * enough synchronization already.
>>
>> The comment's justification is incomplete, though.  What of pages filled over
>> the course of multiple checkpoint cycles?
>
> Having studied this further, I think it is safe for a reason given elsewhere:
>
>  * Note: we need not flush this XLOG entry to disk before proceeding. The
>  * only way for the MXID to be referenced from any data page is for
>  * heap_lock_tuple() to have put it there, and heap_lock_tuple() generates
>  * an XLOG record that must follow ours.  The normal LSN interlock between
>  * the data page and that XLOG record will ensure that our XLOG record
>  * reaches disk first.  If the SLRU members/offsets data reaches disk
>  * sooner than the XLOG record, we do not care because we'll overwrite it
>  * with zeroes unless the XLOG record is there too; see notes at top of
>  * this file.
>
> I find no flaw in the first three sentences.  In most cases, one of
> multixact_redo(), TrimMultiXact() or ExtendMultiXactOffset() will indeed
> overwrite the widowed mxid data with zeroes before the system again writes
> data in that vicinity.  We can fail to do that if a backend stalls just after
> calling GetNewMultiXactId(), then recovers and updates SLRU pages long after
> other backends filled the balance of those pages.  That's still okay; if we
> never flush the XLOG_MULTIXACT_CREATE_ID record, no xmax referring to the mxid
> will survive recovery.  I drafted the attached comment update to consolidate
> and slightly correct this justification.  (If we were designing the multixact
> subsystem afresh today, I'd vote for following the write-WAL-before-data rule
> despite believing it is not needed.  The simplicity would be worth it.)
>
> While contemplating the "stalls ... just after calling GetNewMultiXactId()"
> scenario, I noticed a race condition not involving any write-WAL-before-data
> violation.  MultiXactIdCreateFromMembers() and callees have this skeleton:
>
> GetNewMultiXactId
>   LWLockAcquire(MultiXactGenLock, LW_EXCLUSIVE)
>   ExtendMultiXactOffset()
>   ExtendMultiXactMember()
>   START_CRIT_SECTION()
>   (MultiXactState->nextMXact)++
>   MultiXactState->nextOffset += nmembers
>   LWLockRelease(MultiXactGenLock)
> XLogInsert(RM_MULTIXACT_ID, XLOG_MULTIXACT_CREATE_ID)
> RecordNewMultiXact
>   LWLockAcquire(MultiXactOffsetControlLock, LW_EXCLUSIVE)
>   *offptr = offset;  /* update pg_multixact/offsets SLRU page */
>   LWLockRelease(MultiXactOffsetControlLock)
>   LWLockAcquire(MultiXactMemberControlLock, LW_EXCLUSIVE)
>   *memberptr = members[i].xid;  /* update pg_multixact/members SLRU page */
>   *flagsptr = flagsval;  /* update pg_multixact/members SLRU page */
>   LWLockRelease(MultiXactMemberControlLock)
> END_CRIT_SECTION
>
> Between GetNewMultiXactId() and XLogInsert(), other backends will be free to
> create higher-numbered mxids.  They will skip over SLRU space the current
> backend reserved.  Future GetMultiXactIdMembers() calls rely critically on the
> current backend eventually finishing:
>
>  * 2. The next multixact may still be in process of being filled in: that
>  * is, another process may have done GetNewMultiXactId but not yet written
>  * the offset entry for that ID.  In that scenario, it is guaranteed that
>  * the offset 

Re: [HACKERS] silent data loss with ext4 / all current versions

2015-12-01 Thread Peter Eisentraut
On 11/27/15 8:18 AM, Michael Paquier wrote:
> On Fri, Nov 27, 2015 at 8:17 PM, Tomas Vondra
>  wrote:
>> > So, what's going on? The problem is that while the rename() is atomic, it's
>> > not guaranteed to be durable without an explicit fsync on the parent
>> > directory. And by default we only do fdatasync on the recycled segments,
>> > which may not force fsync on the directory (and ext4 does not do that,
>> > apparently).
> Yeah, that seems to be the way the POSIX spec clears things up.
> "If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall
> force all currently queued I/O operations associated with the file
> indicated by file descriptor fildes to the synchronized I/O completion
> state. All I/O operations shall be completed as defined for
> synchronized I/O file integrity completion."
> http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
> If I understand that right, it is guaranteed that the rename() will be
> atomic, meaning that there will be only one file even if there is a
> crash, but that we need to fsync() the parent directory as mentioned.

I don't see anywhere in the spec that a rename needs an fsync of the
directory to be durable.  I can see why that would be needed in
practice, though.  File system developers would probably be able to give
a more definite answer.




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] silent data loss with ext4 / all current versions

2015-12-01 Thread Tomas Vondra

Attached is v2 of the patch, that

(a) adds explicit fsync on the parent directory after all the rename()
calls in timeline.c, xlog.c, xlogarchive.c and pgarch.c

(b) adds START/END_CRIT_SECTION around the new fsync_fname calls
(except for those in timeline.c, as the START/END_CRIT_SECTION is
not available there)

The patch is fairly trivial and I've done some rudimentary testing, but 
I'm sure I haven't exercised all the modified paths.


regards
Tomas

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/transam/timeline.c b/src/backend/access/transam/timeline.c
index c6862a8..998e50b 100644
--- a/src/backend/access/transam/timeline.c
+++ b/src/backend/access/transam/timeline.c
@@ -437,6 +437,9 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
 		tmppath, path)));
 #endif
 
+	/* Make sure the rename is permanent by fsyncing the directory. */
+	fsync_fname(XLOGDIR, true);
+
 	/* The history file can be archived immediately. */
 	if (XLogArchivingActive())
 	{
@@ -526,6 +529,9 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
  errmsg("could not rename file \"%s\" to \"%s\": %m",
 		tmppath, path)));
 #endif
+
+	/* Make sure the rename is permanent by fsyncing the directory. */
+	fsync_fname(XLOGDIR, true);
 }
 
 /*
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f17f834..de24a09 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3282,6 +3282,10 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
 	}
 #endif
 
+	START_CRIT_SECTION();
+	fsync_fname(XLOGDIR, true);
+	END_CRIT_SECTION();
+
 	if (use_lock)
 		LWLockRelease(ControlFileLock);
 
@@ -3806,6 +3810,11 @@ RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
 #else
 		rc = unlink(path);
 #endif
+
+		START_CRIT_SECTION();
+		fsync_fname(XLOGDIR, true);
+		END_CRIT_SECTION();
+
 		if (rc != 0)
 		{
 			ereport(LOG,
@@ -5302,6 +5311,10 @@ exitArchiveRecovery(TimeLineID endTLI, XLogRecPtr endOfLog)
  errmsg("could not rename file \"%s\" to \"%s\": %m",
 		RECOVERY_COMMAND_FILE, RECOVERY_COMMAND_DONE)));
 
+	START_CRIT_SECTION();
+	fsync_fname(".", true);
+	END_CRIT_SECTION();
+
 	ereport(LOG,
 			(errmsg("archive recovery complete")));
 }
@@ -6155,6 +6168,11 @@ StartupXLOG(void)
 TABLESPACE_MAP, BACKUP_LABEL_FILE),
 		 errdetail("Could not rename file \"%s\" to \"%s\": %m.",
    TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
+
+			/* fsync the data directory to persist the rename() */
+			START_CRIT_SECTION();
+			fsync_fname(".", true);
+			END_CRIT_SECTION();
 		}
 
 		/*
@@ -6522,6 +6540,14 @@ StartupXLOG(void)
 TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
 		}
 
+		/* fsync the data directory to persist the rename() */
+		if (haveBackupLabel || haveTblspcMap)
+		{
+			START_CRIT_SECTION();
+			fsync_fname(".", true);
+			END_CRIT_SECTION();
+		}
+
 		/* Check that the GUCs used to generate the WAL allow recovery */
 		CheckRequiredParameterValues();
 
@@ -7303,6 +7329,10 @@ StartupXLOG(void)
 		 errmsg("could not rename file \"%s\" to \"%s\": %m",
 origpath, partialpath)));
 XLogArchiveNotify(partialfname);
+
+START_CRIT_SECTION();
+fsync_fname(XLOGDIR, true);
+END_CRIT_SECTION();
 			}
 		}
 	}
@@ -10906,6 +10936,11 @@ CancelBackup(void)
 		   BACKUP_LABEL_FILE, BACKUP_LABEL_OLD,
 		   TABLESPACE_MAP, TABLESPACE_MAP_OLD)));
 	}
+
+	/* fsync the data directory to persist the renames */
+	START_CRIT_SECTION();
+	fsync_fname(".", true);
+	END_CRIT_SECTION();
 }
 
 /*
diff --git a/src/backend/access/transam/xlogarchive.c b/src/backend/access/transam/xlogarchive.c
index 7af56a9..1219f4b 100644
--- a/src/backend/access/transam/xlogarchive.c
+++ b/src/backend/access/transam/xlogarchive.c
@@ -476,6 +476,10 @@ KeepFileRestoredFromArchive(char *path, char *xlogfname)
  errmsg("could not rename file \"%s\" to \"%s\": %m",
 		path, xlogfpath)));
 
+	START_CRIT_SECTION();
+	fsync_fname(XLOGDIR, true);
+	END_CRIT_SECTION();
+
 	/*
 	 * Create .done file forcibly to prevent the restored segment from being
 	 * archived again later.
@@ -586,6 +590,10 @@ XLogArchiveForceDone(const char *xlog)
 	 errmsg("could not rename file \"%s\" to \"%s\": %m",
 			archiveReady, archiveDone)));
 
+		START_CRIT_SECTION();
+		fsync_fname(XLOGDIR "/archive_status", true);
+		END_CRIT_SECTION();
+
 		return;
 	}
 
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 4df669e..be59442 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -733,4 +733,8 @@ pgarch_archiveDone(char *xlog)
 (errcode_for_file_access(),
  errmsg("could not rename file \"%s\" to \"%s\": %m",
 		rlogready, rlogdone)));
+
+	START_CRIT_SECTION();
+	fsync_fname(XLOGDIR "/archive_status", 

Re: [HACKERS] Rework the way multixact truncations work

2015-12-01 Thread Robert Haas
On Fri, Nov 27, 2015 at 5:16 PM, Noah Misch  wrote:
> On Mon, Nov 23, 2015 at 11:44:45AM -0800, Peter Geoghegan wrote:
>> On Sun, Nov 8, 2015 at 11:52 AM, Noah Misch  wrote:
>> >> I'm not following along right now - in order to make cleanups the plan is 
>> >> to revert a couple commits and then redo them prettyfied?
>> >
>> > Yes, essentially.  Given the volume of updates, this seemed neater than
>> > framing those updates as in-tree incremental development.
>>
>> I think that's an odd way of representing this work. I tend to
>> remember roughly when major things were committed even years later. An
>> outright revert should represent a total back out of the original
>> commit IMV. Otherwise, a git blame can be quite misleading.
>
> I think you're saying that "clearer git blame" is a more-important reason than
> "volume of updates" for preferring an outright revert over in-tree incremental
> development.  Fair preference.  If that's a correct reading of your message,
> then we do agree on the bottom line.

Hmm.  I read Peter's message as agreeing with Andres rather than with
you.  And I have to say I agree with Andres as well.  I think it's
weird to back a commit out only to put a bunch of very similar stuff
back in.  Even figuring out what you've actually changed here seems
rather hard.  I couldn't get github to give me a diff showing your
changes vs. master.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread Magnus Hagander
On Tue, Dec 1, 2015 at 9:14 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Tue, Dec 1, 2015 at 2:56 PM, Tom Lane  wrote:
> >> "David E. Wheeler"  writes:
> >>> I don’t suppose anyone has looked at what it would take to get
> PostgreSQL use Secure Transport, right?
>
> >> This is going to put a bit more urgency into the project Heikki had been
> >> working on to allow use of more than one SSL implementation.  I can't
> >> really see us back-porting that, though, which is going to leave things
> >> in a fairly nasty place for all pre-9.6 branches ...
>
> > I think it'd be great to finish that project, but having to use
> > MacPorts to install the headers isn't really a big deal, is it?
>
> Well, you'd have to use MacPorts' version of the openssl libraries,
> too, since there'd be no certainty that their headers match the
> Apple-provided libraries (in fact, I'd bet a lot that they don't).
> This would be a pain if you wanted to put your compiled PG executables
> on some other Mac.
>

Presumably the folks who build Postgres.app and the EDB installers will
take care of that for the big majority of people though, won't they?

I agree it's something we should fix, but I'm not sure it's that urgent.
It's no different from what Windows people have been dealing with all
along, is it? And while it affects pg developers, I doubt it'll hit that
many users?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread Bruce Momjian
On Tue, Dec  1, 2015 at 06:40:09PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > Do we still have licensing issues if we ship Postgres and OpenSSL
> > together?
> 
> See
> https://www.postgresql.org/message-id/20150801151410.GA28344%40awork2.anarazel.de

True, but the current license is unchanged and has the advertising
clause, which I think we have to honor if we ship OpenSSL:

https://www.openssl.org/source/license.html

I assume Windows has to ship OpenSSL with the installer and has to abide
by this, for example.  OSX might have to do the same.  It might be good
to see what we do for Windows packages.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread Robert Haas
On Tue, Dec 1, 2015 at 3:14 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Tue, Dec 1, 2015 at 2:56 PM, Tom Lane  wrote:
>>> "David E. Wheeler"  writes:
 I don’t suppose anyone has looked at what it would take to get PostgreSQL 
 use Secure Transport, right?
>
>>> This is going to put a bit more urgency into the project Heikki had been
>>> working on to allow use of more than one SSL implementation.  I can't
>>> really see us back-porting that, though, which is going to leave things
>>> in a fairly nasty place for all pre-9.6 branches ...
>
>> I think it'd be great to finish that project, but having to use
>> MacPorts to install the headers isn't really a big deal, is it?
>
> Well, you'd have to use MacPorts' version of the openssl libraries,
> too, since there'd be no certainty that their headers match the
> Apple-provided libraries (in fact, I'd bet a lot that they don't).
> This would be a pain if you wanted to put your compiled PG executables
> on some other Mac.

Yeah, I guess it means that people building for MacOS X will probably
have to ship OpenSSL as a dependency, which also means that they will
need to update it when new versions are released.  That is already a
pretty obnoxious disease on Windows, and it's unfortunate to see it
spreading.  It would save us a good deal of staff time here at
EnterpriseDB if we didn't have to do new releases of everything on
Windows every time there is an OpenSSL update.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] silent data loss with ext4 / all current versions

2015-12-01 Thread Tomas Vondra



On 12/01/2015 10:44 PM, Peter Eisentraut wrote:

On 11/27/15 8:18 AM, Michael Paquier wrote:

On Fri, Nov 27, 2015 at 8:17 PM, Tomas Vondra
 wrote:

So, what's going on? The problem is that while the rename() is atomic, it's
not guaranteed to be durable without an explicit fsync on the parent
directory. And by default we only do fdatasync on the recycled segments,
which may not force fsync on the directory (and ext4 does not do that,
apparently).

Yeah, that seems to be the way the POSIX spec clears things up.
"If _POSIX_SYNCHRONIZED_IO is defined, the fsync() function shall
force all currently queued I/O operations associated with the file
indicated by file descriptor fildes to the synchronized I/O completion
state. All I/O operations shall be completed as defined for
synchronized I/O file integrity completion."
http://pubs.opengroup.org/onlinepubs/009695399/functions/fsync.html
If I understand that right, it is guaranteed that the rename() will be
atomic, meaning that there will be only one file even if there is a
crash, but that we need to fsync() the parent directory as mentioned.


I don't see anywhere in the spec that a rename needs an fsync of the
 directory to be durable. I can see why that would be needed in
practice, though. File system developers would probably be able to
give a more definite answer.


Yeah, POSIX is the lowest common denominator. In this case the spec 
seems not to require this durability guarantee (a rename being durable 
without an fsync on the directory), so a filesystem can omit it and 
still be POSIX-compliant.


At least that's my conclusion from reading https://lwn.net/Articles/322823/

However, as I explained in the original post, it's more complicated as 
this only seems to be a problem with fdatasync. I've been unable to 
reproduce the issue with wal_sync_method=fsync.
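
For illustration only, the bare POSIX pattern under discussion looks 
roughly like the sketch below -- invented names, not the 
fsync_fname()-based code in the v2 patch:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Sketch of a crash-durable rename: rename() itself is atomic, but the
 * new directory entry is only guaranteed to survive a crash once the
 * parent directory has been fsync'ed as well.
 */
static int
durable_rename(const char *oldpath, const char *newpath, const char *parentdir)
{
	int		fd;

	if (rename(oldpath, newpath) != 0)
		return -1;

	/* fsync the parent directory so the rename itself reaches disk */
	fd = open(parentdir, O_RDONLY);
	if (fd < 0)
		return -1;
	if (fsync(fd) != 0)
	{
		close(fd);
		return -1;
	}
	return close(fd);
}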


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread Robert Haas
On Tue, Dec 1, 2015 at 2:56 PM, Tom Lane  wrote:
> "David E. Wheeler"  writes:
>> Looks like Mac OS X 10.11 El Capitan has removed the OpenSSL header files. 
>> They recommend building your own or using native OS X SDKs, like Secure 
>> Transport:
>>   http://lists.apple.com/archives/macnetworkprog/2015/Jun/msg00025.html
>
> That's annoying.
>
>> I don’t suppose anyone has looked at what it would take to get PostgreSQL 
>> use Secure Transport, right?
>
> This is going to put a bit more urgency into the project Heikki had been
> working on to allow use of more than one SSL implementation.  I can't
> really see us back-porting that, though, which is going to leave things
> in a fairly nasty place for all pre-9.6 branches ...

I think it'd be great to finish that project, but having to use
MacPorts to install the headers isn't really a big deal, is it?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread Tom Lane
Robert Haas  writes:
> On Tue, Dec 1, 2015 at 2:56 PM, Tom Lane  wrote:
>> "David E. Wheeler"  writes:
>>> I don’t suppose anyone has looked at what it would take to get PostgreSQL 
>>> use Secure Transport, right?

>> This is going to put a bit more urgency into the project Heikki had been
>> working on to allow use of more than one SSL implementation.  I can't
>> really see us back-porting that, though, which is going to leave things
>> in a fairly nasty place for all pre-9.6 branches ...

> I think it'd be great to finish that project, but having to use
> MacPorts to install the headers isn't really a big deal, is it?

Well, you'd have to use MacPorts' version of the openssl libraries,
too, since there'd be no certainty that their headers match the
Apple-provided libraries (in fact, I'd bet a lot that they don't).
This would be a pain if you wanted to put your compiled PG executables
on some other Mac.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-01 Thread Alvaro Herrera
Michael Paquier wrote:

> OK... I have merged TestLib and PostgresNode of the previous patch
> into PostgresNode into the way suggested by Noah. TestBase has been
> renamed back to TestLib, and includes as well the base test functions
> like command_ok.

Great, thanks.  Here's one more version, hopefully the last one.

- I discovered that not setting PGPORT was causing some of the tests
  that fail (using command_fails) to fail to test what was being tested.
  The problem is that the command was failing with "could not connect to
>   server" instead of failing because of trying to cluster a nonexistent
  table, etc.  Unfortunately the only way to verify this is by looking
  at the regress_log_xxx_foobar file.  Not ideal, but not this patch's
  fault.

- I changed the routines moved to PostgresNode so that they are instance
  methods, i.e. $node->poll_until_query; also psql and
  issues_query_like.  The latter also sets "local $PGPORT" so that the
  command is run with the correct port.

- It would be nice to have command_ok and command_fails in PostgresNode
  too; that would remove the need for setting $ENV{PGPORT} but it's
  possible to run commands outside a node too, so we'd need duplicates,
  which would be worse.

- I changed start/stop/restart so that they keep track of the postmaster
  PID; also added a DESTROY sub to PostgresNode that sends SIGQUIT.
  This means that when the test finishes, the server gets an immediate
  stop signal.  We were getting a lot of errors in the server log about
  failing to write to the stats file otherwise, until the node noticed
  that the datadir was gone.

- I removed the @active_nodes array, which is now unnecessary (per
  above).

- I moved the "delete $ENV{PGxxx}" back to a BEGIN block in TestLib.
  I did it because it's possible to write test scripts without
  PostgresNode, and it's not nice to have those pick up the environment.
  This still affects anything using PostgresNode because that one uses
  TestLib.

Finally, I ran perltidy on all the files, which strangely changed stuff
that I didn't expect it to change.  I wonder if this is related to the
perltidy version.  Do you see further changes if you run perltidy on the
patched tree?  This is my version:

$ perltidy --version
This is perltidy, v20140328 

Copyright 2000-2014, Steve Hancock

Perltidy is free software and may be copied under the terms of the GNU
General Public License, which is included in the distribution files.

Complete documentation for perltidy can be found using 'man perltidy'
or on the internet at http://perltidy.sourceforge.net.


-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 299dcf5..f64186d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -4,6 +4,7 @@
 
 use strict;
 use warnings;
+use PostgresNode;
 use TestLib;
 use Test::More tests => 14;
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index dc96bbf..86cc14e 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -2,6 +2,7 @@ use strict;
 use warnings;
 use Cwd;
 use Config;
+use PostgresNode;
 use TestLib;
 use Test::More tests => 51;
 
@@ -9,8 +10,15 @@ program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
 program_options_handling_ok('pg_basebackup');
 
-my $tempdir = tempdir;
-start_test_server $tempdir;
+my $tempdir = TestLib::tempdir;
+
+my $node = get_new_node();
+# Initialize node without replication settings
+$node->init(hba_permit_replication => 0);
+$node->start;
+my $pgdata = $node->data_dir;
+
+$ENV{PGPORT} = $node->port;
 
 command_fails(['pg_basebackup'],
 	'pg_basebackup needs target directory specified');
@@ -26,19 +34,19 @@ if (open BADCHARS, ">>$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
 	close BADCHARS;
 }
 
-configure_hba_for_replication "$tempdir/pgdata";
-system_or_bail 'pg_ctl', '-D', "$tempdir/pgdata", 'reload';
+$node->set_replication_conf();
+system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
 
 command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup" ],
 	'pg_basebackup fails because of WAL configuration');
 
-open CONF, ">>$tempdir/pgdata/postgresql.conf";
+open CONF, ">>$pgdata/postgresql.conf";
 print CONF "max_replication_slots = 10\n";
 print CONF "max_wal_senders = 10\n";
 print CONF "wal_level = archive\n";
 close CONF;
-restart_test_server;
+$node->restart;
 
 command_ok([ 'pg_basebackup', '-D', "$tempdir/backup" ],
 	'pg_basebackup runs');
@@ -81,13 +89,13 @@ command_fails(
 
 # Tar format doesn't support filenames longer than 100 bytes.
 my $superlongname = "superlongname_" . ("x" x 100);
-my $superlongpath = "$tempdir/pgdata/$superlongname";
+my $superlongpath = "$pgdata/$superlongname";
 
 open FILE, ">$superlongpath" or die "unable to create file $superlongpath";
 

Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 3:30 AM, Dean Rasheed  wrote:
> On 1 December 2015 at 12:59, Michael Paquier  
> wrote:
>> Dean, are you planning to continue working on this patch? If yes, are
>> you fine to move it to next CF? It seems that the current consensus is
>> to split this effort into two patches:
>
> Yes, I still plan to work on it. I might not get much time over the
> next few days, so moving it to the next CF is probably reasonable.

Deal.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Issue on C function that reads int2[] (using "int2vector")

2015-12-01 Thread Rodrigo Hjort
2015-11-30 0:39 GMT-02:00 Tom Lane:

> Rodrigo Hjort writes:
> > I created a custom C function with this signature:
>
> > CREATE FUNCTION calculate_hash(numbers int2[])
> > RETURNS int8
> > AS 'MODULE_PATHNAME', 'pg_calculate_hash'
> > LANGUAGE C
> > IMMUTABLE STRICT;
>
> > And here is the function source code (inspired in codes I found in
> > src/backend/utils/adt/int.c):
>
> > PG_FUNCTION_INFO_V1(pg_calculate_hash);
> > Datum
> > pg_calculate_hash(PG_FUNCTION_ARGS)
> > {
> >   int2vector *int2Array = (int2vector *) PG_GETARG_POINTER(0);
>
> Nope.  int2vector is not the same as int2[].  It might occasionally seem
> to work, but in general it's not the same type.  And this particular
> coding won't work at all on on-disk int2[] data, because it doesn't
> account for toasting.
>
> regards, tom lane
>


Thanks for the advice, Tom.

I ended up with the following code, which worked successfully:

#define ARRPTR16(x)   ((uint16 *) ARR_DATA_PTR(x))
#define ARRNELEMS(x)  ArrayGetNItems(ARR_NDIM(x), ARR_DIMS(x))
#define ARRISEMPTY(x) (ARRNELEMS(x) == 0)

PG_FUNCTION_INFO_V1(pg_calculate_hash);
Datum
pg_calculate_hash(PG_FUNCTION_ARGS)
{
  ArrayType *a = PG_GETARG_ARRAYTYPE_P_COPY(0);
  unsigned int i, qtd, tipo, nums[MAX_NUMEROS];
  uint16 *da;

  qtd = ARRNELEMS(a);
  tipo = ARR_ELEMTYPE(a);
  da = ARRPTR16(a);

  elog(DEBUG1, "pg_calculate_hash(qtd=%d, tipo=%d)", qtd, tipo);

  [...]

  pfree(a);
  PG_RETURN_INT64(hash);
}
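
For reference, here is a minimal sketch of the same function written 
with the generic array routines, which detoast the argument and cope 
with NULL elements as well; the function name, the hash formula and 
the NULL handling below are purely illustrative:

#include "postgres.h"
#include "catalog/pg_type.h"
#include "fmgr.h"
#include "utils/array.h"

PG_FUNCTION_INFO_V1(pg_calculate_hash_v2);
Datum
pg_calculate_hash_v2(PG_FUNCTION_ARGS)
{
  /* PG_GETARG_ARRAYTYPE_P detoasts the int2[] argument for us */
  ArrayType  *a = PG_GETARG_ARRAYTYPE_P(0);
  Datum      *elems;
  bool       *nulls;
  int         nelems;
  int         i;
  int64       hash = 0;

  /* explode the array into Datums, keeping track of NULL elements */
  deconstruct_array(a, INT2OID, 2, true, 's',
                    &elems, &nulls, &nelems);

  for (i = 0; i < nelems; i++)
  {
    if (nulls[i])
      continue;       /* or ereport(ERROR, ...) if NULLs are unexpected */
    hash = hash * 31 + DatumGetInt16(elems[i]);
  }

  pfree(elems);
  pfree(nulls);
  PG_RETURN_INT64(hash);
}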


Regards,

Rodrigo Hjort


Re: [HACKERS] pgsql: Further tweaking of print_aligned_vertical().

2015-12-01 Thread Greg Stark
On 1 Dec 2015 19:48, "Tom Lane"  wrote:
>
> In passing, avoid possible calculation of log10(0).  Probably that's
> harmless, given the lack of field complaints, but it seems risky:
> conversion of NaN to an integer isn't well defined.

Am I going to have to fire up the emulator again?


Re: [HACKERS] proposal: multiple psql option -c

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 2:56 AM, Pavel Stehule  wrote:
> 2015-12-01 17:52 GMT+01:00 Catalin Iacob :
>> One maybe slightly surprising behaviour is that -f - can be specified
>> multiple times and only the first one has an effect since the others
>> act on an exhausted stdin. But I don't think forbidding multiple -f -
>> is better.

I don't see any good reason to forbid it actually. This simplifies the
code and it's not like this would break psql.

>> As for the code (these still apply to Michael's latest patch):
>>
>> 1. the be compiler quiete comment is not good English, /* silence the
>> compiler */ would be better or remove it completely

Fixed. Indeed I didn't notice that.

>> 2. shouldn't atyp in SimpleActionListCell be of type enum _atypes?
>> Otherwise why an enum if it's casted to int when actually used? If
>> it's an enum the repeated ifs on cell->atyp should be a switch, either
>> with a default Assert(0) or no default which makes gcc give a warning
>> if an enum value is ever added without having a corresponding case.
> It is maybe different topic - the psql uses enums and ints very freely. So I
> wrote code consistent with current code.

Yeah, I don't think that's a big issue either to be honest. The code
is kept as consistent as possible with what was there previously.

Patch is switched to ready for committer.
-- 
Michael
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 5899bb4..2928c92 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -38,9 +38,10 @@ PostgreSQL documentation
  PostgreSQL. It enables you to type in
  queries interactively, issue them to
  PostgreSQL, and see the query results.
- Alternatively, input can be from a file. In addition, it provides a
- number of meta-commands and various shell-like features to
- facilitate writing scripts and automating a wide variety of tasks.
+ Alternatively, input can be from a file or from command line
+ arguments. In addition, it provides a number of meta-commands and various
+ shell-like features to facilitate writing scripts and automating a wide
+ variety of tasks.
 
  
 
@@ -89,38 +90,45 @@ PostgreSQL documentation
   --command=command
   
   
-  Specifies that psql is to execute one
-  command string, command,
-  and then exit. This is useful in shell scripts. Start-up files
-  (psqlrc and ~/.psqlrc) are
-  ignored with this option.
+  Specifies that psql is to execute the given
+  command string, command.
+  This option can be repeated and combined in any order with
+  the -f option.
   
   
-  command must be either
-  a command string that is completely parsable by the server (i.e.,
-  it contains no psql-specific features),
-  or a single backslash command. Thus you cannot mix
-  SQL and psql
-  meta-commands with this option. To achieve that, you could
-  pipe the string into psql, for example:
-  echo '\x \\ SELECT * FROM foo;' | psql.
+  command must be either a
+  command string that is completely parsable by the server (i.e., it
+  contains no psql-specific features), or a
+  single backslash command. Thus you cannot mix
+  SQL and psql meta-commands
+  with this option. To achieve that, you could use
+  repeated -c options or pipe the string
+  into psql, for example:
+  psql -c '\x' -c 'SELECT * FROM foo;' or
+  echo '\x \\ SELECT * FROM foo;' | psql
   (\\ is the separator meta-command.)
   
   
-   If the command string contains multiple SQL commands, they are
-   processed in a single transaction, unless there are explicit
-   BEGIN/COMMIT commands included in the
-   string to divide it into multiple transactions.  This is
-   different from the behavior when the same string is fed to
-   psql's standard input.  Also, only
-   the result of the last SQL command is returned.
+   Each command string passed to -c is sent to the server
+   as a single query. Because of this, the server executes it as a single
+   transaction, even if a command string contains
+   multiple SQL commands, unless there are
+   explicit BEGIN/COMMIT commands included in the
+   string to divide it into multiple transactions.  Also, the server only
+   returns the result of the last SQL command to the
+   client.  This is different from the behavior when the same string with
+   multiple SQL commands is fed
+   to psql's standard input because
+   then psql sends each SQL
+   command separately.
   
   
-   Because of these legacy behaviors, putting more than one command in
-   the -c string often has unexpected results.  It's
-   better to feed multiple commands to psql's
-   standard input, either using echo as
-   illustrated above, or via a shell 

Re: [HACKERS] [PoC] Asynchronous execution again (which is not parallel)

2015-12-01 Thread Kyotaro HORIGUCHI
Thank you for picking this up.

At Tue, 1 Dec 2015 20:33:02 +0530, Amit Kapila  wrote 
in 
> On Mon, Nov 30, 2015 at 6:17 PM, Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > == TODO or random thoughts, not restricted on this patch.
> >
> > - This patch doesn't contain planner part, it must be aware of
> >   async execution in order for this to be effective.
> >
> 
> How will you decide whether sync-execution is cheaper than parallel
> execution.  Do you have some specific cases in mind where async
> execution will be more useful than parallel execution?

Mmm.. Some confusion in wording? Sync vs. async is a distinction
about when to start execution of a node (and its descendants).
Parallel vs. serial (sequential) is about whether multiple nodes
can execute simultaneously. Async execution presupposes parallel
execution in either sense, bgworker or FDW.

As I wrote in the previous mail, async execution reduces the
startup time of parallel execution. So async execution is not an
alternative to parallel execution; rather, it accelerates it. It
is effective when the startup time of every parallel execution
node is rather long. We have enough numbers to cost it.

> > - Some measture to control execution on bgworker would be
> >   needed. At least merge join requires position mark/reset
> >   functions.
> >
> > - Currently, more tuples make reduce effectiveness of parallel
> >   execution, some method to transfer tuples in larger unit would
> >   be needed, or would be good to have shared workmem?
> >
> 
> Yeah, I think here one thing we need to figure out is whether the
> performance bottleneck is due to the amount of data that is transferred
> between worker and master or something else. One idea could be to pass
> TID and may be keep the buffer pin (which will be released by master
> backend), but on the other hand if we have to perform costly target list
> evaluation by bgworker, then it might be beneficial to pass the projected
> list back.

One possible bottleneck is signalling between backends. Current
parallel execution uses signals to make the producer-consumer
world go round. Conveying TIDs won't make it faster if the
bottleneck is the inter-process communication. I brought up bulk
transfers or shared workmem as example means to reduce IPC
frequency.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pgsql: Further tweaking of print_aligned_vertical().

2015-12-01 Thread Tom Lane
Greg Stark  writes:
> On 1 Dec 2015 19:48, "Tom Lane"  wrote:
>> In passing, avoid possible calculation of log10(0).  Probably that's
>> harmless, given the lack of field complaints, but it seems risky:
>> conversion of NaN to an integer isn't well defined.

> Am I going to have to fire up the emulator again?

No, 'cause I fixed it ;-).  I initially thought that the case might
be unreachable, but it isn't: you can get there with cont->nrows == 0
if you have FETCH_COUNT set and the resultset size is an exact multiple
of FETCH_COUNT.
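
(For readers following along, the guard amounts to something like the 
sketch below; the function name is invented and this is not the 
committed print.c code.)

#include <math.h>

/*
 * Digits needed for the largest record number in the "-[ RECORD n ]"
 * header.  log10(0) is -infinity, and converting that (or NaN) to an
 * integer is undefined, so the zero-row case is special-cased instead
 * of being computed.
 */
static int
record_header_digits(long nrows)
{
	return (nrows > 0) ? (int) log10((double) nrows) + 1 : 1;
}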

While poking at that, though, I ran into yet another bug in this code:

regression=# \pset format wrapped
Output format is wrapped.
regression=# \x
Expanded display is on.
regression=# select repeat('xyzzy',22) from generate_series(1,25);
-[ RECORD 1 ]--
repeat | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy.
   |.xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy
-[ RECORD 2 ]--
repeat | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy.
   |.xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy
.. etc etc ...

regression=# \set FETCH_COUNT 2
regression=# select repeat('xyzzy',22) from generate_series(1,25);
-[ RECORD 1 ]---
---
repeat | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyx
yzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy
-[ RECORD 2 ]---
---
repeat | xyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyx
yzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzyxyzzy
.. etc etc ...

That is, wrap width determination is broken the moment you set FETCH_COUNT.

The reason seems to be this ugly kluge in ExecQueryUsingCursor():

else if (pset.queryFout == stdout && !did_pager)
{
/*
 * If query requires multiple result sets, hack to ensure that
 * only one pager instance is used for the whole mess
 */
pset.queryFout = PageOutput(10, &(my_popt.topt));
did_pager = true;
}

After we have run that code, pset.queryFout is no longer pointing at
stdout, which breaks every single one of the "fout == stdout" tests in
print.c.  It'd be all right if those tests knew that they were dealing
with an is_pager situation, but they don't because that critical piece of
information is not passed down to printQuery().  I think we could probably
fix this by making is_pager an additional argument of printQuery() and its
relevant subroutines.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Kyotaro HORIGUCHI
Hello, thank you for taking time for this.

At Tue, 1 Dec 2015 14:56:54 -0500, Robert Haas  wrote in 

> On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai  wrote:
> > This patch is not tested by actual FDW extensions, so it is helpful
> > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.
> 
> I have done some editing and some small revisions on this patch.
> Here's what I came up with.  The revisions are mostly cosmetic, but I
> revised it a bit so that the signature of GetForeignPlan need not
> change.  Also, I made nodeForeignScan.c do some of the outer plan
> handling automatically, and I fixed the compile breaks in
> contrib/file_fdw and contrib/postgres_fdw.
> 
> Comments/review/testing are very welcome.

Applied on HEAD with no error. Regtests of core, postgres_fdw and
file_fdw finished with no error. (I haven't done any further testing)


nodeScan.c:

  The comments in nodeScan.c look way clearer. Thank you for rewriting.

nodeForeignscan.c:

 Is this a mistake?

 > @@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
 >  scanstate->fdwroutine = fdwroutine;
 >  scanstate->fdw_state = NULL;
 >  
 > +/* Initialize any outer plan. */
-> +if (outerPlanState(scanstate))
+> +if (outerPlanState(node))
 > +outerPlanState(scanstate) =

createplan.c, planmain.h:

 I agree with reverting the signature of GetForeignPlan.

fdwapi.h:

 Reverting the additional parameter of ForeignScan leaves only a
 change in the indentation of the last parameter.

fdwhandler.sgml:

 This is easy for me to understand. Thank you.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Kyotaro HORIGUCHI
Sorry, I made a mistake.

At Wed, 02 Dec 2015 10:29:17 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20151202.102917.50152198.horiguchi.kyot...@lab.ntt.co.jp>
> Hello, thank you for editing.
> 
> At Tue, 1 Dec 2015 14:56:54 -0500, Robert Haas  wrote 
> in 
> > On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai  
> > wrote:
> > > This patch is not tested by actual FDW extensions, so it is helpful
> > > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.
> > 
> > I have done some editing and some small revisions on this patch.
> > Here's what I came up with.  The revisions are mostly cosmetic, but I
> > revised it a bit so that the signature of GetForeignPlan need not
> > change.  Also, I made nodeForeignScan.c do some of the outer plan
> > handling automatically, and I fixed the compile breaks in
> > contrib/file_fdw and contrib/postgres_fdw.
> > 
> > Comments/review/testing are very welcome.
> 
> Applied on HEAD with no error. Regtests of core, postgres_fdw and
> file_fdw finished with no error.
> 
> 
> nodeScan.c:
> 
>   The comments in nodeScan.c look way clearer. Thank you for rewriting.
> 
> nodeForeignscan.c:
> 
>  Is this a mistake?
> 
>  > @@ -205,6 +218,11 @@ ExecInitForeignScan(ForeignScan *node, EState *estate, int eflags)
>  >scanstate->fdwroutine = fdwroutine;
>  >scanstate->fdw_state = NULL;
>  >  
>  > +  /* Initialize any outer plan. */
> -> +  if (outerPlanState(scanstate))
> +> +  if (outerPlanState(node))
>  > +  outerPlanState(scanstate) =

No, the above is wrong.

-> +if (outerPlanState(scanstate))
+> +if (outerPlan(node))
 > +outerPlanState(scanstate) =

> createplan.c, planmain.h:
> 
>  I agree with reverting the signature of GetForeignPlan.
> 
> fdwapi.h:
> 
>  Reverting the additional parameter of ForeignScan leaves only a
>  change in the indentation of the last parameter.
> 
> fdwhandler.sgml:
> 
>  This is easy for me to understand. Thank you.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 8:11 AM, Alvaro Herrera  wrote:
> - I discovered that not setting PGPORT was causing some of the tests
>   that fail (using command_fails) to fail to test what was being tested.
>   The problem is that the command was failing with "could not connect to
>   server" instead of failing because of trying to cluster a nonexistent
>   table, etc.  Unfortunately the only way to verify this is by looking
>   at the regress_log_xxx_foobar file.  Not ideal, but not this patch's
>   fault.

Nice catch.

> - I changed the routines moved to PostgresNode so that they are instance
>   methods, i.e. $node->poll_until_query; also psql and
>   issues_query_like.  The latter also sets "local $PGPORT" so that the
>   command is run with the correct port.

OK.

> - It would be nice to have command_ok and command_fails in PostgresNode
>   too; that would remove the need for setting $ENV{PGPORT} but it's
>   possible to run commands outside a node too, so we'd need duplicates,
>   which would be worse.

I am fine to let it the way your patch does it. There are already many changes.

> - I removed the @active_nodes array, which is now unnecessary (per
>   above).

So that's basically replaced by @all_nodes.

> - I moved the "delete $ENV{PGxxx}" back to a BEGIN block in TestLib.
>   I did it because it's possible to write test scripts without
>   PostgresNode, and it's not nice to have those pick up the environment.
>   This still affects anything using PostgresNode because that one uses
>   TestLib.

OK.

> Finally, I ran perltidy on all the files, which strangely changed stuff
> that I didn't expect it to change.  I wonder if this is related to the
> perltidy version.  Do you see further changes if you run perltidy on the
> patched tree?

SimpleTee.pm shows some diffs, though it doesn't seem that this patch
should care about that. The rest is showing no diffs. And I have used
perltidy v20140711.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-01 Thread Alvaro Herrera
Michael Paquier wrote:
> On Wed, Dec 2, 2015 at 8:11 AM, Alvaro Herrera  
> wrote:

> > - It would be nice to have command_ok and command_fails in PostgresNode
> >   too; that would remove the need for setting $ENV{PGPORT} but it's
> >   possible to run commands outside a node too, so we'd need duplicates,
> >   which would be worse.
> 
> I am fine to let it the way your patch does it. There are already many 
> changes.

Idea: we can have a bare command_ok exported by TestLib just as
currently, and instance method PostgresNode->command_ok that first sets
local $ENV{PGPORT} and then calls the other one.

> > - I removed the @active_nodes array, which is now unnecessary (per
> >   above).
> 
> So that's basically replaced by @all_nodes.

@all_nodes is only used to look for unused port numbers.

> > Finally, I ran perltidy on all the files, which strangely changed stuff
> > that I didn't expect it to change.  I wonder if this is related to the
> > perltidy version.  Do you see further changes if you run perltidy on the
> > patched tree?
> 
> SimpleTee.pm shows some diffs, though it doesn't seem that this patch
> should care about that. The rest is showing no diffs. And I have used
> perltidy v20140711.

Yes, the patch doesn't modify SimpleTee -- I just used "find" to indent
perl files.  What I don't understand is why my perltidy run doesn't
match what is in master currently.  It should be a no-op ...

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use pg_rewind when target timeline was switched

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 12:57 AM, Teodor Sigaev  wrote:
> Thank you, committed.

Youhou, thanks!
I still think that we should have a test case for this patch close to
the script I sent on this thread. I'll get into it once the
infrastructure patch for recovery regression tests gets in.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Etsuro Fujita

On 2015/12/02 1:41, Robert Haas wrote:

On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
 wrote:

The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
FDW driver can set arbitrary but one path-node here.
After that, this path-node shall be transformed to plan-node by
createplan.c, then passed to FDW driver using GetForeignPlan callback.



I understand this, as I also did the same thing in my patches, but actually,
that seems a bit complicated to me.  Instead, could we keep the
fdw_outerpath in the fdw_private of a ForeignPath node when creating the
path node during GetForeignPaths, and then create an outerplan accordingly
from the fdw_outerpath stored into the fdw_private during GetForeignPlan, by
using create_plan_recurse there?  I think that that would make the core
involvement much simpler.



I can't see how it's going to get much simpler than this.  The core
core is well under a hundred lines, and it all looks pretty
straightforward to me.  All of our existing path and plan types keep
lists of paths and plans separate from other kinds of data, and I
don't think we're going to win any awards for deviating from that
principle here.


One thing I can think of is that we can keep both the structure of a 
ForeignPath node and the API of create_foreignscan_path as-is.  The 
latter is a good thing for FDW authors.  And IIUC the patch you posted 
today, I think we could make create_foreignscan_plan a bit simpler too. 
 Ie, in your patch, you modified that function as follows:


@@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
 	 */
 	scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
 												best_path,
-												tlist, scan_clauses);
+												tlist,
+												scan_clauses);
+	outerPlan(scan_plan) = fdw_outerplan;

I think that would be OK, but I think we would have to do a bit more 
here about the fdw_outerplan's targetlist and qual; I think that the 
targetlist needs to be changed to fdw_scan_tlist, as in the patch [1], 
and that it'd be better to change the qual to remote conditions, ie, 
quals not in the scan_plan's scan.plan.qual, to avoid duplicate 
evaluation of local conditions.  (In the patch [1], I didn't do anything 
about the qual because the current postgres_fdw join pushdown patch 
assumes that all the scan_plan's scan.plan.qual are pushed down.) 
Or, FDW authors might want to do something about fdw_recheck_quals for a 
foreign-join while creating the fdw_outerplan.  So if we do that during 
GetForeignPlan, I think we could make create_foreignscan_plan a bit 
simpler, or provide flexibility to FDW authors.



@@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot *slot)

 ResetExprContext(econtext);

+   /*
+* FDW driver has to recheck visibility of EPQ tuple towards
+* the scan qualifiers once it gets pushed down.
+* In addition, if this node represents a join sub-tree, not
+* a scan, FDW driver is also responsible to reconstruct
+* a joined tuple according to the primitive EPQ tuples.
+*/
+   if (fdwroutine->RecheckForeignScan)
+   {
+   if (!fdwroutine->RecheckForeignScan(node, slot))
+   return false;
+   }

Maybe I'm missing something, but I think we should let FDW do the work if
scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
abort the transaction.)



That would be unnecessarily restrictive.  On the one hand, even if
scanrelid != 0, the FDW can decide that it prefers to do the rechecks
using RecheckForeignScan rather than fdw_recheck_quals.  For most
FDWs, I expect using fdw_recheck_quals to be more convenient, but
there may be cases where somebody prefers to use RecheckForeignScan,
and allowing that costs nothing.


I suppose that the flexibility would probably be a good thing, but I'm a 
little bit concerned that that might be rather confusing to FDW authors. 
 Maybe I'm missing something, though.
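
For concreteness, a rough sketch of what a RecheckForeignScan callback 
for a pushed-down join could look like -- assuming the FDW keeps a 
locally executable alternative sub-plan as the ForeignScan's outer 
plan; the function name is invented and this is not code from the 
patch:

#include "postgres.h"
#include "executor/executor.h"
#include "foreign/fdwapi.h"

/*
 * Re-run the local alternative sub-plan (kept as the node's outer plan)
 * to reconstruct and recheck the joined tuple from the per-relation
 * EPQ tuples.
 */
static bool
exampleRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
{
	PlanState  *outerPlan = outerPlanState(node);
	TupleTableSlot *result;

	/* Run the locally executable alternative sub-plan ... */
	result = ExecProcNode(outerPlan);
	if (TupIsNull(result))
		return false;		/* the joined tuple no longer passes the quals */

	/* ... and hand the reconstructed joined tuple back in the given slot */
	ExecCopySlot(slot, result);
	return true;
}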


Best regards,
Etsuro Fujita

[1] http://www.postgresql.org/message-id/5624d583.10...@lab.ntt.co.jp




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Robert Haas
On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita
 wrote:
> Plan   *plan = &node->scan.plan;
> @@ -3755,7 +3763,7 @@ make_foreignscan(List *qptlist,
> /* cost will be filled in by create_foreignscan_plan */
> plan->targetlist = qptlist;
> plan->qual = qpqual;
> -   plan->lefttree = NULL;
> +   plan->lefttree = fdw_outerplan;
> plan->righttree = NULL;
> node->scan.scanrelid = scanrelid;
>
> I think that that would break the EXPLAIN output.

In what way?  EXPLAIN recurses into the left and right trees of every
plan node regardless of what type it is, so superficially I feel like
this ought to just work. What problem do you foresee?

I do think that ExecInitForeignScan ought to be changed to call
ExecInitNode on its outer plan if present, rather than leaving that to
the FDW's BeginForeignScan method.

> One option to avoid that
> is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or
> BeginForeignScan as you proposed.  That breaks the equivalence that the Plan
> tree and the PlanState tree should be mirror images of each other, but I
> think that that break would be harmless.

I'm not sure how many times I have to say this, but we are not doing
that.  I will not commit any patch that does that, and I will
vigorously argue against anyone else committing such a patch either.
That *would* break EXPLAIN, because EXPLAIN relies on being able to
walk the PlanState tree and find all the Plan nodes from the
corresponding PlanState nodes.  Now you might think that it would be
OK to omit a plan node that we decided we weren't ever going to
execute, but today we don't do that, and I don't think we should.  I
think it could be very confusing if EXPLAIN and EXPLAIN ANALYZE show
different sets of plan nodes for the same query.  Quite apart from
EXPLAIN, there are numerous other places that assume that they can
walk the PlanState tree and find all the Plan nodes.  Breaking that
assumption would be bad news.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Robert Haas
On Fri, Nov 27, 2015 at 1:25 AM, Kouhei Kaigai  wrote:
>> Sorry, I don't understand this.  In my understanding, fdw_recheck_quals
>> can be defined for a foreign join, regardless of the join type,
>>
> Yes, "can be defined", but will not be workable if either side of
> joined tuple is NULL because of outer join. SQL functions returns
> NULL prior to evaluation, then ExecQual() treats this result as FALSE.
> However, a joined tuple that has NULL fields may be a valid tuple.
>
> We don't need to care about unmatched tuple if INNER JOIN.

This is a really good point, and a very strong argument for the design
KaiGai has chosen here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Tom Lane
Robert Haas  writes:
> On Fri, Nov 27, 2015 at 1:33 AM, Etsuro Fujita
>> One option to avoid that
>> is to set the fdw_outerplan in ExecInitForeignScan as in my patch [1], or
>> BeginForeignScan as you proposed.  That breaks the equivalence that the Plan
>> tree and the PlanState tree should be mirror images of each other, but I
>> think that that break would be harmless.

> I'm not sure how many times I have to say this, but we are not doing
> that.  I will not commit any patch that does that, and I will
> vigorously argue against anyone else committing such a patch either.

And I'll back him up.  That's a horrible idea.  You're proposing to break
a very fundamental structural property for the convenience of one little
corner of the system.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] gincostestimate and hypothetical indexes

2015-12-01 Thread Tom Lane
Julien Rouhaud  writes:
> On 01/12/2015 00:37, Tom Lane wrote:
>> Maybe we could do something along the lines of pretending that 90% of the
>> index size given by the plugin is entry pages?  Don't know what a good
>> ratio would be exactly, but we could probably come up with one with a bit
>> of testing.

> I used zero values because gincostestimate already handle empty
> statistics, and pretend that 100% of the pages are entry pages:

Yeah, but that code is pretty bogus.  It was never intended to do more
than minimally hold the fort until someone had vacuumed.  If we're trying
to support hypothetical-index plugins with this code, it should try to
do something a bit more realistic.

I did a bit of investigation using some sample data I had laying around
(JSONB and TSVECTOR data).  It looks like assuming that entry pages are
90% of the index is not too awful; I saw actual values ranging from
80% to 94%.  The real weak spot of the current code, however, is

numEntries = numTuples; /* bogus, but no other info available */

which is just as bogus as it says, because numTuples is going to be the
heap tuple count not anything specific to GIN.  Often you'd expect the
number of entries to be some multiple of the number of tuples, because
the whole point of GIN is to be able to index components of the indexed
column's values.  But on the other hand if there are not a lot of
distinct component values, you could get many fewer entries than tuples.

Based on what I saw in this small sample, I'm inclined to propose setting
numEntries to 100 times numEntryPages, that is, assume 100 entries per
entry page.  That could easily be off by a factor of 2, but it seems more
robust than just blindly using the heap tuple count.

(Of course, all of this is predicated on the assumption that the
hypothetical-index plugin gives us some realistic value for the index's
size in pages, but if it doesn't it's the plugin's fault.)
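
Concretely, the fallback branch could look roughly like this (a sketch
only; the 0.90 and 100 are the rough ratios discussed above, and the
variable names assume the ones gincostestimate already uses for these
quantities):

    if (ginStats.nTotalPages == 0 || ginStats.nEntryPages == 0)
    {
        /*
         * No usable GIN stats (e.g. never vacuumed, or a hypothetical
         * index): guess that ~90% of the pages are entry pages and that
         * each entry page holds ~100 entries.
         */
        numEntryPages = numPages * 0.90;
        numDataPages = numPages - numEntryPages;
        numEntries = numEntryPages * 100;
        if (numEntries < 1)
            numEntries = 1;
    }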

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-12-01 Thread Catalin Iacob
On Tue, Dec 1, 2015 at 1:53 PM, Michael Paquier
 wrote:
> Attached is a patch implementing those suggestions. This simplifies
> the code without changing its usefulness. If you are fine with those
> changes I will switch this patch as ready for committer.

I tested the v07 patch (so not Michael's version) a few days ago but
didn't send this email earlier.

I combined various -c and -f with --echo-all, --single-transaction,
\set ON_ERROR_STOP=1, separate -c "VACUUM", "SELECT + VACUUM" in a
single and in two -c, inserting -f - somewhere in the middle of the
other -c and -f. They all behave as I would expect.

One maybe slightly surprising behaviour is that -f - can be specified
multiple times and only the first one has an effect since the others
act on an exhausted stdin. But I don't think forbidding multiple -f -
is better.

As for the code (these still apply to Michael's latest patch):

1. the "be compiler quiete" comment is not good English; /* silence the
compiler */ would be better, or remove it completely

2. shouldn't atyp in SimpleActionListCell be of type enum _atypes?
Otherwise why an enum if it's cast to int when actually used? If
it's an enum, the repeated ifs on cell->atyp should be a switch, either
with a default Assert(0) or no default, which makes gcc give a warning
if an enum value is ever added without a corresponding case.
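
To illustrate point 2, the suggested shape is roughly the following (the
type and member names here are made up for illustration, not the patch's
actual identifiers):

    typedef enum SimpleActionType
    {
        ACT_SINGLE_QUERY,
        ACT_SINGLE_SLASH_CMD,
        ACT_FILE
    } SimpleActionType;

    static void
    run_action(SimpleActionType atyp, const char *val)
    {
        switch (atyp)
        {
            case ACT_SINGLE_QUERY:
                /* run val as SQL */
                break;
            case ACT_SINGLE_SLASH_CMD:
                /* run val as a backslash command */
                break;
            case ACT_FILE:
                /* process val as a file name */
                break;
                /* no default: gcc then warns if a new enum value is unhandled */
        }
    }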


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] overflow checks optimized away

2015-12-01 Thread Greg Stark
On Fri, Oct 9, 2015 at 2:52 PM, Bruce Momjian  wrote:

> Any news on this item from 2013, worked on again 2014?
>

Sorry, I didn't look at it since. At the time I was using Xi Wang's
software to find the overflow checks that need to be redone. He published a
paper on it and it's actually pretty impressive. It constructs a constraint
problem and then throws a kSAT solver at it to find out if there's any code
that a compiler could optimize away regardless of whether any existant
compiler is actually capable of detecting the case and optimizing it away.
https://pdos.csail.mit.edu/papers/stack:sosp13.pdf

Xi Wang actually went on to pursue the same issues in the Linux kernel,
Clang, and elsewhere:

https://css.csail.mit.edu/stack/
http://kqueue.org/blog/2012/03/16/fast-integer-overflow-detection/
http://lists.llvm.org/pipermail/cfe-commits/Week-of-Mon-20130617/082221.html

I'm up for trying again, but it's a long slog to replace all the overflow
checks and the patch will be big.
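
For anyone who hasn't followed the earlier threads, the problem is the
classic post-hoc overflow test, which relies on undefined behavior, versus
a pre-condition check that doesn't.  A minimal standalone illustration
(not PostgreSQL code):

    #include <limits.h>

    /* Relies on signed overflow (undefined behavior in C); the compiler
     * may assume the condition is impossible and delete the check. */
    static int
    add_unsafe(int a, int b, int *result)
    {
        int sum = a + b;        /* may already be undefined behavior */

        if ((b > 0 && sum < a) || (b < 0 && sum > a))
            return 0;           /* "overflow detected" -- maybe */
        *result = sum;
        return 1;
    }

    /* Checks before the operation, with no undefined behavior. */
    static int
    add_safe(int a, int b, int *result)
    {
        if ((b > 0 && a > INT_MAX - b) ||
            (b < 0 && a < INT_MIN - b))
            return 0;           /* would overflow */
        *result = a + b;
        return 1;
    }
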

-- 
greg


Re: [HACKERS] proposal: multiple psql option -c

2015-12-01 Thread Pavel Stehule
2015-12-01 13:53 GMT+01:00 Michael Paquier :

> On Tue, Dec 1, 2015 at 11:46 AM, Pavel Stehule 
> wrote:
> > 2015-11-30 15:17 GMT+01:00 Michael Paquier :
> >> Removing some items from the list of potential actions and creating a
> >> new sublist listing action types is a bit weird. Why not grouping them
> >> together and allow for example -l as well in the list of things that
> >> is considered as a repeatable action? It seems to me that we could
> >> simplify the code this way, and instead of ACT_NOTHING we could check
> >> if the list of actions is empty or not.
> >
> >
> > fixed
>
> Thanks for the patch. I have to admit that adding ACT_LIST_DB in the
> list of actions was not actually a good idea. It makes the patch
> uglier.
>
> Except that, I just had an in-depth look at this patch, and there are
> a couple of things that looked strange:
> - ACT_LIST_DB would live better if removed from the list of actions
> and be used as a separate, independent option. My previous suggestion
> was unadapted. Sorry.
> - There is not much meaning to have simple_action_list_append and all
> its structures in common.c and common.h. Its use is limited in
> startup.c (this code is basically a duplicate of dumputils.c still
> things are rather different, justifying the duplication) and
> centralized around parse_psql_options.
> - use_stdin is not necessary. It is sufficient to rely on actions.head
> == NULL instead.
> - The documentation is pretty clear. That's nice.
> Attached is a patch implementing those suggestions. This simplifies
> the code without changing its usefulness. If you are fine with those
> changes I will switch this patch as ready for committer.
> Regards,
>

yes, it is looking well

Thank you

Pavel



> --
> Michael
>


Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-12-01 Thread Dean Rasheed
On 1 December 2015 at 12:59, Michael Paquier  wrote:
> Dean, are you planning to continue working on this patch? If yes, are
> you fine to move it to next CF? It seems that the current consensus is
> to split this effort into two patches:

Yes, I still plan to work on it. I might not get much time over the
next few days, so moving it to the next CF is probably reasonable.

Regards,
Dean


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] proposal: multiple psql option -c

2015-12-01 Thread Pavel Stehule
2015-12-01 17:52 GMT+01:00 Catalin Iacob :

> On Tue, Dec 1, 2015 at 1:53 PM, Michael Paquier
>  wrote:
> > Attached is a patch implementing those suggestions. This simplifies
> > the code without changing its usefulness. If you are fine with those
> > changes I will switch this patch as ready for committer.
>
> I tested the v07 patch (so not Michael's version) a few days ago but
> didn't send this email earlier.
>
> I combined various -c and -f with --echo-all, --single-transaction,
> \set ON_ERROR_STOP=1, separate -c "VACUUM", "SELECT + VACUUM" in a
> single and in two -c, inserting -f - somewhere in the middle of the
> other -c and -f. They all behave as I would expect.
>
> One maybe slightly surprising behaviour is that -f - can be specified
> multiple times and only the first one has an effect since the others
> act on an exhausted stdin. But I don't think forbidding multiple -f -
> is better.
>
> As for the code (these still apply to Michael's latest patch):
>
> 1. the be compiler quiete comment is not good English, /* silence the
> compiler */ would be better or remove it completely
>
> 2. shouldn't atyp in SimpleActionListCell be of type enum _atypes?
> Otherwise why an enum if it's casted to int when actually used? If
> it's an enum the repeated ifs on cell->atyp should be a switch, either
> with a default Assert(0) or no default which makes gcc give a warning
> if an enum value is ever added without having a corresponding case.
>

That is maybe a different topic - psql uses enums and ints very freely, so
I wrote code consistent with the current code.

The code there uses some older patterns, and cleaning it up should be a
bigger patch.

I don't have a strong opinion about this.

Regards

Pavel


Re: [HACKERS] More stable query plans via more predictable column statistics

2015-12-01 Thread Tom Lane
"Shulgin, Oleksandr"  writes:
> This post summarizes a few weeks of research of ANALYZE statistics
> distribution on one of our bigger production databases with some real-world
> data and proposes a patch to rectify some of the oddities observed.

Please add this to the 2016-01 commitfest ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] silent data loss with ext4 / all current versions

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 3:23 PM, Michael Paquier
 wrote:
> On Wed, Dec 2, 2015 at 7:05 AM, Tomas Vondra
>  wrote:
>> Attached is v2 of the patch, that
>>
>> (a) adds explicit fsync on the parent directory after all the rename()
>> calls in timeline.c, xlog.c, xlogarchive.c and pgarch.c
>>
>> (b) adds START/END_CRIT_SECTION around the new fsync_fname calls
>> (except for those in timeline.c, as the START/END_CRIT_SECTION is
>> not available there)
>>
>> The patch is fairly trivial and I've done some rudimentary testing, but I'm
>> sure I haven't exercised all the modified paths.
>
> I would like to have an in-depth look at that after finishing the
> current CF, I am the manager of this one after all... Could you
> register it to 2016-01 CF for the time being? I don't mind being
> beaten by someone else if this someone has some room to look at this
> patch..

And please feel free to add my name as reviewer.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] silent data loss with ext4 / all current versions

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 7:05 AM, Tomas Vondra
 wrote:
> Attached is v2 of the patch, that
>
> (a) adds explicit fsync on the parent directory after all the rename()
> calls in timeline.c, xlog.c, xlogarchive.c and pgarch.c
>
> (b) adds START/END_CRIT_SECTION around the new fsync_fname calls
> (except for those in timeline.c, as the START/END_CRIT_SECTION is
> not available there)
>
> The patch is fairly trivial and I've done some rudimentary testing, but I'm
> sure I haven't exercised all the modified paths.

I would like to have an in-depth look at that after finishing the
current CF, I am the manager of this one after all... Could you
register it to 2016-01 CF for the time being? I don't mind being
beaten by someone else if this someone has some room to look at this
patch..
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] psql: add \pset true/false

2015-12-01 Thread Michael Paquier
On Mon, Nov 16, 2015 at 10:43 AM, Peter Geoghegan  wrote:
> On Thu, Nov 12, 2015 at 1:09 PM, Tom Lane  wrote:
>> Peter Eisentraut  writes:
>>> Plus we already have \pset numericlocale as a similar feature in psql.
>>
>> But \pset numericlocale is also a crock.  It doesn't affect COPY output
>> for instance, and its ability to identify which data types it should apply
>> to is really shaky.  And it's virtually unused, as demonstrated by the
>> fact that serious bugs in it went undetected for many years (cf 4778a0bda,
>> 77130fc14).  That's a really poor advertisement for the usefulness of the
>> proposed feature.
>
> FWIW, I didn't realize that we had "\pset numericlocale" until about a
> year ago. I now use it all the time, and find it very useful.

So, looking at this thread, here is the current status:
- Tom Lane: -1
- Michael Paquier: -1
- Peter Geoghegan: +1?
- Peter Eisentraut: +1
- the author: surely +1.
Any other opinions? Feel free to correct this list if needed, and then
let's try to move on on a consensus if need be to decide the destiny
of this patch.
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] snapshot too old, configured by time

2015-12-01 Thread Michael Paquier
On Mon, Nov 9, 2015 at 8:07 AM, Steve Singer  wrote:
> On 10/15/2015 05:47 PM, Kevin Grittner wrote:
>>
>> All other issues raised by Álvaro and Steve have been addressed, except
>> for this one, which I will argue against:
>
>
> I've been looking through the updated patch
>
>
> In snapmgr.c
>
>
> + * XXX: If we can trust a read of an int64 value to be atomic, we can skip
> the
> + * spinlock here.
> + */
> +int64
> +GetOldSnapshotThresholdTimestamp(void)
>
>
> Was your intent with the XXX for it to be a TODO to only acquire the lock on
> platforms without the atomic 64bit operations?
>
> On a more general note:
>
> I've tried various manual tests of this feature and it sometimes works as
> expected and sometimes doesn't.
> I'm getting the feeling that how I expect it to work isn't quite in sync
> with how it does work.
>
> I'd expect the following to be sufficient to generate the test
>
> T1: Obtains a snapshot that can see some rows
> T2: Waits 60 seconds and performs an update on those rows
> T2: Performs a vacuum
> T1: Waits 60 seconds, tries to select from the table.  The snapshot should
> be too old
>
>
> For example it seems that in test 002 the select_ok on conn1 following the
> vacuum but right before the final sleep is critical to the snapshot too old
> error showing up (ie if I remove that select_ok but leave in the sleep I
> don't get the error)
>
> Is this intended and if so is there a better way we can explain how things
> work?
>
>
> Also is 0 intended to be an acceptable value for old_snapshot_threshold and
> if so what should we expect to see then?

There has been a review but no replies for more than 1 month. Returned
with feedback?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Parallel Seq Scan

2015-12-01 Thread Michael Paquier
On Sun, Nov 22, 2015 at 3:25 PM, Amit Kapila  wrote:
> On Fri, Nov 20, 2015 at 11:34 PM, Robert Haas  wrote:
>>
>> On Thu, Nov 19, 2015 at 11:59 PM, Amit Kapila 
>> wrote:
>> > Isn't it better to destroy the memory for readers array as that gets
>> > allocated
>> > even if there are no workers available for execution?
>> >
>> > Attached patch fixes the issue by just destroying readers array.
>>
>> Well, then you're making ExecGatherShutdownWorkers() not a no-op any
>> more.  I'll go commit a combination of your two patches.
>>
>
> Thanks!

There is still an entry in the CF app for this thread as "Parallel Seq
scan". The basic infrastructure has been committed, and I understand
that this is a never-ending task and that there will be many
optimizations. Still, are you guys fine with marking this entry as
committed for now?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Tab completion for ALTER COLUMN SET STATISTICS

2015-12-01 Thread Michael Paquier
On Wed, Nov 18, 2015 at 2:12 AM, Jeff Janes  wrote:
> On Mon, Sep 28, 2015 at 8:48 AM, Robert Haas  wrote:
>> On Sat, Sep 26, 2015 at 7:24 AM, Michael Paquier
>>  wrote:
>>> On Sat, Sep 26, 2015 at 7:18 AM, Jeff Janes  wrote:
 If I have "alter table foo alter COLUMN bar SET STATISTICS" in the line
 buffer,
 it tab completes to add " TO", which is not legal.

 The attached patch makes it not tab complete anything at all, which is at
 least not actively misleading.
 I thought of having it complete "-1", "" so that it gives a clue
 about what is needed, but I didn't see any precedent for non-literal
 clue-giving and I did not want to try to create a new precedent.
>>>
>>> +1 for the way you are doing it in your patch.
>>
>> Before we take that approach, can I back up and ask whether we
>> shouldn't instead narrow the rule that's inserting TO?  I think that
>> completion is coming from here:
>>
>> else if (pg_strcasecmp(prev2_wd, "SET") == 0 &&
>>  pg_strcasecmp(prev4_wd, "UPDATE") != 0 &&
>>  pg_strcasecmp(prev_wd, "TABLESPACE") != 0 &&
>>  pg_strcasecmp(prev_wd, "SCHEMA") != 0 &&
>>  prev_wd[strlen(prev_wd) - 1] != ')' &&
>>  prev_wd[strlen(prev_wd) - 1] != '=' &&
>>  pg_strcasecmp(prev4_wd, "DOMAIN") != 0)
>> COMPLETE_WITH_CONST("TO");
>>
>> Now, that is basically an incredibly broad production: every time the
>> second-most recent word is SET, complete with TO.  It looks to me like
>> this has already been patched around multiple times to make it
>> slightly narrower.  Those exceptions were added by three different
>> commits, in 2004, 2010, and 2012.
>>
>> Maybe it's time to back up and make the whole thing a lot narrower.
>> Like, if second-most-recent word of the command is also the FIRST word
>> of the command, this is probably right.  And there may be a few other
>> situations where it's right.  But in general, SET is used in lots of
>> places and the fact that we've seen one recently isn't enough to
>> decide in TO.
>
> There are quite a few places where TO is legitimately the 2nd word
> after SET.  I don't know how to do an exhaustive survey of them, but
> here are the ones I know about:
>
> SET  TO
> ALTER DATABASE  SET  TO
> ALTER ROLE  SET  TO
> ALTER USER  SET  TO
> ALTER FUNCTION  (  )
> SET  TO
>
> I don't know if coding the non-TO cases as exceptions is easier or
> harder than coding the TO cases more tightly.
>
> In the case of "SET SCHEMA", it seems like we might be able to just
> move that case higher in the giant list of 'else if' so that it
> triggers before getting to the generic SET  code.  But I can't
> figure out where in the file that code is, to see if it might be
> moved.  And I'm not sure that intentionally relying on order more than
> already is a better strategy, it seems kind of fragile.
>
> In any case, I'd rather not try re-factor this part of the code while
> there is another large refactoring pending.  But I think an
> incremental improvement would be acceptable.
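
(For reference, the narrower rule being discussed would look something like
the fragment below, in the style of the quoted tab-complete.c code.  The
specific word checks are hypothetical, not an actual patch, and the ALTER
FUNCTION ... SET case would still need separate handling:)

    /* Offer TO only where SET <variable> is plausible: at the start of a
     * command, or in ALTER DATABASE/ROLE/USER <name> SET <variable>. */
    else if (pg_strcasecmp(prev2_wd, "SET") == 0 &&
             (prev3_wd[0] == '\0' ||
              pg_strcasecmp(prev4_wd, "DATABASE") == 0 ||
              pg_strcasecmp(prev4_wd, "ROLE") == 0 ||
              pg_strcasecmp(prev4_wd, "USER") == 0))
        COMPLETE_WITH_CONST("TO");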

With the refactoring of tab-complete.c that is moving on, perhaps this
entry should be moved to next CF. Thoughts?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-12-01 Thread Michael Paquier
On Tue, Nov 24, 2015 at 7:03 AM, Robert Haas  wrote:
> On Mon, Nov 23, 2015 at 1:44 PM, Alvaro Herrera
>  wrote:
>> Robert Haas wrote:
>>> I support building incrementally, but I don't see why we want to
>>> change the catalog structure and then change it again.  That seems
>>> like it makes the project more work, not less.
>>
>> I agree with what you say.  I thought you were saying that the
>> implementation had to provide multi-partitioning from the get-go, not
>> just the design.
>
> Well, I *hope* that's going to fall out naturally.  If it doesn't, I
> can live with that.  But I hope it will.
>
>>> To me, it seems like there is a pretty obvious approach here: each
>>> table can be either a plain table, or a partition root (which can look
>>> just like an empty inheritance parent).  Then multi-level partitioning
>>> falls right out of that design without needing to do anything extra.
>>
>> Sounds reasonable.
>
> Cool.
>
>>> I think it is also worth getting the syntax right from the beginning.
>>
>> Yes, that's critical.  We could implement the whole thing in gram.y and
>> then have the unsupported cases throw errors; then it's easy to see that
>> there are no grammar conflicts to deal with later.
>
> That's worth considering, too.

It seems that the consensus is to rework this patch a bit more.
Returned with feedback then?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-12-01 Thread Amit Langote
On 2015/12/02 15:41, Michael Paquier wrote:
> It seems that the consensus is to rework this patch a bit more.
> Returned with feedback then?

Yes, as far as this commitfest is concerned. Or "moved to the next
commitfest"? Not sure exactly which makes sense.

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Declarative partitioning

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 3:55 PM, Amit Langote
 wrote:
> On 2015/12/02 15:41, Michael Paquier wrote:
>> It seems that the consensus is to rework this patch a bit more.
>> Returned with feedback then?
>
> Yes, as far as this commitfest is concerned. Or "moved to the next
> commitfest"? Not sure exactly which makes sense.

OK, then, switched as returned with feedback.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 1:04 PM, Michael Paquier
 wrote:
> On Wed, Dec 2, 2015 at 12:01 PM, Alvaro Herrera
>  wrote:
>> Michael Paquier wrote:
>>> On Wed, Dec 2, 2015 at 8:11 AM, Alvaro Herrera  
>>> wrote:
>>
>>> > - It would be nice to have command_ok and command_fails in PostgresNode
>>> >   too; that would remove the need for setting $ENV{PGPORT} but it's
>>> >   possible to run commands outside a node too, so we'd need duplicates,
>>> >   which would be worse.
>>>
>>> I am fine to let it the way your patch does it. There are already many 
>>> changes.
>>
>> Idea: we can have a bare command_ok exported by TestLib just as
>> currently, and instance method PostgresNode->command_ok that first sets
>> local $ENV{PGPORT} and then calls the other one.
>
> Hm. That would be cleaner and make the code more consistent. Now as
> TestLib exports command_ok, command_like and command_fails, we would
> get redefinition errors when compiling the code if those routines are
> not named differently in PostgresNode. If you want to have the names
> consistent, then I guess that the only way would be to remove those
> routines from the export list of TestLib and call them directly as for
> example TestLib::command_ok(). See for example the patch attached that
> applies on top of your patch 2 that adds a set of routines in
> PostgresNode with a slightly different name.

Well, Alvaro has whispered to me a more elegant method: using TestLib()
to import only a portion of the routines and avoid the redefinition
errors. Hence, patch 0001 attached creates equivalents of command_*
for PostgresNode and tests use it without setting PGPORT. Patch 0002
is a run of perltidy on the whole.
-- 
Michael
From 970423c6e5c3c30f6ecd82d8691437a2aa4acc87 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 2 Dec 2015 15:12:03 +0900
Subject: [PATCH 1/2] Rework TAP test infrastructure to ease node management

This is aimed at being used extensively by modules and features to
test features that require a bit more than a standalone node.
---
 src/bin/initdb/t/001_initdb.pl |   1 +
 src/bin/pg_basebackup/t/010_pg_basebackup.pl   | 104 +++---
 src/bin/pg_controldata/t/001_pg_controldata.pl |  11 +-
 src/bin/pg_ctl/t/001_start_stop.pl |   2 +
 src/bin/pg_ctl/t/002_status.pl |  13 +-
 src/bin/pg_rewind/RewindTest.pm| 191 +++---
 src/bin/pg_rewind/t/003_extrafiles.pl  |   3 +-
 src/bin/pg_rewind/t/004_pg_xlog_symlink.pl |   4 +-
 src/bin/scripts/t/010_clusterdb.pl |  23 +-
 src/bin/scripts/t/011_clusterdb_all.pl |  13 +-
 src/bin/scripts/t/020_createdb.pl  |  13 +-
 src/bin/scripts/t/030_createlang.pl|  21 +-
 src/bin/scripts/t/040_createuser.pl|  17 +-
 src/bin/scripts/t/050_dropdb.pl|  13 +-
 src/bin/scripts/t/060_droplang.pl  |  11 +-
 src/bin/scripts/t/070_dropuser.pl  |  13 +-
 src/bin/scripts/t/080_pg_isready.pl|   9 +-
 src/bin/scripts/t/090_reindexdb.pl |  23 +-
 src/bin/scripts/t/091_reindexdb_all.pl |  10 +-
 src/bin/scripts/t/100_vacuumdb.pl  |  17 +-
 src/bin/scripts/t/101_vacuumdb_all.pl  |  10 +-
 src/bin/scripts/t/102_vacuumdb_stages.pl   |  13 +-
 src/test/perl/PostgresNode.pm  | 471 +
 src/test/perl/RecursiveCopy.pm |  42 +++
 src/test/perl/TestLib.pm   | 317 ++---
 src/test/ssl/ServerSetup.pm|  34 +-
 src/test/ssl/t/001_ssltests.pl |  32 +-
 27 files changed, 911 insertions(+), 520 deletions(-)
 create mode 100644 src/test/perl/PostgresNode.pm
 create mode 100644 src/test/perl/RecursiveCopy.pm

diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 299dcf5..f64186d 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -4,6 +4,7 @@
 
 use strict;
 use warnings;
+use PostgresNode;
 use TestLib;
 use Test::More tests => 14;
 
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index dc96bbf..46c8a71 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -2,6 +2,7 @@ use strict;
 use warnings;
 use Cwd;
 use Config;
+use PostgresNode;
 use TestLib;
 use Test::More tests => 51;
 
@@ -9,12 +10,17 @@ program_help_ok('pg_basebackup');
 program_version_ok('pg_basebackup');
 program_options_handling_ok('pg_basebackup');
 
-my $tempdir = tempdir;
-start_test_server $tempdir;
+my $tempdir = TestLib::tempdir;
 
-command_fails(['pg_basebackup'],
+my $node = get_new_node();
+# Initialize node without replication settings
+$node->init(hba_permit_replication => 0);
+$node->start;
+my $pgdata = $node->data_dir;
+

Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-01 Thread Noah Misch
On Tue, Dec 01, 2015 at 08:11:21PM -0300, Alvaro Herrera wrote:
> Finally, I ran perltidy on all the files, which strangely changed stuff
> that I didn't expect it to change.  I wonder if this is related to the
> perltidy version.

The last pgindent run (commit 807b9e0) used perltidy v20090616, and perltidy
behavior has changed slightly over time.  Install that version to do your own
perltidy runs.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.

2015-12-01 Thread Michael Paquier
On Wed, Dec 2, 2015 at 12:01 PM, Alvaro Herrera
 wrote:
> Michael Paquier wrote:
>> On Wed, Dec 2, 2015 at 8:11 AM, Alvaro Herrera  
>> wrote:
>
>> > - It would be nice to have command_ok and command_fails in PostgresNode
>> >   too; that would remove the need for setting $ENV{PGPORT} but it's
>> >   possible to run commands outside a node too, so we'd need duplicates,
>> >   which would be worse.
>>
>> I am fine to let it the way your patch does it. There are already many 
>> changes.
>
> Idea: we can have a bare command_ok exported by TestLib just as
> currently, and instance method PostgresNode->command_ok that first sets
> local $ENV{PGPORT} and then calls the other one.

Hm. That would be cleaner and make the code more consistent. Now as
TestLib exports command_ok, command_like and command_fails, we would
get redefinition errors when compiling the code if those routines are
not named differently in PostgresNode. If you want to have the names
consistent, then I guess that the only way would be to remove those
routines from the export list of TestLib and call them directly as for
example TestLib::command_ok(). See for example the patch attached that
applies on top of your patch 2 that adds a set of routines in
PostgresNode with a slightly different name.

>> > Finally, I ran perltidy on all the files, which strangely changed stuff
>> > that I didn't expect it to change.  I wonder if this is related to the
>> > perltidy version.  Do you see further changes if you run perltidy on the
>> > patched tree?
>>
>> SimpleTee.pm shows some diffs, though it doesn't seem that this patch
>> should care about that. The rest is showing no diffs. And I have used
>> perltidy v20140711.
>
> Yes, the patch doesn't modify SimpleTee -- I just used "find" to indent
> perl files.  What I don't understand is why doesn't my perltidy run
> match what was in master currently.  It should be a no-op ...

Well I don't get it either :)
I did a run on top of your two patches and saw no differences.
-- 
Michael
diff --git a/src/bin/pg_basebackup/t/010_pg_basebackup.pl b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
index bc4afce..96dd103 100644
--- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl
+++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl
@@ -19,11 +19,10 @@ $node->init(hba_permit_replication => 0);
 $node->start;
 my $pgdata = $node->data_dir;
 
-$ENV{PGPORT} = $node->port;
-
-command_fails(['pg_basebackup'],
+$node->node_command_fails(
+	['pg_basebackup'],
 	'pg_basebackup needs target directory specified');
-command_fails(
+$node->node_command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup" ],
 	'pg_basebackup fails because of hba');
 
@@ -38,7 +37,7 @@ if (open BADCHARS, ">>$tempdir/pgdata/FOO\xe0\xe0\xe0BAR")
 $node->set_replication_conf();
 system_or_bail 'pg_ctl', '-D', $pgdata, 'reload';
 
-command_fails(
+$node->node_command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup" ],
 	'pg_basebackup fails because of WAL configuration');
 
@@ -49,7 +48,7 @@ print CONF "wal_level = archive\n";
 close CONF;
 $node->restart;
 
-command_ok([ 'pg_basebackup', '-D', "$tempdir/backup" ],
+$node->node_command_ok([ 'pg_basebackup', '-D', "$tempdir/backup" ],
 	'pg_basebackup runs');
 ok(-f "$tempdir/backup/PG_VERSION", 'backup was created');
 
@@ -58,34 +57,34 @@ is_deeply(
 	[ sort qw(. .. archive_status) ],
 	'no WAL files copied');
 
-command_ok(
+$node->node_command_ok(
 	[   'pg_basebackup', '-D', "$tempdir/backup2", '--xlogdir',
 		"$tempdir/xlog2" ],
 	'separate xlog directory');
 ok(-f "$tempdir/backup2/PG_VERSION", 'backup was created');
 ok(-d "$tempdir/xlog2/", 'xlog directory was created');
 
-command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft' ],
+$node->node_command_ok([ 'pg_basebackup', '-D', "$tempdir/tarbackup", '-Ft' ],
 	'tar format');
 ok(-f "$tempdir/tarbackup/base.tar", 'backup tar was created');
 
-command_fails(
+$node->node_command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T=/foo" ],
 	'-T with empty old directory fails');
-command_fails(
+$node->node_command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=" ],
 	'-T with empty new directory fails');
-command_fails(
+$node->node_command_fails(
 	[   'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp',
 		"-T/foo=/bar=/baz" ],
 	'-T with multiple = fails');
-command_fails(
+$node->node_command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo=/bar" ],
 	'-T with old directory not absolute fails');
-command_fails(
+$node->node_command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-T/foo=bar" ],
 	'-T with new directory not absolute fails');
-command_fails(
+$node->node_command_fails(
 	[ 'pg_basebackup', '-D', "$tempdir/backup_foo", '-Fp', "-Tfoo" ],
 	'-T with invalid format fails');
 
@@ -95,7 +94,7 @@ my $superlongpath = "$pgdata/$superlongname";
 
 open FILE, ">$superlongpath" or die "unable 

Re: [HACKERS] Freeze avoidance of very large table.

2015-12-01 Thread Kyotaro HORIGUCHI
Hello,

> You're right, it's not necessary.
> Attached latest v29 patch which removes the mention in pg_upgrade 
> documentation.

The changes looks to be correct but I haven't tested.
And I have some additional random comments.


visibilitymap.c:

  In visibilitymap_set, the following lines.

map = PageGetContents(page);
...
if (flags != (map[mapByte] & (flags << mapBit)))

  map is (char*), PageGetContents returns (char*) but flags is
  uint8. I think that defining map as (uint8*) would be safer.


  In visibilitymap_set, the following lines do something
  different from what they intend.  Only the right side of the inequality
  gets shifted, and what should be used on the right side is not flags
  but VISIBILITYMAP_VALID_BITS.

  - if (!(map[mapByte] & (1 << mapBit)))
  + if (flags != (map[mapByte] & (flags << mapBit)))

  Something like the following will do the right thing.

  + if (flags != (map[mapByte]>>mapBit & VISIBILITYMAP_VALID_BITS))
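
  A tiny standalone illustration of the difference (the flag values are
  assumed to be the 0x01/0x02 bits the patch defines, and the map byte is
  a made-up example for the block at mapBit = 2):

    #include <stdio.h>
    #include <stdint.h>

    #define VM_ALL_VISIBLE 0x01
    #define VM_ALL_FROZEN  0x02
    #define VM_VALID_BITS  0x03

    int
    main(void)
    {
        uint8_t map_byte = 0x0C;   /* both bits already set for mapBit = 2 */
        int     mapBit   = 2;
        uint8_t flags    = VM_ALL_VISIBLE | VM_ALL_FROZEN;

        /* patch's check: 0x03 != (0x0C & 0x0C), so it claims "not set yet" */
        printf("%d\n", flags != (map_byte & (flags << mapBit)));

        /* suggested check: 0x03 != ((0x0C >> 2) & 0x03), correctly false */
        printf("%d\n", flags != ((map_byte >> mapBit) & VM_VALID_BITS));
        return 0;
    }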


analyze.c:

 In do_analyze_rel, the successive if (!inh) checks in the following
 steps look a bit odd. This is emphasized by the first if
 block you added :) These blocks should be enclosed in a single
 if (!inh) {} block.


 >   /* Calculate the number of all-visible and all-frozen bit */
 >   if (!inh)
 >   relallvisible = visibilitymap_count(onerel, );
 >   if (!inh)
 >   vac_update_relstats(onerel,
 >   if (!inh && !(options & VACOPT_VACUUM))
 >   {
 >   for (ind = 0; ind < nindexes; ind++)
 ...
 >   }
 >   if (!inh)
 >   pgstat_report_analyze(onerel, totalrows, totaldeadrows, relallfrozen);
  

vacuum.c:

  >- * relpages and relallvisible, we try to maintain certain lazily-updated
  >- * DDL flags such as relhasindex, by clearing them if no longer correct.
  >- * It's safe to do this in VACUUM, which can't run in parallel with
  >- * CREATE INDEX/RULE/TRIGGER and can't be part of a transaction block.
  >- * However, it's *not* safe to do it in an ANALYZE that's within an
  
  >+ * relpages, relallvisible, we try to maintain certain lazily-updated
  
Why did you just drop the 'and' after relpages? And this seems to be
the only change in this file, except for the additionally missing
letter just below :p
  
  >+ * DDL flags such as relhasindex, by clearing them if no onger correct.
  >+ * It's safe to do this in VACUUM, which can't run in
  >+ * parallel with CREATE INDEX/RULE/TRIGGER and can't be part of a 
transaction
  >+ * block.  However, it's *not* safe to do it in an ANALYZE that's within an


nodeIndexonlyscan.c:

 Duplicated letters.  And the line exceeds the right margin.

 > - * Note on Memory Ordering Effects: visibilitymap_test does not lock
-> + * Note on Memory Ordering Effects: visibilitymap_get_stattus does not lock
+ * Note on Memory Ordering Effects: visibilitymap_get_status does not lock


 The edited line exceeds the right margin because a newline was removed.

- if (!visibilitymap_test(scandesc->heapRelation,
- ItemPointerGetBlockNumber(tid),
- >ioss_VMBuffer))
+ if (!VM_ALL_VISIBLE(scandesc->heapRelation, ItemPointerGetBlockNumber(tid),
+ >ioss_VMBuffer))


costsize.c:

 Duplicate words and it is the only change.

 > - * pages for which the visibility map shows all tuples are visible.
-> + * pages for which the visibility map map shows all tuples are visible.
+ * pages for which the visibility map shows all tuples are visible.

pgstat.c:

 The new parameter frozenpages of pgstat_report_vacuum() is
 defined as int32, but its callers pass BlockNumber (= uint32).  I
 recommend defining frozenpages as BlockNumber.
 PgStat_MsgVacuum has a corresponding member defined as int32.

pg_upgrade.c:

 BITS_PER_HEAPBLOCK is defined identically in two .c files.
 It might be better to merge this into some header file.


heapam_xlog.h, hio.h, execnodes.h:

 Have we decided to rename vm to pim? In any case it is inconsistent
 with the corresponding definition in the function body, which
 remains 'vm_buffer'. (I'm not confident about that, though.)

 >-   Buffer vm_buffer, TransactionId cutoff_xid);
 >+   Buffer pim_buffer, TransactionId cutoff_xid, uint8 flags);

regards,


At Wed, 2 Dec 2015 00:10:09 +0530, Masahiko Sawada  
wrote in 
> On Tue, Dec 1, 2015 at 3:04 AM, Jeff Janes  wrote:
> > On Mon, Nov 30, 2015 at 9:18 AM, Masahiko Sawada  
> > wrote:
> > After running pg_upgrade, if I manually vacuum a table a start getting 
> > warnings:
> >
> > WARNING:  page is not marked all-visible (and all-frozen) but
> > visibility map bit(s) is set in relation "foo" page 32756
> > WARNING:  page is not marked all-visible (and all-frozen) but
> > visibility map bit(s) is set in relation "foo" page 32756
...
> > The warnings are right where the blocks would start using the 2nd page
> > of the _vm, so I think the problem is there.  

Re: [HACKERS] proposal: multiple psql option -c

2015-12-01 Thread Pavel Stehule
Hi


> Yeah, I don't think that's a big issue either to be honest. The code
> is kept consistent a maximum with what is there previously.
>
> Patch is switched to ready for committer.
>

perfect

Thank you very much to all

Regards

Pavel


> --
> Michael
>


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Kouhei Kaigai
> On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai  wrote:
> > The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
> > FDW driver can set arbitrary but one path-node here.
> > After that, this path-node shall be transformed to plan-node by
> > createplan.c, then passed to FDW driver using GetForeignPlan callback.
> > We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
> > The Plan->outerPlan is a common field, so patch size become relatively
> > small. FDW driver can initialize this plan at BeginForeignScan, then
> > execute this sub-plan-tree on demand.
> >
> > Remaining portions are as previous version. ExecScanFetch is revised
> > to call recheckMtd always when scanrelid==0, then FDW driver can get
> > control using RecheckForeignScan callback.
> > It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
> > (2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
> > by its preferable implementation - including execution of an alternative
> > sub-plan.
> >
> >> There seems to be no changes to make_foreignscan.  Is that OK?
> >>
> > create_foreignscan_path(), not only make_foreignscan().
> >
> > This patch is not tested by actual FDW extensions, so it is helpful
> > to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.
> 
> I have done some editing and some small revisions on this patch.
> Here's what I came up with.  The revisions are mostly cosmetic, but I
> revised it a bit so that the signature of GetForeignPlan need not
> change.
>
Thanks for the revision. (I could not be online for a few days, sorry.)

> Also, I made nodeForeignScan.c do some of the outer plan
> handling automatically,
>
It's OK for me. We could omit initialization/shutdown of the sub-plan when
it is not actually needed, even if the FDW driver set it up. However, that
is a very tiny advantage.

> and I fixed the compile breaks in
> contrib/file_fdw and contrib/postgres_fdw.
>
Sorry, I didn't fix up the contrib side.

> Comments/review/testing are very welcome.
>
One small point:

@@ -3755,7 +3762,6 @@ make_foreignscan(List *qptlist,
/* cost will be filled in by create_foreignscan_plan */
plan->targetlist = qptlist;
plan->qual = qpqual;
-   plan->lefttree = NULL;
plan->righttree = NULL;
node->scan.scanrelid = scanrelid;
/* fs_server will be filled in by create_foreignscan_plan */

Although it is harmless, I would prefer that this line be kept, because
callers of make_foreignscan() expect a ForeignScan node with an empty
lefttree, even if it is filled in later.

Best regards,
--
NEC Business Creation Division / PG-Strom Project
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Kouhei Kaigai
> On 2015/12/02 1:41, Robert Haas wrote:
> > On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
> >  wrote:
> >>> The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
> >>> FDW driver can set arbitrary but one path-node here.
> >>> After that, this path-node shall be transformed to plan-node by
> >>> createplan.c, then passed to FDW driver using GetForeignPlan callback.
> 
> >> I understand this, as I also did the same thing in my patches, but 
> >> actually,
> >> that seems a bit complicated to me.  Instead, could we keep the
> >> fdw_outerpath in the fdw_private of a ForeignPath node when creating the
> >> path node during GetForeignPaths, and then create an outerplan accordingly
> >> from the fdw_outerpath stored into the fdw_private during GetForeignPlan, 
> >> by
> >> using create_plan_recurse there?  I think that that would make the core
> >> involvment much simpler.
> 
> > I can't see how it's going to get much simpler than this.  The core
> > core is well under a hundred lines, and it all looks pretty
> > straightforward to me.  All of our existing path and plan types keep
> > lists of paths and plans separate from other kinds of data, and I
> > don't think we're going to win any awards for deviating from that
> > principle here.
> 
> One thing I can think of is that we can keep both the structure of a
> ForeignPath node and the API of create_foreignscan_path as-is.  The
> latter is a good thing for FDW authors.  And IIUC the patch you posted
> today, I think we could make create_foreignscan_plan a bit simpler too.
>   Ie, in your patch, you modified that function as follows:
> 
> @@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
>      */
>     scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
>                                                 best_path,
> -                                               tlist, scan_clauses);
> +                                               tlist,
> +                                               scan_clauses);
> +   outerPlan(scan_plan) = fdw_outerplan;
> 
> I think that would be OK, but I think we would have to do a bit more
> here about the fdw_outerplan's targetlist and qual; I think that the
> targetlist needs to be changed to fdw_scan_tlist, as in the patch [1],
>
Hmm... you are right. The sub-plan shall generate a tuple according to
fdw_scan_tlist, if valid. Do you think such a surgical operation is the best
way to apply the alternative target list, rather than build_path_tlist()?

> and that it'd be better to change the qual to remote conditions, ie,
> quals not in the scan_plan's scan.plan.qual, to avoid duplicate
> evaluation of local conditions.  (In the patch [1], I didn't do anything
> about the qual because the current postgres_fdw join pushdown patch
> assumes that all the the scan_plan's scan.plan.qual are pushed down.)
> Or, FDW authors might want to do something about fdw_recheck_quals for a
> foreign-join while creating the fdw_outerplan.  So if we do that during
> GetForeignPlan, I think we could make create_foreignscan_plan a bit
> simpler, or provide flexibility to FDW authors.
>
So, you suggest it is better to pass fdw_outerplan to the GetForeignPlan
callback, to allow the FDW to adjust the target list and quals of the
sub-plan. I think that is a reasonable argument. Only the FDW knows which
qualifiers are executable on the remote side, so it is not easy for the
core code to strip the qualifiers that are executed only on the host side
out of the sub-plan tree.
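
If we went that way, a GetForeignPlan implementation might do something
along these lines (purely a sketch; fdw_outer_plan, fdw_scan_tlist and
local_conds are stand-ins for whatever the FDW actually keeps around, not
names from the posted patches):

    if (fdw_outer_plan != NULL)
    {
        /* Make the local-join subplan emit the same row shape the
         * ForeignScan advertises, so the EPQ recheck can reuse its slot. */
        fdw_outer_plan->targetlist = fdw_scan_tlist;

        /* Drop conditions the ForeignScan itself already rechecks, to
         * avoid evaluating them twice. */
        fdw_outer_plan->qual =
            list_difference(fdw_outer_plan->qual, local_conds);

        outerPlan(scan_plan) = fdw_outer_plan;
    }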

> >> @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot
> >> *slot)
> >>
> >>  ResetExprContext(econtext);
> >>
> >> +   /*
> >> +* FDW driver has to recheck visibility of EPQ tuple towards
> >> +* the scan qualifiers once it gets pushed down.
> >> +* In addition, if this node represents a join sub-tree, not
> >> +* a scan, FDW driver is also responsible to reconstruct
> >> +* a joined tuple according to the primitive EPQ tuples.
> >> +*/
> >> +   if (fdwroutine->RecheckForeignScan)
> >> +   {
> >> +   if (!fdwroutine->RecheckForeignScan(node, slot))
> >> +   return false;
> >> +   }
> >>
> >> Maybe I'm missing something, but I think we should let FDW do the work if
> >> scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
> >> scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
> >> abort the transaction.)
> 
> > That would be unnecessarily restrictive.  On the one hand, even if
> > scanrelid != 0, the FDW can decide that it prefers to do the rechecks
> > using RecheckForeignScan rather than fdw_recheck_quals.  For most
> > FDWs, I expect using fdw_recheck_quals to be more convenient, but
> > there may be cases where somebody prefers to use RecheckForeignScan,
> > and allowing that costs nothing.
> 
> I suppose that the flexibility would probably be a good thing, but I'm a
> little bit concerned that that might be rather confusing to FDW authors.
>
We expect FDW authors, like 

[HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread David E. Wheeler
Hackers,

Looks like Mac OS X 10.11 El Capitan has removed the OpenSSL header files. They 
recommend building your own or using native OS X SDKs, like Secure Transport:

  http://lists.apple.com/archives/macnetworkprog/2015/Jun/msg00025.html

I don’t suppose anyone has looked at what it would take to get PostgreSQL to use 
Secure Transport, right? Here are the docs:

  
https://developer.apple.com/library/ios/documentation/Security/Reference/secureTransportRef/index.html

If it’s not feasible, those of use who need SSL connections on OS X will just 
have to build OpenSSL ourselves (or install from Homebrew or MacPorts).

David



smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] Freeze avoidance of very large table.

2015-12-01 Thread Masahiko Sawada
On Tue, Dec 1, 2015 at 3:04 AM, Jeff Janes  wrote:
> On Mon, Nov 30, 2015 at 9:18 AM, Masahiko Sawada  
> wrote:
>> On Sun, Nov 29, 2015 at 2:21 AM, Jeff Janes  wrote:
>>> On Tue, Nov 24, 2015 at 3:13 PM, Masahiko Sawada  
>>> wrote:

 Yeah, we need to consider to compute checksum if enabled.
 I've changed the patch, and attached.
 Please review it.
>>>
>>> Thanks for the update.  This now conflicts with the updates doesn to
>>> fix pg_upgrade out-of-space issue on Windows. I've fixed (I think) the
>>> conflict in order to do some testing, but I'd like to get an updated
>>> patch from the author in case I did it wrong.  I don't want to find
>>> bugs that I just introduced myself.
>>>
>>
>> Thank you for having a look.
>>
>> Attached updated v28 patch.
>> Please review it.
>>
>> Regards,
>
> After running pg_upgrade, if I manually vacuum a table a start getting 
> warnings:
>
> WARNING:  page is not marked all-visible (and all-frozen) but
> visibility map bit(s) is set in relation "foo" page 32756
> WARNING:  page is not marked all-visible (and all-frozen) but
> visibility map bit(s) is set in relation "foo" page 32756
> WARNING:  page is not marked all-visible (and all-frozen) but
> visibility map bit(s) is set in relation "foo" page 32757
> WARNING:  page is not marked all-visible (and all-frozen) but
> visibility map bit(s) is set in relation "foo" page 32757
>
> The warnings are right where the blocks would start using the 2nd page
> of the _vm, so I think the problem is there.  And looking at the code,
> I think that "cur += SizeOfPageHeaderData;" in the inner loop cannot
> be correct.  We can't skip a header in the current (old) block each
> time we reach the end of the new block.  The thing we are skipping in
> the current block is half the time not a header, but the data at the
> halfway point through the block.
>

Thank you for reviewing.

You're right, it's not necessary.
Attached latest v29 patch which removes the mention in pg_upgrade documentation.


Regards,

--
Masahiko Sawada


000_add_frozen_bit_into_visibilitymap_v29.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Another little thing about psql wrapped expanded output

2015-12-01 Thread David Fetter
On Tue, Dec 01, 2015 at 11:20:53AM -0500, Tom Lane wrote:
> regression=# \pset format wrapped
> Output format is wrapped.
> regression=# \x
> Expanded display is on.
> regression=# select * from int8_tbl;
> -[ RECORD 1 ]-------------------------------------------------------------------
> q1 | 123
> q2 | 456
> -[ RECORD 2 ]-------------------------------------------------------------------
> q1 | 123
> q2 | 4567890123456789
> -[ RECORD 3 ]-------------------------------------------------------------------
> q1 | 4567890123456789
> q2 | 123
> -[ RECORD 4 ]-------------------------------------------------------------------
> q1 | 4567890123456789
> q2 | 4567890123456789
> -[ RECORD 5 ]-------------------------------------------------------------------
> q1 | 4567890123456789
> q2 | -4567890123456789
> 
> Notice that the dashed lines go all the way to the right margin of my
> 80-column terminal window, even though the data requires no more than
> 22 columns.  While this doesn't look so awful as-is, when I'm working
> in a very wide window it starts to look a little silly.
> 
> The behavior I'd have expected is that if the data is narrower than
> the window, the lines only go to the right margin of the data.  This
> is a trivial change to the logic in print_aligned_vertical, but before
> I go make it, does anyone want to argue that the current behavior is
> preferable to that?

+1 for changing it.

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] El Capitan Removes OpenSSL Headers

2015-12-01 Thread Tom Lane
"David E. Wheeler"  writes:
> Looks like Mac OS X 10.11 El Capitan has removed the OpenSSL header files. 
> They recommend building your own or using native OS X SDKs, like Secure 
> Transport:
>   http://lists.apple.com/archives/macnetworkprog/2015/Jun/msg00025.html

That's annoying.

> I don’t suppose anyone has looked at what it would take to get PostgreSQL 
> to use Secure Transport, right?

This is going to put a bit more urgency into the project Heikki had been
working on to allow use of more than one SSL implementation.  I can't
really see us back-porting that, though, which is going to leave things
in a fairly nasty place for all pre-9.6 branches ...

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Robert Haas
On Thu, Nov 26, 2015 at 12:04 AM, Kouhei Kaigai  wrote:
> The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
> FDW driver can set arbitrary but one path-node here.
> After that, this path-node shall be transformed to plan-node by
> createplan.c, then passed to FDW driver using GetForeignPlan callback.
> We expect FDW driver set this plan-node on lefttree (a.k.a outerPlan).
> The Plan->outerPlan is a common field, so patch size become relatively
> small. FDW driver can initialize this plan at BeginForeignScan, then
> execute this sub-plan-tree on demand.
>
> Remaining portions are as previous version. ExecScanFetch is revised
> to call recheckMtd always when scanrelid==0, then FDW driver can get
> control using RecheckForeignScan callback.
> It allows FDW driver to handle (1) EPQ recheck on underlying scan nodes,
> (2) reconstruction of joined tuple, and (3) EPQ recheck on join clauses,
> by its preferable implementation - including execution of an alternative
> sub-plan.
>
>> There seems to be no changes to make_foreignscan.  Is that OK?
>>
> create_foreignscan_path(), not only make_foreignscan().
>
> This patch is not tested by actual FDW extensions, so it is helpful
> to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.

I have done some editing and some small revisions on this patch.
Here's what I came up with.  The revisions are mostly cosmetic, but I
revised it a bit so that the signature of GetForeignPlan need not
change.  Also, I made nodeForeignScan.c do some of the outer plan
handling automatically, and I fixed the compile breaks in
contrib/file_fdw and contrib/postgres_fdw.

Comments/review/testing are very welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c
index 5ce8f90..1966b51 100644
--- a/contrib/file_fdw/file_fdw.c
+++ b/contrib/file_fdw/file_fdw.c
@@ -525,6 +525,7 @@ fileGetForeignPaths(PlannerInfo *root,
 	 total_cost,
 	 NIL,		/* no pathkeys */
 	 NULL,		/* no outer rel either */
+	 NULL,		/* no extra plan */
 	 coptions));
 
 	/*
diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index a6ba672..dd63159 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -535,6 +535,7 @@ postgresGetForeignPaths(PlannerInfo *root,
    fpinfo->total_cost,
    NIL, /* no pathkeys */
    NULL,		/* no outer rel either */
+   NULL,		/* no extra plan */
    NIL);		/* no fdw_private list */
 	add_path(baserel, (Path *) path);
 
@@ -589,6 +590,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 		 total_cost,
 		 usable_pathkeys,
 		 NULL,
+		 NULL,
 		 NIL));
 	}
 
@@ -756,6 +758,7 @@ postgresGetForeignPaths(PlannerInfo *root,
 	   total_cost,
 	   NIL,		/* no pathkeys */
 	   param_info->ppi_req_outer,
+	   NULL,
 	   NIL);	/* no fdw_private list */
 		add_path(baserel, (Path *) path);
 	}
diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 1533a6b..a646b2a 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -765,6 +765,35 @@ RefetchForeignRow (EState *estate,
  See  for more information.
 
 
+
+
+bool
+RecheckForeignScan (ForeignScanState *node, TupleTableSlot *slot);
+
+ Recheck that a previously-returned tuple still matches the relevant
+ scan and join qualifiers, and possibly provide a modified version of
+ the tuple.  For foreign data wrappers which do not perform join pushdown,
+ it will typically be more convenient to set this to NULL and
+ instead set fdw_recheck_quals appropriately.
+ When outer joins are pushed down, however, it isn't sufficient to
+ reapply the checks relevant to all the base tables to the result tuple,
+ even if all needed attributes are present, because failure to match some
+ qualifier might result in some attributes going to NULL, rather than in
+ no tuple being returned.  RecheckForeignScan can recheck
+ qualifiers and return true if they are still satisfied and false
+ otherwise, but it can also store a replacement tuple into the supplied
+ slot.
+
+
+
+ To implement join pushdown, a foreign data wrapper will typically
+ construct an alternative local join plan which is used only for
+ rechecks; this will become the outer subplan of the
+ ForeignScan.  When a recheck is required, this subplan
+ can be executed and the resulting tuple can be stored in the slot.
+ This plan need not be efficient since no base table will return more
 than one row; for example, it may implement all joins as nested loops.
+

 

@@ -1137,11 +1166,17 @@ GetForeignServerByName(const char *name, bool missing_ok);
 
 
  
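To make the documented recipe concrete, here is a minimal, hypothetical sketch of a RecheckForeignScan callback; it is not part of the posted patch, and it assumes the FDW has already set up the alternative local join plan as the node's outer subplan (e.g. in BeginForeignScan), with executor/executor.h and foreign/fdwapi.h included:

/*
 * Hypothetical sketch, not part of the posted patch: recheck by re-executing
 * the locally constructed alternative join plan stored as the outer subplan.
 */
static bool
exampleRecheckForeignScan(ForeignScanState *node, TupleTableSlot *slot)
{
    PlanState  *outerPlan = outerPlanState(node);
    TupleTableSlot *result;

    /* Restart the alternative local join plan for this EPQ recheck. */
    ExecReScan(outerPlan);

    /* It should produce at most one re-joined tuple. */
    result = ExecProcNode(outerPlan);
    if (TupIsNull(result))
        return false;           /* the join qualifiers are no longer satisfied */

    /* Store the replacement tuple into the slot supplied by the executor. */
    ExecCopySlot(slot, result);
    return true;
}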

Re: [HACKERS] On-demand running query plans using auto_explain and signals

2015-12-01 Thread Shulgin, Oleksandr
On Tue, Dec 1, 2015 at 12:04 AM, Simon Riggs  wrote:

> On 30 November 2015 at 22:27, Julien Rouhaud 
> wrote:
>
>
>> I registered as reviewer on this, but after reading the whole thread for
>> the second time, it's still not clear to me if the last two submitted
>> patches (0001-Add-auto_explain.publish_plans.patch and
>> 0002-Add-SHM-table-of-contents-to-the-explain-DSM.patch) still
>> need review, since multiple refactoring ideas and objections have
>> been raised since.
>>
>
> I looked into it more deeply and decided partial EXPLAINs aren't very
> useful yet.
>
> I've got some thoughts around that which I'll publish when I have more
> time. I suggest this is a good idea, just needs some serious
> committer-love, so we should bounce this for now.
>

I have plans to make something from this on top of pg_stat_statements
and auto_explain, as I mentioned last time.  The next iteration will be
based on the two latest patches above, so it still makes sense to review
them.

As for EXPLAIN ANALYZE support, it will require changes to core, but this
can be done separately.

--
Alex


Re: [HACKERS] custom function for converting human readable sizes to bytes

2015-12-01 Thread Kyotaro HORIGUCHI
Hello. The meaning of "be orange" (I couldn't find it) interests
me, but putting that aside..

I have some random comments.

At Mon, 30 Nov 2015 18:30:18 +0100, Pavel Stehule  
wrote in 
> Hi
> 
> 
> > 2. using independent implementation - there is some redundant code, but we
> > can support double instead of int, and we can support some additional units.
> >
> 
>  new patch is based on own implementation - use numeric/bigint calculations
> 
> + regress tests and doc

1. What do you think about binary byte prefixes? (kiB, MiB...)

 I couldn't read what Robert wrote upthread, but I also would like
 to have 2-base byte unit prefixes. (Sorry, I couldn't put it
 aside..)


2. Doesn't it allow units in lower case?

 I think '1gb' and the like should be acceptable as input.


3. Why do you allow a positive sign prefixing the digits? I suppose
  that a positive sign also shouldn't be allowed if a prefixed
  negative sign is rejected.

4. Useless includes

 dbsize.c is modified to include guc.h but it is useless.

5. Error message

+   if (ndigits == 0)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("string is empty")));

 If pg_size_bytes allows a prefixed positive sign or spaces,
 ndigits == 0 here doesn't mean that the whole string is empty.


6. Symmetry with pg_size_pretty

 pg_size_pretty doesn't have units larger than TB. Just adding PB
 to it breaks backward compatibility, but leaving it as it is breaks
 symmetry with this function. Some possible solutions for this
 that I can think of for now are..

  - Leave it as is.

  - New GUC to allow PB for pg_size_pretty().

  - Expanded function such like pg_size_pretty2 (oops..)

  - New polymorphic (sibling?) function with additional
parameters to instruct how they work. (The following
involving 2-base representations)

pg_size_pretty(n, <2base>, );
pg_size_bytes(n, <2base>, );

7. Docs

+  Converts a size in human-readable format with size units
+  to bytes

 The descriptions for the functions around use 'into' instead of
 'to' for the preposition.


8. Regression

 The regression tests in the patch look good enough as is (except that
 they forget the unit 'B' and prefixed spaces or signs), but they
 don't necessarily have to be individual tests. The following SQL
 statement will do them all at once.

  SELECT s, pg_size_bytes(s) FROM (VALUES ('1'), ('1B')...) as t(s);

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2015-12-01 Thread Pavel Stehule
2015-12-01 11:02 GMT+01:00 Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp>:

> Hello, The meaning of "be orange" (I couldn't find it) interests
> me but putting it aside..
>
> I have some random comments.
>
> At Mon, 30 Nov 2015 18:30:18 +0100, Pavel Stehule 
> wrote in <
> cafj8prcd6we3tqmr0vbcn98wf0p3o3h6nau4btaoswcj443...@mail.gmail.com>
> > Hi
> >
> >
> > > 2. using independent implementation - there is some redundant code,
> but we
> > > can support double instead of int, and we can support some additional units.
> > >
> >
> >  new patch is based on own implementation - use numeric/bigint
> calculations
> >
> > + regress tests and doc
>
> 1. What do you think about binary byte prefixes? (kiB, MiB...)
>

I don't know this mechanism - can you describe it? It could be a good idea.
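(For reference, the binary, IEC-style prefixes in question are KiB, MiB, GiB, ..., meaning 1024, 1024^2, 1024^3 bytes, as opposed to the SI prefixes kB, MB, GB meaning powers of 1000; PostgreSQL's GUC memory units have traditionally treated kB, MB, GB as powers of 1024.)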


>
>  I couldn't read what Robert wrote upthread but I also would like
>  to have 2-base byte unit prefixes. (Sorry I couldn't put it
>  aside..)
>
>
> 2. Doesn't it allow units in lower case?
>

The parser is consistent with the behaviour used in configuration file
processing. We can change it, but then there will be a divergence between
this function and the GUC parser.


>
>  I think '1gb' and so should be acceptable as an input.
>
>
> 3. Why are you allow positive sign prefixing digits? I suppose
>   that positive sign also shouldn't be allowed if it rejects
>   prefixing negative sign.
>

I don't have a strong opinion about it. '-' is clearly wrong, but '+' can be
acceptable. It can be changed.


>
> 4. Useless includes
>
>  dbsize.c is modified to include guc.h but it is useless.
>

I'll fix it.


>
> 5. Error message
>
> +   if (ndigits == 0)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> +errmsg("string is empty")));
>
>  If pg_size_bytes allows prefixing positive sign or spaces,
>  ndigits == 0 here doesn't mean that the whole string is empty.
>
>
I'll fix it.


>
> 6. Symmetry with pg_size_pretty
>
>  pg_size_pretty doesn't have units larger than TB. Just adding PB
>  to it breaks backward compatibility, but leaving as it is breaks
>  symmetry with this function. Some possible solution for this
>  that I can guess for now are..
>

I prefer a warning about the possible compatibility issue in pg_size_pretty
and supporting PB directly there.


>
>   - Leave it as is.
>
>   - New GUC to allow PB for pg_size_pretty().
>
>   - Expanded function such like pg_size_pretty2 (oops..)
>
>   - New polymorphic (sibling?) function with additional
> parameters to instruct how they work. (The following
> involving 2-base representations)
>
> pg_size_pretty(n, <2base>, );
> pg_size_bytes(n, <2base>, );
>
> 7. Docs
>
> +  Converts a size in human-readable format with size units
> +  to bytes
>
>  The descriptions for the functions around use 'into' instead of
>  'to' for the preposition.
>
>
> 8. Regression
>
>  The regression in the patch looks good enough as is (except that
>  it forgets the unit 'B' or prefixed spaces or signs), but they
>  are necessarily have individual tests. The following SQL
>  statement will do them at once.
>
>   SELECT s, pg_size_bytes(s) FROM (VALUES ('1'), ('1B')...) as t(s);
>
>
I'll fix it.

Thank you for your ideas

Regards

Pavel


> regards,
>
> --
> Kyotaro Horiguchi
> NTT Open Source Software Center
>
>
>


Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.

2015-12-01 Thread Amit Langote
On 2015/12/01 16:25, Kyotaro HORIGUCHI wrote:
> At Mon, 30 Nov 2015 19:10:44 -0700 (MST), Vinayak  wrote
>> Thanks for the v7.
>> Please check the comment below.
>> -Table name in the vacuum progress
>>
>> + snprintf(progress_message[0], PROGRESS_MESSAGE_LENGTH, "%s.%s",
>> schemaname,relname);
>>
>> In the vacuum progress, column table_name is showing first 30 characters of
>> table name.
> 
> Yeah, it is actually restricted to that length. But if we allow
> the buffer to store the whole qualified names, it will need 64 *
> 2 + 1 + 1 = 130 bytes, * 10 = 1300 bytes for each beentry... It might
> be acceptable to others, but I don't think that is preferable..
> 
> Separating namespace and relation name as below reduces the
> required length of the field but 62 bytes is still too long for
> most of the information and in turn too short for longer messages
> in some cases.

As done in the patch, the table name is stored in one of the slots of
st_progress_message, which has a width limit of PROGRESS_MESSAGE_LENGTH
bytes. Whereas users of the pgstat_report_progress interface can make sure
that the strings they choose to store in st_progress_param slots are
within the PROGRESS_MESSAGE_LENGTH limit, the same cannot be ensured for
the table name. Maybe the table name is a different enough kind of
information from the other reported parameters that it could be treated
specially. How about a separate st_* member, say,
st_command_target[2*NAMEDATALEN+1], for the table name? It would be
reported using a separate interface, say, pgstat_report_command_target(),
once the name is determined. Moreover, subsequent pgstat_report_progress()
invocations then need not copy the table name needlessly as part of copying
argument values to st_progress_param
(which is a separate suggestion in its own right, though).
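To make the suggestion concrete, a rough, purely hypothetical sketch of such an interface (as it would live in pgstat.c) could look like the following; neither st_command_target nor pgstat_report_command_target() exists in the posted patch, and the changecount dance simply follows the pattern used for the other st_* fields:

/* Hypothetical sketch only -- neither this field nor this function exists yet. */
void
pgstat_report_command_target(const char *schemaname, const char *relname)
{
    volatile PgBackendStatus *beentry = MyBEEntry;

    if (!pgstat_track_activities || !beentry)
        return;

    pgstat_increment_changecount_before(beentry);
    snprintf((char *) beentry->st_command_target,
             2 * NAMEDATALEN + 1,       /* proposed field size */
             "%s.%s", schemaname, relname);
    pgstat_increment_changecount_after(beentry);
}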

Thanks,
Amit




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Help on creating C function that reads int2[] (using "int2vector")

2015-12-01 Thread Rodrigo Hjort
Hello PG Hackers,


I created a custom C function with this signature:

CREATE FUNCTION calculate_hash(numbers int2[])
RETURNS int8
AS 'MODULE_PATHNAME', 'pg_calculate_hash'
LANGUAGE C
IMMUTABLE STRICT;


And here is the function source code (inspired by code I found in
src/backend/utils/adt/int.c):

PG_FUNCTION_INFO_V1(pg_calculate_hash);
Datum
pg_calculate_hash(PG_FUNCTION_ARGS)
{
  int2vector *int2Array = (int2vector *) PG_GETARG_POINTER(0);
  const int qtd = int2Array->dim1;

  elog(DEBUG1, "pg_calculate_hash(qtd=%d)", qtd);

  elog(DEBUG2, "  [ndim=%d, dataoffset=%d, elemtype=%d, dim1=%d, lbound1=%d]",
       int2Array->ndim, int2Array->dataoffset, int2Array->elemtype,
       int2Array->dim1, int2Array->lbound1);

  [...]
}


In order to test it against a table structure, I executed these
instructions on psql:

db=# create table ss (s int2[]);
CREATE TABLE

db=# \d+ ss
Table "public.ss"
 Column |Type| Modifiers | Storage  | Stats target | Description
++---+--+--+-
 s  | smallint[] |   | extended |  |
Has OIDs: no

db=# insert into ss values ('[0:5]={58,17,15,36,59,54}');
INSERT 0 1

db=# select * from ss;
 s
---
 [0:5]={58,17,15,36,59,54}
(1 row)


Then, whenever calling the function passing the int2[] column directly,
strange values are read into the "int2vector" object:

db=# set client_min_messages to debug2;
SET

db=# select s, calculate_hash(s) from ss;
DEBUG:  pg_calculate_hash(qtd=0)
DEBUG:[ndim=0, dataoffset=5376, elemtype=1536, dim1=0,
lbound1=285227520]
 s | calculate_hash
---+---
 [0:5]={58,17,15,36,59,54} | 0
(1 row)


On the other hand, when I double-cast the int2[] column value, it works as
expected (reading the proper "int2vector" structure):

db=# select s, calculate_hash(s::varchar::int2[]) from ss;
DEBUG:  pg_calculate_hash(qtd=6)
DEBUG:[ndim=1, dataoffset=0, elemtype=21, dim1=6, lbound1=0]
 s |   calculate_hash
---+
 [0:5]={58,17,15,36,59,54} | 441352797842128896
(1 row)


Please, what is wrong with that function code?
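For comparison, a minimal sketch of the conventional way to read an int2[] argument in a V1 C function -- detoasting it as an ArrayType and flattening it with deconstruct_array() rather than casting the raw pointer to int2vector -- might look roughly like this (the hash computation is only a placeholder, and PG_MODULE_MAGIC is assumed to be declared elsewhere in the module):

#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_type.h"
#include "utils/array.h"

PG_FUNCTION_INFO_V1(example_calculate_hash);

Datum
example_calculate_hash(PG_FUNCTION_ARGS)
{
    /* PG_GETARG_ARRAYTYPE_P detoasts the datum, unlike PG_GETARG_POINTER. */
    ArrayType  *arr = PG_GETARG_ARRAYTYPE_P(0);
    Datum      *elems;
    bool       *nulls;
    int         nelems;
    int64       hash = 0;
    int         i;

    /* Flatten the int2[] (element type int2, OID 21) into a plain Datum array. */
    deconstruct_array(arr, INT2OID, sizeof(int16), true, 's',
                      &elems, &nulls, &nelems);

    for (i = 0; i < nelems; i++)
    {
        if (nulls[i])
            continue;           /* placeholder: just skip NULL elements */
        hash = hash * 31 + DatumGetInt16(elems[i]);
    }

    PG_RETURN_INT64(hash);
}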

Thanks in advance.

The whole project is on GitHub:
https://github.com/hjort/mega-sena/tree/master/src


Best regards,

-- 
Rodrigo Hjort
www.hjort.co


[HACKERS] More stable query plans via more predictable column statistics

2015-12-01 Thread Shulgin, Oleksandr
Hi Hackers!

This post summarizes a few weeks of research of ANALYZE statistics
distribution on one of our bigger production databases with some real-world
data and proposes a patch to rectify some of the oddities observed.


Introduction


We have observed that for certain data sets the distribution of samples
between most_common_vals and histogram_bounds can be unstable: so that it
may change dramatically with the next ANALYZE run, thus leading to
radically different plans.

I was revisiting the following performance thread and I've found some
interesting details about statistics in our environment:


http://www.postgresql.org/message-id/flat/CAMkU=1zxynmn11yl8g7agf7k5u4zhvjn0dqcc_eco1qs49u...@mail.gmail.com#CAMkU=1zxynmn11yl8g7agf7k5u4zhvjn0dqcc_eco1qs49u...@mail.gmail.com

My initial interest was in evaluation if distribution of samples could be
made more predictable and less dependent on the factor of luck, thus
leading to more stable execution plans.


Unexpected findings
===

What I have found is that in a significant percentage of instances, when a
duplicate sample value is *not* put into the MCV list, it does produce
duplicates in the histogram_bounds, so it looks like the MCV cut-off
happens too early, even though we have enough space for more values in the
MCV list.

In the extreme cases I've found completely empty MCV lists and histograms
full of duplicates at the same time, with only about 20% of distinct values
in the histogram (as it turns out, this happens due to high fraction of
NULLs in the sample).


Data set and approach
=

In order to obtain these insights into distribution of statistics samples
on one of our bigger databases (~5 TB, 2,300+ tables, 31,000+ individual
attributes) I've built some queries which all start with the following CTEs:

WITH stats1 AS (
SELECT *,
   current_setting('default_statistics_target')::int stats_target,

   array_length(most_common_vals,1) AS num_mcv,
   (SELECT SUM(f) FROM UNNEST(most_common_freqs) AS f) AS mcv_frac,

   array_length(histogram_bounds,1) AS num_hist,
   (SELECT COUNT(DISTINCT h)
  FROM UNNEST(histogram_bounds::text::text[]) AS h) AS
distinct_hist

   FROM pg_stats
  WHERE schemaname NOT IN ('pg_catalog', 'information_schema')
),
stats2 AS (
SELECT *,
   distinct_hist::float/num_hist AS hist_ratio
  FROM stats1
)

The idea here is to collect the number of distinct values in the histogram
bounds vs. the total number of bounds.  One of the reasons why there might
be duplicates in the histogram is a fully occupied MCV list, so we collect
the number of MCVs as well, in order to compare it with the stats_target.

These queries assume that all columns use the default statistics target,
which was actually the case with the database where I was testing this.
 (It is straightforward to include per-column stats target in the picture,
but to do that efficiently, one will have to copy over the definition of
pg_stats view in order to access pg_attribute.attstattarget, and also will
have to run this as superuser to access pg_statistic.  I wanted to keep the
queries simple to make it easier for other interested people to run these
queries in their environment, which quite likely excludes superuser
access.  The more general query is also included[3].)


With the CTEs shown above it was easy to assess the following
"meta-statistics":

WITH ...
SELECT count(1),
   min(hist_ratio)::real,
   avg(hist_ratio)::real,
   max(hist_ratio)::real,
   stddev(hist_ratio)::real
  FROM stats2
 WHERE histogram_bounds IS NOT NULL;

-[ RECORD 1 ]
count  | 18980
min| 0.181818
avg| 0.939942
max| 1
stddev | 0.143189

That doesn't look too bad, but the min value is pretty fishy already.  If I
run the same query, limiting the scope to non-fully-unique
histograms, with "WHERE distinct_hist < num_hist" instead, the picture
is a bit different:

-[ RECORD 1 ]
count  | 3724
min| 0.181818
avg| 0.693903
max| 0.990099
stddev | 0.170845

It shows that about 20% of all analyzed columns that have a histogram
(3724/18980), also have some duplicates in it, and that they have only
about 70% of distinct sample values on average.


Select statistics examples
==

Apart from mere aggregates it is interesting to look at some specific
MCVs/histogram examples.  The following query is aimed to reconstruct
values of certain variables present in analyze.c code of
compute_scalar_stats() (again, with the same CTEs as above):

WITH ...
SELECT schemaname ||'.'|| tablename ||'.'|| attname || (CASE inherited WHEN
TRUE THEN ' (inherited)' ELSE '' END) AS columnname,
   n_distinct, null_frac,
   num_mcv, most_common_vals, most_common_freqs,
   mcv_frac, (mcv_frac / (1 - null_frac))::real AS nonnull_mcv_frac,
   distinct_hist, num_hist, hist_ratio,
   histogram_bounds
  FROM 

Re: [HACKERS] Use pg_rewind when target timeline was switched

2015-12-01 Thread Teodor Sigaev

Thank you, committed.

Alexander Korotkov wrote:

On Sat, Nov 28, 2015 at 2:27 PM, Michael Paquier wrote:

On Fri, Nov 27, 2015 at 11:42 PM, Teodor Sigaev wrote:
> Seems, patch is ready to commit. But it needs some documentation.

Of what kind? The documentation of pg_rewind is rather explicit on the
subject and looks fine as-is, and that's what Alexander and I agreed
on upthread. If there is something forgotten though, this may be the
fact that we do not mention in 9.5's documentation that pg_rewind can
*not* handle timeline switches.


However, we can add some explicit statements about the new pg_rewind capabilities.
Please check the attached patch for those statements.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com 
The Russian Postgres Company





--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Remaining 9.5 open items

2015-12-01 Thread Robert Haas
On Mon, Nov 30, 2015 at 2:43 PM, Tom Lane  wrote:
> * Foreign join pushdown vs EvalPlanQual
>
> Is this fixed by 5fc4c26db?  If not, what remains to do?

Unfortunately, no.  That commit allows FDWs to do proper EPQ handling
for plain table scans, but it proves to be inadequate for EPQ handling
for joins. Solving that problem will require another patch, and,
modulo a bunch of cosmetic issues, I'm reasonably happy with KaiGai
Kohei's latest submission.  I'll respond in more detail on that
thread, but the question I want to raise here is: do we want to
back-patch those changes to 9.5 at this late date?

If we don't, then join pushdown won't be usable in 9.5 for queries
that have locking clauses; it will crash.  However, it may be that
nobody's going to try to do that anyway.  And if they do, they have
the PlannerInfo available when generating paths, so they can just not
push down any joins when there are row marks, which doesn't sound like
the worst thing that ever happened to anybody.  It's also possible
that the fix isn't really correct and we won't find that out until
after release, at which point it'll be too late to tinker with the API
(if it isn't already).  On the other hand, what do we get out of
shipping an API that we know to be a few bricks short of a load?  I
think the risk of collateral damage is low.  If there's a problem, I
expect it to be that the join-pushdown-vs-EPQ problem is still not
solved, not that anything else stops working.

>From the point of view of existing FDWs, the most noticeable effect of
the patch is that you'll have to pass one more NULL argument to
functions make_foreignscan().  This will break compiles, but it should
be a more trivial repair than what 5fc4c26db demanded.  So maybe it's
OK.

I could go either way on this.  What do others think?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Regarding recovery configuration

2015-12-01 Thread Michael Paquier
On Tue, Dec 1, 2015 at 8:46 PM, Dmitry Ivanov  wrote:
> 2. Add them to the GUC, perhaps with a special context.
>
> Both approaches would allow us to create, for example, a dedicated view for
> the 'recovery.conf' file using a drastically simplified procedure written in 
> C.
> Also it would get easier to reload at least some settings from the
> 'recovery.conf' file at runtime.
>
> Suggestions and comments are welcome.

Beginning here may be a good idea, this is not a new subject:
http://www.postgresql.org/message-id/CAJKUy5id1eyweK0W4+yyCM6+-qYs9erLidUmb=1a-qybgtw...@mail.gmail.com

And that's something a couple of people have tried to work on without
much progress over the last couple of years: we got blocked on a couple of
points, for example the way we should manage the switch that is now
done from recovery.conf to recovery.done.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: SCRAM authentication

2015-12-01 Thread Michael Paquier
On Mon, Nov 16, 2015 at 10:53 PM, Michael Paquier wrote:
> Reviving an old thread for a patch still registered in this commit
> fest to make the arguing move on.
>
> Supporting multiple verifiers for a single role has IMO clear advantages:
> - help transition to new protocols and decommission of old protocols
> without the need to create alternative roles in the backend when
> switching from one or the other.
> - move on to a more complex password aging system. The patch currently
> submitted allows only one verifier per type and per role so this is
> not a complete system. Still, the new catalog table pg_auth_verifiers
> could be later easily extended based on other aging properties that we
> consider adapted to reach this goal.
>
> There are clear concerns about protocol aging and how to facilitate
> the life of users to switch to a new system. Hence, I think that the
> patch could include the following:
> - A compatibility GUC allowed_password_verifiers defaulting to a list
> of verifier protocols that we think are safe. This would be added in
> the patch adding infrastructure for multiple verifiers, with default
> to md5. In the patch adding SCRAM, the value of this parameter is
> changed to md5,scram. If a user create a password verifier with a
> protocol not listed in this parameter we return to him a WARNING.
> ERROR may be too much but opinions are welcome.
> - A superuser cleanup function for pg_auth_verifiers aimed at being
> used by pg_upgrade to do needed cleanup of this catalog based on the
> previous GUC to remove outdated verifiers. Optionally, we could have
> an argument for a list of protocols to clean up.
> Using both things we could leverage the upgrade experience and
> transition between systems. Say even if at some point we decide to
> decommission SCRAM we could reuse the same infrastructure to move on
> to a new major version.
>
> Thoughts?

I am going to mark this patch as returned with feedback because of a
visible lack of interest. It would be nice if I could get some
feedback about the suggestions above, to help move toward (why not) a
patch aimed at January's CF.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] overflow checks optimized away

2015-12-01 Thread Michael Paquier
On Tue, Oct 20, 2015 at 4:25 PM, Michael Paquier wrote:
> I'll add that to the next CF, perhaps this will interest somebody.

And nobody got interested in that, so marking it as returned with feedback.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Some questions about the array.

2015-12-01 Thread YUriy Zhuravlev
On Tuesday 01 December 2015 15:30:47 Teodor Sigaev wrote:
> As I understand, update should fail with any array, so, first update should
> fail  too. Am I right?
You're right. Done. New patch attached.
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
index 4385a09..6ee71a5 100644
--- a/doc/src/sgml/array.sgml
+++ b/doc/src/sgml/array.sgml
@@ -257,6 +257,26 @@ SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
 (1 row)
 
 
+  Possible to skip the lower-bound or
+  upper-bound
+  for get first or last element in slice.
+
+
+SELECT schedule[:][:] FROM sal_emp WHERE name = 'Bill';
+
+schedule
+
+ {{meeting,lunch},{training,presentation}}
+(1 row)
+
+SELECT schedule[:2][2:] FROM sal_emp WHERE name = 'Bill';
+
+schedule
+
+ {{lunch},{presentation}}
+(1 row)
+
+
   If any dimension is written as a slice, i.e., contains a colon, then all
   dimensions are treated as slices.  Any dimension that has only a single
   number (no colon) is treated as being from 1
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 29f058c..6643714 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -268,10 +268,12 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	bool		eisnull;
 	ListCell   *l;
 	int			i = 0,
-j = 0;
+j = 0,
+indexexpr;
 	IntArray	upper,
 lower;
 	int		   *lIndex;
+	AnyArrayType *arrays;
 
 	array_source = ExecEvalExpr(astate->refexpr,
 econtext,
@@ -293,6 +295,7 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	foreach(l, astate->refupperindexpr)
 	{
 		ExprState  *eltstate = (ExprState *) lfirst(l);
+		eisnull = false;
 
 		if (i >= MAXDIM)
 			ereport(ERROR,
@@ -300,10 +303,23 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
 			i + 1, MAXDIM)));
 
-		upper.indx[i++] = DatumGetInt32(ExecEvalExpr(eltstate,
-	 econtext,
-	 ,
-	 NULL));
+		if (eltstate == NULL && astate->refattrlength <= 0)
+		{
+			if (isAssignment)
+ereport(ERROR,
+	(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+	 errmsg("cannot determine upper index for empty array")));
+			arrays = (AnyArrayType *)DatumGetArrayTypeP(array_source);
+			indexexpr = AARR_LBOUND(arrays)[i] + AARR_DIMS(arrays)[i] - 1;
+		}
+		else
+			indexexpr = DatumGetInt32(ExecEvalExpr(eltstate,
+   econtext,
+   ,
+   NULL));
+
+		upper.indx[i++] = indexexpr;
+
 		/* If any index expr yields NULL, result is NULL or error */
 		if (eisnull)
 		{
@@ -321,6 +337,7 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 		foreach(l, astate->reflowerindexpr)
 		{
 			ExprState  *eltstate = (ExprState *) lfirst(l);
+			eisnull = false;
 
 			if (j >= MAXDIM)
 ereport(ERROR,
@@ -328,10 +345,20 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 		 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
 j + 1, MAXDIM)));
 
-			lower.indx[j++] = DatumGetInt32(ExecEvalExpr(eltstate,
-		 econtext,
-		 ,
-		 NULL));
+			if (eltstate == NULL)
+			{
+arrays = (AnyArrayType *)DatumGetArrayTypeP(array_source);
+indexexpr = AARR_LBOUND(arrays)[j];
+			}
+			else
+indexexpr = DatumGetInt32(ExecEvalExpr(eltstate,
+	   econtext,
+	   ,
+	   NULL));
+
+			lower.indx[j++] = indexexpr;
+
+
 			/* If any index expr yields NULL, result is NULL or error */
 			if (eisnull)
 			{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 26264cb..a761263 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2417,6 +2417,8 @@ _copyAIndices(const A_Indices *from)
 
 	COPY_NODE_FIELD(lidx);
 	COPY_NODE_FIELD(uidx);
+	COPY_SCALAR_FIELD(lidx_default);
+	COPY_SCALAR_FIELD(uidx_default);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index aa6e102..e75b448 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2162,6 +2162,8 @@ _equalAIndices(const A_Indices *a, const A_Indices *b)
 {
 	COMPARE_NODE_FIELD(lidx);
 	COMPARE_NODE_FIELD(uidx);
+	COMPARE_SCALAR_FIELD(lidx_default);
+	COMPARE_SCALAR_FIELD(uidx_default);
 
 	return true;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 012c14b..ed77c75 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2773,6 +2773,8 @@ _outA_Indices(StringInfo str, const A_Indices *node)
 
 	WRITE_NODE_FIELD(lidx);
 	WRITE_NODE_FIELD(uidx);
+	WRITE_BOOL_FIELD(lidx_default);
+	WRITE_BOOL_FIELD(uidx_default);
 }
 
 static void
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7916df8..f16456f 100644
--- a/src/backend/parser/gram.y
+++ 

Re: [HACKERS] Proposal: Trigonometric functions in degrees

2015-12-01 Thread Michael Paquier
On Mon, Nov 30, 2015 at 11:24 PM, Michael Paquier
 wrote:
> On Mon, Nov 30, 2015 at 11:11 PM, Tom Lane  wrote:
>> Michael Paquier  writes:
>>> On Mon, Nov 30, 2015 at 10:36 PM, Michael Paquier wrote:
 Instinctively, it seems to me that we had better return Nan for the
 new asind and acosd when being out of range for OSX, Linux will
 complain about an out-of-range error so the code is right in this
 case.
>>
>>> This is still mentioned upthread btw. And it does not seem to be a
>>> good idea to change this platform dependent behavior by throwing an
>>> error on the old functions, neither does it seem user-friendly to have
>>> inconsistent results for the XXX function and its XXXd equivalent.
>>
>> FWIW, I think that probably the best course of action is to go ahead
>> and install POSIX-compliant error checking in the existing functions
>> too.  POSIX:2008 is quite clear about this:
>>
>> : An application wishing to check for error situations should set errno to
>> : zero and call feclearexcept(FE_ALL_EXCEPT) before calling these
>> : functions. On return, if errno is non-zero or fetestexcept(FE_INVALID |
>> : FE_DIVBYZERO | FE_OVERFLOW | FE_UNDERFLOW) is non-zero, an error has
>> : occurred.
>
> OK, I have to admit I didn't know this part.
>
>> Although I'm not too sure about Windows, the inconsistent results
>> we're getting on OS X are certainly from failure to adhere to the spec.
>
> Windows complains about out-of-range errors contrary to OSX on for
> example asin or acos. So for once Linux and Windows agree with each
> other.
>
>> I concur with Peter's opinion that this is material for a separate
>> patch, but it seems like it's one that had better go in first.
>
> (I think you mean Dean here and not Peter).

Dean, are you planning to continue working on this patch? If yes, are
you fine with moving it to the next CF? It seems that the current consensus
is to split this effort into two patches:
1) Harden the error checks for the existing functions, particularly the
inconsistencies for asin and acos.
2) Add the new functions in degrees.
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel joins, and better parallel explain

2015-12-01 Thread Amit Kapila
On Thu, Nov 26, 2015 at 8:11 AM, Robert Haas  wrote:
>
> Attached find a patch that does (mostly) two things.
>

I have started looking into this and would like to share few findings
with you:

-
+ /*
+  * Primitive parallel cost model.  Assume the leader will do half as much
+  * work as a regular worker, because it will also need to read the tuples
+  * returned by the workers when they percolate up to the gather ndoe.
+  * This is almost certainly not exactly the right way to model this, so
+  * this will probably need to be changed at some point...
+  */
+ if (path->parallel_degree > 0)
+     path->rows = path->rows / (path->parallel_degree + 0.5);
+
  if (!enable_seqscan)
      startup_cost += disable_cost;

@@ -225,16 +234,6 @@ cost_seqscan(Path *path, PlannerInfo *root,
  cpu_per_tuple = cpu_tuple_cost + qpqual_cost.per_tuple;
  run_cost += cpu_per_tuple * baserel->tuples;

- /*
-  * Primitive parallel cost model.  Assume the leader will do half as much
-  * work as a regular worker, because it will also need to read the tuples
-  * returned by the workers when they percolate up to the gather ndoe.
-  * This is almost certainly not exactly the right way to model this, so
-  * this will probably need to be changed at some point...
-  */
- if (nworkers > 0)
-     run_cost = run_cost / (nworkers + 0.5);
-
..

The above, together with the changes in add_path(), makes the planner *not*
select a parallel path for a seq scan where it was possible earlier. I think
you want to change the costing of parallel plans based on rows selected
instead of total_cost, but there seems to be some problem in the logic (I
think the gather node is not taking the reduced cost into account).  Consider
the case below:

create table tbl_parallel_test(c1 int, c2 char(1000));
insert into tbl_parallel_test values(generate_series(1,100),'a');
Analyze tbl_parallel_test;

set max_parallel_degree=6;
Explain select count(*) from tbl_parallel_test where c1 < 1;

Without the patch, it is able to use a parallel plan for the above case;
with the patch, it is not able to use it.

- There seems to be some inconsistency in Explain's output when
multiple workers are used.

Case -1
Consider the table is populated as mentioned above.
change max_worker_processes=2 in postgresql.conf
set max_parallel_degree=4;

Explain (Analyze,Verbose) select count(*) from tbl_parallel_test where c1 < 1;

                                        QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=46227.78..46227.79 rows=1 width=0) (actual time=182583.554..182583.555 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual time=167930.039..182571.654 rows= loops=1)
         Output: c1, c2
         Number of Workers: 4
         ->  Parallel Seq Scan on public.tbl_parallel_test  (cost=0.00..44245.93 rows=2129 width=0) (actual time=167904.516..182498.494 rows= loops=3)
               Output: c1, c2
               Filter: (tbl_parallel_test.c1 < 1)
               Rows Removed by Filter: 33
               Worker 0: actual time=167890.584..182491.043 rows=4564 loops=1
               Worker 1: actual time=167893.651..182461.904 rows=2740 loops=1
 Planning time: 0.121 ms
 Execution time: 182588.419 ms
(13 rows)


1. Rows Removed by Filter should be 990001.
2. Total rows =  at the Gather node are right, but it is not obvious how
the rows from each worker and the leader lead to that total.

Case-2

change max_worker_processes=8 in postgresql.conf
set max_parallel_degree=4;

postgres=# Explain (Analyze,Verbose) select count(*) from tbl_parallel_test where c1 < 1;
                                        QUERY PLAN
--------------------------------------------------------------------------------------------------
 Aggregate  (cost=46227.78..46227.79 rows=1 width=0) (actual time=39365.233..39365.234 rows=1 loops=1)
   Output: count(*)
   ->  Gather  (cost=1000.00..46203.83 rows=9579 width=0) (actual time=47.485..39344.574 rows= loops=1)
         Output: c1, c2
         Number of Workers: 4
         ->  Parallel Seq Scan on public.tbl_parallel_test  (cost=0.00..44245.93 rows=2129 width=0) (actual time=11.910..39262.255 rows=2000 loops=5)
               Output: c1, c2
               Filter: (tbl_parallel_test.c1 < 1)
               Rows Removed by Filter: 198000
               Worker 0: actual time=5.931..39249.068 rows=3143 loops=1
               Worker 1: actual time=5.778..39254.504 rows=1687 loops=1
               Worker 2: actual time=0.836..39264.683 rows=2170 loops=1
               Worker 3: actual time=1.101..39251.459 rows=1715 loops=1
 Planning time: 0.123 ms
 Execution time: 39383.296 ms
(15 rows)

The problems reported in the previous case are visible in this case as
well.  I think both are due to the same problem.

Case -3
postgres=# Explain 

Re: [HACKERS] Some questions about the array.

2015-12-01 Thread Teodor Sigaev

On Friday 27 November 2015 17:23:35 Teodor Sigaev wrote:

1
Documentation isn't very informative

Added example with different results.

Perfect


2
Seems, error messages are too inconsistent. If you forbid omitting bound in
assigment then if all cases error message should be the same or close.

Done.  Skipping lower boundary is no longer an error.

Thank you for your review.


Much better, but:

# create table xxx (a int[]);
# update xxx set a[2:] = '{1,2}';
UPDATE 0
# insert into xxx values ('{1,2,3,3,4,4,5}');
INSERT 0 1
# update xxx set a[2:] = '{1,2}';
ERROR:  cannot determine upper index for empty array

As I understand, update should fail with any array, so, first update should fail 
too. Am I right?



--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns

2015-12-01 Thread David Rowley
On 1 December 2015 at 17:09, Marko Tiikkaja  wrote:

> On 2015-12-01 05:00, David Rowley wrote:
>
>> We already allow a SELECT's target list to contain non-aggregated columns
>> in a GROUP BY query in cases where the non-aggregated column is
>> functionally dependent on the GROUP BY clause.
>>
>> For example a query such as;
>>
>> SELECT p.product_id,p.description, SUM(s.quantity)
>> FROM product p
>> INNER JOIN sale s ON p.product_id = s.product_id
>> GROUP BY p.product_id;
>>
>> is perfectly fine in PostgreSQL, as p.description is functionally
>> dependent
>> on p.product_id (assuming product_id is the PRIMARY KEY of product).
>>
>
> This has come up before (on other forums, at least), and my main concern
> has been that unlike the case where we go from throwing an error to
> allowing a query, this has a chance to make the planning of currently legal
> queries slower.  Have you tried to measure the impact of this on queries
> where there's no runtime gains to be had?


I've performed a series of benchmarks on the following queries:

Test1: explain select id1,id2 from t1 group by id1,id2;
Test2: explain select id from t2 group by id;
Test3: explain select t1.id1,t1.id2 from t2 inner join t1 on t1.id1=t2.id
group by t1.id1,t1.id2;

I ran each of these with pgbench for 60 seconds, 3 runs per query. In each
case below I've converted the TPS into seconds using the average TPS over
the 3 runs.

In summary:

Test1 is the worst-case test. It's a very simple query, so the planning
overhead of join searching is non-existent. The fact that there are 2 columns
in the GROUP BY means that the fast path cannot be used. I added this because,
if there's only 1 column in the GROUP BY, there's no point in searching
for something to remove.

Average (Sec)
Master 0.0001043117
Patched 0.0001118961
Performance 93.22%
Microseconds of planning overhead 7.5844326722

Test2 is a simple query with a GROUP BY which can fast path due to there
being only 1 GROUP BY column.

Average (Sec)
Master 0.99374448
Patched 0.99670124
Performance 99.70%
Microseconds of planning overhead 0.2956763193

Test3 is slightly more complex and is aimed at showing that the percentage
of planning overhead is smaller when joins exist and the overall planning
cost becomes higher.

Average (Sec)
Master 0.0001797165
Patched 0.0001798406
Performance 99.93%
Microseconds of planning overhead 0.1240776236

Test3's results seem a bit strange; I would have expected more of a slowdown.
I ran the test again to make sure, and it came back with the same results
the 2nd time.

I've attached the spreadsheet that used to collect the results, and also
the raw pgbench output.

It seems that the worst-case test adds about 7.6 microseconds onto planning
time. To get this worst-case result I had to add two GROUP BY columns, as
having only 1 triggers a fast path, since the code knows it can't remove any
columns when there's only 1. A similar fast path also exists which will only
look up the PRIMARY KEY details if there's more than 1 column per
relation in the GROUP BY, so for example GROUP BY rel1.col1, rel2.col1
won't look up any PRIMARY KEY constraint.

Given that the extra code really only does anything if the GROUP BY has 2
or more expressions, are you worried that this will affect too many short
and fast to execute queries negatively?


--
 David Rowley   http://www.2ndQuadrant.com/

 PostgreSQL Development, 24x7 Support, Training & Services

$ psql -c "create table t1 (id1 int, id2 int, primary key(id1,id2));"
$ psql -c "create table t2 (id int primary key);"
$ echo explain select id1,id2 from t1 group by id1,id2 > test1.sql
$ echo explain select id from t2 group by id > test2.sql
$ echo explain select t1.id1,t1.id2 from t2 inner join t1 on t1.id1=t2.id group by t1.id1,t1.id2 > test3.sql

Unpatched

 $ pgbench -n -T 60 -f test1.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 575640
latency average: 0.104 ms
tps = 9593.939398 (including connections establishing)
tps = 9594.358669 (excluding connections establishing)
 $ pgbench -n -T 60 -f test1.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 574034
latency average: 0.105 ms
tps = 9567.199370 (including connections establishing)
tps = 9567.619227 (excluding connections establishing)
 $ pgbench -n -T 60 -f test1.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of clients: 1
number of threads: 1
duration: 60 s
number of transactions actually processed: 575861
latency average: 0.104 ms
tps = 9597.621429 (including connections establishing)
tps = 9597.987750 (excluding connections establishing)
 $ pgbench -n -T 60 -f test2.sql
transaction type: Custom query
scaling factor: 1
query mode: simple
number of 

Re: [HACKERS] Improving test coverage of extensions with pg_dump

2015-12-01 Thread Michael Paquier
On Tue, Sep 29, 2015 at 9:39 PM, Michael Paquier wrote:
> Perhaps you did not look at the last patch I sent on this thread, but
> I changed it so as a schedule is used with a call to pg_regress.
> That's a more scalable approach as you were concerned about as we can
> plug in more easily new modules and new regression tests without
> modifying the perl script used to kick the tests, though it does not
> run the main regression test suite and it actually cannot run it,
> except with a modified schedule or with a filter of the child-parent
> tables. Patch is attached for consideration, which looks like a good
> base for future support, feel free to disagree :)

It is a bit sad to say, but this patch, aiming at adding a test case
for an uncovered code path of pg_dump, has not reached agreement.
The simplest way to reduce the overhead of this test would be to
include something in the main regression test suite and let the pg_dump
call incorporated in the pg_upgrade test do the work. This has already
been suggested upthread as one way to do things.
Doing a diff comparison of pg_dump/restore on the regression database
data set would have been nice, but the column ordering of inherited
tables when a column is added to a child is a can of worms, so that's
not something this patch should try to address.
So, mark it as returned with feedback, or move it to the next CF?
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Regarding recovery configuration

2015-12-01 Thread Dmitry Ivanov
Hi hackers,

I'd like to share some thoughts on 'recovery.conf'-related variables.


Introduction


The 'recovery.conf' file is used to set up the recovery configuration of a 
PostgreSQL server. Its structure closely resembles that of the 
'postgresql.conf' file whose settings are written to the GUC (stands for Grand 
Unified Configuration) at startup. There are several variables in the 
PostgreSQL 
backend, e.g.,

* recoveryRestoreCommand
* recoveryEndCommand
* recoveryCleanupCommand
* recoveryTarget
  ...

that store values read from the 'recovery.conf' file during server startup. 
Although they represent the recovery configuration, they are not included in 
the GUC and thus are not directly accessible via SHOW. Furthermore, the lack 
of an iterable collection (i.e. sorted array of 'config_generic' structures) of 
these variables significantly complicates the 'readRecoveryCommandFile' 
procedure since it mainly consists of numerous string comparisons and variable 
assignments. Lastly, the 'recovery.conf' is not reloadable, which means that 
there is no way to change recovery settings (e.g. 'primary_conninfo') after 
startup. This feature could come in handy if, for example, a master-slave 
connection terminated not long after the master's password had been changed. 
There would be no need to restart the slave in that case.
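For reference, a typical 'recovery.conf' for a streaming-replication standby might look like the following (the host name, user and archive path are placeholders only):

standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432 user=replicator'
restore_command = 'cp /mnt/server/archive/%f "%p"'
recovery_target_timeline = 'latest'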


Possible changes


There are at least two ways to facilitate the management of these variables:

1. Combine them into a set similar to the GUC and name it something like RUC 
(Recovery Unified Configuration), thus providing separate variable storage for 
each configuration file. Some of its members might be read-only.

2. Add them to the GUC, perhaps with a special context.

Both approaches would allow us to create, for example, a dedicated view for 
the 'recovery.conf' file using a drastically simplified procedure written in C. 
Also it would get easier to reload at least some settings from the 
'recovery.conf' file at runtime.

Suggestions and comments are welcome.

-- 
Dmitry Ivanov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] parallel joins, and better parallel explain

2015-12-01 Thread Amit Kapila
On Tue, Dec 1, 2015 at 5:51 PM, Amit Kapila  wrote:
>
> On Thu, Nov 26, 2015 at 8:11 AM, Robert Haas 
wrote:
> >
> > Attached find a patch that does (mostly) two things.
> >
>
> I have started looking into this and would like to share few findings
> with you:
>
>
> - There seems to be some inconsistency in Explain's output when
> multiple workers are used.
>

Forgot to mention that I have changed the code of cost_seqscan() by adding
the line below, so that it can select a parallel plan.  This is just a
temporary change to verify Explain's output.

cost_seqscan()
{
..

run_cost = run_cost / (path->parallel_degree + 0.5);
..
}

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS]WIP: Covering + unique indexes.

2015-12-01 Thread Anastasia Lubennikova

Finally, the completed patch "covering_unique_3.0.patch" is here.
It includes the functionality discussed above in the thread, regression
tests, and a docs update.

I think it's quite ready for review.

_Future work:_
Besides that, I'd like to get feedback about the attached patch
"optional_opclass_3.0.patch".

It should be applied on top of "covering_unique_3.0.patch".

Actually, this patch is the first step toward making opclasses for "included"
columns optional and implementing real covering indexes.

Example:
CREATE TABLE tbl (c1 int, c4 box);
CREATE UNIQUE INDEX idx ON tbl USING btree (c1) INCLUDING (c4);

If we don't need c4 as an index scankey, we don't need any btree opclass 
on it.

But we still want to have it in covering index for queries like

SELECT c4 FROM tbl WHERE c1=1000; // uses the IndexOnlyScan
SELECT * FROM tbl WHERE c1=1000; // uses the IndexOnlyScan

The patch "optional_opclass" completely ignores opclasses of included 
attributes.

To see the difference, look at the explain analyze output:

explain analyze select * from tbl where c1=2 and c4 && box '(0,0,1,1)';
  QUERY PLAN
---
 Index Only Scan using idx on tbl  (cost=0.13..4.15 rows=1 width=36) (actual time=0.010..0.013 rows=1 loops=1)
   Index Cond: (c1 = 2)
   Filter: (c4 && '(1,1),(0,0)'::box)

"Index Cond" shows the index ScanKey conditions and "Filter" is for 
conditions which are applied after the index scan. Anyway, it is faster than
the SeqScan we had before, because the IndexOnlyScan avoids extra heap fetches.


As I already said, this patch is just WIP, so the included opclass is not
"optional" but actually "ignored".
The following example works worse than without the patch. Please don't
worry about it.


CREATE TABLE tbl2 (c1 int, c2 int);
CREATE UNIQUE INDEX idx2 ON tbl2 USING btree (c1) INCLUDING (c2);
explain analyze select * from tbl2 where c1<20 and c2<5;
  QUERY PLAN
---
 Index Only Scan using idx2 on tbl2  (cost=0.28..4.68 rows=10 width=8) (actual time=0.055..0.066 rows=9 loops=1)
   Index Cond: (c1 < 20)
   Filter: (c2 < 5)

The question is more about suitable syntax.
We have two different optimizations here:
1. INCLUDED columns
2. Optional opclasses
It's logical to provide optional opclasses only for included columns.
Is it OK to handle it using the same syntax and resolve all opclass
conflicts while creating the index?


CREATE TABLE tbl2 (c1 int, c2 int, c4 box);
CREATE UNIQUE INDEX idx2 ON tbl2 USING btree (c1) INCLUDING (c2, c4);
CREATE UNIQUE INDEX idx3 ON tbl2 USING btree (c1) INCLUDING (c4, c2);

Of course, the order of attributes is important.
Attributes which have an opclass and want to use it in a ScanKey must be
placed before the others.
idx2 will use c2 in the Index Cond, while idx3 will not. But I think that
is a job for the DBA.


If you see any related changes needed in the planner, please mention them. I
haven't explored that part of the code yet and could have missed something.


--
Anastasia Lubennikova
Postgres Professional:http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index b4ea227..4973e1b 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -644,6 +644,13 @@
   Does an index of this type manage fine-grained predicate locks?
  
 
+  
+  amcanincluding
+  bool
+  
+  Does the access method support included columns?
+ 
+
  
   amkeytype
   oid
@@ -3704,6 +3711,14 @@
   pg_class.relnatts)
  
 
+  
+  indnkeyatts
+  int2
+  
+  The number of key columns in the index. "Key columns" are ordinary
+  index columns in contrast with "included" columns.
+ 
+
  
   indisunique
   bool
diff --git a/doc/src/sgml/indexam.sgml b/doc/src/sgml/indexam.sgml
index 1c09bae..0287c62 100644
--- a/doc/src/sgml/indexam.sgml
+++ b/doc/src/sgml/indexam.sgml
@@ -765,9 +765,11 @@ amrestrpos (IndexScanDesc scan);
   
PostgreSQL enforces SQL uniqueness constraints
using unique indexes, which are indexes that disallow
-   multiple entries with identical keys.  An access method that supports this
+   multiple entries with identical keys. An access method that supports this
feature sets pg_am.amcanunique true.
-   (At present, only b-tree supports it.)
+   Columns included with clause INCLUDING  aren't used to enforce uniqueness.
+   An access method that supports this feature sets pg_am.amcanincluding true.
+   (At present, only b-tree supports them.)
   
 
   
diff --git a/doc/src/sgml/indices.sgml b/doc/src/sgml/indices.sgml
index 2d131c9..5178834 100644
--- a/doc/src/sgml/indices.sgml
+++ b/doc/src/sgml/indices.sgml
@@ -633,7 +633,8 @@ CREATE INDEX 

Re: [HACKERS] proposal: multiple psql option -c

2015-12-01 Thread Michael Paquier
On Tue, Dec 1, 2015 at 11:46 AM, Pavel Stehule  wrote:
> 2015-11-30 15:17 GMT+01:00 Michael Paquier :
>> Removing some items from the list of potential actions and creating a
>> new sublist listing action types is a bit weird. Why not grouping them
>> together and allow for example -l as well in the list of things that
>> is considered as a repeatable action? It seems to me that we could
>> simplify the code this way, and instead of ACT_NOTHING we could check
>> if the list of actions is empty or not.
>
>
> fixed

Thanks for the patch. I have to admit that adding ACT_LIST_DB in the
list of actions was not actually a good idea. It makes the patch
uglier.

Except that, I just had an in-depth look at this patch, and there are
a couple of things that looked strange:
- ACT_LIST_DB would live better removed from the list of actions
and used as a separate, independent option. My previous suggestion
was inappropriate. Sorry.
- There is not much point in having simple_action_list_append and all
its structures in common.c and common.h. Its use is limited to
startup.c (this code is basically a duplicate of dumputils.c, but things
are different enough to justify the duplication) and is
centralized around parse_psql_options.
- use_stdin is not necessary. It is sufficient to rely on actions.head
== NULL instead.
- The documentation is pretty clear. That's nice.
Attached is a patch implementing those suggestions. This simplifies
the code without changing its usefulness. If you are fine with those
changes I will switch this patch as ready for committer.
Regards,
-- 
Michael
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 5899bb4..2928c92 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -38,9 +38,10 @@ PostgreSQL documentation
  PostgreSQL. It enables you to type in
  queries interactively, issue them to
  PostgreSQL, and see the query results.
- Alternatively, input can be from a file. In addition, it provides a
- number of meta-commands and various shell-like features to
- facilitate writing scripts and automating a wide variety of tasks.
+ Alternatively, input can be from a file or from command line
+ arguments. In addition, it provides a number of meta-commands and various
+ shell-like features to facilitate writing scripts and automating a wide
+ variety of tasks.
 
  
 
@@ -89,38 +90,45 @@ PostgreSQL documentation
   --command=command
   
   
-  Specifies that psql is to execute one
-  command string, command,
-  and then exit. This is useful in shell scripts. Start-up files
-  (psqlrc and ~/.psqlrc) are
-  ignored with this option.
+  Specifies that psql is to execute the given
+  command string, command.
+  This option can be repeated and combined in any order with
+  the -f option.
   
   
-  command must be either
-  a command string that is completely parsable by the server (i.e.,
-  it contains no psql-specific features),
-  or a single backslash command. Thus you cannot mix
-  SQL and psql
-  meta-commands with this option. To achieve that, you could
-  pipe the string into psql, for example:
-  echo '\x \\ SELECT * FROM foo;' | psql.
+  command must be either a
+  command string that is completely parsable by the server (i.e., it
+  contains no psql-specific features), or a
+  single backslash command. Thus you cannot mix
+  SQL and psql meta-commands
+  with this option. To achieve that, you could use
+  repeated -c options or pipe the string
+  into psql, for example:
+  psql -c '\x' -c 'SELECT * FROM foo;' or
+  echo '\x \\ SELECT * FROM foo;' | psql
   (\\ is the separator meta-command.)
   
   
-   If the command string contains multiple SQL commands, they are
-   processed in a single transaction, unless there are explicit
-   BEGIN/COMMIT commands included in the
-   string to divide it into multiple transactions.  This is
-   different from the behavior when the same string is fed to
-   psql's standard input.  Also, only
-   the result of the last SQL command is returned.
+   Each command string passed to -c is sent to the server
+   as a single query. Because of this, the server executes it as a single
+   transaction, even if a command string contains
+   multiple SQL commands, unless there are
+   explicit BEGIN/COMMIT commands included in the
+   string to divide it into multiple transactions.  Also, the server only
+   returns the result of the last SQL command to the
+   client.  This is different from the behavior when the same string with
+   multiple SQL commands is fed
+   to psql's standard input because
+   then psql sends each SQL
+   command separately.
   
   
-   Because of these legacy 

Re: [HACKERS] Some questions about the array.

2015-12-01 Thread Merlin Moncure
On Mon, Nov 30, 2015 at 3:05 PM, YUriy Zhuravlev
 wrote:
> On Monday 30 November 2015 08:58:49 you wrote:
>> +1  IMO this line of thinking is a dead end.  Better handled via
>> functions, not syntax
>
> Maybe then add array_pyslice(start, end) when start is 0 and with negative
> indexes? Only for 1D array.
> What do you think?

TBH, I'm not really thrilled about the concept in general; it (zero
based indexing support) doesn't meet the standard of necessity for
adding to the core API and as stated it's much too magical.  If it were
me, I'd be making a pgxn extension for zero based arrays;  it could be
100% SQL wrappers, so it'd be pretty easy to install for interested
parties.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] New gist vacuum.

2015-12-01 Thread Костя Кузнецов
Thank you, Jeff. I am reworking the patch now. All // warnings will be deleted.
Regarding memory consumption, the new version will control the size of the stack
and will operate with a small map, because I want to delete the old-style vacuum
(currently, if maintenance_work_mem is less than needed to build the info map,
we use old-style vacuum with logical order).



Re: [HACKERS] Some questions about the array.

2015-12-01 Thread YUriy Zhuravlev
On Tuesday 01 December 2015 15:43:47 you wrote:
> On Tuesday 01 December 2015 15:30:47 Teodor Sigaev wrote:
> > As I understand, update should fail with any array, so, first update
> > should
> > fail  too. Am I right?
> 
> You right. Done. New patch in attach.

I found an error when the lower bound is omitted in an INSERT like this:
INSERT INTO arrtest_s (a[:2], b[1:2]) VALUES ('{1,2,3,4,5}', '{7,8,9}');

I fixed it in the new patch.  The lower bound for a new array is 1 by default.
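A minimal sketch of the intended behavior (assuming the patch is applied;
the table and column names are made up for illustration):

CREATE TABLE t (a int[]);
-- With the lower bound omitted, it defaults to 1, so a[:2] behaves like a[1:2].
INSERT INTO t (a[:2]) VALUES ('{7,8}');
SELECT a, array_lower(a, 1) FROM t;   -- expected: {7,8} with lower bound 1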

Thanks. 
-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Companydiff --git a/doc/src/sgml/array.sgml b/doc/src/sgml/array.sgml
index 4385a09..6ee71a5 100644
--- a/doc/src/sgml/array.sgml
+++ b/doc/src/sgml/array.sgml
@@ -257,6 +257,26 @@ SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill';
 (1 row)
 
 
+  It is possible to omit the lower-bound or
+  upper-bound
+  to get the first or last element in a slice.
+
+
+SELECT schedule[:][:] FROM sal_emp WHERE name = 'Bill';
+
+schedule
+
+ {{meeting,lunch},{training,presentation}}
+(1 row)
+
+SELECT schedule[:2][2:] FROM sal_emp WHERE name = 'Bill';
+
+schedule
+
+ {{lunch},{presentation}}
+(1 row)
+
+
   If any dimension is written as a slice, i.e., contains a colon, then all
   dimensions are treated as slices.  Any dimension that has only a single
   number (no colon) is treated as being from 1
diff --git a/src/backend/executor/execQual.c b/src/backend/executor/execQual.c
index 29f058c..f300b31 100644
--- a/src/backend/executor/execQual.c
+++ b/src/backend/executor/execQual.c
@@ -268,10 +268,12 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	bool		eisnull;
 	ListCell   *l;
 	int			i = 0,
-j = 0;
+j = 0,
+indexexpr;
 	IntArray	upper,
 lower;
 	int		   *lIndex;
+	AnyArrayType *arrays;
 
 	array_source = ExecEvalExpr(astate->refexpr,
 econtext,
@@ -293,6 +295,7 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	foreach(l, astate->refupperindexpr)
 	{
 		ExprState  *eltstate = (ExprState *) lfirst(l);
+		eisnull = false;
 
 		if (i >= MAXDIM)
 			ereport(ERROR,
@@ -300,10 +303,23 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 	 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
 			i + 1, MAXDIM)));
 
-		upper.indx[i++] = DatumGetInt32(ExecEvalExpr(eltstate,
-	 econtext,
-	 &eisnull,
-	 NULL));
+		if (eltstate == NULL && astate->refattrlength <= 0)
+		{
+			if (isAssignment)
+ereport(ERROR,
+	(errcode(ERRCODE_ARRAY_SUBSCRIPT_ERROR),
+	 errmsg("cannot determine upper index for empty array")));
+			arrays = (AnyArrayType *)DatumGetArrayTypeP(array_source);
+			indexexpr = AARR_LBOUND(arrays)[i] + AARR_DIMS(arrays)[i] - 1;
+		}
+		else
+			indexexpr = DatumGetInt32(ExecEvalExpr(eltstate,
+   econtext,
+   &eisnull,
+   NULL));
+
+		upper.indx[i++] = indexexpr;
+
 		/* If any index expr yields NULL, result is NULL or error */
 		if (eisnull)
 		{
@@ -321,6 +337,7 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 		foreach(l, astate->reflowerindexpr)
 		{
 			ExprState  *eltstate = (ExprState *) lfirst(l);
+			eisnull = false;
 
 			if (j >= MAXDIM)
 ereport(ERROR,
@@ -328,10 +345,25 @@ ExecEvalArrayRef(ArrayRefExprState *astate,
 		 errmsg("number of array dimensions (%d) exceeds the maximum allowed (%d)",
 j + 1, MAXDIM)));
 
-			lower.indx[j++] = DatumGetInt32(ExecEvalExpr(eltstate,
-		 econtext,
-		 &eisnull,
-		 NULL));
+			if (eltstate == NULL)
+			{
+if (*isNull)
+	indexexpr = 1;
+else
+{
+	arrays = (AnyArrayType *)DatumGetArrayTypeP(array_source);
+	indexexpr = AARR_LBOUND(arrays)[j];
+}
+			}
+			else
+indexexpr = DatumGetInt32(ExecEvalExpr(eltstate,
+	   econtext,
+	   &eisnull,
+	   NULL));
+
+			lower.indx[j++] = indexexpr;
+
+
 			/* If any index expr yields NULL, result is NULL or error */
 			if (eisnull)
 			{
diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
index 26264cb..a761263 100644
--- a/src/backend/nodes/copyfuncs.c
+++ b/src/backend/nodes/copyfuncs.c
@@ -2417,6 +2417,8 @@ _copyAIndices(const A_Indices *from)
 
 	COPY_NODE_FIELD(lidx);
 	COPY_NODE_FIELD(uidx);
+	COPY_SCALAR_FIELD(lidx_default);
+	COPY_SCALAR_FIELD(uidx_default);
 
 	return newnode;
 }
diff --git a/src/backend/nodes/equalfuncs.c b/src/backend/nodes/equalfuncs.c
index aa6e102..e75b448 100644
--- a/src/backend/nodes/equalfuncs.c
+++ b/src/backend/nodes/equalfuncs.c
@@ -2162,6 +2162,8 @@ _equalAIndices(const A_Indices *a, const A_Indices *b)
 {
 	COMPARE_NODE_FIELD(lidx);
 	COMPARE_NODE_FIELD(uidx);
+	COMPARE_SCALAR_FIELD(lidx_default);
+	COMPARE_SCALAR_FIELD(uidx_default);
 
 	return true;
 }
diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 012c14b..ed77c75 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2773,6 

Re: [HACKERS] Some questions about the array.

2015-12-01 Thread YUriy Zhuravlev
On Tuesday 01 December 2015 08:38:21 you wrote:
> it (zero
> based indexing support) doesn't meet the standard of necessity for
> adding to the core API and as stated it's much to magical. 

We do not touch the arrays; we simply create a function to access them with 
more convenient behavior. Creating a separate array type in the form of an 
extension is very difficult IMHO.

-- 
YUriy Zhuravlev
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Cube extension kNN support

2015-12-01 Thread Teodor Sigaev

Patch looks good, but there are some review notes:
1 gmake installcheck fails:
*** /.../pgsql/contrib/cube/expected/cube_1.out   2015-12-01 17:49:01.768764000 
+0300
--- /.../pgsql/contrib/cube/results/cube.out  2015-12-01 17:49:12.190818000 
+0300

***
*** 1382,1388 
  (1 row)

  -- Test of distances
! --
  SELECT cube_distance('(1,1)'::cube, '(4,5)'::cube);
   cube_distance
  ---
--- 1382,1388 
  (1 row)

  -- Test of distances
! --
  SELECT cube_distance('(1,1)'::cube, '(4,5)'::cube);
   cube_distance

Seems there is an extra space at the end of the string.

2 Please don't use magic constants like 'case 16:' in C code. Use a macro to define 
some human-readable name (g_cube_distance())


3 Switch in g_cube_distance(): the default switch path should generate an error. It 
just simplifies the debugging process, maybe in the future.


4 Docs: please don't use strings with unlimited length.


Stas Kelvich wrote:

Hello.

That is updated version of the patch with proper update scripts.

Also I've noted that the documentation states the wrong thing:

“It does not matter which order the opposite corners of a cube are entered in. The cube 
functions automatically swap values if needed to create a uniform "lower left — upper 
right" internal representation."

But in practice cubes are stored "as is", and that leads to problems with getting 
cubes sorted along a specific dimension directly from the index.
As the simplest workaround I've deleted that sentence from the docs and implemented two 
coordinate getters, -> and ~>. The first one returns
the coordinate of a cube as it is stored, and the second returns the coordinate of a cube 
normalised to (LL,UR) form.
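A short sketch of how these getters might be used (operator semantics as
described above; the table and column names are hypothetical):

-- c is a cube column; 'c -> n' returns the n-th coordinate as stored,
-- 'c ~> n' returns it normalised to the lower-left/upper-right form.
SELECT c -> 1 AS stored_first, c ~> 1 AS normalised_first
FROM boxes
ORDER BY c ~> 1;   -- sort along the first dimension using the normalised value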

Another way to fix this is to force 'normalization' while creating a cube. But 
that can produce wrong sorts with already existing data.


On 09 Jul 2015, at 16:40, Alexander Korotkov  wrote:

Hi!

On Sat, May 9, 2015 at 6:53 AM, Stas Kelvich  wrote:
Patch is pretty ready, last issue was about changed extension interface, so 
there should be migration script and version bump.
Attaching a version with all migration stuff.

I can't see cube--1.0--1.1.sql in the patch. Did you forget to include it?

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Stas Kelvich
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company






--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PoC] Asynchronous execution again (which is not parallel)

2015-12-01 Thread Amit Kapila
On Mon, Nov 30, 2015 at 6:17 PM, Kyotaro HORIGUCHI <
horiguchi.kyot...@lab.ntt.co.jp> wrote:
>
>
> == TODO or random thoughts, not restricted on this patch.
>
> - This patch doesn't contain the planner part; it must be aware of
>   async execution in order for this to be effective.
>

How will you decide whether async execution is cheaper than parallel
execution?  Do you have some specific cases in mind where async
execution will be more useful than parallel execution?

>
> - Some measure to control execution on the bgworker would be
>   needed. At least merge join requires position mark/reset
>   functions.
>
> - Currently, more tuples reduce the effectiveness of parallel
>   execution; some method to transfer tuples in larger units would
>   be needed, or would it be good to have shared workmem?
>

Yeah, I think one thing we need to figure out here is whether the
performance bottleneck is due to the amount of data that is transferred
between worker and master or something else. One idea could be to pass
the TID and maybe keep the buffer pin (which will be released by the master
backend), but on the other hand, if we have to perform costly target list
evaluation in the bgworker, then it might be beneficial to pass the projected
list back.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] [PROPOSAL] Improvements of Hunspell dictionaries support

2015-12-01 Thread Artur Zakirov

On 16.11.2015 15:51, Artur Zakirov wrote:

On 10.11.2015 13:23, Artur Zakirov wrote:


Link to patch in commitfest:
https://commitfest.postgresql.org/8/420/

Link to regression tests:
https://dl.dropboxusercontent.com/u/15423817/HunspellDictTest.tar.gz


I have made some changes to the documentation in the section "12.6. 
Dictionaries". I have added a description of how to load Ispell and 
Hunspell dictionaries and a description of the Ispell and Hunspell formats.


Patches for the documentation and for the code are attached separately.

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
*** a/src/backend/tsearch/spell.c
--- b/src/backend/tsearch/spell.c
***
*** 153,159  cmpspell(const void *s1, const void *s2)
  static int
  cmpspellaffix(const void *s1, const void *s2)
  {
! 	return (strncmp((*(SPELL *const *) s1)->p.flag, (*(SPELL *const *) s2)->p.flag, MAXFLAGLEN));
  }
  
  static char *
--- 153,159 
  static int
  cmpspellaffix(const void *s1, const void *s2)
  {
! 	return (strncmp((*(SPELL *const *) s1)->flag, (*(SPELL *const *) s2)->flag, MAXFLAGLEN));
  }
  
  static char *
***
*** 237,242  cmpaffix(const void *s1, const void *s2)
--- 237,309 
  	   (const unsigned char *) a2->repl);
  }
  
+ static unsigned short
+ decodeFlag(IspellDict *Conf, char *sflag, char **sflagnext)
+ {
+ 	unsigned short	s;
+ 	char		   *next;
+ 
+ 	switch (Conf->flagMode)
+ 	{
+ 		case FM_LONG:
+ 			s = (int)sflag[0] << 8 | (int)sflag[1];
+ 			if (sflagnext)
+ *sflagnext = sflag + 2;
+ 			break;
+ 		case FM_NUM:
+ 			s = (unsigned short) strtol(sflag, &next, 10);
+ 			if (sflagnext)
+ 			{
+ if (next)
+ {
+ 	*sflagnext = next;
+ 	while (**sflagnext)
+ 	{
+ 		if (**sflagnext == ',')
+ 		{
+ 			*sflagnext = *sflagnext + 1;
+ 			break;
+ 		}
+ 		*sflagnext = *sflagnext + 1;
+ 	}
+ }
+ else
+ 	*sflagnext = 0;
+ 			}
+ 			break;
+ 		default:
+ 			s = (unsigned short) *((unsigned char *)sflag);
+ 			if (sflagnext)
+ *sflagnext = sflag + 1;
+ 	}
+ 
+ 	return s;
+ }
+ 
+ static bool
+ isAffixFlagInUse(IspellDict *Conf, int affix, unsigned short affixflag)
+ {
+ 	char *flagcur;
+ 	char *flagnext = 0;
+ 
+ 	if (affixflag == 0)
+ 		return true;
+ 
+ 	flagcur = Conf->AffixData[affix];
+ 
+ 	while (*flagcur)
+ 	{
+ 		if (decodeFlag(Conf, flagcur, &flagnext) == affixflag)
+ 			return true;
+ 		if (flagnext)
+ 			flagcur = flagnext;
+ 		else
+ 			break;
+ 	}
+ 
+ 	return false;
+ }
+ 
  static void
  NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
  {
***
*** 255,261  NIAddSpell(IspellDict *Conf, const char *word, const char *flag)
  	}
  	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
  	strcpy(Conf->Spell[Conf->nspell]->word, word);
! 	strlcpy(Conf->Spell[Conf->nspell]->p.flag, flag, MAXFLAGLEN);
  	Conf->nspell++;
  }
  
--- 322,328 
  	}
  	Conf->Spell[Conf->nspell] = (SPELL *) tmpalloc(SPELLHDRSZ + strlen(word) + 1);
  	strcpy(Conf->Spell[Conf->nspell]->word, word);
! 	Conf->Spell[Conf->nspell]->flag = (*flag != '\0') ? cpstrdup(Conf, flag) : VoidString;
  	Conf->nspell++;
  }
  
***
*** 355,361  FindWord(IspellDict *Conf, const char *word, int affixflag, int flag)
  	else if ((flag & StopMiddle->compoundflag) == 0)
  		return 0;
  
! 	if ((affixflag == 0) || (strchr(Conf->AffixData[StopMiddle->affix], affixflag) != NULL))
  		return 1;
  }
  node = StopMiddle->node;
--- 422,428 
  	else if ((flag & StopMiddle->compoundflag) == 0)
  		return 0;
  
! 	if (isAffixFlagInUse(Conf, StopMiddle->affix, affixflag))
  		return 1;
  }
  node = StopMiddle->node;
***
*** 394,400  NIAddAffix(IspellDict *Conf, int flag, char flagflags, const char *mask, const c
  
  	Affix = Conf->Affix + Conf->naffixes;
  
! 	if (strcmp(mask, ".") == 0)
  	{
  		Affix->issimple = 1;
  		Affix->isregis = 0;
--- 461,467 
  
  	Affix = Conf->Affix + Conf->naffixes;
  
! 	if (strcmp(mask, ".") == 0 || *mask == '\0')
  	{
  		Affix->issimple = 1;
  		Affix->isregis = 0;
***
*** 429,443  NIAddAffix(IspellDict *Conf, int flag, char flagflags, const char *mask, const c
  		err = pg_regcomp(&(Affix->reg.regex), wmask, wmasklen,
  		 REG_ADVANCED | REG_NOSUB,
  		 DEFAULT_COLLATION_OID);
  		if (err)
! 		{
! 			char		errstr[100];
! 
! 			pg_regerror(err, &(Affix->reg.regex), errstr, sizeof(errstr));
! 			ereport(ERROR,
! 	(errcode(ERRCODE_INVALID_REGULAR_EXPRESSION),
! 	 errmsg("invalid regular expression: %s", errstr)));
! 		}
  	}
  
  	Affix->flagflags = flagflags;
--- 496,504 
  		err = pg_regcomp(&(Affix->reg.regex), wmask, wmasklen,
  		 REG_ADVANCED | REG_NOSUB,
  		 DEFAULT_COLLATION_OID);
+ 		/* Ignore regular expression error and do not add wrong affix */
  		if (err)
! 			return;
  	}
  
  	Affix->flagflags = flagflags;

Re: [HACKERS] eXtensible Transaction Manager API

2015-12-01 Thread Bruce Momjian
On Tue, Nov 17, 2015 at 12:48:38PM -0500, Robert Haas wrote:
> > At this time, the number of round trips needed particularly for READ
> > COMMITTED transactions that need a new snapshot for each query was
> > really a performance killer. We used DBT-1 (TPC-W) which is less
> > OLTP-like than DBT-2 (TPC-C), still with DBT-1 the scalability limit
> > was quickly reached with 10-20 nodes..
> 
> Yeah.  I think this merits a good bit of thought.  Superficially, at
> least, it seems that every time you need a snapshot - which in the
> case of READ COMMITTED is for every SQL statement - you need a network
> roundtrip to the snapshot server.  If multiple backends request a
> snapshot in very quick succession, you might be able to do a sort of
> "group commit" thing where you send a single request to the server and
> they all use the resulting snapshot, but it seems hard to get very far
> with such optimization.  For example, if backend 1 sends a snapshot
> request and backend 2 then realizes that it also needs a snapshot, it
> can't just wait for the reply from backend 1 and use that one.  The
> user might have committed a transaction someplace else and then kicked
> off a transaction on backend 2 afterward, expecting it to see the work
> committed earlier.  But the snapshot returned to backend 1 might have
> been taken before that.  So, all in all, this seems rather crippling.
> 
> Things are better if the system has a single coordinator node that is
> also the arbiter of commits and snapshots.  Then, it can always take a
> snapshot locally with no network roundtrip, and when it reaches out to
> a shard, it can pass along the snapshot information with the SQL query
> (or query plan) it has to send anyway.  But then the single
> coordinator seems like it becomes a bottleneck.  As soon as you have
> multiple coordinators, one of them has got to be the arbiter of global
> ordering, and now all of the other coordinators have to talk to it
> constantly.

I think the performance benefits of having a single coordinator are
going to require us to implement different snapshot/transaction code
paths for single coordinator and multi-coordinator cases.  :-(  That is,
people who can get by with only a single coordinator are going to want
to do that to avoid the overhead of multiple coordinators, while those
using multiple coordinators are going to have to live with that
overhead.

I think an open question is what workloads can use a single coordinator?
Read-only?  Long queries?  Are those cases also ones where the
snapshot/transaction overhead is negligible, meaning we don't need the
single coordinator optimizations?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Error with index on unlogged table

2015-12-01 Thread Michael Paquier
On Tue, Dec 1, 2015 at 3:06 PM, Kyotaro HORIGUCHI
 wrote:
> At Tue, 1 Dec 2015 11:53:35 +0900, Michael Paquier 
>  wrote in 
> 
>> On Tue, Dec 1, 2015 at 11:11 AM, Kyotaro HORIGUCHI
>>  wrote:
>> > Hello, I studied your latest patch.
>>
>> Thanks!
>>
>> > I feel quite uncomfortable that it solves the problem from a kind
>> > of nature of unlogged object by arbitrary flagging which is not
>> > fully corresponds to the nature. If we can deduce the necessity
>> > of fsync from some nature, it would be preferable.
>>
>> INIT_FORKNUM is not something only related to unlogged relations,
>> indexes use them as well. And that's actually
>> If you look at for example BRIN indexes that do not sync immediately
>> their INIT_FORKNUM when index is created, I think that we still are
>> going to need a new flag to control the sync at WAL replay because
>> startup process cannot know a relation's persistence, thing that we
>> can know when the XLOG_FPI record is created. For BRIN indexes, we
>> want particularly to not sync the INIT_FORKNUM when the relation is
>> not an unlogged one.
>
> (The comment added in brinbuildempty looks wrong since it
>  actually doesn't fsync it immediately..)
>
> Hmm, I've already seen that, and having your explanation I wonder
> why brinbuildempty issues WAL for what does not need to be
> persistent at the moment. Isn't it breaking the agreements about
> Write Ahead Log? INIT_FORKNUM and unconditionally fsync'ing would
> be equally tied, excluding the anomaly about WAL. (Except for
> succeeding newpages.)

Alvaro, your thoughts regarding those lines? When building an empty
INIT_FORKNUM for a brin index, its data is saved into a shared buffer
and not immediately synced to disk. Shouldn't that be necessary at
least for unlogged relations?

>> > In short, it seems to me that the reason to choose using
>> > XLOG_FPI_FOR_SYNC here is only performance of processing
>> > successive FPIs for INIT_FORKNUM.
>>
>> Yeah, there is a one-way link between this WAL record and INIT_FORKNUM.
>> However please note that having an INIT_FORKNUM does not always imply
>> that a sync is wanted. copy_relation_data is an example of that.
>
> As I wrote above, I suppose we should fix(?) the irregular
> relationship between WAL and init fork of brin and so.

Yep.

>> > INIT_FORKNUM is generated only for unlogged tables and their
>> > belongings. I suppose such successive fsyncs doesn't cause
>> > observable performance drop assuming that the number of unlogged
>> > tables and belongings is not so high, especially with smarter
>> > storages. All we should do is that just fsync only for
>> > INIT_FORKNUM's FPIs for the case. If the performance does matter
>> > even so, we still can fsync the last md-file when any wal record
>> > other than FPI for INIT_FORK comes. (But this would be a bit
>> > complex..)
>>
>> Hm. If a system uses a bunch of temporary relations with brin index or
>> other included I would not say so. For back branches we may have to do
>> it unconditionally using INIT_FORKNUM, but having a control flag to
>> have it only done for unlogged relations would leverage that.
>
> It could, and should do so. And if we take such systems with
> bunch of temp relations as significant (I agree with this),
> XLogRegisterBlock() looks to be able to register multiple blocks
> into single wal record and we could eliminate arbitrary flagging
> on individual FPI records using it. Is it possible?

I thought about using a BKPBLOCK flag, but all of them are already
taken, if that's what you meant. It seems cheaper to do that at the
record level...
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Use pg_rewind when target timeline was switched

2015-12-01 Thread Alexander Korotkov
On Sat, Nov 28, 2015 at 2:27 PM, Michael Paquier 
wrote:

> On Fri, Nov 27, 2015 at 11:42 PM, Teodor Sigaev  wrote:
> > Seems, patch is ready to commit. But it needs some documentation.
>
> Of what kind? The documentation of pg_rewind is rather explicit on the
> subject and looks fine as-is, and that's what Alexander and I agreed
> on upthread. If there is something forgotten though, this may be the
> fact that we do not mention in 9.5's documentation that pg_rewind can
> *not* handle timeline switches.
>

​However, we can add some explicit statements​ about new pg_rewind
capabilities. Please check the attached patch for those statements.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


pg-rewind-target-switch-7.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Another little thing about psql wrapped expanded output

2015-12-01 Thread Tom Lane
regression=# \pset format wrapped
Output format is wrapped.
regression=# \x
Expanded display is on.
regression=# select * from int8_tbl;
-[ RECORD 1 ]---
q1 | 123
q2 | 456
-[ RECORD 2 ]---
q1 | 123
q2 | 4567890123456789
-[ RECORD 3 ]---
q1 | 4567890123456789
q2 | 123
-[ RECORD 4 ]---
q1 | 4567890123456789
q2 | 4567890123456789
-[ RECORD 5 ]---
q1 | 4567890123456789
q2 | -4567890123456789

Notice that the dashed lines go all the way to the right margin of my
80-column terminal window, even though the data requires no more than
22 columns.  While this doesn't look so awful as-is, when I'm working
in a very wide window it starts to look a little silly.

The behavior I'd have expected is that if the data is narrower than
the window, the lines only go to the right margin of the data.  This
is a trivial change to the logic in print_aligned_vertical, but before
I go make it, does anyone want to argue that the current behavior is
preferable to that?
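
For illustration, with that change the record separators would presumably
stop at the width of the data, roughly like this (a sketch of the intended
output, not actual psql behavior):

-[ RECORD 1 ]---------
q1 | 123
q2 | 456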

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] custom function for converting human readable sizes to bytes

2015-12-01 Thread Pavel Stehule
2015-12-01 11:12 GMT+01:00 Pavel Stehule :

>
>
> 2015-12-01 11:02 GMT+01:00 Kyotaro HORIGUCHI <
> horiguchi.kyot...@lab.ntt.co.jp>:
>
>> Hello, The meaning of "be orange" (I couldn't find it) interests
>> me but putting it aside..
>>
>> I have some random comments.
>>
>> At Mon, 30 Nov 2015 18:30:18 +0100, Pavel Stehule <
>> pavel.steh...@gmail.com> wrote in <
>> cafj8prcd6we3tqmr0vbcn98wf0p3o3h6nau4btaoswcj443...@mail.gmail.com>
>> > Hi
>> >
>> >
>> > > 2. using independent implementation - there is some redundant code,
>> but we
>> > > can support double instead of int, and we can support some additional
>> units.
>> > >
>> >
>> >  new patch is based on own implementation - use numeric/bigint
>> calculations
>> >
>> > + regress tests and doc
>>
>> 1. What do you think about binary byte prefixes? (kiB, MiB...)
>>
>
> I don't know this mechanism - can you describe it? It could be a good idea.
>
>
>>
>>  I couldn't read what Robert wrote upthread but I also would like
>>  to have 2-base byte unit prifixes. (Sorry I couldn't put it
>>  aside..)
>>
>>
>> 2. Doesn't it allow units in lower case?
>>
>
> The parser is consistent with the behavior used in configuration file processing.
> We can change it, but then there will be divergence between this function
> and the GUC parser.
>
>
>>
>>  I think '1gb' and so should be acceptable as an input.
>>
>>
>> 3. Why do you allow a positive sign prefixing digits? I suppose
>>   that a positive sign also shouldn't be allowed if it rejects
>>   a prefixed negative sign.
>>
>
> I don't have a strong opinion about it. '-' is clearly wrong, but '+' can be
> acceptable. But it can be changed.
>
>
>>
>> 4. Useless includes
>>
>>  dbsize.c is modified to include guc.h but it is useless.
>>
>
> I'll fix it.
>
>
>>
>> 5. Error message
>>
>> +   if (ndigits == 0)
>> +   ereport(ERROR,
>> +   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> +errmsg("string is empty")));
>>
>>  If pg_size_bytes allows prefixing positive sign or spaces,
>>  ndigits == 0 here doesn't mean that the whole string is empty.
>>
>>
> I'll fix it.
>
>
>>
>> 6. Symmetry with pg_size_pretty
>>
>>  pg_size_pretty doesn't have units larger than TB. Just adding PB
>>  to it breaks backward compatibility, but leaving as it is breaks
>>  symmetry with this function. Some possible solution for this
>>  that I can guess for now are..
>>
>
> I prefer a warning about possible compatibility issue in pg_size_pretty
> and support PB directly there.
>
>
>>
>>   - Leave it as is.
>>
>>   - New GUC to allow PB for pg_size_pretty().
>>
>>   - Expanded function such like pg_size_pretty2 (oops..)
>>
>>   - New polymorphic (sibling?) function with additional
>> parameters to instruct how they work. (The following
>> involving 2-base representations)
>>
>> pg_size_pretty(n, <2base>, );
>> pg_size_bytes(n, <2base>, );
>>
>> 7. Docs
>>
>> +  Converts a size in human-readable format with size units
>> +  to bytes
>>
>>  The descriptions for the functions around use 'into' instead of
>>  'to' for the preposition.
>>
>>
>> 8. Regression
>>
>>  The regression tests in the patch look good enough as is (except that
>>  they forget the unit 'B' and prefixing spaces or signs), but they
>>  don't necessarily have to be individual tests. The following SQL
>>  statement will do them at once.
>>
>>   SELECT s, pg_size_bytes(s) FROM (VALUES ('1'), ('1B')...) as t(s);
>>
>>
> I'll fix it.
>

here is updated patch

Regards

Pavel


>
> Thank you for your ideas
>
> Regards
>
> Pavel
>
>
>> regards,
>>
>> --
>> Kyotaro Horiguchi
>> NTT Open Source Software Center
>>
>>
>>
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
new file mode 100644
index 60b9a09..55e7d35
*** a/doc/src/sgml/func.sgml
--- b/doc/src/sgml/func.sgml
*** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17607,17612 
--- 17607,17615 
  pg_relation_size
 
 
+ pg_size_bytes
+
+
  pg_size_pretty
 
 
*** postgres=# SELECT * FROM pg_xlogfile_nam
*** 17677,17682 
--- 17680,17695 
 


+
+ pg_size_bytes(text)
+ 
+bigint
+
+  Converts a size in human-readable format with size units
+  into bytes.
+
+   
+   
 
  pg_size_pretty(bigint)
  
diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c
new file mode 100644
index 5ee59d0..0e7399c
*** a/src/backend/utils/adt/dbsize.c
--- b/src/backend/utils/adt/dbsize.c
***
*** 31,36 
--- 31,58 
  #include "utils/relmapper.h"
  #include "utils/syscache.h"
  
+ #define MAX_UNIT_LEN		3
+ #define MAX_DIGITS		20
+ 
+ typedef struct
+ {
+ 	char		unit[MAX_UNIT_LEN + 1];
+ 	long int	multiplier;
+ } unit_multiplier;
+ 
+ static const unit_multiplier unit_multiplier_table[] =
+ {
+ 	{"PB", 1024L * 1024 * 1024 * 1024 * 1024},

Re: [HACKERS] Foreign join pushdown vs EvalPlanQual

2015-12-01 Thread Robert Haas
On Thu, Nov 26, 2015 at 7:59 AM, Etsuro Fujita
 wrote:
>> The attached patch adds: Path *fdw_outerpath field to ForeignPath node.
>> FDW driver can set arbitrary but one path-node here.
>> After that, this path-node shall be transformed to plan-node by
>> createplan.c, then passed to FDW driver using GetForeignPlan callback.
>
> I understand this, as I also did the same thing in my patches, but actually,
> that seems a bit complicated to me.  Instead, could we keep the
> fdw_outerpath in the fdw_private of a ForeignPath node when creating the
> path node during GetForeignPaths, and then create an outerplan accordingly
> from the fdw_outerpath stored into the fdw_private during GetForeignPlan, by
> using create_plan_recurse there?  I think that that would make the core
> involvement much simpler.

I can't see how it's going to get much simpler than this.  The core
code is well under a hundred lines, and it all looks pretty
straightforward to me.  All of our existing path and plan types keep
lists of paths and plans separate from other kinds of data, and I
don't think we're going to win any awards for deviating from that
principle here.

> @@ -85,6 +86,18 @@ ForeignRecheck(ForeignScanState *node, TupleTableSlot
> *slot)
>
> ResetExprContext(econtext);
>
> +   /*
> +* FDW driver has to recheck visibility of EPQ tuple towards
> +* the scan qualifiers once it gets pushed down.
> +* In addition, if this node represents a join sub-tree, not
> +* a scan, FDW driver is also responsible to reconstruct
> +* a joined tuple according to the primitive EPQ tuples.
> +*/
> +   if (fdwroutine->RecheckForeignScan)
> +   {
> +   if (!fdwroutine->RecheckForeignScan(node, slot))
> +   return false;
> +   }
>
> Maybe I'm missing something, but I think we should let FDW do the work if
> scanrelid==0, not just if fdwroutine->RecheckForeignScan is given. (And if
> scanrelid==0 and fdwroutine->RecheckForeignScan is not given, we should
> abort the transaction.)

That would be unnecessarily restrictive.  On the one hand, even if
scanrelid != 0, the FDW can decide that it prefers to do the rechecks
using RecheckForeignScan rather than fdw_recheck_quals.  For most
FDWs, I expect using fdw_recheck_quals to be more convenient, but
there may be cases where somebody prefers to use RecheckForeignScan,
and allowing that costs nothing.  On the flip side, an FDW could
choose to support join pushdown but not worry about EPQ rechecks: it
can just refuse to push down joins when any rowmarks are present.
Requiring the FDW author to supply a dummy RecheckForeignScan method
in that case is pointless.  So I think KaiGai's check is exactly
right.

> Another thing I'm concerned about is
>
> @@ -347,8 +355,26 @@ ExecScanReScan(ScanState *node)
> {
> Index   scanrelid = ((Scan *)
> node->ps.plan)->scanrelid;
>
> -   Assert(scanrelid > 0);
> +   if (scanrelid > 0)
> +   estate->es_epqScanDone[scanrelid - 1] = false;
> +   else
> +   {
> +   Bitmapset  *relids;
> +   int rtindex = -1;
> +
> +   if (IsA(node->ps.plan, ForeignScan))
> +   relids = ((ForeignScan *)
> node->ps.plan)->fs_relids;
> +   else if (IsA(node->ps.plan, CustomScan))
> +   relids = ((CustomScan *)
> node->ps.plan)->custom_relids;
> +   else
> +   elog(ERROR, "unexpected scan node: %d",
> +(int)nodeTag(node->ps.plan));
>
> -   estate->es_epqScanDone[scanrelid - 1] = false;
> +   while ((rtindex = bms_next_member(relids, rtindex))
>>= 0)
> +   {
> +   Assert(rtindex > 0);
> +   estate->es_epqScanDone[rtindex - 1] = false;
> +   }
> +   }
> }
>
> That seems the outerplan's business to me, so I think it'd be better to just
> return, right before the assertion, as I said before.  Seen from another
> angle, ISTM that FDWs that don't use a local join execution plan wouldn't
> need to be aware of handling the es_epqScanDone flags.  (Do you think that
> such FDWs should do something like what ExecScanFetch is doing about the
> flags, in their RecheckForeignScans?  If so, I think we need docs for that.)

I noticed this too when reviewing KaiGai's patch, but ultimately I
think the way KaiGai has it is fine.  It may not be useful in some
cases, but AFAICS it should be harmless.

>> This patch is not tested by actual FDW extensions, so it is helpful
>> to enhance postgres_fdw to run the alternative sub-plan on EPQ recheck.
>
> Will do.

That would be great.

-- 
Robert Haas