Re: [HACKERS] WAL logging problem in 9.4.3?

2019-03-20 Thread Noah Misch
On Wed, Mar 20, 2019 at 05:17:54PM +0900, Kyotaro HORIGUCHI wrote:
> At Sun, 10 Mar 2019 19:27:08 -0700, Noah Misch  wrote in 
> <20190311022708.ga2189...@rfd.leadboat.com>
> > On Mon, Mar 04, 2019 at 12:24:48PM +0900, Kyotaro HORIGUCHI wrote:
> > > +/*
> > > + * Sync to disk any relations that we skipped WAL-logging for earlier.
> > > + */
> > > +void
> > > +smgrDoPendingSyncs(bool isCommit)
> > > +{
> > > + if (!pendingSyncs)
> > > + return;
> > > +
> > > + if (isCommit)
> > > + {
> > > + HASH_SEQ_STATUS status;
> > > + PendingRelSync *pending;
> > > +
> > > + hash_seq_init(&status, pendingSyncs);
> > > +
> > > + while ((pending = hash_seq_search(&status)) != NULL)
> > > + {
> > > + if (pending->sync_above != InvalidBlockNumber)
> > 
> > I'm mildly unhappy that pendingSyncs entries with "pending->sync_above ==
> > InvalidBlockNumber" are not sync requests at all.  Those just record the 
> > fact
> > of a RelationTruncate() happening.  If you can think of a way to improve 
> > that,
> > please do so.  If not, it's okay.
> 
> After a truncation, the required WAL records are emitted for the
> truncated pages, so there is no need to sync. Does this make sense
> to you? (Maybe a comment is needed there.)

Yes, the behavior makes sense.  I wasn't saying the quoted code had the wrong
behavior.  I was saying that the data structure called "pendingSyncs" is
actually "pending syncs and past truncates".  It's not ideal that the variable
name differs from the variable purpose in this way.  However, it's okay if you
don't find a way to improve that.

> I don't have enough time for now so the new version will be
> posted early next week.

I'll wait for that version.



Re: Pluggable Storage - Andres's take

2019-03-20 Thread Haribabu Kommi
Hi,

The psql \dA command currently doesn't show the type of access methods
of type 'Table'.

postgres=# \dA heap
List of access methods
 Name | Type
------+-------
 heap |
(1 row)

Attached is a simple patch that fixes the problem and produces the following output.

postgres=# \dA heap
List of access methods
 Name | Type
------+-------
 heap | Table
(1 row)

The attached patch directly modifies the query that is sent to the server.
Servers older than version 12 don't have access methods of type 'Table',
but the same query still works for them because of the added CASE
expression, as follows.

SELECT amname AS "Name",
  CASE amtype WHEN 'i' THEN 'Index' WHEN 't' THEN 'Table' END AS "Type"
FROM pg_catalog.pg_am ...

Does anyone feel that a separate query is required for servers < 12?
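For reference, a self-contained version of such a query could look like
this (my sketch, not necessarily the exact text used in the patch):

SELECT amname AS "Name",
  CASE amtype WHEN 'i' THEN 'Index'
              WHEN 't' THEN 'Table'
  END AS "Type"
FROM pg_catalog.pg_am
ORDER BY 1;

On servers older than 12, amtype is always 'i', so the 'Table' branch
simply never matches and the query still runs unchanged.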

Regards,
Haribabu Kommi
Fujitsu Australia


0001-dA-to-show-Table-type-access-method.patch
Description: Binary data


Re: Pluggable Storage - Andres's take

2019-03-20 Thread Haribabu Kommi
On Sat, Mar 16, 2019 at 5:43 PM Haribabu Kommi 
wrote:

>
>
> On Sat, Mar 9, 2019 at 2:13 PM Andres Freund  wrote:
>
>> Hi,
>>
>> While 0001 is pretty bulky, the interesting bits concentrate on a
>> comparatively small area. I'd appreciate if somebody could give the
>> comments added in tableam.h a read (both on callbacks, and their
>> wrappers, as they have different audiences). It'd make sense to first
>> read the commit message, to understand the goal (and I'd obviously also
>> appreciate suggestions for improvements there as well).
>>
>> I'm pretty happy with the current state of the scan patch. I plan to do
>> two more passes through it (formatting, comment polishing, etc. I don't
>> know of any functional changes needed), and then commit it, lest
>> somebody objects.
>>
>
> I found a couple of typos in the committed patch; the attached patch fixes
> them. I am not sure about one typo, please check it.
>
> And I reviewed the 0002 patch, which is pretty simple and can be
> committed.
>
>

As you are modifying the 0003 patch for the modify APIs, I went and
reviewed the existing patch and found a couple of corrections that are
needed, in case you have not taken care of them already.


+ /* Update the tuple with table oid */
+ slot->tts_tableOid = RelationGetRelid(relation);
+ if (slot->tts_tableOid != InvalidOid)
+ tuple->t_tableOid = slot->tts_tableOid;

Setting slot->tts_tableOid is not required in this function; the check
happens right after the assignment. The above code is present in both
heapam_heap_insert and heapam_heap_insert_speculative.


+ slot->tts_tableOid = RelationGetRelid(relation);

In heapam_heap_update, I don't think there is a need to update
slot->tts_tableOid.


+ default:
+ elog(ERROR, "unrecognized heap_update status: %u", result);

heap_update --> table_update?


+ default:
+ elog(ERROR, "unrecognized heap_delete status: %u", result);

same as above?


+ /*hari FIXME*/
+ /*Assert(result != HeapTupleUpdated && hufd.traversed);*/

Remove the commented-out code in both the ExecDelete and ExecUpdate functions.


+ /**/
+ if (epqreturnslot)
+ {
+ *epqreturnslot = epqslot;
+ return NULL;
+ }

Was a comment update missed here?


Regards,
Haribabu Kommi
Fujitsu Australia


Re: PostgreSQL pollutes the file system

2019-03-20 Thread Abhijit Menon-Sen
At 2019-03-20 23:22:44 +0100, tomas.von...@2ndquadrant.com wrote:
>
> I don't really understand what issue we are trying to solve here.
> 
> Can someone describe a scenario where this (the name of a binary not
> clearly indicating it's related to Postgres) causes issues in practice?
> On my system, there are ~1400 binaries in /usr/bin, and for the vast
> majority of them it's rather unclear where they come from.

It sounds like a problem especially when described with charged terms
like "pollutes", but I agree with you and others that it just doesn't
seem worth the effort to try to rename everything.

-- Abhijit



Re: MacPorts support for "extra" tests

2019-03-20 Thread Tom Lane
Thomas Munro  writes:
> Peter E added some nice tests for LDAP and Kerberos, but they assume
> you have Homebrew when testing on a Mac.  Here's a patch to make them
> work with MacPorts too (a competing open source port/package
> distribution that happens to be the one that I use).  The third
> "extra" test is ssl, but that was already working.

+1, but could we comment that a bit?  I'm thinking of something like

  # typical library location for Homebrew

in each of the if-branches.

regards, tom lane



Re: Determine if FOR UPDATE or FOR SHARE was used?

2019-03-20 Thread Chapman Flack
On 03/18/19 00:45, Tom Lane wrote:
> I think it would help to take two steps back and ask why you want
> to know this, and what exactly is it that you want to know, anyhow.
> What does it matter if there's FOR SHARE in the query?  Does it

I was looking at an old design decision in PL/Java, which implements
java.sql.ResultSet by grabbing a pile of tuples at a time from
SPI_cursor_fetch, and then letting the ResultSet API iterate through
those, until the next pile needs to be fetched.

It seemed like the kind of optimization that is probably very important
for a client/server connection over RFC 2549, but I'm not sure how
important it is for code running right in the backend.

Maybe it does save a few cycles, but I don't want to be watching when
somebody tries to do UPDATE or DELETE WHERE CURRENT OF.

It occurred to me that positioned update/delete could be made to work
either by simply having the Java ResultSet row fetch operations correspond
directly to SPI fetches, or by continuing to SPI-fetch multiple rows at
a time, but repositioning with SPI_cursor_move as the Java ResultSet
pointer moves through them. (Is one of those techniques common in other
PLs?)

But it also occurred to me that there might be a practical way to
examine the query to see whether it's one that could be used for positioned
update or delete at all, and avoid any special treatment if it isn't.

Regards,
-Chap



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Tom Lane
Tomas Vondra  writes:
> On 3/21/19 1:49 AM, Michael Paquier wrote:
>> On Thu, Mar 21, 2019 at 08:41:32AM +0900, Tatsuo Ishii wrote:
>>> Can someone describe a scenario where this (the name of a binary not
>>> clearly indicating it's related to Postgres) causes issues in practice?

>> Naming conflict because our binary names are too generic?

> Maybe. Do we actually know about such cases?

More to the point, we have now got twenty+ years seniority on any other
package that might want those /usr/bin names.  So a conflict is not
*really* going to happen, or at least it's not going to be our problem
if it does.

The whole thing is unfortunate, without a doubt, but it's still
unclear that renaming those programs will buy anything that's worth
the conversion costs.  I'd be happy to pay said costs if it were all
falling to this project to do so ... but most of the pain will be
borne by other people.

regards, tom lane



Re: DNS SRV support for LDAP authentication

2019-03-20 Thread Thomas Munro
On Tue, Mar 19, 2019 at 9:01 PM Thomas Munro  wrote:
> I'd like to commit this soon.

Done, after some more comment adjustments.  Thanks Daniel and Graham
for your feedback!

-- 
Thomas Munro
https://enterprisedb.com



Re: current_logfiles not following group access and instead follows log_file_mode permissions

2019-03-20 Thread Haribabu Kommi
On Thu, Mar 21, 2019 at 12:41 PM Haribabu Kommi 
wrote:

>
> On Wed, Mar 20, 2019 at 4:33 PM Michael Paquier 
> wrote:
>
>> And actually it seems to me that you have a race condition in that
>> stuff.  I think that you had better use umask(), then fopen, and then
>> once again umask() to put back the previous permissions, removing the
>> extra chmod() call.
>>
>
> Changed the patch to use umask() instead of chmod() according to
> your suggestion.
>
> updated patch attached.
>

The earlier attached patch was wrong.
The correct patch is attached; sorry for the inconvenience.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Adjust-current_logfiles-file-permissions_v3.patch
Description: Binary data


Re: MSVC Build support with visual studio 2019

2019-03-20 Thread Haribabu Kommi
On Thu, Mar 21, 2019 at 12:31 PM Michael Paquier 
wrote:

> On Thu, Mar 21, 2019 at 09:47:02AM +0900, Michael Paquier wrote:
> > When it comes to supporting newer versions of MSVC, we have lately
> > come to back-patch that down to two stable versions but no further
> > (see f2ab389 for v10 and v9.6), so it looks sensible to target
> > v11 and v10 as well if we were to do it today, and v11/v12 if we do it
> > in six months per the latest trends.
>
> By the way, you mentioned upthread that all the branches can compile
> with MSVC 2017, but that's not actually the case for 9.5 and 9.4 if
> you don't back-patch f2ab389 further down.
>

Commit f2ab389 was later back-patched down to 9.3 in commit 19acfd65.
I guess that building the Windows installer for all supported versions
with the same Visual Studio may be the reason behind that back-patch.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: current_logfiles not following group access and instead follows log_file_mode permissions

2019-03-20 Thread Haribabu Kommi
On Wed, Mar 20, 2019 at 4:33 PM Michael Paquier  wrote:

> On Fri, Mar 15, 2019 at 06:51:37PM +1100, Haribabu Kommi wrote:
> > IMO, this update is just a recommendation to the user, and sometimes
> > it is still possible that there may be strict permissions for the log
> > file even when the data directory allows group access. So I feel it
> > is still better to update the permissions of current_logfiles to the
> > database file permissions rather than the log file permissions.
>
> I was just reading this thread again, and the suggestion that
> current_logfiles is itself not a log file is also a sensible
> position.  I was just looking at the patch that you sent at the top of
> the thread here:
>
> https://www.postgresql.org/message-id/cajrrpgceotf1p7awoeqyd3pqr-0xkqg_herv98djbamj+na...@mail.gmail.com
>

Thanks for the review.


> And actually it seems to me that you have a race condition in that
> stuff.  I think that you had better use umask(), then fopen, and then
> once again umask() to put back the previous permissions, removing the
> extra chmod() call.
>

Changed the patch to use umask() instead of chmod() according to
your suggestion.

updated patch attached.
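
For reference, the suggested pattern looks roughly like this (a minimal
sketch with a hypothetical helper name, not the actual patch):

#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Sketch of the umask()/fopen()/umask() idea: the file is created with
 * the intended permissions from the start, so there is no window where
 * it exists with the default mode, unlike fopen() followed by chmod().
 */
static FILE *
open_with_mode(const char *path, mode_t file_mode)
{
	mode_t		oldmask;
	FILE	   *fh;

	oldmask = umask(~file_mode & 0777);	/* deny all bits not in file_mode */
	fh = fopen(path, "w");
	umask(oldmask);				/* put back the previous mask */
	return fh;
}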

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Adjust-current_logfiles-file-permissions_v2.patch
Description: Binary data


Re: PostgreSQL pollutes the file system

2019-03-20 Thread Tomas Vondra



On 3/21/19 1:49 AM, Michael Paquier wrote:
> On Thu, Mar 21, 2019 at 08:41:32AM +0900, Tatsuo Ishii wrote:
>>> Can someone describe a scenario where this (the name of a binary not
>>> clearly indicating it's related to Postgres) causes issues in practice? On
>>> my system, there are ~1400 binaries in /usr/bin, and for the vast
>>> majority of them it's rather unclear where they come from.
> 
> Naming conflict because our binary names are too generic?  createdb
> could for example be applied to any database, and not only Postgres.
> (I have 1600 entries in /usr/bin on a Debian installation.)
> 

Maybe. Do we actually know about such cases? Also, isn't setting
--prefix a suitable solution? I mean, it's what we/packagers do to
support installing multiple Pg versions (in which case it'll conflict no
matter how we rename stuff) anyway.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: MSVC Build support with visual studio 2019

2019-03-20 Thread Michael Paquier
On Thu, Mar 21, 2019 at 09:47:02AM +0900, Michael Paquier wrote:
> When it comes to supporting newer versions of MSVC, we have lately
> come to back-patch that down to two stable versions but no further
> (see f2ab389 for v10 and v9.6), so it looks sensible to target
> v11 and v10 as well if we were to do it today, and v11/v12 if we do it
> in six months per the latest trends.

By the way, you mentioned upthread that all the branches can compile
with MSVC 2017, but that's not actually the case for 9.5 and 9.4 if
you don't back-patch f2ab389 further down.  Or did you mean that the
code as-is is compilable if the scripts are patched manually?
--
Michael


signature.asc
Description: PGP signature


Re: Special role for subscriptions

2019-03-20 Thread Andrey Borodin



> On 21 March 2019, at 8:56, Michael Paquier wrote:
> 
> On Wed, Mar 20, 2019 at 11:58:04PM +0800, Andrey Borodin wrote:
>>> On 20 March 2019, at 21:46, Robert Haas wrote:
>>> I think we should view this permission as "you can create
>>> subscriptions, plain and simple".
>> 
>> That sounds good.
>> From my POV, the purpose of the patch is to allow users to transfer
>> their database via logical replication. Without superuser privileges
>> (e.g. to the managed cloud with vanilla postgres).
> 
> A system role allowing the creation of subscriptions is perhaps too
> big a hammer, as it would apply to all databases of a system; still,
> we may be able to live with that.
> 
> Perhaps we would want something at the database level different from
> GRANT CREATE ON DATABASE, but only for subscriptions?  This way, it is
> possible to have per-database groups having the right to create
> subscriptions, and I'd like to think that we should not include
> subscription creation in the existing CREATE rights.  It would be
> kind of funny to not have CREATE include the creation of this specific
> object, though :)

I think that such small granularity can lead to unnecessary multiplication
of subscriptions. Users need to keep the number of subscriptions to a
minimum, just as they have one incoming WAL stream.
If we have per-database permission management, users will decide that it is
a good thing to split one subscription into per-database subscriptions.

Best regards, Andrey Borodin.


Re: performance issue in remove_from_unowned_list()

2019-03-20 Thread Tomas Vondra
On 3/12/19 11:54 PM, Tomas Vondra wrote:
> 
> 
> On 3/10/19 9:09 PM, Alvaro Herrera wrote:
>> On 2019-Feb-07, Tomas Vondra wrote:
>>
>>> Attached is a WIP patch removing the optimization from DropRelationFiles
>>> and adding it to smgrDoPendingDeletes. This resolves the issue, at least
>>> in the cases I've been able to reproduce. But maybe we should deal with
>>> this issue earlier by ensuring the two lists are ordered the same way
>>> from the very beginning, somehow.
>>
>> I noticed that this patch isn't in the commitfest.  Are you planning to
>> push it soon?
>>
> 
> I wasn't planning to push anything particularly soon, for two reasons:
> Firstly, the issue is not particularly pressing except with non-trivial
> number of relations (and I only noticed that during benchmarking).
> Secondly, I still have a feeling I'm missing something about b4166911
> because for me that commit does not actually fix the issue - i.e. I can
> create a lot of relations in a transaction, abort it, and observe that
> the replica actually accesses the relations in exactly the wrong order.
> So that commit does not seem to actually fix anything.
> 
> Attached is a patch adopting the dlist approach - it seems to be working
> quite fine, and is a bit cleaner than just slapping another pointer into
> the SMgrRelationData struct. So I'd say this is the way to go.
> 
> I see b4166911 was actually backpatched to all supported versions, on
> the basis that it fixes oversight in 279628a0a7. So I suppose this fix
> should also be backpatched.
> 

OK, so here is a bit more polished version of the dlist-based patch.

It's almost identical to what I posted before, except that it:

1) undoes the non-working optimization in DropRelationFiles()

2) removes add_to_unowned_list/remove_from_unowned_list entirely and
just replaces that with dlist_push_tail/dlist_delete

I've originally planned to keep those functions, mostly because the
remove_from_unowned_list comment says this:

  - * If the reln is not present in the list, nothing happens.
  - * Typically this would be caller error, but there seems no
  - * reason to throw an error.

I don't think dlist_delete allows that. But after further inspection of
all the places calling those functions, I don't think that can happen.

I plan to commit this soon-ish and backpatch it as mentioned before,
because I consider it pretty much a fix for b4166911.

I'm however still mildly puzzled that b4166911 apparently improved the
behavior in some cases, at least judging by [1]. OTOH there's not much
detail about how it was tested, so I can't quite run it again.


[1]
https://www.postgresql.org/message-id/flat/CAHGQGwHVQkdfDqtvGVkty%2B19cQakAydXn1etGND3X0PHbZ3%2B6w%40mail.gmail.com

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/storage/smgr/md.c b/src/backend/storage/smgr/md.c
index 2aba2dfe91..6ed68185ed 100644
--- a/src/backend/storage/smgr/md.c
+++ b/src/backend/storage/smgr/md.c
@@ -1699,13 +1699,7 @@ DropRelationFiles(RelFileNode *delrels, int ndelrels, bool isRedo)
 
 	smgrdounlinkall(srels, ndelrels, isRedo);
 
-	/*
-	 * Call smgrclose() in reverse order as when smgropen() is called.
-	 * This trick enables remove_from_unowned_list() in smgrclose()
-	 * to search the SMgrRelation from the unowned list,
-	 * with O(1) performance.
-	 */
-	for (i = ndelrels - 1; i >= 0; i--)
+	for (i = 0; i < ndelrels; i++)
 		smgrclose(srels[i]);
 	pfree(srels);
 }
diff --git a/src/backend/storage/smgr/smgr.c b/src/backend/storage/smgr/smgr.c
index 0c0bba4ab3..f6de9df9e6 100644
--- a/src/backend/storage/smgr/smgr.c
+++ b/src/backend/storage/smgr/smgr.c
@@ -18,6 +18,7 @@
 #include "postgres.h"
 
 #include "commands/tablespace.h"
+#include "lib/ilist.h"
 #include "storage/bufmgr.h"
 #include "storage/ipc.h"
 #include "storage/smgr.h"
@@ -97,12 +98,10 @@ static const int NSmgr = lengthof(smgrsw);
  */
 static HTAB *SMgrRelationHash = NULL;
 
-static SMgrRelation first_unowned_reln = NULL;
+static dlist_head	unowned_relns;
 
 /* local function prototypes */
 static void smgrshutdown(int code, Datum arg);
-static void add_to_unowned_list(SMgrRelation reln);
-static void remove_from_unowned_list(SMgrRelation reln);
 
 
 /*
@@ -165,7 +164,7 @@ smgropen(RelFileNode rnode, BackendId backend)
 		ctl.entrysize = sizeof(SMgrRelationData);
 		SMgrRelationHash = hash_create("smgr relation table", 400,
	   &ctl, HASH_ELEM | HASH_BLOBS);
-		first_unowned_reln = NULL;
+		dlist_init(&unowned_relns);
 	}
 
 	/* Look up or create an entry */
@@ -192,7 +191,7 @@ smgropen(RelFileNode rnode, BackendId backend)
 			reln->md_num_open_segs[forknum] = 0;
 
 		/* it has no owner yet */
-		add_to_unowned_list(reln);
+		dlist_push_tail(&unowned_relns, &reln->node);
 	}
 
 	return reln;
@@ -222,7 +221,7 @@ smgrsetowner(SMgrRelation *owner, SMgrRelation reln)
 	if (reln->smgr_owner)
 		*(reln->smgr_owner) = NULL;
 	else
-		remove_from_unowned_list(reln);

RE: Best way to keep track of a sliced TOAST

2019-03-20 Thread Bruno Hass
I would like to optimize the jsonb key access operations. I could not find the 
discussion you've mentioned, but I am giving some thought to the idea.

Instead of storing lengths, could we dedicate the first chunk of the TOASTed 
jsonb to store where each key is located? Would it be a good idea?

You've mentioned that the current jsonb format is byte-oriented. Does that 
imply that a single jsonb key value might be split between multiple chunks?


Bruno Hass


From: Robert Haas
Sent: Friday, March 15, 2019 12:22
To: Bruno Hass
Cc: pgsql-hackers
Subject: Re: Best way to keep track of a sliced TOAST

On Fri, Mar 15, 2019 at 7:37 AM Bruno Hass  wrote:
> This idea is what I was hoping to achieve. Would we be able to make 
> optimizations on deTOASTing  just by storing the chunk lengths in chunk 0?

I don't know. I guess we could also NOT store the chunk lengths and
just say that if you don't know which chunk you want by chunk number,
your only other alternative is to read the chunks in order.  The
problem with that is that you can no longer index by byte position
without fetching every chunk prior to that byte position, but maybe
that's not important enough to justify the overhead of a list of chunk
lengths.  Or maybe it depends on what you want to do with it.

Again, stuff like what you are suggesting here has been suggested
before.  I think the problem is if someone did the work to invent such
an infrastructure, that wouldn't actually do anything by itself.  We'd
then need to find an application of it where it brought us some clear
advantage.  As I said in my previous email, jsonb seems like a
promising candidate, but I don't think it's a slam dunk.  What would
the design look like, exactly?  Which operations would get faster, and
could we really make them work?  The existing format is, I think,
designed with a byte-oriented format in mind, and a chunk-oriented
format might have different design constraints.  It seems like an idea
with potential, but there's a lot of daylight between a directional
idea with potential and a specific idea accompanied by a high-quality
implementation thereof.

> Also, wouldn't it break existing functions by dedicating a whole chunk 
> (possibly more) to such metadata?

Anybody writing such a patch would have to be prepared to fix any such
breakage that occurred, at least as regards core code.  I would guess
that this could be done without breaking too much third-party code,
but I guess it depends on exactly what the author of this hypothetical
patch ends up changing.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Psql patch to show access methods info

2019-03-20 Thread Nikita Glukhov

Hi.

On 08.03.2019 7:52, Kyotaro HORIGUCHI wrote:


Hello.

At Mon, 10 Dec 2018 19:38:39 +0300, s.cherkas...@postgrespro.ru wrote in 
<70e94e339dd0fa2be5d3eebec68da...@postgrespro.ru>

Here are some fixes. But I'm not sure that the renaming of columns for
the '\dAp' command is sufficiently laconic and informative. If you
have any suggestions on how to improve them, I will be very grateful.

\dA:

   This is showing almost nothing. I think it's better that this
   command show the same content as \dA+.  As per Nikita's comment
   upthread, a "Table" addition to "Index" is needed.

\dAp:

   As a result, \dAp becomes useless. It cannot handle both Index
   and Table AMs at once.

   So, I propose the following behavior instead. It is similar to
   what \d does.

=# \dA
          List of access methods
  Name  | Type  |       Handler
--------+-------+----------------------
 brin   | Index | brinhandler
  ..
 heap   | Table | heap_tableam_handler


=# \dA+
  Name  | Type  |       Handler        |              Description
--------+-------+----------------------+-----------------------------------------
 brin   | Index | brinhandler          | block range index (BRIN) access method
  ..
 heap   | Table | heap_tableam_handler | heap table access method


=# \dA brin
                        Index access method "brin"
  Name  | Ordering | Unique | Multicol key | Non-key cols | Excl Constraints
--------+----------+--------+--------------+--------------+------------------
 brin   | No       | Yes    | No           | No           | No


I completely agree.  Also, I propose the following renaming of commands
after removing \dAp:
\dAfo => \dAo
\dAfp => \dAp
\dAoc => \dAc
 


\dA heap
 Table access method "heap"
(I don't have an idea what to show here..)


Yes, there are no functions like pg_tableam_has_property() yet.


\dAfo: I don't get the point of the command.


This command helps one remember which operators can be accelerated by
each index AM.  Maybe the operator name and its operand types would be
better put into a single column.  Also, the schema can be shown only when
the opfamily is not visible, or in verbose mode.

For example, for jsonb type we could have:

\dAfo * jsonb*

 List of operators of families related to access methods
  AM   |   Schema   |    Opfamily    |      Operator
-------+------------+----------------+--------------------
 btree | pg_catalog | jsonb_ops      | < (jsonb, jsonb)
 btree | pg_catalog | jsonb_ops      | <= (jsonb, jsonb)
 btree | pg_catalog | jsonb_ops      | = (jsonb, jsonb)
 btree | pg_catalog | jsonb_ops      | >= (jsonb, jsonb)
 btree | pg_catalog | jsonb_ops      | > (jsonb, jsonb)
 gin   | pg_catalog | jsonb_ops      | @> (jsonb, jsonb)
 gin   | pg_catalog | jsonb_ops      | ? (jsonb, text)
 gin   | pg_catalog | jsonb_ops      | ?| (jsonb, text[])
 gin   | pg_catalog | jsonb_ops      | ?& (jsonb, text[])
 gin   | pg_catalog | jsonb_path_ops | @> (jsonb, jsonb)
 hash  | pg_catalog | jsonb_ops      | = (jsonb, jsonb)
(11 rows)


\dAoc: This seems more useful than \dAfo, but the information that
the command shows seems a bit pointless. We sometimes want to
know the name of an operator class usable in CREATE INDEX. So I
suppose that something like the following might be useful
instead.

SELECT DISTINCT a.amname AS "Access method",
    (case when o.opckeytype <> 0 then o.opckeytype else o.opcintype end)::regtype
        AS "Key type",
    n.nspname || '.' || o.opcname AS "Operator class",
    (case when o.opcdefault then 'Yes' else 'No' end) AS "Default for type?"
FROM pg_catalog.pg_opclass o
JOIN pg_catalog.pg_opfamily f ON (f.oid = o.opcfamily)
JOIN pg_catalog.pg_am a ON (a.oid = f.opfmethod)
JOIN pg_catalog.pg_namespace n ON (n.oid = o.opcnamespace)
ORDER BY 1, 2, 4 DESC, 3;

\dAoc
             List of operator classes for access methods
 Access method | Key type |       Operator class        | Default for type?
---------------+----------+-----------------------------+-------------------
 brin          | bytea    | pg_catalog.bytea_minmax_ops | Yes
 brin          | "char"   | pg_catalog.char_minmax_ops  | Yes
 brin          | name     | pg_catalog.name_minmax_ops  | Yes
 brin          | bigint   | pg_catalog.int8_minmax_ops  | Yes
..


\dAoc btree
        List of operator classes for access method 'btree'
 Access method | Key type |       Operator class        | Default for type?
---------------+----------+-----------------------------+-------------------
 btree         | boolean  | pg_catalog.bool_ops         | Yes
...
 btree         | text     | pg_catalog.text_ops         | Yes
 btree         | text     | pg_catalog.text_pattern_ops | No
 btree         | text     | pg_catalog.varchar_ops      | No

\dAoc btree text
  List of operator classes for access method 'btree', type 'text'
 Access method | Key type | Operator class | Default for type?

Re: Special role for subscriptions

2019-03-20 Thread Michael Paquier
On Wed, Mar 20, 2019 at 11:58:04PM +0800, Andrey Borodin wrote:
>> On 20 March 2019, at 21:46, Robert Haas wrote:
>> I think we should view this permission as "you can create
>> subscriptions, plain and simple".
> 
> That sounds good.
> From my POV, the purpose of the patch is to allow users to transfer
> their database via logical replication. Without superuser privileges
> (e.g. to the managed cloud with vanilla postgres).

A system role allowing the creation of subscriptions is perhaps too
big a hammer, as it would apply to all databases of a system; still,
we may be able to live with that.

Perhaps we would want something at the database level different from
GRANT CREATE ON DATABASE, but only for subscriptions?  This way, it is
possible to have per-database groups having the right to create
subscriptions, and I'd like to think that we should not include
subscription creation in the existing CREATE rights.  It would be
kind of funny to not have CREATE include the creation of this specific
object, though :)
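
For illustration only, such a database-level right could look something
like this purely hypothetical syntax (nothing like it exists today):

GRANT CREATE SUBSCRIPTION ON DATABASE mydb TO subscription_admins;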
--
Michael


signature.asc
Description: PGP signature


MacPorts support for "extra" tests

2019-03-20 Thread Thomas Munro
Hello hackers,

Peter E added some nice tests for LDAP and Kerberos, but they assume
you have Homebrew when testing on a Mac.  Here's a patch to make them
work with MacPorts too (a competing open source port/package
distribution that happens to be the one that I use).  The third
"extra" test is ssl, but that was already working.

-- 
Thomas Munro
https://enterprisedb.com


ldap-on-macports.patch
Description: Binary data


Re: PostgreSQL pollutes the file system

2019-03-20 Thread Michael Paquier
On Thu, Mar 21, 2019 at 08:41:32AM +0900, Tatsuo Ishii wrote:
>> Can someone describe a scenario where this (the name of a binary not
>> clearly indicating it's related to Postgres) causes issues in practice? On
>> my system, there are ~1400 binaries in /usr/bin, and for the vast
>> majority of them it's rather unclear where they come from.

Naming conflict because our binary names are too generic?  createdb
could for example be applied to any database, and not only Postgres.
(I have 1600 entries in /usr/bin on a Debian installation.)

>> 
>> But it's not really an issue, because we have tools to do that
>> 
>> 1) man
>> 
>> 2) -h/--help
>> 
>> 3) rpm -qf $file (and similarly for other packagers)
>> 
>> 4) set --prefix to install binaries to a separate directory (which some
>> distros already do anyway)
>> 
>> So to me this seems like a fairly invasive change (potentially breaking
>> quite a few scripts/tools) just to address a minor inconvenience.
> 
> +1.

Yes, +1.
--
Michael


signature.asc
Description: PGP signature


Re: MSVC Build support with visual studio 2019

2019-03-20 Thread Michael Paquier
On Thu, Mar 21, 2019 at 11:36:42AM +1100, Haribabu Kommi wrote:
> I can provide a separate back-branch patch later once this patch comes
> to a committable state. Currently all the supported branches can be
> compiled with VS 2017.

When it comes to supporting newer versions of MSVC, we have lately
come to back-patch that down to two stable versions but no further
(see f2ab389 for v10 and v9.6), so it looks sensible to target
v11 and v10 as well if we were to do it today, and v11/v12 if we do it
in six months per the latest trends.
--
Michael


signature.asc
Description: PGP signature


MSVC Build support with visual studio 2019

2019-03-20 Thread Haribabu Kommi
Hi Hackers,

Here I have attached a patch that supports building PostgreSQL with VS 2019.
VS 2019 is going to be released on Apr 2nd, 2019, so it would be good if
version 12 supported compiling with it. The attached patch is for review;
it may need some updates once the final version is released.

Commit d9dd406fe281d22d5238d3c26a7182543c711e74 reduced the minimum
supported Visual Studio version to 2013 in order to support the C99
standard; because of this, the attached patch cannot be back-patched
as it is.

I can provide a separate back-branch patch later once this patch comes
to a committable state. Currently all the supported branches can be
compiled with VS 2017.

comments?

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Support-building-with-visual-studio-2019.patch
Description: Binary data


Re: PostgreSQL pollutes the file system

2019-03-20 Thread Tatsuo Ishii
> I don't really understand what issue we are trying to solve here.
> 
> Can someone describe a scenario where this (the name of a binary not
> clearly indicating it's related to Postgres) causes issues in practice? On
> my system, there are ~1400 binaries in /usr/bin, and for the vast
> majority of them it's rather unclear where they come from.
> 
> But it's not really an issue, because we have tools to do that
> 
> 1) man
> 
> 2) -h/--help
> 
> 3) rpm -qf $file (and similarly for other packagers)
> 
> 4) set --prefix to install binaries to a separate directory (which some
> distros already do anyway)
> 
> So to me this seems like a fairly invasive change (potentially breaking
> quite a few scripts/tools) just to address a minor inconvenience.

+1.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Michael Paquier
On Wed, Mar 20, 2019 at 05:46:32PM +0100, Fabien COELHO wrote:
> I think that the motivation/risks should appear before the solution. "As xyz
> ..., ...", or there at least the logical link should be outlined.
> 
> It is not clear for me whether the following sentences, which seems specific
> to "pg_rewind", are linked to the previous advice, which seems rather to
> refer to streaming replication?

Do you have a better formulation in mind?  If you have a failover setup
which makes use of pg_rewind, or use some backup tool which does
incremental copies of physical blocks like pg_rman, then you risk
messing up the instances in a cluster, which is the risk I am trying to
outline here.  It is actually fine to do the following if
everything is WAL-based, as checksums are only computed once a shared
buffer is flushed on a single instance.  Imagine for example a
primary-standby setup with checksums disabled:
- Shut down the standby cleanly, enable checksums on it.
- Plug the standby back into the cluster, let it replay up to the latest
point.
- Shut down the primary cleanly.
- Promote the standby.
- Enable checksums on the previous primary.
- Add recovery parameters and reconnect the previous primary to the
cluster.
- All nodes now have checksums enabled, without rebuilding any of them.
Per what I could see on this thread, folks tend to point out that
we should *not* include such recommendations in the docs, as it is too
prone to pilot error.
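
For concreteness only (not a docs recommendation), the sequence above in
command-line terms, assuming the pg_checksums tool from this patch set
and made-up data directory paths:

# On the standby: clean shutdown, enable checksums, restart.
pg_ctl -D /path/to/standby stop -m fast
pg_checksums --enable -D /path/to/standby
pg_ctl -D /path/to/standby start

# Once the standby has replayed up to the latest point:
pg_ctl -D /path/to/primary stop -m fast
pg_ctl -D /path/to/standby promote

# Enable checksums on the previous primary, then add recovery
# parameters and reconnect it to the cluster as a standby.
pg_checksums --enable -D /path/to/primary
pg_ctl -D /path/to/primary start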

> > +   When using a cluster with
> > +   tools which perform direct copies of relation file blocks (for example
> > +   ), enabling or disabling checksums can
> > +   lead to page corruptions in the shape of incorrect checksums if the
> > +   operation is not done consistently across all nodes.  Destroying all
> > +   the standbys in a cluster first, enabling or disabling checksums on
> > +   the primary and finally recreate the cluster nodes from scratch is
> > +   also safe.
> > +  
> > + 
> 
> Shouldn't disabling in reverse order be safe, since the checksums are
> not checked afterwards?

I don't quite understand your comment about the ordering.  If all
the standbys are destroyed first, then enabling/disabling checksums
happens at a single place.

> After the reboot, some data files are not fully updated with their
> checksums, although the control file says that they are. Shouldn't it
> then fail after a restart, when a page without a checksum is loaded?
> 
> What am I missing?

Please note that we do that in other tools as well and live fine
with it, pg_basebackup and pg_rewind just to name two.  I am not
saying that it is not a problem in some cases, but I am saying that
this is not a problem that this patch should solve.  If we were to do
something about that, it could make sense to make fsync_pgdata()
smarter so that the control file is flushed last there, or to define
flush strategies there.
--
Michael


signature.asc
Description: PGP signature


Re: PostgreSQL pollutes the file system

2019-03-20 Thread Tatsuo Ishii
>> +1. As one of the third-party PostgreSQL tool developers, I am afraid
>> changing the names of PostgreSQL commands would give us lots of pain:
>> for example, checking the PostgreSQL version to decide whether to use
>> command "foo" or "pg_foo".
>>
> createdb, dropdb, createuser, dropuser, reindexdb are binaries that
> confuse most newbies. Which tool are these binaries from? The names
> do not give a hint. How often are those confusingly named tools used?
> AFAICS a graphical tool or psql is used to create roles and databases.
> psql -c "stmt" can replace createdb, dropdb, createuser and dropuser.
> What about deprecating them (and removing them after a support cycle)?

At least psql, initdb, pg_config, pgbench and pg_ctl, for now. But I
don't want to say that renaming the other commands would be fine with me,
because I would like to keep the liberty to extend my tool for my users.

BTW, a strange thing in this whole discussion is that installing those
PostgreSQL commands in /usr/bin is done by packagers, not by the PostgreSQL
core project itself. The default installation directory has been
/usr/local/pgsql/bin in the PostgreSQL source code since it was
born, and I love that place. Forcing everything to be installed into
/usr/bin is the distributions' policy, not the PostgreSQL core project's,
as far as I know. So I wonder why people don't address the renaming request
to the packagers rather than the PostgreSQL core project itself.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Removing unneeded self joins

2019-03-20 Thread David Rowley
On Thu, 21 Mar 2019 at 01:20, Alexander Kuzmenkov
 wrote:
> Let's recap the conditions when we can remove a self-join. It is when
> for each outer row, 1) at most one inner row matches the join clauses,
> and 2) it is the same row as the outer one. I'm not sure what (2) means
> precisely in a general case, but for a plain table, we can identify
> these rows by ctid. So when both sides have the same unique index with
> the same clauses, we conclude that we are always dealing with the same
> row (as identified by ctid) on both sides, hence the join can be
> replaced with a scan.
>
> The code I wrote just checks for the above conditions. The data we need
> for these checks is a byproduct of checking the relations for
> uniqueness, which we do anyway, so we just cache it for a negligible cost.
>
> I didn't write it in a more generic way because I don't understand the
> conditions for the generic case. In your DISTINCT example, the join can be
> removed indeed. But if we select some columns from the inner side apart
> from the join ones, we can't remove the join anymore:
>
> select * from t1, (select distinct on (a) a, b from t1) tt where t1.a =
> tt.a;
>
> I think this might be a different kind of optimization, where we remove
> the self-join if the inner side is unique, and no inner columns are
> selected besides the join ones.
>
>
> Also, reading your letter I realized that I don't commute the index
> clauses correctly before comparing them in is_unique_self_join, so I
> fixed this in the new version of the patch.

I really just don't think checking the unique indexes match is going
to cut it. You should be looking for a unique index where the join
clauses match on either side of the join, not looking independently
and then checking the indexes are the same ones.

Here's an example of what can go wrong with your current code:

drop table abc;
create table abc(a int, b int, c int);
create unique index on abc(a);
create unique index on abc(b);
create unique index on abc(c);
explain select * from abc a1 inner join abc a2 on a1.a=a2.b and a1.c=a2.c;
                         QUERY PLAN
-----------------------------------------------------------
 Seq Scan on abc a2  (cost=0.00..35.50 rows=10 width=24)
   Filter: ((c IS NOT NULL) AND (a = b))
(2 rows)

The above seems fine, but let's try again, this time change the order
that the indexes are defined.

drop table abc;
create table abc(a int, b int, c int);
create unique index on abc(a);
create unique index on abc(c);
create unique index on abc(b);
explain select * from abc a1 inner join abc a2 on a1.a=a2.b and a1.c=a2.c;
                       QUERY PLAN
---------------------------------------------------------------
 Hash Join  (cost=61.00..102.11 rows=1 width=24)
   Hash Cond: ((a1.a = a2.b) AND (a1.c = a2.c))
   ->  Seq Scan on abc a1  (cost=0.00..30.40 rows=2040 width=12)
   ->  Hash  (cost=30.40..30.40 rows=2040 width=12)
 ->  Seq Scan on abc a2  (cost=0.00..30.40 rows=2040 width=12)
(5 rows)

Oops. I think behaviour like this that depends on the order in which
indexes are created is not going to cut it. You could probably
restrict the join qual list to just quals that have the same expr on
either side of the join; that way you could still use
innerrel_is_unique() to check that the inner rel is unique. You'll likely
need to pass force_cache as false though, since you don't want to
cache non-uniqueness with a subset of join quals.  Doing that could
cause unique joins not to work when the join search is done via GEQO.
I also think this way would give you the subquery GROUP BY / DISTINCT
self-join removal just about for free.  However, there might be more
cleanup to be done in that case... I've not thought about that too
hard.

I'm going to set this to waiting on author, as that's quite a decent
sized change.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread Julien Rouhaud
On Wed, Mar 20, 2019 at 11:10 PM legrand legrand
 wrote:
>
> Thank you Julien for the workaround.
> It is not easy to build "cross tables" in Excel to join metrics per query
> text ...

then keep only one queryid over all environments, that's easy enough in SQL:

SELECT min(queryid) OVER (partition by query ORDER BY environment),
... FROM all_pg_stat_statements

if you have your environment named like 0_production,
1_preproduction... you'll get the queryid from production.  Once
again, that's not ideal but it's easy to deal with it when consuming
the data.
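
Spelled out a bit more (my sketch; all_pg_stat_statements is the merged
table from the export step):

SELECT environment, query, calls, total_time,
       first_value(queryid) OVER (PARTITION BY query
                                  ORDER BY environment) AS canonical_queryid
FROM all_pg_stat_statements;

With environments named 0_production, 1_preproduction, ..., every row of
a given query then carries the production queryid.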

> and I'm not ready to build an MD5(query), as many queries could lead to
> the same QueryId

I'd be really surprised if you saw a single collision in your whole
life, whatever pg_stat_statements.max you're using.  I'm also pretty
sure that the collision risk is technically higher with an 8-byte queryid
field than with a 16-byte MD5, but maybe I'm wrong.



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Tomas Vondra
On 3/20/19 7:08 PM, Alvaro Herrera wrote:
> On 2019-Mar-20, Euler Taveira wrote:
> 
>> On Wed, Mar 20, 2019 at 14:57, Tom Lane wrote:
>>>
>>> We managed to get rid of createlang and droplang in v10, and there
>>> hasn't been that much push-back about it.  So maybe there could be
>>> a move to remove createuser/dropuser?  Or at least rename them to
>>> pg_createuser and pg_dropuser.  But I think this was discussed
>>> (again) during the v10 cycle, and we couldn't agree to do more than
>>> get rid of createlang/droplang.
> 
> Previous discussion: 
> https://postgr.es/m/CABUevExPrfPH5K5qM=zst7tvfyace+i5qja6bfwckkyrh8m...@mail.gmail.com
> 
>> Votes? +1 to remove createuser/dropuser (and also createdb/dropdb as I
>> said in the other email). However, if we don't have sufficient votes,
>> let's at least consider a 'pg_' prefix.
> 
> I vote to rename these utilities to have a pg_ prefix and to
> simultaneously install symlinks for their current names, so that nothing
> breaks.
> 

I don't really understand what issue we are trying to solve here.

Can someone describe a scenario where this (the name of a binary not
clearly indicating it's related to Postgres) causes issues in practice? On
my system, there are ~1400 binaries in /usr/bin, and for the vast
majority of them it's rather unclear where they come from.

But it's not really an issue, because we have tools to do that

1) man

2) -h/--help

3) rpm -qf $file (and similarly for other packagers)

4) set --prefix to install binaries to a separate directory (which some
distros already do anyway)

So to me this seems like a fairly invasive change (potentially breaking
quite a few scripts/tools) just to address a minor inconvenience.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread legrand legrand
Julien Rouhaud wrote:
> On Wed, Mar 20, 2019 at 10:30 PM legrand legrand wrote:
>>
>> maybe this patch (with a GUC)
>> https://www.postgresql.org/message-id/55E51C48.1060102@uptime.jp
>> would be enough for those actually using a text normalization function.
> 
> The rest of the thread raises quite a lot of concerns about the semantics,
> the cost and the correctness of this patch.  After 5 minutes of checking,
> it wouldn't suit your needs if you use custom functions, custom types,
> custom operators (say, using the intarray extension) or if your tables
> don't have columns in the same order in every environment.  And there
> are probably other caveats that I didn't see.

Yes, I know.
It would have to be extended at least to function, type, operator ...
names, with a GUC pg_stat_statements.queryid_based = 'names' (default
being 'oids'),

and a second GUC value ('fullyqualified'?) that would also
include table, function, type, operator ... namespaces.

Let "users" specify their needs; we will see ;o)






Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread legrand legrand
Julien Rouhaud wrote:
> On Wed, Mar 20, 2019 at 10:18 PM legrand legrand wrote:
>>
>> From my personal point of view, I need to get the same QueryId between
>> (OLAP) environments to be able to compare Production, Pre-production,
>> and Qualification performance (and I don't need fully qualified
>> relation names). Today, to do that, I'm using a custom extension
>> computing the QueryId based on the normalized query text.
> 
> IIUC, your need is to compare pgss (maybe other extensions) counters
> from different primary servers, for queries generated by the same
> application(s).  A naive workaround like exporting each environment
> counters (COPY (SELECT 'production', * FROM pg_stat_statements) TO
> '...'), importing all of them on a server and then comparing
> everything using the query text (which should be the same if the
> application is the same) instead of queryid wouldn't work?  Or even
> using foreign tables if exporting data is too troublesome. That's
> clearly not ideal, but that's an easy workaround which doesn't add any
> performance hit at runtime.

Thank you Julien for the workaround.
It is not easy to build "cross tables" in Excel to join metrics per query
text, and I'm not ready to build an MD5(query), as many queries could lead
to the same QueryId.
I've been using SQL_IDs for ten years, and I have some (some would say old)
habits :^)

Regards
PAscal






Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread Julien Rouhaud
On Wed, Mar 20, 2019 at 10:30 PM legrand legrand
 wrote:
>
> maybe this patch (with a GUC)
> https://www.postgresql.org/message-id/55E51C48.1060102@uptime.jp
> would be enough for those actually using a text normalization function.

The rest of the thread raises quite a lot of concerns about the semantics,
the cost and the correctness of this patch.  After 5 minutes of checking,
it wouldn't suit your needs if you use custom functions, custom types,
custom operators (say, using the intarray extension) or if your tables
don't have columns in the same order in every environment.  And there
are probably other caveats that I didn't see.



Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread Julien Rouhaud
On Wed, Mar 20, 2019 at 10:18 PM legrand legrand
 wrote:
>
> From my personal point of view, I need to get the same Queryid between (OLAP)
> environments to be able to compare Production, Pre-production, and
> Qualification performance (and I don't need fully qualified relation names).
> Today, to do that, I'm using a custom extension computing the QueryId based
> on the normalized query text.

IIUC, your need is to compare pgss (and maybe other extensions') counters
from different primary servers, for queries generated by the same
application(s).  Wouldn't a naive workaround work, like exporting each
environment's counters (COPY (SELECT 'production', * FROM
pg_stat_statements) TO '...'), importing all of them on one server and
then comparing everything using the query text (which should be the same
if the application is the same) instead of the queryid?  Or even
using foreign tables if exporting data is too troublesome. That's
clearly not ideal, but it's an easy workaround which doesn't add any
performance hit at runtime.
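
A minimal sketch of that workaround (the file path and merged table name
are made up):

-- On each environment, export the counters with an environment label:
COPY (SELECT 'production', * FROM pg_stat_statements)
    TO '/tmp/pgss_production.csv' CSV;

-- On the comparison server, load every export into one table, say
-- all_pg_stat_statements, and compare per query text:
SELECT query,
       max(total_time) FILTER (WHERE environment = 'production')    AS prod,
       max(total_time) FILTER (WHERE environment = 'preproduction') AS preprod
FROM all_pg_stat_statements
GROUP BY query;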



Re: Re: A separate table level option to control compression

2019-03-20 Thread Shaun Thomas
Jumping in here, please be gentle. :)

Contents & Purpose
==

This appears to be a patch to add a new table storage option similar to
`toast_tuple_target` but geared toward compression. As a result, it's been
named `compress_tuple_target`, and allows modifying the threshold where
inline tuples actually become compressed. If we're going to make the toast
threshold configurable, it tends to make sense we'd do the same for the
compression threshold.

The patch includes necessary documentation to describe the storage parameter
along with limitations and fallback operating modes. Several tests are also
included.

Verification Procedure
==

The patch applied cleanly to HEAD, which was at commit 28988a84c... by Peter
Eisentraut at the time of this review.

Build succeeded without issue, as did `make check` and `make installcheck`.
In addition, I also performed the following manual verification steps
using table page count and `pgstattuple` page distribution for a 10-row
table with a junk column in these scenarios:

* A standard table with a 1000-byte junk column not using this option:
  2 pages at 66% density
* A table with a 1000-byte junk column and `compress_tuple_target` set to 512:
  1 page at 6% density; the low threshold activated compression
* A table with an 8120-byte junk column and `compress_tuple_target` +
  `toast_tuple_target` set to 8160. Further, the junk column was set to
  `main` storage to prevent standard toast thresholds from interfering:
  10 pages at 99.5% density; no compression activated despite the large column
* A table with an 8140-byte junk column and `compress_tuple_target` +
  `toast_tuple_target` set to 8180. Further, the junk column was set to
  `main` storage to prevent standard toast thresholds from interfering:
  1 page at 16% density; the compression threshold > 8160 was ignored as
  documented. Additionally, neither `compress_tuple_target` nor
  `toast_tuple_target` was saved in `pg_class`.

I also performed a `pg_dump` of a table using `compress_tuple_target`,
and the dump faithfully preserved the option in the expected location
before the DATA portion.
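
For anyone wishing to reproduce the scenarios above, the setup looks
roughly like this (my own reconstruction; the table and column names are
mine, not from the patch):

-- Scenario 2: a low threshold activates inline compression of a highly
-- compressible 1000-byte junk column.
CREATE TABLE compress_test (id int, junk text)
    WITH (compress_tuple_target = 512);

INSERT INTO compress_test
    SELECT g, repeat('x', 1000) FROM generate_series(1, 10) g;

-- Page count and density, using the pgstattuple extension:
CREATE EXTENSION IF NOT EXISTS pgstattuple;
SELECT relpages FROM pg_class WHERE relname = 'compress_test';
SELECT * FROM pgstattuple('compress_test');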

Discussion
==

Generally this ended about as I expected. I suspect much of the existing
code was cribbed from the implementation of `toast_tuple_target` given
the similar entrypoints and the already existing hard-coded compression
thresholds.

I can't really speak for the discussion related to `storage.sgml`, but
I based my investigation on the existing patch to `create_table.sgml`.
About the only thing I would suggest there is to possibly tweak the
wording.

* "The compress_tuple_target ... " for example should probably read
  "The toast_tuple_target parameter ...".
* "the (blocksize - header)" can drop "the".
* "If the value is set to a value" redundant wording should be rephrased;
  "If the specified value is greater than toast_tuple_target, then
  we will substitute the current setting of toast_tuple_target instead."
  would work.
* I'd recommend a short discussion on what negative consequences can be
  expected by playing with this value. As an example in my tests, setting
  it very high may result in extremely sparse pages that could have an
  adverse impact on HOT updates.

Still, those are just minor nitpicks, and I don't expect that to affect the
quality of the patch implementation.

Good show, gents!

-- 
Shaun M Thomas - 2ndQuadrant
PostgreSQL Training, Services and Support
shaun.tho...@2ndquadrant.com | www.2ndQuadrant.com



Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread legrand legrand
maybe this patch (with a GUC)
https://www.postgresql.org/message-id/55E51C48.1060102@uptime.jp
would be enough for those actually using a text normalization function.







Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread legrand legrand
Julien Rouhaud wrote:
> On Wed, Mar 20, 2019 at 8:39 PM legrand legrand wrote:
>>
>> Yes, I would first like to understand what the main needs are,
> 
> I don't really see one implementation that suits every need, as
> probably not everyone will agree on using relation names vs. fully
> qualified relation names, for starters.  The idea of taking into account
> or normalising comments will also probably require a lot of argumentation
> to reach a consensus.
> 
> Also, most of what's listed here would require a catcache lookup for
> every object impacted in a query, at every execution.  That would be
> *super* expensive (at least for OLTP workloads).  As far as the need is
> to gather statistics like pg_stat_statements and similar extensions
> do, the current queryid semantics and underlying limitations are not
> enough of a problem to justify paying that price IMHO.  Or do you have
> other needs and/or problems that really can't be solved with the current
> implementation?
> 
> In other words, my first need would be to be able to deactivate
> anything that would make queryid computation significantly more
> expensive than it is today, and/or to be able to replace it with a
> third-party implementation.
> 
> >> needs.1: stable across different databases,
> [...]
>
> >needs.7: same value on both the primary and standby?
>
> I would say yes (I don't use standbys); isn't this included in needs.1?

> Physical replication servers have identical oids, so identical
> queryids.  That's obviously not true for logical replication.

From my personal point of view, I need to get the same QueryId between (OLAP)
environments to be able to compare Production, Pre-production, and
Qualification performance (and I don't need fully qualified relation names).
Today, to do that, I'm using a custom extension computing the QueryId based
on the normalized query text.

This approach seems very popular, and maybe including it in core (as a
dedicated extension), or proposing an alternate queryid (based on relation
names) in pg_stat_statements (GUC-enabled), would fulfill 95% of the
needs ...

I agree with you on the last point: being able to replace the current QueryId
with a third-party implementation IS the first need.

Regards
PAscal







Re: propagating replica identity to partitions

2019-03-20 Thread Alvaro Herrera
Unless there are any objections to fixing the REPLICA IDENTITY bug, I
intend to push that tomorrow.

People can continue to discuss changing the behavior of other
subcommands where reasonable (OWNER TO) or even for the cases others
consider not reasonable (TABLESPACE), but there is no consensus of doing
that, and no patches either.  I suppose those changes (in the name of
uncompromising consistency) will have to wait till pg13.

Thanks

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread Julien Rouhaud
On Wed, Mar 20, 2019 at 8:39 PM legrand legrand
 wrote:
>
> Yes, I would first like to understand what the main needs are,

I don't really see one implementation that suits every need, as
probably not everyone will agree on using relation names vs. fully
qualified relation names, for starters.  The idea of taking into account
or normalising comments will also probably require a lot of argumentation
to reach a consensus.

Also, most of what's listed here would require a catcache lookup for
every object impacted in a query, at every execution.  That would be
*super* expensive (at least for OLTP workloads).  As far as the need is
to gather statistics like pg_stat_statements and similar extensions
do, the current queryid semantics and underlying limitations are not
enough of a problem to justify paying that price IMHO.  Or do you have
other needs and/or problems that really can't be solved with the current
implementation?

In other words, my first need would be to be able to deactivate
anything that would make queryid computation significantly more
expensive than it is today, and/or to be able to replace it with a
third-party implementation.

> >> needs.1: stable across different databases,
> [...]
>
> >needs.7: same value on both the primary and standby?
>
> I would say yes (I don't use standbys); isn't this included in needs.1?

Physical replication servers have identical oids, so identical
queryid.  That's obviously not true for logical replication.



Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-03-20 Thread Tom Lane
Joel Jacobson  writes:
> I've seen a performance trick in other hash functions [1]
> to instead read multiple bytes in each iteration,
> and then handle the remaining bytes after the loop.
> [1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29

I can't get very excited about this, seeing that we're only going to
be hashing short strings.  I don't really believe your 30% number
for short strings; and even if I did, there's no evidence that the
hash functions are worth any further optimization in terms of our
overall performance.

Also, as best I can tell, the approach you propose would result
in an endianness dependence, meaning we'd have to have separate
lookup tables for BE and LE machines.  That's not a dealbreaker
perhaps, but it is certainly another point on the "it's not worth it"
side of the argument.
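
To make the endianness point concrete, here is a standalone demo (not
project code): the same memcpy-based word read yields different values
depending on byte order, so a lookup table computed for one byte order
would not match the other.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main(void)
{
	const char	k[4] = {'a', 'b', 'c', 'd'};
	uint32_t	v;

	memcpy(&v, k, 4);
	/* prints 64636261 on little-endian, 61626364 on big-endian */
	printf("%08x\n", v);
	return 0;
}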

regards, tom lane



Re: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread legrand legrand
> From "Kyotaro HORIGUCHI-2"
>>At Wed, 20 Mar 2019 00:23:30 +, "Tsunakawa, Takayuki" 
>>> From: legrand legrand [mailto:legrand_legrand@]
>>> norm.9: comments aware
>> Is this to distinguish queries that have different comments for optimizer
>> hints?  If yes, I agree.

> Or, any means to give an explict query id? I saw many instances
> of query that follows a comment describing a query id.

Yes, in fact I didn't think about the different kinds of comments, but all of
them.


>> needs.3: search_path / schema independent,

>pg_stat_statements even ignores table/object/column names.

there is a very interesting thread about that in "pg_stat_statements and non
default search_path"
https://www.postgresql.org/message-id/8f54c609-17c6-945b-fe13-8b07c0866...@dalibo.com

expecting distinct QueryIds when using distinct schemas ...
maybe it should be switched to "schema dependent"



>> needs.4: pg version independent (as long as possible),

>I don't think this can be guaranteed.

maybe using a QueryId versioning GUC 
 

>> norm.1: case insensitive
>> norm.2: blank reduction 
>> norm.3: hash algoritm ?
>> norm.4: CURRENT_DATE, CURRENT_TIME, LOCALTIME, LOCALTIMESTAMP not
>> normalized
>> norm.5: NULL, IS NULL not normalized ?
>> norm.6: booleans t, f, true, false not normalized
>> norm.7: order by 1,2 or group by 1,2 should not be normalized
>> norm.8: pl/pgsql anonymous blocks not normalized

> pg_stat_statements can be the base of the discussion on them.

OK

Regards
PAscal






Re: reducing the footprint of ScanKeyword (was Re: Large writable variables)

2019-03-20 Thread Joel Jacobson
Many thanks for working on this, amazing work, really nice you made it a
separate reusable Perl-module.

The generated hash functions reads one character at a time.
I've seen a performance trick in other hash functions [1]
to instead read multiple bytes in each iteration,
and then handle the remaining bytes after the loop.

[1] https://github.com/wangyi-fudan/wyhash/blob/master/wyhash.h#L29

I've done some testing and it looks like a ~30% speed-up of the generated
ScanKeywords_hash_func() function would be possible.

If you think this approach is promising, I would be happy to prepare a
patch for it, but I wanted to check with the project that this idea has not
already been considered and ruled out for some technical reason I've
failed to see; is there any?

For this to work you would need to use larger constants for $hash_mult1
and $hash_mult2 though.
I've successfully used these values:
$hash_mult1 0x2c1b3c6d
$hash_mult2 (0x297a2d39, 0x85ebca6b, 0xc2b2ae35, 0x7feb352d, 0x846ca68b)

Here is the idea:

Generated C-code:

  for (; keylen >= 4; keylen -= 4, k += 4)
  {
    uint32_t v;

    memcpy(&v, k, 4);
    v |= 0x20202020;          /* case-fold: force the lower-case bits */
    a = a * 739982445 + v;
    b = b * 2246822507 + v;
  }

  /* handle the 0-3 remaining bytes */
  uint32_t v = 0;
  switch (keylen)
  {
  case 3:
    memcpy(&v, k, 3);
    v |= 0x202020;
    break;
  case 2:
    memcpy(&v, k, 2);
    v |= 0x2020;
    break;
  case 1:
    memcpy(&v, k, 1);
    v |= 0x20;
    break;
  }
  a = a * 739982445 + v;
  b = b * 2246822507 + v;
  return h[a % 883] + h[b % 883];

(Reading 8 bytes at a time instead would perhaps be a win, since some keywords
are quite long.)

Perl-code:

sub _calc_hash
{
my ($key, $mult, $seed) = @_;

my $result = $seed;
my $i=0;
my $keylen = length($key);

for (; $keylen>=4; $keylen-=4, $i+=4) {
    my $cn = (ord(substr($key,$i+0,1)) << 0)
           | (ord(substr($key,$i+1,1)) << 8)
           | (ord(substr($key,$i+2,1)) << 16)
           | (ord(substr($key,$i+3,1)) << 24);
    $cn |= 0x20202020 if $case_fold;
    $result = ($result * $mult + $cn) % 4294967296;
}

my $cn = 0;
if ($keylen == 3) {
    $cn = (ord(substr($key,$i+0,1)) << 0)
        | (ord(substr($key,$i+1,1)) << 8)
        | (ord(substr($key,$i+2,1)) << 16);
    $cn |= 0x202020 if $case_fold;
} elsif ($keylen == 2) {
    $cn = (ord(substr($key,$i+0,1)) << 0)
        | (ord(substr($key,$i+1,1)) << 8);
    $cn |= 0x2020 if $case_fold;
} elsif ($keylen == 1) {
    $cn = (ord(substr($key,$i+0,1)) << 0);
    $cn |= 0x20 if $case_fold;
}
$result = ($result * $mult + $cn) % 4294967296;

return $result;
}


Re: PostgreSQL pollutes the file system

2019-03-20 Thread Alvaro Herrera
On 2019-Mar-20, Andres Freund wrote:

> On 2019-03-20 15:15:02 -0400, Jonathan S. Katz wrote:
> > If we are evaluating this whole symlink / renaming thing, there could be
> > arguments for a "pgsql" alias to psql (or vice versa), but I don't think
> > "pg_sql" makes any sense and could be fairly confusing.
> 
> I don't care much about createdb etc, but I'm *strongly* against
> renaming psql and/or adding symlinks.

+1.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: [survey] New "Stable" QueryId based on normalized query text

2019-03-20 Thread legrand legrand
> From: "Tsunakawa, Takayuki"
>> From: legrand legrand [mailto:legrand_legrand@]
>> There are many projects that use alternate QueryId
>> distinct from the famous pg_stat_statements jumbling algorithm.

>I'd like to welcome the standard QueryID that DBAs and extension developers
can depend on.  
>Are you surveying the needs for you to develop the QueryID that can meet as
many needs as possible?

Yes, I would first like to understand what the main needs are,
then identify how it could be implemented
in core, in a new extension, or simply with a modified pg_stat_statements.
(I'm just a DBA, not a C developer, so it will be restricted to very
simple enhancements.)


>> needs.1: stable across different databases,

>Does this mean different database clusters, not different databases in a
single database cluster?

The same-looking query should give the same QueryId on any database (in the
same cluster or in distinct clusters); results can then be differentiated by dbid.


>needs.5: minimal overhead to calculate

OK will add it


>needs.6: doesn't change across database server restarts

Really? Does this already occur?


>needs.7: same value on both the primary and standby?

I would say yes (I don't use standbys); isn't this included in needs.1?


>> norm.9: comments aware

>Is this to distinguish queries that have different comments for optimizer
hints?  If yes, I agree.

Yes, and others, like playing with:
set ...
select /* test 1*/ ...

set ... 
select /* test 2*/ ...
  







Re: PostgreSQL pollutes the file system

2019-03-20 Thread Jonathan S. Katz
On 3/20/19 3:19 PM, Andres Freund wrote:
> Hi,
> 
> On 2019-03-20 15:15:02 -0400, Jonathan S. Katz wrote:
>> If we are evaluating this whole symlink / renaming thing, there could be
>> arguments for a "pgsql" alias to psql (or vice versa), but I don't think
>> "pg_sql" makes any sense and could be fairly confusing.
> 
> I don't care much about createdb etc, but I'm *strongly* against
> renaming psql and/or adding symlinks. That's like 95% of all
> interactions people have with postgres binaries, making that more
> confusing would be an entirely unnecessary self-own.

Yeah I agree. The only one I would entertain is "pgsql" given enough
people refer to PostgreSQL as such, but note I use the term "entertain"
in a similar way to when I knowingly watch terrible movies.

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-03-20 Thread legrand legrand
Hi Jim, Robert,

As this is a distinct subject from adding QueryId to pg_stat_activity,
would it be possible to continue the discussion "new QueryId definition" 
(for postgres open source software) here:

https://www.postgresql.org/message-id/1553029215728-0.p...@n3.nabble.com

Thanks in advance.
Regards
PAscal







Re: PostgreSQL pollutes the file system

2019-03-20 Thread Andres Freund
Hi,

On 2019-03-20 15:15:02 -0400, Jonathan S. Katz wrote:
> If we are evaluating this whole symlink / renaming thing, there could be
> arguments for a "pgsql" alias to psql (or vice versa), but I don't think
> "pg_sql" makes any sense and could be fairly confusing.

I don't care much about createdb etc, but I'm *strongly* against
renaming psql and/or adding symlinks. That's like 95% of all
interactions people have with postgres binaries, making that more
confusing would be an entirely unnecessary self-own.

Greetings,

Andres Freund



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Jonathan S. Katz
On 3/20/19 2:11 PM, Tom Lane wrote:
> "Fred .Flintstone"  writes:
>> Even just creating symlinks would be a welcome change.
>> So the real binary is pg_foo and foo is a symbolic link that points to
>> pg_foo.
>> Then at least I can type pg_ and use tab auto-completion to find
>> everything related to PostgreSQL.
> 
> You'd miss psql.  I think the odds of renaming psql are not
> distinguishable from zero: whatever arguments you might want to make
> about, say, renaming initdb perhaps not affecting too many scripts
> are surely not going to fly for psql.  So that line of argument
> isn't too convincing.

To add to that, for better or worse, many people associate the
PostgreSQL database itself as "psql" or "pgsql" ("I use psql, it's my
favorite database!").

If we are evaluating this whole symlink / renaming thing, there could be
arguments for a "pgsql" alias to psql (or vice versa), but I don't think
"pg_sql" makes any sense and could be fairly confusing.

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: PostgreSQL pollutes the file system

2019-03-20 Thread Jonathan S. Katz
On 3/20/19 2:08 PM, Alvaro Herrera wrote:
> On 2019-Mar-20, Euler Taveira wrote:
> 
>> On Wed, 20 Mar 2019 at 14:57, Tom Lane wrote:
>>>
>>> We managed to get rid of createlang and droplang in v10, and there
>>> hasn't been that much push-back about it.  So maybe there could be
>>> a move to remove createuser/dropuser?  Or at least rename them to
>>> pg_createuser and pg_dropuser.  But I think this was discussed
>>> (again) during the v10 cycle, and we couldn't agree to do more than
>>> get rid of createlang/droplang.
> 
> Previous discussion: 
> https://postgr.es/m/CABUevExPrfPH5K5qM=zst7tvfyace+i5qja6bfwckkyrh8m...@mail.gmail.com
> 
>> Votes? +1 to remove createuser/dropuser (and also createdb/dropdb as I
>> said in the other email). However, if we don't have sufficient votes,
>> let's at least consider a 'pg_' prefix.
> 
> I vote to rename these utilities to have a pg_ prefix and to
> simultaneously install symlinks for their current names, so that nothing
> breaks.

This sounds like a reasonable plan, pending which binaries we feel to do
that with.

Pardon this naive question as I have not used such systems in a while,
but would this work on systems that do not support symlinks?

Jonathan



signature.asc
Description: OpenPGP digital signature


Re: Feature: triggers on materialized views

2019-03-20 Thread David Steele

On 3/15/19 8:15 PM, Mitar wrote:


The only pending/unaddressed comment is about the philosophical
question of what it means to be a trigger. There it seems we simply
disagree with the reviewer, and I do not know how to address that. I
just see this as a very pragmatic feature which provides capabilities
you would have if you did not use the PostgreSQL abstraction. If you can
use those capabilities directly, why not also through the abstraction?


This seems to be a pretty big deal to me.  When the reviewer is also a 
committer I think you need to give serious thought to their objections.



So in my view this looks more like a lack of review feedback on the
latest version of the patch. I really do not know how to ask for more
feedback or how to move the philosophical discussion further. I thought
that the commitfest is in fact exactly the place to motivate and collect
such feedback instead of waiting for it in limbo.


Yes, but sometimes these things take time.


What you need to be doing for PG13 is very specifically addressing
committer concerns and gathering a consensus that the behavior of this
patch is the way to go.


To my understanding, the current patch addresses all concerns made by
reviewers on older versions of the patch, or explains why proposals
cannot work out, modulo the question of "does this change what a trigger
is".


Still a pretty important question...


Thank you, everyone involved in this process, for your time; I do
appreciate it. I am just trying to explain that I am a bit at a loss on
the concrete next steps I could take here.


This is the last commitfest, so committers and reviewers are focused on 
what is most likely to make it into PG12.  Your patch does not seem to 
be attracting the attention it needs to make it into this release.


Regards,
--
-David
da...@pgmasters.net



Re: PostgreSQL pollutes the file system

2019-03-20 Thread David Steele

On 3/20/19 9:32 PM, Alvaro Herrera wrote:

On 2019-Mar-20, Fred .Flintstone wrote:


Even just creating symlinks would be a welcome change.
So the real binary is pg_foo and foo is a symbolic link that points to pg_foo.
Then at least I can type pg_ and use tab auto-completion to find
everything related to PostgreSQL.


There is merit to this argument; if the starting point is an unknown
file /usr/bin/foo, then having it be a symlink to /usr/bin/pg_foo makes
it clear which package it belongs to.  We don't *have to* get rid of the
symlinks any time soon, but installing as symlinks now will allow Skynet
to get rid of them some decades from now.


+1 to tasking Skynet with removing deprecated features.  Seems like it 
would save a lot of arguing.


--
-David
da...@pgmasters.net



Re: GiST VACUUM

2019-03-20 Thread Heikki Linnakangas

On 15/03/2019 20:25, Andrey Borodin wrote:

On 11 March 2019, at 20:03, Heikki Linnakangas wrote:

On 10/03/2019 18:40, Andrey Borodin wrote:

One thing still bothers me. Let's assume that we have an internal
page with 2 deletable leaves. We lock these leaves in the order of
items on the internal page. Is it possible that the 2nd page has a
follow-right link to the 1st, and someone will lock the 2nd page, try to
lock the 1st, and deadlock with VACUUM?


Hmm. If the follow-right flag is set on a page, it means that its
right sibling doesn't have a downlink in the parent yet.
Nevertheless, I think I'd sleep better, if we acquired the locks in
left-to-right order, to be safe.

>

Actually, I did not find lock coupling in the GiST code. But I decided
to lock just two pages at once (leaf, then parent, for every pair).
PFA v22 with this concurrency logic.


Good. I just noticed that the README actually does say explicitly that
the child must be locked before the parent.


I rebased this over the new IntegerSet implementation, from the other 
thread, and did another round of refactoring, cleanups, etc. Attached is 
a new version of this patch. I'm also including the IntegerSet patch 
here, for convenience, but it's the same patch I posted at [1].


It's in pretty good shape, but one remaining issue that needs to be fixed:

During Hot Standby, the B-tree code writes a WAL record when a deleted 
page is recycled, to prevent the deletion from being replayed too early 
in the hot standby. See _bt_getbuf() and btree_xlog_reuse_page(). I 
think we need to do something similar in GiST.


I'll try fixing that tomorrow, unless you beat me to it. Making the 
changes is pretty straightforward, but it's a bit cumbersome to test.
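
Presumably the fix looks roughly like _bt_log_reuse_page()'s logic; the
following is only a sketch, with the GiST record struct and flag names
assumed rather than taken from any existing code:

	/* before putting a deleted GiST page back into use */
	if (XLogStandbyInfoActive() && RelationNeedsWAL(rel))
	{
		xl_gist_page_reuse xlrec;	/* hypothetical record struct */

		xlrec.node = rel->rd_node;
		xlrec.block = blkno;
		xlrec.latestRemovedXid = latestRemovedXid;

		XLogBeginInsert();
		XLogRegisterData((char *) &xlrec, sizeof(xlrec));
		/*
		 * XLOG_GIST_PAGE_REUSE is assumed; its redo routine would call
		 * ResolveRecoveryConflictWithSnapshot(), like the B-tree one does.
		 */
		XLogInsert(RM_GIST_ID, XLOG_GIST_PAGE_REUSE);
	}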


[1] 
https://www.postgresql.org/message-id/1035d8e6-cfd1-0c27-8902-40d8d45eb...@iki.fi


- Heikki
>From 4c05c69020334babdc1aa406c5032ae2861e5cb5 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 20 Mar 2019 02:26:08 +0200
Subject: [PATCH 1/2] Add IntegerSet, to hold large sets of 64-bit ints
 efficiently.

The set is implemented as a B-tree, with a compact representation at leaf
items, using Simple-8b algorithm, so that clusters of nearby values take
less space.

This doesn't include any use of the code yet, but we have two patches in
the works that would benefit from this:

* the GiST vacuum patch, to track empty GiST pages and internal GiST pages.

* Reducing memory usage, and also allowing more than 1 GB of memory to be
  used, to hold the dead TIDs in VACUUM.

This includes a unit test module, in src/test/modules/test_integerset.
It can be used to verify correctness, as a regression test, but if you run
it manually, it can also print memory usage and execution time of some of
the tests.

Author: Heikki Linnakangas, Andrey Borodin
Discussion: https://www.postgresql.org/message-id/b5e82599-1966-5783-733c-1a947ddb7...@iki.fi
---
 src/backend/lib/Makefile  |2 +-
 src/backend/lib/README|2 +
 src/backend/lib/integerset.c  | 1039 +
 src/include/lib/integerset.h  |   25 +
 src/test/modules/Makefile |1 +
 src/test/modules/test_integerset/.gitignore   |4 +
 src/test/modules/test_integerset/Makefile |   21 +
 src/test/modules/test_integerset/README   |7 +
 .../expected/test_integerset.out  |   14 +
 .../test_integerset/sql/test_integerset.sql   |   11 +
 .../test_integerset/test_integerset--1.0.sql  |8 +
 .../modules/test_integerset/test_integerset.c |  622 ++
 .../test_integerset/test_integerset.control   |4 +
 13 files changed, 1759 insertions(+), 1 deletion(-)
 create mode 100644 src/backend/lib/integerset.c
 create mode 100644 src/include/lib/integerset.h
 create mode 100644 src/test/modules/test_integerset/.gitignore
 create mode 100644 src/test/modules/test_integerset/Makefile
 create mode 100644 src/test/modules/test_integerset/README
 create mode 100644 src/test/modules/test_integerset/expected/test_integerset.out
 create mode 100644 src/test/modules/test_integerset/sql/test_integerset.sql
 create mode 100644 src/test/modules/test_integerset/test_integerset--1.0.sql
 create mode 100644 src/test/modules/test_integerset/test_integerset.c
 create mode 100644 src/test/modules/test_integerset/test_integerset.control

diff --git a/src/backend/lib/Makefile b/src/backend/lib/Makefile
index 191ea9bca26..3c1ee1df83a 100644
--- a/src/backend/lib/Makefile
+++ b/src/backend/lib/Makefile
@@ -13,6 +13,6 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 OBJS = binaryheap.o bipartite_match.o bloomfilter.o dshash.o hyperloglog.o \
-   ilist.o knapsack.o pairingheap.o rbtree.o stringinfo.o
+   ilist.o integerset.o knapsack.o pairingheap.o rbtree.o stringinfo.o
 
 include $(top_srcdir)/src/backend/common.mk
diff --git a/src/backend/lib/README b/src/backend/lib/README
index ae5debe1bc6..f2fb591237d 100644
--- 

Re: PostgreSQL pollutes the file system

2019-03-20 Thread Tom Lane
"Fred .Flintstone"  writes:
> Even just creating symlinks would be a welcome change.
> So the real binary is pg_foo and foo is a symbolic link that points to pg_foo.
> Then at least I can type pg_ and use tab auto-completion to find
> everything related to PostgreSQL.

You'd miss psql.  I think the odds of renaming psql are not
distinguishable from zero: whatever arguments you might want to make
about, say, renaming initdb perhaps not affecting too many scripts
are surely not going to fly for psql.  So that line of argument
isn't too convincing.

regards, tom lane



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Jehan-Guillaume de Rorthais
On Wed, 20 Mar 2019 13:56:55 -0400
Tom Lane  wrote:

> Julien Rouhaud  writes:
> > On Wed, Mar 20, 2019 at 6:25 PM Euler Taveira 
> > wrote:  
> >> createdb, dropdb, createuser, dropuser, reindexdb are binaries that
> >> confuse most newbies. Which tool are these binaries from? The names
> >> do not give a hint. How often are those confusingly named tools used?
> 
> > initdb is probably an order of magnitude worse name than all of these.  
> 
> Meh.  The ones with "db" in the name don't strike me as mortal sins;
> even if you don't recognize them as referring to a "database", you're
> not likely to guess wrongly that you know what they do.  The two that
> seem the worst to me are createuser and dropuser, which not only have
> no visible connection to "postgres" or "database" but could easily
> be mistaken for utilities for managing operating-system accounts.
> 
> We managed to get rid of createlang and droplang in v10, and there
> hasn't been that much push-back about it.  So maybe there could be
> a move to remove createuser/dropuser?  Or at least rename them to
> pg_createuser and pg_dropuser.

If you rename them, rename as pg_createrole and pg_droprole :)

I teach people not to use "CREATE USER/GROUP", but each time I have to tell
them "Yes, we kept createuser since 8.1, where roles were introduced, for
backward compatibility. No, there's no createrole".

++



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Alvaro Herrera
On 2019-Mar-20, Euler Taveira wrote:

> > On Wed, 20 Mar 2019 at 14:57, Tom Lane wrote:
> >
> > We managed to get rid of createlang and droplang in v10, and there
> > hasn't been that much push-back about it.  So maybe there could be
> > a move to remove createuser/dropuser?  Or at least rename them to
> > pg_createuser and pg_dropuser.  But I think this was discussed
> > (again) during the v10 cycle, and we couldn't agree to do more than
> > get rid of createlang/droplang.

Previous discussion: 
https://postgr.es/m/CABUevExPrfPH5K5qM=zst7tvfyace+i5qja6bfwckkyrh8m...@mail.gmail.com

> Votes? +1 to remove createuser/dropuser (and also createdb/dropdb as I
> said in the other email). However, if we don't have sufficient votes,
> let's at least consider a 'pg_' prefix.

I vote to rename these utilities to have a pg_ prefix and to
simultaneously install symlinks for their current names, so that nothing
breaks.


[In a couple of releases we could patch them so that they print a
deprecation warning to stderr if they're invoked without the prefix.]

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Euler Taveira
On Wed, 20 Mar 2019 at 14:57, Tom Lane wrote:
>
> We managed to get rid of createlang and droplang in v10, and there
> hasn't been that much push-back about it.  So maybe there could be
> a move to remove createuser/dropuser?  Or at least rename them to
> pg_createuser and pg_dropuser.  But I think this was discussed
> (again) during the v10 cycle, and we couldn't agree to do more than
> get rid of createlang/droplang.
>
Votes? +1 to remove createuser/dropuser (and also createdb/dropdb as I
said in the other email). However, if we don't have sufficient votes,
let's at least consider a 'pg_' prefix.




-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Tom Lane
Julien Rouhaud  writes:
> On Wed, Mar 20, 2019 at 6:25 PM Euler Taveira  wrote:
>> createdb, dropdb, createuser, dropuser, reindexdb are binaries that
>> confuse most newbies. Which tool are these binaries from? The names
>> do not give a hint. How often are those confusingly named tools used?

> initdb is probably an order of magnitude worse name than all of these.

Meh.  The ones with "db" in the name don't strike me as mortal sins;
even if you don't recognize them as referring to a "database", you're
not likely to guess wrongly that you know what they do.  The two that
seem the worst to me are createuser and dropuser, which not only have
no visible connection to "postgres" or "database" but could easily
be mistaken for utilities for managing operating-system accounts.

We managed to get rid of createlang and droplang in v10, and there
hasn't been that much push-back about it.  So maybe there could be
a move to remove createuser/dropuser?  Or at least rename them to
pg_createuser and pg_dropuser.  But I think this was discussed
(again) during the v10 cycle, and we couldn't agree to do more than
get rid of createlang/droplang.

regards, tom lane



Re: Sparse bit set data structure

2019-03-20 Thread Julien Rouhaud
On Wed, Mar 20, 2019 at 5:20 PM Julien Rouhaud  wrote:
>
> On Wed, Mar 20, 2019 at 2:10 AM Heikki Linnakangas  wrote:
>
> > I'm now pretty satisfied with this. Barring objections, I'll commit this
> > in the next few days. Please review, if you have a chance.
>
> You define SIMPLE8B_MAX_VALUE but never use it.  Maybe you wanted
> to add an assert / explicit test in intset_add_member()?
>
> /*
>  * We buffer insertions in a simple array, before packing and inserting them
>  * into the B-tree.  MAX_BUFFERED_VALUES sets the size of the buffer.  The
>  * encoder assumes that it is large enough, that we can always fill a leaf
>  * item with buffered new items.  In other words, MAX_BUFFERED_VALUES must be
>  * larger than MAX_VALUES_PER_LEAF_ITEM.
>  */
> #define MAX_BUFFERED_VALUES(MAX_VALUES_PER_LEAF_ITEM * 2)
>
> The *2 is not immediately obvious here (at least it wasn't to me);
> maybe explaining the rationale of intset_flush_buffered_values()'s main
> loop here would be worthwhile.
>
> Otherwise, everything looks just fine!

I forgot to mention a minor gripe about the intset_binsrch_uint64 /
intset_binsrch_leaf functions, which are 99% duplicates.  But I don't
know if fixing that (something like passing the array as a void * and
passing a getter function?) is worth the trouble.
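
Something like the following is what I had in mind; just a sketch, with
the callback type and names invented:

typedef uint64 (*intset_item_getter) (const void *arr, int index);

static int
intset_binsrch_generic(uint64 item, const void *arr, int arr_elems,
					   intset_item_getter getter, bool nextkey)
{
	int			low = 0;
	int			high = arr_elems;

	while (low < high)
	{
		int			mid = low + (high - low) / 2;
		uint64		value = getter(arr, mid);

		if (nextkey ? (item >= value) : (item > value))
			low = mid + 1;
		else
			high = mid;
	}

	/* first index whose value is >= item (or > item, with nextkey) */
	return low;
}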



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Julien Rouhaud
On Wed, Mar 20, 2019 at 6:25 PM Euler Taveira  wrote:
>
> createdb, dropdb, createuser, dropuser, reindexdb are binaries that
> confuse most newbies. Which tool are these binaries from? The names
> do not give a hint. How often are those confusingly named tools used?

initdb is probably an order of magnitude worse name than all of these.



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Alvaro Herrera
On 2019-Mar-20, Fred .Flintstone wrote:

> Even just creating symlinks would be a welcome change.
> So the real binary is pg_foo and foo is a symbolic link that points to pg_foo.
> Then at least I can type pg_ and use tab auto-completion to find
> everything related to PostgreSQL.

There is merit to this argument; if the starting point is an unknown
file /usr/bin/foo, then having it be a symlink to /usr/bin/pg_foo makes
it clear which package it belongs to.  We don't *have to* get rid of the
symlinks any time soon, but installing as symlinks now will allow Skynet
to get rid of them some decades from now.

-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Euler Taveira
On Wed, 20 Mar 2019 at 14:22, Fred .Flintstone wrote:
>
> Even just creating symlinks would be a welcome change.
> So the real binary is pg_foo and foo is a symbolic link that points to pg_foo.
> Then at least I can type pg_ and use tab auto-completion to find
> everything related to PostgreSQL.
>
There are Postgres binaries that do not start with 'pg_' (for example,
pgbench and ecpg) and do not confuse newbies or conflict with OS
binary names.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Euler Taveira
On Wed, 20 Mar 2019 at 11:39, Tatsuo Ishii wrote:
>
> +1. As one of the third-party PostgreSQL tool developers, I am afraid
> changing the names of PostgreSQL commands would give us lots of pain: for
> example, checking the PostgreSQL version to decide to use command "foo" not
> "pg_foo".
>
createdb, dropdb, createuser, dropuser, reindexdb are binaries that
confuse most newbies. Which tool are these binaries from? The names
do not give a hint. How often are those confusingly named tools used?
AFAICS a graphical tool or psql is used to create roles and databases.
psql -c "stmt" can replace createdb, dropdb, createuser and dropuser.
What about deprecating them (and removing them after a support cycle)?


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Fred .Flintstone
On Wed, Mar 20, 2019 at 3:19 PM Tom Lane  wrote:
> If we didn't pull the trigger twenty years ago, nor ten years ago,
> we're not likely to do so now.  Yeah, it's a mess and we'd certainly
> do it differently if we were starting from scratch, but we're not
> starting from scratch.  There are decades worth of scripts out there
> that know these program names, most of them not under our control.
>
> Every time this has been looked at, we've concluded that the
> distributed costs of getting rid of these program names would exceed
> the value; and that tradeoff gets worse, not better, as more years
> go by.  I don't foresee it happening.

Even just creating symlinks would be a welcome change.
So the real binary is pg_foo and foo is a symbolic link that points to pg_foo.
Then at least I can type pg_ and use tab auto-completion to find
everything related to PostgreSQL.



Re: Add exclusive backup deprecation notes to documentation

2019-03-20 Thread David Steele

Hi Robert,

On 3/20/19 6:31 PM, Robert Haas wrote:

On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier  wrote:

On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:

Please note that there have been objections to the patch later in this
thread by Peter and Robert.  I'm not very interested in watering down the
documentation changes as Peter suggests, but I think at the very least we
should commit the added hints in the error message.  For many users this
error will be their first point of contact with the backup_label
issue/behavior.


The updates of the log message do not imply anything negative as I
read them; they mention not to remove the backup_label file.  So
while we don't have an agreement about the docs, maybe the log messages
can be committed?  Peter?  Robert?

"will result in a corruptED cluster" is more correct?


I really like the proposed changes to the ereport() text.  I think the
"Be careful" hint is a really helpful way of phrasing it.  I think
"corrupt" as the patch has it is slightly better than "corrupted".
Obviously, we have to make the updates for recovery.signal no matter
what, and you could argue that part should be its own commit, but I
like all of the changes.


Cool.


I'm not too impressed with the documentation changes.  A lot of the
information being added is already present somewhere in that very same
section.  It's reasonable to revise the section so that the dangers
stand out more clearly, but it doesn't seem good to revise it in a way
that ends up duplicating the existing information.  


OK.


Here's my
suggestion -- ditch the note at the end of the section and make the
one at the beginning read like this:

The exclusive backup method is deprecated and should be avoided.
Prior to PostgreSQL 9.6, this was the only
low-level method available, but it is now recommended that all users
upgrade their scripts to use non-exclusive backups.

Then, revise the first paragraph like this:

The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of
backup can only be taken on a primary and does not allow concurrent
backups.  Moreover, because it writes a backup_label file on the
master, it can cause the master to fail to restart automatically after
a crash.  On the other hand, the erroneous removal of a backup_label
file from a backup or standby is a common mistake which can result
in serious data corruption.  If it is necessary to use this method,
the following steps may be used.


This works for me as I feel like the cautions here (and below) are still 
strongly worded.  Peter?



Later, where it says:

Note that if the server crashes during the backup it may not be
possible to restart until the backup_label
file has been
manually deleted from the PGDATA directory.

Change it to read:

As noted above, if the server crashes during the backup it may not be
possible to restart until the backup_label file has
been manually deleted from the PGDATA directory.  Note
that it is very important never to remove the
backup_label file when restoring a backup, because
this will result in corruption.  Confusion about the circumstances
under which it is appropriate to remove this file is a common cause of
data corruption when using this method; be very certain that you
remove the file only on an existing master and never when building a
standby or restoring a backup, even if you are building a standby that
will subsequently be promoted to a new master.


Technically we are into repetition here, but I'm certainly OK with it as 
this point bears repeating.



I also think we should revise this thoroughly terrible advice:

   If you wish to place a time limit on the execution of
   pg_stop_backup, set an appropriate
   statement_timeout value, but make note that if
   pg_stop_backup terminates because of this your 
backup
   may not be valid.

That seems awful, not only because it encourages people to do that
particular thing and end up leaving the server in backup mode, but
also because it doesn't clearly articulate the extreme importance of
making sure that the server is not left in backup mode.  So I would
propose that we strike that text entirely and replace it with
something like:

When using exclusive backup mode, it is absolutely imperative to make
sure that pg_stop_backup completes successfully
at the end of the backup.  Even if the backup itself fails, for
example due to lack of disk space, failure to call
pg_stop_backup will leave the server in backup
mode indefinitely, causing future backups to fail and increasing the
risk of a restart during a time when backup_label
exists.


It's also pretty important that exclusive backups complete successfully 
since backup_label is returned from pg_stop_backup() -- the backup will 
definitely be corrupt without that if there was a checkpoint during the 
backup.  But, yeah, leaving a backup_label around for a long time 

Re: Optimze usage of immutable functions as relation

2019-03-20 Thread Alexander Kuzmenkov

On 11/16/18 22:03, Tom Lane wrote:

A possible fix for this is to do eval_const_expressions() on
function RTE expressions at this stage (and then not need to
do it later), and then pull up only when we find that the
RTE expression has been reduced to a single Const.



Attached is a patch that does this, and transforms an RTE_FUNCTION that was 
reduced to a single Const into an RTE_RESULT.


Not sure it does everything correctly, but some basic cases work. In 
particular, I don't understand whether it needs any handling of "append 
relations".



--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From 4921a58f38659580f76f9684e56c0546d46526bc Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov 
Date: Wed, 20 Mar 2019 19:56:20 +0300
Subject: [PATCH v5] Simplify immutable RTE_FUNCTION to RTE_RESULT.

---
 src/backend/optimizer/plan/planner.c  |   6 +-
 src/backend/optimizer/prep/prepjointree.c |  85 ++
 src/test/regress/expected/join.out| 113 +++---
 src/test/regress/sql/join.sql |  44 ++--
 4 files changed, 234 insertions(+), 14 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index e408e77..fe2f426 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -1050,7 +1050,8 @@ preprocess_expression(PlannerInfo *root, Node *expr, int kind)
 		expr = flatten_join_alias_vars(root->parse, expr);
 
 	/*
-	 * Simplify constant expressions.
+	 * Simplify constant expressions. For function RTEs, this was already done
+	 * by pull_up_subqueries.
 	 *
 	 * Note: an essential effect of this is to convert named-argument function
 	 * calls to positional notation and insert the current actual values of
@@ -1064,7 +1065,8 @@ preprocess_expression(PlannerInfo *root, Node *expr, int kind)
 	 * careful to maintain AND/OR flatness --- that is, do not generate a tree
 	 * with AND directly under AND, nor OR directly under OR.
 	 */
-	expr = eval_const_expressions(root, expr);
+	if (kind != EXPRKIND_RTFUNC && kind != EXPRKIND_RTFUNC_LATERAL)
+		expr = eval_const_expressions(root, expr);
 
 	/*
 	 * If it's a qual or havingQual, canonicalize it.
diff --git a/src/backend/optimizer/prep/prepjointree.c b/src/backend/optimizer/prep/prepjointree.c
index aebe162..313c2bb 100644
--- a/src/backend/optimizer/prep/prepjointree.c
+++ b/src/backend/optimizer/prep/prepjointree.c
@@ -79,6 +79,9 @@ static Node *pull_up_simple_union_all(PlannerInfo *root, Node *jtnode,
 static void pull_up_union_leaf_queries(Node *setOp, PlannerInfo *root,
 		   int parentRTindex, Query *setOpQuery,
 		   int childRToffset);
+static void transform_const_function_to_result(PlannerInfo *root, Node *jtnode,
+	 RangeTblEntry *rte,
+	 JoinExpr *lowest_nulling_outer_join);
 static void make_setop_translation_list(Query *query, Index newvarno,
 			List **translated_vars);
 static bool is_simple_subquery(Query *subquery, RangeTblEntry *rte,
@@ -754,6 +757,18 @@ pull_up_subqueries_recurse(PlannerInfo *root, Node *jtnode,
 			is_simple_values(root, rte))
 			return pull_up_simple_values(root, jtnode, rte);
 
+		/*
+		 * Or is it an immutable function that evaluated to a single Const?
+		 */
+		if (rte->rtekind == RTE_FUNCTION)
+		{
+			rte->functions = (List *)
+				eval_const_expressions(root, (Node *) rte->functions);
+			transform_const_function_to_result(root, jtnode, rte,
+											   lowest_nulling_outer_join);
+			return jtnode;
+		}
+
 		/* Otherwise, do nothing at this node. */
 	}
 	else if (IsA(jtnode, FromExpr))
@@ -1783,6 +1798,76 @@ is_simple_union_all_recurse(Node *setOp, Query *setOpQuery, List *colTypes)
 }
 
 /*
+ * If the function of this RTE_FUNCTION entry evaluated to a single Const
+ * after eval_const_expressions, transform it to RTE_RESULT.
+ */
+static void
+transform_const_function_to_result(PlannerInfo *root, Node *jtnode,
+   RangeTblEntry *rte,
+   JoinExpr *lowest_nulling_outer_join)
+{
+	ListCell *lc;
+	pullup_replace_vars_context rvcontext;
+	RangeTblFunction *rtf = (RangeTblFunction *) linitial(rte->functions);
+	Query *parse = root->parse;
+
+	if (!IsA(rtf->funcexpr, Const))
+		return;
+
+	if (rte->funcordinality)
+		return;
+
+	if (list_length(rte->functions) != 1)
+		return;
+
+	rvcontext.targetlist = list_make1(makeTargetEntry((Expr *) rtf->funcexpr,
+		1 /* resno */, NULL /* resname */, false /* resjunk */));
+	rvcontext.root = root;
+	rvcontext.target_rte = rte;
+
+	/*
+	 * Since this function was reduced to Const, it can't really have lateral
+	 * references, even if it's marked as LATERAL. This means we don't need
+	 * to fill relids.
+	 */
+	rvcontext.relids = NULL;
+
+	rvcontext.outer_hasSubLinks = &parse->hasSubLinks;
+	rvcontext.varno = ((RangeTblRef *) jtnode)->rtindex;
+
+	/*
+	 * If this RTE is on a nullable side of an outer join, we have to insert
+	 * PHVs around 

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO


Michaël-san,


I think that a clear warning not to run any cluster command in parallel,
under pain of possible cluster corruption, and possibly other caveats about
replication, should appear in the documentation.


I still have the following extra documentation in my notes:


Ok, it should have been included in the patch.


+ 
+  Notes
+  
+   When disabling or enabling checksums in a cluster of multiple instances,


ISTM that a "postgres cluster" is like an Oracle instance:

See https://www.postgresql.org/docs/devel/creating-cluster.html

So the vocabulary used above seems misleading. I'm not sure how to name an
Oracle-style cluster in postgres lingo, though.



+   it is recommended to stop all the instances of the cluster before doing
+   the switch to all the instances consistently.


I think that the motivation/risks should appear before the solution. "As 
xyz ..., ...", or at least the logical link should be outlined.


It is not clear to me whether the following sentences, which seem 
specific to "pg_rewind", are linked to the previous advice, which seems 
rather to refer to streaming replication.



+   When using a cluster with
+   tools which perform direct copies of relation file blocks (for example
+   ), enabling or disabling checksums can
+   lead to page corruptions in the shape of incorrect checksums if the
+   operation is not done consistently across all nodes.  Destroying all
+   the standbys in a cluster first, enabling or disabling checksums on
+   the primary and finally recreate the cluster nodes from scratch is
+   also safe.
+  
+ 


Shouldn't disabling in reverse order be safe? The checksums are not 
checked afterwards.



This sounds kind of enough for me but..


ISTM that the risks could be stated more clearly.


I also think that some mechanism should be used to prevent such occurrences,
whether in this patch or another.


I won't counter-argue on that.


This answer is ambiguous.


I still think that the control file update should be made *after* the data
directory has been synced, otherwise the control file could be updated while
some data files are not. It just means exchanging two lines.


The parent path of the control file also needs to be flushed to make
the change durable; just doing things the same way as pg_rewind keeps the
code simple in my opinion.


I do not understand. The issue I'm refering to is, when enabling:

 - data files are updated in fs cache
 - control file is updated in fs cache
 * fsync is called
 - updated control file gets written
 - data files are being written...
   but reboot occurs while fsyncing is still in progress

After the reboot, some data files are not fully updated with their 
checksums, although the control file says that they are. It should then 
fail after a restart when a page without a checksum is loaded?


What am I missing?

Also, I do not see how exchanging two lines makes the code less simple.
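
In code terms, the order I am arguing for is roughly the following (a
sketch; the exact helper signatures differ between versions):

	/* 1. flush the rewritten relation files (and everything else) */
	if (do_sync)
		fsync_pgdata(DataDir, PG_VERSION_NUM);

	/* 2. only then flip the flag and make the control file durable */
	ControlFile->data_checksum_version = PG_DATA_CHECKSUM_VERSION;
	update_controlfile(DataDir, ControlFile, do_sync);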


If the conditional sync is moved before the file update, the file update
should pass do_sync to update_controlfile.


For the durability of the operation, the current order is intentional.


See my point above: I think that this order can lead to cluster 
corruption. If not, please be kind enough to try to explain in more 
detail why I'm wrong.


--
Fabien.

Re: Automated way to find actual COMMIT LSN of subxact LSN

2019-03-20 Thread Tom Lane
Jeremy Finzel  writes:
> A related problem demonstrates the same odd behavior.  If you set
> recovery_target_xid to a subtransaction ID, recovery just skips it and
> continues, which really seems to be undesirable behavior.  It
> would be nice if that also could roll up to the next valid actual commit
> transaction.

It would seem like what you're asking for is to continue until the commit
of the parent transaction, not just the next commit after the subcommit.
Otherwise (if that's an unrelated xact) the subxact would still not be
committed, so that you might as well have stopped short of it.

I'd be in favor of that for recovery_target_xid, but I'm not at all
convinced about changing the behavior for a target LSN.  The fact that
the target is a subcommit seems irrelevant when you specify by LSN.

I don't recall this for sure, but doesn't a parent xact's commit record
include all subxact XIDs?  If so, the implementation would just require
searching the subxacts as well as the main XID for a match to
recovery_target_xid.
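
If so, the check ought to boil down to something like this (a sketch; I
have not verified the exact call site):

	xl_xact_parsed_commit parsed;
	bool		matched;

	ParseCommitRecord(XLogRecGetInfo(record),
					  (xl_xact_commit *) XLogRecGetData(record), &parsed);

	matched = (XLogRecGetXid(record) == recoveryTargetXid);
	for (int i = 0; !matched && i < parsed.nsubxacts; i++)
		matched = (parsed.subxacts[i] == recoveryTargetXid);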

regards, tom lane



Re: Sparse bit set data structure

2019-03-20 Thread Julien Rouhaud
On Wed, Mar 20, 2019 at 2:10 AM Heikki Linnakangas  wrote:
>
> On 14/03/2019 17:37, Julien Rouhaud wrote:
>
> > +   if (newitem <= sbs->last_item)
> > +   elog(ERROR, "cannot insert to sparse bitset out of order");
> >
> > Is there any reason to disallow inserting duplicates?  AFAICT nothing
> > prevents that in the current code.  If that's intended, that probably
> > should be documented.
>
> Yeah, we could easily allow setting the last item again. It would be a
> no-op, though, which doesn't seem very useful. It would be useful to
> lift the limitation that the values have to be added in ascending order,
> but current users that we're thinking of don't need it. Let's add that
> later, if the need arises.
>
> Or did you mean that the structure would be a "bag" rather than a "set",
> so that it would keep the duplicates? I don't think that would be good.
> I guess the vacuum code that this will be used in wouldn't care either
> way, but "set" seems like a cleaner concept.

Yes, I was thinking about "bag".  For a set, allowing inserting
duplicates is indeed a no-op and should be pretty cheap, with almost no
extra code needed.  Maybe VACUUM can't have duplicates, but is it
that unlikely that others would need them?  I'm wondering, for instance,
whether merging multiple such structures isn't going to be needed
soon.  If that's not wanted, I still think that a less ambiguous
error should be raised.

> I'm now pretty satisfied with this. Barring objections, I'll commit this
> in the next few days. Please review, if you have a chance.

You define SIMPLE8B_MAX_VALUE but never use it.  Maybe you wanted
to add an assert / explicit test in intset_add_member()?
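
Something like this at the top of intset_add_member() is what I meant (a
sketch):

	Assert(newitem <= SIMPLE8B_MAX_VALUE);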

/*
 * We buffer insertions in a simple array, before packing and inserting them
 * into the B-tree.  MAX_BUFFERED_VALUES sets the size of the buffer.  The
 * encoder assumes that it is large enough, that we can always fill a leaf
 * item with buffered new items.  In other words, MAX_BUFFERED_VALUES must be
 * larger than MAX_VALUES_PER_LEAF_ITEM.
 */
#define MAX_BUFFERED_VALUES(MAX_VALUES_PER_LEAF_ITEM * 2)

The *2 is not immediately obvious here (at least it wasn't to me);
maybe explaining the rationale of intset_flush_buffered_values()'s main
loop here would be worthwhile.

Otherwise, everything looks just fine!



Re: pg_basebackup ignores the existing data directory permissions

2019-03-20 Thread Peter Eisentraut
On 2019-03-19 08:34, Haribabu Kommi wrote:
> How about the following change?
> 
> pg_basebackup --> copies the contents of the src directory (with group
> access) and even the root directory permissions.
> 
> pg_basebackup --no-group-access --> copies the contents of the src
> directory (with no group access) even for the root directory.

I'm OK with that.  Perhaps a positive option --allow-group-access would
also be useful.

Let's make sure the behavior of these options is aligned with initdb.
And write tests for each variant.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Special role for subscriptions

2019-03-20 Thread Andrey Borodin



> On 20 March 2019, at 21:46, Robert Haas wrote:
> 
> On Wed, Mar 20, 2019 at 5:39 AM Evgeniy Efimkin  
> wrote:
>> Hi!
>>> Currently, a user with pg_subscription_users can create a subscription into any
>>> system table, can't they?
>>> We certainly need to change this to a more secure way.
>> No, you can't add system tables to a publication. In the new patch I add
>> privilege checks on the target table; a non-superuser can't create/refresh a
>> subscription if they don't have INSERT, UPDATE, DELETE and TRUNCATE privileges.
> 
> 
> 
> I think we should view this permission as "you can create
> subscriptions, plain and simple".

That sounds good.
From my POV, the purpose of the patch is to allow users to transfer their
database via logical replication, without superuser privileges (e.g. to a
managed cloud running vanilla postgres).

But the role effectively allows inserts into any table, which can be escalated
to superuser. What is the best way to deal with that?

Best regards, Andrey Borodin.


Re: pg_basebackup ignores the existing data directory permissions

2019-03-20 Thread Peter Eisentraut
On 2019-03-18 16:45, Robert Haas wrote:
>> I'm strongly in favor of keeping initdb and pg_basebackup options
>> similar and consistent.  They are both ways to initialize data directories.
>>
>> You'll note that initdb does not behave the way you describe.  It's not
>> unreasonable behavior, but it's not the way it currently works.
> So you want to default to no group access regardless of the directory
> permissions, with an option to enable group access that must be
> explicitly specified?  That seems like a reasonable option to me; note
> that initdb does seem to chdir() an existing directory.

I think that would have been my preference, but PG11 is already shipping
with the current behavior, so I'm not sure whether that should be changed.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Built-in connection pooler

2019-03-20 Thread Konstantin Knizhnik

New version of the patch (rebased + bug fixes) is attached to this mail.

On 20.03.2019 18:32, Konstantin Knizhnik wrote:
Attached please find results of benchmarking of different connection 
poolers.


Hardware configuration:
   Intel(R) Xeon(R) CPU   X5675  @ 3.07GHz
   24 cores (12 physical)
   50 GB RAM

Tests:
 pgbench read-write (scale 1): performance is mostly limited by 
disk throughput
 pgbench select-only (scale 1): performance is mostly limited by 
efficient utilization of CPU by all workers
 pgbench with YCSB-like workload with Zipf distribution: 
performance is mostly limited by lock contention


Participants:
    1. pgbouncer (16 and 32 pool size, transaction level pooling)
    2. Postgres Pro-EE connection pooler: redirection of client
connections to pool workers and maintenance of session contexts.
    16 and 32 connection pool sizes (number of worker backends).
    3. Built-in proxy connection pooler: the implementation proposed in
this thread.
    16/1 and 16/2 specify the number of worker backends per proxy
and the number of proxies; the total number of backends is the product of
these values.
    4. Yandex Odyssey (the 32/2 and 64/4 configurations specify the number
of backends and Odyssey threads).
    5. Vanilla Postgres (marked on the diagram as "12devel-master/2fadf24
POOL=none")


In all cases except 2), the master branch of Postgres is used.
The client (pgbench), the pooler, and postgres are launched on the same host.
Communication is through the loopback interface (host=localhost).
We have tried to find the optimal parameters for each pooler.
The three graphs attached to this mail illustrate the three test
cases.


A few comments about these results:
- The PgPro EE pooler shows the best results in all cases except the 
tpc-b-like workload. In that case the proxy approach is more efficient 
because of its more flexible job scheduling between workers
  (in EE, sessions are scattered between worker backends at connect 
time, while the proxy chooses the least loaded backend for each 
transaction).
- pgbouncer is not able to scale well because of its single-threaded 
architecture. Certainly it is possible to spawn several instances of 
pgbouncer and scatter the workload between them, but we have not done 
that.
- Vanilla Postgres demonstrates significant degradation of performance 
for a large number of active connections on all workloads except read-only.
- Despite the fact that Odyssey is a new player (or maybe because of 
it), the Yandex pooler doesn't demonstrate good results. It is the only 
pooler that also causes performance to degrade as the number of 
connections increases. Maybe this is caused by memory leaks, because 
its memory footprint also grows steadily during the test.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d383de2..bee9725 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -719,6 +719,123 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_sessions (integer)
+  
+   max_sessions configuration parameter
+  
+  
+  
+
+  The maximum number of client sessions that can be handled by
+  one connection proxy when session pooling is switched on.
+  This parameter does not add any memory or CPU overhead, so
+  specifying a large max_sessions value
+  does not affect performance.
+  If the max_sessions limit is reached, new connections are not accepted.
+
+
+  The default value is 1000. This parameter can only be set at server start.
+   
+  
+ 
+
+ 
+  session_pool_size (integer)
+  
+   session_pool_size configuration parameter
+  
+  
+  
+
+  Enables session pooling and defines the maximum number of
+  backends that can be used by client sessions for each database/user combination.
+  Launched non-tainted backends are never terminated even if there are no active sessions.
+  A backend is considered tainted if the client updates GUCs, creates a temporary table or prepared statements.
+  A tainted backend can serve only one client.
+
+
+  The default value is 10, so up to 10 backends will serve each database/user combination.
+
+  
+ 
+
+ 
+  proxy_port (integer)
+  
+   proxy_port configuration parameter
+  
+  
+  
+
+  Sets the TCP port for the connection pooler.
+  Clients connected to main "port" will be assigned dedicated backends,
+  while clients connected to the proxy port will be connected to backends through the proxy, which
+  performs transaction level scheduling. 
+   
+
+  The default value is 6543.
+
+  
+ 
+
+ 
+  connection_proxies (integer)
+  
+   connection_proxies configuration parameter
+  
+  
+  
+
+  

Re: Automated way to find actual COMMIT LSN of subxact LSN

2019-03-20 Thread Jeremy Finzel
>
> If recovery_target_inclusive were able to take a third value
> "xact", would that be exactly what you want?
>
> And is it acceptable?
>

Yes, that would be exactly what I want.  A third value for
recovery_target_inclusive would work, although perhaps it's debatable
whether rolling any LSN forward until it is actually visible should
instead be the default behavior when recovery_target_inclusive = true.
If I say I want to "include" an LSN in my recovery target, it doesn't
make much sense that the change won't be visible unless it's an actual
commit LSN; in that case the recovery point does not really include the LSN.

A related problem demonstrates the same odd behavior: if you set
recovery_target_xid to a subtransaction id, recovery just skips it and
continues, which really seems undesirable.  It would be nice if that
also rolled forward to the enclosing commit, as sketched below.
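
To illustrate (a minimal sketch; the table name is hypothetical):

    BEGIN;
    INSERT INTO t VALUES (1);   -- top-level transaction xid assigned here
    SAVEPOINT s1;
    INSERT INTO t VALUES (2);   -- this change carries a subtransaction xid
    RELEASE SAVEPOINT s1;
    COMMIT;                     -- only the top-level xid shows up as a commit

Setting recovery_target_xid to the subtransaction's xid is silently skipped
today; the suggestion is to recover up to the enclosing top-level commit
instead.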

Thanks!
Jeremy


Re: Add exclusive backup deprecation notes to documentation

2019-03-20 Thread Robert Haas
On Wed, Mar 20, 2019 at 9:00 AM Michael Paquier  wrote:
> On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:
> > Please note that there have been objections to the patch later in this
> > thread by Peter and Robert.  I'm not very interested in watering down the
> > documentation changes as Peter suggests, but I think at the very least we
> > should commit the added hints in the error message.  For many users this
> > error will be their first point of contact with the backup_label
> > issue/behavior.
>
> The updates of the log message do not imply anything negative as I
> read them as they mention to not remove the backup_label file.  So
> while we don't have an agreement about the docs, the log messages may
> be able to be committed?  Peter?  Robert?
>
> "will result in a corruptED cluster" is more correct?

I really like the proposed changes to the ereport() text.  I think the
"Be careful" hint is a really helpful way of phrasing it.  I think
"corrupt" as the patch has it is slightly better than "corrupted".
Obviously, we have to make the updates for recovery.signal no matter
what, and you could argue that part should be its own commit, but I
like all of the changes.

I'm not too impressed with the documentation changes.  A lot of the
information being added is already present somewhere in that very same
section.  It's reasonable to revise the section so that the dangers
stand out more clearly, but it doesn't seem good to revise it in a way
that ends up duplicating the existing information.  Here's my
suggestion -- ditch the note at the end of the section and make the
one at the beginning read like this:

The exclusive backup method is deprecated and should be avoided.
Prior to PostgreSQL 9.6, this was the only
low-level method available, but it is now recommended that all users
upgrade their scripts to use non-exclusive backups.

Then, revise the first paragraph like this:

The process for an exclusive backup is mostly the same as for a
non-exclusive one, but it differs in a few key steps. This type of
backup can only be taken on a primary and does not allow concurrent
backups.  Moreover, because it writes a backup_label file on the
master, it can cause the master to fail to restart automatically after
a crash.  On the other hand, the erroneous removal of a backup_label
file from a backup or standby is a common mistake which can result
in serious data corruption.  If it is necessary to use this method,
the following steps may be used.

Later, where it says:

   Note that if the server crashes during the backup it may not be
   possible to restart until the backup_label
file has been
   manually deleted from the PGDATA directory.

Change it to read:

As noted above, if the server crashes during the backup it may not be
possible to restart until the backup_label file has
been manually deleted from the PGDATA directory.  Note
that it is very important never to remove the
backup_label file when restoring a backup, because
this will result in corruption.  Confusion about the circumstances
under which it is appropriate to remove this file is a common cause of
data corruption when using this method; be very certain that you
remove the file only on an existing master and never when building a
standby or restoring a backup, even if you are building a standby that
will subsequently be promoted to a new master.

I also think we should revise this thoroughly terrible advice:

  If you wish to place a time limit on the execution of
  pg_stop_backup, set an appropriate
  statement_timeout value, but make note that if
  pg_stop_backup terminates because of this your backup
  may not be valid.

That seems awful, not only because it encourages people to do that
particular thing and end up leaving the server in backup mode, but
also because it doesn't clearly articulate the extreme importance of
making sure that the server is not left in backup mode.  So I would
propose that we strike that text entirely and replace it with
something like:

When using exclusive backup mode, it is absolutely imperative to make
sure that pg_stop_backup completes successfully
at the end of the backup.  Even if the backup itself fails, for
example due to lack of disk space, failure to call
pg_stop_backup will leave the server in backup
mode indefinitely, causing future backups to fail and increasing the
risk of a restart during a time when backup_label
exists.
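
For clarity, the exclusive-mode sequence whose failure modes are described
above is (a minimal sketch; the label is arbitrary):

    SELECT pg_start_backup('nightly', false, true);  -- writes backup_label into PGDATA
    -- ... perform the file-level copy of the data directory ...
    SELECT pg_stop_backup();                         -- must run even if the copy fails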

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: speeding up planning with partitions

2019-03-20 Thread Jesper Pedersen

Hi,

On 3/19/19 11:15 PM, Imai, Yoshikazu wrote:

Here are the details.

[creating partitioned tables (with 1024 partitions)]
drop table if exists rt;
create table rt (a int, b int, c int) partition by range (a);
\o /dev/null
select 'create table rt' || x::text || ' partition of rt for values from (' ||
  (x)::text || ') to (' || (x+1)::text || ');' from generate_series(1, 1024) x;
\gexec
\o

[select1024.sql]
\set a random (1, 1024)
select * from rt where a = :a;

[pgbench]
pgbench -n -f select1024.sql -T 60




My tests - using hash partitions - show that the extra time is spent in 
make_partition_pruneinfo() for the relid_subplan_map variable.


  64 partitions: make_partition_pruneinfo() 2.52%
8192 partitions: make_partition_pruneinfo() 5.43%

TPS goes down ~8% between the two. The test setup is similar to the above.

Given that Tom is planning to change the List implementation [1] in 13, I 
think using the palloc0'd structure is OK for 12 before trying other 
implementation options.


perf sent off-list.

[1] https://www.postgresql.org/message-id/24783.1551568303%40sss.pgh.pa.us

Best regards,
 Jesper



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Tom Lane
Chris Travers  writes:
> On Wed, Mar 20, 2019 at 11:06 AM Andreas Karlsson  wrote:
>> On 3/19/19 11:19 AM, Fred .Flintstone wrote:
>>> It would be better if these files were renamed to be prefixed with
>>> pg_, such as pg_createdb.
>>> Or even better postgresql-createdb then be reachable by through a
>>> "postgresql" wrapper script.

>> This topic has been discussed before e.g. in 2008 in
>> https://www.postgresql.org/message-id/47EA5CC0.8040102%40sun.com and
>> also more recently but I cannot find it in the archives right now.

And also before that, eg
https://www.postgresql.org/message-id/flat/199910091253.IAA10670%40candle.pha.pa.us

> I wouldn't be opposed to this, but I would note two points on a deprecation
> cycle:
> 1. Given that people may have tools that work with all supported versions
> of PostgreSQL, this needs to be a long cycle, and
> 2. Managing that cycle makes it a little bit of a tough sell.

If we didn't pull the trigger twenty years ago, nor ten years ago,
we're not likely to do so now.  Yeah, it's a mess and we'd certainly
do it differently if we were starting from scratch, but we're not
starting from scratch.  There are decades worth of scripts out there
that know these program names, most of them not under our control.

Every time this has been looked at, we've concluded that the
distributed costs of getting rid of these program names would exceed
the value; and that tradeoff gets worse, not better, as more years
go by.  I don't foresee it happening.

regards, tom lane



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Chris Howard



Another pattern is to have a separate bin path for
various software packages:  /opt/postgres/bin  for example.

That doesn't directly answer "what is createdb?" but it
does give a quicker indication via the 'which' command.
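
For example (the path is illustrative):

    $ which createdb
    /opt/postgres/bin/createdb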




On 3/20/19 5:43 AM, Fred .Flintstone wrote:

It seems nothing came out of the discussion in 2008.
I feel the topic should be revisited.

I am in favor of doing so too. The deprecation cycle could involve
symlinks for a brief period of time or a couple of versions.

Yes, the wrapper script approach is used by Git as well as the "dotnet" command.
Adding a wrapper script doesn't mean executing the commands
directly without the wrapper would no longer be possible, so one doesn't
exclude the other.
It would be a welcome addition.

On Wed, Mar 20, 2019 at 11:05 AM Andreas Karlsson  wrote:

On 3/19/19 11:19 AM, Fred .Flintstone wrote:

PostgreSQL pollutes the file system with lots of binaries that it is
not obvious that they belong to PostgreSQL.

Such as "/usr/bin/createdb", etc.

It would be better if these files were renamed to be prefixed with
pg_, such as pg_createdb.
Or even better postgresql-createdb then be reachable by through a
"postgresql" wrapper script.

Hi,

This topic has been discussed before e.g. in 2008 in
https://www.postgresql.org/message-id/47EA5CC0.8040102%40sun.com and
also more recently but I cannot find it in the archives right now.

I am personally in favor of renaming e.g. createdb to pg_createdb, since
it is not obvious that createdb belongs to PostgreSQL when reading a
script or looking in /usr/bin, but we would need a some kind of
deprecation cycle here or we would suddenly break tons of people's scripts.

And as for the git-like solution with a wrapper script, that seems to be
the modern way to do things but would be an even larger breakage and I
am not convinced the advantage would be worth it especially since our
executables are not as closely related and consistent as for example git's.

Andreas








Re: Special role for subscriptions

2019-03-20 Thread Robert Haas
On Wed, Mar 20, 2019 at 5:39 AM Evgeniy Efimkin  wrote:
> Hi!
> > Currently, user with pg_subscription_users can create subscription into any 
> > system table, can't they?
> > We certainly need to change it to more secure way.
> No, you can't add system tables to a publication. In the new patch I add privilege 
> checks on the target table: a non-superuser can't create/refresh a subscription if 
> they don't have INSERT, UPDATE, DELETE and TRUNCATE privileges.

I don't think that's the right approach.  That idea kinda makes sense if you
think about it as giving permission to publish tables to which they
have rights, but that doesn't seem like the right mental model to me.
It seems more likely that there is a person whose job it is to set up
replication but who doesn't normally interact with the table data
itself.  In that kind of case, you just want to give the person
permission to create subscriptions, without needing to also give them
lots of privileges on individual tables (and maybe having whatever
they are trying to do fail if you miss a table someplace).

But there are some other things that are strange about this too:

- If the user's permissions are later revoked, the subscription is unaffected.

- If the user creates a subscription that targets a publication which
only includes a subset of the insert, update, delete, and truncate
operations, they still need all of those permissions on their local
table.

- We don't typically have operations that require a whole bundle of
privileges on the local table -- sometimes you check that you have A
on X and B on Y, like for REFERENCES, but needing both A and B on X is
somewhat unusual.

I think we should view this permission as "you can create
subscriptions, plain and simple".
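
For illustration, under that model the flow might look like this (a sketch
against the proposed patch; role and object names are hypothetical):

    CREATE ROLE repl_admin LOGIN;
    GRANT pg_subscription_users TO repl_admin;
    SET ROLE repl_admin;
    CREATE SUBSCRIPTION app_sub
        CONNECTION 'host=publisher dbname=app user=replicator'
        PUBLICATION app_pub;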

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Add exclusive backup deprecation notes to documentation

2019-03-20 Thread Magnus Hagander
On Mon, Mar 18, 2019 at 1:33 PM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-03-07 10:33, David Steele wrote:
> > On 3/1/19 3:14 PM, Laurenz Albe wrote:
> I think it would be helpful to frame the documentation in a way to
> suggest that the nonexclusive mode is more for automation and more
> sophisticated tools and the exclusive mode is more for manual or simple
> scripted use.
>

But that would be factually incorrect and backwards, so it seems like a
terrible idea, at least when it comes to manual use. If you are doing it
manually, it's a lot *easier* to do it right with the non-exclusive mode,
because you can easily keep one psql and one shell open. And that's safe.

The only real use case that has been put forward for the exclusive backup
mode is when the backups are done through a script, and that script is
limited to only using something like bash (and can't use a scripting language
like perl or python or powershell or other more advanced scripting
languages).

And I don't think exclusive mode should be suggested for "simple scripts"
either, since it's anything but -- scripts using the exclusive mode
correctly will be anything but simple. A better choice there would be to
single out shell scripts, I'd suggest, if we want to single something out.
Or, more generically, "scripting languages incapable of keeping a connection
open across multiple lines" or something?

We can certainly keep it, but let's not tell people something is simple
when it's not.


If we do think that the exclusive mode will be removed in PG13, then I
> don't think we need further documentation changes.  It already says it's
> deprecated, and we don't need to justify that at length.  But again, I'm
> not convinced that that will happen.
>

But the complaints before were that the deprecation note currently in the
documentation was not enough to remove it.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: [GSoC] application ideas

2019-03-20 Thread pantilimonov misha
Excuse me for the previous letter; it should be fixed now by using simple HTML.

---

Greetings,

I am interested in databases and would like to make a contribution to
PostgreSQL by participating in GSoC 2019. Currently I am studying at HSE[1],
in the last year of a master's program that is mostly built on top of
collaboration with ISP RAS[2].

Over the previous year I worked on an llvm_jit extension for
PostgreSQL 9.6 that was developed at ISP RAS and presented at PGCON[3].
Specifically, my work consisted of adding support for several missing
nodes (bitmapscan, mergejoin, subqueryscan, etc.)
by rewriting them with the LLVM API, as well as other functionality 
(e.g. DISTINCT in GROUP BY)
required to fully support TPC-H at scale factor 100.

Originally I wanted to pursue the "TOAST" tasks from the ideas list, but noticed
that a couple of students have already mentioned them on the mailing list. So,
instead of increasing the queue for a single possible idea, I would like to offer
other ones that sound interesting to me and can potentially be useful for
PostgreSQL and the community:

1) The so-called adaptive join, which exists in modern Oracle[4] and MSSQL[5]
versions. This type of node is designed to mitigate cardinality estimation
errors in queries that are somewhere in between NL (nested loop with index scan)
and HJ (hash join).

One possible implementation is to start execution in HJ fashion, accumulating
rows in a hash table up to a certain threshold. If the threshold is not
exceeded, continue with an index scan; otherwise, switch to a usual HJ.

2) Changing the buffer manager strategy.
Back in 2016 Andres Freund gave a presentation[6] on possible improvements
that can be made to the buffer manager. I find the idea of replacing the hash
table with trees of radix trees[7] promising. Most likely, taking the program's
time constraints into account, this task won't be completed as a "ready to
deploy" solution. Instead, some kind of prototype can be implemented and
benchmarked. 

3) Improvements in the JIT component.
Great progress has been made in this direction in versions 10 and 11, but
there is still a lot to be done. Possible subareas: compiled code
caching/sharing, cost-based optimizer improvements, push-based execution with
bytecode transformation, compiling plpgsql, etc.

At this stage I would like to receive some feedback from the community on
which of these ideas are more useful for the near future of PostgreSQL and
more suitable for GSoC itself. With that information I can dive into a
particular topic, extract additional information and prepare the required
proposal.

p.s. my preferred order: 2,1,3


[1] https://www.hse.ru/en/ma/sp
[2] http://www.ispras.ru/en/
[3] http://www.pgcon.org/2017/schedule/events/1092.en.html
[4] 
https://www.oracle.com/technetwork/database/bi-datawarehousing/twp-optimizer-with-oracledb-12c-1963236.pdf
[5] 
https://blogs.msdn.microsoft.com/sqlserverstorageengine/2017/04/19/introducing-batch-mode-adaptive-joins/
[6] https://pgconf.ru/media/2016/05/13/1io.pdf
[7] 
http://events17.linuxfoundation.org/sites/events/files/slides/LinuxConNA2016%20-%20Radix%20Tree.pdf

Best regards,

Michael.




Re: PostgreSQL pollutes the file system

2019-03-20 Thread Fred .Flintstone
It seems nothing came out of the discussion in 2008.
I feel the topic should be revisited.

I am in favor of doing so too. The deprecation cycle could involve
symlinks for a brief period of time or a couple of versions.

Yes, the wrapper script approach is used by Git as well as the "dotnet" command.
Adding a wrapper script doesn't mean executing the commands
directly without the wrapper would no longer be possible, so one doesn't
exclude the other.
It would be a welcome addition.

On Wed, Mar 20, 2019 at 11:05 AM Andreas Karlsson  wrote:
>
> On 3/19/19 11:19 AM, Fred .Flintstone wrote:
> > PostgreSQL pollutes the file system with lots of binaries that it is
> > not obvious that they belong to PostgreSQL.
> >
> > Such as "/usr/bin/createdb", etc.
> >
> > It would be better if these files were renamed to be prefixed with
> > pg_, such as pg_createdb.
> > Or even better postgresql-createdb then be reachable by through a
> > "postgresql" wrapper script.
>
> Hi,
>
> This topic has been discussed before e.g. in 2008 in
> https://www.postgresql.org/message-id/47EA5CC0.8040102%40sun.com and
> also more recently but I cannot find it in the archives right now.
>
> I am personally in favor of renaming e.g. createdb to pg_createdb, since
> it is not obvious that createdb belongs to PostgreSQL when reading a
> script or looking in /usr/bin, but we would need a some kind of
> deprecation cycle here or we would suddenly break tons of people's scripts.
>
> And as for the git-like solution with a wrapper script, that seems to be
> the modern way to do things but would be an even larger breakage and I
> am not convinced the advantage would be worth it especially since our
> executables are not as closely related and consistent as for example git's.
>
> Andreas



Re: Re: query logging of prepared statements

2019-03-20 Thread Justin Pryzby
Hi,

On Wed, Mar 20, 2019 at 02:46:00PM +0400, David Steele wrote:
> >I perfectly understand your use case. I agree, it is duplicated. But I
> >think some people may want to see it at every EXECUTE, if they don't want
> >to grep for the prepared statement body which was logged earlier.
> >
> >I think it would be good to provide a way to configure this behavior.
> >The first version of your patch relied on the log_error_verbosity GUC. I'm
> >not sure that this variable is suitable for configuring the visibility of
> >the prepared statement body in logs, because it sets more general behavior.
> >Maybe it would be better to introduce some new GUC variable if the
> >community doesn't mind.
>
> Thoughts on this?

I'm happy to make the behavior configurable, but I'm having trouble believing
many people would want a GUC added for this.  But I'd be interested to hear
input on whether it should be configurable, or whether the original "verbose"
logging is desirable to anyone.
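
For context, the duplication in question looks like this (a sketch; object
names are hypothetical and the log lines are abridged):

    PREPARE mystmt (int) AS SELECT * FROM mytable WHERE a = $1;
    EXECUTE mystmt(1);

    -- With log_statement = 'all', each EXECUTE currently logs both lines:
    --   LOG:  statement: EXECUTE mystmt(1);
    --   DETAIL:  prepare: PREPARE mystmt (int) AS SELECT * FROM mytable WHERE a = $1;
    -- The patch drops the DETAIL line, since the statement body was already
    -- logged at PREPARE time.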

This is mostly tangential, but since writing the original patch, I considered
the possibility of a logging GUC which scales better than the log_* booleans:
https://www.postgresql.org/message-id/20190316122422.GR6030%40telsasoft.com
If that idea were desirable, there could be a logging_bit to enable verbose
logging of prepared statements.

Justin



Re: Add exclusive backup deprecation notes to documentation

2019-03-20 Thread Michael Paquier
On Wed, Mar 20, 2019 at 04:29:35PM +0400, David Steele wrote:
> Please note that there have been objections to the patch later in this
> thread by Peter and Robert.  I'm not very interested in watering down the
> documentation changes as Peter suggests, but I think at the very least we
> should commit the added hints in the error message.  For many users this
> error will be their first point of contact with the backup_label
> issue/behavior.

The updates to the log message do not imply anything negative as I
read them, since they say not to remove the backup_label file.  So
while we don't have an agreement about the docs, could the log messages
be committed?  Peter?  Robert?

"will result in a corruptED cluster" is more correct?
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Michael Paquier
On Wed, Mar 20, 2019 at 10:38:36AM +0100, Fabien COELHO wrote:
> Hmmm… so nothing:-)

The core of the feature is still here, fortunately.

> I think that a clear warning not to run any cluster command in parallel,
> under pain of possible cluster corruption, and possibly other caveats about
> replication, should appear in the documentation.

I still have the following extra documentation in my notes:
+ 
+  Notes
+  
+   When disabling or enabling checksums in a cluster of multiple instances,
+   it is recommended to stop all the instances of the cluster first and
+   then apply the switch to all the instances consistently.  When using a
+   cluster with tools which perform direct copies of relation file blocks
+   (for example ), enabling or disabling
+   checksums can lead to page corruptions in the shape of incorrect
+   checksums if the operation is not done consistently across all nodes.
+   Destroying all the standbys in a cluster first, enabling or disabling
+   checksums on the primary and finally recreating the cluster nodes from
+   scratch is also safe.
+  
+ 
This sounds more or less sufficient to me, but..
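
For concreteness, the offline operation being documented is (PG 12's tool;
the path is illustrative), with every instance of the cluster cleanly shut
down first:

    pg_checksums --enable -D /pgdata
    pg_checksums --disable -D /pgdata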

> I also think that some mechanism should be used to prevent such occurrence,
> whether in this patch or another.

I won't counter-argue on that.

> I still think that the control file update should be made *after* the data
> directory has been synced, otherwise the control file could be updated while
> some data files are not. It just means exchanging two lines.

The parent path of the control file also needs to be flushed to make
the change durable; doing things the same way as pg_rewind keeps the
code simple in my opinion.

> If the conditional sync is moved before the file update, the file update
> should pass do_sync to update_controlfile.

For the durability of the operation, the current order is
intentional.
--
Michael


signature.asc
Description: PGP signature


Re: Add exclusive backup deprecation notes to documentation

2019-03-20 Thread David Steele

On 3/8/19 6:08 AM, Magnus Hagander wrote:
On Thu, Mar 7, 2019 at 5:35 PM Michael Paquier > wrote:


On Thu, Mar 07, 2019 at 11:33:20AM +0200, David Steele wrote:
 > OK, here's a new version that splits the deprecation notes from the
 > discussion of risks.  I also fixed the indentation.

The documentation part looks fine to me.  Just one nit regarding the
error hint.

 > -     errhint("If you are not restoring from a backup, try
removing the file \"%s/backup_label\".", DataDir)));
 > +     errhint("If you are restoring from a backup, touch
\"%s/recovery.signal\" and add recovery options to
\"%s/postgresql.auto.conf\".\n"

Here do we really want to recommend adding options to
postgresql.auto.conf?  This depends a lot on the solution integration
so I think that this hint could actually confuse some users because it
implies that they kind of *have* to do so, which is not correct.  I
would recommend to be a bit more generic and just use "and add
necessary recovery configuration".


Agreed, I think we should never tell people to "add recovery options to 
postgresql.auto.conf", because they should never do that manually. If we 
want to suggest people use postgresql.auto.conf, surely they should be 
using ALTER SYSTEM SET, which of course doesn't work in this case, since 
postgresql isn't running yet.


So yeah either that, or say "add to postgresql.conf" without the auto?


I went with Michael's suggestion.  Attached is a new patch.

I also think we should set a flag and throw the error below this if/else 
block.  This is a rather large message and maintaining two copies of it 
is not ideal.


Please note that there have been objections to the patch later in this 
thread by Peter and Robert.  I'm not very interested in watering down 
the documentation changes as Peter suggests, but I think at the very 
least we should commit the added hints in the error message.  For many 
users this error will be their first point of contact with the 
backup_label issue/behavior.


Regards,
--
-David
da...@pgmasters.net
From 87a53d0dc0e2118c553110813209eba55174bb1e Mon Sep 17 00:00:00 2001
From: David Steele 
Date: Wed, 20 Mar 2019 16:16:37 +0400
Subject: [PATCH] Add exclusive backup deprecation notes to documentation.

Update the exclusive backup documentation to explain the limitations of
the exclusive backup mode and make it clear that the feature is
deprecated.

Update the log message when the backup_label is found but recovery
cannot proceed.
---
 doc/src/sgml/backup.sgml  | 46 +++
 src/backend/access/transam/xlog.c | 10 +--
 2 files changed, 54 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/backup.sgml b/doc/src/sgml/backup.sgml
index a73fd4d044..ab5c81d382 100644
--- a/doc/src/sgml/backup.sgml
+++ b/doc/src/sgml/backup.sgml
@@ -948,6 +948,17 @@ SELECT * FROM pg_stop_backup(false, true);


 Making an exclusive low level backup
+
+
+ 
+  The exclusive backup method is deprecated and should be avoided in favor
+  of the non-exclusive backup method, e.g.
+  pg_basebackup.  This method is deprecated
+  because it has serious risks which are enumerated at the end of this
+  section.
+ 
+
+
 
  The process for an exclusive backup is mostly the same as for a
  non-exclusive one, but it differs in a few key steps. This type of backup
@@ -1054,6 +1065,41 @@ SELECT pg_stop_backup();

   
 
+
+
+ 
+  The primary issue with the exclusive method is that the
+  backup_label file is written into the data directory
+  when pg_start_backup is called and remains until
+  pg_stop_backup is called.  If
+  PostgreSQL or the host terminates abnormally,
+  then backup_label will be left in the data directory
+  and PostgreSQL probably will not start. A log
+  message recommends that backup_label be removed if
+  not restoring from a backup.
+ 
+
+ 
+  However, if backup_label is present because a 
restore
+  is actually in progress, then removing it will result in corruption.  For
+  this reason it is not recommended to automate the removal of
+  backup_label.
+ 
+
+ 
+  Another issue with exclusive backup mode is that it will continue until
+  pg_stop_backup is called, even if the calling 
process
+  is no longer performing the backup.  The next time
+  pg_start_backup is called it will fail unless
+  pg_stop_backup is called manually first.
+ 
+
+ 
+  Finally, only one exclusive backup may be run at a time.  However, it is
+  possible to run an exclusive backup at the same time as any number of
+  non-exclusive backups.
+ 
+


Backing up the data directory
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index ad12ebc426..3bb1e406e1 100644
--- 

Re: Removing unneeded self joins

2019-03-20 Thread Alexander Kuzmenkov

On 3/14/19 14:21, David Rowley wrote:


What do you think?



Let's recap the conditions when we can remove a self-join. It is when 
for each outer row, 1) at most one inner row matches the join clauses, 
and 2) it is the same row as the outer one. I'm not sure what (2) means 
precisely in a general case, but for a plain table, we can identify 
these rows by ctid. So when both sides have the same unique index with 
the same clauses, we conclude that we are always dealing with the same 
row (as identified by ctid) on both sides, hence the join can be 
replaced with a scan.
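
A minimal example of the case this targets (table name hypothetical):

    CREATE TABLE t1 (a int PRIMARY KEY, b int);
    -- Both sides are constrained by the same unique index on a, so each
    -- outer row matches exactly one inner row: itself (same ctid).  The
    -- join can be replaced by a plain scan of t1:
    SELECT x.a, y.b FROM t1 x JOIN t1 y ON x.a = y.a;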


The code I wrote just checks for the above conditions. The data we need 
for these checks is a byproduct of checking the relations for 
uniqueness, which we do anyway, so we just cache it for a negligible cost.


I didn't write it in a more generic way because I don't understand the 
conditions for the generic case. In your DISTINCT example, the join can 
indeed be removed. But if we select some columns from the inner side apart 
from the join ones, we can't remove the join anymore:


select * from t1, (select distinct on (a) a, b from t1) tt where t1.a = 
tt.a;


I think this might be a different kind of optimization, where we remove 
the self-join if the inner side is unique, and no inner columns are 
selected besides the join ones.



Also, reading your letter I realized that I don't commute the index 
clauses correctly before comparing them in is_unique_self_join, so I 
fixed this in the new version of the patch.



--
Alexander Kuzmenkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

>From d3bb27a26919441df1c31645ff39655c124e5ae6 Mon Sep 17 00:00:00 2001
From: Alexander Kuzmenkov 
Date: Wed, 20 Mar 2019 15:15:33 +0300
Subject: [PATCH v12] Remove unique self joins.

---
 src/backend/nodes/outfuncs.c  |  19 +-
 src/backend/optimizer/path/indxpath.c |  28 +-
 src/backend/optimizer/path/joinpath.c |   6 +-
 src/backend/optimizer/plan/analyzejoins.c | 965 +++---
 src/backend/optimizer/plan/planmain.c |   7 +-
 src/backend/optimizer/util/pathnode.c |   3 +-
 src/backend/optimizer/util/relnode.c  |  26 +-
 src/include/nodes/nodes.h |   1 +
 src/include/nodes/pathnodes.h |  22 +-
 src/include/optimizer/pathnode.h  |   5 +
 src/include/optimizer/paths.h |   4 +-
 src/include/optimizer/planmain.h  |  10 +-
 src/test/regress/expected/equivclass.out  |  32 +
 src/test/regress/expected/join.out| 219 ++-
 src/test/regress/sql/equivclass.sql   |  17 +
 src/test/regress/sql/join.sql |  94 +++
 16 files changed, 1359 insertions(+), 99 deletions(-)

diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c
index 69179a0..7ed1f47 100644
--- a/src/backend/nodes/outfuncs.c
+++ b/src/backend/nodes/outfuncs.c
@@ -2263,7 +2263,8 @@ _outRelOptInfo(StringInfo str, const RelOptInfo *node)
 	WRITE_OID_FIELD(userid);
 	WRITE_BOOL_FIELD(useridiscurrent);
 	/* we don't try to print fdwroutine or fdw_private */
-	/* can't print unique_for_rels/non_unique_for_rels; BMSes aren't Nodes */
+	WRITE_NODE_FIELD(unique_for_rels);
+	/* can't print non_unique_for_rels; BMSes aren't Nodes */
 	WRITE_NODE_FIELD(baserestrictinfo);
 	WRITE_UINT_FIELD(baserestrict_min_security);
 	WRITE_NODE_FIELD(joininfo);
@@ -2336,6 +2337,19 @@ _outStatisticExtInfo(StringInfo str, const StatisticExtInfo *node)
 }
 
 static void
+_outUniqueRelInfo(StringInfo str, const UniqueRelInfo *node)
+{
+	WRITE_NODE_TYPE("UNIQUERELINFO");
+
+	WRITE_BITMAPSET_FIELD(outerrelids);
+	if (node->index)
+	{
+		WRITE_OID_FIELD(index->indexoid);
+		WRITE_NODE_FIELD(column_values);
+	}
+}
+
+static void
 _outEquivalenceClass(StringInfo str, const EquivalenceClass *node)
 {
 	/*
@@ -4088,6 +4102,9 @@ outNode(StringInfo str, const void *obj)
 			case T_StatisticExtInfo:
 _outStatisticExtInfo(str, obj);
 break;
+			case T_UniqueRelInfo:
+_outUniqueRelInfo(str, obj);
+break;
 			case T_ExtensibleNode:
 _outExtensibleNode(str, obj);
 break;
diff --git a/src/backend/optimizer/path/indxpath.c b/src/backend/optimizer/path/indxpath.c
index 3434219..88e61d4 100644
--- a/src/backend/optimizer/path/indxpath.c
+++ b/src/backend/optimizer/path/indxpath.c
@@ -3583,7 +3583,8 @@ ec_member_matches_indexcol(PlannerInfo *root, RelOptInfo *rel,
  * relation_has_unique_index_for
  *	  Determine whether the relation provably has at most one row satisfying
  *	  a set of equality conditions, because the conditions constrain all
- *	  columns of some unique index.
+ *	  columns of some unique index. If index_info is not null, it is set to
+ *	  point to a new UniqueRelInfo containing the index and conditions.
  *
  * The conditions can be represented in either or both of two ways:
  * 1. A list of RestrictInfo nodes, where the caller has already determined
@@ -3604,12 +3605,16 @@ ec_member_matches_indexcol(PlannerInfo *root, 

Re: Should we add GUCs to allow partition pruning to be disabled?

2019-03-20 Thread David Rowley
On Thu, 14 Mar 2019 at 02:10, Robert Haas  wrote:
>
> On Tue, Mar 12, 2019 at 7:28 PM David Rowley
>  wrote:
> > I think I've done that in the attached patch.
>
> Cool, thanks.

Just so I don't forget about this, I've added it to the July 'fest.

https://commitfest.postgresql.org/23/2065/

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: BUG #15572: Misleading message reported by "Drop function operation" on DB with functions having same name

2019-03-20 Thread David Rowley
Thanks for reviewing this.

On Wed, 20 Mar 2019 at 04:31, Pavel Stehule  wrote:
> I propose commenting more strongly on the fact that noError is applied only
> to the "not found" event. In other cases, this flag is ignored and an error
> is raised immediately there. I think it is not commented well enough why.
> This is a significant change - in previous releases, noError was used like
> a real noError - so it should be commented more.

I've made a change to the comments in LookupFuncWithArgs() to make
this more clear. I also ended up renaming noError to missing_ok.
Hopefully this backs up the comments and reduces the chances of
surprises.

> Regress tests are enough.
> The patch is possible to apply without problems and compile without warnings

Thanks. I also fixed a bug that caused an Assert failure when
performing DROP ROUTINE ambiguous_name; a test was added for that case too.

> The new status of this patch is: Ready for Committer

Great!

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


drop_func_if_not_exists_fix_v9.patch
Description: Binary data


Re: Re: Psql patch to show access methods info

2019-03-20 Thread David Steele

Hi Sergey,

On 3/8/19 8:52 AM, Kyotaro HORIGUCHI wrote:


At Mon, 10 Dec 2018 19:38:39 +0300, s.cherkas...@postgrespro.ru wrote in 
<70e94e339dd0fa2be5d3eebec68da...@postgrespro.ru>

Here are some fixes. But I'm not sure that the renaming of columns for
the '\dAp' command is sufficiently laconic and informative. If you
have any suggestions on how to improve them, I will be very grateful.


\dA:

   This is showing almost nothing. I think it's better that this
   command show the same content as \dA+.  As per Nikita's comment
   upthread, adding "Table" alongside "Index" is needed.

\dAp:

   As a result, \dAp becomes useless. It cannot handle both Index
   and Table AMs at once.

   So, I propose the following behavior instead. It is similar to
   what \d does.

=# \dA
 List of access methods
   Name  | Type  |       Handler
 --------+-------+----------------------
  brin   | Index | brinhandler
    ..
  heap   | Table | heap_tableam_handler


=# \dA+
   Name  | Type  |       Handler        |               Description
 --------+-------+----------------------+-----------------------------------------
  brin   | Index | brinhandler          | block range index (BRIN) access method
    ..
  heap   | Table | heap_tableam_handler | heap table access method


=# \dA brin
 Index access method "brin"
   Name  | Ordering | Unique | Multicol key | Non-key cols | Excl Constraints
 --------+----------+--------+--------------+--------------+------------------
  brin   | No       | Yes    | No           | No           | No

\dA heap
 Table access method "heap"
(I don't have an idea what to show here..)



\dAfo: I don't get the point of the command.

\dAoc: This seems more useful than \dAfo, but the information that
the command shows seems a bit pointless. We sometimes want to
know the names of the operator classes usable in a CREATE INDEX. So I
suppose that something like the following might be useful
instead.

SELECT DISTINCT a.amname AS "Acess method",
(case when o.opckeytype <> 0 then o.opckeytype else o.opcintype end)::regtype AS 
"Key type",
n.nspname || '.' || o.opcname AS "Operator class",
(case when o.opcdefault then 'Yes' else 'No' end) AS "Default for type?"
FROM pg_catalog.pg_opclass o
JOIN pg_catalog.pg_opfamily f ON (f.oid = o.opcfamily)
JOIN pg_catalog.pg_am a ON (a.oid = f.opfmethod)
JOIN pg_catalog.pg_namespace n ON (n.oid = o.opcnamespace)
ORDER BY 1, 2, 4 desc, 3;

\dAoc
 List of operator classes for access methods
  Access method | Key type |       Operator class        | Default for type?
 ---------------+----------+-----------------------------+-------------------
  brin          | bytea    | pg_catalog.bytea_minmax_ops | Yes
  brin          | "char"   | pg_catalog.char_minmax_ops  | Yes
  brin          | name     | pg_catalog.name_minmax_ops  | Yes
  brin          | bigint   | pg_catalog.int8_minmax_ops  | Yes
..


\dAoc btree
 List of operator classes for access method 'btree'
  Access method | Key type |       Operator class        | Default for type?
 ---------------+----------+-----------------------------+-------------------
  btree         | boolean  | pg_catalog.bool_ops         | Yes
 ...
  btree         | text     | pg_catalog.text_ops         | Yes
  btree         | text     | pg_catalog.text_pattern_ops | No
  btree         | text     | pg_catalog.varchar_ops      | No

\dAoc btree text
List of operator classes for access method 'btree', type 'text'

 List of operator classes for access method 'btree'
  Access method | Key type |         Operator class         | Default for type?
 ---------------+----------+--------------------------------+-------------------
  btree         | text     | pg_catalog.text_ops            | Yes
  btree         | text     | pg_catalog.text_pattern_ops    | No
  btree         | text     | pg_catalog.varchar_ops         | No
  btree         | text     | pg_catalog.varchar_pattern_ops | No

I'm not sure it's useful, but \dAoc+ may print owner.



0002 no longer applies.

\dip: It works, but you are catching 'd[tvmi]' for 'dip' and 'dicp'.

\dip shows the following rseult.

   Index properties
  Schema |   Name    | Access method | Clusterable | Index scan | Bitmap scan | Backward scan
 --------+-----------+---------------+-------------+------------+-------------+---------------
  public | x_a_idx   | btree         | t           | t          | t           | t
  public | tt_a_idx  | brin          | f           | f          | t           | f
  public | tt_a_idx1 | brin          | f           | f          | t           | f


The columns after "Access method" don't seem informative for
users since they are fixed properties of an access method, and
they don't make a difference in what users can do.  "Clusterable"
seems useful to a certain extent, but it doesn't fit here. Instead
\d  seems to me to be the place. (It could be shown also
in \di+, but that 

Re: PostgreSQL pollutes the file system

2019-03-20 Thread Chris Travers
On Wed, Mar 20, 2019 at 11:06 AM Andreas Karlsson  wrote:

> On 3/19/19 11:19 AM, Fred .Flintstone wrote:
> > PostgreSQL pollutes the file system with lots of binaries that it is
> > not obvious that they belong to PostgreSQL.
> >
> > Such as "/usr/bin/createdb", etc.
> >
> > It would be better if these files were renamed to be prefixed with
> > pg_, such as pg_createdb.
> > Or even better postgresql-createdb then be reachable by through a
> > "postgresql" wrapper script.
>
> Hi,
>
> This topic has been discussed before e.g. in 2008 in
> https://www.postgresql.org/message-id/47EA5CC0.8040102%40sun.com and
> also more recently but I cannot find it in the archives right now.
>
> I am personally in favor of renaming e.g. createdb to pg_createdb, since
> it is not obvious that createdb belongs to PostgreSQL when reading a
> script or looking in /usr/bin, but we would need a some kind of
> deprecation cycle here or we would suddenly break tons of people's scripts


I wouldn't be opposed to this, but I would note two points on a deprecation
cycle:
1. Given that people may have tools that work with all supported versions
of PostgreSQL, this needs to be a long cycle, and
2. Managing that cycle makes it a little bit of a tough sell.

> .
>
> And as for the git-like solution with a wrapper script, that seems to be
> the modern way to do things but would be an even larger breakage and I
> am not convinced the advantage would be worth it especially since our
> executables are not as closely related and consistent as for example git's.
>

Git commands may be related, but I would actually argue that git commands
have a lot of inconsistency because of this structure.

See, for example, http://stevelosh.com/blog/2013/04/git-koans/


>
> Andreas
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Re: A separate table level option to control compression

2019-03-20 Thread David Steele

Hi Pavan,

On 3/12/19 4:38 PM, Andrew Dunstan wrote:

On 3/11/19 2:23 AM, Masahiko Sawada wrote:



I like this idea.

The patch seems to need update the part describing on-disk toast
storage in storage.sgml.


Yeah. Meanwhile, here's a rebased version of the patch to keep the cfbot
happy.


Looks like you need to update the documentation in storage.sgml?

Regards,
--
-David
da...@pgmasters.net



Re: Re: query logging of prepared statements

2019-03-20 Thread David Steele

Hi Justin,

On 3/5/19 2:30 PM, Arthur Zakirov wrote:

On 04.03.2019 21:31, Justin Pryzby wrote:

It wasn't intentional.  Find attached a v3 patch which handles that case
by removing the 2nd call to errdetail_execute(); since it's otherwise
unused, that function is removed entirely.


Thank you.

Thanks for reviewing.  I'm also interested in discussion about whether this
change is undesirable for someone else for some reason?  For me, the existing
output seems duplicative and "denormalized".  :)


I perfectly understand your use case. I agree, it is duplicated. But I 
think some people may want to see it at every EXECUTE, if they don't 
want to grep for the prepared statement body which was logged earlier.


I think it would be good to provide a way to configure this behavior. 
The first version of your patch relied on the log_error_verbosity GUC. 
I'm not sure that this variable is suitable for configuring the visibility 
of the prepared statement body in logs, because it sets more general 
behavior. Maybe it would be better to introduce some new GUC variable if 
the community doesn't mind.

Thoughts on this?

Regards,
--
-David
da...@pgmasters.net



Re: Re: Planning counters in pg_stat_statements (using pgss_store)

2019-03-20 Thread David Steele

Hi PAscal,

On 2/15/19 11:32 AM, Sergei Kornilov wrote:

Hi


  +#define PG_STAT_STATEMENTS_COLS_V1_4 25


I thought it was needed when adding new columns, isn't it?


Yes, this is needed. I mean it should be PG_STAT_STATEMENTS_COLS_V1_8, because 
this change is being made for pg_stat_statements version 1.8. The same applies 
to other version-specific places.


This patch has been waiting for an update for over a month.  Do you know 
when you will have one ready?  Should we move the release target to PG13?


Regards,
--
-David
da...@pgmasters.net



Re: selecting from partitions and constraint exclusion

2019-03-20 Thread David Rowley
On Wed, 20 Mar 2019 at 17:37, Amit Langote
 wrote:
> That's because get_relation_constraints() no longer (as of PG 11) includes
> the partition constraint for SELECT queries.  But that's based on an
> assumption that partitions are always accessed via parent, so partition
> pruning would make loading the partition constraint unnecessary.  That's
> not always true, as shown in the above example.
>
> Should we fix that?  I'm attaching a patch here.

Perhaps we should. The constraint_exclusion documents [1] just mention:

> Controls the query planner's use of table constraints to optimize queries.

and I'm pretty sure you could class the partition constraint as a
table constraint.
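
For reference, the case being fixed is direct access to a partition, along
the lines of (names hypothetical):

    CREATE TABLE p (a int) PARTITION BY RANGE (a);
    CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (1) TO (10);
    -- p1 is accessed directly, not via the parent, so no partition pruning
    -- happens; with the patch, constraint exclusion can use the partition
    -- constraint (1 <= a < 10) to prove the scan is unnecessary:
    EXPLAIN SELECT * FROM p1 WHERE a >= 100;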

As for the patch:

+ if ((root->parse->commandType == CMD_SELECT && !IS_OTHER_REL(rel)) ||

Shouldn't this really be checking rel->reloptkind == RELOPT_BASEREL
instead of !IS_OTHER_REL(rel) ?

For the comments:

+ * For selects, we only need those if the partition is directly mentioned
+ * in the query, that is not via parent.  In case of the latter, partition
+ * pruning, which uses the parent table's partition bound descriptor,
+ * ensures that we only consider partitions whose partition constraint
+ * satisfy the query quals (or, the two don't contradict each other), so
+ * loading them is pointless.
+ *
+ * For updates and deletes, we always need those for performing partition
+ * pruning using constraint exclusion, but, only if pruning is enabled.

You mention "the latter", normally you'd only do that if there was a
former, but in this case there's not.

How about just making it:

/*
 * Append partition predicates, if any.
 *
 * For selects, partition pruning uses the parent table's partition bound
 * descriptor, so there's no need to include the partition constraint for
 * this case.  However, if the partition is referenced directly in the query
 * then no partition pruning will occur, so we'll include it in that case.
 */
if ((root->parse->commandType != CMD_SELECT && enable_partition_pruning) ||
(root->parse->commandType == CMD_SELECT && rel->reloptkind ==
RELOPT_BASEREL))

For the tests, it seems excessive to create some new tables for this.
Won't the tables in the previous test work just fine?

[1] 
https://www.postgresql.org/docs/devel/runtime-config-query.html#GUC-CONSTRAINT-EXCLUSION

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Re: [RFC] [PATCH] Flexible "partition pruning" hook

2019-03-20 Thread David Steele

Hi Peter,

On 2/28/19 10:36 PM, Mike Palmiotto wrote:

On Wed, Feb 27, 2019 at 12:36 PM Peter Eisentraut
 wrote:


To rephrase this: You have a partitioned table, and you have a RLS
policy that hides certain rows, and you know based on your business
logic that under certain circumstances entire partitions will be hidden,
so they don't need to be scanned.  So you want a planner hook that would
allow you to prune those partitions manually.

That sounds pretty hackish to me.  We should give the planner and
executor the appropriate information to make these decisions, like an
additional partition constraint.


Are you thinking of a partition pruning step for FuncExpr's or
something else? I was considering an implementation where FuncExpr's
were marked for execution-time pruning, but wanted to see if this
patch got any traction first.


If this information is hidden in
user-defined functions in a way that cannot be reasoned about, who is
enforcing these constraints and how do we know they are actually correct?


The author of the extension which utilizes the hook would have to be
sure they use the hook correctly. This is no different in concept
from any other existing hook. This hook in particular would be
used by security extensions that have some understanding of the
underlying security model being implemented by RLS.


Thoughts on Mike's response?

Regards,
--
-David
da...@pgmasters.net



Re: Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-03-20 Thread David Steele

Hi Pavan,

On 3/14/19 2:20 PM, Masahiko Sawada wrote:

On Thu, Mar 14, 2019 at 5:17 PM Pavan Deolasee  wrote:


Ok. I will run some tests. But please note that this patch is a bug fix to 
address the performance issue that is caused by having to rewrite the entire 
table when the all-visible bit is set on each page during the first vacuum. So 
while we may do some more work during COPY FREEZE, we're saving a lot of page 
writes during the next vacuum. Also, since the scan that we are doing in this 
patch is done on a page that should be in the buffer cache, we will pay a bit 
in terms of CPU cost, but not anything in terms of IO cost.


Agreed. I had been misunderstanding this patch. The page scan during
COPY FREEZE is necessary, and it's much cheaper than doing it in the first
vacuum.


I have removed Ibrar as a reviewer since there has been no review from 
them in three weeks, and to encourage others to have a look.


Regards,
--
-David
da...@pgmasters.net



Re: Re: Reporting script runtimes in pg_regress

2019-03-20 Thread David Steele

Hi Christophe,

On 3/8/19 5:12 PM, Alvaro Herrera wrote:

On 2019-Mar-08, Christoph Berg wrote:


Re: Peter Eisentraut 2019-03-08 
<3eb194cf-b878-1f63-8623-6d6add0ed...@2ndquadrant.com>

On 2019-02-21 10:37, Christoph Berg wrote:

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index a18a6f6c45..8080626e94 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -1794,12 +1794,14 @@ run_schedule(const char *schedule, test_function tfunc)
else
{
status(_("FAILED"));
+   status("  "); /* align with 
failed (ignored) */
fail_count++;
}


So an issue here is that in theory "FAILED" etc. are marked for
translation but your spacers do not take that into account.  Personally,
I have no ambition to translate pg_regress, so we could remove all that.
  But it should be done consistently in either case.


Oh, right. So the way to go would be to use _("FAILED   "), and
ask translators to use the same length.


Note there's no translation for pg_regress.  All these _() markers are
currently dead code.  It seems hard to become motivated to translate
that kind of program.  I don't think it has much value, myself.


This patch has been "Waiting on Author" since March 8th.  Do you know 
when you'll have a new version ready?


Regards,
--
-David
da...@pgmasters.net



Re: PostgreSQL pollutes the file system

2019-03-20 Thread Andreas Karlsson

On 3/19/19 11:19 AM, Fred .Flintstone wrote:

PostgreSQL pollutes the file system with lots of binaries that it is
not obvious that they belong to PostgreSQL.

Such as "/usr/bin/createdb", etc.

It would be better if these files were renamed to be prefixed with
pg_, such as pg_createdb.
Or even better postgresql-createdb then be reachable by through a
"postgresql" wrapper script.


Hi,

This topic has been discussed before e.g. in 2008 in 
https://www.postgresql.org/message-id/47EA5CC0.8040102%40sun.com and 
also more recently but I cannot find it in the archives right now.


I am personally in favor of renaming e.g. createdb to pg_createdb, since 
it is not obvious that createdb belongs to PostgreSQL when reading a 
script or looking in /usr/bin, but we would need a some kind of 
deprecation cycle here or we would suddenly break tons of people's scripts.


And as for the git-like solution with a wrapper script, that seems to be 
the modern way to do things but would be an even larger breakage and I 
am not convinced the advantage would be worth it especially since our 
executables are not as closely related and consistent as for example git's.


Andreas



RE: speeding up planning with partitions

2019-03-20 Thread Imai, Yoshikazu
Amit-san,

On Wed, Mar 20, 2019 at 9:07 AM, Amit Langote wrote:
> On 2019/03/20 17:36, Imai, Yoshikazu wrote:
> > On Wed, Mar 20, 2019 at 8:21 AM, Amit Langote wrote:
> >> On 2019/03/20 12:15, Imai, Yoshikazu wrote:
> >>> [select1024.sql]
> >>> \set a random (1, 1024)
> >>> select * from rt where a = :a;
> >>>
> >>> [pgbench]
> >>> pgbench -n -f select1024.sql -T 60
> >>
> >> Thank you.
> >>
> >> Could you please try running pgbench for a bit longer than 60
> seconds?
> >
> > I ran pgbench for 180 seconds but there is still a difference.
> 
> Thank you very much.
> 
> > 1024: 7,004 TPS
> > 8192: 5,859 TPS
> >
> >
> > I also tested for another number of partitions by running pgbench for
> 60 seconds.
> >
> > num of part  TPS
> > -----------  -----
> > 128  7,579
> > 256  7,528
> > 512  7,512
> > 1024 7,257 (7274, 7246, 7252)
> > 2048 6,718 (6627, 6780, 6747)
> > 4096 6,472 (6434, 6565, 6416) (quoted from above (3)'s results)
> > 8192 6,008 (6018, 5999, 6007)
> >
> >
> > I checked whether there is any processing that iterates over the number
> > of partitions, but I couldn't find any. I'm really wondering why this
> > degradation happens.
> 
> Indeed, it's quite puzzling why.  Will look into this.

I don't know whether it is useful, but I noticed that the usage of get_tabstat_entry 
increases when many partitions are scanned.

--
Yoshikazu Imai 



Re: Special role for subscriptions

2019-03-20 Thread Evgeniy Efimkin


Hi!
> Currently, user with pg_subscription_users can create subscription into any 
> system table, can't they?
> We certainly need to change it to more secure way.
No, you can't add system tables to a publication. In the new patch I add privilege 
checks on the target table: a non-superuser can't create/refresh a subscription if 
they don't have INSERT, UPDATE, DELETE and TRUNCATE privileges.
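
A minimal sketch of what this means in practice (role and table names
hypothetical):

    GRANT pg_subscription_users TO repl_admin;
    GRANT INSERT, UPDATE, DELETE, TRUNCATE ON mytable TO repl_admin;
    -- repl_admin can now create/refresh a subscription targeting mytable.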


 
Efimkin Evgeny
diff --git a/doc/src/sgml/logical-replication.sgml b/doc/src/sgml/logical-replication.sgml
index 3f2f674a1a..c2c7241084 100644
--- a/doc/src/sgml/logical-replication.sgml
+++ b/doc/src/sgml/logical-replication.sgml
@@ -201,9 +201,10 @@
 
   
Subscriptions are dumped by pg_dump if the current user
-   is a superuser.  Otherwise a warning is written and subscriptions are
-   skipped, because non-superusers cannot read all subscription information
-   from the pg_subscription catalog.
+   is a superuser or a member of the role pg_subscription_users; other users should
+   use --no-subscriptions because non-superusers cannot read
+   all subscription information from the pg_subscription
+   catalog.
   
 
   
@@ -522,7 +523,10 @@
   
 
   
-   To create a subscription, the user must be a superuser.
+   To create a subscription, the user must be a superuser or a member of the
+   role pg_subscription_users and have INSERT,
+   UPDATE, DELETE, and TRUNCATE
+   privileges on the target table.
   
 
   
diff --git a/doc/src/sgml/user-manag.sgml b/doc/src/sgml/user-manag.sgml
index 6106244d32..9c23a6280d 100644
--- a/doc/src/sgml/user-manag.sgml
+++ b/doc/src/sgml/user-manag.sgml
@@ -556,6 +556,10 @@ DROP ROLE doomed_role;
pg_read_all_stats and
pg_stat_scan_tables.
   
+  
+   pg_subscription_users
+   Create and drop subscriptions, own subscriptions, and read pg_subscription
+  
  
 

@@ -579,6 +583,12 @@ DROP ROLE doomed_role;
   other system information normally restricted to superusers.
   
 
+  
+  The pg_subscription_users role is intended for trusted,
+  non-superuser administrators who need to create and drop subscriptions.
+  
+
+
   
   Care should be taken when granting these roles to ensure they are only used where
   needed and with the understanding that these roles grant access to privileged
diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c
index afee2838cc..5483a65376 100644
--- a/src/backend/catalog/pg_subscription.c
+++ b/src/backend/catalog/pg_subscription.c
@@ -242,12 +242,16 @@ AddSubscriptionRelState(Oid subid, Oid relid, char state,
 		XLogRecPtr sublsn)
 {
 	Relation	rel;
+	Relation	target_rel;
 	HeapTuple	tup;
 	bool		nulls[Natts_pg_subscription_rel];
 	Datum		values[Natts_pg_subscription_rel];
+	AclResult	aclresult;
+	AclMode		aclmask;
 
 	LockSharedObject(SubscriptionRelationId, subid, 0, AccessShareLock);
 
+
 	rel = table_open(SubscriptionRelRelationId, RowExclusiveLock);
 
 	/* Try finding existing mapping. */
@@ -258,6 +262,14 @@ AddSubscriptionRelState(Oid subid, Oid relid, char state,
 		elog(ERROR, "subscription table %u in subscription %u already exists",
 			 relid, subid);
 
+	/* Check permission on target table. */
+	aclmask = ACL_INSERT | ACL_UPDATE | ACL_DELETE | ACL_TRUNCATE;
+	target_rel = table_open(relid, NoLock);
+	aclresult = pg_class_aclcheck(RelationGetRelid(target_rel), GetUserId(), aclmask);
+	if (aclresult != ACLCHECK_OK)
+		aclcheck_error(aclresult, get_relkind_objtype(target_rel->rd_rel->relkind),
+			RelationGetRelationName(target_rel));
+
 	/* Form the tuple. */
 	memset(values, 0, sizeof(values));
 	memset(nulls, false, sizeof(nulls));
@@ -278,6 +290,7 @@ AddSubscriptionRelState(Oid subid, Oid relid, char state,
 
 	/* Cleanup. */
 	table_close(rel, NoLock);
+	table_close(target_rel, NoLock);
 }
 
 /*
diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index d962648bc5..08e58295bd 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -939,9 +939,12 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 
 -- All columns of pg_subscription except subconninfo are readable.
 REVOKE ALL ON pg_subscription FROM public;
-GRANT SELECT (subdbid, subname, subowner, subenabled, subslotname, subpublications)
+GRANT SELECT (tableoid, oid, subdbid, subname, subowner, subenabled,
+	subslotname, subsynccommit, subpublications)
 ON pg_subscription TO public;
 
+GRANT SELECT (subconninfo)
+ON pg_subscription TO pg_subscription_users;
 
 --
 -- We have a few function definitions in here, too.
diff --git a/src/backend/commands/subscriptioncmds.c b/src/backend/commands/subscriptioncmds.c
index a60a15193a..e1555ed6eb 100644
--- a/src/backend/commands/subscriptioncmds.c
+++ b/src/backend/commands/subscriptioncmds.c
@@ -26,6 +26,7 @@
 #include "catalog/namespace.h"
 #include "catalog/objectaccess.h"
 #include "catalog/objectaddress.h"
+#include "catalog/pg_authid.h"
 #include 

Re: Offline enabling/disabling of data checksums

2019-03-20 Thread Fabien COELHO


Michaël-san,


In short, you keep the main feature with:
- No tweaks with postmaster.pid.
- Rely just on the control file indicating an instance shutdown
cleanly.
- No tweaks with the system ID.
- No renaming of the control file.


Hmmm… so nothing :-)

I think that this feature is useful, as a complement to a possible 
online-enabling, server-managed version.


About patch v8 part 1: applies cleanly, compiles, global & local make 
check ok, doc build ok.


I think that the documentation should carry a clear warning not to run any 
cluster command in parallel, on pain of possible cluster corruption, along 
with possible other caveats about replication.


I also think that some mechanism should be put in place to prevent such an 
occurrence, whether in this patch or another.


About "If enabling or disabling checksums, the exit status is nonzero if 
the operation failed.": I must admit that enabling/disabling an already 
enabled/disabled cluster is still not really a failure to me, because the 
end state is that the cluster is in the state the user requested.


I still think that the control file update should be made *after* the data 
directory has been synced; otherwise the control file could be updated while 
some data files are not yet on disk. It just means exchanging two lines.



About patch v8 part 2: applies cleanly, compiles, global & local make 
check ok.


The added -N option is not documented.

If the conditional sync is moved before the file update, the file update 
should pass do_sync to update_controlfile.


--
Fabien.
