Re: SP-GiST failing to complete SP-GiST index build

2018-05-27 Thread Jonathan S. Katz

> On May 27, 2018, at 8:24 PM, Peter Geoghegan  wrote:
> 
> On Sun, May 27, 2018 at 5:10 PM, Tom Lane  wrote:
>> Instrumenting the test case suggests that getQuadrant pretty much always
>> returns 1, resulting in a worst-case unbalanced SPGiST tree.  I think this
>> is related to the fact that the test case inserts the values in increasing
>> order, so that new values are always greater than existing values in the
>> index.
> 
> I suspected the same. It reminded me of the weird behavior that the
> Postgres qsort() sometimes exhibits.
> 
>> SPGiST is unable to rebalance its tree on the fly, so it's pretty
>> well screwed in this situation.  It does finish eventually, but in about
>> 50x longer than GiST.  I imagine the index's query performance would be
>> equally awful.
> 
> Can you think of some way of side-stepping the issue? It's unfortunate
> that SP-GiST is potentially so sensitive to input order.

To help with the testing, I’ve attached two more scenarios, labeled
“good2” and “bad2” below.  The premise is similar, except that I start
with empty tables with indexes already created.

The workload in “bad2” is what you may see in the real world with
proper DBA planning (i.e. I have my indexes in place before I start
collecting data): scheduling applications, or anything with an
increasing time series.
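
For reference, the shape of “bad2” is roughly the following (a sketch
only; the table and column names are illustrative, not the actual
contents of bad2.sql):

-- The index exists before any data arrives.
CREATE TABLE bookings (booked_during tstzrange);
CREATE INDEX bookings_spgist_idx ON bookings USING spgist (booked_during);

-- Rows then arrive in increasing time-series order, so each new range
-- sorts after everything already in the index.
INSERT INTO bookings
SELECT tstzrange(t, t + interval '30 minutes')
FROM generate_series(now(),
                     now() + interval '1200000 minutes',
                     interval '1 minute') AS t;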

The timing results I found were similar to the initial example posted, with
me giving up on the last scenario (I do not have the same patience as
Peter).

FWIW I have used SP-GiST indexes before with datasets similar to how
“bad2” is generated (though not nearly as dramatic as the upward increase
seen in the range) and have not run across this issue.

Jonathan




bad2.sql
Description: Binary data


good2.sql
Description: Binary data


Re: Undo logs

2018-05-27 Thread James Sewell
Exciting stuff! Really looking forward to having a play with this.

James Sewell,
*Chief Architect*



Suite 112, Jones Bay Wharf, 26-32 Pirrama Road, Pyrmont NSW 2009
*P* (+61) 2 8099 9000  *W* www.jirotech.com  *F* (+61) 2 8099 9099

On 25 May 2018 at 08:22, Thomas Munro  wrote:

> Hello hackers,
>
> As announced elsewhere[1][2][3], at EnterpriseDB we are working on a
> proposal to add in-place updates with undo logs to PostgreSQL.  The
> goal is to improve performance and resource usage by recycling space
> better.
>
> The lowest level piece of this work is a physical undo log manager,
> which I've personally been working on.  Later patches will build on
> top, adding record-oriented access and then the main "zheap" access
> manager and related infrastructure.  My colleagues will write about
> those.
>
> The README files[4][5] explain in more detail, but here is a
> bullet-point description of what the attached patch set gives you:
>
> 1.  Efficient appending of new undo data from many concurrent
> backends.  Like logs.
> 2.  Efficient discarding of old undo data that isn't needed anymore.
> Like queues.
> 3.  Efficient buffered random reading of undo data.  Like relations.
>
> A test module is provided that can be used to exercise the undo log
> code paths without needing any of the later zheap patches.
>
> This is work in progress.  A few aspects are under active development
> and liable to change, as indicated by comments, and there are no doubt
> bugs and room for improvement.  The code is also available at
> github.com/EnterpriseDB/zheap (these patches are from the
> undo-log-storage branch, see also the master branch which has the full
> zheap feature).  We'd be grateful for any questions, feedback or
> ideas.
>
> [1] https://amitkapila16.blogspot.com/2018/03/zheap-storage-engine-to-provide-better.html
> [2] https://rhaas.blogspot.com/2018/01/do-or-undo-there-is-no-vacuum.html
> [3] https://www.pgcon.org/2018/schedule/events/1190.en.html
> [4] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/access/undo
> [5] https://github.com/EnterpriseDB/zheap/tree/undo-log-storage/src/backend/storage/smgr
>
> --
> Thomas Munro
> http://www.enterprisedb.com
>



Re: Redesigning the executor (async, JIT, memory efficiency)

2018-05-27 Thread Kyotaro HORIGUCHI
Thanks! I've been waiting for this.

At Thu, 24 May 2018 20:35:39 -0700, Andres Freund  wrote in 
<20180525033538.6ypfwcqcxce6z...@alap3.anarazel.de>
> The current executor structure limits us in a number of ways that are
> becoming problematic.  I'll outline in this email how I think we should
> rework it. Including a prototype for the first, most drastic, steps.
> 
> The primary problems with the current design that I want to address are:
> 
> 1) Executor structure makes it hard to do asynchronous execution.  That
>comes into play in several cases. For example we want to be able to
>continue execution if no tuples are ready in one FDW below an append
>node, instead switch to another one. Similarly that could be relevant
>for parallel queries, with multiple gather nodes.
> 
>A bit further out that'd also be very useful for continuing execution
>in a different part of the tree once blocked on IO. That'd e.g. be
>useful to build several hashtables for hashjoins at once.  There are
>some limitations in how we do IO for that atm, however; it'd be easier
>to efficiently implement this if we had AIO support.
> 
>With the current, tree-walking based architecture, it's hard to
>implement something like that, because we basically have to re-enter
>into the subtrees where we'd stopped. That'd require adding new
>state-machine logic in every tree, including additional branches.
> 
>What we basically need here is the ability to stop execution inside a
>specific executor node and have a multiplexer node (which can use
>WaitEventSets or such) decide which part of the execution tree to
>continue executing instead.

Yeah. In the end, async execution requires that capability.

> 2) The tree-walking based approach makes it very hard to JIT compile the
>main portion of query execution without manually re-emitting all of
>executor/, which is obviously entirely unattractive.
> 
>For the expression evaluation JIT the biggest benefit comes from
>removing the indirect branches between the expression steps and then,
>a close second, from having more efficient implementations of
>hot-path expression steps.
> 
>That approach currently isn't feasible for the whole executor. The
>main sources for indirect branches / calls are ExecProcNode() and
>ExecEvalExpr() calls inside the Exec*() nodes. To be able to make
>those constant we'd basically have to manually emit code for all of
>these (or attempt to rely on the inliner, but that doesn't work
>across deep, potentially self invoking, trees).  The expression code,
>which also used to be tree walking, avoids that now by having the
>indirect function calls nearly exclusively at the top-level.

I looked into this a bit. One of the hardest points in flattening the
executor (in my previous attempt) *was* turning the Exec* functions
inside out, especially Agg/WindowAgg. The heart of the difficulty was
that the result could be utterly unreadable if *I* were to reconstruct,
say, WindowAgg or non-hash Agg into functions that are called with
tuples from the underlying nodes. In the repo, nodeAgg is flattened
only for the hash-agg case, so it seems rather straightforward, but I'm
afraid that such complex nodes become collections of cryptic fragments
that are only expected to work when called in the sequence written out,
like a magic spell, by the compiler part. (Sorry in advance for the
maybe hard-to-read blob.)

One related concern is that in the end the "cryptic fragments" are no
longer node-private but global functions called from ExecExecProgram.
Essentially, JIT'ing is a kind of the same thing, and in the current
ExecExpr case it seems tolerably small to me. However, I'm not sure it
remains maintainable (for ordinary developers) if all such fragments
are connected via ExecExecProgram, following that magic spell.

> Secondary issues that I also want to address, or at least get ready to
> address, are:
> 
> 
> 3) Reduce the overhead of executor startup. In a number of workloads
>involving simple queries we currently spend most of the time within
>ExecInitNode().  There's two major problems here: For one a lot of
>work is done at executor initialization that should be part of
>planning (especially creating tupledescs and nodeAgg
>initialization). The other problem is that initialization does a lot
>of tiny allocations.
> 
> 
> 4) Make the executor faster, leaving JIT aside.
> 
>Moving to a "push based" executor can yield noticeable speedups: Right
>now returning a single tuple will often pass through a lot of
>indirect function calls (via ExecProcNode()).  By moving to a push
>based executor (where the first executed piece is reachable without
>any recursion), this can be significantly cheaper.  We also exercise
>the stack *a lot* right now, constantly pushing/popping stuff onto it
>that's not going to be used anytime soon.  Profiling shows that stack
>usage is quite a bottleneck right now.
> 

Re: Handling better supported channel binding types for SSL implementations

2018-05-27 Thread Michael Paquier
On Thu, Mar 08, 2018 at 02:19:55PM -0500, Peter Eisentraut wrote:
> Moved to next CF along with those other two patches.

Attached is a refreshed patch for this thread; the previous version
failed to apply after recent changes.  This is tied to patches for
other SSL implementations, so let's see about it later if necessary.
--
Michael
diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 63f37902e6..6ad5e2eedf 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -873,6 +873,7 @@ CheckSCRAMAuth(Port *port, char *shadow_pass, char **logdetail)
 	int			inputlen;
 	int			result;
 	bool		initial;
+	List	   *channel_bindings = NIL;
 
 	/*
 	 * SASL auth is not supported for protocol versions before 3, because it
@@ -898,7 +899,17 @@ CheckSCRAMAuth(Port *port, char *shadow_pass, char **logdetail)
 		strlen(SCRAM_SHA_256_NAME) + 3);
 	p = sasl_mechs;
 
-	if (port->ssl_in_use)
+#ifdef USE_SSL
+	/*
+	 * Get the list of channel binding types supported by the SSL
+	 * implementation used to determine if server should publish any
+	 * SASL mechanism supporting channel binding or not.
+	 */
+	channel_bindings = be_tls_list_channel_bindings();
+#endif
+
+	if (port->ssl_in_use &&
+		list_length(channel_bindings) > 0)
 	{
 		strcpy(p, SCRAM_SHA_256_PLUS_NAME);
 		p += strlen(SCRAM_SHA_256_PLUS_NAME) + 1;
diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 48b468f62f..cc55ce04ef 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -36,6 +36,7 @@
 #include 
 #endif
 
+#include "common/scram-common.h"
 #include "libpq/libpq.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -1184,6 +1185,18 @@ be_tls_get_certificate_hash(Port *port, size_t *len)
 #endif
 }
 
+/*
+ * Routine to get the list of channel binding types available in this SSL
+ * implementation. For OpenSSL, both tls-unique and tls-server-end-point
+ * are supported.
+ */
+List *
+be_tls_list_channel_bindings(void)
+{
+	return list_make2(pstrdup(SCRAM_CHANNEL_BINDING_TLS_UNIQUE),
+	  pstrdup(SCRAM_CHANNEL_BINDING_TLS_END_POINT));
+}
+
 /*
  * Convert an X509 subject name to a cstring.
  *
diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq/libpq-be.h
index 7698cd1f88..1a314346b8 100644
--- a/src/include/libpq/libpq-be.h
+++ b/src/include/libpq/libpq-be.h
@@ -259,6 +259,7 @@ extern bool be_tls_get_compression(Port *port);
 extern const char *be_tls_get_version(Port *port);
 extern const char *be_tls_get_cipher(Port *port);
 extern void be_tls_get_peerdn_name(Port *port, char *ptr, size_t len);
+extern List *be_tls_list_channel_bindings(void);
 
 /*
  * Get the expected TLS Finished message information from the client, useful




Re: In what range of the code can we read debug_query_string?

2018-05-27 Thread Kyotaro HORIGUCHI
Thank you for the comment.

At Fri, 25 May 2018 10:08:08 -0400, Tom Lane  wrote in 
<3389.1527257...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI  writes:
> > It is set for other kinds of message, (parse, bind, execute). I
> > think fastpath, close, flush and sync don't need that. If it is
> > reasonable to assume that we can see debug_query_string in the
> > DESCRIBE path, the attached patch would work.
> 
> I think this patch is a bad idea.

I agree on the CachedPlanGetTargetList (Describe-Portal) case.

>In the first place, debug_query_string
> is generally understood as the current *client* command string, not
> something internally generated.

True.  That is what the comment on the variable says; on the other
hand, I see that bind and execute restore the prepared statement's
string into the variable while processing.  So my core question here
is whether the same reasoning is applicable to describe_statement.
Execute returns the result of a statement, and describe returns the
input/output description for the same statement; the two differ only
in what they return for the same query.

> In the second place, it looks way too
> easy for this to leave debug_query_string as a dangling pointer,
> ie pointing at a dropped plancache entry.

exec_bind_message uses the CachedPlanSource's query string for
debug_query_string.  If the describe_statement case is problematic in
that way, isn't execute problematic in the same way?

> There might be a case for providing some way to get at the current
> actively-executing query's string, using a protocol that can deal
> with nesting of execution.  (This might already be more or less
> possible via ActivePortal->sourceText, depending on what you think
> the semantics ought to be.)  But debug_query_string was never
> intended to do that.

I'm not sure what purpose describe_statement is used for, but
exec_execute_message seems to do exactly that.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is a modern build system acceptable for older platforms

2018-05-27 Thread Yuriy Zhuravlev
>
> Can't see getting rid of those entirely. None of the github style
> platforms copes with reasonably complex discussions.


I disagree. A good example of complex discussion on GitHub is the Rust
language tracker for RFCs:
https://github.com/rust-lang/rfcs/issues
and one concrete example: https://github.com/rust-lang/rfcs/issues/2327
I have no problem with complex discussions on GitHub.
Anyway, it's much better than tons of emails in your mailbox with no
tags and no discussion status.


Re: Is a modern build system acceptable for older platforms

2018-05-27 Thread Yuriy Zhuravlev
I suppose I can summarize after reading all this:
1. Any change in the development process will be possible only if it is
personally convenient for each of the key developers. (The development
process includes the build system.)
2. Currently, almost all key developers use Unix-like systems, they have
strong old-school C experience, and the current build system is very
comfortable for them.

I think a new build system will become possible only for the following
reasons:
1. Autotools becomes completely deprecated and unsupported.
2. The key developers are replaced by people with different experience
and habits (and maybe younger).

I don't want to be a CMake advocate here, and I see some problems with
applying CMake to the Postgres project too. But I want to make Postgres
development more comfortable for people like me who don't like mailing
lists and grew up with GitHub. Unfortunately, we are too few here to
change anything now.


Re: SCRAM with channel binding downgrade attack

2018-05-27 Thread Michael Paquier
On Sat, May 26, 2018 at 11:42:38PM +0900, Michael Paquier wrote:
> On Sat, May 26, 2018 at 09:08:50AM -0400, Bruce Momjian wrote:
>> On Sat, May 26, 2018 at 08:32:20AM +0900, Michael Paquier wrote:
>>>
>>> OK, I can live with that as well.  So we'll go in the direction of two
>>> parameters then:
>>> - scram_channel_binding, which can use "prefer" (default), "require" or
>>> "disable".
>>> - scram_channel_binding_name, developer option to choose the type of
>>> channel binding, with "tls-unique" (default) and "tls-server-end-point".
>>> We could also remove the prefix "scram_".  Ideas of names are welcome.
>> 
>> scram_channel_binding_method?
> 
> Or scram_channel_binding_type.  The first sentence of RFC 5929 uses this
> term.

I just went with scram_channel_binding_mode (require, disable and
prefer) and scram_channel_binding_type as parameter names, in the shape
of the attached patch.

>> Do we really know someone is going to want to actually specify the
>> channel binding type?  If it is only testing, maybe we don't need to
>> document this parameter.
> 
> Keeping everything documented is useful as well for new developers, as
> they need to guess less from the code.  So I would prefer seeing both
> connection parameters documented, and mentioning directly in the docs if
> a parameter is for developers or not.

So done this way.  Feel free to pick me up at PGcon this week if you
wish to discuss this issue.  Docs, tests and a commit message are
added.
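
For illustration, a client that wants to insist on channel binding
would then presumably connect with something like the following (a
hypothetical connection string based on the parameter names above; the
host and database names are made up):

    host=db.example.com dbname=postgres sslmode=require scram_channel_binding_mode=require

scram_channel_binding_type would normally be left at its default
(tls-unique); as discussed above, it is mostly a developer option.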
--
Michael
From 34b68db28172458f69c59488a613d848939838a3 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 22 May 2018 17:03:48 +0900
Subject: [PATCH] Rework scram_channel_binding to protect from downgrade
 attacks

When a client attempts to connect to a PostgreSQL cluster, it may
request channel binding with SCRAM authentication, but the server can
trick the client and forcibly downgrade the authentication request.
For example, a v10 cluster supports SCRAM but not channel binding, so
authentication could be achieved without channel binding being used.

In order to protect from such attacks, rework libpq's handling of the
connection parameter scram_channel_binding by splitting it into two
pieces:
- scram_channel_binding_mode, which accepts the values "require",
"disable" or "prefer".
- scram_channel_binding_type, which is similar to the previous
scram_channel_binding, except that it no longer accepts an empty value
to mean that channel binding is disabled.

"prefer", is the default and behaves so as channel binding is used if
available.  If the cluster does not support it then it is not used. This
does not protect from downgrade attacks.
"disable", which is the equivalent of the empty value previously,
disables channel binding for all exchanges.
"require" makes channel binding a hard requirement for the
authentication.

For "require", if the cluster does not support channel binding with
SCRAM (v10 server) or if SSL is not used, then the connection is refused
by libpq, which is something that can happen if SSL is not used
(prevention also for users forgetting to use sslmode=require at least in
connection strings).  This also allows users to enforce the use of SCRAM
with channel binding even if the targeted cluster downgrades to "trust"
or similar.

In order to achieve that, when receiving AUTH_REQ_OK from the server,
we check that a SASL exchange has happened and finished, with the
client making sure that the server knows the connection proof.  If the
cluster does not publish the -PLUS mechanism, then the connection is
also refused.

Discussion: https://postgr.es/m/20180519123557.gb2...@paquier.xyz
---
 doc/src/sgml/libpq.sgml  | 72 
 doc/src/sgml/release-11.sgml |  2 +-
 src/interfaces/libpq/fe-auth-scram.c | 45 +
 src/interfaces/libpq/fe-auth.c   | 54 +++--
 src/interfaces/libpq/fe-auth.h   |  1 +
 src/interfaces/libpq/fe-connect.c| 51 +---
 src/interfaces/libpq/libpq-int.h |  3 +-
 src/test/ssl/ServerSetup.pm  | 27 ++-
 src/test/ssl/t/002_scram.pl  | 53 
 9 files changed, 257 insertions(+), 51 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 498b8df988..90e522b2af 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1238,24 +1238,76 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
- 
-  scram_channel_binding
+ 
+  scram_channel_binding_mode
+  
+   
+This option determines whether channel binding should be used
+with SCRAM authentication.  While
+SCRAM alone prevents the replay of transmitted
+hashed passwords, channel binding also prevents man-in-the-middle
+attacks.  There are three modes:
+
+
+ 
+  disable
+  
+   
+Disable the use of channel binding for all connection at

Re: SP-GiST failing to complete SP-GiST index build

2018-05-27 Thread Peter Geoghegan
On Sun, May 27, 2018 at 5:24 PM, Peter Geoghegan  wrote:
> On Sun, May 27, 2018 at 5:10 PM, Tom Lane  wrote:
>> Instrumenting the test case suggests that getQuadrant pretty much always
>> returns 1, resulting in a worst-case unbalanced SPGiST tree.  I think this
>> is related to the fact that the test case inserts the values in increasing
>> order, so that new values are always greater than existing values in the
>> index.
>
> I suspected the same. It reminded me of the weird behavior that the
> Postgres qsort() sometimes exhibits.

I confirmed this by using CLUSTER on an index built against a new
column with no physical/logical correlation (a column containing
random() data). This resulted in a dramatically faster build for the
SP-GiST index:

pg@~[31121]=# CREATE INDEX logs2_log_time_spgist_idx ON logs2 USING
spgist (log_time);
DEBUG:  building index "logs2_log_time_spgist_idx" on table "logs2" serially
CREATE INDEX
Time: 3961.815 ms (00:03.962)

Also, the final index is only 88 MB (not 122 MB).

As a point of comparison, this is how a REINDEX of the GiST index went
against the same (CLUSTERed) table:

pg@~[31121]=# REINDEX INDEX logs2_log_time_gist_idx;
DEBUG:  building index "logs2_log_time_gist_idx" on table "logs2" serially
REINDEX
Time: 14652.058 ms (00:14.652)
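
For reference, the randomization setup described above amounts to
something like this (a sketch; the auxiliary column and index names are
mine, not from the original test case):

-- Add a column with no physical/logical correlation and CLUSTER on it,
-- so that log_time values reach the index build in random order.
ALTER TABLE logs2 ADD COLUMN rand_order float8;
UPDATE logs2 SET rand_order = random();
CREATE INDEX logs2_rand_order_idx ON logs2 (rand_order);
CLUSTER logs2 USING logs2_rand_order_idx;

CREATE INDEX logs2_log_time_spgist_idx ON logs2 USING spgist (log_time);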

-- 
Peter Geoghegan



Re: SP-GiST failing to complete SP-GiST index build

2018-05-27 Thread Peter Geoghegan
On Sun, May 27, 2018 at 5:10 PM, Tom Lane  wrote:
> Instrumenting the test case suggests that getQuadrant pretty much always
> returns 1, resulting in a worst-case unbalanced SPGiST tree.  I think this
> is related to the fact that the test case inserts the values in increasing
> order, so that new values are always greater than existing values in the
> index.

I suspected the same. It reminded me of the weird behavior that the
Postgres qsort() sometimes exhibits.

> SPGiST is unable to rebalance its tree on the fly, so it's pretty
> well screwed in this situation.  It does finish eventually, but in about
> 50x longer than GiST.  I imagine the index's query performance would be
> equally awful.

Can you think of some way of side-stepping the issue? It's unfortunate
that SP-GiST is potentially so sensitive to input order.

-- 
Peter Geoghegan



Re: SP-GiST failing to complete SP-GiST index build

2018-05-27 Thread Tom Lane
Peter Geoghegan  writes:
> Looks like I spoke too soon. The SP-GiST index build finished a moment
> ago. The index build took a horrifically long time for a 122 MB index,
> though.

Instrumenting the test case suggests that getQuadrant pretty much always
returns 1, resulting in a worst-case unbalanced SPGiST tree.  I think this
is related to the fact that the test case inserts the values in increasing
order, so that new values are always greater than existing values in the
index.  SPGiST is unable to rebalance its tree on the fly, so it's pretty
well screwed in this situation.  It does finish eventually, but in about
50x longer than GiST.  I imagine the index's query performance would be
equally awful.

regards, tom lane



Re: SP-GiST failing to complete SP-GiST index build

2018-05-27 Thread Peter Geoghegan
On Sun, May 27, 2018 at 2:09 PM, Peter Geoghegan  wrote:
> On Sun, May 27, 2018 at 12:45 PM, Jonathan S. Katz  
> wrote:
>> Next, see bad.sql.  1.2MM sparsely clustered rows inserted; the GiST index
>> builds in about 30s on my machine.  SP-GiST does not build at all, or at
>> least I have been composing this email for about 10 minutes since I kicked
>> off my latest and it has yet to terminate.
>>
>> I can understand this being an extreme case for SP-GiST as it’s better
>> for a data set that’s more densely clustered, but I wanted to provide
>> this info to rule out whether or not this is a bug.
>
> While I'm no SP-GiST expert, I strongly suspect that you've identified
> a real bug here. Your test case has been running on my development
> machine for 20 minutes now (this is server class hardware).

Looks like I spoke too soon. The SP-GiST index build finished a moment
ago. The index build took a horrifically long time for a 122 MB index,
though.

-- 
Peter Geoghegan



Re: SP-GiST failing to complete SP-GiST index build

2018-05-27 Thread Peter Geoghegan
On Sun, May 27, 2018 at 12:45 PM, Jonathan S. Katz  wrote:
> Next, see bad.sql.  1.2MM sparsely clustered rows inserted; the GiST index
> builds in about 30s on my machine.  SP-GiST does not build at all, or at
> least I have been composing this email for about 10 minutes since I kicked
> off my latest and it has yet to terminate.
>
> I can understand this being an extreme case for SP-GiST as it’s better
> for a data set that’s more densely clustered, but I wanted to provide
> this info to rule out whether or not this is a bug.

While I'm no SP-GiST expert, I strongly suspect that you've identified
a real bug here. Your test case has been running on my development
machine for 20 minutes now (this is server class hardware).

I ran perf with your testcase, and I see that the majority of
instructions are executed within these functions:

  22.88%  postgres  postgres  [.] spgdoinsert
  12.98%  postgres  postgres  [.] range_deserialize
  11.44%  postgres  postgres  [.] FunctionCall2Coll
  10.40%  postgres  postgres  [.] heap_tuple_untoast_attr
   8.62%  postgres  postgres  [.] spgExtractNodeLabels
   5.92%  postgres  postgres  [.] getQuadrant
   4.90%  postgres  postgres  [.] AllocSetAlloc

spgdoinsert() contains the following comment:

/*
 * Bail out if query cancel is pending.  We must have this somewhere
 * in the loop since a broken opclass could produce an infinite
 * picksplit loop.
 */
CHECK_FOR_INTERRUPTS();

Perhaps the problem is in the range type SP-GiST opclass support code
- it could have this exact picksplit infinite loop problem. That's
perfectly consistent with what we see here.

-- 
Peter Geoghegan



Re: jsonb iterator not fully initialized

2018-05-27 Thread Dmitry Dolgov
> On 26 May 2018 at 16:47, Andrew Dunstan  
> wrote:
> On 05/26/2018 03:09 AM, Piotr Stefaniak wrote:
>> On 2018-05-26 02:02, Peter Eisentraut wrote:
>>>
>>> It can be fixed this way:
>>>
>>> --- a/src/backend/utils/adt/jsonb_util.c
>>> +++ b/src/backend/utils/adt/jsonb_util.c
>>> @@ -901,7 +901,7 @@ iteratorFromContainer(JsonbContainer *container,
>>> JsonbIterator *parent)
>>>{
>>>   JsonbIterator *it;
>>>
>>> -   it = palloc(sizeof(JsonbIterator));
>>> +   it = palloc0(sizeof(JsonbIterator));
>>>   it->container = container;
>>>   it->parent = parent;
>>>   it->nElems = JsonContainerSize(container);
>>>
>>> It's probably not a problem in practice, since the isScalar business is
>>> apparently only used in the array case, but it's dubious to leave things
>>> uninitialized like this nonetheless.

Yes, sounds reasonable.

>> I've seen it earlier but couldn't decide what my proposed fix should
>> look like. One of the options I considered was:
>>
>> --- a/src/backend/utils/adt/jsonfuncs.c
>> +++ b/src/backend/utils/adt/jsonfuncs.c
>> @@ -5010,10 +5010,8 @@ transform_jsonb_string_values(Jsonb *jsonb, void
>> *action_state,
>>   JsonbIteratorToken type;
>>   JsonbParseState *st = NULL;
>>   text   *out;
>> -   bool    is_scalar = false;
>>
>>   it = JsonbIteratorInit(&jsonb->root);
>> -   is_scalar = it->isScalar;
>>
>>   while ((type = JsonbIteratorNext(&it, &v, false)) != WJB_DONE)
>>   {
>> @@ -5033,7 +5031,7 @@ transform_jsonb_string_values(Jsonb *jsonb, void
>> *action_state,
>>   }
>>
>>   if (res->type == jbvArray)
>> -   res->val.array.rawScalar = is_scalar;
>> +   res->val.array.rawScalar =
>> JsonContainerIsScalar(&jsonb->root);
>>
>>   return JsonbValueToJsonb(res);
>>}
>
> palloc0 seems cleaner.

Totally agree, palloc0 looks better (although I assume it's going to be
negligibly slower in those cases that aren't affected by this problem).



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-27 Thread Andres Freund
Hi,

On 2018-05-27 13:22:21 -0400, Tom Lane wrote:
> But I don't think there's any need for special magic here: we just
> have to accept the fact that there's a need to flush that cache
> sometimes.  In normal use it shouldn't happen often enough to be a
> performance problem.

Yea, it's not that problematic. We already remove the local init
file. I started out trying to write a version of invalidation that also
WAL logs shared inval, but that turns out to be hard to do without
breaking compatibility.  So I think we should just always unlink the
shared one - that seems to work well.


> FWIW, I'm not on board with "memcpy the whole row".  I think the right
> thing is more like what we do in RelationReloadIndexInfo, ie copy over
> the specific fields that we expect to be mutable.

But that's what RelationReloadIndexInfo() etc do?
relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
I don't think we need to modify anything outside of rd_rel at this
point?

I've a patch that seems to work, that mostly needs some comment
polishing.

Greetings,

Andres Freund



SP-GiST failing to complete SP-GiST index build

2018-05-27 Thread Jonathan S. Katz
Hi,

While preparing for an upcoming presentation, I was playing around
with SP-GiST indexes on tstz ranges and was having an issue where
some would fail to build to completion in a reasonable time, especially
compared to corresponding GiST builds.

Version:

PostgreSQL 10.4 on x86_64-apple-darwin16.7.0,
compiled by Apple LLVM version 9.0.0 (clang-900.0.39.2), 64-bit

Relevant Config:

shared_buffers = 1GB
work_mem = 64MB 
maintenance_work_mem = 1GB 

System was not under unusual load.

I thought perhaps it was a result of my data being relatively sparse,
but then I had an issue getting one scenario to build where the data
was a much more common use case.  I wanted to run this scenario by the
list just to ensure there are no bugs, as the “common case” was working
fine for me.

First, see “good.sql” for a base case: 1.2MM “densely” clustered rows
inserted; both GiST and SP-GiST indexes build in reasonable time
periods (< 15s on my machine).

Next, see bad.sql.  1.2MM sparsely clustered rows inserted; the GiST
index builds in about 30s on my machine.  SP-GiST does not build at
all, or at least I have been composing this email for about 10 minutes
since I kicked off my latest and it has yet to terminate.

I can understand this being an extreme case for SP-GiST as it’s better
for a data set that’s more densely clustered, but I wanted to provide
this info to rule out whether or not this is a bug.
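
To give a flavor of the difference, the two cases look roughly like
this (a sketch only; the real scenarios are in the attached good.sql
and bad.sql, and the table names here are illustrative):

CREATE TABLE densely_clustered (r tstzrange);
CREATE TABLE sparsely_clustered (r tstzrange);

-- "good": 1.2MM ranges packed into a narrow window, in no particular order.
INSERT INTO densely_clustered
SELECT tstzrange(t, t + interval '1 hour')
FROM (SELECT now() + random() * interval '1 day' AS t
      FROM generate_series(1, 1200000)) AS g;

-- "bad": 1.2MM ranges marching up a strictly increasing time series.
INSERT INTO sparsely_clustered
SELECT tstzrange(now() + i * interval '1 hour',
                 now() + i * interval '1 hour' + interval '30 minutes')
FROM generate_series(1, 1200000) AS i;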

Thanks,

Jonathan



good.sql
Description: Binary data


bad.sql
Description: Binary data


Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-27 Thread Tom Lane
Andres Freund  writes:
> On May 27, 2018 9:39:49 AM PDT, "Nasby, Jim"  wrote:
>> How about only keeping the critical info for being able to find
>> relations in the .init files, and then fully populate the cache by
>> doing a normal lookup?

> Then the cache wouldn't have any benefits, no? It's been a while, but last 
> time I checked it does make quite a measurable performance difference in a 
> new backend.

Yeah, we don't want to lose the performance benefit.  But I don't think
there's any need for special magic here: we just have to accept the fact
that there's a need to flush that cache sometimes.  In normal use it
shouldn't happen often enough to be a performance problem.

FWIW, I'm not on board with "memcpy the whole row".  I think the right
thing is more like what we do in RelationReloadIndexInfo, ie copy over
the specific fields that we expect to be mutable.

regards, tom lane



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-27 Thread Andres Freund


On May 27, 2018 9:39:49 AM PDT, "Nasby, Jim"  wrote:
>On May 26, 2018, at 1:45 PM, Andres Freund  wrote:
>> Does anybody see a way to not have to remove the .init file?
>
>How about only keeping the critical info for being able to find
>relations in the .init files, and then fully populate the cache by
>doing a normal lookup? Since there’s multiple entries needed to
>bootstrap things I guess there’d need to be a flag in relcache to
>indicate that an existing entry wasn’t fully formed.

Then the cache wouldn't have any benefits, no? It's been a while, but last time 
I checked it does make quite a measurable performance difference in a new 
backend.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.



Re: found xmin from before relfrozenxid on pg_catalog.pg_authid

2018-05-27 Thread Nasby, Jim
On May 26, 2018, at 1:45 PM, Andres Freund  wrote:
> Does anybody see a way to not have to remove the .init file?

How about only keeping the critical info for being able to find relations in 
the .init files, and then fully populate the cache by doing a normal lookup? 
Since there’s multiple entries needed to bootstrap things I guess there’d need 
to be a flag in relcache to indicate that an existing entry wasn’t fully formed.

Re: perl checking

2018-05-27 Thread Andrew Dunstan

On 05/18/2018 02:02 PM, Andrew Dunstan wrote:


These two small patches allow us to run "perl -cw" cleanly on all our 
perl code.

One patch silences a warning from convutils.pl about the unportability 
of the literal 0x1. We've run for many years without this 
giving us a problem, so I think we can turn the warning off pretty 
safely.

The other patch provides a dummy library that emulates just enough of 
the Win32 perl infrastructure to allow us to run these checks. That 
means that Unix-based developers who might want to make changes in the 
msvc code can actually run a check against their code without having 
to put it on a Windows machine. The invocation goes like this (to 
check Mkvcbuild.pm, for example):

   PERL5LIB=src/tools/msvc/dummylib perl -cw src/tools/msvc/Mkvcbuild.pm

This also allows us to check src/tools/win32tzlist.pl.


In due course I'll submit a script to automate this syntax checking.

Here is the latest version of the second patch, this time with warnings 
about redefinition of some subroutines suppressed. These mostly occur 
because in a few cases we have multiple packages in a single file.


This allows the following command to pass cleanly:

   { find . -type f -a -name '*.p[lm]' -print; find . -type f -perm
   -100 -exec file {} \; -print | egrep -i ':.*perl[0-9]*\>' |cut -d:
   -f1 ; } | sort -u |
   
PERL5LIB=src/test/perl:src/test/ssl:src/bin/pg_rewind:src/backend/catalog:src/backend/utils/mb/Unicode:src/tools/msvc/dummylib:src/tools/msvc
   xargs -L 1 perl -cw

cheers

andrew



--
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/tools/msvc/MSBuildProject.pm b/src/tools/msvc/MSBuildProject.pm
index 27397ba..86979df 100644
--- a/src/tools/msvc/MSBuildProject.pm
+++ b/src/tools/msvc/MSBuildProject.pm
@@ -11,6 +11,8 @@ use strict;
 use warnings;
 use base qw(Project);
 
+no warnings qw(redefine);
+
 sub _new
 {
 	my $classname = shift;
@@ -399,6 +401,8 @@ use strict;
 use warnings;
 use base qw(MSBuildProject);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -420,6 +424,8 @@ use strict;
 use warnings;
 use base qw(MSBuildProject);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -464,6 +470,8 @@ use strict;
 use warnings;
 use base qw(VC2012Project);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -487,6 +495,8 @@ use strict;
 use warnings;
 use base qw(VC2012Project);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -510,6 +520,8 @@ use strict;
 use warnings;
 use base qw(VC2012Project);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
diff --git a/src/tools/msvc/Project.pm b/src/tools/msvc/Project.pm
index 261c913..0d35546 100644
--- a/src/tools/msvc/Project.pm
+++ b/src/tools/msvc/Project.pm
@@ -229,6 +229,7 @@ sub AddDir
 
 if ($filter eq "LIBOBJS")
 {
+	no warnings qw(once);
 	if (grep(/$p/, @main::pgportfiles, @main::pgcommonfiles)
 		== 1)
 	{
diff --git a/src/tools/msvc/Solution.pm b/src/tools/msvc/Solution.pm
index 8f0b355..1440989 100644
--- a/src/tools/msvc/Solution.pm
+++ b/src/tools/msvc/Solution.pm
@@ -10,6 +10,8 @@ use strict;
 use warnings;
 use VSObjectFactory;
 
+no warnings qw(redefine);
+
 sub _new
 {
 	my $classname = shift;
@@ -768,6 +770,8 @@ use strict;
 use warnings;
 use base qw(Solution);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -791,6 +795,8 @@ use strict;
 use warnings;
 use base qw(Solution);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -815,6 +821,8 @@ use strict;
 use warnings;
 use base qw(Solution);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -839,6 +847,8 @@ use strict;
 use warnings;
 use base qw(Solution);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -863,6 +873,8 @@ use strict;
 use warnings;
 use base qw(Solution);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -889,6 +901,8 @@ use strict;
 use warnings;
 use base qw(Solution);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -915,6 +929,8 @@ use strict;
 use warnings;
 use base qw(Solution);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
diff --git a/src/tools/msvc/VCBuildProject.pm b/src/tools/msvc/VCBuildProject.pm
index 03b890b..ad613b3 100644
--- a/src/tools/msvc/VCBuildProject.pm
+++ b/src/tools/msvc/VCBuildProject.pm
@@ -11,6 +11,8 @@ use strict;
 use warnings;
 use base qw(Project);
 
+no warnings qw(redefine);
+
 sub _new
 {
 	my $classname = shift;
@@ -268,6 +270,8 @@ use strict;
 use warnings;
 use base qw(VCBuildProject);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
@@ -289,6 +293,8 @@ use strict;
 use warnings;
 use base qw(VCBuildProject);
 
+no warnings qw(redefine);
+
 sub new
 {
 	my $classname = shift;
diff --git a/src/tools/msvc/VS

Re: Redesigning the executor (async, JIT, memory efficiency)

2018-05-27 Thread Douglas Doole
>
> I think we're going to have to continue showing the tree plan. I think
> the linear version is far too hard to understand for anything
> nontrivial.
>

Hey Andres, what you're pitching here is very similar to the way DB2 works.
It actually has two different explain tools that dump the two different
formats. The tree dump is the preferred format, but there are times where
the human-readable dump of the executor steps is very useful. Particularly,
as mentioned elsewhere in this thread, when the executor steps diverge from
the plan.

One thing that helps correlate the two formats is that each node in the
tree dump is numbered and each step in the executor is annotated with the
corresponding tree node number.

Another tool that was invaluable is a readable dump (effectively a
disassembly) of the byte code. It was only useful to developers, but when
something goes wrong in the executor it was incredibly useful to see
exactly what the byte code was specifying (as opposed to the
human-readable format of explain, which could hide subtle details).

- Doug
Salesforce


Re: perlcritic and perltidy

2018-05-27 Thread Andrew Dunstan

On 05/25/2018 03:04 PM, Bruce Momjian wrote:

On Sun, May  6, 2018 at 11:53:34AM -0400, Tom Lane wrote:

What sort of changes do we get if we remove those two flags as you prefer?
It'd help to see some examples.

Since we just went to a new perltidy version, and made some other
policy changes for it, in HEAD, it'd make sense to make any further
changes in this same release cycle rather than drip drip drip over
multiple cycles.  We just need to get some consensus about what
style we like.

I saw you looking for feedback so I wanted to give mine.  Also, Andrew,
thanks for working on this --- it is a big help to have limited Perl
critic reports and good tidiness.

I am using the src/tools/pgindent/perltidyrc setting for my own Perl
code, but needed to add these two:

--noblanks-before-comments
--break-after-all-operators

The first one fixes odd blank lines when I put comments inside
conditional tests, e.g.:

 if (!$options{args_supplied}   &&
 !$is_debug &&
 defined($stat_main)&&
 defined($stat_cache)   &&
 $stat_main->mtime < $stat_cache->mtime &&
 # is local time zone?
 (!defined($ENV{TZ}) || $ENV{TZ} =~ m/^E.T$/))

Without the first option, I get:

 if (!$options{args_supplied}   &&
 !$is_debug &&
 defined($stat_main)&&
 defined($stat_cache)   &&
 $stat_main->mtime < $stat_cache->mtime &&
-->
 # is local time zone?
 (!defined($ENV{TZ}) || $ENV{TZ} =~ m/^E.T$/))

which just looks odd to me.  Am I the only person who often does this?

The second option, --break-after-all-operators, is more of a personal
taste, but it does match how our C code works, and people have said I
write C code in Perl.  ;-)

I agree with adding --no-blanks-before-comments. That doesn't remove any 
blank lines, it just stops perltidy from adding them before comments, so 
adding it to the perltidyrc doesn't change anything.


I looked at --break-after-all-operators, but I didn't like the result. I 
tried to refine it by adding --want-break-before='. : ? && || and or'. 
However, it didn't do what it was told in the case of ? operators. That 
seems like a perltidy bug. The bug persists even in the latest version 
of perltidy. So I think we should just leave things as they are in this 
respect.


cheers

andrew

--
Andrew Dunstan    https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Periods

2018-05-27 Thread Pavel Stehule
2018-05-26 22:56 GMT+02:00 Vik Fearing :

> SQL:2011 introduced the concept of a "period". It takes two existing
> columns and basically does the same thing as our range types except there
> is no new storage.  I believe if Jeff Davis had given us range types a few
> years later than he did, it would have been using periods.
>
> Attached is a WIP patch that I have been working on.  The only thing left
> is completing periods on inherited tables, and finishing up pg_dump.  I'm
> posting this now just to make sure my basic foundation is sound, and to let
> people know that I'm working on this.
>
> The patch itself doesn't have any functionality, it just allows periods to
> be defined.  With that, there are several things that we can do:
> SYSTEM_TIME periods, which are explicitly not allowed by this patch, will
> allow us to do SQL standard versioned tables, and also allows some time
> travel functionality.  Application periods can be used in PRIMARY/UNIQUE
> keys, foreign keys, and give nice new features to UPDATE and DELETE.  They
> also allow "period predicates" which are the same kind of operations we
> already have for range types.  All of that is for future patches that build
> on the infrastructure presented in this patch.
>
> The SQL standard restricts period columns to dates or timestamps, but I'm
> allowing anything that has a btree operator class, as is the PostgreSQL
> way. System periods, once allowed, will only be timestamptz though.
> Unfortunately, I had to fully reserve the word PERIOD for this.
>
> I'm looking for comments on everything except the pg_dump stuff, keeping
> in mind that inheritance is not finished either.
>
> Thanks!
>
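
For a flavor of what the patch enables, a period definition would
presumably look something like this (hypothetical usage based on the
SQL:2011 syntax; the table and period names are mine, not from the
patch):

-- An application-time period over two existing columns; unlike a
-- tstzrange column, it adds no new storage.
CREATE TABLE reservations (
    room_id     integer,
    valid_from  timestamptz,
    valid_to    timestamptz,
    PERIOD FOR validity (valid_from, valid_to)
);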

looks interesting

Regards

Pavel


>
>
> This patch is based off of 71b349aef4.
>