Re: Restore replication settings when modifying a field type

2020-01-14 Thread Quan Zongliang

On 2020/1/3 17:14, Peter Eisentraut wrote:

On 2019-11-01 04:39, Euler Taveira wrote:

ATExecAlterColumnType records everything that depends on the column
and for indexes it saves the definition (via pg_get_indexdef_string).
Definition is not sufficient for reconstructing the replica identity
information because there is no such keyword for replica identity in
CREATE INDEX. The new index should call relation_mark_replica_identity
to fix pg_index.indisreplident.


Yeah, I don't think we need to do the full dance of reverse compiling 
the SQL command and reexecuting it, as the patch currently does.  That's 
only necessary for rebuilding the index itself.  For re-setting the 
replica identity, we can just use the internal API as you say.
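
A rough sketch of that direction, for illustration only (the helper's exact
signature, the flag, and the OID variable are assumptions, not a tested
change):

    /* after the dependent index has been rebuilt during ALTER TABLE */
    if (old_index_was_replica_identity)
        relation_mark_replica_identity(rel, REPLICA_IDENTITY_INDEX,
                                       new_index_oid, true);

so that pg_index.indisreplident ends up set on the rebuilt index without
reverse compiling and re-executing any SQL.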


Also, a few test cases would be nice for this patch.



I'm a little busy. I'll write a new patch in a few days.




Re: aggregate crash

2020-01-14 Thread Andres Freund
Hi,

On 2020-01-14 17:54:16 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2020-01-14 17:01:01 -0500, Tom Lane wrote:
> >> But I agree that not checking null-ness
> >> explicitly is kind of unsafe.  We've never before had any expectation
> >> that the Datum value of a null is anything in particular.
> 
> > I'm still not sure I actually fully understand the bug. It's obvious how
> > returning the input value again could lead to memory not being freed (so
> > that leak seems to go all the way back). And similarly, since the
> > introduction of expanded objects, it can also lead to the expanded
> > object not being deleted.
> > But that's not the problem causing the crash here. What I think must
> > instead be the problem is that pergroupstate->transValueIsNull, but
> > pergroupstate->transValue is set to something looking like a
> > pointer. Which caused us not to datumCopy() a new transition value into
> > a long-lived context, and then a later transition causes us to free the
> > short-lived value?
> 
> Yeah, I was kind of wondering that too.  While formally the Datum value
> for a null is undefined, I'm not aware offhand of any functions that
> wouldn't return zero --- and this would have to be an aggregate transition
> function doing so, which reduces the universe of candidates quite a lot.
> Plus there's the question of how often a transition function would return
> null for non-null input at all.
> 
> Could we see a test case that provokes this crash, even if it doesn't
> do so reliably?

There's a larger reproducer referenced in the first message. I had hoped
that Teodor could narrow it down - I guess I'll try to do that tomorrow...

Greetings,

Andres Freund




Re: Duplicate Workers entries in some EXPLAIN plans

2020-01-14 Thread Maciek Sakrejda
Thanks for the review! I looked at  and rebased the patch
on current master, ac5bdf6.

I introduced a new test file because this bug is specifically about
EXPLAIN output (as opposed to query execution or planning
functionality), and it didn't seem like a test would fit in any of the
other files. I focused on testing just the behavior around this
specific bug (and fix). I think eventually we should probably test
other more fundamental EXPLAIN features (and I'm happy to contribute
to that) in that file, but that seems outside of the scope of this
patch.

Any thoughts on what we should do with text mode output (which is
untouched right now)? The output Andres proposed above makes sense to
me, but I'd like to get more input.
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index d189b8d573..96a8973884 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -100,7 +100,7 @@ static void show_sortorder_options(StringInfo buf, Node *sortexpr,
    Oid sortOperator, Oid collation, bool nullsFirst);
 static void show_tablesample(TableSampleClause *tsc, PlanState *planstate,
 			 List *ancestors, ExplainState *es);
-static void show_sort_info(SortState *sortstate, ExplainState *es);
+static void show_sort_info(SortState *sortstate, StringInfo* worker_strs, ExplainState *es);
 static void show_hash_info(HashState *hashstate, ExplainState *es);
 static void show_tidbitmap_info(BitmapHeapScanState *planstate,
 ExplainState *es);
@@ -289,6 +289,7 @@ NewExplainState(void)
 	es->costs = true;
 	/* Prepare output buffer. */
 	es->str = makeStringInfo();
+	es->root_str = es->str;
 
 	return es;
 }
@@ -1090,9 +1091,20 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	const char *partialmode = NULL;
 	const char *operation = NULL;
 	const char *custom_name = NULL;
+	StringInfo *worker_strs = NULL;
 	int			save_indent = es->indent;
 	bool		haschildren;
 
+	/* Prepare per-worker output */
+	if (es->analyze && planstate->worker_instrument) {
+		int num_workers = planstate->worker_instrument->num_workers;
+		int n;
+		worker_strs = (StringInfo *) palloc0(num_workers * sizeof(StringInfo));
+		for (n = 0; n < num_workers; n++) {
+			worker_strs[n] = makeStringInfo();
+		}
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_Result:
@@ -1357,6 +1369,64 @@ ExplainNode(PlanState *planstate, List *ancestors,
 		ExplainPropertyBool("Parallel Aware", plan->parallel_aware, es);
 	}
 
+	/* Prepare worker general execution details */
+	if (es->analyze && es->verbose && planstate->worker_instrument)
+	{
+		WorkerInstrumentation *w = planstate->worker_instrument;
+		int			n;
+
+		for (n = 0; n < w->num_workers; ++n)
+		{
+			Instrumentation *instrument = &w->instrument[n];
+			double		nloops = instrument->nloops;
+			double		startup_ms;
+			double		total_ms;
+			double		rows;
+
+			if (nloops <= 0)
+				continue;
+			startup_ms = 1000.0 * instrument->startup / nloops;
+			total_ms = 1000.0 * instrument->total / nloops;
+			rows = instrument->ntuples / nloops;
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				appendStringInfoSpaces(es->str, es->indent * 2);
+				appendStringInfo(es->str, "Worker %d: ", n);
+				if (es->timing)
+					appendStringInfo(es->str,
+									 "actual time=%.3f..%.3f rows=%.0f loops=%.0f\n",
+									 startup_ms, total_ms, rows, nloops);
+				else
+					appendStringInfo(es->str,
+									 "actual rows=%.0f loops=%.0f\n",
+									 rows, nloops);
+				es->indent++;
+				if (es->buffers)
+					show_buffer_usage(es, &instrument->bufusage);
+				es->indent--;
+			}
+			else
+			{
+				ExplainOpenWorker(worker_strs[n], es);
+
+				if (es->timing)
+				{
+					ExplainPropertyFloat("Actual Startup Time", "ms",
+										 startup_ms, 3, es);
+					ExplainPropertyFloat("Actual Total Time", "ms",
+										 total_ms, 3, es);
+				}
+				ExplainPropertyFloat("Actual Rows", NULL, rows, 0, es);
+				ExplainPropertyFloat("Actual Loops", NULL, nloops, 0, es);
+			}
+		}
+
+		if (es->format != EXPLAIN_FORMAT_TEXT) {
+			ExplainCloseWorker(es);
+		}
+	}
+
 	switch (nodeTag(plan))
 	{
 		case T_SeqScan:
@@ -1856,7 +1926,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
 			break;
 		case T_Sort:
 			show_sort_keys(castNode(SortState, planstate), ancestors, es);
-			show_sort_info(castNode(SortState, planstate), es);
+			show_sort_info(castNode(SortState, planstate), worker_strs, es);
 			break;
 		case T_MergeAppend:
 			show_merge_append_keys(castNode(MergeAppendState, planstate),
@@ -1885,74 +1955,30 @@ ExplainNode(PlanState *planstate, List *ancestors,
 	if (es->buffers && planstate->instrument)
 		show_buffer_usage(es, &planstate->instrument->bufusage);
 
-	/* Show worker detail */
-	if (es->analyze && es->verbose && !es->hide_workers &&
-		planstate->worker_instrument)
+	/* Prepare worker buffer usage */
+	if (es->buffers && es->analyze && es->verbose && !es->hide_workers
+		&& planstate->worker_instrument && es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		

Re: Remove libpq.rc, use win32ver.rc for libpq

2020-01-14 Thread Michael Paquier
On Wed, Jan 15, 2020 at 02:22:45PM +0900, Kyotaro Horiguchi wrote:
> At Tue, 14 Jan 2020 22:34:10 +0100, Peter Eisentraut 
>  wrote in 
>> On 2020-01-09 10:56, Peter Eisentraut wrote:
>>> Note that there is FILEFLAGSMASK and FILEFLAGS.  The first is just a
>>> mask that says which bits in the second are valid.  Since both
>>> libpq.rc
>>> and win32ver.rc use FILEFLAGS 0, it doesn't matter what we set
>>> FILEFLAGSMASK to.  But currently libpq.rc uses 0x3fL and win32ver.rc
>>> uses 0x17L, so in order to unify this sensibly I looked for a
>>> well-recognized standard value, which led to VS_FFI_FILEFLAGSMASK.

Hmm.  I agree that what you have here is sensible.  I am wondering if
it would be better to have VS_FF_DEBUG set dynamically in FILEFLAGS in
the future though.  But that's not material for this patch.

>> Here is a rebased patch.
> 
> It applied cleanly on 4d8a8d0c73 and built successfully with VS2019.

I have been testing and checking the patch a bit more seriously, and
the information gets generated correctly for dlls and exe files.  The
rest of the changes look fine to me.  For src/makefiles/Makefile.win32,
I don't have a MinGW environment at hand, so I have not tested it
directly, but the logic looks fine.
--
Michael




Re: Setting min/max TLS protocol in clientside libpq

2020-01-14 Thread Michael Paquier
On Tue, Jan 14, 2020 at 11:01:00PM +0100, Daniel Gustafsson wrote:
> Files renamed to match existing naming convention, the rest of the patch left
> unchanged.

+   if ((pg_strcasecmp("tlsv1", protocol) == 0) || pg_strcasecmp("tlsv1.0", protocol) == 0)
+   return TLS1_VERSION;
"TLSv1.0" is not a supported grammar in the backend.  So I would just
drop it in libpq.  It is also not documented.

+ * Portions Copyright (c) 2018-2020, PostgreSQL Global Development Group
+ *
+ * IDENTIFICATION
+ *   src/common/protocol_openssl.c
It is a no-brainer to just use those lines for copyright notices
instead:
 * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California

+ 
+  sslminprotocolversion
Got to wonder if we had better not use underscores for those new
parameter names as they are much longer than their cousins.
Underscores would make things more inconsistent.

+   if (ssl_min_ver == -1)
+   {
+   printfPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("invalid minimum protocol version specified: %s\n"),
+ conn->sslminprotocolversion);
+   return -1;
+   }
[...]
+   if (ssl_max_ver == -1)
+   {
+   printfPQExpBuffer(&conn->errorMessage,
+ libpq_gettext("invalid or unsupported maximum protocol version specified: %s\n"),
+ conn->sslmaxprotocolversion);
+   return -1;
+   }
Error messages for the min/max are inconsistent.  I would just use
"unsupported", because...

Following up on your complaint on the other thread about the GUC
handling for the min/max protocol parameters: shouldn't you validate
the supported values in connectOptions2() like what's done for the
other parameters?  This way, you can make the difference between an
invalid value and an unsupported value with two different error
strings.  By validating the values at an earlier stage, you save a
couple of cycles for the application.
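
A rough sketch of that kind of early validation, purely for illustration: the
helper name sslVerifyProtocolVersion() is hypothetical and the message wording
is only an example, not an actual proposal:

    /* in connectOptions2(), alongside the other option checks */
    if (conn->sslminprotocolversion &&
        strlen(conn->sslminprotocolversion) != 0 &&
        !sslVerifyProtocolVersion(conn->sslminprotocolversion))
    {
        conn->status = CONNECTION_BAD;
        printfPQExpBuffer(&conn->errorMessage,
                          libpq_gettext("invalid sslminprotocolversion value: \"%s\"\n"),
                          conn->sslminprotocolversion);
        return false;
    }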

+TLSv1.3.  The supported protocols depend on the
+version of OpenSSL used, older versions
+doesn't support the modern protocol versions.
Incorrect grammar => s/doesn't/don't/.

It would be good to mention that the default is no value, meaning that
the minimum and/or the maximum are not enforced in the SSL context.

+   if (conn->sslminprotocolversion)
+   {
[...]
+   if (conn->sslmaxprotocolversion)
+   {
You are missing two checks for empty strings here (aka strlen == 0).
These should be treated the same as no value to enforce the protocol
to.  (Let's not add an alias for "any".)

+ * Convert TLS protocol versionstring to OpenSSL values
Space needed here => "version string".

A nit, perhaps unnecessary, but I would use "TLSv1.1", etc. in the
values hardcoded for libpq.  That's easier to grep for, and
consistent with the existing conventions even if you apply a
case-insensitive comparison.
--
Michael




Re: Remove libpq.rc, use win32ver.rc for libpq

2020-01-14 Thread Kyotaro Horiguchi
At Tue, 14 Jan 2020 22:34:10 +0100, Peter Eisentraut 
 wrote in 
> On 2020-01-09 10:56, Peter Eisentraut wrote:
> > On 2020-01-06 09:02, Michael Paquier wrote:
> >> - FILEFLAGSMASK  0x17L
> >> + FILEFLAGSMASK  VS_FFI_FILEFLAGSMASK
> >> Are you sure with the mapping here?  I would have thought that
> >> VS_FF_DEBUG is not necessary when using release-quality builds, which
> >> is something that can be configured with build.pl, and that it would
> >> be better to not enforce VS_FF_PRERELEASE all the time.
> > Note that there is FILEFLAGSMASK and FILEFLAGS.  The first is just a
> > mask that says which bits in the second are valid.  Since both
> > libpq.rc
> > and win32ver.rc use FILEFLAGS 0, it doesn't matter what we set
> > FILEFLAGSMASK to.  But currently libpq.rc uses 0x3fL and win32ver.rc
> > uses 0x17L, so in order to unify this sensibly I looked for a
> > well-recognized standard value, which led to VS_FFI_FILEFLAGSMASK.

I agree with the direction of the patch, and the point above sounds
sensible to me.

> Here is a rebased patch.

It applied cleanly on 4d8a8d0c73 and built successfully with VS2019.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: backup manifests

2020-01-14 Thread David Steele

Hi Tom,

On 1/14/20 9:47 PM, Tom Lane wrote:

David Steele  writes:

I'm happy to work up a prototype unless the consensus is that we
absolutely don't want a second JSON parser in core.


How much code are we talking about?  If the answer is "a few hundred
lines", it's a lot easier to swallow than if it's "a few thousand".


It's currently about a thousand lines but we have a lot of functions to 
convert to/from specific types.  I imagine the line count would be 
similar using one of the approaches I discussed above.


Current source attached for reference.

Regards,
--
-David
da...@pgmasters.net
/***
Convert JSON to/from KeyValue
***/
#include "build.auto.h"

#include <ctype.h>
#include <string.h>

#include "common/debug.h"
#include "common/log.h"
#include "common/type/json.h"

/***
Prototypes
***/
static Variant *jsonToVarInternal(const char *json, unsigned int *jsonPos);

/***
Consume whitespace
***/
static void
jsonConsumeWhiteSpace(const char *json, unsigned int *jsonPos)
{
FUNCTION_TEST_BEGIN();
FUNCTION_TEST_PARAM(STRINGZ, json);
FUNCTION_TEST_PARAM_P(UINT, jsonPos);
FUNCTION_TEST_END();

// Consume whitespace
while (json[*jsonPos] == ' ' || json[*jsonPos] == '\t' || json[*jsonPos] == '\n' || json[*jsonPos] == '\r')
(*jsonPos)++;

FUNCTION_TEST_RETURN_VOID();
}

/***
Convert a json string to a bool
***/
static bool
jsonToBoolInternal(const char *json, unsigned int *jsonPos)
{
FUNCTION_TEST_BEGIN();
FUNCTION_TEST_PARAM(STRINGZ, json);
FUNCTION_TEST_PARAM_P(UINT, jsonPos);
FUNCTION_TEST_END();

bool result;

if (strncmp(json + *jsonPos, TRUE_Z, 4) == 0)
{
result = true;
*jsonPos += 4;
}
else if (strncmp(json + *jsonPos, FALSE_Z, 5) == 0)
{
result = false;
*jsonPos += 5;
}
else
THROW_FMT(JsonFormatError, "expected boolean at '%s'", json + *jsonPos);

FUNCTION_TEST_RETURN(result);
}

bool
jsonToBool(const String *json)
{
FUNCTION_TEST_BEGIN();
FUNCTION_TEST_PARAM(STRING, json);
FUNCTION_TEST_END();

unsigned int jsonPos = 0;
jsonConsumeWhiteSpace(strPtr(json), &jsonPos);

bool result = jsonToBoolInternal(strPtr(json), &jsonPos);

jsonConsumeWhiteSpace(strPtr(json), &jsonPos);

if (jsonPos != strSize(json))
THROW_FMT(JsonFormatError, "unexpected characters after boolean at '%s'", strPtr(json) + jsonPos);

FUNCTION_TEST_RETURN(result);
}

/***
Convert a json number to various integer types
***/
static Variant *
jsonToNumberInternal(const char *json, unsigned int *jsonPos)
{
FUNCTION_TEST_BEGIN();
FUNCTION_TEST_PARAM(STRINGZ, json);
FUNCTION_TEST_PARAM_P(UINT, jsonPos);
FUNCTION_TEST_END();

Variant *result = NULL;
unsigned int beginPos = *jsonPos;
bool intSigned = false;

// Consume the -
if (json[*jsonPos] == '-')
{
(*jsonPos)++;
intSigned = true;
}

// Consume all digits
while (isdigit(json[*jsonPos]))
(*jsonPos)++;

// Invalid if only a - was found
if (json[*jsonPos - 1] == '-')
THROW_FMT(JsonFormatError, "found '-' with no integer at '%s'", json + beginPos);

MEM_CONTEXT_TEMP_BEGIN()
{
// Extract the numeric as a string
String *resultStr = strNewN(json + beginPos, *jsonPos - beginPos);

// Convert the string to an integer variant
memContextSwitch(MEM_CONTEXT_OLD());

if (intSigned)
result = varNewInt64(cvtZToInt64(strPtr(resultStr)));
else
result = varNewUInt64(cvtZToUInt64(strPtr(resultStr)));

memContextSwitch(MEM_CONTEXT_TEMP());
}
MEM_CONTEXT_TEMP_END();

FUNCTION_TEST_RETURN(result);
}

static 

Re: [HACKERS] Block level parallel vacuum

2020-01-14 Thread Masahiko Sawada
On Wed, 15 Jan 2020 at 12:34, Mahendra Singh Thalor  wrote:
>
> On Tue, 14 Jan 2020 at 17:16, Mahendra Singh Thalor  
> wrote:
> >
> > On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor  
> > wrote:
> > >
> > > On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
> > >  wrote:
> > > >
> > > > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor 
> > > >  wrote:
> > > > >
> > > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov  wrote:
> > > > > >
> > > > > > Hi
> > > > > > Thank you for update! I looked again
> > > > > >
> > > > > > (vacuum_indexes_leader)
> > > > > > +   /* Skip the indexes that can be processed by 
> > > > > > parallel workers */
> > > > > > +   if (!skip_index)
> > > > > > +   continue;
> > > > > >
> > > > > > Does the variable name skip_index not confuse here? Maybe rename to 
> > > > > > something like can_parallel?
> > > > > >
> > > > >
> > > > > Again I looked into code and thought that somehow if we can add a
> > > > > boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> > > > > identify that this index is supporting parallel vacuum or not, then it
> > > > > will be easy to skip those indexes, and we will not call
> > > > > skip_parallel_vacuum_index multiple times (from vacuum_indexes_leader and
> > > > > parallel_vacuum_index)
> > > > > We can have a linked list of non-parallel supported indexes, then
> > > > > directly we can pass to vacuum_indexes_leader.
> > > > >
> > > > > Ex: let suppose we have 5 indexes into a table.  If before launching
> > > > > parallel workers, if we can add boolean flag(can_parallel)
> > > > > IndexBulkDeleteResult structure to identify that this index is
> > > > > supporting parallel vacuum or not.
> > > > > Let index 1, 4 are not supporting parallel vacuum so we already have
> > > > > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > > > > parallel_vacuum_index will process these indexes and rest will be
> > > > > processed by parallel workers. If parallel worker found that
> > > > > can_parallel is false, then it will skip that index.
> > > > >
> > > > > As per my understanding, if we implement this, then we can avoid
> > > > > multiple function calling of skip_parallel_vacuum_index and if there
> > > > > is no index which can't perform parallel vacuum, then we will not
> > > > > call vacuum_indexes_leader as head of list pointing to null. (we can
> > > > > save unnecessary calling of vacuum_indexes_leader)
> > > > >
> > > > > Thoughts?
> > > > >
> > > >
> > > > We skip not only indexes that don't support parallel index vacuum but
> > > > also indexes supporting it depending on vacuum phase. That is, we
> > > > could skip different indexes at different vacuum phase. Therefore with
> > > > your idea, we would need to have at least three linked lists for each
> > > > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> > > > that right?
> > > >
> > > > I think we can check if there are indexes that should be processed by
> > > > the leader process before entering the loop in vacuum_indexes_leader
> > > > by comparing nindexes_parallel_XXX of LVParallelState to the number of
> > > > indexes but I'm not sure it's effective since the number of indexes on
> > > > a table should be small.
> > > >
> > >
> > > Hi,
> > >
> > > +/*
> > > + * Try to initialize the parallel vacuum if requested
> > > + */
> > > +if (params->nworkers >= 0 && vacrelstats->useindex)
> > > +{
> > > +/*
> > > + * Since parallel workers cannot access data in temporary 
> > > tables, we
> > > + * can't perform parallel vacuum on them.
> > > + */
> > > +if (RelationUsesLocalBuffers(onerel))
> > > +{
> > > +/*
> > > + * Give warning only if the user explicitly tries to perform 
> > > a
> > > + * parallel vacuum on the temporary table.
> > > + */
> > > +if (params->nworkers > 0)
> > > +ereport(WARNING,
> > > +(errmsg("disabling parallel option of vacuum
> > > on \"%s\" --- cannot vacuum temporary tables in parallel",
> > >
> > > From the v45 patch, we moved the warning for temporary tables into the
> > > "params->nworkers >= 0 && vacrelstats->useindex" check, so if a table
> > > doesn't have any index, then we are not giving any warning. I think we
> > > should give a warning for all temporary tables if a parallel degree is
> > > given. (Till the v44 patch, we were giving a warning for all temporary
> > > tables, with and without indexes.)
> > >
> > > Thoughts?
> >
> > Hi,
> > I did some more review.  Below is the 1 review comment for v46-0002.
> >
> > +/*
> > + * Initialize the state for parallel vacuum
> > + */
> > +if (params->nworkers >= 0 && vacrelstats->useindex)
> > +{
> > +/*
> > + * Since parallel workers cannot access data in temporary tables, 
> > we
> > + * can't perform parallel vacuum 

Re: backup manifests

2020-01-14 Thread Tom Lane
David Steele  writes:
> I'm happy to work up a prototype unless the consensus is that we 
> absolutely don't want a second JSON parser in core.

How much code are we talking about?  If the answer is "a few hundred
lines", it's a lot easier to swallow than if it's "a few thousand".

regards, tom lane




Re: backup manifests

2020-01-14 Thread David Steele

Hi Stephen,

On 1/14/20 1:35 PM, Stephen Frost wrote:


My thought, which I had expressed to David (though he obviously didn't
entirely agree with me since he suggested the other options), was to
adapt the pgBackRest JSON parser, which isn't really all that much code.


It's not that I didn't agree, it's just that the pgBackRest code does 
use mem contexts, the type system, etc.  After looking at some other 
solutions with similar amounts of code I thought they might be more 
acceptable.  At least it seemed like a good idea to throw it out there.



Even so, David's offered to adjust the code to use the frontend's memory
management (*cough* malloc()..), and error handling/logging, and he had
some idea for Variadics (or maybe just pulling the backend's Datum
system in..?  He could answer better), and basically write a frontend
JSON parser for PG without too much code, no external dependencies, and
to make sure it answers this requirement, and I've agreed that he can
spend some time on that instead of pgBackRest to get us through this, if
everyone else is agreeable to the idea.  


To keep it simple I think we are left with callbacks or a somewhat 
static "what's the next datum" kind of approach.  I think the latter 
could get us through a release or two while we make improvements.
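
To make the second option concrete, here is a hedged sketch of what such an
interface could look like; every name below is a hypothetical illustration,
not an existing API or an actual proposal:

typedef struct JsonReadState JsonReadState;     /* opaque parser state */

typedef enum
{
    JSON_TYPE_STRING,
    JSON_TYPE_NUMBER,
    JSON_TYPE_BOOL,
    JSON_TYPE_NULL,
    JSON_TYPE_OBJECT_BEGIN,
    JSON_TYPE_OBJECT_END,
    JSON_TYPE_ARRAY_BEGIN,
    JSON_TYPE_ARRAY_END
} JsonType;

/* peek at what comes next, then consume it with a typed getter */
extern JsonType jsonReadTypeNext(JsonReadState *state);
extern bool jsonReadBool(JsonReadState *state);
extern const char *jsonReadStr(JsonReadState *state);

The caller asks "what's the next datum" and then pulls it with the matching
getter, which keeps the parser itself free of callbacks.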



Obviously this isn't intended
to box anyone in- if there turns out even after the code's been written
to be some fatal issue with using it, so be it, but we're offering to
help.


I'm happy to work up a prototype unless the consensus is that we 
absolutely don't want a second JSON parser in core.


Regards,
--
-David
da...@pgmasters.net




Re: [HACKERS] Block level parallel vacuum

2020-01-14 Thread Masahiko Sawada
On Tue, 14 Jan 2020 at 21:43, Amit Kapila  wrote:
>
> On Tue, Jan 14, 2020 at 10:04 AM Masahiko Sawada
>  wrote:
> >
> > On Mon, 13 Jan 2020 at 12:50, Amit Kapila  wrote:
> > >
> > > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
> > >  wrote:
> > >
> > > Okay, would it better if we get rid of this variable and have code like 
> > > below?
> > >
> > > /* Skip the indexes that can be processed by parallel workers */
> > > if ( !(get_indstats(lps->lvshared, i) == NULL ||
> > > skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
> > > continue;
> >
> > Make sense to me.
> >
>
> I have changed the comment and condition to make it a positive test so
> that it is more clear.
>
> > > ...
> > > > Agreed. But with the updated patch the PARALLEL option without the
> > > > parallel degree doesn't display warning because params->nworkers = 0
> > > > in that case. So how about restoring params->nworkers at the end of
> > > > vacuum_rel()?
> > > >
> > >
> > > I had also thought on those lines, but I was not entirely sure about
> > > this resetting of workers.  Today, thinking about it again, the idea
> > > Mahendra is suggesting, that is, giving an error if the parallel
> > > degree is not specified, seems reasonable to me.  This means
> > > Vacuum (parallel), Vacuum (parallel) , etc. will give an
> > > error "parallel degree must be specified".  This idea has merit as now
> > > we are supporting a parallel vacuum by default, so a 'parallel' option
> > > without a parallel degree doesn't have any meaning.  If we do that,
> > > then we don't need to do anything additional about the handling of
> > > temp tables (other than what patch is already doing) as well.  What do
> > > you think?
> > >
> >
> > Good point! Agreed.
> >
>
> Thanks, changed accordingly.
>

Thank you for updating the patch! I have a few small comments. The
rest looks good to me.

1.
+ * Compute the number of parallel worker processes to request.  Both index
+ * vacuum and index cleanup can be executed with parallel workers.  The
+ * relation size of the table don't affect the parallel degree for now.

s/don't/doesn't/

2.
@@ -383,6 +435,7 @@ vacuum(List *relations, VacuumParams *params,
VacuumPageHit = 0;
VacuumPageMiss = 0;
VacuumPageDirty = 0;
+   VacuumSharedCostBalance = NULL;

I think we can initialize VacuumCostBalanceLocal and
VacuumActiveNWorkers here. We use these parameters during parallel
index vacuum and reset at the end but we might want to initialize them
for safety.
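
For example, a sketch of the suggested initialization (variable names as in
the patch under review):

   VacuumPageHit = 0;
   VacuumPageMiss = 0;
   VacuumPageDirty = 0;
   VacuumSharedCostBalance = NULL;
   VacuumCostBalanceLocal = 0;
   VacuumActiveNWorkers = NULL;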

3.
+   /* Set cost-based vacuum delay */
+   VacuumCostActive = (VacuumCostDelay > 0);
+   VacuumCostBalance = 0;
+   VacuumPageHit = 0;
+   VacuumPageMiss = 0;
+   VacuumPageDirty = 0;
+   VacuumSharedCostBalance = &(lvshared->cost_balance);
+   VacuumActiveNWorkers = &(lvshared->active_nworkers);

VacuumCostBalanceLocal also needs to be initialized.

4.
The regression tests don't have a test case for PARALLEL 0.

Since I guess you have already modified the code locally, I've attached a
diff containing the above review comments.

Regards,

--
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


review_v47_masahiko.patch
Description: Binary data


Re: pause recovery if pitr target not reached

2020-01-14 Thread Kyotaro Horiguchi
FWIW, I restate this (perhaps) more clearly.

At Wed, 15 Jan 2020 11:02:24 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> recovery_target_* is not cleared after startup. If a server crashed
> just after the last shutdown checkpoint, any recovery_target_* setting
> prevents the server from starting regardless of its value.

recovery_target_* is not automatically cleared after a successful
archive recovery.  After that, if the server crashed just after the
last shutdown checkpoint, any recovery_target_* setting prevents the
server from starting regardless of its value.

> > LOG:  database system was not properly shut down; automatic recovery in 
> > progress
> > LOG:  invalid record length at 0/9000420: wanted 24, got 0
> (recovery is skipped)
> > FATAL:  recovery ended before configured recovery target was reached
> 
> I think we should ignore the setting during crash recovery. Targeted
> recovery mode is documented as a feature of archive recovery.  Perhaps
> ArchiveRecoveryRequested is needed in the condition.
> 
> > if (ArchiveRecoveryRequested &&
> > recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint)

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Complete data erasure

2020-01-14 Thread Tom Lane
"asaba.takan...@fujitsu.com"  writes:
> I want to add a feature to erase data so that it cannot be restored,
> because this prevents attackers from stealing data from released data areas.

I think this is fairly pointless, unfortunately.

Dropping or truncating tables is as much as we can do without making
unwarranted assumptions about the filesystem's behavior.  You say
you want to zero out the files first, but what will that accomplish
on copy-on-write filesystems?

Even granting that zeroing our storage files is worth something,
it's not worth much if there are more copies of the data elsewhere.
And any well-run database is going to have backups, or standby servers,
or both.  There's no way for the primary node to force standbys to erase
themselves (and it'd be a different sort of security hazard if there
were).

Also to the point: if your assumption is that an attacker has access
to the storage medium at a sufficiently low level that they can examine
previously-deleted files, what's stopping them from reading the data
*before* it's deleted?

So I think doing anything useful in this area is a bit outside
Postgres' remit.  You'd need to be thinking at an operating-system
or hardware level.

regards, tom lane




Re: Complete data erasure

2020-01-14 Thread Kyotaro Horiguchi
Hello, Asaba-san.

At Wed, 15 Jan 2020 01:31:44 +, "asaba.takan...@fujitsu.com" 
 wrote in 
> Hello hackers,
> 
> I want to add a feature to erase data so that it cannot be restored,
> because this prevents attackers from stealing data from released data areas.
> 
> - Background
> International security policies require that measures be taken against the
> above threat.  One example is the "Base Protection Profile for Database
> Management Systems Version 2.12 (DBMS PP)" [1], based on ISO 15408.
> If the security is improved, PostgreSQL will be more likely to be adopted
> by security-conscious procurers such as public agencies.
> 
> - Feature
> This feature erases a data area just before it is returned to the OS
> (“erase” here means overwriting the data area to hide its contents),
> because there is a risk that the data will be restored by attackers if it
> is returned to the OS without being overwritten.
> The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
> I want users to be able to customize the erasure method for their security 
> policies.

shred(1) or wipe(1) don't seem to contribute to the objective on
journaled or copy-on-write file systems. I'm not sure, but maybe the
same is true for read-modify-write devices like SSDs. I'm not sure
about SDelete, but in any case replacing unlink() with something like
'system("shred")' leads to significant performance degradation.

man 1 wipe says (https://linux.die.net/man/1/wipe) : (shred has a
similar note.)

> NOTE ABOUT JOURNALING FILESYSTEMS AND SOME RECOMMENDATIONS (JUNE 2004)
> Journaling filesystems (such as Ext3 or ReiserFS) are now being used
> by default by most Linux distributions. No secure deletion program
> that does filesystem-level calls can sanitize files on such
> filesystems, because sensitive data and metadata can be written to the
> journal, which cannot be readily accessed. Per-file secure deletion is
> better implemented in the operating system.


WAL files contain copies of such sensitive information, which is not
covered by the proposal, and neither are temporary files.  If the system
wants to be recoverable after corruption, it must copy such WAL files to
the archive.

Currently there's a discussion on transparent data encryption covering
all of the above cases, on and off this mailing list.  It is different
from the device-level encryption mentioned in the man page.  Doesn't
that fit the requirement?

https://www.postgresql.org/message-id/CALS%2BJ3-57cL%3Djz_eT9uxiLa8CAh5BE3-HcQvXQBz0ScMjag4Zg%40mail.gmail.com


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [HACKERS] Block level parallel vacuum

2020-01-14 Thread Mahendra Singh Thalor
On Tue, 14 Jan 2020 at 17:16, Mahendra Singh Thalor  wrote:
>
> On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor  
> wrote:
> >
> > On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
> >  wrote:
> > >
> > > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor  
> > > wrote:
> > > >
> > > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov  wrote:
> > > > >
> > > > > Hi
> > > > > Thank you for update! I looked again
> > > > >
> > > > > (vacuum_indexes_leader)
> > > > > +   /* Skip the indexes that can be processed by parallel 
> > > > > workers */
> > > > > +   if (!skip_index)
> > > > > +   continue;
> > > > >
> > > > > Does the variable name skip_index not confuse here? Maybe rename to 
> > > > > something like can_parallel?
> > > > >
> > > >
> > > > Again I looked into code and thought that somehow if we can add a
> > > > boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> > > > identify that this index is supporting parallel vacuum or not, then it
> > > > will be easy to skip those indexes, and we will not call
> > > > skip_parallel_vacuum_index multiple times (from vacuum_indexes_leader and
> > > > parallel_vacuum_index)
> > > > We can have a linked list of non-parallel supported indexes, then
> > > > directly we can pass to vacuum_indexes_leader.
> > > >
> > > > Ex: let suppose we have 5 indexes into a table.  If before launching
> > > > parallel workers, if we can add boolean flag(can_parallel)
> > > > IndexBulkDeleteResult structure to identify that this index is
> > > > supporting parallel vacuum or not.
> > > > Let index 1, 4 are not supporting parallel vacuum so we already have
> > > > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > > > parallel_vacuum_index will process these indexes and rest will be
> > > > processed by parallel workers. If parallel worker found that
> > > > can_parallel is false, then it will skip that index.
> > > >
> > > > As per my understanding, if we implement this, then we can avoid
> > > > multiple function calling of skip_parallel_vacuum_index and if there
> > > > is no index which can't perform parallel vacuum, then we will not
> > > > call vacuum_indexes_leader as head of list pointing to null. (we can
> > > > save unnecessary calling of vacuum_indexes_leader)
> > > >
> > > > Thoughts?
> > > >
> > >
> > > We skip not only indexes that don't support parallel index vacuum but
> > > also indexes supporting it depending on vacuum phase. That is, we
> > > could skip different indexes at different vacuum phase. Therefore with
> > > your idea, we would need to have at least three linked lists for each
> > > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> > > that right?
> > >
> > > I think we can check if there are indexes that should be processed by
> > > the leader process before entering the loop in vacuum_indexes_leader
> > > by comparing nindexes_parallel_XXX of LVParallelState to the number of
> > > indexes but I'm not sure it's effective since the number of indexes on
> > > a table should be small.
> > >
> >
> > Hi,
> >
> > +/*
> > + * Try to initialize the parallel vacuum if requested
> > + */
> > +if (params->nworkers >= 0 && vacrelstats->useindex)
> > +{
> > +/*
> > + * Since parallel workers cannot access data in temporary tables, 
> > we
> > + * can't perform parallel vacuum on them.
> > + */
> > +if (RelationUsesLocalBuffers(onerel))
> > +{
> > +/*
> > + * Give warning only if the user explicitly tries to perform a
> > + * parallel vacuum on the temporary table.
> > + */
> > +if (params->nworkers > 0)
> > +ereport(WARNING,
> > +(errmsg("disabling parallel option of vacuum
> > on \"%s\" --- cannot vacuum temporary tables in parallel",
> >
> > From the v45 patch, we moved the warning for temporary tables into the
> > "params->nworkers >= 0 && vacrelstats->useindex" check, so if a table
> > doesn't have any index, then we are not giving any warning. I think we
> > should give a warning for all temporary tables if a parallel degree is
> > given. (Till the v44 patch, we were giving a warning for all temporary
> > tables, with and without indexes.)
> >
> > Thoughts?
>
> Hi,
> I did some more review.  Below is the 1 review comment for v46-0002.
>
> +/*
> + * Initialize the state for parallel vacuum
> + */
> +if (params->nworkers >= 0 && vacrelstats->useindex)
> +{
> +/*
> + * Since parallel workers cannot access data in temporary tables, we
> + * can't perform parallel vacuum on them.
> + */
> +if (RelationUsesLocalBuffers(onerel)
>
> In the above check, we should add a "nindexes > 1" check so that if there is only 1 
> index, then we will not call begin_parallel_vacuum.

I think an "if (params->nworkers >= 0 && nindexes > 1)" check will be
enough here.
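
That is, a sketch of the suggested gating (names as in the quoted patch):

    /* only consider parallel vacuum when there is more than one index */
    if (params->nworkers >= 0 && nindexes > 1)
    {
        /* ... initialize the state for parallel vacuum ... */
    }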


Re: Improve errors when setting incorrect bounds for SSL protocols

2020-01-14 Thread Michael Paquier
On Tue, Jan 14, 2020 at 11:21:53AM +0100, Daniel Gustafsson wrote:
> On 14 Jan 2020, at 04:54, Michael Paquier  wrote:
>> Please note that OpenSSL 1.1.0 has added two routines to be able to
>> get the min/max protocols set in a context, called
>> SSL_CTX_get_min/max_proto_version.  Thinking about older versions of
>> OpenSSL I think that it is better to use
>> ssl_protocol_version_to_openssl to do the parsing work.  I also found
>> that it is easier to check for compatible versions after setting both
>> bounds in the SSL context, so as there is no need to worry about
>> invalid values depending on the build of OpenSSL used.
> 
> I'm not convinced that it's a good idea to check for incompatible protocol
> range in the OpenSSL backend.  We've spent a lot of energy to make the TLS 
> code
> library agnostic and pluggable, and since identifying a basic configuration
> error isn't OpenSSL specific I think it should be in the guc code.  That would
> keep the layering as well as ensure that we don't mistakenly treat this
> differently should we get a second TLS backend.

Good points.  And the get routines are not that portable in OpenSSL
either even if HEAD supports 1.0.1 and newer versions...  Attached is
an updated patch which uses a GUC check for both parameters, and
provides a hint on top of the original error message.  The SSL context
does not get reloaded if there is an error, so the errors from OpenSSL
cannot be triggered as far as I checked (after mixing a couple of
correct and incorrect combinations manually).
--
Michael
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index e5f8a1301f..d97fe3beb2 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -204,6 +204,10 @@ static const char *show_log_file_mode(void);
 static const char *show_data_directory_mode(void);
 static bool check_backtrace_functions(char **newval, void **extra, GucSource source);
 static void assign_backtrace_functions(const char *newval, void *extra);
+static bool check_ssl_min_protocol_version(int *newval, void **extra,
+		   GucSource source);
+static bool check_ssl_max_protocol_version(int *newval, void **extra,
+		   GucSource source);
 static bool check_recovery_target_timeline(char **newval, void **extra, GucSource source);
 static void assign_recovery_target_timeline(const char *newval, void *extra);
 static bool check_recovery_target(char **newval, void **extra, GucSource source);
@@ -4594,7 +4598,7 @@ static struct config_enum ConfigureNamesEnum[] =
 		&ssl_min_protocol_version,
 		PG_TLS1_2_VERSION,
 		ssl_protocol_versions_info + 1, /* don't allow PG_TLS_ANY */
-		NULL, NULL, NULL
+		check_ssl_min_protocol_version, NULL, NULL
 	},
 
 	{
@@ -4606,7 +4610,7 @@ static struct config_enum ConfigureNamesEnum[] =
 		&ssl_max_protocol_version,
 		PG_TLS_ANY,
 		ssl_protocol_versions_info,
-		NULL, NULL, NULL
+		check_ssl_max_protocol_version, NULL, NULL
 	},
 
 	/* End-of-list marker */
@@ -11603,6 +11607,49 @@ assign_backtrace_functions(const char *newval, void *extra)
 	backtrace_symbol_list = (char *) extra;
 }
 
+static bool
+check_ssl_min_protocol_version(int *newval, void **extra, GucSource source)
+{
+	int  new_ssl_min_protocol_version = *newval;
+
+	/* PG_TLS_ANY is not supported for the minimum bound */
+	Assert(new_ssl_min_protocol_version > PG_TLS_ANY);
+
+	if (ssl_max_protocol_version &&
+		new_ssl_min_protocol_version > ssl_max_protocol_version)
+	{
+		GUC_check_errhint("\"%s\" cannot be higher than \"%s\".",
+		 "ssl_min_protocol_version",
+		 "ssl_max_protocol_version");
+		GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE);
+		return false;
+	}
+
+	return true;
+}
+
+static bool
+check_ssl_max_protocol_version(int *newval, void **extra, GucSource source)
+{
+	int  new_ssl_max_protocol_version = *newval;
+
+	/* if PG_TLS_ANY, there is no need to check the bounds */
+	if (new_ssl_max_protocol_version == PG_TLS_ANY)
+		return true;
+
+	if (ssl_min_protocol_version &&
+		ssl_min_protocol_version > new_ssl_max_protocol_version)
+	{
+		GUC_check_errhint("\"%s\" cannot be lower than \"%s\".",
+		 "ssl_max_protocol_version",
+		 "ssl_min_protocol_version");
+		GUC_check_errcode(ERRCODE_INVALID_PARAMETER_VALUE);
+		return false;
+	}
+
+	return true;
+}
+
 static bool
 check_recovery_target_timeline(char **newval, void **extra, GucSource source)
 {
diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index 83fcd5e839..7931ebc067 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -13,7 +13,7 @@ use SSLServer;
 
 if ($ENV{with_openssl} eq 'yes')
 {
-	plan tests => 84;
+	plan tests => 86;
 }
 else
 {
@@ -97,6 +97,22 @@ command_ok(
 	'restart succeeds with password-protected key file');
 $node->_update_pid(1);
 
+# Test compatibility of SSL protocols.
+# TLSv1.1 is lower than TLSv1.2, so it won't work.
+$node->append_conf('postgresql.conf',
+		   qq{ssl_min_protocol_version='TLSv1.2'

Re: pause recovery if pitr target not reached

2020-01-14 Thread Kyotaro Horiguchi
At Tue, 14 Jan 2020 21:13:51 +0100, Peter Eisentraut 
 wrote in 
> On 2019-12-11 12:40, Leif Gunnar Erlandsen wrote:
> > Adding patch written for 13dev from git
> > "Michael Paquier"  skrev 1. desember 2019
> > kl. 03:08:
> > 
> >> On Fri, Nov 22, 2019 at 11:26:59AM +, Leif Gunnar Erlandsen wrote:
> >>
> >>> No it does not. It works well to demonstrate its purpose though.
> >>> And it might be to stop with FATAL would be more correct.
> >>
> >> This is still under active discussion. Please note that the latest
> >> patch does not apply, so a rebase would be nice to have. I have moved
> >> the patch to next CF, waiting on author.
> 
> I reworked your patch a bit.  I changed the outcome to be an error, as
> was discussed.  I also added tests and documentation.  Please take a
> look.

It doesn't show how far the last recovery actually reached. I don't
think activating resource managers harms. Don't we check the
not-reached condition *only* after the else block of the "if (record
!= NULL)" statement?

> /* just have to read next record after CheckPoint */
> record = ReadRecord(xlogreader, InvalidXLogRecPtr, LOG, false);
>   }
>
>   if (record != NULL)
>   {
>   ...
>   }
>   else
>   {
>  /* there are no WAL records following the checkpoint */
>  ereport(LOG,
>  (errmsg("redo is not required")));
>   }
>
+   if (recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint)
..


recovery_target_* is not cleared after startup. If a server crashed
just after the last shutdown checkpoint, any recovery_target_* setting
prevents the server from starting regardless of its value.

> LOG:  database system was not properly shut down; automatic recovery in 
> progress
> LOG:  invalid record length at 0/9000420: wanted 24, got 0
(recovery is skipped)
> FATAL:  recovery ended before configured recovery target was reached

I think we should ignore the setting during crash recovery. Targeted
recovery mode is documented as a feature of archive recovery.  Perhaps
ArchiveRecoveryRequested is needed in the condition.

> if (ArchiveRecoveryRequested &&
> recoveryTarget != RECOVERY_TARGET_UNSET && !reachedStopPoint)
 
regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: logical decoding : exceeded maxAllocatedDescs for .spill files

2020-01-14 Thread Amit Kapila
On Tue, Jan 14, 2020 at 9:58 AM Amit Kapila  wrote:
>
> On Sun, Jan 12, 2020 at 8:18 AM Amit Kapila  wrote:
> >
> > On Sat, Jan 11, 2020 at 11:16 AM Tom Lane  wrote:
> > >
> >
> > > * The seeming bug in v10 suggests that we aren't testing large enough
> > > logical-decoding cases, or at least aren't noticing leaks in that
> > > area.  I'm not sure what a good design is for testing that.  I'm not
> > > thrilled with just using a larger (and slower) test case, but it's
> > > not clear to me how else to attack it.
> > >
> >
> > It is not clear to me either at this stage, but I think we can decide
> > that after chasing the issue in v10.  My current plan is to revert
> > this test and make a note of the memory leak problem found (probably
> > track in Older Bugs section of PostgreSQL 12 Open Items).
> >
>
> Pushed the revert
>

Sidewinder is green now on back branches.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Complete data erasure

2020-01-14 Thread asaba.takan...@fujitsu.com
Hello hackers,

I want to add a feature to erase data so that it cannot be restored,
because this prevents attackers from stealing data from released data areas.

- Background
International security policies require that measures be taken against the
above threat.  One example is the "Base Protection Profile for Database
Management Systems Version 2.12 (DBMS PP)" [1], based on ISO 15408.
If the security is improved, PostgreSQL will be more likely to be adopted
by security-conscious procurers such as public agencies.

- Feature
This feature erases a data area just before it is returned to the OS
(“erase” here means overwriting the data area to hide its contents),
because there is a risk that the data will be restored by attackers if it
is returned to the OS without being overwritten.
The erase timing is when DROP, VACUUM, TRUNCATE, etc. are executed.
I want users to be able to customize the erasure method for their security 
policies.

- Implementation
My idea is to add a new parameter, erase_command, to postgresql.conf.
The command that users set in this parameter is executed just before 
unlink(path) or ftruncate(fd, 0) is called.
For example, the command is shred on Linux and SDelete on Windows.

When erase_command is set, VACUUM does not truncate a file to a non-zero size,
because it's safer for users to return the entire file to the OS than to
return part of it.
Also, there is no standard tool that overwrites only part of a file.
With the above specification, users can easily and safely use this feature
with a standard tool that overwrites an entire file, such as shred.
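
As a concrete illustration only, here is a minimal sketch of the proposed hook
point, assuming a (hypothetical) string GUC named erase_command; error
handling and any %-style substitution are omitted:

static void
erase_then_unlink(const char *path)
{
    if (erase_command != NULL && erase_command[0] != '\0')
    {
        char        cmd[MAXPGPATH * 2];

        /* run the user-configured command on the file before unlinking it */
        snprintf(cmd, sizeof(cmd), "%s %s", erase_command, path);
        (void) system(cmd);     /* e.g. shred on Linux, SDelete on Windows */
    }

    unlink(path);
}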

Hope to hear your feedback and comments.

[1] https://www.commoncriteriaportal.org/files/ppfiles/pp0088V2b_pdf.pdf
P44 8.1.2

- Threat/Policy
A threat agent may use or manage TSF, bypassing the protection mechanisms of 
the TSF.

- TOE Security Objectives Addressing the Threat/Policy 
The TOE will ensure that any information contained in a protected resource 
within its Scope of Control 
is not inappropriately disclosed when the resource is reallocated.

- Rationale
diminishes this threat by ensuring that TSF data and user data is not persistent
when resources are released by one user/process and allocated to another 
user/process.

TOE: Target of Evaluation
TSF: TOE Security Functionality


Regards

--
Takanori Asaba





Re: base backup client as auxiliary backend process

2020-01-14 Thread Masahiko Sawada
On Tue, 14 Jan 2020 at 22:58, Peter Eisentraut
 wrote:
>
> On 2020-01-14 07:32, Masahiko Sawada wrote:
> > - Replication slot name used by this WAL receiver
> > + <para>
> > +  Replication slot name used by this WAL receiver.  This is only set if a
> > +  permanent replication slot is set using <xref
> > +  linkend="guc-primary-slot-name"/>.  Otherwise, the WAL receiver may use
> > +  a temporary replication slot (determined by <xref
> > +  linkend="guc-wal-receiver-create-temp-slot"/>), but these are not shown
> > +  here.
> > + </para>
> >
> > Now that the slot name is shown even if it's a temp slot, the above
> > documentation change needs to be updated. Other changes look good to
> > me.
>
> committed, thanks

Thank you for committing these patches.

Could you rebase the main patch that adds the base backup client as an
auxiliary backend process, since the previous version of the patch (v3)
conflicts with current HEAD?

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Add support for automatically updating Unicode derived files

2020-01-14 Thread Tom Lane
Peter Eisentraut  writes:
> Committed, thanks.

This patch is making src/tools/pginclude/headerscheck unhappy:

./src/include/common/unicode_combining_table.h:3: error: array type has 
incomplete element type

I guess that header needs another #include, or else you need to
move some declarations around.
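
For reference, a minimal illustration of that failure mode; the struct and
array names below are assumed from context and may not match the generated
header exactly:

    struct mbinterval;                      /* type is still incomplete here */

    static const struct mbinterval combining[] =    /* error: array type has
                                                     * incomplete element type */
    {
        /* entries elided */
    };

Including (or moving) the struct definition ahead of the array declaration
lets the header compile standalone.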

regards, tom lane




Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tom Lane
Tomas Vondra  writes:
> On Tue, Jan 14, 2020 at 05:37:53PM -0500, Tom Lane wrote:
>> I wonder just how messy it would be to add a column to pg_statistic_ext
>> whose type is the composite type "pg_statistic", and drop the required
>> data into that.  We've not yet used any composite types in the system
>> catalogs, AFAIR, but since pg_statistic_ext isn't a bootstrap catalog
>> it seems like we might be able to get away with it.

[ I meant pg_statistic_ext_data, obviously ]

> I don't know, but feels a bit awkward to store this type of stats into
> pg_statistic_ext, which was meant for multi-column stats. Maybe it'd
> work fine, not sure.

If we wanted to allow a single statistics object to contain data for
multiple expressions, we'd actually need that to be array-of-pg_statistic
not just pg_statistic.  Seems do-able, but on the other hand we could
just prohibit having more than one output column in the "query" for this
type of extended statistic.  Either way, this seems far less invasive
than either a new catalog or a new relation relkind (to say nothing of
needing both, which is where you seemed to be headed).

regards, tom lane




Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tomas Vondra

On Tue, Jan 14, 2020 at 05:37:53PM -0500, Tom Lane wrote:

Tomas Vondra  writes:

On Tue, Jan 14, 2020 at 04:52:44PM -0500, Tom Lane wrote:

The main issue for sticking the results into pg_statistic is that
the primary key there is (starelid, staattnum), and we haven't got
a suitable attnum.  I wouldn't much object to putting the data into
pg_statistic_ext_data, but it doesn't really have a suitable
rowtype ...



Well, that's why I proposed to essentially build a fake "relation" just
for this purpose. So we'd have a pg_class entry with a special relkind,
attnums and all that. And the expressions would be stored either in
pg_statistic_ext or in a new catalog. But maybe that's nonsense.


Seems pretty yucky.  I realize we've already got "fake relations" like
foreign tables and composite types, but the number of special cases
those create is very annoying.  And you still don't have anyplace to
put the expressions themselves in such a structure --- I hope you
weren't going to propose fake pg_index rows for that.



No, I wasn't going to propose fake pg_index rows.  I actually wrote
"stored either in pg_statistic_ext or in a new catalog", so I was
thinking about a new catalog (a dedicated and simplified copy of
pg_index).


I wonder just how messy it would be to add a column to pg_statistic_ext
whose type is the composite type "pg_statistic", and drop the required
data into that.  We've not yet used any composite types in the system
catalogs, AFAIR, but since pg_statistic_ext isn't a bootstrap catalog
it seems like we might be able to get away with it.



I don't know, but feels a bit awkward to store this type of stats into
pg_statistic_ext, which was meant for multi-column stats. Maybe it'd
work fine, not sure.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Unicode escapes with any backend encoding

2020-01-14 Thread Tom Lane
Andrew Dunstan  writes:
> Perhaps I expressed myself badly. What I meant was that we should keep
> the json and text escape rules in sync, as they are now. Since we're
> changing the text rules to allow resolvable non-ascii unicode escapes
> in non-utf8 locales, we should do the same for json.

Got it.  I'll make the patch do that in a little bit.

regards, tom lane




Re: Add FOREIGN to ALTER TABLE in pg_dump

2020-01-14 Thread Alvaro Herrera
On 2020-Jan-14, Tom Lane wrote:

> I can't get terribly excited about persuading that test to cover this
> trivial little bit of logic, but if you are, I won't stand in the way.

Hmm, that's a good point actually: the patch changed several places to
inject the FOREIGN keyword, so in order to cover them all it would need
several additional regexps, not just one.  I'm not sure that
002_pg_dump.pl is prepared to do that without unsightly contortions.

Anyway, other than that minor omission the patch seemed good to me, so I
don't oppose Tomas pushing the version I posted yesterday.  Or I can, if
he prefers that.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Avoid full GIN index scan when possible

2020-01-14 Thread Tom Lane
Alexander Korotkov  writes:
> On Tue, Jan 14, 2020 at 9:43 PM Tom Lane  wrote:
>> One thing I'm still not happy about is the comment in
>> collectMatchesForHeapRow.  v12 failed to touch that at all, so I tried to
>> fill it in, but I'm not sure if my explanation is good.

> I've tried to rephrase this comment making it better from my point of
> view.  It's hard for me to be sure about this, since I'm not native
> English speaker.  I'd like you to take a look on it.

Yeah, that's not great as-is.  Maybe like

+* All scan keys except excludeOnly require at least one entry to match.
+* excludeOnly keys are an exception, because their implied
+* GIN_CAT_EMPTY_QUERY scanEntry always matches.  So return "true"
+* if all non-excludeOnly scan keys have at least one match.

>> Also, if we know
>> that excludeOnly keys are going to be ignored, can we save any work in
>> the main loop of that function?

> It doesn't look so for me.  We still need to collect matches for
> consistent function call afterwards.

Ah, right.

> I also had concerns about how excludeOnly keys work with lossy pages.
> I didn't find exact error.  But I've added code, which skips
> excludeOnly keys checks for lossy pages.  They aren't going to exclude
> any lossy page anyway.  So, we can save some resources by skipping
> this.

Hmm ... yeah, these test cases are not large enough to exercise any
lossy-page cases, are they?  I doubt we should try to make a new regression
test that is that big.  (But if there is one already, maybe we could add
more test queries with it, instead of creating whole new tables?)

regards, tom lane




Re: Avoid full GIN index scan when possible

2020-01-14 Thread Alexander Korotkov
On Wed, Jan 15, 2020 at 1:47 AM Alexander Korotkov
 wrote:
> I also had concerns about how excludeOnly keys work with lossy pages.
> I didn't find exact error.  But I've added code, which skips
> excludeOnly keys checks for lossy pages.  They aren't going to exclude
> any lossy page anyway.  So, we can save some resources by skipping
> this.

I also found the way we combine lossy pages and exact TIDs pretty
asymmetric.  Imagine one scan key A matches a lossy page, while
another key B have set of matching TIDs on the same page.  If key A
goes first, we will report a lossy page.  But if key B goes first, we
will report a set of TIDs with recheck set.  It would be nice to
improve.  But this is definitely subject of a separate patch.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: aggregate crash

2020-01-14 Thread Tom Lane
Andres Freund  writes:
> On 2020-01-14 17:01:01 -0500, Tom Lane wrote:
>> But I agree that not checking null-ness
>> explicitly is kind of unsafe.  We've never before had any expectation
>> that the Datum value of a null is anything in particular.

> I'm still not sure I actually fully understand the bug. It's obvious how
> returning the input value again could lead to memory not being freed (so
> that leak seems to go all the way back). And similarly, since the
> introduction of expanded objects, it can also lead to the expanded
> object not being deleted.
> But that's not the problem causing the crash here. What I think must
> instead be the problem is that pergroupstate->transValueIsNull, but
> pergroupstate->transValue is set to something looking like a
> pointer. Which caused us not to datumCopy() a new transition value into
> a long lived context. and then a later transition causes us to free the
> short-lived value?

Yeah, I was kind of wondering that too.  While formally the Datum value
for a null is undefined, I'm not aware offhand of any functions that
wouldn't return zero --- and this would have to be an aggregate transition
function doing so, which reduces the universe of candidates quite a lot.
Plus there's the question of how often a transition function would return
null for non-null input at all.
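
As a concrete illustration (a standalone toy with invented names, not the
actual executor code), a null-aware variant of that check would only take
the pointer-identity shortcut when neither the old nor the new value is
null, e.g.:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef uintptr_t Datum;                /* stand-in for the real Datum */

typedef struct
{
    Datum   transValue;                 /* may be stale while ...IsNull is true */
    bool    transValueIsNull;
} PerGroupSketch;

/* simulate copying a pass-by-ref value into long-lived storage */
static Datum
copy_value(Datum d)
{
    const char *src = (const char *) d;
    char       *dst = malloc(strlen(src) + 1);

    strcpy(dst, src);
    return (Datum) dst;
}

/*
 * Treat equal pointers as "same value, nothing to do" only when neither
 * side is null; otherwise copy the new value and free the old one.
 */
static void
advance_sketch(PerGroupSketch *pergroup, Datum newVal, bool newValIsNull)
{
    if (!newValIsNull && !pergroup->transValueIsNull &&
        newVal == pergroup->transValue)
        return;                         /* genuinely unchanged */

    if (!newValIsNull)
        newVal = copy_value(newVal);
    if (!pergroup->transValueIsNull)
        free((void *) pergroup->transValue);

    pergroup->transValue = newVal;
    pergroup->transValueIsNull = newValIsNull;
}

int
main(void)
{
    PerGroupSketch g = {0, true};       /* transition value starts out null */

    advance_sketch(&g, (Datum) "state one", false);
    advance_sketch(&g, (Datum) "state two", false);
    advance_sketch(&g, (Datum) 0, true); /* a NULL result frees the old value */
    printf("is null: %d\n", (int) g.transValueIsNull);
    return 0;
}

The only point being that the null flags are consulted before the Datum
values are compared or freed.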

Could we see a test case that provokes this crash, even if it doesn't
do so reliably?

regards, tom lane




Re: Disallow cancellation of waiting for synchronous replication

2020-01-14 Thread Andres Freund
Hi,

On 2020-01-12 16:18:38 +0500, Andrey Borodin wrote:
> > On 11 Jan 2020, at 7:34, Bruce Momjian  wrote:
> > 
> > Actually, it might be worse than that.  In my reading of
> > RecordTransactionCommit(), we do this:
> > 
> > write to WAL
> > flush WAL (durable)
> > make visible to other backends
> > replicate
> > communicate to the client
> > 
> > I think this means we make the transaction commit visible to all
> > backends _before_ we replicate it, and potentially wait until we get a
> > replication reply to return SUCCESS to the client.
> No. Data is not visible to other backend when we await sync rep.

Yea, as the relevant comment in RecordTransactionCommit() says:

     * Note that at this stage we have marked clog, but still show as running
     * in the procarray and continue to hold locks.
     */
    if (wrote_xlog && markXidCommitted)
        SyncRepWaitForLSN(XactLastRecEnd, true);


But it's worthwhile to emphasize that data at that stage actually *can*
be visible on standbys. The fact that the transaction still shows as
running via procarray, on the primary, does not influence visibility
determinations on the standby.

Greetings,

Andres Freund




Re: Add FOREIGN to ALTER TABLE in pg_dump

2020-01-14 Thread Tom Lane
Alvaro Herrera  writes:
>> On Mon, Jan 13, 2020 at 7:52 PM Tom Lane  wrote:
>>> Isn't the change in the TAP test output sufficient?

> Yeah, I think there should be at least one regexp in t/002_pg_dump.pl to
> verify ALTER FOREIGN TABLE is being produced.
> I wonder if Tom is thinking about Luis' other pg_dump patch for foreign
> tables, which includes some changes to src/test/modules/test_pg_dump.

No, I was just reacting to the comment that the TAP test was failing,
and assuming that that meant the patch had already changed the expected
output.  Looking at the patch now, I suppose that just means it had
incautiously changed whitespace or something for the non-foreign case.

I can't get terribly excited about persuading that test to cover this
trivial little bit of logic, but if you are, I won't stand in the way.

regards, tom lane




Re: Avoid full GIN index scan when possible

2020-01-14 Thread Alexander Korotkov
Hi!

On Tue, Jan 14, 2020 at 9:43 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > Updated patch is attached.  It contains more comments as well as commit 
> > message.
>
> I reviewed this a little bit.  I agree this seems way more straightforward
> than the patches we've been considering so far.  I wasn't too happy with
> the state of the comments, so I rewrote them a bit in the attached v13.

Thank you!

> One thing I'm still not happy about is the comment in
> collectMatchesForHeapRow.  v12 failed to touch that at all, so I tried to
> fill it in, but I'm not sure if my explanation is good.

I've tried to rephrase this comment, making it better from my point of
view.  It's hard for me to be sure about this, since I'm not a native
English speaker.  I'd like you to take a look at it.

> Also, if we know
> that excludeOnly keys are going to be ignored, can we save any work in
> the main loop of that function?

It doesn't look so to me.  We still need to collect matches for
the consistent function call afterwards.  We may skip calling the
consistent function for excludeOnly keys by forcing a recheck.  But
that's not going to be a plain win.

I thought about a different optimization.  We now check for at least one
matching entry.  Can we check for at least one *required* entry?  It
seems we can save some consistent function calls.

> The test cases needed some work too.  Notably, some of the places where
> you tried to use EXPLAIN ANALYZE are unportable because they expose "Heap
> Blocks" counts that are not stable.  (I checked the patch on a 32-bit
> machine and indeed some of these failed.)  While it'd be possible to work
> around that by filtering the EXPLAIN output, that would not be any simpler
> or faster than our traditional style of just doing a plain EXPLAIN and a
> separate execution.

Thanks!

> It troubles me a bit as well that the test cases don't really expose
> any difference between patched and unpatched code --- I checked, and
> they "passed" without applying any of the code changes.  Maybe there's
> not much to be done about that, since after all this is an optimization
> that's not supposed to change any query results.

Yep, it seems like we can't do much in this field unless we're going
to expose too much of the internals at the user level.

I also had concerns about how excludeOnly keys work with lossy pages.
I didn't find an exact error.  But I've added code which skips
excludeOnly key checks for lossy pages.  They aren't going to exclude
any lossy page anyway.  So, we can save some resources by skipping
this.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


0001-Avoid-GIN-full-scan-for-empty-ALL-keys-v14.patch
Description: Binary data


Re: Unicode escapes with any backend encoding

2020-01-14 Thread Andrew Dunstan
On Wed, Jan 15, 2020 at 7:55 AM Tom Lane  wrote:
>
> Andrew Dunstan  writes:
> > On Wed, Jan 15, 2020 at 4:25 AM Chapman Flack  wrote:
> >> On 1/14/20 10:10 AM, Tom Lane wrote:
> >>> to me that this error is just useless pedantry.  As long as the DB
> >>> encoding can represent the desired character, it should be transparent
> >>> to users.
>
> >> That's my position too.
>
> > and mine.
>
> I'm confused --- yesterday you seemed to be against this idea.
> Have you changed your mind?
>
> I'll gladly go change the patch if people are on board with this.
>
>

Perhaps I expressed myself badly. What I meant was that we should keep
the json and text escape rules in sync, as they are now. Since we're
changing the text rules to allow resolvable non-ascii unicode escapes
in non-utf8 locales, we should do the same for json.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Why is pq_begintypsend so slow?

2020-01-14 Thread Andres Freund
Hi,

On 2020-01-13 15:18:04 -0800, Andres Freund wrote:
> On 2020-01-11 22:32:45 -0500, Tom Lane wrote:
> > I wrote:
> > > I saw at this point that the remaining top spots were
> > > enlargeStringInfo and appendBinaryStringInfo, so I experimented
> > > with inlining them (again, see patch below).  That *did* move
> > > the needle: down to 72691 ms, or 20% better than HEAD.
> >
> > Oh ... marking the test in the inline part of enlargeStringInfo()
> > as unlikely() helps quite a bit more: 66100 ms, a further 9% gain.
> > Might be over-optimizing for this particular case, perhaps
> >
> > (But ... I'm not finding these numbers to be super reproducible
> > across different ASLR layouts.  So take it with a grain of salt.)
>
> FWIW, I've also observed, in another thread (the node func generation
> thing [1]), that inlining enlargeStringInfo() helps a lot, especially
> when inlining some of its callers. Moving e.g. appendStringInfo() inline
> allows the compiler to sometimes optimize away the strlen. But if
> e.g. an inlined appendBinaryStringInfo() still calls enlargeStringInfo()
> unconditionally, successive appends cannot optimize away memory accesses
> for ->len/->data.

With a set of patches doing so, int4send itself is not a significant
factor for my test benchmark [1] anymore. The assembly looks about as
good as one could hope, I think:

# save rbx on the stack
   0x004b7f90 <+0>:     push   %rbx
   0x004b7f91 <+1>:     sub    $0x20,%rsp
# store integer to be sent into rbx
   0x004b7f95 <+5>:     mov    0x20(%rdi),%rbx
# palloc length argument
   0x004b7f99 <+9>:     mov    $0x9,%edi
   0x004b7f9e <+14>:    callq  0x5d9aa0 
# store integer in buffer (ebx is 4 byte portion of rbx)
   0x004b7fa3 <+19>:    movbe  %ebx,0x4(%rax)
# store varlena header
   0x004b7fa8 <+24>:    movl   $0x20,(%rax)
# restore stack and rbx registers
   0x004b7fae <+30>:    add    $0x20,%rsp
   0x004b7fb2 <+34>:    pop    %rbx
   0x004b7fb3 <+35>:    retq

All the $0x20 constants are a bit confusing, but they just happen to be
the same for int4send. It's the size of the stack frame,
offset for FunctionCallInfoBaseData->args[0], the varlena header (and then the
stack frame again) respectively.

Note that I had to annotate palloc with __attribute__((malloc)) to make
the compiler understand that palloc's returned value will not alias with
anything problematic (e.g. the potential of aliasing with fcinfo
prevents optimizing to the above without that annotation).  I think such
annotations would be a good idea anyway, precisely because they allow
the compiler to optimize code significantly better.
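
For illustration, a minimal standalone sketch of that kind of annotation
(invented names, not the actual palloc declaration; GCC/Clang only):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#if defined(__GNUC__) || defined(__clang__)
#define RETURNS_FRESH_MEMORY __attribute__((malloc))
#else
#define RETURNS_FRESH_MEMORY
#endif

/* the attribute promises the result aliases nothing the caller can see */
static void *sketch_alloc(size_t size) RETURNS_FRESH_MEMORY;

static void *
sketch_alloc(size_t size)
{
    void   *p = malloc(size);

    if (p == NULL)
        abort();                    /* a real allocator would report an error */
    return p;
}

/* build a small header-plus-payload buffer, loosely int4send-shaped */
static char *
pack_int32(int value)
{
    char   *buf = sketch_alloc(8);

    /*
     * Because sketch_alloc is marked malloc, the compiler may assume these
     * stores cannot clobber the object "value" was loaded from, so nothing
     * has to be reloaded around them.
     */
    memset(buf, 0, 4);              /* header placeholder */
    memcpy(buf + 4, &value, 4);     /* payload */
    return buf;
}

int
main(void)
{
    char   *d = pack_int32(42);
    int     out;

    memcpy(&out, d + 4, 4);
    printf("%d\n", out);
    free(d);
    return 0;
}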


These together yield about a 1.8x speedup for me. The profile shows
that the overhead now is overwhelmingly elsewhere:
+   26.30%  postgres  postgres  [.] CopyOneRowTo
+   13.40%  postgres  postgres  [.] tts_buffer_heap_getsomeattrs
+   10.61%  postgres  postgres  [.] AllocSetAlloc
+9.26%  postgres  libc-2.29.so  [.] __memmove_avx_unaligned_erms
+7.32%  postgres  postgres  [.] SendFunctionCall
+6.02%  postgres  postgres  [.] palloc
+4.45%  postgres  postgres  [.] int4send
+3.68%  postgres  libc-2.29.so  [.] _IO_fwrite
+2.71%  postgres  postgres  [.] heapgettup_pagemode
+1.96%  postgres  postgres  [.] AllocSetReset
+1.83%  postgres  postgres  [.] CopySendEndOfRow
+1.75%  postgres  libc-2.29.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
+1.60%  postgres  postgres  [.] ExecStoreBufferHeapTuple
+1.57%  postgres  postgres  [.] DoCopyTo
+1.16%  postgres  postgres  [.] memcpy@plt
+1.07%  postgres  postgres  [.] heapgetpage

Even without using the new pq_begintypesend_ex()/initStringInfoEx(), the
generated code is still considerably better than before, yielding a
1.58x speedup. The allocator overhead unsurprisingly is higher:
+   24.93%  postgres  postgres  [.] CopyOneRowTo
+   17.10%  postgres  postgres  [.] AllocSetAlloc
+   10.09%  postgres  postgres  [.] tts_buffer_heap_getsomeattrs
+6.50%  postgres  libc-2.29.so  [.] __memmove_avx_unaligned_erms
+5.99%  postgres  postgres  [.] SendFunctionCall
+5.11%  postgres  postgres  [.] palloc
+3.95%  postgres  libc-2.29.so  [.] _int_malloc
+3.38%  postgres  postgres  [.] int4send
+2.54%  postgres  postgres  [.] heapgettup_pagemode
+2.11%  postgres  libc-2.29.so  [.] _int_free
+2.06%  postgres  postgres  [.] MemoryContextReset
+2.02%  postgres  postgres  [.] AllocSetReset
+1.97%  postgres  libc-2.29.so  [.] _IO_fwrite
+1.47%  postgres  postgres  [.] DoCopyTo
+1.14%  postgres  postgres  [.] ExecStoreBufferHeapTuple
+1.06%  postgres  libc-2.29.so  [.] _IO_file_xsputn@@GLIBC_2.2.5
+1.04%  postgres  libc-2.29.so  

Re: aggregate crash

2020-01-14 Thread Andres Freund
Hi,

On 2020-01-14 17:01:01 -0500, Tom Lane wrote:
> Alvaro Herrera  writes:
> > On 2019-Dec-27, Teodor Sigaev wrote:
> >> Found crash on production instance, assert-enabled build crashes in pfree()
> >> call, with default config. v11, v12 and head are affected, but, seems, you
> >> need to be a bit lucky.
> 
> > Is this bug being considered for the next set of minors?
> 
> I think Andres last touched that code, so I was sort of expecting
> him to have an opinion on this.

Well, I commented a few days ago, also asking for further input...

To me it looks like that code has effectively been the same for quite a
while.  While today the code is:

newVal = FunctionCallInvoke(fcinfo);

/*
 * For pass-by-ref datatype, must copy the new value into
 * aggcontext and free the prior transValue.  But if transfn
 * returned a pointer to its first input, we don't need to do
 * anything.  Also, if transfn returned a pointer to a R/W
 * expanded object that is already a child of the aggcontext,
 * assume we can adopt that value without copying it.
 */
if (DatumGetPointer(newVal) != DatumGetPointer(pergroup->transValue))
    newVal = ExecAggTransReparent(aggstate, pertrans,
                                  newVal, fcinfo->isnull,
                                  pergroup->transValue,
                                  pergroup->transValueIsNull);
...
ExecAggTransReparent(AggState *aggstate, AggStatePerTrans pertrans,
                     Datum newValue, bool newValueIsNull,
                     Datum oldValue, bool oldValueIsNull)
...
    if (!newValueIsNull)
    {
        MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);
        if (DatumIsReadWriteExpandedObject(newValue,
                                           false,
                                           pertrans->transtypeLen) &&
            MemoryContextGetParent(DatumGetEOHP(newValue)->eoh_context) == CurrentMemoryContext)
             /* do nothing */ ;
        else
            newValue = datumCopy(newValue,
                                 pertrans->transtypeByVal,
                                 pertrans->transtypeLen);
    }
    if (!oldValueIsNull)
    {
        if (DatumIsReadWriteExpandedObject(oldValue,
                                           false,
                                           pertrans->transtypeLen))
            DeleteExpandedObject(oldValue);
        else
            pfree(DatumGetPointer(oldValue));
    }

before it was (in v10):

    if (!pertrans->transtypeByVal &&
        DatumGetPointer(newVal) != DatumGetPointer(pergroupstate->transValue))
    {
        if (!fcinfo->isnull)
        {
            MemoryContextSwitchTo(aggstate->curaggcontext->ecxt_per_tuple_memory);
            if (DatumIsReadWriteExpandedObject(newVal,
                                               false,
                                               pertrans->transtypeLen) &&
                MemoryContextGetParent(DatumGetEOHP(newVal)->eoh_context) == CurrentMemoryContext)
                 /* do nothing */ ;
            else
                newVal = datumCopy(newVal,
                                   pertrans->transtypeByVal,
                                   pertrans->transtypeLen);
        }
        if (!pergroupstate->transValueIsNull)
        {
            if (DatumIsReadWriteExpandedObject(pergroupstate->transValue,
                                               false,
                                               pertrans->transtypeLen))
                DeleteExpandedObject(pergroupstate->transValue);
            else
                pfree(DatumGetPointer(pergroupstate->transValue));
        }
    }

there's no need in the current code to check 

Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tom Lane
Tomas Vondra  writes:
> On Tue, Jan 14, 2020 at 04:52:44PM -0500, Tom Lane wrote:
>> The main issue for sticking the results into pg_statistic is that
>> the primary key there is (starelid, staattnum), and we haven't got
>> a suitable attnum.  I wouldn't much object to putting the data into
>> pg_statistic_ext_data, but it doesn't really have a suitable
>> rowtype ...

> Well, that's why I proposed to essentially build a fake "relation" just
> for this purpose. So we'd have a pg_class entry with a special relkind,
> attnums and all that. And the expressions would be stored either in
> pg_statistic_ext or in a new catalog. But maybe that's nonsense.

Seems pretty yucky.  I realize we've already got "fake relations" like
foreign tables and composite types, but the number of special cases
those create is very annoying.  And you still don't have anyplace to
put the expressions themselves in such a structure --- I hope you
weren't going to propose fake pg_index rows for that.

I wonder just how messy it would be to add a column to pg_statistic_ext
whose type is the composite type "pg_statistic", and drop the required
data into that.  We've not yet used any composite types in the system
catalogs, AFAIR, but since pg_statistic_ext isn't a bootstrap catalog
it seems like we might be able to get away with it.

regards, tom lane




Re: pgindent && weirdness

2020-01-14 Thread Tom Lane
Alvaro Herrera  writes:
> I just ran pgindent over some patch, and noticed that this hunk ended up
> in my working tree:
 
> - if (IsA(leftop, Var) && IsA(rightop, Const))
> + if (IsA(leftop, Var) &&IsA(rightop, Const))

Yeah, it's been doing that for decades.  I think the triggering
factor is the typedef name (Var, here) preceding the &&.

It'd be nice to fix properly, but I've tended to take the path
of least resistance by breaking such lines to avoid the ugliness:

if (IsA(leftop, Var) &&
IsA(rightop, Const))

regards, tom lane




Re: Rearranging ALTER TABLE to avoid multi-operations bugs

2020-01-14 Thread Tom Lane
I wrote:
> [ fix-alter-table-order-of-operations-3.patch ]

Rebased again, fixing a minor conflict with f595117e2.

> I'd kind of like to get this cleared out of my queue soon.
> Does anyone intend to review it further?

If I don't hear objections pretty darn quick, I'm going to
go ahead and push this.

regards, tom lane

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 2ec3fc5..4e44ebb 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -85,6 +85,7 @@
 #include "storage/lock.h"
 #include "storage/predicate.h"
 #include "storage/smgr.h"
+#include "tcop/utility.h"
 #include "utils/acl.h"
 #include "utils/builtins.h"
 #include "utils/fmgroids.h"
@@ -142,11 +143,13 @@ static List *on_commits = NIL;
 #define AT_PASS_OLD_CONSTR		3	/* re-add existing constraints */
 /* We could support a RENAME COLUMN pass here, but not currently used */
 #define AT_PASS_ADD_COL			4	/* ADD COLUMN */
-#define AT_PASS_COL_ATTRS		5	/* set other column attributes */
-#define AT_PASS_ADD_INDEX		6	/* ADD indexes */
-#define AT_PASS_ADD_CONSTR		7	/* ADD constraints, defaults */
-#define AT_PASS_MISC			8	/* other stuff */
-#define AT_NUM_PASSES			9
+#define AT_PASS_ADD_CONSTR		5	/* ADD constraints (initial examination) */
+#define AT_PASS_COL_ATTRS		6	/* set column attributes, eg NOT NULL */
+#define AT_PASS_ADD_INDEXCONSTR	7	/* ADD index-based constraints */
+#define AT_PASS_ADD_INDEX		8	/* ADD indexes */
+#define AT_PASS_ADD_OTHERCONSTR	9	/* ADD other constraints, defaults */
+#define AT_PASS_MISC			10	/* other stuff */
+#define AT_NUM_PASSES			11
 
 typedef struct AlteredTableInfo
 {
@@ -159,6 +162,7 @@ typedef struct AlteredTableInfo
 	/* Information saved by Phases 1/2 for Phase 3: */
 	List	   *constraints;	/* List of NewConstraint */
 	List	   *newvals;		/* List of NewColumnValue */
+	List	   *afterStmts;		/* List of utility command parsetrees */
 	bool		verify_new_notnull; /* T if we should recheck NOT NULL */
 	int			rewrite;		/* Reason for forced rewrite, if any */
 	Oid			newTableSpace;	/* new tablespace; 0 means no change */
@@ -340,31 +344,45 @@ static void validateForeignKeyConstraint(char *conname,
 		 Relation rel, Relation pkrel,
 		 Oid pkindOid, Oid constraintOid);
 static void ATController(AlterTableStmt *parsetree,
-		 Relation rel, List *cmds, bool recurse, LOCKMODE lockmode);
+		 Relation rel, List *cmds, bool recurse, LOCKMODE lockmode,
+		 AlterTableUtilityContext *context);
 static void ATPrepCmd(List **wqueue, Relation rel, AlterTableCmd *cmd,
-	  bool recurse, bool recursing, LOCKMODE lockmode);
-static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode);
+	  bool recurse, bool recursing, LOCKMODE lockmode,
+	  AlterTableUtilityContext *context);
+static void ATRewriteCatalogs(List **wqueue, LOCKMODE lockmode,
+			  AlterTableUtilityContext *context);
 static void ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
-	  AlterTableCmd *cmd, LOCKMODE lockmode);
+	  AlterTableCmd *cmd, LOCKMODE lockmode, int cur_pass,
+	  AlterTableUtilityContext *context);
+static AlterTableCmd *ATParseTransformCmd(List **wqueue, AlteredTableInfo *tab,
+		  Relation rel, AlterTableCmd *cmd,
+		  bool recurse, LOCKMODE lockmode,
+		  int cur_pass,
+		  AlterTableUtilityContext *context);
 static void ATRewriteTables(AlterTableStmt *parsetree,
-			List **wqueue, LOCKMODE lockmode);
+			List **wqueue, LOCKMODE lockmode,
+			AlterTableUtilityContext *context);
 static void ATRewriteTable(AlteredTableInfo *tab, Oid OIDNewHeap, LOCKMODE lockmode);
 static AlteredTableInfo *ATGetQueueEntry(List **wqueue, Relation rel);
 static void ATSimplePermissions(Relation rel, int allowed_targets);
 static void ATWrongRelkindError(Relation rel, int allowed_targets);
 static void ATSimpleRecursion(List **wqueue, Relation rel,
-			  AlterTableCmd *cmd, bool recurse, LOCKMODE lockmode);
+			  AlterTableCmd *cmd, bool recurse, LOCKMODE lockmode,
+			  AlterTableUtilityContext *context);
 static void ATCheckPartitionsNotInUse(Relation rel, LOCKMODE lockmode);
 static void ATTypedTableRecursion(List **wqueue, Relation rel, AlterTableCmd *cmd,
-  LOCKMODE lockmode);
+  LOCKMODE lockmode,
+  AlterTableUtilityContext *context);
 static List *find_typed_table_dependencies(Oid typeOid, const char *typeName,
 		   DropBehavior behavior);
 static void ATPrepAddColumn(List **wqueue, Relation rel, bool recurse, bool recursing,
-			bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode);
+			bool is_view, AlterTableCmd *cmd, LOCKMODE lockmode,
+			AlterTableUtilityContext *context);
 static ObjectAddress ATExecAddColumn(List **wqueue, AlteredTableInfo *tab,
-	 Relation rel, ColumnDef *colDef,
+	 Relation rel, AlterTableCmd **cmd,
 	 bool recurse, bool recursing,
-	 bool 

Re: Add FOREIGN to ALTER TABLE in pg_dump

2020-01-14 Thread Alvaro Herrera
On 2020-Jan-14, vignesh C wrote:

> On Mon, Jan 13, 2020 at 7:52 PM Tom Lane  wrote:
> >
> > vignesh C  writes:
> > > On Thu, Sep 26, 2019 at 7:17 PM Luis Carril  
> > > wrote:
> > >>> Your patch is failing the pg_dump TAP tests.  Please use
> > >>> configure --enable-tap-tests, fix the problems, then resubmit.
> >
> > >> Fixed, I've attached a new version.
> >
> > > Will it be possible to add a test case for this, can we validate by
> > > adding one test?
> >
> > Isn't the change in the TAP test output sufficient?
> 
> I could not see any expected file output changes in the patch. Should
> we modify the existing test to validate this.

Yeah, I think there should be at least one regexp in t/002_pg_dump.pl to
verify ALTER FOREIGN TABLE is being produced.

I wonder if Tom is thinking about Luis' other pg_dump patch for foreign
tables, which includes some changes to src/test/modules/test_pg_dump.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




pgindent && weirdness

2020-01-14 Thread Alvaro Herrera
I just ran pgindent over some patch, and noticed that this hunk ended up
in my working tree:

diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 861a9148ed..fff54062b0 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -1405,13 +1405,13 @@ examine_opclause_expression(OpExpr *expr, Var **varp, Const **cstp, bool *varonl
if (IsA(rightop, RelabelType))
rightop = (Node *) ((RelabelType *) rightop)->arg;
 
-   if (IsA(leftop, Var) && IsA(rightop, Const))
+   if (IsA(leftop, Var) &&IsA(rightop, Const))
{
var = (Var *) leftop;
cst = (Const *) rightop;
varonleft = true;
}
-   else if (IsA(leftop, Const) && IsA(rightop, Var))
+   else if (IsA(leftop, Const) &&IsA(rightop, Var))
{
var = (Var *) rightop;
cst = (Const *) leftop;

This seems a really strange change; this
git grep '&&[^([:space:]]' -- *.c
shows that we already have a dozen or so occurrences.  (That's
ignoring execExprInterp.c's use of computed gotos.)

I don't care all that much, but wanted to throw it out in case somebody
is specifically interested in studying pgindent's logic, since the last
round of changes has yielded excellent results.

Thanks,

-- 
Álvaro Herrera                PostgreSQL Expert, https://www.2ndQuadrant.com/




Re: backup manifests

2020-01-14 Thread David Fetter
On Tue, Jan 14, 2020 at 03:35:40PM -0500, Stephen Frost wrote:
> Greetings,
> 
> * David Fetter (da...@fetter.org) wrote:
> > On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> > > Robert Haas  writes:
> > > > ... I would also expect that depending on an external package
> > > > would provoke significant opposition. If we suck the code into core,
> > > > then we have to keep it up to date with the upstream, which is a
> > > > significant maintenance burden - look at all the time Tom has spent on
> > > > snowball, regex, and time zone code over the years.
> > > 
> > > Also worth noting is that we have a seriously bad track record about
> > > choosing external packages to depend on.  The regex code has no upstream
> > > maintainer anymore (well, the Tcl guys seem to think that *we* are
> > > upstream for that now), and snowball is next door to moribund.
> > > With C not being a particularly hip language to develop in anymore,
> > > it wouldn't surprise me in the least for any C-code JSON parser
> > > we might pick to go dead pretty soon.
> > 
> > Given jq's extreme popularity and compatible license, I'd nominate that.
> 
> I don't think that really changes Tom's concerns here about having an
> "upstream" for this.
> 
> For my part, I don't really agree with the whole "we don't want two
> different JSON parsers" when we've got two of a bunch of stuff between
> the frontend and the backend, particularly since I don't really think
> it'll end up being *that* much code.
> 
> My thought, which I had expressed to David (though he obviously didn't
> entirely agree with me since he suggested the other options), was to
> adapt the pgBackRest JSON parser, which isn't really all that much code.
> 
> Frustratingly, that code has got some internal pgBackRest dependency on
> things like the memory context system (which looks, unsurprisingly, an
> awful lot like what is in PG backend), the error handling and logging
> systems (which are different from PG because they're quite intentionally
> segregated from each other- something PG would benefit from, imv..), and
> Variadics (known in the PG backend as Datums, and quite similar to
> them..).

It might be more fun to put in that infrastructure and have it gate
the manifest feature than to have two vastly different parsers to
contend with. I get that putting off the backup manifests isn't an
awesome prospect, but neither is rushing them in and getting them
wrong in ways we'll still be regretting a decade hence.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tomas Vondra

On Tue, Jan 14, 2020 at 04:52:44PM -0500, Tom Lane wrote:

Tomas Vondra  writes:

On Tue, Jan 14, 2020 at 04:21:57PM -0500, Tom Lane wrote:

Yeah, it seems likely to me that the infrastructure for this would be
somewhat different --- the user-facing syntax could be basically the
same, but ultimately we want to generate entries in pg_statistic not
pg_statistic_ext_data.  Or at least entries that look the same as what
you could find in pg_statistic.



Yeah. I think we could invent a new type of statistics "expressions"
which would simply build these per-column stats. So for example
   CREATE STATISTICS s (expressions) ON (a*b), sqrt(c) FROM t;


I was imagining the type keyword as being "standard" or something
like that, since what it's going to build are the "standard" kinds
of stats for the expression's datatype.  But yeah, has to be some other
keyword than the existing ones.

The main issue for sticking the results into pg_statistic is that
the primary key there is (starelid, staattnum), and we haven't got
a suitable attnum.  I wouldn't much object to putting the data into
pg_statistic_ext_data, but it doesn't really have a suitable
rowtype ...


Well, that's why I proposed to essentially build a fake "relation" just
for this purpose. So we'd have a pg_class entry with a special relkind,
attnums and all that. And the expressions would be stored either in
pg_statistic_ext or in a new catalog. But maybe that's nonsense.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Add pg_file_sync() to adminpack

2020-01-14 Thread Julien Rouhaud
On Tue, Jan 14, 2020 at 22:57, Stephen Frost  wrote:

> Greetings,
>
> * Julien Rouhaud (rjuju...@gmail.com) wrote:
> > On Fri, Jan 10, 2020 at 10:50 AM Fujii Masao 
> wrote:
> > > On Thu, Jan 9, 2020 at 10:39 PM Julien Rouhaud 
> wrote:
> > > > I think that pg_write_server_files should be allowed to call that
> > > > function by default.
> > >
> > > But pg_write_server_files users are not allowed to execute
> > > other functions like pg_file_write() by default. So doing that
> > > change only for pg_file_sync() looks strange to me.
> >
> > Ah indeed.  I'm wondering if that's an oversight of the original
> > default role patch or voluntary.
>
> It was intentional.
>

ok, thanks for the clarification.



Re: Use compiler intrinsics for bit ops in hash

2020-01-14 Thread David Fetter
On Tue, Jan 14, 2020 at 12:21:41PM -0800, Jesse Zhang wrote:
> Hi David,
> 
> On Tue, Jan 14, 2020 at 9:36 AM David Fetter  wrote:
> >
> > Folks,
> >
> > The recent patch for distinct windowing aggregates contained a partial
> > fix of the FIXME that didn't seem entirely right, so I extracted that
> > part, changed it to use compiler intrinsics, and submit it here.
> 
> The changes in hash AM and SIMPLEHASH do look like a net positive
> improvement. My biggest cringe might be in pg_bitutils:

Thanks for looking at this!

> > diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
> > index 498e532308..cc9338da25 100644
> > --- a/src/include/port/pg_bitutils.h
> > +++ b/src/include/port/pg_bitutils.h
> > @@ -145,4 +145,32 @@ pg_rotate_right32(uint32 word, int n)
> >  return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
> >  }
> >
> > +/* ceil(lg2(num)) */
> > +static inline uint32
> > +ceil_log2_32(uint32 num)
> > +{
> > + return pg_leftmost_one_pos32(num-1) + 1;
> > +}
> > +
> > +static inline uint64
> > +ceil_log2_64(uint64 num)
> > +{
> > + return pg_leftmost_one_pos64(num-1) + 1;
> > +}
> > +
> > +/* calculate first power of 2 >= num
> > + * per https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
> > + * using BSR where available */
> > +static inline uint32
> > +next_power_of_2_32(uint32 num)
> > +{
> > + return ((uint32) 1) << (pg_leftmost_one_pos32(num-1) + 1);
> > +}
> > +
> > +static inline uint64
> > +next_power_of_2_64(uint64 num)
> > +{
> > + return ((uint64) 1) << (pg_leftmost_one_pos64(num-1) + 1);
> > +}
> > +
> >  #endif /* PG_BITUTILS_H */
> >
> 
> 1. Is ceil_log2_64 dead code?

Let's call it nascent code. I suspect there are places it could go, if
I look for them.  Also, it seemed silly to have one without the other.

> 2. The new utilities added here (ceil_log2_32 and company,
> next_power_of_2_32 and company) all require num > 1, but don't clearly
> Assert (or at the very least document) so.

Assert()ed.

> 3. A couple of the callers can actively pass in an argument of 1, e.g.
> from _hash_spareindex in hashutil.c, while some other callers are iffy
> at best (simplehash.h maybe?)

What would you recommend be done about this?

> 4. It seems like you *really* would like an operation like LZCNT in x86
> (first appearing in Haswell) that is well defined on zero input. ISTM
> the alternatives are:
> 
>a) Special case 1. That seems straightforward, but the branching cost
>on a seemingly unlikely condition seems to be a lot of performance
>loss
> 
>b) Use architecture specific intrinsic (and possibly with CPUID
>shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ
>intrinsic elsewhere. The CLZ GCC intrinsic seems to map to
>instructions that are well defined on zero in most ISA's other than
>x86, so maybe we can get away with special-casing x86?

b) seems much more attractive. Is there some way to tilt the tools so
that this happens? What should I be reading up on?
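
For comparison, here is a minimal standalone sketch of what option a) amounts
to: special-case inputs of 0/1 so the count-leading-zeros builtin is never
invoked on zero (illustrative only, not the pg_bitutils.h code; needs
GCC/Clang for __builtin_clz):

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* branch on num <= 1 so __builtin_clz() never sees a zero argument */
static inline uint32_t
ceil_log2_32_sketch(uint32_t num)
{
    if (num <= 1)
        return 0;                   /* ceil(log2(1)) == 0 */
    return 32 - (uint32_t) __builtin_clz(num - 1);
}

/* like the versions in the patch, this overflows for num > 2^31 */
static inline uint32_t
next_power_of_2_32_sketch(uint32_t num)
{
    return ((uint32_t) 1) << ceil_log2_32_sketch(num);
}

int
main(void)
{
    assert(ceil_log2_32_sketch(1) == 0);
    assert(ceil_log2_32_sketch(2) == 1);
    assert(ceil_log2_32_sketch(5) == 3);
    assert(next_power_of_2_32_sketch(1) == 1);
    assert(next_power_of_2_32_sketch(5) == 8);
    printf("ok\n");
    return 0;
}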

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
>From b542c9efb4f4ec3999a9e6a6b180d98fca3a5101 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 14 Jan 2020 09:32:15 -0800
Subject: [PATCH v2] Use compiler intrinsics for bit ops in hash
To: hackers


- In passing, fix the FIXME by centralizing those calls.

diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 55d85644a4..b1e04c0136 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -30,6 +30,7 @@
 
 #include "access/hash.h"
 #include "access/hash_xlog.h"
+#include "port/pg_bitutils.h"
 #include "miscadmin.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -543,11 +544,7 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
 	metap->hashm_ffactor = ffactor;
 	metap->hashm_bsize = HashGetMaxBitmapSize(page);
 	/* find largest bitmap array size that will fit in page size */
-	for (i = _hash_log2(metap->hashm_bsize); i > 0; --i)
-	{
-		if ((1 << i) <= metap->hashm_bsize)
-			break;
-	}
+	i =  pg_leftmost_one_pos32(metap->hashm_bsize);
 	Assert(i > 0);
 	metap->hashm_bmsize = 1 << i;
 	metap->hashm_bmshift = i + BYTE_TO_BIT;
@@ -570,7 +567,7 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
 	 * Set highmask as next immediate ((2 ^ x) - 1), which should be
 	 * sufficient to cover num_buckets.
 	 */
-	metap->hashm_highmask = (1 << (_hash_log2(num_buckets + 1))) - 1;
+	metap->hashm_highmask = next_power_of_2_32(num_buckets + 1) - 1;
 	metap->hashm_lowmask = (metap->hashm_highmask >> 1);
 
 	

Re: Unicode escapes with any backend encoding

2020-01-14 Thread Chapman Flack
On 1/14/20 4:25 PM, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On Wed, Jan 15, 2020 at 4:25 AM Chapman Flack  wrote:
>>> On 1/14/20 10:10 AM, Tom Lane wrote:
 to me that this error is just useless pedantry.  As long as the DB
 encoding can represent the desired character, it should be transparent
 to users.
> 
>>> That's my position too.
> 
>> and mine.
> 
> I'm confused --- yesterday you seemed to be against this idea.
> Have you changed your mind?
> 
> I'll gladly go change the patch if people are on board with this.

Hmm, well, let me clarify for my own part what I think I'm agreeing
with ... perhaps it's misaligned with something further upthread.

In an ideal world (which may be ideal in more ways than are in scope
for the present discussion) I would expect to see these principles:

1. On input, whether a Unicode escape is or isn't allowed should
   not depend on any encoding settings. It should be lexically
   allowed always, and if it represents a character that exists
   in the server encoding, it should mean that character. If it's
   not representable in the storage format, it should produce an
   error that says that.

2. If it happens that the character is representable in both the
   storage encoding and the client encoding, it shouldn't matter
   whether it arrives literally as an é or as an escape. Either
   should get stored on disk as the same bytes.

3. On output, as long as the character is representable in the client
   encoding, there is nothing to worry about. It will be sent as its
   representation in the client encoding (which may be different bytes
   than its representation in the server encoding).

4. If a character to be output isn't in the client encoding, it
   will be datatype-dependent whether there is any way to escape.
   For example, xml_out could produce 

Re: aggregate crash

2020-01-14 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Dec-27, Teodor Sigaev wrote:
>> Found crash on production instance, assert-enabled build crashes in pfree()
>> call, with default config. v11, v12 and head are affected, but, seems, you
>> need to be a bit lucky.

> Is this bug being considered for the next set of minors?

I think Andres last touched that code, so I was sort of expecting
him to have an opinion on this.  But I agree that not checking null-ness
explicitly is kind of unsafe.  We've never before had any expectation
that the Datum value of a null is anything in particular.

regards, tom lane




Re: Setting min/max TLS protocol in clientside libpq

2020-01-14 Thread Daniel Gustafsson
> On 14 Jan 2020, at 16:15, Daniel Gustafsson  wrote:
> 
>> On 14 Jan 2020, at 15:49, Tom Lane  wrote:
>> 
>> Daniel Gustafsson  writes:
> On 11 Jan 2020, at 03:49, Michael Paquier  wrote:
> One thing I noticed when looking at it is that we now have sha2_openssl.c and
> openssl_protocol.c in src/common.  For easier visual grouping of OpenSSL
> functionality, it makes sense to me to rename sha2_openssl.c to openssl_sha2.c,
> but that might just be pointless churn.
>> 
 Databases like consistency, and so do I, so no issues from me to do a
 rename of the sha2.c file.  That makes sense with the addition of the
 new file.
>> 
>>> Done in the attached v3.
>> 
>> I'm kind of down on renaming files unless there is a *really* strong
>> reason for it.  It makes back-patching more difficult and it makes
>> it much harder to follow the git history.  And, seeing that there is
>> also a src/common/sha2.c, it seems to me that renaming sha2_openssl.c
>> will just break consistency in a different way.
>> 
>> Maybe the problem is you've got the new file's name backwards.
>> Maybe it should be protocol_openssl.c.
> 
> That's a very good argument, I’ll send a v4 with protocol_openssl.c when back 
> at the computer.

Files renamed to match existing naming convention, the rest of the patch left
unchanged.

cheers ./daniel



libpq_minmaxproto_v4.patch
Description: Binary data


Re: DROP OWNED CASCADE vs Temp tables

2020-01-14 Thread Alvaro Herrera
On 2020-Jan-13, Tom Lane wrote:

> That seems fundamentally wrong.  By the time we've queued an object for
> deletion in dependency.c, we have a lock on it, and we've verified that
> the object is still there (cf. systable_recheck_tuple calls).
> If shdepDropOwned is doing it differently, I'd say shdepDropOwned is
> doing it wrong.

Hmm, it seems to be doing it differently.  Maybe it should be acquiring
locks on all objects in that nested loop and verified them for
existence, so that when it calls performMultipleDeletions the objects
are already locked, as you say.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tom Lane
Tomas Vondra  writes:
> On Tue, Jan 14, 2020 at 04:21:57PM -0500, Tom Lane wrote:
>> Yeah, it seems likely to me that the infrastructure for this would be
>> somewhat different --- the user-facing syntax could be basically the
>> same, but ultimately we want to generate entries in pg_statistic not
>> pg_statistic_ext_data.  Or at least entries that look the same as what
>> you could find in pg_statistic.

> Yeah. I think we could invent a new type of statistics "expressions"
> which would simply build these per-column stats. So for example
>CREATE STATISTICS s (expressions) ON (a*b), sqrt(c) FROM t;

I was imagining the type keyword as being "standard" or something
like that, since what it's going to build are the "standard" kinds
of stats for the expression's datatype.  But yeah, has to be some other
keyword than the existing ones.

The main issue for sticking the results into pg_statistic is that
the primary key there is (starelid, staattnum), and we haven't got
a suitable attnum.  I wouldn't much object to putting the data into
pg_statistic_ext_data, but it doesn't really have a suitable
rowtype ...

regards, tom lane




Re: aggregate crash

2020-01-14 Thread Alvaro Herrera
On 2019-Dec-27, Teodor Sigaev wrote:

> Hi!
> 
> Found crash on production instance, assert-enabled build crashes in pfree()
> call, with default config. v11, v12 and head are affected, but, seems, you
> need to be a bit lucky.

Is this bug being considered for the next set of minors?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tomas Vondra

On Tue, Jan 14, 2020 at 04:21:57PM -0500, Tom Lane wrote:

Tomas Vondra  writes:

On Tue, Jan 14, 2020 at 03:12:21PM -0500, Tom Lane wrote:

cc'ing Tomas in case he has any thoughts about it.



Well, I certainly do have thoughts about this - it's pretty much exactly what
I proposed yesterday in this thread:
   
https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development
The third part of that patch series is exactly about supporting extended
statistics on expressions, about the way you described here. The current
status of the WIP patch is that grammar + ANALYZE mostly works, but
there is no support in the planner. It's obviously still very hackish.


Cool.  We should probably take the discussion to that thread, then.


I'm also wondering if we could/should 100% rely on extended statistics,
because those are really meant to track correlations between columns,


Yeah, it seems likely to me that the infrastructure for this would be
somewhat different --- the user-facing syntax could be basically the
same, but ultimately we want to generate entries in pg_statistic not
pg_statistic_ext_data.  Or at least entries that look the same as what
you could find in pg_statistic.



Yeah. I think we could invent a new type of statistics "expressions"
which would simply build these per-column stats. So for example

  CREATE STATISTICS s (expressions) ON (a*b), sqrt(c) FROM t;

would build per-column stats stored in pg_statistic, while

  CREATE STATISTICS s (mcv) ON (a*b), sqrt(c) FROM t;

would build the multi-column MCV list on expressions.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Remove libpq.rc, use win32ver.rc for libpq

2020-01-14 Thread Peter Eisentraut

On 2020-01-09 10:56, Peter Eisentraut wrote:

On 2020-01-06 09:02, Michael Paquier wrote:

- FILEFLAGSMASK  0x17L
+ FILEFLAGSMASK  VS_FFI_FILEFLAGSMASK
Are you sure with the mapping here?  I would have thought that
VS_FF_DEBUG is not necessary when using release-quality builds, which
is something that can be configured with build.pl, and that it would
be better to not enforce VS_FF_PRERELEASE all the time.


Note that there is FILEFLAGSMASK and FILEFLAGS.  The first is just a
mask that says which bits in the second are valid.  Since both libpq.rc
and win32ver.rc use FILEFLAGS 0, it doesn't matter what we set
FILEFLAGSMASK to.  But currently libpq.rc uses 0x3fL and win32ver.rc
uses 0x17L, so in order to unify this sensibly I looked for a
well-recognized standard value, which led to VS_FFI_FILEFLAGSMASK.


Here is a rebased patch.

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 53607c4344f098a6ea2460b9b21715433199a664 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 14 Jan 2020 21:18:29 +0100
Subject: [PATCH v2] Remove libpq.rc, use win32ver.rc for libpq

For historical reasons, libpq used a separate libpq.rc file for the
Windows builds while all other components use a common file
win32ver.rc.  With a bit of tweaking, the libpq build can also use the
win32ver.rc file.  This removes a bit of duplicative code.

Discussion: 
https://www.postgresql.org/message-id/flat/ad505e61-a923-e114-9f38-9867d1610...@2ndquadrant.com
---
 src/bin/pgevent/Makefile |  1 -
 src/interfaces/libpq/.gitignore  |  1 -
 src/interfaces/libpq/Makefile| 15 ---
 src/interfaces/libpq/libpq.rc.in | 31 ---
 src/makefiles/Makefile.win32 | 17 +++--
 src/port/win32ver.rc | 10 +++---
 src/tools/copyright.pl   |  1 -
 src/tools/msvc/Mkvcbuild.pm  |  2 --
 src/tools/msvc/Project.pm|  8 
 src/tools/msvc/Solution.pm   | 21 -
 src/tools/msvc/clean.bat |  1 -
 src/tools/version_stamp.pl   |  7 ---
 12 files changed, 30 insertions(+), 85 deletions(-)
 delete mode 100644 src/interfaces/libpq/libpq.rc.in

diff --git a/src/bin/pgevent/Makefile b/src/bin/pgevent/Makefile
index 215e343605..28c3078b01 100644
--- a/src/bin/pgevent/Makefile
+++ b/src/bin/pgevent/Makefile
@@ -9,7 +9,6 @@
 PGFILEDESC = "Eventlog message formatter"
 PGAPPICON=win32
 
-PGFILESHLIB = 1
 subdir = src/bin/pgevent
 top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
diff --git a/src/interfaces/libpq/.gitignore b/src/interfaces/libpq/.gitignore
index 9be338dec8..7b438f3765 100644
--- a/src/interfaces/libpq/.gitignore
+++ b/src/interfaces/libpq/.gitignore
@@ -1,5 +1,4 @@
 /exports.list
-/libpq.rc
 # .c files that are symlinked in from elsewhere
 /encnames.c
 /wchar.c
diff --git a/src/interfaces/libpq/Makefile b/src/interfaces/libpq/Makefile
index 773ef2723d..f5f1c0c08d 100644
--- a/src/interfaces/libpq/Makefile
+++ b/src/interfaces/libpq/Makefile
@@ -14,6 +14,8 @@ top_builddir = ../../..
 include $(top_builddir)/src/Makefile.global
 
 
+PGFILEDESC = "PostgreSQL Access Library"
+
 # shared library parameters
 NAME= pq
 SO_MAJOR_VERSION= 5
@@ -28,6 +30,7 @@ endif
 # the conditional additions of files to OBJS, update Mkvcbuild.pm to match.
 
 OBJS = \
+   $(WIN32RES) \
fe-auth-scram.o \
fe-connect.o \
fe-exec.o \
@@ -65,12 +68,8 @@ endif
 
 ifeq ($(PORTNAME), win32)
 OBJS += \
-   libpqrc.o \
win32.o
 
-libpqrc.o: libpq.rc
-   $(WINDRES) -i $< -o $@
-
 ifeq ($(enable_thread_safety), yes)
 OBJS += pthread-win32.o
 endif
@@ -113,12 +112,6 @@ encnames.c wchar.c: % : $(backend_src)/utils/mb/%
rm -f $@ && $(LN_S) $< .
 
 
-libpq.rc: libpq.rc.in
-   sed -e 's/\(VERSION.*\),0 *$$/\1,'`date '+%y%j' | sed 's/^0*//'`'/' $< >$@
-
-# Depend on Makefile.global to force rebuild on re-run of configure.
-libpq.rc: $(top_builddir)/src/Makefile.global
-
 # Make dependencies on pg_config_paths.h visible, too.
 fe-connect.o: fe-connect.c $(top_builddir)/src/port/pg_config_paths.h
 fe-misc.o: fe-misc.c $(top_builddir)/src/port/pg_config_paths.h
@@ -148,7 +141,7 @@ uninstall: uninstall-lib
 
 clean distclean: clean-lib
$(MAKE) -C test $@
-   rm -f $(OBJS) pthread.h libpq.rc
+   rm -f $(OBJS) pthread.h
 # Might be left over from a Win32 client-only build
rm -f pg_config_paths.h
 # Remove files we (may have) symlinked in from other places
diff --git a/src/interfaces/libpq/libpq.rc.in b/src/interfaces/libpq/libpq.rc.in
deleted file mode 100644
index 3669a17c3c..00
--- a/src/interfaces/libpq/libpq.rc.in
+++ /dev/null
@@ -1,31 +0,0 @@
-#include 
-
-VS_VERSION_INFO VERSIONINFO
- FILEVERSION 13,0,0,0
- PRODUCTVERSION 13,0,0,0
- FILEFLAGSMASK 0x3fL
- FILEFLAGS 0
- FILEOS VOS__WINDOWS32
- FILETYPE VFT_DLL
- FILESUBTYPE 0x0L
-BEGIN
-BLOCK "StringFileInfo"
-

Re: Unicode escapes with any backend encoding

2020-01-14 Thread Tom Lane
Andrew Dunstan  writes:
> On Wed, Jan 15, 2020 at 4:25 AM Chapman Flack  wrote:
>> On 1/14/20 10:10 AM, Tom Lane wrote:
>>> to me that this error is just useless pedantry.  As long as the DB
>>> encoding can represent the desired character, it should be transparent
>>> to users.

>> That's my position too.

> and mine.

I'm confused --- yesterday you seemed to be against this idea.
Have you changed your mind?

I'll gladly go change the patch if people are on board with this.

regards, tom lane




Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tom Lane
Tomas Vondra  writes:
> On Tue, Jan 14, 2020 at 03:12:21PM -0500, Tom Lane wrote:
>> cc'ing Tomas in case he has any thoughts about it.

> Well, I certainly do have thoughts about this - it's pretty much exactly what
> I proposed yesterday in this thread:
>
> https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development
> The third part of that patch series is exactly about supporting extended
> statistics on expressions, about the way you described here. The current
> status of the WIP patch is that grammar + ANALYZE mostly works, but
> there is no support in the planner. It's obviously still very hackish.

Cool.  We should probably take the discussion to that thread, then.

> I'm also wondering if we could/should 100% rely on extended statistics,
> because those are really meant to track correlations between columns,

Yeah, it seems likely to me that the infrastructure for this would be
somewhat different --- the user-facing syntax could be basically the
same, but ultimately we want to generate entries in pg_statistic not
pg_statistic_ext_data.  Or at least entries that look the same as what
you could find in pg_statistic.

regards, tom lane




Re: Unicode escapes with any backend encoding

2020-01-14 Thread Andrew Dunstan
On Wed, Jan 15, 2020 at 4:25 AM Chapman Flack  wrote:
>
> On 1/14/20 10:10 AM, Tom Lane wrote:
> > to me that this error is just useless pedantry.  As long as the DB
> > encoding can represent the desired character, it should be transparent
> > to users.
>
> That's my position too.
>


and mine.

cheers

andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: proposal: type info support functions for functions that use "any" type

2020-01-14 Thread Tom Lane
Pavel Stehule  writes:
>  [ parser-support-function-with-demo-20191128.patch ]

TBH, I'm still not convinced that this is a good idea.  Restricting
the support function to only change the function's return type is
safer than the original proposal, but it's still not terribly safe.
If you change the support function's algorithm in any way, how do
you know whether you've broken existing stored queries?  If the
support function consults external resources to make its choice
(perhaps checking the existence of a cast), where could we record
that the query depends on the existence of that cast?  There'd be
no visible trace of that in the query parsetree.

I'm also still not convinced that this idea allows doing anything
that can't be done just as well with polymorphism.  It would be a
really bad idea for the support function to be examining the values
of the arguments (else what happens when they're not constants?).
So all you can do is look at their types, and then it seems like
the things you can usefully do are pretty much like polymorphism,
i.e. select some one of the input types, or a related type such
as an array type or element type.  If there are gaps in what you
can express with polymorphism, I'd much rather spend effort on
improving that facility than in adding something that is only
accessible to advanced C coders.  (Yes, I know I've been slacking
on reviewing [1].)

Lastly, I still think that this patch doesn't begin to address
all the places that would have to know about the feature.  There's
a lot of places that know about polymorphism --- if this is
polymorphism on steroids, which it is, then why don't all of those
places need to be touched?

On the whole I think we should reject this idea.

regards, tom lane

[1] https://commitfest.postgresql.org/26/1911/




Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tomas Vondra

On Tue, Jan 14, 2020 at 03:12:21PM -0500, Tom Lane wrote:

Justin Pryzby  writes:

On Sun, Dec 22, 2019 at 06:16:48PM -0600, Justin Pryzby wrote:

On Tue, Nov 19, 2019 at 01:34:21PM -0600, Justin Pryzby wrote:

Tom implemented "Planner support functions":
https://www.postgresql.org/docs/12/xfunc-optimization.html
I wondered whether there was any consideration to extend that to allow
providing improved estimates of "group by".  That currently requires manually
by creating an expression index, if the function is IMMUTABLE (which is not
true for eg.  date_trunc of timestamptz).



I didn't hear back so tried implementing this for date_trunc().  Currently, the
...
If the input timestamps have (say) hourly granularity, rowcount will be
*underestimated* by 3600x, which is worse than the behavior in master of
overestimating by (for "day") 24x.


While I don't have any objection in principle to extending the set of
things planner support functions can do, it doesn't seem like the idea is
giving you all that much traction for this problem.  There isn't that much
knowledge that's specific to date_trunc in this, and instead you've got a
bunch of generic problems (that would have to be solved again in every
other function's planner support).

Another issue is that it seems like this doesn't compose nicely ---
if the GROUP BY expression is "f(g(x))", how do f's support function
and g's support function interact?

The direction that I've been wanting to go in for this kind of problem
is to allow CREATE STATISTICS on an expression, ie if you were concerned
about the estimation accuracy for GROUP BY or anything else, you could do
something like

CREATE STATISTICS foo ON date_trunc('day', mod_time) FROM my_table;

This would have the effect of cueing ANALYZE to gather stats on the
value of that expression, which the planner could then use, very much
as if you'd created an index on the expression.  The advantages of
doing this rather than making an index are

(1) you don't have to pay the maintenance costs for an index,

(2) we don't have to restrict it to immutable expressions.  (Volatile
expressions would have to be disallowed, if only because of fear of
side-effects; but I think we could allow stable expressions just fine.
Worst case problem is that the stats are stale, but so what?)

With a solution like this, we don't have to solve any of the difficult
problems of how the pieces of the expression interact with each other
or with the statistics of the underlying column(s).  We just use the
stats if available, and the estimate will be as good as it'd be for
a plain column reference.

I'm not sure how much new infrastructure would have to be built
for this.  We designed the CREATE STATISTICS syntax to support
this (partly at my insistence IIRC) but I do not think any of the
existing plumbing is ready for it.  I don't think it'd be very
hard to plug this into ANALYZE or the planner, but there might be
quite some work to be done on the catalog infrastructure, pg_dump,
etc.

cc'ing Tomas in case he has any thoughts about it.



Well, I certainly do have thoughts about this - it's pretty much exactly what
I proposed yesterday in this thread:

  
https://www.postgresql.org/message-id/flat/20200113230008.g67iyk4cs3xbnjju@development

The third part of that patch series is exactly about supporting extended
statistics on expressions, about the way you described here. The current
status of the WIP patch is that grammar + ANALYZE mostly works, but
there is no support in the planner. It's obviously still very hackish.

The main thing I'm not sure about is how to represent this in catalogs,
whether to have two fields (like for indexes) or maybe a single list of
expressions.


I'm also wondering if we could/should 100% rely on extended statistics,
because those are really meant to track correlations between columns,
which means we currently require at least two attributes in CREATE
STATISTICS and so on. So maybe what we want is collecting "regular"
per-column stats just like we do for indexes, but without the index
maintenance overhead?

The advantage would be we'd get exactly the same stats as for indexes,
and we could use them in the same places out of the box. While with
extended stats we'll have to tweak those places.

Now, the trouble is we can't store stuff in pg_statistic without having
a relation (i.e. table / index / ...) but maybe we could invent a new 
relation type for this purpose. Of course, it'd require some catalog
work to represent this ...


Ultimately I think we'd want both things, it's not one or the other.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-14 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> ... We must remember how much data we encrypted
> >> and then discount that much of the caller's supplied data next time.
> >> There are hints in the existing comments that somebody understood
> >> this at one point, but the code isn't acting that way today.
> 
> > That's a case I hadn't considered and you're right- the algorithm
> > certainly wouldn't work in such a case.  I don't recall specifically if
> > the code had handled it better previously, or not, but I do recall there
> > was something previously about being given a buffer and then having the
> > API defined as "give me back the exact same buffer because I had to
> > stop" and I recall finding that to ugly, but I get it now, seeing this
> > issue.  I'd certainly be happier if there was a better alternative but I
> > don't know that there really is.
> 
> Yeah.  The only bright spot is that there's no reason for the caller
> to change its mind about what it wants to write, so that this restriction
> doesn't really affect anything.  (The next call might potentially add
> *more* data at the end, but that's fine.)

Right, makes sense.

> I realized when I got into it that my sketch above also considered only
> part of the problem.  In the general case, we might've encrypted some data
> from the current write request and successfully sent it, and then
> encrypted some more data but been unable to (fully) send that packet.
> In this situation, it's best to report that we wrote however much data
> corresponds to the fully sent packet(s).  That way the caller can discard
> that data from its buffer.  We can't report the data corresponding to the
> in-progress packet as being written, though, or we have the
> might-not-get-another-call problem.  Fortunately the API already has the
> notion of a partial write, since the underlying socket calls do.

Yeah, I see how that's also an issue and agree that it makes sense to
report back what's been written and sent as a partial write, and not
report back everything we've "consumed" since we might not get called
again in that case.

Thanks!

Stephen




Re: backup manifests

2020-01-14 Thread Stephen Frost
Greetings,

* David Fetter (da...@fetter.org) wrote:
> On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> > Robert Haas  writes:
> > > ... I would also expect that depending on an external package
> > > would provoke significant opposition. If we suck the code into core,
> > > then we have to keep it up to date with the upstream, which is a
> > > significant maintenance burden - look at all the time Tom has spent on
> > > snowball, regex, and time zone code over the years.
> > 
> > Also worth noting is that we have a seriously bad track record about
> > choosing external packages to depend on.  The regex code has no upstream
> > maintainer anymore (well, the Tcl guys seem to think that *we* are
> > upstream for that now), and snowball is next door to moribund.
> > With C not being a particularly hip language to develop in anymore,
> > it wouldn't surprise me in the least for any C-code JSON parser
> > we might pick to go dead pretty soon.
> 
> Given jq's extreme popularity and compatible license, I'd nominate that.

I don't think that really changes Tom's concerns here about having an
"upstream" for this.

For my part, I don't really agree with the whole "we don't want two
different JSON parsers" when we've got two of a bunch of stuff between
the frontend and the backend, particularly since I don't really think
it'll end up being *that* much code.

My thought, which I had expressed to David (though he obviously didn't
entirely agree with me since he suggested the other options), was to
adapt the pgBackRest JSON parser, which isn't really all that much code.

Frustratingly, that code has got some internal pgBackRest dependency on
things like the memory context system (which looks, unsurprisingly, an
awful lot like what is in PG backend), the error handling and logging
systems (which are different from PG because they're quite intentionally
segregated from each other- something PG would benefit from, imv..), and
Variadics (known in the PG backend as Datums, and quite similar to
them..).

Even so, David's offered to adjust the code to use the frontend's memory
management (*cough* malloc()..), and error handling/logging, and he had
some idea for Variadics (or maybe just pulling the backend's Datum
system in..?  He could answer better), and basically write a frontend
JSON parser for PG without too much code, no external dependencies, and
to make sure it answers this requirement, and I've agreed that he can
spend some time on that instead of pgBackRest to get us through this, if
everyone else is agreeable to the idea.  Obviously this isn't intended
to box anyone in- if there turns out even after the code's been written
to be some fatal issue with using it, so be it, but we're offering to
help.

Thanks,

Stephen




Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-14 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> ... We must remember how much data we encrypted
>> and then discount that much of the caller's supplied data next time.
>> There are hints in the existing comments that somebody understood
>> this at one point, but the code isn't acting that way today.

> That's a case I hadn't considered and you're right- the algorithm
> certainly wouldn't work in such a case.  I don't recall specifically if
> the code had handled it better previously, or not, but I do recall there
> was something previously about being given a buffer and then having the
> API defined as "give me back the exact same buffer because I had to
> stop" and I recall finding that to ugly, but I get it now, seeing this
> issue.  I'd certainly be happier if there was a better alternative but I
> don't know that there really is.

Yeah.  The only bright spot is that there's no reason for the caller
to change its mind about what it wants to write, so that this restriction
doesn't really affect anything.  (The next call might potentially add
*more* data at the end, but that's fine.)

I realized when I got into it that my sketch above also considered only
part of the problem.  In the general case, we might've encrypted some data
from the current write request and successfully sent it, and then
encrypted some more data but been unable to (fully) send that packet.
In this situation, it's best to report that we wrote however much data
corresponds to the fully sent packet(s).  That way the caller can discard
that data from its buffer.  We can't report the data corresponding to the
in-progress packet as being written, though, or we have the
might-not-get-another-call problem.  Fortunately the API already has the
notion of a partial write, since the underlying socket calls do.
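
To make that bookkeeping concrete, here is a minimal sketch of the idea ---
every name in it (GssWriteState, raw_send, gss_encrypt_chunk) is invented
for illustration; this is not the actual be-secure-gssapi.c code:

#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical primitives: raw_send() is a nonblocking socket send that
 * returns -1 on EWOULDBLOCK; gss_encrypt_chunk() wraps up to "len" bytes of
 * plaintext into one encrypted packet and returns how many plaintext bytes
 * it consumed.
 */
extern ssize_t raw_send(const char *buf, size_t len);
extern size_t gss_encrypt_chunk(const char *plain, size_t len,
								char *packet, size_t *packet_len);

typedef struct
{
	char		packet[16384];	/* encrypted packet awaiting send */
	size_t		packet_len;		/* total bytes in the pending packet */
	size_t		packet_sent;	/* bytes of the packet already sent */
	size_t		plain_consumed; /* caller bytes encrypted into the packet */
} GssWriteState;

ssize_t
gss_write(GssWriteState *st, const char *data, size_t len)
{
	size_t		reported = 0;	/* caller bytes we may claim as written */

	/* The caller re-presents the same data, so skip what is already encrypted. */
	data += st->plain_consumed;
	len -= st->plain_consumed;

	for (;;)
	{
		/* First, try to finish sending any pending encrypted packet. */
		while (st->packet_sent < st->packet_len)
		{
			ssize_t		n = raw_send(st->packet + st->packet_sent,
									 st->packet_len - st->packet_sent);

			if (n <= 0)			/* EWOULDBLOCK or hard error */
				return reported > 0 ? (ssize_t) reported : n;
			st->packet_sent += (size_t) n;
		}

		/* Packet fully sent: only now does its plaintext count as written. */
		reported += st->plain_consumed;
		st->packet_len = st->packet_sent = st->plain_consumed = 0;

		if (len == 0)
			return (ssize_t) reported;

		/* Encrypt the next chunk of caller data into a new packet. */
		st->plain_consumed = gss_encrypt_chunk(data, len,
											   st->packet, &st->packet_len);
		data += st->plain_consumed;
		len -= st->plain_consumed;
	}
}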

regards, tom lane




Re: Use compiler intrinsics for bit ops in hash

2020-01-14 Thread Jesse Zhang
Hi David,

On Tue, Jan 14, 2020 at 9:36 AM David Fetter  wrote:
>
> Folks,
>
> The recent patch for distinct windowing aggregates contained a partial
> fix of the FIXME that didn't seem entirely right, so I extracted that
> part, changed it to use compiler intrinsics, and submit it here.

The changes in hash AM and SIMPLEHASH do look like a net positive
improvement. My biggest cringe might be in pg_bitutils:

> diff --git a/src/include/port/pg_bitutils.h b/src/include/port/pg_bitutils.h
> index 498e532308..cc9338da25 100644
> --- a/src/include/port/pg_bitutils.h
> +++ b/src/include/port/pg_bitutils.h
> @@ -145,4 +145,32 @@ pg_rotate_right32(uint32 word, int n)
>  return (word >> n) | (word << (sizeof(word) * BITS_PER_BYTE - n));
>  }
>
> +/* ceil(lg2(num)) */
> +static inline uint32
> +ceil_log2_32(uint32 num)
> +{
> + return pg_leftmost_one_pos32(num-1) + 1;
> +}
> +
> +static inline uint64
> +ceil_log2_64(uint64 num)
> +{
> + return pg_leftmost_one_pos64(num-1) + 1;
> +}
> +
> +/* calculate first power of 2 >= num
> + * per https://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
> + * using BSR where available */
> +static inline uint32
> +next_power_of_2_32(uint32 num)
> +{
> + return ((uint32) 1) << (pg_leftmost_one_pos32(num-1) + 1);
> +}
> +
> +static inline uint64
> +next_power_of_2_64(uint64 num)
> +{
> + return ((uint64) 1) << (pg_leftmost_one_pos64(num-1) + 1);
> +}
> +
>  #endif /* PG_BITUTILS_H */
>

1. Is ceil_log2_64 dead code?

2. The new utilities added here (ceil_log2_32 and company,
next_power_of_2_32 and company) all require num > 1, but don't clearly
Assert (or at the very least document) so.

3. A couple of the callers can actively pass in an argument of 1, e.g.
from _hash_spareindex in hashutil.c, while some other callers are iffy
at best (simplehash.h maybe?)

4. It seems like you *really* would like an operation like LZCNT in x86
(first appearing in Haswell) that is well defined on zero input. ISTM
the alternatives are:

   a) Special case 1. That seems straightforward, but the branching cost
   on a seemingly unlikely condition seems to be a lot of performance
   loss

   b) Use architecture specific intrinsic (and possibly with CPUID
   shenanigans) like __builtin_ia32_lzcnt_u64 on x86 and use the CLZ
   intrinsic elsewhere. The CLZ GCC intrinsic seems to map to
   instructions that are well defined on zero in most ISA's other than
   x86, so maybe we can get away with special-casing x86?
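
For what it's worth, a minimal sketch of option (a), assuming the existing
pg_leftmost_one_pos32() from pg_bitutils.h (the function name here is
invented for illustration):

static inline uint32
ceil_log2_32_guarded(uint32 num)
{
	/*
	 * ceil(log2(1)) is 0, and passing 0 to the leftmost-one helper is
	 * undefined (x86 BSR leaves the result undefined for zero input),
	 * so guard the degenerate inputs explicitly.
	 */
	if (num <= 1)
		return 0;
	return pg_leftmost_one_pos32(num - 1) + 1;
}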

Cheers,
Jesse




Re: pause recovery if pitr target not reached

2020-01-14 Thread Peter Eisentraut

On 2019-12-11 12:40, Leif Gunnar Erlandsen wrote:

Adding patch written for 13dev from git

"Michael Paquier"  skrev 1. desember 2019 kl. 03:08:


On Fri, Nov 22, 2019 at 11:26:59AM +, Leif Gunnar Erlandsen wrote:


No it does not. It works well to demonstrate its purpose though.
And it might be that stopping with FATAL would be more correct.


This is still under active discussion. Please note that the latest
patch does not apply, so a rebase would be nice to have. I have moved
the patch to next CF, waiting on author.


I reworked your patch a bit.  I changed the outcome to be an error, as 
was discussed.  I also added tests and documentation.  Please take a look.


--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 0850de3922854ecdd4249f98ea854afd6ecc9ae2 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Tue, 14 Jan 2020 21:01:26 +0100
Subject: [PATCH v5] Fail if recovery target is not reached

Before, if a recovery target is configured, but the archive ended
before the target was reached, recovery would end and the server would
promote without further notice.  That was deemed to be pretty wrong.
With this change, if the recovery target is not reached, it is a fatal
error.

Discussion: 
https://www.postgresql.org/message-id/flat/993736dd3f1713ec1f63fc3b65383...@lako.no
---
 doc/src/sgml/config.sgml|  5 +++
 src/backend/access/transam/xlog.c   | 19 +
 src/test/perl/PostgresNode.pm   | 33 +--
 src/test/recovery/t/003_recovery_targets.pl | 45 -
 4 files changed, 98 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5d45b6f7cb..385ccc7196 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -3571,6 +3571,11 @@ Recovery Target
 If  is not enabled, a setting of
 pause will act the same as 
shutdown.

+   
+In any case, if a recovery target is configured but the archive
+recovery ends before the target is reached, the server will shut down
+with a fatal error.
+   
   
  
 
diff --git a/src/backend/access/transam/xlog.c 
b/src/backend/access/transam/xlog.c
index 7f4f784c0e..43a78e365d 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7303,6 +7303,15 @@ StartupXLOG(void)
break;
}
}
+   else if (recoveryTarget != RECOVERY_TARGET_UNSET)
+   {
+   /*
+* A recovery target was set but we got here without the
+* target being reached.
+*/
+   ereport(FATAL,
+   (errmsg("recovery ended before configured recovery target was reached")));
+   }
 
/* Allow resource managers to do any required cleanup. */
for (rmid = 0; rmid <= RM_MAX_ID; rmid++)
@@ -7324,6 +7333,16 @@ StartupXLOG(void)
}
else
{
+   if (recoveryTarget != RECOVERY_TARGET_UNSET)
+   {
+   /*
+* A recovery target was set but we got here without the
+* target being reached.
+*/
+   ereport(FATAL,
+   (errmsg("recovery ended before configured recovery target was reached")));
+   }
+
/* there are no WAL records following the checkpoint */
ereport(LOG,
(errmsg("redo is not required")));
diff --git a/src/test/perl/PostgresNode.pm b/src/test/perl/PostgresNode.pm
index 2e0cf4a2f3..be44e8784f 100644
--- a/src/test/perl/PostgresNode.pm
+++ b/src/test/perl/PostgresNode.pm
@@ -653,6 +653,9 @@ Restoring WAL segments from archives using restore_command can be enabled
 by passing the keyword parameter has_restoring => 1. This is disabled by
 default.
 
+If has_restoring is used, standby mode is used by default.  To use
+recovery mode instead, pass the keyword parameter standby => 0.
+
 The backup is copied, leaving the original unmodified. pg_hba.conf is
 unconditionally set to enable replication connections.
 
@@ -669,6 +672,7 @@ sub init_from_backup
 
$params{has_streaming} = 0 unless defined $params{has_streaming};
$params{has_restoring} = 0 unless defined $params{has_restoring};
+   $params{standby} = 1 unless defined $params{standby};
 
print
  "# Initializing node \"$node_name\" from backup 

Re: planner support functions: handle GROUP BY estimates ?

2020-01-14 Thread Tom Lane
Justin Pryzby  writes:
> On Sun, Dec 22, 2019 at 06:16:48PM -0600, Justin Pryzby wrote:
>> On Tue, Nov 19, 2019 at 01:34:21PM -0600, Justin Pryzby wrote:
>>> Tom implemented "Planner support functions":
>>> https://www.postgresql.org/docs/12/xfunc-optimization.html
>>> I wondered whether there was any consideration to extend that to allow
>>> providing improved estimates of "group by".  That currently requires 
>>> manually
>>> by creating an expression index, if the function is IMMUTABLE (which is not
>>> true for eg.  date_trunc of timestamptz).

>> I didn't hear back so tried implementing this for date_trunc().  Currently, 
>> the
>> ...
>> If the input timestamps have (say) hourly granularity, rowcount will be
>> *underestimated* by 3600x, which is worse than the behavior in master of
>> overestimating by (for "day") 24x.

While I don't have any objection in principle to extending the set of
things planner support functions can do, it doesn't seem like the idea is
giving you all that much traction for this problem.  There isn't that much
knowledge that's specific to date_trunc in this, and instead you've got a
bunch of generic problems (that would have to be solved again in every
other function's planner support).

Another issue is that it seems like this doesn't compose nicely ---
if the GROUP BY expression is "f(g(x))", how do f's support function
and g's support function interact?

The direction that I've been wanting to go in for this kind of problem
is to allow CREATE STATISTICS on an expression, ie if you were concerned
about the estimation accuracy for GROUP BY or anything else, you could do
something like

CREATE STATISTICS foo ON date_trunc('day', mod_time) FROM my_table;

This would have the effect of cueing ANALYZE to gather stats on the
value of that expression, which the planner could then use, very much
as if you'd created an index on the expression.  The advantages of
doing this rather than making an index are

(1) you don't have to pay the maintenance costs for an index,

(2) we don't have to restrict it to immutable expressions.  (Volatile
expressions would have to be disallowed, if only because of fear of
side-effects; but I think we could allow stable expressions just fine.
Worst case problem is that the stats are stale, but so what?)

With a solution like this, we don't have to solve any of the difficult
problems of how the pieces of the expression interact with each other
or with the statistics of the underlying column(s).  We just use the
stats if available, and the estimate will be as good as it'd be for
a plain column reference.
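
For instance, the sort of query whose GROUP BY rowcount estimate would then
come straight from the gathered expression stats (illustrative only, reusing
the hypothetical my_table from the example above):

SELECT date_trunc('day', mod_time) AS day, count(*)
FROM my_table
GROUP BY 1;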

I'm not sure how much new infrastructure would have to be built
for this.  We designed the CREATE STATISTICS syntax to support
this (partly at my insistence IIRC) but I do not think any of the
existing plumbing is ready for it.  I don't think it'd be very
hard to plug this into ANALYZE or the planner, but there might be
quite some work to be done on the catalog infrastructure, pg_dump,
etc.

cc'ing Tomas in case he has any thoughts about it.

regards, tom lane




Re: 12.1 not useable: clientlib fails after a dozen queries (GSSAPI ?)

2020-01-14 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> I wrote:
> > Here's a draft patch that cleans up all the logic errors I could find.
> 
> So last night I was assuming that this problem just requires more careful
> attention to what to return in the error exit paths.  In the light of
> morning, though, I realize that the algorithms involved in
> be-secure-gssapi.c and fe-secure-gssapi.c are just fundamentally wrong:
> 
> * On the read side, the code will keep looping until it gets a no-data
> error from the underlying socket call.  This is silly.  In every or
> almost every use, the caller's read length request corresponds to the
> size of a buffer that's meant to be larger than typical messages, so
> that betting that we're going to fill that buffer completely is the
> wrong way to bet.  Meanwhile, it's fairly likely that the incoming
> encrypted packet's length *does* correspond to some actual message
> boundary; that would only not happen if the sender is forced to break
> up a message, which ought to be a minority situation, else our buffer
> size choices are too small.  So it's very likely that the looping just
> results in doubling the number of read() calls that are made, with
> half of them failing with EWOULDBLOCK.  What we should do instead is
> return to the caller whenever we finish handing back the decrypted
> contents of a packet.  We can do the read() on the next call, after
> the caller's dealt with that data.

Yeah, I agree that this is a better approach.  Doing unnecessary
read()'s certainly isn't ideal but beyond being silly it doesn't sound
like this was fundamentally broken..? (yes, the error cases certainly
weren't properly being handled, I understand that)

> * On the write side, if the code encrypts some data and then gets
> EWOULDBLOCK trying to write it, it will tell the caller that it
> successfully wrote that data.  If that was all the data the caller
> had to write (again, not so unlikely) this is a catastrophic
> mistake, because the caller will be satisfied and will go to sleep,
> rather than calling again to attempt another write.  What we *must*
> do is to reflect the write failure verbatim whether or not we
> encrypted some data.  We must remember how much data we encrypted
> and then discount that much of the caller's supplied data next time.
> There are hints in the existing comments that somebody understood
> this at one point, but the code isn't acting that way today.

That's a case I hadn't considered and you're right- the algorithm
certainly wouldn't work in such a case.  I don't recall specifically if
the code had handled it better previously, or not, but I do recall there
was something previously about being given a buffer and then having the
API defined as "give me back the exact same buffer because I had to
stop" and I recall finding that to ugly, but I get it now, seeing this
issue.  I'd certainly be happier if there was a better alternative but I
don't know that there really is.

Thanks,

Stephen




Re: Add pg_file_sync() to adminpack

2020-01-14 Thread Stephen Frost
Greetings,

* Julien Rouhaud (rjuju...@gmail.com) wrote:
> On Fri, Jan 10, 2020 at 10:50 AM Fujii Masao  wrote:
> > On Thu, Jan 9, 2020 at 10:39 PM Julien Rouhaud  wrote:
> > > I think that pg_write_server_files should be allowed to call that
> > > function by default.
> >
> > But pg_write_server_files users are not allowed to execute
> > other functions like pg_file_write() by default. So doing that
> > change only for pg_file_sync() looks strange to me.
> 
> Ah indeed.  I'm wondering if that's an oversight of the original
> default role patch or voluntary.

It was intentional.

Thanks,

Stephen




Re: SQL/JSON: JSON_TABLE

2020-01-14 Thread Pavel Stehule
Hi

I read this patch

There are some typos in the doc

*name* *type* EXISTS [ PATH *json_path_specification* ]

*Gerenates* a column and inserts a boolean item into each row of this
column.

It would be good to allow readers to repeat the examples from the
documentation - so the documentation should contain an INSERT with JSON,
the query, and the result.

JSON_TABLE is a pretty complex function, probably the most complex function
that I know, so I propose dividing the documentation into two parts - basic
and advanced. The basic part should cover working with missing or error
values (with examples) and explain what wrappers are. The advanced part
should describe working with plans. I am afraid a lot of smaller examples
will be necessary. Personally, I propose postponing the 0003 and 0004
patches to some later release. This is extra functionality that is not
widely used or documented in other RDBMSs (depending on your capacity) -
the only problem is good documentation - because this feature is almost
never used in projects, the small differences from the standard or other
RDBMSs can be fixed later (like we fixed XMLTABLE last year).

The documentation is good enough for initial commit - but should be
significantly enhanced before release.

I did some small performance tests - parsing JSON with a result of circa
25000 rows takes 150 ms. That is a great time.

My previous objections were resolved.

The patches applied cleanly. The compilation is without any issues or
warnings. There are enough regression tests, and check-world passed
without problems. The source code is readable and well formatted.

I checked the standard and checked conformance with other RDBMSs.

I will mark this patch - the JSON_TABLE implementation - as ready for
committer. The documentation should be enhanced - more examples, and
simpler examples, are necessary.

Regards

Thank you for your great, complex and hard work

It will be great feature

Pavel






On Tue, 14 Jan 2020 at 16:26, Nikita Glukhov wrote:

> Attached 42th version of the patches rebased onto current master.
>
>
> Changes from the previous version:
>  * added EXISTS PATH columns
>  * added DEFAULT clause for FORMAT JSON columns
>  * added implicit FORMAT JSON for columns of json[b], array and composite 
> types
>
>
> On 21.11.2019 19:51, Pavel Stehule wrote:
>
>
> On Thu, 21 Nov 2019 at 17:31, Nikita Glukhov wrote:
>
>> On 17.11.2019 13:35, Pavel Stehule wrote:
>> I found:
>>
>> a) Oracle & MySQL (Oracle) supports EXISTS clause, this implementation not.
>> I think should be useful support this clause too.
>>
>> SELECT * FROM JSON_TABLE('...', '...' COLUMNS x INT EXISTS PATH ...
>>
>>
>> EXISTS PATH clause can be emulated with jsonpath EXISTS() predicate:
>>
>> =# SELECT *
>>FROM JSON_TABLE('{"a": 1}', '$'
>>COLUMNS (
>>  a bool PATH 'exists($.a)',
>>  b bool PATH 'exists($.b)'
>>));
>>  a | b
>> ---+---
>>  t | f
>> (1 row)
>>
>> But this works as expected only in lax mode.  In strict mode EXISTS() returns
>> Unknown that transformed into SQL NULL:
>>
>> =# SELECT *
>>FROM JSON_TABLE('{"a": 1}', '$'
>>COLUMNS (
>>  a bool PATH 'strict exists($.a)',
>>  b bool PATH 'strict exists($.b)'
>>));
>>  a | b
>> ---+---
>>  t |
>> (1 row)
>>
>> There is no easy way to return false without external COALESCE(),
>> DEFAULT false ON ERROR also does not help.
>>
>> So, I think it's worth to add EXISTS PATH clause to our implementation.
>>
>>
>> There is a question how to map boolean result to other data types.
>>
>> Now, boolean result can be used in JSON_TABLE columns of bool, int4, text,
>> json[b], and other types which have CAST from bool:
>>
>> SELECT *
>> FROM JSON_TABLE('{"a": 1}', '$'
>> COLUMNS (
>>   a int PATH 'exists($.a)',
>>   b text PATH 'exists($.b)'
>> ));
>>  a |   b
>> ---+---
>>  1 | false
>> (1 row)
>>
>> EXISTS PATH columns were added.  Only column types having CASTS
> from boolean type are accepted.
>
> Example:
>
> SELECT *
> FROM JSON_TABLE(
>   '{"foo": "bar"}', '$'
>COLUMNS (
>  foo_exists boolean EXISTS PATH '$.foo',
>  foo int EXISTS,
>  err text EXISTS PATH '$ / 0' TRUE ON ERROR
>)
> );
>
>  foo_exists | foo |  err
> +-+--
>  t  |   1 | true
> (1 row)
>
>
>
> b) When searched value is not scalar, then it returns null. This behave can be
>> suppressed by clause FORMAT Json. I found a different behave, and maybe I 
>> found
>> a bug.  On MySQL this clause is by default for JSON values (what has sense).
>>
>> SELECT *
>>  FROM
>>   JSON_TABLE(
>> '[{"a":[1,2]}]',
>> '$[*]'
>> COLUMNS(
>>  aj JSON PATH '$.a' DEFAULT '{"x": 333}' ON EMPTY
>> )
>>   ) AS tt;
>>
>> It returns null, although it should to return [1,2].
>>
>> Yes, regular (non-formatted) JSON_TABLE columns can accept only scalar 
>> values.
>> 

Re: doc: vacuum full, fillfactor, and "extra space"

2020-01-14 Thread Fabien COELHO




Patch applies and compiles.

Given that the paragraph begins with "Plain VACUUM (without FULL)", it is
better to have the VACUUM FULL explanations on a separate paragraph, and the


The original patch does that (Fabien agreed when I asked off list)


Indeed. I may have looked at it in reverse, dunno.

I switched it to ready.

--
Fabien.




Re: doc: vacuum full, fillfactor, and "extra space"

2020-01-14 Thread Justin Pryzby
On Fri, Dec 27, 2019 at 11:58:18AM +0100, Fabien COELHO wrote:
>> I started writing this patch to avoid the possibly-misleading phrase: "with 
>> no
>> extra space" (since it's expected to typically take ~2x space, or 1x "extra"
>> space).
>> 
>> But the original phrase "with no extra space" seems to be wrong anyway, since
>> it actually follows fillfactor, so say that.  Possibly should be backpatched.
> 
> Patch applies and compiles.
> 
> Given that the paragraph begins with "Plain VACUUM (without FULL)", it is
> better to have the VACUUM FULL explanations on a separate paragraph, and the

The original patch does that (Fabien agreed when I asked off list)




Re: our checks for read-only queries are not great

2020-01-14 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > Speaking of sensible progress, I think we've drifted off on a tangent
> > here about ALTER SYSTEM.
> 
> Agreed, that's not terribly relevant for the proposed patch.

I agree that the proposed patch seems alright by itself, as the changes
it's making to existing behavior seem to all be bug-fixes and pretty
clear improvements not really related to 'read-only' transactions.

It's unfortunate that we haven't been able to work through to some kind
of agreement around what "SET TRANSACTION READ ONLY" means, so that
users of it can know what to expect.

Thanks,

Stephen




Re: Avoid full GIN index scan when possible

2020-01-14 Thread Tom Lane
Alexander Korotkov  writes:
> Updated patch is attached.  It contains more comments as well as commit 
> message.

I reviewed this a little bit.  I agree this seems way more straightforward
than the patches we've been considering so far.  I wasn't too happy with
the state of the comments, so I rewrote them a bit in the attached v13.

One thing I'm still not happy about is the comment in
collectMatchesForHeapRow.  v12 failed to touch that at all, so I tried to
fill it in, but I'm not sure if my explanation is good.  Also, if we know
that excludeOnly keys are going to be ignored, can we save any work in
the main loop of that function?

The test cases needed some work too.  Notably, some of the places where
you tried to use EXPLAIN ANALYZE are unportable because they expose "Heap
Blocks" counts that are not stable.  (I checked the patch on a 32-bit
machine and indeed some of these failed.)  While it'd be possible to work
around that by filtering the EXPLAIN output, that would not be any simpler
or faster than our traditional style of just doing a plain EXPLAIN and a
separate execution.

It troubles me a bit as well that the test cases don't really expose
any difference between patched and unpatched code --- I checked, and
they "passed" without applying any of the code changes.  Maybe there's
not much to be done about that, since after all this is an optimization
that's not supposed to change any query results.

I didn't repeat any of the performance testing --- it seems fairly
clear that this can't make any cases worse.

Other than the collectMatchesForHeapRow issue, I think this is
committable.

regards, tom lane

diff --git a/contrib/pg_trgm/expected/pg_trgm.out b/contrib/pg_trgm/expected/pg_trgm.out
index b3e709f..91596f8 100644
--- a/contrib/pg_trgm/expected/pg_trgm.out
+++ b/contrib/pg_trgm/expected/pg_trgm.out
@@ -3498,6 +3498,107 @@ select count(*) from test_trgm where t ~ '[qwerty]{2}-?[qwerty]{2}';
   1000
 (1 row)
 
+-- check handling of indexquals that generate no searchable conditions
+explain (costs off)
+select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
+ QUERY PLAN  
+-
+ Aggregate
+   ->  Bitmap Heap Scan on test_trgm
+ Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text))
+ ->  Bitmap Index Scan on trgm_idx
+   Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text))
+(5 rows)
+
+select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
+ count 
+---
+19
+(1 row)
+
+explain (costs off)
+select count(*) from test_trgm where t like '%99%' and t like '%qw%';
+   QUERY PLAN
+-
+ Aggregate
+   ->  Bitmap Heap Scan on test_trgm
+ Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qw%'::text))
+ ->  Bitmap Index Scan on trgm_idx
+   Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qw%'::text))
+(5 rows)
+
+select count(*) from test_trgm where t like '%99%' and t like '%qw%';
+ count 
+---
+19
+(1 row)
+
+-- ensure that pending-list items are handled correctly, too
+create temp table t_test_trgm(t text COLLATE "C");
+create index t_trgm_idx on t_test_trgm using gin (t gin_trgm_ops);
+insert into t_test_trgm values ('qwerty99'), ('qwerty01');
+explain (costs off)
+select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%';
+ QUERY PLAN  
+-
+ Aggregate
+   ->  Bitmap Heap Scan on t_test_trgm
+ Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text))
+ ->  Bitmap Index Scan on t_trgm_idx
+   Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qwerty%'::text))
+(5 rows)
+
+select count(*) from t_test_trgm where t like '%99%' and t like '%qwerty%';
+ count 
+---
+ 1
+(1 row)
+
+explain (costs off)
+select count(*) from t_test_trgm where t like '%99%' and t like '%qw%';
+   QUERY PLAN
+-
+ Aggregate
+   ->  Bitmap Heap Scan on t_test_trgm
+ Recheck Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qw%'::text))
+ ->  Bitmap Index Scan on t_trgm_idx
+   Index Cond: ((t ~~ '%99%'::text) AND (t ~~ '%qw%'::text))
+(5 rows)
+
+select count(*) from t_test_trgm where t like '%99%' and t like '%qw%';
+ count 
+---
+ 1
+(1 row)
+
+-- run the same queries with sequential scan to check the results
+set enable_bitmapscan=off;
+set enable_seqscan=on;
+select count(*) from test_trgm where t like '%99%' and t like '%qwerty%';
+ count 
+---
+19
+(1 row)
+
+select 

Re: backup manifests

2020-01-14 Thread David Fetter
On Tue, Jan 14, 2020 at 12:53:04PM -0500, Tom Lane wrote:
> Robert Haas  writes:
> > ... I would also expect that depending on an external package
> > would provoke significant opposition. If we suck the code into core,
> > then we have to keep it up to date with the upstream, which is a
> > significant maintenance burden - look at all the time Tom has spent on
> > snowball, regex, and time zone code over the years.
> 
> Also worth noting is that we have a seriously bad track record about
> choosing external packages to depend on.  The regex code has no upstream
> maintainer anymore (well, the Tcl guys seem to think that *we* are
> upstream for that now), and snowball is next door to moribund.
> With C not being a particularly hip language to develop in anymore,
> it wouldn't surprise me in the least for any C-code JSON parser
> we might pick to go dead pretty soon.

Given jq's extreme popularity and compatible license, I'd nominate that.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: Unicode escapes with any backend encoding

2020-01-14 Thread Chapman Flack
On 1/14/20 10:10 AM, Tom Lane wrote:
> to me that this error is just useless pedantry.  As long as the DB
> encoding can represent the desired character, it should be transparent
> to users.

That's my position too.

Regards,
-Chap




Re: backup manifests

2020-01-14 Thread Tom Lane
Robert Haas  writes:
> ... I would also expect that depending on an external package
> would provoke significant opposition. If we suck the code into core,
> then we have to keep it up to date with the upstream, which is a
> significant maintenance burden - look at all the time Tom has spent on
> snowball, regex, and time zone code over the years.

Also worth noting is that we have a seriously bad track record about
choosing external packages to depend on.  The regex code has no upstream
maintainer anymore (well, the Tcl guys seem to think that *we* are
upstream for that now), and snowball is next door to moribund.
With C not being a particularly hip language to develop in anymore,
it wouldn't surprise me in the least for any C-code JSON parser
we might pick to go dead pretty soon.

Between that problem and the likelihood that we'd need to make
significant code changes anyway to meet our own coding style etc
expectations, I think really we'd have to assume that we're going
to fork and maintain our own copy of any code we pick.

Now, if it's a small enough chunk of code (and really, how complex
is JSON parsing anyway) maybe that doesn't matter.  But I tend to
agree with Robert's position that it's a big ask for this patch
to introduce a frontend JSON parser.

regards, tom lane




Re: Create/alter policy and exclusive table lock

2020-01-14 Thread Tom Lane
Konstantin Knizhnik  writes:
> But let me ask you one more question: why do we obtaining snapshot twice 
> in exec_simple_query:
> first for analyze (pg_analyze_and_rewrite) and one for execution 
> (PortalStart)?

That would happen anyway if the plan is cached.  If we were to throw away
all plan caching and swear a mighty oath that we'll never put it back,
maybe we could build in a design assumption that planning and execution
use identical snapshots.  I doubt that would lead to a net win though.

Also note that our whole approach to cache invalidation is based on the
assumption that if session A needs to see the effects of session B,
they will be taking conflicting locks.  Otherwise sinval signaling
is not guaranteed to be detected at the necessary times.

regards, tom lane




Use compiler intrinsics for bit ops in hash

2020-01-14 Thread David Fetter
Folks,

The recent patch for distinct windowing aggregates contained a partial
fix of the FIXME that didn't seem entirely right, so I extracted that
part, changed it to use compiler intrinsics, and submit it here.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 2d0022be8cb117da0eadc76900d6558853e4 Mon Sep 17 00:00:00 2001
From: David Fetter 
Date: Tue, 14 Jan 2020 09:32:15 -0800
Subject: [PATCH v1] Use compiler intrinsics for bit ops in hash
To: hackers


- In passing, fix the FIXME by centralizing those calls.

diff --git a/src/backend/access/hash/hashpage.c b/src/backend/access/hash/hashpage.c
index 55d85644a4..b1e04c0136 100644
--- a/src/backend/access/hash/hashpage.c
+++ b/src/backend/access/hash/hashpage.c
@@ -30,6 +30,7 @@
 
 #include "access/hash.h"
 #include "access/hash_xlog.h"
+#include "port/pg_bitutils.h"
 #include "miscadmin.h"
 #include "storage/lmgr.h"
 #include "storage/predicate.h"
@@ -543,11 +544,7 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
 	metap->hashm_ffactor = ffactor;
 	metap->hashm_bsize = HashGetMaxBitmapSize(page);
 	/* find largest bitmap array size that will fit in page size */
-	for (i = _hash_log2(metap->hashm_bsize); i > 0; --i)
-	{
-		if ((1 << i) <= metap->hashm_bsize)
-			break;
-	}
+	i =  pg_leftmost_one_pos32(metap->hashm_bsize);
 	Assert(i > 0);
 	metap->hashm_bmsize = 1 << i;
 	metap->hashm_bmshift = i + BYTE_TO_BIT;
@@ -570,7 +567,7 @@ _hash_init_metabuffer(Buffer buf, double num_tuples, RegProcedure procid,
 	 * Set highmask as next immediate ((2 ^ x) - 1), which should be
 	 * sufficient to cover num_buckets.
 	 */
-	metap->hashm_highmask = (1 << (_hash_log2(num_buckets + 1))) - 1;
+	metap->hashm_highmask = next_power_of_2_32(num_buckets + 1) - 1;
 	metap->hashm_lowmask = (metap->hashm_highmask >> 1);
 
 	MemSet(metap->hashm_spares, 0, sizeof(metap->hashm_spares));
@@ -657,13 +654,12 @@ restart_expand:
 	 * Can't split anymore if maxbucket has reached its maximum possible
 	 * value.
 	 *
-	 * Ideally we'd allow bucket numbers up to UINT_MAX-1 (no higher because
-	 * the calculation maxbucket+1 mustn't overflow).  Currently we restrict
-	 * to half that because of overflow looping in _hash_log2() and
-	 * insufficient space in hashm_spares[].  It's moot anyway because an
-	 * index with 2^32 buckets would certainly overflow BlockNumber and hence
-	 * _hash_alloc_buckets() would fail, but if we supported buckets smaller
-	 * than a disk block then this would be an independent constraint.
+	 * Ideally we'd allow bucket numbers up to UINT_MAX-1 (no higher because the
+	 * calculation maxbucket+1 mustn't overflow).  Currently we restrict to half
+	 * that because of insufficient space in hashm_spares[].  It's moot anyway
+	 * because an index with 2^32 buckets would certainly overflow BlockNumber
+	 * and hence _hash_alloc_buckets() would fail, but if we supported buckets
+	 * smaller than a disk block then this would be an independent constraint.
 	 *
 	 * If you change this, see also the maximum initial number of buckets in
 	 * _hash_init().
diff --git a/src/backend/access/hash/hashsort.c b/src/backend/access/hash/hashsort.c
index 9cb41d62e7..322379788c 100644
--- a/src/backend/access/hash/hashsort.c
+++ b/src/backend/access/hash/hashsort.c
@@ -27,6 +27,7 @@
 
 #include "access/hash.h"
 #include "commands/progress.h"
+#include "port/pg_bitutils.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "utils/tuplesort.h"
@@ -69,7 +70,7 @@ _h_spoolinit(Relation heap, Relation index, uint32 num_buckets)
 	 * NOTE : This hash mask calculation should be in sync with similar
 	 * calculation in _hash_init_metabuffer.
 	 */
-	hspool->high_mask = (((uint32) 1) << _hash_log2(num_buckets + 1)) - 1;
+	hspool->high_mask = next_power_of_2_32(num_buckets + 1) - 1;
 	hspool->low_mask = (hspool->high_mask >> 1);
 	hspool->max_buckets = num_buckets - 1;
 
diff --git a/src/backend/access/hash/hashutil.c b/src/backend/access/hash/hashutil.c
index 9efc8016bc..738572ca40 100644
--- a/src/backend/access/hash/hashutil.c
+++ b/src/backend/access/hash/hashutil.c
@@ -17,6 +17,7 @@
 #include "access/hash.h"
 #include "access/reloptions.h"
 #include "access/relscan.h"
+#include "port/pg_bitutils.h"
 #include "storage/buf_internals.h"
 #include "utils/lsyscache.h"
 #include "utils/rel.h"
@@ -134,21 +135,6 @@ _hash_hashkey2bucket(uint32 hashkey, uint32 maxbucket,
 	return bucket;
 }
 
-/*
- * _hash_log2 -- returns ceil(lg2(num))
- */
-uint32
-_hash_log2(uint32 num)
-{
-	uint32		i,
-limit;
-
-	limit = 1;
-	for (i = 0; limit < num; limit <<= 1, i++)
-		;
-	return i;
-}
-
 /*
  * _hash_spareindex -- returns 

Re: our checks for read-only queries are not great

2020-01-14 Thread Tom Lane
Robert Haas  writes:
> Speaking of sensible progress, I think we've drifted off on a tangent
> here about ALTER SYSTEM.

Agreed, that's not terribly relevant for the proposed patch.

regards, tom lane




Re: backup manifests

2020-01-14 Thread Robert Haas
On Thu, Jan 9, 2020 at 8:19 PM David Steele  wrote:
> For example, have you considered what will happen if you have a file in
> the cluster with a tab in the name?  This is perfectly valid in Posix
> filesystems, at least.

Yeah, there's code for that in the patch I posted. I don't think the
validator patch deals with it, but that's fixable.

> You may already be escaping tabs but the simple
> code snippet you provided earlier isn't going to work so well either
> way.  It gets complicated quickly.

Sure, but obviously neither of those code snippets were intended to be
used straight out of the box. Even after you parse the manifest as
JSON, you would still - if you really want to validate it - check that
you have the keys and values you expect, that the individual field
values are sensible, etc. I still stand by my earlier contention that,
as things stand today, you can parse an ad-hoc format in less code
than a JSON format. If we had a JSON parser available on the front
end, I think it'd be roughly comparable, but maybe the JSON format
would come out a bit ahead. Not sure.

> There are a few MIT-licensed JSON projects that are implemented in a
> single file.  cJSON is very capable while JSMN is very minimal. Is is
> possible that one of those (or something like it) would be acceptable?
> It looks like the one requirement we have is that the JSON can be
> streamed rather than just building up one big blob?  Even with that
> requirement there are a few tricks that can be used.  JSON nests rather
> nicely after all so the individual file records can be transmitted
> independently of the overall file format.

I haven't really looked at these. I would have expected that including
a second JSON parser in core would provoke significant opposition.
Generally, people dislike having more than one piece of code to do the
same thing. I would also expect that depending on an external package
would provoke significant opposition. If we suck the code into core,
then we have to keep it up to date with the upstream, which is a
significant maintenance burden - look at all the time Tom has spent on
snowball, regex, and time zone code over the years. If we don't suck
the code into core but depend on it, then every developer needs to
have that package installed on their operating system, and every
packager has to make sure that it is being built for their OS so that
PostgreSQL can depend on it. Perhaps JSON is so popular today that
imposing such a requirement would provoke only a groundswell of
support, but based on past precedent I would assume that if I
committed a patch of this sort the chances that I'd have to revert it
would be about 99.9%. Optional dependencies for optional features are
usually pretty well-tolerated when they're clearly necessary: e.g. you
can't really do JIT without depending on something like LLVM, but the
bar for a mandatory dependency has historically been quite high.

> Would it be acceptable to bring in JSON code with a compatible license
> to use in libcommon?  If so I'm willing to help adapt that code for use
> in Postgres.  It's possible that the pgBackRest code could be adapted
> similarly, but it might make more sense to start from one of these
> general purpose parsers.

For the reasons above, I expect this approach would be rejected, by
Tom and by others.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add pg_file_sync() to adminpack

2020-01-14 Thread Julien Rouhaud
On Tue, Jan 14, 2020 at 4:08 PM Atsushi Torikoshi  wrote:
>
> > On Sut, Jan 11, 2020 at  2:12 Fujii Masao :
> > > But pg_write_server_files users are not allowed to execute
> > > other functions like pg_file_write() by default. So doing that
> > > change only for pg_file_sync() looks strange to me.
>
> > Ah indeed.  I'm wondering if that's an oversight of the original
> > default role patch or voluntary.
>
> It's not directly related to the patch, but as far as I read the
> manual below, I expected pg_write_server_files users could execute
>  adminpack functions.
>
>   | Table 21.1 Default Roles
>   | pg_write_server_files: Allow writing to files in any location the 
> database can access on the server with COPY and other file-access functions.

Actually, pg_write_server_files has enough privileges to execute those
functions anywhere on the FS as far as C code is concerned, provided
that the user running postgres daemon is allowed to (see
convert_and_check_filename), but won't be allowed to do so by default
as it won't have EXECUTE privilege on the functions.
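
For illustration (assuming the usual pg_file_write(text, text, boolean)
signature from adminpack), letting the role actually call such a function
would take an explicit grant, e.g.:

GRANT EXECUTE ON FUNCTION pg_file_write(text, text, boolean)
    TO pg_write_server_files;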




Re: our checks for read-only queries are not great

2020-01-14 Thread Robert Haas
On Mon, Jan 13, 2020 at 3:00 PM Stephen Frost  wrote:
> I've never heard that and I don't agree with it being a justification
> for blocking sensible progress.

Speaking of sensible progress, I think we've drifted off on a tangent
here about ALTER SYSTEM. As I understand it, nobody's opposed to the
most recent version (v3) of the proposed patch, which also makes no
definitional changes relative to the status quo, but does fix some
bugs, and makes things a little nicer for parallel query, too. So I'd
like to go ahead and commit that.

Discussion of what to do about ALTER SYSTEM can continue, although I
feel perhaps the current discussion isn't particularly productive.  On
the one hand, the argument that it isn't read only because it writes
data someplace doesn't convince me: practically every command can
cause some kind of write some place, e.g. SELECT can write WAL for at
least 2 different reasons, and that does not make it not read-only,
nor does the fact that it updates the table statistics. The question
of what data is being written must be relevant. On the other hand, I'm
unpersuaded by the arguments so far that including ALTER SYSTEM
commands in pg_dump output would be anything other than a train wreck,
though doing it optionally and not by default might be OK. However,
the main thing for me is that messing around with the behavior of
ALTER SYSTEM in either of those ways or some other is not what this
patch is about. I'm just proposing to refactor the code to fix the
existing bugs and make it much less likely that future patches will
create new ones, and I think reclassifying or redesigning ALTER SYSTEM
ought to be done, if at all, separately.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Create/alter policy and exclusive table lock

2020-01-14 Thread Konstantin Knizhnik




On 14.01.2020 17:40, Tom Lane wrote:

Konstantin Knizhnik  writes:

Right now changing policies (create/alter policy statements) requires
exclusive lock of target table:

Yup.


I wonder why do we really need exclusive lock here?

Because it affects the behavior of a SELECT.


May be I missed something, but why we can not rely on standard MVCC
visibility rules for pg_policy table?

We cannot have a situation where the schema details of a table might
change midway through planning/execution of a statement.  The results
are unlikely to be as clean as "you get either the old behavior or the
new one", because that sequence might examine the details more than
once.  Also, even if you cleanly get the old behavior, that's hardly
satisfactory.  Consider

Session 1                               Session 2

begin;
alter policy ... on t1 ...;
insert new data into t1;

                                        begin planning SELECT on t1;

commit;

                                        begin executing SELECT on t1;

With your proposal, session 2 would see the new data in t1
(because the executor takes a fresh snapshot) but it would not
be affected by the new policy.  That's a security failure,
and it's one that does not happen today.


Thank you for the explanation.
But let me ask you one more question: why do we obtain the snapshot twice
in exec_simple_query: first for analysis (pg_analyze_and_rewrite) and once
for execution (PortalStart)?
GetSnapshotData is quite an expensive operation, and the fact that we are
calling it twice for each query execution (with read committed isolation
level) seems strange. Also, the problem in the scenario you described
above is caused by the planner and executor using different snapshots. If
they shared the same snapshot, then there should be no security violation,
right?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: Setting min/max TLS protocol in clientside libpq

2020-01-14 Thread Daniel Gustafsson
> On 14 Jan 2020, at 15:49, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
 On 11 Jan 2020, at 03:49, Michael Paquier  wrote:
 One thing I noticed when looking at it is that we now have sha2_openssl.c 
 and
 openssl_protocol.c in src/common.  For easier visual grouping of OpenSSL
 functionality, it makes sense to me to rename sha2_openssl.c to 
 openssl_sha2.c,
 but that might just be pointless churn.
> 
>>> Databases like consistency, and so do I, so no issues from me to do a
>>> rename of the sha2.c file.  That makes sense with the addition of the
>>> new file.
> 
>> Done in the attached v3.
> 
> I'm kind of down on renaming files unless there is a *really* strong
> reason for it.  It makes back-patching more difficult and it makes
> it much harder to follow the git history.  And, seeing that there is
> also a src/common/sha2.c, it seems to me that renaming sha2_openssl.c
> will just break consistency in a different way.
> 
> Maybe the problem is you've got the new file's name backwards.
> Maybe it should be protocol_openssl.c.

That's a very good argument, I’ll send a v4 with protocol_openssl.c when back at 
the computer.

cheers ./daniel



Re: Unicode escapes with any backend encoding

2020-01-14 Thread Tom Lane
I wrote:
> Andrew Dunstan  writes:
>> On Tue, Jan 14, 2020 at 10:02 AM Tom Lane  wrote:
>>> Grepping for other direct uses of unicode_to_utf8(), I notice that
>>> there are a couple of places in the JSON code where we have a similar
>>> restriction that you can only write a Unicode escape in UTF8 server
>>> encoding.  I'm not sure whether these same semantics could be
>>> applied there, so I didn't touch that.

>> Off the cuff I'd be inclined to say we should keep the text escape
>> rules the same. We've already extended the JSON standard y allowing
>> non-UTF8 encodings.

> Right.  I'm just thinking though that if you can write "é" literally
> in a JSON string, even though you're using LATIN1 not UTF8, then why
> not allow writing that as "\u00E9" instead?  The latter is arguably
> truer to spec.
> However, if JSONB collapses "\u00E9" to LATIN1 "é", that would be bad,
> unless we have a way to undo it on printout.  So there might be
> some more moving parts here than I thought.

On third thought, what would be so bad about that?  Let's suppose
I write:

INSERT ... values('{"x": "\u00E9"}'::jsonb);

and the jsonb parsing logic chooses to collapse the backslash to
the represented character, i.e., "é".  Why should it matter whether
the database encoding is UTF8 or LATIN1?  If I am using UTF8
client encoding, I will see the "é" in UTF8 encoding either way,
because of output encoding conversion.  If I am using LATIN1
client encoding, I will see the "é" in LATIN1 either way --- or
at least, I will if the database encoding is UTF8.  Right now I get
an error for that when the database encoding is LATIN1 ... but if
I store the "é" as literal "é", it works, either way.  So it seems
to me that this error is just useless pedantry.  As long as the DB
encoding can represent the desired character, it should be transparent
to users.
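
Concretely (illustrative only; the exact error wording is from memory),
in a LATIN1 database today:

SELECT '{"x": "é"}'::jsonb;        -- accepted, é is representable in LATIN1
SELECT '{"x": "\u00E9"}'::jsonb;   -- rejected with an error about Unicode
                                   -- escapes above U+007F and the server encoding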

regards, tom lane




Re: Add pg_file_sync() to adminpack

2020-01-14 Thread Atsushi Torikoshi
Hello,

> On Sut, Jan 11, 2020 at  2:12 Fujii Masao :
> I'm not sure if returning false with WARNING only in some error cases
> is really good idea or not. At least for me, it's more intuitive to
> return true on success and emit an ERROR otherwise. I'd like to hear
> more opinions about this.

+1.
As a user, I expect these adminpack functions to behave similarly
to the corresponding system calls.
System calls for flushing data to disk (fsync on Linux and FlushFileBuffers
on Windows) return different codes on success and failure, and when they
fail we can get error messages. So it seems straightforward to me to
'return true on success and emit an ERROR otherwise'.
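
Just to spell out the system-call behaviour referenced here, a trivial
sketch (not adminpack code):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

static int
flush_fd(int fd)
{
    if (fsync(fd) != 0)
    {
        /* failure is reported through the return code and errno */
        fprintf(stderr, "fsync failed: %s\n", strerror(errno));
        return -1;
    }
    return 0;                   /* success */
}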


> > On Thu, Jan 9, 2020 at 10:39 PM Julien Rouhaud 
wrote:
> > >
> > > I think that pg_write_server_files should be allowed to call that
> > > function by default.
> >
> > But pg_write_server_files users are not allowed to execute
> > other functions like pg_file_write() by default. So doing that
> > change only for pg_file_sync() looks strange to me.

> Ah indeed.  I'm wondering if that's an oversight of the original
> default role patch or voluntary.

It's not directly related to the patch, but as far as I read the
manual below, I expected pg_write_server_files users to be able to execute
adminpack functions.

  | Table 21.1 Default Roles
  | pg_write_server_files: Allow writing to files in any location the
database can access on the server with COPY and other file-access functions.


--
Atsushi Torikoshi


Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR

2020-01-14 Thread Daniel Verite
Dent John wrote:

> It’s crashing when it’s checking that the returned tuple matches the
> declared return type in rsinfo->setDesc. Seems rsinfo->setDesc gets
> overwritten. So I think I have a memory management problem.

What is the expected result anyway? A single column with a "record"
type? FWIW I notice that with plpgsql, this is not allowed to happen:

CREATE FUNCTION cursor_unnest(x refcursor) returns setof record
as $$
declare
 r record;
begin
  loop
fetch x into r;
exit when not found;
return next r;
  end loop;
end $$ language plpgsql;

begin;

declare c cursor for select oid::int as i, relname::text as r from pg_class;

select cursor_unnest('c');

ERROR:  set-valued function called in context that cannot accept a set
CONTEXT:  PL/pgSQL function cursor_unnest(refcursor) line 8 at RETURN NEXT


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Setting min/max TLS protocol in clientside libpq

2020-01-14 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 11 Jan 2020, at 03:49, Michael Paquier  wrote:
>>> One thing I noticed when looking at it is that we now have sha2_openssl.c 
>>> and
>>> openssl_protocol.c in src/common.  For easier visual grouping of OpenSSL
>>> functionality, it makes sense to me to rename sha2_openssl.c to 
>>> openssl_sha2.c,
>>> but that might just be pointless churn.

>> Databases like consistency, and so do I, so no issues from me to do a
>> rename of the sha2.c file.  That makes sense with the addition of the
>> new file.

> Done in the attached v3.

I'm kind of down on renaming files unless there is a *really* strong
reason for it.  It makes back-patching more difficult and it makes
it much harder to follow the git history.  And, seeing that there is
also a src/common/sha2.c, it seems to me that renaming sha2_openssl.c
will just break consistency in a different way.

Maybe the problem is you've got the new file's name backwards.
Maybe it should be protocol_openssl.c.

regards, tom lane




Re: Duplicate Workers entries in some EXPLAIN plans

2020-01-14 Thread Georgios Kokolatos
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

This is a high level review only. However, seeing that there is a conflict and
the patch does not merge cleanly after commit , I return it to Author.

To be fair the resolution seems quite straightforward and I took the liberty 
of applying the necessary changes to include the new element of ExplainState 
introduced in the above commit, namely hide_workers. However since the author 
might have a different idea on how to incorporate this change I leave it up to 
him.

Another very high level comment is the introduction of a new test file, namely 
explain. Seeing `explain.sql` in the test suite, personally and very much
opinion-based, I would have expected the whole spectrum of the explain properties to be 
tested. However only a slight fraction of the functionality is tested. Since 
this is a bit more of a personal opinion, I don't expect any changes unless the 
author happens to agree.

Other than these minor nitpicks, the code seems clear, concise, and does what it 
says on the box. It follows the comments in the discussion thread(s) and solves 
a real issue.

Please have a look at how commit  affects this patch and rebase.

The new status of this patch is: Waiting on Author


Re: Create/alter policy and exclusive table lock

2020-01-14 Thread Tom Lane
Konstantin Knizhnik  writes:
> Right now changing policies (create/alter policy statements) requires 
> exclusive lock of target table:

Yup.

> I wonder why do we really need exclusive lock here?

Because it affects the behavior of a SELECT.

> May be I missed something, but why we can not rely on standard MVCC 
> visibility rules for pg_policy table?

We cannot have a situation where the schema details of a table might
change midway through planning/execution of a statement.  The results
are unlikely to be as clean as "you get either the old behavior or the
new one", because that sequence might examine the details more than
once.  Also, even if you cleanly get the old behavior, that's hardly
satisfactory.  Consider

Session 1                          Session 2

begin;
alter policy ... on t1 ...;
insert new data into t1;

                                   begin planning SELECT on t1;

commit;

                                   begin executing SELECT on t1;

With your proposal, session 2 would see the new data in t1
(because the executor takes a fresh snapshot) but it would not
be affected by the new policy.  That's a security failure,
and it's one that does not happen today.
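
To make that concrete, here is a minimal SQL sketch of the same sequence under
the proposed MVCC-only behavior (the table t1, policy p1, and the column names
are purely illustrative):

-- Session 1
begin;
alter policy p1 on t1 using (owner = current_user);
insert into t1 (owner, payload) values ('alice', 'new row');
--   (Session 2 starts planning:  select * from t1;)
commit;
--   (Session 2 now executes its SELECT: the executor's fresh snapshot sees
--    the newly inserted row, but the plan was built against the old policy.)

With the exclusive lock taken today, Session 2 simply blocks until Session 1
commits, and then plans and executes against the new policy.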

regards, tom lane




Re: Making psql error out on output failures

2020-01-14 Thread Daniel Verite
David Z wrote:

> $ psql -d postgres  -At -c "select repeat('111', 100)" >
> /mnt/ramdisk/file

The -A option selects the unaligned output format and -t
switches to the "tuples only" mode (no header, no footer).

> Test-2: delete the "file", run the command within psql console,
> $ rm /mnt/ramdisk/file 
> $ psql -d postgres

In this invocation there's no -A and -t, so the beginning of the
output is going to be a left padded column name that is not in the
other output.
The patch is not involved in that difference.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Setting min/max TLS protocol in clientside libpq

2020-01-14 Thread Daniel Gustafsson
> On 11 Jan 2020, at 03:49, Michael Paquier  wrote:

> Heuh.  It would be perfect to rely only on OpenSSL for that part
> to bring some sanity, and compare the results fetched from the SSL
> context so as we don't have to worry about special cases in with the
> GUC reload if the parameter is not set, or the parameter value is not
> supported.

I'm not convinced about this, but since there is a thread opened for discussing
the range check let's take it over there.

> Daniel, are you planning to start a new thread?

I was going to, but you beat me to it =)

>> One thing I noticed when looking at it is that we now have sha2_openssl.c and
>> openssl_protocol.c in src/common.  For easier visual grouping of OpenSSL
>> functionality, it makes sense to me to rename sha2_openssl.c to 
>> openssl_sha2.c,
>> but that might just be pointless churn.
> 
> Databases like consistency, and so do I, so no issues from me to do a
> rename of the sha2.c file.  That makes sense with the addition of the
> new file.

Done in the attached v3.

cheers ./daniel



libpq_minmaxproto_v3.patch
Description: Binary data


Create/alter policy and exclusive table lock

2020-01-14 Thread Konstantin Knizhnik

Hi hackers,

Right now changing policies (create/alter policy statements) requires 
exclusive lock of target table:


    /* Get id of table.  Also handles permissions checks. */
    table_id = RangeVarGetRelidExtended(stmt->table, AccessExclusiveLock,
                                        0,
                                        RangeVarCallbackForPolicy,
                                        (void *) stmt);

Unfortunately there are use cases where policies are changed quite 
frequently and this exclusive lock becomes a bottleneck.

I wonder why do we really need exclusive lock here?
Policies are stored in pg_policy table and we get  RowExclusiveLock on it.

May be I missed something, but why we can not rely on standard MVCC 
visibility rules for pg_policy table?
Until the transaction executing CREATE/ALTER POLICY is committed, other 
transactions will not see its changes in the pg_policy table and will perform
RLS checks according to the old policies. Once the transaction is committed, 
everybody will switch to the new policies.


I wonder if it is possible to replace AccessExclusiveLock with 
AccessShareLock in RangeVarGetRelidExtended in CreatePolicy and 
AlterPolicy?


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [Proposal] Global temporary tables

2020-01-14 Thread 曾文旌(义从)


> On Jan 14, 2020, at 9:20 PM, Pavel Stehule wrote:
> 
> 
> 
> On Tue, Jan 14, 2020 at 14:09, 曾文旌(义从) wrote:
> Thank you for reviewing my patch.
> 
> 
>> On Jan 12, 2020, at 4:27 AM, Pavel Stehule wrote:
>> 
>> Hi
>> 
>> On Sat, Jan 11, 2020 at 15:00, 曾文旌(义从) wrote:
>> Hi all
>> 
>> This is the latest patch
>> 
>> The updates are as follows:
>> 1. Support global temp Inherit table global temp partition table
>> 2. Support serial column in GTT
>> 3. Provide views pg_gtt_relstats pg_gtt_stats for GTT’s statistics
>> 4. Provide view pg_gtt_attached_pids to manage GTT
>> 5. Provide function pg_list_gtt_relfrozenxids() to manage GTT
>> 6. Alter GTT or rename GTT is allowed under some conditions
>> 
>> 
>> Please give me feedback.
>> 
>> I tested the functionality
>> 
>> 1. i think so "ON COMMIT PRESERVE ROWS" should be default mode (like local 
>> temp tables).
> makes sense, I will fix it.
> 
>> 
>> I tested some simple scripts 
>> 
>> test01.sql
>> 
>> CREATE TEMP TABLE foo(a int, b int);
>> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
>> generate_series(1,1000);
>> ANALYZE foo;
>> SELECT sum(a), sum(b) FROM foo;
>> DROP TABLE foo; -- simulate disconnect
>> 
>> 
>> after 100 sec, the table pg_attribute has 3.2MB
>> and 64 tps, 6446 transaction
>> 
>> test02.sql
>> 
>> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
>> generate_series(1,1000);
>> ANALYZE foo;
>> SELECT sum(a), sum(b) FROM foo;
>> DELETE FROM foo; -- simulate disconnect
>> 
>> 
>> after 100 sec, 1688 tps, 168830 transactions
>> 
>> So performance is absolutely different as we expected.
>> 
>> From my perspective, this functionality is great.
> Yes, frequent ddl causes catalog bloat, GTT avoids this problem.
> 
>> 
>> Todo:
>> 
>> pg_table_size function doesn't work
> Do you mean that the function pg_table_size() needs to return the storage space 
> used by one GTT across the entire database (including all sessions)?
> 
> It's a question how similar GTT tables should be to classic tables. But 
> the reporting in psql should work: \dt+, \l+, \di+
Got it, I will fix it.
> 
> 
> 
>> 
>> Regards
>> 
>> Pavel
>> 
>> 
>> Wenjing
>> 
>> 
>> 
>> 
>> 
>>> On Jan 6, 2020, at 4:06 AM, Tomas Vondra wrote:
>>> 
>>> Hi,
>>> 
>>> I think we need to do something with having two patches aiming to add
>>> global temporary tables:
>>> 
>>> [1] https://commitfest.postgresql.org/26/2349/ 
>>> 
>>> 
>>> [2] https://commitfest.postgresql.org/26/2233/ 
>>> 
>>> 
>>> As a reviewer I have no idea which of the threads to look at - certainly
>>> not without reading both threads, which I doubt anyone will really do.
>>> The reviews and discussions are somewhat intermixed between those two
>>> threads, which makes it even more confusing.
>>> 
>>> I think we should agree on a minimal patch combining the necessary/good
>>> bits from the various patches, and terminate one of the threads (i.e.
>>> mark it as rejected or RWF). And we need to do that now, otherwise
>>> there's about 0% chance of getting this into v13.
>>> 
>>> In general, I agree with the sentiment Rober expressed in [1] - the
>>> patch needs to be as small as possible, not adding "nice to have"
>>> features (like support for parallel queries - I very much doubt just
>>> using shared instead of local buffers is enough to make it work.)
>>> 
>>> regards
>>> 
>>> -- 
>>> Tomas Vondra  http://www.2ndQuadrant.com 
>>> 
>>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>> 
> 



Re: [Proposal] Global temporary tables

2020-01-14 Thread 曾文旌(义从)



> On Jan 12, 2020, at 9:14 AM, Tomas Vondra wrote:
> 
> On Fri, Jan 10, 2020 at 03:24:34PM +0300, Konstantin Knizhnik wrote:
>> 
>> 
>> On 09.01.2020 19:30, Tomas Vondra wrote:
>> 
>> 
>>> 
 
> 
>> 3 Still no one commented on GTT's transaction information processing, 
>> they include
>> 3.1 Should gtt's frozenxid need to be care?
>> 3.2 gtt’s clog clean
>> 3.3 How to deal with "too old" gtt data
>> 
> 
> No idea what to do about this.
> 
 
 I wonder what is the specific of GTT here?
 The same problem takes place for normal (local) temp tables, doesn't it?
 
>>> 
>>> Not sure. TBH I'm not sure I understand what the issue actually is.
>> 
>> Just open session, create temporary table and insert some data in it.
>> Then in other session run 2^31 transactions (at my desktop it takes about 2 
>> hours).
>> As far as temp tables are not proceeded by vacuum, database is stalled:
>> 
>>  ERROR:  database is not accepting commands to avoid wraparound data loss in 
>> database "postgres"
>> 
>> It seems to be quite dubious behavior and it is strange to me that nobody 
>> complains about it.
>> We discuss  many issues related with temp tables (statistic, parallel 
>> queries,...) which seems to be less critical.
>> 
>> But this problem is not specific to GTT - it can be reproduced with normal 
>> (local) temp tables.
>> This is why I wonder why do we need to solve it in GTT patch.
>> 
> 
> Yeah, I think that's out of scope for GTT patch. Once we solve it for
> plain temporary tables, we'll solve it for GTT too.
1. The core problem is that the data contains transaction information (xids), 
which needs to be vacuumed (frozen) regularly to avoid running out of xids.
Autovacuum supports vacuuming regular tables, but not local temp tables; 
autovacuum also does not support GTTs.

2. However, the difference between a local temp table and a global temp 
table (GTT) is that
a) For a local temp table: one table has one copy of data, and the frozenxid of a 
local temp table is stored in the catalog (pg_class). 
b) For a global temp table: each session has a separate copy of the data, so one 
GTT may carry up to MaxBackends frozenxids,
and I don't think it's a good idea to keep the frozenxids of a GTT in the 
catalog (pg_class). 
It becomes a question: how to handle GTT transaction information?

I agree that problem 1 should be completely solved by some other feature, such as 
local transactions. It is definitely not included in the GTT patch.

But I think we need to ensure the durability of GTT data. For example, data in a 
GTT cannot be lost due to the clog being cleaned up. That belongs to problem 2.
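
As an illustration of point a), the single relfrozenxid of a local temp table
can be inspected directly in pg_class (the table name below is made up):

create temp table tmp_t (a int);
insert into tmp_t values (1);
-- one pg_class row, hence one frozenxid, per local temp table
select relname, relfrozenxid, relminmxid
from pg_class
where relname = 'tmp_t';

A GTT, by contrast, would have one pg_class row but many per-session data
copies, which is why a single catalog column does not fit.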



Wenjing


> 
> regards
> 
> -- 
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





Re: base backup client as auxiliary backend process

2020-01-14 Thread Peter Eisentraut

On 2020-01-14 07:32, Masahiko Sawada wrote:

- Replication slot name used by this WAL receiver
+ 
+  Replication slot name used by this WAL receiver.  This is only set if a
+  permanent replication slot is set using .  Otherwise, the WAL receiver may use
+  a temporary replication slot (determined by ), but these are not shown
+  here.
+ 

Now that the slot name is shown even if it's a temp slot the above
documentation changes needs to be changed. Other changes look good to
me.


committed, thanks

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [Proposal] Global temporary tables

2020-01-14 Thread Pavel Stehule
On Tue, Jan 14, 2020 at 14:09, 曾文旌(义从) wrote:

> Thank you for reviewing my patch.
>
>
> On Jan 12, 2020, at 4:27 AM, Pavel Stehule wrote:
>
> Hi
>
> On Sat, Jan 11, 2020 at 15:00, 曾文旌(义从) wrote:
>
>> Hi all
>>
>> This is the latest patch
>>
>> The updates are as follows:
>> 1. Support global temp Inherit table global temp partition table
>> 2. Support serial column in GTT
>> 3. Provide views pg_gtt_relstats pg_gtt_stats for GTT’s statistics
>> 4. Provide view pg_gtt_attached_pids to manage GTT
>> 5. Provide function pg_list_gtt_relfrozenxids() to manage GTT
>> 6. Alter GTT or rename GTT is allowed under some conditions
>>
>>
>> Please give me feedback.
>>
>
> I tested the functionality
>
> 1. i think so "ON COMMIT PRESERVE ROWS" should be default mode (like local
> temp tables).
>
> makes sense, I will fix it.
>
>
> I tested some simple scripts
>
> test01.sql
>
> CREATE TEMP TABLE foo(a int, b int);
> INSERT INTO foo SELECT random()*100, random()*1000 FROM
> generate_series(1,1000);
> ANALYZE foo;
> SELECT sum(a), sum(b) FROM foo;
> DROP TABLE foo; -- simulate disconnect
>
>
> after 100 sec, the table pg_attribute has 3.2MB
> and 64 tps, 6446 transaction
>
> test02.sql
>
> INSERT INTO foo SELECT random()*100, random()*1000 FROM
> generate_series(1,1000);
> ANALYZE foo;
> SELECT sum(a), sum(b) FROM foo;
> DELETE FROM foo; -- simulate disconnect
>
>
> after 100 sec, 1688 tps, 168830 transactions
>
> So performance is absolutely different as we expected.
>
> From my perspective, this functionality is great.
>
> Yes, frequent ddl causes catalog bloat, GTT avoids this problem.
>
>
> Todo:
>
> pg_table_size function doesn't work
>
> Do you mean that the function pg_table_size() needs to return the storage space
> used by one GTT across the entire database (including all sessions)?
>

It's a question how similar GTT tables should be to classic tables. But
the reporting in psql should work: \dt+, \l+, \di+



>
> Regards
>
> Pavel
>
>
>> Wenjing
>>
>>
>>
>>
>>
>> On Jan 6, 2020, at 4:06 AM, Tomas Vondra wrote:
>>
>> Hi,
>>
>> I think we need to do something with having two patches aiming to add
>> global temporary tables:
>>
>> [1] https://commitfest.postgresql.org/26/2349/
>>
>> [2] https://commitfest.postgresql.org/26/2233/
>>
>> As a reviewer I have no idea which of the threads to look at - certainly
>> not without reading both threads, which I doubt anyone will really do.
>> The reviews and discussions are somewhat intermixed between those two
>> threads, which makes it even more confusing.
>>
>> I think we should agree on a minimal patch combining the necessary/good
>> bits from the various patches, and terminate one of the threads (i.e.
>> mark it as rejected or RWF). And we need to do that now, otherwise
>> there's about 0% chance of getting this into v13.
>>
>> In general, I agree with the sentiment Rober expressed in [1] - the
>> patch needs to be as small as possible, not adding "nice to have"
>> features (like support for parallel queries - I very much doubt just
>> using shared instead of local buffers is enough to make it work.)
>>
>> regards
>>
>> --
>> Tomas Vondra  http://www.2ndQuadrant.com
>> 
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>>
>>
>


Re: [Proposal] Global temporary tables

2020-01-14 Thread 曾文旌(义从)
Thank you for reviewing my patch.


> On Jan 12, 2020, at 4:27 AM, Pavel Stehule wrote:
> 
> Hi
> 
> On Sat, Jan 11, 2020 at 15:00, 曾文旌(义从) wrote:
> Hi all
> 
> This is the latest patch
> 
> The updates are as follows:
> 1. Support global temp Inherit table global temp partition table
> 2. Support serial column in GTT
> 3. Provide views pg_gtt_relstats pg_gtt_stats for GTT’s statistics
> 4. Provide view pg_gtt_attached_pids to manage GTT
> 5. Provide function pg_list_gtt_relfrozenxids() to manage GTT
> 6. Alter GTT or rename GTT is allowed under some conditions
> 
> 
> Please give me feedback.
> 
> I tested the functionality
> 
> 1. i think so "ON COMMIT PRESERVE ROWS" should be default mode (like local 
> temp tables).
makes sense, I will fix it.

> 
> I tested some simple scripts 
> 
> test01.sql
> 
> CREATE TEMP TABLE foo(a int, b int);
> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
> generate_series(1,1000);
> ANALYZE foo;
> SELECT sum(a), sum(b) FROM foo;
> DROP TABLE foo; -- simulate disconnect
> 
> 
> after 100 sec, the table pg_attribute has 3.2MB
> and 64 tps, 6446 transaction
> 
> test02.sql
> 
> INSERT INTO foo SELECT random()*100, random()*1000 FROM 
> generate_series(1,1000);
> ANALYZE foo;
> SELECT sum(a), sum(b) FROM foo;
> DELETE FROM foo; -- simulate disconnect
> 
> 
> after 100 sec, 1688 tps, 168830 transactions
> 
> So performance is absolutely different as we expected.
> 
> From my perspective, this functionality is great.
Yes, frequent ddl causes catalog bloat, GTT avoids this problem.

> 
> Todo:
> 
> pg_table_size function doesn't work
Do you mean that the function pg_table_size() needs to return the storage space used by 
one GTT across the entire database (including all sessions)?

> 
> Regards
> 
> Pavel
> 
> 
> Wenjing
> 
> 
> 
> 
> 
>> On Jan 6, 2020, at 4:06 AM, Tomas Vondra wrote:
>> 
>> Hi,
>> 
>> I think we need to do something with having two patches aiming to add
>> global temporary tables:
>> 
>> [1] https://commitfest.postgresql.org/26/2349/ 
>> 
>> 
>> [2] https://commitfest.postgresql.org/26/2233/ 
>> 
>> 
>> As a reviewer I have no idea which of the threads to look at - certainly
>> not without reading both threads, which I doubt anyone will really do.
>> The reviews and discussions are somewhat intermixed between those two
>> threads, which makes it even more confusing.
>> 
>> I think we should agree on a minimal patch combining the necessary/good
>> bits from the various patches, and terminate one of the threads (i.e.
>> mark it as rejected or RWF). And we need to do that now, otherwise
>> there's about 0% chance of getting this into v13.
>> 
>> In general, I agree with the sentiment Rober expressed in [1] - the
>> patch needs to be as small as possible, not adding "nice to have"
>> features (like support for parallel queries - I very much doubt just
>> using shared instead of local buffers is enough to make it work.)
>> 
>> regards
>> 
>> -- 
>> Tomas Vondra  http://www.2ndQuadrant.com 
>> 
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 



Re: ALTER TABLE support for dropping generation expression

2020-01-14 Thread Peter Eisentraut

On 2020-01-13 10:56, Sergei Kornilov wrote:

The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Thank you!
Looks good to me. I have no further comments. I'll mark as ready for committer.

The new status of this patch is: Ready for Committer


committed, thanks

--
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] Block level parallel vacuum

2020-01-14 Thread Amit Kapila
On Tue, Jan 14, 2020 at 4:17 PM Mahendra Singh Thalor
 wrote:
>
> Hi,
>
> +/*
> + * Try to initialize the parallel vacuum if requested
> + */
> +if (params->nworkers >= 0 && vacrelstats->useindex)
> +{
> +/*
> + * Since parallel workers cannot access data in temporary tables, we
> + * can't perform parallel vacuum on them.
> + */
> +if (RelationUsesLocalBuffers(onerel))
> +{
> +/*
> + * Give warning only if the user explicitly tries to perform a
> + * parallel vacuum on the temporary table.
> + */
> +if (params->nworkers > 0)
> +ereport(WARNING,
> +(errmsg("disabling parallel option of vacuum
> on \"%s\" --- cannot vacuum temporary tables in parallel",
>
> From v45 patch, we moved warning of temporary table into
> "params->nworkers >= 0 && vacrelstats->useindex)" check so if table
> don't have any index, then we are not giving any warning. I think, we
> should give warning for all the temporary tables if parallel degree is
> given. (Till v44 patch, we were giving warning for all the temporary
> tables(having index and without index))
>

I am not sure how useful it is to give WARNING in this case as we are
anyway not going to perform a parallel vacuum because it doesn't have
an index?  One can also say that WARNING is expected in the cases
where we skip a parallel vacuum due to any reason (ex., if the size of
the index is small), but I don't think that will be a good idea.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] Block level parallel vacuum

2020-01-14 Thread Amit Kapila
On Tue, Jan 14, 2020 at 10:04 AM Masahiko Sawada
 wrote:
>
> On Mon, 13 Jan 2020 at 12:50, Amit Kapila  wrote:
> >
> > On Sat, Jan 11, 2020 at 7:48 PM Masahiko Sawada
> >  wrote:
> >
> > Okay, would it better if we get rid of this variable and have code like 
> > below?
> >
> > /* Skip the indexes that can be processed by parallel workers */
> > if ( !(get_indstats(lps->lvshared, i) == NULL ||
> > skip_parallel_vacuum_index(Irel[i], lps->lvshared)))
> > continue;
>
> Make sense to me.
>

I have changed the comment and condition to make it a positive test so
that it is more clear.

> > ...
> > > Agreed. But with the updated patch the PARALLEL option without the
> > > parallel degree doesn't display warning because params->nworkers = 0
> > > in that case. So how about restoring params->nworkers at the end of
> > > vacuum_rel()?
> > >
> >
> > I had also thought on those lines, but I was not entirely sure about
> > this resetting of workers.  Today, again thinking about it, it seems
> > the idea Mahendra is suggesting that is giving an error if the
> > parallel degree is not specified seems reasonable to me.  This means
> > Vacuum (parallel), Vacuum (parallel) , etc. will give an
> > error "parallel degree must be specified".  This idea has merit as now
> > we are supporting a parallel vacuum by default, so a 'parallel' option
> > without a parallel degree doesn't have any meaning.  If we do that,
> > then we don't need to do anything additional about the handling of
> > temp tables (other than what patch is already doing) as well.  What do
> > you think?
> >
>
> Good point! Agreed.
>

Thanks, changed accordingly.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


v47-0001-Introduce-IndexAM-fields-for-parallel-vacuum.patch
Description: Binary data


v47-0002-Allow-vacuum-command-to-process-indexes-in-parallel.patch
Description: Binary data


Re: Option to dump foreign data in pg_dump

2020-01-14 Thread Luis Carril
Can you have a look at dump with parallel option. Parallel option will
take a lock on table while invoking lockTableForWorker. May be this is
not required for foreign tables.
Thoughts?
I tried with -j and found no issue. I guess that the foreign table needs 
locking anyway to prevent anyone from modifying it while it is being dumped.

Cheers,

Luis M Carril

From: vignesh C 
Sent: Tuesday, January 14, 2020 1:48 AM
To: Luis Carril 
Cc: Alvaro Herrera ; Daniel Gustafsson 
; Laurenz Albe ; PostgreSQL Hackers 

Subject: Re: Option to dump foreign data in pg_dump

On Fri, Nov 29, 2019 at 2:10 PM Luis Carril  wrote:
>
> Luis,
>
> It seems you've got enough support for this concept, so let's move
> forward with this patch.  There are some comments from Tom about the
> patch; would you like to send an updated version perhaps?
>
> Thanks
>
> Hi,
>
>  I've attached a new version (v6) removing the superfluous JOIN that Tom 
> identified, and not collecting the oids (avoiding the query) if the option is 
> not used at all.
>
> About the testing issues that Tom mentioned:
> I do not see how can we have a pure SQL dummy FDW that tests the 
> functionality. Because the only way to identify if the data of a foreign 
> table for the chosen server is dumped is if the COPY statement appears in the 
> output, but if the C callbacks of the FDW are not implemented, then the 
> SELECT that dumps the data to generate the COPY cannot be executed.
> Also, to test that the include option chooses only the data of the  specified 
> foreign servers we would need some negative testing, i.e. that the COPY 
> statement for the non-desired table does not appear. But I do not find these 
> kind of tests in the test suite, even for other selective options like 
> --table or --exclude-schema.
>

Can you have a look at dump with parallel option. Parallel option will
take a lock on table while invoking lockTableForWorker. May be this is
not required for foreign tables.
Thoughts?

Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

2020-01-14 Thread Mahendra Singh Thalor
On Tue, 14 Jan 2020 at 16:17, Mahendra Singh Thalor 
wrote:
>
> On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
>  wrote:
> >
> > On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor 
wrote:
> > >
> > > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov  wrote:
> > > >
> > > > Hi
> > > > Thank you for update! I looked again
> > > >
> > > > (vacuum_indexes_leader)
> > > > +   /* Skip the indexes that can be processed by
parallel workers */
> > > > +   if (!skip_index)
> > > > +   continue;
> > > >
> > > > Does the variable name skip_index not confuse here? Maybe rename to
something like can_parallel?
> > > >
> > >
> > > Again I looked into code and thought that somehow if we can add a
> > > boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> > > identify that this index is supporting parallel vacuum or not, then it
> > > will be easy to skip those indexes and multiple time we will not call
> > > skip_parallel_vacuum_index (from vacuum_indexes_leader and
> > > parallel_vacuum_index)
> > > We can have a linked list of non-parallel supported indexes, then
> > > directly we can pass to vacuum_indexes_leader.
> > >
> > > Ex: let suppose we have 5 indexes into a table.  If before launching
> > > parallel workers, if we can add boolean flag(can_parallel)
> > > IndexBulkDeleteResult structure to identify that this index is
> > > supporting parallel vacuum or not.
> > > Let index 1, 4 are not supporting parallel vacuum so we already have
> > > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > > parallel_vacuum_index will process these indexes and rest will be
> > > processed by parallel workers. If parallel worker found that
> > > can_parallel is false, then it will skip that index.
> > >
> > > As per my understanding, if we implement this, then we can avoid
> > > multiple function calling of skip_parallel_vacuum_index and if there
> > > is no index which can't  performe parallel vacuum, then we will not
> > > call vacuum_indexes_leader as head of list pointing to null. (we can
> > > save unnecessary calling of vacuum_indexes_leader)
> > >
> > > Thoughts?
> > >
> >
> > We skip not only indexes that don't support parallel index vacuum but
> > also indexes supporting it depending on vacuum phase. That is, we
> > could skip different indexes at different vacuum phase. Therefore with
> > your idea, we would need to have at least three linked lists for each
> > possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> > that right?
> >
> > I think we can check if there are indexes that should be processed by
> > the leader process before entering the loop in vacuum_indexes_leader
> > by comparing nindexes_parallel_XXX of LVParallelState to the number of
> > indexes but I'm not sure it's effective since the number of indexes on
> > a table should be small.
> >
>
> Hi,
>
> +/*
> + * Try to initialize the parallel vacuum if requested
> + */
> +if (params->nworkers >= 0 && vacrelstats->useindex)
> +{
> +/*
> + * Since parallel workers cannot access data in temporary
tables, we
> + * can't perform parallel vacuum on them.
> + */
> +if (RelationUsesLocalBuffers(onerel))
> +{
> +/*
> + * Give warning only if the user explicitly tries to perform
a
> + * parallel vacuum on the temporary table.
> + */
> +if (params->nworkers > 0)
> +ereport(WARNING,
> +(errmsg("disabling parallel option of vacuum
> on \"%s\" --- cannot vacuum temporary tables in parallel",
>
> From v45 patch, we moved warning of temporary table into
> "params->nworkers >= 0 && vacrelstats->useindex)" check so if table
> don't have any index, then we are not giving any warning. I think, we
> should give warning for all the temporary tables if parallel degree is
> given. (Till v44 patch, we were giving warning for all the temporary
> tables(having index and without index))
>
> Thoughts?

Hi,
I did some more review.  Below is the 1 review comment for v46-0002.

+/*
+ * Initialize the state for parallel vacuum
+ */
+if (params->nworkers >= 0 && vacrelstats->useindex)
+{
+/*
+ * Since parallel workers cannot access data in temporary tables,
we
+ * can't perform parallel vacuum on them.
+ */
+if (RelationUsesLocalBuffers(onerel)

In the above check, we should add an "nindexes > 1" check so that if there is
only 1 index, then we will not call begin_parallel_vacuum.

For "Initialize the state for parallel vacuum", we can improve this comment by
mentioning what we are doing here (if the table has more than one index and a
parallel vacuum is requested, then try to start a parallel vacuum).

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com


Re: [HACKERS] Block level parallel vacuum

2020-01-14 Thread Mahendra Singh Thalor
On Tue, 14 Jan 2020 at 10:06, Masahiko Sawada
 wrote:
>
> On Tue, 14 Jan 2020 at 03:20, Mahendra Singh Thalor  
> wrote:
> >
> > On Fri, 10 Jan 2020 at 15:51, Sergei Kornilov  wrote:
> > >
> > > Hi
> > > Thank you for update! I looked again
> > >
> > > (vacuum_indexes_leader)
> > > +   /* Skip the indexes that can be processed by parallel 
> > > workers */
> > > +   if (!skip_index)
> > > +   continue;
> > >
> > > Does the variable name skip_index not confuse here? Maybe rename to 
> > > something like can_parallel?
> > >
> >
> > Again I looked into code and thought that somehow if we can add a
> > boolean flag(can_parallel)  in IndexBulkDeleteResult structure to
> > identify that this index is supporting parallel vacuum or not, then it
> > will be easy to skip those indexes and multiple time we will not call
> > skip_parallel_vacuum_index (from vacuum_indexes_leader and
> > parallel_vacuum_index)
> > We can have a linked list of non-parallel supported indexes, then
> > directly we can pass to vacuum_indexes_leader.
> >
> > Ex: let suppose we have 5 indexes into a table.  If before launching
> > parallel workers, if we can add boolean flag(can_parallel)
> > IndexBulkDeleteResult structure to identify that this index is
> > supporting parallel vacuum or not.
> > Let index 1, 4 are not supporting parallel vacuum so we already have
> > info in a linked list that 1->4 are not supporting parallel vacuum, so
> > parallel_vacuum_index will process these indexes and rest will be
> > processed by parallel workers. If parallel worker found that
> > can_parallel is false, then it will skip that index.
> >
> > As per my understanding, if we implement this, then we can avoid
> > multiple function calling of skip_parallel_vacuum_index and if there
> > is no index which can't  performe parallel vacuum, then we will not
> > call vacuum_indexes_leader as head of list pointing to null. (we can
> > save unnecessary calling of vacuum_indexes_leader)
> >
> > Thoughts?
> >
>
> We skip not only indexes that don't support parallel index vacuum but
> also indexes supporting it depending on vacuum phase. That is, we
> could skip different indexes at different vacuum phase. Therefore with
> your idea, we would need to have at least three linked lists for each
> possible vacuum phase(bulkdelete, conditional cleanup and cleanup), is
> that right?
>
> I think we can check if there are indexes that should be processed by
> the leader process before entering the loop in vacuum_indexes_leader
> by comparing nindexes_parallel_XXX of LVParallelState to the number of
> indexes but I'm not sure it's effective since the number of indexes on
> a table should be small.
>

Hi,

+/*
+ * Try to initialize the parallel vacuum if requested
+ */
+if (params->nworkers >= 0 && vacrelstats->useindex)
+{
+/*
+ * Since parallel workers cannot access data in temporary tables, we
+ * can't perform parallel vacuum on them.
+ */
+if (RelationUsesLocalBuffers(onerel))
+{
+/*
+ * Give warning only if the user explicitly tries to perform a
+ * parallel vacuum on the temporary table.
+ */
+if (params->nworkers > 0)
+ereport(WARNING,
+(errmsg("disabling parallel option of vacuum
on \"%s\" --- cannot vacuum temporary tables in parallel",

From the v45 patch onwards, we moved the temporary-table warning inside the
"params->nworkers >= 0 && vacrelstats->useindex" check, so if the table
doesn't have any index, we no longer give any warning. I think we should give
the warning for all temporary tables if a parallel degree is given. (Up to the
v44 patch, we were giving the warning for all temporary tables, with or
without indexes.)

Thoughts?
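
For illustration, the difference I mean, assuming the patch is applied (the
table names below are made up):

create temp table tmp_no_idx (a int);
vacuum (parallel 2) tmp_no_idx;    -- v45: no warning, because the table has no index

create temp table tmp_with_idx (a int);
create index on tmp_with_idx (a);
vacuum (parallel 2) tmp_with_idx;  -- warning: cannot vacuum temporary tables in parallel

Till the v44 patch, both statements produced the warning.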

-- 
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com



