Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-30 Thread Bharath Rupireddy
On Fri, Oct 30, 2020 at 6:13 PM vignesh C  wrote:
>
> I have added the log validation to the existing tests that are present
> for authentication.
>

I took a look at v3 patch. Here are some comments.

1. Why are the input strings (not the newly added GSS log message
string) passed to the test_access() function double-quoted in some
places and single-quoted in others?

'succeeds with mapping with default gssencmode and host hba',
'connection authorized: user=test1 database=postgres
application_name=001_auth.pl GSS \(authenticated=yes, encrypted=yes,
principal=test1\@EXAMPLE.COM\)'
);
"succeeds with GSS-encrypted access required with host hba",
'connection authorized: user=test1 database=postgres
application_name=001_auth.pl GSS \(authenticated=yes, encrypted=yes,
principal=test1\@EXAMPLE.COM\)'
);

And also for

test_access(
$node,
'test1',<<< single quotes

test_access(
$node,
"test1",   <<< double quotes

It looks like we use double-quoted strings in Perl when the string
contains variables to be interpolated by the interpreter; otherwise
single-quoted strings are fine [1]. If this is true, can we make it
uniform at least across this file?
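For illustration (a hedged aside with made-up variable names), the rule
being referenced is simply Perl's interpolation behavior:

    my $user = 'test1';
    print 'user=$user', "\n";   # single quotes: prints the literal text user=$user
    print "user=$user", "\n";   # double quotes: interpolates, prints user=test1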

2. Instead of using hardcoded values for application_name and
principal, can we use variables? For application_name we can simply
reference a single variable. The principal name is a value that is
already constructed elsewhere in the test; can we reuse that variable?

 application_name=001_auth.pl GSS \(authenticated=yes, encrypted=yes,
principal=test1\@EXAMPLE.COM\)'

3. Why are we using an escape character before ( and @? IIUC, it is to
keep the interpreter from substituting a value there. If that is the
reason, it doesn't seem necessary here, since we are using single-quoted
strings; the Perl interpreter interpolates variables only in
double-quoted strings [1].
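One possible reason for the backslashes (an assumption on my part, since
it depends on how test_access() uses the string): the expected-log text
is matched as a regular expression, in which case \( and \@ make the
parenthesis and the at-sign match literally instead of acting as
metacharacters. In a single-quoted string the backslashes survive into
the pattern, e.g.:

    my $pattern = 'GSS \(authenticated=yes\) principal=test1\@EXAMPLE.COM';
    my $line    = 'GSS (authenticated=yes) principal=test1@EXAMPLE.COM';
    print "matched\n" if $line =~ /$pattern/;   # \( matches a literal "("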

+'connection authorized: user=test1 database=postgres
application_name=001_auth.pl GSS \(authenticated=yes, encrypted=yes,
principal=test1\@EXAMPLE.COM\)'
+);

I ran the Kerberos tests on my dev machine. make check of 001_auth.pl
is passing.

[1] - 
https://www.geeksforgeeks.org/perl-quoted-interpolated-and-escaped-strings/

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-30 Thread Bharath Rupireddy
On Sat, Oct 31, 2020 at 9:04 AM Bharath Rupireddy
 wrote:
>
> > +$node->append_conf('postgresql.conf', "logging_collector= 'on'");
> > +$node->append_conf('postgresql.conf', "log_connections= 'on'");
> >
> > booleans don't need quotes.
> >
>
> I think that's not correct. If I'm right, the snippet pointed out above
> is from a Perl script. In C, strings are null-terminated and written
> within double quotes, so we need to use double quotes for
> _("on") : _("off"). Also, the definition of the _( ) macro points to a
> function err_gettext() that expects a C-style, i.e. null-terminated,
> string.
>

I'm sorry about the above point; I misunderstood it. I took a further
look at the patch. It seems like it's a mix: in some places we are not
using quotes for booleans, for instance,

$node->append_conf('postgresql.conf', 'autovacuum=off');
$node->append_conf('postgresql.conf', 'track_commit_timestamp = on');

but in one place we are using quotes

$node->append_conf('postgresql.conf', "ssl = 'on'");

Either way seems to be fine as we don't have any variables inside the
strings to be replaced by the perl interpreter.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-30 Thread Bharath Rupireddy
On Fri, Oct 30, 2020 at 6:35 PM Euler Taveira
 wrote:
>
> + appendStringInfo(, "replication ");
> +
> + appendStringInfo(, "connection authorized: user=%s",
> + port->user_name);
> + if (!am_walsender)
> + appendStringInfo(, " database=%s", port->database_name);
> +
> + if (port->application_name != NULL)
> + appendStringInfo(, " application_name=%s",
> + port->application_name);
> +
>
> Your approach breaks localization. You should use multiple errmsg.
>

IIUC, isn't it enough to call a single errmsg() inside ereport() with
the prepared logmsg.data (which is a string)? The errmsg() function will
do the required translation of logmsg.data. Why do we need multiple
errmsg() calls? Could you please elaborate a bit on how the way it is
currently done in the patch breaks localization?

+ereport(LOG, errmsg("%s", logmsg.data));
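(For reference, the localization concern presumably refers to gettext:
the literal format strings passed to errmsg() are what get collected
into the message catalog, so errmsg("%s", logmsg.data) only exposes "%s"
to translators and the assembled text cannot be translated. A
translatable form needs constant format strings, roughly - a sketch
only, NULL checks omitted:

    if (am_walsender)
        ereport(LOG,
                errmsg("replication connection authorized: user=%s",
                       port->user_name));
    else
        ereport(LOG,
                errmsg("connection authorized: user=%s database=%s",
                       port->user_name, port->database_name));
)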

>
> +$node->append_conf('postgresql.conf', "logging_collector= 'on'");
> +$node->append_conf('postgresql.conf', "log_connections= 'on'");
>
> booleans don't need quotes.
>

I think that's not correct. If I'm right, the snippet pointed out above
is from a Perl script. In C, strings are null-terminated and written
within double quotes, so we need to use double quotes for
_("on") : _("off"). Also, the definition of the _( ) macro points to a
function err_gettext() that expects a C-style, i.e. null-terminated,
string.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com




Re: Add important info about ANALYZE after create Functional Index

2020-10-30 Thread Justin Pryzby
On Fri, Oct 30, 2020 at 03:22:52PM +0900, Michael Paquier wrote:
> On Thu, Oct 29, 2020 at 10:59:52AM +0900, Michael Paquier wrote:
> > REINDEX CONCURRENTLY is by design wanted to provide an experience
> > transparent to the user similar to what a plain REINDEX would do, at
> > least that's the idea behind it, so..  This qualifies as a bug to me,
> > in spirit.
> 
> And in spirit, it is possible to address this issue with the patch
> attached which copies the set of stats from the old to the new index.
> For a non-concurrent REINDEX, this does not happen because we keep the
> same base relation, while we completely replace the relation with a
> concurrent operation.  We have a RemoveStatistics() in heap.c, but I
> did not really see the point to invent a copy flavor for this
> particular case.  Perhaps others have an opinion on that?

+1

The implementation of REINDEX CONCURRENTLY is "CREATE INDEX CONCURRENTLY
followed by internal index swap".  But the command is called "reindex", and so
the user experience is that the statistics are inexplicably lost.

(I'm quoting from the commit message of the patch I wrote, which is the
same as your patch.)

-- 
Justin




Re: bulk typos

2020-10-30 Thread Justin Pryzby
On Sun, Oct 25, 2020 at 02:48:49PM -0500, Justin Pryzby wrote:
> On Sat, Mar 31, 2018 at 05:56:40AM -0500, Justin Pryzby wrote:
> > I needed another distraction so bulk-checked for typos, limited to comments 
> > in
> > *.[ch].
> > 
> > I'm not passionate about this, but it serves the purpose of reducing the
> > overhead of fixing them individually.
> 
> I happened across this old patch, so ran this again to find new typos.
> 
> There's a few that I don't know how best to fix.

I've added a few more, but still not sure about these two.

> Pavel, I can't understand this one.
> I guess s/producement/producing/ is too simple of a fix.
> Please check my proposed language.
> +XmlTableFetchRow(TableFuncScanState *state)
> +* XmlTable returns table - set of composite values. The error 
> context, is
> +* used for producement more values, between two calls, there can be
> +* created and used another libxml2 error context. ...
> 
> Surafel, this typo existed twice in the original commit (357889eb1).
> One instance was removed by Tom in 35cb574aa.  Should we simply fix the typo,
> or borrow Julien/Tom's language?
> +   * Tuple at limit is needed for comparation in subsequent
> +   * execution to detect ties.
From 5b5fec23af33b25f261a875dcd26c60564df0d89 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 25 Oct 2020 18:28:36 -0500
Subject: [PATCH] pgindent typos

Note that there's two changes to one line in lexi.c
---
 io.c   | 4 ++--
 lexi.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/io.c b/io.c
index fbaa5dd..a4812b6 100644
--- a/io.c
+++ b/io.c
@@ -452,8 +452,8 @@ fill_buffer(void)
  *
  * ALGORITHM: Put tabs and/or blanks into pobuf, then write pobuf.
  *
- * PARAMETERS: current		integer		The current column target
- * nteger		The desired column
+ * PARAMETERS: current		integer		The current column
+ * target   integer		The desired column
  *
  * RETURNS: Integer value of the new column.  (If current >= target, no action is
  * taken, and current is returned.
diff --git a/lexi.c b/lexi.c
index d43723c..f01596a 100644
--- a/lexi.c
+++ b/lexi.c
@@ -442,7 +442,7 @@ lexi(struct parser_state *state)
 	 * then following sign is unary */
 	state->last_u_d = true;	/* will make "int a -1" work */
 	return (ident);		/* the ident is not in the list */
-}/* end of procesing for alpanum character */
+}/* end of processing for alphanum character */
 
 /* Scan a non-alphanumeric token */
 
-- 
2.17.0

From c0f112b0c9a1c4007a8562269b36f1fa21ee5a07 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 25 Oct 2020 13:56:19 -0500
Subject: [PATCH v2 1/6] bulk typos

Using same strategy as here
https://www.postgresql.org/message-id/20180331105640.GK28454%40telsasoft.com
---
 contrib/amcheck/verify_heapam.c  | 2 +-
 src/backend/access/heap/pruneheap.c  | 2 +-
 src/backend/catalog/namespace.c  | 2 +-
 src/backend/catalog/pg_namespace.c   | 2 +-
 src/backend/catalog/storage.c| 2 +-
 src/backend/executor/execExpr.c  | 2 +-
 src/backend/executor/nodeIncrementalSort.c   | 2 +-
 src/backend/optimizer/path/allpaths.c| 2 +-
 src/backend/optimizer/plan/analyzejoins.c| 2 +-
 src/backend/partitioning/partbounds.c| 2 +-
 src/backend/postmaster/interrupt.c   | 2 +-
 src/backend/statistics/dependencies.c| 2 +-
 src/backend/statistics/extended_stats.c  | 2 +-
 src/backend/storage/buffer/bufmgr.c  | 2 +-
 src/backend/utils/adt/varlena.c  | 2 +-
 src/bin/pgbench/pgbench.c| 2 +-
 src/common/pg_lzcompress.c   | 2 +-
 src/interfaces/libpq/fe-connect.c| 2 +-
 src/test/modules/dummy_index_am/dummy_index_am.c | 2 +-
 19 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 8bb890438a..dbe3134b6b 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -1342,7 +1342,7 @@ fxid_in_cached_range(FullTransactionId fxid, const HeapCheckContext *ctx)
 }
 
 /*
- * Checks wheter a multitransaction ID is in the cached valid range, returning
+ * Checks whether a multitransaction ID is in the cached valid range, returning
  * the nature of the range violation, if any.
  */
 static XidBoundsViolation
diff --git a/src/backend/access/heap/pruneheap.c b/src/backend/access/heap/pruneheap.c
index bc510e2e9b..9e04bc712c 100644
--- a/src/backend/access/heap/pruneheap.c
+++ b/src/backend/access/heap/pruneheap.c
@@ -385,7 +385,7 @@ heap_page_prune(Relation relation, Buffer buffer,
 
 
 /*
- * Perform visiblity checks for heap pruning.
+ * Perform visibility checks for heap pruning.
  *
  * This is more complicated than just using GlobalVisTestIsRemovableXid()
  * because of old_snapshot_threshold. We only want to 

Re: Stats collector's idx_blks_hit value is highly misleading in practice

2020-10-30 Thread Tomas Vondra

On Fri, Oct 16, 2020 at 03:35:51PM -0700, Peter Geoghegan wrote:

It occurs to me that statistics collector stats such as
pg_statio_*_tables.idx_blks_hit are highly misleading in practice
because they fail to take account of the difference between internal
pages and leaf pages in B-Tree indexes. These two types of pages are
in fundamentally different categories, and I think that failing to
recognize that at the level of these system views makes them much less
useful. Somebody should probably write a patch that makes this
difference clear from the system views. Possibly by using some
generalized notion of "record" pages instead of leaf pages, and
"metadata" pages instead of internal pages. That would even work with
hash indexes, I think.

Consider the following example, which is based on a standard nbtree
index, but could work in almost the same way with other index access
methods:

We have a pgbench_accounts pkey after initialization by pgbench at
scale 1500. It has 409,837 leaf pages and 1,451 internal pages,
meaning that about one third of one percent of all pages in the index
are internal pages. Occasionally, with indexes on large text strings
we might notice that as many as 1% of all index pages are internal
pages, but that's very much on the high side. Generally speaking,
we're virtually guaranteed to have *all* internal pages in
shared_buffers once a steady state has been reached. Once the cache
warms up, point lookups (like the queries pgbench performs) will only
have to access one leaf page at most, which amounts to only one I/O at
most. (This asymmetry is the main reason why B-Trees are generally
very effective when buffered in a buffer cache.)

If we run the pgbench queries against this database/example index
we'll find that we have to access 4 index pages per query execution --
the root, two additional internal pages, plus a leaf page. Based on
the reasonable assumptions I'm making, 3 out of 4 of those pages will
be hits when steady state is reached with pgbench's SELECT-only
workload, regardless of how large shared_buffers is or how bloated the
index is (we only need 1451 buffers for that, and those are bound to
get hot quickly).

The overall effect is idx_blks_hit changes over time in a way that
makes no sense -- even to an expert. Let's say we start with this
entire 3213 MB pgbench index in shared_buffers. We should only get
increments in idx_blks_hit, never increments in idx_blks_read - that
much makes sense. If we then iteratively shrink shared_buffers (or
equivalently, make the index grow without adding a new level), the
proportion of page accesses that increment idx_blks_read (rather than
incrementing idx_blks_hit) goes up roughly linearly as misses increase
linearly - which also makes sense. But here is the silly part: we
cannot really have a hit rate of less than 75% if you compare
idx_blks_hit to idx_blks_read, unless and until we can barely even fit
1% of the index in memory (at which point it's hard to distinguish
from noise). So if we naively consume the current view we'll see a hit
rate that starts at 100%, and very slowly shrinks to 75%, which is
where we bottom out (more or less, roughly speaking). This behavior
seems pretty hard to defend to me.



Yeah. The behavior is technically correct, but it's not very useful for
practical purposes. And most people don't even realize it behaves like
this :-( It's possible to compensate for this effect and estimate the
actually "interesting" hit rate, but if we could have it directly that
would be great.
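For reference, the hit rate people typically compute from the current
view looks like the sketch below; the point of the thread is that for
point lookups this number can hardly drop below ~75%:

    SELECT relname,
           idx_blks_hit::float8 / NULLIF(idx_blks_hit + idx_blks_read, 0)
               AS idx_hit_rate
    FROM pg_statio_user_tables
    ORDER BY idx_blks_hit + idx_blks_read DESC;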


If somebody fixed this by putting internal pages into their own bucket
in the system view, then motivated users would quickly learn that
internal page stats aren't really useful -- they are only included for
completeness. They're such a small contributor to the overall hit rate
that they can simply be ignored completely. The thing that users ought
to focus on is leaf page hit rate. Now index hit rate (by which I mean
leaf page hit rate) actually makes sense. Note that Heroku promoted
simple heuristics like this for many years.

I suppose that a change like this could end up affecting other things,
such as EXPLAIN ANALYZE statistics. OTOH we only break out index pages
separately for bitmap scans at the moment, so maybe it could be fairly
well targeted. And, maybe this is unappealing given the current
statistics collector limitations. I'm not volunteering to work on it
right now, but it would be nice to fix this. Please don't wait for me
to do it.



It seems to me this should not be a particularly difficult patch in
principle, so suitable for new contributors. The main challenge would be
passing information about what page we're dealing with (internal/leaf)
to the place actually calling pgstat_count_buffer_(read|hit). That
happens in ReadBufferExtended, which just has no idea what page it's
dealing with. Not sure how to do that cleanly ...


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 

Re: [PATCH] Add extra statistics to explain for Nested Loop

2020-10-30 Thread Tomas Vondra

Hello Ekaterina,

seems like an interesting and useful improvement. I did a quick review
of the patch - attached is a 0002 patch with a couple minor changes (the
0001 is just your v1 patch, to keep cfbot happy).

1) There's a couple changes to follow project code style (e.g. brackets
after "if" on a separate line, no brackets around single-line blocks,
etc.). I've reverted some unnecessary whitespace changes. Minor stuff.

2) I don't think InstrEndLoop needs to check if min_t == 0 before
initializing it in the first loop. It certainly has to be 0, no? Same
for min_tuples. I also suggest adding a comment explaining that we don't
have to initialize the max values.

3) In ExplainNode, in the part processing per-worker stats, I think some
of the fields are incorrectly referencing planstate->instrument instead
of using the 'instrument' variable from WorkerInstrumentation.


In general, I agree with Andres this might add overhead to explain
analyze, although I doubt it's going to be measurable. But maybe try
doing some measurements for common and worst-cases.

I wonder if we should have another EXPLAIN option enabling this.
I.e. by default we'd not collect/print this, and users would have to
pass some option to EXPLAIN. Or maybe we could tie this to VERBOSE?

Also, why print this only for nested loop, and not for all nodes with
(nloops > 1)? I see there was some discussion why checking nodeTag is
necessary to identify NL, but that's not my point.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
From 903c203e7e5f298f927ade97ca03a0e129c31e75 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Sat, 31 Oct 2020 01:47:46 +0100
Subject: [PATCH 1/2] extra_statistics_v1

---
 src/backend/commands/explain.c| 131 ++
 src/backend/executor/instrument.c |  32 -
 src/include/executor/instrument.h |   4 +
 src/test/regress/expected/partition_prune.out |  36 ++---
 src/test/regress/expected/select_parallel.out |  12 +-
 src/test/regress/sql/partition_prune.sql  |   2 +
 6 files changed, 163 insertions(+), 54 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 43f9b01e83..72dae57ee2 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -1568,29 +1568,64 @@ ExplainNode(PlanState *planstate, List *ancestors,
double  startup_ms = 1000.0 * 
planstate->instrument->startup / nloops;
double  total_ms = 1000.0 * 
planstate->instrument->total / nloops;
double  rows = planstate->instrument->ntuples / nloops;
+   double  min_r = planstate->instrument->min_tuples;
+   double  max_r = planstate->instrument->max_tuples;
+   double  min_t_ms = 1000.0 * 
planstate->instrument->min_t;
+   double  max_t_ms = 1000.0 * 
planstate->instrument->max_t;
 
if (es->format == EXPLAIN_FORMAT_TEXT)
{
-   if (es->timing)
-   appendStringInfo(es->str,
-" (actual 
time=%.3f..%.3f rows=%.0f loops=%.0f)",
-startup_ms, 
total_ms, rows, nloops);
+   if (nodeTag(plan) == T_NestLoop) {
+   if (es->timing)
+   appendStringInfo(es->str,
+" 
(actual time=%.3f..%.3f min_time=%.3f max_time=%.3f min_rows=%.0f rows=%.0f 
max_rows=%.0f loops=%.0f)",
+
startup_ms, total_ms, min_t_ms, max_t_ms, min_r, rows, max_r, nloops);
+   else
+   appendStringInfo(es->str,
+" 
(actual min_rows=%.0f rows=%.0f max_rows=%.0f loops=%.0f)",
+min_r, 
rows, max_r, nloops);
+   }
else
-   appendStringInfo(es->str,
-" (actual 
rows=%.0f loops=%.0f)",
-rows, nloops);
+   {
+   if (es->timing)
+   appendStringInfo(es->str,
+" 
(actual time=%.3f..%.3f rows=%.0f loops=%.0f)",
+
startup_ms, total_ms, rows, nloops);
+   else
+  

Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?

2020-10-30 Thread Tom Lane
I wrote:
> * It's not apparent why, if ANALYZE's sample is all nulls, we wouldn't
> conclude stadistinct = 0 and thus arrive at the desired answer that
> way.  (Since we have a complaint, I'm guessing that ANALYZE might
> disbelieve its own result and stick in some larger stadistinct.  But
> then maybe that's where to fix this, not here.)

Oh, on second thought (and with some testing): ANALYZE *does* report
stadistinct = 0.  The real issue is that get_variable_numdistinct is
assuming it can use that value as meaning "stadistinct is unknown".
So maybe we should just fix that, probably by adding an explicit
bool flag for that condition.

BTW ... I've not looked at the callers, but now I'm wondering whether
get_variable_numdistinct ought to count NULL as one of the "distinct"
values.  In applications such as estimating the number of GROUP BY
groups, it seems like that would be correct.  There might be some
callers that don't want it though.

regards, tom lane




Re: Consistent error reporting for encryption/decryption in pgcrypto

2020-10-30 Thread Michael Paquier
On Fri, Oct 30, 2020 at 11:23:27PM +0100, Daniel Gustafsson wrote:
> > On 30 Oct 2020, at 16:54, Georgios Kokolatos  
> > wrote:
> 
> > I did notice that the cfbot [1] is not failing for this patch.
> 
> I assume you mean s/failing/passing/?  I noticed the red Travis and Appveyor
> runs, will fix over the weekend.  Thanks for the heads-up.

It seems to me that you are just missing the declaration of a new error
number in px.h, so I would suggest just using -19.
--
Michael




Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?

2020-10-30 Thread Tom Lane
Tomas Vondra  writes:
> So I'm not sure I understand what would be the risk with this ... Tom,
> can you elaborate why you dislike the patch?

I've got a couple issues with the patch as presented.

* As you said, it creates discontinuous behavior for stanullfrac = 1.0
versus stanullfrac = 1.0 - epsilon.  That doesn't seem good.

* It's not apparent why, if ANALYZE's sample is all nulls, we wouldn't
conclude stadistinct = 0 and thus arrive at the desired answer that
way.  (Since we have a complaint, I'm guessing that ANALYZE might
disbelieve its own result and stick in some larger stadistinct.  But
then maybe that's where to fix this, not here.)

* We generally disbelieve edge-case estimates to begin with.  The
most obvious example is that we don't accept rowcount estimates that
are zero.  There are also some clamps that disbelieve selectivities
approaching 0.0 or 1.0 when estimating from a histogram, and I think
we have a couple other similar rules.  The reason for this is mainly
that taking such estimates at face value creates too much risk of
severe relative error due to imprecise or out-of-date statistics.
So a special case for stanullfrac = 1.0 seems to go directly against
that mindset.

I agree that there might be some gold to be mined in this area,
as we haven't thought particularly hard about high-stanullfrac
situations.  One idea is to figure what stanullfrac says about the
number of non-null rows, and clamp the get_variable_numdistinct
result to be not more than that.  But I still would not want to
trust an exact zero result.
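A sketch of that clamp (illustrative only, with made-up variable names,
not actual selfuncs.c code):

    /* rows we believe to be non-null, per stanullfrac */
    double      nonnull_rows = (1.0 - stanullfrac) * ntuples;

    /* don't trust an ndistinct estimate above the non-null row count */
    if (ndistinct > nonnull_rows)
        ndistinct = nonnull_rows;

    /* ... but still refuse to believe an exact zero */
    if (ndistinct < 1.0)
        ndistinct = 1.0;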

regards, tom lane




Re: A couple questions about ordered-set aggregates

2020-10-30 Thread Tomas Vondra

On Sun, Oct 25, 2020 at 09:32:22PM -0400, Chapman Flack wrote:

I find I am allowed to create an ordered-set aggregate with a non-empty
direct argument list and no finisher function. Am I right in thinking
that's kind of nonsensical, as nothing will ever look at the direct args?

Also, the syntax summary shows PARALLEL = { SAFE | RESTRICTED | UNSAFE }
in the ordered-set syntax variant, since 9.6, though that variant
accepts no combine/serial/deserial functions, and there's also
a note saying "Partial (including parallel) aggregation is currently
not supported for ordered-set aggregates."

Does PARALLEL = { SAFE | RESTRICTED | UNSAFE } on an ordered-set
aggregate affect anything?



I may be missing something, but I believe PARALLEL = SAFE simply means
it can be executed in the parallel part of the plan. That does not
require support for partial aggregation - we simply don't support
passing partial results to the leader, hence combine/serial/deserial
functions are not needed.

Not sure about the direct arguments.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Should the function get_variable_numdistinct consider the case when stanullfrac is 1.0?

2020-10-30 Thread Tomas Vondra

On Mon, Oct 26, 2020 at 03:01:41PM +, Zhenghua Lyu wrote:

Hi,

When we group by multiple columns, the planner multiplies all the
distinct-value estimates together, and a column that is entirely NULL
still contributes 200 to the final estimate; if the product exceeds the
relation size, it is clamped.

So the estimated size of the agg rel is not correct, which affects the
cost estimates of the upper paths and does not lead to a good plan.

I debugged some other queries and found this issue. I'm not sure whether
it is the root cause of my problem; I'm just opening a thread here for
discussion.


I think we understand what the issue is, in principle - if the column is
all-null, the ndistinct estimate 200 is bogus and when multiplied with
estimates for other Vars it may lead to over-estimates. That's a valid
issue, of course.

The question is whether the proposed patch is a good way to handle it.

I'm not sure what exactly Tom's concerns are, but I was worried that
relying on (stanullfrac == 1.0) might result in abrupt changes in
estimates when the underlying difference is tiny. For example, if a
column is "almost NULL" we may end up with either 1.0 or (1.0 - epsilon),
and the question is what estimates we end up with ...

Imagine a column that is 'almost NULL' - it's 99.99% NULL with a couple
non-NULL values. When the ANALYZE samples just NULLs, we'll end up with

  n_distinct = 0.0
  stanullfrac = 1.0

and we'll end up estimating either 200 (current estimate) or 1.0 (with
this patch). Now, what if stanullfrac is not 1.0 but a little bit less?
Say only 1 of the 30k rows is non-NULL? Well, in that case we'll not
even get to this condition, because we'll have

  n_distinct = -3.3318996e-05
  stanullfrac = 0.9999667

which means get_variable_numdistinct will return from either

if (stadistinct > 0.0)
return ...

or

if (stadistinct < 0.0)
return ...

and we'll never even get to that new condition. And by definition, the
estimate has to be very low, because otherwise we'd need more non-NULL
distinct rows in the sample, which makes it less likely to ever see
stanullfrac being 1.0. And even if we could get a bigger difference
(say, 50 vs. 1.0), I don't think that's very different from the
current situation with 200 as a default.

Of course, using 1.0 in these cases may make us more vulnerable to
under-estimates for large tables. But for that to happen we must not
sample any of the non-NULL values, and if there are many distinct values
that's probably even less likely than sampling just one (when we end up
with an estimate of 1.0 already).

So I'm not sure I understand what would be the risk with this ... Tom,
can you elaborate why you dislike the patch?


BTW we already have a way to improve the estimate - setting n_distinct
for the column to 1.0 using ALTER TABLE should do the trick, I think.
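For reference (with hypothetical table/column names), that override
would look like:

    ALTER TABLE some_table ALTER COLUMN all_null_col SET (n_distinct = 1);
    ANALYZE some_table;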


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: enable_incremental_sort changes query behavior

2020-10-30 Thread James Coleman
On Fri, Oct 30, 2020 at 5:03 PM Tomas Vondra
 wrote:
>
> On Fri, Oct 30, 2020 at 01:26:08PM -0400, James Coleman wrote:
> >On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
> > wrote:
> >>
> >> On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
> >> >On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
> >> > wrote:
> >> >> Can you please create an entry in the commitfest for this one so we
> >> >> don't lose track of it?
> >> >
> >>
> >> We're not too far from the next minor release, so I've been looking at
> >> this fix again and I intend to get it committed shortly (on Monday or
> >> so). Attached is a slightly modified version of the v4 patch that:
> >>
> >> (a) Removes comments about projections added to code that is not
> >> directly related to the fix. I'm not against adding such comments
> >> separately.
> >
> >Seems fine. Do you want to commit them at the same time (just a
> >separate commit)? Or have a separate patch? It seems a bit overkill to
> >start a new thread just for those.
> >
>
> Probably sometime later, as a separate patch. I haven't thought very
> much about those comments, it just seemed unrelated to the fix so I've
> removed that for now. Let's not bother with this until after the minor
> release.

For whatever it's worth, I didn't originally consider them unrelated;
the purpose was to explain why those places were safe relative to the
same kinds of issues under discussion here: whether an expr will be in
the target and when it might need to be added.

Either way, I've split it into a separate patch in this series.

> >> (b) Fixes comment in expected output of incremental_sort test.
> >
> >Thanks.
> >
> >> (c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
> >> not quite needed thanks to the "return" in the "if" branch. IMO this
> >> makes it more elegant.
> >
> >No objection.
> >
> >> I do have two questions about find_em_expr_usable_for_sorting_rel.
> >>
> >> (a) Where does the em_is_const check come from? The comment claims we
> >> should not be trying to sort by equivalence class members, but I can't
> >> convince myself it's actually true and I don't see it discussed in this
> >> thread.
> >
> >That comes from find_ec_member_for_tle which is called by
> >prepare_sort_from_pathkeys which we note in the comments contains the
> >set of rules we need to mirror.
> >
>
> Thanks for the pointer. I'll take a look.
>
> >> (b) In find_em_expr_for_rel, which was what was used before, the
> >> condition on relids was this:
> >>
> >>  if (bms_is_subset(em->em_relids, rel->relids) &&
> >>  !bms_is_empty(em->em_relids))
> >>  {
> >>  return em->em_expr;
> >>  }
> >>
> >> but here we're using only
> >>
> >>  if (!bms_is_subset(em->em_relids, rel->relids))
> >>  continue;
> >>
> >> Isn't this missing the second bms_is_empty condition?
> >
> >I definitely remember intentionally removing that condition. For one,
> >it's not present in prepare_sort_from_pathkeys. My memory is a bit
> >fuzzy on all of the whys, but isn't it the case that if we've already
> >excluded const expressions, that we must have vars (and therefore
> >rels) present, and therefore the relids bms must be non-empty?
> >Probably more importantly, we're going to check that an actual em expr
> >matches, which means the bms subset check is really just an
> >optimization to avoid unnecessarily scanning the exprs for equality.
> >Perhaps it's worth adding a comment saying as such?
> >
> >By the way, the fact that this is parallel to
> >prepare_sort_from_pathkeys is the reason the XXX comment is still
> >there about possibly needing to remove relabel types (since
> >prepare_sort_from_pathkeys does that). I don't think it's a hard
> >requirement: the worst case is that by not digging into relabel types
> >we're being unnecessarily strict.
> >
> >Both of these I can add if desired.
> >
>
> Yeah, it'd be good to explain the reasoning why it's fine to have the
> conditions like this. Thanks.

I was looking at this some more, and I'm still convinced that this is
correct, but I don't think we need a comment about it being an
optimization (though I suspect that is its major benefit), since it came
from, and must parallel, find_ec_member_for_tle, and there is no such
explanation there. I don't want to backfill reasoning onto that
decision, but I do think it's important to parallel it since it's
ultimately what is going to cause errors if we're out of sync with it.

As to why find_em_expr_for_rel is different, I think the answer is
that it's used by the FDW code, and it seems to me to make sense that
in that case we'd only send sort conditions to the foreign server if
the em actually contains rels.

The new series attached includes the primary fix for this issue, the
additional comments that could either go in at the same time or not,
and my patch with semi-related questions (which isn't intended to be
committed, but to keep track of the questions).

James

Re: making update/delete of inheritance trees scale better

2020-10-30 Thread Heikki Linnakangas

On 31/10/2020 00:12, Tom Lane wrote:

Heikki Linnakangas  writes:

 But if you do:



postgres=# explain verbose update tab set a = 1, b = 2;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
   ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=14)
         Output: 1, 2, ctid



The Modify Table will still fetch the old tuple, but in this case, it's
not really necessary, because both columns are overwritten.


Ah, that I believe.  Not sure it's a common enough case to spend cycles
looking for, though.

In any case, we still have to access the old tuple, don't we?
To lock it and update its t_ctid, whether or not we have use for
its user columns.  Maybe there's some gain from not having to
deconstruct the tuple, but it doesn't seem like it'd be much.


Yeah, you need to access the old tuple to update its t_ctid, but 
accessing it twice is still more expensive than accessing it once. Maybe 
you could optimize it somewhat by keeping the buffer pinned or 
something. Or push the responsibility down to the table AM, passing the 
AM only the modified columns, and let the AM figure out how to deal with 
the columns that were not modified, hoping that it can do something smart.


It's indeed not a big deal in usual cases. The test case I constructed 
was deliberately bad, and the slowdown was only about 10%. I'm OK with 
that, but if there's an easy way to avoid it, we should. (Seems like 
there isn't.)


- Heikki




Re: Consistent error reporting for encryption/decryption in pgcrypto

2020-10-30 Thread Daniel Gustafsson
> On 30 Oct 2020, at 16:54, Georgios Kokolatos  
> wrote:

> I did notice that the cfbot [1] is not failing for this patch.

I assume you mean s/failing/passing/?  I noticed the red Travis and Appveyor
runs, will fix over the weekend.  Thanks for the heads-up.

cheers ./daniel



Re: contrib/sslinfo cleanup and OpenSSL errorhandling

2020-10-30 Thread Daniel Gustafsson
> On 30 Oct 2020, at 00:40, Andres Freund  wrote:

> There's quite a few copies of this code that look exactly the same,
> except for the be_tls_get_* call. Do you see a way to have fewer copies
> of the same code?

There's really only two of the same, and two sets of those.  I tried some
variations but didn't really achieve anything that would strike the right
balance on the codegolf-to-readability scale.  Maybe others have a richer
imagination than me.

cheers ./daniel




Re: making update/delete of inheritance trees scale better

2020-10-30 Thread Tom Lane
Heikki Linnakangas  writes:
>  But if you do:

> postgres=# explain verbose update tab set a = 1, b = 2;
>                                  QUERY PLAN
> ------------------------------------------------------------------------------
>  Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
>    ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=14)
>          Output: 1, 2, ctid

> The Modify Table will still fetch the old tuple, but in this case, it's 
> not really necessary, because both columns are overwritten.

Ah, that I believe.  Not sure it's a common enough case to spend cycles
looking for, though.

In any case, we still have to access the old tuple, don't we?
To lock it and update its t_ctid, whether or not we have use for
its user columns.  Maybe there's some gain from not having to
deconstruct the tuple, but it doesn't seem like it'd be much.

regards, tom lane




Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas

On 30/10/2020 22:56, Tomas Vondra wrote:

I agree this design looks simpler. I'm a bit worried about serializing
the parsing like this, though. It's true the current approach (where the
first phase of parsing happens in the leader) has a similar issue, but I
think it would be easier to improve that in that design.

My plan was to parallelize the parsing roughly like this:

1) split the input buffer into smaller chunks

2) let workers scan the buffers and record positions of interesting
characters (delimiters, quotes, ...) and pass it back to the leader

3) use the information to actually parse the input data (we only need to
look at the interesting characters, skipping large parts of data)

4) pass the parsed chunks to workers, just like in the current patch


But maybe something like that would be possible even with the approach
you propose - we could have a special parse phase for processing each
buffer, where any worker could look for the special characters, record
the positions in a bitmap next to the buffer. So the whole sequence of
states would look something like this:

  EMPTY
  FILLED
  PARSED
  READY
  PROCESSING


I think it's even simpler than that. You don't need to communicate the 
"interesting positions" between processes, if the same worker takes care 
of the chunk through all states from FILLED to DONE.


You can build the bitmap of interesting positions immediately in FILLED 
state, independently of all previous blocks. Once you've built the 
bitmap, you need to wait for the information on where the first line 
starts, but presumably finding the interesting positions is the 
expensive part.



Of course, the question is whether parsing really is sufficiently
expensive for this to be worth it.


Yeah, I don't think it's worth it. Splitting the lines is pretty fast, I 
think we have many years to come before that becomes a bottleneck. But 
if it turns out I'm wrong and we need to implement that, the path is 
pretty straightforward.


- Heikki




Re: Extending range type operators to cope with elements

2020-10-30 Thread Tomas Vondra

Hi,

On Fri, Oct 30, 2020 at 04:01:27PM +, Georgios Kokolatos wrote:

Hi,

thank you for your contribution.

I did notice that the cfbot [1] is failing for this patch.
Please try to address the issues if you can for the upcoming commitfest.



I took a look at the patch today - the regression failure was trivial:
the expected output for one query was added in the wrong place, a couple
of lines off the proper spot. Attached is an updated version of the
patch, fixing that.

I also reviewed the code - it seems pretty clean and in line with the
surrounding code in rangetypes.c. Good job Esteban! I'll do a bit more
review next week, and I'll see if I can get it committed.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 
From 414aefb911308b39ffefbd2190db66f2ee24c26c Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Fri, 30 Oct 2020 22:44:57 +0100
Subject: [PATCH] range-ext

---
 doc/src/sgml/func.sgml   | 140 ++
 src/backend/utils/adt/rangetypes.c   | 318 ++-
 src/include/catalog/pg_operator.dat  |  54 
 src/include/catalog/pg_proc.dat  |  31 +++
 src/include/utils/rangetypes.h   |   7 +
 src/test/regress/expected/rangetypes.out | 246 ++
 src/test/regress/sql/rangetypes.sql  |  42 +++
 7 files changed, 837 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index d8eee3a826..3498716c7e 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18016,6 +18016,34 @@ SELECT NULLIF(value, '(none)') ...

   
 
+  
+   
+anyrange  
anyelement
+boolean
+   
+   
+Is the range strictly left of the element?
+   
+   
+int8range(1,10)  100::int8
+t
+   
+  
+
+  
+   
+anyelement  
anyrange
+boolean
+   
+   
+Is the element strictly left of the range?
+   
+   
+10::int8  int8range(100,110)
+t
+   
+  
+
   

 anyrange  anyrange
@@ -18030,6 +18058,34 @@ SELECT NULLIF(value, '(none)') ...

   
 
+  
+   
+anyrange  
anyelement
+boolean
+   
+   
+Is the range strictly right of the element?
+   
+   
+int8range(50,60)  20::int8
+t
+   
+  
+
+  
+   
+anyelement  
anyrange
+boolean
+   
+   
+Is the element strictly right of the element?
+   
+   
+60::int8  int8range(20,30)
+t
+   
+  
+
   

 anyrange  
anyrange
@@ -18044,6 +18100,34 @@ SELECT NULLIF(value, '(none)') ...

   
 
+  
+   
+anyrange  
anyelement
+boolean
+   
+   
+Does the range not extend to the right of the element?
+   
+   
+int8range(1,20)  20::int8
+t
+   
+  
+
+  
+   
+anyelement  
anyrange
+boolean
+   
+   
+Does the element not extend to the right of the range?
+   
+   
+20::int8  int8range(18,20)
+t
+   
+  
+
   

 anyrange  
anyrange
@@ -18058,6 +18142,34 @@ SELECT NULLIF(value, '(none)') ...

   
 
+  
+   
+anyrange  
anyelement
+boolean
+   
+   
+Does the range not extend to the left of the element?
+   
+   
+int8range(7,20)  5::int8
+t
+   
+  
+
+  
+   
+anyelement  
anyrange
+boolean
+   
+   
+Does the element not extend to the left of the range?
+   
+   
+7::int8  int8range(5,10)
+t
+   
+  
+
   

 anyrange -|- anyrange
@@ -18072,6 +18184,34 @@ SELECT NULLIF(value, '(none)') ...

   
 
+  
+   
+anyrange -|- anyelement
+boolean
+   
+   
+Is the range adjacent to the element?
+   
+   
+numrange(1.1,2.2) -|- 2.2
+t
+   
+  
+
+  
+   
+anyelement -|- anyrange
+boolean
+   
+   
+Is the element adjacent to the range?
+   
+   
+2.2 -|- numrange(2.2,3.3,'(]')
+t
+   
+  
+
   

 anyrange + anyrange
diff --git a/src/backend/utils/adt/rangetypes.c 
b/src/backend/utils/adt/rangetypes.c
index 01ad8bc240..954f3d2a46 100644
--- a/src/backend/utils/adt/rangetypes.c
+++ b/src/backend/utils/adt/rangetypes.c
@@ -536,6 +536,323 @@ range_contains_elem(PG_FUNCTION_ARGS)
PG_RETURN_BOOL(range_contains_elem_internal(typcache, r, val));
 }
 
+/* strictly left of element? (internal version) */
+bool
+range_before_elem_internal(TypeCacheEntry *typcache, RangeType *r, Datum val)
+{
+   RangeBound  lower,
+   

Re: making update/delete of inheritance trees scale better

2020-10-30 Thread Heikki Linnakangas

On 30/10/2020 23:10, Tom Lane wrote:

Heikki Linnakangas  writes:

I also did some quick performance testing with a simple update designed
as a worst-case scenario:



vacuum tab; update tab set b = b, a = a;



In this case, the patch fetches the old tuple, but it wouldn't really
need to, because all the columns are updated. Could we optimize that
special case?


I'm not following.  We need to read the old values of a and b for
the update source expressions, no?

(One could imagine realizing that this is a no-op update, but that
seems quite distinct from the problem at hand, and probably not
worth the cycles.)


Ah, no, that's not what I meant. You do need to read the old values to 
calculate the new ones, but if you update all the columns or if you 
happened to read all the old values as part of the scan, then you don't 
need to fetch the old tuple in the ModifyTable node.


Let's try a better example. Currently with the patch:

postgres=# explain verbose update tab set a = 1;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
   ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=10)
         Output: 1, ctid

The Modify Table node will fetch the old tuple to get the value for 'b', 
which is unchanged. But if you do:


postgres=# explain verbose update tab set a = 1, b = 2;
                                 QUERY PLAN
------------------------------------------------------------------------------
 Update on public.tab  (cost=0.00..269603.27 rows=0 width=0)
   ->  Seq Scan on public.tab  (cost=0.00..269603.27 rows=10028327 width=14)
         Output: 1, 2, ctid

The Modify Table will still fetch the old tuple, but in this case, it's 
not really necessary, because both columns are overwritten.


- Heikki




Re: making update/delete of inheritance trees scale better

2020-10-30 Thread Tom Lane
Heikki Linnakangas  writes:
> I also did some quick performance testing with a simple update designed 
> as a worst-case scenario:

> vacuum tab; update tab set b = b, a = a;

> In this case, the patch fetches the old tuple, but it wouldn't really 
> need to, because all the columns are updated. Could we optimize that 
> special case?

I'm not following.  We need to read the old values of a and b for
the update source expressions, no?

(One could imagine realizing that this is a no-op update, but that
seems quite distinct from the problem at hand, and probably not
worth the cycles.)

regards, tom lane




Re: enable_incremental_sort changes query behavior

2020-10-30 Thread Tomas Vondra

On Fri, Oct 30, 2020 at 01:26:08PM -0400, James Coleman wrote:

On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
 wrote:


On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
>On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
> wrote:
>> Can you please create an entry in the commitfest for this one so we
>> don't lose track of it?
>

We're not too far from the next minor release, so I've been looking at
this fix again and I intend to get it committed shortly (on Monday or
so). Attached is a slightly modified version of the v4 patch that:

(a) Removes comments about projections added to code that is not
directly related to the fix. I'm not against adding such comments
separately.


Seems fine. Do you want to commit them at the same time (just a
separate commit)? Or have a separate patch? It seems a bit overkill to
start a new thread just for those.



Probably sometime later, as a separate patch. I haven't thought very
much about those comments, it just seemed unrelated to the fix so I've
removed that for now. Let's not bother with this until after the minor
release.


(b) Fixes comment in expected output of incremental_sort test.


Thanks.


(c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
not quite needed thanks to the "return" in the "if" branch. IMO this
makes it more elegant.


No objection.


I do have two questions about find_em_expr_usable_for_sorting_rel.

(a) Where does the em_is_const check come from? The comment claims we
should not be trying to sort by equivalence class members, but I can't
convince myself it's actually true and I don't see it discussed in this
thread.


That comes from find_ec_member_for_tle which is called by
prepare_sort_from_pathkeys which we note in the comments contains the
set of rules we need to mirror.



Thanks for the pointer. I'll take a look.


(b) In find_em_expr_for_rel, which was what was used before, the
condition on relids was this:

 if (bms_is_subset(em->em_relids, rel->relids) &&
 !bms_is_empty(em->em_relids))
 {
 return em->em_expr;
 }

but here we're using only

 if (!bms_is_subset(em->em_relids, rel->relids))
 continue;

Isn't this missing the second bms_is_empty condition?


I definitely remember intentionally removing that condition. For one,
it's not present in prepare_sort_from_pathkeys. My memory is a bit
fuzzy on all of the whys, but isn't it the case that if we've already
excluded const expressions, that we must have vars (and therefore
rels) present, and therefore the relids bms must be non-empty?
Probably more importantly, we're going to check that an actual em expr
matches, which means the bms subset check is really just an
optimization to avoid unnecessarily scanning the exprs for equality.
Perhaps it's worth adding a comment saying as such?

By the way, the fact that this is parallel to
prepare_sort_from_pathkeys is the reason the XXX comment is still
there about possibly needing to remove relabel types (since
prepare_sort_from_pathkeys does that). I don't think it's a hard
requirement: the worst case is that by not digging into relabel types
we're being unnecessarily strict.

Both of these I can add if desired.



Yeah, it'd be good to explain the reasoning why it's fine to have the
conditions like this. Thanks.


regards


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: libpq compression

2020-10-30 Thread Andres Freund
Hi,

On 2020-10-29 16:45:58 +0300, Konstantin Knizhnik wrote:
> > - What does " and it is up to the client whether to continue work
> >without compression or report error" actually mean for a libpq parameter?
> It cannot happen.
> The client requests use of the compressed protocol from the server only if
> "compression=XXX" was specified in the connection string.
> But XXX should be supported by the client, otherwise this request will be
> rejected.
> So the supported-protocol string sent by the client can never be empty.

I think it's pretty important for features like this to be able to fail
softly when the feature is not available on the other side. Otherwise a
lot of applications need to have unnecessary version dependencies coded
into them.


> > - What is the point of the "streaming mode" reference?
>
> There are two ways of performing compression:
> - block mode, where each block is individually compressed (the compressor
> stores a dictionary in each compressed block)
> - stream mode
> Block mode allows each page to be decompressed independently. It is good for
> implementing page or field compression (as pglz is used to compress toast
> values).
> But it is not efficient for compressing client-server protocol commands.
> It seems to me important to explain that libpq is using stream mode
> and why there is no pglz compressor.

To me that seems like unnecessary detail in the user oriented parts of
the docs at least.


> > Why are compression methods identified by one byte identifiers? That
> > seems unnecessarily small, given this is commonly a once-per-connection
> > action?
>
> It is mostly for simplicity of implementation: it is always simpler to work
> with fixed-size messages (or with an array of chars rather than an array of
> strings).
> And I do not think that it somehow decreases flexibility: these one-letter
> algorithm codes are not visible to the user. And I do not think that we
> will ever support more than 127 (or even 64) different compression
> algorithms.

It's pretty darn trivial to have a variable width protocol message,
given that we have all the tooling for that. Having a variable length
descriptor allows us to extend the compression identifier with e.g. the
level without needing to change the protocol. E.g. zstd:9 or
zstd:level=9 or such.

I suggest using a variable length string as the identifier, and split it
at the first : for the lookup, and pass the rest to the compression
method.
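A minimal sketch of that split (names made up, not from any patch):

    #include <stdlib.h>
    #include <string.h>

    /*
     * Split "zstd:level=9" into the algorithm name used for the lookup and
     * the option string handed to the compression method ("level=9"), or
     * NULL if no options were given.  The caller frees *algo.
     */
    static void
    split_compression_spec(const char *spec, char **algo, const char **options)
    {
        const char *colon = strchr(spec, ':');

        if (colon == NULL)
        {
            *algo = strdup(spec);
            *options = NULL;
        }
        else
        {
            size_t  len = (size_t) (colon - spec);

            *algo = malloc(len + 1);
            memcpy(*algo, spec, len);
            (*algo)[len] = '\0';
            *options = colon + 1;
        }
    }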


> > The protocol sounds to me like there's no way to enable/disable
> > compression in an existing connection. To me it seems better to have an
> > explicit, client initiated, request to use a specific method of
> > compression (including none). That allows to enable compression for bulk
> > work, and disable it in other cases (e.g. for security sensitive
> > content, or for unlikely to compress well content).
>
> It will significantly complicate implementation (because of buffering at
> different levels).

Really? We need to be able to ensure a message is flushed out to the
network at precise moments - otherwise a lot of stuff will break.


> Also, it is not clear to me who would control enabling/disabling
> compression in this case, and how.
> I can imagine that "\copy" should trigger compression. But what about
> (server side) "copy"  command?

The client could still trigger that. I don't think it should ever be
server controlled.


> And concerning security risks... In most cases such a problem is not relevant
> at all because both client and server are located within a single reliable
> network.

The trend is towards never trusting the network, even if internal, not
the opposite.


> If security of communication really matters, you should not switch on
> compression in any case (including COPY and other bulk data transfer).
> It is a very strange idea to let the client decide which data is "security
> sensitive" and which is not.

Huh?


> > I think that would also make cross-version handling easier, because a
> > newer client driver can send the compression request and handle the
> > error, without needing to reconnect or such.
> >
> > Most importantly, I think such a design is basically a necessity to make
> > connection poolers to work in a sensible way.
>
> I do not completely understand the problem with connection pooler.
> Right now developers of Yandex Odyssey are trying to support libpq
> compression in their pooler.
> If they run into any problems, I will definitely address
> them.

It makes poolers a lot more expensive if they have to decompress and
then recompress again. It'd be awesome if even the decompression could
be avoided in at least some cases, but compression is the really
expensive part. So if a client connects to the pooler with
compression=zlib and an existing server connection is used, the pooler
should be able to switch the existing connection to zlib.



> But if you think that it is so important, I will try to implement it. Many
> questions arise in this case: which side should control 

Re: Assertion failure when ATTACH partition followed by CREATE PARTITION.

2020-10-30 Thread Tom Lane
Justin Pryzby  writes:
> Not sure if Tom saw this yet.

Indeed, I'd not been paying attention.  Fix looks good, pushed.

regards, tom lane




Re: Parallel copy

2020-10-30 Thread Tomas Vondra

On Fri, Oct 30, 2020 at 06:41:41PM +0200, Heikki Linnakangas wrote:

On 30/10/2020 18:36, Heikki Linnakangas wrote:

I find this design to be very complicated. Why does the line-boundary
information need to be in shared memory? I think this would be much
simpler if each worker grabbed a fixed-size block of raw data, and
processed that.

In your patch, the leader process scans the input to find out where one
line ends and another begins, and because of that decision, the leader
needs to make the line boundaries available in shared memory, for the
worker processes. If we moved that responsibility to the worker
processes, you wouldn't need to keep the line boundaries in shared
memory. A worker would only need to pass enough state to the next worker
to tell it where to start scanning the next block.


Here's a high-level sketch of how I'm imagining this to work:

The shared memory structure consists of a queue of blocks, arranged as 
a ring buffer. Each block is of fixed size, and contains 64 kB of 
data, and a few fields for coordination:


typedef struct
{
   /* Current state of the block */
   pg_atomic_uint32 state;

   /* starting offset of first line within the block */
   int startpos;

   chardata[64 kB];
} ParallelCopyDataBlock;

Where state is one of:

enum {
 FREE,   /* buffer is empty */
 FILLED, /* leader has filled the buffer with raw data */
 READY,  /* start pos has been filled in, but no worker process 
has claimed the block yet */

 PROCESSING, /* worker has claimed the block, and is processing it */
}

State changes FREE -> FILLED -> READY -> PROCESSING -> FREE. As the 
COPY progresses, the ring of blocks will always look something like 
this:


blk 0 startpos  0: PROCESSING [worker 1]
blk 1 startpos 12: PROCESSING [worker 2]
blk 2 startpos 10: READY
blk 3 starptos  -: FILLED
blk 4 startpos  -: FILLED
blk 5 starptos  -: FILLED
blk 6 startpos  -: FREE
blk 7 startpos  -: FREE

Typically, each worker process is busy processing a block. After the 
blocks being processed, there is one block in READY state, and after 
that, blocks in FILLED state.


Leader process:

The leader process is simple. It picks the next FREE buffer, fills it 
with raw data from the file, and marks it as FILLED. If no buffers are 
FREE, wait.


Worker process:

1. Claim next READY block from queue, by changing its state to
  PROCESSING. If the next block is not READY yet, wait until it is.

2. Start scanning the block from 'startpos', finding end-of-line
  markers. (in CSV mode, need to track when we're in-quotes).

3. When you reach the end of the block, if the last line continues to
  next block, wait for the next block to become FILLED. Peek into the
  next block, and copy the remaining part of the split line to a local
  buffer, and set the 'startpos' on the next block to point to the end
  of the split line. Mark the next block as READY.

4. Process all the lines in the block, call input functions, insert
  rows.

5. Mark the block as DONE.

In this design, you don't need to keep line boundaries in shared 
memory, because each worker process is responsible for finding the 
line boundaries of its own block.


There's a point of serialization here, in that the next block cannot 
be processed, until the worker working on the previous block has 
finished scanning the EOLs, and set the starting position on the next 
block, putting it in READY state. That's not very different from your 
patch, where you had a similar point of serialization because the 
leader scanned the EOLs, but I think the coordination between 
processes is simpler here.




I agree this design looks simpler. I'm a bit worried about serializing
the parsing like this, though. It's true the current approach (where the
first phase of parsing happens in the leader) has a similar issue, but I
think it would be easier to improve that in that design.

My plan was to parallelize the parsing roughly like this:

1) split the input buffer into smaller chunks

2) let workers scan the buffers and record positions of interesting
characters (delimiters, quotes, ...) and pass it back to the leader

3) use the information to actually parse the input data (we only need to
look at the interesting characters, skipping large parts of data)

4) pass the parsed chunks to workers, just like in the current patch


But maybe something like that would be possible even with the approach
you propose - we could have a special parse phase for processing each
buffer, where any worker could look for the special characters, record
the positions in a bitmap next to the buffer. So the whole sequence of
states would look something like this:

EMPTY
FILLED
PARSED
READY
PROCESSING

Of course, the question is whether parsing really is sufficiently
expensive for this to be worth it.
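
To make step 2) a bit more concrete, the per-chunk scan could look roughly
like this (just a sketch - the names, the 64 kB chunk size and the bitmap
layout are illustrative assumptions, not actual patch code):

#define CHUNK_SIZE	(64 * 1024)		/* matches the 64 kB blocks */

/* one bit per input byte, set if the parser needs to look at that byte */
typedef struct ChunkMap
{
	uint8		interesting[CHUNK_SIZE / 8];
} ChunkMap;

static void
mark_interesting(const char *chunk, int len, char delim, char quote,
				 ChunkMap *map)
{
	memset(map->interesting, 0, sizeof(map->interesting));

	for (int i = 0; i < len; i++)
	{
		char		c = chunk[i];

		/* newlines, delimiters and quotes are all the parser cares about */
		if (c == '\n' || c == '\r' || c == delim || c == quote)
			map->interesting[i / 8] |= (uint8) (1 << (i % 8));
	}
}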


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: Parallel copy

2020-10-30 Thread Tomas Vondra

Hi,

I've done a bit more testing today, and I think the parsing is busted in
some way. Consider this:

test=# create extension random;
CREATE EXTENSION

test=# create table t (a text);

CREATE TABLE

test=# insert into t select random_string(random_int(10, 256*1024)) from generate_series(1,1);

INSERT 0 1

test=# copy t to '/mnt/data/t.csv';

COPY 1

test=# truncate t;

TRUNCATE TABLE

test=# copy t from '/mnt/data/t.csv';

COPY 1

test=# truncate t;

TRUNCATE TABLE

test=# copy t from '/mnt/data/t.csv' with (parallel 2);

ERROR:  invalid byte sequence for encoding "UTF8": 0x00
CONTEXT:  COPY t, line 485: 
"m&\nh%_a"%r]>qtCl:Q5ltvF~;2oS6@HB>F>og,bD$Lw'nZY\tYl#BH\t{(j~ryoZ08"SGU~.}8CcTRk1\ts$@U3szCC+U1U3i@P..."
parallel worker


The functions come from an extension I use to generate random data, I've
pushed it to github [1]. The random_string() generates a random string
with ASCII characters, symbols and a couple special characters (\r\n\t).
The intent was to try loading data where a field may span multiple 64kB
blocks and may contain newlines etc.

The non-parallel copy works fine, the parallel one fails. I haven't
investigated the details, but I guess it gets confused about where a
string starts/ends, or something like that.


[1] https://github.com/tvondra/random


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: making update/delete of inheritance trees scale better

2020-10-30 Thread Heikki Linnakangas

On 29/10/2020 15:03, Amit Langote wrote:

On Sun, Oct 4, 2020 at 11:44 AM Amit Langote  wrote:

On Fri, Sep 11, 2020 at 7:20 PM Amit Langote  wrote:

Here are the commit messages of the attached patches:

[PATCH v3 1/3] Overhaul how updates compute a new tuple


I tried to assess the performance impact of this rejiggering of how
updates are performed.  As to why one may think there may be a
negative impact, consider that ExecModifyTable() now has to perform an
extra fetch of the tuple being updated for filling in the unchanged
values of the update's NEW tuple, because the plan itself will only
produce the values of changed columns.


...

It seems clear that the saving on the target list computation overhead
that we get from the patch is hard to ignore in this case.

I've attached updated patches, because as Michael pointed out, the
previous version no longer applies.


Rebased over the recent executor result relation related commits.


I also did some quick performance testing with a simple update designed 
as a worst-case scenario:


create unlogged table tab (a int4, b int4);
insert into tab select g, g from generate_series(1, 1000) g;

\timing on
vacuum tab; update tab set b = b, a = a;

Without the patch, the update takes about 7.3 s on my laptop, and about 
8.3 s with the patch.


In this case, the patch fetches the old tuple, but it wouldn't really 
need to, because all the columns are updated. Could we optimize that 
special case?
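
One way to detect that special case might be something like this (purely a
sketch; how the executor exposes the set of assigned columns is an
assumption here, not something the patch does):

	/*
	 * 'updated_cols' stands in for however we obtain the set of columns
	 * assigned by this UPDATE; natts is the relation's column count.
	 */
	bool		all_updated = true;

	for (AttrNumber attno = 1; attno <= natts; attno++)
	{
		if (!bms_is_member(attno - FirstLowInvalidHeapAttributeNumber,
						   updated_cols))
		{
			all_updated = false;
			break;
		}
	}

	if (all_updated)
		need_old_tuple = false; /* skip the extra fetch of the old tuple */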


In principle, it would sometimes also make sense to add the old columns 
to the targetlist like we used to, to avoid the fetch. But estimating 
when that's cheaper would be complicated.


Despite that, I like this new approach a lot. It's certainly much nicer 
than inheritance_planner().


- Heikki




Re: Error on failed COMMIT

2020-10-30 Thread Georgios Kokolatos
Hi,

thank you for your contribution.

I did notice that the cfbot [1] is failing for this patch.
Please try to address the issue for the upcoming commitfest.

Cheers,
//Georgios

[1] http://cfbot.cputube.org/dave-cramer.html

Re: pg_upgrade analyze script

2020-10-30 Thread Georgios Kokolatos
Hi,

thank you for you contribution.

I did notice that the cfbot [1] is failing for this patch.
Please try to address the issues for the upcoming Commitfest.

Cheers,
//Georgios

[1] http://cfbot.cputube.org/magnus-hagander.html

Re: Extending range type operators to cope with elements

2020-10-30 Thread Georgios Kokolatos
Hi,

thank you for your contribution.

I did notice that the cfbot [1] is failing for this patch.
Please try to address the issues if you can for the upcoming commitfest.

Cheers,
//Georgios

[1] http://cfbot.cputube.org/esteban-zimanyi.html

Re: Assertion failure when ATTACH partition followed by CREATE PARTITION.

2020-10-30 Thread Justin Pryzby
Not sure if Tom saw this yet.

On Tue, Oct 27, 2020 at 10:56:01AM +0530, Amul Sul wrote:
> On Mon, Oct 19, 2020 at 4:58 PM Amul Sul  wrote:
> >
> > Hi,
> >
> > Assertion added in commits 6b2c4e59d016 is failing with following test:
> >
> > CREATE TABLE sales
> > (
> >   prod_id   int,
> >   prod_quantity int,
> >   sold_monthdate
> > ) PARTITION BY RANGE(sold_month);
> >
> > CREATE TABLE public.sales_p1 PARTITION OF public.sales
> > FOR VALUES FROM (MINVALUE) TO ('2019-02-15');
> >
> > CREATE TABLE sales_p2(like sales including all);
> > ALTER TABLE sales ATTACH PARTITION sales_p2
> > FOR VALUES FROM ('2019-02-15') TO ('2019-03-15');
> >
> > CREATE TABLE fail PARTITION OF public.sales
> > FOR VALUES FROM ('2019-01-15') TO ('2019-02-15');
> 
> The reported issue has nothing to do with the ATTACH PARTITION stmt this can
> also be reproducible with the following CREATE stmts:
> 
> CREATE TABLE sales
> (
>   prod_id   int,
>   prod_quantity int,
>   sold_monthdate
> ) PARTITION BY RANGE(sold_month);
> 
> CREATE TABLE sales_p1 PARTITION OF sales
> FOR VALUES FROM (MINVALUE) TO ('2019-02-15');
> 
> CREATE TABLE sales_p2 PARTITION OF sales
> FOR VALUES FROM ('2019-02-15') TO ('2019-03-15');
> 
> CREATE TABLE fail PARTITION OF sales
> FOR VALUES FROM ('2019-01-15') TO ('2019-02-15');
> 
> AFAICU, the assert assumption is not correct. In the attached patch, I have
> removed that assert and the related comment.  Also, minor adjustments to the
> code fetching correct datum.
> 
> Regards,
> Amul
> 
> >
> > Here is the backtrace:
> >
> > (gdb) bt
> > #0  0x7fa37277 in raise () from /lib64/libc.so.6
> > #1  0x7fa373334968 in abort () from /lib64/libc.so.6
> > #2  0x00abecdc in ExceptionalCondition (conditionName=0xc5de6d
> > "cmpval >= 0", errorType=0xc5cf03 "FailedAssertion", fileName=0xc5d03e
> > "partbounds.c", lineNumber=3092) at assert.c:69
> > #3  0x0086189c in check_new_partition_bound
> > (relname=0x7fff225f5ef0 "fail", parent=0x7fa3744868a0, spec=0x2e9,
> > pstate=0x2e905e8) at partbounds.c:3092
> > #4  0x006b44dc in DefineRelation (stmt=0x2e83198, relkind=114
> > 'r', ownerId=10, typaddress=0x0, queryString=0x2dc07c0 "CREATE TABLE
> > fail PARTITION OF public.sales \nFOR VALUES FROM ('2019-01-15') TO
> > ('2019-02-15');") at tablecmds.c:1011
> > #5  0x00941430 in ProcessUtilitySlow (pstate=0x2e83080,
> > pstmt=0x2dc19b8, queryString=0x2dc07c0 "CREATE TABLE fail PARTITION OF
> > public.sales \nFOR VALUES FROM ('2019-01-15') TO ('2019-02-15');",
> > context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> > dest=0x2dc1aa8, qc=0x7fff225f67c0) at utility.c:1163
> > #6  0x0094123e in standard_ProcessUtility (pstmt=0x2dc19b8,
> > queryString=0x2dc07c0 "CREATE TABLE fail PARTITION OF public.sales
> > \nFOR VALUES FROM ('2019-01-15') TO ('2019-02-15');",
> > context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> > dest=0x2dc1aa8, qc=0x7fff225f67c0) at utility.c:1071
> > #7  0x00940349 in ProcessUtility (pstmt=0x2dc19b8,
> > queryString=0x2dc07c0 "CREATE TABLE fail PARTITION OF public.sales
> > \nFOR VALUES FROM ('2019-01-15') TO ('2019-02-15');",
> > context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
> > dest=0x2dc1aa8, qc=0x7fff225f67c0) at utility.c:524
> > #8  0x0093f163 in PortalRunUtility (portal=0x2e22ab0,
> > pstmt=0x2dc19b8, isTopLevel=true, setHoldSnapshot=false, dest=0x2dc1
> > aa8, qc=0x7fff225f67c0) at pquery.c:1159
> > #9  0x0093f380 in PortalRunMulti (portal=0x2e22ab0,
> > isTopLevel=true, setHoldSnapshot=false, dest=0x2dc1aa8, altdest=0x2dc1
> > aa8, qc=0x7fff225f67c0) at pquery.c:1305
> > #10 0x0093e882 in PortalRun (portal=0x2e22ab0,
> > count=9223372036854775807, isTopLevel=true, run_once=true,
> > dest=0x2dc1aa8, altdest=0x2dc1aa8, qc=0x7fff225f67c0) at pquery.c:779
> > #11 0x009389e8 in exec_simple_query (query_string=0x2dc07c0
> > "CREATE TABLE fail PARTITION OF public.sales \nFOR VALUES FROM
> > ('2019-01-15') TO ('2019-02-15');") at postgres.c:1239
> >
> > Regards,
> > Amul Sul



-- 
Justin Pryzby
System Administrator
Telsasoft
+1-952-707-8581




Re: Consistent error reporting for encryption/decryption in pgcrypto

2020-10-30 Thread Georgios Kokolatos
Hi,

thank you for your contribution.

I did notice that the cfbot [1] is not failing for this patch.

Cheers,
//Georgios

[1] http://cfbot.cputube.org/daniel-gustafsson.html

Re: shared-memory based stats collector

2020-10-30 Thread Georgios Kokolatos
Hi,

I noticed that according to the cfbot this patch no longer applies.

As it is registered in the upcoming commitfest, it would be appreciated
if you could rebase it.

Cheers,
//Georgios

Re: Yet another fast GiST build

2020-10-30 Thread Andrey Borodin
Thanks!

> 27 окт. 2020 г., в 16:43, Heikki Linnakangas  написал(а):
> 
> gbt_ts_cmp(), gbt_time_cmp_sort() and gbt_date_cmp_sort() still have the 
> above issue, they still compare "upper" for no good reason.
Fixed.

>> +static Datum
>> +gbt_bit_abbrev_convert(Datum original, SortSupport ssup)
>> +{
>> +   return (Datum) 0;
>> +}
>> +
>> +static int
>> +gbt_bit_cmp_abbrev(Datum z1, Datum z2, SortSupport ssup)
>> +{
>> +   return 0;
>> +}
> 
> If an abbreviated key is not useful, just don't define abbrev functions and 
> don't set SortSupport->abbrev_converter in the first place.
Fixed.
> 
>> static bool
>> gbt_inet_abbrev_abort(int memtupcount, SortSupport ssup)
>> {
>> #if SIZEOF_DATUM == 8
>>  return false;
>> #else
>>  return true;
>> #endif
>> }
> 
> Better to not set the 'abbrev_converter' function in the first place. Or 
> would it be better to cast the float8 to float4 if SIZEOF_DATUM == 4?
Ok, now for 4 bytes Datum we do return (Datum) Float4GetDatum((float) z);
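
For reference, such a converter might end up looking roughly like this (a
sketch assuming a float8-based key; the function name is illustrative, not
the exact code in the attached patch):

static Datum
gbt_float8_abbrev_convert(Datum original, SortSupport ssup)
{
	float8		z = DatumGetFloat8(original);

#if SIZEOF_DATUM == 8
	/* the full float8 fits into the abbreviated key */
	return Float8GetDatum(z);
#else
	/* degrade to a float4 approximation instead of disabling abbreviation */
	return Float4GetDatum((float4) z);
#endif
}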

How do you think, should this patch and patch with pageinspect GiST functions 
be registered on commitfest?

Thanks!

Best regards, Andrey Borodin.


v3-0001-Sortsupport-for-sorting-GiST-build-for-gist_btree.patch
Description: Binary data


document deviation from standard on REVOKE ROLE

2020-10-30 Thread John Naylor
This is the other doc fix as suggested in
https://www.postgresql.org/message-id/20201027220555.GS4951%40momjian.us

There is already a compatibility section, so put there.
-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v1-doc-fix-revoke-role.patch
Description: Binary data


Re: enable_incremental_sort changes query behavior

2020-10-30 Thread James Coleman
On Thu, Oct 29, 2020 at 6:06 PM Tomas Vondra
 wrote:
>
> On Tue, Oct 06, 2020 at 09:37:31AM -0400, James Coleman wrote:
> >On Tue, Oct 6, 2020 at 9:28 AM Jaime Casanova
> > wrote:
> >> Can you please create an entry in the commitfest for this one so we
> >> don't lose track of it?
> >
>
> We're not too far from the next minor release, so I've been looking at
> this fix again and I intend to get it committed shortly (on Monday or
> so). Attached is a slightly modified version of the v4 patch that:
>
> (a) Removes comments about projections added to code that is not
> directly related to the fix. I'm not against adding such comments
> separately.

Seems fine. Do you want to commit them at the same time (just a
separate commit)? Or have a separate patch? It seems a bit overkill to
start a new thread just for those.

> (b) Fixes comment in expected output of incremental_sort test.

Thanks.

> (c) Removes else branch from find_em_expr_usable_for_sorting_rel. It's
> not quite needed thanks to the "return" in the "if" branch. IMO this
> makes it more elegant.

No objection.

> I do have two questions about find_em_expr_usable_for_sorting_rel.
>
> (a) Where does the em_is_const check come from? The comment claims we
> should not be trying to sort by equivalence class members, but I can't
> convince myself it's actually true and I don't see it discussed in this
> thread.

That comes from find_ec_member_for_tle which is called by
prepare_sort_from_pathkeys which we note in the comments contains the
set of rules we need to mirror.

> (b) In find_em_expr_for_rel, which was what was used before, the
> condition on relids was this:
>
>  if (bms_is_subset(em->em_relids, rel->relids) &&
>  !bms_is_empty(em->em_relids))
>  {
>  return em->em_expr;
>  }
>
> but here we're using only
>
>  if (!bms_is_subset(em->em_relids, rel->relids))
>  continue;
>
> Isn't this missing the second bms_is_empty condition?

I definitely remember intentionally removing that condition. For one,
it's not present in prepare_sort_from_pathkeys. My memory is a bit
fuzzy on all of the whys, but isn't it the case that if we've already
excluded const expressions, that we must have vars (and therefore
rels) present, and therefore the relids bms must be non-empty?
Probably more importantly, we're going to check that an actual em expr
matches, which means the bms subset check is really just an
optimization to avoid unnecessarily scanning the exprs for equality.
Perhaps it's worth adding a comment saying as such?

By the way, the fact that this is parallel to
prepare_sort_from_pathkeys is the reason the XXX comment is still
there about possibly needing to remove relabel types (since
prepare_sort_from_pathkeys does that). I don't think it's a hard
requirement: the worst case is that by not digging into relabel types
we're being unnecessarily strict.

Both of these I can add if desired.
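
For illustration, the overall shape of the checks we're discussing is
roughly this (a condensed sketch with the suggested comment, not the actual
body of find_em_expr_usable_for_sorting_rel):

	foreach(lc_em, ec->ec_members)
	{
		EquivalenceMember *em = (EquivalenceMember *) lfirst(lc_em);

		/* We never need to sort by a constant. */
		if (em->em_is_const)
			continue;

		/*
		 * Cheap pre-filter: a member referencing relations outside 'rel'
		 * cannot match anything this rel computes.  This is only an
		 * optimization before the expression-equality check below.
		 */
		if (!bms_is_subset(em->em_relids, rel->relids))
			continue;

		/* ... check that em->em_expr matches an expression in the target ... */
	}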

James




Re: document pg_settings view doesn't display custom options

2020-10-30 Thread John Naylor
On Fri, Oct 30, 2020 at 12:48 PM Tom Lane  wrote:

> John Naylor  writes:
> > Okay, along those lines here's a patch using "this view" in a new
> paragraph
> > for simplicity.
>
> Basically OK with me, but ...
>
> 
> It seems fairly weird to use a nonspecific reference first and then a
> specific one.  That is, I'd expect to read "The pg_settings view ..."
> and then "This view ...", not the other way around.  So we could
> put this para second, or put it first but make this para say
> "The pg_settings view ..." while the existing text gets reduced to
> "This view ...".
>
> Or just make them both say "This view ..." so we don't have to have
> this discussion again the next time somebody wants to add a para here.
> 
>

Okay, how's this?

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v4-doc-pg-settings-no-custom.patch
Description: Binary data


Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas

On 30/10/2020 18:36, Heikki Linnakangas wrote:

Whether the leader process finds the EOLs or the worker processes, it's
pretty clear that it needs to be done ASAP, for a chunk at a time,
because that cannot be done in parallel. I think some refactoring in
CopyReadLine() and friends would be in order. It probably would be
faster, or at least not slower, to find all the EOLs in a block in one
tight loop, even when parallel copy is not used.


Something like the attached. It passes the regression tests, but it's 
quite incomplete. It's missing handling of "\." as end-of-file marker, 
and I haven't tested encoding conversions at all, for starters. Quick 
testing suggests that this a little bit faster than the current code, 
but the difference is small; I had to use a "WHERE false" option to 
really see the difference.


The crucial thing here is that there's a new function, ParseLinesText(), 
to find all end-of-line characters in a buffer in one go. In this patch, 
it's used against 'raw_buf', but with parallel copy, you could point it 
at a block in shared memory instead.
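
For orientation, the core of such a tight scanning loop is basically this
(a heavily simplified sketch with illustrative names; the ParseLinesText()
in the attached patch additionally has to track quoting/escaping state and
handle \r\n):

static int
scan_line_endings(const char *buf, int len, int *endlines, int max_lines)
{
	int			nlines = 0;

	for (int i = 0; i < len && nlines < max_lines; i++)
	{
		/* record the offset just past each newline */
		if (buf[i] == '\n')
			endlines[nlines++] = i + 1;
	}

	return nlines;
}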


- Heikki
>From af3be3bd4e77b66f4605393617da0d15ec21e15b Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Fri, 30 Oct 2020 18:51:10 +0200
Subject: [PATCH 1/1] WIP: Find all line-endings in COPY in chunks.

Refactor CopyReadLines and friends to find all the line-endings in the
buffer in one go, before splitting the lines further.
---
 src/backend/commands/copy.c | 972 
 1 file changed, 536 insertions(+), 436 deletions(-)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 36ddcdccdb8..fbf11cb2550 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -95,6 +95,18 @@ typedef enum CopyInsertMethod
 	CIM_MULTI_CONDITIONAL		/* use table_multi_insert only if valid */
 } CopyInsertMethod;
 
+
+/*
+ * Represents the quoting/escaping state while scanning raw input for line endings.
+ */
+typedef enum ParseLinesState
+{
+	PLSTATE_NORMAL,
+	PLSTATE_ESCAPE,
+	PLSTATE_IN_QUOTE,
+	PLSTATE_ESCAPE_IN_QUOTE
+} ParseLinesState;
+
 /*
  * This struct contains all the state variables used throughout a COPY
  * operation. For simplicity, we use the same struct for all variants of COPY,
@@ -110,6 +122,24 @@ typedef enum CopyInsertMethod
  * it's faster to make useless comparisons to trailing bytes than it is to
  * invoke pg_encoding_mblen() to skip over them. encoding_embeds_ascii is true
  * when we have to do it the hard way.
+ *
+ * COPY FROM buffers:
+ *
+ * In COPY FROM processing, there are three levels of buffers:
+ *
+ * raw_buf   - contains raw data read from file/client
+ * converted_buf - contains the data in 'raw_buf', but converted to server encoding
+ * line_buf  - contains "current" line of data, without the end-of-line char
+ *
+ *
+ * In simple cases, no encoding conversion is needed, and converted_buf always
+ * points to raw_buf. If the encoding_embeds_ascii==true, encoding conversion is
+ * performed on the raw buffer, before splitting it to lines. converted_buf contains
+ * the converted version in that case.
+ *
+ * Usually, line_buf pointer points in the middle of converted_buf, but when a line
+ * is split by a raw-buffer boundary, the incomplete line is reassembled
+ * in a separate buffer (split_line_buf), and line_buf points to that.
  */
 typedef struct CopyStateData
 {
@@ -205,16 +235,34 @@ typedef struct CopyStateData
 	char	  **raw_fields;
 
 	/*
-	 * Similarly, line_buf holds the whole input line being processed. The
+	 * These variables are used to track state of parsing raw data into
+	 * lines in COPY FROM.
+	 */
+	bool		last_was_cr;
+	ParseLinesState parse_lines_state;
+
+	int			last_line_no; /* last line in 'endlines', -1 if EOF not reached yet */
+
+	int			nextline;
+	int		   *endlines; /* line ending positions within raw_buf */
+	int			numlines;
+
+	/* split_line_buf holds partial line carried over from previous buf */
+	StringInfoData split_line_buf;
+
+	/*
+	 * Similarly, line_buf holds the current input line being processed. The
 	 * input cycle is first to read the whole line into line_buf, convert it
 	 * to server encoding there, and then extract the individual attribute
 	 * fields into attribute_buf.  line_buf is preserved unmodified so that we
 	 * can display it in error messages if appropriate.  (In binary mode,
 	 * line_buf is not used.)
 	 */
-	StringInfoData line_buf;
+	char	   *line_buf;
+	int			line_len;
 	bool		line_buf_converted; /* converted to server encoding? */
 	bool		line_buf_valid; /* contains the row being processed? */
+	bool		line_buf_alloced;
 
 	/*
 	 * Finally, raw_buf holds raw data read from the data source (file or
@@ -230,6 +278,9 @@ typedef struct CopyStateData
 	int			raw_buf_len;	/* total # of bytes stored */
 	/* Shorthand for number of unconsumed bytes available in raw_buf */
 #define RAW_BUF_BYTES(cstate) ((cstate)->raw_buf_len - (cstate)->raw_buf_index)
+
+	char	   *converted_buf;
+	int			

Re: document pg_settings view doesn't display custom options

2020-10-30 Thread Tom Lane
John Naylor  writes:
> Okay, along those lines here's a patch using "this view" in a new paragraph
> for simplicity.

Basically OK with me, but ...


It seems fairly weird to use a nonspecific reference first and then a
specific one.  That is, I'd expect to read "The pg_settings view ..."
and then "This view ...", not the other way around.  So we could
put this para second, or put it first but make this para say
"The pg_settings view ..." while the existing text gets reduced to
"This view ...".

Or just make them both say "This view ..." so we don't have to have
this discussion again the next time somebody wants to add a para here.


regards, tom lane




Re: Additional Chapter for Tutorial

2020-10-30 Thread Erik Rijkers

On 2020-10-30 11:57, Jürgen Purtz wrote:

On 26.10.20 15:53, David G. Johnston wrote:

Removing -docs as moderation won’t let me cross-post.



Hi,

I applied 0009-architecture-vs-master.patch to head
and went through architecture.sgml (only that file),
then produced the attached .diff


And I wrote down some separate items:

1.
'Two Phase Locking' and 'TPL' should be, I think,
'Two-Phase Commit'. Please someone confirm.
(no changes made)

2.
Comparing xid to a sequence because they similarly 'count up' seems a bad 
idea.

(I don't think it's always true in the case of sequences)
(no changes made)

3.
'accesses' seems a somewhat strange word; most of the time just 'access' 
may be better.  Not sure - native speaker wanted. (no changes made)


4.
'heap', in postgres, often (always?) means files. But more generally, 
the term is more associated with memory.  Therefore I think it would be 
good to explicitly use 'heap file' at least once in the beginning, to 
make clear that heap implies 'safely written away to disk'.  Again, I'm 
not quite sure if my understanding is correct - I have made no changes 
in this regard.




Erik Rijkers
--- doc/src/sgml/architecture.sgml.orig	2020-10-30 15:19:54.469275256 +0100
+++ doc/src/sgml/architecture.sgml	2020-10-30 17:28:24.835233482 +0100
@@ -19,19 +19,18 @@
 In the case of PostgreSQL, the server
 launches a single process for each client connection, referred to as a
 Backend process.
-Those Backend processes handle the client's requests by acting on the
+Such a Backend process handles the client's requests by acting on the
 Shared Memory.
 This leads to other activities (file access, WAL, vacuum, ...) of the
 Instance. The
 Instance is a group of server-side processes acting on a common
-Shared Memory. Notably, PostgreSQL does not utilize application
-threading within its implementation.
+Shared Memory. PostgreSQL does not utilize threading.

 

-The first step in an Instance start is the start of the
+The first step when an Instance starts is the start of the
 Postmaster.
-He loads the configuration files, allocates Shared Memory, and
+It loads the configuration files, allocates Shared Memory, and
 starts the other processes of the Instance:
 Background Writer,
 Checkpointer,
@@ -66,32 +65,32 @@

 When a client application tries to connect to a
 database,
-this request is handled initially by the Postmaster. He
+this request is handled initially by the Postmaster. It
 starts a new Backend process and instructs the client
 application to connect to it. All further client requests
-go to this process and are handled by it.
+are handled by this process.

 

 Client requests like SELECT or
 UPDATE usually lead to the
-necessity to read or write some data. This is carried out
+necessity to read or write data. This is carried out
 by the client's backend process. Reads involve a page-level
-cache housed in Shared Memory (for details see:
+cache, located in Shared Memory (for details see:
 ) for the benefit of all processes
-in the instance. Writes also involve this cache, in additional
+in the instance. Writes also use this cache, in addition
 to a journal, called a write-ahead-log or WAL.

 

-Shared Memory is limited in size. Thus, it becomes necessary
+Shared Memory is limited in size and it can become necessary
 to evict pages. As long as the content of such pages hasn't
 changed, this is not a problem. But in Shared Memory also
 write actions take place. Modified pages are called dirty
 pages or dirty buffers and before they can be evicted they
-must be written back to disk. This happens regularly by the
+must be written to disk. This happens regularly by the
 Background Writer and the Checkpointer process to ensure
-that the disk version of the pages are kept up-to-date.
+that the disk version of the pages are up-to-date.
 The synchronisation from RAM to disk consists of two steps.

 
@@ -109,7 +108,7 @@
 Shared Memory. The parallel running WAL Writer process
 reads them and appends them to the end of the current
 WAL file.
-Such sequential writes are much faster than writes to random
+Such sequential writes are faster than writes to random
 positions of heap and index files. All WAL records created
 out of one dirty page must be transferred to disk before the
 dirty page itself can be transferred to disk in the second step.
@@ -119,19 +118,19 @@
 Second, the transfer of dirty buffers from Shared Memory to
 files must take place. This is the primary task of the
 Background Writer process. Because I/O activities can block
-other processes significantly, it starts periodically and
+other processes, it starts periodically and
 acts only for a short period. Doing so, its extensive 

Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas

On 30/10/2020 18:36, Heikki Linnakangas wrote:

I find this design to be very complicated. Why does the line-boundary
information need to be in shared memory? I think this would be much
simpler if each worker grabbed a fixed-size block of raw data, and
processed that.

In your patch, the leader process scans the input to find out where one
line ends and another begins, and because of that decision, the leader
needs to make the line boundaries available in shared memory, for the
worker processes. If we moved that responsibility to the worker
processes, you wouldn't need to keep the line boundaries in shared
memory. A worker would only need to pass enough state to the next worker
to tell it where to start scanning the next block.


Here's a high-level sketch of how I'm imagining this to work:

The shared memory structure consists of a queue of blocks, arranged as a 
ring buffer. Each block is of fixed size, and contains 64 kB of data, 
and a few fields for coordination:


typedef struct
{
/* Current state of the block */
pg_atomic_uint32 state;

/* starting offset of first line within the block */
int startpos;

 char		data[64 kB];
} ParallelCopyDataBlock;

Where state is one of:

enum {
  FREE,   /* buffer is empty */
  FILLED, /* leader has filled the buffer with raw data */
  READY,  /* start pos has been filled in, but no worker process 
has claimed the block yet */

  PROCESSING, /* worker has claimed the block, and is processing it */
}

State changes FREE -> FILLED -> READY -> PROCESSING -> FREE. As the COPY 
progresses, the ring of blocks will always look something like this:


blk 0 startpos  0: PROCESSING [worker 1]
blk 1 startpos 12: PROCESSING [worker 2]
blk 2 startpos 10: READY
blk 3 startpos  -: FILLED
blk 4 startpos  -: FILLED
blk 5 startpos  -: FILLED
blk 6 startpos  -: FREE
blk 7 startpos  -: FREE

Typically, each worker process is busy processing a block. Beyond the 
blocks currently being processed, there is one block in READY state, and 
after that, blocks in FILLED state.


Leader process:

The leader process is simple. It picks the next FREE buffer, fills it 
with raw data from the file, and marks it as FILLED. If no buffers are 
FREE, wait.


Worker process:

1. Claim next READY block from queue, by changing its state to
   PROCESSING. If the next block is not READY yet, wait until it is.

2. Start scanning the block from 'startpos', finding end-of-line
   markers. (in CSV mode, need to track when we're in-quotes).

3. When you reach the end of the block, if the last line continues to
   next block, wait for the next block to become FILLED. Peek into the
   next block, and copy the remaining part of the split line to a local
   buffer, and set the 'startpos' on the next block to point to the end
   of the split line. Mark the next block as READY.

4. Process all the lines in the block, call input functions, insert
   rows.

5. Mark the block as FREE again.

In this design, you don't need to keep line boundaries in shared memory, 
because each worker process is responsible for finding the line 
boundaries of its own block.


There's a point of serialization here, in that the next block cannot be 
processed, until the worker working on the previous block has finished 
scanning the EOLs, and set the starting position on the next block, 
putting it in READY state. That's not very different from your patch, 
where you had a similar point of serialization because the leader 
scanned the EOLs, but I think the coordination between processes is 
simpler here.
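
To make the coordination concrete, the worker side of this could be
sketched roughly as follows (the ring/nblocks variables and the helper
functions are illustrative assumptions only, not patch code):

	for (;;)
	{
		ParallelCopyDataBlock *blk = &ring[blkno % nblocks];
		uint32		expected = READY;

		/* 1. claim the next READY block, waiting until it becomes READY */
		while (!pg_atomic_compare_exchange_u32(&blk->state, &expected,
											   PROCESSING))
		{
			expected = READY;
			wait_for_block_state_change(blk);	/* illustrative helper */
		}

		/*
		 * 2-3. scan from blk->startpos for end-of-line markers; if the
		 * last line continues into the next block, stitch it together,
		 * set that block's startpos and mark it READY
		 */
		scan_block_lines(blk, &ring[(blkno + 1) % nblocks]);	/* illustrative */

		/* 4. parse the complete lines and insert the rows */
		process_block_lines(blk);	/* illustrative */

		/* 5. hand the block back to the leader */
		pg_atomic_write_u32(&blk->state, FREE);
		blkno++;
	}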


- Heikki




Re: Parallel copy

2020-10-30 Thread Heikki Linnakangas

On 27/10/2020 15:36, vignesh C wrote:

Attached v9 patches have the fixes for the above comments.


I find this design to be very complicated. Why does the line-boundary 
information need to be in shared memory? I think this would be much 
simpler if each worker grabbed a fixed-size block of raw data, and 
processed that.


In your patch, the leader process scans the input to find out where one 
line ends and another begins, and because of that decision, the leader 
needs to make the line boundaries available in shared memory, for the 
worker processes. If we moved that responsibility to the worker 
processes, you wouldn't need to keep the line boundaries in shared 
memory. A worker would only need to pass enough state to the next worker 
to tell it where to start scanning the next block.


Whether the leader process finds the EOLs or the worker processes, it's 
pretty clear that it needs to be done ASAP, for a chunk at a time, 
because that cannot be done in parallel. I think some refactoring in 
CopyReadLine() and friends would be in order. It probably would be 
faster, or at least not slower, to find all the EOLs in a block in one 
tight loop, even when parallel copy is not used.


- Heikki




Re: document pg_settings view doesn't display custom options

2020-10-30 Thread John Naylor
On Fri, Oct 30, 2020 at 12:07 PM Tom Lane  wrote:

> John Naylor  writes:
> > On Thu, Oct 29, 2020 at 11:51 PM Fujii Masao <
> masao.fu...@oss.nttdata.com>
> > wrote:
> >> Also I think this note should be in the different paragraph from the
> >> paragraph
> >> of "The pg_settings view cannot be inserted into or deleted from"
> >> because
> >> they are different topics. Thought?
>
> > Agreed on both points. In a separate paragraph, I think it's awkward to
> > start two consecutive sentences with "The pg_settings view". If we put it
> > in the previous paragraph we could phrase it like this:
>
> > "See Section 20.1 for more information about the various ways to change
> > these parameters. Customized options are not displayed until the
> > extension module that defines them has been loaded.
>
> That just moves the subject-inconsistency to a different para :-(
> I think this item should be its own new para.
>
> As for the repetitiveness, we could just say "This view ...", in one or
> even both paras.
>

Okay, along those lines here's a patch using "this view" in a new paragraph
for simplicity.

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


v3-doc-pg-settings-no-custom.patch
Description: Binary data


Re: Corruption during WAL replay

2020-10-30 Thread Anastasia Lubennikova
Status update for a commitfest entry.

I see quite a few unanswered questions in the thread since the last patch 
version was sent. So, I move it to "Waiting on Author".

The new status of this patch is: Waiting on Author


Re: BUG #15383: Join Filter cost estimation problem in 10.5

2020-10-30 Thread David G. Johnston
On Fri, Oct 30, 2020 at 9:16 AM Anastasia Lubennikova <
lubennikov...@gmail.com> wrote:

> Status update for a commitfest entry.
>
> It looks like there was no real progress on this issue since April. I see
> only an experimental patch. What kind of review does it need right now? Do
> we need more testing or maybe production statistics for such queries?
>
> David, are you going to continue working on it?
> Can you, please, provide a short summary of the problem and list open
> questions for reviewers.
>
> The new status of this patch is: Waiting on Author
>

I agree that updating a wiki seems like an unappealing conclusion to this
entry.  I'm on the fence whether we just leave it in the cf and get a
periodic nag when it gets bumped.  As we are describing real problems or
potential improvements to live portions of the codebase ISTM that adding
code comments near to those locations would be warranted.  The commit
message can reference this thread.  There seems to already be a convention
for annotating code comments in such a way that they can be searched for -
which basically is what sticking it in the wiki would accomplish but the
searcher would have to look in the codebase and not on a website.  For the
target audience of this specific patch that seems quite reasonable.

David J.


Re: BUG #15383: Join Filter cost estimation problem in 10.5

2020-10-30 Thread Tom Lane
Anastasia Lubennikova  writes:
> Status update for a commitfest entry.
> It looks like there was no real progress on this issue since April. I see 
> only an experimental patch. What kind of review does it need right now? Do we 
> need more testing or maybe production statistics for such queries?

I think it's probably time to close this entry as RWF.  I don't think
either David or I have any fresh ideas about how to get to something
committable.

The last couple of messages were complaining that adding this topic
to the TODO list would be useless ... but now that Naylor is working
on cleaning that up, maybe it will become less useless.

regards, tom lane




Re: BUG #15383: Join Filter cost estimation problem in 10.5

2020-10-30 Thread Anastasia Lubennikova
Status update for a commitfest entry.

It looks like there was no real progress on this issue since April. I see only 
an experimental patch. What kind of review does it need right now? Do we need 
more testing or maybe production statistics for such queries?

David, are you going to continue working on it?
Can you, please, provide a short summary of the problem and list open questions 
for reviewers.

The new status of this patch is: Waiting on Author


Re: document pg_settings view doesn't display custom options

2020-10-30 Thread Tom Lane
John Naylor  writes:
> On Thu, Oct 29, 2020 at 11:51 PM Fujii Masao 
> wrote:
>> Also I think this note should be in the different paragraph from the
>> paragraph
>> of "The pg_settings view cannot be inserted into or deleted from"
>> because
>> they are different topics. Thought?

> Agreed on both points. In a separate paragraph, I think it's awkward to
> start two consecutive sentences with "The pg_settings view". If we put it
> in the previous paragraph we could phrase it like this:

> "See Section 20.1 for more information about the various ways to change
> these parameters. Customized options are not displayed until the
> extension module that defines them has been loaded.

That just moves the subject-inconsistency to a different para :-(
I think this item should be its own new para.

As for the repetitiveness, we could just say "This view ...", in one or
even both paras.

regards, tom lane




Re: document pg_settings view doesn't display custom options

2020-10-30 Thread John Naylor
On Thu, Oct 29, 2020 at 11:51 PM Fujii Masao 
wrote:

>
>
> On 2020/10/29 21:54, John Naylor wrote:
>
> > The pg_settings does not display
> > customized options
> > that have been set before the relevant extension module has been
> loaded.
>
> I guess that someone can misread this as
>
>  customized options that have been set before the relevant extension
>  module has been loaded are not displayed even after the module is
> loaded.
>
> So what about the following, instead?
>
>  The pg_settings does not display customized options until the
> extension
>  module that defines them has been loaded.
>
> Also I think this note should be in the different paragraph from the
> paragraph
> of "The pg_settings view cannot be inserted into or deleted from"
> because
> they are different topics. Thought?
>

Agreed on both points. In a separate paragraph, I think it's awkward to
start two consecutive sentences with "The pg_settings view". If we put it
in the previous paragraph we could phrase it like this:

"See Section 20.1 for more information about the various ways to change
these parameters. Customized options are not displayed until the
extension module that defines them has been loaded.

The pg_settings view..."

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: cutting down the TODO list thread

2020-10-30 Thread John Naylor
On Wed, Oct 28, 2020 at 1:57 PM Andres Freund  wrote:

> On 2020-10-28 16:20:03 +0100, Magnus Hagander wrote:
> > I would personally prefer a completely seprate page
>
> Same.
>

Ok, that's two votes for a separate page, and one for a new section on the
same page, so it looks like it's a new page. That being the case, I would
think it logical to move "features we don't want" there. As for the name,
we should probably encompass both "won't fix" bugs and features not wanted.
Maybe "past development ideas" or "not worth doing", but I'm open to better
ideas. Once that's agreed upon, I'll make a new page and migrate the items
over, minus the two that were mentioned upthread.

-- 
John Naylor
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Autovacuum worker doesn't immediately exit on postmaster death

2020-10-30 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Alvaro Herrera  writes:
> > On 2020-Oct-29, Stephen Frost wrote:
> >> I do think it'd be good to find a way to check every once in a while
> >> even when we aren't going to delay though.  Not sure what the best
> >> answer there is.
> 
> > Maybe instead of thinking specifically in terms of vacuum, we could
> > count buffer accesses (read from kernel) and check the latch once every
> > 1000th such, or something like that.  Then a very long query doesn't
> > have to wait until it's run to completion.  The cost is one integer
> > addition per syscall, which should be bearable.
> 
> I'm kind of unwilling to add any syscalls at all to normal execution
> code paths for this purpose.  People shouldn't be sig-kill'ing the
> postmaster, or if they do, cleaning up the mess is their responsibility.
> I'd also suggest that adding nearly-untestable code paths for this
> purpose is a fine way to add bugs we'll never catch.

Not sure if either is at all viable, but I had a couple of thoughts
about other ways to possibly address this.

The first simplistic idea is this- we have lots of processes that pick
up pretty quickly on the postmaster going away due to checking if it's
still around while waiting for something else to happen anyway (like the
autovacuum launcher...), and we have CFI's in a lot of places where it's
reasonable to do a CFI but isn't alright to check for postmaster death.
While it'd be better if there were more platforms where parent death
would send a signal to the children, that doesn't seem to be coming any
time soon- so why don't we do it ourselves?  That is, when we discover
that the postmaster has died, scan through the proc array (carefully,
since it could be garbage, but all we're looking for are the PIDs of
anything that might still be around) and try sending a signal to any
processes that are left?  Those signals would hopefully get delivered
and the other backends would discover the signal through CFI and exit
reasonably quickly.

The other thought I had was around trying to check for postmaster death
when we're about to do some I/O, which would probably catch a large
number of these cases too though technically some process might stick
around for a while if it's only dealing with things that are already in
shared buffers, I suppose.  Also seems complicated and expensive to do.

> The if-we're-going-to-delay-anyway path in vacuum_delay_point seems
> OK to add a touch more overhead to, though.

Yeah, this certainly seems reasonable to do too and on a well run system
would likely be enough 90+% of the time.
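
For what it's worth, the if-we're-going-to-delay-anyway variant could look
roughly like this inside vacuum_delay_point() (only a sketch, not a
proposed patch):

	if (VacuumCostActive && !InterruptPending &&
		VacuumCostBalance >= VacuumCostLimit)
	{
		double		msec = VacuumCostDelay * VacuumCostBalance / VacuumCostLimit;

		/*
		 * We're about to sleep anyway, so an extra liveness check is
		 * essentially free here; bail out if the postmaster is gone.
		 */
		if (!PostmasterIsAlive())
			proc_exit(1);

		pg_usleep((long) (msec * 1000));

		VacuumCostBalance = 0;
		CHECK_FOR_INTERRUPTS();
	}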

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [PATCH] Clarify wording for convalidated in pg_constraint.

2020-10-30 Thread Tom Lane
Jimmy Angelakos  writes:
> Please find attached my first patch (the tiniest of doc patches) :
> I found the wording around convalidated in pg_catalog documentation a bit
> confusing/ambiguous.

Seems reasonable; pushed.

regards, tom lane




Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-30 Thread Euler Taveira
On Fri, 30 Oct 2020 at 09:43, vignesh C  wrote:

>
> Attached v3 patch has the change for the same.
>
>
Hi Vignesh,

+ appendStringInfo(, "replication ");
+
+ appendStringInfo(, "connection authorized: user=%s",
+ port->user_name);
+ if (!am_walsender)
+ appendStringInfo(, " database=%s", port->database_name);
+
+ if (port->application_name != NULL)
+ appendStringInfo(, " application_name=%s",
+ port->application_name);
+

Your approach breaks localization. You should use multiple errmsg.
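
For example, something along these lines keeps each variant a single
translatable message (only a sketch of the idea, using the Port fields
already referenced in the patch; not the patch code itself):

	if (am_walsender)
		ereport(LOG,
				(errmsg("replication connection authorized: user=%s application_name=%s",
						port->user_name,
						port->application_name ? port->application_name : "")));
	else
		ereport(LOG,
				(errmsg("connection authorized: user=%s database=%s application_name=%s",
						port->user_name,
						port->database_name,
						port->application_name ? port->application_name : "")));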

+$node->append_conf('postgresql.conf', "logging_collector= 'on'");
+$node->append_conf('postgresql.conf', "log_connections= 'on'");

booleans don't need quotes.


-- 
Euler Taveira http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Log message for GSS connection is missing once connection authorization is successful.

2020-10-30 Thread vignesh C
Thanks for the comments Bharath.

On Thu, Oct 29, 2020 at 12:15 PM Bharath Rupireddy
 wrote:
> 1. Instead of just "on/off" after GSS %s in the log message, wouldn't it be 
> informative if we have authenticated and/or encrypted as suggested by Stephen?
>
> So the log message would look like this:
>
> if(be_gssapi_get_auth(port))
> replication connection authorized: user=bob application_name=foo GSS 
> authenticated (principal=bar)
>
> if(be_gssapi_get_enc(port))
> replication connection authorized: user=bob application_name=foo GSS 
> encrypted (principal=bar)
>
> if(be_gssapi_get_auth(port) && be_gssapi_get_enc(port))
> replication connection authorized: user=bob application_name=foo GSS 
> authenticated and encrypted (principal=bar)
>
> +#ifdef ENABLE_GSS
> +if (be_gssapi_get_auth(port) || be_gssapi_get_princ(port))
> +ereport(LOG,
> +(port->application_name != NULL
> + ? errmsg("replication connection authorized: 
> user=%s application_name=%s GSS %s (principal=%s)",
> +  port->user_name,
> +  port->application_name,
> +  be_gssapi_get_auth(port) || 
> be_gssapi_get_enc(port) ? _("on") : _("off"),
> +  be_gssapi_get_princ(port))
> + : errmsg("replication connection authorized: 
> user=%s GSS %s (principal=%s)",
> +  port->user_name,
> +  be_gssapi_get_auth(port) || 
> be_gssapi_get_enc(port) ? _("on") : _("off"),
> +  be_gssapi_get_princ(port;
> +else
>

This is handled in v3 patch posted at [1].

> 2. I think the log message preparation looks a bit clumsy with ternary 
> operators and duplicate log message texts(I know that we already do this for 
> SSL). Can we have the log message prepared using StringInfoData data 
> structure/APIs and use just a single ereport? This way, that part of the code 
> looks cleaner.
>
> Here's what I'm visualizing:
>
> if (Log_connections)
> {
> StringInfoData msg;
>
> if (am_walsender)
> append("replication connection authorized: user=%s");
> else
> append("connection authorized: user=%s database=%s");
>
> if (port->application_name)
> append("application_name=%s");
>
> #ifdef USE_SSL
> if (port->ssl_in_use)
> append("SSL enabled (protocol=%s, cipher=%s, bits=%d, compression=%s");
> #elif defined(ENABLE_GSS)
> blah,blah,blah
> #endif
>
> ereport (LOG, msg.data);
> }

This is handled in the v3 patch posted.

>
> 3. +if (be_gssapi_get_auth(port) || be_gssapi_get_princ(port))
>
> If be_gssapi_get_auth(port) returns false, I think there's no way that 
> be_gssapi_get_princ(port) would return a non null value, see the comment. The 
> function be_gssapi_get_princ() returns NULL if the auth is false, so the 
> check if ( be_gssapi_get_princ(port)) would suffice.
>
> gss_name_tname;/* GSSAPI client name */
> char   *princ;/* GSSAPI Principal used for auth, NULL if
>  * GSSAPI auth was not used */
> boolauth;/* GSSAPI Authentication used */
> boolenc;/* GSSAPI encryption in use */
>

This is handled in the v3 patch posted.


Regards,
Vignesh
EnterpriseDB: http://www.enterprisedb.com




Re: Disable WAL logging to speed up data loading

2020-10-30 Thread Laurenz Albe
On Fri, 2020-10-30 at 05:00 +0900, Fujii Masao wrote:
> > But as I said in the other thread, changing wal_level emits a WAL
> > record, and I am sure that recovery will refuse to proceed if
> > wal_level < replica.
> 
> Yes. What I meant was such a safe guard needs to be implemented.

That should be easy; just modify this code:

static void
CheckRequiredParameterValues(void)
{
/*
 * For archive recovery, the WAL must be generated with at least 'replica'
 * wal_level.
 */
if (ArchiveRecoveryRequested &&
ControlFile->wal_level == WAL_LEVEL_MINIMAL)
{
ereport(WARNING,
(errmsg("WAL was generated with wal_level=minimal, data may be 
missing"),
 errhint("This happens
if you temporarily set wal_level=minimal without taking a new base backup.")));
}

so that it tests

if (ArchiveRecoveryRequested && ControlFile->wal_level <= WAL_LEVEL_MINIMAL)

and we should be safe.

Yours,
Laurenz Albe





Re: Enumize logical replication message actions

2020-10-30 Thread Ashutosh Bapat
On Fri, 30 Oct 2020 at 17:37, Amit Kapila  wrote:

>
> I don't like the word 'Enumize' in commit message. How about changing
> it to something like: (a) Add defines for logical replication protocol
> messages, or (b) Associate names with logical replication protocol
> messages.
>

I have used  "Use Enum for top level logical replication message types" in
the attached patch. But please free to use (a) if you feel so.


>
> + 2. It's easy to locate the code handling a given type.
>
> In the above instead of 'type', shouldn't it be 'message'.
>

Used "message type". But please feel free to use "message" if you think
that's appropriate.


>
> Other than that the patch looks good to me.
>
>
Patch with updated commit message and also the list of reviewers

-- 
Best Wishes,
Ashutosh
From 55957f412c40106288f932d38182cf79f5902ddf Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 16 Oct 2020 12:31:35 +0530
Subject: [PATCH] Use Enum for top level logical replication message types

The logical replication protocol uses a single-byte character to identify a
message type on the wire. The code uses string
literals for the same. Enumize those so that

1. All the string literals used can be found at a single place. This
makes it easy to add more types without the risk of conflicts.

2. It's easy to locate the code handling a given message type.

3. When used with switch statements, it is easy to identify the missing
cases using -Wswitch.

Ashutosh Bapat, reviewed by Kyotaro Horiguchi, Andres Freund, Peter
Smith and Amit Kapila

Discussion: https://postgr.es/m/caexhw5upzq7l0oad_enyvaiymopgkraojpe+zy5-obdcvt6...@mail.gmail.com
---
 src/backend/replication/logical/proto.c  | 26 +++
 src/backend/replication/logical/worker.c | 87 
 src/include/replication/logicalproto.h   | 27 
 3 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142b48..fdb31182d7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in);
 void
 logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 {
-	pq_sendbyte(out, 'B');		/* BEGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
@@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 {
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'C');		/* sending COMMIT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT);
 
 	/* send the flags field (unused for now) */
 	pq_sendbyte(out, flags);
@@ -112,7 +112,7 @@ void
 logicalrep_write_origin(StringInfo out, const char *origin,
 		XLogRecPtr origin_lsn)
 {
-	pq_sendbyte(out, 'O');		/* ORIGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, origin_lsn);
@@ -141,7 +141,7 @@ void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'I');		/* action INSERT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -185,7 +185,7 @@ void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple oldtuple, HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'U');		/* action UPDATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);
 
 	Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
@@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_INDEX);
 
-	pq_sendbyte(out, 'D');		/* action DELETE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_DELETE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -317,7 +317,7 @@ logicalrep_write_truncate(StringInfo out,
 	int			i;
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'T');		/* action TRUNCATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_TRUNCATE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -369,7 +369,7 @@ logicalrep_write_rel(StringInfo out, TransactionId xid, Relation rel)
 {
 	char	   *relname;
 
-	pq_sendbyte(out, 'R');		/* sending RELATION */
+	pq_sendbyte(out, LOGICAL_REP_MSG_RELATION);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -425,7 +425,7 @@ logicalrep_write_typ(StringInfo out, TransactionId xid, Oid typoid)
 	HeapTuple	tup;
 	Form_pg_type typtup;
 
-	pq_sendbyte(out, 'Y');		/* sending TYPE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_TYPE);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -755,7 +755,7 @@ void
 

Re: Enumize logical replication message actions

2020-10-30 Thread Ashutosh Bapat
On Fri, 30 Oct 2020 at 14:59, Amit Kapila  wrote:

> On Fri, Oct 30, 2020 at 11:50 AM Amit Kapila 
> wrote:
> >
> > On Fri, Oct 30, 2020 at 10:37 AM Peter Smith 
> wrote:
> > >
> > > IIUC getting rid of the default from the switch can make the missing
> > > enum detection easier because then you can use -Wswitch option to
> > > expose the problem (instead of -Wswitch-enum which may give lots of
> > > false positives as well)
> > >
> >
> > Fair enough. So, it makes sense to move the default out of the switch
> case.
> >
>
> One more thing I was thinking about this patch was whether it has any
> impact w.r.t to Endianness as we are using four-bytes to represent
> one-byte and it seems there is no issue with that because pq_sendbyte
> accepts just one-byte and sends that over the network. So, we could
> see a problem only if we use any enum value which is more than
> one-byte which we are anyway adding a warning message along with the
> definition of enum. So, we are safe here. Does that make sense?
>
>
yes. Endian-ness should be handled by the compiler transparently.
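
To illustrate the -Wswitch point discussed above: with the value cast to
the enum and no default label, a stripped-down dispatch looks like this
(a sketch of the shape only, not the exact code in the patch):

	LogicalRepMsgType action = (LogicalRepMsgType) pq_getmsgbyte(s);

	switch (action)
	{
		case LOGICAL_REP_MSG_BEGIN:
			apply_handle_begin(s);
			return;

		case LOGICAL_REP_MSG_COMMIT:
			apply_handle_commit(s);
			return;

		/* ... one case per message type, and no default ... */
	}

	/*
	 * Reached only for a byte that matches no enum value; keeping the
	 * error outside the switch lets -Wswitch flag unhandled members.
	 */
	ereport(ERROR,
			(errcode(ERRCODE_PROTOCOL_VIOLATION),
			 errmsg("invalid logical replication message type \"%c\"", action)));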

-- 
Best Wishes,
Ashutosh


Re: Enumize logical replication message actions

2020-10-30 Thread Ashutosh Bapat
On Fri, 30 Oct 2020 at 09:16, Amit Kapila  wrote

>
> 1.
> + LOGICAL_REP_MSG_STREAM_ABORT = 'A',
> +} LogicalRepMsgType;
>
> There is no need for a comma after the last message.
>
> Done. Thanks for noticing it.


> 2.
> +/*
> + * Logical message types
> + *
> + * Used by logical replication wire protocol.
> + *
> + * Note: though this is an enum it should fit a single byte and should be
> a
> + * printable character.
> + */
> +typedef enum
> +{
>
> I think we can expand the comments to probably say why we need these
> to fit in a single byte or what problem it can cause if that rule is
> disobeyed. This is to make the next person clear why we are imposing
> such a rule.
>

Done. Please check.


>
> 3.
> +typedef enum
> +{
> ..
> +} LogicalRepMsgType;
>
> There are places in code where we use the enum name
> (LogicalRepMsgType) both in the start and end. See TypeCat,
> CoercionMethod, CoercionCodes, etc. I see places where we use the way
> you have in the code. I would prefer the way we have used at places
> like TypeCat as that makes it easier to read.
>

Not my favourite style since changing the type name requires changing enum
name to keep those consistent. But anyway done.


>
> 4.
>   switch (action)
>   {
> - /* BEGIN */
> - case 'B':
> + case LOGICAL_REP_MSG_BEGIN:
>   apply_handle_begin(s);
> - break;
> - /* COMMIT */
> - case 'C':
> + return;
>
> I think we can simply use 'return apply_handle_begin;' instead of
> adding return in another line. Again, I think we changed this handling
> in apply_dispatch() to improve the case where we can detect at the
> compile time any missing enum but at this stage it is not clear to me
> if that is true.
>

I don't see much value in writing it like "return apply_handle_begin()";
it gives the impression that apply_handle_begin() and apply_dispatch()
return something, which they do not. I would prefer the return on a
separate line unless there's something more than a style improvement.

I have added rationale behind Enum in the commit message as you suggested
in one of the later mails.

PFA patch addressing your comments.
-- 
Best Wishes,
Ashutosh
From f71a600d3fd49756926deec1b593472a9fd8a8cc Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Fri, 16 Oct 2020 12:31:35 +0530
Subject: [PATCH] Enumize top level logical replication actions

The logical replication protocol uses a single-byte character to identify a
message type on the wire. The code uses string
literals for the same. Enumize those so that

1. All the string literals used can be found at a single place. This
makes it easy to add more types without the risk of conflicts.

2. It's easy to locate the code handling a given type.

3. When used with switch statements, it is easy to identify the missing
cases using -Wswitch.

Ashutosh Bapat
---
 src/backend/replication/logical/proto.c  | 26 +++
 src/backend/replication/logical/worker.c | 87 
 src/include/replication/logicalproto.h   | 27 
 3 files changed, 83 insertions(+), 57 deletions(-)

diff --git a/src/backend/replication/logical/proto.c b/src/backend/replication/logical/proto.c
index eb19142b48..fdb31182d7 100644
--- a/src/backend/replication/logical/proto.c
+++ b/src/backend/replication/logical/proto.c
@@ -44,7 +44,7 @@ static const char *logicalrep_read_namespace(StringInfo in);
 void
 logicalrep_write_begin(StringInfo out, ReorderBufferTXN *txn)
 {
-	pq_sendbyte(out, 'B');		/* BEGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_BEGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, txn->final_lsn);
@@ -76,7 +76,7 @@ logicalrep_write_commit(StringInfo out, ReorderBufferTXN *txn,
 {
 	uint8		flags = 0;
 
-	pq_sendbyte(out, 'C');		/* sending COMMIT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_COMMIT);
 
 	/* send the flags field (unused for now) */
 	pq_sendbyte(out, flags);
@@ -112,7 +112,7 @@ void
 logicalrep_write_origin(StringInfo out, const char *origin,
 		XLogRecPtr origin_lsn)
 {
-	pq_sendbyte(out, 'O');		/* ORIGIN */
+	pq_sendbyte(out, LOGICAL_REP_MSG_ORIGIN);
 
 	/* fixed fields */
 	pq_sendint64(out, origin_lsn);
@@ -141,7 +141,7 @@ void
 logicalrep_write_insert(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'I');		/* action INSERT */
+	pq_sendbyte(out, LOGICAL_REP_MSG_INSERT);
 
 	/* transaction ID (if not valid, we're not streaming) */
 	if (TransactionIdIsValid(xid))
@@ -185,7 +185,7 @@ void
 logicalrep_write_update(StringInfo out, TransactionId xid, Relation rel,
 		HeapTuple oldtuple, HeapTuple newtuple, bool binary)
 {
-	pq_sendbyte(out, 'U');		/* action UPDATE */
+	pq_sendbyte(out, LOGICAL_REP_MSG_UPDATE);
 
 	Assert(rel->rd_rel->relreplident == REPLICA_IDENTITY_DEFAULT ||
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
@@ -263,7 +263,7 @@ logicalrep_write_delete(StringInfo out, TransactionId xid, Relation rel,
 		   rel->rd_rel->relreplident == REPLICA_IDENTITY_FULL ||
 		   rel->rd_rel->relreplident == 

Re: problem with RETURNING and update row movement

2020-10-30 Thread Amit Langote
On Wed, Sep 30, 2020 at 12:34 PM Amit Langote  wrote:
> On Thu, Sep 24, 2020 at 2:26 PM Amit Langote  wrote:
> > I saw the CF-bot failure too yesterday, although it seems that it's
> > because the bot is trying to apply the patch version meant for v11
> > branch onto HEAD branch.  I just checked that the patches apply
> > cleanly to their respective branches.
>
> I checked that the last statement is still true, so I've switched the
> status back to Needs Review.

And the patch for HEAD no longer applies, because of a recent
refactoring commit that hit update row movement code.

Rebased patch for HEAD attached.  Patch for PG11 should apply as-is.

Here is a summary of where we stand on this, because another issue
related to using RETURNING with partitioned tables [1] kind of ties
into this.

The problem reported on this thread is solved by ExecUpdate() itself
evaluating the RETURNING list using the source partition's
ri_projectReturning, instead of ExecInsert() doing it using the
destination partition's ri_projectReturning.  It must work that way,
because the tuple descriptor of 'planSlot', which provides the values
of the columns mentioned in RETURNING of tables other than the target
table, is based on the source partition.  Note that the tuple inserted
into the destination partition needs to be converted back into the
source partition's format if their tuple descriptors differ, so that
RETURNING correctly returns the new values of the updated tuple.

The point we are stuck on is whether this fix will create problems for
the case when the RETURNING list contains system columns.  Now that we
will be projecting the tuple inserted into the destination using its
copy in the source partition's format and we have no way of preserving
the system information in the tuple across conversions, we need to
find a way to make RETURNING project the correct system information.
Fujita-san has suggested (or agreed with a suggestion made at [1])
that we should simply prevent system information from being projected
with RETURNING when the command is being performed on a partitioned
table, in which case this becomes a non-issue.  If that is not what we
decide to do on the other thread, then at least we will have to figure
out over there how we will support RETURNING system columns when row
movement occurs during an update on partitioned tables.  The question
I guess is whether that thread must conclude before the fix here can
be committed.

--
Amit Langote
EDB: http://www.enterprisedb.com

[1] https://www.postgresql.org/message-id/flat/141051591267657%40mail.yandex.ru


v6-0001-Fix-a-bug-with-RETURNING-when-UPDATE-moves-tuple-HEAD.patch
Description: Binary data


Re: Enumize logical replication message actions

2020-10-30 Thread Amit Kapila
On Fri, Oct 30, 2020 at 11:50 AM Amit Kapila  wrote:
>
> On Fri, Oct 30, 2020 at 10:37 AM Peter Smith  wrote:
> >
> > IIUC getting rid of the default from the switch can make the missing
> > enum detection easier because then you can use -Wswitch option to
> > expose the problem (instead of -Wswitch-enum which may give lots of
> > false positives as well)
> >
>
> Fair enough. So, it makes sense to move the default out of the switch case.
>

One more thing I was wondering about this patch is whether it has any
impact w.r.t. endianness, as we are using four bytes to represent a
one-byte value. It seems there is no issue with that, because pq_sendbyte
accepts just one byte and sends that over the network. So, we could see a
problem only if we used an enum value wider than one byte, against which
we are anyway adding a warning message along with the definition of the
enum. So, we are safe here. Does that make sense?
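
To illustrate with a self-contained sketch (the names DemoMsgType,
DEMO_MSG_BEGIN and demo_byte.c are made up for illustration, not code from
the patch; it only assumes the byte is handed to pq_sendbyte as a value),
narrowing an int-sized enum value to one byte is a value conversion and
therefore independent of byte order:

/* demo_byte.c -- compile with: gcc -Wall demo_byte.c && ./a.out */
#include <stdint.h>
#include <stdio.h>

typedef enum
{
	DEMO_MSG_BEGIN = 'B'		/* enum values are int-sized in C */
} DemoMsgType;

int
main(void)
{
	DemoMsgType action = DEMO_MSG_BEGIN;
	uint8_t		wire_byte = (uint8_t) action;	/* value conversion, not a
												 * reinterpretation of memory */

	/* Prints 0x42 ('B') on both little- and big-endian machines. */
	printf("0x%02x\n", (unsigned) wire_byte);
	return 0;
}

Copying the first byte of the enum's in-memory representation would, by
contrast, be endian-dependent, but that is not what happens when the value
is passed as a one-byte argument.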

-- 
With Regards,
Amit Kapila.




Re: [HACKERS] logical decoding of two-phase transactions

2020-10-30 Thread Peter Smith
On Wed, Oct 21, 2020 at 7:37 PM Amit Kapila  wrote:
> > Comment: I dont think a tablesync worker will use streaming, none of
> > the other stream APIs check this, this might not be relevant for
> > stream_prepare either.
> >
>
> Yes, I think this is right. See pgoutput_startup where we are
> disabling the streaming for init phase. But it is always good to once
> test this and ensure the same.

I have tested this scenario and confirmed that even when the
subscriber is capable of streaming it does NOT do any streaming during
its tablesync phase.

Kind Regards
Peter Smith.
Fujitsu Australia




Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-30 Thread Kyotaro Horiguchi
At Fri, 30 Oct 2020 16:33:01 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Fri, 30 Oct 2020 14:38:30 +0900, Amit Langote  
> wrote in 
> I'm not sure how we should construct our won mapping, but the
> difference made by we simply moved to JIS0208.TXT based as Ishii-san
> suggested the differences in the mapping would be as the follows.

Mmm..

I'm not sure how we should construct our own mapping, but if we simply
moved to a JIS0208.TXT-based mapping as Ishii-san suggested, the
following differences would be seen in the mappings.

> 1. The following codes (regions) are not defined in JIS0208.
> 
>  8ea1 - 8edf  (up to 64 characters (I didn't actually counted them.))
>  ada1 - adfc  (up to 92 characters (ditto))
>  8ff3f3 - 8ff4a8  (up to 182 characters (ditto))

  8ea1 - 8edf  (64 chars, U+ff61 - U+ff9f) (hankaku-kana)
  ada1 - adfc  (83 chars, U+2460 - U+33a1) (circled numbers)
  8ff3f3 - 8ff4a8  (20 chars, U+2160 - U+2179) (roman numerals)

>  a1c0  ff3c: (ff3c: FULLWIDTH REVERSE SOLIDUS)
>8ff4aa  ff07: (ff07: FULLWIDTH APOSTROPHE)
> 
> 2. some individual differences
> 
>EUC  0208  932
>a1c1 301c ff5e: (301c:WAVE DASH)
>a1c2 2016 2225: (2016:DOUBLE_VERTICAL LINE) : (2225:PARALLEL TO)
> *  a1dd 2212 ff0d: (2212: MINUS_SIGN) : (ff0d: FULLWIDTH HYPHEN-MINUS)
>d1f1   a2 ffe0: (00a2: CENT SIGN) :  (ffe0: FULLWIDTH CENT SIGN)
>d1f2   a3 ffe1: (00a3: PUND SIGN) :  (ffe1: FULLWIDTH POUND SIGN)
>a2cc   ac ffe2: (00ac: NOT SIGN)  :  (ffe2: FULLWIDTH NOT SIGN)
> 
> 
> *1: https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0208.TXT
> 
> > > > Please note that the byte sequence (81-7c) in SJIS represents MINUS
> > > > SIGN in SJIS which means the MINUS SIGN in UTF8 got converted to the
> > > > MINUS SIGN in SJIS and that is what we expect. Isn't it?
> > >
> > > I think we don't change authoritative mappings, but maybe can add some
> > > one-way conversions for the convenience.
> > 
> > Maybe UCS_TO_EUC_JP.pl could do something like the above.
> > 
> > Are there other cases that were fixed like this in the past, either
> > for euc_jp or sjis?
> 
> Honestly, I don't know how the mapping was decided in 2002, but
> removing the regions in 1 would cause confusion.  So what we can do in
> this area would be chaning some of 2 to 0208 mapping.  But arbitrary
> mixture of different mapings would cause new problem..

I forgot about adding one-way mappings.  I think we can add several
such mappings, say:

 U+301c->:   EUC:a1c1 <-> U+ff5e
 U+2016->:   EUC:a1c2 <-> U+2225
 U+2212->:   EUC:a1dd <-> U+ff0d
 U+00a2->:   EUC:d1f1 <-> U+ffe0
 U+00a3->:   EUC:d1f2 <-> U+ffe1
 U+00ac->:   EUC:a2cc <-> U+ffe2

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8

2020-10-30 Thread Kyotaro Horiguchi
At Fri, 30 Oct 2020 14:38:30 +0900, Amit Langote  
wrote in 
> On Fri, Oct 30, 2020 at 12:20 PM Kyotaro Horiguchi
>  wrote:
> > So ping-pong between Unicode and SJIS behaves like this:
> >
> > U+2212 => 0x817c@sjis => U+ff0d => 0x817c@sjis ...
> 
> Is it the following piece of code in UCS_TO_SJIS.pl that manually adds
> the mapping?

Yes.

> # Add these UTF8->SJIS pairs to the table.
> push @$mapping,
> ...
> {
> direction => FROM_UNICODE,
> ucs   => 0x2212,
> code  => 0x817c,
> comment   => '# MINUS SIGN',
> f => $this_script,
> l => __LINE__
> },
> 
> Given that U+2212 is encoded by e28892 in utf8, I assume that's how
> utf8_to_sjis.map ends up with the following mapping into sjis for that
> byte sequence:
> 
>   /*** Three byte table, leaf: e288xx - offset 0x004ee ***/
> 
>   /* 80 */  0x81cd, 0x, 0x81dd, 0x81ce, 0x, 0x, 0x, 0x81de,
>   /* 88 */  0x81b8, 0x, 0x, 0x81b9, 0x, 0x, 0x, 0x,
>   /* 90 */  0x, 0x8794, "0x817c", ...

I'm not sure how we should construct our won mapping, but the
difference made by we simply moved to JIS0208.TXT based as Ishii-san
suggested the differences in the mapping would be as the follows.

1. The following codes (regions) are not defined in JIS0208.

 8ea1 - 8edf  (up to 64 characters (I didn't actually count them.))
 ada1 - adfc  (up to 92 characters (ditto))
 8ff3f3 - 8ff4a8  (up to 182 characters (ditto))

 a1c0  ff3c: (ff3c: FULLWIDTH REVERSE SOLIDUS)
   8ff4aa  ff07: (ff07: FULLWIDTH APOSTROPHE)

2. some individual differences

   EUC  0208  932
   a1c1 301c ff5e: (301c:WAVE DASH)
   a1c2 2016 2225: (2016:DOUBLE VERTICAL LINE) : (2225:PARALLEL TO)
*  a1dd 2212 ff0d: (2212: MINUS SIGN) : (ff0d: FULLWIDTH HYPHEN-MINUS)
   d1f1   a2 ffe0: (00a2: CENT SIGN) :  (ffe0: FULLWIDTH CENT SIGN)
   d1f2   a3 ffe1: (00a3: POUND SIGN) :  (ffe1: FULLWIDTH POUND SIGN)
   a2cc   ac ffe2: (00ac: NOT SIGN)  :  (ffe2: FULLWIDTH NOT SIGN)


*1: https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/JIS0208.TXT

> > > Please note that the byte sequence (81-7c) in SJIS represents MINUS
> > > SIGN in SJIS which means the MINUS SIGN in UTF8 got converted to the
> > > MINUS SIGN in SJIS and that is what we expect. Isn't it?
> >
> > I think we don't change authoritative mappings, but maybe can add some
> > one-way conversions for the convenience.
> 
> Maybe UCS_TO_EUC_JP.pl could do something like the above.
> 
> Are there other cases that were fixed like this in the past, either
> for euc_jp or sjis?

Honestly, I don't know how the mapping was decided in 2002, but
removing the regions in 1 would cause confusion.  So what we can do in
this area would be changing some of 2 to the 0208 mapping.  But an
arbitrary mixture of different mappings would cause new problems.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Add important info about ANALYZE after create Functional Index

2020-10-30 Thread Michael Paquier
On Thu, Oct 29, 2020 at 10:59:52AM +0900, Michael Paquier wrote:
> REINDEX CONCURRENTLY is by design wanted to provide an experience
> transparent to the user similar to what a plain REINDEX would do, at
> least that's the idea behind it, so..  This qualifies as a bug to me,
> in spirit.

And in spirit, it is possible to address this issue with the attached
patch, which copies the set of stats from the old index to the new one.
For a non-concurrent REINDEX this does not happen, because we keep the
same base relation, whereas a concurrent operation replaces the relation
completely.  We have a RemoveStatistics() in heap.c, but I did not
really see the point of inventing a copy flavor of it for this
particular case.  Perhaps others have an opinion on that?
--
Michael
diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 0974f3e23a..3820a03fb6 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -49,6 +49,7 @@
 #include "catalog/pg_inherits.h"
 #include "catalog/pg_opclass.h"
 #include "catalog/pg_operator.h"
+#include "catalog/pg_statistic.h"
 #include "catalog/pg_tablespace.h"
 #include "catalog/pg_trigger.h"
 #include "catalog/pg_type.h"
@@ -1711,6 +1712,44 @@ index_concurrently_swap(Oid newIndexId, Oid oldIndexId, const char *oldName)
 		}
 	}
 
+	/*
+	 * Copy over any data of pg_statistic from the old index to the new one.
+	 */
+	{
+		HeapTuple   tup;
+		SysScanDesc scan;
+		ScanKeyData key[1];
+		Relation	statrel;
+
+		statrel = table_open(StatisticRelationId, RowExclusiveLock);
+		ScanKeyInit(&key[0],
+	Anum_pg_statistic_starelid,
+	BTEqualStrategyNumber, F_OIDEQ,
+	ObjectIdGetDatum(oldIndexId));
+
+		scan = systable_beginscan(statrel, StatisticRelidAttnumInhIndexId,
+  true, NULL, 1, key);
+
+		while (HeapTupleIsValid((tup = systable_getnext(scan))))
+		{
+			Form_pg_statistic statform;
+
+			/* make a modifiable copy */
+			tup = heap_copytuple(tup);
+			statform = (Form_pg_statistic) GETSTRUCT(tup);
+
+			/* update the copy of the tuple and insert it */
+			statform->starelid = newIndexId;
+			CatalogTupleInsert(statrel, tup);
+
+			heap_freetuple(tup);
+		}
+
+		systable_endscan(scan);
+
+		table_close(statrel, RowExclusiveLock);
+	}
+
 	/* Close relations */
 	table_close(pg_class, RowExclusiveLock);
 	table_close(pg_index, RowExclusiveLock);
diff --git a/src/test/regress/expected/create_index.out b/src/test/regress/expected/create_index.out
index 6ace7662ee..012c1eb067 100644
--- a/src/test/regress/expected/create_index.out
+++ b/src/test/regress/expected/create_index.out
@@ -2551,6 +2551,17 @@ CREATE UNIQUE INDEX concur_exprs_index_pred ON concur_exprs_tab (c1)
 CREATE UNIQUE INDEX concur_exprs_index_pred_2
   ON concur_exprs_tab ((1 / c1))
   WHERE ('-H') >= (c2::TEXT) COLLATE "C";
+ANALYZE concur_exprs_tab;
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
+starelid | count 
+-+---
+ concur_exprs_index_expr | 1
+(1 row)
+
 SELECT pg_get_indexdef('concur_exprs_index_expr'::regclass);
 pg_get_indexdef
 ---
@@ -2608,6 +2619,17 @@ SELECT pg_get_indexdef('concur_exprs_index_pred_2'::regclass);
  CREATE UNIQUE INDEX concur_exprs_index_pred_2 ON public.concur_exprs_tab USING btree (((1 / c1))) WHERE ('-H'::text >= (c2 COLLATE "C"))
 (1 row)
 
+-- Statistics should remain intact.
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
+starelid | count 
+-+---
+ concur_exprs_index_expr | 1
+(1 row)
+
 DROP TABLE concur_exprs_tab;
 -- Temporary tables and on-commit actions, where CONCURRENTLY is ignored.
 -- ON COMMIT PRESERVE ROWS, the default.
diff --git a/src/test/regress/sql/create_index.sql b/src/test/regress/sql/create_index.sql
index 37f7259da9..4e45b18613 100644
--- a/src/test/regress/sql/create_index.sql
+++ b/src/test/regress/sql/create_index.sql
@@ -1079,6 +1079,12 @@ CREATE UNIQUE INDEX concur_exprs_index_pred ON concur_exprs_tab (c1)
 CREATE UNIQUE INDEX concur_exprs_index_pred_2
   ON concur_exprs_tab ((1 / c1))
   WHERE ('-H') >= (c2::TEXT) COLLATE "C";
+ANALYZE concur_exprs_tab;
+SELECT starelid::regclass, count(*) FROM pg_statistic WHERE starelid IN (
+  'concur_exprs_index_expr'::regclass,
+  'concur_exprs_index_pred'::regclass,
+  'concur_exprs_index_pred_2'::regclass)
+  GROUP BY starelid ORDER BY starelid::regclass::text;
 SELECT 

Re: Enumize logical replication message actions

2020-10-30 Thread Amit Kapila
On Fri, Oct 30, 2020 at 10:37 AM Peter Smith  wrote:
>
> On Fri, Oct 30, 2020 at 2:46 PM Amit Kapila  wrote:
>
> Hi Amit
>
> > You mentioned in the beginning that you prefer to use Enum instead of
> > define so that switch cases can detect any remaining items but I have
> > tried adding extra enum value at the end and didn't handle that in
> > switch case but I didn't get any compilation warning or error. Do we
> > need something else to detect that at compile time?
>
> See [1] some GCC compiler options that can expose missing cases like those.
>

Thanks, I am able to see the warnings now.

> e.g. use -Wswitch or -Wswitch-enum
> Detection depends if the switch has a default case or not.
>
> > 4.
> >   switch (action)
> >   {
> > - /* BEGIN */
> > - case 'B':
> > + case LOGICAL_REP_MSG_BEGIN:
> >   apply_handle_begin(s);
> > - break;
> > - /* COMMIT */
> > - case 'C':
> > + return;
> >
> > I think we can simply use 'return apply_handle_begin;' instead of
> > adding return in another line. Again, I think we changed this handling
> > in apply_dispatch() to improve the case where we can detect at the
> > compile time any missing enum but at this stage it is not clear to me
> > if that is true.
>
> IIUC getting rid of the default from the switch can make the missing
> enum detection easier because then you can use -Wswitch option to
> expose the problem (instead of -Wswitch-enum which may give lots of
> false positives as well)
>

Fair enough. So, it makes sense to move the default out of the switch case.
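
To make the compile-time detection concrete, here is a minimal standalone
sketch (the names DemoMsgType, DEMO_MSG_* and dispatch() are made up for
illustration; this is not the patch's code). With a default arm, -Wswitch
stays silent about a missing enumerator; without one, it warns:

/* demo_switch.c -- compile with: gcc -Wswitch -c demo_switch.c */
#include <stdio.h>

typedef enum
{
	DEMO_MSG_BEGIN = 'B',
	DEMO_MSG_COMMIT = 'C',
	DEMO_MSG_INSERT = 'I'		/* newly added message type */
} DemoMsgType;

static void
dispatch(DemoMsgType action)
{
	switch (action)
	{
		case DEMO_MSG_BEGIN:
			puts("begin");
			return;
		case DEMO_MSG_COMMIT:
			puts("commit");
			return;

			/*
			 * No case for DEMO_MSG_INSERT and no default arm, so -Wswitch
			 * reports "enumeration value 'DEMO_MSG_INSERT' not handled in
			 * switch".  A default arm would suppress exactly that warning.
			 */
	}

	/* Unhandled message types fall through to here instead of a default. */
	fprintf(stderr, "unrecognized message type \"%c\"\n", action);
}

The error-reporting code after the switch plays the role the default arm
used to play, so nothing is lost by moving it out.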

Ashutosh, see if we can add in the comments (or maybe the commit message)
why we preferred to use an enum for these messages.

-- 
With Regards,
Amit Kapila.




Re: ModifyTable overheads in generic plans

2020-10-30 Thread Amit Langote
Attached are updated patches based on the recent discussion at:

* Re: partition routing layering in nodeModifyTable.c *
https://www.postgresql.org/message-id/CA%2BHiwqHpmMjenQqNpMHrhg3DRhqqQfby2RCT1HWVwMin3_5vMA%40mail.gmail.com

0001 adjusts how ForeignScanState.resultRelInfo is initialized for use
by direct modify operations.

0002 refactors ResultRelInfo initialization to be done lazily on first use.

I call these v6, because the last version posted on this thread was
v5, even though it went through a couple of iterations on the above
thread. Sorry about the confusion.

-- 
Amit Langote
EDB: http://www.enterprisedb.com


v6-0001-Call-BeginDirectModify-from-ExecInitModifyTable.patch
Description: Binary data


v6-0002-Initialize-result-relation-information-lazily.patch
Description: Binary data


Re: partition routing layering in nodeModifyTable.c

2020-10-30 Thread Amit Langote
On Wed, Oct 28, 2020 at 4:46 PM Amit Langote  wrote:
>
> On Tue, Oct 27, 2020 at 10:23 PM Heikki Linnakangas  wrote:
> > This patch looks reasonable to me at a quick glance. I'm a bit worried
> > or unhappy about the impact on FDWs, though. It doesn't seem nice that
> > the ResultRelInfo is not available in the BeginDirectModify call. It's
> > not too bad, the FDW can call ExecGetResultRelation() if it needs it,
> > but still. Perhaps it would be better to delay calling
> > BeginDirectModify() until the first modification is performed, to avoid
> > any initialization overhead there, like establishing the connection in
> > postgres_fdw.
>
> Ah, calling BeginDirectModify() itself lazily sounds like a good idea;
> see attached updated 0001 to see how that looks.  While updating that
> patch, I realized that the ForeignScan.resultRelation that we
> introduced in 178f2d560d will now be totally useless. :-(

Given that we've closed the CF entry for this thread and given that
there seems to be enough context to these patches, I will move these
patches back to their original thread, that is:

* ModifyTable overheads in generic plans *
https://www.postgresql.org/message-id/flat/CA%2BHiwqE4k1Q2TLmCAvekw%2B8_NXepbnfUOamOeX%3DKpHRDTfSKxA%40mail.gmail.com

That will also make the CF-bot happy.

-- 
Amit Langote
EDB: http://www.enterprisedb.com




Re: Parallel INSERT (INTO ... SELECT ...)

2020-10-30 Thread Amit Kapila
On Fri, Oct 30, 2020 at 6:09 AM Greg Nancarrow  wrote:
>
> On Tue, Oct 27, 2020 at 8:56 PM Amit Kapila  wrote:
> >
> > IIUC, below is code for this workaround:
> >
> > +MaxRelParallelHazardForModify(Oid relid,
> > + CmdType commandType,
> > + max_parallel_hazard_context *context)
> > +{
> > + Relationrel;
> > + TupleDesc tupdesc;
> > + int attnum;
> > +
> > + LOCKMODE lockmode = AccessShareLock;
> > +
> > + /*
> > + * It's possible that this relation is locked for exclusive access
> > + * in another concurrent transaction (e.g. as a result of a
> > + * ALTER TABLE ... operation) until that transaction completes.
> > + * If a share-lock can't be acquired on it now, we have to assume this
> > + * could be the worst-case, so to avoid blocking here until that
> > + * transaction completes, conditionally try to acquire the lock and
> > + * assume and return UNSAFE on failure.
> > + */
> > + if (ConditionalLockRelationOid(relid, lockmode))
> > + {
> > + rel = table_open(relid, NoLock);
> > + }
> > + else
> > + {
> > + context->max_hazard = PROPARALLEL_UNSAFE;
> > + return context->max_hazard;
> > + }
> >
> > Do we need this workaround if we lock just the parent table instead of
> > locking all the tables? Basically, can we safely identify the
> > parallel-safety of partitioned relation if we just have a lock on
> > parent relation?
>
> I believe the workaround is still needed in this case, because the
> workaround was added because of a test in which the parent table was
> exclusively locked in another concurrent transaction (as a result of
> ALTER TABLE ... ATTACH PARTITION ...) so we could not even get a
> ShareLock on the parent table without hanging (and then ending up
> failing the test because of it).
>

Don't you think the test case design is flawed in that case? Even a
simple "select * from tpart;" will hang in the planner while taking a
share lock (the code flow is:
add_other_rels_to_query->expand_inherited_rtentry->expand_partitioned_rtentry)
once you take an exclusive lock on the table in a concurrent session.
Currently we never need to acquire any lock for Inserts in the planner,
but I am not sure we can design a test case based on that assumption, as
we can see it fails in this basic case.


> So at the moment the workaround is needed, even if just trying to lock
> the parent table.
>

I am not convinced; rather, I think that the test case is not well
designed, unless there is some other way (without taking a share lock on
the relation) to determine the parallel-safety of Inserts which neither
of us has thought of. I understand that you don't want to change that
test case as part of this patch, so you are using this workaround.

> I'll do some more testing to determine the secondary issue of whether
> locks on the partition tables are needed, but at the moment I believe
> they are.
>

Fair enough, but let's determine that by some testing and analysis. I
feel we should even add a comment if we need to lock all partition
tables. I see that we are already doing it for SELECT in the
above-mentioned code path, so maybe it is okay to do so for Inserts as
well.

> >One more thing I have noticed is that for scan
> > relations (Select query), we do such checks much later based on
> > RelOptInfo (see set_rel_consider_parallel) which seems to have most of
> > the information required to perform parallel-safety checks but I guess
> > for ModifyTable (aka the Insert table) the equivalent doesn't seem
> > feasible but have you thought of doing at the later stage in planner?
> >
>
> Yes, and in fact I tried putting the checks in a later stage of the
> planner, and it's almost successful, except it then makes setting
> "parallelModeNeeded" very tricky indeed, because that is expected to
> be set based on whether the SQL is safe to run in parallel mode
> (paralleModeOK == true) and whether force_parallel_mode is not off.
> With parallel safety checks delayed to a later stage in the planner,
> it's then not known whether there are certain types of parallel-unsafe
> INSERTs (such as INSERT INTO ... VALUES ... ON CONFLICT DO UPDATE
> ...), because processing for those doesn't reach those later stages of
> the planner where parallelism is being considered.
>

I guess if that is the only case then you can have that check in the
earlier stage of the planner (we should be able to do that as the
information is present in the Query) and the other checks in the later
stage. However, I guess that is not the only case; we also need to
determine the parallel-safety of index expressions, trigger functions if
any, any other CHECK expressions on each attribute, etc.

> So then to avoid
> errors from when parallel-mode is forced on and such unsafe INSERTs
> are run, the only real choice is to only allow parallelModeNeeded to
> be true for SELECT only (not INSERT), and this is kind of cheating and
> also not picking up cases where parallel-safe INSERT is run but
> invokes parallel-mode-unsafe features.
> My conclusion, at least for the moment, is to leave the