Re: [HACKERS] generated columns

2019-01-14 Thread Pavel Stehule
On Tue, Jan 15, 2019 at 8:14 AM, Michael Paquier wrote:

> On Sun, Jan 13, 2019 at 03:31:23PM +0100, Pavel Stehule wrote:
> > On Sun, Jan 13, 2019 at 10:43 AM, Peter Eisentraut <
> > peter.eisentr...@2ndquadrant.com> wrote:
> >> See here:
> >>
> >>
> https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f91...@2ndquadrant.com
> >
> > I understand - it is logical. But it is sad that this feature is not
> > complete. The benefit is not too big compared to functional indexes or
> > views.
> > But it can be a first step.
>
> I wouldn't say that.  Volatility restrictions based on immutable
> functions apply to many similar concepts, like expression pushdowns, to
> make for deterministic results.  The SQL spec takes things on the safe
> side.
>

I would like to have a mechanism for safe replacement of triggers of the type

if TG_TYPE = 'INSERT' THEN
  NEW.inserted := CURRENT_TIMESTAMP;
ELSE IF TG_TYPE = 'UPDATE' THEN
  NEW.updated := CURRENT_TIMESTAMP;
 ..
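
For reference, a runnable version of that sketch might look like the
following; it uses TG_OP, which is what PL/pgSQL actually provides, and the
table and column names are made up:

CREATE FUNCTION set_row_timestamps() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
  IF TG_OP = 'INSERT' THEN
    NEW.inserted := CURRENT_TIMESTAMP;
  ELSIF TG_OP = 'UPDATE' THEN
    NEW.updated := CURRENT_TIMESTAMP;
  END IF;
  RETURN NEW;
END;
$$;

CREATE TRIGGER set_row_timestamps
  BEFORE INSERT OR UPDATE ON some_table
  FOR EACH ROW EXECUTE FUNCTION set_row_timestamps();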

But I understand that the current SQL spec design is safe.

Regards

Pavel





Re: [HACKERS] generated columns

2019-01-14 Thread Michael Paquier
On Sun, Jan 13, 2019 at 03:31:23PM +0100, Pavel Stehule wrote:
> ne 13. 1. 2019 v 10:43 odesílatel Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> napsal:
>> See here:
>>
>> https://www.postgresql.org/message-id/b5c27634-1d44-feba-7494-ce5a31f91...@2ndquadrant.com
> 
> I understand - it is logical. But it is sad that this feature is not
> complete. The benefit is not too big compared to functional indexes or views.
> But it can be a first step.

I wouldn't say that.  Volatility restrictions based on immutable
functions apply to many similar concepts, like expression pushdowns, to
make for deterministic results.  The SQL spec takes things on the safe
side.

>> Ah, the volatility checking needs some improvements.  I'll address that
>> in the next patch version.
>
> ok

The same problem happens for stored and virtual columns.

The latest patch has a small header conflict at the top of
rewriteHandler.c which is simple enough to fix.

It would be nice to add a test with composite types, say something
like:
=# create type double_int as (a int, b int);
CREATE TYPE
=# create table double_tab (a int,
  b double_int GENERATED ALWAYS AS ((a * 2, a * 3)) stored,
  c double_int GENERATED ALWAYS AS ((a * 4, a * 5)) virtual);
CREATE TABLE
=# insert into double_tab values (1), (6);
INSERT 0 2
=# select * from double_tab ;
 a |    b    |    c
---+---------+---------
 1 | (2,3)   | (4,5)
 6 | (12,18) | (24,30)
(2 rows)

Glad to see that typed tables are handled and forbidden, and the
trigger definition looks sane to me.

+-- partitioned table
+CREATE TABLE gtest_parent (f1 date NOT NULL, f2 bigint, f3 bigint
GENERATED ALWAYS AS (f2 * 2)) PARTITION BY RANGE (f1);
+CREATE TABLE gtest_child PARTITION OF gtest_parent FOR VALUES FROM
('2016-07-01') TO ('2016-08-01');
+INSERT INTO gtest_parent (f1, f2) VALUES ('2016-07-15', 1);
In my experience, having tests which handle multiple layers of
partitions is never a bad thing.
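
A sketch of what such a multi-level test could look like, extending the quoted
example above (object names are made up):

CREATE TABLE gtest_parent2 (f1 date NOT NULL, f2 bigint,
  f3 bigint GENERATED ALWAYS AS (f2 * 2))
  PARTITION BY RANGE (f1);
CREATE TABLE gtest_child2 PARTITION OF gtest_parent2
  FOR VALUES FROM ('2016-07-01') TO ('2016-08-01')
  PARTITION BY LIST (f2);
CREATE TABLE gtest_grandchild2 PARTITION OF gtest_child2 FOR VALUES IN (1);
INSERT INTO gtest_parent2 (f1, f2) VALUES ('2016-07-15', 1);
SELECT * FROM gtest_parent2;  -- f3 should come out as 2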

+/*
+ * Changing the type of a column that is used by a
+ * generated column is not allowed by SQL standard.
+ * It might be doable with some thinking and effort.
Just mentioning the part about the SQL standard seems fine to me.

+   boolhas_generated_stored;
+   boolhas_generated_virtual;
 } TupleConstr;
It could have been simpler to use a char as the representation here.

Using NULL as generation expression results in a crash when selecting
the relation created:
=# CREATE TABLE gtest_err_1 (a int PRIMARY KEY, b int GENERATED ALWAYS
AS (NULL));
CREATE TABLE
=# select * from gtest_err_1;
server closed the connection unexpectedly

=# CREATE TABLE gtest_err_2 (a int PRIMARY KEY, b int NOT NULL
GENERATED ALWAYS AS (NULL));
CREATE TABLE
A NOT NULL column can use NULL as generated result :)

+   The view column_column_usage identifies all
generated
"column_column_usage" is redundant.  Could it be possible to come up
with a better name?

When testing a bulk INSERT into a table which has a stored generated
column, memory keeps growing in size linearly, which does not seem
normal to me.  If inserting more tuples than what I tested (I stopped
at 10M because of lack of time), it seems to me that this could result
in OOMs.  I would have expected the memory usage to be steady.
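
Roughly how that can be reproduced, as a sketch (the table is made up; only
the 10M-row scale matches what was actually tested):

CREATE TABLE gtest_bulk (a int, b int GENERATED ALWAYS AS (a * 2) STORED);
INSERT INTO gtest_bulk (a) SELECT g FROM generate_series(1, 10000000) g;
-- watch the backend's memory usage while the INSERT runs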

+/* ignore virtual generated columns; they are always null here */
+if (att->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)
+continue;
Looking for an assertion or a sanity check of some kind?

+   if (pstate->p_expr_kind == EXPR_KIND_GENERATED_COLUMN &&
+   attnum < InvalidAttrNumber && attnum != TableOidAttributeNumber)
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
+errmsg("cannot use system column \"%s\" in column 
generation expression",
+   colname),
+parser_errposition(pstate, location)));
There is a test using xmin; you may want one with tableoid.
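
Going by the check quoted above, tableoid is exempted from that error, so a
test could look something like this (made-up names; whether the generated
value actually comes out right is what the test would verify):

CREATE TABLE gtest_tableoid (a int,
  b oid GENERATED ALWAYS AS (tableoid));
INSERT INTO gtest_tableoid VALUES (1);
SELECT b = 'gtest_tableoid'::regclass AS ok FROM gtest_tableoid;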

+/*
+ * Thin wrapper around libpq to obtain server version.
+ */
+static int
+libpqrcv_server_version(WalReceiverConn *conn)
This should be introduced in a separate patch in my opinion (needed
afterwards for logical replication).

I have not gone through the PL part of the changes yet, except
plpgsql.

What about the catalog representation of attgenerated?  Would it merge
with attidentity & co?  Or not?
--
Michael




Re: Pluggable Storage - Andres's take

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-15 18:02:38 +1100, Haribabu Kommi wrote:
> On Tue, Dec 11, 2018 at 1:13 PM Andres Freund  wrote:
> 
> > Hi,
> >
> > On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
> > Further tasks I'm not yet planning to tackle, that I'd welcome help on:
> > - pg_upgrade testing
> >
> 
> I did the pg_upgrade testing from an older version with some tables and views
> present, and all of them were properly transferred into the new server with heap
> as the default access method.
> 
> I will add Dmitry's pg_dump patch and test pg_upgrade to confirm that
> the proper access method is retained in the upgraded database.
> 
> 
> 
> > - I think we should consider removing HeapTuple->t_tableOid, it should
> >   imo live entirely in the slot
> >
> 
> I removed t_tableOid from HeapTuple, and during testing I found some
> problems with triggers; I will post the patch once they are fixed.


Please note that I'm working on a heavily revised version of the patch
right now, trying to clean up a lot of things (you might have seen some
of the threads I started). I hope to post it ~Thursday.  Local-ish
patches shouldn't be a problem though.

Greetings,

Andres Freund



Re: Pluggable Storage - Andres's take

2019-01-14 Thread Haribabu Kommi
On Tue, Dec 11, 2018 at 1:13 PM Andres Freund  wrote:

> Hi,
>
> On 2018-11-26 17:55:57 -0800, Andres Freund wrote:
> Further tasks I'm not yet planning to tackle, that I'd welcome help on:
> - pg_upgrade testing
>

I did the pg_upgrade testing from an older version with some tables and views
present, and all of them were properly transferred into the new server with heap
as the default access method.

I will add Dmitry's pg_dump patch and test pg_upgrade to confirm that
the proper access method is retained in the upgraded database.



> - I think we should consider removing HeapTuple->t_tableOid, it should
>   imo live entirely in the slot
>

I removed t_tableOid from HeapTuple, and during testing I found some
problems with triggers; I will post the patch once they are fixed.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: [HACKERS] Block level parallel vacuum

2019-01-14 Thread Masahiko Sawada
On Fri, Dec 28, 2018 at 11:43 AM Masahiko Sawada  wrote:
>
> On Thu, Dec 20, 2018 at 3:38 PM Amit Kapila  wrote:
> >
> > On Tue, Dec 18, 2018 at 1:29 PM Masahiko Sawada  
> > wrote:
> > >
> > > Attached are the updated patches. I scaled back the scope of this patch.
> > > The patch now includes only feature (a), that is, it executes both index
> > > vacuum and index cleanup in parallel. It also doesn't include
> > > autovacuum support for now.
> > >
> > > The PARALLEL option works almost the same as in the previous patch. In the
> > > VACUUM command, we can specify the 'PARALLEL n' option, where n is the
> > > number of parallel workers to request. If n is omitted, the number of
> > > parallel workers would be the number of indexes - 1.
> > >
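
For reference, the option described in the quoted text would be used roughly
like this with the patch applied (the table name is made up):

VACUUM (PARALLEL 2) some_table;  -- request two parallel workers
VACUUM (PARALLEL) some_table;    -- n omitted: defaults to number of indexes - 1
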
> >
> > I think for now this is okay, but I guess in future when we make
> > heap scans also parallel or maybe allow more than one worker per index
> > vacuum, then this won't hold good. So, I am not sure if the below text in
> > the docs is most appropriate.
> >
> > +PARALLEL N
> > +
> > + 
> > +  Execute index vacuum and cleanup index in parallel with
> > +  N background
> > workers. If the parallel
> > +  degree N is omitted,
> > +  VACUUM requests the number of indexes - 1
> > processes, which is the
> > +  maximum number of parallel vacuum workers since individual
> > indexes is processed by
> > +  one process. The actual number of parallel vacuum workers may
> > be less due to the
> > +  setting of .
> > +  This option can not use with  FULL option.
> >
> > It might be better to use some generic language in docs, something
> > like "If the parallel degree N is omitted, then vacuum decides the
> > number of workers based on number of indexes on the relation which is
> > further limited by max-parallel-workers-maintenance".
>
> Thank you for the review.
>
> I agree with your concern and the text you suggested.
>
> >  I think you
> > also need to mention in some way that you consider storage option
> > parallel_workers.
>
> Added.
>
> >
> > Few assorted comments:
> > 1.
> > +lazy_begin_parallel_vacuum_index(LVState *lvstate, bool for_cleanup)
> > {
> > ..
> > +
> > + LaunchParallelWorkers(lvstate->pcxt);
> > +
> > + /*
> > + * if no workers launched, we vacuum all indexes by the leader process
> > + * alone. Since there is hope that we can launch workers in the next
> > + * execution time we don't want to end the parallel mode yet.
> > + */
> > + if (lvstate->pcxt->nworkers_launched == 0)
> > + return;
> >
> > It is quite possible that the workers are not launched because we fail
> > to allocate memory, basically when pcxt->nworkers is zero.  I think in
> > such cases there is no use for being in parallel mode.  You can even
> > detect that before calling lazy_begin_parallel_vacuum_index.
>
> Agreed. We can stop preparation and exit parallel mode if
> pcxt->nworkers is 0 after InitializeParallelDSM().
>
> >
> > 2.
> > static void
> > +lazy_vacuum_all_indexes_for_leader(LVState *lvstate,
> > IndexBulkDeleteResult **stats,
> > +LVTidMap *dead_tuples, bool do_parallel,
> > +bool for_cleanup)
> > {
> > ..
> > + if (do_parallel)
> > + lazy_begin_parallel_vacuum_index(lvstate, for_cleanup);
> > +
> > + for (;;)
> > + {
> > + IndexBulkDeleteResult *r = NULL;
> > +
> > + /*
> > + * Get the next index number to vacuum and set index statistics. In 
> > parallel
> > + * lazy vacuum, index bulk-deletion results are stored in the shared memory
> > + * segment. If it's already updated we use it rather than setting it to 
> > NULL.
> > + * In single vacuum, we can always use an element of the 'stats'.
> > + */
> > + if (do_parallel)
> > + {
> > + idx = pg_atomic_fetch_add_u32(&(lvshared->nprocessed), 1);
> > +
> > + if (lvshared->indstats[idx].updated)
> > + r = &(lvshared->indstats[idx].stats);
> > + }
> >
> > It is quite possible that we are not able to launch any workers in
> > lazy_begin_parallel_vacuum_index, in such cases, we should not use
> > parallel mode path, basically as written we can't rely on
> > 'do_parallel' flag.
> >
>
> Fixed.
>
> Attached new version patch.
>

Rebased.


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


v11-0001-Add-parallel-option-to-VACUUM-command.patch
Description: Binary data


v11-0002-Add-P-option-to-vacuumdb-command.patch
Description: Binary data


Re: Pluggable Storage - Andres's take

2019-01-14 Thread Dmitry Dolgov
> On Mon, Jan 14, 2019 at 2:07 PM Amit Khandekar  wrote:
>
> createPQExpBuffer() should be moved after the below statement, so that
> it does not leak memory

Thanks for noticing, fixed.

> So how about bumping up the archive version and doing these checks ?

Yeah, you're right, I've added this check.


pg_dump_access_method_v3.patch
Description: Binary data


Re: using expression syntax for partition bounds

2019-01-14 Thread Amit Langote
Thanks for the review and sorry it took me a while to reply.

On 2019/01/02 22:58, Peter Eisentraut wrote:
> On 26/11/2018 05:58, Amit Langote wrote:
>> On 2018/11/09 14:38, Amit Langote wrote:
>>> Rebased due to change in addRangeTableEntryForRelation's API.
>>
>> Rebased again due to changes in psql's describe output for partitioned 
>> tables.
> 
> Review:
> 
> Is "partition bound" the right term?  For list partitioning, it's not
> really a bound.  Maybe it's OK.

Hmm, maybe "partition bound value"?  Just want to stress that the
expression specifies the "bounding" value of a partition.

> Keep the ordering of EXPR_KIND_PARTITION_BOUND in the various switch
> statements consistent.

OK, fixed.

> I don't see any treatment of non-immutable expressions.  There is a test
> case with a non-immutable cast that was removed, but that's all.  There
> was considerable discussion earlier in the thread on whether
> non-immutable expressions should be allowed.  I'm not sure what the
> resolution was, but in any case there should be documentation and tests
> of the outcome.

The partition bound expression is evaluated only once, that is, when the
partition creation command is executed, and what gets stored in the
catalog is a Const expression that contains the value that was computed,
not the original non-immutable expression.  So, we don't need to do
anything special in terms of code to handle users possibly specifying a
non-immutable expression as partition bound.  Tom said something in that
thread which seems to support such a position:

https://www.postgresql.org/message-id/22534.1523374457%40sss.pgh.pa.us

Part of the test case that was removed is the error that used to be emitted:

 CREATE TABLE moneyp (
 a money
 ) PARTITION BY LIST (a);
 CREATE TABLE moneyp_10 PARTITION OF moneyp FOR VALUES IN (10);
-ERROR:  specified value cannot be cast to type money for column "a"
-LINE 1: ...EATE TABLE moneyp_10 PARTITION OF moneyp FOR VALUES IN (10);
-   ^
-DETAIL:  The cast requires a non-immutable conversion.
-HINT:  Try putting the literal value in single quotes.
-CREATE TABLE moneyp_10 PARTITION OF moneyp FOR VALUES IN ('10');

which is no longer emitted, because the patched
transformPartitionBoundValue evaluates the expression, instead of emitting
an error because the expression hasn't become a Const even after applying
eval_const_expressions().
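
For illustration, with the patch something like the following is now accepted,
with the expressions reduced to constants when the partition is created (table
names are made up):

CREATE TABLE measurement (logdate date) PARTITION BY RANGE (logdate);
CREATE TABLE measurement_y2019 PARTITION OF measurement
  FOR VALUES FROM (date '2019-01-01') TO (date '2019-01-01' + 365);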

Do we need to mention any aspect of how this now works in the user-facing
documentation?

> The collation handling might need another look.  The following is
> allowed without any diagnostic:
> 
> CREATE TABLE t2 (
> a text COLLATE "POSIX"
> ) PARTITION BY RANGE (a);
> CREATE TABLE t2a PARTITION OF t2 FOR VALUES FROM ('foo' COLLATE "C") TO
> ('xyz');
> 
> I think the correct behavior would be that an explicit collation in the
> partition bound expression is an error.

I tend to agree with that.  What happens is the partition bound expression
is assigned the collation of the parent's partition key:

+partcollation = get_partition_col_collation(key, 0);

+value = transformPartitionBoundValue(pstate, expr,
+ colname, coltype, coltypmod,
+ partcollation);

But that's essentially ignoring the collation specified by the user for
the partition bound value without providing any information about that to
the user.  I've fixed that by explicitly checking if the collate clause in
the partition bound value expression contains the same collation as the
parent partition key collation, and giving an error otherwise.

Updated patch attached.

Thanks,
Amit
>From 4c14559e2d8412c9ec4990e078cdc017a78ca69f Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Jul 2018 14:05:22 +0900
Subject: [PATCH v8] Allow generalized expression syntax for partition bounds

Authors: Kyotaro Horiguchi, Tom Lane, Amit Langote
---
 doc/src/sgml/ref/alter_table.sgml  |   6 +-
 doc/src/sgml/ref/create_table.sgml |  16 +--
 src/backend/commands/tablecmds.c   |   9 ++
 src/backend/optimizer/util/clauses.c   |   4 +-
 src/backend/parser/gram.y  |  60 +---
 src/backend/parser/parse_agg.c |  10 ++
 src/backend/parser/parse_expr.c|   5 +
 src/backend/parser/parse_func.c|   3 +
 src/backend/parser/parse_utilcmd.c | 211 +++--
 src/include/optimizer/clauses.h|   2 +
 src/include/parser/parse_node.h|   1 +
 src/include/utils/partcache.h  |   6 +
 src/test/regress/expected/create_table.out |  69 +++---
 src/test/regress/sql/create_table.sql  |  31 -
 14 files changed, 265 insertions(+), 168 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index 0d1feaf828..bd037f2e39 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ 

Re: SPI Interface to Call Procedure with Transaction Control Statements?

2019-01-14 Thread Jack LIU
Hi Andrew,

This is my code to call the procedure with
SPI_connect_ext(SPI_OPT_NONATOMIC).

if (run_proc) {
appendStringInfo(&buf, "CALL \"%s\".run_maintenance_proc(p_analyze := %s, p_jobmon := %s);",
                 partman_schema, analyze, jobmon);
expected_ret = SPI_OK_UTILITY;
function_run = "run_maintenance_proc() procedure";
SPI_finish();
SPI_connect_ext(SPI_OPT_NONATOMIC);
pgstat_report_activity(STATE_RUNNING, buf.data);

ret = SPI_execute(buf.data, false, 0);
if (ret != expected_ret)
elog(FATAL, "Cannot call pg_partman %s: error code %d",
function_run, ret);
}

It gave the same error:

2019-01-14 22:18:56.898 PST [16048] LOG:  pg_partman dynamic background
worker (dbname=postgres) dynamic background worker initialized with role
ubuntu on database postgres
2019-01-14 22:18:56.918 PST [16048] ERROR:  invalid transaction termination
2019-01-14 22:18:56.918 PST [16048] CONTEXT:  PL/pgSQL function
partman.run_maintenance_proc(integer,boolean,boolean,boolean) line 45 at
COMMIT
SQL statement "CALL "partman".run_maintenance_proc(p_analyze := true,
p_jobmon := true);"
2019-01-14 22:18:56.923 PST [26352] LOG:  background worker "pg_partman
dynamic background worker (dbname=postgres)" (PID 16048) exited with exit
code 1

Thanks,

Jack

On Sun, Jan 13, 2019 at 10:21 PM Andrew Gierth 
wrote:

> > "Jack" == Jack LIU  writes:
>
>  Jack> (I tried to use SPI_connect_ext(SPI_OPT_NONATOMIC) to establish a
>  Jack> nonatomic connection but it doesn't help.)
>
> You need to be specific here about how it didn't help, because this is
> exactly what you're supposed to do, and it should at least change what
> error you got.
>
> --
> Andrew (irc:RhodiumToad)
>


Re: speeding up planning with partitions

2019-01-14 Thread David Rowley
On Sat, 12 Jan 2019 at 21:49, David Rowley  wrote:
> > Can you say what you think is wrong with this way of making a copy of the 
> > ECs?
>
> If you shallow copy with memcpy(new_ec, ec,
> sizeof(EquivalenceClass));, then fields such as ec_relids will just
> point to the same memory as the parent PlannerInfo's
> EquivalenceClasses, so when you do your adjustment (as above) on the
> child eclasses, you'll modify memory belonging to the parent. I'd
> assume you'd not want to do that since you need to keep the parent
> intact at least to make copies for other child rels.  You'd have
> gotten away with it before since you performed a list_copy() and only
> were deleting the parent's EMs with list_delete_cell() which was
> modifying the copy, not the original list. Since you were missing the
> alteration to ec_relids, then you might not have seen issues with the
> shallow copy, but now that you are changing that field, are you not
> modifying the ec_relids field that still set in the parent's
> PlannerInfo?  In this instance, you've sidestepped that issue by using
> bms_difference() which creates a copy of the Bitmapset and modifies
> that. I think you'd probably see issues if you tried to use
> bms_del_members().  I think not doing the deep copy is going to give
> other people making changes in this area a hard time in the future.

Setting to waiting on author pending clarification about shallow vs
deep copying of ECs.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-14 Thread David Rowley
On Tue, 15 Jan 2019 at 12:24, James Coleman  wrote:
> While I add that though I wanted to get this updated version out to
> get feedback on the approach.

I had a look at this and there's a couple of things I see:

1. In:

+ if (IsA(clause, ScalarArrayOpExpr))
+ {
+ ScalarArrayOpExpr *saop = (ScalarArrayOpExpr *) clause;
+ Node *subexpr = (Node *) ((NullTest *) predicate)->arg;
+ if (op_strict(saop->opno) &&
+ clause_is_strict_for((Node *) linitial(saop->args), subexpr))
+ return true;
+ }
+
  /* foo IS NOT NULL refutes foo IS NULL */
  if (clause && IsA(clause, NullTest) &&

Your IsA(clause, ScalarArrayOpExpr) condition should also be checking
that "clause" can't be NULL.  The NullTest condition below does this.

2. I was also staring at predicate_implied_by_simple_clause() a bit, at
the use of clause_is_strict_for() to ensure that the IS NOT NULL's
operand matches the ScalarArrayOpExpr's left operand.  Since
clause_is_strict_for() means "Can we prove that "clause" returns NULL if
"subexpr" does?", in this case your clause is the ScalarArrayOpExpr's
left operand and subexpr is the IS NOT NULL's operand.  This means
that a partial index with "WHERE a IS NOT NULL" should also be fine to
use for WHERE strict_func(a) IN (1,2,..., 101), since strict_func(a)
must be NULL if a is NULL. It also works for WHERE a+a
IN (1,2,...,101).  I wonder if it's worth adding a test for that, or
even just modifying one of the existing tests to ensure you get the same
result from it. Perhaps it's worth an additional test to ensure that x
IN (1,2,...,101) does not imply x+x IS NOT NULL, and maybe that x+x IS
NULL does not refute x IN (1,2,...,101), as a strict function is free
to return NULL even if its inputs are not NULL.
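
A sketch of the kind of extra test being suggested (names are made up; the IN
list in a real test would need more than 100 elements so that the
ScalarArrayOpExpr is not just unfolded into an OR list):

CREATE TABLE pred_tab (a int);
CREATE INDEX pred_tab_partial_idx ON pred_tab ((a + a)) WHERE a IS NOT NULL;
-- with the patch, (a + a) IN (...) should be provable to imply a IS NOT NULL,
-- letting the planner consider the partial index:
EXPLAIN SELECT * FROM pred_tab WHERE a + a IN (1, 2, 3 /* ..., 101 */);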

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: current_logfiles not following group access and instead follows log_file_mode permissions

2019-01-14 Thread Michael Paquier
On Tue, Jan 15, 2019 at 03:08:41PM +1100, Haribabu Kommi wrote:
> current_logfiles is a meta data file, that stores the current log writing
> file, and this file presents in the data directory. This file
> doesn't follow the group access mode set at the initdb time, but it
> follows the log_file_mode permissions.
> 
> Without group access permissions, backup with group access can lead to
> failure.  Attached patch fix the problem.

initdb enforces log_file_mode to 0640 when using the group mode, still
if one enforces the parameter value then current_logfiles would just
stick with it.  This is not really user-friendly.  This impacts also
normal log files as these get included in base backups if the log path
is within the data folder (not everybody uses an absolute path out of
the data folder for the logs).

One way to think about this is that we may want to worry also about
normal log files and document that one had better be careful with the
setting of log_file_mode?  Still, as we are talking about a file
aiming at storing meta-data for log files, something like what you
suggest can make sense.

When discussing pg_current_logfile(), I raised the point of not
including the file in base backups, which would also address the
problem reported here.  However, we decided to keep it because it can
be helpful to know the last log file associated with a base backup
for debugging purposes:
https://www.postgresql.org/message-id/50b58f25-ab07-f6bd-7a68-68f29f214...@dalibo.com

Instead of what you are proposing, why not revisit that and just
exclude the file from base backups?  I would be in favor of just doing
that instead of switching the file's permissions from log_file_mode to
pg_file_create_mode.
--
Michael




Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-14 Thread Ashutosh Bapat
I think there's something better possible. Two partitioned relations won't
use partition-wise join if their partition schemes do not match.
Partitioned relations with the same partitioning scheme share a PartitionScheme
pointer. The PartitionScheme structure should get an extra counter, maintaining
a count of the number of partitioned relations sharing that structure. When
this counter is 1, that relation is certainly not going to participate in
PWJ and thus need not have all the structure required by PWJ set up. If we
use this counter coupled with enable_partitionwise_join flag, we can get
rid of consider_partitionwise_join flag altogether, I think.

On Tue, Jan 15, 2019 at 8:12 AM Amit Langote 
wrote:

> Fujita-san,
>
> On 2019/01/11 21:50, Etsuro Fujita wrote:
> >>> (2019/01/10 10:41), Amit Langote wrote:
>  That's a loaded meaning and abusing it to mean something else can be
>  challenged, but we can live with that if properly documented.
>  Speaking of
>  which:
> 
>    /* used by partitionwise joins: */
>    bool    consider_partitionwise_join;    /* consider partitionwise join
>                                             * paths? (if partitioned rel) */
> 
>  Maybe, mention here how it will be abused in back-branches for
>  non-partitioned relations?
> >>>
> >>> Will do.
> >>
> >> Thank you.
> >
> > I know we don't yet reach a consensus on what to do in details to address
> > this issue, but for the above, how about adding comments like this to
> > set_append_rel_size(), instead of the header file:
> >
> > /*
> >  * If we consider partitionwise joins with the parent rel, do the
> > same
> >  * for partitioned child rels.
> >  *
> >  * Note: here we abuse the consider_partitionwise_join flag for
> child
> >  * rels that are not partitioned, to tell
> try_partitionwise_join()
> >  * that their targetlists and EC entries have been generated.
> >  */
> > if (rel->consider_partitionwise_join)
> > childrel->consider_partitionwise_join = true;
> >
> > ISTM that that would be more clearer than the header file.
>
> Thanks for updating the patch.  I tend to agree that it might be better to
> add such details here than in the header as it's better to keep the latter
> more stable.
>
> About the comment you added, I think we could clarify the note further as:
>
> Note: here we abuse the consider_partitionwise_join flag by setting it
> *even* for child rels that are not partitioned.  In that case, we set it
> to tell try_partitionwise_join() that it doesn't need to generate their
> targetlists and EC entries as they have already been generated here, as
> opposed to the dummy child rels for which the flag is left set to false so
> that it will generate them.
>
> Maybe it's a bit wordy, but it helps get the intention across more clearly.
>
> Thanks,
> Amit
>
>

-- 
--
Best Wishes,
Ashutosh Bapat


Re: unique, partitioned index fails to distinguish index key from INCLUDEd columns

2019-01-14 Thread Alvaro Herrera
On 2019-Jan-14, Justin Pryzby wrote:

> On Mon, Jan 14, 2019 at 07:31:07PM -0300, Alvaro Herrera wrote:

> > Doh.  Fix pushed.  Commit 8224de4f42cc should have changed one
> > appearance of ii_NumIndexAttrs to ii_NumIndexKeyAttrs, but because of
> > the nature of concurrent development, nobody noticed.
> 
> I figured as much - I thought to test this while trying to fall asleep,
> without knowing they were developed in parallel.

:-)

> Should backpatch to v11 ?
> 0ad41cf537ea5f076273fcffa4c83a184bd9910f

Yep, already done (src/tools/git_changelog in master):

Author: Alvaro Herrera 
Branch: master [0ad41cf53] 2019-01-14 19:28:10 -0300
Branch: REL_11_STABLE [74aa7e046] 2019-01-14 19:25:19 -0300

Fix unique INCLUDE indexes on partitioned tables

We were considering the INCLUDE columns as part of the key, allowing
unicity-violating rows to be inserted in different partitions.

Concurrent development conflict in eb7ed3f30634 and 8224de4f42cc.

Reported-by: Justin Pryzby
Discussion: https://postgr.es/m/20190109065109.ga4...@telsasoft.com


-- 
Álvaro Herrera    https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



current_logfiles not following group access and instead follows log_file_mode permissions

2019-01-14 Thread Haribabu Kommi
current_logfiles is a metadata file that stores the name of the file the
server is currently logging to, and it lives in the data directory. This file
doesn't follow the group access mode set at initdb time, but instead follows
the log_file_mode permissions.

Without group access permissions, a backup relying on group access can fail.
The attached patch fixes the problem.

comments?

Regards,
Haribabu Kommi
Fujitsu Australia


0001-current_logfiles-file-following-group-access-mode.patch
Description: Binary data


Re: Safely calling index_getprocinfo() while holding an nbtree exclusive buffer lock

2019-01-14 Thread Peter Geoghegan
On Mon, Jan 14, 2019 at 7:12 PM Tom Lane  wrote:
> I think you should stop right there and ask why.  Surely that info
> can be obtained before starting the operation?

*Thinks some more*

Uh, I'm already telling the same _bt_truncate() code path that it is
being called from a CREATE INDEX, allowing it to avoid accessing the
metapage. I now think that it would be perfectly acceptable to just
pass down the insertion scan key for the tuple that caused the split,
instead of that build bool, and handle both deadlock issues
(index_getprocinfo() hazard and metapage hazard) that way instead.
Heikki expressed some concerns about the way the patch accesses the
metapage already.

I jumped the gun with this catalog index business. Clearly I'd be much
better off avoiding *all* new buffer lock protocol stuff by getting
both pieces of information up-front -- for some reason I thought that
that would be harder than it now appears.

Thanks
--
Peter Geoghegan



Re: Tid scan improvements

2019-01-14 Thread Tom Lane
Edmund Horner  writes:
> On Sat, 22 Dec 2018 at 07:10, Tom Lane  wrote:
>> I'm not entirely sure why you're bothering; surely nulltestsel is
>> unrelated to what this patch is about?

> I found that it made a difference with selectivity of range comparisons,
> because clauselist_selectivity tries to correct for it (clausesel.c:274):

Oh, I see.

> I guess we could have a standalone patch to add this for all system columns?

+1

regards, tom lane



Re: Tid scan improvements

2019-01-14 Thread Edmund Horner
On Sat, 22 Dec 2018 at 07:10, Tom Lane  wrote:

> BTW, with respect to this bit in 0001:
>
> @@ -1795,6 +1847,15 @@ nulltestsel(PlannerInfo *root, NullTestType
> nulltesttype, Node *arg,
>  return (Selectivity) 0; /* keep compiler quiet */
>  }
>  }
> +else if (vardata.var && IsA(vardata.var, Var) &&
> + ((Var *) vardata.var)->varattno ==
> SelfItemPointerAttributeNumber)
> +{
> +/*
> + * There are no stats for system columns, but we know CTID is
> never
> + * NULL.
> + */
> +selec = (nulltesttype == IS_NULL) ? 0.0 : 1.0;
> +}
>  else
>  {
>  /*
>
> I'm not entirely sure why you're bothering; surely nulltestsel is
> unrelated to what this patch is about?  And would anybody really
> write "WHERE ctid IS NULL"?
>

I found that it made a difference with selectivity of range comparisons,
because clauselist_selectivity tries to correct for it (clausesel.c:274):

s2 = rqlist->hibound + rqlist->lobound - 1.0

/* Adjust for double-exclusion of NULLs */
s2 += nulltestsel(root, IS_NULL, rqlist->var,
  varRelid, jointype, sjinfo);

It was adding DEFAULT_UNK_SEL = 0.005 to the selectivity, which (while not
major) did make the selectivity less accurate.
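
For context, the estimates in question are for TID range quals of the form
(table name made up):

EXPLAIN SELECT * FROM some_table
  WHERE ctid >= '(0,1)' AND ctid < '(100,1)';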

> However, if we do think it's worth adding code to cover this case,
> I wouldn't make it specific to CTID.  *All* system columns can be
> assumed not null, see heap_getsysattr().
>
I guess we could have a standalone patch to add this for all system columns?


Re: could recovery_target_timeline=latest be the default in standby mode?

2019-01-14 Thread Michael Paquier
On Sun, Jan 13, 2019 at 10:17:32AM +0100, Peter Eisentraut wrote:
> I'm not sure what the coverage is in detail in this area.  It seems we
> already have tests for not-specific-recovery-target, maybe not in this
> file, but most of the other tests rely on that, no?

[... Checking ...]
There are no tests on HEAD using recovery_target_timeline directly in
any .pl or .pm file, which is definitely not a good idea.
004_timeline_switch.pl is actually designed for checking the latest
timeline, which is fine.  002_archiving.pl somewhat tests for the
current timeline, still it is not really designed for this purpose.
--
Michael




Re: Safely calling index_getprocinfo() while holding an nbtree exclusive buffer lock

2019-01-14 Thread Tom Lane
Peter Geoghegan  writes:
> My nbtree patch [1] needs to call index_getprocinfo() with an
> exclusive buffer lock held during a leaf page split.

I think you should stop right there and ask why.  Surely that info
can be obtained before starting the operation?  Quite aside from the
deadlock hazard, I do not think holding an exclusive buffer lock
for long enough to go consult a system catalog will be acceptable
from a performance/concurrency standpoint.

regards, tom lane



Safely calling index_getprocinfo() while holding an nbtree exclusive buffer lock

2019-01-14 Thread Peter Geoghegan
My nbtree patch [1] needs to call index_getprocinfo() with an
exclusive buffer lock held during a leaf page split. This has an
undetectable self-deadlock (LWLock self-deadlock) risk: a syscache
lookup against pg_proc might have a catcache miss, ending with an
index scan that needs to access the very same buffer. That's not
acceptable.

There is very similar code to this in SP-GiST:  there is a
index_getprocinfo() call within doPickSplit(), to get the user-defined
method for a split (SPGIST_PICKSPLIT_PROC). My nbtree patch builds a
new insertion scankey to determine how many attributes we can safely
truncate away in new pivot tuples -- it would be tricky to do this
lookup outside of the split function. I suppose that it's okay to do
this in SP-GiST without special care because there cannot be an
SP-GiST index on a system catalog. I'll need to do something else
about it given that I'm doing this within nbtree, though -- I don't
want to complicate the code that deals with insertion scankeys to make
this work.

Here is a strategy that fixes the problem without complicating matters
for nbtree: It should be safe if I make a point of using a special
comparator (the bitwise one that we already use in other contexts in
the patch) with system catalog indexes. We know that they cannot be of
types that have a varlena header + typstorage != 'p', which ensures
that there are no cases where bitwise equality fails to be a reliable
indicator of opclass equality (e.g. there are no cases like numeric
display scale). We could make sure that this requirement isn't
violated in the future by adding a pg_index test to opr_sanity.sql,
limiting system catalog indexes to opclasses that are known-safe for
the bitwise comparator.

Does that seem sane?

[1] https://commitfest.postgresql.org/21/1787/
-- 
Peter Geoghegan



Re: Libpq support to connect to standby server as priority

2019-01-14 Thread Tom Lane
Michael Paquier  writes:
> On Tue, Jan 15, 2019 at 02:00:57AM +, Tsunakawa, Takayuki wrote:
>> The original desire should have been the ability to connect to a
>> primary or a standby.  So, I think we should go back to the original
>> thinking (and not complicate the feature), and create a read only
>> GUC_REPORT variable, say, server_role, that identifies whether the
>> server is a primary or a standby. 

> From the point of view of making sure that a client is really
> connected to  a primary or a standby, this is the best idea around.

There are a couple of issues here:

1. Are you sure there are no use-cases for testing transaction_read_only
as such?

2. What will the fallback implementation be, when connecting to a server
too old to have the variable you want?

regards, tom lane



Re: Libpq support to connect to standby server as priority

2019-01-14 Thread Michael Paquier
On Tue, Jan 15, 2019 at 02:00:57AM +, Tsunakawa, Takayuki wrote:
> But as some people said, I don't think this is the right way.  I
> suppose what's leading to the current somewhat complicated situation
> is that there was no easy way for the client to know whether the
> server is the master.  That ended up in using "SHOW
> transaction_read_only" instead, and people supported that compromise
> by saying "read only status is more useful than whether the server
> is standby or not," I'm afraid.

Right.  Another pain point is that it is complicated for a client to be
sure that it is connected to a standby, as between the time
transaction_read_only is checked and the time the connection is reported as
ready to be used by the application, you may actually be linked to a
primary which has just recently been promoted.  I am not personally
sure if it is worth caring about that level of detail to get
something useful, but there have been doubts about not making
that absolutely right for applications willing to
use read-only clients.

> The original desire should have been the ability to connect to a
> primary or a standby.  So, I think we should go back to the original
> thinking (and not complicate the feature), and create a read only
> GUC_REPORT variable, say, server_role, that identifies whether the
> server is a primary or a standby. 

From the point of view of making sure that a client is really
connected to  a primary or a standby, this is the best idea around.
--
Michael




Re: strange valgrind failures (again)

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-15 03:41:34 +0100, Tomas Vondra wrote:
> On 1/15/19 3:11 AM, Andres Freund wrote:
> > On 2019-01-15 03:07:10 +0100, Tomas Vondra wrote:
> >> I've started observing funny valgrind failures on Fedora 28, possibly
> >> after upgrading from 3.14.0-1 to 3.14.0-7 a couple of days ago. This
> >> time it does not seem like platform-specific issues, though - the
> >> failures all look like this:
> > 
> > Any chance you're compiling without USE_VALGRIND defined? IIRC these are
> > precisely what the VALGRIND_MAKE_MEM_DEFINED calls in inval.c are
> > intended to fight:
> > /*
> >  * Define padding bytes in SharedInvalidationMessage structs to be
> >  * defined. Otherwise the sinvaladt.c ringbuffer, which is accessed by
> >  * multiple processes, will cause spurious valgrind warnings about
> >  * undefined memory being used. That's because valgrind remembers the
> >  * undefined bytes from the last local process's store, not realizing 
> > that
> >  * another process has written since, filling the previously 
> > uninitialized
> >  * bytes
> >  */
> > VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
> > 
> > 
> 
> ... the bang you might have just heard was me facepalming

Heh ;)


> Anyway, I find it interesting that valgrind notices this particular
> place and not the other places, and that it only starts happening after
> a couple of minutes of running the regression tests (~5 minutes or so).

IIRC you basically need to fill the space for sinvals for this to
matter, and individual backends need to be old enough to have previously
used the same space. So it's not that easy to trigger.   I don't think
we needed many other such tricks to make valgrind work / other things
like this have been solved via valgrind.supp, so it's not that
surprising that you didn't find anything else...

Greetings,

Andres Freund



Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-14 Thread Amit Langote
Fujita-san,

On 2019/01/11 21:50, Etsuro Fujita wrote:
>>> (2019/01/10 10:41), Amit Langote wrote:
 That's a loaded meaning and abusing it to mean something else can be
 challenged, but we can live with that if properly documented. 
 Speaking of
 which:

   /* used by partitionwise joins: */
   bool    consider_partitionwise_join;    /* consider partitionwise join
                                            * paths? (if partitioned rel) */

 Maybe, mention here how it will be abused in back-branches for
 non-partitioned relations?
>>>
>>> Will do.
>>
>> Thank you.
> 
> I know we don't yet reach a consensus on what to do in details to address
> this issue, but for the above, how about adding comments like this to
> set_append_rel_size(), instead of the header file:
> 
>     /*
>  * If we consider partitionwise joins with the parent rel, do the
> same
>  * for partitioned child rels.
>  *
>  * Note: here we abuse the consider_partitionwise_join flag for child
>  * rels that are not partitioned, to tell try_partitionwise_join()
>  * that their targetlists and EC entries have been generated.
>  */
>     if (rel->consider_partitionwise_join)
>     childrel->consider_partitionwise_join = true;
> 
> ISTM that that would be more clearer than the header file.

Thanks for updating the patch.  I tend to agree that it might be better to
add such details here than in the header as it's better to keep the latter
more stable.

About the comment you added, I think we could clarify the note further as:

Note: here we abuse the consider_partitionwise_join flag by setting it
*even* for child rels that are not partitioned.  In that case, we set it
to tell try_partitionwise_join() that it doesn't need to generate their
targetlists and EC entries as they have already been generated here, as
opposed to the dummy child rels for which the flag is left set to false so
that it will generate them.

Maybe it's a bit wordy, but it helps get the intention across more clearly.

Thanks,
Amit




Re: strange valgrind failures (again)

2019-01-14 Thread Tomas Vondra
On 1/15/19 3:11 AM, Andres Freund wrote:
> Hi,
> 
> On 2019-01-15 03:07:10 +0100, Tomas Vondra wrote:
>> I've started observing funny valgrind failures on Fedora 28, possibly
>> after upgrading from 3.14.0-1 to 3.14.0-7 a couple of days ago. This
>> time it does not seem like platform-specific issues, though - the
>> failures all look like this:
> 
> Any chance you're compiling without USE_VALGRIND defined? IIRC these are
> precisely what the VALGRIND_MAKE_MEM_DEFINED calls in inval.c are
> intended to fight:
>   /*
>* Define padding bytes in SharedInvalidationMessage structs to be
>* defined. Otherwise the sinvaladt.c ringbuffer, which is accessed by
>* multiple processes, will cause spurious valgrind warnings about
>* undefined memory being used. That's because valgrind remembers the
>* undefined bytes from the last local process's store, not realizing 
> that
>* another process has written since, filling the previously 
> uninitialized
>* bytes
>*/
>   VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));
> 
> 

... the bang you might have just heard was me facepalming

Yes, I've been compiling without USE_VALGRIND, because I've been
rebuilding using a command from shell history and the command-line grew
a bit too long to notice that.

Anyway, I find it interesting that valgrind notices this particular
place and not the other places, and that it only starts happening after
a couple of minutes of running the regression tests (~5 minutes or so).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Prepare Transaction support for ON COMMIT DROP temporary tables

2019-01-14 Thread Michael Paquier
On Mon, Jan 14, 2019 at 07:41:18PM +0100, Dimitri Fontaine wrote:
> Thanks for the review here Tom, and for linking with the other
> discussion (Álvaro did that too, thanks!). I've been reviewing it
> too.

If you can look at the patch, reviews are welcome.  There are quite a
few patterns I spotted on the way.

> I didn't think about the pg_temp_NN namespaces in my approach, and I
> think it might be possible to make it work, but it's getting quite
> involved now.
>
> One idea would be that if every temp table in the session belongs to the
> transaction, and their namespace too (we could check through pg_depend
> that the namespace doesn't contain anything else beside the
> transaction's tables), then we could dispose of the temp schema and
> on-commit-drop tables at PREPARE COMMIT time.

Hm.  A strong assumption that we rely on in the code is that the
temporary namespace drop only happens when the session ends, so you
would need to complicate the logic so that the namespace is created in a
given transaction, which is something that can be done (at least
that's what my patch on the other thread adds control for), and that
no other objects than ON COMMIT tables are created, which is more
tricky to track (still things would get weird with a LOCK on ON COMMIT
DROP tables?).  The root of the problem is that the objects' previous
versions would still be around between the PREPARE TRANSACTION and
COMMIT PREPARED, and that both queries can be perfectly run
transparently across multiple sessions. 

Back in the time, one thing that we did in Postgres-XC was to enforce
2PC to not be used and use a direct commit instead of failing, which
was utterly wrong.  Postgres-XL may be reusing some of that :(

> Yeah. The goal of my approach is to transparently get back temp table
> support in 2PC when that makes sense, which is most use cases I've been
> confronted to. We use 2PC in Citus, and it would be nice to be able to
> use transaction local temp tables on worker nodes when doing data
> injestion and roll-ups.

You have not considered the case of inherited and partitioned tables
mixing ON COMMIT actions of different types as well.  For inherited
tables this does not matter much I think, perhaps for partitions it
does (see tests in 52ea6a8, which you would need to mix with 2PC).
--
Michael




Re: unique, partitioned index fails to distinguish index key from INCLUDEd columns

2019-01-14 Thread Justin Pryzby
On Mon, Jan 14, 2019 at 07:31:07PM -0300, Alvaro Herrera wrote:
> On 2019-Jan-09, Justin Pryzby wrote:
> 
> > -- Fails to error
> > postgres=# CREATE UNIQUE INDEX ON t(j) INCLUDE(i);
> > 
> > -- Fail to enforce uniqueness across partitions due to failure to enforce 
> > inclusion of partition key in index KEY
> > postgres=# INSERT INTO t VALUES(1,1);
> > postgres=# INSERT INTO t VALUES(2,1); 
> 
> Doh.  Fix pushed.  Commit 8224de4f42cc should have changed one
> appearance of ii_NumIndexAttrs to ii_NumIndexKeyAttrs, but because of
> the nature of concurrent development, nobody noticed.

I figured as much - I thought to test this while trying to fall asleep,
without knowing they were developed in parallel.

Should backpatch to v11 ?
0ad41cf537ea5f076273fcffa4c83a184bd9910f

Thanks,
Justin



Re: New vacuum option to do only freezing

2019-01-14 Thread Masahiko Sawada
On Sat, Jan 12, 2019 at 3:27 AM Robert Haas  wrote:
>
> On Fri, Jan 11, 2019 at 1:14 AM Masahiko Sawada  wrote:
> > Attached the updated patch. Please review it.
>
> I'm quite confused by this patch.  It seems to me that the easiest way
> to implement this patch would be to (1) make lazy_space_alloc take the
> maxtuples = MaxHeapTuplesPerPage branch when the new option is
> specified, and then (2) forget about them after each page i.e.
>
> if (nindexes == 0 &&
> vacrelstats->num_dead_tuples > 0)
> {
> ...
> }
> else if (skipping index cleanup)
> vacrelstats->num_dead_tuples = 0;
>
> I don't see why it should touch the logic inside lazy_vacuum_page() or
> the decision about whether to truncate.
>

I think that because the tuples that became dead after heap_page_prune()
looked at the page are recorded but not removed without lazy_vacuum_page(),
we need to process them in lazy_vacuum_page(). As for the decision about
whether to truncate, we should not change it, so I will fix that. It could
be another option to control whether to truncate, if we want one.

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: strange valgrind failures (again)

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-15 03:07:10 +0100, Tomas Vondra wrote:
> I've started observing funny valgrind failures on Fedora 28, possibly
> after upgrading from 3.14.0-1 to 3.14.0-7 a couple of days ago. This
> time it does not seem like platform-specific issues, though - the
> failures all look like this:

Any chance you're compiling without USE_VALGRIND defined? IIRC these are
precisely what the VALGRIND_MAKE_MEM_DEFINED calls in inval.c are
intended to fight:
/*
 * Define padding bytes in SharedInvalidationMessage structs to be
 * defined. Otherwise the sinvaladt.c ringbuffer, which is accessed by
 * multiple processes, will cause spurious valgrind warnings about
 * undefined memory being used. That's because valgrind remembers the
 * undefined bytes from the last local process's store, not realizing 
that
 * another process has written since, filling the previously 
uninitialized
 * bytes
 */
VALGRIND_MAKE_MEM_DEFINED(&msg, sizeof(msg));


Greetings,

Andres Freund



strange valgrind failures (again)

2019-01-14 Thread Tomas Vondra
Hi,

I've started observing funny valgrind failures on Fedora 28, possibly
after upgrading from 3.14.0-1 to 3.14.0-7 a couple of days ago. This
time it does not seem like platform-specific issues, though - the
failures all look like this:

==20974== Conditional jump or move depends on uninitialised value(s)
==20974==at 0xA02088: calc_bucket (dynahash.c:870)
==20974==by 0xA021BA: hash_search_with_hash_value (dynahash.c:963)
==20974==by 0xA020EE: hash_search (dynahash.c:909)
==20974==by 0x88DAB3: smgrclosenode (smgr.c:358)
==20974==by 0x9D6C01: LocalExecuteInvalidationMessage (inval.c:607)
==20974==by 0x86C44F: ReceiveSharedInvalidMessages (sinval.c:121)
==20974==by 0x9D6D83: AcceptInvalidationMessages (inval.c:681)
==20974==by 0x539B6B: AtStart_Cache (xact.c:980)
==20974==by 0x53AA6C: StartTransaction (xact.c:1915)
==20974==by 0x53B6F0: StartTransactionCommand (xact.c:2685)
==20974==by 0x892EFB: start_xact_command (postgres.c:2475)
==20974==by 0x89083E: exec_simple_query (postgres.c:923)
==20974==by 0x894E7B: PostgresMain (postgres.c:4143)
==20974==by 0x7F553D: BackendRun (postmaster.c:4412)
==20974==by 0x7F4CA1: BackendStartup (postmaster.c:4084)
==20974==by 0x7F12A0: ServerLoop (postmaster.c:1757)
==20974==by 0x7F08CF: PostmasterMain (postmaster.c:1365)
==20974==by 0x728E33: main (main.c:228)
==20974==  Uninitialised value was created by a stack allocation
==20974==at 0x9D65D4: AddCatcacheInvalidationMessage (inval.c:339)
==20974==

==20974== Use of uninitialised value of size 8
==20974==at 0xA021FD: hash_search_with_hash_value (dynahash.c:968)
==20974==by 0xA020EE: hash_search (dynahash.c:909)
==20974==by 0x88DAB3: smgrclosenode (smgr.c:358)
==20974==by 0x9D6C01: LocalExecuteInvalidationMessage (inval.c:607)
==20974==by 0x86C44F: ReceiveSharedInvalidMessages (sinval.c:121)
==20974==by 0x9D6D83: AcceptInvalidationMessages (inval.c:681)
==20974==by 0x539B6B: AtStart_Cache (xact.c:980)
==20974==by 0x53AA6C: StartTransaction (xact.c:1915)
==20974==by 0x53B6F0: StartTransactionCommand (xact.c:2685)
==20974==by 0x892EFB: start_xact_command (postgres.c:2475)
==20974==by 0x89083E: exec_simple_query (postgres.c:923)
==20974==by 0x894E7B: PostgresMain (postgres.c:4143)
==20974==by 0x7F553D: BackendRun (postmaster.c:4412)
==20974==by 0x7F4CA1: BackendStartup (postmaster.c:4084)
==20974==by 0x7F12A0: ServerLoop (postmaster.c:1757)
==20974==by 0x7F08CF: PostmasterMain (postmaster.c:1365)
==20974==by 0x728E33: main (main.c:228)
==20974==  Uninitialised value was created by a stack allocation
==20974==at 0x9D65D4: AddCatcacheInvalidationMessage (inval.c:339)
==20974==

There are more reports in the attached log, but what they all share is
dynahash and invalidations. Which might be an arguments against a
possible valgrind bug, because that would (probably?) affect various
other places.

It's reproducible quite far back (a couple thousand commits, at least),
so it does not seem like caused by a recent commit either.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

RE: Libpq support to connect to standby server as priority

2019-01-14 Thread Tsunakawa, Takayuki
From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> The problem here of course is that whoever invented target_session_attrs
> was unconcerned with following that precedent, so what we have is
> "target_session_attrs=(any | read-write)".
> Are we prepared to add some aliases in service of unifying these names?

I think "yes".

> 2. Whether or not you want to follow pgJDBC's naming, it seems like we
> ought to have both "require read only" and "prefer read only" behaviors
> in this patch, and maybe likewise "require read write" versus "prefer
> read write".

Agreed, although I don't see a use case for "prefer read write".  I don't think 
there's an app like "I want to write, but I'm OK if I cannot."


> 3. We ought to sync this up with whatever's going to happen in
> https://commitfest.postgresql.org/21/1090/
> at least to the extent of agreeing on what GUCs we'd like to see
> the server start reporting.

Yes.

> 4. Given that other discussion, it's not quite clear what we should
> even be checking.  The existing logic devolves to checking that
> transaction_read_only is true, but that's not really the same thing as
> "is a master server", eg you might have connected to a master server
> under a role that has SET ROLE default_transaction_read_only = false.
> (I wonder what pgJDBC is really checking, under the hood.)
> Do we want to have modes that are checking hot-standby state in some
> fashion, rather than the transaction_read_only state?

PgJDBC uses transaction_read_only like this:

[core/v3/ConnectionFactoryImpl.java]
  private boolean isMaster(QueryExecutor queryExecutor) throws SQLException, 
IOException {
byte[][] results = SetupQueryRunner.run(queryExecutor, "show 
transaction_read_only", true);
String value = queryExecutor.getEncoding().decode(results[0]);
return value.equalsIgnoreCase("off");
  }
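
For comparison, on the libpq side today the equivalent check is driven by
target_session_attrs: with a connection string like the following (host names
made up), libpq runs "SHOW transaction_read_only" against each host and keeps
the first connection that reports "off":

psql "host=host1,host2 dbname=postgres target_session_attrs=read-write"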

But as some people said, I don't think this is the right way.  I suppose what's 
leading to the current somewhat complicated situation is that there was no easy 
way for the client to know whether the server is the master.  That ended up in 
using "SHOW transaction_read_only" instead, and people supported that 
compromise by saying "read only status is more useful than whether the server 
is standby or not," I'm afraid.

The original desire should have been the ability to connect to a primary or a 
standby.  So, I think we should go back to the original thinking (and not 
complicate the feature), and create a read only GUC_REPORT variable, say, 
server_role, that identifies whether the server is a primary or a standby.


Regards
Takayuki Tsunakawa






Re: Copy function for logical replication slots

2019-01-14 Thread Masahiko Sawada
On Tue, Nov 27, 2018 at 3:46 AM Petr Jelinek
 wrote:
>
> Hi,
>
> On 26/11/2018 01:29, Masahiko Sawada wrote:
> > On Sun, Nov 25, 2018 at 12:27 AM Petr Jelinek
> >  wrote:
> >>
> >> The more serious thing is:
> >>
> >>> + if (MyReplicationSlot)
> >>> + ReplicationSlotRelease();
> >>> +
> >>> + /* Release the saved slot if exist while preventing double 
> >>> releasing */
> >>> + if (savedslot && savedslot != MyReplicationSlot)
> >>
> >> This won't work as intended as the ReplicationSlotRelease() will set
> >> MyReplicationSlot to NULL, you might need to set aside MyReplicationSlot
> >> to yet another temp variable inside this function prior to releasing it.
> >>
> >
> > You're right. I've fixed it by checking if we need to release the
> > saved slot before releasing the origin slot. Is that right?
> > Attached an updated patch.
> >
>
> Sounds good.
>
> I do have one more minor gripe after reading though again:
>

Thank you for the review comment and sorry for the late response.

> > +
> > +   /*
> > +* The requested wal lsn is no longer available. We don't 
> > want to retry
> > +* it, so raise an error.
> > +*/
> > +   if (!XLogRecPtrIsInvalid(requested_lsn))
> > +   {
> > +   char filename[MAXFNAMELEN];
> > +
> > +   XLogFileName(filename, ThisTimeLineID, segno, 
> > wal_segment_size);
> > +   ereport(ERROR,
> > +   (errmsg("could not reserve WAL 
> > segment %s", filename)));
> > +   }
>
> I would reword the comment to something like "The caller has requested a
> specific wal lsn which we failed to reserve. We can't retry here as the
> requested wal is no longer available." (It took me a while to understand
> this part).
>
> Also the ereport should have errcode as it's going to be thrown to user
> sessions and it might be better if the error itself used same wording as
> CheckXLogRemoved() and XLogRead() for consistency. What do you think?
>

I agreed with both of your comments. I've changed the above comment and
ereport. Attached is the updated version of the patch.
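
(For reference, matching the errcode and wording used by CheckXLogRemoved()
would make the error path look roughly like this; just a sketch, the hunk in
the attached patch may differ slightly:)

    char        filename[MAXFNAMELEN];

    XLogFileName(filename, ThisTimeLineID, segno, wal_segment_size);
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("requested WAL segment %s has already been removed",
                    filename)));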

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


v8-0001-Copy-function-for-logical-and-physical-replicatio.patch
Description: Binary data


Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-14 Thread Amit Langote
On 2019/01/15 10:46, Amit Langote wrote:
> On 2019/01/11 20:04, Etsuro Fujita wrote:
>> One thing I intended in that commit was to set the flag to false for
>> partitioned tables contained in inheritance trees where the top parent is
>> a UNION ALL subquery, because we don't consider PWJ for those tables.
>>  Actually we wouldn't need to care about that, because we don't do PWJ for
>> those tables regardless of what the flag is set, but I think that would
>> make the code a bit cleaner.
> 
> Yeah, we wouldn't do partitionwise join between partitioned tables that
> are under UNION ALL.
> 
>> However, what you proposed here as-is would
>> not keep that behavior, because rel->part_scheme is created for those
>> tables as well
> 
> It'd be easy to prevent set consider_partitionwise_join to false in that
> case as:

Oops, I meant to say:

It'd be easy to prevent setting consider_partitionwise_join in that case as:

> 
> +    rel->consider_partitionwise_join = (rel->part_scheme != NULL &&
> +                                        (parent == NULL ||
> +                                         parent->rtekind != RTE_SUBQUERY));

Thanks,
Amit




Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-14 Thread Amit Langote
Fujita-san,

On 2019/01/11 20:04, Etsuro Fujita wrote:
> (2019/01/11 13:46), Amit Langote wrote:
>> On 2019/01/10 15:07, Etsuro Fujita wrote:
>>> (The name of the flag isn't
>>> good?  If so, that would be my fault because I named that flag.)
>>
>> If it's really just to store the fact that the relation's targetlist
>> contains expressions that partitionwise join currently cannot handle, then
>> setting it like this in set_append_rel_size seems a bit misleading:
>>
>>  if (enable_partitionwise_join&&
>>  rel->reloptkind == RELOPT_BASEREL&&
>>  rte->relkind == RELKIND_PARTITIONED_TABLE&&
>>  rel->attr_needed[InvalidAttrNumber - rel->min_attr] == NULL)
>>  rel->consider_partitionwise_join = true;
>>
>> Sorry, I wasn't around to comment on the patch which got committed in
>> 7cfdc77023a, but checking the value of enable_partitionwise_join and other
>> things in set_append_rel_size() to set the value of
>> consider_partitionwise_join seems a bit odd to me.  Perhaps,
>> consider_partitionwise_join should be initially set to true for a relation
>> (actually, to rel->part_scheme != NULL) and only set it to false if the
>> relation's targetlist is found to contain unsupported expressions.
> 
> One thing I intended in that commit was to set the flag to false for
> partitioned tables contained in inheritance trees where the top parent is
> a UNION ALL subquery, because we don't consider PWJ for those tables.
>  Actually we wouldn't need to care about that, because we don't do PWJ for
> those tables regardless of what the flag is set, but I think that would
> make the code a bit cleaner.

Yeah, we wouldn't do partitionwise join between partitioned tables that
are under UNION ALL.

> However, what you proposed here as-is would
> not keep that behavior, because rel->part_scheme is created for those
> tables as well

It'd be easy to prevent set consider_partitionwise_join to false in that
case as:

+    rel->consider_partitionwise_join = (rel->part_scheme != NULL &&
+                                        (parent == NULL ||
+                                         parent->rtekind != RTE_SUBQUERY));


> (even though there would be no need IIUC).

Partition pruning uses part_scheme and pruning can occur even if a
partitioned table is under UNION ALL, so it *is* needed in that case.

>> I think
>> enable_partitionwise_join should only be checked in relnode.c or
>> joinrels.c.
> 
> Sorry, I don't understand this.

What I was trying to say is that we should check the GUC close to where
partitionwise join is actually implemented even though there is no such
hard and fast rule.  Or maybe I'm just a bit uncomfortable with setting
consider_partitionwise_join based on the GUC.

>> I've attached a patch to show what I mean. Can you please
>> take a look?
> 
> Thanks for the patch!  Maybe I'm missing something, but I don't have a
> strong opinion about that change.  I'd rather think to modify
> build_simple_rel so that it doesn't create rel->part_scheme if unnecessary
> (ie, partitioned tables contained in inheritance trees where the top
> parent is a UNION ALL subquery).

As I said above, partition pruning can occur even if a partitioned table
happens to be under UNION ALL.  However, we *can* avoid creating
part_scheme and setting other partitioning properties if *all* of
enable_partition_pruning, enable_partitionwise_join, and
enable_partitionwise_aggregate are turned off.

>> If you think that this patch is a good idea, then you'll need to
>> explicitly set consider_partitionwise_join to false for a dummy partition
>> rel in set_append_rel_size(), because the assumption of your patch that
>> such partition's rel's consider_partitionwise_join would be false (as
>> initialized with the current code) would be broken by my patch.  But that
>> might be a good thing to do anyway as it will document the special case
>> usage of consider_partitionwise_join variable more explicitly, assuming
>> you'll be adding a comment describing why it's being set to false
>> explicitly.
> 
> I'm not sure we need this as part of a fix for the issue reported on this
> thread.  I don't object to what you proposed here, but that would be
> rather an improvement, so I think we should leave that for another patch.

Sure, no problem with committing it separately if at all.  Thanks for
considering.

Thanks,
Amit




Re: explain plans with information about (modified) gucs

2019-01-14 Thread Tomas Vondra



On 1/14/19 11:13 PM, Alvaro Herrera wrote:
> On 2019-Jan-14, Tomas Vondra wrote:
> 
>> The one slightly annoying issue is that currently all the options are
>> formatted as text, including e.g. cpu_tuple_cost. That's because the GUC
>> options may have show hook, so I can't access the value directly (not
>> sure if there's an option around it).
> 
> I think the problem is that you'd have to know how to print the value,
> which can be in one of several different C types.  You'd have to grow
> some more infrastructure in the GUC tables, I think, and that doesn't
> seem worth the trouble.  Printing as text seems enough.
> 

I don't think the number of formats is such a big issue - the range of
formats is quite limited: PGC_BOOL, PGC_INT, PGC_REAL, PGC_STRING and
PGC_ENUM. But the show hook simply returns a string, and I'm not sure it's
guaranteed to match the raw value (afaik the assign/show hooks can do
all kinds of funny stuff).
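
For comparison, roughly the same information (text-formatted value plus the
declared type) is already visible at the SQL level via pg_settings; e.g. to
list the non-default settings of the current session:

    SELECT name, setting, vartype, source
      FROM pg_settings
     WHERE source NOT IN ('default', 'override')
     ORDER BY name;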

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reducing header interdependencies around heapam.h et al.

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-14 17:55:44 -0300, Alvaro Herrera wrote:
> On 2019-Jan-14, Andres Freund wrote:
>
> > I think the whole pointer hiding game that we play is a really really
> > bad idea. We should just about *never* have typedefs that hide the fact
> > that something is a pointer. But it's hard to go away from that for
> > legacy reasons.
> >
> > The problem with your approach is that it's *eminently* reasonable to
> > want to forward declare a struct in multiple places. Otherwise you end
> > up in issues where you include headers like heapam.h solely for a
> > typedef, which obviously doesn't make a ton of sense.
>
> Well, my point is that in an ideal world we would have a header where
> the struct is declared once in a very lean header, which doesn't include
> almost anything else, so you can include it into other headers
> liberally.  Then the struct definitions are any another (heavy) header,
> which *does* need to include lots of other stuff in order to be able to
> define the structs fully, and would be #included very sparingly, only in
> the few .c files that really needed it.

> For example, I would split up execnodes.h so that *only* the
> typedef/struct declarations are there, and *no* struct definition.  Then
> that header can be #included in other headers that need those to declare
> functions -- no problem.  Another header (say execstructs.h, whatever)
> would contain the struct definition and would only be used by executor
> .c files.  So when a struct changes, only the executor is recompiled;
> the rest of the source doesn't care, because execnodes.h (the struct
> decls) hasn't changed.

It's surely better than the current state, but it still requires
recompiling everything in more cases than necessary.


> This may be too disruptive.  I'm not suggesting that you do things this
> way, only describing my ideal world.

It'd probably be doable by leaving execnodes.h as the heavyweight header,
and execnodetypes.h as the lightweight one, and including the latter
from the former. And then moving users of execnodes.h over to
execnodetypes.h.


> Your idea of "liberally forward-declaring structs in multiple places"
> seems like you don't *have* the lean header at all (only the heavy one
> with all the struct definitions), and instead you distribute bits and
> pieces of the lean header randomly to the places that need it.  It's
> repetitive.  It gets the job done, but it's not *clean*.

I'm not really buying the repetitiveness bit - it's really primarily
adding 'struct ' prefix, and sometimes adding a 'Data *' postfix. That's
not a lot of duplication. When used in structs there's no need to even
add an explicit 'struct ;' forward declaration, that's only needed
for function parameters.  And afterwards there's a lot less entanglement
- no need to recompile every file just because a new node type has been
added etc.
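
As a contrived illustration of that (the names here are made up, not actual
PostgreSQL headers):

    /* used only as a struct member: no explicit forward declaration needed */
    typedef struct FooBarHolder
    {
        struct FooBarState *inner;
    } FooBarHolder;

    /* used as a function parameter: add the one-line forward declaration
     * to avoid the "declared inside parameter list" warning */
    struct FooBarState;
    extern void foobar_reset(struct FooBarState *state);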


> > Given the fact that including headers just for a typedef is frequent
> > overkill, hiding the typedef in a separate header has basically no
> > gain. I also don't quite understand why using a forward declaration with
> > struct in the name is that confusing, especially when it only happens in
> > the header.
>
> Oh, that's not the confusing part -- that's just repetitive, nothing
> more.  What's confusing (IMO) is having two names for the same struct
> (one pointer and one full struct).  It'd be useful if it was used the
> way I describe above.  But that's the settled project style, so I don't
> have any hopes that it'll ever be changed.

Not within a few days, but we probably can do it over time...


> Anyway, I'm not objecting to your patch ... I just don't want it on
> record that I'm in love with the current situation :-)

Cool, I've pushed these now. I'll rebase my patch to split
(heap|reation)_(open|close)(rv)? patch out of heapam.[ch] now. Brr.

Greetings,

Andres Freund



RE: [Proposal] Add accumulated statistics

2019-01-14 Thread Tsunakawa, Takayuki
From: Pavel Stehule [mailto:pavel.steh...@gmail.com]
> the cumulated lock statistics maybe doesn't help with debugging - but it
> is very good indicator of database (in production usage) health.

I think it will help both.  But I don't think the sampling will be as helpful
as the precise lock statistics accumulation, because the sampling doesn't tell
us exactly how effective our improvements to the PostgreSQL code are.  I remember
PG developers used LOCK_STATS to see how many (or what ratio of) lwlock waits
decreased by applying patches.


We can use the cumulated lock stats like:

1. SELECT * FROM pg_session_waits;
2. Run a benchmark.
3. SELECT * FROM pg_session_waits;
4. Calculate the difference between 1 and 3.

Or, reset the wait stats before the benchmark run and just use the stats as-is.
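
In SQL terms, step 4 could be as simple as the following (only a sketch:
pg_session_waits is the proposed view, and the wait_event/wait_count column
names are my assumption here):

    -- snapshot before the benchmark
    CREATE TEMP TABLE waits_before AS
    SELECT * FROM pg_session_waits;

    -- ... run the benchmark ...

    -- step 4: waits accumulated during the benchmark, per event
    SELECT w.wait_event,
           w.wait_count - b.wait_count AS waits_during_benchmark
      FROM pg_session_waits w
      JOIN waits_before b USING (wait_event)
     ORDER BY waits_during_benchmark DESC;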

I'd like to know why you thought the cumulated wait stats aren't helpful for
debugging.


Regards
Takayuki Tsunakawa




Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2019-01-14 Thread David Rowley
On Tue, 15 Jan 2019 at 13:17, Tom Lane  wrote:
>
> David Rowley  writes:
> > 1. I don't think having a table named "dual" makes a whole lot of
> > sense for a table with a single row.
>
> Well, I borrowed Oracle terminology there ;-)

yep, but ... go on... break the mould :)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Speeding up text_position_next with multibyte encodings

2019-01-14 Thread John Naylor
On Sun, Dec 23, 2018 at 9:33 AM Tomas Vondra
 wrote:
> So, what is the expected speedup in a "good/average" case? Do we have
> some reasonable real-world workload mixing these cases that could be
> used as a realistic benchmark?

Not sure about a realistic mix, but I investigated the tradeoffs.
First, using the test function Heikki posted upthread [1], I tried a 2
character needle with different haystack sizes, and looked for
the position where master and patch were roughly equal (average of 3,
within 1%). Beyond this point, the patch regresses. To keep it simple
I used UTF-8 only.

haystack   position
<=23       (patch always faster)
30         23
100        58
1000       ~550
10000      ~5500

For very short haystacks, the patch is always faster or regresses only
when the needle is close to the end. For longer haystacks, the patch
will be faster if the position searched for is less than ~55% of the
way to the end, slower if after. Next, I tested finding the position
of a 2 character needle in a couple positions towards the front of a
1000 character haystack, plus not present and at the end for
comparison. As seen earlier [1], the worst-case regression for this
haystack length was 4.4x slower for UTF-8, but that had a skip table
collision, and this time I used characters with no bytes in common.
Average of 3, with a loop count of 1,000,000:

UTF-8
pos.   master    patch     diff
*      3880ms    144ms     -96%
100    4440ms    1410ms    -68%
333    5700ms    4280ms    -25%
999    9370ms    12500ms    34%

EUC-KR
pos.   master    patch     diff
*      3900ms    112ms     -97%
100    4410ms    1289ms    -71%
333    5680ms    3970ms    -30%
999    9370ms    11600ms    24%

The patch is much faster than master when the needle is near the front
of a large haystack or not there at all.

The majority of cases are measurably faster, and the best case is at
least 20x faster. On the whole I'd say this patch is a performance win
even without further optimization. I'm marking it ready for committer.

[1] 
https://www.postgresql.org/message-id/05d95c5c-1fb7-29ea-1c5d-e7e941a0a14d%40iki.fi
--

-John Naylor



RE: O_DIRECT for relations and SLRUs (Prototype)

2019-01-14 Thread Tsunakawa, Takayuki
From: Michael Paquier [mailto:mich...@paquier.xyz]
> One of the reasons why I have begun this thread is that since we have heard
> about the fsync issues on Linux, I think that there is room for giving our
> user base more control of their fate without relying on the Linux community
> decisions to potentially eat data and corrupt a cluster with a page dirty
> bit cleared without its data actually flushed.  Even the latest kernels
> are not fixing all the patterns with open fds across processes, switching
> the problem from one corner of the table to another, and there are folks
> patching the Linux kernel to make Postgres more reliable from this
> perspective, and living happily with this option.  As long as the option
> can be controlled and defaults to false, it seems to be that we could do
> something.  Even if the performance is bad, this gives the user control
> of how he/she wants things to be done.

Thank you for starting an interesting topic.  We probably want the direct I/O.  
On an INSERT- and UPDATE-heavy system with PostgreSQL 9.2, we suffered from
occasional high response times due to the Linux page cache activity.  Postgres 
processes competed for the page cache to read/write the data files, write 
online and archive WAL files, and write the server log files (auto_explain and 
autovacuum workers emitted a lot of logs.)  The user with Oracle experience 
asked why PostgreSQL doesn't handle database I/O by itself...

And I wonder how useful direct I/O would be for low-latency devices like
persistent memory.  The overhead of the page cache may become relatively higher.


Regards
Takayuki Tsunakawa






Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-14 Thread Amit Langote
On 2019/01/13 16:56, Michael Paquier wrote:
> On Fri, Jan 11, 2019 at 04:04:41PM +0900, Michael Paquier wrote:
>> Thanks.  I can commit this version if there are no objections then.
> 
> And done.

Thank you!

Regards,
Amit




Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2019-01-14 Thread Tom Lane
David Rowley  writes:
> 1. I don't think having a table named "dual" makes a whole lot of
> sense for a table with a single row.

Well, I borrowed Oracle terminology there ;-)

> (Uppercasing these additions would also make them look less of an 
> afterthought.)

Don't really care, can do.

> I also did a quick benchmark of v6 and found the slowdown to be
> smaller after the change made in build_simple_rel()

Thanks for confirming.  I was not very sure that was worth the extra
few bytes of code space, but if you see a difference too, then it's
probably worthwhile.

regards, tom lane



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Tom Lane
Stephen Frost  writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> What I'm not willing to do is write hacks for pg_upgrade or pg_dump
>> to mask cases where the option has been set on a v11 index.  I judge
>> that it's not worth the trouble.  If someone else disagrees, they
>> can do that work.

> I'd love for there to be a better option beyond "just let people run the
> pg_upgrade and have it fail half-way through"...  Particularly after
> someone's run the pg_upgrade check against the database...

> If there isn't, then I'll write the code to add the check to pg_upgrade.

Go for it.

regards, tom lane



Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2019-01-14 Thread David Rowley
On Tue, 15 Jan 2019 at 09:48, Tom Lane  wrote:
>
> David Rowley  writes:
> > SELECT 1; I believe is a common query for some connection poolers as a
> > sort of ping to the database.  In light of that, the performance drop
> > of 2 microseconds per query is not going to amount to very much in
> > total for that use case. i.e you'll need to do half a million pings
> > before it'll cost you 1 second of additional CPU time.
>
> Yeah, I agree this is not something to get hot & bothered over, but
> I thought it was worth spending an hour seeing if there were any
> easy wins.  Not much luck.

Thanks for putting in the effort.

> Anyway, herewith v6, rebased up to HEAD, with the build_simple_rel
> improvement and the regression test fix I mentioned earlier.

I had a look at these changes, I only have 1 comment:

1. I don't think having a table named "dual" makes a whole lot of
sense for a table with a single row.  I'm sure we can come up with a
more suitably named table to serve the purpose. How about "single"?

 INSERT INTO J2_TBL VALUES (0, NULL);
 INSERT INTO J2_TBL VALUES (NULL, NULL);
 INSERT INTO J2_TBL VALUES (NULL, 0);
+-- useful in some tests below
+create temp table dual();
+insert into dual default values;
+analyze dual;

(Uppercasing these additions would also make them look less of an afterthought.)

I also did a quick benchmark of v6 and found the slowdown to be
smaller after the change made in build_simple_rel()

Test 1 = explain select 1;

Unpatched:
$ pgbench -n -f bench.sql -T 60 postgres
tps = 30259.096585 (excluding connections establishing)
tps = 30094.533610 (excluding connections establishing)
tps = 30124.154255 (excluding connections establishing)

Patched:
tps = 29667.414788 (excluding connections establishing)
tps = 29555.325522 (excluding connections establishing)
tps = 29101.083145 (excluding connections establishing)

(2.38% down)

Test 2 = select 1;

Unpatched:
tps = 36535.991023 (excluding connections establishing)
tps = 36568.604011 (excluding connections establishing)
tps = 35938.923066 (excluding connections establishing)

Patched:
tps = 35187.363260 (excluding connections establishing)
tps = 35166.993210 (excluding connections establishing)
tps = 35436.486315 (excluding connections establishing)

(2.98% down)

As far as I can see the patch is ready to go, but I'll defer to Mark,
who's also listed on the reviewer list for this patch.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Andres Freund  writes:
> > On 2019-01-14 18:53:02 -0500, Tom Lane wrote:
> >> But I suspect just doing the revert is already going to be painful
> >> enough :-(
> 
> > I assume you're not particularly interested in doing that?
> 
> No, I'm willing to do it, and will do so tomorrow if there haven't
> been objections.
> 
> What I'm not willing to do is write hacks for pg_upgrade or pg_dump
> to mask cases where the option has been set on a v11 index.  I judge
> that it's not worth the trouble.  If someone else disagrees, they
> can do that work.

I'd love for there to be a better option beyond "just let people run the
pg_upgrade and have it fail half-way through"...  Particularly after
someone's run the pg_upgrade check against the database...

If there isn't, then I'll write the code to add the check to pg_upgrade.

I'll also offer to add other such checks, if there's similar cases that
people are aware of.
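
For what it's worth, the catalog query behind such a pre-flight check would be
straightforward, something along these lines (a sketch of the idea only, not
the eventual pg_upgrade code):

    -- indexes still carrying the v11-only reloption, which would make the
    -- restore into v12+ fail
    SELECT n.nspname, c.relname
      FROM pg_catalog.pg_class c
      JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
     WHERE c.relkind = 'i'
       AND EXISTS (SELECT 1
                     FROM unnest(c.reloptions) AS opt
                    WHERE opt LIKE 'recheck_on_update=%');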

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-14 19:04:07 -0500, Stephen Frost wrote:
> As that's the case, then I guess I'm thinking we really should make
> pg_upgrade complain if it finds it during the check phase.  I really
> don't like having a case like this where the pg_upgrade will fail from
> something that we could have detected during the pre-flight check,
> that's what it's for, after all.

I suggest you write a separate patch for that in that case.

Greetings,

Andres Freund



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-01-14 18:55:18 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > > Or are you suggesting that pg_dump in v12+ would throw errors if it
> > > > finds that set?  Or that we'll dump it, but fail to allow it into a
> > > > v12+ database?  What if v12 sees "recheck_on_update='false'", as a v11
> > > > pg_dump might output today?
> > > 
> > > It'll just error out on restore (including the internal one by
> > > pg_upgrade). I don't see much point in doing more, this isn't a widely
> > > used option by any stretch.
> > 
> > This.. doesn't actually make sense.  The pg_upgrade will use v12+
> > pg_dump.  You're saying that the v12+ pg_dump will dump out the option,
> > but then the restore will fail to understand it?
> 
> Why does that not make sense? With one exception the reloptions code in
> pg_dump doesn't have knowledge of individual reloptions. So without
> adding any special case code pg_dump will just continue to dump the
> option. And the server will just refuse to restore it, because it
> doesn't know it anymore.

Ugh, yeah, I was catching up to realize that we'd have to special-case
add this into pg_dump to get it avoided; most things in pg_dump don't
work that way.

As that's the case, then I guess I'm thinking we really should make
pg_upgrade complain if it finds it during the check phase.  I really
don't like having a case like this where the pg_upgrade will fail from
something that we could have detected during the pre-flight check,
that's what it's for, after all.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-14 18:53:02 -0500, Tom Lane wrote:
>> But I suspect just doing the revert is already going to be painful
>> enough :-(

> I assume you're not particularly interested in doing that?

No, I'm willing to do it, and will do so tomorrow if there haven't
been objections.

What I'm not willing to do is write hacks for pg_upgrade or pg_dump
to mask cases where the option has been set on a v11 index.  I judge
that it's not worth the trouble.  If someone else disagrees, they
can do that work.

regards, tom lane



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-14 18:55:18 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > > Or are you suggesting that pg_dump in v12+ would throw errors if it
> > > finds that set?  Or that we'll dump it, but fail to allow it into a
> > > v12+ database?  What if v12 sees "recheck_on_update='false'", as a v11
> > > pg_dump might output today?
> > 
> > It'll just error out on restore (including the internal one by
> > pg_upgrade). I don't see much point in doing more, this isn't a widely
> > used option by any stretch.
> 
> This.. doesn't actually make sense.  The pg_upgrade will use v12+
> pg_dump.  You're saying that the v12+ pg_dump will dump out the option,
> but then the restore will fail to understand it?

Why does that not make sense? With one exception the reloptions code in
pg_dump doesn't have knowledge of individual reloptions. So without
adding any special case code pg_dump will just continue to dump the
option. And the server will just refuse to restore it, because it
doesn't know it anymore.

Greetings,

Andres Freund



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Stephen Frost  writes:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> >> After a few minutes' more thought, I think that the most attractive
> >> option is to leave v11 alone and do a full revert in HEAD.  In this
> >> way, if anyone's attached "recheck_on_update" options to their indexes,
> >> it'll continue to work^H^H^H^Hdo nothing in v11, though they won't be
> >> able to migrate to v12 till they remove the options.  That way we
> >> aren't bound to the questionable design and naming of that storage
> >> option if/when we try this again.
> 
> > So the plan is to add a check into pg_upgrade to complain if it comes
> > across any cases where recheck_on_update is set during its pre-flight
> > checks..?
> 
> It wasn't my plan particularly.  I think the number of databases with
> that option set is probably negligible, not least because it was
> on-by-default during its short lifespan.  So there really has never been
> a point where someone would have had a reason to turn it on explicitly.
> 
> Now if somebody else is excited enough to add such logic to pg_upgrade,
> I wouldn't stand in their way.  But I suspect just doing the revert is
> already going to be painful enough :-(

It seems like the thing to do would be to just ignore the option in v12+
pg_dump then, meaning that pg_dump wouldn't output it and
pg_restore/v12+ wouldn't ever see it, and therefore users wouldn't get
an error even if that option was used when they upgrade.

I could live with that, but you seemed to be suggesting that something
else would happen earlier.

> > What if v12 sees "recheck_on_update='false'", as a v11
> > pg_dump might output today?
> 
> It'll complain that that's an unknown option.

Right, that's kinda what I figured.  I'm not thrilled with that either,
but hopefully it won't be too big a deal for users.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-14 18:53:02 -0500, Tom Lane wrote:
> But I suspect just doing the revert is already going to be painful
> enough :-(

I assume you're not particularly interested in doing that? I'm more than
happy to leave this to others, but if nobody signals interest I'll give
it a go, because it'll get a lot harder after the pluggable storage work
(and it'd work even less usefully afterwards, given the location in
heapam.c).

Greetings,

Andres Freund



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-01-14 18:46:18 -0500, Stephen Frost wrote:
> > * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > > Andres Freund  writes:
> > > > On 2019-01-14 18:03:24 -0500, Tom Lane wrote:
> > > >> Do we want to revert entirely, or leave the "recheck_on_update" option
> > > >> present but nonfunctional?
> > > 
> > > > I think it depends a bit on whether we want to revert in master or
> > > > master and 11. If only master, I don't see much point in leaving the
> > > > option around. If both, I think we should (need to?) leave it around in
> > > > 11 only.
> > > 
> > > After a few minutes' more thought, I think that the most attractive
> > > option is to leave v11 alone and do a full revert in HEAD.  In this
> > > way, if anyone's attached "recheck_on_update" options to their indexes,
> > > it'll continue to work^H^H^H^Hdo nothing in v11, though they won't be
> > > able to migrate to v12 till they remove the options.  That way we
> > > aren't bound to the questionable design and naming of that storage
> > > option if/when we try this again.
> > 
> > So the plan is to add a check into pg_upgrade to complain if it comes
> > across any cases where recheck_on_update is set during its pre-flight
> > checks..?
> 
> I don't think so.

I could possibly see us just ignoring the option in pg_dump, but that
goes against what Tom was suggesting, since users wouldn't see an error
if we don't dump the option out..

> > Or are you suggesting that pg_dump in v12+ would throw errors if it
> > finds that set?  Or that we'll dump it, but fail to allow it into a
> > v12+ database?  What if v12 sees "recheck_on_update='false'", as a v11
> > pg_dump might output today?
> 
> It'll just error out on restore (including the internal one by
> pg_upgrade). I don't see much point in doing more, this isn't a widely
> used option by any stretch.

This.. doesn't actually make sense.  The pg_upgrade will use v12+
pg_dump.  You're saying that the v12+ pg_dump will dump out the option,
but then the restore will fail to understand it?

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Andres Freund
On 2019-01-14 18:46:18 -0500, Stephen Frost wrote:
> Greetings,
> 
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
> > Andres Freund  writes:
> > > On 2019-01-14 18:03:24 -0500, Tom Lane wrote:
> > >> Do we want to revert entirely, or leave the "recheck_on_update" option
> > >> present but nonfunctional?
> > 
> > > I think it depends a bit on whether we want to revert in master or
> > > master and 11. If only master, I don't see much point in leaving the
> > > option around. If both, I think we should (need to?) leave it around in
> > > 11 only.
> > 
> > After a few minutes' more thought, I think that the most attractive
> > option is to leave v11 alone and do a full revert in HEAD.  In this
> > way, if anyone's attached "recheck_on_update" options to their indexes,
> > it'll continue to work^H^H^H^Hdo nothing in v11, though they won't be
> > able to migrate to v12 till they remove the options.  That way we
> > aren't bound to the questionable design and naming of that storage
> > option if/when we try this again.
> 
> So the plan is to add a check into pg_upgrade to complain if it comes
> across any cases where recheck_on_update is set during its pre-flight
> checks..?

I don't think so.

> Or are you suggesting that pg_dump in v12+ would throw errors if it
> finds that set?  Or that we'll dump it, but fail to allow it into a
> v12+ database?  What if v12 sees "recheck_on_update='false'", as a v11
> pg_dump might output today?

It'll just error out on restore (including the internal one by
pg_upgrade). I don't see much point in doing more, this isn't a widely
used option by any stretch.

Greetings,

Andres Freund



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Andres Freund  writes:
> > On 2019-01-14 18:03:24 -0500, Tom Lane wrote:
> >> Do we want to revert entirely, or leave the "recheck_on_update" option
> >> present but nonfunctional?
> 
> > I think it depends a bit on whether we want to revert in master or
> > master and 11. If only master, I don't see much point in leaving the
> > option around. If both, I think we should (need to?) leave it around in
> > 11 only.
> 
> After a few minutes' more thought, I think that the most attractive
> option is to leave v11 alone and do a full revert in HEAD.  In this
> way, if anyone's attached "recheck_on_update" options to their indexes,
> it'll continue to work^H^H^H^Hdo nothing in v11, though they won't be
> able to migrate to v12 till they remove the options.  That way we
> aren't bound to the questionable design and naming of that storage
> option if/when we try this again.

So the plan is to add a check into pg_upgrade to complain if it comes
across any cases where recheck_on_update is set during its pre-flight
checks..?

Or are you suggesting that pg_dump in v12+ would throw errors if it
finds that set?  Or that we'll dump it, but fail to allow it into a
v12+ database?  What if v12 sees "recheck_on_update='false'", as a v11
pg_dump might output today?

To be clear, I'm in agreement with reverting this, just trying to think
through what's going to happen and how users will be impacted.

Thanks!

Stephen


signature.asc
Description: PGP signature


Re: pgsql: Remove WITH OIDS support, change oid catalog column visibility.

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-07 17:03:20 +0530, Amit Khandekar wrote:
> On Sat, 5 Jan 2019 at 02:09, Andres Freund  wrote:
> > On 2019-01-03 13:40:42 -0500, Tom Lane wrote:
> > > I noticed that this patch has broken restores of existing dump files:
> > >
> > > psql:testbed.public.backup:82: ERROR:  unrecognized configuration 
> > > parameter "default_with_oids"
> > >
> > > Quite aside from the fact that this won't work if the user tries to
> > > restore in single-transaction or no-error mode, this is really really
> > > unhelpful because it's impossible to tell whether the error is
> > > ignorable or not, except by digging through the dump file.
> > >
> > > What you should do is restore that GUC, but add a check-hook that throws
> > > an error for an attempt to set it to "true".
> 
> Attached is a patch that does this.

Thanks!  I've pushed this, after some minor editorialization (hiding the
GUC, shortened description, shortened error message).
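
For anyone following along without reading the patch, the shape of the check
hook being discussed is roughly this (a sketch under the usual GUC hook
conventions; the committed function and message wording may differ):

    /*
     * Accept only "false", so that old dump files containing
     * SET default_with_oids = false still restore cleanly.
     */
    static bool
    check_default_with_oids(bool *newval, void **extra, GucSource source)
    {
        if (*newval)
        {
            /* WITH OIDS is not supported anymore */
            GUC_check_errcode(ERRCODE_FEATURE_NOT_SUPPORTED);
            GUC_check_errmsg("tables declared WITH OIDS are not supported");
            return false;
        }

        return true;
    }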

Regards,

Andres



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-14 Thread James Coleman
On Mon, Jan 14, 2019 at 11:34 AM Tom Lane  wrote:
> >> Hm.  That would be annoying, but on reflection I think maybe we don't
> >> actually need to worry about emptiness of the array after all.  The
> >> thing that we need to prove, at least for the implication case, is
> >> "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
>
> > Are you thinking that implies clause_is_strict_for might not be the
> > right/only place? Or just that the code in that function needs to
> > consider if it should return different results in some cases to handle
> > all 4 proof rules properly?
>
> The latter is what I was envisioning, but I haven't worked out details.

The more that I look at this I'm wondering if there aren't two things
here: it seems that the existing patch (with the code that excludes
empty arrays) is likely useful on its own for cases I hadn't
originally envisioned (or tested). Specifically it seems to allow us
to conclude the result will be null if a field is known to be null. I
think I'll try to add a test specific to this, but I'm not immediately
sure I know a case where that matters. If anyone has an idea on what
direction to head with this, I'd be happy to code it up.

For the original goal of "truth of the ScalarArrayOp implies truth of
the IS NOT NULL clause", it seems to me the proper place is
predicate_implied_by_simple_clause where we already have a place to
specifically work with IS NOT NULL expressions. That will allow this
to be more targeted, work for empty arrays as well, and not
potentially introduce an optimizer bug for strictness with empty
arrays.
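
To make the targeted proof concrete, this is the kind of query/index pair the
patch is aimed at (a minimal illustration with made-up names):

    CREATE TABLE t (x integer, payload text);
    CREATE INDEX t_x_not_null_idx ON t (x) WHERE x IS NOT NULL;

    -- with the patch, the planner can prove that
    --   x = ANY ('{1,2,3}'::integer[])
    -- implies x IS NOT NULL, so the partial index above becomes usable
    SELECT payload FROM t WHERE x = ANY ('{1,2,3}'::integer[]);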

I'm attaching an updated patch that does that. I've also added the
"parallel" case in predicate_refuted_by_simple_clause, but I haven't
added a test for that yet. I also would like to add a test for the sad
path to parallel the happy path computed array/cte test.

While I work on that, though, I wanted to get this updated version out to
get feedback on the approach.

James Coleman


saop_is_not_null-v5.patch
Description: Binary data


Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-14 18:03:24 -0500, Tom Lane wrote:
>> Do we want to revert entirely, or leave the "recheck_on_update" option
>> present but nonfunctional?

> I think it depends a bit on whether we want to revert in master or
> master and 11. If only master, I don't see much point in leaving the
> option around. If both, I think we should (need to?) leave it around in
> 11 only.

After a few minutes' more thought, I think that the most attractive
option is to leave v11 alone and do a full revert in HEAD.  In this
way, if anyone's attached "recheck_on_update" options to their indexes,
it'll continue to work^H^H^H^Hdo nothing in v11, though they won't be
able to migrate to v12 till they remove the options.  That way we
aren't bound to the questionable design and naming of that storage
option if/when we try this again.

A revert in v11 would have ABI compatibility issues to think about,
and would also likely be a bunch more work on top of what we'll
have to do for HEAD, so leaving it as-is seems like a good idea.

regards, tom lane



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-14 18:03:24 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2018-11-07 14:25:54 -0500, Tom Lane wrote:
> >> In short, it seems likely to me that large parts of this patch need to
> >> be pulled out, rewritten, and then put back in different places than
> >> they are today.  I'm not sure if a complete revert is the best next
> >> step, or if we can make progress without that.
> 
> > We've not really made progress on this. I continue to think that we
> > ought to revert this feature, and then work to re-merge it an
> > architecturally correct way afterwards.  Other opinions?
> 
> Given the lack of progress, I'd agree with a revert.  It's probably
> already going to be a bit painful to undo due to subsequent changes,
> so we shouldn't wait too much longer.

Yea, the reason I re-encountered this is cleaning up the pluggable
storage patch. Which certainly would make this revert harder...


> Do we want to revert entirely, or leave the "recheck_on_update" option
> present but nonfunctional?

I think it depends a bit on whether we want to revert in master or
master and 11. If only master, I don't see much point in leaving the
option around. If both, I think we should (need to?) leave it around in
11 only.

Greetings,

Andres Freund



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Tom Lane
Andres Freund  writes:
> On 2018-11-07 14:25:54 -0500, Tom Lane wrote:
>> In short, it seems likely to me that large parts of this patch need to
>> be pulled out, rewritten, and then put back in different places than
>> they are today.  I'm not sure if a complete revert is the best next
>> step, or if we can make progress without that.

> We've not really made progress on this. I continue to think that we
> ought to revert this feature, and then work to re-merge it an
> architecturally correct way afterwards.  Other opinions?

Given the lack of progress, I'd agree with a revert.  It's probably
already going to be a bit painful to undo due to subsequent changes,
so we shouldn't wait too much longer.

Do we want to revert entirely, or leave the "recheck_on_update" option
present but nonfunctional?

regards, tom lane



Re: explain plans with information about (modified) gucs

2019-01-14 Thread legrand legrand
Tomas Vondra-4 wrote
> Hello Sergei,
> 
>> This patch correlates with my proposal
>> "add session information column to pg_stat_statements"
>> https://www.postgresql.org/message-id/3aa097d7-7c47-187b-5913-db8366cd4491%40gmail.com
>> They both address the problem to identify the factors that make
>> different execution plans for the same SQL statements. You are
>> interested in the current settings that affect the execution plan, I'm
>> concerned about historical data in pg_stat_statements. From my
>> experience the most often offending settings are
>> current_schemas/search_path and current_user. Please have in mind that
>> probably the same approach that you will use to extend explain plan
>> functionality will be eventually implemented to extend
>> pg_stat_statements.
> 
> Possibly, although I don't have an ambition to export the GUCs into
> pg_stat_statements in this patch. There's an issue with merging
> different values of GUCs in different executions of a statement, and
> it's unclear how to solve that.
> 
> [...]
> 
> -- 
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

This explain-with-GUCs feature can be included very easily for historical
data management in the pg_store_plans or pg_stat_sql_plans extensions
(both of which use a planid based on the normalized plan text).

Regards
PAscal



--
Sent from: http://www.postgresql-archive.org/PostgreSQL-hackers-f1928748.html



Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives

2019-01-14 Thread Peter Geoghegan
On Mon, Jan 14, 2019 at 1:46 PM Peter Geoghegan  wrote:
> I would have said that the assumption is that a fixed source tuple
> will generate identical index entries. The problem with that is that
> my idea of what constitutes a fixed input now seems to have been
> faulty. I didn't think that the executor could mutate TOAST state in a
> way that made this kind of inconsistency possible.

The source tuple (by which I mean the mgd.bib_refs heap tuple) is a
HEAP_HASEXTERNAL tuple. If I update it to make a particularly long
text field NULL (UPDATE mgd.bib_refs SET abstract = NULL), and then
"INSERT INTO bug SELECT * FROM mgd.bib_refs", amcheck stops
complaining about the index on "bug.title" is missing. Even though the
"abstract" field has nothing to do with the index.

The source of the inconsistency here must be within
heap_prepare_insert() -- the external datum handling:

/*
 * If the new tuple is too big for storage or contains already toasted
 * out-of-line attributes from some other relation, invoke the toaster.
 */
if (relation->rd_rel->relkind != RELKIND_RELATION &&
    relation->rd_rel->relkind != RELKIND_MATVIEW)
{
    /* toast table entries should never be recursively toasted */
    Assert(!HeapTupleHasExternal(tup));
    return tup;
}
else if (HeapTupleHasExternal(tup) || tup->t_len > TOAST_TUPLE_THRESHOLD)
    return toast_insert_or_update(relation, tup, NULL, options);
else
    return tup;

Even leaving that aside, I really should have spotted that
TOAST_TUPLE_THRESHOLD is a different thing to TOAST_INDEX_TARGET. The
two things are always controlled independently. Mea culpa.

The fix here must be to normalize index tuples that are compressed
within amcheck, both during initial fingerprinting, and during
subsequent probes of the Bloom filter in bt_tuple_present_callback().
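
Something along these lines, presumably (a sketch of the normalization idea
only, not the actual amcheck change; the function name is made up):

    /*
     * Build a normalized copy of an index tuple in which compressed varlena
     * attributes are decompressed, so that fingerprinting and the later
     * Bloom filter probes hash logically equal tuples identically,
     * regardless of the compression decisions made when each tuple was
     * originally formed.
     */
    static IndexTuple
    bt_normalize_tuple(Relation idxrel, IndexTuple itup)
    {
        TupleDesc   tupdesc = RelationGetDescr(idxrel);
        Datum       values[INDEX_MAX_KEYS];
        bool        isnull[INDEX_MAX_KEYS];
        bool        need_copy = false;
        int         i;

        for (i = 0; i < tupdesc->natts; i++)
        {
            Form_pg_attribute att = TupleDescAttr(tupdesc, i);

            values[i] = index_getattr(itup, i + 1, tupdesc, &isnull[i]);

            if (!isnull[i] && att->attlen == -1 &&
                VARATT_IS_COMPRESSED(DatumGetPointer(values[i])))
            {
                /* decompress, purely for the purposes of hashing */
                values[i] = PointerGetDatum(PG_DETOAST_DATUM(values[i]));
                need_copy = true;
            }
        }

        if (!need_copy)
            return itup;        /* already in normalized form */

        return index_form_tuple(tupdesc, values, isnull);
    }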

-- 
Peter Geoghegan



Re: [HACKERS] Surjective functional indexes

2019-01-14 Thread Andres Freund
On 2018-11-07 14:25:54 -0500, Tom Lane wrote:
> Robert Haas  writes:
> > On Fri, May 11, 2018 at 11:01 AM, Simon Riggs  wrote:
>  I have no problem if you want to replace this with an even better
>  design in a later release.
> 
> >>> Meh. The author / committer should get a patch into the right shape
> 
> >> They have done, at length. Claiming otherwise is just trash talk.
> 
> > Some people might call it "honest disagreement".
> 
> So where we are today is that this patch was lobotomized by commits
> 77366d90f/d06fe6ce2 as a result of this bug report:
> 
> https://postgr.es/m/20181106185255.776mstcyehnc63ty@alvherre.pgsql
> 
> We need to move forward, either by undertaking a more extensive
> clean-out, or by finding a path to a version of the code that is
> satisfactory.  I wanted to enumerate my concerns while yesterday's
> events are still fresh in mind.  (Andres or Robert might have more.)
> 
> * I do not understand why this feature is on-by-default in the first
> place.  It can only be a win for expression indexes that are many-to-one
> mappings; for indexes that are one-to-one or few-to-one, it's a pretty
> big loss.  I see no reason to assume that most expression indexes fall
> into the first category.  I suggest that the design ought to be to use
> this optimization only for indexes for which the user has explicitly
> enabled recheck_on_update.  That would allow getting rid of the cost check
> in IsProjectionFunctionalIndex, about which we'd otherwise have to have
> an additional fight: I do not like its ad-hoc-ness, nor the modularity
> violation (and potential circularity) involved in having the relcache call
> cost_qual_eval.
> 
> * Having heapam.c calling the executor also seems like a
> modularity/layering violation that will bite us in the future.
> 
> * The implementation seems incredibly inefficient.  ProjIndexIsUnchanged
> is doing creation/destruction of an EState, index_open/index_close
> (including acquisition of a lock we should already have), BuildIndexInfo,
> and expression compilation, all of which are completely redundant with
> work done elsewhere in the executor.  And it's doing that *over again for
> every tuple*, which totally aside from wasted cycles probably means there
> are large query-lifespan memory leaks in an UPDATE affecting many rows.
> I think a minimum expectation should be that one-time work is done only
> one time; but ideally none of those things would happen at all because
> we could share work with the regular code path.  Perhaps it's too much
> to hope that we could also avoid duplicate computation of the new index
> expression values, but as long as I'm listing complaints ...
> 
> * As noted in the bug thread, the implementation of the new reloption
> is broken because (a) it fails to work for some built-in index AMs
> and (b) by design, it can't work for add-on AMs.  I agree with Andres
> that that's not acceptable.
> 
> * This seems like bad data structure design:
> 
> +   Bitmapset  *rd_projidx; /* Oids of projection indexes */
> 
> That comment is a lie, although IMO it'd be better if it were true;
> a list of target index OIDs would be a far better design here.  The use
> of rd_projidx as a set of indexes into the relation's indexlist is
> inefficient and overcomplicated, plus it creates an unnecessary and not
> very safe (even if it were documented) coupling between rd_indexlist and
> rd_projidx.
> 
> * Having ProjIndexIsUnchanged examine relation->rd_projidx directly is
> broken by design anyway, both from a modularity standpoint and because
> its inner loop involves catalog accesses that could result in relcache
> flushes.  I'm somewhat amazed that the regression tests passed on
> CLOBBER_CACHE_ALWAYS buildfarm animals, although possibly this is
> explained by the fact that they're only testing cases with a single
> expression index, so that the bitmap isn't checked again after the cache
> flush happens.  Again, this could be fixed if what was returned was just
> a list of relevant index OIDs.
> 
> * This bit of coding is unsafe, for the reason explained in the existing
> comment:
> 
>  /*
>   * Now save copies of the bitmaps in the relcache entry.  We 
> intentionally
>   * set rd_indexattr last, because that's the one that signals validity of
>   * the values; if we run out of memory before making that copy, we won't
>   * leave the relcache entry looking like the other ones are valid but
>   * empty.
>   */
>  oldcxt = MemoryContextSwitchTo(CacheMemoryContext);
>  relation->rd_keyattr = bms_copy(uindexattrs);
>  relation->rd_pkattr = bms_copy(pkindexattrs);
>  relation->rd_idattr = bms_copy(idindexattrs);
>  relation->rd_indexattr = bms_copy(indexattrs);
> +relation->rd_projindexattr = bms_copy(projindexattrs);
> +relation->rd_projidx = bms_copy(projindexes);
>  MemoryContextSwitchTo(oldcxt);
> 
> * There's a lot of other inattention to comments.  For example, I noticed
> 

Re: unique, partitioned index fails to distinguish index key from INCLUDEd columns

2019-01-14 Thread Alvaro Herrera
On 2019-Jan-09, Justin Pryzby wrote:

> -- Fails to error
> postgres=# CREATE UNIQUE INDEX ON t(j) INCLUDE(i);
> 
> -- Fail to enforce uniqueness across partitions due to failure to enforce 
> inclusion of partition key in index KEY
> postgres=# INSERT INTO t VALUES(1,1);
> postgres=# INSERT INTO t VALUES(2,1); 

Doh.  Fix pushed.  Commit 8224de4f42cc should have changed one
appearance of ii_NumIndexAttrs to ii_NumIndexKeyAttrs, but because of
the nature of concurrent development, nobody noticed.

Thanks for reporting.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: hostorder and failover_timeout for libpq

2019-01-14 Thread Tom Lane
Michael Paquier  writes:
> On Wed, Sep 19, 2018 at 02:26:53PM +0200, Ildar Musin wrote:
>> At this point I'd like to ask community what in your opinion would be the
>> best course of action and whether this feature should be implemented within
>> libpq at all? Because from my POV there are factors that really depend on
>> network architecture and there is probably no single right solution.

> By the way, I can see that the latest patch available does not apply and
> tries to juggle with multiple concepts.  I can see at least two of them:
> failover_timeout and hostorder.  You should split things.  I have moved
> the patch to next CF, waiting on author.

Per the discussion about the nearby prefer-standby patch,

https://www.postgresql.org/message-id/flat/caf3+xm+8-ztokav9ghij3wfgentq97qcjxqt+rbfq6f7onz...@mail.gmail.com

it seems pretty unfortunate that this patch proposes functionality
that's nearly identical to something in pgJDBC, but isn't using the
same terminology pgJDBC uses.

It's even more unfortunate that we have three separate patch proposal
threads that are touching more or less the same territory, but don't
seem to be talking to each other.  This one is also relevant:

https://www.postgresql.org/message-id/flat/1700970.crwpxno...@hammer.magicstack.net

regards, tom lane



Re: Libpq support to connect to standby server as priority

2019-01-14 Thread Tom Lane
This patch is marked as "ready for committer", but that characterization
seems way over-optimistic.  It looks like there are several unsettled
questions:

1. The connection parameter name and values are unlike the very similar
feature in pgJDBC.  I think this is a fair complaint.  Now I'm not in love
with "hostRecheckSeconds" --- that seems like a very squishily defined
thing with limited use-case, given Robert's argument that you shouldn't
be using this feature at all for short-lived connections.  And
"loadBalanceHosts" is something we could leave for later.  But it seems
pretty unfortunate not to follow pgJDBC's lead on the main parameter,
"targetServerType=(any | master | secondary | preferSecondary)".

The problem here of course is that whoever invented target_session_attrs
was unconcerned with following that precedent, so what we have is
"target_session_attrs=(any | read-write)".
Are we prepared to add some aliases in service of unifying these names?

2. Whether or not you want to follow pgJDBC's naming, it seems like we
ought to have both "require read only" and "prefer read only" behaviors
in this patch, and maybe likewise "require read write" versus "prefer
read write".

3. We ought to sync this up with whatever's going to happen in
https://commitfest.postgresql.org/21/1090/
at least to the extent of agreeing on what GUCs we'd like to see
the server start reporting.

4. Given that other discussion, it's not quite clear what we should
even be checking.  The existing logic devolves to checking that
transaction_read_only is true, but that's not really the same thing as
"is a master server", eg you might have connected to a master server
under a role that has SET ROLE default_transaction_read_only = false.
(I wonder what pgJDBC is really checking, under the hood.)
Do we want to have modes that are checking hot-standby state in some
fashion, rather than the transaction_read_only state?
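
For comparison, the two notions can be inspected side by side from any client
session (a minimal illustration):

    -- what target_session_attrs=read-write effectively checks today; this can
    -- be "on" even on a primary, e.g. under default_transaction_read_only
    SHOW transaction_read_only;

    -- what a hot-standby check would look at instead; true only while the
    -- server is in recovery, i.e. on a standby
    SELECT pg_is_in_recovery();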

regards, tom lane



Re: [HACKERS] logical decoding of two-phase transactions

2019-01-14 Thread Arseny Sher


Nikhil Sontakke  writes:

> I'd like to believe that the latest patch set tries to address some
> (if not all) of your concerns. Can you please take a look and let me
> know?

Hi, sure.

General things:

- Earlier I said that there is no point in sending COMMIT PREPARED if the
  decoding snapshot became consistent after PREPARE, i.e. PREPARE hadn't
  been sent. I have realized since then that such use cases actually exist:
  the prepare might be copied to the replica by e.g. basebackup or something
  else earlier. Still, a plugin must be able to easily distinguish these
  too-early PREPAREs without doing its own bookkeeping (remembering each
  PREPARE it has seen). Fortunately, it turns out we can make this easy.
  If during COMMIT PREPARED / ABORT PREPARED record decoding we
  see that ReorderBufferTXN with such xid exists, it means that either
  1) plugin refused to do replay of this xact at PREPARE or 2) PREPARE
  was too early in the stream. Otherwise xact would be replayed at
  PREPARE processing and rbtxn purged immediately after. I think we
  should add this to the documentation of filter_prepare_cb. Also, to
  this end we need to add an argument to this callback specifying at
  which context it was called: during prepare / commit prepared / abort
  prepared.  Also, for this to work, ReorderBufferProcessXid must be
  always called at PREPARE, not only when 2PC decoding is disabled.

- BTW, ReorderBufferProcessXid at PREPARE should be always called
  anyway, because otherwise if xact is empty, we will not prepare it
  (and call cb), even if the output plugin asked us not to filter it
  out. However, we will call commit_prepared cb, which is inconsistent.

- I find it weird that in DecodePrepare and in DecodeCommit you always
  ask the plugin whether to filter an xact, given that sometimes you
  know beforehand that you are not going to replay it: it might have
  already been replayed, might have wrong dbid, origin, etc. One
  consequence of this: imagine that notorious xact with PREPARE before
  point where snapshot became consistent and COMMIT PREPARED after that
  point. Even if filter_cb says 'I want 2PC on this xact', with current
  code it won't be replayed on PREPARE and rbxid will be destroyed with
  ReorderBufferForget. Now this xact is lost.

- Doing full-blown SnapBuildCommitTxn during PREPARE decoding is wrong,
  because xact effects must not yet be seen to others. I discussed this
  at length and described adjacent problems in [1].

- I still don't like that if 2PC xact was aborted and its replay
  stopped, prepare callback won't be called but abort_prepared would be.
  This either should be documented or fixed.
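
To make the first bullet's suggestion concrete, here is a purely
illustrative sketch.  The FilterPrepareContext enum, the callback
signature and the return convention (true = filter out, decode at commit
time instead) are all assumptions made up for this example, not the
patch's actual API:

#include "postgres.h"
#include "replication/logical.h"

typedef enum FilterPrepareContext
{
	FILTER_AT_PREPARE,			/* consulted while decoding PREPARE */
	FILTER_AT_COMMIT_PREPARED,	/* consulted at COMMIT PREPARED */
	FILTER_AT_ABORT_PREPARED	/* consulted at ROLLBACK PREPARED */
} FilterPrepareContext;

static bool
my_filter_prepare(LogicalDecodingContext *ctx, TransactionId xid,
				  const char *gid, FilterPrepareContext fctx)
{
	/* Decode our own housekeeping GIDs at commit time, not at PREPARE. */
	if (strncmp(gid, "internal-", 9) == 0)
		return true;

	/*
	 * The context argument lets a plugin react differently when it is
	 * consulted again at COMMIT PREPARED / ROLLBACK PREPARED time, e.g.
	 * for xacts whose PREPARE predates a consistent snapshot.
	 */
	if (fctx != FILTER_AT_PREPARE)
		elog(DEBUG1, "re-consulted for prepared xact %u (gid %s)", xid, gid);

	return false;				/* replay this transaction at PREPARE */
}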


Second patch:

+   /* filter_prepare is optional, but requires two-phase decoding */
+   if ((ctx->callbacks.filter_prepare_cb != NULL) && (!opt->enable_twophase))
+   ereport(ERROR,
+   (errmsg("Output plugin does not support two-phase decoding, but "
+   "registered filter_prepared callback.")));

I actually think that enable_twophase output plugin option is
redundant. If plugin author wants 2PC, he just provides
filter_prepare_cb callback and potentially others. I also don't see much
value in checking that exactly 0 or 3 callbacks were registered.


- You allow the (commit|abort)_prepared_cb and prepare_cb callbacks to be
  left unspecified with 2PC enabled, and call them without checking that
  they actually exist.


- executed within that transaction.
+ executed within that transaction. A transaction that is prepared for
+ a two-phase commit using PREPARE TRANSACTION will
+ also be decoded if the output plugin callbacks needed for decoding
+ them are provided. It is possible that the current transaction which
+ is being decoded is aborted concurrently via a ROLLBACK PREPARED
+ command. In that case, the logical decoding of this transaction will
+ be aborted too.

This should say explicitly that such 2PC xact will be decoded at PREPARE
record. Probably also add that otherwise it is decoded at CP
record. Probably also add "and abort_cb callback called" to the last
sentence.


+  The required abort_cb callback is called whenever
+  a transaction abort has to be initiated. This can happen if we are

This callback is not required in the code, and it would indeed be a bad
idea to demand it, breaking compatibility with existing plugins not
caring about 2PC.


+* Otherwise call either PREPARE (for twophase transactions) or COMMIT
+* (for regular ones).
+*/
+   if (rbtxn_rollback(txn))
+   rb->abort(rb, txn, commit_lsn);

This is dead code since we don't have decoding of in-progress xacts yet.


+   /*
+* If there is a valid top-level transaction that's different from the
+* two-phase one we are aborting, clear its reorder buffer as well.

Re: explain plans with information about (modified) gucs

2019-01-14 Thread Alvaro Herrera
On 2019-Jan-14, Tomas Vondra wrote:

> The one slightly annoying issue is that currently all the options are
> formatted as text, including e.g. cpu_tuple_cost. That's because the GUC
> options may have a show hook, so I can't access the value directly (not
> sure if there's a way around it).

I think the problem is that you'd have to know how to print the value,
which can be in one of several different C types.  You'd have to grow
some more infrastructure in the GUC tables, I think, and that doesn't
seem worth the trouble.  Printing as text seems enough.
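
For what it's worth, a minimal sketch of that "just print the display
string" approach (this is not the patch's code; the source >=
PGC_S_DATABASE test is only a rough stand-in for "modified at
database/user/client/session level"):

#include "postgres.h"
#include "commands/explain.h"
#include "utils/guc.h"
#include "utils/guc_tables.h"

static void
show_modified_gucs(ExplainState *es)
{
	struct config_generic **gucs = get_guc_variables();
	int			nguc = GetNumConfigOptions();
	int			i;

	for (i = 0; i < nguc; i++)
	{
		/*
		 * GetConfigOptionByName() goes through the show hook, so whatever
		 * the GUC's native type, we get the displayable text form back.
		 */
		if (gucs[i]->source >= PGC_S_DATABASE)
			ExplainPropertyText(gucs[i]->name,
								GetConfigOptionByName(gucs[i]->name, NULL, false),
								es);
	}
}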

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives

2019-01-14 Thread Peter Geoghegan
On Mon, Jan 14, 2019 at 1:31 PM Tom Lane  wrote:
> Peter Geoghegan  writes:
> > The heapallindexed enhancement that made it into Postgres 11 assumes
> > that the representation of index tuples produced by index_form_tuple()
> > (or all relevant index_form_tuple() callers) is deterministic: for
> > every possible heap tuple input there must be a single possible
> > (bitwise) output.
>
> That assumption seems unbelievably fragile.  How badly do things
> break when it's violated?

Well, they break. You get a false positive report of corruption, since
there isn't a bitwise identical version of the datum from the heap in
the index for that same tuple. This seems to be very unlikely in
practice, but amcheck is concerned with unlikely outcomes.

> Also, is the assumption just that a fixed source tuple will generate
> identical index entries across repeated index_form_tuple attempts?

I would have said that the assumption is that a fixed source tuple
will generate identical index entries. The problem with that is that
my idea of what constitutes a fixed input now seems to have been
faulty. I didn't think that the executor could mutate TOAST state in a
way that made this kind of inconsistency possible.

> Or is it assuming that logically equal index entries will be bitwise
> equal?  The latter is broken on its face, because index_form_tuple()
> doesn't try to hide differences in the toasting state of source
> datums.

Logical equality as I understand the term doesn't enter into it at all
-- B-Tree operator class semantics are not involved here. I'm not sure
if that's what you meant, but I want to be clear on that. amcheck
certainly knows that it cannot assume that scankey logical equality is
the same thing as bitwise equality.
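
A familiar example of that distinction, unrelated to the TOAST issue at
hand: numeric values can be equal under the default opclass while their
stored representations differ.

SELECT '1.0'::numeric = '1.00'::numeric              AS opclass_equal,
       '1.0'::numeric::text = '1.00'::numeric::text  AS same_representation;
-- opclass_equal is true, same_representation is false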

-- 
Peter Geoghegan



Re: Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives

2019-01-14 Thread Tom Lane
Peter Geoghegan  writes:
> The heapallindexed enhancement that made it into Postgres 11 assumes
> that the representation of index tuples produced by index_form_tuple()
> (or all relevant index_form_tuple() callers) is deterministic: for
> every possible heap tuple input there must be a single possible
> (bitwise) output.

That assumption seems unbelievably fragile.  How badly do things
break when it's violated?

Also, is the assumption just that a fixed source tuple will generate
identical index entries across repeated index_form_tuple attempts?
Or is it assuming that logically equal index entries will be bitwise
equal?  The latter is broken on its face, because index_form_tuple()
doesn't try to hide differences in the toasting state of source
datums.

regards, tom lane



Re: COPY FROM WHEN condition

2019-01-14 Thread Tomas Vondra
On 12/13/18 8:09 AM, Surafel Temesgen wrote:
> 
> 
> On Wed, Dec 12, 2018 at 9:28 PM Tomas Vondra wrote:
> 
> 
> Can you also update the docs to mention that the functions called from
> the WHERE clause does not see effects of the COPY itself?
> 
> 
> Of course, I also add the same comment to the insertion method selection.

FWIW I've marked this as RFC and plan to get it committed this week.
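
For readers following along, a hypothetical usage sketch -- the table,
column and file names are invented, and the exact keyword is whatever the
committed patch ends up with (the WHERE clause discussed above):

COPY measurements (city, reading, taken_at)
    FROM '/tmp/measurements.csv' WITH (FORMAT csv)
    WHERE reading IS NOT NULL AND taken_at >= '2019-01-01';

Note that functions called in that clause do not see the rows the COPY
itself is inserting, which is the documentation point mentioned above.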

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: explain plans with information about (modified) gucs

2019-01-14 Thread Tomas Vondra

Hello Sergei,

> This patch correlates with my proposal
> "add session information column to pg_stat_statements"
> https://www.postgresql.org/message-id/3aa097d7-7c47-187b-5913-db8366cd4491%40gmail.com
> They both address the problem to identify the factors that make
> different execution plans for the same SQL statements. You are
> interested in the current settings that affect the execution plan, I'm
> concerned about historical data in pg_stat_statements. From my
> experience the most often offending settings are
> current_schemas/search_path and current_user. Please have in mind that
> probably the same approach that you will use to extend explain plan
> functionality will be eventually implemented to extend
> pg_stat_statements.

Possibly, although I don't have an ambition to export the GUCs into
pg_stat_statements in this patch. There's an issue with merging
different values of GUCs in different executions of a statement, and
it's unclear how to solve that.

> I think that the list of the GUCs that are reported
> by explain plan should be structured like JSON, something like
> extended_settings: { "current_schemas" : ["pg_catalog", "s1", "s2", "public"],
>   "current_user" : "user1",
>   "enable_nestloop" : "off",
>   "work_mem": "32MB"
> }
> It is less important for your explain use case, but is important
> for the pg_stat_statements case.
> pg_stat_statements is often accessed by monitoring and reporting
> tools, and it will be a good idea to have the data here in a
> structured and easily parsed format.

Yes, that's a good point. I think it's fine to keep the current format
for TEXT output, and use a structured format when the explain format is
set to json or yaml. That's what we do for data about Hash nodes, for
example (see show_hash_info).

So I've done that in the attached v5 of the patch, which now produces
something like this:

test=# explain (gucs, format json) select * from t;
   QUERY PLAN
-
 [  +
   {+
 "Plan": {  +
   "Node Type": "Seq Scan", +
   "Parallel Aware": false, +
   "Relation Name": "t",+
   "Alias": "t",+
   "Startup Cost": 0.00,+
   "Total Cost": 61.00, +
   "Plan Rows": 2550,   +
   "Plan Width": 4  +
 }, +
 "GUC": [   +
   "cpu_tuple_cost": "0.02",+
   "work_mem": "1GB"+
 ]  +
   }+
 ]
(1 row)

The one slightly annoying issue is that currently all the options are
formatted as text, including e.g. cpu_tuple_cost. That's because the GUC
options may have a show hook, so I can't access the value directly (not
sure if there's a way around it).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/contrib/auto_explain/auto_explain.c b/contrib/auto_explain/auto_explain.c
index 7b22927674..d3c4def942 100644
--- a/contrib/auto_explain/auto_explain.c
+++ b/contrib/auto_explain/auto_explain.c
@@ -28,6 +28,7 @@ static bool auto_explain_log_verbose = false;
 static bool auto_explain_log_buffers = false;
 static bool auto_explain_log_triggers = false;
 static bool auto_explain_log_timing = true;
+static bool auto_explain_log_gucs = false;
 static int	auto_explain_log_format = EXPLAIN_FORMAT_TEXT;
 static int	auto_explain_log_level = LOG;
 static bool auto_explain_log_nested_statements = false;
@@ -112,6 +113,17 @@ _PG_init(void)
 			 NULL,
 			 NULL);
 
+	DefineCustomBoolVariable("auto_explain.log_gucs",
+			 "Print modified GUC values.",
+			 NULL,
+			 _explain_log_gucs,
+			 false,
+			 PGC_SUSET,
+			 0,
+			 NULL,
+			 NULL,
+			 NULL);
+
 	DefineCustomBoolVariable("auto_explain.log_verbose",
 			 "Use EXPLAIN VERBOSE for plan logging.",
 			 NULL,
@@ -356,6 +368,7 @@ explain_ExecutorEnd(QueryDesc *queryDesc)
 			es->timing = (es->analyze && auto_explain_log_timing);
 			es->summary = es->analyze;
 			es->format = auto_explain_log_format;
+			es->gucs = auto_explain_log_gucs;
 
 			ExplainBeginOutput(es);
 			ExplainQueryText(es, queryDesc);
diff --git a/doc/src/sgml/auto-explain.sgml b/doc/src/sgml/auto-explain.sgml
index 120b168d45..852c69b7bb 100644
--- a/doc/src/sgml/auto-explain.sgml
+++ b/doc/src/sgml/auto-explain.sgml
@@ -169,6 +169,23 @@ LOAD 'auto_explain';
 

 
+   
+
+ auto_explain.log_gucs (boolean)
+ 
+  auto_explain.log_gucs configuration parameter
+ 
+
+
+ 
+  auto_explain.log_gucs controls whether information
+  about modified configuration options are logged with the execution
+  plan.  Only options modified at the database, user, client or session
+  level are considered modified.  This parameter is off by default.
+ 
+
+   
+


Non-deterministic IndexTuple toast compression from index_form_tuple() + amcheck false positives

2019-01-14 Thread Peter Geoghegan
While testing my nbtree heap TID keyspace patch, I came across a case
where amcheck reliably reports corruption. It appeared that a 4 byte
varlena index entry that was expected in an index was not actually
present. However, index scan queries with the "missing" value in their
qual didn't actually give wrong answers. This was reproducible on the
master branch, too. It turned out that the problem has existed ever since
the heapallindexed enhancement was committed to Postgres 11.

The heapallindexed enhancement that made it into Postgres 11 assumes
that the representation of index tuples produced by index_form_tuple()
(or all relevant index_form_tuple() callers) is deterministic: for
every possible heap tuple input there must be a single possible
(bitwise) output. There is no real corruption present with the test
case, though it's not entirely clear that this is best thought of as a
bug in amcheck -- I'd prefer to make sure that amcheck's expectations
are actually met here, rather than have amcheck normalize its input to
eliminate the difference in bitwise representation.

Steps to reproduce are rather delicate -- I stumbled upon the problem
entirely by accident. I can share the full test case if that helps,
but will hold off for now, since it involves a pg_dump that's a few
megabytes in size. Here is an outline of what I'm doing:

pg_restore -d postgres /home/pg/code/suffix_truncation_test/bib_refs_small.dump

pg@postgres:5432 [9532]=# \d mgd.bib_refs
   Table "mgd.bib_refs"
      Column       │            Type             │ Collation │ Nullable │ Default
───────────────────┼─────────────────────────────┼───────────┼──────────┼─────────
 _refs_key │ integer │   │ not null │
 _reviewstatus_key │ integer │   │ not null │
 reftype   │ character(4)│   │ not null │
 authors   │ text│   │  │
 _primary  │ character varying(60)   │   │  │
 title │ text│   │  │
 journal   │ character varying(100)  │   │  │
 vol   │ character varying(20)   │   │  │
 issue │ character varying(25)   │   │  │
 date  │ character varying(30)   │   │  │
 year  │ integer │   │  │
 pgs   │ character varying(30)   │   │  │
 nlmstatus │ character(1)│   │ not null │
 abstract  │ text│   │  │
 isreviewarticle   │ smallint│   │ not null │
 _createdby_key│ integer │   │ not null │ 1001
 _modifiedby_key   │ integer │   │ not null │ 1001
 creation_date │ timestamp without time zone │   │ not null │ now()
 modification_date │ timestamp without time zone │   │ not null │ now()
Indexes:
"bib_refs_pkey" PRIMARY KEY, btree (_refs_key)
"bib_refs_idx_authors" btree (authors)
"bib_refs_idx_createdby_key" btree (_createdby_key)
"bib_refs_idx_isprimary" btree (_primary)
"bib_refs_idx_journal" btree (journal)
"bib_refs_idx_modifiedby_key" btree (_modifiedby_key)
"bib_refs_idx_reviewstatus_key" btree (_reviewstatus_key)
"bib_refs_idx_title" btree (title)
"bib_refs_idx_year" btree (year)

psql -d postgres -c "create table bug (like mgd.bib_refs);"
psql -d postgres -c "create index on bug (title);"
psql -d postgres -c "insert into bug select * from mgd.bib_refs;"
psql -d postgres -c "create extension if not exists amcheck;"
psql -d postgres -c "analyze; set maintenance_work_mem='128MB';"
psql -d postgres -c "select bt_index_parent_check('bug_title_idx', true);"
ERROR:  heap tuple (579,4) from table "bug" lacks matching index tuple
within index "bug_title_idx"

Here are details of the offending datum in the heap:

pg@postgres:5432 [9532]=# select title, length(title),
pg_column_size(title)  from bug where ctid = '(579,4)';
─[ RECORD 1 ]──┬
title  │ Final report on the safety assessment of trilaurin,
triarachidin, tribehenin, tricaprin, tricaprylin, trierucin,
triheptanoin, triheptylundecanoin, triisononanoin, triisopalmitin,
triisostearin, trilinolein, trimyristin, trioctanoin, triolein,
tripalmitin, tripalmitolein, triricinolein, tristearin, triundecanoin,
glyceryl triacetyl hydroxystearate, glyceryl triacetyl ricinoleate,
and gl.
length │ 390
pg_column_size │ 234

Does anyone have any idea why the 4 byte varlena (text) datum in the
single attribute index "bug_title_idx" is uncompressed, while the
value in the heap is compressed? No other value in any other index
happens to trip the problem, though this is a complicated real-world
database with many similar indexes over 

Re: Reducing header interdependencies around heapam.h et al.

2019-01-14 Thread Alvaro Herrera
On 2019-Jan-14, Andres Freund wrote:

> I think the whole pointer hiding game that we play is a really really
> bad idea. We should just about *never* have typedefs that hide the fact
> that something is a pointer. But it's hard to go away from that for
> legacy reasons.
> 
> The problem with your approach is that it's *eminently* reasonable to
> want to forward declare a struct in multiple places. Otherwise you end
> up in issues where you include headers like heapam.h solely for a
> typedef, which obviously doesn't make a ton of sense.

Well, my point is that in an ideal world we would have a header where
the struct is declared once in a very lean header, which doesn't include
almost anything else, so you can include it into other headers
liberally.  Then the struct definitions are any another (heavy) header,
which *does* need to include lots of other stuff in order to be able to
define the structs fully, and would be #included very sparingly, only in
the few .c files that really needed it.

For example, I would split up execnodes.h so that *only* the
typedef/struct declarations are there, and *no* struct definition.  Then
that header can be #included in other headers that need those to declare
functions -- no problem.  Another header (say execstructs.h, whatever)
would contain the struct definition and would only be used by executor
.c files.  So when a struct changes, only the executor is recompiled;
the rest of the source doesn't care, because execnodes.h (the struct
decls) hasn't changed.

This may be too disruptive.  I'm not suggesting that you do things this
way, only describing my ideal world.
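
A generic illustration of that split (the names are invented and this is
not a proposal for actual PostgreSQL headers):

/* foobar.h -- the lean header: declarations only, cheap to include widely */
struct FooBarData;					/* forward declaration */
typedef struct FooBarData *FooBar;	/* exposed pointer typedef */
extern FooBar foobar_create(int nitems);
extern void foobar_frob(FooBar fb);

/* foobar_details.h -- the heavy header: full definition, included only by
 * the few .c files that need to look inside the struct */
struct FooBarData
{
	int			nitems;
	double		weight;
};

With this layout, changing FooBarData only recompiles the files that
include foobar_details.h.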

Your idea of "liberally forward-declaring structs in multiple places"
seems like you don't *have* the lean header at all (only the heavy one
with all the struct definitions), and instead you distribute bits and
pieces of the lean header randomly to the places that need it.  It's
repetitive.  It gets the job done, but it's not *clean*.

> Given the fact that including headers just for a typedef is frequent
> overkill, hiding the typedef in a separate header has basically no
> gain. I also don't quite understand why using a forward declaration with
> struct in the name is that confusing, especially when it only happens in
> the header.

Oh, that's not the confusing part -- that's just repetitive, nothing
more.  What's confusing (IMO) is having two names for the same struct
(one pointer and one full struct).  It'd be useful if it was used the
way I describe above.  But that's the settled project style, so I don't
have any hopes that it'll ever be changed.  

Anyway, I'm not objecting to your patch ... I just don't want it on
record that I'm in love with the current situation :-)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2019-01-14 Thread Tom Lane
David Rowley  writes:
> I ran a few benchmarks on an AWS m5d.large instance based on top of
> c5c7fa261f5. The biggest regression I see is from a simple SELECT 1 at
> around 5-6%. A repeat of your test of SELECT 2+2 showed about half
> that regression so the simple addition function call is introducing
> enough overhead to lower the slowdown percentage by a good amount.

I can reproduce a small slowdown on "SELECT 1;", though for me it's
circa 2% not 5-6%.  I'm not entirely sure that's above the noise level
--- I tend to see variations of that size even from unrelated code
changes.  But to the extent that it's real, it seems like it must be
coming from one of these places:

* replace_empty_jointree adds a few pallocs for the new RTE and
jointree entry.

* After subquery_planner calls replace_empty_jointree, subsequent
places that loop over the rtable will see one entry instead of none.
They won't do much of anything with it, but it's a few more cycles.

* remove_useless_result_rtes is new code; it won't do much in this
case either, but it still has to examine the jointree.

* query_planner() does slightly more work before reaching its fast-path
exit.

None of these are exactly large costs, and it's hard to get rid of
any of them without ugly code contortions.  I experimented with
micro-optimizing the trivial case in remove_useless_result_rtes,
but it didn't seem to make much difference for "SELECT 1", and it
would add cycles uselessly in all larger queries.

I also noticed that I'd been lazy in adding RTE_RESULT support to
build_simple_rel: we can save a couple of palloc's if we give it its own
code path.  That did seem to reduce the penalty a shade, though I'm
still not sure that it's above the noise level.

> SELECT 1; is, I believe, a common query for some connection poolers as a
> sort of ping to the database.  In light of that, the performance drop
> of 2 microseconds per query is not going to amount to very much in
> total for that use case, i.e. you'll need to do half a million pings
> before it'll cost you 1 second of additional CPU time.

Yeah, I agree this is not something to get hot & bothered over, but
I thought it was worth spending an hour seeing if there were any
easy wins.  Not much luck.
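
For anyone who wants to repeat this kind of measurement, a minimal recipe
(the database name is a placeholder; the -M prepared run is a useful
contrast because it factors out repeated parsing and planning):

echo 'SELECT 1;' > select1.sql
pgbench -n -f select1.sql -T 30 postgres
pgbench -n -f select1.sql -T 30 -M prepared postgres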

Anyway, herewith v6, rebased up to HEAD, with the build_simple_rel
improvement and the regression test fix I mentioned earlier.

regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177eba..96d9828 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2476,6 +2476,8 @@ JumbleRangeTable(pgssJumbleState *jstate, List *rtable)
 			case RTE_NAMEDTUPLESTORE:
 APP_JUMB_STRING(rte->enrname);
 break;
+			case RTE_RESULT:
+break;
 			default:
 elog(ERROR, "unrecognized RTE kind: %d", (int) rte->rtekind);
 break;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d..b3894d0 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -5361,7 +5361,7 @@ INSERT INTO ft2 (c1,c2,c3) VALUES (1200,999,'foo') RETURNING tableoid::regclass;
QUERY PLAN
 -
  Insert on public.ft2
-   Output: (tableoid)::regclass
+   Output: (ft2.tableoid)::regclass
Remote SQL: INSERT INTO "S 1"."T 1"("C 1", c2, c3, c4, c5, c6, c7, c8) VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
->  Result
  Output: 1200, 999, NULL::integer, 'foo'::text, NULL::timestamp with time zone, NULL::timestamp without time zone, NULL::character varying, 'ft2   '::character(10), NULL::user_enum
diff --git a/src/backend/executor/execAmi.c b/src/backend/executor/execAmi.c
index 12cb18c..cd77f99 100644
--- a/src/backend/executor/execAmi.c
+++ b/src/backend/executor/execAmi.c
@@ -437,9 +437,12 @@ ExecSupportsMarkRestore(Path *pathnode)
 return ExecSupportsMarkRestore(((ProjectionPath *) pathnode)->subpath);
 			else if (IsA(pathnode, MinMaxAggPath))
 return false;	/* childless Result */
+			else if (IsA(pathnode, GroupResultPath))
+return false;	/* childless Result */
 			else
 			{
-Assert(IsA(pathnode, ResultPath));
+/* Simple RTE_RESULT base relation */
+Assert(IsA(pathnode, Path));
 return false;	/* childless Result */
 			}
 
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 19b65f6..c3d27a0 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -2325,10 +2325,6 @@ range_table_walker(List 

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-14 Thread Tomas Vondra
On 1/14/19 4:31 PM, Tomas Vondra wrote:
> 
> On 1/14/19 12:20 PM, Dean Rasheed wrote:
>> (Removing Adrien from the CC list, because messages to that address
>> keep bouncing)
>>
>> On Sun, 13 Jan 2019 at 00:31, Tomas Vondra wrote:
>>>
>>> On 1/10/19 6:09 PM, Dean Rasheed wrote:
>>>>
>>>> In the previous discussion around UpdateStatisticsForTypeChange(), the
>>>> consensus appeared to be that we should just unconditionally drop all
>>>> extended statistics when ALTER TABLE changes the type of an included
>>>> column (just as we do for per-column stats), since such a type change
>>>> can rewrite the data in arbitrary ways, so there's no reason to assume
>>>> that the old stats are still valid. I think it makes sense to extract
>>>> that as a separate patch to be committed ahead of these ones, and I'd
>>>> also argue for back-patching it.
>>>
>>> Wasn't the agreement to keep stats that don't include column values
>>> (functional dependencies and ndistinct coefficients), and reset only
>>> more complex stats? That's what happens in master and how it's extended
>>> by the patch for MCV lists and histograms.
>>>
>>
>> Ah OK, I misremembered the exact conclusion reached last time. In that
>> case the logic in UpdateStatisticsForTypeChange() looks wrong:
>>
>> /*
>>  * If we can leave the statistics as it is, just do minimal cleanup
>>  * and we're done.
>>  */
>> if (!attribute_referenced && reset_stats)
>> {
>> ReleaseSysCache(oldtup);
>> return;
>> }
>>
>> That should be "|| !reset_stats", or have more parentheses.
> 
> Yeah, it should have been
> 
> if (!(attribute_referenced && reset_stats))
> 
> i.e. there's a parenthesis missing. Thanks for noticing this. I guess a
> regression test for this would be useful.
> 
>> In fact, I think that computing attribute_referenced is unnecessary
>> because the dependency information includes the columns that the
>> stats are for and ATExecAlterColumnType() uses that, so
>> attribute_referenced will always be true.
> Hmmm. I'm pretty sure I came to the conclusion it's in fact necessary,
> but I might be wrong. Will check.
> 

Turns out you were right - the attribute_referenced piece was quite
unnecessary. So I've removed it. I've also extended the regression tests
to verify changing type of another column does not reset the stats.
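
A sketch of the kind of check meant here, assuming the current catalog
layout where the built statistics live in pg_statistic_ext (object names
invented):

CREATE TABLE stats_typechange (a int, b int, c text);
INSERT INTO stats_typechange
  SELECT i % 10, i % 10, i::text FROM generate_series(1, 1000) i;
CREATE STATISTICS stc (ndistinct, dependencies) ON a, b FROM stats_typechange;
ANALYZE stats_typechange;

-- c is not covered by the statistics object, so this must not reset it
ALTER TABLE stats_typechange ALTER COLUMN c TYPE varchar(32);
SELECT stxndistinct IS NOT NULL AS still_built
  FROM pg_statistic_ext WHERE stxname = 'stc';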

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-multivariate-MCV-lists-20190114.patch.gz
Description: application/gzip


0002-multivariate-histograms-20190114.patch.gz
Description: application/gzip


Re: Reducing header interdependencies around heapam.h et al.

2019-01-14 Thread Andres Freund
Hi,

On 2019-01-14 15:36:14 -0300, Alvaro Herrera wrote:
> 0001 -- looks reasonable.  One hunk in executor.h changes LockTupleMode
> to "enum LockTupleMode", but there's no need for that.

Oh, that escaped from an earlier version where I briefly forgot that one
cannot portably forward-declare enums.


> AFAIK the only reason to have the struct FooBarData vs. FooBar (ptr)
> split is so that it's possible to refer to structs without having
> the full struct definition.  I think changing uses of FooBar to "struct
> FooBarData *" defeats the whole purpose -- it becomes pointless noise,
> confusing the reader for no gain.  I've long considered that the struct
> definitions should appear in "internal" headers (such as
> htup_details.h), separate from the pointer typedefs, so that it is the
> forward struct declarations (and the pointer typedefs, where there are
> any) that are part of the exposed API for each module, and not the
> struct definitions.

I think the whole pointer hiding game that we play is a really really
bad idea. We should just about *never* have typedefs that hide the fact
that something is a pointer. But it's hard to go away from that for
legacy reasons.

The problem with your approach is that it's *eminently* reasonable to
want to forward declare a struct in multiple places. Otherwise you end
up in issues where you include headers like heapam.h solely for a
typedef, which obviously doesn't make a ton of sense.

If we were in C11 we could just forward declare the pointer hiding
typedefs in multiple places, and be done with that. But before C11
redundant typedefs aren't allowed. With the C99 move I'm however not
sure if there's any surviving supported compiler that doesn't allow
redundant typedefs as an extension.

Given the fact that including headers just for a typedef is frequent
overkill, hiding the typedef in a separate header has basically no
gain. I also don't quite understand why using a forward declaration with
struct in the name is that confusing, especially when it only happens in
the header.


> I think MissingPtr is a terrible name.  Can we change that while at
> this?

Indeed. I'd just remove the typedef.

Greetings,

Andres Freund



Re: Prepare Transaction support for ON COMMIT DROP temporary tables

2019-01-14 Thread Dimitri Fontaine
Hi,

Tom Lane  writes:
> I'm afraid that this patch is likely to be, if not completely broken,
> at least very much less useful than one could wish by the time we get
> done closing the holes discussed in this other thread:
>
> https://www.postgresql.org/message-id/flat/5d910e2e-0db8-ec06-dd5f-baec420513c3%40imap.cc

Thanks for the review here Tom, and for linking with the other
discussion (Álvaro did that too, thanks!). I've been reviewing it too.

I didn't think about the pg_temp_NN namespaces in my approach, and I
think it might be possible to make it work, but it's getting quite
involved now.

One idea would be that if every temp table in the session belongs to the
transaction, and so does their namespace (we could check through pg_depend
that the namespace doesn't contain anything else besides the
transaction's tables), then we could dispose of the temp schema and the
on-commit-drop tables at PREPARE TRANSACTION time.

Otherwise, as before, prevent the transaction from being a 2PC one.
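
For reference, this is the behavior the patch wants to relax (object and
GID names invented); stock PostgreSQL refuses the PREPARE because the
transaction has operated on temporary tables:

BEGIN;
CREATE TEMP TABLE txn_scratch (id int) ON COMMIT DROP;
PREPARE TRANSACTION 'tx1';
-- fails: the transaction touched temporary objects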

> For instance, if we're going to have to reject the case where the
> session's temporary schema was created during the current transaction,
> then that puts a very weird constraint on whether this case works.

Yeah. The goal of my approach is to transparently get back temp table
support in 2PC when that makes sense, which covers most use cases I've been
confronted with. We use 2PC in Citus, and it would be nice to be able to
use transaction-local temp tables on worker nodes when doing data
ingestion and roll-ups.

> Also, even without worrying about new problems that that discussion
> may lead to, I don't think that the patch works as-is.  The function
> every_on_commit_is_on_commit_drop() does what it says, but that is
> NOT sufficient to conclude that every temp table the transaction has
> touched is on-commit-drop.  This logic will successfully reject cases
> with on-commit-delete-rows temp tables, but not cases where the temp
> table(s) lack any ON COMMIT spec at all.

Thanks! I missed that the lack of ON COMMIT spec would have that impact
in the code. We could add tracking of that I suppose, and will have a
look at how to implement it provided that the other points find an
acceptable solution.

Regards,
-- 
dim



Re: Reducing header interdependencies around heapam.h et al.

2019-01-14 Thread Alvaro Herrera
0001 -- looks reasonable.  One hunk in executor.h changes LockTupleMode
to "enum LockTupleMode", but there's no need for that.

AFAIK the only reason to have the struct FooBarData vs. FooBar (ptr)
split is so that it's possible to refer to structs without having
the full struct definition.  I think changing uses of FooBar to "struct
FooBarData *" defeats the whole purpose -- it becomes pointless noise,
confusing the reader for no gain.  I've long considered that the struct
definitions should appear in "internal" headers (such as
htup_details.h), separate from the pointer typedefs, so that it is the
forward struct declarations (and the pointer typedefs, where there are
any) that are part of the exposed API for each module, and not the
struct definitions.  

I think that would be much more invasive, though, and it's unlikely to
succeed as easily as this simpler approach is.

I think MissingPtr is a terrible name.  Can we change that while at this?

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions

2019-01-14 Thread Tomas Vondra
Hi,

Attached is an updated patch series, merging fixes and changes to TAP
tests proposed by Alexey. I've merged the fixes into the appropriate
patches, and I've kept the TAP changes / new tests as separate patches
towards the end of the series.

I'm a bit unhappy with two aspects of the current patch series:

1) We now track schema changes in two ways - using the pre-existing
schema_sent flag in RelationSyncEntry, and the (newly added) flag in
ReorderBuffer. While those options are used for regular vs. streamed
transactions, fundamentally it's the same thing and so having two
competing ways seems like a bad idea. Not sure what's the best way to
resolve this, though.

2) We've removed quite a few asserts, particularly ensuring sanity of
cmin/cmax values. To some extent that's expected, because by allowing
decoding of in-progress transactions relaxes some of those rules. But
I'd be much happier if some of those asserts could be reinstated, even
if only in a weaker form.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


0001-Add-logical_work_mem-to-limit-ReorderBuffer-20190114.patch.gz
Description: application/gzip


0002-Immediately-WAL-log-assignments-20190114.patch.gz
Description: application/gzip


0003-Issue-individual-invalidations-with-wal_lev-20190114.patch.gz
Description: application/gzip


0004-Extend-the-output-plugin-API-with-stream-me-20190114.patch.gz
Description: application/gzip


0005-Implement-streaming-mode-in-ReorderBuffer-20190114.patch.gz
Description: application/gzip


0006-Add-support-for-streaming-to-built-in-repli-20190114.patch.gz
Description: application/gzip


0007-Track-statistics-for-streaming-spilling-20190114.patch.gz
Description: application/gzip


0008-Enable-streaming-for-all-subscription-TAP-t-20190114.patch.gz
Description: application/gzip


0009-BUGFIX-set-final_lsn-for-subxacts-before-cl-20190114.patch.gz
Description: application/gzip


0010-Add-TAP-test-for-streaming-vs.-DDL-20190114.patch.gz
Description: application/gzip


Re: MERGE SQL statement for PG12

2019-01-14 Thread Simon Riggs
On Mon, 14 Jan 2019 at 15:45, Tomas Vondra wrote:

>
>
> On 1/11/19 12:26 AM, Peter Geoghegan wrote:
> > On Thu, Jan 10, 2019 at 1:15 PM Robert Haas wrote:
> >> I feel like there has been some other thread where this was discussed,
> >> but I can't find it right now.  I think that the "query construction"
> >> logic in transformMergeStmt is fundamentally the wrong way to do this.
> >> I think several others have said the same.  And I don't think this
> >> should be considered for commit until that is rewritten.
> >
> > I agree with that.
> >
> > I think that it's worth acknowledging that Pavan is in a difficult
> > position with this patch. As things stand, Pavan really needs input
> > from a couple of people to put the query construction stuff on the
> > right path, and that input has not been forthcoming. I'm not
> > suggesting that anybody has failed to meet an obligation to Pavan to
> > put time in here, or that anybody has suggested that this is down to a
> > failure on Pavan's part. I'm merely pointing out that Pavan is in an
> > unenviable position with this patch, through no fault of his own, and
> > despite a worthy effort.
> >
>
> I 100% agree with this. I guess this is the phase where you have a patch
> that generally does the thing you want it to do, but others are saying
> the approach is not the right one. But you're under the spell of your
> approach and can't really see why would it be wrong ...
>

There has been *no* discussion on this point, just assertions that it is
wrong. I have no clue how 3 people can all agree something is wrong without
any discussion and nobody even questions it. So I think there is a spell
here, either Darkness or Cone of Silence.

Where is the further detail? Why is it wrong? What bad effects happen if
you do it this way? What is a better way? Is there even a different way to
do it? Nothing has been said, at all, ever.

The mechanism is exactly the one that Heikki recommended for the MERGE
patch some years ago, so he obviously thought it would work. That was not
my invention, nor Pavan's. We already use SQL assembly in multiple other
places without problem, namely Mat Views and RI Triggers; the approach used
here is better than that and doesn't rely on string handling at all.

Most importantly, it works.

BUT if there really is something wrong, Pavan would love to know and is
willing to make any changes suggested. Review requested.

-- 
Simon Riggs                http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-14 Thread Tom Lane
James Coleman  writes:
> Are those invariants we want to keep (and recognize as regression
> tests)? If so, I can confirm that they aren't duplicative of the rest
> of the file, and if not I can remove them.

I can't get terribly excited about them ... if we were changing their
behavior, or if there were a bug discovered here, then I'd think
differently.  But I'm not a fan of adding regression test cycles
without a reason.

regards, tom lane



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-14 Thread James Coleman
On Mon, Jan 14, 2019 at 11:34 AM Tom Lane  wrote:
>
> James Coleman  writes:
> > On Mon, Jan 14, 2019 at 11:08 AM Tom Lane  wrote:
> >> I think these test cases don't actually prove much about the behavior
> >> of your patch.  Wouldn't they be handled by the generic OR-conversion
> >> logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?
>
> > Which ones are you looking at in particular? The "inline" (non-cte)
> > happy and sad path cases have 102 total array elements (as does the
> > happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
> > tests are about the empty array case so must have 0 items (and were
> > the ones that showed different behavior between v3 and v4).
>
> I was looking at the empty-array ones.  I don't see how that reaches
> your patch; we should not be doing predicate_implied_by_simple_clause
> on the ScalarArrayOp itself, because predicate_classify should've
> decided to treat it as an OR clause.

At one point I was getting all 'f' instead of all 't' for those, but
now I can't reproduce it going from either master to my patch or to my
patch without the element count checks, so I'm not sure how/when I was
getting that anymore.

Are those invariants we want to keep (and recognize as regression
tests)? If so, I can confirm that they aren't duplicative of the rest
of the file, and if not I can remove them.

James Coleman



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-14 Thread Tom Lane
James Coleman  writes:
> On Mon, Jan 14, 2019 at 11:08 AM Tom Lane  wrote:
>> I think these test cases don't actually prove much about the behavior
>> of your patch.  Wouldn't they be handled by the generic OR-conversion
>> logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

> Which ones are you looking at in particular? The "inline" (non-cte)
> happy and sad path cases have 102 total array elements (as does the
> happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
> tests are about the empty array case so must have 0 items (and were
> the ones that showed different behavior between v3 and v4).

I was looking at the empty-array ones.  I don't see how that reaches
your patch; we should not be doing predicate_implied_by_simple_clause
on the ScalarArrayOp itself, because predicate_classify should've
decided to treat it as an OR clause.

>> Hm.  That would be annoying, but on reflection I think maybe we don't
>> actually need to worry about emptiness of the array after all.  The
>> thing that we need to prove, at least for the implication case, is
>> "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".

> Are you thinking that implies clause_is_strict_for might not be the
> right/only place? Or just that the code in that function needs to
> consider if it should return different results in some cases to handle
> all 4 proof rules properly?

The latter is what I was envisioning, but I haven't worked out details.

regards, tom lane



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-14 Thread James Coleman
On Mon, Jan 14, 2019 at 11:08 AM Tom Lane  wrote:
>
> James Coleman  writes:
> > On Sun, Jan 13, 2019 at 3:06 PM Tom Lane  wrote:
> >> There's still a logical problem here, which is that in order to
> >> prove that the ScalarArrayOpExpr yields NULL when its LHS does,
> >> you also have to prove that the RHS is not an empty array.
>
> > I've constructed a test case running the test_predtest function on the
> > expression
> > "select x is null, x = any(array[]::int[]) from integers" which I
> > think gets at the logic bug you're noting. In doing this I noticed
> > something I don't fully understand: on master this shows that "x =
> > any(array[]::int[])" proves "x is null" (while my patch shows the
> > opposite). Is this because the second expression being false means
> > there's can be no valid value for x? It also claims it refutes the
> > same, which is even more confusing.
>
> That sounds like a bug, but I think it's actually OK because a vacuous
> OR is necessarily false, and a false predicate "implies" any conclusion
> at all according to the customary definition of implication.  If there
> are any real optimization bugs in this area, it's probably a result of
> calling code believing more than it should about the meaning of a proof.
>
> I think these test cases don't actually prove much about the behavior
> of your patch.  Wouldn't they be handled by the generic OR-conversion
> logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

Which ones are you looking at in particular? The "inline" (non-cte)
happy and sad path cases have 102 total array elements (as does the
happy path cte case), and MAX_SAOP_ARRAY_SIZE is 100. The other two
tests are about the empty array case so must have 0 items (and were
the ones that showed different behavior between v3 and v4).

> > It seems to me that this would mean that an IS NOT NULL index would
> > still not be proven to be usable for a non-constant array (e.g., from
> > a subquery calculated array), and in fact I've included a (failing)
> > test demonstrating that fact in the attached patch. This feeds into my
> > question above about there possibly being another corollary/proof that
> > could be added for this case.
>
> Hm.  That would be annoying, but on reflection I think maybe we don't
> actually need to worry about emptiness of the array after all.  The
> thing that we need to prove, at least for the implication case, is
> "truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
> Per above, if the ScalarArrayOp is necessarily false, then we can
> claim that the implication holds.  However, we need to work through
> all four proof rules (strong vs. weak, implication vs refutation)
> and see which ones this applies to --- I'm not sure offhand that
> the logic works for all four.  (ENOCAFFEINE...)  In any case it
> requires thinking a bit differently about what clause_is_strict_for()
> is really doing.

Are you thinking that implies clause_is_strict_for might not be the
right/only place? Or just that the code in that function needs to
consider if it should return different results in some cases to handle
all 4 proof rules properly?

> > Semi-related: I noticed that predicate_classify handles an ArrayExpr
> > by using list_length; this seems unnecessary if we know we can fail
> > fast. I realize it's not a common problem, but I have seen extremely
> > long arrays before, and maybe we should fall out of the count once we
> > hit max+1 items.
>
> Huh?  list_length() is constant-time.

Facepalm. I'd somehow had it stuck in my head that we actually
iterated the list to count.

James Coleman



Re: insensitive collations

2019-01-14 Thread Daniel Verite
Andreas Karlsson wrote:

> > Nondeterministic collations do address this by allowing canonically
> > equivalent code point sequences to compare as equal.  You still need a
> > collation implementation that actually does compare them as equal; ICU
> > does this, glibc does not AFAICT.
> 
> Ah, right! You could use -ks-identic[1] for this.

Strings that differ like that are considered equal even at this level:

postgres=# create collation identic  (locale='und-u-ks-identic',
provider='icu', deterministic=false);
CREATE COLLATION

postgres=# select 'é' = E'e\u0301' collate "identic";
 ?column? 
--
 t
(1 row)


There's a separate setting "colNormalization", or "kk" in BCP 47

From 
http://www.unicode.org/reports/tr35/tr35-collation.html#Normalization_Setting

  "The UCA always normalizes input strings into NFD form before the
  rest of the algorithm. However, this results in poor performance.
  With normalization=off, strings that are in [FCD] and do not contain
  Tibetan precomposed vowels (U+0F73, U+0F75, U+0F81) should sort
  correctly. With normalization=on, an implementation that does not
  normalize to NFD must at least perform an incremental FCD check and
  normalize substrings as necessary"

But even setting this to false does not mean that NFD and NFC forms
of the same text compare as different:

postgres=# create collation identickk  (locale='und-u-ks-identic-kk-false',
provider='icu', deterministic=false);
CREATE COLLATION

postgres=# select 'é' = E'e\u0301' collate "identickk";
 ?column? 
--
 t
(1 row)

AFAIU such strings may only compare as different when they're not
in FCD form (http://unicode.org/notes/tn5/#FCD)

There are also ICU-specific explanations about FCD here:
http://source.icu-project.org/repos/icu/icuhtml/trunk/design/collation/ICU_collation_design.htm#Normalization

It looks like setting colNormalization to false might provide a
performance benefit when you know your contents are in FCD
form, which is mostly the case according to ICU:

  "Note that all NFD strings are in FCD, and in practice most NFC
  strings will also be in FCD; for that matter most strings (of whatever
  ilk) will be in FCD.
  We guarantee that if any input strings are in FCD, that we will get
  the right results in collation without having to normalize".


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-14 Thread Tom Lane
James Coleman  writes:
> On Sun, Jan 13, 2019 at 3:06 PM Tom Lane  wrote:
>> There's still a logical problem here, which is that in order to
>> prove that the ScalarArrayOpExpr yields NULL when its LHS does,
>> you also have to prove that the RHS is not an empty array.

> I've constructed a test case running the test_predtest function on the
> expression
> "select x is null, x = any(array[]::int[]) from integers" which I
> think gets at the logic bug you're noting. In doing this I noticed
> something I don't fully understand: on master this shows that "x =
> any(array[]::int[])" proves "x is null" (while my patch shows the
> opposite). Is this because the second expression being false means
> there can be no valid value for x? It also claims it refutes the
> same, which is even more confusing.

That sounds like a bug, but I think it's actually OK because a vacuous
OR is necessarily false, and a false predicate "implies" any conclusion
at all according to the customary definition of implication.  If there
are any real optimization bugs in this area, it's probably a result of
calling code believing more than it should about the meaning of a proof.
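
That is, the empty-array ScalarArrayOp already evaluates to plain false,
regardless of the LHS, which is easy to check directly:

SELECT 1 = ANY(ARRAY[]::int[])          AS with_value,
       NULL::int = ANY(ARRAY[]::int[])  AS with_null;
-- both columns come out false (not null)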

I think these test cases don't actually prove much about the behavior
of your patch.  Wouldn't they be handled by the generic OR-conversion
logic, since there's fewer than MAX_SAOP_ARRAY_SIZE items?

> It seems to me that this would mean that an IS NOT NULL index would
> still not be proven to be usable for a non-constant array (e.g., from
> a subquery calculated array), and in fact I've included a (failing)
> test demonstrating that fact in the attached patch. This feeds into my
> question above about there possibly being another corollary/proof that
> could be added for this case.

Hm.  That would be annoying, but on reflection I think maybe we don't
actually need to worry about emptiness of the array after all.  The
thing that we need to prove, at least for the implication case, is
"truth of the ScalarArrayOp implies truth of the IS NOT NULL clause".
Per above, if the ScalarArrayOp is necessarily false, then we can
claim that the implication holds.  However, we need to work through
all four proof rules (strong vs. weak, implication vs refutation)
and see which ones this applies to --- I'm not sure offhand that
the logic works for all four.  (ENOCAFFEINE...)  In any case it
requires thinking a bit differently about what clause_is_strict_for()
is really doing.

> Semi-related: I noticed that predicate_classify handles an ArrayExpr
> by using list_length; this seems unnecessary if we know we can fail
> fast. I realize it's not a common problem, but I have seen extremely
> long arrays before, and maybe we should fall out of the count once we
> hit max+1 items.

Huh?  list_length() is constant-time.

regards, tom lane



Re: MERGE SQL statement for PG12

2019-01-14 Thread Tomas Vondra



On 1/11/19 12:26 AM, Peter Geoghegan wrote:
> On Thu, Jan 10, 2019 at 1:15 PM Robert Haas  wrote:
>> I feel like there has been some other thread where this was discussed,
>> but I can't find it right now.  I think that the "query construction"
>> logic in transformMergeStmt is fundamentally the wrong way to do this.
>> I think several others have said the same.  And I don't think this
>> should be considered for commit until that is rewritten.
> 
> I agree with that.
> 
> I think that it's worth acknowledging that Pavan is in a difficult
> position with this patch. As things stand, Pavan really needs input
> from a couple of people to put the query construction stuff on the
> right path, and that input has not been forthcoming. I'm not
> suggesting that anybody has failed to meet an obligation to Pavan to
> put time in here, or that anybody has suggested that this is down to a
> failure on Pavan's part. I'm merely pointing out that Pavan is in an
> unenviable position with this patch, through no fault of his own, and
> despite a worthy effort.
> 

I 100% agree with this. I guess this is the phase where you have a patch
that generally does the thing you want it to do, but others are saying
the approach is not the right one. But you're under the spell of your
approach and can't really see why would it be wrong ...

> I hope that he sticks it out, because that seems to be what it takes
> to see something like this through. I don't know how to do better at
> that.

In my experience shepherding a patch like this is quite exhausting, and
while I've always advocated for pushing the MERGE patch forward, I'd not
blame Pavan one iota for just abandoning it.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-14 Thread Tomas Vondra


On 1/14/19 12:20 PM, Dean Rasheed wrote:
> (Removing Adrien from the CC list, because messages to that address
> keep bouncing)
> 
> On Sun, 13 Jan 2019 at 00:31, Tomas Vondra wrote:
>>
>> On 1/10/19 6:09 PM, Dean Rasheed wrote:
>>>
>>> In the previous discussion around UpdateStatisticsForTypeChange(), the
>>> consensus appeared to be that we should just unconditionally drop all
>>> extended statistics when ALTER TABLE changes the type of an included
>>> column (just as we do for per-column stats), since such a type change
>>> can rewrite the data in arbitrary ways, so there's no reason to assume
>>> that the old stats are still valid. I think it makes sense to extract
>>> that as a separate patch to be committed ahead of these ones, and I'd
>>> also argue for back-patching it.
>>
>> Wasn't the agreement to keep stats that don't include column values
>> (functional dependencies and ndistinct coefficients), and reset only
>> more complex stats? That's what happens in master and how it's extended
>> by the patch for MCV lists and histograms.
>>
> 
> Ah OK, I misremembered the exact conclusion reached last time. In that
> case the logic in UpdateStatisticsForTypeChange() looks wrong:
> 
> /*
>  * If we can leave the statistics as it is, just do minimal cleanup
>  * and we're done.
>  */
> if (!attribute_referenced && reset_stats)
> {
> ReleaseSysCache(oldtup);
> return;
> }
> 
> That should be "|| !reset_stats", or have more parentheses.

Yeah, it should have been

if (!(attribute_referenced && reset_stats))

i.e. there's a parenthesis missing. Thanks for noticing this. I guess a
regression test for this would be useful.

> In fact, I think that computing attribute_referenced is unnecessary
> because the dependency information includes the columns that the
> stats are for and ATExecAlterColumnType() uses that, so
> attribute_referenced will always be true.
Hmmm. I'm pretty sure I came to the conclusion it's in fact necessary,
but I might be wrong. Will check.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reducing header interdependencies around heapam.h et al.

2019-01-14 Thread Alvaro Herrera
On 2019-Jan-13, Andres Freund wrote:

> One split I am wondering about however is splitting out the sysstable_
> stuff out of genam.h. It's imo a different component and shouldn't
> really be in there. Would be quite a bit of rote work to add all the
> necessary includes over the backend...

Yeah -- unless there's a demonstrable win from this split, I would leave
this part alone, unless you regularly carry a shield to PG conferences.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Reducing header interdependencies around heapam.h et al.

2019-01-14 Thread Alvaro Herrera
On 2019-Jan-13, Andres Freund wrote:

> On 2019-01-13 23:54:58 -0300, Alvaro Herrera wrote:
> > On 2019-Jan-13, Andres Freund wrote:
> > 
> > > Alvaro, you'd introduced the split of HeapScanDesc and HeapScanDescData
> > > being in different files (in a3540b0f657c6352) - what do you think about
> > > this change?  I didn't revert that split, but I think we ought to
> > > consider just relying on a forward declared struct in heapam.h instead,
> > > this split is pretty confusing and seems to lead to more interdependence
> > > in a lot of cases.
> > 
> > I wasn't terribly happy with that split, so I'm not opposed to doing
> > things differently.  For your consideration, I've had this patch lying
> > around for a few years, which (IIRC) reduces the exposure of heapam.h by
> > splitting relscan.h in two.  This applies on top of dd778e9d8884 (and as
> > I recall it worked well there).
> 
> You forgot to attach that patch... :).

Oops :-(  Here it is anyway.  Notmuch reminded me that I had posted this
before, to a pretty cold reception:
https://postgr.es/m/20130917020228.gb7...@eldon.alvh.no-ip.org
Needless to say, I disagree with the general sentiment in that thread
that header refactoring is pointless and unwelcome.

> I'm not sure I see a need to split relscan - note my patch makes it so
> that it's not included by heapam.h anymore, and doing for the same for
> genam.h would be fairly straightforward.  The most interesting bit there
> would be whether we'd add the includes necessary for Snapshot (imo no),
> Relation (?), ScanKey (imo no), or whether to add the necessary includes
> directly.

Ah, you managed to get heapam.h and genam.h out of execnodes.h, which I
think was my main motivation ... that seems good enough to me.  I agree
that splitting relscan.h may not be necessary after these changes.

As for struct Relation, note that for that one you only need relcache.h
which should be lean enough, so it doesn't sound too bad to include that
one wherever.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/access/gin/ginget.c b/src/backend/access/gin/ginget.c
index cb17d383bc..1dfa888ec8 100644
--- a/src/backend/access/gin/ginget.c
+++ b/src/backend/access/gin/ginget.c
@@ -15,7 +15,7 @@
 #include "postgres.h"
 
 #include "access/gin_private.h"
-#include "access/relscan.h"
+#include "access/relscan_details.h"
 #include "miscadmin.h"
 #include "utils/datum.h"
 #include "utils/memutils.h"
diff --git a/src/backend/access/gin/ginscan.c b/src/backend/access/gin/ginscan.c
index afee2dbd92..d22724fcd7 100644
--- a/src/backend/access/gin/ginscan.c
+++ b/src/backend/access/gin/ginscan.c
@@ -15,7 +15,7 @@
 #include "postgres.h"
 
 #include "access/gin_private.h"
-#include "access/relscan.h"
+#include "access/relscan_details.h"
 #include "pgstat.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
diff --git a/src/backend/access/gist/gistget.c b/src/backend/access/gist/gistget.c
index e97ab8f3fd..c58152d342 100644
--- a/src/backend/access/gist/gistget.c
+++ b/src/backend/access/gist/gistget.c
@@ -15,7 +15,7 @@
 #include "postgres.h"
 
 #include "access/gist_private.h"
-#include "access/relscan.h"
+#include "access/relscan_details.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "utils/builtins.h"
diff --git a/src/backend/access/gist/gistscan.c b/src/backend/access/gist/gistscan.c
index b5553ffe69..e7e66141b0 100644
--- a/src/backend/access/gist/gistscan.c
+++ b/src/backend/access/gist/gistscan.c
@@ -16,7 +16,7 @@
 
 #include "access/gist_private.h"
 #include "access/gistscan.h"
-#include "access/relscan.h"
+#include "access/relscan_details.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 
diff --git a/src/backend/access/hash/hash.c b/src/backend/access/hash/hash.c
index 8895f58503..3f52a613cb 100644
--- a/src/backend/access/hash/hash.c
+++ b/src/backend/access/hash/hash.c
@@ -19,7 +19,7 @@
 #include "postgres.h"
 
 #include "access/hash.h"
-#include "access/relscan.h"
+#include "access/relscan_details.h"
 #include "catalog/index.h"
 #include "commands/vacuum.h"
 #include "optimizer/cost.h"
diff --git a/src/backend/access/hash/hashscan.c b/src/backend/access/hash/hashscan.c
index 8c2918f14f..54e91f3395 100644
--- a/src/backend/access/hash/hashscan.c
+++ b/src/backend/access/hash/hashscan.c
@@ -16,7 +16,7 @@
 #include "postgres.h"
 
 #include "access/hash.h"
-#include "access/relscan.h"
+#include "access/relscan_details.h"
 #include "utils/memutils.h"
 #include "utils/rel.h"
 #include "utils/resowner.h"
diff --git a/src/backend/access/hash/hashsearch.c b/src/backend/access/hash/hashsearch.c
index 91661ba0e0..9ab44d047f 100644
--- a/src/backend/access/hash/hashsearch.c
+++ b/src/backend/access/hash/hashsearch.c
@@ -15,7 +15,7 @@
 #include "postgres.h"
 
 #include "access/hash.h"
-#include "access/relscan.h"
+#include "access/relscan_details.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include 

Re: insensitive collations

2019-01-14 Thread Andreas Karlsson

On 1/10/19 8:44 AM, Peter Eisentraut wrote:
> On 09/01/2019 19:49, Andreas Karlsson wrote:
>> Maybe this is orthogonal and best handled elsewhere, but have you given
>> any thought to unicode normalization forms[1] when working with string
>> equality?
>
> Nondeterministic collations do address this by allowing canonically
> equivalent code point sequences to compare as equal.  You still need a
> collation implementation that actually does compare them as equal; ICU
> does this, glibc does not AFAICT.

Ah, right! You could use -ks-identic[1] for this.

>> Would there be any point in adding unicode normalization support into
>> the collation system, or is this best handled, for example, with a
>> function run on INSERT, or with something else entirely?
>
> I think there might be value in a feature that normalizes strings as
> they enter the database, as a component of the encoding conversion
> infrastructure.  But that would be a separate feature.

Agreed. And if we ever implement this we could theoretically optimize 
the equality of -ks-identic to do a strcmp() rather than having to 
collate anything.

I think it could also be useful to just add functions which can 
normalize strings, which was in a proposal to the SQL standard which was 
not accepted.[2]


Notes

1. http://www.unicode.org/reports/tr35/tr35-collation.html#Setting_Options
2. https://dev.mysql.com/worklog/task/?id=2048

Andreas




Re: Proving IS NOT NULL inference for ScalarArrayOpExpr's

2019-01-14 Thread James Coleman
On Sun, Jan 13, 2019 at 3:06 PM Tom Lane  wrote:
>
> There's still a logical problem here, which is that in order to
> prove that the ScalarArrayOpExpr yields NULL when its LHS does,
> you also have to prove that the RHS is not an empty array.
>
> Otherwise you're up against the fact that the OR of zero boolean
> expressions is false not null:
>
> regression=# select null::int = any(array[]::int[]);
>  ?column?
> ----------
>  f
> (1 row)

I've constructed a test case running the test_predtest function on the
expression
"select x is null, x = any(array[]::int[]) from integers" which I
think gets at the logic bug you're noting. In doing this I noticed
something I don't fully understand: on master this shows that "x =
any(array[]::int[])" proves "x is null" (while my patch shows the
opposite). Is this because the second expression being false means
there can be no valid value for x? It also claims it refutes the
same, which is even more confusing.
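
For reference, the raw SQL behaviour underlying this (plain queries, not
test_predtest output):

select null::int = any(array[]::int[]);  -- false: OR over zero elements
select null::int = any(array[1, 2]);     -- null: each comparison yields null
select 42 = any(array[]::int[]);         -- false for any LHS, null or not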

> While this is doable when the RHS is an array constant or ArrayExpr, I'm
> afraid it's likely to be a bit tedious.  (Hm, actually predicate_classify
> already has logic to extract the number of elements in these cases ...
> I wonder if there's any use in trying to share code?)

It seems to me that this would mean that an IS NOT NULL index would
still not be proven to be usable for a non-constant array (e.g., from
a subquery calculated array), and in fact I've included a (failing)
test demonstrating that fact in the attached patch. This feeds into my
question above about there possibly being another corollary/proof that
could be added for this case.

Semi-related: I noticed that predicate_classify handles an ArrayExpr
by using list_length; this seems unnecessary if we know we can fail
fast. I realize it's not a common problem, but I have seen extremely
long arrays before, and maybe we should fall out of the count once we
hit max+1 items. My new code also has this "problem", but I figured it
was worth getting this logically correct before attempting to optimize
that, especially since it already exists in one place.

> Minor stylistic quibble: I don't like where you inserted the new code in
> clause_is_strict_for, because it renders the comment a few lines above
> that unrelatable to the code.  I'd put it at the bottom, after the
> is_funcclause stanza.

Fixed in this revision.

James Coleman


saop_is_not_null-v4.patch
Description: Binary data


Re: Pluggable Storage - Andres's take

2019-01-14 Thread Amit Khandekar
Thanks for the patch updates.

A few comments so far from me:

+static void _selectTableAccessMethod(ArchiveHandle *AH, const char
*tablespace);
tablespace => tableam

+_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
+{
+   PQExpBuffer cmd = createPQExpBuffer();
createPQExpBuffer() should be moved after the below statement, so that
it does not leak memory:
if (have && strcmp(want, have) == 0)
return;
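
In other words, roughly this (just a sketch; the surrounding code and the
currTableAm field name are assumptions):

static void
_selectTableAccessMethod(ArchiveHandle *AH, const char *tableam)
{
	PQExpBuffer cmd;
	const char *want = tableam;
	const char *have = AH->currTableAm;	/* field name assumed */

	if (have && strcmp(want, have) == 0)
		return;					/* nothing to do, nothing allocated yet */

	cmd = createPQExpBuffer();	/* allocate only once we know we need it */
	/* ... emit SET default_table_access_method, remember "want" ... */
	destroyPQExpBuffer(cmd);
}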

char*tableam; /* table access method, onlyt for TABLE tags */
Indentation is a bit misaligned. onlyt=> only



@@ -2696,6 +2701,7 @@ ReadToc(ArchiveHandle *AH)
te->tablespace = ReadStr(AH);

te->owner = ReadStr(AH);
+  te->tableam = ReadStr(AH);

Above, I am not sure about this, but we may possibly need an
archive-version check like the one done for tablespace:
if (AH->version >= K_VERS_1_10)
   te->tablespace = ReadStr(AH);

So how about bumping up the archive version and doing these checks?
Otherwise, if we run pg_restore using an old version, we may read some
junk into te->tableam, or possibly crash. As I said, I am not sure
about this due to my limited understanding of archive versioning,
but let me know if you indeed find this to be an issue.
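
Concretely, something along these lines (the version constant here is only
a placeholder for whichever version adds the field):

if (AH->version >= K_VERS_1_XX)		/* placeholder constant */
	te->tableam = ReadStr(AH);
else
	te->tableam = NULL;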



Re: [Patch] Create a new session in postmaster by calling setsid()

2019-01-14 Thread Heikki Linnakangas

On 30/12/2018 22:56, Tom Lane wrote:

> This comment needs copy-editing:
>
> + * If the user hits interrupts the startup (e.g. with CTRL-C), we'd
>
> Looks good otherwise.


Thanks, fixed that, and pushed.

- Heikki




Re: insensitive collations

2019-01-14 Thread Daniel Verite
Peter Eisentraut wrote:

> Here is an updated patch.

On a table with pre-existing contents, creating a unique index does not
seem to detect duplicates that are equal per the collation but different
binary-wise.

postgres=# \d test3ci
Table "public.test3ci"
 Column | Type |Collation | Nullable | Default 
+--+--+--+-
 x  | text | case_insensitive |  | 

postgres=# select * from test3ci;
  x  
-----
 abc
 ABC
 def
 ghi
(4 rows)

postgres=# create unique index idx on test3ci(x);  -- EXPECTED TO FAIL
CREATE INDEX

postgres=# \d test3ci
Table "public.test3ci"
 Column | Type |Collation | Nullable | Default 
+--+--+--+-
 x  | text | case_insensitive |  | 
Indexes:
"idx" UNIQUE, btree (x)

postgres=# select count(*) from test3ci where x='abc';
 count 
-------
     2
(1 row)

OTOH with an already existing unique index, attempts to insert
such duplicates are rejected as expected.
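
(For context, the case_insensitive collation above was created along the
lines of the patch's ICU examples, presumably something like:

CREATE COLLATION case_insensitive (
  provider = icu, locale = 'und-u-ks-level2', deterministic = false
);
)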


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Remove all "INTERFACE ROUTINES" style comments

2019-01-14 Thread Heikki Linnakangas

On 12/01/2019 03:12, Andres Freund wrote:

> On 2019-01-10 15:58:41 -0800, Andres Freund wrote:
>> A number of postgres files have sections like heapam's
>>
>>   * INTERFACE ROUTINES
>> ...
>> They're often out-of-date, and I personally never found them to be
>> useful. A few people, including yours truly, have been removing a few
>> here and there when overhauling a subsystem to avoid having to update
>> and then adjust them.
>>
>> I think it might be a good idea to just do that for all at once. Having
>> to consider separately committing a removal, updating them without
>> fixing preexisting issues, or just leaving them outdated on a regular
>> basis imo is a useless distraction.
>
> As the reaction was positive, here's a first draft of a commit removing
> them. A few comments:
>
> - I left two INTERFACE ROUTINES blocks intact, because they actually add
>   somewhat useful information. Namely fd.c's, which actually seems
>   useful, and predicate.c's about which I'm less sure.
> - I tried to move all comments about the routines in the INTERFACE
>   section to the functions if they didn't have a roughly equivalent
>   comment. Even if the comment wasn't that useful. Particularly just
>   about all the function comments in executor/node*.c files are useless,
>   but I thought that's something best to be cleaned up separately.
> - After removing the INTERFACE ROUTINES blocks a number of executor
>   files had a separate comment block with just a NOTES section. I merged
>   these with the file header comment blocks, and indented them to
>   match. I think this is better, but I'm only like 60% convinced of
>   that.
>
> Comments? I'll revisit this patch on Monday or so, make another pass
> through it, and push it then.


I agree that just listing all the public functions in a source file is 
not useful. But listing the most important ones, perhaps with examples 
on how to use them together, or grouping functions when there are a lot 
of them, is useful. A high-level view of the public interface is 
especially useful for people who are browsing the code for the first time.


The comments in execMain.c seemed like a nice overview of the interface. 
I'm not sure the comments on each function do quite the same thing. The 
grouping of the functions in pqcomm.c seemed useful, especially since 
some of the routines listed there are actually macros defined in 
libpq.h, so someone who only looks at the contents of pqcomm.c might 
not realize that there's more in libpq.h. The grouping in pqformat.c 
also seemed useful.


In that vein, the comments in heapam.c could be re-structured, something 
like this:


 * Opening/closing relations
 * -------------------------
 *
 * The relation_* functions work on any relation, not only heap
 * relations:
 *
 *  relation_open   - open any relation by relation OID
 *  relation_openrv - open any relation specified by a RangeVar
 *  relation_close  - close any relation
 *
 * These are similar, but check that the relation is a Heap
 * relation:
 *
 *  heap_open    - open a heap relation by relation OID
 *  heap_openrv  - open a heap relation specified by a RangeVar
 *  heap_close   - (now just a macro for relation_close)
 *
 * Heap scans
 * ----------
 *
 * Functions for doing a Sequential Scan on a heap table:
 *
 *  heap_beginscan  - begin relation scan
 *  heap_rescan - restart a relation scan
 *  heap_endscan- end relation scan
 *  heap_getnext- retrieve next tuple in scan
 *
 * To retrieve a single heap tuple, by tid:
 *  heap_fetch  - retrieve tuple with given tid
 *
 * Updating a heap
 * ---------------
 *
 *  heap_insert - insert tuple into a relation
 *  heap_multi_insert   - insert multiple tuples into a relation
 *  heap_delete - delete a tuple from a relation
 *  heap_update - replace a tuple in a relation with another tuple
 *  heap_sync   - sync heap, for when no WAL has been written

- Heikki



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-14 Thread Dean Rasheed
(Removing Adrien from the CC list, because messages to that address
keep bouncing)

On Sun, 13 Jan 2019 at 00:31, Tomas Vondra  wrote:
>
> On 1/10/19 6:09 PM, Dean Rasheed wrote:
> >
> > In the previous discussion around UpdateStatisticsForTypeChange(), the
> > consensus appeared to be that we should just unconditionally drop all
> > extended statistics when ALTER TABLE changes the type of an included
> > column (just as we do for per-column stats), since such a type change
> > can rewrite the data in arbitrary ways, so there's no reason to assume
> > that the old stats are still valid. I think it makes sense to extract
> > that as a separate patch to be committed ahead of these ones, and I'd
> > also argue for back-patching it.
>
> Wasn't the agreement to keep stats that don't include column values
> (functional dependencies and ndistinct coefficients), and reset only
> more complex stats? That's what happens in master and how it's extended
> by the patch for MCV lists and histograms.
>

Ah OK, I misremembered the exact conclusion reached last time. In that
case the logic in UpdateStatisticsForTypeChange() looks wrong:

/*
 * If we can leave the statistics as it is, just do minimal cleanup
 * and we're done.
 */
if (!attribute_referenced && reset_stats)
{
ReleaseSysCache(oldtup);
return;
}

That should be "|| !reset_stats", or have more parentheses. In fact, I
think that computing attribute_referenced is unnecessary because the
dependency information includes the columns that the stats are for and
ATExecAlterColumnType() uses that, so attribute_referenced will always
be true.
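
If so, the early-return test would presumably reduce to just (a sketch):

/* keep the stats only when no reset is required for this stats kind */
if (!reset_stats)
{
    ReleaseSysCache(oldtup);
    return;
}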

Regards,
Dean



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-14 Thread Dean Rasheed
On Sun, 13 Jan 2019 at 00:04, Tomas Vondra  wrote:
> On 1/12/19 8:49 AM, Dean Rasheed wrote:
> > A possible refinement would be to say that if there are more than
> > stats_target items more common than this mincount threshold, rather than
> > excluding the least common ones to get the target number of items,
> > exclude the ones closest to their base frequencies, on the grounds that
> > those are the ones for which the MCV stats will make the least
> > difference. That might complicate the code somewhat though -- I don't
> > have it in front of me, so I can't remember if it even tracks more than
> > stats_target items.
>
> Yes, the patch does limit the number of items to stats_target (a maximum
> of per-attribute stattarget values, to be precise). IIRC that's a piece
> you've added sometime last year ;-)
>
> I've been experimenting with removing items closest to base frequencies
> today, and I came to the conclusion that it's rather tricky for a couple
> of reasons.
>
> 1) How exactly do you measure "closeness" to base frequency? I've tried
> computing the error in different ways, including:
>
>   * Max(freq/base, base/freq)
>   * abs(freq - base)
>
> but this does not seem to affect the behavior very much, TBH.
>
> 2) This necessarily reduces mcv_totalsel, i.e. it increases the part not
> covered by MCV. And estimates on this part are rather crude.
>
> 3) It does nothing for "impossible" items, i.e. combinations that do not
> exist at all. Clearly, those won't be part of the sample, and so can't
> be included in the MCV no matter which error definition we pick. And for
> very rare combinations it might lead to sudden changes, depending on
> whether the group gets sampled or not.
>
> So IMHO it's better to stick to the simple SRE approach for now.
>

OK, that makes sense.

Regards,
Dean



Re: [WIP] CREATE SUBSCRIPTION with FOR TABLES clause (table filter)

2019-01-14 Thread Andrey Borodin
Hi!

> On 27 Dec 2018, at 12:54, Evgeniy Efimkin  wrote:
> 
> In the latest patch I removed the `FOR ALL TABLES` clause and the `alltables` parameter, 
> so now it looks simpler.
> Added a new system view, pg_user_subscription, to allow non-superusers to use pg_dump 
> and select additional columns from pg_subscription.
> Updated the docs.

I've reviewed patch again, here are my notes:

1. In create_subscription.sgml and some others: for "All tables listed in clause 
must be present in publication", I think it is better to write "All tables listed 
in the clause must be present in the publication". But I'm not a native speaker; it 
just looks like it would be good if someone proofread the docs.

2. Should the new view be called pg_user_subscription or pg_user_subscriptions? 
Nearby views are plural, e.g. pg_publication_tables.

3. I do not know how this view will get into the db during pg_upgrade. Is that 
somehow handled?

4. In subscriptioncmds.c:
> if (stmt->tables&&!connect)
Some spaces are needed to make the && readable.

5. In the same file, in CreateSubscription(), there's a foreach collecting tablesiods. 
These oids are necessary only in one branch of the following:
> if (stmt->tables)
Maybe we should refactor this and move tablesiods closer to the place where they 
are used?

6. In AlterSubscription_set_table() and AlterSubscription_drop_table(), palloc'd 
memory is not freed. Is that OK, or is it a leak?

7. 
> errmsg("table \"%s.%s\" not in preset subscription"
Should it be "table is not present in subscription"?

Besides this, the patch looks good to me.

Thanks for working on this!

Best regards, Andrey Borodin.