RE: Protect syscache from bloating with negative cache entries

2019-01-24 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> On Thu, Jan 24, 2019 at 10:02 AM Tom Lane  wrote:
> > I will argue hard that we should not do it at all, ever.
> >
> > There is already a mechanism for broadcasting global GUC changes:
> > apply them to postgresql.conf (or use ALTER SYSTEM) and SIGHUP.
> > I do not think we need something that can remotely change a GUC's
> > value in just one session.  The potential for bugs, misuse, and
> > just plain confusion is enormous, and the advantage seems minimal.
> 
> I think there might be some merit in being able to activate debugging
> or tracing facilities for a particular session remotely, but designing
> something that will do that sort of thing well seems like a very
> complex problem that certainly should not be sandwiched into another
> patch that is mostly about something else.  And if we ever get such a
> thing I suspect it should be entirely separate from the GUC system.

+1 for a separate patch for remote session configuration.  ALTER SYSTEM + 
SIGHUP targeted at a particular backend would do if the DBA can log into the 
database server (so it can't be used for DBaaS).  It would be useful to have 
pg_reload_conf(pid).
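
For illustration, a sketch of how that might look (the one-argument form is
hypothetical; only the zero-argument pg_reload_conf() exists today):

ALTER SYSTEM SET log_min_duration_statement = '250ms';
SELECT pg_reload_conf();          -- existing form: SIGHUP every backend
-- SELECT pg_reload_conf(12345);  -- proposed form: SIGHUP one backend by pid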


Regards
Takayuki Tsunakawa




Re: PostgreSQL vs SQL/XML Standards

2019-01-24 Thread Pavel Stehule
On Fri, Jan 25, 2019 at 5:46 AM Chapman Flack wrote:

> Hi,
>
> On 01/21/19 01:33, Pavel Stehule wrote:
> > On Sun, Jan 20, 2019 at 11:13 PM Chapman Flack wrote:
>
> >> form (whether or not the word LATERAL is used), and re-executes xmltable
> >> whenever the referenced column value changes. In that case, whether the
> >> default argument is evaluated at function entry or later doesn't seem
> >> to matter: the function is re-executed, so evaluating the new default
> >> at the time of entry is sufficient.
> >
> > It makes sense only for volatile functions, which are not common.  On the
> > other hand, deferred evaluation should not be a problem; moreover, the
> > default is evaluated only when it is necessary, which seems more sensible
> > to me.
>
> That makes sense. I trimmed the language about input rows and referring to
> earlier columns, and put more emphasis on the usefulness of evaluating
> volatile functions only when needed.
>
> I am:
>
> - re-attaching xmltable-xpath-result-processing-bugfix-5.patch unchanged
>   (just so CF app does not lose track)
>
> - re-attaching xmltable-xmlexists-passing-mechanisms-1.patch unchanged
>
> - attaching for the first time xml-functions-type-docfix-1.patch
>
> The doc patch is made to go on top of the passing-mechanisms patch
> (there were some doc changes in passing-mechanisms, changed again in
> the new patch to be links to the added Limits and Compatibility section).
>
> I hope the patched docs describe accurately what we have at this point.
>

looks good

Pavel


> Regards,
> -Chap
>


Re: proposal - plpgsql unique statement id

2019-01-24 Thread Pavel Stehule
On Thu, Jan 24, 2019 at 11:08 PM Tom Lane wrote:

> Peter Eisentraut  writes:
> > committed
>
> Why didn't this patch modify the dumping logic in pl_funcs.c to print
> the IDs?  I'm not aware of other cases where we intentionally omit
> fields from debug-support printouts.
>

I had no idea this field could be useful there.  I'll send a patch with
it.

Regards

Pavel



> regards, tom lane
>


Use zero for nullness estimates of system attributes

2019-01-24 Thread Edmund Horner
I added some code to selfuncs.c to estimate the selectivity of CTID,
including nullness, in my ongoing attempt to add TID range scans [1].  And
as Tom pointed out [2], no system attribute can be null, so we might as
well handle them all.

That's what the attached patch does.
I observed a few interesting things with outer join selectivity:

While system attributes aren't NULL in the table, they can be in queries
such as:

SELECT *
FROM a LEFT JOIN b ON a.id = b.id
WHERE b.ctid IS NULL;

And the patch does affect the estimates for such plans.  But it's just
replacing one hardcoded nullness estimate (0.005) with another (0.0), which
seems no worse than the original.

I was a bit concerned that with, for example,

CREATE TABLE a (id INTEGER);
INSERT INTO a SELECT * FROM generate_series(1,1000);
ANALYZE a;
CREATE TABLE b (id INTEGER, id2 INTEGER);
INSERT INTO b SELECT *, * FROM generate_series(1,10);
ANALYZE b;

EXPLAIN ANALYZE
SELECT * FROM a LEFT JOIN b ON a.id = b.id
WHERE b.ctid IS NULL;

you get a row estimate of 1 (vs the actual 990).  It's not specific to
system attributes.  Plain left-join selectivity calculation doesn't seem to
take into account the join selectivity, while anti-join calculation does.

I do not think this affects the usefulness of the present patch, but maybe
it's something we could improve.
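
For comparison, writing the query as an explicit anti-join (NOT EXISTS) lets
the planner use the anti-join selectivity path, which does account for join
selectivity; a sketch using the tables above:

EXPLAIN ANALYZE
SELECT * FROM a
WHERE NOT EXISTS (SELECT 1 FROM b WHERE b.id = a.id);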

Finally: I thought about introducing a macro to attnum.h:

/*
 * AttrNumberIsForSystemAttr
 *		True iff the attribute number corresponds to a system attribute.
 */
#define AttrNumberIsForSystemAttr(attributeNumber) \
	((bool) ((attributeNumber) < 0))

But there's a zillion places that could be changed to use it, so I haven't
in this version of the patch.

Edmund

[1]
https://www.postgresql.org/message-id/flat/31682.1545415852%40sss.pgh.pa.us#bdca5c18ed64f847f44c2645f98ea3a0
[2] https://www.postgresql.org/message-id/31682.1545415852%40sss.pgh.pa.us


v1-nullness-selectivity-for-system-cols.patch
Description: Binary data


Re: backslash-dot quoting in COPY CSV

2019-01-24 Thread Bruce Momjian
On Wed, Jan  2, 2019 at 04:58:35PM +0100, Daniel Verite wrote:
>  Hi,
> 
> The doc on COPY CSV says about the backslash-dot sequence:
> 
>   To avoid any misinterpretation, a \. data value appearing as a
>   lone entry on a line is automatically quoted on output, and on
>   input, if quoted, is not interpreted as the end-of-data marker
> 
> However this quoting does not happen when \. is already part
> of a quoted field. Example:
> 
> COPY (select 'somevalue', E'foo\n\\.\nbar') TO STDOUT CSV;
> 
> outputs:
> 
> somevalue,"foo
> \.
> bar"
> 
> which conforms to the CSV rules, by which we are not allowed
> to replace \. by anything AFAICS.
> The trouble is, when trying to import this back with COPY FROM,
> it will error out at the backslash-dot and not import anything.
> Furthermore, if these data are meant to be embedded into a
> script, it creates a potential risk of SQL injection.
> 
> Is this a known issue? I haven't found previous discussions on this.
> It looks to me like the ability of backslash-dot to be an end-of-data
> marker should be neutralizable for CSV. When the data is not embedded,
> it's not needed anyway, and when it's embedded, we could surely think
> of alternatives.

I was unable to reproduce the failure here using files:

CREATE TABLE test (x TEXT);
INSERT INTO test VALUES (E'foo\n\\.\nbar');

COPY test TO STDOUT CSV;
"foo
\.
bar"

COPY test TO '/u/postgres/tmp/x' CSV;

COPY test FROM '/u/postgres/tmp/x' CSV;

SELECT * FROM test;
  x
-----
 foo+
 \. +
 bar
 foo+
 \. +
 bar

but I am able to see the failure using STDIN:

COPY test FROM STDIN CSV;
Enter data to be copied followed by a newline.
End with a backslash and a period on a line by itself, or an EOF signal.
"foo
\.
ERROR:  unterminated CSV quoted field
CONTEXT:  COPY test, line 1: ""foo

This seems like a bug to me.  Looking at the code, psql issues the
prompts for STDIN, but when it sees \. alone on a line, it has no idea
you are in a quoted CSV string, so it thinks the copy is done and sends
the result to the server.  I can't see an easy way to fix this.  I guess
we could document it.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: WIP: Avoid creation of the free space map for small tables

2019-01-24 Thread Amit Kapila
On Fri, Jan 25, 2019 at 1:03 AM John Naylor  wrote:
>
> On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila  wrote:
> > I think what doc means to say is
> > that it copies any unlinked files present in primary's new cluster
> > (which in your case will be data2).
>
> In that case, I'm still confused why that doc says, "Unfortunately,
> rsync needlessly copies files associated with temporary and unlogged
> tables because these files don't normally exist on standby servers."
> I fail to see why the primary's new cluster would have these if they
> weren't linked.
>

Why wouldn't unlogged files be in the primary's new cluster?  After the
upgrade, they should be present in the new cluster if they were present
in the old cluster.

> And in the case we're discussing here, the skipped
> FSMs won't be on data2, so won't end up in standby/data2.
>

Right.  I think we are safe with respect to rsync because I have seen
that we do rewrite the vm files in link mode and rsync will copy them
from primary's new cluster.

I think you can try to address my other comments on your pg_upgrade
patch.  Once we agree on the code, we need to test the scenarios below:
(a) upgrade from all supported versions to the latest version
(b) upgrade standby with and without using rsync.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



RE: pg_upgrade: Pass -j down to vacuumdb

2019-01-24 Thread Jamison, Kirk
Hi,

According to CF app, this patch needs review so I took a look at it.
Currently, your patch applies and builds cleanly, and all tests are also
successful according to the CF bot patch tester.

I'm not sure if I have understood the argument raised by Peter correctly. 
Quoting Peter, "it's not clear that pg_upgrade and vacuumdb are bound the same 
way, so it's not a given that the same -j number should be used."
I think the question is whether the number of jobs for pg_upgrade applies 
the same way to parallel vacuum.

According to the official docs, the recommended setting for pg_upgrade's -j 
option is the maximum of the number of CPU cores and tablespaces. [1]
As for vacuumdb, the recommended setting for parallel vacuum (-j) is based on 
your max_connections [2], which is the maximum number of concurrent 
connections to your database server.

[1] https://www.postgresql.org/docs/current/pgupgrade.html
[2] https://www.postgresql.org/docs/current/app-vacuumdb.html


Regards,
Kirk Jamison


Re: \describe*

2019-01-24 Thread Corey Huinker
Attached is a patch to add verbose \describe commands to complement our
existing but slightly cryptic family of \d commands.

The goals of this are:
- aid user discovery of \d-commands via tab completion
- make scripts and snippets slightly more self-documenting and
understandable
- save experienced users that 0.22 seconds where they try to remember what
\dFpS+ means or which command lists user mappings.

DESIGN CHOICES:

Every new command is of the form
\describe-some-system-object-type[-system][-verbose]. The -system suffix
stands in for the 'S' suffix and -verbose stands in for '+'.

New commands use the singular form, not the plural.

Every new command has a direct analog \d-command, but the reverse is not
always true, especially when it comes to the commands that can specify
multiple object types. In those cases, there are multiple long versions
that correspond to several singular parameters (\describe-view,
\describe-materialized-view, \describe-index, etc) but no combinatorics
(i.e. no \describe-view-and-foreign-table).

There is a \describe-schema and \describe-namespace, both of which perform
\dn.

There is a \describe-role but no \describe-user or \describe-database-role.

I chose \describe-privilege for \dp.

I chose \describe-type for \dT instead of \describe-data-type.

The command \describe-aggregate-function is \dfa, whereas
\describe-aggregate is \da.
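
For concreteness, a few of the resulting short-form/long-form equivalences
(hypothetical until the patch is applied):

\df+ foo  =>  \describe-function-verbose foo
\dv       =>  \describe-view
\dn       =>  \describe-namespace
\dp       =>  \describe-privilege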

NOTES:

There is currently nothing stopping you from using the short form suffixes
on long form commands, but the reverse isn't true. For example, you can
type \describe-functionS+ and it'll work, but \df-verbose will not. I allow
this mostly because it would take work to prevent it.

Documentation XML was updated but not formatted to make the diff easier to
read.

No regression cases were added. Currently our coverage of \d commands in
psql itself is quite minimal:

~/src/postgres$ grep '\\d' src/test/regress/sql/psql.sql | sort | uniq
\copyright
\d psql_serial_tab_id_seq
\df exp
\dt arg1
\e arg1 arg2


but perhaps we could test it indirectly in these other areas:

~/src/postgres/src/test/regress/sql$ grep '\\d' * | sed -e 's/^.*\\d/\\d/g'
-e 's/ .*//g' | sort | uniq -c
156 \d
  2 \d'
  1 \d*',
157 \d+
  1 \d{4})',
  1 \da
  2 \d+):([a-zA-Z0-9+/=]+)\$([a-zA-Z0-9+=/]+):([a-zA-Z0-9+/=]+)',
  4 \des
  8 \des+
  1 \det+
  4 \deu
  6 \deu+
  1 \dew
 14 \dew+
 21 \df
  1 \dfn
  1 \dfp
  4 \dp
  4 \dRp
  6 \dRp+
  2 \dRs
  3 \dRs+
  2 \dt



On Mon, Jan 29, 2018 at 9:56 AM David Fetter  wrote:

> On Mon, Jan 29, 2018 at 02:51:53PM +, Ryan Murphy wrote:
> > >
> > > > What I propose is in fact a server command, which at least three of
> > > > the other popular RDBMSs already have.
> > >
> > Well to actually implement it, it would probably be a client command,
> > because that's what \d* are.
>
> Why should this command be silo'ed off to the psql client?  If it's a
> server command, it's available to all clients, not just psql.
>
> > We would most likely want them implemented the same, to avoid
> > needless complexity.
>
> We could certainly have \d call DESCRIBE for later versions of the
> server.  \ commands which call different SQL depending on server
> version have long been a standard practice.
>
> > I think people are more ok with \describe (with the backslash), which seems
> > like what you're suggesting anyway.  I read Vik's "hard pass" as being on
> > having DESCRIBE which looks like an SQL command but would actually be
> > implemented on the client.  This seems simpler at first but could cause
> > deep confusion later.
>
> If we implement \d as DESCRIBE for server versions as of when DESCRIBE
> is actually implemented, we've got wins all around.
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
From e67e61ae789b09c98fe03378c819224d838c2f65 Mon Sep 17 00:00:00 2001
From: Corey Huinker 
Date: Fri, 25 Jan 2019 00:57:23 +
Subject: [PATCH] Add \describe commands to complement \d commands

---
 doc/src/sgml/ref/psql-ref.sgml | 175 -
 src/bin/psql/command.c | 132 -
 src/bin/psql/describe.c|  13 ++-
 src/bin/psql/describe.h|   3 +
 src/bin/psql/tab-complete.c| 135 -
 5 files changed, 381 insertions(+), 77 deletions(-)

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 6c76cf2f00..363d6d9678 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -871,6 +871,17 @@ testdb=
 same line.
 
 
+
+The family of meta-commands starting with \d often
+has an equivalent \describe- "long form" command.
+The long-form commands often have the suffixes -system
+and -verbose, which are the equivalent of the
+short-form suffixes S and +.
+

RE: libpq debug log

2019-01-24 Thread Nagaura, Ryohei
Hi Iwata-san,

I used your patch in my private work, so I give my opinion and four pieces of 
feedback below.
On Fri, Jan 18, 2019 at 8:19 AM, Iwata, Aya wrote:
> - Setting whether to get the log or not by using connection strings or
> environment variables. It means that application source code changes are
> not needed to get the log.
> - Getting the time when receive and send processing starts/ends. Functions too.
These capabilities were very helpful for my use, so I would like to see your 
proposed functionality in Postgres.

The followings are feedback from me.

1)
It would be better to make the log format the same as the server log format, I 
think.
Your log format:
2019/01/22 04:15:25.496 ...
Server log format:
2019-01-22 04:15:25.496 UTC ...
There are two differences:
One is the date separator character, "/" vs. "-".
The other is the time zone information.

2)
It was difficult for me to understand the first line message in the log file.
"Max log size is 10B, log min level is LEVEL1"
Does this mean the following?
"The maximum size of this file is 10 bytes, and the parameter 'log min level' 
is set to LEVEL 1."

3)
Under the circumstance that the environment variables "PGLOGDIR" and 
"PGLOGSIZE" are set correctly, a log file will also be created when the user 
connects to the server with "psql".
Is this the behavior you intended?
Is there an option to disable logging for just that session when connecting 
with "psql"?

4)
Your patch affects the behavior of PQtrace().
The log of the existing PQtrace() is as follows:
From backend> "id"
From backend (#4)> 16387
From backend (#2)> 1
From backend (#4)> 23
...
Your patch makes PQtrace() include the following log output in addition to the above.
To backend> Msg complete, length 27
Start sending message to backend:End sending message to backend:PQsendQuery end 
:PQgetResult start :Start receiving message from backend:End receiving message 
from backend:From backend> T
...


For your information.
Best regards,
-
Ryohei Nagaura




Re: Synchronous replay take III

2019-01-24 Thread Thomas Munro
On Tue, Jan 15, 2019 at 11:17 PM Masahiko Sawada  wrote:
> Regarding the current (v10 patch) design I have some questions and
> comments.

Hi Sawada-san,

Thanks for your testing and feedback.

> The patch introduces a new GUC parameter, synchronous_replay. We can set
> synchronous_commit = off while setting synchronous_replay = on. With
> this setting, the backend will synchronously wait for standbys to
> replay. I'm concerned that having two separate GUC parameters
> controlling the transaction commit behaviour would confuse users. It's
> just an idea, but maybe we can use 'remote_apply' for the synchronous
> replay purpose and introduce a new parameter for the standby server,
> something like allow_stale_read.

That is an interesting idea.  That choice means that the new mode
always implies synchronous_commit = on (since remote_apply is a
"higher" level).  I wanted them to be independent, so you could
express your durability requirement separately from your visibility
requirement.

Concretely, if none of your potential sync replay standbys are keeping
up and they are all dropped to "unavailable", then you'd be able to
see a difference: with your proposal we'd still have a synchronous
commit wait, but with mine that could independently be on or off.

Generally, I think we are too restrictive in our durability levels,
and there was some discussion about whether it's OK to have a strict
linear knob (which your idea extends):

https://www.postgresql.org/message-id/flat/CAEepm%3D3FFaanSS4sugG%2BApzq2tCVjEYCO2wOQBod2d7GWb%3DDvA%40mail.gmail.com

Hmm, perhaps your way would be better for now anyway, just because
it's simpler to understand and explain.  Perhaps you wouldn't need a
separate "allow_stale_read" GUC, you could just set synchronous_commit
to a lower level when talking to the standby.  (That is, give
synchronous_commit a meaning on standbys, whereas currently it has no
effect there.)
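
To make the comparison concrete, a sketch of the two designs (the
synchronous_replay GUC here is the one from the v10 patch; remote_apply
already exists):

-- patch as proposed: durability and visibility set independently
SET synchronous_commit = off;
SET synchronous_replay = on;

-- the alternative: one linear knob, where visibility implies durability
SET synchronous_commit = remote_apply;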

> If while a transaction is waiting for all standbys to replay they
> became to unavailable state, should the waiter be released? the patch
> seems not to release the waiter. Similarly, wal senders are not aware
> of postgresql.conf change while waiting synchronous replay. I think we
> should call SyncReplayPotentialStandby() in SyncRepInitConfig().

Good point about the postgresql.conf change.

If all the standbys go to unavailable state, then a waiter should be
released once they have all either acknowledged that they are
unavailable (ie acknowledged that their lease has been revoked, via a
reply message with a serial number matching the revocation message),
or if that doesn't happen (due to lost network connection, crashed
process etc), once any leases that have been issued have expired
(ie a few seconds).  Is that not what you see?

> With the setting synchronous_standby_names = '' and
> synchronous_replay_standby_names = '*' we would get the standby's
> status in pg_stat_replication, sync_state = 'async' and sync_replay =
> 'available'. It looks odd to me. Yes, this status is correct in
> principle. But considering the architecture of PostgreSQL replication
> this status is impossible.

Yes, this is essentially the same thing that you were arguing against
above.  Perhaps you are right, and there are no people who would want
synchronous replay, but not synchronous commit.

> The synchronous_replay_standby_name = '*' setting means that the
> backend waits for all standbys connected to the master server to
> replay, is that right? In my test, even when some of the synchronous
> replay standby servers got stuck and therefore had their leases
> revoked, the backend could proceed with transactions.

It means that it waits for all standbys that are "available" to
replay.  It doesn't wait for the "unavailable" ones.  Most of the
patch deals with the transitions between those states.  During an
available->revoking->unavailable transition, we also wait for the
standby to know that it is unavailable (so that it begins to raise
errors), and during an unavailable->joining->available transition we
also wait for the standby to replay the transition LSN (so that it
stops raising errors).  That way clients on the standby can rely on
the error (or lack of error) to tell them whether their snapshot
definitely contains every commit that has returned control on the
primary.

-- 
Thomas Munro
http://www.enterprisedb.com



Re: monitoring CREATE INDEX [CONCURRENTLY]

2019-01-24 Thread Alvaro Herrera
On 2019-Jan-09, Pavan Deolasee wrote:

> Looks like it's missing the validate_index blocks-scanned report, which is
> not AM-specific and your original proposal does mention that.

True. That phase is actually 3 phases, which would be reported
separately:

  5. index_bulk_delete() scan
  6. performsort
  7. validate_index_heapscan


> I thought a bit about index_build part. If most AMs follow a somewhat
> standard phases while building an index, it might be simpler to define
> those phases and have AM agnostic code report those phases.

Well, think about btrees.  We first scan the whole table putting all
tuples in a spool (two spools actually), then we tell the spools to get
sorted, then we extract tuples from the spools, and finally we read the
spool to produce the leaf pages.  If we just report the table scan, the
report gets to 100% complete in that phase and then waits for a very
long time during which nothing seems to happen.  That's not cool.

I'm adding a new AM support routine which turns the subphase number into
a textual description, so that we don't have to hardcode phase names in
the agnostic code.

> Can we have more descriptive text for waiters? Such as "waiting for
> existing writers", "waiting for intermediate writers" and "waiting for
> old readers".  Not sure if I got those correct, something of that sort
> will definitely give more insight into what the transaction is waiting
> for.

Can do.

> Can we actually also report the list of transactions the command is waiting
> on?
> That could be useful to the user if CIC appears to be stuck too long.

I'm afraid this is not possible, because the progress report interface
doesn't allow for something like that.

> IMHO we should just use full term INDEX instead of IDX, such as
> PROGRESS_CREATE_INDEX_PARTITIONS_TOTAL. It's already a long name, so couple
> of extra characters won't make a difference.

Really?  I thought it was clear enough and it's *three* characters saved
even.

> I think we should clear the PROGRESS_WAITFOR_TOTAL and PROGRESS_WAITFOR_DONE
> when the wait phase is over, to avoid any confusion. For example, I noticed
> that the counters from WAIT_1 are reported as-is if WAIT_2 had no lockers.

Yes, absolutely.

> +GetLockConflicts(const LOCKTAG *locktag, LOCKMODE lockmode, int *ocount)
> 
> Could that out variable be named something differently? "countp" or
> something like that. I did not check if there is some practice that we
> follow, but I remember suffixing with "p" rather than prefixing with
> "o" (for out I assume)

Sure.

> +/* Progress parameters for CREATE INDEX */
> +#define PROGRESS_CREATEIDX_PHASE 0
> +/* 1 and 2 reserved for "waitfor" metrics */
> +#define PROGRESS_CREATEIDX_PARTITIONS_TOTAL 3
> +#define PROGRESS_CREATEIDX_PARTITIONS_DONE 4
> +
> 
> Is there a reason to leave those reserve placeholders, only to fill them a
> few lines down?

Yes -- those are going to be used by other reports that also use similar
metrics, such as the CLUSTER report.


I'm running out of columns to put the numbers into :-(  Right now I have

1. phase
2. subphase (for index_build)
3. lockers total (to wait for)
4. lockers done
5. blocks total
6. blocks done
7. tapes total
8. tapes done
9. partitions total
10. partitions done

The "tapes total/done" bit is about reporting the performsort steps; I'm
not really sure yet it'll be tapes, but I hope it can be done with two
numbers.  Anyway, in btree build I have these subphases

1. spool heapscan using IndexBuildHeapScan
2. _bt_parallel_heapscan stuff  (only one of these in a build)
3. bt_leafbuild, performsort spool 1
4. bt_leafbuild, performsort spool 2
5. bt_load

and I don't have columns to put the metrics for the btload thing,
assuming I figure out what those are.
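
For reference, a sketch of how these columns might be consumed once exposed;
the view name (pg_stat_progress_create_index) and the column names are
assumptions based on the list above, not final:

SELECT pid, phase, subphase,
       lockers_done, lockers_total,
       blocks_done, blocks_total,
       tapes_done, tapes_total,
       partitions_done, partitions_total
FROM pg_stat_progress_create_index;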

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH] Fix Proposal - Deadlock Issue in Single User Mode When IO Failure Occurs

2019-01-24 Thread Thomas Munro
On Sun, Jan 20, 2019 at 4:45 PM Amit Kapila  wrote:
> On Sat, Dec 1, 2018 at 2:30 AM Chengchao Yu  wrote:
> > Recently, we hit a few occurrences of deadlock when IO failure (including 
> > disk full, random remote disk IO failures) happens in single user mode. We 
> > found the issue exists on both Linux and Windows in multiple postgres 
> > versions.
> >
> > 3.   Because the unable-to-write-relation-data scenario is difficult to 
> > hit naturally even when space reservation is turned off, I have prepared 
> > a small patch (see attachment “emulate-error.patch”) to force an error 
> > when PG tries to write data to relation files. We can just apply the 
> > patch and there is no need to put effort into flooding the disk with 
> > data any more.
>
> I have one question related to the way you have tried to emulate the error.
>
> @@ -840,6 +840,10 @@ mdwrite(SMgrRelation reln, ForkNumber forknum,
> BlockNumber blocknum,
> nbytes,
> BLCKSZ);
> + ereport(ERROR,
> + (errcode(ERRCODE_INTERNAL_ERROR),
> + errmsg("Emulate exception in mdwrite() when writing to disk")));
> +
>
> We generally reserve the space in a relation before attempting to
> write, so not sure how you are able to hit the disk full situation via
> mdwrite.  If you see the description of the function, that also
> indicates the same.

Presumably ZFS or BTRFS or something more exotic could still get
ENOSPC here, and of course any filesystem could give us EIO here
(because the disk is on fire or the remote NFS server is rebooting due
to an automatic Windows update).

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Old protocol fastpath calls borked?

2019-01-24 Thread Tom Lane
Andres Freund  writes:
> Ah, brainfade.  Probably triggered by the fact that I forgot that we call
> input functions even on NULL (which never quite made sense to me).

That's so that domain_in can reject NULLs if the domain constraints
say to do so.

Mind you, the SQL committee should never have allowed NOT NULL
domain constraints in the first place, because the concept is
fundamentally incompatible with outer joins.  But it's there
and we try to honor it in this case.

regards, tom lane



Re: Old protocol fastpath calls borked?

2019-01-24 Thread Andres Freund
Hi,

On 2019-01-24 17:04:32 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > In parse_fcall_arguments_20():
> > we appear to be constantly setting argnull to true for all arguments?
> 
> Uh, it looks to me like it does so only if the frontend sends a -1
> length field, which is the protocol's convention for indicating a
> null.

Ah, brainfade.  Probably triggered by the fact that I forgot that we call
input functions even on NULL (which never quite made sense to me).

Greetings,

Andres Freund



Re: proposal - plpgsql unique statement id

2019-01-24 Thread Tom Lane
Peter Eisentraut  writes:
> committed

Why didn't this patch modify the dumping logic in pl_funcs.c to print
the IDs?  I'm not aware of other cases where we intentionally omit
fields from debug-support printouts.

regards, tom lane



Re: Old protocol fastpath calls borked?

2019-01-24 Thread Tom Lane
Andres Freund  writes:
> In parse_fcall_arguments_20():
> we appear to be constantly setting argnull to true for all arguments?

Uh, it looks to me like it does so only if the frontend sends a -1
length field, which is the protocol's convention for indicating a
null.

regards, tom lane



Old protocol fastpath calls borked?

2019-01-24 Thread Andres Freund
Hi,

In parse_fcall_arguments_20():

c0a8c3ac13f8 (Tom Lane 2003-05-08 18:16:37 +0000 579)     argsize = pq_getmsgint(msgBuf, 4);
0ac6298bb8ac (Tom Lane 2003-05-09 18:08:48 +0000 580)     if (argsize == -1)
0ac6298bb8ac (Tom Lane 2003-05-09 18:08:48 +0000 581)     {
0ac6298bb8ac (Tom Lane 2003-05-09 18:08:48 +0000 582)         fcinfo->argnull[i] = true;
147d4bf3e5e3 (Tom Lane 2006-04-04 19:35:37 +0000 583)         fcinfo->arg[i] = OidReceiveFunctionCall(typreceive, NULL,
147d4bf3e5e3 (Tom Lane 2006-04-04 19:35:37 +0000 584)                                                 typioparam, -1);
0ac6298bb8ac (Tom Lane 2003-05-09 18:08:48 +0000 585)         continue;
c0a8c3ac13f8 (Tom Lane 2003-05-08 18:16:37 +0000 586)     }

we appear to be constantly setting argnull to true for all arguments?  Since,
apparently, 2003?  I don't have a test program at hand, but that kind of seems
to suggest this has never really been used?

Greetings,

Andres Freund



Re: Delay locking partitions during INSERT and UPDATE

2019-01-24 Thread Tomas Vondra



On 1/24/19 10:34 PM, John Naylor wrote:
> On Thu, Jan 24, 2019 at 4:17 PM Tomas Vondra
>  wrote:
>> I can still see about the same performance as before (on both clusters).
> 
> Thanks for checking! I think the thing to do now is have a committer
> look at it. It's a small patch with obvious performance effects --
> there's just the question of whether the locking behavior with
> concurrent DDL is as safe as it is now.
>
> Anyone have anything else?
>

Yes, I don't see why the patch would change that, and I've been looking
for such cases. I think David was looking at that this week too, and I
assume he'll send an update if he finds anything. If not, I plan to get
it committed soon-ish (possibly next week after the FOSDEM dev meeting,
unless there are some objections).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Delay locking partitions during INSERT and UPDATE

2019-01-24 Thread John Naylor
On Thu, Jan 24, 2019 at 4:17 PM Tomas Vondra
 wrote:
> I can still see about the same performance as before (on both clusters).

Thanks for checking! I think the thing to do now is have a committer
look at it. It's a small patch with obvious performance effects --
there's just the question of whether the locking behavior with
concurrent DDL is as safe as it is now.

Anyone have anything else?

-- 
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Almost bug in COPY FROM processing of GB18030 encoded input

2019-01-24 Thread Robert Haas
On Wed, Jan 23, 2019 at 6:23 AM Heikki Linnakangas  wrote:
> I happened to notice that when CopyReadLineText() calls mblen(), it
> passes only the first byte of the multi-byte characters. However,
> pg_gb18030_mblen() looks at the first and the second byte.
> CopyReadLineText() always passes \0 as the second byte, so
> pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded
> characters as 2.
>
> It works out fine, though, because the second half of the 4-byte encoded
> character always looks like another 2-byte encoded character, in
> GB18030. CopyReadLineText() is looking for delimiter and escape
> characters and newlines, and only single-byte characters are supported
> for those, so treating a 4-byte character as two 2-byte characters is
> harmless.

Yikes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: proposal - plpgsql unique statement id

2019-01-24 Thread Peter Eisentraut
On 22/01/2019 19:04, Pavel Stehule wrote:
> It was missing, fixed, thank you for the check

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Log a sample of transactions

2019-01-24 Thread Robert Haas
On Fri, Jan 18, 2019 at 8:23 AM Adrien NAYRAT
 wrote:
> On 1/18/19 9:03 AM, Peter Eisentraut wrote:
> > But if you have trouble with a specific transaction, how will a setting
> > help that randomly logs transactions, not necessarily the one you are
> > concerned about?
>
> Yes, it assumes your application performs the same type of transactions.
> Maybe the use case is too specific to have this feature in core?

It doesn't sound too crazy to me.  Say you log a sample of statements.
But then you're like, wow, this is hard to interpret, because I don't
know what happened earlier in the transaction.  So then you use this
feature instead.
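
For what it's worth, usage would presumably look something like this (GUC
name taken from the proposed patch, so treat it as an assumption):

-- log every statement of roughly 1% of transactions, chosen at random
SET log_transaction_sample_rate = 0.01;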

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Protect syscache from bloating with negative cache entries

2019-01-24 Thread Robert Haas
On Thu, Jan 24, 2019 at 10:02 AM Tom Lane  wrote:
> Bruce Momjian  writes:
> > On Thu, Jan 24, 2019 at 06:39:24PM +0900, Kyotaro HORIGUCHI wrote:
> >> I also thought that there are some other features that would be useful
> >> if they could be turned on remotely (hence the remote GUC feature), but
> >> it was too complex...
>
> > Well, I am thinking if we want to do something like this, we should do
> > it for all GUCs, not just for this one, so I suggest we not do this now
> > either.
>
> I will argue hard that we should not do it at all, ever.
>
> There is already a mechanism for broadcasting global GUC changes:
> apply them to postgresql.conf (or use ALTER SYSTEM) and SIGHUP.
> I do not think we need something that can remotely change a GUC's
> value in just one session.  The potential for bugs, misuse, and
> just plain confusion is enormous, and the advantage seems minimal.

I think there might be some merit in being able to activate debugging
or tracing facilities for a particular session remotely, but designing
something that will do that sort of thing well seems like a very
complex problem that certainly should not be sandwiched into another
patch that is mostly about something else.  And if we ever get such a
thing I suspect it should be entirely separate from the GUC system.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Delay locking partitions during INSERT and UPDATE

2019-01-24 Thread Tomas Vondra


On 1/24/19 9:50 PM, John Naylor wrote:
> On Wed, Jan 23, 2019 at 7:56 PM David Rowley
>  wrote:
>> On Thu, 24 Jan 2019 at 13:38, John Naylor  
>> wrote:
>>> Hmm, now instead of an 85x speedup over master in the 10k partition
>>> case, I only get 20x. Anyone else see this?
>>
>> What's it like with fsync off?
> 
> No change. Just in case, I tried git bisect and also recreated the
> cluster, but never got the same performance as my first test, so not
> sure what happened.

I can still see about the same performance as before (on both clusters).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Implicit timezone conversion replicating from timestamp to timestamptz?

2019-01-24 Thread Jeremy Finzel
We are working to migrate several large tables from the timestamp to the
timestamptz data type by using logical replication (so as to avoid long
downtime for type conversions).  We are using pglogical, but I am curious
whether what I share below applies to native logical replication as well.

Both source and destination dbs are at localtime, which is
'America/Chicago' time zone.

The source system has a timestamp stored "at time zone UTC", like this for
6:00pm Chicago time:
2019-01-24 20:00:00.00

I was *very surprised* to find that replicating the above timestamp to
timestamptz actually does so correctly, showing this value in my psql
client on the subscriber:
2019-01-24 14:00:00.00-06

How does it know/why does it assume it knows that the time zone of the
timestamp data type is UTC on the provider given that my clusters are at
America/Chicago?  I would have actually expected an incorrect conversion of
the data unless I set the timezone to UTC on the way in on the subscriber
via a trigger.

That is, I was expecting to see this:
2019-01-24 20:00:00.00-06

Which is obviously wrong.  So why does it do this and is there some
assumption being made somewhere in the code base that a timestamp is
actually saved "at time zone UTC"?
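
For reference, the value observed on the subscriber is exactly what an
explicit UTC interpretation of the stored timestamp produces; this is
runnable on any stock instance:

SET timezone = 'America/Chicago';
SELECT TIMESTAMP '2019-01-24 20:00:00' AT TIME ZONE 'UTC';
--         timezone
-- ------------------------
--  2019-01-24 14:00:00-06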

Thanks,
Jeremy


Re: Delay locking partitions during INSERT and UPDATE

2019-01-24 Thread John Naylor
On Wed, Jan 23, 2019 at 7:56 PM David Rowley
 wrote:
> On Thu, 24 Jan 2019 at 13:38, John Naylor  wrote:
> > Hmm, now instead of an 85x speedup over master in the 10k partition
> > case, I only get 20x. Anyone else see this?
>
> What's it like with fsync off?

No change. Just in case, I tried git bisect and also recreated the
cluster, but never got the same performance as my first test, so not
sure what happened.
--
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-24 Thread John Naylor
On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila  wrote:
>
> On Thu, Jan 24, 2019 at 3:39 AM John Naylor  
> wrote:
> > mkdir -p data1 data2 standby
> >
> > echo 'heap' > data1/foo
> > echo 'fsm' > data1/foo_fsm
> >
> > # simulate streaming replication
> > rsync --archive data1 standby
> >
> > # simulate pg_upgrade, skipping FSM
> > ln data1/foo -t data2/
> >
> > rsync --archive --delete --hard-links --size-only --no-inc-recursive
> > data1 data2 standby
> >
> > # result
> > ls standby/data1
> > ls standby/data2
> >
> >
> > The result is that foo_fsm is not copied to standby/data2, contrary to
> > what the docs above imply for other unlinked files. Can anyone shed
> > light on this?
> >
>
> Is foo_fsm present in standby/data1?

Yes it is.

> I think what doc means to say is
> that it copies any unlinked files present in primary's new cluster
> (which in your case will be data2).

In that case, I'm still confused why that doc says, "Unfortunately,
rsync needlessly copies files associated with temporary and unlogged
tables because these files don't normally exist on standby servers."
I fail to see why the primary's new cluster would have these if they
weren't linked. And in the case we're discussing here, the skipped
FSMs won't be on data2, so won't end up in standby/data2.

-- 
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: problems with foreign keys on partitioned tables

2019-01-24 Thread Alvaro Herrera
On 2019-Jan-24, Amit Langote wrote:

> A few hunks of the originally proposed patch are attached here as 0001,
> especially the part which fixes ATAddForeignKeyConstraint to pass the
> correct value of connoninherit to CreateConstraintEntry (which should be
> false for partitioned tables).  With that change, many tests start failing
> because of the above bug.  That patch also adds a test case like the one
> above, but it fails along with others due to the bug.  Patch 0002 is the
> aforementioned simpler fix to make the errors (existing and the newly
> added) go away.

Cool, thanks.  I made a bunch of fixes -- I added an elog(ERROR) check
to ensure a constraint whose parent is being set does not already have a
parent; that seemed in line with the new asserts that check the
coninhcount.  I also moved those asserts, changing the spirit of what
they checked.  Also: I wasn't sure about stopping recursion for legacy
inheritance in ATExecDropConstraint() for non-check constraints, so I
changed that to occur only for partitioned tables.  Also, stylistic fixes.

I was mildly surprised to realize that the my_fkey constraint on
fk_part_1_1 is gone after dropping fkey on its parent, since it was
declared locally when that table was created.  However, it makes perfect
sense in retrospect, since we made it dependent on its parent.  I'm not
terribly happy about this, but I don't quite see a way to make it better
that doesn't require much more code than is warranted.

> These patches apply unchanged to the PG 11 branch.

Yeah, only if you tried to compile, it would have complained about
table_close() ;-)

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Built-in connection pooler

2019-01-24 Thread Konstantin Knizhnik

Hi hackers,

I have been working for some time on a built-in connection pooler for Postgres:
https://www.postgresql.org/message-id/flat/a866346d-5582-e8e8-2492-fd32732b0783%40postgrespro.ru#b1c354d5cf937893a75954e1e77f81f5

Unlike existing external poolers, this solution supports session semantics 
for pooled connections: you can use temporary tables, prepared statements, 
GUCs, ...

But to make this possible I need to store/restore session context.
It is not so difficult, but it requires a significant number of changes in 
the Postgres code.
It will be committed in PgProEE-12, the Postgres Professional version of 
Postgres, but I realize that there is little chance of committing it to the 
mainstream version of Postgres.


Dimitri Fontaine proposed developing a much simpler version of the pooler 
which could be accepted by the community:



The main idea I want to pursue is the following:

   - only solve the “idle connection” problem, nothing else, making idle 
connections basically free
   - implement a layer in between a connection and a session, managing a 
“client backend” pool
   - use the ability to give a socket to another process, as you did, so that 
the pool is not a proxy
   - allow re-use of a backend for a new session only when it is safe to do so


Unfortunately, we have not found a way to support SSL connections with 
socket redirection.

So I have implemented a solution with the traditional proxy approach.
If a client changes session context (creates temporary tables, sets GUC 
values, prepares statements, ...) then its backend becomes "tainted"
and no longer participates in pooling. It is now dedicated to this 
client, but it still receives data through the proxy.

Once this client disconnects, the tainted backend is terminated.
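
For example, any of the following in a pooled session would taint its
backend and dedicate it to that client:

CREATE TEMP TABLE scratch(x int);   -- temporary table
SET work_mem = '64MB';              -- session-level GUC change
PREPARE q AS SELECT 1;              -- prepared statement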

The built-in proxy accepts connections on a special port (by default 6543).
If you connect to the standard port, then normal Postgres backends will be 
launched and there is no difference from vanilla Postgres.
And if you connect to the proxy port, then the connection is redirected to 
one of the proxy workers, which then performs scheduling of all sessions 
assigned to it.
There is currently no migration of sessions between proxies. There are 
three policies for assigning a session to a proxy:

random, round-robin and load-balancing.


The main differences from pgbouncer are that:

1. It is embedded and requires no extra steps for installation and 
configuration.

2. It is not single-threaded (no bottleneck).
3. It supports all clients (if a client needs session semantics, then it 
will implicitly be given a dedicated backend).



Some performance results (pgbench -S -n):

#Connections    Proxy    Proxy/SSL    Direct    Direct/SSL
           1    13752        12396     17443         15762
          10    53415        59615     68334         85885
        1000    60152        20445     60003         24047



Proxy configuration is the following:

session_pool_size = 4
connection_proxies = 2

postgres=# select * from pg_pooler_state();
 pid  | n_clients | n_ssl_clients | n_pools | n_backends | n_dedicated_backends | tx_bytes | rx_bytes | n_transactions
------+-----------+---------------+---------+------------+----------------------+----------+----------+----------------
 1310 |         1 |             0 |       1 |          4 |                    0 | 10324739 |  9834981 |         156388
 1311 |         0 |             0 |       1 |          4 |                    0 | 10430566 |  9936634 |         158007

(2 rows)


This implementation contains far fewer changes to the Postgres core (it is 
more like running pgbouncer as a Postgres worker).

The main things I have added are:
1. A mechanism for sending a socket to another process (needed to redirect a 
connection to the proxy)
2. Support for edge-triggered polling mode for epoll (needed to multiplex 
reads and writes)

3. A library, libpqconn, for establishing a libpq connection from core


The proxy version of the built-in connection pool is in the conn_proxy 
branch of the following Git repository:

https://github.com/postgrespro/postgresql.builtin_pool.git


I also attach a patch against master to this mail.
I will be pleased to receive your comments.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e94b305..ee12562 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -704,6 +704,123 @@ include_dir 'conf.d'
   
  
 
+ 
+  max_sessions (integer)
+  
+   max_sessions configuration parameter
+  
+  
+  
+
+  The maximum number of client sessions that can be handled by
+  one connection proxy when session pooling is switched on.
+  This parameter does not add any memory or CPU overhead, so
+  specifying a large max_sessions value
+  does not affect performance.
+  If the max_sessions limit is reached new connection are not accepted.
+
+
+  The default value is 1000. This 

Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-24 Thread Alvaro Herrera
On 2019-Jan-24, Tom Lane wrote:

> > Also, as 
> > the pseudo-random state is fully controlled, seeded test results are 
> > deterministic so the expected value can be fully checked.
> 
> I found that the "expected value" was different in v11 than HEAD,
> which surprised me.  It looks like the reason is that HEAD sets up
> more/different RandomStates from the same seed than v11 did.  Not
> sure if it's a good thing for this behavior to change across versions.

The rationale behind this was that some internal uses of random numbers
messed up the determinism of user-invoked random functions; 409231919443
commit message says

While at it, use separate random state for thread administrivia such
as deciding which script to run, how long to delay for throttling, or
whether to log a message when sampling; this not only makes these tasks
independent of each other, but makes the actual thread run
deterministic.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: PSA: we lack TAP test coverage on NetBSD and OpenBSD

2019-01-24 Thread Tom Lane
Fabien COELHO  writes:
>> I had in mind something more like the attached.

> Yep.
> I'm not too happy that it mixes API levels, and about the int/double/int 
> path.
> Attached an updated version which relies on pg_jrand48 instead.

Hm, I'm not sure that's really an improvement, but I pushed it like that
(and the other change along with it).

> Also, as 
> the pseudo-random state is fully controlled, seeded test results are 
> deterministic so the expected value can be fully checked.

I found that the "expected value" was different in v11 than HEAD,
which surprised me.  It looks like the reason is that HEAD sets up
more/different RandomStates from the same seed than v11 did.  Not
sure if it's a good thing for this behavior to change across versions.

regards, tom lane



Re: problems with foreign keys on partitioned tables

2019-01-24 Thread Alvaro Herrera
On 2019-Jan-25, Amit Langote wrote:

> On Fri, Jan 25, 2019 at 12:30 AM Amit Langote  wrote:

> > Doesn't the following performDeletion() at the start of
> > ATExecDropConstraint(), through findDependentObject()'s own recursion,
> > take care of deleting *all* constraints, including those of?
> 
> Meant to say: "...including those of the grandchildren?"
> 
> > /*
> >  * Perform the actual constraint deletion
> >  */
> > conobj.classId = ConstraintRelationId;
> > conobj.objectId = con->oid;
> > conobj.objectSubId = 0;
> >
> > performDeletion(&conobj, behavior, 0);

Of course it does when the dependencies are set up -- but in the
approach we just gave up on, those dependencies would not exist.
Anyway, my motivation was that performMultipleDeletions has the
advantage that it collects all objects to be dropped before deleting
anyway, and so the error that a constraint was dropped in a previous
recursion step would not occur.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Thread-unsafe coding in ecpg

2019-01-24 Thread Andrew Dunstan


On 1/23/19 10:53 PM, Tom Lane wrote:
> I wrote:
>> This suggests that, rather than throwing up our hands if the initial
>> _configthreadlocale call returns -1, we should act as though the function
>> doesn't exist, and just soldier on the same as before.  The code in there
>> assumes that -1 is a can't-happen case and doesn't try to recover,
>> but apparently that's over-optimistic.
> I pushed a patch to fix that.



jacana has been upgraded to gcc 8.1.0 so it knows about
_configthreadlocale() and it's now happy.


cheers


andrew

-- 
Andrew Dunstan  https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: problems with foreign keys on partitioned tables

2019-01-24 Thread Amit Langote
On Fri, Jan 25, 2019 at 12:30 AM Amit Langote  wrote:
>
> On Fri, Jan 25, 2019 at 12:08 AM Alvaro Herrera
>  wrote:
> > On 2019-Jan-24, Amit Langote wrote:
> >
> > > Thinking more on this, my proposal to rip dependencies between parent and
> > > child constraints altogether to resolve the bug I initially reported is
> > > starting to sound a bit overambitious especially considering that we'd
> > > need to back-patch it (the patch didn't even consider index constraints
> > > properly, creating a divergence between the behaviors of inherited foreign
> > > key constraints and inherited index constraints).  We can pursue it if
> > > only to avoid bloating the catalog for what can be achieved with little
> > > bit of additional code in tablecmds.c, but maybe we should refrain from
> > > doing it in reaction to this particular bug.
> >
> > While studying your fix it occurred to me that perhaps we could change
> > things so that we first collect a list of objects to drop, and only when
> > we're done recursing perform the deletion, as per the attached patch.
> > However, this fails for the test case in your 0001 patch (but not the
> > one you show in your email body), because you added a stealthy extra
> > ingredient to it: the constraint in the grandchild has a different name,
> > so when ATExecDropConstraint() tries to search for it by name, it's just
> > not there, not because it was dropped but because it has never existed
> > in the first place.
>
> Doesn't the following performDeletion() at the start of
> ATExecDropConstraint(), through findDependentObject()'s own recursion,
> take care of deleting *all* constraints, including those of?

Meant to say: "...including those of the grandchildren?"

> /*
>  * Perform the actual constraint deletion
>  */
> conobj.classId = ConstraintRelationId;
> conobj.objectId = con->oid;
> conobj.objectSubId = 0;
>
> performDeletion(&conobj, behavior, 0);

Thanks,
Amit



Re: problems with foreign keys on partitioned tables

2019-01-24 Thread Amit Langote
On Fri, Jan 25, 2019 at 12:08 AM Alvaro Herrera
 wrote:
> On 2019-Jan-24, Amit Langote wrote:
>
> > Thinking more on this, my proposal to rip dependencies between parent and
> > child constraints altogether to resolve the bug I initially reported is
> > starting to sound a bit overambitious especially considering that we'd
> > need to back-patch it (the patch didn't even consider index constraints
> > properly, creating a divergence between the behaviors of inherited foreign
> > key constraints and inherited index constraints).  We can pursue it if
> > only to avoid bloating the catalog for what can be achieved with little
> > bit of additional code in tablecmds.c, but maybe we should refrain from
> > doing it in reaction to this particular bug.
>
> While studying your fix it occurred to me that perhaps we could change
> things so that we first collect a list of objects to drop, and only when
> we're done recursing perform the deletion, as per the attached patch.
> However, this fails for the test case in your 0001 patch (but not the
> one you show in your email body), because you added a stealthy extra
> ingredient to it: the constraint in the grandchild has a different name,
> so when ATExecDropConstraint() tries to search for it by name, it's just
> not there, not because it was dropped but because it has never existed
> in the first place.

Doesn't the following performDeletion() at the start of
ATExecDropConstraint(), through findDependentObject()'s own recursion,
take care of deleting *all* constraints, including those of?

/*
 * Perform the actual constraint deletion
 */
conobj.classId = ConstraintRelationId;
conobj.objectId = con->oid;
conobj.objectSubId = 0;

performDeletion(&conobj, behavior, 0);

> Unless I misunderstand, this means that your plan to remove those
> catalog tuples won't work at all, because there is no way to find those
> constraints other than via pg_depend if they have different names.

Yeah, that's right.  Actually, I gave up on developing the patch
further based on that approach (no-dependencies approach) when I
edited the test to give the grandchild constraint its own name.

> I'm leaning towards the idea that your patch is the definitive fix and
> not just a backpatchable band-aid.

Yeah, I think so too.

Thanks,
Amit



Re: problems with foreign keys on partitioned tables

2019-01-24 Thread Alvaro Herrera
Hello

On 2019-Jan-24, Amit Langote wrote:

> Thinking more on this, my proposal to rip dependencies between parent and
> child constraints altogether to resolve the bug I initially reported is
> starting to sound a bit overambitious especially considering that we'd
> need to back-patch it (the patch didn't even consider index constraints
> properly, creating a divergence between the behaviors of inherited foreign
> key constraints and inherited index constraints).  We can pursue it if
> only to avoid bloating the catalog for what can be achieved with little
> bit of additional code in tablecmds.c, but maybe we should refrain from
> doing it in reaction to this particular bug.

While studying your fix it occurred to me that perhaps we could change
things so that we first collect a list of objects to drop, and only when
we're done recursing perform the deletion, as per the attached patch.
However, this fails for the test case in your 0001 patch (but not the
one you show in your email body), because you added a stealthy extra
ingredient to it: the constraint in the grandchild has a different name,
so when ATExecDropConstraint() tries to search for it by name, it's just
not there, not because it was dropped but because it has never existed
in the first place.

Unless I misunderstand, this means that your plan to remove those
catalog tuples won't work at all, because there is no way to find those
constraints other than via pg_depend if they have different names.

I'm leaning towards the idea that your patch is the definitive fix and
not just a backpatchable band-aid.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 747acbd2b9..29860acaa2 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -417,7 +417,7 @@ static void CloneFkReferencing(Relation pg_constraint, Relation parentRel,
 static void ATExecDropConstraint(Relation rel, const char *constrName,
 	 DropBehavior behavior,
 	 bool recurse, bool recursing,
-	 bool missing_ok, LOCKMODE lockmode);
+	 bool missing_ok, LOCKMODE lockmode, ObjectAddresses *objsToDelete);
 static void ATPrepAlterColumnType(List **wqueue,
 	  AlteredTableInfo *tab, Relation rel,
 	  bool recurse, bool recursing,
@@ -4160,12 +4160,12 @@ ATExecCmd(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		case AT_DropConstraint: /* DROP CONSTRAINT */
 			ATExecDropConstraint(rel, cmd->name, cmd->behavior,
  false, false,
- cmd->missing_ok, lockmode);
+ cmd->missing_ok, lockmode, NULL);
 			break;
 		case AT_DropConstraintRecurse:	/* DROP CONSTRAINT with recursion */
 			ATExecDropConstraint(rel, cmd->name, cmd->behavior,
  true, false,
- cmd->missing_ok, lockmode);
+ cmd->missing_ok, lockmode, NULL);
 			break;
 		case AT_AlterColumnType:	/* ALTER COLUMN TYPE */
 			address = ATExecAlterColumnType(tab, rel, cmd, lockmode);
@@ -9219,7 +9219,8 @@ static void
 ATExecDropConstraint(Relation rel, const char *constrName,
 	 DropBehavior behavior,
 	 bool recurse, bool recursing,
-	 bool missing_ok, LOCKMODE lockmode)
+	 bool missing_ok, LOCKMODE lockmode,
+	 ObjectAddresses *objsToDelete)
 {
 	List	   *children;
 	ListCell   *child;
@@ -9237,6 +9238,12 @@ ATExecDropConstraint(Relation rel, const char *constrName,
 
 	conrel = heap_open(ConstraintRelationId, RowExclusiveLock);
 
+	if (!recursing)
+	{
+		Assert(objsToDelete == NULL);
+		objsToDelete = new_object_addresses();
+	}
+
 	/*
 	 * Find and drop the target constraint
 	 */
@@ -9290,13 +9297,13 @@ ATExecDropConstraint(Relation rel, const char *constrName,
 		}
 
 		/*
-		 * Perform the actual constraint deletion
+		 * Register the constraint for deletion
 		 */
 		conobj.classId = ConstraintRelationId;
 		conobj.objectId = HeapTupleGetOid(tuple);
 		conobj.objectSubId = 0;
 
-		performDeletion(&conobj, behavior, 0);
+		add_exact_object_address(&conobj, objsToDelete);
 
 		found = true;
 	}
@@ -9402,7 +9409,7 @@ ATExecDropConstraint(Relation rel, const char *constrName,
 /* Time to delete this child constraint, too */
 ATExecDropConstraint(childrel, constrName, behavior,
 	 true, true,
-	 false, lockmode);
+	 false, lockmode, objsToDelete);
 			}
 			else
 			{
@@ -9435,6 +9442,9 @@ ATExecDropConstraint(Relation rel, const char *constrName,
 		heap_close(childrel, NoLock);
 	}
 
+	if (!recursing)
+		performMultipleDeletions(objsToDelete, behavior, 0);
+
 	heap_close(conrel, RowExclusiveLock);
 }
 


Re: Protect syscache from bloating with negative cache entries

2019-01-24 Thread Tom Lane
Bruce Momjian  writes:
> On Thu, Jan 24, 2019 at 06:39:24PM +0900, Kyotaro HORIGUCHI wrote:
>> I also thought there are some other features that would be useful if
>> they could be turned on remotely, hence the remote GUC idea, but it
>> was too complex...

> Well, I am thinking if we want to do something like this, we should do
> it for all GUCs, not just for this one, so I suggest we not do this now
> either.

I will argue hard that we should not do it at all, ever.

There is already a mechanism for broadcasting global GUC changes:
apply them to postgresql.conf (or use ALTER SYSTEM) and SIGHUP.
I do not think we need something that can remotely change a GUC's
value in just one session.  The potential for bugs, misuse, and
just plain confusion is enormous, and the advantage seems minimal.
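
For reference, that existing mechanism looks like this (work_mem is just
an illustrative setting):

=# alter system set work_mem = '64MB';
=# select pg_reload_conf();   -- SIGHUP makes every backend reread the value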

regards, tom lane



Re: Protect syscache from bloating with negative cache entries

2019-01-24 Thread Bruce Momjian
On Thu, Jan 24, 2019 at 06:39:24PM +0900, Kyotaro HORIGUCHI wrote:
> > Second, when would you use syscache_memory_target != 0? 
> 
> It is a suggestion from upthread: we sometimes want to keep some known
> amount of cache entries even when expiration is activated.
> 
> > If you had
> > syscache_prune_min_age really fast, e.g. 10 seconds?  What is the
> > use-case for this? You have a query that touches 10k objects, and then
> > the connection stays active but doesn't touch many of those 10k objects,
> > and you want it cleaned up in seconds instead of minutes?  (I can't see
> > why you would not clean up all unreferenced objects after _minutes_ of
> > disuse, but removing them after seconds of disuse seems undesirable.)
> > What are the odds you would retain the entires you want with a fast
> > target?
> 
> Are you asking the reason for the unit? It's just because the value
> won't be very large even in seconds, at most 3600.  Even though I
> don't think such a short duration setting is meaningful in the real
> world, I also don't think we need to prohibit it. (Actually it is
> useful for testing :p) Another reason is

We have gone from ignoring the cache bloat problem to designing an API
whose value even we don't know, and if we don't know, we can be sure
our users will not know.  Every GUC has a cost, even if it is not used.

I suggest you go with just syscache_prune_min_age, get that into PG 12,
and we can then reevaluate what we need.  If you want to hard-code a
minimum cache size where no pruning will happen, maybe based on the system
catalogs or typical load, that is fine.

> that GUC_UNIT_MIN doesn't seem so common: it is used by only
> two variables, log_rotation_age and old_snapshot_threshold.
> 
> > What is the value of being able to change a specific backend's stat
> > interval?  I don't remember any other setting having this ability.
> 
> As mentioned upthread, taking the statistics costs significant time,
> so I believe no one is willing to keep it turned on at all times.  As
> a result the feature would be useless, because it could not be turned
> on for an already-active backend once it actually bloats. So I wanted
> to provide a remote switching feature.
> 
> I also thought there are some other features that would be useful if
> they could be turned on remotely, hence the remote GUC idea, but it
> was too complex...

Well, I am thinking if we want to do something like this, we should do
it for all GUCs, not just for this one, so I suggest we not do this now
either.
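
For reference, the pared-down knob suggested above would be used like
this (a sketch, assuming the proposed syscache_prune_min_age GUC is
kept):

=# set syscache_prune_min_age = '5min';
-- cache entries unused for at least this long become pruning candidates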

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Use an enum for RELKIND_*?

2019-01-24 Thread Tom Lane
Kyotaro HORIGUCHI  writes:
> I might misunderstand something, but my compiler (gcc 7.3.1)
> won't be quiet about omitted value even with default:.
> ...

I would call that a compiler bug, TBH.  The code is 100% correct,
if you intended to allow the default case to handle some enum
values, which is perfectly reasonable coding.

> Isn't it enough that at least one platform correctly warns about that?

No, especially not if it's only a warning.  Many developers would
not see it initially, and the buildfarm likely wouldn't complain
either.

regards, tom lane



Re: pgsql: Detach constraints when partitions are detached

2019-01-24 Thread Alvaro Herrera
CC'ing -hackers.

On 2019-Jan-24, Amit Langote wrote:

> On 2019/01/24 13:58, Tom Lane wrote:
> > Alvaro Herrera  writes:
> >> Detach constraints when partitions are detached
> > 
> > Hm ... it looks like this fails under -DRELCACHE_FORCE_RELEASE:
> 
> Oops, sorry.  This is why:
> 
>  index_close(idx, NoLock);
> ...
> +if (!idx->rd_index->indisprimary && !idx->rd_index->indisunique)
> +continue;
> 
> Attached a fix.

Mumble.  I changed course here and went for simpler coding instead.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: using expression syntax for partition bounds

2019-01-24 Thread Amit Langote
Hi,

Thanks for looking.

On 2019/01/24 21:00, Alvaro Herrera wrote:
> Why did you lose the parser_errposition in parse_utilcmd.c line 3854?
> 
>> -/* Fail if we don't have a constant (i.e., non-immutable coercion) */
>> -if (!IsA(value, Const))
>> +/* Make sure the expression does not refer to any vars. */
>> +if (contain_var_clause(value))
>>  ereport(ERROR,
>> -(errcode(ERRCODE_DATATYPE_MISMATCH),
>> - errmsg("specified value cannot be cast to type 
>> %s for column \"%s\"",
>> -format_type_be(colType), 
>> colName),
>> - errdetail("The cast requires a non-immutable 
>> conversion."),
>> - errhint("Try putting the literal value in 
>> single quotes."),
>> - parser_errposition(pstate, con->location)));
>> +(errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
>> + errmsg("cannot use column references in 
>> partition bound expression")));

The if (contain_var_clause(value)) block is new code, but I agree its
ereport should have parser_errposition just like other ereports in that
function.  Fixed that in the attached.
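
For illustration, the rejected case would look like this (a sketch under
the patch; the table names are made up):

=# create table r (a int) partition by range (a);
=# create table r1 partition of r for values from (a + 1) to (10);
ERROR:  cannot use column references in partition bound expression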

Thanks,
Amit
From ae4fbcaa97364e1a33c5c25eb983a23d3acad30c Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Jul 2018 14:05:22 +0900
Subject: [PATCH v11] Allow generalized expression syntax for partition bounds

Authors: Kyotaro Horiguchi, Tom Lane, Amit Langote
---
 doc/src/sgml/ref/alter_table.sgml  |   6 +-
 doc/src/sgml/ref/create_table.sgml |  19 +--
 src/backend/commands/tablecmds.c   |   9 ++
 src/backend/optimizer/util/clauses.c   |   4 +-
 src/backend/parser/gram.y  |  60 +---
 src/backend/parser/parse_agg.c |  10 ++
 src/backend/parser/parse_expr.c|   5 +
 src/backend/parser/parse_func.c|   3 +
 src/backend/parser/parse_utilcmd.c | 215 +++--
 src/include/optimizer/clauses.h|   2 +
 src/include/parser/parse_node.h|   3 +-
 src/include/utils/partcache.h  |   6 +
 src/test/regress/expected/create_table.out |  91 +---
 src/test/regress/sql/create_table.sql  |  51 ++-
 14 files changed, 315 insertions(+), 169 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index 0d1feaf828..0aa0f093f2 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -86,9 +86,9 @@ ALTER TABLE [ IF EXISTS ] name
 
 and partition_bound_spec 
is:
 
-IN ( { numeric_literal | 
string_literal | TRUE | FALSE | 
NULL } [, ...] ) |
-FROM ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] )
-  TO ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] ) |
+IN ( partition_bound_expr [, ...] 
) |
+FROM ( { partition_bound_expr | 
MINVALUE | MAXVALUE } [, ...] )
+  TO ( { partition_bound_expr | 
MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, 
REMAINDER numeric_literal )
 
 and column_constraint 
is:
diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index 857515ec8f..22dbc07b23 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -87,9 +87,9 @@ class="parameter">referential_action ] [ ON 
UPDATE and partition_bound_spec 
is:
 
-IN ( { numeric_literal | 
string_literal | TRUE | FALSE | 
NULL } [, ...] ) |
-FROM ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] )
-  TO ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] ) |
+IN ( partition_bound_expr [, ...] 
) |
+FROM ( { partition_bound_expr | 
MINVALUE | MAXVALUE } [, ...] )
+  TO ( { partition_bound_expr | 
MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, 
REMAINDER numeric_literal )
 
 index_parameters in 
UNIQUE, PRIMARY KEY, and 
EXCLUDE constraints are:
@@ -413,12 +413,13 @@ WITH ( MODULUS numeric_literal, REM
  
 
  
-  Each of the values specified in
-  the partition_bound_spec is
-  a literal, NULL, MINVALUE, or
-  MAXVALUE.  Each literal value must be either a
-  numeric constant that is coercible to the corresponding partition key
-  column's type, or a string literal that is valid input for that type.
+  partition_bound_expr is
+  any variable-free expression (subqueries, window functions, aggregate
+  functions, and set-returning functions are not allowed).  Its data type
+  must match the data type of the corresponding partition key column.
+  The expression is evaluated once at table creation time, so it can
+  even contain volatile expressions such as
+  CURRENT_TIMESTAMP.
  
 
  
diff --git a/src/backend/commands/tablecmds.c 

Re: Improve selectivity estimate for range queries

2019-01-24 Thread Kyotaro HORIGUCHI
Hi.

At Fri, 11 Jan 2019 11:36:47 +0900, "Yuzuko Hosoya" 
 wrote in 
<01d4a956$806a2ab0$813e8010$@lab.ntt.co.jp>
> Hi,
> 
> Thanks for the comments, and I'm sorry for the late reply.
> 
> > From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> > Sent: Friday, January 11, 2019 7:04 AM
> > > Robert Haas  writes:
> > > On Fri, Dec 21, 2018 at 11:50 AM Tom Lane  wrote:
> > >> A smaller-footprint way to fix the immediate problem might be to
> > >> change the values of DEFAULT_INEQ_SEL and friends so that they're
> > >> even less likely to be matched by accident.  That is, instead of
> >> 0.3333333333333333, use 0.3333333333333342 or some such.
> > 
> > > That's not a dumb idea, but it seems pretty unprincipled to me, and to
> > > be honest I'm kind of surprised that you're not proposing something
> > > cleaner.
> > 
> > The problem is the invasiveness of such a change (large) vs the benefit
> > (not so large).  The upthread patch attempted to add a separate
> > signaling path, but it was very incomplete --- and yet both I and
> > Horiguchi-san thought it was already too messy.
> > 
> > Maybe at some point we'll go over to something reasonably principled,
> > like adding confidence intervals to all selectivity estimates.  That
> > would be *really* invasive but perhaps would bring enough benefit to
> > justify itself.  But the current patch is just attempting to fix one
> > extremely narrow pain point, and if that is all it's doing it should
> > have a commensurately small footprint.  So I don't think the
> > submitted patch looks good from a cost/benefit standpoint.
> > 
> Yes, I agree with you.  Indeed the patch I attached is insufficient in
> cost-effectiveness.  However, I want to solve the problem of real
> estimates that happen to equal the default values, as in this case,
> even though it's a narrow pain point.  So I tried to distinguish
> explicitly between real estimates and defaults, as Robert said.
> 
> The idea Tom proposed and Horiguchi-san tried seems reasonable, but I'm
> concerned whether any range query really cannot accidentally match
> 0.3333333333333342 (or some such) in any environment.  Is the check
> Horiguchi-san did enough to prove that?

It cannot be perfect in principle, but I think it is reliable
enough. This is not principled at all, but it is very effective
relative to its complexity, I think.

Actually, i/(i*3+1) for some 10^n's are:

i = 10^0 : i/(3i+1) binary format: 3f d0 00 00 00 00 00 00
i = 10^1 : i/(3i+1) binary format: 3f d4 a5 29 4a 52 94 a5
i = 10^2 : i/(3i+1) binary format: 3f d5 43 30 7a 78 c5 51
i = 10^3 : i/(3i+1) binary format: 3f d5 53 83 74 70 f1 95
i = 10^4 : i/(3i+1) binary format: 3f d5 55 26 bb 44 2b a8
i = 10^5 : i/(3i+1) binary format: 3f d5 55 50 ac 4a 74 6d
i = 10^6 : i/(3i+1) binary format: 3f d5 55 54 de 07 5a 96
i = 10^7 : i/(3i+1) binary format: 3f d5 55 55 49 67 22 6d
i = 10^8 : i/(3i+1) binary format: 3f d5 55 55 54 23 e9 d7
i = 10^9 : i/(3i+1) binary format: 3f d5 55 55 55 36 ca 95
i = 10^10: i/(3i+1) binary format: 3f d5 55 55 55 52 47 75
i = 10^11: i/(3i+1) binary format: 3f d5 55 55 55 55 07 25
i = 10^12: i/(3i+1) binary format: 3f d5 55 55 55 55 4d 84
i = 10^13: i/(3i+1) binary format: 3f d5 55 55 55 55 54 8d
i = 10^14: i/(3i+1) binary format: 3f d5 55 55 55 55 55 41
i = 10^15: i/(3i+1) binary format: 3f d5 55 55 55 55 55 53
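
Making the pattern above explicit: since
\( i/(3i+1) = \tfrac{1}{3}\bigl(1 - \tfrac{1}{3i+1}\bigr) \), the relative
distance from 1/3 is about \( 1/(3i) \).  A double carries about
\( 2^{-52} \approx 2.2\times10^{-16} \) of relative precision, so
\( i/(3i+1) \) cannot round to the same double as 1/3 until roughly
\( i \approx 10^{15} \) to \( 10^{16} \), consistent with the last rows.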

So, (as Tom said upthread) intuitively we can expect that we are
safe up to roughly 10^10 or so for the simple cases.

# Of course, involving histograms makes things more complex, though.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: problems with foreign keys on partitioned tables

2019-01-24 Thread Amit Langote
Hi,

On 2019/01/24 6:13, Alvaro Herrera wrote:
> On 2019-Jan-22, Amit Langote wrote:
>> Done.  See the attached patches for HEAD and PG 11.
> 
> I'm not quite sure I understand why the one in DefineIndex needs the
> deps but nothing else does.  I fear that you added that one just to
> appease the existing test that breaks otherwise, and I worry that with
> that addition we're papering over some other, more fundamental bug.

Thinking more on this, my proposal to rip dependencies between parent and
child constraints altogether to resolve the bug I initially reported is
starting to sound a bit overambitious especially considering that we'd
need to back-patch it (the patch didn't even consider index constraints
properly, creating a divergence between the behaviors of inherited foreign
key constraints and inherited index constraints).  We can pursue it if
only to avoid bloating the catalog for what can be achieved with little
bit of additional code in tablecmds.c, but maybe we should refrain from
doing it in reaction to this particular bug.

I've updated the patch that implements a much simpler fix for this
particular bug.  Just to reiterate, the following illustrates the bug:

create table pk (a int primary key);
create table p (a int references pk) partition by list (a);
create table p1 partition of p for values in (1) partition by list (a);
create table p11 partition of p1 for values in (1);
alter table p detach partition p1;
alter table p1 drop constraint p_a_fkey;
ERROR:  constraint "p_a_fkey" of relation "p11" does not exist

The error occurs because ATExecDropConstraint, when recursively called on
p11, cannot find the constraint, as the dependency mechanism has already
dropped it.  The new fix is to return from ATExecDropConstraint without
recursing if the constraint being dropped is an index or foreign-key
constraint.
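
With the fix, the same sequence should end like this (a sketch of the
intended behavior, not captured output):

alter table p1 drop constraint p_a_fkey;
ALTER TABLE
-- p11's copy is gone as well, removed through the dependency machinery:
select conname from pg_constraint where conrelid = 'p11'::regclass;
 conname
---------
(0 rows)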

A few hunks of the originally proposed patch are attached here as 0001,
especially the part which fixes ATAddForeignKeyConstraint to pass the
correct value of connoinherit to CreateConstraintEntry (which should be
false for partitioned tables).  With that change, many tests start failing
because of the above bug.  That patch also adds a test case like the one
above, but it fails along with others due to the bug.  Patch 0002 is the
aforementioned simpler fix to make the errors (existing and the newly
added) go away.

These patches apply unchanged to the PG 11 branch.

Thanks,
Amit
From 8eb5ce74e8656b517ad5a4e960b70de0ff3bedd5 Mon Sep 17 00:00:00 2001
From: amit 
Date: Thu, 24 Jan 2019 21:29:17 +0900
Subject: [PATCH 1/2] Fix ATAddForeignKeyConstraint to use correct value of
 connoinherit

While at it, add some Asserts to ConstraintSetParentConstraint to
assert the correct value of coninhcount.

Many tests fail, but the next patch should take care of them.
---
 src/backend/catalog/pg_constraint.c   | 11 +--
 src/backend/commands/tablecmds.c  |  5 -
 src/test/regress/expected/foreign_key.out | 18 --
 src/test/regress/sql/foreign_key.sql  | 15 ++-
 4 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/src/backend/catalog/pg_constraint.c 
b/src/backend/catalog/pg_constraint.c
index 446b54b9ff..d2b8489b6c 100644
--- a/src/backend/catalog/pg_constraint.c
+++ b/src/backend/catalog/pg_constraint.c
@@ -785,6 +785,12 @@ ConstraintSetParentConstraint(Oid childConstrId, Oid 
parentConstrId)
{
constrForm->conislocal = false;
constrForm->coninhcount++;
+   /*
+* An inherited foreign key constraint can never have more than 
one
+* parent, because inheriting foreign keys is only allowed for
+* partitioning where multiple inheritance is disallowed.
+*/
+   Assert(constrForm->coninhcount == 1);
constrForm->conparentid = parentConstrId;
 
	CatalogTupleUpdate(constrRel, &newtup->t_self, newtup);
@@ -797,8 +803,9 @@ ConstraintSetParentConstraint(Oid childConstrId, Oid 
parentConstrId)
else
{
constrForm->coninhcount--;
-   if (constrForm->coninhcount <= 0)
-   constrForm->conislocal = true;
+   /* See the above comment. */
+   Assert(constrForm->coninhcount == 0);
+   constrForm->conislocal = true;
constrForm->conparentid = InvalidOid;
 
deleteDependencyRecordsForClass(ConstraintRelationId, 
childConstrId,
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 738c178107..fea4d99735 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -7251,6 +7251,9 @@ ATAddForeignKeyConstraint(List **wqueue, AlteredTableInfo 
*tab, Relation rel,
boolold_check_ok;
ObjectAddress address;
ListCell   *old_pfeqop_item = list_head(fkconstraint->old_conpfeqop);
+   /* Partitioned table's foreign 

Re: ArchiveEntry optional arguments refactoring

2019-01-24 Thread Dmitry Dolgov
Here is another version, where I accumulated all the suggestions:

* Use NULL as a default value where it was an empty string before (this
  required few minor changes for some part of the code outside ArchiveEntry)

* Rename defn, descr to createStmt, description

* Use a macro to avoid pgindent errors

About the last one. I'm also inclined to use the simpler version of
ARCHIVE_OPTS macro, mostly because the difference between "optional" and
"positional" arguments in the alternative proposal is not that visible. So

> mixing struct arguments and normal function arguments seems
> quite confusing

could probably hurt not only readability, but would also be a bit more
problematic when updating this code (which was the goal in the first place).


0001-ArchiveOpts-structure_v2.patch
Description: Binary data


Re: using expression syntax for partition bounds

2019-01-24 Thread Alvaro Herrera
Why did you lose the parser_errposition in parse_utilcmd.c line 3854?

> - /* Fail if we don't have a constant (i.e., non-immutable coercion) */
> - if (!IsA(value, Const))
> + /* Make sure the expression does not refer to any vars. */
> + if (contain_var_clause(value))
>   ereport(ERROR,
> - (errcode(ERRCODE_DATATYPE_MISMATCH),
> -  errmsg("specified value cannot be cast to type 
> %s for column \"%s\"",
> - format_type_be(colType), 
> colName),
> -  errdetail("The cast requires a non-immutable 
> conversion."),
> -  errhint("Try putting the literal value in 
> single quotes."),
> -  parser_errposition(pstate, con->location)));
> + (errcode(ERRCODE_INVALID_COLUMN_REFERENCE),
> +  errmsg("cannot use column references in 
> partition bound expression")));


-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Simplify set of flags used by MyXactFlags

2019-01-24 Thread Alvaro Herrera
On 2019-Jan-22, Michael Paquier wrote:

> On Mon, Jan 21, 2019 at 02:58:39PM -0300, Alvaro Herrera wrote:
> > "... operated on temp namespace" doesn't look good; seems to me to be
> > missing an article, for one thing, but really I'm not sure that
> > 'namespace' is the term to be using here.  I'd say "... operated on
> > temporary objects" instead (the plural avoids the need for the article;
> > and the user doesn't really care about the namespace itself but rather
> > about the objects it contains.)
> 
> Thanks for the input.  Would you prefer something like the attached
> then?  I have switched the error message to "temporary objects", and
> renamed the flag to XACT_FLAGS_ACCESSEDTEMPOBJECTS.

Uh, I didn't think it was necessary nor desirable to rename the flag,
only the user-visible message.
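
For context, the message under discussion is the one raised when
two-phase commit meets temporary objects; a sketch, using the patch's
proposed wording:

=# begin;
=# create temporary table tt (a int);
=# prepare transaction 'px';
ERROR:  cannot PREPARE a transaction that has operated on temporary objects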

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Connection slots reserved for replication

2019-01-24 Thread Kyotaro HORIGUCHI
Hello.

The documentation looks fine to me. By the way, the comment at the
call site of CheckRequiredParameterValues() in xlog.c looks somewhat
stale.

> /* Check to see if any changes to max_connections give problems */
> CheckRequiredParameterValues();

Maybe something like this would be better:

> /* Check to see if any parameter change gives a problem on replication */


In postinit.c:
>/*
> * The last few connection slots are reserved for superusers.
> */
>if ((!am_superuser && !am_walsender) &&
>ReservedBackends > 0 &&

This forgets to explain walsenders.

> The last few connection slots are reserved for
> superusers. Replication connections don't share the same slot
> pool.

Or something?

And the parentheses enclosing "!am_superuser..walsender" seem no
longer needed.


In guc.c:
-   /* see max_connections and max_wal_senders */
+   /* see max_connections */

I don't understand why we should see max_connections just above
(or even max_wal_senders).  Do we need this comment?
(There are some other instances, but they won't be for this patch.)


In pg_controldata.c:
+   printf(_("max_wal_senders setting: %d\n"),
+  ControlFile->max_wal_senders);
printf(_("max_worker_processes setting: %d\n"),
   ControlFile->max_worker_processes);
printf(_("max_prepared_xacts setting:   %d\n"),

The added item seems to need some additional spaces for alignment.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Use an enum for RELKIND_*?

2019-01-24 Thread Greg Stark
On Wed 16 Jan 2019, 18:10 Alvaro Herrera wrote:

> Or we could do nothing, since there are no complaints about this
> problem.

Unless there are objections, my plan is to just add it to the xid case.
The tag is pretty useless for users and it's incredibly hard to even see
these rows anyway.

I'll take care of it if there are no objections.


Re: Use an enum for RELKIND_*?

2019-01-24 Thread Kyotaro HORIGUCHI
At Wed, 19 Dec 2018 09:55:22 -0500, Tom Lane  wrote in 
<10268.1545231...@sss.pgh.pa.us>
> Andres Freund  writes:
> > On 2018-12-18 23:17:54 -0500, Tom Lane wrote:
> >> Yeah, that would sure make things better.  I was considering
> >> something like
> >>
> >>#ifndef USE_ASSERT_CHECKING
> >>default:
> >>elog(ERROR, ...);
> >>#endif
> 
> > Yea, that's the best I can come up with too.  I think we'd probably want
> > to have the warning available during development mainly, so I'm not sure
> > the "lonely buildfarm animal" approach buys us enough.
> 
> Well, if it's controlled by some dedicated symbol ("CHECK_ENUM_SWITCHES"
> or such) then you could certainly run a test compile with that turned
> on, if you felt the need to.  But I don't really buy the complaint
> that leaving it to the buildfarm to find such problems wouldn't work.
> They're almost always simple, easily-fixed oversights.
> 
> Also, if we wrap all this up in a macro then it'd be possible to do
> other things by adjusting the macro.  I'm envisioning
> 
>   switch ((RelationRelkind) rel->rd_rel->relkind)
>   {
>   case ...
>   ...
> 
>   CHECK_ENUM_SWITCH(RelationRelkind, rel->rd_rel->relkind);
>   }

I might be misunderstanding something, but my compiler (gcc 7.3.1)
won't be quiet about an omitted value even with a default: label.

>   switch ((x) v)
>   {
>  case A: ...
>  case B: ...
>  default: ...
>   }
> 
> t.c:12:3: warning: enumeration value 'C' not handled in switch [-Wswitch-enum]

Isn't it enough that at least one platform correctly warns about that?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: using expression syntax for partition bounds

2019-01-24 Thread Amit Langote
Horiguchi-san,

Thanks for checking.

On 2019/01/24 19:03, Kyotaro HORIGUCHI wrote:
> At Fri, 18 Jan 2019 17:50:56 +0900, Amit Langote wrote:
>> On 2019/01/18 16:48, Peter Eisentraut wrote:
 How about the following note in the documentation:

 +  Although volatile expressions such as
 +  CURRENT_TIMESTAMP can be 
 used
 +  for this, be careful when using them, because
 +  PostgreSQL will evaluate them only once
 +  when adding the partition.
>>>
>>> I don't think we have to phrase it in this warning way.  Just say that
>>> volatile expressions are evaluated at the time of the DDL statement.
>>
>> OK, then just this:
>>
>> +  Even volatile expressions such as
>> +  CURRENT_TIMESTAMP can be used.
> 
> I agree with not phrasing it as a warning, but it seems
> too simplified. I think the reason is still required; like this?:
> 
> ===
> The expression is evaluated once at table creation time, so it
> can involve even volatile expressions such as
> CURRENT_TIMESTAMP.

Ah, that's perhaps a better way of describing this feature.

Attached rebased patch containing above change.
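
For illustration, the evaluate-once behavior described above would read
like this (a sketch; the table names are made up):

=# create table log (ts timestamptz) partition by range (ts);
=# create table log_old partition of log
     for values from (minvalue) to (current_timestamp - interval '1 month');
-- the bound expression is evaluated once, at CREATE TABLE time, and the
-- resulting constant is what gets stored for the partition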

Thanks,
Amit
From 6ed90ad9640c4d7bfe53c280535631c45d2cafae Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Fri, 6 Jul 2018 14:05:22 +0900
Subject: [PATCH v10] Allow generalized expression syntax for partition bounds

Authors: Kyotaro Horiguchi, Tom Lane, Amit Langote
---
 doc/src/sgml/ref/alter_table.sgml  |   6 +-
 doc/src/sgml/ref/create_table.sgml |  19 +--
 src/backend/commands/tablecmds.c   |   9 ++
 src/backend/optimizer/util/clauses.c   |   4 +-
 src/backend/parser/gram.y  |  60 +---
 src/backend/parser/parse_agg.c |  10 ++
 src/backend/parser/parse_expr.c|   5 +
 src/backend/parser/parse_func.c|   3 +
 src/backend/parser/parse_utilcmd.c | 214 +++--
 src/include/optimizer/clauses.h|   2 +
 src/include/parser/parse_node.h|   3 +-
 src/include/utils/partcache.h  |   6 +
 src/test/regress/expected/create_table.out |  89 +---
 src/test/regress/sql/create_table.sql  |  51 ++-
 14 files changed, 312 insertions(+), 169 deletions(-)

diff --git a/doc/src/sgml/ref/alter_table.sgml 
b/doc/src/sgml/ref/alter_table.sgml
index 0d1feaf828..0aa0f093f2 100644
--- a/doc/src/sgml/ref/alter_table.sgml
+++ b/doc/src/sgml/ref/alter_table.sgml
@@ -86,9 +86,9 @@ ALTER TABLE [ IF EXISTS ] name
 
 and partition_bound_spec 
is:
 
-IN ( { numeric_literal | 
string_literal | TRUE | FALSE | 
NULL } [, ...] ) |
-FROM ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] )
-  TO ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] ) |
+IN ( partition_bound_expr [, ...] 
) |
+FROM ( { partition_bound_expr | 
MINVALUE | MAXVALUE } [, ...] )
+  TO ( { partition_bound_expr | 
MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, 
REMAINDER numeric_literal )
 
 and column_constraint 
is:
diff --git a/doc/src/sgml/ref/create_table.sgml 
b/doc/src/sgml/ref/create_table.sgml
index 857515ec8f..22dbc07b23 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -87,9 +87,9 @@ class="parameter">referential_action ] [ ON 
UPDATE and partition_bound_spec 
is:
 
-IN ( { numeric_literal | 
string_literal | TRUE | FALSE | 
NULL } [, ...] ) |
-FROM ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] )
-  TO ( { numeric_literal | 
string_literal | TRUE | FALSE | 
MINVALUE | MAXVALUE } [, ...] ) |
+IN ( partition_bound_expr [, ...] 
) |
+FROM ( { partition_bound_expr | 
MINVALUE | MAXVALUE } [, ...] )
+  TO ( { partition_bound_expr | 
MINVALUE | MAXVALUE } [, ...] ) |
 WITH ( MODULUS numeric_literal, 
REMAINDER numeric_literal )
 
 index_parameters in 
UNIQUE, PRIMARY KEY, and 
EXCLUDE constraints are:
@@ -413,12 +413,13 @@ WITH ( MODULUS numeric_literal, REM
  
 
  
-  Each of the values specified in
-  the partition_bound_spec is
-  a literal, NULL, MINVALUE, or
-  MAXVALUE.  Each literal value must be either a
-  numeric constant that is coercible to the corresponding partition key
-  column's type, or a string literal that is valid input for that type.
+  partition_bound_expr is
+  any variable-free expression (subqueries, window functions, aggregate
+  functions, and set-returning functions are not allowed).  Its data type
+  must match the data type of the corresponding partition key column.
+  The expression is evaluated once at table creation time, so it can
+  even contain volatile expressions such as
+  CURRENT_TIMESTAMP.
  
 
  
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 738c178107..e338e760ab 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ 

Re: WIP: Avoid creation of the free space map for small tables

2019-01-24 Thread Amit Kapila
On Thu, Jan 24, 2019 at 9:46 AM Amit Kapila  wrote:
>
> On Thu, Jan 24, 2019 at 3:39 AM John Naylor  
> wrote:
> >

Few comments related to pg_upgrade patch:

1.
+ if ((maps[mapnum].relkind != RELKIND_RELATION &&
+ maps[mapnum].relkind != RELKIND_TOASTVALUE) ||
+ first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ ||
+ GET_MAJOR_VERSION(new_cluster.major_version) <= 1100)
+ (void) transfer_relfile([mapnum], "_fsm", vm_must_add_frozenbit);

I think this check will needlessly be performed for future versions as
well, say when one wants to upgrade from PG12 to PG13.  That might not
create any problem, but let's try to be more precise.  Can you try to
rewrite this check?  You might want to encapsulate it inside a
function.  I have thought of doing something similar to what we do for
vm (see the checks related to VISIBILITY_MAP_FROZEN_BIT_CAT_VER), but I
guess for this patch it is not important to check the catalog version,
even if someone tries to upgrade to the same version.

2.
transfer_relfile()
{
..
- /* Is it an extent, fsm, or vm file? */
- if (type_suffix[0] != '\0' || segno != 0)
+ /* Did file open fail? */
+ if (stat(old_file, &statbuf) != 0)
..
}

So from now on, we will call stat() even for the 0th segment, which
means one additional system call for each relation.  I am not sure if
that matters, but I think there is no harm in testing once with a
large number of relations, say 10K to 50K relations that have an FSM.
The other alternative is to fetch pg_class.relpages and rely on that
to make this decision (see the sketch after point 3), but again, if
that is not updated, we might make the wrong decision.

Anyone else have any thoughts on this point?

3.
-static void
+static Size
 transfer_relfile(FileNameMap *map, const char *type_suffix, bool
vm_must_add_frozenbit)

If we decide to go with the approach you proposed, we should add
some comments atop this function about the return value change.
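
A sketch of the pg_class.relpages alternative mentioned in point 2; 4 is
an assumed value of HEAP_FSM_CREATION_THRESHOLD, and relpages is only as
fresh as the last VACUUM or ANALYZE:

select relname, relpages
from pg_class
where relkind in ('r', 't')
  and relpages > 4;   -- relations expected to keep an FSM fork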

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: using expression syntax for partition bounds

2019-01-24 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 18 Jan 2019 17:50:56 +0900, Amit Langote 
 wrote in 

> Thanks for the comments.
> 
> On 2019/01/18 16:48, Peter Eisentraut wrote:
> >> How about the following note in the documentation:
> >>
> >> +  Although volatile expressions such as
> >> +  CURRENT_TIMESTAMP can be 
> >> used
> >> +  for this, be careful when using them, because
> >> +  PostgreSQL will evaluate them only once
> >> +  when adding the partition.
> > 
> > I don't think we have to phrase it in this warning way.  Just say that
> > volatile expressions are evaluated at the time of the DDL statement.
> 
> OK, then just this:
> 
> +  Even volatile expressions such as
> +  CURRENT_TIMESTAMP can be used.

I agree with not phrasing it as a warning, but it seems
too simplified. I think the reason is still required; like this?:

===
The expression is evaluated once at table creation time, so it
can involve even volatile expressions such as
CURRENT_TIMESTAMP.
===

> >> Sorry but I'm not sure how or what I would test about this.  Maybe, just
> >> add a test in create_table.sql/alter_table.sql that shows that using
> >> volatile expression doesn't cause an error?
> > 
> > Possibilities: Create a range partition with current_timestamp as the
> > upper bound and then in a separate transaction insert current_timestamp
> > and have it appear in the default partition.  Or create list partition
> > with session_user as one partition's value and then insert session_user
> > and have it appear in that table.  Or something like those.
> 
> OK, added a test that uses current_timestamp.
> 
> >> So, should the "name" type's collation should simply be discarded in favor
> >> of "POSIX" that's being used for partitioning?
> > 
> > In that specific case, yes, I think so.
> > 
> >>> What we don't want is someone writing an explicit COLLATE clause.  I
> >>> think we just need to check that there is no top-level COLLATE clause.
> >>> This would then sort of match the logic parse_collate.c for combining
> >>> collations.
> >>
> >> Maybe I'm missing something, but isn't it OK to allow the COLLATE clause
> >> as long as it specifies the matching collation as the parent?
> > 
> > Yes, that should be OK.
> 
> Alright, I've included a test that uses cast expression in partition bound.
> 
> Updated patch attached.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: FETCH FIRST clause PERCENT option

2019-01-24 Thread Surafel Temesgen
On Wed, Jan 9, 2019 at 8:18 PM Tomas Vondra 
wrote:


>
> See the attached patch, which recomputes the count regularly. I don't
> claim the patch is committable or that it has no other bugs, but
> hopefully it shows what I meant.
>
>

I found only one issue: it does not work well with cursors.

postgres=# START TRANSACTION;

START TRANSACTION

postgres=# create table t as select i from generate_series(1,1000) s(i);

SELECT 1000

postgres=# declare c cursor for select * from t fetch first 5 percent rows
only;

DECLARE CURSOR

postgres=# fetch all in c;

ERROR: trying to store a minimal tuple into wrong type of slot


I am looking into it.

Meanwhile, I fixed the row estimation and cost, and made
create_ordered_paths be created with no LIMIT consideration in the
PERCENTAGE case.


regards

Surafel
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..8491b7831a 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] [ PERCENT ] { ROW | ROWS } ONLY
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,8 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+with PERCENT count specifies the maximum number of rows to return 
+in percentage.ROW
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index baa669abe8..8fc70b1b5f 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -21,6 +21,8 @@
 
 #include "postgres.h"
 
+#include <math.h>
+
 #include "executor/executor.h"
 #include "executor/nodeLimit.h"
 #include "miscadmin.h"
@@ -44,6 +46,7 @@ ExecLimit(PlanState *pstate)
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
+	slot = node->subSlot;
 
 	CHECK_FOR_INTERRUPTS();
 
@@ -81,7 +84,15 @@ ExecLimit(PlanState *pstate)
 			/*
 			 * Check for empty window; if so, treat like empty subplan.
 			 */
-			if (node->count <= 0 && !node->noCount)
+			if (node->limitOption == PERCENTAGE)
+			{
+if (node->percent == 0.0)
+{
+	node->lstate = LIMIT_EMPTY;
+	return NULL;
+}
+			}
+			else if (node->count <= 0 && !node->noCount)
 			{
 node->lstate = LIMIT_EMPTY;
 return NULL;
@@ -122,6 +133,7 @@ ExecLimit(PlanState *pstate)
 			return NULL;
 
 		case LIMIT_INWINDOW:
+
 			if (ScanDirectionIsForward(direction))
 			{
 /*
@@ -130,6 +142,32 @@ ExecLimit(PlanState *pstate)
  * advancing the subplan or the position variable; but change
  * the state machine state to record having done so.
  */
+
+/*
+ * When in percentage mode, we need to see if we can get any
+ * additional rows from the subplan (enough to increase the
+ * node->count value).
+ */
+if (node->limitOption == PERCENTAGE)
+{
+	/* loop until the node->count increments */
+	while (node->position - node->offset >= node->count)
+	{
+		int64	cnt;
+
+		slot = ExecProcNode(outerPlan);
+		if (TupIsNull(slot))
+			break;
+
+		tuplestore_puttupleslot(node->totalTuple, slot);
+
+		/* plus 1, because the first tuple is returned from LIMIT_RESCAN */
+		cnt = 1 + tuplestore_tuple_count(node->totalTuple);
+
+		node->count = ceil(node->percent * cnt / 100.0);
+	}
+}
+
 if (!node->noCount &&
 	node->position - node->offset >= node->count)
 {
@@ -145,6 +183,20 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
+if (node->limitOption == PERCENTAGE)
+{
+	while (node->position - node->offset < node->count)
+	{
+		if (tuplestore_gettupleslot(node->totalTuple, true, true, slot))
+		{
+			node->subSlot = slot;
+			node->position++;
+		}
+	}
+}
+else if (node->limitOption == EXACT_NUMBER)
+{
+
 /*
  * Get next tuple from subplan, if any.
  */
@@ -156,6 +208,7 @@ ExecLimit(PlanState *pstate)
 }
 node->subSlot = slot;
 node->position++;
+}
 			}
 			else
 			{
@@ -278,17 +331,29 @@ recompute_limits(LimitState *node)
 		/* Interpret NULL count as no count (LIMIT ALL) */
 		if (isNull)
 		{
-			node->count = 0;
+			node->count = 1;
 			node->noCount 

Re: Protect syscache from bloating with negative cache entries

2019-01-24 Thread Kyotaro HORIGUCHI
Thank you for the comments.

At Wed, 23 Jan 2019 18:21:45 -0500, Bruce Momjian  wrote in 
<20190123232145.ga8...@momjian.us>
> On Wed, Jan 23, 2019 at 05:35:02PM +0900, Kyotaro HORIGUCHI wrote:
> > At Mon, 21 Jan 2019 17:22:55 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
> >  wrote in 
> > <20190121.172255.226467552.horiguchi.kyot...@lab.ntt.co.jp>
> > > An option is an additional PGPROC member and interface functions.
> > > 
> > > struct PGPROC
> > > {
> > >  ...
> > >  int syscache_usage_track_interval; /* track interval, 0 to disable */
> > > 
> > > =# select syscache_usage_track_add(<pid>[, <interval>]);
> > > =# select syscache_usage_track_remove(2134);
> > > 
> > > 
> > > Or, just provide an one-shot triggering function.
> > > 
> > > =# select syscache_take_usage_track();
> > > 
> > > This can use both a similar PGPROC variable or SendProcSignal()
> > > but the former doesn't fire while idle time unless using timer.
> > 
> > The attached is revised version of this patchset, where the third
> > patch is the remote setting feature. It uses static shared memory.
> > 
> > =# select pg_backend_catcache_stats(<pid>, <interval>);
> > 
> > Activates or changes the catcache stats feature on the backend with
> > the given PID. (The name should be changed to .._syscache_stats,
> > though.)  It is far smaller than the remote-GUC feature. (It contains
> > a part that should be in the previous patch. I will fix it later.)
> 
> I have a few questions to make sure we have not made the API too
> complex.  First, for syscache_prune_min_age, that is the minimum age
> that we prune, and entries could last twice that long.  Is there any
> value to doing the scan at 50% of the age so that the
> syscache_prune_min_age is the max age?  For example, if our age cutoff
> is 10 minutes, we could scan every 5 minutes so 10 minutes would be the
> maximum age kept.

(Looking into the patch...) Actually thrice, not twice.  That is
because I put significance on the access frequency. I think it is
reasonable that entries with more frequent access get a longer
life (within a certain limit). The original problem here was
negative cache entries that are created but never accessed. However,
there's no firm reason for the number of steps (3). There might be
no difference if the extra lifetime were capped at one s_p_m_age,
or even if there were no extra time at all.

> Second, when would you use syscache_memory_target != 0? 

It is a suggestion from upthread: we sometimes want to keep some known
amount of cache entries even when expiration is activated.

> If you had
> syscache_prune_min_age really fast, e.g. 10 seconds?  What is the
> use-case for this? You have a query that touches 10k objects, and then
> the connection stays active but doesn't touch many of those 10k objects,
> and you want it cleaned up in seconds instead of minutes?  (I can't see
> why you would not clean up all unreferenced objects after _minutes_ of
> disuse, but removing them after seconds of disuse seems undesirable.)
> What are the odds you would retain the entires you want with a fast
> target?

Are you asking the reason for the unit? It's just because the value
won't be very large even in seconds, at most 3600.  Even though I
don't think such a short duration setting is meaningful in the real
world, I also don't think we need to prohibit it. (Actually it is
useful for testing :p) Another reason is that GUC_UNIT_MIN doesn't
seem so common: it is used by only two variables, log_rotation_age
and old_snapshot_threshold.

> What is the value of being able to change a specific backend's stat
> interval?  I don't remember any other setting having this ability.

As mentioned upthread, taking the statistics costs significant time,
so I believe no one is willing to keep it turned on at all times.  As
a result the feature would be useless, because it could not be turned
on for an already-active backend once it actually bloats. So I wanted
to provide a remote switching feature.

I also thought there are some other features that would be useful if
they could be turned on remotely, hence the remote GUC idea, but it
was too complex...

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: RTLD_GLOBAL (& JIT inlining)

2019-01-24 Thread Kyotaro HORIGUCHI
At Thu, 24 Jan 2019 01:02:14 -0500, Tom Lane  wrote in 
<24744.1548309...@sss.pgh.pa.us>
> Kyotaro HORIGUCHI  writes:
> With the attached patch, external modules are loaded with RTLD_LOCAL
> by default. PG_MODULE_EXPORT_SYMBOL in a module's source files lets
> it be loaded with RTLD_GLOBAL. I suppose this is the change with the
> least impact on existing modules, because I believe most of them
> don't need to export symbols.
> 
> This seems really hacky :-(
> 
> In the PL cases, at least, it's not so much the PL itself that
> wants to export symbols as the underlying language library
> (libpython, libperl, etc).  You're assuming that an action on
> the PL .so will propagate to the underlying language .so, which

(If I understand you correctly) I'm not assuming that the
propagation happens, or even the contrary.  As can be seen from the
fact that it actually works, symbols of underlying libraries referred
to from pl*.so are resolved using the dependency list in pl*.so itself,
independently of the RTLD flags given for it. The problem here is
that while underlying language libraries cannot refer to symbols back
in the pl*.so when it is loaded with RTLD_LOCAL, RTLD_GLOBAL makes
some extensions unhappy.

I think the assumption that RTLD_GLOBAL resolves everything has
proved rather shaky. RTLD_DEEPBIND is not available on some
platforms, so we need to choose between LOCAL and GLOBAL per
loaded module.

> seems like a pretty shaky assumption.  I wonder also whether
> closing and reopening the DL handle will really do anything
> at all for already-loaded libraries ...

internal_load_library() avoids duplicate loading, so I believe that
cannot happen.  Regarding duplicate dlopen, RTLD_GLOBAL just promotes
RTLD_LOCAL; in other words, the former wins. So it's not a problem if
the dlclose() there did nothing.

We could use RTLD_NOLOAD on almost all platforms, but the win32
implementation needs an additional change to make it work as expected.

If you are calling it hacky because it only uses the symbol of an
unused function, we can actually call the function to get the flag.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center