Re: SQL statement PREPARE does not work in ECPG

2019-03-15 Thread Michael Meskes
Hi all and thank you Matsumura-san.

>   Excuse:
>   It doesn't include regression tests and pass them.
>   Because I must reset all expected C program of regression.
>   # I add an argument to ECPGdo().

Sure, let's do this at the very end.

> 1. Specification
>   It accepts the following .pgc.
>   I confirmed it works well for AT clause.
>   All results for st1 and st2 are same.

I have a similar test case and can confirm that the output is the same
for both ways of preparing the statement.
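
For the record, the sort of test case I mean looks roughly like the
following .pgc fragment.  This is only a sketch based on the SQL-level
PREPARE/EXECUTE syntax; the exact syntax the patch accepts may differ,
and the statement and variable names are made up:

EXEC SQL BEGIN DECLARE SECTION;
int v = 41;
int r1 = 0, r2 = 0;
EXEC SQL END DECLARE SECTION;

/* new style: PREPARE ... AS, then EXECUTE with a parameter list */
EXEC SQL PREPARE st1 (int) AS SELECT $1 + 1;
EXEC SQL EXECUTE st1(:v) INTO :r1;

/* old style: PREPARE ... FROM a statement string */
EXEC SQL PREPARE st2 FROM "SELECT ? + 1";
EXEC SQL EXECUTE st2 INTO :r2 USING :v;

Both r1 and r2 should come back as 42.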

> 2. Behavior of ecpglib
> (1) PREPARE with AS clause
> Ecpglib sends the PREPARE statement to backend as is. (using
> PQexec).
> 
> (2) EXECUTE with parameter list
> Ecpglib sends the EXECUTE statement as is (using PQexec), but all
> host variables in
> the list are converted to string-formatted and embedded into the
> EXECUTE statement.
> 
> (3) PREPARE with FROM clause (not changed)
> Ecpglib sends 'P' libpq-message with statement (using PQprepare).
> 
> (4) EXECUTE without parameter list (not changed)
> Ecpglib sends 'B' libpq-message with parameters. (using
> PQexecPrepared).
> 
> 
> 3. Change of preprocessor
> 
>  - I add ECPGst_prepare and ECPGst_execnormal.
>ECPGst_prepare is only for (1) and ECPGst_execnormal is only for
> (2).
># I think the names are not good.
> 
>  - I add one argument to ECPGdo(). It's for prepared statement name.

One question though, why is the statement name always quoted? Do we
really need that? Seems to be more of a hassle than an advantage.

> 4.
> I wonder whether I should merge (3) to (1) and (4) to (4) or not.

I would prefer to merge as much as possible, as I am afraid that if we
do not merge the approaches, we will run into issues later. There was a
reason why we added PQprepare, but I do not remember it off the top of
my head. Need to check when I'm online again.

Michael
-- 
Michael Meskes
Michael at Fam-Meskes dot De, Michael at Meskes dot (De|Com|Net|Org)
Meskes at (Debian|Postgresql) dot Org
Jabber: michael at xmpp dot meskes dot org
VfL Borussia! Força Barça! SF 49ers! Use Debian GNU/Linux, PostgreSQL




Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-03-15 Thread Tom Lane
Yun Li  writes:
> Do you think if we can add queryId into the pg_stat_get_activity function
> and ultimatly expose it in the view? It would be easier to track "similar"
> query's performance over time easier.

No, we're not likely to do that, because it would mean (1) baking one
single definition of "query ID" into the core system and (2) paying
the cost to calculate that ID all the time.

pg_stat_statements has a notion of query ID, but that notion might be
quite inappropriate for other usages, which is why it's an extension
and not core.

regards, tom lane



Re: WIP: Avoid creation of the free space map for small tables

2019-03-15 Thread Amit Kapila
On Fri, Mar 15, 2019 at 3:40 PM John Naylor  wrote:
>
> On Fri, Mar 15, 2019 at 5:55 PM Amit Kapila  wrote:
> > I have committed the latest version of this patch.  I think we can
> > wait for a day or two see if there is any complain from buildfarm or
> > in general and then we can close this CF entry.  IIRC, this was the
> > last patch in the series, right?
>
> Great, thanks! I'll keep an eye on the buildfarm as well.
>
> I just spotted two comments in freespace.c that were true during
> earlier patch revisions, but are no longer true, so I've attached a
> fix for those.
>

LGTM, so committed.

> There are no other patches in the series.
>

Okay.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Make pg_checksums complain if compiled BLCKSZ and data folder's block size differ

2019-03-15 Thread Michael Paquier
Hi all,
(related folks in CC)

Sergei Kornilov has reported here an issue with pg_checksums:
https://www.postgresql.org/message-id/5217311552474...@myt2-66bcb87429e6.qloud-c.yandex.net

If the block size the tool is compiled with does not match the data
folder block size, then users would get incorrect checksum failures,
which is confusing.  As pg_checksum_block() uses the block size
directly, this cannot really be made dynamic yet, so we had better issue
an error on that.  Michael Banck has sent a patch for that:
https://www.postgresql.org/message-id/1552476561.4947.67.ca...@credativ.de

The proposed error message looks like this:
+   if (ControlFile->blcksz != BLCKSZ)
+   {
+   fprintf(stderr, _("%s: data directory block size %d is different to compiled-in block size %d.\n"),
+   progname, ControlFile->blcksz, BLCKSZ);
+   exit(1);
+   }
Still I think that we could do better.

Here is a proposed message which looks more natural to me, and more
consistent with what xlog.c complains about:
database files are incompatible with pg_checksums.
The database cluster was initialized with BLCKSZ %d, but pg_checksums
was compiled with BLCKSZ %d.

Does anybody have better wording for that?  Attached is a proposed
patch.
--
Michael
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 5d4083fa9f..423c036c0f 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -327,6 +327,15 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+	if (ControlFile->blcksz != BLCKSZ)
+	{
+		fprintf(stderr, _("%s: database files are incompatible with pg_checksums.\n"),
+progname);
+		fprintf(stderr, _("%s: The database cluster was initialized with BLCKSZ %u, but pg_checksums was compiled with BLCKSZ %u."),
+progname, ControlFile->blcksz, BLCKSZ);
+		exit(1);
+	}
+
 	if (ControlFile->state != DB_SHUTDOWNED &&
 		ControlFile->state != DB_SHUTDOWNED_IN_RECOVERY)
 	{


signature.asc
Description: PGP signature


Re: Lifespan of a BeginInternalSubTransaction subxact ?

2019-03-15 Thread Tom Lane
Chapman Flack  writes:
> PL/Java implements JDBC Savepoints using BeginInternalSubTransaction/
> ReleaseCurrentSubTransaction/RollbackAndReleaseCurrentSubTransaction.
> That seems to be the Accepted Way of Doing Things within backend PLs
> that want control over error recovery, am I right?

Sounds about right, though I haven't checked the details exactly.

> PL/Java also strictly enforces that such a subxact set within a Java
> function must be released or rolled back by the time that function
> returns.

Yup.

> The reasoning there is less obvious to me; my intuition would have been
> that a subtransaction could remain in play for the life of its containing
> transaction, which could have been started outside of this Java function;

When control returns from a function, we resume executing the statement
that called it.  This can *not* be in a different (sub)transaction than
the statement started in; that wouldn't make any sense logically, and
it certainly won't work from an implementation standpoint either.

The rules are laxer for procedures, I believe; at the very least those are
allowed to commit the calling transaction and start a new one.  I'm less
sure about how they can interact with subtransactions.  To support this,
a CALL statement has to not have any internal state that persists past
the procedure call.  But a function cannot expect that the calling
statement lacks internal state.

regards, tom lane



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread David Fetter
On Fri, Mar 15, 2019 at 12:31:21PM -0400, Chapman Flack wrote:
> On 3/15/19 12:15 PM, Tom Lane wrote:
> > Chapman Flack  writes:
> >> So the proposal seems roughly equivalent to making string_to_array's
> >> second parameter optional default null, and array_to_string's second
> >> parameter optional default ''.
> > 
> > In that case why bother?  It'll just create a cross-version compatibility
> > hazard for next-to-no keystroke savings.  If the cases were so common
> > that they could be argued to be sane "default" behavior, I might feel
> > differently --- but if you were asked in a vacuum what the default
> > delimiters ought to be, I don't think you'd say "no delimiter".
> 
> One could go further and argue that the non-optional arguments improve
> clarity: a reader seeing the explicit NULL or '' argument gets a strong
> clue what's intended, who in the optional-argument case might end up
> thinking "must go look up what this function's default delimiter is".

Going to look up the function's behavior would be much more fun if
there were comments on these functions explaining things.  I'll draft
up a patch for some of that.

In a similar vein, I haven't been able to come up with hazards of
naming function parameters in some document-ish way. What did I miss?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Lifespan of a BeginInternalSubTransaction subxact ?

2019-03-15 Thread Chapman Flack
Hi,

PL/Java implements JDBC Savepoints using BeginInternalSubTransaction/
ReleaseCurrentSubTransaction/RollbackAndReleaseCurrentSubTransaction.

That seems to be the Accepted Way of Doing Things within backend PLs
that want control over error recovery, am I right?
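
For reference, the shape of that pattern (essentially what PL/pgSQL does
for its exception blocks; this is a sketch from memory, not PL/Java's
actual code, and the error handling is abbreviated):

#include "postgres.h"
#include "access/xact.h"
#include "utils/memutils.h"
#include "utils/resowner.h"

/* Run 'fn' inside its own subtransaction; return true on success. */
static bool
run_in_subxact(void (*fn) (void))
{
	MemoryContext oldcontext = CurrentMemoryContext;
	ResourceOwner oldowner = CurrentResourceOwner;
	bool		ok = true;

	BeginInternalSubTransaction(NULL);
	/* run the callee in the caller's memory context */
	MemoryContextSwitchTo(oldcontext);

	PG_TRY();
	{
		fn();
		/* success: pop the subxact and restore the caller's state */
		ReleaseCurrentSubTransaction();
		MemoryContextSwitchTo(oldcontext);
		CurrentResourceOwner = oldowner;
	}
	PG_CATCH();
	{
		/* failure: abort the subxact; a real PL would save the error data
		 * with CopyErrorData() before flushing it */
		MemoryContextSwitchTo(oldcontext);
		RollbackAndReleaseCurrentSubTransaction();
		CurrentResourceOwner = oldowner;
		FlushErrorState();
		ok = false;
	}
	PG_END_TRY();

	return ok;
}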

PL/Java also strictly enforces that such a subxact set within a Java
function must be released or rolled back by the time that function
returns.

The reasoning there is less obvious to me; my intuition would have been
that a subtransaction could remain in play for the life of its containing
transaction, which could have been started outside of this Java function;
by holding a reference to the JDBC Savepoint object, a later Java function
called in the same transaction could release it or roll it back.

But I am beginning to suspect that the limitation may be essential, given
the comments in xact.c around StartSubTransaction and how its effects would
get clobbered on exit from a Portal, so a subxact started by an actual
SAVEPOINT is started in two steps, the later one after the Portal has
exited. By contrast, within a function (being executed inside a Portal?),
I have to use BeginInternalSubTransaction, which combines the multiple steps
into one, but whose effects wouldn't survive the exit of the Portal.

Have I reverse-engineered this reasoning correctly? If so, I'll add some
comments about it in the PL/Java code where somebody may be thankful for
them later.

Or, if it turns out the limitation isn't so inescapable, and could be
relaxed to allow a subxact lifetime longer than the single function that
starts it, I could look into doing that.

Thanks!
-Chap



Re: seems like a bug in pgbench -R

2019-03-15 Thread Tomas Vondra
On 3/15/19 5:16 PM, Fabien COELHO wrote:
> 
>>> echo 'select 1' > select.sql
>>>
>>> while /bin/true; do
>>>  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
>>>  date;
>>> done;
>>
>> Indeed. I'll look at it over the weekend.
>>
>>> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
>>> one of the other commits touching this part of the code.
> 
> I could not reproduce this issue on head, but I confirm on 11.2.
> 

AFAICS on head it's fixed by 3bac77c48f166b9024a5ead984df73347466ae12

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?

2019-03-15 Thread Yun Li
Hey pg developers,

Do you think we can add queryId into the pg_stat_get_activity function
and ultimately expose it in the view? It would make it easier to track
"similar" queries' performance over time.

Thanks a lot!
Yun


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-03-15 Thread Robert Haas
On Fri, Mar 15, 2019 at 5:10 PM Tom Lane  wrote:
> ISTM that this is only a problem if you choose the wrong encryption
> method.  One not-wrong encryption method is to use a stream cipher
> --- maybe that's not the exact right technical term, but anyway
> I'm talking about a method which notionally XOR's the cleartext
> data with a random bit stream generated from the encryption key
> (probably along with other knowable inputs such as the block number).
> In such a method, corruption of individual on-disk bytes doesn't
> prevent you from getting the correct decryption of on-disk bytes
> that aren't corrupted.

Oh, that seems like it might be a really good idea.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Robert Haas
On Fri, Mar 15, 2019 at 3:45 PM Tom Lane  wrote:
> More to the point, we turned *one* rebuild = false situation into
> a bunch of rebuild = true situations.  I haven't studied it closely,
> but I think a CCA animal would probably see O(N^2) rebuild = true
> invocations in a query with N partitions, since each time
> expand_partitioned_rtentry opens another child table, we'll incur
> an AcceptInvalidationMessages call which leads to forced rebuilds
> of the previously-pinned rels.  In non-CCA situations, almost always
> nothing happens with the previously-examined relcache entries.

That's rather unfortunate.  I start to think that clobbering the cache
"always" is too hard a line.

> I agree that copying data isn't great.  What I don't agree is that this
> solution is better.  In particular, copying data out of the relcache
> rather than expecting the relcache to hold still over long periods
> is the way we've done things everywhere else, cf RelationGetIndexList,
> RelationGetStatExtList, RelationGetIndexExpressions,
> RelationGetIndexPredicate, RelationGetIndexAttrBitmap,
> RelationGetExclusionInfo, GetRelationPublicationActions.  I don't care
> for a patch randomly deciding to do things differently on the basis of an
> unsupported-by-evidence argument that it might cost too much to copy the
> data.  If we're going to make a push to reduce the amount of copying of
> that sort that we do, it should be a separately (and carefully) designed
> thing that applies to all the relcache substructures that have the issue,
> not one-off hacks that haven't been reviewed thoroughly.

That's not an unreasonable argument.  On the other hand, if you never
try new stuff, you lose opportunities to explore things that might
turn out to be better and worth adopting more widely.

I am not very convinced that it makes sense to lump something like
RelationGetIndexAttrBitmap in with something like
RelationGetPartitionDesc.  RelationGetIndexAttrBitmap is returning a
single Bitmapset, whereas the data structure RelationGetPartitionDesc
is vastly larger and more complex.  You can't say "well, if it's best
to copy 32 bytes of data out of the relcache every time we need it, it
must also be right to copy 10k or 100k of data out of the relcache
every time we need it."

There is another difference as well: there's a good chance that
somebody is going to want to mutate a Bitmapset, whereas they had
BETTER NOT think that they can mutate the PartitionDesc.  So returning
an uncopied Bitmapset is kinda risky in a way that returning an
uncopied PartitionDesc is not.

If we want an at-least-somewhat unified solution here, I think we need
to bite the bullet and make the planner hold a reference to the
relcache throughout planning.  (The executor already keeps it open, I
believe.) Otherwise, how is the relcache supposed to know when it can
throw stuff away and when it can't?  The only alternative seems to be
to have each subsystem hold its own reference count, as I did with the
PartitionDirectory stuff, which is not something we'd want to scale
out.

> I especially don't care for the much-less-than-half-baked kluge of
> chaining the old rd_pdcxt onto the new one and hoping that it will go away
> at a suitable time.

It will go away at a suitable time, but maybe not at the soonest
suitable time.  It wouldn't be hard to improve that, though.  The
easiest thing to do, I think, would be to have an rd_oldpdcxt and
stuff the old context there.  If there already is one there, parent
the new one under it.  When RelationDecrementReferenceCount reduces
the reference count to zero, destroy anything found in rd_oldpdcxt.
With that change, things get destroyed at the earliest time at which
we know the old things aren't referenced, instead of the earliest time
at which they are not referenced + an invalidation arrives.
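
In rough pseudo-C, with rd_oldpdcxt being the hypothetical new field and
new_pdcxt the freshly built context (completely untested):

    /* when installing a rebuilt PartitionDesc, instead of chaining the
     * old rd_pdcxt onto the new one: */
    if (relation->rd_pdcxt != NULL)
    {
        if (relation->rd_oldpdcxt != NULL)
            MemoryContextSetParent(relation->rd_pdcxt,
                                   relation->rd_oldpdcxt);
        else
            relation->rd_oldpdcxt = relation->rd_pdcxt;
    }
    relation->rd_pdcxt = new_pdcxt;

    /* and in RelationDecrementReferenceCount, once the count hits zero: */
    if (relation->rd_refcnt == 0 && relation->rd_oldpdcxt != NULL)
    {
        MemoryContextDelete(relation->rd_oldpdcxt);
        relation->rd_oldpdcxt = NULL;
    }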

> regression=# create table parent (a text, b int) partition by list (a);
> CREATE TABLE
> regression=# create table child (a text, b int);
> CREATE TABLE
> regression=# do $$
> regression$# begin
> regression$# for i in 1..1000 loop
> regression$# alter table parent attach partition child for values in ('x');
> regression$# alter table parent detach partition child;
> regression$# end loop;
> regression$# end $$;

Interesting example.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: REINDEX CONCURRENTLY 2.0

2019-03-15 Thread Michael Banck
Hi,

Am Mittwoch, den 13.03.2019, 23:10 +0100 schrieb Peter Eisentraut:
> Here is an updated patch.

I had a quick look at some of the comments and noticed some possible
nitpicky-level problems:

> +/*
> + * index_concurrently_set_dead
> + *
> + * Perform the last invalidation stage of DROP INDEX CONCURRENTLY or REINDEX
> + * CONCURRENTLY before actually dropping the index. After calling this
> + * function the index is seen by all the backends as dead. Low-level locks
> + * taken here are kept until the end of the transaction doing calling this
> + * function.
> + */

"the transaction doing calling this function." sounds wrong to me.

> +  * Extract the list of indexes that are going to be rebuilt based on the
> +  * list of relation Oids given by caller. For each element in given 
> list,
> +  * if the relkind of given relation Oid is a table, all its valid 
> indexes
> +  * will be rebuilt, including its associated toast table indexes. If
> +  * relkind is an index, this index itself will be rebuilt. The locks 
> taken
> +  * on parent relations and involved indexes are kept until this
> +  * transaction is committed to protect against schema changes that might
> +  * occur until the session lock is taken on each relation, session lock
> +  * used to similarly protect from any schema change that could happen
> +  * within the multiple transactions that are used during this process.
> +  */

I think the last sentence in the above should be split up into several
sentences, maybe at "session lock used..."? Or maybe it should just say
"a session lock is used" instead?

> + else
> + {
> + /*
> +  * Save the list of 
> relation OIDs in private
> +  * context
> +  */

Missing full stop at end of comment.

> + /* Definetely no indexes, so leave */

s/Definetely/Definitely/.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-03-15 Thread Tom Lane
Robert Haas  writes:
> If the WAL *is* encrypted, the state at this point is that the block
> is unreadable, because the first 4kB of the block is the first half of
> the bits that resulted from encrypting 8kB of data that includes the
> new record, and the second 4kB of the block is the second half of the
> bits that resulted from encrypting 8kB of data that did not include
> the new record, and since the new record perturbed every bit on the
> page with probability ~50%, what that means is you now have garbage.
> That means that not only did we lose the new record, but we also lost
> the 3.5kB of good data which the page previously contained.  That's
> NOT ok.  Some of the changes associated with those WAL records may
> have been flushed to disk already, or there may be commits in there
> that were acknowledged to the client, and we can't just lose them.

ISTM that this is only a problem if you choose the wrong encryption
method.  One not-wrong encryption method is to use a stream cipher
--- maybe that's not the exact right technical term, but anyway
I'm talking about a method which notionally XOR's the cleartext
data with a random bit stream generated from the encryption key
(probably along with other knowable inputs such as the block number).
In such a method, corruption of individual on-disk bytes doesn't
prevent you from getting the correct decryption of on-disk bytes
that aren't corrupted.
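
To illustrate the property (a toy sketch only --- not a real cipher, and
how the keystream gets generated is entirely hand-waved here):

static void
xor_with_keystream(unsigned char *page, const unsigned char *keystream,
				   size_t len)
{
	/* XOR with the keystream; the same operation encrypts and decrypts */
	for (size_t i = 0; i < len; i++)
		page[i] ^= keystream[i];
}

Since byte i of the ciphertext depends only on byte i of the cleartext
and byte i of the keystream, corrupting some on-disk bytes garbles only
those same bytes after decryption; the rest still decrypts correctly.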

regards, tom lane



Re: [HACKERS] is there a deep unyielding reason to limit U&'' literals to ASCII?

2019-03-15 Thread Chapman Flack
On 1/25/16 12:52 PM, Tom Lane wrote:
> Robert Haas  writes:
>> On Sat, Jan 23, 2016 at 11:27 PM, Chapman Flack  
>> wrote:
>>> What I would have expected would be to allow escapes
>>> for any Unicode codepoint that's representable in the server encoding,
>>> whatever encoding that is.
> 
>> I don't know anything for sure here, but I wonder if it would make
>> validating string literals in non-UTF8 encodings significant more
>> costly.
> 
> I think it would, and it would likely also require function calls to
> loadable functions (at least given the current design whereby encoding
> conversions are farmed out to loadable libraries).  I do not especially
> want the lexer doing that; it will open all sorts of fun questions
> involving what we can lex in an already-failed transaction.

How outlandish would it be (not for v12, obviously!) to decree that
the lexer produces UTF-8 representations of string and identifier
literals unconditionally, and in some later stage of processing
the parse tree, those get munged to the server encoding if different?

That would keep the lexer simple, and I think it's in principle
the 'correct' view if there is such a thing; choice of encoding doesn't
change what counts as valid lexical form for a U&'...' or U&"..."
literal, but only whether a literal thus created might happen to fit
in your encoding.

If it doesn't, I think that's technically a data error (22021)
rather than one of syntax or lexical form.

> It may well be that these issues are surmountable with some sweat,
> but it doesn't sound like an easy patch to me.  And how big is the
> use-case, really?

Hmm, other than the benefit of not having to explain why it /doesn't/
work?

one could imagine a tool generating SQL output that'll be saved and
run in a database through client or server encodings not known in
advance, adopting a simple strategy of producing only 7-bit ASCII
output and using U& literals for whatever ain't ASCII ... that would
be, in principle, about the most bulletproof way for such a tool
to work, but it's exactly what won't work in PostgreSQL unless the
encoding is UTF-8 (which is the one scenario where there's no need
for such machinations, as the literals could appear directly!).

I'm a maintainer of one such SQL-generating tool, so I know the set
of use cases would have at least one element, if only it would work.

Regards,
-Chap



Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Tom Lane
Robert Haas  writes:
> On Fri, Mar 15, 2019 at 2:18 PM Tom Lane  wrote:
>> BTW, after closer study of 898e5e329 I have a theory as to why it
>> made things worse for CCA animals: it causes relcache entries to be
>> held open (via incremented refcounts) throughout planning, which
>> they were not before.  This means that, during each of the many
>> RelationCacheInvalidate events that will occur during a planning
>> cycle, we have to rebuild those relcache entries; previously they
>> were just dropped once and that was the end of it till execution.

> Right.  So it's just that we turned a bunch of rebuild = false
> situations into rebuild = true situations.  I think.

More to the point, we turned *one* rebuild = false situation into
a bunch of rebuild = true situations.  I haven't studied it closely,
but I think a CCA animal would probably see O(N^2) rebuild = true
invocations in a query with N partitions, since each time
expand_partitioned_rtentry opens another child table, we'll incur
an AcceptInvalidationMessages call which leads to forced rebuilds
of the previously-pinned rels.  In non-CCA situations, almost always
nothing happens with the previously-examined relcache entries.

>> I remain of the opinion that we ought not have PartitionDirectories
>> pinning relcache entries ... but this particular effect isn't really
>> significant to that argument, I think.

> Cool.  I would still like to hear more about why you think that.  I
> agree that the pinning is not great, but copying moderately large
> partitioning structures that are only likely to get more complex in
> future releases also does not seem great, so I think it may be the
> least of evils.  You, on the other hand, seem to have a rather
> visceral hatred for it.

I agree that copying data isn't great.  What I don't agree is that this
solution is better.  In particular, copying data out of the relcache
rather than expecting the relcache to hold still over long periods
is the way we've done things everywhere else, cf RelationGetIndexList,
RelationGetStatExtList, RelationGetIndexExpressions,
RelationGetIndexPredicate, RelationGetIndexAttrBitmap,
RelationGetExclusionInfo, GetRelationPublicationActions.  I don't care
for a patch randomly deciding to do things differently on the basis of an
unsupported-by-evidence argument that it might cost too much to copy the
data.  If we're going to make a push to reduce the amount of copying of
that sort that we do, it should be a separately (and carefully) designed
thing that applies to all the relcache substructures that have the issue,
not one-off hacks that haven't been reviewed thoroughly.

I especially don't care for the much-less-than-half-baked kluge of
chaining the old rd_pdcxt onto the new one and hoping that it will go away
at a suitable time.  Yeah, that will work all right as long as it's not
stressed very hard, but we've found repeatedly that users will find a way
to overstress places where we're sloppy about resource management.
In this case, since the leak is potentially of session lifespan, I doubt
it's even very hard to cause unbounded growth in relcache space usage ---
you just have to do $whatever over and over.  Let's see ...

regression=# create table parent (a text, b int) partition by list (a);
CREATE TABLE
regression=# create table child (a text, b int);
CREATE TABLE
regression=# do $$
regression$# begin
regression$# for i in 1..1000 loop
regression$# alter table parent attach partition child for values in ('x');
regression$# alter table parent detach partition child;
regression$# end loop;
regression$# end $$;

After letting that run for awhile, I've got enough rd_pdcxts to send
MemoryContextStats into a nasty loop.

  CacheMemoryContext: 4259904 total in 11 blocks; 1501120 free (19 chunks); 2758784 used
    partition descriptor: 1024 total in 1 blocks; 400 free (0 chunks); 624 used: parent
      partition descriptor: 1024 total in 1 blocks; 680 free (0 chunks); 344 used: parent
        partition descriptor: 1024 total in 1 blocks; 400 free (0 chunks); 624 used: parent
          partition descriptor: 1024 total in 1 blocks; 680 free (0 chunks); 344 used: parent
            partition descriptor: 1024 total in 1 blocks; 400 free (0 chunks); 624 used: parent
              partition descriptor: 1024 total in 1 blocks; 680 free (0 chunks); 344 used: parent
                partition descriptor: 1024 total in 1 blocks; 400 free (0 chunks); 624 used: parent
                  partition descriptor: 1024 total in 1 blocks; 680 free (0 chunks); 344 used: parent
                    partition descriptor: 1024 total in 1 blocks; 400 free (0 chunks); 624 used: parent
                      partition descriptor: 1024 total in 1 blocks; 680 free (0 chunks); 344 used: parent
                        partition descriptor: 1024 total in 1 blocks; 400 free (0 chunks); 624 used: parent
                          partition descriptor: 1024 total in 1 blocks; 680 free (0 chunks); 344 used: parent
  ...

Re: GiST VACUUM

2019-03-15 Thread Jeff Janes
On Tue, Mar 5, 2019 at 8:21 AM Heikki Linnakangas  wrote:

> On 05/03/2019 02:26, Andrey Borodin wrote:
> >> I also tried your amcheck tool with this. It did not report any
> >> errors.
> >>
> >> Attached is also latest version of the patch itself. It is the
> >> same as your latest patch v19, except for some tiny comment
> >> kibitzing. I'll mark this as Ready for Committer in the commitfest
> >> app, and will try to commit it in the next couple of days.
> >
> > That's cool! I'll work on 2nd step of these patchset to make
> > blockset data structure prettier and less hacky.
>
> Committed the first patch. Thanks for the patch!
>

Thank you.  This is a transformational change; it will allow GiST indexes
larger than RAM to be used in some cases where they were simply not
feasible to use before.  On a HDD, it resulted in a 50 fold improvement in
vacuum time, and the machine went from unusably unresponsive to merely
sluggish during the vacuum.  On a SSD (albeit a very cheap laptop one, and
exposed from Windows host to Ubuntu over VM Virtual Box) it is still a 30
fold improvement, from a far faster baseline.  Even on an AWS instance with
a "GP2" SSD volume, which normally shows little benefit from sequential
reads, I get a 3 fold speed up.

I also ran this through a lot of crash-recovery testing using simulated
torn-page writes using my traditional testing harness with high concurrency
(AWS c4.4xlarge and a1.4xlarge using 32 concurrent update processes) and
did not encounter any problems.  I tested both with btree_gist on a scalar
int, and on tsvector with each tsvector having 101 tokens.

I did notice that the space freed up in the index by vacuum doesn't seem to
get re-used very efficiently, but that is an ancestral problem independent
of this change.

Cheers,

Jeff


Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Robert Haas
On Fri, Mar 15, 2019 at 2:18 PM Tom Lane  wrote:
> BTW, after closer study of 898e5e329 I have a theory as to why it
> made things worse for CCA animals: it causes relcache entries to be
> held open (via incremented refcounts) throughout planning, which
> they were not before.  This means that, during each of the many
> RelationCacheInvalidate events that will occur during a planning
> cycle, we have to rebuild those relcache entries; previously they
> were just dropped once and that was the end of it till execution.

That makes sense.  Thanks for looking at it, and for the insight.

> You noted correctly that the existing PartitionDesc structure would
> generally get preserved during each reload --- but that has exactly
> zip to do with the amount of transient space consumed by execution
> of RelationBuildDesc.  We still build a transient PartitionDesc
> that we then elect not to install.  And besides that there's all the
> not-partitioning-related stuff that RelationBuildDesc can leak.

Right.  So it's just that we turned a bunch of rebuild = false
situations into rebuild = true situations.  I think.

> This is probably largely irrelevant for non-CCA situations, since
> we don't expect forced relcache invals to occur often otherwise.
> But it seems to fit the facts, particularly that hyrax is only dying
> on queries that involve moderately large partitioning structures
> and hence quite a few relations pinned by PartitionDirectory entries.
>
> I remain of the opinion that we ought not have PartitionDirectories
> pinning relcache entries ... but this particular effect isn't really
> significant to that argument, I think.

Cool.  I would still like to hear more about why you think that.  I
agree that the pinning is not great, but copying moderately large
partitioning structures that are only likely to get more complex in
future releases also does not seem great, so I think it may be the
least of evils.  You, on the other hand, seem to have a rather
visceral hatred for it.  That may be based on facts with which I am
not acquainted or it may be that we have a different view of what good
design looks like.  If it's the latter, we may just have to agree to
disagree and see if other people want to opine, but if it's the
former, I'd like to know those facts.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: GiST VACUUM

2019-03-15 Thread Andrey Borodin
Hi!

> On 11 March 2019, at 20:03, Heikki Linnakangas wrote:
> 
> On 10/03/2019 18:40, Andrey Borodin wrote:
>> Here's new version of the patch for actual page deletion.
>> Changes:
>> 1. Fixed possible concurrency issue:
>> We were locking child page while holding lock on internal page. Notes
>> in GiST README recommend locking child before parent. Thus now we
>> unlock internal page before locking children for deletion, and lock
>> it again, check that everything is still on it's place and delete
>> only then.
> 
> Good catch. The implementation is a bit broken, though. This segfaults:
Sorry for the noise. A few lines of old code leaked into the patch between
testing and mailing.

> BTW, we don't seem to have test coverage for this in the regression tests. 
> The tests that delete some rows, where you updated the comment, doesn't 
> actually seem to produce any empty pages that could be deleted.
I've updated the test to delete more rows. But it did not trigger the previous
bug anyway; we had to delete everything for this case.

> 
>> One thing still bothers me. Let's assume that we have internal page
>> with 2 deletable leaves. We lock these leaves in order of items on
>> internal page. Is it possible that 2nd page have follow-right link on
>> 1st and someone will lock 2nd page, try to lock 1st and deadlock with
>> VACUUM?
> 
> Hmm. If the follow-right flag is set on a page, it means that its right 
> sibling doesn't have a downlink in the parent yet. Nevertheless, I think I'd 
> sleep better, if we acquired the locks in left-to-right order, to be safe.
Actually, I did not find lock coupling in the GiST code. But I decided to lock
just two pages at once (leaf, then parent, for every pair). PFA v22 with this
concurrency logic.

> 
> An easy cop-out would be to use LWLockConditionalAcquire, and bail out if we 
> can't get the lock. But if it's not too complicated, it'd be nice to acquire 
> the locks in the correct order to begin with.
> 
> I did a round of cleanup and moving things around, before I bumped into the 
> above issue. I'm including them here as separate patches, for easier review, 
> but they should of course be squashed into yours. For convenience, I'm 
> including your patches here as well, so that all the patches you need are in 
> the same place, but they are identical to what you posted.
I've rebased all these patches so that VACUUM should work on every commit.
Let's just squash them for the next iteration; it was quite a messy rebase.
BTW, you deleted numEmptyPages; this makes the code cleaner, but that variable
could stop gistvacuum_recycle_pages() early once everything is already recycled.


Thanks!

Best regards, Andrey Borodin.



0001-Add-block-set-data-structure-v22.patch
Description: Binary data


0002-Delete-pages-during-GiST-VACUUM-v22.patch
Description: Binary data


0003-Minor-cleanup-v22.patch
Description: Binary data


0004-Move-the-page-deletion-logic-to-separate-function-v2.patch
Description: Binary data


0005-Remove-numEmptyPages-.-it-s-not-really-needed-v22.patch
Description: Binary data


0006-Misc-cleanup-v22.patch
Description: Binary data


Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Pavel Stehule
pá 15. 3. 2019 v 18:30 odesílatel Chapman Flack 
napsal:

> On 3/15/19 12:59 PM, Pavel Stehule wrote:
> > for this proposal "char" != byte
> >
> > result[n] = substring(str FROM n FOR 1)
>
> I think that's what string_to_array(..., null) already does:
>

Sure. My proposal is more or less just a shorthand that omits the null parameter.



> SHOW server_encoding;
> server_encoding
> UTF8
>
> WITH
>  t0(s) AS (SELECT text 'verlorn ist daz slüzzelîn'),
>  t1(a) AS (SELECT string_to_array(s, null) FROM t0)
> SELECT
>  char_length(s), octet_length(convert_to(s, 'UTF8')),
>  array_length(a,1), a
> FROM
>  t0, t1;
>
> char_length|octet_length|array_length|a
> 25|27|25|{v,e,r,l,o,r,n," ",i,s,t," ",d,a,z," ",s,l,ü,z,z,e,l,î,n}
>
>
> Regards,
> -Chap
>


Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 14, 2019 at 12:36 PM Tom Lane  wrote:
>> Hmm, I wonder why not.  I suppose the answer is that
>> the leak is worse in HEAD than before, but how come?

> I'd like to know that, too, but right now I don't.

BTW, after closer study of 898e5e329 I have a theory as to why it
made things worse for CCA animals: it causes relcache entries to be
held open (via incremented refcounts) throughout planning, which
they were not before.  This means that, during each of the many
RelationCacheInvalidate events that will occur during a planning
cycle, we have to rebuild those relcache entries; previously they
were just dropped once and that was the end of it till execution.

You noted correctly that the existing PartitionDesc structure would
generally get preserved during each reload --- but that has exactly
zip to do with the amount of transient space consumed by execution
of RelationBuildDesc.  We still build a transient PartitionDesc
that we then elect not to install.  And besides that there's all the
not-partitioning-related stuff that RelationBuildDesc can leak.

This is probably largely irrelevant for non-CCA situations, since
we don't expect forced relcache invals to occur often otherwise.
But it seems to fit the facts, particularly that hyrax is only dying
on queries that involve moderately large partitioning structures
and hence quite a few relations pinned by PartitionDirectory entries.

I remain of the opinion that we ought not have PartitionDirectories
pinning relcache entries ... but this particular effect isn't really
significant to that argument, I think.

regards, tom lane



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-15 Thread Robert Haas
On Thu, Mar 14, 2019 at 7:50 AM Kyotaro HORIGUCHI
 wrote:
> Thank you for the kind explanation. I'm not sure but I understand
> this as '"lists" is an extension' turned into 'lists are an
> extension'. That is, the "lists" expresses a concept rather than
> the plurality. (But I haven't got a gut feeling..)

The idea that it expresses a concept rather than the plurality is
exactly right -- so apparently you DO have a gut feeling!

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Chapman Flack
On 3/15/19 12:59 PM, Pavel Stehule wrote:
> for this proposal "char" != byte
> 
> result[n] = substring(str FROM n FOR 1)

I think that's what string_to_array(..., null) already does:

SHOW server_encoding;
server_encoding
UTF8

WITH
 t0(s) AS (SELECT text 'verlorn ist daz slüzzelîn'),
 t1(a) AS (SELECT string_to_array(s, null) FROM t0)
SELECT
 char_length(s), octet_length(convert_to(s, 'UTF8')),
 array_length(a,1), a
FROM
 t0, t1;

char_length|octet_length|array_length|a
25|27|25|{v,e,r,l,o,r,n," ",i,s,t," ",d,a,z," ",s,l,ü,z,z,e,l,î,n}


Regards,
-Chap



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Pavel Stehule
pá 15. 3. 2019 v 17:54 odesílatel Chapman Flack 
napsal:

> On 3/15/19 12:26 PM, Pavel Stehule wrote:
> > you use string_to_array function without separator, then only one
> possible
> > semantic is there - separation by chars.
>
> Other languages can and do specify other semantics for the
> separator-omitted case: often (as in Python) it means to split
> around "runs of one or more characters the platform considers white
> space", as a convenience, given that it's a fairly commonly wanted
> meaning but can be tedious to spell out as an explicit separator.
>

for this proposal "char" != byte

result[n] = substring(str FROM n FOR 1)


> I admit I think a separator of '' would be more clear than null,
> so if I were designing string_to_array in a green field, I think
> I would swap the meanings of null and '' as the delimiter: null
> would mean "don't really split anything", and '' would mean "split
> everywhere you can find '' in the string", that is, everywhere.
>
> But the current behavior is already established
>

yes

Pavel

>
> Regards,
> -Chap
>


Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Chapman Flack
On 3/15/19 12:26 PM, Pavel Stehule wrote:
> you use string_to_array function without separator, then only one possible
> semantic is there - separation by chars.

Other languages can and do specify other semantics for the
separator-omitted case: often (as in Python) it means to split
around "runs of one or more characters the platform considers white
space", as a convenience, given that it's a fairly commonly wanted
meaning but can be tedious to spell out as an explicit separator.

I admit I think a separator of '' would be more clear than null,
so if I were designing string_to_array in a green field, I think
I would swap the meanings of null and '' as the delimiter: null
would mean "don't really split anything", and '' would mean "split
everywhere you can find '' in the string", that is, everywhere.

But the current behavior is already established

Regards,
-Chap



Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

2019-03-15 Thread Robert Haas
On Thu, Mar 14, 2019 at 8:30 AM Masahiko Sawada  wrote:
> So I think there is no such insider thread problem, right?

No, I think that's a really bad assumption.  If there's no insider
threat, then what exactly ARE you guarding against?  If somebody
steals the disk, you don't need PostgreSQL to do anything; just
encrypt the filesystem and call it good.  Encryption within the
database can only provide any value when someone already has some
degree of access to the system, and we want to prevent them from
getting more access.  Maybe they are legitimately allowed some access
and we want to keep them from exceeding those bounds, or maybe they
hacked something else to gain limited access and we want to keep them
from parleying that into more access.  Either way, they are to some
extent an insider.

And that is why I think your idea that it's OK to encrypt the WAL with
key A while the tables are meanwhile being encrypted with keys B1, B2,
..., Bn is unsound.  In such an environment, anybody who has key A
does not really need to bother compromising the other keys; they
basically get everything anyway.  There is therefore little point in
having the complexity of multiple keys; just use key A for everything
and be done with it.  To justify the complexity of multiple keys, they
need to be independent of each other from a security perspective; that
is, it must be that all copies of any given piece of data are
encrypted with the same key, and you can therefore get that data only
if you get that key.  And that means, I believe, that fine-grained
encryption must use the same key to encrypt the WAL for table T1 --
and the indexes and TOAST table for table T1 and the WAL for those --
that it uses to encrypt table T1 itself.

If we can't make that happen, then I really don't see any advantage in
supporting multiple keys.  Nobody steals the key to one office if they
can, for the same effort, steal the master key for the whole building.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Ordered Partitioned Table Scans

2019-03-15 Thread David Rowley
On Sat, 16 Mar 2019 at 04:22, David Rowley  wrote:
> I've attached an updated patch which fixes the conflict with 0a9d7e1f6d8

... and here's the one that I should have sent. (renamed to v12 to
prevent confusion)

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v12-0001-Allow-Append-to-be-used-in-place-of-MergeAppend-.patch
Description: Binary data


Re: GTIN14 support for contrib/isn

2019-03-15 Thread Michael Kefeder



On 15.03.19 at 17:27, Tom Lane wrote:

> Michael Kefeder  writes:
>> For a project of ours we need GTIN14 data type support.
>
> Hm, what is that and where would a reviewer find the specification for it?

The specs are from GS1, here: https://www.gs1.org/standards/id-keys/gtin
Side note: EAN13 is actually called GTIN-13 now. Wikipedia has a quick
overview: https://en.wikipedia.org/wiki/Global_Trade_Item_Number



>> Looking at the code I saw every format that isn-extension supports is
>> stored as an EAN13. Theoretically that can be changed to be GTIN14, but
>> that would mean quite a lot of rewrite I feared, so I chose to code only
>> GTIN14 I/O separetely to not interfere with any existing conversion
>> magic. This yields an easier to understand patch and doesn't touch
>> existing functionality. However it introduces redundancy to a certain
>> extent.
>
> Yeah, you certainly don't get to change the on-disk format of the existing
> types, unfortunately.  Not sure what the least messy way of dealing with
> that is.  I guess we do want this to be part of contrib/isn rather than
> an independent module, if there are sane datatype conversions with the
> existing isn types.

The on-disk format does not change (it would support even longer codes;
it's just an integer where one bit is used as a valid/invalid flag, and I
did not touch that at all). Putting GTIN14 in isn makes sense, I find, and
is backward/forward compatible.



>> Find my patch attached. Please let me know if there are things that need
>> changes, I'll do my best to get GTIN support into postgresql.
>
> Well, two comments immediately:
>
> * where's the documentation changes?
>
> * simply editing the .sql file in-place is not acceptable; that breaks
> the versioning conventions for extensions, and leaves users with no
> easy upgrade path.  What you need to do is create a version upgrade
> script that adds the new objects.  For examples look for other recent
> patches that have added features to contrib modules, eg
>
> https://git.postgresql.org/gitweb/?p=postgresql.git=commitdiff=eb6f29141bed9dc95cb473614c30f470ef980705
>
> Also, I'm afraid you've pretty much missed the deadline to get this
> into PG v12; we've already got more timely-submitted patches than
> we're likely to be able to finish reviewing.  Please add it to the
> first v13 commit fest,
>
> https://commitfest.postgresql.org/23/
>
> so that we don't forget about it when the time does come to look at it.
>
> regards, tom lane

Thanks for the feedback! I will make the mentioned documentation changes
and create a separate upgrade SQL file. Making it into v13 is fine by me.


br
 mike



Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2019-03-15 Thread Andres Freund
On 2019-03-15 21:58:25 +0900, Etsuro Fujita wrote:
> (2019/03/04 12:10), Etsuro Fujita wrote:
> > (2019/03/02 3:57), Andres Freund wrote:
> > > FWIW, I pushed the EPQ patch, doing this conversion blindly. It'd be
> > > awesome if you'd check that it actually works...
> > 
> > I'll start the work later this week. I think I can post an (initial)
> > report on that next week, maybe.
> 
> Here is an updated version of the patch [1].  This version doesn't allow
> pushing down joins to the remote if there is a possibility that EPQ will be
> executed, but I think it would be useful to test the EPQ patch.  I haven't
> looked into the EPQ patch in detail yet, but I tested the patch with the
> attached, and couldn't find any issues on the patch.

Thanks a lot for that work!



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Chapman Flack
On 3/15/19 12:15 PM, Tom Lane wrote:
> Chapman Flack  writes:
>> So the proposal seems roughly equivalent to making string_to_array's
>> second parameter optional default null, and array_to_string's second
>> parameter optional default ''.
> 
> In that case why bother?  It'll just create a cross-version compatibility
> hazard for next-to-no keystroke savings.  If the cases were so common
> that they could be argued to be sane "default" behavior, I might feel
> differently --- but if you were asked in a vacuum what the default
> delimiters ought to be, I don't think you'd say "no delimiter".

One could go further and argue that the non-optional arguments improve
clarity: a reader seeing the explicit NULL or '' argument gets a strong
clue what's intended, whereas in the optional-argument case they might end up
thinking "must go look up what this function's default delimiter is".

-Chap



Re: GTIN14 support for contrib/isn

2019-03-15 Thread Tom Lane
Michael Kefeder  writes:
> For a project of ours we need GTIN14 data type support.

Hm, what is that and where would a reviewer find the specification for it?

> Looking at the code I saw every format that isn-extension supports is 
> stored as an EAN13. Theoretically that can be changed to be GTIN14, but 
> that would mean quite a lot of rewrite I feared, so I chose to code only 
> GTIN14 I/O separetely to not interfere with any existing conversion 
> magic. This yields an easier to understand patch and doesn't touch 
> existing functionality. However it introduces redundancy to a certain 
> extent.

Yeah, you certainly don't get to change the on-disk format of the existing
types, unfortunately.  Not sure what the least messy way of dealing with
that is.  I guess we do want this to be part of contrib/isn rather than
an independent module, if there are sane datatype conversions with the
existing isn types.

> Find my patch attached. Please let me know if there are things that need 
> changes, I'll do my best to get GTIN support into postgresql.

Well, two comments immediately:

* where's the documentation changes?

* simply editing the .sql file in-place is not acceptable; that breaks
the versioning conventions for extensions, and leaves users with no
easy upgrade path.  What you need to do is create a version upgrade
script that adds the new objects.  For examples look for other recent
patches that have added features to contrib modules, eg

https://git.postgresql.org/gitweb/?p=postgresql.git=commitdiff=eb6f29141bed9dc95cb473614c30f470ef980705

Also, I'm afraid you've pretty much missed the deadline to get this
into PG v12; we've already got more timely-submitted patches than
we're likely to be able to finish reviewing.  Please add it to the
first v13 commit fest,

https://commitfest.postgresql.org/23/

so that we don't forget about it when the time does come to look at it.

regards, tom lane



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Pavel Stehule
pá 15. 3. 2019 v 17:16 odesílatel Tom Lane  napsal:

> Chapman Flack  writes:
> > So the proposal seems roughly equivalent to making string_to_array's
> > second parameter optional default null, and array_to_string's second
> > parameter optional default ''.
>
> In that case why bother?  It'll just create a cross-version compatibility
> hazard for next-to-no keystroke savings.  If the cases were so common
> that they could be argued to be sane "default" behavior, I might feel
> differently --- but if you were asked in a vacuum what the default
> delimiters ought to be, I don't think you'd say "no delimiter".
>

My motivation is the following: sometimes I need to convert a string to an
array of chars. Using NULL as the separator is possible, but it is not
intuitive. When you use the string_to_array function without a separator,
there is only one possible semantic: separation by chars.

I understand that there is a possible collision with interpreting the
missing parameter as a default value. But in this case that meaning, that
semantic, is not practical.

Regards

Pavel


> regards, tom lane
>


Inquiries

2019-03-15 Thread emmjadea
Hello, I am inquiring into the program Parallel access. How do I eile it from
my device/data completely?  I don't believe it is reliable... a broken laptop
might have had the hard drive taken by a "friend" and accessed my account
through this avenue. I am also wondering what you know about phone numbers...
I had +61 0487653571 - but just broke my phone. I will get a new number today
and need to change it for my verification apps.  Is there a way to make sure
this SIM number cannot be replicated? I realise all my phone numbers have
disappeared over the past are of everything and see that I am struggling - some
might be inhibiting the process. Hence why it has taken a year to get anywhere.
My new email I will use is: saint.ae.sur...@gmail.com after I phase out this
sight. Getting my device to flow has not been easy. If I thought that you
needed access to help that would be ok... it doesn't look like... New phone
number to be updated soon. Please help, Ae

Re: seems like a bug in pgbench -R

2019-03-15 Thread Fabien COELHO




>> echo 'select 1' > select.sql
>>
>> while /bin/true; do
>>  pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
>>  date;
>> done;

> Indeed. I'll look at it over the weekend.

>> So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
>> one of the other commits touching this part of the code.


I could not reproduce this issue on head, but I confirm on 11.2.

--
Fabien.



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Tom Lane
Chapman Flack  writes:
> So the proposal seems roughly equivalent to making string_to_array's
> second parameter optional default null, and array_to_string's second
> parameter optional default ''.

In that case why bother?  It'll just create a cross-version compatibility
hazard for next-to-no keystroke savings.  If the cases were so common
that they could be argued to be sane "default" behavior, I might feel
differently --- but if you were asked in a vacuum what the default
delimiters ought to be, I don't think you'd say "no delimiter".

regards, tom lane



Re: Feature: triggers on materialized views

2019-03-15 Thread Mitar
Hi!

On Fri, Mar 15, 2019 at 2:46 AM David Steele  wrote:
> The reason is that you have not gotten any committer support for this
> patch or attracted significant review, that I can see.  On the contrary,
> three committers have expressed doubts about all or some of the patch
> and it doesn't seem to me that their issues have been addressed.

To my understanding, many comments were about early versions of the patch
and were either addressed or answered by explaining why the proposed changes
would not work (using TRUNCATE/INSERT instead of swapping heaps is slower).
If I missed anything and have not addressed it, please point this out.

The only pending/unaddressed comment is about the philosophical
question of what it means to be a trigger. There it seems we simply
disagree with the reviewer, and I do not know how to address that. I
just see this as a very pragmatic feature which provides capabilities
you would have if you were not using the PostgreSQL abstraction. If you
can use those capabilities without the abstraction, why not also with it?

So in my view this looks more like a lack of review feedback on the
latest version of the patch. I really do not know how to ask for more
feedback or how to move the philosophical discussion further. I thought
that the commitfest is in fact exactly the place to motivate and collect
such feedback instead of waiting for it in limbo.

> What you need to be doing for PG13 is very specifically addressing
> committer concerns and gathering a consensus that the behavior of this
> patch is the way to go.

To my understanding the current patch addresses all concerns made by
reviewers on older versions of the patch, or explains why the proposals
cannot work out, modulo the question of "does this change what a trigger
is".

Moreover, it improves the performance of CONCURRENT REFRESH by about 5%
based on my tests, because of the split into INSERT/UPDATE/DELETE instead
of TRUNCATE/INSERT, when measuring across a mixed set of queries which
include just UPDATEs to source tables.

Thank you everyone who is involved in this process for your time; I do
appreciate it. I am just trying to explain that I am a bit at a loss about
the concrete next steps I could take here.

Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m



GTIN14 support for contrib/isn

2019-03-15 Thread Michael Kefeder

Hello hackers,

For a project of ours we need GTIN14 data type support. The isn 
extension already supports EAN13, and a '0' prefixed EAN13 is a valid 
GTIN14. The leftmost "new" 14th digit is a packaging level indicator 
which we need (= using EAN13 and faking a leading 0 in output doesn't 
cut it).


Looking at the code I saw that every format the isn extension supports is 
stored as an EAN13. Theoretically that could be changed to be GTIN14, but 
I feared that would mean quite a lot of rewriting, so I chose to code the 
GTIN14 I/O separately so as not to interfere with any existing conversion 
magic. This yields an easier-to-understand patch and doesn't touch 
existing functionality. However, it introduces redundancy to a certain 
extent.


Find my patch attached. Please let me know if there are things that need 
changes; I'll do my best to get GTIN support into PostgreSQL.


thanks in advance
 mike
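
For reviewers unfamiliar with GS1 codes, here is a small standalone
illustration (not part of the patch; the function name is mine) of the
check-digit rule a GTIN-14 uses. It is the usual GS1 mod-10 scheme, with
the weights shifted by one position compared to EAN-13 because of the
extra leading indicator digit:

#include <stdio.h>

/* Compute the GS1 mod-10 check digit from the first 13 digits of a GTIN-14. */
static int
gtin14_check_digit(const char *digits13)
{
	int			sum = 0;

	for (int i = 0; i < 13; i++)
	{
		int			weight = (i % 2 == 0) ? 3 : 1;	/* 3,1,3,1,... from the left */

		sum += (digits13[i] - '0') * weight;
	}
	return (10 - (sum % 10)) % 10;
}

int
main(void)
{
	/* prints 5, matching the 11234567890125 value used in the regression tests */
	printf("%d\n", gtin14_check_digit("1123456789012"));
	return 0;
}
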
diff --git a/expected/isn.out b/expected/isn.out
index 18fe37a..b313641 100644
--- a/expected/isn.out
+++ b/expected/isn.out
@@ -18,6 +18,8 @@ INFO:  operator family "isn_ops" of access method btree is 
missing cross-type op
 INFO:  operator family "isn_ops" of access method btree is missing cross-type 
operator(s)
 INFO:  operator family "isn_ops" of access method btree is missing cross-type 
operator(s)
 INFO:  operator family "isn_ops" of access method btree is missing cross-type 
operator(s)
+INFO:  operator family "isn_ops" of access method btree is missing cross-type 
operator(s)
+INFO:  operator family "isn_ops" of access method hash is missing cross-type 
operator(s)
 INFO:  operator family "isn_ops" of access method hash is missing cross-type 
operator(s)
 INFO:  operator family "isn_ops" of access method hash is missing cross-type 
operator(s)
 INFO:  operator family "isn_ops" of access method hash is missing cross-type 
operator(s)
@@ -29,6 +31,7 @@ INFO:  operator family "isn_ops" of access method hash is 
missing cross-type ope
  amname |  opcname   
 +
  btree  | ean13_ops
+ btree  | gtin14_ops
  btree  | isbn13_ops
  btree  | isbn_ops
  btree  | ismn13_ops
@@ -37,6 +40,7 @@ INFO:  operator family "isn_ops" of access method hash is 
missing cross-type ope
  btree  | issn_ops
  btree  | upc_ops
  hash   | ean13_ops
+ hash   | gtin14_ops
  hash   | isbn13_ops
  hash   | isbn_ops
  hash   | ismn13_ops
@@ -44,11 +48,20 @@ INFO:  operator family "isn_ops" of access method hash is 
missing cross-type ope
  hash   | issn13_ops
  hash   | issn_ops
  hash   | upc_ops
-(16 rows)
+(18 rows)
 
 --
 -- test valid conversions
 --
+SELECT '1234567890128'::GTIN14, -- EAN is a GTIN14
+   '123456789012?'::GTIN14, -- compute check digit for me
+   '11234567890125'::GTIN14,
+   '01234567890128'::GTIN14;
+ gtin14  | gtin14  | gtin14  | gtin14  
+-+-+-+-
+ 0123456789012-8 | 0123456789012-8 | 1123456789012-5 | 0123456789012-8
+(1 row)
+
 SELECT '9780123456786'::EAN13, -- old book
'9790123456785'::EAN13, -- music
'9791234567896'::EAN13, -- new book
@@ -249,6 +262,67 @@ SELECT 9780123456786::ISBN;
 ERROR:  cannot cast type bigint to isbn
 LINE 1: SELECT 9780123456786::ISBN;
 ^
+SELECT '91234567890125'::GTIN14; -- invalid indicator
+ERROR:  Indicator digit out of range for GTIN14 number: "91234567890125"
+LINE 1: SELECT '91234567890125'::GTIN14;
+   ^
+SELECT '123456789012'::GTIN14; -- too short
+ERROR:  invalid input syntax for GTIN14 number: "123456789012"
+LINE 1: SELECT '123456789012'::GTIN14;
+   ^
+SELECT '1234567890127'::GTIN14; -- wrong checkdigit
+ERROR:  invalid check digit for GTIN14 number: "1234567890127", should be 8
+LINE 1: SELECT '1234567890127'::GTIN14;
+   ^
+--
+-- test validity helpers
+--
+SELECT make_valid('1234567890120!'::GTIN14); -- EAN-13
+   make_valid
+-
+ 0123456789012-8
+(1 row)
+
+SELECT make_valid('11234567890120!'::GTIN14); -- GTIN-14
+   make_valid
+-
+ 1123456789012-5
+(1 row)
+
+SELECT is_valid(make_valid('1234567890120!'::GTIN14)); -- EAN-13
+ is_valid 
+--
+ t
+(1 row)
+
+SELECT is_valid(make_valid('11234567890120!'::GTIN14)); -- GTIN-14
+ is_valid 
+--
+ t
+(1 row)
+
+CREATE TABLE gtin_valid (gtin GTIN14 NOT NULL);
+INSERT INTO gtin_valid VALUES
+-- all invalid because of ! marking
+   ('1234567890120!'), -- invalid EAN-13
+   ('1234567890128!'), -- valid EAN-13
+   ('11234567890120!'), -- invalid GTIN-14
+   ('11234567890125!'), -- valid GTIN-14
+-- valid
+   ('1234567890128'::GTIN14), -- valid EAN-13
+   ('11234567890125'::GTIN14); -- valid GTIN-14
+SELECT gtin, is_valid(gtin) FROM gtin_valid;
+   gtin   | is_valid 
+--+--
+ 0123456789012-8! | f
+ 0123456789012-8! | f
+ 1123456789012-5! | f
+ 1123456789012-5! | f
+ 0123456789012-8  | t
+ 1123456789012-5  | t
+(6 rows)
+
+DROP 

Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Pavel Stehule
On Fri, 15 Mar 2019 at 16:59, Chapman Flack wrote:

> On 3/15/19 11:46 AM, Pavel Stehule wrote:
> > On Fri, 15 Mar 2019 at 15:03, David Fetter wrote:
> >> Whatever optimizations you have in mind for this, could they also work
> >> for string_to_array() and array_to_string() when they get an empty
> >> string handed to them?
> >
> > my idea is use string_to_array('AHOJ') --> {A,H,O,J}
> >
> > empty input means empty result --> {}
>
> I thought the question was maybe about an empty /delimiter/ string.
>
> It seems that string_to_array already has this behavior if NULL is
> passed as the delimiter:
>
> > select string_to_array('AHOJ', null);
>  string_to_array
> -
>  {A,H,O,J}
>
> and array_to_string has the proposed behavior if passed an
> empty string as the delimiter (as one would naturally expect)
> ... but not null for a delimiter (that just makes the result null).
>
> So the proposal seems roughly equivalent to making string_to_array's
> second parameter optional default null, and array_to_string's second
> parameter optional default ''.
>
> Does that sound right?
>

yes

Pavel


> Regards,
> -Chap
>


Re: Facing issue in using special characters

2019-03-15 Thread Gunther
This is not an issue for "hackers" nor "performance"; in fact, even for 
"general" it isn't really an issue.


"Special characters" is actually nonsense.

When people complain about "special characters" they haven't thought 
things through.


If you are unwilling to think things through and go step by step to make 
sure you know what you are doing, then you will not get it and really 
nobody can help you.


In my professional experience, people who complain about "special 
characters" need to be cut loose or be given a chance (if they are 
established employees who carry some weight). If a contractor complains 
about "special characters" they need to be fired.


Understand charsets -- character set, code point, and encoding. Then 
understand how encoding and string literals and "escape sequences" in 
string literals might work.


Know that UNICODE today is the one standard, and there is no more need 
to do code table switching. There is nothing special about a Hebrew alef 
or a Greek lower-case alpha or a Latin A. Nor a hyphen, an en-dash, or an 
em-dash. All these characters are in UNICODE. Yes, there are some 
Japanese who claim that they don't like that their Chinese character 
variants are lumped together with the simplified reformed Chinese forms. 
But that's a font issue, not a character code issue.


7 bit ASCII is the first page of UNICODE, even in the UTF-8 encoding.

ISO Latin 1, or the Windows-1252 variant of ISO Latin 1, has the same 
code points as UNICODE pages 0 and 1, but it is not compatible with the 
UTF-8 encoding because of the way UTF-8 uses the 8th bit.


But none of this is likely your problem.

Your problem is about string literals in SQL, for example, and about the 
configuration of your database (I always use initdb with --locale C and 
--encoding UTF-8). Use UTF-8 in the database. Then all your issues are 
about string literals in SQL and in Java and JSON and XML or whatever 
you are using.


You have to do the right thing. If you produce any representation, 
whether that is XML or JSON or SQL or URL query parameters, or a CSV 
file, or anything at all, you need to escape your string values properly.
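
For instance, with libpq in C the safest approach is not to splice values
into the SQL text at all and to pass them as out-of-line parameters instead;
a minimal sketch (the connection handle, table and column names are made up
for illustration):

#include <libpq-fe.h>

/* Insert a value that may contain quotes, backslashes, non-ASCII, etc. */
static void
insert_value(PGconn *conn, const char *user_supplied_text)
{
	const char *values[1] = {user_supplied_text};
	PGresult   *res;

	res = PQexecParams(conn,
					   "INSERT INTO t (col) VALUES ($1)",
					   1,			/* one parameter */
					   NULL,		/* let the server infer its type */
					   values,
					   NULL, NULL,	/* text-format parameters */
					   0);			/* text-format result */
	PQclear(res);
}

If you really must build a literal by hand, libpq also provides
PQescapeLiteral() for that; but out-of-line parameters avoid the whole
class of quoting problems.
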


This question with no detail didn't deserve such a thorough answer, but 
it's my soap box. I do not accept people complaining about "special 
characters". My own people get that same sermon from me when they make 
that mistake.


-Gunther

On 3/15/2019 1:19, M Tarkeshwar Rao wrote:


Hi all,

Facing an issue in using special characters. We are trying to insert 
records into a remote Postgres server and our application is not able to 
do this because of errors.


It seems that the issue is because of the special characters that have 
been used in one of the fields of a row.


Regards

Tarkeshwar



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Chapman Flack
On 3/15/19 11:46 AM, Pavel Stehule wrote:
> pá 15. 3. 2019 v 15:03 odesílatel David Fetter  napsal:
>> Whatever optimizations you have in mind for this, could they also work
>> for string_to_array() and array_to_string() when they get an empty
>> string handed to them?
> 
> my idea is use string_to_array('AHOJ') --> {A,H,O,J}
> 
> empty input means empty result --> {}

I thought the question was maybe about an empty /delimiter/ string.

It seems that string_to_array already has this behavior if NULL is
passed as the delimiter:

> select string_to_array('AHOJ', null);
 string_to_array
-
 {A,H,O,J}

and array_to_string has the proposed behavior if passed an
empty string as the delimiter (as one would naturally expect)
... but not null for a delimiter (that just makes the result null).

So the proposal seems roughly equivalent to making string_to_array's
second parameter optional default null, and array_to_string's second
parameter optional default ''.

Does that sound right?

Regards,
-Chap



Re: Ordered Partitioned Table Scans

2019-03-15 Thread Robert Haas
On Fri, Mar 8, 2019 at 3:15 PM Tom Lane  wrote:
> I took a quick look through this and I'm not very happy with it.
> It seems to me that the premise ought to be "just use an Append
> if we can prove the output would be ordered anyway", but that's not
> what we actually have here: instead you're adding more infrastructure
> onto Append, which notably involves invasive changes to the API of
> create_append_path, which is the main reason why the patch keeps breaking.
> (It's broken again as of HEAD, though the cfbot doesn't seem to have
> noticed yet.)  Likewise there's a bunch of added complication in
> cost_append, create_append_plan, etc.  I think you should remove all that
> and restrict this optimization to the case where all the subpaths are
> natively ordered --- if we have to insert Sorts, it's hardly going to move
> the needle to worry about simplifying the parent MergeAppend to Append.

Other people have already said that they don't think this is true; I
agree with those people.  Even if you have to sort *every* path,
sorting a bunch of reasonably large data sets individually is possibly
better than sorting all the data together, because (1) you can start
emitting rows sooner, (2) it might make you fit in memory instead of
having to spill to disk, and (3) O(n lg n) is supralinear.  Still, if
that were the only case this handled, I wouldn't be too excited,
because it seems at least plausible that lumping a bunch of small
partitions together and sorting it all at once could save some
start-up and tear-down costs vs. sorting them individually.  But it
isn't; the ability to consider that sort of plan is just a fringe
benefit.  If a substantial fraction of the partitions have indexes --
half, three-quarters, all-but-one -- sorting only the remaining ones
should win big.

Admittedly, I think this case is less common than it was a few years
ago, because with table inheritance one often ended up with a parent
partition that was empty and had no indexes so it produced a
dummy-seqscan in every plan, and that's gone with partitioning.
Moreover, because of Alvaro's work on cascaded CREATE INDEX, people
are probably now more likely to have matching indexes on all the
partitions.  Still, it's not that hard to imagine a case where older
data that doesn't change much is more heavily indexed than tables that
are still suffering DML of whatever kind on a regular basis.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread Pavel Stehule
On Fri, 15 Mar 2019 at 15:03, David Fetter wrote:

> On Fri, Mar 15, 2019 at 05:04:02AM +0100, Pavel Stehule wrote:
> > Hi
> >
> > I propose mentioned functions without specified separator. In this case
> the
> > string is transformed to array of chars, in second case, the array of
> chars
> > is transformed back to string.
> >
> > Comments, notes?
>
> Whatever optimizations you have in mind for this, could they also work
> for string_to_array() and array_to_string() when they get an empty
> string handed to them?
>

my idea is use string_to_array('AHOJ') --> {A,H,O,J}

empty input means empty result --> {}


>
> As to naming, some languages use explode/implode.
>

It could be, but since we already have string_to_array, I think it is a good
name.



> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>


Re: "WIP: Data at rest encryption" patch and, PostgreSQL 11-beta3

2019-03-15 Thread Robert Haas
On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska  wrote:
> The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the code
> needs to compile even w/o OpenSSL (of course the encryption won't be enabled
> in that case). I'll try to reduce the use of this construct only to the code
> blocks that really reference the OpenSSL functions.

That sounds like a good place to start.  The other part will be
avoiding referencing the OpenSSL functions from too many places.  For
example, right now we have secure_read() -- which I know some people
hate, or at least hate the name of -- which lets you send data to the
client without having to care whether that data is going to be
SSL-encrypted on its way over the network.  One can argue about
whether that particular abstraction is properly-designed, but as Tom
recently felt the urge to remind me on another thread, the value of
abstraction in general is not in doubt.  I think we need one here, so
that the SSL-specific bits can be isolated in a relatively small
number of files, and then clients can ask to write a file, or write a
block to a file, or whatever, and ignore the question of whether that
is going to get encrypted somehow behind the scenes.

> Since buffile.c needs to handle this kind of problem (and it already does in
> the last patch version), I think even other than temporary files could be
> handled by this module. The patch below adds functions BufFileOpenTransient(),
> BufFileWriteTransient(), etc. that can replace OpenTransientFile, write(),
> etc. respectively. Once we implement the encryption, these functions will also
> hide handling of the encryption blocks from user. In this (preliminary) patch
> As an example, I applied this approach to ReorderBufferSerializeChange() and
> the function seems a bit simpler now. (ReorderBufferRestoreChange() would need
> more work to adopt this approach.)

Yeah, I don't have time to look at this in detail right now, but this
kind of thing seems like a promising approach to me.

> > The interaction of this capability with certain tricks that PostgreSQL
> > plays needs some thought -- and documentation, at least developer
> > documentation, maybe user documentation.  One area is recovery.
> > Writing WAL relies on the fact that if you are in the midst of
> > rewriting a block and the power fails, bytes that are the same in both
> > the old and new block images will be undisturbed; encryption will make
> > that false.  How's that going to work?
>
> Yes, if only a single bit is different, decryption will turn the whole
> encryption block (16 bytes) into garbage. So we need to enforce
> full_page_writes=on if encryption is enabled. The last version of the patch
> does not do that.

I believe that we need full_page_writes=on and wal_log_hints=on, but
they don't fix the problem that I'm describing.  Suppose that the very
last page in the write-ahead log contains 3.5kB of good data and 7kB
of zeroes.  Now, somebody comes along and writes a new write-ahead log
record that is 1k in size.  In memory, we update the block: it now
contains 4.5kB of good data and 6.5kB of zeroes.  Now, as we're
writing the block to do disk, the power is removed.  Consequently, the
first 4kB hits the platter and the other 4kB disappears into the void.

If the WAL is not encrypted, the state at this point is that the 3.5kB
of good data that we had previously is still there, and the first 500
bytes of the new WAL record are also there, and the rest is missing.
Recovery will detect that there's a valid-looking record at end of
WAL, but when it tries to checksum that record, it will fail, and so
that trailing half-record will just get ignored.  And that's fine.

If the WAL *is* encrypted, the state at this point is that the block
is unreadable, because the first 4kB of the block is the first half of
the bits that resulted from encrypting 8kB of data that includes the
new record, and the second 4kB of the block is the second half of the
bits that resulted from encrypting 8kB of data that did not include
the new record, and since the new record perturbed every bit on the
page with probability ~50%, what that means is you now have garbage.
That means that not only did we lose the new record, but we also lost
the 3.5kB of good data which the page previously contained.  That's
NOT ok.  Some of the changes associated with those WAL records may
have been flushed to disk already, or there may be commits in there
that were acknowledged to the client, and we can't just lose them.

One idea I have for fixing this problem is to encrypt the individual
WAL records rather than the blocks.  That leaks some information, of
course, but maybe not in a way that really matters.
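
To make that idea a bit more concrete, here is a rough sketch under several
assumptions of mine (OpenSSL's EVP API, AES in CTR mode, and a per-record
counter derived from the record's LSN); none of this comes from the actual
patch, it is only meant to illustrate "encrypt the payload, keep the header
readable":

#include <stdint.h>
#include <string.h>
#include <openssl/evp.h>

static int
encrypt_wal_record_payload(const unsigned char key[32], uint64_t record_lsn,
						   const unsigned char *plain, int plain_len,
						   unsigned char *cipher)
{
	unsigned char iv[16] = {0};
	EVP_CIPHER_CTX *ctx;
	int			len, total = 0;

	/* The LSN is unique per record, so the counter block never repeats. */
	memcpy(iv, &record_lsn, sizeof(record_lsn));

	if ((ctx = EVP_CIPHER_CTX_new()) == NULL)
		return -1;
	if (EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv) != 1 ||
		EVP_EncryptUpdate(ctx, cipher, &len, plain, plain_len) != 1)
	{
		EVP_CIPHER_CTX_free(ctx);
		return -1;
	}
	total = len;
	if (EVP_EncryptFinal_ex(ctx, cipher + total, &len) != 1)
	{
		EVP_CIPHER_CTX_free(ctx);
		return -1;
	}
	EVP_CIPHER_CTX_free(ctx);
	return total + len;
}

With CTR the ciphertext is the same length as the plaintext, so record
lengths would not change; whether the CRC covers the plain or the encrypted
bytes would still be a design decision.
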

> > Hint bits also rely on this principle.  I thought there might be some
> > interaction between this work and wal_log_hints for this reason, but I see
> > nothing of that sort in the patch.
>
> I'm not sure if the hint bit is still a problem if we first copy the shared
> buffer to 

Re: Ordered Partitioned Table Scans

2019-03-15 Thread David Rowley
I've attached an updated patch which fixes the conflict with 0a9d7e1f6d8

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


v11-0001-Allow-Append-to-be-used-in-place-of-MergeAppend-.patch
Description: Binary data


Re: libpq environment variables in the server

2019-03-15 Thread Noah Misch
On Fri, Mar 15, 2019 at 10:06:29AM +, Dagfinn Ilmari Mannsåker wrote:
> Peter Eisentraut  writes:
> 
> > On 2019-03-15 05:00, Noah Misch wrote:
> >> I consider the following style more idiomatic:
> >> 
> >>  {
> >>  local %ENV;
> >>  delete $ENV{PGAPPNAME};
> >>  ...
> >>  }
> >
> > That doesn't work because the first line clears the entire environment.
> 
> The solution to that is to do 'local %ENV = %ENV;', to assign a copy of
> the original to the localised variable.

That's the right thing, not what I wrote.  We use that in
src/bin/initdb/t/001_initdb.pl.



Re: seems like a bug in pgbench -R

2019-03-15 Thread Fabien COELHO




echo 'select 1' > select.sql

while /bin/true; do
 pgbench -n -f select.sql -R 1000 -j 8 -c 8 -T 1 > /dev/null 2>&1;
 date;
done;


Indeed. I'll look at it over the weekend.


So I guess this is a bug in 12788ae49e1933f463bc59a6efe46c4a01701b76, or
one of the other commits touching this part of the code.


Yep, possibly.

--
Fabien.



Re: Best way to keep track of a sliced TOAST

2019-03-15 Thread Robert Haas
On Fri, Mar 15, 2019 at 7:37 AM Bruno Hass  wrote:
> This idea is what I was hoping to achieve. Would we be able to make 
> optimizations on deTOASTing  just by storing the chunk lengths in chunk 0?

I don't know. I guess we could also NOT store the chunk lengths and
just say that if you don't know which chunk you want by chunk number,
your only other alternative is to read the chunks in order.  The
problem with that is that you can no longer index by byte-position
without fetching every chunk prior to that byte position, but maybe
that's not important enough to justify the overhead of a list of chunk
lengths.  Or maybe it depends on what you want to do with it.
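
To make the trade-off concrete, here is the kind of lookup a chunk-length
directory in chunk 0 would enable -- purely an illustrative sketch, not an
existing PostgreSQL API:

#include <stdint.h>

/*
 * Given the directory of chunk lengths, find which chunk holds a byte
 * offset and the offset within that chunk.  Without the directory the
 * caller would have to read the chunks in order until it gets there.
 */
static int
find_chunk_for_offset(const int32_t *chunk_lengths, int nchunks,
					  int64_t byte_offset, int64_t *offset_in_chunk)
{
	int64_t		pos = 0;

	for (int chunkno = 0; chunkno < nchunks; chunkno++)
	{
		if (byte_offset < pos + chunk_lengths[chunkno])
		{
			*offset_in_chunk = byte_offset - pos;
			return chunkno;
		}
		pos += chunk_lengths[chunkno];
	}
	return -1;					/* offset past the end of the datum */
}
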

Again, stuff like what you are suggesting here has been suggested
before.  I think the problem is if someone did the work to invent such
an infrastructure, that wouldn't actually do anything by itself.  We'd
then need to find an application of it where it brought us some clear
advantage.  As I said in my previous email, jsonb seems like a
promising candidate, but I don't think it's a slam dunk.  What would
the design look like, exactly?  Which operations would get faster, and
could we really make them work?  The existing format is, I think,
designed with a byte-oriented format in mind, and a chunk-oriented
format might have different design constraints.  It seems like an idea
with potential, but there's a lot of daylight between a directional
idea with potential and a specific idea accompanied by a high-quality
implementation thereof.

> Also, wouldn't it break existing functions by dedicating a whole chunk 
> (possibly more) to such metadata?

Anybody writing such a patch would have to be prepared to fix any such
breakage that occurred, at least as regards core code.  I would guess
that this could be done without breaking too much third-party code,
but I guess it depends on exactly what the author of this hypothetical
patch ends up changing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: string_to_array, array_to_string function without separator

2019-03-15 Thread David Fetter
On Fri, Mar 15, 2019 at 05:04:02AM +0100, Pavel Stehule wrote:
> Hi
> 
> I propose mentioned functions without specified separator. In this case the
> string is transformed to array of chars, in second case, the array of chars
> is transformed back to string.
> 
> Comments, notes?

Whatever optimizations you have in mind for this, could they also work
for string_to_array() and array_to_string() when they get an empty
string handed to them?

As to naming, some languages use explode/implode.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Tom Lane
Robert Haas  writes:
> On Thu, Mar 14, 2019 at 6:08 PM Tom Lane  wrote:
>> And it's
>> still calling get_rel_relkind() in the rd_pdcxt context, potentially
>> leaking a *lot* of stuff into that normally-long-lived context, since
>> that will result in fresh catcache (and thence potentially relcache)
>> loads in some cases.

> That's really unfortunate.  I know CLOBBER_CACHE_ALWAYS testing has a
> lot of value, but get_rel_relkind() is the sort of function that
> developers need to be able to call without worrying about creating a
> massive memory leak.

I don't find that to be a concern.  The bug here is that random code is
being called while CurrentMemoryContext is pointing to a long-lived
context, and that's just a localized violation of a well understood coding
convention.  I don't think that that means we need some large fire drill
in response.

>> What I'm thinking, therefore, is that 2455ab488 had the right idea but
>> didn't take it far enough.  We should remove the temp-context logic it
>> added to RelationBuildPartitionDesc and instead put that one level up,
>> in RelationBuildDesc, where the same temp context can serve all of these
>> leak-prone sub-facilities.

> Yeah, that sounds good.

OK, I'll have a go at that today.

>> Possibly it'd make sense to conditionally compile this so that we only
>> do it in a CLOBBER_CACHE_ALWAYS build.  I'm not very sure about that,

> I lean toward thinking it makes more sense to be consistent, but I'm
> not sure that's right.

My feeling right now is that the its-okay-to-leak policy has been in
place for decades and we haven't detected a problem with it before.
Hence, doing that differently in normal builds should require some
positive evidence that it'd be beneficial (or at least not a net loss).
I don't have the time or interest to collect such evidence.  But I'm
happy to set up the patch to make it easy for someone else to do so,
if anyone is sufficiently excited about this to do the work.

regards, tom lane



Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Tom Lane
Heikki Linnakangas  writes:
> On 15/03/2019 00:08, Tom Lane wrote:
>> Possibly it'd make sense to conditionally compile this so that we only
>> do it in a CLOBBER_CACHE_ALWAYS build.  I'm not very sure about that,
>> but arguably in a normal build the overhead of making and destroying
>> a context would outweigh the cost of the leaked memory.  The main
>> argument I can think of for doing it all the time is that having memory
>> allocation work noticeably differently in CCA builds than normal ones
>> seems like a recipe for masking normal-mode bugs from the CCA animals.

> Having CLOBBER_CACHE_ALWAYS behave differently sounds horrible.

I'm not entirely convinced of that.  It's already pretty different...

> We maintain a free list of AllocSetContexts nowadays, so creating a 
> short-lived context should be pretty cheap. Or if it's still too 
> expensive, we could create one short-lived context as a child of 
> TopMemoryContext, and reuse that on every call, resetting it at the end 
> of the function.

RelationBuildDesc potentially recurses, so a single context won't do.
Perhaps you're right that the context freelist would make this cheap
enough to not matter, but I'm not sure of that either.

What I'm inclined to do to get the buildfarm pressure off is to set
things up so that by default we do this only in CCA builds, but there's
a #define you can set from the compile command line to override that
decision in either direction.  Then, if somebody wants to investigate
the performance effects in more detail, they could twiddle it easily.
Depending on the results, we could alter the default policy.
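
For concreteness, a rough sketch of what such a switch could look like (the
macro name is invented here, and this is not meant to be the actual change):

/* Compile-time override; defaults to temp-context mode only under CCA. */
#ifndef RECOVER_RELATION_BUILD_MEMORY
#ifdef CLOBBER_CACHE_ALWAYS
#define RECOVER_RELATION_BUILD_MEMORY 1
#else
#define RECOVER_RELATION_BUILD_MEMORY 0
#endif
#endif

	/* ... near the top of RelationBuildDesc() ... */
	MemoryContext tmpcxt = NULL;
	MemoryContext oldcxt = NULL;

	if (RECOVER_RELATION_BUILD_MEMORY)
	{
		tmpcxt = AllocSetContextCreate(CurrentMemoryContext,
									   "RelationBuildDesc workspace",
									   ALLOCSET_DEFAULT_SIZES);
		oldcxt = MemoryContextSwitchTo(tmpcxt);
	}

	/* ... leak-prone catalog accesses happen here ... */

	if (RECOVER_RELATION_BUILD_MEMORY)
	{
		MemoryContextSwitchTo(oldcxt);
		MemoryContextDelete(tmpcxt);
	}

Keeping the two context variables local also keeps this safe when
RelationBuildDesc recurses.
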

regards, tom lane



Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Robert Haas
On Thu, Mar 14, 2019 at 6:08 PM Tom Lane  wrote:
> I found that even with 2455ab488, there still seemed to be an
> unreasonable amount of MessageContext bloat during some of the queries,
> on the order of 50MB in some of them.  Investigating more closely,
> I found that 2455ab488 has a couple of simple oversights: it's still
> calling partition_bounds_create in the caller's context, allowing
> whatever that builds or leaks to be a MessageContext leak.

Oops.

> And it's
> still calling get_rel_relkind() in the rd_pdcxt context, potentially
> leaking a *lot* of stuff into that normally-long-lived context, since
> that will result in fresh catcache (and thence potentially relcache)
> loads in some cases.

That's really unfortunate.  I know CLOBBER_CACHE_ALWAYS testing has a
lot of value, but get_rel_relkind() is the sort of function that
developers need to be able to call without worrying about creating a
massive memory leak.  Maybe the only time it is really going to get
called enough to matter is from within the cache invalidation
machinery itself, in which case your idea will fix it.  But I'm not
sure about that.

> What I'm thinking, therefore, is that 2455ab488 had the right idea but
> didn't take it far enough.  We should remove the temp-context logic it
> added to RelationBuildPartitionDesc and instead put that one level up,
> in RelationBuildDesc, where the same temp context can serve all of these
> leak-prone sub-facilities.

Yeah, that sounds good.

> Possibly it'd make sense to conditionally compile this so that we only
> do it in a CLOBBER_CACHE_ALWAYS build.  I'm not very sure about that,
> but arguably in a normal build the overhead of making and destroying
> a context would outweigh the cost of the leaked memory.  The main
> argument I can think of for doing it all the time is that having memory
> allocation work noticeably differently in CCA builds than normal ones
> seems like a recipe for masking normal-mode bugs from the CCA animals.
> But that's only a guess not proven fact; it's also possible that having
> two different behaviors would expose bugs we'd otherwise have a hard
> time detecting, such as continuing to rely on the "temporary" data
> structures longer than we should.

I lean toward thinking it makes more sense to be consistent, but I'm
not sure that's right.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] EvalPlanQual behaves oddly for FDW queries involving system columns

2019-03-15 Thread Etsuro Fujita

(2019/03/04 12:10), Etsuro Fujita wrote:

(2019/03/02 3:57), Andres Freund wrote:

FWIW, I pushed the EPQ patch, doing this conversion blindly. It'd be
awesome if you'd check that it actually works...


I'll start the work later this week. I think I can post an (initial)
report on that next week, maybe.


Here is an updated version of the patch [1].  This version doesn't allow 
pushing down joins to the remote server if there is a possibility that EPQ 
will be executed, but I think it would be useful for testing the EPQ patch.  
I haven't looked into the EPQ patch in detail yet, but I tested the patch 
with the attached, and couldn't find any issues with it.


Best regards,
Etsuro Fujita

[1] https://www.postgresql.org/message-id/16016.1431455074%40sss.pgh.pa.us
*** a/contrib/postgres_fdw/deparse.c
--- b/contrib/postgres_fdw/deparse.c
***
*** 1216,1222  deparseLockingClause(deparse_expr_cxt *context)
  		{
  			PlanRowMark *rc = get_plan_rowmark(root->rowMarks, relid);
  
! 			if (rc)
  			{
  /*
   * Relation is specified as a FOR UPDATE/SHARE target, so
--- 1216,1222 
  		{
  			PlanRowMark *rc = get_plan_rowmark(root->rowMarks, relid);
  
! 			if (rc && rc->markType == ROW_MARK_COPY)
  			{
  /*
   * Relation is specified as a FOR UPDATE/SHARE target, so
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 32,37 
--- 32,38 
  #include "optimizer/pathnode.h"
  #include "optimizer/paths.h"
  #include "optimizer/planmain.h"
+ #include "optimizer/prep.h"
  #include "optimizer/restrictinfo.h"
  #include "optimizer/tlist.h"
  #include "parser/parsetree.h"
***
*** 70,75  enum FdwScanPrivateIndex
--- 71,80 
  	FdwScanPrivateRetrievedAttrs,
  	/* Integer representing the desired fetch_size */
  	FdwScanPrivateFetchSize,
+ 	/* SQL statement to execute remotely (as a String node) */
+ 	FdwScanPrivateSelectSql2,
+ 	/* Integer list of attribute numbers retrieved by SELECT */
+ 	FdwScanPrivateRetrievedAttrs2,
  
  	/*
  	 * String describing join i.e. names of relations being joined and types
***
*** 222,227  typedef struct PgFdwDirectModifyState
--- 227,256 
  	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
  } PgFdwDirectModifyState;
  
+ /*
+  * Execution state for fetching/locking foreign rows.
+  */
+ typedef struct PgFdwFetchState
+ {
+ 	Relation	rel;			/* relcache entry for the foreign table */
+ 	AttInMetadata *attinmeta;	/* attribute datatype conversion metadata */
+ 
+ 	/* for remote query execution */
+ 	PGconn	   *conn;			/* connection for the fetch */
+ 	char	   *p_name;			/* name of prepared statement, if created */
+ 
+ 	/* extracted fdw_private data */
+ 	char	   *query;			/* text of SELECT command */
+ 	List	   *retrieved_attrs;	/* attr numbers retrieved by SELECT */
+ 
+ 	/* info about parameters for prepared statement */
+ 	int			p_nums;			/* number of parameters to transmit */
+ 	FmgrInfo   *p_flinfo;		/* output conversion functions for them */
+ 
+ 	/* working memory context */
+ 	MemoryContext temp_cxt;		/* context for per-tuple temporary data */
+ } PgFdwFetchState;
+ 
  /*
   * Workspace for analyzing a foreign table.
   */
***
*** 333,338  static bool postgresPlanDirectModify(PlannerInfo *root,
--- 362,374 
  static void postgresBeginDirectModify(ForeignScanState *node, int eflags);
  static TupleTableSlot *postgresIterateDirectModify(ForeignScanState *node);
  static void postgresEndDirectModify(ForeignScanState *node);
+ static RowMarkType postgresGetForeignRowMarkType(RangeTblEntry *rte,
+ 			  LockClauseStrength strength);
+ static void postgresRefetchForeignRow(EState *estate,
+ 		  ExecRowMark *erm,
+ 		  Datum rowid,
+ 		  TupleTableSlot *slot,
+ 		  bool *updated);
  static void postgresExplainForeignScan(ForeignScanState *node,
  		   ExplainState *es);
  static void postgresExplainForeignModify(ModifyTableState *mtstate,
***
*** 379,384  static void get_remote_estimate(const char *sql,
--- 415,423 
  static bool ec_member_matches_foreign(PlannerInfo *root, RelOptInfo *rel,
  		  EquivalenceClass *ec, EquivalenceMember *em,
  		  void *arg);
+ static List *create_foreign_fetch_info(PlannerInfo *root,
+ 		  RelOptInfo *baserel,
+ 		  RowMarkType markType);
  static void create_cursor(ForeignScanState *node);
  static void fetch_more_data(ForeignScanState *node);
  static void close_cursor(PGconn *conn, unsigned int cursor_number);
***
*** 396,405  static TupleTableSlot *execute_foreign_modify(EState *estate,
  	   CmdType operation,
  	   TupleTableSlot *slot,
  	   TupleTableSlot *planSlot);
! static void prepare_foreign_modify(PgFdwModifyState *fmstate);
! static const char **convert_prep_stmt_params(PgFdwModifyState *fmstate,
! 		 ItemPointer tupleid,
! 		 TupleTableSlot *slot);
  static void 

Re: Re: Re: [HACKERS] Custom compression methods

2019-03-15 Thread Chris Travers
On Fri, Mar 15, 2019 at 6:07 PM David Steele  wrote:

> On 3/7/19 11:50 AM, Alexander Korotkov wrote:
> > On Thu, Mar 7, 2019 at 10:43 AM David Steele wrote:
> >
> > On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:
> >
> >  > there are another set of patches.
> >  > Only rebased to current master.
> >  >
> >  > Also I will change status on commitfest to 'Needs review'.
> >
> > This patch has seen periodic rebases but no code review that I can
> see
> > since last January 2018.
> >
> > As Andres noted in [1], I think that we need to decide if this is a
> > feature that we want rather than just continuing to push it from CF
> > to CF.
> >
> >
> > Yes.  I took a look at the code of this patch.  I think it's in pretty good
> > shape.  But high-level review/discussion is required.
>
> OK, but I think this patch can only be pushed one more time, maximum,
> before it should be rejected.
>

As a note, we at Adjust believe that this would be very helpful for some of
our use cases and some other general use cases.  I think that, as a feature,
custom compression methods are a good thing, but we are not the only ones
with interests here, and we would be interested in pushing this forward if
possible, or in finding ways to contribute to better approaches in this
particular field.

>
> Regards,
> --
> -David
> da...@pgmasters.net
>
>

-- 
Best Regards,
Chris Travers
Head of Database

Tel: +49 162 9037 210 | Skype: einhverfr | www.adjust.com
Saarbrücker Straße 37a, 10405 Berlin


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi,

Am Freitag, den 15.03.2019, 21:23 +0900 schrieb Michael Paquier:
> On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:
> > 1. There's a typo in line 578 which makes it fail to compile:
> > 
> > > src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first 
> > > use in this function)
> > >   }y
> 
> I am wondering where you got this one.  My local branch does not have
> it, and the patch I sent does not seem to have it either.

Mea culpa, I must've fat fingered something in the editor before
applying your patch, sorry. I should've double-checked.

> > 2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
> > with the controlfile_path?
> 
> PG_MODE_DISABLE needs controlfile_path as well.  We could make the
> cleanup only available when using --enable, the code just looked more
> simple in its current shape.  I think it's just more simple to set
> everything unconditionally.  This code may become more complicated in
> the future.

Ok.

> > 3. There's (I think) leftover debug output in the following places:
> > 
> > > + printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
> > > + controlfile_path_temp);
> > > + printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
> > > + controlfile_path);
> > > + printf(_("Syncing data folder\n"));
> > 
> > (that one is debatable, we are mentioning this only in verbose mode in
> > pg_basebackup but pg_checksums is more chatty anyway, so probably
> > fine).
> 
> This is wanted.  Many folks have been complaining on this thread about
> crashes and such, surely we want logs about what happens :)
> 
> > > + printf(_("Updating control file\n"));
> > 
> > Besides the syncing message (which is user-relevant because they might
> > wonder what is taking so long), the others seem to be implementation
> > details we don't need to tell the user about.
> 
> Perhaps having them under --verbose makes more sense?

Well, if we think it is essential in order to tell the user what happened
in the case of an error, it shouldn't be behind --verbose, I guess.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Robert Haas
On Fri, Mar 15, 2019 at 3:32 AM Heikki Linnakangas  wrote:
> We maintain a free list of AllocSetContexts nowadays, so creating a
> short-lived context should be pretty cheap. Or if it's still too
> expensive, we could create one short-lived context as a child of
> TopMemoryContext, and reuse that on every call, resetting it at the end
> of the function.

Relcache rebuild is reentrant, I think, so we have to be careful about that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 12:54:01PM +0100, Michael Banck wrote:
> 1. There's a typo in line 578 which makes it fail to compile:
> 
> |src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use 
> in this function)
> |   }y

I am wondering where you got this one.  My local branch does not have
it, and the patch I sent does not seem to have it either.

> 2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
> with the controlfile_path?

PG_MODE_DISABLE needs controlfile_path as well.  We could make the
cleanup only available when using --enable, the code just looked more
simple in its current shape.  I think it's just more simple to set
everything unconditionally.  This code may become more complicated in
the future.

> 3. There's (I think) leftover debug output in the following places:
> 
> |+printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
> |+controlfile_path_temp);
> 
> |+printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
> |+controlfile_path);
> 
> |+printf(_("Syncing data folder\n"));
> 
> (that one is debatable, we are mentioning this only in verbose mode in
> pg_basebackup but pg_checksums is more chatty anyway, so probably
> fine).

This is wanted.  Many folks have been complaining on this thread about
crashes and such, surely we want logs about what happens :)

> |+printf(_("Updating control file\n"));
> 
> Besides the syncing message (which is user-relevant because they might
> wonder what is taking so long), the others seem to be implementation
> details we don't need to tell the user about.

Perhaps having them under --verbose makes more sense?
--
Michael


signature.asc
Description: PGP signature


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi,

Am Freitag, den 15.03.2019, 11:50 +0900 schrieb Michael Paquier:
> I have been able to grab some time to incorporate the feedback gathered
> on this thread, and please find attached a new version of the patch to
> add --enable/--disable. 

Some more feedback:

1. There's a typo in line 578 which makes it fail to compile:

|src/bin/pg_checksums/pg_checksums.c:578:4: error: ‘y’ undeclared (first use in 
this function)
|   }y

2. Should the pqsignal() stuff only be setup in PG_MODE_ENABLE? Same
with the controlfile_path?

3. There's (I think) leftover debug output in the following places:

|+  printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path,
|+  controlfile_path_temp);

|+  printf(_("Renaming \"%s\" to \"%s\"\n"), controlfile_path_temp,
|+  controlfile_path);

|+  printf(_("Syncing data folder\n"));

(that one is debatable, we are mentioning this only in verbose mode in
pg_basebackup but pg_checksums is more chatty anyway, so probably fine).

|+  printf(_("Updating control file\n"));

Besides the syncing message (which is user-relevant because they might
wonder what is taking so long), the others seem to be implementation
details we don't need to tell the user about.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



RE: Best way to keep track of a sliced TOAST

2019-03-15 Thread Bruno Hass
> It seems to me that making this overly pluggable is likely to be a net
> negative, because there probably aren't really that many different
> ways of doing this that are useful, and because having to store more
> identifying information will make the toasted datum larger.  One idea
> is to let the datatype divide the datum up into variable-sized chunks
> and then have the on-disk format store a list of chunk lengths in
> chunk 0 (and following, if there are lots of chunks?) followed by the
> chunks themselves.  The data would all go into the TOAST table as it
> does today, and the TOASTed data could be read without knowing
> anything about the data type.  However, code that knows how the data
> was chunked at TOAST time could try to speed things up by operating
> directly on the compressed data if it can figure out which chunk it
> needs without fetching everything.

This idea is what I was hoping to achieve. Would we be able to make 
optimizations on deTOASTing  just by storing the chunk lengths in chunk 0? 
Also, wouldn't it break existing functions by dedicating a whole chunk 
(possibly more) to such metadata?


From: Robert Haas 
Sent: Tuesday, 12 March 2019 14:34
To: Bruno Hass
Cc: pgsql-hackers
Subject: Re: Best way to keep track of a sliced TOAST

On Mon, Mar 11, 2019 at 9:27 AM Bruno Hass  wrote:
> I've been reading about TOASTing and would like to modify how the slicing 
> works by taking into consideration the type of the varlena field. These 
> changes would support future implementations of type specific optimized 
> TOAST'ing functions. The first step would be to add information to the TOAST 
> so we know if it is sliced or not and by which function it was sliced and 
> TOASTed. This information should not break the current on disk format of 
> TOASTs. I had the idea of putting this information on the varattrib struct 
> va_header, perhaps adding more bit layouts to represent sliced TOASTs. This 
> idea, however, was pointed to me to be a rather naive approach. What would be 
> the best way to do this?

Well, you can't really use va_header, because every possible bit
pattern for va_header means something already.  The first byte tells
us what kind of varlena we have:

 * Bit layouts for varlena headers on big-endian machines:
 *
 * 00xx 4-byte length word, aligned, uncompressed data (up to 1G)
 * 01xx 4-byte length word, aligned, *compressed* data (up to 1G)
 * 1000 1-byte length word, unaligned, TOAST pointer
 * 1xxx 1-byte length word, unaligned, uncompressed data (up to 126b)
 *
 * Bit layouts for varlena headers on little-endian machines:
 *
 * xx00 4-byte length word, aligned, uncompressed data (up to 1G)
 * xx10 4-byte length word, aligned, *compressed* data (up to 1G)
 * 0001 1-byte length word, unaligned, TOAST pointer
 * xxx1 1-byte length word, unaligned, uncompressed data (up to 126b)

All of the bits other than the ones that tell us what kind of varlena
we've got are part of the length word itself; you couldn't use any bit
pattern for some other purpose without breaking on-disk compatibility
with existing releases.  What you could possibly do is add a new
possible value of vartag_external, which tells us what "kind" of
toasted datum we've got.  Currently, toasted datums stored on disk are
always type 18, but there's no reason that I know of why we couldn't
have more than one possibility there.
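
(As an illustrative sketch only: the existing enum in postgres.h could in
principle grow another member; VARTAG_ONDISK = 18 is what exists today, and
the added member and its value below are invented:)

typedef enum vartag_external
{
	VARTAG_INDIRECT = 1,
	VARTAG_EXPANDED_RO = 2,
	VARTAG_EXPANDED_RW = 3,
	VARTAG_ONDISK = 18,			/* plain on-disk TOAST pointer, as today */
	VARTAG_ONDISK_CHUNKED = 19	/* hypothetical: type-aware chunked TOAST */
} vartag_external;
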

However, I think you might want to discuss on this mailing list a bit
more about what you are hoping to achieve before you do too much
development, at least if you aspire to get something committed.  A
project like the one you are proposing sounds like something not for
the faint of heart, and it's not really clear what benefits you
anticipate.  I think there has been previous discussion of this topic
at least for jsonb, so you might also want to search the archives for
those discussions.  I wouldn't go so far as to say that this idea
can't work or wouldn't have any value, but it does seem like the kind
of thing where you could spend a lot of time going down a dead end,
and discussion on the list might help you avoid some of those dead
ends.

It seems to me that making this overly pluggable is likely to be a net
negative, because there probably aren't really that many different
ways of doing this that are useful, and because having to store more
identifying information will make the toasted datum larger.  One idea
is to let the datatype divide the datum up into variable-sized chunks
and then have the on-disk format store a list of chunk lengths in
chunk 0 (and following, if there are lots of chunks?) followed by the
chunks themselves.  The data would all go into the TOAST table as it
does today, and the TOASTed data could be read without knowing
anything about the data type.  However, code that knows how the data
was chunked at TOAST time could try to speed things up by operating
directly on the compressed data 

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi,

Am Mittwoch, den 13.03.2019, 12:24 +0100 schrieb Magnus Hagander:
> On Wed, Mar 13, 2019 at 11:54 AM Sergei Kornilov  wrote:
> > One new question from me: how about replication?
> > Case: primary+replica, we shut down primary and enable checksum, and
> > "started streaming WAL from primary" without any issue. I have
> > master with checksums, but replica without.
> > Or cluster with checksums, then disable checksums on primary, but
> > standby think we have checksums.
> 
> Enabling or disabling the checksums offline on the master quite
> clearly requires a rebuild of the standby, there is no other way (this
> is one of the reasons for the online enabling in that patch, so I
> still hope we can get that done -- but not for this version).
> 
> You would have the same with PITR backups for example. 

I'd like to get back to PITR. 

I thought about this a bit, and actually I think PITR might be fine in
the sense that if you enabled or disabled checksums on the cluster after
the last basebackup and then do PITR with the available WAL beyond that,
you would get a working cluster, just with the checksum state the cluster
had at the time of the basebackup. I think that would be entirely
acceptable, as long as nothing else breaks?

I ran some quick tests and saw no errors, but maybe I am missing
something?
 
> And especially if you have some tool that does block or segment level
> differential.

This might be the case, but I am not sure whether PostgreSQL core must
worry about it. Obviously the documentation must be explicit about these
kinds of cases.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Unser Umgang mit personenbezogenen Daten unterliegt
folgenden Bestimmungen: https://www.credativ.de/datenschutz



Re: Show a human-readable n_distinct in pg_stats view

2019-03-15 Thread Laurenz Albe
Maxence Ahlouche wrote:
> It seems to me that since the pg_stats view is supposed to be
> human-readable, it would make sense to show a human-readable version
> of n_distinct.
> Currently, when the stats collector estimates that the number of
> distinct values is more than 10% of the total row count, what is
> stored in pg_statistic.stadistinct is -1 * n_distinct / totalrows, the
> rationale being that if new rows are inserted in the table, they are
> likely to introduce new values, and storing that value allows the
> stadistinct not to get stale too fast.
> 
> You can find attached a simple WIP patch to show the proper n_distinct
> value in pg_stats.
> 
> * Is this desired?
> * Would it make sense to add a column in the pg_stats view to display
> the information "lost", that is the fact that postgres will assume
> that inserting new rows means a higher n_distinct?
> * Am I right to assume that totalrows in the code
> (src/backend/commands/analyze.c:2170) actually corresponds to
> n_live_tup? That's what I gathered from glancing at the code, but I
> might be wrong.
> * Should the catalog version be changed for this kind of change?
> * Should I add this patch to the commitfest?
> 
> If this patch is actually desired, I'll update the documentation as well.
> I'm guessing this patch would break scripts relying on the pg_stats
> view, but I do not know how much we want to avoid that, since they
> should rely on the base tables rather than on the views.

This may make things easier for those who are confused by a negative
entry, but it will obfuscate matters for those who are not.

I don't think that is a win, particularly since the semantics are
explained in great detail in the documentation of "pg_stats".

So I am -1 on that one.

Yours,
Laurenz Albe




Re: WIP: Avoid creation of the free space map for small tables

2019-03-15 Thread John Naylor
On Fri, Mar 15, 2019 at 5:55 PM Amit Kapila  wrote:
> I have committed the latest version of this patch.  I think we can
> wait for a day or two see if there is any complain from buildfarm or
> in general and then we can close this CF entry.  IIRC, this was the
> last patch in the series, right?

Great, thanks! I'll keep an eye on the buildfarm as well.

I just spotted two comments in freespace.c that were true during
earlier patch revisions, but are no longer true, so I've attached a
fix for those. There are no other patches in the series.

-- 
John Naylorhttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


correct-local-map-comments.patch
Description: Binary data


Re: Re: Re: [HACKERS] Custom compression methods

2019-03-15 Thread David Steele

On 3/7/19 11:50 AM, Alexander Korotkov wrote:
On Thu, Mar 7, 2019 at 10:43 AM David Steele wrote:


On 2/28/19 5:44 PM, Ildus Kurbangaliev wrote:

 > there are another set of patches.
 > Only rebased to current master.
 >
 > Also I will change status on commitfest to 'Needs review'.

This patch has seen periodic rebases but no code review that I can see
since last January 2018.

As Andres noted in [1], I think that we need to decide if this is a
feature that we want rather than just continuing to push it from CF
to CF.


Yes.  I took a look at the code of this patch.  I think it's in pretty good 
shape.  But high-level review/discussion is required.


OK, but I think this patch can only be pushed one more time, maximum, 
before it should be rejected.


Regards,
--
-David
da...@pgmasters.net



Re: libpq environment variables in the server

2019-03-15 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> On 2019-03-15 05:00, Noah Misch wrote:
>> I consider the following style more idiomatic:
>> 
>>  {
>>  local %ENV;
>>  delete $ENV{PGAPPNAME};
>>  ...
>>  }
>
> That doesn't work because the first line clears the entire environment.

The solution to that is to do 'local %ENV = %ENV;', to assign a copy of
the original to the localised variable.  This doesn't work on VMS,
because its concept of environment variables is quite different from
UNIX, but PostgreSQL doesn't support that anyway.

> What does work is
>
> {
> delete local $ENV{PGAPPNAME};
> ...
> }
>
> But that is documented as new in Perl 5.12.0, so we might not be able to
> use it.  It appears to work in the 5.8.9 I have lying around, so I'm
> confused.

It "works" as in it's not a syntax error, but it doesn't actually
localise the deletion. The following program:

use strict;
use warnings;
use feature 'say';

our %env = qw(foo bar baz bat);
say "original:  ", join(", ", sort keys %env);
{
delete local $env{foo};
say "localised: ", join(", ", sort keys %env);
}
say "restored?  ", join(", ", sort keys %env);

on 5.12 prints:

original:  baz, foo
localised: baz
restored?  baz, foo

while on 5.10 it prints:

original:  baz, foo
localised: baz
restored?  baz

BTW, https://perl.bot/ is handy for testing things like this on various
Perl versions you don't have lying around.

- ilmari
-- 
- Twitter seems more influential [than blogs] in the 'gets reported in
  the mainstream press' sense at least.   - Matt McLeod
- That'd be because the content of a tweet is easier to condense down
  to a mainstream media article.  - Calle Dybedahl



Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 09:52:11AM +0100, Magnus Hagander wrote:
> As I said, that's a big hammer. I'm all for having a better solution. But I
> don't think it's acceptable not to have *any* defense against it, given how
> bad corruption it can lead to.

Hm...  It looks like my arguments are not convincing enough.  I am not
really convinced that there is any need to make that the default, nor
does it make much sense to embed that stuff directly into pg_checksums,
because that's actually just doing an extra step which is equivalent
to calling pg_resetwal, and we know that this tool has the awesome
reputation of causing more harm than anything else.  At the least, I
would like to have an option which allows the behavior of *not* updating
the system identifier, so that the cases I mentioned would be supported,
because then it becomes possible to enable checksums on a primary with
only a failover, as long as page copies are not directly involved and
all operations go through WAL.  And that would be quite nice.
--
Michael


signature.asc
Description: PGP signature


Re: Re: RE: libpq debug log

2019-03-15 Thread David Steele

On 3/5/19 5:28 PM, David Steele wrote:

On 3/5/19 11:48 AM, Iwata, Aya wrote:


So is it alright to add this information to the new/proposed 
PQtrace() default output?


I agree with Andres [1] that it's not very clear where this patch is 
going and we should push the target to PG13.


Hearing no opinions to the contrary, I have set the target version to PG13.

Regards,
--
-David
da...@pgmasters.net



Re: Show a human-readable n_distinct in pg_stats view

2019-03-15 Thread Maxence Ahlouche
On Wed, 13 Mar 2019 at 15:14, Maxence Ahlouche
 wrote:
> * Should I add this patch to the commitfest?

I added it: https://commitfest.postgresql.org/23/2061/



Re: WIP: Avoid creation of the free space map for small tables

2019-03-15 Thread Amit Kapila
On Thu, Mar 14, 2019 at 7:46 PM Amit Kapila  wrote:
>
> On Thu, Mar 14, 2019 at 12:37 PM John Naylor
>  wrote:
> >
> > On Thu, Mar 14, 2019 at 2:17 PM Amit Kapila  wrote:
> > >
> > > 1. Added an Assert in new_cluster_needs_fsm() that old cluster version
> > > should be >= 804 as that is where fsm support has been added.
> >
> > There is already an explicit check for 804 in the caller,
> >
>
> Yeah, I know that, but I have added it to prevent this function being
> used elsewhere.  OTOH, maybe you are right that as per current code it
> is superfluous, so we shouldn't add this assert.
>

I have committed the latest version of this patch.  I think we can
wait for a day or two see if there is any complain from buildfarm or
in general and then we can close this CF entry.  IIRC, this was the
last patch in the series, right?


-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Change ereport level for QueuePartitionConstraintValidation

2019-03-15 Thread Sergei Kornilov
Hello

Per discussion started here: 
https://www.postgresql.org/message-id/CA%2BTgmoZWSLUjVcc9KBSVvbn%3DU5QRgW1O-MgUX0y5CnLZOA2qyQ%40mail.gmail.com

We have INFO ereport messages in ALTER TABLE ... ATTACH PARTITION like this:
> partition constraint for table \"%s\" is implied by existing constraints

Personally I like this message and do not want to remove it.
But recently my colleague noticed that the INFO level is written to stderr by psql.
For example, the simple command

> psql -c "alter table measurement attach partition measurement_y2006m04 for 
> values from ('2006-04-01') to ('2006-05-01');"

can produce stderr output that looks like an error, even though this is the
expected behavior of a successful execution.

And INFO-level messages are always sent to the client regardless of
client_min_messages, as clearly documented in src/include/utils/elog.h.

So now I am +1 on the idea of changing the error level for these messages.
I attach a patch that lowers such ereports to the DEBUG1 level.

thanks

PS: possibly we could change the level to NOTICE instead, but I doubt we will
choose that way.

regards, Sergei

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 515c29072c..bd903ecefc 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -14621,11 +14621,11 @@ QueuePartitionConstraintValidation(List **wqueue, Relation scanrel,
 	if (PartConstraintImpliedByRelConstraint(scanrel, partConstraint))
 	{
 		if (!validate_default)
-			ereport(INFO,
+			ereport(DEBUG1,
 	(errmsg("partition constraint for table \"%s\" is implied by existing constraints",
			RelationGetRelationName(scanrel))));
 		else
-			ereport(INFO,
+			ereport(DEBUG1,
 	(errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
			RelationGetRelationName(scanrel))));
 		return;
diff --git a/src/backend/partitioning/partbounds.c b/src/backend/partitioning/partbounds.c
index 5b897d50ee..66cfeb21e0 100644
--- a/src/backend/partitioning/partbounds.c
+++ b/src/backend/partitioning/partbounds.c
@@ -1181,7 +1181,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 	 */
 	if (PartConstraintImpliedByRelConstraint(default_rel, def_part_constraints))
 	{
-		ereport(INFO,
+		ereport(DEBUG1,
 (errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
		RelationGetRelationName(default_rel))));
 		return;
@@ -1224,7 +1224,7 @@ check_default_partition_contents(Relation parent, Relation default_rel,
 			if (PartConstraintImpliedByRelConstraint(part_rel,
 	 def_part_constraints))
 			{
-ereport(INFO,
+ereport(DEBUG1,
 		(errmsg("updated partition constraint for default partition \"%s\" is implied by existing constraints",
				RelationGetRelationName(part_rel))));
 
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 2a26aa3a89..57e9dffce8 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -3480,11 +3480,9 @@ ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4);
 ALTER TABLE list_parted2 DETACH PARTITION part_3_4;
 ALTER TABLE part_3_4 ALTER a SET NOT NULL;
 ALTER TABLE list_parted2 ATTACH PARTITION part_3_4 FOR VALUES IN (3, 4);
-INFO:  partition constraint for table "part_3_4" is implied by existing constraints
 -- check if default partition scan skipped
 ALTER TABLE list_parted2_def ADD CONSTRAINT check_a CHECK (a IN (5, 6));
 CREATE TABLE part_55_66 PARTITION OF list_parted2 FOR VALUES IN (55, 66);
-INFO:  updated partition constraint for default partition "list_parted2_def" is implied by existing constraints
 -- check validation when attaching range partitions
 CREATE TABLE range_parted (
 	a int,
@@ -3509,7 +3507,6 @@ CREATE TABLE part2 (
 	b int NOT NULL CHECK (b >= 10 AND b < 18)
 );
 ALTER TABLE range_parted ATTACH PARTITION part2 FOR VALUES FROM (1, 10) TO (1, 20);
-INFO:  partition constraint for table "part2" is implied by existing constraints
 -- Create default partition
 CREATE TABLE partr_def1 PARTITION OF range_parted DEFAULT;
 -- Only one default partition is allowed, hence, following should give error
@@ -3537,13 +3534,11 @@ ERROR:  partition constraint is violated by some row
 DELETE FROM part_5_a WHERE a NOT IN (3);
 ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IS NOT NULL AND a = 5);
 ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5);
-INFO:  partition constraint for table "part_5" is implied by existing constraints
 ALTER TABLE list_parted2 DETACH PARTITION part_5;
 ALTER TABLE part_5 DROP CONSTRAINT check_a;
 -- scan should again be skipped, even though NOT NULL is now a column property
 ALTER TABLE part_5 ADD CONSTRAINT check_a CHECK (a IN (5)), ALTER a SET NOT NULL;
 ALTER TABLE list_parted2 ATTACH PARTITION part_5 FOR VALUES IN (5);
-INFO:  partition constraint for table "part_5" is implied by existing constraints
 -- Check the case where attnos of 

Re: Feature: temporary materialized views

2019-03-15 Thread David Steele

On 3/15/19 3:19 AM, Mitar wrote:


On Thu, Mar 14, 2019 at 7:56 AM Andreas Karlsson  wrote:

Yeah, your patch is sadly stuck behind the refactoring, and the
refactoring proved to be harder to do than I initially thought. The
different code paths for executing CREATE MATERIALIZED VIEW are so
different that it is hard to find a good common interface.

So there is unfortunately little you can do here other than wait for me
or someone else to do the refactoring as I cannot see your patch getting
accepted without keeping the existing restrictions on side effects for
CREATE MATERIALIZED VIEW.


Sounds good. I will wait.


This patch has been marked as Returned with Feedback since it is not 
clear when the refactoring it depends on will be done.


You can submit to a future commitfest when you are able to produce a new 
patch.


Regards,
--
-David
da...@pgmasters.net



Re: Feature: triggers on materialized views

2019-03-15 Thread David Steele

On 3/14/19 12:05 PM, Mitar wrote:

Hi!

On Thu, Mar 7, 2019 at 12:13 AM David Steele  wrote:

There doesn't seem to be consensus on whether or not we want this patch.
Peter has issues with the way it works and Andres [1] thinks it should
be pushed to PG13 or possibly rejected.

I'll push this to PG13 for now.


Sorry, I am new to the PostgreSQL development process. So this has been
pushed to a release planned (maybe, if at all) for 2020 and is no longer
in consideration for PG12, to be released this year? To my very
inexperienced eye this looks like a very distant push. What is expected
to happen in that year which would make it clearer whether this is
something that has a chance of going in, and/or what should be improved,
if improving is even an option? I worry that nothing will happen for a
year, we will all forget about this, and then we will be back to square
one.

I must say that I do not really see a reason why this would not be
included. I mean, materialized views are really just sugar on top of
having a table you refresh with a stored query, and if that table can
have triggers, why not also a materialized view.


The reason is that you have not gotten any committer support for this 
patch or attracted significant review, that I can see.  On the contrary, 
three committers have expressed doubts about all or some of the patch 
and it doesn't seem to me that their issues have been addressed.


This is also a relatively new patch which makes large changes -- we 
generally like to get those in earlier than the second-to-last CF.


I can only spend so much time looking at each patch, so Peter, Álvaro, 
or Andres are welcome to jump in and let me know if I have it wrong.


What you need to be doing for PG13 is very specifically addressing 
committer concerns and gathering a consensus that the behavior of this 
patch is the way to go.


Regards,
--
-David
da...@pgmasters.net



Re: [HACKERS] generated columns

2019-03-15 Thread Peter Eisentraut
On 2019-01-15 08:13, Michael Paquier wrote:
> +/*
> + * Thin wrapper around libpq to obtain server version.
> + */
> +static int
> +libpqrcv_server_version(WalReceiverConn *conn)
> This should be introduced in separate patch in my opinion (needed
> afterwards for logirep).

I have committed this separately.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: Timeout parameters

2019-03-15 Thread MikalaiKeida
> Oops, unfortunately, PQcancel() does not follow any timeout
> parameters...  It uses a blocking socket.

> Also, I still don't think it's a good idea to request cancellation.
> socket_timeout should be sufficiently longer than the usually expected
> query execution duration.  And long-running queries should be handled
> by statement_timeout which indicates the maximum tolerable query execution
> duration.
> For example, if the usually expected query execution time is 100 ms,
> statement_timeout can be set to 3 seconds and socket_timeout to 5 seconds.

Based on your comment it seems to me that 'socket_timeout' should be tied to
statement_timeout. I mean that the end-user should wait statement_timeout +
'socket_timeout' before control is returned. That looks much safer to me.

Best regards,
Mikalai Keida

Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 4:26 PM Michael Banck 
wrote:

> Hi,
>
> Am Donnerstag, den 14.03.2019, 15:26 +0100 schrieb Magnus Hagander:
> > Given that the failure is data corruption, I don't think big fat
> > warning is enough. We should really make it impossible to start up the
> > postmaster by mistake during the checksum generation. People don't
> > read the documentation until it's too late. And it might not even be
> > under their control - some automated tool might go in and try to start
> > postgres, and boom, corruption.
>
> I guess you're right.
>
> > One big-hammer method could be similar to what pg_upgrade does --
> > temporarily rename away the controlfile so postgresql can't start, and
> > when done, put it back.
>
> That sounds like a good solution to me. I've made PoC patch for that,
> see attached.
>

The downside with this method is we can't get a nice error message during
the attempted startup. But it should at least be safe, which is the most
important part. And at least it's clear what's happening once you list the
files and see the name of the temporary one.

//Magnus


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Thu, Mar 14, 2019 at 4:54 PM Michael Banck 
wrote:

> Hi,
>
> Am Donnerstag, den 14.03.2019, 15:32 +0100 schrieb Magnus Hagander:
> > On Thu, Mar 14, 2019 at 3:28 PM Christoph Berg  wrote:
> > > Re: Magnus Hagander 2019-03-14  zmb8qck7ndmchey5...@mail.gmail.com>
> > > > Are you suggesting we should support running with a master with
> checksums
> > > > on and a standby with checksums off in the same cluster? That
> seems.. Very
> > > > fragile.
> > >
> > > The case "shut down master and standby, run pg_checksums on both, and
> > > start them again" should be supported. That seems safe to do, and a
> > > real-world use case.
> >
> > I can agree with that, if we can declare it safe. You might need some
> > way to ensure it was shut down cleanly on both sides, I'm guessing.
> >
> > > Changing the system id to a random number would complicate this.
> > >
> > > (Horrible idea: maybe just adding 1 (= checksum version) to the system
> > > id would work?)
> >
> > Or any other way of changing the systemid in a predictable way would
> > also work, right? As long as it's done the same on both sides. And
> > that way it would look different to any system that *doesn't* know
> > what it means, which is probably a good thing.
>
> If we change the system identifier, we'll have to reset the WAL as well
> or otherwise we'll get "PANIC:  could not locate a valid checkpoint
> record" on startup.  So even if we do it predictably on both primary and
> standby I guess the standby would need to be re-cloned?
>
> So I think an option that skips that for people who know what they are
> doing with the streaming replication setup would be required, should we
> decide to bump the system identifier.
>

Ugh. I did not think of that one. But yes, the main idea there would be
that if you turn on checksums on the primary then you have to re-clone all
standbys. That's what happens if we change the system identifier -- that's
why it's the "big hammer method".

But yeah, an option to avoid it could be one way to deal with it. If we
could find some safer way to handle it that'd be better, but otherwise
changing the sysid by default and having an option to turn it off could be
one way to deal with it.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Magnus Hagander
On Fri, Mar 15, 2019 at 1:49 AM Michael Paquier  wrote:

> On Thu, Mar 14, 2019 at 03:23:59PM +0100, Magnus Hagander wrote:
> > Are you suggesting we should support running with a master with checksums
> > on and a standby with checksums off in the same cluster? That seems..
> Very
> > fragile.
>
> Well, saying that it is supported is too strong a term for that.  What I
> am saying is that the problems you are pointing out are not as bad as
> you seem to think they are, as long as an operator does not copy on-disk
> pages from one node to the other one.  Knowing that checksums apply
> only to pages flushed on disk on a local node, everything going
> through WAL for availability is actually able to work fine:
> - PITR
> - archive recovery.
> - streaming replication.
> Reading the code I understand that.  I have as well done some tests
> with a primary/standby configuration to convince myself, using pgbench
> on both nodes (read-write for the primary, read-only on the standby),
> with checkpoint (or restart point) triggered on each node every 20s.
> If one node has checksum enabled and the other checksum disabled, then
> I am not seeing any inconsistency.
>
> However, anything which does a physical copy of pages could get things
> easily messed up if one node has checksum disabled and the other
> enabled.  One such tool is pg_rewind.  If the promoted standby has
> checksums disabled (becoming the source), and the old master to rewind
> has checksums enabled, then the rewind could likely copy pages which
> do not have their checksums set correctly, resulting in incorrect
> checksums on the old master.
>
> So yes, it is easy to mess up things, however this does not apply to
> all configurations.  The suggestion from Christoph to enable checksums
> on both nodes separately would work, and personally I find the
> suggestion to update the system ID after enabling or disabling
> checksums an over-engineered design because of the reasons in the
> first part of this email (it is technically doable to enable checksums
> with a minimum downtime and a failover), so my recommendation would be
> to document that when enabling checksums on one instance in a cluster,
> it should be applied to all instances as it could cause problems with
> any tools performing a physical copy of relation files or blocks.
>

As I said, that's a big hammer. I'm all for having a better solution. But I
don't think it's acceptable not to have *any* defense against it, given how
bad the resulting corruption can be.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Willing to fix a TODO case in libpq module

2019-03-15 Thread Wu, Fei
Hi all,

On the website https://wiki.postgresql.org/wiki/Todo#libpq
I found that the libpq module has this TODO item:
---
Prevent PQfnumber() from lowercasing unquoted column names
PQfnumber() should never have been doing lowercasing, but historically it has 
so we need a way to prevent it

---
I am interested in this one. So, has it been fixed already?
If not, I am willing to do so.
In that case, could anyone tell me in detail how this function is supposed
to behave?
I will try to fix it~
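
For reference, the current behavior can be seen with a small libpq snippet
like this (illustrative only; it assumes an open PGconn *conn and a made-up
column name):

    PGresult *res = PQexec(conn, "SELECT 1 AS \"MixedCase\"");

    PQfnumber(res, "MixedCase");     /* -1: the unquoted name is downcased     */
    PQfnumber(res, "mixedcase");     /* -1: the column is really "MixedCase"   */
    PQfnumber(res, "\"MixedCase\""); /*  0: a double-quoted name matches as-is */
    PQclear(res);

As far as I understand, the TODO is about offering a way to skip that
downcasing of unquoted names.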


--
Best Regards
-
Wu Fei
Development Department II
Software Division III
Nanjing Fujitsu Nanda Software Tech. Co., Ltd.(FNST)
ADDR.: No.6 Wenzhu Road, Software Avenue,
   Nanjing, 210012, China
TEL  : +86+25-86630566-9356
COINS: 7998-9356
FAX: +86+25-83317685
MAIL:wufei.f...@cn.fujitsu.com
http://www.fujitsu.com/cn/fnst/
---





Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 09:04:51AM +0100, Michael Banck wrote:
> ISTM this would not run fsync_parent_path() unless the first fsync fails
> which is not the intended use. I guess we need two ifs here?

Yes, let's do that.  Let's see if others have input to offer about the
patch.  This thread is gathering attention, which is good.
--
Michael


signature.asc
Description: PGP signature


Re: Problem with default partition pruning

2019-03-15 Thread Kyotaro HORIGUCHI
Hello.

At Fri, 15 Mar 2019 15:05:41 +0900, "Yuzuko Hosoya" 
 wrote in 
<001901d4daf5$1ef4f640$5cdee2c0$@lab.ntt.co.jp>
> v2 patch attached.
> Could you please check it again?

I have some comments on the patch itself.

The patch relies on the fact(?) that the lowest index is always
-1 in range partitioning and uses it as a pseudo default
partition. I'm not sure that is really the case, and anyway it
doesn't seem the right thing to do. Could you explain how it
works, not just what you did in this patch?


L96:
>  /* There can only be zero or one matching partition. */
> -if (partindices[off + 1] >= 0)
> -result->bound_offsets = bms_make_singleton(off + 1);
> -else
> -result->scan_default =
> -partition_bound_has_default(boundinfo);
> +result->bound_offsets = bms_make_singleton(off + 1);

The comment had a meaning for the old code. It seems to need a rewrite?

L183:
> +/*   
>  
> + * All bounds are greater than the key, so we could only 
>  
> + * expect to find the lookup key in the default partition.   
>  
> + */

Long trailing spaces are attached to every line without
substantial modification.

L198:
> - * inclusive, no need add the adjacent partition.
> + * inclusive, no need add the adjacent partition.  If 'off' 
> is
> + * -1 indicating that all bounds are greater, then we simply
> + * end up adding the first bound's offset, that is, 0.

 off doesn't seem to be -1 there.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: Is PREPARE of ecpglib thread safe?

2019-03-15 Thread Kuroda, Hayato
Dear Matsumura-san, Horiguchi-san,

We should check the specification of Pro*C before coding,
because this feature follows its behavior.
I'll test some cases and send you a simple report.

Best Regards,
Hayato Kuroda
Fujitsu LIMITED






RE: Fix typo in test code comments

2019-03-15 Thread Kato, Sho
> Committed.

Thanks!

Regards,
sho kato
> -Original Message-
> From: Michael Paquier [mailto:mich...@paquier.xyz]
> Sent: Friday, March 15, 2019 4:24 PM
> To: Kato, Sho/加藤 翔 
> Cc: 'Kyotaro HORIGUCHI' ;
> pgsql-hack...@postgresql.org
> Subject: Re: Fix typo in test code comments
> 
> On Fri, Mar 15, 2019 at 05:49:47AM +, Kato, Sho wrote:
> > Oops, thank you for your advice.
> > I fixed it.
> 
> Committed.
> --
> Michael




Re: libpq environment variables in the server

2019-03-15 Thread Peter Eisentraut
On 2019-03-15 05:00, Noah Misch wrote:
> I consider the following style more idiomatic:
> 
>  {
>  local %ENV;
>  delete $ENV{PGAPPNAME};
>  ...
>  }

That doesn't work because the first line clears the entire environment.

What does work is

{
delete local $ENV{PGAPPNAME};
...
}

But that is documented as new in Perl 5.12.0, so we might not be able to
use it.  It appears to work in the 5.8.9 I have lying around, so I'm
confused.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Offline enabling/disabling of data checksums

2019-03-15 Thread Michael Banck
Hi,

Am Freitag, den 15.03.2019, 11:50 +0900 schrieb Michael Paquier:
> On Thu, Mar 14, 2019 at 04:26:20PM +0100, Michael Banck wrote:
> > Am Donnerstag, den 14.03.2019, 15:26 +0100 schrieb Magnus Hagander:
> > > One big-hammer method could be similar to what pg_upgrade does --
> > > temporarily rename away the controlfile so postgresql can't start, and
> > > when done, put it back.
> > 
> > That sounds like a good solution to me. I've made PoC patch for that,
> > see attached.
> 
> Indeed.  I did not know this trick from pg_upgrade.  We could just use
> the same.
> 
> > The only question is whether pg_checksums should try to move pg_control
> > back (i) on failure (ii) when interrupted?
> 
> Yes, we should have a callback on SIGINT and SIGTERM here which just
> moves back in place the control file if the temporary one exists.  I
> have been able to grab some time to incorporate the feedback gathered
> on this thread, and please find attached a new version of the patch to
> add --enable/--disable.  

Thanks!

One thing stood out to me while quickly looking over it:

+   /*
+    * Flush the control file and its parent path to make the change
+    * durable.
+    */
+   if (fsync_fname(controlfile_path, false, progname) != 0 ||
+   fsync_parent_path(controlfile_path, progname) != 0)
+   {
+   /* errors are already logged on failure */
+   exit(1);
+   }

ISTM this would not run fsync_parent_path() unless the first fsync fails
which is not the intended use. I guess we need two ifs here?
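
Something like the following, I suppose (just a rough sketch of the two-ifs
variant, untested):

    if (fsync_fname(controlfile_path, false, progname) != 0)
    {
        /* errors are already logged on failure */
        exit(1);
    }

    if (fsync_parent_path(controlfile_path, progname) != 0)
    {
        /* errors are already logged on failure */
        exit(1);
    }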


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to
the following terms: https://www.credativ.de/datenschutz



RE: Timeout parameters

2019-03-15 Thread Tsunakawa, Takayuki
From: mikalaike...@ibagroup.eu [mailto:mikalaike...@ibagroup.eu]
> In case of failure PQcancel() terminates in 'socket_timeout'. So, control
> to the end-user in such a failure situation will be returned in 2 *
> 'socket_timeout' interval. It is much better than hanging forever in some
> specific cases. Moreover, such solution will not lead to the overloading
> of PostgreSQL server by abnormally ignored 'heavy' queries results by
> end-users.

Oops, unfortunately, PQcancel() does not follow any timeout parameters...  It 
uses a blocking socket.

Also, I still don't think it's a good idea to request cancellation.  
socket_timeout should be sufficiently longer than the usually expected query 
execution duration.  And long-running queries should be handled by 
statement_timeout which indicates the maximum tolerable query execution 
duration.
For example, if the usually expected query execution time is 100 ms, 
statement_timeout can be set to 3 seconds and socket_timeout to 5 seconds.
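
In libpq terms that would look roughly like this (note that socket_timeout is
only the parameter proposed in this thread; it does not exist in released
libpq):

    /* hypothetical: socket_timeout is the proposed connection parameter */
    conn = PQconnectdb("host=db1 dbname=app "
                       "options='-c statement_timeout=3s' "
                       "socket_timeout=5");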


Regards
Takayuki Tsunakawa







Re: current_logfiles not following group access and instead follows log_file_mode permissions

2019-03-15 Thread Haribabu Kommi
On Tue, Mar 12, 2019 at 5:03 PM Michael Paquier  wrote:

> On Tue, Feb 26, 2019 at 12:22:53PM +1100, Haribabu Kommi wrote:
> > I checked the code why the current_logfiles is not implemented as
> > shared memory and found that the current syslogger doesn't attach to
> > the shared memory of the postmaster. To support storing the
> > current_logfiles in shared memory, the syslogger process also needs
> > to attach to the shared memory, this seems to be a new
> > infrastructure change.
>
> I don't think you can do that anyway and we should not do it.  Shared
> memory can be reset after a backend exits abnormally, but the
> syslogger lives across that.  What you sent upthread to improve the
> documentation is in my opinion sufficient:
>
> https://www.postgresql.org/message-id/cajrrpge-v2_lmfd9nhrbejjy3vvokjwy3w_h+fs2nxcjg3p...@mail.gmail.com
>
> I would not have split the paragraph you broke into two, but instead
> just add this part in-between:
> +   
> +Permissions 0640 are recommended to be compatible with
> +initdb option --allow-group-access.
> +   
> Any objections in doing that?
>

If I remember correctly, in one of the mails you mentioned that having a
separate para is better. Attached is the updated patch as per your suggestion.

IMO, this update is just a recommendation to the user, and it is still
possible that the log file has stricter permissions even when the data
directory allows group access. So I feel it is still better to set the
permissions of current_logfiles to the database file permissions rather
than the log file permissions.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-log_file_mode-recommended-value-update_v2.patch
Description: Binary data


RE: Is PREPARE of ecpglib thread safe?

2019-03-15 Thread Matsumura, Ryo
Horiguchi-san, Kuroda-san

> Horiguchi-san wrote:
> > A namespace of declared statement is not connection independent.
> > Therefore, we must manage the namespce in global and consider about race 
> > condition.
> 
> Right, and but thread independent.

I was wrong. I understand that DECLARE STATEMENT should follow the same policy
as the combination of PREPARE STATEMENT and SET CONNECTION.
We should fix the current implementation of DECLARE STATEMENT.

Current:
  t1:Thread1: exec sql at conn1 declare st1 statement;
  t2:Thread2: exec sql at conn2 declare st1 statement;  // NG

ToBe:
  t1:Thread1: exec sql at conn1 declare st1 statement;
  t2:Thread2: exec sql at conn2 declare st1 statement;  // OK
  t3:Thread2: exec sql prepared st1 from "select 1";// OK: prepared on conn2
  t4:Thread1: exec sql execute st1; // NG: not prepared
  t5:Thread2: exec sql execute st1; // OK: executed on conn2

  t1:Thread1: exec sql at conn1 declare st1 statement;
  t2:Thread1: exec sql at conn2 declare st1 statement;  // NG

Regards
Ryo Matsumura




Re: hyrax vs. RelationBuildPartitionDesc

2019-03-15 Thread Heikki Linnakangas

On 15/03/2019 00:08, Tom Lane wrote:

What I'm thinking, therefore, is that 2455ab488 had the right idea but
didn't take it far enough.  We should remove the temp-context logic it
added to RelationBuildPartitionDesc and instead put that one level up,
in RelationBuildDesc, where the same temp context can serve all of these
leak-prone sub-facilities.

Possibly it'd make sense to conditionally compile this so that we only
do it in a CLOBBER_CACHE_ALWAYS build.  I'm not very sure about that,
but arguably in a normal build the overhead of making and destroying
a context would outweigh the cost of the leaked memory.  The main
argument I can think of for doing it all the time is that having memory
allocation work noticeably differently in CCA builds than normal ones
seems like a recipe for masking normal-mode bugs from the CCA animals.


Having CLOBBER_CACHE_ALWAYS behave differently sounds horrible.

We maintain a free list of AllocSetContexts nowadays, so creating a 
short-lived context should be pretty cheap. Or if it's still too 
expensive, we could create one short-lived context as a child of 
TopMemoryContext, and reuse that on every call, resetting it at the end 
of the function.
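
Roughly like this, I mean (only a sketch; the context name is made up):

    static MemoryContext tmpcxt = NULL;
    MemoryContext oldcxt;

    if (tmpcxt == NULL)
        tmpcxt = AllocSetContextCreate(TopMemoryContext,
                                       "RelationBuildDesc workspace",
                                       ALLOCSET_DEFAULT_SIZES);

    oldcxt = MemoryContextSwitchTo(tmpcxt);
    /* ... the leak-prone work happens here ... */
    MemoryContextSwitchTo(oldcxt);
    MemoryContextReset(tmpcxt);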


- Heikki



RE: Timeout parameters

2019-03-15 Thread MikalaiKeida
Hello Takayuki-san,

> Yes, so I think it would be necessary to describe how to set 
socket_timeout with relation to other timeout parameters -- socket_timeout 
> statement_timeout, emphasizing that socket_timeout is not for canceling 
long-running queries but for returning control to the client.

To me that does not look sufficient.
If you would like to implement the 'socket_timeout' parameter, could you
pay attention to Fabien Coelho's comment?

> The fact that no answer data is received may mean that it takes time to 
> compute the result, so cancelling seems appropriate to me, rather than 
> severing the connection and starting a new one immediately, leaving the 
> server loaded with its query.

This comment is very important to pay attention to.
Let's imagine once more:
1. the end-user makes a query
2. the query has not completed within this timeout for some reason: the
query is too 'heavy', some problem with the OS happened, or the PostgreSQL
server terminated
3. if we simply close the connection, the first case is treated as a failure
by mistake.
To avoid such a situation I think it would be better to call PQcancel(). 
In case of failure PQcancel() terminates in 'socket_timeout'. So, control 
to the end-user in such a failure situation will be returned in 2 * 
'socket_timeout' interval. It is much better than hanging forever in some 
specific cases. Moreover, such solution will not lead to the overloading 
of PostgreSQL server by abnormally ignored 'heavy' queries results by 
end-users.

What do you think about it? 

Best regards,
Mikalai Keida

Re: Fix typo in test code comments

2019-03-15 Thread Michael Paquier
On Fri, Mar 15, 2019 at 05:49:47AM +, Kato, Sho wrote:
> Oops, thank you for your advice.
> I fixed it.

Committed.
--
Michael


signature.asc
Description: PGP signature


Re: pg_basebackup ignores the existing data directory permissions

2019-03-15 Thread Haribabu Kommi
On Fri, Mar 15, 2019 at 10:33 AM Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 2019-03-09 02:19, Haribabu Kommi wrote:
> > Yes, I agree that it may be a problem if the existing data directory
> > permissions are changed from 0700 to 0750. But it may not be a problem
> > for the scenarios where the existing data directory permissions are
> > already >= 0750 compared to the upstream permissions. The user must
> > change them anyway to start the server, otherwise the server start
> > fails, and the files inside the data folder follow the permissions of
> > the upstream data directory.
> >
> > Usually production systems follow the same permissions as the upstream;
> > I don't see a problem in following the same for the development
> > environment also?
>
> I think the potential problems of getting this wrong are bigger than the
> issue we are trying to fix.
>

Thanks for your opinion. I am not sure exactly what the problems are,
but anyway I can go with your suggestion.

How about changing the data directory permissions only for the -R scenario,
if executing pg_basebackup onto an existing data directory is a common
scenario? Or should we just leave it?

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Is PREPARE of ecpglib thread safe?

2019-03-15 Thread Kyotaro HORIGUCHI
Oops.

At Fri, 15 Mar 2019 15:33:50 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI 
 wrote in 
<20190315.153350.226491548.horiguchi.kyot...@lab.ntt.co.jp>
> >   // If ecpglib didn't reject the above, ecpglib cannot judge
> >   // which connection the followings should be executed on.
> >   exec sql prepare st1 from "select 1";
> >   exec sql execute st1;
> 
> I'm not sure about ECPG, but according to the documentation, the
> following statements should work correctly.
> 
>SQL SET CONNECTION con1;

Of course this is missing the "EXEC" prefix.

>EXEC SQL PREPARE st1 FROM "select 1";
>EXEC SQL EXECUTE st1;
> 
> should succeed and executed on con1.
> 
> > Kuroda-san, is it right?
> > If it's right, I will fix it with using pthread_lock.
> 
> Mmm. Are you saying that prepared statements on ECPG should have
> names in global namespace and EXECUTE should implicitly choose
> the underlying connection automatically from the name of a
> prepared statement? I don't think it is the right direction.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Is PREPARE of ecpglib thread safe?

2019-03-15 Thread Kyotaro HORIGUCHI
At Fri, 15 Mar 2019 05:27:01 +, "Matsumura, Ryo" 
 wrote in 
<03040DFF97E6E54E88D3BFEE5F5480F737AC3F24@G01JPEXMBYT04>
> Hi Horiguchi-san, Kuroda-san
> 
> Horiguchi-san, thank you for your comment.
> 
> I have a question.
> A bug of StatementCache is occurred in previous versions.
> Should a patch be separated?
> 
> > Horiguchi-san wrote:
> > It seems like a local cache of server-side data, which is similar
> > to catcache on server side for each process. 
> 
> I agree.
> I will fix it with using pthread_setspecific like descriptor.c.
> 
> > I don't think
> > prepared statements is in that category. A prepared statement is
> > bonded to a connection, not to a thread. Different threads can
> > execute the same prepared statement on the same connection.
> 
> A namespace of declared statement is not connection independent.
> Therefore, we must manage the namespce in global and consider about race 
> condition.

Right, and but thread independent.

> For example, ecpglib must refer the information of (A) when ecpglib executes 
> (B)
> in order to occur "double declare" error.
> 
>   (A) exec sql at conn1 declare st1 statement;
>   (B) exec sql at conn2 declare st1 statement;

In an interactive SQL environment like psql, we can declare
prepared statements with the same name on different
connections. Do you mean you are going to implement a different
way in ECPG? Actually the current ECPGprepare seems to already
manage prepared statements separately for each connection. This
is naturally guarded by the per-connection concurrency control
that applications should do.

>  this = ecpg_find_prepared_statement(name, con, &prev);

What you showed at the beginning of this thread was the stuff for
auto prepare, the name of which is generated using the global
variable nextStmtID and stored into global stmtCacheEntries. They
are not guarded at all and seem to need multithread-protection.

>   // If ecpglib didn't reject the above, ecpglib cannot judge
>   // which connection the followings should be executed on.
>   exec sql prepare st1 from "select 1";
>   exec sql execute st1;

I'm not sure about ECPG, but according to the documentation, the
following statements should work correctly.

   SQL SET CONNECTION con1;
   EXEC SQL PREPARE st1 FROM "select 1";
   EXEC SQL EXECUTE st1;

should succeed and executed on con1.

> Kuroda-san, is it right?
> If it's right, I will fix it with using pthread_lock.

Mmm. Are you saying that prepared statements on ECPG should have
names in global namespace and EXECUTE should implicitly choose
the underlying connection automatically from the name of a
prepared statement? I don't think it is the right direction.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Sparse bit set data structure

2019-03-15 Thread Julien Rouhaud
On Thu, Mar 14, 2019 at 4:37 PM Julien Rouhaud  wrote:
>
> +   if (newitem <= sbs->last_item)
> +   elog(ERROR, "cannot insert to sparse bitset out of order");
>
> Is there any reason to disallow inserting duplicates?  AFAICT nothing
> prevents that in the current code.  If that's intended, that probably
> should be documented.

That of course won't work well with SBS_LEAF_BITMAP.  I'd still prefer
a more explicit error message.



RE: proposal: pg_restore --convert-to-text

2019-03-15 Thread Imai, Yoshikazu
Hi Jose,

Sorry for my late reply.

On Wed, Mar 6, 2019 at 10:58 AM, José Arthur Benetasso Villanova wrote:
> On Thu, 28 Feb 2019, Imai, Yoshikazu wrote:
> 
> > Is there no need to rewrite the Description in the Doc to state we should
> specify either -d or -f option?
> > (and also it might be better to write that if the -l option is given, neither
> > -d nor -f option is necessarily needed.)
> 
> Since the default part of text was removed, looks ok to me.

Ah, yeah, after looking again, I also think it's ok.

> > I also have the simple question in the code.
> >
> > I thought the below if-else condition
> >
> > +   if (filename && strcmp(filename, "-") == 0)
> > +   fn = fileno(stdout);
> > +   else if (filename)
> > fn = -1;
> >else if (AH->FH)
> >
> > can also be written by the form below.
> >
> >if (filename)
> >{
> >if(strcmp(filename, "-") == 0)
> >fn = fileno(stdout);
> >else
> >fn = -1;
> >}
> >else if (AH->FH)
> >
> > I think the former one looks prettier, but which one is preferred?
> 
> Aside the above question, I tested the code against a up-to-date
> repository. It compiled, worked as expected and passed all tests.

It still can be applied to HEAD by cfbot.


Upon committing this, we have to keep in mind that this patch breaks backwards
compatibility, but I haven't seen any complaints so far. If there are no
objections, I will set this patch to ready for committer.

--
Yoshikazu Imai


RE: Problem with default partition pruning

2019-03-15 Thread Yuzuko Hosoya
Hi Thibaut,

Thanks a lot for your test and comments.

> 
> Le 28/02/2019 à 09:26, Imai, Yoshikazu a écrit :
> > Hosoya-san
> >
> > On Wed, Feb 27, 2019 at 6:51 AM, Yuzuko Hosoya wrote:
> >>> From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> >>> Sent: Wednesday, February 27, 2019 11:22 AM
> >>>
> >>> Hosoya-san,
> >>>
> >>> On 2019/02/22 17:14, Yuzuko Hosoya wrote:
>  Hi,
> 
>  I found the bug of default partition pruning when executing a range
> >> query.
>  -
>  postgres=# create table test1(id int, val text) partition by range
>  (id); postgres=# create table test1_1 partition of test1 for values
>  from (0) to (100); postgres=# create table test1_2 partition of
>  test1 for values from (150) to (200); postgres=# create table
>  test1_def partition of test1 default;
> 
>  postgres=# explain select * from test1 where id > 0 and id < 30;
> QUERY PLAN
>  
>   Append  (cost=0.00..11.83 rows=59 width=11)
> ->  Seq Scan on test1_1  (cost=0.00..5.00 rows=58 width=11)
>   Filter: ((id > 0) AND (id < 30))
> ->  Seq Scan on test1_def  (cost=0.00..6.53 rows=1 width=12)
>   Filter: ((id > 0) AND (id < 30))
>  (5 rows)
> 
>  There is no need to scan the default partition, but it's scanned.
>  -
> 
>  In the current implement, whether the default partition is scanned
>  or not is determined according to each condition of given WHERE
>  clause at get_matching_range_bounds().  In this example,
>  scan_default is set true according to id > 0 because id >= 200
>  matches the default partition.  Similarly, according to id < 30,
> >> scan_default is set true.
>  Then, these results are combined according to AND/OR at
> >> perform_pruning_combine_step().
>  In this case, final result's scan_default is set true.
> 
>  The modifications I made are as follows:
>  - get_matching_range_bounds() determines only offsets of range bounds
>    according to each condition
>  - These results are combined at perform_pruning_combine_step()
>  - Whether the default partition is scanned or not is determined at
>    get_matching_partitions()
> 
>  Attached the patch.  Any feedback is greatly appreciated.
> >>> Thank you for reporting.  Can you please add this to March CF in
> >>> Bugs category so as not to lose
> >> track
> >>> of this?
> >>>
> >>> I will try to send review comments soon.
> >>>
> >> Thank you for your reply.  I added this to March CF.
> > I tested with simple use case and I confirmed it works correctly like below.
> >
> > In case using between clause:
> > postgres=# create table test1(id int, val text) partition by range (id);
> > postgres=# create table test1_1 partition of test1 for values from (0) to (100);
> > postgres=# create table test1_2 partition of test1 for values from (150) to (200);
> > postgres=# create table test1_def partition of test1 default;
> >
> > [HEAD]
> > postgres=# explain analyze select * from test1 where id between 0 and 50;
> >                                                    QUERY PLAN
> > --------------------------------------------------------------------------------------------------------------
> >  Append  (cost=0.00..58.16 rows=12 width=36) (actual time=0.008..0.008 rows=0 loops=1)
> >    ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36) (actual time=0.005..0.005 rows=0 loops=1)
> >          Filter: ((id >= 0) AND (id <= 50))
> >    ->  Seq Scan on test1_def  (cost=0.00..29.05 rows=6 width=36) (actual time=0.002..0.002 rows=0 loops=1)
> >          Filter: ((id >= 0) AND (id <= 50))
> >
> >
> > [patched]
> > postgres=# explain analyze select * from test1 where id between 0 and 50;
> >                                                    QUERY PLAN
> > --------------------------------------------------------------------------------------------------------------
> >  Append  (cost=0.00..29.08 rows=6 width=36) (actual time=0.006..0.006 rows=0 loops=1)
> >    ->  Seq Scan on test1_1  (cost=0.00..29.05 rows=6 width=36) (actual time=0.004..0.005 rows=0 loops=1)
> >          Filter: ((id >= 0) AND (id <= 50))
> >
> >
> >
> > I considered another use case. If the default partition contains rows
> > whose id = 300 and then we add another partition which has constraints
> > like id >= 300 and id < 400, I thought we won't scan those rows anymore.
> > But I noticed we simply can't add such a partition.
> >
> > postgres=# insert into test1 values (300);
> > INSERT 0 1
> > postgres=# create table test1_3 partition of test1 for values from (300) to (400);
> > ERROR:  updated partition constraint for default partition "test1_def"
> > would be violated by some row
> >
> >
> > So I haven't come up with bad cases so far :)
> >
> > --
> > Yoshikazu Imai
> 
> Hello Yoshikazu-San,
> 
> I