Re: Fix XML handling with DOCTYPE

2019-03-26 Thread Chapman Flack
On 03/26/19 21:39, Ryan Lambert wrote:

> I can't verify technical accuracy for many of the details (nuances between
> XPath 1.0, et. al), but overall my experience with the XML functionality
> lines up with what has been documented here.

By the way, in case it's buried too far back in the email thread now,
much of the early drafting for this happened on the wiki page

https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards

which includes a lot of reference links, including a nice paper by
Andrew Eisenberg and Jim Melton that introduced the major changes
from the SQL:2003 to :2006 editions of SQL/XML.

Cheers,
-Chap




Re: speeding up planning with partitions

2019-03-26 Thread David Rowley
On Wed, 27 Mar 2019 at 18:39, Amit Langote
 wrote:
>
> On 2019/03/27 14:26, David Rowley wrote:
> > Perhaps the way to make this work, at least in the long term is to do
> > in the planner what we did in the executor in d73f4c74dd34.
>
> Maybe you meant 9ddef36278a9?

Probably.

> What would be nice is being able to readily access Relation pointers of
> all tables accessed in a query from anywhere in the planner, whereby, a
> given table is opened only once.

Well, yeah, that's what the commit did for the executor, so it is what
I was trying to get at.

> Note that Tom complained upthread that the patch is introducing
> table_open()'s at random points within the planner, which is something to
> avoid.

Yip. :)  hence my suggestion.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: COPY FROM WHEN condition

2019-03-26 Thread Andres Freund
Hi,

On 2019-01-29 07:22:16 -0800, Andres Freund wrote:
> On January 29, 2019 6:25:59 AM PST, Tomas Vondra 
>  wrote:
> >On 1/29/19 8:18 AM, David Rowley wrote:
> >> ...
> >> Here are my performance tests of with and without your change to the
> >> memory contexts (I missed where you posted your results).
> >> 
> >> $ cat bench.pl
> >> for (my $i=0; $i < 8912891; $i++) {
> >> print "1\n1\n2\n2\n";
> >> }
> >> 36a1281f86c: (with your change)
> >> 
> >> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> >> COPY 35651564
> >> Time: 26825.142 ms (00:26.825)
> >> Time: 27202.117 ms (00:27.202)
> >> Time: 26266.705 ms (00:26.267)
> >> 
> >> 4be058fe9ec: (before your change)
> >> 
> >> postgres=# copy listp from program $$perl ~/bench.pl$$ delimiter '|';
> >> COPY 35651564
> >> Time: 25645.460 ms (00:25.645)
> >> Time: 25698.193 ms (00:25.698)
> >> Time: 25737.251 ms (00:25.737)
> >> 
> >
> >How do I reproduce this? I don't see this test in the thread you
> >linked,
> >so I'm not sure how many partitions you were using, what's the
> >structure
> >of the table etc.
> 
> I think I might have a patch addressing the problem incidentally. For
> pluggable storage I slotified copy.c, which also removes the first
> heap_form_tuple. Quite possible that nothing more is needed. I've
> removed the batch context altogether in yesterday's rebase, there was
> no need anymore.

Here's a version that applies onto HEAD. It needs a bit more cleanup
(I'm not sure I like the addition of ri_batchInsertSlots to store slots
for each partition; there's some duplicated code around the slot array
and slot creation; and the multi-insert callback still needs docs).

When measuring inserts into unlogged partitions (to remove the
overhead of WAL logging, which makes it harder to see the raw
performance difference), I see a few percent gain for the slotified
version over the current code.  Same with insertions that have fewer
partition switches.
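
For anyone wanting to rerun numbers like David's above, here is a minimal
sketch of a setup that matches the bench.pl output; the real listp
definition isn't shown in this excerpt, so the column name, partition
count, and use of unlogged partitions are assumptions:

-- hypothetical reproduction setup, not David's exact schema
CREATE TABLE listp (a int) PARTITION BY LIST (a);
CREATE UNLOGGED TABLE listp1 PARTITION OF listp FOR VALUES IN (1);
CREATE UNLOGGED TABLE listp2 PARTITION OF listp FOR VALUES IN (2);

\timing on
COPY listp FROM PROGRAM 'perl ~/bench.pl' WITH (DELIMITER '|');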

Sorry for posting this here - David, it'd be cool if you could take a
look anyway. You've played a lot with this code.

Btw, for v13, I hope that either I or somebody else cleans up CopyFrom()
a bit - it's gotten really hard to understand. I think we need to split
it into a few pieces.

Greetings,

Andres Freund
>From 3e8e43232b08eac40bb5f634be158b8577fc844f Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Sun, 20 Jan 2019 13:28:26 -0800
Subject: [PATCH v23] tableam: multi_insert and slotify COPY.

Author:
Reviewed-By:
Discussion: https://postgr.es/m/
Backpatch:
---
 src/backend/access/heap/heapam.c |  23 +-
 src/backend/access/heap/heapam_handler.c |   1 +
 src/backend/commands/copy.c  | 322 ---
 src/include/access/heapam.h  |   3 +-
 src/include/access/tableam.h |  14 +
 src/include/nodes/execnodes.h|   6 +
 6 files changed, 204 insertions(+), 165 deletions(-)

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index f3812dd5871..71d14789c98 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -2106,7 +2106,7 @@ heap_prepare_insert(Relation relation, HeapTuple tup, TransactionId xid,
  * temporary context before calling this, if that's a problem.
  */
 void
-heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
+heap_multi_insert(Relation relation, TupleTableSlot **slots, int ntuples,
   CommandId cid, int options, BulkInsertState bistate)
 {
 	TransactionId xid = GetCurrentTransactionId();
@@ -2127,11 +2127,18 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 	saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
    HEAP_DEFAULT_FILLFACTOR);
 
-	/* Toast and set header data in all the tuples */
+	/* Toast and set header data in all the slots */
 	heaptuples = palloc(ntuples * sizeof(HeapTuple));
 	for (i = 0; i < ntuples; i++)
-		heaptuples[i] = heap_prepare_insert(relation, tuples[i],
-			xid, cid, options);
+	{
+		HeapTuple tuple;
+
+		tuple = ExecFetchSlotHeapTuple(slots[i], true, NULL);
+		slots[i]->tts_tableOid = RelationGetRelid(relation);
+		tuple->t_tableOid = slots[i]->tts_tableOid;
+		heaptuples[i] = heap_prepare_insert(relation, tuple, xid, cid,
+			options);
+	}
 
 	/*
 	 * We're about to do the actual inserts -- but check for conflict first,
@@ -2361,13 +2368,9 @@ heap_multi_insert(Relation relation, HeapTuple *tuples, int ntuples,
 			CacheInvalidateHeapTuple(relation, heaptuples[i], NULL);
 	}
 
-	/*
-	 * Copy t_self fields back to the caller's original tuples. This does
-	 * nothing for untoasted tuples (tuples[i] == heaptuples[i)], but it's
-	 * probably faster to always copy than check.
-	 */
+	/* copy t_self fields back to the caller's slots */
 	for (i = 0; i < ntuples; i++)
-		tuples[i]->t_self = heaptuples[i]->t_self;
+		slots[i]->tts_tid = heaptuples[i]->t_self;
 
 	pgstat_count_heap_insert(relation, ntuples);
 }
diff --git 

Re: [HACKERS] Block level parallel vacuum

2019-03-26 Thread Haribabu Kommi
On Wed, Mar 27, 2019 at 1:31 AM Masahiko Sawada 
wrote:

> On Tue, Mar 26, 2019 at 10:19 AM Haribabu Kommi
>  wrote:
> >
> >
> > + for (i = 0; i < nindexes; i++)
> > + {
> > + LVIndStats *s = &(copied_indstats[i]);
> > +
> > + if (s->updated)
> > + lazy_update_index_statistics(Irel[i], &(s->stats));
> > + }
> > +
> > + pfree(copied_indstats);
> >
> > why can't we use the shared memory directly to update the stats once all
> the workers
> > are finished, instead of copying them to a local memory?
>
> Since we cannot use heap_inplace_update() which is called by
> vac_update_relstats() during parallel mode I copied the stats. Is that
> safe if we destroy the parallel context *after* exiting parallel mode?
>

OK, understood the reason behind the copy.

Thanks for the updated patches. I reviewed them again and they
are fine. I marked the patch as "ready for committer".

Regards,
Haribabu Kommi
Fujitsu Australia


Re: speeding up planning with partitions

2019-03-26 Thread Amit Langote
On 2019/03/27 14:26, David Rowley wrote:
> On Wed, 27 Mar 2019 at 18:13, Amit Langote
>  wrote:
>>
>> On 2019/03/27 13:50, Amit Langote wrote:
>>> On 2019/03/27 12:06, Amit Langote wrote:
 I wonder if I should rework inherit.c so that its internal interfaces
 don't pass around parent Relation, but make do with the RelOptInfo?  I'll
 need to add tupdesc and reltype fields to RelOptInfo to go ahead with that
 though.
>>>
>>> To give more context on the last sentence, the only place we need the
>>> TupleDesc for is make_inh_translation_list().  Maybe, instead of adding
>>> tupdesc to RelOptInfo, we could add a List of Vars of all attributes of a
>>> table.  To avoid the overhead of always setting it, maybe we could set it
>>> only for inheritance parent tables.  Adding tupdesc to RelOptInfo might be
>>> a hard sell to begin with, because it would need to be pinned and then
>>> released at the end of planning.
>>>
>>> I'll see if I can make this work.
>>
>> Hmm, scratch that.  We'd need to keep around the attribute names too for
>> comparing with the attribute names of the children during translation.
>> Maybe we could store TargetEntry's in the list instead of just Vars, but
>> maybe that's too much.
>>
>> Furthermore, if we make make_inh_translation_list too dependent on this
>> stuff, that will make it hard to use from inheritance_planner(), where we
>> expand the target inheritance without having created any RelOptInfos.
> 
> Perhaps the way to make this work, at least in the long term is to do
> in the planner what we did in the executor in d73f4c74dd34.

Maybe you meant 9ddef36278a9?

What would be nice is being able to readily access Relation pointers of
all tables accessed in a query from anywhere in the planner, whereby, a
given table is opened only once.

Note that Tom complained upthread that the patch is introducing
table_open()'s at random points within the planner, which is something to
avoid.

>  I'm not
> quite sure how exactly you'd know what size to make the array, but if
> it's not something that's happening for PG12, then maybe we can just
> use a List, once we get array based versions of them, at least.

We're going to have to expand the PlannerInfo arrays (simple_rel_*,
append_rel_array, etc.) with the patches here anyway.  Maybe, it will be
just one more array to consider?

Thanks,
Amit





Re: speeding up planning with partitions

2019-03-26 Thread David Rowley
On Wed, 27 Mar 2019 at 18:13, Amit Langote
 wrote:
>
> On 2019/03/27 13:50, Amit Langote wrote:
> > On 2019/03/27 12:06, Amit Langote wrote:
> >> I wonder if I should rework inherit.c so that its internal interfaces
> >> don't pass around parent Relation, but make do with the RelOptInfo?  I'll
> >> need to add tupdesc and reltype fields to RelOptInfo to go ahead with that
> >> though.
> >
> > To give more context on the last sentence, the only place we need the
> > TupleDesc for is make_inh_translation_list().  Maybe, instead of adding
> > tupdesc to RelOptInfo, we could add a List of Vars of all attributes of a
> > table.  To avoid the overhead of always setting it, maybe we could set it
> > only for inheritance parent tables.  Adding tupdesc to RelOptInfo might be
> > a hard sell to begin with, because it would need to be pinned and then
> > released at the end of planning.
> >
> > I'll see if I can make this work.
>
> Hmm, scratch that.  We'd need to keep around the attribute names too for
> comparing with the attribute names of the children during translation.
> Maybe we could store TargetEntry's in the list instead of just Vars, but
> maybe that's too much.
>
> Furthermore, if we make make_inh_translation_list too dependent on this
> stuff, that will make it hard to use from inheritance_planner(), where we
> expand the target inheritance without having created any RelOptInfos.

Perhaps the way to make this work, at least in the long term is to do
in the planner what we did in the executor in d73f4c74dd34.   I'm not
quite sure how exactly you'd know what size to make the array, but if
it's not something that's happening for PG12, then maybe we can just
use a List, once we get array based versions of them, at least.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: speeding up planning with partitions

2019-03-26 Thread Amit Langote
On 2019/03/27 13:50, Amit Langote wrote:
> On 2019/03/27 12:06, Amit Langote wrote:
>> I wonder if I should rework inherit.c so that its internal interfaces
>> don't pass around parent Relation, but make do with the RelOptInfo?  I'll
>> need to add tupdesc and reltype fields to RelOptInfo to go ahead with that
>> though.
> 
> To give more context on the last sentence, the only place we need the
> TupleDesc for is make_inh_translation_list().  Maybe, instead of adding
> tupdesc to RelOptInfo, we could add a List of Vars of all attributes of a
> table.  To avoid the overhead of always setting it, maybe we could set it
> only for inheritance parent tables.  Adding tupdesc to RelOptInfo might be
> a hard sell to begin with, because it would need to be pinned and then
> released at the end of planning.
> 
> I'll see if I can make this work.

Hmm, scratch that.  We'd need to keep around the attribute names too for
comparing with the attribute names of the children during translation.
Maybe we could store TargetEntry's in the list instead of just Vars, but
maybe that's too much.

Furthermore, if we make make_inh_translation_list too dependent on this
stuff, that will make it hard to use from inheritance_planner(), where we
expand the target inheritance without having created any RelOptInfos.

Thanks,
Amit





Re: Fix XML handling with DOCTYPE

2019-03-26 Thread Chapman Flack
On 03/26/19 21:39, Ryan Lambert wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:tested, passed

Thanks for the review!

> I have two recommendations for features.sgml.  You state: 
> 
>>  relies on the libxml library
> 
> Should this be clarified as the libxml2 library?  That's what I installed
> to build postgres from source (Ubuntu 16/18).  If it is the libxml library
> and the "2" is irrelevant

That's a good catch. I'm not actually sure whether there is any "libxml"
library that isn't libxml2. Maybe there was once and nobody admits to
hanging out with it. Most Google hits on "libxml" seem to be modules
that have libxml in their names and libxml2 as their actual dependency.

  Perl XML::LibXML: "This module is an interface to libxml2"
  Haskell libxml: "Binding to libxml2"
  libxml-ruby: "The Libxml-Ruby project provides Ruby language bindings
for the GNOME Libxml2 ..."

  --with-libxml is the PostgreSQL configure option to make it use libxml2.

  The very web page http://xmlsoft.org/index.html says "The XML C parser
  and toolkit of Gnome: libxml" and is all about libxml2.

So I think I was unsure what convention to follow, and threw up my hands
and went with libxml. I could just as easily throw them up and go with
libxml2. Which do you think would be better?

On 03/26/19 23:52, Tom Lane wrote:
> Do we need to mention that at all?  If you're not building from source,
> it doesn't seem very interesting ... but maybe I'm missing some reason
> why end users would care.

The three places I've mentioned it were the ones where I thought users
might care:

 - why are we stuck at XPath 1.0? It's what we get from the library we use.

 - in what order do we get things out from a (hmm) node-set? Per XPath 1.0,
   it's indeterminate (it's a set!), unlike XPath 2.0/XQuery where there's
   a sequence type and you have order control. The observable behavior of
   libxml2 (and you could certainly want to know this) is that you get
   things out in document order, whether that's what you wanted or not,
   even though that is undocumented, and even counter-documented[1],
   libxml2 behavior (a quick demonstration follows this list). So it's an
   example of something you would fundamentally like to know, where the
   only available answer depends precariously on the library we happen to
   be using.

 - which limits in our implementation are inherent to the library, and
   which are just current limits in our embedding of it? (Maybe this is
   right at the border of what a user would care to know, but I know it's
   a question that crosses my mind when I bonk into a limit I wasn't
   expecting.)
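
A quick way to see that document-order behavior from SQL (a sketch; the
output shown is what libxml2's ordering is expected to produce):

-- XPath 1.0 defines no ordering for a node-set, but libxml2 hands the
-- members back in document order:
SELECT xpath('/r/v/text()', '<r><v>b</v><v>a</v></r>'::xml);
-- expected: {b,a}   (document order, not sorted)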

> There are a few places where the parentheses around a block of text
> seem unnecessary.

)blush( that's a long-standing wart in my writing ... seems I often think
in parentheses, then look and say "those aren't needed" and take them out,
only sometimes I don't.

I skimmed just now and found a few instances of parenthesized whole
sentence: the one you quoted, and some (if argument is null, the result
is null), and (No rows will be produced if ). Shall I deparenthesize
them all? Did you have other instances in mind?

> It seems you are standardizing from "node set" to "nodeset", is that
> the preferred nomenclature or preference?

Another good catch. I remember consciously making a last pass to get them
all consistent, and I wanted them consistent with the spec, and I see now
I messed up.

XPath 1.0 [2] has zero instances of "nodeset", two of "node set" and about
six dozen of "node-set". The only appearances of "node set" without the
hyphen are in a heading and its ToC entry. The stuff under that heading
consistently uses node-set. It seems that's the XPath 1.0 term for sure.

When I made my consistency pass, I must have been looking too recently
in libxml2 C source, rather than the spec.

On 03/26/19 23:52, Tom Lane wrote:
> That seemed a bit jargon-y to me too.  If that's standard terminology
> in the XML world, maybe it's fine; but I'd have stuck with "node set".

It really was my intention (though I flubbed it) to use XPath 1.0's term
for XPath 1.0's concept; in my doc philosophy, that gives readers
the most breadcrumbs to follow for the rest of the details if they want
them. "Node set" might be some sort of squishy expository concept I'm
using, but node-set is a thing, in a spec.

If you agree, I should go through and fix my nodesets to be node-sets.

I do think the terminology matters here, especially because of the
differences between what you can do with a node-set (XPath 1.0 thing)
and with a sequence (XPath 2.0+,XQuery,SQL/XML thing).

Let me know what you'd like best on these points and I'll revise the patch.

Regards,
-Chap


[1] http://xmlsoft.org/html/libxml-xpath.html#xmlNodeSet : "array of nodes
in no particular order"

[2] 

Re: speeding up planning with partitions

2019-03-26 Thread Amit Langote
On 2019/03/27 12:06, Amit Langote wrote:
> I wonder if I should rework inherit.c so that its internal interfaces
> don't pass around parent Relation, but make do with the RelOptInfo?  I'll
> need to add tupdesc and reltype fields to RelOptInfo to go ahead with that
> though.

To give more context on the last sentence, the only place we need the
TupleDesc for is make_inh_translation_list().  Maybe, instead of adding
tupdesc to RelOptInfo, we could add a List of Vars of all attributes of a
table.  To avoid the overhead of always setting it, maybe we could set it
only for inheritance parent tables.  Adding tupdesc to RelOptInfo might be
a hard sell to begin with, because it would need to be pinned and then
released at the end of planning.

I'll see if I can make this work.

Thanks,
Amit





RE: Libpq support to connect to standby server as priority

2019-03-26 Thread Tsunakawa, Takayuki
From: Haribabu Kommi [mailto:kommi.harib...@gmail.com]
> while going through the old patch where the GUC_REPORT is implemented, Tom
> has commented on the logic of sending the signal to all backends to process
> the hot standby exit with SIGHUP. If we add the logic of updating the GUC
> variable value in SIGHUP, we may need to change all the SIGHUP handler code
> paths. It is also possible that there is no need to update the variable
> for other processes except backends.
> 
> If we go with adding a new SIGUSR1 type to check and update the GUC
> variable, that can work for most of the backends and background workers.
> 
> Opinions?

SIGUSR1 looks reasonable.  We can consider it as notifying that the server 
status has changed.

I've fully reviewed 0001-0003 and my comments follow.  I'll review 0004-0007.


(1) 0001
before issuing the transaction_readonly to find out whether
the server is read-write or not is restuctured under a new


transaction_readonly -> "SHOW transaction_read_only"
restuctured -> restructured


(2) 0001
+succesful connection or failure.
+successful connection; if it returns on, means 
server

succesful -> successful
means -> it means


(3) 0003
+But for servers version 12 or greater uses the 
transaction_read_only
+GUC that is reported by the server upon successful connection.

GUC doesn't seem to be a term to be used in the user manual.  Instead:

uses the value of the transaction_read_only configuration 
parameter that is...

as in:

https://www.postgresql.org/docs/devel/libpq-connect.html

client_encoding
This sets the client_encoding configuration parameter for this connection.

application_name
Specifies a value for the application_name configuration parameter.


(4) 0003
 	bool		std_strings;	/* standard_conforming_strings */
+	bool		transaction_read_only;	/* session_read_only */

Looking at the comment for std_strings, it would be better to change the 
comment on transaction_read_only to name the backing configuration parameter.


Regards
Takayuki Tsunakawa





Re: COPY FREEZE and setting PD_ALL_VISIBLE/visibility map bits

2019-03-26 Thread Masahiko Sawada
Thank you for sharing the updated patch!

On Tue, Mar 26, 2019 at 6:26 PM Pavan Deolasee  wrote:
>
>
> On Fri, Mar 22, 2019 at 12:19 PM Masahiko Sawada  
> wrote:
>>
>> I've looked at the patch and have comments and questions.
>>
>> +/*
>> + * While we are holding the lock on the page, check if all tuples
>> + * in the page are marked frozen at insertion. We can safely mark
>> + * such page all-visible and set visibility map bits too.
>> + */
>> +if (CheckPageIsAllFrozen(relation, buffer))
>> +PageSetAllVisible(page);
>> +
>> +MarkBufferDirty(buffer);
>>
>> Maybe we don't need to mark the buffer dirty if the page is not set 
>> all-visible.
>>
>
> Yeah, makes sense. Fixed.
>
>>
>>  If we have CheckAndSetPageAllVisible() for only this
>> situation we would rather need to check that the VM status of the page
>> should be 0 and then set two flags to the page? The 'flags' will
>> always be (VISIBILITYMAP_ALL_FROZEN | VISIBILITYMAP_ALL_VISIBLE) in
>> copy freeze case. I'm confused that this function has both code that
>> assumes some special situations and code that can be used in generic
>> situations.
>
>
> If a second COPY FREEZE is run within the same transaction and it starts 
> inserting into the page used by the previous COPY FREEZE, then the page will 
> already be marked all-visible/all-frozen. So we can skip repeating the 
> operation again. While it's quite unlikely that someone will do that and I 
> can't think of a situation where only one of those flags will be set, I don't 
> see a harm in keeping the code as is. This code is borrowed from vacuumlazy.c 
> and at some point we can even move it to some common location.

Thank you for explanation, agreed.

>
>>
>> Perhaps we can add some tests for this feature to pg_visibility module.
>>
>
> That's a good idea. Please see if the tests included in the attached patch 
> are enough.
>

The patch looks good to me. There is no comment from me.
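
As a rough illustration of what such a pg_visibility test can check (a
sketch only, not necessarily the tests in the attached patch; the table
name and row count are made up):

CREATE EXTENSION IF NOT EXISTS pg_visibility;

BEGIN;
-- COPY FREEZE requires the table to be created (or truncated) in the
-- same transaction
CREATE TABLE copyfreeze (a int);
COPY copyfreeze FROM PROGRAM 'seq 1 1000' WITH (FREEZE);
COMMIT;

-- with the patch, pages written by COPY FREEZE should already be
-- all-visible and all-frozen, without running VACUUM first
SELECT * FROM pg_visibility_map('copyfreeze'::regclass);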

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Fix XML handling with DOCTYPE

2019-03-26 Thread Tom Lane
Ryan Lambert  writes:
> I have two recommendations for features.sgml.  You state: 

>> relies on the libxml library

> Should this be clarified as the libxml2 library?  That's what I installed to 
> build postgres from source (Ubuntu 16/18).  If it is the libxml library and 
> the "2" is irrelevant, or it it works with either, it might be nice to have a 
> clarifying note to indicate that.

Do we need to mention that at all?  If you're not building from source,
it doesn't seem very interesting ... but maybe I'm missing some reason
why end users would care.

> It seems you are standardizing from "node set" to "nodeset", is that the 
> preferred nomenclature or preference?

That seemed a bit jargon-y to me too.  If that's standard terminology
in the XML world, maybe it's fine; but I'd have stuck with "node set".

regards, tom lane




Re: Misleading errors with column references in default expressions and partition bounds

2019-03-26 Thread Michael Paquier
On Tue, Mar 26, 2019 at 10:03:35AM -0400, Tom Lane wrote:
> +1 for the general idea, but I find the switch a bit overly verbose.
> Do we really need to force every new EXPR_KIND to visit this spot,
> when so few of them have a need to do anything?  I'd be a bit inclined
> to simplify it to
> 
>   switch (pstate->p_expr_kind)
>   {
>   case EXPR_KIND_COLUMN_DEFAULT:
>   ereport(...);
>   break;
>   case EXPR_KIND_PARTITION_BOUND:
>   ereport(...);
>   break;
>   default:
>   break;
>   }
> 
> That's just a nitpick though.

ParseExprKind is an enum, so listing all the options without the
default has the advantage of generating a warning if somebody adds a
value.  This way anybody changing this code will need to think about
it.

> I don't see that as an issue.

Thanks!

> I think the idea is that trying to reference another column is something
> that people might well try to do, whereas referencing the DEFAULT's
> own column is obviously silly.  In particular the use of "cross-reference"
> implies that another column is what is being referenced.  If we dumb it
> down to just "references to columns in the current table", then it's
> consistent, but it's also completely redundant with the main part of the
> sentence.  It doesn't help that somebody decided to jam the independent
> issue of subqueries into the same sentence.  In short, maybe it'd be
> better like this:
> 
>... The value
>is any variable-free expression (in particular, cross-references
>to other columns in the current table are not allowed).  Subqueries
>are not allowed either.

Okay, I think I see your point here.  That sounds sensible.
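
For readers following along, these are the kinds of statements that hit
the two cases above (a sketch; the exact error wording is what the patch
changes, so it is not reproduced here):

-- column reference inside a DEFAULT expression
CREATE TABLE t (a int, b int DEFAULT a + 1);

-- column reference inside a partition bound
CREATE TABLE p (a int) PARTITION BY LIST (a);
CREATE TABLE p1 PARTITION OF p FOR VALUES IN (a);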
--
Michael




Re: pg_malloc0() instead of pg_malloc()+memset()

2019-03-26 Thread Michael Paquier
On Tue, Mar 26, 2019 at 09:14:46AM +, Daniel Gustafsson wrote:
> Nice, I had missed them as I had my eyes set on pg_malloc(). I've done
> another pass over the codebase and I can't spot any others on top of the
> additional ones you found where MemSet() in palloc0 is preferable over
> memset().

Same here, committed.  Just for the note, I have been just using git
grep -A to look at the follow-up lines of any allocation calls.
--
Michael




Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table

2019-03-26 Thread Michael Paquier
On Wed, Mar 27, 2019 at 11:40:12AM +0900, Amit Langote wrote:
> I'm thinking of adding this to open items under Older Bugs.  Attached the
> patch that I had posted on -bugs, but it's only a rough sketch as
> described above, not a full fix.

Adding it to the section for older bugs sounds fine to me.  Thanks for
doing so.
--
Michael




Re: BUG #15672: PostgreSQL 11.1/11.2 crashed after dropping a partition table

2019-03-26 Thread Amit Langote
On 2019/03/08 19:22, Amit Langote wrote:
> On 2019/03/07 20:36, Amit Langote wrote:
>> On Thu, Mar 7, 2019 at 11:17 AM Amit Langote
>>  wrote:
>>> The problem starts when ALTER TABLE users ALTER COLUMN is executed.
>>> create table users(user_id int, name varchar(64), unique (user_id, name))
>>> partition by list(user_id);
>>>
>>> create table users_000 partition of users for values in(0);
>>> create table users_001 partition of users for values in(1);
>>> select relname, relfilenode from pg_class where relname like 'users%';
>>>           relname           │ relfilenode
>>> ────────────────────────────┼─────────────
>>>  users                      │       16441
>>>  users_000                  │       16446
>>>  users_000_user_id_name_key │       16449
>>>  users_001                  │       16451
>>>  users_001_user_id_name_key │       16454
>>>  users_user_id_name_key     │       16444
>>> (6 rows)
>>>
>>> alter table users alter column name type varchar(127);
>>> select relname, relfilenode from pg_class where relname like 'users%';
>>>           relname           │ relfilenode
>>> ────────────────────────────┼─────────────
>>>  users                      │       16441
>>>  users_000                  │       16446
>>>  users_000_user_id_name_key │       16444    <=== duplicated
>>>  users_001                  │       16451
>>>  users_001_user_id_name_key │       16444    <=== duplicated
>>>  users_user_id_name_key     │       16444    <=== duplicated
>>> (6 rows)
>>
>> I checked why users_000's and user_0001's indexes end up reusing
>> users_user_id_name_key's relfilenode.  At the surface, it's because
>> DefineIndex() is carrying oldNode =
>>  in IndexStmt, which is recursively
>> passed down to DefineIndex().  This
>> DefineIndex() chain is running due to ATPostAlterTypeCleanup() on the
>> parent rel.  This surface problem may be solved in DefineIndex() by
>> just resetting oldNode in each child IndexStmt before recursing, but
>> that means child indexes are recreated with new relfilenodes.  That
>> solves the immediate problem of relfilenodes being wrongly duplicated,
>> that's leading to madness such as SMgrRelationHash corruption being
>> seen in the original bug report.
> 
> This doesn't happen in HEAD, because in HEAD we got 807ae415c5, which
> changed things so that partitioned relations always have their relfilenode
> set to 0.  So, there's no question of parent's relfilenode being passed to
> children and hence being duplicated.
> 
> But that also means child indexes are unnecessarily rewritten, that is,
> with new relfilenodes.
> 
>> But, the root problem seems to be that ATPostAlterTypeCleanup() on
>> child tables isn't setting up their own
>> DefineIndex() step.  That's because the
>> parent's ATPostAlterTypeCleanup() dropped child copies of the UNIQUE
>> constraint due to dependencies (+ CCI).  So, ATExecAlterColumnType()
>> on child relations isn't able to find the constraint on the individual
>> child relations to turn into their own
>> DefineIndex(). If we manage to handle
>> each relation's ATPostAlterTypeCleanup() independently, child's
>> recreated indexes will be able to reuse their old relfilenodes and
>> everything will be fine.  But maybe that will require significant
>> overhaul of how this post-alter-type-cleanup occurs?
> 
> We could try to solve this by dividing ATPostAlterTypeCleanup processing into
> two functions:
> 
> 1. The first one runs right after ATExecAlterColumnType() is finished for a
> given table (like it does today), which then runs ATPostAlterTypeParse to
> generate commands for constraints and/or indexes to re-add.  This part
> won't drop the old constraints/indexes just yet, so child
> constraints/indexes will remain for ATExecAlterColumnType to see when
> executed for the children.
> 
> 2. Dropping the old constraints/indexes is the responsibility of the 2nd
> part, which runs just before executing ATExecReAddIndex or
> ATExecAddConstraint (with is_readd = true), so that the new constraints
> don't collide with the existing ones.
> 
> This arrangement allows step 1 to generate the commands to recreate even
> the child indexes such that the old relfilenode can be preserved by
> setting IndexStmt.oldNode.
> 
> Attached patch is a very rough sketch, which fails some regression tests,
> but I ran out of time today.

I'm thinking of adding this to open items under Older Bugs.  Attached the
patch that I had posted on -bugs, but it's only a rough sketch as
described above, not a full fix.

Link to the original bug report:
https://www.postgresql.org/message-id/flat/15672-b9fa7db32698269f%40postgresql.org

Thanks,
Amit
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index 5dcedc337a..ca514919d1 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -872,7 +872,7 @@ DefineIndex(Oid relationId,
CreateComments(indexRelationId, RelationRelationId, 0,
  

Re: Shouldn't current_schema() be at least PARALLEL RESTRICTED?

2019-03-26 Thread Michael Paquier
On Tue, Mar 26, 2019 at 02:16:00PM +, Daniel Gustafsson wrote:
> The new status of this patch is: Ready for Committer

Thanks Daniel for the review, committed.
--
Michael




Re: performance issue in remove_from_unowned_list()

2019-03-26 Thread Tomas Vondra

On Thu, Mar 21, 2019 at 02:22:40AM +0100, Tomas Vondra wrote:

...

OK, so here is a bit more polished version of the dlist-based patch.

It's almost identical to what I posted before, except that it:

1) undoes the non-working optimization in DropRelationFiles()

2) removes add_to_unowned_list/remove_from_unowned_list entirely and
just replaces that with dlist_push_tail/dlist_delete

I had originally planned to keep those functions, mostly because the
remove_from_unowned_list comment says this:

 - * If the reln is not present in the list, nothing happens.
 - * Typically this would be caller error, but there seems no
 - * reason to throw an error.

I don't think dlist_delete allows that. But after further inspection of
all places calling those functions, I don't think that can happen.

I plan to commit this soon-ish and backpatch it as mentioned before,
because I consider it pretty much a fix for b4166911.



Aand committed + backpatched.



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services





RE: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-03-26 Thread Tsunakawa, Takayuki
From: Masahiko Sawada [mailto:sawada.m...@gmail.com]
> On Wed, Mar 27, 2019 at 2:30 AM Robert Haas  wrote:
> >
> > On Tue, Mar 26, 2019 at 11:23 AM Masahiko Sawada 
> wrote:
> > > > I don't see a patch with the naming updated, here or there, and I'm
> > > > going to be really unhappy if we end up with inconsistent naming
> > > > between two patches that do such fundamentally similar things.  -1
> > > > from me to committing either one until that inconsistency is resolved.
> > >
> > > Agreed. I've just submitted the latest version patch that adds
> > > INDEX_CLEANUP option and vacuum_index_cleanup reloption. I already
> > > mentioned on that thread but I agreed with phrasing the options positively
> > > rather than negatively. So if we got consensus on such naming the new options
> > > added by this patch could be something like SHRINK option (with
> > > true/false) and vacuum_shrink reloption.
> >
> > No, that's just perpetuating the problem.  Then you have an option
> > SHRINK here that you set to TRUE to skip something, and an option
> > INDEX_CLEANUP over there that you set to FALSE to skip something.
> >
> 
> Well, I imagined that both INDEX_CLEANUP option and SHRINK option (or
> perhaps TRUNCATE option) should be true by default. If we want to skip
> some operation of vacuum we can set each option to false, like "VACUUM
> (INDEX_CLEANUP false, SHRINK true, VERBOSE true)". I think that
> resolves the problem but am I missing something?

I almost have the same view as Sawada-san.  The reloption vacuum_shrink_enabled 
is a positive name and follows the naming style of other reloptions.  I hope 
this matches the style you have in mind.
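
For clarity, the naming being proposed above would be used roughly like
this (a sketch only; neither the SHRINK option nor the
vacuum_shrink_enabled reloption exists yet, the final names are still
under discussion, and big_table is a placeholder):

-- per-table default, using the proposed reloption name
ALTER TABLE big_table SET (vacuum_shrink_enabled = false);

-- per-command override, using the proposed option names
VACUUM (INDEX_CLEANUP false, SHRINK true, VERBOSE true) big_table;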


Regards
Takayuki Tsunakawa




Re: Fix XML handling with DOCTYPE

2019-03-26 Thread Ryan Lambert
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:tested, passed

Overall I think this patch [1] improves the docs and helps explain edge-case 
functionality that many of us, myself included, don't fully understand.  I 
can't verify technical accuracy for many of the details (nuances between XPath 
1.0 et al.), but overall my experience with the XML functionality lines up 
with what has been documented here.  Adding the clear declaration of "XPath 
1.0" instead of the generic "XPath" helps make the functional differences 
clear and helps frame them for new users.

I have two recommendations for features.sgml.  You state: 

>  relies on the libxml library

Should this be clarified as the libxml2 library?  That's what I installed to 
build postgres from source (Ubuntu 16/18).  If it is the libxml library and the 
"2" is irrelevant, or if it works with either, it might be nice to have a 
clarifying note to indicate that.

There are a few places where the parentheses around a block of text seem 
unnecessary.  I don't think parens serve a purpose when a full sentence is 
contained within.

> (The libxml library does seem to always return nodesets to PostgreSQL with 
> their members in the same relative order they had in the input document; it 
> does not commit to this behavior, and an XPath 1.0 expression cannot control 
> it.)


It seems you are standardizing from "node set" to "nodeset", is that the 
preferred nomenclature or preference?

Hopefully this helps!  Thanks,

Ryan Lambert

[1] 
https://www.postgresql.org/message-id/attachment/100016/xml-functions-type-docfix-4.patch

The new status of this patch is: Waiting on Author


Re: Transaction commits VS Transaction commits (with parallel) VS query mean time

2019-03-26 Thread Haribabu Kommi
On Tue, Mar 26, 2019 at 9:08 PM Amit Kapila  wrote:

> On Mon, Mar 25, 2019 at 6:55 PM Haribabu Kommi 
> wrote:
> >
> >
> > Thanks to everyone for their opinions and suggestions to improve.
> >
> > Without parallel workers, there isn't much internal implementation
> > logic that causes the stats to bloat. Parallel worker stats are not
> > just the transactions; they update other stats too, e.g. the seqscan
> > stats of a relation. I also agree that just fixing it for transactions
> > may not be a proper solution.
> >
> > Using XID data may not give a proper TPS figure for the instance, as it
> > doesn't include information about read-only transactions.
> >
> > How about adding two additional columns that report the internal and
> > background worker transactions?
> >
>
> I can see the case where the users will be interested in application
> initiated transactions, so if we want to add new columns, it could be
> to display that information and the current columns can keep
> displaying the overall transactions in the database.   Here, I think
> we need to find a way to distinguish between internal and
> user-initiated transactions.  OTOH, I am not sure adding new columns
> is better than changing the meaning of existing columns.  We can go
> either way based on what others feel good, but I think we can do that
> as a separate head-only feature.


I agree with you. Adding new columns definitely needs more discussion
of which processes should be skipped, what needs to be added, and so on.



>   As part of this thread, maybe we can
> just fix the case of the parallel cooperating transaction.
>

With the current patch, all the parallel-implementation transactions are
skipped; in my tests, parallel workers are the major factor in the
transaction stats counter. Even before parallelism, the stats of
autovacuum and so on were present, but their contribution does not
greatly influence the stats.

I agree with you on fixing the stats for parallel workers to improve the
statistics.
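
A rough way to observe the effect being discussed (a sketch; big_table is
a placeholder for any table large enough to get a parallel plan, and the
counters also move due to autovacuum and other sessions):

SELECT xact_commit FROM pg_stat_database WHERE datname = current_database();

SET max_parallel_workers_per_gather = 4;
SELECT count(*) FROM big_table;          -- runs with parallel workers

SELECT pg_sleep(1);                      -- give the stats collector time to report
SELECT xact_commit FROM pg_stat_database WHERE datname = current_database();
-- without a fix, the counter advances by more than the number of
-- client transactions, because each parallel worker's commit is counted too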

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-03-26 Thread Masahiko Sawada
On Wed, Mar 27, 2019 at 2:30 AM Robert Haas  wrote:
>
> On Tue, Mar 26, 2019 at 11:23 AM Masahiko Sawada  
> wrote:
> > > I don't see a patch with the naming updated, here or there, and I'm
> > > going to be really unhappy if we end up with inconsistent naming
> > > between two patches that do such fundamentally similar things.  -1
> > > from me to committing either one until that inconsistency is resolved.
> >
> > Agreed. I've just submitted the latest version patch that adds
> > INDEX_CLEANUP option and vacuum_index_cleanup reloption. I already
> > mentioned on that thread but I agreed with phrasing the options positively
> > rather than negatively. So if we got consensus on such naming the new options
> > added by this patch could be something like SHRINK option (with
> > true/false) and vacuum_shrink reloption.
>
> No, that's just perpetuating the problem.  Then you have an option
> SHRINK here that you set to TRUE to skip something, and an option
> INDEX_CLEANUP over there that you set to FALSE to skip something.
>

Well, I imagined that both INDEX_CLEANUP option and SHRINK option (or
perhaps TRUNCATE option) should be true by default. If we want to skip
some operation of vacuum we can set each option to false, like "VACUUM
(INDEX_CLEANUP false, SHRINK true, VERBOSE true)". I think that
resolves the problem but am I missing something?

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center




Re: Improvement of installation document

2019-03-26 Thread Yugo Nagata
On Tue, 26 Mar 2019 22:18:53 +0900 (JST)
Tatsuo Ishii  wrote:

> > One of our clients suggested that the installation document[1] lacks 
> > description
> > about requirements of installing *-devel packages. For example, 
> > postgresqlxx-devel
> > is required for using --with-pgsql, and openssl-devel for --with-openssl, 
> > and so on,
> > but these are not documented.
> > 
> > [1] http://www.pgpool.net/docs/pgpool-II-3.7.4/en/html/install-pgpool.html
> > 
> > I know the document of PostgreSQL[2] also lacks the description about 
> > openssl-devel,
> > kerberos-devel, etc. (except for readline-devel). However, it would be 
> > convenient
> > for users who want to install Pgpool-II from source code if the required 
> > packages
> > for installation are described in the document explicitly.
> > 
> > [2] https://www.postgresql.org/docs/current/install-requirements.html
> > 
> > Is it not worth to consider this?
> 
> I am against the idea.
> 
> Development package names could differ according to
> distributions/OS. For example, the developement package of OpenSSL is
> "openssl-dev", not "openssl-devel" in Debian or Debian derived
> systems.
> 
> Another reason is, a user who is installing software from source code
> should be familiar enough with the fact that each piece of software
> requires development libraries.
> 
> In summary adding not-so-complete-list-of-development-package-names to
> our document will give incorrect information to novice users, and will
> be annoying for skilled users.

OK. I agreed.

# From this viewpoint, it is perhaps not ideal that the PostgreSQL docs[2]
# mention readline-devel, but that is not a topic for here.

> 
> > BTW, the Pgpool-II doc[2] says:
> > 
> > --with-memcached=path
> > Pgpool-II binaries will use memcached for in memory query cache. You 
> > have to install libmemcached. 
> > 
> > , but maybe libmemcached-devel is correct instead of libmemcached?
> 
> I don't think so. "libmemcached-devel" is just a package name in a
> certain Linux distribution. "libmemcached" is a more general and
> non-distribution-dependent term.

Thanks for your explanation. I understood it.
 
> Best regards,
> --
> Tatsuo Ishii
> SRA OSS, Inc. Japan
> English: http://www.sraoss.co.jp/index_en.php
> Japanese:http://www.sraoss.co.jp


-- 
Yugo Nagata 




Re: basebackup checksum verification

2019-03-26 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-03-26 21:01:27 -0400, Stephen Frost wrote:
> > I'm also not convinced that these changes to pg_basebackup will be free
> > of issues that may impact users in a negative way, making me concerned
> > that we're going to end up doing more harm than good with such a change
> > being back-patched.  Simply comparing the skipped LSNs to the
> > end-of-backup LSN seems much less invasive when it comes to this core
> > code, and certainly increases the chances quite a bit that we'll detect
> > an issue with corruption in the LSN.
> 
> Yea, in the other thread we'd discussed that that might be the correct
> course for backpatch, at least initially. But I think the insert/replay
> LSN would be the correct LSN to compare to in the basebackup.c case?

Yes, it seems like that could be done in the basebackup case and would
avoid the need to track the skipped LSNs, since you could just look up
the insert/replay LSN at the time and do the comparison right away.
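
To make the comparison concrete, here is an SQL-level illustration of the
two quantities involved (a sketch only; the real check would be done in
basebackup.c on the raw page header, and some_table is a placeholder):

CREATE EXTENSION IF NOT EXISTS pageinspect;

-- LSN stamped on a given page
SELECT lsn FROM page_header(get_raw_page('some_table', 0));

-- current insert LSN; a page whose LSN is newer than the LSN at which
-- the backup started may legitimately be torn when read concurrently,
-- so its checksum cannot be trusted as-is
SELECT pg_current_wal_insert_lsn();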

Thanks,

Stephen




Re: basebackup checksum verification

2019-03-26 Thread Andres Freund
Hi,

On 2019-03-26 21:01:27 -0400, Stephen Frost wrote:
> I'm also not convinced that these changes to pg_basebackup will be free
> of issues that may impact users in a negative way, making me concerned
> that we're going to end up doing more harm than good with such a change
> being back-patched.  Simply comparing the skipped LSNs to the
> end-of-backup LSN seems much less invasive when it comes to this core
> code, and certainly increases the chances quite a bit that we'll detect
> an issue with corruption in the LSN.

Yea, in the other thread we'd discussed that that might be the correct
course for backpatch, at least initially. But I think the insert/replay
LSN would be the correct LSN to compare to in the basebackup.c case?

Greetings,

Andres Freund




Re: setLastTid() and currtid()

2019-03-26 Thread Inoue, Hiroshi

Hi Andres,
Sorry for the late reply.

On 2019/03/26 9:44, Andres Freund wrote:

Hi,

For the tableam work I'd like to remove heapam.h from
nodeModifyTable.c. The only remaining impediment to that is a call to
setLastTid(), which is defined in tid.c but declared in heapam.h.

That doesn't seem like a particularly accurate location, it doesn't
really have that much to do with heap. It seems more like a general
executor facility or something.  Does anybody have a good idea where to
put the declaration?


Looking at how this function is used, lead to some confusion on my part.


We currently call setLastTid in ExecInsert():

	if (canSetTag)
	{
		(estate->es_processed)++;
		setLastTid(&slot->tts_tid);
	}

And Current_last_tid, the variable setLastTid sets, is only used in
currtid_byreloid():


Datum
currtid_byreloid(PG_FUNCTION_ARGS)
{
	Oid			reloid = PG_GETARG_OID(0);
	ItemPointer tid = PG_GETARG_ITEMPOINTER(1);
	ItemPointer result;
	Relation	rel;
	AclResult	aclresult;
	Snapshot	snapshot;

	result = (ItemPointer) palloc(sizeof(ItemPointerData));
	if (!reloid)
	{
		*result = Current_last_tid;
		PG_RETURN_ITEMPOINTER(result);
	}

I've got to say I'm a bit baffled by this interface. If somebody passes
in a 0 reloid, we just ignore the passed in tid, and return the last tid
inserted into any table?

I then was even more baffled to find that there's no documentation of
this function, nor this special case behaviour, to be found
anywhere. Not in the docs (which don't mention the function, nor it's
special case behaviour for relation 0), nor in the code.


It's unfortunately used in psqlobdc:

 else if ((flag & USE_INSERTED_TID) != 0)
 printfPQExpBuffer(, "%s where ctid = (select 
currtid(0, '(0,0)'))", load_stmt);
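
For reference, the special case described above can be exercised directly
from SQL on a server that still has currtid() (a sketch; odbc_demo is a
made-up table):

CREATE TABLE odbc_demo (a int);
INSERT INTO odbc_demo VALUES (1);

-- with reloid = 0 the tid argument is ignored and the ctid of the row
-- most recently inserted by this backend is returned
SELECT currtid(0, '(0,0)');

SELECT ctid FROM odbc_demo;   -- should show the same tid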


The above code remains only for PG servers whose version < 8.2.
Please remove the code around setLastTid().

regards,
Hiroshi Inoue


I gotta say, all that currtid code looks to me like it just should be
ripped out.  It really doesn't make a ton of sense to just walk the tid
chain for a random tid - without an open snapshot, there's absolutely no
guarantee that you get back anything meaningful.  Nor am I convinced
it's perfectly alright to just return the latest inserted tid for a
relation the user might not have any permission for.

OTOH, we probably can't just break psqlodbc, so we probably have to hold
our noses a bit longer and just move the prototype elsewhere?  But I'm
inclined to just say that this functionality is going to get ripped out
soon, unless somebody from the odbc community works on making it make a
bit more sense (tests, docs at the very very least).

Greetings,

Andres Freund







Re: basebackup checksum verification

2019-03-26 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> On 2019-03-26 20:18:31 -0400, Stephen Frost wrote:
> > > >>I thought Robert's response was generally good, pointing out that
> > > >>we're talking about this being an issue if the corruption happens in a
> > > >>certain set of bytes.  That said, I'm happy to see improvements in
> > > >>this area but I'm flat out upset about the notion that we must be
> > > >>perfect here- our checksums themselves aren't perfect for catching
> > > >>corruption either.
> > > >
> > > >The point is that we're not detecting errors that we can detect when
> >read outside of basebackup. I really entirely completely fail to see how that
> > > >can be defended.
> > > >
> > > >I think we're making promises with this the basebackup feature we're not
> > > >even remotely keeping. I don't understand how you can defend that, given
> > > >the current state, you can have a basebackup that you took with
> > > >checksums enabled, and then when actually use that basebackup you get
> > > >checksum failures.  Like it's one thing not to detect all storage
> > > >issues, but if we do detect them after using the basebackup, that's
> > > >really not ok.
> > > 
> > > Yeah, if basebackup completes without reporting any invalid checksums, but
> > > running pg_verify_checksums on the same backups detects those, that 
> > > probably
> > > should raise some eyebrows.
> > 
> > That isn't actually what would happen at this point, just so we're
> > clear.  What Andres is talking about is a solution which would only
> > actually work for pg_basebackup, and not for pg_verify_checksums
> > (without some serious changes which make it connect to the running
> > server and run various functions to perform the locking that he's
> > proposing pg_basebackup do...).
> 
> Well, I still think it's just plain wrong to do online checksum
> verification outside of the server, and we should just reject adding
> that as a feature.

I get that, and I disagree with it.

> Besides the fact that I think having at least equal or better error
> detection capabilities than the backend is important, I think all the LSN
> based approaches also have the issue that they'll prevent us from using
> them on non-WAL-logged data. There's ongoing work to move SLRUs into the
> backend allowing them to be checksummed (Shawn Debnath is IIRC planning
> to propose a patch for v13), and we also really should offer to also
> checksum unlogged tables (and temp tables?) - just because they'd be
> gone after a crash, imo doesn't make it OK to not detect corrupted on
> disk data outside of a crash.  For those things we won't necessarily
> have LSNs that we can conveniently can associate with those buffers -
> making LSN based logic harder.

I'm kinda guessing that the SLRUs are still going to be WAL'd.  If
that's changing, I'd be very curious to hear the details.

As for unlogged tables and temp tables, it's an interesting idea to
checksum them, but far less valuable by the very nature of what those are
used for.  Further, it's utterly useless to checksum them as part of a
backup, which is what pg_basebackup is actually doing, and it's actually
valuable to skip over them rather than back them up, since otherwise we'd
back them up, and then restore them ... and then remove them immediately
during recovery from the backup state.  Having a way to verify checksums on
unlogged tables or temp tables using some *other* tool or background
process could be valuable, provided it doesn't cause any issues for
ongoing operations.

> > I outlined a couple of other approaches to improving that situation,
> > which would be able to be used with pg_verify_checksums without having
> > to connect to the backend, but I'll note that those were completely
> > ignored, leading me to believe that there's really not much more to
> > discuss here since other ideas are just not open to being considered.
> 
> Well, given that we can do an accurate determination without too much
> code in the basebackup case, I don't see what your proposals gain over
> that? That's why I didn't comment on them.  I'm focusing on the
> basebackup case, over the online checksum case, because it's released
> code.

I'm fine with improving the basebackup case, but I don't agree at all
with the idea that all checksum validation must be exclusively done in a
backend process.

I'm also not convinced that these changes to pg_basebackup will be free
of issues that may impact users in a negative way, making me concerned
that we're going to end up doing more harm than good with such a change
being back-patched.  Simply comparing the skipped LSNs to the
end-of-backup LSN seems much less invasive when it comes to this core
code, and certainly increases the chances quite a bit that we'll detect
an issue with corruption in the LSN.

Thanks!

Stephen




Re: basebackup checksum verification

2019-03-26 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On Tue, Mar 26, 2019 at 08:18:31PM -0400, Stephen Frost wrote:
> >* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> >>On Tue, Mar 26, 2019 at 04:49:21PM -0700, Andres Freund wrote:
> >>>On 2019-03-26 19:22:03 -0400, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > As detailed in
> > https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
> > the way the backend's basebackup checksum verification works makes its
> > error detection capabilities very dubious.
> 
> I disagree that it's 'very dubious', even with your analysis.
> >>>
> >>>I really don't know what to say.  The current algorithm is flat out
> >>>bogus.
> >>
> >>Bogus might be a bit too harsh, but yeah - failure to reliably detect 
> >>obviously
> >>invalid checksums when the LSN just happens to be high due to randomness is 
> >>not
> >>a good thing. We'll still detect pages corrupted in other places, but this 
> >>is
> >>rather unfortunate.
> >
> >I'm all for improving it, as I said originally.
> >
> I thought Robert's response was generally good, pointing out that
> we're talking about this being an issue if the corruption happens in a
> certain set of bytes.  That said, I'm happy to see improvements in
> this area but I'm flat out upset about the notion that we must be
> perfect here- our checksums themselves aren't perfect for catching
> corruption either.
> >>>
> >>>The point is that we're not detecting errors that we can detect when
> >>>read outside of basebackup. I really entirely completely fail to see how that
> >>>can be defended.
> >>>
> >>>I think we're making promises with this the basebackup feature we're not
> >>>even remotely keeping. I don't understand how you can defend that, given
> >>>the current state, you can have a basebackup that you took with
> >>>checksums enabled, and then when actually use that basebackup you get
> >>>checksum failures.  Like it's one thing not to detect all storage
> >>>issues, but if we do detect them after using the basebackup, that's
> >>>really not ok.
> >>
> >>Yeah, if basebackup completes without reporting any invalid checksums, but
> >>running pg_verify_checksums on the same backups detects those, that probably
> >>should raise some eyebrows.
> >
> >That isn't actually what would happen at this point, just so we're
> >clear.  What Andres is talking about is a solution which would only
> >actually work for pg_basebackup, and not for pg_verify_checksums
> >(without some serious changes which make it connect to the running
> >server and run various functions to perform the locking that he's
> >proposing pg_basebackup do...).
> 
> I was talking about pg_verify_checksums in offline mode, i.e. when you
> take a backup and then run pg_verify_checksums on it. I'm pretty sure
> that does not need to talk to the cluster. Sorry if that was not clear.

To make that work, you'd have to take a backup, then restore it, then
bring PG up and have it replay all of the outstanding WAL, then shut
down PG cleanly, and *then* you could run pg_verify_checksums on it.

> >>We already have such blind spot, but it's expected to be pretty small
> >>(essentially pages modified since start of the backup).
> >
> >eh..?  This is.. more-or-less entirely what's being discussed here:
> >exactly how we detect and determine which pages were modified since the
> >start of the backup, and which might have been partially written out
> >when we tried to read them and therefore fail a checksum check, but it
> >doesn't matter because we don't actually end up using those pages.
> 
> And? All I'm saying is that we knew there's a gap that we don't check,
> but that the understanding was that it's rather small and limited to
> recently modified pages. If we can further reduce that window, that's
> great and we should do it, of course.

The point that Andres is making is that in the face of corruption of
certain particular bytes, the set of pages that we check can be much
less than the overall size of the DB (or that of what was recently
modified), and we end up skipping a lot of other pages due to the
corruption rather than because they've been recently modified, since
the determination of what's been recently modified is based on those
bytes.  With the changes being made to pg_checksums, we'd at least
report back those pages as having been 'skipped', but better would be if
we could accurately determine that the pages were recently modified and
therefore what we saw was a torn page.

Thanks,

Stephen




Re: MSVC Build support with visual studio 2019

2019-03-26 Thread Haribabu Kommi
On Wed, Mar 27, 2019 at 3:03 AM Andrew Dunstan <
andrew.duns...@2ndquadrant.com> wrote:

>
> On 3/20/19 8:36 PM, Haribabu Kommi wrote:
> > Hi Hackers,
> >
> > Here I attached a patch that supports building of PostgreSQL with VS 2019.
> > VS 2019 is going to be released on Apr 2nd 2019; it would be good if
> > version 12 supported compiling with it. The attached patch is for review;
> > it may need some updates once the final version is released.
> >
> > Commit d9dd406fe281d22d5238d3c26a7182543c711e74 has reduced the
> > minimum Visual Studio support to 2013 to support C99 standards;
> > for this reason, the current attached patch cannot be backpatched as it is.
> >
> > I can provide a separate back branches patch later once this patch
> > comes to a stage of commit. Currently all the supported branches are
> > possible to compile with VS 2017.
> >
> > comments?
> >
> >
>
>
> I have verified that this works with VS2019.
>
>
> There are a few typos in the comments that need cleaning up.
>

Thanks for the review.

I corrected the typos in the comments; hopefully I covered everything.
Attached is the updated patch. Once the final VS 2019 is released, I will
check whether the patch needs any more updates.

Regards,
Haribabu Kommi
Fujitsu Australia


0001-Support-building-with-visual-studio-2019_v2.patch
Description: Binary data


Re: basebackup checksum verification

2019-03-26 Thread Tomas Vondra

On Tue, Mar 26, 2019 at 08:18:31PM -0400, Stephen Frost wrote:

Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:

On Tue, Mar 26, 2019 at 04:49:21PM -0700, Andres Freund wrote:
>On 2019-03-26 19:22:03 -0400, Stephen Frost wrote:
>>* Andres Freund (and...@anarazel.de) wrote:
>>> As detailed in
>>> https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
>>> the way the backend's basebackup checksum verification works makes its
>>> error detection capabilities very dubious.
>>
>>I disagree that it's 'very dubious', even with your analysis.
>
>I really don't know what to say.  The current algorithm is flat out
>bogus.

Bogus might be a bit too harsh, but yeah - failure to reliably detect obviously
invalid checksums when the LSN just happens to be high due to randomness is not
a good thing. We'll still detect pages corrupted in other places, but this is
rather unfortunate.


I'm all for improving it, as I said originally.


>>I thought Robert's response was generally good, pointing out that
>>we're talking about this being an issue if the corruption happens in a
>>certain set of bytes.  That said, I'm happy to see improvements in
>>this area but I'm flat out upset about the notion that we must be
>>perfect here- our checksums themselves aren't perfect for catching
>>corruption either.
>
>The point is that we're not detecting errors that we can detect when
>read outside of basebackup. I really entirely completely fail to see how
>that can be defended.
>
>I think we're making promises with this basebackup feature that we're not
>even remotely keeping. I don't understand how you can defend that, given
>the current state, you can have a basebackup that you took with
>checksums enabled, and then when you actually use that basebackup you get
>checksum failures.  Like it's one thing not to detect all storage
>issues, but if we do detect them after using the basebackup, that's
>really not ok.

Yeah, if basebackup completes without reporting any invalid checksums, but
running pg_verify_checksums on the same backups detects those, that probably
should raise some eyebrows.


That isn't actually what would happen at this point, just so we're
clear.  What Andres is talking about is a solution which would only
actually work for pg_basebackup, and not for pg_verify_checksums
(without some serious changes which make it connect to the running
server and run various functions to perform the locking that he's
proposing pg_basebackup do...).



I was talking about pg_verify_checksums in offline mode, i.e. when you
take a backup and then run pg_verify_checksums on it. I'm pretty sure
that does not need to talk to the cluster. Sorry if that was not clear.


We already have such a blind spot, but it's expected to be pretty small
(essentially pages modified since start of the backup).


eh..?  This is.. more-or-less entirely what's being discussed here:
exactly how we detect and determine which pages were modified since the
start of the backup, and which might have been partially written out
when we tried to read them and therefore fail a checksum check, but it
doesn't matter because we don't actually end up using those pages.



And? All I'm saying is that we knew there's a gap that we don't check,
but that the understanding was that it's rather small and limited to
recently modified pages. If we can further reduce that window, that's
great and we should do it, of course.


I outlined a couple of other approaches to improving that situation,
which would be able to be used with pg_verify_checksums without having
to connect to the backend, but I'll note that those were completely
ignored, leading me to believe that there's really not much more to
discuss here since other ideas are just not open to being considered.



Uh.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





minimizing pg_stat_statements performance overhead

2019-03-26 Thread Raymond Martin
Hello hackers, 
This email is regarding the Postgres pg_stat_statements extension. 
I noticed that enabling pg_stat_statements can affect performance. I thought 
that changing the pg_stat_statements.track parameter to 'none' could reduce 
this overhead without requiring a restart to remove it from 
shared_preload_libraries. Changing this config did not improve performance as I 
expected. Looking over the code, I noticed that pg_stat_statements is not 
checking if it is enabled before executing the post_parse_analyze_hook 
function. Other hooks that require access to the pg_stat_statements query hash 
table (through the pgss_store function) check for pgss_enabled. 
Would it make sense to check for pgss_enabled in the post_parse_analyze_hook 
function?
 
**Patching**
Making this change drastically improved performance while 
pg_stat_statements.track was set to NONE. This change allows me to more 
effectively enable/disable pg_stat_statements without requiring a restart. 
Example patch:
@@ -783,8 +783,8 @@ pgss_post_parse_analyze(ParseState *pstate, Query *query)
    /* Assert we didn't do this already */
    Assert(query->queryId == 0);
 
-   /* Safety check... */
-   if (!pgss || !pgss_hash)
+   /* Safety check...and ensure that pgss is enabled before we do any work 
*/
+   if (!pgss || !pgss_hash || !pgss_enabled())
    return;
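
For reference, a minimal sketch of how the track setting can be toggled at
runtime without a restart (assuming superuser access and that the extension
is already loaded via shared_preload_libraries):

-- turn collection off cluster-wide without restarting
ALTER SYSTEM SET pg_stat_statements.track = 'none';
SELECT pg_reload_conf();
-- and back on again later
ALTER SYSTEM SET pg_stat_statements.track = 'top';
SELECT pg_reload_conf();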

**Simple Mini Benchmark**
I ran a simple test on my local machine with this spec: 16 core/32 GB 
memory/Windows Server 2016.
The simple query I used was 'select 1'. I called pg_stat_statements_reset() 
before each simple query to clear the pg_stat_statements query hash. The 
majority of the latency happens the first time a query is run. 
Median runtime of 100 simple queries in milliseconds: 
                PGSS loaded (ms)    PGSS loaded + this patch (ms)
track = top     0.53                0.55
track = none    0.41                0.20

PGSS not loaded: 0.18ms

--
Raymond Martin
rama...@microsoft.com
Azure Database for PostgreSQL





Re: basebackup checksum verification

2019-03-26 Thread Andres Freund
Hi,

On 2019-03-26 20:18:31 -0400, Stephen Frost wrote:
> > >>I thought Robert's response was generally good, pointing out that
> > >>we're talking about this being an issue if the corruption happens in a
> > >>certain set of bytes.  That said, I'm happy to see improvements in
> > >>this area but I'm flat out upset about the notion that we must be
> > >>perfect here- our checksums themselves aren't perfect for catching
> > >>corruption either.
> > >
> > >The point is that we're not detecting errors that we can detect when
> > >read outside of basebackup. I really entirely completely fail to see
> > >how that can be defended.
> > >
> > >I think we're making promises with this basebackup feature that we're
> > >not even remotely keeping. I don't understand how you can defend that,
> > >given the current state, you can have a basebackup that you took with
> > >checksums enabled, and then when you actually use that basebackup you get
> > >checksum failures.  Like it's one thing not to detect all storage
> > >issues, but if we do detect them after using the basebackup, that's
> > >really not ok.
> > 
> > Yeah, if basebackup completes without reporting any invalid checksums, but
> > running pg_verify_checksums on the same backups detects those, that probably
> > should raise some eyebrows.
> 
> That isn't actually what would happen at this point, just so we're
> clear.  What Andres is talking about is a solution which would only
> actually work for pg_basebackup, and not for pg_verify_checksums
> (without some serious changes which make it connect to the running
> server and run various functions to perform the locking that he's
> proposing pg_basebackup do...).

Well, I still think it's just plain wrong to do online checksum
verification outside of the server, and we should just reject adding
that as a feature.

Besides the fact that I think we should have error detection capabilities
at least equal to the backend's, I think all the LSN based approaches
also have the issue that they'll prevent us from using them
on non-WAL-logged data. There's ongoing work to move SLRUs into the
backend allowing them to be checksummed (Shawn Debnath is IIRC planning
to propose a patch for v13), and we really should also offer to
checksum unlogged tables (and temp tables?) - just because they'd be
gone after a crash, imo doesn't make it OK to not detect corrupted
on-disk data outside of a crash.  For those things we won't necessarily
have LSNs that we can conveniently associate with those buffers -
making LSN based logic harder.


> I outlined a couple of other approaches to improving that situation,
> which would be able to be used with pg_verify_checksums without having
> to connect to the backend, but I'll note that those were completely
> ignored, leading me to believe that there's really not much more to
> discuss here since other ideas are just not open to being considered.

Well, given that we can do an accurate determination without too much
code in the basebackup case, I don't see what your proposals gain over
that? That's why I didn't comment on them.  I'm focusing on the
basebackup case, over the online checksum case, because it's released
code.

Greetings,

Andres Freund




RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-03-26 Thread Tsunakawa, Takayuki
From: David Rowley [mailto:david.row...@2ndquadrant.com]
> Here a benchmark doing that using pgbench's script weight feature.

Wow, I didn't know that pgbench has evolved to have such a convenient feature.  
Thanks for telling me how to utilize it in testing.  PostgreSQL is cool!


Regards
Takayuki Tsunakawa




Re: basebackup checksum verification

2019-03-26 Thread Peter Geoghegan
On Tue, Mar 26, 2019 at 5:10 PM Tomas Vondra
 wrote:
> Bogus might be a bit too harsh, but yeah - failure to reliably detect
> obviously invalid checksums when the LSN just happens to be high due to
> randomness is not a good thing. We'll still detect pages corrupted in
> other places, but this is rather unfortunate.

I have personally seen real world corruption that involved a page
image consisting of random noise. Several times. Failing to detect
blatant corruption is unacceptable IMV.

Can't we do better here without great difficulty? There are plenty of
generic things that we could do that can verify that almost any
type of initialized page is at least somewhat sane. For example, you
can verify that line pointers indicate that tuples are
non-overlapping.

That said, Andres' approach sounds like the way to go to me.

-- 
Peter Geoghegan




Re: speeding up planning with partitions

2019-03-26 Thread Amit Langote
On 2019/03/27 1:06, Tom Lane wrote:
> Amit Langote  writes:
>> 0002 is a new patch to get rid of the duplicate RTE and PlanRowMark that's
>> created for partitioned parent table, as it's pointless.  I was planning
>> to propose it later, but decided to post it now if we're modifying the
>> nearby code anyway.
> 
> Good idea, but it needs a bit more work on nearby comments, and the
> logic in expand_single_inheritance_child could be simplified, and
> you evidently didn't try check-world because the RTE-removal causes
> cosmetic changes in postgres_fdw.out.  I fixed that up and pushed
> this part.

Thanks.  Sorry that I left a lot to be done.

Regards,
Amit





RE: Speed up transaction completion faster after many relations are accessed in a transaction

2019-03-26 Thread Tsunakawa, Takayuki
From: Amit Langote [mailto:langote_amit...@lab.ntt.co.jp]
> My understanding of what David wrote is that the slowness of a bloated hash
> table is hard to notice, because planning itself is pretty slow.  With the
> "speeding up planning with partitions" patch, planning becomes quite fast,
> so the bloated hash table overhead, and thus your patch's benefit, is easier
> to notice.  This patch is clearly helpful, but it's just hard to notice it
> when the other big bottleneck is standing in the way.

Ah, I see.  I failed to recognize the simple fact that without your patch, 
EXECUTE on a table with many partitions is slow due to the custom planning
time being proportional to the number of partitions.  Thanks for waking up my sleeping 
head!
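
To illustrate the interaction (a sketch with made-up object names): each
EXECUTE of a prepared statement on a partitioned table is custom-planned by
default, so planning cost grows with the number of partitions; once that
bottleneck is removed by the planner patch, effects such as the bloated
local lock hash table addressed here become much easier to measure.

  CREATE TABLE parted_tbl (key int) PARTITION BY RANGE (key);
  -- ... create many partitions ...
  PREPARE q (int) AS SELECT * FROM parted_tbl WHERE key = $1;
  EXECUTE q(1);   -- custom-planned; planning time scales with partition count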


Regards
Takayuki Tsunakawa



Re: Ordered Partitioned Table Scans

2019-03-26 Thread David Rowley
Thanks for having another look.

On Wed, 27 Mar 2019 at 00:22, Julien Rouhaud  wrote:
> A few, mostly nitpicking, comments:
>
> +   if (rel->part_scheme != NULL && IS_SIMPLE_REL(rel) &&
> +   partitions_are_ordered(root, rel))
>
> shouldn't the test be IS_PARTITIONED_REL(rel) instead of testing
> part_scheme?  I'm thinking of 1d33858406 and related discussions.

I don't think it's really needed. There must be > 0 partitions in this
case as if there were either 0 partitions or all partitions had been
pruned then the partitioned table would have been turned into a dummy
rel in set_append_rel_size(), and we'd never try to generate paths for
it. There are also quite a number of other places where we do the same
in add_paths_to_append_rel().

> + * partitions_are_ordered
> + * For the partitioned table given in 'partrel', returns true if the
> + * partitioned table guarantees that tuples which sort earlier according
> + * to the partition bound are stored in an earlier partition.  Returns
> + * false this is not possible, or if we have insufficient means to prove
> + * it.
> [...]
> + * partkey_is_bool_constant_for_query
> + *
> + * If a partition key column is constrained to have a constant value by the
> + * query's WHERE conditions, then we needn't take the key into consideration
> + * when checking if scanning partitions in order can't cause lower-order
> + * values to appear in later partitions.
>
> Maybe it's because I'm not a native English speaker, but I had to read
> those comments multiple times.

I've changed the wording of these a bit. I ended up aligning
partkey_is_bool_constant_for_query() with its cousin
indexcol_is_bool_constant_for_query(). Previously I'd tried to make
the comment contain a bit more detail about what calls it, but I've
now removed that part and replaced it with "then it's irrelevant for
sort-order considerations".

> I'd also add to partitions_are_ordered
> comments a note about default_partition (even though the function is
> super short).

hmm. The comments there do mention default partitions in each place
it's relevant.  It's not relevant to mention anything about default
partitions in the header comment of the function since callers don't
need to know about implementation details. They just need details of
what the function does and what callers need to know.  If we invent
some other naturally ordered partition strategy in the future that
does not allow default partitions then a comment in the function
header about default partitions would be not only irrelevant but also
confusing.

> +   if (boundinfo->ndatums +
> partition_bound_accepts_nulls(boundinfo) != partrel->nparts)
> +   return false;
>
> there are a few over lengthy lines, maybe a pgindent run would be useful.

I've run pgindent. It won't wrap that line, so I wrapped it manually.
I don't think it's become any more pretty for it though.

> + * build_partition_pathkeys
> + *   Build a pathkeys list that describes the ordering induced by the
> + *   partitions of 'partrel'.  (Callers must ensure that this partitioned
> + *   table guarantees that lower order tuples never will be found in a
> + *   later partition.).  Sets *partialkeys to false if pathkeys were only
> + *   built for a prefix of the partition key, otherwise sets it to true.
> + */
> +List *
> +build_partition_pathkeys(PlannerInfo *root, RelOptInfo *partrel,
> +ScanDirection scandir, bool *partialkeys)
>
> Maybe add an assert partitions_are_ordered also?

Added that.

> And finally, should this optimisation be mentioned in ddl.sgml (or
> somewhere else)?

I'm not too sure about this. We don't generally detail out planner
optimisations in the docs.  However, maybe it's worth users knowing
about it as it may control their design choices of partition
hierarchies. I'd just not be sure where exactly something should be
written. I suppose ideally there'd be a section in the docs for
planner optimisations which could contain a section on partitioned
tables which we could reference from the partitioned table docs in
ddl.sgml. That would be asking a bit much for this patch though.
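
As a minimal sketch (with made-up names) of the kind of schema and query
this optimisation targets - an ORDER BY on the partition key over
non-overlapping range partitions, where a plain Append of the partitions in
bound order can replace a MergeAppend:

  CREATE TABLE measurements (ts timestamptz, val numeric)
    PARTITION BY RANGE (ts);
  CREATE TABLE measurements_2019_01 PARTITION OF measurements
    FOR VALUES FROM ('2019-01-01') TO ('2019-02-01');
  CREATE TABLE measurements_2019_02 PARTITION OF measurements
    FOR VALUES FROM ('2019-02-01') TO ('2019-03-01');
  CREATE INDEX ON measurements (ts);

  SELECT * FROM measurements ORDER BY ts LIMIT 10;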

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


mergeappend_to_append_conversion_v15.patch
Description: Binary data


Re: basebackup checksum verification

2019-03-26 Thread Stephen Frost
Greetings,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
> On Tue, Mar 26, 2019 at 04:49:21PM -0700, Andres Freund wrote:
> >On 2019-03-26 19:22:03 -0400, Stephen Frost wrote:
> >>* Andres Freund (and...@anarazel.de) wrote:
> >>> As detailed in
> >>> https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
> >>> the way the backend's basebackup checksum verification works makes its
> >>> error detection capabilities very dubious.
> >>
> >>I disagree that it's 'very dubious', even with your analysis.
> >
> >I really don't know what to say.  The current algorithm is flat out
> >bogus.
> 
> Bogus might be a bit too harsh, but yeah - failure to reliably detect
> obviously invalid checksums when the LSN just happens to be high due to
> randomness is not a good thing. We'll still detect pages corrupted in
> other places, but this is rather unfortunate.

I'm all for improving it, as I said originally.

> >>I thought Robert's response was generally good, pointing out that
> >>we're talking about this being an issue if the corruption happens in a
> >>certain set of bytes.  That said, I'm happy to see improvements in
> >>this area but I'm flat out upset about the notion that we must be
> >>perfect here- our checksums themselves aren't perfect for catching
> >>corruption either.
> >
> >The point is that we're not detecting errors that we can detect when
> >read outside of basebackup. I really entirely completely fail to see how
> >that can be defended.
> >
> >I think we're making promises with this basebackup feature that we're not
> >even remotely keeping. I don't understand how you can defend that, given
> >the current state, you can have a basebackup that you took with
> >checksums enabled, and then when you actually use that basebackup you get
> >checksum failures.  Like it's one thing not to detect all storage
> >issues, but if we do detect them after using the basebackup, that's
> >really not ok.
> 
> Yeah, if basebackup completes without reporting any invalid checksums, but
> running pg_verify_checksums on the same backups detects those, that probably
> should raise some eyebrows.

That isn't actually what would happen at this point, just so we're
clear.  What Andres is talking about is a solution which would only
actually work for pg_basebackup, and not for pg_verify_checksums
(without some serious changes which make it connect to the running
server and run various functions to perform the locking that he's
proposing pg_basebackup do...).

> We already have such a blind spot, but it's expected to be pretty small
> (essentially pages modified since start of the backup).

eh..?  This is.. more-or-less entirely what's being discussed here:
exactly how we detect and determine which pages were modified since the
start of the backup, and which might have been partially written out
when we tried to read them and therefore fail a checksum check, but it
doesn't matter because we don't actually end up using those pages.

I outlined a couple of other approaches to improving that situation,
which would be able to be used with pg_verify_checksums without having
to connect to the backend, but I'll note that those were completely
ignored, leading me to believe that there's really not much more to
discuss here since other ideas are just not open to being considered.

Thanks,

Stephen




Re: Pluggable Storage - Andres's take

2019-03-26 Thread Andres Freund
Hi,

On 2019-02-22 14:52:08 -0500, Robert Haas wrote:
> On Fri, Feb 22, 2019 at 11:19 AM Amit Khandekar  
> wrote:
> > Thanks for the review. Attached v2.
> 
> Thanks.  I took this, combined it with Andres's
> v12-0040-WIP-Move-xid-horizon-computation-for-page-level-.patch, did
> some polishing of the code and comments, and pgindented.  Here's what
> I ended up with; see what you think.

I pushed this after some fairly minor changes, directly including the
patch to route the horizon computation through tableam. The only real
change is that I removed the table relfilenode from the nbtree/hash
deletion WAL record - it was only required to access the heap without
accessing the catalog and was unused now.  Also added a WAL version
bump.

It seems possible that some other AM might want to generalize the
prefetch logic from heapam.c, but I think it's fair to defer that until
such an AM wants to do so.

Greetings,

Andres Freund




Re: basebackup checksum verification

2019-03-26 Thread Tomas Vondra

On Tue, Mar 26, 2019 at 04:49:21PM -0700, Andres Freund wrote:

Hi,

On 2019-03-26 19:22:03 -0400, Stephen Frost wrote:

Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> As detailed in
> https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
> the way the backend's basebackup checksum verification works makes its
> error detection capabilities very dubious.

I disagree that it's 'very dubious', even with your analysis.


I really don't know what to say.  The current algorithm is flat out
bogus.



Bogus might be a bit too harsh, but yeah - failure to reliably detect obviously
invalid checksums when the LSN just happens to be high due to randomness is not
a good thing. We'll still detect pages corrupted in other places, but this is
rather unfortunate.




I thought Robert's response was generally good, pointing out that
we're talking about this being an issue if the corruption happens in a
certain set of bytes.  That said, I'm happy to see improvements in
this area but I'm flat out upset about the notion that we must be
perfect here- our checksums themselves aren't perfect for catching
corruption either.


The point is that we're not detecting errors that we can detect when
read outside of basebackup. I really entirely completely fail to see how
that can be defended.

I think we're making promises with this basebackup feature that we're not
even remotely keeping. I don't understand how you can defend that, given
the current state, you can have a basebackup that you took with
checksums enabled, and then when you actually use that basebackup you get
checksum failures.  Like it's one thing not to detect all storage
issues, but if we do detect them after using the basebackup, that's
really not ok.



Yeah, if basebackup completes without reporting any invalid checksums, but
running pg_verify_checksums on the same backups detects those, that probably
should raise some eyebrows.

We already have such a blind spot, but it's expected to be pretty small
(essentially pages modified since start of the backup).




> 2) If 1) errored out, ensure that that's because the backend is
>currently writing out the page. That basically requires doing what
>BufferAlloc() does. So I think we'd need to end up with a function
>like:
>
> LockoutBackendWrites():
> buf = ReadBufferWithoutRelcache(relfilenode);

This is going to cause it to be pulled into shared buffers, if it isn't
already there, isn't it?


I can't see that being a problem. We're only going to enter this path if
we encountered a buffer where the checksum was wrong. And either that's
a data corruption event, in which case we don't care about a small
performance penalty, or it's a race around writing out the page because
basebackup read it while half written - in which case it's pretty
likely that the page is still in shared buffers.



Yep, I think this is fine. Although, in the other thread where this
failure mode was discussed, I think we only discussed getting the I/O lock
on the buffer, no? But as you say, this should be a rare code path.




That seems less than ideal and isn't it going
to end up just doing exactly the same PageIsVerified() call, and what
happens when that fails?


That seems quite easily handled with a different RBM_ mode.




> LWLockAcquire(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
> /*
>  * Reread page from OS, and recheck. This needs to happen while
>  * the IO lock prevents rereading from the OS. Note that we do
>  * not want to rely on the buffer contents here, as that could
>  * be very old cache contents.
>  */
> perform_checksum_check(relfilenode, ERROR);
>
> LWLockRelease(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
> ReleaseBuffer(buf);


This should be the IO lock, not content lock, sorry. Copy & pasto.



I don't particularly like having to lock pages in this way while
performing this check, especially with having to read the page into
shared buffers potentially.


Given it's only the IO lock (see above correction), and only if we can't
verify the checksum during the race, I fail to see how that can be a
problem?



IMHO not a problem.


cheers

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 





Re: basebackup checksum verification

2019-03-26 Thread Andres Freund
Hi,

On 2019-03-26 19:22:03 -0400, Stephen Frost wrote:
> Greetings,
> 
> * Andres Freund (and...@anarazel.de) wrote:
> > As detailed in
> > https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
> > the way the backend's basebackup checksum verification works makes its
> > error detection capabilities very dubious.
> 
> I disagree that it's 'very dubious', even with your analysis.

I really don't know what to say.  The current algorithm is flat out
bogus.


> I thought Robert's response was generally good, pointing out that
> we're talking about this being an issue if the corruption happens in a
> certain set of bytes.  That said, I'm happy to see improvements in
> this area but I'm flat out upset about the notion that we must be
> perfect here- our checksums themselves aren't perfect for catching
> corruption either.

The point is that we're not detecting errors that we can detect when
read outside of basebackup. I really entirely completely fail to see how
that can be defended.

I think we're making promises with this basebackup feature that we're not
even remotely keeping. I don't understand how you can defend that, given
the current state, you can have a basebackup that you took with
checksums enabled, and then when you actually use that basebackup you get
checksum failures.  Like it's one thing not to detect all storage
issues, but if we do detect them after using the basebackup, that's
really not ok.


> > 2) If 1) errored out, ensure that that's because the backend is
> >currently writing out the page. That basically requires doing what
> >BufferAlloc() does. So I think we'd need to end up with a function
> >like:
> > 
> > LockoutBackendWrites():
> > buf = ReadBufferWithoutRelcache(relfilenode);
> 
> This is going to cause it to be pulled into shared buffers, if it isn't
> already there, isn't it?

I can't see that being a problem. We're only going to enter this path if
we encountered a buffer where the checksum was wrong. And either that's
a data corruption event, in which case we don't care about a small
performance penalty, or it's a race around writing out the page because
basebackup read it while half written - in which case it's pretty
likely that the page is still in shared buffers.


> That seems less than ideal and isn't it going
> to end up just doing exactly the same PageIsVerified() call, and what
> happens when that fails?

That seems quite easily handled with a different RBM_ mode.



> > LWLockAcquire(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
> > /*
> >  * Reread page from OS, and recheck. This needs to happen while
> >  * the IO lock prevents rereading from the OS. Note that we do
> >  * not want to rely on the buffer contents here, as that could
> >  * be very old cache contents.
> >  */
> > perform_checksum_check(relfilenode, ERROR);
> > 
> > LWLockRelease(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
> > ReleaseBuffer(buf);

This should be the IO lock, not content lock, sorry. Copy & pasto.


> I don't particularly like having to lock pages in this way while
> performing this check, especially with having to read the page into
> shared buffers potentially.

Given it's only the IO lock (see above correction), and only if we can't
verify the checksum during the race, I fail to see how that can be a
problem?

Greetings,

Andres Freund




Re: partitioned tables referenced by FKs

2019-03-26 Thread Alvaro Herrera
On 2019-Mar-26, Alvaro Herrera wrote:

> Thanks for the thorough testing and bug analysis!  It was spot-on.  I've
> applied your two proposed fixes, as well as added a new test setup that
> covers both these bugs.  The attached set is rebased on 7c366ac969ce.

Attached is rebased on 126d63122232.  No further changes.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From fbf2d42a3caac983395579d89a07d5ccdfab5973 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 28 Nov 2018 11:52:00 -0300
Subject: [PATCH v9 1/2] Rework deleteObjectsInList to allow objtype-specific
 checks

This doesn't change any functionality yet.
---
 src/backend/catalog/dependency.c | 41 +++-
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index f7acb4103eb..09876705b08 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -230,29 +230,38 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
 	int			i;
 
 	/*
-	 * Keep track of objects for event triggers, if necessary.
+	 * Invoke pre-deletion callbacks and keep track of objects for event
+	 * triggers, if necessary.
 	 */
-	if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
+	for (i = 0; i < targetObjects->numrefs; i++)
 	{
-		for (i = 0; i < targetObjects->numrefs; i++)
+		const ObjectAddress *thisobj = &targetObjects->refs[i];
+		ObjectClass		objectClass = getObjectClass(thisobj);
+
+		if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
 		{
-			const ObjectAddress *thisobj = &targetObjects->refs[i];
-			const ObjectAddressExtra *extra = &targetObjects->extras[i];
-			bool		original = false;
-			bool		normal = false;
-
-			if (extra->flags & DEPFLAG_ORIGINAL)
-original = true;
-			if (extra->flags & DEPFLAG_NORMAL)
-normal = true;
-			if (extra->flags & DEPFLAG_REVERSE)
-normal = true;
-
-			if (EventTriggerSupportsObjectClass(getObjectClass(thisobj)))
+			if (EventTriggerSupportsObjectClass(objectClass))
 			{
+bool		original = false;
+bool		normal = false;
+				const ObjectAddressExtra *extra = &targetObjects->extras[i];
+
+if (extra->flags & DEPFLAG_ORIGINAL)
+	original = true;
+if (extra->flags & DEPFLAG_NORMAL ||
+	extra->flags & DEPFLAG_REVERSE)
+	normal = true;
+
 EventTriggerSQLDropAddObject(thisobj, original, normal);
 			}
 		}
+
+		/* Invoke class-specific pre-deletion checks */
+		switch (objectClass)
+		{
+			default:
+break;
+		}
 	}
 
 	/*
-- 
2.17.1

>From 1addb0b6dfe973b4319c01cdf2537b2aff805ae6 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 20 Feb 2019 15:08:20 -0300
Subject: [PATCH v9 2/2] support FKs referencing partitioned tables

---
 doc/src/sgml/ref/create_table.sgml|7 +-
 src/backend/catalog/dependency.c  |3 +
 src/backend/catalog/heap.c|   24 +
 src/backend/commands/tablecmds.c  | 1367 +++--
 src/backend/utils/adt/ri_triggers.c   |  244 +++-
 src/backend/utils/adt/ruleutils.c |   18 +
 src/include/catalog/heap.h|2 +
 src/include/commands/tablecmds.h  |8 +-
 src/include/commands/trigger.h|1 +
 src/include/utils/ruleutils.h |1 +
 src/test/regress/expected/foreign_key.out |  211 +++-
 src/test/regress/sql/foreign_key.sql  |  144 ++-
 12 files changed, 1595 insertions(+), 435 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index e94fe2c3b67..37ae0f00fda 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -378,9 +378,6 @@ WITH ( MODULUS numeric_literal, REM
  
   Partitioned tables do not support EXCLUDE constraints;
   however, you can define these constraints on individual partitions.
-  Also, while it's possible to define PRIMARY KEY
-  constraints on partitioned tables, creating foreign keys that
-  reference a partitioned table is not yet supported.
  
 
  
@@ -995,9 +992,7 @@ WITH ( MODULUS numeric_literal, REM
   addition of a foreign key constraint requires a
   SHARE ROW EXCLUSIVE lock on the referenced table.
   Note that foreign key constraints cannot be defined between temporary
-  tables and permanent tables.  Also note that while it is possible to
-  define a foreign key on a partitioned table, it is not possible to
-  declare a foreign key that references a partitioned table.
+  tables and permanent tables.
  
 
  
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index 09876705b08..83965bb048c 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -259,6 +259,9 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
 		/* Invoke class-specific pre-deletion checks */
 		switch 

RE: Planning counters in pg_stat_statements (using pgss_store)

2019-03-26 Thread legrand legrand
here is a new version:

- "track_planning" GUC added
 to allow keeping the previous behavior unchanged
- columns names have been changed / added:
 total_plan_time, total_exec_time, total_time
- trailing whitespaces and comments wider than 80 characters
 not fixed
- "if (jstate.clocations_count > 0) pgss_store(pstate->p_sourcetext,..."
 has been reverted
- expose query text to the planner
 won't fix (beyond my knowledge)
- "Assert(planning_time > 0 && total_time > 0);"
 moved at the beginning of pgss_store
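
As a usage sketch, assuming the column names listed above (query and calls
are the pre-existing pg_stat_statements columns), the new counters could be
queried like this to find planning-heavy statements:

  SELECT query, calls, total_plan_time, total_exec_time, total_time
    FROM pg_stat_statements
   ORDER BY total_plan_time DESC
   LIMIT 5;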

Regards
PAscal


pgss_add_planning_counters_v3
Description: pgss_add_planning_counters_v3


Re: basebackup checksum verification

2019-03-26 Thread Stephen Frost
Greetings,

* Andres Freund (and...@anarazel.de) wrote:
> As detailed in
> https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
> the way the backend's basebackup checksum verification works makes its
> error detection capabilities very dubious.

I disagree that it's 'very dubious', even with your analysis.  I thought
Robert's response was generally good, pointing out that we're talking
about this being an issue if the corruption happens in a certain set of
bytes.  That said, I'm happy to see improvements in this area but I'm
flat out upset about the notion that we must be perfect here- our
checksums themselves aren't perfect for catching corruption either.

> I think we need to fix this before the next set of backbranch releases,
> or at the very least add a big fat warning that the feature isn't doing
> much.

I disagree about this level of urgency, but if you have a decent idea
about how to improve the situation, I'm fully in support of it.

> The more I think about it, the less convinced I am of the method to
> avoid the torn page problem using LSNs. To make e.g. the PageIsNew()
> check correct, we need to verify that the whole page is zeroes - but we
> can't use the LSN for that, as it's not on the page. But there very well
> could be a torn page issue with only the second half of the page being
> written back (with the default 8kb pages that can trivially happen as
> the kernel's pagecache commonly uses 4kb granularity).
> 
> I basically think it needs to work like this:
> 
> 1) Perform the entire set of PageIsVerified() checks *without*
>previously checking the page's LSN, but don't error out.

Performing the PageIsVerified() checks seems reasonable, I don't see any
downside to doing that, so if you'd like to add that, sure, go for it.

> 2) If 1) errored out, ensure that that's because the backend is
>currently writing out the page. That basically requires doing what
>BufferAlloc() does. So I think we'd need to end up with a function
>like:
> 
> LockoutBackendWrites():
> buf = ReadBufferWithoutRelcache(relfilenode);

This is going to cause it to be pulled into shared buffers, if it isn't
already there, isn't it?  That seems less than ideal and isn't it going
to end up just doing exactly the same PageIsVerified() call, and what
happens when that fails?  You're going to end up getting an
ereport(ERROR) and long-jump out of here...  Depending on the other
code, maybe that's something you can manage, but it seems rather tricky
to me.  I do think, as was discussed extensively previously, that the
backup *should* continue even in the face of corruption, but there
should be warnings issued to notify the user of the issues.

> LWLockAcquire(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
> /*
>  * Reread page from OS, and recheck. This needs to happen while
>  * the IO lock prevents rereading from the OS. Note that we do
>  * not want to rely on the buffer contents here, as that could
>  * be very old cache contents.
>  */
> perform_checksum_check(relfilenode, ERROR);
> 
> LWLockRelease(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
> ReleaseBuffer(buf);

I don't particularly like having to lock pages in this way while
performing this check, especially with having to read the page into
shared buffers potentially.

This also isn't the only approach to dealing with the issue that the
LSN might be corrupted.  There are at least two other ways we can improve
the situation here- we can keep track of the highest LSN seen, perhaps
on a per-file basis, and then compare those to the end-of-backup LSN,
and issue a warning or perform a re-check or do something else if we
discover that the LSN found was later than the end-of-backup LSN.
That's not perfect, but it's certainly a good improvement over what we
have today.  The other approach would be to track all of the pages
which were skipped and then compare them to the pages in the WAL which
were archived during the backup, making sure that all pages which failed
checksum exist in the WAL.  That should allow us to confirm that the
page was actually being modified and won't ever be used in the state
that we saw it in, since it'll be replayed over by WAL, and therefore we
don't have to worry about the LSN or the page itself being corrupt.  Of
course, that requires tracking all the pages which are modified by the
WAL for the duration of the backup, and tracking all the pages which
failed checksum and/or other validation, and then performing the
cross-check.  That seems like a fair bit of work for this, but I'm not
sure that it's avoidable, ultimately.

I'm happy with incremental improvements in this area though, and just
checking that the LSN of pages skipped isn't completely insane would
definitely be a good improvement to begin with and might be simple
enough to back-patch.  I don't think back-patching changes like those
proposed here 

Fwd: Gsoc proposal perffarn

2019-03-26 Thread Victor Kukshiev
Hello, my name is Victor Kuvshiev.

Currently I'm third-year student of Petrozavodsk State University, studying
information systems and technologies.

I have relatively good knowledge of HTML, CSS and Python, and also have some
skills in the JavaScript language.

An example of my work: ruletka, a console game in Lua that stores data in
PostgreSQL: https://github.com/kloun/ruletka

I can spend 5-6 hours a day on the project. Currently I don't have any
other work planned for the summer.

The project I am interested in:
https://wiki.postgresql.org/wiki/GSoC_2019#Develop_Performance_Farm_Database_and_Website_.282019.29


Re: Fix XML handling with DOCTYPE

2019-03-26 Thread Tom Lane
Ryan Lambert  writes:
> Is the xml-functions-type-docfix-4.patch [1] the one needing review?  I'll
> test applying it and review the changes in better detail.  Is there a
> section in the docs that shows how to verify if the updated pages render
> properly?  I would assume the pages are built when installing from source.

Plain old "make all" doesn't build the docs.  See
https://www.postgresql.org/docs/devel/docguide.html
for tooling prerequisites and build instructions.

(Usually people just build the HTML docs and look at them
with a browser; the other doc formats are less interesting.)

regards, tom lane




Re: speeding up planning with partitions

2019-03-26 Thread Tom Lane
Amit Langote  writes:
> On 2019/03/23 6:07, Tom Lane wrote:
>> I also feel like you used a dartboard while deciding where to insert the
>> call in query_planner(); dropping it into the middle of a sequence of
>> equivalence-class-related operations seems quite random.  Maybe we could
>> delay that step all the way to just before make_one_rel, since the other
>> stuff in between seems to only care about baserels?  For example,
>> it'd certainly be better if we could run remove_useless_joins before
>> otherrel expansion, so that we needn't do otherrel expansion at all
>> on a table that gets thrown away as being a useless join.

> create_lateral_join_info() expects otherrels to be present to propagate
> the information it creates.

Sure, but the actually-useful work in that function is just concerned
with baserels.  The only thing it does with otherrels is to propagate
parents' lateral-ref info to children, which is a lot easier, cheaper,
and more reliable if we let build_simple_rel do it.  See the version
of 0001 I just pushed.

I'll look at 0003 and up tomorrow.

regards, tom lane




Re: Fix XML handling with DOCTYPE

2019-03-26 Thread Ryan Lambert
Ok, I'll give it a go.


> If you happened to feel moved to look over a documentation patch, that
> would be what this CF entry most needs in the waning days of the
> commitfest.


Is the xml-functions-type-docfix-4.patch [1] the one needing review?  I'll
test applying it and review the changes in better detail.  Is there a
section in the docs that shows how to verify if the updated pages render
properly?  I would assume the pages are built when installing from source.

Ryan

[1]
https://www.postgresql.org/message-id/attachment/100016/xml-functions-type-docfix-4.patch

On Mon, Mar 25, 2019 at 4:52 PM Chapman Flack  wrote:

> On 03/25/19 18:03, Ryan Lambert wrote:
> > The following review has been posted through the commitfest application:
> > make installcheck-world:  tested, passed
> > Implements feature:   tested, passed
> > Spec compliant:   not tested
> > Documentation:not tested
>
> Hi,
>
> Thanks for the review! Yes, that part of this commitfest entry has been
> committed already and will appear in the next minor releases of those
> branches.
>
> That leaves only one patch in this commitfest entry that is still in
> need of review, namely the update to the documentation.
>
> If you happened to feel moved to look over a documentation patch, that
> would be what this CF entry most needs in the waning days of the
> commitfest.
>
> There seem to be community members reluctant to review it because of not
> feeling sufficiently expert in XML to scrutinize every technical detail,
> but there are other valuable angles for documentation review. (And the
> reason there *is* a documentation patch is the plentiful room for
> improvement in the documentation that's already there, so as far as
> reviewing goes, the old yarn about the two guys, the running shoes, and
> the bear comes to mind.)
>
> I can supply pointers to specs, etc., for anyone who does see some
> technical
> details in the patch and has questions about them.
>
> Regards,
> -Chap
>


Re: partitioned tables referenced by FKs

2019-03-26 Thread Alvaro Herrera
Hello Amit

On 2019-Mar-26, Amit Langote wrote:

> + Oid objectClass = getObjectClass(thisobj);
> 
> I guess you meant to use ObjectClass, not Oid here.

Absolutely.

> Tested 0002 a bit more and found some problems.

Thanks for the thorough testing and bug analysis!  It was spot-on.  I've
applied your two proposed fixes, as well as added a new test setup that
covers both these bugs.  The attached set is rebased on 7c366ac969ce.

In the course of testing those fixes, I noticed a stupid bug in
partitioned PKs -- one of the CompareIndexInfo calls was passing the wrong
attmap length. 0001 here fixes that, but it's really a backpatchable
bugfix so I'll put it on pg11 and master ahead of time.

> > As I said before, I'm thinking of getting rid of the whole business of
> > checking partitions on the referenced side of an FK at DROP time, and
> > instead just forbid the DROP completely if any FKs reference an ancestor
> > of that partition.
> 
> Will that allow `DROP TABLE parted_pk CASCADE` to succeed even if
> partitions still contain referenced data?  I suppose that's the example
> you cited upthread as a bug that remains to be solved.

That's the idea, yes, it should do that: only allow a DROP of a
partition referenced by an FK if the topmost constraint is also being
dropped.  Maybe this means I need to get rid of 0002 completely.  But I
haven't got to doing that yet.
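
For context, a minimal sketch of the schema shape this patch set aims to
allow (names are made up, though parted_pk echoes the example above);
without the patch, the REFERENCES clause below is rejected because foreign
keys may not reference a partitioned table:

  CREATE TABLE parted_pk (a int PRIMARY KEY) PARTITION BY RANGE (a);
  CREATE TABLE parted_pk_1 PARTITION OF parted_pk
    FOR VALUES FROM (0) TO (1000);
  CREATE TABLE referencing_tbl (a int REFERENCES parted_pk);
  INSERT INTO parted_pk VALUES (1);
  INSERT INTO referencing_tbl VALUES (1);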

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 2da528468d7ce31248d8681434ca6496808d07d5 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 26 Mar 2019 17:05:07 -0300
Subject: [PATCH v8 1/3] fix CompareIndexInfo-related bugs

---
 src/backend/catalog/index.c  | 3 +--
 src/backend/commands/indexcmds.c | 1 -
 src/backend/commands/tablecmds.c | 2 +-
 3 files changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index d2e284f6de6..c0417d4c5a2 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -1850,7 +1850,6 @@ CompareIndexInfo(IndexInfo *info1, IndexInfo *info2,
 	if (info1->ii_NumIndexKeyAttrs != info2->ii_NumIndexKeyAttrs)
 		return false;
 
-
 	/*
 	 * and columns match through the attribute map (actual attribute numbers
 	 * might differ!)  Note that this implies that index columns that are
@@ -1859,7 +1858,7 @@ CompareIndexInfo(IndexInfo *info1, IndexInfo *info2,
 	 */
 	for (i = 0; i < info1->ii_NumIndexAttrs; i++)
 	{
-		if (maplen < info2->ii_IndexAttrNumbers[i])
+		if (maplen < info2->ii_IndexAttrNumbers[i] - 1)
 			elog(ERROR, "incorrect attribute map");
 
 		/* ignore expressions at this stage */
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index c3a53d81aab..d6eb48cb4e6 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -925,7 +925,6 @@ DefineIndex(Oid relationId,
 			   gettext_noop("could not convert row type"));
 maplen = parentDesc->natts;
 
-
 foreach(cell, childidxs)
 {
 	Oid			cldidxid = lfirst_oid(cell);
diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 3183b2aaa12..048c1196685 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15591,7 +15591,7 @@ ATExecAttachPartitionIdx(List **wqueue, Relation parentIdx, RangeVar *name)
 			  partIdx->rd_opfamily,
 			  parentIdx->rd_opfamily,
 			  attmap,
-			  RelationGetDescr(partTbl)->natts))
+			  RelationGetDescr(parentTbl)->natts))
 			ereport(ERROR,
 	(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 	 errmsg("cannot attach index \"%s\" as a partition of index \"%s\"",
-- 
2.17.1

>From 9515207bb97c46e8766008bcd2761afd1b83ac7c Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Wed, 28 Nov 2018 11:52:00 -0300
Subject: [PATCH v8 2/3] Rework deleteObjectsInList to allow objtype-specific
 checks

This doesn't change any functionality yet.
---
 src/backend/catalog/dependency.c | 41 +++-
 1 file changed, 25 insertions(+), 16 deletions(-)

diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index f7acb4103eb..09876705b08 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -230,29 +230,38 @@ deleteObjectsInList(ObjectAddresses *targetObjects, Relation *depRel,
 	int			i;
 
 	/*
-	 * Keep track of objects for event triggers, if necessary.
+	 * Invoke pre-deletion callbacks and keep track of objects for event
+	 * triggers, if necessary.
 	 */
-	if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
+	for (i = 0; i < targetObjects->numrefs; i++)
 	{
-		for (i = 0; i < targetObjects->numrefs; i++)
+		const ObjectAddress *thisobj = &targetObjects->refs[i];
+		ObjectClass		objectClass = getObjectClass(thisobj);
+
+		if (trackDroppedObjectsNeeded() && !(flags & PERFORM_DELETION_INTERNAL))
 		{
-			

Re: Usage of epoch in txid_current

2019-03-26 Thread Thomas Munro
On Tue, Mar 26, 2019 at 12:58 PM Thomas Munro  wrote:
> ...  I think you could probably reclaim that space by
> using a more compact representation of vacuumFlags, overflowed,
> delayChkpt, nxids (it's funny, the comment says "as tightly as
> possible", which clearly isn't the case).

Woops, I take that back.  I was thinking of sizeof(bool) == 4, but
it's usually 1, so you could probably only squeeze a couple of bytes
out by moving those flags.  allPgXact elements are currently 12 bytes
apart on this system.  It's possible that expanding it to 16 bytes
would be OK, not sure, and I see there was a whole thread
investigating that a couple of years back:

https://www.postgresql.org/message-id/flat/CAPpHfdtJY4zOEDsjad6J5AyZMqZcv6gSY9AkKpA7qN3jyQ2%2B1Q%40mail.gmail.com

-- 
Thomas Munro
https://enterprisedb.com




Re: explain plans with information about (modified) gucs

2019-03-26 Thread Tomas Vondra

On Mon, Mar 18, 2019 at 11:31:48AM +0100, Rafia Sabih wrote:

On Sun, 24 Feb 2019 at 00:06, Tomas Vondra  wrote:


Hi,

attached is an updated patch, fixing and slightly tweaking the docs.


Barring objections, I'll get this committed later next week.


I was having a look at this patch, and this kept me wondering,

+static void
+ExplainShowSettings(ExplainState *es)
+{
Is there some reason for not providing any explanation above this
function just like the rest of the functions in this file?

Similarly, for

struct config_generic **
get_explain_guc_options(int *num)
{

/* also bail out of there are no options */
+ if (!num)
+ return;
I think you meant 'if' instead of 'of' here.


Thanks for the review - attached is a patch adding the missing comments,
and doing two additional minor improvements:

1) Renaming ExplainShowSettings to ExplainPrintSettings, to make it more
consistent with naming of the other functions in explain.c.

2) Actually counting GUC_EXPLAIN options, and only allocating space for
those in get_explain_guc_options, instead of using num_guc_variables. The
difference is quite significant (~50 vs. ~300), and considering each entry
is 8B it makes a difference because such large chunks tend to have higher
palloc overhead (due to ALLOCSET_SEPARATE_THRESHOLD).
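
As a usage sketch (assuming the information is requested through an EXPLAIN
option named SETTINGS and that only GUCs marked GUC_EXPLAIN which differ
from their built-in defaults are printed - the exact interface may differ
in this patch version; some_table is a placeholder):

  SET enable_seqscan = off;
  SET work_mem = '1GB';
  EXPLAIN (SETTINGS) SELECT * FROM some_table WHERE id = 1;
  -- ... plan ...
  -- Settings: enable_seqscan = 'off', work_mem = '1GB'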


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 8e48b94d4a..2a289b8b94 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -599,8 +599,12 @@ ExplainOnePlan(PlannedStmt *plannedstmt, IntoClause *into, 
ExplainState *es,
ExplainCloseGroup("Query", NULL, true, es);
 }
 
+/*
+ * ExplainPrintSettings -
+ *Print summary of modified settings affecting query planning.
+ */
 static void
-ExplainShowSettings(ExplainState *es)
+ExplainPrintSettings(ExplainState *es)
 {
int num;
struct config_generic **gucs;
@@ -700,10 +704,10 @@ ExplainPrintPlan(ExplainState *es, QueryDesc *queryDesc)
ExplainNode(ps, NIL, NULL, NULL, es);
 
/*
-* If requested, include information about GUC parameters that don't
-* match the built-in defaults.
+* If requested, include information about GUC parameters with values
+* that don't match the built-in defaults.
 */
-   ExplainShowSettings(es);
+   ExplainPrintSettings(es);
 }
 
 /*
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 7306d7c232..39f844ebc5 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -4532,6 +4532,7 @@ static struct config_generic **guc_variables;
 
 /* Current number of variables contained in the vector */
 static int num_guc_variables;
+static int num_guc_explain_variables;
 
 /* Vector capacity */
 static int size_guc_variables;
@@ -4796,6 +4797,7 @@ build_guc_variables(void)
 {
int size_vars;
int num_vars = 0;
+   int num_explain_vars = 0;
struct config_generic **guc_vars;
int i;
 
@@ -4806,6 +4808,9 @@ build_guc_variables(void)
/* Rather than requiring vartype to be filled in by hand, do 
this: */
conf->gen.vartype = PGC_BOOL;
num_vars++;
+
+   if (conf->gen.flags & GUC_EXPLAIN)
+   num_explain_vars++;
}
 
for (i = 0; ConfigureNamesInt[i].gen.name; i++)
@@ -4814,6 +4819,9 @@ build_guc_variables(void)
 
conf->gen.vartype = PGC_INT;
num_vars++;
+
+   if (conf->gen.flags & GUC_EXPLAIN)
+   num_explain_vars++;
}
 
for (i = 0; ConfigureNamesReal[i].gen.name; i++)
@@ -4822,6 +4830,9 @@ build_guc_variables(void)
 
conf->gen.vartype = PGC_REAL;
num_vars++;
+
+   if (conf->gen.flags & GUC_EXPLAIN)
+   num_explain_vars++;
}
 
for (i = 0; ConfigureNamesString[i].gen.name; i++)
@@ -4830,6 +4841,9 @@ build_guc_variables(void)
 
conf->gen.vartype = PGC_STRING;
num_vars++;
+
+   if (conf->gen.flags & GUC_EXPLAIN)
+   num_explain_vars++;
}
 
for (i = 0; ConfigureNamesEnum[i].gen.name; i++)
@@ -4838,6 +4852,9 @@ build_guc_variables(void)
 
conf->gen.vartype = PGC_ENUM;
num_vars++;
+
+   if (conf->gen.flags & GUC_EXPLAIN)
+   num_explain_vars++;
}
 
/*
@@ -4869,6 +4886,7 @@ build_guc_variables(void)
free(guc_variables);
guc_variables = guc_vars;
num_guc_variables = num_vars;
+   num_guc_explain_variables = num_explain_vars;
size_guc_variables = size_vars;
qsort((void *) 

Re: [HACKERS] generated columns

2019-03-26 Thread Pavel Stehule
Hi

út 26. 3. 2019 v 14:33 odesílatel Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> napsal:

> On 2019-03-20 03:51, Michael Paquier wrote:
> > On Mon, Mar 18, 2019 at 03:14:09PM +0100, Pavel Stehule wrote:
> >> postgres=# update foo set name = 'bbbxx' where id = 1; -- error
> >> ERROR:  no generation expression found for column number 3 of table
> >> "foo"
> >
> > Yes I can see the problem after adding a generated column and dropping
> > it on an INSERT query.
>
> fixed
>
> > +   if (relid && attnum && get_attgenerated(relid, attnum))
> > Better to use OidIsValid here?
>
> fixed
>
> > +   (walrcv_server_version(wrconn) >= 12 ? "AND a.attgenerated =
> ''" : ""),
> > I think that it is better to always have version-related references
> > stored as defines.
>
> A valid idea, but I don't see it widely done (see psql, pg_dump).
>
> > +CREATE TABLE gtest22a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED UNIQUE);
> > +CREATE TABLE gtest22b (a int, b int GENERATED ALWAYS AS (a * 2) STORED,
> PRIMARY KEY (a, b));
> > Some tests for unique constraints with a generated column should be in
> > place?
>
> done
>
> > It would be nice to have extra tests for forbidden expression types
> > on generated columns especially SRF, subquery and aggregates/window
> > functions.
>
> done
>

I checked this functionality and it looks very good.

1. there are no warnings or compilation issues
2. all tests pass, check-world passes
3. documentation is checked
4. source code is readable, commented, and well formatted
5. regression tests are sufficient
6. I checked the functionality and compared it with a database that already
implements this feature, and there are no differences
7. I tested performance and got significantly better times than a
trigger-based solution - up to 2x (a sketch of the comparison is below)
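
A minimal way to reproduce that comparison (the table, function and trigger
names below are made up for illustration) looks roughly like this:

  CREATE TABLE gen_t  (a int, b int GENERATED ALWAYS AS (a * 2) STORED);

  CREATE TABLE trig_t (a int, b int);
  CREATE FUNCTION trig_t_fill() RETURNS trigger LANGUAGE plpgsql AS $$
  BEGIN
    NEW.b := NEW.a * 2;
    RETURN NEW;
  END;
  $$;
  CREATE TRIGGER trig_t_fill BEFORE INSERT OR UPDATE ON trig_t
    FOR EACH ROW EXECUTE FUNCTION trig_t_fill();

  -- with \timing on, compare the two bulk loads
  INSERT INTO gen_t  (a) SELECT g FROM generate_series(1, 1000000) g;
  INSERT INTO trig_t (a) SELECT g FROM generate_series(1, 1000000) g;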

It is a great feature and I'll mark it as ready for committer

Regards

Pavel

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


Re: [HACKERS] generated columns

2019-03-26 Thread Pavel Stehule
út 26. 3. 2019 v 19:52 odesílatel Pavel Stehule 
napsal:

> Hi
>
> út 26. 3. 2019 v 14:33 odesílatel Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> napsal:
>
>> On 2019-03-20 03:51, Michael Paquier wrote:
>> > On Mon, Mar 18, 2019 at 03:14:09PM +0100, Pavel Stehule wrote:
>> >> postgres=# update foo set name = 'bbbxx' where id = 1; -- error
>> >> ERROR:  no generation expression found for column number 3 of table
>> >> "foo"
>> >
>> > Yes I can see the problem after adding a generated column and dropping
>> > it on an INSERT query.
>>
>> fixed
>>
>> > +   if (relid && attnum && get_attgenerated(relid, attnum))
>> > Better to use OidIsValid here?
>>
>> fixed
>>
>> > +   (walrcv_server_version(wrconn) >= 12 ? "AND a.attgenerated
>> = ''" : ""),
>> > I think that it is better to always have version-related references
>> > stored as defines.
>>
>> A valid idea, but I don't see it widely done (see psql, pg_dump).
>>
>> > +CREATE TABLE gtest22a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
>> * 2) STORED UNIQUE);
>> > +CREATE TABLE gtest22b (a int, b int GENERATED ALWAYS AS (a * 2)
>> STORED, PRIMARY KEY (a, b));
>> > Some tests for unique constraints with a generated column should be in
>> > place?
>>
>> done
>>
>> > It would be nice to have extra tests for forbidden expression types
>> > on generated columns especially SRF, subquery and aggregates/window
>> > functions.
>>
>> done
>>
>
> make check-world fails
>

Looks like there was some garbage in my git repository. After cleaning it up,
everything is OK. Sorry for the noise.

Regards

Pavel


> regards
>
> Pavel
>
>
>> --
>> Peter Eisentraut  http://www.2ndQuadrant.com/
>> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>>
>


Re: patch to allow disable of WAL recycling

2019-03-26 Thread Jerry Jelinek
On Thu, Mar 7, 2019 at 6:26 PM Thomas Munro  wrote:

> On Fri, Mar 8, 2019 at 12:35 PM Jerry Jelinek 
> wrote:
> > On Thu, Mar 7, 2019 at 3:09 PM Thomas Munro 
> wrote:
> >> My understanding is that it's not really the COW-ness that makes it
> >> not necessary, it's the fact that fdatasync() doesn't do anything
> >> different from fsync() on ZFS and there is no situation where
> >> fdatasync() succeeds, you lose power, you come back up and find that
> >> the file size is wrong or a hole in the middle of the file has come
> >> back from the dead, and you lost the data.  The whole concept of "data
> >> sync" implies that file meta-data and file contents are cached and
> >> synchronized separately and you can deliberately ask for weaker
> >> coherency to cut down on IOs; *that's* the thing that ZFS doesn't
> >> have, and couldn't benefit from because it's just going to write stuff
> >> in its tidy sequential log in the form of all-or-nothing transactions
> >> anyway.  I don't know if that's also true for eg BTRFS or any other
> >> COW filesystem that might be out there, but I don't know why you'd
> >> want to mention COW instead of wal_sync_mode as the motivation when
> >> the source code comment know better.
> >
> >
> > Hopefully I am not misinterpreting your comment here, but I'm not sure I
> fully agree with that assessment. I can't speak for other filesystems, but
> for ZFS, none of the zero-filled blocks will actually be written to disk,
> but that determination happens fairly late in the process, after the
> thousands of write system calls have been processed. So on ZFS, these
> writes are basically useless, except as a way to increase the size of the
> file. No disk space is actually allocated. However, on any COW filesystem,
> any write to any of these zero-filled blocks will have to allocate a new
> block, so nothing about "preallocating space" has been accomplished by all
> of these system calls. At least, preallocating space is my understanding of
> why the zero-fill is currently being performed.
>
> It seems like you're focusing on the performance and I'm focusing on
> the safety.  Obviously it's a complete waste of time to try to
> "preallocate" space on COW filesystems since they will not reuse that
> space anyway by definition.  My point was that it may be unsafe to
> turn if off when configured to use fdatasync() for later writes to the
> file on filesystems that make fewer durability guarantees with
> fdatasync() than with a full fsync(), and that seems like another
> newsworthy angle on this for end users to know about.  I dunno, maybe
> those things are so closely linked that it's OK to write just "only
> turn it off on COW filesystems", but I'm wondering why we don't
> mention the actual reason for the feature when we make that claim in
> the comments.
>
> Hmm... digging a bit further.  So, those comments in xlog.c date from
> 2013/2014 when this stuff was going down:
>
> https://lkml.org/lkml/2012/9/3/83
>
> https://www.usenix.org/conference/osdi14/technical-sessions/presentation/zheng_mai
> http://www.openldap.org/lists/openldap-devel/201411/msg2.html
>
> So that was not actually the intended behaviour of fdatasync(), but
> rather a bug in ext3/4 that's been fixed now.  POSIX says "File
> attributes that are not necessary for data retrieval (access time,
> modification time, status change time) need not be successfully
> transferred prior to returning to the calling process.", and the Linux
> man page says that it "does not flush modified metadata unless that
> metadata is needed in order to allow a subsequent data retrieval to be
> correctly handled", so... if there are no OS bugs, the comments in
> xlog.c are overly pessimistic and the only effect of using fdatasync()
> instead of fsync() to to avoid extra IOs for mtime etc.
>
> I still like the pessimism in the code.  But OK, I withdraw my
> complaint about that sentence in the documentation for now!  :-)


 I haven't heard any additional feedback in the last couple of weeks, so I
wanted to see if there is anything else needed for this patch? I did update
the code to use pg_pwrite. A new version of the patch is attached. The only
difference from the previous version is this diff:

--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -3283,8 +3283,8 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent,
bool use_lock)
 */
errno = 0;
pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_WRITE);
-   fail = lseek(fd, wal_segment_size - 1, SEEK_SET) < (off_t)
0 ||
-   (int) write(fd, zbuffer.data, 1) != (int) 1;
+   fail = pg_pwrite(fd, zbuffer.data, 1, wal_segment_size - 1)
!=
+   (ssize_t) 1;
pgstat_report_wait_end();
}

The latest patch is rebased, builds clean, and passes some basic testing.
Please let me know if there is anything else I could do on this.
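
As a usage note for COW filesystems such as ZFS: once this is in, one would
simply turn the new setting off. Assuming the GUC ends up being named
wal_recycle (that name is an assumption on my side), that would be roughly:

  ALTER SYSTEM SET wal_recycle = off;   -- hypothetical GUC name
  SELECT pg_reload_conf();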

Thanks,
Jerry



Re: Enable data checksums by default

2019-03-26 Thread Peter Geoghegan
On Fri, Mar 22, 2019 at 9:07 AM Tom Lane  wrote:
> IMO, the main value of checksums is that they allow the Postgres
> project to deflect blame.  That's nice for us but I'm not sure
> that it's a benefit for users.  I've seen little if any data to
> suggest that checksums actually catch enough problems to justify
> the extra CPU costs and the risk of false positives.

I share your concern.

Some users have a peculiar kind of cognitive dissonance around
corruption, at least in my experience. It's very difficult for them to
make a choice on whether or not to fail hard. Perhaps that needs to be
taken into account, without being indulged.

-- 
Peter Geoghegan




Re: [HACKERS] generated columns

2019-03-26 Thread Pavel Stehule
Hi

út 26. 3. 2019 v 14:33 odesílatel Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> napsal:

> On 2019-03-20 03:51, Michael Paquier wrote:
> > On Mon, Mar 18, 2019 at 03:14:09PM +0100, Pavel Stehule wrote:
> >> postgres=# update foo set name = 'bbbxx' where id = 1; -- error
> >> ERROR:  no generation expression found for column number 3 of table
> >> "foo"
> >
> > Yes I can see the problem after adding a generated column and dropping
> > it on an INSERT query.
>
> fixed
>
> > +   if (relid && attnum && get_attgenerated(relid, attnum))
> > Better to use OidIsValid here?
>
> fixed
>
> > +   (walrcv_server_version(wrconn) >= 12 ? "AND a.attgenerated =
> ''" : ""),
> > I think that it is better to always have version-related references
> > stored as defines.
>
> A valid idea, but I don't see it widely done (see psql, pg_dump).
>
> > +CREATE TABLE gtest22a (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a
> * 2) STORED UNIQUE);
> > +CREATE TABLE gtest22b (a int, b int GENERATED ALWAYS AS (a * 2) STORED,
> PRIMARY KEY (a, b));
> > Some tests for unique constraints with a generated column should be in
> > place?
>
> done
>
> > It would be nice to have extra tests for forbidden expression types
> > on generated columns especially SRF, subquery and aggregates/window
> > functions.
>
> done
>

make check-world fails

regards

Pavel


> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>


regression.diffs
Description: Binary data


dblink.out
Description: Binary data


Re: pgsql: Get rid of backtracking in jsonpath_scan.l

2019-03-26 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Mar-26, Tom Lane wrote:
>> Alvaro Herrera  writes:
>>> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55

>> I've never cared for the ntstatus.h reference, but how stable is
>> the URL you suggest going to be?  That UUID or whatever it is
>> does not inspire confidence.

> That's true.  Before posting, I looked for a statement about URL
> stability, couldn't find anything.  I suppose one currently working URL
> is better than four currently dead URLs.

It looks like we could just point to the parent page,

https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/

That requires a little more drilling down, but it seems likely to
remain stable across versions, whereas I strongly suspect the URL
you mention is version-specific.

regards, tom lane




Re: [Patch] Base backups and random or zero pageheaders (was: Online verification of checksums)

2019-03-26 Thread Michael Banck
Hi,

Am Dienstag, den 26.03.2019, 10:30 -0700 schrieb Andres Freund:
> On 2019-03-26 18:22:55 +0100, Michael Banck wrote:
> > Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund:
> > > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM 
> > > generate_series(1, 100) g(i);
> > > SELECT pg_relation_size('corruptme');
> > > postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || 
> > > pg_relation_filepath('corruptme');
> > > ┌─┐
> > > │  ?column?   │
> > > ├─┤
> > > │ /srv/dev/pgdev-dev/base/13390/16384 │
> > > └─┘
> > > (1 row)
> > > dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 
> > > conv=notrunc
> > > 
> > > Try a basebackup and see how many times it'll detect the corrupt
> > > data. In the vast majority of cases you're going to see checksum
> > > failures when reading the data for normal operation, but not when using
> > > basebackup (or this new tool).
> > > 
> > > At the very very least this would need to do
> > > 
> > > a) checks that the page is all zeroes if PageIsNew() (like
> > >PageIsVerified() does for the backend). That avoids missing cases
> > >where corruption just zeroed out the header, but not the whole page.
> > > b) Check that pd_lsn is between startlsn and the insertion pointer. That
> > >avoids accepting just about all random data.
> > > 
> > > And that'd *still* be less strenuous than what normal backends
> > > check. And that's already not great (due to not noticing zeroed out
> > > data).
> > 
> > I've done the above in the attached patch now. Well, literally like an
> > hour ago, then went jogging and came back to see you outlined about
> > fixing this differently in a separate thread. Still might be helpful for
> > the TAP test changes at least.
> 
> Sorry, I just hadn't seen much movement on this, and I'm a bit concerned
> about such a critical issue not being addressed.

Sure, I was working on this a bit on and off for a few days but I had
random corruption issues which I finally tracked down earlier to reusing
"for (i=0 [...]" via copy, d'oh.  That's why I renamed the `i'
variable to `page_in_buf' cause it's a pretty long loop so should have a
useful variable name IMO.

> > /*
> > -* Only check pages which have not been 
> > modified since the
> > -* start of the base backup. Otherwise, they 
> > might have been
> > -* written only halfway and the checksum would 
> > not be valid.
> > -* However, replaying WAL would reinstate the 
> > correct page in
> > -* this case. We also skip completely new 
> > pages, since they
> > -* don't have a checksum yet.
> > +* We skip completely new pages after checking 
> > they are
> > +* all-zero, since they don't have a checksum 
> > yet.
> >  */
> > -   if (!PageIsNew(page) && PageGetLSN(page) < 
> > startptr)
> > +   if (PageIsNew(page))
> > {
> > -   checksum = pg_checksum_page((char *) 
> > page, blkno + segmentno * RELSEG_SIZE);
> > -   phdr = (PageHeader) page;
> > -   if (phdr->pd_checksum != checksum)
> > +   all_zeroes = true;
> > +   pagebytes = (size_t *) page;
> > +   for (int i = 0; i < (BLCKSZ / 
> > sizeof(size_t)); i++)
> 
> Can we please abstract the zeroeness check into a separate function to
> be used both by PageIsVerified() and this?

Ok, done so as PageIsZero further down in bufpage.c.

> > +   if (!all_zeroes)
> > +   {
> > +   /*
> > +* pd_upper is zero, but the 
> > page is not all zero.  We
> > +* cannot run 
> > pg_checksum_page() on the page as it
> > +* would throw an assertion 
> > failure.  Consider this a
> > +* checksum failure.
> > +*/
> 
> I don't think the assertion failure is the relevant bit here, it's that
> the page is corrupted, no?

Well, relevant in the sense that the reader might wonder why we don't
just call pg_checksum_page() and have a consistent error message with
the other codepath.

We could maybe run pg_checksum_block() on it and reverse the rest of the
permutations from pg_checksum_page() but that might be overly
complicated for 

Re: pgsql: Get rid of backtracking in jsonpath_scan.l

2019-03-26 Thread Alvaro Herrera
On 2019-Mar-26, Tom Lane wrote:

> Alvaro Herrera  writes:
> >> 0xC028 is STATUS_BAD_STACK, per
> >> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
> >> Not sure how credible/useful a stack trace is going to be.
> 
> > BTW I think we should update our message to use this URL instead of
> > ambiguously pointing to "ntstatus.h".
> 
> I've never cared for the ntstatus.h reference, but how stable is
> the URL you suggest going to be?  That UUID or whatever it is
> does not inspire confidence.

That's true.  Before posting, I looked for a statement about URL
stability, couldn't find anything.  I suppose one currently working URL
is better than four currently dead URLs.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [PATCH][HOTFIX] vacuum_cost_delay type change from int to real have not been done everywhere

2019-03-26 Thread Tom Lane
Nikolay Shaplov  writes:
> In caf626b2 type of vacuum_cost_delay have been switched from int to real, 
> everywhere, but not in *RelOpts[] arrays.

Ugh.

> For some reason it continued to build and work.

I'm not quite sure why it worked either; apparently, the type of that
array entry doesn't have anything to do with the variable's storage
format.  The bounds-check code must think it's dealing with an integer,
but that doesn't matter either for the values we need.

> PS. As you can see current reloption code is error-prone.

Yeah, that was pretty obvious already :-(.  Having more than one place
defining the type of an option is clearly bogus.  I missed that this
entry was type-specific because you'd really have to go up to the top
of the array to notice that; and since the type information *is* contained
in another entry, my bogometer failed to trigger.

Fix pushed, thanks for the report!

regards, tom lane



Re: [Patch] Base backups and random or zero pageheaders (was: Online verification of checksums)

2019-03-26 Thread Andres Freund
On 2019-03-26 18:22:55 +0100, Michael Banck wrote:
> Hi,
> 
> Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund:
> > CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, 
> > 100) g(i);
> > SELECT pg_relation_size('corruptme');
> > postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || 
> > pg_relation_filepath('corruptme');
> > ┌─┐
> > │  ?column?   │
> > ├─┤
> > │ /srv/dev/pgdev-dev/base/13390/16384 │
> > └─┘
> > (1 row)
> > dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 
> > conv=notrunc
> > 
> > Try a basebackup and see how many times it'll detect the corrupt
> > data. In the vast majority of cases you're going to see checksum
> > failures when reading the data for normal operation, but not when using
> > basebackup (or this new tool).
> > 
> > At the very very least this would need to do
> > 
> > a) checks that the page is all zeroes if PageIsNew() (like
> >PageIsVerified() does for the backend). That avoids missing cases
> >where corruption just zeroed out the header, but not the whole page.
> > b) Check that pd_lsn is between startlsn and the insertion pointer. That
> >avoids accepting just about all random data.
> > 
> > And that'd *still* be less strenuous than what normal backends
> > check. And that's already not great (due to not noticing zeroed out
> > data).
> 
> I've done the above in the attached patch now. Well, literally like an
> hour ago, then went jogging and came back to see you outlined about
> fixing this differently in a separate thread. Still might be helpful for
> the TAP test changes at least.

Sorry, I just hadn't seen much movement on this, and I'm a bit concerned
about such a critical issue not being addressed.


>   /*
> -  * Only check pages which have not been 
> modified since the
> -  * start of the base backup. Otherwise, they 
> might have been
> -  * written only halfway and the checksum would 
> not be valid.
> -  * However, replaying WAL would reinstate the 
> correct page in
> -  * this case. We also skip completely new 
> pages, since they
> -  * don't have a checksum yet.
> +  * We skip completely new pages after checking 
> they are
> +  * all-zero, since they don't have a checksum 
> yet.
>*/
> - if (!PageIsNew(page) && PageGetLSN(page) < 
> startptr)
> + if (PageIsNew(page))
>   {
> - checksum = pg_checksum_page((char *) 
> page, blkno + segmentno * RELSEG_SIZE);
> - phdr = (PageHeader) page;
> - if (phdr->pd_checksum != checksum)
> + all_zeroes = true;
> + pagebytes = (size_t *) page;
> + for (int i = 0; i < (BLCKSZ / 
> sizeof(size_t)); i++)

Can we please abstract the zeroeness check into a separate function to
be used both by PageIsVerified() and this?


> + if (!all_zeroes)
> + {
> + /*
> +  * pd_upper is zero, but the 
> page is not all zero.  We
> +  * cannot run 
> pg_checksum_page() on the page as it
> +  * would throw an assertion 
> failure.  Consider this a
> +  * checksum failure.
> +  */

I don't think the assertion failure is the relevant bit here, it's that
the page is corrupted, no?


> + /*
> +  * Only check pages which have not been 
> modified since the
> +  * start of the base backup. Otherwise, 
> they might have been
> +  * written only halfway and the 
> checksum would not be valid.
> +  * However, replaying WAL would 
> reinstate the correct page in
> +  * this case. If the page LSN is larger 
> than the current redo
> +  * pointer then we assume a bogus LSN 
> due to random page header
> +  * corruption and do verify the 
> checksum.
> +  */
> + if 

Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-03-26 Thread Robert Haas
On Tue, Mar 26, 2019 at 11:23 AM Masahiko Sawada  wrote:
> > I don't see a patch with the naming updated, here or there, and I'm
> > going to be really unhappy if we end up with inconsistent naming
> > between two patches that do such fundamentally similar things.  -1
> > from me to committing either one until that inconsistency is resolved.
>
> Agreed. I've just submitted the latest version patch that adds
> INDEX_CLEANUP option and vacuum_index_cleanup reloption. I already
> mentioned on that thread but I agreed with adding phrase positively
> than negatively. So if we got consensus on such naming the new options
> added by this patch could be something like SHRINK option (with
> true/false) and vacuum_shrink reloption.

No, that's just perpetuating the problem.  Then you have an option
SHRINK here that you set to TRUE to skip something, and an option
INDEX_CLEANUP over there that you set to FALSE to skip something.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[Patch] Base backups and random or zero pageheaders (was: Online verification of checksums)

2019-03-26 Thread Michael Banck
Hi,

Am Dienstag, den 19.03.2019, 13:00 -0700 schrieb Andres Freund:
> CREATE TABLE corruptme AS SELECT g.i::text AS data FROM generate_series(1, 
> 100) g(i);
> SELECT pg_relation_size('corruptme');
> postgres[22890][1]=# SELECT current_setting('data_directory') || '/' || 
> pg_relation_filepath('corruptme');
> ┌─┐
> │  ?column?   │
> ├─┤
> │ /srv/dev/pgdev-dev/base/13390/16384 │
> └─┘
> (1 row)
> dd if=/dev/urandom of=/srv/dev/pgdev-dev/base/13390/16384 bs=8192 count=1 
> conv=notrunc
> 
> Try a basebackup and see how many times it'll detect the corrupt
> data. In the vast majority of cases you're going to see checksum
> failures when reading the data for normal operation, but not when using
> basebackup (or this new tool).
> 
> At the very very least this would need to do
> 
> a) checks that the page is all zeroes if PageIsNew() (like
>PageIsVerified() does for the backend). That avoids missing cases
>where corruption just zeroed out the header, but not the whole page.
> b) Check that pd_lsn is between startlsn and the insertion pointer. That
>avoids accepting just about all random data.
> 
> And that'd *still* be less strenuous than what normal backends
> check. And that's already not great (due to not noticing zeroed out
> data).

I've done the above in the attached patch now. Well, literally like an
hour ago, then went jogging and came back to see you outlined about
fixing this differently in a separate thread. Still might be helpful for
the TAP test changes at least.


Michael

-- 
Michael Banck
Projektleiter / Senior Berater
Tel.: +49 2166 9901-171
Fax:  +49 2166 9901-100
Email: michael.ba...@credativ.de

credativ GmbH, HRB Mönchengladbach 12080
USt-ID-Nummer: DE204566209
Trompeterallee 108, 41189 Mönchengladbach
Geschäftsführung: Dr. Michael Meskes, Jörg Folz, Sascha Heuer

Our handling of personal data is subject to the following
provisions: https://www.credativ.de/datenschutz

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index 537f09e342..70a41b6998 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -1369,16 +1369,18 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 		 bool missing_ok, Oid dboid)
 {
 	FILE	   *fp;
+	bool		all_zeroes = false;
 	BlockNumber blkno = 0;
 	bool		block_retry = false;
 	char		buf[TAR_SEND_SIZE];
 	uint16		checksum;
 	int			checksum_failures = 0;
 	off_t		cnt;
-	int			i;
+	int			page_in_buf;
 	pgoff_t		len = 0;
 	char	   *page;
 	size_t		pad;
+	size_t *pagebytes;
 	PageHeader	phdr;
 	int			segmentno = 0;
 	char	   *segmentpath;
@@ -1449,78 +1451,42 @@ sendFile(const char *readfilename, const char *tarfilename, struct stat *statbuf
 
 		if (verify_checksum)
 		{
-			for (i = 0; i < cnt / BLCKSZ; i++)
+			for (page_in_buf = 0; page_in_buf < cnt / BLCKSZ; page_in_buf++)
 			{
-page = buf + BLCKSZ * i;
+page = buf + BLCKSZ * page_in_buf;
 
 /*
- * Only check pages which have not been modified since the
- * start of the base backup. Otherwise, they might have been
- * written only halfway and the checksum would not be valid.
- * However, replaying WAL would reinstate the correct page in
- * this case. We also skip completely new pages, since they
- * don't have a checksum yet.
+ * We skip completely new pages after checking they are
+ * all-zero, since they don't have a checksum yet.
  */
-if (!PageIsNew(page) && PageGetLSN(page) < startptr)
+if (PageIsNew(page))
 {
-	checksum = pg_checksum_page((char *) page, blkno + segmentno * RELSEG_SIZE);
-	phdr = (PageHeader) page;
-	if (phdr->pd_checksum != checksum)
+	all_zeroes = true;
+	pagebytes = (size_t *) page;
+	for (int i = 0; i < (BLCKSZ / sizeof(size_t)); i++)
 	{
-		/*
-		 * Retry the block on the first failure.  It's
-		 * possible that we read the first 4K page of the
-		 * block just before postgres updated the entire block
-		 * so it ends up looking torn to us.  We only need to
-		 * retry once because the LSN should be updated to
-		 * something we can ignore on the next pass.  If the
-		 * error happens again then it is a true validation
-		 * failure.
-		 */
-		if (block_retry == false)
+		if (pagebytes[i] != 0)
 		{
-			/* Reread the failed block */
-			if (fseek(fp, -(cnt - BLCKSZ * i), SEEK_CUR) == -1)
-			{
-ereport(ERROR,
-		(errcode_for_file_access(),
-		 errmsg("could not fseek in file \"%s\": %m",
-readfilename)));
-			}
-
-			if (fread(buf + BLCKSZ * i, 1, BLCKSZ, fp) != BLCKSZ)
-			{
-ereport(ERROR,
-		(errcode_for_file_access(),
-		 errmsg("could not reread block %d of file \"%s\": %m",
-blkno, 

Re: pgsql: Get rid of backtracking in jsonpath_scan.l

2019-03-26 Thread Tom Lane
Alvaro Herrera  writes:
>> 0xC028 is STATUS_BAD_STACK, per
>> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
>> Not sure how credible/useful a stack trace is going to be.

> BTW I think we should update our message to use this URL instead of
> ambiguously pointing to "ntstatus.h".

I've never cared for the ntstatus.h reference, but how stable is
the URL you suggest going to be?  That UUID or whatever it is
does not inspire confidence.

regards, tom lane



Re: pgsql: Get rid of backtracking in jsonpath_scan.l

2019-03-26 Thread Andrew Dunstan


On 3/26/19 12:53 PM, Alvaro Herrera wrote:
> On 2019-Mar-26, Alvaro Herrera wrote:
>
>>> 2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:6] LOG:  server process (PID 
>>> 8368) was terminated by exception 0xC028
>> 0xC028 is STATUS_BAD_STACK, per
>> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
>> Not sure how credible/useful a stack trace is going to be.
> BTW I think we should update our message to use this URL instead of
> ambiguously pointing to "ntstatus.h".  Also, all the URLs in
> win32_port.h (except the wine one) are dead.
>


That's a fairly awful URL. How stable is it likely to be?


cheers


andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




basebackup checksum verification

2019-03-26 Thread Andres Freund
Hi,

As detailed in
https://postgr.es/m/20190319200050.ncuxejradurjakdc%40alap3.anarazel.de
the way the backend's basebackup checksum verification works makes its
error detection capabilities very dubious.

I think we need to fix this before the next set of backbranch releases,
or at the very least add a big fat warning that the feature isn't doing
much.

The more I think about it, the less convinced I am of the method to
avoid the torn page problem using LSNs. To make e.g. the PageIsNew()
check correct, we need to verify that the whole page is zeroes - but we
can't use the LSN for that, as it's not on the page. But there very well
could be a torn page issue with only the second half of the page being
written back (with the default 8kb pages that can trivially happen as
the kernel's pagecache commonly uses 4kb granularity).

I basically think it needs to work like this:

1) Perform the entire set of PageIsVerified() checks *without*
   previously checking the page's LSN, but don't error out.

2) If 1) errored out, ensure that that's because the backend is
   currently writing out the page. That basically requires doing what
   BufferAlloc() does. So I think we'd need to end up with a function
   like:

LockoutBackendWrites():
buf = ReadBufferWithoutRelcache(relfilenode);
LWLockAcquire(BufferDescriptorGetContentLock(buf), LW_EXCLUSIVE);
/*
 * Reread page from OS, and recheck. This needs to happen while
 * the IO lock prevents rereading from the OS. Note that we do
 * not want to rely on the buffer contents here, as that could
 * be very old cache contents.
 */
perform_checksum_check(relfilenode, ERROR);

LWLockRelease(BufferDescriptorGetContentLock(buf));
ReleaseBuffer(buf);

3) If 2) also failed, then we can be sure that the page is truly
   corrupted.


Greetings,

Andres Freund



Re: pgsql: Get rid of backtracking in jsonpath_scan.l

2019-03-26 Thread Alvaro Herrera
On 2019-Mar-26, Alvaro Herrera wrote:

> > 2019-03-26 00:49:02.208 EDT [5c99ae9e.20cc:6] LOG:  server process (PID 
> > 8368) was terminated by exception 0xC028
> 
> 0xC028 is STATUS_BAD_STACK, per
> https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
> Not sure how credible/useful a stack trace is going to be.

BTW I think we should update our message to use this URL instead of
ambiguously pointing to "ntstatus.h".  Also, all the URLs in
win32_port.h (except the wine one) are dead.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index f84f882c4cb..549c82ea627 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -647,7 +647,8 @@ pgarch_archiveXlog(char *xlog)
 			ereport(lev,
 	(errmsg("archive command was terminated by exception 0x%X",
 			WTERMSIG(rc)),
-	 errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+	 errhint("See \"%s\" for a description of the hexadecimal value.",
+			 "https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55"),
 	 errdetail("The failed archive command was: %s",
 			   xlogarchcmd)));
 #else
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index fe599632d3d..a540d51d79a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -3630,7 +3630,8 @@ LogChildExit(int lev, const char *procname, int pid, int exitstatus)
 		  "server process" */
 (errmsg("%s (PID %d) was terminated by exception 0x%X",
 		procname, pid, WTERMSIG(exitstatus)),
- errhint("See C include file \"ntstatus.h\" for a description of the hexadecimal value."),
+ errhint("See \"%s\" for a description of the hexadecimal value.",
+		 "https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55"),
  activity ? errdetail("Failed process was running: %s", activity) : 0));
 #else
 		ereport(lev,
diff --git a/src/include/port/win32_port.h b/src/include/port/win32_port.h
index f4841fb3975..12abd08cda4 100644
--- a/src/include/port/win32_port.h
+++ b/src/include/port/win32_port.h
@@ -107,7 +107,7 @@
  *	similar to a unix-style signal exit (think SIGSEGV ==
  *	STATUS_ACCESS_VIOLATION).  Return values are broken up into groups:
  *
- *	http://msdn2.microsoft.com/en-gb/library/aa489609.aspx
+ *	https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/87fba13e-bf06-450e-83b1-9241dc81e781
  *
  *		NT_SUCCESS			0 - 0x3FFF
  *		NT_INFORMATION		0x4000 - 0x7FFF
@@ -119,24 +119,10 @@
  *	by the system, and it seems values >= 0x100 are system-generated.
  *	See this URL for a list of WIN32 STATUS_* values:
  *
- *		Wine (URL used in our error messages) -
+ *		https://docs.microsoft.com/en-us/openspecs/windows_protocols/ms-erref/596a1078-e883-4972-9bbc-49e60bebca55
+ *
+ *		Wine -
  *			http://source.winehq.org/source/include/ntstatus.h
- *		Descriptions - http://www.comp.nus.edu.sg/~wuyongzh/my_doc/ntstatus.txt
- *		MS SDK - http://www.nologs.com/ntstatus.html
- *
- *	It seems the exception lists are in both ntstatus.h and winnt.h, but
- *	ntstatus.h has a more comprehensive list, and it only contains
- *	exception values, rather than winnt, which contains lots of other
- *	things:
- *
- *		http://www.microsoft.com/msj/0197/exception/exception.aspx
- *
- *		The ExceptionCode parameter is the number that the operating system
- *		assigned to the exception. You can see a list of various exception codes
- *		in WINNT.H by searching for #defines that start with "STATUS_". For
- *		example, the code for the all-too-familiar STATUS_ACCESS_VIOLATION is
- *		0xC005. A more complete set of exception codes can be found in
- *		NTSTATUS.H from the Windows NT DDK.
  *
  *	Some day we might want to print descriptions for the most common
  *	exceptions, rather than printing an include file name.  We could use


Re: PostgreSQL Participates in GSoC 2019!

2019-03-26 Thread Ritom Sonowal
Hi,

I have applied and submitted an initial draft of my proposal for GSoC 2019
through the Summer of Code site. The project is titled 'pgAdmin4 Graphing
Query Tool'.

I would like to get some feedback for the same so that I can improve on
making the final proposal better. The link to the draft is - GSoC 2019
Proposal


Thanks,
Ritom

On Thu, 28 Feb, 2019, 8:52 PM Stephen Frost,  wrote:

> Greetings,
>
> I'm pleased to announce that we have been accepted by Google to
> participate in the Summer of Code (GSoC) 2019 program. This will be the
> 12th time that the PostgreSQL Project will provide mentorship for
> students to help develop new features for PostgreSQL. We have the chance
> to accept student projects that will be developed from May to August.
>
> If you are a student, and want to participate in this year's GSoC,
> please watch this Wiki page: https://wiki.postgresql.org/wiki/GSoC
>
> If you are interested in mentoring a student, you can add your own idea
> to the project list. Please reach out to the PG GSoC admins, listed
> here: https://wiki.postgresql.org/wiki/GSoC
>
> And finally, we ask everyone to reach out to students and universities
> and let them know about GSoC.
>
> Thanks!
>
> Stephen
> PostgreSQL GSoC 2019 Administrator
>


Re: Tid scan improvements

2019-03-26 Thread Andres Freund
Hi,

On 2019-03-26 19:11:13 +1300, Edmund Horner wrote:
> The changes in heapam.c were required for backward scan support, as
> used by ORDER BY ctid DESC and MAX(ctid); and also for FETCH LAST and
> FETCH PRIOR.  I have removed the backward scans functionality from the
> current set of patches, but support for backward cursor fetches
> remains.
> 
> I guess to brutally simplify the patch further, we could give up
> backward cursor fetches entirely?  This means such cursors that end up
> using a TidRangeScan will require SCROLL to go backwards (which is a
> small pain for user experience), but TBH I don't think backwards-going
> cursors on CTID will be hugely common.

FWIW, I think it'd be entirely reasonable to remove support for backward
scans without SCROLL. In fact, I think it'd be wasted effort to maintain
code for it, without a pretty clear reason why we need it (unless it
were trivial to support, which it isn't).
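
To make the trade-off concrete, the affected case is a cursor over a
ctid-qualified query; without backward scan support it has to be declared
with SCROLL before FETCH LAST / FETCH PRIOR work. A sketch, with a made-up
table and assuming the patch's ctid range quals:

  BEGIN;
  DECLARE c SCROLL CURSOR FOR
    SELECT * FROM some_table WHERE ctid >= '(10,0)' AND ctid < '(20,0)';
  FETCH LAST  FROM c;   -- needs SCROLL once backward scan support is gone
  FETCH PRIOR FROM c;
  COMMIT;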

Greetings,

Andres Freund



[PATCH][HOTFIX] vacuum_cost_delay type change from int to real have not been done everywhere

2019-03-26 Thread Nikolay Shaplov
Hi!

In caf626b2 the type of vacuum_cost_delay was switched from int to real
everywhere, but not in the *RelOpts[] arrays.

For some reason it continued to build and work. But I think it is better to 
move vacuum_cost_delay from int to real there too...
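
With the entry moved to realRelOpts, a fractional delay should be accepted
through the reloption as well, e.g. (hypothetical table name):

  ALTER TABLE some_table SET (autovacuum_vacuum_cost_delay = 2.5);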

Patch attached.

PS. As you can see, the current reloption code is error-prone. To properly change
reloptions you have to change code in several different places at once, and,
as you can see, nothing complains if you miss one of them.
I am working on reloptions code refactoring now, please join reviewing my 
patches. This work is important as you can see from this example... 
 
diff --git a/.gitignore b/.gitignore
index 794e35b..37331c2 100644
diff --git a/src/backend/access/common/reloptions.c b/src/backend/access/common/reloptions.c
index 3b0b138..ba90235 100644
--- a/src/backend/access/common/reloptions.c
+++ b/src/backend/access/common/reloptions.c
@@ -214,15 +214,6 @@ static relopt_int intRelOpts[] =
 	},
 	{
 		{
-			"autovacuum_vacuum_cost_delay",
-			"Vacuum cost delay in milliseconds, for autovacuum",
-			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
-			ShareUpdateExclusiveLock
-		},
-		-1, 0, 100
-	},
-	{
-		{
 			"autovacuum_vacuum_cost_limit",
 			"Vacuum cost amount available before napping, for autovacuum",
 			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
@@ -366,6 +357,16 @@ static relopt_real realRelOpts[] =
 	},
 	{
 		{
+			"autovacuum_vacuum_cost_delay",
+			"Vacuum cost delay in milliseconds, for autovacuum",
+			RELOPT_KIND_HEAP | RELOPT_KIND_TOAST,
+			ShareUpdateExclusiveLock
+		},
+		-1, 0, 100
+	},
+
+	{
+		{
 			"seq_page_cost",
 			"Sets the planner's estimate of the cost of a sequentially fetched disk page.",
 			RELOPT_KIND_TABLESPACE,


Re: speeding up planning with partitions

2019-03-26 Thread Tom Lane
Amit Langote  writes:
> 0002 is a new patch to get rid of the duplicate RTE and PlanRowMark that's
> created for partitioned parent table, as it's pointless.  I was planning
> to propose it later, but decided to post it now if we're modifying the
> nearby code anyway.

Good idea, but it needs a bit more work on nearby comments, and the
logic in expand_single_inheritance_child could be simplified, and
you evidently didn't try check-world because the RTE-removal causes
cosmetic changes in postgres_fdw.out.  I fixed that up and pushed
this part.

regards, tom lane



Re: MSVC Build support with visual studio 2019

2019-03-26 Thread Andrew Dunstan


On 3/20/19 8:36 PM, Haribabu Kommi wrote:
> Hi Hackers,
>
> Here I attached a patch that supports building PostgreSQL with VS 2019.
> VS 2019 is going to be released on Apr 2nd 2019, so it would be good if
> version 12 supported compiling with it. The attached patch is for review;
> it may need some updates
> once the final version is released.
>
> Commit d9dd406fe281d22d5238d3c26a7182543c711e74 has reduced the
> minimum visual studio support to 2013 to support C99 standards,
> because of this
> reason, the current attached patch cannot be backpatched as it is.
>
> I can provide a separate back branches patch later once this patch
> comes to a stage of commit. Currently all the supported branches are
> possible to compile with VS 2017.
>
> comments?
>
>


I have verified that this works with VS2019.


There are a few typos in the comments that need cleaning up.


cheers


andrew


-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Should the docs have a warning about pg_stat_reset()?

2019-03-26 Thread Euler Taveira
Em ter, 26 de mar de 2019 às 09:54, David Rowley
 escreveu:
>
> As I mentioned in [1], I've had a few cases recently about auto-vacuum
> not working. On the other thread, it was all about auto-vacuum being
> configured to run too slowly.  The other culprit for auto-vacuum not
> working is when people periodically use pg_stat_reset().
>
> The problem with pg_stat_reset() is that it zeros n_dead_tup and
> n_mod_since_analyze.  If say a user resets the stats on a monthly
> basis then this can mean that tables that normally receive an
> auto-vacuum any less frequently than once per month could never
> receive an auto-vacuum... at least not until an anti-wraparound vacuum
> gets hold of it.
>
It seems a bug^H^H^H new feature. The problem is if you keep resetting
statistics before reaching an ANALYZE threshold. In that case,
autovacuum is never triggered because we don't have stats. The
consequence is huge bloat.

> The best I can think to do to try and avoid this is to put a giant
> WARNING in the docs about either not using it or to at least run
> ANALYZE after using it.
>
+1. I am afraid it is not sufficient.
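
For illustration, the failure mode and the ANALYZE workaround being discussed
look roughly like this (the table name is made up):

  SELECT relname, n_dead_tup, n_mod_since_analyze
    FROM pg_stat_user_tables WHERE relname = 'some_table';

  SELECT pg_stat_reset();   -- zeroes n_dead_tup and n_mod_since_analyze

  ANALYZE some_table;       -- the suggested mitigation; must be repeated per table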

> Does anyone else think this is a problem worth trying to solve?
>
I don't remember why we don't consider a table without stats as needing to
be ANALYZEd. Isn't this a case for fixing autovacuum, i.e. analyze when
autovacuum_count + vacuum_count = 0?

If autovacuum were also time-based, that would at least mitigate the
lack of statistics.


-- 
   Euler Taveira   Timbira -
http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-03-26 Thread Masahiko Sawada
On Tue, Mar 26, 2019 at 10:30 PM Robert Haas  wrote:
>
> On Tue, Mar 26, 2019 at 3:57 AM Tsunakawa, Takayuki
>  wrote:
> > From: David Steele [mailto:da...@pgmasters.net]
> > > This patch appears to have been stalled for a while.
> > >
> > > Takayuki -- the ball appears to be in your court.  Perhaps it would be
> > > helpful to summarize what you think are next steps?
> >
> > disable_index_cleanup is handled by Sawada-san in another thread.  I 
> > understand I've reflected all review comments in the latest patch, and 
> > replied to the opinions/proposals, so the patch status is kept "needs 
> > review."  (I hope new fire won't happen...)
>
> I don't see a patch with the naming updated, here or there, and I'm
> going to be really unhappy if we end up with inconsistent naming
> between two patches that do such fundamentally similar things.  -1
> from me to committing either one until that inconsistency is resolved.

Agreed. I've just submitted the latest version of the patch that adds an
INDEX_CLEANUP option and a vacuum_index_cleanup reloption. I already
mentioned it on that thread, but I agree with phrasing options positively
rather than negatively. So if we get consensus on such naming, the new
options added by this patch could be something like a SHRINK option (with
true/false) and a vacuum_shrink reloption; see the sketch below.
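
Concretely, with that naming the user-facing surface would be something like
(a sketch only, since the naming is still under discussion; the table name is
made up):

  VACUUM (SHRINK false) some_table;
  ALTER TABLE some_table SET (vacuum_shrink = false);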

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-26 Thread Tomas Vondra

On Tue, Mar 26, 2019 at 01:37:33PM +, Dean Rasheed wrote:

On Tue, 26 Mar 2019 at 11:59, Dean Rasheed  wrote:


On Mon, 25 Mar 2019 at 23:36, Tomas Vondra  wrote:
>
> Attached is an updated patch...

I just looked through the latest set of changes and I have a couple of
additional review comments:



I just spotted another issue while reading the code:

It's possible to build an MCV list with more than
STATS_MCVLIST_MAX_ITEMS = 8192 items, which then causes an error when
the code tries to read it back in:

create temp table foo(a int, b int);
insert into foo select x,x from generate_series(1,10000) g(x);
insert into foo select x,x from generate_series(1,10000) g(x);
alter table foo alter column a set statistics 10000;
alter table foo alter column b set statistics 10000;
create statistics s (mcv) on a,b from foo;
analyse foo;
select * from foo where a=1 and b=1;

ERROR:  invalid length (10000) item array in MCVList

So this needs to be checked when building the MCV list.

In fact, the stats target for table columns can be as large as 10000
(a hard-coded constant in tablecmds.c, which is pretty ugly, but
that's a different matter), so I think STATS_MCVLIST_MAX_ITEMS
probably ought to match that.

There are also a couple of comments that refer to the 8k limit, which
would need updating, if you change it.



I think we can simply ditch this separate limit, and rely on the
statistics target. The one issue with it is that if we ever allow the
statistics target to be raised, we may end up overflowing uint16 (which is
used in the serialized representation). But I think it's OK to just check
that in an assert, or so.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Enable data checksums by default

2019-03-26 Thread Christoph Berg
Re: Tom Lane 2019-03-22 <4368.1553270...@sss.pgh.pa.us>
> Christoph Berg  writes:
> > I think, the next step in that direction would be to enable data
> > checksums by default. They make sense in most setups,
> 
> Well, that is exactly the point that needs some proof, not just
> an unfounded assertion.

I ran a benchmark with checksums disabled/enabled. shared_buffers is
512kB to make sure almost any read will fetch the page from the OS
cache; scale factor is 50 (~750MB) to make sure the whole cluster fits
into RAM.

model name: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz (8 threads)
alter system set shared_buffers = '512kB';
pgbench -s 50 -i
pgbench -P 5 -M prepared -c 8 -j 8 -T 60 --select-only

without checksums:
tps = 96893.627255 (including connections establishing)
tps = 97570.587793 (including connections establishing)
tps = 97455.484419 (including connections establishing)
tps = 97533.668801 (including connections establishing)
average: 97363

with checksums:
tps = 91942.502487 (including connections establishing)
tps = 92390.556925 (including connections establishing)
tps = 92956.923271 (including connections establishing)
tps = 92914.205047 (including connections establishing)
average: 92551

select 92551.0/97363;
0.9506

So the cost is 5% in this very contrived case. In almost any other
setting, the cost would be lower, I'd think.

Christoph



Re: New vacuum option to do only freezing

2019-03-26 Thread Masahiko Sawada
On Sat, Mar 23, 2019 at 3:25 AM Robert Haas  wrote:
>
> On Fri, Mar 8, 2019 at 12:14 AM Masahiko Sawada  wrote:
> > IIUC we've discussed the field-and-value style vacuum option. I
> > suggested that since we have already the disable_page_skipping option
> > the disable_page_skipping option would be more natural style and
> > consistent. I think "VACUUM (INDEX_CLEANUP false)" seems consistent
> > with its reloption but not with other vacuum options. So why does only
> > this option (and probably up-coming new options) need to support new
> > style? Do we need the same change to the existing options?
>
> Well, it's too late to change to change DISABLE_PAGE_SKIPPING to work
> some other way; it's been released, and we're stuck with it at this
> point.

Agreed.

> However, I generally believe that it is preferable to phrase
> options positively then negatively, so that for example one writes
> EXPLAIN (ANALYZE, TIMING OFF) not EXPLAIN (ANALYZE, NO_TIMING).  So
> I'd like to do it that way for the new options that we're proposing to
> add.

Agreed with phrasing options positively rather than negatively. Since
DISABLE_PAGE_SKIPPING is an option for emergencies, we might be able to
rename it for consistency in a future release.

Attached updated version patches. 0001 patch can be applied on top of
the patch that allows the all existing options have one boolean
argument, which I've attached on another thread[1]. So please apply
them in following order.

1. v20-0001-All-VACUUM-command-options-allow-an-argument.patch
(attached on [1] thread)
2. v10-0001-Add-INDEX_CLEANUP-option-to-VACUUM-command.patch
3. v10-0002-Add-disable-index-cleanup-option-to-vacuumdb.patch

I kept the --disable-index-cleanup option of vacuumdb command since
perhaps it would be understandable to specify this option rather than
setting true/false as a command line option.
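
For reference, with these patches applied the new knobs would be used roughly
as follows (sketch based on the option and reloption names in the patches;
the table name is made up):

  VACUUM (INDEX_CLEANUP false, VERBOSE) some_table;
  ALTER TABLE some_table SET (vacuum_index_cleanup = false);
  -- and, from the shell: vacuumdb --disable-index-cleanup mydb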

Please review the patches.

[1] 
https://www.postgresql.org/message-id/CAD21AoBg8CBf1OAse6ESKJmNBon14h3nAR67nJhZ%3DyujA%2BLk4A%40mail.gmail.com

Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From 6900a0d3fec03748c4d413af4e379a9bf1209718 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Thu, 7 Mar 2019 09:45:11 +0900
Subject: [PATCH v10 1/2] Add INDEX_CLEANUP option to VACUUM command

If this option is false, VACUUM does HOT-pruning for live tuples but
doesn't remove dead tuples completely and disables index vacuum.

vacrelstats->dead_tuples could have tuples that became dead after being
checked at HOT-pruning time, which are not marked as dead. Per
discussion on pgsql-hackers we normally record and remove them, but
with this option we don't process them and leave them for the next vacuum,
to simplify the code. That's okay because it's a very rare condition and
those tuples will be processed by the next vacuum.
---
 doc/src/sgml/ref/create_table.sgml | 17 +++
 doc/src/sgml/ref/vacuum.sgml   | 28 
 src/backend/access/common/reloptions.c | 13 +-
 src/backend/access/heap/vacuumlazy.c   | 82 ++
 src/backend/commands/vacuum.c  | 22 -
 src/backend/postmaster/autovacuum.c|  2 +-
 src/bin/psql/tab-complete.c|  6 +--
 src/include/commands/vacuum.h  |  3 +-
 src/include/utils/rel.h|  1 +
 src/test/regress/expected/vacuum.out   |  3 ++
 src/test/regress/sql/vacuum.sql|  3 ++
 11 files changed, 153 insertions(+), 27 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index e94fe2c..4209743 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -1345,6 +1345,23 @@ WITH ( MODULUS numeric_literal, REM

 

+vacuum_index_cleanup (boolean)
+
+ 
+  Per table setting to use INDEX_CLEANUP option
+  of VACUUM command. The default value is true.
+  If false, autovacuum daemon and VACUUM
+  never perform index vacuuming and index cleanup, that is, always set
+  INDEX_CLEANUP option to false.
+  Note that out of disk space due to index bloat. Setting this parameter to
+  false makes sense to avoid scanning large indexes when
+  the table has a few dead tuples. See  for more
+  details on INDEX_CLEANUP option.
+ 
+
+   
+
+   
 autovacuum_vacuum_threshold, toast.autovacuum_vacuum_threshold (integer)
 
  
diff --git a/doc/src/sgml/ref/vacuum.sgml b/doc/src/sgml/ref/vacuum.sgml
index 906d0c2..3f87979 100644
--- a/doc/src/sgml/ref/vacuum.sgml
+++ b/doc/src/sgml/ref/vacuum.sgml
@@ -32,6 +32,7 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ boolean ]
 DISABLE_PAGE_SKIPPING [ boolean ]
 SKIP_LOCKED [ boolean ]
+INDEX_CLEANUP [ boolean ]
 
 and table_and_columns is:
 
@@ -182,6 +183,26 @@ VACUUM [ FULL ] [ FREEZE ] [ VERBOSE ] [ ANALYZE ] [ ) but not sufficient for
+  avoiding index bloat. It defaults to TRUE.
+  This option is ignored if the table does not 

Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-26 Thread Tomas Vondra

On Tue, Mar 26, 2019 at 11:59:56AM +, Dean Rasheed wrote:

On Mon, 25 Mar 2019 at 23:36, Tomas Vondra  wrote:


Attached is an updated patch, fixing all the issues pointed out so far.
Unless there are some objections, I plan to commit the 0001 part by the
end of this CF. Part 0002 is a matter for PG13, as previously agreed.



Yes, I think that's reasonable. It looks to be in pretty good shape. I
have reviewed most of the actual code, but note that I haven't
reviewed the docs changes and I didn't spend much time reading code
comments. It might benefit from a quick once-over comment tidy up.

I just looked through the latest set of changes and I have a couple of
additional review comments:

In the comment about WIDTH_THRESHOLD, s/pg_statistic/pg_statistic_ext/.

In statext_mcv_build(), I'm not convinced by the logic around keeping
the whole MCV list if it fits. Suppose there were a small number of
very common values, and then a bunch of uniformly distributed less
common values. The sample might consist of all the common values, plus
one or two instances of some of the uncommon ones, leading to a list
that would fit, but it would not be appropriate to keep the uncommon
values on the basis of having seen them only one or two times. The
fact that the list of items seen fits doesn't by itself mean that
they're all common enough to justify being kept. In the per-column
stats case, there are a bunch of other checks that have to pass, which
are intended to test not just that the list fits, but that it believes
that those are all the items in the table. For MV stats, you don't
have that, and so I think it would be best to just remove that test
(the "if (ngroups > nitems)" test) and *always* call
get_mincount_for_mcv_list() to determine how many MCV items to keep.
Otherwise there is a risk of keeping too many MCV items, with the ones
at the tail end of the list producing poor estimates.



I need to think about it a bit, but I think you're right that we may not need
to keep those items. The reason why I decided to keep them is that they
may represent cases where (actual frequency << base frequency). But
now that I think about it, that can probably be handled as

 ((1 - total_sel) / ndistinct)

So yeah, I think I'll just change the code to always use the mincount.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: jsonpath

2019-03-26 Thread Alexander Korotkov
On Tue, Mar 26, 2019 at 5:32 PM Tom Lane  wrote:
> Alexander Korotkov  writes:
> > Got access to that buildfarm animal thanks to Tom Turelinckx.  Now
> > running check-world in a loop on the same commit hash with same build
> > options.  Error wasn't triggered yet.
>
> I notice that snapper is using force_parallel_mode = regress ...
> have you got that enabled in your manual test?

Nope.  Thank you for pointing!  I've rerun my test loop with this...

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: libpq compression

2019-03-26 Thread Konstantin Knizhnik


Here is a version of the patch that works correctly when no compression
algorithms are available.


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

diff --git a/configure b/configure
index 8068108..8ebd961 100755
--- a/configure
+++ b/configure
@@ -701,6 +701,7 @@ ELF_SYS
 EGREP
 GREP
 with_zlib
+with_zstd
 with_system_tzdata
 with_libxslt
 with_libxml
@@ -864,6 +865,7 @@ with_libxml
 with_libxslt
 with_system_tzdata
 with_zlib
+with_zstd
 with_gnu_ld
 enable_largefile
 enable_float4_byval
@@ -8307,6 +8309,86 @@ fi
 
 
 #
+# ZStd
+#
+
+
+
+# Check whether --with-zstd was given.
+if test "${with_zstd+set}" = set; then :
+  withval=$with_zstd;
+  case $withval in
+yes)
+  ;;
+no)
+  :
+  ;;
+*)
+  as_fn_error $? "no argument expected for --with-zstd option" "$LINENO" 5
+  ;;
+  esac
+
+else
+  with_zstd=no
+
+fi
+
+
+
+
+if test "$with_zstd" = yes ; then
+  { $as_echo "$as_me:${as_lineno-$LINENO}: checking for ZSTD_compress in -lzstd" >&5
+$as_echo_n "checking for ZSTD_compress in -lzstd... " >&6; }
+if ${ac_cv_lib_zstd_ZSTD_compress+:} false; then :
+  $as_echo_n "(cached) " >&6
+else
+  ac_check_lib_save_LIBS=$LIBS
+LIBS="-lzstd  $LIBS"
+cat confdefs.h - <<_ACEOF >conftest.$ac_ext
+/* end confdefs.h.  */
+
+/* Override any GCC internal prototype to avoid an error.
+   Use char because int might match the return type of a GCC
+   builtin and then its argument prototype would still apply.  */
+#ifdef __cplusplus
+extern "C"
+#endif
+char ZSTD_compress ();
+int
+main ()
+{
+return ZSTD_compress ();
+  ;
+  return 0;
+}
+_ACEOF
+if ac_fn_c_try_link "$LINENO"; then :
+  ac_cv_lib_zstd_ZSTD_compress=yes
+else
+  ac_cv_lib_zstd_ZSTD_compress=no
+fi
+rm -f core conftest.err conftest.$ac_objext \
+conftest$ac_exeext conftest.$ac_ext
+LIBS=$ac_check_lib_save_LIBS
+fi
+{ $as_echo "$as_me:${as_lineno-$LINENO}: result: $ac_cv_lib_zstd_ZSTD_compress" >&5
+$as_echo "$ac_cv_lib_zstd_ZSTD_compress" >&6; }
+if test "x$ac_cv_lib_zstd_ZSTD_compress" = xyes; then :
+  cat >>confdefs.h <<_ACEOF
+#define HAVE_LIBZSTD 1
+_ACEOF
+
+  LIBS="-lzstd $LIBS"
+
+else
+  as_fn_error $? "library 'zstd' is required for ZSTD support" "$LINENO" 5
+fi
+
+fi
+
+
+
+#
 # Elf
 #
 
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index c1d1b6b..a77adc3 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1135,6 +1135,20 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  compression
+  
+  
+Request compression of libpq traffic. The client sends the server the list of compression algorithms supported by the client library.
+If the server supports one of these algorithms, it acknowledges the use of this algorithm, and all subsequent libpq messages sent from client to server and
+vice versa will be compressed. If the server does not support any of the suggested algorithms, it replies with an 'n' (no compression)
+message, and it is up to the client whether to continue working without compression or to report an error.
+Supported compression algorithms are chosen at configure time. Right now two libraries are supported: zlib (default) and zstd (if Postgres was
+configured with the --with-zstd option). In both cases streaming mode is used.
+  
+  
+ 
+
  
   client_encoding
   
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index d66b860..aaca963 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -92,6 +92,15 @@
such as COPY.
   
 
+  
+It is possible to compress protocol data to reduce traffic and speed up client-server interaction.
+Compression is especially useful for importing/exporting data to/from the database using the COPY command
+and for replication (both physical and logical). Compression can also reduce server response time
+for queries returning large amounts of data (for example JSON, BLOBs, text, ...).
+Right now two libraries are supported: zlib (default) and zstd (if Postgres was
+configured with the --with-zstd option). In both cases streaming mode is used.
+  
+
  
   Messaging Overview
 
@@ -263,6 +272,21 @@
  
 
  
+  CompressionAck
+  
+   
+ The server acknowledges the use of compression for the client-server communication protocol.
+ Compression can be requested by the client by including the "compression" option in the connection string.
+ The client sends the server the list of compression algorithms supported by the client library
+ (each compression algorithm is identified by one letter: 'f' - Facebook zstd, 'z' - zlib, ...).
+ If the server supports one of these algorithms, it acknowledges the use of this algorithm, and all subsequent libpq messages sent from client to server and
+ vice versa will be compressed. If the server does not support any of the suggested algorithms, it replies with the 'n' (no compression)
+ algorithm 

Re: FETCH FIRST clause PERCENT option

2019-03-26 Thread Tomas Vondra

On Tue, Mar 26, 2019 at 03:06:52PM +0300, Surafel Temesgen wrote:

On Thu, Feb 28, 2019 at 11:16 PM Tomas Vondra 
wrote:



To give you a (admittedly, somewhat contrived and artificial example):

SELECT * FROM t1 WHERE id IN (
  SELECT id FROM t2 ORDER BY x FETCH FIRST 10 PERCENT ROWS ONLY
);

Maybe this example is bogus and/or does not really matter in practice. I
don't know, but I've been unable to convince myself that's the case.



Does this mean we abandon the incremental approach? And I am not sure whether
calculating the percentage after the OFFSET clause is acceptable or not.



I don't follow. I don't think I've suggested abandoning the incremental
approach - I've explained why I think it's what the patch should be doing,
and illustrated that with an example. I admit the example may be too
artificial and we may never (or only very infrequently) generate such plans,
but no such argument has been presented here.

As for OFFSET, I don't see why that would be incompatible with the PERCENT
clause. I'm not sure whether it should compute the percentage from the total
number of rows (including those skipped by the OFFSET) or only from the rest,
but I assume the SQL standard says something about that.
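
To make the ambiguity concrete, here is a sketch using the proposed syntax
(table and column names are made up for illustration):

SELECT *
FROM events
ORDER BY created_at
OFFSET 20 ROWS
FETCH FIRST 10 PERCENT ROWS ONLY;
-- is "10 percent" computed over all qualifying rows, or only over the rows
-- that remain after the 20 skipped by OFFSET?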

cheers


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: psql display of foreign keys

2019-03-26 Thread Alvaro Herrera
On 2019-Mar-26, Peter Eisentraut wrote:

> On 2019-03-26 03:42, Alvaro Herrera wrote:
> > Patch tester didn't like that one bit.  Here's v10 with the fixup
> > applied.
> 
> Looks good to me.

Thanks!

I ran "make installcheck-parallel" using this psql version on all
supported branches plus 9.2.  There were some expected failures, such as
some changed table titles, "FDW options" changed to "FDW Options", and
things like that; but AFAICT other than those everything works as
expected.  (I did find a couple of bugs this way.)

Now pushed.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Block level parallel vacuum

2019-03-26 Thread Masahiko Sawada
On Tue, Mar 26, 2019 at 10:19 AM Haribabu Kommi
 wrote:
>
>
> On Fri, Mar 22, 2019 at 4:06 PM Masahiko Sawada  wrote:
>>
>>
>> Attached the updated version patch. 0001 patch allows all existing
>> vacuum options to take a boolean argument. 0002 patch introduces parallel
>> lazy vacuum. 0003 patch adds -P (--parallel) option to vacuumdb
>> command.
>
>
> Thanks for sharing the updated patches.

Thank you for reviewing the patch.

>
> 0001 patch:
>
> +PARALLEL [ N ]
>
> But this patch contains syntax of PARALLEL but no explanation, I saw that
> it is explained in 0002. It is not a problem, but just mentioning.

Oops, removed it from 0001.

>
> +  Specifies parallel degree for PARALLEL option. The
> +  value must be at least 1. If the parallel degree
> +  integer is omitted, then
> +  VACUUM decides the number of workers based on 
> number of
> +  indexes on the relation which further limited by
> +  .
>
> Can we add some more details about backend participation also, parallel 
> workers will
> come into picture only when there are 2 indexes in the table.

Agreed. I've added the description, noting that such behavior might change
in a future release.
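
For reference, a sketch of how the documented behavior plays out with the
proposed syntax (the table name is made up, and the cap by
max_parallel_maintenance_workers is my reading of the elided xref above):

VACUUM (PARALLEL 4) orders;   -- request up to 4 parallel workers explicitly
VACUUM (PARALLEL) orders;     -- degree omitted: chosen from the number of
                              -- indexes, capped by max_parallel_maintenance_workers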

>
> + /*
> + * Do post-vacuum cleanup and statistics update for each index if
> + * we're not in parallel lazy vacuum. If in parallel lazy vacuum, do
> + * only post-vacum cleanup and then update statistics after exited
> + * from parallel mode.
> + */
> + lazy_vacuum_all_indexes(vacrelstats, Irel, nindexes, indstats,
> + lps, true);
>
> How about renaming the above function, as it does the cleanup also?
> lazy_vacuum_or_cleanup_all_indexes?

Agreed. I think lazy_vacuum_or_cleanup_indexes would be better. Also
the same is true for lazy_vacuum_indexes_for_workers(). Fixed.

>
>
> + if (!IsInParallelVacuum(lps))
> + {
> + /*
> + * Update index statistics. If in parallel lazy vacuum, we will
> + * update them after exited from parallel mode.
> + */
> + lazy_update_index_statistics(Irel[idx], stats[idx]);
> +
> + if (stats[idx])
> + pfree(stats[idx]);
> + }
>
> The above check in lazy_vacuum_all_indexes can be combined with the outer
> if check where the memcpy is happening. I still feel that the logic around
> the stats makes it a little bit complex.

Hmm, the memcpy is needed in both the index vacuuming and index cleanup
cases, but updating the index statistics is needed only in the index cleanup
case. I've split the code for parallel index vacuuming out of
lazy_vacuum_or_cleanup_indexes. Also, I've changed the patch so that both the
leader process and the worker processes use the same code for index vacuuming
and index cleanup. I hope the code became simpler and more understandable.

>
> + if (IsParallelWorker())
> + msg = "scanned index \"%s\" to remove %d row versions by parallel vacuum 
> worker";
> + else
> + msg = "scanned index \"%s\" to remove %d row versions";
>
> I feel, this way of error message may not be picked for the translations.
> Is there any problem if we duplicate the entire ereport message with changed 
> message?

No, but I'd like to avoid writing the same arguments in multiple
ereport()s. I've added gettext_noop() as per the comment from Horiguchi-san.

>
> + for (i = 0; i < nindexes; i++)
> + {
> + LVIndStats *s = &(copied_indstats[i]);
> +
> + if (s->updated)
> + lazy_update_index_statistics(Irel[i], &(s->stats));
> + }
> +
> + pfree(copied_indstats);
>
> why can't we use the shared memory directly to update the stats once all the 
> workers
> are finished, instead of copying them to a local memory?

Since we cannot use heap_inplace_update(), which is called by
vac_update_relstats(), during parallel mode, I copied the stats. Is that
safe if we destroy the parallel context *after* exiting parallel mode?

>
> + tab->at_params.nworkers = 0; /* parallel lazy autovacuum is not supported */
>
> The user is not required to provide a worker count for parallel vacuum to
> work, so just setting the above parameter doesn't stop the parallel workers;
> the user must also pass the PARALLEL option. So mentioning that will be
> helpful later when we start supporting it, or for someone reading the code.
>

We should set at_params.nworkers to -1 here to disable parallel lazy
vacuum. And this review comment got me thinking that VACOPT_PARALLEL
can be combined with nworkers of VacuumParams. So I've removed
VACOPT_PARALLEL, and passing nworkers >= 0 now enables parallel lazy
vacuum.






Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
From f243cf3b1b22136cad8fe981cf3b51f3cc44dfc5 Mon Sep 17 00:00:00 2001
From: Masahiko Sawada 
Date: Tue, 26 Mar 2019 22:13:53 +0900
Subject: [PATCH v20 1/3] All VACUUM command options allow an argument.

All existing VACUUM command options allow a boolean argument like
EXPLAIN command.
---
 doc/src/sgml/ref/vacuum.sgml  | 26 --
 src/backend/commands/vacuum.c | 13 +++--
 

Re: jsonpath

2019-03-26 Thread Tom Lane
Alexander Korotkov  writes:
> Got access to that buildfarm animal thanks to Tom Turelinckx.  Now
> running check-world in a loop on the same commit hash with same build
> options.  Error wasn't triggered yet.

I notice that snapper is using force_parallel_mode = regress ...
have you got that enabled in your manual test?

regards, tom lane



Re: Shouldn't current_schema() be at least PARALLEL RESTRICTED?

2019-03-26 Thread Daniel Gustafsson
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

Reviewing the codepath in question (as well as the commit that changed the 
behaviour of fetch_search_path()), switching current_schema(s) to 
parallel-unsafe seems like the correct solution. check-world tests pass cleanly 
and no documentation patch is required.
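
For anyone verifying the change locally, one quick catalog check (the expected
marking is my inference from the review above, not something spelled out in it):

SELECT proname, proparallel
FROM pg_proc
WHERE proname IN ('current_schema', 'current_schemas');
-- with the fix applied, proparallel should read 'u' (unsafe) rather than 's' (safe)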

Switching to ready for committer.

The new status of this patch is: Ready for Committer


Re: Misleading errors with column references in default expressions and partition bounds

2019-03-26 Thread Tom Lane
Michael Paquier  writes:
> One idea which came from Amit, and which seems like a good idea to me,
> would be to have more context-related error messages directly in
> transformColumnRef(), so that we can discard at an early stage column
> references that appear in contexts where they have no meaning.

+1 for the general idea, but I find the switch a bit overly verbose.
Do we really need to force every new EXPR_KIND to visit this spot,
when so few of them have a need to do anything?  I'd be a bit inclined
to simplify it to

switch (pstate->p_expr_kind)
{
case EXPR_KIND_COLUMN_DEFAULT:
ereport(...);
break;
case EXPR_KIND_PARTITION_BOUND:
ereport(...);
break;
default:
break;
}

That's just a nitpick though.

> While this takes care of the RTE issues, this has a downside though.
> Take for example this case using an expression with an aggregate and
> a column reference: 
> =# CREATE TABLE part_bogus_expr_fail PARTITION OF list_parted
>FOR VALUES IN (sum(a));
> -ERROR:  aggregate functions are not allowed in partition bound
> +ERROR:  cannot use column reference in partition bound expression

I don't see that as an issue.

> The docs of CREATE TABLE also look incorrect to me when it comes to
> default expressions.  It says the following: "other columns in the
> current table are not allowed".  However *all* columns are not
> authorized, including the column which uses the expression.

I think the idea is that trying to reference another column is something
that people might well try to do, whereas referencing the DEFAULT's
own column is obviously silly.  In particular the use of "cross-reference"
implies that another column is what is being referenced.  If we dumb it
down to just "references to columns in the current table", then it's
consistent, but it's also completely redundant with the main part of the
sentence.  It doesn't help that somebody decided to jam the independent
issue of subqueries into the same sentence.  In short, maybe it'd be
better like this:

   ... The value
   is any variable-free expression (in particular, cross-references
   to other columns in the current table are not allowed).  Subqueries
   are not allowed either.
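
A minimal illustration of the restriction being described (names invented for
the example; with the patch discussed above, the error would presumably call
out the column reference explicitly rather than being misleading):

CREATE TABLE items (
    net   numeric,
    gross numeric DEFAULT net * 1.2   -- rejected: a DEFAULT expression may not
                                      -- reference any column of the table
);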

regards, tom lane



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-26 Thread Dean Rasheed
On Tue, 26 Mar 2019 at 11:59, Dean Rasheed  wrote:
>
> On Mon, 25 Mar 2019 at 23:36, Tomas Vondra  
> wrote:
> >
> > Attached is an updated patch...
>
> I just looked through the latest set of changes and I have a couple of
> additional review comments:
>

I just spotted another issue while reading the code:

It's possible to build an MCV list with more than
STATS_MCVLIST_MAX_ITEMS = 8192 items, which then causes an error when
the code tries to read it back in:

create temp table foo(a int, b int);
insert into foo select x,x from generate_series(1,10000) g(x);
insert into foo select x,x from generate_series(1,10000) g(x);
alter table foo alter column a set statistics 10000;
alter table foo alter column b set statistics 10000;
create statistics s (mcv) on a,b from foo;
analyse foo;
select * from foo where a=1 and b=1;

ERROR:  invalid length (10000) item array in MCVList

So this needs to be checked when building the MCV list.

In fact, the stats targets for table columns can be as large as 10000
(a hard-coded constant in tablecmds.c, which is pretty ugly, but
that's a different matter), so I think STATS_MCVLIST_MAX_ITEMS
probably ought to match that.

There are also a couple of comments that refer to the 8k limit, which
would need updating, if you change it.

Regards,
Dean



Re: Special role for subscriptions

2019-03-26 Thread Robert Haas
On Thu, Mar 21, 2019 at 9:28 PM Michael Paquier  wrote:
> By the way, as the commit fest is coming to its end in a couple of
> days, and that we are still discussing how the thing should be shaped,
> I would recommend to mark the patch as returned with feedback.  Any
> objections with that?

+1.  It doesn't even seem that we agree on how it should be designed,
still less the implementation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Re: reloption to prevent VACUUM from truncating empty pages at the end of relation

2019-03-26 Thread Robert Haas
On Tue, Mar 26, 2019 at 3:57 AM Tsunakawa, Takayuki
 wrote:
> From: David Steele [mailto:da...@pgmasters.net]
> > This patch appears to have been stalled for a while.
> >
> > Takayuki -- the ball appears to be in your court.  Perhaps it would be
> > helpful to summarize what you think are next steps?
>
> disable_index_cleanup is handled by Sawada-san in another thread.  I 
> understand I've reflected all review comments in the latest patch, and 
> replied to the opinions/proposals, so the patch status is kept "needs 
> review."  (I hope new fire won't happen...)

I don't see a patch with the naming updated, here or there, and I'm
going to be really unhappy if we end up with inconsistent naming
between two patches that do such fundamentally similar things.  -1
from me to committing either one until that inconsistency is resolved.
I have made a proposal for resolving it in a way that I think would be
satisfactory and best; other options might also exist; the patch looks
unproblematic otherwise; but I don't think it is committable as it
stands.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Offline enabling/disabling of data checksums

2019-03-26 Thread Fabien COELHO


Bonjour Michaël,


Here is an attempt at improving the Notes. [...]


So, the ordering of the notes for each paragraph is as follows: 1) 
Replication issues when mixing different checksum setups across nodes. 
2) Consistency of the operations if killed. 3) Don't start Postgres 
while the operation runs.


Your proposal is to switch the order of the paragraphs to 3), 1) and
then 2).


Yes. I suggest emphasizing the cluster corruption risks by putting them
first.



Do others have any opinion?  I am fine with the current
order of things, still it may make sense to tweak the docs.

In the paragraph related to replication, the second statement is
switched to be first so as the docs warn first, and then give
recommendations.


Yep.


This part makes sense.


Yep!


I am not sure that "checksum status" is a correct term.  It seems to
me that "same configuration for data checksums as before the tool ran"
or something like that would be more correct.


Possibly, I cannot say.

--
Fabien.

Re: Refactoring the checkpointer's fsync request queue

2019-03-26 Thread Robert Haas
On Tue, Mar 26, 2019 at 12:20 AM Thomas Munro  wrote:
> I've been trying to decide if that is a problem.  Maybe there is a
> performance angle, and I'm also wondering if it might increase the
> risk of missing a write-back error.  Of course we'll find a proper
> solution to that problem (perhaps fd-passing or sync-on-close[1]), but
> I don't want to commit anything in the name of refactoring that might
> make matters worse incidentally.  Or perhaps those worries are bogus,
> since the checkpointer calls smgrcloseall() at the end anyway.
>
> On balance, I'm inclined to err on the side of caution and try to keep
> things a bit closer to the way they are for now.
>
> Here's a fixup patch.  0001 is the same as Shawn's v12 patch, and 0002
> has my proposed changes to switch to callbacks that actually perform
> the sync and unlink operations given a file tag, and do so via the
> SMGR fd cache, rather than exposing the path to sync.c.  This moves us
> back towards the way I had it in an earlier version of the patch, but
> instead of using smgrsyncimmed() as I had it, it goes via Shawn's new
> sync handler function lookup table, allowing for non-smgr components
> to use this machinery in future (as requested by Andres).

Strong +1.  Not only might closing and reopening the files have
performance and reliability implications, but a future smgr might talk
to the network, having no local file to sync.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improvement of installation document

2019-03-26 Thread Tatsuo Ishii
> One of our clients suggested that the installation document[1] lacks 
> description
> about requirements of installing *-devel packages. For example, 
> postgresqlxx-devel
> is required for using --with-pgsql, and openssl-devel for --with-openssl, and 
> so on,
> but these are not documented.
> 
> [1] http://www.pgpool.net/docs/pgpool-II-3.7.4/en/html/install-pgpool.html
> 
> I know the document of PostgreSQL[2] also lacks the description about 
> openssl-devel,
> kerberos-devel, etc. (except for readline-devel). However, it would be 
> convenient
> for users who want to install Pgpool-II from source code if the required 
> packages
> for installation are described in the document explicitly.
> 
> [2] https://www.postgresql.org/docs/current/install-requirements.html
> 
> Is it not worth to consider this?

I am against the idea.

Development package names can differ across distributions/OSes. For
example, the development package of OpenSSL is "openssl-dev", not
"openssl-devel", on Debian or Debian-derived systems.

Another reason is that a user who is installing software from source code
should already be familiar with the fact that each piece of software
requires its development libraries.

In summary, adding a not-so-complete list of development package names to
our document would give incorrect information to novice users and would be
annoying for skilled users.

> BTW, the Pgpool-II doc[2] says:
> 
> --with-memcached=path
> Pgpool-II binaries will use memcached for in memory query cache. You have 
> to install libmemcached. 
> 
> , but maybe libmemcached-devel is correct instead of libmemcached?

I don't think so. "libmemcached-devel" is just a package name in a
certain Linux distribution. "libmemcached" is a more general,
distribution-independent term.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: partitioned tables referenced by FKs

2019-03-26 Thread Jesper Pedersen

Hi Amit,

On 3/26/19 2:06 AM, Amit Langote wrote:

Wouldn't you get the same numbers on HEAD too?  IOW, I'm not sure how the
patch here, which seems mostly about getting DDL in order to support
foreign keys on partitioned tables, would have affected the result of this
benchmark.  Can you clarify your intention of running this benchmark
against these patches?



Yeah, you are right. Since I'm also seeing this issue with the following test case,

-- ddl.sql --
CREATE TABLE t1 (i1 INT PRIMARY KEY, i2 INT NOT NULL) PARTITION BY HASH 
(i1);

CREATE TABLE t2 (i1 INT PRIMARY KEY, i2 INT NOT NULL);

\o /dev/null
SELECT 'CREATE TABLE t1_p' || x::text || ' PARTITION OF t1
FOR VALUES WITH (MODULUS 64, REMAINDER ' || x::text || ');'
from generate_series(0,63) x;
\gexec
\o

ALTER TABLE t1 ADD CONSTRAINT fk_t1_i2_t2_i1 FOREIGN KEY (i2) REFERENCES 
t2(i1);


ANALYZE;

we shouldn't focus on it in this thread.

Best regards,
 Jesper



Re: pg_upgrade: Pass -j down to vacuumdb

2019-03-26 Thread Tom Lane
Peter Eisentraut  writes:
> On 2019-03-25 22:57, Tom Lane wrote:
>> +fprintf(script, "echo %sYou may wish to add --jobs=N for parallel 
>> analyzing.%s\n",
>> +ECHO_QUOTE, ECHO_QUOTE);

> But then you get that information after you have already started the script.

Yes, but that's already true of all the info that this script prints out.

> I don't find any information about this analyze business on the
> pg_upgrade reference page.  Maybe a discussion there could explain the
> different paths better than making the output script extra complicated.

> Essentially: If you want a slow and gentle analyze, use the supplied
> script.  If you want a fast analyze, use vacuumdb, perhaps with an
> appropriate --jobs option.  Note that pg_upgrade --jobs and vacuumdb
> --jobs are resource-bound in different ways, so the same value might not
> be appropriate for both.

I have no objection to handling it that way (i.e., just as a doc fix).
Or we could do both.

regards, tom lane



Should the docs have a warning about pg_stat_reset()?

2019-03-26 Thread David Rowley
As I mentioned in [1], I've had a few cases recently about auto-vacuum
not working. On the other thread, it was all about auto-vacuum being
configured to run too slowly.  The other culprit for auto-vacuum not
working is when people periodically use pg_stat_reset().

The problem with pg_stat_reset() is that it zeros n_dead_tup and
n_mod_since_analyze.  If, say, a user resets the stats on a monthly
basis, then tables that are normally auto-vacuumed less frequently than
once per month might never receive an auto-vacuum... at least not until
an anti-wraparound vacuum gets hold of them.

The best I can think of to avoid this is to put a giant WARNING in the
docs about either not using it at all, or at least running ANALYZE after
using it.
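
As a concrete form of that advice, something like:

SELECT pg_stat_reset();   -- zeros counters such as n_dead_tup and n_mod_since_analyze
ANALYZE;                  -- as suggested above: re-analyze so reltuples and the
                          -- planner statistics are fresh again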

Does anyone else think this is a problem worth trying to solve?

[1] 
https://www.postgresql.org/message-id/CAKJS1f_YbXC2qTMPyCbmsPiKvZYwpuQNQMohiRXLj1r=8_r...@mail.gmail.com

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Problems with plan estimates in postgres_fdw

2019-03-26 Thread Etsuro Fujita

(2019/03/20 20:47), Etsuro Fujita wrote:

Attached is an updated version of the patch set.


One thing I noticed while self-reviewing the patch for UPPERREL_FINAL 
is: the previous versions of the patch don't show EPQ plans in EXPLAIN, 
as shown in the below example, so we can't check if those plans are 
constructed correctly, which I'll explain below:


* HEAD
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;



                                   QUERY PLAN
---------------------------------------------------------------------------
 Limit
   Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
   ->  LockRows
 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 ->  Foreign Scan
   Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN 
(r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, 
r1.c6, r1.c7, r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN 
ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM 
("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" 
ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1

-->->  Result
-->  Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
 ->  Sort
   Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
   Sort Key: t1.c3, t1.c1
   ->  Merge Join
 Output: t1.c1, t1.c3, t1.*, t2.c1, t2.*
 Merge Cond: (t1.c1 = t2.c1)
 ->  Sort
   Output: t1.c1, t1.c3, t1.*
   Sort Key: t1.c1
   ->  Foreign Scan on public.ft1 t1
 Output: t1.c1, t1.c3, t1.*
 Remote SQL: SELECT "C 1", 
c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE

 ->  Sort
   Output: t2.c1, t2.*
   Sort Key: t2.c1
   ->  Foreign Scan on public.ft2 t2
 Output: t2.c1, t2.*
 Remote SQL: SELECT "C 1", 
c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1"

(28 rows)

* Patched
EXPLAIN (VERBOSE, COSTS OFF)
SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY 
t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1;




                                   QUERY PLAN
---------------------------------------------------------------------------
 Foreign Scan
   Output: t1.c1, t2.c1, t1.c3, t1.*, t2.*
   Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2)
   Remote SQL: SELECT r1."C 1", r2."C 1", r1.c3, CASE WHEN (r1.*)::text 
IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, 
r1.c8) END, CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, 
r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER 
JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC 
NULLS LAST, r1."C 1" ASC NULLS LAST LIMIT 10::bigint OFFSET 100::bigint 
FOR UPDATE OF r1

(4 rows)

In HEAD, this test case checks that a Result node is inserted atop of an 
EPQ plan for a foreign join (if necessary) when the foreign join is at 
the topmost join level (see discussion [1]), but the patched version 
doesn't show EPQ plans in EXPLAIN, so we can't check that as-is.  Should 
we show EPQ plans in EXPLAIN?  My inclination would be to not show them, 
because that information would be completely useless for the user.


Another thing I noticed is: the previous versions make pointless another 
test case added by commit 4bbf6edfb (and adjusted by commit 99f6a17dd) 
that checks an EPQ plan constructed from multiple merge joins, because 
they changed a query plan for that test case like the above.  (Actually, 
though, that test case doesn't need that EPQ plan already 

Re: warning to publication created and wal_level is not set to logical

2019-03-26 Thread Lucas Viecelli
> >> One idea that might be useful is to have walsenders refuse to transmit
> >> any logical-replication data if they see wal_level is too low.  That
> >> would get users' attention pretty quickly.
>
> > They do:
>

I checked this before creating the patch


>
> Oh, OK, then this seems like it's basically covered already.  I think
> the original suggestion to add a WARNING during CREATE PUBLICATION
> isn't unreasonable.  But we don't need to do more than that (and it
> shouldn't be higher than WARNING).
>

Okay, I think it will improve new users' understanding.
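
For context, the situation the proposed WARNING is aimed at looks roughly like
this (the exact message text is up to the patch and is not shown here):

SHOW wal_level;                              -- 'replica', i.e. lower than 'logical'
CREATE PUBLICATION pub_all FOR ALL TABLES;   -- succeeds silently today
-- with the patch, a WARNING would point out that no logical changes can be
-- streamed to subscribers until wal_level is raised to 'logical'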

Since everything is fine, thank you all for the comments
-- 

Atenciosamente.

*Lucas Viecelli*




Re: FETCH FIRST clause PERCENT option

2019-03-26 Thread Surafel Temesgen
On Thu, Feb 28, 2019 at 11:16 PM Tomas Vondra 
wrote:

>
> To give you a (admittedly, somewhat contrived and artificial example):
>
> SELECT * FROM t1 WHERE id IN (
>   SELECT id FROM t2 ORDER BY x FETCH FIRST 10 PERCENT ROWS ONLY
> );
>
> Maybe this example is bogus and/or does not really matter in practice. I
> don't know, but I've been unable to convince myself that's the case.


Does this mean we abandon the incremental approach? And I am not sure whether
calculating the percentage after the OFFSET clause is acceptable or not.

regards
Surafel


Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-03-26 Thread Dean Rasheed
On Mon, 25 Mar 2019 at 23:36, Tomas Vondra  wrote:
>
> Attached is an updated patch, fixing all the issues pointed out so far.
> Unless there are some objections, I plan to commit the 0001 part by the
> end of this CF. Part 0002 is a matter for PG13, as previously agreed.
>

Yes, I think that's reasonable. It looks to be in pretty good shape. I
have reviewed most of the actual code, but note that I haven't
reviewed the docs changes and I didn't spend much time reading code
comments. It might benefit from a quick once-over comment tidy up.

I just looked through the latest set of changes and I have a couple of
additional review comments:

In the comment about WIDTH_THRESHOLD, s/pg_statistic/pg_statistic_ext/.

In statext_mcv_build(), I'm not convinced by the logic around keeping
the whole MCV list if it fits. Suppose there were a small number of
very common values, and then a bunch of uniformly distributed less
common values. The sample might consist of all the common values, plus
one or two instances of some of the uncommon ones, leading to a list
that would fit, but it would not be appropriate to keep the uncommon
values on the basis of having seen them only one or two times. The
fact that the list of items seen fits doesn't by itself mean that
they're all common enough to justify being kept. In the per-column
stats case, there are a bunch of other checks that have to pass, which
are intended to test not just that the list fits, but that it believes
that those are all the items in the table. For MV stats, you don't
have that, and so I think it would be best to just remove that test
(the "if (ngroups > nitems)" test) and *always* call
get_mincount_for_mcv_list() to determine how many MCV items to keep.
Otherwise there is a risk of keeping too many MCV items, with the ones
at the tail end of the list producing poor estimates.
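
For instance, a distribution of the kind described above could be built like
this (purely an illustration, not taken from the patch's regression tests):

CREATE TABLE skewed (a int, b int);
INSERT INTO skewed SELECT 1, 1 FROM generate_series(1, 100000);   -- one very common value
INSERT INTO skewed SELECT g, g FROM generate_series(2, 5001) g;   -- many rare values
CREATE STATISTICS skewed_mcv (mcv) ON a, b FROM skewed;
ANALYZE skewed;
-- the sample will contain the common value many times but each rare value at
-- most once or twice; keeping those rare values as MCV items would credit
-- them with a much higher frequency than they really have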

Regards,
Dean



Re: Improvement of installation document

2019-03-26 Thread Yugo Nagata
Hi,

I apologize that I accidentally sent the following email to this list.
Please disregard this.

I am sorry for making a lot of noise.

Regard,

On Tue, 26 Mar 2019 20:38:31 +0900
Yugo Nagata  wrote:

> Hi,
> 
> One of our clients suggested that the installation document[1] lacks 
> description
> about requirements of installing *-devel packages. For example, 
> postgresqlxx-devel
> is required for using --with-pgsql, and openssl-devel for --with-openssl, and 
> so on,
> but these are not documented.
> 
> [1] http://www.pgpool.net/docs/pgpool-II-3.7.4/en/html/install-pgpool.html
> 
> I know the document of PostgreSQL[2] also lacks the description about 
> openssl-devel,
> kerberos-devel, etc. (except for readline-devel). However, it would be 
> convenient
> for users who want to install Pgpool-II from source code if the required 
> packages
> for installation are described in the document explicitly.
> 
> [2] https://www.postgresql.org/docs/current/install-requirements.html
> 
> Is it not worth to consider this?
> 
> 
> BTW, the Pgpool-II doc[2] says:
> 
> --with-memcached=path
> Pgpool-II binaries will use memcached for in memory query cache. You have 
> to install libmemcached. 
> 
> , but maybe libmemcached-devel is correct instead of libmemcached?
> 
> Regards,
> 
> -- 
> Yugo Nagata 
> 
> 
> -- 
> Yugo Nagata 
> 


-- 
Yugo Nagata 


