Re: How does the planner determine plan_rows ?

2019-01-10 Thread Donald Dong


> On Jan 10, 2019, at 8:01 PM, Tom Lane  wrote:
> 
> ... estimate_rel_size() in plancat.c is where to look to find out
> about the planner's default estimates when it's lacking hard data.

Thank you! Now I see how the planner uses the rows to estimate the cost and
generates the best_plan.

To me, tracing the function calls is not a simple task. I'm using cscope, and I
use printf when I'm not entirely sure. I was considering using gdb, but I'm
having issues referencing the source code in gdb.

I'm very interested in learning how the professionals explore the codebase!


Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-10 Thread Michael Paquier
On Fri, Jan 11, 2019 at 10:50:32AM +0900, Amit Langote wrote:
> Okay, I withdraw my objection to the wording proposed by you.

Thanks.  I can commit this version if there are no objections then.
--
Michael




RE: speeding up planning with partitions

2019-01-10 Thread Imai, Yoshikazu
On Thu, Jan 10, 2019 at 6:10 PM, Imai, Yoshikazu wrote:
> > Does that not mean that the if (parent->top_parent_relids) will always
> > be false in build_dummy_partition_rel() and it'll only ever get
> > rtekind == RTE_RELATION?
> 
> At least, I confirmed that if (parent->top_parent_relids) can be true when I
> execute the SQL below.
> 
> select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 15;

Sorry, I also made a mistake. What I actually executed was:
select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.a < 15;

--
Yoshikazu Imai


> -Original Message-
> From: Imai, Yoshikazu [mailto:imai.yoshik...@jp.fujitsu.com]
> Sent: Friday, January 11, 2019 3:10 PM
> To: 'David Rowley' ; Amit Langote
> 
> Cc: Amit Langote ; Alvaro Herrera
> ; Pg Hackers 
> Subject: RE: speeding up planning with partitions
> 
> Hi David,
> 
> On Thu, Jan 10, 2019 at 4:02 PM, David Rowley wrote:
> > 3. I wonder if there's a better way to handle what
> > build_dummy_partition_rel() does. I see it sets the child's relid to the
> > parent's relid and makes up a fake AppendRelInfo and puts it in the
> > parent's slot.  What's going to happen when the parent is not the
> > top-level parent? It'll already have an AppendRelInfo set.
> ...
> >
> > select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id
> > < 10;
> 
> I think there is a mistake in the select SQL.
> "p1.id < 10" doesn't prune any partition because tables are partitioned
> by column "a" in your definition. Isn't that right?
> 
> > Does that not mean that the if (parent->top_parent_relids) will always
> > be false in build_dummy_partition_rel() and it'll only ever get
> > rtekind == RTE_RELATION?
> 
> At least, I confirmed that if (parent->top_parent_relids) can be true when I
> execute the SQL below.
> 
> select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id
> < 15;
> 
> I couldn't check the other points you mentioned, but I also think
> build_dummy_partition_rel() needs more consideration; its logic seemed
> complicated when I was looking around this area.
> 
> 
> Amit,
> I also realized there are some mistakes in the comments around this
> function.
> 
> + * build_dummy_partition_rel
> + *   Build a RelOptInfo and AppendRelInfo for a pruned
> partition
> s/and AppendRelInfo/and an AppendRelInfo/
> 
> +  * Now we'll need a (no-op) AppendRelInfo for parent, because
> we're
> +  * setting the dummy partition's relid to be same as the parent's.
> s/a \(no-op\) AppendRelInfo/an \(no-op\) AppendRelInfo/
> 
> --
> Yoshikazu Imai



Re: speeding up planning with partitions

2019-01-10 Thread Amit Langote
On 2019/01/11 11:07, Amit Langote wrote:
> On 2019/01/11 6:47, David Rowley wrote:
>> On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera  
>> wrote:
>>> Pushed 0001 with some minor tweaks, thanks.
>>
>> Thanks for pushing.  I had looked at 0001 last night and there wasn't
>> much to report, just:
>>
>> v12 0001
>>
>> 1. I see you've moved translate_col_privs() out of prepunion.c into
>> appendinfo.c. This required making it an external function, but its
>> only use is in inherit.c, so would it not be better to put it there
>> and keep it static?
> 
> Actually, I *was* a bit puzzled about where to put it.  I tend to agree with
> you now that it might be better to define it locally within inherit.c, as it
> might not be needed elsewhere.  Let's hear what Alvaro thinks.  I'm attaching
> a patch anyway.
> 
>> 2. The following two lines I think need to swap their order.
>>
>> +#include "utils/rel.h"
>> +#include "utils/lsyscache.h"
> 
> Oops, fixed.
> 
>> Both are pretty minor details but thought I'd post them anyway.
> 
> Thank you for reporting.
> 
> Attached find the patch.

Looking around a bit more, I started thinking even build_child_join_sjinfo
doesn't belong in appendinfo.c (just to be clear, it was defined in
prepunion.c before this commit), so maybe we should move it to joinrels.c
and instead export adjust_child_relids that's required by it from
appendinfo.c.  There's already adjust_child_relids_multilevel in
appendinfo.h, so having adjust_child_relids next to it isn't too bad.  At
least not as bad as appendinfo.c exporting build_child_join_sjinfo for
joinrels.c to use.

I have updated the patch.

Thanks,
Amit
diff --git a/src/backend/optimizer/path/joinrels.c b/src/backend/optimizer/path/joinrels.c
index 38eeb23d81..db18feccfe 100644
--- a/src/backend/optimizer/path/joinrels.c
+++ b/src/backend/optimizer/path/joinrels.c
@@ -46,6 +46,9 @@ static void try_partitionwise_join(PlannerInfo *root, RelOptInfo *rel1,
                    List *parent_restrictlist);
 static int match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel,
                             bool strict_op);
+static SpecialJoinInfo *build_child_join_sjinfo(PlannerInfo *root,
+                            SpecialJoinInfo *parent_sjinfo,
+                            Relids left_relids, Relids right_relids);
 
 
 /*
@@ -1582,3 +1585,45 @@ match_expr_to_partition_keys(Expr *expr, RelOptInfo *rel, bool strict_op)
 
     return -1;
 }
+
+/*
+ * Construct the SpecialJoinInfo for a child-join by translating
+ * SpecialJoinInfo for the join between parents. left_relids and right_relids
+ * are the relids of left and right side of the join respectively.
+ */
+static SpecialJoinInfo *
+build_child_join_sjinfo(PlannerInfo *root, SpecialJoinInfo *parent_sjinfo,
+                        Relids left_relids, Relids right_relids)
+{
+    SpecialJoinInfo *sjinfo = makeNode(SpecialJoinInfo);
+    AppendRelInfo **left_appinfos;
+    int         left_nappinfos;
+    AppendRelInfo **right_appinfos;
+    int         right_nappinfos;
+
+    memcpy(sjinfo, parent_sjinfo, sizeof(SpecialJoinInfo));
+    left_appinfos = find_appinfos_by_relids(root, left_relids,
+                                            &left_nappinfos);
+    right_appinfos = find_appinfos_by_relids(root, right_relids,
+                                             &right_nappinfos);
+
+    sjinfo->min_lefthand = adjust_child_relids(sjinfo->min_lefthand,
+                                               left_nappinfos, left_appinfos);
+    sjinfo->min_righthand = adjust_child_relids(sjinfo->min_righthand,
+                                                right_nappinfos,
+                                                right_appinfos);
+    sjinfo->syn_lefthand = adjust_child_relids(sjinfo->syn_lefthand,
+                                               left_nappinfos, left_appinfos);
+    sjinfo->syn_righthand = adjust_child_relids(sjinfo->syn_righthand,
+                                                right_nappinfos,
+                                                right_appinfos);
+    sjinfo->semi_rhs_exprs = (List *) adjust_appendrel_attrs(root,
+                                                             (Node *) sjinfo->semi_rhs_exprs,
+                                                             right_nappinfos,
+

Re: New vacuum option to do only freezing

2019-01-10 Thread Masahiko Sawada
On Thu, Jan 10, 2019 at 2:45 PM Masahiko Sawada  wrote:
>
> On Thu, Jan 10, 2019 at 5:23 AM Bossart, Nathan  wrote:
> >
> > Hi,
> >
> > On 1/8/19, 7:03 PM, "Masahiko Sawada"  wrote:
> > > Attached the updated version patch incorporated all comments I got.
> >
> > Thanks for the new patch!
> >
> > > * Instead of freezing xmax I've changed the behaviour of the new
> > > option (DISABLE_INDEX_CLEANUP) so that it sets dead tuples as dead
> > > instead of as unused and skips both index vacuum and index cleanup.
> > > That is, we remove the storage of dead tuples but leave dead itemids
> > > for the index lookups. These are removed by the next vacuum execution
> > > that enables index cleanup. Currently HOT-pruning doesn't set the root of
> > > the chain as unused even if the whole chain is dead. Since setting
> > > tuples as dead removes their storage, freezing xmax is no longer
> > > required.
> >
> > I tested this option with a variety of cases (HOT, indexes, etc.), and
> > it seems to work well.  I haven't looked too deeply into the
> > implications of using LP_DEAD this way, but it seems like a reasonable
> > approach at first glance.
>
> Thank you for reviewing the patch!
>
> >
> > +
> > +DISABLE_INDEX_CLEANUP
> > +
> > + 
> > +  VACUUM removes dead tuples and prunes HOT-updated
> > +  tuples chain for live tuples on table. If the table has any dead 
> > tuple
> > +  it removes them from both table and indexes for re-use. With this
> > +  option VACUUM marks tuples as dead (i.e., it 
> > doesn't
> > +  remove tuple storage) and disables removing dead tuples from indexes.
> > +  This is suitable for avoiding transaction ID wraparound but not
> > +  sufficient for avoiding index bloat. This option cannot be used in
> > +  conjunction with FULL option.
> > + 
> > +
> > +   
> >
> > I think we should clarify the expected behavior when this option is
> > used on a table with no indexes.  We probably do not want to fail, as
> > this could disrupt VACUUM commands that touch several tables.  Also,
> > we don't need to set tuples as dead instead of unused, which appears
> > to be what this patch is actually doing:
> >
> > +   if (nindexes > 0 && disable_index_cleanup)
> > +       lazy_set_tuples_dead(onerel, blkno, buf, vacrelstats);
> > +   else
> > +       lazy_vacuum_page(onerel, blkno, buf, 0, vacrelstats, &vmbuffer);
> >
> > In this case, perhaps we should generate a log statement that notes
> > that DISABLE_INDEX_CLEANUP is being ignored and set
> > disable_index_cleanup to false during processing.
>
> Agreed. How about outputting a NOTICE message before calling
> lazy_scan_heap() in lazy_vacuum_rel()?
>
> >
> > +   if (disable_index_cleanup)
> > +       ereport(elevel,
> > +               (errmsg("\"%s\": marked %.0f row versions in %u pages as dead",
> > +                       RelationGetRelationName(onerel),
> > +                       tups_vacuumed, vacuumed_pages)));
> > +   else
> > +       ereport(elevel,
> > +               (errmsg("\"%s\": removed %.0f row versions in %u pages",
> > +                       RelationGetRelationName(onerel),
> > +                       tups_vacuumed, vacuumed_pages)));
> >
> > Should the first log statement only be generated when there are also
> > indexes?
>
> You're right. Will fix.
>
> >
> > +static void
> > +lazy_set_tuples_dead(Relation onerel, BlockNumber blkno, Buffer buffer,
> > +   LVRelStats *vacrelstats)
> >
> > This function looks very similar to lazy_vacuum_page().  Perhaps the
> > two could be combined?
> >
>
> Yes, I intentionally separated them as I was concerned that these
> functions have different assumptions and usage. But combining them
> could also work. Will change it.
>

Attached is the updated patch. Please review it.
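
For anyone wanting to try it out, here is a minimal usage sketch of the
proposed option. The table name is a placeholder, and the option-combination
syntax is only an assumption based on how existing parenthesized VACUUM
options work; the exact behaviour is whatever the final patch implements:

-- Hypothetical usage of the proposed DISABLE_INDEX_CLEANUP option
-- (my_table is a placeholder).  Dead tuples are set LP_DEAD (their
-- storage is removed, item pointers remain) and index vacuum/cleanup
-- is skipped:
VACUUM (FREEZE, DISABLE_INDEX_CLEANUP) my_table;

-- A later vacuum with index cleanup enabled would remove the dead item
-- pointers left behind:
VACUUM my_table;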



Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


v4-0001-Add-DISABLE_INDEX_CLEANUP-option-to-VACUUM-comman.patch
Description: Binary data


RE: speeding up planning with partitions

2019-01-10 Thread Imai, Yoshikazu
Hi David,

On Thu, Jan 10, 2019 at 4:02 PM, David Rowley wrote:
> 3. I wonder if there's a better way to handle what
> build_dummy_partition_rel() does. I see it sets the child's relid to the
> parent's relid and makes up a fake AppendRelInfo and puts it in the
> parent's slot.  What's going to happen when the parent is not the
> top-level parent? It'll already have an AppendRelInfo set.
... 
> 
> select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 10;

I think there is a mistake in the select SQL.
"p1.id < 10" doesn't prune any partition because tables are partitioned by
column "a" in your definition. Isn't that right?

> Does that not mean that the if (parent->top_parent_relids) will always
> be false in build_dummy_partition_rel() and it'll only ever get
> rtekind == RTE_RELATION?

At least, I confirmed that if (parent->top_parent_relids) can be true when I
execute the SQL below.

select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 15;

I couldn't check the other points you mentioned, but I also think
build_dummy_partition_rel() needs more consideration; its logic seemed
complicated when I was checking around this area.


Amit,
I also realized there are some mistakes in the comments around this function.

+ * build_dummy_partition_rel
+ * Build a RelOptInfo and AppendRelInfo for a pruned partition 
s/and AppendRelInfo/and an AppendRelInfo/

+* Now we'll need a (no-op) AppendRelInfo for parent, because we're
+* setting the dummy partition's relid to be same as the parent's.
s/a \(no-op\) AppendRelInfo/an \(no-op\) AppendRelInfo/

--
Yoshikazu Imai



Re: doc: where best to add ~ 400 words to the manual?

2019-01-10 Thread Chapman Flack
On 01/10/19 23:48, Chapman Flack wrote:
> On 12/30/18 22:23, Chapman Flack wrote:
> (3) seems to be supported by (the only) precedent, the "Limits and
> Compatibility" sect3 within functions-posix-regexp.
> 
> So, absent objection, I'll work on a Limits and Compatibility sect2
> within functions-xml.

One fly in the ointment: tables of contents seem to list sect1 and sect2
elements, but not sect3. So the "Limits and Compatibility" sect3 under
functions-posix-regexp does not clutter the overall "Functions and
Operators" ToC, but a "Limits and Compatibility" sect2 under
functions-xml would be visible there. Tucking it as a sect3 within an
existing sect2 would prevent that, but it is common information that
pertains to more than one of them, so that doesn't seem quite right.

Or is it ok to have a "Limits and Compatibility" visible in the ToC
under XML Functions?

-Chap



Re: What to name the current heap after pluggable storage / what to rename?

2019-01-10 Thread Haribabu Kommi
On Fri, Jan 11, 2019 at 11:05 AM Andres Freund  wrote:

> Hi,
>
> On 2018-12-19 14:21:29 -0500, Robert Haas wrote:
> > On Tue, Dec 18, 2018 at 11:17 PM Andres Freund 
> wrote:
> > > Another would be to be aggressive in renaming, and deconflict by
> > > renaming functions like heap_create[_with_catalog] etc to sound more
> > > accurate. I think that has some appeal, because a lot of those names
> > > aren't describing their tasks particularly well.
> >
> > I like that option.
>
> I'd like to start doing that by moving the functions in the following
> heapam.h block elsewhere:
>
> /* in heap/heapam.c */
> extern Relation relation_open(Oid relationId, LOCKMODE lockmode);
> extern Relation try_relation_open(Oid relationId, LOCKMODE lockmode);
> extern Relation relation_openrv(const RangeVar *relation, LOCKMODE
> lockmode);
> extern Relation relation_openrv_extended(const RangeVar *relation,
>  LOCKMODE lockmode, bool
> missing_ok);
> extern void relation_close(Relation relation, LOCKMODE lockmode);
>
> extern Relation heap_open(Oid relationId, LOCKMODE lockmode);
> extern Relation heap_openrv(const RangeVar *relation, LOCKMODE lockmode);
> extern Relation heap_openrv_extended(const RangeVar *relation,
>  LOCKMODE lockmode, bool
> missing_ok);
>
> ISTM that the first block would best belong into new files like
> access/rel[ation].h and access/common/rel[ation].h.  I think the second
> set should be renamed to be table_open() (with backward compat
> redirects, there's way way too many references) and should go into
> access/table.h access/table/table.c alongside tableam.[ch], but I could
> also see just putting them into relation.[ch].
>
>  Comments?
>

Yes, that will be good.

Regards,
Haribabu Kommi
Fujitsu Australia


Re: Displaying and dumping of table access methods

2019-01-10 Thread Amit Khandekar
On Wed, 9 Jan 2019 at 14:30, Dmitry Dolgov <9erthali...@gmail.com> wrote:
>
> > On Tue, Jan 8, 2019 at 6:34 PM Andres Freund  wrote:
> >
> > On 2019-01-08 11:30:56 +0100, Peter Eisentraut wrote:
> > > On 08/01/2019 00:56, Andres Freund wrote:
> > > > A patch at [2] adds display of a table's access method to \d+ - but that
> > > > means that running the tests with a different default table access
> > > > method (e.g. using PGOPTIONS='-c default_table_access_method=...)
> > > > there'll be a significant number of test failures, even though the test
> > > > results did not meaningfully differ.
> > >
> > > For psql, a variable that hides the access method if it's the default.
> >
> > Yea, I think that seems the least contentious solution.  Don't like it
> > too much, but it seems better than the alternative. I wonder if we want
> > one for multiple regression related issues, or whether one specifically
> > about table AMs is more appropriate. I lean towards the latter.
>
> Are there any similar existing regression-related issues? If not, then probably
> the latter indeed makes more sense.
>
> > > > Similarly, if pg_dump starts to dump table access methods either
> > > > unconditionally, or for all non-heap AMS, the pg_dump tests fail due to
> > > > unimportant differences.
> > >
> > > For pg_dump, track and set the default_table_access_method setting
> > > throughout the dump (similar to how default_with_oids was handled, I
> > > believe).
> >
> > Yea, that's similar to that, and I think that makes sense.
>
> Yes, sounds like a reasonable approach, I can proceed with it.

Dmitry, I believe you have taken the pg_dump part only. If that's
right, I can proceed with the psql part. Does that sound right?

>


-- 
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company



Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-10 Thread Amit Langote
On 2019/01/11 11:21, Etsuro Fujita wrote:
> (2019/01/10 21:23), Amit Langote wrote:
>> On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat
>>   wrote:
>>> Though this will solve a problem for performance when partition-wise
>>> join is not possible, we still have the same problem when
>>> partition-wise join is possible. And that problem really happens
>>> because our inheritance mechanism requires expression translation from
>>> parent to child everywhere. That consumes memory, eats CPU cycles and
>>> generally downgrades performance of partition related query planning. I
>>> think a better way would be to avoid these translations and use Parent
>>> var to represent a Var of the child being dealt with. That will be a
>>> massive churn on inheritance based planner code, but it will improve
>>> planning time for queries involving thousands of partitions.
>>
>> Yeah, it would be nice going forward to overhaul inheritance planning
>> such that parent-to-child Var translation is not needed, especially
>> where no pruning can occur or many partitions remain even after
>> pruning.
> 
> I agree on that point, but I think that's an improvement for a future
> release rather than a fix for the issue reported on this thread.

Agreed.  Improving planning performance for a large number of partitions
even in the absence of pruning is a good goal to pursue for future
versions, as is being discussed in some other threads [1].

Thanks,
Amit

[1]
https://www.postgresql.org/message-id/0A3221C70F24FB45833433255569204D1FB60AE5%40G01JPEXMBYT05




Re: doc: where best to add ~ 400 words to the manual?

2019-01-10 Thread Chapman Flack
On 12/30/18 22:23, Chapman Flack wrote:
> I would like to add material to the manual, detailing the differences
> between the XPath 1.0 language supported in PG, and the XQuery/XPath 2.0+
> expected by the standard.
> ...
> I have thought of:
> ...
> 3. A sect2 at the very end of functions-xml, linked to from a short remark
> at the top.

(3) seems to be supported by (the only) precedent, the "Limits and
Compatibility" sect3 within functions-posix-regexp.

So, absent objection, I'll work on a Limits and Compatibility sect2
within functions-xml.

-Chap



Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-10 Thread Amit Langote
Fujita-san,

On 2019/01/10 15:07, Etsuro Fujita wrote:
> Amit-san,
> 
> (2019/01/10 10:41), Amit Langote wrote:
>> On 2019/01/09 20:20, Etsuro Fujita wrote:
>>> I like your patch in general.  I think one way to address Ashutosh's
>>> concerns would be to use the consider_partitionwise_join flag: originally,
>>> that was introduced for partitioned relations to show that they can be
>>> partitionwise-joined, but I think that flag could also be used for
>>> non-partitioned relations to show that they have been set up properly for
>>> partitionwise-joins, and I think by checking that flag we could avoid
>>> creating those copies for child dummy rels in try_partitionwise_join.
>>
>> Ah, that's an interesting idea.
>>
>> If I understand the original design of it correctly,
>> consider_partitionwise_join being true for a given relation (simple or
>> join) means that its RelOptInfo contains properties to consider it to be
>> joined with another relation (simple or join) using partitionwise join
>> mechanism.  Partitionwise join will occur between the pair if the other
>> relation also has relevant properties (hence its
>> consider_partitionwise_join set to true) and properties on the two sides
>> match.
> 
> Actually, the flag being true just means that the tlist for a given
> partitioned relation (simple or join) doesn't contain any whole-row Vars. 
> And if two given partitioned relations having the flag being true have
> additional properties to be joined using the PWJ technique, then we try to
> do PWJ for those partitioned relations.

I see.  Thanks for the explanation.

> (The name of the flag isn't
> good?  If so, that would be my fault because I named that flag.)

If it's really just to store the fact that the relation's targetlist
contains expressions that partitionwise join currently cannot handle, then
setting it like this in set_append_rel_size seems a bit misleading:

    if (enable_partitionwise_join &&
        rel->reloptkind == RELOPT_BASEREL &&
        rte->relkind == RELKIND_PARTITIONED_TABLE &&
        rel->attr_needed[InvalidAttrNumber - rel->min_attr] == NULL)
        rel->consider_partitionwise_join = true;

Sorry, I wasn't around to comment on the patch which got committed in
7cfdc77023a, but checking the value of enable_partitionwise_join and other
things in set_append_rel_size() to set the value of
consider_partitionwise_join seems a bit odd to me.  Perhaps
consider_partitionwise_join should initially be set to true for a relation
(actually, to rel->part_scheme != NULL) and only be set to false if the
relation's targetlist is found to contain unsupported expressions.  That
way, it becomes easier to reason about what it means, imho.  I think
enable_partitionwise_join should only be checked in relnode.c or
joinrels.c.  I've attached a patch to show what I mean. Can you please
take a look?

If you think that this patch is a good idea, then you'll need to
explicitly set consider_partitionwise_join to false for a dummy partition
rel in set_append_rel_size(), because the assumption of your patch that
such partition's rel's consider_partitionwise_join would be false (as
initialized with the current code) would be broken by my patch.  But that
might be a good thing to do anyway as it will document the special case
usage of consider_partitionwise_join variable more explicitly, assuming
you'll be adding a comment describing why it's being set to false explicitly.

>> That's a loaded meaning and abusing it to mean something else can be
>> challenged, but we can live with that if properly documented.  Speaking of
>> which:
>>
>>  /* used by partitionwise joins: */
>>  bool    consider_partitionwise_join;    /* consider partitionwise
>>                                           * join paths? (if partitioned rel) */
>>
>> Maybe, mention here how it will be abused in back-branches for
>> non-partitioned relations?
> 
> Will do.

Thank you.

>>> Please find attached an updated version of the patch.  I modified your
>>> version so that building tlists for child dummy rels are also postponed
>>> until after they actually participate in partitionwise-joins, to avoid
>>> that possibly-useless work as well.  I haven't done any performance tests
>>> yet though.
>>
>> Thanks for updating the patch.  I tested your patch (test setup described
>> below) and it has almost the same performance as my previous version:
>> 552ms (vs. 41159ms on HEAD vs. 253ms on PG 10) for the query also
>> mentioned below.
> 
> Thanks for that testing!
> 
> I also tested the patch with your script:
> 
> 253.559 ms (vs. 85776.515 ms on HEAD vs. 206.066 ms on PG 10)

Oh, PG 11 doesn't appear as bad compared to PG 10 with your numbers as it
did on my machine.  That's good anyway.

Thanks,
Amit
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index 1999cd4436..24900ce593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1305,15 

Acceptable/Best formatting of callbacks (for pluggable storage)

2019-01-10 Thread Andres Freund
Hi,

The pluggable storage patchset has a large struct full of callbacks, and
a bunch of wrapper functions for calling those callbacks. While
starting to polish the patchset, I tried to make the formatting nice.

By default pgindent yields formatting like:

/*
 * API struct for a table AM.  Note this must be allocated in a
 * server-lifetime manner, typically as a static const struct, which then gets
 * returned by FormData_pg_am.amhandler.
 */
typedef struct TableAmRoutine
{
NodeTag type;

...
void(*relation_set_new_filenode) (Relation relation,
  char persistence,
  TransactionId *freezeXid,
  MultiXactId *minmulti);

...


static inline void
table_set_new_filenode(Relation rel, char persistence,
   TransactionId *freezeXid, MultiXactId *minmulti)
{
rel->rd_tableam->relation_set_new_filenode(rel, persistence,
   freezeXid, minmulti);
}

which isn't particularly pretty, especially because there are callbacks
with longer names than the example above.


Unfortunately pgindent prevents formatting the callbacks like:
void(*relation_set_new_filenode) (
Relation relation,
char persistence,
TransactionId *freezeXid,
MultiXactId *minmulti);

or something in that vein.  What however does work, is:

void(*relation_set_new_filenode)
(Relation relation,
 char persistence,
 TransactionId *freezeXid,
 MultiXactId *minmulti);

I.e. putting the opening ( of the parameter list into a separate line
yields somewhat usable formatting. This also has the advantage that the
arguments of all callbacks line up, making it a bit easier to scan.

Similarly, to reduce the indentation, especially for callbacks with long
names and/or with longer parameter names, we can do:

static inline void
table_set_new_filenode(Relation rel, char persistence,
   TransactionId *freezeXid, MultiXactId *minmulti)
{
rel->rd_tableam->relation_set_new_filenode
(rel, persistence, freezeXid, minmulti);
}


So, putting the parameter list, both in use and declaration, entirely
into a new line yields decent formatting with pgindent. But it's kinda
weird.  I can't really come up with a better alternative, and after a
few minutes it looks pretty reasonable.

Comments? Better alternatives?

Greetings,

Andres Freund



Re: Ryu floating point output patch

2019-01-10 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-10 23:02:01 -0500, Chapman Flack wrote:
>> Does the project have an established view on non-ASCII names in
>> sources or docs?
>> AFAICS [1], the name of the algorithm may be Ryū.

> I think it'd be a really bad idea to start having non-ascii
> filenames, and I quite doubt that I'm alone in that.

Non-ASCII filenames seem right out.  I thought the question was
about whether we even want to have non-ASCII characters within
source code (my view: avoid if possible) or in docs (we do,
but it's better if you can make it into html entities).

regards, tom lane



Re: Ryu floating point output patch

2019-01-10 Thread Andres Freund
Hi,

On 2019-01-10 23:02:01 -0500, Chapman Flack wrote:
> Does the project have an established view on non-ASCII names in
> sources or docs?
> 
> AFAICS [1], the name of the algorithm may be Ryū.

I think it'd be a really bad idea to start having non-ascii
filenames, and I quite doubt that I'm alone in that.

- Andres



Re: Ryu floating point output patch

2019-01-10 Thread Chapman Flack
Does the project have an established view on non-ASCII names in
sources or docs?

AFAICS [1], the name of the algorithm may be Ryū.

-Chap


[1] https://dl.acm.org/citation.cfm?id=3192369



Re: How does the planner determine plan_rows ?

2019-01-10 Thread Tom Lane
Donald Dong  writes:
> I created some empty tables and run ` EXPLAIN ANALYZE` on `SELECT * `. I found
> the results have different row numbers, but the tables are all empty.

This isn't a terribly interesting case, since you've neither loaded
any data nor vacuumed/analyzed the table, but ...

> I found this behavior unexpected. I'm still trying to find out how/where the 
> planner
> determines the plan_rows.

... estimate_rel_size() in plancat.c is where to look to find out
about the planner's default estimates when it's lacking hard data.

regards, tom lane



Re: speeding up planning with partitions

2019-01-10 Thread David Rowley
On Thu, 10 Jan 2019 at 21:41, Amit Langote
 wrote:
> Here's v12, which is more or less same as v11 but with the order of
> patches switched so that the code movement patch is first.  Note that the
> attached v12-0001 contains no functional changes (but there's tiny note in
> the commit message mentioning the addition of a tiny function which is
> just old code).

A few more comments based on reading v12.

v12 0002:

1. Missing braces around the else clause. (Should be consistent with
the "if" above)

+ if (!has_live_children)
+ {
+     /*
+      * All children were excluded by constraints, so mark the relation
+      * as dummy.  We must do this in this phase so that the rel's
+      * dummy-ness is visible when we generate paths for other rels.
+      */
+     set_dummy_rel_pathlist(rel);
+ }
+ else
+     /*
+      * Set a non-zero value here to cope with the caller's requirement
+      * that non-dummy relations are actually not empty.  We don't try to
+      * be accurate here, because we're not going to create a path that
+      * combines the children outputs.
+      */
+     rel->rows = 1;

v12 0004:

2. I wonder if there's a better way, instead of doing this:

+ if (child_rel1 == NULL)
+ child_rel1 = build_dummy_partition_rel(root, rel1, cnt_parts);
+ if (child_rel2 == NULL)
+ child_rel2 = build_dummy_partition_rel(root, rel2, cnt_parts);

maybe add some logic in populate_joinrel_with_paths() to allow NULL
rels to mean dummy rels.  There's a bit of a problem there as the
joinrel has already been created by that time, but perhaps
populate_joinrel_with_paths is a better place to decide if the dummy
rel needs to be built or not.

3. I wonder if there's a better way to handle what
build_dummy_partition_rel() does. I see it sets the child's relid to the
parent's relid and makes up a fake AppendRelInfo and puts it in the
parent's slot.  What's going to happen when the parent is not the
top-level parent? It'll already have an AppendRelInfo set.

I had thought something like the following could break this, but of
course, it does not, since we build the dummy rel for the pruned
sub_parent2, so we don't hit the NULL relation case when doing the
next level.  I.e. we only make dummies for the top level, never dummies
of joinrels.

Does that not mean that the if (parent->top_parent_relids) will always
be false in build_dummy_partition_rel() and it'll only ever get
rtekind == RTE_RELATION?

drop table if exists parent;
create table parent (id int, a int, b text, c float) partition by range (a);
create table sub_parent1 (b text, c float, a int, id int) partition by
range (a);
create table sub_parent2 (c float, b text, id int, a int) partition by
range (a);
alter table parent attach partition sub_parent1 for values from (0) to (10);
alter table parent attach partition sub_parent2 for values from (10) to (20);

create table child11 (id int, b text, c float, a int);
create table child12 (id int, b text, c float, a int);
create table child21 (id int, b text, c float, a int);
create table child22 (id int, b text, c float, a int);

alter table sub_parent1 attach partition child11 for values from (0) to (5);
alter table sub_parent1 attach partition child12 for values from (5) to (10);
alter table sub_parent2 attach partition child21 for values from (10) to (15);
alter table sub_parent2 attach partition child22 for values from (15) to (20);

insert into parent values(0,1,'test',100.0);

select * from parent p1 inner join parent p2 on p1.a=p2.a where p1.id < 10;

4. How are dummy rels handled in grouping_planner()?

I see you have this:

- if (IS_DUMMY_REL(planned_rel))
+ if (!parent_rte->inh || IS_DUMMY_REL(planned_rel))
  {
  grouping_planner(root, false, planned_rel, 0.0);
  return;

With the above example I tried to see how it was handled by doing:

postgres=# update parent set c = c where a = 333;
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.

I didn't look into what's causing the crash.

5. Wondering why you removed the if (childOID != parentOID) from:

- if (childOID != parentOID && RELATION_IS_OTHER_TEMP(newrelation))
- {
- heap_close(newrelation, lockmode);
- continue;
- }

Isn't that releasing the only lock we hold on the rel defined in the query?

I tested with:

-- session 1
create temp table a1(a int);
create temp table a2(a int) inherits(a1);

-- session 2
select oid::regclass from pg_class where relname = 'a1';
 oid
--
 pg_temp_3.a1
(1 row)

explain select * from pg_temp_3.a1;
WARNING:  you don't own a lock of type AccessShareLock
QUERY PLAN
--
 Result  (cost=0.00..0.00 rows=0 width=4)
   One-Time Filter: false
(2 rows)

6. expand_planner_arrays() zeros a portion of the append_rel_array
even if it just palloc0'd the array. While it's not a bug, it is
repeated work.  It should be okay to move the MemSet() up to the
repalloc().

7. I see get_relation_info() grew an extra parameter.  Would it now be
better 

Re: How does the planner determine plan_rows ?

2019-01-10 Thread Donald Dong
Thank you for the great explanation!

> On Jan 10, 2019, at 7:48 PM, Andrew Gierth  
> wrote:
> 
>> "Donald" == Donald Dong  writes:
> 
> Donald> Hi,
> Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
> Donald> `SELECT * `. I found the results have different row numbers,
> Donald> but the tables are all empty.
> 
> Empty tables are something of a special case, because the planner
> doesn't assume that they will _stay_ empty, and using an estimate of 0
> or 1 rows would tend to create a distorted plan that would likely blow
> up in runtime as soon as you insert a second row.
> 
> The place to look for info would be estimate_rel_size in
> optimizer/util/plancat.c, from which you can see that empty tables get
> a default size estimate of 10 pages. Thus:
> 
> Donald> =# CREATE TABLE t1(id INT, data INT);
> Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
> Donald>   Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
> Donald>   time=0.003..0.003 rows=0 loops=1)
> 
> An (int,int) tuple takes about 36 bytes, so you can get about 226 of
> them on a page, so 10 pages is 2260 rows.
> 
> Donald> =# CREATE TABLE t2(data VARCHAR);
> Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
> Donald>   Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
> Donald>   time=0.002..0.002 rows=0 loops=1)
> 
> Size of a varchar with no specified length isn't known, so the planner
> determines an average length of 32 by the time-honoured method of rectal
> extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
> page.
> 
> -- 
> Andrew (irc:RhodiumToad)




Re: How does the planner determine plan_rows ?

2019-01-10 Thread Andrew Gierth
> "Donald" == Donald Dong  writes:

 Donald> Hi,
 Donald> I created some empty tables and run ` EXPLAIN ANALYZE` on
 Donald> `SELECT * `. I found the results have different row numbers,
 Donald> but the tables are all empty.

Empty tables are something of a special case, because the planner
doesn't assume that they will _stay_ empty, and using an estimate of 0
or 1 rows would tend to create a distorted plan that would likely blow
up in runtime as soon as you insert a second row.

The place to look for info would be estimate_rel_size in
optimizer/util/plancat.c, from which you can see that empty tables get
a default size estimate of 10 pages. Thus:

 Donald> =# CREATE TABLE t1(id INT, data INT);
 Donald> =# EXPLAIN ANALYZE SELECT * FROM t1;
 Donald>   Seq Scan on t1  (cost=0.00..32.60 rows=2260 width=8) (actual
 Donald>   time=0.003..0.003 rows=0 loops=1)

An (int,int) tuple takes about 36 bytes, so you can get about 226 of
them on a page, so 10 pages is 2260 rows.

 Donald> =# CREATE TABLE t2(data VARCHAR);
 Donald> =# EXPLAIN ANALYZE SELECT * FROM t2;
 Donald>   Seq Scan on t2  (cost=0.00..23.60 rows=1360 width=32) (actual
 Donald>   time=0.002..0.002 rows=0 loops=1)

Size of a varchar with no specified length isn't known, so the planner
determines an average length of 32 by the time-honoured method of rectal
extraction (see get_typavgwidth in lsyscache.c), making 136 rows per
page.
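
(Purely as a rough illustration of the above, with made-up table contents:
once the table has actually been loaded and analyzed, the estimate comes
from the stored statistics rather than the 10-page default.)

 =# CREATE TABLE t3(id INT, data INT);              -- hypothetical example
 =# INSERT INTO t3 SELECT g, g FROM generate_series(1, 1000) AS g;
 =# ANALYZE t3;
 =# EXPLAIN SELECT * FROM t3;
 -- the rows estimate should now be close to 1000, taken from
 -- pg_class.reltuples/relpages instead of the never-vacuumed default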

-- 
Andrew (irc:RhodiumToad)



Re: Remove all "INTERFACE ROUTINES" style comments

2019-01-10 Thread Tom Lane
Thomas Munro  writes:
> On Fri, Jan 11, 2019 at 12:58 PM Andres Freund  wrote:
>> A number of postgres files have sections like heapam's
>> * INTERFACE ROUTINES
>> 
>> They're often out-of-date, and I personally never found them to be
>> useful. A few people, including yours truly, have been removing a few
>> here and there when overhauling a subsystem to avoid having to update
>> and then adjust them.
>> I think it might be a good idea to just do that for all at once.

> +1

I agree we don't maintain them well, so it'd be better to remove them,
as long as we make sure any useful info gets transferred someplace else
(like per-function header comments).

regards, tom lane



Re: New function pg_stat_statements_reset_query() to reset statistics of a specific query

2019-01-10 Thread Amit Kapila
On Thu, Jan 10, 2019 at 12:11 PM Haribabu Kommi
 wrote:
> On Thu, Jan 10, 2019 at 2:32 PM Amit Kapila  wrote:
>>
>> I am happy with this patch now, attached is the new version of the
>> patch where I have slightly modified the commit message, see if that
>> looks fine to you.  I will push this patch tomorrow morning (after
>> again going through it) unless someone has any objections or see any
>> other issues.
>
>
> Thanks for changes. LGTM.
>

Pushed.

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: Ryu floating point output patch

2019-01-10 Thread Andrew Gierth
Rebased this patch onto current master. No functional changes; just
fixed up the copyright dates and a couple of stray missing UINT64CONSTs.

Regression tests still fail because how to fix them depends on the
issues below. Likewise docs are not changed yet for the same reason.

Decisions that need to be made in order to proceed:

1. Should exact output become the default, or is it more important to
   preserve the historical output?

  I will argue very strongly that the default output should be exact.

2. How far do we need to go to support existing uses of
   extra_float_digits? For example, do we need a way for clients to
   request that they get the exact same output as previously for
   extra_float_digits=2 or 3, rather than just assuming that any
   round-trip-exact value will do?

  (this would likely mean adding a new GUC, rather than basing
  everything off extra_float_digits as the patch does now)

3. Do we need to do anything about how conversions from floats to
   numeric work? At present they _ignore_ extra_float_digits (presumably
   on immutability grounds) and convert using only DBL_DIG digits of
   precision.

  I have no very strong position on this but incline to keeping the
  existing behavior, though possibly it would be good to add functions
  to get the round-trip value and possibly even the true exact value.
  (It would be sort of nice if CAST(floatval AS numeric(400,18)) or
  similar could work as you'd expect, but currently we treat that as
  floatval::numeric::numeric(400,18) so the cast function doesn't know
  about the eventual typmod.)

4. Can we allow declaration-after-statement please? That would allow
   keeping this code significantly closer to its upstream.
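
As a small illustration of point 3 (a sketch only; the exact text output
depends on the build and on whichever of the above decisions are made):

SET extra_float_digits = 3;
SELECT 0.1::float8;           -- text output includes the extra digits
SELECT 0.1::float8::numeric;  -- conversion still uses only DBL_DIG digits,
                              -- ignoring extra_float_digits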

-- 
Andrew (irc:RhodiumToad)

diff --git a/src/backend/utils/adt/float.c b/src/backend/utils/adt/float.c
index 117ded8d1d..6c6f031b4a 100644
--- a/src/backend/utils/adt/float.c
+++ b/src/backend/utils/adt/float.c
@@ -21,6 +21,7 @@
 
 #include "catalog/pg_type.h"
 #include "common/int.h"
+#include "common/ryu.h"
 #include "libpq/pqformat.h"
 #include "miscadmin.h"
 #include "utils/array.h"
@@ -31,7 +32,7 @@
 
 
 /* Configurable GUC parameter */
-int			extra_float_digits = 0; /* Added to DBL_DIG or FLT_DIG */
+int			extra_float_digits = 1; /* Added to DBL_DIG or FLT_DIG */
 
 /* Cached constants for degree-based trig functions */
 static bool degree_consts_set = false;
@@ -252,6 +253,12 @@ float4out(PG_FUNCTION_ARGS)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = FLT_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_f2s_buffered(num, ascii);
+		PG_RETURN_CSTRING(ascii);
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	PG_RETURN_CSTRING(ascii);
 }
@@ -468,6 +475,12 @@ float8out_internal(double num)
 	char	   *ascii = (char *) palloc(32);
 	int			ndig = DBL_DIG + extra_float_digits;
 
+	if (extra_float_digits > 0)
+	{
+		ryu_d2s_buffered(num, ascii);
+		return ascii;
+	}
+
 	(void) pg_strfromd(ascii, 32, ndig, num);
 	return ascii;
 }
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index f81e0424e7..404ba2bfa0 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -2631,11 +2631,12 @@ static struct config_int ConfigureNamesInt[] =
 		{"extra_float_digits", PGC_USERSET, CLIENT_CONN_LOCALE,
 			gettext_noop("Sets the number of digits displayed for floating-point values."),
 			gettext_noop("This affects real, double precision, and geometric data types. "
-		 "The parameter value is added to the standard number of digits "
-		 "(FLT_DIG or DBL_DIG as appropriate).")
+		 "A zero or negative parameter value is added to the standard "
+		 "number of digits (FLT_DIG or DBL_DIG as appropriate). "
+		 "Any value greater than zero selects round-trip-safe output.")
 		},
 		_float_digits,
-		0, -15, 3,
+		1, -15, 3,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/common/Makefile b/src/common/Makefile
index d0c2b970eb..7e6e1237c5 100644
--- a/src/common/Makefile
+++ b/src/common/Makefile
@@ -44,9 +44,11 @@ override CPPFLAGS += -DVAL_LIBS="\"$(LIBS)\""
 override CPPFLAGS := -DFRONTEND -I. -I$(top_srcdir)/src/common $(CPPFLAGS)
 LIBS += $(PTHREAD_LIBS)
 
-OBJS_COMMON = base64.o config_info.o controldata_utils.o exec.o file_perm.o \
-	ip.o keywords.o kwlookup.o link-canary.o md5.o pg_lzcompress.o \
-	pgfnames.o psprintf.o relpath.o \
+# If you add objects here, see also src/tools/msvc/Mkvcbuild.pm
+
+OBJS_COMMON = base64.o config_info.o controldata_utils.o d2s.o exec.o f2s.o \
+	file_perm.o ip.o keywords.o kwlookup.o link-canary.o md5.o \
+	pg_lzcompress.o pgfnames.o psprintf.o relpath.o \
 	rmtree.o saslprep.o scram-common.o string.o unicode_norm.o \
 	username.o wait_error.o
 
diff --git a/src/common/d2s.c b/src/common/d2s.c
new file mode 100644
index 00..1c63fb198a
--- /dev/null
+++ b/src/common/d2s.c
@@ -0,0 +1,961 @@
+/*---
+ *
+ * Ryu floating-point output 

RE: Improve selectivity estimate for range queries

2019-01-10 Thread Yuzuko Hosoya
Hi,

Thanks for the comments, and I'm sorry for the late reply.

> From: Tom Lane [mailto:t...@sss.pgh.pa.us]
> Sent: Friday, January 11, 2019 7:04 AM
> > Robert Haas  writes:
> > On Fri, Dec 21, 2018 at 11:50 AM Tom Lane  wrote:
> >> A smaller-footprint way to fix the immediate problem might be to
> >> change the values of DEFAULT_INEQ_SEL and friends so that they're
> >> even less likely to be matched by accident.  That is, instead of
> >> 0., use 0.342 or some such.
> 
> > That's not a dumb idea, but it seems pretty unprincipled to me, and to
> > be honest I'm kind of surprised that you're not proposing something
> > cleaner.
> 
> The problem is the invasiveness of such a change (large) vs the benefit (not
> so large).  The upthread patch attempted to add a separate signaling path,
> but it was very incomplete --- and yet both I and Horiguchi-san thought it
> was already too messy.
>
> Maybe at some point we'll go over to something reasonably principled, like
> adding confidence intervals to all selectivity estimates.  That would be
> *really* invasive but perhaps would bring enough benefit to justify itself.
> But the current patch is just attempting to fix one extremely narrow pain
> point, and if that is all it's doing it should have a commensurately small
> footprint.  So I don't think the submitted patch looks good from a
> cost/benefit standpoint.
> 
Yes, I agree with you.  Indeed, the patch I attached is insufficient in terms
of cost-effectiveness.  However, I want to solve the problem of real estimates
that happen to equal the default values, as in this case, even though it's a
narrow pain point.  So I tried to distinguish explicitly between real
estimates and defaults, as Robert said.

The idea Tom proposed and Horiguchi-san tried seems reasonable, but I'm
concerned whether any range query really cannot match 0.342 (or some such) by
accident in any environment.  Is what Horiguchi-san did enough to prove that?


Best regards,
Yuzuko Hosoya
NTT Open Source Software Center





Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-10 Thread Etsuro Fujita

(2019/01/10 21:23), Amit Langote wrote:
> On Thu, Jan 10, 2019 at 6:49 PM Ashutosh Bapat  wrote:
>> Though this will solve a problem for performance when partition-wise join
>> is not possible, we still have the same problem when partition-wise join
>> is possible. And that problem really happens because our inheritance
>> mechanism requires expression translation from parent to child everywhere.
>> That consumes memory, eats CPU cycles and generally downgrades performance
>> of partition related query planning. I think a better way would be to
>> avoid these translations and use Parent var to represent a Var of the
>> child being dealt with. That will be a massive churn on inheritance based
>> planner code, but it will improve planning time for queries involving
>> thousands of partitions.
>
> Yeah, it would be nice going forward to overhaul inheritance planning
> such that parent-to-child Var translation is not needed, especially
> where no pruning can occur or many partitions remain even after
> pruning.

I agree on that point, but I think that's an improvement for a future
release rather than a fix for the issue reported on this thread.


Best regards,
Etsuro Fujita




Re: add_partial_path() may remove dominated path but still in use

2019-01-10 Thread Kohei KaiGai
On Fri, Jan 11, 2019 at 5:52, Robert Haas  wrote:
>
> On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai  wrote:
> > So, is it sufficient if set_rel_pathlist_hook is just relocated in
> > front of the generate_gather_paths?
> > If we have no use case for the second hook, there is little need
> > to have the post_rel_pathlist_hook() here.
> > (At least, PG-Strom will use the first hook only.)
>
> +1.  That seems like the best way to be consistent with the principle
> that we need to have all the partial paths before generating any
> Gather paths.
>
The patch has been updated, just relocating the set_rel_pathlist_hook call.
Please check it.

Thanks,
-- 
HeteroDB, Inc / The PG-Strom Project
KaiGai Kohei 


pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v10.patch
Description: Binary data


pgsql-bugfix-relocation-path-set_rel_pathlist_hook.v11.patch
Description: Binary data


Re: speeding up planning with partitions

2019-01-10 Thread Amit Langote
Sorry, I hadn't read this email before sending my earlier "thank you for
committing" email.

On 2019/01/11 6:47, David Rowley wrote:
> On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera  wrote:
>> Pushed 0001 with some minor tweaks, thanks.
> 
> Thanks for pushing.  I had looked at 0001 last night and there wasn't
> much to report, just:
> 
> v12 0001
> 
> 1. I see you've moved translate_col_privs() out of prepunion.c into
> appendinfo.c. This required making it an external function, but its
> only use is in inherit.c, so would it not be better to put it there
> and keep it static?

Actually, I *was* a bit puzzled about where to put it.  I tend to agree with
you now that it might be better to define it locally within inherit.c, as it
might not be needed elsewhere.  Let's hear what Alvaro thinks.  I'm attaching
a patch anyway.

> 2. The following two lines I think need to swap their order.
> 
> +#include "utils/rel.h"
> +#include "utils/lsyscache.h"

Oops, fixed.

> Both are pretty minor details but thought I'd post them anyway.

Thank you for reporting.

Attached find the patch.

Regards,
Amit
diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c
index d48e3a09b3..2f23e7bf49 100644
--- a/src/backend/optimizer/util/appendinfo.c
+++ b/src/backend/optimizer/util/appendinfo.c
@@ -15,13 +15,12 @@
 #include "postgres.h"
 
 #include "access/htup_details.h"
-#include "access/sysattr.h"
 #include "nodes/makefuncs.h"
 #include "nodes/nodeFuncs.h"
 #include "optimizer/appendinfo.h"
 #include "parser/parsetree.h"
-#include "utils/rel.h"
 #include "utils/lsyscache.h"
+#include "utils/rel.h"
 #include "utils/syscache.h"
 
 
@@ -167,58 +166,6 @@ make_inh_translation_list(Relation oldrelation, Relation newrelation,
 }
 
 /*
- * translate_col_privs
- *   Translate a bitmapset representing per-column privileges from the
- *   parent rel's attribute numbering to the child's.
- *
- * The only surprise here is that we don't translate a parent whole-row
- * reference into a child whole-row reference.  That would mean requiring
- * permissions on all child columns, which is overly strict, since the
- * query is really only going to reference the inherited columns.  Instead
- * we set the per-column bits for all inherited columns.
- */
-Bitmapset *
-translate_col_privs(const Bitmapset *parent_privs,
-                    List *translated_vars)
-{
-    Bitmapset  *child_privs = NULL;
-    bool        whole_row;
-    int         attno;
-    ListCell   *lc;
-
-    /* System attributes have the same numbers in all tables */
-    for (attno = FirstLowInvalidHeapAttributeNumber + 1; attno < 0; attno++)
-    {
-        if (bms_is_member(attno - FirstLowInvalidHeapAttributeNumber,
-                          parent_privs))
-            child_privs = bms_add_member(child_privs,
-                                         attno - FirstLowInvalidHeapAttributeNumber);
-    }
-
-    /* Check if parent has whole-row reference */
-    whole_row = bms_is_member(InvalidAttrNumber - FirstLowInvalidHeapAttributeNumber,
-                              parent_privs);
-
-    /* And now translate the regular user attributes, using the vars list */
-    attno = InvalidAttrNumber;
-    foreach(lc, translated_vars)
-    {
-        Var        *var = lfirst_node(Var, lc);
-
-        attno++;
-        if (var == NULL)        /* ignore dropped columns */
-            continue;
-        if (whole_row ||
-            bms_is_member(attno - FirstLowInvalidHeapAttributeNumber,
-                          parent_privs))
-            child_privs = bms_add_member(child_privs,
-                                         var->varattno - FirstLowInvalidHeapAttributeNumber);
-    }
-
-    return child_privs;
-}
-
-/*
  * adjust_appendrel_attrs
  *   Copy the specified query or expression and translate Vars referring to a
  *   parent rel to refer to the corresponding child rel instead.  We also
diff --git a/src/backend/optimizer/util/inherit.c b/src/backend/optimizer/util/inherit.c
index 350e6afe27..db474acbc5 100644
--- a/src/backend/optimizer/util/inherit.c
+++ b/src/backend/optimizer/util/inherit.c
@@ -15,6 +15,7 @@
 #include "postgres.h"
 
 #include "access/heapam.h"
+#include "access/sysattr.h"
 #include "catalog/partition.h"
 #include "catalog/pg_inherits.h"
 #include "miscadmin.h"
@@ -38,6 +39,8 @@ static void expand_single_inheritance_child(PlannerInfo *root,
                             PlanRowMark *top_parentrc, Relation childrel,
                             List **appinfos, RangeTblEntry **childrte_p,

Re: Commitfest delayed: extend it?

2019-01-10 Thread David Fetter
On Fri, Jan 11, 2019 at 09:53:16AM +0900, Michael Paquier wrote:
> On Thu, Jan 10, 2019 at 05:44:34PM -0300, Alvaro Herrera wrote:
> > It has definitely started, at least for me :-)
> 
> Yes, there is no point in extending or delaying it.
> 
> > We're going to have a bit of a triage session in the FOSDEM dev's
> > meeting, on Jan 31st, right at the end.  I think that will be a
> > good opportunity to give some final cleanup, and we should close
> > it then or shortly thereafter.
> 
> A lot of folks are going to be there then (not me)?  If it is
> possible to get the CF closed more or less on time using this method
> that would be nice.

Consensus having been reached, I'll aim for 1/31 or 2/1 at the latest.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-10 Thread Amit Langote
On 2019/01/10 19:27, Michael Paquier wrote:
> On Thu, Jan 10, 2019 at 05:58:10PM +0900, Amit Langote wrote:
>> The reason I started this thread is due to this Stack Overflow question:
>>
>> https://stackoverflow.com/questions/53554727/logical-replication-and-declarative-partitioning-in-postgresql-11
>>
>> So, it appears that there may be an element of surprise involved in
>> encountering such an error (despite the documentation).
> 
> Improving the user experience is definitely a good thing in my
> opinion because the current error message can be confusing, so you
> were right to start this thread.  Still, I don't agree that classifying
> those relkinds as not supported is right, both for consistency with the
> code that has existed for two years and because of the way the code is
> designed to work, as rows are replicated on a per-physically-defined
> relation basis.

Okay, I withdraw my objection to the wording proposed by you.

Thanks,
Amit




Re: speeding up planning with partitions

2019-01-10 Thread Amit Langote
On 2019/01/11 2:56, Alvaro Herrera wrote:
> On 2019-Jan-10, Amit Langote wrote:
> 
>> Here's v12, which is more or less same as v11 but with the order of
>> patches switched so that the code movement patch is first.  Note that the
>> attached v12-0001 contains no functional changes (but there's tiny note in
>> the commit message mentioning the addition of a tiny function which is
>> just old code).
> 
> Pushed 0001 with some minor tweaks, thanks.

Thank you for the tweaks and committing.

Regards,
Amit




Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-10 Thread Michael Paquier
On Fri, Jan 11, 2019 at 10:09:04AM +0900, Michael Paquier wrote:
> On Thu, Jan 10, 2019 at 04:52:53PM -0800, Andres Freund wrote:
> > It's a minor simplification for code that's existed for quite a
> > while. Not worth it.
> 
> Meh, I disagree for the ready_to_display part as it has been around
> for a long time:
> commit: 9ed551e0a4fdb046d8e5ea779adfd3e184d5dc0d
> author: Alvaro Herrera 
> date: Wed, 29 Jun 2016 16:57:17 -0400
> Add conninfo to pg_stat_wal_receiver

This should read "ready_to_display has not been around for a long
time" :)
--
Michael




RE: [Proposal] Add accumulated statistics

2019-01-10 Thread Tsunakawa, Takayuki
From: Robert Haas [mailto:robertmh...@gmail.com]
> My theory is that the number of wait events is NOT useful information,
> or at least not nearly as useful the results of a sampling approach.
> The data that LWLOCK_STATS produce are downright misleading -- they
> lead you to think that the bottlenecks are in different places than
> they really are, because the locks that produce the most waiting can
> be 5th or 10th in terms of the number of wait events.

I understand you're saying that the number of waits alone does not necessarily
indicate the bottleneck, because a wait with a lower count but a longer time can
take up a large portion of the entire SQL execution time.  So wait time is also
useful.  I think that's why Oracle documents, and MySQL provides, precise counts
and times without sampling.

Hasn't LWLOCK_STATS been helpful for PG developers?  IIRC, it was used to
pinpoint bottlenecks and evaluate patches to improve shared buffers, WAL
buffers, ProcArray, etc.


Regards
Takayuki Tsunakawa






Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-10 Thread Michael Paquier
On Thu, Jan 10, 2019 at 04:52:53PM -0800, Andres Freund wrote:
> It's a minor simplification for code that's existed for quite a
> while. Not worth it.

Meh, I disagree for the ready_to_display part as it has been around
for a long time:
commit: 9ed551e0a4fdb046d8e5ea779adfd3e184d5dc0d
author: Alvaro Herrera 
date: Wed, 29 Jun 2016 16:57:17 -0400
Add conninfo to pg_stat_wal_receiver

I was honestly unhappy from the start with it because there was no
actual way to have the WAL receiver *not* save the original connection
string.  In my opinion, getting rid of it is worth it because it
really simplifies the way WAL receiver data visibility is handled
at the SQL level, and we stop reusing the same field over and over
for different purposes, which consolidates the code.

For the reloading part, we also make the WAL receiver rely on the GUC
context and stop it if there is a change in primary_conninfo or
primary_slot_name.  It would be inconsistent to rely on the GUC
context for one thing, and on the startup process for the other.
Adding a safeguard to fail WAL receiver startup is really more of a
sanity check than anything else (it could be an assertion, but for a
forked process an outright failure is more useful).

At this stage, I would like to hear more opinions before doing or not
doing something.  Alvaro, you got involved in the introduction of
ready_to_display.  Peter, you have committed the merge of the recovery
params.  Perhaps you have an opinion to share?
--
Michael




Re: Using Btree to Provide Sorting on Suffix Keys with LIMIT

2019-01-10 Thread Peter Geoghegan
On Sat, Dec 29, 2018 at 6:50 PM David Rowley
 wrote:
> > Areas of extension: (given index `(a, b, c)`) include `a = 1 and b in
> > (...) order by c` and `a in (...) and b = 1 order by c` (and further
> > similar derivations with increasing numbers of equality quals).
>
> I don't quite understand the above paragraph, but I assume this
> would be possible to do with some new index am routine which allowed
> multiple different values for the initial key.

I'm confused about James' use of the term "new index am". I guess he
just meant new support function or something?

> > Note: Another (loosely) related problem is that sorting can't
> > currently take advantage of cases where an index provides a partial
> > (prefix of requested pathkeys) ordering.
>
> There has been a patch [2] around for about 4 years now that does
> this.  I'm unsure of the current status, other than not yet committed.
>
> [1] 
> https://www.postgresql.org/message-id/flat/7f70bd5a-5d16-e05c-f0b4-2fdfc8873489%40BlueTreble.com
> [2] 
> https://www.postgresql.org/message-id/flat/CAPpHfds1waRZ=NOmueYq0sx1ZSCnt+5QJvizT8ndT2=etze...@mail.gmail.com

I can see why you'd mention these two, but I also expected you to
mention the skip scan project, since that involves pushing down
knowledge about how the index is to be accessed to the index am (at
least, I assume that it does), and skipping leading attributes to use
the sort order from a suffix attribute. Actually, the partial sort
idea that you linked to is more or less a dual of skip scan, at least
to my mind (the former *extends* the sort order by adding a suffix
tie-breaker, while the latter *skips* a leading attribute to get to an
interesting suffix attribute).

The way James constructed his example suggested that there'd be some
kind of natural locality, that we'd expect to be able to take
advantage of at execution time when the new strategy is favorable. I'm
not sure if that was intended -- James? I think it might help James to
construct a more obviously realistic/practical motivating example. I'm
perfectly willing to believe that this idea would help his real world
queries, and having an example that can easily be played with is
helpful in other ways. But I'd like to know in detail why this idea is
important, since I think that it would help me to place it in the
wider landscape of ideas that are like this. Placing it in that wider
landscape, and figuring out next steps at a high level seem to be the
problem right now.
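
For concreteness, my understanding is that the query shape in question is
roughly the following (illustrative schema, not taken from the proposal):

CREATE INDEX ON tbl (a, b, c);

SELECT * FROM tbl
WHERE a = 1 AND b IN (2, 5, 9)
ORDER BY c
LIMIT 10;

Today the "b IN (...)" qual prevents the index from providing an ordering
on c, so an explicit sort is needed and the LIMIT cannot stop the scan
early; the idea is to let the btree code merge the per-b-value orderings
so that it can.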


--
Peter Geoghegan



Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-10 Thread Michael Paquier
On Thu, Jan 10, 2019 at 04:41:47PM -0800, Andres Freund wrote:
> I still think this whole direction of accessing the GUC in walreceiver
> is a bad idea and shouldn't be pursued further.  There's definite
> potential for startup process and WAL receiver having different states
> of GUCs, the code doesn't get meaningfully simpler, the GUC value checks
> in walreceiver make for horrible reporting up the chain.

Did you notice the set of messages from upthread?  The code *gets*
simpler by removing ready_to_display and the need to manipulate the
non-clobbered connection string sent directly from the startup
process.  In my opinion that's a clear gain.  We also gain, for
monitoring tools, the ability to detect that a WAL receiver has started
but is not yet connected.
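
For instance, something like

    SELECT pid, status, conninfo FROM pg_stat_wal_receiver;

(an illustrative query; see the docs for the full column list) would then
show the WAL receiver row as soon as the process is up, instead of only
once it has finished setting up its connection.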
--
Michael




Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-10 Thread Andres Freund
Hi,

On 2019-01-11 09:34:28 +0900, Michael Paquier wrote:
> On Thu, Jan 10, 2019 at 02:20:27PM +0300, Sergei Kornilov wrote:
> > Thank you, patch looks good for me.
> 
> Thanks Sergei for the review.  I have been looking at the patch again
> this morning with a fresh set of eyes and the thing looks in a
> committable shape (the GUC values for NULL checks are a bit
> inconsistent in the last patch by the way, so I have fixed them
> locally).
> 
> I'll do an extra pass on it in the next couple of days and commit.
> Then let's move on with the reloadable portions.

I still think this whole direction of accessing the GUC in walreceiver
is a bad idea and shouldn't be pursued further.  There's definite
potential for startup process and WAL receiver having different states
of GUCs, the code doesn't get meaningfully simpler, the GUC value checks
in walreceiver make for horrible reporting up the chain.

Greetings,

Andres Freund



Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-10 Thread Michael Paquier
On Thu, Jan 10, 2019 at 02:20:27PM +0300, Sergei Kornilov wrote:
> Thank you, patch looks good for me.

Thanks Sergei for the review.  I have been looking at the patch again
this morning with a fresh set of eyes and the thing looks in a
committable shape (the GUC values for NULL checks are a bit
inconsistent in the last patch by the way, so I have fixed them
locally).

I'll do an extra pass on it in the next couple of days and commit.
Then let's move on with the reloadable portions.
--
Michael




RE: [Proposal] Add accumulated statistics

2019-01-10 Thread Yotsunaga, Naoki
On Thu, Jan 10, 2019 at 8:42 PM, Robert Haas wrote:

Thanks for comments.

>or at least not nearly as useful as the results of a sampling approach.
I agree with your opinion.
It can't be asserted that a wait event is a bottleneck just because its 
count is large.
The same point is made in the Oracle documentation, which also suggests 
that it is important to capture the wait time as well.

https://docs.oracle.com/en/database/oracle/oracle-database/18/tgdba/measuring-database-performance.html#GUID-811E9E65-C64A-4028-A90E-102BBFF6E68F
5.2.3 Using Wait Events without Timed Statistics


>The data that LWLOCK_STATS produce are downright misleading 
Is that so?
I do not know the need for this function.

---
Naoki Yotsunaga


Re: Remove all "INTERFACE ROUTINES" style comments

2019-01-10 Thread Thomas Munro
On Fri, Jan 11, 2019 at 12:58 PM Andres Freund  wrote:
> A number of postgres files have sections like heapam's
>
>  * INTERFACE ROUTINES
>  *  relation_open   - open any relation by relation OID
>  *  relation_openrv - open any relation specified by a RangeVar
>  *  relation_close  - close any relation
>  *  heap_open   - open a heap relation by relation OID
>  *  heap_openrv - open a heap relation specified by a RangeVar
>  *  heap_close  - (now just a macro for relation_close)
>  *  heap_beginscan  - begin relation scan
>  *  heap_rescan - restart a relation scan
>  *  heap_endscan- end relation scan
>  *  heap_getnext- retrieve next tuple in scan
>  *  heap_fetch  - retrieve tuple with given tid
>  *  heap_insert - insert tuple into a relation
>  *  heap_multi_insert - insert multiple tuples into a relation
>  *  heap_delete - delete a tuple from a relation
>  *  heap_update - replace a tuple in a relation with another tuple
>  *  heap_sync   - sync heap, for when no WAL has been written
>
> They're often out-of-date, and I personally never found them to be
> useful. A few people, including yours truly, have been removing a few
> here and there when overhauling a subsystem to avoid having to update
> and then adjust them.
>
> I think it might be a good idea to just do that for all of them at once.
> Having to consider separately committing a removal, updating them without
> fixing preexisting issues, or just leaving them outdated on a regular
> basis imo is a useless distraction.
>
> Comments?

+1

-- 
Thomas Munro
http://www.enterprisedb.com



Re: What to name the current heap after pluggable storage / what to rename?

2019-01-10 Thread Andres Freund
Hi,

On 2018-12-19 14:21:29 -0500, Robert Haas wrote:
> On Tue, Dec 18, 2018 at 11:17 PM Andres Freund  wrote:
> > Another would be to be aggressive in renaming, and deconflict by
> > renaming functions like heap_create[_with_catalog] etc to sound more
> > accurate. I think that has some appeal, because a lot of those names
> > aren't describing their tasks particularly well.
>
> I like that option.

I'd like to start doing that by moving the functions in the following
heapam.h block elsewhere:

/* in heap/heapam.c */
extern Relation relation_open(Oid relationId, LOCKMODE lockmode);
extern Relation try_relation_open(Oid relationId, LOCKMODE lockmode);
extern Relation relation_openrv(const RangeVar *relation, LOCKMODE lockmode);
extern Relation relation_openrv_extended(const RangeVar *relation,
 LOCKMODE lockmode, bool 
missing_ok);
extern void relation_close(Relation relation, LOCKMODE lockmode);

extern Relation heap_open(Oid relationId, LOCKMODE lockmode);
extern Relation heap_openrv(const RangeVar *relation, LOCKMODE lockmode);
extern Relation heap_openrv_extended(const RangeVar *relation,
 LOCKMODE lockmode, bool missing_ok);

ISTM that the first block would best belong into new files like
access/rel[ation].h and access/common/rel[ation].h.  I think the second
set should be renamed to be table_open() (with backward compat
redirects, there's way way too many references) and should go into
access/table.h access/table/table.c alongside tableam.[ch], but I could
also see just putting them into relation.[ch].

Comments?

Greetings,

Andres Freund



Remove all "INTERFACE ROUTINES" style comments

2019-01-10 Thread Andres Freund
Hi,

A number of postgres files have sections like heapam's

 * INTERFACE ROUTINES
 *  relation_open   - open any relation by relation OID
 *  relation_openrv - open any relation specified by a RangeVar
 *  relation_close  - close any relation
 *  heap_open   - open a heap relation by relation OID
 *  heap_openrv - open a heap relation specified by a RangeVar
 *  heap_close  - (now just a macro for relation_close)
 *  heap_beginscan  - begin relation scan
 *  heap_rescan - restart a relation scan
 *  heap_endscan- end relation scan
 *  heap_getnext- retrieve next tuple in scan
 *  heap_fetch  - retrieve tuple with given tid
 *  heap_insert - insert tuple into a relation
 *  heap_multi_insert - insert multiple tuples into a relation
 *  heap_delete - delete a tuple from a relation
 *  heap_update - replace a tuple in a relation with another tuple
 *  heap_sync   - sync heap, for when no WAL has been written

They're often out-of-date, and I personally never found them to be
useful. A few people, including yours truly, have been removing a few
here and there when overhauling a subsystem to avoid having to update
and then adjust them.

I think it might be a good idea to just do that for all of them at once.
Having to consider separately committing a removal, updating them without
fixing preexisting issues, or just leaving them outdated on a regular
basis imo is a useless distraction.

Comments?

Greetings,

Andres Freund



Re: unnecessary creation of FSM files during bootstrap mode

2019-01-10 Thread Tom Lane
John Naylor  writes:
> Thought I'd ping...

Sorry, I'd not been paying attention to this thread.

> On Sat, Dec 15, 2018 at 12:02 AM Amit Kapila  wrote:
>> As pointed out by John Naylor [1], it seems during bootstrap mode, we
>> are always creating FSM files which are not required.  In commit's
>> b9d01fe288 and 3908473c80, we have added some code where we allowed
>> the creation of files during mdopen even if they didn't exist during
>> the bootstrap mode.  The comments in the code say: "During bootstrap,
>> there are cases where a system relation will be accessed (by internal
>> backend processes) before the bootstrap script nominally creates it."
>> I am sure this will be the case when that code is added but is it
>> required today?

I'm surprised to hear that it isn't, but if we get through initdb
then it must not be.  The special case was certainly still necessary
as of 3908473c80, for the BKI_BOOTSTRAP catalogs.  It didn't bother me
at the time that those were special --- how are you going to create
pg_class, in particular, without cheating like mad?  But I guess we must
have cleaned up something in higher levels of bootstrapping to the point
where those do get mdcreate'd in advance of being opened.

It's also possible that you just aren't exercising the cases where
trouble occurs.  In particular, noting this bit in InsertOneValue():

/*
 * We use ereport not elog here so that parameters aren't evaluated unless
 * the message is going to be printed, which generally it isn't
 */
    ereport(DEBUG4,
            (errmsg_internal("inserted -> %s",
                             OidOutputFunctionCall(typoutput, values[i]))));

I'd counsel running initdb under DEBUG4 or higher before deciding
you're out of the woods.

(If that does turn out to be a problem, and it's the only problem,
we could discuss whether we really need that debug message at all.)

regards, tom lane



Re: MERGE SQL statement for PG12

2019-01-10 Thread Peter Geoghegan
On Thu, Jan 10, 2019 at 1:15 PM Robert Haas  wrote:
> I feel like there has been some other thread where this was discussed,
> but I can't find it right now.  I think that the "query construction"
> logic in transformMergeStmt is fundamentally the wrong way to do this.
> I think several others have said the same.  And I don't think this
> should be considered for commit until that is rewritten.

I agree with that.

I think that it's worth acknowledging that Pavan is in a difficult
position with this patch. As things stand, Pavan really needs input
from a couple of people to put the query construction stuff on the
right path, and that input has not been forthcoming. I'm not
suggesting that anybody has failed to meet an obligation to Pavan to
put time in here, or that anybody has suggested that this is down to a
failure on Pavan's part. I'm merely pointing out that Pavan is in an
unenviable position with this patch, through no fault of his own, and
despite a worthy effort.

I hope that he sticks it out, because that seems to be what it takes
to see something like this through. I don't know how to do better at
that.

-- 
Peter Geoghegan



Re: [PATCH] kNN for btree

2019-01-10 Thread Alexander Korotkov
Hi!

I have a couple more notes regarding this patch.
1) There are two loops over the scan keys determining the scan strategy:
the existing loop in _bt_first(), and one in the new function
_bt_select_knn_search_strategy().  It's kind of redundant that we have
to process the scan keys twice for knn searches.  I think the scan key
processing should be merged into one loop.
2) We currently support knn ordering only on the first index key.
Supporting multiple knn keys would require significant rework of the
B-tree traversal algorithm, making it closer to GiST and SP-GiST.  So,
I think supporting only one knn key is a reasonable restriction, at
least for now.  But we could support single-key knn ordering in more
cases.  For instance, the knn search in "SELECT * FROM tbl WHERE a =
const1 ORDER BY b <-> const2" could benefit from an (a, b) B-tree index.
So, we can support knn search on the n-th key if there are equality scan
keys for the [1, n-1] index keys.
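
For illustration (made-up table, not from the patch), the case described
above is:

CREATE TABLE tbl (a int, b int);
CREATE INDEX ON tbl (a, b);

SELECT * FROM tbl WHERE a = 42 ORDER BY b <-> 100 LIMIT 10;

Here the knn ordering is on the second index key, which is usable because
the first key is constrained by an equality qual (this assumes the patch
provides a distance operator for the column's type, as in the example
above).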

I've already discussed these issues with Nikita personally.
Hopefully, new version will be published soon.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: unnecessary creation of FSM files during bootstrap mode

2019-01-10 Thread John Naylor
Thought I'd ping...
(sorry for the top post)

On Sat, Dec 15, 2018 at 12:02 AM Amit Kapila  wrote:
>
> As pointed out by John Naylor [1], it seems during bootstrap mode, we
> are always creating FSM files which are not required.  In commit's
> b9d01fe288 and 3908473c80, we have added some code where we allowed
> the creation of files during mdopen even if they didn't exist during
> the bootstrap mode.  The comments in the code say: "During bootstrap,
> there are cases where a system relation will be accessed (by internal
> backend processes) before the bootstrap script nominally creates it."
> I am sure this will be the case when that code is added but is it
> required today?  While going through commit 3908473c80, I came across
> below comment:
>
> - * During bootstrap processing, we skip that check, because pg_time,
> - * pg_variable, and pg_log get created before their .bki file entries
> - * are processed.
> - */
> + fd = FileNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | PG_BINARY, 0600);
>
> The system tables mentioned in above commit are not present today, so
> do we really need that code and even if it is required shall we do it
> only for 'main' or 'init' forks?
>
> Tom, as you are a committer of the commits b9d01fe288 and 3908473c80,
> do you remember anything in this regard?
>
>
> [1] - 
> https://www.postgresql.org/message-id/CAJVSVGVtf%2B-2sQGVyEZJQh15dpVicpFA6BBiPLugaD4oBEaiHg%40mail.gmail.com
>
> --
> With Regards,
> Amit Kapila.
> EnterpriseDB: http://www.enterprisedb.com
>



Re: WIP: Avoid creation of the free space map for small tables

2019-01-10 Thread John Naylor
On Wed, Jan 9, 2019 at 10:50 PM Amit Kapila  wrote:
> Thanks, Mithun for performance testing, it really helps us to choose
> the right strategy here.  Once John provides next version, it would be
> good to see the results of regular pgbench (read-write) runs (say at
> 50 and 300 scale factor) and the results of large copy.  I don't think
> there will be any problem, but we should just double check that.

Attached is v12 using the alternating-page strategy. I've updated the
comments and README as needed. In addition, I've

-handled a possible stat() call failure during pg_upgrade
-added one more assertion
-moved the new README material into a separate paragraph
-added a comment to FSMClearLocalMap() about transaction abort
-corrected an outdated comment that erroneously referred to extension
rather than creation
-fleshed out the draft commit messages
From 854ba27740ed26e0d3e89f6228186f2b0914b21e Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 10 Jan 2019 16:37:57 -0500
Subject: [PATCH v12 1/2] Avoid creation of the free space map for small heap
 relations.

Previously, all heaps had FSMs. For very small tables, this means that
the FSM took up more space than the heap did. This is wasteful, so now
we refrain from creating the FSM for heaps with 4 pages or fewer. If the
last known target block has insufficient space, we still try to insert
into some other page before giving up and extending the relation,
since doing otherwise leads to table bloat. Testing showed that trying
every page penalized performance slightly, so we compromise and try
every other page. This way, we visit at most two pages. Any pages with
wasted free space become visible at next relation extension, so we still
control table bloat. As a bonus, directly attempting one or two pages
can even be faster than consulting the FSM would have been.

If a heap with a FSM is truncated back to below the threshold, the FSM
stays around and can be used as usual.
---
 contrib/pageinspect/expected/page.out |  77 +++
 contrib/pageinspect/sql/page.sql  |  33 +--
 doc/src/sgml/storage.sgml |  13 +-
 src/backend/access/brin/brin.c|   2 +-
 src/backend/access/brin/brin_pageops.c|  10 +-
 src/backend/access/heap/hio.c |  47 ++--
 src/backend/access/transam/xact.c |  14 ++
 src/backend/commands/vacuumlazy.c |  17 +-
 src/backend/storage/freespace/README  |  13 +-
 src/backend/storage/freespace/freespace.c | 262 +-
 src/backend/storage/freespace/indexfsm.c  |   6 +-
 src/include/storage/freespace.h   |   9 +-
 src/test/regress/expected/fsm.out |  62 +
 src/test/regress/parallel_schedule|   6 +
 src/test/regress/serial_schedule  |   1 +
 src/test/regress/sql/fsm.sql  |  46 
 16 files changed, 516 insertions(+), 102 deletions(-)
 create mode 100644 src/test/regress/expected/fsm.out
 create mode 100644 src/test/regress/sql/fsm.sql

diff --git a/contrib/pageinspect/expected/page.out b/contrib/pageinspect/expected/page.out
index 3fcd9fbe6d..83e5910453 100644
--- a/contrib/pageinspect/expected/page.out
+++ b/contrib/pageinspect/expected/page.out
@@ -1,48 +1,69 @@
 CREATE EXTENSION pageinspect;
-CREATE TABLE test1 (a int, b int);
-INSERT INTO test1 VALUES (16777217, 131584);
-VACUUM test1;  -- set up FSM
+CREATE TABLE test_rel_forks (a int);
+-- Make sure there are enough blocks in the heap for the FSM to be created.
+INSERT INTO test_rel_forks SELECT g from generate_series(1,1) g;
+-- set up FSM and VM
+VACUUM test_rel_forks;
 -- The page contents can vary, so just test that it can be read
 -- successfully, but don't keep the output.
-SELECT octet_length(get_raw_page('test1', 'main', 0)) AS main_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 0)) AS main_0;
  main_0 
 
8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'main', 1)) AS main_1;
-ERROR:  block number 1 is out of range for relation "test1"
-SELECT octet_length(get_raw_page('test1', 'fsm', 0)) AS fsm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'main', 100)) AS main_100;
+ERROR:  block number 100 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 0)) AS fsm_0;
  fsm_0 
 ---
   8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'fsm', 1)) AS fsm_1;
- fsm_1 

-  8192
-(1 row)
-
-SELECT octet_length(get_raw_page('test1', 'vm', 0)) AS vm_0;
+SELECT octet_length(get_raw_page('test_rel_forks', 'fsm', 10)) AS fsm_10;
+ERROR:  block number 10 is out of range for relation "test_rel_forks"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 0)) AS vm_0;
  vm_0 
 --
  8192
 (1 row)
 
-SELECT octet_length(get_raw_page('test1', 'vm', 1)) AS vm_1;
-ERROR:  block number 1 is out of range for relation "test1"
+SELECT octet_length(get_raw_page('test_rel_forks', 'vm', 1)) AS vm_1;
+ERROR:  block number 1 is out of range for 

Re: Improve selectivity estimate for range queries

2019-01-10 Thread Tom Lane
Robert Haas  writes:
> On Fri, Dec 21, 2018 at 11:50 AM Tom Lane  wrote:
>> A smaller-footprint way to fix the immediate problem might be to
>> change the values of DEFAULT_INEQ_SEL and friends so that they're
>> even less likely to be matched by accident.  That is, instead of
>> 0., use 0.342 or some such.

> That's not a dumb idea, but it seems pretty unprincipled to me, and to
> be honest I'm kind of surprised that you're not proposing something
> cleaner.

The problem is the invasiveness of such a change (large) vs the benefit
(not so large).  The upthread patch attempted to add a separate signaling
path, but it was very incomplete --- and yet both I and Horiguchi-san
thought it was already too messy.

Maybe at some point we'll go over to something reasonably principled,
like adding confidence intervals to all selectivity estimates.  That
would be *really* invasive but perhaps would bring enough benefit to
justify itself.  But the current patch is just attempting to fix one
extremely narrow pain point, and if that is all it's doing it should
have a commensurately small footprint.  So I don't think the submitted
patch looks good from a cost/benefit standpoint.

regards, tom lane



Re: speeding up planning with partitions

2019-01-10 Thread David Rowley
On Fri, 11 Jan 2019 at 06:56, Alvaro Herrera  wrote:
> Pushed 0001 with some minor tweaks, thanks.

Thanks for pushing.  I had looked at 0001 last night and there wasn't
much to report, just:

v12 0001

1. I see you've moved translate_col_privs() out of prepunion.c into
appendinfo.c. This required making it an external function, but its
only use is in inherit.c, so would it not be better to put it there
and keep it static?

2. The following two lines I think need to swap their order.

+#include "utils/rel.h"
+#include "utils/lsyscache.h"

Both are pretty minor details but thought I'd post them anyway.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-10 Thread Fabien COELHO



Hello Alvaro,


> There's a lot of the new code in pgbench that can be simplified if we
> remove \cset.


I'm not very happy with the resulting syntax, but IMO the feature is 
useful. My initial design was to copy PL/pgSQL "into" with some "\into" 
orthogonal to \; and ;, but the implementation was not especially nice and 
I was told to use psql's \gset approach, which I did.


If we do not provide \cset, then combined queries and getting results are 
not orthogonal, although from a performance testing point of view an 
application could do both, and the point is to allow pgbench to test the 
performance impact of doing that.


There are other existing restrictions which are arbitrary, eg you cannot 
use prepared statements with combined queries. I'd rather not add more 
restrictions of this kind, which are not "up to project standard" in my 
opinion. I may try to remove this particular restriction in the future.


Not many people know that queries can be combined, but if you are 
interested in latency that is really an interesting option, and being able 
to check how much can be gained from doing that is a point of a tool like 
pgbench.
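
For example, a sketch using the standard pgbench tables: the two statements
below, separated by \;, are sent to the server as a single combined query,
so the client pays one network round trip instead of two.

\set aid random(1, 100000 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid \;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;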


--
Fabien.



Re: MERGE SQL statement for PG12

2019-01-10 Thread Robert Haas
On Thu, Jan 3, 2019 at 2:11 AM Pavan Deolasee  wrote:
> On Tue, Nov 27, 2018 at 4:48 PM Pavan Deolasee  
> wrote:
>>  In the meanwhile, I am posting a rebased version.
>
> Another rebase on the current master.

I feel like there has been some other thread where this was discussed,
but I can't find it right now.  I think that the "query construction"
logic in transformMergeStmt is fundamentally the wrong way to do this.
I think several others have said the same.  And I don't think this
should be considered for commit until that is rewritten.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-10 Thread Fabien COELHO



Hello Tom,


> So who needs that?  Just merge the queries, if it's so important that
> you avoid multiple round trips.


Pgbench is about testing (measuring) performance in various settings and 
realistic scenarios: queries prepared or not, possibly combined, and so on. 
As postgres allows sending several queries in one message, I think it is 
interesting to be able to test the performance impact of doing that (the 
answer is very significant, esp wrt latency), and it should not preclude 
getting the results out as a client application would do.


Queries can be "merged", but ISTM the syntax is not especially friendly 
(UNION, SELECT of SELECT, CROSS JOIN... not sure which one you had in 
mind) nor representative of what a client application would really do, 
and it would mess with readability, message size, planning and so on. 
Also, compound queries would all need to return one row, but this 
constraint is only needed for variables.



>> We can take it out I guess, but my impression was that we already pretty
>> much had a consensus that it was wanted.


> Maybe if the implementation weren't a pile of junk it'd be all right,
> but as-is this is a mess.


Thanks:-)

> The dependency on counting \; in particular is setting me off, because 
> that has little if anything to do with the number of query results to be 
> expected.


The kludge is there because there is a kludge (aka an optimization :-) on 
the server side to silently ignore empty queries. On "SELECT 1 \; /**/ \; 
SELECT 2 ;" the server sends two results back instead of 3, whereas it 
should logically return an empty result for the empty query.


Now pgbench could detect that there is an empty query (possibly skipping 
comments and so on); an early version of the patch did that AFAICR, but the 
code did not seem worth it, and it seemed cleaner to just document the 
restriction, so it was removed.


> I imagine the argument will be that nobody would write the sort of 
> queries that break that assumption in a pgbench script;


Detecting empty queries is possible, although the code for doing that is 
kind of ugly and would look bad in the lexer, where it seemed desirable 
to make minimal changes.


> but I don't find that kind of design to be up to project standards, 
> especially not when the argument for the feature is tissue-thin in the 
> first place.


The "first place" is to be able to implement more realistic scenarii, and 
to have getting results into variables orthogonal to combined queries.


Although I'm not especially thrilled by the resulting syntax, the point is 
to provide a feature pertinent to performance testing, not to have an 
incredibly well designed syntax. It just goes along with the existing 
backslash approach used by psql & pgbench, which has the significant 
advantage that it is mostly SQL with a few things around.


--
Fabien.



Re: NOTIFY and pg_notify performance when deduplicating notifications

2019-01-10 Thread Tom Lane
Julien Demoor  writes:
> [ postgres-notify-all-v8.patch ]

I took a quick look through this.  A few comments:

* I find the proposed syntax extension for NOTIFY to be fairly bizarre;
it's unlike the way that we handle options for any other utility
statement.  It's also non-orthogonal, since you can't specify a collapse
mode without giving a payload string.  I think possibly a better answer
is the way that we've been adding optional parameters to VACUUM and
suchlike recently:

 NOTIFY [ (collapse = off/on) ] channel [ , payload ]

This'd be more extensible if we ever find a need for other options,
too.

* I'm also unimpressed with the idea of having a "maybe" collapse mode.
What's that supposed to do?  It doesn't appear to be different from
"always", so why not just reduce this to a boolean collapse-or-not
flag?

* The documentation doesn't agree with the code, since it fails to
mention "always" mode.

* I was kind of disappointed to find that the patch doesn't really
do anything to fix the performance problem for duplicate notifies.
The thread title had led me to hope for more ;-).  I wonder if we
couldn't do something involving hashing.  OTOH, it's certainly
arguable that that would be an independent patch, and that this
one should be seen as a feature patch ("make NOTIFY's behavior
for duplicate notifies more flexible and more clearly specified")
rather than a performance patch.

regards, tom lane



Re: add_partial_path() may remove dominated path but still in use

2019-01-10 Thread Robert Haas
On Wed, Jan 9, 2019 at 12:44 AM Kohei KaiGai  wrote:
> So, is it sufficient if set_rel_pathlist_hook is just relocated in
> front of the generate_gather_paths?
> If we have no use case for the second hook, there is little necessity
> to have the post_rel_pathlist_hook() here.
> (At least, PG-Strom will use the first hook only.)

+1.  That seems like the best way to be consistent with the principle
that we need to have all the partial paths before generating any
Gather paths.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Improve selectivity estimate for range queries

2019-01-10 Thread Robert Haas
On Fri, Dec 21, 2018 at 11:50 AM Tom Lane  wrote:
> A smaller-footprint way to fix the immediate problem might be to
> change the values of DEFAULT_INEQ_SEL and friends so that they're
> even less likely to be matched by accident.  That is, instead of
> 0., use 0.342 or some such.  It might
> seem that that's just moving the problem around, but I think it
> might be possible to show that such a value couldn't be computed
> by scalarltsel given a histogram with no more than 1 members.
> (I haven't tried to actually prove that, but it seems intuitive
> that the set of possible results would be quantized with no more
> than about 5 digits precision.

That's not a dumb idea, but it seems pretty unprincipled to me, and to
be honest I'm kind of surprised that you're not proposing something
cleaner.  If you admit the need to distinguish between real estimates
and last-ditch ones, why shouldn't we have an explicit representation
of that rather than trying to pick a last-ditch value that is unlikely
to be confused with a real one?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Commitfest delayed: extend it?

2019-01-10 Thread Alvaro Herrera
On 2019-Jan-10, Tom Lane wrote:

> David Fetter  writes:
> > We're 10 days into the Commitfest, the first few having been the new
> > year, with people maybe paying attention to other things.
> > I'd like to propose extending this CF by some period, maybe as long
> > as ten days, so people get all the opportunities they might have had
> > if it had started on time.
> 
> I think it *did* start on time, at least people have been acting like
> it was on.  It just wasn't very official.

It has definitely started, at least for me :-)

We're going to have a bit of a triage session in the FOSDEM dev's
meeting, on Jan 31st, right at the end.  I think that will be a good
opportunity to give some final cleanup, and we should close it then or
shortly thereafter.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [Proposal] Add accumulated statistics

2019-01-10 Thread Robert Haas
On Thu, Dec 20, 2018 at 8:48 PM Yotsunaga, Naoki
 wrote:
> If so, is not that the number of wait events is useful information?

My theory is that the number of wait events is NOT useful information,
or at least not nearly as useful as the results of a sampling approach.
The data that LWLOCK_STATS produce are downright misleading -- they
lead you to think that the bottlenecks are in different places than
they really are, because the locks that produce the most waiting can
be 5th or 10th in terms of the number of wait events.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Commitfest delayed: extend it?

2019-01-10 Thread Tom Lane
David Fetter  writes:
> We're 10 days into the Commitfest, the first few having been the new
> year, with people maybe paying attention to other things.
> I'd like to propose extending this CF by some period, maybe as long
> as ten days, so people get all the opportunities they might have had
> if it had started on time.

I think it *did* start on time, at least people have been acting like
it was on.  It just wasn't very official.

regards, tom lane



Commitfest delayed: extend it?

2019-01-10 Thread David Fetter
Folks,

We're 10 days into the Commitfest, the first few having been the new
year, with people maybe paying attention to other things.

I'd like to propose extending this CF by some period, maybe as long
as ten days, so people get all the opportunities they might have had
if it had started on time.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-10 Thread Alvaro Herrera
On 2019-Jan-10, Tom Lane wrote:

> Alvaro Herrera  writes:
> > On 2019-Jan-10, Tom Lane wrote:
> >> This \cset thing seem like an incredibly badly thought out kluge.
> >> What is its excuse to live?
> 
> > The reason is that you can set variables from several queries in one
> > network trip.
> 
> So who needs that?  Just merge the queries, if it's so important that
> you avoid multiple round trips.

Hmm, I suppose that's true.

> > We can take it out I guess, but my impression was that we already pretty
> > much had a consensus that it was wanted. 
> 
> Maybe if the implementation weren't a pile of junk it'd be all right,
> but as-is this is a mess.  The dependency on counting \; in particular
> is setting me off, because that has little if anything to do with the
> number of query results to be expected.  I imagine the argument will
> be that nobody would write the sort of queries that break that assumption
> in a pgbench script; but I don't find that kind of design to be up
> to project standards, especially not when the argument for the feature
> is tissue-thin in the first place.

There's a lot of the new code in pgbench that can be simplified if we
remove \cset.

I'll leave time for others to argue for or against cset, and then act
accordingly.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-10 Thread Tom Lane
Alvaro Herrera  writes:
> On 2019-Jan-10, Tom Lane wrote:
>> This \cset thing seem like an incredibly badly thought out kluge.
>> What is its excuse to live?

> The reason is that you can set variables from several queries in one
> network trip.

So who needs that?  Just merge the queries, if it's so important that
you avoid multiple round trips.

> We can take it out I guess, but my impression was that we already pretty
> much had a consensus that it was wanted. 

Maybe if the implementation weren't a pile of junk it'd be all right,
but as-is this is a mess.  The dependency on counting \; in particular
is setting me off, because that has little if anything to do with the
number of query results to be expected.  I imagine the argument will
be that nobody would write the sort of queries that break that assumption
in a pgbench script; but I don't find that kind of design to be up
to project standards, especially not when the argument for the feature
is tissue-thin in the first place.

regards, tom lane



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-10 Thread Tom Lane
Alvaro Herrera  writes:
> Now let's see how the buildfarm likes this ...

This \cset thing seem like an incredibly badly thought out kluge.
What is its excuse to live?

regards, tom lane



Re: speeding up planning with partitions

2019-01-10 Thread Alvaro Herrera
On 2019-Jan-10, Amit Langote wrote:

> Here's v12, which is more or less same as v11 but with the order of
> patches switched so that the code movement patch is first.  Note that the
> attached v12-0001 contains no functional changes (but there's tiny note in
> the commit message mentioning the addition of a tiny function which is
> just old code).

Pushed 0001 with some minor tweaks, thanks.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-10 Thread Dean Rasheed
On Wed, 26 Dec 2018 at 22:09, Tomas Vondra  wrote:
>
> Attached is an updated version of the patch - rebased and fixing the
> warnings reported by Thomas Munro.
>

Here are a few random review comments based on what I've read so far:


On the CREATE STATISTICS doc page, the syntax in the new examples
added to the bottom of the page is incorrect. E.g., instead of

CREATE STATISTICS s2 WITH (mcv) ON (a, b) FROM t2;

it should read

CREATE STATISTICS s2 (mcv) ON a, b FROM t2;

I think perhaps there should be also be a short explanatory sentence
after each example (as in the previous one) just to explain what the
example is intended to demonstrate. E.g., for the new MCV example,
perhaps say

   These statistics give the planner more detailed information about the
   specific values that commonly appear in the table, as well as an upper
   bound on the selectivities of combinations of values that do not appear in
   the table, allowing it to generate better estimates in both cases.

I don't think there's a need for too much detail there, since it's
explained more fully elsewhere, but it feels like it needs a little
more just to explain the purpose of the example.
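
(As an aside, for anyone trying this out, a minimal end-to-end example with
the patch applied might look like the following -- names and data invented:

CREATE TABLE t2 (a int, b int);
INSERT INTO t2 SELECT i % 100, i % 100 FROM generate_series(1, 10000) i;
CREATE STATISTICS s2 (mcv) ON a, b FROM t2;
ANALYZE t2;
EXPLAIN SELECT * FROM t2 WHERE a = 1 AND b = 1;

With the MCV list the estimate for the correlated clauses should come out
close to the true 100 rows, rather than the ~1 row you get by treating the
columns as independent.)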


There is additional documentation in perform.sgml that needs updating
-- about what kinds of stats the planner keeps. Those docs are
actually quite similar to the ones in planstats.sgml. It seems the
former focus more on what stats the planner stores, while the latter
focus on how the planner uses those stats.


In func.sgml, the docs for pg_mcv_list_items need extending to include
the base frequency column. Similarly for the example query in
planstats.sgml.


Tab-completion for the CREATE STATISTICS statement should be extended
for the new kinds.


Looking at mcv_update_match_bitmap(), it's called 3 times (twice
recursively from within itself), and I think the pattern for calling
it is a bit messy. E.g.,

/* by default none of the MCV items matches the clauses */
bool_matches = palloc0(sizeof(char) * mcvlist->nitems);

if (or_clause(clause))
{
/* OR clauses assume nothing matches, initially */
memset(bool_matches, STATS_MATCH_NONE, sizeof(char) *
mcvlist->nitems);
}
else
{
/* AND clauses assume everything matches, initially */
memset(bool_matches, STATS_MATCH_FULL, sizeof(char) *
mcvlist->nitems);
}

/* build the match bitmap for the OR-clauses */
mcv_update_match_bitmap(root, bool_clauses, keys,
mcvlist, bool_matches,
or_clause(clause));

the comment for the AND case directly contradicts the initial comment,
and the final comment is wrong because it could be and AND clause. For
a NOT clause it does:

/* by default none of the MCV items matches the clauses */
not_matches = palloc0(sizeof(char) * mcvlist->nitems);

/* NOT clauses assume nothing matches, initially */
memset(not_matches, STATS_MATCH_FULL, sizeof(char) *
mcvlist->nitems);

/* build the match bitmap for the NOT-clause */
mcv_update_match_bitmap(root, not_args, keys,
mcvlist, not_matches, false);

so the second comment is wrong. I understand the evolution that lead
to this function existing in this form, but I think that it can now be
refactored into a "getter" function rather than an "update" function.
I.e., something like mcv_get_match_bitmap() which first allocates the
array to be returned and initialises it based on the passed-in value
of is_or. That way, all the calling sites can be simplified to
one-liners like

/* get the match bitmap for the AND/OR clause */
bool_matches = mcv_get_match_bitmap(root, bool_clauses, keys,
mcvlist, or_clause(clause));


In the previous discussion around UpdateStatisticsForTypeChange(), the
consensus appeared to be that we should just unconditionally drop all
extended statistics when ALTER TABLE changes the type of an included
column (just as we do for per-column stats), since such a type change
can rewrite the data in arbitrary ways, so there's no reason to assume
that the old stats are still valid. I think it makes sense to extract
that as a separate patch to be committed ahead of these ones, and I'd
also argue for back-patching it.


That's it for now. I'll try to keep reviewing if time permits.

Regards,
Dean



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-10 Thread Alvaro Herrera
On 2019-Jan-10, Fabien COELHO wrote:

> However, I switched "pg_free" to "termPQExpBuffer", which seems more
> appropriate, even if it just does the same thing. I also ensured that
> prefixes are allocated & freed. I've commented about expr which is not
> freed.

Oops, of course, thanks.

> I'm not keen on having the command array size checked and updated *after*
> the command is appended, even if the initial allocation ensures that there
> is no overflow, but I let it as is.

It was already done that way, only it was done in two places rather than
one.  I just refactored it.  (In fairness, I think the assignment of the
new command to the array could also be done in one place instead of two, 
but it seems slightly clearer like this.)

> Attached a v29 with the above minor changes wrt your version.

Thanks, pushed.  I fixed a couple of very minor issues in the docs.

Now let's see how the buildfarm likes this ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Policy on cross-posting to multiple lists

2019-01-10 Thread Amit Langote
On Fri, Jan 11, 2019 at 12:58 AM Tom Lane  wrote:
> Dean Rasheed  writes:
> > The "Crash on ALTER TABLE" thread [1] started on -bugs, but Andrew's
> > message on 8 Jan with an initial proposed patch and my response later
> > that day both CC'ed -hackers and seem to have been rejected, and so
> > are missing from the archives.
>
> I've done the same recently, without problems.  I'd suggest inquiring
> on the pgsql-www list; project infrastructure issues are not really
> on-topic here.

Fwiw, an email I sent yesterday was rejected similarly because I'd
tried to send it to both pgsql-hackers and pgsql-performance.  I
mentioned about that when I resent the same email successfully [1]
after dropping pgsql-performance from the list of recipients.

Thanks,
Amit

[1] 
https://www.postgresql.org/message-id/96720c99-ffa0-01ad-c594-0504c8eda708%40lab.ntt.co.jp



Re: Policy on cross-posting to multiple lists

2019-01-10 Thread Tom Lane
Dean Rasheed  writes:
> Has the policy on cross-posting to multiple lists been hardened recently?

Not that I've heard of.

> The "Crash on ALTER TABLE" thread [1] started on -bugs, but Andrew's
> message on 8 Jan with an initial proposed patch and my response later
> that day both CC'ed -hackers and seem to have been rejected, and so
> are missing from the archives.

I've done the same recently, without problems.  I'd suggest inquiring
on the pgsql-www list; project infrastructure issues are not really
on-topic here.

regards, tom lane



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-10 Thread Tomas Vondra
On 1/10/19 4:20 PM, Dean Rasheed wrote:
> On Wed, 9 Jan 2019 at 15:40, Tomas Vondra  
> wrote:
>> On 1/8/19 3:18 PM, Dean Rasheed wrote:
>>> So actually, the estimate for a group of values will be either the MCV
>>> item's frequency (if the MCV item is kept), or (roughly) the MCV
>>> item's base_frequency (if the MCV item is not kept). That suggests
>>> that we should simply keep items that are significantly more or less
>>> common than the item's base frequency -- i.e., keep rule (b) and ditch
>>> rule (a).
>>>
>>
>> Hmmm, but won't that interfere with how we extrapolate the
>> MCV estimate to the non-MCV part? Currently the patch does what you
>> proposed, i.e.
>>
>> other_sel = simple_sel - mcv_basesel;
>>
>> I'm worried that if we only include the items that are significantly
>> more or less common than the base frequency, it may skew the other_sel
>> estimate.
>>
> 
> I don't see how that would skew other_sel. Items close to the base
> frequency would also tend to be close to simple_sel, making other_sel
> approximately zero, so excluding them should have little effect.

Oh, I see. You're right those items should contribute very little to
other_sel, I should have realized that.

> However...
> 
> Re-reading the thread where we enhanced the per-column MCV stats last
> year [1], it was actually the case that an algorithm based on just
> looking at the relative standard error worked pretty well for a very
> wide range of data distributions.
> 
> The final algorithm chosen in analyze_mcv_list() was only a marginal
> improvement on that, and was directly based upon the fact that, in the
> univariate statistics case, all the values not included in the MCV
> list are assigned the same selectivity. However, that's not the case
> for multivariate stats, because each group not included in the
> multivariate MCV list gets assigned a different selectivity based on
> its per-column stats.
> 
> So perhaps what we should do for multivariate stats is simply use the
> relative standard error approach (i.e., reuse the patch in [2] with a
> 20% RSE cutoff). That had a lot of testing at the time, against a wide
> range of data distributions, and proved to be very good, not to
> mention being very simple.
> 
> That approach would encompass both groups more and less common than
> the base frequency, because it relies entirely on the group appearing
> enough times in the sample to infer that any errors on the resulting
> estimates will be reasonably well controlled. It wouldn't actually
> look at the base frequency at all in deciding which items to keep.
> 
> Moreover, if the group appears sufficiently often in the sample to
> justify being kept, each of the individual column values must also
> appear at least that often as well, which means that the errors on the
> base frequency estimate are also well controlled. That was one of my
> concerns about other algorithms such as "keep items significantly more
> or less common than the base frequency" -- in the less common case,
> there's no lower bound on the number of occurrences seen, and so no
> guarantee that the errors are kept under control.
> 

Yep, that looks like a great approach. Simple and tested. I'll try
tweaking the patch accordingly over the weekend.

Thanks!

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] PATCH: multivariate histograms and MCV lists

2019-01-10 Thread Dean Rasheed
On Wed, 9 Jan 2019 at 15:40, Tomas Vondra  wrote:
> On 1/8/19 3:18 PM, Dean Rasheed wrote:
> > So actually, the estimate for a group of values will be either the MCV
> > item's frequency (if the MCV item is kept), or (roughly) the MCV
> > item's base_frequency (if the MCV item is not kept). That suggests
> > that we should simply keep items that are significantly more or less
> > common than the item's base frequency -- i.e., keep rule (b) and ditch
> > rule (a).
> >
>
> Hmmm, but won't that interfere with how we extrapolate the
> MCV estimate to the non-MCV part? Currently the patch does what you
> proposed, i.e.
>
> other_sel = simple_sel - mcv_basesel;
>
> I'm worried that if we only include the items that are significantly
> more or less common than the base frequency, it may skew the other_sel
> estimate.
>

I don't see how that would skew other_sel. Items close to the base
frequency would also tend to be close to simple_sel, making other_sel
approximately zero, so excluding them should have little effect.
However...

Re-reading the thread where we enhanced the per-column MCV stats last
year [1], it was actually the case that an algorithm based on just
looking at the relative standard error worked pretty well for a very
wide range of data distributions.

The final algorithm chosen in analyze_mcv_list() was only a marginal
improvement on that, and was directly based upon the fact that, in the
univariate statistics case, all the values not included in the MCV
list are assigned the same selectivity. However, that's not the case
for multivariate stats, because each group not included in the
multivariate MCV list gets assigned a different selectivity based on
its per-column stats.

So perhaps what we should do for multivariate stats is simply use the
relative standard error approach (i.e., reuse the patch in [2] with a
20% RSE cutoff). That had a lot of testing at the time, against a wide
range of data distributions, and proved to be very good, not to
mention being very simple.

That approach would encompass both groups more and less common than
the base frequency, because it relies entirely on the group appearing
enough times in the sample to infer that any errors on the resulting
estimates will be reasonably well controlled. It wouldn't actually
look at the base frequency at all in deciding which items to keep.

Moreover, if the group appears sufficiently often in the sample to
justify being kept, each of the individual column values must also
appear at least that often as well, which means that the errors on the
base frequency estimate are also well controlled. That was one of my
concerns about other algorithms such as "keep items significantly more
or less common than the base frequency" -- in the less common case,
there's no lower bound on the number of occurrences seen, and so no
guarantee that the errors are kept under control.
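
(Loosely, the idea is: for a group seen k times in a sample of n rows, with
sample frequency p = k/n, keep the item if the relative standard error of
that frequency estimate -- on the order of sqrt(p*(1-p)/n) / p, with a
finite-population correction -- is below the 20% cutoff, so only groups
observed often enough for the estimate to be trustworthy are kept.)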

Regards,
Dean

[1] 
https://www.postgresql.org/message-id/flat/CAMkU%3D1yvdGvW9TmiLAhz2erFnvnPFYHbOZuO%2Ba%3D4DVkzpuQ2tw%40mail.gmail.com

[2] 
https://www.postgresql.org/message-id/CAEZATCUEmHCZeOHJN8JO5O9LK_VuFeCecy_AxTk7S_2SmLXeyw%40mail.gmail.com



Re: PostgreSQL vs SQL/XML Standards

2019-01-10 Thread Pavel Stehule
Hi

On Thu, Jan 10, 2019 at 14:00, Arthur Zakirov
wrote:

> Hello Pavel,
>
> On 09.11.2018 07:07, Pavel Stehule wrote:
> > I used your patch and append regress tests. I checked the result against
> > Oracle.
>
> I checked the patch with Chap cases. The patch fixes handling of
> boolean, number types which mentioned in the wiki.
>
> I have a few comments related to the code and the documentation. I
> attached the patch, which fixes it.
>
> There is an example in the documentation:
>
> SELECT xmltable.*
>FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text);
> element
> --
> Hello2a2   bbbCC
>
> With the patch XMLTABLE returns different result now.
>
> copy_and_safe_free_xmlchar() function should be hid by #ifdef
> USE_LIBXML, otherwise I get an error if I build the Postgres without
> --with-libxml.
>
> There is a comment within XmlTableGetValue(). I changed it, mainly I
> used Markus patch from the related thread mentioned by Alvaro.
>
> Please see the changes in the patch.
>

I merged your changes, and fixed regress tests.

Thank you for patch

Regards

Pavel



> --
> Arthur Zakirov
> Postgres Professional: http://www.postgrespro.com
> Russian Postgres Company
>
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 90d67f1acf..06f3f69073 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10999,9 +10999,9 @@ $$ AS data;
 
 SELECT xmltable.*
   FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text);
-   element
---
-   Hello2a2   bbbCC  
+ element 
+-
+   Hello2a2   bbbxxxCC  
 ]]>
 
 
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index 1cec168b2a..df7f0cc20d 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -3674,15 +3674,15 @@ SPI_sql_row_to_xmlelement(uint64 rownum, StringInfo result, char *tablename,
 #ifdef USE_LIBXML
 
 /*
- * Convert XML node to text (dump subtree in case of element,
- * return value otherwise)
+ * Convert XML node to text (dump subtree), for attribute and text
+ * returns escaped text.
  */
 static text *
 xml_xmlnodetoxmltype(xmlNodePtr cur, PgXmlErrorContext *xmlerrcxt)
 {
 	xmltype*result;
 
-	if (cur->type == XML_ELEMENT_NODE)
+	if (cur->type != XML_ATTRIBUTE_NODE && cur->type != XML_TEXT_NODE)
 	{
 		xmlBufferPtr buf;
 		xmlNodePtr	cur_copy;
@@ -4427,6 +4427,37 @@ XmlTableFetchRow(TableFuncScanState *state)
 #endif			/* not USE_LIBXML */
 }
 
+#ifdef USE_LIBXML
+/*
+ * Copy XmlChar string to PostgreSQL memory. Ensure releasing of
+ * source xmllib string.
+ */
+static char *
+copy_and_safe_free_xmlchar(xmlChar *str)
+{
+	char *result;
+
+	if (str)
+	{
+		PG_TRY();
+		{
+			result = pstrdup((char *) str);
+		}
+		PG_CATCH();
+		{
+			xmlFree(str);
+			PG_RE_THROW();
+		}
+		PG_END_TRY();
+		xmlFree(str);
+	}
+	else
+		result = NULL;
+
+	return result;
+}
+#endif
+
 /*
  * XmlTableGetValue
  *		Return the value for column number 'colnum' for the current row.  If
@@ -4475,9 +4506,9 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 		/*
 		 * There are four possible cases, depending on the number of nodes
 		 * returned by the XPath expression and the type of the target column:
-		 * a) XPath returns no nodes.  b) One node is returned, and column is
-		 * of type XML.  c) One node, column type other than XML.  d) Multiple
-		 * nodes are returned.
+		 * a) XPath returns no nodes.  b) The target type is XML (return all
+		 * as XML).  For non-XML types:  c) One node (return content).
+		 * d) Multiple nodes (error).
 		 */
 		if (xpathobj->type == XPATH_NODESET)
 		{
@@ -4490,85 +4521,72 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 			{
 *isnull = true;
 			}
-			else if (count == 1 && typid == XMLOID)
-			{
-text	   *textstr;
-
-/* simple case, result is one value */
-textstr = xml_xmlnodetoxmltype(xpathobj->nodesetval->nodeTab[0],
-			   xtCxt->xmlerrcxt);
-cstr = text_to_cstring(textstr);
-			}
-			else if (count == 1)
+			else
 			{
-xmlChar*str;
-xmlNodePtr	node;
-
-/*
- * Most nodes (elements and even attributes) store their data
- * in children nodes. If they don't have children nodes, it
- * means that they are empty (e.g. ). Text nodes and
- * CDATA sections are an exception: they don't have children
- * but have content in the Text/CDATA node itself.
- */
-node = xpathobj->nodesetval->nodeTab[0];
-if (node->type != XML_CDATA_SECTION_NODE &&
-	node->type != XML_TEXT_NODE)
-	node = node->xmlChildrenNode;
-
-str = xmlNodeListGetString(xtCxt->doc, node, 1);
-if (str != NULL)
+if (typid == XMLOID)
 {
-	PG_TRY();
-	{
-		cstr = pstrdup((char *) str);
-	}
-	PG_CATCH();
+	text	   *textstr;
+	StringInfoData str;
+	int			i;
+
+	/* Concatenate serialized values */
+	initStringInfo(&str);
+		

Re: Remove Deprecated Exclusive Backup Mode

2019-01-10 Thread Stephen Frost
Greetings,

* Laurenz Albe (laurenz.a...@cybertec.at) wrote:
> Stephen Frost wrote:
> > * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> > > Clearly, not having to do that at all is better, but if this is all
> > > there is to it, then I'm confused by the characterizations of how awful
> > > and terrible this feature is and how we must rush to remove it.
> > 
> > It's not all there is to it though.
> > 
> > This issue leads to extended downtime regularly and is definitely a huge
> > 'gotcha' for users, even if you don't want to call it outright broken,
> 
> Only if PostgreSQL crashes regularly, right?

This happens any time a backup is in progress and the system goes down in
any way: PostgreSQL crashing, the container Postgres is running in going
away, the server Postgres is running on failing, the filesystem Postgres is
running on disappearing, etc.

There's certainly no shortage of cases where this can happen.
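
To make the comparison concrete, here is a minimal sketch of the two modes
being contrasted; the label and the copy step are illustrative only, and the
function signatures are the documented pg_start_backup/pg_stop_backup ones:

-- Exclusive mode (the deprecated one): backup_label is written into the
-- data directory, which is what bites you if the server goes down while
-- the backup is still in progress.
SELECT pg_start_backup('nightly', false, true);
-- ... copy the data directory with an external tool ...
SELECT pg_stop_backup();

-- Non-exclusive mode: must run in a session that stays connected; the
-- backup_label contents are returned to the client instead of being left
-- behind in the data directory.
SELECT pg_start_backup('nightly', false, false);
-- ... copy the data directory with an external tool ...
SELECT * FROM pg_stop_backup(false);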

Thanks!

Stephen




Re: Remove Deprecated Exclusive Backup Mode

2019-01-10 Thread Laurenz Albe
Stephen Frost wrote:
> * Peter Eisentraut (peter.eisentr...@2ndquadrant.com) wrote:
> > Clearly, not having to do that at all is better, but if this is all
> > there is to it, then I'm confused by the characterizations of how awful
> > and terrible this feature is and how we must rush to remove it.
> 
> It's not all there is to it though.
> 
> This issue leads to extended downtime regularly and is definitely a huge
> 'gotcha' for users, even if you don't want to call it outright broken,

Only if PostgreSQL crashes regularly, right?

Yours,
Laurenz Albe




Re: PostgreSQL vs SQL/XML Standards

2019-01-10 Thread Arthur Zakirov

Hello Pavel,

On 09.11.2018 07:07, Pavel Stehule wrote:
> I used your patch and appended regression tests. I checked the result
> against Oracle.


I checked the patch with Chap's cases. The patch fixes the handling of
the boolean and number types mentioned in the wiki.


I have a few comments related to the code and the documentation. I
attached a patch which fixes them.


There is an example in the documentation:

SELECT xmltable.*
  FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text);
   element
--
   Hello2a2   bbbCC

With the patch, XMLTABLE now returns a different result.
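
To make the behavioral change concrete, a hypothetical example (the document
below is made up, not taken from the regression tests):

SELECT *
  FROM XMLTABLE('/root' PASSING xml '<root><element>bbb<x>xxx</x>CC</element></root>'
                COLUMNS element text);
-- Before the patch, the text of the nested <x> element was dropped for a
-- non-XML target column ("bbbCC"); with the patch it is included
-- ("bbbxxxCC"), matching the updated documentation example in the diff.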

The copy_and_safe_free_xmlchar() function should be hidden by #ifdef
USE_LIBXML, otherwise I get an error if I build Postgres without
--with-libxml.


There is a comment within XmlTableGetValue(). I changed it, mainly using
Markus' patch from the related thread mentioned by Alvaro.


Please see the changes in the patch.

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 90d67f1acf..06f3f69073 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10999,9 +10999,9 @@ $$ AS data;
 
 SELECT xmltable.*
   FROM xmlelements, XMLTABLE('/root' PASSING data COLUMNS element text);
-   element
---
-   Hello2a2   bbbCC  
+ element 
+-
+   Hello2a2   bbbxxxCC  
 ]]>
 
 
diff --git a/src/backend/utils/adt/xml.c b/src/backend/utils/adt/xml.c
index a83882f5de..df7f0cc20d 100644
--- a/src/backend/utils/adt/xml.c
+++ b/src/backend/utils/adt/xml.c
@@ -4427,6 +4427,7 @@ XmlTableFetchRow(TableFuncScanState *state)
 #endif			/* not USE_LIBXML */
 }
 
+#ifdef USE_LIBXML
 /*
  * Copy XmlChar string to PostgreSQL memory. Ensure releasing of
  * source xmllib string.
@@ -4455,6 +4456,7 @@ copy_and_safe_free_xmlchar(xmlChar *str)
 
 	return result;
 }
+#endif
 
 /*
  * XmlTableGetValue
@@ -4504,9 +4506,9 @@ XmlTableGetValue(TableFuncScanState *state, int colnum,
 		/*
 		 * There are four possible cases, depending on the number of nodes
 		 * returned by the XPath expression and the type of the target column:
-		 * a) XPath returns no nodes.  b) One node is returned, and column is
-		 * of type XML.  c) One node, column type other than XML.  d) Multiple
-		 * nodes are returned.
+		 * a) XPath returns no nodes.  b) The target type is XML (return all
+		 * as XML).  For non-XML types:  c) One node (return content).
+		 * d) Multiple nodes (error).
 		 */
 		if (xpathobj->type == XPATH_NODESET)
 		{


Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-10 Thread Michael Paquier
On Thu, Jan 10, 2019 at 01:10:16PM +0300, Sergei Kornilov wrote:
> Without both ready_to_display and the cleanup in RequestXLogStreaming,
> are we sure we do not show outdated PrimaryConnInfo when the walreceiver
> has just started but has not connected yet? While waiting in
> walrcv_connect, for example.

Good point.  There is a gap between the moment the WAL receiver PID is set
at startup and the moment the different fields are reset to 0.  That is not
good, as the user could see information left over from the previous WAL
receiver, so we should do the memset calls at the point where the PID is
set, reaching a state where the PID is alive but there is no connection
yet.  The port number needs similar treatment.

Looking closer at the code, it seems to me that it would be a good thing to
update the shared memory state when the WAL receiver dies, so that not only
is the PID set to 0, but all connection-related information is cleared as
well.
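
At the SQL level, this shared state is what pg_stat_wal_receiver exposes, so
a quick way to see the window being discussed (just an illustration, not a
test from the patch) is to watch it while the walreceiver is (re)connecting:

SELECT pid, status, slot_name, sender_host, sender_port, conninfo
  FROM pg_stat_wal_receiver;
-- If stale values were left in shared memory between the moment the PID is
-- set and the moment the connection is established, they would show up here.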

With all those comments addressed, attached is an updated version.
Thoughts?
--
Michael
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 9823b75767..4191b23621 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -11857,8 +11857,7 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess,
 	 tli, curFileTLI);
 		}
 		curFileTLI = tli;
-		RequestXLogStreaming(tli, ptr, PrimaryConnInfo,
-			 PrimarySlotName);
+		RequestXLogStreaming(tli, ptr);
 		receivedUpto = 0;
 	}
 
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 2e90944ad5..7e3ff63a44 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -188,9 +188,7 @@ DisableWalRcvImmediateExit(void)
 void
 WalReceiverMain(void)
 {
-	char		conninfo[MAXCONNINFO];
 	char	   *tmp_conninfo;
-	char		slotname[NAMEDATALEN];
 	XLogRecPtr	startpoint;
 	TimeLineID	startpointTLI;
 	TimeLineID	primaryTLI;
@@ -248,10 +246,6 @@ WalReceiverMain(void)
 	walrcv->pid = MyProcPid;
 	walrcv->walRcvState = WALRCV_STREAMING;
 
-	/* Fetch information required to start streaming */
-	walrcv->ready_to_display = false;
-	strlcpy(conninfo, (char *) walrcv->conninfo, MAXCONNINFO);
-	strlcpy(slotname, (char *) walrcv->slotname, NAMEDATALEN);
 	startpoint = walrcv->receiveStart;
 	startpointTLI = walrcv->receiveStartTLI;
 
@@ -262,6 +256,16 @@ WalReceiverMain(void)
 	/* Report the latch to use to awaken this process */
	walrcv->latch = &MyProc->procLatch;
 
+	/*
+	 * Reset all connection information when the PID is set, which makes
+	 * the information visible at SQL level, still we are not connected
+	 * yet.
+	 */
+	memset(walrcv->conninfo, 0, MAXCONNINFO);
+	memset(walrcv->slotname, 0, NAMEDATALEN);
+	memset(walrcv->sender_host, 0, NI_MAXHOST);
+	walrcv->sender_port = 0;
+
	SpinLockRelease(&walrcv->mutex);
 
 	/* Arrange to clean up at walreceiver exit */
@@ -291,32 +295,36 @@ WalReceiverMain(void)
 	/* Unblock signals (they were blocked when the postmaster forked us) */
	PG_SETMASK(&UnBlockSig);
 
+	if (PrimaryConnInfo == NULL || PrimaryConnInfo[0] == '\0')
+		ereport(ERROR,
+(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("cannot connect to the primary server as \"primary_conninfo\" is not defined")));
+
 	/* Establish the connection to the primary for XLOG streaming */
 	EnableWalRcvImmediateExit();
-	wrconn = walrcv_connect(conninfo, false, "walreceiver", &err);
+	wrconn = walrcv_connect(PrimaryConnInfo, false, "walreceiver", &err);
 	if (!wrconn)
 		ereport(ERROR,
 (errmsg("could not connect to the primary server: %s", err)));
 	DisableWalRcvImmediateExit();
 
 	/*
-	 * Save user-visible connection string.  This clobbers the original
-	 * conninfo, for security. Also save host and port of the sender server
-	 * this walreceiver is connected to.
+	 * Save user-visible connection string.  Also save host and port of the
+	 * sender server this walreceiver is connected to.
 	 */
 	tmp_conninfo = walrcv_get_conninfo(wrconn);
	walrcv_get_senderinfo(wrconn, &sender_host, &sender_port);
	SpinLockAcquire(&walrcv->mutex);
-	memset(walrcv->conninfo, 0, MAXCONNINFO);
 	if (tmp_conninfo)
 		strlcpy((char *) walrcv->conninfo, tmp_conninfo, MAXCONNINFO);
 
-	memset(walrcv->sender_host, 0, NI_MAXHOST);
+	if (PrimarySlotName)
+		strlcpy((char *) walrcv->slotname, PrimarySlotName, NAMEDATALEN);
+
 	if (sender_host)
 		strlcpy((char *) walrcv->sender_host, sender_host, NI_MAXHOST);
 
 	walrcv->sender_port = sender_port;
-	walrcv->ready_to_display = true;
	SpinLockRelease(&walrcv->mutex);
 
 	if (tmp_conninfo)
@@ -387,7 +395,8 @@ WalReceiverMain(void)
 		 */
 		options.logical = false;
 		options.startpoint = startpoint;
-		options.slotname = slotname[0] != '\0' ? slotname : NULL;
+		options.slotname = (PrimarySlotName && PrimarySlotName[0] != '\0') ?
+			PrimarySlotName : NULL;
 		options.proto.physical.startpointTLI = startpointTLI;
 		ThisTimeLineID = startpointTLI;
 		if 

Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-10 Thread Michael Paquier
On Thu, Jan 10, 2019 at 05:58:10PM +0900, Amit Langote wrote:
> The reason I started this thread is due to this Stack Overflow question:
> 
> https://stackoverflow.com/questions/53554727/logical-replication-and-declarative-partitioning-in-postgresql-11
> 
> So, it appears that there may be an element of surprise involved in
> encountering such an error (despite the documentation).

Improving the user experience is definitely a good thing in my
opinion because the current error message can be confusing, so you
were right to start this thread.  Still, I don't agree that classifying
those relkinds as unsupported is right either, both for consistency with
code that has existed for two years and because of the way the code is
designed to work: rows are replicated on a per-physically-defined
relation basis.
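
To make the scenario concrete, a hedged sketch of the setup the thread is
about (all names and the connection string are made up):

-- Publisher: a plain table, published as usual.
CREATE TABLE t (id int PRIMARY KEY, val text);
CREATE PUBLICATION pub FOR TABLE t;

-- Subscriber: a table with the same name, but partitioned.
CREATE TABLE t (id int, val text) PARTITION BY RANGE (id);
CREATE TABLE t_1 PARTITION OF t FOR VALUES FROM (0) TO (1000000);

CREATE SUBSCRIPTION sub
  CONNECTION 'host=publisher dbname=postgres'
  PUBLICATION pub;
-- The replication worker then reports the error under discussion when it
-- tries to apply rows to t, since t is not a plain (physically-defined)
-- table.
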
--
Michael




Re: Making WAL receiver startup rely on GUC context for primary_conninfo and primary_slot_name

2019-01-10 Thread Sergei Kornilov
Hello

> I was just double-checking the code, and it is possible to remove the
> part from RequestXLogStreaming() which sets slotname and conninfo to
> '\0', so I removed that part from my local branch to simplify the
> code.

Without both ready_to_display and the cleanup in RequestXLogStreaming, are
we sure we do not show outdated PrimaryConnInfo when the walreceiver has
just started but has not connected yet? While waiting in walrcv_connect,
for example.

regards, Sergei



Policy on cross-posting to multiple lists

2019-01-10 Thread Dean Rasheed
Has the policy on cross-posting to multiple lists been hardened recently?

The "Crash on ALTER TABLE" thread [1] started on -bugs, but Andrew's
message on 8 Jan with an initial proposed patch and my response later
that day both CC'ed -hackers and seem to have been rejected, and so
are missing from the archives.

In that case, it's not a big deal because subsequent replies included
the text from the missing messages, so it's still possible to follow
the discussion, but I wanted to check whether this was an intentional
change of policy. If so, it seems a bit harsh to flat-out reject these
messages. My prior understanding was that cross-posting, while
generally discouraged, does still sometimes have value.

[1] 
https://www.postgresql.org/message-id/flat/CAEZATCVqksrnXybSaogWOzmVjE3O-NqMSpoHDuDw9_7mhNpeLQ%40mail.gmail.com#2c25e9a783d4685912dcef8b3f3edd63

Regards,
Dean



Re: Query with high planning time at version 11.1 compared versions 10.5 and 11.0

2019-01-10 Thread Ashutosh Bapat
On Thu, Jan 10, 2019 at 7:12 AM Amit Langote 
wrote:

> Fujita-san,
>
> On 2019/01/09 20:20, Etsuro Fujita wrote:
> > (2019/01/09 9:30), Amit Langote wrote:
> >> So, while the patch at [1] can take care of this issue as I also
> mentioned
> >> upthread, I was trying to come up with a solution that can be
> back-patched
> >> to PG 11.  The patch I posted above is one such solution and as Ashutosh
> >> points out it's perhaps not the best, because it can result in
> potentially
> >> creating many copies of the same child EC member if we do it in
> joinrel.c,
> >> as the patch proposes.  I will try to respond to the concerns he raised
> in
> >> the next week if possible.
> >
> > Thanks for working on this!
> >
> > I like your patch in general.  I think one way to address Ashutosh's
> > concerns would be to use the consider_partitionwise_join flag:
> originally,
> > that was introduced for partitioned relations to show that they can be
> > partitionwise-joined, but I think that flag could also be used for
> > non-partitioned relations to show that they have been set up properly for
> > partitionwise-joins, and I think by checking that flag we could avoid
> > creating those copies for child dummy rels in try_partitionwise_join.
>
> Ah, that's an interesting idea.
>
> If I understand the original design of it correctly,
> consider_partitionwise_join being true for a given relation (simple or
> join) means that its RelOptInfo contains properties to consider it to be
> joined with another relation (simple or join) using partitionwise join
> mechanism.  Partitionwise join will occur between the pair if the other
> relation also has relevant properties (hence its
> consider_partitionwise_join set to true) and properties on the two sides
> match.
>
>
Though this will solve the performance problem when partition-wise join is
not possible, we still have the same problem when partition-wise join is
possible. And that problem really happens because our inheritance mechanism
requires expression translation from parent to child everywhere. That
consumes memory, eats CPU cycles, and generally degrades the performance of
partition-related query planning. I think a better way would be to avoid
these translations and use the parent's Var to represent a Var of the child
being dealt with. That would be a massive churn on the inheritance-based
planner code, but it would improve planning time for queries involving
thousands of partitions.

--
Best Wishes,
Ashutosh Bapat


Re: BUG #15446: Crash on ALTER TABLE

2019-01-10 Thread Dean Rasheed
On Wed, 9 Jan 2019 at 23:24, Andrew Dunstan
 wrote:
> Here's another attempt. For the rewrite case it kept the logic of the
> previous patch to clear all the missing attributes, but if we're not
> rewriting we reconstruct the missing value according to the new type
> settings.
>

Looks good to me.

Regards,
Dean



Re: speeding up planning with partitions

2019-01-10 Thread David Rowley
On Thu, 10 Jan 2019 at 21:41, Amit Langote
 wrote:
> In the v13 that I will try to post tomorrow, I will have hopefully
> addressed David's and Imai-san's review comments.  Thank you both!

I'd been looking at v11's 0002 and started on 0003 too. I'll include
my notes so far if you're about to send a v13.


v11 0002

18. There's a missing case in the following code. I understand that
makeNode() will 0 the member here, but that does not stop the various
other initialisations that set the default value for the field.  Below
there's a missing case where parent != NULL && parent->rtekind !=
RTE_RELATION. You might be better just always zeroing the field below
"rel->partitioned_child_rels = NIL;"

+
+ /*
+ * For inheritance child relations, we also set inh_root_parent.
+ * Note that 'parent' might itself be a child (a sub-partitioned
+ * partition), in which case we simply use its value of
+ * inh_root_parent.
+ */
+ if (parent->rtekind == RTE_RELATION)
+ rel->inh_root_parent = parent->inh_root_parent > 0 ?
+ parent->inh_root_parent :
+ parent->relid;
  }
  else
+ {
  rel->top_parent_relids = NULL;
+ rel->inh_root_parent = 0;
+ }

19. Seems strange to have a patch that adds a new field that is
unused. I also don't quite understand yet why top_parent_relids can't
be used instead. I think I recall being confused about that before, so
maybe it's worth writing a comment to mention why it cannot be used.

v11 0003

20. This code looks wrong:

+ /*
+ * expand_inherited_tables may have proved that the relation is empty, so
+ * check if it's so.
+ */
+ else if (rte->inh && !IS_DUMMY_REL(rel))


Likely you'll want:

else if (rte->inh)
{
if (IS_DUMMY_REL(rel))
return;
// other stuff
}

otherwise, you'll end up in the else clause when you shouldn't be.

21. is -> was

+ * The child rel's RelOptInfo is created during
+ * expand_inherited_tables().
  */
  childrel = find_base_rel(root, childRTindex);

since you're talking about something that already happened.

I'll continue looking at v12.

-- 
 David Rowley   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pgbench - allow to store select results into variables

2019-01-10 Thread Fabien COELHO


Hello Alvaro,


> Here are my proposed final changes.


Thanks again for improving the patch!

> I noticed that you were allocating the prefixes for all cases even when
> there were no cset/gset in the command; I changed it to delay the
> allocation until needed.


Ok, why not.


> I also noticed you were skipping the Meta enum dance for no good reason;


Indeed. I think that the initial version of the patch was before the 
"dance" was added, and it did not keep up with it.


> added that makes conditionals simpler.  The read_response routine seemed
> misplaced and I gave it a name in a style closer to the rest of the
> pgbench code.


Fine.

> Also, you were missing to free the ->lines pqexpbuffer when the command
> is discarded.  I grant that the free_command() stuff might be bogus
> since it's only tested with a command that's barely initialized, but it
> seems better to make it bogus in this way (fixable if we ever extend its
> use) than to forever leak memory silently.


Ok.

However, I switched "pg_free" to "termPQExpBuffer", which seems more 
appropriate, even if it just does the same thing. I also ensured that 
prefixes are allocated & freed. I've commented about expr which is not 
freed.



> I didn't test this beyond running "make check".


That's a good start.

I'm not keen on having the command array size checked and updated *after*
the command is appended, even if the initial allocation ensures that there
is no overflow, but I left it as is.


Further tests did not yield any new issue.

Attached is a v29 with the above minor changes relative to your version.
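
For reference, an end-to-end sketch of how the feature is meant to be used
(the script name, values, and the final SELECT are mine, not the patch's):

-- hypothetical gset_demo.sql
\set aid random(1, 100000)
\set delta random(-5000, 5000)
UPDATE pgbench_accounts
  SET abalance = abalance + :delta
  WHERE aid = :aid
  RETURNING abalance \gset
-- the fetched value is now available as :abalance in later commands
SELECT :aid, :abalance;

Run against a database initialized with pgbench -i, for example with
pgbench -n -f gset_demo.sql -t 100.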

--
Fabien.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index b5e3a62a33..cc369c423e 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -954,6 +954,91 @@ pgbench  options  d
   
 
   
+   
+
+ \cset [prefix]
+
+
+
+ 
+  This command may be used to end SQL queries, replacing an embedded
+  semicolon (\;) within a compound SQL command.
+ 
+
+ 
+  When this command is used, the preceding SQL query is expected to
+  return one row, the columns of which are stored into variables named after
+  column names, and prefixed with prefix if provided.
+ 
+
+ 
+  The following example sends four queries as one compound SQL command,
+  inducing one message sent at the protocol level.
+  The result of the first query is stored into variable one,
+  the results of the third query are stored into variables z_three
+  and z_four,
+  whereas the results of the other queries are discarded.
+
+-- compound of four queries
+SELECT 1 AS one \cset
+SELECT 2 AS two \;
+SELECT 3 AS three, 4 AS four \cset z_
+SELECT 5;
+
+ 
+
+ 
+  
+\cset does not work when empty SQL queries appear
+within a compound SQL command.
+  
+ 
+
+   
+
+   
+
+ \gset [prefix]
+
+
+
+ 
+  This command may be used to end SQL queries, replacing a final semicolon
+  (;). 
+ 
+
+ 
+  When this command is used, the preceding SQL query is expected to
+  return one row, the columns of which are stored into variables named after
+  column names, and prefixed with prefix if provided.
+ 
+
+ 
+  The following example puts the final account balance from the first query
+  into variable abalance, and fills variables
+  p_two and p_three 
+  with integers from the last query.
+  The result of the second query is discarded.
+
+UPDATE pgbench_accounts
+  SET abalance = abalance + :delta
+  WHERE aid = :aid
+  RETURNING abalance \gset
+-- compound of two queries
+SELECT 1 \;
+SELECT 2 AS two, 3 AS three \gset p_
+
+ 
+
+ 
+  
+\gset does not work when empty SQL queries appear
+within a compound SQL command.
+  
+ 
+
+   
+

 \if expression
 \elif expression
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index 649297ae4f..8bac939ff8 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -482,6 +482,8 @@ typedef enum MetaCommand
 	META_SETSHELL,/* \setshell */
 	META_SHELL,	/* \shell */
 	META_SLEEP,	/* \sleep */
+	META_CSET,	/* \cset */
+	META_GSET,	/* \gset */
 	META_IF,	/* \if */
 	META_ELIF,	/* \elif */
 	META_ELSE,	/* \else */
@@ -499,16 +501,39 @@ typedef enum QueryMode
 static QueryMode querymode = QUERY_SIMPLE;
 static const char *QUERYMODE[] = {"simple", "extended", "prepared"};
 
-typedef struct
+/*
+ * struct Command represents one command in a script.
+ *
+ * lines		The raw, possibly multi-line command text.  Variable substitution
+ *not applied.
+ * first_line	A short, single-line extract of 'lines', for error reporting.
+ * type			SQL_COMMAND or META_COMMAND
+ * meta			The type of meta-command, or META_NONE if command is SQL
+ * argc			Number of arguments of the command, 0 if not yet processed.
+ * argv			Command arguments, the first of which 

Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-10 Thread Amit Langote
Hi,

On 2019/01/10 17:46, Arkhena wrote:
>> Your rewritten version is perhaps fine, although I remain a bit concerned
>> that some users might be puzzled when they see this error, that is, if
>> they interpret the message as "it's impossible to use a partitioned table
>> as logical replication target".
>>
>>
> From [documentation](
> https://www.postgresql.org/docs/current/logical-replication-restrictions.html
> ) :
>> Attempts to replicate tables other than base tables will result in an
> error.
> 
> That's basicaly what I had understood about logicial replication...

Ah, if the documentation contains such a description then maybe it's fine.

The reason I started this thread is due to this Stack Overflow question:

https://stackoverflow.com/questions/53554727/logical-replication-and-declarative-partitioning-in-postgresql-11

So, it appears that there may be an element of surprise involved in
encountering such an error (despite the documentation).

Thanks,
Amit




Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-10 Thread Arkhena
> > On Tue, Jan 08, 2019 at 04:42:35PM +0900, Michael Paquier wrote:
> >> I can see your point, though I would stick with
> >> ERRCODE_WRONG_OBJECT_TYPE for consistency with the existing code and
> >> because the code is intended to not work on anything else than plain
> >> tables, at least now.
> >
> > Attached are my suggestions shaped as a patch.  Thoughts?
>
> Thanks for updating the patch, and sorry I couldn't reply sooner.
>
> Your rewritten version is perhaps fine, although I remain a bit concerned
> that some users might be puzzled when they see this error, that is, if
> they interpret the message as "it's impossible to use a partitioned table
> as logical replication target".
>
>
From [documentation](
https://www.postgresql.org/docs/current/logical-replication-restrictions.html
) :
> Attempts to replicate tables other than base tables will result in an
error.

That's basically what I had understood about logical replication...

Cheers,

Lætitia
-- 
Adopt the eco-attitude
Print this mail only if really necessary


Re: [Sender Address Forgery]Re: error message when subscription target is a partitioned table

2019-01-10 Thread Amit Langote
On 2019/01/10 14:25, Michael Paquier wrote:
> On Tue, Jan 08, 2019 at 04:42:35PM +0900, Michael Paquier wrote:
>> I can see your point, though I would stick with
>> ERRCODE_WRONG_OBJECT_TYPE for consistency with the existing code and
>> because the code is intended to not work on anything else than plain
>> tables, at least now.
> 
> Attached are my suggestions shaped as a patch.  Thoughts?

Thanks for updating the patch, and sorry I couldn't reply sooner.

Your rewritten version is perhaps fine, although I remain a bit concerned
that some users might be puzzled when they see this error, that is, if
they interpret the message as "it's impossible to use a partitioned table
as logical replication target".

Thanks,
Amit




Re: GSoC 2019

2019-01-10 Thread Andrey Borodin
Hi!

> On Jan 8, 2019, at 22:57, Alexander Korotkov wrote:
> 
> On Tue, Jan 8, 2019 at 1:06 AM Stephen Frost  wrote:
>> All the entries are marked with '2018' to indicate they were pulled from
>> last year.  If the project from last year is still relevant, please
>> update it to be '2019' and make sure to update all of the information
>> (in particular, make sure to list yourself as a mentor and remove the
>> other mentors, as appropriate).
> 
> I can confirm that I'm ready to mentor projects, where I'm listed as
> potential mentor.

I've updated the year for the GiST API and amcheck projects and removed the
mentors (except Alexander). Please put your names back if you still wish to
mentor these projects.

Also, we are planning to add a new WAL-G project; Vladimir Leskov is now
combining multiple WAL-G tasks into a single project. Vladimir did the 2018
WAL-G project during a Yandex internship, so I'll remove that project from
the page.

Best regards, Andrey Borodin.