Re: using expression syntax for partition bounds

2019-01-25 Thread Michael Paquier
On Sat, Jan 26, 2019 at 12:14:33PM +0900, Amit Langote wrote:
> The describe lines are there just to show that the stored expressions are
> not verbatim the same as the input expressions, so it seemed overkill to add
> them for all of the partitions.

I see, so per 7c079d7 that is why showing part_3 matters: you want to show
the result of the expression after executing the DDL, and this has just
been added:
+CREATE TABLE part_3 PARTITION OF list_parted FOR VALUES IN ((2+1));

I think it would be a good thing to show at least the NULL partition,
because its partition definition has different semantics from the other
three, so it could replace part_1.  What do you think?
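
(For reference, a sketch of the definition being discussed, mirroring the
test's part_null -- the NULL bound is a keyword rather than an expression,
which is why its stored form is worth showing too:)

    CREATE TABLE part_null PARTITION OF list_parted FOR VALUES IN (null);
    \d part_null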
--
Michael




Re: proposal: new polymorphic types - commontype and commontypearray

2019-01-25 Thread Pavel Stehule
On Sat, Jan 26, 2019 at 1:20 AM Tom Lane wrote:

> Pavel Stehule  writes:
> > Four years ago I proposed implicit casting to common type of arguments
> with
> > anyelement type.
> >
> https://www.postgresql.org/message-id/CAFj8pRCZVo_xoW0cfxt%3DmmgjXKBgr3Gm1VMGL_zx9wDRHmm6Cw%40mail.gmail.com
> > My proposal was rejected, because it introduced compatibility issues.
>
> Yup.
>
> > Now I have a solution that doesn't break anything. With two new
> polymorphic
> > types: commontype and commontypearray we can write functions like
> coalesce,
> > greatest, ..
>
> I think this is a good idea at its core, but I don't like the specifics
> too much.
>
> I agree with the basic idea of introducing a second, independent set of
> polymorphic-type variables.  Way back when we first discussed polymorphic
> types, we thought maybe we should invent anyelement2 and anyarray2, and
> perhaps even more pairs, to allow polymorphic functions to deal with two
> or more base types.  We didn't do that for lack of convincing examples of
> the need for it, but I expected some would emerge; I'm rather astonished
> that we've gotten by for so many years without adding a second set.
> So where I think we should go with this is to solve that need while
> we're at it.
>

I still expect we could have more polymorphic types like any***(N).

There are two important questions:

1. Does it make sense to have more distinct polymorphic types with the same
behavior?  I see a benefit in this possibility - we could introduce
anyelement2 (AN2, alongside anyelement, AN), and then we could have a function

  fx(AN, AN, AN2, AN2) .. which means P2 must have the same type as P1, and
P4 the same type as P3.

2. What strategy should be used to choose the real type for a polymorphic
type?  Now only equivalence is supported, but I can see more possibilities:

  * common type - what I implemented
  * first one wins - often used in Oracle

The common type strategy is the more important one, because it is typical
for some "pseudo" functions like coalesce, least, greatest, .. in Postgres -
and extension developers could design functions more compatible with core
functionality.

First-one-wins is interesting for me (as the Orafce creator and maintainer).
It could increase the similarity of the functions implemented there, and
reduce the work when queries are ported to Postgres. But that matters to a
smaller group of PostgreSQL users.
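
To illustrate the common-type strategy with behavior that already exists
today (commontype itself is only proposed, not implemented):

    -- The built-in variadic pseudo-functions already resolve their
    -- arguments to a common type:
    SELECT pg_typeof(coalesce(1, 2.5));   -- numeric
    SELECT pg_typeof(greatest(1, 2.5));   -- numeric

    -- An anyelement-based function cannot do that; this call fails to
    -- match because integer and numeric are not the same type:
    CREATE FUNCTION my_greatest(anyelement, anyelement) RETURNS anyelement
        AS 'SELECT CASE WHEN $1 >= $2 THEN $1 ELSE $2 END' LANGUAGE sql;
    SELECT my_greatest(1, 2.5);           -- ERROR: no matching function

    -- Under the proposal, declaring the arguments as commontype would
    -- resolve both to numeric, the way COALESCE does.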


>
> However, this proposal doesn't do so, because it omits "commonrange".
> I'm prepared to believe that we don't need "commonenum"; that would
> presumably have the semantics of "resolve the common type and then
> it must be an enum".  And that seems pretty useless, because there
> are no type resolution rules that would let us choose one enum out of
> a set.  (I suppose somebody might create implicit casts between some
> enum types, but it doesn't seem very likely.)  I also suspect that
> we could get away without "commonnonarray".  Anynonarray is really
> just a hack that we invented to avoid ambiguity around the ||
> operator, and an equivalent need would likely not come up for this
> second set of types.  (I could be wrong though; I'm not sure right
> now whether array_agg's use of anynonarray rather than anyelement
> is essential or just randomness.)  But neither of those arguments
> apply to commonrange; in fact it's highly likely that somebody would
> want to have "myfunc(commontype, commontype) returns commonrange"
> as a customized range constructor that can deal with slightly
> different input types.
>

I implemented just a minimal set of new polymorphic types, only to
demonstrate my idea. Better coverage of the other variants (where it makes
sense) is not a problem. Right now I am mostly searching for a design we can
reach some agreement on.


>
> My second problem with this proposal is that it simply ignores
> the naming precedent of the existing polymorphic types.  We have
> a convention that polymorphic types are named "any-something",
> and I do not think we should just toss that overboard.  Moreover,
> if we do end up needing "commonnonarray" or "commonenum", those
> names are ugly, typo-prone, and unreasonably long.
>
> We could do worse than to call these types anyelement2, anyarray2,
> anyrange2 and just document that their resolution rule depends
> on finding a common type rather than identical base types.
> I suppose that's not too pretty --- it reminds one of Oracle finally
> getting varchar semantics right with varchar2 :-(.  Another idea
> is anyelementc, anyarrayc, anyrangec ("c" for "common") but that's
> not pretty either.  Anyway I think the names need to be any-something.
>

I am open to any ideas. I don't like anyelement2, anyarray2 because

a) it is not descriptive - genuinely different behavior should not be
signaled by just a number
b) I can very well imagine more anyelementX types.

I don't think length is too important a factor (though I fully agree -
shorter is better here). The polymorphic types are not that commonly used.

I thought about your proposed anyelementc, but the "c" is not very visible.
Could we use snake notation?


Re: PostgreSQL vs SQL/XML Standards

2019-01-25 Thread Chapman Flack
On 01/25/19 22:53, Pavel Stehule wrote:

> The documentation review will be harder - I am not a native speaker and I
> do not have the necessary knowledge of XQuery (probably only you have this
> knowledge).

The doc patch is substantial enough that I think it would be ideal to somehow
attract a native speaker with an interest in documentation to review it.
Even if that is someone without much XQuery-specific knowledge—I'll happily
answer whatever questions they ask about that. :)

-Chap



Re: PostgreSQL vs SQL/XML Standards

2019-01-25 Thread Chapman Flack
On 01/25/19 19:37, Chapman Flack wrote:
> There is still nothing in this patch set to address [1], though that
> also seems worth doing, perhaps in another patch, and probably not
> difficult, perhaps needing only a regex.

Heck, it could be even simpler than that. If some XML input has a DTD,
the attempt to parse it as 'content' with libxml is sure to fail early
in the parse (because that's where the DTD is found). If libxml reports
enough error detail to show it failed because of a DTD—and it appears to:

DETAIL:  line 1: StartTag: invalid element name


—then simply recognize that error and reparse as 'document' on the spot.
The early-failing first parse won't have cost much, and there is probably
next to nothing to gain by trying to be any more clever.

The one complication might be that there seem to be versions of libxml that
report error detail differently (hence the variant regression test expected
files), so the code might have to recognize a couple of different forms.
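
For reference, a minimal SQL sketch of the failure mode under discussion
(illustrative input; the exact error wording differs across libxml versions,
as noted above):

    SELECT xmlparse(content  '<!DOCTYPE greeting SYSTEM "x.dtd"><greeting/>');
    -- fails early: the 'content' parse chokes on the DTD

    SELECT xmlparse(document '<!DOCTYPE greeting SYSTEM "x.dtd"><greeting/>');
    -- succeeds: a DTD is legal at the start of a document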

-Chap

> [1] https://www.postgresql.org/message-id/5BD1C44B.6040300%40anastigmatix.net




Re: PostgreSQL vs SQL/XML Standards

2019-01-25 Thread Pavel Stehule
Hi

On Sat, Jan 26, 2019 at 4:25 AM Chapman Flack wrote:

> The following review has been posted through the commitfest application:
> make installcheck-world:  tested, passed
> Implements feature:   tested, passed
> Spec compliant:   not tested
> Documentation:not tested
>

Maybe it would be simpler to work with the original commitfest submissions. I
can do a review of xmltable-xmlexists-passing-mechanisms-2.patch.
The documentation review will be harder - I am not a native speaker and I do
not have the necessary knowledge of XQuery (probably only you have this
knowledge).


>
> As the reporter of the issues raised in this email thread, I've reviewed
> the first patch
> and contributed the second and third.
>
> WHAT THE PATCHES DO:
>
> xmltable-xpath-result-processing-bugfix-5.patch contains code changes
> correcting
> a subset of the issues that were raised in this email thread.
>
> xmltable-xmlexists-passing-mechanisms-2.patch adjusts the grammar to allow
> the XML
> parameter passing mechanism BY VALUE as well as BY REF. Both are ignored,
> but
> formerly BY VALUE was a syntax error, which was unintuitive considering
> that BY VALUE
> is the passing mechanism PostgreSQL implements (XML node identities are
> not preserved).
>
> xml-functions-type-docfix-1.patch updates the documentation to reflect
> the changes in this patch set and the limitations identified in this thread.
>
> WHAT I HAVE REVIEWED:
>
> I have applied all three patches over 18c0da8 and confirmed that
> installcheck-world passes
> and that the code changes resolve the issues they set out to resolve.
>
> I've made no entry for "spec compliant" because the question is moot; the
> spec is written
> in terms of the XQuery language, types, and concepts, and these facilities
> in PG are
> implemented on XPath 1.0, which doesn't have those. But the changes in
> this patch set
> do make the PG behaviors more, well, closely analogous to the way the spec
> compliant
> functions would behave.
>
> WHAT I HAVE NOT REVIEWED:
>
> The passing-mechanisms and docfix patches are my own work, so there should
> be another
> reviewer who is not me. I've looked closely at the technical, SQL/XML
> behavior aspects already,
> but a reviewer with an eye for documentation would be welcome.
>
> I'll venture my opinion that this is ready-for-committer to the extent of
> my own review, but will
> leave the status at needs-review for a not-me reviewer to update.


Re: [PATCH] Allow UNLISTEN during recovery

2019-01-25 Thread Shay Rojansky
>
> > Thanks for insisting - I ended up setting up the environment and running
> > the tests, and discovering that some test-related changes were missing.
> > Here's a 3rd version of the patch. Hope this is now in good shape, let me
> > know if you think anything else needs to be done.
>
> Lotta work for a one-line code change, eh?  Pushed now.
>

Yes, especially when you're new to building and testing PostgreSQL :)
Thanks for accepting!


Re: PostgreSQL vs SQL/XML Standards

2019-01-25 Thread Chapman Flack
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:   tested, passed
Spec compliant:   not tested
Documentation:not tested

As the reporter of the issues raised in this email thread, I've reviewed the 
first patch
and contributed the second and third.

WHAT THE PATCHES DO:

xmltable-xpath-result-processing-bugfix-5.patch contains code changes correcting
a subset of the issues that were raised in this email thread.

xmltable-xmlexists-passing-mechanisms-2.patch adjusts the grammar to allow the 
XML
parameter passing mechanism BY VALUE as well as BY REF. Both are ignored, but
formerly BY VALUE was a syntax error, which was unintuitive considering that BY 
VALUE
is the passing mechanism PostgreSQL implements (XML node identities are not 
preserved).

xml-functions-type-docfix-1.patch updates the documentation to reflect the
changes in this patch set and the limitations identified in this thread.

WHAT I HAVE REVIEWED:

I have applied all three patches over 18c0da8 and confirmed that 
installcheck-world passes
and that the code changes resolve the issues they set out to resolve.

I've made no entry for "spec compliant" because the question is moot; the spec 
is written
in terms of the XQuery language, types, and concepts, and these facilities in 
PG are
implemented on XPath 1.0, which doesn't have those. But the changes in this 
patch set
do make the PG behaviors more, well, closely analogous to the way the spec 
compliant
functions would behave.

WHAT I HAVE NOT REVIEWED:

The passing-mechanisms and docfix patches are my own work, so there should be 
another
reviewer who is not me. I've looked closely at the technical, SQL/XML behavior 
aspects already,
but a reviewer with an eye for documentation would be welcome.

I'll venture my opinion that this is ready-for-committer to the extent of my 
own review, but will
leave the status at needs-review for a not-me reviewer to update.

Re: pg_basebackup, walreceiver and wal_sender_timeout

2019-01-25 Thread Michael Paquier
On Fri, Jan 25, 2019 at 03:26:38PM +0100, Nick B wrote:
> On server we see this error firing: "terminating walsender process due to
> replication timeout"
> The problem occurs during a network or file system acting very slow. One
> example of such case looks like this (strace output for fsync calls):
> 
> 0.033383 fsync(8)  = 0 <20.265275>
> 20.265399 fsync(8) = 0 <0.11>
> 0.022892 fsync(7)  = 0 <48.350566>
> 48.350654 fsync(7) = 0 <0.05>
> 0.000674 fsync(8)  = 0 <0.851536>
> 0.851619 fsync(8)  = 0 <0.07>
> 0.67 fsync(7)  = 0 <0.06>
> 0.45 fsync(7)  = 0 <0.05>
> 0.031733 fsync(8)  = 0 <0.826957>
> 0.827869 fsync(8)  = 0 <0.16>
> 0.005344 fsync(7)  = 0 <1.437103>
> 1.446450 fsync(6)  = 0 <0.063148>
> 0.063246 fsync(6)  = 0 <0.06>
> 0.000381 +++ exited with 1 +++

These are a bit irregular.  Which files are taking that long to
complete while others are way faster?  It may be something that we
could improve on the base backup side, as there is no actual point in
syncing segments while the backup is running, and we could delay that
until the end of the backup (if I recall that stuff correctly).

> This raises a question: why is the GUC handled the way it is?  What would
> be the correct way to solve this?  Shall we change the behaviour, or do a
> better job of explaining the implications of wal_sender_timeout in the
> docs?

The following commit and thread are the ones you are looking for here:
https://www.postgresql.org/message-id/506972b9.6060...@vmware.com

commit: 6f60fdd7015b032bf49273c99f80913d57eac284
committer: Heikki Linnakangas 
date: Thu, 11 Oct 2012 17:48:08 +0300
Improve replication connection timeouts.

Rename replication_timeout to wal_sender_timeout, and add a new setting
called wal_receiver_timeout that does the same at the walreceiver side.
There was previously no timeout in walreceiver, so if the network went down,
for example, the walreceiver could take a long time to notice that the
connection was lost. Now with the two settings, both sides of a replication
connection will detect a broken connection similarly.

It is no longer necessary to manually set wal_receiver_status_interval
to a value smaller than the timeout. Both wal sender and receiver now 
automatically send a "ping" message if more than 1/2 of the configured
timeout has elapsed, and it hasn't received any messages from the
other end.

The docs could be improved to describe that better..
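
For reference, a minimal configuration sketch of the two settings that commit
describes (values are illustrative; 60s is the shipped default for both):

    # on the sending server (primary)
    wal_sender_timeout = 60s

    # on the standby / walreceiver side
    wal_receiver_timeout = 60s
    # wal_receiver_status_interval no longer needs to be tuned below the
    # timeout; keepalives go out automatically at half the timeout.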
--
Michael




Re: using expression syntax for partition bounds

2019-01-25 Thread Amit Langote
 Hi,

On Sat, Jan 26, 2019 at 12:01 Michael Paquier  wrote:

> On Sat, Jan 26, 2019 at 03:14:51AM +0900, Amit Langote wrote:
> > How about replacing \d+ list_parted with couple of \d on individual
> > partitions, like in the attached?
>
> That would make it.  Why just part_1 and part_3 though?  It looks more
> complete to add part_null and part_2 as well.


The describe lines are there just to show that the stored expressions are
not verbatim the same as the input expressions, so it seemed overkill to add
them for all of the partitions.

Thanks,
Amit

PS: away from a computer, typing on the smartphone

>


Re: crosstab/repivot...any interest?

2019-01-25 Thread Morris de Oryx
Hello, I'm not a C coder and can't help, but I *love* cross-tab/pivot
tables. They're the best, and just fantastic for preparing data to feed
into various analysis tools. The tablefunc module is helpful, but a bit
awkward to use (as I remember it).

From a user's point of view, a high-performance cross-tab generator would
be just fantastic.

As I understand it, this is what's involved in a pivot:

1. Identify rows that should be *grouped* (consolidated.)
2. Distinguish the value that identifies each *derived* column.
3. Distinguish the value that *identifies* each row-column value.
4. *Collapse* the rows, *build* the columns, and *populate* the 'cells'
with data.

In an ideal world, you would be able to perform different grouping
operations. Such as count, sum, avg, etc.
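
(As a concrete sketch of the SQL-level approach, using conditional aggregates
on a hypothetical table raw(a1, a2, k, c) -- this is the style that tends to
get slow on large data sets, which is why a C-level helper is attractive:)

    SELECT a1, a2,
           sum(c) FILTER (WHERE k = 'k1') AS k1,
           sum(c) FILTER (WHERE k = 'k2') AS k2,
           sum(c) FILTER (WHERE k = 'k3') AS k3
    FROM raw
    GROUP BY a1, a2;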

If there's a way to do this in a system-wide and standards-pointing way, so
much the better.

*Apologies* if I'm violating list etiquette by jumping in here. I've been
lurking on several Postgres lists for a bit and picking up interesting
details every day. If I've been Unintentionally and Cluelessly Off, I'm
fine with being told.


On Sat, Jan 26, 2019 at 10:49 AM David Fetter  wrote:

> On Fri, Jan 25, 2019 at 04:31:00PM -0600, Merlin Moncure wrote:
> > On Fri, Jan 25, 2019 at 3:16 PM David Fetter  wrote:
> > >
> > > On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote:
> > > > Hackers,
> > > >
> > > > We have a strong need to make a variant to the crosstab interface so
> > > > that data that is pivoted one way would be sent through a crosstab
> > > > like function so that it would be pivoted another way.  For example,
> > > > if you had
> > > >
> > > > row 0: a1, a2, a3, k1, c1, c2, ...
> > > > row 1: a1, a2, a3, k2, c1, c2, ...
> > > > row 2: a1, a2, a3, k3, c1, c2, ...
> > > > ...
> > > >
> > > > where 'a' columns are uninteresting attribute columns, 'k' is the
> > > > dimension we want to pivot on, and c1->cN would be stacked
> vertically,
> > > > so that we'd end up with,
> > > > row 0: a1, a2, a3, c1, k1, k2, ...
> > > > row 1: a1, a2, a3, c2, k1, k2, ...
> > > > row 2: a1, a2, a3, c3, k1, k2, ...
> > > >
> > > > There are various SQL level approaches to this but they tend to be
> > > > imperformant with large datasets so that I think a crosstab-like C
> > > > implementation ought to be able to do better (or at least I hope so)
> > > > since you have to cross product rows and columns in such a way that
> > > > you can get a clean join.  Cribbing from tablefunc.c I don't think
> > > > this is a terrible challenge to do in hash table style.
> > > >
> > > > Questions on the table:
> > > > *) Has anyone done anything like this or know of any current
> implementations?
> > > > *) Would there be any interest in expanding tablefunc along these
> lines?
> > >
> > > There's something in SQL:2016 that I read as crosstabs, or at least as
> > > enabling crosstabs.
> > > https://www.iso.org/standard/69776.html
> > >
> > > If we're going to put work into crosstabs, it seems to me that the
> > > "we" needs to be the project as a whole, and the work should be, to
> > > the extent reasonable, toward standard compliance.
> >
> > Interesting.  Do you see that the spec (it makes my brain hurt) can
> > handle that kind of repivoting?
>
> I believe the constructs can nest and/or refer to each other, so yes.
>
> Best,
> David.
> --
> David Fetter  http://fetter.org/
> Phone: +1 415 235 3778
>
> Remember to vote!
> Consider donating to Postgres: http://www.postgresql.org/about/donate
>
>


Re: Simplify set of flags used by MyXactFlags

2019-01-25 Thread Michael Paquier
On Fri, Jan 25, 2019 at 12:30:39PM -0300, Alvaro Herrera wrote:
> I am, except that the change of "table" to "object" in xact.c line 2262
> makes the new paragraph read a bit weird -- it's now saying "if we've
> added a temp object ...   Other objects have the same problem".  Doesn't
> make sense.  If you keep that word as "table", it works fine.

Indeed.  Committed with this suggestion.  Thanks for the review.
--
Michael




Re: using expression syntax for partition bounds

2019-01-25 Thread Michael Paquier
On Sat, Jan 26, 2019 at 03:14:51AM +0900, Amit Langote wrote:
> How about replacing \d+ list_parted with couple of \d on individual
> partitions, like in the attached?

That would make it.  Why just part_1 and part_3 though?  It looks more
complete to add part_null and part_2 as well.
--
Michael




Re: [PATCH] Allow UNLISTEN during recovery

2019-01-25 Thread Tom Lane
Shay Rojansky  writes:
> Thanks for insisting - I ended up setting up the environment and running
> the tests, and discovering that some test-related changes were missing.
> Here's a 3rd version of the patch. Hope this is now in good shape, let me
> know if you think anything else needs to be done.

Lotta work for a one-line code change, eh?  Pushed now.

regards, tom lane



Re: jsonpath

2019-01-25 Thread Alexander Korotkov
On Wed, Jan 23, 2019 at 8:01 AM Alexander Korotkov
 wrote:
> Finally, I'm going to commit this if no objections.

BTW, I decided to postpone the commit for a few days.  Nikita and I are
still working on better error messages.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: "SELECT ... FROM DUAL" is not quite as silly as it appears

2019-01-25 Thread Tom Lane
David Rowley  writes:
> As far as I can see the patch is ready to go, but I'll defer to Mark,
> who's also listed on the reviewer list for this patch.

Mark, are you planning to do further review on this patch?
I'd like to move it along, since (IMO anyway) we need it in
before progress can be made on
https://commitfest.postgresql.org/21/1664/

regards, tom lane



Re: [HACKERS] Removing [Merge]Append nodes which contain a single subpath

2019-01-25 Thread Tom Lane
David Rowley  writes:
> On Thu, 3 Jan 2019 at 08:01, Tomas Vondra  
> wrote:
>> AFAICS the patch essentially does two things: (a) identifies Append
>> paths with a single member and (b) ensures the Vars are properly mapped
>> when the Append node is skipped when creating the plan.
>> I agree doing (a) in setrefs.c is too late, as mentioned earlier. But
>> perhaps we could do at least (b) to setrefs.c?

> The problem I found with doing var translations in setrefs.c was that
> the plan tree is traversed there breadth-first and in createplan.c we
> build the plan depth-first.  The problem I didn't manage to overcome
> with that is that when we replace a "ProxyPath" (actually an Append
> path marked as is_proxy=true) it only alters targetlists, etc. of nodes
> above that in the plan tree. The nodes below it and their targetlists
> don't care, as these don't fetch Vars from nodes above them.  If we do
> the Var translation in setrefs then we've not yet looked at the nodes
> that don't care and we've already done the nodes that do care. So the
> tree traversal is backwards.

I don't quite buy this argument, because you haven't given a reason
why it doesn't apply with equal force to setrefs.c's elimination of
no-op SubqueryScan nodes.  Yet that does work.

Stepping back for a minute: even if we get this to work, how often
is it going to do anything useful?  It seems like most of the other
development that's going on is trying to postpone pruning till later,
so I wonder how often we'll really know at the beginning of planning
that only one child will survive.

regards, tom lane



Re: proposal: new polymorphic types - commontype and commontypearray

2019-01-25 Thread Tom Lane
Pavel Stehule  writes:
> Four years ago I proposed implicit casting to common type of arguments with
> anyelement type.
> https://www.postgresql.org/message-id/CAFj8pRCZVo_xoW0cfxt%3DmmgjXKBgr3Gm1VMGL_zx9wDRHmm6Cw%40mail.gmail.com
> My proposal was rejected, because it introduced compatibility issues.

Yup.

> Now I have a solution that doesn't break anything. With two new polymorphic
> types: commontype and commontypearray we can write functions like coalesce,
> greatest, ..

I think this is a good idea at its core, but I don't like the specifics
too much.

I agree with the basic idea of introducing a second, independent set of
polymorphic-type variables.  Way back when we first discussed polymorphic
types, we thought maybe we should invent anyelement2 and anyarray2, and
perhaps even more pairs, to allow polymorphic functions to deal with two
or more base types.  We didn't do that for lack of convincing examples of
the need for it, but I expected some would emerge; I'm rather astonished
that we've gotten by for so many years without adding a second set.
So where I think we should go with this is to solve that need while
we're at it.

However, this proposal doesn't do so, because it omits "commonrange".
I'm prepared to believe that we don't need "commonenum"; that would
presumably have the semantics of "resolve the common type and then
it must be an enum".  And that seems pretty useless, because there
are no type resolution rules that would let us choose one enum out of
a set.  (I suppose somebody might create implicit casts between some
enum types, but it doesn't seem very likely.)  I also suspect that
we could get away without "commonnonarray".  Anynonarray is really
just a hack that we invented to avoid ambiguity around the ||
operator, and an equivalent need would likely not come up for this
second set of types.  (I could be wrong though; I'm not sure right
now whether array_agg's use of anynonarray rather than anyelement
is essential or just randomness.)  But neither of those arguments
apply to commonrange; in fact it's highly likely that somebody would
want to have "myfunc(commontype, commontype) returns commonrange"
as a customized range constructor that can deal with slightly
different input types.

My second problem with this proposal is that it simply ignores
the naming precedent of the existing polymorphic types.  We have
a convention that polymorphic types are named "any-something",
and I do not think we should just toss that overboard.  Moreover,
if we do end up needing "commonnonarray" or "commonenum", those
names are ugly, typo-prone, and unreasonably long.

We could do worse than to call these types anyelement2, anyarray2,
anyrange2 and just document that their resolution rule depends
on finding a common type rather than identical base types.
I suppose that's not too pretty --- it reminds one of Oracle finally
getting varchar semantics right with varchar2 :-(.  Another idea
is anyelementc, anyarrayc, anyrangec ("c" for "common") but that's
not pretty either.  Anyway I think the names need to be any-something.

I haven't particularly studied the patch code, but I will note that
this sort of change seems pretty dumb:

@@ -953,7 +953,7 @@ make_scalar_array_op(ParseState *pstate, List *opname,
 * enforce_generic_type_consistency may or may not have replaced a
 * polymorphic type with a real one.
 */
-   if (IsPolymorphicType(declared_arg_types[1]))
+   if (IsPolymorphicTypeAny(declared_arg_types[1]))
{
/* assume the actual array type is OK */
res_atypeId = atypeId;

Why would we want to reject the new poly types here?  Or just about
anyplace else that tests IsPolymorphicType?  The argument-type resolution
functions themselves need to distinguish the two groups of types,
at least for some purposes, but it's very hard to believe anyplace
else should do so.

regards, tom lane



Re: How does the planner determine plan_rows ?

2019-01-25 Thread Bruce Momjian
On Thu, Jan 10, 2019 at 11:41:51PM -0800, Donald Dong wrote:
> 
> > On Jan 10, 2019, at 8:01 PM, Tom Lane  wrote:
> > 
> > ... estimate_rel_size() in plancat.c is where to look to find out
> > about the planner's default estimates when it's lacking hard data.
> 
> Thank you! Now I see how the planner uses the rows to estimate the cost and
> generates the best_plan.
> 
> To me, tracing the function calls is not a simple task. I'm using cscope,
> and I use printf when I'm not entirely sure. I was considering using gdb,
> but I'm having issues referencing the source code in gdb.
> 
> I'm very interested to learn how the professionals explore the codebase!

Uh, the developer FAQ has some info on this:

https://wiki.postgresql.org/wiki/Developer_FAQ

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Strange wording in advanced tutorial

2019-01-25 Thread Bruce Momjian
On Thu, Jan 10, 2019 at 01:33:43PM +0100, Daniel Gustafsson wrote:
> The file header in the advanced tutorial has what seems like incorrect (or at
> least odd) wording: "Tutorial on advanced more PostgreSQL features”.  Attached
> patch changes to “more advanced” which I think is what was the intention.
> 
> I can willingly admit that I had never even noticed the tutorial directory
> until I yesterday stumbled across it.  The commit introducing the above 
> wording
> is by now old enough to drive.
> 

Agreed, thanks more.  ;-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Bison state table

2019-01-25 Thread Bruce Momjian
On Sat, Jan 26, 2019 at 12:49:50AM +0100, David Fetter wrote:
> On Fri, Jan 25, 2019 at 05:38:59PM -0500, Bruce Momjian wrote:
> > With our scanner keywords list now more cache-aware, and with us
> > planning to use Bison for years to come, has anyone ever looked at
> > reordering the bison state machine array to be more cache aware, e.g.,
> > having common states next to each other rather than scattered around the
> > array?
> 
> Do we have a pretty good idea of what commonly grouped states are, or
> at least a way to get some not-awful estimates of what they are?

Uh, I am afraid we would need to profile the grammar with some sample
queries and then base the reordering on that, kind of like how compilers use
sampling to do branch prediction.  Yeah, crazy idea, I know.  I figured
some smart person might have written a tool to do that.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Bison state table

2019-01-25 Thread David Fetter
On Fri, Jan 25, 2019 at 05:38:59PM -0500, Bruce Momjian wrote:
> With our scanner keywords list now more cache-aware, and with us
> planning to use Bison for years to come, has anyone ever looked at
> reordering the bison state machine array to be more cache aware, e.g.,
> having common states next to each other rather than scattered around the
> array?

Do we have a pretty good idea of what commonly grouped states are, or
at least a way to get some not-awful estimates of what they are?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: crosstab/repivot...any interest?

2019-01-25 Thread David Fetter
On Fri, Jan 25, 2019 at 04:31:00PM -0600, Merlin Moncure wrote:
> On Fri, Jan 25, 2019 at 3:16 PM David Fetter  wrote:
> >
> > On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote:
> > > Hackers,
> > >
> > > We have a strong need to make a variant to the crosstab interface so
> > > that data that is pivoted one way would be sent through a crosstab
> > > like function so that it would be pivoted another way.  For example,
> > > if you had
> > >
> > > row 0: a1, a2, a3, k1, c1, c2, ...
> > > row 1: a1, a2, a3, k2, c1, c2, ...
> > > row 2: a1, a2, a3, k3, c1, c2, ...
> > > ...
> > >
> > > where 'a' columns are uninteresting attribute columns, 'k' is the
> > > dimension we want to pivot on, and c1->cN would be stacked vertically,
> > > so that we'd end up with,
> > > row 0: a1, a2, a3, c1, k1, k2, ...
> > > row 1: a1, a2, a3, c2, k1, k2, ...
> > > row 2: a1, a2, a3, c3, k1, k2, ...
> > >
> > > There are various SQL level approaches to this but they tend to be
> > > imperformant with large datasets so that I think a crosstab-like C
> > > implementation ought to be able to do better (or at least I hope so)
> > > since you have to cross product rows and columns in such a way that
> > > you can get a clean join.  Cribbing from tablefunc.c I don't think
> > > this is a terrible challenge to do in hash table style.
> > >
> > > Questions on the table:
> > > *) Has anyone done anything like this or know of any current 
> > > implementations?
> > > *) Would there be any interest in expanding tablefunc along these lines?
> >
> > There's something in SQL:2016 that I read as crosstabs, or at least as
> > enabling crosstabs.
> > https://www.iso.org/standard/69776.html
> >
> > If we're going to put work into crosstabs, it seems to me that the
> > "we" needs to be the project as a whole, and the work should be, to
> > the extent reasonable, toward standard compliance.
> 
> Interesting.  Do you see that the spec (it makes my brain hurt) can
> handle that kind of repivoting?

I believe the constructs can nest and/or refer to each other, so yes.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: WIP: Avoid creation of the free space map for small tables

2019-01-25 Thread John Naylor
On Thu, Jan 24, 2019 at 9:50 PM Amit Kapila  wrote:
>
> On Fri, Jan 25, 2019 at 1:03 AM John Naylor  
> wrote:
> >
> > On Wed, Jan 23, 2019 at 11:17 PM Amit Kapila  
> > wrote:
> > > I think what doc means to say is
> > > that it copies any unlinked files present in primary's new cluster
> > > (which in your case will be data2).
> >
> > In that case, I'm still confused why that doc says, "Unfortunately,
> > rsync needlessly copies files associated with temporary and unlogged
> > tables because these files don't normally exist on standby servers."
> > I fail to see why the primary's new cluster would have these if they
> > weren't linked.
> >
>
> Why wouldn't unlogged files be in the primary's new cluster?  After the
> upgrade, they should be present in the new cluster if they were present
> in the old cluster.

I assume they would be linked, however (I haven't checked this). I did
think rewritten VM files would fall under this, but I was confused
about unlogged files.

> > And in the case we're discussing here, the skipped
> > FSMs won't be on data2, so won't end up in standby/data2.
> >
>
> Right.  I think we are safe with respect to rsync because I have seen
> that we do rewrite the vm files in link mode and rsync will copy them
> from primary's new cluster.

Okay.

> I think you can try to address my other comments on your pg_upgrade
> patch.  Once we agree on the code, we need to test the scenarios below:
> (a) upgrade from all supported versions to the latest version
> (b) upgrade standby with and without using rsync.

Sounds good.

-- 
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: WIP: Avoid creation of the free space map for small tables

2019-01-25 Thread John Naylor
On Thu, Jan 24, 2019 at 5:19 AM Amit Kapila  wrote:
>
> 1.
> + if ((maps[mapnum].relkind != RELKIND_RELATION &&
> + maps[mapnum].relkind != RELKIND_TOASTVALUE) ||
> + first_seg_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ ||
> + GET_MAJOR_VERSION(new_cluster.major_version) <= 1100)
> + (void) transfer_relfile(&maps[mapnum], "_fsm", vm_must_add_frozenbit);
>
> I think this check will needlessly be performed for future versions as
> well, say when one wants to upgrade from PG12 to PG13.  That might not
> create any problem, but let's try to be more precise.  Can you try to
> rewrite this check?  You might want to encapsulate it inside a
> function.  I have thought of doing something similar to what we do for
> vm, see the checks related to VISIBILITY_MAP_FROZEN_BIT_CAT_VER, but I
> guess for this patch it is not important to check the catalog version,
> even if someone tries to upgrade to the same version.

Agreed, done for v19 (I've only attached the pg_upgrade patch).

> 2.
> transfer_relfile()
> {
> ..
> - /* Is it an extent, fsm, or vm file? */
> - if (type_suffix[0] != '\0' || segno != 0)
> + /* Did file open fail? */
> + if (stat(old_file, &statbuf) != 0)
> ..
> }
>
> So from now on, we will call stat even for the 0th segment, which
> means there is one additional system call for each relation.  Not sure
> if that matters, but I think there is no harm in testing once with a
> large number of relations, say 10K to 50K relations which have a FSM.

Performance testing is probably a good idea anyway, but I went ahead
and implemented your next idea:

> The other alternative is that we can fetch pg_class.relpages and rely on
> that for this decision, but again, if that is not up to date, we might
> make the wrong decision.

We can think of it this way: Which is worse,
1. Transferring a FSM we don't need, or
2. Skipping a FSM we need

I'd say #2 is worse. So, in v19 we check pg_class.relpages, and if it's
a heap and less than or equal to the threshold, we call stat on the 0th
segment to verify. In the common case, the cost of the stat call is
offset by not linking the FSM. Despite needing another pg_class field,
I think this code is actually easier to read than my earlier versions.
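
(A simplified sketch of that decision, using names from the attached patch --
illustrative only, not the patch code itself; version checks omitted:)

    /* Transfer the FSM unless this is a small heap or TOAST table. */
    static bool
    fsm_transfer_needed(const FileNameMap *map, const char *old_heap_file)
    {
        struct stat statbuf;

        if (map->relkind != RELKIND_RELATION &&
            map->relkind != RELKIND_TOASTVALUE)
            return true;

        /* relpages can be stale, so it is only a first-pass filter ... */
        if (map->relpages > HEAP_FSM_CREATION_THRESHOLD)
            return true;

        /* ... and the main fork's 0th segment is verified with stat(). */
        if (stat(old_heap_file, &statbuf) == 0 &&
            statbuf.st_size > HEAP_FSM_CREATION_THRESHOLD * BLCKSZ)
            return true;

        return false;
    }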

> 3.
> -static void
> +static Size
>  transfer_relfile(FileNameMap *map, const char *type_suffix, bool
> vm_must_add_frozenbit)
>
> If we decide to go with the approach proposed by you, we should add
> some comments atop this function for return value change?

Done, as well as other comment edits.

-- 
John Naylor  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From 277d969c31349b22e2c7726935d681e72aacaece Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Fri, 25 Jan 2019 17:18:28 -0500
Subject: [PATCH v19 3/3] During pg_upgrade, conditionally skip transfer of
 FSMs.

If a heap on the old cluster has 4 pages or fewer, don't copy or
link the FSM. This will reduce space usage for installations with
large numbers of small tables.
---
 src/bin/pg_upgrade/info.c| 18 ++-
 src/bin/pg_upgrade/pg_upgrade.h  |  6 +++
 src/bin/pg_upgrade/relfilenode.c | 84 +++-
 3 files changed, 93 insertions(+), 15 deletions(-)

diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 2f925f086c..55d4911d10 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -200,6 +200,8 @@ create_rel_filename_map(const char *old_data, const char *new_data,
 
 	map->old_db_oid = old_db->db_oid;
 	map->new_db_oid = new_db->db_oid;
+	map->relpages = old_rel->relpages;
+	map->relkind = old_rel->relkind;
 
 	/*
 	 * old_relfilenode might differ from pg_class.oid (and hence
@@ -415,9 +417,11 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	int			ntups;
 	int			relnum;
 	int			num_rels = 0;
+	int			relpages;
 	char	   *nspname = NULL;
 	char	   *relname = NULL;
 	char	   *tablespace = NULL;
+	char	   *relkind = NULL;
 	int			i_spclocation,
 i_nspname,
 i_relname,
@@ -425,7 +429,9 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 i_indtable,
 i_toastheap,
 i_relfilenode,
-i_reltablespace;
+i_reltablespace,
+i_relpages,
+i_relkind;
 	char		query[QUERY_ALLOC];
 	char	   *last_namespace = NULL,
 			   *last_tablespace = NULL;
@@ -494,7 +500,7 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	 */
 	snprintf(query + strlen(query), sizeof(query) - strlen(query),
 			 "SELECT all_rels.*, n.nspname, c.relname, "
-			 "  c.relfilenode, c.reltablespace, %s "
+			 "  c.relfilenode, c.reltablespace, c.relpages, c.relkind, %s "
 			 "FROM (SELECT * FROM regular_heap "
 			 "  UNION ALL "
 			 "  SELECT * FROM toast_heap "
@@ -526,6 +532,8 @@ get_rel_infos(ClusterInfo *cluster, DbInfo *dbinfo)
 	i_relfilenode = PQfnumber(res, "relfilenode");
 	i_reltablespace = PQfnumber(res, "reltablespace");
 	i_spclocation = PQfnumber(res, "spclocation");
+	i_relpages = PQfnumber(res, "relpages");
+	i_relkind = PQfnumber(res, "relkind");
 

Re: GUC parameters accepts values in both octal and hexadecimal formats

2019-01-25 Thread Bruce Momjian
On Tue, Jan  8, 2019 at 05:08:41PM +1100, Haribabu Kommi wrote:
> Hi Hackers,
> 
> The Server GUC parameters accepts values in both Octal and hexadecimal formats
> also.
> 
> postgres=# set max_parallel_workers_per_gather='0x10';
> postgres=# show max_parallel_workers_per_gather;
>  max_parallel_workers_per_gather
> ---------------------------------
>  16
> 
> postgres=# set max_parallel_workers_per_gather='010';
> postgres=# show max_parallel_workers_per_gather;
>  max_parallel_workers_per_gather
> ---------------------------------
>  8
> 
> I can see that this behavior has existed for quite some time, but I am not
> able to find any documentation related to it.  Can someone point me to the
> relevant section where it is described?  If it doesn't exist, is it fine to
> add it?

Well, we call strtol() in guc.c, and the strtol() manual page says:

The string may begin with an arbitrary amount of white space (as
determined by isspace(3)) followed by a single optional '+' or '-' sign.
If base is zero or 16, the string  may  then include a "0x" prefix, and
the number will be read in base 16; otherwise, a zero base is taken as
10 (decimal) unless the next character is '0', in which case it is taken
as 8 (octal).

so it looks like the behavior is just a side-effect of our strtol call. 
I am not sure it is worth documenting though.
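
For reference, a standalone C illustration of the base-0 strtol() behavior
quoted above (not PostgreSQL code; it just reproduces the observed parsing):

    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        /* base 0: the string's prefix decides the radix */
        printf("%ld\n", strtol("0x10", NULL, 0));   /* 16 (hexadecimal)   */
        printf("%ld\n", strtol("010",  NULL, 0));   /* 8  (octal)         */
        printf("%ld\n", strtol("10",   NULL, 0));   /* 10 (plain decimal) */
        return 0;
    }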

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Alternative to \copy in psql modelled after \g

2019-01-25 Thread Fabien COELHO



Hello Tom,


>> Fixing the problem involves deciding what the right behavior is, and
>> updating the documentation and the implementation accordingly. Currently the
>> documentation suggests that :ERROR is about SQL errors (subject to
>> interpretation), and the implementation is more or less consistent with
>> that view, but not always, as pointed out by Daniel.

> FWIW, I think we should adopt the rule that :ERROR reflects any error
> condition, whether server-side or client-side.

I think that everybody agrees with that.

> It's unlikely that scripts would really not care about client-side
> errors.  Moreover, I do not think we can reliably tell the difference;
> there are always going to be corner cases that are hard to classify.
> Example: if we lose the server connection, is that a server-side error
> or client-side? Or maybe neither, maybe the network went down.

> Because of this concern, I find the idea of inventing separate
> SQL_ERROR and CLIENT_ERROR variables to be a pretty terrible one.

Then the committer is right :-)

> I think we'd be creating a lot of make-work and hard choices for
> ourselves, to do something that most script writers won't give a
> fig about.  If you've got subtle error-handling logic in mind,
> you shouldn't be trying to implement your app in a psql script
> in the first place, IMO anyway.

Possibly. I was also thinking of debugging. ISTM that SQL_ERROR is pretty
trivial to implement because we already set SQLSTATE, and SQL_ERROR is
probably something like SQLSTATE != "" or close to that.

> It's unclear to me whether to push ahead with Daniel's existing
> patch or not.  It doesn't look to me like it's making things
> any worse from the error-consistency standpoint than they were
> already, so I'd be inclined to consider error semantics cleanup
> as something to be done separately/later.

Fine.

> BTW, it looks to me like the last patch is still not right as far
> as when to close copystream --- won't it incorrectly close a
> stream obtained from pset.copyStream?

Argh, I should have checked this point.

--
Fabien.



Bison state table

2019-01-25 Thread Bruce Momjian
With our scanner keywords list now more cache-aware, and with us
planning to use Bison for years to come, has anyone ever looked at
reordering the bison state machine array to be more cache aware, e.g.,
having common states next to each other rather than scattered around the
array?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Adding a concept of TEMPORARY TABLESPACE for the use in temp_tablespaces

2019-01-25 Thread Bruce Momjian
On Sun, Jan  6, 2019 at 11:01:52AM -0800, Mitar wrote:
> Hi!
> 
> I have read around the Internet a lot about the idea of using /dev/shm
> for a tablespace to put tables in and issues with that. But I still
> have not managed to get a good grasp why would that be a bad idea for
> using it for temporary objects. I understand that for regular tables
> this might prevent database startup and recovery because tables and
> all files associated with tables would be missing. While operations
> for those tables could reside in the oplog. (Not sure if this means
> that unlogged tables can be stored on such tablesspace.)
> 
> I have experimented a bit and performance really improves if /dev/shm
> is used. I have experimented with creating temporary tables inside a
> regular (SSD backed) tablespace /dev/shm and I have seen at least 2x
> improvement in time it takes for a set of modification+select queries
> to complete.
> 
> I have also tested what happens if I kill all processes with KILL and
> restart it. There is noise in logs about missing files, but it does
> start up. Dropping and recreating the tablespace works.
> 
> So I wonder, should we add a TEMPORARY flag to a TABLESPACE which
> would mark a tablespace such that if at startup its location is empty,
> it is automatically recreated, without warnings/errors? (Maybe some
> other term could be used for this.)
> 
> Ideally, such tablespace could be set as temp_tablespaces and things
> should work out: PostgreSQL should recreate the tablespace before
> trying to use temp_tablespaces for the first time.
> 
> We could even make it so that only temporary objects are allowed to be
> created in a TEMPORARY TABLESPACE, to make sure user does not make a
> mistake.

I wrote a blog entry about this:

https://momjian.us/main/blogs/pgblog/2017.html#June_2_2017

This is certainly an area we can improve, but it would require changes
in several parts of the system to handle cases where the tablespace
disappears.
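
For concreteness, the requested behavior would amount to something like this
(hypothetical syntax -- no TEMPORARY tablespace option exists today):

    -- hypothetical: recreated automatically at startup if its location is
    -- empty, and restricted to temporary objects
    CREATE TABLESPACE scratch TEMPORARY LOCATION '/dev/shm/pg_scratch';
    SET temp_tablespaces = 'scratch';
    CREATE TEMP TABLE work_queue (id int, payload text);  -- lands in /dev/shm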

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: crosstab/repivot...any interest?

2019-01-25 Thread Merlin Moncure
On Fri, Jan 25, 2019 at 3:16 PM David Fetter  wrote:
>
> On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote:
> > Hackers,
> >
> > We have a strong need to make a variant to the crosstab interface so
> > that data that is pivoted one way would be sent through a crosstab
> > like function so that it would be pivoted another way.  For example,
> > if you had
> >
> > row 0: a1, a2, a3, k1, c1, c2, ...
> > row 1: a1, a2, a3, k2, c1, c2, ...
> > row 2: a1, a2, a3, k3, c1, c2, ...
> > ...
> >
> > where 'a' columns are uninteresting attribute columns, 'k' is the
> > dimension we want to pivot on, and c1->cN would be stacked vertically,
> > so that we'd end up with,
> > row 0: a1, a2, a3, c1, k1, k2, ...
> > row 1: a1, a2, a3, c2, k1, k2, ...
> > row 2: a1, a2, a3, c3, k1, k2, ...
> >
> > There are various SQL level approaches to this but they tend to be
> > imperformant with large datasets so that I think a crosstab-like C
> > implementation ought to be able to do better (or at least I hope so)
> > since you have to cross product rows and columns in such a way that
> > you can get a clean join.  Cribbing from tablefunc.c I don't think
> > this is a terrible challenge to do in hash table style.
> >
> > Questions on the table:
> > *) Has anyone done anything like this or know of any current 
> > implementations?
> > *) Would there be any interest in expanding tablefunc along these lines?
>
> There's something in SQL:2016 that I read as crosstabs, or at least as
> enabling crosstabs.
> https://www.iso.org/standard/69776.html
>
> If we're going to put work into crosstabs, it seems to me that the
> "we" needs to be the project as a whole, and the work should be, to
> the extent reasonable, toward standard compliance.

Interesting.  Do you see that the spec (it makes my brain hurt) can
handle that kind of repivoting?

merlin



Re: Early WIP/PoC for inlining CTEs

2019-01-25 Thread Tom Lane
Andreas Karlsson  writes:
> [ inlining-ctes-v8.patch ]

I went ahead and pushed the stuff about QTW_EXAMINE_RTES_BEFORE/_AFTER,
because that seems like an independent change with other possible uses.

Attached is an updated version of the rest of the patch, with mostly
cosmetic changes.  I've not touched the documentation, but I think this
is otherwise committable if we are satisfied with the semantics.

However ... after thinking about it more, I'm not really satisfied
with that.  In particular I don't like the fact that by default this
will inline regardless of the number of references to the CTE.  I doubt
that inlining when there are multiple references is so likely to be a
win as to justify it being the default, especially given that it flies
in the face of what our documentation has said for as long as we've
had CTEs.

Therefore, I'm reversing my previous opinion that we should not have
an explicit NOT MATERIALIZED option.  I think we should add that, and
the behavior ought to be:

* No option given: inline if there's exactly one reference.

* With MATERIALIZED: never inline.

* With NOT MATERIALIZED: inline regardless of the number of references.

(Obviously, we should not inline if there's RECURSIVE or the CTE
potentially has side-effects, regardless of the user option;
I don't think those cases are up for debate.)
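
For illustration, the user-visible behavior would look like this (a sketch of
the proposal; NOT MATERIALIZED is the new spelling, and big_table(id, flag)
is a hypothetical table):

    WITH w AS MATERIALIZED (SELECT * FROM big_table WHERE flag)
    SELECT * FROM w JOIN w AS w2 USING (id);      -- never inlined

    WITH w AS NOT MATERIALIZED (SELECT * FROM big_table WHERE flag)
    SELECT * FROM w JOIN w AS w2 USING (id);      -- inlined despite two refs

    WITH w AS (SELECT * FROM big_table WHERE flag)
    SELECT * FROM w;                              -- inlined: single reference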

I haven't done anything about that here, but the changes would be pretty
minor.

regards, tom lane

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index f177eba..6d456f6 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -2925,6 +2925,7 @@ JumbleExpr(pgssJumbleState *jstate, Node *node)
 
 /* we store the string name because RTE_CTE RTEs need it */
 APP_JUMB_STRING(cte->ctename);
+APP_JUMB(cte->ctematerialized);
 JumbleQuery(jstate, castNode(Query, cte->ctequery));
 			}
 			break;
diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index bb92d9d..8c26dd1 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -1888,7 +1888,7 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  QUERY PLAN  
 -
  Limit
@@ -1905,7 +1905,7 @@ WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2
Output: t.c1_1, t.c2_1, t.c1_3
 (12 rows)
 
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
  c1_1 | c2_1 
 --+--
   101 |  101
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index f438165..56602a1 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -493,8 +493,8 @@ SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t
 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR SHARE;
 -- join in CTE
 EXPLAIN (VERBOSE, COSTS OFF)
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
-WITH t (c1_1, c1_3, c2_1) AS (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
+WITH t (c1_1, c1_3, c2_1) AS MATERIALIZED (SELECT t1.c1, t1.c3, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1)) SELECT c1_1, c2_1 FROM t ORDER BY c1_3, c1_1 OFFSET 100 LIMIT 10;
 -- ctid with whole-row reference
 EXPLAIN (VERBOSE, COSTS OFF)
 SELECT t1.ctid, t1, t2, t1.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10;
diff --git a/doc/src/sgml/queries.sgml 

Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-01-25 Thread Alvaro Herrera
On 2019-Jan-25, Robert Haas wrote:

> I finally gotten a little more time to work on this.  It took me a
> while to understand that a PartitionedRelPruneInfos assumes that the
> indexes of partitions in the PartitionDesc don't change between
> planning and execution, because subplan_map[] and subplan_map[] are
> indexed by PartitionDesc offset.

Right, the planner/executor "disconnect" is one of the challenges, and
why I was trying to keep the old copy of the PartitionDesc around
instead of building updated ones as needed.

> I suppose the reason for this is so
> that we don't have to go to the expense of copying the partition
> bounds from the PartitionDesc into the final plan, but it somehow
> seems a little scary to me.  Perhaps I am too easily frightened, but
> it's certainly a problem from the point of view of this project, which
> wants to let the PartitionDesc change concurrently.

Well, my definition of the problem started with the assumption that we
would keep the partition array indexes unchanged, so "change
concurrently" is what we needed to avoid.  Yes, I realize that you've
opted to change that definition.

I may have forgotten some of your earlier emails on this, but one aspect
(possibly a key one) is that I'm not sure we really need to cope, other
than with an ERROR, with queries that continue to run across an
attach/detach -- more so in absurd scenarios such as the ones you
described where the detached table is later re-attached, possibly to a
different partitioned table.  I mean, if we can just detect the case and
raise an error, and this lets us make it all work reasonably, that might
be better.

> I wrote a little patch that stores the relation OIDs of the partitions
> into the PartitionedPruneRelInfo and then, at execution time, does an
> Assert() that what it gets matches what existed at plan time.  I
> figured that a good start would be to find a test case where this
> fails with concurrent DDL allowed, but I haven't so far succeeded in
> devising one.  To make the Assert() fail, I need to come up with a
> case where concurrent DDL has caused the PartitionDesc to be rebuilt
> but without causing an update to the plan.  If I use prepared queries
> inside of a transaction block, [...]

> I also had the idea of trying to use a cursor, because if I could
> start execution of a query, [...]

Those are the ways I thought of, and the reason for the shape of some of
those .spec tests.  I wasn't able to hit the situation.

> Maybe if I try it with CLOBBER_CACHE_ALWAYS...

I didn't try this one.

> Anyway, I think this idea of passing a list of relation OIDs that we
> saw at planning time through to the executor and cross-checking them
> might have some legs.  If we only allowed concurrently *adding*
> partitions and not concurrently *removing* them, then even if we find
> the case(s) where the PartitionDesc can change under us, we can
> probably just adjust subplan_map and subpart_map to compensate, since
> we can iterate through the old and new arrays of relation OIDs and
> just figure out which things have shifted to higher indexes in the
> PartitionDesc.  This is all kind of hand-waving at the moment; tips
> appreciated.

I think detaching partitions concurrently is a necessary part of this
feature, so I would prefer not to go with a solution that works for
attaching partitions but not for detaching them.  That said, I don't see
why it's impossible to adjust the partition maps in both cases.  But I
don't have anything better than hand-waving ATM.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: crosstab/repivot...any interest?

2019-01-25 Thread David Fetter
On Fri, Jan 25, 2019 at 02:21:55PM -0600, Merlin Moncure wrote:
> Hackers,
> 
> We have a strong need to make a variant to the crosstab interface so
> that data that is pivoted one way would be sent through a crosstab
> like function so that it would be pivoted another way.  For example,
> if you had
> 
> row 0: a1, a2, a3, k1, c1, c2, ...
> row 1: a1, a2, a3, k2, c1, c2, ...
> row 2: a1, a2, a3, k3, c1, c2, ...
> ...
> 
> where 'a' columns are uninteresting attribute columns, 'k' is the
> dimension we want to pivot on, and c1->cN would be stacked vertically,
> so that we'd end up with,
> row 0: a1, a2, a3, c1, k1, k2, ...
> row 1: a1, a2, a3, c2, k1, k2, ...
> row 2: a1, a2, a3, c3, k1, k2, ...
> 
> There are various SQL-level approaches to this, but they tend to
> perform poorly with large datasets, so I think a crosstab-like C
> implementation ought to be able to do better (or at least I hope so),
> since you have to cross product rows and columns in such a way that
> you can get a clean join.  Cribbing from tablefunc.c, I don't think
> this is terribly challenging to do in hash-table style.
> 
> Questions on the table:
> *) Has anyone done anything like this or know of any current implementations?
> *) Would there be any interest in expanding tablefunc along these lines?

There's something in SQL:2016 that I read as crosstabs, or at least as
enabling crosstabs.
https://www.iso.org/standard/69776.html

If we're going to put work into crosstabs, it seems to me that the
"we" needs to be the project as a whole, and the work should be, to
the extent reasonable, toward standard compliance.

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: ATTACH/DETACH PARTITION CONCURRENTLY

2019-01-25 Thread Robert Haas
On Thu, Dec 20, 2018 at 3:58 PM Alvaro Herrera  wrote:
> Namely: how does this handle the case of partition pruning structure
> being passed from planner to executor, if an attach happens in the
> middle of it and puts a partition in between existing partitions?  Array
> indexes of any partitions that appear later in the partition descriptor
> will change.

I've finally gotten a little more time to work on this.  It took me a
while to understand that a PartitionedRelPruneInfo assumes that the
indexes of partitions in the PartitionDesc don't change between
planning and execution, because subplan_map[] and subpart_map[] are
indexed by PartitionDesc offset.  I suppose the reason for this is so
that we don't have to go to the expense of copying the partition
bounds from the PartitionDesc into the final plan, but it somehow
seems a little scary to me.  Perhaps I am too easily frightened, but
it's certainly a problem from the point of view of this project, which
wants to let the PartitionDesc change concurrently.

I wrote a little patch that stores the relation OIDs of the partitions
into the PartitionedRelPruneInfo and then, at execution time, does an
Assert() that what it gets matches what existed at plan time.  I
figured that a good start would be to find a test case where this
fails with concurrent DDL allowed, but I haven't so far succeeded in
devising one.  To make the Assert() fail, I need to come up with a
case where concurrent DDL has caused the PartitionDesc to be rebuilt
but without causing an update to the plan.  If I use prepared queries
inside of a transaction block, I can continue to run old plans after
concurrent DDL has changed things, but I can't actually make the
Assert() fail, because the queries continue to use the old plans
precisely because they haven't processed invalidation messages, and
therefore they also have the old PartitionDesc and everything works.
Maybe if I try it with CLOBBER_CACHE_ALWAYS...

I also had the idea of trying to use a cursor, because if I could
start execution of a query, then force a relcache rebuild, then
continue executing the query, maybe something would blow up somehow.
But that's not so easy, because I don't think we have any way using SQL
to declare a cursor for a prepared query, so I would need to get a
query plan that involves run-time pruning without using parameters,
which I'm pretty sure is possible but I haven't figured it out yet.
And even there the PartitionDirectory concept might preserve us from
any damage if the change happens after the executor is initialized,
though I'm not sure if there are any cases where we don't do the first
PartitionDesc lookup for a particular table until mid-execution.
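
To be concrete, the kind of scenario I have in mind looks something like
this (table and partition names are made up, and the uncorrelated
sub-select is just one assumed way to get run-time pruning without
parameters -- it becomes an initplan whose value is only known at
executor startup):

-- hypothetical list-partitioned table
CREATE TABLE parted (a int, b text) PARTITION BY LIST (a);
CREATE TABLE parted_1 PARTITION OF parted FOR VALUES IN (1);
CREATE TABLE parted_2 PARTITION OF parted FOR VALUES IN (2);

BEGIN;
-- the sub-select is planned as an initplan, so the partition set is
-- pruned at executor startup rather than at plan time
DECLARE c CURSOR FOR SELECT * FROM parted WHERE a = (SELECT 1);
FETCH 1 FROM c;
-- ... ATTACH/DETACH PARTITION runs in another session here ...
FETCH ALL FROM c;
COMMIT;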

Anyway, I think this idea of passing a list of relation OIDs that we
saw at planning time through to the executor and cross-checking them
might have some legs.  If we only allowed concurrently *adding*
partitions and not concurrently *removing* them, then even if we find
the case(s) where the PartitionDesc can change under us, we can
probably just adjust subplan_map and subpart_map to compensate, since
we can iterate through the old and new arrays of relation OIDs and
just figure out which things have shifted to higher indexes in the
PartitionDesc.  This is all kind of hand-waving at the moment; tips
appreciated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



crosstab/repivot...any interest?

2019-01-25 Thread Merlin Moncure
Hackers,

We have a strong need to make a variant to the crosstab interface so
that data that is pivoted one way would be sent through a crosstab
like function so that it would be pivoted another way.  For example,
if you had

row 0: a1, a2, a3, k1, c1, c2, ...
row 1: a1, a2, a3, k2, c1, c2, ...
row 2: a1, a2, a3, k3, c1, c2, ...
...

where 'a' columns are uninteresting attribute columns, 'k' is the
dimension we want to pivot on, and c1->cN would be stacked vertically,
so that we'd end up with,
row 0: a1, a2, a3, c1, k1, k2, ...
row 1: a1, a2, a3, c2, k1, k2, ...
row 2: a1, a2, a3, c3, k1, k2, ...

There are various SQL-level approaches to this, but they tend to
perform poorly with large datasets, so I think a crosstab-like C
implementation ought to be able to do better (or at least I hope so),
since you have to cross product rows and columns in such a way that
you can get a clean join.  Cribbing from tablefunc.c, I don't think
this is terribly challenging to do in hash-table style.
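
For concreteness, here is a sketch of one such SQL-level approach (table
and column names are hypothetical, the k values are assumed to be known in
advance, and the c columns are assumed to share a common type).  It also
shows why this tends to be slow: the row count is multiplied by the number
of c columns before being grouped back down.

-- unpivot the c columns with LATERAL, then pivot on k with filtered aggregates
SELECT a1, a2, a3,
       u.c_name,
       max(u.c_value) FILTER (WHERE k = 'k1') AS k1,
       max(u.c_value) FILTER (WHERE k = 'k2') AS k2,
       max(u.c_value) FILTER (WHERE k = 'k3') AS k3
  FROM t
       CROSS JOIN LATERAL (VALUES ('c1', t.c1), ('c2', t.c2)) AS u(c_name, c_value)
 GROUP BY a1, a2, a3, u.c_name
 ORDER BY a1, a2, a3, u.c_name;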

Questions on the table:
*) Has anyone done anything like this or know of any current implementations?
*) Would there be any interest in expanding tablefunc along these lines?

thanks in advance,
merlin



Re: move hash_any to utils/hash/hashfn.c

2019-01-25 Thread Tom Lane
Andres Freund  writes:
> On 2019-01-25 16:35:18 -0300, Alvaro Herrera wrote:
>> Would anybody object too hard if I move hash_any() and friends to
>> src/backend/utils/hash/hashfn.c and the declarations to
>> src/include/utils/hashutils.h?

> I hate the current split quite a bit, albeit for somewhat different
> reasons. We make things like tag_hash, uint32_hash unnecessarily more
> expensive by adding an entirely useless external function call. And
> some of these can be fairly hot (e.g. for syscache).  So yea, let's
> move this stuff together.

+1

regards, tom lane



Re: move hash_any to utils/hash/hashfn.c

2019-01-25 Thread Andres Freund
Hi,

On 2019-01-25 16:35:18 -0300, Alvaro Herrera wrote:
> I just noticed (once again) that we have hash_any() declared in
> src/access/hash.h (the header file for the hash index AM) rather than
> somewhere in utils/, for no good reason other than perhaps there was no
> better place when it was introduced.  This means that some files that
> seem to just need hash_any end up pointlessly #including the whole AM
> world upon themselves.
> 
> Would anybody object too hard if I move hash_any() and friends to
> src/backend/utils/hash/hashfn.c and the declarations to
> src/include/utils/hashutils.h?

I hate the current split quite a bit, albeit for somewhat different
reasons. We make things like tag_hash, uint32_hash unnecessarily more
expensive by adding an entirely useless external function call. And
some of these can be fairly hot (e.g. for syscache).  So yea, let's
move this stuff together.

Greetings,

Andres Freund



move hash_any to utils/hash/hashfn.c

2019-01-25 Thread Alvaro Herrera
I just noticed (once again) that we have hash_any() declared in
src/access/hash.h (the header file for the hash index AM) rather than
somewhere in utils/, for no good reason other than perhaps there was no
better place when it was introduced.  This means that some files that
seem to just need hash_any end up pointlessly #including the whole AM
world upon themselves.

Would anybody object too hard if I move hash_any() and friends to
src/backend/utils/hash/hashfn.c and the declarations to
src/include/utils/hashutils.h?


-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Protect syscache from bloating with negative cache entries

2019-01-25 Thread Bruce Momjian
On Fri, Jan 25, 2019 at 08:14:19AM +, Tsunakawa, Takayuki wrote:
> Hi Horiguchi-san, Bruce,
>
> From: Bruce Momjian [mailto:br...@momjian.us]
> > I suggest you go with just syscache_prune_min_age, get that into
> > PG 12, and we can then reevaluate what we need.  If you want to
> > hard-code a minimum cache size where no pruning will happen, maybe
> > based on the system catalogs or typical load, that is fine.
>
> Please forgive me if I say something silly (I might have got lost.)
>
> Are you suggesting to make the cache size limit system-defined and
> uncontrollable by the user?  I think it's necessary for the DBA to
> be able to control the cache memory amount.  Otherwise, if many
> concurrent connections access many partitions within a not-so-long
> duration, then the cache eviction can't catch up and ends up in OOM.
> How about the following questions I asked in my previous mail?
>
> --
> This is a pure question.  How can we answer these questions from
> users?
>
> * What value can I set to cache_memory_target when I can use 10 GB for
>   the caches and max_connections = 100?
> * How much RAM do I need to have for the caches when I set
>   cache_memory_target = 1M?
>
> The user tends to estimate memory to avoid OOM.

Well, let's walk through this.  Suppose the default for
syscache_prune_min_age is 10 minutes, and that we either prune all cache
entries not referenced in the past 10 minutes, or only prune (at
10-minute intervals) when the cache size is larger than some fixed size
like 100 entries.

So, when would you change syscache_prune_min_age?  If you reference many
objects and then don't reference them at all for minutes, you might want
to lower syscache_prune_min_age to maybe 1 minute.  Why would you want
to change the behavior of removing all unreferenced cache items, at
least when there are more than 100?  (You called this
syscache_memory_target.)

My point is I can see someone wanting to change syscache_prune_min_age,
but I can't see someone wanting to change syscache_memory_target.  Who
would want to keep 5k cache entries that have not been accessed in X
minutes?  If we had some global resource manager that would allow you to
control work_mem, maintenance_work_mem, cache size, and set global
limits on their sizes, I can see where maybe it might make sense, but
right now the memory usage of a backend is so fluid that setting some
limit on its size for unreferenced entries just doesn't make sense.

One of my big points is that syscache_memory_target doesn't even
guarantee that the cache will be this size or lower; it only controls
whether the cleanup happens at syscache_prune_min_age intervals.
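
For concreteness, the knobs being discussed would be used roughly like this
(both GUC names come from the patch under discussion and exist in no
released version; the values are only illustrative):

-- hypothetical GUCs from the proposed patch
SET syscache_prune_min_age = '1min';   -- evict entries not referenced for a minute
SET syscache_memory_target = '100MB';  -- below this cache size, skip pruning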

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: using expression syntax for partition bounds

2019-01-25 Thread Amit Langote
On Sat, Jan 26, 2019 at 12:19 AM Tom Lane  wrote:
> Peter Eisentraut  writes:
> > committed
>
> Some of the buildfarm members are having sort-ordering problems
> with this.  Looks like you could work around it with different
> partition names (don't assume the relative sort order of
> letters and digits).

How about replacing \d+ list_parted with a couple of \d on individual
partitions, like in the attached?

Thanks,
Amit


7c079d7417-stabilize-test-output.patch
Description: Binary data


Re: Use zero for nullness estimates of system attributes

2019-01-25 Thread Tom Lane
Jim Finnerty  writes:
> Tom, there's an analogous issue of adjusting distinct values on a per-column
> basis based on the selectivity of other local predicates.  Several
> commercial RDBMS's make such adjustments in an effort to get better
> selectivity estimates when there are multiple local predicates.  Is this
> something that the PG community has considered and decided not to do because
> of the additional planning-time overhead that may be required, or just
> something that hasn't been discussed or tackled yet?

I think what you're talking about is the correlated-variables problem,
which we have made a start on with the "extended statistics" mechanism,
though certainly a lot remains to be done.

regards, tom lane



Re: Alternative to \copy in psql modelled after \g

2019-01-25 Thread Tom Lane
Fabien COELHO  writes:
> Fixing the problem involves deciding what the right behavior is, and 
> updating the documentation and the implementation accordingly. Currently the 
> documentation suggests that :ERROR is about SQL errors (subject to 
> interpretation), and the implementation is more or less consistent with 
> that view, but not always, as pointed out by Daniel.

FWIW, I think we should adopt the rule that :ERROR reflects any error
condition, whether server-side or client-side.  It's unlikely that
scripts would really not care about client-side errors.  Moreover,
I do not think we can reliably tell the difference; there are always
going to be corner cases that are hard to classify.  Example: if we
lose the server connection, is that a server-side error or client-side?
Or maybe neither, maybe the network went down.

Because of this concern, I find the idea of inventing separate
SQL_ERROR and CLIENT_ERROR variables to be a pretty terrible one.
I think we'd be creating a lot of make-work and hard choices for
ourselves, to do something that most script writers won't give a
fig about.  If you've got subtle error-handling logic in mind,
you shouldn't be trying to implement your app in a psql script
in the first place, IMO anyway.

It's unclear to me whether to push ahead with Daniel's existing
patch or not.  It doesn't look to me like it's making things
any worse from the error-consistency standpoint than they were
already, so I'd be inclined to consider error semantics cleanup
as something to be done separately/later.

BTW, it looks to me like the last patch is still not right as far
as when to close copystream --- won't it incorrectly close a
stream obtained from pset.copyStream?  While you could fix that
with

-   if (pset.gfname && copystream != NULL)
+   if (!pset.copyStream && pset.gfname && copystream != NULL)

I think that's getting overly complex and unmaintainable.  I'd
be inclined to code things more like


FILE   *copystream = NULL;
bool    need_close = false;

...

need_close = openQueryOutputFile(...);

...

if (need_close)
// close copystream here


regards, tom lane



Re: Connection slots reserved for replication

2019-01-25 Thread Oleksii Kliukin
Hello,

> On 24. Jan 2019, at 13:47, Kyotaro HORIGUCHI 
>  wrote:

Thank you for the review. I took the liberty of addressing it in v9.

> 
> Documentation looks fine to me. By the way, a comment at the
> call site of CheckRequiredParameterValues() in xlog.c looks
> somewhat stale.
> 
>> /* Check to see if any changes to max_connections give problems */
>> CheckRequiredParameterValues();
> 
> may be better something like this?:
> 
>> /* Check to see if any parameter change gives a problem on replication */

I changed it to "Check if any parameter change gives a problem on recovery", as 
I think it is independent of [streaming] replication, but I still don't 
like the wording, as it just duplicates the comment at the definition of 
CheckRequiredParameterValues. Maybe remove the comment altogether?

> 
> 
> In postinit.c:
>>   /*
>>* The last few connection slots are reserved for superusers.
>>*/
>>   if ((!am_superuser && !am_walsender) &&
>>   ReservedBackends > 0 &&
> 
> This is forgetting to explain about walsenders.
> 
>> The last few connection slots are reserved for
>> superusers. Replication connections don't share the same slot
>> pool.
> 
> Or something?

I changed it to

+* The last few connection slots are reserved for superusers.
+* Replication connections are drawn from a separate pool and
+* not limited by max_connections or superuser_reserved_connections.


> 
> And the parentheses enclosing "!am_superuser..walsender" seem no
> longer needed.
> 
> 
> In guc.c:
> - /* see max_connections and max_wal_senders */
> + /* see max_connections */
> 
> I don't understand for what reason we should see max_connections
> just above.  (Or even max_wal_senders.)  Do we need this comment?
> (There are some other instances, but they won't be for this patch.)

I don't understand what it is pointing to either, so I've removed it.

> 
> 
> In pg_controldata.c:
> + printf(_("max_wal_senders setting: %d\n"),
> +ControlFile->max_wal_senders);
>   printf(_("max_worker_processes setting: %d\n"),
>  ControlFile->max_worker_processes);
>   printf(_("max_prepared_xacts setting:   %d\n"),
> 
> The added item seems to want some additional spaces.

Good catch, fixed.

Attached is v9. I also bumped up the PG_CONTROL_VERSION to 1200 per the prior 
comment by Robert. 

Cheers,
Oleksii



replication_reserved_connections_v9.patch
Description: Binary data


Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-25 Thread Tom Lane
Alvaro Herrera  writes:
> So let's have it write with a $VACUUMDB_OPTS variable, which is by
> default defined as empty but with a comment suggesting that maybe the
> user wants to add the -j option.  This way, if they have to edit it,
> they only have to edit the VACUUMDB_OPTS line instead of each of the two
> vacuumdb lines.

Even better, allow the script to absorb a value from the environment,
so that it needn't be edited at all.

regards, tom lane



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-25 Thread Alvaro Herrera
On 2019-Jan-25, Peter Eisentraut wrote:

> On 25/01/2019 11:28, Michael Paquier wrote:
> > based on that linking the value used by pg_upgrade and vacuumdb is a
> > bad concept in my opinion, and the patch should be rejected.  More
> > documentation on pg_upgrade side to explain that a bit better could be
> > a good idea though, as it is perfectly possible to use your own
> > post-upgrade script or rewrite partially the generated one.
> 
> Right.  pg_upgrade doesn't actually call vacuumdb.  It creates a script
> that you may use.  The script itself contains a comment that says, if
> you want to do this as fast as possible, don't use this script.  That
> comment could be enhanced to suggest the use of the -j option.

So let's have it write with a $VACUUMDB_OPTS variable, which is by
default defined as empty but with a comment suggesting that maybe the
user wants to add the -j option.  This way, if they have to edit it,
they only have to edit the VACUUMDB_OPTS line instead of each of the two
vacuumdb lines.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] logical decoding of two-phase transactions

2019-01-25 Thread Alvaro Herrera
Eyeballing 0001, it has a few problems.

1. It's under-parenthesizing the txn argument of the macros.

2. the "has"/"is" macro definitions don't return booleans -- see
fce4609d5e5b.

3. the remainder of this no longer makes sense:

/* Do we know this is a subxact?  Xid of top-level txn if so */
-   bool        is_known_as_subxact;
TransactionId toplevel_xid;

I suggest to fix the comment, and also improve the comment next to the
macro that tests this flag.


(4. the macro names are ugly.)

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Use zero for nullness estimates of system attributes

2019-01-25 Thread Tom Lane
Edmund Horner  writes:
> I added some code to selfuncs.c to estimate the selectivity of CTID,
> including nullness, in my ongoing attempt to add TID range scans [1].  And
> as Tom pointed out [2], no system attribute can be null, so we might as
> well handle them all.
> That's what the attached patch does.

Seems pretty uncontroversial, so pushed.

> I observed a few interesting things with outer join selectivity:
> While system attributes aren't NULL in the table, they can be in queries
> such as:

Yeah, none of our selectivity calculations account for the possibility
that we're above a join that has affected the distribution of a Var's
values.  Going to NULL in an outer join is just part of that issue.
I don't feel this patch needs to solve it, and anyway it'd be a rather
massive rethink.
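
To illustrate with a made-up example: a system column is never NULL when
read from its own table, but the nullable side of an outer join can still
produce NULLs for it:

-- hypothetical tables a(id) and b(a_id)
SELECT a.id, b.ctid
  FROM a LEFT JOIN b ON b.a_id = a.id
 WHERE b.ctid IS NULL;   -- rows of "a" with no match in "b" get b.ctid = NULL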

> Finally: I thought about introducing a macro to attnum.h:
> #define AttrNumberIsForSystemAttr(attributeNumber) \
>  ((bool) ((attributeNumber) < 0))
> But there's a zillion places that could be changed to use it, so I haven't
> in this version of the patch.

I can't get too excited about that.  Even if the reader is unfamiliar with
the negative-attno convention, most of these places are commented in a way
that makes it clear what's going on.

regards, tom lane



Re: Alternative to \copy in psql modelled after \g

2019-01-25 Thread Fabien COELHO



Good evening Daniel,


Sure. As there are several bugs (doubtful features) uncovered, a first
patch could fix "COPY ... TO STDOUT \g file", but probably replicate the
current ERROR behavior, however debatable it is (i.e. your patch without the
ERROR change, which is unrelated to the bug being fixed), and then another
patch should fix/modify the behavior around ERROR (everywhere and
consistently), and probably IMHO add an SQL_ERROR.


It's not as if the patch issued an explicit call SetVariable("ERROR", "true")
that could be commented out, or something like that. The assignment
of the variable happens as a consequence of patched code that aims at
being correct in its error handling.
So I'm for leaving this decision to a maintainer, because I don't agree
with your conclusion that the current patch should be changed in
that regard.


Ok, fine with me.

To summarize the debate, when fixing "\g filename" for "COPY TO STDOUT" 
the patch does not set :ERROR consistently:


  sql> COPY (SELECT 1) TO STDOUT \g /BAD
# Permission denied on "/BAD"
-> :ERROR is true
# the command is executed but could not write to the file

compared to the existing behavior:

  sql> SELECT 1 \g /BAD
# Permission denied on "/BAD"
-> :ERROR is false
# the SQL command is fine, although writing to the file failed

Should we include this inconsistency and then fix the problem later, or 
replicate the initial (doubtful?) behavior for consistency and then fix 
the problem later, anyway?


Fixing the problem involves deciding what the right behavior is, and 
updating the documentation and the implementation accordingly. Currently the 
documentation suggests that :ERROR is about SQL errors (subject to 
interpretation), and the implementation is more or less consistent with 
that view, but not always, as pointed out by Daniel.


Note that as the author of the initial patch adding :ERROR and others 
(69835bc8), I'm probably somehow responsible for this situation, so shame 
on me.


--
Fabien.



Re: Simplify set of flags used by MyXactFlags

2019-01-25 Thread Alvaro Herrera
On 2019-Jan-25, Michael Paquier wrote:

> On Thu, Jan 24, 2019 at 08:56:05AM -0300, Alvaro Herrera wrote:
> > Uh, I didn't think it was necessary nor desirable to rename the flag,
> > only the user-visible message.
> 
> Oh, OK.  I have overstated your comment then.  Are you fine with the
> attached instead?  The flag name remains the same, and the comment is
> updated.

I am, except that the change of "table" to "object" in xact.c line 2262
makes the new paragraph read a bit weird -- it's now saying "if we've
added a temp object ...   Other objects have the same problem".  Doesn't
make sense.  If you keep that word as "table", it works fine.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: insensitive collations

2019-01-25 Thread Peter Eisentraut
On 16/01/2019 21:50, Peter Eisentraut wrote:
> On 16/01/2019 14:20, Daniel Verite wrote:
>> I've found another issue with aggregates over distinct:
>> the deduplication seems to ignore the collation.
> 
> I have a fix for that.  I'll send it with the next update.

Another patch.  This fixes your issue, and it incorporates the findings
from the thread "ExecBuildGroupingEqual versus collations", as well as a
few other fixes and more tests.

As far as I can tell, this covers everything now, meaning all the
relevant plan types propagate the collation correctly and all the
relevant operators and functions do the right things with them.
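
For reference, the intended usage looks roughly like this (the collation
name and ICU locale string are just an example):

CREATE COLLATION case_insensitive (
    provider = icu,
    locale = 'und-u-ks-level2',   -- secondary strength, i.e. ignore case
    deterministic = false
);

SELECT 'ABC' = 'abc' COLLATE case_insensitive;   -- true under this collation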

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>From 1d675c51fb40a23b355ad5aa3de382e9fd4ffc82 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Fri, 25 Jan 2019 16:13:00 +0100
Subject: [PATCH v4] Collations with nondeterministic comparison

This adds a flag "deterministic" to collations.  If that is false,
such a collation disables various optimizations that assume that
strings are equal only if they are byte-wise equal.  That then allows
use cases such as case-insensitive or accent-insensitive comparisons
or handling of strings with different Unicode normal forms.

The term "deterministic comparison" in this context is from Unicode
Technical Standard #10
(https://unicode.org/reports/tr10/#Deterministic_Comparison).
---
 contrib/bloom/bloom.h |   1 +
 contrib/bloom/blutils.c   |   3 +-
 doc/src/sgml/catalogs.sgml|   7 +
 doc/src/sgml/charset.sgml |  11 +-
 doc/src/sgml/func.sgml|   6 +
 doc/src/sgml/ref/create_collation.sgml|  22 +
 src/backend/access/hash/hashfunc.c|  86 +++
 src/backend/catalog/pg_collation.c|   2 +
 src/backend/commands/collationcmds.c  |  25 +-
 src/backend/executor/execExpr.c   |   4 +-
 src/backend/executor/execGrouping.c   |  12 +-
 src/backend/executor/execPartition.c  |   1 +
 src/backend/executor/execReplication.c|   5 +-
 src/backend/executor/nodeAgg.c|   9 +-
 src/backend/executor/nodeGroup.c  |   1 +
 src/backend/executor/nodeHash.c   |  14 +-
 src/backend/executor/nodeHashjoin.c   |   5 +
 src/backend/executor/nodeRecursiveunion.c |   1 +
 src/backend/executor/nodeSetOp.c  |   2 +
 src/backend/executor/nodeSubplan.c|  14 +-
 src/backend/executor/nodeUnique.c |   1 +
 src/backend/executor/nodeWindowAgg.c  |   2 +
 src/backend/nodes/copyfuncs.c |   7 +
 src/backend/nodes/outfuncs.c  |   7 +
 src/backend/nodes/readfuncs.c |   7 +
 src/backend/optimizer/plan/createplan.c   |  54 +-
 src/backend/optimizer/util/tlist.c|  25 +
 src/backend/partitioning/partbounds.c |   4 +-
 src/backend/partitioning/partprune.c  |   3 +-
 src/backend/regex/regc_pg_locale.c|   5 +
 src/backend/utils/adt/arrayfuncs.c|   2 +-
 src/backend/utils/adt/like.c  |  27 +-
 src/backend/utils/adt/name.c  |  32 +-
 src/backend/utils/adt/orderedsetaggs.c|   3 +-
 src/backend/utils/adt/pg_locale.c |   1 +
 src/backend/utils/adt/ri_triggers.c   |  39 +-
 src/backend/utils/adt/varchar.c   | 117 +++
 src/backend/utils/adt/varlena.c   | 166 +++--
 src/backend/utils/cache/catcache.c|   9 +-
 src/backend/utils/cache/lsyscache.c   |  16 +
 src/bin/initdb/initdb.c   |   4 +-
 src/bin/pg_dump/pg_dump.c |  39 +-
 src/bin/psql/describe.c   |  17 +-
 src/include/catalog/pg_collation.h|   2 +
 src/include/executor/executor.h   |   3 +
 src/include/executor/hashjoin.h   |   1 +
 src/include/executor/nodeHash.h   |   2 +-
 src/include/nodes/execnodes.h |   3 +
 src/include/nodes/plannodes.h |   7 +
 src/include/optimizer/planmain.h  |   2 +-
 src/include/optimizer/tlist.h |   1 +
 src/include/partitioning/partbounds.h |   1 +
 src/include/utils/lsyscache.h |   1 +
 src/include/utils/pg_locale.h |   1 +
 .../regress/expected/collate.icu.utf8.out | 702 +-
 .../regress/expected/collate.linux.utf8.out   |  25 +-
 src/test/regress/expected/collate.out |  15 +
 src/test/regress/expected/subselect.out   |  19 +
 src/test/regress/sql/collate.icu.utf8.sql | 230 ++
 src/test/regress/sql/collate.linux.utf8.sql   |   8 +
 src/test/regress/sql/collate.sql  |   5 +
 src/test/regress/sql/subselect.sql|  17 +
 src/test/subscription/Makefile|   2 +
 

Re: using expression syntax for partition bounds

2019-01-25 Thread Tom Lane
Peter Eisentraut  writes:
> committed

Some of the buildfarm members are having sort-ordering problems
with this.  Looks like you could work around it with different
partition names (don't assume the relative sort order of
letters and digits).

regards, tom lane



Re: House style for DocBook documentation?

2019-01-25 Thread Chapman Flack
On 01/25/19 09:01, Peter Eisentraut wrote:
> On 19/01/2019 21:24, Chapman Flack wrote:
>> (thinks to self half-seriously about an XSL transform for generating
>> printed output that could preserve link-texted links, add raised numbers,
>> and produce a numbered URLs section at the back)
> 
> External links already create footnotes in the PDF output.  Is that
> different from what you are saying?

That might, only, indicate that I was just thinking aloud in email and
had not gone and checked in PDF output to see how the links were handled.

Yes, it could very possibly indicate that.

If they are already processed that way, does that mean the

o  Do not use text with <ulink> so the URL appears in printed output

in README.links should be considered obsolete, and removed even, and
doc authors should feel free to put link text in  without
hesitation?

Regards,
-Chap



Re: Speeding up text_position_next with multibyte encodings

2019-01-25 Thread Heikki Linnakangas

On 15/01/2019 02:52, John Naylor wrote:

The majority of cases are measurably faster, and the best case is at
least 20x faster. On the whole I'd say this patch is a performance win
even without further optimization. I'm marking it ready for committer.


I read through the patch one more time, tweaked the comments a little 
bit, and committed. Thanks for the review!


I did a little profiling of the worst case, where this is slower than 
the old approach. There's a lot of function call overhead coming from 
walking the string with pg_mblen(). That could be improved. If we 
inlined pg_mblen() into the loop, it would become much faster, and I think 
this code would be faster even in the worst case. (Except for the very worst 
cases, where the hash table with the new code happens to have a collision at 
a different point than before, but that doesn't seem like a fair 
comparison.)


I think this is good enough as it is, but if I have the time, I'm going 
to try optimizing the pg_mblen() loop, as well as similar loops e.g. in 
pg_mbstrlen(). Or if someone else wants to give that a go, feel free.


- Heikki



Re: [HACKERS] logical decoding of two-phase transactions

2019-01-25 Thread Petr Jelinek
On 14/01/2019 23:16, Arseny Sher wrote:
> 
> Nikhil Sontakke  writes:
> 
>> I'd like to believe that the latest patch set tries to address some
>> (if not all) of your concerns. Can you please take a look and let me
>> know?
> 
> Hi, sure.
> 
> General things:
> 
> - Earlier I said that there is no point of sending COMMIT PREPARED if
>   decoding snapshot became consistent after PREPARE, i.e. PREPARE hadn't
>   been sent. I realized since then that such use cases actually exist:
>   prepare might be copied to the replica by e.g. basebackup or something
>   else earlier. 

Basebackup does not copy slots though, and the slot should not reach
consistency until all prepared transactions are committed, no?

> 
> - BTW, ReorderBufferProcessXid at PREPARE should be always called
>   anyway, because otherwise if xact is empty, we will not prepare it
>   (and call cb), even if the output plugin asked us not to filter it
>   out. However, we will call commit_prepared cb, which is inconsistent.
> 
> - I find it weird that in DecodePrepare and in DecodeCommit you always
>   ask the plugin whether to filter an xact, given that sometimes you
>   know beforehand that you are not going to replay it: it might have
>   already been replayed, might have wrong dbid, origin, etc. One
>   consequence of this: imagine that notorious xact with PREPARE before
>   point where snapshot became consistent and COMMIT PREPARED after that
>   point. Even if filter_cb says 'I want 2PC on this xact', with current
>   code it won't be replayed on PREPARE and rbxid will be destroyed with
>   ReorderBufferForget. Now this xact is lost.

Yeah this is wrong.

> 
> Second patch:
> 
> + /* filter_prepare is optional, but requires two-phase decoding */
> + if ((ctx->callbacks.filter_prepare_cb != NULL) && (!opt->enable_twophase))
> + ereport(ERROR,
> + (errmsg("Output plugin does not support two-phase decoding, but "
> + "registered filter_prepared callback.")));
> 
> I actually think that enable_twophase output plugin option is
> redundant. If plugin author wants 2PC, he just provides
> filter_prepare_cb callback and potentially others.

+1

> I also don't see much
> value in checking that exactly 0 or 3 callbacks were registered.
> 

I think that check makes sense; if you support 2PC you need to register
all callbacks.

> 
> Nitpicking:
> 
> First patch: I still don't think that these flags need a bitmask.

Since we are discussing this, I personally prefer the bitmask here.

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



pg_basebackup, walreceiver and wal_sender_timeout

2019-01-25 Thread Nick B
Hi, hackers.

When running pg_basebackup with -X s with a network file system as the target,
we would consistently get "could not receive data from WAL stream: server
closed the connection unexpectedly".

On the server we see this error firing: "terminating walsender process due to
replication timeout".
The problem occurs when the network or file system is acting very slowly. One
example of such a case looks like this (strace output for fsync calls):

0.033383 fsync(8)  = 0 <20.265275>
20.265399 fsync(8) = 0 <0.11>
0.022892 fsync(7)  = 0 <48.350566>
48.350654 fsync(7) = 0 <0.05>
0.000674 fsync(8)  = 0 <0.851536>
0.851619 fsync(8)  = 0 <0.07>
0.67 fsync(7)  = 0 <0.06>
0.45 fsync(7)  = 0 <0.05>
0.031733 fsync(8)  = 0 <0.826957>
0.827869 fsync(8)  = 0 <0.16>
0.005344 fsync(7)  = 0 <1.437103>
1.446450 fsync(6)  = 0 <0.063148>
0.063246 fsync(6)  = 0 <0.06>
0.000381 +++ exited with 1 +++

So the longest fsync call took 48 seconds, but how would that result in a
termination if wal_sender_timeout is (default) 60 seconds?

The problem is in the way wal_sender handles this timeout:

/*
* If half of wal_sender_timeout has lapsed without receiving any reply
* from the standby, send a keep-alive message to the standby requesting
* an immediate reply.
*/

Obviously the receiver cannot respond immediately while in a syscall.

This begs the question: why is the GUC handled the way it is? What would be
the correct way to solve this? Shall we change the behaviour, or do a better
job of explaining the implications of wal_sender_timeout in the docs?

Regards,
Nick.


Re: [HACKERS] logical decoding of two-phase transactions

2019-01-25 Thread Petr Jelinek
Hi,

I think the difference between abort and abort prepared should be
explained better (I am not quite sure I get it myself).

> +  The required abort_cb callback is called whenever

Also, why is this one required when all the 2pc stuff is optional?

> +static void
> +DecodePrepare(LogicalDecodingContext *ctx, XLogRecordBuffer *buf,
> +   xl_xact_parsed_prepare * parsed)
> +{
> + XLogRecPtr  origin_lsn = parsed->origin_lsn;
> + TimestampTz commit_time = parsed->origin_timestamp;
> + XLogRecPtr  origin_id = XLogRecGetOrigin(buf->record);
> + TransactionId xid = parsed->twophase_xid;
> + boolskip;
> +
> + Assert(parsed->dbId != InvalidOid);
> + Assert(TransactionIdIsValid(parsed->twophase_xid));
> +
> + /* Whether or not this PREPARE needs to be skipped. */
> + skip = DecodeEndOfTxn(ctx, buf, parsed, xid);
> +
> + FinalizeTxnDecoding(ctx, buf, parsed, xid, skip);

Given that DecodeEndOfTxn calls SnapBuildCommitTxn, won't this make the
catalog changes done by the prepared transaction visible to other
transactions (which is undesirable, as they should only be visible after
it's committed)?

> + if (unlikely(TransactionIdIsValid(CheckXidAlive) &&
> + !(IsCatalogRelation(scan->rs_rd) ||
> +   RelationIsUsedAsCatalogTable(scan->rs_rd
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_TRANSACTION_STATE),
> +  errmsg("improper heap_getnext call")));
> +
I think we should log the relation OID as well so that plugin developers
have an easier time debugging this (for all variants of this).


-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: House style for DocBook documentation?

2019-01-25 Thread Peter Eisentraut
On 19/01/2019 21:24, Chapman Flack wrote:
> (thinks to self half-seriously about an XSL transform for generating
> printed output that could preserve link-texted links, add raised numbers,
> and produce a numbered URLs section at the back)

External links already create footnotes in the PDF output.  Is that
different from what you are saying?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-25 Thread Jesper Pedersen

Hi Kirk,

On 1/24/19 9:31 PM, Jamison, Kirk wrote:

According to the CF app, this patch needs review, so I took a look at it.
Currently, your patch applies and builds cleanly, and all tests are also
successful based on the CF bot patch tester.

I'm not sure if I have understood the argument raised by Peter correctly.
Quoting Peter, "it's not clear that pg_upgrade and vacuumdb are bound the same way, 
so it's not a given that the same -j number should be used."
I think it's whether the # jobs for pg_upgrade is used the same way for 
parallel vacuum.

According to the official docs, the recommended setting for pg_upgrade's -j 
option is the maximum of the number of CPU cores and tablespaces. [1]
As for vacuumdb, the recommended setting for parallel vacuum (-j) is based on 
your max_connections [2], which is the max # of concurrent connections to your 
db server.



Thanks for your feedback !

As per Peter's comments I have changed the patch (v2) to not pass down 
the -j option to vacuumdb.


Only the documentation and console output have been updated, in order 
to make this clearer.


Best regards,
 Jesper



Re: Alternative to \copy in psql modelled after \g

2019-01-25 Thread Daniel Verite
Fabien COELHO wrote:

> Sure. As there are several bugs (doubtful features) uncovered, a first 
> patch could fix "COPY ... TO STDOUT \g file", but probably replicate the 
> current ERROR behavior, however debatable it is (i.e. your patch without the 
> ERROR change, which is unrelated to the bug being fixed), and then another 
> patch should fix/modify the behavior around ERROR (everywhere and 
> consistently), and probably IMHO add an SQL_ERROR.

It's not as if the patch issued an explicit call SetVariable("ERROR", "true")
that could be commented out, or something like that. The assignment
of the variable happens as a consequence of patched code that aims at
being correct in its error handling.
So I'm for leaving this decision to a maintainer, because I don't agree
with your conclusion that the current patch should be changed in
that regard.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: Almost bug in COPY FROM processing of GB18030 encoded input

2019-01-25 Thread Heikki Linnakangas

On 24/01/2019 23:27, Robert Haas wrote:

On Wed, Jan 23, 2019 at 6:23 AM Heikki Linnakangas  wrote:

I happened to notice that when CopyReadLineText() calls mblen(), it
passes only the first byte of the multi-byte characters. However,
pg_gb18030_mblen() looks at the first and the second byte.
CopyReadLineText() always passes \0 as the second byte, so
pg_gb18030_mblen() will incorrectly report the length of 4-byte encoded
characters as 2.

It works out fine, though, because the second half of the 4-byte encoded
character always looks like another 2-byte encoded character, in
GB18030. CopyReadLineText() is looking for delimiter and escape
characters and newlines, and only single-byte characters are supported
for those, so treating a 4-byte character as two 2-byte characters is
harmless.


Yikes.


Committed the comment changes, so it's less of a gotcha now.

- Heikki



Re: backslash-dot quoting in COPY CSV

2019-01-25 Thread Daniel Verite
Bruce Momjian wrote:

> but I am able to see the failure using STDIN:
> 
> COPY test FROM STDIN CSV;
> Enter data to be copied followed by a newline.
> End with a backslash and a period on a line by itself, or an EOF
> signal.
> "foo
> \.
> ERROR:  unterminated CSV quoted field
> CONTEXT:  COPY test, line 1: ""foo
> 
> This seems like a bug to me.  Looking at the code, psql issues the
> prompts for STDIN, but when it sees \. alone on a line, it has no idea
> you are in a quoted CSV string, so it thinks the copy is done and sends
> the result to the server.  I can't see an easy way to fix this.  I guess
> we could document it.

Thanks for looking into this.  

\copy from a file with csv is also affected since it uses COPY FROM
STDIN behind the scenes. The case of embedded data looks more worrying
because psql will execute the data following \. as if it were
SQL statements.
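
To spell that scenario out with a made-up script (table names and data are
hypothetical): the quoted CSV value legitimately contains a line consisting
of \., psql ends the COPY right there, the server reports "unterminated CSV
quoted field", and the following data line is then executed as an SQL
command, unless ON_ERROR_STOP is on:

CREATE TABLE test (t text);
COPY test FROM STDIN CSV;
"first line of the value
\.
DROP TABLE some_table; --"
\.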

ISTM that only ON_ERROR_STOP=on prevents the risk of SQL injection
in that scenario, but it's off by default.

What about this idea: when psql is feeding COPY data from its command
stream and an error occurs, it should act as if ON_ERROR_STOP was "on"
even if it's not.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-25 Thread Peter Eisentraut
On 25/01/2019 11:28, Michael Paquier wrote:
> based on that linking the value used by pg_upgrade and vacuumdb is a
> bad concept in my opinion, and the patch should be rejected.  More
> documentation on pg_upgrade side to explain that a bit better could be
> a good idea though, as it is perfectly possible to use your own
> post-upgrade script or rewrite partially the generated one.

Right.  pg_upgrade doesn't actually call vacuumdb.  It creates a script
that you may use.  The script itself contains a comment that says, if
you want to do this as fast as possible, don't use this script.  That
comment could be enhanced to suggest the use of the -j option.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2019-01-25 Thread Peter Eisentraut
On 25/01/2019 09:52, Takashi Menjo wrote:
> Heikki Linnakangas wrote:
>> To re-iterate what I said earlier in this thread, I think the next step 
>> here is to write a patch that modifies xlog.c to use plain old 
>> mmap()/msync() to memory-map the WAL files, to replace the WAL buffers.
> Sorry, but my new patchset still uses PMDK, because PMDK is supported on
> Linux _and Windows_, and I think someone may want to test this patchset on
> Windows...

When you manage the WAL (or perhaps in the future relation files)
through PMDK, is there still a file system view of it somewhere, for
browsing, debugging, and for monitoring tools?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



postgres_fdw: estimate_path_cost_size fails to re-use cached costs

2019-01-25 Thread Etsuro Fujita
Hi,

I noticed yet another thing while updating the patch for pushing down
ORDER BY LIMIT.  Let me explain.  When costing foreign paths on the
basis of local statistics, we calculate/cache the costs of an unsorted
foreign path, and re-use them to estimate the costs of presorted foreign
paths, as shown below.  BUT: we fail to re-use them for some typical
queries, such as "select * from ft1 order by a", because
fpinfo->rel_startup_cost is 0 in that case, so the same cost calculation
is done repeatedly.

        /*
         * We will come here again and again with different set of pathkeys
         * that caller wants to cost. We don't need to calculate the cost of
         * bare scan each time. Instead, use the costs if we have cached them
         * already.
         */
        if (fpinfo->rel_startup_cost > 0 && fpinfo->rel_total_cost > 0)
        {
            startup_cost = fpinfo->rel_startup_cost;
            run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
        }

I think we should use "fpinfo->rel_startup_cost >= 0" here, not
"fpinfo->rel_startup_cost > 0".  Also, it would be possible that the
total cost calculated is zero in corner cases (eg, seq_page_cost=0 and
cpu_tuple_cost=0 for the example), so I think we should change the total
cost part as well.  Attached is a patch for that.

Best regards,
Etsuro Fujita
*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***
*** 2625,2631  estimate_path_cost_size(PlannerInfo *root,
  		 * bare scan each time. Instead, use the costs if we have cached them
  		 * already.
  		 */
! 		if (fpinfo->rel_startup_cost > 0 && fpinfo->rel_total_cost > 0)
  		{
  			startup_cost = fpinfo->rel_startup_cost;
  			run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;
--- 2625,2631 
  		 * bare scan each time. Instead, use the costs if we have cached them
  		 * already.
  		 */
! 		if (fpinfo->rel_startup_cost >= 0 && fpinfo->rel_total_cost >= 0)
  		{
  			startup_cost = fpinfo->rel_startup_cost;
  			run_cost = fpinfo->rel_total_cost - fpinfo->rel_startup_cost;


Re: Tab completion for ALTER INDEX|TABLE ALTER COLUMN SET STATISTICS

2019-01-25 Thread Peter Eisentraut
On 26/12/2018 07:07, Tatsuro Yamada wrote:
> +#define Query_for_list_of_attribute_numbers \
> +"SELECT attnum "\
> +"  FROM pg_catalog.pg_attribute a, "\
> +"   pg_catalog.pg_class c "\
> +" WHERE c.oid = a.attrelid "\
> +"   AND a.attnum > 0 "\
> +"   AND NOT a.attisdropped "\
> +"   /* %d %s */" \
> +"   AND a.attrelid = (select oid from pg_catalog.pg_class where relname = 
> '%s') "\
> +"   AND pg_catalog.pg_table_is_visible(c.oid) "\
> +"order by a.attnum asc "

This needs a bit of refinement.  You need to handle quoted index names
(see nearby Query_for_list_of_attributes), and you should also complete
partial numbers (e.g., if I type 1 then complete 10, 11, ... if
appropriate).
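
Roughly, a refined query might look something like this (an untested
sketch; the %d/%s placeholders are filled in by the completion machinery as
in the existing queries, and the exact quoting idiom should be copied from
Query_for_list_of_attributes):

SELECT a.attnum::pg_catalog.text
  FROM pg_catalog.pg_attribute a,
       pg_catalog.pg_class c
 WHERE c.oid = a.attrelid
   AND a.attnum > 0
   AND NOT a.attisdropped
   AND substring(a.attnum::pg_catalog.text, 1, %d) = '%s'   -- complete partial numbers
   AND (pg_catalog.quote_ident(c.relname) = '%s'
        OR '"' || c.relname || '"' = '%s')                  -- handle quoted relation names
   AND pg_catalog.pg_table_is_visible(c.oid)
 ORDER BY a.attnum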

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Implicit timezone conversion replicating from timestamp to timestamptz?

2019-01-25 Thread Peter Eisentraut
On 24/01/2019 21:57, Jeremy Finzel wrote:
> The source system has a timestamp stored "at time zone UTC", like this
> for 6:00pm Chicago time:
> 2019-01-24 20:00:00.00
> 
> I was *very surprised* to find that replicating the above timestamp to
> timestamptz actually does so correctly, showing this value in my psql
> client on the subscriber:
> 2019-01-24 14:00:00.00-06
> 
> How does it know/why does it assume that the time zone of the
> timestamp data type is UTC on the provider, given that my clusters are set
> to America/Chicago?

This only works by accident in pglogical because the binary
representations of both types are compatible in this sense.  You're not
really supposed to do that.

> I would have actually expected an incorrect conversion
> of the data unless I set the timezone to UTC on the way in on the
> subscriber via a trigger.
> 
> That is, I was expecting to see this:
> 2019-01-24 20:00:00.00-06

This is what you get in built-in logical replication, because it goes
via textual representation.
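
You can see the mechanism with a plain cast, using the value from your
example; whichever time zone the applying session has is the one used to
interpret the text:

SET timezone = 'America/Chicago';
SELECT '2019-01-24 20:00:00'::timestamptz;   -- 2019-01-24 20:00:00-06
SET timezone = 'UTC';
SELECT '2019-01-24 20:00:00'::timestamptz;   -- 2019-01-24 20:00:00+00, i.e. 14:00-06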

To make it do what you actually wanted, set the time zone for the
subscription worker to UTC.  The way to do that (could be easier) is to
create a separate user, use ALTER USER SET timezone = 'UTC', and create
the subscription as that user.
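
Concretely, on the subscriber that would be something like the following
(the role, subscription, and connection string are made up, and the
subscription owner needs whatever privileges CREATE SUBSCRIPTION requires):

CREATE ROLE sub_owner LOGIN SUPERUSER;
ALTER ROLE sub_owner SET timezone = 'UTC';

-- connect as sub_owner, then:
CREATE SUBSCRIPTION mysub
    CONNECTION 'host=provider dbname=src user=repuser'
    PUBLICATION mypub;

-- the apply worker runs with sub_owner's settings, so incoming timestamp
-- text is interpreted as UTC when converted to timestamptz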

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: using expression syntax for partition bounds

2019-01-25 Thread Peter Eisentraut
On 24/01/2019 13:57, Amit Langote wrote:
> The if (contain_var_clause(value)) block is new code, but I agree its
> ereport should have parser_errposition just like other ereports in that
> function.  Fixed that in the attached.

committed

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pg_upgrade: Pass -j down to vacuumdb

2019-01-25 Thread Michael Paquier
On Fri, Jan 25, 2019 at 02:31:57AM +, Jamison, Kirk wrote:
> I'm not sure if I have understood the argument raised by Peter
> correctly.  Quoting Peter, "it's not clear that pg_upgrade and
> vacuumdb are bound the same way, so it's not a given that the same
> -j number should be used."  I think it's whether the # jobs for
> pg_upgrade is used the same way for parallel vacuum.

vacuumdb and pg_upgrade are designed for different purposes and have
different properties, hence using a value of -j for one does not
necessarily mean that the same value should be used for the other.
An ANALYZE is nice because it is non-intrusive for a live production
system, and if you begin to pass down a -j argument you may end up
eating more resources than would have been necessary for the same
job.  For some users perhaps that matters, for some it does not.  So,
based on that, linking the value used by pg_upgrade and vacuumdb is a
bad concept in my opinion, and the patch should be rejected.  More
documentation on pg_upgrade side to explain that a bit better could be
a good idea though, as it is perfectly possible to use your own
post-upgrade script or rewrite partially the generated one.
--
Michael


signature.asc
Description: PGP signature


Re: Simplify set of flags used by MyXactFlags

2019-01-25 Thread Michael Paquier
On Thu, Jan 24, 2019 at 08:56:05AM -0300, Alvaro Herrera wrote:
> Uh, I didn't think it was necessary nor desirable to rename the flag,
> only the user-visible message.

Oh, OK.  I have overstated your comment then.  Are you fine with the
attached instead?  The flag name remains the same, and the comment is
updated.
--
Michael
From 2b9bb74106800b9c9b3fc95fe4f141858abbde02 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Fri, 25 Jan 2019 19:18:39 +0900
Subject: [PATCH] Simplify 2PC restriction handling for temporary objects

There are two flags used to control access to temporary tables and
access to the temporary namespace of a session, however the first
control flag is actually a concept included in the second.  This removes
the flag for temporary table tracking, keeping around only the one at
namespace level.
---
 src/backend/access/common/relation.c  |  4 ++--
 src/backend/access/transam/xact.c | 21 ++
 src/backend/commands/lockcmds.c   |  2 +-
 src/backend/commands/tablecmds.c  |  2 +-
 src/include/access/xact.h | 12 +++---
 .../expected/test_extensions.out  |  2 +-
 src/test/regress/expected/temp.out| 22 +--
 7 files changed, 26 insertions(+), 39 deletions(-)

diff --git a/src/backend/access/common/relation.c b/src/backend/access/common/relation.c
index beec34f126..41a0051f88 100644
--- a/src/backend/access/common/relation.c
+++ b/src/backend/access/common/relation.c
@@ -71,7 +71,7 @@ relation_open(Oid relationId, LOCKMODE lockmode)
 
 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
-		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
 
 	pgstat_initstats(r);
 
@@ -121,7 +121,7 @@ try_relation_open(Oid relationId, LOCKMODE lockmode)
 
 	/* Make note that we've accessed a temporary relation */
 	if (RelationUsesLocalBuffers(r))
-		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
 
 	pgstat_initstats(r);
 
diff --git a/src/backend/access/transam/xact.c b/src/backend/access/transam/xact.c
index 7c3a9c1e89..38434a199e 100644
--- a/src/backend/access/transam/xact.c
+++ b/src/backend/access/transam/xact.c
@@ -2259,13 +2259,18 @@ PrepareTransaction(void)
 	/* NOTIFY will be handled below */
 
 	/*
-	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary table in
+	 * Don't allow PREPARE TRANSACTION if we've accessed a temporary object in
 	 * this transaction.  Having the prepared xact hold locks on another
 	 * backend's temp table seems a bad idea --- for instance it would prevent
 	 * the backend from exiting.  There are other problems too, such as how to
 	 * clean up the source backend's local buffers and ON COMMIT state if the
 	 * prepared xact includes a DROP of a temp table.
 	 *
+	 * Other object types, like functions, operators or extensions, share the
+	 * same restriction as they should not be created, locked or dropped as
+	 * this can mess up with this session or even a follow-up session trying
+	 * to use the same temporary namespace.
+	 *
 	 * We must check this after executing any ON COMMIT actions, because they
 	 * might still access a temp relation.
 	 *
@@ -2273,22 +2278,10 @@ PrepareTransaction(void)
 	 * cases, such as a temp table created and dropped all within the
 	 * transaction.  That seems to require much more bookkeeping though.
 	 */
-	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPREL))
-		ereport(ERROR,
-                (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                 errmsg("cannot PREPARE a transaction that has operated on temporary tables")));
-
-	/*
-	 * Similarly, PREPARE TRANSACTION is not allowed if the temporary
-	 * namespace has been involved in this transaction as we cannot allow it
-	 * to create, lock, or even drop objects within the temporary namespace
-	 * as this can mess up with this session or even a follow-up session
-	 * trying to use the same temporary namespace.
-	 */
 	if ((MyXactFlags & XACT_FLAGS_ACCESSEDTEMPNAMESPACE))
 		ereport(ERROR,
                 (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
-                 errmsg("cannot PREPARE a transaction that has operated on temporary namespace")));
+                 errmsg("cannot PREPARE a transaction that has operated on temporary objects")));
 
 	/*
 	 * Likewise, don't allow PREPARE after pg_export_snapshot.  This could be
diff --git a/src/backend/commands/lockcmds.c b/src/backend/commands/lockcmds.c
index f4da564e01..43bba717f2 100644
--- a/src/backend/commands/lockcmds.c
+++ b/src/backend/commands/lockcmds.c
@@ -108,7 +108,7 @@ RangeVarCallbackForLockTable(const RangeVar *rv, Oid relid, Oid oldrelid,
 	 */
 	relpersistence = get_rel_persistence(relid);
 	if (relpersistence == RELPERSISTENCE_TEMP)
-		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPREL;
+		MyXactFlags |= XACT_FLAGS_ACCESSEDTEMPNAMESPACE;
 
 	/* Check permissions. */
 	aclresult = LockTableAclCheck(relid, lockmode, 

RE: [HACKERS][PATCH] Applying PMDK to WAL operations for persistent memory

2019-01-25 Thread Takashi Menjo
Hello,


On behalf of Yoshimi, I rebased the patchset onto the latest master
(e3565fd6).  Please see the attachment.  It also includes an additional bug
fix (in patch 0002) about a temporary filename.

Note that PMDK 1.4.2+ supports MAP_SYNC and MAP_SHARED_VALIDATE flags, 
so please use a new version of PMDK when you test. The latest version is
1.5.


Heikki Linnakangas wrote:
> To re-iterate what I said earlier in this thread, I think the next step 
> here is to write a patch that modifies xlog.c to use plain old 
> mmap()/msync() to memory-map the WAL files, to replace the WAL buffers.

Sorry, but my new patchset still uses PMDK, because PMDK is supported on
Linux _and Windows_, and I think someone may want to test this patchset on
Windows...


Regards,
Takashi

-- 
Takashi Menjo - NTT Software Innovation Center




0001-Add-configure-option-for-PMDK-v2.patch
Description: Binary data


0002-Read-write-WAL-files-using-PMDK-v2.patch
Description: Binary data


0003-Walreceiver-WAL-IO-using-PMDK-v2.patch
Description: Binary data


Re: proposal - plpgsql unique statement id

2019-01-25 Thread Pavel Stehule
Thu 24 Jan 2019 at 23:08, Tom Lane  wrote:

> Peter Eisentraut  writes:
> > committed
>
> Why didn't this patch modify the dumping logic in pl_funcs.c to print
> the IDs?  I'm not aware of other cases where we intentionally omit
> fields from debug-support printouts.
>

Currently we don't print lineno, which is maybe more important information
for the user.

I looked at the code, and now I think it is a little bit harder
than I expected.  Any new information can break the output formatting:

static void
dump_loop(PLpgSQL_stmt_loop *stmt)
{
dump_ind();
printf("LOOP\n");

dump_stmts(stmt->body);

dump_ind();
printf("ENDLOOP\n");
}

can looks like

static void
dump_loop(PLpgSQL_stmt_loop *stmt, int stmtid_width)
{
dump_ind();
printf("%*d LOOP\n", stmtid_width, stmt->stmtid);

dump_stmts(stmt->body);

dump_ind();
printf(" ENDLOOP\n");
}

Is it something like what you expect?

Regards

Or maybe simpler:


static void
dump_loop(PLpgSQL_stmt_loop *stmt, int stmtid_width)
{
dump_ind();
printf("LOOP {%d}\n",stmt->stmtid);

dump_stmts(stmt->body);

dump_ind();
printf(" ENDLOOP\n");
}

Pavel



> regards, tom lane
>


RE: Protect syscache from bloating with negative cache entries

2019-01-25 Thread Tsunakawa, Takayuki
Hi Horiguchi-san, Bruce,

From: Bruce Momjian [mailto:br...@momjian.us]
> I suggest you go with just syscache_prune_min_age, get that into PG 12,
> and we can then reevaluate what we need.  If you want to hard-code a
> minimum cache size where no pruning will happen, maybe based on the system
> catalogs or typical load, that is fine.

Please forgive me if I say something silly (I might have got lost.)

Are you suggesting to make the cache size limit system-defined and 
uncontrollable by the user?  I think it's necessary for the DBA to be able to 
control the cache memory amount.  Otherwise, if many concurrent connections 
access many partitions within a not-so-long duration, then the cache eviction 
can't catch up and ends up in OOM.  How about the following questions I asked 
in my previous mail?

--
This is a pure question.  How can we answer these questions from users?

* What value can I set to cache_memory_target when I can use 10 GB for the 
caches and max_connections = 100?
* How much RAM do I need to have for the caches when I set cache_memory_target 
= 1M?

The user tends to estimate memory to avoid OOM.
--


Regards
Takayuki Tsunakawa