Re: [HACKERS] Block level parallel vacuum

2018-11-24 Thread Amit Kapila
On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila  wrote:
> On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada  wrote:
> >
>
> I could see that you have put a lot of effort into this patch and still
> we are not able to make much progress, mainly, I guess, because of the
> relation extension lock problem.  I think we can park that problem for
> some time (as we have already invested quite some time on it), discuss
> a bit about the actual parallel vacuum patch and then come back to it.
>

Today, I was reading this and the previous related thread [1], and it
seems to me that multiple people (Andres [2], Simon [3]) have pointed
out that parallelizing the index portion is more valuable.  Also, some
of the results [4] indicate the same.  Now, when there are no indexes,
parallelizing heap scans also has a benefit, but I think in practice we
will see more cases where the user wants to vacuum tables with
indexes.  So how about if we break this problem down in the following
way, where each piece gives a benefit of its own:
(a) Parallelize index scans wherein the workers will be launched only
to vacuum indexes.  Only one worker per index will be spawned.
(b) Parallelize per-index vacuum.  Each index can be vacuumed by
multiple workers.
(c) Parallelize heap scans where multiple workers will scan the heap,
collect dead TIDs and then launch multiple workers for indexes.

I think if we break this problem into multiple patches, it will reduce
the scope of each patch and help us make progress.  Now, it's been
more than 2 years that we have been trying to solve this problem, but
we still haven't made much progress.  I understand there are various
genuine reasons, and all of that work will help us in solving all the
problems in this area.  How about if we first target problem (a), and
once we are done with that we can see which of (b) or (c) we want to
do first?


[1] - 
https://www.postgresql.org/message-id/CAD21AoD1xAqp4zK-Vi1cuY3feq2oO8HcpJiz32UDUfe0BE31Xw%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/20160823164836.naody2ht6cutioiz%40alap3.anarazel.de
[3] - 
https://www.postgresql.org/message-id/CANP8%2BjKWOw6AAorFOjdynxUKqs6XRReOcNy-VXRFFU_4bBT8ww%40mail.gmail.com
[4] - 
https://www.postgresql.org/message-id/CAGTBQpbU3R_VgyWk6jaD%3D6v-Wwrm8%2B6CbrzQxQocH0fmedWRkw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com



Re: POC for a function trust mechanism

2018-11-24 Thread Noah Misch
On Sun, Aug 12, 2018 at 10:40:30PM -0400, Robert Haas wrote:
> On Wed, Aug 8, 2018 at 1:15 PM, Tom Lane  wrote:
> > If we had, say, a catalog that provided the desired list of trusted roles
> > for every role, then we could imagine implementing that context change
> > automatically.  Likewise, stuff like autovacuum or REINDEX would want
> > to run with the table owner's list of trusted roles, but the GUC approach
> > doesn't really provide enough infrastructure to know what to do there.
> 
> Yeah, I think these are all good points.  It seems natural for trust
> to be a property of a role, for just the reasons that you mention.
> However, there does also seem to be a use case for varying it
> temporarily on a per-session or per-function basis, and I'm not
> exactly sure how to cater to those needs.

Yep.  A GUC is great for src/bin sessions, since many of those applications
consistently want tight trust settings.

> I wonder if Noah would like to rebase and post his version also.

Sure, attached (last merge at 757c518).  The owner_trustcheck() header comment
is a good starting point for reading it.  Tom Lane later asserted that it's
okay to perform the function trust check in fmgr_info_cxt_security() instead
of doing it in most fmgr_info() callers and some *FunctionCall* callers.
While I suspected he was right, I have not made that change in this rebase.
(I also haven't audited every fmgr_info() call added after -v1.)  When I
shared -v1 with the security team, I included the following submission notes:

===

Here's an implementation.  Design decisions:

1. A role trusts itself implicitly

2. Default pg_auth_trust entry for "public TRUST public"

This effectively disables the new restrictions; concerned administrators will
test their applications with it removed, then remove it.

3. Default pg_auth_trust entry for "public TRUST "

Almost everyone will leave this untouched.

4. Changing trust for a role requires ADMIN OPTION on the role.

Trust is the next step down from granting role membership.  ("ALTER ROLE
public TRUST ..." does require superuser.)

5. Non-inheritance of trust

Trust is like a reverse permission; I find it helpful to think of "ALTER ROLE
foo TRUST bar" as "GRANT may_supply_functions_to_foo TO bar".  It would be
reasonable to have that trust relationship be effective for INHERITS members
of bar; after all, they could always "ALTER FUNCTION ... OWNER TO bar".  For
now, trust relationships are not subject to inheritance.  This keeps things
simpler to understand and poses no major downsides.  For similar reasons, I
have not made a role trust its own members implicitly.

6. Non-transitivity of trust

If I trust only alice, alice trusts only bob, and I call alice's function that
calls bob's function, should the call succeed?  This is a tough one.  In favor
of "yes", this choice would allow alice to refactor her function without
needing her callers to revise their trust settings.  That is, she could switch
from bob's function to carol's function, and her callers would never notice.
My trust in alice is probably misplaced if she chooses her friends badly.

In favor of "no", making the trust relationship transitive seems to mix two
otherwise-separate concepts: alice's willingness to run code as herself and
alice's willingness to certify third-party functions to her friends.  In
particular, current defects in the system are at odds with conflating those
concepts.  Everyone should trust the bootstrap superuser, but it currently
owns functions like integer_pl_date that are not careful in what they call.
That closed the issue for me; trust is not transitive by default.  Even so, I
think there would be value in a facility for requesting trust transitivity in
specific situations.

7. (Lack of) negative trust entries

There's no way to opt out of a PUBLIC trust relationship; until a superuser
removes the "public TRUST public" entry, no user can achieve anything with
ALTER USER [NO] TRUST.  I considered adding the concept of a negative trust
relationship to remedy this problem; it could also be used to remove the
implicit self-trust for testing purposes.  I have refrained; I don't know
whether the benefits would pay for the extra user-facing complexity.

8. Applicable roles during trust checks

We had examined whether the check could be looser for SECURITY DEFINER
functions, since those can't perform arbitrary actions as the caller.  I
described some challenges then, but a deeper look turned up more:  a SECURITY
DEFINER function can do things like "SET search_path = ..." that affect the
caller.  Consequently, I concluded that all roles on the active call stack
should have a stake in whether a particular function owner is permitted.  Each
trust check walks the call stack and checks every role represented there; if
any lacks the trust relationships to approve the newly-contemplated function
call, the call is rejected.
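To make the rule concrete, here is a toy sketch (all names invented; this is
not the patch's API): every role on the call stack must trust the owner of
the function about to run, and any single objecting frame rejects the call.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef struct CallFrame
{
	int					role_id;	/* role this frame runs as */
	struct CallFrame   *caller;		/* next outer frame; NULL at the bottom */
} CallFrame;

/* stand-in for a pg_auth_trust lookup; the toy grants self-trust only */
static bool
role_trusts(int role_id, int owner_id)
{
	return role_id == owner_id;
}

static bool
stack_approves_owner(const CallFrame *top, int new_function_owner)
{
	for (const CallFrame *f = top; f != NULL; f = f->caller)
	{
		if (!role_trusts(f->role_id, new_function_owner))
			return false;		/* one objecting frame rejects the call */
	}
	return true;
}

int
main(void)
{
	CallFrame	outer = {42, NULL};
	CallFrame	inner = {7, &outer};	/* e.g. a SECURITY DEFINER switch to role 7 */

	/* rejected: role 42 further down the stack does not trust owner 7 */
	printf("approved: %d\n", stack_approves_owner(&inner, 7));
	return 0;
}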

The stack traversal may and does stop at the edge of a security-restricted
operation; 

Re: tab-completion debug print

2018-11-24 Thread David Fetter
On Fri, Nov 23, 2018 at 04:32:31PM -0500, Tom Lane wrote:
> Kyotaro HORIGUCHI  writes:
> > I was reminded that I was often annoyed with identifying the code
> > that made a word-completion, by hearing the same complaint from a
> > colleague of mine just now.
> > Something like the attached that tweaks completion_matches calls
> > lets psql emit the line number where a word-completion
> > happens. The output can be split out using redirection so that it
> > doesn't break into the conversation on console.
> 
> > (make -s COPT=-DTABCOMPLETION_DEBUG install)
> > $ psql postgres 2>~debug.out
> > =# alt[tab]er [tab]t[tab]ab[tab] [tab]
> 
> > You can see the following output in another bash session.
> > $ tail -f ~/debug.out
> > [1414][1435][1435][1435][1431]
> > Every number enclosed by brackets is the line number in
> > tab-complete.c, where completion happens.
> 
> > Is this useful? Any suggestions, thoughts?
> 
> Hm.  I can see the value of instrumenting tab-complete when you're trying
> to debug why it did something, but this output format seems pretty terse
> and unreadable.  Can we get it to print the completion text as well?
> I'm imagining something more like
> 
> 1414: "er "
> 1435: ""
> 1435: "ab"
> 1435: ""
> 1431: ""
> 
> Perhaps there's room as well to print the context that the match looked
> at:
> 
> 1414: "alt" -> "er "
> 1435: "alter " -> ""
> 1435: "alter t" -> "ab"
> 
> etc.
> 
>   regards, tom lane

Is this something along the lines of what you had in mind?

Best,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From 712244a029c0f30fe1195a1400aee85541aa4b6b Mon Sep 17 00:00:00 2001
In-Reply-To: <1227.1543008...@sss.pgh.pa.us>
References: <1227.1543008...@sss.pgh.pa.us>
From: David Fetter 
Date: Fri, 23 Nov 2018 14:55:17 -0800
Subject: [PATCH] v0002 Surface tab completions for debugging

To access:
make -s COPT=-DTABCOMPLETION_DEBUG
---
 src/bin/psql/tab-complete.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 9dbd555166..8b81c90d19 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -60,6 +60,32 @@ extern char *filename_completion_function();
 #define completion_matches rl_completion_matches
 #endif
 
+/*
+ * By enabling the following definition the source line number is emitted to
+ * stderr for every completion attempt. You can isolate them from console
+ * interaction by redirecting stderr into a file.
+ */
+#ifdef TABCOMPLETION_DEBUG
+#ifdef HAVE_RL_COMPLETION_MATCHES
+#define org_completion_matches rl_completion_matches
+#else
+#define org_completion_matches completion_matches
+#endif
+
+static char **completion_debug(int line, const char *text, char **list)
+{
+	fprintf(stderr, "[%d: (%s", line, text);
+	for (int i = 0; list && list[i]; ++i)
+		fprintf(stderr, ", %s", list[i]);
+	fprintf(stderr, ")]\n");
+	return list;
+}
+
+#undef completion_matches
+#define completion_matches(text, func) \
+	completion_debug(__LINE__, (text), org_completion_matches((text),(func)))
+#endif
+
 /* word break characters */
 #define WORD_BREAKS		"\t\n@$><=;|&{() "
 
-- 
2.19.1



Re: pgsql: Add WL_EXIT_ON_PM_DEATH pseudo-event.

2018-11-24 Thread Thomas Munro
On Sun, Nov 25, 2018 at 3:38 AM Christoph Berg  wrote:
> Re: Thomas Munro 2018-11-23 
> > Add WL_EXIT_ON_PM_DEATH pseudo-event.
>
> I think this broke something:
>
> TRAP: FailedAssertion(»!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || 
> (wakeEvents & (1 << 4)))«, Datei: 
> »/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c«,
>  Zeile: 389)
> 2018-11-24 15:20:43.193 CET [17834] LOG:  Serverprozess (PID 18425) wurde von 
> Signal 6 beendet: Aborted
>
> I can trigger it just by opening an ssl connection, non-ssl tcp
> connections are fine.

Thanks.  I was initially surprised that this didn't come up in
check-world, but I see now that I need to go and add
PG_TEST_EXTRA="ssl ldap" to my testing routine (and cfbot's).
Reproduced here, and it's a case where we were not handling postmaster
death, which is exactly what this assertion was designed to find.  The
following is one way to fix the assertion failure, though I'm not sure
if it would be better to request WL_POSTMASTER_DEATH and generate a
FATAL error like secure_read() does:

--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -406,9 +406,9 @@ aloop:
 				 * StartupPacketTimeoutHandler() which directly exits.
 				 */
 				if (err == SSL_ERROR_WANT_READ)
-					waitfor = WL_SOCKET_READABLE;
+					waitfor = WL_SOCKET_READABLE | WL_EXIT_ON_PM_DEATH;
 				else
-					waitfor = WL_SOCKET_WRITEABLE;
+					waitfor = WL_SOCKET_WRITEABLE | WL_EXIT_ON_PM_DEATH;
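
For reference, the secure_read()-style alternative would look roughly like
this sketch (assuming WL_POSTMASTER_DEATH is added to "waitfor" and the
wait's return value is captured in a variable "rc"; untested):

	if (rc & WL_POSTMASTER_DEATH)
		ereport(FATAL,
				(errcode(ERRCODE_ADMIN_SHUTDOWN),
				 errmsg("terminating connection due to unexpected postmaster exit")));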

-- 
Thomas Munro
http://www.enterprisedb.com



Re: Centralize use of PG_INTXX_MIN/MAX for integer limits

2018-11-24 Thread Andrew Gierth
> "Michael" == Michael Paquier  writes:

 Michael> Hi all,

 Michael> A couple of years ago, 62e2a8dc has introduced in c.h a set of
 Michael> limits (to fix some portability issues from 83ff1618) to make
 Michael> the code more system-independent. Those are for example
 Michael> PG_INT32_MIN, etc. The core code now mixes the internal PG_
 Michael> limits with the system ones. Would we want to unify a bit the
 Michael> whole thing and replace all the SHRT_MIN/MAX, LONG_MIN/MAX and
 Michael> such with the internal limit definitions?

Of course not. And LONG_MIN/MAX is the obvious example of why not, since
that one does vary between platforms.

INT_MAX is for the max value of an "int". PG_INT32_MAX is for the max
value of an "int32". PG_INT64_MAX is for the max value of an "int64".
LONG_MAX is for the max value of a "long". Simple.
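
To illustrate the distinction with a standalone program (using the
<stdint.h> names here as stand-ins for the PG_* macros):

#include <stdio.h>
#include <limits.h>
#include <stdint.h>

int
main(void)
{
	/* INT_MAX tracks "int", INT32_MAX tracks the fixed-width type; they
	 * happen to coincide on today's platforms, but LONG_MAX legitimately
	 * differs between ILP32 and LP64 systems. */
	printf("INT_MAX   = %ld\n", (long) INT_MAX);
	printf("INT32_MAX = %ld\n", (long) INT32_MAX);
	printf("LONG_MAX  = %ld\n", LONG_MAX);

	/* Range-check against the limit of the variable's *declared* type: */
	long long	v = 3000000000LL;

	if (v > INT32_MAX || v < INT32_MIN)
		printf("%lld does not fit in an int32\n", v);
	return 0;
}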

However, as I said at the time of those patches, I did not at that stage
audit all the uses of INT_MIN/MAX to determine which should really have
been INT32_MIN/MAX. Currently, all sorts of things will likely break if
"int" and "int32" are not exactly the same size, but it might still be a
good idea to fix that at some point. (As a recent example,
contrib/intarray uses "int" almost universally, regardless of the fact
that it's meant to be dealing with SQL "integer" values, which should be
int32.)

-- 
Andrew (irc:RhodiumToad)



Re: Desirability of client-side expressions in psql?

2018-11-24 Thread Corey Huinker
>
> >>psql> \if :i >= 5
> >>
> > I think we're ok with that so long as none of the operators or values
> has a
> > \ in it.
> > What barriers do you see to re-using the pgbench grammar?
>
> The pgbench expression grammar mimics SQL expression grammar,
> on integers, floats, booleans & NULL.
>
> I'm unsure about some special cases in psql (`shell command`,
> 'text' "identifier"). They can be forbidden on a new command (\let),
> but what happens on "\if ..." which I am afraid allows them is unclear.
>
> --
> Fabien.
>

(raising this thread from hibernation now that I have the bandwidth)

It seems like the big barriers to just using pgbench syntax are:
  - the ability to indicate that the next thing to follow will be a pgbench
expression
  - a way to coax pgbench truth-y values into psql truthy values (t/f, y/n,
1/0)

For that, I see a few ways forward:

1. A suffix on the existing commands: an -exp suffix (or even just -x) on
\if, \elif, and \set to indicate that a pgbench expression would follow.
This would look something like
\ifx \elifx \setx
\if$ \elif$ \set$

2. A command-line-esque switch or other sigil to indicate that what follows
is a pgbench expression with psql vars to interpolate
Example:
\set foo -x 1 + 4
\set foo \expr 1 + 4
\if -x :limit > 10
\if \expr :limit > 10

3. A global toggle to indicate which mode should be used by \if, \elif, and
\set
Example:
 \pset expressions [on | off]

4. A combination of #2 and #3, with a corresponding switch/sigil to indicate
"do not evaluate pgbench-style".
This is particularly appealing to me because it would allow code
snippets from pgbench to be used without modification, while still allowing
the user to mix old and new styles in an existing script.

5. A special variant of `command` where variables are interpolated before
being sent to the OS, and allow that on \if, \elif
\set foo ``expr :y + :z``
\set foo $( expr :y + :z )
\if ``expr :limit > 10``
\if $( expr :limit > 10 )

This also has some appeal because it allows for a great amount of
flexibility, but obviously constrains us with OS-dependencies. The user
might have a hard time sending commands with ')' in them if we go the $( )
route

6. Option #5, but we add an additional executable (suggested name: pgexpr)
to the client libs, which encapsulates the pgbench expression library as a
way around OS-dependent code.

7. I believe someone suggested introducing the :{! pgbench-command} or :{{
pgbench-command }} var-mode
\set foo :{! :y + :z }
\set foo :{{ :y + :z }}
\if :{! :limit > 10 }
\if :{{ :limit > 10 }}

This has some appeal as well, though I prefer the {{...}}  syntax
because "!" looks like negation, and {{ resembles the [[ x + y ]] syntax in
bash

One nice thing is that most of these options are not mutually exclusive.

Thoughts?


Re: RHEL 8.0 build

2018-11-24 Thread Tom Lane
Jeremy Harris  writes:
>  Trying to set up a buildfarm animal on a RHEL 8.0 (beta) system,
> the build fails early:
> ...
> It appears to be a "configure" script looking for python; and there is
> no such.  You can have python3 or python2 - but neither package install
> provides a symlink of just "python".

Yeah, some distros are starting to act that way, and I suspect it's
causing pain for a lot of people.

Currently we are agnostic about which python version to use, so if you
don't have anything simply named "python", you have to tell configure
what to use by setting the PYTHON environment variable.

In a buildfarm configuration file this would look something like

# env settings to pass to configure. These settings will only be seen by
# configure.
config_env => {
+   PYTHON => "/usr/local/bin/python3",

There's been some preliminary discussion about starting to default to
python3, but given this project's inherent conservatism, I don't expect
that to happen for some years yet.  In any case, whenever we do pull
that trigger we'd surely do so only in HEAD not released branches, so
buildfarm owners will need to deal with the case for years more.

regards, tom lane



Re: Regarding performance regression on specific query

2018-11-24 Thread Tom Lane
Amit Langote  writes:
> On 2018/11/20 2:49, Jung, Jinho wrote:
>>  [ assorted queries ]

> I noticed that these two are fixed by running ANALYZE in the database in
> which these queries are run.

That didn't help much for me.  What did help was increasing
join_collapse_limit and from_collapse_limit to not limit the
join search space --- on queries with as many input relations
as these, you're really at the mercy of whether the given query
structure represents a good join order if you don't.

In general I can't get even a little bit excited about the quality of the
plans selected for these examples, as they all involve made-up restriction
and join clauses that the planner isn't going to have the slightest clue
about.  The observations boil down to "9.4 made one set of arbitrary plan
choices, while v10 made a different set of arbitrary plan choices, and on
these particular examples 9.4 got lucky and 10 didn't".

Possibly also worth noting is that running these in an empty database
is in itself kind of a worst case, because many of the tables are empty
to start with (or the restriction/join clauses pass no rows), and so
the fastest runtime tends to go to plans of the form "nestloop with
empty relation on the outside and all the expensive stuff on the
inside".  (Observe all the "(never executed)" notations in the EXPLAIN
output.)  This kind of plan wins only when the outer rel is actually
empty, otherwise it can easily lose big, and therefore PG's planner is
intentionally designed to discount the case entirely.  We never believe
that a relation is empty, unless we can mathematically prove that, and
our cost estimates are never made with an eye to exploiting such cases.
This contributes a lot to the random-chance nature of which plan is
actually fastest; the planner isn't expecting "(never executed)" to
happen and doesn't prefer plans that will win if it does happen.

regards, tom lane



RHEL 8.0 build

2018-11-24 Thread Jeremy Harris
Hi,

 Trying to set up a buildfarm animal on a RHEL 8.0 (beta) system,
the build fails early:

$ ./run_build.pl --nosend --nostatus --verbose
Sat Nov 24 20:09:28 2018: buildfarm run for CHANGEME:HEAD starting
CHANGEME:HEAD  [20:09:28] checking out source ...
CHANGEME:HEAD  [20:09:33] checking if build run needed ...
CHANGEME:HEAD  [20:09:34] copying source to pgsql.build ...
CHANGEME:HEAD  [20:09:35] running configure ...
Branch: HEAD
Stage Configure failed with status 1
$

It appears to be a "configure" script looking for python; and there is
no such.  You can have python3 or python2 - but neither package install
provides a symlink of just "python".


Obviously I could get going by manually adding one, but perhaps
other people would benefit from a neater fix.
-- 
Cheers,
  Jeremy



Re: jsonpath

2018-11-24 Thread Tomas Vondra
Hi,

I've done another round of reviews on v20, assuming the patch is almost
ready to commit, but unfortunately I ran into a bunch of issues that
need to be resolved. None of this is a huge issue, but it's also above
the threshold of what could be tweaked by a committer IMHO.

(Which brings the question who plans to commit this. The patch does not
have a committer in the CF app, but I see both Teodor and Alexander are
listed as it's authors, so I'd expect it to be one of those. Or I might
do that, of course.)


0001


1) to_timestamp() does this:

  do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt), false,
  , , , NULL);

Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the
function did before (well, it called text_to_cstring, but that does
VARDATA_ANY). The same thing applies to to_date(), BTW.
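
For anyone less familiar with the varlena macros, the difference is roughly
this (a sketch, not code from the patch, assuming "fmt" is the text *
format argument):

	/* "fmt" may be a short-header (packed) datum */
	char   *data = VARDATA_ANY(fmt);		/* correct for 1- and 4-byte headers */
	int		len  = VARSIZE_ANY_EXHDR(fmt);	/* data length, whichever header */
	char   *raw  = VARDATA(fmt);			/* assumes a full 4-byte header; for a
											 * packed datum this points past the
											 * real start of the data */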

I also find it a bit inconvenient that we expand the fmt like this in
all do_to_timestamp() calls, although it's just to_datetime() that needs
to do it this way. I realize we can't change to_datetime() because it's
external API, but maybe we should make it construct a varlena and pass
it to do_to_timestamp().


2) We define both DCH_FF# and DCH_ff#, but we never ever use the
lower-case version. Heck, it's not mentioned even in DCH_keywords, which
does this:

  ...
  {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},  /* F */
  ...
  {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},  /* F */
  ...

Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and
"day". Also, the comment for "ff1" is wrong, it should be "f" I guess.

And of course, there are regression tests for FF# variants, but not the
lower-case ones.


3) Of course, these new formatting patterns should be added to SGML
docs, for example to "Template Patterns for Date/Time Formatting" in
func.sgml (and maybe to other places, I haven't looked for them very
thoroughly).


4) The last block in DCH_from_char() does this

while (*s == ' ')
s++;

but I suppose it should use isspace() just like the code immediately
before it.


5) I might be missing something, but why is there the "D" at the end of
the return flags from DCH_from_char?

/* Return flags for DCH_from_char() */
#define DCH_DATED	0x01
#define DCH_TIMED	0x02
#define DCH_ZONED	0x04


0002


1) There are some unnecessary changes to to_datetime signature (date_txt
renamed to datetxt), which is mostly minor but unnecessary churn.


2) There are some extra changes to to_datetime (extra parameters, etc.).
I wonder why those are not included in 0001, as part of the supporting
datetime infrastructure.


3) I'm not sure whether the _safe functions are a good or bad idea, but
at the very least the comments should be updated to explain what it does
(as the API has changed, obviously).


4) the json.c changes are under-documented, e.g. there are no comments
for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and
whether it needs to be specified all the time, half of the functions at
the end don't have comments (some of them are really simple, but then
other simple functions do have comments).

I don't know what the right balance here is (I'm certainly not asking
for bogus comments just to have comments) and I agree that the existing
code is not exactly abundant in comments. But perhaps having at least
some would be nice.

The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There
are pretty much no comments whatsoever (at least at the function level,
explaining what the functions do). It would be good to have a file-level
comment too, explaining how jsonpath works, probably.


5) I see uniqueifyJsonbObject now does this:

if (!object->val.object.uniquified)
return;

That seems somewhat strange, considering the function says it'll
uniqueify objects, but then exits when noticing the passed object is not
uniquified?


6) Why do we make make_result inline? (numeric.c) If this needs to be
marked with "inline" then perhaps all the _internal functions should be
marked like that too? I have my doubts about the need for this.


7) The other places add _safe to functions that don't throw errors
directly and instead update edata. Why are set_var_from_str, div_var,
mod_var excluded from this convention?


8) I wonder if the changes in numeric can have negative impact on
performance. Has anyone done any performance tests of this part?


0003


1) json_gin.c should probably add comments briefly explaining what
JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and
JsonPathExtractionContext are for.


2) I find it a bit suspicious that there are no asserts in json_gin.c
(well, there are 3 in the existing code, but nothing in the new code,
and the patch essentially doubles the amount of code here).


No comments for 0004 at this point.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, 

Re: [HACKERS] pgbench - allow to store select results into variables

2018-11-24 Thread Fabien COELHO




> FWIW I think the terminology in this patch is wrong.  You use the term
> "compound" to mean "one query within a string containing multiple
> queries", but that's not what compound means.  Compound is the whole
> thing, comprised of the multiple queries.


Indeed. Compounded query?


Maybe "query" is the right word to use there, not sure.


I do not understand, "query queries"?

I think it should avoid using sql-related words, such as "group", 
"aggregate", "merge", "join"...


I thought of "combined", meaning the queries are combined together in a 
single message at the protocol level.


Basically I'm ok with any better idea.

--
Fabien.



Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)

2018-11-24 Thread Fabien COELHO



> Unfortunately, there was no activity over the last few commitfests and the
> proposed patch pgbench-tap-progress-6 can't be applied anymore without
> conflicts. Fabien, what are your plans about it, could you please post a
> rebased version?


Here it is.

--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c64e16187a..4495395990 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5736,6 +5736,96 @@ main(int argc, char **argv)
 	return exit_code;
 }
 
+/* display the progress report */
+static void
+doProgressReport(TState *thread, StatsData *plast, int64 *plast_report,
+ int64 now, int64 thread_start)
+{
+	StatsData	cur;
+	int64		run = now - *plast_report,
+				ntx;
+	double		tps,
+				total_run,
+				latency,
+				sqlat,
+				lag,
+				stdev;
+	char		tbuf[64];
+	int		i;
+
+	/*
+	 * Add up the statistics of all threads.
+	 *
+	 * XXX: No locking. There is no guarantee that we get an
+	 * atomic snapshot of the transaction count and latencies, so
+	 * these figures can well be off by a small amount. The
+	 * progress report's purpose is to give a quick overview of
+	 * how the test is going, so that shouldn't matter too much.
+	 * (If a read from a 64-bit integer is not atomic, you might
+	 * get a "torn" read and completely bogus latencies though!)
+	 */
+	initStats(&cur, 0);
+	for (i = 0; i < nthreads; i++)
+	{
+		mergeSimpleStats(&cur.latency, &thread[i].stats.latency);
+		mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
+		cur.cnt += thread[i].stats.cnt;
+		cur.skipped += thread[i].stats.skipped;
+	}
+
+	/* we count only actually executed transactions */
+	ntx = (cur.cnt - cur.skipped) - (plast->cnt - plast->skipped);
+	total_run = (now - thread_start) / 1000000.0;
+	tps = 1000000.0 * ntx / run;
+	if (ntx > 0)
+	{
+		latency = 0.001 * (cur.latency.sum - plast->latency.sum) / ntx;
+		sqlat = 1.0 * (cur.latency.sum2 - plast->latency.sum2) / ntx;
+		stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
+		lag = 0.001 * (cur.lag.sum - plast->lag.sum) / ntx;
+	}
+	else
+	{
+		latency = sqlat = stdev = lag = 0;
+	}
+
+	if (progress_timestamp)
+	{
+		/*
+		 * On some platforms the current system timestamp is
+		 * available in now_time, but rather than get entangled
+		 * with that, we just eat the cost of an extra syscall in
+		 * all cases.
+		 */
+		struct timeval tv;
+
+		gettimeofday(&tv, NULL);
+		snprintf(tbuf, sizeof(tbuf), "%ld.%03ld s",
+ (long) tv.tv_sec, (long) (tv.tv_usec / 1000));
+	}
+	else
+	{
+		/* round seconds are expected, but the thread may be late */
+		snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
+	}
+
+	fprintf(stderr,
+			"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
+			tbuf, tps, latency, stdev);
+
+	if (throttle_delay)
+	{
+		fprintf(stderr, ", lag %.3f ms", lag);
+		if (latency_limit)
+			fprintf(stderr, ", " INT64_FORMAT " skipped",
+	cur.skipped - plast->skipped);
+	}
+	fprintf(stderr, "\n");
+
+	*plast = cur;
+	*plast_report = now;
+}
+
 static void *
 threadRun(void *arg)
 {
@@ -5746,6 +5836,7 @@ threadRun(void *arg)
 	int			nstate = thread->nstate;
 	int			remains = nstate;	/* number of remaining clients */
 	socket_set *sockets = alloc_socket_set(nstate);
+	int			aborted = 0;
 	int			i;
 
 	/* for reporting progress: */
@@ -5975,6 +6066,10 @@ threadRun(void *arg)
 			 */
 			if (st->state == CSTATE_FINISHED || st->state == CSTATE_ABORTED)
 remains--;
+
+			/* count aborted clients */
+			if (st->state == CSTATE_ABORTED)
+aborted++;
 		}
 
 		/* progress report is made by thread 0 for all threads */
@@ -5987,93 +6082,15 @@ threadRun(void *arg)
 			now = INSTR_TIME_GET_MICROSEC(now_time);
 			if (now >= next_report)
 			{
-/* generate and show report */
-StatsData	cur;
-int64		run = now - last_report,
-			ntx;
-double		tps,
-			total_run,
-			latency,
-			sqlat,
-			lag,
-			stdev;
-char		tbuf[315];
-
-/*
- * Add up the statistics of all threads.
- *
- * XXX: No locking. There is no guarantee that we get an
- * atomic snapshot of the transaction count and latencies, so
- * these figures can well be off by a small amount. The
- * progress report's purpose is to give a quick overview of
- * how the test is going, so that shouldn't matter too much.
- * (If a read from a 64-bit integer is not atomic, you might
- * get a "torn" read and completely bogus latencies though!)
- */
-initStats(&cur, 0);
-for (i = 0; i < nthreads; i++)
-{
-	mergeSimpleStats(&cur.latency, &thread[i].stats.latency);
-	mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
-	cur.cnt += thread[i].stats.cnt;
-	cur.skipped += thread[i].stats.skipped;
-}
-
-/* we count only actually executed transactions */
-ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
-total_run = (now - thread_start) / 1000000.0;
-tps = 1000000.0 * ntx / run;
-if (ntx > 0)
-{
-	latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
-	sqlat = 1.0 * (cur.latency.sum2 - 

Don't wake up to check trigger file if none is configured

2018-11-24 Thread Jeff Janes
A streaming replica waiting on WAL from the master will wake up every 5
seconds to check for a trigger file.  This is pointless if no trigger file
has been configured.

The attached patch suppresses the timeout when there is no trigger file
configured.
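
The idea is roughly the following (a sketch, not the attached patch; the
latch, trigger-file variable, and wait-event constant are stand-ins):

	int		wakeEvents = WL_LATCH_SET | WL_EXIT_ON_PM_DEATH;
	long	timeout = -1L;		/* no timeout: sleep until the latch is set */

	if (TriggerFile != NULL)	/* only poll if a trigger file is configured */
	{
		wakeEvents |= WL_TIMEOUT;
		timeout = 5000L;		/* the existing 5s polling interval */
	}

	(void) WaitLatch(MyLatch, wakeEvents, timeout, WAIT_EVENT_RECOVERY_WAL_STREAM);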

A minor thing to be sure, but there was a campaign a couple years ago to
remove such spurious wake-ups, so maybe this change is worthwhile.

I noticed that the existing codebase does not have a consensus on what to
pass to WaitLatch for the timeout when the timeout isn't relevant. I picked
0, but -1L also has precedent.

Cheers,

Jeff


WAL_sleep_not_triggered.patch
Description: Binary data


Re: Centralize use of PG_INTXX_MIN/MAX for integer limits

2018-11-24 Thread Tom Lane
Michael Paquier  writes:
> A couple of years ago, 62e2a8dc has introduced in c.h a set of limits
> (to fix some portability issues from 83ff1618) to make the code more
> system-independent.  Those are for example PG_INT32_MIN, etc.  The core
> code now mixes the internal PG_ limits with the system ones.  Would we
> want to unify a bit the whole thing and replace all the SHRT_MIN/MAX,
> LONG_MIN/MAX and such with the internal limit definitions?

I doubt it's really worth the trouble.  I did just make such a change in
commit cbdb8b4c0, but it was mostly (a) so that the different ftoiN/dtoiN
functions would look more alike, and (b) because the relevant variables or
result values were actually declared int16, int32, etc.  It would be flat
wrong to replace SHRT_MIN or LONG_MIN in a context where it was used to
check whether a value would fit in a variable declared "short" or "long".

> I suspect that the buildfarm does not have any more members where
> sizeof(int) is 2.

I doubt PG has ever been able to run on two-byte-int hardware.  Certainly
not in the buildfarm era.

> I am seeing close to 250 places in the core code,
> most of them for INT_MIN and INT_MAX.

You'd really need to look at the associated variables to see whether any
of those would be better off as INT32_MIN/MAX.

regards, tom lane



Re: [HACKERS] pgbench - allow to store select results into variables

2018-11-24 Thread Alvaro Herrera
FWIW I think the terminology in this patch is wrong.  You use the term
"compound" to mean "one query within a string containing multiple
queries", but that's not what compound means.  Compound is the whole
thing, comprised of the multiple queries.  Maybe "query" is the right
word to use there, not sure.

-- 
Álvaro Herrera  https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Copy function for logical replication slots

2018-11-24 Thread Petr Jelinek
Hi,

On 31/08/2018 07:03, Masahiko Sawada wrote:
> 
> Attached a new version patch incorporated the all comments I got.
> 

This looks pretty reasonable.

I am personally not a big fan of the C wrappers for overloaded functions,
but that's what we need to do for opr_sanity to pass so I guess we'll
have to use them.

The more serious thing is:

> + if (MyReplicationSlot)
> + ReplicationSlotRelease();
> +
> + /* Release the saved slot if exist while preventing double releasing */
> + if (savedslot && savedslot != MyReplicationSlot)

This won't work as intended as the ReplicationSlotRelease() will set
MyReplicationSlot to NULL, you might need to set aside MyReplicationSlot
to yet another temp variable inside this function prior to releasing it.
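
Something along these lines (sketch only, variable name invented):

	ReplicationSlot *released = MyReplicationSlot;

	if (MyReplicationSlot)
		ReplicationSlotRelease();	/* this sets MyReplicationSlot to NULL */

	/* Release the saved slot if it exists and is not the one just released */
	if (savedslot && savedslot != released)
	{
		/* release/restore savedslot as in the patch */
	}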

-- 
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: pgsql: Add WL_EXIT_ON_PM_DEATH pseudo-event.

2018-11-24 Thread Christoph Berg
Re: Thomas Munro 2018-11-23 
> Add WL_EXIT_ON_PM_DEATH pseudo-event.

I think this broke something:

TRAP: FailedAssertion(»!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || 
(wakeEvents & (1 << 4)))«, Datei: 
»/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c«,
 Zeile: 389)
2018-11-24 15:20:43.193 CET [17834] LOG:  Serverprozess (PID 18425) wurde von 
Signal 6 beendet: Aborted

I can trigger it just by opening an ssl connection, non-ssl tcp
connections are fine.

Debian unstable/amd64.

Christoph



Re: could not connect to server, in order to operate pgAdmin/PostgreSQL

2018-11-24 Thread Malik Rumi
1. What is your operating system? It looks like Windows, but I'm not sure.
Also what version number for the OS?
2. Look in the url. You are at 127.0.0.1. That is the same as localhost.
But then it says 49194. Postgres uses 5432 by default. That's why you got
that part of the error message, but we need a lot more detail to help you.
3. It might be that your OS is too old to run the more recent versions of
Postgres. I don't use Windows anymore, but I know you have to be at Ubuntu
18 to run PG 10 +, there's probably something similar in Windows. That is
one of the many reasons why knowing your OS is important information.


*“None of you has faith until he loves for his brother or his neighbor what
he loves for himself.”*


On Fri, Nov 23, 2018 at 8:31 PM Anne Marie Harm 
wrote:

> Hello,
>
> Unfortunately I'm unable to operate pgAdmin/PostgreSQL; first of all I can
> only install version 9.5 (tried versions 11, 10, and 9.6 -- but cannot
> install). When I launch pgAdmin in order to try to use PostgreSGL 9.5, here
> is the full text of the error message I receive (also, screenshot attached):
>
> could not connect to server: Connection refused (0x274D/10061) Is the
> server running on host "localhost" (::1) and accepting TCP/IP connections
> on port 5432? could not connect to server: Connection refused
> (0x274D/10061) Is the server running on host "localhost" (127.0.0.1)
> and accepting TCP/IP connections on port 5432?
>
> Please advise; thank you. ~ Anne Marie
>
> 
> Anne Marie Harm
> (312) 563-9397  amharm@earthlink.net  https://www.linkedin.com/in/annemarieharm/
>
>
>


Re: [HACKERS] Block level parallel vacuum

2018-11-24 Thread Amit Kapila
On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada  wrote:
>
> Attached rebased version patch to the current HEAD.
>
> > Please apply this patch with the extension lock patch[1] when testing
> > as this patch can try to extend visibility map pages concurrently.
> >
>
> Because the patch leads performance degradation in the case where
> bulk-loading to a partitioned table I think that the original
> proposal, which makes group locking conflict when relation extension
> locks, is more realistic approach. So I worked on this with the simple
> patch instead of [1]. Attached three patches:
>
> * 0001 patch publishes some static functions such as
> heap_paralellscan_startblock_init so that the parallel vacuum code can
> use them.
> * 0002 patch makes the group locking conflict when relation extension locks.
> * 0003 patch add paralel option to lazy vacuum.
>
> Please review them.
>

I could see that you have put a lot of effort into this patch and still
we are not able to make much progress, mainly, I guess, because of the
relation extension lock problem.  I think we can park that problem for
some time (as we have already invested quite some time on it), discuss
a bit about the actual parallel vacuum patch and then come back to it.  I
don't know if that is right or not.  I am not sure we can make this
ready for the PG12 timeframe, but I feel this patch deserves some
attention.  I have started reading the main parallel vacuum patch, and
below are some assorted comments.

+ 
+  Execute VACUUM in parallel by N
+  a background workers. Collecting garbage on table
is processed
+  in block-level parallel. For tables with indexes, parallel
vacuum assigns each
+  index to each parallel vacuum worker and all garbages on a
index are processed
+  by particular parallel vacuum worker. The maximum nunber of
parallel workers
+  is . This
option can not
+  use with FULL option.
+ 

There are a couple of mistakes in above para:
(a) "..a background workers." a seems redundant.
(b) "Collecting garbage on table is processed in block-level
parallel."/"Collecting garbage on table is processed at block-level in
parallel."
(c) "For tables with indexes, parallel vacuum assigns each index to
each parallel vacuum worker and all garbages on a index are processed
by particular parallel vacuum worker."
We can rephrase it as:
"For tables with indexes, parallel vacuum assigns a worker to each
index and all garbages on a index are processed by that particular
parallel vacuum worker."
(d) Typo: nunber/number
(e) Typo: can not/cannot

I have glanced through part of the patch, but didn't find any README or doc
containing the design of this patch. I think without having the design in
place, it is difficult to review a patch of this size and complexity.
To start with, at least explain how the work is distributed among
workers: say there are two workers which need to vacuum a table with
four indexes, how does it work?  How does the leader participate in and
coordinate the work?  The other part you could explain is how the
state is maintained during parallel vacuum, something like you are
trying to do in the function below:

+ * lazy_prepare_next_state
+ *
+ * Before enter the next state prepare the next state. In parallel lazy vacuum,
+ * we must wait for the all vacuum workers to finish the previous state before
+ * preparation. Also, after prepared we change the state ot all vacuum workers
+ * and wake up them.
+ */
+static void
+lazy_prepare_next_state(LVState *lvstate, LVLeader *lvleader, int next_state)

Still other things to explain are how the stats are shared between the
leader and the workers.  I can understand a few things in bits and pieces
while glancing through the patch, but it would be easier to understand if
you document it in one place.  That would help reviewers understand it.

Can you consider splitting the patch so that the refactoring you have
done in the current code to make it usable by parallel vacuum is a
separate patch?

+/*
+ * Vacuum all indexes. In parallel vacuum, each workers take indexes
+ * one by one. Also after vacuumed index they mark it as done. This marking
+ * is necessary to guarantee that all indexes are vacuumed based on
+ * the current collected dead tuples. The leader process continues to
+ * vacuum even if any indexes is not vacuumed completely due to failure of
+ * parallel worker for whatever reason. The mark will be checked
before entering
+ * the next state.
+ */
+void
+lazy_vacuum_all_indexes(LVState *lvstate)

I didn't understand what you want to say here.  Do you mean that the
leader can continue collecting more dead tuple TIDs while workers are
vacuuming the indexes?  How does it deal with errors, if any, during
index vacuum?

+ * plan_lazy_vacuum_workers_index_workers
+ * Use the planner to decide how many parallel worker processes
+ * VACUUM and autovacuum should request for use
+ *
+ * tableOid is the table begin vacuumed which must not be non-tables or
+ * special system tables.
..
+ plan_lazy_vacuum_workers(Oid tableOid, int 

Centralize use of PG_INTXX_MIN/MAX for integer limits

2018-11-24 Thread Michael Paquier
Hi all,

A couple of years ago, 62e2a8dc has introduced in c.h a set of limits
(to fix some portability issues from 83ff1618) to make the code more
system-independent.  Those are for example PG_INT32_MIN, etc.  The core
code now mixes the internal PG_ limits with the system ones.  Would we
want to unify a bit the whole thing and replace all the SHRT_MIN/MAX,
LONG_MIN/MAX and such with the internal limit definitions?

I suspect that the buildfarm does not have any more members where
sizeof(int) is 2.  I am seeing close to 250 places in the core code,
most of them for INT_MIN and INT_MAX.

Thoughts?
--
Michael


signature.asc
Description: PGP signature


Re: 64-bit hash function for hstore and citext data type

2018-11-24 Thread Andrew Gierth
> "Tom" == Tom Lane  writes:

 >>> I'm inclined to fix this in hstoreUpgrade rather than complicate
 >>> hstore_hash with historical trivia. Also there have been no field
 >>> complaints - I guess it's unlikely that there is much pg 8.4 hstore
 >>> data in the wild that anyone wants to hash.

 >> Changing hstoreUpgrade at this point seems like wasted/misguided effort.

 Tom> Oh, cancel that --- I was having a momentary brain fade about how
 Tom> that function is used. I agree your proposal is sensible.

Here's what I have queued up to push:

-- 
Andrew (irc:RhodiumToad)

From d5890f49da6a77b1325a3f5822c6b092a2cd41ae Mon Sep 17 00:00:00 2001
From: Andrew Gierth 
Date: Sat, 24 Nov 2018 09:59:49 +
Subject: [PATCH] Fix hstore hash function for empty hstores upgraded from 8.4.

Hstore data generated on pg 8.4 and pg_upgraded to current versions
remains in its original on-disk format unless modified. The same goes
for values generated by the addon hstore-new module on pre-9.0
versions. (The hstoreUpgrade function converts old values on the fly
when read in, but the on-disk value is not modified by this.)

Since old-format empty hstores (and hstore-new hstores) have
representations compatible with the new format, hstoreUpgrade thought
it could get away without modifying such values; but this breaks
hstore_hash (and the new hstore_hash_extended) which assumes
bit-perfect matching between semantically identical hstore values.
Only one bit actually differs (the "new version" flag in the count
field) but that of course is enough to break the hash.

Fix by making hstoreUpgrade unconditionally convert all old values to
new format.

Backpatch all the way, even though this changes a hash value in some
cases, because in those cases the hash value is already failing - for
example, a hash join between old- and new-format empty hstores will be
failing to match, or a hash index on an hstore column containing an
old-format empty value will be failing to find the value since it will
be searching for a hash derived from a new-format datum. (There are no
known field reports of this happening, probably because hashing of
hstores has only been useful in limited circumstances and there
probably isn't much upgraded data being used this way.)

Per concerns arising from discussion of commit eb6f29141be. Original
bug is my fault.

Discussion: https://postgr.es/m/60b1fd3b-7332-40f0-7e7f-f2f04f47%402ndquadrant.com
---
 contrib/hstore/hstore_compat.c | 47 +-
 1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/contrib/hstore/hstore_compat.c b/contrib/hstore/hstore_compat.c
index b95ce9b4aa..1d4e7484e4 100644
--- a/contrib/hstore/hstore_compat.c
+++ b/contrib/hstore/hstore_compat.c
@@ -238,34 +238,35 @@ hstoreUpgrade(Datum orig)
 	HStore	   *hs = (HStore *) PG_DETOAST_DATUM(orig);
 	int			valid_new;
 	int			valid_old;
-	bool		writable;
 
 	/* Return immediately if no conversion needed */
-	if ((hs->size_ & HS_FLAG_NEWVERSION) ||
-		hs->size_ == 0 ||
+	if (hs->size_ & HS_FLAG_NEWVERSION)
+		return hs;
+
+	/* Do we have a writable copy? If not, make one. */
+	if ((void *) hs == (void *) DatumGetPointer(orig))
+		hs = (HStore *) PG_DETOAST_DATUM_COPY(orig);
+
+	if (hs->size_ == 0 ||
 		(VARSIZE(hs) < 32768 && HSE_ISFIRST((ARRPTR(hs)[0]))))
+	{
+		HS_SETCOUNT(hs, HS_COUNT(hs));
+		HS_FIXSIZE(hs, HS_COUNT(hs));
 		return hs;
+	}
 
 	valid_new = hstoreValidNewFormat(hs);
 	valid_old = hstoreValidOldFormat(hs);
-	/* Do we have a writable copy? */
-	writable = ((void *) hs != (void *) DatumGetPointer(orig));
 
 	if (!valid_old || hs->size_ == 0)
 	{
 		if (valid_new)
 		{
 			/*
-			 * force the "new version" flag and the correct varlena length,
-			 * but only if we have a writable copy already (which we almost
-			 * always will, since short new-format values won't come through
-			 * here)
+			 * force the "new version" flag and the correct varlena length.
 			 */
-			if (writable)
-			{
-HS_SETCOUNT(hs, HS_COUNT(hs));
-HS_FIXSIZE(hs, HS_COUNT(hs));
-			}
+			HS_SETCOUNT(hs, HS_COUNT(hs));
+			HS_FIXSIZE(hs, HS_COUNT(hs));
 			return hs;
 		}
 		else
@@ -302,15 +303,10 @@ hstoreUpgrade(Datum orig)
 		elog(WARNING, "ambiguous hstore value resolved as hstore-new");
 
 		/*
-		 * force the "new version" flag and the correct varlena length, but
-		 * only if we have a writable copy already (which we almost always
-		 * will, since short new-format values won't come through here)
+		 * force the "new version" flag and the correct varlena length.
 		 */
-		if (writable)
-		{
-			HS_SETCOUNT(hs, HS_COUNT(hs));
-			HS_FIXSIZE(hs, HS_COUNT(hs));
-		}
+		HS_SETCOUNT(hs, HS_COUNT(hs));
+		HS_FIXSIZE(hs, HS_COUNT(hs));
 		return hs;
 #else
 		elog(WARNING, "ambiguous hstore value resolved as hstore-old");
@@ -318,13 +314,8 @@ hstoreUpgrade(Datum orig)
 	}
 
 	/*
-	 * must have an old-style value. Overwrite it in place as a new-style one,
-	 * making sure we have a writable copy first.

Re: FETCH FIRST clause WITH TIES option

2018-11-24 Thread Surafel Temesgen
Attached is a rebased patch against the current master.
regards
Surafel

On Thu, Nov 1, 2018 at 2:28 PM Surafel Temesgen 
wrote:

> hi,
>
> The attached patch include all the comment given by Tomas and i check sql
> standard about LIMIT and this feature
>
> it did not say anything about it but I think its good idea to include it
> to LIMIT too and I will add it if we have consensus on it.
>
> regards
>
> surafel
>
diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml
index 4db8142afa..ed7249daeb 100644
--- a/doc/src/sgml/ref/select.sgml
+++ b/doc/src/sgml/ref/select.sgml
@@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
 [ LIMIT { count | ALL } ]
 [ OFFSET start [ ROW | ROWS ] ]
-[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ]
+[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ]
 [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ]
 
 where from_item can be one of:
@@ -1397,7 +1397,7 @@ OFFSET start
 which PostgreSQL also supports.  It is:
 
 OFFSET start { ROW | ROWS }
-FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY
+FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ]
 
 In this syntax, the start
 or count value is required by
@@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] {
 ambiguity.
 If count is
 omitted in a FETCH clause, it defaults to 1.
-ROW
+ROW .
+WITH TIES option is used to return two or more rows
+that tie for last place in the limit results set according to ORDER BY
+clause (ORDER BY clause must be specified in this case).
 and ROWS as well as FIRST
 and NEXT are noise words that don't influence
 the effects of these clauses.
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c
index 6792f9e86c..4fc3f61b26 100644
--- a/src/backend/executor/nodeLimit.c
+++ b/src/backend/executor/nodeLimit.c
@@ -41,6 +41,7 @@ static TupleTableSlot *			/* return: a tuple or NULL */
 ExecLimit(PlanState *pstate)
 {
 	LimitState *node = castNode(LimitState, pstate);
+	ExprContext *econtext = node->ps.ps_ExprContext;
 	ScanDirection direction;
 	TupleTableSlot *slot;
 	PlanState  *outerPlan;
@@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate)
  * the state machine state to record having done so.
  */
 if (!node->noCount &&
-	node->position - node->offset >= node->count)
+	node->position - node->offset >= node->count &&
+	node->limitOption == WITH_ONLY)
 {
 	node->lstate = LIMIT_WINDOWEND;
 
@@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate)
 	return NULL;
 }
 
-/*
- * Get next tuple from subplan, if any.
- */
-slot = ExecProcNode(outerPlan);
-if (TupIsNull(slot))
+else if (!node->noCount &&
+		 node->position - node->offset >= node->count &&
+		 node->limitOption == WITH_TIES)
 {
-	node->lstate = LIMIT_SUBPLANEOF;
-	return NULL;
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	/*
+	 * Test if the new tuple and the last tuple match.
+	 * If so we return the tuple.
+	 */
+	econtext->ecxt_innertuple = slot;
+	econtext->ecxt_outertuple = node->last_slot;
+	if (ExecQualAndReset(node->eqfunction, econtext))
+	{
+		ExecCopySlot(node->last_slot, slot);
+		node->subSlot = slot;
+		node->position++;
+	}
+	else
+	{
+		node->lstate = LIMIT_WINDOWEND;
+
+		/*
+		* If we know we won't need to back up, we can release
+		* resources at this point.
+		*/
+		if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD))
+			(void) ExecShutdownNode(outerPlan);
+
+		return NULL;
+	}
+
+}
+else
+{
+	/*
+	 * Get next tuple from subplan, if any.
+	 */
+	slot = ExecProcNode(outerPlan);
+	if (TupIsNull(slot))
+	{
+		node->lstate = LIMIT_SUBPLANEOF;
+		return NULL;
+	}
+	if (node->limitOption == WITH_TIES)
+	{
+		ExecCopySlot(node->last_slot, slot);
+	}
+	node->subSlot = slot;
+	node->position++;
 }
-node->subSlot = slot;
-node->position++;
 			}
 			else
 			{
@@ -311,7 +360,8 @@ recompute_limits(LimitState *node)
 	 * must update the child node anyway, in case this is a rescan and the
 	 * previous time we got a different result.
 	 */
-	ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
+	if(node->limitOption == WITH_ONLY)
+		ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node));
 }
 
 /*
@@ -374,6 +424,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags)
 		   (PlanState *) limitstate);
 	limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
 	

Re: COPY FROM WHEN condition

2018-11-24 Thread Dean Rasheed
On Sat, 24 Nov 2018 at 02:09, Tomas Vondra  wrote:
> On 11/23/18 12:14 PM, Surafel Temesgen wrote:
> > On Sun, Nov 11, 2018 at 11:59 PM Tomas Vondra
> > mailto:tomas.von...@2ndquadrant.com>> wrote:
> > So, what about using FILTER here? We already use it for aggregates when
> > filtering rows to process.
> >
> > i think its good idea and describe its purpose more. Attache is a
> > patch that use FILTER instead
>

FWIW, I vote for just using WHERE here.

Right now we have 2 syntaxes for filtering rows in queries, both of
which use WHERE immediately before the condition:

1). SELECT ... FROM ... WHERE condition

2). SELECT agg_fn FILTER (WHERE condition) FROM ...

I'm not a huge fan of (2), but that's the SQL standard, so we're stuck
with it. There's a certain consistency in its use of WHERE to
introduce the condition, and preceding that with FILTER helps to
distinguish it from any later WHERE clause. But what you'd be adding
here would be a 3rd syntax

3). COPY ... FROM ... FILTER condition

which IMO will just lead to confusion.

Regards,
Dean



Re: pgbench - doCustom cleanup

2018-11-24 Thread Fabien COELHO


Hello Alvaro,


> I just pushed this.  I hope not to have upset you too much with the
> subroutine thing.


Sorry for the feedback delay, my week was kind of overloaded…

Thanks for the push.

About the patch you committed, a post-commit review:

 - the state and function renamings are indeed a good thing.

 - I'm not fond of "now = func(..., now)", I'd have just passed a
   reference.

 - I'd put out the meta commands, but keep the SQL case and the state
   assignment in the initial function, so that all state changes are in
   one function… which was the initial aim of the submission.
   Kind of a compromise:-)

See the attached followup patch which implements these suggestions. The 
patch is quite small under "git diff -w", but larger because of the 
reindentation as one "if" level is removed.


--
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c64e16187a..7392cf8688 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -580,8 +580,7 @@ static void setIntValue(PgBenchValue *pv, int64 ival);
 static void setDoubleValue(PgBenchValue *pv, double dval);
 static bool evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr,
 			 PgBenchValue *retval);
-static instr_time doExecuteCommand(TState *thread, CState *st,
- instr_time now);
+static ConnectionStateEnum executeMetaCommand(TState *thread, CState *st, instr_time *now);
 static void doLog(TState *thread, CState *st,
 	  StatsData *agg, bool skipped, double latency, double lag);
 static void processXactStats(TState *thread, CState *st, instr_time *now,
@@ -2862,6 +2861,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 	 */
 	for (;;)
 	{
+		Command	   *command;
 		PGresult   *res;
 
 		switch (st->state)
@@ -3003,8 +3003,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
  * Send a command to server (or execute a meta-command)
  */
 			case CSTATE_START_COMMAND:
+command = sql_script[st->use_file].commands[st->command];
+
 /* Transition to script end processing if done */
-if (sql_script[st->use_file].commands[st->command] == NULL)
+if (command == NULL)
 {
 	st->state = CSTATE_END_TX;
 	break;
@@ -3016,7 +3018,27 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 	INSTR_TIME_SET_CURRENT_LAZY(now);
 	st->stmt_begin = now;
 }
-now = doExecuteCommand(thread, st, now);
+
+if (command->type == SQL_COMMAND)
+{
+	if (!sendCommand(st, command))
+	{
+		commandFailed(st, "SQL", "SQL command send failed");
+		st->state = CSTATE_ABORTED;
+	}
+	else
+		st->state = CSTATE_WAIT_RESULT;
+}
+else if (command->type == META_COMMAND)
+{
+	/*-
+	 * Possible state changes when executing meta commands:
+	 * - on errors CSTATE_ABORTED
+	 * - on sleep CSTATE_SLEEP
+	 * - else CSTATE_END_COMMAND
+	 */
+	st->state = executeMetaCommand(thread, st, &now);
+}
 
 /*
  * We're now waiting for an SQL command to complete, or
@@ -3254,178 +3276,151 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg)
 
 /*
  * Subroutine for advanceConnectionState -- execute or initiate the current
- * command, and transition to next state appropriately.
- *
- * Returns an updated timestamp from 'now', used to update 'now' at callsite.
+ * meta command, and return to next state appropriately.
  */
-static instr_time
-doExecuteCommand(TState *thread, CState *st, instr_time now)
+static ConnectionStateEnum
+executeMetaCommand(TState *thread, CState *st, instr_time *now)
 {
 	Command*command = sql_script[st->use_file].commands[st->command];
+	int			argc;
+	char	  **argv;
 
-	/* execute the command */
-	if (command->type == SQL_COMMAND)
+	Assert(command != NULL && command->type == META_COMMAND);
+
+	argc = command->argc;
+	argv = command->argv;
+
+	if (debug)
 	{
-		if (!sendCommand(st, command))
-		{
-			commandFailed(st, "SQL", "SQL command send failed");
-			st->state = CSTATE_ABORTED;
-		}
-		else
-			st->state = CSTATE_WAIT_RESULT;
+		fprintf(stderr, "client %d executing \\%s", st->id, argv[0]);
+		for (int i = 1; i < argc; i++)
+			fprintf(stderr, " %s", argv[i]);
+		fprintf(stderr, "\n");
 	}
-	else if (command->type == META_COMMAND)
+
+	if (command->meta == META_SLEEP)
 	{
-		int			argc = command->argc;
-		char	  **argv = command->argv;
-
-		if (debug)
-		{
-			fprintf(stderr, "client %d executing \\%s",
-	st->id, argv[0]);
-			for (int i = 1; i < argc; i++)
-fprintf(stderr, " %s", argv[i]);
-			fprintf(stderr, "\n");
-		}
-
-		if (command->meta == META_SLEEP)
-		{
-			int			usec;
-
-			/*
-			 * A \sleep doesn't execute anything, we just get the delay from
-			 * the argument, and enter the CSTATE_SLEEP state.  (The
-			 * per-command latency will be recorded in CSTATE_SLEEP state, not
-			 * here, after the delay has elapsed.)
-			 */
-			if (!evaluateSleep(st, argc, argv, &usec))
-			{
-commandFailed(st, "sleep",