Re: [HACKERS] Block level parallel vacuum
On Sat, Nov 24, 2018 at 5:47 PM Amit Kapila wrote:
> On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada wrote:
> >

I could see that you have put a lot of effort into this patch, and still we
are not able to make much progress, mainly, I guess, because of the relation
extension lock problem. I think we can park that problem for some time (as
we have already invested quite some time in it), discuss the actual parallel
vacuum patch a bit, and then come back to it.

Today, I was reading this and the previous related thread [1], and it seems
to me that multiple people (Andres [2], Simon [3]) have pointed out that
parallelization of the index portion is more valuable. Some of the results
[4] indicate the same. Now, when there are no indexes, parallelizing heap
scans also has a benefit, but I think in practice we will see more cases
where the user wants to vacuum tables with indexes. So how about if we break
this problem down in the following way, where each piece gives a benefit of
its own:

(a) Parallelize index scans, wherein the workers will be launched only to
    vacuum indexes. Only one worker per index will be spawned.
(b) Parallelize per-index vacuum. Each index can be vacuumed by multiple
    workers.
(c) Parallelize heap scans, where multiple workers will scan the heap,
    collect dead TIDs, and then launch multiple workers for indexes.

I think if we break this problem into multiple patches, it will reduce the
scope of each patch and help us make progress. It's now been more than two
years that we have been trying to solve this problem, and we still haven't
made much progress. I understand there are various genuine reasons, and all
of that work will help us solve all the problems in this area. How about if
we first target problem (a), and once we are done with that, we can see
which of (b) or (c) we want to do first?
[1] - https://www.postgresql.org/message-id/CAD21AoD1xAqp4zK-Vi1cuY3feq2oO8HcpJiz32UDUfe0BE31Xw%40mail.gmail.com
[2] - https://www.postgresql.org/message-id/20160823164836.naody2ht6cutioiz%40alap3.anarazel.de
[3] - https://www.postgresql.org/message-id/CANP8%2BjKWOw6AAorFOjdynxUKqs6XRReOcNy-VXRFFU_4bBT8ww%40mail.gmail.com
[4] - https://www.postgresql.org/message-id/CAGTBQpbU3R_VgyWk6jaD%3D6v-Wwrm8%2B6CbrzQxQocH0fmedWRkw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
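The one-worker-per-index rule of option (a) can be illustrated with a toy scheduler. This is only an illustrative sketch, not PostgreSQL code; `index_assigned_worker()` and its round-robin policy are hypothetical.

```c
#include <assert.h>

/*
 * Illustrative sketch (not PostgreSQL code) of option (a): every index is
 * vacuumed by exactly one worker.  With fewer workers than indexes, a
 * worker may own several indexes, but no index is ever shared.
 */
static int
index_assigned_worker(int index_no, int nworkers)
{
	/* hypothetical round-robin assignment: index -> single owning worker */
	return index_no % nworkers;
}
```

Under such an assignment, two workers never collide on the same index, which is what keeps option (a) simpler than (b), where a single index is split across multiple workers.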
Re: POC for a function trust mechanism
On Sun, Aug 12, 2018 at 10:40:30PM -0400, Robert Haas wrote:
> On Wed, Aug 8, 2018 at 1:15 PM, Tom Lane wrote:
> > If we had, say, a catalog that provided the desired list of trusted roles
> > for every role, then we could imagine implementing that context change
> > automatically. Likewise, stuff like autovacuum or REINDEX would want
> > to run with the table owner's list of trusted roles, but the GUC approach
> > doesn't really provide enough infrastructure to know what to do there.
>
> Yeah, I think these are all good points. It seems natural for trust
> to be a property of a role, for just the reasons that you mention.
> However, there does also seem to be a use case for varying it
> temporarily on a per-session or per-function basis, and I'm not
> exactly sure how to cater to those needs.

Yep. A GUC is great for src/bin sessions, since many of those applications
consistently want tight trust settings.

> I wonder if Noah would like to rebase and post his version also.

Sure, attached (last merge at 757c518). The owner_trustcheck() header
comment is a good starting point for reading it. Tom Lane later asserted
that it's okay to perform the function trust check in
fmgr_info_cxt_security() instead of doing it in most fmgr_info() callers
and some *FunctionCall* callers. While I suspected he was right, I have
not made that change in this rebase. (I also haven't audited every
fmgr_info() call added after -v1.) When I shared -v1 with the security
team, I included the following submission notes:

===

Here's an implementation. Design decisions:

1. A role trusts itself implicitly.

2. Default pg_auth_trust entry for "public TRUST public".

   This effectively disables the new restrictions; concerned
   administrators will test their applications with it removed, then
   remove it.

3. Default pg_auth_trust entry for "public TRUST ".

   Almost everyone will leave this untouched.

4. Changing trust for a role requires ADMIN OPTION on the role.

   Trust is the next step down from granting role membership. ("ALTER
   ROLE public TRUST ..." does require superuser.)

5. Non-inheritance of trust.

   Trust is like a reverse permission; I find it helpful to think of
   "ALTER ROLE foo TRUST bar" as "GRANT may_supply_functions_to_foo TO
   bar". It would be reasonable to have that trust relationship be
   effective for INHERITS members of bar; after all, they could always
   "ALTER FUNCTION ... OWNER TO bar". For now, trust relationships are
   not subject to inheritance. This keeps things simpler to understand
   and poses no major downsides. For similar reasons, I have not made a
   role trust its own members implicitly.

6. Non-transitivity of trust.

   If I trust only alice, alice trusts only bob, and I call alice's
   function that calls bob's function, should the call succeed? This is
   a tough one. In favor of "yes", this choice would allow alice to
   refactor her function without needing her callers to revise their
   trust settings. That is, she could switch from bob's function to
   carol's function, and her callers would never notice. My trust in
   alice is probably misplaced if she chooses her friends badly. In
   favor of "no", making the trust relationship transitive seems to mix
   two otherwise-separate concepts: alice's willingness to run code as
   herself, and alice's willingness to certify third-party functions to
   her friends. In particular, current defects in the system are at odds
   with conflating those concepts. Everyone should trust the bootstrap
   superuser, but it currently owns functions like integer_pl_date that
   are not careful in what they call. That closed the issue for me;
   trust is not transitive by default. Even so, I think there would be
   value in a facility for requesting trust transitivity in specific
   situations.

7. (Lack of) negative trust entries.

   There's no way to opt out of a PUBLIC trust relationship; until a
   superuser removes the "public TRUST public" entry, no user can
   achieve anything with ALTER USER [NO] TRUST. I considered adding the
   concept of a negative trust relationship to remedy this problem; it
   could also be used to remove the implicit self-trust for testing
   purposes. I have refrained; I don't know whether the benefits would
   pay for the extra user-facing complexity.

8. Applicable roles during trust checks.

   We had examined whether the check could be looser for SECURITY
   DEFINER functions, since those can't perform arbitrary actions as the
   caller. I described some challenges then, but a deeper look turned up
   more: a SECURITY DEFINER function can do things like "SET search_path
   = ..." that affect the caller. Consequently, I concluded that all
   roles on the active call stack should have a stake in whether a
   particular function owner is permitted. Each trust check walks the
   call stack and checks every role represented there; if any lacks the
   trust relationships to approve the newly-contemplated function call,
   the call is rejected. The stack traversal may and does stop at the
   edge of a security-restricted operation;
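The stack-walk rule from design decision 8 can be sketched in a few lines. This is only an illustrative sketch, not the patch's code: `role_trusts()`, `stack_permits_owner()`, and the hard-coded "everyone trusts role 10" rule are all stand-ins for the real pg_auth_trust lookups.

```c
#include <assert.h>
#include <stdbool.h>

/* Oid is a stand-in for PostgreSQL's type; all names here are hypothetical. */
typedef unsigned int Oid;

static bool
role_trusts(Oid truster, Oid owner)
{
	/* a role trusts itself implicitly (design decision 1); as an example
	 * assumption, every role here also trusts role 10 */
	return truster == owner || owner == 10;
}

/*
 * Design decision 8: every role on the active call stack must trust the
 * owner of a newly-contemplated function call, or the call is rejected.
 */
static bool
stack_permits_owner(const Oid *stack_roles, int depth, Oid func_owner)
{
	for (int i = 0; i < depth; i++)
		if (!role_trusts(stack_roles[i], func_owner))
			return false;	/* one distrustful role vetoes the call */
	return true;
}
```

The veto semantics are the point: a single role on the stack that lacks the trust relationship is enough to reject the call, which is why looser treatment of SECURITY DEFINER functions was abandoned.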
Re: tab-completion debug print
On Fri, Nov 23, 2018 at 04:32:31PM -0500, Tom Lane wrote:
> Kyotaro HORIGUCHI writes:
> > I was reminded that I was often annoyed with identifying the code
> > that made a word-completion, by hearing the same complaint from a
> > colleague of mine just now.
> > Something like the attached, which tweaks completion_matches calls,
> > lets psql emit the line number where a word-completion happens. The
> > output can be split out using redirection so that it doesn't break
> > into the conversation on the console.
> >
> > (make -s COPT=-DTABCOMPLETION_DEBUG install)
> > $ psql postgres 2>~/debug.out
> > =# alt[tab]er [tab]t[tab]ab[tab] [tab]
> >
> > You can see the following output in another bash session:
> > $ tail -f ~/debug.out
> > [1414][1435][1435][1435][1431]
> > Every number enclosed in brackets is the line number in
> > tab-complete.c where completion happens.
> >
> > Is this useful? Any suggestions, thoughts?
>
> Hm. I can see the value of instrumenting tab-complete when you're trying
> to debug why it did something, but this output format seems pretty terse
> and unreadable. Can we get it to print the completion text as well?
> I'm imagining something more like
>
> 1414: "er "
> 1435: ""
> 1435: "ab"
> 1435: ""
> 1431: ""
>
> Perhaps there's room as well to print the context that the match looked
> at:
>
> 1414: "alt" -> "er "
> 1435: "alter " -> ""
> 1435: "alter t" -> "ab"
>
> etc.
>
> regards, tom lane

Is this something along the lines of what you had in mind?

Best,
David.
-- 
David Fetter http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

>From 712244a029c0f30fe1195a1400aee85541aa4b6b Mon Sep 17 00:00:00 2001
In-Reply-To: <1227.1543008...@sss.pgh.pa.us>
References: <1227.1543008...@sss.pgh.pa.us>
From: David Fetter
Date: Fri, 23 Nov 2018 14:55:17 -0800
Subject: [PATCH] v0002 Surface tab completions for debugging

To access: make -s COPT=-DTABCOMPLETION_DEBUG
---
 src/bin/psql/tab-complete.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index 9dbd555166..8b81c90d19 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -60,6 +60,32 @@ extern char *filename_completion_function();
 #define completion_matches rl_completion_matches
 #endif
 
+/*
+ * By enabling the following definition, the source line number is emitted to
+ * stderr for every completion attempt. You can isolate it from console
+ * interaction by redirecting stderr into a file.
+ */
+#ifdef TABCOMPLETION_DEBUG
+#ifdef HAVE_RL_COMPLETION_MATCHES
+#define org_completion_matches rl_completion_matches
+#else
+#define org_completion_matches completion_matches
+#endif
+
+static char **completion_debug(int line, const char *text, char **list)
+{
+	fprintf(stderr, "[%d: (%s", line, text);
+	for (int i = 0; list && list[i]; ++i)
+		fprintf(stderr, ", %s", list[i]);
+	fprintf(stderr, ")]\n");
+	return list;
+}
+
+#undef completion_matches
+#define completion_matches(text, func) \
+	completion_debug(__LINE__, (text), org_completion_matches((text), (func)))
+#endif
+
 /* word break characters */
 #define WORD_BREAKS		"\t\n@$><=;|&{() "
-- 
2.19.1
Re: pgsql: Add WL_EXIT_ON_PM_DEATH pseudo-event.
On Sun, Nov 25, 2018 at 3:38 AM Christoph Berg wrote:
> Re: Thomas Munro 2018-11-23
> > Add WL_EXIT_ON_PM_DEATH pseudo-event.
>
> I think this broke something:
>
> TRAP: FailedAssertion(»!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) ||
> (wakeEvents & (1 << 4)))«, Datei:
> »/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c«,
> Zeile: 389)
> 2018-11-24 15:20:43.193 CET [17834] LOG: Serverprozess (PID 18425) wurde von
> Signal 6 beendet: Aborted
>
> I can trigger it just by opening an ssl connection; non-ssl tcp
> connections are fine.

Thanks. I was initially surprised that this didn't come up in check-world,
but I see now that I need to go and add PG_TEST_EXTRA="ssl ldap" to my
testing routine (and cfbot's). Reproduced here, and it's a case where we
were not handling postmaster death, which is exactly what this assertion
was designed to find. The following is one way to fix the assertion
failure, though I'm not sure if it would be better to request
WL_POSTMASTER_DEATH and generate a FATAL error like secure_read() does:

--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -406,9 +406,9 @@ aloop:
 	 * StartupPacketTimeoutHandler() which directly exits.
 	 */
 	if (err == SSL_ERROR_WANT_READ)
-		waitfor = WL_SOCKET_READABLE;
+		waitfor = WL_SOCKET_READABLE | WL_EXIT_ON_PM_DEATH;
 	else
-		waitfor = WL_SOCKET_WRITEABLE;
+		waitfor = WL_SOCKET_WRITEABLE | WL_EXIT_ON_PM_DEATH;

-- 
Thomas Munro
http://www.enterprisedb.com
Re: Centralize use of PG_INTXX_MIN/MAX for integer limits
> "Michael" == Michael Paquier writes:

 Michael> Hi all,
 Michael> A couple of years ago, 62e2a8dc has introduced in c.h a set of
 Michael> limits (to fix some portability issues from 83ff1618) to make
 Michael> the code more system-independent. Those are for example
 Michael> PG_INT32_MIN, etc. The core code now mixes the internal PG_
 Michael> limits with the system ones. Would we want to unify a bit the
 Michael> whole thing and replace all the SHRT_MIN/MAX, LONG_MIN/MAX and
 Michael> such with the internal limit definitions?

Of course not. And LONG_MIN/MAX is the obvious example of why not, since
that one does vary between platforms.

INT_MAX is for the max value of an "int". PG_INT32_MAX is for the max
value of an "int32". PG_INT64_MAX is for the max value of an "int64".
LONG_MAX is for the max value of a "long". Simple.

However, as I said at the time of those patches, I did not at that stage
audit all the uses of INT_MIN/MAX to determine which should really have
been INT32_MIN/MAX. Currently, all sorts of things will likely break if
"int" and "int32" are not exactly the same size, but it might still be a
good idea to fix that at some point. (As a recent example,
contrib/intarray uses "int" almost universally, regardless of the fact
that it's meant to be dealing with SQL "integer" values, which should be
int32.)

-- 
Andrew (irc:RhodiumToad)
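The distinction above — exact-width PG_ limits versus platform-width C limits — can be made concrete. As an illustrative sketch only: PG_INT32_MIN/MAX are reproduced here by value (in PostgreSQL they come from c.h), and `fits_in_int32()` is a hypothetical helper, not PostgreSQL code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/*
 * Illustrative sketch: PG_INT32_MIN/MAX pin down an exact 32-bit width,
 * while INT_MAX and LONG_MAX track whatever "int" and "long" happen to be
 * on the platform (LONG_MAX in particular varies between platforms).
 * In PostgreSQL these macros are defined in c.h; they are restated here.
 */
#define PG_INT32_MIN	(-0x7FFFFFFF - 1)
#define PG_INT32_MAX	INT32_C(0x7FFFFFFF)

/* range check appropriate for a variable declared int32, not plain "int" */
static int
fits_in_int32(long long v)
{
	return v >= PG_INT32_MIN && v <= PG_INT32_MAX;
}
```

A check written with INT_MAX instead would only be correct for variables actually declared `int`, which is exactly the audit question raised in the reply.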
Re: Desirability of client-side expressions in psql?
> >> psql> \if :i >= 5
> >
> > I think we're ok with that so long as none of the operators or values
> > has a \ in it.
> > What barriers do you see to re-using the pgbench grammar?
>
> The pgbench expression grammar mimics SQL expression grammar,
> on integers, floats, booleans & NULL.
>
> I'm unsure about some special cases in psql (`shell command`,
> 'text', "identifier"). They can be forbidden in a new command (\let),
> but what happens with "\if ...", which I am afraid allows them, is
> unclear.
>
> -- 
> Fabien.

(raising this thread from hibernation now that I have the bandwidth)

It seems like the big barriers to just using pgbench syntax are:

- the ability to indicate that the next thing to follow will be a
  pgbench expression
- a way to coax pgbench truthy values into psql truthy values (t/f,
  y/n, 1/0)

For that, I see a few ways forward:

1. A suffix on \if and \elif, or an -exp suffix (or even just -x) on
   existing commands, to indicate that a pgbench expression follows.
   This would look something like: \ifx \elifx \setx, or \if$ \elif$ \set$

2. A command-line-esque switch or other sigil to indicate that what
   follows is a pgbench expression with psql vars to interpolate.
   Example:
     \set foo -x 1 + 4
     \set foo \expr 1 + 4
     \if -x :limit > 10
     \if \expr :limit > 10

3. A global toggle to indicate which mode should be used by \if, \elif,
   and \set. Example: \pset expressions [on | off]

4. A combination of #2 and #3, with a corresponding switch/sigil to
   indicate "do not evaluate pgbench-style". This is particularly
   appealing to me because it would allow code snippets from pgbench to
   be used without modification, while still allowing the user to mix
   old/new style in an existing script.

5. A special variant of `command` where variables are interpolated
   before being sent to the OS, and allow that on \if and \elif:
     \set foo ``expr :y + :z``
     \set foo $( expr :y + :z )
     \if ``expr :limit > 10``
     \if $( expr :limit > 10 )
   This also has some appeal because it allows for a great amount of
   flexibility, but obviously constrains us with OS dependencies. The
   user might have a hard time sending commands containing ')' if we go
   the $( ) route.

6. Option #5, but we add an additional executable (suggested name:
   pgexpr) to the client libs, which encapsulates the pgbench expression
   library as a way around OS-dependent code.

7. I believe someone suggested introducing the :{! pgbench-command } or
   :{{ pgbench-command }} var-mode:
     \set foo :{! :y + :z }
     \set foo :{{ :y + :z }}
     \if :{! :limit > 10 }
     \if :{{ :limit > 10 }}
   This has some appeal as well, though I prefer the {{...}} syntax
   because "!" looks like negation, and {{ resembles the [[ x + y ]]
   syntax in bash.

One nice thing is that most of these options are not mutually exclusive.

Thoughts?
Re: RHEL 8.0 build
Jeremy Harris writes:
> Trying to set up a buildfarm animal on a RHEL 8.0 (beta) system,
> the build fails early:
> ...
> It appears to be a "configure" script looking for python; and there is
> no such. You can have python3 or python2 - but neither package install
> provides a symlink of just "python".

Yeah, some distros are starting to act that way, and I suspect it's
causing pain for a lot of people. Currently we are agnostic about which
python version to use, so if you don't have anything simply named
"python", you have to tell configure what to use by setting the PYTHON
environment variable. In a buildfarm configuration file this would look
something like

  # env settings to pass to configure. These settings will only be seen by
  # configure.
  config_env => {
+     PYTHON => "/usr/local/bin/python3",

There's been some preliminary discussion about starting to default to
python3, but given this project's inherent conservatism, I don't expect
that to happen for some years yet. In any case, whenever we do pull that
trigger we'd surely do so only in HEAD, not released branches, so
buildfarm owners will need to deal with the case for years more.

			regards, tom lane
Re: Regarding performance regression on specific query
Amit Langote writes:
> On 2018/11/20 2:49, Jung, Jinho wrote:
> > [ assorted queries ]
> I noticed that these two are fixed by running ANALYZE in the database in
> which these queries are run.

That didn't help much for me. What did help was increasing
join_collapse_limit and from_collapse_limit to not limit the join search
space --- on queries with as many input relations as these, you're really
at the mercy of whether the given query structure represents a good join
order if you don't.

In general I can't get even a little bit excited about the quality of the
plans selected for these examples, as they all involve made-up restriction
and join clauses that the planner isn't going to have the slightest clue
about. The observations boil down to "9.4 made one set of arbitrary plan
choices, while v10 made a different set of arbitrary plan choices, and on
these particular examples 9.4 got lucky and 10 didn't".

Possibly also worth noting is that running these in an empty database is
in itself kind of a worst case, because many of the tables are empty to
start with (or the restriction/join clauses pass no rows), and so the
fastest runtime tends to go to plans of the form "nestloop with empty
relation on the outside and all the expensive stuff on the inside".
(Observe all the "(never executed)" notations in the EXPLAIN output.)
This kind of plan wins only when the outer rel is actually empty;
otherwise it can easily lose big, and therefore PG's planner is
intentionally designed to discount the case entirely. We never believe
that a relation is empty unless we can mathematically prove that, and our
cost estimates are never made with an eye to exploiting such cases. This
contributes a lot to the random-chance nature of which plan is actually
fastest; the planner isn't expecting "(never executed)" to happen and
doesn't prefer plans that will win if it does happen.

			regards, tom lane
RHEL 8.0 build
Hi,

Trying to set up a buildfarm animal on a RHEL 8.0 (beta) system,
the build fails early:

$ ./run_build.pl --nosend --nostatus --verbose
Sat Nov 24 20:09:28 2018: buildfarm run for CHANGEME:HEAD starting
CHANGEME:HEAD [20:09:28] checking out source ...
CHANGEME:HEAD [20:09:33] checking if build run needed ...
CHANGEME:HEAD [20:09:34] copying source to pgsql.build ...
CHANGEME:HEAD [20:09:35] running configure ...
Branch: HEAD
Stage Configure failed with status 1
$

It appears to be a "configure" script looking for python; and there is
no such. You can have python3 or python2 - but neither package install
provides a symlink of just "python". Obviously I could get going by
manually adding one, but perhaps other people would benefit from a
neater fix.
-- 
Cheers,
  Jeremy
Re: jsonpath
Hi,

I've done another round of reviews on v20, assuming the patch is almost
ready to commit, but unfortunately I ran into a bunch of issues that need
to be resolved. None of them is a huge issue, but they are also above the
threshold of what could be tweaked by a committer, IMHO.

(Which brings up the question of who plans to commit this. The patch does
not have a committer in the CF app, but I see both Teodor and Alexander
are listed as its authors, so I'd expect it to be one of those. Or I
might do that, of course.)

0001

1) to_timestamp() does this:

    do_to_timestamp(date_txt, VARDATA(fmt), VARSIZE_ANY_EXHDR(fmt),
                    false, , , , NULL);

Shouldn't it really do VARDATA_ANY() instead of VARDATA()? It's what the
function did before (well, it called text_to_cstring, but that does
VARDATA_ANY). The same thing applies to to_date(), BTW.

I also find it a bit inconvenient that we expand the fmt like this in all
do_to_timestamp() calls, although it's just to_datetime() that needs to
do it this way. I realize we can't change to_datetime() because it's an
external API, but maybe we should make it construct a varlena and pass it
to do_to_timestamp().

2) We define both DCH_FF# and DCH_ff#, but we never ever use the
lower-case version. Heck, it's not mentioned even in DCH_keywords, which
does this:

    ...
    {"FF1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},	/* F */
    ...
    {"ff1", 3, DCH_FF1, false, FROM_CHAR_DATE_NONE},	/* F */
    ...

Compare that to DCH_DAY, DCH_Day and DCH_day, mapped to "DAY", "Day" and
"day". Also, the comment for "ff1" is wrong; it should be "f", I guess.
And of course, there are regression tests for the FF# variants, but not
for the lower-case ones.

3) Of course, these new formatting patterns should be added to the SGML
docs, for example to "Template Patterns for Date/Time Formatting" in
func.sgml (and maybe to other places; I haven't looked for them very
thoroughly).

4) The last block in DCH_from_char() does this:

    while (*s == ' ')
        s++;

but I suppose it should use isspace(), just like the code immediately
before it.

5) I might be missing something, but why is there the "D" at the end of
the return flags for DCH_from_char()?

    /* Return flags for DCH_from_char() */
    #define DCH_DATED	0x01
    #define DCH_TIMED	0x02
    #define DCH_ZONED	0x04

0002

1) There are some unnecessary changes to the to_datetime signature
(date_txt renamed to datetxt), which is mostly minor but unnecessary
churn.

2) There are some extra changes to to_datetime (extra parameters, etc.).
I wonder why those are not included in 0001, as part of the supporting
datetime infrastructure.

3) I'm not sure whether the _safe functions are a good or bad idea, but
at the very least the comments should be updated to explain what they do
(as the API has changed, obviously).

4) The json.c changes are under-documented: e.g. there are no comments
for lex_peek_value, JsonEncodeDateTime doesn't say what tzp is for and
whether it needs to be specified all the time, and half of the functions
at the end don't have comments (some of them are really simple, but then
other simple functions do have comments). I don't know what the right
balance here is (I'm certainly not asking for bogus comments just to
have comments) and I agree that the existing code is not exactly
abundant in comments. But perhaps having at least some would be nice.

The same thing applies to jsonpath.c and jsonpath_exec.c, I think. There
are pretty much no comments whatsoever (at least at the function level,
explaining what the functions do). It would be good to have a file-level
comment too, explaining how jsonpath works, probably.

5) I see uniqueifyJsonbObject now does this:

    if (!object->val.object.uniquified)
        return;

That seems somewhat strange, considering the function says it'll
uniqueify objects, but then exits upon noticing the passed object is not
uniquified?

6) Why do we make make_result inline? (numeric.c) If this needs to be
marked with "inline", then perhaps all the _internal functions should be
marked like that too? I have my doubts about the need for this.

7) The other places add _safe to functions that don't throw errors
directly and instead update edata. Why are set_var_from_str, div_var and
mod_var excluded from this convention?

8) I wonder if the changes in numeric.c can have a negative impact on
performance. Has anyone done any performance tests of this part?

0003

1) json_gin.c should probably add comments briefly explaining what
JsonPathNode, GinEntries, ExtractedPathEntry, ExtractedJsonPath and
JsonPathExtractionContext are for.

2) I find it a bit suspicious that there are no asserts in json_gin.c
(well, there are 3 in the existing code, but nothing in the new code,
and the patch essentially doubles the amount of code here).

No comments for 0004 at this point.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support,
Re: [HACKERS] pgbench - allow to store select results into variables
> FWIW I think the terminology in this patch is wrong. You use the term
> "compound" to mean "one query within a string containing multiple
> queries", but that's not what compound means. Compound is the whole
> thing, comprised of the multiple queries.

Indeed. Compounded query?

> Maybe "query" is the right word to use there, not sure.

I do not understand, "query queries"? I think it should avoid using
SQL-related words, such as "group", "aggregate", "merge", "join"... I
thought of "combined", meaning the queries are combined together in a
single message at the protocol level. Basically I'm OK with any better
idea.

-- 
Fabien.
Re: [HACKERS] Re: [COMMITTERS] pgsql: Remove pgbench "progress" test pending solution of its timing is (fwd)
> Unfortunately, there was no activity over the last few commitfests and
> the proposed patch pgbench-tap-progress-6 can't be applied anymore
> without conflicts. Fabien, what are your plans about it, could you
> please post a rebased version?

Here it is.

-- 
Fabien.

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index c64e16187a..4495395990 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -5736,6 +5736,96 @@ main(int argc, char **argv)
 	return exit_code;
 }
 
+/* display the progress report */
+static void
+doProgressReport(TState *thread, StatsData *plast, int64 *plast_report,
+				 int64 now, int64 thread_start)
+{
+	StatsData	cur;
+	int64		run = now - *plast_report,
+				ntx;
+	double		tps,
+				total_run,
+				latency,
+				sqlat,
+				lag,
+				stdev;
+	char		tbuf[64];
+	int			i;
+
+	/*
+	 * Add up the statistics of all threads.
+	 *
+	 * XXX: No locking. There is no guarantee that we get an
+	 * atomic snapshot of the transaction count and latencies, so
+	 * these figures can well be off by a small amount. The
+	 * progress report's purpose is to give a quick overview of
+	 * how the test is going, so that shouldn't matter too much.
+	 * (If a read from a 64-bit integer is not atomic, you might
+	 * get a "torn" read and completely bogus latencies though!)
+	 */
+	initStats(&cur, 0);
+	for (i = 0; i < nthreads; i++)
+	{
+		mergeSimpleStats(&cur.latency, &thread[i].stats.latency);
+		mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
+		cur.cnt += thread[i].stats.cnt;
+		cur.skipped += thread[i].stats.skipped;
+	}
+
+	/* we count only actually executed transactions */
+	ntx = (cur.cnt - cur.skipped) - (plast->cnt - plast->skipped);
+	total_run = (now - thread_start) / 1000000.0;
+	tps = 1000000.0 * ntx / run;
+	if (ntx > 0)
+	{
+		latency = 0.001 * (cur.latency.sum - plast->latency.sum) / ntx;
+		sqlat = 1.0 * (cur.latency.sum2 - plast->latency.sum2) / ntx;
+		stdev = 0.001 * sqrt(sqlat - 1000000.0 * latency * latency);
+		lag = 0.001 * (cur.lag.sum - plast->lag.sum) / ntx;
+	}
+	else
+	{
+		latency = sqlat = stdev = lag = 0;
+	}
+
+	if (progress_timestamp)
+	{
+		/*
+		 * On some platforms the current system timestamp is
+		 * available in now_time, but rather than get entangled
+		 * with that, we just eat the cost of an extra syscall in
+		 * all cases.
+		 */
+		struct timeval tv;
+
+		gettimeofday(&tv, NULL);
+		snprintf(tbuf, sizeof(tbuf), "%ld.%03ld s",
+				 (long) tv.tv_sec, (long) (tv.tv_usec / 1000));
+	}
+	else
+	{
+		/* round seconds are expected, but the thread may be late */
+		snprintf(tbuf, sizeof(tbuf), "%.1f s", total_run);
+	}
+
+	fprintf(stderr,
+			"progress: %s, %.1f tps, lat %.3f ms stddev %.3f",
+			tbuf, tps, latency, stdev);
+
+	if (throttle_delay)
+	{
+		fprintf(stderr, ", lag %.3f ms", lag);
+		if (latency_limit)
+			fprintf(stderr, ", " INT64_FORMAT " skipped",
+					cur.skipped - plast->skipped);
+	}
+	fprintf(stderr, "\n");
+
+	*plast = cur;
+	*plast_report = now;
+}
+
 static void *
 threadRun(void *arg)
 {
@@ -5746,6 +5836,7 @@ threadRun(void *arg)
 	int			nstate = thread->nstate;
 	int			remains = nstate;	/* number of remaining clients */
 	socket_set *sockets = alloc_socket_set(nstate);
+	int			aborted = 0;
 	int			i;
 
 	/* for reporting progress: */
@@ -5975,6 +6066,10 @@ threadRun(void *arg)
 			 */
 			if (st->state == CSTATE_FINISHED || st->state == CSTATE_ABORTED)
 				remains--;
+
+			/* count aborted clients */
+			if (st->state == CSTATE_ABORTED)
+				aborted++;
 		}
 
 		/* progress report is made by thread 0 for all threads */
@@ -5987,93 +6082,15 @@ threadRun(void *arg)
 			now = INSTR_TIME_GET_MICROSEC(now_time);
 			if (now >= next_report)
 			{
-				/* generate and show report */
-				StatsData	cur;
-				int64		run = now - last_report,
-							ntx;
-				double		tps,
-							total_run,
-							latency,
-							sqlat,
-							lag,
-							stdev;
-				char		tbuf[315];
-
-				/*
-				 * Add up the statistics of all threads.
-				 *
-				 * XXX: No locking. There is no guarantee that we get an
-				 * atomic snapshot of the transaction count and latencies, so
-				 * these figures can well be off by a small amount. The
-				 * progress report's purpose is to give a quick overview of
-				 * how the test is going, so that shouldn't matter too much.
-				 * (If a read from a 64-bit integer is not atomic, you might
-				 * get a "torn" read and completely bogus latencies though!)
-				 */
-				initStats(&cur, 0);
-				for (i = 0; i < nthreads; i++)
-				{
-					mergeSimpleStats(&cur.latency, &thread[i].stats.latency);
-					mergeSimpleStats(&cur.lag, &thread[i].stats.lag);
-					cur.cnt += thread[i].stats.cnt;
-					cur.skipped += thread[i].stats.skipped;
-				}
-
-				/* we count only actually executed transactions */
-				ntx = (cur.cnt - cur.skipped) - (last.cnt - last.skipped);
-				total_run = (now - thread_start) / 1000000.0;
-				tps = 1000000.0 * ntx / run;
-				if (ntx > 0)
-				{
-					latency = 0.001 * (cur.latency.sum - last.latency.sum) / ntx;
-					sqlat = 1.0 * (cur.latency.sum2 -
Don't wake up to check trigger file if none is configured
A streaming replica waiting on WAL from the master will wake up every 5
seconds to check for a trigger file. This is pointless if no trigger file
has been configured. The attached patch suppresses the timeout when there
is no trigger file configured.

A minor thing to be sure, but there was a campaign a couple of years ago
to remove such spurious wake-ups, so maybe this change is worthwhile.

I noticed that the existing codebase does not have a consensus on what to
pass to WaitLatch for the timeout when the timeout isn't relevant. I
picked 0, but -1L also has precedent.

Cheers,
Jeff

WAL_sleep_not_triggered.patch
Description: Binary data
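The decision the patch makes can be sketched as a small helper. This is only an illustrative sketch, not the attached patch: `recovery_wakeup_timeout()` is a hypothetical name, and it mirrors the author's choice of 0 as the don't-care timeout value (the actual patch would also drop WL_TIMEOUT from the wake events).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustrative sketch: choose the WaitLatch timeout for a replica waiting
 * on WAL.  Poll every 5 seconds only when a trigger file is configured;
 * otherwise there is nothing to poll for, so no timeout is needed.
 */
static long
recovery_wakeup_timeout(const char *trigger_file)
{
	if (trigger_file != NULL && trigger_file[0] != '\0')
		return 5000L;			/* check for the trigger file every 5s */
	return 0L;					/* no trigger file: timeout not relevant */
}
```

With no trigger file configured, the process then sleeps until its latch is set, instead of waking every 5 seconds for nothing.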
Re: Centralize use of PG_INTXX_MIN/MAX for integer limits
Michael Paquier writes:
> A couple of years ago, 62e2a8dc has introduced in c.h a set of limits
> (to fix some portability issues from 83ff1618) to make the code more
> system-independent. Those are for example PG_INT32_MIN, etc. The core
> code now mixes the internal PG_ limits with the system ones. Would we
> want to unify a bit the whole thing and replace all the SHRT_MIN/MAX,
> LONG_MIN/MAX and such with the internal limit definitions?

I doubt it's really worth the trouble. I did just make such a change in
commit cbdb8b4c0, but it was mostly (a) so that the different ftoiN/dtoiN
functions would look more alike, and (b) because the relevant variables or
result values were actually declared int16, int32, etc. It would be flat
wrong to replace SHRT_MIN or LONG_MIN in a context where it was used to
check whether a value would fit in a variable declared "short" or "long".

> I suspect that the buildfarm does not have any more members where
> sizeof(int) is 2.

I doubt PG has ever been able to run on two-byte-int hardware. Certainly
not in the buildfarm era.

> I am seeing close to 250 places in the core code, most of them for
> INT_MIN and INT_MAX.

You'd really need to look at the associated variables to see whether any
of those would be better off as INT32_MIN/MAX.

			regards, tom lane
Re: [HACKERS] pgbench - allow to store select results into variables
FWIW I think the terminology in this patch is wrong. You use the term "compound" to mean "one query within a string containing multiple queries", but that's not what compound means. Compound is the whole thing, comprised of the multiple queries. Maybe "query" is the right word to use there, not sure. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Copy function for logical replication slots
Hi, On 31/08/2018 07:03, Masahiko Sawada wrote: > > Attached a new version patch incorporated the all comments I got. > This looks pretty reasonable. I am personally not a big fan of the C wrappers for overloaded functions, but that's what we need to do for opr_sanity to pass, so I guess we'll have to use them. The more serious thing is: > + if (MyReplicationSlot) > + ReplicationSlotRelease(); > + > + /* Release the saved slot if exist while preventing double releasing */ > + if (savedslot && savedslot != MyReplicationSlot) This won't work as intended, as ReplicationSlotRelease() will set MyReplicationSlot to NULL; you might need to set aside MyReplicationSlot in yet another temp variable inside this function prior to releasing it. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: pgsql: Add WL_EXIT_ON_PM_DEATH pseudo-event.
Re: Thomas Munro 2018-11-23 > Add WL_EXIT_ON_PM_DEATH pseudo-event. I think this broke something: TRAP: FailedAssertion("!(!IsUnderPostmaster || (wakeEvents & (1 << 5)) || (wakeEvents & (1 << 4)))", File: "/build/postgresql-12-JElZNq/postgresql-12-12~~devel~20181124.1158/build/../src/backend/storage/ipc/latch.c", Line: 389) 2018-11-24 15:20:43.193 CET [17834] LOG: server process (PID 18425) was terminated by signal 6: Aborted I can trigger it just by opening an SSL connection; non-SSL TCP connections are fine. Debian unstable/amd64. Christoph
Re: could not connect to server, in order to operate pgAdmin/PostgreSQL
1. What is your operating system? It looks like Windows, but I'm not sure. Also, what version number for the OS? 2. Look at the URL. You are at 127.0.0.1, which is the same as localhost. But then it says 49194, while Postgres uses 5432 by default. That's why you got that part of the error message, but we need a lot more detail to help you. 3. It might be that your OS is too old to run the more recent versions of Postgres. I don't use Windows anymore, but I know you have to be on Ubuntu 18 to run PG 10+; there's probably something similar in Windows. That is one of the many reasons why knowing your OS is important information. *“None of you has faith until he loves for his brother or his neighbor what he loves for himself.”* On Fri, Nov 23, 2018 at 8:31 PM Anne Marie Harm wrote: > Hello, > > Unfortunately I'm unable to operate pgAdmin/PostgreSQL; first of all I can > only install version 9.5 (tried versions 11, 10, and 9.6 -- but cannot > install). When I launch pgAdmin in order to try to use PostgreSQL 9.5, here > is the full text of the error message I receive (also, screenshot attached): > > could not connect to server: Connection refused (0x274D/10061) Is the > server running on host "localhost" (::1) and accepting TCP/IP connections > on port 5432? could not connect to server: Connection refused > (0x274D/10061) Is the server running on host "localhost" (127.0.0.1) > and accepting TCP/IP connections on port 5432? > > Please advise; thank you. ~ Anne Marie > > > Anne Marie Harm > (312) 563-9397 amharm@earthlink.net https://www.linkedin.com/in/annemarieharm/ > >
Re: [HACKERS] Block level parallel vacuum
On Tue, Oct 30, 2018 at 2:04 PM Masahiko Sawada wrote: > > Attached rebased version patch to the current HEAD. > > > Please apply this patch with the extension lock patch[1] when testing > > as this patch can try to extend visibility map pages concurrently. > > > > Because the patch leads performance degradation in the case where > bulk-loading to a partitioned table I think that the original > proposal, which makes group locking conflict when relation extension > locks, is more realistic approach. So I worked on this with the simple > patch instead of [1]. Attached three patches: > > * 0001 patch publishes some static functions such as > heap_paralellscan_startblock_init so that the parallel vacuum code can > use them. > * 0002 patch makes the group locking conflict when relation extension locks. > * 0003 patch add paralel option to lazy vacuum. > > Please review them. > I can see that you have put a lot of effort into this patch, and still we are not able to make much progress, mainly, I guess, because of the relation extension lock problem. I think we can park that problem for some time (we have already invested quite some time in it), discuss the actual parallel vacuum patch a bit, and then come back to it. I don't know if that is right or not. I am not sure we can make this ready in the PG12 timeframe, but I feel this patch deserves some attention. I have started reading the main parallel vacuum patch, and below are some assorted comments. + + Execute VACUUM in parallel by N + a background workers. Collecting garbage on table is processed + in block-level parallel. For tables with indexes, parallel vacuum assigns each + index to each parallel vacuum worker and all garbages on a index are processed + by particular parallel vacuum worker. The maximum nunber of parallel workers + is . This option can not + use with FULL option. + There are a couple of mistakes in the above para: (a) "..a background workers." The "a" seems redundant. 
(b) "Collecting garbage on table is processed in block-level parallel."/"Collecting garbage on table is processed at block-level in parallel." (c) "For tables with indexes, parallel vacuum assigns each index to each parallel vacuum worker and all garbages on a index are processed by particular parallel vacuum worker." We can rephrase it as: "For tables with indexes, parallel vacuum assigns a worker to each index, and all garbage on an index is processed by that particular parallel vacuum worker." (d) Typo: nunber/number (e) Typo: can not/cannot I have glanced through part of the patch, but didn't find any README or doc containing the design of this patch. I think without having the design in place, it is difficult to review a patch of this size and complexity. To start with, at least explain how the work is distributed among workers: say there are two workers which need to vacuum a table with four indexes, how does it work? How does the leader participate and coordinate the work? Another part you can explain is how the state is maintained during parallel vacuum, something like you are trying to do in the below function: + * lazy_prepare_next_state + * + * Before enter the next state prepare the next state. In parallel lazy vacuum, + * we must wait for the all vacuum workers to finish the previous state before + * preparation. Also, after prepared we change the state ot all vacuum workers + * and wake up them. + */ +static void +lazy_prepare_next_state(LVState *lvstate, LVLeader *lvleader, int next_state) Still other things to explain are how the stats are shared between the leader and workers. I can understand a few things in bits and pieces while glancing through the patch, but it would be easier to understand if you document it in one place. That can help reviewers to understand it. Can you split the patch so that the refactoring you have done in the current code to make it usable by parallel vacuum is a separate patch? +/* + * Vacuum all indexes. 
In parallel vacuum, each workers take indexes + * one by one. Also after vacuumed index they mark it as done. This marking + * is necessary to guarantee that all indexes are vacuumed based on + * the current collected dead tuples. The leader process continues to + * vacuum even if any indexes is not vacuumed completely due to failure of + * parallel worker for whatever reason. The mark will be checked before entering + * the next state. + */ +void +lazy_vacuum_all_indexes(LVState *lvstate) I didn't understand what you want to say here. Do you mean that the leader can continue collecting more dead tuple TIDs while workers are vacuuming the indexes? How does it deal with errors, if any, during index vacuum? + * plan_lazy_vacuum_workers_index_workers + * Use the planner to decide how many parallel worker processes + * VACUUM and autovacuum should request for use + * + * tableOid is the table begin vacuumed which must not be non-tables or + * special system tables. .. + plan_lazy_vacuum_workers(Oid tableOid, int
Centralize use of PG_INTXX_MIN/MAX for integer limits
Hi all, A couple of years ago, 62e2a8dc has introduced in c.h a set of limits (to fix some portability issues from 83ff1618) to make the code more system-independent. Those are for example PG_INT32_MIN, etc. The core code now mixes the internal PG_ limits with the system ones. Would we want to unify a bit the whole thing and replace all the SHRT_MIN/MAX, LONG_MIN/MAX and such with the internal limit definitions? I suspect that the buildfarm does not have any more members where sizeof(int) is 2. I am seeing close to 250 places in the core code, most of them for INT_MIN and INT_MAX. Thoughts? -- Michael
Re: 64-bit hash function for hstore and citext data type
> "Tom" == Tom Lane writes: >>> I'm inclined to fix this in hstoreUpgrade rather than complicate >>> hstore_hash with historical trivia. Also there have been no field >>> complaints - I guess it's unlikely that there is much pg 8.4 hstore >>> data in the wild that anyone wants to hash. >> Changing hstoreUpgrade at this point seems like wasted/misguided effort. Tom> Oh, cancel that --- I was having a momentary brain fade about how Tom> that function is used. I agree your proposal is sensible. Here's what I have queued up to push: -- Andrew (irc:RhodiumToad) >From d5890f49da6a77b1325a3f5822c6b092a2cd41ae Mon Sep 17 00:00:00 2001 From: Andrew Gierth Date: Sat, 24 Nov 2018 09:59:49 + Subject: [PATCH] Fix hstore hash function for empty hstores upgraded from 8.4. Hstore data generated on pg 8.4 and pg_upgraded to current versions remains in its original on-disk format unless modified. The same goes for values generated by the addon hstore-new module on pre-9.0 versions. (The hstoreUpgrade function converts old values on the fly when read in, but the on-disk value is not modified by this.) Since old-format empty hstores (and hstore-new hstores) have representations compatible with the new format, hstoreUpgrade thought it could get away without modifying such values; but this breaks hstore_hash (and the new hstore_hash_extended) which assumes bit-perfect matching between semantically identical hstore values. Only one bit actually differs (the "new version" flag in the count field) but that of course is enough to break the hash. Fix by making hstoreUpgrade unconditionally convert all old values to new format. 
Backpatch all the way, even though this changes a hash value in some cases, because in those cases the hash value is already failing - for example, a hash join between old- and new-format empty hstores will be failing to match, or a hash index on an hstore column containing an old-format empty value will be failing to find the value since it will be searching for a hash derived from a new-format datum. (There are no known field reports of this happening, probably because hashing of hstores has only been useful in limited circumstances and there probably isn't much upgraded data being used this way.) Per concerns arising from discussion of commit eb6f29141be. Original bug is my fault. Discussion: https://postgr.es/m/60b1fd3b-7332-40f0-7e7f-f2f04f47%402ndquadrant.com --- contrib/hstore/hstore_compat.c | 47 +- 1 file changed, 19 insertions(+), 28 deletions(-) diff --git a/contrib/hstore/hstore_compat.c b/contrib/hstore/hstore_compat.c index b95ce9b4aa..1d4e7484e4 100644 --- a/contrib/hstore/hstore_compat.c +++ b/contrib/hstore/hstore_compat.c @@ -238,34 +238,35 @@ hstoreUpgrade(Datum orig) HStore *hs = (HStore *) PG_DETOAST_DATUM(orig); int valid_new; int valid_old; - bool writable; /* Return immediately if no conversion needed */ - if ((hs->size_ & HS_FLAG_NEWVERSION) || - hs->size_ == 0 || + if (hs->size_ & HS_FLAG_NEWVERSION) + return hs; + + /* Do we have a writable copy? If not, make one. */ + if ((void *) hs == (void *) DatumGetPointer(orig)) + hs = (HStore *) PG_DETOAST_DATUM_COPY(orig); + + if (hs->size_ == 0 || (VARSIZE(hs) < 32768 && HSE_ISFIRST((ARRPTR(hs)[0] + { + HS_SETCOUNT(hs, HS_COUNT(hs)); + HS_FIXSIZE(hs, HS_COUNT(hs)); return hs; + } valid_new = hstoreValidNewFormat(hs); valid_old = hstoreValidOldFormat(hs); - /* Do we have a writable copy? 
*/ - writable = ((void *) hs != (void *) DatumGetPointer(orig)); if (!valid_old || hs->size_ == 0) { if (valid_new) { /* - * force the "new version" flag and the correct varlena length, - * but only if we have a writable copy already (which we almost - * always will, since short new-format values won't come through - * here) + * force the "new version" flag and the correct varlena length. */ - if (writable) - { -HS_SETCOUNT(hs, HS_COUNT(hs)); -HS_FIXSIZE(hs, HS_COUNT(hs)); - } + HS_SETCOUNT(hs, HS_COUNT(hs)); + HS_FIXSIZE(hs, HS_COUNT(hs)); return hs; } else @@ -302,15 +303,10 @@ hstoreUpgrade(Datum orig) elog(WARNING, "ambiguous hstore value resolved as hstore-new"); /* - * force the "new version" flag and the correct varlena length, but - * only if we have a writable copy already (which we almost always - * will, since short new-format values won't come through here) + * force the "new version" flag and the correct varlena length. */ - if (writable) - { - HS_SETCOUNT(hs, HS_COUNT(hs)); - HS_FIXSIZE(hs, HS_COUNT(hs)); - } + HS_SETCOUNT(hs, HS_COUNT(hs)); + HS_FIXSIZE(hs, HS_COUNT(hs)); return hs; #else elog(WARNING, "ambiguous hstore value resolved as hstore-old"); @@ -318,13 +314,8 @@ hstoreUpgrade(Datum orig) } /* - * must have an old-style value. Overwrite it in place as a new-style one, - * making sure we have a writable copy first.
Re: FETCH FIRST clause WITH TIES option
Attached is a rebased patch against the current master. Regards, Surafel On Thu, Nov 1, 2018 at 2:28 PM Surafel Temesgen wrote: > hi, > > The attached patch include all the comment given by Tomas and i check sql > standard about LIMIT and this feature > > it did not say anything about it but I think its good idea to include it > to LIMIT too and I will add it if we have consensus on it. > > regards > > surafel > diff --git a/doc/src/sgml/ref/select.sgml b/doc/src/sgml/ref/select.sgml index 4db8142afa..ed7249daeb 100644 --- a/doc/src/sgml/ref/select.sgml +++ b/doc/src/sgml/ref/select.sgml @@ -44,7 +44,7 @@ SELECT [ ALL | DISTINCT [ ON ( expressionexpression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ] [ LIMIT { count | ALL } ] [ OFFSET start [ ROW | ROWS ] ] -[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY ] +[ FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] ] [ FOR { UPDATE | NO KEY UPDATE | SHARE | KEY SHARE } [ OF table_name [, ...] ] [ NOWAIT | SKIP LOCKED ] [...] ] where from_item can be one of: @@ -1397,7 +1397,7 @@ OFFSET start which PostgreSQL also supports. It is: OFFSET start { ROW | ROWS } -FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } ONLY +FETCH { FIRST | NEXT } [ count ] { ROW | ROWS } [ ONLY | WITH TIES ] In this syntax, the start or count value is required by @@ -1407,7 +1407,10 @@ FETCH { FIRST | NEXT } [ count ] { ambiguity. If count is omitted in a FETCH clause, it defaults to 1. -ROW +ROW . +WITH TIES option is used to return two or more rows +that tie for last place in the limit results set according to ORDER BY +clause (ORDER BY clause must be specified in this case). and ROWS as well as FIRST and NEXT are noise words that don't influence the effects of these clauses. 
diff --git a/src/backend/executor/nodeLimit.c b/src/backend/executor/nodeLimit.c index 6792f9e86c..4fc3f61b26 100644 --- a/src/backend/executor/nodeLimit.c +++ b/src/backend/executor/nodeLimit.c @@ -41,6 +41,7 @@ static TupleTableSlot * /* return: a tuple or NULL */ ExecLimit(PlanState *pstate) { LimitState *node = castNode(LimitState, pstate); + ExprContext *econtext = node->ps.ps_ExprContext; ScanDirection direction; TupleTableSlot *slot; PlanState *outerPlan; @@ -131,7 +132,8 @@ ExecLimit(PlanState *pstate) * the state machine state to record having done so. */ if (!node->noCount && - node->position - node->offset >= node->count) + node->position - node->offset >= node->count && + node->limitOption == WITH_ONLY) { node->lstate = LIMIT_WINDOWEND; @@ -145,17 +147,64 @@ ExecLimit(PlanState *pstate) return NULL; } -/* - * Get next tuple from subplan, if any. - */ -slot = ExecProcNode(outerPlan); -if (TupIsNull(slot)) +else if (!node->noCount && + node->position - node->offset >= node->count && + node->limitOption == WITH_TIES) { - node->lstate = LIMIT_SUBPLANEOF; - return NULL; + /* + * Get next tuple from subplan, if any. + */ + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + { + node->lstate = LIMIT_SUBPLANEOF; + return NULL; + } + /* + * Test if the new tuple and the last tuple match. + * If so we return the tuple. + */ + econtext->ecxt_innertuple = slot; + econtext->ecxt_outertuple = node->last_slot; + if (ExecQualAndReset(node->eqfunction, econtext)) + { + ExecCopySlot(node->last_slot, slot); + node->subSlot = slot; + node->position++; + } + else + { + node->lstate = LIMIT_WINDOWEND; + + /* + * If we know we won't need to back up, we can release + * resources at this point. + */ + if (!(node->ps.state->es_top_eflags & EXEC_FLAG_BACKWARD)) + (void) ExecShutdownNode(outerPlan); + + return NULL; + } + +} +else +{ + /* + * Get next tuple from subplan, if any. 
+ */ + slot = ExecProcNode(outerPlan); + if (TupIsNull(slot)) + { + node->lstate = LIMIT_SUBPLANEOF; + return NULL; + } + if (node->limitOption == WITH_TIES) + { + ExecCopySlot(node->last_slot, slot); + } + node->subSlot = slot; + node->position++; } -node->subSlot = slot; -node->position++; } else { @@ -311,7 +360,8 @@ recompute_limits(LimitState *node) * must update the child node anyway, in case this is a rescan and the * previous time we got a different result. */ - ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node)); + if(node->limitOption == WITH_ONLY) + ExecSetTupleBound(compute_tuples_needed(node), outerPlanState(node)); } /* @@ -374,6 +424,7 @@ ExecInitLimit(Limit *node, EState *estate, int eflags) (PlanState *) limitstate); limitstate->limitCount = ExecInitExpr((Expr *) node->limitCount,
Re: COPY FROM WHEN condition
On Sat, 24 Nov 2018 at 02:09, Tomas Vondra wrote: > On 11/23/18 12:14 PM, Surafel Temesgen wrote: > > On Sun, Nov 11, 2018 at 11:59 PM Tomas Vondra wrote: > > So, what about using FILTER here? We already use it for aggregates when > > filtering rows to process. > > > > i think its good idea and describe its purpose more. Attache is a > > patch that use FILTER instead > FWIW, I vote for just using WHERE here. Right now we have 2 syntaxes for filtering rows in queries, both of which use WHERE immediately before the condition: 1). SELECT ... FROM ... WHERE condition 2). SELECT agg_fn FILTER (WHERE condition) FROM ... I'm not a huge fan of (2), but that's the SQL standard, so we're stuck with it. There's a certain consistency in its use of WHERE to introduce the condition, and preceding that with FILTER helps to distinguish it from any later WHERE clause. But what you'd be adding here would be a 3rd syntax: 3). COPY ... FROM ... FILTER condition which IMO will just lead to confusion. Regards, Dean
Re: pgbench - doCustom cleanup
Hello Alvaro, I just pushed this. I hope not to have upset you too much with the subroutine thing. Sorry for the feedback delay, my week was kind of overloaded… Thanks for the push. About the patch you committed, a post-commit review: - the state and function renamings are indeed a good thing. - I'm not fond of "now = func(..., now)", I'd have just passed a reference. - I'd put out the meta commands, but keep the SQL case and the state assignment in the initial function, so that all state changes are in one function… which was the initial aim of the submission. Kind of a compromise:-) See the attached followup patch which implements these suggestions. The patch is quite small under "git diff -w", but larger because of the reindentation as one "if" level is removed. -- Fabien.diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c index c64e16187a..7392cf8688 100644 --- a/src/bin/pgbench/pgbench.c +++ b/src/bin/pgbench/pgbench.c @@ -580,8 +580,7 @@ static void setIntValue(PgBenchValue *pv, int64 ival); static void setDoubleValue(PgBenchValue *pv, double dval); static bool evaluateExpr(TState *thread, CState *st, PgBenchExpr *expr, PgBenchValue *retval); -static instr_time doExecuteCommand(TState *thread, CState *st, - instr_time now); +static ConnectionStateEnum executeMetaCommand(TState *thread, CState *st, instr_time *now); static void doLog(TState *thread, CState *st, StatsData *agg, bool skipped, double latency, double lag); static void processXactStats(TState *thread, CState *st, instr_time *now, @@ -2862,6 +2861,7 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) */ for (;;) { + Command *command; PGresult *res; switch (st->state) @@ -3003,8 +3003,10 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) * Send a command to server (or execute a meta-command) */ case CSTATE_START_COMMAND: +command = sql_script[st->use_file].commands[st->command]; + /* Transition to script end processing if done */ -if 
(sql_script[st->use_file].commands[st->command] == NULL) +if (command == NULL) { st->state = CSTATE_END_TX; break; @@ -3016,7 +3018,27 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) INSTR_TIME_SET_CURRENT_LAZY(now); st->stmt_begin = now; } -now = doExecuteCommand(thread, st, now); + +if (command->type == SQL_COMMAND) +{ + if (!sendCommand(st, command)) + { + commandFailed(st, "SQL", "SQL command send failed"); + st->state = CSTATE_ABORTED; + } + else + st->state = CSTATE_WAIT_RESULT; +} +else if (command->type == META_COMMAND) +{ + /*- + * Possible state changes when executing meta commands: + * - on errors CSTATE_ABORTED + * - on sleep CSTATE_SLEEP + * - else CSTATE_END_COMMAND + */ + st->state = executeMetaCommand(thread, st, ); +} /* * We're now waiting for an SQL command to complete, or @@ -3254,178 +3276,151 @@ advanceConnectionState(TState *thread, CState *st, StatsData *agg) /* * Subroutine for advanceConnectionState -- execute or initiate the current - * command, and transition to next state appropriately. - * - * Returns an updated timestamp from 'now', used to update 'now' at callsite. + * meta command, and return to next state appropriately. 
*/ -static instr_time -doExecuteCommand(TState *thread, CState *st, instr_time now) +static ConnectionStateEnum +executeMetaCommand(TState *thread, CState *st, instr_time *now) { Command*command = sql_script[st->use_file].commands[st->command]; + int argc; + char **argv; - /* execute the command */ - if (command->type == SQL_COMMAND) + Assert(command != NULL && command->type == META_COMMAND); + + argc = command->argc; + argv = command->argv; + + if (debug) { - if (!sendCommand(st, command)) - { - commandFailed(st, "SQL", "SQL command send failed"); - st->state = CSTATE_ABORTED; - } - else - st->state = CSTATE_WAIT_RESULT; + fprintf(stderr, "client %d executing \\%s", st->id, argv[0]); + for (int i = 1; i < argc; i++) + fprintf(stderr, " %s", argv[i]); + fprintf(stderr, "\n"); } - else if (command->type == META_COMMAND) + + if (command->meta == META_SLEEP) { - int argc = command->argc; - char **argv = command->argv; - - if (debug) - { - fprintf(stderr, "client %d executing \\%s", - st->id, argv[0]); - for (int i = 1; i < argc; i++) -fprintf(stderr, " %s", argv[i]); - fprintf(stderr, "\n"); - } - - if (command->meta == META_SLEEP) - { - int usec; - - /* - * A \sleep doesn't execute anything, we just get the delay from - * the argument, and enter the CSTATE_SLEEP state. (The - * per-command latency will be recorded in CSTATE_SLEEP state, not - * here, after the delay has elapsed.) - */ - if (!evaluateSleep(st, argc, argv, )) - { -commandFailed(st, "sleep",