Re: utilities to rebuild commit logs from wal
On Fri, Jun 22, 2018 at 10:49:58AM +0200, Chris Travers wrote:
> Before we reinvent the wheel here, does anyone know of utilities to build
> commit logs from wal segments?  Or even to just fill with commits?
>
> I figure it is worth asking before I write one.

Uh, what is a commit log?  pg_waldump output?

-- 
  Bruce Momjian   http://momjian.us
  EnterpriseDB    http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Re: Incorrect errno used with %m for backend code
On Sat, Jun 23, 2018 at 6:43 PM, Michael Paquier wrote:
> On Fri, Jun 22, 2018 at 03:45:33PM +0530, Ashutosh Sharma wrote:
>> Okay, thanks for the confirmation.  A few of them are also there in the
>> origin.c and snapbuild.c files.
>
> Thanks Ashutosh.  I have been reviewing the whole tree and I found more
> places where this is missing, like rewriteheap.c, reorderbuffer.c or
> pg_basebackup, which gives the attached.

Okay, I too had a quick look at the source code to see if there are still
some places where we might have missed setting errno to ENOSPC when a
write system call fails, but I couldn't find any such place in the code.
The v2 version of the patch looks good to me.

So, to conclude, the v2 patch fixes two things: 1) it makes ereport print
the correct error number (the one that matches the error message), and
2) it sets errno to ENOSPC (assuming the problem is lack of disk space)
if the write system call fails without setting errno.

--
With Regards,
Ashutosh Sharma
EnterpriseDB: http://www.enterprisedb.com
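The convention being discussed can be sketched outside the backend.  This is an illustrative Python sketch (not PostgreSQL code, names are mine): if write() reports a short count without setting errno, assume the disk is full and report ENOSPC so the logged message matches the real failure.

```python
import errno
import os

def write_all(fd, data):
    """Write all of `data` to fd, mapping a short write with no errno
    to ENOSPC.

    Mirrors the convention discussed above: some platforms return a
    short count from write() without setting errno, and the usual
    culprit is a full disk, so we supply ENOSPC in that case.
    (Illustrative sketch only, not PostgreSQL code.)
    """
    written = os.write(fd, data)
    if written != len(data):
        # A short write usually means the disk filled up; write() may
        # not have set errno, so set ENOSPC explicitly before reporting.
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
    return written
```

The point of forcing ENOSPC is exactly the one made above: without it, a later ereport with %m would print whatever stale errno happened to be lying around.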
Re: Keeping temporary tables in shared buffers
On Thu, Jun 21, 2018 at 07:42:54AM +0530, Amit Kapila wrote:
> On Wed, Jun 20, 2018 at 8:47 PM, Bruce Momjian wrote:
> > On Sat, Jun 2, 2018 at 05:18:17PM -0400, Asim Praveen wrote:
> >> Hi Amit
> >>
> >> On Mon, May 28, 2018 at 4:25 AM, Amit Kapila wrote:
> >> >
> >> > This is one way, but I think there are other choices as well.  We can
> >> > identify and flush all the dirty (local) buffers for the relation
> >> > being accessed parallelly.  Now, once the parallel operation is
> >> > started, we won't allow performing any write operation on them.  It
> >>
> >> We talked about this in person in Ottawa and it was great meeting you!
> >> To summarize, the above proposal to continue using local buffers for
> >> temp tables is a step forward; however, it enables only certain kinds
> >> of queries to be parallelized for temp tables.  E.g. queries changing
> >> a temp table in any way cannot be parallelized due to the restriction
> >> of no writes during parallel operation.
> >
> > Should this be a TODO item?
>
> +1.  I think we have not hammered out the design completely, but if
> somebody is willing to put in the effort, it is not an unsolvable
> problem.  AFAIU, this thread is about parallelizing queries that refer
> to temp tables; however, that is not clear from the title of this
> thread.

Seems it is already documented on the wiki:

https://wiki.postgresql.org/wiki/Parallel_Query#What_Parts_of_a_Query_Can_Run_In_Parallel.3F

  o Scans of plain tables may not appear below Gather if (1) they are
    temporary tables ...

-- 
  Bruce Momjian   http://momjian.us
  EnterpriseDB    http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +
Re: comma to delimit fractional seconds
Chris Howard writes: > I looked at the TODO list and saw the issue of using comma to delimit > fractional seconds. ... > Is there anything more to it than allowing HH:MM:SS,### as well as > HH:MM:SS.# ? Here's the thing: most of the easy-looking stuff on the TODO list is there because there's not actually consensus about how or whether to do it. In the case at hand, what you'd need to convince people of is that there's not a significant penalty in terms of robustness (error detection) if we allow commas to substitute for periods. There's a bunch of existing use-cases that depend on commas being noise, so it's not obvious that this is OK. regards, tom lane
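To make the proposal concrete, here is a toy sketch (hypothetical Python, not PostgreSQL's datetime code) of the relaxed rule being asked about: accept either '.' or ',' between seconds and fractional seconds, as ISO 8601 permits.  The robustness question above is precisely whether accepting the comma form weakens error detection elsewhere.

```python
import re

# Hypothetical sketch of the relaxed rule: allow either '.' or ',' as
# the fractional-seconds separator in HH:MM:SS literals.
TIME_RE = re.compile(r"^(\d{2}):(\d{2}):(\d{2})(?:[.,](\d+))?$")

def parse_time(text):
    """Parse HH:MM:SS[.frac] or HH:MM:SS[,frac]; return seconds since
    midnight as a float."""
    m = TIME_RE.match(text)
    if m is None:
        raise ValueError("invalid time literal: %r" % text)
    h, mnt, s, frac = m.groups()
    seconds = int(h) * 3600 + int(mnt) * 60 + int(s)
    fraction = int(frac) / 10 ** len(frac) if frac else 0.0
    return seconds + fraction
```

In a toy grammar this is a one-character change; the hard part, as noted above, is that in the real parser commas currently act as noise/separators in other date-time contexts, so the two roles can conflict.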
Re: pg_upgrade from 9.4 to 10.4
On Fri, Jun 15, 2018 at 03:01:37PM -0700, Vimalraj A wrote:
> Hi,
>
> After pg_upgrade, the data in Postgres is corrupted.
>
> 1. Postgres 9.4, stop db with "immediate" mode
> 2. Run pg_upgrade, to upgrade to Postgres 10.4
> 3. Start Postgres 10 and run vacuum full, got a bunch of "concurrent
>    insert in progress".  Looks like the data is corrupted.  (Loading the
>    old data back on Postgres 9.4 works just fine)
>
> But if I stop the 9.4 DB with "smart" mode, the data after pg_upgrade
> looks fine.
>
> pg_upgrade doesn't stop or throw warnings if the upgraded db gets into a
> corrupted state.
>
> I would like to understand why it happens so.
> 1. What transient state corrupts the db?
> 2. Is it a known issue with pg_upgrade?
> 3. Is there a way to get the data from pg_upgrade after "immediate" mode
>    stop of previous version?

Well, that's interesting.  We document to shut down the old and new
server with pg_ctl stop, but don't specify to avoid immediate.  The
reason you are having problems is that pg_upgrade does not copy the WAL
from the old cluster to the new one, so there is no way to replay the
needed WAL during startup of the new server, which leads to corruption.
Did you find this out in testing or in actual use?

What is also interesting is how pg_upgrade tries to avoid problems with
_crash_ shutdowns --- if it sees a postmaster lock file, it tries to
start the server, and if that works, it then stops it, causing the WAL
to be replayed and cleanly shut down.  What it _doesn't_ handle is
pg_ctl -m immediate, which removes the lock file but does leave WAL in
need of replay.  Oops!

Ideally we could detect this before we check pg_controldata and then do
the start/stop trick to fix the WAL, but the ordering of the code makes
that hard.  Instead, I have developed the attached patch, which checks
the cluster state at the same time we are checking pg_controldata, and
reports an error if there was not a clean shutdown.
Based on how rare this is, this is probably the cleanest solution, and I
think it can be backpatched.  Patch attached.

-- 
  Bruce Momjian   http://momjian.us
  EnterpriseDB    http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +

diff --git a/src/bin/pg_upgrade/controldata.c b/src/bin/pg_upgrade/controldata.c
new file mode 100644
index 0fe98a5..bba3b1b
*** a/src/bin/pg_upgrade/controldata.c
--- b/src/bin/pg_upgrade/controldata.c
*** get_control_data(ClusterInfo *cluster, b
*** 58,63 ****
--- 58,64 ----
  	bool		got_large_object = false;
  	bool		got_date_is_int = false;
  	bool		got_data_checksum_version = false;
+ 	bool		got_cluster_state = false;
  	char	   *lc_collate = NULL;
  	char	   *lc_ctype = NULL;
  	char	   *lc_monetary = NULL;
*** get_control_data(ClusterInfo *cluster, b
*** 423,428 ****
--- 424,487 ----
  	pclose(output);
  
  	/*
+ 	 * Check for clean shutdown
+ 	 */
+ 
+ 	/* only pg_controldata outputs the cluster state */
+ 	snprintf(cmd, sizeof(cmd), "\"%s/pg_controldata\" \"%s\"",
+ 			 cluster->bindir, cluster->pgdata);
+ 	fflush(stdout);
+ 	fflush(stderr);
+ 
+ 	if ((output = popen(cmd, "r")) == NULL)
+ 		pg_fatal("could not get control data using %s: %s\n",
+ 				 cmd, strerror(errno));
+ 
+ 	/* we have the result of cmd in "output". so parse it line by line now */
+ 	while (fgets(bufin, sizeof(bufin), output))
+ 	{
+ 		if ((!live_check || cluster == _cluster) &&
+ 			(p = strstr(bufin, "Database cluster state:")) != NULL)
+ 		{
+ 			p = strchr(p, ':');
+ 
+ 			if (p == NULL || strlen(p) <= 1)
+ 				pg_fatal("%d: database cluster state problem\n", __LINE__);
+ 
+ 			p++;				/* remove ':' char */
+ 
+ 			/*
+ 			 * We checked earlier for a postmaster lock file, and if we
+ 			 * found one, we tried to start/stop the server to replay the
+ 			 * WAL.  However, pg_ctl -m immediate doesn't leave a lock
+ 			 * file, but does require WAL replay, so we check here that
+ 			 * the server was shut down cleanly, from the controldata
+ 			 * perspective.
+ 			 */
+ 			/* remove leading spaces */
+ 			while (*p == ' ')
+ 				p++;
+ 
+ 			if (strcmp(p, "shut down\n") != 0)
+ 			{
+ 				if (cluster == _cluster)
+ 					pg_fatal("The source cluster was not shut down cleanly.\n");
+ 				else
+ 					pg_fatal("The target cluster was not shut down cleanly.\n");
+ 			}
+ 			got_cluster_state = true;
+ 		}
+ 	}
+ 
+ 	pclose(output);
+ 
+ 	if (!got_cluster_state)
+ 	{
+ 		if (cluster == _cluster)
+ 			pg_fatal("The source cluster lacks cluster state information:\n");
+ 		else
+ 			pg_fatal("The target cluster lacks cluster state information:\n");
+ 	}
+ 
+ 	/*
  	 * Restore environment variables
  	 */
  	pg_putenv("LC_COLLATE", lc_collate);
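In outline, the check added above boils down to parsing pg_controldata output for the state line and requiring a clean shutdown.  A hypothetical Python sketch of the same logic (function name and handling are illustrative, not from the patch):

```python
def cluster_cleanly_shut_down(controldata_output):
    """Return True iff pg_controldata output reports a clean shutdown.

    Illustrative sketch of the check pg_upgrade performs; a state of
    "shut down" means no WAL replay is pending, which is the only safe
    starting point since pg_upgrade does not carry WAL over.
    """
    for line in controldata_output.splitlines():
        if line.startswith("Database cluster state:"):
            state = line.split(":", 1)[1].strip()
            return state == "shut down"
    # No state line at all: treat as not clean, analogous to the
    # patch's "lacks cluster state information" fatal error.
    return False
```

Anything else ("in production", "shut down in recovery", "in crash recovery", ...) means WAL replay would be needed, which pg_upgrade cannot provide.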
Re: WIP: BRIN multi-range indexes
Hi,

Attached is a rebased version of this BRIN patch series, fixing mostly
the breakage due to 372728b0 (aka initial-catalog-data format changes).
As the 2018-07 CF is meant for almost-ready patches, this is more
2018-09 material.  But perhaps someone would like to take a look - and
I'd have to fix it anyway ...

At the pgcon dev meeting I suggested that long-running patches should
have a "summary" post once in a while, so that reviewers don't have to
reread the whole thread and follow all the various discussions.  So let
me start with this thread, although it's not a particularly long or
complex one, nor does it have a long discussion.  But anyway ...

The patches introduce two new BRIN opclasses - minmax-multi and bloom.

minmax-multi
============

minmax-multi is a variant of the current minmax opclass that handles
cases where the plain minmax opclass degrades due to outlier values.
Imagine almost perfectly correlated data (say, timestamps in a log
table) - that works great with regular minmax indexes.  But if you go
and delete a bunch of historical messages (for whatever reason), new
rows with new timestamps will be routed to the empty space and the
minmax indexes will degrade because the ranges will get much "wider"
due to the new values.

The minmax-multi indexes deal with that by maintaining not a single
minmax range, but several of them.  That allows tracking the outlier
values separately, without constructing one wide minmax range.

Consider this artificial example:

  create table t (a bigint, b int);
  alter table t set (fillfactor=95);

  insert into t select i + 1000*random(), i + 1000*random()
    from generate_series(1,1) s(i);

  update t set a = 1, b = 1 where random() < 0.001;
  update t set a = 1, b = 1 where random() < 0.001;

Now if you create a regular minmax index, it's going to perform
terribly, because pretty much every minmax range is [1,1] thanks to the
update of 0.1% of rows.
  create index on t using brin (a);

  explain analyze select * from t
   where a between 1923300::int and 1923600::int;

                               QUERY PLAN
  ----------------------------------------------------------------------
   Bitmap Heap Scan on t  (cost=75.11..75884.45 rows=319 width=12)
                  (actual time=948.906..101739.892 rows=308 loops=1)
     Recheck Cond: ((a >= 1923300) AND (a <= 1923600))
     Rows Removed by Index Recheck: 9692
     Heap Blocks: lossy=568182
     ->  Bitmap Index Scan on t_a_idx
                  (cost=0.00..75.03 rows=22587 width=0)
                  (actual time=89.357..89.357 rows=5681920 loops=1)
           Index Cond: ((a >= 1923300) AND (a <= 1923600))
   Planning Time: 2.161 ms
   Execution Time: 101740.776 ms
  (8 rows)

But with the minmax-multi opclass, this is not an issue:

  create index on t using brin (a int8_minmax_multi_ops);

                               QUERY PLAN
  ----------------------------------------------------------------------
   Bitmap Heap Scan on t  (cost=1067.11..76876.45 rows=319 width=12)
                  (actual time=38.906..49.763 rows=308 loops=1)
     Recheck Cond: ((a >= 1923300) AND (a <= 1923600))
     Rows Removed by Index Recheck: 0
     Heap Blocks: lossy=128
     ->  Bitmap Index Scan on t_a_idx
                  (cost=0.00..1067.03 rows=22587 width=0)
                  (actual time=28.069..28.069 rows=1280 loops=1)
           Index Cond: ((a >= 1923300) AND (a <= 1923600))
   Planning Time: 1.715 ms
   Execution Time: 50.866 ms
  (8 rows)

Which is clearly a big improvement.

Doing this required some changes to how BRIN evaluates conditions on
page ranges.  With a single minmax range it was enough to evaluate them
one by one, but minmax-multi needs to see all of them at once (to match
them against the partial ranges).

Most of the complexity is in building the summary, particularly picking
which values (partial ranges) to merge.  The max number of values in the
summary is specified as the values_per_range index reloption, and by
default it's set to 64, so there can be either 64 points or 32 intervals
or some combination of those.  I've been thinking about some automated
way to tune this (either globally or for each page range independently),
but so far I have not been very successful.
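The merge step can be illustrated with a toy sketch (hypothetical Python, not the patch's code): keep a sorted list of ranges and, on overflow, merge the two adjacent ranges separated by the smallest gap.  Merging the closest pair is what keeps outliers isolated in their own narrow ranges instead of widening everything.

```python
def add_value(ranges, value, max_ranges=32):
    """Add `value` to a sorted list of (lo, hi) ranges, merging the
    closest pair when the summary overflows.  Illustrative sketch of the
    minmax-multi idea, not backend code."""
    for lo, hi in ranges:
        if lo <= value <= hi:
            return ranges                # already covered by the summary
    ranges = sorted(ranges + [(value, value)])
    if len(ranges) > max_ranges:
        # find the pair of adjacent ranges with the smallest gap
        gaps = [ranges[i + 1][0] - ranges[i][1]
                for i in range(len(ranges) - 1)]
        i = gaps.index(min(gaps))
        merged = (ranges[i][0], ranges[i + 1][1])
        ranges = ranges[:i] + [merged] + ranges[i + 2:]
    return ranges
```

With max_ranges=2, summarizing 1, 2, 10, 1000000 ends up as [(1, 10), (1000000, 1000000)]: the outlier stays in its own point range, so a scan for values near 1000 rejects the block range, which is exactly the behavior the EXPLAIN output above demonstrates.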
The challenge is that making good decisions requires global information
about values in the column (e.g. global minimum and maximum).  I think
the reloption with 64 as a default is a good enough solution for now.
Perhaps the stats from pg_statistic would be useful for improving this
in the future, but I'm not sure.

bloom
=====

As the name suggests, this opclass uses a bloom filter for the summary.
Compared to minmax-multi it's a bit more of an experimental idea, but I
believe the foundations are safe.  Using a bloom filter means that the
index can only support equalities, but for many use cases that's an
acceptable limitation - UUID, IP addresses, ... (various identifiers in
general).

Of course, how to size the bloom filter? It's
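The sizing question has a standard textbook answer, sketched here in hypothetical Python (not the patch's code): for n expected items and target false-positive rate p, use m = -n·ln(p)/ln(2)² bits and k = (m/n)·ln(2) hash functions.

```python
import hashlib
import math

class BloomFilter:
    """Toy bloom filter using the standard sizing formulas.
    Illustrative sketch only - the BRIN opclass has its own on-disk
    representation and hashing."""

    def __init__(self, n_items, fp_rate):
        # optimal number of bits and hash functions for n items at rate p
        self.m = max(8, int(-n_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / n_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, value):
        # derive k positions by salting a single hash function
        for i in range(self.k):
            h = hashlib.sha256(b"%d:%s" % (i, value)).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # no false negatives; false positives at roughly the target rate
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(value))
```

This also shows why only equality can be supported: the filter answers "was this exact value inserted?", so range conditions cannot consult it.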
Re: [GSoC] array processing questions
> "Charles" == Charles Cui writes: Charles> Hi mentors and hackers, Charles> One question about array processing in postgresql. Assume Charles> my input is a Datum (which contains an array). Are there any Charles> examples to loop through the array and operates on each Charles> element? feel like it is not that straight-ford. See array_iter_setup and array_iter_next -- Andrew (irc:RhodiumToad)
[GSoC] working status
Hi mentors and hackers,

Here is my current working status.  I have resolved all the warnings
found by Aleksander previously.  I have two threads of work in parallel:
one is the thrift binary type implementation, the other is the thrift
compact byte interface implementation.  For both, the simple data types
have been implemented and tested, but I still need time to focus on
complicated data types like struct, map or list.  Let me know if you
have any questions.  Here is the repo with the latest updates:

https://github.com/charles-cui/pg_thrift

Thanks, Charles.
Re: pg11b1 from outside a txn: "VACUUM cannot run inside a transaction block": should be: ...or multi-command string
> "Justin" == Justin Pryzby writes: Justin> I couldn't figure out how to \set VERBOSITY verbose inside a Justin> psql command (??), psql -v VERBOSITY=verbose -c 'query here' -- Andrew (irc:RhodiumToad)
[GSoC] array processing questions
Hi mentors and hackers,

One question about array processing in PostgreSQL.  Assume my input is
a Datum (which contains an array).  Are there any examples of looping
through the array and operating on each element?  It feels like it is
not that straightforward.

Thanks,
Charles
pg11b1 from outside a txn: "VACUUM cannot run inside a transaction block": should be: ...or multi-command string
in pg10:

ts=# begin;VACUUM FULL pg_toast.pg_toast_2619;
BEGIN
ERROR:  25001: VACUUM cannot run inside a transaction block
LOCATION:  PreventTransactionChain, xact.c:3167
=> sounds fine

$ psql postgres -c 'SELECT 1; VACUUM pg_statistic'
ERROR:  VACUUM cannot be executed from a function or multi-command string
=> sounds fine

In pg11b1:

pryzbyj=# begin;VACUUM FULL pg_toast.pg_toast_2619;
BEGIN
ERROR:  25001: VACUUM cannot run inside a transaction block
LOCATION:  PreventInTransactionBlock, xact.c:3163
=> sounds fine

[pryzbyj@dev ~]$ psql -c 'SELECT 1; VACUUM pg_statistic'
ERROR:  VACUUM cannot run inside a transaction block
=> Error message seems off??

I couldn't figure out how to \set VERBOSITY verbose inside a psql
command (??), but strace shows for v10:

SERROR\0VERROR\0C25001\0MVACUUM cannot be executed from a function or multi-command string\0Fxact.c\0L3187\0RPreventTransactionChain

And for v11:

SERROR\0VERROR\0C25001\0MVACUUM cannot run inside a transaction block\0Fxact.c\0L3163\0RPreventInTransactionBlock

The function was renamed by commit 04700b685f31508036456bea4d92533e5ceee9d6.
So the behavior change was maybe caused by
6eb52da3948dc8bc7c8a61cbacac14823b670c58 ?

Justin
Re: Does logical replication supports cross platform servers?
On Wed, Jun 13, 2018 at 11:42:14AM +1000, Haribabu Kommi wrote: > > On Tue, Jun 12, 2018 at 1:29 PM Craig Ringer wrote: > > On 12 June 2018 at 11:04, Haribabu Kommi wrote: > > Hi All, > > I am not able to find any docs suggesting that PostgreSQL logical > replication supports > cross platform servers (windows --> Linux or vice-versa). > > > > It should. I don't think there's buildfarm coverage yet, though. > > > Thanks for the confirmation. > This is a good use case that must be explicitly mentioned in the docs. > How about the attached patch? OK, doc patch added to git head. Let me know if I should back patch it further. Thanks. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Log query parameters for terminated execute
2018-06-23 21:54 GMT+02:00 Sergei Kornilov:
> Hello all
> We already have a feature for logging query parameters.  If we use
> log_statement = 'all' we write parameters before execution and all is
> fine here.  If we use log_min_duration_statement the statement is
> obviously logged after execution, but currently we have no parameters
> if the query was terminated by statement_timeout, lock_timeout or by
> pg_terminate_backend.
>
> I would like to have parameters in the logs at least for these three
> cases.

It is a good idea - many times I would have liked to see these values.

Regards

Pavel

> A simple way to achieve this is just to add errdetail_params to such
> ereports, as in the attached patch.  Another way is to add something
> like printing the global variable debug_query_string in
> send_message_to_server_log (src/backend/utils/error/elog.c).  But I
> have no good idea how to print ParamListInfo correctly.  We can not
> use OidOutputFunctionCall in all cases, right?
>
> Another small question is why errdetail_params uses errdetail instead
> of errdetail_log?  We assume that the user wants to get their own
> parameters back (if he sets client_min_messages to LOG)?
>
> Any feedback is strongly appreciated.  Thank you!
>
> regards, Sergei
Retrieve memory size allocated by libpq
Hello,

I would like to be able to retrieve the size of memory internally
allocated by libpq for a result.  The reason is that we have a Ruby
wrapper that exposes libpq in Ruby.  The problem is that Ruby's GC
doesn't know how much memory has been allocated by libpq, so no pressure
is applied to the GC when it should be.  With this function we could
instruct the GC about the memory usage associated with each result
object.

This issue has already been discussed in the following thread, with the
request to use custom malloc/realloc/free functions:

https://www.postgresql.org/message-id/flat/20170828172834.GA71455%40TC.local#20170828172834.GA71455@TC.local

Retrieving the allocated memory size is another approach to solving the
same base issue.  However, since the relation between memory consumption
and the particular result object is maintained, it can additionally be
used to provide diagnostic information for each object.

What do you think about adding such a function?

-- 
Kind Regards,
Lars

From d3ac8089a1b8c26d29d8d8e93c48a892cec75e53 Mon Sep 17 00:00:00 2001
From: Lars Kanis
Date: Sat, 23 Jun 2018 19:34:11 +0200
Subject: [PATCH] libpq: Add function PQresultMemsize()

This function retrieves the number of bytes allocated for a given
result.  That can be used to instruct the GC about the memory consumed
behind a wrapping object and for diagnosing memory consumption.
This is an alternative approach to customizable malloc/realloc/free
functions as discussed here:
https://www.postgresql.org/message-id/flat/20170828172834.GA71455%40TC.local#20170828172834.GA71455@TC.local
---
 src/interfaces/libpq/exports.txt |  1 +
 src/interfaces/libpq/fe-exec.c   | 14 +-
 src/interfaces/libpq/libpq-fe.h  |  1 +
 src/interfaces/libpq/libpq-int.h |  2 ++
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/src/interfaces/libpq/exports.txt b/src/interfaces/libpq/exports.txt
index d6a38d0df8..0b50dddbb7 100644
--- a/src/interfaces/libpq/exports.txt
+++ b/src/interfaces/libpq/exports.txt
@@ -172,3 +172,4 @@ PQsslAttribute            169
 PQsetErrorContextVisibility 170
 PQresultVerboseErrorMessage 171
 PQencryptPasswordConn     172
+PQresultMemsize           173
diff --git a/src/interfaces/libpq/fe-exec.c b/src/interfaces/libpq/fe-exec.c
index 4c0114c514..064c7a693c 100644
--- a/src/interfaces/libpq/fe-exec.c
+++ b/src/interfaces/libpq/fe-exec.c
@@ -166,6 +166,7 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	result->curBlock = NULL;
 	result->curOffset = 0;
 	result->spaceLeft = 0;
+	result->memsize = sizeof(PGresult);
 
 	if (conn)
 	{
@@ -215,6 +216,12 @@ PQmakeEmptyPGresult(PGconn *conn, ExecStatusType status)
 	return result;
 }
 
+size_t
+PQresultMemsize(const PGresult *res)
+{
+	return res->memsize;
+}
+
 /*
  * PQsetResultAttrs
  *
@@ -567,9 +574,11 @@ pqResultAlloc(PGresult *res, size_t nBytes, bool isBinary)
 	 */
 	if (nBytes >= PGRESULT_SEP_ALLOC_THRESHOLD)
 	{
-		block = (PGresult_data *) malloc(nBytes + PGRESULT_BLOCK_OVERHEAD);
+		size_t		alloc_size = nBytes + PGRESULT_BLOCK_OVERHEAD;
+		block = (PGresult_data *) malloc(alloc_size);
 		if (!block)
 			return NULL;
+		res->memsize += alloc_size;
 		space = block->space + PGRESULT_BLOCK_OVERHEAD;
 		if (res->curBlock)
 		{
@@ -594,6 +603,7 @@ pqResultAlloc(PGresult *res, size_t nBytes, bool isBinary)
 		block = (PGresult_data *) malloc(PGRESULT_DATA_BLOCKSIZE);
 		if (!block)
 			return NULL;
+		res->memsize += PGRESULT_DATA_BLOCKSIZE;
 		block->next = res->curBlock;
 		res->curBlock = block;
 		if (isBinary)
@@ -711,6 +721,7 @@ PQclear(PGresult *res)
 	res->errFields = NULL;
 	res->events = NULL;
 	res->nEvents = 0;
+	res->memsize = 0;
 	/* res->curBlock was zeroed out earlier */
 
 	/* Free the PGresult structure itself */
@@ -927,6 +938,7 @@ pqAddTuple(PGresult *res, PGresAttValue *tup, const char **errmsgp)
 			realloc(res->tuples, newSize * sizeof(PGresAttValue *));
 		if (!newTuples)
 			return false;		/* malloc or realloc failed */
+		res->memsize += (newSize - res->tupArrSize) * sizeof(PGresAttValue *);
 		res->tupArrSize = newSize;
 		res->tuples = newTuples;
 	}
diff --git a/src/interfaces/libpq/libpq-fe.h b/src/interfaces/libpq/libpq-fe.h
index ed9c806861..4fd9a4fcda 100644
--- a/src/interfaces/libpq/libpq-fe.h
+++ b/src/interfaces/libpq/libpq-fe.h
@@ -491,6 +491,7 @@ extern int	PQgetlength(const PGresult *res, int tup_num, int field_num);
 extern int	PQgetisnull(const PGresult *res, int tup_num, int field_num);
 extern int	PQnparams(const PGresult *res);
 extern Oid	PQparamtype(const PGresult *res, int param_num);
+extern size_t PQresultMemsize(const PGresult *res);
 
 /* Describe prepared statements and portals */
 extern PGresult *PQdescribePrepared(PGconn *conn, const char *stmt);
diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
index 9a586ff25a..37c9c3853d 100644
--- a/src/interfaces/libpq/libpq-int.h
+++ b/src/interfaces/libpq/libpq-int.h
@@ -208,6 +208,8 @@ struct pg_result
 	PGresult_data
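The accounting pattern the patch uses can be sketched abstractly (hypothetical Python, not libpq code): every allocation made on behalf of a result bumps a per-result counter that is initialized with the header size, so a language binding can later report the true size to its garbage collector.

```python
class Result:
    """Toy model of the per-result memory accounting: each allocation
    adds its size to `memsize`.  HEADER_SIZE is a stand-in for
    sizeof(PGresult); the real numbers come from the C allocator."""

    HEADER_SIZE = 64

    def __init__(self):
        # counted once up front, like the result struct itself
        self.memsize = self.HEADER_SIZE
        self.blocks = []

    def alloc(self, nbytes):
        # record exactly how much we asked the allocator for
        self.blocks.append(bytearray(nbytes))
        self.memsize += nbytes
        return self.blocks[-1]

r = Result()
r.alloc(100)
r.alloc(28)
print(r.memsize)  # 192: header plus both allocations
```

The key property for the GC use case is that the counter is maintained incrementally at allocation time, so querying it is O(1) rather than a walk over the result's block chain.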
Log query parameters for terminated execute
Hello all

We already have a feature for logging query parameters.  If we use
log_statement = 'all' we write parameters before execution and all is
fine here.  If we use log_min_duration_statement the statement is
obviously logged after execution, but currently we have no parameters
if the query was terminated by statement_timeout, lock_timeout or by
pg_terminate_backend.

I would like to have parameters in the logs at least for these three
cases.  A simple way to achieve this is just to add errdetail_params to
such ereports, as in the attached patch.  Another way is to add
something like printing the global variable debug_query_string in
send_message_to_server_log (src/backend/utils/error/elog.c).  But I
have no good idea how to print ParamListInfo correctly.  We can not use
OidOutputFunctionCall in all cases, right?

Another small question is why errdetail_params uses errdetail instead
of errdetail_log?  We assume that the user wants to get their own
parameters back (if he sets client_min_messages to LOG)?

Any feedback is strongly appreciated.  Thank you!
regards, Sergei

diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c
index f413395..ef6877e 100644
--- a/src/backend/tcop/postgres.c
+++ b/src/backend/tcop/postgres.c
@@ -171,6 +171,9 @@ static ProcSignalReason RecoveryConflictReason;
 static MemoryContext row_description_context = NULL;
 static StringInfoData row_description_buf;
 
+/* current portal parameters */
+static ParamListInfo debug_query_params = NULL;
+
 /*
  * decls for routines only used in this file
  *
@@ -183,7 +186,7 @@ static void forbidden_in_wal_sender(char firstchar);
 static List *pg_rewrite_query(Query *query);
 static bool check_log_statement(List *stmt_list);
 static int	errdetail_execute(List *raw_parsetree_list);
-static int	errdetail_params(ParamListInfo params);
+static int	errdetail_log_params(ParamListInfo params);
 static int	errdetail_abort(void);
 static int	errdetail_recovery_conflict(void);
 static void start_xact_command(void);
@@ -1850,7 +1853,7 @@ exec_bind_message(StringInfo input_message)
 							*portal_name ? portal_name : "",
 							psrc->query_string),
 					 errhidestmt(true),
-					 errdetail_params(params)));
+					 errdetail_log_params(params)));
 			break;
 	}
@@ -1934,6 +1937,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 	else
 		prepStmtName = "";
 	portalParams = portal->portalParams;
+	debug_query_params = portalParams;
 
 	/*
@@ -1985,7 +1989,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 						*portal_name ? portal_name : "",
 						sourceText),
 				 errhidestmt(true),
-				 errdetail_params(portalParams)));
+				 errdetail_log_params(portalParams)));
 		was_logged = true;
 	}
@@ -2074,7 +2078,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 							*portal_name ? portal_name : "",
 							sourceText),
 					 errhidestmt(true),
-					 errdetail_params(portalParams)));
+					 errdetail_log_params(portalParams)));
 			break;
 	}
@@ -2082,6 +2086,7 @@ exec_execute_message(const char *portal_name, long max_rows)
 		ShowUsage("EXECUTE MESSAGE STATISTICS");
 
 	debug_query_string = NULL;
+	debug_query_params = NULL;
 }
 
 /*
@@ -2200,12 +2205,12 @@ errdetail_execute(List *raw_parsetree_list)
 }
 
 /*
- * errdetail_params
+ * errdetail_log_params
  *
- * Add an errdetail() line showing bind-parameter data, if available.
+ * Add an errdetail_log() line showing bind-parameter data, if available.
  */
 static int
-errdetail_params(ParamListInfo params)
+errdetail_log_params(ParamListInfo params)
 {
 	/* We mustn't call user-defined I/O functions when in an aborted xact */
 	if (params && params->numParams > 0 && !IsAbortedTransactionBlockState())
@@ -2256,7 +2261,7 @@ errdetail_params(ParamListInfo params)
 				pfree(pstring);
 		}
 
-		errdetail("parameters: %s", param_str.data);
+		errdetail_log("parameters: %s", param_str.data);
 
 		pfree(param_str.data);
@@ -2925,7 +2930,8 @@ ProcessInterrupts(void)
 			else
 				ereport(FATAL,
 						(errcode(ERRCODE_ADMIN_SHUTDOWN),
-						 errmsg("terminating connection due to administrator command")));
+						 errmsg("terminating connection due to administrator command"),
+						 errdetail_log_params(debug_query_params)));
 		}
 		if (ClientConnectionLost)
 		{
@@ -3001,14 +3007,16 @@ ProcessInterrupts(void)
 			LockErrorCleanup();
 			ereport(ERROR,
 					(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
-					 errmsg("canceling statement due to lock timeout")));
+					 errmsg("canceling statement due to lock timeout"),
+					 errdetail_log_params(debug_query_params)));
 		}
 		if (stmt_timeout_occurred)
 		{
 			LockErrorCleanup();
 			ereport(ERROR,
 					(errcode(ERRCODE_QUERY_CANCELED),
-					 errmsg("canceling statement due to statement timeout")));
+					 errmsg("canceling statement due to statement timeout"),
+
comma to delimit fractional seconds
I'm new at this. I looked at the TODO list and saw the issue of using comma to delimit fractional seconds. I looked at the parser stuff and I think I have at least a start on the issue. Is there anything more to it than allowing HH:MM:SS,### as well as HH:MM:SS.# ? Is it OK if it works that way in all timestamps, not just ISO-8601-compliant cases? Chris Howard
postgresql_fdw doesn't handle defaults correctly
Hi

I have a table boo:

  create table boo(id serial primary key,
                   inserted date default current_date,
                   v varchar);

I imported this table via a simple

  IMPORT FOREIGN SCHEMA public FROM SERVER foreign_server INTO public;

The command

  insert into boo(v) values('ahoj');

works in the original database, but in the second database, against the
foreign table, it fails:

postgres=# insert into boo(v) values('ahoj');
ERROR:  null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, null, ahoj).
CONTEXT:  remote SQL command: INSERT INTO public.boo(id, inserted, v) VALUES ($1, $2, $3)

It does an unwanted transformation to an insert of all columns, so the
remote defaults are never applied.  Is this the expected behavior?

Regards

Pavel
Re: Constraint documentation
Hello lætitia,

My 0.02 € to try to simplify the suggested documentation.

> Check constraint were not designed to enforce business rules across
> tables.

"are not designed"

> Avoid using check constraints with function accessing to other tables

"accessing other tables" (no "to")

> and prefer triggers instead (please refer to for more information about
> triggers).

... and use instead.

> PostgreSQL won't prevent you from doing so,

"Although PostgreSQL ... so,"

> but be aware you might encounter difficulties to restore dumps
> (generated with pg_dump or pg_dumpall) if you do.

beware that dumps generated by pg_* or may be hard to restore because of
such checks, as the underlying dependencies are not taken into account.

-- 
Fabien.
Re: Adding Markodwn formatting to psql output
On 23/06/18 17:01, Alvaro Herrera wrote: > On 2018-Jun-23, Bruce Momjian wrote: > >> On Tue, May 29, 2018 at 06:47:54PM +0200, Lætitia Avrot wrote: >>> >>> I will try to apply and review the patch. >> >> Uh, no one mentioned in this thread that psql has supported asciidoc >> since PG 9.5. > > I would welcome markdown output, particularly some version that can be > processed nicely by pandoc :-) I use latex format for that, but I also welcome a markdown format. -- Vik Fearing +33 6 46 75 15 36 http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: Incorrect visibility test function assigned to snapshot
On Wed, May 30, 2018 at 09:28:54AM -0400, Alvaro Herrera wrote: > On 2018-May-30, Antonin Houska wrote: > > > In the header comment, SnapBuildInitialSnapshot() claims to set > > snapshot->satisfies to the HeapTupleSatisfiesMVCC test function, and indeed > > it > > converts the "xid" array to match its semantics (i.e. the xid items > > eventually > > represent running transactions as opposed to the committed ones). However > > the > > test function remains HeapTupleSatisfiesHistoricMVCC as set by > > SnapBuildBuildSnapshot(). > > Interesting. While this sounds like an oversight that should have > horrible consequences, it's seems not to because the current callers > don't seem to care about the ->satisfies function. Are you able to come > up with some scenario in which it causes an actual problem? Uh, are we going to fix this anyway? Seems we should. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Supporting tls-server-end-point as SCRAM channel binding for OpenSSL 1.0.0 and 1.0.1
On Wed, Jun 6, 2018 at 01:16:11PM -0700, Steven Fackler wrote: > TLS 1.3, (which is currently in a draft state, but is theoretically being > finalized soon) does not support the TLS channel binding algorithms [1]. From Uh, according to this article, TLS 1.3 was finalized in March: https://www.theregister.co.uk/2018/03/27/with_tls_13_signed_off_its_implementation_time/ -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Adding Markodwn formatting to psql output
On 2018-Jun-23, Bruce Momjian wrote: > On Tue, May 29, 2018 at 06:47:54PM +0200, Lætitia Avrot wrote: > > > > > > Le mar. 29 mai 2018 à 17:28, Euler Taveira a écrit : > > > > 2018-05-29 12:11 GMT-03:00 Lætitia Avrot : > > > I write often documents for my customers in MarkDown to produce pdf > > > documents. Quite often, I need to present data from queries in a > > tabular > > > form. > > > > > https://www.postgresql.org/message-id/flat/CAAYBy8YU4pXYKDHeQhsA_= > > FC93yOBZp5j1h=bssao9-olcw...@mail.gmail.com#CAAYBy8YU4pXYKDHeQhsA_= > > FC93yOBZp5j1h=bssao9-olcw...@mail.gmail.com > > > > Thanks a lot! > > > > It seems that other people need that feature too. :-) > > > > I will try to apply and review the patch. > > Uh, no one mentioned in this thread that psql has supported asciidoc > since PG 9.5. I would welcome markdown output, particularly some version that can be processed nicely by pandoc :-) -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
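For what it's worth, the core of such a format mode is small.  Here is a hypothetical Python sketch (not psql code) of the pipe-table output that both pandoc and GitHub accept, with '|' escaped inside cells:

```python
def markdown_table(headers, rows):
    """Render headers and rows as a markdown (pipe) table.
    Illustrative sketch of what a psql markdown format could emit."""
    def cell(v):
        # '|' delimits columns, so it must be escaped inside cell text
        return str(v).replace("|", "\\|")

    lines = ["| " + " | ".join(cell(h) for h in headers) + " |",
             "|" + "|".join(" --- " for _ in headers) + "|"]
    for row in rows:
        lines.append("| " + " | ".join(cell(v) for v in row) + " |")
    return "\n".join(lines)

print(markdown_table(["id", "name"], [(1, "alice"), (2, "bob")]))
```

The fiddly parts in a real implementation would mirror the existing asciidoc format: escaping, NULL display, alignment, and multi-line cell values, which pipe tables handle poorly.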
Re: Concurrency bug in UPDATE of partition-key
Would you wait a little bit before pushing this? I'd like to give this a read too. Thanks -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Adding Markdown formatting to psql output
On Tue, May 29, 2018 at 06:47:54PM +0200, Lætitia Avrot wrote: > > > Le mar. 29 mai 2018 à 17:28, Euler Taveira a écrit : > > 2018-05-29 12:11 GMT-03:00 Lætitia Avrot : > > I write often documents for my customers in MarkDown to produce pdf > > documents. Quite often, I need to present data from queries in a tabular > > form. > > > https://www.postgresql.org/message-id/flat/CAAYBy8YU4pXYKDHeQhsA_= > FC93yOBZp5j1h=bssao9-olcw...@mail.gmail.com#CAAYBy8YU4pXYKDHeQhsA_= > FC93yOBZp5j1h=bssao9-olcw...@mail.gmail.com > > Thanks a lot! > > It seems that other people need that feature too. :-) > > I will try to apply and review the patch. Uh, no one mentioned in this thread that psql has supported asciidoc since PG 9.5. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: SCRAM with channel binding downgrade attack
On Sat, Jun 23, 2018 at 10:30:19PM +0900, Michael Paquier wrote: > On Fri, Jun 22, 2018 at 11:01:53PM -0400, Bruce Momjian wrote: > > Uh, as I am understanding it, if we don't allow clients to force channel > > binding, then channel binding is useless because it cannot prevent > > man-in-the-middle attacks. I am sure some users will try to use it, and > > not understand that it serves no purpose. If we then allow clients to > > force channel binding in PG 12, they will then need to fix their > > clients. > > > > I suggest that if we don't allow users to use channel binding > > effectively that we should remove all documentation about this > > feature. > > Well, I don't agree with this position as the protocol put in place for > SCRAM with or without channel binding perfectly allows a client to > enforce the use channel binding. While that's missing for libpq, other > clients like JDBC or npgsql could perfectly implement that before this > gets in Postgres core in the shape they want. So I think that the docs > should be kept. Yes, the code is useful, but the _feature_ is not useful until some interface allows the forcing of channel binding. People are worried about users having to change their API in PG 12, but the point is that to use this feature people will have to change their API in PG 12 anyway, and it doesn't do anything useful without an interface we don't ship, and hasn't been written, so why confuse people that it is a feature in PG 11? Channel binding is listed as a _major_ feature in PG 11 in the release notes, and you can bet people are going to look at how to use it: Channel binding for SCRAM authentication, to prevent potential man-in-the-middle attacks on database connections It should perhaps be marked in the source code section, and listed as not useful by PG 11's libpq or any of the interfaces built on it. 
We are also going to need to communicate to people who have already looked at the release notes that this feature is not useful in PG 11 using libpq. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: SCRAM with channel binding downgrade attack
On Fri, Jun 22, 2018 at 11:01:53PM -0400, Bruce Momjian wrote: > Uh, as I am understanding it, if we don't allow clients to force channel > binding, then channel binding is useless because it cannot prevent > man-in-the-middle attacks. I am sure some users will try to use it, and > not understand that it serves no purpose. If we then allow clients to > force channel binding in PG 12, they will then need to fix their > clients. > > I suggest that if we don't allow users to use channel binding > effectively that we should remove all documentation about this > feature. Well, I don't agree with this position as the protocol put in place for SCRAM with or without channel binding perfectly allows a client to enforce the use of channel binding. While that's missing for libpq, other clients like JDBC or npgsql could perfectly implement that before this gets into Postgres core in the shape they want. So I think that the docs should be kept. -- Michael signature.asc Description: PGP signature
Re: Incorrect errno used with %m for backend code
On Fri, Jun 22, 2018 at 11:15:53AM -0400, Alvaro Herrera wrote: > I wondered for a bit if it would be a better idea to have > CloseTransientFile restore the existing errno if there is any and close > works fine -- when I noticed that that function itself says that caller > should check errno for close() errors. Most callers seem to do it > correctly, but a few fail to check the return value. Yeah, the API in its current shape is simpler to understand. Once you get used to the errno stanza of course... > A bunch of other places use CloseTransientFile while closing shop after > some other syscall failure, which is what your patch fixes. I didn't > review those; checking for close failure seems pointless. Agreed. > In some cases, we fsync the file and check that return value, then close > the file and do *not* check CloseTransientFile's return value -- > examples are CheckPointLogicalRewriteHeap, heap_xlog_logical_rewrite, > SnapBuildSerialize, SaveSlotToPath, durable_rename. I don't know if > it's reasonable for close() to fail after successfully fsyncing a file; > maybe this is not a problem. I would patch those nonetheless. And writeTimeLineHistory. > be_lo_export() is certainly missing a check: it writes and closes, > without intervening fsync. One problem at a time if possible :) I think that those adjustments should be a separate patch. -- Michael signature.asc Description: PGP signature
Re: Incorrect errno used with %m for backend code
On Fri, Jun 22, 2018 at 03:45:33PM +0530, Ashutosh Sharma wrote: > Okay, thanks for the confirmation. Few of them are also there in > origin.c and snapbuild.c files. Thanks Ashutosh. I have been reviewing the whole tree and I found more places where this is missing, like rewriteheap.c, reorderbuffer.c or pg_basebackup, which gives the attached. -- Michael diff --git a/src/backend/access/heap/rewriteheap.c b/src/backend/access/heap/rewriteheap.c index 8d3c861a33..ed7ba181c7 100644 --- a/src/backend/access/heap/rewriteheap.c +++ b/src/backend/access/heap/rewriteheap.c @@ -1168,9 +1168,14 @@ heap_xlog_logical_rewrite(XLogReaderState *r) /* write out tail end of mapping file (again) */ pgstat_report_wait_start(WAIT_EVENT_LOGICAL_REWRITE_MAPPING_WRITE); if (write(fd, data, len) != len) + { + /* if write didn't set errno, assume problem is no disk space */ + if (errno == 0) + errno = ENOSPC; ereport(ERROR, (errcode_for_file_access(), errmsg("could not write to file \"%s\": %m", path))); + } pgstat_report_wait_end(); /* diff --git a/src/backend/access/transam/twophase.c b/src/backend/access/transam/twophase.c index 65194db70e..a9ef1b3d73 100644 --- a/src/backend/access/transam/twophase.c +++ b/src/backend/access/transam/twophase.c @@ -1241,12 +1241,17 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings) */ if (fstat(fd, )) { + int save_errno = errno; + CloseTransientFile(fd); if (give_warnings) + { + errno = save_errno; ereport(WARNING, (errcode_for_file_access(), errmsg("could not stat two-phase state file \"%s\": %m", path))); + } return NULL; } @@ -1274,13 +1279,18 @@ ReadTwoPhaseFile(TransactionId xid, bool give_warnings) pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_READ); if (read(fd, buf, stat.st_size) != stat.st_size) { + int save_errno = errno; + pgstat_report_wait_end(); CloseTransientFile(fd); if (give_warnings) + { + errno = save_errno; ereport(WARNING, (errcode_for_file_access(), errmsg("could not read two-phase state file \"%s\": %m", path))); + } 
pfree(buf); return NULL; } @@ -1663,16 +1673,26 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len) pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_WRITE); if (write(fd, content, len) != len) { + int save_errno = errno; + pgstat_report_wait_end(); CloseTransientFile(fd); + + /* if write didn't set errno, assume problem is no disk space */ + errno = save_errno ? save_errno : ENOSPC; ereport(ERROR, (errcode_for_file_access(), errmsg("could not write two-phase state file: %m"))); } if (write(fd, _crc, sizeof(pg_crc32c)) != sizeof(pg_crc32c)) { + int save_errno = errno; + pgstat_report_wait_end(); CloseTransientFile(fd); + + /* if write didn't set errno, assume problem is no disk space */ + errno = save_errno ? save_errno : ENOSPC; ereport(ERROR, (errcode_for_file_access(), errmsg("could not write two-phase state file: %m"))); @@ -1686,7 +1706,10 @@ RecreateTwoPhaseFile(TransactionId xid, void *content, int len) pgstat_report_wait_start(WAIT_EVENT_TWOPHASE_FILE_SYNC); if (pg_fsync(fd) != 0) { + int save_errno = errno; + CloseTransientFile(fd); + errno = save_errno; ereport(ERROR, (errcode_for_file_access(), errmsg("could not fsync two-phase state file: %m"))); diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index adbd6a2126..1a419aa49b 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -3268,7 +3268,10 @@ XLogFileInit(XLogSegNo logsegno, bool *use_existent, bool use_lock) pgstat_report_wait_start(WAIT_EVENT_WAL_INIT_SYNC); if (pg_fsync(fd) != 0) { + int save_errno = errno; + close(fd); + errno = save_errno; ereport(ERROR, (errcode_for_file_access(), errmsg("could not fsync file \"%s\": %m", tmppath))); @@ -11675,8 +11678,10 @@ retry: if (lseek(readFile, (off_t) readOff, SEEK_SET) < 0) { char fname[MAXFNAMELEN]; + int save_errno = errno; XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size); + errno = save_errno; ereport(emode_for_corrupt_record(emode, targetPagePtr + 
reqLen), (errcode_for_file_access(), errmsg("could not seek in log segment %s to offset %u: %m", @@ -11688,9 +11693,11 @@ retry: if (read(readFile, readBuf, XLOG_BLCKSZ) != XLOG_BLCKSZ) { char fname[MAXFNAMELEN]; + int save_errno = errno; pgstat_report_wait_end(); XLogFileName(fname, curFileTLI, readSegNo, wal_segment_size); + errno = save_errno; ereport(emode_for_corrupt_record(emode, targetPagePtr + reqLen), (errcode_for_file_access(), errmsg("could not read from log segment %s, offset %u: %m", diff --git a/src/backend/access/transam/xlogutils.c b/src/backend/access/transam/xlogutils.c index 52fe55e2af..4ecdc9220f 100644 --- a/src/backend/access/transam/xlogutils.c +++
Re: PATCH: backtraces for error messages
On 23 June 2018 at 20:22, Craig Ringer wrote: > Yeah. libunwind doesn't expose any interface to get that information, so > you'd have to use platform APIs, like glibc's dl_iterate_phdr or dladdr, or > capture /proc/self/maps. > Ahem, I take that part back. https://stackoverflow.com/a/21584356/398670 see /usr/include/link.h -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: PATCH: backtraces for error messages
On 22 June 2018 at 23:26, Korry Douglas wrote: > A few misc comments: > > In general, +1. > > It might be nice to move the stack trace code into a separate function > (not static to elog.c) - I have often found it useful to attach backtraces > to objects so I can debug complex allocation/deallocation problems. > Good suggestion. > > The major expense in capturing a stack trace is in the symbolization of > the stack addresses, not the stack walk itself. You typically want to > symbolize the addresses at the time you generate the trace, but that's not > always the case. For example, if you want to record the stack trace of > every call to AllocSetAlloc() (and attach the trace to the AllocChunk) > performance gets unbearable if you symbolize every trace. So a flag that > specifies whether to symbolize would be nice too. > libunwind has some nifty caching that makes that a _lot_ cheaper; that's part of why I went with it despite the extra lib requirement. > If you don't symbolize, you need a way to capture the address range of > each dynamically loaded shared object (so that you can later symbolize > using something like addr2line). > Yeah. libunwind doesn't expose any interface to get that information, so you'd have to use platform APIs, like glibc's dl_iterate_phdr or dladdr, or capture /proc/self/maps. You have to make sure that info makes it to the log, and is re-output often enough not to be lost to log rotation, and is invalidated and re-output if mappings change due to new lib loads etc. But you don't want to print it all the time either. Then you have to walk the stack and print the instruction pointers and stack pointers and spit out raw traces for the user to reassemble. I'd argue that if you're doing the sort of thing where you want a stack of every AllocSetAlloc, you shouldn't be trying to do that in-process. That's where tools like perf, systemtap, etc really come into their own. 
I'm focused on making additional diagnostic info for errors and key log messages collectable for systems that aren't easily accessed, like users who have 12-hour read/response latencies and security setups as strict as they are insane and nonsensical. I'd have no objection to the option to disable symbolic back traces and print the raw call stack. It's trivial to do; in fact, I only removed the ip/sp info to keep the output more concise. I'm not interested in writing anything to handle the library load address mapping collection etc though. I don't see a really sensible way to do that in a log-based system. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: libpq compression
On 22.06.2018 20:56, Robbie Harwood wrote: Konstantin Knizhnik writes: On 22.06.2018 18:59, Robbie Harwood wrote: Konstantin Knizhnik writes: On 21.06.2018 20:14, Robbie Harwood wrote: Konstantin Knizhnik writes: On 21.06.2018 17:56, Robbie Harwood wrote: Konstantin Knizhnik writes: On 20.06.2018 23:34, Robbie Harwood wrote: Konstantin Knizhnik writes: My idea was the following: client want to use compression. But server may reject this attempt (for any reasons: it doesn't support it, has no proper compression library, do not want to spend CPU for decompression,...) Right now compression algorithm is hardcoded. But in future client and server may negotiate to choose proper compression protocol. This is why I prefer to perform negotiation between client and server to enable compression. Well, for negotiation you could put the name of the algorithm you want in the startup. It doesn't have to be a boolean for compression, and then you don't need an additional round-trip. Sorry, I can only repeat arguments I already mentioned: - in future it may be possible to specify compression algorithm - even with boolean compression option server may have some reasons to reject client's request to use compression Extra flexibility is always good thing if it doesn't cost too much. And extra round of negotiation in case of enabling compression seems to me not to be a high price for it. You already have this flexibility even without negotiation. I don't want you to lose your flexibility. Protocol looks like this: - Client sends connection option "compression" with list of algorithms it wants to use (comma-separated, or something). - First packet that the server can compress one of those algorithms (or none, if it doesn't want to turn on compression). No additional round-trips needed. This is exactly how it works now... 
Client includes compression option in connection string and server replies with special message ('Z') if it accepts request to compress traffic between this client and server. No, it's not. You don't need this message. If the server receives a compression request, it should just turn compression on (or not), and then have the client figure out whether it got compression back. How will it manage to do that? It receives some reply and first of all it should know whether it has to be decompressed or not. You can tell whether a message is compressed by looking at it. The way the protocol works, every message has a type associated with it: a single byte, like 'R', that says what kind of message it is. A compressed message can contain any sequence of bytes, including 'R':) Then tag your messages with a type byte. Or do it the other way around - look for the zstd framing within a message. Please, try to work with me on this instead of fighting every design change. Sorry, I do not want to fight. I always vote for peace and constructive dialog. But it is hard for me to understand and accept your arguments. I do not understand why the secure_read function is a better place for handling compression than pq_recvbuf, and why it is destroying the existing model. I already mentioned my arguments: 1. I want to use the same code for frontend and backend. 2. I think that streaming compression can be used not only for libpq. This is why I tried to make zpq_stream independent of the communication layer and pass in callbacks for sending/receiving data. If pq_recvbuf is not the right place for performing decompression, I can introduce some other function, like read_raw or something like that, and do decompression there. But I do not see much sense in it. Concerning the necessity of a special message for acknowledging compression by the server: I also do not understand why you do not like the idea of sending such a message and what is wrong with it.
What you are suggesting ("then tag your message") is actually the same as sending a new message, because what is the difference between a tag 'Z' and a message with code 'Z'? Sorry, but I do not understand the problems you are trying to solve, and do not see any arguments except "I cannot accept it". Thanks, --Robbie
Re: Keeping temporary tables in shared buffers
On Fri, Jun 22, 2018 at 6:09 PM, Robert Haas wrote: > On Mon, May 28, 2018 at 4:25 AM, Amit Kapila wrote: >> On Fri, May 25, 2018 at 6:33 AM, Asim Praveen wrote: >>> We are evaluating the use of shared buffers for temporary tables. The >>> advantage being queries involving temporary tables can make use of parallel >>> workers. >> This is one way, but I think there are other choices as well. We can >> identify and flush all the dirty (local) buffers for the relation >> being accessed parallelly. Now, once the parallel operation is >> started, we won't allow performing any write operation on them. It >> could be expensive if we have a lot of dirty local buffers for a >> particular relation. I think if we are worried about the cost of >> writes, then we can try some different way to parallelize temporary >> table scan. At the beginning of the scan, leader backend will >> remember the dirty blocks present in local buffers, it can then share >> the list with parallel workers which will skip scanning those blocks >> and in the end leader ensures that all those blocks will be scanned by >> the leader. This shouldn't incur a much additional cost as the >> skipped blocks should be present in local buffers of backend. > > This sounds awkward and limiting. How about using DSA to allocate > space for the backend's temporary buffers and a dshash for lookups? > That's a better idea. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com
Re: Concurrency bug in UPDATE of partition-key
On Fri, Jun 22, 2018 at 10:25 PM, Amit Khandekar wrote: > On 20 June 2018 at 18:54, Amit Kapila wrote: >> On Tue, Jun 19, 2018 at 9:33 PM, Amit Khandekar >> wrote: >> >> 2. >> ExecBRDeleteTriggers(..) >> { >> .. >> + /* If requested, pass back the concurrently updated tuple, if any. */ >> + if (epqslot != NULL) >> + *epqslot = newSlot; >> + >> if (trigtuple == NULL) >> return false; >> + >> + /* >> + * If the tuple was concurrently updated and the caller of this >> + * function requested for the updated tuple, skip the trigger >> + * execution. >> + */ >> + if (newSlot != NULL && epqslot != NULL) >> + return false; >> .. >> } >> >> Can't we merge the first change into second, something like: >> >> if (newSlot != NULL && epqslot != NULL) >> { >> *epqslot = newSlot; >> return false; >> } > > We want to update *epqslot with whatever the value of newSlot is. So > if newSlot is NULL, epqslot should be NULL. So we can't do that in the > "if (newSlot != NULL && epqslot != NULL)" condition. > Why do you need to update it when newSlot is NULL? *epqslot is already initialized as NULL in the caller (ExecUpdate). I think we only want to update it when trigtuple is not null. Apart from looking a bit awkward, I think it is error-prone as well, because the code of GetTupleForTrigger is such that it can return NULL with a valid value of newSlot, in which case the behavior will become undefined. The case I am worried about is when EvalPlanQual returns some valid value of epqslot the first time, but the next execution after heap_lock_tuple returns NULL. In practice, it won't happen because EvalPlanQual locks the tuple, so after that heap_lock_tuple shouldn't fail again, but it seems prudent not to rely on that unless we need to. For now, I have removed it in the attached patch, but if you have any valid reason, then we can add it back. >> >> 4. >> +/* -- >> + * ExecBRDeleteTriggers() >> + * >> + * Called to execute BEFORE ROW DELETE triggers. 
>> + * >> + * Returns false in following scenarios : >> + * 1. Trigger function returned NULL. >> + * 2. The tuple was concurrently deleted OR it was concurrently updated and >> the >> + * new tuple didn't pass EvalPlanQual() test. >> + * 3. The tuple was concurrently updated and the new tuple passed the >> + * EvalPlanQual() test, BUT epqslot parameter is not NULL. In such case, >> this >> + * function skips the trigger function execution, because the caller has >> + * indicated that it wants to further process the updated tuple. The updated >> + * tuple slot is passed back through epqslot. >> + * >> + * In all other cases, returns true. >> + * -- >> + */ bool >> ExecBRDeleteTriggers(EState *estate, EPQState *epqstate, >> .. > > If it looks complicated, can you please suggest something to make it a > bit simpler. > See attached. Apart from this, I have changed a few comments and fixed indentation issues. Let me know what you think of the attached. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_concurrency_bug_v5.patch Description: Binary data
Re: effect of JIT tuple deform?
2018-06-23 8:35 GMT+02:00 Pavel Stehule : > Hi > > I try to measure the effect of JIT tuple deform and I don't see any possible > effect. > > Is this feature active in the master branch? > > Is it possible to see this feature in EXPLAIN ANALYZE?

Unfortunately I got a slowdown.

0. shared buffers = 1GB
1. create table with 50 int columns
2. insert into this table 4M rows

postgres=# \dt+ wt
                    List of relations
 Schema | Name | Type  | Owner |  Size  | Description
--------+------+-------+-------+--------+-------------
 public | wt   | table | pavel | 893 MB |
(1 row)

default setting

postgres=# explain analyze select sum(c45) from wt;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Finalize Aggregate  (cost=136120.69..136120.70 rows=1 width=8) (actual time=879.547..879.547 rows=1 loops=1)
   ->  Gather  (cost=136120.47..136120.68 rows=2 width=8) (actual time=879.514..879.538 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=135120.47..135120.48 rows=1 width=8) (actual time=850.283..850.284 rows=1 loops=3)
               ->  Parallel Seq Scan on wt  (cost=0.00..130953.77 rows=178 width=4) (actual time=0.071..223.338 rows=147 loops=3)
 Planning Time: 0.158 ms
 JIT:
   Functions: 6
   Generation Time: 4.267 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 2.472 ms
   Emission Time: 15.929 ms
 Execution Time: 899.496 ms
(16 rows)

postgres=# set jit_tuple_deforming to off;
SET
postgres=# explain analyze select sum(c45) from wt;
                                 QUERY PLAN
-----------------------------------------------------------------------------
 Finalize Aggregate  (cost=136120.69..136120.70 rows=1 width=8) (actual time=743.667..743.667 rows=1 loops=1)
   ->  Gather  (cost=136120.47..136120.68 rows=2 width=8) (actual time=743.654..743.657 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=135120.47..135120.48 rows=1 width=8) (actual time=715.532..715.533 rows=1 loops=3)
               ->  Parallel Seq Scan on wt  (cost=0.00..130953.77 rows=178 width=4) (actual time=0.067..216.245 rows=147 loops=3)
 Planning Time: 0.157 ms
 JIT:
   Functions: 4
   Generation Time: 1.989 ms
   Inlining: false
   Inlining Time: 0.000 ms
   Optimization: false
   Optimization Time: 0.449 ms
   Emission Time: 7.254 ms
 Execution Time: 761.549 ms
(16 rows)

When jit_tuple_deforming is enabled, the query is about 100 ms slower, which looks like a performance regression.

> Regards > > Pavel >
effect of JIT tuple deform?
Hi I try to measure the effect of JIT tuple deform and I don't see any possible effect. Is this feature active in the master branch? Is it possible to see this feature in EXPLAIN ANALYZE? Regards Pavel