Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
>> In fact, I am not in
>> favour of tracking the query dependencies for UPDATE/DELETE since we
>> don't have any concrete example as to when that would be needed.
>
> Right, but as I said before, some FDW might consult FDW options stored in
> those objects during AddForeignUpdateTargets, so we should do that.
>
>>> Besides that, I modified add_rte_to_flat_rtable so that the plan's
>>> dependencies are tracked, but that would lead to tracking the
>>> dependencies of unreferenced foreign tables in dead subqueries or the
>>> dependencies of foreign tables excluded from the plan by eg, constraint
>>> exclusion.  But I thought that would be also OK by the same reason as
>>> above.  (Another reason for that was it seemed better to me to collect
>>> the dependencies in the same place as for relation OIDs.)
>
>> If those unreferenced relations become references because of the
>> changes in options, we will require those query dependencies to be
>> recorded.  So, if we are recording query dependencies, we should record
>> the ones on unreferenced relations as well.
>
> I mean plan dependencies here, not query dependencies.

A query dependency also implies a plan dependency, since changing the
query implies plan changes.  So, if the query depends upon something, so
does the plan.

> Having said that, I like the latest version (v6), so I'd vote for marking
> this as Ready For Committer.

Marked that way.

--
Best Wishes,
Ashutosh Bapat
EnterpriseDB Corporation
The Postgres Database Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote:
> I agree with these conclusions.  I had a chance to talk with Andres
> this morning at Postgres Vision and based on that conversation I'd
> like to suggest a couple of additional tests:
>
> 1. Repeat this test on x86.  In particular, I think you should test on
> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.

I have done my test on cthulhu.  The basic difference is that on POWER we
saw ClogControlLock on top at 96 and more clients with 300 scale factor,
but on cthulhu at 300 scale factor the transactionid lock is always on
top.  So I repeated my test with 1000 scale factor as well on cthulhu.
All configuration is the same as in my last test.

Test with 1000 scale factor
---------------------------

Test1: number of clients: 192

Head:
tps = 21206.108856 (including connections establishing)
tps = 21206.245441 (excluding connections establishing)

[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 310489  LWLockNamed   | CLogControlLock
 296152                |
  35537  Lock          | transactionid
  15821  LWLockTranche | buffer_mapping
  10342  LWLockTranche | buffer_content
   8427  LWLockTranche | clog
   3961
   3165  Lock          | extend
   2861  Lock          | tuple
   2781  LWLockNamed   | ProcArrayLock
   1104  LWLockNamed   | XidGenLock
    745  LWLockTranche | lock_manager
    371  LWLockNamed   | CheckpointerCommLock
     70  LWLockTranche | wal_insert
      5  BufferPin     | BufferPin
      3  LWLockTranche | proc

Patch:
tps = 28725.038933 (including connections establishing)
tps = 28725.367102 (excluding connections establishing)

[dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
 540061                |
  57810  LWLockNamed   | CLogControlLock
  36264  LWLockTranche | buffer_mapping
  29976  Lock          | transactionid
   4770  Lock          | extend
   4735  LWLockTranche | clog
   4479  LWLockNamed   | ProcArrayLock
   4006
   3955  LWLockTranche | buffer_content
   2505  LWLockTranche | lock_manager
   2179  Lock          | tuple
   1977  LWLockNamed   | XidGenLock
    905  LWLockNamed   | CheckpointerCommLock
    222  LWLockTranche | wal_insert
      8  LWLockTranche | proc

Test2: number of clients: 96

Head:
tps = 25447.861572 (including connections establishing)
tps = 25448.012739 (excluding connections establishing)

 261611                |
  69604  LWLockNamed   | CLogControlLock
   6119  Lock          | transactionid
   4008
   2874  LWLockTranche | buffer_mapping
   2578  LWLockTranche | buffer_content
   2355  LWLockNamed   | ProcArrayLock
   1245  Lock          | extend
   1168  LWLockTranche | clog
    232  Lock          | tuple
    217  LWLockNamed   | CheckpointerCommLock
    160  LWLockNamed   | XidGenLock
    158  LWLockTranche | lock_manager
     78  LWLockTranche | wal_insert
      5  BufferPin     | BufferPin

Patch:
tps = 32708.368938 (including connections establishing)
tps = 32708.765989 (excluding connections establishing)

[dilip.kumar@cthulhu bin]$ cat 1000_96_ul.txt
 326601                |
   7471  LWLockNamed   | CLogControlLock
   5387  Lock          | transactionid
   4018
   3331  LWLockTranche | buffer_mapping
   3144  LWLockNamed   | ProcArrayLock
   1372  Lock          | extend
    722  LWLockTranche | buffer_content
    393  LWLockNamed   | XidGenLock
    237  LWLockTranche | lock_manager
    234  Lock          | tuple
    194  LWLockTranche | clog
     96  Lock          | relation
     88  LWLockTranche | wal_insert
     34  LWLockNamed   | CheckpointerCommLock

Test3: number of clients: 64

Head:
tps = 28264.194438 (including connections establishing)
tps = 28264.336270 (excluding connections establishing)

 218264                |
  10314  LWLockNamed   | CLogControlLock
   4019
   2067  Lock          | transactionid
   1950  LWLockTranche | buffer_mapping
   1879  LWLockNamed   | ProcArrayLock
    592  Lock          | extend
    565  LWLockTranche | buffer_content
    222  LWLockNamed   | XidGenLock
    143  LWLockTranche | clog
    131  LWLockNamed   | CheckpointerCommLock
     63  LWLockTranche | lock_manager
     52  Lock          | tuple
     35  LWLockTranche | wal_insert

Patch:
tps = 27906.376194 (including connections establishing)
tps = 27906.531392 (excluding connections establishing)

[dilip.kumar@cthulhu bin]$ cat 1000_64_ul.txt
 228108                |
   4039
   2294  Lock          | transactionid
   2116  LWLockTranche | buffer_mapping
   1757  LWLockNamed   | ProcArrayLock
   1553  LWLockNamed   | CLogControlLock
    800  Lock          | extend
    403  LWLockTranche | buffer_content
     92  LWLockNamed   | XidGenLock
     74  LWLockTranche | lock_manager
     42  Lock          | tuple
     35  LWLockTranche | wal_insert
     34  LWLockTranche | clog
     14  LWLockNamed   | CheckpointerCommLock

Test4: numb
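The per-lock counts above are the kind of output produced by repeatedly
sampling wait_event_type/wait_event from pg_stat_activity during the
pgbench run and aggregating the samples, e.g. with `sort | uniq -c |
sort -rn` on a capture file.  A minimal sketch of that aggregation step
(Python, with hypothetical sample data — not the actual capture from the
test above):

```python
from collections import Counter

# Hypothetical sampled (wait_event_type, wait_event) pairs, as would be
# captured by repeatedly querying pg_stat_activity during a benchmark run.
samples = [
    ("LWLockNamed", "CLogControlLock"),
    ("LWLockNamed", "CLogControlLock"),
    ("Lock", "transactionid"),
    ("", ""),  # backend not waiting on anything at sample time
    ("LWLockTranche", "buffer_mapping"),
    ("LWLockNamed", "CLogControlLock"),
]

# Aggregate into the "count  type | event" shape shown in the mail above,
# most frequent first (equivalent to `sort | uniq -c | sort -rn`).
counts = Counter(samples)
report = [
    f"{n:7d} {etype:13s} | {event}"
    for (etype, event), n in counts.most_common()
]
for line in report:
    print(line)
```

The absolute counts only matter relative to the number of samples taken,
so Head and Patch runs are comparable only when sampled at the same rate
for the same duration.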
Re: [HACKERS] Gather Merge
On Tue, Oct 18, 2016 at 5:29 PM, Rushabh Lathia wrote:
> On Mon, Oct 17, 2016 at 2:26 PM, Amit Kapila wrote:
>>
>> There is a lot of common code between ExecGatherMerge and ExecGather.
>> Do you think it makes sense to have a common function to avoid the
>> duplication?
>>
>> I see there are small discrepancies in both the codes, like I don't see
>> the use of the single_copy flag, as it is present in the gather node.
>>
>
> Yes, even I thought to centralize some things of ExecGather and
> ExecGatherMerge, but it's really not something that is fixed.  And I
> thought it might change, particularly for Gather Merge.  And as explained
> by Robert, single_copy doesn't make sense for Gather Merge.  I will still
> look into this to see if something can be centralized.
>

Okay, I haven't thought about it, but do let me know if you couldn't find
any way to merge the code.

>> 3.
>> +gather_merge_readnext(GatherMergeState *gm_state, int reader, bool force)
>> {
>> ..
>> +	tup = gm_readnext_tuple(gm_state, reader, force, NULL);
>> +
>> +	/*
>> +	 * try to read more tuple into nowait mode and store it into the tuple
>> +	 * array.
>> +	 */
>> +	if (HeapTupleIsValid(tup))
>> +		fill_tuple_array(gm_state, reader);
>>
>> How is the above read tuple stored in the array?  In any case the above
>> interface seems slightly awkward to me.  Basically, I think what you
>> are trying to do here is: after reading the first tuple in wait mode,
>> you are trying to fill the array by reading more tuples.  So, can't we
>> push reading of this first tuple into that function and name it
>> form_tuple_array()?
>>
>
> Yes, you are right.
>

You have not answered my first question.  I will try to ask again: how is
the tuple read by the below code stored in the array?

>> +	tup = gm_readnext_tuple(gm_state, reader, force, NULL);

> First it's trying to read a tuple in wait mode, and once it finds a
> tuple it tries to fill the tuple array (which basically tries to read
> tuples in nowait mode).  The reason I keep it separate is because, in
> the case of initializing the gather merge, if we are unable to read a
> tuple from all the workers, then while trying to re-read we again try to
> fill the tuple array for the workers who already produced at least a
> single tuple (see gather_merge_init() for more details).
>

Whenever any worker has produced one tuple, you already try to fill the
array in gather_merge_readnext(), so does the above code help much?

> Also I thought filling the tuple array (which basically reads tuples in
> nowait mode) and reading a tuple (in wait mode) are two separate tasks,
> and if they are in separate functions the code looks more clear.
>

To me that looks slightly confusing.

> If you have any suggestion for the function name (fill_tuple_array)
> then I am open to change that.
>

form_tuple_array (form_tuple is used at many places in the code, so it
should look okay).  I think you might want to consider forming the array
even for the leader; it might not be as beneficial as for the workers,
but OTOH the code will get simplified if we do it that way.

Today, I observed another issue in the code:

+gather_merge_init(GatherMergeState *gm_state)
{
..
+reread:
+	for (i = 0; i < nreaders + 1; i++)
+	{
+		if (TupIsNull(gm_state->gm_slots[i]) ||
+			gm_state->gm_slots[i]->tts_isempty)
+		{
+			if (gather_merge_readnext(gm_state, i, initialize ? false : true))
+			{
+				binaryheap_add_unordered(gm_state->gm_heap,
+										 Int32GetDatum(i));
+			}
+		}
+		else
+			fill_tuple_array(gm_state, i);
+	}
+	initialize = false;
+
+	for (i = 0; i < nreaders; i++)
+		if (TupIsNull(gm_state->gm_slots[i]) ||
+			gm_state->gm_slots[i]->tts_isempty)
+			goto reread;
..
}

This code can cause an infinite loop.  Consider a case where one of the
workers doesn't get any tuple, because by the time it starts, all the
tuples have been consumed by the other workers.  The above code will keep
looping to fetch a tuple from that worker, whereas that worker can't
return any tuple.
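The termination problem can be sketched in miniature (Python, purely
illustrative — these are not the actual PostgreSQL structures or
function names): the reread loop must mark a reader as exhausted once its
queue is empty and its worker has finished, otherwise the loop spins
forever waiting on a worker that will never produce a tuple.

```python
# Simplified model of the gather_merge_init() reread loop described above.
# Each "reader" is a queue of tuples; a reader whose queue is empty AND
# whose producer has finished is exhausted and must be skipped.

def init_slots(queues, finished):
    """Fill one slot per reader; skip readers whose queue is exhausted."""
    n = len(queues)
    slots = [None] * n
    exhausted = [False] * n
    while True:
        for i in range(n):
            if slots[i] is None and not exhausted[i]:
                if queues[i]:
                    slots[i] = queues[i].pop(0)   # got a tuple
                elif finished[i]:
                    exhausted[i] = True           # worker can't return more
        # Terminate once every reader has a tuple or is known-exhausted.
        # Without the exhausted[] check, this loop would retry forever a
        # worker that never produced a tuple.
        if all(slots[i] is not None or exhausted[i] for i in range(n)):
            return slots, exhausted

# Worker 2 never produced anything: all the tuples were consumed by the
# other workers before it started.
slots, exhausted = init_slots([[1, 2], [3], []], [False, False, True])
```

The fix suggested below (checking whether the particular queue has been
exhausted) corresponds to the `exhausted[]` bookkeeping here.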
I think you can fix it by checking whether the particular queue has been
exhausted.

>>> Open Issue:
>>>
>>> - Commit af33039317ddc4a0e38a02e2255c2bf453115fd2 fixed the leak in
>>> tqueue.c by calling gather_readnext() in a per-tuple context.  Now for
>>> gather merge that is not possible, as we are storing the tuple into
>>> the tuple array and we want the tuple to be freed only once it has
>>> passed through the merge sort algorithm.  One idea is, we can also
>>> call gm_readnext_tuple() under the per-tuple context (which will fix
>>> the leak in tqueue.c) and then store a copy of the tuple into the
>>> tuple array.
>>>
>>
>> Won't an extra copy per tuple impact performance?  Was the fix in the
>> mentioned commit for record or composite types?  If so, does
>> GatherMerge support such types, and if it does, does it provide any
>> benefit over Gather?
>>
>
> I don't think it was specifically for record or composite types - but I
> might be wrong.  As per my understanding, the commit fixes a leak in
> tqueue.c.
>

Hmm, in tqueue.c, I thi
Re: [HACKERS] postgres_fdw : altering foreign table not invalidating prepare statement execution plan.
On 2016/10/20 16:27, Ashutosh Bapat wrote:

I wrote:
>> Besides that, I modified add_rte_to_flat_rtable so that the plan's
>> dependencies are tracked, but that would lead to tracking the
>> dependencies of unreferenced foreign tables in dead subqueries or the
>> dependencies of foreign tables excluded from the plan by eg, constraint
>> exclusion.  But I thought that would be also OK by the same reason as
>> above.

You wrote:
>> If those unreferenced relations become references because of the
>> changes in options, we will require those query dependencies to be
>> recorded.  So, if we are recording query dependencies, we should record
>> the ones on unreferenced relations as well.

I wrote:
>> I mean plan dependencies here, not query dependencies.

> A query dependency also implies a plan dependency, since changing the
> query implies plan changes.  So, if the query depends upon something, so
> does the plan.

Right.  Sorry, my explanation was not enough, but what I just wanted to
mention is: for unreferenced relations in the plan (eg, foreign tables
excluded from the plan by constraint exclusion), we don't need to record
the dependencies of those relations on FDW-related objects in the plan
dependency list (ie, glob->invalItems), because the changes to attributes
of the FDW-related objects for those relations wouldn't affect the plan.

I wrote:
> Having said that, I like the latest version (v6), so I'd vote for
> marking this as Ready For Committer.

> Marked that way.

Thanks!

Best regards,
Etsuro Fujita

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Aggregate Push Down - Performing aggregation on foreign server
On Thu, Oct 20, 2016 at 12:49 PM, Ashutosh Bapat
<ashutosh.ba...@enterprisedb.com> wrote:

> The patch compiles and make check-world doesn't show any failures.

>> I have tried it.  Attached separate patch for it.
>> However I have noticed that istoplevel is always false (at least for
>> the testcase we have, I get it false always).  And I also think that
>> taking this decision only on PlanState is enough.  Does that in
>> attached patch.  To fix this, I have passed deparse_namespace to the
>> private argument and accessed it in get_special_variable().  Changes
>> look very simple.  Please have a look and let me know your views.
>> This issue is not introduced by the changes done for the aggregate
>> push down; it only got exposed as we now ship expressions in the
>> target list.  Thus I think it will be good if we make these changes
>> separately as a new patch, if required.

> The patch looks good and pretty simple.
>
> + * expression.  However if underneath PlanState is ForeignScanState,
> + * then don't force parentheses.
>
> We need to explain why it's safe not to add parentheses.  The reason
> this function adds parentheses is to preserve any operator precedences
> imposed by the expression tree of which this IndexVar is part.  For
> ForeignScanState, the Var may represent the root of the expression
> tree, and thus not need parentheses.  But we need to verify this and
> explain it in the comments.
>
> As you have explained, this is an issue exposed by this patch;
> something that is not a must-have in this patch.  If committers feel
> that aggregate pushdown needs this fix as well, please provide a patch
> addressing the above comment.

> Sure.

> Ok.  PFA patch with cosmetic changes (mostly rewording comments).  Let
> me know if those look good.  I am marking this patch as ready for
> committer.

Changes look good to me.  Thanks for the detailed review.

--
Jeevan B Chalke
Principal Software Engineer, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
[HACKERS] File content logging during execution of COPY queries (was: Better logging of COPY queries if log_statement='all')
>>> According to my colleagues it would be very nice to have this feature.
>>> For instance, if you are trying to optimize PostgreSQL for an
>>> application that uses COPY and you don't have access to it or
>>> something like this.  It could also be useful in some other cases.
>>
>> This use-case doesn't really make much sense to me.  Can you explain it
>> in more detail?  Is the goal here to replicate all of the statements
>> that are changing data in the database?
>
> The idea is to record application workload in a real environment and
> write a benchmark based on this record.  Then using this benchmark we
> could try different OS/DBMS configurations (or maybe hardware), find an
> extremum, then change the configuration in the production environment.
>
> It's not always possible to change an application or even the database
> (e.g. to use triggers) for this purpose.  For instance, if the DBMS is
> provided as a service.
>
> Currently PostgreSQL allows recording all workload _except_ COPY
> queries.  Considering how easily it could be done, I think it's wrong.
> Basically the only real question here is how it should look in
> postgresql.conf.

OK, how about introducing a new boolean parameter named log_copy?
The corresponding patch is attached.

--
Best regards,
Aleksander Alekseev

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 8c25b45..84a7542 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5205,6 +5205,20 @@ FROM pg_stat_activity;

+     log_copy (boolean)
+
+      log_copy configuration parameter
+
+
+       Controls whether file content is logged during execution of
+       COPY queries. The default is off.
+
+

      log_replication_commands (boolean)

diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index 5947e72..1863e27 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -331,6 +331,38 @@ static bool CopyGetInt16(CopyState cstate, int16 *val);

 /*
+ * Logs file content during COPY ... FROM / COPY ... TO execution if
+ * log_copy = 'on'.
+ */
+static void
+CopyLogStatement(const char *str, bool flush)
+{
+	static StringInfo logString = NULL;
+
+	if (log_copy == false)
+		return;
+
+	if (logString == NULL)
+	{
+		MemoryContext oldctx = MemoryContextSwitchTo(TopMemoryContext);
+
+		logString = makeStringInfo();
+		MemoryContextSwitchTo(oldctx);
+	}
+
+	appendStringInfoString(logString, str);
+
+	if (flush)
+	{
+		ereport(LOG,
+				(errmsg("statement: %s", logString->data),
+				 errhidestmt(true),
+				 errhidecontext(true)));
+
+		resetStringInfo(logString);
+	}
+}
+
+/*
  * Send copy start/stop messages for frontend copies.  These have changed
  * in past protocol redesigns.
  */
@@ -2045,14 +2077,20 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 		if (!cstate->binary)
 		{
 			if (need_delim)
+			{
 				CopySendChar(cstate, cstate->delim[0]);
+				CopyLogStatement(cstate->delim, false);
+			}
 			need_delim = true;
 		}

 		if (isnull)
 		{
 			if (!cstate->binary)
+			{
 				CopySendString(cstate, cstate->null_print_client);
+				CopyLogStatement(cstate->null_print_client, false);
+			}
 			else
 				CopySendInt32(cstate, -1);
 		}
@@ -2062,6 +2100,9 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 			{
 				string = OutputFunctionCall(&out_functions[attnum - 1],
 											value);
+
+				CopyLogStatement(string, false);
+
 				if (cstate->csv_mode)
 					CopyAttributeOutCSV(cstate, string,
 										cstate->force_quote_flags[attnum - 1],
@@ -2083,6 +2124,7 @@ CopyOneRowTo(CopyState cstate, Oid tupleOid, Datum *values, bool *nulls)
 	}

 	CopySendEndOfRow(cstate);
+	CopyLogStatement("", true);

 	MemoryContextSwitchTo(oldcontext);
 }
@@ -2914,6 +2956,8 @@ NextCopyFromRawFields(CopyState cstate, char ***fields, int *nfields)
 	if (done && cstate->line_buf.len == 0)
 		return false;

+	CopyLogStatement(cstate->line_buf.data, true);
+
 	/* Parse the line into de-escaped field values */
 	if (cstate->csv_mode)
 		fldct = CopyReadAttributesCSV(cstate);

diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index bc9d33f..0f035ac 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -415,6 +415,8 @@
 bool		log_planner_stats = false;
 bool		log_executor_stats = false;
 bool		log_statement_stats = false;	/* this is sort of all three
											 * above together */
+bool		log_copy = false;
+
 bool		log_btree_build_stats = false;

 char	   *event_source;
@@ -1161,6 +1163,15 @@ static struct config_bool ConfigureNamesBool[] =
 		false,
 		check_log_stats, NULL, NULL
 	},
+	{
+		{"log_copy", PGC_SUSET, STATS_MONITORING,
+			gettext_noop("Writes file content during COPY queries to the server log."),
+			NULL
+		},
+		&log_copy,
+		false,
+		NULL, NULL, NULL
+	},
 #ifdef BTREE_BUILD_STATS
 	{
 		{"log_btree_build_stats",
[HACKERS] Danger of automatic connection reset in psql
Hi Hackers!

When using psql interactively one might be tempted to guard potentially
destructive commands such as "UPDATE / DELETE / DROP" by starting the
input line with an explicit "BEGIN; ...".  This has the added benefit
that when you invoke the command by reverse-searching the command
history, you get it together with the guarding transaction-open
statement.

This, however, is not 100% safe, as I found out a few days ago.  Should
the connection to the server get lost, the first of the
semicolon-separated statements, "BEGIN;", will only trigger a connection
reset, and if that is successful the following command(s) are going to be
executed on the newly opened connection, but without the transaction
guard.

I'm not the first one to discover that; a search in the archives gives at
least 3 results:

https://www.postgresql.org/message-id/flat/1097174870.9273.8.camel%40ipso.snappymail.ca#1097174870.9273.8.ca...@ipso.snappymail.ca
https://www.postgresql.org/message-id/flat/4BF6A496.2090106%40comgate.c
https://www.postgresql.org/message-id/flat/CAD3a31U%2BfSBsq%3Dtxw2G-D%2BfPND_UN-nSojrGyaD%2BhkYUzvxusQ%40mail.gmail.com

The second one even resulted in a TODO item:

  Prevent psql from sending remaining single-line multi-statement queries
  after reconnection

I was thinking that simply adding a bool flag in the pset struct, to
indicate that the connection was reset during the attempt to execute the
last query, would do the trick, but it only helps in exactly the case
described above.  Since this is already an improvement, I'm attaching a
patch.

If, on the other hand, someone is pasting into psql's terminal a block of
commands enclosed in BEGIN/COMMIT, the same bug is triggered: BEGIN
doesn't take effect and the rest of the commands run outside of a
transaction.

Is it possible at all to protect against the latter case?  How?
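The failure mode can be illustrated with a toy model (plain Python,
purely illustrative — none of these names exist in psql's source): a
single input line is split into statements, the first statement kills the
connection, and after the automatic reset the remaining statements run on
the fresh connection with no open transaction.

```python
# Toy model (not psql's actual code) of the hazard described above.

class FakeConn:
    """Connection that dies on first use, like a server restart mid-line."""
    def __init__(self, alive=True):
        self.alive = alive
        self.in_txn = False
        self.executed = []

    def execute(self, stmt):
        if not self.alive:
            raise ConnectionError("server closed the connection")
        if stmt == "BEGIN":
            self.in_txn = True
        self.executed.append((stmt, self.in_txn))

def run_line(conn, line, reconnect):
    """Naive loop: on failure, reset the connection and keep going."""
    for stmt in [s.strip() for s in line.split(";") if s.strip()]:
        try:
            conn.execute(stmt)
        except ConnectionError:
            conn = reconnect()      # automatic reset; the BEGIN is lost
    return conn

dead = FakeConn(alive=False)
fresh = FakeConn()
conn = run_line(dead, "BEGIN; UPDATE t SET x = 1", lambda: fresh)
# The UPDATE ran on the new connection with in_txn=False: no guard.
```

The attached patch corresponds to making run_line() stop after a
reconnect instead of continuing with the remaining statements.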
--
Alex

From 3eae5e5d91084e0882a286bac464782701e17d21 Mon Sep 17 00:00:00 2001
From: Oleksandr Shulgin
Date: Thu, 20 Oct 2016 12:24:48 +0200
Subject: [PATCH] psql: stop sending commands after connection reset

Previously an input line such as "BEGIN; UPDATE something..." could
result in UPDATE running outside of a transaction if the first statement
happened to trigger a connection reset.

NB: this does *not* protect from blocks of commands pasted into the
terminal.
---
 src/bin/psql/common.c   | 2 ++
 src/bin/psql/mainloop.c | 5 ++++-
 src/bin/psql/settings.h | 3 +++
 3 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
index a7789df..34a4507 100644
--- a/src/bin/psql/common.c
+++ b/src/bin/psql/common.c
@@ -386,6 +386,8 @@ CheckConnection(void)
 		}
 		else
 			psql_error("Succeeded.\n");
+
+		pset.conn_was_reset = true;
 	}

 	return OK;
diff --git a/src/bin/psql/mainloop.c b/src/bin/psql/mainloop.c
index 37dfa4d..6d39ce8 100644
--- a/src/bin/psql/mainloop.c
+++ b/src/bin/psql/mainloop.c
@@ -391,11 +391,14 @@ MainLoop(FILE *source)
 			}

 			/* fall out of loop if lexer reached EOL */
-			if (scan_result == PSCAN_INCOMPLETE ||
+			if (pset.conn_was_reset ||
+				scan_result == PSCAN_INCOMPLETE ||
 				scan_result == PSCAN_EOL)
 				break;
 		}

+		pset.conn_was_reset = false;
+
 		/* Add line to pending history if we didn't execute anything yet */
 		if (pset.cur_cmd_interactive && !line_saved_in_history)
 			pg_append_history(line, history_buf);
diff --git a/src/bin/psql/settings.h b/src/bin/psql/settings.h
index 8cfe9d2..39a4be0 100644
--- a/src/bin/psql/settings.h
+++ b/src/bin/psql/settings.h
@@ -102,6 +102,9 @@ typedef struct _psqlSettings
 	FILE	   *cur_cmd_source; /* describe the status of the current main
								 * loop */
 	bool		cur_cmd_interactive;
+	bool		conn_was_reset; /* indicates that the connection was reset
+								 * during the last attempt to execute an
+								 * interactive command */
 	int			sversion;		/* backend server version */
 	const char *progname;		/* in case you renamed psql */
 	char	   *inputfile;		/* file being currently processed, if any */
--
2.7.4

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Gather Merge
On Thu, Oct 20, 2016 at 12:22 AM, Peter Geoghegan wrote:
> On Tue, Oct 4, 2016 at 11:05 PM, Rushabh Lathia wrote:
>> Query 4: With GM 7901.480 -> Without GM 9064.776
>> Query 5: With GM 53452.126 -> Without GM 55059.511
>> Query 9: With GM 52613.132 -> Without GM 98206.793
>> Query 15: With GM 68051.058 -> Without GM 68918.378
>> Query 17: With GM 129236.075 -> Without GM 160451.094
>> Query 20: With GM 259144.232 -> Without GM 306256.322
>> Query 21: With GM 153483.497 -> Without GM 168169.916
>>
>> Here from the results we can see that queries 9, 17 and 20 are the ones
>> which show a good performance benefit with Gather Merge.
>
> Were all other TPC-H queries unaffected?  IOW, did they have the same
> plan as before with your patch applied?  Did you see any regressions?

Yes, all other TPC-H queries were unaffected by the patch.  At the
initial stage of patch development I noticed regressions, but then
realized that it was because I was not allowing the leader to participate
in the GM.  Later on I fixed that, and after that I didn't notice any
regressions.

> I assume that this patch has each worker use work_mem for its own
> sort, as with hash joins today.  One concern with that model when
> testing is that you could end up with a bunch of internal sorts for
> cases with a GM node, where you get one big external sort for cases
> without one.  Did you take that into consideration?

Yes, but isn't that good?  Please correct me if I am missing anything.

--
Rushabh Lathia
www.EnterpriseDB.com
Re: [HACKERS] Danger of automatic connection reset in psql
On Thu, Oct 20, 2016 at 12:28 PM, Oleksandr Shulgin
<oleksandr.shul...@zalando.de> wrote:
>
> I'm not the first one to discover that; a search in the archives gives
> at least 3 results:
>
> https://www.postgresql.org/message-id/flat/1097174870.9273.8.camel%40ipso.snappymail.ca#1097174870.9273.8.ca...@ipso.snappymail.ca
> https://www.postgresql.org/message-id/flat/4BF6A496.2090106%40comgate.c

Sorry, that one got broken; it should be:

https://www.postgresql.org/message-id/flat/4BF6A496.2090106%40comgate.cz
Re: [HACKERS] Danger of automatic connection reset in psql
On Thu, Oct 20, 2016 at 12:28 PM, Oleksandr Shulgin
<oleksandr.shul...@zalando.de> wrote:
>
> Since this is already an improvement, I'm attaching a patch.
>
> If, on the other hand, someone is pasting into psql's terminal a block
> of commands enclosed in BEGIN/COMMIT, the same bug is triggered: BEGIN
> doesn't take effect and the rest of the commands run outside of a
> transaction.
>
> Is it possible at all to protect against the latter case?  How?

One idea that was just suggested to me is to make interactive psql
sessions buffer in all available input before psql is going to block.
This way it doesn't matter if the multiple statements appear on one line
or are being pasted.

Feasible?

--
Alex
Re: [HACKERS] Indirect indexes
On Wed, Oct 19, 2016 at 01:04:16PM -0400, Bruce Momjian wrote:
> On Wed, Oct 19, 2016 at 06:58:05PM +0200, Simon Riggs wrote:
>>>> I agree.  Also, I think the recheck mechanism will have to be
>>>> something like what I wrote for WARM, i.e. only checking the index
>>>> quals won't be enough and we would actually need to verify that the
>>>> heap tuple satisfies the key in the indirect index.
>>>
>>> I personally would like to see how far we get with WARM before adding
>>> this feature that requires a DBA to evaluate and enable it.
>>
>> Assuming WARM is accepted, that *might* be fine.
>
> First, I love WARM because everyone gets the benefits by default.  For
> example, a feature that improves performance by 10% but is only used by
> 1% of users has a usefulness of 0.1% --- at least that is how I think of
> it.

Just to clarify: if a feature improves performance by 1% but is enabled
by default, it is 10x more useful across our entire user base than the
feature with the numbers listed above, 1% vs 0.1%.

Also, it seems indirect indexes would be useful for indexing columns that
are not updated frequently on tables that are updated frequently, and
whose primary key is not updated frequently.  That's quite a logic
problem for users to understand.

Also, if vacuum is difficult for indirect indexes, and also for WARM,
perhaps the vacuum problem can be solved for WARM and WARM can be used in
this case too.

--
  Bruce Momjian                          http://momjian.us
  EnterpriseDB                           http://enterprisedb.com

  + As you are, so once was I.  As I am, so you will be. +
  +                     Ancient Roman grave inscription  +

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
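The "usefulness" arithmetic in the mail above is simply the per-user
improvement multiplied by the fraction of users who benefit.  As a quick
sketch with the numbers quoted there (illustrative only — this is the
mail's back-of-the-envelope model, not a measured result):

```python
# Expected benefit across the whole user base:
# per-user improvement x fraction of users affected.
def usefulness(improvement, adoption):
    return improvement * adoption

opt_in = usefulness(0.10, 0.01)    # 10% faster, 1% of users enable it
default = usefulness(0.01, 1.00)   # 1% faster, on by default
# default is ten times larger than opt_in: 1% vs 0.1%.
```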
Re: [HACKERS] Fun fact about autovacuum and orphan temp tables
On Mon, 5 Sep 2016 14:54:05 +0300
Grigory Smolkin wrote:

> Hello, hackers!
>
> We were testing how well some application works with PostgreSQL and
> stumbled upon an autovacuum behavior which I fail to understand.
> The application in question has a habit of heavily using temporary
> tables in funny ways.  For example, it creates A LOT of them.
> Which is ok.  The funny part is that it never drops them.  So when the
> backend is finally terminated, it tries to drop them and fails with
> error:
>
> FATAL:  out of shared memory
> HINT:  You might need to increase max_locks_per_transaction
>
> If I understand that right, we are trying to drop all these temp
> tables in one transaction and running out of locks to do so.
> After that, postgresql.log is flooded at the rate 1k/s with messages
> like this:
>
> LOG:  autovacuum: found orphan temp table "pg_temp_15"."tt38147" in
> database "DB_TEST"
>
> It produces a noticeable load on the system and it's getting worse
> with every terminated backend or restart.
> I did some RTFS and it appears that autovacuum has no intention of
> cleaning up those orphan tables unless it's wraparound time:
>
> src/backend/postmaster/autovacuum.c
>
>          /* We just ignore it if the owning backend is still active */
> 2037     if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL)
> 2038     {
> 2039         /*
> 2040          * We found an orphan temp table (which was probably left
> 2041          * behind by a crashed backend).  If it's so old as to need
> 2042          * vacuum for wraparound, forcibly drop it.  Otherwise just
> 2043          * log a complaint.
> 2044          */
> 2045         if (wraparound)
> 2046         {
> 2047             ObjectAddress object;
> 2048
> 2049             ereport(LOG,
> 2050                     (errmsg("autovacuum: dropping orphan temp table \"%s\".\"%s\" in database \"%s\"",
> 2051                             get_namespace_name(classForm->relnamespace),
> 2052                             NameStr(classForm->relname),
> 2053                             get_database_name(MyDatabaseId))));
> 2054             object.classId = RelationRelationId;
> 2055             object.objectId = relid;
> 2056             object.objectSubId = 0;
> 2057             performDeletion(&object, DROP_CASCADE, PERFORM_DELETION_INTERNAL);
> 2058         }
> 2059         else
> 2060         {
> 2061             ereport(LOG,
> 2062                     (errmsg("autovacuum: found orphan temp table \"%s\".\"%s\" in database \"%s\"",
> 2063                             get_namespace_name(classForm->relnamespace),
> 2064                             NameStr(classForm->relname),
> 2065                             get_database_name(MyDatabaseId))));
> 2066         }
> 2067     }
> 2068 }
>
> What is more troubling is that pg_statistic is starting to bloat badly.
>
> LOG:  automatic vacuum of table "DB_TEST.pg_catalog.pg_statistic": index scans: 0
>     pages: 0 removed, 68225 remain, 0 skipped due to pins
>     tuples: 0 removed, 2458382 remain, 2408081 are dead but not yet removable
>     buffer usage: 146450 hits, 31 misses, 0 dirtied
>     avg read rate: 0.010 MB/s, avg write rate: 0.000 MB/s
>     system usage: CPU 3.27s/6.92u sec elapsed 23.87 sec
>
> What is the purpose of keeping orphan tables around and not dropping
> them on the spot?

Hey Hackers,

I tried to fix the problem with a new backend not being able to reuse a
temporary namespace when it contains thousands of temporary tables.  I
disabled locking of objects during the namespace clearing process.  See
the patch attached.  Please tell me if there are any reasons why this is
wrong.

I also added a GUC variable and changed the condition in autovacuum to
let it nuke orphan tables on its way.  See another patch attached.

Regards,
Constantin Pan

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 5c8db97..d7707ac 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -5665,6 +5665,20 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;

+     autovacuum_wipe_orphan_temp_tables (boolean)
+
+      autovacuum_wipe_orphan_temp_tables configuration parameter
+
+
+       Controls whether the server should drop orphan temporary tables
+       during autovacuum. This is on by default.
+
+

      log_autovacuum_min_duration (integer)

diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3768f50..e5318f4 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -123,6 +123,8 @@
 int			autovacuum_vac_cost_limit;

 int			Log_autovacuum_min_duration = -1;

+bool		autovacuum_wipe_orphan_temp_tables = true;
+
 /* how long to keep pgstat data in the launc
[HACKERS] Re: File content logging during execution of COPY queries (was: Better logging of COPY queries if log_statement='all')
Aleksander, * Aleksander Alekseev (a.aleks...@postgrespro.ru) wrote: > > The idea is to record application workload in real environment and write > > a benchmark based on this record. Then using this benchmark we could try > > different OS/DBMS configuration (or maybe hardware), find an extremum, > > then change configuration in production environment. > > > > It's not always possible to change an application or even database (e.g. > > to use triggers) for this purpose. For instance, if DBMS is provided as > > a service. > > > > Currently PostgreSQL allows to record all workload _except_ COPY > > queries. Considering how easily it could be done I think it's wrong. > > Basically the only real question here is how it should look like in > > postgresql.conf. > > OK, how about introducing a new boolean parameter named log_copy? > Corresponding patch is attached. The parameter would be better as 'log_copy_data', I believe. The actual COPY command is already logged with just 'log_statement = all', of course. Also.. > diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml > index 8c25b45..84a7542 100644 > --- a/doc/src/sgml/config.sgml > +++ b/doc/src/sgml/config.sgml > @@ -5205,6 +5205,20 @@ FROM pg_stat_activity; > > > > + > + log_copy (boolean) > + > + log_copy configuration parameter > + > + > + > + > +Controls whether file content is logged during execution of > +COPY queries. The default is off. > + > + > + "file" isn't accurate here and I don't know that it actually makes sense to log "COPY TO" data- we don't log the results of SELECT statements, after all, and the use-case you outlined above (which I generally agree is one we should consider) doesn't have any need for the data of "COPY TO" statements to be in the log, it seems to me. Can you elaborate on why we would want to log the data sent to the client with a COPY TO command. If there is a reason, why wouldn't we want to support that for SELECT and ... RETURNING commands too? Thanks! 
Stephen
[HACKERS] Remove autovacuum GUC?
Hello,

After all these years, we are still regularly running into people who say, "performance was bad so we disabled autovacuum". I am not talking about once in a while, it is often. I would like us to consider removing the autovacuum option. Here are a few reasons:

1. It does not hurt anyone
2. It removes a foot gun
3. Autovacuum is *not* optional, we shouldn't let it be
4. People could still disable it at the table level for those tables that do fall into the small window of, no maintenance is o.k.
5. People would still have the ability to decrease the max_workers to 1 (although I could argue about that too).

Sincerely,

JD

-- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] Disable autovacuum guc?
On 10/19/2016 07:22 PM, Josh Berkus wrote:
> On 10/19/2016 06:27 PM, Joshua D. Drake wrote:
>> Hello, After all these years, we are still regularly running into people who say, "performance was bad so we disabled autovacuum". I am not talking about once in a while, it is often. I would like us to consider removing the autovacuum option. Here are a few reasons:
>> 1. It does not hurt anyone
>> 2. It removes a foot gun
>> 3. Autovacuum is *not* optional, we shouldn't let it be
>> 4. People could still disable it at the table level for those tables that do fall into the small window of, no maintenance is o.k.
>> 5. People would still have the ability to decrease the max_workers to 1 (although I could argue about that too).
>
> People who run data warehouses where all of the data comes in as batch loads regularly disable autovacuum, and should do so. For the DW/batch load use-case, it makes far more sense to do batch loads interspersed with ANALYZEs and VACUUMS of loaded/updated tables.

Hrm, true although that is by far a minority of our users. What if we made it so we disabled the autovacuum guc but made it so you could disable autovacuum per database (ALTER DATABASE SET or some such thing)?

Sincerely,

JD

-- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] PATCH: two slab-like memory allocators
On 10/19/2016 02:51 PM, Tomas Vondra wrote:
> ...
> Yeah. There are three contexts in reorder buffers:
> - changes (fixed size)
> - txns (fixed size)
> - tuples (variable size)
> The first two work perfectly fine with Slab. The last one (tuples) is used to allocate variable-sized bits, so I've tried to come up with something smart - a sequence of Slabs + overflow AllocSet. I agree that in hindsight it's a bit strange, and that the "generational" aspect is the key aspect here - i.e. it might be possible to implement a memory context that allocates variable-length chunks but still segregates them into generations. That is, don't build this on top of Slab. That would also fix the issue with two allocators in GenSlab. I'll think about this.

And here is a fairly complete prototype of this idea, adding "Gen" generational memory context based only on the "similar lifespan" assumption (and abandoning the fixed-size assumption). It's much simpler than GenSlab (which it's supposed to replace), and abandoning the idea of composing two memory contexts also fixed the warts with some of the API methods (e.g. repalloc).

I've also been thinking that perhaps "Gen" would be useful for all three contexts in ReorderBuffer - so I've done a quick test comparing the various combinations (using the test1() function used before).

            master   slabs   slabs+genslab   slabs+gen   gens
    50k      18700     210             220         190    190
    100k        16     380             470         350    350
    200k       N/A     750             920         740    679
    500k       N/A    2250            2240        1790   1740
    1000k      N/A    4600            5000        3910   3700

Where:
* master - 23ed2ba812117
* slabs - all three contexts use Slab (patch 0001)
* slabs+genslab - third context is GenSlab (patch 0002)
* slabs+gen - third context is the new Gen (patch 0003)
* gens - all three contexts use Gen

The results are a bit noisy, but I think it's clear the new Gen context performs well - it actually seems a bit faster than GenSlab, and using only Gen for all three contexts does not hurt performance. 
This is most likely due to the trivial (practically absent) freespace management in Gen context, compared to both Slab and GenSlab. So speed is not the only criterion - I haven't measured memory consumption, but I'm pretty sure there are cases where Slab consumes much less memory than Gen, thanks to reusing free space. I'd say throwing away GenSlab and keeping Slab+Gen is the way to go.

There's still a fair bit of work on this, particularly implementing the missing API methods in Gen - GenCheck() and GenStats(). As Robert pointed out, there's also quite a bit of duplicated code between the different memory contexts (randomization and valgrind-related), so this needs to be moved to a shared place.

I'm also thinking that we need better names, particularly for the Gen allocator. It's supposed to mean Generational, although there are no explicit generations anymore. Slab is probably OK - it does not match any of the existing kernel slab allocators exactly, but it follows the same principles, which is what matters.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

slab-allocators-v4.tgz
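[For readers following along: the core of the "similar lifespan" idea — bump-allocate variable-sized chunks into blocks and give a block back only once its last chunk dies — can be sketched outside PostgreSQL in a few dozen lines. This is an illustrative toy, not the actual patch: the names (GenContext, gen_alloc, gen_free) and the chunk-header layout are invented for the example.]

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A block is filled strictly left to right; there is no freelist. */
typedef struct GenBlock
{
    size_t      used;           /* bump-allocation offset into data[] */
    size_t      size;           /* capacity of data[] */
    size_t      nlive;          /* chunks not yet freed */
    char        data[];
} GenBlock;

typedef struct GenContext
{
    GenBlock   *head;           /* block currently being filled */
    size_t      blocksize;
} GenContext;

/* Per-chunk header, so gen_free() can find the owning block. */
typedef struct GenChunk
{
    GenBlock   *block;
} GenChunk;

static void *
gen_alloc(GenContext *cxt, size_t size)
{
    size_t      need = (sizeof(GenChunk) + size + 15) & ~(size_t) 15;
    GenChunk   *ch;

    if (cxt->head == NULL || cxt->head->used + need > cxt->head->size)
    {
        size_t      bsize = need > cxt->blocksize ? need : cxt->blocksize;
        GenBlock   *b = malloc(offsetof(GenBlock, data) + bsize);

        b->used = 0;
        b->size = bsize;
        b->nlive = 0;
        /* older blocks stay reachable only via their chunks' headers */
        cxt->head = b;
    }
    ch = (GenChunk *) (cxt->head->data + cxt->head->used);
    cxt->head->used += need;
    cxt->head->nlive++;
    ch->block = cxt->head;
    return ch + 1;
}

static void
gen_free(GenContext *cxt, void *p)
{
    GenBlock   *b = ((GenChunk *) p - 1)->block;

    if (--b->nlive > 0)
        return;
    if (b == cxt->head)
        b->used = 0;            /* current block fully drained: rewind it */
    else
        free(b);                /* an old "generation" died: release it */
}
```

Compared to AllocSet there are no per-size freelists, and compared to Slab there is no fixed chunk size — which is exactly the trade-off (speed vs. free-space reuse) the benchmark above exercises.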
Re: [HACKERS] Indirect indexes
Bruce Momjian wrote:
> Just to clarify, if a feature improves performance by 1%, but is enabled by default, that is 10x more useful across our entire user base as the feature numbers listed above, 1% vs 0.1%.

Great. But not all users are alike. We have big profile users that write blog posts that get echoed all over the world who would benefit from things that perhaps other users would not benefit that much from.

> Also, it seems indirect indexes would be useful for indexing columns that are not updated frequently on tables that are updated frequently, and whose primary key is not updated frequently. That's quite a logic problem for users to understand.

I don't think we should be optimizing only for dumb users. In any case, updating primary key values is very rare; some would say it never happens.

-- Álvaro Herrera   https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Indirect indexes
On Thu, Oct 20, 2016 at 10:39:23AM -0300, Alvaro Herrera wrote: > Bruce Momjian wrote: > > > Just to clarify, if a feature improves performance by 1%, but is enabled > > by default, that is 10x more useful across our entire user base as the > > feature numbers listed above, 1% vs 0.1%. > > Great. But not all users are alike. We have big profile users that > write blog posts that get echoed all over the world who would benefit > from things that perhaps other users would not benefit that much from. > > > Also, it seems indirect indexes would be useful for indexing columns > > that are not updated frequently on tables that are updated frequently, > > and whose primary key is not updated frequently. That's quite a logic > > problem for users to understand. > > I don't think we should be optimizing only for dumb users. In any case, > updating primary key values is very rare; some would say it never > happens. Uh, so we are writing the database for sophisticated users who write blog posts who make us look bad? Really? My point is that we need to look at the entire user base, and look at the adoption rates of features, and figure out how to maximize that. You are right that sophisticated users can hit roadblocks, and we need to give them a solution that keeps them on Postgres. However, if we can solve the problem in a way that everyone benefits from by default (e.g. WARM), why not implement that instead, or at least go as far as we can with that before adding a feature that only sophisticated users will know to enable. (My "logic problem" above shows that only sophisticated users will likely use indirect indexes.) If we have to solve a vacuum problem with indirect indexes, and WARM also is limited by the same vacuum problem, let's fix the vacuum problem for WARM, which will be used by all users. 
In summary, I need to know what problems indirect indexes fix that WARM cannot, or eventually cannot if given the amount of engineering work we would need to implement indirect indexes. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: [HACKERS] Indirect indexes
On 10/20/2016 06:39 AM, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>> Also, it seems indirect indexes would be useful for indexing columns that are not updated frequently on tables that are updated frequently, and whose primary key is not updated frequently. That's quite a logic problem for users to understand.
>
> I don't think we should be optimizing only for dumb users. In any case, updating primary key values is very rare; some would say it never happens.

Just because a person doesn't understand a use case doesn't make them dumb. That said, would it be possible to make this index an extension (like rum?). Assuming of course we can get any required infrastructure changes done in a general way. I do think the feature has merit.

JD

-- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] Remove autovacuum GUC?
On Wed, Oct 19, 2016 at 9:24 PM, Joshua D. Drake wrote:
> After all these years, we are still regularly running into people who say, "performance was bad so we disabled autovacuum". I am not talking about once in a while, it is often. I would like us to consider removing the autovacuum option. Here are a few reasons:
>
> 1. It does not hurt anyone
> 2. It removes a foot gun
> 3. Autovacuum is *not* optional, we shouldn't let it be
> 4. People could still disable it at the table level for those tables that do fall into the small window of, no maintenance is o.k.
> 5. People would still have the ability to decrease the max_workers to 1 (although I could argue about that too).

Setting autovacuum=off is at least useful for testing purposes and I've used it that way. On the other hand, I haven't seen a customer disable this unintentionally in years. Generally, the customers I've worked with have found subtler ways of hosing themselves with autovacuum. One of my personal favorites is autovacuum_naptime='1 d' -- for the record, that did indeed work out very poorly.

I think that this is the kind of problem that can only properly be solved by education. If somebody thinks that they want to turn off autovacuum, and you keep them from turning it off, they just get frustrated. Sometimes, they then find a back-door way of getting what they want, like setting up a script to kill it whenever it starts, or changing the other thresholds so that it barely ever runs. But whether they resort to such measures or not, there is no real chance that they will be happy with PostgreSQL. And why should they be? It doesn't let them configure the settings that they want to configure. When some other program doesn't let me do what I want, I decide it's stupid. Pretty much the same thing here. The only way you actually get out from under this problem is by teaching people the right way to think about the settings they're busy misconfiguring. 
I'd actually rather go the other way with this and add a new autovacuum setting, autovacuum=really_off, that doesn't let autovacuum run even for wraparound. For example, let's say I've just recovered a badly damaged cluster using pg_resetxlog. I want to start it up and try to recover my data. I do *not* want VACUUM to decide to start removing things that I'm trying to recover. But there's no way to guarantee that today. So, you can't start up the cluster, look around, and then shut it down with the intent to change the next transaction ID if it's not right. Your data will be disappearing underneath you. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Remove vacuum_defer_cleanup_age
On Wed, Oct 19, 2016 at 9:09 PM, Bruce Momjian wrote: > On Wed, Oct 19, 2016 at 08:17:46PM -0400, Robert Haas wrote: >> On Wed, Oct 19, 2016 at 12:59 PM, Bruce Momjian wrote: >> > Uh, vacuum_defer_cleanup_age sets an upper limit on how long, in terms >> > of xids, that a standby query can run before cancel, like >> > old_snapshot_threshold, no? >> >> No, not really. It affects the behavior of the master, not the standby. > > But it controls when/if cancels happen on the standby. True. I see your point. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Indirect indexes
Joshua D. Drake wrote:
> That said would it be possible to make this index an extension (like rum?). Assuming of course we can get any required infrastructure changes done in a general way.

Well, the patch I currently have creates a separate index AM called "ibtree" which is an indirect btree that internally calls the regular btree code -- pretty much what you propose. I think it can work that way, but it's not as efficient as it can be done if the feature is incorporated into core. There are things like obtaining the primary key value from the indexed tuple: in my extension I simply do "heap_fetch" to obtain the heap tuple, then heap_getattr() to obtain the values from the primary key columns. This works, but if instead the executor passes the primary key values as parameters, I can save both those steps, which are slow.

Now if we do incorporate the infrastructure changes in core in a general way, which is what I am proposing, then the in-core provided btree AM can do indirect indexes without any further extension. The same infrastructure changes can later provide support for GIN indexes.

> I do think the feature has merit.

Thanks.

-- Álvaro Herrera   https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
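[To make the trade-off in this thread concrete, here is a deliberately tiny model — plain C, nothing from the PostgreSQL tree, every name invented for the example — of the difference between a regular index entry, which points at a heap TID, and an indirect entry, which stores the primary-key value. When a non-HOT update moves the tuple, the direct entry goes stale while the indirect one still resolves, at the cost of the extra PK lookup described above.]

```c
#include <assert.h>
#include <string.h>

/* Toy heap: the slot number plays the role of the TID. */
typedef struct
{
    int         pk;         /* stable primary key */
    const char *payload;    /* the indexed column */
    int         other;      /* some non-indexed column */
    int         live;
} HeapTupleToy;

#define NSLOTS 8
static HeapTupleToy heap[NSLOTS];

/* A direct entry remembers where the tuple is; an indirect entry
 * remembers the primary key and pays a second lookup to find it. */
typedef struct { const char *key; int tid; } DirectEntry;
typedef struct { const char *key; int pk;  } IndirectEntry;

/* Stand-in for a descent of the primary-key btree. */
static int
fetch_by_pk(int pk)
{
    for (int tid = 0; tid < NSLOTS; tid++)
        if (heap[tid].live && heap[tid].pk == pk)
            return tid;
    return -1;
}

/* A non-HOT update of a non-indexed column: the new version lands in
 * another slot, so the TID changes while pk and payload do not. */
static int
update_other(int tid, int newval)
{
    int newtid = tid + 1;   /* pretend the next slot happens to be free */

    heap[newtid] = heap[tid];
    heap[newtid].other = newval;
    heap[tid].live = 0;
    return newtid;
}
```

After update_other(), the direct entry must be re-inserted to stay valid (the bloat HOT/WARM avoid only when the new version fits on the same page), while the indirect entry needs no maintenance at all as long as the PK column is untouched — the property being weighed here against the cost of that extra lookup.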
Re: [HACKERS] Hash Indexes
On Tue, Oct 18, 2016 at 8:27 PM, Amit Kapila wrote: > By this problem, I mean to say deadlocks for suspended scans, that can > happen in btree for non-Mvcc or other type of scans where we don't > release pin during scan. In my mind, we have below options: > > a. problem of deadlocks for suspended scans should be tackled as a > separate patch as it exists for other indexes (at least for some type > of scans). > b. Implement page-scan mode and then we won't have deadlock problem > for MVCC scans. > c. Let's not care for non-MVCC scans unless we have some way to hit > those for hash indexes and proceed with Dead tuple marking idea. I > think even if we don't care for non-MVCC scans, we might hit this > problem (deadlocks) when the index relation is unlogged. > > Here, even if we want to go with (b), I think we can handle it in a > separate patch, unless you think otherwise. After some off-list discussion with Amit, I think I get his point here: the deadlock hazard which is introduced by this patch already exists for btree and has for a long time, and nobody's gotten around to fixing it (although 2ed5b87f96d473962ec5230fd820abfeaccb2069 improved things). So it's probably OK for hash indexes to have the same issue. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Wed, Oct 19, 2016 at 11:08:39AM +0100, Greg Stark wrote:
> On Sat, Feb 6, 2016 at 4:52 AM, Noah Misch wrote:
> > aset.c relies on the fact that VALGRIND_MEMPOOL_ALLOC() has an implicit VALGRIND_MAKE_MEM_UNDEFINED() and VALGRIND_MEMPOOL_FREE() has an implicit VALGRIND_MAKE_MEM_NOACCESS(). #define those two accordingly. If ASAN has no
>
> Actually this is confusing.
>
> aset.c doesn't actually use the MEMPOOL_ALLOC macro at all, it just calls UNDEFINED, DEFINED, and NOACCESS. mcxt.c does however do the MEMPOOL_ALLOC/FREE.

Correct.

> So both layers are calling these macros for overlapping memory areas which I find very confusing and I'm not sure what the net effect is.

The net effect looks like this, at the instant an mcxt.c function returns:

NOACCESS:
- requested_size field of AllocChunkData
- Trailing padding, if any, of AllocChunkData
- Any chunk bytes after the last byte of the most recent requested size

DEFINED:
- palloc0() return value, up to requested size

UNDEFINED:
- palloc() return value, up to requested size
- repalloc() new portion, after size increase (with MEMORY_CONTEXT_CHECKING disabled, memory unfortunately becomes DEFINED instead)

> The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't have convenient access to a size argument. It could call the GetChunkSpace method but that will include the allocation overhead and

That is indeed a problem for making VALGRIND_MEMPOOL_FREE() an alias of VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested. Calling GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an assumption of mcxt.c, which is messy. Including the allocation overhead is fine, though.

> in any case isn't this memory already marked noaccess by aset.c?

Only sometimes, when AllocSetFree() happens to call free() or wipe_mem().
Re: [HACKERS] incorrect libpq comment
On Wed, Oct 19, 2016 at 1:35 PM, Bruce Momjian wrote:
> On Wed, Oct 19, 2016 at 01:16:28PM -0400, Robert Haas wrote:
>> Bruce's commit 5d305d86bd917723f09ab4f15c075d90586a210a back in April of 2014 includes this change:
>>
>>  /* See PQconnectPoll() for how we use 'int' and not 'pgsocket'. */
>> -int         sock;       /* Unix FD for socket, -1 if not connected */
>> +pgsocket    sock;       /* FD for socket, PGINVALID_SOCKET if unconnected */
>>
>> I suppose Bruce must have overlooked the fact that the comment on the previous line is now false. I think we should remove it, because it makes no sense to say how we are using 'int' rather than 'pgsocket' when in fact we are not using 'int' any more.
>
> Agreed.

Great. Nuked it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] PATCH: two slab-like memory allocators
On Tue, Oct 18, 2016 at 6:27 PM, Petr Jelinek wrote: > I agree though that the usability beyond the ReorderBuffer is limited > because everything that will want to use it for part of allocations will > get much more complicated by the fact that it will have to use two > different allocators. > > I was wondering if rather than trying to implement new allocator we > should maybe implement palloc_fixed which would use some optimized > algorithm for fixed sized objects in our current allocator. The > advantage of that would be that we could for example use that for things > like ListCell easily (memory management of which I see quite often in > profiles). The sb_alloc allocator I proposed a couple of years ago would work well for this case, I think. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Avoiding pin scan during btree vacuum
On Wed, Oct 19, 2016 at 6:30 PM, Alvaro Herrera wrote: > Robert Haas wrote: >> On Mon, Jan 4, 2016 at 10:30 AM, Tom Lane wrote: >> >> This seems like a mighty subtle thing to backpatch. If we really want to >> >> go there, ISTM that the relevant code should stew in an unreleased >> >> branch for a while, before being backpatched. >> > >> > I'm definitely -1 on back-patching such a thing. Put it in HEAD for >> > awhile. If it survives six months or so then we could discuss it again. >> >> I agree with Tom. > > Okay, several months have passed with this in the development branch and > now seems a good time to backpatch this all the way back to 9.4. > > Any objections? Although the code has now been in the tree for six months, it's only been in a released branch for three weeks, which is something about which to think. I guess what's really bothering me is that I don't think this is a bug fix. It seems like a performance optimization. And I think that we generally have a policy that we don't back-patch performance optimizations. Of course, there is some fuzziness because when the performance gets sufficiently bad, we sometimes decide that it amounts to a bug, as in 73cc7d3b240e1d46b1996382e5735a820f8bc3f7. Maybe this case qualifies for similar treatment, but I'm not sure. Why are you talking about back-patching this to 9.4 rather than all supported branches? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] File content logging during execution of COPY queries
On 10/20/2016 12:36 PM, Aleksander Alekseev wrote: According to my colleagues it would be very nice to have this feature. For instance, if you are trying to optimize PostgreSQL for application that uses COPY and you don't have access to or something like this. It could also be useful in some other cases. This use-case doesn't really make much sense to me. Can you explain it in more detail? Is the goal here to replicate all of the statements that are changing data in the database? The idea is to record application workload in real environment and write a benchmark based on this record. Then using this benchmark we could try different OS/DBMS configuration (or maybe hardware), find an extremum, then change configuration in production environment. It's not always possible to change an application or even database (e.g. to use triggers) for this purpose. For instance, if DBMS is provided as a service. Currently PostgreSQL allows to record all workload _except_ COPY queries. Considering how easily it could be done I think it's wrong. Basically the only real question here is how it should look like in postgresql.conf. OK, how about introducing a new boolean parameter named log_copy? Corresponding patch is attached. This is a useful feature I was waiting for some time. If some application which workload you want to collect is using COPY statement, then recording network traffic was your only option. -- Grigory Smolkin Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
[HACKERS] WIP: Fix invalid XML explain plans for track_io_timing
Hi!

The XML output of explain potentially outputs the XML tag names "I/O-Write-Time" and "I/O-Read-Time", which are invalid due to the slash. This can easily be seen with this:

set track_io_timing = on;
explain (analyze true, buffers true, format xml) select 1;
[...]
<I/O-Read-Time>0.000</I/O-Read-Time>
<I/O-Write-Time>0.000</I/O-Write-Time>
[...]

Attached is a patch against master that translates slashes to dashes during XML formatting (very much like spaces are already translated to dashes). Removing the slash from those property names is another option, but is an incompatible change to the other formats (neither JSON nor YAML have issues with '/' in key names).

Although the patch fixes the problem for the moment, it is incomplete in the sense that it continues to check against an incomplete black list. I guess this is how it slipped in: XML explain was added in 9.0, I/O timings in 9.2. Checking against an (abbreviated?) white list would be more future proof if new explain properties are added. Let me know if you consider this a better approach.

I've also done a simple check to see if there are other dangerous characters used in explain properties at the moment:

sed -n 's/.*ExplainProperty[^(]*(\s*"\([^"]*\)\".*/\1/p' src/backend/commands/explain.c |grep '[^-A-Za-z /]'

Result: none. A similar check could be used at build-time to prevent introducing new property names that invalidate the XML output (not sure if this could ever reach 100% safety).

Comments?

-- Markus Winand - winand.at

0001-Fix-invalid-XML-explain-plans-for-track_io_timing.patch
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Wed, Oct 19, 2016 at 7:05 AM, Christoph Berg wrote: > (tl;dr: rename pg_xlog yes, rename pg_resetxlog only if we have a good > alternative.) I'm amused by the idea of a TL;DR in parentheses at the very bottom of the email, but maybe I'm just easily amused. One idea would be to rename pg_resetxlog to pg_resetwal. I think that's actually an improvement. The "x" in "xlog" is not a clear abbreviation for "write-ahead", especially when we also use "x" in other contexts to mean "transaction" - think "xid". Given our current practices, we could justifiably rename pg_clog to pg_xlog -- after all, that's where we log the status of the xacts. Ugh. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] File content logging during execution of COPY queries
Grigory, * Grigory Smolkin (g.smol...@postgrespro.ru) wrote: > On 10/20/2016 12:36 PM, Aleksander Alekseev wrote: > According to my colleagues it would be very nice to have this feature. > For instance, if you are trying to optimize PostgreSQL for application > that uses COPY and you don't have access to or something like this. > It could also be useful in some other cases. > >>>This use-case doesn't really make much sense to me. Can you explain it > >>>in more detail? Is the goal here to replicate all of the statements > >>>that are changing data in the database? > >>The idea is to record application workload in real environment and write > >>a benchmark based on this record. Then using this benchmark we could try > >>different OS/DBMS configuration (or maybe hardware), find an extremum, > >>then change configuration in production environment. > >> > >>It's not always possible to change an application or even database (e.g. > >>to use triggers) for this purpose. For instance, if DBMS is provided as > >>a service. > >> > >>Currently PostgreSQL allows to record all workload _except_ COPY > >>queries. Considering how easily it could be done I think it's wrong. > >>Basically the only real question here is how it should look like in > >>postgresql.conf. > >OK, how about introducing a new boolean parameter named log_copy? > >Corresponding patch is attached. > > This is a useful feature I was waiting for some time. > If some application which workload you want to collect is using COPY > statement, then recording network traffic was your only option. As I pointed out to Aleksander, you would still need to record network traffic to see all of the data going to and from the database since we do not include SELECT or ... RETURNING results in the log files. If that is needed then that's a whole different discussion. Thanks! Stephen
Re: [HACKERS] Indirect indexes
On 20/10/16 14:29, Bruce Momjian wrote: > On Wed, Oct 19, 2016 at 01:04:16PM -0400, Bruce Momjian wrote: >> On Wed, Oct 19, 2016 at 06:58:05PM +0200, Simon Riggs wrote: > I agree. Also, I think the recheck mechanism will have to be something > like > what I wrote for WARM i.e. only checking for index quals won't be enough > and we > would actually need to verify that the heap tuple satisfies the key in the > indirect index. I personally would like to see how far we get with WARM before adding this feature that requires a DBA to evaluate and enable it. >>> >>> Assuming WARM is accepted, that *might* be fine. >> >> First, I love WARM because everyone gets the benefits by default. For >> example, a feature that improves performance by 10% but is only used by >> 1% of users has a usefulness of 0.1% --- at least that is how I think of >> it. > > Just to clarify, if a feature improves performance by 1%, but is enabled > by default, that is 10x more useful across our entire user base as the > feature numbers listed above, 1% vs 0.1%. > > Also, it seems indirect indexes would be useful for indexing columns > that are not updated frequently on tables that are updated frequently, > and whose primary key is not updated frequently. That's quite a logic > problem for users to understand. > Which covers like 99.9% of problematic cases I see on daily basis. And by that logic we should not have indexes at all, they are not automatically created and user needs to think about if they need them or not. Also helping user who does not have performance problem by 1% is very different from helping user who has performance problem by 50% even if she needs to think about the solution a bit. WARM can do WARM update 50% of time, indirect index can do HOT update 100% of time (provided the column is not changed), I don't see why we could not have both solutions. 
That all being said, it would be interesting to hear Álvaro's thoughts about which use-cases he expects indirect indexes to work better than WARM. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Fix invalid XML explain plans for track_io_timing
Markus Winand writes: > The XML output of explain potentially outputs the XML tag names > "I/O-Write-Time" > and "I/O-Read-Time", which are invalid due to the slash. Ooops. > Although the patch fixes the problem for the moment, it is incomplete in that > sense that it continues to check against an incomplete black list. I guess > this is how it slipped in: XML explain was added in 9.0, I/O timings in 9.2. Yeah. The whitelist approach would look something like appendStringInfoChar(es->str, strchr(XMLCHARS, *s) ? *s : '-'); which would be quite a few more cycles than just testing for ' ' and '/'. So I'm not sure it's worth it. On the other hand, I have little faith that we wouldn't make a similar mistake in future. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indirect indexes
On Thu, Oct 20, 2016 at 12:14 PM, Petr Jelinek wrote: > WARM can do WARM update 50% of time, indirect index can do HOT update > 100% of time (provided the column is not changed), I don't see why we > could not have both solutions. > > That all being said, it would be interesting to hear Álvaro's thoughts > about which use-cases he expects indirect indexes to work better than WARM. I'm not Alvaro, but it's quite evident that indirect indexes don't need space on the same page to get the benefits of HOT update (even though it wouldn't be HOT). That's a big difference IMO. That said, WARM isn't inherently limited to 50%, but it *is* limited to HOT-like updates (new tuple is in the same page as the old), and since in many cases that is a limiting factor for HOT updates, one can expect WARM will be equally limited. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indirect indexes
On Thu, Oct 20, 2016 at 05:14:51PM +0200, Petr Jelinek wrote: > > Also, it seems indirect indexes would be useful for indexing columns > > that are not updated frequently on tables that are updated frequently, > > and whose primary key is not updated frequently. That's quite a logic > > problem for users to understand. > > > > Which covers like 99.9% of problematic cases I see on daily basis. > > And by that logic we should not have indexes at all, they are not > automatically created and user needs to think about if they need them or > not. Do you have to resort to extreme statements to make your point? The use of indexes is clear to most users, while the use of indirect indexes would not be, as I stated earlier. > Also helping user who does not have performance problem by 1% is very > different from helping user who has performance problem by 50% even if > she needs to think about the solution a bit. > > WARM can do WARM update 50% of time, indirect index can do HOT update > 100% of time (provided the column is not changed), I don't see why we > could not have both solutions. We don't know enough about the limits of WARM to say it is limited to 50%. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indirect indexes
On Thu, Oct 20, 2016 at 8:44 PM, Petr Jelinek wrote:
> WARM can do WARM update 50% of time, indirect index can do HOT update
> 100% of time (provided the column is not changed), I don't see why we
> could not have both solutions.

I think the reason why I restricted WARM to one update per chain also
applies to indirect indexes. For example, if an indirect column value is
changed from 'a' to 'b' and back to 'a', there will be two pointers from
'a' to the PK, and AFAICS that would lead to the same duplicate scan
issue.

We have a design to convert WARM chains back to HOT, and that will
increase the percentage of WARM updates much beyond 50%. I was waiting
for feedback on the basic patch before putting in more effort, but it
went unnoticed last CF.

> That all being said, it would be interesting to hear Álvaro's thoughts
> about which use-cases he expects indirect indexes to work better than
> WARM.

Yes, it will be interesting to see that comparison. Maybe we need both,
or maybe just one. Even better, maybe they complement each other. I'll
also put in some thoughts in this area.

Thanks,
Pavan

-- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development,
24x7 Support, Training & Services
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/20/2016 09:36 AM, Dilip Kumar wrote:
> On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote:
>> I agree with these conclusions. I had a chance to talk with Andres
>> this morning at Postgres Vision and based on that conversation I'd
>> like to suggest a couple of additional tests:
>>
>> 1. Repeat this test on x86. In particular, I think you should test on
>> the EnterpriseDB server cthulhu, which is an 8-socket x86 server.
>
> I have done my test on cthulhu, basic difference is that In POWER we
> saw ClogControlLock on top at 96 and more client with 300 scale
> factor. But, on cthulhu at 300 scale factor transactionid lock is
> always on top. So I repeated my test with 1000 scale factor as well on
> cthulhu. All configuration is same as my last test.
>
> Test with 1000 scale factor -
>
> Test1: number of clients: 192
>
> Head:
> tps = 21206.108856 (including connections establishing)
> tps = 21206.245441 (excluding connections establishing)
>
> [dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
> 310489 LWLockNamed   | CLogControlLock
> 296152               |
>  35537 Lock          | transactionid
>  15821 LWLockTranche | buffer_mapping
>  10342 LWLockTranche | buffer_content
>   8427 LWLockTranche | clog
>   3961
>   3165 Lock          | extend
>   2861 Lock          | tuple
>   2781 LWLockNamed   | ProcArrayLock
>   1104 LWLockNamed   | XidGenLock
>    745 LWLockTranche | lock_manager
>    371 LWLockNamed   | CheckpointerCommLock
>     70 LWLockTranche | wal_insert
>      5 BufferPin     | BufferPin
>      3 LWLockTranche | proc
>
> Patch:
> tps = 28725.038933 (including connections establishing)
> tps = 28725.367102 (excluding connections establishing)
>
> [dilip.kumar@cthulhu bin]$ cat 1000_192_ul.txt
> 540061               |
>  57810 LWLockNamed   | CLogControlLock
>  36264 LWLockTranche | buffer_mapping
>  29976 Lock          | transactionid
>   4770 Lock          | extend
>   4735 LWLockTranche | clog
>   4479 LWLockNamed   | ProcArrayLock
>   4006
>   3955 LWLockTranche | buffer_content
>   2505 LWLockTranche | lock_manager
>   2179 Lock          | tuple
>   1977 LWLockNamed   | XidGenLock
>    905 LWLockNamed   | CheckpointerCommLock
>    222 LWLockTranche | wal_insert
>      8 LWLockTranche | proc
>
> Test2: number of clients: 96
>
> Head:
> tps = 25447.861572 (including connections establishing)
> tps = 25448.012739 (excluding connections establishing)
>
> 261611               |
>  69604 LWLockNamed   | CLogControlLock
>   6119 Lock          | transactionid
>   4008
>   2874 LWLockTranche | buffer_mapping
>   2578 LWLockTranche | buffer_content
>   2355 LWLockNamed   | ProcArrayLock
>   1245 Lock          | extend
>   1168 LWLockTranche | clog
>    232 Lock          | tuple
>    217 LWLockNamed   | CheckpointerCommLock
>    160 LWLockNamed   | XidGenLock
>    158 LWLockTranche | lock_manager
>     78 LWLockTranche | wal_insert
>      5 BufferPin     | BufferPin
>
> Patch:
> tps = 32708.368938 (including connections establishing)
> tps = 32708.765989 (excluding connections establishing)
>
> [dilip.kumar@cthulhu bin]$ cat 1000_96_ul.txt
> 326601               |
>   7471 LWLockNamed   | CLogControlLock
>   5387 Lock          | transactionid
>   4018
>   3331 LWLockTranche | buffer_mapping
>   3144 LWLockNamed   | ProcArrayLock
>   1372 Lock          | extend
>    722 LWLockTranche | buffer_content
>    393 LWLockNamed   | XidGenLock
>    237 LWLockTranche | lock_manager
>    234 Lock          | tuple
>    194 LWLockTranche | clog
>     96 Lock          | relation
>     88 LWLockTranche | wal_insert
>     34 LWLockNamed   | CheckpointerCommLock
>
> Test3: number of clients: 64
>
> Head:
> tps = 28264.194438 (including connections establishing)
> tps = 28264.336270 (excluding connections establishing)
>
> 218264               |
>  10314 LWLockNamed   | CLogControlLock
>   4019
>   2067 Lock          | transactionid
>   1950 LWLockTranche | buffer_mapping
>   1879 LWLockNamed   | ProcArrayLock
>    592 Lock          | extend
>    565 LWLockTranche | buffer_content
>    222 LWLockNamed   | XidGenLock
>    143 LWLockTranche | clog
>    131 LWLockNamed   | CheckpointerCommLock
>     63 LWLockTranche | lock_manager
>     52 Lock          | tuple
>     35 LWLockTranche | wal_insert
>
> Patch:
> tps = 27906.376194 (including connections establishing)
> tps = 27906.531392 (excluding connections establishing)
>
> [dilip.kumar@cthulhu bin]$ cat 1000_64_ul.txt
> 228108               |
>   4039
>   2294 Lock          | transactionid
>   2116 LWLockTranche | buffer_mapping
>   1757 LWLockNamed   | ProcArrayLock
>   1553 LWLockNamed   | CLogControlLock
>    800 Lock          | extend
>    403 LWLockTranche | buffer_content
>     92 LWLockNamed   | XidGenLock
>     74 LWLockTranche | lock_manager
>     42 Lock          | tuple
>     35 LWLockTranche | wal_insert
>     34 LWLockTranche | clog
>     14 LWLockNamed
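Tallies like the 1000_*_ul.txt files above are per-wait-event sample counts. The thread doesn't show how they were collected; the usual approach (an assumption here) is to sample pg_stat_activity repeatedly and aggregate. The aggregation step itself, demonstrated on canned samples rather than a live server:

```shell
# Against a live server the input would come from repeated sampling, e.g.
# (assumed, not shown in the thread):
#   psql -Atc "SELECT wait_event_type || ' | ' || wait_event
#              FROM pg_stat_activity" >> samples.txt
# The aggregation into "count type | event" lines is just:
printf '%s\n' \
    'LWLockNamed | CLogControlLock' \
    'LWLockNamed | CLogControlLock' \
    'LWLockNamed | CLogControlLock' \
    'Lock | transactionid' \
    'Lock | transactionid' \
    'LWLockTranche | buffer_mapping' \
    | sort | uniq -c | sort -rn
```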
Re: [HACKERS] Indirect indexes
On 20/10/16 17:24, Bruce Momjian wrote:
> On Thu, Oct 20, 2016 at 05:14:51PM +0200, Petr Jelinek wrote:
>>> Also, it seems indirect indexes would be useful for indexing columns
>>> that are not updated frequently on tables that are updated frequently,
>>> and whose primary key is not updated frequently. That's quite a logic
>>> problem for users to understand.
>>
>> Which covers like 99.9% of problematic cases I see on daily basis.
>>
>> And by that logic we should not have indexes at all, they are not
>> automatically created and user needs to think about if they need them
>> or not.
>
> Do you have to resort to extreme statements to make your point? The use
> of indexes is clear to most users, while the use of indirect indexes
> would not be, as I stated earlier.

Not an extreme statement, just pointing out a flaw in that logic. People
need to understand the same limitation, for example, when using most
current trigger-based replication systems, as they don't support pkey
updates.

And no, many users don't know when to use indexes and which one is most
appropriate even though indexes have been here for decades. The fact
that some feature is not useful for everybody never stopped us from
adding it before, especially when it can be extremely useful to some.

-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7
Support, Training & Services
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Wed, Oct 12, 2016 at 10:22 PM, Michael Paquier wrote:
> OK. I can live with that as well. Attached are three patches. The
> pg_xlog -> pg_wal move, the pg_clog -> pg_transaction move, and the
> pg_clog -> pg_xact move. Only one survivor to be chosen among the last
> two ones.

Committed 0001.

To be honest, I don't really like either pg_transaction or pg_xact.
Neither name captures the fact that what we're really tracking here is
the transaction *status*. pg_xact is slightly worse because it's a poor
abbreviation for transaction, but I think the argument against even
pg_transaction is similar to the one Tom previously levied against
pg_logical - viz. "logical what?". The transactions themselves are not
stored in the directory, just the commit status. The fact that commit
status is saved is the source of the "c" in "clog".

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise
PostgreSQL Company
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar wrote: > On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote: >> I agree with these conclusions. I had a chance to talk with Andres >> this morning at Postgres Vision and based on that conversation I'd >> like to suggest a couple of additional tests: >> >> 1. Repeat this test on x86. In particular, I think you should test on >> the EnterpriseDB server cthulhu, which is an 8-socket x86 server. > > I have done my test on cthulhu, basic difference is that In POWER we > saw ClogControlLock on top at 96 and more client with 300 scale > factor. But, on cthulhu at 300 scale factor transactionid lock is > always on top. So I repeated my test with 1000 scale factor as well on > cthulhu. So the upshot appears to be that this problem is a lot worse on power2 than cthulhu, which suggests that this is architecture-dependent. I guess it could also be kernel-dependent, but it doesn't seem likely, because: power2: Red Hat Enterprise Linux Server release 7.1 (Maipo), 3.10.0-229.14.1.ael7b.ppc64le cthulhu: CentOS Linux release 7.2.1511 (Core), 3.10.0-229.7.2.el7.x86_64 So here's my theory. The whole reason why Tomas is having difficulty seeing any big effect from these patches is because he's testing on x86. When Dilip tests on x86, he doesn't see a big effect either, regardless of workload. But when Dilip tests on POWER, which I think is where he's mostly been testing, he sees a huge effect, because for some reason POWER has major problems with this lock that don't exist on x86. If that's so, then we ought to be able to reproduce the big gains on hydra, a community POWER server. In fact, I think I'll go run a quick test over there right now... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
Robert Haas writes: > One idea would be to rename pg_resetxlog to pg_resetwal. I think > that's actually an improvement. This would fit in as part of a general plan to s/xlog/wal/g throughout our user-visible names and documentation. Which seems like a good idea to me; there's no need to expose two different terms for the same thing. (I don't feel a great need to unify the terminology in the code, though. Just in stuff users see.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Indirect indexes
On Thu, Oct 20, 2016 at 12:30 PM, Pavan Deolasee wrote: > > > On Thu, Oct 20, 2016 at 8:44 PM, Petr Jelinek wrote: >> >> >> >> WARM can do WARM update 50% of time, indirect index can do HOT update >> 100% of time (provided the column is not changed), I don't see why we >> could not have both solutions. >> > > I think the reason why I restricted WARM to one update per chain, also > applies to indirect index. For example, if a indirect column value is > changed from 'a' to 'b' and back to 'a', there will be two pointers from 'a' > to the PK and AFAICS that would lead to the same duplicate scan issue. > > We have a design to convert WARM chains back to HOT and that will increase > the percentage of WARM updates much beyond 50%. I was waiting for feedback > on the basic patch before putting in more efforts, but it went unnoticed > last CF. With indirect indexes, since you don't need to insert a tid, you can just "insert on conflict do nothing" on the index. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove autovacuum GUC?
On 10/20/2016 07:12 AM, Robert Haas wrote:
> On Wed, Oct 19, 2016 at 9:24 PM, Joshua D. Drake wrote:
>
> Setting autovacuum=off is at least useful for testing purposes and I've
> used it that way. On the other hand, I haven't seen a customer disable
> this unintentionally in years. Generally, the customers I've worked
> with have found subtler ways of hosing themselves with autovacuum. One
> of my personal favorites is autovacuum_naptime='1 d' -- for the record,
> that did indeed work out very poorly.

Yes, I have seen that as well and you are right, it ends poorly.

> I think that this is the kind of problem that can only properly be
> solved by education. If somebody thinks that they want to turn off
> autovacuum, and you keep them from turning it off, they just get
> frustrated. Sometimes, they then find a back-door way of getting what

I think I am coming at this from a different perspective than the
-hackers. Let me put this another way: the right answer isn't the answer
founded in the reality for many, if not most, of our users.

What do I mean by that? I mean that the right answer for -hackers isn't
necessarily the right answer for users. Testing? Users don't test. They
deploy. Education? If most people read the docs, CMD and a host of other
companies would be out of business.

I am not saying I have the right solution, but I am saying I think we
need a *different* solution. Something that limits a *USER'S* choice to
turn off autovacuum. If -hackers need testing or enterprise developers
need testing, let's account for that, but for the user that says this:

"My machine/instance bogs down every time autovacuum runs, oh, I can
turn it off"

Let's fix *that* problem.

Sincerely,

JD

-- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
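For reference, the naptime foot-gun Robert describes versus a gentler alternative, as a postgresql.conf sketch (the values shown are illustrative, not recommendations):

```
# The foot-gun from above: a one-day naptime lets bloat and wraparound
# pressure build up between autovacuum cycles.
#autovacuum_naptime = '1 d'        # worked out very poorly

# Default wakeup interval:
autovacuum_naptime = 1min

# If autovacuum "bogs down" the machine, throttling it via cost-based
# delay is usually a better answer than disabling it:
autovacuum_vacuum_cost_delay = 20ms
autovacuum_vacuum_cost_limit = 200
```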
Re: [HACKERS] Disable autovacuum guc?
On 10/20/2016 06:34 AM, Joshua D. Drake wrote: > On 10/19/2016 07:22 PM, Josh Berkus wrote: >> On 10/19/2016 06:27 PM, Joshua D. Drake wrote: >>> Hello, >>> >>> After all these years, we are still regularly running into people who >>> say, "performance was bad so we disabled autovacuum". I am not talking >>> about once in a while, it is often. I would like us to consider removing >>> the autovacuum option. Here are a few reasons: >>> >>> 1. It does not hurt anyone >>> 2. It removes a foot gun >>> 3. Autovacuum is *not* optional, we shouldn't let it be >>> 4. People could still disable it at the table level for those tables >>> that do fall into the small window of, no maintenance is o.k. >>> 5. People would still have the ability to decrease the max_workers to 1 >>> (although I could argue about that too). >> >> People who run data warehouses where all of the data comes in as batch >> loads regularly disable autovacuum, and should do so. For the DW/batch >> load use-case, it makes far more sense to do batch loads interspersed >> with ANALYZEs and VACUUMS of loaded/updated tables. > > Hrm, true although that is by far a minority of our users. What if we > made it so we disabled the autovacuum guc but made it so you could > disable autovacuum per database (ALTER DATABASE SET or something such > thing?). Well, that wouldn't fix the problem; people would just disable it per database, even if it was a bad idea. If I can't get rid of vacuum_defer_cleanup_age, you're not going to be able to get rid of autovacuum. Now, if you want to "fix" this issue, one thing which would help a lot is making it possible to disable/enable Autovacuum on one table without exclusive locking (for the ALTER statement), and create a how-to doc somewhere which explains how and why to disable autovac per-table. Most of our users don't know that it's even possible to adjust this per-table. 
-- Josh Berkus Red Hat OSAS (any opinions are my own)
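The per-table switch Josh refers to uses storage parameters; a sketch (the table name is hypothetical):

```sql
-- Disable autovacuum for one batch-loaded table only...
ALTER TABLE staging_events SET (autovacuum_enabled = off);

-- ...and later re-enable it, or tune it instead of disabling:
ALTER TABLE staging_events SET (autovacuum_enabled = on);
ALTER TABLE staging_events SET (autovacuum_vacuum_scale_factor = 0.05);
```

As Josh notes, these ALTER TABLE forms take an exclusive lock on the table, which is part of why per-table tuning is underused.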
Re: [HACKERS] Disable autovacuum guc?
On 10/20/2016 08:54 AM, Josh Berkus wrote:
> On 10/20/2016 06:34 AM, Joshua D. Drake wrote:
>> Hrm, true although that is by far a minority of our users. What if we
>> made it so we disabled the autovacuum guc but made it so you could
>> disable autovacuum per database (ALTER DATABASE SET or something such
>> thing?).
>
> Well, that wouldn't fix the problem; people would just disable it per
> database, even if it was a bad idea.

I doubt this very much. It requires a different level of sophistication.
General users (not you, not me, and certainly not Haas or Lane) don't
run anything but an application backed by an ORM. They understand a conf
file but they aren't going to touch anything they consider "underneath".

Sincerely,

JD

-- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] Renaming of pg_xlog and pg_clog
* Robert Haas (robertmh...@gmail.com) wrote:
> On Wed, Oct 12, 2016 at 10:22 PM, Michael Paquier wrote:
>> OK. I can live with that as well. Attached are three patches. The
>> pg_xlog -> pg_wal move, the pg_clog -> pg_transaction move, and the
>> pg_clog -> pg_xact move. Only one survivor to be chosen among the last
>> two ones.
>
> Committed 0001.

Glad to see that happen, finally.

> To be honest, I don't really like either pg_transaction or pg_xact.
> Neither name captures the fact that what we're really tracking here is
> the transaction *status*. pg_xact is slightly worse because it's a
> poor abbreviation for transaction, but I think the argument against
> even pg_transaction is similar to the one Tom previously levied
> against pg_logical - viz. "logical what?". The transaction themselves
> are not stored in the directory, just the commit status. The fact
> that commit status is saved is the source of the "c" in "clog".

This really needs to move forward also.

When it comes to the name, I tend to think of 'pg_xact' as saying "this
is where we persist info we need to keep about transactions." Today
that's just the commit status info, but I could imagine that there
might, someday, be other things that go in there. "pg_multixact" is an
example of something quite similar but does have more than just one
"thing." Also, using "pg_xact" and then "pg_subxact" and "pg_multixact"
brings them all under one consistent naming scheme.

Thanks!

Stephen
Re: [HACKERS] Indirect indexes
On Thu, Oct 20, 2016 at 9:20 PM, Claudio Freire wrote:
> With indirect indexes, since you don't need to insert a tid, you can
> just "insert on conflict do nothing" on the index.

Would that work with non-unique indexes?

Anyway, the point I was trying to make is that there are similar
technical challenges, and we could solve it for WARM as well with your
work for finding conflicting pairs and then not doing inserts. My
thinking currently is that it will lead to other challenges, especially
around vacuum, but I could be wrong.

What I tried to do with the initial WARM patch is to show significant
improvement even with just 50% WARM updates, yet keep the patch simple.
But there are of course several things we can do to improve it further
and support other index types.

Thanks,
Pavan

-- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development,
24x7 Support, Training & Services
Re: [HACKERS] Renaming of pg_xlog and pg_clog
* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas writes:
>> One idea would be to rename pg_resetxlog to pg_resetwal. I think
>> that's actually an improvement.
>
> This would fit in as part of a general plan to s/xlog/wal/g throughout
> our user-visible names and documentation. Which seems like a good idea
> to me; there's no need to expose two different terms for the same
> thing.
>
> (I don't feel a great need to unify the terminology in the code,
> though. Just in stuff users see.)

+1 on the general change of xlog -> wal.

That said, I'd also like to see a --force or similar option or mechanism
put in place to reduce the risk of users trashing their system because
they think pg_resetwal is "safe." ("It's just gonna reset things to make
the database start again, should be fine.")

pg_destroydb almost seems like a better choice, though I suppose
'pg_clearwal' would be more acceptable. Doesn't have quite the same
impact though.

Not sure on the best answer here, but it's definitely a foot-gun that
some users have ended up using on themselves with depressing regularity.

Thanks!

Stephen
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On 10/20/2016 09:12 AM, Stephen Frost wrote:
> That said, I'd also like to see a --force or similar option or
> mechanism put in place to reduce the risk of users trashing their
> system because they think pg_resetwal is "safe." ("It's just gonna
> reset things to make the database start again, should be fine.")
>
> pg_destroydb almost seems like a better choice, though I suppose
> 'pg_clearwal' would be more acceptable. Doesn't have quite the same
> impact though.

pg_dropwal

Users won't *drop* things they shouldn't on purpose (usually), but they
will reset and will clear them. Destroydb isn't any more accurate
because it doesn't destroy it. Instead it makes it so I can log in again
and see my data. (Yes, we all know the real implications with it, but
from a DUH user perspective...)

Sincerely,

JD

-- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.
Re: [HACKERS] Remove autovacuum GUC?
On Thu, Oct 20, 2016 at 11:53 AM, Joshua D. Drake wrote: > The right answer isn't the answer founded in the reality for many if not > most of our users. I think that's high-handed nonsense. Sure, there are some unsophisticated users who do incredibly stupid things and pay the price for it, but there are many users who are very sophisticated and make good decisions and don't want or need the system itself to act as a nanny. When we constrain the range of possible choices because somebody might do the wrong thing, sophisticated users are hurt because they can no longer make meaningful choices, and stupid users still get in trouble because that's what being stupid does. The only way to fix that is to help people be less stupid. You can't tell adult users running enterprise-grade software "I'm sorry, Dave, I can't do that". Or at least it can cause a few problems. > I mean that the right answer for -hackers isn't necessarily the right answer > for users. Testing? Users don't test. They deploy. Education? If most people > read the docs, CMD and a host of other companies would be out of business. I've run into these kinds of situations, but I know for a fact that there are quite a few EnterpriseDB customers who test extremely thoroughly, read the documentation in depth, and really understand the system at a very deep level. I can't say whether the majority of our customers fall into that category, but we certainly spend a lot of time working with the ones who do. > I am not saying I have the right solution but I am saying I think we need a > *different* solution. Something that limits a *USERS* choice to turn off > autovacuum. If -hackers need testing or enterprise developers need testing, > let's account for that but for the user that says this: > > My machine/instance bogs down every time autovacuum runs, oh I can turn it > off And I've seen customers solve real production problems by doing exactly that, and scheduling vacuums manually. 
Have I seen people cause bigger problems than the ones they were trying to solve? Yes. Have I recommended something a little less aggressive than completely shutting autovacuum off as perhaps being a better solution? Of course. But I'm not going to sit here and tell somebody "well, you know, what you are doing is working whereas the old thing was not working, but trust me, the way that didn't work was way better...". > Let's fix *that* problem. I will say again that I do not think that problem has a technical solution. It is a problem of knowledge, not technology. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove autovacuum GUC?
On Thu, Oct 20, 2016 at 9:24 AM, Robert Haas wrote: > On Thu, Oct 20, 2016 at 11:53 AM, Joshua D. Drake > wrote: >> The right answer isn't the answer founded in the reality for many if not >> most of our users. > > I think that's high-handed nonsense. Sure, there are some > unsophisticated users who do incredibly stupid things and pay the > price for it, but there are many users who are very sophisticated and > make good decisions and don't want or need the system itself to act as > a nanny. When we constrain the range of possible choices because > somebody might do the wrong thing, sophisticated users are hurt > because they can no longer make meaningful choices, and stupid users > still get in trouble because that's what being stupid does. The only > way to fix that is to help people be less stupid. You can't tell > adult users running enterprise-grade software "I'm sorry, Dave, I > can't do that". Or at least it can cause a few problems. +1 I really don't like this paternalistic mindset. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove autovacuum GUC?
On 10/20/2016 09:24 AM, Robert Haas wrote: On Thu, Oct 20, 2016 at 11:53 AM, Joshua D. Drake wrote: The right answer isn't the answer founded in the reality for many if not most of our users. I think that's high-handed nonsense. Sure, there are some unsophisticated users who do incredibly stupid things and pay the price for it, but there are many users who are very sophisticated and make good decisions and don't want or need the system itself to act as a nanny. When we constrain the range of possible choices because That argument suggests we shouldn't have autovacuum :P somebody might do the wrong thing, sophisticated users are hurt because they can no longer make meaningful choices, and stupid users still get in trouble because that's what being stupid does. The only way to fix that is to help people be less stupid. You can't tell adult users running enterprise-grade software "I'm sorry, Dave, I can't do that". Or at least it can cause a few problems. As mentioned in an other reply, I am not suggesting we can't turn off autovacuum as a whole. Heck, I even suggested being able to turn it off per database (versus just per table). I am suggesting that we get rid of a foot gun for unsophisticated (and thus majority of our users). I mean that the right answer for -hackers isn't necessarily the right answer for users. Testing? Users don't test. They deploy. Education? If most people read the docs, CMD and a host of other companies would be out of business. I've run into these kinds of situations, but I know for a fact that there are quite a few EnterpriseDB customers who test extremely thoroughly, read the documentation in depth, and really understand the system at a very deep level. Those aren't exactly the users we are talking about are we? I also run into those users all the time. And I've seen customers solve real production problems by doing exactly that, and scheduling vacuums manually. 
> Have I seen people

1 != 10

> cause bigger problems than the ones they were trying to solve? Yes. Have I
> recommended something a little less aggressive than completely shutting
> autovacuum off as perhaps being a better solution? Of course. But I'm not
> going to sit here and tell somebody "well, you know, what you are doing is
> working whereas the old thing was not working, but trust me, the way that
> didn't work was way better...".

I find it interesting that we are willing to do that every time we add a feature but once we have that feature it is like pulling teeth to show the people that implemented those features that some people don't think it was better :P

Seriously though. I am only speaking from experience from 20 years of customers. CMD also has customers just like yours but we also run into lots and lots of people that still do really silly things like we have already discussed.

Sincerely, Joshua D. Drake -- Command Prompt, Inc. http://the.postgres.company/ +1-503-667-4564 PostgreSQL Centered full stack support, consulting and development. Everyone appreciates your honesty, until you are honest with them. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 12:05 PM, Stephen Frost wrote: >> To be honest, I don't really like either pg_transaction or pg_xact. > >> Neither name captures the fact that what we're really tracking here is >> the transaction *status*. pg_xact is slightly worse because it's a >> poor abbreviation for transaction, but I think the argument against >> even pg_transaction is similar to the one Tom previously levied >> against pg_logical - viz. "logical what?". The transactions themselves >> are not stored in the directory, just the commit status. The fact >> that commit status is saved is the source of the "c" in "clog". > > This really needs to move forward also. > > When it comes to the name, I tend to think of 'pg_xact' as saying "this > is where we persist info we need to keep about transactions." Today > that's just the commit status info, but I could imagine that there > might, someday, be other things that go in there. "pg_multixact" is > an example of something quite similar but does have more than just one > "thing." Also, using "pg_xact" and then "pg_subxact" and "pg_multixact" > bring them all under one consistent naming scheme. I don't dispute the fact that you tend to think of it that way, but I think it's a real stretch to say that "pg_xact" is a clear name from the point of view of the uninitiated. Now, maybe the point is to be a little bit deliberately unclear, but "xact" for "transaction" is not a lot better than "xlog" for "write-ahead log". It's just arbitrary abbreviations we made up and you either know what they mean or you don't. We could call it "pg_xkcd" and we wouldn't be removing much in the way of clarity. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 12:12 PM, Stephen Frost wrote: > That said, I'd also like to see a --force or similar option or mechanism > put in place to reduce the risk of users trashing their system because > they think pg_resetwal is "safe." ("It's just gonna reset things to make > the database start again, should be fine."). You know we already have that, right? > pg_destroydb almost seems like a better choice, though I suppose > 'pg_clearwal' would be more acceptable. Doesn't have quite the same > impact though. > > Not sure on the best answer here, but it's definitely a foot-gun that some > users have ended up using on themselves with depressing regularity. Just to provide some perspective from the other side of this, I handled a severity-1 escalation a few weeks ago where a user basically called up and explained their situation and I told them based on all of the facts and circumstances presented that running pg_resetxlog was going to be the best way forward and they said that was pretty much what they were expecting to hear, and did so. I think there's far too much anti-pg_resetxlog bias on this list. The situation where you need to use it is not common in general, but it's pretty common in support cases that reach me. I don't think I'd be exaggerating if I said that 25% of the customer escalations I handle personally require the use of pg_resetxlog. Of course, that's because by the time the ticket gets to me, it's typically the case that all of the easy, neat solutions have already been ruled out. My experience is that users who are faced with this situation typically understand that they are in a bad situation and when I explain to them --- in a clear and direct way but without the sort of hysterical exaggeration sometimes seen on this mailing list --- what the consequences are, they are not surprised by those consequences and they are willing to accept them because they know that the alternatives are worse.
For example, consider a customer with a mostly read-only 20TB database. If that user runs pg_resetxlog, they can be back up in 5 minutes and it is likely (though of course not certain) that corruption will be minimal. If they restore from backup, they will be down for many more hours. After pg_resetxlog, they can restore from backup on another system or do the dump-and-restore which I invariably recommend while they are up. For many customers, this is a good trade-off. The decision, because of the risks associated with it, is usually passed up from the DBA I'm working with to some business leader. If that business leader feels that the benefits of being up immediately rather than many hours later are sufficient to justify the risk of a possibly-corrupted database, running pg_resetxlog is a perfectly reasonable decision. I have not yet had one DBA or business leader call back and say "hey, that thing where we ran pg_resetxlog? you should have discouraged us more, because that was a disaster". Now maybe those disasters have happened and I am just not aware of them, but I think we should all take a deep breath and remind ourselves that the database exists in service to the business and not the other way around. My job -- when doing support -- is to tell you what your options are and what consequences they may have, good and bad -- not to tell you how to run your company. Telling users that pg_resetxlog will "destroy your database" is over-the-top hyperbole. It's a sharp tool with risks and benefits and it deserves to be presented exactly that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove autovacuum GUC?
Hello, What about a simpler solution to all of this? Let's just remove it from postgresql.conf. Out of sight. If someone needs to test they can, but an uneducated user won't immediately know what to do about that "autovacuum process", and when they look it up the documentation is exceedingly blunt about why to *not* turn it off. Sincerely, JD -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Index Scans
On Wed, Oct 19, 2016 at 11:07 PM, Amit Kapila wrote: >> Ideally, the parallel_workers storage parameter will rarely be >> necessary because the optimizer will generally do the right thing in >> all cases. > > Yeah, we can choose not to provide any parameter for parallel index > scans, but some users might want to have a parameter similar to > parallel table scans, so it could be handy for them to use. I think the parallel_workers reloption should override the degree of parallelism for any sort of parallel scan on that table. Had I intended it to apply only to sequential scans, I would have named it differently. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Remove autovacuum GUC?
On Thu, Oct 20, 2016 at 12:32 PM, Joshua D. Drake wrote: > That argument suggests we shouldn't have autovacuum :P It certainly does not. That, too, would be removing a useful option. In fact, it would be removing the most useful option that is the right choice for most users in 99% of cases. > As mentioned in an other reply, I am not suggesting we can't turn off > autovacuum as a whole. Heck, I even suggested being able to turn it off per > database (versus just per table). I am suggesting that we get rid of a foot > gun for unsophisticated (and thus majority of our users). It has to be possible to shut it off in postgresql.conf, before starting the server. Anything per-database wouldn't have that characteristic. >> I've run into these kinds of situations, but I know for a fact that >> there are quite a few EnterpriseDB customers who test extremely >> thoroughly, read the documentation in depth, and really understand the >> system at a very deep level. > > Those aren't exactly the users we are talking about are we? I also run into > those users all the time. Well, we can't very well ship the autovacuum option only to the smart customers and remove it for the dumb ones, can we? The option either exists or it doesn't. > 1 != 10 I concede the truth of that statement, but not whatever it is you intend to imply thereby. > I find it interesting that we are willing to do that every time we add a > feature but once we have that feature it is like pulling teeth to show the > people that implemented those features that some people don't think it was > better :P Well, we don't allow much dumb stuff to get added in the first place, so there isn't much to take out. I'm not trying to argue that we're perfect here. There are certainly changes I'd like to make that other people oppose and, well, I think they are wrong. And I know I'm wrong about some things, too: I just don't know which things, or I'd change my mind about just those. > Seriously though. 
> I am only speaking from experience from 20 years of customers. CMD also
> has customers just like yours but we also run into lots and lots of people
> that still do really silly things like we have already discussed.

I think it would be better to confine this thread to the specific issue of whether removing the autovacuum GUC is a good idea rather than turning it into a referendum on whether you are an experienced PostgreSQL professional. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] LLVM Address Sanitizer (ASAN) and valgrind support
On Oct 20, 2016 5:27 PM, "Noah Misch" wrote: > > On Wed, Oct 19, 2016 at 11:08:39AM +0100, Greg Stark wrote: > > > The MEMPOOL_FREE doesn't take any size argument and mcxt.c doesn't > > have convenient access to a size argument. It could call the > > GetChunkSpace method but that will include the allocation overhead and > > That is indeed a problem for making VALGRIND_MEMPOOL_FREE() an alias of > VALGRIND_MAKE_MEM_NOACCESS() under ASAN as I suggested. Calling > GetMemoryChunkSpace() in the macro would cause memdebug.h to embed an > assumption of mcxt.c, which is messy. Including the allocation overhead is > fine, though. I think the way out is to simply have aset.c make the memory undefined/noaccess even if it's redundant under valgrind. It's a bit unfortunate that the macros would have different semantics under the two. If it's too confusing or we're worried about the performance overhead we could make a MAKE_MEM_{UNDEFINED,NOACCESS}_IF_NO_MEMPOOL() but I don't think it's worth it myself. > > in any case isn't this memory already marked noaccess by aset.c? > > Only sometimes, when AllocSetFree() happens to call free() or wipe_mem(). I think if I did further surgery here I would rename wipe_mem and randomise_mem and make them responsible for making the memory undefined and noaccess as well. They would always be defined so that would get rid of all the ifdefs except within those functions. I have a patch working under asan on both gcc and clang. That handles noaccess but not undefined memory reads. I need to try msan to make sure the macro definitions work well for that API too. There are a couple build oddities both with gcc and clang before I can commit anything though. And I can't test valgrind to be sure the redundant calls aren't causing a problem.
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 12:29:47PM -0400, Robert Haas wrote: > > When it comes to the name, I tend to think of 'pg_xact' as saying "this > > is where we persist info we need to keep about transactions." Today > > that's just the commit status info, but I could imagine that there > > might, someday, be other things that go in there. "pg_multixact" is > > an example of something quite similar but does have more than just one > > "thing." Also, using "pg_xact" and then "pg_subxact" and "pg_multixact" > > bring them all under one consistent naming scheme. > > I don't dispute the fact that you tend to think of it that way, but I > think it's a real stretch to say that "pg_xact" is a clear name from > the point of view of the uninitiated. Now, maybe the point is to be a > little bit deliberately unclear, but "xact" for "transaction" is not a > lot better than "xlog" for "write-ahead log". It's just arbitrary > abbreviations we made up and you either know what they mean or you > don't. We could call it "pg_xkcd" and we wouldn't be removing much in > the way of clarity. What is your suggestion for a name? If you have none, I suggest we use "pg_xact". -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas wrote: > On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar wrote: >> On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote: >>> I agree with these conclusions. I had a chance to talk with Andres >>> this morning at Postgres Vision and based on that conversation I'd >>> like to suggest a couple of additional tests: >>> >>> 1. Repeat this test on x86. In particular, I think you should test on >>> the EnterpriseDB server cthulhu, which is an 8-socket x86 server. >> >> I have done my test on cthulhu, basic difference is that In POWER we >> saw ClogControlLock on top at 96 and more client with 300 scale >> factor. But, on cthulhu at 300 scale factor transactionid lock is >> always on top. So I repeated my test with 1000 scale factor as well on >> cthulhu. > > So the upshot appears to be that this problem is a lot worse on power2 > than cthulhu, which suggests that this is architecture-dependent. I > guess it could also be kernel-dependent, but it doesn't seem likely, > because: > > power2: Red Hat Enterprise Linux Server release 7.1 (Maipo), > 3.10.0-229.14.1.ael7b.ppc64le > cthulhu: CentOS Linux release 7.2.1511 (Core), 3.10.0-229.7.2.el7.x86_64 > > So here's my theory. The whole reason why Tomas is having difficulty > seeing any big effect from these patches is because he's testing on > x86. When Dilip tests on x86, he doesn't see a big effect either, > regardless of workload. But when Dilip tests on POWER, which I think > is where he's mostly been testing, he sees a huge effect, because for > some reason POWER has major problems with this lock that don't exist > on x86. > > If that's so, then we ought to be able to reproduce the big gains on > hydra, a community POWER server. In fact, I think I'll go run a quick > test over there right now... And ... nope. 
I ran a 30-minute pgbench test on unpatched master using unlogged tables at scale factor 300 with 64 clients and got these results:

     14 LWLockTranche | wal_insert
     36 LWLockTranche | lock_manager
     45 LWLockTranche | buffer_content
    223 Lock          | tuple
    527 LWLockNamed   | CLogControlLock
    921 Lock          | extend
   1195 LWLockNamed   | XidGenLock
   1248 LWLockNamed   | ProcArrayLock
   3349 Lock          | transactionid
  85957 Client        | ClientRead
 135935               |

I then started a run at 96 clients which I accidentally killed shortly before it was scheduled to finish, but the results are not much different; there is no hint of the runaway CLogControlLock contention that Dilip sees on power2. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 1:39 PM, Bruce Momjian wrote: > On Thu, Oct 20, 2016 at 12:29:47PM -0400, Robert Haas wrote: >> > When it comes to the name, I tend to think of 'pg_xact' as saying "this >> > is where we persist info we need to keep about transactions." Today >> > that's just the commit status info, but I could imagine that there >> > might, someday, be other things that go in there. "pg_multixact" is >> > an example of something quite similar but does have more than just one >> > "thing." Also, using "pg_xact" and then "pg_subxact" and "pg_multixact" >> > bring them all under one consistent naming scheme. >> >> I don't dispute the fact that you tend to think of it that way, but I >> think it's a real stretch to say that "pg_xact" is a clear name from >> the point of view of the uninitiated. Now, maybe the point is to be a >> little bit deliberately unclear, but "xact" for "transaction" is not a >> lot better than "xlog" for "write-ahead log". It's just arbitrary >> abbreviations we made up and you either know what they mean or you >> don't. We could call it "pg_xkcd" and we wouldn't be removing much in >> the way of clarity. > > What is your suggestion for a name? If you have none, I suggest we use > "pg_xact". I'm not sure. pg_transaction_status would be clear, but it's long. Is pg_xact actually better than pg_clog? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
Robert Haas writes: > Is pg_xact actually better than pg_clog? Yes, because it doesn't contain the three letters "log". We have the two precedents "pg_subtrans" and "pg_multixact", so unless we want to get into renaming those too, I think "pg_trans" and "pg_xact" are really the only options worth considering. Personally I'd go for "pg_trans", but it's only a weak preference. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane wrote: > Robert Haas writes: >> Is pg_xact actually better than pg_clog? > > Yes, because it doesn't contain the three letters "log". I figured somebody was going to say that. > We have the two precedents "pg_subtrans" and "pg_multixact", so > unless we want to get into renaming those too, I think "pg_trans" > and "pg_xact" are really the only options worth considering. > > Personally I'd go for "pg_trans", but it's only a weak preference. Heaven forfend we actually use enough characters to make it self-documenting. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
Robert Haas writes: > On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane wrote: >> We have the two precedents "pg_subtrans" and "pg_multixact", so >> unless we want to get into renaming those too, I think "pg_trans" >> and "pg_xact" are really the only options worth considering. >> >> Personally I'd go for "pg_trans", but it's only a weak preference. > Heaven forfend we actually use enough characters to make it self-documenting.

$ ls $PGDATA
PG_VERSION     pg_dynshmem/   pg_notify/     pg_stat_tmp/   postgresql.auto.conf
base/          pg_hba.conf    pg_replslot/   pg_subtrans/   postgresql.conf
global/        pg_ident.conf  pg_serial/     pg_tblspc/     postmaster.opts
pg_clog/       pg_logical/    pg_snapshots/  pg_twophase/   postmaster.pid
pg_commit_ts/  pg_multixact/  pg_stat/       pg_wal/

I don't see one single one of those subdirectory names that I'd call self-documenting. Are you proposing we rename them all with carpal-tunnel-syndrome-promoting names? There's certainly some case to be made for renaming at least one of "pg_subtrans" and "pg_multixact" so that these three similarly-purposed subdirectories can all have similar names. But I think on the whole that's (a) fixing what ain't broken, and (b) making it even more unlikely that we'll ever get to consensus on changing anything. We've managed to agree that we need to change the names ending in "log"; let's do that and be happy that we've removed one foot-gun from the system. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 02:02:27PM -0400, Robert Haas wrote: > On Thu, Oct 20, 2016 at 1:39 PM, Bruce Momjian wrote: > > On Thu, Oct 20, 2016 at 12:29:47PM -0400, Robert Haas wrote: > >> > When it comes to the name, I tend to think of 'pg_xact' as saying "this > >> > is where we persist info we need to keep about transactions." Today > >> > that's just the commit status info, but I could imagine that there > >> > might, someday, be other things that go in there. "pg_multixact" is > >> > an example of something quite similar but does have more than just one > >> > "thing." Also, using "pg_xact" and then "pg_subxact" and "pg_multixact" > >> > bring them all under one consistent naming scheme. > >> > >> I don't dispute the fact that you tend to think of it that way, but I > >> think it's a real stretch to say that "pg_xact" is a clear name from > >> the point of view of the uninitiated. Now, maybe the point is to be a > >> little bit deliberately unclear, but "xact" for "transaction" is not a > >> lot better than "xlog" for "write-ahead log". It's just arbitrary > >> abbreviations we made up and you either know what they mean or you > >> don't. We could call it "pg_xkcd" and we wouldn't be removing much in > >> the way of clarity. > > > > What is your suggestion for a name? If you have none, I suggest we use > > "pg_xact". > > I'm not sure. pg_transaction_status would be clear, but it's long. > Is pg_xact actually better than pg_clog? Uh, yeah, no "log". Wasn't that the point? -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription + -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 2:23 PM, Tom Lane wrote: > Robert Haas writes: >> On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane wrote: >>> We have the two precedents "pg_subtrans" and "pg_multixact", so >>> unless we want to get into renaming those too, I think "pg_trans" >>> and "pg_xact" are really the only options worth considering. >>> >>> Personally I'd go for "pg_trans", but it's only a weak preference. > >> Heaven forfend we actually use enough characters to make it self-documenting.
>
> $ ls $PGDATA
> PG_VERSION     pg_dynshmem/   pg_notify/     pg_stat_tmp/   postgresql.auto.conf
> base/          pg_hba.conf    pg_replslot/   pg_subtrans/   postgresql.conf
> global/        pg_ident.conf  pg_serial/     pg_tblspc/     postmaster.opts
> pg_clog/       pg_logical/    pg_snapshots/  pg_twophase/   postmaster.pid
> pg_commit_ts/  pg_multixact/  pg_stat/       pg_wal/
>
> I don't see one single one of those subdirectory names that I'd call > self-documenting. Are you proposing we rename them all with carpal-tunnel-syndrome-promoting names? No. Are you proposing that self-documenting names are a bad thing rather than a good thing? > There's certainly some case to be made for renaming at least one of > "pg_subtrans" and "pg_multixact" so that these three similarly-purposed > subdirectories can all have similar names. But I think on the whole > that's (a) fixing what ain't broken, and (b) making it even more unlikely > that we'll ever get to consensus on changing anything. We've managed to > agree that we need to change the names ending in "log"; let's do that > and be happy that we've removed one foot-gun from the system. I agree that there is an overwhelming consensus in favor of getting "log" out of the names, but I do not agree that the only two possible alternative names are "pg_trans" and "pg_xact", which strikes me as being akin to saying that the only two options for dinner tonight are overripe peaches and lunch meats a week past the sell-by date.
If I had to pick only between those two, I suppose I'd go with "pg_xact" which is clear enough to someone familiar with PostgreSQL internals, as opposed to "pg_trans" which I don't think will be clear even to those people. But I don't like either one very much. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure wrote: > On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian wrote: >> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote: >>> > Yeah. Believe me -- I know the drill. Most or all the damage seemed >>> > to be to the system catalogs with at least two critical tables dropped >>> > or inaccessible in some fashion. A lot of the OIDs seemed to be >>> > pointing at the wrong thing. Couple more datapoints here. >>> > >>> > *) This database is OLTP, doing ~ 20 tps avg (but very bursty) >>> > *) Another database on the same cluster was not impacted. However >>> > it's more olap style and may not have been written to during the >>> > outage >>> > >>> > Now, this infrastructure running this system is running maybe 100ish >>> > postgres clusters and maybe 1000ish sql server instances with >>> > approximately zero unexplained data corruption issues in the 5 years >>> > I've been here. Having said that, this definitely smells and feels >>> > like something on the infrastructure side. I'll follow up if I have >>> > any useful info. >>> >>> After a thorough investigation I now have credible evidence the source >>> of the damage did not originate from the database itself. >>> Specifically, this database is mounted on the same volume as the >>> operating system (I know, I know) and something non database driven >>> sucked up disk space very rapidly and exhausted the volume -- fast >>> enough that sar didn't pick it up. Oh well :-) -- thanks for the help >> >> However, disk space exhaustion should not lead to corruption unless the >> underlying layers lied in some way. > > I agree -- however I'm sufficiently separated from the things doing > the things that I can't verify that in any real way. In the meantime > I'm going to take standard precautions (enable checksums/dedicated > volume/replication). Low disk space also does not explain the bizarre > outage I had last friday. ok, data corruption struck again. 
This time disk space is ruled out, and access to the database is completely denied:

postgres=# \c castaging
WARNING: leaking still-referenced relcache entry for "pg_index_indexrelid_index"

merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 02:23:32PM -0400, Tom Lane wrote: > Robert Haas writes: > > On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane wrote: > >> We have the two precedents "pg_subtrans" and "pg_multixact", so > >> unless we want to get into renaming those too, I think "pg_trans" > >> and "pg_xact" are really the only options worth considering. > >> > >> Personally I'd go for "pg_trans", but it's only a weak preference. > > > Heaven forfend we actually use enough characters to make it > > self-documenting.
>
> $ ls $PGDATA
> PG_VERSION     pg_dynshmem/   pg_notify/     pg_stat_tmp/   postgresql.auto.conf
> base/          pg_hba.conf    pg_replslot/   pg_subtrans/   postgresql.conf
> global/        pg_ident.conf  pg_serial/     pg_tblspc/     postmaster.opts
> pg_clog/       pg_logical/    pg_snapshots/  pg_twophase/   postmaster.pid
> pg_commit_ts/  pg_multixact/  pg_stat/       pg_wal/
>
> I don't see one single one of those subdirectory names that I'd call > self-documenting. That's a problem we should do something about, even if we can't do it by renaming these all in one go. At the very least, we can do this for any new names. > Are you proposing we rename them all with carpal-tunnel-syndrome-promoting names? Are you saying that people are getting carpal tunnel syndrome from hitting the tab key, which has been standard for completion in shells for decades? I'm pretty sure that doesn't actually happen. > There's certainly some case to be made for renaming at least one of > "pg_subtrans" and "pg_multixact" so that these three similarly-purposed > subdirectories can all have similar names. But I think on the whole > that's (a) fixing what ain't broken, and (b) making it even more unlikely > that we'll ever get to consensus on changing anything. We've managed to > agree that we need to change the names ending in "log"; let's do that > and be happy that we've removed one foot-gun from the system. Removing foot guns, un-sexy as it may be from a developer's perspective, is very useful work. Best, David.
-- David Fetter http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 20, 2016 at 1:52 PM, Merlin Moncure wrote: > On Wed, Oct 19, 2016 at 2:39 PM, Merlin Moncure wrote: >> On Wed, Oct 19, 2016 at 9:56 AM, Bruce Momjian wrote: >>> On Wed, Oct 19, 2016 at 08:54:48AM -0500, Merlin Moncure wrote: > Yeah. Believe me -- I know the drill. Most or all the damage seemed > to be to the system catalogs with at least two critical tables dropped > or inaccessible in some fashion. A lot of the OIDs seemed to be > pointing at the wrong thing. Couple more datapoints here. > > *) This database is OLTP, doing ~ 20 tps avg (but very bursty) > *) Another database on the same cluster was not impacted. However > it's more olap style and may not have been written to during the > outage > > Now, this infrastructure running this system is running maybe 100ish > postgres clusters and maybe 1000ish sql server instances with > approximately zero unexplained data corruption issues in the 5 years > I've been here. Having said that, this definitely smells and feels > like something on the infrastructure side. I'll follow up if I have > any useful info. After a thorough investigation I now have credible evidence the source of the damage did not originate from the database itself. Specifically, this database is mounted on the same volume as the operating system (I know, I know) and something non database driven sucked up disk space very rapidly and exhausted the volume -- fast enough that sar didn't pick it up. Oh well :-) -- thanks for the help >>> >>> However, disk space exhaustion should not lead to corruption unless the >>> underlying layers lied in some way. >> >> I agree -- however I'm sufficiently separated from the things doing >> the things that I can't verify that in any real way. In the meantime >> I'm going to take standard precautions (enable checksums/dedicated >> volume/replication). Low disk space also does not explain the bizarre >> outage I had last friday. > > ok, data corruption struck again. 
This time disk space is ruled out, > and access to the database is completely denied: > postgres=# \c castaging > WARNING: leaking still-referenced relcache entry for > "pg_index_indexrelid_index" single user mode dumps core :( bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging LOG: 0: could not change directory to "/root": Permission denied LOCATION: resolve_symlinks, exec.c:293 Segmentation fault (core dumped) Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data castaging'. Program terminated with signal 11, Segmentation fault. #0 0x00797d6f in ?? () Missing separate debuginfos, use: debuginfo-install postgresql95-server-9.5.2-1PGDG.rhel6.x86_64 (gdb) bt #0 0x00797d6f in ?? () #1 0x0079acf1 in RelationCacheInitializePhase3 () #2 0x007b35c5 in InitPostgres () #3 0x006b9b53 in PostgresMain () #4 0x005f30fb in main () (gdb) merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
Merlin Moncure writes: > single user mode dumps core :( You've got a mess there :-( > Missing separate debuginfos, use: debuginfo-install > postgresql95-server-9.5.2-1PGDG.rhel6.x86_64 This backtrace would likely be much more informative if you did the above. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 20, 2016 at 2:07 PM, Tom Lane wrote: > Merlin Moncure writes: >> single user mode dumps core :( > > You've got a mess there :-( > >> Missing separate debuginfos, use: debuginfo-install >> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64 > > This backtrace would likely be much more informative if you did the above. can't; don't have the package unfortunately. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
* David Fetter (da...@fetter.org) wrote: > On Thu, Oct 20, 2016 at 02:23:32PM -0400, Tom Lane wrote: > > Robert Haas writes: > > > On Thu, Oct 20, 2016 at 2:09 PM, Tom Lane wrote: > > >> We have the two precedents "pg_subtrans" and "pg_multixact", so > > >> unless we want to get into renaming those too, I think "pg_trans" > > >> and "pg_xact" are really the only options worth considering. > > >> > > >> Personally I'd go for "pg_trans", but it's only a weak preference. > > > > > Heaven forfend we actually use enough characters to make it > > > self-documenting. > > > > $ ls $PGDATA > > PG_VERSION pg_dynshmem/ pg_notify/ pg_stat_tmp/ > > postgresql.auto.conf > > base/ pg_hba.conf pg_replslot/ pg_subtrans/ postgresql.conf > > global/ pg_ident.conf pg_serial/ pg_tblspc/ postmaster.opts > > pg_clog/ pg_logical/ pg_snapshots/ pg_twophase/ postmaster.pid > > pg_commit_ts/ pg_multixact/ pg_stat/ pg_wal/ > > > > I don't see one single one of those subdirectory names that I'd call > > self-documenting. > > That's a problem we should do something about, even if we can't do it > by renaming these all in one go. At the very least, we can do this > for any new names. I disagree that excessively long names for files or directories are useful and should be fully self-documenting. We should name our directories with useful hints at what they are used for that remind those who are knowledgeable where things live. Secondary to that is using an approach to naming which avoids implying anything about the directories to those who are *not* knowledgeable, which leads us to the current discussion regarding the removal of directories with 'log' in the name. Individuals who are not knowledgeable in this area are not going to get any more benefit from a directory named 'pg_trans' or 'pg_transaction_status' than one named 'pg_clog' or 'pg_xact' and therefore this whole line of reasoning is a red herring. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] Renaming of pg_xlog and pg_clog
Stephen Frost writes: > * David Fetter (da...@fetter.org) wrote: >> On Thu, Oct 20, 2016 at 02:23:32PM -0400, Tom Lane wrote: >>> I don't see one single one of those subdirectory names that I'd call >>> self-documenting. >> That's a problem we should do something about, even if we can't do it >> by renaming these all in one go. At the very least, we can do this >> for any new names. > I disagree that excessivly long names for files or directories are > useful and should be fully self-documenting. I'm mostly with Stephen on this. As the names stand, they encourage people to go look at the documentation, https://www.postgresql.org/docs/devel/static/storage-file-layout.html which will provide more information than you'd ever get out of any reasonable directory name. Having said that, I still don't like "pg_logical", but I suppose renaming it would have more downsides than upsides. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On 10/20/2016 07:59 PM, Robert Haas wrote: On Thu, Oct 20, 2016 at 11:45 AM, Robert Haas wrote: On Thu, Oct 20, 2016 at 3:36 AM, Dilip Kumar wrote: On Thu, Oct 13, 2016 at 12:25 AM, Robert Haas wrote: >> ... So here's my theory. The whole reason why Tomas is having difficulty seeing any big effect from these patches is because he's testing on x86. When Dilip tests on x86, he doesn't see a big effect either, regardless of workload. But when Dilip tests on POWER, which I think is where he's mostly been testing, he sees a huge effect, because for some reason POWER has major problems with this lock that don't exist on x86. If that's so, then we ought to be able to reproduce the big gains on hydra, a community POWER server. In fact, I think I'll go run a quick test over there right now... And ... nope. I ran a 30-minute pgbench test on unpatched master using unlogged tables at scale factor 300 with 64 clients and got these results: 14 LWLockTranche | wal_insert 36 LWLockTranche | lock_manager 45 LWLockTranche | buffer_content 223 Lock| tuple 527 LWLockNamed | CLogControlLock 921 Lock| extend 1195 LWLockNamed | XidGenLock 1248 LWLockNamed | ProcArrayLock 3349 Lock| transactionid 85957 Client | ClientRead 135935 | I then started a run at 96 clients which I accidentally killed shortly before it was scheduled to finish, but the results are not much different; there is no hint of the runaway CLogControlLock contention that Dilip sees on power2. What shared_buffer size were you using? I assume the data set fit into shared buffers, right? FWIW as I explained in the lengthy post earlier today, I can actually reproduce the significant CLogControlLock contention (and the patches do reduce it), even on x86_64. 
For example consider these two tests: * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip However, it seems I can also reproduce fairly bad regressions, like for example this case with data set exceeding shared_buffers: * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
Merlin Moncure wrote: > single user mode dumps core :( > > bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging > LOG: 0: could not change directory to "/root": Permission denied > LOCATION: resolve_symlinks, exec.c:293 > Segmentation fault (core dumped) > > Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data > castaging'. > Program terminated with signal 11, Segmentation fault. > #0 0x00797d6f in ?? () > Missing separate debuginfos, use: debuginfo-install > postgresql95-server-9.5.2-1PGDG.rhel6.x86_64 > (gdb) bt > #0 0x00797d6f in ?? () > #1 0x0079acf1 in RelationCacheInitializePhase3 () > #2 0x007b35c5 in InitPostgres () > #3 0x006b9b53 in PostgresMain () > #4 0x005f30fb in main () Maybe rm global/pg_internal.init and try again? -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE
A customer just pinged me wondering how it was that a BitmapAnd node was reporting 0 tuples when the Bitmap Heap Scan above it showed it had in fact generated tuples. While this is mentioned in the docs, I think it would be very helpful to have ANALYZE spit out "N/A" instead of 0 for these nodes. AFAICT that would just require adding a special case to the "if (es->costs)" block at line ~1204 in explain.c? BTW, it looks like it would actually be possible to return a real row-count if none of the TIDBitmap pages are chunks, but I'm not sure if it's worth the extra effort. -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com 855-TREBLE2 (855-873-2532) mobile: 512-569-9461 -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE
Jim Nasby writes: > A customer just pinged me wondering how it was that a BitmapAnd node was > reporting 0 tuples when the Bitmap Heap Scan above it showed it had in > fact generated tuples. > While this is mentioned in the docs, I think it would be very helpful to > have ANALYZE spit out "N/A" instead of 0 for these nodes. That would break code that tries to parse that stuff, eg depesz.com. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE
Tom, * Tom Lane (t...@sss.pgh.pa.us) wrote: > Jim Nasby writes: > > A customer just pinged me wondering how it was that a BitmapAnd node was > > reporting 0 tuples when the Bitmap Heap Scan above it showed it had in > > fact generated tuples. > > > While this is mentioned in the docs, I think it would be very helpful to > > have ANALYZE spit out "N/A" instead of 0 for these nodes. > > That would break code that tries to parse that stuff, eg depesz.com. I don't believe Jim was suggesting that we back-patch such a change. Changing it in a new major release seems entirely reasonable. Thanks! Stephen signature.asc Description: Digital signature
Re: [HACKERS] WIP: Fix invalid XML explain plans for track_io_timing
I wrote: > Markus Winand writes: >> The XML output of explain potentially outputs the XML tag names >> "I/O-Write-Time" >> and "I/O-Read-Time", which are invalid due to the slash. > Ooops. After further thought I decided we should go with the whitelist solution. The extra time needed to produce XML-format output isn't really likely to bother anyone. And given that this bug escaped notice for several years, it seems likely that the next time somebody decides to be creative about picking a tag name, we might not notice an XML syntax violation for several more years. So a future-proof fix seems advisable. I pushed a patch using the strchr() approach. Thanks for reporting this! regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] emergency outage requiring database restart
On Thu, Oct 20, 2016 at 3:16 PM, Alvaro Herrera wrote: > Merlin Moncure wrote: > >> single user mode dumps core :( >> >> bash-4.1$ postgres --single -D /var/lib/pgsql/9.5/data castaging >> LOG: 0: could not change directory to "/root": Permission denied >> LOCATION: resolve_symlinks, exec.c:293 >> Segmentation fault (core dumped) >> >> Core was generated by `postgres --single -D /var/lib/pgsql/9.5/data >> castaging'. >> Program terminated with signal 11, Segmentation fault. >> #0 0x00797d6f in ?? () >> Missing separate debuginfos, use: debuginfo-install >> postgresql95-server-9.5.2-1PGDG.rhel6.x86_64 >> (gdb) bt >> #0 0x00797d6f in ?? () >> #1 0x0079acf1 in RelationCacheInitializePhase3 () >> #2 0x007b35c5 in InitPostgres () >> #3 0x006b9b53 in PostgresMain () >> #4 0x005f30fb in main () > > Maybe > rm global/pg_internal.init > and try again? Will do when I can do that -- had to do emergency restore + some unfun data reconstruction from the query log. Notably there is a much larger database in the same cluster which is undamaged. This server is new to production usage, maybe 2 months. Here are the contents of pg_extension: plpgsql dblink hstore postgres_fdw plsh * not used pg_trgm * not used plr * not used tablefunc * not used adminpack * not used plpythonu * not used postgis * not used postgis_topology * not used Short term plan is to separate the database to its own cluster, install replication and checksums. All queries to this database are logged.
Here are the contents of the log leading into and after the crash: oct 17 crash: 2016-10-17 12:12:24 CDT [rms@castaging]: DETAIL: parameters: $1 = '21121', $2 = '8', $3 = '2016-10-13', $4 = NULL, $5 = NULL, $6 = NULL, $7 = NULL, $8 = NULL, $9 = NULL, $10 = NULL, $11 = 't', $12 2016-10-17 12:12:24 CDT [rms@castaging]: LOG: execute : SELECT NULL AS PROCEDURE_CAT, n.nspname AS PROCEDURE_SCHEM, p.proname AS PROCEDURE_NAME, NULL, NULL, NULL, d.description AS REMARKS 2016-10-17 12:12:24 CDT [rms@castaging]: LOG: execute : SELECT n.nspname,p.proname,p.prorettype,p.proargtypes, t.typtype,t.typrelid , p.proargnames, p.proargmodes, p.proallargtypes , p.o 2016-10-17 12:12:24 CDT [rms@castaging]: LOG: execute : select * from checkin($1, $2, $3, $4, $5, $6, $7, $8, $9, $10, $11, $12) as result 2016-10-17 12:12:24 CDT [rms@castaging]: DETAIL: parameters: $1 = '114333', $2 = 'rrosillo', $3 = 'CALLER', $4 = 'Survey', $5 = 'Happy', $6 = 'Callback', $7 = 'OTHER', $8 = '2016-10-18 01:05:00', 2016-10-17 12:12:24 CDT [rms@castaging]: LOG: execute S_3: COMMIT 2016-10-17 12:12:25 CDT [@]: ERROR: could not open relation with OID 1203933 <-- first sign of damage 2016-10-17 12:12:25 CDT [@]: CONTEXT: automatic analyze of table "castaging.public.apartment" oct 20 crash: 2016-10-20 12:46:38 CDT [postgres@castaging]: LOG: statement: SELECT CallsByUser() AS byuser 2016-10-20 12:46:40 CDT [postgres@castaging]: LOG: statement: SELECT CallCenterOverviewJSON() AS overview 2016-10-20 12:46:41 CDT [postgres@castaging]: LOG: statement: SELECT CallCenterUserTrackingJSON() AS tracking 2016-10-20 12:46:41 CDT [postgres@castaging]: LOG: statement: SELECT MarketOverviewJSON() AS market 2016-10-20 12:46:42 CDT [postgres@castaging]: LOG: execute : SELECT SubMarketOverviewJSON($1::TEXT) AS submkt 2016-10-20 12:46:42 CDT [postgres@castaging]: DETAIL: parameters: $1 = '640' 2016-10-20 12:46:44 CDT [postgres@castaging]: LOG: statement: SELECT CallsByUser() AS byuser 2016-10-20 12:46:46 CDT
[postgres@castaging]: LOG: statement: SELECT CallCenterOverviewJSON() AS overview 2016-10-20 12:46:47 CDT [postgres@castaging]: LOG: statement: SELECT CallCenterUserTrackingJSON() AS tracking 2016-10-20 12:46:47 CDT [postgres@castaging]: ERROR: "pg_description_o_c_o_index" is an index <-- first sign of damage 2016-10-20 12:46:47 CDT [postgres@castaging]: CONTEXT: SQL function "callcenterusertrackingjson" during startup 2016-10-20 12:46:47 CDT [postgres@castaging]: STATEMENT: SELECT CallCenterUserTrackingJSON() AS tracking 2016-10-20 12:46:47 CDT [postgres@castaging]: WARNING: leaking still-referenced relcache entry for "pg_class_oid_index" CallCenterUserTrackingJSON() and friends are not particularly interesting except that they are making use of json_agg(). They were also called basically all day long in 5 second intervals. I guess this isn't saying very much, but I'm starting to smell a rat here. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Improve output of BitmapAnd EXPLAIN ANALYZE
Stephen Frost writes: > * Tom Lane (t...@sss.pgh.pa.us) wrote: >> That would break code that tries to parse that stuff, eg depesz.com. > I don't believe Jim was suggesting that we back-patch such a change. I don't either. > Changing it in a new major release seems entirely reasonable. It's still a crock though. I wonder whether it wouldn't be better to change the nodeBitmap code so that when EXPLAIN ANALYZE is active, it expends extra effort to try to produce a rowcount number. We could certainly run through the result bitmap and count the number of exact-TID bits. I don't see a practical way of doing something with lossy page bits, but maybe those occur infrequently enough that we could ignore them? Or we could arbitrarily decide that a lossy page should be counted as MaxHeapTuplesPerPage, or a bit less arbitrarily, count it as the relation's average number of tuples per page. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
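[Editor's note: the counting rule Tom sketches above -- count exact-TID bits precisely and charge each lossy page the relation's average tuples per page -- can be illustrated in miniature. This is hedged, standalone Python, not PostgreSQL code; the function name and its inputs are invented for the example.]

```python
def estimate_bitmap_rows(exact_tid_counts, n_lossy_pages, rel_tuples, rel_pages):
    """Estimate rows produced by a bitmap scan.

    exact_tid_counts: per-page counts of exact-TID bits (known precisely).
    n_lossy_pages:    pages the bitmap only remembers as "some tuples here";
                      each is charged the relation's average tuples per page,
                      one of the fallbacks suggested above.
    """
    avg_tuples_per_page = rel_tuples / rel_pages if rel_pages else 0
    return sum(exact_tid_counts) + n_lossy_pages * avg_tuples_per_page

# Three exact pages matching 5, 2, and 7 tuples, plus 2 lossy pages in a
# relation averaging 100 tuples/page:
print(estimate_bitmap_rows([5, 2, 7], 2, 100000, 1000))  # -> 214.0
```

As the thread notes, the lossy-page term is only an estimate; with no lossy pages the result is exact.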
Re: [HACKERS] Indirect indexes
On Thu, Oct 20, 2016 at 1:08 PM, Pavan Deolasee wrote: > On Thu, Oct 20, 2016 at 9:20 PM, Claudio Freire > wrote: >> >> >> With indirect indexes, since you don't need to insert a tid, you can >> just "insert on conflict do nothing" on the index. > > > Would that work with non-unique indexes? Anyways, the point I was trying to > make is that there are similar technical challenges and we could solve it > for WARM as well with your work for finding conflicting pairs and > then not doing inserts. My thinking currently is that it will lead to other > challenges, especially around vacuum, but I could be wrong. Consider this: Have a vacuum_cycle_id field in the metapage, with one bit reserved to indicate whether there's a vacuum in progress. While there is a vacuum in progress on the index, all kinds of modifications will look up the entry, and store the current vacuum_cycle_id on the unused space for the tid pointer on the index entry. When not, only new entries will be added (with the current vacuum cycle id). So, during vacuum, indirect indexes incur a similar cost to that of regular indexes, but only during vacuum. When vacuuming, allocate 1/2 maintenance_work_mem for a bloom filter, and increase all vacuum cycle ids (on the metapage) and mark a vacuum in progress. Scan the heap, add pairs of *non-dead* tuples to the bloom filter. That's one BF per index, sadly, but bear with me. Then scan the indexes. Pairs *not* in the BF that have the *old* vacuum cycle id get removed. Clear the vacuum in progress flag on all indexes' metapage. The only drawback here is that mwm dictates the amount of uncleanable waste left on the indexes (BF false positives). Surely, the BF could be replaced with an accurate set rather than an approximate one, but that could require a lot of memory if keys are big, and a lot of scans of the indexes. The BF trick bounds the amount of waste left while minimizing work.
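[Editor's note: the two-phase scheme described above can be sketched in miniature. This is an illustration only, not PostgreSQL code: the toy Bloom filter and the (key, pk, cycle_id) entry layout are assumptions made for the example.]

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter standing in for the maintenance_work_mem-bounded one."""
    def __init__(self, nbits=1 << 16, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, item):
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.nbits

    def add(self, item):
        for p in self._positions(item):
            self.bits |= 1 << p

    def might_contain(self, item):
        return all(self.bits >> p & 1 for p in self._positions(item))

def vacuum_indirect_index(index_entries, live_pairs, old_cycle_id):
    """index_entries: (key, pk, cycle_id) tuples; live_pairs: pairs seen
    during the heap scan. Entries stamped with the old cycle id that the
    filter has never seen are removable."""
    bf = BloomFilter()
    for pair in live_pairs:            # phase 1: heap scan fills the filter
        bf.add(pair)
    return [e for e in index_entries   # phase 2: index scan prunes
            if e[2] != old_cycle_id or bf.might_contain((e[0], e[1]))]
```

False positives merely leave some dead entries behind, which matches the "bounded waste" trade-off the message describes: correctness is preserved, only cleanup is incomplete.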
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Fri, Oct 21, 2016 at 12:35 AM, Robert Haas wrote: > On Wed, Oct 12, 2016 at 10:22 PM, Michael Paquier > wrote: >> OK. I can live with that as well. Attached are three patches. The >> pg_xlog -> pg_wal move, the pg_clog -> pg_transaction move, and the >> pg_clog -> pg_xact move. Only one survivor to be chosen among the last >> two ones. > > Committed 0001. Thanks! -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Typo in pgstat.h
Hi, - * st_progress_command_target, and st_progress_command[]. + * st_progress_command_target, and st_progress_param[]. Attached patch fixes the typo. Regards, Vinayak Pokale NTT Open Source Software Center typo-pgstat-h.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra wrote: >> I then started a run at 96 clients which I accidentally killed shortly >> before it was scheduled to finish, but the results are not much >> different; there is no hint of the runaway CLogControlLock contention >> that Dilip sees on power2. >> > What shared_buffer size were you using? I assume the data set fit into > shared buffers, right? 8GB. > FWIW as I explained in the lengthy post earlier today, I can actually > reproduce the significant CLogControlLock contention (and the patches do > reduce it), even on x86_64. /me goes back, rereads post. Sorry, I didn't look at this carefully the first time. > For example consider these two tests: > > * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync > * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip > > However, it seems I can also reproduce fairly bad regressions, like for > example this case with data set exceeding shared_buffers: > > * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip I'm not sure how seriously we should take the regressions. I mean, what I see there is that CLogControlLock contention goes down by about 50% -- which is the point of the patch -- and WALWriteLock contention goes up dramatically -- which sucks, but can't really be blamed on the patch except in the indirect sense that a backend can't spend much time waiting for A if it's already spending all of its time waiting for B. It would be nice to know why it happened, but we shouldn't allow CLogControlLock to act as an admission control facility for WALWriteLock (I think). -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 3:46 PM, Tom Lane wrote: > I'm mostly with Stephen on this. As the names stand, they encourage > people to go look at the documentation, > https://www.postgresql.org/docs/devel/static/storage-file-layout.html > which will provide more information than you'd ever get out of any > reasonable directory name. Well, we could change them all to pg_a, pg_b, pg_c, pg_d, ... which would encourage that even more strongly. But I don't think that proposal can be taken seriously. Giving things meaningful names is a good practice in almost every case. > Having said that, I still don't like "pg_logical", but I suppose > renaming it would have more downsides than upsides. Remind me what your beef is? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Typo in pgstat.c
Hi, Attached patch fixes a typo in pgstat.c s/addtions/additions/g Regards, Vinayak Pokale NTT Open Source Software Center typo-pgstat-c.patch Description: application/download -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Fri, Oct 21, 2016 at 10:03 AM, Robert Haas wrote: > On Thu, Oct 20, 2016 at 3:46 PM, Tom Lane wrote: >> I'm mostly with Stephen on this. As the names stand, they encourage >> people to go look at the documentation, >> https://www.postgresql.org/docs/devel/static/storage-file-layout.html >> which will provide more information than you'd ever get out of any >> reasonable directory name. > > Well, we could change them all to pg_a, pg_b, pg_c, pg_d, ... which > would encourage that even more strongly. But I don't think that > proposal can be taken seriously. Giving things meaningful names is a > good practice in almost every case. Moving on with the topic of this thread... I would think that "pg_xact" is what we are moving to rename pg_clog. On top of that, after reading the thread, here are the options and binaries that could be renamed for consistency with the renaming of pg_xlog: - pg_xlogdump -> pg_waldump - pg_resetxlog -> pg_resetwal - pg_receivexlog -> pg_receivewal - initdb --xlogdir -> --waldir - pg_basebackup --xlogmethod --xlogdir -> --walmethod --waldir That's quite a number, and each change is trivial. Previous options would be better kept for backward-compatibility. Personally, I see no urge in changing all that and I am fine with just renaming the *log directories of PGDATA for this thread. But as the point has been raised, here are all things that could be done. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Renaming of pg_xlog and pg_clog
On Thu, Oct 20, 2016 at 6:03 PM, Robert Haas wrote: > On Thu, Oct 20, 2016 at 3:46 PM, Tom Lane wrote: > > I'm mostly with Stephen on this. As the names stand, they encourage > > people to go look at the documentation, > > https://www.postgresql.org/docs/devel/static/storage-file-layout.html > > which will provide more information than you'd ever get out of any > > reasonable directory name. > > Well, we could change them all to pg_a, pg_b, pg_c, pg_d, ... which > would encourage that even more strongly. But I don't think that > proposal can be taken seriously. Giving things meaningful names is a > good practice in almost every case. > Those don't have the virtue of being at least somewhat mnemonic like pg_xact. I'll toss my lot in with Stephen's and Tom's on this. I have no problem continuing to keep with historical precedent and allowing mnemonic abbreviations in our directory and file names at this point. David J.
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 9:03 PM, Tomas Vondra wrote: > In the results you've posted on 10/12, you've mentioned a regression with 32 > clients, where you got 52k tps on master but only 48k tps with the patch (so > ~10% difference). I have no idea what scale was used for those tests, That test was with scale factor 300 on POWER 4 socket machine. I think I need to repeat this test with multiple readings to confirm whether it was a regression or run-to-run variation. I will do that soon and post the results. > and I > see no such regression in the current results (but you only report results > for some of the client counts). This test is on X86 8 socket machine, At 1000 scale factor I have given readings with all client counts (32,64,96,192), but at 300 scale factor I posted only with 192 because on this machine (X86 8 socket machine) I did not see much load on ClogControlLock at 300 scale factor. > > Also, which of the proposed patches have you been testing? I tested with the GroupLock patch. > Can you collect and share a more complete set of data, perhaps based on the > scripts I use to do tests on the large machine with 36/72 cores, available > at https://bitbucket.org/tvondra/hp05-results ? I think from my last run I did not share data for -> X86 8 socket machine, 300 scale factor, 32,64,96 client. I already have those data so I am sharing it. (Please let me know if you want to see at some other client count, for that I need to run another test.)
Head: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 77233356 latency average: 0.746 ms tps = 42907.363243 (including connections establishing) tps = 42907.546190 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 300_32_ul.txt 111757 | 3666 1289 LWLockNamed | ProcArrayLock 1142 Lock| transactionid 318 LWLockNamed | CLogControlLock 299 Lock| extend 109 LWLockNamed | XidGenLock 70 LWLockTranche | buffer_content 35 Lock| tuple 29 LWLockTranche | lock_manager 14 LWLockTranche | wal_insert 1 Tuples only is on. 1 LWLockNamed | CheckpointerCommLock Group Lock Patch: scaling factor: 300 query mode: prepared number of clients: 32 number of threads: 32 duration: 1800 s number of transactions actually processed: 77544028 latency average: 0.743 ms tps = 43079.783906 (including connections establishing) tps = 43079.960331 (excluding connections establishing 112209 | 3718 1402 LWLockNamed | ProcArrayLock 1070 Lock| transactionid 245 LWLockNamed | CLogControlLock 188 Lock| extend 80 LWLockNamed | XidGenLock 76 LWLockTranche | buffer_content 39 LWLockTranche | lock_manager 31 Lock| tuple 7 LWLockTranche | wal_insert 1 Tuples only is on. 1 LWLockTranche | buffer_mapping Head: number of clients: 64 number of threads: 64 duration: 1800 s number of transactions actually processed: 76211698 latency average: 1.512 ms tps = 42339.731054 (including connections establishing) tps = 42339.930464 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 300_64_ul.txt 215734 | 5106 Lock| transactionid 3754 LWLockNamed | ProcArrayLock 3669 3267 LWLockNamed | CLogControlLock 661 Lock| extend 339 LWLockNamed | XidGenLock 310 Lock| tuple 289 LWLockTranche | buffer_content 205 LWLockTranche | lock_manager 50 LWLockTranche | wal_insert 2 LWLockTranche | buffer_mapping 1 Tuples only is on. 
1 LWLockTranche | proc GroupLock patch: scaling factor: 300 query mode: prepared number of clients: 64 number of threads: 64 duration: 1800 s number of transactions actually processed: 76629309 latency average: 1.503 ms tps = 42571.704635 (including connections establishing) tps = 42571.905157 (excluding connections establishing) [dilip.kumar@cthulhu bin]$ cat 300_64_ul.txt 217840 | 5197 Lock| transactionid 3744 LWLockNamed | ProcArrayLock 3663 966 Lock| extend 849 LWLockNamed | CLogControlLock 372 Lock| tuple 305 LWLockNamed | XidGenLock 199 LWLockTranche | buffer_content 184 LWLockTranche | lock_manager 35 LWLockTranche | wal_insert 1 Tuples only is on. 1 LWLockTranche | proc 1 LWLockTranche | buffer_mapping Head: scaling factor: 300 query mode: prepared number of clients: 96 number of threads: 96 duration: 1800 s number of transactions actually processed: 77663593 latency average: 2.225 ms tps = 43145.624864 (including connections establishing) tps = 43145.838167 (excluding connections establishing) 302317 | 18836 Lo
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Thu, Oct 20, 2016 at 9:15 PM, Robert Haas wrote: > So here's my theory. The whole reason why Tomas is having difficulty > seeing any big effect from these patches is because he's testing on > x86. When Dilip tests on x86, he doesn't see a big effect either, > regardless of workload. But when Dilip tests on POWER, which I think > is where he's mostly been testing, he sees a huge effect, because for > some reason POWER has major problems with this lock that don't exist > on x86. Right, because on POWER we can see big contention on ClogControlLock with 300 scale factor, even at 96 client count, but on X86 with 300 scale factor there is almost no contention on ClogControlLock. However at 1000 scale factor we can see significant contention on ClogControlLock on the X86 machine. I want to test on POWER with 1000 scale factor to see whether contention on ClogControlLock becomes much worse. I will run this test and post the results. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FSM corruption leading to errors
On Thu, Oct 20, 2016 at 3:37 PM, Pavan Deolasee wrote:
> Just to clarify, I meant if we truncate the entire FSM then we'll need API
> to truncate VM as well so that VACUUM rebuilds everything completely. OTOH
> if we provide a function just to truncate FSM to match the size of the
> table, then we don't need to rebuild the FSM. So that's probably a better
> way to handle FSM corruption, as far as this particular issue is concerned.

To be honest, I think that just having in the release notes a method that does not involve the use of any extra extension or SQL routine is fine. So we could just tell users to:
1) Run something like the query you gave upthread, giving the user a list of the files that are corrupted. And add this query to the release notes.
2) If anything is found, stop the server and delete the files manually.
3) Re-start the server.

OK, that's troublesome and costly for large relations, but we know that's the safest way to go for any version, and there is no need to complicate the code with any one-time repairing extensions. Speaking of which, I implemented a small extension able to truncate the FSM up to the size of the relation, as attached, but as I looked at it, SMGR_TRUNCATE_FSM was introduced in 9.6, so its range of action is rather limited... And I pushed a version on github as well:
https://github.com/michaelpq/pg_plugins/tree/master/pg_fix_truncation

The limited range of such an extension is an argument good enough to just rely on the stop/delete-FSM/start method to fix an instance and let VACUUM do the rest of the work. That looks to work, but use it at your own risk. This bug would be a good blog topic by the way...
--
Michael

pg_fix_truncation.tar.gz
Description: GNU Zip compressed data

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
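For reference, the exact query mentioned upthread is not reproduced here, but a check in the same spirit can be written with the pg_freespacemap contrib module (the table name below is a placeholder):

```sql
-- Requires: CREATE EXTENSION pg_freespacemap;
-- List FSM entries that point past the end of the relation; any rows
-- returned suggest the FSM is out of sync with the heap.
SELECT blkno, avail
  FROM pg_freespace('some_table'::regclass)
 WHERE blkno >= pg_relation_size('some_table') /
       current_setting('block_size')::bigint;
```

This only detects the inconsistency; the repair is still the stop/delete-FSM/start procedure (followed by VACUUM) described above.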
Re: [HACKERS] Remove autovacuum GUC?
On 21 Oct. 2016 12:57 am, "Joshua D. Drake" wrote:
>
> Hello,
>
> What about a simpler solution to all of this. Let's just remove it from postgresql.conf. Out of sight. If someone needs to test they can, but an uneducated user won't immediately know what to do about that "autovacuum process", and when they look it up the documentation is exceedingly blunt about why to *not* turn it off.

Then they'll just do what I've seen at multiple sites: create a cron job that kills it as soon as it starts. Then their DB performance issues go away ... for a while. By the time they're forced to confront it, their DB is immensely bloated and barely staggering along, or has reached wraparound shutdown. So we get the fun job of trying to fix it using special freeze tools etc., because they broke the normal ones...

We still have fsync=off available. If you want a user foot-gun to crusade against, start there. Even that's useful and legitimate, though I wish it were called enable_crash_safety = off. It's legit to use it in testing, in data ingestion where you'll fsync afterward, or in cloud deployments where you rely on replication and the whole instance gets nuked if it crashes anyway. There are similarly legit reasons to turn autovac off, but the consequences are less bad.

Personally, what I think is needed here is to make monitoring and bloat visibility not completely suck. So we can warn users if tables haven't been vacuumed in ages and have recent churn. And so they can easily SELECT from a view to get bloat estimates, with an estimate of how much drift there could've been since the last vacuum. Users turn off vacuum because they cannot see that it is doing anything except wasting I/O and CPU. So:

* A TL;DR in the docs saying what vacuum does and why not to turn it off. In particular warning that turning autovac off will make a slow DB get slower, even though it seems to help at first.
* A comment in the conf file with the same TL;DR. Comments are free, let's use a few lines.
* Warn on startup when autovac is off?

Personally, I wouldn't mind encouraging most users to prefer table- or database-level autovacuum controls, though we really need to make them more visible. If that improved, I wouldn't really mind removing the global autovac option from the conf file, though I'd prefer to just give it a decent comment.
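To make the table-level route above concrete, this is roughly what those controls look like (table names are placeholders):

```sql
-- Tune autovacuum per table rather than disabling it globally:
ALTER TABLE busy_table SET (autovacuum_vacuum_scale_factor = 0.02);

-- Disabling it for a single bulk-load staging table is far safer
-- than autovacuum = off in postgresql.conf:
ALTER TABLE staging_table SET (autovacuum_enabled = false);

-- Rough visibility check: dead tuples vs. last (auto)vacuum time.
SELECT relname, n_dead_tup, last_vacuum, last_autovacuum
  FROM pg_stat_user_tables
 ORDER BY n_dead_tup DESC;
```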
Re: [HACKERS] Fun fact about autovacuum and orphan temp tables
On Thu, Sep 8, 2016 at 12:38 AM, Robert Haas wrote:
> On Mon, Sep 5, 2016 at 1:14 PM, Bruce Momjian wrote:
>> I don't think we look at those temp tables frequently enough to justify
>> keeping them around for all users.
>
> +1. I think it would be much better to nuke them more aggressively.

+1 from here as well. Making the deletion of orphaned temp tables mandatory, even in the non-wraparound autovacuum case, looks to be the better move to me. I can see that it could be important to be able to look at some of the temp tables' data after a crash, but that argument looks weak compared to the potential bloat of catalog tables because of those dangling temp relations. And I'd suspect that there are far more users who would like to see this removal be more aggressive than users caring about having a look at those orphaned tables after a crash.
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Fun fact about autovacuum and orphan temp tables
On Thu, Oct 20, 2016 at 9:30 PM, Constantin S. Pan wrote:
> I tried to fix the problem with a new backend not being
> able to reuse a temporary namespace when it contains
> thousands of temporary tables. I disabled locking of objects
> during namespace clearing process. See the patch attached.
> Please tell me if there are any reasons why this is wrong.

That's invasive. I am wondering if a cleaner approach here would be a flag in deleteOneObject() that performs the lock cleanup, as that's what you are trying to solve here.

> I also added a GUC variable and changed the condition in
> autovacuum to let it nuke orphan tables on its way.
> See another patch attached.

It seems to me that you'd even want to make the drop of orphaned tables mandatory once they are detected, even if it is not a wraparound autovacuum. Dangling temp tables have higher chances of hitting users than the diagnostics of orphaned temp tables after a crash. (A background worker could actually be used for existing versions to clean that up more aggressively.)
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
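As a quick way to see how many temp tables are hanging around on a running server, something like this works against the catalogs (orphaned namespaces are the pg_temp_N ones whose backend slot, cf. pg_stat_get_backend_idset(), is no longer active):

```sql
-- Count temp tables per temporary namespace; namespaces belonging to
-- no-longer-running backends are the orphaned ones.
SELECT n.nspname, count(c.oid) AS temp_tables
  FROM pg_namespace n
  LEFT JOIN pg_class c ON c.relnamespace = n.oid
 WHERE n.nspname ~ '^pg_temp_[0-9]+$'
 GROUP BY n.nspname
 ORDER BY n.nspname;
```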
Re: [HACKERS] Fun fact about autovacuum and orphan temp tables
On Fri, Oct 21, 2016 at 2:29 PM, Michael Paquier wrote:
> On Thu, Oct 20, 2016 at 9:30 PM, Constantin S. Pan wrote:
>> I tried to fix the problem with a new backend not being
>> able to reuse a temporary namespace when it contains
>> thousands of temporary tables. I disabled locking of objects
>> during namespace clearing process. See the patch attached.
>> Please tell me if there are any reasons why this is wrong.
>
> That's invasive. I am wondering if a cleaner approach here would be a
> flag in deleteOneObject() that performs the lock cleanup, as that's
> what you are trying to solve here.
>
>> I also added a GUC variable and changed the condition in
>> autovacuum to let it nuke orphan tables on its way.
>> See another patch attached.
>
> It seems to me that you'd even want to make the drop of orphaned
> tables mandatory once they are detected, even if it is not a wraparound
> autovacuum. Dangling temp tables have higher chances of hitting users
> than the diagnostics of orphaned temp tables after a crash. (A background
> worker could actually be used for existing versions to clean that up
> more aggressively.)

You should add your patch to the next commit fest as well, so as to be sure that it will attract more reviews and more attention:
https://commitfest.postgresql.org/11/
--
Michael

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Transactions involving multiple postgres foreign servers
On Wed, Oct 19, 2016 at 9:17 PM, Robert Haas wrote:
> On Thu, Oct 13, 2016 at 7:27 AM, Amit Langote wrote:
>> However, when I briefly read the description in "Transaction Management in the R* Distributed Database Management System (C. Mohan et al)" [2], it seems that what Ashutosh is saying might be a correct way to proceed after all:
>
> I think Ashutosh is mostly right, but I think there's a lot of room to doubt whether the design of this patch is good enough that we should adopt it.
>
> Consider two possible designs. In design #1, the leader performs the commit locally and then tries to send COMMIT PREPARED to every standby server afterward, and only then acknowledges the commit to the client. In design #2, the leader performs the commit locally and then acknowledges the commit to the client at once, leaving the task of running COMMIT PREPARED to some background process. Design #2 involves a race condition, because it's possible that the background process might not complete COMMIT PREPARED on every node before the user submits the next query, and that query might then fail to see supposedly-committed changes. This can't happen in design #1. On the other hand, there's always the possibility that the leader's session is forcibly killed, even perhaps by pulling the plug. If the background process contemplated by design #2 is well-designed, it can recover and finish sending COMMIT PREPARED to each relevant server after the next restart. In design #1, that background process doesn't necessarily exist, so inevitably there is a possibility of orphaning prepared transactions on the remote servers, which is not good. Even if the DBA notices them, it won't be easy to figure out whether to commit them or roll them back.
> I think this thought experiment shows that, on the one hand, there is a point to waiting for commits on the foreign servers, because it can avoid the anomaly of not seeing the effects of your own commits. On the other hand, it's ridiculous to suppose that every case can be handled by waiting, because that just isn't true. You can't be sure that you'll be able to wait long enough for COMMIT PREPARED to complete, and even if that works out, you may not want to wait indefinitely for a dead server. Waiting for a ROLLBACK PREPARED has no value whatsoever unless the system design is such that failing to wait for it results in the ROLLBACK PREPARED never getting performed -- which is a pretty poor excuse.
>
> Moreover, there are good reasons to think that doing this kind of cleanup work in the post-commit hooks is never going to be acceptable. Generally, the post-commit hooks need to be no-fail, because it's too late to throw an ERROR. But there's very little hope that a connection to a remote server can be no-fail; anything that involves a network connection is, by definition, prone to failure. We can try to guarantee that every single bit of code that runs in the path that sends COMMIT PREPARED only raises a WARNING or NOTICE rather than an ERROR, but that's going to be quite difficult to do: even palloc() can throw an error. And what about interrupts? We don't want to be stuck inside this code for a long time without any hope of the user recovering control of the session by pressing ^C, but of course the way that works is it throws an ERROR, which we can't handle here. We fixed a similar issue for synchronous replication in 9a56dc3389b9470031e9ef8e45c95a680982e01a by making an interrupt emit a WARNING in that case and then return control to the user.
> But if we do that here, all of the code that every FDW emits has to be aware of that rule and follow it, and it just adds to the list of ways that the user backend can escape this code without having cleaned up all of the prepared transactions on the remote side.

Hmm, IIRC, my patch, and possibly the patches by Masahiko-san and Vinayak, try to resolve prepared transactions in post-commit code. I agree with you here that it should be avoided and that the backend should take over the job of resolving transactions.

> It seems to me that the only way to really make this feature robust is to have a background worker as part of the equation. The background worker launches at startup and looks around for local state that tells it whether there are any COMMIT PREPARED or ROLLBACK PREPARED operations pending that weren't completed during the last server lifetime, whether because of a local crash or remote unavailability. It attempts to complete those and retries periodically. When a new transaction needs this type of coordination, it adds the necessary crash-proof state and then signals the background worker. If appropriate, it can wait for the background worker to complete, just like a CHECKPOINT waits for the checkpointer to finish -- but if the CHECKPOINT command is interrupted, the actual checkpoint is unaffect
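For clarity, the primitives such a background worker would drive on each foreign server are the stock two-phase commit commands; the GID below is made up for illustration:

```sql
-- Phase one, issued on the foreign server within the distributed transaction:
PREPARE TRANSACTION 'fdw_xact_42';

-- After a crash, the resolver can see what is still pending remotely:
SELECT gid, prepared, owner, database FROM pg_prepared_xacts;

-- Phase two, retried periodically until it succeeds:
COMMIT PREPARED 'fdw_xact_42';
-- or, if no commit decision was ever logged locally:
ROLLBACK PREPARED 'fdw_xact_42';
```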
Re: [HACKERS] Speed up Clog Access by increasing CLOG buffers
On Fri, Oct 21, 2016 at 6:31 AM, Robert Haas wrote:
> On Thu, Oct 20, 2016 at 4:04 PM, Tomas Vondra wrote:
>>> I then started a run at 96 clients which I accidentally killed shortly
>>> before it was scheduled to finish, but the results are not much
>>> different; there is no hint of the runaway CLogControlLock contention
>>> that Dilip sees on power2.
>>
>> What shared_buffer size were you using? I assume the data set fit into
>> shared buffers, right?
>
> 8GB.
>
>> FWIW as I explained in the lengthy post earlier today, I can actually
>> reproduce the significant CLogControlLock contention (and the patches do
>> reduce it), even on x86_64.
>
> /me goes back, rereads post. Sorry, I didn't look at this carefully
> the first time.
>
>> For example consider these two tests:
>>
>> * http://tvondra.bitbucket.org/#dilip-300-unlogged-sync
>> * http://tvondra.bitbucket.org/#pgbench-300-unlogged-sync-skip
>>
>> However, it seems I can also reproduce fairly bad regressions, like for
>> example this case with data set exceeding shared_buffers:
>>
>> * http://tvondra.bitbucket.org/#pgbench-3000-unlogged-sync-skip
>
> I'm not sure how seriously we should take the regressions. I mean,
> what I see there is that CLogControlLock contention goes down by about
> 50% -- which is the point of the patch -- and WALWriteLock contention
> goes up dramatically -- which sucks, but can't really be blamed on the
> patch except in the indirect sense that a backend can't spend much
> time waiting for A if it's already spending all of its time waiting
> for B.

Right, I think not only WALWriteLock, but contention on other locks also goes up, as you can see in the table below. I think there is not much we can do about that with this patch. One thing which is unclear is why the unlogged tests are showing WALWriteLock at all.
              test               | clients | wait_event_type |   wait_event    | master | granular_locking | no_content_lock | group_update
---------------------------------+---------+-----------------+-----------------+--------+------------------+-----------------+--------------
 pgbench-3000-unlogged-sync-skip |      72 | LWLockNamed     | CLogControlLock | 217012 |            37326 |           32288 |        12040
 pgbench-3000-unlogged-sync-skip |      72 | LWLockNamed     | WALWriteLock    |  13188 |           104183 |          123359 |       103267
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | buffer_content  |  10532 |            65880 |           57007 |        86176
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | wal_insert      |   9280 |            85917 |          109472 |        99609
 pgbench-3000-unlogged-sync-skip |      72 | LWLockTranche   | clog            |   4623 |            25692 |           10422 |        11755

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
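For what it's worth, counts like those in the table above can be collected on 9.6 by sampling pg_stat_activity in a loop and aggregating the samples, e.g.:

```sql
-- One sample of current waiters; repeat and sum the counts to get
-- figures comparable to the table above.
SELECT wait_event_type, wait_event, count(*)
  FROM pg_stat_activity
 WHERE wait_event IS NOT NULL
 GROUP BY wait_event_type, wait_event
 ORDER BY count(*) DESC;
```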