Re: [HACKERS] Parallel Append implementation
On 2017-04-04 08:01:32 -0400, Robert Haas wrote: > On Tue, Apr 4, 2017 at 12:47 AM, Andres Freundwrote: > > I don't think the parallel seqscan is comparable in complexity with the > > parallel append case. Each worker there does the same kind of work, and > > if one of them is behind, it'll just do less. But correct sizing will > > be more important with parallel-append, because with non-partial > > subplans the work is absolutely *not* uniform. > > Sure, that's a problem, but I think it's still absolutely necessary to > ramp up the maximum "effort" (in terms of number of workers) > logarithmically. If you just do it by costing, the winning number of > workers will always be the largest number that we think we'll be able > to put to use - e.g. with 100 branches of relatively equal cost we'll > pick 100 workers. That's not remotely sane. I'm quite unconvinced that just throwing a log() in there is the best way to combat that. Modeling the issue of starting more workers through tuple transfer, locking, startup overhead costing seems a better to me. If the goal is to compute the results of the query as fast as possible, and to not use more than max_parallel_per_XXX, and it's actually beneficial to use more workers, then we should. Because otherwise you really can't use the resources available. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] psql - add special variable to reflect the last query status
2017-04-04 22:05 GMT+02:00 Fabien COELHO: > > After some discussions about what could be useful since psql scripts now > accepts tests, this patch sets a few variables which can be used by psql > after a "front door" (i.e. actually typed by the user) query: > > - RESULT_STATUS: the status of the query > - ERROR: whether the query failed > - ERROR_MESSAGE: ... > - ROW_COUNT: #rows affected > > SELECT * FROM ; > \if :ERROR >\echo oops >\q > \endif > > I'm not sure that the names are right. Maybe STATUS would be better than > RESULT_STATUS. good ideas Regards Pavel > > > -- > Fabien. > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > >
Re: [HACKERS] PATCH: recursive json_populate_record()
On 04/03/2017 05:17 PM, Andres Freund wrote: > On 2017-03-21 14:31:08 -0400, Andrew Dunstan wrote: >> >> On 03/21/2017 01:37 PM, David Steele wrote: >>> On 3/16/17 11:54 AM, David Steele wrote: On 2/1/17 12:53 AM, Michael Paquier wrote: > On Thu, Jan 26, 2017 at 6:49 AM, Tom Lanewrote: >> Nikita Glukhov writes: >>> On 25.01.2017 23:58, Tom Lane wrote: I think you need to take a second look at the code you're producing and realize that it's not so clean either. This extract from populate_record_field, for example, is pretty horrid: >>> But what if we introduce some helper macros like this: >>> #define JsValueIsNull(jsv) \ >>> ((jsv)->is_json ? !(jsv)->val.json.str \ >>> : !(jsv)->val.jsonb || (jsv)->val.jsonb->type == jbvNull) >>> #define JsValueIsString(jsv) \ >>> ((jsv)->is_json ? (jsv)->val.json.type == JSON_TOKEN_STRING \ >>> : (jsv)->val.jsonb && (jsv)->val.jsonb->type == jbvString) >> Yeah, I was wondering about that too. I'm not sure that you can make >> a reasonable set of helper macros that will fix this, but if you want >> to try, go for it. >> >> BTW, just as a stylistic thing, I find "a?b:c||d" unreadable: I have >> to go back to the manual every darn time to convince myself whether >> that means (a?b:c)||d or a?b:(c||d). It's better not to rely on >> the reader (... or the author) having memorized C's precedence rules >> in quite that much detail. Extra parens help. > Moved to CF 2017-03 as discussion is going on and more review is > needed on the last set of patches. > The current patches do not apply cleanly at cccbdde: $ git apply ../other/0001-introduce-JsonContainerXxx-macros-v04.patch error: patch failed: src/backend/utils/adt/jsonb_util.c:328 error: src/backend/utils/adt/jsonb_util.c: patch does not apply error: patch failed: src/backend/utils/adt/jsonfuncs.c:1266 error: src/backend/utils/adt/jsonfuncs.c: patch does not apply error: patch failed: src/include/utils/jsonb.h:218 error: src/include/utils/jsonb.h: patch does not apply In addition, it appears a number of suggestions have been made by Tom that warrant new patches. Marked "Waiting on Author". >>> This thread has been idle for months since Tom's review. >>> >>> The submission has been marked "Returned with Feedback". Please feel >>> free to resubmit to a future commitfest. >>> >>> >> Please revive. I am planning to look at this later this week. > Since that was 10 days ago - you're still planning to get this in before > the freeze? > Hoping to, other demands have intervened a bit, but I might be able to. cheers andrew -- Andrew Dunstanhttps://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)
On 4/4/17 12:55 PM, Ashutosh Sharma wrote: > > As I am not seeing any response from Tomas for last 2-3 days and since > the commit-fest is coming towards end, I have planned to work on the > review comments that I had given few days back and submit the updated > patch. PFA new version of patch that takes care of all the review > comments given by me. I have also ran pgindent on btreefuncs.c file > and have run some basic test cases. All looks fine to me now! Awesome, Ashutosh! -- -David da...@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED
On 04/04/2017 10:42 PM, Tomas Vondra wrote: Hi, Andres nagged to me about valgrind runs taking much longer since 9fab40ad introduced the SlabContext into reorderbuffer.c. And by "longer" I mean hours instead of minutes. After a bit of investigation I stumbled on this line in slab.c: VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk)); which is wrong, because the second parameter should be size and not another pointer. This essentially marks a lot of memory as defined, which results in the extreme test runtime. Instead, the line should be: VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32)); Patch attached. Turns out SlabCheck() was missing VALGRIND_MAKE_MEM_DEFINED, resulting in a valgrind failure with --enable-assert. Updated patch version attached, passing all tests in test_decoding. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services slab-valgrind-fix-v2.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Re: PATCH: pageinspect / add page_checksum and bt_page_items(bytea)
Thanks. I planned to look into this today, but you've been faster ;-) regards Tomas On 04/04/2017 06:55 PM, Ashutosh Sharma wrote: Hi, As I am not seeing any response from Tomas for last 2-3 days and since the commit-fest is coming towards end, I have planned to work on the review comments that I had given few days back and submit the updated patch. PFA new version of patch that takes care of all the review comments given by me. I have also ran pgindent on btreefuncs.c file and have run some basic test cases. All looks fine to me now! Please note that this patch still belongs to Tomas not me. I still remain the reviewer of this patch. The same thing has been very clearly mentioned in the attached patch. The only intention behind Ashutosh (reviewer) working on this patch is to ensure that we do not miss the things that can easily get committed in this commit-fest. Thanks. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Covering + unique indexes.
On Tue, Apr 4, 2017 at 3:07 AM, Anastasia Lubennikovawrote: >> * I think that we should store this (the number of attributes), and >> use it directly when comparing, per my remarks to Tom over on that >> other thread. We should also use the free bit within >> IndexTupleData.t_info, to indicate that the IndexTuple was truncated, >> just to make it clear to everyone that might care that that's how >> these truncated IndexTuples need to be represented. >> >> Doing this would have no real impact on your patch, because for you >> this will be 100% redundant. It will help external tools, and perhaps >> another, more general suffix truncation patch that comes in the >> future. We should try very hard to have a future-proof on-disk >> representation. I think that this is quite possible. > > To be honest, I think that it'll make the patch overcomplified, because this > exact patch has nothing to do with suffix truncation. > Although, we can add any necessary flags if this work will be continued in > the future. Yes, doing things that way would mean adding a bit more complexity to your patch, but IMV would be worth it to have the on-disk format be compatible with what a full suffix truncation patch will eventually require. Obviously I disagree with what you say here -- I think that your patch *does* have plenty in common with suffix truncation. But, you don't have to even agree with me on that to see why what I propose is still a good idea. Tom Lane had a specific objection to this patch -- catalog metadata is currently necessary to interpret internal page IndexTuples [1]. However, by storing the true number of columns in the case of truncated tuples, we can make the situation with IndexTuples similar enough to the existing situation with heap tuples, where the number of attributes is available right in the header as "natts". We don't have to rely on something like catalog metadata from a great distance, where some caller may forget to pass through the metadata to a lower level. So, presumably doing things this way addresses Tom's exact objection to the truncation aspect of this patch [2]. We have the capacity to store something like natts "for free" -- let's use it. The lack of any direct source of metadata was called "dangerous". As much as anything else, I want to remove any danger. > There is already an assertion. > Assert(IndexTupleSize(newitup) <= IndexTupleSize(olditup)); > Do you think it is not enough? I think that a "can't happen" check will work better in the future, when user defined code could be involved in truncation. Any extra overhead will be paid relatively infrequently, and will be very low. >> * I see a small bug. You forgot to teach _bt_findsplitloc() about >> truncation. It does this currently, which you did not update: >> >> /* >> * The first item on the right page becomes the high key of the left >> page; >> * therefore it counts against left space as well as right space. >> */ >> leftfree -= firstrightitemsz; >> >> I think that this accounting needs to be fixed. > > Could you explain, what's wrong with this accounting? We may expect to take > more space on the left page, than will be taken after highkey truncation. > But I don't see any problem here. Obviously it would at least be slightly better to have the actual truncated high key size where that's expected -- not the would-be untruncated high key size. The code as it stands might lead to a bad choice of split point in edge-cases. At the very least, you should change comments to note the issue. I think it's highly unlikely that this could ever result in a failure to find a split point, which there are many defenses against already, but I think I would find that difficult to prove. The intent of the code is almost as important as the code, at least in my opinion. [1] postgr.es/m/CAH2-Wz=vmdh8pfazx9wah9bn5ast5vrna0xsz+gsfrs12bp...@mail.gmail.com [2] postgr.es/m/11895.1490983884%40sss.pgh.pa.us -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] psql - add special variable to reflect the last query status
After some discussions about what could be useful since psql scripts now accepts tests, this patch sets a few variables which can be used by psql after a "front door" (i.e. actually typed by the user) query: - RESULT_STATUS: the status of the query - ERROR: whether the query failed - ERROR_MESSAGE: ... - ROW_COUNT: #rows affected SELECT * FROM ; \if :ERROR \echo oops \q \endif I'm not sure that the names are right. Maybe STATUS would be better than RESULT_STATUS. -- Fabien.diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml index 3b86612..7006f23 100644 --- a/doc/src/sgml/ref/psql-ref.sgml +++ b/doc/src/sgml/ref/psql-ref.sgml @@ -3449,6 +3449,24 @@ bar + ERROR + + + Whether the last query failed. + + + + + + ERROR_MESSAGE + + + If the last query failed, the associated error message. + + + + + FETCH_COUNT @@ -3653,6 +3671,25 @@ bar + RESULT_STATUS + + + Text status of the last query. + + + + + + ROW_COUNT + + + When appropriate, how many rows were returned or affected by the + last query. + + + + + QUIET diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c index a2f1259..74d22fb 100644 --- a/src/bin/psql/common.c +++ b/src/bin/psql/common.c @@ -1213,6 +1213,57 @@ PrintQueryResults(PGresult *results) return success; } +/* + * Set special variables for "front door" queries + * - RESULT_STATUS: last query status + * - ERROR: TRUE/FALSE, whether an error occurred + * - ERROR_MESSAGE: message if an error occured + * - ROW_COUNT: how many rows were returned or affected, if appropriate + */ +static void +SetResultVariables(PGresult *results) +{ + bool success; + ExecStatusType restat = PQresultStatus(results); + + SetVariable(pset.vars, "RESULT_STATUS", PQresStatus(restat)); + + switch (restat) + { + case PGRES_EMPTY_QUERY: + case PGRES_TUPLES_OK: + case PGRES_COMMAND_OK: + case PGRES_COPY_OUT: + case PGRES_COPY_IN: + case PGRES_COPY_BOTH: + case PGRES_SINGLE_TUPLE: + success = true; + SetVariable(pset.vars, "ERROR", "FALSE"); + SetVariable(pset.vars, "ERROR_MESSAGE", NULL); + break; + case PGRES_BAD_RESPONSE: + case PGRES_NONFATAL_ERROR: + case PGRES_FATAL_ERROR: + success = false; + SetVariable(pset.vars, "ERROR", "TRUE"); + SetVariable(pset.vars, "ERROR_MESSAGE", PQerrorMessage(pset.db)); + break; + default: + /* dead code */ + success = false; + psql_error("unexpected PQresultStatus: %d\n", restat); + break; + } + + if (success) + { + /* set variable if non empty */ + char *ntuples = PQcmdTuples(results); + SetVariable(pset.vars, "ROW_COUNT", *ntuples ? ntuples : NULL); + } + else + SetVariable(pset.vars, "ROW_COUNT", NULL); +} /* * SendQuery: send the query string to the backend @@ -1346,6 +1397,9 @@ SendQuery(const char *query) elapsed_msec = INSTR_TIME_GET_MILLISEC(after); } + /* set special variables to reflect the result status */ + SetResultVariables(results); + /* but printing results isn't: */ if (OK && results) OK = PrintQueryResults(results); diff --git a/src/test/regress/expected/psql.out b/src/test/regress/expected/psql.out index d602aee..3ee96b5 100644 --- a/src/test/regress/expected/psql.out +++ b/src/test/regress/expected/psql.out @@ -2904,6 +2904,36 @@ bar 'bar' "bar" \echo 'should print #8-1' should print #8-1 \endif +-- special result variables +SELECT 1 UNION SELECT 2; + ?column? +-- +1 +2 +(2 rows) + +\echo 'result status: ' :RESULT_STATUS +result status: PGRES_TUPLES_OK +\echo 'number of rows: ' :ROW_COUNT +number of rows: 2 +SELECT 1 UNION; +ERROR: syntax error at or near ";" +LINE 1: SELECT 1 UNION; + ^ +\echo 'result status: ' :RESULT_STATUS +result status: PGRES_FATAL_ERROR +\if :ERROR + \echo 'Oops, an error occured...' +Oops, an error occured... + \echo 'error message: ' :ERROR_MESSAGE +error message: ERROR: syntax error at or near ";" +LINE 1: SELECT 1 UNION; + ^ + +\endif +; +\echo 'result status: ' :RESULT_STATUS +result status: PGRES_EMPTY_QUERY -- SHOW_CONTEXT \set SHOW_CONTEXT never do $$ diff --git a/src/test/regress/sql/psql.sql b/src/test/regress/sql/psql.sql index b56a05f..716fe06 100644 --- a/src/test/regress/sql/psql.sql +++ b/src/test/regress/sql/psql.sql @@ -526,6 +526,21 @@ select \if false \\ (bogus \else \\ 42 \endif \\ forty_two; \echo 'should print #8-1' \endif +-- special result variables +SELECT 1 UNION SELECT 2; +\echo 'result status: ' :RESULT_STATUS +\echo 'number of rows: ' :ROW_COUNT + +SELECT 1 UNION; +\echo 'result status: ' :RESULT_STATUS +\if :ERROR + \echo 'Oops, an error occured...' + \echo 'error message: '
Re: [HACKERS] Adding support for Default partition in partitioning
On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syedwrote: > Hello, > > Please find attached an updated patch. > Following has been accomplished in this update: > > 1. A new partition can be added after default partition if there are no > conflicting rows in default partition. > 2. Solved the crash reported earlier. > > Thank you, > Rahila Syed > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > Installed and compiled against commit 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017 -0400) without any issue However, running your original example, I'm getting this error keith@keith=# CREATE TABLE list_partitioned ( keith(# a int keith(# ) PARTITION BY LIST (a); CREATE TABLE Time: 4.933 ms keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR VALUES IN (DEFAULT); CREATE TABLE Time: 3.492 ms keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES IN (4,5); ERROR: unrecognized node type: 216 Time: 0.979 ms Also, I'm still of the opinion that denying future partitions of values in the default would be a cause of confusion. In order to move the data out of the default and into a proper child it would require first removing that data from the default, storing it elsewhere, creating the child, then moving it back. If it's only a small amount of data it may not be a big deal, but if it's a large amount, that could cause quite a lot of contention if done in a single transaction. Either that or the user would have to deal with data existing in the table, disappearing, then reappearing again. This also makes it harder to migrate an existing table easily. Essentially no child tables for a large, existing data set could ever be created without going through one of the two situations above. However, thinking through this, I'm not sure I can see a solution without the global index support. If this restriction is removed, there's still an issue of data duplication after the necessary child table is added. So guess it's a matter of deciding which user experience is better for the moment? -- Keith Fiske Database Administrator OmniTI Computer Consulting, Inc. http://www.keithf4.com
Re: [HACKERS] Logical decoding on standby
On 2017-04-04 22:32:40 +0800, Craig Ringer wrote: > I'm much happier with this. I'm still fixing some issues in the tests > for 03 and tidying them up, but 03 should allow 01 and 02 to be > reviewed in their proper context now. To me this very clearly is too late for v10, and now should be moved to the next CF. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On 2017-03-13 18:14:07 -0700, Peter Geoghegan wrote: > From 5351b5db257cb39832647d9117465c0217e6268b Mon Sep 17 00:00:00 2001 > From: Peter Geoghegan> Date: Thu, 13 Oct 2016 10:54:31 -0700 > Subject: [PATCH 1/2] Avoid copying within tuplesort_gettupleslot(). s/Avoid/Allow to avoid/ > Add a "copy" argument to make it optional to receive a copy of caller > tuple that is safe to use following a subsequent manipulating of > tuplesort's state. This is a performance optimization. Most existing > tuplesort_gettupleslot() callers are made to opt out of copying. > Existing callers that happen to rely on the validity of tuple memory > beyond subsequent manipulations of the tuplesort request their own copy. > > This brings tuplesort_gettupleslot() in line with > tuplestore_gettupleslot(). In the future, a "copy" tuplesort_getdatum() > argument may be added, that similarly allows callers to opt out of > receiving their own copy of tuple. > > In passing, clarify assumptions that callers of other tuplesort fetch > routines may make about tuple memory validity, per gripe from Tom Lane. > --- > src/backend/executor/nodeAgg.c | 10 +++--- > src/backend/executor/nodeSort.c| 5 +++-- > src/backend/utils/adt/orderedsetaggs.c | 5 +++-- > src/backend/utils/sort/tuplesort.c | 28 +--- > src/include/utils/tuplesort.h | 2 +- > 5 files changed, 31 insertions(+), 19 deletions(-) > > diff --git a/src/backend/executor/nodeAgg.c b/src/backend/executor/nodeAgg.c > index aa08152..b650f71 100644 > --- a/src/backend/executor/nodeAgg.c > +++ b/src/backend/executor/nodeAgg.c > @@ -570,6 +570,10 @@ initialize_phase(AggState *aggstate, int newphase) > * Fetch a tuple from either the outer plan (for phase 0) or from the sorter > * populated by the previous phase. Copy it to the sorter for the next phase > * if any. > + * > + * Callers cannot rely on memory for tuple in returned slot remaining valid > + * past any subsequent manipulation of the sorter, such as another fetch of > + * input from sorter. (The sorter may recycle memory.) > */ It's not just the sorter, right? I'd just rephrase that the caller cannot rely on tuple to stay valid beyond a single call. > static bool > tuplesort_gettuple_common(Tuplesortstate *state, bool forward, > @@ -2091,12 +2092,15 @@ tuplesort_gettuple_common(Tuplesortstate *state, bool > forward, > * NULL value in leading attribute will set abbreviated value to zeroed > * representation, which caller may rely on in abbreviated inequality check. > * > - * The slot receives a copied tuple (sometimes allocated in caller memory > - * context) that will stay valid regardless of future manipulations of the > - * tuplesort's state. > + * If copy is TRUE, the slot receives a copied tuple that will stay valid > + * regardless of future manipulations of the tuplesort's state. Memory is > + * owned by the caller. If copy is FALSE, the slot will just receive a > + * pointer to a tuple held within the tuplesort, which is more efficient, > + * but only safe for callers that are prepared to have any subsequent > + * manipulation of the tuplesort's state invalidate slot contents. > */ Hm. Do we need to note anything about how long caller memory needs to live? I think we'd get into trouble if the caller does a memory context reset, without also clearing the slot? > bool > -tuplesort_gettupleslot(Tuplesortstate *state, bool forward, > +tuplesort_gettupleslot(Tuplesortstate *state, bool forward, bool copy, > TupleTableSlot *slot, Datum *abbrev) > { > MemoryContext oldcontext = MemoryContextSwitchTo(state->sortcontext); > @@ -2113,8 +2117,10 @@ tuplesort_gettupleslot(Tuplesortstate *state, bool > forward, > if (state->sortKeys->abbrev_converter && abbrev) > *abbrev = stup.datum1; > > - stup.tuple = heap_copy_minimal_tuple((MinimalTuple) stup.tuple); > - ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, true); > + if (copy) > + stup.tuple = heap_copy_minimal_tuple((MinimalTuple) > stup.tuple); > + > + ExecStoreMinimalTuple((MinimalTuple) stup.tuple, slot, copy); > return true; Other than these minimal adjustments, this looks good to go to me. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED
Hi, Andres nagged to me about valgrind runs taking much longer since 9fab40ad introduced the SlabContext into reorderbuffer.c. And by "longer" I mean hours instead of minutes. After a bit of investigation I stumbled on this line in slab.c: VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk)); which is wrong, because the second parameter should be size and not another pointer. This essentially marks a lot of memory as defined, which results in the extreme test runtime. Instead, the line should be: VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32)); Patch attached. -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services slab-valgrind-fix.patch Description: binary/octet-stream -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] bug in SlabAlloc / VALGRIND_MAKE_MEM_DEFINED
On 2017-04-04 23:23:30 +0200, Tomas Vondra wrote: > On 04/04/2017 10:42 PM, Tomas Vondra wrote: > > Hi, > > > > Andres nagged to me about valgrind runs taking much longer since > > 9fab40ad introduced the SlabContext into reorderbuffer.c. And by > > "longer" I mean hours instead of minutes. > > > > After a bit of investigation I stumbled on this line in slab.c: > > > > VALGRIND_MAKE_MEM_DEFINED(chunk, SlabChunkGetPointer(chunk)); > > > > which is wrong, because the second parameter should be size and not > > another pointer. This essentially marks a lot of memory as defined, > > which results in the extreme test runtime. > > > > Instead, the line should be: > > > > VALGRIND_MAKE_MEM_DEFINED(SlabChunkGetPointer(chunk), sizeof(int32)); > > > > Patch attached. > > > > Turns out SlabCheck() was missing VALGRIND_MAKE_MEM_DEFINED, resulting in a > valgrind failure with --enable-assert. Updated patch version attached, > passing all tests in test_decoding. Pushed, and re-enabled TestDecoding on skink. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] > It's too late. Someone has already moved the patch to the next CF (for > PostgreSQL 11). Yes, but this patch will be necessary by the final release of PG 10 if the libpq batch/pipelining is committed in PG 10. I marked this as ready for committer in the next CF, so that some committer can pick up this patch and consider putting it in PG 10. If you decide to modify the patch, please change the status. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting huge pages on Windows
On 2017-01-05 03:12:09 +, Tsunakawa, Takayuki wrote: > From: pgsql-hackers-ow...@postgresql.org > > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Magnus Hagander > > For the pg_ctl changes, we're going from removing all privilieges from the > > token, to removing none. Are there any other privileges that we should be > > worried about? I think you may be correct in that it's overkill to do it, > > but I think we need some more specifics to decide that. > > This page lists the privileges. Is there anyhing you are concerned about? > > https://msdn.microsoft.com/ja-jp/library/windows/desktop/bb530716(v=vs.85).aspx Aren't like nearly all of them a concern? We gone from having some containment (not being to create users, shut the system down, ...), to none. I do however think there's a fair argument to be made that other platforms do not have a similar containment (no root, but sudo etc is still possible), and that the containment isn't super strong. Can't we, to reduce the size of the behavioural change, just use AdjustTokenPrivileges() to re-add the privileges we want? Regards, Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Letting the client choose the protocol to use during a SASL exchange
Hi all, There is still one open item pending for SCRAM that has not been treated which is mentioned here: https://www.postgresql.org/message-id/b081887e-1712-3aa4-7dbe-e012333d5...@iki.fi When doing an authentication with SASL, the server decides what is the mechanism that the client has to use. As SCRAM-SHA-256 is only one of such mechanisms, it would be nice to have something more generic and have the server return to the client a list of protocols that the client can choose from. And also the server achnowledge which protocol is going to be used. Note that RFC4422 has some content on the matter https://tools.ietf.org/html/rfc4422#section-3.1: Mechanism negotiation is protocol specific. Commonly, a protocol will specify that the server advertises supported and available mechanisms to the client via some facility provided by the protocol, and the client will then select the "best" mechanism from this list that it supports and finds suitable. So once the server sends back the list of mechanisms that are supported, the client is free to use what it wants. On HEAD, a 'R' message with AUTH_REQ_SASL followed by SCRAM_SHA256_NAME is sent to let the client know what is the mechanism to use for the SASL exchange. In the future, this should be extended so as a list of names is sent, for example a comma-separated list, but we are free to choose the format we want here. With this list at hand, the client can then choose the protocol it thinks is the best. Still, there is a gap with our current implementation because the server expects the first message from the client to have a SCRAM format, but that's true only if SCRAM-SHA-256 is used as mechanism. In order to cover this gap, it seems to me that we need to have an intermediate state before the server is switched to FE_SCRAM_INIT so as the mechanism used is negotiated between the two parties. Once the protocol negotiation is done, the server can then move on with the mechanism to use. This would be important in the future to allow more SASL mechanisms to work. I am adding an open item for that. For extensibility, we may also want to revisit the choice of defining 'scram' in pg_hba.conf instead of 'sasl'... Thoughts? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting huge pages on Windows
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Andres Freund > I don't think the errdetail is quite right - OpenProcessToken isn't really > a syscall, is it? But then it's a common pattern already in wind32_shmem.c... Yes, "Win32 API function" would be correct, but I followed the existing code... > > +errdetail("Failed system call was %s.", > > +"LookupPrivilegeValue"))); > > Other places in the file actually log the arguments to the function... The only place is CreateFileMapping. Other places (DuplicateHandle and MapViewOfFileEx) don't log arguments. I guess the original developer thought that size and name arguments to CreateFileMapping() might be useful for troubleshooting. > Wonder if we should quote "Lock Pages in Memory" or add dashes, to make > sure it's clear that that's the right? I saw several Microsoft pages, including a page someone introduced me here, and they didn't quote the user right. I'm comfortable with the current code, but I don't mind if the committer adds some quotation. > > + flProtect = PAGE_READWRITE | SEC_COMMIT | > SEC_LARGE_PAGES; > > Why don't we just add the relevant flag, instead of overwriting the previous > contents? I don't have strong opinion on this... > Uh - isn't that a behavioural change? Was this discussed? Yes, I described this in the first mail. Magnus asked about this later and I replied. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
On 2017-04-04 16:10:32 +0900, Tatsuo Ishii wrote: > >> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute > >> starts and stops the timer), then it's a concern and the patch should not > >> be ready for committer. However, the current patch is not like that -- it > >> seems to do what others in this thread are expecting. > > > > Oh, interesting - I kind of took the author's statement as, uh, > > authoritative ;). A quick look over the patch confirms your > > understanding. > > Yes, Tsunakawa-san is correct. Sorry for confusion. > > > I think the code needs a few clarifying comments around this, but > > otherwise seems good. Not restarting the timeout in those cases > > obviously isn't entirely "perfect"/"correct", but a tradeoff - the > > comments should note that. > > > > Tatsuo-san, do you want to change those, and push? I can otherwise. > > Andres, > > If you don't mind, could you please fix the comments and push it. Hm. I started to edit it, but I'm halfway coming back to my previous view that this isn't necessarily ready. If a client were to to prepare a large number of prepared statements (i.e. a lot of parse messages), this'll only start the timeout once, at the first statement sent. It's not an issue for libpq which sends a sync message after each PQprepare, nor does it look to be an issue for pgjdbc based on a quick look. Does anybody think there might be a driver out there that sends a bunch of 'parse' messages, without syncs? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 6 March 2017 at 05:09, Jim Nasbywrote: > On 2/28/17 9:42 PM, Jim Nasby wrote: >>> >>> >>> I'll post a plpython patch that doesn't add the output format control. >> >> >> I've attached the results of that. Unfortunately the speed improvement >> is only 27% at this point (with 999 tuples). Presumably that's >> because it's constructing a brand new dictionary from scratch for each >> tuple. Taking a look at this now. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pgbench - allow to store select results into variables
>> Please find attached a v8 which hopefully fixes these two issues. > Looks good to me, marking as ready for committer. I have looked into this a little bit. It seems the new feature \gset doesn't work with tables having none ascii column names: $ src/bin/pgbench/pgbench -t 1 -f /tmp/f test starting vacuum...end. gset: invalid variable name: "数字" client 0 file 0 command 0 compound 0: error storing into var 数字 transaction type: /tmp/f scaling factor: 1 query mode: simple number of clients: 1 number of threads: 1 number of transactions per client: 1 number of transactions actually processed: 0/1 This is because pgbench variable names are limited to ascii ranges. IMO the limitation is unnecessary and should be removed. (I know that the limitation was brought in by someone long time ago and the patch author is not responsible for that). Anyway, now that the feature reveals the undocumented limitation, we should document the limitation of \gset at least. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
> Hm. I started to edit it, but I'm halfway coming back to my previous > view that this isn't necessarily ready. > > If a client were to to prepare a large number of prepared statements > (i.e. a lot of parse messages), this'll only start the timeout once, at > the first statement sent. It's not an issue for libpq which sends a > sync message after each PQprepare, nor does it look to be an issue for > pgjdbc based on a quick look. > > Does anybody think there might be a driver out there that sends a bunch > of 'parse' messages, without syncs? What's your point of the question? What kind of problem do you expect if the timeout starts only once at the first parse meesage out of bunch of parse messages? Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Outdated comments around HandleFunctionRequest
Hi, PostgresMain() has the following blurb for fastpath functions: /* * Note: we may at this point be inside an aborted * transaction. We can't throw error for that until we've * finished reading the function-call message, so * HandleFunctionRequest() must check for it after doing so. * Be careful not to do anything that assumes we're inside a * valid transaction here. */ and in HandleFunctionRequest() there's: * INPUT: * In protocol version 3, postgres.c has already read the message body * and will pass it in msgBuf. * In old protocol, the passed msgBuf is empty and we must read the * message here. which is not true anymore. Followed by: /* * Now that we've eaten the input message, check to see if we actually * want to do the function call or not. It's now safe to ereport(); we * won't lose sync with the frontend. */ which is also not really meaningful, because there's no previous code in the function. This largely seems to be damage from commit 2b3a8b20c2da9f39ffecae25ab7c66974fbc0d3b Author: Heikki LinnakangasDate: 2015-02-02 17:08:45 +0200 Be more careful to not lose sync in the FE/BE protocol. Heikki? - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
From: pgsql-hackers-ow...@postgresql.org > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Tatsuo Ishii > Hmm. IMO, that could happen even with the current statement timeout > implementation as well. > > Or we could start/stop the timeout in exec_execute_message() only. This > could avoid the problem above. Also this is more consistent with > log_duration/log_min_duration_statement behavior than now. I think it's better to include Parse and Bind as now. Parse or Bind could take long time when the table has many partitions, the query is complex and/or very long, some DDL statement is running against a target table, or the system load is high. Firing statement timeout during or after many Parses is not a problem, because the first Parse started running some statement and it's not finished. Plus, Andres confirmed that major client drivers don't use such a pattern. There may be an odd behavior purely from the perspective of E.Q.P, that's a compromise, which Andres meant by "not perfect but." Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting huge pages on Windows
From: Craig Ringer [mailto:craig.rin...@2ndquadrant.com] > TBH, anyone who cares about security and runs Win7 or Win2k8 or newer should > be using virtual service accounts and managed service accounts. > > https://technet.microsoft.com/en-us/library/dd548356 > > > Those are more like Unix service accounts. Notably they don't need a password, > getting rid of some of the management pain that led us to abandon the > 'postgres' system user on Windows. > > Now that older platforms are EoL and even the oldest that added this feature > are also near EoL or in extended maintenance, I think installers should > switch to these by default instead of using NETWORK SERVICE. > > Then the issue of priv dropping would be a lesser concern anyway. Good point! And I said earlier in this thread, I think managing privileges (adding/revoking privileges from the user account) is the DBA's or sysadmin's duty, and PG's removing all privileges feels overkill. OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock Pages in Memory, using the attached pg_ctl.c. Please see EnableLockPagesPrivilege() and its call site. But pg_ctl -w start fails emitting the following message: error code 1300 waiting for server to startFATAL: could not enable "Lock pages in memory" privilege HINT: Assign "Lock pages in memory" privilege to the Windows user account which runs PostgreSQL. LOG: database system is shut down stopped waiting pg_ctl: could not start server Examine the log output. error code 1300 is ERROR_NOT_ALL_ASSIGNED, which means AdjustTokenPrivileges() cound not enable Lock Pages in Memory privilege. It seems that the privilege cannot be enabled once it was removed with CreateRestrictedToken(DISABLE_MAX_PRIVILEGE). Regards Takayuki Tsunakawa /*- * * pg_ctl --- start/stops/restarts the PostgreSQL server * * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group * * src/bin/pg_ctl/pg_ctl.c * *- */ #ifdef WIN32 /* * Need this to get defines for restricted tokens and jobs. And it * has to be set before any header from the Win32 API is loaded. */ #define _WIN32_WINNT 0x0501 #endif #include "postgres_fe.h" #include "libpq-fe.h" #include "pqexpbuffer.h" #include #include #include #include #include #include #include #include #ifdef HAVE_SYS_RESOURCE_H #include #include #endif #include "getopt_long.h" #include "miscadmin.h" /* PID can be negative for standalone backend */ typedef long pgpid_t; typedef enum { SMART_MODE, FAST_MODE, IMMEDIATE_MODE } ShutdownMode; typedef enum { NO_COMMAND = 0, INIT_COMMAND, START_COMMAND, STOP_COMMAND, RESTART_COMMAND, RELOAD_COMMAND, STATUS_COMMAND, PROMOTE_COMMAND, KILL_COMMAND, REGISTER_COMMAND, UNREGISTER_COMMAND, RUN_AS_SERVICE_COMMAND } CtlCommand; #define DEFAULT_WAIT60 static bool do_wait = false; static bool del_service_rid = false; static bool wait_set = false; static int wait_seconds = DEFAULT_WAIT; static bool wait_seconds_arg = false; static bool silent_mode = false; static ShutdownMode shutdown_mode = FAST_MODE; static int sig = SIGINT; /* default */ static CtlCommand ctl_command = NO_COMMAND; static char *pg_data = NULL; static char *pg_config = NULL; static char *pgdata_opt = NULL; static char *post_opts = NULL; static const char *progname; static char *log_file = NULL; static char *exec_path = NULL; static char *event_source = NULL; static char *register_servicename = "PostgreSQL"; /* FIXME: + version ID? */ static char *register_username = NULL; static char *register_password = NULL; static char *argv0 = NULL; static bool allow_core_files = false; static time_t start_time; static char postopts_file[MAXPGPATH]; static char version_file[MAXPGPATH]; static char pid_file[MAXPGPATH]; static char backup_file[MAXPGPATH]; static char recovery_file[MAXPGPATH]; static char promote_file[MAXPGPATH]; #ifdef WIN32 static DWORD pgctl_start_type = SERVICE_AUTO_START; static SERVICE_STATUS status; static SERVICE_STATUS_HANDLE hStatus = (SERVICE_STATUS_HANDLE) 0; static HANDLE shutdownHandles[2]; static pid_t postmasterPID = -1; #define shutdownEvent shutdownHandles[0] #define postmasterProcess shutdownHandles[1] #endif static void write_stderr(const char *fmt,...) pg_attribute_printf(1, 2); static void do_advice(void); static void do_help(void); static void set_mode(char *modeopt); static void set_sig(char *signame); static void do_init(void); static void do_start(void); static void do_stop(void); static void do_restart(void); static void do_reload(void); static void do_status(void); static void do_promote(void); static void do_kill(pgpid_t pid); static void print_msg(const char *msg); static void
Re: [HACKERS] Statement timeout behavior in extended queries
From: Andres Freund [mailto:and...@anarazel.de] > Looks to me like npgsql doesn't do that either. None of libpq, pgjdbs and > npgsql doing it seems like some evidence that it's ok. And psqlODBC now uses always libpq. Now time for final decision? Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] partitioned tables and contrib/sepgsql
On Tue, Apr 4, 2017 at 6:56 PM, Joe Conwaywrote: > On 04/04/2017 10:02 AM, Joe Conway wrote: >> On 04/04/2017 09:55 AM, Mike Palmiotto wrote: >>> After some discussion off-list, I've rebased and udpated the patches. >>> Please see attached for further review. >> >> Thanks -- will have another look and test on a machine with selinux >> setup. Robert, did you want me to take responsibility to commit on this >> or just provide review/feedback? > > I did some editorializing on these. > > In particular I did not like the approach to fixing "warning: ‘tclass’ > may be used uninitialized" and ended up just doing it the same as was > done elsewhere in relation.c already (set tclass = 0 in the variable > declaration). Along the way I also changed one instance of tclass from > uint16 to uint16_t for the sake of consistency. > > Interestingly we figured out that the warning was present with -Og, but > not present with -O0, -O2, or -O3. > > If you want to test, apply 0001a and 0001b before 0002. > > Any objections? I'm guessing Tom's going to have a strong feeling about whether 0001a is the right way to address the stdbool issue, but I don't personally know what the right thing to do is so I will avoid opining on that topic. So, nope, no objections here to you committing those. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
> On 2017-04-04 16:56:26 -0700, 'Andres Freund' wrote: >> On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote: >> > From: Andres Freund [mailto:and...@anarazel.de] >> > > Looks to me like npgsql doesn't do that either. None of libpq, pgjdbs >> > > and >> > > npgsql doing it seems like some evidence that it's ok. >> > >> > And psqlODBC now uses always libpq. >> > >> > Now time for final decision? >> >> I'll send an updated patch in a bit, and then will wait till tomorrow to >> push. Giving others a chance to chime in seems fair. > > Attached. I did not like that the previous patch had the timeout > handling duplicated in the individual functions, I instead centralized > it into start_xact_command(). Given that it already activated the > timeout in the most common cases, that seems to make more sense to > me. In your version we'd have called enable_statement_timeout() twice > consecutively (which behaviourally is harmless). > > What do you think? I've not really tested this with the extended > protocol, so I'd appreciate if you could rerun your test from the > older thread. Ok, I will look into your patch and test it out using pgproto. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] partitioned tables and contrib/sepgsql
On 04/04/2017 10:02 AM, Joe Conway wrote: > On 04/04/2017 09:55 AM, Mike Palmiotto wrote: >> After some discussion off-list, I've rebased and udpated the patches. >> Please see attached for further review. > > Thanks -- will have another look and test on a machine with selinux > setup. Robert, did you want me to take responsibility to commit on this > or just provide review/feedback? I did some editorializing on these. In particular I did not like the approach to fixing "warning: ‘tclass’ may be used uninitialized" and ended up just doing it the same as was done elsewhere in relation.c already (set tclass = 0 in the variable declaration). Along the way I also changed one instance of tclass from uint16 to uint16_t for the sake of consistency. Interestingly we figured out that the warning was present with -Og, but not present with -O0, -O2, or -O3. If you want to test, apply 0001a and 0001b before 0002. Any objections? Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c index 1a8f884..320c098 100644 *** a/contrib/sepgsql/label.c --- b/contrib/sepgsql/label.c *** *** 10,15 --- 10,25 */ #include "postgres.h" + #include + /* + * selinux/label.h includes stdbool.h, which redefines bool, so + * revert to the postgres definition of bool from c.h + */ + #ifdef bool + #undef bool + typedef char bool; + #endif + #include "access/heapam.h" #include "access/htup_details.h" #include "access/genam.h" *** *** 37,44 #include "sepgsql.h" - #include - /* * Saved hook entries (if stacked) */ --- 47,52 diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c index ab98a9b..2ea6bfb 100644 *** a/contrib/sepgsql/relation.c --- b/contrib/sepgsql/relation.c *** sepgsql_relation_post_create(Oid relOid) *** 243,249 HeapTuple tuple; Form_pg_class classForm; ObjectAddress object; ! uint16 tclass; char *scontext; /* subject */ char *tcontext; /* schema */ char *rcontext; /* relation */ --- 243,249 HeapTuple tuple; Form_pg_class classForm; ObjectAddress object; ! uint16_t tclass; char *scontext; /* subject */ char *tcontext; /* schema */ char *rcontext; /* relation */ *** sepgsql_relation_drop(Oid relOid) *** 413,419 { ObjectAddress object; char *audit_name; ! uint16_t tclass; char relkind; relkind = get_rel_relkind(relOid); --- 413,419 { ObjectAddress object; char *audit_name; ! uint16_t tclass = 0; char relkind; relkind = get_rel_relkind(relOid); diff --git a/contrib/sepgsql/label.c b/contrib/sepgsql/label.c index 1a8f884..4dda82a 100644 *** a/contrib/sepgsql/label.c --- b/contrib/sepgsql/label.c *** exec_object_restorecon(struct selabel_ha *** 779,785 case RelationRelationId: relForm = (Form_pg_class) GETSTRUCT(tuple); ! if (relForm->relkind == RELKIND_RELATION) objtype = SELABEL_DB_TABLE; else if (relForm->relkind == RELKIND_SEQUENCE) objtype = SELABEL_DB_SEQUENCE; --- 779,786 case RelationRelationId: relForm = (Form_pg_class) GETSTRUCT(tuple); ! if (relForm->relkind == RELKIND_RELATION || ! relForm->relkind == RELKIND_PARTITIONED_TABLE) objtype = SELABEL_DB_TABLE; else if (relForm->relkind == RELKIND_SEQUENCE) objtype = SELABEL_DB_SEQUENCE; diff --git a/contrib/sepgsql/relation.c b/contrib/sepgsql/relation.c index ab98a9b..f8689c0 100644 *** a/contrib/sepgsql/relation.c --- b/contrib/sepgsql/relation.c *** sepgsql_attribute_post_create(Oid relOid *** 54,65 ObjectAddress object; Form_pg_attribute attForm; StringInfoData audit_name; /* ! * Only attributes within regular relation have individual security ! * labels. */ ! if (get_rel_relkind(relOid) != RELKIND_RELATION) return; /* --- 54,66 ObjectAddress object; Form_pg_attribute attForm; StringInfoData audit_name; + char relkind = get_rel_relkind(relOid); /* ! * Only attributes within regular relations or partition relations have ! * individual security labels. */ ! if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) return; /* *** sepgsql_attribute_drop(Oid relOid, AttrN *** 135,142 { ObjectAddress object; char *audit_name; ! if (get_rel_relkind(relOid) != RELKIND_RELATION) return; /* --- 136,144 { ObjectAddress object; char *audit_name; + char relkind = get_rel_relkind(relOid); ! if (relkind != RELKIND_RELATION && relkind != RELKIND_PARTITIONED_TABLE) return; /* *** sepgsql_attribute_relabel(Oid relOid, At *** 167,174 { ObjectAddress object; char *audit_name; ! if
Re: [HACKERS] Statement timeout behavior in extended queries
On 2017-04-05 10:05:19 +0900, Tatsuo Ishii wrote: > > Hm. I started to edit it, but I'm halfway coming back to my previous > > view that this isn't necessarily ready. > > > > If a client were to to prepare a large number of prepared statements > > (i.e. a lot of parse messages), this'll only start the timeout once, at > > the first statement sent. It's not an issue for libpq which sends a > > sync message after each PQprepare, nor does it look to be an issue for > > pgjdbc based on a quick look. > > > > Does anybody think there might be a driver out there that sends a bunch > > of 'parse' messages, without syncs? > > What's your point of the question? What kind of problem do you expect > if the timeout starts only once at the first parse meesage out of > bunch of parse messages? It's perfectly valid to send a lot of Parse messages without interspersed Sync or Bind/Execute message. There'll be one timeout covering all of those Parse messages, which can thus lead to a timeout, even though nothing actually takes long individually. Greetings, Andres Freund -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] ANALYZE command progress checker
On Wed, Apr 5, 2017 at 1:49 AM, Robert Haaswrote: > On Tue, Apr 4, 2017 at 4:57 AM, Amit Langote > wrote: >> Hmm, you're right. It could be counted with a separate variable >> initialized to 0 and incremented every time we decide to add a row to the >> final set of sampled rows, although different implementations of >> AcquireSampleRowsFunc have different ways of deciding if a given row will >> be part of the final set of sampled rows. >> >> On the other hand, if we decide to count progress in terms of blocks as >> you suggested afraid, I'm afraid that FDWs won't be able to report the >> progress. > > I think it may be time to push this patch out to v11. It was > submitted one day before the start of the last CommitFest, the design > wasn't really right, and it's not clear even now that we know what the > right design is. And we're pretty much out of time. > +1 We're encountering the design issue and it takes more time to find out right design including FDWs support. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
On Tue, Apr 4, 2017 at 9:30 PM, Stephen Frostwrote: > * Michael Paquier (michael.paqu...@gmail.com) wrote: >> On Tue, Apr 4, 2017 at 7:38 AM, Stephen Frost wrote: >> The patch presented here does lower the coverage we have now. > > I assume (perhaps mistakenly) that this statement was based on an > analysis of before-and-after 'make coverage' runs. Here you are saying > that you're *not* lowering the coverage. My apologies here, when I used the work "coverage" in my previous emails it visibly implied that I meant that I had used the coverage stuff. But I did not because the TAP test proposed does exactly what test.sh is doing: 1) Initialize the old cluster and start it. 2) create a bunch of databases with full range of ascii characters. 3) Run regression tests. 4) Take dump on old cluster. 4) Stop the old cluster. 5) Initialize the new cluster. 6) Run pg_upgrade. 7) Start new cluster. 8) Take dump from it. 9) Run deletion script (Oops forgot this part!) > I understand how the current pg_upgrade tests work. I don't see > off-hand why the TAP tests would reduce the code coverage of pg_upgrade, > but if they do, we should be able to figure out why and correct it. Good news is that this patch at least does not lower the bar. >> So in short I don't think that this lack of infrastructure should be a >> barrier for what is basically a cleanup but... I just work here. > > I didn't mean to imply that this patch needs to address the > cross-version testing challenge, was merely mentioning that there's been > some work in this area already by Andrew and that if you're interested > in working on that problem that you should probably coordinate with him. Sure. > What I do think is a barrier to this patch moving forward is if it > reduces our current code coverage testing (with the same-version > pg_upgrade that's run in the regular regression tests). If it doesn't, > then great, but if it does, then the patch should be updated to fix > that. I did not do a coverage test first, but surely this patch needs numbers, so here you go. Without the patch, here is the coverage of src/bin/pg_upgrade: lines..: 57.7% (1311 of 2273 lines) functions..: 85.3% (87 of 102 functions) And with the patch: lines..: 58.8% (1349 of 2294 lines) functions..: 85.6% (89 of 104 functions) The coverage gets a bit higher as a couple of basic code paths like pg_upgrade --help get looked at. -- Michael pgupgrade-tap-test-v2.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] increasing the default WAL segment size
On 27 March 2017 at 15:36, Beena Emersonwrote: > 02-increase-max-wal-segsize.patch - Increases the wal-segsize and changes > the internal representation of max_wal_size and min_wal_size to mb. Committed first part to allow internal representation change (only). No commitment yet to increasing wal-segsize in the way this patch has it. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
Andres, >> I think the code needs a few clarifying comments around this, but >> otherwise seems good. Not restarting the timeout in those cases >> obviously isn't entirely "perfect"/"correct", but a tradeoff - the >> comments should note that. >> >> Tatsuo-san, do you want to change those, and push? I can otherwise. > > Andres, > > If you don't mind, could you please fix the comments and push it. I have changed the comments as you suggested. If it's ok, I can push the patch myself (today I have time to work on this). Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index a2282058..1fd93359 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -149,6 +149,11 @@ static bool doing_extended_query_message = false; static bool ignore_till_sync = false; /* + * Flag to keep track of whether we have started statement timeout timer. + */ +static bool stmt_timer_started = false; + +/* * If an unnamed prepared statement exists, it's stored here. * We keep it separate from the hashtable kept by commands/prepare.c * in order to reduce overhead for short-lived queries. @@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *pstmts); static void drop_unnamed_stmt(void); static void SigHupHandler(SIGNAL_ARGS); static void log_disconnections(int code, Datum arg); +static void enable_statement_timeout(void); +static void disable_statement_timeout(void); /* @@ -1239,6 +1246,15 @@ exec_parse_message(const char *query_string, /* string to execute */ start_xact_command(); /* + * Set statement timeout running if it's not already set. The timer will + * be reset in a subsequent execute message. Ideally the timer should be + * reset in this function so that each parse/bind/execute message catches + * the timeout, but it needs gettimeofday() call for each, which is too + * expensive. + */ + enable_statement_timeout(); + + /* * Switch to appropriate context for constructing parsetrees. * * We have two strategies depending on whether the prepared statement is @@ -1526,6 +1542,15 @@ exec_bind_message(StringInfo input_message) */ start_xact_command(); + /* + * Set statement timeout running if it's not already set. The timer will + * be reset in a subsequent execute message. Ideally the timer should be + * reset in this function so that each parse/bind/execute message catches + * the timeout, but it needs gettimeofday() call for each, which is too + * expensive. + */ + enable_statement_timeout(); + /* Switch back to message context */ MemoryContextSwitchTo(MessageContext); @@ -1942,6 +1967,11 @@ exec_execute_message(const char *portal_name, long max_rows) start_xact_command(); /* + * Set statement timeout running if it's not already set. + */ + enable_statement_timeout(); + + /* * If we re-issue an Execute protocol request against an existing portal, * then we are only fetching more rows rather than completely re-executing * the query from the start. atStart is never reset for a v3 portal, so we @@ -2014,6 +2044,11 @@ exec_execute_message(const char *portal_name, long max_rows) * those that start or end a transaction block. */ CommandCounterIncrement(); + + /* + * We need to reset statement timeout if already set. + */ + disable_statement_timeout(); } /* Send appropriate CommandComplete to client */ @@ -2443,14 +2478,10 @@ start_xact_command(void) { StartTransactionCommand(); - /* Set statement timeout running, if any */ - /* NB: this mustn't be enabled until we are within an xact */ - if (StatementTimeout > 0) - enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout); - else - disable_timeout(STATEMENT_TIMEOUT, false); - xact_started = true; + + /* Set statement timeout running, if any */ + enable_statement_timeout(); } } @@ -2460,7 +2491,7 @@ finish_xact_command(void) if (xact_started) { /* Cancel any active statement timeout before committing */ - disable_timeout(STATEMENT_TIMEOUT, false); + disable_statement_timeout(); CommitTransactionCommand(); @@ -4511,3 +4542,53 @@ log_disconnections(int code, Datum arg) port->user_name, port->database_name, port->remote_host, port->remote_port[0] ? " port=" : "", port->remote_port))); } + +/* + * Set statement timeout if any. + */ +static void +enable_statement_timeout(void) +{ + if (!stmt_timer_started) + { + if (StatementTimeout > 0) + { + + /* + * Sanity check + */ + if (!xact_started) + { +ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("Transaction must have been already started to set statement timeout"))); + } + + ereport(DEBUG3, + (errmsg_internal("Set statement timeout"))); + + enable_timeout_after(STATEMENT_TIMEOUT,
Re: [HACKERS] delta relations in AFTER triggers
On Mon, Apr 3, 2017 at 7:16 PM, Thomas Munrowrote: > On Tue, Apr 4, 2017 at 3:41 AM, Kevin Grittner wrote: >> On Mon, Apr 3, 2017 at 8:59 AM, Tom Lane wrote: >>> Thomas Munro writes: Or perhaps the code to inject trigger data transition tables into SPI (a near identical code block these three patches) should be somewhere common so that each PLs would only need to call a function. If so, where should that go? >>> >>> spi.c? >> >> Until now, trigger.c didn't know about SPI, and spi.c didn't know >> about triggers. The intersection was left to referencing code, like >> PLs. Is there any other common code among the PLs dealing with this >> intersection? If so, maybe a new triggerspi.c file (or >> spitrigger.c?) would make sense. Possibly it could make sense from >> a code structure PoV even for a single function, but it seems kinda >> iffy for just this function. As far as I can see it comes down to >> adding it to spi.c or creating a new file -- or just duplicating >> these 30-some lines of code to every PL. > > Ok, how about SPI_register_trigger_data(TriggerData *)? Or any name > you prefer... I didn't suggest something as specific as > SPI_register_transition_tables because think it's plausible that > someone might want to implement SQL standard REFERENCING OLD/NEW ROW > AS and make that work in all PLs too one day, and this would be the > place to do that. > > See attached, which add adds the call to all four built-in PLs. Thoughts? Worked on the docs some more and then pushed it. Nice job cutting the number of *.[ch] lines by 30 while adding support for the other three core PLs. :-) -- Kevin Grittner -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
On 2017-04-04 16:56:26 -0700, 'Andres Freund' wrote: > On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote: > > From: Andres Freund [mailto:and...@anarazel.de] > > > Looks to me like npgsql doesn't do that either. None of libpq, pgjdbs and > > > npgsql doing it seems like some evidence that it's ok. > > > > And psqlODBC now uses always libpq. > > > > Now time for final decision? > > I'll send an updated patch in a bit, and then will wait till tomorrow to > push. Giving others a chance to chime in seems fair. Attached. I did not like that the previous patch had the timeout handling duplicated in the individual functions, I instead centralized it into start_xact_command(). Given that it already activated the timeout in the most common cases, that seems to make more sense to me. In your version we'd have called enable_statement_timeout() twice consecutively (which behaviourally is harmless). What do you think? I've not really tested this with the extended protocol, so I'd appreciate if you could rerun your test from the older thread. - Andres >From c47d904725f3ef0272a4540b6b5bcf0be5c1dea6 Mon Sep 17 00:00:00 2001 From: Andres FreundDate: Tue, 4 Apr 2017 18:44:42 -0700 Subject: [PATCH] Rearm statement_timeout after each executed query. Previously statement_timeout, in the extended protocol, affected all messages till a Sync message. For clients that pipeline/batch query execution that's problematic. Instead disable timeout after each Execute message, and enable, if necessary, the timer in start_xact_command(). Author: Tatsuo Ishii, editorialized by Andres Freund Reviewed-By: Takayuki Tsunakawa, Andres Freund Discussion: https://postgr.es/m/20170222.115044.1665674502985097185.t-is...@sraoss.co.jp --- src/backend/tcop/postgres.c | 77 ++--- 1 file changed, 65 insertions(+), 12 deletions(-) diff --git a/src/backend/tcop/postgres.c b/src/backend/tcop/postgres.c index a2282058c0..4ab3bd7f07 100644 --- a/src/backend/tcop/postgres.c +++ b/src/backend/tcop/postgres.c @@ -149,6 +149,11 @@ static bool doing_extended_query_message = false; static bool ignore_till_sync = false; /* + * Flag to keep track of whether statement timeout timer is active. + */ +static bool stmt_timeout_active = false; + +/* * If an unnamed prepared statement exists, it's stored here. * We keep it separate from the hashtable kept by commands/prepare.c * in order to reduce overhead for short-lived queries. @@ -188,6 +193,8 @@ static bool IsTransactionStmtList(List *pstmts); static void drop_unnamed_stmt(void); static void SigHupHandler(SIGNAL_ARGS); static void log_disconnections(int code, Datum arg); +static void enable_statement_timeout(void); +static void disable_statement_timeout(void); /* @@ -1234,7 +1241,8 @@ exec_parse_message(const char *query_string, /* string to execute */ /* * Start up a transaction command so we can run parse analysis etc. (Note * that this will normally change current memory context.) Nothing happens - * if we are already in one. + * if we are already in one. This also arms the statement timeout if + * necessary. */ start_xact_command(); @@ -1522,7 +1530,8 @@ exec_bind_message(StringInfo input_message) /* * Start up a transaction command so we can call functions etc. (Note that * this will normally change current memory context.) Nothing happens if - * we are already in one. + * we are already in one. This also arms the statement timeout if + * necessary. */ start_xact_command(); @@ -2014,6 +2023,9 @@ exec_execute_message(const char *portal_name, long max_rows) * those that start or end a transaction block. */ CommandCounterIncrement(); + + /* full command has been executed, reset timeout */ + disable_statement_timeout(); } /* Send appropriate CommandComplete to client */ @@ -2443,25 +2455,27 @@ start_xact_command(void) { StartTransactionCommand(); - /* Set statement timeout running, if any */ - /* NB: this mustn't be enabled until we are within an xact */ - if (StatementTimeout > 0) - enable_timeout_after(STATEMENT_TIMEOUT, StatementTimeout); - else - disable_timeout(STATEMENT_TIMEOUT, false); - xact_started = true; } + + /* + * Start statement timeout if necessary. Note that this'll intentionally + * not reset the clock on an already started timeout, to avoid the timing + * overhead when start_xact_command() is invoked repeatedly, without an + * interceding finish_xact_command() (e.g. parse/bind/execute). If that's + * not desired, the timeout has to be disabled explicitly. + */ + enable_statement_timeout(); } static void finish_xact_command(void) { + /* cancel active statement timeout after each command */ + disable_statement_timeout(); + if (xact_started) { - /* Cancel any active statement timeout before committing */ -
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Thu, Mar 30, 2017 at 7:55 PM, Robert Haaswrote: > but > try to access the TOAST table would be fatal; that probably would have > deadlock hazards among other problems. Hmm. I think you're right. We could make a copy of the heap tuple, drop the lock and then access TOAST to handle that. Would that work? Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256
On 04/04/2017 01:52 PM, Heikki Linnakangas wrote: On 03/31/2017 10:10 AM, Michael Paquier wrote: On Wed, Mar 8, 2017 at 10:39 PM, Robert Haaswrote: On Tue, Mar 7, 2017 at 10:01 PM, Michael Paquier wrote: I kinda hope Heikki is going to step up to the plate here, because I think he understands most of it already. The second person who knows a good deal about NFKC is Ishii-san. Attached is a rebased patch. Thanks, I'm looking at this now. I will continue tomorrow, but I wanted to report on what I've done so far. Attached is a new patch version, quite heavily modified. Notable changes so far: * Use Unicode codepoints, rather than UTF-8 bytes packed in a 32-bit ints. IMHO this makes the tables easier to read (to a human), and they are also packed slightly more tightly (see next two points), as you can fit more codepoints in a 16-bit integer. * Reorganize the decomposition table. Instead of a separate binary-searched table for each "size", have one table that's ordered by codepoint, which contains a length and offset into another array, where all the decomposed sequences are. This makes the recomposition step somewhat slower, as it now has to grovel through an array that contains all decompositions, rather than just the array that contains decompositions of two characters. But this isn't performance-critical, readability and tight packing of the tables matter more. * In the lookup table, store singleton decompositions that decompose to a single character, and the character fits in a uint16, directly in the main lookup table, instead of storing an index into the other table. This makes the tables somewhat smaller, as there are a lot of such decompositions. * Use uint32 instead of pg_wchar to represent unicode codepoints. pg_wchar suggests something that holds multi-byte characters in the OS locale, used by functions like wcscmp, so avoid that. I added a "typedef uint32 Codepoint" alias, though, to make it more clear which variables hold Unicode codepoints rather than e.g. indexes into arrays. * Unicode consortium publishes a comprehensive normalization test suite, as a text file, NormalizationTest.txt. I wrote a small perl and C program to run all the tests from that test suite, with the normalization function. That uncovered a number of bugs in the recomposition algorithm, which I then fixed: * decompositions that map to ascii characters cannot be excluded. * non-canonical and non-starter decompositions need to be excluded. They are now flagged in the lookup table, so that we only use them during the decomposition phase, not when recomposing. TODOs / Discussion points: * I'm not sure I like the way the code is organized, I feel that we're littering src/common with these. Perhaps we should add a src/common/unicode_normalization directory for all this. * The list of characters excluded from recomposition is currently hard-coded in utf_norm_generate.pl. However, that list is available in machine-readable format, in file CompositionExclusions.txt. Since we're reading most of the data from UnicodeData.txt, would be good to read the exclusion table from a file, too. * SASLPrep specifies normalization form KC, but it also specifies that some characters are mapped to space or nothing. Should do those mappings, too. - Heikki implement-SASLprep-2.patch.gz Description: application/gzip -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
>> What's your point of the question? What kind of problem do you expect >> if the timeout starts only once at the first parse meesage out of >> bunch of parse messages? > > It's perfectly valid to send a lot of Parse messages without > interspersed Sync or Bind/Execute message. There'll be one timeout > covering all of those Parse messages, which can thus lead to a timeout, > even though nothing actually takes long individually. Hmm. IMO, that could happen even with the current statement timeout implementation as well. Or we could start/stop the timeout in exec_execute_message() only. This could avoid the problem above. Also this is more consistent with log_duration/log_min_duration_statement behavior than now. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
On 2017-04-04 23:52:28 +, Tsunakawa, Takayuki wrote: > From: Andres Freund [mailto:and...@anarazel.de] > > Looks to me like npgsql doesn't do that either. None of libpq, pgjdbs and > > npgsql doing it seems like some evidence that it's ok. > > And psqlODBC now uses always libpq. > > Now time for final decision? I'll send an updated patch in a bit, and then will wait till tomorrow to push. Giving others a chance to chime in seems fair. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] delta relations in AFTER triggers
On Wed, Apr 5, 2017 at 11:49 AM, Kevin Grittnerwrote: > Worked on the docs some more and then pushed it. > > Nice job cutting the number of *.[ch] lines by 30 while adding support for > the other three core PLs. :-) Great. Thanks. I wonder if there is some way we can automatically include code fragments in the documentation without keeping them in sync manually. Now it looks like I have no more excuses for putting off reading the papers you've shared[1][2] about incremental matview algorithms! Looking forward to helping out with the next steps in that project if I can. [1] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.40.2254=rep1=pdf [2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.31.3208=rep1=pdf -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Speedup twophase transactions
On Tue, Mar 28, 2017 at 3:10 PM, Michael Paquierwrote: > OK, done. I have just noticed that Simon has marked himself as a > committer of this patch 24 hours ago. For the archive's sake, this has been committed as 728bd991. Thanks Simon! -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Reduce src/test/recovery verbosity
On Wed, Apr 5, 2017 at 12:39 AM, Stephen Frostwrote: > Committed, with those additions. Thanks for the commit. The final result is nice. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Wed, Apr 5, 2017 at 1:43 AM, Andres Freundwrote: > On 2017-04-04 08:01:32 -0400, Robert Haas wrote: > > On Tue, Apr 4, 2017 at 12:47 AM, Andres Freund > wrote: > > > I don't think the parallel seqscan is comparable in complexity with the > > > parallel append case. Each worker there does the same kind of work, > and > > > if one of them is behind, it'll just do less. But correct sizing will > > > be more important with parallel-append, because with non-partial > > > subplans the work is absolutely *not* uniform. > > > > Sure, that's a problem, but I think it's still absolutely necessary to > > ramp up the maximum "effort" (in terms of number of workers) > > logarithmically. If you just do it by costing, the winning number of > > workers will always be the largest number that we think we'll be able > > to put to use - e.g. with 100 branches of relatively equal cost we'll > > pick 100 workers. That's not remotely sane. > > I'm quite unconvinced that just throwing a log() in there is the best > way to combat that. Modeling the issue of starting more workers through > tuple transfer, locking, startup overhead costing seems a better to me. > > If the goal is to compute the results of the query as fast as possible, > and to not use more than max_parallel_per_XXX, and it's actually > beneficial to use more workers, then we should. Because otherwise you > really can't use the resources available. > +1. I had expressed similar opinion earlier, but yours is better articulated. Thanks. -- Best Wishes, Ashutosh Bapat EnterpriseDB Corporation The Postgres Database Company
Re: [HACKERS] Rewriting the test of pg_upgrade as a TAP test
On Tue, Apr 4, 2017 at 10:52 PM, Peter Eisentrautwrote: > On 4/3/17 11:32, Andres Freund wrote: >> That doesn't strike as particularly future proof. We intentionally >> leave objects behind pg_regress runs, but that only works if we actually >> run them... > > I generally agree with the sentiments expressed later in this thread. > But just to clarify what I meant here: We don't need to run a, say, > 1-minute serial test to load a few "left behind" objects for the > pg_upgrade test, if we can load the same set of objects using dedicated > scripting in say 2 seconds. This would make both the pg_upgrade tests > faster and would reduce the hidden dependencies in the main tests about > which kinds of objects need to be left behind. Making the tests run shorter while maintaining the current code coverage is nice. But this makes more complicated the test suite maintenance as this needs either a dedicated regression schedule or an extra test suite where objects are created just for the sake of pg_upgrade. This increases the risks of getting a rotten test suite with the time if patch makers and reviewers are not careful. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] tuplesort_gettuple_common() and *should_free argument
On Tue, Apr 4, 2017 at 1:32 PM, Andres Freundwrote: > s/Avoid/Allow to avoid/ WFM. >> + * >> + * Callers cannot rely on memory for tuple in returned slot remaining valid >> + * past any subsequent manipulation of the sorter, such as another fetch of >> + * input from sorter. (The sorter may recycle memory.) >> */ > > It's not just the sorter, right? I'd just rephrase that the caller > cannot rely on tuple to stay valid beyond a single call. It started that way, but changed on the basis of Tom's feedback. I would be equally happy with either wording. >> static bool >> tuplesort_gettuple_common(Tuplesortstate *state, bool forward, >> @@ -2091,12 +2092,15 @@ tuplesort_gettuple_common(Tuplesortstate *state, >> bool forward, >> * NULL value in leading attribute will set abbreviated value to zeroed >> * representation, which caller may rely on in abbreviated inequality check. >> * >> - * The slot receives a copied tuple (sometimes allocated in caller memory >> - * context) that will stay valid regardless of future manipulations of the >> - * tuplesort's state. >> + * If copy is TRUE, the slot receives a copied tuple that will stay valid >> + * regardless of future manipulations of the tuplesort's state. Memory is >> + * owned by the caller. If copy is FALSE, the slot will just receive a >> + * pointer to a tuple held within the tuplesort, which is more efficient, >> + * but only safe for callers that are prepared to have any subsequent >> + * manipulation of the tuplesort's state invalidate slot contents. >> */ > > Hm. Do we need to note anything about how long caller memory needs to > live? I think we'd get into trouble if the caller does a memory context > reset, without also clearing the slot? Since we arrange to have caller explicitly own memory (it's always in their own memory context (current context), which is not the case with other similar routines), it's 100% the caller's problem. As I assume you know, convention in executor around resource management of memory like this is pretty centralized within ExecStoreTuple(), and nearby routines. In general, the executor is more or less used to having this be its problem alone, and is pessimistic about memory lifetime unless otherwise noted. > Other than these minimal adjustments, this looks good to go to me. Cool. I'll try to get out a revision soon, maybe later today, including an updated 0002-* (Valgrind suppression), which I have not forgotten about. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
On 2017-04-05 08:34:43 +0900, Tatsuo Ishii wrote: > Andres, > > >> I think the code needs a few clarifying comments around this, but > >> otherwise seems good. Not restarting the timeout in those cases > >> obviously isn't entirely "perfect"/"correct", but a tradeoff - the > >> comments should note that. > >> > >> Tatsuo-san, do you want to change those, and push? I can otherwise. > > > > Andres, > > > > If you don't mind, could you please fix the comments and push it. > > I have changed the comments as you suggested. If it's ok, I can push > the patch myself (today I have time to work on this). I'm working on the patch, and I've edited it more heavily, so please hold off. Changes: I don't think the debugging statements are a good idea, the !xact_started should be an assert, and disable_timeout should be called from within enable_statement_timeout independent of stmt_timer_started. But more importantly I had just sent a question that I think merits discussion. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
On 2017-04-04 16:38:53 -0700, Andres Freund wrote: > On 2017-04-04 16:10:32 +0900, Tatsuo Ishii wrote: > > >> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute > > >> starts and stops the timer), then it's a concern and the patch should > > >> not be ready for committer. However, the current patch is not like that > > >> -- it seems to do what others in this thread are expecting. > > > > > > Oh, interesting - I kind of took the author's statement as, uh, > > > authoritative ;). A quick look over the patch confirms your > > > understanding. > > > > Yes, Tsunakawa-san is correct. Sorry for confusion. > > > > > I think the code needs a few clarifying comments around this, but > > > otherwise seems good. Not restarting the timeout in those cases > > > obviously isn't entirely "perfect"/"correct", but a tradeoff - the > > > comments should note that. > > > > > > Tatsuo-san, do you want to change those, and push? I can otherwise. > > > > Andres, > > > > If you don't mind, could you please fix the comments and push it. > > Hm. I started to edit it, but I'm halfway coming back to my previous > view that this isn't necessarily ready. > > If a client were to to prepare a large number of prepared statements > (i.e. a lot of parse messages), this'll only start the timeout once, at > the first statement sent. It's not an issue for libpq which sends a > sync message after each PQprepare, nor does it look to be an issue for > pgjdbc based on a quick look. > > Does anybody think there might be a driver out there that sends a bunch > of 'parse' messages, without syncs? Looks to me like npgsql doesn't do that either. None of libpq, pgjdbs and npgsql doing it seems like some evidence that it's ok. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 5 April 2017 at 08:00, Craig Ringerwrote: > Taking a look at this now. Rebased to current master with conflicts and whitespace errors fixed. Review pending. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services From 71b6163734934db3160de81fd064e7d113b9122f Mon Sep 17 00:00:00 2001 From: Jim Nasby Date: Wed, 1 Mar 2017 15:45:51 -0600 Subject: [PATCH 1/2] Add SPI_execute_callback() and callback-based DestReceiver. Instead of placing results in a tuplestore, this method of execution uses the supplied callback when creating the Portal for a query. --- src/backend/executor/spi.c | 79 -- src/backend/tcop/dest.c| 11 +++ src/include/executor/spi.h | 4 +++ src/include/tcop/dest.h| 1 + 4 files changed, 85 insertions(+), 10 deletions(-) diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c index 3a89ccd..a0af31a 100644 --- a/src/backend/executor/spi.c +++ b/src/backend/executor/spi.c @@ -55,7 +55,8 @@ static void _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan); static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, Snapshot snapshot, Snapshot crosscheck_snapshot, - bool read_only, bool fire_triggers, uint64 tcount); + bool read_only, bool fire_triggers, uint64 tcount, + DestReceiver *callback); static ParamListInfo _SPI_convert_params(int nargs, Oid *argtypes, Datum *Values, const char *Nulls); @@ -321,7 +322,34 @@ SPI_execute(const char *src, bool read_only, long tcount) res = _SPI_execute_plan(, NULL, InvalidSnapshot, InvalidSnapshot, - read_only, true, tcount); + read_only, true, tcount, NULL); + + _SPI_end_call(true); + return res; +} +int +SPI_execute_callback(const char *src, bool read_only, long tcount, + DestReceiver *callback) +{ + _SPI_plan plan; + int res; + + if (src == NULL || tcount < 0) + return SPI_ERROR_ARGUMENT; + + res = _SPI_begin_call(true); + if (res < 0) + return res; + + memset(, 0, sizeof(_SPI_plan)); + plan.magic = _SPI_PLAN_MAGIC; + plan.cursor_options = 0; + + _SPI_prepare_oneshot_plan(src, ); + + res = _SPI_execute_plan(, NULL, + InvalidSnapshot, InvalidSnapshot, + read_only, true, tcount, callback); _SPI_end_call(true); return res; @@ -355,7 +383,34 @@ SPI_execute_plan(SPIPlanPtr plan, Datum *Values, const char *Nulls, _SPI_convert_params(plan->nargs, plan->argtypes, Values, Nulls), InvalidSnapshot, InvalidSnapshot, - read_only, true, tcount); + read_only, true, tcount, NULL); + + _SPI_end_call(true); + return res; +} + +/* Execute a previously prepared plan with a callback Destination */ +int +SPI_execute_plan_callback(SPIPlanPtr plan, Datum *Values, const char *Nulls, + bool read_only, long tcount, DestReceiver *callback) +{ + int res; + + if (plan == NULL || plan->magic != _SPI_PLAN_MAGIC || tcount < 0) + return SPI_ERROR_ARGUMENT; + + if (plan->nargs > 0 && Values == NULL) + return SPI_ERROR_PARAM; + + res = _SPI_begin_call(true); + if (res < 0) + return res; + + res = _SPI_execute_plan(plan, + _SPI_convert_params(plan->nargs, plan->argtypes, +Values, Nulls), + InvalidSnapshot, InvalidSnapshot, + read_only, true, tcount, callback); _SPI_end_call(true); return res; @@ -384,7 +439,7 @@ SPI_execute_plan_with_paramlist(SPIPlanPtr plan, ParamListInfo params, res = _SPI_execute_plan(plan, params, InvalidSnapshot, InvalidSnapshot, - read_only, true, tcount); + read_only, true, tcount, NULL); _SPI_end_call(true); return res; @@ -425,7 +480,7 @@ SPI_execute_snapshot(SPIPlanPtr plan, _SPI_convert_params(plan->nargs, plan->argtypes, Values, Nulls), snapshot, crosscheck_snapshot, - read_only, fire_triggers, tcount); + read_only, fire_triggers, tcount, NULL); _SPI_end_call(true); return res; @@ -472,7 +527,7 @@ SPI_execute_with_args(const char *src, res = _SPI_execute_plan(, paramLI, InvalidSnapshot, InvalidSnapshot, - read_only, true, tcount); + read_only, true, tcount, NULL); _SPI_end_call(true); return res; @@ -1907,7 +1962,8 @@ _SPI_prepare_oneshot_plan(const char *src, SPIPlanPtr plan) static int _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, Snapshot snapshot, Snapshot crosscheck_snapshot, - bool read_only, bool fire_triggers, uint64 tcount) + bool read_only, bool fire_triggers, uint64 tcount, + DestReceiver *callback) { int my_res = 0; uint64 my_processed = 0; @@ -1918,6 +1974,7 @@ _SPI_execute_plan(SPIPlanPtr plan, ParamListInfo paramLI, ErrorContextCallback spierrcontext; CachedPlan *cplan = NULL; ListCell *lc1; + DestReceiver *dest = callback; /* * Setup error traceback support for ereport() @@ -2037,7 +2094,6 @@
Re: [HACKERS] Faster methods for getting SPI results (460% improvement)
On 5 April 2017 at 08:23, Craig Ringerwrote: > On 5 April 2017 at 08:00, Craig Ringer wrote: > >> Taking a look at this now. > > Rebased to current master with conflicts and whitespace errors fixed. > Review pending. This patch fails to update the documentation at all. https://www.postgresql.org/docs/devel/static/spi.html The patch crashes in initdb with --enable-cassert builds: performing post-bootstrap initialization ... TRAP: FailedAssertion("!(myState->pub.mydest == DestSQLFunction)", File: "functions.c", Line: 800) sh: line 1: 30777 Aborted (core dumped) "/home/craig/projects/2Q/postgres/tmp_install/home/craig/pg/10/bin/postgres" --single -F -O -j -c search_path=pg_catalog -c exit_on_error=true template1 > /dev/null child process exited with exit code 134 Backtrace attached. Details on patch 1: missing newline +} +int +SPI_execute_callback(const char *src, bool read_only, long tcount, +/* Execute a previously prepared plan with a callback Destination */ caps "Destination" +// XXX throw error if callback is set ^^ +static DestReceiver spi_callbackDR = { +donothingReceive, donothingStartup, donothingCleanup, donothingCleanup, +DestSPICallback +}; Is the callback destreceiver supposed to be a blackhole? Why? Its name, spi_callbackDR and DestSPICallback, doesn't seem to indicate that it drops its input. Presumably that's a default destination you're then supposed to modify with your own callbacks? There aren't any comments to indicate what's going on here. Comments on patch 2: If this is the "bare minimum" patch, what is pending? Does it impose any downsides or limits? +/* Get switch execution contexts */ +extern PLyExecutionContext *PLy_switch_execution_context(PLyExecutionContext *new); Comment makes no sense to me. This seems something like memory context switch, where you supply the new and return the old. But the comment doesn't explain it at all. +void PLy_CSStartup(DestReceiver *self, int operation, TupleDesc typeinfo); +void PLy_CSDestroy(DestReceiver *self); These are declared in the plpy_spi.c. Why aren't these static? +volatile MemoryContext oldcontext; +volatile ResourceOwner oldowner; int rv; -volatile MemoryContext oldcontext; -volatile ResourceOwner oldowner; Unnecessary code movement. In PLy_Callback_New, I think your use of a sub-context is sensible. Is it necessary to palloc0 though? +static CallbackState * +PLy_Callback_Free(CallbackState *callback) The code here looks like it could be part of spi.c's callback support, rather than plpy_spi specific, since you provide a destroy callback in the SPI callback struct. +/* We need to store this because the TupleDesc the receive function gets has no names. */ +myState->desc = typeinfo; Is it safe to just store the pointer to the TupleDesc here? What's the lifetime? + * will clean it up when the time is right. XXX This might result in a leak + * if an error happens and the result doesn't get dereferenced. + */ +MemoryContextSwitchTo(TopMemoryContext); +result->tupdesc = CreateTupleDescCopy(typeinfo); especially given this XXX comment... Patch needs bug fix, docs updates, fixes for issues marked in comments. But overall approach looks sensible enough. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services #0 0x7f9d5d64ea28 in raise () from /lib64/libc.so.6 No symbol table info available. #1 0x7f9d5d65062a in abort () from /lib64/libc.so.6 No symbol table info available. #2 0x0083a521 in ExceptionalCondition (conditionName=conditionName@entry=0x9b68d0 "!(myState->pub.mydest == DestSQLFunction)", errorType=errorType@entry=0x882d29 "FailedAssertion", fileName=fileName@entry=0x9b6920 "functions.c", lineNumber=lineNumber@entry=800) at assert.c:54 No locals. #3 0x006166e8 in postquel_start (fcache=0x27772c0, es=0x284eb58) at functions.c:800 myState = 0x284eeb8 dest = 0x284eeb8 #4 fmgr_sql (fcinfo=0x27a7a28) at functions.c:1149 fcache = 0x27772c0 sqlerrcontext = {previous = 0x0, callback = 0x614910 , arg = 0x27a79d0} randomAccess = 0 '\000' lazyEvalOK = is_first = pushed_snapshot = 0 '\000' es = 0x284eb58 slot = result = eslist = eslc = 0x284eb90 __func__ = "fmgr_sql" #5 0x0060810b in ExecInterpExpr (state=0x27a7528, econtext=0x27a4ce0, isnull=) at execExprInterp.c:670 fcinfo = 0x27a7a28 argnull = 0x27a7d68 "" argno = op = 0x27a76f8 resultslot = 0x27a7130 innerslot = outerslot = 0x27a5930 scanslot = 0x0 dispatch_table = {0x608960 , 0x607e00 , 0x607e20 , 0x607e38 , 0x607c90
Re: [HACKERS] Patch: Write Amplification Reduction Method (WARM)
On Wed, Apr 5, 2017 at 8:42 AM, Robert Haaswrote: > On Tue, Apr 4, 2017 at 10:21 PM, Pavan Deolasee > wrote: > > On Thu, Mar 30, 2017 at 7:55 PM, Robert Haas > wrote: > >> but > >> try to access the TOAST table would be fatal; that probably would have > >> deadlock hazards among other problems. > > > > Hmm. I think you're right. We could make a copy of the heap tuple, drop > the > > lock and then access TOAST to handle that. Would that work? > > Yeah, but it might suck. :-) Well, better than causing a deadlock ;-) Lets see if we want to go down the path of blocking WARM when tuples have toasted attributes. I submitted a patch yesterday, but having slept over it, I think I made mistakes there. It might not be enough to look at the caller supplied new tuple because that may not have any toasted values, but the final tuple that gets written to the heap may be toasted. We could look at the new tuple's attributes to find if any indexed attributes are toasted, but that might suck as well. Or we can simply block WARM if the old or the new tuple has external attributes i.e. HeapTupleHasExternal() returns true. That could be overly restrictive because irrespective of whether the indexed attributes are toasted or just some other attribute is toasted, we will block WARM on such updates. May be that's not a problem. We will also need to handle the case where some older tuple in the chain has toasted value and that tuple is presented to recheck (I think we can handle that case fairly easily, but its not done in the code yet) because of a subsequent WARM update and the tuples updated by WARM did not have any toasted values (and hence allowed). Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] BRIN cost estimate
On 3 April 2017 at 03:05, Emre Hasegeliwrote: > Unfortunately, I am on vacation for two weeks without my computer. I can > post another version after 18th. I know we are under time pressure for > release. I wouldn't mind if you or Alvaro would change it anyway you like. I've made some changes. Actually, I completely changed how the estimates work. I find this method more self-explanatory. Basically, we work out the total index ranges, then work out how many of those we'd touch in a perfectly correlated scenario. We then work out how many ranges we'll probably visit based on the correlation estimates from the stats, and assume the selectivity is probableRanges / totalIndexRanges. I've attached a spreadsheet that compares Emre's method to mine. Mine seems to favour the BRIN index less when the table is small. I think this is pretty natural since if there is only 1 range, and we narrow the result to one of them, then we might as well have performed a seqscan. My method seems favour BRIN a bit longer when the correlation is between about 1% and 100%. But penalises BRIN much more when correlation is less than around 1%. This may be better my way is certainly smarter than the unpatched version, but still holds on a bit longer, which may be more favourable if a BRIN index actually exists. It might be more annoying for a user to have added a BRIN index and never have the planner choose it. My method also never suffers from estimating > 100% of the table. I was a bit worried that Emre's method would penalise BRIN too much when the correlation is not so high. Interested to hear comments on this. Please feel free to play with the spreadsheet by changing rows 1-3 in column B. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services brin-correlation-drowley_v1.patch Description: Binary data BRIN_estimates2.ods Description: application/vnd.oasis.opendocument.spreadsheet -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compiler warning in costsize.c
On Wed, Apr 5, 2017 at 2:54 AM, Tom Lanewrote: > Michael Paquier writes: >> In builds where USE_ASSERT_CHECKING is not enabled, costsize.c can >> generate warnings. Here is for example with MSVC: >> src/backend/optimizer/path/costsize.c(4520): warning C4101: 'rte' : >> unreferenced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj] >> src/backend/optimizer/path/costsize.c(4640): warning C4101: 'rte' : >> unreferenced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj] > >> a9c074ba7 has done an effort, but a bit more is needed as the attached. > > That doesn't look right at all: > > +#ifdef USE_ASSERT_CHECKING > RangeTblEntry *rte PG_USED_FOR_ASSERTS_ONLY; > +#endif > > The entire point of the PG_USED_FOR_ASSERTS_ONLY annotation is to suppress > this type of warning without a need for an explicit #ifdef like that one. > > We should either fix PG_USED_FOR_ASSERTS_ONLY to work on MSVC as well, > or decide that we don't care about suppressing such warnings on MSVC, > or give up on PG_USED_FOR_ASSERTS_ONLY and remove it everywhere in > favor of plain #ifdefs. Visual does not have any equivalent of __attribute__((unused))... And visual does not have an equivalent after looking around. A trick that I can think of would be to change PG_USED_FOR_ASSERTS_ONLY to be roughly a macro like that: #if defined(__GNUC__) #define PG_ASSERT_ONLY(x) ASSERT_ONLY_ ## x __attribute__((unused)) #else #define PG_ASSERT_ONLY(x) ((void) x) #endif But that's ugly... > (I'm personally not that much in love with PG_USED_FOR_ASSERTS_ONLY, > because it tends to confuse pgindent.) I would be incline to just do that, any other solution I can think of is uglier than that. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] partitioned tables and contrib/sepgsql
Robert Haaswrites: > On Tue, Apr 4, 2017 at 6:56 PM, Joe Conway wrote: >> Any objections? > I'm guessing Tom's going to have a strong feeling about whether 0001a > is the right way to address the stdbool issue, I will? [ looks ... ] Yup, you're right. I doubt that works at all, TBH. What I'd expect to happen with a typical compiler is a complaint about redefinition of typedef bool, because c.h already declared it and here this fragment is doing so again. It'd make sense to me to do + #ifdef bool + #undef bool + #endif to get rid of the macro definition of bool that stdbool.h is supposed to provide. But there should be no reason to declare our typedef a second time. Another issue is whether you won't get compiler complaints about redefinition of the "true" and "false" macros. But those would likely only be warnings, not flat-out errors. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
>> What do you think? I've not really tested this with the extended protocol, >> so I'd appreciate if you could rerun your test from the older thread. > > The patch looks good and cleaner. It looks like the code works as expected. > As before, I ran one INSERT statement with PgJDBC, with gdb's breakpoints set > on enable_timeout() and disable_timeout(). I confirmed that enable_timeout() > is called just once at Parse message, and disable_timeout() is called just > once at Execute message. > > I'd like to wait for Tatsuo-san's thorough testing with pgproto. I have done tests using pgproto. One thing I noticed a strange behavior. Below is an output of pgproto. The test first set the timeout to 3 seconds, and parse/bind for "SELECT pg_sleep(2)" then set timeout to 1 second using extended query. Subsequent Execute emits a statement timeout error as expected, but next "SELECT pg_sleep(2)" call using extended query does not emit a statement error. The test for this is "007-timeout-twice". Attached is the test cases including this. FE=> Query(query="SET statement_timeout = '3s'") <= BE CommandComplete(SET) <= BE ReadyForQuery(I) FE=> Parse(stmt="S1", query="SELECT pg_sleep(2)") FE=> Bind(stmt="S1", portal="S2") FE=> Parse(stmt="", query="SET statement_timeout = '1s'") FE=> Bind(stmt="", portal="") FE=> Execute(portal="") FE=> Execute(portal="S2") FE=> Sync <= BE ParseComplete <= BE BindComplete <= BE ParseComplete <= BE BindComplete <= BE CommandComplete(SET) <= BE ErrorResponse(S ERROR V ERROR C 57014 M canceling statement due to statement timeout F postgres.c L 2968 R ProcessInterrupts ) <= BE ReadyForQuery(I) FE=> Parse(stmt="S3", query="SELECT pg_sleep(2)") FE=> Bind(stmt="S3", portal="S2") FE=> Execute(portal="S2") FE=> Sync <= BE ParseComplete <= BE BindComplete <= BE DataRow <= BE CommandComplete(SELECT 1) <= BE ReadyForQuery(I) FE=> Terminate tests.tar.gz Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v25)
At Tue, 4 Apr 2017 20:19:39 +0200, Tomas Vondrawrote in <56f40b20-c464-fad2-ff39-06b668fac...@2ndquadrant.com> > On 04/04/2017 09:55 AM, David Rowley wrote: > > On 1 April 2017 at 04:25, David Rowley > > wrote: > >> I've attached an updated patch. > > > > I've made another pass at this and ended up removing the tryextstats > > variable. We now only try to use extended statistics when > > clauselist_selectivity() is given a valid RelOptInfo with rtekind == > > RTE_RELATION, and of course, it must also have some extended stats > > defined too. > > > > I've also cleaned up a few more comments, many of which I managed to > > omit updating when I refactored how the selectivity estimates ties > > into clauselist_selectivity() > > > > I'm quite happy with all of this now, and would also be happy for > > other people to take a look and comment. > > > > As a reviewer, I'd be marking this ready for committer, but I've moved > > a little way from just reviewing this now, having spent two weeks > > hacking at it. > > > > The latest patch is attached. > > > > Thanks David, I agree the reworked patch is much cleaner that the last > version I posted. Thanks for spending your time on it. > > Two minor comments: > > 1) DEPENDENCY_MIN_GROUP_SIZE > > I'm not sure we still need the min_group_size, when evaluating > dependencies. It was meant to deal with 'noisy' data, but I think it > after switching to the 'degree' it might actually be a bad idea. > > Consider this: > > create table t (a int, b int); > insert into t select 1, 1 from generate_series(1, 1) s(i); > insert into t select i, i from generate_series(2, 2) s(i); > create statistics s with (dependencies) on (a,b) from t; > analyze t; > > select stadependencies from pg_statistic_ext ; > stadependencies > > [{1 => 2 : 0.44}, {2 => 1 : 0.44}] > (1 row) > > So the degree of the dependency is just ~0.333 although it's obviously > a perfect dependency, i.e. a knowledge of 'a' determines 'b'. The > reason is that we discard 2/3 of rows, because those groups are only a > single row each, except for the one large group (1/3 of rows). > > Without the mininum group size limitation, the dependencies are: > > test=# select stadependencies from pg_statistic_ext ; > stadependencies > > [{1 => 2 : 1.00}, {2 => 1 : 1.00}] > (1 row) > > which seems way more reasonable, I think. I think the same. Quite large part of functional dependency in reality is in this kind. > 2) A minor detail is that instead of this > > if (estimatedclauses != NULL && > bms_is_member(listidx, estimatedclauses)) > continue; > > perhaps we should do just this: > > if (bms_is_member(listidx, estimatedclauses)) > continue; > > bms_is_member does the same NULL check right at the beginning, so I > don't think this might make a measurable difference. I have some other comments. == - The comment for clauselist_selectivity, | + * When 'rel' is not null and rtekind = RTE_RELATION, we'll try to apply | + * selectivity estimates using any extended statistcs on 'rel'. The 'rel' is actually a parameter but rtekind means rel->rtekind so this might be better be such like the following. | When a relation of RTE_RELATION is given as 'rel', we try | extended statistcs on the relation. Then the following line doesn't seem to be required. | + * If we identify such extended statistics exist, we try to apply them. = The following comment in the same function, | +if (rel && rel->rtekind == RTE_RELATION && rel->statlist != NIL) | +{ | +/* | + * Try to estimate with multivariate functional dependency statistics. | + * | + * The function will supply an estimate for the clauses which it | + * estimated for. Any clauses which were unsuitible were ignored. | + * Clauses which were estimated will have their 0-based list index set | + * in estimatedclauses. We must ignore these clauses when processing | + * the remaining clauses later. | + */ (Notice that I'm not a good writer) This might better be the following. | dependencies_clauselist_selectivity gives selectivity over | caluses that functional dependencies on the given relation is | applicable. 0-based index numbers of consumed clauses are | returned in the bitmap set estimatedclauses so that the | estimation here after can ignore them. = | +s1 *= dependencies_clauselist_selectivity(root, clauses, varRelid, | + jointype, sjinfo, rel, ); The name prefix "dependency_" means "functional_dependency" here and omitting "functional" is confusing to me. On the other hand "functional_dependency" is quite long as prefix. Could we
Re: [HACKERS] identity columns
On 4/3/17, Peter Eisentrautwrote: > On 3/30/17 22:57, Vitaly Burovoy wrote: >> Why do you still want to leave "ADD IF NOT EXISTS" instead of using >> "SET IF NOT EXISTS"? >> If someone wants to follow the standard he can simply not to use "IF >> NOT EXISTS" at all. Without it error should be raised. > > As I tried to mention earlier, it is very difficult to implement the IF > NOT EXISTS behavior here, because we need to run the commands the create > the sequence before we know whether we will need it. I understand. You did not mention the reason. But now you have the "get_attidentity" function to check whether the column is an identity or not. If it is not so create a sequence otherwise skip the creation. I'm afraid after the feature freeze it is impossible to do "major reworking of things", but after the release we have to support the "ALTER COLUMN col ADD" grammar. > -- > Peter Eisentraut http://www.2ndQuadrant.com/ > PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services > The next nitpickings to the last patch. I try to get places with lacking of variables' initialization. All other things seem good for me now. I'll continue to review the patch while you're fixing the current notes. Unfortunately I don't know PG internals so deep to understand correctness of changes in the executor (what Tom and Andres are talking about). 0. There is a little conflict of applying to the current master because of 60a0b2e. 1. First of all, I think the previous usage of the constant "ATTRIBUTE_IDENTITY_NONE" (for setting a value) is better than just '\0'. It is OK for lacking of the constant in comparison. 2. Lack of "SET IDENTITY" and "RESTART" in the "Compatibility" chapter in "alter_table.sgml": "The forms ADD (without USING INDEX), DROP, SET DEFAULT, and SET DATA TYPE (without USING) conform with the SQL standard." Also ADD IDENTITY is an extension (I hope temporary), but it looks like a standard feature (from the above sentance). 3. It would be better if tab-completion has ability to complete the "RESTART" keyword like: ALTER TABLE x1 ALTER COLUMN i RESTART tab-complete.c:8898 - COMPLETE_WITH_LIST5("TYPE", "SET", "RESET", "ADD", "DROP"); + COMPLETE_WITH_LIST5("TYPE", "SET", "RESET", "RESTART", "ADD", "DROP"); 4. The block in "gram.y": /* ALTER TABLE ALTER [COLUMN] DROP IDENTITY */ does not contain "n->missing_ok = false;". I think it should be. 5. In the file "pg_dump.c" in the "getTableAttrs" function at row 7959 the "tbinfo->needs_override" is used (in the "||" operator) but it is initially nowhere set to "false". 6. And finally... a segfault when pre-9.6 databases are pg_dump-ing: SQL query in the file "pg_dump.c" in funcion "getTables" has the "is_identity_sequence" column only in the "if (fout->remoteVersion >= 90600)" block (frow 5536), whereas 9.6 does not support it (but OK, for 9.6 it is always FALSE, no sense to create a new "if" block), but in other blocks no ",FALSE as is_identity_sequence" is added. The same mistake is in the "getTableAttrs" function. Moreover whether "ORDER BY a.attrelid, a.attnum" is really necessary ("else if (fout->remoteVersion >= 90200)" has only "ORDER BY a.attnum")? -- Best regards, Vitaly Burovoy -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting huge pages on Windows
On 5 April 2017 at 10:37, Tsunakawa, Takayukiwrote: > Good point! And I said earlier in this thread, I think managing privileges > (adding/revoking privileges from the user account) is the DBA's or sysadmin's > duty, and PG's removing all privileges feels overkill. I think it's a sensible alternative to refusing to run as a highly privileged role, which is what we used to do IIRC. > OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock > Pages in Memory, using the attached pg_ctl.c. Please see > EnableLockPagesPrivilege() and its call site. But pg_ctl -w start fails > emitting the following message: That won't work. You'd have to pass 0 to the flags of CreateRestrictedToken and instead supply a PrivilegesToDelete array. You'd probably GetTokenInformation and AND with a mask of ones you wanted to retain. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] BRIN cost estimate
> Interested to hear comments on this. I don't have chance to test it right now, but I am sure it would be an improvement over what we have right now. There is no single correct equation with so many unknowns we have. > *indexTotalCost += (numTuples * *indexSelectivity) * (cpu_index_tuple_cost + qual_op_cost); Have you preserved this line intentionally? This would make BRIN index scan cost even higher, but the primary property of BRIN is to be cheap to scan. Shouldn't bitmap heap scan node take into account of checking the tuples in its cost?
Re: [HACKERS] Supporting huge pages on Windows
From: Craig Ringer [mailto:craig.rin...@2ndquadrant.com] > On 5 April 2017 at 10:37, Tsunakawa, Takayuki >wrote: > > OTOH, I tried again to leave the DISABLE_MAX_PRIVILEGE as is and add Lock > Pages in Memory, using the attached pg_ctl.c. Please see > EnableLockPagesPrivilege() and its call site. But pg_ctl -w start fails > emitting the following message: > > That won't work. You'd have to pass 0 to the flags of CreateRestrictedToken > and instead supply a PrivilegesToDelete array. > You'd probably GetTokenInformation and AND with a mask of ones you wanted > to retain. Uh, that's inconvenient. We can't determine what privileges to delete, and we must be aware of new privileges added in the future version of Windows. Then, I have to say the last patch (v12) is the final answer. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256
fore On Wed, Apr 5, 2017 at 7:05 AM, Heikki Linnakangaswrote: > I will continue tomorrow, but I wanted to report on what I've done so far. > Attached is a new patch version, quite heavily modified. Notable changes so > far: Great, thanks! > * Use Unicode codepoints, rather than UTF-8 bytes packed in a 32-bit ints. > IMHO this makes the tables easier to read (to a human), and they are also > packed slightly more tightly (see next two points), as you can fit more > codepoints in a 16-bit integer. Using directly codepoints is not much consistent with the existing backend, but for the sake of packing things more, OK. > * Reorganize the decomposition table. Instead of a separate binary-searched > table for each "size", have one table that's ordered by codepoint, which > contains a length and offset into another array, where all the decomposed > sequences are. This makes the recomposition step somewhat slower, as it now > has to grovel through an array that contains all decompositions, rather than > just the array that contains decompositions of two characters. But this > isn't performance-critical, readability and tight packing of the tables > matter more. Okay, no objection to that. > * In the lookup table, store singleton decompositions that decompose to a > single character, and the character fits in a uint16, directly in the main > lookup table, instead of storing an index into the other table. This makes > the tables somewhat smaller, as there are a lot of such decompositions. Indeed. > * Use uint32 instead of pg_wchar to represent unicode codepoints. pg_wchar > suggests something that holds multi-byte characters in the OS locale, used > by functions like wcscmp, so avoid that. I added a "typedef uint32 > Codepoint" alias, though, to make it more clear which variables hold Unicode > codepoints rather than e.g. indexes into arrays. > > * Unicode consortium publishes a comprehensive normalization test suite, as > a text file, NormalizationTest.txt. I wrote a small perl and C program to > run all the tests from that test suite, with the normalization function. > That uncovered a number of bugs in the recomposition algorithm, which I then > fixed: I was looking for something like that for some time, thanks! That's here actually: http://www.unicode.org/Public/8.0.0/ucd/NormalizationTest.txt > * decompositions that map to ascii characters cannot be excluded. > * non-canonical and non-starter decompositions need to be excluded. They > are now flagged in the lookup table, so that we only use them during the > decomposition phase, not when recomposing. Okay on that. > TODOs / Discussion points: > > * I'm not sure I like the way the code is organized, I feel that we're > littering src/common with these. Perhaps we should add a > src/common/unicode_normalization directory for all this. pg_utf8_islegal() and pg_utf_mblen() should as well be moved in their own file I think, and wchar.c can use that. > * The list of characters excluded from recomposition is currently hard-coded > in utf_norm_generate.pl. However, that list is available in machine-readable > format, in file CompositionExclusions.txt. Since we're reading most of the > data from UnicodeData.txt, would be good to read the exclusion table from a > file, too. Ouch. Those are present here... http://www.unicode.org/reports/tr41/tr41-19.html#Exclusions Definitely it makes more sense to read them from a file. > * SASLPrep specifies normalization form KC, but it also specifies that some > characters are mapped to space or nothing. Should do those mappings, too. Ah, right. Those ones are here: https://tools.ietf.org/html/rfc3454#appendix-B.1 The conversion tables need some extra tweaking... + else if ((*utf & 0xf0) == 0xe0) + { + if (utf[1] == 0 || utf[2] == 0) + goto invalid; + cp = (*utf++) & 0x0F; + cp = (cp << 6) | ((*utf++) & 0x3F); + cp = (cp << 6) | ((*utf++) & 0x3F); + } + else if ((*utf & 0xf8) == 0xf0) + { + if (utf[1] == 0 || utf[2] == 0|| utf[3] == 0) + goto invalid; + cp = (*utf++) & 0x07; + cp = (cp << 6) | ((*utf++) & 0x3F); + cp = (cp << 6) | ((*utf++) & 0x3F); + cp = (cp << 6) | ((*utf++) & 0x3F); + } I find more readable something like pg_utf2wchar_with_len(), but well... Some typos: s/fore/for/ s/reprsented/represented/ You seem to be fully on it... I can help out with any of those items as needed. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
Andres Freundwrites: > On 2017-04-05 10:05:19 +0900, Tatsuo Ishii wrote: >> What's your point of the question? What kind of problem do you expect >> if the timeout starts only once at the first parse meesage out of >> bunch of parse messages? > It's perfectly valid to send a lot of Parse messages without > interspersed Sync or Bind/Execute message. There'll be one timeout > covering all of those Parse messages, which can thus lead to a timeout, > even though nothing actually takes long individually. It might well be reasonable to redefine statement_timeout as limiting the total time from the first client input to the response to Sync ... but if that's what we're doing, let's make sure we do it consistently. I haven't read the patch, but the comments in this thread make me fear that it's introducing some ad-hoc, inconsistent behavior. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] partitioned tables and contrib/sepgsql
Peter Eisentrautwrites: > On 4/5/17 00:58, Tom Lane wrote: >> Another issue is whether you won't get compiler complaints about >> redefinition of the "true" and "false" macros. But those would >> likely only be warnings, not flat-out errors. > The complaint about bool is also just a warning. Really? $ cat test.c typedef char bool; typedef char bool; $ gcc -c test.c test.c:2: error: redefinition of typedef 'bool' test.c:1: note: previous declaration of 'bool' was here This is with gcc 4.4.7. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Adding support for Default partition in partitioning
On 2017/04/05 6:22, Keith Fiske wrote: > On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote: >> Please find attached an updated patch. >> Following has been accomplished in this update: >> >> 1. A new partition can be added after default partition if there are no >> conflicting rows in default partition. >> 2. Solved the crash reported earlier. > > Installed and compiled against commit > 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017 > -0400) without any issue > > However, running your original example, I'm getting this error > > keith@keith=# CREATE TABLE list_partitioned ( > keith(# a int > keith(# ) PARTITION BY LIST (a); > CREATE TABLE > Time: 4.933 ms > keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned FOR > VALUES IN (DEFAULT); > CREATE TABLE > Time: 3.492 ms > keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR VALUES > IN (4,5); > ERROR: unrecognized node type: 216 It seems like the new ExecPrepareCheck should be used in the place of ExecPrepareExpr in the code added in check_new_partition_bound(). > Also, I'm still of the opinion that denying future partitions of values in > the default would be a cause of confusion. In order to move the data out of > the default and into a proper child it would require first removing that > data from the default, storing it elsewhere, creating the child, then > moving it back. If it's only a small amount of data it may not be a big > deal, but if it's a large amount, that could cause quite a lot of > contention if done in a single transaction. Either that or the user would > have to deal with data existing in the table, disappearing, then > reappearing again. > > This also makes it harder to migrate an existing table easily. Essentially > no child tables for a large, existing data set could ever be created > without going through one of the two situations above. I thought of the following possible way to allow future partitions when the default partition exists which might contain rows that belong to the newly created partition (unfortunately, nothing that we could implement at this point for v10): Suppose you want to add a new partition which will accept key=3 rows. 1. If no default partition exists, we're done; no key=3 rows would have been accepted by any of the table's existing partitions, so no need to move any rows 2. Default partition exists which might contain key=3 rows, which we'll need to move. If we do this in the same transaction, as you say, it might result in unnecessary unavailability of table's data. So, it's better to delegate that responsibility to a background process. The current transaction will only add the new key=3 partition, so any key=3 rows will be routed to the new partition from this point on. But we haven't updated the default partition's constraint yet to say that it no longer contains key=3 rows (constraint that the planner consumes), so it will continue to be scanned for queries that request key=3 rows (there should be some metadata to indicate that the default partition's constraint is invalid), along with the new partition. 3. A background process receives a "work item" requesting it to move all key=3 rows from the default partition heap to the new partition's heap. The movement occurs without causing the table to become unavailable; once all rows have been moved, we momentarily lock the table to update the default partition's constraint to mark it valid, so that it will no longer be accessed by queries that want to see key=3 rows. Regarding 2, there is a question of whether it should not be possible for the row movement to occur in the same transaction. Somebody may want that to happen because they chose to run the command during a maintenance window, when the table's becoming unavailable is not an issue. In that case, we need to think of the interface more carefully. Regarding 3, I think the new autovacuum work items infrastructure added by the following commit looks very promising: * BRIN auto-summarization * https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=7526e10224f0792201e99631567bbe44492bbde4 > However, thinking through this, I'm not sure I can see a solution without > the global index support. If this restriction is removed, there's still an > issue of data duplication after the necessary child table is added. So > guess it's a matter of deciding which user experience is better for the > moment? I'm not sure about the fate of this particular patch for v10, but until we implement a solution to move rows and design appropriate interface for the same, we could error out if moving rows is required at all, like the patch does. Could you briefly elaborate why you think the lack global index support would be a problem in this regard? I agree that some design is required here to implement a solution redistribution of rows; not only in the context of supporting the notion of
Re: [HACKERS] Adding support for Default partition in partitioning
On Wed, Apr 5, 2017 at 10:59 AM, Amit Langotewrote: > On 2017/04/05 6:22, Keith Fiske wrote: > > On Tue, Apr 4, 2017 at 9:30 AM, Rahila Syed wrote: > >> Please find attached an updated patch. > >> Following has been accomplished in this update: > >> > >> 1. A new partition can be added after default partition if there are no > >> conflicting rows in default partition. > >> 2. Solved the crash reported earlier. > > > > Installed and compiled against commit > > 60a0b2ec8943451186dfa22907f88334d97cb2e0 (Date: Tue Apr 4 12:36:15 2017 > > -0400) without any issue > > > > However, running your original example, I'm getting this error > > > > keith@keith=# CREATE TABLE list_partitioned ( > > keith(# a int > > keith(# ) PARTITION BY LIST (a); > > CREATE TABLE > > Time: 4.933 ms > > keith@keith=# CREATE TABLE part_default PARTITION OF list_partitioned > FOR > > VALUES IN (DEFAULT); > > CREATE TABLE > > Time: 3.492 ms > > keith@keith=# CREATE TABLE part_1 PARTITION OF list_partitioned FOR > VALUES > > IN (4,5); > > ERROR: unrecognized node type: 216 > > It seems like the new ExecPrepareCheck should be used in the place of > ExecPrepareExpr in the code added in check_new_partition_bound(). > > > Also, I'm still of the opinion that denying future partitions of values > in > > the default would be a cause of confusion. In order to move the data out > of > > the default and into a proper child it would require first removing that > > data from the default, storing it elsewhere, creating the child, then > > moving it back. If it's only a small amount of data it may not be a big > > deal, but if it's a large amount, that could cause quite a lot of > > contention if done in a single transaction. Either that or the user would > > have to deal with data existing in the table, disappearing, then > > reappearing again. > > > > This also makes it harder to migrate an existing table easily. > Essentially > > no child tables for a large, existing data set could ever be created > > without going through one of the two situations above. > > I thought of the following possible way to allow future partitions when > the default partition exists which might contain rows that belong to the > newly created partition (unfortunately, nothing that we could implement at > this point for v10): > > Suppose you want to add a new partition which will accept key=3 rows. > > 1. If no default partition exists, we're done; no key=3 rows would have >been accepted by any of the table's existing partitions, so no need to >move any rows > > 2. Default partition exists which might contain key=3 rows, which we'll >need to move. If we do this in the same transaction, as you say, it >might result in unnecessary unavailability of table's data. So, it's >better to delegate that responsibility to a background process. The >current transaction will only add the new key=3 partition, so any key=3 >rows will be routed to the new partition from this point on. But we >haven't updated the default partition's constraint yet to say that it >no longer contains key=3 rows (constraint that the planner consumes), >so it will continue to be scanned for queries that request key=3 rows >(there should be some metadata to indicate that the default partition's >constraint is invalid), along with the new partition. > > 3. A background process receives a "work item" requesting it to move all >key=3 rows from the default partition heap to the new partition's heap. >The movement occurs without causing the table to become unavailable; >once all rows have been moved, we momentarily lock the table to update >the default partition's constraint to mark it valid, so that it will >no longer be accessed by queries that want to see key=3 rows. > > Regarding 2, there is a question of whether it should not be possible for > the row movement to occur in the same transaction. Somebody may want that > to happen because they chose to run the command during a maintenance > window, when the table's becoming unavailable is not an issue. In that > case, we need to think of the interface more carefully. > > Regarding 3, I think the new autovacuum work items infrastructure added by > the following commit looks very promising: > > * BRIN auto-summarization * > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h= > 7526e10224f0792201e99631567bbe44492bbde4 > > > However, thinking through this, I'm not sure I can see a solution without > > the global index support. If this restriction is removed, there's still > an > > issue of data duplication after the necessary child table is added. So > > guess it's a matter of deciding which user experience is better for the > > moment? > > I'm not sure about the fate of this particular patch for v10, but until we > implement a solution to move rows and design appropriate interface for the > same, we could error out if moving rows is required at
Re: [HACKERS] ANALYZE command progress checker
On Tue, Apr 4, 2017 at 2:12 PM, Amit Langotewrote: > On 2017/03/30 17:39, Masahiko Sawada wrote: >> On Wed, Mar 29, 2017 at 5:38 PM, vinayak wrote: > I have updated the patch. >> >> I reviewed v7 patch. >> >> When I ran ANALYZE command to the table having 5 millions rows with 3 >> child tables, the progress report I got shows the strange result. The >> following result was output after sampled all target rows from child >> tables (i.g, after finished acquire_inherited_sample_rows). I think >> the progress information should report 100% complete at this point. >> > > [...] > >> >> I guess the cause of this is that you always count the number of >> sampled rows in acquire_sample_rows staring from 0 for each child >> table. num_rows_sampled is reset whenever starting collecting sample >> rows. >> Also, even if table has child table, the total number of sampling row >> is not changed. I think that main differences between analyzing on >> normal table and on partitioned table is where we fetch the sample row >> from. So I'm not sure if adding "computing inherited statistics" phase >> is desirable. If you want to report progress of collecting sample rows >> on child table I think that you might want to add the information >> showing which child table is being analyzed. > > Two kinds of statistics are collected if the table is a inheritance parent. > > First kind considers the table by itself and calculates regular > single-table statistics using rows sampled from the only available heap > (by calling do_analyze_rel() with inh=false). > > The second kind are called inheritance statistics, which consider rows > sampled from the whole inheritance hierarchy (by calling do_analyze_rel() > with inh=true). Note that we are still collecting statistics for the > parent table, so we not really "analyzing" child tables; we just collect > sample rows from them to calculate whole-tree statistics for the root > parent table. Agreed. > It might still be possible to show the child table being > sampled though. > >> -- >> pg_stat_progress_analyze.num_target_sample_rows is set while ignoring >> the number of rows the target table has actually. So If the table has >> rows less than target number of rows, the num_rows_sampled never reach >> to num_target_sample_rows. >> >> -- >> @@ -1180,6 +1213,9 @@ acquire_sample_rows(Relation onerel, int elevel, >> } >> >> samplerows += 1; >> + >> + /* Report number of rows sampled so far */ >> + >> pgstat_progress_update_param(PROGRESS_ANALYZE_NUM_ROWS_SAMPLED, >> numrows); >> } >> } >> >> I think that it's wrong to use numrows variable to report the progress >> of the number of rows sampled. As the following comment mentioned, we >> first save row into rows[] and increment numrows until numrows < >> targrows. And then we could replace stored sample row with new sampled >> row. That is, after collecting "numrows" rows, analyze can still take >> a long time to get and replace the sample row. To computing progress >> of collecting sample row, IMO reporting the number of blocks we >> scanned so far is better than number of sample rows. Thought? > > We can report progress in terms of individual blocks only inside > acquire_sample_rows(), which seems undesirable when one thinks that we > will be resetting the target for every child table. We should have a > global target that considers all child tables in the inheritance > hierarchy, which maybe is possible if we count them beforehand in > acquire_inheritance_sample_rows(). But why not use target sample rows, > which remains the same for both when we're collecting sample rows from one > table and from the whole inheritance hierarchy. We can keep the count of > already collected rows in a struct that is used across calls for all the > child tables and increment upward from that count when we start collecting > from a new child table. An another option I came up with is that support new pgstat progress function, say pgstat_progress_incr_param, which increments index'th member in st_progress_param[]. That way we just need to report a delta using that function. > >>> /* >>> * The first targrows sample rows are simply copied into the >>> * reservoir. Then we start replacing tuples in the sample >>> * until we reach the end of the relation. This algorithm is >>> * from Jeff Vitter's paper (see full citation below). It >>> * works by repeatedly computing the number of tuples to skip >>> * before selecting a tuple, which replaces a randomly chosen >>> * element of the reservoir (current set of tuples). At all >>> * times the reservoir is a true random sample of the tuples >>> * we've passed over so far, so when we fall off the end of >>> * the relation we're done. >>> */ > > It seems that we could use samplerows instead
Re: [HACKERS] strange parallel query behavior after OOM crashes
Looking further in this context, number of active parallel workers is: parallel_register_count - parallel_terminate_count Can active workers ever be greater than max_parallel_workers, I think no. Then why should there be greater than check in the following condition: if (parallel && (BackgroundWorkerData->parallel_register_count - BackgroundWorkerData->parallel_terminate_count) >= max_parallel_workers) I feel there should be an assert if (BackgroundWorkerData->parallel_register_count - BackgroundWorkerData->parallel_terminate_count) > max_parallel_workers) And the check could be if (parallel && (active_parallel_workers == max_parallel_workers)) then do not register new parallel wokers and return. There should be no tolerance for the case when active_parallel_workers > max_parallel_workers. After all that is the purpose of max_parallel_workers. Is it like multiple backends trying to register parallel workers at the same time, for which the greater than check should be present? Thoughts? Regards, Neha
Re: [HACKERS] Refactoring identifier checks to consistently use strcmp
> On 04 Apr 2017, at 05:52, Alvaro Herrerawrote: > > Daniel Gustafsson wrote: >> Testing DefElem options is done with both strcmp() and pg_strcasecmp() a bit >> mixed. Since the option defnames are all lowercased, either via IDENT, >> keyword >> rules or “by hand” with makeString(), using strcmp() is safe (and assumed to >> be >> so in quite a lot of places). >> >> While it’s not incorrect per se to use pg_strcasecmp(), it has the potential >> to >> hide a DefElem created with a mixed-case defname where it in other places is >> expected to be in lowercase, which may lead to subtle bugs. >> >> The attached patch refactors to use strcmp() consistently for option >> processing >> in the command code as a pre-emptive belts+suspenders move against such >> subtle >> bugs and to make the code more consistent. Also reorders a few checks to >> have >> all in the same “format” and removes a comment related to the above. >> >> Tested with randomizing case on options in make check (not included in >> patch). > > Does it work correctly in the Turkish locale? Yes, running make check with random case defnames under tr_TR works fine. cheers ./daniel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variable substitution in psql backtick expansion
Hello Pavel, The expression evaluation is interesting question, but there is a workaround - we can use \gset already. Yes, that is a good point. It is a little bit inconvenient because it requires a dummy variable name each time for testing. SELECT whatever AS somename \gset \if :somename But this is an already functional solution to use server-side expressions, so there is no hurry. What is more important, because there is not any workaround, is detection if some variable exists or not. So possibilities 1. \if defined varname Yep, and as Tom pointed it should handle NOT as well. My issue with this version is that Lane Tom convinced me some time ago that client side expressions should look like SQL expressions, so that everything in the script is somehow in the same language. I think that he made a good point. However "defined varname" is definitely not an SQL expression, so I do not find that "intuitive", for a given subjective idea of intuitive. Then there is the question of simple comparisons, which would also make sense client-side, eg simple things like: \if :VERSION_NUM >= 11 Should not need to be executed on the server. 2. \ifdefined or \ifdef varname I think that we want to avoid that if possible, but it is a cpp-like possibility. This approach does not allow to support comparisons. 3. \if :?varname Tom suggested that there is a special namespace problem with this option. I did not understand what is the actual issue. I like first two, and I can live with @3 - although it is not intuitive For me @3 is neither worth nor better than the already existing :'varname' and :"varname" hacks, it is consistent with them, so it is just an extension of the existing approach. It seems easy to implement because the substitution would be handled by the lexer, so there is no need for anything special like looking at the first or second word, rewinding, whatever. Basically I agree with everything Tom suggested (indeed, some client side definition & comparison tests are really needed), but not with the proposed prefix syntax because it does not look clean and SQL. -- Fabien. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] proposal: Introduction a commontype as new polymorphic type
Hi I am still little bit unhappy with missing functionality in our generic types. If I write function fx(anyelement, anyelement) returns anyelement postgres=# create or replace function fx(anyelement, anyelement) returns anyelement as $$ select greather($1,$2) $$ language sql; CREATE FUNCTION postgres=# select fx(1,1.1); ERROR: function fx(integer, numeric) does not exist LINE 1: select fx(1,1.1); ^ It fails on basic example. What do you think about introduction new similar polymorphic type, that will use common type for real parameters? some like create or replace function fx(anyvalue, anyvalue) returns anyvalue create or replace function fx(anyvalue[]) returns anyvalue Using "any" and casting inside function has significant negative impact on performance Regards Pavel
Re: [HACKERS] Statement timeout behavior in extended queries
On 2017-04-04 06:18:04 +, Tsunakawa, Takayuki wrote: > From: Tatsuo Ishii [mailto:is...@sraoss.co.jp] > > It's too late. Someone has already moved the patch to the next CF (for > > PostgreSQL 11). > > Yes, but this patch will be necessary by the final release of PG 10 if the > libpq batch/pipelining is committed in PG 10. > > I marked this as ready for committer in the next CF, so that some committer > can pick up this patch and consider putting it in PG 10. If you decide to > modify the patch, please change the status. Given the concern raised in http://archives.postgresql.org/message-id/12207.1491228316%40sss.pgh.pa.us I don't see this being ready for committer. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
From: Andres Freund [mailto:and...@anarazel.de] Given the concern raised in > http://archives.postgresql.org/message-id/12207.1491228316%40sss.pgh.p > a.us > I don't see this being ready for committer. If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute starts and stops the timer), then it's a concern and the patch should not be ready for committer. However, the current patch is not like that -- it seems to do what others in this thread are expecting. Regards Takayuki Tsunakawa -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
>> If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute >> starts and stops the timer), then it's a concern and the patch should not be >> ready for committer. However, the current patch is not like that -- it >> seems to do what others in this thread are expecting. > > Oh, interesting - I kind of took the author's statement as, uh, > authoritative ;). A quick look over the patch confirms your > understanding. Yes, Tsunakawa-san is correct. Sorry for confusion. > I think the code needs a few clarifying comments around this, but > otherwise seems good. Not restarting the timeout in those cases > obviously isn't entirely "perfect"/"correct", but a tradeoff - the > comments should note that. > > Tatsuo-san, do you want to change those, and push? I can otherwise. Andres, If you don't mind, could you please fix the comments and push it. > Unfortunately I can't move the patch back to the current CF, but I guess > we can just mark it as committed in the next. That will be great. Best regards, -- Tatsuo Ishii SRA OSS, Inc. Japan English: http://www.sraoss.co.jp/index_en.php Japanese:http://www.sraoss.co.jp -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] multivariate statistics (v25)
On 1 April 2017 at 04:25, David Rowleywrote: > I've attached an updated patch. I've made another pass at this and ended up removing the tryextstats variable. We now only try to use extended statistics when clauselist_selectivity() is given a valid RelOptInfo with rtekind == RTE_RELATION, and of course, it must also have some extended stats defined too. I've also cleaned up a few more comments, many of which I managed to omit updating when I refactored how the selectivity estimates ties into clauselist_selectivity() I'm quite happy with all of this now, and would also be happy for other people to take a look and comment. As a reviewer, I'd be marking this ready for committer, but I've moved a little way from just reviewing this now, having spent two weeks hacking at it. The latest patch is attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services mv_functional-deps_2017-04-04.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Statement timeout behavior in extended queries
On 2017-04-04 06:35:00 +, Tsunakawa, Takayuki wrote: > From: Andres Freund [mailto:and...@anarazel.de] > Given the concern raised in > > http://archives.postgresql.org/message-id/12207.1491228316%40sss.pgh.p > > a.us > > I don't see this being ready for committer. > > If what Tatsuo-san said to Tom is correct (i.e. each Parse/Bind/Execute > starts and stops the timer), then it's a concern and the patch should not be > ready for committer. However, the current patch is not like that -- it seems > to do what others in this thread are expecting. Oh, interesting - I kind of took the author's statement as, uh, authoritative ;). A quick look over the patch confirms your understanding. I think the code needs a few clarifying comments around this, but otherwise seems good. Not restarting the timeout in those cases obviously isn't entirely "perfect"/"correct", but a tradeoff - the comments should note that. Tatsuo-san, do you want to change those, and push? I can otherwise. Unfortunately I can't move the patch back to the current CF, but I guess we can just mark it as committed in the next. - Andres -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Some never executed code regarding the table sync worker
On Tue, Apr 4, 2017 at 6:26 PM, Kyotaro HORIGUCHIwrote: > Hi, > > At Sat, 1 Apr 2017 02:35:00 +0900, Masahiko Sawada > wrote in
Re: [HACKERS] Parallel Append implementation
On 4 April 2017 at 01:47, Andres Freundwrote: >> +typedef struct ParallelAppendDescData >> +{ >> + LWLock pa_lock;/* mutual exclusion to choose >> next subplan */ >> + int pa_first_plan; /* plan to choose while >> wrapping around plans */ >> + int pa_next_plan; /* next plan to choose by any >> worker */ >> + >> + /* >> + * pa_finished : workers currently executing the subplan. A worker >> which >> + * finishes a subplan should set pa_finished to true, so that no new >> + * worker picks this subplan. For non-partial subplan, a worker which >> picks >> + * up that subplan should immediately set to true, so as to make sure >> + * there are no more than 1 worker assigned to this subplan. >> + */ >> + boolpa_finished[FLEXIBLE_ARRAY_MEMBER]; >> +} ParallelAppendDescData; > > >> +typedef ParallelAppendDescData *ParallelAppendDesc; > > Pointer hiding typedefs make this Andres sad. Yeah .. was trying to be consistent with other parts of code where we have typedefs for both structure and a pointer to that structure. > > > >> @@ -291,6 +362,276 @@ ExecReScanAppend(AppendState *node) >> if (subnode->chgParam == NULL) >> ExecReScan(subnode); >> } >> + >> + if (padesc) >> + { >> + padesc->pa_first_plan = padesc->pa_next_plan = 0; >> + memset(padesc->pa_finished, 0, sizeof(bool) * node->as_nplans); >> + } >> + > > Is it actually guaranteed that none of the parallel workers are doing > something at that point? ExecReScanAppend() would be called by ExecReScanGather(). ExecReScanGather() shuts down all the parallel workers before calling its child node (i.e. ExecReScanAppend). >> +static bool >> +exec_append_parallel_next(AppendState *state) >> +{ >> + ParallelAppendDesc padesc = state->as_padesc; >> + int whichplan; >> + int initial_plan; >> + int first_partial_plan = ((Append >> *)state->ps.plan)->first_partial_plan; >> + boolfound; >> + >> + Assert(padesc != NULL); >> + >> + /* Backward scan is not supported by parallel-aware plans */ >> + Assert(ScanDirectionIsForward(state->ps.state->es_direction)); >> + >> + /* The parallel leader chooses its next subplan differently */ >> + if (!IsParallelWorker()) >> + return exec_append_leader_next(state); > > It's a bit weird that the leader's case does is so separate, and does > it's own lock acquisition. Since we wanted to prevent it from taking the most expensive non-partial plans first , thought it would be better to keep its logic simple and separate, so could not merge it in the main logic code. > > >> + found = false; >> + for (whichplan = initial_plan; whichplan != PA_INVALID_PLAN;) >> + { >> + /* >> + * Ignore plans that are already done processing. These also >> include >> + * non-partial subplans which have already been taken by a >> worker. >> + */ >> + if (!padesc->pa_finished[whichplan]) >> + { >> + found = true; >> + break; >> + } >> + >> + /* >> + * Note: There is a chance that just after the child plan node >> is >> + * chosen above, some other worker finishes this node and sets >> + * pa_finished to true. In that case, this worker will go >> ahead and >> + * call ExecProcNode(child_node), which will return NULL tuple >> since it >> + * is already finished, and then once again this worker will >> try to >> + * choose next subplan; but this is ok : it's just an extra >> + * "choose_next_subplan" operation. >> + */ > > IIRC not all node types are safe against being executed again when > they've previously returned NULL. That's why e.g. nodeMaterial.c > contains the following blurb: > /* > * If necessary, try to fetch another row from the subplan. > * > * Note: the eof_underlying state variable exists to short-circuit > further > * subplan calls. It's not optional, unfortunately, because some plan > * node types are not robust about being called again when they've > already > * returned NULL. > */ This scenario is different from the parallel append scenario described by my comment. A worker sets pa_finished to true only when it itself gets a NULL tuple for a given subplan. So in exec_append_parallel_next(), suppose a worker W1 finds a subplan with pa_finished=false. So it chooses it. Now a different worker W2 sets this subplan's pa_finished=true because W2 has got a NULL tuple. But W1 hasn't yet got a NULL tuple. If it had got a NULL tuple earlier, it would have itself set pa_finished to true, and then it
Re: [HACKERS] Variable substitution in psql backtick expansion
2017-04-04 9:53 GMT+02:00 Fabien COELHO: > > Hello Pavel, > > The expression evaluation is interesting question, but there is a >> workaround - we can use \gset already. >> > > Yes, that is a good point. It is a little bit inconvenient because it > requires a dummy variable name each time for testing. > > SELECT whatever AS somename \gset > \if :somename > > But this is an already functional solution to use server-side expressions, > so there is no hurry. > > What is more important, because there is not any workaround, is detection >> if some variable exists or not. >> >> So possibilities >> >> 1. \if defined varname >> > > Yep, and as Tom pointed it should handle NOT as well. > > My issue with this version is that Lane Tom convinced me some time ago > that client side expressions should look like SQL expressions, so that > everything in the script is somehow in the same language. I think that he > made a good point. > > However "defined varname" is definitely not an SQL expression, so I do not > find that "intuitive", for a given subjective idea of intuitive. > > Then there is the question of simple comparisons, which would also make > sense client-side, eg simple things like: > > \if :VERSION_NUM >= 11 > > Should not need to be executed on the server. > > 2. \ifdefined or \ifdef varname >> > > I think that we want to avoid that if possible, but it is a cpp-like > possibility. This approach does not allow to support comparisons. > > 3. \if :?varname >> > > Tom suggested that there is a special namespace problem with this option. > I did not understand what is the actual issue. > > I like first two, and I can live with @3 - although it is not intuitive >> > > For me @3 is neither worth nor better than the already existing :'varname' > and :"varname" hacks, it is consistent with them, so it is just an > extension of the existing approach. > > It seems easy to implement because the substitution would be handled by > the lexer, so there is no need for anything special like looking at the > first or second word, rewinding, whatever. > > Basically I agree with everything Tom suggested (indeed, some client side > definition & comparison tests are really needed), but not with the proposed > prefix syntax because it does not look clean and SQL. I don't need a full SQL expression in \if commands ever. I prefer some simple functional language here implemented only on client side - the code from pgbench can be used maybe \if fx( variable | constant [, ... ] ) the buildin functions can be only basic defined, undefined, equal, greater, less \if defined(varname) \if geq(VERSION_NUM, 11000) But this question is important - there is not a workaround postgres=# select :xxx postgres-# ; ERROR: syntax error at or near ":" LINE 1: select :xxx ^ postgres=# \if :xxx unrecognized value ":xxx" for "\if expression": boolean expected postgres@# > > > -- > Fabien. >
Re: [HACKERS] ANALYZE command progress checker
On 2017/04/04 15:30, Masahiko Sawada wrote: >> We can report progress in terms of individual blocks only inside >> acquire_sample_rows(), which seems undesirable when one thinks that we >> will be resetting the target for every child table. We should have a >> global target that considers all child tables in the inheritance >> hierarchy, which maybe is possible if we count them beforehand in >> acquire_inheritance_sample_rows(). But why not use target sample rows, >> which remains the same for both when we're collecting sample rows from one >> table and from the whole inheritance hierarchy. We can keep the count of >> already collected rows in a struct that is used across calls for all the >> child tables and increment upward from that count when we start collecting >> from a new child table. > > An another option I came up with is that support new pgstat progress > function, say pgstat_progress_incr_param, which increments index'th > member in st_progress_param[]. That way we just need to report a delta > using that function. That's an interesting idea. It could be made to work and would not require changing the interface of AcquireSampleRowsFunc, which seems very desirable. /* * The first targrows sample rows are simply copied into the * reservoir. Then we start replacing tuples in the sample * until we reach the end of the relation. This algorithm is * from Jeff Vitter's paper (see full citation below). It * works by repeatedly computing the number of tuples to skip * before selecting a tuple, which replaces a randomly chosen * element of the reservoir (current set of tuples). At all * times the reservoir is a true random sample of the tuples * we've passed over so far, so when we fall off the end of * the relation we're done. */ >> >> It seems that we could use samplerows instead of numrows to count the >> progress (if we choose to count progress in terms of sample rows collected). >> > > I guess it's hard to count progress in terms of sample rows collected > even if we use samplerows instead, because samplerows can be > incremented independently of the target number of sampling rows. The > samplerows can be incremented up to the total number of rows of > relation. Hmm, you're right. It could be counted with a separate variable initialized to 0 and incremented every time we decide to add a row to the final set of sampled rows, although different implementations of AcquireSampleRowsFunc have different ways of deciding if a given row will be part of the final set of sampled rows. On the other hand, if we decide to count progress in terms of blocks as you suggested afraid, I'm afraid that FDWs won't be able to report the progress. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] postgres_fdw bug in 9.6
On 2017/04/04 14:38, Ashutosh Bapat wrote: Probably we should use "could not be created" instead of "was not created" in "... a local path suitable for EPQ checks was not created". Done. "outer_path should not require relations from inner_path" may be reworded as "outer paths should not be parameterized by the inner relations". "neither path should require relations from the other path" may be reworded as "neither path should be parameterized by the the other joining relation". Done. I used "input" instead of "joining" in the latter, though. Why JOIN_RIGHT is being treated differently from JOIN_LEFT? We should be able to create a nested loop join for JOIN_RIGHT? +case JOIN_RIGHT: +case JOIN_FULL: I don't think so, because nestloop joins aren't supported for JOIN_RIGHT. See ExecInitNestLoop(). Hmm, I see in match_unsorted_outer() 1254 case JOIN_RIGHT: 1255 case JOIN_FULL: 1256 nestjoinOK = false; 1257 useallclauses = true; 1258 break; Yeah, I should have pointed that out as well. I rebased the patch also. Please find attached an updated version of the patch. Best regards, Etsuro Fujita *** a/contrib/postgres_fdw/expected/postgres_fdw.out --- b/contrib/postgres_fdw/expected/postgres_fdw.out *** *** 1629,1650 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 !-> Merge Join Output: t1.c1, t1.c3, t1.*, t2.c1, t2.* ! Merge Cond: (t1.c1 = t2.c1) ! -> Sort Output: t1.c1, t1.c3, t1.* !Sort Key: t1.c1 !-> Foreign Scan on public.ft1 t1 ! Output: t1.c1, t1.c3, t1.* ! Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE ! -> Sort Output: t2.c1, t2.* !Sort Key: t2.c1 !-> Foreign Scan on public.ft2 t2 ! Output: t2.c1, t2.* ! Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ! (23 rows) SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1; c1 | c1 --- 1629,1644 Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS LAST FOR UPDATE OF r1 !-> Nested Loop Output: t1.c1, t1.c3, t1.*, t2.c1, t2.* ! Join Filter: (t1.c1 = t2.c1) ! -> Foreign Scan on public.ft1 t1 Output: t1.c1, t1.c3, t1.* !Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" FOR UPDATE ! -> Foreign Scan on public.ft2 t2 Output: t2.c1, t2.* !Remote SQL: SELECT "C 1", c2, c3, c4, c5, c6, c7, c8 FROM "S 1"."T 1" ! (17 rows) SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t1.c1 OFFSET 100 LIMIT 10 FOR UPDATE OF t1; c1 | c1 *** *** 1673,1694 SELECT t1.c1, t2.c1 FROM ft1 t1 JOIN ft2 t2 ON (t1.c1 = t2.c1) ORDER BY t1.c3, t Output: t1.c1, t2.c1, t1.c3, t1.*, t2.* Relations: (public.ft1 t1) INNER JOIN (public.ft2 t2) Remote SQL: SELECT r1."C 1", r1.c3, CASE WHEN (r1.*)::text IS NOT NULL THEN ROW(r1."C 1", r1.c2, r1.c3, r1.c4, r1.c5, r1.c6, r1.c7, r1.c8) END, r2."C 1", CASE WHEN (r2.*)::text IS NOT NULL THEN ROW(r2."C 1", r2.c2, r2.c3, r2.c4, r2.c5, r2.c6, r2.c7, r2.c8) END FROM ("S 1"."T 1" r1 INNER JOIN "S 1"."T 1" r2 ON (((r1."C 1" = r2."C 1" ORDER BY r1.c3 ASC NULLS LAST, r1."C 1" ASC NULLS
Re: [HACKERS] Some never executed code regarding the table sync worker
Hi, At Sat, 1 Apr 2017 02:35:00 +0900, Masahiko Sawadawrote in
Re: [HACKERS] Compiler warning in costsize.c
On 4 April 2017 at 16:22, Michael Paquierwrote: > Hi all, > > In builds where USE_ASSERT_CHECKING is not enabled, costsize.c can > generate warnings. Here is for example with MSVC: > src/backend/optimizer/path/costsize.c(4520): warning C4101: 'rte' : > unreferen > ced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj] > src/backend/optimizer/path/costsize.c(4640): warning C4101: 'rte' : > unreferen > ced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj] > > a9c074ba7 has done an effort, but a bit more is needed as the attached. Thanks for writing the patch. I wondering if it would be worth simplifying it a little to get rid of the double #ifdefs, as per attached. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services costsize-fix-warning_drowley.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: Covering + unique indexes.
01.04.2017 02:31, Peter Geoghegan: * index_truncate_tuple() should have as an argument the number of attributes. No need to "#include utils/rel.h" that way. Will fix. * I think that we should store this (the number of attributes), and use it directly when comparing, per my remarks to Tom over on that other thread. We should also use the free bit within IndexTupleData.t_info, to indicate that the IndexTuple was truncated, just to make it clear to everyone that might care that that's how these truncated IndexTuples need to be represented. Doing this would have no real impact on your patch, because for you this will be 100% redundant. It will help external tools, and perhaps another, more general suffix truncation patch that comes in the future. We should try very hard to have a future-proof on-disk representation. I think that this is quite possible. To be honest, I think that it'll make the patch overcomplified, because this exact patch has nothing to do with suffix truncation. Although, we can add any necessary flags if this work will be continued in the future. * I suggest adding a "can't happen" defensive check + error that checks that the tuple returned by index_truncate_tuple() is sized <= the original. This cannot be allowed to ever happen. (Not that I think it will.) There is already an assertion. Assert(IndexTupleSize(newitup) <= IndexTupleSize(olditup)); Do you think it is not enough? * I see a small bug. You forgot to teach _bt_findsplitloc() about truncation. It does this currently, which you did not update: /* * The first item on the right page becomes the high key of the left page; * therefore it counts against left space as well as right space. */ leftfree -= firstrightitemsz; I think that this accounting needs to be fixed. Could you explain, what's wrong with this accounting? We may expect to take more space on the left page, than will be taken after highkey truncation. But I don't see any problem here. * Note sure about one thing. What's the reason for this change? - /* Log left page */ - if (!isleaf) - { - /* -* We must also log the left page's high key, because the right -* page's leftmost key is suppressed on non-leaf levels. Show it -* as belonging to the left page buffer, so that it is not stored -* if XLogInsert decides it needs a full-page image of the left -* page. -*/ - itemid = PageGetItemId(origpage, P_HIKEY); - item = (IndexTuple) PageGetItem(origpage, itemid); - XLogRegisterBufData(0, (char *) item, MAXALIGN(IndexTupleSize(item))); - } + /* +* We must also log the left page's high key, because the right +* page's leftmost key is suppressed on non-leaf levels. Show it +* as belonging to the left page buffer, so that it is not stored +* if XLogInsert decides it needs a full-page image of the left +* page. +*/ + itemid = PageGetItemId(origpage, P_HIKEY); + item = (IndexTuple) PageGetItem(origpage, itemid); + XLogRegisterBufData(0, (char *) item, MAXALIGN(IndexTupleSize(item))); Is this related to the problem that you mentioned to me that you'd fixed when we spoke in person earlier today? You said something about WAL logging, but I don't recall any details. I don't remember seeing this change in prior versions. Anyway, whatever the reason for doing this on the leaf level now, the comments should be updated to explain it. This change related to the bug described in this message. https://www.postgresql.org/message-id/20170330192706.GA2565%40e733.localdomain After fix it is not reproducible. I will update comments in the next patch. * Speaking of WAL-logging, I think that there is another bug in btree_xlog_split(). You didn't change this existing code at all: /* * On leaf level, the high key of the left page is equal to the first key * on the right page. */ if (isleaf) { ItemId hiItemId = PageGetItemId(rpage, P_FIRSTDATAKEY(ropaque)); left_hikey = PageGetItem(rpage, hiItemId); left_hikeysz = ItemIdGetLength(hiItemId); } It seems like this was missed when you changed WAL-logging, since you do something for this on the logging side, but not here, on the replay side. No? I changed it. Now we always use highkey saved in xlog. This code don't needed anymore and can be deleted. Thank you for the notice. I will send updated patch today. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] logical decoding of two-phase transactions
> On 4 Apr 2017, at 04:23, Masahiko Sawadawrote: > > > I reviewed this patch but when I tried to build contrib/test_decoding > I got the following error. > Thanks! Yes, seems that 18ce3a4a changed ProcessUtility_hook signature. Updated. > There are still some unnecessary code in v5 patch. Actually second diff isn’t intended to be part of the patch, I've just shared the way I ran regression test suite through the 2pc decoding changing all commits to prepare/commits where commits happens only after decoding of prepare is finished (more details in my previous message in this thread). That is just argument against Andres concern that prepared transaction is able to deadlock with decoding process — at least no such cases in regression tests. And that concern is main thing blocking this patch. Except explicit catalog locks in prepared tx nobody yet found such cases and it is hard to address or argue about. Stas Kelvich Postgres Professional: http://www.postgrespro.com The Russian Postgres Company logical_twophase_v6.diff Description: Binary data logical_twophase_regresstest.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Parallel Bitmap Heap Scan - Prefetch pages are not updated properly
While analyzing the coverage for the prefetching part, I found an issue that prefetch_pages were not updated properly while executing in parallel mode. Attached patch fixes the same. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com parallel_bitmap_prefetch_fix.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Sun, Apr 2, 2017 at 4:14 AM, Ashutosh Sharmawrote: > > Please note that these patches needs to be applied on top of [1]. > Few more review comments: 1. - page = BufferGetPage(so->hashso_curbuf); + blkno = so->currPos.currPage; + if (so->hashso_bucket_buf == so->currPos.buf) + { + buf = so->currPos.buf; + LockBuffer(buf, BUFFER_LOCK_SHARE); + page = BufferGetPage(buf); + } Here, you are assuming that only bucket page can be pinned, but I think that assumption is not right. When _hash_kill_items() is called before moving to next page, there could be a pin on the overflow page. You need some logic to check if the buffer is pinned, then just lock it. I think once you do that, it might not be convinient to release the pin at the end of this function. Add some comments on top of _hash_kill_items to explain the new changes or say some thing like "See _bt_killitems for details" 2. + /* + * We save the LSN of the page as we read it, so that we know whether it + * safe to apply LP_DEAD hints to the page later. This allows us to drop + * the pin for MVCC scans, which allows vacuum to avoid blocking. + */ + so->currPos.lsn = PageGetLSN(page); + The second part of above comment doesn't make sense because vacuum's will still be blocked because we hold the pin on primary bucket page. 3. + { + /* + * No more matching tuples were found. return FALSE + * indicating the same. Also, remember the prev and + * next block number so that if fetching tuples using + * cursor we remember the page from where to start the + * scan. + */ + so->currPos.prevPage = (opaque)->hasho_prevblkno; + so->currPos.nextPage = (opaque)->hasho_nextblkno; You can't read opaque without having lock and by this time it has already been released. Also, I think if you want to save position for cursor movement, then you need to save the position of last bucket when scan completes the overflow chain, however as you have written it will be always invalid block number. I think there is similar problem with prevblock number. 4. +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum, + OffsetNumber maxoff, ScanDirection dir) +{ + HashScanOpaque so = (HashScanOpaque) scan->opaque; + IndexTuple itup; + int itemIndex; + + if (ScanDirectionIsForward(dir)) + { + /* load items[] in ascending order */ + itemIndex = 0; + + /* new page, relocate starting position by binary search */ + offnum = _hash_binsearch(page, so->hashso_sk_hash); What is the need to find offset number when this function already has that as an input parameter? 5. +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum, + OffsetNumber maxoff, ScanDirection dir) I think maxoff is not required to be passed a parameter to this function as you need it only for forward scan. You can compute it locally. 6. +_hash_load_qualified_items(IndexScanDesc scan, Page page, OffsetNumber offnum, + OffsetNumber maxoff, ScanDirection dir) { .. + if (ScanDirectionIsForward(dir)) + { .. + while (offnum <= maxoff) { .. + if (so->hashso_sk_hash == _hash_get_indextuple_hashkey(itup) && + _hash_checkqual(scan, itup)) + { + /* tuple is qualified, so remember it */ + _hash_saveitem(so, itemIndex, offnum, itup); + itemIndex++; + } + + offnum = OffsetNumberNext(offnum); .. Why are you traversing the whole page even when there is no match? There is a similar problem in backward scan. I think this is blunder. 7. + if (so->currPos.buf == so->hashso_bucket_buf || + so->currPos.buf == so->hashso_split_bucket_buf) + { + so->currPos.prevPage = InvalidBlockNumber; + LockBuffer(so->currPos.buf, BUFFER_LOCK_UNLOCK); + } + else + { + so->currPos.prevPage = (opaque)->hasho_prevblkno; + _hash_relbuf(rel, so->currPos.buf); + } + + so->currPos.nextPage = (opaque)->hasho_nextblkno; What makes you think it is safe read opaque after releasing the lock? -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] asynchronous execution
Hello, At Sun, 2 Apr 2017 12:21:14 -0400, Corey Huinkerwrote in
Re: [HACKERS] Page Scan Mode in Hash Index
On Mon, Mar 27, 2017 at 7:04 PM, Ashutosh Sharmawrote: > > My guess (which could be wrong) is that so->hashso_bucket_buf = >> InvalidBuffer should be moved back up higher in the function where it >> was before, just after the first if statement, and that the new >> condition so->hashso_bucket_buf == so->currPos.buf on the last >> _hash_dropbuf() should be removed. If that's not right, then I think >> I need somebody to explain why not. > > Okay, as i mentioned above, in case of page scan mode we only keep pin > on a bucket buf. There won't be any case where we will be having pin > on overflow buf at the end of scan. > What if mark the buffer as invalid after releasing the pin? We are already doing it that way in btree code, refer _bt_drop_lock_and_maybe_pin(). I think if we do that way, then we can do what Robert is suggesting. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] FDW and parallel execution
Hi, At Sun, 02 Apr 2017 16:30:24 +0300, Konstantin Knizhnikwrote in <58e0fcf0.2070...@postgrespro.ru> > Hi hackers and personally Robet (you are the best expert in both > areas). > I want to ask one more question concerning parallel execution and FDW. > Below are two plans for the same query (TPC-H Q5): one for normal > tables, another for FDW to vertical representation of the same data. > FDW supports analyze function and is expected to produce the similar > statistic as for original tables. > The plans look very similar, but first one is parallel and second - > not. > My FDW provides implementation for IsForeignScanParallelSafe which > returns true. > I wonder what can prevent optimizer from using parallel plan in this > case? Parallel execution requires partial paths. It's the work for GetForeignPaths of your FDW. regards, -- Kyotaro Horiguchi NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] UPDATE of partition key
On 3 April 2017 at 17:13, Amit Langotewrote: > Hi Amit, > > Thanks for updating the patch. Since ddl.sgml got updated on Saturday, > patch needs a rebase. Rebased now. > >> On 31 March 2017 at 16:54, Amit Khandekar wrote: >>> On 31 March 2017 at 14:04, Amit Langote >>> wrote: On 2017/03/28 19:12, Amit Khandekar wrote: > In the section 5.11 "Partitioning" => subsection 5 "Caveats", I have > removed the caveat of not being able to update partition key. And it > is now replaced by the caveat where an update/delete operations can > silently miss a row when there is a concurrent UPDATE of partition-key > happening. Hmm, how about just removing the "partition-changing updates are disallowed" caveat from the list on the 5.11 Partitioning page and explain the concurrency-related caveats on the UPDATE reference page? >>> >>> IMHO this caveat is better placed in Partitioning chapter to emphasize >>> that it is a drawback specifically in presence of partitioning. > > I mean we fixed things for declarative partitioning so it's no longer a > caveat like it is for partitioning implemented using inheritance (in that > the former doesn't require user-defined triggers to implement > row-movement). Seeing the first sentence, that is: > > An UPDATE causes a row to move from one partition to another > if the new value of the row fails to satisfy the implicit partition > constraint of the original partition but there is another partition which > can fit this row. > > which clearly seems to suggest that row-movement, if required, is handled > by the system. So it's not clear why it's in this list. If we want to > describe the limitations of the current implementation, we'll need to > rephrase it a bit. Yes I agree. > How about something like: > For an UPDATE that causes a row to move from one partition to > another due the partition key being updated, the following caveats exist: > presence of concurrent manipulation of the row in question> Now with the slightly changed doc structuring for partitioning in latest master, I have described in the end of section "5.10.2. Declarative Partitioning" this note : --- "Updating the partition key of a row might cause it to be moved into a different partition where this row satisfies its partition constraint." --- And then in the Limitations section, I have replaced the earlier can't-update-partition-key limitation with this new limitation as below : "When an UPDATE causes a row to move from one partition to another, there is a chance that another concurrent UPDATE or DELETE misses this row. Suppose, during the row movement, the row is still visible for the concurrent session, and it is about to do an UPDATE or DELETE operation on the same row. This DML operation can silently miss this row if the row now gets deleted from the partition by the first session as part of its UPDATE row movement. In such case, the concurrent UPDATE/DELETE, being unaware of the row movement, interprets that the row has just been deleted so there is nothing to be done for this row. Whereas, in the usual case where the table is not partitioned, or where there is no row movement, the second session would have identified the newly updated row and carried UPDATE/DELETE on this new row version." --- Further, in the Notes section of update.sgml, I have kept a link to the above limitations section like this : "In the case of a partitioned table, updating a row might cause it to no longer satisfy the partition constraint of the containing partition. In that case, if there is some other partition in the partition tree for which this row satisfies its partition constraint, then the row is moved to that partition. If there isn't such a partition, an error will occur. The error will also occur when updating a partition directly. Behind the scenes, the row movement is actually a DELETE and INSERT operation. However, there is a possibility that a concurrent UPDATE or DELETE on the same row may miss this row. For details see the section Section 5.10.2.3." > +If an UPDATE on a partitioned table causes a row to +move to another partition, it is possible that all row-level +BEFORE UPDATE triggers and all row-level +BEFORE DELETE/INSERT +triggers are applied on the respective partitions in a way that is apparent +from the final state of the updated row. How about dropping "it is possible that" from this sentence? >>> >>> What the statement means is : "It is true that all triggers are >>> applied on the respective partitions; but it is possible that they are >>> applied in a way that is apparent from final state of the updated >>> row". So "possible" applies to "in a way that is apparent..". It >>> means, the user should be aware that all the triggers can change the >>> row and so the final row will be
Re: [HACKERS] wait event documentation
On Mon, Apr 3, 2017 at 11:57 PM, Amit Langotewrote: > By the way, wonder if it wouldn't make sense to take the whole Table 28.1. > Dynamic Statistics Views into a new section (perhaps before 28.2 Viewing > Locks or after), since those views display information different from what > the statistics collector component collects and publishes (those in the > Table 28.2. Collected Statistics Views). It seems a little short for that, doesn't it? I mean, I'm not against it on principle, but we don't want to hoist a 5-line table into a section by itself. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Implementation of SASLprep for SCRAM-SHA-256
On 03/31/2017 10:10 AM, Michael Paquier wrote: On Wed, Mar 8, 2017 at 10:39 PM, Robert Haaswrote: On Tue, Mar 7, 2017 at 10:01 PM, Michael Paquier wrote: I kinda hope Heikki is going to step up to the plate here, because I think he understands most of it already. The second person who knows a good deal about NFKC is Ishii-san. Attached is a rebased patch. Thanks, I'm looking at this now. - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making clausesel.c Smarter
On 4 April 2017 at 13:35, Claudio Freirewrote: > On Mon, Apr 3, 2017 at 9:19 PM, Claudio Freire wrote: >> On Mon, Apr 3, 2017 at 8:52 PM, David Rowley >> wrote: One last observation: +/* + * An IS NOT NULL test is a no-op if there's any other strict quals, + * so if that's the case, then we'll only apply this, otherwise we'll + * ignore it. + */ +else if (cslist->selmask == CACHESEL_NOTNULLTEST) +s1 *= cslist->notnullsel; In some paths, the range selectivity estimation code punts and returns DEFAULT_RANGE_INEQ_SEL. In those cases, if a not null test was present, it should still be applied. It could make a big difference if the selectivity for the nulltest is high. >>> >>> I'd say that the location that applies the DEFAULT_RANGE_INEQ_SEL >>> should apply the nullfrac. Seems wrong to rely on a IS NOT NULL test >>> to exists to estimate that properly. I don't think that needs done as >>> part of this patch. I'd rather limit the scope since "returned with >>> feedback" is already knocking at the door of this patch. >> >> I agree, but this patch regresses for those cases if the user >> compensated with a not null test, leaving little recourse, so I'd fix >> it in this patch to avoid the regression. >> >> Maybe you're right that the null fraction should be applied regardless >> of whether there's an explicit null check, though. > > The more I read that code, the more I'm convinced there's no way to > get a DEFAULT_RANGE_INEQ_SEL if (rqlist->hibound != DEFAULT_INEQ_SEL > && > rqlist->lobound != DEFAULT_INEQ_SEL) other than from contradictory > clauses, a case the current code handles very poorly, returning > DEFAULT_RANGE_INEQ_SEL and ignoring nullfrac, something that could > give way-off estimates if the table had lots of nulls. > > Say, consider if "value" was mostly null and you write: > > SELECT * FROM testdata WHERE value BETWEEN 10 AND 20 AND value BETWEEN > 50 AND 60; > > All other cases I can think of would end with either hibound or > lobound being equal to DEFAULT_INEQ_SEL. > > It seems to me, doing a git blame, that the checks in the else branch > were left only as a precaution, one that seems unnecessary. > > If you ask me, I'd just leave: > > + if (rqlist->hibound == DEFAULT_INEQ_SEL || rqlist->lobound == > DEFAULT_INEQ_SEL) > + { > + /* No point in checking null selectivity, DEFAULT_INEQ_SEL > implies we have no stats */ > + s2 = DEFAULT_RANGE_INEQ_SEL; > + } > + else > + { > ... > + s2 = rqlist->hibound + rqlist->lobound - 1.0; > + nullsel = nulltestsel(root, IS_NULL, rqlist->var, varRelid); > + notnullsel = 1.0 - nullsel; > + > + /* Adjust for double-exclusion of NULLs */ > + s2 += nullsel; > + if (s2 <= 0.0) { > + if (s2 <= (-1.0e-4 * notnullsel - 1.0e-6)) > + { > + /* Most likely contradictory clauses found */ > + s2 = (1.0e-10 < notnullsel) ? 1.0e-10 : notnullsel; > + } > + else > + { > + /* Could be a rounding error */ > + s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel; > + } > + } > + } > > Where (-1.0e-4 * notnullsel - 1.0e-6) is just a very rough (and overly > cautious) estimation of the amount of rounding error that could be > there with 32-bit floats. > > The above assumes a non-DEFAULT_INEQ_SEL value in lobound/hibound is > an estimation based on stats, maybe approximate, but one that makes > sense (ie: preserves the monotonicity of the CPF), and as such > negative values are either sign of a contradiction or rounding error. > The previous code left the possibility of negatives coming out of > default selectivities creeping up on non-DEFAULT_INEQ_SEL estimates, > but that doesn't seem possible coming out of scalarineqsel. > > But I'd like it if Tom could comment on that, since he's the one that > did that in commit 547bb4a7f2bdccad9253a99211ce84b3f9de485a (way back > in 2004). > > Barring that, I'd just change the > > s2 = DEFAULT_RANGE_INEQ_SEL; > > To > > s2 = DEFAULT_RANGE_INEQ_SEL * notnullsel; > > Which makes a lot more sense. I think to fix this properly the selfuncs would have to return the estimate, and nullfrac separately, so the nullfrac could just be applied once per expression. There's already hacks in there to get rid of the double null filtering. I'm not proposing that as something for this patch. It would be pretty invasive to change this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Compiler warning in costsize.c
On Tue, Apr 4, 2017 at 7:03 PM, David Rowleywrote: > On 4 April 2017 at 16:22, Michael Paquier wrote: >> Hi all, >> >> In builds where USE_ASSERT_CHECKING is not enabled, costsize.c can >> generate warnings. Here is for example with MSVC: >> src/backend/optimizer/path/costsize.c(4520): warning C4101: 'rte' : >> unreferen >> ced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj] >> src/backend/optimizer/path/costsize.c(4640): warning C4101: 'rte' : >> unreferen >> ced local variable [C:\Users\ioltas\git\postgres\postgres.vcxproj] >> >> a9c074ba7 has done an effort, but a bit more is needed as the attached. > > Thanks for writing the patch. > > I wondering if it would be worth simplifying it a little to get rid of > the double #ifdefs, as per attached. Yes, that would be fine as well. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Supporting huge pages on Windows
On 4 Apr. 2017 14:22, "Andres Freund"wrote: On 2017-01-05 03:12:09 +, Tsunakawa, Takayuki wrote: > From: pgsql-hackers-ow...@postgresql.org > > [mailto:pgsql-hackers-ow...@postgresql.org] On Behalf Of Magnus Hagander > > For the pg_ctl changes, we're going from removing all privilieges from the > > token, to removing none. Are there any other privileges that we should be > > worried about? I think you may be correct in that it's overkill to do it, > > but I think we need some more specifics to decide that. > > This page lists the privileges. Is there anyhing you are concerned about? > > https://msdn.microsoft.com/ja-jp/library/windows/desktop/ bb530716(v=vs.85).aspx Aren't like nearly all of them a concern? We gone from having some containment (not being to create users, shut the system down, ...), to none. I do however think there's a fair argument to be made that other platforms do not have a similar containment (no root, but sudo etc is still possible), and that the containment isn't super strong. TBH, anyone who cares about security and runs Win7 or Win2k8 or newer should be using virtual service accounts and managed service accounts. https://technet.microsoft.com/en-us/library/dd548356 Those are more like Unix service accounts. Notably they don't need a password, getting rid of some of the management pain that led us to abandon the 'postgres' system user on Windows. Now that older platforms are EoL and even the oldest that added this feature are also near EoL or in extended maintenance, I think installers should switch to these by default instead of using NETWORK SERVICE. Then the issue of priv dropping would be a lesser concern anyway.
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Apr 4, 2017 at 9:18 AM, Robert Haaswrote: > Committed. Thanks Robert, And also sorry, one unfortunate thing happened in the last patch while fixing one of the review comments a variable disappeared from the equation @_hash_spareindex. splitpoint_phases += - (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) & + (((num_bucket - 1) >> + (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) & HASH_SPLITPOINT_PHASE_MASK); /* to 0-based value. */ I wanted most significant 3 bits. And while fixing comments in patch11 I unknowingly somehow removed splitpoint_group from the equation. Extremely sorry for the mistake. Thanks to Ashutosh Sharma for pointing the mistake. -- Thanks and Regards Mithun C Y EnterpriseDB: http://www.enterprisedb.com _hash_spareindex_defect.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [POC] A better way to expand hash indexes.
On Tue, Apr 4, 2017 at 6:33 AM, Mithun Cywrote: > On Tue, Apr 4, 2017 at 9:18 AM, Robert Haas wrote: >> Committed. > > Thanks Robert, > > And also sorry, one unfortunate thing happened in the last patch while > fixing one of the review comments a variable disappeared from the > equation > @_hash_spareindex. > > splitpoint_phases += > - (((num_bucket - 1) >> (HASH_SPLITPOINT_PHASE_BITS + 1)) & > + (((num_bucket - 1) >> > + (splitpoint_group - (HASH_SPLITPOINT_PHASE_BITS + 1))) & > HASH_SPLITPOINT_PHASE_MASK); /* to 0-based value. */ > > I wanted most significant 3 bits. And while fixing comments in patch11 > I unknowingly somehow removed splitpoint_group from the equation. > Extremely sorry for the mistake. Thanks to Ashutosh Sharma for > pointing the mistake. Ugh, OK, committed that also. Please try to be more careful in the future. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Parallel Append implementation
On Tue, Apr 4, 2017 at 12:47 AM, Andres Freundwrote: > I don't think the parallel seqscan is comparable in complexity with the > parallel append case. Each worker there does the same kind of work, and > if one of them is behind, it'll just do less. But correct sizing will > be more important with parallel-append, because with non-partial > subplans the work is absolutely *not* uniform. Sure, that's a problem, but I think it's still absolutely necessary to ramp up the maximum "effort" (in terms of number of workers) logarithmically. If you just do it by costing, the winning number of workers will always be the largest number that we think we'll be able to put to use - e.g. with 100 branches of relatively equal cost we'll pick 100 workers. That's not remotely sane. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] show "aggressive" or not in autovacuum logs
On Tue, Apr 4, 2017 at 10:09 AM, Kyotaro HORIGUCHIwrote: > Hello, > > At Fri, 31 Mar 2017 18:20:23 +0900, Masahiko Sawada > wrote in >> On Wed, Mar 29, 2017 at 12:46 PM, Kyotaro HORIGUCHI >> wrote: >> > Hello, it would be too late but I'd like to propose this because >> > this cannot be back-patched. >> > >> > >> > In autovacuum logs, "%u skipped frozen" shows the number of pages >> > skipped by ALL_FROZEN only in aggressive vacuum. >> > >> > So users cannot tell whether '0 skipped-frozen' means a >> > non-agressive vacuum or no frozen-pages in an agressive vacuum. >> > >> > I think it is nice to have an indication whether the scan was >> > "agressive" or not in log output. >> >> Good idea. I also was thinking about this. > > Thanks. Currently we cannot use "skipped-frozen" to see the > effect of ALL_FROZEN. > >> > Like this, >> > >> >> LOG: automatic aggressive vacuum of table >> >> "template1.pg_catalog.pg_statistic": index scans: 0 >> > >> > "0 skipped frozen" is uesless in non-aggressive vacuum but >> > removing it would be too-much. Inserting "aggressive" reduces >> > machine-readability so it might be better in another place. The >> > attached patch does the following. >> > >> >> LOG: automatic vacuum of table "postgres.public.pgbench_branches": >> >> mode: normal, index scans: 0 >> >> LOG: automatic vacuum of table "postgres.public.pgbench_branches": >> >> mode: aggressive, index scans: 0 >> > >> >> Should we add this even to the manual vacuum verbose message? > > I forgot that. The patch adds the mode indication in the first > message of VACUUM VERBOSE. > > | =# vacuum freeze verbose it; > | INFO: vacuuming "public.it" in aggressive mode > | INFO: "it": found 0 removable, 0 nonremovable row versions in 0 out of 0 > pages > ... > | Skipped 0 pages due to buffer pins, 0 frozen pages. > > I still feel a bit uneasy about the word "aggressive" here. I think we can use the word "aggressive" here since we already use the word "aggressive vacuum" in docs[1], but it might be easily misunderstood. [1] https://www.postgresql.org/docs/9.6/static/routine-vacuuming.html >Is it better to be "freezing" or something? An another idea can be something like "prevent wraparound". The autovaucum process doing aggressive vacuum appears in pg_stat_activity with the word " (to prevent wraparound)". This word might be more user friendly IMO. Regards, -- Masahiko Sawada NIPPON TELEGRAPH AND TELEPHONE CORPORATION NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Page Scan Mode in Hash Index
On Tue, Apr 4, 2017 at 6:29 AM, Amit Kapilawrote: > On Mon, Mar 27, 2017 at 7:04 PM, Ashutosh Sharma > wrote: >> My guess (which could be wrong) is that so->hashso_bucket_buf = >>> InvalidBuffer should be moved back up higher in the function where it >>> was before, just after the first if statement, and that the new >>> condition so->hashso_bucket_buf == so->currPos.buf on the last >>> _hash_dropbuf() should be removed. If that's not right, then I think >>> I need somebody to explain why not. >> >> Okay, as i mentioned above, in case of page scan mode we only keep pin >> on a bucket buf. There won't be any case where we will be having pin >> on overflow buf at the end of scan. >> > > What if mark the buffer as invalid after releasing the pin? We are > already doing it that way in btree code, refer > _bt_drop_lock_and_maybe_pin(). I think if we do that way, then we can > do what Robert is suggesting. Please continue reviewing, but I think we're out of time to get this patch into v10. This patch seems to be still under fairly heavy revision, and we're only a couple of days from feature freeze, and the patch upon which it depends (page-at-a-time vacuum) has had no fewer than four follow-up commits repairing various problems with the logic, with no guarantee that we've found all the bugs yet. In view of those facts, I don't think it would be wise for me to commit this to v10, so I'm instead going to move it to the next CommitFest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] wait event documentation
On Tue, Apr 4, 2017 at 9:05 PM, Robert Haaswrote: > On Mon, Apr 3, 2017 at 11:57 PM, Amit Langote > wrote: >> By the way, wonder if it wouldn't make sense to take the whole Table 28.1. >> Dynamic Statistics Views into a new section (perhaps before 28.2 Viewing >> Locks or after), since those views display information different from what >> the statistics collector component collects and publishes (those in the >> Table 28.2. Collected Statistics Views). > > It seems a little short for that, doesn't it? I mean, I'm not against > it on principle, but we don't want to hoist a 5-line table into a > section by itself. I was thinking that if we move Table 28.1 to a new section, Tables 28.3 (pg_stat_activity) to 28.8 (pg_stat_ssl) will go too. Thanks, Amit -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers