Re: [HACKERS] postgres_fdw join pushdown (was Re: Custom/Foreign-Join-APIs)
On 2016/02/10 4:16, Robert Haas wrote:
> On Tue, Feb 9, 2016 at 8:39 AM, Ashutosh Bapat wrote:
>> Thanks Jeevan for your review and comments. PFA the patch which fixes
>> those.
>
> Committed with a couple more small adjustments.

Thanks for working on this, Robert, Ashutosh, and everyone involved!

I happened to notice that this code in foreign_join_ok():

    switch (jointype)
    {
        case JOIN_INNER:
            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                               fpinfo_i->remote_conds);
            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                               fpinfo_o->remote_conds);
            break;

        case JOIN_LEFT:
            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                              fpinfo_i->remote_conds);
            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                               fpinfo_o->remote_conds);
            break;

        case JOIN_RIGHT:
            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                              fpinfo_o->remote_conds);
            fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
                                               fpinfo_i->remote_conds);
            break;

        case JOIN_FULL:
            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                              fpinfo_i->remote_conds);
            fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
                                              fpinfo_o->remote_conds);
            break;

        default:
            /* Should not happen, we have just checked this above */
            elog(ERROR, "unsupported join type %d", jointype);
    }

would break the list fpinfo_i->remote_conds in the case of INNER JOIN or
FULL JOIN.
You can see the list breakage from e.g. the following queries on an
Assert-enabled build:

postgres=# create extension postgres_fdw;
CREATE EXTENSION
postgres=# create server myserver foreign data wrapper postgres_fdw options (dbname 'mydatabase');
CREATE SERVER
postgres=# create user mapping for current_user server myserver;
CREATE USER MAPPING
postgres=# create foreign table foo (a int) server myserver options (table_name 'foo');
CREATE FOREIGN TABLE
postgres=# create foreign table bar (a int) server myserver options (table_name 'bar');
CREATE FOREIGN TABLE
postgres=# create foreign table baz (a int) server myserver options (table_name 'baz');
CREATE FOREIGN TABLE
postgres=# select * from foo, bar, baz where foo.a = bar.a and bar.a = baz.a and foo.a < 10 and bar.a < 10 and baz.a < 10;

Attached is a patch to avoid the breakage.

Best regards,
Etsuro Fujita

*** a/contrib/postgres_fdw/postgres_fdw.c
--- b/contrib/postgres_fdw/postgres_fdw.c
***************
*** 3488,3495 **** foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
  		case JOIN_INNER:
  			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
  											   fpinfo_i->remote_conds);
! 			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
! 											   fpinfo_o->remote_conds);
  			break;

  		case JOIN_LEFT:
--- 3488,3496 ----
  		case JOIN_INNER:
  			fpinfo->remote_conds = list_concat(fpinfo->remote_conds,
  											   fpinfo_i->remote_conds);
! 			if (fpinfo_o->remote_conds)
! 				fpinfo->remote_conds = list_concat(list_copy(fpinfo->remote_conds),
! 												   fpinfo_o->remote_conds);
  			break;

  		case JOIN_LEFT:
***************
*** 3509,3516 **** foreign_join_ok(PlannerInfo *root, RelOptInfo *joinrel, JoinType jointype,
  		case JOIN_FULL:
  			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
  											  fpinfo_i->remote_conds);
! 			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
! 											  fpinfo_o->remote_conds);
  			break;

  		default:
--- 3510,3518 ----
  		case JOIN_FULL:
  			fpinfo->joinclauses = list_concat(fpinfo->joinclauses,
  											  fpinfo_i->remote_conds);
! 			if (fpinfo_o->remote_conds)
! 				fpinfo->joinclauses = list_concat(list_copy(fpinfo->joinclauses),
! 												  fpinfo_o->remote_conds);
  			break;

  		default:
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] extend pgbench expressions with functions
On Mon, Feb 15, 2016 at 5:21 AM, Robert Haas wrote:
> On Sun, Feb 14, 2016 at 11:28 AM, Fabien COELHO wrote:
>> Here is a 3 part v29:
>>
>> a) add support for integer functions in pgbench, including turning
>>    operators into functions, as well as some minimal infrastructure for
>>    additional types (this allows minimizing the diffs with the next
>>    patch, which adds double).
>>
>> b) add support for doubles, including setting double variables.
>>    Note that variables are not explicitly typed. Add some
>>    double-related functions; the most interesting of them for pgbench
>>    are the randoms.
>>
>> c) remove \setrandom support (as thanks to part b \set x random(...) does
>>    the same). Prints an error pointing to the replacement if \setrandom is
>>    used in a pgbench script.
>
> Thanks, I'll review these as soon as I can get to it.

I got around to looking at (a) in this set.

+	if ((PGBENCH_FUNCTIONS[fnumber].nargs >= 0 &&
+		 PGBENCH_FUNCTIONS[fnumber].nargs != elist_length(args)) ||
+		/* check at least one arg for min & max */
+		(PGBENCH_FUNCTIONS[fnumber].nargs == -1 &&
+		 elist_length(args) == 0))
+		expr_yyerror_more("unexpected number of arguments",
+						  PGBENCH_FUNCTIONS[fnumber].fname);

We could split that into two parts, each one with a more precise error
message:
- For functions that expect at least one argument: "at least one argument
  was expected, none found".
- For functions that expect N arguments: "N arguments expected, but M
  found".

+	"\\set aid random(1, " CppAsString2(naccounts) " * :scale)\n"
 	"SELECT abalance FROM pgbench_accounts WHERE aid = :aid;\n"
 	}
 };

-/* Function prototypes */

Noise.

+ * Recursive evaluation of int or double expressions
+ *
+ * Note that currently only integer variables are available, with values
+ * stored as text.

This comment is incorrect; we only care about integers in this patch.

Taking patch 1 as a completely independent thing, there is no need to
introduce PgBenchValueType yet. Similar remarks apply to setIntValue and
coerceToInt. They are useful afterwards, when introducing double types, to
be able to handle double input parameters for the gaussian and other
functions.

-			/*
-			 * INT64_MIN / -1 is problematic, since the result
-			 * can't be represented on a two's-complement machine.
-			 * Some machines produce INT64_MIN, some produce zero,
-			 * some throw an exception. We can dodge the problem
-			 * by recognizing that division by -1 is the same as
-			 * negation.
-			 */
-			if (rval == -1)
+			if (coerceToInt(&rval) == -1)
 			{
-				*retval = -lval;
-
-				/* overflow check (needed for INT64_MIN) */
-				if (lval == PG_INT64_MIN)
-				{
-					fprintf(stderr, "bigint out of range\n");
-					return false;
-				}
+				setIntValue(retval, 0);
+				return true;
 			}

(INT64_MIN / -1) should error. (INT64_MIN % -1) should result in 0.
This is missing the division handling.
--
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket
On Mon, Feb 15, 2016 at 3:20 PM, Michael Paquier wrote:
> On Sat, Feb 13, 2016 at 6:25 AM, Alvaro Herrera wrote:
>> I noticed that pgbench calls FD_ISSET on a socket returned by
>> PQsocket() without first checking that it's not invalid. I don't think
>> there's a real problem here because the socket was already checked a few
>> lines above, but I think applying the same coding pattern to both places
>> is cleaner.
>>
>> Any objections to changing it like this? I'd probably backpatch to 9.5,
>> but no further (even though this pattern actually appears all the way
>> back.)
>
> Not really, +1 for consistency here, and this makes the code clearer.
>
> Different issues in the same area:
> 1) In vacuumdb.c, init_slot() does not check for the return value of
> PQsocket():
>     slot->sock = PQsocket(conn);
> 2) In isolationtester.c, try_complete_step() should do the same.
> 3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same
> problem.
> I guess those ones should be fixed as well, no?

With a patch, that's even better.
-- 
Michael

diff --git a/src/bin/pg_basebackup/pg_recvlogical.c b/src/bin/pg_basebackup/pg_recvlogical.c
index 832f9f9..b4325b0 100644
--- a/src/bin/pg_basebackup/pg_recvlogical.c
+++ b/src/bin/pg_basebackup/pg_recvlogical.c
@@ -360,6 +360,14 @@ StreamLogicalLog(void)
 		struct timeval timeout;
 		struct timeval *timeoutptr = NULL;
 
+		if (PQsocket(conn) < 0)
+		{
+			fprintf(stderr,
+					_("%s: bad socket: \"%s\"\n"),
+					progname, strerror(errno));
+			goto error;
+		}
+
 		FD_ZERO(&input_mask);
 		FD_SET(PQsocket(conn), &input_mask);
diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c
index c6afcd5..e81123f 100644
--- a/src/bin/scripts/vacuumdb.c
+++ b/src/bin/scripts/vacuumdb.c
@@ -70,7 +70,7 @@ static void DisconnectDatabase(ParallelSlot *slot);
 static int	select_loop(int maxFd, fd_set *workerset, bool *aborting);
-static void init_slot(ParallelSlot *slot, PGconn *conn);
+static void init_slot(ParallelSlot *slot, PGconn *conn, const char *progname);
 static void help(const char *progname);
@@ -421,14 +421,14 @@ vacuum_one_database(const char *dbname, vacuumingOptions *vacopts,
 	 * array contains the connection.
 	 */
 	slots = (ParallelSlot *) pg_malloc(sizeof(ParallelSlot) * concurrentCons);
-	init_slot(slots, conn);
+	init_slot(slots, conn, progname);
 	if (parallel)
 	{
 		for (i = 1; i < concurrentCons; i++)
 		{
 			conn = connectDatabase(dbname, host, port, username, prompt_password,
								   progname, false, true);
-			init_slot(slots + i, conn);
+			init_slot(slots + i, conn, progname);
 		}
 	}
@@ -917,11 +917,18 @@ select_loop(int maxFd, fd_set *workerset, bool *aborting)
 }
 
 static void
-init_slot(ParallelSlot *slot, PGconn *conn)
+init_slot(ParallelSlot *slot, PGconn *conn, const char *progname)
 {
 	slot->connection = conn;
 	slot->isFree = true;
 	slot->sock = PQsocket(conn);
+
+	if (slot->sock < 0)
+	{
+		fprintf(stderr, _("%s: bad socket: %s\n"), progname,
+				strerror(errno));
+		exit(1);
+	}
 }
 
 static void
diff --git a/src/test/isolation/isolationtester.c b/src/test/isolation/isolationtester.c
index 0a9d25c..3e13a39 100644
--- a/src/test/isolation/isolationtester.c
+++ b/src/test/isolation/isolationtester.c
@@ -720,6 +720,12 @@ try_complete_step(Step *step, int flags)
 	PGresult   *res;
 	bool		canceled = false;
 
+	if (sock < 0)
+	{
+		fprintf(stderr, "bad socket: %s\n", strerror(errno));
+		exit_nicely();
+	}
+
 	gettimeofday(&start_time, NULL);
 	FD_ZERO(&read_set);
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] innocuous: pgbench does FD_ISSET on invalid socket
On Sat, Feb 13, 2016 at 6:25 AM, Alvaro Herrera wrote:
> I noticed that pgbench calls FD_ISSET on a socket returned by
> PQsocket() without first checking that it's not invalid. I don't think
> there's a real problem here because the socket was already checked a few
> lines above, but I think applying the same coding pattern to both places
> is cleaner.
>
> Any objections to changing it like this? I'd probably backpatch to 9.5,
> but no further (even though this pattern actually appears all the way
> back.)

Not really, +1 for consistency here, and this makes the code clearer.

Different issues in the same area:
1) In vacuumdb.c, init_slot() does not check for the return value of
PQsocket():
    slot->sock = PQsocket(conn);
2) In isolationtester.c, try_complete_step() should do the same.
3) In pg_recvlogical.c for StreamLogicalLog() I am spotting the same
problem.

I guess those ones should be fixed as well, no?
-- 
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On Fri, Feb 12, 2016 at 5:40 PM, Etsuro Fujita wrote:
> Hi Rushabh and Thom,
>
> Thanks for the review!
>
> On 2016/02/10 22:37, Thom Brown wrote:
>> On 10 February 2016 at 08:00, Rushabh Lathia wrote:
>>> Fujita-san, I am attaching an updated version of the patch, which adds
>>> the documentation update.
>
> Thanks for updating the patch!
>
>>> Once we finalize this, I feel good with the patch and think that we
>>> could mark this as ready for committer.
>
>> I find this wording a bit confusing:
>>
>> "If the IsForeignRelUpdatable pointer is set to NULL, foreign tables
>> are assumed to be insertable, updatable, or deletable either the FDW
>> provides ExecForeignInsert, ExecForeignUpdate, or ExecForeignDelete
>> respectively or if the FDW optimizes a foreign table update on a
>> foreign table using PlanDMLPushdown (PlanDMLPushdown still needs to
>> provide BeginDMLPushdown, IterateDMLPushdown and EndDMLPushdown to
>> execute the optimized update.)."
>>
>> This is a very long sentence, and the word "either" doesn't work here.
>
> Agreed.
>
> As a result of our discussions, we reached a conclusion that the DML
> pushdown APIs should be provided together with existing APIs such as
> ExecForeignInsert, ExecForeignUpdate or ExecForeignDelete, IIUC. So, how
> about (1) leaving the description for the existing APIs as-is and (2)
> adding a new description for the DML pushdown APIs in parentheses, like
> this?:
>
>     If the IsForeignRelUpdatable pointer is set to
>     NULL, foreign tables are assumed to be insertable, updatable,
>     or deletable if the FDW provides ExecForeignInsert,
>     ExecForeignUpdate, or ExecForeignDelete respectively.
>     (If the FDW attempts to optimize a foreign table update, it still
>     needs to provide PlanDMLPushdown, BeginDMLPushdown,
>     IterateDMLPushdown and EndDMLPushdown.)
>
> Actually, if the FDW provides the DML pushdown APIs, (pushdown-able)
> foreign table updates can be done without ExecForeignInsert,
> ExecForeignUpdate or ExecForeignDelete. So, the above docs are not
> necessarily correct. But we don't recommend doing that without the
> existing APIs, so I'm not sure it's worth complicating the docs.

Adding a new description for the DML pushdown APIs seems like a good idea.
I would suggest adding it as a separate paragraph rather than in brackets.

>> Also:
>>
>> "When the query doesn't has the clause, the FDW must also increment
>> the row count for the ForeignScanState node in the EXPLAIN ANALYZE
>> case."
>>
>> Should read "doesn't have"
>
> Will fix.
>
> Best regards,
> Etsuro Fujita

-- 
Rushabh Lathia
Re: [HACKERS] Refactoring of receivelog.c
On 15 February 2016 at 04:48, Magnus Hagander wrote:
> I was working on adding the tar streaming functionality we talked about at
> the developer meeting to pg_basebackup, and rapidly ran across the issue
> that Andres has been complaining about for a while. The code in
> receivelog.c just passes an insane number of parameters around. Adding or
> changing even a small thing ends up touching a huge number of places.

Other than the lack of comments on the fields in StreamCtl to indicate
their functions, I think this looks good. I didn't find any mistakes, but
I do admit my eyes started glazing over after a bit.

I'd rather not have StreamCtl as a typedef of an anonymous struct if it's
exposed in a header, though. It should have a name that can be used in
forward declarations etc.

After recently working with the XLogReader I can't emphasise enough how
important *useful* comments on the fields in these sorts of structs are -
the XLogReaderState has ReadRecPtr, readSegNo + readOff + readLen,
currRecPtr AND latestPagePtr. Which actually do have comments, just not
super helpful ones.

-- 
Craig Ringer    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] pgbench - allow backslash-continuations in custom scripts
Hello, thank you for reviewing this.

> On Thu, Jan 7, 2016 at 3:36 AM, Kyotaro HORIGUCHI wrote:
>> - 0001-Prepare-for-sharing-psqlscan-with-pgbench.patch
>>
>>   This diff looks a bit large but most of it is cut-and-paste
>>   work; the substantial change is rather small.
>>
>>   This refactors psqlscan.l into two .l files. The additional
>>   psqlscan_slash.l is a bit tricky in that it recreates
>>   scan_state on transitions from psqlscan.l.
>
> I've looked at this patch a few times now but find it rather hard to
> verify. I am wondering if you would be willing to separate 0001 into
> subpatches. For example, maybe there could be one or two patches that
> ONLY move code around and then one or more patches that make the
> changes to that code. Right now, for example, psql_scan_setup() is
> getting three additional arguments, but it's also being moved to a
> different file. Perhaps those two things could be done one at a time.

I tried to split it into patches with some meaningful (I thought) steps,
but I'll rearrange them if they are not easy to read.

> I also think this patch could really benefit from a detailed set of
> submission notes that specifically lay out why each change was made
> and why. For instance, I see that psqlscan.l used yyless() while
> psqlscanbody.l uses a new my_yyless() you've defined. There is
> probably a great reason for that and I'm sure if I stare at this for
> long enough I can figure out what that reason is, but it would be
> better if you had a list of bullet points explaining what was changed
> and why.

I'm sorry, but I don't understand exactly what 'submission notes' means.
Does it mean precise descriptions in source comments, or the commit
message of a git commit?

> I would really like to see this patch committed; my problem is that I
> don't have enough braincells to be sure that it's correct in the
> present form.

Thank you. I'll send the rearranged patch soon.
regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
On Mon, Feb 15, 2016 at 2:11 PM, Kyotaro HORIGUCHI wrote:
> Surprisingly, yes. The list is handled as an identifier list and
> parsed by SplitIdentifierString, thus it can accept double-quoted
> names.

Good point. I was not aware of this trick.
-- 
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Optimization for updating foreign tables in Postgres FDW
On 2016/02/12 21:46, Robert Haas wrote:
> On Fri, Feb 12, 2016 at 7:19 AM, Etsuro Fujita wrote:
>> I think that displaying target lists would be confusing for users. Here
>> is an example:
>>
>> EXPLAIN (verbose, costs off) DELETE FROM rem1; -- can be pushed down
>>                  QUERY PLAN
>> ---------------------------------------------
>>  Delete on public.rem1
>>    ->  Foreign Delete on public.rem1
>>          Output: ctid
>>          Remote SQL: DELETE FROM public.loc1
>> (4 rows)
>>
>> Should we output the "Output" line?
>
> I see your point, but what if there's a RETURNING clause?

IMO that would be confusing in that case, too. Here is an example:

EXPLAIN (verbose, costs off) DELETE FROM rem1 RETURNING *; -- can be pushed down
                       QUERY PLAN
---------------------------------------------------------
 Delete on public.rem1
   Output: f1, f2
   ->  Foreign Delete on public.rem1
         Output: ctid
         Remote SQL: DELETE FROM public.loc1 RETURNING f1, f2
(5 rows)

The Output line beneath the ForeignScan node doesn't match the RETURNING
expressions in the remote query, while the Output line beneath the
ModifyTable node does, so I think displaying it would be confusing even in
that case. Another example:

postgres=# explain verbose update foo set a = a + 1 returning *;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Update on public.foo  (cost=100.00..137.50 rows=1000 width=10)
   Output: a
   ->  Foreign Update on public.foo  (cost=100.00..137.50 rows=1000 width=10)
         Output: (a + 1), ctid
         Remote SQL: UPDATE public.foo SET a = (a + 1) RETURNING a
(5 rows)

Same as above. As for the case of INSERT .. RETURNING .., I guess there is
no such mismatch, but I'm not sure that displaying it is that helpful,
honestly, so I'd vote for suppressing it in all cases, for consistency.

Best regards,
Etsuro Fujita
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Incorrect formula for SysV IPC parameters
Thanks for looking at this.

At Fri, 12 Feb 2016 23:19:45 +0900, Fujii Masao wrote in
>>> ISTM that you also need to change the descriptions about SEMMNI and
>>> SEMMNS under the Table 17-1.
>>
>> Oops! Thank you for pointing it out.
>>
>> The original description doesn't mention the magic number ('5' in
>> the above formulas, which previously was '4') so I haven't added
>> anything about it.
>>
>> Processes whose number is limited by max_worker_processes are called
>> 'background processes' (not 'backend workers') in the documentation,
>> so I used that name to mention them in the additional description.
>>
>> The difference in the body text for 9.2 and 9.3 is only a literal
>> '4' changed to '5' in the formula.
>
> Thanks for updating the patches!
>
> They look good to me except that the formulas don't include the number of
> background processes requesting shared memory access, i.e.,
> GetNumShmemAttachedBgworkers(), in 9.3. Isn't it better to use the
> following formula in 9.3?
>
>     ceil((max_connections + autovacuum_max_workers + number of
>     background processes + 5) / 16)
>
> Attached patch uses the above formula for 9.3. I'm thinking to push your
> patches to 9.2, 9.4, 9.5, and master, and also push the attached one to
> 9.3. Comments?

One concern is that users don't have any simple way to know how many
bgworkers a server runs in their current configuration. The formula
doesn't make sense without it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Support for N synchronous standby servers - take 2
Hello,

At Wed, 10 Feb 2016 18:36:43 +0900, Michael Paquier wrote in
> On Wed, Feb 10, 2016 at 5:34 PM, Kyotaro HORIGUCHI wrote:
>>>> +sync_node_group:
>>>> +	sync_list				{ $$ = create_group_node(1, $1); }
>>>> +	| sync_element_ast			{ $$ = create_group_node(1, $1); }
>>>> +	| INT '[' sync_list ']'			{ $$ = create_group_node($1, $3); }
>>>> +	| INT '[' sync_element_ast ']'		{ $$ = create_group_node($1, $3); }
>>>
>>> We may want to be careful with the use of '[' in application_name. I am
>>> not much thrilled with forbidding the use of []() in application_name,
>>> so we may want to recommend user to use a backslash when using
>>> s_s_names when a group is defined.
>>
>> Mmmm. I found that application_name can contain commas. Furthermore,
>> there seems to be no limitation on characters in the name.
>>
>> postgres=# set application_name='ho,ge';
>> postgres=# select application_name from pg_stat_activity;
>>  application_name
>> ------------------
>>  ho,ge
>>
>> check_application_name() allows all characters in the range between
>> 32 and 126 in ASCII. All other characters are replaced with '?'.
>
> Actually I was thinking about that a couple of hours ago. If the
> application_name of a node has a comma, it cannot become a sync
> replica, no? Wouldn't we need special handling in s_s_names like
> '\,' to make a comma part of an application name? Or just ban commas
> from the list of supported characters in the application name?

Surprisingly, yes. The list is handled as an identifier list and
parsed by SplitIdentifierString, thus it can accept double-quoted
names.

s_s_names='abc, def, " abc,""def"'

The resulting list is ["abc", "def", " abc,\"def"].

Simply supporting the same notation addresses the problem and accepts
strings like the following.

s_s_names='2["comma,name", "foo[bar,baz]"]'

It is currently an undocumented behavior, but I doubt the necessity of
an explicit mention.
regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using quicksort for every external sort run
On Sun, Feb 7, 2016 at 4:50 PM, Peter Geoghegan wrote:
>> I'm not even sure this is necessary. The idea of missing out on
>> producing a single sorted run sounds bad but in practice since we
>> normally do the final merge on the fly there doesn't seem like there's
>> really any difference between reading one tape or reading two or three
>> tapes when outputting the final results. There will be the same amount
>> of I/O happening and a 2-way or 3-way merge for most data types should
>> be basically free.
>
> I basically agree with you, but it seems possible to fix the
> regression (generally misguided though those regressed cases are).
> It's probably easiest to just fix it.

Here is a benchmark on my laptop:

$ pgbench -i -s 500 --unlogged

This results in a ~1GB accounts PK:

postgres=# \di+ pgbench_accounts_pkey
List of relations
─[ RECORD 1 ]───────────────────────
Schema      │ public
Name        │ pgbench_accounts_pkey
Type        │ index
Owner       │ pg
Table       │ pgbench_accounts
Size        │ 1071 MB
Description │

The query I'm testing is: "reindex index pgbench_accounts_pkey;"

Now, with a maintenance_work_mem of 5MB, the most recent revision of my
patch takes about 54.2 seconds to complete this, as compared to master's
44.4 seconds. So, clearly a noticeable regression there of just under 20%.
I did not see a regression with a 5MB maintenance_work_mem when pgbench
scale was 100, though.

And, with the default maintenance_work_mem of 64MB, it's a totally
different story -- my patch takes about 28.3 seconds, whereas master takes
48.5 seconds (i.e. longer than with 5MB). My patch needs a 56-way final
merge with the 64MB maintenance_work_mem case, and 47 distinct merge
steps, plus a final on-the-fly merge for the 5MB maintenance_work_mem
case. So, a huge amount of merging, but RS still hardly pays for itself.

With the regressed case for my patch, we finish sorting *runs* about 15
seconds in to a 54.2 second operation -- very early.
So it isn't "quicksort vs replacement selection" so much as "polyphase
merge vs replacement selection". There is a good reason to think that we
can make progress on fixing that regression by doubling down on the
general strategy of improving cache characteristics, and being cleverer
about memory use during non-final merging, too.

I looked at what it would take to make the heap a smaller part of
memtuples, along the lines Robert and I talked about, and I think it's
non-trivial because it needs to make the top of the heap something other
than memtuples[0]. I'd need to change the heap code, which already has 3
reasons for existing (RS, merging, and top-N heap). I'll find it really
hard to justify the effort, and especially the risk of adding bugs, for a
benefit that there is *scant* evidence for.

My guess is that the easiest, and most sensible way to fix the ~20%
regression seen here is to introduce batch memory allocation to non-final
merge steps, which is where most time was spent. (For simplicity, that
currently only happens during the final merge phase, but I could revisit
that -- seems not that hard.)

Now, I accept that the cost model has to go. So, what I think would be
best is if we still added a GUC, like the replacement_sort_mem suggestion
that Robert made. This would be a threshold for using what is currently
called "quicksort with spillover". There'd be no cost model. Jeff Janes
also suggested something like this.

The only regression that I find concerning is the one reported by Jeff
Janes [1]. That didn't even involve a correlation, though, so no reason to
think that it would be at all helped by what Robert and I talked about. It
seemed like the patch happened to have the effect of tickling a
pre-existing problem with polyphase merge -- what Jeff called an
"anti-sweetspot". Jeff had a plausible theory for why that is. So, what if
we try to fix polyphase merge? That would be easier.
We could look at the tape buffer size, and the number of tapes, as well as
memory access patterns. We might even make more fundamental changes to
polyphase merge, since we don't use the more advanced variant: Knuth
suggests that his algorithm 5.4.3, cascade merge, has more efficient
distribution of runs. (I think correlation is a red herring here.)

The bottom line is that there will always be some regression somewhere.
I'm not sure what the guiding principle for when that becomes unacceptable
is, but you did seem sympathetic to the idea that really low work_mem
settings (e.g. 4MB) with really big inputs were not too concerning [2].
I'm emphasizing Jeff's case now because I, like you [2], am much more
worried about maintenance_work_mem default cases with regressions than
anything else, and that *was* such a case. Like Jeff Janes, I don't care
about his other regression of about 5% [3], which involved a 4MB work_mem
+ 100 million tuples. The other case (the one I do care about) was 64MB +
400 million tuples, and was a much bigger regression, which is suggestive
of the unpredictable
Re: [HACKERS] WIP: SCRAM authentication
On Mon, Feb 15, 2016 at 10:51 AM, Stephen Frost wrote:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> Stephen Frost writes:
>>> Why do we need pg_shadow or pg_user or related views at all..?
>>
>> A lot of code looks at those just to get usernames. I am not in favor of
>> breaking such stuff without need.
>
> Alright.
>
>> How about we just say that the password in these old views always reads
>> out as '' even when there is a password, and we invent new views
>> that carry real auth information? (Hopefully in an extensible way.)
>
> I'd be alright with that approach, I'd just rather that any clients
> which actually want to read the password field be updated to look at the
> extensible and sensible base catalogs, and not some hacked up array that
> we shoved into that field.

Well, then let's mask it, and just have pg_auth_verifiers.

Another possible problem that I can see with this patch is what do we do
with valid_until? The last set of patches sent did not switch this field
to be per-verifier settable. I would consider it a saner approach to keep
things simple and still do that.

Allowing multiple verifiers per protocol is a problem, and having a
solution for it would be nice. Should this be prioritized before having
more protocols like SCRAM?

FWIW, browsing through pgbouncer, it has a look at pg_shadow for users'
passwords to build a basic configuration file. (My mistake: while pg_user
is world-readable, that's not the case for pg_shadow.)
-- 
Michael
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: SCRAM authentication
Stephen Frost writes:
> * Tom Lane (t...@sss.pgh.pa.us) wrote:
>> How about we just say that the password in these old views always reads
>> out as '' even when there is a password, and we invent new views
>> that carry real auth information? (Hopefully in an extensible way.)
>
> I'd be alright with that approach, I'd just rather that any clients
> which actually want to read the password field be updated to look at the
> extensible and sensible base catalogs, and not some hacked up array that
> we shoved into that field.

Yeah, I'm good with that. I just don't think the breakage needs to extend
to clients that aren't trying to read auth-related information.

BTW, if we haven't learned this lesson by now: I'm pretty sure that every
single one of these views is an attempt to emulate what *used* to be the
real base catalog, in some previous release. Maybe we should stop
expecting clients to read the real catalog, ever, in favor of a sanitized
view? Although I don't know exactly what that would lead to in terms of
what we'd expose that's different from what the base catalog is. But it's
worth thinking about whether there is a way to avoid having this same
discussion again in five or ten years.

			regards, tom lane
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] WIP: SCRAM authentication
* Michael Paquier (michael.paqu...@gmail.com) wrote:
> On Mon, Feb 15, 2016 at 10:23 AM, Stephen Frost wrote:
>> I would start by pointing out that pg_user currently uses pg_shadow..
>> Why do we need pg_shadow or pg_user or related views at all..?
>
> pg_user/pg_shadow have the advantage to be world-readable and mask
> password values.

New views would have that same advantage, should we implement them that
way. Tom's approach is also workable though, where we make the existing
views have a reduced charter, which is mainly around providing user lists,
and simply not include any info about password verifiers or the like.

Thanks!

Stephen

signature.asc
Description: Digital signature
Re: [HACKERS] WIP: SCRAM authentication
* Tom Lane (t...@sss.pgh.pa.us) wrote: > Stephen Frost writes: > > Why do we need pg_shadow or pg_user or related views at all..? > > A lot of code looks at those just to get usernames. I am not in favor of > breaking such stuff without need. Alright. > How about we just say that the password in these old views always reads > out as '' even when there is a password, and we invent new views > that carry real auth information? (Hopefully in an extensible way.) I'd be alright with that approach, I'd just rather that any clients which actually want to read the password field be updated to look at the extensible and sensible base catalogs, and not some hacked up array that we shoved into that field. Thanks! Stephen
Re: [HACKERS] WIP: SCRAM authentication
On Mon, Feb 15, 2016 at 10:23 AM, Stephen Frost wrote: > I would start by pointing out that pg_user currently uses pg_shadow.. > Why do we need pg_shadow or pg_user or related views at all..? pg_user/pg_shadow have the advantage to be world-readable and mask password values. -- Michael
Re: [HACKERS] WIP: SCRAM authentication
Stephen Frost writes: > Why do we need pg_shadow or pg_user or related views at all..? A lot of code looks at those just to get usernames. I am not in favor of breaking such stuff without need. How about we just say that the password in these old views always reads out as '' even when there is a password, and we invent new views that carry real auth information? (Hopefully in an extensible way.) regards, tom lane
Re: [HACKERS] WIP: SCRAM authentication
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > It seems to me that applications are going to need a refresh anyway... Indeed. > Among the other possibilities I can foresee: > - Add a column "protocol" to pg_shadow and produce one row per > protocol, so one user will be listed for all the protocol it has. Any > application could then filter out things with an additional WHERE > clause. > - Nuke passwd from pg_shadow and have a new view pg_shadow_verifiers > made of the user OID, the protocol and the verifier. This maps quite > well with pg_auth_verifiers. > - Give up and nuke pg_shadow, which is here for compatibility down to > 8.1, and add a protocol column to pg_user, or even better create a new > view pg_user_verifiers that has all the data of all the protocols. If > we care a lot about backward-compatibility, pg_user could as well map > with pg_auth_verifiers with the md5 protocol. > I would go with the last one. I would start by pointing out that pg_user currently uses pg_shadow.. Why do we need pg_shadow or pg_user or related views at all..? Applications will need to be updated, we might as well simply nuke them and expect applications to use the new catalogs. Perhaps there is a useful view or two which we can provide over the new catalogs, but I'd rather consider how to create brand new, useful, views over the new catalogs than consider any kind of way to provide backwards compatible-ish views. Thanks! Stephen
Re: [HACKERS] WIP: SCRAM authentication
On Mon, Feb 15, 2016 at 9:56 AM, Stephen Frost wrote: > * Michael Paquier (michael.paqu...@gmail.com) wrote: >> We'd need as well to switch pg_shadow to have an array of elements >> made of protocol:identifier instead of a single password field. There >> can be only one valid identifier per protocol, user and valid_until >> for a single point in time, and I can't believe that we should >> recommend only one authentication protocol per single major version of >> Postgres. > > Ugh, that sounds pretty grotty to me. > > Applications which consider these fields will need to be updated, one > way or the other, and I'd much rather they be updated to work with > reasonable structures rather than something we've hacked together in > some faint hope that it'd be useful. An array in pg_shadow for a field > which used to be a text field does *not* sound like a simpler solution > to me, and I'd rather simply do away with those views entirely, or at > least nuke the fields which are at issue, than try to come up with > something between wholesale change and no change that ends up being > worse than both. It seems to me that applications are going to need a refresh anyway... Among the other possibilities I can foresee: - Add a column "protocol" to pg_shadow and produce one row per protocol, so one user will be listed for all the protocol it has. Any application could then filter out things with an additional WHERE clause. - Nuke passwd from pg_shadow and have a new view pg_shadow_verifiers made of the user OID, the protocol and the verifier. This maps quite well with pg_auth_verifiers. - Give up and nuke pg_shadow, which is here for compatibility down to 8.1, and add a protocol column to pg_user, or even better create a new view pg_user_verifiers that has all the data of all the protocols. If we care a lot about backward-compatibility, pg_user could as well map with pg_auth_verifiers with the md5 protocol. I would go with the last one. 
-- Michael
Re: [HACKERS] WIP: SCRAM authentication
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > On Mon, Feb 15, 2016 at 9:17 AM, Stephen Frost wrote: > > That said, per various discussions, we'd really want a more-or-less > > fully formed patch to land prior to the last CF, for this to have any > > chance. Perhaps that means it's not going to happen, which would be > > unfortunate, but it's not beyond the possible, in my view, at least. > > Well, I could send a rebased patch with the new things proposed > upthread, and with things split in as many patches as I can get out, > roughly: > 1) One patch for the catalog split > 2) One for the GUC param controlling recommended protocols > 3) One for the pg_upgrade function cleaning up automatically the > entries of unsupported protocols > 4) SCRAM on top of the rest, which is at more or less 75% something > that Heikki produced. That sounds like a pretty reasonable split, to me at least. > >> > In addition, I would prefer to maintain the current structure of the > >> > pg_authid table and use the rolpassword and rolvaliduntil columns to > >> > store only the md5 verifier which will also be stored in > >> > pg_auth_verifiers. This would provide a smoother migration path with > >> > the idea that rolpassword and rolvaliduntil will be removed from > >> > pg_authid in a future version of Postgres. > >> > >> Actually, I am of the opinion that both rolpassword and rolvaliduntil > >> should be directly part of pg_auth_verifiers. Being able to handle > >> multiple verifiers for the same protocol is a feature that is being > >> asked for with a given password handling policy (was pinged again > >> about that in Moscow last week). Rolling in new verifiers needs now > >> extra roles to be created. > > > > I'm on Michael's side here. I don't believe it makes sense to try and > > maintain the exact structure of pg_authid. 
We are certainly happy to > > whack around the other catalogs, and I'm unimpressed with my prior > > efforts to provide backwards-compatible catalog (see pg_user, et al) for > > just a few releases- we seem unable to get rid of them now, even though > > we should have years ago, really. Indeed, I'd be just as happy if we > > got rid of them during this work.. > > We'd need as well to switch pg_shadow to have an array of elements > made of protocol:identifier instead of a single password field. There > can be only one valid identifier per protocol, user and valid_until > for a single point in time, and I can't believe that we should > recommend only one authentication protocol per single major version of > Postgres. Ugh, that sounds pretty grotty to me. Applications which consider these fields will need to be updated, one way or the other, and I'd much rather they be updated to work with reasonable structures rather than something we've hacked together in some faint hope that it'd be useful. An array in pg_shadow for a field which used to be a text field does *not* sound like a simpler solution to me, and I'd rather simply do away with those views entirely, or at least nuke the fields which are at issue, than try to come up with something between wholesale change and no change that ends up being worse than both. Thanks! Stephen
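The pg_auth_verifiers catalog being argued for in this exchange, with one verifier per (role, protocol) pair and valid_until moved in alongside, can be sketched in miniature. Everything below (field names, type stand-ins, protocol codes, the lookup helper) is an assumption for illustration, not the actual patch:

```c
/* Hypothetical sketch of the proposed pg_auth_verifiers catalog discussed
 * above: one row per (role, protocol), each with its own valid-until.
 * All names, types, and protocol codes here are assumptions, not the
 * actual patch. */
#include <assert.h>
#include <string.h>
#include <time.h>

typedef unsigned int Oid;

typedef struct AuthVerifier
{
    Oid         roleid;         /* role this verifier belongs to */
    char        protocol;       /* assumed codes: 'm' = md5, 's' = scram */
    const char *verifier;       /* stored verifier string */
    time_t      valid_until;    /* 0 = no expiry */
} AuthVerifier;

/* Return the verifier for a (role, protocol) pair, or NULL if absent:
 * roughly the lookup a backend would do during authentication. */
static const char *
find_verifier(const AuthVerifier *tab, int n, Oid roleid, char protocol)
{
    for (int i = 0; i < n; i++)
        if (tab[i].roleid == roleid && tab[i].protocol == protocol)
            return tab[i].verifier;
    return NULL;
}
```

Under this shape, supporting a new authentication protocol means adding rows, not widening a text column or packing arrays into pg_shadow.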
Re: [HACKERS] WIP: SCRAM authentication
On Mon, Feb 15, 2016 at 9:17 AM, Stephen Frost wrote: > Michael, > > * Michael Paquier (michael.paqu...@gmail.com) wrote: >> On Sat, Feb 13, 2016 at 3:05 AM, David Steele wrote: >> > On 11/16/15 8:53 AM, Michael Paquier wrote: >> >> On Sat, Sep 5, 2015 at 9:31 AM, Bruce Momjian wrote: >> >>> On Fri, Sep 4, 2015 at 04:51:33PM -0400, Stephen Frost wrote: >> > Coming in late, but can you explain how multiple passwords allow for >> > easier automated credential rotation? If you have five applications >> > with stored passwords, I imagine you can't change them all at once, so >> > with multiples you could change it on one, then go to the others and >> > change it there, and finally, remove the old password. Is that the >> > process? I am not realizing that without multiple passwords, this is >> > a >> > hard problem. >> That's exactly the process if multiple passwords can be used. If >> there's only one account and one password supported then you have to >> change all the systems all at once and that certainly can be a hard >> problem. >> >> One way to deal with this is to have a bunch of different accounts, but >> that's certainly not simple either and can get quite painful. >> >>> OK, for me, if we can explain the benefit for users, it seems worth >> >>> doing just to allow that. >> >> Reviving an old thread for a patch still registered in this commit >> >> fest to make the arguing move on. >> > >> > I was wondering if this patch was going to be submitted for the 2016-03 CF? >> >> For 9.6 I am afraid this is too late, per the rule that there cannot >> be huge patches for the last CF of a development cycle. But I have >> plans for this set of features afterwards with the first CF of 9.7 and >> I was planning to talk about it at PgCon's unconference if I can get >> there to gather some feedback. There is still cruel need for it on my >> side.. 
> > There's a lot of good reason to get SCRAM added as a protocol, > considering our current password-based implementation is rather.. > lacking. Regarding the specific comment about the timing, that rule is > specifically to prevent large patches from landing in the last CF > without any prior discussion or review, as I recall, so I'm not sure it > really applies here as there's been quite a bit of discussion and review > already. Honestly I don't know what to answer to that. > That said, per various discussions, we'd really want a more-or-less > fully formed patch to land prior to the last CF, for this to have any > chance. Perhaps that means it's not going to happen, which would be > unfortunate, but it's not beyond the possible, in my view, at least. Well, I could send a rebased patch with the new things proposed upthread, and with things split in as many patches as I can get out, roughly: 1) One patch for the catalog split 2) One for the GUC param controlling recommended protocols 3) One for the pg_upgrade function cleaning up automatically the entries of unsupported protocols 4) SCRAM on top of the rest, which is at more or less 75% something that Heikki produced. >> > In addition, I would prefer to maintain the current structure of the >> > pg_authid table and use the rolpassword and rolvaliduntil columns to >> > store only the md5 verifier which will also be stored in >> > pg_auth_verifiers. This would provide a smoother migration path with >> > the idea that rolpassword and rolvaliduntil will be removed from >> > pg_authid in a future version of Postgres. >> >> Actually, I am of the opinion that both rolpassword and rolvaliduntil >> should be directly part of pg_auth_verifiers. Being able to handle >> multiple verifiers for the same protocol is a feature that is being >> asked for with a given password handling policy (was pinged again >> about that in Moscow last week). Rolling in new verifiers needs now >> extra roles to be created. 
> > I'm on Michael's side here. I don't believe it makes sense to try and > maintain the exact structure of pg_authid. We are certainly happy to > whack around the other catalogs, and I'm unimpressed with my prior > efforts to provide backwards-compatible catalog (see pg_user, et al) for > just a few releases- we seem unable to get rid of them now, even though > we should have years ago, really. Indeed, I'd be just as happy if we > got rid of them during this work.. We'd need as well to switch pg_shadow to have an array of elements made of protocol:identifier instead of a single password field. There can be only one valid identifier per protocol, user and valid_until for a single point in time, and I can't believe that we should recommend only one authentication protocol per single major version of Postgres. -- Michael
Re: [HACKERS] WIP: SCRAM authentication
Michael, * Michael Paquier (michael.paqu...@gmail.com) wrote: > On Sat, Feb 13, 2016 at 3:05 AM, David Steele wrote: > > On 11/16/15 8:53 AM, Michael Paquier wrote: > >> On Sat, Sep 5, 2015 at 9:31 AM, Bruce Momjian wrote: > >>> On Fri, Sep 4, 2015 at 04:51:33PM -0400, Stephen Frost wrote: > > Coming in late, but can you explain how multiple passwords allow for > > easier automated credential rotation? If you have five applications > > with stored passwords, I imagine you can't change them all at once, so > > with multiples you could change it on one, then go to the others and > > change it there, and finally, remove the old password. Is that the > > process? I am not realizing that without multiple passwords, this is a > > hard problem. > That's exactly the process if multiple passwords can be used. If > there's only one account and one password supported then you have to > change all the systems all at once and that certainly can be a hard > problem. > > One way to deal with this is to have a bunch of different accounts, but > that's certainly not simple either and can get quite painful. > >>> OK, for me, if we can explain the benefit for users, it seems worth > >>> doing just to allow that. > >> Reviving an old thread for a patch still registered in this commit > >> fest to make the arguing move on. > > > > I was wondering if this patch was going to be submitted for the 2016-03 CF? > > For 9.6 I am afraid this is too late, per the rule that there cannot > be huge patches for the last CF of a development cycle. But I have > plans for this set of features afterwards with the first CF of 9.7 and > I was planning to talk about it at PgCon's unconference if I can get > there to gather some feedback. There is still cruel need for it on my > side.. There's a lot of good reason to get SCRAM added as a protocol, considering our current password-based implementation is rather.. lacking. 
Regarding the specific comment about the timing, that rule is specifically to prevent large patches from landing in the last CF without any prior discussion or review, as I recall, so I'm not sure it really applies here as there's been quite a bit of discussion and review already. That said, per various discussions, we'd really want a more-or-less fully formed patch to land prior to the last CF, for this to have any chance. Perhaps that means it's not going to happen, which would be unfortunate, but it's not beyond the possible, in my view, at least. > > In addition, I would prefer to maintain the current structure of the > > pg_authid table and use the rolpassword and rolvaliduntil columns to > > store only the md5 verifier which will also be stored in > > pg_auth_verifiers. This would provide a smoother migration path with > > the idea that rolpassword and rolvaliduntil will be removed from > > pg_authid in a future version of Postgres. > > Actually, I am of the opinion that both rolpassword and rolvaliduntil > should be directly part of pg_auth_verifiers. Being able to handle > multiple verifiers for the same protocol is a feature that is being > asked for with a given password handling policy (was pinged again > about that in Moscow last week). Rolling in new verifiers needs now > extra roles to be created. I'm on Michael's side here. I don't believe it makes sense to try and maintain the exact structure of pg_authid. We are certainly happy to whack around the other catalogs, and I'm unimpressed with my prior efforts to provide backwards-compatible catalog (see pg_user, et al) for just a few releases- we seem unable to get rid of them now, even though we should have years ago, really. Indeed, I'd be just as happy if we got rid of them during this work.. Thanks! Stephen
Re: [HACKERS] Bool btree_gin index not chosen on equality constraint, but on greater/lower?
Patric Bechtel writes: > Tom Lane wrote on 14.02.2016 at 17:51: >> I think your problem is that the planner won't apply >> match_boolean_index_clause() or >> expand_boolean_index_clause(), because it has a hard-wired idea of which >> index opclasses could >> possibly benefit from that, cf IsBooleanOpfamily(). > If someone might give me a hint where to look, I'd be grateful. In principle, instead of using the hard-wired IsBooleanOpfamily test, we could check "op_in_opfamily(BooleanEqualOperator, opfamily)" to see whether those transforms would produce something that works with the index. The reason I didn't do it that way is that those tests are made for every WHERE clause the planner ever considers, so it seemed like adding a syscache lookup that is usually going to accomplish nothing would be pretty annoying overhead. The trick to getting an acceptable patch here would be to figure out how to arrange things to not cost much when the optimization doesn't apply. One idea worth exploring is to redefine IsBooleanOpfamily to inspect the index's AM OID as well as the opfamily, and code it along the lines of ((relam) == BTREE_AM_OID ? (opfamily) == BOOL_BTREE_FAM_OID : (relam) == HASH_AM_OID ? (opfamily) == BOOL_HASH_FAM_OID : op_in_opfamily(BooleanEqualOperator, opfamily)) so that the extra syscache lookup doesn't have to happen at all for btree and hash indexes. But I doubt that moves the needle far enough. Another idea is to try to rearrange match_clause_to_indexcol and expand_indexqual_conditions so that IsBooleanOpfamily() doesn't get hit quite so easily. I don't remember at the moment exactly why that's the first test rather than something further down. Probably at least part of the reason is an assumption that IsBooleanOpfamily() is cheap. 
It's possible that match_boolean_index_clause() is actually cheaper than the full-fledged version of IsBooleanOpfamily() and so the order of those tests should be reversed; moving them down to after other tests might also be appropriate. Another line of thought is to try to cache the result of the IsBooleanOpfamily() test in the IndexOptInfo structs so it doesn't have to get done so many times. But that would be more invasive and I'm not sure that it helps in simple cases. regards, tom lane
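Written out as formatted C, the redefinition Tom sketches inline above looks like the following. The pg_am OIDs and the two hard-wired boolean opfamily OIDs are as I recall them from the catalogs, op_in_opfamily() is stubbed out (the real one does the syscache lookup being discussed), and the GIN opfamily OID is invented, so treat this as a sketch of the short-circuit rather than actual planner code:

```c
/* Restatement of the IsBooleanOpfamily() idea sketched above: btree and
 * hash keep their cheap hard-wired comparisons, and only other AMs (e.g.
 * GIN) pay for the opfamily lookup.  OID values and the stub lookup are
 * stand-ins for the real catalog machinery. */
#include <assert.h>
#include <stdbool.h>

enum
{
    BTREE_AM_OID = 403,             /* pg_am OIDs, as I recall them */
    HASH_AM_OID = 405,
    GIN_AM_OID = 2742,
    BOOL_BTREE_FAM_OID = 424,       /* hard-wired boolean opfamilies */
    BOOL_HASH_FAM_OID = 2222,
    BooleanEqualOperator = 91,
    FAKE_GIN_BOOL_FAM_OID = 9999    /* invented for this example */
};

/* Stub for the syscache lookup: pretend only our fake GIN opfamily
 * contains the boolean '=' operator. */
static bool
op_in_opfamily(int opno, int opfamily)
{
    return opno == BooleanEqualOperator && opfamily == FAKE_GIN_BOOL_FAM_OID;
}

#define IsBooleanOpfamily(relam, opfamily) \
    ((relam) == BTREE_AM_OID ? (opfamily) == BOOL_BTREE_FAM_OID : \
     (relam) == HASH_AM_OID ? (opfamily) == BOOL_HASH_FAM_OID : \
     op_in_opfamily(BooleanEqualOperator, (opfamily)))
```

The point of the conditional ordering is that for btree and hash, the two common cases, the expression reduces to an integer comparison and the lookup is never evaluated.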
Re: [HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
Robert Haas writes: > On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane wrote: >> pgprocno of the specific PGPROC, or of the group leader? If it's >> the former, I'm pretty suspicious that there are deadlock and/or >> linked-list-corruption hazards here. If it's the latter, at least >> the comments around this are misleading. > Leader. The other way would be nuts. ... and btw, either BecomeLockGroupMember() has got this backwards, or I'm still fundamentally confused. Also, after fixing that it would be good to add a comment explaining why it's not fundamentally unsafe for BecomeLockGroupMember() to examine leader->pgprocno without having any relevant lock. AFAICS, that's only okay because the pgprocno fields are never changed after InitProcGlobal, even when a PGPROC is recycled. regards, tom lane
Re: [HACKERS] Bool btree_gin index not chosen on equality constraint, but on greater/lower?
Hi Tom, Tom Lane wrote on 14.02.2016 at 17:51: > Patric Bechtel writes: >> I tried to add bool support to the btree_gin contrib module, and as far as I >> can tell, it >> seems to work (wasn't that complicated, actually). But now I'm stuck, as >> PostgreSQL doesn't >> seem to like to use my new index, if I use equality or unequality, just with >> greater and >> lower than. > > I think your problem is that the planner won't apply > match_boolean_index_clause() or > expand_boolean_index_clause(), because it has a hard-wired idea of which > index opclasses could > possibly benefit from that, cf IsBooleanOpfamily(). oh, sh*t... My motivation was the size of the bool indexes; they are tiny and really fast. It feels almost like bitmap indexes. I hope that's not too far over my head already... but I'll take a look. If someone might give me a hint where to look, I'd be grateful. Thanks a lot for the hint, Patric
Re: [HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
Robert Haas writes: > On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane wrote: >> Robert Haas writes: >>> ... the lock manager lock that protects the fields in a >>> given PGPROC is chosen by taking pgprocno modulo the number of lock >>> manager partitions. >> pgprocno of the specific PGPROC, or of the group leader? If it's >> the former, I'm pretty suspicious that there are deadlock and/or >> linked-list-corruption hazards here. If it's the latter, at least >> the comments around this are misleading. > Leader. The other way would be nuts. On closer inspection, that's actually still somewhat problematic, because that essentially means that no one can inspect another backend's lockGroupLeader etc fields unless they take *all* the lock partition LWLocks first. You can't take just the relevant one because you don't know which one that is, at least not without dereferencing lockGroupLeader which is entirely unsafe if you haven't got the appropriate lock. This isn't a problem for members of the lock group, since they already know who the leader is and hence which partition lock to use. And it's not a problem for the deadlock detector either since it'll take all those partition locks anyway. But it makes life difficult for status inspection code. I'm finding that the only way to write a lock-group-aware function that tells whether A is blocked by B is to hold both the ProcArrayLock and *all* of the lock partition locks throughout inspection of the data structures. I'd hoped to be able to restrict it to locking just the partition holding the lock A is blocked on, but that ain't working. Maybe this is all okay, but it makes me nervous that the data structures may have been over-optimized. regards, tom lane
Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns
On 14/02/2016 5:11 pm, "Tom Lane" wrote: > > David Rowley writes: > > On 12/02/2016 12:01 am, "Tom Lane" wrote: > > I can't quite understand what you're seeing here. > > The second loop is iterating through the original GROUP BY list, so it > will see again any outer Vars that were excluded by the first loop. > It needs to exclude them exactly the same, because they are outside > the scope of its data structures. Consider something like (admittedly > pretty silly, but legal SQL) > > create table up (u1 int, u2 int, u3 int); > create table down (f1 int primary key, f2 int); > > select * from othertable, up > where u1 in (select f2 from down group by f1, f2, up.u3); > > up.u3 would have varlevelsup = 1, varno = 2, varattno = 3. > If you don't skip it then the surplusvars[var->varno] access > will be trying to fetch off the end of the surplusvars[] array, > because there is only one RTE in the subquery's rangetable > though there are two in the outer query's rangetable. Thanks for explaining this. Clearly I missed the case of the varno pointing off the end of the array. Thanks for noticing and fixing.
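The skip Tom describes, ignoring outer-level Vars so that surplusvars[var->varno] never indexes beyond the subquery's own rangetable, can be sketched with simplified stand-in structures (the real code walks the parse tree's GROUP BY list; everything below is illustrative, not the actual planner types):

```c
/* Sketch of the guard discussed above: both passes over the GROUP BY list
 * must ignore Vars with varlevelsup != 0, because an outer Var's varno
 * indexes the *outer* query's rangetable and may exceed this query's
 * surplusvars[] array.  Simplified stand-ins for the planner structures. */
#include <assert.h>

typedef struct Var
{
    int varno;          /* range table index (1-based) */
    int varattno;       /* attribute number within that RTE */
    int varlevelsup;    /* 0 = this query level, >0 = outer query */
} Var;

/* Count the GROUP BY Vars that are safe to look up in surplusvars[],
 * which has one slot per RTE of this query level only. */
static int
count_local_groupby_vars(const Var *groupby, int n, int n_local_rtes)
{
    int count = 0;

    for (int i = 0; i < n; i++)
    {
        if (groupby[i].varlevelsup != 0)
            continue;           /* outer Var: outside our data structures */
        assert(groupby[i].varno >= 1 && groupby[i].varno <= n_local_rtes);
        count++;
    }
    return count;
}
```

With Tom's example, the subquery over "down" has a single RTE, while up.u3 carries varlevelsup = 1 and varno = 2; without the continue, the bounds assertion (standing in for the out-of-bounds array access) would fire.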
[HACKERS] Refactoring of receivelog.c
I was working on adding the tar streaming functionality we talked about at the developer meeting to pg_basebackup, and rapidly ran across the issue that Andres has been complaining about for a while. The code in receivelog.c just passes an insane number of parameters around. Adding or changing even a small thing ends up touching a huge number of places. Here's an attempt to refactor the code to instead pass around a control structure. I think it's a definite win already now, and we can't just keep adding new functionality on top of the current one. I'll proceed to work on the actual functionality I was working on to go on top of this separately, but would appreciate a review of this part independently. It's mostly mechanical, but there may definitely be mistakes - or thinkos in the whole idea... -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ *** a/src/bin/pg_basebackup/pg_basebackup.c --- b/src/bin/pg_basebackup/pg_basebackup.c *** *** 372,381 typedef struct static int LogStreamerMain(logstreamer_param *param) { ! if (!ReceiveXlogStream(param->bgconn, param->startptr, param->timeline, ! param->sysidentifier, param->xlogdir, ! reached_end_position, standby_message_timeout, ! NULL, false, true)) /* * Any errors will already have been reported in the function process, --- 372,391 static int LogStreamerMain(logstreamer_param *param) { ! StreamCtl stream; ! ! MemSet(&stream, sizeof(stream), 0); ! stream.startpos = param->startptr; ! stream.timeline = param->timeline; ! stream.sysidentifier = param->sysidentifier; ! stream.stream_stop = reached_end_position; ! stream.standby_message_timeout = standby_message_timeout; ! stream.synchronous = false; ! stream.mark_done = true; ! stream.basedir = param->xlogdir; ! stream.partial_suffix = NULL; ! ! 
if (!ReceiveXlogStream(param->bgconn, &stream)) /* * Any errors will already have been reported in the function process, *** a/src/bin/pg_basebackup/pg_receivexlog.c --- b/src/bin/pg_basebackup/pg_receivexlog.c *** *** 276,285 FindStreamingStart(uint32 *tli) static void StreamLog(void) { ! XLogRecPtr startpos, ! serverpos; ! TimeLineID starttli, ! servertli; /* * Connect in replication mode to the server --- 276,286 static void StreamLog(void) { ! XLogRecPtr serverpos; ! TimeLineID servertli; ! StreamCtl stream; ! ! MemSet(&stream, 0, sizeof(stream)); /* * Connect in replication mode to the server *** *** 311,327 StreamLog(void) /* * Figure out where to start streaming. */ ! startpos = FindStreamingStart(&starttli); ! if (startpos == InvalidXLogRecPtr) { ! startpos = serverpos; ! starttli = servertli; } /* * Always start streaming at the beginning of a segment */ ! startpos -= startpos % XLOG_SEG_SIZE; /* * Start the replication --- 312,328 /* * Figure out where to start streaming. */ ! stream.startpos = FindStreamingStart(&stream.timeline); ! if (stream.startpos == InvalidXLogRecPtr) { ! stream.startpos = serverpos; ! stream.timeline = servertli; } /* * Always start streaming at the beginning of a segment */ ! stream.startpos -= stream.startpos % XLOG_SEG_SIZE; /* * Start the replication *** *** 329,340 StreamLog(void) if (verbose) fprintf(stderr, _("%s: starting log streaming at %X/%X (timeline %u)\n"), ! progname, (uint32) (startpos >> 32), (uint32) startpos, ! starttli); ! ReceiveXlogStream(conn, startpos, starttli, NULL, basedir, ! stop_streaming, standby_message_timeout, ".partial", ! synchronous, false); PQfinish(conn); conn = NULL; --- 330,346 if (verbose) fprintf(stderr, _("%s: starting log streaming at %X/%X (timeline %u)\n"), ! progname, (uint32) (stream.startpos >> 32), (uint32) stream.startpos, ! stream.timeline); ! ! stream.stream_stop = stop_streaming; ! stream.standby_message_timeout = standby_message_timeout; ! stream.synchronous = synchronous; ! 
stream.mark_done = false; ! stream.basedir = basedir; ! stream.partial_suffix = ".partial"; ! ReceiveXlogStream(conn, &stream); PQfinish(conn); conn = NULL; *** a/src/bin/pg_basebackup/receivelog.c --- b/src/bin/pg_basebackup/receivelog.c *** *** 33,59 static XLogRecPtr lastFlushPosition = InvalidXLogRecPtr; static bool still_sending = true; /* feedback still needs to be sent? */ ! static PGresult *HandleCopyStream(PGconn *conn, XLogRecPtr startpos, ! uint32 timeline, char *basedir, ! stream_stop_callback stream_stop, int standby_message_timeout, ! char *partial_suffix, XLogRecPtr *stoppos, ! bool synchronous, bool mark_done); static int CopyStreamPoll(PGconn *conn, long timeout_ms); static int CopyStreamReceive(PGconn *conn, long t
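The shape of the refactoring is easier to see outside the diff: the long ReceiveXlogStream() parameter list becomes a single control struct that callers zero out and fill in. A standalone sketch follows, with field names taken from the diff but types replaced by stand-in typedefs (so it is illustrative, not the patch itself):

```c
/* Sketch of the StreamCtl pattern from the patch above: callers zero the
 * struct, set the fields they care about, and pass one pointer instead of
 * ten positional arguments.  Types are simplified stand-ins. */
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef unsigned long long XLogRecPtr;
typedef unsigned int TimeLineID;
typedef bool (*stream_stop_callback) (XLogRecPtr segendpos, TimeLineID tli,
                                      bool segment_finished);

typedef struct StreamCtl
{
    XLogRecPtr  startpos;       /* where to start streaming */
    TimeLineID  timeline;       /* timeline to stream */
    const char *sysidentifier;  /* expected system identifier, or NULL */
    stream_stop_callback stream_stop;   /* when to stop streaming */
    int         standby_message_timeout;
    bool        synchronous;    /* flush after each write? */
    bool        mark_done;      /* mark segments done when finished */
    const char *basedir;        /* output directory */
    const char *partial_suffix; /* e.g. ".partial", or NULL */
} StreamCtl;
```

One practical benefit, visible in the diff hunks themselves: adding a field (such as the tar-streaming option mentioned at the top of the mail) touches only the struct and the call sites that care, instead of every caller of ReceiveXlogStream().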
Re: [HACKERS] pl/pgSQL, get diagnostics and big data
* Andreas 'ads' Scherbaum wrote: Attached is a new version of the patch, with %lu replaced by %zu. I re-ran all the tests, especially the long test with 2^32+x rows, and it produces the same result as before. To paraphrase Twain: "Sire, the Board finds this patch perfect in all the requirements and qualifications for inclusion into core, and doth hold his case open for decision after due examination by his committer." The Windows issue is corrected, and all regression tests pass on Windows and FreeBSD. I can find no further fault with this patch. Sorry it took so long, my other PostgreSQL issue happened just when I was going to test the updated patch. -- Christian
Re: [HACKERS] Freeze avoidance of very large table.
On Sun, Feb 14, 2016 at 12:19 AM, Masahiko Sawada wrote: > Thank you for reviewing this patch. > I've divided 000 patch into two patches, and attached latest 4 patches in > total. Thank you! I'll go through this again as soon as I have a free moment. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] extend pgbench expressions with functions
On Sun, Feb 14, 2016 at 11:28 AM, Fabien COELHO wrote: > Here is a 3 part v29: > > a) add support for integer functions in pgbench, including turning >operators as functions, as well as some minimal infrastructure for >additional types (this allows to minimize the diffs with the next >patch which adds double). > > b) add support for doubles, including setting double variables. >Note that variable are not explicitely typed. Add some >double-related functions, most interesting of them for pgbench >are the randoms. > > c) remove \setrandom support (as thanks to part b \set x random(...) does >the same). Prints an error pointing to the replacement if \setrandom is >used in a pgbench script. Thanks, I'll review these as soon as I can get to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] Re: [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
On Sun, Feb 14, 2016 at 1:33 PM, Tom Lane wrote: > Robert Haas writes: >> First, the overall concept here is that processes can either be a >> member of a lock group or a member of no lock group. > > Check. > >> Second, I can review the data structure changes and the associated >> invariants. > > The data structure additions seemed relatively straightforward, though > I did wonder why you added groupLeader to PROCLOCK. Why is that not > redundant with tag.myProc->lockGroupLeader? If it isn't redundant, > it's inadequately documented. If it is, seems better to lose it. That's a good question. There are locking considerations: the lock manager lock for a given PROCLOCK isn't necessarily the same one that protects tag.myProc->lockGroupLeader. I can go and look back through the contexts where we use the PROCLOCK field and see if we can get by without this, but I suspect it's too much of a pain. > Also, isn't the comment on this PGPROC field incorrect: > > PGPROC *lockGroupLeader;/* lock group leader, if I'm a follower */ > > ie shouldn't you s/follower/member/ ? Yes, that would be better. >> ... the lock manager lock that protects the fields in a >> given PGPROC is chosen by taking pgprocno modulo the number of lock >> manager partitions. > > pgprocno of the specific PGPROC, or of the group leader? If it's > the former, I'm pretty suspicious that there are deadlock and/or > linked-list-corruption hazards here. If it's the latter, at least > the comments around this are misleading. Leader. The other way would be nuts. >> Each PROCLOCK now has a new groupLeader flag. While a PGPROC's >> lockGroupLeader may be NULL if that process is not involved in group >> locking, a PROCLOCK's groupLeader always has a non-NULL value; it >> points back to the PGPROC that acquired the lock. 
> > This is not what the comment on it says; and your prose explanation > here implies that it should actually be equal to tag.myProc, or else > you are using some definition of "acquired the lock" that I don't > follow at all. There could be lots of PGPROCs that have acquired > a lock in one sense or another; which one do you mean? The PROCLOCK's groupLeader will be equal to tag.myProc->lockGroupLeader unless that is NULL. In that case it will be equal to tag.myProc. Or at least that's the intention, modulo bugs. >> With respect to pg_locks - and for that matter also pg_stat_activity - >> I think you are right that improvement is needed. > > This is really the core of my concern at the moment. I think that > isolationtester is probably outright broken for any situation where the > queries-under-test are being parallel executed, and so will be any other > client that's trying to identify who blocks whom from pg_locks. OK. >> The simplest thing we could do to make that easier is, in >> pg_stat_activity, have parallel workers advertise the PID that >> launched them in a new field; and in pg_locks, have members of a lock >> group advertise the leader's PID in a new field. > > That would be simple for us, but it would break every existing client-side > query that tries to identify blockers/blockees; and not only would those > queries need work but they would become very substantially more complex > and slower (probably at least 4-way joins not 2-way). We already know > that isolationtester's query has performance problems in the buildfarm. > I think more thought is needed here, and that this area should be > considered a release blocker. It's certainly a blocker for any thought > of turning parallel query on by default. I don't quite see why this would cause the joins to get more complex. It's also not really clear what the alternative is. 
You could show all the locks under the leader's PID, but that's horribly confusing because there might be duplicates, and because if somebody's waiting you really want to be able to tell which process to target with gdb. You can show everybody under their own PIDs as now, but then you can't tell which processes are in groups together. So I don't see what there is except to show both values. Maybe there's some whole-new way of displaying this information that would be more clear, but I don't really see what it is. >> I hope this helps and please let me know what more you think I should do. > > At the least you need to make another pass over the comments and README > addition. I highlighted above a few critical inaccuracies, and it's not > hard to find a lot of less-critical typos in the comments in that commit. > Transposing some of what you've written here into the README would likely > be a good thing too. I'm getting on a long plane flight shortly, but I will go back over this when I can find some time to do that in an appropriately careful way. Please feel free to make such corrections to comments and so forth as may seem urgent to you in the meanwhile. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [COMMITTERS] pgsql: Introduce group locking to prevent parallel processes from deadl
Robert Haas writes: > First, the overall concept here is that processes can either be a > member of a lock group or a member of no lock group. Check. > Second, I can review the data structure changes and the associated > invariants. The data structure additions seemed relatively straightforward, though I did wonder why you added groupLeader to PROCLOCK. Why is that not redundant with tag.myProc->lockGroupLeader? If it isn't redundant, it's inadequately documented. If it is, seems better to lose it. Also, isn't the comment on this PGPROC field incorrect: PGPROC *lockGroupLeader;/* lock group leader, if I'm a follower */ ie shouldn't you s/follower/member/ ? > ... the lock manager lock that protects the fields in a > given PGPROC is chosen by taking pgprocno modulo the number of lock > manager partitions. pgprocno of the specific PGPROC, or of the group leader? If it's the former, I'm pretty suspicious that there are deadlock and/or linked-list-corruption hazards here. If it's the latter, at least the comments around this are misleading. > Each PROCLOCK now has a new groupLeader flag. While a PGPROC's > lockGroupLeader may be NULL if that process is not involved in group > locking, a PROCLOCK's groupLeader always has a non-NULL value; it > points back to the PGPROC that acquired the lock. This is not what the comment on it says; and your prose explanation here implies that it should actually be equal to tag.myProc, or else you are using some definition of "acquired the lock" that I don't follow at all. There could be lots of PGPROCs that have acquired a lock in one sense or another; which one do you mean? > With respect to pg_locks - and for that matter also pg_stat_activity - > I think you are right that improvement is needed. This is really the core of my concern at the moment. 
I think that isolationtester is probably outright broken for any situation where the queries-under-test are being parallel executed, and so will be any other client that's trying to identify who blocks whom from pg_locks. > The simplest thing we could do to make that easier is, in > pg_stat_activity, have parallel workers advertise the PID that > launched them in a new field; and in pg_locks, have members of a lock > group advertise the leader's PID in a new field. That would be simple for us, but it would break every existing client-side query that tries to identify blockers/blockees; and not only would those queries need work but they would become very substantially more complex and slower (probably at least 4-way joins not 2-way). We already know that isolationtester's query has performance problems in the buildfarm. I think more thought is needed here, and that this area should be considered a release blocker. It's certainly a blocker for any thought of turning parallel query on by default. > I hope this helps and please let me know what more you think I should do. At the least you need to make another pass over the comments and README addition. I highlighted above a few critical inaccuracies, and it's not hard to find a lot of less-critical typos in the comments in that commit. Transposing some of what you've written here into the README would likely be a good thing too. regards, tom lane
Re: [HACKERS] proposal: enhancing slow query log, and autoexplain log about waiting on lock before query exec time
2016-02-14 17:46 GMT+01:00 Tom Lane : > Pavel Stehule writes: > > We have a patch that injects logs about the time waiting on locks before > > query execution. This feature helps us a lot, and I hope it can be > > generally useful. > > Doesn't log_lock_waits cover that territory already? > It does. But it creates a different log entry, and it can be hard to match a slow query with a log entry that may appear many lines earlier. This proposal is about making the important information available in one place, which keeps log parsing and processing simpler. Regards Pavel > regards, tom lane >
Re: [HACKERS] Bool btree_gin index not chosen on equality contraint, but on greater/lower?
Patric Bechtel writes: > I tried to add bool support to the btree_gin contrib module, and as far as I > can tell, it seems to > work (wasn't that complicated, actually). > But now I'm stuck, as PostgreSQL doesn't seem to like to use my new index, if > I use equality or > unequality, just with greater and lower than. I think your problem is that the planner won't apply match_boolean_index_clause() or expand_boolean_index_clause(), because it has a hard-wired idea of which index opclasses could possibly benefit from that, cf IsBooleanOpfamily(). regards, tom lane
Re: [HACKERS] proposal: enhancing slow query log, and autoexplain log about waiting on lock before query exec time
Pavel Stehule writes: > We have a patch that injects logs about the time waiting on locks before > query execution. This feature helps us a lot, and I hope it can be > generally useful. Doesn't log_lock_waits cover that territory already? regards, tom lane
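For comparison, the existing facility Tom mentions is driven by a couple of GUCs; a postgresql.conf sketch (values illustrative) of how the two kinds of log entries are produced today, separately:

```
log_lock_waits = on                 # emit a log message when a session waits
deadlock_timeout = 1s               #   longer than this for a lock
log_min_duration_statement = 1000   # slow-query logging, written as a
                                    #   separate entry (Pavel's pain point)
```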
Re: [HACKERS] extend pgbench expressions with functions
Hello Michaël, I'll be happy if you do the review of the resulting split. OK, I am fine with this scenario as well. I have luckily done nothing yet. Here is a 3 part v29: a) add support for integer functions in pgbench, including turning operators as functions, as well as some minimal infrastructure for additional types (this allows minimizing the diffs with the next patch, which adds double). b) add support for doubles, including setting double variables. Note that variables are not explicitly typed. Add some double-related functions, most interesting of them for pgbench are the randoms. c) remove \setrandom support (as thanks to part b \set x random(...) does the same). Prints an error pointing to the replacement if \setrandom is used in a pgbench script. -- Fabien. diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml index ade1b53..9d5eb32 100644 --- a/doc/src/sgml/ref/pgbench.sgml +++ b/doc/src/sgml/ref/pgbench.sgml @@ -798,8 +798,11 @@ pgbench options dbname The expression may contain integer constants such as 5432, references to variables :variablename, and expressions composed of unary (-) or binary operators - (+, -, *, /, %) - with their usual associativity, and parentheses. + (+, -, *, /, + %) with their usual associativity, function calls and + parentheses. + shows the available + functions. @@ -965,6 +968,52 @@ f(x) = exp(-parameter * (x - min) / (max - min + 1)) / (1.0 - exp(-parameter)) + + +PgBench Functions + + + + Function + Return Type + Description + Example + Result + + + + + abs(a) + same as a + absolute value + abs(-17) + 17 + + + debug(a) + same as a + print to stderr the given argument + debug(5432) + 5432 + + + max(i [, ... ] ) + integer + maximum value + max(5, 4, 3, 2) + 5 + + + min(i [, ... 
] ) + integer + minimum value + min(5, 4, 3, 2) + 2 + + + + + As an example, the full definition of the built-in TPC-B-like transaction is: diff --git a/src/bin/pgbench/exprparse.y b/src/bin/pgbench/exprparse.y index 06ee04b..93c6173 100644 --- a/src/bin/pgbench/exprparse.y +++ b/src/bin/pgbench/exprparse.y @@ -16,10 +16,13 @@ PgBenchExpr *expr_parse_result; +static PgBenchExprList *make_elist(PgBenchExpr *exp, PgBenchExprList *list); static PgBenchExpr *make_integer_constant(int64 ival); static PgBenchExpr *make_variable(char *varname); -static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, +static PgBenchExpr *make_op(const char *operator, PgBenchExpr *lexpr, PgBenchExpr *rexpr); +static int find_func(const char *fname); +static PgBenchExpr *make_func(const int fnumber, PgBenchExprList *args); %} @@ -31,13 +34,15 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, int64 ival; char *str; PgBenchExpr *expr; + PgBenchExprList *elist; } +%type elist %type expr -%type INTEGER -%type VARIABLE +%type INTEGER function +%type VARIABLE FUNCTION -%token INTEGER VARIABLE +%token INTEGER VARIABLE FUNCTION %token CHAR_ERROR /* never used, will raise a syntax error */ /* Precedence: lowest to highest */ @@ -49,16 +54,25 @@ static PgBenchExpr *make_op(char operator, PgBenchExpr *lexpr, result: expr{ expr_parse_result = $1; } +elist: { $$ = NULL; } + | expr { $$ = make_elist($1, NULL); } + | elist ',' expr { $$ = make_elist($3, $1); } + ; + expr: '(' expr ')' { $$ = $2; } | '+' expr %prec UMINUS { $$ = $2; } - | '-' expr %prec UMINUS { $$ = make_op('-', make_integer_constant(0), $2); } - | expr '+' expr { $$ = make_op('+', $1, $3); } - | expr '-' expr { $$ = make_op('-', $1, $3); } - | expr '*' expr { $$ = make_op('*', $1, $3); } - | expr '/' expr { $$ = make_op('/', $1, $3); } - | expr '%' expr { $$ = make_op('%', $1, $3); } + | '-' expr %prec UMINUS { $$ = make_op("-", make_integer_constant(0), $2); } + | expr '+' expr { $$ = make_op("+", $1, $3); } + | 
expr '-' expr { $$ = make_op("-", $1, $3); } + | expr '*' expr { $$ = make_op("*", $1, $3); } + | expr '/' expr { $$ = make_op("/", $1, $3); } + | expr '%' expr { $$ = make_op("%", $1, $3); } | INTEGER{ $$ = make_integer_constant($1); } | VARIABLE { $$ = make_variable($1); } + | function '(' elist ')'{ $$ = make_func($1, $3); } + ; + +function: FUNCTION { $$ = find_func($1); pg_free($1); } ; %% @@ -68,8 +82,9 @@ make_integer_constant(int64 ival) { PgBenchExpr *expr = pg_malloc(sizeof(PgBenchExpr)); - expr->etype = ENODE_INTEGER_CONSTANT; - expr->u.integer_constant.ival = ival; + expr->etype = ENODE_CONSTANT; + expr->u.constant.type = PGBT_INT; + expr->u.constant.u.ival = ival; return expr; } @@ -84,14 +99,128
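To illustrate what part (c) means for existing scripts, here is a sketch of a pgbench script fragment using the function names from the patch (random() from part (b), plus max() from part (a)); the exact spelling is taken from the patch text above, not verified against any final commit:

```
-- formerly:  \setrandom delta -5000 5000
\set delta random(-5000, 5000)
-- expressions may now call functions, e.g. clamp to a non-negative value:
\set posdelta max(:delta, 0)
UPDATE pgbench_accounts SET abalance = abalance + :posdelta WHERE aid = :aid;
```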
Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns
David Rowley writes: > On 12/02/2016 12:01 am, "Tom Lane" wrote: >> Um, AFAICS, you *do* need to check again in the second loop, else you'll >> be accessing a surplusvars[] entry that might not exist at all, and in >> any case might falsely tell you that you can exclude the outer var from >> the new GROUP BY list. > I can't quite understand what you're seeing here. The second loop is iterating through the original GROUP BY list, so it will see again any outer Vars that were excluded by the first loop. It needs to exclude them exactly the same, because they are outside the scope of its data structures. Consider something like (admittedly pretty silly, but legal SQL) create table up (u1 int, u2 int, u3 int); create table down (f1 int primary key, f2 int); select * from othertable, up where u1 in (select f2 from down group by f1, f2, up.u3); up.u3 would have varlevelsup = 1, varno = 2, varattno = 3. If you don't skip it then the surplusvars[var->varno] access will be trying to fetch off the end of the surplusvars[] array, because there is only one RTE in the subquery's rangetable though there are two in the outer query's rangetable. When I trace through this example, it manages not to crash because in point of fact the outer Var has already been replaced by a Param. But I don't think this code should be assuming that; it's an artifact of the detailed order of processing in subquery_planner(), and could easily change in the future, for example if we tried to move the whole affair to the jointree preprocessing stage as we discussed earlier. regards, tom lane
Re: [HACKERS] Defaults for replication/backup
On Sun, Feb 14, 2016 at 2:00 AM, Robert Haas wrote: > On Sat, Feb 13, 2016 at 11:31 AM, Andres Freund > wrote: > > On 2016-02-13 11:10:58 -0500, Tom Lane wrote: > >> Magnus Hagander writes: > >> > The big thing is, IIRC, that we turn off the optimizations in > >> > min_wal_level. *most* people will see no impact of their regular > >> > application runtime from that, but it might definitely have an effect > on > >> > data loads and such. For normal runtime, there should be very close > to zero > >> > difference, no? > >> > >> I was asking for a demonstration of that, not just handwaving. Even if > >> it was measured years ago, I wouldn't assume the comparison would be > >> the same on current Postgres. > > > > Well, let's look at what the difference between wal_level's are: > > 1) the (currently broken) optimization of not WAL logging COPY et al, > >for newly created relations. > > 2) relation AccessExclusiveLocks are WAL logged on >= hot_standby > > 3) Subtransaction assignment records are generated for >= hot_standby > >after 64 records. > > 4) checkpoints and bgwriter occasionally generate XLOG_RUNNING_XACTS > >records > > 5) btreevacuum() and _bt_getbuf() sometimes do additional WAL logging on > >>= hot_standby > > 6) Once per vacuum we issue a XLOG_HEAP2_CLEANUP_INFO > > > > > > 1) obviously can have performance impact; but only in a relatively > > narrow set of cases. I doubt any of the others really play that major a > > role. But I really think minor efficiency differences are besides the > > point. Making backups and replication easier has a far bigger positive > > impact, for far more users. > > I think that there are definitely people for whom #1 is an issue, but > maybe that's not a sufficient reason not to change the default. > There definitely are people. I'd say most of those would already be tuning their config in different ways as well, so changing it is a lot lower cost for them. And there's fewer of them. 
> As a thought experiment, how about teaching initdb how to tailor the > configuration to a couple of common scenarios via some new flag? I'll > call it --setup although that's probably not best. > > --setup=replication --- Preconfigure for replication. > --setup=standalone --- Preconfigure for standalone mode. > --setup=ephemeral --- Preconfigure for an ephemeral instance that > doesn't need durability because we'll blow it up soon. > > Whichever mode we make the default, I think this kind of thing would > make life easier for users. > I'd like to reiterate that this is not just about replication, it's even more about decent backups. As soon as your database gets to the point that pg_dump is not a great solution for backup (and that happens pretty quickly, given the performance of pg_restore), the response is "oh, go change these 3 parameters in your config and then restart the db disconnecting all your users" which gives interesting reactions from people... I could go with something like --setup=small --setup=normal --setup=ephemeral which would be a more proper mapping I think. Of course, this would also give us the ability to bikeshed about three different colors, one in each predefined set! :) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/
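A sketch of what a replication-friendly profile (Robert's --setup=replication, or Magnus's --setup=normal) might preconfigure, using GUCs that exist today; the values are illustrative, not a proposal:

```
wal_level = hot_standby        # instead of 'minimal'; gives up the skip-WAL
                               #   optimization for COPY into new relations
max_wal_senders = 5            # allow streaming standbys and pg_basebackup
wal_keep_segments = 64         # retain WAL for standbys without a slot,
max_replication_slots = 5      #   or use replication slots instead
```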
[HACKERS] Bool btree_gin index not chosen on equality contraint, but on greater/lower?
Hi, I tried to add bool support to the btree_gin contrib module, and as far as I can tell, it seems to work (wasn't that complicated, actually). But now I'm stuck, as PostgreSQL doesn't seem to like to use my new index if I use equality or inequality, just with greater and lower than. My test subject is a table with 13690993 rows, one of them (bar) is a boolean, 376442 are true, the others are false, no nulls. The index on bar is a btree_gin index. Table is vacuum analyzed and all, so statistics are fresh and usable, as the estimates within the plans show. Here's the plan if I ask for the rows where bar is true, as in "select id from foo where bar": Seq Scan on foo (cost=0.00..684709.82 rows=385495 width=8) (actual time=0.014..2657.326 rows=376442 loops=1) Filter: bar Rows Removed by Filter: 13314551 Planning time: 0.309 ms Execution time: 2672.559 ms But, if I query "select id from foo where bar>'f'": Bitmap Heap Scan on foo (cost=7955.59..313817.94 rows=385495 width=8) (actual time=220.631..365.299 rows=376442 loops=1) Recheck Cond: (bar > false) Heap Blocks: exact=104100 -> Bitmap Index Scan on ix_foo_gin (cost=0.00..7859.21 rows=385495 width=0) (actual time=193.192..193.192 rows=376442 loops=1) Index Cond: (bar > false) Planning time: 0.400 ms Execution time: 377.518 ms It starts using the index. The rule seems to be: as long as I'm using <, <=, >= or >, it chooses the index. If I use = or !=, it doesn't. 
Here's my definition of the bool_ops for the gin index (it's very similar to the other indexes in the btree_gin extension): CREATE OPERATOR CLASS bool_ops DEFAULT FOR TYPE bool USING gin AS OPERATOR 1 <, OPERATOR 2 <=, OPERATOR 3 =, OPERATOR 4 >=, OPERATOR 5 >, FUNCTION 1 btboolcmp(bool,bool), FUNCTION 2 gin_extract_value_bool(bool, internal), FUNCTION 3 gin_extract_query_bool(bool, internal, int2, internal, internal), FUNCTION 4 gin_btree_consistent(internal, int2, anyelement, int4, internal, internal), FUNCTION 5 gin_compare_prefix_bool(bool,bool,int2, internal), STORAGE bool; What am I overlooking? -- Patric
Re: [HACKERS] extend pgbench expressions with functions
On Sun, Feb 14, 2016 at 4:42 PM, Fabien COELHO wrote: > >> So I would be fine to do a portion of the legwork and extract from this >> patch something smaller that adds only functions as a first step, with the >> minimum set of functions I mentioned upthread. Robert, Alvaro, Fabien, does >> that sound fine to you? > > > Thanks, but this is my c*, I have a few hours of travelling this evening, > I'll do it then. > > I'll be happy if you do the review of the resulting split. OK, I am fine with this scenario as well. I have luckily done nothing yet. -- Michael
Re: [HACKERS] Removing Functionally Dependent GROUP BY Columns
On 12/02/2016 12:01 am, "Tom Lane" wrote: > > David Rowley writes: > > [ prune_group_by_clause_ab4f491_2016-01-23.patch ] > > [ check_functional_grouping_refactor.patch ] > > I've committed this with mostly-cosmetic revisions (principally, rewriting > a lot of the comments, which seemed on the sloppy side). Many thanks for committing this and improving the comments. > >> * Both of the loops iterating over the groupClause neglect to check > >> varlevelsup, thus leading to assert failures or worse if an outer-level > >> Var is present in the GROUP BY list. (I'm pretty sure outer Vars can > >> still be present at this point, though I might be wrong.) > > > Fixed in the first loop, and the way I've rewritten the code to use > > bms_difference, there's no need to check again in the 2nd loop. > > Um, AFAICS, you *do* need to check again in the second loop, else you'll > be accessing a surplusvars[] entry that might not exist at all, and in > any case might falsely tell you that you can exclude the outer var from > the new GROUP BY list. > I can't quite understand what you're seeing here. As far as I can tell from reading the code again (on my phone) the varlevelsup check in the 2nd loop is not required. Any var with varlevelsup above zero won't make it into the surplus bitmap for the rel, as the bms_difference call just removes the pk vars from the varlevelsup = 0 vars. So the bms_is_member check should be fine. I can't see what I'm missing. In either case the test does no harm. > That was the only actual bug I found, though I changed some other stuff.
[HACKERS] proposal: enhancing slow query log, and autoexplain log about waiting on lock before query exec time
Hi, the interpretation of slow queries or entries from the auto-explain log can be difficult at times, because the main part of the query evaluation time was spent waiting on a lock, and this number isn't in the related entry. Our situation is a little bit difficult, because we have no direct access to the PostgreSQL logs; we work with log aggregators. We have a patch that injects log entries about the time spent waiting on locks before query execution. This feature helps us a lot, and I hope it can be generally useful. Is this feature interesting for the community? What do you think about it? Regards Pavel