Re: [HACKERS] [PROPOSAL] VACUUM Progress Checker.
Hi,

On 2015/12/03 19:05, Kyotaro HORIGUCHI wrote:
> At Thu, 3 Dec 2015 16:24:32 +0900, Amit Langote wrote:
>> By the way, there are some non-st_progress_* fields that I was talking
>> about in my previous message. For example, st_activity_flag (which I have
>> suggested to rename to st_command instead). It needs to be set once at the
>> beginning of the command processing and need not be touched again. I think
>> it may be a better idea to do the same for the table name or OID. It also
>> won't change over the duration of the command execution. So, we could set
>> it once at the beginning where that becomes known. Currently in the patch,
>> it's reported in st_progress_message[0] by every pgstat_report_progress()
>> invocation. So, the table name will be strcpy()'d to shared memory for
>> every scanned block that's reported.
>
> If we don't have dedicated reporting functions for each phase (that is, a
> static-data phase and a progress phase), the one function
> pgstat_report_progress does that by having some instruction from the
> caller, and it would be a bitfield.
>
> Reading the flags to decide whether to copy each field and branching on
> that seems too much for int32 (or maybe 64?) fields. Although it would be
> effective when we have progress_message fields, I don't think it is a good
> idea without having progress_message.
>
>> pgstat_report_progress(uint32 *param1, uint16 param1_bitmap,
>>                        char param2[][..], uint16 param2_bitmap)
>> {
>>     ...
>>     for (i = 0; i < 16; i++)
>>     {
>>         if (param1_bitmap & (1 << i))
>>             beentry->st_progress_param[i] = param1[i];
>>         if (param2_bitmap & (1 << i))
>>             strcpy(beentry->..., param2[i]);
>>     }
>
> Thoughts?

Hm, I guess progress messages would not change that much over the course of a long-running command.
How about we pass only the array(s) that we change, and NULL for the others? With the following prototype for pgstat_report_progress:

void pgstat_report_progress(uint32 *uint32_param, int num_uint32_param,
                            bool *message_param[], int num_message_param);

If we just wanted to change, say, scanned_heap_pages, then we report that using:

uint32_param[1] = scanned_heap_pages;
pgstat_report_progress(uint32_param, 3, NULL, 0);

Also, pgstat_report_progress() should check each of its parameters for NULL before iterating to copy. So in most reports over the course of a command, the message_param would be NULL and hence not copied.

Thanks,
Amit

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Making tab-complete.c easier to maintain
On Tue, Nov 17, 2015 at 12:19 AM, Alvaro Herrera wrote:
> Thomas Munro wrote:
>> New version attached, merging recent changes.
>
> I wonder about the TailMatches and Matches macros --- wouldn't it be
> better to have a single one, renaming TailMatches to Matches and
> replacing the current Matches() with an initial token that corresponds
> to anchoring to start of command? Just wondering, not terribly attached
> to the idea.

+ /* TODO:TM -- begin temporary, not part of the patch! */
+ Assert(!word_matches(NULL, ""));
+ [...]
+ Assert(!word_matches("foo", ""));
+ /* TODO:TM -- end temporary */
Be sure not to forget to remove that later.

- else if (pg_strcasecmp(prev5_wd, "DEFAULT") == 0 &&
-          pg_strcasecmp(prev4_wd, "PRIVILEGES") == 0 &&
-          (pg_strcasecmp(prev3_wd, "FOR") == 0 ||
-           pg_strcasecmp(prev3_wd, "IN") == 0))
- {
-     static const char *const list_ALTER_DEFAULT_PRIVILEGES_REST[] =
-     {"GRANT", "REVOKE", NULL};
-
-     COMPLETE_WITH_LIST(list_ALTER_DEFAULT_PRIVILEGES_REST);
- }
+ else if (TailMatches5("DEFAULT", "PRIVILEGES", "FOR", "ROLE", MatchAny) ||
+          TailMatches5("DEFAULT", "PRIVILEGES", "IN", "SCHEMA", MatchAny))
+     CompleteWithList2("GRANT", "REVOKE");
For this chunk I think that you need to check for ROLE|USER and not only ROLE.

+ else if (TailMatches4("ALTER", "DOMAIN", MatchAny, "RENAME"))
  {
      static const char *const list_ALTERDOMAIN[] =
      {"CONSTRAINT", "TO", NULL};
I think you should remove COMPLETE_WITH_LIST here for consistency with the rest.

- else if (pg_strcasecmp(prev5_wd, "DOMAIN") == 0 &&
-          pg_strcasecmp(prev3_wd, "RENAME") == 0 &&
-          pg_strcasecmp(prev2_wd, "CONSTRAINT") == 0)
+ else if (TailMatches3("RENAME", "CONSTRAINT", MatchAny))
      COMPLETE_WITH_CONST("TO");
Perhaps you are missing DOMAIN here?
- else if (pg_strcasecmp(prev4_wd, "ALTER") == 0 &&
-          pg_strcasecmp(prev3_wd, "SEQUENCE") == 0 &&
-          pg_strcasecmp(prev_wd, "NO") == 0)
- {
-     static const char *const list_ALTERSEQUENCE2[] =
-     {"MINVALUE", "MAXVALUE", "CYCLE", NULL};
-
-     COMPLETE_WITH_LIST(list_ALTERSEQUENCE2);
- }
+ else if (TailMatches4("ALTER", "SEQUEMCE", MatchAny, "NO"))
+     CompleteWithList3("MINVALUE", "MAXVALUE", "CYCLE");
Typo here: s/SEQUEMCE/SEQUENCE.

- else if (pg_strcasecmp(prev5_wd, "TABLE") == 0 &&
-          pg_strcasecmp(prev3_wd, "RENAME") == 0 &&
-          (pg_strcasecmp(prev2_wd, "COLUMN") == 0 ||
-           pg_strcasecmp(prev2_wd, "CONSTRAINT") == 0) &&
-          pg_strcasecmp(prev_wd, "TO") != 0)
+ else if (TailMatches6("ALTER", "TABLE", MatchAny, "RENAME", "COLUMN|CONSTRAINT", MatchAny) &&
+          !TailMatches1("TO"))
Shouldn't this use TailMatches5 without ALTER, for consistency with the existing code?

- else if (pg_strcasecmp(prev_wd, "CLUSTER") == 0 &&
-          pg_strcasecmp(prev2_wd, "WITHOUT") != 0)
+ else if (TailMatches1("CLUSTER") && !TailMatches2("WITHOUT", "CLUSTER"))
Here removing CLUSTER should be fine.

- else if (pg_strcasecmp(prev2_wd, "CLUSTER") == 0 &&
-          pg_strcasecmp(prev_wd, "ON") != 0 &&
-          pg_strcasecmp(prev_wd, "VERBOSE") != 0)
- {
+ else if (TailMatches2("CLUSTER", MatchAny) && !TailMatches1("VERBOSE"))
Handling of ON has been forgotten.

- else if (pg_strcasecmp(prev3_wd, "CREATE") == 0 &&
-          !(pg_strcasecmp(prev2_wd, "USER") == 0 && pg_strcasecmp(prev_wd, "MAPPING") == 0) &&
-          (pg_strcasecmp(prev2_wd, "ROLE") == 0 ||
-           pg_strcasecmp(prev2_wd, "GROUP") == 0 || pg_strcasecmp(prev2_wd, "USER") == 0))
+ else if (TailMatches3("CREATE", "ROLE|GROUP|USER", MatchAny) &&
+          !TailMatches3("CREATE", "USER", "MAPPING"))
!TailMatches2("USER", "MAPPING")?

- else if (pg_strcasecmp(prev4_wd, "CREATE") == 0 &&
-          pg_strcasecmp(prev3_wd, "MATERIALIZED") == 0 &&
-          pg_strcasecmp(prev2_wd, "VIEW") == 0)
+ else if (TailMatches3("CREATE", "MATERIALIZED", "VIEW"))
Forgot a MatchAny here?
- else if (pg_strcasecmp(prev_wd, "DELETE") == 0 &&
-          !(pg_strcasecmp(prev2_wd, "ON") == 0 ||
-            pg_strcasecmp(prev2_wd, "GRANT") == 0 ||
-            pg_strcasecmp(prev2_wd, "BEFORE") == 0 ||
-            pg_strcasecmp(prev2_wd, "AFTER") == 0))
+ else if (TailMatches1("DELETE") && !TailMatches2("ON|GRANT|BEFORE|AFTER", "DELETE"))
COMPLETE_WITH_CONST("FR
Re: [HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
On 2015/12/07 2:52, Andreas Seltenreich wrote: > Hi, > > I've added new grammar rules to sqlsmith and improved some older ones. > This was rewarded with a return of "failed to generate plan" errors. > The failing queries all contain a lateral subquery. The shortest of the > failing queries are below. They were run against the regression db of > master as of db07236. > > smith=# select msg, query from error where >(firstline(msg) ~~ 'ERROR: failed to build any%' >or firstline(msg) ~~ 'ERROR: could not devise a query plan%') > and t > now() - interval '1 day' order by length(query) asc limit 3; > > ERROR: failed to build any 8-way joins > select > ref_96.foreign_table_schema as c0, > sample_87.is_supported as c1 > from > information_schema.sql_packages as sample_87 tablesample system (0.2) > right join information_schema._pg_foreign_tables as ref_96 > on (sample_87.feature_id = ref_96.foreign_table_catalog ), > lateral (select > sample_87.is_verified_by as c0, > ref_97.indexed_col as c1, > coalesce(sample_87.feature_id, ref_96.foreign_server_name) as c2, > 4 as c3 > from > public.comment_test as ref_97 > where ref_97.id ~>~ ref_97.indexed_col > fetch first 73 rows only) as subq_33 > where ref_96.foreign_table_name ~~ subq_33.c1 > > ERROR: could not devise a query plan for the given query > select > subq_43.c0 as c0 > from > (select > ref_181.installed as c0 > from > pg_catalog.pg_available_extension_versions as ref_181, > lateral (select > ref_181.name as c0, > ref_181.installed as c1 > from > pg_catalog.pg_conversion as ref_182 > where ref_182.conname ~~* ref_181.version > fetch first 98 rows only) as subq_42 > where (subq_42.c0 is not NULL) > or (subq_42.c1 is NULL)) as subq_43 > right join pg_catalog.pg_language as sample_177 tablesample system (2.8) > on (subq_43.c0 = sample_177.lanispl ) > where sample_177.lanowner < sample_177.lanvalidator > > ERROR: failed to build any 5-way joins > select > ref_239.id2 as c0, > 40 as c1, > ref_239.id2 as c2, > ref_238.aa as 
c3
> from
>   public.tt5 as sample_289 tablesample system (8.1)
>     inner join information_schema.element_types as ref_237
>     on (sample_289.x = ref_237.character_maximum_length )
>     left join public.b as ref_238
>     on (ref_237.character_maximum_length = ref_238.aa )
>     left join public.num_exp_mul as ref_239
>     on (ref_237.numeric_precision_radix = ref_239.id1 ),
>   lateral (select
>         sample_290.b as c0,
>         sample_289.y as c1,
>         ref_239.id2 as c2
>       from
>         public.rtest_t8 as sample_290 tablesample bernoulli (4.6)
>       where (sample_290.b > ref_238.bb)
>         and (sample_289.y > ref_239.expected)
>       fetch first 91 rows only) as subq_64
> where (subq_64.c1 > sample_289.y)
>   and (sample_289.y = ref_239.expected)
> fetch first 133 rows only

FWIW, I'm seeing the following behaviors:

* Removing the limit (fetch first ...) in the lateral sub-queries makes the errors go away for all of the above queries.
* For the last query, which produces the "failed to build any 5-way joins" error, setting join_collapse_limit to 1, 2 or 4 makes the error go away irrespective of whether the limit is kept or not.

Thanks,
Amit
Re: [HACKERS] Making tab-complete.c easier to maintain
On Thu, Nov 26, 2015 at 2:45 PM, Kyotaro HORIGUCHI wrote:
> What do you think about this solution?

This patch fails to compile on OSX:

Undefined symbols for architecture x86_64:
  "_ExceptionalCondition", referenced from:
      _pg_regexec in regexec.o
      _cfind in regexec.o
      _find in regexec.o
      _newdfa in regexec.o
      _cfindloop in regexec.o
      _shortest in regexec.o
      _cdissect in regexec.o
      ...

So, to begin with, this may be better if replugged as a standalone library, aka moving the regexp code into src/common for example, or similar. Also, per the comments on top of rcancelrequested and rstacktoodeep, returning 0 unconditionally is not a good idea for -DFRONTEND. Callbacks should be defined and made available for callers.

- {"EVENT TRIGGER", NULL, NULL},
+ {"EVENT TRIGGER", Query_for_list_of_event_triggers, NULL},
  {"EXTENSION", Query_for_list_of_extensions},
- {"FOREIGN DATA WRAPPER", NULL, NULL},
+ {"FOREIGN DATA WRAPPER", Query_for_list_of_fdws, NULL},
  {"FOREIGN TABLE", NULL, NULL},
  {"FUNCTION", NULL, &Query_for_list_of_functions},
  {"GROUP", Query_for_list_of_roles},
  {"LANGUAGE", Query_for_list_of_languages},
  {"INDEX", NULL, &Query_for_list_of_indexes},
- {"MATERIALIZED VIEW", NULL, NULL},
+ {"MATERIALIZED VIEW", NULL, &Query_for_list_of_matviews},

This has value as a separate patch. The patch also contains many whitespace changes and unrelated diffs.
--
Michael
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On Mon, Dec 7, 2015 at 12:06 PM, Noah Misch wrote: > On Wed, Dec 02, 2015 at 06:59:09PM -0300, Alvaro Herrera wrote: >> --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl >> +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl > >> -like($recovery_conf, qr/^standby_mode = 'on[']$/m, 'recovery.conf sets >> standby_mode'); >> -like($recovery_conf, qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, >> 'recovery.conf sets primary_conninfo'); >> - >> -command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' ], >> +like( >> + $recovery_conf, >> + qr/^standby_mode = 'on[']$/m, >> + 'recovery.conf sets standby_mode'); >> +like( >> + $recovery_conf, >> + qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, >> + 'recovery.conf sets primary_conninfo'); > > This now elicits a diagnostic: > > Use of uninitialized value $ENV{"PGPORT"} in regexp compilation at > t/010_pg_basebackup.pl line 175, <> line 1. Fixed. >> --- a/src/bin/pg_controldata/t/001_pg_controldata.pl >> +++ b/src/bin/pg_controldata/t/001_pg_controldata.pl > >> -standard_initdb "$tempdir/data"; >> -command_like([ 'pg_controldata', "$tempdir/data" ], >> + >> +my $node = get_new_node(); >> +$node->init; >> +$node->start; > > Before the commit, this test did not start a server. Fixed. >> --- /dev/null >> +++ b/src/test/perl/PostgresNode.pm > >> +sub _update_pid >> +{ >> + my $self = shift; >> + >> + # If we can open the PID file, read its first line and that's the PID >> we >> + # want. If the file cannot be opened, presumably the server is not >> + # running; don't be noisy in that case. >> + open my $pidfile, $self->data_dir . 
"/postmaster.pid"; >> + if (not defined $pidfile) > > $ grep -h 'at /.*line' src/bin/*/tmp_check/log/* |sort|uniq -c > 1 cannot remove directory for > /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ: Directory not > empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 > 1 cannot remove directory for > /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata/base/13264: > Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 > 1 cannot remove directory for > /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata/base: > Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 > 1 cannot remove directory for > /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata: Directory > not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 > 28 readline() on closed filehandle $pidfile at > /data/nmisch/src/pg/postgresql/src/bin/pg_rewind/../../../src/test/perl/PostgresNode.pm > line 308. > 28 Use of uninitialized value in concatenation (.) or string at > /data/nmisch/src/pg/postgresql/src/bin/pg_rewind/../../../src/test/perl/PostgresNode.pm > line 309. > > $pidfile is always defined. Test the open() return value. That's something I have addressed here: http://www.postgresql.org/message-id/cab7npqtop28zxv_sxfo2axgjoesfvllmo6syddafv0duvsf...@mail.gmail.com I am including the fix as well here to do a group shot. >> + { >> + $self->{_pid} = undef; >> + print "# No postmaster PID\n"; >> + return; >> + } >> + >> + $self->{_pid} = <$pidfile>; > > chomp() that value to remove its trailing newline. Right. >> + print "# Postmaster PID is $self->{_pid}\n"; >> + close $pidfile; >> +} > >> + my $devnull = $TestLib::windows_os ? "nul" : "/dev/null"; > > Unused variable. Removed. 
>> +sub DESTROY
>> +{
>> +	my $self = shift;
>> +	return if not defined $self->{_pid};
>> +	print "# signalling QUIT to $self->{_pid}\n";
>> +	kill 'QUIT', $self->{_pid};
>
> On Windows, that kill() is the wrong thing. I suspect "pg_ctl kill" will be
> the right thing, but that warrants specific testing.

I don't directly see any limitation with the use of kill on Windows:
http://perldoc.perl.org/functions/kill.html
But indeed, using pg_ctl kill directly seems like a better fit for the PG infrastructure.

> Postmaster log file names became less informative. Before the commit:
> Should nodes get a name, so we instead see master_57834.log?

I am not sure that this is necessary. There is definitely a limitation in that log files of nodes get overwritten after each test, hence I would tend to just prepend the test name to the node_* files, similarly to the other files.

> See also the things Tom Lane identified in <27550.1449342...@sss.pgh.pa.us>.

Yep, I marked this email as something to address when it was sent.

On Sun, Dec 6, 2015 at 4:09 AM, Tom Lane wrote:
> BTW, if we consider that gospel, why has PostgresNode.pm got this in its
> BEGIN block?
>
> # PGHOST is set once and for all through a single series of tests when
> # this module is loaded.
> $test_pghost =
>   $TestLib::windows_os ? "127.0.0.1" : TestLib::t
Re: [HACKERS] Foreign join pushdown vs EvalPlanQual
On 2015/12/05 5:15, Robert Haas wrote:
> On Tue, Dec 1, 2015 at 10:20 PM, Etsuro Fujita wrote:
>> One thing I can think of is that we can keep both the structure of a
>> ForeignPath node and the API of create_foreignscan_path as-is. The latter
>> is a good thing for FDW authors. And IIUC the patch you posted today, I
>> think we could make create_foreignscan_plan a bit simpler too. Ie, in
>> your patch, you modified that function as follows:
>>
>> @@ -2129,7 +2134,9 @@ create_foreignscan_plan(PlannerInfo *root, ForeignPath *best_path,
>>      */
>>     scan_plan = rel->fdwroutine->GetForeignPlan(root, rel, rel_oid,
>>                                                 best_path,
>> -                                               tlist, scan_clauses);
>> +                                               tlist,
>> +                                               scan_clauses);
>> +   outerPlan(scan_plan) = fdw_outerplan;
>>
>> I think that would be OK, but I think we would have to do a bit more here
>> about the fdw_outerplan's targetlist and qual; I think that the
>> targetlist needs to be changed to fdw_scan_tlist, as in the patch [1],
>> and that it'd be better to change the qual to remote conditions, ie,
>> quals not in the scan_plan's scan.plan.qual, to avoid duplicate
>> evaluation of local conditions. (In the patch [1], I didn't do anything
>> about the qual because the current postgres_fdw join pushdown patch
>> assumes that all the scan_plan's scan.plan.qual are pushed down.) Or, FDW
>> authors might want to do something about fdw_recheck_quals for a
>> foreign-join while creating the fdw_outerplan. So if we do that during
>> GetForeignPlan, I think we could make create_foreignscan_plan a bit
>> simpler, or provide flexibility to FDW authors.
>
> It's certainly true that we need the alternative plan's tlist to match
> that of the main plan; otherwise, it's going to be difficult for the FDW
> to make use of that alternative subplan to fill its slot, which is kinda
> the point of all this.

OK.

> However, I'm quite reluctant to introduce code into
> create_foreignscan_plan() that forces the subplan's tlist to match that
> of the main plan.
> For one thing, that would likely foreclose the possibility of an FDW ever
> using the outer plan for any purpose other than EPQ rechecks. It may be
> hard to imagine what else you'd do with the outer plan as things are
> today, but right now the two halves of the patch - letting FDWs have an
> outer subplan, and providing them with a way of overriding the EPQ
> recheck behavior - are technically independent. Putting tlist-altering
> behavior into create_foreignscan_plan() ties those two things together
> irrevocably.

Agreed.

> Instead, I think we should go the opposite direction and pass the
> outerplan to GetForeignPlan after all. I was lulled into a false sense of
> security by the realization that every FDW that uses this feature MUST
> want to do outerPlan(scan_plan) = fdw_outerplan. That's true, but
> irrelevant. The point is that the FDW might want to do something
> additional, like frob the outer plan's tlist, and it can't do that if we
> don't pass it fdw_outerplan. So we should do that, after all.

As I proposed upthread, another idea would be 1) to store an fdw_outerpath in the fdw_private list of a ForeignPath node, and then 2) to create an fdw_outerplan from *the fdw_outerpath stored in the fdw_private* in GetForeignPlan. One good thing about this is that we keep the API of create_foreignscan_path as-is. What do you think about that?

> Updated patch attached. This fixes a couple of whitespace issues that
> were pointed out, also.

Thanks for updating the patch!

Best regards,
Etsuro Fujita
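The two-step idea, 1) stash the fdw_outerpath in the path's fdw_private and 2) turn it into a plan inside GetForeignPlan, might look roughly like the following sketch. This is illustrative pseudocode against PostgreSQL planner internals, not a working FDW: the my_fdw_* names are invented, and the exact create_foreignscan_path arguments and the plan-creation helper depend on the server version and on which form of the patch is adopted.

```c
/* 1) Path creation: keep the EPQ-recheck subpath in fdw_private. */
static void
my_fdw_get_foreign_paths(PlannerInfo *root, RelOptInfo *rel, Oid foreigntableid)
{
    Path        *fdw_outerpath = ...;   /* locally executable alternative path */
    ForeignPath *path;

    path = create_foreignscan_path(root, rel,
                                   rows, startup_cost, total_cost,
                                   NIL,                      /* pathkeys */
                                   NULL,                     /* required_outer */
                                   list_make1(fdw_outerpath));   /* fdw_private */
    add_path(rel, (Path *) path);
}

/* 2) Plan creation: build the outer plan from the stored path, attach it. */
static ForeignScan *
my_fdw_get_foreign_plan(PlannerInfo *root, RelOptInfo *rel, Oid foreigntableid,
                        ForeignPath *best_path, List *tlist, List *scan_clauses)
{
    Path        *fdw_outerpath = (Path *) linitial(best_path->fdw_private);
    Plan        *fdw_outerplan = create_plan_for_path(root, fdw_outerpath);
                                        /* hypothetical helper */
    ForeignScan *scan = make_foreignscan(...);

    outerPlan(scan) = fdw_outerplan;    /* used for EPQ rechecks */
    return scan;
}
```

The appeal of this variant is exactly what the message says: fdw_private is an opaque List the FDW already owns, so create_foreignscan_path needs no signature change, while the FDW still gets a chance to frob the outer plan's tlist before attaching it.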
Re: [HACKERS] jsonb_delete not documented
Peter Eisentraut writes:
> I see. The reference from pg_operator to pg_proc is by OID rather than
> function name, so I didn't find them. Is that because the function is
> overloaded?

Yeah, I suppose so --- regproc can't resolve overloaded function names.

> It's kind of odd that these are the only operators (at first glance) that
> are set up like that.

I think the customary thing when creating functions meant as operator support is to give them unique names. These weren't done that way ... I wasn't involved, but I wonder whether there was uncertainty as to whether these should be documented as functions or operators.

regards, tom lane
Re: [HACKERS] jsonb_delete not documented
On 12/6/15 9:51 PM, Tom Lane wrote:
> Peter Eisentraut writes:
>> The new function jsonb_delete does not appear to be documented. Is that
>> intentional?
>
>> The only thing that's documented is the #- operator for
>> jsonb_delete_path. But jsonb_delete(jsonb, text) and
>> jsonb_delete(jsonb, int) are not documented. (Those don't have an
>> operator.)
>
> Yeah they do ...
> regression=# \do+ -
> ...
> pg_catalog | - | jsonb | integer | jsonb | pg_catalog.jsonb_delete | delete array element
> pg_catalog | - | jsonb | text    | jsonb | pg_catalog.jsonb_delete | delete object field
> ...

I see. The reference from pg_operator to pg_proc is by OID rather than function name, so I didn't find them. Is that because the function is overloaded? It's kind of odd that these are the only operators (at first glance) that are set up like that.
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
On Sun, Dec 6, 2015 at 7:14 PM, Andreas Karlsson wrote:
> I can also confirm the roughly 25% speedup in the best case (numerics
> which are all distinct) with no measurable slowdown in the worst case.
>
> Given this speedup and the small size of the patch I think we should
> apply it. I will set this patch to "Ready for Committer".

Thanks for the review, Andreas.
--
Peter Geoghegan
Re: [HACKERS] Logical replication and multimaster
> There are definitely two clear places where additional help would be
> useful and welcome right now.

Here's a shortlist of replication-related open items I put together earlier. These are all independent things it'd be helpful to have when working with logical replication in PostgreSQL. There are a lot of medium-sized pieces of independent work still to be done to make some things work well. For example, you can't have multimaster logical replication with physical failover nodes for each MM node unless we replicate slot create/drop/advance over physical replication and copy them with pg_basebackup.

Handling big transactions better:
* logical decoding interleaved transaction streaming, as discussed earlier

Automatic DDL replication:
* DDL deparse extension (Álvaro started some work on this)
* Better way to link the pg_temp_nnn tables generated during table rewrite to the original table in decoding
* logical decoding support for prepared xacts (before commit prepared). Useful for DDL replication, other consensus operations.

Using physical replication/PITR with logical replication:
* replication slots replicated to physical standbys ("Failover slots")
* pg_basebackup should copy slots
* slots should follow timeline changes

Failover between logical replicas:
* logical decoding support for sequences
* logical decoding support for slot create/drop/advance
* Separate slot WAL retention from confirmed commit point so you can say "I've replayed up to 1/AB but you should keep from 1/01". Needed in async MM to cope with node loss properly. Will write it up separately.

Multimaster:
* Sequence Access Method

Other logical decoding enhancements:
* Support exporting a new snapshot from an existing logical slot. Useful for COPYing new tables not previously replicated when added to a replication set, for resyncing tables, comparing tables, etc.
* WAL messages. Useful for voting, replay confirmation, etc. Rendered largely unnecessary by xact interleaved streaming.
Misc:
* An API to enumerate currently registered bgworkers
* A way to securely make a libpq connection from a bgworker without messing with passwords etc. Generate one-time cookies, something like that.
* (unimportant but nice API fix): BGW_NO_RESTART_ON_CRASH

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: Reusing abbreviated keys during second pass of ordered [set] aggregates
Hi,

I have reviewed this patch and it still applies to master, compiles, and passes the test suite. I like the goal of the patch: making use of the already existing abbreviation machinery in more cases is something we should do, and the patch itself looks clean.

I can also confirm the roughly 25% speedup in the best case (numerics which are all distinct) with no measurable slowdown in the worst case.

Given this speedup and the small size of the patch I think we should apply it. I will set this patch to "Ready for Committer".

Andreas
Re: [HACKERS] pglogical_output - a general purpose logical decoding output plugin
On 2 December 2015 at 18:38, Petr Jelinek wrote:
> First, I wonder if it would be useful to mention somewhere, even if it's
> only here in the mailing list, how the protocol can be extended in a
> non-breaking way in future for transaction streaming, if we ever get
> that.

Good point. I'll address that in the DESIGN.md in the next rev. Separately, it's looking like xact streaming is possibly more complex than I hoped due to cache invalidation issues, but I haven't been able to fully understand the problem yet.

> The other thing is that I think we don't need the "forward_changesets"
> parameter which currently decides whether to forward changes that didn't
> originate on the local node. There is already a hook for origin filtering
> which provides the same functionality in a more flexible way, so it seems
> redundant to also have a special boolean option for it.

Removed, change pushed. Also pushed a change to expose the decoded row data to row filter hooks. I won't cut a v4 for this, as I'm working on merging the SGML-ified docs and will do a v4 with that and the above readme change once that's done.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] Re: In-core regression tests for replication, cascading, archiving, PITR, etc.
On Wed, Dec 02, 2015 at 06:59:09PM -0300, Alvaro Herrera wrote: > It seemed better to me to have the import list be empty, i.e. "use > TestLib ()" and then qualify the routine names inside PostgresNode, > instead of having to list the names of the routines to import, so I > pushed it that way after running the tests a few more times. I inspected commit 1caef31: > --- a/src/bin/pg_basebackup/t/010_pg_basebackup.pl > +++ b/src/bin/pg_basebackup/t/010_pg_basebackup.pl > -like($recovery_conf, qr/^standby_mode = 'on[']$/m, 'recovery.conf sets > standby_mode'); > -like($recovery_conf, qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, > 'recovery.conf sets primary_conninfo'); > - > -command_ok([ 'pg_basebackup', '-D', "$tempdir/backupxf", '-X', 'fetch' ], > +like( > + $recovery_conf, > + qr/^standby_mode = 'on[']$/m, > + 'recovery.conf sets standby_mode'); > +like( > + $recovery_conf, > + qr/^primary_conninfo = '.*port=$ENV{PGPORT}.*'$/m, > + 'recovery.conf sets primary_conninfo'); This now elicits a diagnostic: Use of uninitialized value $ENV{"PGPORT"} in regexp compilation at t/010_pg_basebackup.pl line 175, <> line 1. > --- a/src/bin/pg_controldata/t/001_pg_controldata.pl > +++ b/src/bin/pg_controldata/t/001_pg_controldata.pl > -standard_initdb "$tempdir/data"; > -command_like([ 'pg_controldata', "$tempdir/data" ], > + > +my $node = get_new_node(); > +$node->init; > +$node->start; Before the commit, this test did not start a server. > --- /dev/null > +++ b/src/test/perl/PostgresNode.pm > +sub _update_pid > +{ > + my $self = shift; > + > + # If we can open the PID file, read its first line and that's the PID we > + # want. If the file cannot be opened, presumably the server is not > + # running; don't be noisy in that case. > + open my $pidfile, $self->data_dir . 
"/postmaster.pid"; > + if (not defined $pidfile) $ grep -h 'at /.*line' src/bin/*/tmp_check/log/* |sort|uniq -c 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ: Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata/base/13264: Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata/base: Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 1 cannot remove directory for /data/nmisch/src/pg/postgresql/src/bin/scripts/tmp_testNNCZ/pgdata: Directory not empty at /home/nmisch/sw/cpan/lib/perl5/File/Temp.pm line 784 28 readline() on closed filehandle $pidfile at /data/nmisch/src/pg/postgresql/src/bin/pg_rewind/../../../src/test/perl/PostgresNode.pm line 308. 28 Use of uninitialized value in concatenation (.) or string at /data/nmisch/src/pg/postgresql/src/bin/pg_rewind/../../../src/test/perl/PostgresNode.pm line 309. $pidfile is always defined. Test the open() return value. > + { > + $self->{_pid} = undef; > + print "# No postmaster PID\n"; > + return; > + } > + > + $self->{_pid} = <$pidfile>; chomp() that value to remove its trailing newline. > + print "# Postmaster PID is $self->{_pid}\n"; > + close $pidfile; > +} > + my $devnull = $TestLib::windows_os ? "nul" : "/dev/null"; Unused variable. > +sub DESTROY > +{ > + my $self = shift; > + return if not defined $self->{_pid}; > + print "# signalling QUIT to $self->{_pid}\n"; > + kill 'QUIT', $self->{_pid}; On Windows, that kill() is the wrong thing. I suspect "pg_ctl kill" will be the right thing, but that warrants specific testing. Postmaster log file names became less informative. Before the commit: -rw---. 1 nmisch nmisch 211265 2015-12-06 22:35:59.931114215 + regress_log_001_basic -rw---. 
1 nmisch nmisch 268887 2015-12-06 22:36:21.871165951 + regress_log_002_databases -rw---. 1 nmisch nmisch 206808 2015-12-06 22:36:41.861213736 + regress_log_003_extrafiles -rw---. 1 nmisch nmisch 7464 2015-12-06 22:37:00.371256795 + master.log -rw---. 1 nmisch nmisch 6648 2015-12-06 22:37:01.381259211 + standby.log -rw---. 1 nmisch nmisch 208561 2015-12-06 22:37:02.381261374 + regress_log_004_pg_xlog_symlink After: -rw---. 1 nmisch nmisch 219581 2015-12-06 23:00:50.504643175 + regress_log_001_basic -rw---. 1 nmisch nmisch 276315 2015-12-06 23:01:12.924697085 + regress_log_002_databases -rw---. 1 nmisch nmisch 213940 2015-12-06 23:01:33.574747195 + regress_log_003_extrafiles -rw---. 1 nmisch nmisch 4114 2015-12-06 23:01:40.914764850 + node_57834.log -rw---. 1 nmisch nmisch 6166 2015-12-06 23:01:43.184770615 + node_57833.log -rw---. 1 nmisch nmisch 5550 2015-12-06 23:01:52.504792997 + node_57835.log -rw--
Re: [HACKERS] Logical replication and multimaster
On 7 December 2015 at 01:39, Konstantin Knizhnik wrote:
> I have integrated pglogical_output in multimaster

Excellent. I just pushed a change to pglogical_output that exposes the row contents (and the rest of the reorder change buffer contents) to hooks that want it, by the way.

> using bdr_apply from BDR as a template for the implementation of the
> receiver part.

Yep, that'll tide you over. We're working hard on getting the downstream part ready, and you'll find it more flexible.

> The time of insert is reduced almost 10 times compared with logical
> replication based on the decoder_raw/receiver_raw plugins, which perform
> logical replication using SQL statements. But unfortunately the time of
> updates is almost unchanged.

That's not too surprising, given that you'll have significant overheads for checking if keys are present when doing updates.

> This field is assigned by the pglogical_init_api() function. And I can
> extend this PGLogicalProtoAPI structure by adding some protocol-specific
> fields.

Yep, that's the idea.

> typedef struct PGLogicalProtoMM
> {
>     PGLogicalProtoAPI api;
>     bool isLocal; /* mark transaction as local */
> } PGLogicalProtoMM;

I'm curious about what you're using the 'isLocal' field for. For MM you should only need to examine the replication origin assigned to the transaction to determine whether you're going to forward it or not. Were you not able to achieve what you wanted with a hook? If not, then we might need another hook. Could you explain what it's for in more detail?

What I suggest is: have your downstream client install a pglogical_output hook for the transaction filter hook. There, examine the replication origin passed to the hook. If you want to forward locally originated xacts only (such as for mesh multimaster) you can just filter out everything where the origin is not InvalidRepOriginId. There are example hooks in contrib/pglogical_output_plhooks.
There'll be a simple MM example using filter hooks in the pglogical downstream, btw; we're working hard to get that out.

> But I have to add "PGLogicalOutputData *data" parameter to
> pglogical_write_rel_fn function.
> Do you think that it will be better to pass this parameter to all
> functions?

Yes, I agree that it should be passed to the API for the output protocol. It's pretty harmless. Please feel free to send a pull req.

Note that we haven't made that pluggable from the outside though; there's no way to load a new protocol distributed separately from pglogical_output. The idea is really to make sure that between the binary protocol and the json protocol we meet the reasonably expected set of use cases and don't need pluggable protocols. Perhaps that's over-optimistic, but we've already got an output plugin that has plug-in hooks: a plugin for a plugin. Do we need another?

Also, if we allow dynamic loading of new protocols then we'll have a much harder time changing the protocol implementation API later, so it's not something I'm keen to do. Also, to make it secure to allow users to specify the protocol, we'd have to make protocols implement an extension with a C function in pg_proc to return its API struct, like we do for hooks. So there'd be more hoop-jumping required to figure out how to talk to the client.

If possible I'd like to find any oversights and omissions in the current protocol and its APIs to meet future use cases without having to introduce protocol plugins for an output plugin.

> May be it is not intended way of passing custom data to this functions...

Yeah, we weren't really thinking of the protocol API as intended to be pluggable and extensible. If you define new protocols you have to change the rest of the output plugin code anyway. Let's look at what protocol changes are needed to address your use case and see whether it's necessary to take the step of making the protocol fully pluggable or not.
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] jsonb_delete not documented
Peter Eisentraut writes:
> The new function jsonb_delete does not appear to be documented. Is that
> intentional?
> The only thing that's documented is the #- operator for
> jsonb_delete_path. But jsonb_delete(jsonb, text) and
> jsonb_delete(jsonb, int) are not documented. (Those don't have an
> operator.)

Yeah they do ...

regression=# \df+ jsonb_delete
                                                       List of functions
   Schema   |     Name     | Result data type | Argument data types |  Type  | Security | Volatility |  Owner   | Language |   Source code    |         Description
------------+--------------+------------------+---------------------+--------+----------+------------+----------+----------+------------------+------------------------------
 pg_catalog | jsonb_delete | jsonb            | jsonb, integer      | normal | invoker  | immutable  | postgres | internal | jsonb_delete_idx | implementation of - operator
 pg_catalog | jsonb_delete | jsonb            | jsonb, text         | normal | invoker  | immutable  | postgres | internal | jsonb_delete     | implementation of - operator
(2 rows)

regression=# \do+ -
...
 pg_catalog | - | jsonb | integer | jsonb | pg_catalog.jsonb_delete | delete array element
 pg_catalog | - | jsonb | text    | jsonb | pg_catalog.jsonb_delete | delete object field
...

regards, tom lane

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Redundant sentence within INSERT documentation page (exclusion constraints)
There is a redundant second appearance of more or less the same sentence at one point in the new INSERT documentation -- I missed this during the recent overhaul of the 9.5+ INSERT documentation. Attached is a trivial patch that fixes this issue.

-- Peter Geoghegan

diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
index 945eb69..21b28a6 100644
--- a/doc/src/sgml/ref/insert.sgml
+++ b/doc/src/sgml/ref/insert.sgml
@@ -482,8 +482,7 @@ INSERT INTO table_name [ AS
 ON CONFLICT DO UPDATE.
+arbiter index or constraint.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] jsonb_delete not documented
The new function jsonb_delete does not appear to be documented. Is that intentional? The only thing that's documented is the #- operator for jsonb_delete_path. But jsonb_delete(jsonb, text) and jsonb_delete(jsonb, int) are not documented. (Those don't have an operator.) -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Attach comments to functions' parameters and return value
On 9/15/15 12:35 AM, Charles Clavadetscher wrote:
> COMMENT ON FUNCTION function_name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
>     PARAMETER param_position IS 'text';
> COMMENT ON FUNCTION function_name ( [ [ argmode ] [ argname ] argtype [, ...] ] )
>     RETURN VALUE IS 'text';
> An alternative to "RETURN VALUE" could be "RETURNS", which would make the
> statement shorter, but I think this may be confusing.

I like RETURN VALUE better myself.

> The parameter list of the function is only required to identify the function,
> also in cases where it exists with the same name in different flavours. This
> sticks to the general syntax of the command and should be easy to understand.

Right. You should be able to use the current COMMENT ON code. Note however that the last time I looked it does NOT support the full syntax that CREATE FUNCTION does. It'd be nice to fix that, but that's mostly a separate matter. Though, it would probably be nice if all of this stuff (along with the regprocedure input function) could be factored into a single piece of code...

-- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Double linking MemoryContext children
On 9/14/15 7:24 AM, Jan Wieck wrote: On 09/12/2015 11:35 AM, Kevin Grittner wrote: On the other hand, a grep indicates that there are two places that MemoryContextData.nextchild is set (and we therefore probably need to also set the new field), and Jan's proposed patch only changes one of them. If we do this, I think we need to change both places that are affected, so ResourceOwnerCreate() in resowner.c would need a line or two added. ResourceOwnerCreate() sets ResourceOwnerData.nextchild, not MemoryContextData.nextchild. Anything ever happen with this? -- Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX Experts in Analytics, Data Architecture and PostgreSQL Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using quicksort for every external sort run
On Tue, Nov 24, 2015 at 4:46 PM, Simon Riggs wrote:
>> Parallel sort is very important. Robert, Amit and I had a call about
>> this earlier today. We're all in agreement that this should be
>> extended in that direction, and have a rough idea about how it ought
>> to fit together with the parallelism primitives. Parallel sort in 9.6
>> could certainly happen -- that's what I'm aiming for. I haven't really
>> done preliminary research yet; I'll know more in a little while.
>
> Glad to hear it, I was hoping to see that.

As I mentioned just now, I'm working on parallel CREATE INDEX currently, which seems like a good proving ground for parallel sort, as it's where the majority of really expensive sorts occur. It would be nice to get parallel-aware sort nodes in 9.6, but I don't think I'll be able to make that happen in time. The required work in the optimizer is just too complicated.

The basic idea is that we use the parallel heapam interface, and have backends sort and write runs as with an external sort (if those runs are would-be internal sorts, we still write them to tape in the manner of external sorts). When done, worker processes release memory, but not tapes, initially. The leader reassembles an in-memory representation of the tapes that is basically consistent with it having generated those runs itself (using my new approach to external sorting). Then, it performs an on-the-fly merge, as before.

At the moment, I have the sorting of runs within workers using the parallel heapam interface more or less working, with workers dumping out the runs to tape. I'll work on reassembling the state of the tapes within the leader in the coming week. It's all still rather rough, but I think I'll have benchmarks before people start taking time off later in the month, and possibly even code. Cutting the scope of parallel sort in 9.6 to only cover parallel CREATE INDEX will make it likely that I'll be able to deliver something acceptable for that release.
-- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using quicksort for every external sort run
On Tue, Nov 24, 2015 at 4:33 PM, Peter Geoghegan wrote: > So, the bottom line is: This patch seems very good, is unlikely to > have any notable downside (no case has been shown to be regressed), > but has yet to receive code review. I am working on a new version with > the first two commits consolidated, and better comments, but that will > have the same code, unless I find bugs or am dissatisfied. It mostly > needs thorough code review, and to a lesser extent some more > performance testing. I'm currently spending a lot of time working on parallel CREATE INDEX. I should not delay posting a new version of my patch series any further, though. I hope to polish up parallel CREATE INDEX to be able to show people something in a couple of weeks. This version features consolidated commits, the removal of the multipass_warning parameter, and improved comments and commit messages. It has almost entirely unchanged functionality. The only functional changes are: * The function useselection() is taught to distrust an obviously bogus caller reltuples hint (when it's already less than half of what we know to be the minimum number of tuples that the sort must sort, immediately after LACKMEM() first becomes true -- this is probably a generic estimate). * Prefetching only occurs when writing tuples. Explicit prefetching appears to hurt in some cases, as David Rowley has shown over on the dedicated thread. But it might still be that writing tuples is a case that is simple enough to benefit consistently, due to the relatively uniform processing that memory latency can hide behind for that case (before, the same prefetching instructions were used for CREATE INDEX and for aggregates, for example). Maybe we should consider trying to get patch 0002 (the memory pool/merge patch) committed first, something Greg Stark suggested privately. 
That might actually be an easier way of integrating this work, since it changes nothing about the algorithm we use for merging (it only improves memory locality), and so is really an independent piece of work (albeit one that makes a huge overall difference due to the other patches increasing the time spent merging in absolute terms, and especially as a proportion of the total).

-- Peter Geoghegan

From 7ff4a3e4448b25ba524e40c0db57307466f167d9 Mon Sep 17 00:00:00 2001
From: Peter Geoghegan
Date: Sun, 12 Jul 2015 13:14:01 -0700
Subject: [PATCH 3/6] Perform memory prefetching when writing memtuples

This patch is based on, but quite distinct from, a separately submitted more general version which performs prefetching in several places [1]. This version now only performs prefetching of each "tuple proper" during the writing of batches of tuples (an entire run, written following a quicksort).

The case for prefetching each "tuple proper" at several sites now seems weak due to differences in CPU microarchitecture. However, it might still be that there is a consistent improvement observable when writing out tuples, because that involves a particularly tight inner loop, with relatively predictable processing to hide memory latency behind. A helpful generic prefetch hint may be possible for this case, even if it proves impossible elsewhere. This has been shown to appreciably help on both a POWER7 server processor [2], and an Intel Mobile processor.
[1] https://commitfest.postgresql.org/6/305/
[2] CAM3SWZR5rv3+F3FOKf35=dti7otmmcdfoe2vogur0pddg3j...@mail.gmail.com
---
 config/c-compiler.m4               | 17 +
 configure                          | 31 +++
 configure.in                       |  1 +
 src/backend/utils/sort/tuplesort.c | 15 +++
 src/include/c.h                    | 14 ++
 src/include/pg_config.h.in         |  3 +++
 src/include/pg_config.h.win32      |  3 +++
 src/include/pg_config_manual.h     | 10 ++
 8 files changed, 94 insertions(+)

diff --git a/config/c-compiler.m4 b/config/c-compiler.m4
index 550d034..8be2122 100644
--- a/config/c-compiler.m4
+++ b/config/c-compiler.m4
@@ -271,6 +271,23 @@ fi])# PGAC_C_BUILTIN_UNREACHABLE
 
+# PGAC_C_BUILTIN_PREFETCH
+# -
+# Check if the C compiler understands __builtin_prefetch(),
+# and define HAVE__BUILTIN_PREFETCH if so.
+AC_DEFUN([PGAC_C_BUILTIN_PREFETCH],
+[AC_CACHE_CHECK(for __builtin_prefetch, pgac_cv__builtin_prefetch,
+[AC_LINK_IFELSE([AC_LANG_PROGRAM([],
+[int i = 0;__builtin_prefetch(&i, 0, 3);])],
+[pgac_cv__builtin_prefetch=yes],
+[pgac_cv__builtin_prefetch=no])])
+if test x"$pgac_cv__builtin_prefetch" = xyes ; then
+AC_DEFINE(HAVE__BUILTIN_PREFETCH, 1,
+          [Define to 1 if your compiler understands __builtin_prefetch.])
+fi])# PGAC_C_BUILTIN_PREFETCH
+
+
 # PGAC_C_VA_ARGS
 # --
 # Check if the C compiler understands C99-style variadic macros,
diff --git a/configure b/configure
index 8c61390..cd8db5d 100755
--- a/configure
+++ b/configure
@@ -11338,6 +11338,37 @@ if test x"$pgac_cv__builtin_unreachable" = xyes
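Independent of the patch itself, the builtin that configure probes for is easy to demonstrate. The sketch below shows `__builtin_prefetch` in a tight loop of the general shape the commit message describes; it illustrates the hint only, it is not the patch's tuplesort code, and the look-ahead distance of 8 elements is an arbitrary assumption:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Sum an array while prefetching a few elements ahead of the current
 * position.  __builtin_prefetch(addr, rw, locality) is a pure hint
 * understood by GCC and Clang: it never changes results, it only asks
 * the CPU to start fetching the cache line early so that memory
 * latency hides behind the (predictable) work on earlier elements.
 * The configure test above defines HAVE__BUILTIN_PREFETCH when the
 * compiler supports it.
 */
static uint64_t
sum_with_prefetch(const uint64_t *a, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
    {
        if (i + 8 < n)
            __builtin_prefetch(&a[i + 8], 0, 3);    /* read, keep in cache */
        sum += a[i];
    }
    return sum;
}
```

Because the intrinsic is only a hint, the function returns exactly the same sum with or without it; the difference shows up only in memory-bound timings.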
Re: [HACKERS] Using quicksort for every external sort run
On Sun, Dec 6, 2015 at 3:59 PM, Jeff Janes wrote: > My column has the format of ABC-123-456-789-0 > > The name-space identifier ("ABC-") is the same in 99.99% of the > cases. And to date, as well as in my extrapolation, the first two > digits of the numeric part are leading zeros and the third one is > mostly 0,1,2. So the first 8 bytes really have less than 2 bits worth > of information. So yeah, not surprising abbreviation was not useful. I think that given you're using the "C" collation, abbreviation should still go ahead. I posted a patch to do that, which I need to further justify per Robert's request (currently, we do nothing special based on collation). Abbreviation should help in surprisingly marginal cases, since far fewer memory accesses will be required in the early stages of the sort with only (say) 5 distinct abbreviated keys. Once abbreviated comparisons start to not help at all (with quicksort, at some partition), there's a good chance that the full keys can be reused to some extent, before being evicted from CPU caches. >> BTW, roughly what does this CREATE INDEX look like? Is it a composite >> index, for example? > > Nope, just a single column index. In the extrapolated data set, each > distinct value shows up a couple hundred times on average. I'm > thinking of converting it to a btree_gin index once I've tested them a > bit more, as the compression benefits are substantial. Unfortunately, that cannot use tuplesort.c at all. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Using quicksort for every external sort run
On Sun, Nov 29, 2015 at 2:01 AM, Peter Geoghegan wrote:
> On Sat, Nov 28, 2015 at 4:05 PM, Peter Geoghegan wrote:
>> So there was 27.76 seconds spent copying tuples into local memory
>> ahead of the quicksort, 2 minutes 56.68 seconds spent actually
>> quicksorting, and a trifling 10.32 seconds actually writing the run! I
>> bet that the quicksort really didn't use up too much memory bandwidth
>> on the system as a whole, since abbreviated keys are used with a cache
>> oblivious internal sorting algorithm.
>
> Uh, actually, that isn't so:
>
> LOG: begin index sort: unique = f, workMem = 1048576, randomAccess = f
> LOG: bttext_abbrev: abbrev_distinct after 160: 1.000489
> (key_distinct: 40.802210, norm_abbrev_card: 0.006253, prop_card: 0.20)
> LOG: bttext_abbrev: aborted abbreviation at 160 (abbrev_distinct:
> 1.000489, key_distinct: 40.802210, prop_card: 0.20)
>
> Abbreviation is aborted in all cases that you tested. Arguably this
> should happen significantly less frequently with the "C" locale,
> possibly almost never, but it makes this case less than representative
> of most people's workloads. I think that at least the first several
> hundred leading attribute tuples are duplicates.

I guess I wasn't paying sufficient attention to that part of trace_sort; I was not familiar enough with the abbreviation feature to interpret what it meant. I had thought we used 16 bytes for abbreviation, but now I see it is only 8 bytes.

My column has the format of ABC-123-456-789-0

The name-space identifier ("ABC-") is the same in 99.99% of the cases. And to date, as well as in my extrapolation, the first two digits of the numeric part are leading zeros and the third one is mostly 0, 1, 2. So the first 8 bytes really have less than 2 bits worth of information. So yeah, not surprising abbreviation was not useful.
(When I created the system, I did tests that showed it doesn't make much difference whether I used the format natively, or stripped it to something more compact on input and reformatted it on output. That was before abbreviation features existed) > > BTW, roughly what does this CREATE INDEX look like? Is it a composite > index, for example? Nope, just a single column index. In the extrapolated data set, each distinct value shows up a couple hundred times on average. I'm thinking of converting it to a btree_gin index once I've tested them a bit more, as the compression benefits are substantial. Cheers, Jeff -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
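The effect Jeff describes can be reproduced with a toy model of abbreviation: pack the first 8 bytes of each key into an integer and compare those first. With the shared "ABC-123-" prefix, every abbreviated key collapses to the same value, so the cheap integer comparisons resolve nothing. This is a simplified sketch for the "C" collation, not tuplesort.c's actual abbreviation machinery:

```c
#include <stdint.h>
#include <string.h>

/*
 * Toy abbreviated key for the "C" collation: the first 8 bytes of the
 * string packed big-endian into a uint64_t, so that comparing the
 * integers agrees with memcmp() on the 8-byte prefix.  Keys shorter
 * than 8 bytes are zero-padded, which sorts them before longer keys
 * sharing the same prefix, just as strcmp() would.
 */
static uint64_t
abbrev_key(const char *s)
{
    unsigned char buf[8] = {0};
    size_t        len = strlen(s);
    uint64_t      k = 0;

    memcpy(buf, s, len < 8 ? len : 8);
    for (int i = 0; i < 8; i++)
        k = (k << 8) | buf[i];
    return k;
}
```

For Jeff's data, `abbrev_key("ABC-123-456-789-0")` and `abbrev_key("ABC-123-999-999-9")` are identical, which is exactly why trace_sort reported abbrev_distinct near 1.0 and aborted abbreviation.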
Re: [HACKERS] PATCH: index-only scans with partial indexes
Hello Kyotaro-san,

Sorry for the long delay since your response in this thread :-(

On 10/14/2015 08:06 AM, Kyotaro HORIGUCHI wrote:
> The table t is referred to twice by different (alias) names (though the
> difference is made by EXPLAIN, it shows that they are different rels in
> plantree). So we'll have these indexrinfos:
>
> aidx: {b=2}
> bidx: {a=1}

Yes, but each of them belongs to different rels.

> So, which makes index only scans unusable.
>
> They are usable.

Yes, you're of course right. I got confused by the comment at IndexOptInfo that says "per-index information", but as you've pointed out it's really "per-index per-reloptinfo", which makes it perfectly suitable for keeping the info we need.

However, I'm not sure the changes to check_partial_indexes() are correct - particularly the first loop:

    /*
     * Frequently, there will be no partial indexes, so first check to
     * make sure there's something useful to do here.
     */
    have_partial = false;
    foreach(lc, rel->indexlist)
    {
        IndexOptInfo *index = (IndexOptInfo *) lfirst(lc);

        /*
         * index rinfos are the same as baserestrictinfos for non-partial
         * indexes
         */
        index->indrinfos = rel->baserestrictinfo;

        if (index->indpred == NIL)
            continue;           /* ignore non-partial indexes */

        if (index->predOK)
            continue;           /* don't repeat work if already proven OK */

        have_partial = true;
        break;
    }

Now, imagine we have two indexes - partial and regular. In such a case the loop will terminate after the first index (assuming predOK=true), so the regular index won't have indrinfos set. I think we need to remove the 'break' at the end.

BTW it took me a while to understand the change at the end:

    /* Search for the first rinfo that is implied by indpred */
    foreach (lcr, rel->baserestrictinfo)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr);

        if (predicate_implied_by(list_make1(rinfo->clause), index->indpred))
            break;
    }

    /* This index needs rinfos different from baserestrictinfo */
    if (lcr)
    {
        ... filter implied conditions ...
    }

Is there a reason why not to simply move the second block into the if block in the foreach loop, like this?

    /* Search for the first rinfo that is implied by indpred */
    foreach (lcr, rel->baserestrictinfo)
    {
        RestrictInfo *rinfo = (RestrictInfo *) lfirst(lcr);

        if (predicate_implied_by(list_make1(rinfo->clause), index->indpred))
        {
            ... filter implied conditions ...
            break;
        }
    }

Otherwise the reworked patch seems fine to me, but I'll give it a bit more testing over the next few days. Thanks for the help so far!

-- Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
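The first problem Tomas points out (the per-index default assignment must run for every index, so the "is there work to do" scan must not break out early) reduces to a small self-contained analog in plain C; the types here are hypothetical stand-ins, not the planner's:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for IndexOptInfo, just enough for the demo. */
typedef struct ToyIndex
{
    bool partial;   /* has a predicate (index->indpred != NIL) */
    bool pred_ok;   /* predicate already proven (index->predOK) */
    int  rinfos;    /* stand-in for index->indrinfos */
} ToyIndex;

/*
 * Analog of the corrected first loop of check_partial_indexes():
 * every index receives the default rinfos, and we merely note whether
 * any partial, not-yet-proven index exists.  Breaking out of the loop
 * as soon as one is found (the bug) would leave later indexes with
 * their rinfos never assigned.
 */
static bool
scan_indexes(ToyIndex *idx, size_t n, int default_rinfos)
{
    bool have_partial = false;

    for (size_t i = 0; i < n; i++)
    {
        idx[i].rinfos = default_rinfos;     /* must happen for ALL indexes */
        if (idx[i].partial && !idx[i].pred_ok)
            have_partial = true;            /* note it, but keep going */
    }
    return have_partial;
}
```

With the early `break` in place, an array of {partial, regular} would leave the regular index's `rinfos` untouched, which is exactly the bug described above.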
[HACKERS] Unicode collations in FreeBSD 11, DragonFly BSD 4.4 without ICU
Hi Since the ICU patch from the BSD ports trees has been discussed on this list a few times I thought people might be interested in this news: https://lists.freebsd.org/pipermail/freebsd-current/2015-October/057781.html https://www.dragonflybsd.org/release44/ https://forums.freebsd.org/threads/this-is-how-i-like-open-source.53943/ (Thanks to pstef on #postgresql for pointing me at this). -- Thomas Munro http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] a word-choice question
On 12/06/2015 11:03 AM, Tom Lane wrote: Chapman Flack writes: In PL/Java's function annotations, 'type' has been used for the volatility category: @Function(type=IMMUTABLE) public static String hello() { return "Hello, world!"; } It seems a bit infelicitous because you would probably sooner guess that 'type' was telling you something about the function's _data_ type Yeah ... But I've been trying think of something short, clear, preferably monosyllabic, less geeky than 'v8y', and I don't have a winner yet. Instead of thinking about "volatility", maybe something based on "stability" or "constancy"? I suppose "const" is too much of a conflict, but "stable=IMMUTABLE" doesn't seem awful. I would stick with "volatility". It makes for consistency - pg_proc has "provolatile" for example. I think it's a moderately well understood term in the Postgres world. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [sqlsmith] Failed to generate plan on lateral subqueries
Hi,

I've added new grammar rules to sqlsmith and improved some older ones. This was rewarded with a return of "failed to generate plan" errors. The failing queries all contain a lateral subquery. The shortest of the failing queries are below. They were run against the regression db of master as of db07236.

regards,
Andreas

smith=# select msg, query from error
        where (firstline(msg) ~~ 'ERROR: failed to build any%'
               or firstline(msg) ~~ 'ERROR: could not devise a query plan%')
          and t > now() - interval '1 day'
        order by length(query) asc limit 3;

ERROR: failed to build any 8-way joins

select ref_96.foreign_table_schema as c0, sample_87.is_supported as c1
from information_schema.sql_packages as sample_87 tablesample system (0.2)
  right join information_schema._pg_foreign_tables as ref_96
    on (sample_87.feature_id = ref_96.foreign_table_catalog),
  lateral (select sample_87.is_verified_by as c0,
                  ref_97.indexed_col as c1,
                  coalesce(sample_87.feature_id, ref_96.foreign_server_name) as c2,
                  4 as c3
           from public.comment_test as ref_97
           where ref_97.id ~>~ ref_97.indexed_col
           fetch first 73 rows only) as subq_33
where ref_96.foreign_table_name ~~ subq_33.c1

ERROR: could not devise a query plan for the given query

select subq_43.c0 as c0
from (select ref_181.installed as c0
      from pg_catalog.pg_available_extension_versions as ref_181,
        lateral (select ref_181.name as c0, ref_181.installed as c1
                 from pg_catalog.pg_conversion as ref_182
                 where ref_182.conname ~~* ref_181.version
                 fetch first 98 rows only) as subq_42
      where (subq_42.c0 is not NULL) or (subq_42.c1 is NULL)) as subq_43
  right join pg_catalog.pg_language as sample_177 tablesample system (2.8)
    on (subq_43.c0 = sample_177.lanispl)
where sample_177.lanowner < sample_177.lanvalidator

ERROR: failed to build any 5-way joins

select ref_239.id2 as c0, 40 as c1, ref_239.id2 as c2, ref_238.aa as c3
from public.tt5 as sample_289 tablesample system (8.1)
  inner join information_schema.element_types as ref_237
    on (sample_289.x = ref_237.character_maximum_length)
  left join public.b as ref_238
    on (ref_237.character_maximum_length = ref_238.aa)
  left join public.num_exp_mul as ref_239
    on (ref_237.numeric_precision_radix = ref_239.id1),
  lateral (select sample_290.b as c0, sample_289.y as c1, ref_239.id2 as c2
           from public.rtest_t8 as sample_290 tablesample bernoulli (4.6)
           where (sample_290.b > ref_238.bb)
             and (sample_289.y > ref_239.expected)
           fetch first 91 rows only) as subq_64
where (subq_64.c1 > sample_289.y) and (sample_289.y = ref_239.expected)
fetch first 133 rows only

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Logical replication and multimaster
Hi,

I have integrated pglogical_output in multimaster, using bdr_apply from BDR as a template for the implementation of the receiver part. The time of insert is reduced almost 10 times compared with logical replication based on the decoder_raw/receiver_raw plugins, which perform logical replication using SQL statements. But unfortunately the time of updates is almost unchanged. This is the expected result, because I didn't see any functions related to SQL parsing/preparing in the profile. Now in both cases the profile is similar:

  4.62%  postgres  [.] HeapTupleSatisfiesMVCC
  2.99%  postgres  [.] heapgetpage
  2.10%  postgres  [.] hash_search_with_hash_value
  1.86%  postgres  [.] ExecProject
  1.80%  postgres  [.] heap_getnext
  1.79%  postgres  [.] PgXidInMVCCSnapshot

By the way, you asked about comments concerning pglogical_output. I have one: most of the pglogical protocol functions have a "PGLogicalOutputData *data" parameter. There are a few exceptions: write_startup_message_fn, pglogical_write_origin_fn, pglogical_write_rel_fn. PGLogicalOutputData is the only way to pass protocol-specific data, using the "PGLogicalProtoAPI *api" field. This field is assigned by the pglogical_init_api() function. And I can extend this PGLogicalProtoAPI structure by adding some protocol-specific fields.
For example, this is how it is done now for multimaster:

typedef struct PGLogicalProtoMM
{
    PGLogicalProtoAPI api;
    bool isLocal;   /* mark transaction as local */
} PGLogicalProtoMM;

PGLogicalProtoAPI *
pglogical_init_api(PGLogicalProtoType typ)
{
    PGLogicalProtoMM  *pmm = palloc0(sizeof(PGLogicalProtoMM));
    PGLogicalProtoAPI *res = &pmm->api;

    pmm->isLocal = false;
    res->write_rel = pglogical_write_rel;
    res->write_begin = pglogical_write_begin;
    res->write_commit = pglogical_write_commit;
    res->write_insert = pglogical_write_insert;
    res->write_update = pglogical_write_update;
    res->write_delete = pglogical_write_delete;
    res->write_startup_message = write_startup_message;
    return res;
}

But I have to add a "PGLogicalOutputData *data" parameter to the pglogical_write_rel_fn function. Do you think that it would be better to pass this parameter to all functions? Maybe this is not the intended way of passing custom data to these functions... Certainly it is possible to use static variables for this purpose. But I think that passing user-specific data through PGLogicalOutputData is a safer and more flexible solution.

On 12/03/2015 04:53 PM, Craig Ringer wrote:
On 3 December 2015 at 19:06, konstantin knizhnik wrote:
On Dec 3, 2015, at 10:34 AM, Craig Ringer wrote:
On 3 December 2015 at 14:54, konstantin knizhnik wrote:

    I'd really like to collaborate using pglogical_output if at all possible. Petr's working really hard to get the pglogical downstream out too, with me helping where I can.

    And where can I get the pglogical_output plugin? Sorry, but I can't quickly find a reference with Google...

It's been submitted to this CF.

https://commitfest.postgresql.org/7/418/
https://github.com/2ndQuadrant/postgres/tree/dev/pglogical-output

Any tests and comments would be greatly appreciated.

Thank you. I wonder if there is an opposite part of the pipe for pglogical_output - an analog of receiver_raw?
It's pglogical, and it's in progress, due to be released at the same time as 9.5. We're holding it a little longer to nail down the user interface a bit better, etc, and because sometimes the real world gets in the way. The catalogs and UI are very different to BDR, it's much more extensible/modular, it supports much more flexible topologies, etc... but lots of the core concepts are very similar. So if you go take a look at the BDR code that'll give you a pretty solid idea of how a lot of it works, though BDR has whole subsystems pglogical doesn't (global ddl lock, ddl replication, etc). -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: [HACKERS] [PATCH] Equivalence Class Filters
David Rowley writes: > On 6 December 2015 at 06:07, Tom Lane wrote: >> Another issue that would need consideration is how to avoid skewing >> planner selectivity estimates with derived clauses that are fully >> redundant with other clauses. > Could you give me an example of where this is a problem? Using the regression database, try explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand; explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand and a.thousand < 100; explain analyze select * from tenk1 a, tenk1 b where a.thousand = b.thousand and a.thousand < 100 and b.thousand < 100; The first two give pretty accurate estimates for the join size, the third not so much, because it thinks b.thousand < 100 is an independent constraint. > I've tried fooling > the planner into giving me bad estimates, but I've not managed to. > # explain analyze select * from t1 where t1.id < 10 and t1.id < 100 and > t1.id < 1000; That particular case works because clausesel.c's AddRangeClause figures out that the similar inequalities are redundant and drops all but one, on its way to (not) identifying a range constraint for t1.id. There's nothing comparable for constraints on different variables, though, especially not constraints on Vars of different relations, which will never even appear in the same restriction list. > if so, is the current eclass code not prone to the exact same problem? No, and I already explained why not. >> Lastly, in most cases knowing that t2.id <= 10 is just not worth all >> that much; it's certainly far less useful than an equality condition. > I'd have thought that my link to a thread of a reported 1100 to 2200 times > performance improvement might have made you think twice about claiming the > uselessness of this idea. I said "in most cases". You can find example cases to support almost any weird planner optimization no matter how expensive and single-purpose; but that is the wrong way to think about it. 
What you have to think about is average cases, and in particular, not putting a drag on planning time in cases where no benefit ensues. We're not committing any patches that give one uncommon case an 1100X speedup by penalizing every other query 10%, or even 1%; especially not when there may be other ways to fix it. The EC machinery wasn't designed in a vacuum. It is interested in deriving equality conditions because those are useful for merge and hash joins. It's also tightly tied in with the planner's understanding of sort ordering and distinct-ness. None of those considerations provide any reason to build a similar thing for inequalities. It may be that computing derived inequalities is something that's worth adding, but we're going to have to take a very hard look at the costs and benefits; it's not a slam dunk that such a patch would get committed. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
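The skew in Tom's tenk1 example follows directly from multiplying the selectivities of clauses the planner treats as independent. A back-of-envelope sketch, using assumed round figures (10000 rows per table, 1000 distinct values of `thousand`, so `thousand < 100` keeps roughly 10%); this is not the planner's actual costing code:

```c
/*
 * Rough join-size estimate under the independence assumption: the
 * cross product scaled by the join selectivity (~1/num_distinct for
 * an equijoin) and by the product of all extra filter selectivities.
 * The trouble Tom describes: when b.thousand < 100 is fully implied
 * by a.thousand = b.thousand AND a.thousand < 100, multiplying in its
 * 0.1 selectivity anyway shrinks the estimate about 10x below reality.
 */
static double
est_rows(double outer_rows, double inner_rows,
         double join_sel, double extra_sel)
{
    return outer_rows * inner_rows * join_sel * extra_sel;
}
```

Plugging in the three queries: the plain join estimates ~100000 rows; adding `a.thousand < 100` gives ~10000 (which matches reality, since the join carries the restriction over to b); adding the redundant `b.thousand < 100` multiplies in another 0.1 and drops the estimate to ~1000, roughly 10x too low.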
Re: [HACKERS] a word-choice question
Chapman Flack writes:
> In PL/Java's function annotations, 'type' has been used for the
> volatility category:
>
>   @Function(type=IMMUTABLE)
>   public static String hello() { return "Hello, world!"; }
>
> It seems a bit infelicitous because you would probably sooner guess that
> 'type' was telling you something about the function's _data_ type

Yeah ...

> But I've been trying to think of something short, clear, preferably
> monosyllabic, less geeky than 'v8y', and I don't have a winner yet.

Instead of thinking about "volatility", maybe something based on
"stability" or "constancy"?  I suppose "const" is too much of a conflict,
but "stable=IMMUTABLE" doesn't seem awful.

			regards, tom lane
Re: CustomScan in a larger structure (RE: [HACKERS] CustomScan support on readfuncs.c)
> On Wed, Dec 2, 2015 at 4:06 AM, Andres Freund wrote:
> > On 2015-12-02 08:52:20 +, Kouhei Kaigai wrote:
> >> diff --git a/src/backend/nodes/copyfuncs.c b/src/backend/nodes/copyfuncs.c
> >> index 26264cb..c4bb76e 100644
> >> --- a/src/backend/nodes/copyfuncs.c
> >> +++ b/src/backend/nodes/copyfuncs.c
> >> @@ -635,8 +635,12 @@ _copyWorkTableScan(const WorkTableScan *from)
> >>  static ForeignScan *
> >>  _copyForeignScan(const ForeignScan *from)
> >>  {
> >> -	ForeignScan *newnode = makeNode(ForeignScan);
> >> -
> >> +	const ExtensibleNodeMethods *methods
> >> +		= GetExtensibleNodeMethods(from->extnodename, true);
> >> +	ForeignScan *newnode = (ForeignScan *) newNode(!methods
> >> +												? sizeof(ForeignScan)
> >> +												: methods->node_size,
> >> +												T_ForeignScan);
> >
> > Changing the FDW API seems to put this squarely into 9.6
> > territory. Agreed?
>
> I don't think anybody thought this patch was 9.5 material.  I didn't,
> anyway.  The FDW changes for the join-pushdown-EPQ problem are another
> issue, but that can be discussed on the other thread.
>
I don't expect it to be a 9.5 feature.  The name of the attached patch
was "pgsql-v9.6-custom-private.v2.patch".

> >>  	/*
> >>  	 * copy node superclass fields
> >>  	 */
> >> @@ -645,6 +649,7 @@ _copyForeignScan(const ForeignScan *from)
> >>  	/*
> >>  	 * copy remainder of node
> >>  	 */
> >> +	COPY_STRING_FIELD(extnodename);
> >>  	COPY_SCALAR_FIELD(fs_server);
> >>  	COPY_NODE_FIELD(fdw_exprs);
> >>  	COPY_NODE_FIELD(fdw_private);
> >> @@ -652,6 +657,8 @@ _copyForeignScan(const ForeignScan *from)
> >>  	COPY_NODE_FIELD(fdw_recheck_quals);
> >>  	COPY_BITMAPSET_FIELD(fs_relids);
> >>  	COPY_SCALAR_FIELD(fsSystemCol);
> >> +	if (methods && methods->nodeCopy)
> >> +		methods->nodeCopy((Node *) newnode, (const Node *) from);
> >
> > I wonder if we shouldn't instead "recurse" into a generic routine for
> > extensible nodes here.
>
> I don't know what that means.
> >> +void
> >> +RegisterExtensibleNodeMethods(const ExtensibleNodeMethods *methods)
> >> +{
> >> +	uint32		hash;
> >> +	int			index;
> >> +	ListCell   *lc;
> >> +	MemoryContext oldcxt;
> >> +
> >> +	if (!xnodes_methods_slots)
> >> +		xnodes_methods_slots = MemoryContextAllocZero(TopMemoryContext,
> >> +								sizeof(List *) * XNODES_NUM_SLOTS);
> >> +
> >> +	hash = hash_any(methods->extnodename, strlen(methods->extnodename));
> >> +	index = hash % XNODES_NUM_SLOTS;
> >> +
> >> +	/* duplication check */
> >> +	foreach (lc, xnodes_methods_slots[index])
> >> +	{
> >> +		const ExtensibleNodeMethods *curr = lfirst(lc);
> >> +
> >> +		if (strcmp(methods->extnodename, curr->extnodename) == 0)
> >> +			ereport(ERROR,
> >> +					(errcode(ERRCODE_DUPLICATE_OBJECT),
> >> +					 errmsg("ExtensibleNodeMethods \"%s\" already exists",
> >> +							methods->extnodename)));
> >> +	}
> >> +	/* ok, register this entry */
> >> +	oldcxt = MemoryContextSwitchTo(TopMemoryContext);
> >> +	xnodes_methods_slots[index] = lappend(xnodes_methods_slots[index],
> >> +										  (void *) methods);
> >> +	MemoryContextSwitchTo(oldcxt);
> >> +}
> >
> > Why aren't you using dynahash here, and instead reimplement a hashtable?
>
> I had thought we might assume that the number of extensible nodes was
> small enough that we could use an array for this and just use linear
> search.  But if we want to keep the door open to having enough
> extensible nodes that this would be inefficient, then I agree that
> dynahash is better than a hand-rolled hash table.
>
Hmm.  I also expected that the number of extensible nodes would not be
large, so a self-defined hash table seemed sufficient.  However, that is
not a strong reason.  Let me revise the hash implementation.
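[Editor's illustration] The register-with-duplicate-check pattern in the quoted hunk can be sketched without the PostgreSQL memory-context and List machinery. The names, slot count, and hash function below are invented for illustration; the real patch uses hash_any(), List cells, and ereport() instead:

```c
#include <stdlib.h>
#include <string.h>

#define NUM_SLOTS 32            /* stands in for XNODES_NUM_SLOTS */

typedef struct Entry
{
    const char *name;
    struct Entry *next;
} Entry;

static Entry *slots[NUM_SLOTS]; /* stands in for xnodes_methods_slots */

/* Trivial djb2-style string hash; stands in for hash_any(). */
static unsigned
hash_name(const char *s)
{
    unsigned h = 5381;

    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h % NUM_SLOTS;
}

/*
 * Register a name, refusing duplicates.  Returns 0 on success, -1 on a
 * duplicate (where the real code raises ERRCODE_DUPLICATE_OBJECT).
 */
static int
register_entry(const char *name)
{
    unsigned idx = hash_name(name);
    Entry *e;

    /* duplication check: walk the collision chain for this slot */
    for (e = slots[idx]; e != NULL; e = e->next)
        if (strcmp(e->name, name) == 0)
            return -1;

    /* ok, prepend a new entry to the chain */
    e = malloc(sizeof(Entry));
    e->name = name;
    e->next = slots[idx];
    slots[idx] = e;
    return 0;
}
```

Lookup follows the same hash-then-walk-chain shape, which is exactly what dynahash would provide for free; Robert's point is that once the entry count can grow, reusing dynahash beats maintaining this by hand.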
> >>  static ForeignScan *
> >>  _readForeignScan(void)
> >>  {
> >> +	const ExtensibleNodeMethods *methods;
> >>  	READ_LOCALS(ForeignScan);
> >>
> >>  	ReadCommonScan(&local_node->scan);
> >>
> >> +	READ_STRING_FIELD(extnodename);
> >>  	READ_OID_FIELD(fs_server);
> >>  	READ_NODE_FIELD(fdw_exprs);
> >>  	READ_NODE_FIELD(fdw_private);
> >> @@ -1804,6 +1806,19 @@ _readForeignScan(void)
> >>  	READ_BITMAPSET_FIELD(fs_relids);
> >>  	READ_BOOL_FIELD(fsSystemCol);
> >>
> >> +	methods = GetExtensibleNodeMethods(local_node->extnodename, true);
> >> +	if (methods && methods->nodeRead)
> >> +	{
> >> +		ForeignScan *local_temp = palloc0(methods->node_size);
> >> +
> >> +		Assert(methods->node_size >= sizeof(ForeignScan));
> >> +
> >> +		memcpy(local_temp, local_node, sizeof(ForeignScan));
> >> +		methods->nodeRead((Node *) local_temp);
> >> +
> >> +		pfree(lo
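[Editor's illustration] The enlarge-and-copy pattern in the hunk above (read the base node, then move it into an allocation of the extension's declared size) can be shown in isolation. The struct names are invented, and calloc/free stand in for palloc0/pfree:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a plan node such as ForeignScan. */
typedef struct BaseNode
{
    int tag;
    int field;
} BaseNode;

/* An extension's node embeds the base node as its first member,
 * as required for the memcpy below to be valid. */
typedef struct ExtNode
{
    BaseNode base;
    int extra;
} ExtNode;

/*
 * Re-home a base node into a larger, zero-initialized allocation of
 * ext_size bytes, as _readForeignScan does with methods->node_size.
 * The caller's pointer is consumed; the enlarged node is returned.
 */
static void *
enlarge_node(BaseNode *node, size_t ext_size)
{
    BaseNode *bigger;

    assert(ext_size >= sizeof(BaseNode));   /* mirrors the patch's Assert */
    bigger = calloc(1, ext_size);
    memcpy(bigger, node, sizeof(BaseNode)); /* copy base fields */
    free(node);                             /* mirrors the pfree */
    return bigger;                          /* extra fields are zeroed */
}
```

After this step the extension's nodeRead callback can fill in the trailing private fields, while all base fields read by the generic code are preserved.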
Re: [HACKERS] proposal: multiple psql option -c
On Sun, Dec 6, 2015 at 10:56 PM, Michael Paquier wrote:
> Thanks, I looked at that again and problem is fixed as attached.

Er, not exactly... poll_query_until in PostgresNode.pm is using psql -c
without the --no-psqlrc switch, so this patch causes the regression tests
of pg_rewind to fail.  Fixed as attached.
--
Michael

20151206_psql_commands_v4.patch
Description: binary/octet-stream
Re: [HACKERS] proposal: multiple psql option -c
On Fri, Dec 4, 2015 at 11:47 PM, Robert Haas wrote:
> On Wed, Dec 2, 2015 at 12:33 AM, Pavel Stehule wrote:
>>> Yeah, I don't think that's a big issue either to be honest. The code
>>> is kept consistent a maximum with what is there previously.
>>>
>>> Patch is switched to ready for committer.
>>
>> perfect
>>
>> Thank you very much to all
>
> I did some edits on this patch and was all set to commit it when I ran
> the regression tests and discovered that this breaks 130 out of the
> 160 regression tests.  Allow me to suggest that before submitting a
> patch, or marking it ready for committer, you test that 'make check'
> passes.

Mea culpa.  I thought I did a check-world run... But well...

> For the most part, the cleanups in this version are just cosmetic: I
> fixed some whitespace damage, and reverted some needless changes to
> the psql reference page that were whitespace-only adjustments.  In a
> few places, I tweaked documentation or comment language.  I also
> hoisted the psqlrc handling out of an if statement where it was the
> same in both branches.  Other than that, this version is, I believe,
> the same as Pavel's last version.

Thanks, I looked at that again and the problem is fixed as attached.
--
Michael

20151206_psql_commands_v3.patch
Description: binary/octet-stream
Re: [HACKERS] [PATCH] Equivalence Class Filters
On 6 December 2015 at 06:07, Tom Lane wrote:
> David Rowley writes:
>> As of today these Equivalence Classes only incorporate expressions
>> which use the equality operator, but what if the above query had been
>> written as:
>>
>>   select * from t1 inner join t2 on t1.id = t2.id where t1.id <= 10;
>>
>> Should we not be able to assume that t2.id is also <= 10?
>
> This sort of thing has been suggested multiple times before, and I've
> rejected it every time on the grounds that it would likely add far more
> planner cycles than it'd be worth, eg, time would be spent on searches
> for matching subexpressions whether or not anything was learned (and
> often nothing would be learned).

I guess if it keeps coming up then people must be hitting this
limitation.  Perhaps continually rejecting it based on your original
opinion is not the best way to move forward with improving the query
planner.

> While I've only read your description of the patch not the patch
> itself, the search methodology you propose seems pretty brute-force and
> unlikely to solve that issue.  It's particularly important to avoid
> O(N^2) behaviors when there are N expressions ...

Likely it would be possible to do something to improve on that.  Perhaps
if the list of filter clauses grows beyond a certain number then a hash
table can be built on the ec_members list, with the em_expr as the key
and the EquivalenceClass as the data.  Although likely we don't currently
have anything available for generating hash values for Expr nodes.  On
checking, it seems that 32 is the estimated tipping point for hashing in
find_join_rel(); perhaps something similar would suit this.

> Another issue that would need consideration is how to avoid skewing
> planner selectivity estimates with derived clauses that are fully
> redundant with other clauses.  The existing EC machinery is mostly
> able to dodge that problem by generating just a minimal set of equality
> clauses from an EC, but I don't see how we'd make that work here.
Could you give me an example of where this is a problem?  I've tried
fooling the planner into giving me bad estimates, but I've not managed
to.

# explain analyze select * from t1 where t1.id < 10 and t1.id < 100 and
t1.id < 1000;
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using t1_pkey on t1  (cost=0.29..8.49 rows=9 width=8)
   (actual time=0.155..0.160 rows=9 loops=1)
   Index Cond: ((id < 10) AND (id < 100) AND (id < 1000))

I'd say the t1.id < 100 AND t1.id < 1000 are fully redundant here.  The
estimate given by the planner is bang-on.  Perhaps you're talking about
another column making the pushed-down qual redundant?  If so, is the
current eclass code not prone to the exact same problem?

> I'm also wondering why you want to limit it to comparisons to
> constants; that seems rather arbitrary.
>
> Do you have other useful cases in mind?

The other one I considered was "expr op expr" clauses, where both those
exprs were in eclasses.  For example:

select * from t1 inner join t2 on t1.col1 = t2.col1 and t1.col2 = t2.col2
where t1.col1 < t1.col2;

I'd imagine that would have a narrower use case.  I simply stopped with
"Expr op Const" because I thought that would be more common and less
planner overhead to detect.  Given that below you also raise concerns
that "expr op const" is likely not that useful in enough cases, I'm not
holding out too much hope of adding more complexity to the detection
mechanism for less common wins.

> Lastly, in most cases knowing that t2.id <= 10 is just not worth all
> that much; it's certainly far less useful than an equality condition.
> It's not difficult to imagine that this would often be a net drag on
> runtime performance (never mind planner performance) by doing nothing
> except creating additional filter conditions the executor has to check.
> Certainly it would be valuable to know this if it let us exclude some
> partition of a table, but that's only going to happen in a small
> minority of queries.
I'd have thought that my link to a thread of a reported 1100 to 2200
times performance improvement might have made you think twice about
claiming the uselessness of this idea.

I'm also quite surprised at you mentioning partitions here.  Personally
I'd be thinking of indexes long before thinking of partitions.  I don't
think I need to remind you that btree indexes handle >= and <= just
fine.  It's not all that hard to imagine that if someone is using that
column for a join condition and is serious about performance, they just
might also have an index defined on that column too.

I'd imagine the only case we might have to worry about is when the
selectivity of the pushed qual is getting close to 1.  Perhaps the
RestrictInfos could be marked as "optional" somehow, and we could simply
remove them when their selectivity estimates are too high.  I'm not
necessarily oppos