Re: [HACKERS] sorted writes for checkpoints
On Fri, Oct 29, 2010 at 3:23 PM, Heikki Linnakangas wrote:
> Simon's argument in the thread that the todo item points to
> (http://archives.postgresql.org/pgsql-patches/2008-07/msg00123.php) is
> basically that we don't know what the best algorithm is yet and benchmarking
> is a lot of work, so let's just let people do whatever they feel like until
> we settle on the best approach. I think we need to bite the bullet and do
> some benchmarking, and commit one carefully vetted patch to the backend.

When I submitted the patch, I tested it on a disk-based RAID-5 machine:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00541.php
But there were no additional benchmarking reports at that time. We still
need benchmarking before we re-examine the feature. For example, SSDs and
SSD RAID were not popular at that time, but now they might be worth
considering.

I think patching the core directly is enough for the first round of
testing, and we can decide on the interface according to the results. If
one algorithm wins in all cases, we could just include it in the core,
and then extensibility would not be needed.

--
Itagaki Takahiro

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] sorted writes for checkpoints
On 29.10.2010 06:00, Jeff Janes wrote:
> One of the items on the Wiki ToDo list is sorted writes for checkpoints.
> The consensus seemed to be that this should be done by adding hook(s)
> into the main code, and then a contrib module to work with those hooks.
> Is there an existing contrib module that one could best look to for
> inspiration on how to go about doing this?  I have the sorted checkpoint
> working under a guc, but don't know where to start on converting it to a
> contrib module instead.

I don't think it's a good idea to have this as a hook. Bgwriter shouldn't
need to load external code, and checkpoint robustness shouldn't depend on
user-written code. IIRC Tom Lane didn't even like pallocing the memory for
the list of dirty pages at checkpoint time, because that might cause an
out-of-memory error. Calling a function in a contrib module is much, much
worse.

Simon's argument in the thread that the todo item points to
(http://archives.postgresql.org/pgsql-patches/2008-07/msg00123.php) is
basically that we don't know what the best algorithm is yet and
benchmarking is a lot of work, so let's just let people do whatever they
feel like until we settle on the best approach. I think we need to bite
the bullet and do some benchmarking, and commit one carefully vetted
patch to the backend.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [HACKERS] plan time of MASSIVE partitioning ...
I wrote:
> the right way to make this faster is to refactor things so that we
> don't generate useless equivalence classes in the first place, or
> at least don't keep them around in the planner's lists once we realize
> they're useless.

After a bit of hacking, I propose the attached patch.

> I like Heikki's hack to cut down on searching in make_canonical_pathkey,
> but I think that complicating the data structure searching beyond that
> is just a band-aid.

With the given test case and this patch, we end up with exactly two
canonical pathkeys referencing a single EquivalenceClass. So as far as I
can tell there's not a lot of point in refining the pathkey searching.

Now, the EquivalenceClass has got 483 members, which means that there's
still some O(N^2) behavior in get_eclass_for_sort_expr. There might be
some use in refining the search for a matching eclass member. It's not
sticking out in profiling like it did before, though.

			regards, tom lane

diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README
index d6402cf911817b1b8c17da91019a1fac19bf051a..5c0786f2fe6dea9a72ad66ba93aa8833ab0e26ba 100644
*** a/src/backend/optimizer/README
--- b/src/backend/optimizer/README
*** sort ordering was important; and so usin
*** 632,640 ****
  orderings doesn't create any real problem.

- Though Bob Devine was not involved in the
- coding of our optimizer, he is available to field questions about
- optimizer topics.

  -- bjm & tgl
--- 632,670 ----
  orderings doesn't create any real problem.

+ Order of processing for EquivalenceClasses and PathKeys
+ -------------------------------------------------------
+
+ As alluded to above, there is a specific sequence of phases in the
+ processing of EquivalenceClasses and PathKeys during planning. During the
+ initial scanning of the query's quals (deconstruct_jointree followed by
+ reconsider_outer_join_clauses), we construct EquivalenceClasses based on
+ mergejoinable clauses found in the quals. At the end of this process,
+ we know all we can know about equivalence of different variables, so
+ subsequently there will be no further merging of EquivalenceClasses.
+ At that point it is possible to consider the EquivalenceClasses as
+ "canonical" and build canonical PathKeys that reference them. Before
+ we reach that point (actually, before entering query_planner at all)
+ we also ensure that we have constructed EquivalenceClasses for all the
+ expressions used in the query's ORDER BY and related clauses. These
+ classes might or might not get merged together, depending on what we
+ find in the quals.
+
+ Because all the EquivalenceClasses are known before we begin path
+ generation, we can use them as a guide to which indexes are of interest:
+ if an index's column is not mentioned in any EquivalenceClass then that
+ index's sort order cannot possibly be helpful for the query. This allows
+ short-circuiting of much of the processing of create_index_paths() for
+ irrelevant indexes.
+
+ There are some cases where planner.c constructs additional
+ EquivalenceClasses and PathKeys after query_planner has completed.
+ In these cases, the extra ECs/PKs are needed to represent sort orders
+ that were not considered during query_planner. Such situations should be
+ minimized since it is impossible for query_planner to return a plan
+ producing such a sort order, meaning an explicit sort will always be needed.
+ Currently this happens only for queries involving multiple window functions
+ with different orderings, so extra sorts are needed anyway.

  -- bjm & tgl
diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c
index e44e960b5454d4698ed82e4e857794ffe2a9adf2..c101c272a14b2f1b9d92a54670688df057d84a13 100644
*** a/src/backend/optimizer/path/equivclass.c
--- b/src/backend/optimizer/path/equivclass.c
*** static bool reconsider_full_join_clause(
*** 78,83 ****
--- 78,87 ----
  * join.  (This is the reason why we need a failure return.  It's more
  * convenient to check this case here than at the call sites...)
  *
+ * On success return, we have also initialized the clause's left_ec/right_ec
+ * fields to point to the EquivalenceClass built from it.  This saves lookup
+ * effort later.
+ *
  * Note: constructing merged EquivalenceClasses is a standard UNION-FIND
  * problem, for which there exist better data structures than simple lists.
  * If this code ever proves to be a bottleneck then it could be sped up
*** process_equivalence(PlannerInfo *root, R
*** 106,111 ****
--- 110,119 ----
  		   *em2;
  	ListCell   *lc1;

+ 	/* Should not already be marked as having generated an eclass */
+ 	Assert(restrictinfo->left_ec == NULL);
+ 	Assert(restrictinfo->right_ec == NULL);
+
  	/* Extract info from given clause */
  	Assert(is_opclause(clause));
  	opno = ((OpExpr *) clause)->opno;
*** process_equivalence(Plann
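The comment added by the patch notes that merging EquivalenceClasses is a standard union-find problem, for which better data structures than simple lists exist. A minimal sketch of that technique (illustrative Python, not the planner's C code) with path compression and union by size:

```python
class UnionFind:
    """Textbook union-find: near-constant-time merging of equivalence
    sets, the structure the patch comment says could replace the
    simple-list merge if it ever became a bottleneck."""
    def __init__(self):
        self.parent = {}
        self.size = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        while self.parent[x] != x:
            # Path compression: point each visited node at its grandparent.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        # Union by size: attach the smaller tree under the larger one.
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]

# Merging the eclasses implied by a.x = b.x and b.x = c.x:
uf = UnionFind()
uf.union("a.x", "b.x")
uf.union("b.x", "c.x")
print(uf.find("a.x") == uf.find("c.x"))  # True
```

With lists, each merge scans a member list; union-find makes the merge cost effectively constant per clause.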
Re: [HACKERS] add label to enum syntax
Hello, Alvaro.

You wrote:

AH> Excerpts from Pavel Golub's message of jue oct 28 07:50:24 -0300 2010:
>> Forgot link to poll:
>> http://pgolub.wordpress.com/2010/10/28/poll-alter-type-enumtype-add-what-newlabel/

AH> Hah, there are 17 votes as of right now, no option is below 23% and no
AH> option is above 29%.

Yeah, right now 42 votes:

  VALUE                26%
  LABEL                26%
  Just ADD 'newlabel'  24%
  ELEMENT              21%
  MEMBER                2%

--
With best wishes,
Pavel                          mailto:pa...@gf.microolap.com
Re: [HACKERS] sorted writes for checkpoints
Excerpts from Jeff Janes's message of vie oct 29 00:00:24 -0300 2010:
> One of the items on the Wiki ToDo list is sorted writes for
> checkpoints.  The consensus seemed to be that this should be done by
> adding hook(s) into the main code, and then a contrib module to work
> with those hooks.  Is there an existing contrib module that one could
> best look to for inspiration on how to go about doing this?  I have
> the sorted checkpoint working under a guc, but don't know where to
> start on converting it to a contrib module instead.

Hmm, see contrib/auto_explain?

--
Álvaro Herrera
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
[HACKERS] sorted writes for checkpoints
One of the items on the Wiki ToDo list is sorted writes for checkpoints.
The consensus seemed to be that this should be done by adding hook(s)
into the main code, and then a contrib module to work with those hooks.
Is there an existing contrib module that one could best look to for
inspiration on how to go about doing this?  I have the sorted checkpoint
working under a guc, but don't know where to start on converting it to a
contrib module instead.

Cheers,

Jeff
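The idea being discussed, sorting a checkpoint's dirty buffers by file and block number so writes reach each relation file sequentially rather than in random order, can be sketched as follows. This is an illustrative Python model, not the actual patch: the field names on `Buffer` are hypothetical stand-ins for PostgreSQL's C-level buffer tags.

```python
from collections import namedtuple

# Hypothetical stand-in for shared-buffer metadata; the real structures
# (BufferDesc and its buffer tag) live in C and differ in detail.
Buffer = namedtuple("Buffer", ["tablespace", "relfilenode", "fork", "block", "dirty"])

def checkpoint_write_order(buffers):
    """Return the dirty buffers sorted by (file, block), so a checkpoint
    writes each relation file sequentially instead of in buffer-pool order."""
    dirty = [b for b in buffers if b.dirty]
    return sorted(dirty, key=lambda b: (b.tablespace, b.relfilenode, b.fork, b.block))

pool = [
    Buffer(1663, 16384, 0, 42, True),
    Buffer(1663, 16384, 0, 7, True),
    Buffer(1663, 24576, 0, 3, True),
    Buffer(1663, 16384, 0, 9, False),   # clean buffer: skipped at checkpoint
]
print([(b.relfilenode, b.block) for b in checkpoint_write_order(pool)])
# [(16384, 7), (16384, 42), (24576, 3)]
```

The payoff, if any, comes from the kernel and the storage seeing mostly-sequential I/O per file, which is exactly what the thread says still needs benchmarking on modern hardware.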
Re: [HACKERS] max_wal_senders must die
On Thu, 2010-10-28 at 17:12 -0700, Josh Berkus wrote:
> > Sorry, didn't know... I have 122 responses so far, which I think will be
> > surprising (some of them certainly surprised me). I will keep it open
> > until next week and then post the results.
>
> Well, for any community site poll, I hope you realize that there's a LOT
> of sampling error. Here's another one:
>
> http://www.postgresql.org/community/survey.71

Oh sure. I don't expect this to be some kind of authoritative reference
but it is certainly worth at least reviewing. If nothing else it is fun
to see the responses and consider their meaning based on your own views.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [HACKERS] max_wal_senders must die
> Sorry, didn't know... I have 122 responses so far, which I think will be
> surprising (some of them certainly surprised me). I will keep it open
> until next week and then post the results.

Well, for any community site poll, I hope you realize that there's a LOT
of sampling error. Here's another one:

http://www.postgresql.org/community/survey.71

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
[HACKERS] Buildfarm server upgrade
The buildfarm server is moving. The web site has already moved, and the
database will be moving on Tuesday Nov 9th, starting at 9.00 am EST (or,
if you prefer, at 14:00 UTC). The server will be unavailable during the
outage, which is expected to take no longer than 6 hours. (We're moving
from 8.0 to 9.0, so it will be an old-fashioned dump and reload.)

Any buildfarm owners are advised to disable any scheduled jobs during
that period. I will post notices shortly before shutting the server down
and as soon as it is back up. It would probably be a good thing not to
make any commits in the hours leading up to the outage.

cheers

andrew
Re: [HACKERS] max_wal_senders must die
On Thu, 2010-10-28 at 16:25 -0700, Josh Berkus wrote:
> > https://www.postgresqlconference.org/content/replication-poll
> >
> > You don't have to login to take it but of course it helps with validity
> > of results.
>
> Oh, I'd already put something up on http://www.postgresql.org/community

Sorry, didn't know... I have 122 responses so far, which I think will be
surprising (some of them certainly surprised me). I will keep it open
until next week and then post the results.

JD

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [HACKERS] max_wal_senders must die
> https://www.postgresqlconference.org/content/replication-poll
>
> You don't have to login to take it but of course it helps with validity
> of results.

Oh, I'd already put something up on http://www.postgresql.org/community

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Re: [HACKERS] Keywords in pg_hba.conf should be field-specific
On 18 October 2010 01:19, Tom Lane wrote:
> Brendan Jurd writes:
>> On 17 October 2010 09:59, Tom Lane wrote:
>>> Good point.  Maybe the correct fix is to remember whether each token was
>>> quoted or not, so that keyword detection can be done safely after the
>>> initial lexing.  I still think that the current method is impossibly
>>> ugly ...
>
>> I'm happy to revise the patch on that basis.  Any suggestions about
>> how to communicate the 'quotedness' of each token?  We could make each
>> token a struct consisting of the token itself, plus a boolean flag to
>> indicate whether it had been quoted.  Does that work for you?
>
> Seems reasonable.  I had the idea of a parallel list of booleans in the
> back of my mind, but a list of structs is probably easier to understand,
> and to extend further if necessary.

Okay, I've taken the red pill and I'm finding out how deep the rabbit
hole goes ...

The logical structure of pg_hba.conf is a set of lines, each line
containing a set of fields, each field containing a set of tokens.

The way the existing implementation handles this is to create a list of
lines containing sublists of fields, containing comma-separated strings
for the set of tokens, with newlines embedded next to tokens which might
be keywords.  The tokeniser breaks apart the comma-separated tokens ...
and then reassembles them into a comma-separated string.  Which the
db/role matching functions then have to break apart *again*.

In order to keep track of whether each individual token was quoted, I
first need to impose some sanity here.  Rather than using a magical
string for each field, I intend to use a List of HbaToken structs which
explicitly note whether quoting was used.

Introducing an extra List level does mean a bit more work copying and
freeing, and it makes the patch really quite intrusive.  I have to touch
a lot of lines in hba.c, but I think the additional clarity is worth it.

If nobody dissuades me from this approach I hope to post a patch in a
couple of days.

Cheers,

BJ
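The token-plus-quoted-flag idea above can be sketched in a few lines. This is an illustrative Python model, not the hba.c patch: `HbaToken` matches the struct name proposed in the thread, but the tokenizer here is deliberately simplified (real pg_hba.conf quoting has more cases than this).

```python
from dataclasses import dataclass

@dataclass
class HbaToken:
    """One token from a pg_hba.conf field, remembering whether it was
    double-quoted: a quoted "all" is a plain name, not the keyword."""
    string: str
    quoted: bool

def tokenize_field(field):
    """Split a comma-separated pg_hba.conf field into HbaTokens.
    Simplified sketch: no escape handling, no embedded commas."""
    tokens = []
    for raw in field.split(","):
        raw = raw.strip()
        if raw.startswith('"') and raw.endswith('"') and len(raw) >= 2:
            tokens.append(HbaToken(raw[1:-1], True))
        else:
            tokens.append(HbaToken(raw, False))
    return tokens

print(tokenize_field('all,"all"'))
# [HbaToken(string='all', quoted=False), HbaToken(string='all', quoted=True)]
```

The point of carrying the flag through is that keyword detection (`all`, `sameuser`, etc.) can then be done safely after lexing, exactly the behavior Tom asked for upthread.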
Re: [HACKERS] plan time of MASSIVE partitioning ...
Boszormenyi Zoltan writes:
> Tom Lane wrote:
>> That seems like a ridiculously large number of ECs.  What is the
>> test query again?

> The test case is here:
> http://archives.postgresql.org/message-id/4cbd9ddc.4040...@cybertec.at

After poking through that a bit, I think that the real issue is in this
division of labor:

            index_pathkeys = build_index_pathkeys(root, index,
                                                  ForwardScanDirection);
            useful_pathkeys = truncate_useless_pathkeys(root, rel,
                                                        index_pathkeys);

If you trace what is happening here, the index pathkeys that actually
survive the "usefulness" test all refer to exactly ONE equivalence
class, namely the one arising from the query's "order by timestamp2"
clause.  All the other pathkeys that get created are immediately
discarded as being irrelevant to the query.  The reason that we end up
with so many equivalence classes is that there is nothing causing the
variables of the different child tables to be recognized as all
sort-equivalent.  Maybe that's a bug in itself, but I would argue that
the right way to make this faster is to refactor things so that we
don't generate useless equivalence classes in the first place, or at
least don't keep them around in the planner's lists once we realize
they're useless.

I like Heikki's hack to cut down on searching in make_canonical_pathkey,
but I think that complicating the data structure searching beyond that
is just a band-aid.  Reasonably-sized queries shouldn't contain very
many equivalence classes: they should only come from equality clauses
or sort conditions that appeared in the query text.  Therefore, there
also shouldn't be all that many distinct pathkeys.

			regards, tom lane
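The "usefulness" test described above, build all of an index's pathkeys, then immediately discard those that cannot help the query's required ordering, can be modeled roughly like this. This is a deliberately simplified Python sketch of the idea, not the planner's actual C logic (the real `truncate_useless_pathkeys` also considers merge-join orderings, which this ignores).

```python
def truncate_useless_pathkeys(index_pathkeys, query_pathkeys):
    """Keep only the leading index pathkeys that match the query's
    required ordering; everything past the first mismatch cannot help
    this query.  (Simplified: ignores merge-join usefulness.)"""
    useful = []
    for ipk, qpk in zip(index_pathkeys, query_pathkeys):
        if ipk != qpk:
            break
        useful.append(ipk)
    return useful

# In the test case, only the eclass from "order by timestamp2" survives;
# the eclass names here are illustrative labels, not planner identifiers.
print(truncate_useless_pathkeys(["ec_timestamp2", "ec_id"], ["ec_timestamp2"]))
# ['ec_timestamp2']
print(truncate_useless_pathkeys(["ec_id"], ["ec_timestamp2"]))
# []
```

The sketch makes Tom's point visible: the expensive part is not keeping useful pathkeys, it is that hundreds of child-table equivalence classes get built only to be thrown away here.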
Re: [HACKERS] contsel and gist
Ben writes:
> On Oct 28, 2010, at 10:50 AM, Tom Lane wrote:
>> However, having said that: the constant value of the stub contsel
>> function is intended to be small enough to encourage use of an
>> indexscan.  Maybe we just need to decrease it a bit more.  Have you
>> investigated what the cutover point is for your queries?

> i'd be happy to investigate this for you, but my guess is my dataset is
> probably not a good example to use for setting the constant more
> generally.  i'm joining an 8e10 table vs a 150K table, so the selectivity
> fraction would probably have to drop by many orders of magnitude.

I doubt it.

> that being said, i'll poke around and see if i can find the cutoff
> point.  is there an easy way to do this that doesn't involve recompiling
> postgres?

No, those are just hardwired constants.  If you wanted to avoid multiple
recompilations while experimenting, you could set up a custom GUC
variable for the functions to read...

			regards, tom lane
Re: [HACKERS] max_wal_senders must die
On Thu, 2010-10-28 at 07:05 -0500, Kevin Grittner wrote:
> "Joshua D. Drake" wrote:
> > On Wed, 2010-10-27 at 19:52 -0400, Robert Haas wrote:
> >> Josh Berkus wrote:
> >>> *you don't know* how many .org users plan to implement
> >>> replication, whether it's a minority or majority.
> >>
> >> None of us know. What I do know is that I don't want PostgreSQL to
> >> be slower out of the box.
> >
> > Poll TIME!
>
> If you do take a poll, be careful to put in an option or two to deal
> with environments where there is "surgical" implementation of
> replication features.

And Poll it is:

https://www.postgresqlconference.org/content/replication-poll

You don't have to login to take it but of course it helps with validity
of results.

Sincerely,

Joshua D. Drake

--
PostgreSQL.org Major Contributor
Command Prompt, Inc: http://www.commandprompt.com/ - 509.416.6579
Consulting, Training, Support, Custom Development, Engineering
http://twitter.com/cmdpromptinc | http://identi.ca/commandprompt
Re: [HACKERS] psql autocompletion for \z and \dg
On Wed, 2010-10-20 at 22:56 -0400, Josh Kupershmidt wrote:
> It looks like psql's tab completion for the \z and \dg commands in
> psql are missing.  I couldn't see a reason for this, so attached patch
> fixes.
>
> Also, this patch proposes to change psql's "\?" help text to say that
> \dg and \du are the same, since AFAICT they do exactly the same thing.

Committed with slight variations.  I didn't think documenting \du as
"same as \dg" added much value, so I just made them both "list roles".
Re: [HACKERS] contsel and gist
thanks for the prompt reply.

On Oct 28, 2010, at 10:50 AM, Tom Lane wrote:
>> 1 - am i wrong in my assessment?  is the constant contsel, areasel, etc
>> hurting us?
>
> The stub selectivity functions definitely suck.

i'm taking this as implying that my intuition here is basically right.

>> 2 - how hard would it be to implement contsel et al for period data types?
>
> If it were easy, it'd likely have been done already :-(

i am interested in learning more about this, in hopes that it might be
possible for me to do it some day.  do you have any pointers as far as
things to look at to learn from?  i imagine this must be a problem for
the postgis people too.  i guess the first step is to figure out what
kind of statistics / histograms to collect for the period datatype.  (i
don't see anything in pg_stats.)  has there been previous work /
thinking on this?

> However, having said that: the constant value of the stub contsel
> function is intended to be small enough to encourage use of an
> indexscan.  Maybe we just need to decrease it a bit more.  Have you
> investigated what the cutover point is for your queries?

i'd be happy to investigate this for you, but my guess is my dataset is
probably not a good example to use for setting the constant more
generally.  i'm joining an 8e10 table vs a 150K table, so the
selectivity fraction would probably have to drop by many orders of
magnitude.

that being said, i'll poke around and see if i can find the cutoff
point.  is there an easy way to do this that doesn't involve recompiling
postgres?

best regards, b
Re: [HACKERS] plperl arginfo
On 10/28/2010 02:11 PM, Garick Hamlin wrote:
> On Thu, Oct 28, 2010 at 01:03:24PM -0400, Andrew Dunstan wrote:
>> On 10/28/2010 12:34 PM, Tom Lane wrote:
>>> BTW, maybe we could have the best of both worlds?  Dunno about Perl,
>>> but in some languages it would be possible to instantiate the hash
>>> only if it's actually touched.  Passing the data as a hash definitely
>>> seems to fit with the spirit of things otherwise, so as long as it
>>> didn't cost cycles when not needed, I'd be in favor of that API.
>>
>> Maybe, but I think that's getting rather beyond my perlguts-fu.  I
>> think we'd need to do that via PERL_MAGIC_tied, but it's new territory
>> for me.  Anyone else want to chime in?
>
> Warning, I don't know the plperl code, I am just a perl coder.
>
> I do think all the anonymous arrays are worth worrying about in terms
> of performance.  I don't think that tie is necessarily good for
> performance.  tie() is not generally fast.  I think you'd likely be
> better off writing plain accessors or using a function to add type
> info.
>
> Use an accessor for type information, like this?
>
>     $ref->typeof($key)
>
> ... or perhaps use a special function?
>
>     add_type_info(\%args);
>
> ... or if you want attribute based syntax sugar for the
> add_type_info() solution...
>
>     my %args : pg_record(add_type_info);
>
> Again, I don't know the plperl code, so I might be missing something
> here.

This wouldn't be done at the perl level.  It would be done in C code.
Run "man perlguts" and search for "Understanding the Magic of Tied
Hashes and Arrays".  The overhead in setting it up is likely to be very
low unless I'm not understanding correctly.  There might be some price
paid when accessing the object, but that's another affair.

OTOH, a pg_get_arg_info() function would probably be a substantially
simpler if slightly less perlish solution.

cheers

andrew
Re: [HACKERS] plperl arginfo
On Thu, Oct 28, 2010 at 01:03:24PM -0400, Andrew Dunstan wrote:
> On 10/28/2010 12:34 PM, Tom Lane wrote:
>> BTW, maybe we could have the best of both worlds?  Dunno about Perl,
>> but in some languages it would be possible to instantiate the hash
>> only if it's actually touched.  Passing the data as a hash definitely
>> seems to fit with the spirit of things otherwise, so as long as it
>> didn't cost cycles when not needed, I'd be in favor of that API.
>
> Maybe, but I think that's getting rather beyond my perlguts-fu.  I
> think we'd need to do that via PERL_MAGIC_tied, but it's new territory
> for me.  Anyone else want to chime in?

Warning, I don't know the plperl code, I am just a perl coder.

I do think all the anonymous arrays are worth worrying about in terms of
performance.  I don't think that tie is necessarily good for
performance.  tie() is not generally fast.  I think you'd likely be
better off writing plain accessors or using a function to add type info.

Use an accessor for type information, like this?

    $ref->typeof($key)

... or perhaps use a special function?

    add_type_info(\%args);

... or if you want attribute based syntax sugar for the add_type_info()
solution...

    my %args : pg_record(add_type_info);

Again, I don't know the plperl code, so I might be missing something
here.

Garick
Re: [HACKERS] contsel and gist
Ben writes:
> my guess is that it has to do with the selectivity of the @> operator.
> i've looked and noticed that the selectivity functions for @> and other
> period operators are basically stubs, with constant selectivity.  my
> questions are :
> 1 - am i wrong in my assessment?  is the constant contsel, areasel, etc
> hurting us?

The stub selectivity functions definitely suck.

> 2 - how hard would it be to implement contsel et al for period data types?

If it were easy, it'd likely have been done already :-(

However, having said that: the constant value of the stub contsel
function is intended to be small enough to encourage use of an
indexscan.  Maybe we just need to decrease it a bit more.  Have you
investigated what the cutover point is for your queries?

			regards, tom lane
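Why a constant stub selectivity produces a cutover point at all can be seen with a toy cost model. This is a rough illustrative sketch, not the planner's actual cost arithmetic: with a fixed selectivity fraction, the estimated row count (and hence the estimated index-scan cost) scales with table size, so a sufficiently large table can still tip the comparison toward a sequential scan.

```python
def chooses_index_scan(selectivity, table_pages, table_rows,
                       random_page_cost=4.0, seq_page_cost=1.0):
    """Toy model: compare reading the whole table sequentially against
    randomly fetching the estimated matching rows.  The default cost
    constants mirror PostgreSQL's GUC defaults, but the formulas here
    are a simplification of the real cost model."""
    seq_cost = table_pages * seq_page_cost
    index_cost = selectivity * table_rows * random_page_cost
    return index_cost < seq_cost

# With a constant stub selectivity, a big enough table flips the choice:
for sel in (0.01, 0.001, 0.0001):
    print(sel, chooses_index_scan(sel, table_pages=1_000_000,
                                  table_rows=100_000_000))
# 0.01 False
# 0.001 True
# 0.0001 True
```

This is the shape of the experiment Tom suggests: find the selectivity constant below which the index scan wins for a given table, e.g. by exposing the constant as a custom GUC instead of recompiling.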
Re: [HACKERS] Composite Types and Function Parameters
On Oct 28, 2010, at 9:31 AM, Andrew Dunstan wrote:
> Of course it's possible, but it's a different feature. As for "just as
> easy", no, it's much more work. I agree it should be done, though.

I bet we could raise some money to fund its development. How much work
are we talking about here?

Best,

David
Re: [HACKERS] plan time of MASSIVE partitioning ...
Tom Lane wrote:
> Boszormenyi Zoltan writes:
>> This means add_child_rel_equivalences() still takes
>> too much time; the previously posted test case calls this
>> function 482 times, and it's called for almost every 10th entry
>> added to eq_classes.  The elog() I put into this function says
>> that at the last call list_length(eq_classes) == 4754.
>
> That seems like a ridiculously large number of ECs.  What is the
> test query again?
>
>             regards, tom lane

The test case is here:
http://archives.postgresql.org/message-id/4cbd9ddc.4040...@cybertec.at
create_table.sql for the main table plus childtables.sql.gz; the EXPLAIN
query is in the message body.

Basically, it's a model for a lot of data over three months, partitioned
into 4 hour intervals for every day.  Imagine the call list handled by a
phone company in a large country.

Best regards,
Zoltán Böszörményi

--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/
Re: [HACKERS] plperl arginfo
On 10/28/2010 12:34 PM, Tom Lane wrote:
> BTW, maybe we could have the best of both worlds?  Dunno about Perl,
> but in some languages it would be possible to instantiate the hash
> only if it's actually touched.  Passing the data as a hash definitely
> seems to fit with the spirit of things otherwise, so as long as it
> didn't cost cycles when not needed, I'd be in favor of that API.

Maybe, but I think that's getting rather beyond my perlguts-fu.  I think
we'd need to do that via PERL_MAGIC_tied, but it's new territory for me.
Anyone else want to chime in?

cheers

andrew
[HACKERS] contsel and gist
hello -- i have a largish table (~8 billion rows) which makes use of the temporal period datatype and gist indexes. i find that query plans are somewhat "unstable" in that it is often the case that slightly altering a query can result in a change from using the index (usually via a bitmap scan) to a sequential scan. there is basically no situation where a sequential scan is the right plan for the kinds of queries we are doing -- the table is so large that any sequential scan would take hours. (i will give more details about our setup below.) my guess is that it has to do with the selectivity of the @> operator. i've looked and noticed that the selectivity functions for @> and other period operators are basically stubs, with constant selectivity. my questions are : 1 - am i wrong in my assessment? is the constant contsel, areasel, etc hurting us? 2 - how hard would it be to implement contsel et al for period data types? i've read the gist papers but find the eqsel code a bit hard to understand. (would anyone be willing to help?) more details : pg 9, linux x64_64 box with 24gb ram and software raid-5. (not ideal, i understand.) the table is create table timeseries ( id integer not null, value float not null, timespan period not null ); create index timeseries_id on timeseries (id); create index timeseries_timespan on timeseries using gist (timespan); in our dataset there are about 2000 different time series, each given a different id. the time series are piecewise constant functions we are representing as (id, value, time interval) triples. the intervals are typically no more than a few seconds, at most a few minutes. for each id, the intervals are non-overlapping and cover the total time period. there are about 8 billion rows of historical data, and there are about 3 million new rows a day, relatively evenly spread across the different ids. 
the database gets updated once a day with the new rows, and the rows are loaded in time order; the historical data is basically ordered in time. other than that single daily update, the workload is basically read-only. the most important query for us is to sample the time series at (potentially irregular) grid points (i will give some example queries below.) there are some small side tables (less than 150K rows) for things like different grid points to sample at, or auxiliary data which augment the time series. create table grids ( gridid integer not null, gridpt timestamp with timezone, primary key (gridid, gridpt) ); create table adjs1 ( id integer not null, timespan period not null, adj double precision not null); create index adjs1_timespan on adjs1 using gist (timespan); create index adjs1_id on adjs1 (id); an example query that works : postgres=> explain analyze select * from timeseries join grids on timespan @> gridpt where gridid = 2 and gridpt between '2006-10-12' and '2006-10-15'; QUERY PLAN -- Nested Loop (cost=18082.74..33760441.77 rows=6525038912 width=48) (actual time=204.655..2498.152 rows=34275 loops=1) -> Index Scan using grids_pkey on grids (cost=0.00..76.64 rows=23 width=12) (actual time=0.059..0.559 rows=38 loops=1) Index Cond: ((gridid = 2) AND (gridpt >= '2006-10-12 00:00:00'::timestamp without time zone) AND (gridpt <= '2006-10-15 00:00:00'::timestamp without time zone)) -> Bitmap Heap Scan on timeseries (cost=18082.74..1460749.52 rows=567395 width=36) (actual time=32.561..62.545 rows=902 loops=38) Recheck Cond: (timeseries.timespan @> grids.gridpt) -> Bitmap Index Scan on timeseries_idx_timespan (cost=0.00..17940.89 rows=567395 width=0) (actual time=32.184..32.184 rows=902 loops=38) Index Cond: (timeseries.timespan @> grids.gridpt) Total runtime: 2553.386 ms (8 rows) Time: 2555.498 ms where there are about 10-100 gridpts between the times selected. this query plan looks good to me, and indeed it runs fairly fast. 
an example query that goes haywire :

postgres=> explain select * from timeseries as TS join grids on TS.timespan @> gridpt join adjs1 as AD1 on TS.id=AD1.id and AD1.timespan @> gridpt where gridid=2 and gridpt between '2006-10-19' and '2006-10-22';
 QUERY PLAN
--
 Hash Join  (cost=7166.37..2517194919.79 rows=83476436469 width=76)
   Hash Cond: (ts.id = ad1.id)
   Join Filter: (ts.timespan @> grids.gridpt)
   ->  Seq Scan on timeseries ts  (cost=0.00..10402235.88 rows=567394688 width=36)
   ->  Hash  (cost=
Re: [HACKERS] plperl arginfo
On Thu, Oct 28, 2010 at 11:34 AM, Tom Lane wrote: > Andrew Dunstan writes: >> On 10/28/2010 11:54 AM, Pavel Stehule wrote: >>> Alternatively, maybe the feature could be exposed in a way where you >>> don't actually calculate the values unless requested, ie provide some >>> sort of inquiry function instead of always precomputing a hash. >>> +1 .. some like get_function_info() > >> Yeah, that looks doable. > > BTW, maybe we could have the best of both worlds? Dunno about Perl, > but in some languages it would be possible to instantiate the hash > only if it's actually touched. Passing the data as a hash definitely > seems to fit with the spirit of things otherwise, so as long as it > didn't cost cycles when not needed, I'd be in favor of that API. Perl has the tie interface (perldoc perltie) which lets you tie a hash to an object instance, which implements subs to handle the various hash operations. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
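Tom's "instantiate only if touched" idea maps naturally onto Perl's tie mechanism mentioned here. A rough Python analogue of the pattern (the class and key names are hypothetical, chosen to mirror the $_ARG_INFO discussion; this sketches the lazy-hash idea only, not any actual plperl API):

```python
class LazyArgInfo(dict):
    """Dict whose entries are computed on first access, akin to a tied hash."""
    def __init__(self, loaders):
        super().__init__()
        self._loaders = loaders   # key -> zero-argument callable
        self.computed = 0         # how many entries were actually built

    def __missing__(self, key):
        value = self._loaders[key]()   # compute only when first touched
        self[key] = value              # cache for subsequent accesses
        self.computed += 1
        return value

info = LazyArgInfo({
    "names": lambda: ["arg1", None, "arg3"],
    "types": lambda: ["integer", "text", "date"],
})
assert info.computed == 0             # nothing is paid for up front
assert info["names"][0] == "arg1"     # first access computes the entry
assert info.computed == 1             # and only that entry
```

A function that never looks at its argument metadata pays essentially nothing, which is the property Tom asked for.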
Re: [HACKERS] revision of todo: NULL for ROW variables
On Thu, Oct 28, 2010 at 10:15 AM, Tom Lane wrote: > Pavel Stehule writes: >> I am checking PLpgSQL ToDo topics, and I am not sure if this topic >> isn't done. And if not, then I would to get some detail. > > I think that thread petered out because we didn't have consensus on > what the behavior ought to be. It goes back to whether there is > supposed to be a difference between NULL and ROW(NULL,NULL,NULL,...) I think somewhere along the line it was noticed that SQL says you are supposed to treat (null, null) as null, and the behavior of the 'is null' operator was changed to reflect this while other null-influenced behaviors were left intact (for example, coalesce()). My take on this is that we are stuck with the status quo. If a change must be done, the 'is null' change should be reverted to the non-standard behavior. The SQL standard position on this issue is, IMNSHO, on mars. merlin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plperl arginfo
2010/10/28 Andrew Dunstan : > > > On 10/28/2010 11:54 AM, Pavel Stehule wrote: >> >> Alternatively, maybe the feature could be exposed in a way where you >> don't actually calculate the values unless requested, ie provide some >> sort of inquiry function instead of always precomputing a hash. >> +1 .. some like get_function_info() >> > > Yeah, that looks doable. > > I think we can just commit the generic record support now and add this on > later. this could be a very interesting feature, because it would let us do some things in plperl instead of C. regards Pavel > > cheers > > andrew > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plperl arginfo
Andrew Dunstan writes: > On 10/28/2010 11:54 AM, Pavel Stehule wrote: >> Alternatively, maybe the feature could be exposed in a way where you >> don't actually calculate the values unless requested, ie provide some >> sort of inquiry function instead of always precomputing a hash. >> +1 .. some like get_function_info() > Yeah, that looks doable. BTW, maybe we could have the best of both worlds? Dunno about Perl, but in some languages it would be possible to instantiate the hash only if it's actually touched. Passing the data as a hash definitely seems to fit with the spirit of things otherwise, so as long as it didn't cost cycles when not needed, I'd be in favor of that API. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Composite Types and Function Parameters
On Oct 27, 2010, at 9:08 PM, Andrew Dunstan wrote: > Well, it turns out that the hashref required exactly one more line to > achieve. We already have all the infrastructure in the composite handling > code, and all it requires is to enable it for the RECORDOID case. I don't suppose that it would be just as easy to allow an array passed to PL/Perl to be read into the PL/Perl function as an array reference, would it? Since it would break backcompat, it would need to be enabled by a plperl.* directive, but man, I would kill for that. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Composite Types and Function Parameters
On 10/28/2010 12:23 PM, David E. Wheeler wrote: On Oct 27, 2010, at 9:08 PM, Andrew Dunstan wrote: Well, it turns out that the hashref required exactly one more line to achieve. We already have all the infrastructure in the composite handling code, and all it requires is to enable it for the RECORDOID case. I don't suppose that it would be just as easy to allow an array passed to PL/Perl to be read into the PL/Perl function as an array reference, would it? Since it would break backcompat, it would need to be enabled by a plperl.* directive, but man, I would kill for that. Of course it's possible, but it's a different feature. As for "just as easy", no, it's much more work. I agree it should be done, though. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plperl arginfo
On 10/28/2010 11:54 AM, Pavel Stehule wrote: Alternatively, maybe the feature could be exposed in a way where you don't actually calculate the values unless requested, ie provide some sort of inquiry function instead of always precomputing a hash. +1 .. some like get_function_info() Yeah, that looks doable. I think we can just commit the generic record support now and add this on later. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
Alvaro Herrera writes: > BTW I note that there are no elog(ERROR) calls in that code path at all, > because syscall errors are ignored, so PANIC is not a concern (as the > code stands currently, at least). ISTM it would be good to have a > comment on SetLatch stating that it's used inside critical sections, > though. Yeah, I was thinking the same while reading the code yesterday. It already notes that it's used in interrupt handlers, but the critical section angle is an additional reason not to elog(ERROR). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plperl arginfo
2010/10/28 Tom Lane : > Andrew Dunstan writes: >> While we were discussing allowing generic record type arguments to >> plperl functions, Tom suggested that we should expose the type >> information about the record members to plperl. I think if we do that we >> should probably expand it somewhat to all arguments, so that for >> non-trigger functions, we'd have $_ARG_INFO while could look something >> like this: > >> { >> names => ['arg1', undef, 'arg3' ] , # no name was supplied for arg2 >> modes => ['in', 'in', 'in' ], # do we support anything other >> than IN ? >> types => ['integer', 'text', { name => 'somecomposite', fields >> => [ 'field1', 'field2' ], types => ['date', 'numeric' ] } ], >> } > > Hmm, I'm a bit worried about the performance implications of adding this > information. It seems like the value in typical cases would be minimal: > when you are writing the body of "myfunction(foo int, bar text)", it's > not like you don't know perfectly well the names and argument types of > the parameters. > > I can see the value of providing type info for polymorphic arguments, > but not sure about expending extra cycles to do it for all. > > Alternatively, maybe the feature could be exposed in a way where you > don't actually calculate the values unless requested, ie provide some > sort of inquiry function instead of always precomputing a hash. +1 .. some like get_function_info() Regards Pavel Stehule > > regards, tom lane > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
Excerpts from Tom Lane's message of mié oct 27 19:01:38 -0300 2010: > I don't know what Simon is thinking, but I think he's nuts. There is > obvious extra overhead in COMMIT: > > /* > * Wake up all walsenders to send WAL up to the COMMIT record > * immediately if replication is enabled > */ > if (max_wal_senders > 0) > WalSndWakeup(); > > which AFAICT is injecting multiple kernel calls into what's not only > a hot-spot but a critical section (ie, any error -> PANIC). Hmm, I wonder if that could be moved out of the critical section somehow. Obviously the point here is to allow wal senders to react before we write to clog (which is expensive by itself); but it seems hard to wake up some other process without incurring exactly the same cost (which is basically SetLatch) ... the only difference is that it would be a single one instead of one per walsender. BTW I note that there are no elog(ERROR) calls in that code path at all, because syscall errors are ignored, so PANIC is not a concern (as the code stands currently, at least). ISTM it would be good to have a comment on SetLatch stating that it's used inside critical sections, though. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plperl arginfo
2010/10/28 Andrew Dunstan : > While we were discussing allowing generic record type arguments to plperl > functions, Tom suggested that we should expose the type information about > the record members to plperl. I think if we do that we should probably > expand it somewhat to all arguments, so that for non-trigger functions, we'd > have $_ARG_INFO while could look something like this: > > { > names => ['arg1', undef, 'arg3' ] , # no name was supplied for arg2 > modes => ['in', 'in', 'in' ], # do we support anything other than IN ? variadic Pavel > types => ['integer', 'text', { name => 'somecomposite', fields => [ > 'field1', 'field2' ], types => ['date', 'numeric' ] } ], > } > > Maybe we should also pass in type Oid info, too. > > I don't think this would be terribly difficult to do. > > thoughts? > > cheers > > andrew > > > > > -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plperl arginfo
Andrew Dunstan writes: > While we were discussing allowing generic record type arguments to > plperl functions, Tom suggested that we should expose the type > information about the record members to plperl. I think if we do that we > should probably expand it somewhat to all arguments, so that for > non-trigger functions, we'd have $_ARG_INFO while could look something > like this: > { > names => ['arg1', undef, 'arg3' ] , # no name was supplied for arg2 > modes => ['in', 'in', 'in' ], # do we support anything other > than IN ? > types => ['integer', 'text', { name => 'somecomposite', fields > => [ 'field1', 'field2' ], types => ['date', 'numeric' ] } ], > } Hmm, I'm a bit worried about the performance implications of adding this information. It seems like the value in typical cases would be minimal: when you are writing the body of "myfunction(foo int, bar text)", it's not like you don't know perfectly well the names and argument types of the parameters. I can see the value of providing type info for polymorphic arguments, but not sure about expending extra cycles to do it for all. Alternatively, maybe the feature could be exposed in a way where you don't actually calculate the values unless requested, ie provide some sort of inquiry function instead of always precomputing a hash. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] plperl arginfo
While we were discussing allowing generic record type arguments to plperl functions, Tom suggested that we should expose the type information about the record members to plperl. I think if we do that we should probably expand it somewhat to all arguments, so that for non-trigger functions, we'd have $_ARG_INFO which could look something like this: { names => ['arg1', undef, 'arg3' ] , # no name was supplied for arg2 modes => ['in', 'in', 'in' ], # do we support anything other than IN ? types => ['integer', 'text', { name => 'somecomposite', fields => [ 'field1', 'field2' ], types => ['date', 'numeric' ] } ], } Maybe we should pass in type Oid info, too. I don't think this would be terribly difficult to do. thoughts? cheers andrew
Re: [HACKERS] add label to enum syntax
Excerpts from Pavel Golub's message of jue oct 28 07:50:24 -0300 2010: > Forgot link to poll: > http://pgolub.wordpress.com/2010/10/28/poll-alter-type-enumtype-add-what-newlabel/ Hah, there are 17 votes as of right now, no option is below 23% and no option is above 29%. -- Álvaro Herrera The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] 9.1alpha2 release notes and plan
I have committed an initial version of release notes for 9.1alpha2. Please look it over. The release can then be prepared over the weekend and announced on Monday. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Simplifying replication
On Thu, Oct 28, 2010 at 1:13 AM, Josh Berkus wrote: > >> I sort of agree with you that the current checkpoint_segments >> parameter is a bit hard to tune, at least if your goal is to control >> the amount of disk space that will be used by WAL files. But I'm not >> sure your proposal is better. Instead of having a complicated formula >> for predicting how much disk space would get used by a given value for >> checkpoint_segments, we'd have a complicated formula for the amount of >> WAL that would force a checkpoint based on max_wal_size. > > Yes, but the complicated formula would then be *in our code* instead of > being inflicted on the user, as it now is. I don't think so - I think it will just be inflicted on the user in a different way. We'd still have to document what the formula is, because people will want to understand how often a checkpoint is going to get forced. So here's an example of how this could happen. Someone sets max_wal_size = 480MB. Then, they hear about the checkpoint_completion_target parameter, and say, ooh, goody, let me boost that. So they raise it from 0.5 to 0.9. Now, all of a sudden, they're getting more frequent checkpoints. Performance may get worse rather than better. To figure out what value for max_wal_size forces a checkpoint after the same amount of WAL that forced a checkpoint before, they need to work backwards from max_wal_size to checkpoint_segments, and then work forward again to figure out the new value for the max_wal_size parameter. Here's the math. max_wal_size = 480MB = 30 segments. With checkpoint_completion_target = 0.5, that means that checkpoint_segments is (30 - 1) / (2 + 0.5) = 11 (rounded down). With checkpoint_completion_target = 0.9, that means they'll need to set max_wal_size to (2 + 0.9) * 11 + 1 = 33 (rounded up) * 16MB = 528MB. Whew! My theory is that most tuning of checkpoint_segments is based on a worry about recovery time or performance, not disk consumption. 
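Robert's arithmetic can be checked mechanically. A small Python sketch of the back-and-forth conversion he walks through, using his formula max_wal_size = ((2 + checkpoint_completion_target) * checkpoint_segments + 1) segments, with 16MB segments:

```python
import math

SEG_MB = 16  # size of one WAL segment

def ckpt_segments(max_wal_size_mb, target):
    # invert: max_wal_size = ((2 + target) * checkpoint_segments + 1) segments,
    # rounding down as in the message
    return math.floor((max_wal_size_mb / SEG_MB - 1) / (2 + target))

def wal_size_mb(segments, target):
    # forward direction, rounding up as in the message
    return math.ceil((2 + target) * segments + 1) * SEG_MB

# the worked example: 480MB = 30 segments at target 0.5 implies 11 segments...
assert ckpt_segments(480, 0.5) == 11
# ...and keeping the same checkpoint spacing at target 0.9 needs 528MB
assert wal_size_mb(11, 0.9) == 528
```

This is exactly the "complicated formula inflicted on the user in a different way" point: the user who raises checkpoint_completion_target without recomputing max_wal_size silently gets more frequent checkpoints.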
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Re: [PERFORM] Re: Postgres insert performance and storage requirement compared to Oracle
[moving to -hackers, from -performance] On Wed, Oct 27, 2010 at 11:32 PM, Tom Lane wrote: > Robert Haas writes: >> Gee, I wonder if it would be possible for PG to automatically do an >> asynchronous commit of any transaction which touches only temp tables. > > Hmm ... do we need a commit at all in such a case? If our XID has only > gone into temp tables, I think we need to write to clog, but we don't > really need a WAL entry, synced or otherwise. I think we might need a commit record anyway to keep Hot Standby's KnownAssignedXids tracking from getting confused. It might be possible to suppress it when wal_level is less than hot_standby, but I'm not sure it's worth it. You definitely need to write to CLOG, because otherwise a subsequent transaction from within the same backend might fail to see those tuples. Also, I think that the right test is probably "Have we done anything that needs to be WAL-logged?". We can get that conveniently by checking whether XactLastRecEnd.xrecoff is nonzero. One option looks to be to just change this: if (XactSyncCommit || forceSyncCommit || nrels > 0) ...to say ((XactSyncCommit && XactLastRecEnd.xrecoff != 0) || forceSyncCommit || nrels > 0). But I'm wondering if we can instead rejigger things so that this test moves out of the !markXidCommitted branch of the if statement and drops down below the whole if statement. /* * If we didn't create XLOG entries, we're done here; otherwise we * should flush those entries the same as a commit record. (An * example of a possible record that wouldn't cause an XID to be * assigned is a sequence advance record due to nextval() --- we want * to flush that to disk before reporting commit.) */ if (XactLastRecEnd.xrecoff == 0) goto cleanup; -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
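The proposed test reduces to a small boolean function. A sketch in Python, with the C globals passed as arguments, of the rule "sync the commit only if this transaction actually wrote WAL":

```python
def needs_sync_commit(xact_sync_commit, wrote_wal, force_sync_commit, nrels):
    # mirrors the suggested condition:
    #   (XactSyncCommit && XactLastRecEnd.xrecoff != 0)
    #       || forceSyncCommit || nrels > 0
    return (xact_sync_commit and wrote_wal) or force_sync_commit or nrels > 0

# a transaction touching only temp tables writes no WAL: async commit suffices
assert not needs_sync_commit(True, False, False, 0)
# a normal WAL-writing transaction still gets a synchronous commit
assert needs_sync_commit(True, True, False, 0)
# deleting relation files forces a sync regardless of the other flags
assert needs_sync_commit(False, False, False, 1)
```

The subtlety the message goes on to discuss is where this check lives: placing it below the whole if statement would let the same "did we write WAL?" test also govern the non-XID-assigning cases like nextval().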
Re: [HACKERS] revision of todo: NULL for ROW variables
Pavel Stehule writes: > I am checking PLpgSQL ToDo topics, and I am not sure if this topic > isn't done. And if not, then I would to get some detail. I think that thread petered out because we didn't have consensus on what the behavior ought to be. It goes back to whether there is supposed to be a difference between NULL and ROW(NULL,NULL,NULL,...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plan time of MASSIVE partitioning ...
Heikki Linnakangas writes: > Actually, I wonder if we could just have a separate canon_pathkeys list > for each EquivalenceClass, instead of one big list in PlannerInfo. I'm > not too familiar with equivalence classes and all that, Hm. I don't like getting rid of the main canon_pathkeys list like that. The whole point of a canonical pathkey is that there is only one, so it seems like we need a central list. But it might be sane for each EC to have an additional, side list of PKs made from it. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plan time of MASSIVE partitioning ...
Boszormenyi Zoltan writes: > This means add_child_rel_equivalences() still takes > too much time, the previously posted test case calls this > function 482 times, it's called for almost every 10th entry > added to eq_classes. The elog() I put into this function says > that at the last call list_length(eq_classes) == 4754. That seems like a ridiculously large number of ECs. What is the test query again? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] revision of todo: NULL for ROW variables
Hello I am checking PLpgSQL ToDo topics, and I am not sure if this topic isn't done. And if not, then I would to get some detail. Now there is possible to test row's variable on NULL, now it is possible to assign NULL to row variable. What we can do more? a) There is small difference between returned value when we use a empty row or empty record variable CREATE OR REPLACE FUNCTION f2(int) RETURNS t2 AS $$ DECLARE rv t2; re record; BEGIN CASE $1 WHEN 0 THEN RETURN rv; WHEN 1 THEN RETURN re; ELSE RETURN null; END CASE; END; $$ LANGUAGE plpgsql; postgres=# SELECT f2(0); f2 () (1 row) Time: 0.759 ms postgres=# SELECT f2(1); f2 [null] (1 row) Time: 0.570 ms postgres=# SELECT f2(2); f2 [null] (1 row) () is equal to NULL for test IS NULL, but it isn't same - see: Time: 0.586 ms postgres=# SELECT f2(0) is null; ?column? ── t (1 row) Time: 0.548 ms postgres=# SELECT f2(1) is null; ?column? ── t (1 row) Time: 0.535 ms postgres=# SELECT f2(2) is null; ?column? ── t (1 row) postgres=# SELECT 'Hello' || f2(0); ?column? ── Hello() (1 row) Time: 51.546 ms postgres=# SELECT 'Hello' || f2(1); ?column? ── [null] (1 row) so this is one known issue. Actually rowvar := NULL <-> reset all fields inside row. I think so this is perfect from perspective "IS [NOT] NULL" operator. But maybe it isn't practical. So we can distinct between assign some field to NULL and between assign row variable to NULL. This flag can be used just only for returning value. Some like DECLARE r rowtype; BEGIN IF a = 1 THEN RETURN r; -- result is NULL ELSIF a = 2 THEN r.x := NULL; RETURN r; -- result is () ELSIF a = 3 THEN r.x := NULL; r := NULL; RETURN r; -- result is NULL; comments? Is this change some what we would? next question? I found one paradox. When some IS NULL, then any operation with this value should be NULL. But it isn't true for composite values! postgres=# CREATE TYPE t AS (a int, b int); CREATE TYPE Time: 66.605 ms postgres=# SELECT 'Hello' || (NULL, NULL)::t; ?column? 
── Hello(,) (1 row) postgres=# SELECT (NULL, NULL)::t is null; ?column? ── t (1 row) does know somebody if this behave is good per ANSI SQL? Regards Pavel Stehule -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] max_wal_senders must die
"Joshua D. Drake" wrote: > On Wed, 2010-10-27 at 19:52 -0400, Robert Haas wrote: >> Josh Berkus wrote: > >>> *you don't know* how many .org users plan to implement >>> replication, whether it's a minority or majority. >> >> None of us know. What I do know is that I don't want PostgreSQL to >> be slower out of the box. > > Poll TIME! If you do take a poll, be careful to put in an option or two to deal with environments where there is "surgical" implementation of replication features. We'll almost certainly be using SR with a custom WAL receiver as part of our solution for our biggest and most distributed data set (circuit court data), but an "out of the box" drop in usage there is not in the cards anytime soon; whereas we're already using HS/SR "out of the box" for a small RoR web app's data. By the way, the other three DBAs here implemented the HS/SR while I was out for a couple days while my dad was in the hospital (so they didn't want to even bother me with a phone call). They went straight from the docs, without the benefit of having tracked any PostgreSQL lists. They told me that it was working great once they figured it out, but it was confusing; it took them a lot of time and a few false starts to get it working. I've been trying to get details to support an improvement in documentation, but if those guys had problems I agree we need to do *something* to make this simpler -- they're bright professionals who manage hundreds of PostgreSQL databases on a full time basis. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plan time of MASSIVE partitioning ...
On 28.10.2010 13:54, Boszormenyi Zoltan wrote: A little better version, no need for the heavy hash_any, hash_uint32 on the lower 32 bits on pk_eclass is enough. The profiling runtime is now 0.42 seconds vs the previous 0.41 seconds for the tree version. Actually, I wonder if we could just have a separate canon_pathkeys list for each EquivalenceClass, instead of one big list in PlannerInfo. I'm not too familiar with equivalence classes and all that, but the attached patch at least passes the regressions. I haven't done any performance testing, but I would expect this to be even faster than the hashtable or tree implementations, and a lot simpler. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com diff --git a/src/backend/nodes/outfuncs.c b/src/backend/nodes/outfuncs.c index ee2aeb0..7a12c47 100644 --- a/src/backend/nodes/outfuncs.c +++ b/src/backend/nodes/outfuncs.c @@ -1595,7 +1595,6 @@ _outPlannerInfo(StringInfo str, PlannerInfo *node) WRITE_NODE_FIELD(init_plans); WRITE_NODE_FIELD(cte_plan_ids); WRITE_NODE_FIELD(eq_classes); - WRITE_NODE_FIELD(canon_pathkeys); WRITE_NODE_FIELD(left_join_clauses); WRITE_NODE_FIELD(right_join_clauses); WRITE_NODE_FIELD(full_join_clauses); @@ -1692,6 +1691,7 @@ _outEquivalenceClass(StringInfo str, EquivalenceClass *node) WRITE_BOOL_FIELD(ec_below_outer_join); WRITE_BOOL_FIELD(ec_broken); WRITE_UINT_FIELD(ec_sortref); + WRITE_NODE_FIELD(ec_canon_pathkeys); } static void diff --git a/src/backend/optimizer/path/pathkeys.c b/src/backend/optimizer/path/pathkeys.c index 643d57a..d5e5c42 100644 --- a/src/backend/optimizer/path/pathkeys.c +++ b/src/backend/optimizer/path/pathkeys.c @@ -93,9 +93,10 @@ make_canonical_pathkey(PlannerInfo *root, while (eclass->ec_merged) eclass = eclass->ec_merged; - foreach(lc, root->canon_pathkeys) + foreach(lc, eclass->ec_canon_pathkeys) { pk = (PathKey *) lfirst(lc); + Assert(eclass == pk->pk_eclass); if (eclass == pk->pk_eclass && opfamily == pk->pk_opfamily && strategy == pk->pk_strategy && @@ 
-110,7 +111,7 @@ make_canonical_pathkey(PlannerInfo *root, oldcontext = MemoryContextSwitchTo(root->planner_cxt); pk = makePathKey(eclass, opfamily, strategy, nulls_first); - root->canon_pathkeys = lappend(root->canon_pathkeys, pk); + eclass->ec_canon_pathkeys = lappend(eclass->ec_canon_pathkeys, pk); MemoryContextSwitchTo(oldcontext); diff --git a/src/backend/optimizer/plan/planmain.c b/src/backend/optimizer/plan/planmain.c index fd4c6f5..5f8f817 100644 --- a/src/backend/optimizer/plan/planmain.c +++ b/src/backend/optimizer/plan/planmain.c @@ -117,7 +117,10 @@ query_planner(PlannerInfo *root, List *tlist, * We still are required to canonicalize any pathkeys, in case it's * something like "SELECT 2+2 ORDER BY 1". */ +/* XXX: Same as below */ +#if 0 root->canon_pathkeys = NIL; +#endif root->query_pathkeys = canonicalize_pathkeys(root, root->query_pathkeys); root->group_pathkeys = canonicalize_pathkeys(root, @@ -145,7 +148,13 @@ query_planner(PlannerInfo *root, List *tlist, root->join_rel_hash = NULL; root->join_rel_level = NULL; root->join_cur_level = 0; - root->canon_pathkeys = NIL; +/* XXX + * Do we need to reset something here? This is just initializing otherwise + * uninitialized fields, right? + */ +#if 0 + root->canon_pathkeys = NIL; +#endif root->left_join_clauses = NIL; root->right_join_clauses = NIL; root->full_join_clauses = NIL; diff --git a/src/include/nodes/relation.h b/src/include/nodes/relation.h index 6e3d0f3..c959708 100644 --- a/src/include/nodes/relation.h +++ b/src/include/nodes/relation.h @@ -160,8 +160,6 @@ typedef struct PlannerInfo List *eq_classes; /* list of active EquivalenceClasses */ - List *canon_pathkeys; /* list of "canonical" PathKeys */ - List *left_join_clauses; /* list of RestrictInfos for * mergejoinable outer join clauses * w/nonnullable var on left */ @@ -527,6 +525,7 @@ typedef struct EquivalenceClass bool ec_below_outer_join; /* equivalence applies below an OJ */ bool ec_broken; /* failed to generate needed clauses? 
*/ Index ec_sortref; /* originating sortclause label, or 0 */ + List *ec_canon_pathkeys; struct EquivalenceClass *ec_merged; /* set if merged into another EC */ } EquivalenceClass; -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
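The core of Heikki's change, in miniature: canonical-pathkey lookup scans only the owning EquivalenceClass's own side list instead of one global PlannerInfo list. A toy Python model of make_canonical_pathkey under that scheme (structure only; pathkeys are stand-in tuples, not the real planner node types):

```python
class EquivalenceClass:
    def __init__(self):
        self.ec_canon_pathkeys = []  # per-EC side list, as in the patch

def make_canonical_pathkey(ec, opfamily, strategy, nulls_first):
    key = (opfamily, strategy, nulls_first)
    for pk in ec.ec_canon_pathkeys:  # scans only this EC's few pathkeys
        if pk == key:
            return pk                # canonical: reuse the existing object
    ec.ec_canon_pathkeys.append(key)
    return key

ec = EquivalenceClass()
a = make_canonical_pathkey(ec, "btree_ops", 1, False)
b = make_canonical_pathkey(ec, "btree_ops", 1, False)
assert a is b                        # one canonical pathkey per combination
assert len(ec.ec_canon_pathkeys) == 1
```

With thousands of ECs, each lookup in the single global list had to wade through pathkeys belonging to unrelated classes; partitioning the list by EC makes each scan short without needing a hash table or tree at all, which is why this could plausibly beat the dynahash version.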
Re: [HACKERS] plan time of MASSIVE partitioning ...
Boszormenyi Zoltan írta: > Boszormenyi Zoltan írta: > >> Boszormenyi Zoltan írta: >> >> >>> Heikki Linnakangas írta: >>> >>> >>> On 26.10.2010 18:34, Boszormenyi Zoltan wrote: > thank you very much for pointing me to dynahash, here is the > next version that finally seems to work. > > Two patches are attached, the first is the absolute minimum for > making it work, this still has the Tree type for canon_pathkeys > and eq_classes got the same treatment as join_rel_list/join_rel_hash > has in the current sources: if the list grows larger than 32, a hash > table > is created. It seems to be be enough for doing in for > get_eclass_for_sort_expr() > only, the other users of eq_classes aren't bothered by this change. > > > That's better, but can't you use dynahash for canon_pathkeys as well? >>> Here's a purely dynahash solution. It's somewhat slower than >>> the tree version, 0.45 vs 0.41 seconds in the cached case for the >>> previously posted test case. >>> >>> >>> >> And now in context diff, sorry for my affection towards unified diffs. :-) >> >> > > A little better version, no need for the heavy hash_any, hash_uint32 > on the lower 32 bits on pk_eclass is enough. The profiling runtime > is now 0.42 seconds vs the previous 0.41 seconds for the tree version. > > Best regards, > Zoltán Böszörményi > Btw, the top entries in the current gprof output are: Each sample counts as 0.01 seconds. % cumulative self self total time seconds secondscalls ms/call ms/call name 19.05 0.08 0.08 482 0.17 0.29 add_child_rel_equivalences 11.90 0.13 0.05 1133447 0.00 0.00 bms_is_subset 9.52 0.17 0.04 331162 0.00 0.00 hash_search_with_hash_value 7.14 0.20 0.03 548971 0.00 0.00 AllocSetAlloc 4.76 0.22 0.02 2858 0.01 0.01 get_tabstat_entry 4.76 0.24 0.02 1136 0.02 0.02 tzload This means add_child_rel_equivalences() is still takes too much time, the previously posted test case calls this function 482 times, it's called for almost every 10th entry added to eq_classes. 
The elog() I put into this function says that at the last call list_length(eq_classes) == 4754. Best regards, Zoltán Böszörményi -- -- Zoltán Böszörményi Cybertec Schönig & Schönig GmbH Gröhrmühlgasse 26 A-2700 Wiener Neustadt, Austria Web: http://www.postgresql-support.de http://www.postgresql.at/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add label to enum syntax
Hello, Andrew.

You wrote:

AD> It occurred to me in the dead of the night that instead of:
AD>    ALTER TYPE enumtype ADD 'newlabel'
AD> it might be better to have:
AD>    ALTER TYPE enumtype ADD LABEL 'newlabel'
AD> That way if we later wanted to support some other sort of ADD operation
AD> on types we would be able to more easily. LABEL is already a keyword, so
AD> it should be pretty minimally invasive to make this change, and if we
AD> want to do it now is the time.
AD> Thoughts?
AD> cheers
AD> andrew

Forgot link to poll:
http://pgolub.wordpress.com/2010/10/28/poll-alter-type-enumtype-add-what-newlabel/

-- 
With best wishes,
Pavel                          mailto:pa...@gf.microolap.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] add label to enum syntax
Hello, Andrew.

You wrote:

AD> It occurred to me in the dead of the night that instead of:
AD>    ALTER TYPE enumtype ADD 'newlabel'
AD> it might be better to have:
AD>    ALTER TYPE enumtype ADD LABEL 'newlabel'
AD> That way if we later wanted to support some other sort of ADD operation
AD> on types we would be able to more easily. LABEL is already a keyword, so
AD> it should be pretty minimally invasive to make this change, and if we
AD> want to do it now is the time.
AD> Thoughts?
AD> cheers
AD> andrew

I'm with you. BTW, I wrote a post with a poll about this.

-- 
With best wishes,
Pavel                          mailto:pa...@gf.microolap.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] plan time of MASSIVE partitioning ...
Boszormenyi Zoltan írta:
> Boszormenyi Zoltan írta:
>> Heikki Linnakangas írta:
>>> On 26.10.2010 18:34, Boszormenyi Zoltan wrote:
>>>> thank you very much for pointing me to dynahash, here is the
>>>> next version that finally seems to work.
>>>>
>>>> Two patches are attached, the first is the absolute minimum for
>>>> making it work, this still has the Tree type for canon_pathkeys
>>>> and eq_classes got the same treatment as join_rel_list/join_rel_hash
>>>> has in the current sources: if the list grows larger than 32, a hash
>>>> table is created. It seems to be enough for doing it in
>>>> get_eclass_for_sort_expr() only, the other users of eq_classes
>>>> aren't bothered by this change.
>>>
>>> That's better, but can't you use dynahash for canon_pathkeys as well?
>>
>> Here's a purely dynahash solution. It's somewhat slower than
>> the tree version, 0.45 vs 0.41 seconds in the cached case for the
>> previously posted test case.
>
> And now in context diff, sorry for my affection towards unified diffs. :-)

A little better version, no need for the heavy hash_any; hash_uint32
on the lower 32 bits of pk_eclass is enough. The profiling runtime
is now 0.42 seconds vs the previous 0.41 seconds for the tree version.

Best regards,
Zoltán Böszörményi

-- 
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/

diff -dcrpN postgresql.orig/src/backend/optimizer/path/equivclass.c postgresql.1/src/backend/optimizer/path/equivclass.c
*** postgresql.orig/src/backend/optimizer/path/equivclass.c	2010-10-15 10:31:47.0 +0200
--- postgresql.1/src/backend/optimizer/path/equivclass.c	2010-10-28 12:37:47.0 +0200
***************
*** 24,29 ****
--- 24,30 ----
  #include "optimizer/planmain.h"
  #include "optimizer/prep.h"
  #include "optimizer/var.h"
+ #include "utils/hsearch.h"
  #include "utils/lsyscache.h"

*************** add_eq_member(EquivalenceClass *ec, Expr
*** 360,434 ****

  /*
!  * get_eclass_for_sort_expr
!  *	  Given an expression and opfamily info, find an existing equivalence
!  *	  class it is a member of; if none, build a new single-member
!  *	  EquivalenceClass for it.
!  *
!  * sortref is the SortGroupRef of the originating SortGroupClause, if any,
!  * or zero if not. (It should never be zero if the expression is volatile!)
!  *
!  * This can be used safely both before and after EquivalenceClass merging;
!  * since it never causes merging it does not invalidate any existing ECs
!  * or PathKeys.
!  *
!  * Note: opfamilies must be chosen consistently with the way
!  * process_equivalence() would do; that is, generated from a mergejoinable
!  * equality operator. Else we might fail to detect valid equivalences,
!  * generating poor (but not incorrect) plans.
   */
! EquivalenceClass *
! get_eclass_for_sort_expr(PlannerInfo *root,
! 						 Expr *expr,
! 						 Oid expr_datatype,
! 						 List *opfamilies,
! 						 Index sortref)
  {
! 	EquivalenceClass *newec;
! 	EquivalenceMember *newem;
  	ListCell   *lc1;
! 	MemoryContext oldcontext;

  	/*
! 	 * Scan through the existing EquivalenceClasses for a match
  	 */
! 	foreach(lc1, root->eq_classes)
  	{
! 		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
! 		ListCell   *lc2;

  		/*
! 		 * Never match to a volatile EC, except when we are looking at another
! 		 * reference to the same volatile SortGroupClause.
  		 */
! 		if (cur_ec->ec_has_volatile &&
! 			(sortref == 0 || sortref != cur_ec->ec_sortref))
! 			continue;
!
! 		if (!equal(opfamilies, cur_ec->ec_opfamilies))
  			continue;

! 		foreach(lc2, cur_ec->ec_members)
  		{
! 			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
!
! 			/*
! 			 * If below an outer join, don't match constants: they're not as
! 			 * constant as they look.
! 			 */
! 			if (cur_ec->ec_below_outer_join &&
! 				cur_em->em_is_const)
! 				continue;

! 			if (expr_datatype == cur_em->em_datatype &&
! 				equal(expr, cur_em->em_expr))
! 				return cur_ec;		/* Match! */
  		}
  	}

  	/*
- 	 * No match, so build a new single-member EC
- 	 *
  	 * Here, we must be sure that we construct the EC in the right context.  We
  	 * can assume, however, that the passed expr is long-lived.
  	 */
--- 361,463 ----

  /*
!  * eq_classes_match - matching function for eq_classes_hash in PlannerInfo
   */
! static int
! eq_classes_match(const void *key1, const void *key2, Size keysize)
  {
! 	EquivalenceClass *ec1 = (EquivalenceClass *) key1;	/* this is in the hashtable */
! 	EquivalenceClass *ec2 = (EquivalenceClass *) key2;	/* this is the new matched entry */
  	ListCell   *lc1;
! 	ListCell   *lc2;

  	/*
! 	 * Never match to a volatile EC, except when we are looking at another
! 	 * reference to the same volatile S
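The "no need for the heavy hash_any" remark above refers to hashing the pk_eclass pointer by its lower 32 bits instead of hashing the whole key. A standalone sketch of that idea follows; this is not PostgreSQL code, the function name hash_ptr_low32 is invented, and the mixing constants come from a generic integer finalizer rather than from hash_uint32.

```c
#include <stdint.h>

/*
 * Hash a pointer by taking its low 32 bits and mixing them, roughly
 * what applying a 32-bit integer hash to the lower half of a pointer
 * achieves: cheap to compute, yet spreads nearby addresses across
 * buckets.
 */
static uint32_t
hash_ptr_low32(const void *p)
{
	uint32_t	v = (uint32_t) (uintptr_t) p;

	v ^= v >> 16;
	v *= 0x85ebca6bu;
	v ^= v >> 13;
	return v;
}
```

Because equivalence classes are palloc'd nodes, their addresses are stable for the lifetime of the planner run, so a pointer-derived hash key stays valid for as long as the hash table does.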
[HACKERS] Error message with COMMIT/ROLLBACK within PL/pgSQL
This is not very user-friendly:

postgres=# do $$ begin COMMIT; end; $$;
ERROR:  SPI_execute_plan_with_paramlist failed executing query "COMMIT": SPI_ERROR_TRANSACTION
CONTEXT:  PL/pgSQL function "inline_code_block" line 2 at SQL statement

Clearly we don't support that, but seems like it would deserve a better
error message. Curiously, we *do* give a better error message if you try
that with EXECUTE:

postgres=# do $$ begin EXECUTE 'COMMIT'; end; $$;
ERROR:  cannot begin/end transactions in PL/pgSQL
HINT:  Use a BEGIN block with an EXCEPTION clause instead.
CONTEXT:  PL/pgSQL function "inline_code_block" line 2 at EXECUTE statement

Barring objections, I'll add the same error message to the non-execute
codepath.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

diff --git a/src/pl/plpgsql/src/pl_exec.c b/src/pl/plpgsql/src/pl_exec.c
index 9929e04..a21ea53 100644
--- a/src/pl/plpgsql/src/pl_exec.c
+++ b/src/pl/plpgsql/src/pl_exec.c
@@ -2889,6 +2889,17 @@ exec_stmt_execsql(PLpgSQL_execstate *estate,
 			exec_set_found(estate, false);
 			break;

+			/* Some SPI errors deserve specific error messages */
+		case SPI_ERROR_COPY:
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot COPY to/from client in PL/pgSQL")));
+		case SPI_ERROR_TRANSACTION:
+			ereport(ERROR,
+					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+					 errmsg("cannot begin/end transactions in PL/pgSQL"),
+					 errhint("Use a BEGIN block with an EXCEPTION clause instead.")));
+
 		default:
 			elog(ERROR, "SPI_execute_plan_with_paramlist failed executing query \"%s\": %s",
 				 expr->query, SPI_result_code_string(rc));

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
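The shape of the patch above is a plain code-to-message dispatch: specific result codes get specific messages, everything else falls back to the generic SPI_result_code_string text. A minimal standalone sketch of that shape follows; the enum values and the function name errmsg_for are invented for illustration and are not PostgreSQL's.

```c
#include <string.h>

/*
 * Illustrative stand-ins for SPI result codes; the numeric values
 * here are made up and do not match PostgreSQL's spi.h.
 */
enum
{
	DEMO_ERROR_COPY = -4,
	DEMO_ERROR_TRANSACTION = -5
};

/*
 * Map a result code to a user-facing message, with a generic fallback,
 * mirroring the switch added to exec_stmt_execsql() in the patch.
 */
static const char *
errmsg_for(int rc)
{
	switch (rc)
	{
		case DEMO_ERROR_COPY:
			return "cannot COPY to/from client in PL/pgSQL";
		case DEMO_ERROR_TRANSACTION:
			return "cannot begin/end transactions in PL/pgSQL";
		default:
			return "unrecognized SPI result code";
	}
}
```

Note that the real patch needs no break after the SPI_ERROR_COPY case: ereport(ERROR) does not return to the caller, so there is no fall-through in practice.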
Re: [HACKERS] plan time of MASSIVE partitioning ...
Boszormenyi Zoltan írta:
> Heikki Linnakangas írta:
>> On 26.10.2010 18:34, Boszormenyi Zoltan wrote:
>>> thank you very much for pointing me to dynahash, here is the
>>> next version that finally seems to work.
>>>
>>> Two patches are attached, the first is the absolute minimum for
>>> making it work, this still has the Tree type for canon_pathkeys
>>> and eq_classes got the same treatment as join_rel_list/join_rel_hash
>>> has in the current sources: if the list grows larger than 32, a hash
>>> table is created. It seems to be enough for doing it in
>>> get_eclass_for_sort_expr() only, the other users of eq_classes
>>> aren't bothered by this change.
>>
>> That's better, but can't you use dynahash for canon_pathkeys as well?
>
> Here's a purely dynahash solution. It's somewhat slower than
> the tree version, 0.45 vs 0.41 seconds in the cached case for the
> previously posted test case.

And now in context diff, sorry for my affection towards unified diffs. :-)

> Best regards,
> Zoltán Böszörményi

-- 
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/

diff -dcrpN postgresql.orig/src/backend/optimizer/path/equivclass.c postgresql.1/src/backend/optimizer/path/equivclass.c
*** postgresql.orig/src/backend/optimizer/path/equivclass.c	2010-10-15 10:31:47.0 +0200
--- postgresql.1/src/backend/optimizer/path/equivclass.c	2010-10-26 17:01:57.0 +0200
***************
*** 24,29 ****
--- 24,30 ----
  #include "optimizer/planmain.h"
  #include "optimizer/prep.h"
  #include "optimizer/var.h"
+ #include "utils/hsearch.h"
  #include "utils/lsyscache.h"

*************** add_eq_member(EquivalenceClass *ec, Expr
*** 360,434 ****

  /*
!  * get_eclass_for_sort_expr
!  *	  Given an expression and opfamily info, find an existing equivalence
!  *	  class it is a member of; if none, build a new single-member
!  *	  EquivalenceClass for it.
!  *
!  * sortref is the SortGroupRef of the originating SortGroupClause, if any,
!  * or zero if not. (It should never be zero if the expression is volatile!)
!  *
!  * This can be used safely both before and after EquivalenceClass merging;
!  * since it never causes merging it does not invalidate any existing ECs
!  * or PathKeys.
!  *
!  * Note: opfamilies must be chosen consistently with the way
!  * process_equivalence() would do; that is, generated from a mergejoinable
!  * equality operator. Else we might fail to detect valid equivalences,
!  * generating poor (but not incorrect) plans.
   */
! EquivalenceClass *
! get_eclass_for_sort_expr(PlannerInfo *root,
! 						 Expr *expr,
! 						 Oid expr_datatype,
! 						 List *opfamilies,
! 						 Index sortref)
  {
! 	EquivalenceClass *newec;
! 	EquivalenceMember *newem;
  	ListCell   *lc1;
! 	MemoryContext oldcontext;

  	/*
! 	 * Scan through the existing EquivalenceClasses for a match
  	 */
! 	foreach(lc1, root->eq_classes)
  	{
! 		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
! 		ListCell   *lc2;

  		/*
! 		 * Never match to a volatile EC, except when we are looking at another
! 		 * reference to the same volatile SortGroupClause.
  		 */
! 		if (cur_ec->ec_has_volatile &&
! 			(sortref == 0 || sortref != cur_ec->ec_sortref))
! 			continue;
!
! 		if (!equal(opfamilies, cur_ec->ec_opfamilies))
  			continue;

! 		foreach(lc2, cur_ec->ec_members)
  		{
! 			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
!
! 			/*
! 			 * If below an outer join, don't match constants: they're not as
! 			 * constant as they look.
! 			 */
! 			if (cur_ec->ec_below_outer_join &&
! 				cur_em->em_is_const)
! 				continue;

! 			if (expr_datatype == cur_em->em_datatype &&
! 				equal(expr, cur_em->em_expr))
! 				return cur_ec;		/* Match! */
  		}
  	}

  	/*
- 	 * No match, so build a new single-member EC
- 	 *
  	 * Here, we must be sure that we construct the EC in the right context.  We
  	 * can assume, however, that the passed expr is long-lived.
  	 */
--- 361,463 ----

  /*
!  * eq_classes_match - matching function for eq_classes_hash in PlannerInfo
   */
! static int
! eq_classes_match(const void *key1, const void *key2, Size keysize)
  {
! 	EquivalenceClass *ec1 = (EquivalenceClass *) key1;	/* this is in the hashtable */
! 	EquivalenceClass *ec2 = (EquivalenceClass *) key2;	/* this is the new matched entry */
  	ListCell   *lc1;
! 	ListCell   *lc2;

  	/*
! 	 * Never match to a volatile EC, except when we are looking at another
! 	 * reference to the same volatile SortGroupClause.
  	 */
! 	if (ec1->ec_has_volatile &&
! 		(ec2->ec_sortref == 0 || ec2->ec_sortref != ec1->ec_sortref))
! 		return 1;
!
! 	if (!equal(ec1->ec_opfamilies, ec2->ec_opfamilies))
! 		return 1;
!
! 	foreach(lc1, ec1->ec_members)
  	{
! 		EquivalenceMember *em1 = (EquivalenceMember *) lfirst(lc1
Re: [HACKERS] plan time of MASSIVE partitioning ...
Heikki Linnakangas írta:
> On 26.10.2010 18:34, Boszormenyi Zoltan wrote:
>> thank you very much for pointing me to dynahash, here is the
>> next version that finally seems to work.
>>
>> Two patches are attached, the first is the absolute minimum for
>> making it work, this still has the Tree type for canon_pathkeys
>> and eq_classes got the same treatment as join_rel_list/join_rel_hash
>> has in the current sources: if the list grows larger than 32, a hash
>> table is created. It seems to be enough for doing it in
>> get_eclass_for_sort_expr() only, the other users of eq_classes
>> aren't bothered by this change.
>
> That's better, but can't you use dynahash for canon_pathkeys as well?

Here's a purely dynahash solution. It's somewhat slower than
the tree version, 0.45 vs 0.41 seconds in the cached case for the
previously posted test case.

Best regards,
Zoltán Böszörményi

-- 
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
     http://www.postgresql.at/

diff -durpN postgresql.orig/src/backend/optimizer/path/equivclass.c postgresql.1/src/backend/optimizer/path/equivclass.c
--- postgresql.orig/src/backend/optimizer/path/equivclass.c	2010-10-15 10:31:47.0 +0200
+++ postgresql.1/src/backend/optimizer/path/equivclass.c	2010-10-26 17:01:57.0 +0200
@@ -24,6 +24,7 @@
 #include "optimizer/planmain.h"
 #include "optimizer/prep.h"
 #include "optimizer/var.h"
+#include "utils/hsearch.h"
 #include "utils/lsyscache.h"

@@ -360,75 +361,103 @@ add_eq_member(EquivalenceClass *ec, Expr

 /*
- * get_eclass_for_sort_expr
- *	  Given an expression and opfamily info, find an existing equivalence
- *	  class it is a member of; if none, build a new single-member
- *	  EquivalenceClass for it.
- *
- * sortref is the SortGroupRef of the originating SortGroupClause, if any,
- * or zero if not. (It should never be zero if the expression is volatile!)
- *
- * This can be used safely both before and after EquivalenceClass merging;
- * since it never causes merging it does not invalidate any existing ECs
- * or PathKeys.
- *
- * Note: opfamilies must be chosen consistently with the way
- * process_equivalence() would do; that is, generated from a mergejoinable
- * equality operator. Else we might fail to detect valid equivalences,
- * generating poor (but not incorrect) plans.
+ * eq_classes_match - matching function for eq_classes_hash in PlannerInfo
  */
-EquivalenceClass *
-get_eclass_for_sort_expr(PlannerInfo *root,
-						 Expr *expr,
-						 Oid expr_datatype,
-						 List *opfamilies,
-						 Index sortref)
+static int
+eq_classes_match(const void *key1, const void *key2, Size keysize)
 {
-	EquivalenceClass *newec;
-	EquivalenceMember *newem;
+	EquivalenceClass *ec1 = (EquivalenceClass *) key1;	/* this is in the hashtable */
+	EquivalenceClass *ec2 = (EquivalenceClass *) key2;	/* this is the new matched entry */
 	ListCell   *lc1;
-	MemoryContext oldcontext;
+	ListCell   *lc2;

 	/*
-	 * Scan through the existing EquivalenceClasses for a match
+	 * Never match to a volatile EC, except when we are looking at another
+	 * reference to the same volatile SortGroupClause.
 	 */
-	foreach(lc1, root->eq_classes)
+	if (ec1->ec_has_volatile &&
+		(ec2->ec_sortref == 0 || ec2->ec_sortref != ec1->ec_sortref))
+		return 1;
+
+	if (!equal(ec1->ec_opfamilies, ec2->ec_opfamilies))
+		return 1;
+
+	foreach(lc1, ec1->ec_members)
 	{
-		EquivalenceClass *cur_ec = (EquivalenceClass *) lfirst(lc1);
-		ListCell   *lc2;
+		EquivalenceMember *em1 = (EquivalenceMember *) lfirst(lc1);

 		/*
-		 * Never match to a volatile EC, except when we are looking at another
-		 * reference to the same volatile SortGroupClause.
+		 * If below an outer join, don't match constants: they're not as
+		 * constant as they look.
 		 */
-		if (cur_ec->ec_has_volatile &&
-			(sortref == 0 || sortref != cur_ec->ec_sortref))
-			continue;
-
-		if (!equal(opfamilies, cur_ec->ec_opfamilies))
+		if (ec1->ec_below_outer_join &&
+			em1->em_is_const)
 			continue;

-		foreach(lc2, cur_ec->ec_members)
+		foreach(lc2, ec2->ec_members)
 		{
-			EquivalenceMember *cur_em = (EquivalenceMember *) lfirst(lc2);
-
-			/*
-			 * If below an outer join, don't match constants: they're not as
-			 * constant as they look.
-			 */
-			if (cur_ec->ec_below_outer_join &&
-				cur_em->em_is_const)
-				continue;
+			EquivalenceMember *em2 = (EquivalenceMember *) lfirst(lc2);

-			if (expr_datatype == cur_em->em_datatype &&
-				equal(expr, cur_em->em_expr))
-				return cur_ec;		/* Match! */
+			if (em1->em_datatype == em2->em_datatype &&
+				equal(em1->em_expr, em2->em_expr))
+				return 0;
 		}
 	}
+	return 1;
+}
+
+
+/*
+ * build_eq_classes_hash
+ *	  Build the initial contents of PlannerInfo.eq_classes_hash
+ *	  for faster search in PlannerInfo.eq_classes. This is used
+ *	  to make get_eclass_for_sort_expr() faster for la