Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
On Mar 25, 2011, at 11:23 PM, Tom Lane wrote: If this were PL/perl, or PL/almost-anything-except-SQL, I could get behind such a proposal. But it's not, it's SQL; and SQL doesn't do things that way. SQL's idea of disambiguation is qualified names. And even more to the point: to the extent you think that weird syntax might be a suitable solution, you have to keep in mind that the SQL committee could take over any such syntax at the drop of a hat. See the recent unpleasantness concerning = ... Perhaps we could use `=varname`. ;-P David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
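For readers following along: the qualified-name style Tom alludes to can be sketched as follows (the table, column, and function names here are hypothetical, not from the thread):

```sql
-- Hypothetical sketch: a parameter named "id" collides with a column "id".
-- SQL's native disambiguation is qualification: prefix the column with the
-- table name and the parameter with the function name.
CREATE TABLE accounts (id int PRIMARY KEY, balance numeric);

CREATE FUNCTION get_balance(id int) RETURNS numeric AS $$
    SELECT accounts.balance
      FROM accounts
     WHERE accounts.id = get_balance.id   -- qualified names resolve the clash
$$ LANGUAGE sql;
```

Under this style no new syntax is needed, which is the core of the argument that the SQL committee cannot break it later.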
Re: [HACKERS] Proposal: q-gram GIN and GiST indexes
On Fri, Mar 25, 2011 at 8:32 PM, Alexander Korotkov aekorot...@gmail.com wrote: I would like to ask you about currency of the work above. This seems to be a mess of words. Sorry for my bad English. Actually, I meant that I need an appraisal of my proposal. With best regards, Alexander Korotkov.
Re: [HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
2011/3/26 Darren Duncan dar...@darrenduncan.net: Robert Haas wrote: On Mar 25, 2011, at 9:22 PM, Joshua Berkus j...@agliodbs.com wrote: Tom, Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw an error for ambiguous references, period. As a likely heavy user of this feature, I agree with Tom here. I really don't want the column being silently preferred in SQL functions, when PL/pgSQL functions are throwing an error. I'd end up spending hours debugging this. Also, I don't understand why this would be a dump/reload issue if $1 and $2 continue to work. Because an identifier that previously referred unambiguously to a column might now be ambiguous, if there is a parameter with the same name. I mention 2 possible solutions here, both of which involve syntax alterations, each between the -- lines. I personally like the second/lower option more. Might it be reasonable, perhaps as a 9.2 feature, to add top-level namespaces (TLNs) so that one could always explicitly qualify what they are referring to? For example, you could have the 3 TLNs sch, lex, and attr (I may have missed some useful ones). The sch TLN would unambiguously refer directly to a schema object, such as a database table. The lex TLN would unambiguously refer directly to a lexical, either a parameter of the current routine or a lexical variable. The attr TLN would unambiguously refer to a table/etc column/attribute in the manner typical for SQL. Use them like: sch.foo - the table/etc foo; lex.foo - the lexical variable foo; attr.foo - the column foo. Use of these TLNs is optional where there is no ambiguity. The TLNs are not reserved words, but if one has an entity named the same, then references to it must be TLN-qualified; e.g.: lex.sch lex.lex lex.attr. Now these are just examples. You may find a different set works better. -1. This is not based on any pattern in SQL.
It's not simple, and it introduces reserved keywords. Regards, Pavel -- There are also alternate solutions. For example, it could be mandated that lexical-scope aliases for any data/var-like schema object are required in routines, where the aliases are distinct from all lexical vars/params/etc, and then all SQL/code in the routines may only refer to the schema objects by the aliases. Effectively this makes it so that routines can no longer see non-lexical vars except for those from parameters, and this aliasing is defining a parameter whose argument is supplied by the DBMS automatically rather than as an explicit routine caller argument. That way, inside a routine body there are only lexical names for things, and so no namespace qualification is ever needed by the regular SQL. Similarly, if you always think of table column names as referring to an attribute or element of a table variable, then just reference the column qualified by the table name (or the lexical alias thereof), same as you do in any other programming language. Of course, sometimes you don't have to qualify column name references, as context could make it unambiguous. Or, a shorthand like a simple leading . could unambiguously say you're referring to a column of the particular table in context. With those in place, all unqualified references are straight to lexical variables or parameters. And so, this is also an effective way to resolve the ambiguity, and I prefer the latter design personally. Here's an example in quasi-PL/pgSQL:

create function myrtn (myparam integer, mytbl ::= mydb.myschema.mytbl) as
declare myvar integer := 5;
$body$
begin
  select (.mycol + myvar * myparam) as mynewcol from mytbl;
end;
$body$

Note that I've already thought through this last example, as these methods of avoiding ambiguity are loosely speaking how my language Muldis D avoids the problem faced by many SQL procedures.
The .mycol syntax specifically was inspired originally for me by Perl 6, where the lack of something just before the . means that the implicit topic variable is referred to, like if you said $_.mycol. A Perl 6 analogy being something like: $mytbl.map:{ .mycol + $myvar * $myparam } aka: $mytbl.map:{ $_.mycol + $myvar * $myparam } -- Darren Duncan
Re: [HACKERS] 9.1 Beta
On Sat, Mar 26, 2011 at 12:33 AM, Tom Lane t...@sss.pgh.pa.us wrote: The correct question is whether we're ready for beta, and I would say the answer is clearly no, unless you have a pretty low standard for what ready for beta means. Perhaps it would be suitable to discuss what the standard for that really ought to be; but I don't agree in the slightest that we ought to decide based on predetermined calendar dates rather than the state of the code. OK. If we had a hard date for feature freeze, let's have a hard date for Beta of +2 months (next time), and +2.5 months now. (I know +1 month was suggested; well, that's just unrealistic.) Beta is a great time to resolve difficult decisions, by opening the floor to wider debate and feedback. The reason we get wider testing during beta is that people have some confidence (perhaps misplaced) that the database won't eat their data. Releasing alpha-grade code and calling it beta isn't going to get us wider testing ... at least, not more than once. I agree that we seem to have slightly different viewpoints on what Beta means. Your definition of Beta sounds like "safe enough to run production systems on", though I'm not trying to put words in your mouth. I think if we have a clear statement of what Beta actually means it would help us know when we've achieved it. And it will also inform potential Beta testers of what we mean by it. The basic point of this post was this: If we wait for the Open Items list to drop to zero, many people are unable to contribute and that means delay. Also, waiting for the Open Items list to drop to zero puts the schedule in the hands of one or two individuals, which is a bad thing. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] 9.1 Beta
On Sat, Mar 26, 2011 at 1:48 AM, Robert Haas robertmh...@gmail.com wrote: On Fri, Mar 25, 2011 at 6:18 PM, Simon Riggs si...@2ndquadrant.com wrote: The sooner we declare Beta, the sooner people will test. Then we will have user feedback, bugs to fix etc.. Everybody is very clearly sitting idle. With a longer bug list we will make faster progress to release. We're just wasting time. I can't resist observing that if you want beta to happen sooner, it would be better not to commit major and largely unreviewed patches three weeks after the end of the last CommitFest. Before you insist that it was reviewed, the version that was actually committed bore so little resemblance to the versions that were posted earlier that any earlier review that was done was basically meaningless in terms of ensuring that the final product was bug free, and it wasn't and isn't. I complained *repeatedly* about the need to get both collation support and sync rep finished and committed sooner, for exactly this reason. We are now reaping the entirely predictable fruit of having failed to make that happen. But for those two patches, we would likely be in beta already, or darn close. http://archives.postgresql.org/pgsql-hackers/2010-12/msg01257.php http://archives.postgresql.org/pgsql-hackers/2011-01/msg01209.php http://archives.postgresql.org/pgsql-hackers/2011-01/msg02811.php http://archives.postgresql.org/pgsql-hackers/2011-02/msg00438.php There are two responses to your comments. First, you are presuming that the state of those patches must hold up the whole release process. I don't think it should. You want to see it finished before it goes to Beta. I want to see wider input before we consider it finished. In your way of seeing it, you have great input into the decision of what is finished or not. I prefer to open things up to a wider audience, who don't normally get a say until too late. Yes, you do send a great many very long emails and many of them are complaints. 
If I had read every single word of the many you've written, we would be delayed even further. Debating minor points endlessly does not move the world forward; it just delays it. You should trust that if your points are valid, they will be brought up by someone else sometime soon. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Open issues for collations
On Sat, Mar 26, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I think some discussion of which of the things on the open item lists need to be done before beta might be helpful, and we ought to add any items that are not there but are blockers. Here's a quick enumeration of some things I think need discussion about the collations patch:

* Are we happy yet with the collation assignment behavior (see parse_collate.c)? A couple of specific subtopics:

** Selecting a field from a record-returning function's output. Currently, we'll use the field's declared collation; except that if the field has default collation, we'll replace that with the common collation of the function's inputs, if any. Is either part of that sane? Do we need to make this work for functions invoked with other syntax than a plain function call, eg operator or cast syntax?

** What to do with domains whose declaration includes a COLLATE clause? Currently, we'll impute that collation to the result of a cast to the domain type --- even if the cast's input expression includes an explicit COLLATE clause. It's not clear that that's per spec. If it is correct, should we behave similarly for functions that are declared to return a domain type? Should it matter if the cast-to-domain is explicit or implicit? Perhaps it'd be best if domain collations only mattered for columns declared with that domain type. Then we'd have a general rule that collations only come into play in an expression as a result of (a) the declared type of a column reference or (b) an explicit COLLATE clause.

* In plpgsql, is it OK for declared local variables to inherit the function's input collation? Should we provide a COLLATE option in variable declarations to let that be overridden? If Oracle understands COLLATE, probably we should look at what they do in PL/SQL.
* RI triggers should insert COLLATE clauses in generated queries to satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the referenced column's collation. Right now you may get either table's collation depending on which query type is involved. I think an obvious failure may not be possible so long as equality means the same thing in all collations, but it's definitely possible that the planner might decide it can't use the referenced column's unique index, which would suck for performance. (Note: this rule seems to prove that the committee assumes equality can mean different things in different collations, else they'd not have felt the need to specify.)

* It'd sure be nice if we had some nontrivial test cases that work in encodings besides UTF8. I'm still bothered that the committed patch failed to cover single-byte-encoding cases in upper/lower/initcap.

* Remove initdb's warning about useless locales? Seems like pointless noise, or at least something that can be relegated to debug mode.

* Is it worth adding a cares-about-collation flag to pg_proc? Probably too late to be worrying about such refinements for 9.1.

There are a bunch of other minor issues that I'm still working through, but these are the ones that seem to merit discussion.

That's a long list, and I think it's clear that we won't resolve all of those issues to everybody's satisfaction in a single release, let alone in the next week or so. We need a way forwards. What I think we should do is add detailed documentation on how it works now. There are many people that would love to help, but not everybody can visualise exactly the points you are making above; I would confess that I can't. Having docs that clearly explain a neat new capability and the various possible gotchas/caveats will help others come up with test cases and ideas. It seems to me likely that in real usage many of those gotchas will drop away because they represent unlikely or perverse use cases.
I don't see anything bad in releasing software that has unresolved questions, as long as those items are clearly flagged up and we specifically ask for feedback on them. That looks to me to be a many-eyeballs approach to the problem. http://en.wikipedia.org/wiki/Linus%27_Law Tucking our shoelaces into our shoes doesn't mean the loose ends are resolved fully. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 1:11 AM, Robert Haas robertmh...@gmail.com wrote: I believe I've figured out why synchronous replication has such terrible performance with fsync=off: it has a nasty race condition. It may happen - if the standby responds very quickly - that the standby acks the commit record and awakens waiters before the committing backend actually begins to wait. There's no cross-check for this: the committing backend waits unconditionally, with no regard to whether the necessary ACK has already arrived. At this point we may be in for a very long wait: another ACK will be required to release waiters, and that may not be immediately forthcoming. I had thought that the next ACK (after at most wal_receiver_status_interval) would do the trick, but it appears to be even worse than that: by making the standby win the race, I was easily able to get the master to hang for over a minute, and it only got released when I committed another transaction. Had I been sufficiently patient, the next checkpoint probably would have done the trick. Of course, with fsync=off on the standby, it's much easier for the standby to win the race. That seems very unlikely even with fsync=off in a real config where we have a network path to consider. Your definition of a nasty race condition seems off. I've added code for you. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
On Mar 25, 2011, at 11:23 PM, Tom Lane t...@sss.pgh.pa.us wrote: If this were PL/perl, or PL/almost-anything-except-SQL, I could get behind such a proposal. But it's not, it's SQL; and SQL doesn't do things that way. SQL's idea of disambiguation is qualified names. And even more to the point: to the extent you think that weird syntax might be a suitable solution, you have to keep in mind that the SQL committee could take over any such syntax at the drop of a hat. See the recent unpleasantness concerning => ... You can't be guaranteed that they won't standardize something incompatible no matter what we do. We could choose to do it as you've proposed and they could then standardize some weird syntax - the => case is a fairly relevant example of exactly that. ...Robert
[HACKERS] Lock problem with autovacuum truncating heap
We have run across a problem with autovacuum that occurs when it can truncate off a large amount of empty blocks. It behaves differently in version 9.0 than in previous versions. Both behaviors are bad. Consider a relation that receives constant inserts/updates that are satisfied using freespace at the beginning of the heap. Delete operations have now removed large numbers of tuples at the end of the relation. The next autovacuum will find a large number of blocks at the end that can be truncated. Vacuumlazy now takes out an access exclusive lock and scans the relation *backwards* to find out if concurrent access has created new tuples in the to-be-truncated space. Apparently such a backward scan in 8K blocks isn't really a good access strategy. Up to 8.4, it simply holds the lock until it is done, which in our case stalled a production system for 12 minutes! This is obviously bad. In 9.0, the autovacuum process will be aborted about 1 second after another transaction starts waiting for a lock. The result is that even a simple INSERT will take 1 second. The autovacuum restarts shortly after and somehow gets to a point where it will cause this 1-second hiccup every 2 minutes. This is slightly better but still far from optimal in a world where transaction response times are measured in milliseconds. My current idea for a fix is to modify lazy_truncate_heap(). It does acquire and release the exclusive lock, so it should be possible to do this in smaller chunks, releasing and reacquiring the lock so that client transactions can get their work done as well. At the same time I would change count_nondeletable_pages() so that it uses a forward scan direction (if that leads to a speedup). Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
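To make the scenario concrete, here is a hedged reproduction sketch (the table name and row counts are made up; the point is just to leave a long tail of empty blocks for autovacuum to truncate):

```sql
-- Hypothetical setup: fill a table, then delete the tail so the heap ends
-- in a long run of empty blocks that vacuum will want to truncate.
CREATE TABLE bloaty (id serial PRIMARY KEY, payload text);

INSERT INTO bloaty (payload)
SELECT repeat('x', 500) FROM generate_series(1, 1000000);

DELETE FROM bloaty WHERE id > 100000;  -- empties most of the heap's tail

-- The truncation phase takes an AccessExclusiveLock while it scans
-- backwards over the empty blocks; concurrent DML will queue behind it.
VACUUM VERBOSE bloaty;
```

With concurrent inserts running against such a table, the lock-wait behavior Jan describes should be observable in either branch.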
Re: [HACKERS] 9.1 Beta
On Sat, Mar 26, 2011 at 9:22 AM, Simon Riggs si...@2ndquadrant.com wrote: First, you are presuming that the state of those patches must hold up the whole release process. I don't think it should. There's not much point in releasing a beta with behaviour that we know we intend to change. All it will do is elicit bug reports that we have to respond to saying we know, we were going to change that anyway. I think the goal of a beta is to be able to say we think this is the final behaviour of the next release, but we're open to feedback. Once we release the final release we're pretty stuck with the behaviour unless the problems are severe enough to justify changing it. Before the beta, in the alpha releases, it's clear to users that they're seeing work in progress, which is most appropriate for people who are already following -hackers. -- greg
Re: [HACKERS] 9.1 Beta
On Mar 26, 2011, at 4:27 AM, Simon Riggs si...@2ndquadrant.com wrote: The basic point of this post was this: If we wait for the Open Items list to drop to zero, many people are unable to contribute and that means delay. Also, waiting for the Open Items list to drop to zero puts the schedule in the hands of one or two individuals, which is a bad thing. As far as I can tell, everyone is just as free to make suggestions and review patches right now as they always are. In fact, I do not particularly enjoy slogging through this list of open items. I would be more than happy to have more help. There are plenty of issues there that require real thought, and work, and I have no particular desire to be the one that fixes them all. You seem to feel that these issues are quite subjective and that the right behavior is altogether unclear. I disagree. There are a few things that may fall into that category, but I think for the most part we're fixing bugs and major usability problems. ...Robert
Re: [HACKERS] 9.1 Beta
On 03/25/2011 06:18 PM, Simon Riggs wrote: Judging by the number of new threads about development for 9.2, I think it's time we declared 9.1 Beta. I just had a conversation with some Debian developers about how PostgreSQL 9.0 got pulled out of their release because we delayed by 3 weeks. So we missed our slot to deliver useful new features to our very best supporters by 2 years. I really hope that was deliberate. ISTR Debian on at least one occasion not including a version of Postgres that had been fully released quite some time before their release. We're always going to be missing someone's timeline, no matter what timeline we adopt, anyway. So colour me unimpressed by this whole line of argument. cheers andrew
Re: [HACKERS] race condition in sync rep
On Mar 26, 2011, at 6:16 AM, Simon Riggs si...@2ndquadrant.com wrote: That seems very unlikely even with fsync=off in a real config where we have network path to consider. Your definition of a nasty race condition seems off. I've added code for you. Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why? It is not our policy - and bad practice in general - to ship code with race conditions. The code you added is not for me; it's to fix a race condition. Thanks for the quick fix. ...Robert
Re: [HACKERS] When and how many times does ExecSetParamPlan executes?
Thanks a lot, Mr. Tom. I understand it better now. I see that the function ExecSetParamPlan is indeed called only once when executing a query which would have a paramkind = PARAM_EXEC. The query helped me see the run in the debugger, making things clearer (a lot clearer in fact, especially reading your last reply again and again). Thank you again. Regards, Vaibhav On Fri, 2011-03-25 at 15:59 -0400, Tom Lane wrote: Vaibhav Kaushal vaibhavkaushal...@gmail.com writes: So, I think that the function ExecSetParamPlan (as the code suggests too) is called _once_ in any plan/expression and that should be mostly for a sub-select query. Kindly correct me if I am wrong. Since I am not able to understand this use case completely, a sample query which is capable of calling this function (ExecSetParamPlan) could shed some light. It would be really kind of you / anyone to show me a query executable through psql which can actually call ExecSetParamPlan and involves the use of an on-disk relation.

regression=# explain verbose select *, (select sum(f1) from int4_tbl) ss from int8_tbl;
                                QUERY PLAN
---------------------------------------------------------------------------
 Seq Scan on public.int8_tbl  (cost=1.07..2.12 rows=5 width=16)
   Output: int8_tbl.q1, int8_tbl.q2, $0
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=1.06..1.07 rows=1 width=4)
           Output: sum(int4_tbl.f1)
           ->  Seq Scan on public.int4_tbl  (cost=0.00..1.05 rows=5 width=4)
                 Output: int4_tbl.f1
(7 rows)

$0 here represents the PARAM_EXEC Param. regards, tom lane
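By way of contrast (my own sketch against the same regression-test tables, not part of Tom's reply): if the subquery is correlated with the outer query, it should show up as a SubPlan re-evaluated per outer row, rather than an InitPlan evaluated once via ExecSetParamPlan:

```sql
-- Correlated subquery: EXPLAIN should show "SubPlan 1" instead of
-- "InitPlan 1", because the inner query references int8_tbl.q1 and so
-- must be re-run for each outer row.
EXPLAIN VERBOSE
SELECT *, (SELECT sum(f1) FROM int4_tbl WHERE f1 < int8_tbl.q1) AS ss
FROM int8_tbl;
```

Comparing the two plans side by side makes the once-per-query nature of InitPlans easy to see.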
Re: [HACKERS] Open issues for collations
On Sat, Mar 26, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: ** Selecting a field from a record-returning function's output. Currently, we'll use the field's declared collation; except that if the field has default collation, we'll replace that with the common collation of the function's inputs, if any. Is either part of that sane? Do we need to make this work for functions invoked with other syntax than a plain function call, eg operator or cast syntax? Either of those sounds reasonable but the combination seems weird. Some example functions might help:

list_words('foo bar baz') - ('foo'), ('bar'), ('baz')
fetch_users_by_lastname('Smith') - (1,'John','Smith','Great Britain','GB'), (2,'Jürgen','Smith','DE')
fetch_users_by_countrycode('DE') - (2,'Jürgen','Smith','DE')

The first looks like it should definitely inherit. In the second, the result set is heterogeneous and inheriting might be the best compromise, but it would produce very strange results for columns like the country code, which should just use their defined collation of C. In the third case, inheriting the country code's collation of C would be very strange and definitely wrong. It occurs to me that if we have any inherited cases, people might come to depend on that behaviour and there would be no out for us. Whereas if we say record return values always use the record type's fields' collations, then we could always later add a collation of type _inherited or _anycollation or some such that indicated that that column should inherit the arguments' collation, and it wouldn't affect any existing code. -- greg
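Greg's third case could be pinned down with something like the following sketch (the type, column, and collation names are invented for illustration; whether per-attribute COLLATE is accepted everywhere shown is itself part of what the patch is sorting out):

```sql
-- Hypothetical record type for fetch_users_by_countrycode: the country
-- code attribute is declared with collation "C", which per Greg's argument
-- should win over any collation inherited from the function's arguments.
CREATE TYPE user_row AS (
    id       int,
    name     text,
    surname  text,
    country  text COLLATE "C"
);
```

With a declared collation on the attribute, the "always use the record type's field collations" rule would give the unsurprising answer for this function.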
Re: [HACKERS] Open issues for collations
On Mar 26, 2011, at 12:36 AM, Tom Lane t...@sss.pgh.pa.us wrote: ** Selecting a field from a record-returning function's output. Currently, we'll use the field's declared collation; except that if the field has default collation, we'll replace that with the common collation of the function's inputs, if any. Is either part of that sane? Do we need to make this work for functions invoked with other syntax than a plain function call, eg operator or cast syntax? I am not an expert on this topic in any way. That having been said, the first part of that rule seems quite sane. The second part seems less clear, but probably also sane. ** What to do with domains whose declaration includes a COLLATE clause? Currently, we'll impute that collation to the result of a cast to the domain type --- even if the cast's input expression includes an explicit COLLATE clause. I would have thought that an explicit COLLATE clause would trump any action at a distance. * In plpgsql, is it OK for declared local variables to inherit the function's input collation? Should we provide a COLLATE option in variable declarations to let that be overridden? If Oracle understands COLLATE, probably we should look at what they do in PL/SQL. I don't know what Oracle does, but a collate option in variable declarations seems like a very good idea. Inheriting the input collation if not specified seems good too. I also suspect we might need something like COLLATE FROM $1, but maybe that's a 9.2 feature. * RI triggers should insert COLLATE clauses in generated queries to satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the referenced column's collation. Right now you may get either table's collation depending on which query type is involved. I think an obvious failure may not be possible so long as equality means the same thing in all collations, but it's definitely possible that the planner might decide it can't use the referenced column's unique index, which would suck for performance. 
(Note: this rule seems to prove that the committee assumes equality can mean different things in different collations, else they'd not have felt the need to specify.) No idea what to do about this. * It'd sure be nice if we had some nontrivial test cases that work in encodings besides UTF8. I'm still bothered that the committed patch failed to cover single-byte-encoding cases in upper/lower/initcap. Or this. * Remove initdb's warning about useless locales? Seems like pointless noise, or at least something that can be relegated to debug mode. +1. * Is it worth adding a cares-about-collation flag to pg_proc? Probably too late to be worrying about such refinements for 9.1. Depends how much knock-on work it'll create. ...Robert
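The domain question Robert is answering upthread can be made concrete with a hedged sketch (the domain, table, and collation names are invented; the precedence shown is the current behavior Tom describes, not necessarily what the spec requires):

```sql
-- A domain carrying its own collation:
CREATE DOMAIN de_text AS text COLLATE "de_DE";

-- Per the behavior described upthread, the cast to the domain imputes
-- "de_DE" even though the input expression carries an explicit COLLATE
-- clause -- the action-at-a-distance Robert would expect the explicit
-- clause to trump:
SELECT (val COLLATE "C")::de_text
FROM mytable
ORDER BY 1;
```

If explicit COLLATE were to win, the ORDER BY here would sort under "C" instead; that is exactly the open question.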
Re: [HACKERS] 9.1 Beta
Greg Stark gsst...@mit.edu writes: There's not much point in releasing a beta with behaviour that we know we intend to change. All it will do is elicit bug reports that we have to respond to saying we know, we were going to change that anyways. I think the goal of a beta is to be able to say we think this is the final behaviour of the next release but we're open to feedback. Yeah, I think this is a productive way to approach the question. I would put on a couple of extra conditions, though. Something like this:

* No open issues that are expected to result in user-visible behavior changes. (Or at least significant changes? But then we have to argue about what's significant --- for instance, are the questions in the nearby collations-issues thread significant enough to be beta blockers?)

* No open issues that are expected to result in a catversion bump. (With pg_upgrade, this is not as critical as it used to be, but I still think catalog stability is a good indicator of a release's maturity.)

* No known data-loss-causing bugs (duh).

Comments? Any other quality criteria we should have for beta? regards, tom lane
Re: [HACKERS] race condition in sync rep
Robert Haas robertmh...@gmail.com writes: I've added code for you. Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why? It is not our policy - and bad practice in general - to ship code with race conditions. The code you added is not for me; it's to fix a race condition. In particular, in view of today's fix, shouldn't this commit be reverted? http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 I thought at the time that that was nothing more than documenting a known bug, and now it is documenting a dead bug. regards, tom lane
Re: [HACKERS] Lock problem with autovacuum truncating heap
On Sat, Mar 26, 2011 at 2:30 PM, Jan Wieck janwi...@yahoo.com wrote: My current idea for a fix is to modify lazy_truncate_heap(). It does acquire and release the exclusive lock, so it should be possible to do this in smaller chunks, releasing and reacquiring the lock so that client transactions can get their work done as well. Agreed, presumably with vacuum delay in there as well? At the same time I would change count_nondeletable_pages() so that it uses a forward scan direction (if that leads to a speedup). Do we need that? Linux readahead works in both directions doesn't it? Guess it wouldn't hurt too much. BTW does it read the blocks at that point using a buffer strategy? -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: I've added code for you. Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why? It is not our policy - and bad practice in general - to ship code with race conditions. The code you added is not for me; it's to fix a race condition. In particular, in view of today's fix, shouldn't this commit be reverted? http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 I thought at the time that that was nothing more than documenting a known bug, and now it is documenting a dead bug. No, that doc change is still accurate. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
Simon Riggs si...@2ndquadrant.com writes: On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: In particular, in view of today's fix, shouldn't this commit be reverted? http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 I thought at the time that that was nothing more than documenting a known bug, and now it is documenting a dead bug. No, that doc change is still accurate. Well, in that case, it should be on the open-items list. If the system is still behaving that way, it's a bug. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 3:00 PM, Robert Haas robertmh...@gmail.com wrote: On Mar 26, 2011, at 6:16 AM, Simon Riggs si...@2ndquadrant.com wrote: That seems very unlikely even with fsync=off in a real config where we have a network path to consider. Your definition of a nasty race condition seems off. I've added code for you. Your skepticism seems out of place. I actually hit this problem in testing. We could debate how realistic my test setup was, but why? It is not our policy - and bad practice in general - to ship code with race conditions. The code you added is not for me; it's to fix a race condition. A race that will not be lost with normal and sensible settings, and that results in a slight performance degradation only in a contrived worst-case scenario. There is no indication that it caused any other problem. Thanks for the quick fix. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Open issues for collations
On Sat, Mar 26, 2011 at 3:16 PM, Robert Haas robertmh...@gmail.com wrote: ** What to do with domains whose declaration includes a COLLATE clause? Currently, we'll impute that collation to the result of a cast to the domain type --- even if the cast's input expression includes an explicit COLLATE clause. I would have thought that an explicit COLLATE clause would trump any action at a distance. I think an explicit COLLATE *outside* the cast would. But inside the cast? The question comes down to whether a domain with a COLLATE clause provides a collation explicitly or implicitly. So again, examples:

CREATE DOMAIN name AS text COLLATE english;
CREATE DOMAIN countrycode AS char(2) COLLATE C;

1) SELECT * from users where country = 'DE' order by first_name COLLATE german;
2) SELECT * from users where country = 'DE' order by (tolower(first_name) COLLATE german)::name;
3) SELECT * from users order by substr(address,1,2)::countrycode COLLATE english
4) SELECT * from users order by (substr(address,1,2) COLLATE english)::countrycode

The ones with the collation expressions inside the casts seem very strange, and the behaviour of following the domain doesn't seem unreasonable. The behaviour with the COLLATE clauses outside the cast should definitely be to follow the explicit COLLATE clause. -- greg
Re: [HACKERS] 9.1 Beta
On Sat, Mar 26, 2011 at 3:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: There's not much point in releasing a beta with behaviour that we know we intend to change. All it will do is elicit bug reports that we have to respond to saying we know, we were going to change that anyways. I think the goal of a beta is to be able to say we think this is the final behaviour of the next release but we're open to feedback. Yeah, I think this is a productive way to approach the question. I would put on a couple of extra conditions, though. Something like this: * No open issues that are expected to result in user-visible behavior changes. (Or at least significant changes? But then we have to argue about what's significant --- for instance, are the questions in the nearby collations-issues thread significant enough to be beta blockers?) * No open issues that are expected to result in a catversion bump. (With pg_upgrade, this is not as critical as it used to be, but I still think catalog stability is a good indicator of a release's maturity) * No known data-loss-causing bugs (duh) Comments? Any other quality criteria we should have for beta? Last 2 are pretty clear. The first one is debatable because of the word expected. Who decides that? I want more feedback into the project. That can obviously result in changes that are user visible. Users don't complain about non-user visible things. I'd state it the other way around: No open issues that are expected to result in non-user visible architecture changes. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Mar 26, 2011, at 12:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: In particular, in view of today's fix, shouldn't this commit be reverted? http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 I thought at the time that that was nothing more than documenting a known bug, and now it is documenting a dead bug. No, that doc change is still accurate. Well, in that case, it should be on the open-items list. If the system is still behaving that way, it's a bug. I suspect it's fixed, but I haven't checked yet. And contrary to Simon's contention on a nearby email, it was not a slight performance degradation. It was running at like 1 tps. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On Sat, Mar 26, 2011 at 4:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: In particular, in view of today's fix, shouldn't this commit be reverted? http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6e8e7cc580665ddd43c8ca2acc6d60f345570a57 I thought at the time that that was nothing more than documenting a known bug, and now it is documenting a dead bug. No, that doc change is still accurate. Well, in that case, it should be on the open-items list. If the system is still behaving that way, it's a bug. Is it? Sync rep requires fsync on the standby. If you then explicitly turn off fsync on the standby then it has a performance impact, as documented. Setting shared_buffers very low also reduces performance. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Open issues for collations
On Mar 26, 2011, at 12:34 PM, Greg Stark gsst...@mit.edu wrote: The ones with the collation expressions inside the casts seem very strange, and the behaviour of following the domain doesn't seem unreasonable. The behaviour with the COLLATE clauses outside the cast should definitely be to follow the explicit COLLATE clause. +1. That's exactly what I was trying to say; thanks for saying it better. ...Robert
Re: [HACKERS] Open issues for collations
On Sat, Mar 26, 2011 at 12:36:43AM -0400, Tom Lane wrote: ** Selecting a field from a record-returning function's output. Currently, we'll use the field's declared collation; except that if the field has default collation, we'll replace that with the common collation of the function's inputs, if any. Is either part of that sane? Do we need to make this work for functions invoked with other syntax than a plain function call, eg operator or cast syntax? That seems all a bit weird. I spent some time reading through the SQL spec to see if I could come up with a few ideas about what they thought relevant. The gist of it is that I think each column of the result row should have its declared collation, in all cases. Firstly, the SQL spec doesn't go into the general case where the collation of a function's result is based on its inputs. But in any case, if the function returns a record, the whole record would have that collation, which is absurd. I think letting it carry over to the contained columns is just weird. Secondly, I think the derivation algorithm is for determining the collation of expressions which have no otherwise-declared collation. Anything returning a predefined record type has a predefined collation, and that should be used. If you're in a query referring to rowvar.field and rowvar has a type, that's what should be used. (No doubt you have some corner cases in mind?) As for operators, they should behave like functions wherever possible; otherwise we're just introducing unnecessary differences. The cast case is related to the point below. ** What to do with domains whose declaration includes a COLLATE clause? Currently, we'll impute that collation to the result of a cast to the domain type --- even if the cast's input expression includes an explicit COLLATE clause. It's not clear that that's per spec. If it is correct, should we behave similarly for functions that are declared to return a domain type? Should it matter if the cast-to-domain is explicit or implicit?
Perhaps it'd be best if domain collations only mattered for columns declared with that domain type. Then we'd have a general rule that collations only come into play in an expression as a result of (a) the declared type of a column reference or (b) an explicit COLLATE clause. The SQL spec considers the collation to be part of the datatype, so if you're casting to a domain (or type) you get the collation associated with that domain (or type). As per the spec: The collation derivation of a declared type with a declared type collation that is explicitly or implicitly specified by a data type is implicit. So the result of a cast would have the collation of the specified type/domain, with implicit state. Also, apparently the COLLATE clause is allowed anywhere a datatype is permitted. So you can say: CAST( foo AS TEXT COLLATE en_US ) Not sure if that works now. The result would have implicit state, as opposed to when the COLLATE clause appears elsewhere. Incidentally, a function returning a domain seems weird to me. What does it mean: (1) the function returns this type, and Postgres assumes this is true, or (2) the function returns something, and Postgres does an implicit cast? In any case, I'd suggest it be treated as being included in the resolution of the return collation along with the arguments, so if the result is a domain and you apply the normal rules you get: (1) explicit states in the arguments will override it (2) if arguments have implicit state and conflict with the domain, the result is no-collation; otherwise implicitly whatever the domain was (3) no arguments have collation, which means you get the domain default. Which all seems eminently reasonable. So I'd agree with your rules, but add a case (c): the result of a cast. * In plpgsql, is it OK for declared local variables to inherit the function's input collation? Should we provide a COLLATE option in variable declarations to let that be overridden? If Oracle understands COLLATE, we should probably look at what they do in PL/SQL.
If COLLATE is allowed anywhere the datatype is allowed, then the COLLATE clause should be permitted there. Otherwise the variables become the specified type with whatever the default is for that type. In expressions, the coercible-default state will get overridden by the IMPLICIT state from the arguments as appropriate. I note I'm using the term coercible default here because that's what Transact-SQL calls the state of any variable or value that's not a column reference. I've just checked and don't see any support for it in the SQL standard, though it seems to me extremely useful, since it allows column references to override literals. * RI triggers should insert COLLATE clauses in generated queries to satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the referenced column's collation. Right now you may get either table's collation depending on which query type is involved. I think an obvious failure may not be possible so long as
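The collation-combining rules (1)-(3) proposed earlier in this message can be sketched as a tiny resolver. The following is a hedged illustration in Python of the combining logic only; the function name and the (collation, state) representation are mine, not anything from the spec or from PostgreSQL:

```python
# Toy model of the collation-derivation combining rules sketched above.
# Each operand carries a (collation, state) pair, state being "explicit",
# "implicit", or "none". All names here are hypothetical.

def combine_collations(operands, domain_default=None):
    """operands: list of (collation, state) pairs for the arguments.
    domain_default: collation of a declared return domain, if any.
    Returns the result's (collation, state)."""
    explicit = {c for c, s in operands if s == "explicit"}
    if explicit:
        if len(explicit) > 1:
            raise ValueError("conflicting explicit collations")
        return (explicit.pop(), "explicit")   # rule 1: explicit wins

    implicit = {c for c, s in operands if s == "implicit"}
    if domain_default is not None:
        implicit.add(domain_default)          # the domain participates implicitly
    if len(implicit) > 1:
        return (None, "none")                 # rule 2: implicit conflict
    if implicit:
        return (implicit.pop(), "implicit")   # rule 3 falls out: domain default
    return (None, "none")
```

So an explicit COLLATE on any argument overrides the domain, conflicting implicit collations cancel out to no-collation, and with no collated arguments the domain default survives.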
Re: [HACKERS] Lock problem with autovacuum truncating heap
On Sun, Mar 27, 2011 at 01:12, Simon Riggs si...@2ndquadrant.com wrote: At the same time I would change count_nondeletable_pages() so that it uses a forward scan direction (if that leads to a speedup). +1. Do we need that? Linux readahead works in both directions doesn't it? Guess it wouldn't hurt too much. Yes, probably. AFAIK, RHEL 5 cannot do readahead for backward scans. It might be improved in the latest kernels, but it would be safer not to rely on the kernel for anything except simple forward scans. -- Itagaki Takahiro
Re: [HACKERS] race condition in sync rep
On Sat, Mar 26, 2011 at 12:41 PM, Simon Riggs si...@2ndquadrant.com wrote: Is it? Sync rep requires fsync on the standby. If you then explicitly turn off fsync on the standby then it has a performance impact, as documented. Actually, it doesn't, now that you fixed this. Before:

[rhaas@office ~]$ pgbench -T 10
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 25
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 27
tps = 0.099386 (including connections establishing)
tps = 0.099389 (excluding connections establishing)

[rhaas@office ~]$ pgbench -T 10
starting vacuum...end.
transaction type: TPC-B (sort of)
scaling factor: 25
query mode: simple
number of clients: 1
number of threads: 1
duration: 10 s
number of transactions actually processed: 425
tps = 42.442185 (including connections establishing)
tps = 42.468972 (excluding connections establishing)

The first one - run with code from a few weeks ago - hung up on the 27th transaction and was stuck there until the next checkpoint completed. The second one was run with the latest code and no longer hangs - and in fact it's now faster than running with fsync=on, exactly as one would expect. Or at least as *I* expected. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] race condition in sync rep
Simon Riggs si...@2ndquadrant.com writes: On Sat, Mar 26, 2011 at 4:24 PM, Tom Lane t...@sss.pgh.pa.us wrote: Well, in that case, it should be on the open-items list. If the system is still behaving that way, it's a bug. Is it? Sync rep requires fsync on the standby. If you then explicitly turn off fsync on the standby then it has a performance impact, as documented. No, that's utter nonsense. If the system is behaving that way, then it's a bug. If you don't think it's a bug, then you misunderstand what the fsync GUC is supposed to do. What fsync=off is supposed to do is cause every attempted fsync to succeed instantly. It is *not possible* for that to result in a performance slowdown compared to fsyncs that take nonzero time. Unless someone's broken it. The documented behavior is broken. Period. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] 9.1 Beta
On Sat, Mar 26, 2011 at 11:46 AM, Tom Lane t...@sss.pgh.pa.us wrote: Greg Stark gsst...@mit.edu writes: There's not much point in releasing a beta with behaviour that we know we intend to change. All it will do is elicit bug reports that we have to respond to saying we know, we were going to change that anyways. I think the goal of a beta is to be able to say we think this is the final behaviour of the next release but we're open to feedback. Yeah, I think this is a productive way to approach the question. I would put on a couple of extra conditions, though. Something like this: * No open issues that are expected to result in user-visible behavior changes. (Or at least significant changes? But then we have to argue about what's significant --- for instance, are the questions in the nearby collations-issues thread significant enough to be beta blockers?) * No open issues that are expected to result in a catversion bump. (With pg_upgrade, this is not as critical as it used to be, but I still think catalog stability is a good indicator of a release's maturity) * No known data-loss-causing bugs (duh) Comments? Any other quality criteria we should have for beta? I think all of these things are pretty subjective, but I broadly agree with the way you've set it out here. Simon is right that it's sometimes reasonable to ship the code as it is and see what feedback we get. But there's a countervailing effect, too: once we ship the code, then people get used to the way it works, and people don't want to change it, even if it's not what they would have picked initially. Magnus mentioned that he was going to upgrade some machine somewhere to alpha5 once it was out. When my jaw fell off he assured me it wasn't a critical system, but still: some people upgrade very early, and once we declare beta, they're going to expect that we aren't going to change too much. 
And even if they had no such expectation, we don't WANT to change too much, because then we've got to allow more time for beta-testing of the new stuff. It's much easier and much less work to get it right the first time. All that having been said, I think this whole thing may be a tempest in a teapot. Regardless of what the exact criteria are for beta, I think we're getting reasonably close to meeting them. Shaking out the bugs in the collation stuff has taken longer than you originally predicted, but it seems like we're homing in on it; and sync rep has gone from a long list of open items (most of which were reported by Fujii Masao, whose efforts I at least very much appreciate) to a much shorter one. A couple of the other issues are in fact longstanding bugs rather than new regressions in 9.1, and therefore can't be viewed as beta blockers. I see no reason we can't finish the remaining items in the next couple of weeks, barring a sudden influx of new bug reports - especially if a few more people pitch in and help move things forward. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Lock problem with autovacuum truncating heap
On Mar 26, 2011, at 1:44 PM, Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Sun, Mar 27, 2011 at 01:12, Simon Riggs si...@2ndquadrant.com wrote: At the same time I would change count_nondeletable_pages() so that it uses a forward scan direction (if that leads to a speedup). +1. Hmm. That would speed up truncations that are large relative to the table size, but slow down small truncations. And small truncations are likely to be more common than big ones. Maybe we could do a mix... back up 16MB and scan forward; if all those pages are empty then back up 16MB from the start point and scan forward from there. Or whatever we think the right chunk size is to get some benefit from kernel readahead without making the truncate 1 block case slow. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Lock problem with autovacuum truncating heap
On 3/26/2011 12:12 PM, Simon Riggs wrote: On Sat, Mar 26, 2011 at 2:30 PM, Jan Wieck janwi...@yahoo.com wrote: My current idea for a fix is to modify lazy_truncate_heap(). It does acquire and release the exclusive lock, so it should be possible to do this in smaller chunks, releasing and reacquiring the lock so that client transactions can get their work done as well. Agreed, presumably with vacuum delay in there as well? Not sure about that. My theory is that unless somebody needs access to that table, just have at it like it is now. The current implementation seems to assume that the blocks, checked for being empty, are still found in memory (vacuum just scanned them). And that seems to be correct most of the time, in which case adding vacuum delay only gives the blocks more time to get evicted and have to be read back in. At the same time I would change count_nondeletable_pages() so that it uses a forward scan direction (if that leads to a speedup). Do we need that? Linux readahead works in both directions doesn't it? Guess it wouldn't hurt too much. BTW does it read the blocks at that point using a buffer strategy? Is reading a file backwards in 8K blocks actually an access pattern that may confuse buffer strategies? I don't know. I also don't know if what I am suggesting is much better. If you think about it, I merely suggested trying the same access pattern with larger chunks. We need to run some tests to find out. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
Re: [HACKERS] Lock problem with autovacuum truncating heap
On 3/26/2011 3:17 PM, Robert Haas wrote: On Mar 26, 2011, at 1:44 PM, Itagaki Takahiro itagaki.takah...@gmail.com wrote: On Sun, Mar 27, 2011 at 01:12, Simon Riggs si...@2ndquadrant.com wrote: At the same time I would change count_nondeletable_pages() so that it uses a forward scan direction (if that leads to a speedup). +1. Hmm. That would speed up truncations that are large relative to the table size, but slow down small truncations. And small truncations are likely to be more common than big ones. For small truncations the blocks to check are most likely found in memory (shared or OS buffer) anyway, in which case the access pattern should be rather irrelevant. Maybe we could do a mix... back up 16MB and scan forward; if all those pages are empty then back up 16MB from the start point and scan forward from there. Or whatever we think the right chunk size is to get some benefit from kernel readahead without making the truncate 1 block case slow. That was what I meant. Go in steps of 16-64MB backwards and scan from there to the current end in forward direction to find a nondeletable block. In between these steps, release and reacquire the exclusive lock so that client transactions can get their work done. Jan -- Anyone who trades liberty for security deserves neither liberty nor security. -- Benjamin Franklin
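The chunked scan Jan describes might look roughly like this - a toy Python model of the stepping logic only, not the actual count_nondeletable_pages() code; CHUNK stands in for the 16-64MB step and block contents are reduced to empty/nonempty flags:

```python
# Toy model of the proposed chunked truncation scan. Blocks are reduced to
# booleans: True = empty/deletable, False = nondeletable. CHUNK is a
# stand-in for the 16-64MB step discussed above.
CHUNK = 4

def new_rel_end(blocks):
    """Return how many blocks to keep: step backwards a chunk at a time,
    scanning each chunk in forward direction (readahead-friendly)."""
    end = len(blocks)
    while end > 0:
        start = max(0, end - CHUNK)
        # (here the real code would release and reacquire the exclusive
        #  lock, letting blocked client transactions make progress)
        nonempty = [i for i in range(start, end) if not blocks[i]]  # forward scan
        if nonempty:
            return nonempty[-1] + 1   # keep up to the last nondeletable block
        end = start                   # whole chunk was empty: step back further
    return 0                          # relation is entirely empty
```

The truncate-one-block case still only touches one chunk, while a large empty tail is walked in forward-scanned chunks with lock-release points in between.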
Re: [HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Pavel Stehule wrote: 2011/3/26 Darren Duncan dar...@darrenduncan.net: I mention 2 possible solutions here, both which involve syntax alterations, each between the -- lines. I personally like the second/lower option more. -1 this is not based on any pattern on SQL. It's not simple, and it introduce a reserved keywords Okay, here's a much simpler proposal with the most important bit of the old one. 1. In all situations where there is ambiguity such that an identifier reference (not declaration) may be referring to either a lexical variable/parameter of the current routine, or to the name of the table column of the contextually current table of the current SQL statement, the ambiguity is always resolved in favor of the lexical var/param. If I am not mistaken, that is what PL/PgSQL already does since 9.0. 2. If an identifier reference has a leading . then that will force it to be interpreted as a column instead (and the code will fail if there is no such column), and so .colname is a shorthand for tablename.colname; but like with the old colname it only works when just 1 of the source tables has colname else it is still ambiguous like before. Example: select (.mycol + myvar * myparam) as mynewcol from mytbl; This solution is a very terse and understandable change. There are no reserved keywords. Legacy user code has no change where there were no conflicts before. Legacy user code has no change in the case of conflict if it was previously resolved to favor the lexical var/param. Legacy user code only gains a leading . in the few places where conflict was resolved in favor of a column name before where a same-named lexical/param existed. So what's not to like about this? -- Darren Duncan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
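Darren's two rules can be mocked up as a toy resolver - a hypothetical Python sketch of the lookup order only, with invented names throughout:

```python
# Toy resolver for the two rules proposed above: a bare identifier prefers
# the lexical var/param (rule 1); a leading "." forces column resolution
# (rule 2). All names are illustrative, not anything in PostgreSQL.

def resolve(name, params, columns):
    """params: names of lexical vars/params; columns: column names visible
    from the statement's source tables. Returns (kind, name)."""
    if name.startswith("."):
        col = name[1:]
        if col not in columns:
            raise LookupError("no such column: " + col)
        return ("column", col)        # rule 2: forced column reference
    if name in params:
        return ("param", name)        # rule 1: lexical var/param wins
    if name in columns:
        return ("column", name)
    raise LookupError("unknown identifier: " + name)
```

Under this model, `mycol` with no same-named parameter still resolves to the column, so legacy code without conflicts is unaffected; only conflicting names need the leading dot.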
Re: [HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
2011/3/26 Darren Duncan dar...@darrenduncan.net: Pavel Stehule wrote: 2011/3/26 Darren Duncan dar...@darrenduncan.net: I mention 2 possible solutions here, both of which involve syntax alterations, each between the -- lines. I personally like the second/lower option more. -1 this is not based on any pattern in SQL. It's not simple, and it introduces reserved keywords Okay, here's a much simpler proposal with the most important bit of the old one. 1. In all situations where there is ambiguity such that an identifier reference (not declaration) may be referring to either a lexical variable/parameter of the current routine, or to the name of the table column of the contextually current table of the current SQL statement, the ambiguity is always resolved in favor of the lexical var/param. If I am not mistaken, that is what PL/PgSQL already does since 9.0. 2. If an identifier reference has a leading . then that will force it to be interpreted as a column instead (and the code will fail if there is no such column), and so .colname is a shorthand for tablename.colname; but like with the old colname it only works when just 1 of the source tables has colname, else it is still ambiguous like before. Example: select (.mycol + myvar * myparam) as mynewcol from mytbl; This solution is a very terse and understandable change. There are no reserved keywords. Legacy user code has no change where there were no conflicts before. Legacy user code has no change in the case of conflict if it was previously resolved to favor the lexical var/param. Legacy user code only gains a leading . in the few places where conflict was resolved in favor of a column name before where a same-named lexical/param existed. So what's not to like about this? Sorry - I dislike this. The design is correct, but it works against SQL's verbosity: a reader must think about the missing table names.
I don't think it is a good solution, because it doesn't solve the backward-compatibility problem - somebody must still fix the function, and I think it is much preferable to fix it like: select (mytbl.mycol + myvar * myparam) as mynewcol from mytbl; Your proposal saves five characters, but it has a negative impact on readability - there may be more tables involved. There is no reason to introduce a new concept - SQL already has aliases. Regards Pavel
Re: [HACKERS] resolving SQL ambiguity (was Re: WIP: Allow SQL-lang funcs to ref params by param name)
Pavel Stehule wrote: Sorry - I dislike this. The design is correct, but it works against SQL's verbosity: a reader must think about the missing table names. I don't think it is a good solution, because it doesn't solve the backward-compatibility problem - somebody must still fix the function, and I think it is much preferable to fix it like: select (mytbl.mycol + myvar * myparam) as mynewcol from mytbl; Your proposal saves five characters, but it has a negative impact on readability - there may be more tables involved. There is no reason to introduce a new concept - SQL already has aliases. Well, going forward, I know I would much rather have to say mytbl.mycol than have to say myfunc.myparam. And I certainly would want to expect that when one says ... as foo, this foo is treated unambiguously as a declaration and is never substituted for some parameter, without the other grief I seem to recall having in 8.4. -- Darren Duncan
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
Joshua Berkus j...@agliodbs.com writes: Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw error for ambiguous references, period. As a likely heavy user of this feature, I agree with Tom here. I really don't want the column being silently preferred in SQL functions, when PL/pgSQL functions are throwing an error. I'd end up spending hours debugging this. +1 I think the best choice is to only accept qualified parameter names in SQL functions (function_name.parameter_name). If a referenced table shares the function's name, ERROR out and HINT to alias the table name. If we allow more than that, we're opening the door to ambiguity, bug reports, and, more than that, costly migrations. I don't see any benefit in having to audit all SQL functions for ambiguity on a flag day, when this could be avoided easily. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
2011/3/26 Dimitri Fontaine dimi...@2ndquadrant.fr: Joshua Berkus j...@agliodbs.com writes: Personally I'd vote for *not* having any such dangerous semantics as that. We should have learned better by now from plpgsql experience. I think the best idea is to throw error for ambiguous references, period. As a likely heavy user of this feature, I agree with Tom here. I really don't want the column being silently preferred in SQL functions, when PL/pgSQL functions are throwing an error. I'd end up spending hours debugging this. +1 I think the best choice is to only accept qualified parameter names in SQL functions (function_name.parameter_name). If a referenced table shares the function's name, ERROR out and HINT to alias the table name. That's maybe too strict. I agree that we should use a function_name alias when a collision is possible. Still, there are more use cases where a SQL function is used as a macro, and there an alias isn't necessary: CREATE OR REPLACE FUNCTION greatest(VARIADIC vals anyarray) RETURNS anyelement AS $$ SELECT max(v) FROM unnest(vals) v $$ LANGUAGE sql; Regards Pavel If we allow more than that, we're opening the door to ambiguity, bug reports, and, worse, costly migrations. I don't see any benefit in having to audit all SQL functions for ambiguity on a flag day, when this could be avoided easily. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] SSI bug?
On Fri, Mar 25, 2011 at 04:06:30PM -0400, Tom Lane wrote: Up to now, I believe the lockmgr's lock table is the only shared hash table that is expected to grow past the declared size; that can happen anytime a session exceeds max_locks_per_transaction, which we consider to be only a soft limit. I don't know whether SSI has introduced any other hash tables that behave similarly. (If it has, we might want to rethink the amount of slop space we leave in shmem...) I looked into this recently and the two lockmgr tables were indeed the only ones that could vary in size. IIRC, the only other shared hash tables were the shared buffer index and the shmem index. SSI adds two more analogous tables (per-lock-target and per-xact-lock), both of which are sized according to max_pred_locks_per_transaction, which is also a soft limit. It also adds a per-transaction shared hash table, but that has a clear maximum size. I find the soft limit on htab size a strange model, and I suspect it might be a source of problems now that we've got more than one table that can actually exceed its limit. (Particularly so given that once shmem gets allocated to one htab, there's no getting it back.) Dan -- Dan R. K. Ports MIT CSAIL http://drkp.net/
Re: [HACKERS] 2nd Level Buffer Cache
On Fri, Mar 25, 2011 at 8:07 AM, Gurjeet Singh singh.gurj...@gmail.com wrote: On Tue, Mar 22, 2011 at 3:53 PM, Robert Haas robertmh...@gmail.com wrote: On Tue, Mar 22, 2011 at 11:24 AM, Jeff Janes jeff.ja...@gmail.com wrote: On Fri, Mar 18, 2011 at 9:19 AM, Robert Haas robertmh...@gmail.com wrote: A related area that could use some looking at is why performance tops out at shared_buffers ~8GB and starts to fall thereafter. Under what circumstances does this happen? Can a simple pgbench -S with a large scaling factor elicit this behavior? To be honest, I'm mostly just reporting what I've heard Greg Smith say on this topic. I don't have any machine with that kind of RAM. I can sponsor a few hours (say 10) of one High-memory on-demand Quadruple Extra Large instance (26 EC2 Compute Units (8 virtual cores with 3.25 EC2 Compute Units each), 1690 GB of local instance storage, 64-bit platform). That's the largest memory AWS has. Does AWS have machines with battery-backed write cache? I think people running servers with 192G probably have BBWC, so it may be hard to do realistic tests without also having one on the test machine. But probably a bigger problem is that (to the best of my knowledge) we don't seem to have a non-proprietary, generally implementable benchmark system or load-generator which is known to demonstrate the problem. Cheers, Jeff
Re: [HACKERS] WIP: Allow SQL-language functions to reference parameters by parameter name
On Sat, Mar 26, 2011 at 5:19 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: I think the best choice is to only accept qualified parameter names in SQL functions (function_name.parameter_name). If a referenced table shares the function's name, ERROR out and HINT to alias the table name. If we allow more than that, we're opening the door to ambiguity, bug reports, and, worse, costly migrations. I don't see any benefit in having to audit all SQL functions for ambiguity on a flag day, when this could be avoided easily. That syntax is sufficiently unwieldy that few people will want to use it in real life, but certainly the backward compatibility problem is much less than with what Tom proposed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] alpha5
Per previous discussion, I'm going to wrap alpha5 Monday morning Eastern time, barring objections. This time, I'm going to try to make sure that announcements actually go out; I dropped the ball on that last time. If anyone wants to push time zone updates, translation updates, etc., I suppose now's the time. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] psql \dt and table size
On Wed, Mar 23, 2011 at 4:33 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of mié mar 23 17:24:59 -0300 2011: On Mon, Mar 21, 2011 at 1:44 PM, Bernd Helmle maili...@oopsware.de wrote: It struck me today again that \dt+ isn't displaying the accurate table size for tables, since it uses pg_relation_size() till now. With pg_table_size() available since PostgreSQL 9.0, I believe it would be more useful to have the total acquired storage displayed, including implicit objects (the mentioned case where it was not very useful atm was a table with a big TOAST table). I guess the threshold question for this patch is whether pg_table_size() is a more accurate table size or just a different one. Not including the toast table and index in the size is just plain wrong. Reporting the size without the toast objects is an implementation artifact that should not be done unless explicitly requested. It sounds like everyone is in agreement that we should go ahead and commit this patch, so I'll go do that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
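[Editorial note: the difference being debated is easy to see side by side; mytab is a hypothetical table name, and the three size functions shown are the ones available as of 9.0:

```sql
-- pg_relation_size:       main fork only (no TOAST, no indexes)
-- pg_table_size:          table plus its TOAST table and maps, no indexes
-- pg_total_relation_size: everything, indexes included
SELECT pg_relation_size('mytab')       AS heap_only,
       pg_table_size('mytab')          AS with_toast,
       pg_total_relation_size('mytab') AS with_indexes;
```

For a table with a large TOAST table, heap_only can be a small fraction of with_toast, which is Bernd's complaint about \dt+.]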
Re: [HACKERS] psql \dt and table size
On Sat, Mar 26, 2011 at 9:42 PM, Robert Haas robertmh...@gmail.com wrote: On Wed, Mar 23, 2011 at 4:33 PM, Alvaro Herrera alvhe...@commandprompt.com wrote: Excerpts from Robert Haas's message of mié mar 23 17:24:59 -0300 2011: On Mon, Mar 21, 2011 at 1:44 PM, Bernd Helmle maili...@oopsware.de wrote: It struck me today again that \dt+ isn't displaying the accurate table size for tables, since it uses pg_relation_size() till now. With pg_table_size() available since PostgreSQL 9.0, I believe it would be more useful to have the total acquired storage displayed, including implicit objects (the mentioned case where it was not very useful atm was a table with a big TOAST table). I guess the threshold question for this patch is whether pg_table_size() is a more accurate table size or just a different one. Not including the toast table and index in the size is just plain wrong. Reporting the size without the toast objects is an implementation artifact that should not be done unless explicitly requested. It sounds like everyone is in agreement that we should go ahead and commit this patch, so I'll go do that. Err, wait a minute. This can't be quite right: showTables isn't mutually exclusive with other options; we don't want to display the size using pg_relation_size() when someone says: \dts and pg_table_size() when they say: \dt and pg_relation_size() when they say: \ds But I think we can just call pg_table_size() regardless in 9.0+; I believe it'll return the same results as pg_relation_size() on non-tables. Anyone see a problem with that? Also, for clarity, the 9.0+ code should go first, like this: if (pset.sversion >= 90000) { /* do stuff */ } else if (pset.sversion >= 80100) { /* do different stuff */ } -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Typed-tables patch broke pg_upgrade
On Fri, Mar 11, 2011 at 8:28 AM, Bruce Momjian br...@momjian.us wrote: Is this still an open bug? Is anyone working on fixing this? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
[HACKERS] DO hint update?
Does this need updating now that languages are more extensiony? Or is it OK? if (!HeapTupleIsValid(languageTuple)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_OBJECT), errmsg("language \"%s\" does not exist", languageName), (PLTemplateExists(languageName) ? errhint("Use CREATE LANGUAGE to load the language into the database.") : 0))); -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Performance bug in DO blocks
On Tue, Jan 18, 2011 at 1:46 PM, Tom Lane t...@sss.pgh.pa.us wrote: I just noticed that if you execute the same DO command over and over within a session, it gets slower and slower. And if you keep it up you'll notice the backend's RAM consumption bloating too. The cause appears to be that we leak the cached plans created for any SQL statements or expressions within the DO command --- the next iteration won't reuse those, but rather create its own set. Probably ought to look into releasing those when the DO block is over. Should we try to do something about this? I don't really understand what's going on here. I thought maybe the problem was that the inline handler was getting called with TopMemoryContext active, but it's not. It's getting called with the PortalHeapMemory context active, so anything that is allocated without first specifying the context ought to get cleaned up at end-of-statement. So somewhere we're deliberately copying something into a longer-lived memory context where we shouldn't be. Your comment about cached plans got me looking at CreateCachedPlan(), which does this; it's called from SPI_saveplan(), which is called from exec_prepare_plan(). But that's immediately followed by SPI_freeplan(), so I'm all tapped out. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
On Mon, Jan 3, 2011 at 6:55 AM, Magnus Hagander mag...@hagander.net wrote: ISTM that the correct fix is to increment the protocol version number to 3.1 and send PGRES_COPY_OUT if the client requests version 3.0. That's what the version numbers are for, no? In a way - yes. I assume we didn't do that because it's considered internal. It still won't help in my situation though - I need to know what version of the libpq headers I have in order to even be able to *compile* the program. At runtime, I could check against the server version, and get around it. This is listed on the open items list as raise protocol version number, but the above discussion suggests both that this might be unnecessary and that it might not solve Magnus's problem anyway. What do we want to do here? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] DO hint update?
Robert Haas robertmh...@gmail.com writes: Does this need updating now that languages are more extensiony? Or is it OK? if (!HeapTupleIsValid(languageTuple)) ereport(ERROR, (errcode(ERRCODE_UNDEFINED_OBJECT), errmsg("language \"%s\" does not exist", languageName), (PLTemplateExists(languageName) ? errhint("Use CREATE LANGUAGE to load the language into the database.") : 0))); Hm ... the core languages would now prefer CREATE EXTENSION, but it's not clear how fast non-core PLs will follow suit. Perhaps "Use CREATE EXTENSION or CREATE LANGUAGE to load ..."? regards, tom lane
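[Editorial sketch of the two installation styles Tom's proposed hint would cover; plperl is just an example language name here:

```sql
-- 9.1-style: for PLs packaged as extensions (the core languages)
CREATE EXTENSION plperl;

-- pre-9.1 style: still needed for languages without an extension wrapper
CREATE LANGUAGE plperl;
```

The hint has to mention both because, at the time of this thread, only the core PLs had been converted to extensions.]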
Re: [HACKERS] Libpq PGRES_COPY_BOTH - version compatibility
Robert Haas robertmh...@gmail.com writes: On Mon, Jan 3, 2011 at 6:55 AM, Magnus Hagander mag...@hagander.net wrote: ISTM that the correct fix is to increment the protocol version number to 3.1 and send PGRES_COPY_OUT if the client requests version 3.0. That's what the version numbers are for, no? It still won't help in my situation though - I need to know what version of the libpq headers I have in order to even be able to *compile* the program. At runtime, I could check against the server version, and get around it. This is listed on the open items list as raise protocol version number, but the above discussion suggests both that this might be unnecessary and that it might not solve Magnus's problem anyway. What do we want to do here? I'm not in favor of bumping the protocol version number for this. Magnus is correct that that'd be the right thing to do in an abstract sense; but we haven't bumped the protocol version number since 7.4, and so I have no faith that clients will behave sensibly. I think we'd be much more likely to break things than to accomplish anything useful. Given the small fraction of client programs that will care, and the fact that a protocol bump doesn't fix the whole issue anyway, working around it on the client side seems much the best compromise. regards, tom lane