Re: [HACKERS] Variadic aggregates vs. project policy
Pavel Stehule pavel.steh...@gmail.com writes: I was one who sent a bug report - this error is not too dangerous, but it is hidden, and difficult to find, if you don't know what can be happen. Same as bug with plpgsql and SQL identifier collisions. If you understand, then you can protect self well and simply. If not, then it is a magic error. So still I am thing so best solution is a) a warning when detect ORDER BY in variadic aggregates Such a warning would never be tolerated by users, because it would appear even when the query is perfectly correct. b) disallow ORDER BY in variadic aggregates in classic syntax, and enable it only in WITHIN GROUP syntax where is safe , And we're *not* inventing randomly different syntax for variadic aggregates. That ship sailed when we did it this way for regular functions. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
Tom Lane-2 wrote Pavel Stehule lt; pavel.stehule@ gt; writes: I was one who sent a bug report - this error is not too dangerous, but it is hidden, and difficult to find, if you don't know what can be happen. Same as bug with plpgsql and SQL identifier collisions. If you understand, then you can protect self well and simply. If not, then it is a magic error. So still I am thing so best solution is a) a warning when detect ORDER BY in variadic aggregates Such a warning would never be tolerated by users, because it would appear even when the query is perfectly correct. b) disallow ORDER BY in variadic aggregates in classic syntax, and enable it only in WITHIN GROUP syntax where is safe , And we're *not* inventing randomly different syntax for variadic aggregates. That ship sailed when we did it this way for regular functions. In the example case the problem is that ORDER BY constant is a valid, if not-very-useful, construct. Can we warn on this specific usage and thus mitigate many of the potential avenues of mis-use? If we alter syntax for mitigation purposes I'd want to consider requiring parentheses around the columns that belong to the ORDER BY instead of using the full extended syntax of WITHIN GROUP. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769106.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
On 2013-08-30 06:34:47 -0700, David Johnston wrote: Tom Lane-2 wrote I was one who sent a bug report - this error is not too dangerous, but it is hidden, and difficult to find, if you don't know what can be happen. Same as bug with plpgsql and SQL identifier collisions. If you understand, then you can protect self well and simply. If not, then it is a magic error. So still I am thing so best solution is a) a warning when detect ORDER BY in variadic aggregates Such a warning would never be tolerated by users, because it would appear even when the query is perfectly correct. b) disallow ORDER BY in variadic aggregates in classic syntax, and enable it only in WITHIN GROUP syntax where is safe , And we're *not* inventing randomly different syntax for variadic aggregates. That ship sailed when we did it this way for regular functions. In the example case the problem is that ORDER BY constant is a valid, if not-very-useful, construct. Can we warn on this specific usage and thus mitigate many of the potential avenues of mis-use? That doesn't help against something like »SELECT string_agg(somecol ORDER BY bar, separator)« where separator is a column. If we alter syntax for mitigation purposes I'd want to consider requiring parentheses around the columns that belong to the ORDER BY instead of using the full extended syntax of WITHIN GROUP. I think that ship has sailed. The syntax is there and it's not going away. Requiring different syntaxes for variadic/nonvariadic usages is going to be a way much bigger pitfall for users. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
David Johnston pol...@yahoo.com writes: If we alter syntax for mitigation purposes I'd want to consider requiring parentheses around the columns that belong to the ORDER BY instead of using the full extended syntax of WITHIN GROUP. Unfortunately, that ORDER BY syntax is specified by the SQL standard, and they didn't say anything about parentheses. We don't get to require parens there. The particular case that's standardized is only array_agg(): array aggregate function ::= ARRAY_AGG left paren value expression [ ORDER BY sort specification list ] right paren but, as we customarily do, we didn't restrict the feature to be used only with that aggregate. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
Andres Freund-3 wrote On 2013-08-30 06:34:47 -0700, David Johnston wrote: Tom Lane-2 wrote I was one who sent a bug report - this error is not too dangerous, but it is hidden, and difficult to find, if you don't know what can be happen. Same as bug with plpgsql and SQL identifier collisions. If you understand, then you can protect self well and simply. If not, then it is a magic error. So still I am thing so best solution is a) a warning when detect ORDER BY in variadic aggregates Such a warning would never be tolerated by users, because it would appear even when the query is perfectly correct. b) disallow ORDER BY in variadic aggregates in classic syntax, and enable it only in WITHIN GROUP syntax where is safe , And we're *not* inventing randomly different syntax for variadic aggregates. That ship sailed when we did it this way for regular functions. In the example case the problem is that ORDER BY constant is a valid, if not-very-useful, construct. Can we warn on this specific usage and thus mitigate many of the potential avenues of mis-use? That doesn't help against something like »SELECT string_agg(somecol ORDER BY bar, separator)« where separator is a column. If we alter syntax for mitigation purposes I'd want to consider requiring parentheses around the columns that belong to the ORDER BY instead of using the full extended syntax of WITHIN GROUP. I think that ship has sailed. The syntax is there and it's not going away. Requiring different syntaxes for variadic/nonvariadic usages is going to be a way much bigger pitfall for users. Neither suggestion (nor any suggestion I would imagine) is going to solve the problem. The goal is to minimize the size of the exposure. For the second ORDER BY (col1, col2) suggestion it would be added and recommended so those using that syntax would have less to worry about. This would apply to ALL invocations, not just variadic. David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769119.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
2013/8/30 David Johnston pol...@yahoo.com Andres Freund-3 wrote On 2013-08-30 06:34:47 -0700, David Johnston wrote: Tom Lane-2 wrote I was one who sent a bug report - this error is not too dangerous, but it is hidden, and difficult to find, if you don't know what can be happen. Same as bug with plpgsql and SQL identifier collisions. If you understand, then you can protect self well and simply. If not, then it is a magic error. So still I am thing so best solution is a) a warning when detect ORDER BY in variadic aggregates Such a warning would never be tolerated by users, because it would appear even when the query is perfectly correct. b) disallow ORDER BY in variadic aggregates in classic syntax, and enable it only in WITHIN GROUP syntax where is safe , And we're *not* inventing randomly different syntax for variadic aggregates. That ship sailed when we did it this way for regular functions. In the example case the problem is that ORDER BY constant is a valid, if not-very-useful, construct. Can we warn on this specific usage and thus mitigate many of the potential avenues of mis-use? That doesn't help against something like »SELECT string_agg(somecol ORDER BY bar, separator)« where separator is a column. If we alter syntax for mitigation purposes I'd want to consider requiring parentheses around the columns that belong to the ORDER BY instead of using the full extended syntax of WITHIN GROUP. I think that ship has sailed. The syntax is there and it's not going away. Requiring different syntaxes for variadic/nonvariadic usages is going to be a way much bigger pitfall for users. Neither suggestion (nor any suggestion I would imagine) is going to solve the problem. The goal is to minimize the size of the exposure. For the second ORDER BY (col1, col2) suggestion it would be added and recommended so those using that syntax would have less to worry about. This would apply to ALL invocations, not just variadic. you cannot use this syntax - probably, because (col1, col2) is same as ROW(col1, col2) and this syntax is supported now. So there is a collision. I had a same idea, but I don't think so it is possible Regards Pavel David J. -- View this message in context: http://postgresql.1045698.n5.nabble.com/Variadic-aggregates-vs-project-policy-tp5768980p5769119.html Sent from the PostgreSQL - hackers mailing list archive at Nabble.com. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
Uh, the pg_dump part checks for version 80400, shouldn't it be 90400? -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
Alvaro Herrera alvhe...@2ndquadrant.com writes: Uh, the pg_dump part checks for version 80400, shouldn't it be 90400? The reasoning there is that 8.4 is where we added pg_get_function_arguments(), so this dumping code should work against that server version or later. (Oh, memo to self: test that.) It's true that pre-9.4 servers are not going to produce any useful extra info by using pg_get_function_arguments() on an aggregate; but the general idea of this patch is to make the aggregate-related code look more like the plain-function-related code, so using the same version cutoffs in both cases seemed to make sense. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
2013/8/29 Tom Lane t...@sss.pgh.pa.us So I was hacking away at supporting variadic aggregates (per an internal request at Salesforce), and had it pretty much working, when I came across this old comment in opr_sanity.sql: -- Check that there are not aggregates with the same name and different -- numbers of arguments. While not technically wrong, we have a project policy -- to avoid this because it opens the door for confusion in connection with -- ORDER BY: novices frequently put the ORDER BY in the wrong place. -- See the fate of the single-argument form of string_agg() for history. -- The only aggregates that should show up here are count(x) and count(*). While a variadic-using aggregate doesn't actually trip the associated test query, it surely violates the spirit of this policy: if you put ORDER BY in the wrong place the parser will be unable to detect that that wasn't what you meant. For context see the thread starting here: http://www.postgresql.org/message-id/aanlktikv5ok2ts8t6v+gsapte3n6tjq1jpphmzhg2...@mail.gmail.com In that thread we agreed that this policy might be rather squishy, but we should at least think hard about whether it would be wise to create built-in aggregates with the same name and different numbers of arguments. So the question I'm now wondering about is whether this consideration makes variadic aggregates a bad idea all around, even if we don't have any built-in ones. Is the risk of user confusion (in the use of their own aggregate) sufficient reason to reject such a feature? can be this issue solved by syntax? In September commitfest is patch for WITHIN GROUP where ORDER BY clause is safety separated from parameters. Regards Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
Pavel Stehule pavel.steh...@gmail.com writes: 2013/8/29 Tom Lane t...@sss.pgh.pa.us So the question I'm now wondering about is whether this consideration makes variadic aggregates a bad idea all around, even if we don't have any built-in ones. Is the risk of user confusion (in the use of their own aggregate) sufficient reason to reject such a feature? can be this issue solved by syntax? In September commitfest is patch for WITHIN GROUP where ORDER BY clause is safety separated from parameters. That might not be the ugliest syntax the SQL committee ever invented, but it's right up there. I don't want to go that way, especially not when the existing precedent for the same feature with regular functions doesn't use any weird special syntax. On further reflection, what the policy was actually about was not that we should forbid users from creating potentially-confusing aggregates themselves, but only that we'd avoid having any *built in* aggregates with this hazard. So maybe I'm overthinking this, and the correct reading is just that we should have a policy against built-in variadic aggregates. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
2013/8/29 Tom Lane t...@sss.pgh.pa.us Pavel Stehule pavel.steh...@gmail.com writes: 2013/8/29 Tom Lane t...@sss.pgh.pa.us So the question I'm now wondering about is whether this consideration makes variadic aggregates a bad idea all around, even if we don't have any built-in ones. Is the risk of user confusion (in the use of their own aggregate) sufficient reason to reject such a feature? can be this issue solved by syntax? In September commitfest is patch for WITHIN GROUP where ORDER BY clause is safety separated from parameters. That might not be the ugliest syntax the SQL committee ever invented, but it's right up there. I don't want to go that way, especially not when the existing precedent for the same feature with regular functions doesn't use any weird special syntax. It is maybe not nice, but it is long years supported by almost all SQL servers. When I talked with Atri, he mentioned, so variadic aggregates are supported there too. Regards Pavel On further reflection, what the policy was actually about was not that we should forbid users from creating potentially-confusing aggregates themselves, but only that we'd avoid having any *built in* aggregates with this hazard. So maybe I'm overthinking this, and the correct reading is just that we should have a policy against built-in variadic aggregates. can be this potentially strange situation identified? - and some warning can be raised. regards, tom lane
Re: [HACKERS] Variadic aggregates vs. project policy
Tom, On further reflection, what the policy was actually about was not that we should forbid users from creating potentially-confusing aggregates themselves, but only that we'd avoid having any *built in* aggregates with this hazard. So maybe I'm overthinking this, and the correct reading is just that we should have a policy against built-in variadic aggregates. Yes. I think we can assume that anyone smart enough to create a variadic aggregate is smart enough to put ORDER BY in the right place. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
Josh Berkus wrote: Tom, On further reflection, what the policy was actually about was not that we should forbid users from creating potentially-confusing aggregates themselves, but only that we'd avoid having any *built in* aggregates with this hazard. So maybe I'm overthinking this, and the correct reading is just that we should have a policy against built-in variadic aggregates. Yes. I think we can assume that anyone smart enough to create a variadic aggregate is smart enough to put ORDER BY in the right place. I don't think it's a question of smart enough, but rather how easy it is to make a mistake. In the referenced thread email, I'm sure many of us looked at the failing query without realizing what the problem was. That said, if the question is whether or not to offer variadic aggregates if they give you the chance to misuse them in this way, the decision seems quite clear to me. Presumably, in the problematic case the arguments to the aggregate are going to be constructed programatically in client code; how would an ORDER BY be accidentally inserted by such code? If you end up having someagg('foo','bar','baz' ORDER BY 'qux', 'blarg') instead of someagg('foo','bar','baz', 'blarg' ORDER BY 'qux') I think you're going to realize the problem quickly enough. -- Álvaro Herrerahttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
On 2013-08-29 15:55:13 -0400, Tom Lane wrote: So I was hacking away at supporting variadic aggregates (per an internal request at Salesforce), and had it pretty much working, when I came across this old comment in opr_sanity.sql: -- Check that there are not aggregates with the same name and different -- numbers of arguments. While not technically wrong, we have a project policy -- to avoid this because it opens the door for confusion in connection with -- ORDER BY: novices frequently put the ORDER BY in the wrong place. -- See the fate of the single-argument form of string_agg() for history. -- The only aggregates that should show up here are count(x) and count(*). While a variadic-using aggregate doesn't actually trip the associated test query, it surely violates the spirit of this policy: if you put ORDER BY in the wrong place the parser will be unable to detect that that wasn't what you meant. For context see the thread starting here: http://www.postgresql.org/message-id/aanlktikv5ok2ts8t6v+gsapte3n6tjq1jpphmzhg2...@mail.gmail.com In that thread we agreed that this policy might be rather squishy, but we should at least think hard about whether it would be wise to create built-in aggregates with the same name and different numbers of arguments. So the question I'm now wondering about is whether this consideration makes variadic aggregates a bad idea all around, even if we don't have any built-in ones. Is the risk of user confusion (in the use of their own aggregate) sufficient reason to reject such a feature? I vote for abolishing that policy or maybe weakinging it. As you comment somewhere downthread the policy just prohibits core functions, but even for those it looks too strong for me. There are some useful variadic aggregates I'd like to see and I don't think that the kind of errors prevented by the policy are frequent enough to warrant a blanket prohibition. I'd say we let the check in there but have a list of exceptions in it so that one has to explicitly think about the issue before adding the function. Some functions are worth the risk, others are not. E.g. the 1 argument form of string_agg() doesn't have enough benefits, but other functions might. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
Andres Freund and...@2ndquadrant.com writes: On 2013-08-29 15:55:13 -0400, Tom Lane wrote: For context see the thread starting here: http://www.postgresql.org/message-id/aanlktikv5ok2ts8t6v+gsapte3n6tjq1jpphmzhg2...@mail.gmail.com In that thread we agreed that this policy might be rather squishy, but we should at least think hard about whether it would be wise to create built-in aggregates with the same name and different numbers of arguments. I vote for abolishing that policy or maybe weakinging it. As you comment somewhere downthread the policy just prohibits core functions, but even for those it looks too strong for me. There are some useful variadic aggregates I'd like to see and I don't think that the kind of errors prevented by the policy are frequent enough to warrant a blanket prohibition. Well, I dunno. We had two different bug reports caused by this type of confusion before string_agg even got out of beta, both from intelligent people. So I'm not about to discount the potential for confusion. As we said originally, this is a policy that might be broken for sufficiently strong cause --- but I don't want to just forget about the risks. I'd say we let the check in there but have a list of exceptions in it so that one has to explicitly think about the issue before adding the function. That's pretty much how the tests in opr_sanity work now. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
On 2013-08-29 18:29:34 -0400, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On 2013-08-29 15:55:13 -0400, Tom Lane wrote: For context see the thread starting here: http://www.postgresql.org/message-id/aanlktikv5ok2ts8t6v+gsapte3n6tjq1jpphmzhg2...@mail.gmail.com In that thread we agreed that this policy might be rather squishy, but we should at least think hard about whether it would be wise to create built-in aggregates with the same name and different numbers of arguments. I vote for abolishing that policy or maybe weakinging it. As you comment somewhere downthread the policy just prohibits core functions, but even for those it looks too strong for me. There are some useful variadic aggregates I'd like to see and I don't think that the kind of errors prevented by the policy are frequent enough to warrant a blanket prohibition. Well, I dunno. We had two different bug reports caused by this type of confusion before string_agg even got out of beta, both from intelligent people. So I'm not about to discount the potential for confusion. As we said originally, this is a policy that might be broken for sufficiently strong cause --- but I don't want to just forget about the risks. I'd say we let the check in there but have a list of exceptions in it so that one has to explicitly think about the issue before adding the function. That's pretty much how the tests in opr_sanity work now. I basically mean that we should adapt the paragraph you quoted upthread to roughly say something like: -- Check that there are not aggregates with the same name and different -- numbers of arguments. For many aggregates - look for string_agg in -- the archives for an example - the risk of confusing novices, which -- place ORDER BY in the wrong place, seems too big. If an aggregate is -- deemed not to be likely to cause such a problem or provides a feature -- which doesn't seem possibly to provide in another way that, add an -- excception for it. Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
On 08/29/2013 05:37 PM, Josh Berkus wrote: Tom, On further reflection, what the policy was actually about was not that we should forbid users from creating potentially-confusing aggregates themselves, but only that we'd avoid having any *built in* aggregates with this hazard. So maybe I'm overthinking this, and the correct reading is just that we should have a policy against built-in variadic aggregates. Yes. I think we can assume that anyone smart enough to create a variadic aggregate is smart enough to put ORDER BY in the right place. It's not the creator who is in danger, though, it's the user of the aggregate function, AIUI. So unless you're saying that anyone smart enough to use a variadic aggregate can be assumed to be smart enough to put ORDER BY in the right place, I don't think this argument holds. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Variadic aggregates vs. project policy
2013/8/30 Andrew Dunstan and...@dunslane.net On 08/29/2013 05:37 PM, Josh Berkus wrote: Tom, On further reflection, what the policy was actually about was not that we should forbid users from creating potentially-confusing aggregates themselves, but only that we'd avoid having any *built in* aggregates with this hazard. So maybe I'm overthinking this, and the correct reading is just that we should have a policy against built-in variadic aggregates. Yes. I think we can assume that anyone smart enough to create a variadic aggregate is smart enough to put ORDER BY in the right place. It's not the creator who is in danger, though, it's the user of the aggregate function, AIUI. So unless you're saying that anyone smart enough to use a variadic aggregate can be assumed to be smart enough to put ORDER BY in the right place, I don't think this argument holds. I was one who sent a bug report - this error is not too dangerous, but it is hidden, and difficult to find, if you don't know what can be happen. Same as bug with plpgsql and SQL identifier collisions. If you understand, then you can protect self well and simply. If not, then it is a magic error. So still I am thing so best solution is a) a warning when detect ORDER BY in variadic aggregates b) disallow ORDER BY in variadic aggregates in classic syntax, and enable it only in WITHIN GROUP syntax where is safe , btw a aggregates that needs a ORDER BY is better to evaluate different for minimise risk of OOP killer. This is a good solution without any risks. Regards Pavel cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/**mailpref/pgsql-hackershttp://www.postgresql.org/mailpref/pgsql-hackers