Re: [HACKERS] Review: listagg aggregate
I noticed that the regression test results don't include the following case: select string_agg(a) from (values(null),(null)) g(a); There is one similar where a delimiter is provided. Which leads me to ask for clarification, would it be safe to assume that string_agg can never itself return null? Thom
Re: [HACKERS] Review: listagg aggregate
2010/2/1 Thom Brown thombr...@gmail.com: I noticed that the regression test results don't include the following case: select string_agg(a) from (values(null),(null)) g(a); There is one similar where a delimiter is provided. Which leads me to ask for clarification, would it be safe to assume that string_agg can never itself return null? if all values are null, then result is null. Pavel Thom -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On 1 February 2010 13:40, Pavel Stehule pavel.steh...@gmail.com wrote: 2010/2/1 Thom Brown thombr...@gmail.com: I noticed that the regression test results don't include the following case: select string_agg(a) from (values(null),(null)) g(a); There is one similar where a delimiter is provided. Which leads me to ask for clarification, would it be safe to assume that string_agg can never itself return null? if all values are null, then result is null. Pavel Ah, I was looking at the expected results, and couldn't see a NULL outcome, but then these aren't indicated in such results anyway. Thom
Re: [HACKERS] Review: listagg aggregate
On Sun, Jan 31, 2010 at 10:29 PM, Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote: Applied with some editorialization. Please check the docs are good enough. Looks pretty good. I have committed a couple very minor adjustments. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Robert Haas robertmh...@gmail.com wrote: ok - I am only one who like original behave - so I ?am withdrawing. Robert, If you like, please commit Itagaki's patch. + add note about behave when second parameter isn't stable. I haven't even looked at this code - I sort of assumed Itagaki was handling this one. But it might be good to make sure that the docs have been read through by a native English speaker prior to commit... Applied with some editorialization. Please check the docs are good enough. I removed a test pattern for variable delimiters from regression test because it might be an undefined behavior. We might change the behavior in the future, so we guarantee only constant delimiters. The transition functions are renamed to string_agg_transfn and string_agg_delim_transfn. We cannot use string_agg_transfn for both names because we cast the function name as regproc in bootstrap; it complains about duplicated function names. Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/2/1 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp: Robert Haas robertmh...@gmail.com wrote: ok - I am only one who like original behave - so I ?am withdrawing. Robert, If you like, please commit Itagaki's patch. + add note about behave when second parameter isn't stable. I haven't even looked at this code - I sort of assumed Itagaki was handling this one. But it might be good to make sure that the docs have been read through by a native English speaker prior to commit... Applied with some editorialization. Please check the docs are good enough. I removed a test pattern for variable delimiters from regression test because it might be an undefined behavior. We might change the behavior in the future, so we guarantee only constant delimiters. The transition functions are renamed to string_agg_transfn and string_agg_delim_transfn. We cannot use string_agg_transfn for both names because we cast the function name as regproc in bootstrap; it complains about duplicated function names. thank you very much Pavel Stehule Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Fri, Jan 29, 2010 at 2:43 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2010/1/28 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: with get_fn_expr_arg_stable() we are able to fix second parameter without some performance issues. No, that will create its own performance issues --- get_fn_expr_arg_stable isn't especially cheap. If there were a really strong reason why we had to do it, then I'd agree, but frankly the argument for disallowing a variable delimiter is too thin. it could be called only once. But I agree, so could be better, check it in parser time. ok - I am only one who like original behave - so I am withdrawing. Robert, If you like, please commit Itagaki's patch. + add note about behave when second parameter isn't stable. I haven't even looked at this code - I sort of assumed Itagaki was handling this one. But it might be good to make sure that the docs have been read through by a native English speaker prior to commit... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 29, 2010, at 10:43 AM, Robert Haas wrote: I haven't even looked at this code - I sort of assumed Itagaki was handling this one. But it might be good to make sure that the docs have been read through by a native English speaker prior to commit... I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The documentation for all the functions on that page could use a little love, frankly. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Fri, Jan 29, 2010 at 1:45 PM, David E. Wheeler da...@kineticode.com wrote: On Jan 29, 2010, at 10:43 AM, Robert Haas wrote: I haven't even looked at this code - I sort of assumed Itagaki was handling this one. But it might be good to make sure that the docs have been read through by a native English speaker prior to commit... I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The documentation for all the functions on that page could use a little love, frankly. Want to take a short at it? ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 29, 2010, at 10:46 AM, Robert Haas wrote: I did and revised them slightly. There isn't much, just a brief comment in the table of aggregate functions. The documentation for all the functions on that page could use a little love, frankly. Want to take a short at it? ENOTUITS! /me is already sorely over-committed… David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp: Pavel Stehule pavel.steh...@gmail.com wrote: with actualised oids I'm checking the patch for commit, and have a couple of comments. * I think we cannot cache the delimiter at the first call. For example, SELECT string_agg(elem, delim) FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim); should return 'A+B*C' rather than 'A,B,C'. no I dislike it. This using is nonsense. Regards Pavel * Can we use StringInfo directly as the aggregate context instead of StringAggState? For the first reason, we need to drop 'delimiter' field from struct StringAggState. Now it has only StringInfo field. * We'd better avoiding to call text_to_cstring() for delimitors and elements for performance reason. We can use appendBinaryStringInfo() here. My proposal patch attached. Also, I've not changed it yet, but it might be considerable: * Do we need better names for string_agg1_transfn and string_agg2_transfn? They are almost internal names, but we could have more like string_agg_with_sep_transfn. Comments? Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 David E. Wheeler da...@kineticode.com: On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote: * I think we cannot cache the delimiter at the first call. For example, SELECT string_agg(elem, delim) FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim); should return 'A+B*C' rather than 'A,B,C'. Ooh, nice. * Can we use StringInfo directly as the aggregate context instead of StringAggState? For the first reason, we need to drop 'delimiter' field from struct StringAggState. Now it has only StringInfo field. Makes sense. no, has not. Pavel * We'd better avoiding to call text_to_cstring() for delimitors and elements for performance reason. We can use appendBinaryStringInfo() here. My proposal patch attached. Also, I've not changed it yet, but it might be considerable: * Do we need better names for string_agg1_transfn and string_agg2_transfn? They are almost internal names, but we could have more like string_agg_with_sep_transfn. Yes please. Comments? Patch looks great, thank you! David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Pavel Stehule pavel.steh...@gmail.com: 2010/1/28 David E. Wheeler da...@kineticode.com: On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote: * I think we cannot cache the delimiter at the first call. For example, SELECT string_agg(elem, delim) FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim); should return 'A+B*C' rather than 'A,B,C'. Ooh, nice. * Can we use StringInfo directly as the aggregate context instead of StringAggState? For the first reason, we need to drop 'delimiter' field from struct StringAggState. Now it has only StringInfo field. Makes sense. no, has not. What is use case for this behave?? Pavel Pavel * We'd better avoiding to call text_to_cstring() for delimitors and elements for performance reason. We can use appendBinaryStringInfo() here. My proposal patch attached. Also, I've not changed it yet, but it might be considerable: * Do we need better names for string_agg1_transfn and string_agg2_transfn? They are almost internal names, but we could have more like string_agg_with_sep_transfn. Yes please. Comments? Patch looks great, thank you! David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Pavel Stehule pavel.steh...@gmail.com wrote: 2010/1/28 Pavel Stehule pavel.steh...@gmail.com: 2010/1/28 David E. Wheeler da...@kineticode.com: On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote: * I think we cannot cache the delimiter at the first call. For example, SELECT string_agg(elem, delim) FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim); should return 'A+B*C' rather than 'A,B,C'. no, has not. What is use case for this behave?? I also think this usage is nonsense, but seems to be the most consistent behavior for me. I didn't say anything about use-cases, but just capability. Since we allow such kinds of usage for now, you need to verify the delimiter is not changed rather than ignoring it if you want disallow to change the delimiter during an aggregation. Of course you can cache the first delimiter at start, and check delimiters are not changed every calls -- but I think it is just a waste of cpu cycle. Regards, --- Takahiro Itagaki NTT Open Source Software Center -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Thu, Jan 28, 2010 at 3:59 AM, Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote: Pavel Stehule pavel.steh...@gmail.com wrote: 2010/1/28 Pavel Stehule pavel.steh...@gmail.com: 2010/1/28 David E. Wheeler da...@kineticode.com: On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote: * I think we cannot cache the delimiter at the first call. For example, SELECT string_agg(elem, delim) FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim); should return 'A+B*C' rather than 'A,B,C'. no, has not. What is use case for this behave?? I also think this usage is nonsense, but seems to be the most consistent behavior for me. I didn't say anything about use-cases, but just capability. Since we allow such kinds of usage for now, you need to verify the delimiter is not changed rather than ignoring it if you want disallow to change the delimiter during an aggregation. Of course you can cache the first delimiter at start, and check delimiters are not changed every calls -- but I think it is just a waste of cpu cycle. Agreed. Not caching it seems the simplest solution. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Robert Haas robertmh...@gmail.com: On Thu, Jan 28, 2010 at 3:59 AM, Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote: Pavel Stehule pavel.steh...@gmail.com wrote: 2010/1/28 Pavel Stehule pavel.steh...@gmail.com: 2010/1/28 David E. Wheeler da...@kineticode.com: On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote: * I think we cannot cache the delimiter at the first call. For example, SELECT string_agg(elem, delim) FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim); should return 'A+B*C' rather than 'A,B,C'. no, has not. What is use case for this behave?? I also think this usage is nonsense, but seems to be the most consistent behavior for me. I didn't say anything about use-cases, but just capability. Since we allow such kinds of usage for now, you need to verify the delimiter is not changed rather than ignoring it if you want disallow to change the delimiter during an aggregation. Of course you can cache the first delimiter at start, and check delimiters are not changed every calls -- but I think it is just a waste of cpu cycle. Agreed. Not caching it seems the simplest solution. simplest could not be a best. There have to be only a const expression. But we have not possibility to check it in pg. Pavel ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com wrote: simplest could not be a best. There have to be only a const expression. But we have not possibility to check it in pg. Well... that's an entirely arbitrary limitation. I admit that it doesn't seem likely that someone would want to have a variable delimiter, but putting extra effort and code complexity into preventing it seems pointless. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Robert Haas robertmh...@gmail.com: On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com wrote: simplest could not be a best. There have to be only a const expression. But we have not possibility to check it in pg. Well... that's an entirely arbitrary limitation. I admit that it doesn't seem likely that someone would want to have a variable delimiter, but putting extra effort and code complexity into preventing it seems pointless. It is only a few lines with zero complexity. The main issue of Takahiro proposal is unclean behave. we can have a content c1c2 --- c11, c12, c21, c22 and result of string_agg(c1, c2) have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2 will be NULL ?? I checked oracle. Oracle doesn't allow variable as delimiter. We can't check it. But we can fix first value and using it as constant. More - Takahiro proposal has one performance gain. It have to deTOAST separator value in every cycle. Takahiro has true - we can store length of separator and we can add separator to cumulative value as binary value. I prefer original behave with note in documentation - so as separator is used a value on first row, when is used expression and not constant. Regards Pavel ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Robert Haas robertmh...@gmail.com writes: On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com wrote: simplest could not be a best. There have to be only a const expression. But we have not possibility to check it in pg. Well... that's an entirely arbitrary limitation. I admit that it doesn't seem likely that someone would want to have a variable delimiter, but putting extra effort and code complexity into preventing it seems pointless. Yeah. The real issue here is that in some cases you'd like to have non-aggregated parameters to an aggregate, but SQL has no notation to express that. I think Pavel's underlying complaint is that if the delimiter argument isn't constant, then we're exposing an implementation dependency in terms of just which values get separated by which delimiters. The most practical implementation seems to be that the first-call delimiter isn't actually used at all, and on subsequent calls the delimiter *precedes* the associated value, which is a bit surprising given the order in which one writes them. Not sure if this is worth documenting though. Those two or three people who actually try it will figure it out soon enough. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Thu, Jan 28, 2010 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com wrote: simplest could not be a best. There have to be only a const expression. But we have not possibility to check it in pg. Well... that's an entirely arbitrary limitation. I admit that it doesn't seem likely that someone would want to have a variable delimiter, but putting extra effort and code complexity into preventing it seems pointless. Yeah. The real issue here is that in some cases you'd like to have non-aggregated parameters to an aggregate, but SQL has no notation to express that. Right. I think Pavel's underlying complaint is that if the delimiter argument isn't constant, then we're exposing an implementation dependency in terms of just which values get separated by which delimiters. The most practical implementation seems to be that the first-call delimiter isn't actually used at all, and on subsequent calls the delimiter *precedes* the associated value, which is a bit surprising given the order in which one writes them. Not sure if this is worth documenting though. Those two or three people who actually try it will figure it out soon enough. Yeah, I'm thoroughly unworried about it. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Wed, Jan 27, 2010 at 9:47 PM, Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote: * Do we need better names for string_agg1_transfn and string_agg2_transfn? They are almost internal names, but we could have more like string_agg_with_sep_transfn. Surely the names of the transition and final functions should match the name of the aggregate. But if there's a proposal on the table to call this string_app_with_sep() instead of just string_agg(), +1 from me. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/29 Pavel Stehule pavel.steh...@gmail.com: 2010/1/28 Robert Haas robertmh...@gmail.com: On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com wrote: simplest could not be a best. There have to be only a const expression. But we have not possibility to check it in pg. Well... that's an entirely arbitrary limitation. I admit that it doesn't seem likely that someone would want to have a variable delimiter, but putting extra effort and code complexity into preventing it seems pointless. It is only a few lines with zero complexity. The main issue of Takahiro proposal is unclean behave. we can have a content c1 c2 --- c11, c12, c21, c22 and result of string_agg(c1, c2) have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2 will be NULL ?? I checked oracle. Oracle doesn't allow variable as delimiter. We can't check it. But we can fix first value and using it as constant. What about get_fn_expr_arg_stable() to check if the argument is stable during aggregate? Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Robert Haas robertmh...@gmail.com: On Thu, Jan 28, 2010 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com wrote: simplest could not be a best. There have to be only a const expression. But we have not possibility to check it in pg. Well... that's an entirely arbitrary limitation. I admit that it doesn't seem likely that someone would want to have a variable delimiter, but putting extra effort and code complexity into preventing it seems pointless. Yeah. The real issue here is that in some cases you'd like to have non-aggregated parameters to an aggregate, but SQL has no notation to express that. Right. I think Pavel's underlying complaint is that if the delimiter argument isn't constant, then we're exposing an implementation dependency in terms of just which values get separated by which delimiters. The most practical implementation seems to be that the first-call delimiter isn't actually used at all, and on subsequent calls the delimiter *precedes* the associated value, which is a bit surprising given the order in which one writes them. Not sure if this is worth documenting though. Those two or three people who actually try it will figure it out soon enough. ok - there is one query, in 99.99% the second argument will be a constant. Can we use this information and optimize function for this case? The detoast on every row can take some percent from a performance. Pavel Yeah, I'm thoroughly unworried about it. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Robert Haas robertmh...@gmail.com: On Wed, Jan 27, 2010 at 9:47 PM, Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp wrote: * Do we need better names for string_agg1_transfn and string_agg2_transfn? They are almost internal names, but we could have more like string_agg_with_sep_transfn. Surely the names of the transition and final functions should match the name of the aggregate. But if there's a proposal on the table to call this string_app_with_sep() instead of just string_agg(), +1 from me. no, string_app_with_sep is too long for common using. Pavel ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Pavel Stehule pavel.steh...@gmail.com writes: in 99.99% the second argument will be a constant. Can we use this information and optimize function for this case? The detoast on every row can take some percent from a performance. What detoast? There won't be one for a constant, nor even for a variable in any sane situation --- who's going to be using multi-kilobyte delimiter values? And if they do, aren't they likely to run out of memory for the result long before the repeated detoasts become an interesting problem? You're arguing about a case that seems quite irrelevant to the real world. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Hitoshi Harada umi.tan...@gmail.com: 2010/1/29 Pavel Stehule pavel.steh...@gmail.com: 2010/1/28 Robert Haas robertmh...@gmail.com: On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com wrote: simplest could not be a best. There have to be only a const expression. But we have not possibility to check it in pg. Well... that's an entirely arbitrary limitation. I admit that it doesn't seem likely that someone would want to have a variable delimiter, but putting extra effort and code complexity into preventing it seems pointless. It is only a few lines with zero complexity. The main issue of Takahiro proposal is unclean behave. we can have a content c1 c2 --- c11, c12, c21, c22 and result of string_agg(c1, c2) have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2 will be NULL ?? I checked oracle. Oracle doesn't allow variable as delimiter. We can't check it. But we can fix first value and using it as constant. What about get_fn_expr_arg_stable() to check if the argument is stable during aggregate? I newer know so this function exists. Now we can a) check and allow only stable params b) when second parameter is stable, then store it and use it as constant. I prefer a) Pavel Regards, -- Hitoshi Harada -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: in 99.99% the second argument will be a constant. Can we use this information and optimize function for this case? The detoast on every row can take some percent from a performance. What detoast? There won't be one for a constant, nor even for a variable in any sane situation --- who's going to be using multi-kilobyte delimiter values? And if they do, aren't they likely to run out of memory for the result long before the repeated detoasts become an interesting problem? You're arguing about a case that seems quite irrelevant to the real world. I asking with get_fn_expr_arg_stable() we are able to fix second parameter without some performance issues. Regards Pavel regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Hitoshi Harada umi.tan...@gmail.com writes: What about get_fn_expr_arg_stable() to check if the argument is stable during aggregate? Seems quite expensive (possible catalog lookups) and there still isn't any strong argument why we should bother. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Pavel Stehule pavel.steh...@gmail.com writes: with get_fn_expr_arg_stable() we are able to fix second parameter without some performance issues. No, that will create its own performance issues --- get_fn_expr_arg_stable isn't especially cheap. If there were a really strong reason why we had to do it, then I'd agree, but frankly the argument for disallowing a variable delimiter is too thin. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On 2010-01-28 19:17, Pavel Stehule wrote: 2010/1/28 Hitoshi Harada umi.tan...@gmail.com: What about get_fn_expr_arg_stable() to check if the argument is stable during aggregate? I newer know so this function exists. Now we can a) check and allow only stable params b) when second parameter is stable, then store it and use it as constant. I prefer a) Someone might have a perfectly good use case for using different delimiters. I don't think it's a good idea to be artificially limiting what you can and can't do. Regards, Marko Tiikkaja -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 28, 2010, at 9:29 AM, Marko Tiikkaja wrote: Someone might have a perfectly good use case for using different delimiters. I don't think it's a good idea to be artificially limiting what you can and can't do. +1 David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
One situation where this could actually matter in the long term is if we want to have an optimization for aggregate functions whose state variables can be combined. this could be important if we ever want to do parallel processing someday. So we could have, for example two subjobs build two sublists and then combiner the two lists later. in that case the separator might not be the one the user expected - our put another way the one the user expected might not be available when we need it. We could say this isn't a problem because not all aggregate functions will be amenable to such an optimization and perhaps this will just be one of them. Alternately we could just have faith that a solution will be found - it doesn't seem like it should be an insoluble problem to me. greg On 28 Jan 2010 15:57, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel Yeah. The real issue here is that in some cases you'd like to have non-aggregated parameters to an aggregate, but SQL has no notation to express that. I think Pavel's underlying complaint is that if the delimiter argument isn't constant, then we're exposing an implementation dependency in terms of just which values get separated by which delimiters. The most practical implementation seems to be that the first-call delimiter isn't actually used at all, and on subsequent calls the delimiter *precedes* the associated value, which is a bit surprising given the order in which one writes them. Not sure if this is worth documenting though. Those two or three people who actually try it will figure it out soon enough. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subs...
Re: [HACKERS] Review: listagg aggregate
2010/1/28 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: with get_fn_expr_arg_stable() we are able to fix second parameter without some performance issues. No, that will create its own performance issues --- get_fn_expr_arg_stable isn't especially cheap. If there were a really strong reason why we had to do it, then I'd agree, but frankly the argument for disallowing a variable delimiter is too thin. it could be called only once. But I agree, so could be better, check it in parser time. ok - I am only one who like original behave - so I am withdrawing. Robert, If you like, please commit Itagaki's patch. + add note about behave when second parameter isn't stable. Regards Pavel Stehule regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
with actualised oids Regards Pavel Stehule 2010/1/26 David E. Wheeler da...@kineticode.com: On Jan 25, 2010, at 6:56 AM, Pavel Stehule wrote: actualised patch - the name is string_agg All looks fine except I'm getting this error during initdb: creating template1 database in /usr/local/pgsql-devel/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3031) is duplicated. child process exited with exit code 1 Would you mind re-submitting with unique OIDs? Thanks, David string_agg.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Tue, Jan 26, 2010 at 1:38 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: But what it *produces* is a string. For comparison, the SQL-standard-specified array_agg produces arrays, but what it acts on isn't an array. This point is well-taken, but naming it string_agg() because it produces a string doesn't seem quite descriptive enough. We might someday (if we don't already) have a number of aggregates that produce an output that is a string; we can't name them all by the output type. True, but the same point could be made against array_agg, and that didn't stop the committee from choosing that name. As long as string_agg is the most obvious aggregate-to-string functionality, which ISTM it is, I think it's all right for it to have pride of place in naming. Maybe so, but personally, I'd still prefer something more descriptive. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 27, 2010, at 7:58 AM, Pavel Stehule wrote: with actualised oids Thanks. Looks good, modulo my preference for concat_agg(). I'll mark it ready for committer. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Pavel Stehule pavel.steh...@gmail.com wrote: with actualised oids I'm checking the patch for commit, and have a couple of comments. * I think we cannot cache the delimiter at the first call. For example, SELECT string_agg(elem, delim) FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim); should return 'A+B*C' rather than 'A,B,C'. * Can we use StringInfo directly as the aggregate context instead of StringAggState? For the first reason, we need to drop 'delimiter' field from struct StringAggState. Now it has only StringInfo field. * We'd better avoiding to call text_to_cstring() for delimitors and elements for performance reason. We can use appendBinaryStringInfo() here. My proposal patch attached. Also, I've not changed it yet, but it might be considerable: * Do we need better names for string_agg1_transfn and string_agg2_transfn? They are almost internal names, but we could have more like string_agg_with_sep_transfn. Comments? Regards, --- Takahiro Itagaki NTT Open Source Software Center string_agg_20100128.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote: * I think we cannot cache the delimiter at the first call. For example, SELECT string_agg(elem, delim) FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim); should return 'A+B*C' rather than 'A,B,C'. Ooh, nice. * Can we use StringInfo directly as the aggregate context instead of StringAggState? For the first reason, we need to drop 'delimiter' field from struct StringAggState. Now it has only StringInfo field. Makes sense. * We'd better avoiding to call text_to_cstring() for delimitors and elements for performance reason. We can use appendBinaryStringInfo() here. My proposal patch attached. Also, I've not changed it yet, but it might be considerable: * Do we need better names for string_agg1_transfn and string_agg2_transfn? They are almost internal names, but we could have more like string_agg_with_sep_transfn. Yes please. Comments? Patch looks great, thank you! David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 25, 2010, at 23:14, Pavel Stehule pavel.steh...@gmail.com wrote: why is concat_agg better than listagg ? Because it's an aggregate that cocatenates values. It's not an aggregate that lists things. I also like concat_agg better than string_agg because it's not limited to acting on strings. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Tue, Jan 26, 2010 at 1:08 PM, David E. Wheeler da...@kineticode.com wrote: . Because it's an aggregate that cocatenates values. It's not an aggregate that lists things. I also like concat_agg better than string_agg because it's not limited to acting on strings. . Given that it potentially produces a delimited list, not a straight conacatenation (and that list is unacceptable since it would be descriptive as a noun but not as a verb) would implode_agg not be the most descriptive name? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Tue, Jan 26, 2010 at 1:23 PM, Alastair Turner b...@ctrlf5.co.za wrote: . Given that it potentially produces a delimited list, not a straight conacatenation (and that list is unacceptable since it would be descriptive as a noun but not as a verb) would implode_agg not be the most descriptive name? Actually, scratch that. The other *agg functions are named for what they produce as output. Not for their process - as per the objection to list_agg and suggestions of conact_agg and implode_agg. string_agg would be consistent, which is a wonderful thing if you can get it in a naming scheme. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On tis, 2010-01-26 at 03:08 -0800, David E. Wheeler wrote: On Jan 25, 2010, at 23:14, Pavel Stehule pavel.steh...@gmail.com wrote: why is concat_agg better than listagg ? Because it's an aggregate that cocatenates values. It's not an aggregate that lists things. I also like concat_agg better than string_agg because it's not limited to acting on strings. What else can it act on? -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Tue, Jan 26, 2010 at 2:14 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2010/1/25 Robert Haas robertmh...@gmail.com: On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com wrote: On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote: xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum concat_agg(). I like that one... why is concat_agg better than listagg ? Because it doesn't make lists. Honestly, I don't love concat_agg() either - why should something need to have agg in the name just because it's an aggregate? I think the most descriptive name would be something like concatenate_with_separator(), but that's kind of long. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/26 Robert Haas robertmh...@gmail.com: On Tue, Jan 26, 2010 at 2:14 AM, Pavel Stehule pavel.steh...@gmail.com wrote: 2010/1/25 Robert Haas robertmh...@gmail.com: On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com wrote: On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote: xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum concat_agg(). I like that one... why is concat_agg better than listagg ? Because it doesn't make lists. Honestly, I don't love concat_agg() either - why should something need to have agg in the name just because it's an aggregate? I think the most descriptive name would be something like concatenate_with_separator(), but that's kind of long. This is never ending story :) MySQL has function concate_ws - but this function has different semantic. I thing so string_agg is short, and from one view consistent Pavel ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 26, 2010, at 4:03, Peter Eisentraut pete...@gmx.net wrote: What else can it act on? Any data type, since they all can be converted to text. Integers would be a common choice. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Pavel Stehule pavel.steh...@gmail.com writes: 2010/1/25 Robert Haas robertmh...@gmail.com: On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com wrote: concat_agg(). I like that one... why is concat_agg better than listagg ? It isn't ... it's the wrong part of speech. concatenate is a verb, whereas the other functions we would like it to be named parallel to are using nouns there. (Yes, I know array can be used as a verb, but I don't think anyone reads it that way in array_agg...) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Tom Lane t...@sss.pgh.pa.us wrote: Pavel Stehule pavel.steh...@gmail.com writes: why is concat_agg better than listagg ? It isn't ... it's the wrong part of speech. concatenate is a verb, Concatenation is a noun. concat doesn't get far enough to distinguish. -Kevin -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
David E. Wheeler da...@kineticode.com writes: Because it's an aggregate that cocatenates values. It's not an aggregate that lists things. I also like concat_agg better than string_agg because it's not limited to acting on strings. But what it *produces* is a string. For comparison, the SQL-standard-specified array_agg produces arrays, but what it acts on isn't an array. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 26, 2010, at 9:36 AM, Tom Lane wrote: But what it *produces* is a string. For comparison, the SQL-standard-specified array_agg produces arrays, but what it acts on isn't an array. Meh. This is all just bike-shedding. I'm fine with string_agg(), though in truth none of the names has really been great. The inclusion of agg in the name is unfortunate. I'll have a look at Pavel's new patch now. David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
David E. Wheeler da...@kineticode.com writes: Meh. This is all just bike-shedding. I'm fine with string_agg(), though in truth none of the names has really been great. The inclusion of agg in the name is unfortunate. Yeah, I wouldn't be for it either if it weren't for the precedent of array_agg. I was quite surprised that the SQL committee chose that name, because they've avoided using the term aggregate function at all, but there it is ... regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 25, 2010, at 6:56 AM, Pavel Stehule wrote: actualised patch - the name is string_agg All looks fine except I'm getting this error during initdb: creating template1 database in /usr/local/pgsql-devel/data/base/1 ... FATAL: could not create unique index pg_proc_oid_index DETAIL: Key (oid)=(3031) is duplicated. child process exited with exit code 1 Would you mind re-submitting with unique OIDs? Thanks, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: David E. Wheeler da...@kineticode.com writes: Because it's an aggregate that cocatenates values. It's not an aggregate that lists things. I also like concat_agg better than string_agg because it's not limited to acting on strings. But what it *produces* is a string. For comparison, the SQL-standard-specified array_agg produces arrays, but what it acts on isn't an array. This point is well-taken, but naming it string_agg() because it produces a string doesn't seem quite descriptive enough. We might someday (if we don't already) have a number of aggregates that produce an output that is a string; we can't name them all by the output type. ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Robert Haas robertmh...@gmail.com writes: On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote: But what it *produces* is a string. For comparison, the SQL-standard-specified array_agg produces arrays, but what it acts on isn't an array. This point is well-taken, but naming it string_agg() because it produces a string doesn't seem quite descriptive enough. We might someday (if we don't already) have a number of aggregates that produce an output that is a string; we can't name them all by the output type. True, but the same point could be made against array_agg, and that didn't stop the committee from choosing that name. As long as string_agg is the most obvious aggregate-to-string functionality, which ISTM it is, I think it's all right for it to have pride of place in naming. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On sön, 2010-01-24 at 21:29 -0800, Scott Bailey wrote: I think listagg or string_agg would be the most appropriate names. Oh and before Oracle had wm_concat, Tom Kyte wrote a function called stragg that was pretty popular. Well, xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum So it's pretty clear that listagg does not fit into this scheme. -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/25 Peter Eisentraut pete...@gmx.net: On sön, 2010-01-24 at 21:29 -0800, Scott Bailey wrote: I think listagg or string_agg would be the most appropriate names. Oh and before Oracle had wm_concat, Tom Kyte wrote a function called stragg that was pretty popular. Well, xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum So it's pretty clear that listagg does not fit into this scheme. when you define list as text domain, then this the name is correct. searching list sql string on google. You can see, so list is usually used. Regards Pavel -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Pavel Stehule pavel.steh...@gmail.com writes: 2010/1/25 Peter Eisentraut pete...@gmx.net: xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum So it's pretty clear that listagg does not fit into this scheme. when you define list as text domain, then this the name is correct. IOW, if you define away the problem then there's no problem? I agree that list is a terrible choice of name here. string_agg seemed reasonable and in keeping with the standardized array_agg. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/25 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: 2010/1/25 Peter Eisentraut pete...@gmx.net: xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum So it's pretty clear that listagg does not fit into this scheme. when you define list as text domain, then this the name is correct. IOW, if you define away the problem then there's no problem? I agree that list is a terrible choice of name here. string_agg seemed reasonable and in keeping with the standardized array_agg. I am not happy, I thing so we do bigger chaos then it is. But it hasn't progress. So I agree with name string_agg. In this case isn't a problem rename this function if somebody would. I'll send patch over hour. regards Pavel Stehule regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/25 Tom Lane t...@sss.pgh.pa.us: Pavel Stehule pavel.steh...@gmail.com writes: 2010/1/25 Peter Eisentraut pete...@gmx.net: xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum So it's pretty clear that listagg does not fit into this scheme. when you define list as text domain, then this the name is correct. IOW, if you define away the problem then there's no problem? I agree that list is a terrible choice of name here. string_agg seemed reasonable and in keeping with the standardized array_agg. actualised patch - the name is string_agg regards Pavel Stehule regards, tom lane string_agg.diff Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/24 Tom Lane t...@sss.pgh.pa.us: Simon Riggs si...@2ndquadrant.com writes: On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote: No performance issues ISTM that this class of function is inherently dangerous performance wise. * It looks incredibly easy to construct enormous lists. We should test the explosion limit of this to see how it is handled. Perhaps we need some parameter limits to control that, depending upon results. * Optimizer doesn't consider whether the result type of an aggregate get bigger as the aggregate processes more rows. If we're adding this function we should give some thought in that area also, or at least a comment to note that it can and will cause the optimizer problems in complex queries. We have that problem already with array_agg(), and I don't recall many complaints about it. It might be worth worrying about at some point, but I don't think it's reasonable to insist that it be fixed before any more such aggregates are created. I agree that testing-to-failure would be a good idea just to be sure it fails cleanly. postgres=# \timing Timing is on. postgres=# select pg_size_pretty(length(string_agg('012345678901234567890'::text,','))) from generate_series(1,1000) g(i); pg_size_pretty 210 MB (1 row) Time: 5831,218 ms postgres=# select pg_size_pretty(length(string_agg('012345678901234567890'::text,','))) from generate_series(1,5000) g(i); ^[^[ERROR: out of memory DETAIL: Cannot enlarge string buffer containing 1073741812 bytes by 21 more bytes. postgres=# I thing, so 210 MB is more then is necessary :) Regards Pavel Stehule regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote: xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum concat_agg(). David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Jan 25, 2010, at 6:12 AM, Pavel Stehule wrote: I am not happy, I thing so we do bigger chaos then it is. But it hasn't progress. So I agree with name string_agg. In this case isn't a problem rename this function if somebody would. Could you not use CREATE AGGREGATE to create an alias named listagg if you needed it? Or are we limited to a single argument in that case? David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com wrote: On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote: xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum concat_agg(). I like that one... ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/25 David E. Wheeler da...@kineticode.com: On Jan 25, 2010, at 6:12 AM, Pavel Stehule wrote: I am not happy, I thing so we do bigger chaos then it is. But it hasn't progress. So I agree with name string_agg. In this case isn't a problem rename this function if somebody would. Could you not use CREATE AGGREGATE to create an alias named listagg if you needed it? Or are we limited to a single argument in that case? we can, but without one common known name we will have a propriety name. Pavel David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/25 Robert Haas robertmh...@gmail.com: On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com wrote: On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote: xmlagg - concatenates values to form xml datum array_agg - concatenates values to form array datum ??? - concatenates values to form string datum concat_agg(). I like that one... why is concat_agg better than listagg ? Pavel ...Robert -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Hello 2010/1/22 David E. Wheeler da...@kineticode.com: Pavel, My review of your listagg patch. Submission Review - * The diff is a context diff and applies cleanly to HEAD (with just two hunks offset by 2 lines each). * There is documentation, though I'm not sure it needs to be mentioned in the string functions documentation. No harm in it, I guess. I would like to see an example, though, and the documentation does not currently explain what each of the parameters are for. In fact, it looks like all the existing aggregates take only one parameter, so there was not previously a need to explain it. But listagg() has an optional second param. I think that the description should explain what it's for. can I help with it, please. My English is terrible. * There are tests and they look fine. Usability Review * The patch does in fact implement the aggregate function it describes, and OH YES do we want it (I've written my own in SQL a few times). * No, we don't already have it. * Yes it follows community-agreed behavior. I'm assuming that there is no special parsing of aggregate functions, so the simple use of commas to separate the two parameters is appropriate, rather than using a keyword like MySQL's SEPARATOR in the group_concat() aggregate. * No need to have pg_dump support, no dangers that I can see, looks like all the bases have been covered. Feature Test * Everything built cleanly, but I got an OID dupe error when I tried to init the DB. Looks like 2997 and 2998 have been used for something else since you created the patch. I changed them to 2995 and 2996 and then it worked. * The feature appears to work. I didn't see any tests for encodings or other data types, so I ran a few myself and they work fine: postgres=# select listagg(a, U'-\0441\043B\043E\043D-') from (values(''),(''),('' listagg -- -слон--слон- (1 row) postgres=# select listagg(a, U'\2014') from (values(U'\0441\043B\043E\043D'),(U'd\0061t\+61'),(U'\0441\043B\043E\043D')) AS g(a); listagg слон—data—слон (1 row) postgres=# select listagg(a::text) from (values(1),(2),(3)) AS g(a); listagg - 123 (1 row) Performance Review -- No performance issues, except that it should be faster than a custom aggregate that does the same thing. To test, I created a quick custom aggregate (no second argument, alas, so listagg() is more flexible) like so: CREATE OR REPLACE FUNCTION a2s(ANYARRAY) RETURNS TEXT LANGUAGE SQL AS $$ SELECT array_to_string($1, ','); $$; CREATE AGGREGATE string_accum ( SFUNC = array_accum, BASETYPE = ANYELEMENT, STYPE = ANYARRAY, INITCOND = '{}', FINALFUNC = a2s ); Then I ran some simple tests (thanks for the clue, depesz): postgres=# select count(*) from (select string_accum(a) from (values(''),(''),('')) AS g(a), generate_series(1,1) i) AS x(i); count --- 1 (1 row) Time: 1365.382 ms postgres=# select count(*) from (select listagg(a) from (values(''),(''),('')) AS g(a), generate_series(1,1) i) AS x(i); count --- 1 (1 row) Time: 17.989 ms So overall, it looks like listagg() is 1-2 orders of magnitude faster. YMMV, and my system is built with --enable-cassert and --enable-debug. Still, good job. Coding Review - * Is varchar.c really the best place to put the ListAggState struct and the listagg() function? I grepped the source for array_agg() and it's in src/backend/utils/adt/array_userfuncs.c. Maybe there's an equivalent file for string functions? Otherwise, the style of the C code looks fine to my untrained eye. array user functions are used more time in pg core. The complexity of array functions are much higher, so I don't think we need special file. Actually, shouldn't it return text rather than varchar? I'll recheck it. I am sure so all parameters should be a text. * Does it really require four functions to do its work? Might there be some way to use the array_agg() C functions and then just a different final function to turn it into a string (using the internal array_to_string() function, perhaps)? We can, but it isn't good way. Processing of arrays is little bit more expensive then processing plain text. It is reason why listagg is faster, than your custom aggregate. More, the final function could be faster - the content is final. I'm not at all sure about it, but given how little code was required to create the same basic functionality in SQL, I'm surprised that the C implementation requires four functions (accumStringResult(), listagg1_transfn(), listagg2_transfn(), and listagg_finalfn()). Maybe they're required to make it fast and avoid the overhead of an array? It normal for aggregate
Re: [HACKERS] Review: listagg aggregate
On Jan 24, 2010, at 1:19 AM, Pavel Stehule wrote: can I help with it, please. My English is terrible. Yes, I added a bit in the patch I submitted. array user functions are used more time in pg core. The complexity of array functions are much higher, so I don't think we need special file. Okay. Should have tried it in PL/pgSQL then, perhaps. I'll recheck it. I am sure so all parameters should be a text. Probably shouldn't go into varchar.c then, yes? We can, but it isn't good way. Processing of arrays is little bit more expensive then processing plain text. It is reason why listagg is faster, than your custom aggregate. More, the final function could be faster - the content is final. Understood. It normal for aggregate functions. We need more transfn function, because we need two two variant: listagg(col), listagg(col, sep). Our coding guidlines doesn't advice share C functions - but these functions are +/- wrapper for accumStringResult - so there is zero overhead. Ah, okay, it's because of the second argument. Now I understand. I don't think. When we have function, with same parameters, same behave like some Oracle function, then I am strongly prefer Oracle name. I don't see any benefit from different name. It can only confuse developers and add the trable to people who porting applications. Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who need it. Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote: No performance issues ISTM that this class of function is inherently dangerous performance wise. * It looks incredibly easy to construct enormous lists. We should test the explosion limit of this to see how it is handled. Perhaps we need some parameter limits to control that, depending upon results. * Optimizer doesn't consider whether the result type of an aggregate get bigger as the aggregate processes more rows. If we're adding this function we should give some thought in that area also, or at least a comment to note that it can and will cause the optimizer problems in complex queries. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/24 Simon Riggs si...@2ndquadrant.com: On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote: No performance issues ISTM that this class of function is inherently dangerous performance wise. there is potencial risk, but this risk isn't new. The almost all what you say is true for array aggregates. * It looks incredibly easy to construct enormous lists. We should test the explosion limit of this to see how it is handled. Perhaps we need some parameter limits to control that, depending upon results. There are no limit for generating large values - like bytea, xml, or text. There are not limit for array_accum or array(query). So, I don't think we need some special mechanism for listagg. If somebody will generate too large a value, then he will get out of memory exception. * Optimizer doesn't consider whether the result type of an aggregate get bigger as the aggregate processes more rows. If we're adding this function we should give some thought in that area also, or at least a comment to note that it can and will cause the optimizer problems in complex queries. this is true, but this isn't some new. array_accum working well without optimizer problems. -- Simon Riggs www.2ndQuadrant.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
Simon Riggs si...@2ndquadrant.com writes: On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote: No performance issues ISTM that this class of function is inherently dangerous performance wise. * It looks incredibly easy to construct enormous lists. We should test the explosion limit of this to see how it is handled. Perhaps we need some parameter limits to control that, depending upon results. * Optimizer doesn't consider whether the result type of an aggregate get bigger as the aggregate processes more rows. If we're adding this function we should give some thought in that area also, or at least a comment to note that it can and will cause the optimizer problems in complex queries. We have that problem already with array_agg(), and I don't recall many complaints about it. It might be worth worrying about at some point, but I don't think it's reasonable to insist that it be fixed before any more such aggregates are created. I agree that testing-to-failure would be a good idea just to be sure it fails cleanly. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
2010/1/24 David E. Wheeler da...@kineticode.com: On Jan 24, 2010, at 1:19 AM, Pavel Stehule wrote: can I help with it, please. My English is terrible. Yes, I added a bit in the patch I submitted. array user functions are used more time in pg core. The complexity of array functions are much higher, so I don't think we need special file. Okay. Should have tried it in PL/pgSQL then, perhaps. :) - I'll look on it again. I'll recheck it. I am sure so all parameters should be a text. Probably shouldn't go into varchar.c then, yes? We can, but it isn't good way. Processing of arrays is little bit more expensive then processing plain text. It is reason why listagg is faster, than your custom aggregate. More, the final function could be faster - the content is final. Understood. It normal for aggregate functions. We need more transfn function, because we need two two variant: listagg(col), listagg(col, sep). Our coding guidlines doesn't advice share C functions - but these functions are +/- wrapper for accumStringResult - so there is zero overhead. Ah, okay, it's because of the second argument. Now I understand. I don't think. When we have function, with same parameters, same behave like some Oracle function, then I am strongly prefer Oracle name. I don't see any benefit from different name. It can only confuse developers and add the trable to people who porting applications. Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who need it. The list is common name for this content - it is usual in Microsoft SQL Server, it is usual in Oracle. Maybe we can vote about the name Regards Pavel Stehule Best, David -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Review: listagg aggregate
I don't think. When we have function, with same parameters, same behave like some Oracle function, then I am strongly prefer Oracle name. I don't see any benefit from different name. It can only confuse developers and add the trable to people who porting applications. Meh. If the name is terrible, we don't have to use it, and it's easy enough to create an alias in SQL for those who need it. The corresponding function in Oracle is called wm_concat. In MySQL its called group_concat. I don't believe DB2 or SQL Server have built in equivalents. The Oracle name isn't really an option (wm' stands for workspace manager) I think listagg or string_agg would be the most appropriate names. Oh and before Oracle had wm_concat, Tom Kyte wrote a function called stragg that was pretty popular. Scott -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Review: listagg aggregate
Pavel, My review of your listagg patch. Submission Review - * The diff is a context diff and applies cleanly to HEAD (with just two hunks offset by 2 lines each). * There is documentation, though I'm not sure it needs to be mentioned in the string functions documentation. No harm in it, I guess. I would like to see an example, though, and the documentation does not currently explain what each of the parameters are for. In fact, it looks like all the existing aggregates take only one parameter, so there was not previously a need to explain it. But listagg() has an optional second param. I think that the description should explain what it's for. * There are tests and they look fine. Usability Review * The patch does in fact implement the aggregate function it describes, and OH YES do we want it (I've written my own in SQL a few times). * No, we don't already have it. * Yes it follows community-agreed behavior. I'm assuming that there is no special parsing of aggregate functions, so the simple use of commas to separate the two parameters is appropriate, rather than using a keyword like MySQL's SEPARATOR in the group_concat() aggregate. * No need to have pg_dump support, no dangers that I can see, looks like all the bases have been covered. Feature Test * Everything built cleanly, but I got an OID dupe error when I tried to init the DB. Looks like 2997 and 2998 have been used for something else since you created the patch. I changed them to 2995 and 2996 and then it worked. * The feature appears to work. I didn't see any tests for encodings or other data types, so I ran a few myself and they work fine: postgres=# select listagg(a, U'-\0441\043B\043E\043D-') from (values(''),(''),('' listagg -- -слон--слон- (1 row) postgres=# select listagg(a, U'\2014') from (values(U'\0441\043B\043E\043D'),(U'd\0061t\+61'),(U'\0441\043B\043E\043D')) AS g(a); listagg слон—data—слон (1 row) postgres=# select listagg(a::text) from (values(1),(2),(3)) AS g(a); listagg - 123 (1 row) Performance Review -- No performance issues, except that it should be faster than a custom aggregate that does the same thing. To test, I created a quick custom aggregate (no second argument, alas, so listagg() is more flexible) like so: CREATE OR REPLACE FUNCTION a2s(ANYARRAY) RETURNS TEXT LANGUAGE SQL AS $$ SELECT array_to_string($1, ','); $$; CREATE AGGREGATE string_accum ( SFUNC= array_accum, BASETYPE = ANYELEMENT, STYPE= ANYARRAY, INITCOND = '{}', FINALFUNC = a2s ); Then I ran some simple tests (thanks for the clue, depesz): postgres=# select count(*) from (select string_accum(a) from (values(''),(''),('')) AS g(a), generate_series(1,1) i) AS x(i); count --- 1 (1 row) Time: 1365.382 ms postgres=# select count(*) from (select listagg(a) from (values(''),(''),('')) AS g(a), generate_series(1,1) i) AS x(i); count --- 1 (1 row) Time: 17.989 ms So overall, it looks like listagg() is 1-2 orders of magnitude faster. YMMV, and my system is built with --enable-cassert and --enable-debug. Still, good job. Coding Review - * Is varchar.c really the best place to put the ListAggState struct and the listagg() function? I grepped the source for array_agg() and it's in src/backend/utils/adt/array_userfuncs.c. Maybe there's an equivalent file for string functions? Otherwise, the style of the C code looks fine to my untrained eye. Actually, shouldn't it return text rather than varchar? * Does it really require four functions to do its work? Might there be some way to use the array_agg() C functions and then just a different final function to turn it into a string (using the internal array_to_string() function, perhaps)? I'm not at all sure about it, but given how little code was required to create the same basic functionality in SQL, I'm surprised that the C implementation requires four functions (accumStringResult(), listagg1_transfn(), listagg2_transfn(), and listagg_finalfn()). Maybe they're required to make it fast and avoid the overhead of an array? * No compiler warnings, I never made it crash, good comments, does what it says on the tin. I doubt that there are any portability issues, as the code seems to use standard PostgreSQL internal macros and functions. Architecture Review --- * No dependencies, things seem to make sense overall, notwithstanding my questions in the Coding Review. Review Review - The only thing I haven't covered so far is the name. I agree with Tom's assertion that the name is awful. Sure there may be a precedent in Oracle, but I hardly find that convincing (some of the big corporations seem to do a really shitty job naming