subject:"\[HACKERS\] Review\: listagg aggregate"

Re: [HACKERS] Review: listagg aggregate

2010-02-01 Thread Thom Brown

I noticed that the regression test results don't include the following case:

select string_agg(a) from (values(null),(null)) g(a);

There is one similar where a delimiter is provided.

Which leads me to ask for clarification, would it be safe to assume that
string_agg can never itself return null?

Thom

Re: [HACKERS] Review: listagg aggregate

2010-02-01 Thread Pavel Stehule

2010/2/1 Thom Brown thombr...@gmail.com:
 I noticed that the regression test results don't include the following case:

 select string_agg(a) from (values(null),(null)) g(a);

 There is one similar where a delimiter is provided.

 Which leads me to ask for clarification, would it be safe to assume that
 string_agg can never itself return null?

if all values are null, then result is null.

Pavel


 Thom



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-02-01 Thread Thom Brown

On 1 February 2010 13:40, Pavel Stehule pavel.steh...@gmail.com wrote:

 2010/2/1 Thom Brown thombr...@gmail.com:
  I noticed that the regression test results don't include the following
 case:
 
  select string_agg(a) from (values(null),(null)) g(a);
 
  There is one similar where a delimiter is provided.
 
  Which leads me to ask for clarification, would it be safe to assume that
  string_agg can never itself return null?

 if all values are null, then result is null.

 Pavel



Ah, I was looking at the expected results, and couldn't see a NULL outcome,
but then these aren't indicated in such results anyway.

Thom

Re: [HACKERS] Review: listagg aggregate

2010-02-01 Thread Robert Haas

On Sun, Jan 31, 2010 at 10:29 PM, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:
 Applied with some editorialization. Please check the docs are good enough.

Looks pretty good.  I have committed a couple very minor adjustments.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-31 Thread Takahiro Itagaki


Robert Haas robertmh...@gmail.com wrote:

  ok - I am only one who like original behave - so I ?am withdrawing.
  Robert, If you like, please commit Itagaki's patch. + add note about
  behave when second parameter isn't stable.
 
 I haven't even looked at this code - I sort of assumed Itagaki was
 handling this one.  But it might be good to make sure that the docs
 have been read through by a native English speaker prior to commit...

Applied with some editorialization. Please check the docs are good enough.

I removed a test pattern for variable delimiters from regression test
because it might be an undefined behavior. We might change the behavior
in the future, so we guarantee only constant delimiters.

The transition functions are renamed to string_agg_transfn and
string_agg_delim_transfn. We cannot use string_agg_transfn for
both names because we cast the function name as regproc in bootstrap;
it complains about duplicated function names.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-31 Thread Pavel Stehule

2010/2/1 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp:

 Robert Haas robertmh...@gmail.com wrote:

  ok - I am only one who like original behave - so I ?am withdrawing.
  Robert, If you like, please commit Itagaki's patch. + add note about
  behave when second parameter isn't stable.

 I haven't even looked at this code - I sort of assumed Itagaki was
 handling this one.  But it might be good to make sure that the docs
 have been read through by a native English speaker prior to commit...

 Applied with some editorialization. Please check the docs are good enough.

 I removed a test pattern for variable delimiters from regression test
 because it might be an undefined behavior. We might change the behavior
 in the future, so we guarantee only constant delimiters.

 The transition functions are renamed to string_agg_transfn and
 string_agg_delim_transfn. We cannot use string_agg_transfn for
 both names because we cast the function name as regproc in bootstrap;
 it complains about duplicated function names.

thank you very much

Pavel Stehule


 Regards,
 ---
 Takahiro Itagaki
 NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-29 Thread Robert Haas

On Fri, Jan 29, 2010 at 2:43 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2010/1/28 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 with get_fn_expr_arg_stable() we are able to fix second parameter
 without some performance issues.

 No, that will create its own performance issues ---
 get_fn_expr_arg_stable isn't especially cheap.
 If there were a really strong reason why we had to do it,
 then I'd agree, but frankly the argument for disallowing
 a variable delimiter is too thin.

 it could be called only once. But I agree, so could be better, check
 it in parser time.

 ok - I am only one who like original behave - so I  am withdrawing.
 Robert, If you like, please commit Itagaki's patch. + add note about
 behave when second parameter isn't stable.

I haven't even looked at this code - I sort of assumed Itagaki was
handling this one.  But it might be good to make sure that the docs
have been read through by a native English speaker prior to commit...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-29 Thread David E. Wheeler

On Jan 29, 2010, at 10:43 AM, Robert Haas wrote:

 I haven't even looked at this code - I sort of assumed Itagaki was
 handling this one.  But it might be good to make sure that the docs
 have been read through by a native English speaker prior to commit...

I did and revised them slightly. There isn't much, just a brief comment in the 
table of aggregate functions. The documentation for all the functions on that 
page could use a little love, frankly.

Best,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-29 Thread Robert Haas

On Fri, Jan 29, 2010 at 1:45 PM, David E. Wheeler da...@kineticode.com wrote:
 On Jan 29, 2010, at 10:43 AM, Robert Haas wrote:

 I haven't even looked at this code - I sort of assumed Itagaki was
 handling this one.  But it might be good to make sure that the docs
 have been read through by a native English speaker prior to commit...

 I did and revised them slightly. There isn't much, just a brief comment in 
 the table of aggregate functions. The documentation for all the functions on 
 that page could use a little love, frankly.

Want to take a short at it?

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-29 Thread David E. Wheeler

On Jan 29, 2010, at 10:46 AM, Robert Haas wrote:

 I did and revised them slightly. There isn't much, just a brief comment in 
 the table of aggregate functions. The documentation for all the functions on 
 that page could use a little love, frankly.
 
 Want to take a short at it?

ENOTUITS! /me is already sorely over-committed…

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp:

 Pavel Stehule pavel.steh...@gmail.com wrote:

 with actualised oids

 I'm checking the patch for commit, and have a couple of comments.

 * I think we cannot cache the delimiter at the first call.
  For example,
    SELECT string_agg(elem, delim)
      FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
  should return 'A+B*C' rather than 'A,B,C'.

no I dislike it. This using is nonsense.

Regards
Pavel


 * Can we use StringInfo directly as the aggregate context instead of
  StringAggState? For the first reason, we need to drop 'delimiter' field
  from struct StringAggState. Now it has only StringInfo field.

 * We'd better avoiding to call text_to_cstring() for delimitors and elements
  for performance reason. We can use appendBinaryStringInfo() here.

 My proposal patch attached.

 Also, I've not changed it yet, but it might be considerable:

 * Do we need better names for string_agg1_transfn and string_agg2_transfn?
  They are almost internal names, but we could have more
  like string_agg_with_sep_transfn.

 Comments?

 Regards,
 ---
 Takahiro Itagaki
 NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 David E. Wheeler da...@kineticode.com:
 On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

 * I think we cannot cache the delimiter at the first call.
  For example,
    SELECT string_agg(elem, delim)
      FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
  should return 'A+B*C' rather than 'A,B,C'.

 Ooh, nice.

 * Can we use StringInfo directly as the aggregate context instead of
  StringAggState? For the first reason, we need to drop 'delimiter' field
  from struct StringAggState. Now it has only StringInfo field.

 Makes sense.

no, has not.

Pavel


 * We'd better avoiding to call text_to_cstring() for delimitors and elements
  for performance reason. We can use appendBinaryStringInfo() here.

 My proposal patch attached.

 Also, I've not changed it yet, but it might be considerable:

 * Do we need better names for string_agg1_transfn and string_agg2_transfn?
  They are almost internal names, but we could have more
  like string_agg_with_sep_transfn.

 Yes please.

 Comments?

 Patch looks great, thank you!

 David




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Pavel Stehule pavel.steh...@gmail.com:
 2010/1/28 David E. Wheeler da...@kineticode.com:
 On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

 * I think we cannot cache the delimiter at the first call.
  For example,
    SELECT string_agg(elem, delim)
      FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
  should return 'A+B*C' rather than 'A,B,C'.

 Ooh, nice.

 * Can we use StringInfo directly as the aggregate context instead of
  StringAggState? For the first reason, we need to drop 'delimiter' field
  from struct StringAggState. Now it has only StringInfo field.

 Makes sense.

 no, has not.

What is use case for this behave??

Pavel



 Pavel


 * We'd better avoiding to call text_to_cstring() for delimitors and elements
  for performance reason. We can use appendBinaryStringInfo() here.

 My proposal patch attached.

 Also, I've not changed it yet, but it might be considerable:

 * Do we need better names for string_agg1_transfn and string_agg2_transfn?
  They are almost internal names, but we could have more
  like string_agg_with_sep_transfn.

 Yes please.

 Comments?

 Patch looks great, thank you!

 David





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Takahiro Itagaki


Pavel Stehule pavel.steh...@gmail.com wrote:

 2010/1/28 Pavel Stehule pavel.steh...@gmail.com:
  2010/1/28 David E. Wheeler da...@kineticode.com:
  On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:
 
  * I think we cannot cache the delimiter at the first call.
For example,
  SELECT string_agg(elem, delim)
FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
should return 'A+B*C' rather than 'A,B,C'.
 
  no, has not.
 What is use case for this behave??

I also think this usage is nonsense, but seems to be the most consistent
behavior for me. I didn't say anything about use-cases, but just capability.
Since we allow such kinds of usage for now, you need to verify the
delimiter is not changed rather than ignoring it if you want disallow
to change the delimiter during an aggregation.

Of course you can cache the first delimiter at start, and check delimiters
are not changed every calls -- but I think it is just a waste of cpu cycle.

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Robert Haas

On Thu, Jan 28, 2010 at 3:59 AM, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:
 Pavel Stehule pavel.steh...@gmail.com wrote:

 2010/1/28 Pavel Stehule pavel.steh...@gmail.com:
  2010/1/28 David E. Wheeler da...@kineticode.com:
  On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:
 
  * I think we cannot cache the delimiter at the first call.
    For example,
      SELECT string_agg(elem, delim)
        FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
    should return 'A+B*C' rather than 'A,B,C'.
 
  no, has not.
 What is use case for this behave??

 I also think this usage is nonsense, but seems to be the most consistent
 behavior for me. I didn't say anything about use-cases, but just capability.
 Since we allow such kinds of usage for now, you need to verify the
 delimiter is not changed rather than ignoring it if you want disallow
 to change the delimiter during an aggregation.

 Of course you can cache the first delimiter at start, and check delimiters
 are not changed every calls -- but I think it is just a waste of cpu cycle.

Agreed.  Not caching it seems the simplest solution.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Robert Haas robertmh...@gmail.com:
 On Thu, Jan 28, 2010 at 3:59 AM, Takahiro Itagaki
 itagaki.takah...@oss.ntt.co.jp wrote:
 Pavel Stehule pavel.steh...@gmail.com wrote:

 2010/1/28 Pavel Stehule pavel.steh...@gmail.com:
  2010/1/28 David E. Wheeler da...@kineticode.com:
  On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:
 
  * I think we cannot cache the delimiter at the first call.
    For example,
      SELECT string_agg(elem, delim)
        FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
    should return 'A+B*C' rather than 'A,B,C'.
 
  no, has not.
 What is use case for this behave??

 I also think this usage is nonsense, but seems to be the most consistent
 behavior for me. I didn't say anything about use-cases, but just capability.
 Since we allow such kinds of usage for now, you need to verify the
 delimiter is not changed rather than ignoring it if you want disallow
 to change the delimiter during an aggregation.

 Of course you can cache the first delimiter at start, and check delimiters
 are not changed every calls -- but I think it is just a waste of cpu cycle.

 Agreed.  Not caching it seems the simplest solution.

simplest could not be a best. There have to be only a const
expression. But we have not possibility to check it in pg.

Pavel




 ...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Robert Haas

On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 simplest could not be a best. There have to be only a const
 expression. But we have not possibility to check it in pg.

Well... that's an entirely arbitrary limitation.  I admit that it
doesn't seem likely that someone would want to have a variable
delimiter, but putting extra effort and code complexity into
preventing it seems pointless.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Robert Haas robertmh...@gmail.com:
 On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 simplest could not be a best. There have to be only a const
 expression. But we have not possibility to check it in pg.

 Well... that's an entirely arbitrary limitation.  I admit that it
 doesn't seem likely that someone would want to have a variable
 delimiter, but putting extra effort and code complexity into
 preventing it seems pointless.

It is only a few lines with zero complexity.

The main issue of Takahiro proposal is  unclean behave.

we can have a content

c1c2
---
c11, c12,
c21, c22

and result of string_agg(c1, c2)

have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
will be NULL ?? I checked oracle. Oracle doesn't allow variable as
delimiter. We can't check it. But we can fix first value and using it
as constant.

More - Takahiro proposal has one performance gain. It have to deTOAST
separator value in every cycle.

Takahiro has true - we can store length of separator and we can add
separator to cumulative value as binary value.

I prefer original behave with  note in documentation - so as separator
is used a value on first row, when is used expression and not
constant.

Regards
Pavel





 ...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 simplest could not be a best. There have to be only a const
 expression. But we have not possibility to check it in pg.

 Well... that's an entirely arbitrary limitation.  I admit that it
 doesn't seem likely that someone would want to have a variable
 delimiter, but putting extra effort and code complexity into
 preventing it seems pointless.

Yeah.  The real issue here is that in some cases you'd like to have
non-aggregated parameters to an aggregate, but SQL has no notation
to express that.

I think Pavel's underlying complaint is that if the delimiter
argument isn't constant, then we're exposing an implementation
dependency in terms of just which values get separated by which
delimiters.  The most practical implementation seems to be that
the first-call delimiter isn't actually used at all, and on
subsequent calls the delimiter *precedes* the associated value,
which is a bit surprising given the order in which one writes
them.  Not sure if this is worth documenting though.  Those two
or three people who actually try it will figure it out soon enough.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Robert Haas

On Thu, Jan 28, 2010 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 simplest could not be a best. There have to be only a const
 expression. But we have not possibility to check it in pg.

 Well... that's an entirely arbitrary limitation.  I admit that it
 doesn't seem likely that someone would want to have a variable
 delimiter, but putting extra effort and code complexity into
 preventing it seems pointless.

 Yeah.  The real issue here is that in some cases you'd like to have
 non-aggregated parameters to an aggregate, but SQL has no notation
 to express that.

Right.

 I think Pavel's underlying complaint is that if the delimiter
 argument isn't constant, then we're exposing an implementation
 dependency in terms of just which values get separated by which
 delimiters.  The most practical implementation seems to be that
 the first-call delimiter isn't actually used at all, and on
 subsequent calls the delimiter *precedes* the associated value,
 which is a bit surprising given the order in which one writes
 them.  Not sure if this is worth documenting though.  Those two
 or three people who actually try it will figure it out soon enough.

Yeah, I'm thoroughly unworried about it.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Robert Haas

On Wed, Jan 27, 2010 at 9:47 PM, Takahiro Itagaki
itagaki.takah...@oss.ntt.co.jp wrote:
 * Do we need better names for string_agg1_transfn and string_agg2_transfn?
  They are almost internal names, but we could have more
  like string_agg_with_sep_transfn.

Surely the names of the transition and final functions should match
the name of the aggregate.  But if there's a proposal on the table to
call this string_app_with_sep() instead of just string_agg(), +1 from
me.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Hitoshi Harada

2010/1/29 Pavel Stehule pavel.steh...@gmail.com:
 2010/1/28 Robert Haas robertmh...@gmail.com:
 On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 simplest could not be a best. There have to be only a const
 expression. But we have not possibility to check it in pg.

 Well... that's an entirely arbitrary limitation.  I admit that it
 doesn't seem likely that someone would want to have a variable
 delimiter, but putting extra effort and code complexity into
 preventing it seems pointless.

 It is only a few lines with zero complexity.

 The main issue of Takahiro proposal is  unclean behave.

 we can have a content

 c1    c2
 ---
 c11, c12,
 c21, c22

 and result of string_agg(c1, c2)

 have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
 will be NULL ?? I checked oracle. Oracle doesn't allow variable as
 delimiter. We can't check it. But we can fix first value and using it
 as constant.

What about get_fn_expr_arg_stable() to check if the argument is stable
during aggregate?

Regards,


-- 
Hitoshi Harada

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Robert Haas robertmh...@gmail.com:
 On Thu, Jan 28, 2010 at 10:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 simplest could not be a best. There have to be only a const
 expression. But we have not possibility to check it in pg.

 Well... that's an entirely arbitrary limitation.  I admit that it
 doesn't seem likely that someone would want to have a variable
 delimiter, but putting extra effort and code complexity into
 preventing it seems pointless.

 Yeah.  The real issue here is that in some cases you'd like to have
 non-aggregated parameters to an aggregate, but SQL has no notation
 to express that.

 Right.

 I think Pavel's underlying complaint is that if the delimiter
 argument isn't constant, then we're exposing an implementation
 dependency in terms of just which values get separated by which
 delimiters.  The most practical implementation seems to be that
 the first-call delimiter isn't actually used at all, and on
 subsequent calls the delimiter *precedes* the associated value,
 which is a bit surprising given the order in which one writes
 them.  Not sure if this is worth documenting though.  Those two
 or three people who actually try it will figure it out soon enough.


ok - there is one query,

in 99.99% the second argument will be a constant. Can we use this
information and optimize function for this case?

The detoast on every row can take some percent from a performance.

Pavel

 Yeah, I'm thoroughly unworried about it.



 ...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Robert Haas robertmh...@gmail.com:
 On Wed, Jan 27, 2010 at 9:47 PM, Takahiro Itagaki
 itagaki.takah...@oss.ntt.co.jp wrote:
 * Do we need better names for string_agg1_transfn and string_agg2_transfn?
  They are almost internal names, but we could have more
  like string_agg_with_sep_transfn.

 Surely the names of the transition and final functions should match
 the name of the aggregate.  But if there's a proposal on the table to
 call this string_app_with_sep() instead of just string_agg(), +1 from
 me.

no, string_app_with_sep is too long for common using.

Pavel


 ...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Tom Lane

Pavel Stehule pavel.steh...@gmail.com writes:
 in 99.99% the second argument will be a constant. Can we use this
 information and optimize function for this case?

 The detoast on every row can take some percent from a performance.

What detoast?  There won't be one for a constant, nor even for a
variable in any sane situation --- who's going to be using
multi-kilobyte delimiter values?  And if they do, aren't they likely
to run out of memory for the result long before the repeated detoasts
become an interesting problem?  You're arguing about a case that
seems quite irrelevant to the real world.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Hitoshi Harada umi.tan...@gmail.com:
 2010/1/29 Pavel Stehule pavel.steh...@gmail.com:
 2010/1/28 Robert Haas robertmh...@gmail.com:
 On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 simplest could not be a best. There have to be only a const
 expression. But we have not possibility to check it in pg.

 Well... that's an entirely arbitrary limitation.  I admit that it
 doesn't seem likely that someone would want to have a variable
 delimiter, but putting extra effort and code complexity into
 preventing it seems pointless.

 It is only a few lines with zero complexity.

 The main issue of Takahiro proposal is  unclean behave.

 we can have a content

 c1    c2
 ---
 c11, c12,
 c21, c22

 and result of string_agg(c1, c2)

 have to be ?? c11 c12 c21 or c11 c22 c21 ?? What if some content of c2
 will be NULL ?? I checked oracle. Oracle doesn't allow variable as
 delimiter. We can't check it. But we can fix first value and using it
 as constant.

 What about get_fn_expr_arg_stable() to check if the argument is stable
 during aggregate?

I newer know so this function exists. Now we can

a) check and allow only stable params
b) when second parameter is stable, then store it and use it as constant.

I prefer a)

Pavel


 Regards,


 --
 Hitoshi Harada


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 in 99.99% the second argument will be a constant. Can we use this
 information and optimize function for this case?

 The detoast on every row can take some percent from a performance.

 What detoast?  There won't be one for a constant, nor even for a
 variable in any sane situation --- who's going to be using
 multi-kilobyte delimiter values?  And if they do, aren't they likely
 to run out of memory for the result long before the repeated detoasts
 become an interesting problem?  You're arguing about a case that
 seems quite irrelevant to the real world.


I asking

with get_fn_expr_arg_stable() we are able to fix second parameter
without some performance issues.

Regards
Pavel


                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Tom Lane

Hitoshi Harada umi.tan...@gmail.com writes:
 What about get_fn_expr_arg_stable() to check if the argument is stable
 during aggregate?

Seems quite expensive (possible catalog lookups) and there still isn't
any strong argument why we should bother.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Tom Lane

Pavel Stehule pavel.steh...@gmail.com writes:
 with get_fn_expr_arg_stable() we are able to fix second parameter
 without some performance issues.

No, that will create its own performance issues ---
get_fn_expr_arg_stable isn't especially cheap.
If there were a really strong reason why we had to do it,
then I'd agree, but frankly the argument for disallowing
a variable delimiter is too thin.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Marko Tiikkaja

On 2010-01-28 19:17, Pavel Stehule wrote:
 2010/1/28 Hitoshi Harada umi.tan...@gmail.com:
 What about get_fn_expr_arg_stable() to check if the argument is stable
 during aggregate?
 
 I newer know so this function exists. Now we can
 
 a) check and allow only stable params
 b) when second parameter is stable, then store it and use it as constant.
 
 I prefer a)

Someone might have a perfectly good use case for using different
delimiters.  I don't think it's a good idea to be artificially limiting
what you can and can't do.


Regards,
Marko Tiikkaja

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread David E. Wheeler

On Jan 28, 2010, at 9:29 AM, Marko Tiikkaja wrote:

 Someone might have a perfectly good use case for using different
 delimiters.  I don't think it's a good idea to be artificially limiting
 what you can and can't do.

+1

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Greg Stark

One situation where this could actually matter in the long term is if we
want to have an optimization for aggregate functions whose state variables
can be combined. this could be important if we ever want to do parallel
processing someday.

So we could have, for example two subjobs build two sublists and then
combiner the two lists later. in that case the separator might not be the
one the user expected - our put another way the one the user expected might
not be available when we need it.

We could say this isn't a problem because not all aggregate functions will
be amenable to such an optimization and perhaps this will just be one of
them. Alternately we could just have faith that a solution will be found -
it doesn't seem like it should be an insoluble problem to me.

greg

On 28 Jan 2010 15:57, Tom Lane t...@sss.pgh.pa.us wrote:

Robert Haas robertmh...@gmail.com writes:
 On Thu, Jan 28, 2010 at 9:01 AM, Pavel Stehule pavel
Yeah.  The real issue here is that in some cases you'd like to have
non-aggregated parameters to an aggregate, but SQL has no notation
to express that.

I think Pavel's underlying complaint is that if the delimiter
argument isn't constant, then we're exposing an implementation
dependency in terms of just which values get separated by which
delimiters.  The most practical implementation seems to be that
the first-call delimiter isn't actually used at all, and on
subsequent calls the delimiter *precedes* the associated value,
which is a bit surprising given the order in which one writes
them.  Not sure if this is worth documenting though.  Those two
or three people who actually try it will figure it out soon enough.

   regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subs...

Re: [HACKERS] Review: listagg aggregate

2010-01-28 Thread Pavel Stehule

2010/1/28 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 with get_fn_expr_arg_stable() we are able to fix second parameter
 without some performance issues.

 No, that will create its own performance issues ---
 get_fn_expr_arg_stable isn't especially cheap.
 If there were a really strong reason why we had to do it,
 then I'd agree, but frankly the argument for disallowing
 a variable delimiter is too thin.

it could be called only once. But I agree, so could be better, check
it in parser time.

ok - I am only one who like original behave - so I  am withdrawing.
Robert, If you like, please commit Itagaki's patch. + add note about
behave when second parameter isn't stable.

Regards
Pavel Stehule




                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-27 Thread Pavel Stehule

with actualised oids

Regards
Pavel Stehule

2010/1/26 David E. Wheeler da...@kineticode.com:
 On Jan 25, 2010, at 6:56 AM, Pavel Stehule wrote:

 actualised patch - the name is string_agg

 All looks fine except I'm getting this error during initdb:

 creating template1 database in /usr/local/pgsql-devel/data/base/1 ... FATAL:  
 could not create unique index pg_proc_oid_index
 DETAIL:  Key (oid)=(3031) is duplicated.
 child process exited with exit code 1

 Would you mind re-submitting with unique OIDs?

 Thanks,

 David


string_agg.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-27 Thread Robert Haas

On Tue, Jan 26, 2010 at 1:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 But what it *produces* is a string.  For comparison, the
 SQL-standard-specified array_agg produces arrays, but what it
 acts on isn't an array.

 This point is well-taken, but naming it string_agg() because it
 produces a string doesn't seem quite descriptive enough.  We might
 someday (if we don't already) have a number of aggregates that produce
 an output that is a string; we can't name them all by the output type.

 True, but the same point could be made against array_agg, and that
 didn't stop the committee from choosing that name.  As long as
 string_agg is the most obvious aggregate-to-string functionality,
 which ISTM it is, I think it's all right for it to have pride of place
 in naming.

Maybe so, but personally, I'd still prefer something more descriptive.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-27 Thread David E. Wheeler

On Jan 27, 2010, at 7:58 AM, Pavel Stehule wrote:

 with actualised oids

Thanks. Looks good, modulo my preference for concat_agg(). I'll mark it ready 
for committer.

Best,

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-27 Thread Takahiro Itagaki


Pavel Stehule pavel.steh...@gmail.com wrote:

 with actualised oids

I'm checking the patch for commit, and have a couple of comments.

* I think we cannot cache the delimiter at the first call.
  For example,
SELECT string_agg(elem, delim)
  FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
  should return 'A+B*C' rather than 'A,B,C'.

* Can we use StringInfo directly as the aggregate context instead of
  StringAggState? For the first reason, we need to drop 'delimiter' field
  from struct StringAggState. Now it has only StringInfo field.

* We'd better avoiding to call text_to_cstring() for delimitors and elements
  for performance reason. We can use appendBinaryStringInfo() here.

My proposal patch attached.

Also, I've not changed it yet, but it might be considerable:

* Do we need better names for string_agg1_transfn and string_agg2_transfn?
  They are almost internal names, but we could have more 
  like string_agg_with_sep_transfn.

Comments?

Regards,
---
Takahiro Itagaki
NTT Open Source Software Center



string_agg_20100128.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-27 Thread David E. Wheeler

On Jan 27, 2010, at 6:47 PM, Takahiro Itagaki wrote:

 * I think we cannot cache the delimiter at the first call.
  For example,
SELECT string_agg(elem, delim)
  FROM (VALUES('A', ','), ('B', '+'), ('C', '*')) t(elem, delim);
  should return 'A+B*C' rather than 'A,B,C'.

Ooh, nice.

 * Can we use StringInfo directly as the aggregate context instead of
  StringAggState? For the first reason, we need to drop 'delimiter' field
  from struct StringAggState. Now it has only StringInfo field.

Makes sense.

 * We'd better avoiding to call text_to_cstring() for delimitors and elements
  for performance reason. We can use appendBinaryStringInfo() here.
 
 My proposal patch attached.
 
 Also, I've not changed it yet, but it might be considerable:
 
 * Do we need better names for string_agg1_transfn and string_agg2_transfn?
  They are almost internal names, but we could have more 
  like string_agg_with_sep_transfn.

Yes please.

 Comments?

Patch looks great, thank you!

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread David E. Wheeler

On Jan 25, 2010, at 23:14, Pavel Stehule pavel.steh...@gmail.com  
wrote:



why is concat_agg better than listagg ?


Because it's an aggregate that cocatenates values. It's not an  
aggregate that lists things. I also like concat_agg better than  
string_agg because it's not limited to acting on strings.


Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Alastair Turner

On Tue, Jan 26, 2010 at 1:08 PM, David E. Wheeler da...@kineticode.com wrote:

.


 Because it's an aggregate that cocatenates values. It's not an aggregate
 that lists things. I also like concat_agg better than string_agg because
 it's not limited to acting on strings.


.

Given that it potentially produces a delimited list, not a straight
conacatenation (and that list is unacceptable since it would be
descriptive as a noun but not as a verb) would implode_agg not be the
most descriptive name?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Alastair Turner

On Tue, Jan 26, 2010 at 1:23 PM, Alastair Turner b...@ctrlf5.co.za wrote:
 .

 Given that it potentially produces a delimited list, not a straight
 conacatenation (and that list is unacceptable since it would be
 descriptive as a noun but not as a verb) would implode_agg not be the
 most descriptive name?


Actually, scratch that. The other *agg functions are named for what
they produce as output. Not for their process - as per the objection
to list_agg and suggestions of conact_agg and implode_agg. string_agg
would be consistent, which is a wonderful thing if you can get it in a
naming scheme.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Peter Eisentraut

On tis, 2010-01-26 at 03:08 -0800, David E. Wheeler wrote:
 On Jan 25, 2010, at 23:14, Pavel Stehule pavel.steh...@gmail.com  
 wrote:
 
  why is concat_agg better than listagg ?
 
 Because it's an aggregate that cocatenates values. It's not an  
 aggregate that lists things. I also like concat_agg better than  
 string_agg because it's not limited to acting on strings.

What else can it act on?


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Robert Haas

On Tue, Jan 26, 2010 at 2:14 AM, Pavel Stehule pavel.steh...@gmail.com wrote:
 2010/1/25 Robert Haas robertmh...@gmail.com:
 On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com 
 wrote:
 On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:
 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum

 concat_agg().

 I like that one...

 why is concat_agg better than listagg ?

Because it doesn't make lists.

Honestly, I don't love concat_agg() either - why should something need
to have agg in the name just because it's an aggregate?  I think the
most descriptive name would be something like
concatenate_with_separator(), but that's kind of long.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Pavel Stehule

2010/1/26 Robert Haas robertmh...@gmail.com:
 On Tue, Jan 26, 2010 at 2:14 AM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 2010/1/25 Robert Haas robertmh...@gmail.com:
 On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com 
 wrote:
 On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:
 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum

 concat_agg().

 I like that one...

 why is concat_agg better than listagg ?

 Because it doesn't make lists.

 Honestly, I don't love concat_agg() either - why should something need
 to have agg in the name just because it's an aggregate?  I think the
 most descriptive name would be something like
 concatenate_with_separator(), but that's kind of long.

This is never ending story :)

MySQL has function concate_ws - but this function has different semantic.

I thing so string_agg is short, and from one view consistent

Pavel


 ...Robert


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread David E. Wheeler


On Jan 26, 2010, at 4:03, Peter Eisentraut pete...@gmx.net wrote:


What else can it act on?


Any data type, since they all can be converted to text. Integers would  
be a common choice.


David

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Tom Lane

Pavel Stehule pavel.steh...@gmail.com writes:
 2010/1/25 Robert Haas robertmh...@gmail.com:
 On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com 
 wrote:
 concat_agg().
 
 I like that one...

 why is concat_agg better than listagg ?

It isn't ... it's the wrong part of speech.  concatenate is a verb,
whereas the other functions we would like it to be named parallel to
are using nouns there.

(Yes, I know array can be used as a verb, but I don't think anyone
reads it that way in array_agg...)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Kevin Grittner

Tom Lane t...@sss.pgh.pa.us wrote:
 Pavel Stehule pavel.steh...@gmail.com writes:
 
 why is concat_agg better than listagg ?
 
 It isn't ... it's the wrong part of speech.  concatenate is a
 verb,
 
Concatenation is a noun.  concat doesn't get far enough to
distinguish.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Tom Lane

David E. Wheeler da...@kineticode.com writes:
 Because it's an aggregate that cocatenates values. It's not an  
 aggregate that lists things. I also like concat_agg better than  
 string_agg because it's not limited to acting on strings.

But what it *produces* is a string.  For comparison, the
SQL-standard-specified array_agg produces arrays, but what it
acts on isn't an array.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread David E. Wheeler

On Jan 26, 2010, at 9:36 AM, Tom Lane wrote:

 But what it *produces* is a string.  For comparison, the
 SQL-standard-specified array_agg produces arrays, but what it
 acts on isn't an array.

Meh. This is all just bike-shedding. I'm fine with string_agg(), though in 
truth none of the names has really been great. The inclusion of agg in the 
name is unfortunate.

I'll have a look at Pavel's new patch now.

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Tom Lane

David E. Wheeler da...@kineticode.com writes:
 Meh. This is all just bike-shedding. I'm fine with string_agg(), though in 
 truth none of the names has really been great. The inclusion of agg in the 
 name is unfortunate.

Yeah, I wouldn't be for it either if it weren't for the precedent of
array_agg.  I was quite surprised that the SQL committee chose that
name, because they've avoided using the term aggregate function at
all, but there it is ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread David E. Wheeler

On Jan 25, 2010, at 6:56 AM, Pavel Stehule wrote:

 actualised patch - the name is string_agg

All looks fine except I'm getting this error during initdb:

creating template1 database in /usr/local/pgsql-devel/data/base/1 ... FATAL:  
could not create unique index pg_proc_oid_index
DETAIL:  Key (oid)=(3031) is duplicated.
child process exited with exit code 1

Would you mind re-submitting with unique OIDs?

Thanks,

David
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Robert Haas

On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 David E. Wheeler da...@kineticode.com writes:
 Because it's an aggregate that cocatenates values. It's not an
 aggregate that lists things. I also like concat_agg better than
 string_agg because it's not limited to acting on strings.

 But what it *produces* is a string.  For comparison, the
 SQL-standard-specified array_agg produces arrays, but what it
 acts on isn't an array.

This point is well-taken, but naming it string_agg() because it
produces a string doesn't seem quite descriptive enough.  We might
someday (if we don't already) have a number of aggregates that produce
an output that is a string; we can't name them all by the output type.

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-26 Thread Tom Lane

Robert Haas robertmh...@gmail.com writes:
 On Tue, Jan 26, 2010 at 12:36 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 But what it *produces* is a string.  For comparison, the
 SQL-standard-specified array_agg produces arrays, but what it
 acts on isn't an array.

 This point is well-taken, but naming it string_agg() because it
 produces a string doesn't seem quite descriptive enough.  We might
 someday (if we don't already) have a number of aggregates that produce
 an output that is a string; we can't name them all by the output type.

True, but the same point could be made against array_agg, and that
didn't stop the committee from choosing that name.  As long as
string_agg is the most obvious aggregate-to-string functionality,
which ISTM it is, I think it's all right for it to have pride of place
in naming.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Peter Eisentraut

On sön, 2010-01-24 at 21:29 -0800, Scott Bailey wrote:
 I think listagg or string_agg would be the most appropriate names. Oh 
 and before Oracle had wm_concat, Tom Kyte wrote a function called
 stragg that was pretty popular.

Well,

xmlagg - concatenates values to form xml datum
array_agg - concatenates values to form array datum
??? - concatenates values to form string datum

So it's pretty clear that listagg does not fit into this scheme.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Pavel Stehule

2010/1/25 Peter Eisentraut pete...@gmx.net:
 On sön, 2010-01-24 at 21:29 -0800, Scott Bailey wrote:
 I think listagg or string_agg would be the most appropriate names. Oh
 and before Oracle had wm_concat, Tom Kyte wrote a function called
 stragg that was pretty popular.

 Well,

 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum

 So it's pretty clear that listagg does not fit into this scheme.

when you define list as text domain, then this the name is correct.

searching list sql string on google. You can see, so list is usually used.

Regards
Pavel



 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Tom Lane

Pavel Stehule pavel.steh...@gmail.com writes:
 2010/1/25 Peter Eisentraut pete...@gmx.net:
 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum
 
 So it's pretty clear that listagg does not fit into this scheme.

 when you define list as text domain, then this the name is correct.

IOW, if you define away the problem then there's no problem?

I agree that list is a terrible choice of name here.  string_agg
seemed reasonable and in keeping with the standardized array_agg.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Pavel Stehule

2010/1/25 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 2010/1/25 Peter Eisentraut pete...@gmx.net:
 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum

 So it's pretty clear that listagg does not fit into this scheme.

 when you define list as text domain, then this the name is correct.

 IOW, if you define away the problem then there's no problem?

 I agree that list is a terrible choice of name here.  string_agg
 seemed reasonable and in keeping with the standardized array_agg.


I am not happy, I thing so we do bigger chaos then it is. But it
hasn't progress. So I agree with name string_agg. In this case isn't a
problem rename this function if somebody would.

I'll send patch over hour.

regards
Pavel Stehule

                        regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Pavel Stehule

2010/1/25 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
 2010/1/25 Peter Eisentraut pete...@gmx.net:
 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum

 So it's pretty clear that listagg does not fit into this scheme.

 when you define list as text domain, then this the name is correct.

 IOW, if you define away the problem then there's no problem?

 I agree that list is a terrible choice of name here.  string_agg
 seemed reasonable and in keeping with the standardized array_agg.

actualised patch - the name is string_agg

regards
Pavel Stehule


                        regards, tom lane



string_agg.diff
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Pavel Stehule

2010/1/24 Tom Lane t...@sss.pgh.pa.us:
 Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:
 No performance issues

 ISTM that this class of function is inherently dangerous performance
 wise.

 * It looks incredibly easy to construct enormous lists. We should test
 the explosion limit of this to see how it is handled. Perhaps we need
 some parameter limits to control that, depending upon results.

 * Optimizer doesn't consider whether the result type of an aggregate get
 bigger as the aggregate processes more rows. If we're adding this
 function we should give some thought in that area also, or at least a
 comment to note that it can and will cause the optimizer problems in
 complex queries.

 We have that problem already with array_agg(), and I don't recall many
 complaints about it.  It might be worth worrying about at some point,
 but I don't think it's reasonable to insist that it be fixed before
 any more such aggregates are created.

 I agree that testing-to-failure would be a good idea just to be sure it
 fails cleanly.

postgres=# \timing
Timing is on.
postgres=# select
pg_size_pretty(length(string_agg('012345678901234567890'::text,',')))
from generate_series(1,1000) g(i);
 pg_size_pretty

 210 MB
(1 row)

Time: 5831,218 ms
postgres=# select
pg_size_pretty(length(string_agg('012345678901234567890'::text,',')))
from generate_series(1,5000) g(i);
^[^[ERROR:  out of memory
DETAIL:  Cannot enlarge string buffer containing 1073741812 bytes by
21 more bytes.
postgres=#

I thing, so 210 MB is more then is necessary :)

Regards
Pavel Stehule



                        regards, tom lane

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread David E. Wheeler

On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:

 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum

concat_agg().

David

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread David E. Wheeler

On Jan 25, 2010, at 6:12 AM, Pavel Stehule wrote:

 I am not happy, I thing so we do bigger chaos then it is. But it
 hasn't progress. So I agree with name string_agg. In this case isn't a
 problem rename this function if somebody would.

Could you not use CREATE AGGREGATE to create an alias named listagg if you 
needed it? Or are we limited to a single argument in that case?

David




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Robert Haas

On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com wrote:
 On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:
 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum

 concat_agg().

I like that one...

...Robert

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Pavel Stehule

2010/1/25 David E. Wheeler da...@kineticode.com:
 On Jan 25, 2010, at 6:12 AM, Pavel Stehule wrote:

 I am not happy, I thing so we do bigger chaos then it is. But it
 hasn't progress. So I agree with name string_agg. In this case isn't a
 problem rename this function if somebody would.

 Could you not use CREATE AGGREGATE to create an alias named listagg if you 
 needed it? Or are we limited to a single argument in that case?

we can, but without one common known name we will have a propriety name.

Pavel


 David





-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-25 Thread Pavel Stehule

2010/1/25 Robert Haas robertmh...@gmail.com:
 On Mon, Jan 25, 2010 at 2:27 PM, David E. Wheeler da...@kineticode.com 
 wrote:
 On Jan 25, 2010, at 2:09 AM, Peter Eisentraut wrote:
 xmlagg - concatenates values to form xml datum
 array_agg - concatenates values to form array datum
 ??? - concatenates values to form string datum

 concat_agg().

 I like that one...


why is concat_agg better than listagg ?

Pavel

 ...Robert

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-24 Thread Pavel Stehule

Hello

2010/1/22 David E. Wheeler da...@kineticode.com:
 Pavel,

 My review of your listagg patch.

 Submission Review
 -
 * The diff is a context diff and applies cleanly to HEAD (with just two hunks 
 offset by 2 lines each).

 * There is documentation, though I'm not sure it needs to be mentioned in the 
 string functions documentation. No harm in it, I guess.

  I would like to see an example, though, and the documentation does not 
 currently explain what each of the parameters are for. In fact, it looks like 
 all the existing aggregates take only one parameter, so there was not 
 previously a need to explain it. But listagg() has an optional second param. 
 I think that the description should explain what it's for.


can I help with it, please. My English is terrible.

 * There are tests and they look fine.

 Usability Review
 
 * The patch does in fact implement the aggregate function it describes, and 
 OH YES do we want it (I've written my own in SQL a few times).

 * No, we don't already have it.

 * Yes it follows community-agreed behavior. I'm assuming that there is no 
 special parsing of aggregate functions, so the simple use of commas to 
 separate the two parameters is appropriate, rather than using a keyword like 
 MySQL's SEPARATOR in the group_concat() aggregate.

 * No need to have pg_dump support, no dangers that I can see, looks like all 
 the bases have been covered.

 Feature Test
 
 * Everything built cleanly, but I got an OID dupe error when I tried to init 
 the DB. Looks like 2997 and 2998 have been used for something else since you 
 created the patch. I changed them to 2995 and 2996 and then it worked.
 * The feature appears to work. I didn't see any tests for encodings or other 
 data types, so I ran a few myself and they work fine:

 postgres=# select listagg(a, U'-\0441\043B\043E\043D-') from 
 (values(''),(''),(''
         listagg
 --
  -слон--слон-
 (1 row)

 postgres=# select listagg(a, U'\2014') from 
 (values(U'\0441\043B\043E\043D'),(U'd\0061t\+61'),(U'\0441\043B\043E\043D'))
  AS g(a);
    listagg
 
  слон—data—слон
 (1 row)


 postgres=# select listagg(a::text) from (values(1),(2),(3)) AS g(a);
  listagg
 -
  123
 (1 row)


 Performance Review
 --

 No performance issues, except that it should be faster than a custom 
 aggregate that does the same thing. To test, I created a quick custom 
 aggregate (no second argument, alas, so listagg() is more flexible) like so:

    CREATE OR REPLACE FUNCTION a2s(ANYARRAY)
    RETURNS TEXT LANGUAGE SQL AS $$
        SELECT array_to_string($1, ',');
    $$;

   CREATE AGGREGATE string_accum (
        SFUNC    = array_accum,
        BASETYPE = ANYELEMENT,
        STYPE    = ANYARRAY,
        INITCOND = '{}',
        FINALFUNC = a2s
    );

 Then I ran some simple tests (thanks for the clue, depesz):

 postgres=# select count(*) from (select string_accum(a) from 
 (values(''),(''),('')) AS g(a), generate_series(1,1) i) AS 
 x(i);
  count
 ---
     1
 (1 row)

 Time: 1365.382 ms

 postgres=# select count(*) from (select listagg(a) from 
 (values(''),(''),('')) AS g(a), generate_series(1,1) i) AS 
 x(i);
  count
 ---
     1
 (1 row)

 Time: 17.989 ms

 So overall, it looks like listagg() is 1-2 orders of magnitude faster. YMMV, 
 and my system is built with --enable-cassert and --enable-debug. Still, good 
 job.

 Coding Review
 -

 * Is varchar.c really the best place to put the ListAggState struct and the 
 listagg() function? I grepped the source for array_agg() and it's in 
 src/backend/utils/adt/array_userfuncs.c. Maybe there's an equivalent file for 
 string functions? Otherwise, the style of the C code looks fine to my 
 untrained eye.


array user functions are used more time in pg core. The complexity of
array functions are much higher, so I don't think we need special
file.

  Actually, shouldn't it return text rather than varchar?


I'll recheck it. I am sure so all parameters should be a text.

 * Does it really require four functions to do its work? Might there be some 
 way to use the array_agg() C functions and then just a different final 
 function to turn it into a string (using the internal array_to_string() 
 function, perhaps)?

We can, but it isn't good way. Processing of arrays is little bit more
expensive then processing plain text. It is reason why listagg is
faster, than your custom aggregate. More, the final function could be
faster - the content is final.

 I'm not at all sure about it, but given how little code was required
to create the same basic functionality in SQL, I'm surprised that the
C implementation requires four functions (accumStringResult(),
listagg1_transfn(), listagg2_transfn(), and listagg_finalfn()). Maybe
they're required to make it fast and avoid the overhead of an array?

It normal for aggregate

Re: [HACKERS] Review: listagg aggregate

2010-01-24 Thread David E. Wheeler

On Jan 24, 2010, at 1:19 AM, Pavel Stehule wrote:

 can I help with it, please. My English is terrible.

Yes, I added a bit in the patch I submitted.

 array user functions are used more time in pg core. The complexity of
 array functions are much higher, so I don't think we need special
 file.

Okay. Should have tried it in PL/pgSQL then, perhaps.

 I'll recheck it. I am sure so all parameters should be a text.

Probably shouldn't go into varchar.c then, yes?

 We can, but it isn't good way. Processing of arrays is little bit more
 expensive then processing plain text. It is reason why listagg is
 faster, than your custom aggregate. More, the final function could be
 faster - the content is final.

Understood.

 It normal for aggregate functions. We need more transfn function,
 because we need two two variant: listagg(col), listagg(col, sep). Our
 coding guidlines doesn't advice share C functions - but these
 functions are +/- wrapper for accumStringResult - so there is zero
 overhead.

Ah, okay, it's because of the second argument. Now I understand.

 I don't think. When we have function, with same parameters, same
 behave like some Oracle function, then I am strongly prefer Oracle
 name. I don't see any benefit from different name. It can only confuse
 developers and add the trable to people who porting applications.

Meh. If the name is terrible, we don't have to use it, and it's easy enough to 
create an alias in SQL for those who need it.

Best,

David


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-24 Thread Simon Riggs

On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:
 No performance issues

ISTM that this class of function is inherently dangerous performance
wise. 

* It looks incredibly easy to construct enormous lists. We should test
the explosion limit of this to see how it is handled. Perhaps we need
some parameter limits to control that, depending upon results.

* Optimizer doesn't consider whether the result type of an aggregate get
bigger as the aggregate processes more rows. If we're adding this
function we should give some thought in that area also, or at least a
comment to note that it can and will cause the optimizer problems in
complex queries.

-- 
 Simon Riggs   www.2ndQuadrant.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-24 Thread Pavel Stehule

2010/1/24 Simon Riggs si...@2ndquadrant.com:
 On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:
 No performance issues

 ISTM that this class of function is inherently dangerous performance
 wise.

there is potencial risk, but this risk isn't new. The almost all what
you say is true for array aggregates.


 * It looks incredibly easy to construct enormous lists. We should test
 the explosion limit of this to see how it is handled. Perhaps we need
 some parameter limits to control that, depending upon results.

There are no limit for generating large values - like bytea, xml, or
text. There are not limit for array_accum or array(query). So, I don't
think we need some special mechanism for listagg. If somebody will
generate too large a value, then he will get out of memory
exception.


 * Optimizer doesn't consider whether the result type of an aggregate get
 bigger as the aggregate processes more rows. If we're adding this
 function we should give some thought in that area also, or at least a
 comment to note that it can and will cause the optimizer problems in
 complex queries.


this is true, but this isn't some new. array_accum working well
without optimizer problems.

 --
  Simon Riggs           www.2ndQuadrant.com


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-24 Thread Tom Lane

Simon Riggs si...@2ndquadrant.com writes:
 On Fri, 2010-01-22 at 11:14 -0800, David E.Wheeler wrote:
 No performance issues

 ISTM that this class of function is inherently dangerous performance
 wise. 

 * It looks incredibly easy to construct enormous lists. We should test
 the explosion limit of this to see how it is handled. Perhaps we need
 some parameter limits to control that, depending upon results.

 * Optimizer doesn't consider whether the result type of an aggregate get
 bigger as the aggregate processes more rows. If we're adding this
 function we should give some thought in that area also, or at least a
 comment to note that it can and will cause the optimizer problems in
 complex queries.

We have that problem already with array_agg(), and I don't recall many
complaints about it.  It might be worth worrying about at some point,
but I don't think it's reasonable to insist that it be fixed before
any more such aggregates are created.

I agree that testing-to-failure would be a good idea just to be sure it
fails cleanly.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-24 Thread Pavel Stehule

2010/1/24 David E. Wheeler da...@kineticode.com:
 On Jan 24, 2010, at 1:19 AM, Pavel Stehule wrote:

 can I help with it, please. My English is terrible.

 Yes, I added a bit in the patch I submitted.

 array user functions are used more time in pg core. The complexity of
 array functions are much higher, so I don't think we need special
 file.

 Okay. Should have tried it in PL/pgSQL then, perhaps.


:) - I'll look on it again.

 I'll recheck it. I am sure so all parameters should be a text.

 Probably shouldn't go into varchar.c then, yes?

 We can, but it isn't good way. Processing of arrays is little bit more
 expensive then processing plain text. It is reason why listagg is
 faster, than your custom aggregate. More, the final function could be
 faster - the content is final.

 Understood.

 It normal for aggregate functions. We need more transfn function,
 because we need two two variant: listagg(col), listagg(col, sep). Our
 coding guidlines doesn't advice share C functions - but these
 functions are +/- wrapper for accumStringResult - so there is zero
 overhead.

 Ah, okay, it's because of the second argument. Now I understand.

 I don't think. When we have function, with same parameters, same
 behave like some Oracle function, then I am strongly prefer Oracle
 name. I don't see any benefit from different name. It can only confuse
 developers and add the trable to people who porting applications.

 Meh. If the name is terrible, we don't have to use it, and it's easy enough 
 to create an alias in SQL for those who need it.


The list is common name for this content - it is usual in Microsoft
SQL Server, it is usual in Oracle. Maybe we can vote about the name

Regards
Pavel Stehule

 Best,

 David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Review: listagg aggregate

2010-01-24 Thread Scott Bailey




I don't think. When we have function, with same parameters, same
behave like some Oracle function, then I am strongly prefer Oracle
name. I don't see any benefit from different name. It can only confuse
developers and add the trable to people who porting applications.


Meh. If the name is terrible, we don't have to use it, and it's easy enough to 
create an alias in SQL for those who need it.


The corresponding function in Oracle is called wm_concat. In MySQL its 
called group_concat. I don't believe DB2 or SQL Server have built in 
equivalents. The Oracle name isn't really an option (wm' stands for 
workspace manager)


I think listagg or string_agg would be the most appropriate names. Oh 
and before Oracle had wm_concat, Tom Kyte wrote a function called stragg 
that was pretty popular.


Scott

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Review: listagg aggregate

2010-01-22 Thread David E . Wheeler

Pavel,

My review of your listagg patch.

Submission Review
-
* The diff is a context diff and applies cleanly to HEAD (with just two hunks 
offset by 2 lines each).

* There is documentation, though I'm not sure it needs to be mentioned in the 
string functions documentation. No harm in it, I guess.

  I would like to see an example, though, and the documentation does not 
currently explain what each of the parameters are for. In fact, it looks like 
all the existing aggregates take only one parameter, so there was not 
previously a need to explain it. But listagg() has an optional second param. I 
think that the description should explain what it's for.

* There are tests and they look fine.

Usability Review

* The patch does in fact implement the aggregate function it describes, and OH 
YES do we want it (I've written my own in SQL a few times).

* No, we don't already have it.

* Yes it follows community-agreed behavior. I'm assuming that there is no 
special parsing of aggregate functions, so the simple use of commas to separate 
the two parameters is appropriate, rather than using a keyword like MySQL's 
SEPARATOR in the group_concat() aggregate.

* No need to have pg_dump support, no dangers that I can see, looks like all 
the bases have been covered.

Feature Test

* Everything built cleanly, but I got an OID dupe error when I tried to init 
the DB. Looks like 2997 and 2998 have been used for something else since you 
created the patch. I changed them to 2995 and 2996 and then it worked.
* The feature appears to work. I didn't see any tests for encodings or other 
data types, so I ran a few myself and they work fine:

postgres=# select listagg(a, U'-\0441\043B\043E\043D-') from 
(values(''),(''),(''
 listagg  
--
 -слон--слон-
(1 row)

postgres=# select listagg(a, U'\2014') from 
(values(U'\0441\043B\043E\043D'),(U'd\0061t\+61'),(U'\0441\043B\043E\043D'))
 AS g(a);
listagg 

 слон—data—слон
(1 row)


postgres=# select listagg(a::text) from (values(1),(2),(3)) AS g(a);
 listagg 
-
 123
(1 row)


Performance Review
--

No performance issues, except that it should be faster than a custom aggregate 
that does the same thing. To test, I created a quick custom aggregate (no 
second argument, alas, so listagg() is more flexible) like so:

CREATE OR REPLACE FUNCTION a2s(ANYARRAY)
RETURNS TEXT LANGUAGE SQL AS $$
SELECT array_to_string($1, ',');
$$;

   CREATE AGGREGATE string_accum (
SFUNC= array_accum,
BASETYPE = ANYELEMENT,
STYPE= ANYARRAY,
INITCOND = '{}',
FINALFUNC = a2s 
);

Then I ran some simple tests (thanks for the clue, depesz):

postgres=# select count(*) from (select string_accum(a) from 
(values(''),(''),('')) AS g(a), generate_series(1,1) i) AS x(i);
 count 
---
 1
(1 row)

Time: 1365.382 ms

postgres=# select count(*) from (select listagg(a) from 
(values(''),(''),('')) AS g(a), generate_series(1,1) i) AS x(i);
 count 
---
 1
(1 row)

Time: 17.989 ms

So overall, it looks like listagg() is 1-2 orders of magnitude faster. YMMV, 
and my system is built with --enable-cassert and --enable-debug. Still, good 
job.

Coding Review
-

* Is varchar.c really the best place to put the ListAggState struct and the 
listagg() function? I grepped the source for array_agg() and it's in 
src/backend/utils/adt/array_userfuncs.c. Maybe there's an equivalent file for 
string functions? Otherwise, the style of the C code looks fine to my untrained 
eye.

  Actually, shouldn't it return text rather than varchar?

* Does it really require four functions to do its work? Might there be some way 
to use the array_agg() C functions and then just a different final function to 
turn it into a string (using the internal array_to_string() function, perhaps)? 
I'm not at all sure about it, but given how little code was required to create 
the same basic functionality in SQL, I'm surprised that the C implementation 
requires four functions (accumStringResult(), listagg1_transfn(), 
listagg2_transfn(), and listagg_finalfn()). Maybe they're required to make it 
fast and avoid the overhead of an array?

* No compiler warnings, I never made it crash, good comments, does what it says 
on the tin. I doubt that there are any portability issues, as the code seems to 
use standard PostgreSQL internal macros and functions.

Architecture Review
---

* No dependencies, things seem to make sense overall, notwithstanding my 
questions in the Coding Review.

Review Review
-

The only thing I haven't covered so far is the name. I agree with Tom's 
assertion that the name is awful. Sure there may be a precedent in Oracle, but 
I hardly find that convincing (some of the big corporations seem to do a really 
shitty job naming

72 matches

Mail list logo