Re: [PATCHES] array_accum aggregate

2007-02-13 Thread Bruce Momjian

What is the status of this feature addition?

---

Stephen Frost wrote:
-- Start of PGP signed section.
 * Tom Lane ([EMAIL PROTECTED]) wrote:
  (However, now that we support nulls in arrays, meseems a more consistent
  definition would be that it allows null inputs and just includes them in
  the output.  So probably you do need it non-strict.)
 
 This was my intention.
 
  I'm inclined to think that this example demonstrates a deficiency in the
  aggregate-function design: there should be a way to declare what we're
  really doing.  But I don't know exactly what that should look like.
 
 I agree and would much rather have a clean solution which works with the
 design than one which has to work outside it.  When I first was trying
 to decide on the state-type I was looking through the PG catalogs for
 essentially a complex C type which translated to a void*.  Perhaps
 such a type could be added.  Unless that's considered along the lines of
 an 'any' type it'd cause problems for the polymorphism aspect.  
 
 Another alternative would be to provide a seperate area for each 
 aggregate to put any other information it needs.  This would almost
 certainly only be available to C functions but would essentially be a
 void* which is provided through the AggState structure but tracked by
 the aggregator routines and reset for each aggregate function being 
 run.  If that's acceptable, I don't think it'd be all that difficult to
 implement.  With that, aaccum_sfunc and aaccum_ffunc would ignore the 
 state variable passed to them in favor of their custom structure 
 available through fcinfo-AggState (I expect they'd just keep the 
 state variable NULL and be marked non-strict, or set it to some constant
 if necessary).  The pointer would have to be tracked somewhere and then
 copied in/out on each call, but that doesn't seem too difficult to me.
 After all, the state variable is already being tracked somewhere, this
 would just sit next to it, in my head anyway.
 
 I've got some time this weekend and would be happy to take a shot at
 the second proposal if that's generally acceptable.
 
   Thanks,
 
   Stephen
-- End of PGP section, PGP failed!

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Martijn van Oosterhout
On Thu, Oct 12, 2006 at 06:58:52PM -0400, Tom Lane wrote:
 I wrote:
  aggregate_state would have no other uses in the system, and its input
  and output functions would raise an error, so type safety is assured
  --- there would be no way to call either the sfunc or ffunc manually,
  except by passing a NULL value, which should be safe because that's what
  they'd expect as the aggregate initial condition.
 
 Um, no, I take that back, unless you want to invent a separate
 pseudotype for each such aggregate.  Otherwise you can crash it with

snip

What this really calls for is a type that users are forbidden to
interact with directly. Basically, the type may only be used by C
functions and such C functions may not appear in an SQL query.

Seems the only way to safely deal with datums you don't know the
representation of.

The name internal would be nice, but it's taken :(

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 What this really calls for is a type that users are forbidden to
 interact with directly. Basically, the type may only be used by C
 functions and such C functions may not appear in an SQL query.

That's not really the flavor of solution I'd like to have.  Ideally,
it'd actually *work* to write

my_ffunc(my_sfunc(my_sfunc(null, 1), 2))

and get the same result as aggregating over the values 1 and 2.  The
trick is to make sure that my_sfunc and my_ffunc can only be used
together.  Maybe we do need a type for each such aggregate ...

In any case, internal isn't quite the right model, because with that,
the type system is basically disclaiming all knowledge of the data's
properties.  With an aggregate_state datatype, the type system would
be asserting that it's OK to use this type (or more accurately, these
functions) in an aggregate.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 That's not really the flavor of solution I'd like to have.  Ideally,
 it'd actually *work* to write
 
   my_ffunc(my_sfunc(my_sfunc(null, 1), 2))
 
 and get the same result as aggregating over the values 1 and 2.  The
 trick is to make sure that my_sfunc and my_ffunc can only be used
 together.  Maybe we do need a type for each such aggregate ...

In general I like this idea but there are some complications, the main
one being where would the memory be allocated?  I guess we could just
always use the QueryContext.  The other issue is, in the above scenario
is it acceptable to modify the result of my_sfunc(null, 1) in the ,2
call?  Normally it's unacceptable to modify your input, aggregates get a
special exception when called as an aggregate, but in this case you'd
have to copy the entire custom structure underneath, I'd think.

As for a type for each such aggregate, that seems reasonable to me,
honestly.  I'd like for the type creation to be reasonably simple
though, if possible.  ie: could we just provide NULLs for the
input/output functions instead of having to implement functions that
just return an error?  Or perhaps have a polymorphic function which can
be reused that just returns an error.  Additionally, we'd have to be
able to mark the types as being polymorhpic along the same lines as
anyelement/anyarray.  Hopefully that's already easy, but if not it'd be
nice if it could be made easy...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 That's not really the flavor of solution I'd like to have.  Ideally,
 it'd actually *work* to write
 my_ffunc(my_sfunc(my_sfunc(null, 1), 2))

 In general I like this idea but there are some complications, the main
 one being where would the memory be allocated?

In the agg context if called with that context, else
CurrentMemoryContext will do fine.

 The other issue is, in the above scenario
 is it acceptable to modify the result of my_sfunc(null, 1) in the ,2
 call?

Yes, because the only place a nonnull value of the type could have come
from is a my_sfunc call; since it's a pseudotype, we don't allow it on
disk.  (We might also need a hack to prevent the type from being used as
a record-type component ... not sure if that comes for free with being a
pseudotype currently.)

 As for a type for each such aggregate, that seems reasonable to me,
 honestly.

The ugly part is that we'd still need a way for the planner to recognize
this class of types.

 Additionally, we'd have to be
 able to mark the types as being polymorhpic along the same lines as
 anyelement/anyarray.

What for?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  The other issue is, in the above scenario
  is it acceptable to modify the result of my_sfunc(null, 1) in the ,2
  call?
 
 Yes, because the only place a nonnull value of the type could have come
 from is a my_sfunc call; since it's a pseudotype, we don't allow it on
 disk.  (We might also need a hack to prevent the type from being used as
 a record-type component ... not sure if that comes for free with being a
 pseudotype currently.)

Ah, ok.

  As for a type for each such aggregate, that seems reasonable to me,
  honestly.
 
 The ugly part is that we'd still need a way for the planner to recognize
 this class of types.

Hrm, true.  I would agree that they would need more memory than most
aggregates.  It seems likely to me that worst offenders are going to be
of the array_accum type- store all tuples coming in.  Therefore, my
suggestion would be that the memory usage be estimated along those lines
and allow for hashagg when there's enough memory to keep all the tuples
(or rather the portion of the tuple going into the aggregate) in memory
(plus some amount of overhead, maybe 10%).  That doesn't help with how
to get the planner to recognize this class of types though.  I don't
suppose we already have a class framework which the planner uses to
group types which have similar characteristics?

  Additionally, we'd have to be
  able to mark the types as being polymorhpic along the same lines as
  anyelement/anyarray.
 
 What for?

So that the finalfunc can be polymorphic along the lines of my suggested
aaccum_sfunc(anyarray,anyelement) returns anyarray and
aaccum_ffunc(anyarray) returns anyarray.  If the state type isn't
polymorphic then PG won't let me create a function from it which returns
a polymorphic, aiui.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 Additionally, we'd have to be
 able to mark the types as being polymorhpic along the same lines as
 anyelement/anyarray.

 What for?

 So that the finalfunc can be polymorphic along the lines of my suggested
 aaccum_sfunc(anyarray,anyelement) returns anyarray and
 aaccum_ffunc(anyarray) returns anyarray.

Hmm ... there's not any real need for this at the level of the
aggregate, because we resolve a polymorphic aggregate's output type
directly from its input type(s), and we've already established that the
general-purpose agg code doesn't need to be able to infer anything about
the transition data type.  However the function code is going to
complain if you try to declare my_sfunc(aggregate_state) returns
anyarray because that looks ill-formed ... and indeed that kinda kills
the idea of being able to call my_sfunc standalone anyway.

Maybe we need a more radical solution in which the sfunc/ffunc don't
exist as separately callable entities at all.  That would sidestep the
whole need to have a type-system representation for the state data,
as well as eliminate worries about whether we've sufficiently plugged
the security risks of being able to call one of them in an unexpected
context.  Not sure what this should look like in the catalogs though.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PATCHES] array_accum aggregate

2006-10-12 Thread Neil Conway
On Wed, 2006-10-11 at 00:51 -0400, Stephen Frost wrote:
 An array_accum aggregate has existed in the documentation for quite
 some time using the inefficient (for larger arrays) array_append
 routine.

My vague recollection is that array_accum() is only defined in the
documentation because there was some debate about how it ought to behave
in some corner cases. I can't recall the details, but it was discussed
when array_accum() was originally proposed by Joe. Does anyone remember?

Minor comments on the patch:

 *** 132,138 
   /programlisting
   
  Here, the actual state type for any aggregate call is the array type
 !having the actual input type as elements.
 /para
   
 para
 --- 132,141 
   /programlisting
   
  Here, the actual state type for any aggregate call is the array type
 !having the actual input type as elements.  Note: array_accum() is now
 !a built-in aggregate which uses a much more efficient mechanism than
 !that which is provided by array_append, prior users of array_accum()
 !may be pleasantly suprised at the marked improvment for larger arrays.
 /para

Not worth documenting, I think.

 *** 15,20 
 --- 15,22 
   #include utils/array.h
   #include utils/builtins.h
   #include utils/lsyscache.h
 + #include utils/memutils.h
 + #include nodes/execnodes.h

Include directives should be sorted roughly alphabetically, with the
exception of postgres.h and system headers.

 + /* Structure, used by aaccum_sfunc and aaccum_ffunc to
 +  * implement the array_accum() aggregate, for storing 
 +  * pointers to the ArrayBuildState for the array we are 
 +  * building and the MemoryContext in which it is being
 +  * built.  Note that this structure is 
 +  * considered an 'anyarray' externally, which is a
 +  * variable-length datatype, and therefore
 +  * must open with an int32 defining the length. */

Gross.

 + /* Make sure we are being called in an aggregate. */
 + if (!fcinfo-context || !IsA(fcinfo-context, AggState))
 + ereport(ERROR,
 + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 +  errmsg(Can not call aaccum_sfunc as a 
 non-aggregate),
 +  errhint(Use the array_accum aggregate)));

Per the error message style guidelines, errmsg() should begin with a
lowercase letter and errhint() should be a complete sentence (so it
needs punctuation, for example).

 + Datum
 + aaccum_ffunc(PG_FUNCTION_ARGS)
 + {
 + aaccum_info *ainfo;
 + 
 + /* Check if we are passed in a NULL */
 + if (PG_ARGISNULL(0)) PG_RETURN_ARRAYTYPE_P(NULL);

There is no guarantee why SQL NULL and PG_RETURN_XYZ(NULL) refer to the
same thing -- use PG_RETURN_NULL() to return a SQL NULL value, or just
make the function strict.

-Neil



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] array_accum aggregate

2006-10-12 Thread Stephen Frost
* Neil Conway ([EMAIL PROTECTED]) wrote:
 On Wed, 2006-10-11 at 00:51 -0400, Stephen Frost wrote:
   Here, the actual state type for any aggregate call is the array type
  !having the actual input type as elements.  Note: array_accum() is now
  !a built-in aggregate which uses a much more efficient mechanism than
  !that which is provided by array_append, prior users of array_accum()
  !may be pleasantly suprised at the marked improvment for larger arrays.
  /para
 
 Not worth documenting, I think.

Ok.  Either way is fine with me, honestly.

  *** 15,20 
  --- 15,22 
#include utils/array.h
#include utils/builtins.h
#include utils/lsyscache.h
  + #include utils/memutils.h
  + #include nodes/execnodes.h
 
 Include directives should be sorted roughly alphabetically, with the
 exception of postgres.h and system headers.

Ah, yeah, you know I noticed that when I added memutils but wasn't
thinking when I went back and added execnodes (I had been trying to
minimize the number of includes by adding ones I needed one at a time
and seeing if any more were necessary).

  + /* Structure, used by aaccum_sfunc and aaccum_ffunc to
  +  * implement the array_accum() aggregate, for storing 
  +  * pointers to the ArrayBuildState for the array we are 
  +  * building and the MemoryContext in which it is being
  +  * built.  Note that this structure is 
  +  * considered an 'anyarray' externally, which is a
  +  * variable-length datatype, and therefore
  +  * must open with an int32 defining the length. */
 
 Gross.

Indeed. :/

  +   /* Make sure we are being called in an aggregate. */
  +   if (!fcinfo-context || !IsA(fcinfo-context, AggState))
  +   ereport(ERROR,
  +   (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
  +errmsg(Can not call aaccum_sfunc as a 
  non-aggregate),
  +errhint(Use the array_accum aggregate)));
 
 Per the error message style guidelines, errmsg() should begin with a
 lowercase letter and errhint() should be a complete sentence (so it
 needs punctuation, for example).

Ah, ok, guess I should go read those.

  + Datum
  + aaccum_ffunc(PG_FUNCTION_ARGS)
  + {
  +   aaccum_info *ainfo;
  + 
  +   /* Check if we are passed in a NULL */
  +   if (PG_ARGISNULL(0)) PG_RETURN_ARRAYTYPE_P(NULL);
 
 There is no guarantee why SQL NULL and PG_RETURN_XYZ(NULL) refer to the
 same thing -- use PG_RETURN_NULL() to return a SQL NULL value, or just
 make the function strict.

Huh, alright.  I'll probably just change it to PG_RETURN_NULL().

Thanks!

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] array_accum aggregate

2006-10-12 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 * Neil Conway ([EMAIL PROTECTED]) wrote:
 There is no guarantee why SQL NULL and PG_RETURN_XYZ(NULL) refer to the
 same thing -- use PG_RETURN_NULL() to return a SQL NULL value, or just
 make the function strict.

 Huh, alright.  I'll probably just change it to PG_RETURN_NULL().

Unless the function actually *needs* to be non-strict, you should mark
it strict and omit the runtime test for null input altogether.  This
is the general way that it's done in existing backend C functions.
Doing it the other way is needlessly inconsistent (thus distracting
readers) and clutters the code.

(However, now that we support nulls in arrays, meseems a more consistent
definition would be that it allows null inputs and just includes them in
the output.  So probably you do need it non-strict.)

Personally though I'm much more concerned about the state datatype.
As-is I think it's not only ugly but probably a security hole.  If you
are declaring the state type as something other than what it really is
then you have to defend against two sorts of problems: someone being
able to crash the database by calling your function and passing it
something it didn't expect, or crashing the database by using your
function to pass some other function an input it didn't expect.  For
example, since you've got aaccum_sfunc declared to return anyarray when
it returns no such thing, something like array_out(aaccum_sfunc(...))
would trivially crash the backend.  It's possible that the error check
to insist on being called with an AggState context is a sufficient
defense against that, but I feel nervous about it, and would much rather
have a solution that isn't playing fast and loose with the type system.
Particularly if it's going to go into core rather than contrib.

I'm inclined to think that this example demonstrates a deficiency in the
aggregate-function design: there should be a way to declare what we're
really doing.  But I don't know exactly what that should look like.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PATCHES] array_accum aggregate

2006-10-12 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 (However, now that we support nulls in arrays, meseems a more consistent
 definition would be that it allows null inputs and just includes them in
 the output.  So probably you do need it non-strict.)

This was my intention.

 I'm inclined to think that this example demonstrates a deficiency in the
 aggregate-function design: there should be a way to declare what we're
 really doing.  But I don't know exactly what that should look like.

I agree and would much rather have a clean solution which works with the
design than one which has to work outside it.  When I first was trying
to decide on the state-type I was looking through the PG catalogs for
essentially a complex C type which translated to a void*.  Perhaps
such a type could be added.  Unless that's considered along the lines of
an 'any' type it'd cause problems for the polymorphism aspect.  

Another alternative would be to provide a seperate area for each 
aggregate to put any other information it needs.  This would almost
certainly only be available to C functions but would essentially be a
void* which is provided through the AggState structure but tracked by
the aggregator routines and reset for each aggregate function being 
run.  If that's acceptable, I don't think it'd be all that difficult to
implement.  With that, aaccum_sfunc and aaccum_ffunc would ignore the 
state variable passed to them in favor of their custom structure 
available through fcinfo-AggState (I expect they'd just keep the 
state variable NULL and be marked non-strict, or set it to some constant
if necessary).  The pointer would have to be tracked somewhere and then
copied in/out on each call, but that doesn't seem too difficult to me.
After all, the state variable is already being tracked somewhere, this
would just sit next to it, in my head anyway.

I've got some time this weekend and would be happy to take a shot at
the second proposal if that's generally acceptable.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [PATCHES] array_accum aggregate

2006-10-12 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 Another alternative would be to provide a seperate area for each
 aggregate to put any other information it needs.

I'm not convinced that that's necessary --- the cases we have at hand
suggest that the transition function is perfectly capable of doing the
storage management it wants.  The problem is how to declare to CREATE
AGGREGATE that we're using a transition function of this kind rather
than the stupid functions it expects.  When the function is doing its
own storage management, we'd really rather that nodeAgg.c stayed out of
the way and didn't try to do any datum copying at all; having it copy a
placeholder bytea or anyarray or whatever is really a waste of cycles,
not to mention obscuring what is going on.  If nodeAgg just provided a
pass-by-value Datum, which the transition function could use to store a
pointer to storage it's handling, things would be a lot cleaner.

After a little bit of thought I'm tempted to propose that we handle this
by inventing a new pseudotype called something like aggregate_state,
which'd be declared in the catalogs as pass-by-value, thereby
suppressing useless copying activity in nodeAgg.c.  You'd declare the
aggregate as having stype = aggregate_state, and the transition function
would have signature
sfunc(aggregate_state, ... aggregate-input-type(s) ...)
returns aggregate_state
and the final function of course
ffunc(aggregate_state) returns aggregate-result-type

aggregate_state would have no other uses in the system, and its input
and output functions would raise an error, so type safety is assured
--- there would be no way to call either the sfunc or ffunc manually,
except by passing a NULL value, which should be safe because that's what
they'd expect as the aggregate initial condition.

One advantage of doing it this way is that the planner could be taught
to recognize aggregates with stype = aggregate_state specially, and make
allowance for the fact that they'll use more workspace than meets the
eye.  If we don't have something like this then the planner is likely to
try to use hash aggregation in scenarios where it'd be absolutely fatal
to do so.  I'm not sure whether we'd want to completely forbid hash
aggregation when any stype = aggregate_state is present, but for sure we
want to assume that there's some pretty large amount of per-aggregate
state we don't know about.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PATCHES] array_accum aggregate

2006-10-12 Thread Tom Lane
I wrote:
 aggregate_state would have no other uses in the system, and its input
 and output functions would raise an error, so type safety is assured
 --- there would be no way to call either the sfunc or ffunc manually,
 except by passing a NULL value, which should be safe because that's what
 they'd expect as the aggregate initial condition.

Um, no, I take that back, unless you want to invent a separate
pseudotype for each such aggregate.  Otherwise you can crash it with

my_ffunc(your_sfunc(null, whatever))

because my_ffunc will be expecting a datastructure different from what
it gets.

Maybe having a check for AggState call context is enough of a defense for
that, but I'm not really satisfied.  Back to the drawing board ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PATCHES] array_accum aggregate

2006-10-10 Thread Stephen Frost
Greetings,

  Please find below a patch to add the array_accum aggregate as a
  built-in using two new C functions defined in array_userfuncs.c.
  These functions simply expose the pre-existing efficient array
  building routines used elsewhere in the backend (accumArrayResult
  and makeArrayResult, specifically).   An array_accum aggregate has
  existed in the documentation for quite some time using the
  inefficient (for larger arrays) array_append routine.  The
  documentation around the example has also been updated to reflect
  the addition of this built-in.

  Documentation and a regression test are also included.

Thanks,

Stephen

Index: doc/src/sgml/func.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/func.sgml,v
retrieving revision 1.343
diff -c -r1.343 func.sgml
*** doc/src/sgml/func.sgml  1 Oct 2006 18:54:31 -   1.343
--- doc/src/sgml/func.sgml  11 Oct 2006 04:38:45 -
***
*** 7851,7856 
--- 7851,7872 
   row
entry
 indexterm
+ primaryarray_accum/primary
+/indexterm
+functionarray_accum(replaceable 
class=parameteranyelement/replaceable)/function
+   /entry
+   entry
+typeanyelement/type
+   /entry
+   entry
+ array of elements of same type as argument type
+   /entry
+   entryan array of all input elements (NULLs, non-nulls, and 
duplicates)/entry
+  /row
+ 
+  row
+   entry
+indexterm
  primaryaverage/primary
 /indexterm
 functionavg(replaceable 
class=parameterexpression/replaceable)/function
Index: doc/src/sgml/xaggr.sgml
===
RCS file: /projects/cvsroot/pgsql/doc/src/sgml/xaggr.sgml,v
retrieving revision 1.33
diff -c -r1.33 xaggr.sgml
*** doc/src/sgml/xaggr.sgml 16 Sep 2006 00:30:16 -  1.33
--- doc/src/sgml/xaggr.sgml 11 Oct 2006 04:38:45 -
***
*** 132,138 
  /programlisting
  
 Here, the actual state type for any aggregate call is the array type
!having the actual input type as elements.
/para
  
para
--- 132,141 
  /programlisting
  
 Here, the actual state type for any aggregate call is the array type
!having the actual input type as elements.  Note: array_accum() is now
!a built-in aggregate which uses a much more efficient mechanism than
!that which is provided by array_append, prior users of array_accum()
!may be pleasantly suprised at the marked improvment for larger arrays.
/para
  
para
Index: src/backend/utils/adt/array_userfuncs.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/utils/adt/array_userfuncs.c,v
retrieving revision 1.20
diff -c -r1.20 array_userfuncs.c
*** src/backend/utils/adt/array_userfuncs.c 14 Jul 2006 14:52:23 -  
1.20
--- src/backend/utils/adt/array_userfuncs.c 11 Oct 2006 04:38:46 -
***
*** 15,20 
--- 15,22 
  #include utils/array.h
  #include utils/builtins.h
  #include utils/lsyscache.h
+ #include utils/memutils.h
+ #include nodes/execnodes.h
  
  
  
/*-
***
*** 399,404 
--- 401,516 
PG_RETURN_ARRAYTYPE_P(result);
  }
  
+ /* Structure, used by aaccum_sfunc and aaccum_ffunc to
+  * implement the array_accum() aggregate, for storing 
+  * pointers to the ArrayBuildState for the array we are 
+  * building and the MemoryContext in which it is being
+  * built.  Note that this structure is 
+  * considered an 'anyarray' externally, which is a
+  * variable-length datatype, and therefore
+  * must open with an int32 defining the length. */
+ typedef struct {
+   int32vl_len;
+   ArrayBuildState *astate;
+   MemoryContextarrctx;
+ } aaccum_info;
+ 
+ 
/*-
+  * aaccum_sfunc :
+  *  State transistion function for the array_accum() aggregate,
+  *  efficiently builds an in-memory array by working in blocks and
+  *  minimizing realloc()'s and copying of the data in general.
+  *  Creates a seperate memory context attached to the AggContext into
+  *  which the array is built.  That context is free'd when the final
+  *  function is called (aaccum_ffunc).  accumArrayResult() does all
+  *  the heavy lifting here, this is really just a glue function.
+  *
+  */
+ Datum
+ aaccum_sfunc(PG_FUNCTION_ARGS)
+ {
+   aaccum_info *ainfo;
+   AggState*aggstate;
+ 
+   /* Make sure we are being called in an aggregate. */
+   if (!fcinfo-context || !IsA(fcinfo-context, AggState))
+