Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Martijn van Oosterhout
On Thu, Oct 12, 2006 at 06:58:52PM -0400, Tom Lane wrote:
 I wrote:
  aggregate_state would have no other uses in the system, and its input
  and output functions would raise an error, so type safety is assured
  --- there would be no way to call either the sfunc or ffunc manually,
  except by passing a NULL value, which should be safe because that's what
  they'd expect as the aggregate initial condition.
 
 Um, no, I take that back, unless you want to invent a separate
 pseudotype for each such aggregate.  Otherwise you can crash it with

snip

What this really calls for is a type that users are forbidden to
interact with directly. Basically, the type may only be used by C
functions and such C functions may not appear in an SQL query.

Seems the only way to safely deal with datums you don't know the
representation of.

The name internal would be nice, but it's taken :(

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes:
 What this really calls for is a type that users are forbidden to
 interact with directly. Basically, the type may only be used by C
 functions and such C functions may not appear in an SQL query.

That's not really the flavor of solution I'd like to have.  Ideally,
it'd actually *work* to write

my_ffunc(my_sfunc(my_sfunc(null, 1), 2))

and get the same result as aggregating over the values 1 and 2.  The
trick is to make sure that my_sfunc and my_ffunc can only be used
together.  Maybe we do need a type for each such aggregate ...

In any case, internal isn't quite the right model, because with that,
the type system is basically disclaiming all knowledge of the data's
properties.  With an aggregate_state datatype, the type system would
be asserting that it's OK to use this type (or more accurately, these
functions) in an aggregate.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 That's not really the flavor of solution I'd like to have.  Ideally,
 it'd actually *work* to write
 
   my_ffunc(my_sfunc(my_sfunc(null, 1), 2))
 
 and get the same result as aggregating over the values 1 and 2.  The
 trick is to make sure that my_sfunc and my_ffunc can only be used
 together.  Maybe we do need a type for each such aggregate ...

In general I like this idea but there are some complications, the main
one being where would the memory be allocated?  I guess we could just
always use the QueryContext.  The other issue is, in the above scenario
is it acceptable to modify the result of my_sfunc(null, 1) in the ,2
call?  Normally it's unacceptable to modify your input, aggregates get a
special exception when called as an aggregate, but in this case you'd
have to copy the entire custom structure underneath, I'd think.

As for a type for each such aggregate, that seems reasonable to me,
honestly.  I'd like for the type creation to be reasonably simple
though, if possible.  ie: could we just provide NULLs for the
input/output functions instead of having to implement functions that
just return an error?  Or perhaps have a polymorphic function which can
be reused that just returns an error.  Additionally, we'd have to be
able to mark the types as being polymorhpic along the same lines as
anyelement/anyarray.  Hopefully that's already easy, but if not it'd be
nice if it could be made easy...

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 That's not really the flavor of solution I'd like to have.  Ideally,
 it'd actually *work* to write
 my_ffunc(my_sfunc(my_sfunc(null, 1), 2))

 In general I like this idea but there are some complications, the main
 one being where would the memory be allocated?

In the agg context if called with that context, else
CurrentMemoryContext will do fine.

 The other issue is, in the above scenario
 is it acceptable to modify the result of my_sfunc(null, 1) in the ,2
 call?

Yes, because the only place a nonnull value of the type could have come
from is a my_sfunc call; since it's a pseudotype, we don't allow it on
disk.  (We might also need a hack to prevent the type from being used as
a record-type component ... not sure if that comes for free with being a
pseudotype currently.)

 As for a type for each such aggregate, that seems reasonable to me,
 honestly.

The ugly part is that we'd still need a way for the planner to recognize
this class of types.

 Additionally, we'd have to be
 able to mark the types as being polymorhpic along the same lines as
 anyelement/anyarray.

What for?

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Stephen Frost
* Tom Lane ([EMAIL PROTECTED]) wrote:
 Stephen Frost [EMAIL PROTECTED] writes:
  The other issue is, in the above scenario
  is it acceptable to modify the result of my_sfunc(null, 1) in the ,2
  call?
 
 Yes, because the only place a nonnull value of the type could have come
 from is a my_sfunc call; since it's a pseudotype, we don't allow it on
 disk.  (We might also need a hack to prevent the type from being used as
 a record-type component ... not sure if that comes for free with being a
 pseudotype currently.)

Ah, ok.

  As for a type for each such aggregate, that seems reasonable to me,
  honestly.
 
 The ugly part is that we'd still need a way for the planner to recognize
 this class of types.

Hrm, true.  I would agree that they would need more memory than most
aggregates.  It seems likely to me that worst offenders are going to be
of the array_accum type- store all tuples coming in.  Therefore, my
suggestion would be that the memory usage be estimated along those lines
and allow for hashagg when there's enough memory to keep all the tuples
(or rather the portion of the tuple going into the aggregate) in memory
(plus some amount of overhead, maybe 10%).  That doesn't help with how
to get the planner to recognize this class of types though.  I don't
suppose we already have a class framework which the planner uses to
group types which have similar characteristics?

  Additionally, we'd have to be
  able to mark the types as being polymorhpic along the same lines as
  anyelement/anyarray.
 
 What for?

So that the finalfunc can be polymorphic along the lines of my suggested
aaccum_sfunc(anyarray,anyelement) returns anyarray and
aaccum_ffunc(anyarray) returns anyarray.  If the state type isn't
polymorphic then PG won't let me create a function from it which returns
a polymorphic, aiui.

Thanks,

Stephen


signature.asc
Description: Digital signature


Re: [HACKERS] [PATCHES] array_accum aggregate

2006-10-13 Thread Tom Lane
Stephen Frost [EMAIL PROTECTED] writes:
 Additionally, we'd have to be
 able to mark the types as being polymorhpic along the same lines as
 anyelement/anyarray.

 What for?

 So that the finalfunc can be polymorphic along the lines of my suggested
 aaccum_sfunc(anyarray,anyelement) returns anyarray and
 aaccum_ffunc(anyarray) returns anyarray.

Hmm ... there's not any real need for this at the level of the
aggregate, because we resolve a polymorphic aggregate's output type
directly from its input type(s), and we've already established that the
general-purpose agg code doesn't need to be able to infer anything about
the transition data type.  However the function code is going to
complain if you try to declare my_sfunc(aggregate_state) returns
anyarray because that looks ill-formed ... and indeed that kinda kills
the idea of being able to call my_sfunc standalone anyway.

Maybe we need a more radical solution in which the sfunc/ffunc don't
exist as separately callable entities at all.  That would sidestep the
whole need to have a type-system representation for the state data,
as well as eliminate worries about whether we've sufficiently plugged
the security risks of being able to call one of them in an unexpected
context.  Not sure what this should look like in the catalogs though.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org