Re: [HACKERS] Combining Aggregates

David Rowley Sun, 17 Jan 2016 18:27:12 -0800

On 18 January 2016 at 14:36, Haribabu Kommi <[email protected]>
wrote:

> On Sat, Jan 16, 2016 at 12:00 PM, David Rowley
> <[email protected]> wrote:
> > On 16 January 2016 at 03:03, Robert Haas <[email protected]> wrote:
> >>
> >> On Tue, Dec 29, 2015 at 7:39 PM, David Rowley
> >> <[email protected]> wrote:
> >> >> No, the idea I had in mind was to allow it to continue to exist in
> the
> >> >> expanded format until you really need it in the varlena format, and
> >> >> then serialize it at that point.  You'd actually need to do the
> >> >> opposite: if you get an input that is not in expanded format, expand
> >> >> it.
> >> >
> >> > Admittedly I'm struggling to see how this can be done. I've spent a
> good
> >> > bit
> >> > of time analysing how the expanded object stuff works.
> >> >
> >> > Hypothetically let's say we can make it work like:
> >> >
> >> > 1. During partial aggregation (finalizeAggs = false), in
> >> > finalize_aggregates(), where we'd normally call the final function,
> >> > instead
> >> > flatten INTERNAL states and store the flattened Datum instead of the
> >> > pointer
> >> > to the INTERNAL state.
> >> > 2. During combining aggregation (combineStates = true) have all the
> >> > combine
> >> > functions written in such a ways that the INTERNAL states expand the
> >> > flattened states before combining the aggregate states.
> >> >
> >> > Does that sound like what you had in mind?
> >>
> >> More or less.  But what I was really imagining is that we'd get rid of
> >> the internal states and replace them with new datatypes built to
> >> purpose.  So, for example, for string_agg(text, text) you could make a
> >> new datatype that is basically a StringInfo.  In expanded form, it
> >> really is a StringInfo.  When you flatten it, you just get the string.
> >> When somebody expands it again, they again have a StringInfo.  So the
> >> RW pointer to the expanded form supports append cheaply.
> >
> >
> > That sounds fine in theory, but where and how do you suppose we determine
> > which expand function to call? Nothing exists in the catalogs to decide
> this
> > currently.
>
> I am thinking of transition function returns and accepts the StringInfoData
> instead of PolyNumAggState internal data for int8_avg_accum for example.
>

hmm, so wouldn't that mean that the transition function would need to (for
each input tuple):

1. Parse that StringInfo into tokens.
2. Create a new aggregate state object.
3. Populate the new aggregate state based on the tokenised StringInfo, this
would perhaps require that various *_in() functions are called on each
token.
4. Add the new tuple to the aggregate state.
5. Build a new StringInfo based on the aggregate state modified in 4.

?

Currently the transition function only does 4, and performs 2 only if it's
the first Tuple.

Is that what you mean? as I'd say that would slow things down significantly!

To get a gauge on how much more CPU work that would be for some aggregates,
have a look at how simple int8_avg_accum() is currently when we have
HAVE_INT128 defined. For the case of AVG(BIGINT) we just really have:

state->sumX += newval;
state->N++;

The above code is step 4 only. So unless I've misunderstood you, that would
need to turn into steps 1-5 above. Step 4 here is probably just a handful
of instructions right now, but adding code for steps 1,2,3 and 5 would turn
that into hundreds.

I've been trying to avoid any overhead by adding the serializeStates flag
to make_agg() so that we can maintain the same performance when we're just
passing internal states around in the same process. This keeps the
conversions between internal state and serialised state to a minimum.

The StringInfoData is formed with the members of the PolyNumAggState
> structure data. The input given StringInfoData is transformed into
> PolyNumAggState data and finish the calculation and again form the
> StringInfoData and return. Similar changes needs to be done for final
> functions input type also. I am not sure whether this approach may have
> some impact on performance?

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

Reply via email to