Re: [HACKERS] Combining Aggregates

David Rowley Mon, 18 Jan 2016 03:34:07 -0800

On 18 January 2016 at 16:44, Robert Haas <[email protected]> wrote:

> On Sun, Jan 17, 2016 at 9:26 PM, David Rowley
> <[email protected]> wrote:
> > hmm, so wouldn't that mean that the transition function would need to
> (for
> > each input tuple):
> >
> > 1. Parse that StringInfo into tokens.
> > 2. Create a new aggregate state object.
> > 3. Populate the new aggregate state based on the tokenised StringInfo,
> this
> > would perhaps require that various *_in() functions are called on each
> > token.
> > 4. Add the new tuple to the aggregate state.
> > 5. Build a new StringInfo based on the aggregate state modified in 4.
> >
> > ?
>
> I don't really know what you mean by parse the StringInfo into tokens.
> The whole point of the expanded-object interface is to be able to keep
> things in an expanded internal form so that you *don't* have to
> repeatedly construct and deconstruct internal data structures.

That was a response to Haribabu proposal, although perhaps I misunderstood
that. However I'm not sure how a PolyNumAggState could be converted to a
string and back again without doing any string parsing.

I worked up an example of this approach using string_agg(), which I
> attach here.  This changes the transition type of string_agg() from
> internal to text.  The same code would work for bytea_string_agg(),
> which would allow removal of some other code, but this patch doesn't
>
do that, because the point of this is to elucidate the approach.
>
>
Many thanks for working up that patch. I was clearly missing what you meant
previously. I understand it much better now. Thank you for taking the time
on that.

> In my tests, this seems to be slightly slower than what we're doing
> today; worst of all, it adds a handful of cycles to
> advance_transition_function() even when the aggregate is not an
> expanded object or, indeed, not even pass-by-reference.  Some of this
> might be able to be fixed by a little massaging - in particular,
> DatumIsReadWriteExpandedObject() seems like it could be partly or
> entirely inlined, and maybe there's some other way to improve the
> coding here.
>

It also seems that an expanded object requires a new memory context which
must be malloc()d and free()d. This has added quite an overhead in my
testing. I assume that we must be doing that so that we can ensure that all
memory is properly free()d once we're done with the expanded-object.

create table ab(a int, b text);
insert into ab select x,'aaaaaaaaaaaaaaaaaaaaaaaaaaa' from
generate_Series(1,1000000) x(x);
set work_mem='1GB';
vacuum analyze ab;

explain analyze select a%1000000,length(string_agg(b,',')) from ab group by
1;

Patched
1521.045 ms
1515.905 ms
1530.920 ms

Master
932.457 ms
959.285 ms
991.021 ms

Although performance of the patched version is closer to master when we
have less groups, I felt it necessary to show the extreme case. I feel this
puts a bit of a dampener on the future of this idea, as I've previously had
patches rejected for adding 2-5% on planning time alone, adding over 50%
execution time seems like a dead-end.

I've run profiles on this and malloc() and free() are both top of the
profile with the patched version with about 30% CPU time each.

> Generally, I think finding a way to pass expanded objects through
> nodeAgg.c would be a good thing to pursue, if we can make it work.
> The immediate impetus for changing things this way would be that we
> wouldn't need to add a mechanism for serializing and deserializing
> internal functions just to pass around partial aggregates.  But
> there's another advantage, too: right now,
> advance_transition_function() does a lot of data copying to move data
> from per-call context to the per-aggregate context.  When an expanded
> object is in use, this can be skipped.

The part I quite liked about the serialize/deserialize is that there's no
need to add any overhead at all for serializing and deserializing the
states when the aggregation is done in a single backend process. We'd
simply just have the planner pass the make_agg()'s serializeStates as false
when we're working within a single backend. This does not appear possible
with your proposed implementation, since it makes changes to each
transition function. It is my understanding that we normally bend over
backwards with new code to try and stop any performance regression. I'm not
quite sure the expanded-object idea works well in this regard, but I do
agree your approach seems neater. I just don't want to waste my time
writing all the code required to replace all INTERNAL aggregate states when
I know the performance regression is going to be unacceptable.

I also witnessed another regression with your patch when testing on another
machine. It caused the plan to change to a HashAgg instead of GroupAgg
causing a significant slowdown.

Unpatched

# explain analyze select a%1000000,length(string_agg(b,',')) from ab group
by 1;
                                                        QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=119510.84..144510.84 rows=1000000 width=32) (actual
time=538.938..1015.278 rows=1000000 loops=1)
   Group Key: ((a % 1000000))
   ->  Sort  (cost=119510.84..122010.84 rows=1000000 width=32) (actual
time=538.917..594.194 rows=1000000 loops=1)
         Sort Key: ((a % 1000000))
         Sort Method: quicksort  Memory: 102702kB
         ->  Seq Scan on ab  (cost=0.00..19853.00 rows=1000000 width=32)
(actual time=0.016..138.964 rows=1000000 loops=1)
 Planning time: 0.146 ms
 Execution time: 1047.511 ms

Patched
# explain analyze select a%1000000,length(string_agg(b,',')) from ab group
by 1;
                                                       QUERY PLAN

------------------------------------------------------------------------------------------------------------------------
 HashAggregate  (cost=24853.00..39853.00 rows=1000000 width=32) (actual
time=8072.346..144424.872 rows=1000000 loops=1)
   Group Key: (a % 1000000)
   ->  Seq Scan on ab  (cost=0.00..19853.00 rows=1000000 width=32) (actual
time=0.025..481.332 rows=1000000 loops=1)
 Planning time: 0.164 ms
 Execution time: 263288.332 ms

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Combining Aggregates

Reply via email to