On 1.4.2014 19:08, Tom Lane wrote:
> Tomas Vondra <t...@fuzzy.cz> writes:
>> I've been thinking about it a bit more and maybe the doubling is not
>> that bad idea, after all.
> It is not. There's a reason why that's our standard behavior.
>> The "current" array_agg however violates some of the assumptions
>> mentioned above, because it
>> (1) pre-allocates quite large number of items (64) at the beginning,
>> resulting in ~98% of memory being "wasted" initially
>> (2) allocates one memory context per group, with 8kB initial size, so
>> you're actually wasting ~99.999% of the memory
>> (3) thanks to the dedicated memory contexts, the doubling is pretty
>> much pointless up until you cross the 8kB boundary
>> IMNSHO these are the issues we really should fix - by lowering the
>> initial element count (64->4) and using a single memory context.
> The real issue here is that all those decisions are perfectly
> reasonable if you expect that a large number of values will get
> aggregated --- and even if you don't expect that, they're cheap
> insurance in simple cases.
Yes, if you expect a large number of values it's perfectly valid. But
what if those assumptions are faulty? Is it OK to fail because of OOM
even for trivial queries breaking those assumptions?
I'd like to improve that and make this work without impacting the
queries that match the assumptions.
> It only gets to be a problem if you have a lot of concurrent
> executions of array_agg, such as in a grouped-aggregate query. You're
> essentially arguing that in the grouped-aggregate case, it's better
> to optimize on the assumption that only a very small number of values
> will get aggregated (per hash table entry) --- which is possibly
> reasonable, but the argument that it's okay to pessimize the behavior
> for other cases seems pretty flimsy from here.
I'm not saying it's okay to pessimize the behavior of other cases. I
admit decreasing the initial size from 64 to only 4 items may be too
aggressive - let's measure the difference and tweak the number
accordingly. Heck, even 64 items is way lower than the 8kB utilized by
each per-group memory context right now.
> Actually, though, the patch as given outright breaks things for both
> the grouped and ungrouped cases, because the aggregate no longer
> releases memory when it's done. That's going to result in memory
> bloat not savings, in any situation where the aggregate is executed
Really? Can you provide query for which the current and patched code
Looking at array_agg_finalfn (which is the final function for
array_agg), I see it does this:
* Make the result. We cannot release the ArrayBuildState because
* sometimes aggregate final functions are re-executed. Rather, it
* is nodeAgg.c's responsibility to reset the aggcontext when it's
* safe to do so.
result = makeMdArrayResult(state, 1, dims, lbs,
i.e. it sets release=false. So I fail to see how the current code
behaves differently from the patch? If it wasn't releasing the memory
before, it's not releasing memory before.
In both cases the memory gets released when the aggcontext gets released
in nodeAgg.c (as explained by the comment in the code).
However, after looking at the code now, I think it's actually wrong to
remove the MemoryContextDelete from makeMdArrayResult(). It does not
make any difference to the array_agg (which sets release=false anyway),
but it makes difference to functions calling makeArrayResult() as that
uses release=true. That however is not called by aggregate functions,
but from regexp_split_to_array, xpath and subplans.
> I think a patch that stood a chance of getting committed would need to
> detect whether the aggregate was being called in simple or grouped
> contexts, and apply different behaviors in the two cases. And you
> can't just remove the sub-context without providing some substitute
> cleanup mechanism. Possibly you could keep the context but give it
> some much-more-miserly allocation parameters in the grouped case.
I don't think the patch removes any cleanup mechanism (see above), but
maybe I'm wrong.
Yes, tweaking the parameters depending on the aggregate - whether it's
simple or grouped, or maybe an estimate number of elements in a group -
seems like a good idea.
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription: