Re: Improving on MAX_CONVERSION_GROWTH

Robert Haas Fri, 27 Sep 2019 07:55:06 -0700

On Tue, Sep 24, 2019 at 5:15 PM Tom Lane <t...@sss.pgh.pa.us> wrote:
> In any case, it seems likely that we could end up with a
> multiplier of 1, 2, or 3 rather than 4 in just about every
> case of practical interest.  That sure seems like a win
> when converting long strings.


+1. From what I've seen, I'd say this is a significant practical
problem for people who are trying to store large blobs of data in the
database.

A lot of that is because they hit the 1GB allocation limit, and I
wonder whether we shouldn't be trying harder to avoid imposing that
limit in multiple places. It's reasonable - and necessary - to impose
a limit on the size of an individual datum, but when that same limit
is imposed on other things, like the worst-case size of the encoding
conversion, the size of an individual message sent via the wire
protocol, etc., you end up with a situation where users have trouble
predicting what the behavior is going to be. >=1GB definitely won't
work, but it'll probably break at some point before you even get that
far depending on a bunch of complex factors that are hard to
understand, not really documented, and mostly the result of applying
1GB limit to every single memory allocation across the whole backend
without really thinking about what that does to the user-visible
behavior.

Now, that's not to say we should abandon MaxAllocSize, which I agree
serves as a useful backstop. But IMHO it would be smart to start with
the desired user-facing behavior -- we want to support datums up to X
size -- and then consider how we can get there while maintaining
MaxAllocSize as a general-purpose backstop. Our current strategy seems
to be mostly the reverse: write the code the way that feels natural,
enforce MaxAllocSize everywhere, and if that breaks things for a user,
well then that means - by definition - that the user was trying to do
something we don't support.

One approach I think we should consider is, for larger strings,
actually scan the string and figure out how much memory we're going to
need for the conversion and then allocate exactly that amount (and
fail if it's >=1GB). An extra scan over the string is somewhat costly,
but allocating hundreds of megabytes of memory on the theory that we
could hypothetically have needed it is costly in different way. Memory
is more abundant today than it's ever been, but there are still plenty
of systems where a couple of extra allocations in the multi-hundred-MB
range can make the whole thing fall over. And even if it doesn't make
the whole thing fall over, the CPU efficiency of avoiding an extra
pass over the string really ought to be compared with the memory
efficiency of allocating extra storage.  Getting down from a
worst-case multiple of 4 to 2 is a great idea, but it still means that
converting a 100MB string will allocate 200MB when what you need will
very often be between 100MB and 105MB.  That's not an insignificant
cost, even though it's much better than allocating 400MB.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Improving on MAX_CONVERSION_GROWTH

Reply via email to