On Tue, Sep 24, 2019 at 5:15 PM Tom Lane <t...@sss.pgh.pa.us> wrote: > In any case, it seems likely that we could end up with a > multiplier of 1, 2, or 3 rather than 4 in just about every > case of practical interest. That sure seems like a win > when converting long strings.
+1. From what I've seen, I'd say this is a significant practical problem for people who are trying to store large blobs of data in the database. A lot of that is because they hit the 1GB allocation limit, and I wonder whether we shouldn't be trying harder to avoid imposing that limit in multiple places. It's reasonable - and necessary - to impose a limit on the size of an individual datum, but when that same limit is imposed on other things, like the worst-case size of the encoding conversion, the size of an individual message sent via the wire protocol, etc., you end up with a situation where users have trouble predicting what the behavior is going to be. >=1GB definitely won't work, but it'll probably break at some point before you even get that far depending on a bunch of complex factors that are hard to understand, not really documented, and mostly the result of applying 1GB limit to every single memory allocation across the whole backend without really thinking about what that does to the user-visible behavior. Now, that's not to say we should abandon MaxAllocSize, which I agree serves as a useful backstop. But IMHO it would be smart to start with the desired user-facing behavior -- we want to support datums up to X size -- and then consider how we can get there while maintaining MaxAllocSize as a general-purpose backstop. Our current strategy seems to be mostly the reverse: write the code the way that feels natural, enforce MaxAllocSize everywhere, and if that breaks things for a user, well then that means - by definition - that the user was trying to do something we don't support. One approach I think we should consider is, for larger strings, actually scan the string and figure out how much memory we're going to need for the conversion and then allocate exactly that amount (and fail if it's >=1GB). An extra scan over the string is somewhat costly, but allocating hundreds of megabytes of memory on the theory that we could hypothetically have needed it is costly in different way. Memory is more abundant today than it's ever been, but there are still plenty of systems where a couple of extra allocations in the multi-hundred-MB range can make the whole thing fall over. And even if it doesn't make the whole thing fall over, the CPU efficiency of avoiding an extra pass over the string really ought to be compared with the memory efficiency of allocating extra storage. Getting down from a worst-case multiple of 4 to 2 is a great idea, but it still means that converting a 100MB string will allocate 200MB when what you need will very often be between 100MB and 105MB. That's not an insignificant cost, even though it's much better than allocating 400MB. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company