Re: [HACKERS] pg_dump / copy bugs with "big lines" ?

Tomas Vondra Fri, 02 Sep 2016 17:22:02 -0700

Hi,

I've subscribed to review this patch in the current CF, so let me start with a brief summary of the thread, as it started more than a year ago.

In general, the thread is about hitting the 1GB allocation size limit (and the 32-bit length in the FE/BE protocol) in various ways:



1) attributes are smaller than 1GB, but row exceeds the limit

This causes issues in COPY/pg_dump, as we allocate a single StringInfo for the whole row and send it as a single message (which includes an int32 length, just like most FE/BE messages).
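To illustrate case 1), here's a minimal sketch (plain C, not backend code) of the arithmetic: each attribute can be well below MaxAllocSize while the text-format row built from them is not. The helper names are hypothetical, the constant mirrors MaxAllocSize (1GB - 1), and escaping is ignored, so the computed row size is only a lower bound.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as PostgreSQL's MaxAllocSize (1GB - 1). */
#define MAX_ALLOC_SIZE ((size_t) 0x3fffffff)

/*
 * Hypothetical helper: size of one text-format COPY row, i.e. each value
 * followed by a delimiter (or the final newline). Escaping is ignored, so
 * this is a lower bound on what would actually be buffered.
 */
static size_t copy_row_size(const size_t *attr_len, int natts)
{
    size_t total = 0;

    assert(natts > 0);
    for (int i = 0; i < natts; i++)
        total += attr_len[i] + 1;
    return total;
}

/*
 * Would the whole row fit into a regular (non-huge) StringInfo? The
 * terminating NUL needs one more byte, hence '<' rather than '<='.
 */
static bool row_fits_in_stringinfo(const size_t *attr_len, int natts)
{
    return copy_row_size(attr_len, natts) < MAX_ALLOC_SIZE;
}
```

For example, three 600MB values give a row of roughly 1.8GB, which a regular StringInfo can't hold even though each value on its own is fine.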



2) attribute values that exceed 1GB due to encoding

For example, it's possible to construct values that are OK in hex encoding but >1GB in escape (and vice versa). Those values get stored just fine, but there's no way to dump / COPY them. And if you happen to have both, you can't do pg_dump :-/
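The size difference comes from how bytea is formatted. A sketch of the output lengths, assuming the escape rules of byteaout as I understand them (backslash doubled, bytes outside 0x20..0x7e written as \ooo octal, everything else copied as-is) and ignoring the terminating NUL; the function names are made up for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hex output: "\x" prefix plus two hex digits per byte (NUL excluded). */
static size_t bytea_hex_len(size_t nbytes)
{
    return 2 + 2 * nbytes;
}

/*
 * Escape output: a backslash becomes two backslashes, bytes outside
 * 0x20..0x7e become a backslash plus three octal digits, and everything
 * else is copied as-is (NUL excluded).
 */
static size_t bytea_escape_len(const unsigned char *data, size_t nbytes)
{
    size_t len = 0;

    assert(data != NULL || nbytes == 0);
    for (size_t i = 0; i < nbytes; i++)
    {
        if (data[i] == '\\')
            len += 2;
        else if (data[i] < 0x20 || data[i] > 0x7e)
            len += 4;
        else
            len += 1;
    }
    return len;
}
```

So ~300MB of zero bytes is ~600MB in hex but ~1.2GB in escape format, while ~600MB of printable ASCII is ~600MB in escape but ~1.2GB in hex.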

I think it's OK not to be able to store extremely large values, but only if we make it predictable (ideally by rejecting the value before storing it in the database). The cases discussed in the thread are particularly annoying exactly because we do the opposite - we allow the value to be stored, but then fail when retrieving the value or trying to back up the database (which is particularly nasty, IMNSHO).

So although we don't have that many reports about this, it'd be nice to improve the behavior a bit.

The trouble is, this rabbit hole is fairly deep - wherever we palloc or detoast a value, we're likely to hit these issues with wide values/rows.

Honestly, I don't think it's feasible to make all the code paths work with such wide values/rows - as TL points out, particularly with the wide values it would be enormously invasive.

So the patch aims to fix just the simplest case, i.e. when the values are smaller than 1GB but the total row is larger. It does so by allowing StringInfo to exceed the MaxAllocSize in special cases (currently only COPY FROM/TO), and using MCXT_ALLOC_HUGE when forming the heap tuple (to make it possible to load the data).
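A rough sketch of the StringInfo part of that approach, as I understand it: the flag only changes which limit the enlarge step checks against. This is a standalone simplification, not the patch itself; SketchStringInfo and sketch_enlarge are hypothetical names, malloc/realloc stand in for palloc/repalloc (with MCXT_ALLOC_HUGE in the long case), and failures return false where the backend would raise an ERROR.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_ALLOC_SIZE      ((size_t) 0x3fffffff)  /* like MaxAllocSize */
#define MAX_ALLOC_HUGE_SIZE (SIZE_MAX / 2)         /* like MaxAllocHugeSize */

/* Hypothetical, simplified stand-in for StringInfoData. */
typedef struct SketchStringInfo
{
    char   *data;
    size_t  len;        /* string length, excluding the NUL */
    size_t  maxlen;     /* allocated size of data */
    bool    allow_long; /* may grow past MAX_ALLOC_SIZE */
} SketchStringInfo;

static size_t sketch_limit(const SketchStringInfo *str)
{
    return str->allow_long ? MAX_ALLOC_HUGE_SIZE : MAX_ALLOC_SIZE;
}

/*
 * Make room for 'needed' more bytes plus a NUL. Returns false where
 * enlargeStringInfo would ERROR out (limit exceeded) or on OOM.
 */
static bool sketch_enlarge(SketchStringInfo *str, size_t needed)
{
    size_t  limit = sketch_limit(str);
    size_t  newlen;
    char   *p;

    assert(str->len < limit);
    if (needed >= limit - str->len)
        return false;
    needed += str->len + 1;
    if (needed <= str->maxlen)
        return true;

    /* Grow by doubling, but never beyond the applicable limit. */
    newlen = (str->maxlen > 0) ? 2 * str->maxlen : 1024;
    while (needed > newlen)
        newlen *= 2;
    if (newlen > limit)
        newlen = limit;

    p = realloc(str->data, newlen);
    if (p == NULL)
        return false;
    str->data = p;
    str->maxlen = newlen;
    return true;
}
```

The doubling strategy follows enlargeStringInfo; the only difference between the two modes is the cap.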

It seems to work and I do think it's a reasonable first step to make things work.


A few minor comments regarding the patch:

1) CopyStartSend seems pretty pointless: it only has one function call in it, and is called in exactly one place (all other places simply call allowLongStringInfo directly). I'd get rid of this function and replace the call in CopyOneRowTo() with allowLongStringInfo().


2) allowlong seems awkward; allowLong or allow_long would be better.

3) Why does resetStringInfo reset the allowLong flag? Are there any cases when we want/need to forget the flag value? I don't think so, so let's simply not reset it and get rid of the allowLongStringInfo() calls. Maybe it'd be better to invent a new makeLongStringInfo() method instead, which would make it clear that the flag value is permanent.

4) HandleParallelMessage needs a tweak, as it uses msg->len in a log message, but with '%d' and not '%ld' (as needed after changing the type to Size).
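To make 3) a bit more concrete, a sketch of what a makeLongStringInfo()-style constructor could look like, with the flag fixed at creation time and the reset function leaving it alone. Again a standalone approximation with hypothetical names and malloc instead of palloc, not a proposal for the exact API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, trimmed-down stand-in for StringInfoData. */
typedef struct SketchStringInfo
{
    char   *data;
    size_t  len;
    size_t  maxlen;
    bool    allow_long;   /* set once, at creation */
} SketchStringInfo;

static SketchStringInfo *sketch_make(bool allow_long)
{
    SketchStringInfo *str = malloc(sizeof(SketchStringInfo));

    if (str == NULL)
        return NULL;
    str->maxlen = 1024;
    str->data = malloc(str->maxlen);
    if (str->data == NULL)
    {
        free(str);
        return NULL;
    }
    str->data[0] = '\0';
    str->len = 0;
    str->allow_long = allow_long;
    return str;
}

/* Counterparts of makeStringInfo() and the suggested makeLongStringInfo(). */
static SketchStringInfo *sketch_make_string_info(void)
{
    return sketch_make(false);
}

static SketchStringInfo *sketch_make_long_string_info(void)
{
    return sketch_make(true);
}

/* Clear the contents, but keep allow_long as it was set at creation. */
static void sketch_reset(SketchStringInfo *str)
{
    assert(str != NULL && str->data != NULL);
    str->data[0] = '\0';
    str->len = 0;
}
```

With the flag owned by the constructor, callers can't accidentally drop it by resetting the buffer between rows.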

However, I wonder whether it wouldn't be better to try Robert's proposal, i.e. not building the huge StringInfo for the whole row, but sending the attribute data directly. I don't mean sending a separate message for each attribute - the current FE/BE protocol does not allow that (it says that each 'd' message is a whole row) - but just sending the network message in smaller chunks.

That would make the StringInfo changes unnecessary and reduce the memory consumption considerably (right now we need 2x the memory, and we know we're dealing with gigabytes of data).

Of course, it'd require significant changes to how COPY works internally (instead of accumulating data for the whole row, we'd have to send it right away in smaller chunks). That would allow us to get rid of the StringInfo changes, but we'd not be able to backpatch this.
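A minimal sketch of the chunked approach, assuming text format without escaping: compute the row length up front, send the 'd' header (type byte plus an int32 length that counts itself), then stream each attribute through a sink in fixed-size pieces, so the row is never copied into one big buffer. The SinkFn callback, MemSink and CHUNK_SIZE are hypothetical stand-ins for the backend's output layer, and the INT32_MAX check reflects the signed interpretation of the length field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical output callback; the backend would feed its socket buffer. */
typedef void (*SinkFn) (void *ctx, const void *data, size_t len);

#define CHUNK_SIZE 8192         /* arbitrary piece size for the sketch */

static void send_chunked(SinkFn sink, void *ctx, const char *buf, size_t len)
{
    while (len > 0)
    {
        size_t n = (len < CHUNK_SIZE) ? len : CHUNK_SIZE;

        sink(ctx, buf, n);
        buf += n;
        len -= n;
    }
}

/*
 * Send one text-format row as a single CopyData ('d') message without
 * assembling the row in memory. Escaping is ignored. Returns -1 if the
 * message would not fit a signed int32 length.
 */
static int send_copy_row(SinkFn sink, void *ctx, const char *const *attrs,
                         const size_t *lens, int natts)
{
    uint64_t payload = 0;
    uint32_t msglen;
    unsigned char hdr[5];

    assert(natts > 0);
    for (int i = 0; i < natts; i++)
        payload += (uint64_t) lens[i] + 1;  /* value + '\t' or '\n' */
    if (payload + 4 > INT32_MAX)
        return -1;

    msglen = (uint32_t) (payload + 4);      /* length counts itself */
    hdr[0] = 'd';
    hdr[1] = (unsigned char) (msglen >> 24);
    hdr[2] = (unsigned char) (msglen >> 16);
    hdr[3] = (unsigned char) (msglen >> 8);
    hdr[4] = (unsigned char) msglen;
    sink(ctx, hdr, sizeof(hdr));

    for (int i = 0; i < natts; i++)
    {
        const char sep = (i == natts - 1) ? '\n' : '\t';

        send_chunked(sink, ctx, attrs[i], lens[i]);
        sink(ctx, &sep, 1);
    }
    return 0;
}

/* Example sink that collects the bytes in a small fixed buffer. */
typedef struct MemSink
{
    unsigned char buf[256];
    size_t        len;
} MemSink;

static void mem_sink(void *ctx, const void *data, size_t len)
{
    MemSink *m = ctx;

    assert(m->len + len <= sizeof(m->buf));
    memcpy(m->buf + m->len, data, len);
    m->len += len;
}
```

Beyond the attribute values themselves, the only extra buffering is whatever the sink holds, which is where the 2x memory overhead would go away.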

Regarding interpreting the Int32 length field in the FE/BE protocol as unsigned, I'm a bit worried it might qualify as breaking the protocol. It's true we don't really say whether it's signed or unsigned, and we handle it differently depending on the message type, but I wonder how many libraries simply use int32. OTOH those clients are unlikely to handle even the 2GB we might send them without breaking the protocol, so I guess this is fine. And 4GB seems like the best we can do.
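A quick illustration of the concern: the same four bytes decode to 3,000,000,000 when read as unsigned, but to a negative number in a client that stores the field in a signed int32. The decoding helpers are hypothetical; the field is big-endian and counts itself, so an unsigned interpretation caps a message at 4GB - 1 bytes including the length field.

```c
#include <assert.h>
#include <stdint.h>

/* Decode the big-endian 4-byte length field as an unsigned value. */
static uint32_t read_len_unsigned(const unsigned char b[4])
{
    return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16) |
           ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}

/*
 * What a client that keeps the field in a signed int32 ends up with.
 * The two's-complement reinterpretation is done without relying on
 * implementation-defined out-of-range conversions.
 */
static int32_t read_len_signed(const unsigned char b[4])
{
    uint32_t u = read_len_unsigned(b);

    if (u <= INT32_MAX)
        return (int32_t) u;
    return (int32_t) (u - 2147483648u) - INT32_MAX - 1;
}
```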



regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
