2018-03-21 13:03 GMT+01:00 Gary Cowell <gary.cow...@gmail.com>: > We are trying to implement postgresql code to load a large object into > a postgresql bytea in chunks to avoid loading the file into memory in > the client. > > First attempt was to do > > update build_attachment set chunk = chunk || newdata ; > > this did not scale and got significantly slower after 4000-5000 updates. > > The chunks are 4K in size, and I'm testing with a 128MB input file, > requiring 32,774 chunk updates. > > Next, I tried creating an aggregate, thus: > > (taken from stackoverflow) > > CREATE AGGREGATE bytea_agg(bytea) (SFUNC=byteacat,STYPE=bytea); > > changed the code to insert the chunks to a temporary table : > > create temporary table build_attachment (seq bigserial primary key, > chunk bytea ) on commit drop; > > we then insert our 4K chunks to this, which takes very little time (20 > seconds for the 32,774 inserts) > > Here's an example though of trying to select the aggregate: > > gary=> \timing > Timing is on. > gary=> select bytea_agg(chunk order by seq) from build_attachment > where seq < 4000 \g output > Time: 13372.843 ms > gary=> select bytea_agg(chunk order by seq) from build_attachment > where seq < 8000 \g output > Time: 54447.541 ms > gary=> select bytea_agg(chunk order by seq) from build_attachment > where seq < 16000 \g output > Time: 582219.773 ms > > So those partial aggregates completed in somewhat acceptable times but ... > > gary=> select bytea_agg(chunk order by seq) from build_attachment > where seq < 32000 \g output > this one hadn't completed in an hour - the PostgreSQL connection > process for my connection on the server goes to 100% CPU and stays > there, not using much RAM, not doing much IO, oddly > > EXPLAINing these aggregate selects doesn't show anything useful. > > Am I doomed to not be able to update a bytea this way? Is there some > way I can tune this? > > bytea is immutable object without preallocation - so update of big tasks is very expensive.
I am thinking so using LO API and then transformation to bytea will be much more effective \lo_import path you can use CREATE OR REPLACE FUNCTION attachment_to_bytea(attachment oid) RETURNS bytea AS $$ DECLARE fd integer; size integer; BEGIN fd := lo_open(attachment, 262144); size := lo_lseek(fd, 0, 2); PERFORM lo_lseek(fd, 0, 0); RETURN loread(fd, size); EXCEPTION WHEN undefined_object THEN PERFORM lo_close(fd); RETURN NULL; END; $$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path = 'pg_catalog'; function import cca 44MB was in few seconds Regards Pavel