2018-03-21 13:56 GMT+01:00 Pavel Stehule <pavel.steh...@gmail.com>:

> 2018-03-21 13:03 GMT+01:00 Gary Cowell <gary.cow...@gmail.com>:
>
>> We are trying to implement PostgreSQL code to load a large object into
>> a PostgreSQL bytea in chunks, to avoid loading the whole file into
>> memory in the client.
>>
>> Our first attempt was to do
>>
>> update build_attachment set chunk = chunk || newdata ;
>>
>> but this did not scale, and got significantly slower after 4000-5000
>> updates.
>>
>> The chunks are 4K in size, and I'm testing with a 128MB input file,
>> requiring 32,774 chunk updates.
>>
>> Next, I tried creating an aggregate (taken from Stack Overflow):
>>
>> CREATE AGGREGATE bytea_agg(bytea) (SFUNC = byteacat, STYPE = bytea);
>>
>> and changed the code to insert the chunks into a temporary table:
>>
>> create temporary table build_attachment (
>>     seq bigserial primary key,
>>     chunk bytea
>> ) on commit drop;
>>
>> We then insert our 4K chunks into this, which takes very little time
>> (20 seconds for the 32,774 inserts).
>>
>> Here's what happens, though, when we try to select the aggregate:
>>
>> gary=> \timing
>> Timing is on.
>> gary=> select bytea_agg(chunk order by seq) from build_attachment
>> where seq < 4000 \g output
>> Time: 13372.843 ms
>> gary=> select bytea_agg(chunk order by seq) from build_attachment
>> where seq < 8000 \g output
>> Time: 54447.541 ms
>> gary=> select bytea_agg(chunk order by seq) from build_attachment
>> where seq < 16000 \g output
>> Time: 582219.773 ms
>>
>> So those partial aggregates completed in somewhat acceptable times,
>> but...
>>
>> gary=> select bytea_agg(chunk order by seq) from build_attachment
>> where seq < 32000 \g output
>>
>> This one hadn't completed after an hour. The PostgreSQL backend process
>> for my connection goes to 100% CPU and stays there, not using much RAM
>> and, oddly, not doing much I/O.
>>
>> EXPLAINing these aggregate selects doesn't show anything useful.
>>
>> Am I doomed never to be able to build a bytea this way? Is there some
>> way I can tune this?
>>
>
> bytea is an immutable object without preallocation, so appending to a
> large value is very expensive: each concatenation copies the whole value
> built so far, which makes the overall loop quadratic in the file size.
>
> I think using the large object (LO) API and then transforming the result
> to bytea will be much more effective:
>
> \lo_import path
>
> Then you can use:
>
> CREATE OR REPLACE FUNCTION attachment_to_bytea(attachment oid)
> RETURNS bytea AS $$
> DECLARE
>   fd integer;
>   size integer;
>   result bytea;
> BEGIN
>   -- 262144 = x'40000' = INV_READ: open the large object read-only
>   fd := lo_open(attachment, 262144);
>   -- seek to the end (whence = 2, SEEK_END) to learn the size
>   size := lo_lseek(fd, 0, 2);
>   -- rewind to the start (whence = 0, SEEK_SET)
>   PERFORM lo_lseek(fd, 0, 0);
>   result := loread(fd, size);
>   PERFORM lo_close(fd);
>   RETURN result;
> EXCEPTION WHEN undefined_object THEN
>   -- lo_open failed, so there is no descriptor to close
>   RETURN NULL;
> END;
> $$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path = 'pg_catalog';
>
> With this function, importing circa 44MB took only a few seconds.
>
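> A minimal end-to-end sketch (the file path, the target table, and the
> OID 16437 are only placeholders; \lo_import prints the OID it actually
> assigns):
>
> \lo_import /tmp/input.bin
> -- psql prints e.g.: lo_import 16437
>
> CREATE TABLE attachment (data bytea);
> INSERT INTO attachment SELECT attachment_to_bytea(16437);
>
> -- drop the large object once the bytea copy is stored
> SELECT lo_unlink(16437);
>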
> Regards
>
> Pavel

There is also a native function, lo_get, which does the same thing:
https://www.postgresql.org/docs/current/static/lo-funcs.html
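So on PostgreSQL 9.4 and later the helper function is not needed. For
example (the OID 16437 and the attachment table are again just the
placeholders from the sketch above):

\lo_import /tmp/input.bin
-- lo_import 16437

-- lo_get(oid) returns the whole large object as bytea;
-- lo_get(oid, offset, length) reads just a slice of it
INSERT INTO attachment SELECT lo_get(16437);
SELECT lo_unlink(16437);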