2018-03-21 13:56 GMT+01:00 Pavel Stehule <pavel.steh...@gmail.com>:

> 2018-03-21 13:03 GMT+01:00 Gary Cowell <gary.cow...@gmail.com>:
>
>> We are trying to implement PostgreSQL code to load a large object into
>> a PostgreSQL bytea in chunks, to avoid loading the whole file into
>> memory in the client.
>>
>> Our first attempt was to do
>>
>> update build_attachment set chunk = chunk || newdata ;
>>
>> but this did not scale, and got significantly slower after 4000-5000
>> updates.
>>
>> The chunks are 4K in size, and I'm testing with a 128MB input file,
>> requiring 32,774 chunk updates.
>>
>> Next, I tried creating an aggregate (taken from Stack Overflow):
>>
>> CREATE AGGREGATE bytea_agg(bytea) (SFUNC = byteacat, STYPE = bytea);
>>
>> and changed the code to insert the chunks into a temporary table:
>>
>> create temporary table build_attachment (
>>     seq bigserial primary key,
>>     chunk bytea
>> ) on commit drop;
>>
>> We then insert our 4K chunks into this, which takes very little time
>> (20 seconds for the 32,774 inserts).
>>
>> Here's what happens, though, when we try to select the aggregate:
>>
>> gary=> \timing
>> Timing is on.
>> gary=> select bytea_agg(chunk order by seq) from build_attachment
>> where seq < 4000 \g output
>> Time: 13372.843 ms
>> gary=> select bytea_agg(chunk order by seq) from build_attachment
>> where seq < 8000 \g output
>> Time: 54447.541 ms
>> gary=> select bytea_agg(chunk order by seq) from build_attachment
>> where seq < 16000 \g output
>> Time: 582219.773 ms
>>
>> So those partial aggregates completed in somewhat acceptable times,
>> but...
>>
>> gary=> select bytea_agg(chunk order by seq) from build_attachment
>> where seq < 32000 \g output
>>
>> This one hadn't completed after an hour. The PostgreSQL backend process
>> for my connection goes to 100% CPU and stays there, not using much RAM
>> and, oddly, not doing much I/O.
>>
>> EXPLAINing these aggregate selects doesn't show anything useful.
>>
>> Am I doomed never to be able to build a bytea this way? Is there some
>> way I can tune this?
>>
>
> bytea is an immutable object without preallocation, so appending to a
> large value is very expensive: each concatenation copies the whole value
> built so far, which makes the overall loop quadratic in the file size.
>
> I think using the large object (LO) API and then transforming the result
> to bytea will be much more effective:
>
> \lo_import path
>
> Then you can use:
>
> CREATE OR REPLACE FUNCTION attachment_to_bytea(attachment oid)
> RETURNS bytea AS $$
> DECLARE
>   fd integer;
>   size integer;
>   result bytea;
> BEGIN
>   -- 262144 = x'40000' = INV_READ: open the large object read-only
>   fd := lo_open(attachment, 262144);
>   -- seek to the end (whence = 2, SEEK_END) to learn the size
>   size := lo_lseek(fd, 0, 2);
>   -- rewind to the start (whence = 0, SEEK_SET)
>   PERFORM lo_lseek(fd, 0, 0);
>   result := loread(fd, size);
>   PERFORM lo_close(fd);
>   RETURN result;
> EXCEPTION WHEN undefined_object THEN
>   -- lo_open failed, so there is no descriptor to close
>   RETURN NULL;
> END;
> $$ LANGUAGE plpgsql STRICT SECURITY DEFINER SET search_path = 'pg_catalog';
>
> With this function, importing circa 44MB took only a few seconds.
>
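> A minimal end-to-end sketch (the file path, the target table, and the
> OID 16437 are only placeholders; \lo_import prints the OID it actually
> assigns):
>
> \lo_import /tmp/input.bin
> -- psql prints e.g.: lo_import 16437
>
> CREATE TABLE attachment (data bytea);
> INSERT INTO attachment SELECT attachment_to_bytea(16437);
>
> -- drop the large object once the bytea copy is stored
> SELECT lo_unlink(16437);
>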
> Regards
>
> Pavel

There is also a native function, lo_get, which does the same thing:
https://www.postgresql.org/docs/current/static/lo-funcs.html
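So on PostgreSQL 9.4 and later the helper function is not needed. For
example (the OID 16437 and the attachment table are again just the
placeholders from the sketch above):

\lo_import /tmp/input.bin
-- lo_import 16437

-- lo_get(oid) returns the whole large object as bytea;
-- lo_get(oid, offset, length) reads just a slice of it
INSERT INTO attachment SELECT lo_get(16437);
SELECT lo_unlink(16437);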