RE: pg_dump out of memory for large table with LOB

2018-11-21 Thread Jean-Marc Lessard
JMLessard wrote:
> What about updates where the bytea does not change? Will a new copy of the
> bytea be made in the TOAST table, or will the new row point to the original
> bytea?
> > https://www.postgresql.org/docs/current/storage-toast.html says:
> > The TOAST management code is triggered only when a row value to be stored 
> > in a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The 
> > TOAST code will compress
> > and/or move field values out-of-line until the row value is shorter than 
> > TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains 
> > can be had. During an UPDATE
> > operation, values of unchanged fields are normally preserved as-is; so an 
> > UPDATE of a row with out-of-line values incurs no TOAST costs if none of 
> > the out-of-line values change.
> Does it mean that no cost is incurred to generate the out-of-line TOAST data,
> but that a copy of the bytea is still made for the new row?

I benchmarked it as follows:
UPDATE table SET mod_tim=mod_tim;
The relpages of the table doubled, but the relpages of the TOAST table did not
change.
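
For reference, the check can be repeated with something like this (a rough
sketch; "signatures" and "mod_tim" are the names used in this thread, adjust
as needed):

-- relpages of the table and of its TOAST relation
SELECT c.relname, c.relpages,
       t.relname AS toast_relname, t.relpages AS toast_relpages
FROM pg_class c
LEFT JOIN pg_class t ON t.oid = c.reltoastrelid
WHERE c.relname = 'signatures';

UPDATE signatures SET mod_tim = mod_tim;
ANALYZE signatures;   -- relpages is only refreshed by VACUUM/ANALYZE

-- rerun the first query: the main relation grows (new row versions),
-- while the TOAST relation stays roughly the same size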

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com


RE: pg_dump out of memory for large table with LOB

2018-11-15 Thread Jean-Marc Lessard
Thanks to Daniel Verite for the nice answer, really helpful :)
It summarizes what I have read in the docs and blogs.

What about updates where the bytea does not change? Will a new copy of the bytea
be made in the TOAST table, or will the new row point to the original bytea?
> https://www.postgresql.org/docs/current/storage-toast.html says
> The TOAST management code is triggered only when a row value to be stored in 
> a table is wider than TOAST_TUPLE_THRESHOLD bytes (normally 2 kB). The TOAST 
> code will compress
> and/or move field values out-of-line until the row value is shorter than 
> TOAST_TUPLE_TARGET bytes (also normally 2 kB, adjustable) or no more gains 
> can be had. During an UPDATE
> operation, values of unchanged fields are normally preserved as-is; so an 
> UPDATE of a row with out-of-line values incurs no TOAST costs if none of the 
> out-of-line values change.

Does it mean that no cost is incurred to generate the out-of-line TOAST data, but
that a copy of the bytea is still made for the new row?


Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com


RE: pg_dump out of memory for large table with LOB

2018-11-15 Thread Daniel Verite
Jean-Marc Lessard wrote:

> Another area where LOB hurts is the storage. LOB are broken and stored in 2K
> pieces.
> Due to the block header, only three 2k pieces fit in an 8k block wasting 25%
> of space (in fact pgstattuple reports ~ 20%).

Yes. bytea stored as TOAST is sliced into pieces of 2000 bytes, versus
2048 bytes for large objects. And that makes a significant difference
when packing these slices, because 2000*4 + (page overhead) +
4*(row overhead) is just under the default size of 8192 bytes per page,
whereas 2048*4 + (page overhead) + 4*(row overhead)
is obviously a bit over 8192, since 2048*4 = 8192.
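
To see the 2048-byte slicing directly, here is a quick sketch (pg_largeobject
is normally only readable by superusers, and 12345 is a placeholder for the
OID returned by lo_from_bytea):

SELECT lo_from_bytea(0, decode(repeat('ab', 5000), 'hex'));  -- a 5000-byte object

SELECT pageno, octet_length(data) AS slice_bytes
FROM pg_largeobject
WHERE loid = 12345
ORDER BY pageno;     -- expect slices of 2048, 2048 and 904 bytes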

If the data is compressible, the difference may be less obvious because
the slices in pg_largeobject are compressed individually
(as opposed to bytea, which gets compressed as a whole),
so more than 3 slices can fit in a page inside pg_largeobject.
The post-compression size can be obtained with pg_column_size(),
versus octet_length(), which gives the pre-compression size.
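
For instance, to compare both on an existing bytea column (a sketch;
"signatures" is the table from this thread, "sig_data" is a hypothetical
column name):

SELECT octet_length(sig_data)   AS raw_bytes,     -- pre-compression size
       pg_column_size(sig_data) AS stored_bytes   -- post-compression size
FROM signatures
LIMIT 5;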
 
> Would you recommend bytea over LOB considering that the max LOB size is well
> below 1GB?
> Is bytea preferable in terms of support by the community, performance,
> features, etc.?

For the storage and pg_dump issues, bytea seems clearly preferable
in your case.
As for the performance aspect, large objects are excellent because their
API never requires a binary<->text conversion.
This may be different with bytea. The C API provided by libpq allows
retrieving and sending bytea in binary format, for instance through
PQexecParams(), but most drivers implemented on top of libpq use only
the text representation for all datatypes, because it's simpler for them.
So you may want to compare, with your particular app/language/framework,
sending and retrieving your biggest binary objects stored in a bytea
column versus as large objects.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite



Re: pg_dump out of memory for large table with LOB

2018-11-14 Thread Ron

On 11/14/2018 11:14 AM, Jean-Marc Lessard wrote:

Adrien Nayrat wrote:
> With 17 million LOs, it could eat a lot of memory ;)
Yes it does.

I did several tests and here are my observations.

First, the memory settings are:
shared_buffers = 3GB
work_mem = 32Mb
maintenance_work_mem = 1GB
effective_cache_size = 9MB
bytea_output = 'escape'


Why escape instead of hex?
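
For binary data, hex output is generally more compact: exactly 2 text
characters per byte (plus the \x prefix), versus up to 4 per byte with
escape, and hex has been the default since 9.0. A quick sketch of the
difference:

SET bytea_output = 'hex';
SELECT '\xdeadbeef'::bytea;   -- \xdeadbeef

SET bytea_output = 'escape';
SELECT '\xdeadbeef'::bytea;   -- \336\255\276\357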


--
Angular momentum makes the world go 'round.


Re: pg_dump out of memory for large table with LOB

2018-11-14 Thread Tom Lane
Jean-Marc Lessard  writes:
> Would you recommend bytea over LOB considering that the max LOB size is well
> below 1GB?

Yes, probably.  The reason that pg_dump has trouble with lots of small
BLOBs is the 9.0-era decision to treat BLOBs as independent objects
having their own owners, privilege attributes, and archive TOC entries
--- it's really the per-BLOB TOC entries that are causing the issue
for you here.  That model is fine as long as BLOBs are, uh, large.
If you're using them as replacements for bytea, the overhead is going
to be prohibitive.

regards, tom lane



Re: pg_dump out of memory for large table with LOB

2018-11-11 Thread Adrien Nayrat
Hello,

On 11/10/18 12:49 AM, Jean-Marc Lessard wrote:
> The dumped table is 0.5TB, 17 million rows and LOB uses about 99% of the 
> space.
> 

If I understand correctly, you have 17 million large objects?

I do not recall exactly and maybe I am wrong, but it seems pg_dump has to
allocate memory for each object to dump:
addBoundaryDependencies:

for (i = 0; i < numObjs; i++)
[...]

case DO_BLOB_DATA:
/* Data objects: must come between the boundaries */
addObjectDependency(dobj, preDataBound->dumpId);
addObjectDependency(postDataBound, dobj->dumpId);
break;

addObjectDependency:

[...]
pg_malloc(dobj->allocDeps * sizeof(DumpId));


With 17 million LOs, it could eat a lot of memory ;)
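
A rough way to confirm how many of them pg_dump will have to track (each
large object is handled as a separate dumpable object):

SELECT count(*) AS large_objects FROM pg_largeobject_metadata;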





Re: pg_dump out of memory for large table with LOB

2018-11-10 Thread Adrian Klaver

On 11/10/18 2:46 PM, Ron wrote:

On 11/09/2018 05:49 PM, Jean-Marc Lessard wrote:

I am running PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by
gcc.exe (Rev5, Built by MSYS2 project) 4.9.2, 64-bit
on win2012 with 12 GB RAM.

The dumped table is 0.5TB, 17 million rows, and LOB uses about 99% of
the space.

pg_dump consumes the entire system memory and swap, then terminates
with an out-of-memory error.

Is it a bug or normal behavior?

If I do not include LOBs in the dump, it works fine.

Here is the dump output:

C:\Users\Administrator> pg_dump -h localhost -Fc -a -b -t signatures
-v > d:\postgresql\sig.dmp

pg_dump: last built-in OID is 16383
...
pg_dump: reading row security enabled for table "ibisl1.signatures"
pg_dump: reading policies for table "ibisl1.signatures"
pg_dump: reading large objects
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
out of memory


This looks similar to the recent thread "Trouble Upgrading Postgres".
https://www.postgresql.org/message-id/flat/CAFw6%3DU2oz9rTF0qa0LFMg91bu%3Dhdisfu2-xXU1%3D%3DD7yBif%2B2uw%40mail.gmail.com

Specifically, message ce239c9c-68f2-43e6-a6b6-81c66d0f4...@manitou-mail.org:

"The hex expansion performed by COPY must allocate twice that size,
plus the rest of the row, and if that resulting size is above 1GB, it
will error out with the message you mentioned upthread:
ERROR: invalid memory alloc request size .
So there's no way it can deal with the contents over 500MB, and the
ones just under that limit may also be problematic."


I don't think this is the case. The above is an issue because of the maximum
length of a string that Postgres can process. LOs are different creatures:


https://www.postgresql.org/docs/11/lo-implementation.html


It would help to see the memory configuration values set for the cluster:

https://www.postgresql.org/docs/11/lo-implementation.html




And message 89b5b622-4c79-4c95-9ad4-b16d0d0da...@manitou-mail.org:

"It's undoubtedly very annoying that a database can end up with
non-pg_dump'able contents, but it's not an easy problem to solve. Some
time ago, work was done to extend the 1GB limit but eventually it got
scratched. The thread in [1] discusses many details of the problem and
why the proposed solutions were mostly a band-aid. Basically, the specs
of COPY and other internal aspects of Postgres are from the 32-bit era
when putting the size of an entire CDROM in a single row/column was not
anticipated as a valid use case. It's still a narrow use case today and
applications that need to store big pieces of data like that should
slice them in chunks, a bit like in pg_largeobject, except in much
larger chunks, like 1MB."

[1] pg_dump / copy bugs with "big lines" ?
https://www.postgresql.org/message-id/1836813.YmyOrS99PX%40ronan.dunklau.fr



Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com



--
Angular momentum makes the world go 'round.



--
Adrian Klaver
adrian.kla...@aklaver.com



pg_dump out of memory for large table with LOB

2018-11-10 Thread Jean-Marc Lessard
I am running PostgreSQL 9.6.5 on x86_64-pc-mingw64, compiled by gcc.exe (Rev5,
Built by MSYS2 project) 4.9.2, 64-bit
on win2012 with 12 GB RAM.
The dumped table is 0.5TB, 17 million rows, and LOB uses about 99% of the space.
pg_dump consumes the entire system memory and swap, then terminates with an
out-of-memory error.
Is it a bug or normal behavior?
If I do not include LOBs in the dump, it works fine.

Here is the dump output:
C:\Users\Administrator> pg_dump -h localhost -Fc -a -b -t signatures -v > 
d:\postgresql\sig.dmp
pg_dump: last built-in OID is 16383
...
pg_dump: reading row security enabled for table "ibisl1.signatures"
pg_dump: reading policies for table "ibisl1.signatures"
pg_dump: reading large objects
pg_dump: reading dependency data
pg_dump: saving encoding = UTF8
pg_dump: saving standard_conforming_strings = on
out of memory

Jean-Marc Lessard
Administrateur de base de données / Database Administrator
Ultra Electronics Forensic Technology Inc.
T +1 514 489 4247 x4164
www.ultra-forensictechnology.com