On 01/05/2017 05:37 PM, Stephen Frost wrote:
Tomas,

* Tomas Vondra (tomas.von...@2ndquadrant.com) wrote:
On 01/05/2017 02:23 PM, Magnus Hagander wrote:
It's easy enough to construct a benchmark specifically to show the
difference, but of any actual "normal workload" for it. Typically the
optimization applies to things like bulk loading, which typically never
done alone and does not lend itself to that type of benchmarking very
easily.

Not sure if I understand correctly what you're saying. You're saying
that although it'd be easy to construct a benchmark showing
significant performance impact, it won't represent a common
workload. Correct?

I think he's saying that it's not very easy to construct a good example
of typical bulk-loading workloads using just pgbench.  Bulk loading
certainly happens with PG and I don't think we'll make very many friends
if we break optimizations when wal_level is set to minimal like those
you get using:

BEGIN;
CREATE TABLE x (c1 int);
COPY x FROM STDIN;
COMMIT;

or:

BEGIN;
TRUNCATE x;
COPY x FROM STDIN;
COMMIT;

Changing the wal_level from 'minimal' to 'replica' or 'logical' with
such a benchmark is going to make the WAL go from next-to-nothing to
size-of-database.

Sure, I do know how to construct such workloads - and it's trivial even with pgbench custom scripts. The question is whether such workloads are common or not.

Most importantly, no one is proposing to break the optimizations, but changing the defaults - users relying on the optimizations are free to switch back to wal_level=minimal if needed.

>
One doesn't typically *just* do bulk loads, however,
often it's a bulk load into a table and then the contents of that table
are merged with another table or perhaps joined to it to produce some
report or something along those lines.  In many of those cases, our
more-recently added capability to have UNLOGGED tables will work, but
not all (in particular, it can be very handy to load everything in using
the above technique and then switch the wal_level to replica, which
avoids having to have the bulk of the data sent through WAL, something
you can't avoid if you want to turn an unlogged table into a logged
one).


Ultimately, the question is whether the number of people running into "Hey, I can't take pg_basebackup or setup a standby with the default config!" is higher or lower than number of people running into "Hey, CREATE TABLE + COPY is slower now!"

I haven't seen many systems relying on such load optimizations, for a number of reasons:

1) The important/critical systems usually have replicas, so are inherently incompatible with wal_level=minimal.

2) The batch jobs usually don't truncate the main table, but load the increment into a temporary/unlogged table first, then merge it into the main one.

That is not to say there are no other cases benefiting from those optimizations, but we're talking about the default value - we're not removing the wal_level=minimal.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to