Re: [HACKERS] UNDO and in-place update

Robert Haas Sun, 27 Nov 2016 19:04:33 -0800

On Sun, Nov 27, 2016 at 8:26 PM, Tsunakawa, Takayuki
<tsunakawa.ta...@jp.fujitsu.com> wrote:
> I see.  autovacuum is certainly almost unpredictable, at least for those who 
> are not aware of its existence and tuning.  Recently, one of our customers 
> faced the inability to perform INSERTs because of xid wraparound.
> Their workload is INSERT-heavy, and (inefficiently) used autocommit to insert 
> each row, which resulted in the xid consumption faster than the slow xid 
> wraparound autovacuum.


Yes, that can happen.  The new freeze map stuff in 9.6 should help
with it, even without doing anything like what I am proposing here.
But with this, there is no deferred work at all: once the transaction
is all-visible, the UNDO can be discarded and there's nothing further
to do.

>> > Furthermore, it maybe the best to be able to switch the method for
>> > each table and/or tablespace.  For example, in pgbench, history table
>> uses the current method, and other tables use the UNDO method.  Is it time
>> to introduce a pluggable storage system?
>>
>> IMHO, it's past time for that.
>
> Do you mean by "past time" that the community decided not to introduce 
> pluggable storage manager?  If it's true, that's a pity.  But I remember that 
> there was a discussion about pluggable storage manager at PGConf or some 
> other event this year.  Or, do you mean that the current approach should be 
> abandoned and the UNDO approach replace it?

I mean that we should most definitely introduce a pluggable storage
manager and that I think we should even have done it sooner, before
now.

>> I agree up to a point.  I think we need to design our own system as well
>> as we can, not just copy what others have done.  For example, the design
>> I sketched will work with all of PostgreSQL's existing index types.  You
>> need to modify each AM in order to support in-place updates when a column
>> indexed by that AM has been modified, and that's probably highly desirable,
>> but it's not a hard requirement.  I believe that's a better approach for
>> us than insisting that we have to do it in exactly the same way as some
>> other system.  Now, that doesn't mean we shouldn't learn from what works
>> well and poorly in other systems, but I think our goal here should be to
>> chart the best way forward given PostgreSQL's existing architecture and
>> its existing strengths and weaknesses, rather than to make it exactly like
>> Oracle or MySQL or anything else.  Few people on this mailing list would
>> say that either of those systems are categorically better than PostgreSQL;
>> most, I suspect, would disagree somewhat vigorously.
>
> Yes, agreed.  I didn't intend to just imitate Oracle/MySQL design.  I meant 
> that it will be better to study in advance what trouble Oracle/MySQL design 
> has caused their users, and avoid pitfalls as much as possible.  For example, 
> when I ran TPC-B benchmark against Oracle and PostgreSQL, I was embarrassed 
> by frequent deadlocks in Oracle.  It took some time for me to find out that 
> INITRANS needs to be tuned with ALTER TABLE.  PostgreSQL ran smoothly without 
> any tuning.

I agree that we want to stick with our principal of requiring as
little tuning as possible -- and where possible reduce tuning that is
required today.  I have only limited experience with it, but some of
those experiences were frustrating.

> I find your UNDO approach attractive.  On the other hand, I sometimes wonder 
> where PostgreSQL is headed for.  I'm sometimes asked by database users "How 
> different is PostgreSQL from MySQL?"  If the UNDO approach is taken, 
> PostgreSQL would appear more similar to MySQL.  I don't say that's bad, but I 
> wonder whether we can appeal the new feature in a big picture.  For example, 
> the current (VACUUM) approach would prevent PostgreSQL from becoming a 
> database for OLTP/analytics mixed workload, because long-running analytics 
> queries cause tale and index bloat regardless of whether those queries access 
> the same data as the OLTP workload, wouldn't it?  Can we appeal the future of 
> PostgreSQL and the difference from MySQL as "PostgreSQL is pursuing to handle 
> multiple workloads in the same database to better utilize data and IT 
> resources.  The new UNDO approach is one step toward that."

I think there are a lot of differences between PostgreSQL and MySQL
other than whether UNDO is used.  For example, just in the area of
storage, MySQL uses index-organized tables, while PostgreSQL's current
heap and this proposed new type of heap are both flat.  There are
differences in the types of indexing that are offered, the datatypes
available, the features available in every area of the system, and of
course the licensing.  If we get to a point where there are no real
differences in licensing or extensibility or features or any other
area between PostgreSQL and MySQL, then of course it makes no
difference which one people use, but I don't think there is any
prospect of such a thing happening.

PostgreSQL is a relatively feature-complete product at this point.  In
my view, the major thing missing is logical replication, and that's
being worked on.  However, there are two things that, IMHO, are really
hurting us.  The first is tooling, for example around backup and
recovery.  Users familiar with other systems expect things to be
easier to set up and administer than they are in PostgreSQL, and they
expect better tools for automating administrative tasks than are in
fact available.  The second is performance.  The work Andres is doing
to speed up the executor, in which Heikki has expressed interest and
shared ideas; the work on parallel query; Amit Langote's work on
partitioning; Kevin's work on materialized views; Amit Kapila's work
on hash indexes; Tomas Vondra's work on multivariate statistics; and
many other patches currently in flight are all attacks, from various
angles, on the problem of performance, and this idea is, too.  The
assumptions underlying every layer of the system need to be questioned
and challenged, and alternatives need to be explored.  Sometimes that
will mean trying things that don't work; often it will mean building
new things that provide users with choices they don't have today;
other times it will mean rewriting the functionality in place just so
it gets faster.  Some people want CTEs or window functions or XML
manipulation inside the server or more functionality in pgbench and
other people don't care about those things, but the one thing that
virtually everybody cares about is whether queries (and utility
commands) run fast.  I don't think that our performance today sucks -
we wouldn't have gotten this far if that were the case - but there's
clearly room for it to be a lot better, sometimes even by surprisingly
simple techniques (e.g. 5dfc198146b49ce7ecc8a1fc9d5e171fb75f6ba5).
Whether the techniques are simple or complicated, we need to put work
into making the necessary things happen; if other products optimize
and we don't, we'll fall behind.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] UNDO and in-place update

Reply via email to