On Sun, Nov 27, 2016 at 8:26 PM, Tsunakawa, Takayuki <tsunakawa.ta...@jp.fujitsu.com> wrote: > I see. autovacuum is certainly almost unpredictable, at least for those who > are not aware of its existence and tuning. Recently, one of our customers > faced the inability to perform INSERTs because of xid wraparound. > Their workload is INSERT-heavy, and (inefficiently) used autocommit to insert > each row, which resulted in the xid consumption faster than the slow xid > wraparound autovacuum.
Yes, that can happen. The new freeze map stuff in 9.6 should help with it, even without doing anything like what I am proposing here. But with this, there is no deferred work at all: once the transaction is all-visible, the UNDO can be discarded and there's nothing further to do. >> > Furthermore, it maybe the best to be able to switch the method for >> > each table and/or tablespace. For example, in pgbench, history table >> uses the current method, and other tables use the UNDO method. Is it time >> to introduce a pluggable storage system? >> >> IMHO, it's past time for that. > > Do you mean by "past time" that the community decided not to introduce > pluggable storage manager? If it's true, that's a pity. But I remember that > there was a discussion about pluggable storage manager at PGConf or some > other event this year. Or, do you mean that the current approach should be > abandoned and the UNDO approach replace it? I mean that we should most definitely introduce a pluggable storage manager and that I think we should even have done it sooner, before now. >> I agree up to a point. I think we need to design our own system as well >> as we can, not just copy what others have done. For example, the design >> I sketched will work with all of PostgreSQL's existing index types. You >> need to modify each AM in order to support in-place updates when a column >> indexed by that AM has been modified, and that's probably highly desirable, >> but it's not a hard requirement. I believe that's a better approach for >> us than insisting that we have to do it in exactly the same way as some >> other system. Now, that doesn't mean we shouldn't learn from what works >> well and poorly in other systems, but I think our goal here should be to >> chart the best way forward given PostgreSQL's existing architecture and >> its existing strengths and weaknesses, rather than to make it exactly like >> Oracle or MySQL or anything else. Few people on this mailing list would >> say that either of those systems are categorically better than PostgreSQL; >> most, I suspect, would disagree somewhat vigorously. > > Yes, agreed. I didn't intend to just imitate Oracle/MySQL design. I meant > that it will be better to study in advance what trouble Oracle/MySQL design > has caused their users, and avoid pitfalls as much as possible. For example, > when I ran TPC-B benchmark against Oracle and PostgreSQL, I was embarrassed > by frequent deadlocks in Oracle. It took some time for me to find out that > INITRANS needs to be tuned with ALTER TABLE. PostgreSQL ran smoothly without > any tuning. I agree that we want to stick with our principal of requiring as little tuning as possible -- and where possible reduce tuning that is required today. I have only limited experience with it, but some of those experiences were frustrating. > I find your UNDO approach attractive. On the other hand, I sometimes wonder > where PostgreSQL is headed for. I'm sometimes asked by database users "How > different is PostgreSQL from MySQL?" If the UNDO approach is taken, > PostgreSQL would appear more similar to MySQL. I don't say that's bad, but I > wonder whether we can appeal the new feature in a big picture. For example, > the current (VACUUM) approach would prevent PostgreSQL from becoming a > database for OLTP/analytics mixed workload, because long-running analytics > queries cause tale and index bloat regardless of whether those queries access > the same data as the OLTP workload, wouldn't it? Can we appeal the future of > PostgreSQL and the difference from MySQL as "PostgreSQL is pursuing to handle > multiple workloads in the same database to better utilize data and IT > resources. The new UNDO approach is one step toward that." I think there are a lot of differences between PostgreSQL and MySQL other than whether UNDO is used. For example, just in the area of storage, MySQL uses index-organized tables, while PostgreSQL's current heap and this proposed new type of heap are both flat. There are differences in the types of indexing that are offered, the datatypes available, the features available in every area of the system, and of course the licensing. If we get to a point where there are no real differences in licensing or extensibility or features or any other area between PostgreSQL and MySQL, then of course it makes no difference which one people use, but I don't think there is any prospect of such a thing happening. PostgreSQL is a relatively feature-complete product at this point. In my view, the major thing missing is logical replication, and that's being worked on. However, there are two things that, IMHO, are really hurting us. The first is tooling, for example around backup and recovery. Users familiar with other systems expect things to be easier to set up and administer than they are in PostgreSQL, and they expect better tools for automating administrative tasks than are in fact available. The second is performance. The work Andres is doing to speed up the executor, in which Heikki has expressed interest and shared ideas; the work on parallel query; Amit Langote's work on partitioning; Kevin's work on materialized views; Amit Kapila's work on hash indexes; Tomas Vondra's work on multivariate statistics; and many other patches currently in flight are all attacks, from various angles, on the problem of performance, and this idea is, too. The assumptions underlying every layer of the system need to be questioned and challenged, and alternatives need to be explored. Sometimes that will mean trying things that don't work; often it will mean building new things that provide users with choices they don't have today; other times it will mean rewriting the functionality in place just so it gets faster. Some people want CTEs or window functions or XML manipulation inside the server or more functionality in pgbench and other people don't care about those things, but the one thing that virtually everybody cares about is whether queries (and utility commands) run fast. I don't think that our performance today sucks - we wouldn't have gotten this far if that were the case - but there's clearly room for it to be a lot better, sometimes even by surprisingly simple techniques (e.g. 5dfc198146b49ce7ecc8a1fc9d5e171fb75f6ba5). Whether the techniques are simple or complicated, we need to put work into making the necessary things happen; if other products optimize and we don't, we'll fall behind. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (email@example.com) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers