Re: [HACKERS] The plan for FDW-based sharding

Petr Jelinek Tue, 01 Mar 2016 10:58:16 -0800

On 27/02/16 04:54, Robert Haas wrote:

On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
<[email protected]> wrote:

We do not have a formal proof that the proposed XTM is "general enough" to handle
all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot-based and
CSN-based.


I don't believe that for a minute.  For example, consider this article:

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here shouldn't necessarily be to replace
PostgreSQL's current transaction management with a distributed version
of the same thing.  We might want to do that, but I think the goal is
or should be to provide ACID semantics in a multi-node environment,
and specifically the I in ACID: transaction isolation.  Making the
existing transaction manager into something that can be spread across
multiple nodes is one way of accomplishing that.  Maybe the best one.
Certainly one that's been experimented with in Postgres-XC.  But it is
often the case that an algorithm that works tolerably well on a single
machine starts performing extremely badly in a distributed
environment, because the latency of communicating between multiple
systems is vastly higher than the latency of communicating between
CPUs or cores on the same system.  So I don't think we should be
assuming that's the way forward.

I have a similar problem with the FDW approach though. It seems to me
like because we have something that solves access to external tables
somebody decided that it should be used as the base for the whole
sharding solution, but there is no real concept of what it will look
like together, no ideas what it will be usable for, and not even a
simple prototype that would prove that the idea is sound (although
again, I am not clear on what the actual idea is beyond "we will use
FDWs").

Don't get me wrong, I agree that the current FDW enhancements are
useful, I am just worried about them being presented as the future of
sharding in Postgres when nobody has sketched what the future might
look like. And once we get to the more interesting parts like
consistency, distributed query planning, p2p connections (and I am
really concerned about these as FDWs abstract away some knowledge that
the coordinator and/or data nodes might need to do these well), etc.,
we might very well find ourselves painted into a corner and have to
start from the beginning, while if we had some idea of what the whole
thing might look like we could identify this early and not postpone
built-in sharding by several years just because somebody said we will
use FDWs and that's what we worked on in those years.

Note that I am not saying that the other discussed approaches are any
better, I am saying that we should know approximately what we actually
want and not just beat FDWs with a hammer and hope sharding will
eventually emerge and call that the plan.


--
  Petr Jelinek                  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
