Re: [HACKERS] The plan for FDW-based sharding

2016-03-11 Thread Bruce Momjian
On Fri, Mar 11, 2016 at 10:19:16AM +0100, Oleg Bartunov wrote:
> Our XTM is yet another example of the infrastructure we need in order to
> work on clustering. Should we wait for some other smart guy to start
> thinking about distributed transactions?  We described our API in
> https://wiki.postgresql.org/wiki/DTM; it is just a wrapper over existing
> functions, but it will allow us and, hopefully, others to play with their
> ideas.  We did several prototypes, including FDW, to demonstrate the
> viability of the API, and we plan to continue our work on built-in high
> availability and multi-master.  Of course, there will be a lot to learn,
> but it will be much easier if XTM exists in core rather than as a separate
> patch, which is really small anyway.

I think everyone agrees we want a global transaction manager of some
type.  I think choosing the one we want is the problem as there are
several possible directions.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] The plan for FDW-based sharding

2016-03-11 Thread Oleg Bartunov
On Fri, Mar 11, 2016 at 9:09 AM, Bruce Momjian  wrote:

>
>
>
> 3.  I have tried to encourage others to get involved, with limited
> success.  I do think the FDW is perhaps the only reasonable way to get
> _built-in_ sharding.  The external sharding solutions are certainly
> viable, but external.  It is possible we will make all the FDW
> improvements and find out it doesn't work, but that the improvements
> allow us to go in another direction.
>

I remember last summer's emails, and we really wanted to participate in
the development, but it turned out all the slots were already occupied by
EDB and NTT people. We wanted to work on distributed transactions and
proposed our XTM.  Our feeling from the discussion at the time was that we
were invited, but all the doors were closed. It was a very bad experience.
Hopefully, we have now cleared up our mutual misunderstanding.


>
> There seems to be serious interest in how this idea came about, so let
> me say what I remember.
>

I think the idea was obvious enough, so let's not discuss this.


>
> As for why there is so much hostility, I think this is typical for any
> ill-defined feature development.  There was simmering hostility to the
> Windows port and pg_upgrade for many years because those projects were
> not easy to define and risky, and had few active developers.  The
> agreement was that work could continue as long as destabilization wasn't
> introduced.  Ideally everything would have a well-defined plan, but that
> is sometimes hard to do.  Similar to our approach on parallelism (which is
> also super-important and doesn't have many active developers), sometimes you
> just need to create infrastructure and see how well it solves problems.
>
>

Our XTM is yet another example of the infrastructure we need in order to
work on clustering. Should we wait for some other smart guy to start
thinking about distributed transactions?  We described our API in
https://wiki.postgresql.org/wiki/DTM; it is just a wrapper over existing
functions, but it will allow us and, hopefully, others to play with their
ideas.  We did several prototypes, including FDW, to demonstrate the
viability of the API, and we plan to continue our work on built-in high
availability and multi-master.  Of course, there will be a lot to learn,
but it will be much easier if XTM exists in core rather than as a separate
patch, which is really small anyway.


>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Roman grave inscription +


Re: [HACKERS] The plan for FDW-based sharding

2016-03-11 Thread Bruce Momjian
On Fri, Mar 11, 2016 at 04:30:13PM +0800, Craig Ringer wrote:
> ... eventually.
> 
> Sometimes the bug reports start. Occasionally you get a "thanks, this looks
> interesting/handy". But usually just bug reports or complaints that whatever
> you built isn't good enough to meet some random person's particular use case.
> Ah well. 

As they say, if this was easy, everyone would be doing it.  ;-)

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] The plan for FDW-based sharding

2016-03-11 Thread Craig Ringer
On 11 March 2016 at 16:09, Bruce Momjian  wrote:



> Ideally everything would have a well-defined plan, but that is
> sometimes hard to do.


BDR helped with logical decoding etc.: having something concrete really
helped shape and guide each part of it as it was (or is/will be, in some
cases) migrated from BDR to core.

That said, that was necessary, because for many of the things BDR needs
there weren't really good, isolated improvements to make with obvious
utility for other projects. Sure, commit timestamps are handy, replication
origins will
be handy, etc. They can be used by other projects and will be. Some are
already. But unlike the FDW enhancements they're not things that will be
used simply by being present without even requiring any special user
action, so they had an understandably higher barrier to cross for
acceptance.

Once you get to the point where you're not making FDW improvements that
help a broad set of users and start doing things that'll really only aid
some hypothetical sharding system that also requires other infrastructure
changes, hooks, etc ... that's when I think it's going to be
proof-of-concept prototype time.

> Similar to our approach on parallelism (which is
> also super-important and doesn't have many active developers), sometimes you
> just need to create infrastructure and see how well it solves problems.
>

Yep. Again, like BDR and logical decoding. We've had quite a lot of
surprises as we find unexpected corner cases and challenges over time.
Andres's original work on logical decoding went through a number of
significant revisions as more was learned about the problem to solve.
Sometimes you can only do that by actually building it. Logical decoding as
it stands in core is only partway through that evolution as it is - I think
we now have a good understanding of why logical decoding of prepared xacts,
streaming of in-progress xacts etc will be needed down the track, but it
would've been hard to come up with that at the start when we didn't have
experience using what we've already got.


> The weird thing is that if you do implement an ill-defined feature,
> there really isn't much positive feedback --- people just use the
> feature, and the complaints stop.


... eventually.

Sometimes the bug reports start. Occasionally you get a "thanks, this looks
interesting/handy". But usually just bug reports or complaints that
whatever you built isn't good enough to meet some random person's
particular use case. Ah well.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-11 Thread Bruce Momjian
I have read the recent comments on this thread with great interest.  I
am glad people have expressed their concerns, rather than remain silent.
Now that the responses have decreased, I can reply.

I saw several concerns:

1.  My motivation for starting this thread was to decrease interest in
external sharding solutions.

2.  No prototype was produced.

3.  More work needs to be done to encourage others to be involved.

4.  An FDW-based sharding solution will only work for some workloads,
decreasing interest in a more general solution.

5.  I started this thread to take credit for the idea or feature.

Let me reply to each item as briefly as I can:

1.  I said good things about external sharding solutions in the email,
so it is hard to logically argue that the _intent_ was to reduce
interest in them.  I will admit that that might be the short-term
effect.

2.  We have not produced a prototype because we don't really need to
make any decision yet on viability.  We already need to improve FDW
pushdown, partitioning syntax, and perhaps a global transaction/snapshot
manager with or without sharding, so we might as well just make those
improvements, and then producing a prototype will be much easier and
more representative.

3.  I have tried to encourage others to get involved, with limited
success.  I do think the FDW is perhaps the only reasonable way to get
_built-in_ sharding.  The external sharding solutions are certainly
viable, but external.  It is possible we will make all the FDW
improvements and find out it doesn't work, but that the improvements
allow us to go in another direction.

4.  Hard to argue with #4.  We got partitioning working with a complex
API that has not improved much over the years.  I think this will be
cleaned up with the FDW-sharding work, and it would be a shame to create
another partial solution (FDW sharding) out of that work.

5.  See below on why I talk about these things.

There seems to be serious interest in how this idea came about, so let
me say what I remember.  It is very possible others came to the same
conclusions independently, and earlier.  I think I first heard it from
Korry Douglas in an EDB-internal discussion.  I then heard it from Josh
Berkus or we discussed it at a conference.  That got me thinking, and
then an EDB customer talked about the need for multi-node write scaling,
and I realized that only sharding could do that.  (The data warehouse
use of sharding was already clear to me.)  I then understood the wisdom
of Postgres XC, which NTT worked on for perhaps a decade.  (I just left
their offices here in Tokyo.)  I discussed the FDW-sharding idea
internally inside EDB, and then mentioned it during a visit to NTT in
July, 2014.  I then blogged about a new sharding presentation that I
wrote in February 2015
(http://momjian.us/main/blogs/pgblog/2015.html#February_1_2015).  I
presented the talk in three locations in 2015.

The reason I talk about these things (#5) is because I am trying to
encourage people to work on them, and I want to communicate to our users
that we realize sharding is important for certain workloads and that we
are attempting a built-in solution.  Frankly, I don't think many users
need sharding, but many users want to know it is available, so I think
it is important to talk about it.

As for why there is so much hostility, I think this is typical for any
ill-defined feature development.  There was simmering hostility to the
Windows port and pg_upgrade for many years because those projects were
not easy to define and risky, and had few active developers.  The
agreement was that work could continue as long as destabilization wasn't
introduced.  Ideally everything would have a well-defined plan, but that
is sometimes hard to do.  Similar to our approach on parallelism (which is
also super-important and doesn't have many active developers), sometimes you
just need to create infrastructure and see how well it solves problems.

The weird thing is that if you do implement an ill-defined feature,
there really isn't much positive feedback --- people just use the
feature, and the complaints stop.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] The plan for FDW-based sharding

2016-03-08 Thread Oleg Bartunov
On Tue, Mar 8, 2016 at 6:40 AM, Craig Ringer  wrote:



> Either that, or bless experimental features/API as an official concept.
> I'd quite like that myself - stuff that's in Pg, but documented as "might
> change or go away in the next release, experimental feature". As we're
> doing more stuff that spans multiple release cycles, where patches in a
> prior cycle might need revision based on what we learn in a later one, we
> might need more freedom to change things that're committed and user visible.
>
>
+1


> --
>  Craig Ringer   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Craig Ringer
On 7 March 2016 at 23:02, Robert Haas  wrote:

> On Fri, Mar 4, 2016 at 11:17 PM, Craig Ringer 
> wrote:
> > If FDW-based sharding works, I'm happy enough, I have no horse in this
> > race. If it doesn't work I don't much care either. What I'm worried
> > about is if it works like partitioning using inheritance works -
> > horribly badly, but just well enough that it's served as an effective
> > barrier to doing anything better.
> >
> > That's what I want to prevent. Sharding that only-just-works and then
> > stops us getting anything better into core.
>
> That's a reasonable worry.  Thanks for articulating it so clearly.
> I've thought about that issue and I admit it's both real and serious,
> but I've sort of taken the attitude of saying, well, I don't know how
> to solve that problem, but there's so much other important work that
> needs to be done before we get to the point where that's the blocker
> that solving that problem doesn't seem like the most important thing
> right now.


[snip explanation]


> I think your concern is
> valid, and I share it.  But I just fundamentally believe that it's
> better to enhance what we have than to start inventing totally new
> abstractions.  The FDW API is *really* powerful, and getting more
> powerful, and I just have a very hard time believing that starting
> over will be better.  Somebody can do that if they like and I'm not
> gonna get in the way, but if it's got problems that could have been
> avoided by basing that same work on the FDW stuff we've already got, I
> do plan to point that out.


Yep. As has been noted, each of these improvements is useful in its own
right, and I'm not sure anyone's against them, just concerned about
whether the overall vision for sharding will work out.

Personally I think that once the FDW infrastructure is closer to being
usable for sharding, when we're at the point where new patches are proposed
that're really specifically for sharding and not so much general-use FDW
improvements, that's when it'd be well worth building a proof-of-concept
sharding implementation. Find unexpected wrinkles and issues before
starting to stream stuff into core that can't be easily removed again. That
was certainly useful when building BDR, and even then we still found lots
of things that required revision, often repeatedly.

Either that, or bless experimental features/API as an official concept. I'd
quite like that myself - stuff that's in Pg, but documented as "might
change or go away in the next release, experimental feature". As we're
doing more stuff that spans multiple release cycles, where patches in a
prior cycle might need revision based on what we learn in a later one, we
might need more freedom to change things that're committed and user visible.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Kevin Grittner
On Mon, Mar 7, 2016 at 6:13 AM, Craig Ringer  wrote:
> On 5 March 2016 at 23:41, Kevin Grittner  wrote:

>> The only place you *need* to vary from commit order for correctness
>> is when there are overlapping SERIALIZABLE transactions, one
>> modifies data and commits, and another reads the old version of the
>> data but commits later.
>
> Ah, right. So here, even though X1 commits before X2 running concurrently
> under SSI, the logical order in which the xacts could've occurred serially
> is one where X2 runs and commits before X1, since X2 doesn't depend on X1.
> X2 read the old row version before X1 modified it, and logically occurs
> before X1 in the serial rearrangement.

Right, because X2 is *seeing* data in a state that existed before X1 ran.

> I don't fully grasp how that can lead to a situation where xacts can commit
> in an order that's valid upstream but not valid as a downstream apply order.

With SSI, it can matter whether an intermediate state is *read*.

> I presume we're looking at read-only logical replicas here (rather than
> multimaster),

I have not worked out how this works with MMR.  I'm not sure that
there is one clear answer to that.

> and it's only a concern for SERIALIZABLE xacts since a READ
> COMMITTED xact on the master and replica would both be able to see the state
> where X1 is committed but X2 isn't yet.

REPEATABLE READ would allow the anomaly to be seen, too, if a
transaction acquired its snapshot between the two commits.

> But I don't see how a read-only xact
> in SERIALIZABLE on the replica can get different results to what it'd get
> with SSI on the master. It's entirely possible for a read xact on the master
> to get a snapshot after X1 commits and after X2 commits, same as READ
> COMMITTED. SSI shouldn't AFAIK come into play with no writes to create a
> pivot. Is that wrong?

As mentioned earlier in this thread, look at the examples in this
section of the Wiki page, and imagine that the READ ONLY
transaction involved did *not* run on the primary, but *did* run on
the replica:

https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

> If we applied this sequence to the downstream in commit order we'd still get
> correct results on the heap after applying both.

... eventually.

> We'd have an intermediate
> state where X1 is committed but X2 isn't, but we can have the same on the
> master. SSI doesn't AFAIK mask X1 from becoming visible in a snapshot until
> X2 commits or anything, right?

If that intermediate state is *seen* on the master, a transaction
is rolled back.

>> The key is that
>> there is a read-write dependency (a/k/a rw-conflict) between the
>> two transactions which tells you that the second to commit has to
>> come before the first in any graph of apparent order of execution.
>
> Yeah, I get that part. How does that stop a 3rd SERIALIZABLE xact from
> getting a snapshot between the two commits and reading from there?

Serializable Snapshot Isolation doesn't generally block anything
that REPEATABLE READ (which is straight Snapshot Isolation) doesn't
block -- unless you explicitly request READ ONLY DEFERRABLE.  What
it does is monitor for situations that can present anomalies and
roll back transactions as necessary to prevent anomalies in
successfully committed transactions.  We tried very hard to avoid
rolling back a transaction that could fail a second time on conflict
with the same set of transactions, although there were some
corner cases where it could not be avoided when a transaction was
PREPARED and not yet committed.  Another possibly useful fact is
that we were able to guarantee that whenever there was a rollback,
some SERIALIZABLE transaction which overlaps the one being rolled
back has modified data and successfully committed -- ensuring that
there is some forward progress even in worst case situations.

>> The tricky part is that when there are two overlapping SERIALIZABLE
>> transactions and one of them has modified data and committed, and
>> there is an overlapping SERIALIZABLE transaction which is not READ
>> ONLY which has not yet reached completion (COMMIT or ROLLBACK) the
>> correct ordering remains in doubt -- there is no way to know which
>> might need to commit first, or whether it even matters.  I am
>> skeptical about whether in logical replication (including MMR), it
>> is going to be possible to manage this by finding "safe snapshots".
>> The only alternative I can see, though, is to suspend replication
>> while correct transaction ordering remains in doubt.  A big READ
>> ONLY transaction would not cause a replication stall, but a big
>> READ WRITE transaction could cause an indefinite stall.  Simon
>> seemed to be saying that this is unacceptable, but I tend to think
>> it is a viable approach for some workloads, especially if the READ
>> ONLY transaction property is used when possible.
>
> We already have huge replication stalls when big write xacts occur. We 

Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Robert Haas
On Fri, Mar 4, 2016 at 11:17 PM, Craig Ringer  wrote:
> If FDW-based sharding works, I'm happy enough, I have no horse in this race.
> If it doesn't work I don't much care either. What I'm worried about is if it
> works like partitioning using inheritance works - horribly badly, but just
> well enough that it's served as an effective barrier to doing anything
> better.
>
> That's what I want to prevent. Sharding that only-just-works and then stops
> us getting anything better into core.

That's a reasonable worry.  Thanks for articulating it so clearly.
I've thought about that issue and I admit it's both real and serious,
but I've sort of taken the attitude of saying, well, I don't know how
to solve that problem, but there's so much other important work that
needs to be done before we get to the point where that's the blocker
that solving that problem doesn't seem like the most important thing
right now.

The sharding discussion we had in Vienna convinced me that, in the
long run, having PostgreSQL servers talk to other PostgreSQL servers
only using SQL is not going to be a winner.  I believe Postgres-XL has
already done something about that; I think it is passing plans around
directly.  So you could look at that and say - ha, the FDW approach is
a dead end!  But from my point of view, the important thing about the
FDW interface is that it provides a pluggable interface to the
planner.  We can now push down joins and sorts; hopefully soon we will
be able to push down aggregates and limits and so on.  That's the hard
part.  The deparsing code that turns the plan we want to execute into
an SQL query that can be shipped over the wire is a detail.
Serializing some other on-the-wire representation of what we want the
remote side to do is small potatoes compared to having all of the
logic that lets you decide, in the first instance, what you want the
remote side to do.  I can imagine, in the long term, adding a new
sub-protocol (probably mediated via COPY BOTH) that uses a different
and more expressive on-the-wire representation.
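
To make "pluggable interface to the planner" concrete, here is a minimal
sketch of the shape of an FDW handler. The shard* callbacks are
hypothetical placeholders (their declarations are elided and assumed to be
defined elsewhere in the module); the FdwRoutine fields themselves are the
real extension points, GetForeignJoinPaths being the 9.5 join-pushdown
hook:

#include "postgres.h"
#include "fmgr.h"
#include "foreign/fdwapi.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(shard_fdw_handler);

/*
 * An FDW plugs into the planner and executor by returning a table of
 * callbacks.  The planner-facing fields are what make pushdown of scans,
 * joins, and (eventually) aggregates possible.
 */
Datum
shard_fdw_handler(PG_FUNCTION_ARGS)
{
    FdwRoutine *routine = makeNode(FdwRoutine);

    /* planner: estimate result size, generate paths, build the plan */
    routine->GetForeignRelSize = shardGetForeignRelSize;
    routine->GetForeignPaths = shardGetForeignPaths;
    routine->GetForeignPlan = shardGetForeignPlan;
    /* join pushdown (new in 9.5): offer a path that ships a whole join */
    routine->GetForeignJoinPaths = shardGetForeignJoinPaths;

    /* executor: run whatever scan the planner chose */
    routine->BeginForeignScan = shardBeginForeignScan;
    routine->IterateForeignScan = shardIterateForeignScan;
    routine->ReScanForeignScan = shardReScanForeignScan;
    routine->EndForeignScan = shardEndForeignScan;

    PG_RETURN_POINTER(routine);
}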

Another foreseeable problem with the FDW approach is that you might
want to have a hash-partitioned table where there are multiple copies
of each piece of data and they are spread out across the shards and you
can add and remove shards and the data automatically rebalances.
Table inheritance (or table partitioning) + postgres_fdw doesn't sound
so great in this situation because when you rebalance you need to
change the partitioning constraints and that requires a full table
lock on every node and the whole thing seems likely to end up being
somewhat annoyingly manual and overly constrained by locking.  But I'd
say two things about that.  The first is that I honestly think that
this would be a pretty nice problem to have.  If we had things working
well enough that this was the kind of problem we were trying to
tackle, we'd be light-years ahead of where we are today.  Sure,
everybody hates table inheritance, but I don't think it's right to say
that partitioning work is blocked because table inheritance exists: I
think the problem is that getting true table partitioning correct is
*hard*.  And Amit Langote is working on that and hopefully we will get
there, but it's not an easy problem.  I don't think sharding is an
easy problem either, and I think getting to a point where ease-of-use
is our big limiting factor would actually be better than the current
scenario where "it doesn't work at all" is the limiting factor.  I
don't want that to *block* other approaches, BUT I also think that
anybody who tries to start over from scratch and ignore all the good
work that has been done in FDW-land is not going to have a very fun
time.

The second thing I want to say about this problem is that I don't want
to presume that it's not a *solvable* problem.  Just because we use
the FDW technology as a base doesn't mean we can't invent new and
quite different stuff along the way.  One idea I've been toying with
is trying to create some notion of a "distributed" table.  This would
be a new relkind.  You'd have a single relation at the SQL level, not
an inheritance hierarchy, but under the hood the data would be spread
across a bunch of remote servers using the FDW interface.  So then you
reuse all of the query planner work and other enhancements that have
been put into the FDW stuff, but you'd present a much cleaner user
interface.  Or, maybe better, you could create a new FDW,
sharding_fdw, that works like postgres_fdw except that instead of
putting the data on one particular foreign server, it spreads the data
out across multiple servers and manages the sharding process under the
hood.  That would, again, let you reuse a lot of the work that's been
done to improve the FDW infrastructure while creating something
significantly more powerful than what postgres_fdw is today.  I don't
know, I don't have any ideas about this.  I think your concern is
valid, and I share it.  But I just fundamentally believe that it's
better to enhance what we have than to start inventing totally new
abstractions.  The FDW API is *really* powerful, and getting more
powerful, and I just have a very hard time believing that starting
over will be better.  Somebody can do that if they like and I'm not
gonna get in the way, but if it's got problems that could have been
avoided by basing that same work on the FDW stuff we've already got, I
do plan to point that out.

Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Craig Ringer
On 5 March 2016 at 23:41, Kevin Grittner  wrote:

>
> > I'd be really interested in some ideas on how that information might be
> > usefully accessed. If we could write info on when to apply commits to the
> > xlog in serializable mode that'd be very handy, especially when looking to
> > the future with logical decoding of in-progress transactions, parallel
> > apply, etc.
>
> Are you suggesting the possibility of holding off on writing the
> commit record for a SERIALIZABLE transaction to WAL until it is
> known that no other SERIALIZABLE transaction comes ahead of it in
> the apparent order of execution?  If so, that's an interesting idea
> that I hadn't given much thought to yet -- I had been assuming
> current WAL writes, with adjustments to the timing of application
> of the records.
>

I wasn't, I simply wrote less than clearly. I intended to say "from the
xlog" where I wrote "to the xlog". Nonetheless, that'd be a completely
unrelated but interesting thing to explore...


> > For parallel apply I anticipated that we'd probably have workers applying
> > xacts in parallel and committing them in upstream commit order. They'd
> > sometimes deadlock with each other; when this happened all workers whose
> > xacts committed after the first aborted xact would have to abort and start
> > again. Not ideal, but safe.
> >
> > Being able to avoid that by using SSI information was in the back of my
> > mind, but with no idea how to even begin to tackle it. What you've mentioned
> > here is helpful and I'd be interested if you could share a bit more of your
> > experience in the area.
>
> My thinking so far has been that reordering the application of
> transaction commits on a replica would best be done as the minimal
> rearrangement possible from commit order which allows the work of
> transactions to become visible in an order consistent with some
> one-at-a-time run of those transactions.  Partly that is because
> the commit order is something that is fairly obvious to see and is
> what most people intuitively look at, even when it is wrong.
> Deviating from this intuitive order seems likely to introduce
> confusion, even when the results are 100% correct.
>

> The only place you *need* to vary from commit order for correctness
> is when there are overlapping SERIALIZABLE transactions, one
> modifies data and commits, and another reads the old version of the
> data but commits later.


Ah, right. So here, even though X1 commits before X2 running concurrently
under SSI, the logical order in which the xacts could've occurred serially
is one where X2 runs and commits before X1, since X2 doesn't depend on X1.
X2 read the old row version before X1 modified it, and logically occurs
before X1 in the serial rearrangement.

I don't fully grasp how that can lead to a situation where xacts can commit
in an order that's valid upstream but not valid as a downstream apply
order. I presume we're looking at read-only logical replicas here (rather
than multimaster), and it's only a concern for SERIALIZABLE xacts since a
READ COMMITTED xact on the master and replica would both be able to see the
state where X1 is committed but X2 isn't yet. But I don't see how a
read-only xact in SERIALIZABLE on the replica can get different results to
what it'd get with SSI on the master. It's entirely possible for a read
xact on the master to get a snapshot after X1 commits and after X2 commits,
same as READ COMMITTED. SSI shouldn't AFAIK come into play with no writes
to create a pivot. Is that wrong?

If we applied this sequence to the downstream in commit order we'd still
get correct results on the heap after applying both. We'd have an
intermediate state where X1 is committed but X2 isn't, but we can have the
same on the master. SSI doesn't AFAIK mask X1 from becoming visible in a
snapshot until X2 commits or anything, right?


>   Due to the action of SSI on the source
> machine, you know that there could not be any SERIALIZABLE
> transaction which saw the inconsistent state between the two
> commits, but on replicas we don't yet manage that.


OK, maybe that's what I'm missing. How exactly does SSI ensure that? (A
RTFM link / hint is fine, but I didn't find it in the SSI section of TFM at
least in a way I recognised).

> The key is that
> there is a read-write dependency (a/k/a rw-conflict) between the
> two transactions which tells you that the second to commit has to
> come before the first in any graph of apparent order of execution.
>

Yeah, I get that part. How does that stop a 3rd SERIALIZABLE xact from
getting a snapshot between the two commits and reading from there?


> The tricky part is that when there are two overlapping SERIALIZABLE
> transactions and one of them has modified data and committed, and
> there is an overlapping SERIALIZABLE transaction which is not READ
> ONLY which has not yet reached completion (COMMIT or ROLLBACK) the
correct ordering remains in doubt -- there is no way to know which
might need to commit first, or whether it even matters.

Re: [HACKERS] The plan for FDW-based sharding

2016-03-07 Thread Konstantin Knizhnik

On 03/07/2016 04:28 AM, Robert Haas wrote:

On Fri, Mar 4, 2016 at 10:54 PM, Craig Ringer  wrote:

I've got to say that this is somewhat reminiscent of the discussions around
in-core pooling, where argument 1 is applied to justify excluding pooling
from core/contrib.

I don't have a strong position on whether a DTM should be in core or not as
I haven't done enough work in the area. I do think it's interesting to
strongly require that a DTM be in core while we also reject things like
pooling that are needed by a large proportion of users.

I don't remember this discussion, but I don't think I feel differently
about either of these two issues.  I'm not opposed to having some
hooks in core to make it easier to build a DTM, but I'm not convinced
that these hooks are the right hooks or that the design underlying
those hooks is correct.

How can I try to convince you that the design of the XTM API is correct?
I already wrote that we have not introduced any new abstractions.
What we have done is just encapsulate some existing Postgres functions.
The main reason was that we tried to minimize changes in the Postgres core.
It seems tempting if we can provide enough flexibility without rewriting
the core, doesn't it?

What does "enough flexibility" mean? We are interested in implementing a
DTM, so if the XTM API lets us do that for the several approaches we have
considered, then it is "flexible enough".

So do you agree that, before rewriting/refactoring xact.c/transam.c/procarray.c,
it is better to first try to introduce XTM over the existing code? And if
we find out that some useful functionality is missing and cannot be
overridden through this API in a convenient and efficient way, without
copying substantial pieces of code, then only in that case should we
consider refactoring the core transaction processing code to make it more
modular and tunable.

If you agree with this statement, then the next question is which set of
functions needs to be overridable by XTM.
The PostgreSQL transaction manager has many different functions, some of
which do almost the same things, but in different ways.
For example, consider TransactionIdIsInProgress, TransactionIdIsKnownCompleted,
TransactionIdDidCommit, TransactionIdDidAbort, and TransactionIdGetStatus.
Some of them access the clog, some the procarray, and some just check a
cached value. So they are scattered through different Postgres modules.

So which of them have to be included in the XTM API?
We investigated the code and the usage of all these functions.
We found out that TransactionIdDidCommit is always called by the
visibility check after TransactionIdIsInProgress, and it in turn uses
TransactionIdGetStatus to extract information about the transaction from
the clog. So we included TransactionIdIsInProgress and
TransactionIdGetStatus in XTM, but not TransactionIdDidCommit,
TransactionIdDidAbort, or TransactionIdIsKnownCompleted.

It is a similar story with the other functions. Take transaction commit,
for example. There is once again a bundle of functions:
CommitTransactionCommand, CommitTransaction, CommitSubTransaction,
RecordTransactionCommit, and TransactionIdSetTreeStatus.
CommitTransactionCommand is a function from the public API; it initiates a
state switch of the Postgres TM finite state automaton.
We do not want to affect the logic of this automaton: it is the same for a
DTM and the local TM. So we look deeper.
CommitTransaction/CommitSubTransaction are called by this FSM, and we also
do not want to change the logic of subtransaction processing.
One more step deeper, and we arrive at TransactionIdSetTreeStatus. That is
why it is included in XTM.

Another example is the tuple visibility check. There is a family of
HeapTupleSatisfies* functions in utils/time/tqual.c (IMHO a very strange
place for one of the core Postgres submodules). Should we override all of
them? No, because they are mostly based on a few other functions, such as
TransactionIdIsInProgress and XidInMVCCSnapshot... As we do not want to
change the heap tuple format, we leave all manipulation of tuple status
bits as it is and redefine only the XidInMVCCSnapshot() function.

So I can provide arguments for every function included in XTM: why it was
included in this API and why some other related functions were not. But I
cannot prove that it is a necessary and sufficient subset of functions.
I do not see big problems in extending and refactoring this API in the
future. Postgres has lived for years without custom TMs, and I do not
expect that the presence of the XTM API will cause the development of many
different TMs. Most likely very few people or companies will try to
develop their own TMs, so compatibility will not be a big issue here.
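
To make that concrete, here is a rough sketch of such a dispatch table.
The struct and field names here are mine for illustration, not necessarily
what the actual patch uses (see the wiki page above for the real
definition); the function-pointer signatures follow the core functions
just discussed:

#include "postgres.h"
#include "access/clog.h"        /* XidStatus */
#include "access/xlogdefs.h"    /* XLogRecPtr */
#include "utils/snapshot.h"     /* Snapshot */

/*
 * Hypothetical XTM-style dispatch table: each field wraps one of the core
 * entry points named above, keeping its existing signature.  A DTM
 * extension installs its own table; the default table points at the stock
 * local implementations, so behavior is unchanged unless overridden.
 */
typedef struct TransactionManager
{
    /* procarray: is the transaction still running? */
    bool        (*IsInProgress) (TransactionId xid);
    /* clog: fetch commit/abort status of a finished transaction */
    XidStatus   (*GetStatus) (TransactionId xid, XLogRecPtr *lsn);
    /* commit: record status for a whole transaction tree at once */
    void        (*SetTreeStatus) (TransactionId xid, int nsubxids,
                                  TransactionId *subxids,
                                  XidStatus status, XLogRecPtr lsn);
    /* visibility: is xid visible under this MVCC snapshot? */
    bool        (*IsInSnapshot) (TransactionId xid, Snapshot snapshot);
} TransactionManager;

extern TransactionManager *TM;  /* set by a DTM extension, else the default */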




And, eventually, I would like to see a DTM in
core or contrib so that it can be accessible to everyone relatively
easily.


So would I. But before including something in core, it will be best to
test it in many different scenarios.
That is especially true for a DTM, 

Re: [HACKERS] The plan for FDW-based sharding

2016-03-06 Thread Robert Haas
On Fri, Mar 4, 2016 at 10:54 PM, Craig Ringer  wrote:
> I've got to say that this is somewhat reminiscent of the discussions around
> in-core pooling, where argument 1 is applied to justify excluding pooling
> from core/contrib.
>
> I don't have a strong position on whether a DTM should be in core or not as
> I haven't done enough work in the area. I do think it's interesting to
> strongly require that a DTM be in core while we also reject things like
> pooling that are needed by a large proportion of users.

I don't remember this discussion, but I don't think I feel differently
about either of these two issues.  I'm not opposed to having some
hooks in core to make it easier to build a DTM, but I'm not convinced
that these hooks are the right hooks or that the design underlying
those hooks is correct.  And, eventually, I would like to see a DTM in
core or contrib so that it can be accessible to everyone relatively
easily.  Now, on connection pooling, I am similarly not opposed to
having some well-designed hooks, but I also think in the long run it
would be better for some improvements in this area to be part of core.
None of that means I would support any particular hook proposal, of
course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-03-06 Thread Robert Haas
On Fri, Mar 4, 2016 at 10:23 PM, Craig Ringer  wrote:
> I can imagine that many such hooks would have little use beyond PPAS, but
> I'm somewhat curious as to if any would have wider applications. It's not
> unusual for me to be working on something and think "gee, I wish there was a
> hook here".

Well, on the whole, we've adopted an approach of "hack core and
merge", so to some extent you have to use your imagination to think
about what it would look like if it were all done using hooks.  But
we've also actually added hooks to Advanced Server in some places
where PostgreSQL doesn't have them, and it's not hard to imagine that
somebody else might find those useful, at least.  Whether they'd be
useful enough that the PostgreSQL community would accept them if
EnterpriseDB were to approve open-sourcing them is another
question

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-03-06 Thread Thom Brown
On 6 Mar 2016 8:27 p.m., "Peter Geoghegan"  wrote:
>
> On Fri, Mar 4, 2016 at 4:41 PM, Robert Haas  wrote:
> > Yeah, I agree with that.  I am utterly mystified by why Bruce keeps
> > beating this drum, and am frankly pretty annoyed about it.  In the
> > first place, he seems to think that he invented the idea of using FDWs
> > for sharding in PostgreSQL, but I don't think that's true.  I think it
> > was partly my idea, and partly something that the NTT folks have been
> > working on for years (cf, e.g.,
> > cb1ca4d800621dcae67ca6c799006de99fa4f0a5).  As far as I understand it,
> > Bruce came in near the end of that conversation and now wants to claim
> > credit for something that doesn't really exist yet and, to the extent
> > that it does exist, wasn't even his idea.
>
> I think that it's easy to have the same idea as someone else
> independently. I've had that happen several times myself; ideas that
> other people had that I felt I could have easily had myself, or did in
> fact have. Most of the ideas that I have are fairly heavily based on
> known techniques. I don't think that I've ever created a PostgreSQL
> feature that was in some way truly original, except perhaps for some
> aspects of how UPSERT works.

Everything is a remix.

Thom


Re: [HACKERS] The plan for FDW-based sharding

2016-03-06 Thread Peter Geoghegan
On Fri, Mar 4, 2016 at 4:41 PM, Robert Haas  wrote:
> Yeah, I agree with that.  I am utterly mystified by why Bruce keeps
> beating this drum, and am frankly pretty annoyed about it.  In the
> first place, he seems to think that he invented the idea of using FDWs
> for sharding in PostgreSQL, but I don't think that's true.  I think it
> was partly my idea, and partly something that the NTT folks have been
> working on for years (cf, e.g.,
> cb1ca4d800621dcae67ca6c799006de99fa4f0a5).  As far as I understand it,
> Bruce came in near the end of that conversation and now wants to claim
> credit for something that doesn't really exist yet and, to the extent
> that it does exist, wasn't even his idea.

I think that it's easy to have the same idea as someone else
independently. I've had that happen several times myself; ideas that
other people had that I felt I could have easily had myself, or did in
fact have. Most of the ideas that I have are fairly heavily based on
known techniques. I don't think that I've ever created a PostgreSQL
feature that was in some way truly original, except perhaps for some
aspects of how UPSERT works.

Who cares whose idea FDW sharding was? It matters not a whit. It
probably independently occurred to several people that the FDW
interface could be built to support horizontal sharding more directly.
The idea almost suggests itself.

> EnterpriseDB *does* have a plan to try to continue enhancing foreign
> data wrappers so that you can run queries against foreign tables and
> get reasonable plans, something that currently isn't true.  I haven't
> heard anybody objecting to that, and I don't expect to hear anybody
> objecting to that, because it's hard to imagine why you wouldn't want
> queries against foreign data wrappers to produce better plans than
> they do today.  At worst, you might think it doesn't matter either
> way, but actually, I think there are a substantial number of people
> who are pretty happy about join pushdown and I expect that when and if
> we get aggregate pushdown working there will be even more people who
> are happy about that.

I think that that's Bruce's point, to a large degree.

>> Alternately, you can just work on the individual FDW features, which
>> *everyone* thinks are a good idea, and when most of them are done, FDW-based
>> scaleout will be such an obvious solution that nobody will argue with it.
>
> That's exactly what the people at EnterpriseDB who are actually doing
> work in this area are attempting to do.  Meanwhile, there's also
> Bruce, who is neither doing nor planning to do any work in this area,
> nor advising either EnterpriseDB or the PostgreSQL community to
> undertake any particular project, but who *is* making it sound like
> there is a super sekret plan that nobody else gets to see.

Is he? I didn't get that impression.

I think Bruce is trying to facilitate discussion, which can sometimes
require being a bit provocative. I think you're being quite unfair,
and mischaracterizing his words. I've heard Bruce talk about
horizontal scaling on several occasions, including at a talk in San
Francisco about a year ago, and I just thought it was Bruce being
Bruce -- primarily, a facilitator. I think that he is not especially
motivated by taking credit either here or in general, and not at all
by taking credit for other people's work.

It's not hard to get agreement about something abstract, like the
general idea of a distributed transaction manager. I fear that any
particular detailed interpretation of what that phrase means will be
very hard to get accepted into PostgreSQL.

-- 
Peter Geoghegan




Re: [HACKERS] The plan for FDW-based sharding

2016-03-05 Thread Kevin Grittner
On Fri, Mar 4, 2016 at 10:10 PM, Craig Ringer  wrote:
> On 28 February 2016 at 06:38, Kevin Grittner  wrote:

>> What I sketched out with the "apparent order of execution"
>> ordering of the transactions (basically, commit order except
>> when one SERIALIZABLE transaction needs to be dragged in front
>> of another due to a read-write dependency) is possibly the
>> simplest approach, but batching may well give better
>> performance.
>
> I'd be really interested in some ideas on how that information might be
> usefully accessed. If we could write info on when to apply commits to the
> xlog in serializable mode that'd be very handy, especially when looking to
> the future with logical decoding of in-progress transactions, parallel
> apply, etc.

Are you suggesting the possibility of holding off on writing the
commit record for a SERIALIZABLE transaction to WAL until it is
known that no other SERIALIZABLE transaction comes ahead of it in
the apparent order of execution?  If so, that's an interesting idea
that I hadn't given much thought to yet -- I had been assuming
current WAL writes, with adjustments to the timing of application
of the records.

> For parallel apply I anticipated that we'd probably have workers applying
> xacts in parallel and committing them in upstream commit order. They'd
> sometimes deadlock with each other; when this happened all workers whose
> xacts committed after the first aborted xact would have to abort and start
> again. Not ideal, but safe.
>
> Being able to avoid that by using SSI information was in the back of my
> mind, but with no idea how to even begin to tackle it. What you've mentioned
> here is helpful and I'd be interested if you could share a bit more of your
> experience in the area.

My thinking so far has been that reordering the application of
transaction commits on a replica would best be done as the minimal
rearrangement possible from commit order which allows the work of
transactions to become visible in an order consistent with some
one-at-a-time run of those transactions.  Partly that is because
the commit order is something that is fairly obvious to see and is
what most people intuitively look at, even when it is wrong.
Deviating from this intuitive order seems likely to introduce
confusion, even when the results are 100% correct.

The only place you *need* to vary from commit order for correctness
is when there are overlapping SERIALIZABLE transactions, one
modifies data and commits, and another reads the old version of the
data but commits later.  Due to the action of SSI on the source
machine, you know that there could not be any SERIALIZABLE
transaction which saw the inconsistent state between the two
commits, but on replicas we don't yet manage that.  The key is that
there is a read-write dependency (a/k/a rw-conflict) between the
two transactions which tells you that the second to commit has to
come before the first in any graph of apparent order of execution.
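
As a toy illustration of that rule (ordinary standalone C, not Postgres
code, with the scenario hard-coded): X1 writes and commits first, X2
overlaps it, reads the old version, and commits second, so the rw-conflict
drags X2 in front of X1 in the apply order:

#include <stdio.h>

typedef struct
{
    const char *name;
    int         commit_order;       /* order of commit on the source */
    int         rw_conflict_out;    /* commit_order of the xact whose write
                                     * this one did not see, or 0 if none */
} Xact;

int
main(void)
{
    Xact        x1 = {"X1 (writer)", 1, 0};
    Xact        x2 = {"X2 (read the old version)", 2, 1};

    /* default: apply in commit order */
    Xact        first = x1;
    Xact        second = x2;

    /* rw-conflict out to an earlier committer: drag the reader in front */
    if (x2.rw_conflict_out == x1.commit_order)
    {
        first = x2;
        second = x1;
    }

    printf("apply %s, then %s\n", first.name, second.name);
    return 0;
}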

The tricky part is that when there are two overlapping SERIALIZABLE
transactions and one of them has modified data and committed, and
there is an overlapping SERIALIZABLE transaction which is not READ
ONLY which has not yet reached completion (COMMIT or ROLLBACK) the
correct ordering remains in doubt -- there is no way to know which
might need to commit first, or whether it even matters.  I am
skeptical about whether in logical replication (including MMR), it
is going to be possible to manage this by finding "safe snapshots".
The only alternative I can see, though, is to suspend replication
while correct transaction ordering remains in doubt.  A big READ
ONLY transaction would not cause a replication stall, but a big
READ WRITE transaction could cause an indefinite stall.  Simon
seemed to be saying that this is unacceptable, but I tend to think
it is a viable approach for some workloads, especially if the READ
ONLY transaction property is used when possible.

There might be some wiggle room in terms of letting
non-SERIALIZABLE transactions commit while the ordering of
SERIALIZABLE transactions remain in doubt, but that would involve
allowing bigger deviations from commit order in transaction
application, which may confuse people.  The argument on the other
side is that if they use transaction isolation less strict than
SERIALIZABLE that they are vulnerable to seeing anomalies anyway,
so they must be OK with that.

Hopefully this is in some way helpful

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 2 March 2016 at 03:02, Bruce Momjian  wrote:

> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > Note that I am not saying that other discussed approaches are any
> > better, I am saying that we should know approximately what we
> > actually want and not just beat FDWs with a hammer and hope sharding
> > will eventually emerge and call that the plan.
>
> I will say it again --- FDWs are the only sharding method I can think of
> that has a chance of being accepted into Postgres core.  It is a plan,
> and if it fails, it fails.  If it succeeds, that's good.  What more do
> you want me to say?


That you won't push it too hard if it works, but works badly, and will be
prepared to back off on the last steps despite all the lead-up
work/time/investment you've put into it.

If FDW-based sharding works, I'm happy enough, I have no horse in this
race. If it doesn't work I don't much care either. What I'm worried about
is if it works like partitioning using inheritance works - horribly badly,
but just well enough that it's served as an effective barrier to doing
anything better.

That's what I want to prevent. Sharding that only-just-works and then stops
us getting anything better into core.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 2 March 2016 at 00:03, Robert Haas  wrote:


>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.  We end up revising the index AM API pretty
> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.  I suspect that
> a transaction manager API would end up similarly situated.
> 
>

IMO that needs to be true of all hooks into the real innards.

The ProcessUtility_hook API changed a couple of times after introduction
and nobody screamed. I think we just have to mark such places as having
cross-version API volatility, so you should be prepared to #if
PG_VERSION_NUM around them if you use them.
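
For instance, a hook consumer can absorb the 9.3 ProcessUtility signature
change along these lines (a sketch only: the extension body is elided, and
a real module would save and chain to any previous hook):

#include "postgres.h"
#include "tcop/utility.h"

void        _PG_init(void);

#if PG_VERSION_NUM >= 90300
static void
my_ProcessUtility(Node *parsetree, const char *queryString,
                  ProcessUtilityContext context, ParamListInfo params,
                  DestReceiver *dest, char *completionTag)
{
    /* ... extension logic here ... */
    standard_ProcessUtility(parsetree, queryString, context,
                            params, dest, completionTag);
}
#else                           /* 9.2 and earlier signature */
static void
my_ProcessUtility(Node *parsetree, const char *queryString,
                  ParamListInfo params, bool isTopLevel,
                  DestReceiver *dest, char *completionTag)
{
    /* ... extension logic here ... */
    standard_ProcessUtility(parsetree, queryString, params,
                            isTopLevel, dest, completionTag);
}
#endif

void
_PG_init(void)
{
    ProcessUtility_hook = my_ProcessUtility;
}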

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 28 February 2016 at 06:38, Kevin Grittner  wrote:


>
> > For logical replay, applying in batches is actually a good thing since it
> > allows parallelism. We can remove them all from the target's procarray
> all
> > at once to avoid intermediate states becoming visible. So that would be
> the
> > preferred mechanism.
>
> That could be part of a solution.  What I sketched out with the
> "apparent order of execution" ordering of the transactions
> (basically, commit order except when one SERIALIZABLE transaction
> needs to be dragged in front of another due to a read-write
> dependency) is possibly the simplest approach, but batching may
> well give better performance.
>

I'd be really interested in some ideas on how that information might be
usefully accessed. If we could write info on when to apply commits to the
xlog in serializable mode that'd be very handy, especially when looking to
the future with logical decoding of in-progress transactions, parallel
apply, etc.

For parallel apply I anticipated that we'd probably have workers applying
xacts in parallel and committing them in upstream commit order. They'd
sometimes deadlock with each other; when this happened all workers whose
xacts committed after the first aborted xact would have to abort and start
again. Not ideal, but safe.
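
A minimal sketch of that commit gate (plain pthreads, nothing BDR-specific;
deadlock detection and the retry path are left as comments):

#include <pthread.h>

static pthread_mutex_t commit_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t commit_turn = PTHREAD_COND_INITIALIZER;
static long next_commit_seq = 0;    /* position in upstream commit order */

/*
 * Called by an apply worker once its transaction (at upstream commit
 * position "seq") has finished applying and is ready to commit.  Workers
 * apply concurrently but commit strictly in upstream commit order.
 */
void
commit_in_upstream_order(long seq)
{
    pthread_mutex_lock(&commit_lock);
    while (seq != next_commit_seq)
        pthread_cond_wait(&commit_turn, &commit_lock);  /* not our turn */

    /*
     * ... issue COMMIT here.  If this transaction was the victim of a
     * deadlock, the caller must re-queue it and every transaction with a
     * larger seq, as described above ...
     */

    next_commit_seq++;
    pthread_cond_broadcast(&commit_turn);
    pthread_mutex_unlock(&commit_lock);
}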

Being able to avoid that by using SSI information was in the back of my
mind, but with no idea how to even begin to tackle it. What you've
mentioned here is helpful and I'd be interested if you could share a bit
more of your experience in the area.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 27 February 2016 at 15:29, Konstantin Knizhnik  wrote:


> Two reasons:
> 1. There is no ideal implementation of a DTM which will fit all possible
> needs and be efficient for all clusters.
> 2. Even if such an implementation existed, the right way to integrate it
> would still be for Postgres to use some kind of TM API.
> 
>


I've got to say that this is somewhat reminiscent of the discussions around
in-core pooling, where argument 1 is applied to justify excluding pooling
from core/contrib.

I don't have a strong position on whether a DTM should be in core or not as
I haven't done enough work in the area. I do think it's interesting to
strongly require that a DTM be in core while we also reject things like
pooling that are needed by a large proportion of users.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Craig Ringer
On 27 February 2016 at 11:54, Robert Haas  wrote:



> I could submit a patch adding
> hooks to core to enable all of the things (or even just some of the
> things) that EnterpriseDB has changed in Advanced Server, and that
> patch would be rejected so fast it would make your head spin, because
> of course the core project doesn't want to be burdened with
> maintaining a whole bunch of hooks for the convenience of
> EnterpriseDB.


I can imagine that many such hooks would have little use beyond PPAS, but
I'm somewhat curious as to if any would have wider applications. It's not
unusual for me to be working on something and think "gee, I wish there was
a hook here".

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas
On Fri, Mar 4, 2016 at 8:27 PM, Joshua D. Drake  wrote:
> This does not sound like Bruce at all. Bruce is a lot of things:
> stubborn, sometimes temperamental, a lot of times (like you) a hot
> head -- but he does not take credit for other people's work, in my experience.

On the whole, Bruce is a much nicer guy than I am.  But I can't see
eye to eye with him on this.  I admit I may be being unfair to him,
but I'm telling it like I see it.  Like I do.

> Even if there was, so what? IF EDB wants to have a secret plan to push a lot
> of cool features to .Org, who cares? In the end, it all has to go through
> peer review and the meritocracy anyway.

I would just like to say that if I or my employer ever get accused of
having a nefarious plan, and somehow I get to pick *which* nefarious
plan I or my employer is to be accused of having, "a secret plan to
push a lot of cool features to .Org" sounds like a good one for me to
pick, especially since, yeah, we have that plan.  We plan to (try to)
push a lot of cool features to .Org.  We - or at least I - do not plan
to do it in a way that is anything but respectful to the community
process.  Specifically, and in no particular order, we plan to
continue contributing performance and scalability enhancements,
improvements to parallel query, and FDW-related improvements, just as
we have for 9.6.  We may also try to contribute other stuff that we
think will be cool and benefit PostgreSQL.  Suggestions are welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Joshua D. Drake

On 03/04/2016 04:41 PM, Robert Haas wrote:

As far as I understand it,
Bruce came in near the end of that conversation and now wants to claim
credit for something that doesn't really exist yet and, to the extent
that it does exist, wasn't even his idea.


Robert,

This does not sound like Bruce at all. Bruce is a lot of things: 
stubborn, sometimes temperamental, a lot of times (like you) a hot 
head -- but he does not take credit for other people's work, in my experience.



get reasonable plans, something that currently isn't true.  I haven't
heard anybody objecting to that, and I don't expect to hear anybody
objecting to that, because it's hard to imagine why you wouldn't want
queries against foreign data wrappers to produce better plans than
they do today.  At worst, you might think it doesn't matter either
way, but actually, I think there are a substantial number of people
who are pretty happy about join pushdown and I expect that when and if
we get aggregate pushdown working there will be even more people who
are happy about that.


Agreed.


That's exactly what the people at EnterpriseDB who are actually doing
work in this area are attempting to do.  Meanwhile, there's also
Bruce, who is neither doing nor planning to do any work in this area,
nor advising either EnterpriseDB or the PostgreSQL community to
undertake any particular project, but who *is* making it sound like
there is a super sekret plan that nobody else gets to see.  However,


I don't see this, Robert. I don't see some secret hidden plan. I don't 
see any cabal. I see a guy that has an idea, just like everyone else on 
this list.



as the guy who actually wrote the plan that EnterpriseDB is following,
I happen to know that there's nothing more to it than what I wrote
above.


Even if there was, so what? IF EDB wants to have a secret plan to push a 
lot of cool features to .Org, who cares? In the end, it all has to go 
through peer review and the meritocracy anyway.


Sincerely,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas
On Tue, Mar 1, 2016 at 12:07 PM, Konstantin Knizhnik
 wrote:
> In the article they used a notion of "wait":
>
> if T.SnapshotTime > GetClockTime()
> then wait until T.SnapshotTime
>
> Originally we really did sleep here, but then we thought that instead of
> sleeping we could just adjust the local time.
> Sorry, I do not have a formal proof that it is equivalent, but... at least
> we have not encountered any inconsistencies after this fix and performance
> is improved.

I think that those things are probably not equivalent.  They would be
if you could cause the adjustment to advance in lock-step on every
node at the same time, but you probably can't.  And I think it is
extremely unwise to assume that the fact that nothing obviously broke
means that you got it right.  This is the sort of work where formal
proofs of correctness are, IMHO, extremely wise.
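
To make the non-equivalence concrete, the two behaviors differ roughly as
follows (a standalone sketch; now_us() and sleep_us() are invented stand-ins
for the node's local clock and a short sleep):

    #include <stdint.h>

    extern uint64_t now_us(void);           /* invented: local clock, usec */
    extern void     sleep_us(uint64_t us);  /* invented: short sleep */

    /* Clock-SI style: block only this one snapshot until the local
       clock catches up with the snapshot's timestamp. */
    static void
    wait_for_snapshot_time(uint64_t snapshot_ts)
    {
        while (now_us() < snapshot_ts)
            sleep_us(1000);
    }

    /* Correction-value style: remember a permanent offset instead.
       Unlike waiting, this advances every *future* timestamp the node
       hands out, so its notion of time can ratchet ahead of real time;
       that is why the two are not obviously interchangeable. */
    static uint64_t clock_correction = 0;

    static uint64_t
    corrected_now(uint64_t snapshot_ts)
    {
        uint64_t t = now_us() + clock_correction;

        if (t < snapshot_ts)
        {
            clock_correction += snapshot_ts - t;    /* jump, permanently */
            t = snapshot_ts;
        }
        return t;
    }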

> I fear that building a DTM that is fully reliable and also
> well-performing is going to be really hard, and I think it would be
> far better to have one such DTM that is 100% reliable than two or more
> implementations each of which are 99% reliable.
>
> The question is not about its reliability, but mostly about its
> functionality and flexibility.

Well, *my* concern is about reliability.  A lot of code can be made
faster at the price of less reliability, but that usually doesn't work
out well in the end.  Performance matters too, of course, but the way
to get there is to start with a good algorithm, write reliable code to
implement it, and then optimize.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas
On Wed, Mar 2, 2016 at 1:53 PM, Josh berkus  wrote:
> One of the things which causes bad reactions and arguments, Bruce, is that a
> lot of your posts and presentations detailing plans for the FDW approach
> carry the subtext that all four of the other approaches are dead ends and
> not worth considering.  Given that the other approaches, whatever their
> limitations, have working code in the field and the FDW approach does not,
> that's more than a little offensive.

Yeah, I agree with that.  I am utterly mystified by why Bruce keeps
beating this drum, and am frankly pretty annoyed about it.  In the
first place, he seems to think that he invented the idea of using FDWs
for sharding in PostgreSQL, but I don't think that's true.  I think it
was partly my idea, and partly something that the NTT folks have been
working on for years (cf, e.g.,
cb1ca4d800621dcae67ca6c799006de99fa4f0a5).  As far as I understand it,
Bruce came in near the end of that conversation and now wants to claim
credit for something that doesn't really exist yet and, to the extent
that it does exist, wasn't even his idea.  In the second place, the
only thing that these repeated emails and development meeting
discussions of the topic actually accomplish is to piss people off.
I do believe that enhancing the foreign data wrapper interface can be
part of a horizontal scalability story for PostgreSQL, but as long as
nobody is objecting to the individual enhancements, which I don't see
anybody doing, then why the heck do we have to keep arguing about this
big picture story?  It doesn't matter at all, and it doesn't even
really exist, yet somehow Bruce keeps bringing it up, which I think
serves no useful purpose whatsoever.

> If we want to move forwards on serious work on FDW-based sharding, the folks
> working on it should stop treating it as a "fait accompli" that this is the
> Chosen Way for the PostgreSQL project.  Otherwise, you'll spend all of your
> time arguing that point instead of working on features that matter.

The only person treating it that way is Bruce.

> In contrast, this FDW plan *still* feels very much like a small group made
> up of employees of only two companies came up with it in private and decided
> that it should be the plan for the whole project.  I know that Bruce and
> others have good reasons for starting the FDW project, but there hasn't been
> much of an attempt to obtain community consensus around it. If Bruce and
> others want contributors to work on FDWs instead of other sharding
> approaches, then they need to win over those people as to why they should do
> that.  It's how this community works.

There hasn't been much of an attempt to obtain community consensus
about it because there isn't actually some grand plan, private or
otherwise, much as Bruce's emails might make you think otherwise.
EnterpriseDB *does* have a plan to try to continue enhancing foreign
data wrappers so that you can run queries against foreign tables and
get reasonable plans, something that currently isn't true.  I haven't
heard anybody objecting to that, and I don't expect to hear anybody
objecting to that, because it's hard to imagine why you wouldn't want
queries against foreign data wrappers to produce better plans than
they do today.  At worst, you might think it doesn't matter either
way, but actually, I think there are a substantial number of people
who are pretty happy about join pushdown and I expect that when and if
we get aggregate pushdown working there will be even more people who
are happy about that.

The only other ongoing work that EnterpriseDB has that at all touches
on this area is Ashutosh Bapat's work on 2PC for FDWs.  I'm not
convinced that's fully baked, and it conflicts with the XTM stuff the
Postgres Pro guys are doing, which I *also* don't think is fully
baked, so I'm not real keen on pressing forward aggressively with
either approach right now.  I think we (eventually) need a solution to
the problem of cross-node consistency, but I am deeply
unconvinced that anything currently on the table is going to get us
there.  I did recommend the 2PC for FDW project, but I'm not amazingly
happy with how it came out, and I think we need to think harder about
other approaches before adopting something.
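
For reference, what any 2PC-for-FDWs design ultimately has to automate is the
manual two-phase dance PostgreSQL already exposes; a bare-bones libpq sketch
(connection strings, table, and GID names are invented, error handling is
reduced to bailing out, and max_prepared_transactions must be set above zero
on both nodes):

    #include <stdio.h>
    #include <stdlib.h>
    #include <libpq-fe.h>

    static void
    run(PGconn *conn, const char *sql)
    {
        PGresult *res = PQexec(conn, sql);

        if (PQresultStatus(res) != PGRES_COMMAND_OK)
        {
            fprintf(stderr, "%s: %s", sql, PQerrorMessage(conn));
            exit(1);            /* real code would ROLLBACK PREPARED */
        }
        PQclear(res);
    }

    int
    main(void)
    {
        PGconn *n1 = PQconnectdb("dbname=shard1");
        PGconn *n2 = PQconnectdb("dbname=shard2");

        /* do the distributed work */
        run(n1, "BEGIN");
        run(n2, "BEGIN");
        run(n1, "UPDATE accounts SET balance = balance - 100 WHERE id = 1");
        run(n2, "UPDATE accounts SET balance = balance + 100 WHERE id = 2");

        /* phase 1: prepare everywhere; a failure here can still roll back */
        run(n1, "PREPARE TRANSACTION 'gtx_42_n1'");
        run(n2, "PREPARE TRANSACTION 'gtx_42_n2'");

        /* phase 2: commit everywhere.  A crash between these two calls
           leaves an in-doubt transaction that a resolver must finish --
           the genuinely hard part any FDW-based design has to own. */
        run(n1, "COMMIT PREPARED 'gtx_42_n1'");
        run(n2, "COMMIT PREPARED 'gtx_42_n2'");

        PQfinish(n1);
        PQfinish(n2);
        return 0;
    }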

> Alternately, you can just work on the individual FDW features, which
> *everyone* thinks are a good idea, and when most of them are done, FDW-based
> scaleout will be such an obvious solution that nobody will argue with it.

That's exactly what the people at EnterpriseDB who are actually doing
work in this area are attempting to do.  Meanwhile, there's also
Bruce, who is neither doing nor planning to do any work in this area,
nor advising either EnterpriseDB or the PostgreSQL community to
undertake any particular project, but who *is* making it sound like
there is a super sekret plan that nobody else gets to see.  However,
as the guy who actually wrote the plan that EnterpriseDB is following,
I happen to know that there's nothing more to it than what I wrote
above.

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov
On Mar 3, 2016 4:47 AM, "Michael Paquier"  wrote:
>
> On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
>  wrote:
> > If FDWs had existed when Postgres XC/XL were being developed, then I
> > believe they would have tried to build a full-featured prototype of
> > FDW-based sharding. If this prototype succeeded then we could make a
> > full roadmap.
>
> Speaking here with my XC hat, that's actually the case. A couple of
> years back when I worked on it, there were discussions about reusing
> FDW routines for the purpose of XC, which would have been roughly
> reusing postgres_fdw + the possibility to send XID, snapshot and
> transaction timestamp to the remote nodes after getting that from the
> GTM (global transaction manager ensuring global data visibility and
> consistency), and have the logic for query pushdown in the FDW itself
> when planning query on what would have been roughly foreign tables
> (not entering in the details here, those would have not been entirely
> foreign tables). At this point the global picture was not completely
> set, XC being based on 9.1~9.2 and the FDW base routines were not as
> extended as they are now. As history has shown, this global picture has
> never materialized, though it would have, had XC been merged with 9.3.
> The point is that XC would have moved to using the FDW approach, as a
> set of plugins.
>
> This was a reason behind this email of 2013 on -hackers actually:
> http://www.postgresql.org/message-id/cab7npqtdjf-58wuf-xz01nkj7wf0e+eukggqhd0igvsod4h...@mail.gmail.com

Good to remember!

> Michael
>
>


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Tatsuo Ishii
> On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
>  wrote:
>> If FDWs had existed when Postgres XC/XL were being developed, then I
>> believe they would have tried to build a full-featured prototype of
>> FDW-based sharding. If this prototype succeeded then we could make a full
>> roadmap.
> 
> Speaking here with my XC hat, that's actually the case. A couple of
> years back when I worked on it, there were discussions about reusing
> FDW routines for the purpose of XC, which would have been roughly
> reusing postgres_fdw + the possibility to send XID, snapshot and
> transaction timestamp to the remote nodes after getting that from the
> GTM (global transaction manager ensuring global data visibility and
> consistency), and have the logic for query pushdown in the FDW itself
> when planning query on what would have been roughly foreign tables
> (not entering in the details here, those would have not been entirely
> foreign tables). At this point the global picture was not completely
> set, XC being based on 9.1~9.2 and the FDW base routines were not as
> extended as they are now. As history has shown, this global picture has
> never materialized, though it would have, had XC been merged with 9.3.
> The point is that XC would have moved to using the FDW approach, as a
> set of plugins.
> 
> This was a reason behind this email of 2013 on -hackers actually:
> http://www.postgresql.org/message-id/cab7npqtdjf-58wuf-xz01nkj7wf0e+eukggqhd0igvsod4h...@mail.gmail.com
> 
> There were also discussions about making the connection pooler a
> background worker and plugging it into a shared memory context that all
> backends connecting to this XC-like-postgres_fdw would use, though
> this is another story, for another time...

Thanks for the history. Very interesting...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp




Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Michael Paquier
On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
 wrote:
> If FDWs had existed when Postgres XC/XL were being developed, then I
> believe they would have tried to build a full-featured prototype of
> FDW-based sharding. If this prototype succeeded then we could make a full
> roadmap.

Speaking here with my XC hat, that's actually the case. A couple of
years back when I worked on it, there were discussions about reusing
FDW routines for the purpose of XC, which would have been roughly
reusing postgres_fdw + the possibility to send XID, snapshot and
transaction timestamp to the remote nodes after getting that from the
GTM (global transaction manager ensuring global data visibility and
consistency), and have the logic for query pushdown in the FDW itself
when planning query on what would have been roughly foreign tables
(not entering in the details here, those would have not been entirely
foreign tables). At this point the global picture was not completely
set, XC being based on 9.1~9.2 and the FDW base routines were not as
extended as they are now. As history has shown, this global picture has
never materialized, though it would have, had XC been merged with 9.3.
The point is that XC would have moved to using the FDW approach, as a
set of plugins.

This was a reason behind this email of 2013 on -hackers actually:
http://www.postgresql.org/message-id/cab7npqtdjf-58wuf-xz01nkj7wf0e+eukggqhd0igvsod4h...@mail.gmail.com

There were also discussions about making the connection pooler a
background worker and plugging it into a shared memory context that all
backends connecting to this XC-like-postgres_fdw would use, though
this is another story, for another time...
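
As a single-node analogue of the snapshot hand-off described above, stock
PostgreSQL (9.2 and later) can already export a snapshot from one backend and
adopt it in another via pg_export_snapshot() and SET TRANSACTION SNAPSHOT; a
GTM in effect generalizes that idea across nodes. A rough libpq sketch (the
dbname is invented and error checks are omitted):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        char      sql[256];
        PGconn   *a = PQconnectdb("dbname=demo");
        PGconn   *b = PQconnectdb("dbname=demo");
        PGresult *res;

        /* Backend A opens a transaction and exports its snapshot;
           the snapshot stays valid while A's transaction is open. */
        PQclear(PQexec(a, "BEGIN ISOLATION LEVEL REPEATABLE READ"));
        res = PQexec(a, "SELECT pg_export_snapshot()");
        snprintf(sql, sizeof(sql),
                 "SET TRANSACTION SNAPSHOT '%s'", PQgetvalue(res, 0, 0));
        PQclear(res);

        /* Backend B adopts exactly the same view of the database;
           the SET must be B's first statement after BEGIN. */
        PQclear(PQexec(b, "BEGIN ISOLATION LEVEL REPEATABLE READ"));
        PQclear(PQexec(b, sql));

        /* ... A and B now see an identical snapshot ... */

        PQclear(PQexec(b, "COMMIT"));
        PQclear(PQexec(a, "COMMIT"));
        PQfinish(a);
        PQfinish(b);
        return 0;
    }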
-- 
Michael




Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov
On Wed, Mar 2, 2016 at 9:53 PM, Josh berkus  wrote:

> On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:
>
>> Sorry, but based on this plan it is possible to make a conclusion that
>> there are only two possible cluster solutions for Postgres:
>> XC/XL and FDW-based.  From my point of view there are many more
>> possible alternatives.
>>
>
> Definitely.
>
> Currently we have five approaches to sharding inside postgres in the
> field, in chronological order:
>
> 1. Greenplum's executor-based approach with motion nodes
>
> 2. Skype's function-based approach (PL/proxy)
>
> 3. XC/XL's approach, which I believe is also query executor-based
>
> 4. CitusDB's pg_shard which is based on query hooks
>
> 5. FDW-based (currently theoretical)
>
> One of the things which causes bad reactions and arguments, Bruce, is that
> a lot of your posts and presentations detailing plans for the FDW approach
> carry the subtext that all four of the other approaches are dead ends and
> not worth considering.  Given that the other approaches, whatever their
> limitations, have working code in the field and the FDW approach does not,
> that's more than a little offensive.
>
> If we want to move forwards on serious work on FDW-based sharding, the
> folks working on it should stop treating it as a "fait accompli" that this
> is the Chosen Way for the PostgreSQL project.  Otherwise, you'll spend all
> of your time arguing that point instead of working on features that matter.
>
> Bruce made a long comparison with built-in replication, but there's a big
> difference here.  We decided that WAL-based replication was the way to go
> for built-in as a community decision here on -hackers and at various
> conferences.  Both the plan and the implementation for replication
> transcended company backing, involving even active competitors, and
> involved discussions with maintainers of the older replication projects.
>
> In contrast, this FDW plan *still* feels very much like a small group made
> up of employees of only two companies came up with it in private and
> decided that it should be the plan for the whole project.  I know that
> Bruce and others have good reasons for starting the FDW project, but there
> hasn't been much of an attempt to obtain community consensus around it. If
> Bruce and others want contributors to work on FDWs instead of other
> sharding approaches, then they need to win over those people as to why they
> should do that.  It's how this community works.
>
> Alternately, you can just work on the individual FDW features, which
> *everyone* thinks are a good idea, and when most of them are done,
> FDW-based scaleout will be such an obvious solution that nobody will argue
> with it.


+1

Thank you, Josh. I think this is an excellent summary of the conversation
about FDW-based sharding.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Josh berkus

On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:

Sorry, but based on this plan it is possible to make a conclusion that
there are only two possible cluster solutions for Postgres:
XC/XL and FDW-based.  From my point of view there are many more
possible alternatives.


Definitely.

Currently we have five approaches to sharding inside postgres in the 
field, in chronological order:


1. Greenplum's executor-based approach with motion nodes

2. Skype's function-based approach (PL/proxy)

3. XC/XL's approach, which I believe is also query executor-based

4. CitusDB's pg_shard which is based on query hooks

5. FDW-based (currently theoretical)

One of the things which causes bad reactions and arguments, Bruce, is 
that a lot of your posts and presentations detailing plans for the FDW 
approach carry the subtext that all four of the other approaches are 
dead ends and not worth considering.  Given that the other approaches, 
whatever their limitations, have working code in the field and the FDW 
approach does not, that's more than a little offensive.


If we want to move forwards on serious work on FDW-based sharding, the 
folks working on it should stop treating it as a "fait accompli" that 
this is the Chosen Way for the PostgreSQL project.  Otherwise, you'll 
spend all of your time arguing that point instead of working on features 
that matter.


Bruce made a long comparison with built-in replication, but there's a 
big difference here.  We decided that WAL-based replication was the way 
to go for built-in as a community decision here on -hackers and at 
various conferences.  Both the plan and the implementation for 
replication transcended company backing, involving even active 
competitors, and involved discussions with maintainers of the older 
replication projects.


In contrast, this FDW plan *still* feels very much like a small group 
made up of employees of only two companies came up with it in private 
and decided that it should be the plan for the whole project.  I know 
that Bruce and others have good reasons for starting the FDW project, 
but there hasn't been much of an attempt to obtain community consensus 
around it. If Bruce and others want contributors to work on FDWs instead 
of other sharding approaches, then they need to win over those people as 
to why they should do that.  It's how this community works.


Alternately, you can just work on the individual FDW features, which 
*everyone* thinks are a good idea, and when most of them are done, 
FDW-based scaleout will be such an obvious solution that nobody will 
argue with it.


--
Josh Berkus
Red Hat OSAS
(any opinions are my own)




Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Konstantin Knizhnik



On 01.03.2016 22:02, Bruce Momjian wrote:

On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:

Note that I am not saying that other discussed approaches are any
better, I am saying that we should know approximately what we
actually want and not just beat FDWs with a hammer and hope sharding
will eventually emerge and call that the plan.

I will say it again --- FDWs are the only sharding method I can think of
that has a chance of being accepted into Postgres core.  It is a plan,
and if it fails, it fails.  If it succeeds, that's good.  What more do
you want me to say?  I know of no other way to answer the questions you
asked above.


I do not understand why it would fail.
The FDW approach may not be flexible enough for building optimal
distributed query execution plans for complex OLAP queries.
But for simple queries it should work fine; simple queries correspond to
OLTP and simple OLAP.
For OLTP we definitely need a transaction manager to provide global
consistency.
And we actually have a prototype integrating postgres_fdw with our
pg_dtm and pg_tsdtm transaction managers.

The results are, IMHO, quite promising (see attached diagram).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



DTM-pgconf.pdf
Description: Adobe PDF document



Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov
On Tue, Mar 1, 2016 at 10:11 PM, Bruce Momjian  wrote:

> On Tue, Mar  1, 2016 at 02:02:44PM -0500, Bruce wrote:
> > On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > > Note that I am not saying that other discussed approaches are any
> > > better, I am saying that we should know approximately what we
> > > actually want and not just beat FDWs with a hammer and hope sharding
> > > will eventually emerge and call that the plan.
> >
> > I will say it again --- FDWs are the only sharding method I can think of
> > that has a chance of being accepted into Postgres core.  It is a plan,
> > and if it fails, it fails.  If it succeeds, that's good.  What more do
> > you want me to say?  I know of no other way to answer the questions you
> > asked above.
>
> I guess all I can say is that if FDWs existed when Postgres XC/XL were
> being developed, they likely would have been used or at least
> considered.  I think we are basically making that attempt now.


If FDWs had existed when Postgres XC/XL were being developed, then I
believe they would have tried to build a full-featured prototype of
FDW-based sharding. If this prototype succeeded then we could make a full
roadmap.
For now, we don't have a full roadmap, we have only some pieces. This is
why people doubt. When you're speaking about advances that are natural to
FDWs, then no problem, nobody is against FDW advances. However, other
things are unclear.
You can try to build a full-featured prototype to convince people. Although
it would take some resources, it would save more resources because it would
save us from errors.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov
On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas  wrote:

> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:
> > On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> >> > Two reasons:
> >> > 1. There is no ideal implementation of DTM which will fit all
> >> > possible needs and be efficient for all clusters.
> >>
> >> Hmm, what is the reasoning behind that statement?  I mean, it is
> >> certainly true that there are some places where we have decided that
> >> one-size-fits-all is not the right approach.  Indexing, for example.
> >
> > Uh, is that even true of indexing?  While the plug-in nature of indexing
> > allows for easier development and testing, does anyone create plug-in
> > indexing that isn't shipped by us?  I thought WAL support was something
> > that prevented external indexing solutions from working.
>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.


It's because we didn't offer a legal mechanism for pluggable AMs.


> We end up revising the index AM API pretty
> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.


I can't buy this argument. One may say this about any single API. Thinking
that way will lead you to rejecting any extensibility, and that would be in
direct contradiction to the original Postgres concept.
During the last 5 years we added 2 new AMs: SP-GiST and BRIN. And BRIN is
very different from any other AM we had before.
And I wouldn't say that the AM API has changed dramatically during that
time. There were some changes, but it would be normal work for extension
maintainers to adapt to these changes, like they do for other API changes.

There is a simple example where we suffer from the lack of extensible AMs:
fast full-text search. We can't provide it with the current GIN, because we
lack positional information in it. And we can't push these advances into
core because the current implementation's design is not perfect. The ideal
design would be to push all the required functionality into btree, then
make GIN a wrapper over btree, then add the required functionality. But
that is a roadmap for 5-10 years, and for those 5-10 years users will
suffer from having 3rd-party solutions for fast FTS instead of an in-core
one. Our design questions are actually not something users care about;
they are not reliability questions. Having pluggable AMs would be a real
chance in this situation: users could use an extension right now, and then,
when after many years we finally implement the right design, they could
migrate to the in-core solution. But 5-10 years of fast FTS does matter.


> I suspect that
> a transaction manager API would end up similarly situated.
>

I disagree with you about the AM API. But I agree that a TM API would end
up in a similar situation to the AM API.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov
On Wed, Mar 2, 2016 at 4:36 AM, Tomas Vondra 
wrote:

> Hi,
>
> On 03/01/2016 08:02 PM, Bruce Momjian wrote:
>
>> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
>>
>>> Note that I am not saying that other discussed approaches are any
>>> better, I am saying that we should know approximately what we
>>> actually want and not just beat FDWs with a hammer and hope sharding
>>> will eventually emerge and call that the plan.
>>>
>>
>> I will say it again --- FDWs are the only sharding method I can think
>> of that has a chance of being accepted into Postgres core.
>>
>
>
>
> While I disagree with Simon on various things, I absolutely understand why
> he was asking about a prototype, and some sort of analysis of what use
> cases we expect to support initially/later/never, and what pieces are
> missing to get the sharding working. IIRC at the FOSDEM Dev Meeting you've
> claimed you're essentially working on a prototype - once we have the
> missing FDW pieces, we'll know if it works. I disagree with that - it's
> not a prototype if it takes several years to find the outcome.
>
>
Fully agree. Probably we all need to help build a prototype in the
between-releases period. I see no legal way to resolve the situation.


>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>


Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov
On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas  wrote:

> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:
> > On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> >> > Two reasons:
> >> > 1. There is no ideal implementation of DTM which will fit all
> >> > possible needs and be efficient for all clusters.
> >>
> >> Hmm, what is the reasoning behind that statement?  I mean, it is
> >> certainly true that there are some places where we have decided that
> >> one-size-fits-all is not the right approach.  Indexing, for example.
> >
> > Uh, is that even true of indexing?  While the plug-in nature of indexing
> > allows for easier development and testing, does anyone create plug-in
> > indexing that isn't shipped by us?  I thought WAL support was something
> > that prevented external indexing solutions from working.
>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.  We end up revising the index AM API pretty
>

We'd love to develop a new special index AM; that's why we are all for
pluggable WAL. I think there will be other AM developers once we open
the door for that.


> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.  I suspect that
> a transaction manager API would end up similarly situated.
>

I don't expect many other TM developers, so there is no problem with
improving the API. We started from practical needs and analysis of many
academic papers. We spent a year playing with several prototypes to prove
our proposed API (expect more in several months). Everybody can download
and test them. I wish we could do the same with the FDW-based sharding
solution.

Of course, we could fork Postgres as the XC/XL people did, and certainly
eventually will, if the community doesn't accept our proposal, since it's
very difficult to work on cross-release projects. But then there will be no
winners, so why are we all so aggressively misunderstanding each other? I
watched XC/XL for years and decided I didn't want to go down that path of
isolation from the community, so we chose to make the TM pluggable, to stay
with the community and let everybody prove their concepts. If you have
ideas for improving the TM API, we are open; if you know it's broken by
design, help us fix it.  I have my own understanding of FDWs, but I
deliberately don't participate in some of the very hot discussions, simply
because I don't feel committed to working on them. Your group is very
enthusiastic about FDWs; that's fine as long as you improve FDWs in a
general way, and I'm very happy with the current work.  I would prefer that
you show a prototype of the sharding solution which convinces us on
functionality and performance. I agree with Tomas Vondra that we don't want
to wait years to see the result; we want results based on a prototype,
which should be done between releases. If you don't have enough resources
for this, let's do it together with the community. Nobody I've seen is
against FDW sharding; people complained about it being "the only sharding
solution" for Postgres without proof.





>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>


Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Tomas Vondra

Hi,

On 03/01/2016 08:02 PM, Bruce Momjian wrote:

On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:

Note that I am not saying that other discussed approaches are any
better, I am saying that we should know approximately what we
actually want and not just beat FDWs with a hammer and hope sharding
will eventually emerge and call that the plan.


I will say it again --- FDWs are the only sharding method I can think
of that has a chance of being accepted into Postgres core.


I don't quite see why that would be the case. Firstly, it assumes that 
the FDW-based approach is going to work, but given the lack of a prototype 
or even a technical analysis discussing the missing pieces, that's very 
difficult to judge.


I find it a bit annoying that there are objections from people who 
implemented (or attempted to implement) sharding on PostgreSQL, yet no 
reasonable analysis of their arguments and how the FDW approach will 
address them. My understanding is they deem FDWs a bad foundation for 
sharding because they were designed for a different purpose and the 
abstractions are a bad fit for sharding (which assumes isolated nodes, a 
certain form of execution, etc.).



It is a plan, and if it fails, it fails. If it succeeds, that's
good. What more do you want me to say? I know of no other way to
answer the questions you asked above.


Well, wouldn't it be great if we could make the decision based on some 
facts and not a mere belief that it'll help? That's exactly what Petr is 
talking about - the fear that we'll spend a few years working on 
sharding based on FDWs, only to find out that it does not work too well. 
That'd be a pretty bad outcome, wouldn't it?


My other worry is that we'll eventually mess up the FDW infrastructure, 
making it harder to use for the original purpose. Granted, most of the 
improvements proposed so far look sane and useful for FDWs in general, 
but sooner or later that ceases to be the case - there will be changes 
needed merely for the sharding. Those will be tough decisions.


While I disagree with Simon on various things, I absolutely understand 
why he was asking about a prototype, and some sort of analysis of what 
use cases we expect to support initially/later/never, and what pieces are 
missing to get the sharding working. IIRC at the FOSDEM Dev Meeting 
you've claimed you're essentially working on a prototype - once we have 
the missing FDW pieces, we'll know if it works. I disagree with that - 
it's not a prototype if it takes several years to find the outcome.


Also, in another branch of this thread you've said this (I don't want to 
sprinkle the thread with responses, so I'll just respond here):



In a way, I don't see any need for an FDW sharding prototype
because, as I said, we already know XC/XL work, so copying what they
do doesn't help. What we need to know is if we can get near the XC/XL
 benchmarks with an acceptable addition of code, which is what I
thought I already said. Perhaps this can be done with FDWs, or some
other approach I have not heard of yet.


I don't quite understand the reasoning presented here. XC/XL are not 
based on FDWs at all, therefore the need for a prototype of FDW-based 
sharding is entirely independent of the fact that those solutions seem 
to work quite well.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Konstantin Knizhnik

On 03/01/2016 09:19 PM, Petr Jelinek wrote:


Since this thread heavily discusses the XTM, I have a question about the XTM 
as proposed, because one thing is very unclear to me - what happens when the 
user changes the XTM plugin on the server? I didn't see any xid handover API, 
which makes me wonder whether a change of plugin (or, for example, a failure 
to load the previously used plugin due to an admin error) will send the 
server into a situation similar to xid wraparound.



The transaction manager is a very "intimate" part of a DBMS, and bugs and 
problems in a custom TM implementation can certainly break the server.
So if you are providing a custom TM implementation, you should take full 
responsibility for system integrity.
The XTM API itself doesn't enforce any XID handling policy. Since we do not 
want to change the tuple header format, the XID is still a 32-bit integer.

In the case of pg_dtm, global transactions at all nodes are assigned the 
same XID by the arbiter, and the arbiter handles XID wraparound.
In pg_tsdtm each node maintains its own XIDs; pg_tsdtm doesn't change the 
way Postgres assigns XIDs, so wraparound in this case is handled in the 
standard way. Instead of assigning its own global XIDs, pg_tsdtm provides a 
mapping between local XIDs and global CSNs. The visibility checking rules 
look at CSNs, not at XIDs.

In both cases, if the system is restarted for some reason and the DTM 
plugin fails to load, you can still access the database locally. No data 
can be lost.
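
A rough sketch of that CSN-based visibility rule, standalone and with
invented placeholder types (this is not pg_tsdtm's actual code):

    #include <stdint.h>
    #include <stdbool.h>

    typedef uint32_t TransactionId;   /* stand-in for the backend typedef */
    typedef uint64_t CSN;

    #define InvalidCSN ((CSN) 0)      /* creator not committed (yet) */

    /* Invented placeholder: the node's local-XID -> global-CSN map,
       filled in at commit time by the DTM. */
    extern CSN local_xid_to_csn(TransactionId xid);

    /* A tuple version is visible to a snapshot taken at snapshot_csn
       iff its creator committed at or before that CSN. */
    static bool
    tuple_visible(TransactionId xmin, CSN snapshot_csn)
    {
        CSN commit_csn = local_xid_to_csn(xmin);

        return commit_csn != InvalidCSN && commit_csn <= snapshot_csn;
    }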


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Bruce Momjian
On Tue, Mar  1, 2016 at 02:02:44PM -0500, Bruce wrote:
> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > Note that I am not saying that other discussed approaches are any
> > better, I am saying that we should know approximately what we
> > actually want and not just beat FDWs with a hammer and hope sharding
> > will eventually emerge and call that the plan.
> 
> I will say it again --- FDWs are the only sharding method I can think of
> that has a chance of being accepted into Postgres core.  It is a plan,
> and if it fails, it fails.  If it succeeds, that's good.  What more do
> you want me to say?  I know of no other way to answer the questions you
> asked above.

I guess all I can say is that if FDWs existed when Postgres XC/XL were
being developed, they likely would have been used or at least
considered.  I think we are basically making that attempt now.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Bruce Momjian
On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> Note that I am not saying that other discussed approaches are any
> better, I am saying that we should know approximately what we
> actually want and not just beat FDWs with a hammer and hope sharding
> will eventually emerge and call that the plan.

I will say it again --- FDWs are the only sharding method I can think of
that has a chance of being accepted into Postgres core.  It is a plan,
and if it fails, it fails.  If it succeeds, that's good.  What more do
you want me to say?  I know of no other way to answer the questions you
asked above.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Petr Jelinek

On 27/02/16 04:54, Robert Haas wrote:

On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
 wrote:

We do not have a formal proof that the proposed XTM is "general enough" to
handle all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot based and
CSN based.


I don't believe that for a minute.  For example, consider this article:

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here shouldn't necessarily be to replace
PostgreSQL's current transaction management with a distributed version
of the same thing.  We might want to do that, but I think the goal is
or should be to provide ACID semantics in a multi-node environment,
and specifically the I in ACID: transaction isolation.  Making the
existing transaction manager into something that can be spread across
multiple nodes is one way of accomplishing that.  Maybe the best one.
Certainly one that's been experimented with in Postgres-XC.  But it is
often the case that an algorithm that works tolerably well on a single
machine starts performing extremely badly in a distributed
environment, because the latency of communicating between multiple
systems is vastly higher than the latency of communicating between
CPUs or cores on the same system.  So I don't think we should be
assuming that's the way forward.



I have a similar problem with the FDW approach though. It seems to me that 
because we have something that solves access to external tables, somebody 
decided it should be used as the base for the whole sharding solution, 
but there is no real concept of how it will all fit together, no idea 
what it will be usable for, and not even a simple prototype that would 
prove that the idea is sound (although again, I am not clear on what the 
actual idea is beyond "we will use FDWs").


Don't get me wrong, I agree that the current FDW enhancements are 
useful; I am just worried about them being presented as the future of 
sharding in Postgres when nobody has sketched what that future might look 
like. And once we get to the more interesting parts like consistency, 
distributed query planning, and p2p connections (and I am really concerned 
about these, as FDWs abstract away some knowledge that the coordinator 
and/or data nodes might need to do these well), we might very well find 
ourselves painted into a corner and have to start from the beginning, 
whereas if we had some idea of how the whole thing might look we could 
identify this early and not postpone built-in sharding by several years 
just because somebody said we would use FDWs and that's what we worked on 
in those years.


Note that I am not saying that other discussed approaches are any 
better, I am saying that we should know approximately what we actually 
want and not just beat FDWs with a hammer and hope sharding will 
eventually emerge and call that the plan.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Petr Jelinek

On 01/03/16 18:18, Konstantin Knizhnik wrote:


On 01.03.2016 19:03, Robert Haas wrote:

On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:

On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:

Two reasons:
1. There is no ideal implementation of DTM which will fit all
possible needs and be efficient for all clusters.

Hmm, what is the reasoning behind that statement?  I mean, it is
certainly true that there are some places where we have decided that
one-size-fits-all is not the right approach.  Indexing, for example.

Uh, is that even true of indexing?  While the plug-in nature of indexing
allows for easier development and testing, does anyone create plug-in
indexing that isn't shipped by us?  I thought WAL support was something
that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.



IMHO a non-stable API is better than no API at all, simply because it makes 
it possible to implement features in a modular way. And refactoring an API 
is not such a difficult thing...



Since this thread heavily discusses the XTM, I have a question about the 
XTM as proposed, because one thing is very unclear to me - what happens 
when the user changes the XTM plugin on the server? I didn't see any xid 
handover API, which makes me wonder whether a change of plugin (or, for 
example, a failure to load the previously used plugin due to an admin 
error) will send the server into a situation similar to xid wraparound.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Konstantin Knizhnik



On 01.03.2016 19:03, Robert Haas wrote:

On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:

On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:

Two reasons:
1. There is no ideal implementation of DTM which will fit all possible needs
and be efficient for all clusters.

Hmm, what is the reasoning behind that statement?  I mean, it is
certainly true that there are some places where we have decided that
one-size-fits-all is not the right approach.  Indexing, for example.

Uh, is that even true of indexing?  While the plug-in nature of indexing
allows for easier development and testing, does anyone create plug-in
indexing that isn't shipped by us?  I thought WAL support was something
that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.



IMHO a non-stable API is better than no API at all, simply because it makes 
it possible to implement features in a modular way. And refactoring an API 
is not such a difficult thing...


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Konstantin Knizhnik

Thank you very much for your comments.

On 01.03.2016 18:19, Robert Haas wrote:

On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
 wrote:

How do you prevent clock skew from causing serialization anomalies?

If a node receives a message from the "future" it just needs to wait until
this future arrives.
Practically we just "adjust" the system time in this case, moving it forward
(certainly the system time is not actually changed, we just set a correction
value which needs to be added to the system time).
This approach was discussed in the article:
http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
I hope the algorithm is explained in that article much better than I can
do here.

Hmm, the approach in that article is very interesting, but it sounds
different than what you are describing - they do not, AFAICT, have
anything like a "correction value".


In the article they used a notion of "wait":

if T.SnapshotTime > GetClockTime()
then wait until T.SnapshotTime

Originally we really did sleep here, but then we thought that instead of
sleeping we could just adjust the local time.
Sorry, I do not have a formal proof that it is equivalent, but... at least
we have not encountered any inconsistencies after this fix and performance
is improved.

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Robert Haas
On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:
> On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
>> > Two reasons:
>> > 1. There is no ideal implementation of DTM which will fit all possible
>> > needs and be efficient for all clusters.
>>
>> Hmm, what is the reasoning behind that statement?  I mean, it is
>> certainly true that there are some places where we have decided that
>> one-size-fits-all is not the right approach.  Indexing, for example.
>
> Uh, is that even true of indexing?  While the plug-in nature of indexing
> allows for easier development and testing, does anyone create plug-in
> indexing that isn't shipped by us?  I thought WAL support was something
> that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Bruce Momjian
On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> > Two reasons:
> > 1. There is no ideal implementation of DTM which will fit all possible needs
> > and be efficient for all clusters.
> 
> Hmm, what is the reasoning behind that statement?  I mean, it is
> certainly true that there are some places where we have decided that
> one-size-fits-all is not the right approach.  Indexing, for example.

Uh, is that even true of indexing?  While the plug-in nature of indexing
allows for easier development and testing, does anyone create plug-in
indexing that isn't shipped by us?  I thought WAL support was something
that prevented external indexing solutions from working.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Robert Haas
On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
 wrote:
>> How do you prevent clock skew from causing serialization anomalies?
>
> If a node receives a message from the "future" it just needs to wait until
> this future arrives.
> Practically we just "adjust" the system time in this case, moving it forward
> (certainly the system time is not actually changed, we just set a correction
> value which needs to be added to the system time).
> This approach was discussed in the article:
> http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
> I hope the algorithm is explained in that article much better than I can
> do here.

Hmm, the approach in that article is very interesting, but it sounds
different than what you are describing - they do not, AFAICT, have
anything like a "correction value".

> There are well-known limitations of pg_tsdtm which we will try to
> address in the future.

How well known are those limitations?  Are they documented somewhere?
Or are they only well-known to you?

> What we want is to include the XTM API in PostgreSQL to be able to continue
> our experiments with different transaction managers and implement
> multimaster on top of it (our first practical goal) without affecting
> PostgreSQL core.
>
> If the XTM patch is included in 9.6, then we can propose our multimaster as
> a PostgreSQL extension and everybody can use it.
> Otherwise we have to offer our own fork of Postgres, which significantly
> complicates using and maintaining it.

Well, I still think what I said before is valid.  If the code is good,
let it be a core submission.  If it's not ready yet, submit it to core
when it is.  If it can't be made good, forget it.

>> This seems rather defeatist.  If the code is good and reliable, why
>> should it not be committed to core?
>
> Two reasons:
> 1. There is no ideal implementation of DTM which will fit all possible needs
> and be efficient for all clusters.

Hmm, what is the reasoning behind that statement?  I mean, it is
certainly true that there are some places where we have decided that
one-size-fits-all is not the right approach.  Indexing, for example.
But there are many other places where we have not chosen to make
things pluggable, and that I don't think it should be taken for
granted that plugability is always an advantage.

I fear that building a DTM that is fully reliable and also
well-performing is going to be really hard, and I think it would be
far better to have one such DTM that is 100% reliable than two or more
implementations each of which are 99% reliable.

> 2. Even if such an implementation exists, the right way to integrate it
> is for Postgres to use some kind of TM API.

Sure, APIs are generally good, but that doesn't mean *this* API is good.

> I hope that everybody will agree that doing it in this way:
>
> #ifdef PGXC
> /* In Postgres-XC, stop timestamp has to follow the timeline of GTM */
> xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
> #else
> xlrec.xact_time = xactStopTimestamp;
> #endif

PGXC chose that style in order to simplify merging.  I wouldn't have
picked the same thing, but I don't know why it deserves scorn.

> or in this way:
>
> xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp
> : xactStopTimestamp;
>
> is a very, very bad idea.

I don't know why that is such a bad idea.  It's a heck of a lot faster
than insisting on calling some out-of-line function.  It might be a
bad idea, but I think we need to decide that, not assume it.
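
For comparison, the out-of-line alternative being weighed against those two
styles looks roughly like this (the hook name and default are invented; the
cost in question is one indirect call on a hot path):

    /* Standalone sketch; TimestampTz here is a stand-in typedef. */
    #include <stdint.h>

    typedef int64_t TimestampTz;

    typedef TimestampTz (*XactTimeHook) (TimestampTz stop_ts);

    static TimestampTz
    standard_xact_time(TimestampTz stop_ts)
    {
        return stop_ts;         /* stock behavior: no GTM adjustment */
    }

    /* A GTM-aware plugin would repoint this at load time. */
    static XactTimeHook xact_time_hook = standard_xact_time;

    /* call site -- one indirect call replaces the #ifdef:
       xlrec.xact_time = xact_time_hook(xactStopTimestamp); */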

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-02-28 Thread Simon Riggs
On 27 February 2016 at 22:38, Kevin Grittner  wrote:


> That could be part of a solution.  What I sketched out with the
> "apparent order of execution" ordering of the transactions
> (basically, commit order except when one SERIALIZABLE transaction
> needs to be dragged in front of another due to a read-write
> dependency) is possibly the simplest approach, but batching may
> well give better performance.
>
> > Collecting a list of transactions that must be applied before the current
> > one could be accumulated during SSI processing and added to the commit
> > record. But reordering the transaction apply is something we'd need to
> get
> > some real clear theory on before we considered it.
>
> Oh, there is a lot of very clear theory on it.  I even considered
> whether it might work at the physical level, but that seems fraught
> with potential land-mines due to the subtle ways in which we manage
> race conditions at the detail level.  It's one of those things that
> seems theoretically possible, but probably a really bad idea in
> practice.  For logical replication, though, there is a clear way to
> determine a reasonable order of applying changes that will never
> yield a serialization anomaly -- if we do that, we dodge the choice
> between using a "stale" safe snapshot or waiting an indeterminate
> length of time for a "fresh" safe snapshot -- at the cost of
> delaying logical replication itself at various points.
>

I think we're going to have practical difficulties with these concepts.

If an xid commits with inConflicts, those refer to transactions that may
not yet have assigned xids. They may not be assigned xids for hours or days,
so it's hard to know whether they will eventually become write
transactions or not, making it a challenge to even know whether we should
delay. And even if we did know, delaying the apply of commits for hours to
allow us to reorder transactions clearly isn't practical in all cases,
more so if the impact is caused by one minor table that nobody much cares
about.

What I see as more practical is reducing the scope of "safe transactions"
down to "safe scopes", where particular tables or sets of tables are known
safe at particular times, so we know more about which things we can look at
safely.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-02-28 Thread Konstantin Knizhnik

On 02/27/2016 11:38 PM, Kevin Grittner wrote:


Is this an implementation of some particular formal technique?  If
so, do you have a reference to a paper on it?  I get the sense that
there has been a lot written about distributed transactions, and
that it would be a mistake to ignore it, but I have not (yet)
reviewed the literature for it.


The reference to the article is on our wiki page explaining our DTM:
https://wiki.postgresql.org/wiki/DTM

http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Kevin Grittner
On Sat, Feb 27, 2016 at 3:57 PM, Simon Riggs  wrote:
> On 27 February 2016 at 17:54, Kevin Grittner  wrote:
>>
>> On a single database SSI can see whether a read has
>> caused such a problem.  If you replicate the transactions to
>> somewhere else and read them SSI cannot tell whether there is an
>> anomaly
>
> OK, I thought you were saying something else. What you're saying is that SSI
> doesn't work on replicas, yet, whether that is physical or logical.

Right.

> Row level locking (S2PL) can be used on logical standbys, so it's actually a
> better situation.

Except that S2PL has the concurrency and performance problems that
caused us to rip out a working S2PL implementation in PostgreSQL
core.  Layering it on outside of that isn't going to offer better
concurrency or perform better than what we ripped out; but it does
work.

>> One possibility is to pass along information
>> about when things are in a state on the source that is known to be
>> free of anomalies if read; another would be to reorder the
>> application of transactions to match the apparent order of
>> execution.  The latter would not work for "physical" replication,
>> but should be fine for logical replication.  An implementation
>> might create a list in commit order, but not release the front of
>> the list for processing if it is a SERIALIZABLE transaction which
>> has written data until all overlapping SERIALIZABLE transactions
>> complete, so it can move any subsequently-committed SERIALIZABLE
>> transaction which read the "old" version of the data ahead of it.
>
> The best way would be to pass across "anomaly barriers", since they can
> easily be inserted into the WAL stream. The main issue seems to be how and
> when to detect them.

That, and how to choose whether to run right away with the last
known consistent snapshot, or wait for the next one.  There seem to
be use cases for both.  None of it seems extraordinarily hard; it's
just never been anyone's top priority.  :-/

> For logical replay, applying in batches is actually a good thing since it
> allows parallelism. We can remove them all from the target's procarray all
> at once to avoid intermediate states becoming visible. So that would be the
> preferred mechanism.

That could be part of a solution.  What I sketched out with the
"apparent order of execution" ordering of the transactions
(basically, commit order except when one SERIALIZABLE transaction
needs to be dragged in front of another due to a read-write
dependency) is possibly the simplest approach, but batching may
well give better performance.
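
A hedged sketch of that hold-back rule, with invented types (the real
bookkeeping would live in SSI's conflict tracking, not in a toy struct):

/* apply_order.c -- commit-order list with the hold-back rule: a
 * SERIALIZABLE transaction that wrote data is not released for apply
 * until all overlapping SERIALIZABLE transactions have completed, so
 * a later read-only transaction can be moved ahead of it. */
#include <stdbool.h>
#include <stdio.h>

typedef struct Txn
{
    int  xid;
    bool serializable;
    bool wrote_data;
    int  live_overlaps;    /* overlapping SSI transactions still open */
} Txn;

/* The front of the list may be released unless it is a SERIALIZABLE
 * writer whose overlapping SSI transactions have not all completed. */
static bool can_release(const Txn *t)
{
    return !(t->serializable && t->wrote_data && t->live_overlaps > 0);
}

int main(void)
{
    Txn commit_order[] = {
        { 100, true,  true,  1 },   /* held back until overlap ends */
        { 101, true,  false, 0 },   /* read-only: may jump ahead    */
        { 102, false, true,  0 },   /* non-SSI writer: releasable   */
    };

    for (int i = 0; i < 3; i++)
        printf("xid %d: %s\n", commit_order[i].xid,
               can_release(&commit_order[i]) ? "apply" : "hold back");
    return 0;
}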

> Collecting a list of transactions that must be applied before the current
> one could be accumulated during SSI processing and added to the commit
> record. But reordering the transaction apply is something we'd need to get
> some real clear theory on before we considered it.

Oh, there is a lot of very clear theory on it.  I even considered
whether it might work at the physical level, but that seems fraught
with potential land-mines due to the subtle ways in which we manage
race conditions at the detail level.  It's one of those things that
seems theoretically possible, but probably a really bad idea in
practice.  For logical replication, though, there is a clear way to
determine a reasonable order of applying changes that will never
yield a serialization anomaly -- if we do that, we dodge the choice
between using a "stale" safe snapshot or waiting an indeterminate
length of time for a "fresh" safe snapshot -- at the cost of
delaying logical replication itself at various points.

Anyway, we seem to be on the same page; just some minor
miscommunication at some point.  I apologize if I was unclear.

Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Simon Riggs
On 27 February 2016 at 17:54, Kevin Grittner  wrote:

> On a single database SSI can see whether a read has
> caused such a problem.  If you replicate the transactions to
> somewhere else and read them SSI cannot tell whether there is an
> anomaly


OK, I thought you were saying something else. What you're saying is that
SSI doesn't work on replicas, yet, whether that is physical or logical.

Row level locking (S2PL) can be used on logical standbys, so it's actually a
better situation.

> (at least, not without exchanging a lot of information that
> isn't currently happening), so some other mechanism would probably
> need to be used.  One possibility is to pass along information
> about when things are in a state on the source that is known to be
> free of anomalies if read; another would be to reorder the
> application of transactions to match the apparent order of
> execution.  The latter would not work for "physical" replication,
> but should be fine for logical replication.  An implementation
> might create a list in commit order, but not release the front of
> the list for processing if it is a SERIALIZABLE transaction which
> has written data until all overlapping SERIALIZABLE transactions
> complete, so it can move any subsequently-committed SERIALIZABLE
> transaction which read the "old" version of the data ahead of it.
>

The best way would be to pass across "anomaly barriers", since they can
easily be inserted into the WAL stream. The main issue seems to be how and
when to detect them.

For logical replay, applying in batches is actually a good thing since it
allows parallelism. We can remove them all from the target's procarray all
at once to avoid intermediate states becoming visible. So that would be the
preferred mechanism.

Collecting a list of transactions that must be applied before the current
one could be accumulated during SSI processing and added to the commit
record. But reordering the transaction apply is something we'd need to get
some real clear theory on before we considered it.

Anyway, next release.

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Kevin Grittner
On Sat, Feb 27, 2016 at 1:14 PM, Konstantin Knizhnik
 wrote:

> We do not try to preserve transaction commit order at all nodes.
> But in principle it can be implemented using the XTM API: it allows redefining
> the function which actually sets the transaction status.  pg_dtm performs 2PC here.
> And in principle it is possible to enforce commits in any particular order.

That's encouraging.

> Concerning CSNs, maybe you are right and it is not correct to use this
> notion in this case. Actually there are many "CSNs" involved in transaction
> commit.

Perhaps we should distinguish "commit sequence number" from "apply
sequence number"?  I really think we need to differentiate the
order to be applied from the order previously committed in order to
avoid long-term confusion.  Calling both "CSN" is going to cause
not only miscommunication but muddled thinking, IMO.

> First of all, each transaction is assigned a local CSN (timestamp) when it is
> ready to commit. Then the CSNs of all nodes are exchanged and the maximal CSN
> is chosen.
> This maximum is written as the final transaction CSN and is used in the
> visibility check.

Is this an implementation of some particular formal technique?  If
so, do you have a reference to a paper on it?  I get the sense that
there has been a lot written about distributed transactions, and
that it would be a mistake to ignore it, but I have not (yet)
reviewed the literature for it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Konstantin Knizhnik

Neither pg_dtm nor pg_tsdtm supports the serializable isolation level.
We implemented distributed snapshot isolation - the repeatable-read isolation level.
We also do not support the read-committed isolation level now.

We do not try to preserve transaction commit order at all nodes.
But in principle it can be implemented using the XTM API: it allows redefining
the function which actually sets the transaction status.  pg_dtm performs 2PC here.
And in principle it is possible to enforce commits in any particular order.

Concerning CSNs, maybe you are right and it is not correct to use this notion in this
case. Actually there are many "CSNs" involved in transaction commit.
First of all, each transaction is assigned a local CSN (timestamp) when it is
ready to commit. Then the CSNs of all nodes are exchanged and the maximal CSN is chosen.
This maximum is written as the final transaction CSN and is used in the visibility check.
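
A minimal sketch of that agreement step, assuming the coordinator simply
collects the per-node proposals (names invented; in pg_dtm/pg_tsdtm the
exchange naturally happens over the network):

/* commit_csn.c -- choose the final commit CSN as the maximum of the
 * local CSNs (timestamps) proposed by the participating nodes; every
 * node then records this value and uses it in visibility checks. */
#include <stdint.h>
#include <stdio.h>

typedef uint64_t CSN;

static CSN agree_commit_csn(const CSN *proposed, int nnodes)
{
    CSN final_csn = 0;

    for (int i = 0; i < nnodes; i++)
        if (proposed[i] > final_csn)
            final_csn = proposed[i];
    return final_csn;
}

int main(void)
{
    CSN proposed[] = { 10231, 10228, 10240 };    /* per-node local CSNs */

    printf("final CSN = %llu\n",
           (unsigned long long) agree_commit_csn(proposed, 3));
    return 0;
}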


On 02/27/2016 01:48 AM, Kevin Grittner wrote:

On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
 wrote:


pg_tsdtm is based on another approach: it uses system time
as the CSN

Which brings up an interesting point, if we want logical
replication to be free of serialization anomalies for those using
serializable transactions, we need to support applying transactions
in an order which may not be the same as commit order -- CSN (as
such) would be the wrong thing.  If serializable transaction 1 (T1)
modifies a row and concurrent serializable transaction 2 (T2) reads
the old version of the row, and modifies something based on that,
T2 must be applied to a logical replica first even if T1 commits
before it; otherwise the logical replica could see a state not
consistent with business rules and which could not have been seen
(due to SSI) on the source database.  Any DTM API which does not
support some mechanism to rearrange the order of transactions from
commit order to some other order (based on, for example, read-write
dependencies) is not complete.  If it does support that, it gives
us a way forward for presenting consistent data on logical
replicas.

To avoid confusion, it might be best to reserve CSN for actual
commit sequence numbers, or at least values which increase
monotonically with each commit.  The term of art for what I
described above is "apparent order of execution", so maybe we want
to use AOE or AOoE for the order we choose to use in a particular
implementation.  It doesn't seem to me to be outright inaccurate
for cases where the system time on the various systems is used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Kevin Grittner
On Fri, Feb 26, 2016 at 5:37 PM, Simon Riggs  wrote:
> On 26 February 2016 at 22:48, Kevin Grittner  wrote:

>> if we want logical
>> replication to be free of serialization anomalies for those using
>> serializable transactions, we need to support applying transactions
>> in an order which may not be the same as commit order -- CSN (as
>> such) would be the wrong thing.  If serializable transaction 1 (T1)
>> modifies a row and concurrent serializable transaction 2 (T2) reads
>> the old version of the row, and modifies something based on that,
>> T2 must be applied to a logical replica first even if T1 commits
>> before it; otherwise the logical replica could see a state not
>> consistent with business rules and which could not have been seen
>> (due to SSI) on the source database.
>
> How would SSI allow that commit order?
>
> Surely there is a read-write dependency that would cause T2 to be
> aborted?

*A* read-write dependency does not cause an abort under SSI, it
takes a *pattern* of read-write dependencies which has been proven
to appear in any set of concurrent transactions which can cause a
serialization anomaly.  A read-only transaction can be part of that
pattern.  On a single database SSI can see whether a read has
caused such a problem.  If you replicate the transactions to
somewhere else and read them SSI cannot tell whether there is an
anomaly (at least, not without exchanging a lot of information that
isn't currently happening), so some other mechanism would probably
need to be used.  One possibility is to pass along information
about when things are in a state on the source that is known to be
free of anomalies if read; another would be to reorder the
application of transactions to match the apparent order of
execution.  The latter would not work for "physical" replication,
but should be fine for logical replication.  An implementation
might create a list in commit order, but not release the front of
the list for processing if it is a SERIALIZABLE transaction which
has written data until all overlapping SERIALIZABLE transactions
complete, so it can move any subsequently-committed SERIALIZABLE
transaction which read the "old" version of the data ahead of it.

>> Any DTM API which does not
>> support some mechanism to rearrange the order of transactions from
>> commit order to some other order (based on, for example, read-write
>> dependencies) is not complete.  If it does support that, it gives
>> us a way forward for presenting consistent data on logical
>> replicas.
>
> You appear to be saying that SSI allows transactions to commit in a
> non-serializable order.

Absolutely not.  If you want to understand this better, this paper
might be helpful:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

> Do you have a test case?

There are a couple in this section of the Wiki page of examples:

https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

Just picture the read-only transaction executing on a replica.

Thinking of commit sequence number as the right order to apply
transactions during replication seems to me to be a holdover from
the techniques initially developed for transactions in the 1960s --
specifically, strict two-phase locking (S2PL) is very easy to get
one's head around and when using it the apparent order of execution
always *does* match commit order.  Unfortunately S2PL performs so
poorly that it was ripped out of PostgreSQL years ago.  In general,
I think it is time we gave up on thinking that is based on it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Álvaro Hernández Tortosa



On 27/02/16 09:19, Konstantin Knizhnik wrote:

On 02/27/2016 06:54 AM, Robert Haas wrote:


[...]



So maybe the goal for the GTM isn't to provide true serializability
across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?

Absolutely agree. There are some theoretical discussions regarding CAP
and different distributed levels of isolation.
But in practice people want to solve their tasks. Most PostgreSQL
users are using the default isolation level, read committed, although
there are a lot of "wonderful" anomalies with it.
Serializable transactions in Oracle actually violate the fundamental
serializability rule, and still Oracle is one of the most popular
databases in the world...
There was an isolation bug in Postgres-XL which didn't prevent
commercial customers from using it...


I think this might be a dangerous line of thought. While I agree
PostgreSQL should definitely look at the market and answer questions
that (current and prospective) users may ask, and be more practical than
idealistic, easily ditching isolation guarantees might not be a good thing.


That Oracle is the leader despite its isolation problems, or that
most people run PostgreSQL under read committed, is not a good argument
to cut corners and just go to bare minimum (if any) isolation
guarantees. First, because PostgreSQL has always been trusted and
understood as a system with *strong* guarantees (whatever that means).
Second, because what we may perceive as OK from the market might change
soon. From my observations, while I agree with you that most people "don't
care" or, worse, "don't realize", this is rapidly changing. More and more
people are becoming aware of the problems of distributed systems and the
significant consequences they may have on them.


A lot of them have been illustrated in the famous Jepsen posts. A
good example, given that you have mentioned Galera before,
is this one: https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster
which demonstrates how Galera fails to provide Snapshot Isolation, even
in a healthy state -- despite their claims.


As of today, I would expect any distributed system to clearly state
its guarantees in the documentation. And then adhere to them, for
instance proving it with tests such as Jepsen.




So I do not say that discussing all these theoretical questions is not
needed, nor formally proving the correctness of distributed algorithms.


I would like to see this work move forward, so I really appreciate all
your work here. I cannot give an opinion on whether the DTM API is good
or not, but I agree with Robert that a good technical discussion on these
issues is a good, and a needed, starting point. Feedback may also help
you avoid pitfalls that may have gone unnoticed until tons of code are
implemented.


Academic approaches are sometimes "very academic", but studying
them doesn't hurt either :)



Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata





Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Konstantin Knizhnik

On 02/27/2016 06:54 AM, Robert Haas wrote:

On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
 wrote:

We do not have formal proof that the proposed XTM is "general enough" to handle
all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot based and
CSN based.

I don't believe that for a minute.  For example, consider this article:


Well, I have to agree that I was not right in saying that there are just two
ways of providing distributed isolation.
There is at least one more method: conservative locking. But it will cause a huge
number of extra network messages to be exchanged.
Also, I mostly considered solutions compatible with the PostgreSQL MVCC model.

And definitely there are other approaches, like preserving transaction commit
order (as is done in Galera).
Some of them can be implemented with XTM (preserving commit order), some
cannot (2PL).
I have already noted that XTM does not allow implementing ANY transaction
manager.
But we have considered several approaches to distributed transaction management
explained in articles related to really working systems.
Some of them are real production systems, such as SAP HANA; some are just
prototypes, but working prototypes for which the authors have performed
some benchmarking and comparison with other approaches. The references you have
mentioned are mostly theoretical descriptions of the problem.
Nice to know, but it is hard to build a concrete implementation based on
these articles.


Briefly answering your other questions:


For example, consider a table with a million rows spread across any number of 
servers.


This is a sharding scenario; pg_tsdtm will work well in this case, not
requiring a lot of extra messages.


Now consider another workload where each transaction reads a row on
one server, reads a row on another server,

It can be solved both with pg_dtm (central arbiter) and pg_tsdtm (no arbiter).
But actually your scenarios once again prove that there cannot be just
one ideal distributed TM.


So maybe the goal for the GTM isn't to provide true serializability
across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?

Absolutely agree. There are some theoretical discussions regarding CAP and
different distributed levels of isolation.
But in practice people want to solve their tasks. Most PostgreSQL users are using
the default isolation level, read committed, although there are a lot of "wonderful"
anomalies with it.
Serializable transactions in Oracle actually violate the fundamental
serializability rule, and still Oracle is one of the most popular databases in
the world...
There was an isolation bug in Postgres-XL which didn't prevent commercial
customers from using it...

So I do not say that discussing all these theoretical questions is not needed,
nor formally proving the correctness of distributed algorithms.
But I do not understand why it should prevent us from providing an extensible TM
API.
Yes, we cannot do everything with it. But still we can implement many
different approaches.
I think that it somehow proves that it is "general enough".

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here 

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik

On 02/27/2016 06:57 AM, Robert Haas wrote:

On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
 wrote:

pg_tsdtm is based on another approach: it uses system time as the CSN and
doesn't require an arbiter. In theory there is no limit to scalability. But
differences in system time and the necessity of more rounds of communication
have a negative impact on performance.

How do you prevent clock skew from causing serialization anomalies?


If a node receives a message from the "future", it just needs to wait until this
future arrives.
Practically we just "adjust" the system time in this case, moving it forward
(certainly the system time is not actually changed; we just set a correction
value which needs to be added to the system time).
This approach was discussed in the article:
http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
I hope the algorithm is explained in this article much better than I can do here.
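
A hedged sketch of that correction rule (names invented; pg_tsdtm's
actual bookkeeping may differ):

/* clock_correction.c -- the "wait for the future" rule: when a message
 * carries a timestamp ahead of our clock, advance a correction offset
 * rather than the system clock itself, so local CSNs never run behind
 * anything already observed from other nodes. */
#include <stdint.h>
#include <stdio.h>
#include <time.h>

typedef int64_t TimestampTz;

static TimestampTz time_correction = 0;    /* added to the system time */

static TimestampTz corrected_now(void)
{
    return (TimestampTz) time(NULL) + time_correction;
}

static void observe_remote_csn(TimestampTz remote_csn)
{
    TimestampTz now = corrected_now();

    if (remote_csn > now)
        time_correction += remote_csn - now;    /* jump to the "future" */
}

int main(void)
{
    observe_remote_csn(corrected_now() + 5);    /* message from the future */
    printf("corrected now: %lld\n", (long long) corrected_now());
    return 0;
}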

A few notes:
1. I cannot prove that our pg_tsdtm absolutely correctly implements the approach
described in this article.
2. I didn't try to formally prove that our implementation cannot cause some
serialization anomalies.
3. We ran various synchronization tests (including the simplest debit-credit
test, which breaks the old version of Postgres-XL) for several days and we
didn't get any inconsistencies.
4. We have tested pg_tsdtm on a single node, on a blade cluster, and on
geographically distributed nodes (more than a thousand kilometers apart: one
server was in Vladivostok, another in Kaliningrad). Ping between these two
servers takes about 100 msec.
Performance of our benchmark drops about 100 times, but there were no inconsistencies.


Also I once again want to note that the primary idea of the proposed patch was
not pg_tsdtm.
There are well known limitations of pg_tsdtm which we will try to address
in the future.
What we want is to include the XTM API in PostgreSQL so that we can continue our
experiments with different transaction managers and implement multimaster on
top of it (our first practical goal) without affecting the PostgreSQL core.

If the XTM patch is included in 9.6, then we can propose our multimaster as a
PostgreSQL extension and everybody can use it.
Otherwise we have to propose our own fork of Postgres, which significantly
complicates using and maintaining it.


So there is no ideal solution which can work well for all clusters. This is
why it is not possible to develop just one GTM, propose it as a patch for
review and then (hopefully) commit it in Postgres core. IMHO it will never
happen. And I do not think that it is actually needed. What we need is a way
to be able to create our own transaction managers as Postgres extensions
without affecting its core.

This seems rather defeatist.  If the code is good and reliable, why
should it not be committed to core?


Two reasons:
1. There is no ideal implementation of DTM which will fit all possible needs
and be efficient for all clusters.
2. Even if such an implementation exists, still the right way to integrate it
is for Postgres to use a kind of TM API.
I hope that everybody will agree that doing it in this way:

#ifdef PGXC
/* In Postgres-XC, stop timestamp has to follow the timeline of GTM */
xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
#else
xlrec.xact_time = xactStopTimestamp;
#endif

or in this way:

xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp : 
xactStopTimestamp;

is a very, very bad idea.
In OO programming we would have an abstract TM interface and several
implementations of this interface, for example
MVCC_TM, 2PL_TM, Distributed_TM...
This is actually what can be done with our XTM API.
As Postgres is implemented in C, not C++, we have to emulate
interfaces using structures with function pointers.
And please note that there is no need at all to include a DTM
implementation in core, as it is not needed by everybody.
It can easily be distributed as an extension.
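
A minimal sketch of that emulation (illustrative names only, not the
actual XTM signatures):

/* tm_interface.c -- an abstract TM "interface" in C: a struct of
 * function pointers with one built-in implementation; an extension
 * would repoint the active TM at load time. */
#include <stdbool.h>
#include <stdio.h>

typedef struct TransactionManagerAPI
{
    const char *name;
    void (*SetTransactionStatus)(unsigned xid, int status);
    bool (*XidInSnapshot)(unsigned xid);
} TransactionManagerAPI;

static void mvcc_set_status(unsigned xid, int status)
{
    printf("MVCC_TM: xid %u -> status %d\n", xid, status);
}

static bool mvcc_in_snapshot(unsigned xid)
{
    (void) xid;
    return true;                /* stub visibility rule */
}

static TransactionManagerAPI MVCC_TM = {
    "MVCC_TM", mvcc_set_status, mvcc_in_snapshot
};

/* The active implementation; a Distributed_TM extension would replace
 * this pointer in its initialization function. */
static TransactionManagerAPI *TM = &MVCC_TM;

int main(void)
{
    TM->SetTransactionStatus(42, 1);
    printf("%s: xid 42 visible = %d\n", TM->name,
           (int) TM->XidInSnapshot(42));
    return 0;
}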

I hope that quite soon we can propose a multimaster extension which should
provide functionality similar to MySQL Galera. But even right now we have
integrated pg_dtm and pg_tsdtm with pg_shard and postgres_fdw, allowing us to
provide distributed consistency for them.

All arguments against XTM can be applied to any other extension API in
Postgres, for example FDW.
Is it general enough? There are many useful operations which currently are
not handled by this API, for example performing aggregation and grouping on
the foreign server side.  But still it is a very useful and flexible
mechanism, allowing many wonderful things to be implemented.

That is true.  And everybody is entitled to an opinion on each new
proposed hook, as to whether that hook is general or not.  We have
both accepted and rejected proposed hooks in the past.




--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
 wrote:
> pg_tsdtm is based on another approach: it uses system time as the CSN and
> doesn't require an arbiter. In theory there is no limit to scalability. But
> differences in system time and the necessity of more rounds of communication
> have a negative impact on performance.

How do you prevent clock skew from causing serialization anomalies?

> So there is no ideal solution which can work well for all clusters. This is
> why it is not possible to develop just one GTM, propose it as a patch for
> review and then (hopefully) commit it in Postgres core. IMHO it will never
> happen. And I do not think that it is actually needed. What we need is a way
> to be able to create our own transaction managers as Postgres extensions
> without affecting its core.

This seems rather defeatist.  If the code is good and reliable, why
should it not be committed to core?

> All arguments against XTM can be applied to any other extension API in
> Postgres, for example FDW.
> Is it general enough? There are many useful operations which currently are
> not handled by this API, for example performing aggregation and grouping on
> the foreign server side.  But still it is a very useful and flexible
> mechanism, allowing many wonderful things to be implemented.

That is true.  And everybody is entitled to an opinion on each new
proposed hook, as to whether that hook is general or not.  We have
both accepted and rejected proposed hooks in the past.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
 wrote:
> We do not have formal proof that the proposed XTM is "general enough" to handle
> all possible transaction manager implementations.
> But there are two general ways of dealing with isolation: snapshot based and
> CSN based.

I don't believe that for a minute.  For example, consider this article:

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here shouldn't necessarily be to replace
PostgreSQL's current transaction management with a distributed version
of the same thing.  We might want to do that, but I think the goal is
or should be to provide ACID semantics in a multi-node environment,
and specifically the I in ACID: transaction isolation.  Making the
existing transaction manager into something that can be spread across
multiple nodes is one way of accomplishing that.  Maybe the best one.
Certainly one that's been experimented with in Postgres-XC.  But it is
often the case that an algorithm that works tolerably well on a single
machine starts performing extremely badly in a distributed
environment, because the latency of communicating between multiple
systems is vastly higher than the latency of communicating between
CPUs or cores on the same system.  So I don't think we should be
assuming that's the way forward.

For example, consider a table with a million rows spread across any
number of servers.  Consider also a series of update transactions each
of which reads exactly one row and then writes that row.  If we adopt
any solution that involves a central coordinator to arbitrate commit
ordering, this is going to require at least one and probably two
million network round trips, one per transaction to get a snapshot and
a second to commit.  But all of this is completely unnecessary.
Because each transaction touches only a single node, a perfect global
transaction manager doesn't really need to do anything at all in this
case.  The existing PostgreSQL mechanisms - snapshot isolation, and SSI
if you have it turned on - will provide just as much transaction
isolation on this workload as they would on a workload that only
touched a single node.  If we design a GTM that does two million
network round trips in this scenario, we have just wasted two million
network round trips.

Now consider another workload where each transaction reads a row on
one server, reads a row on another server, and then updates the second
row.  Here, the GTM has a job to do.  If T1 reads R1, reads R2, writes
R2; and T2 concurrently reads R2, reads R1, and then writes R1, it
could happen that both transactions see the pre-update values of the
row they read first and yet both transactions go on to commit.  That's
not equivalent to any serial history, so transaction isolation is
broken.  A GTM which aims to provide true cluster-wide serializability
must do something to keep that from happening.  If all of this were
happening on a single node, those transactions would succeed if run at
READ COMMITTED but SSI would roll one of them back at SERIALIZABLE.
So maybe the goal for the GTM isn't to provide true serializability
across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?
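
The write-skew schedule above can be traced numerically; a tiny sketch,
using a sum as a stand-in for the dependent update:

/* write_skew.c -- the T1/T2 schedule described above: each transaction
 * reads both rows under its snapshot, then writes the row the other
 * one read.  The interleaved outcome matches no serial order. */
#include <stdio.h>

int main(void)
{
    int R1 = 1, R2 = 1;                   /* committed row values */

    /* Both transactions take their snapshot reads first. */
    int t1_r1 = R1, t1_r2 = R2;
    int t2_r1 = R1, t2_r2 = R2;

    /* Then each writes based on what it read. */
    R2 = t1_r1 + t1_r2;                   /* T1 writes R2 = 2 */
    R1 = t2_r1 + t2_r2;                   /* T2 writes R1 = 2 */

    /* Serial T1;T2 would yield (R1,R2) = (3,2), and T2;T1 would yield
     * (2,3); the interleaving yields (2,2) -- write skew. */
    printf("R1=%d R2=%d\n", R1, R2);
    return 0;
}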

I have seen zero discussion of any of this.  What I think we 

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Simon Riggs
On 26 February 2016 at 22:48, Kevin Grittner  wrote:

> On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
>  wrote:
>
> > pg_tsdtm is based on another approach: it uses system time
> > as the CSN
>
> Which brings up an interesting point, if we want logical
> replication to be free of serialization anomalies for those using
> serializable transactions, we need to support applying transactions
> in an order which may not be the same as commit order -- CSN (as
> such) would be the wrong thing.  If serializable transaction 1 (T1)
> modifies a row and concurrent serializable transaction 2 (T2) reads
> the old version of the row, and modifies something based on that,
> T2 must be applied to a logical replica first even if T1 commits
> before it; otherwise the logical replica could see a state not
> consistent with business rules and which could not have been seen
> (due to SSI) on the source database.


How would SSI allow that commit order?

Surely there is a read-write dependency that would cause T2 to be aborted?


> Any DTM API which does not
> support some mechanism to rearrange the order of transactions from
> commit order to some other order (based on, for example, read-write
> dependencies) is not complete.  If it does support that, it gives
> us a way forward for presenting consistent data on logical
> replicas.
>

You appear to be saying that SSI allows transactions to commit in a
non-serializable order.

Do you have a test case?

-- 
Simon Riggs    http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Kevin Grittner
On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
 wrote:

> pg_tsdtm is based on another approach: it uses system time
> as the CSN

Which brings up an interesting point, if we want logical
replication to be free of serialization anomalies for those using
serializable transactions, we need to support applying transactions
in an order which may not be the same as commit order -- CSN (as
such) would be the wrong thing.  If serializable transaction 1 (T1)
modifies a row and concurrent serializable transaction 2 (T2) reads
the old version of the row, and modifies something based on that,
T2 must be applied to a logical replica first even if T1 commits
before it; otherwise the logical replica could see a state not
consistent with business rules and which could not have been seen
(due to SSI) on the source database.  Any DTM API which does not
support some mechanism to rearrange the order of transactions from
commit order to some other order (based on, for example, read-write
dependencies) is not complete.  If it does support that, it gives
us a way forward for presenting consistent data on logical
replicas.

To avoid confusion, it might be best to reserve CSN for actual
commit sequence numbers, or at least values which increase
monotonically with each commit.  The term of art for what I
described above is "apparent order of execution", so maybe we want
to use AOE or AOoE for the order we choose to use in a particular
implementation.  It doesn't seem to me to be outright inaccurate
for cases where the system time on the various systems is used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik

On 02/26/2016 09:30 PM, Alvaro Herrera wrote:

Konstantin Knizhnik wrote:


Yes, it is certainly possible to develop a cluster by cloning PostgreSQL.
But it causes big problems both for developers, who have to permanently
synchronize their branch with master,
and, what is more important, for customers, who cannot use the standard
version of PostgreSQL.
It may cause problems with system certification, with running Postgres in
the cloud,...
Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
the wrong direction.

That's not the point, though.  I don't think a Postgres clone with a GTM
solves any particular problem that's not already solved by the existing
forks.  However, if you have a clone at home and you make a GTM work on
it, then you take the GTM as a patch and post it for discussion.
There's no need for hooks for that.  Just make sure your GTM solves the
problem that it is supposed to solve.

Excuse me if I've missed the discussion elsewhere -- why does
PostgresPro have *two* GTMs instead of a single one?


There are many different clusters which require different approaches to
managing distributed transactions.
Some clusters do not need distributed transactions at all: if you are executing
OLAP queries on a read-only database, a GTM will just add extra overhead.

pg_dtm uses a centralized arbiter. It is similar to the Postgres-XL DTM. The
presence of a single arbiter significantly simplifies all distributed
algorithms: failure detection, global deadlock elimination, ... But at the same
time the arbiter is a SPOF and the main factor limiting cluster scalability.

pg_tsdtm is based on another approach: it uses system time as the CSN and
doesn't require an arbiter. In theory there is no limit to scalability. But
differences in system time and the necessity of more rounds of communication
have a negative impact on performance.

So there is no ideal solution which can work well for all clusters. This is why
it is not possible to develop just one GTM, propose it as a patch for review
and then (hopefully) commit it in Postgres core. IMHO it will never happen. And
I do not think that it is actually needed. What we need is a way to be able to
create our own transaction managers as Postgres extensions without affecting
its core.


All arguments against XTM can be applied to any other extension API in
Postgres, for example FDW.
Is it general enough? There are many useful operations which currently are not
handled by this API, for example performing aggregation and grouping on the
foreign server side.  But still it is a very useful and flexible mechanism,
allowing many wonderful things to be implemented.


From my point of view a good system should be as open and customizable as
possible, as long as it doesn't affect performance.
Replacing direct function calls with indirect function calls in almost all
cases does not hurt performance, nor does adding hooks.
So without any extra price we get better flexibility. What's wrong with that?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Bruce Momjian
On Fri, Feb 26, 2016 at 03:30:29PM -0300, Alvaro Herrera wrote:
> That's not the point, though.  I don't think a Postgres clone with a GTM
> solves any particular problem that's not already solved by the existing
> forks.  However, if you have a clone at home and you make a GTM work on
> it, then you take the GTM as a patch and post it for discussion.
> There's no need for hooks for that.  Just make sure your GTM solves the
> problem that it is supposed to solve.
> 
> Excuse me if I've missed the discussion elsewhere -- why does
> PostgresPro have *two* GTMs instead of a single one?

I think the issue is that a GTM that works for a low-latency network
doesn't work well for a high-latency network, so the high-latency GTM
has fewer features and guarantees.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +




Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Alvaro Herrera
Konstantin Knizhnik wrote:

> Yes, it is certainly possible to develop a cluster by cloning PostgreSQL.
> But it causes big problems both for developers, who have to permanently
> synchronize their branch with master,
> and, what is more important, for customers, who cannot use the standard
> version of PostgreSQL.
> It may cause problems with system certification, with running Postgres in
> the cloud,...
> Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
> the wrong direction.

That's not the point, though.  I don't think a Postgres clone with a GTM
solves any particular problem that's not already solved by the existing
forks.  However, if you have a clone at home and you make a GTM work on
it, then you take the GTM as a patch and post it for discussion.
There's no need for hooks for that.  Just make sure your GTM solves the
problem that it is supposed to solve.

Excuse me if I've missed the discussion elsewhere -- why does
PostgresPro have *two* GTMs instead of a single one?

-- 
Álvaro Herrera    http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik
We do not have formal proof that the proposed XTM is "general enough" to
handle all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot based
and CSN based.

pg_dtm and pg_tsdtm prove that both of them can be implemented using XTM.
If you know some other approach to distributed transaction manager
implementation, please let us know.

Otherwise your statement "is not general enough" is not concrete enough.
The Postgres-XL GTM can in principle be implemented as an extension based on XTM.

This API is based on existing PostgreSQL TM functions: we do not
introduce new abstractions.
Is it possible that some other TM function has to be encapsulated? Yes,
it is.
But I do not see much problem with adding this function to XTM in the
future if it is actually needed.
It happens with most APIs. It is awful when API functions are changed,
breaking applications based on this API.
But as the functions encapsulated in XTM are in any case present in the
PostgreSQL core, I do not think
that they will be changed in the future unless there are some plans to
completely rewrite the Postgres transaction manager...


Yes, it is certainly possible to develop a cluster by cloning PostgreSQL.
But it causes big problems both for developers, who have to permanently
synchronize their branch with master,
and, what is more important, for customers, who cannot use the standard
version of PostgreSQL.
It may cause problems with system certification, with running Postgres
in the cloud,...
Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it
is the wrong direction.




On 26.02.2016 19:06, Robert Haas wrote:

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:

Right now the tm is hardcoded and it doesn't matter "if other people might
need" at all.  We at least provide developers ("other people") the ability to
work on their implementations, and the patch is safe and doesn't sacrifice
anything in core.

I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.


And what makes us think we
really need multiple transaction managers, anyway?

If you are brave enough to say that one tm fits all and you are able to teach
the existing tm to play well in various clustering environments during the
development period, which is short, then probably we don't need multiple
tms. But it's too perfect to believe, and the practical solution is to let
multiple groups work on their solutions.

Nobody's preventing multiple groups from working on their solutions.
That's not the question.  The question is why we should install hooks
in core at this early stage without waiting to see which
implementations prove to be best and whether those hooks are actually
general enough to cater to everything people want to do.  There is
talk of integrating XC/XL work into PostgreSQL; it has a GTM.
Postgres Pro has several GTMs.  Maybe there will be others.

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.  We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.


dtms.  It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there are no "other
people" who have different ideas on Postgres' future.  That's why we propose
this patch; let's play the game!

I don't like to play games with the architecture of PostgreSQL.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company





Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 10:00 PM, Joshua D. Drake  
wrote:
> Robert, this is all a game. It is a game of who wins the intellectual prize
> to whatever problem. Who gets the market or mind share and who gets to
> pretend they win the Oscar for coolest design.

JD, I don't have a horse in this race.  I am not developing a GTM and
I would be quite happy never to have to develop a GTM.  That doesn't
mean I think we should add these proposed hooks.  I think that's just
freezing the way that potential GTMs have to interact with the rest of
the system before we actually have a solution that the community is
willing to endorse.  I don't know what problem that solves.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Joshua D. Drake

On 02/26/2016 08:06 AM, Robert Haas wrote:

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:

Right now the tm is hardcoded and it doesn't matter "if other people might
need" at all.  We at least provide developers ("other people") the ability to
work on their implementations, and the patch is safe and doesn't sacrifice
anything in core.


I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.


Correct.

[snip]



Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.


No it didn't. It allowed MySQL people to use the tool that best fit 
their needs.



We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.


The reason people developed a bunch of external replication tools (and
continue to) is because .Org has shown a unique lack of leadership in
providing solutions for the problem. Historically speaking, .Org was anti
replication in core. It wasn't about who was going to be best. It was
who was going to be best for what problem. The inclusion of the
replication tools we have now speaks very loudly to that lack of
leadership.


The moment .Org showed leadership and developed a reasonable solution to
80% of the problem, a great majority of people moved to hot standby and
streaming replication. It is easy. It does not answer all the questions
but it is default, in core, and that gives people peace of mind. This is
also why once PgLogical is up to -core quality and in -core, the great
majority of people will work to dump Slony/Londiste/Insertproghere and
use PgLogical.


If .Org was interested in showing leadership in this area, a few hackers 
would get together with a few other hackers from XL and XC (although as 
I understand it XL is further along), have a few heart to heart, mind to 
mind meetings and determine:


* Are either of these two solutions worth it?
Yes? Then let's start working on an integration plan and get it done.
No? Then let's start working on a .Org plan to solve that problem.

But that likely won't happen because NIH.



I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.


dtms.  It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there are no "other
people" who have different ideas on Postgres' future.  That's why we propose
this patch; let's play the game!


I don't like to play games with the architecture of PostgreSQL.



Robert, this is all a game. It is a game of who wins the intellectual 
prize to whatever problem. Who gets the market or mind share and who 
gets to pretend they win the Oscar for coolest design.


Sincerely,

jD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.




Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:
> Right now the tm is hardcoded and it doesn't matter "if other people might
> need" at all.  We at least provide developers ("other people") the ability to
> work on their implementations, and the patch is safe and doesn't sacrifice
> anything in core.

I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.

>> And what makes us think we
>> really need multiple transaction managers, anyway?
>
> If you are brave enough to say that one tm fits all, and you are able to
> teach the existing tm to play well in various clustering environments
> during the development period, which is short, then probably we don't
> need multiple tms. But it's too perfect to believe, and the practical
> solution is to let multiple groups work on their solutions.

Nobody's preventing multiple groups for working on their solutions.
That's not the question.  The question is why we should install hooks
in core at this early stage without waiting to see which
implementations prove to be best and whether those hooks are actually
general enough to cater to everything people want to do.  There is
talk of integrating XC/XL work into PostgreSQL; it has a GTM.
Postgres Pro has several GTMs.  Maybe there will be others.

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.  We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.
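
To be clear about what "hooks" means here: the pattern at issue is the
usual PostgreSQL function-pointer hook that an extension overrides from
its _PG_init().  A minimal sketch, using the existing ExecutorStart_hook
purely as a stand-in for the kind of transaction-manager hook being
debated:

    #include "postgres.h"
    #include "fmgr.h"
    #include "executor/executor.h"

    PG_MODULE_MAGIC;

    static ExecutorStart_hook_type prev_ExecutorStart = NULL;

    static void
    my_ExecutorStart(QueryDesc *queryDesc, int eflags)
    {
        /* extension-specific work would happen here */
        if (prev_ExecutorStart)
            prev_ExecutorStart(queryDesc, eflags);  /* chain to earlier hook */
        else
            standard_ExecutorStart(queryDesc, eflags);
    }

    void
    _PG_init(void)
    {
        prev_ExecutorStart = ExecutorStart_hook;    /* save whatever was there */
        ExecutorStart_hook = my_ExecutorStart;      /* install ourselves */
    }

The question is not whether this pattern works -- it plainly does -- but
whether core should commit to a specific set of such pointers for
transaction management this early.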

> dtms.  It's time to start working on dtm, I believe. The fact that you
> don't think about distributed transaction support doesn't mean there are
> no "other people" who have different ideas on Postgres's future.  That's
> why we propose this patch; let's play the game!

I don't like to play games with the architecture of PostgreSQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Oleg Bartunov
On Fri, Feb 26, 2016 at 3:50 PM, Robert Haas  wrote:

> On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov 
> wrote:
> > I have already pointed out several times that we need XTM to be able to
> > continue development in different directions, since there is no clear
> > winner. Moreover, I think there is no one-size-fits-all solution, and
> > while I agree we need one built into the core, other approaches should
> > have the ability to exist without patching.
>
> I don't think I necessarily agree with that.  Transaction management
> is such a fundamental part of the system that I think making it
> pluggable is going to be really hard.  I understand that you've done
> several implementations based on your proposed API, and that's good as
> far as it goes, but how do we know that's really going to be general
> enough for what other people might need?


Right now the tm is hardcoded, and it doesn't matter "if other people might
need" at all.  We at least provide developers ("other people") the ability
to work on their implementations, and the patch is safe and doesn't
sacrifice anything in core.



> And what makes us think we
> really need multiple transaction managers, anyway?



If you are brave enough to say that one tm fits all, and you are able to
teach the existing tm to play well in various clustering environments
during the development period, which is short, then probably we don't need
multiple tms. But it's too perfect to believe, and the practical solution
is to let multiple groups work on their solutions.



> Even writing one
> good distributed transaction manager seems like a really hard project
> - why would we want to write two or three or five?
>

again, right now it's simply impossible for any bright person to work on
dtms.  It's time to start working on dtm, I believe. The fact that you
don't think about distributed transaction support doesn't mean there are
no "other people" who have different ideas on Postgres's future.  That's
why we propose this patch; let's play the game!



>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>


Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Robert Haas
On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov  wrote:
> I have already pointed out several times that we need XTM to be able to
> continue development in different directions, since there is no clear
> winner. Moreover, I think there is no one-size-fits-all solution, and
> while I agree we need one built into the core, other approaches should
> have the ability to exist without patching.

I don't think I necessarily agree with that.  Transaction management
is such a fundamental part of the system that I think making it
pluggable is going to be really hard.  I understand that you've done
several implementations based on your proposed API, and that's good as
far as it goes, but how do we know that's really going to be general
enough for what other people might need?  And what makes us think we
really need multiple transaction managers, anyway?  Even writing one
good distributed transaction manager seems like a really hard project
- why would we want to write two or three or five?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-25 Thread Bruce Momjian
On Thu, Feb 25, 2016 at 01:53:12PM +0900, Michael Paquier wrote:
> > Well, as far as I know XC doesn't support data redistribution between
> > nodes and I saw good benchmarks of that, as well as XL.
> 
> XC does support that in 1.2 with a very basic approach (I coded that
> years ago), though it takes an exclusive lock on the table involved.
> And actually I think what I did in this case really sucked: the effort
> was centralized on the Coordinator to gather and then redistribute the
> tuples, though at least tuples that did not need to move were not moved
> at all.

Yes, there is a lot of complexity involved in sending results between
nodes.

> >> Once that is done, we can see what workloads it covers and
> >> decide if we are willing to copy the volume of code necessary
> >> to implement all supported Postgres XC or XL workloads.
> >> (The Postgres XL license now matches the Postgres license,
> >> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> >> Postgres XC has always used the Postgres license.)
> 
> Postgres-XC used the GPL license first, and has moved to PostgreSQL
> license exactly to allow Postgres core to reuse it later on if needed.

Ah, yes, I remember that now.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Michael Paquier
On Wed, Feb 24, 2016 at 11:34 PM, Bruce Momjian  wrote:
> On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:
>> Hi, Bruce!
>>
>> The important point for me is to distinguish different kinds of plans:
>> an implementation plan and a research plan.
>> If we're talking about an implementation plan, then it should be proven
>> that the proposed approach works. I.e., the research should already be
>> done. If we're talking about a research plan, then we should realize that
>> the result is unpredictable, and we would probably need to change course
>> dramatically.
>
> Yes, good point.  I would say FDW-based sharding is certainly still a
> research approach, but an odd one because we are adding code even while
> in research mode.  I think that is possible because the FDW improvements
> have other uses beyond sharding.
>
> I think another aspect is that we already know that modifying the
> Postgres source code can produce a useful sharding solution --- XC, XL,
> Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
> So, we know that with unlimited code changes, it is possible.  What we
> don't know is whether it is possible with acceptable code changes, and
> how much of the feature-set can be supported this way.
>
> We had a similar case with the Windows port, where SRA (my employer at
> the time) and Nusphere both had native Windows ports of Postgres, and
> they supplied source code to help with the port.  So, in that case also,
> we knew a native Windows port was possible, and we (or at least I) could
> see the code that was required to do it.  The big question was whether a
> native Windows port could be added in a community-acceptable way, and
> the community agreed we could try if we didn't make the code messier ---
> that was a success.
>
> For pg_upgrade, I had code from EDB (my employer at the time) that kind
> of worked, but needed lots of polish, and again, I could do it in
> contrib as long as I didn't mess up the backend code --- that worked
> well too.
>
> So, I guess I am saying, the FDW/sharding thing is a research project,
> but one that is implementing code because of existing proven solutions
> and because the improvements are benefiting other use-cases beyond
> sharding.
>
> Also, in the big picture, the existence of many Postgres forks, all
> doing sharding, indicates that there is demand for this capability, and
> if we can get some of this capability into Postgres, we will increase the
> number of people using native Postgres.  We might also be able to reduce
> the amount of duplicate work being done in all these forks and allow
> them to more easily focus on more advanced use-cases.
>
>> These two things would work with FDW:
>> 1) Pull data from data nodes to the coordinator.
>> 2) Push down computations from the coordinator to data nodes: joins,
>> aggregates, etc.
>> It's proven and clear. This is good.
>> Another point is that these FDW advances are useful by themselves. This
>> is good too.
>>
>> However, the FDW model assumes that communication happens only between
>> the coordinator and a data node. A full-weight distributed optimizer
>> can't be built under this restriction, because it requires every node to
>> communicate with every other node when that makes a distributed query
>> faster. And as I understand it, the FDW approach currently has no
>> research and no particular plan for that.
>
> This is very true.  I imagine cross-node connections will certainly
> complicate the implementation and lead to significant code changes,
> which might be unacceptable.  I think we need to go with a
> non-cross-node implementation first, then if that is accepted, we can
> start to think what cross-node code changes would look like.  It
> certainly would require FDW knowledge to exist on every shard.  Some
> have suggested that FDWs wouldn't work well for cross-node connections
> or wouldn't scale and we shouldn't be using them --- I am not sure what
> to think of that.
>
>> As I understand from Robert Haas's talk
>> (https://docs.google.com/viewer?a=v=sites;
>> srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)
>>
>> Before we consider repartitioning joins, we should probably get 
>> everything
>> previously discussed working first.
>> – Join Pushdown For Parallelism, FDWs
>> – PartialAggregate/FinalizeAggregate
>> – Aggregate Pushdown For Parallelism, FDWs
>> – Declarative Partitioning
>> – Parallel-Aware Append
>>
>>
>> So, as I understand it, we never thought about the possibility of data
>> redistribution using FDW. Perhaps something has changed since then, but
>> I haven't heard about it.
>
> No, you didn't miss it.  :-(  We just haven't gotten to studying that
> yet.  One possible outcome is that built-in Postgres has non-cross-node
> sharding, and forks of Postgres have cross-node sharding, again assuming
> cross-node sharding requires an unacceptable amount of code change.  I
> don't think anyone knows the answer yet.
>
>> On Tue, Feb 23, 2016 at 

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 01:02:21PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Wed, Feb 24, 2016 at 01:08:29AM +, Simon Riggs wrote:
> 
> > > It's never been our policy to try to include major projects in single
> > > code drops. Any move of XL/XC code into PostgreSQL core would need to
> > > be done piece by piece across many releases. XL is definitely too big
> > > for the elephant to eat in one mouthful.
> > 
> > Is there any plan to move the XL/XC code into Postgres?  If so, I have
> > not heard of it.  I thought everyone agreed it was too much code change,
> > which is why it is a separate code tree.  Is that incorrect?
> 
> Yes, I think that's incorrect.
> 
> What was said, as I understood it, is that Postgres-XL is too big to
> merge in a single commit -- just like merging BDR would have been.
> Indulge me while I make a parallel with BDR for a bit.
> 2ndQuadrant never pushed for merging BDR in a single commit; what was
> done was to split it, and propose individual pieces for commit.  Many of
> these pieces are now already committed (event triggers, background
> workers, logical decoding, replication slots, and many others).  The
> "BDR patch" is now much smaller, and it's quite possible that we will
> see it merged someday.  Will it be different from what it was when the
> BDR project started, all those years ago?  You bet.  Having the
> prototype BDR initially was what allowed the whole plan to make sense,
> because it showed that the pieces interacted in the right ways to make
> it work as a whole.

Yes, that is my understanding too.

> (I'm not saying 2ndQuadrant is so wise to do things this way.  I'm
> pretty sure you can see the same thing in parallel query development,
> for instance.)
> 
> In the same way, Postgres-XL is far too big to merge in a single commit.
> But that doesn't mean it will never be merged.  What is more likely to
> happen instead is that some pieces of it are going to be submitted
> separately for consideration.  It is a slow process, but progress is
> real and tangible.  We know this process will yield a useful outcome,

I was not aware there was any process to merge XC/XL into Postgres, at
least from the XC/XL side.  I know there is desire to take code from
XC/XL on the FDW-sharding side.

I think the most conservative merge approach is to try to enhance
existing Postgres features first (FDWs, partitioning, parallelism),
perhaps features that didn't exist at the time XC/XL were designed. If
they work, keep them and add the XC/XL-specific parts.  If the
enhance-features approach doesn't work, we then have to consider how
much additional code will be needed.  We have to evaluate this for the
FDW-based approach too, but it is likely to be smaller, which is its
attraction.

> because the architecture has already been proven by the existence of
> Postgres-XL itself.  It's the prototype that proves the overall design,
> even if the pieces change shape during the process.  (Really, it's way
> more than merely a prototype at this point because of how long it has
> matured.)

True, it is beyond a prototype.

> In contrast, we don't have a prototype for FDW-based sharding; as you
> admitted, there is no actual plan, other than "let's push FDWs in this
> direction and hope that sharding will emerge".  We don't really know
> what pieces we need or how will they interact with each other; we have a
> vague idea of a direction but there's no clear path forward.  As the
> saying goes, if you don't know where you're going, you will probably end
> up somewhere else.

I think I have covered that already.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Alvaro Herrera
Bruce Momjian wrote:
> On Wed, Feb 24, 2016 at 01:08:29AM +, Simon Riggs wrote:

> > It's never been our policy to try to include major projects in single
> > code drops. Any move of XL/XC code into PostgreSQL core would need to
> > be done piece by piece across many releases. XL is definitely too big
> > for the elephant to eat in one mouthful.
> 
> Is there any plan to move the XL/XC code into Postgres?  If so, I have
> not heard of it.  I thought everyone agreed it was too much code change,
> which is why it is a separate code tree.  Is that incorrect?

Yes, I think that's incorrect.

What was said, as I understood it, is that Postgres-XL is too big to
merge in a single commit -- just like merging BDR would have been.
Indulge me while I make a parallel with BDR for a bit.
2ndQuadrant never pushed for merging BDR in a single commit; what was
done was to split it, and propose individual pieces for commit.  Many of
these pieces are now already committed (event triggers, background
workers, logical decoding, replication slots, and many others).  The
"BDR patch" is now much smaller, and it's quite possible that we will
see it merged someday.  Will it be different from what it was when the
BDR project started, all those years ago?  You bet.  Having the
prototype BDR initially was what allowed the whole plan to make sense,
because it showed that the pieces interacted in the right ways to make
it work as a whole.

(I'm not saying 2ndQuadrant is so wise to do things this way.  I'm
pretty sure you can see the same thing in parallel query development,
for instance.)

In the same way, Postgres-XL is far too big to merge in a single commit.
But that doesn't mean it will never be merged.  What is more likely to
happen instead is that some pieces of it are going to be submitted
separately for consideration.  It is a slow process, but progress is
real and tangible.  We know this process will yield a useful outcome,
because the architecture has already been proven by the existence of
Postgres-XL itself.  It's the prototype that proves the overall design,
even if the pieces change shape during the process.  (Really, it's way
more than merely a prototype at this point because of how long it has
matured.)

In contrast, we don't have a prototype for FDW-based sharding; as you
admitted, there is no actual plan, other than "let's push FDWs in this
direction and hope that sharding will emerge".  We don't really know
what pieces we need or how will they interact with each other; we have a
vague idea of a direction but there's no clear path forward.  As the
saying goes, if you don't know where you're going, you will probably end
up somewhere else.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 09:34:37AM -0500, Bruce Momjian wrote:
> > I have nothing against particular FDW advances. However, it's unclear
> > to me that FDW should be the only sharding approach.
> > It's unproven that FDW can do the work that Postgres XC/XL does. With
> > FDW we can have some low-hanging fruit. That's good.
> > But it's unclear we can have high-hanging fruit (like data
> > redistribution) with the FDW approach. And if we can, it's unclear that
> > it would be easier than with other approaches.
> > Just let's not call this the community-chosen plan for implementing
> > sharding.
> > Until we have the full picture, we can't select one way and reject others.
> 
> I agree.  I think the FDW approach is the only existing approach for
> built-in sharding though.  The forks of Postgres doing sharding are,
> just that, forks and just Postgres community ecosystem projects.   (Yes,
> they are open source.)  If the forks were community-chosen plans we
> hopefully would not have 5+ of them.  If FDW works, it has the potential
> to be the community-chosen plan, at least for the workloads it supports,
> because it is built into community Postgres in a way the others cannot.
> 
> That doesn't mean the forks go away, but rather their value is in doing
> things the FDW approach can't, but there are a lot of "if's" in there.

Actually, this seems similar to how we handled replication.  For years
we had multiple external replication solutions.  When we implemented
streaming replication, we knew it would become the default for workloads
it supports.  The external solutions didn't go away, but their value was
in handling workloads that streaming replication didn't support.

I think the only difference is that we knew streaming replication would
have this effect before we implemented it, while with FDW-based
sharding, we don't know.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 12:22:20PM +0300, Konstantin Knizhnik wrote:
> Sorry, but based on this plan it is possible to conclude
> that there are only two possible cluster solutions for Postgres:
> XC/XL and FDW-based.  From my point of view there are many more
> possible alternatives.
> Our main idea with XTM (eXtensible Transaction Manager API) was to
> make it possible to develop cluster solutions for Postgres as
> extensions without patching the Postgres core. And FDW is one of
> the mechanisms that makes it possible to reach this goal.

Yes, this is a good example of code reuse.

> IMHO it will be hard to implement efficient execution of complex
> OLAP queries (including cross-node joins and aggregation) within the
> FDW paradigm. It will be necessary to build a distributed query
> execution plan and coordinate its execution across cluster nodes. And
> we definitely need a specialized optimizer for distributed queries.
> Right now solutions to this problem are provided by XL and Greenplum,
> but both are forks of Postgres with a lot of changes in the Postgres
> core. The challenge is to provide similar functionality, but at the
> extension level (using custom nodes, a pluggable transaction manager,
> ...).

Agreed.

> But, as you noticed, complex OLAP is just one of the scenarios, and
> this is not the only possible way of using clusters. In some cases
> FDW-based sharding can be quite efficient. Or the pg_shard approach,
> which also adds sharding at the extension level and in some aspects is
> more flexible than the FDW-based solution. Not all scenarios require a
> global transaction manager. But if one needs global consistency, then
> the XTM API makes it possible to provide ACID for both approaches (and
> not only for them).

Yep.

> We have added to the commitfest our XTM patch together with a
> postgres_fdw patch integrating a timestamp-based DTM implementation in
> postgres_fdw. It illustrates how global consistency can be reached
> for FDW-based sharding.
> If this XTM patch is committed, then in 9.6 we will have wide
> flexibility to play with different distributed transaction managers.
> And it can be used for many cluster solutions.
> 
> IMHO it will be very useful to extend your classification of cluster
> use cases, more precisely formulate the demands in all cases, and
> investigate how they can be covered by existing cluster solutions
> for Postgres and which niches are still vacant. We are currently
> continuing work on "multimaster" - a more convenient alternative to
> hot-standby replication. It looks like PostgreSQL is missing a
> product providing functionality similar to Oracle RAC or MySQL
> Galera. It is yet another direction of cluster development for
> PostgreSQL.  Let's be more open and flexible.

Yes, I listed only the workloads I could think of.  It would be helpful
to list more workloads and start to decide what can be accomplished with
each approach.  I don't even know all the workloads supported by the
sharding forks of Postgres.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 12:35:15PM +0300, Oleg Bartunov wrote:
> I have nothing against particular FDW advances. However, it's unclear to
> me that FDW should be the only sharding approach.
> It's unproven that FDW can do the work that Postgres XC/XL does. With
> FDW we can have some low-hanging fruit. That's good.
> But it's unclear we can have high-hanging fruit (like data
> redistribution) with the FDW approach. And if we can, it's unclear that
> it would be easier than with other approaches.
> Just let's not call this the community-chosen plan for implementing
> sharding.
> Until we have the full picture, we can't select one way and reject others.
> 
> 
> I have already pointed out several times that we need XTM to be able to
> continue development in different directions, since there is no clear
> winner. Moreover, I think there is no one-size-fits-all solution, and
> while I agree we need one built into the core, other approaches should
> have the ability to exist without patching.

Yep.  I think much of what we eventually add to core will be either
copied from an existing solution, which then doesn't need to be
maintained anymore, or used by existing solutions.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:
> Hi, Bruce!
> 
> The important point for me is to distinguish different kinds of plans:
> an implementation plan and a research plan.
> If we're talking about an implementation plan, then it should be proven
> that the proposed approach works. I.e., the research should already be
> done. If we're talking about a research plan, then we should realize that
> the result is unpredictable, and we would probably need to change course
> dramatically.

Yes, good point.  I would say FDW-based sharding is certainly still a
research approach, but an odd one because we are adding code even while
in research mode.  I think that is possible because the FDW improvements
have other uses beyond sharding.

I think another aspect is that we already know that modifying the
Postgres source code can produce a useful sharding solution --- XC, XL,
Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
So, we know that with unlimited code changes, it is possible.  What we
don't know is whether it is possible with acceptable code changes, and
how much of the feature-set can be supported this way.

We had a similar case with the Windows port, where SRA (my employer at
the time) and Nusphere both had native Windows ports of Postgres, and
they supplied source code to help with the port.  So, in that case also,
we knew a native Windows port was possible, and we (or at least I) could
see the code that was required to do it.  The big question was whether a
native Windows port could be added in a community-acceptable way, and
the community agreed we could try if we didn't make the code messier ---
that was a success.

For pg_upgrade, I had code from EDB (my employer at the time) that kind
of worked, but needed lots of polish, and again, I could do it in
contrib as long as I didn't mess up the backend code --- that worked
well too.

So, I guess I am saying, the FDW/sharding thing is a research project,
but one that is implementing code because of existing proven solutions
and because the improvements are benefiting other use-cases beyond
sharding.

Also, in the big picture, the existence of many Postgres forks, all
doing sharding, indicates that there is demand for this capability, and
if we can get some of this capability into Postgres, we will increase the
number of people using native Postgres.  We might also be able to reduce
the amount of duplicate work being done in all these forks and allow
them to more easily focus on more advanced use-cases.

> These two things would work with FDW:
> 1) Pull data from data nodes to the coordinator.
> 2) Push down computations from the coordinator to data nodes: joins,
> aggregates, etc.
> It's proven and clear. This is good.
> Another point is that these FDW advances are useful by themselves. This
> is good too.
>
> However, the FDW model assumes that communication happens only between
> the coordinator and a data node. A full-weight distributed optimizer
> can't be built under this restriction, because it requires every node to
> communicate with every other node when that makes a distributed query
> faster. And as I understand it, the FDW approach currently has no
> research and no particular plan for that.

This is very true.  I imagine cross-node connections will certainly
complicate the implementation and lead to significant code changes,
which might be unacceptable.  I think we need to go with a
non-cross-node implementation first, then if that is accepted, we can
start to think what cross-node code changes would look like.  It
certainly would require FDW knowledge to exist on every shard.  Some
have suggested that FDWs wouldn't work well for cross-node connections
or wouldn't scale and we shouldn't be using them --- I am not sure what
to think of that.

> As I understand from Robert Haas's talk
> (https://docs.google.com/viewer?a=v=sites;
> srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)
> 
> Before we consider repartitioning joins, we should probably get everything
> previously discussed working first.
> – Join Pushdown For Parallelism, FDWs
> – PartialAggregate/FinalizeAggregate
> – Aggregate Pushdown For Parallelism, FDWs
> – Declarative Partitioning
> – Parallel-Aware Append
> 
> 
> So, as I understand it, we never thought about the possibility of data
> redistribution using FDW. Perhaps something has changed since then, but
> I haven't heard about it.

No, you didn't miss it.  :-(  We just haven't gotten to studying that
yet.  One possible outcome is that built-in Postgres has non-cross-node
sharding, and forks of Postgres have cross-node sharding, again assuming
cross-node sharding requires an unacceptable amount of code change.  I
don't think anyone knows the answer yet.

> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian  wrote:
> 
> Second, as part of this staged implementation, there are several use
> cases that will be shardable at first, and then only later, more complex
> ones.  For 

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Oleg Bartunov
On Wed, Feb 24, 2016 at 12:17 PM, Alexander Korotkov <
a.korot...@postgrespro.ru> wrote:

> Hi, Bruce!
>
> The important point for me is to distinguish different kinds of plans:
> an implementation plan and a research plan.
> If we're talking about an implementation plan, then it should be proven
> that the proposed approach works. I.e., the research should already be
> done. If we're talking about a research plan, then we should realize that
> the result is unpredictable, and we would probably need to change course
> dramatically.
>
> These two things would work with FDW:
> 1) Pull data from data nodes to the coordinator.
> 2) Push down computations from the coordinator to data nodes: joins,
> aggregates, etc.
> It's proven and clear. This is good.
> Another point is that these FDW advances are useful by themselves. This
> is good too.
>
> However, the FDW model assumes that communication happens only between
> the coordinator and a data node. A full-weight distributed optimizer
> can't be built under this restriction, because it requires every node to
> communicate with every other node when that makes a distributed query
> faster. And as I understand it, the FDW approach currently has no
> research and no particular plan for that.
>
> As I understand from Robert Haas's talk (
> https://docs.google.com/viewer?a=v=sites=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0
> )
>
>> Before we consider repartitioning joins, we should probably get
>> everything previously discussed working first.
>> – Join Pushdown For Parallelism, FDWs
>> – PartialAggregate/FinalizeAggregate
>> – Aggregate Pushdown For Parallelism, FDWs
>> – Declarative Partitioning
>> – Parallel-Aware Append
>
>
> So, as I understand it, we never thought about the possibility of data
> redistribution using FDW. Perhaps something has changed since then, but
> I haven't heard about it.
>
> On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian  wrote:
>
>> Second, as part of this staged implementation, there are several use
>> cases that will be shardable at first, and then only later, more complex
>> ones.  For example, here are some use cases and the technology they
>> require:
>>
>> 1. Cross-node read-only queries on read-only shards using aggregate
>> queries, e.g. data warehouse:
>>
>> This is the simplest to implement as it doesn't require a global
>> transaction manager, global snapshot manager, and the number of rows
>> returned from the shards is minimal because of the aggregates.
>>
>> 2. Cross-node read-only queries on read-only shards using non-aggregate
>> queries:
>>
>> This will stress the coordinator to collect and process many returned
>> rows, and will show how well the FDW transfer mechanism scales.
>>
>
> FDW would work for queries which fit the pull-pushdown model. I see no
> plan to make other queries work.
>
>
>> 3. Cross-node read-only queries on read/write shards:
>>
>> This will require a global snapshot manager to make sure the shards
>> return consistent data.
>>
>> 4. Cross-node read-write queries:
>>
>> This will require a global snapshot manager and global snapshot manager.
>>
>
> At this point, it's unclear why you don't refer to the work done in the
> direction of a distributed transaction manager (which is also a
> distributed snapshot manager in your terminology):
> http://www.postgresql.org/message-id/56bb7880.4020...@postgrespro.ru
>
>
>> In 9.6, we will have FDW join and sort pushdown
>> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
>> Unfortunately I don't think we will have aggregate
>> pushdown, so we can't test #1, but we might be able to test #2, even in
>> 9.5.  Also, we might have better partitioning syntax in 9.6.
>>
>> We need things like parallel partition access and replicated lookup
>> tables for more join pushdown.
>>
>> In a way, because these enhancements are useful independent of sharding,
>> we have not tested to see how well an FDW sharding setup will work and
>> for which workloads.
>>
>
> This is the point where I agree. I'm not objecting to any single FDW
> advance, because each is useful by itself.
>
> We know Postgres XC/XL works, and scales, but we also know they require
>> too many code changes to be merged into Postgres (at least based on
>> previous discussions).  The FDW sharding approach is to enhance the
>> existing features of Postgres to allow as much sharding as possible.
>>
>
> This comparison doesn't seem correct to me. Postgres XC/XL supports data
> redistribution between nodes, and I haven't heard a single idea about
> supporting this in FDW. You are comparing unequal things.
>
>
>> Once that is done, we can see what workloads it covers and
>> decide if we are willing to copy the volume of code necessary
>> to implement all supported Postgres XC or XL workloads.
>> (The Postgres XL license now matches the Postgres license,
>> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
>> 

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Konstantin Knizhnik
Sorry, but based on this plan it is possible to conclude that 
there are only two possible cluster solutions for Postgres:
XC/XL and FDW-based.  From my point of view there are many more 
possible alternatives.
Our main idea with XTM (eXtensible Transaction Manager API) was to make 
it possible to develop cluster solutions for Postgres as extensions 
without patching the Postgres core. And FDW is one of the mechanisms 
that makes it possible to reach this goal.
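
To make the shape of this concrete, here is a rough sketch of what such
a pluggable transaction manager API can look like -- a table of function
pointers wrapping the existing core entry points.  The member names
below are invented for illustration; they are not the actual patch's API:

    #include "postgres.h"
    #include "access/clog.h"        /* XidStatus */
    #include "utils/snapshot.h"     /* Snapshot */

    /* Illustrative sketch only, not the submitted XTM patch. */
    typedef struct TransactionManager
    {
        /* wraps TransactionIdGetStatus / TransactionIdSetTreeStatus */
        XidStatus       (*GetTransactionStatus) (TransactionId xid);
        void            (*SetTransactionStatus) (TransactionId xid,
                                                 XidStatus status);

        /* wraps GetSnapshotData; a DTM can merge in a global snapshot */
        Snapshot        (*GetSnapshot) (Snapshot snapshot);

        /* wraps GetNewTransactionId; a GTM can hand out global XIDs */
        TransactionId   (*GetNewTransactionId) (void);
    } TransactionManager;

    /* core calls through this pointer; an extension repoints it */
    extern TransactionManager *TM;

A single-node build keeps the default implementations in the table; a
cluster extension substitutes its own.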


IMHO it will be hard to implement efficient execution of complex OLAP 
queries (including cross-node joins and aggregation) within the FDW 
paradigm. It will be necessary to build a distributed query execution 
plan and coordinate its execution across cluster nodes. And we definitely 
need a specialized optimizer for distributed queries. Right now solutions 
to this problem are provided by XL and Greenplum, but both are forks of 
Postgres with a lot of changes in the Postgres core. The challenge is to 
provide similar functionality, but at the extension level (using custom 
nodes, a pluggable transaction manager, ...).
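
As a sketch of the kind of extension-level attachment point that already
exists for such a distributed planner, the set_join_pathlist_hook added
in 9.5 lets an extension offer its own (for example, remote) join paths;
the body below is illustrative only:

    #include "postgres.h"
    #include "fmgr.h"
    #include "optimizer/paths.h"

    PG_MODULE_MAGIC;

    static set_join_pathlist_hook_type prev_join_pathlist = NULL;

    static void
    distributed_join_pathlist(PlannerInfo *root, RelOptInfo *joinrel,
                              RelOptInfo *outerrel, RelOptInfo *innerrel,
                              JoinType jointype, JoinPathExtraData *extra)
    {
        if (prev_join_pathlist)
            prev_join_pathlist(root, joinrel, outerrel, innerrel,
                               jointype, extra);

        /* a distributed planner could add a CustomPath here that ships
         * the join to the nodes holding the data, competing on cost
         * with the local join paths already generated */
    }

    void
    _PG_init(void)
    {
        prev_join_pathlist = set_join_pathlist_hook;
        set_join_pathlist_hook = distributed_join_pathlist;
    }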


But, as you noticed, complex OLAP is just one of the scenarios, and this 
is not the only possible way of using clusters. In some cases FDW-based 
sharding can be quite efficient. Or the pg_shard approach, which also 
adds sharding at the extension level and in some aspects is more flexible 
than the FDW-based solution. Not all scenarios require a global 
transaction manager. But if one needs global consistency, then the XTM 
API makes it possible to provide ACID for both approaches (and not only 
for them).


We have added to the commitfest our XTM patch together with a 
postgres_fdw patch integrating a timestamp-based DTM implementation in 
postgres_fdw. It illustrates how global consistency can be reached for 
FDW-based sharding.
If this XTM patch is committed, then in 9.6 we will have wide flexibility 
to play with different distributed transaction managers. And it can be 
used for many cluster solutions.
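
The core idea of the timestamp-based DTM fits in a few lines; the
following is an illustration of the concept with invented names, not the
submitted patch:

    #include "postgres.h"

    /* Every node stamps each commit with a loosely synchronized clock
     * value, and a global snapshot is just a timestamp shipped to every
     * shard: a transaction's effects are visible iff it committed at or
     * before the snapshot's timestamp, on whichever node. */
    typedef uint64 GlobalCSN;       /* commit timestamp / sequence number */

    #define InvalidGlobalCSN ((GlobalCSN) 0)    /* "not committed yet" */

    typedef struct GlobalSnapshot
    {
        GlobalCSN   snapshot_csn;   /* taken once on the coordinator */
    } GlobalSnapshot;

    static bool
    CommittedInGlobalSnapshot(GlobalCSN commit_csn,
                              const GlobalSnapshot *snap)
    {
        return commit_csn != InvalidGlobalCSN &&
               commit_csn <= snap->snapshot_csn;
    }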


IMHO it will be very useful to extend your classification of cluster use 
cases, more precisely formulate the demands in all cases, and investigate 
how they can be covered by existing cluster solutions for Postgres and 
which niches are still vacant. We are currently continuing work on 
"multimaster" - a more convenient alternative to hot-standby replication. 
It looks like PostgreSQL is missing a product providing functionality 
similar to Oracle RAC or MySQL Galera. It is yet another direction of 
cluster development for PostgreSQL.  Let's be more open and flexible.



On 23.02.2016 19:43, Bruce Momjian wrote:

There was discussion at the FOSDEM/PGDay Developer Meeting
(https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
about sharding so I wanted to outline where I think we are going with
sharding and FDWs.

First, let me point out that, unlike pg_upgrade and the Windows port,
which either worked or didn't work, sharding is going be implemented and
useful in stages.  It will take several years to complete, similar to
parallelism, streaming replication, and logical replication.

Second, as part of this staged implementation, there are several use
cases that will be shardable at first, and then only later, more complex
ones.  For example, here are some use cases and the technology they
require:

1. Cross-node read-only queries on read-only shards using aggregate
queries, e.g. data warehouse:

This is the simplest to implement as it doesn't require a global
transaction manager, global snapshot manager, and the number of rows
returned from the shards is minimal because of the aggregates.

2. Cross-node read-only queries on read-only shards using non-aggregate
queries:

This will stress the coordinator to collect and process many returned
rows, and will show how well the FDW transfer mechanism scales.

3. Cross-node read-only queries on read/write shards:

This will require a global snapshot manager to make sure the shards
return consistent data.

4. Cross-node read-write queries:

This will require a global snapshot manager and global snapshot manager.

In 9.6, we will have FDW join and sort pushdown
(http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
Unfortunately I don't think we will have aggregate
pushdown, so we can't test #1, but we might be able to test #2, even in
9.5.  Also, we might have better partitioning syntax in 9.6.
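
For FDW authors, join pushdown surfaces as an optional planner callback
in the FdwRoutine; a sketch of where it plugs in (handler only, with the
scan callbacks and function bodies elided):

    #include "postgres.h"
    #include "fmgr.h"
    #include "foreign/fdwapi.h"

    PG_MODULE_MAGIC;

    extern void my_GetForeignRelSize(PlannerInfo *root, RelOptInfo *baserel,
                                     Oid foreigntableid);
    extern void my_GetForeignPaths(PlannerInfo *root, RelOptInfo *baserel,
                                   Oid foreigntableid);
    extern void my_GetForeignJoinPaths(PlannerInfo *root, RelOptInfo *joinrel,
                                       RelOptInfo *outerrel,
                                       RelOptInfo *innerrel,
                                       JoinType jointype,
                                       JoinPathExtraData *extra);

    PG_FUNCTION_INFO_V1(my_fdw_handler);

    Datum
    my_fdw_handler(PG_FUNCTION_ARGS)
    {
        FdwRoutine *routine = makeNode(FdwRoutine);

        routine->GetForeignRelSize = my_GetForeignRelSize;
        routine->GetForeignPaths = my_GetForeignPaths;

        /* optional: lets a whole join between foreign tables on the same
         * server be shipped to the remote side instead of being joined
         * on the coordinator */
        routine->GetForeignJoinPaths = my_GetForeignJoinPaths;

        PG_RETURN_POINTER(routine);
    }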

We need things like parallel partition access and replicated lookup
tables for more join pushdown.

In a way, because these enhancements are useful independent of sharding,
we have not tested to see how well an FDW sharding setup will work and
for which workloads.

We know Postgres XC/XL works, and scales, but we also know they require
too many code changes to be merged into Postgres (at least based on
previous discussions).  The FDW sharding approach is to enhance the
existing features of Postgres to allow as much sharding as possible.

Once that is done, we can see what 

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Alexander Korotkov
Hi, Bruce!

The important point for me is to distinguish different kinds of plans:
an implementation plan and a research plan.
If we're talking about an implementation plan, then it should be proven
that the proposed approach works. I.e., the research should already be
done. If we're talking about a research plan, then we should realize that
the result is unpredictable, and we would probably need to change course
dramatically.

These two things would work with FDW:
1) Pull data from data nodes to the coordinator.
2) Push down computations from the coordinator to data nodes: joins,
aggregates, etc.
It's proven and clear. This is good.
Another point is that these FDW advances are useful by themselves. This is
good too.

However, the FDW model assumes that communication happens only between the
coordinator and a data node. A full-weight distributed optimizer can't be
built under this restriction, because it requires every node to communicate
with every other node when that makes a distributed query faster. And as I
understand it, the FDW approach currently has no research and no particular
plan for that.

As I understand from Robert Haas's talk (
https://docs.google.com/viewer?a=v=sites=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0
)

> Before we consider repartitioning joins, we should probably get everything
> previously discussed working first.
> – Join Pushdown For Parallelism, FDWs
> – PartialAggregate/FinalizeAggregate
> – Aggregate Pushdown For Parallelism, FDWs
> – Declarative Partitioning
> – Parallel-Aware Append


So, as I understand it, we never thought about the possibility of data
redistribution using FDW. Perhaps something has changed since then, but I
haven't heard about it.

On Tue, Feb 23, 2016 at 7:43 PM, Bruce Momjian  wrote:

> Second, as part of this staged implementation, there are several use
> cases that will be shardable at first, and then only later, more complex
> ones.  For example, here are some use cases and the technology they
> require:
>
> 1. Cross-node read-only queries on read-only shards using aggregate
> queries, e.g. data warehouse:
>
> This is the simplest to implement as it doesn't require a global
> transaction manager, global snapshot manager, and the number of rows
> returned from the shards is minimal because of the aggregates.
>
> 2. Cross-node read-only queries on read-only shards using non-aggregate
> queries:
>
> This will stress the coordinator to collect and process many returned
> rows, and will show how well the FDW transfer mechanism scales.
>

FDW would work for queries which fit the pull-pushdown model. I see no plan
to make other queries work.


> 3. Cross-node read-only queries on read/write shards:
>
> This will require a global snapshot manager to make sure the shards
> return consistent data.
>
> 4. Cross-node read-write queries:
>
> This will require a global snapshot manager and global snapshot manager.
>

At this point, it's unclear why you don't refer to the work done in the
direction of a distributed transaction manager (which is also a distributed
snapshot manager in your terminology):
http://www.postgresql.org/message-id/56bb7880.4020...@postgrespro.ru


> In 9.6, we will have FDW join and sort pushdown
> (http://thombrown.blogspot.com/2016/02/postgresql-96-part-1-horizontal-scalability.html).
> Unfortunately I don't think we will have aggregate
> pushdown, so we can't test #1, but we might be able to test #2, even in
> 9.5.  Also, we might have better partitioning syntax in 9.6.
>
> We need things like parallel partition access and replicated lookup
> tables for more join pushdown.
>
> In a way, because these enhancements are useful independent of sharding,
> we have not tested to see how well an FDW sharding setup will work and
> for which workloads.
>

This is the point where I agree. I'm not objecting to any single FDW
advance, because each is useful by itself.

We know Postgres XC/XL works, and scales, but we also know they require
> too many code changes to be merged into Postgres (at least based on
> previous discussions).  The FDW sharding approach is to enhance the
> existing features of Postgres to allow as much sharding as possible.
>

This comparison doesn't seem correct to me. Postgres XC/XL supports data
redistribution between nodes, and I haven't heard a single idea about
supporting this in FDW. You are comparing unequal things.


> Once that is done, we can see what workloads it covers and
> decide if we are willing to copy the volume of code necessary
> to implement all supported Postgres XC or XL workloads.
> (The Postgres XL license now matches the Postgres license,
> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> Postgres XC has always used the Postgres license.)
>
> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres.  I don't think anyone knows the answer to this 

Re: [HACKERS] The plan for FDW-based sharding

2016-02-23 Thread Bruce Momjian
On Wed, Feb 24, 2016 at 01:08:29AM +, Simon Riggs wrote:
> On 23 February 2016 at 16:43, Bruce Momjian  wrote:
> 
> There was discussion at the FOSDEM/PGDay Developer Meeting
> (https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
> about sharding so I wanted to outline where I think we are going with
> sharding and FDWs.
> 
> I think we need to be very careful to understand that "FDWs and Sharding" is
> one tentative proposal amongst others, not a statement of direction for the

What other directions are proposed to add sharding to the existing
Postgres code?  If there are, I have not heard of them.  Or are they
only (regularly updated?) forks of Postgres?

> PostgreSQL project since there is not yet any universal agreement.

As I stated clearly, we are going in the FDW direction because improving
FDWs have uses beyond sharding, and once it is done we can see how well
it works for sharding.

> We know Postgres XC/XL works, and scales
> 
> 
> Agreed. 
> 
> In contrast, the FDW/sharding approach is as-yet unproven, and significantly
> without any detailed technical discussion of the exact approach and how it
> would work, even after more than 6 months since we first heard of it openly.
> Since we don't know how it will work, we have no idea how long it will take
> either, or even if it ever will.

Yep.

> I'd like to see discussion of the details in presentation/wiki form and an
> initial prototype, with measurements. Without these things we are still
> just at the speculation stage. Some alternate proposals are also at that
> stage.

Uh, what "alternate proposals"?

My point was that we know XC/XL works, but there is too much code change
for us, so maybe FDWs will make built-in sharding possible/easier.

> , but we also know they require
> too many code changes to be merged into Postgres (at least based on
> previous discussions).  The FDW sharding approach is to enhance the
> existing features of Postgres to allow as much sharding as possible.
> 
> Once that is done, we can see what workloads it covers and
> decide if we are willing to copy the volume of code necessary
> to implement all supported Postgres XC or XL workloads.
> (The Postgres XL license now matches the Postgres license,
> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> Postgres XC has always used the Postgres license.)
> 
> 
> It's never been our policy to try to include major projects in single code
> drops. Any move of XL/XC code into PostgreSQL core would need to be done
> piece by piece across many releases. XL is definitely too big for the
> elephant to eat in one mouthful.

Is there any plan to move the XL/XC code into Postgres?  If so, I have
not heard of it.  I thought everyone agreed it was too much code change,
which is why it is a separate code tree.  Is that incorrect?

> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres. 
> 
> 
> And if the FDW approach doesn't work, that won't be part of PostgreSQL core
> either...

Uh, duh.  Yeah, that's what I said.  What is your point?  I said we
don't know if it will work, as you quoted below:

> I don't think anyone knows the answer to this question, and I
> don't know how to find the answer except to keep going with our current
> FDW sharding approach.
> 
> 
> This is exactly the wrong time to discuss this, since we are days away
> from the final deadline for PostgreSQL 9.6 and the community should be
> focusing on that for the next few months, not on futures.

I posted this because of the discussion at the FOSDEM meeting, and to
address the questions you asked in that meeting.  I even told you last
week on IM that I was going to post this for that stated purpose.  I
didn't pick the time at random.

> What I notice is that when Greenplum announced it would publish as open source
> its modified version of Postgres, there was some scary noise made immediately
> about that concerning patents etc.

> Now, Postgres-XL 9.5 was recently announced and we see another
> scary-sounding pronouncement that *maybe* it won't be included in core.
> While the comments made are true, they do not solely apply to XC/XL; in
> fact the uncertainty applies to all approaches equally, since notably we
> have approximately five proposals for future designs.
> 
> These comments, given their timing and nature could easily cause "Fear,
> Uncertainty and Doubt" in people seeing this. FUD is also the name of a sales
> technique designed to undermine proposals. I hope and presume it was not the
> intention and reason for discussing uncertainty now and earlier.

Oh, I absolutely did this as a way to undermine what _everyone_ else is
doing?  Is there another way to behave?

I find this insulting.  Others made the same remarks when I questioned

Re: [HACKERS] The plan for FDW-based sharding

2016-02-23 Thread Simon Riggs
On 23 February 2016 at 16:43, Bruce Momjian  wrote:

> There was discussion at the FOSDEM/PGDay Developer Meeting
> (https://wiki.postgresql.org/wiki/FOSDEM/PGDay_2016_Developer_Meeting)
> about sharding so I wanted to outline where I think we are going with
> sharding and FDWs.
>

I think we need to be very careful to understand that "FDWs and Sharding"
is one tentative proposal amongst others, not a statement of direction for
the PostgreSQL project since there is not yet any universal agreement.

We know Postgres XC/XL works, and scales


Agreed.

In contrast, the FDW/sharding approach is as-yet unproven, and
significantly without any detailed technical discussion of the exact
approach and how it would work, even after more than 6 months since we
first heard of it openly. Since we don't know how it will work, we have no
idea how long it will take either, or even if it ever will.

I'd like to see discussion of the details in presentation/wiki form and an
initial prototype, with measurements. Without these things we are still
just at the speculation stage. Some alternate proposals are also at that
stage.


> , but we also know they require
> too many code changes to be merged into Postgres (at least based on
> previous discussions).  The FDW sharding approach is to enhance the
> existing features of Postgres to allow as much sharding as possible.
>
> Once that is done, we can see what workloads it covers and
> decide if we are willing to copy the volume of code necessary
> to implement all supported Postgres XC or XL workloads.
> (The Postgres XL license now matches the Postgres license,
> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> Postgres XC has always used the Postgres license.)
>

It's never been our policy to try to include major projects in single code
drops. Any move of XL/XC code into PostgreSQL core would need to be done
piece by piece across many releases. XL is definitely too big for the
elephant to eat in one mouthful.


> If we are not willing to add code for the missing Postgres XC/XL
> features, Postgres XC/XL will probably remain a separate fork of
> Postgres.


And if the FDW approach doesn't work, that won't be part of PostgreSQL core
either...


> I don't think anyone knows the answer to this question, and I
> don't know how to find the answer except to keep going with our current
> FDW sharding approach.
>

This is exactly the wrong time to discuss this, since we are days away from
the final deadline for PostgreSQL 9.6 and the community should be focusing
on that for the next few months, not on futures.

What I notice is that when Greenplum announced it would publish as open
source its modified version of Postgres, there was some scary noise made
immediately about that concerning patents etc.

Now, Postgres-XL 9.5 was recently announced and we see another
scary-sounding pronouncement that *maybe* it won't be included in core.
While the comments made are true, they do not solely apply to XC/XL; in
fact the uncertainty applies to all approaches equally, since notably we
have approximately five proposals for future designs.

These comments, given their timing and nature could easily cause "Fear,
Uncertainty and Doubt" in people seeing this. FUD is also the name of a
sales technique designed to undermine proposals. I hope and presume it was
not the intention and reason for discussing uncertainty now and earlier.

I'm glad to see that the viability of the XC/XL approach is recognized. The
fact we have a working solution now is important for users, who don't want
to wait the 3-5 years while we work out and implement a longer term
strategy. Future upgrade support is certain, however.

What eventually gets into PostgreSQL core is as yet uncertain, as is the
timescale, but my hope is that we recognize that multiple use cases can be
supported rather than a single fixed architecture. It seems likely to me
that the PostgreSQL project will do what it does best - take multiple
comments and merge those into a combined system that is better than any of
the individual single proposals.

-- 
Simon Riggs                   http://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: [HACKERS] The plan for FDW-based sharding

2016-02-23 Thread Bruce Momjian
On Tue, Feb 23, 2016 at 09:54:46AM -0700, David G. Johnston wrote:
> On Tue, Feb 23, 2016 at 9:43 AM, Bruce Momjian  wrote:
> 
> 4. Cross-node read-write queries:
> 
> This will require a global snapshot manager and global snapshot manager.
> 
> 
> Probably meant "global transaction manager"

Oops, yes, it should be:

4. Cross-node read-write queries:

This will require a global snapshot manager and global transaction
manager.
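
To spell out why this is the hard case: without a global transaction
manager, a coordinator is reduced to driving two-phase commit across the
shards by hand, and it still needs a resolver for crashes between the
two phases.  A hypothetical libpq-level sketch, with invented names and
error handling elided:

    #include <stdio.h>
    #include <libpq-fe.h>

    static void
    commit_on_all_shards(PGconn *shard[], int nshards, const char *gid)
    {
        char    sql[128];
        int     i;

        /* phase 1: every shard must durably promise to commit */
        for (i = 0; i < nshards; i++)
        {
            snprintf(sql, sizeof(sql), "PREPARE TRANSACTION '%s'", gid);
            PQclear(PQexec(shard[i], sql));
        }

        /* phase 2: commit everywhere; failures here are exactly what a
         * GTM, plus a global snapshot manager for visibility, must solve */
        for (i = 0; i < nshards; i++)
        {
            snprintf(sql, sizeof(sql), "COMMIT PREPARED '%s'", gid);
            PQclear(PQexec(shard[i], sql));
        }
    }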

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] The plan for FDW-based sharding

2016-02-23 Thread David G. Johnston
On Tue, Feb 23, 2016 at 9:43 AM, Bruce Momjian  wrote:

> 4. Cross-node read-write queries:
>
> This will require a global snapshot manager and global snapshot manager.
>

Probably meant "global transaction manager"

David J.