subject:"Re\: \[HACKERS\] The plan for FDW\-based sharding"

On 2 March 2016 at 03:02, Bruce Momjian  wrote:

> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > Note that I am not saying that other discussed approaches are any
> > better, I am saying that we should know approximately what we
> > actually want and not just beat FDWs with a hammer and hope sharding
> > will eventually emerge and call that the plan.
>
> I will say it again --- FDWs are the only sharding method I can think of
> that has a chance of being accepted into Postgres core.  It is a plan,
> and if it fails, it fails.  If is succeeds, that's good.  What more do
> you want me to say?

That you won't push it too hard if it works, but works badly, and will be
prepared to back off on the last steps despite all the lead-up
work/time/investment you've put into it.

If FDW-based sharding works, I'm happy enough, I have no horse in this
race. If it doesn't work I don't much care either. What I'm worried about
is it if works like partitioning using inheritance works - horribly badly,
but just well enough that it's served as an effective barrier to doing
anything better.

That's what I want to prevent. Sharding that only-just-works and then stops
us getting anything better into core.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] The plan for FDW-based sharding

On 2 March 2016 at 00:03, Robert Haas  wrote:

>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.  We end up revising the index AM API pretty
> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.  I suspect that
> a transaction manager API would end up similarly situated.
> 
>

IMO that needs to be true of all hooks into the real innards.

The ProcessUtility_hook API changed a couple of times after introduction
and nobody screamed. I think we just have to mark such places as having
cross-version API volatility, so you should be prepared to #if
PG_VERSION_NUM around them if you use them.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] The plan for FDW-based sharding

On 28 February 2016 at 06:38, Kevin Grittner  wrote:

>
> > For logical replay, applying in batches is actually a good thing since it
> > allows parallelism. We can remove them all from the target's procarray
> all
> > at once to avoid intermediate states becoming visible. So that would be
> the
> > preferred mechanism.
>
> That could be part of a solution.  What I sketched out with the
> "apparent order of execution" ordering of the transactions
> (basically, commit order except when one SERIALIZABLE transaction
> needs to be dragged in front of another due to a read-write
> dependency) is possibly the simplest approach, but batching may
> well give better performance.
>

I'd be really interested in some ideas on how that information might be
usefully accessed. If we could write info on when to apply commits to the
xlog in serializable mode that'd be very handy, especially when looking to
the future with logical decoding of in-progress transactions, parallel
apply, etc.

For parallel apply I anticipated that we'd probably have workers applying
xacts in parallel and committing them in upstream commit order. They'd
sometimes deadlock with each other; when this happened all workers whose
xacts committed after the first aborted xact would have to abort and start
again. Not ideal, but safe.

Being able to avoid that by using SSI information was in the back of my
mind, but with no idea how to even begin to tackle it. What you've
mentioned here is helpful and I'd be interested if you could share a bit
more of your experience in the area.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] The plan for FDW-based sharding

On 27 February 2016 at 15:29, Konstantin Knizhnik  wrote:

> Two reasons:
> 1. There is no ideal implementation of DTM which will fit all possible
> needs and be  efficient for all clusters.
> 2. Even if such implementation exists, still the right way of it
> integration is Postgres should use kind of TM API.
> 
>

I've got to say that this is somewhat reminicient of the discussions around
in-core pooling, where argument 1 is applied to justify excluding pooling
from core/contrib.

I don't have a strong position on whether a DTM should be in core or not as
I haven't done enough work in the area. I do think it's interesting to
strongly require that a DTM be in core while we also reject things like
pooling that are needed by a large proportion of users.

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] The plan for FDW-based sharding

On 27 February 2016 at 11:54, Robert Haas  wrote:

> I could submit a patch adding
> hooks to core to enable all of the things (or even just some of the
> things) that EnterpriseDB has changed in Advanced Server, and that
> patch would be rejected so fast it would make your head spin, because
> of course the core project doesn't want to be burdened with
> maintaining a whole bunch of hooks for the convenience of
> EnterpriseDB.

I can imagine that many such hooks would have little use beyond PPAS, but
I'm somewhat curious as to if any would have wider applications. It's not
unusual for me to be working on something and think "gee, I wish there was
a hook here".

-- 
 Craig Ringer   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas

On Fri, Mar 4, 2016 at 8:27 PM, Joshua D. Drake  wrote:
> This does not sound like Bruce at all. Bruce is a lot of things, stubborn,
> sometimes temperamental, a lot of times like you... a hot head but he does
> not take credit for other people's work in my experience.

On the whole, Bruce is a much nicer guy than I am.  But I can't see
eye to eye with him on this.  I admit I may be being unfair to him,
but I'm telling it like I see it.  Like I do.

> Even if there was, so what? IF EDB wants to have a secret plan to push a lot
> of cool features to .Org, who cares? In the end, it all has to go through
> peer review and the meritocracy anyway.

I would just like to say that if I or my employer ever get accused of
having a nefarious plan, and somehow I get to pick *which* nefarious
plan I or my employer is to be accused of having, "a secret plan to
push a lot of cool features to .Org" sounds like a good one for me to
pick, especially since, yeah, we have that plan.  We plan to (try to)
push a lot of cool features to .Org.  We - or at least I - do not plan
to do it in a way that is anything but respectful to the community
process.  Specifically, and in no particular order, we plan to
continue contributing performance and scalability enhancements,
improvements to parallel query, and FDW-related improvements, just as
we have for 9.6.  We may also try to contribute other stuff that we
think will be cool and benefit PostgreSQL.  Suggestions are welcome.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Joshua D. Drake


On 03/04/2016 04:41 PM, Robert Haas wrote:

As far as I understand it,
Bruce came in near the end of that conversation and now wants to claim
credit for something that doesn't really exist yet and, to the extent
that it does exist, wasn't even his idea.


Robert,

This does not sound like Bruce at all. Bruce is a lot of things, 
stubborn, sometimes temperamental, a lot of times like you... a hot head 
but he does not take credit for other people's work in my experience.



get reasonable plans, something that currently isn't true.  I haven't
heard anybody objecting to that, and I don't expect to hear anybody
objecting to that, because it's hard to imagine why you wouldn't want
queries against foreign data wrappers to produce better plans than
they do today.  At worst, you might think it doesn't matter either
way, but actually, I think there are a substantial number of people
who are pretty happy about join pushdown and I expect that when and if
we get aggregate pushdown working there will be even more people who
are happy about that.


Agreed.


That's exactly what the people at EnterpriseDB who are actually doing
work in this area are attempting to do.  Meanwhile, there's also
Bruce, who is neither doing nor planning to do any work in this area,
nor advising either EnterpriseDB or the PostgreSQL community to
undertake any particular project, but who *is* making it sound like
there is a super sekret plan that nobody else gets to see.  However,


I don't see this Robert. I don't see some secret hidden plan. I don't 
see any cabal. I see a guy that has an idea, just like everyone else on 
this list.



as the guy who actually wrote the plan that EnterpriseDB is following,
I happen to know that there's nothing more to it than what I wrote
above.


Even if there was, so what? IF EDB wants to have a secret plan to push a 
lot of cool features to .Org, who cares? In the end, it all has to go 
through peer review and the meritocracy anyway.


Sincerely,

JD




--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas

On Tue, Mar 1, 2016 at 12:07 PM, Konstantin Knizhnik
 wrote:
> In the article them used anotion "wait":
>
> if T.SnapshotTime>GetClockTime()
> then wait until T.SnapshotTime
> Originally we really do sleep here, but then we think that instead of
> sleeping we can just adjust local time.
> Sorry, I do not have format prove it is equivalent but... at least we have
> not encountered any inconsistencies after this fix and performance is
> improved.

I think that those things are probably not equivalent.  They would be
if you could cause the adjustment to advance in lock-step on every
node at the same time, but you probably can't.  And I think it is
extremely unwise to assume that the fact that nothing obviously broke
means that you got it right.  This is the sort of work where formal
proofs of correctness are, IMHO, extremely wise.

> I fear that building a DTM that is fully reliable and also
> well-performing is going to be really hard, and I think it would be
> far better to have one such DTM that is 100% reliable than two or more
> implementations each of which are 99% reliable.
>
> The question is not about it's reliability, but mostly about its
> functionality and flexibility.

Well, *my* concern is about reliability.  A lot of code can be made
faster at the price of less reliability, but that usually doesn't work
out well in the end.  Performance matters too, of course, but the way
to get there is to start with a good algorithm, write reliable code to
implement it, and then optimize.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-04 Thread Robert Haas

On Wed, Mar 2, 2016 at 1:53 PM, Josh berkus  wrote:
> One of the things which causes bad reactions and arguments, Bruce, is that a
> lot of your posts and presentations detailing plans for the FDW approach
> carry the subtext that all four of the other approaches are dead ends and
> not worth considering.  Given that the other approaches, whatever their
> limitations, have working code in the field and the FDW approach does not,
> that's more than a little offensive.

Yeah, I agree with that.  I am utterly mystified by why Bruce keeps
beating this drum, and am frankly pretty annoyed about it.  In the
first place, he seems to think that he invented the idea of using FDWs
for sharding in PostgreSQL, but I don't think that's true.  I think it
was partly my idea, and partly something that the NTT folks have been
working on for years (cf, e.g.,
cb1ca4d800621dcae67ca6c799006de99fa4f0a5).  As far as I understand it,
Bruce came in near the end of that conversation and now wants to claim
credit for something that doesn't really exist yet and, to the extent
that it does exist, wasn't even his idea.  In the second place, the
only thing that these repeated emails and development meeting
discussions of the topic actually accomplish is to be piss people off.
I do believe that enhancing the foreign data wrapper interface can be
part of a horizontal scalability story for PostgreSQL, but as long as
nobody is objecting to the individual enhancements, which I don't see
anybody doing, then why the heck do we have to keep arguing about this
big picture story?  It doesn't matter at all, and it doesn't even
really exist, yet somehow Bruce keeps bringing it up, which I think
serves no useful purpose whatsoever.

> If we want to move forwards on serious work on FDW-based sharding, the folks
> working on it should stop treating it as a "fait accompli" that this is the
> Chosen Way for the PostgreSQL project.  Otherwise, you'll spend all of your
> time arguing that point instead of working on features that matter.

The only person treating it that way is Bruce.

> In contrast, this FDW plan *still* feels very much like a small group made
> up of employees of only two companies came up with it in private and decided
> that it should be the plan for the whole project.  I know that Bruce and
> others have good reasons for starting the FDW project, but there hasn't been
> much of an attempt to obtain community consensus around it. If Bruce and
> others want contributors to work on FDWs instead of other sharding
> approaches, then they need to win over those people as to why they should do
> that.  It's how this community works.

There hasn't been much of an attempt to obtain community consensus
about it because there isn't actually some grand plan, private or
otherwise, much as Bruce's emails might make you think otherwise.
EnterpriseDB *does* have a plan to try to continue enhancing foreign
data wrappers so that you can run queries against foreign tables and
get reasonable plans, something that currently isn't true.  I haven't
heard anybody objecting to that, and I don't expect to hear anybody
objecting to that, because it's hard to imagine why you wouldn't want
queries against foreign data wrappers to produce better plans than
they do today.  At worst, you might think it doesn't matter either
way, but actually, I think there are a substantial number of people
who are pretty happy about join pushdown and I expect that when and if
we get aggregate pushdown working there will be even more people who
are happy about that.

The only other ongoing work that EnterpriseDB has that at all touches
on this area is Ashutosh Bapat's work on 2PC for FDWs.  I'm not
convinced that's fully baked, and it conflicts with the XTM stuff the
Postgres Pro guys are doing, which I *also* don't think is fully
baked, so I'm not real keen on pressing forward aggressively with
either approach right now.  I think we (eventually) need a solution to
the problem of consistent cross-node consistency, but I am deeply
unconvinced that anything currently on the table is going to get us
there.  I did recommend the 2PC for FDW project, but I'm not amazingly
happy with how it came out, and I think we need to think harder about
other approaches before adopting something.

> Alternately, you can just work on the individual FDW features, which
> *everyone* thinks are a good idea, and when most of them are done, FDW-based
> scaleout will be such an obvious solution that nobody will argue with it.

That's exactly what the people at EnterpriseDB who are actually doing
work in this area are attempting to do.  Meanwhile, there's also
Bruce, who is neither doing nor planning to do any work in this area,
nor advising either EnterpriseDB or the PostgreSQL community to
undertake any particular project, but who *is* making it sound like
there is a super sekret plan that nobody else gets to see.  However,
as the guy who actually wrote the plan that EnterpriseDB is

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov

On Mar 3, 2016 4:47 AM, "Michael Paquier"  wrote:
>
> On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
>  wrote:
> > If FDWs existed then Postgres XC/XL were being developed then I believe
they
> > would try to build full-featured prototype of FDW based sharding. If
this
> > prototype succeed then we could make a full roadmap.
>
> Speaking here with my XC hat, that's actually the case. A couple of
> years back when I worked on it, there were discussions about reusing
> FDW routines for the purpose of XC, which would have been roughly
> reusing postgres_fdw + the possibility to send XID, snapshot and
> transaction timestamp to the remote nodes after getting that from the
> GTM (global transaction manager ensuring global data visibility and
> consistency), and have the logic for query pushdown in the FDW itself
> when planning query on what would have been roughly foreign tables
> (not entering in the details here, those would have not been entirely
> foreign tables). At this point the global picture was not completely
> set, XC being based on 9.1~9.2 and the FDW base routines were not as
> extended as they are now. As history has told, this global picture has
> never showed up, though it would should XC have been merged with 9.3.
> The point is that XC would have moved as using the FDW approach, as a
> set of plugins.
>
> This was a reason behind this email of 2013 on -hackers actually:
>
http://www.postgresql.org/message-id/cab7npqtdjf-58wuf-xz01nkj7wf0e+eukggqhd0igvsod4h...@mail.gmail.com

Good to remember!

> Michael
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Tatsuo Ishii

> On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
>  wrote:
>> If FDWs existed then Postgres XC/XL were being developed then I believe they
>> would try to build full-featured prototype of FDW based sharding. If this
>> prototype succeed then we could make a full roadmap.
> 
> Speaking here with my XC hat, that's actually the case. A couple of
> years back when I worked on it, there were discussions about reusing
> FDW routines for the purpose of XC, which would have been roughly
> reusing postgres_fdw + the possibility to send XID, snapshot and
> transaction timestamp to the remote nodes after getting that from the
> GTM (global transaction manager ensuring global data visibility and
> consistency), and have the logic for query pushdown in the FDW itself
> when planning query on what would have been roughly foreign tables
> (not entering in the details here, those would have not been entirely
> foreign tables). At this point the global picture was not completely
> set, XC being based on 9.1~9.2 and the FDW base routines were not as
> extended as they are now. As history has told, this global picture has
> never showed up, though it would should XC have been merged with 9.3.
> The point is that XC would have moved as using the FDW approach, as a
> set of plugins.
> 
> This was a reason behind this email of 2013 on -hackers actually:
> http://www.postgresql.org/message-id/cab7npqtdjf-58wuf-xz01nkj7wf0e+eukggqhd0igvsod4h...@mail.gmail.com
> 
> There were as well discussions about making the connection pooler a
> background worker and plug in that in a shared memory context that all
> backends connecting to this XC-like-postgres_fdw would use, though
> this is another story, for another time...

Thanks for the history. Very interesting...

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Michael Paquier

On Wed, Mar 2, 2016 at 6:54 PM, Alexander Korotkov
 wrote:
> If FDWs existed then Postgres XC/XL were being developed then I believe they
> would try to build full-featured prototype of FDW based sharding. If this
> prototype succeed then we could make a full roadmap.

Speaking here with my XC hat, that's actually the case. A couple of
years back when I worked on it, there were discussions about reusing
FDW routines for the purpose of XC, which would have been roughly
reusing postgres_fdw + the possibility to send XID, snapshot and
transaction timestamp to the remote nodes after getting that from the
GTM (global transaction manager ensuring global data visibility and
consistency), and have the logic for query pushdown in the FDW itself
when planning query on what would have been roughly foreign tables
(not entering in the details here, those would have not been entirely
foreign tables). At this point the global picture was not completely
set, XC being based on 9.1~9.2 and the FDW base routines were not as
extended as they are now. As history has told, this global picture has
never showed up, though it would should XC have been merged with 9.3.
The point is that XC would have moved as using the FDW approach, as a
set of plugins.

This was a reason behind this email of 2013 on -hackers actually:
http://www.postgresql.org/message-id/cab7npqtdjf-58wuf-xz01nkj7wf0e+eukggqhd0igvsod4h...@mail.gmail.com

There were as well discussions about making the connection pooler a
background worker and plug in that in a shared memory context that all
backends connecting to this XC-like-postgres_fdw would use, though
this is another story, for another time...
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov

On Wed, Mar 2, 2016 at 9:53 PM, Josh berkus  wrote:

> On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:
>
>> Sorry, but based on this plan it is possible to make a conclusion that
>> there are only two possible cluster solutions for Postgres:
>> XC/XL and FDW-based.  From my point of view there are  much more
>> possible alternatives.
>>
>
> Definitely.
>
> Currently we have five approaches to sharding inside postgres in the
> field, in chronological order:
>
> 1. Greenplum's executor-based approach with motion nodes
>
> 2. Skype's function-based approach (PL/proxy)
>
> 3. XC/XL's approach, which I believe is also query executor-based
>
> 4. CitusDB's pg_shard which is based on query hooks
>
> 5. FDW-based (currently theoretical)
>
> One of the things which causes bad reactions and arguments, Bruce, is that
> a lot of your posts and presentations detailing plans for the FDW approach
> carry the subtext that all four of the other approaches are dead ends and
> not worth considering.  Given that the other approaches, whatever their
> limitations, have working code in the field and the FDW approach does not,
> that's more than a little offensive.
>
> If we want to move forwards on serious work on FDW-based sharding, the
> folks working on it should stop treating it as a "fait accompli" that this
> is the Chosen Way for the PostgreSQL project.  Otherwise, you'll spend all
> of your time arguing that point instead of working on features that matter.
>
> Bruce made a long comparison with built-in replication, but there's a big
> difference here.  We decided that WAL-based replication was the way to go
> for built-in as a community decision here on -hackers and at various
> conferences.  Both the plan and the implementation for replication
> transcended company backing, involving even active competitors, and
> involved discussions with maintainers of the older replication projects.
>
> In contrast, this FDW plan *still* feels very much like a small group made
> up of employees of only two companies came up with it in private and
> decided that it should be the plan for the whole project.  I know that
> Bruce and others have good reasons for starting the FDW project, but there
> hasn't been much of an attempt to obtain community consensus around it. If
> Bruce and others want contributors to work on FDWs instead of other
> sharding approaches, then they need to win over those people as to why they
> should do that.  It's how this community works.
>
> Alternately, you can just work on the individual FDW features, which
> *everyone* thinks are a good idea, and when most of them are done,
> FDW-based scaleout will be such an obvious solution that nobody will argue
> with it.


+1

Thank you, Josh. I think this is excellent summary for conversation about
FDW-based sharding.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Josh berkus


On 02/24/2016 01:22 AM, Konstantin Knizhnik wrote:

Sorry, but based on this plan it is possible to make a conclusion that
there are only two possible cluster solutions for Postgres:
XC/XL and FDW-based.  From my point of view there are  much more
possible alternatives.


Definitely.

Currently we have five approaches to sharding inside postgres in the 
field, in chronological order:


1. Greenplum's executor-based approach with motion nodes

2. Skype's function-based approach (PL/proxy)

3. XC/XL's approach, which I believe is also query executor-based

4. CitusDB's pg_shard which is based on query hooks

5. FDW-based (currently theoretical)

One of the things which causes bad reactions and arguments, Bruce, is 
that a lot of your posts and presentations detailing plans for the FDW 
approach carry the subtext that all four of the other approaches are 
dead ends and not worth considering.  Given that the other approaches, 
whatever their limitations, have working code in the field and the FDW 
approach does not, that's more than a little offensive.


If we want to move forwards on serious work on FDW-based sharding, the 
folks working on it should stop treating it as a "fait accompli" that 
this is the Chosen Way for the PostgreSQL project.  Otherwise, you'll 
spend all of your time arguing that point instead of working on features 
that matter.


Bruce made a long comparison with built-in replication, but there's a 
big difference here.  We decided that WAL-based replication was the way 
to go for built-in as a community decision here on -hackers and at 
various conferences.  Both the plan and the implementation for 
replication transcended company backing, involving even active 
competitors, and involved discussions with maintainers of the older 
replication projects.


In contrast, this FDW plan *still* feels very much like a small group 
made up of employees of only two companies came up with it in private 
and decided that it should be the plan for the whole project.  I know 
that Bruce and others have good reasons for starting the FDW project, 
but there hasn't been much of an attempt to obtain community consensus 
around it. If Bruce and others want contributors to work on FDWs instead 
of other sharding approaches, then they need to win over those people as 
to why they should do that.  It's how this community works.


Alternately, you can just work on the individual FDW features, which 
*everyone* thinks are a good idea, and when most of them are done, 
FDW-based scaleout will be such an obvious solution that nobody will 
argue with it.


--
--
Josh Berkus
Red Hat OSAS
(any opinions are my own)


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Konstantin Knizhnik




On 01.03.2016 22:02, Bruce Momjian wrote:

On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:

Note that I am not saying that other discussed approaches are any
better, I am saying that we should know approximately what we
actually want and not just beat FDWs with a hammer and hope sharding
will eventually emerge and call that the plan.

I will say it again --- FDWs are the only sharding method I can think of
that has a chance of being accepted into Postgres core.  It is a plan,
and if it fails, it fails.  If is succeeds, that's good.  What more do
you want me to say?  I know of no other way to answer the questions you
asked above.


I do not understand why it can fail.
FDW approach may be not flexible enough for building optimal distributed 
query execution plans for complex OLAP queries.
But for simple queries it should work fine. Simple queries corresponds  
OLTP and simple OLAP.
For OLTP we definitely need transaction manager to provide global 
consistency.
And we have actually prototype of integration postgres_fdw with out 
pg_dtm and pg_tsdtm transaction managers.

The results are quite IMHO promising (see attached diagram).

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



DTM-pgconf.pdf
Description: Adobe PDF document

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov

On Tue, Mar 1, 2016 at 10:11 PM, Bruce Momjian  wrote:

> On Tue, Mar  1, 2016 at 02:02:44PM -0500, Bruce wrote:
> > On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > > Note that I am not saying that other discussed approaches are any
> > > better, I am saying that we should know approximately what we
> > > actually want and not just beat FDWs with a hammer and hope sharding
> > > will eventually emerge and call that the plan.
> >
> > I will say it again --- FDWs are the only sharding method I can think of
> > that has a chance of being accepted into Postgres core.  It is a plan,
> > and if it fails, it fails.  If is succeeds, that's good.  What more do
> > you want me to say?  I know of no other way to answer the questions you
> > asked above.
>
> I guess all I can say is that if FDWs existed when Postgres XC/XL were
> being developed, that they likely would have been used or at least
> considered.  I think we are basically making that attempt now.

If FDWs existed then Postgres XC/XL were being developed then I believe
they would try to build full-featured prototype of FDW based sharding. If
this prototype succeed then we could make a full roadmap.
For now, we don't have a full roadmap, we have only some pieces. This is
why people doubt. When you're speaking about advances that are natural to
FDW, then no problem, nobody is against FDW advances. However, other things
are unclear.
You can try to build full-featured prototype to convince people. Despite it
would take some resources it will save more resources because it would save
us from errors.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Alexander Korotkov

On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas  wrote:

> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:
> > On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> >> > Two reasons:
> >> > 1. There is no ideal implementation of DTM which will fit all
> possible needs
> >> > and be  efficient for all clusters.
> >>
> >> Hmm, what is the reasoning behind that statement?  I mean, it is
> >> certainly true that there are some places where we have decided that
> >> one-size-fits-all is not the right approach.  Indexing, for example.
> >
> > Uh, is that even true of indexing?  While the plug-in nature of indexing
> > allows for easier development and testing, does anyone create plug-in
> > indexing that isn't shipped by us?  I thought WAL support was something
> > that prevented external indexing solutions from working.
>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.

It's because we didn't offer legal mechanism for pluggable AMs.

> We end up revising the index AM API pretty
> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.

I can't buy this argument. One may say this about any single API. Thinking
so will lead you to rejecting any extendability. And that would be in
direct contradiction to original postgres concept.
During last 5 years we add 2 new AMs: SP-GiST and BRIN. And BRIN is very
different from any other AM we have before.
And I wouldn't say that AM API have dramatical changes during that time.
There were some changes. But it would be normal work for extension
maintainers to adopt these changes like they do for other API changes.

There is simple example where we lack of extensible AMs: fast full-text
search. We can't provide it with current GIN, because we lack of positional
information in it. And we can't push these advances into core because
current implementation has not perfect design. Ideal design would be push
all required functionality into btree, then make GIN wrapper over btree
then add required functionality. But this is roadmap for 5-10 years. These
5-10 years uses will suffer from having 3-rd party solutions for fast FTS
instead of in core one. But our design questions is actually not something
that users care about. It's not reliability questions. And having pluggable
AMs would be really chance in this situation. Users could use extension
right now. And then when after many years we finally implement the right
design, they could migrate to in-core solution. But 5-10 years of fast FTS
does matter.

> I suspect that
> a transaction manager API would end up similarly situated.
>

I disagree with you about AM API. But I agree that TM API should be in
similar situation to AM API.

--
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov

On Wed, Mar 2, 2016 at 4:36 AM, Tomas Vondra 
wrote:

Hi,
>
> On 03/01/2016 08:02 PM, Bruce Momjian wrote:
>
>> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
>>
>>> Note that I am not saying that other discussed approaches are any
>>> better, I am saying that we should know approximately what we
>>> actually want and not just beat FDWs with a hammer and hope sharding
>>> will eventually emerge and call that the plan.
>>>
>>
>> I will say it again --- FDWs are the only sharding method I can think
>> of that has a chance of being accepted into Postgres core.
>>
>
>
>
> While I disagree with Simon on various things, I absolutely understand why
> he was asking about a prototype, and some sort of analysis of what usecases
> we expect to support initially/later/never, and what pieces are missing to
> get the sharding working. IIRC at the FOSDEM Dev Meeting you've claimed
> you're essentially working on a prototype - once we have the missing FDW
> pieces, we'll know if it works. I disagree that - it's not a prototype if
> it takes several years to find the outcome.
>
>
fully agree. Probably, we all need to help to build prototype in
between-releases period. I see no legal way to resolve the situation.


>
> --
> Tomas Vondra  http://www.2ndQuadrant.com
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: [HACKERS] The plan for FDW-based sharding

2016-03-02 Thread Oleg Bartunov

On Tue, Mar 1, 2016 at 7:03 PM, Robert Haas  wrote:

> On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:
> > On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> >> > Two reasons:
> >> > 1. There is no ideal implementation of DTM which will fit all
> possible needs
> >> > and be  efficient for all clusters.
> >>
> >> Hmm, what is the reasoning behind that statement?  I mean, it is
> >> certainly true that there are some places where we have decided that
> >> one-size-fits-all is not the right approach.  Indexing, for example.
> >
> > Uh, is that even true of indexing?  While the plug-in nature of indexing
> > allows for easier development and testing, does anyone create plug-in
> > indexing that isn't shipped by us?  I thought WAL support was something
> > that prevented external indexing solutions from working.
>
> True.  There is an API, though, and having pluggable WAL support seems
> desirable too.  At the same time, I don't think we know of anyone
> maintaining a non-core index AM ... and there are probably good
> reasons for that.  We end up revising the index AM API pretty
>

We'd love to develop new special index AM, that's why we all are for
pluggable WAL. I think there are will be other AM developers, once we open
the door for that.

> regularly every time somebody wants to do something new, so it's not
> really a stable API that extensions can just tap into.  I suspect that
> a transaction manager API would end up similarly situated.
>

I don't expect many other TM developers, so there is no problem with
improving API. We started from practical needs and analyses of many
academical papers. We spent a year to play with several prototypes to prove
our proposed API (expect more in several months). Everybody could download
them a test. Wish we can do that with FDW-based sharding solution.

Of course, we can fork postgres as XC/XL people did and certainly
eventually will do, if community don't accept our proposal, since it's very
difficult to work on cross-releases projects. But then there are will be no
winners, so why do we all are aggressively don't understand each other ! I
was watching  XC/XL for years and thought I don't want to go this way of
isolation from the community, so we decided to let TM pluggable to stay
with community and let everybody prove their concepts. if you have ideas
how to improve TM API, we are open, if you know it's broken by design,
let's help us to fix it.  I have my understanding about FDW, but I
deliberately don't participate in some very hot discussion, just because I
feel myself not commited to work on. Your group is very enthusiastic on
FDW, it's ok until you improve FDW in general way, I'm very happy on
current work.  I prefer you show prototype of sharding solution, which
convince us in functionality and perfromance. I agree with Thomas Vondra,
that we don't want to wait for years to see the result, we want to expect
results, based on prototype, which should be done between releases. If you
don't have enough resources for this, let's do together with community.
Nobody as I've seen are against FDW sharding, people complained about "the
only sharding solution" in postgres, without proving so.

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Tomas Vondra


Hi,

On 03/01/2016 08:02 PM, Bruce Momjian wrote:

On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:

Note that I am not saying that other discussed approaches are any
better, I am saying that we should know approximately what we
actually want and not just beat FDWs with a hammer and hope sharding
will eventually emerge and call that the plan.


I will say it again --- FDWs are the only sharding method I can think
of that has a chance of being accepted into Postgres core.


I don't quite see why that would be the case. Firstly, it assumes that 
FDW-based approach is going to work, but given the lack of prototype or 
even a technical analysis discussing the missing pieces, that's very 
difficult to judge.


I find it a bit annoying that there are objections from people who 
implemented (or attempted to implement) sharding on PostgreSQL, yet no 
reasonable analysis of their arguments and how the FDW approach will 
address them. My my understanding is they deem FDWs a bad foundation for 
sharding because it was designed for a different purpose, but the 
abstractions are a bad fit for sharding (which assumes isolated nodes, 
certain form of execution etc.)



It is a plan, and if it fails, it fails. If is succeeds, that's
good. What more do you want me to say? I know of no other way to
answer the questions you asked above.


Well, wouldn't it be great if we could do the decision based on some 
facts and not mere belief that it'll help. That's exactly what Petr is 
talking about - the fear that we'll spend a few years working on 
sharding based on FDWs, only to find out that it does not work too well. 
That'd be a pretty bad outcome, wouldn't it?


My other worry is that we'll eventually mess the FDW infrastructure, 
making it harder to use for the original purpose. Granted, most of the 
improvements proposed so far look sane and useful for FDWs in general, 
but sooner or later that ceases to be the case - there sill be changes 
needed merely for the sharding. Those will be tough decisions.


While I disagree with Simon on various things, I absolutely understand 
why he was asking about a prototype, and some sort of analysis of what 
usecases we expect to support initially/later/never, and what pieces are 
missing to get the sharding working. IIRC at the FOSDEM Dev Meeting 
you've claimed you're essentially working on a prototype - once we have 
the missing FDW pieces, we'll know if it works. I disagree that - it's 
not a prototype if it takes several years to find the outcome.


Also, in another branch of this thread you've said this (I don't want to 
sprinkle the thread with responses, so I'll just respond here):



In a way, I don't see any need for an FDW sharding prototype
because, as I said, we already know XC/XL work, so copying what they
do doesn't help. What we need to know is if we can get near the XC/XL
 benchmarks with an acceptable addition of code, which is what I
thought I already said. Perhaps this can be done with FDWs, or some
other approach I have not heard of yet.


I don't quite understand the reasoning presented here. The XC/XL are not 
based on FDWs at all, therefore the need for prototype of the FDW-based 
sharding is entirely independent to the fact that these solutions seem 
to work quite well.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Konstantin Knizhnik

On 03/01/2016 09:19 PM, Petr Jelinek wrote:

Since this thread heavily discusses the XTM, I have question about the XTM as proposed because one thing is very unclear to me - what happens when user changes the XTM plugin on the server? I didn't see any xid handover API which makes me wonder if
changes of a plugin (or for example failure to load previously used plugin due to admin error) will send server to similar situation as xid wraparound.

Transaction manager is very "intimate" part of DBMS and certainly bugs and
problems in custom TM implementation can break the server.
So if you are providing custom TM implementation, you should take full
responsibility on system integrity.
XTM API itself doesn't enforce any XID handling policy. As far as we do not
want to change tuple header format, XID is still 32-bit integer.

In case of pg_dtm, global transactions at all nodes are assigned the same XID
by arbiter. Arbiter is handling XID wraparound.
In pg_tsdtm each node maintains its own XIDs, actually pg_tsdtm doesn't change way of assigning CIDs by Postgres. So wraparound in this case is handled in standard way. Instead of assigning own global XIDs, pg_tsdtm provides mapping between local XIDs and
global CSNs. Visibility checking rules looks on CSNs, not on XIDs.

In both cases if system is for some reasons restarted and DTM plugin failed to
be loaded, you can still access database locally. No data can be lost.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Bruce Momjian

On Tue, Mar  1, 2016 at 02:02:44PM -0500, Bruce wrote:
> On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> > Note that I am not saying that other discussed approaches are any
> > better, I am saying that we should know approximately what we
> > actually want and not just beat FDWs with a hammer and hope sharding
> > will eventually emerge and call that the plan.
> 
> I will say it again --- FDWs are the only sharding method I can think of
> that has a chance of being accepted into Postgres core.  It is a plan,
> and if it fails, it fails.  If is succeeds, that's good.  What more do
> you want me to say?  I know of no other way to answer the questions you
> asked above.

I guess all I can say is that if FDWs existed when Postgres XC/XL were
being developed, that they likely would have been used or at least
considered.  I think we are basically making that attempt now.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Bruce Momjian

On Tue, Mar  1, 2016 at 07:56:58PM +0100, Petr Jelinek wrote:
> Note that I am not saying that other discussed approaches are any
> better, I am saying that we should know approximately what we
> actually want and not just beat FDWs with a hammer and hope sharding
> will eventually emerge and call that the plan.

I will say it again --- FDWs are the only sharding method I can think of
that has a chance of being accepted into Postgres core.  It is a plan,
and if it fails, it fails.  If is succeeds, that's good.  What more do
you want me to say?  I know of no other way to answer the questions you
asked above.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Petr Jelinek


On 27/02/16 04:54, Robert Haas wrote:

On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
 wrote:

We do not have formal prove that proposed XTM is "general enough" to handle
all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot based and
CSN  based.


I don't believe that for a minute.  For example, consider this article:

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here shouldn't necessarily be to replace
PostgreSQL's current transaction management with a distributed version
of the same thing.  We might want to do that, but I think the goal is
or should be to provide ACID semantics in a multi-node environment,
and specifically the I in ACID: transaction isolation.  Making the
existing transaction manager into something that can be spread across
multiple nodes is one way of accomplishing that.  Maybe the best one.
Certainly one that's been experimented within Postgres-XC.  But it is
often the case that an algorithm that works tolerably well on a single
machine starts performing extremely badly in a distributed
environment, because the latency of communicating between multiple
systems is vastly higher than the latency of communicating between
CPUs or cores on the same system.  So I don't think we should be
assuming that's the way forward.



I have similar problem with the FDW approach though. It seems to me like 
because we have something that solves access to external tables somebody 
decided that it should be used as base for the whole sharding solution 
but there is no real concept of how it will look like together, no ideas 
what it will be usable for and not even simple prototype that would 
prove that the idea is sound (although again, I am not clear on what the 
actual idea is beyond "we will use FDWs").


Don't get me wrong, I agree that the current FDW enhancements are 
useful, I am just worried about them being presented as future of 
sharding in Postgres when nobody has sketched how the future might look 
like. And  once we get to more interesting parts like consistency, 
distributed query planning, p2p connections (and I am really concerned 
about these as FDWs abstract some knowledge that coordinator and or data 
nodes might need to do these well), etc we might very well find 
ourselves painted in the corner and have to start from beginning, while 
if we had some idea on how the whole thing might look like we could 
identify this early and not postpone built-in sharding by several years 
just because somebody said we will use FDWs and that's what we worked on 
in those years.


Note that I am not saying that other discussed approaches are any 
better, I am saying that we should know approximately what we actually 
want and not just beat FDWs with a hammer and hope sharding will 
eventually emerge and call that the plan.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Petr Jelinek


On 01/03/16 18:18, Konstantin Knizhnik wrote:


On 01.03.2016 19:03, Robert Haas wrote:

On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:

On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:

Two reasons:
1. There is no ideal implementation of DTM which will fit all
possible needs
and be  efficient for all clusters.

Hmm, what is the reasoning behind that statement?  I mean, it is
certainly true that there are some places where we have decided that
one-size-fits-all is not the right approach.  Indexing, for example.

Uh, is that even true of indexing?  While the plug-in nature of indexing
allows for easier development and testing, does anyone create plug-in
indexing that isn't shipped by us?  I thought WAL support was something
that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.



IMHO non-stable API is better than lack of API.
Just because it makes it possible to implement features in modular way.
And refactoring of API is not so difficult thing...



Since this thread heavily discusses the XTM, I have question about the 
XTM as proposed because one thing is very unclear to me - what happens 
when user changes the XTM plugin on the server? I didn't see any xid 
handover API which makes me wonder if changes of a plugin (or for 
example failure to load previously used plugin due to admin error) will 
send server to similar situation as xid wraparound.


--
  Petr Jelinek  http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Konstantin Knizhnik




On 01.03.2016 19:03, Robert Haas wrote:

On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:

On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:

Two reasons:
1. There is no ideal implementation of DTM which will fit all possible needs
and be  efficient for all clusters.

Hmm, what is the reasoning behind that statement?  I mean, it is
certainly true that there are some places where we have decided that
one-size-fits-all is not the right approach.  Indexing, for example.

Uh, is that even true of indexing?  While the plug-in nature of indexing
allows for easier development and testing, does anyone create plug-in
indexing that isn't shipped by us?  I thought WAL support was something
that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.



IMHO non-stable API is better than lack of API.
Just because it makes it possible to implement features in modular way.
And refactoring of API is not so difficult thing...


--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Konstantin Knizhnik


Thank you very much for you comments.

On 01.03.2016 18:19, Robert Haas wrote:

On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
 wrote:

How do you prevent clock skew from causing serialization anomalies?

If node receives message from "feature" it just needs to wait until this
future arrive.
Practically we just "adjust" system time in this case, moving it forward
(certainly system time is not actually changed, we just set correction value
which need to be added to system time).
This approach was discussed in the article:
http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
I hope, in this article algorithm is explained much better than I can do
here.

Hmm, the approach in that article is very interesting, but it sounds
different than what you are describing - they do not, AFAICT, have
anything like a "correction value"


In the article them used anotion "wait":

if T.SnapshotTime>GetClockTime()
then wait until T.SnapshotTime

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Robert Haas

On Tue, Mar 1, 2016 at 10:37 AM, Bruce Momjian  wrote:
> On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
>> > Two reasons:
>> > 1. There is no ideal implementation of DTM which will fit all possible 
>> > needs
>> > and be  efficient for all clusters.
>>
>> Hmm, what is the reasoning behind that statement?  I mean, it is
>> certainly true that there are some places where we have decided that
>> one-size-fits-all is not the right approach.  Indexing, for example.
>
> Uh, is that even true of indexing?  While the plug-in nature of indexing
> allows for easier development and testing, does anyone create plug-in
> indexing that isn't shipped by us?  I thought WAL support was something
> that prevented external indexing solutions from working.

True.  There is an API, though, and having pluggable WAL support seems
desirable too.  At the same time, I don't think we know of anyone
maintaining a non-core index AM ... and there are probably good
reasons for that.  We end up revising the index AM API pretty
regularly every time somebody wants to do something new, so it's not
really a stable API that extensions can just tap into.  I suspect that
a transaction manager API would end up similarly situated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Bruce Momjian

On Tue, Mar  1, 2016 at 10:19:45AM -0500, Robert Haas wrote:
> > Two reasons:
> > 1. There is no ideal implementation of DTM which will fit all possible needs
> > and be  efficient for all clusters.
> 
> Hmm, what is the reasoning behind that statement?  I mean, it is
> certainly true that there are some places where we have decided that
> one-size-fits-all is not the right approach.  Indexing, for example.

Uh, is that even true of indexing?  While the plug-in nature of indexing
allows for easier development and testing, does anyone create plug-in
indexing that isn't shipped by us?  I thought WAL support was something
that prevented external indexing solutions from working.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-03-01 Thread Robert Haas

On Sat, Feb 27, 2016 at 2:29 AM, Konstantin Knizhnik
 wrote:
>> How do you prevent clock skew from causing serialization anomalies?
>
> If node receives message from "feature" it just needs to wait until this
> future arrive.
> Practically we just "adjust" system time in this case, moving it forward
> (certainly system time is not actually changed, we just set correction value
> which need to be added to system time).
> This approach was discussed in the article:
> http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
> I hope, in this article algorithm is explained much better than I can do
> here.

Hmm, the approach in that article is very interesting, but it sounds
different than what you are describing - they do not, AFAICT, have
anything like a "correction value".

> There are well know limitation of this  pg_tsdtm which we will try to
> address in future.

How well known are those limitations?  Are they documented somewhere?
Or are they only well-known to you?

> What we want is to include XTM API in PostgreSQL to be able to continue our
> experiments with different transaction managers and implementing multimaster
> on top of it (our first practical goal) without affecting PostgreSQL core.
>
> If XTM patch will be included in 9.6, then we can propose our multimaster as
> PostgreSQL extension and everybody can use it.
> Otherwise we have to propose our own fork of Postgres which significantly
> complicates using and maintaining it.

Well I still think what I said before is valid.  If the code is good,
let it be a core submission.  If it's not ready yet, submit it to core
when it is.  If it can't be made good, forget it.

>> This seems rather defeatist.  If the code is good and reliable, why
>> should it not be committed to core?
>
> Two reasons:
> 1. There is no ideal implementation of DTM which will fit all possible needs
> and be  efficient for all clusters.

Hmm, what is the reasoning behind that statement?  I mean, it is
certainly true that there are some places where we have decided that
one-size-fits-all is not the right approach.  Indexing, for example.
But there are many other places where we have not chosen to make
things pluggable, and that I don't think it should be taken for
granted that plugability is always an advantage.

I fear that building a DTM that is fully reliable and also
well-performing is going to be really hard, and I think it would be
far better to have one such DTM that is 100% reliable than two or more
implementations each of which are 99% reliable.

> 2. Even if such implementation exists, still the right way of it integration
> is Postgres should use kind of TM API.

Sure, APIs are generally good, but that doesn't mean *this* API is good.

> I hope that everybody will agree that doing it in this way:
>
> #ifdef PGXC
> /* In Postgres-XC, stop timestamp has to follow the timeline of GTM
> */
> xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
> #else
> xlrec.xact_time = xactStopTimestamp;
> #endif

PGXC chose that style in order to simplify merging.  I wouldn't have
picked the same thing, but I don't know why it deserves scorn.

> or in this way:
>
> xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp
> : xactStopTimestamp;
>
> is very very bad idea.

I don't know why that is such a bad idea.  It's a heck of a lot faster
than insisting on calling some out-of-line function.  It might be a
bad idea, but I think we need to decide that, not assume it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-28 Thread Simon Riggs

On 27 February 2016 at 22:38, Kevin Grittner  wrote:

> That could be part of a solution.  What I sketched out with the
> "apparent order of execution" ordering of the transactions
> (basically, commit order except when one SERIALIZABLE transaction
> needs to be dragged in front of another due to a read-write
> dependency) is possibly the simplest approach, but batching may
> well give better performance.
>
> > Collecting a list of transactions that must be applied before the current
> > one could be accumulated during SSI processing and added to the commit
> > record. But reordering the transaction apply is something we'd need to
> get
> > some real clear theory on before we considered it.
>
> Oh, there is a lot of very clear theory on it.  I even considered
> whether it might work at the physical level, but that seems fraught
> with potential land-mines due to the subtle ways in which we manage
> race conditions at the detail level.  It's one of those things that
> seems theoretically possible, but probably a really bad idea in
> practice.  For logical replication, though, there is a clear way to
> determine a reasonable order of applying changes that will never
> yield a serialization anomaly -- if we do that, we dodge the choice
> between using a "stale" safe snapshot or waiting an indeterminate
> length of time for a "fresh" safe snapshot -- at the cost of
> delaying logical replication itself at various points.
>

I think we're going to have practical difficulties with these concepts.

If an xid commits with inConflicts, those refer to transactions that may
not yet have assigned xids. They may be assigned xids for hours or days
even so its hard to know whether they will eventually become write
transactions or not, making it a challenge to even know whether we should
delay. And if even if we did know, delaying apply of commits for hours to
allow us to reorder transactions isn't practical in all cases, clearly,
more so if the impact is caused by one minor table that nobody much cares
about.

What I see as more practical is reducing the scope of "safe transactions"
down to "safe scopes", where particular tables or sets of tables are known
safe at particular times, so we know more about which things we can look at
safely.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] The plan for FDW-based sharding

2016-02-28 Thread Konstantin Knizhnik


On 02/27/2016 11:38 PM, Kevin Grittner wrote:


Is this an implementation of some particular formal technique?  If
so, do you have a reference to a paper on it?  I get the sense that
there has been a lot written about distributed transactions, and
that it would be a mistake to ignore it, but I have not (yet)
reviewed the literature for it.


The reference to the article is at our WiKi pages explaining our DTM: 
https://wiki.postgresql.org/wiki/DTM

http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Kevin Grittner

On Sat, Feb 27, 2016 at 3:57 PM, Simon Riggs  wrote:
> On 27 February 2016 at 17:54, Kevin Grittner  wrote:
>>
>> On a single database SSI can see whether a read has
>> caused such a problem.  If you replicate the transactions to
>> somewhere else and read them SSI cannot tell whether there is an
>> anomaly
>
> OK, I thought you were saying something else. What you're saying is that SSI
> doesn't work on replicas, yet, whether that is physical or logical.

Right.

> Row level locking (S2PL) can be used on logical standbys, so its actually a
> better situation.

Except that S2PL has the concurrency and performance problems that
caused us to rip out a working S2PL implementation in PostgreSQL
core.  Layering it on outside of that isn't going to offer better
concurrency or perform better than what we ripped out; but it does
work.

>> One possibility is to pass along information
>> about when things are in a state on the source that is known to be
>> free of anomalies if read; another would be to reorder the
>> application of transactions to match the apparent order of
>> execution.  The latter would not work for "physical" replication,
>> but should be fine for logical replication.  An implementation
>> might create a list in commit order, but not release the front of
>> the list for processing if it is a SERIALIZABLE transaction which
>> has written data until all overlapping SERIALIZABLE transactions
>> complete, so it can move any subsequently-committed SERIALIZABLE
>> transaction which read the "old" version of the data ahead of it.
>
> The best way would be to pass across "anomaly barriers", since they can
> easily be inserted into the WAL stream. The main issue seems to be how and
> when to detect them.

That, and how to choose whether to run right away with the last
known consistent snapshot, or wait for the next one.  There seem to
be use cases for both.  None of it seems extraordinarily hard; it's
just never been anyone's top priority.  :-/

> For logical replay, applying in batches is actually a good thing since it
> allows parallelism. We can remove them all from the target's procarray all
> at once to avoid intermediate states becoming visible. So that would be the
> preferred mechanism.

That could be part of a solution.  What I sketched out with the
"apparent order of execution" ordering of the transactions
(basically, commit order except when one SERIALIZABLE transaction
needs to be dragged in front of another due to a read-write
dependency) is possibly the simplest approach, but batching may
well give better performance.

> Collecting a list of transactions that must be applied before the current
> one could be accumulated during SSI processing and added to the commit
> record. But reordering the transaction apply is something we'd need to get
> some real clear theory on before we considered it.

Oh, there is a lot of very clear theory on it.  I even considered
whether it might work at the physical level, but that seems fraught
with potential land-mines due to the subtle ways in which we manage
race conditions at the detail level.  It's one of those things that
seems theoretically possible, but probably a really bad idea in
practice.  For logical replication, though, there is a clear way to
determine a reasonable order of applying changes that will never
yield a serialization anomaly -- if we do that, we dodge the choice
between using a "stale" safe snapshot or waiting an indeterminate
length of time for a "fresh" safe snapshot -- at the cost of
delaying logical replication itself at various points.

Anyway, we seem to be on the same page; just some minor
miscommunication at some point.  I apologize if I was unclear.

Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Simon Riggs

On 27 February 2016 at 17:54, Kevin Grittner  wrote:

> On a single database SSI can see whether a read has
> caused such a problem.  If you replicate the transactions to
> somewhere else and read them SSI cannot tell whether there is an
> anomaly

OK, I thought you were saying something else. What you're saying is that
SSI doesn't work on replicas, yet, whether that is physical or logical.

Row level locking (S2PL) can be used on logical standbys, so its actually a
better situation.

(at least, not without exchanging a lot of information that
> isn't currently happening), so some other mechanism would probably
> need to be used.  One possibility is to pass along information
> about when things are in a state on the source that is known to be
> free of anomalies if read; another would be to reorder the
> application of transactions to match the apparent order of
> execution.  The latter would not work for "physical" replication,
> but should be fine for logical replication.  An implementation
> might create a list in commit order, but not release the front of
> the list for processing if it is a SERIALIZABLE transaction which
> has written data until all overlapping SERIALIZABLE transactions
> complete, so it can move any subsequently-committed SERIALIZABLE
> transaction which read the "old" version of the data ahead of it.
>

The best way would be to pass across "anomaly barriers", since they can
easily be inserted into the WAL stream. The main issue seems to be how and
when to detect them.

For logical replay, applying in batches is actually a good thing since it
allows parallelism. We can remove them all from the target's procarray all
at once to avoid intermediate states becoming visible. So that would be the
preferred mechanism.

Collecting a list of transactions that must be applied before the current
one could be accumulated during SSI processing and added to the commit
record. But reordering the transaction apply is something we'd need to get
some real clear theory on before we considered it.

Anyway, next release.

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Kevin Grittner

On Sat, Feb 27, 2016 at 1:14 PM, Konstantin Knizhnik
 wrote:

> We do not try to preserve transaction commit order at all nodes.
> But in principle it can be implemented using XTM API: it allows to redefine
> function which actually sets transaction status.  pg_dtm performs 2PC here.
> And in principle it is possible to enforce commits in any particular order.

That's encouraging.

> Concerning CSNs, may be you are right and it is not correct to use this
> notion in this case. Actually there are many "CSNs" involved in transaction
> commit.

Perhaps we should distinguish "commit sequence number" from "apply
sequence number"?  I really think we need to differentiate the
order to be applied from the order previously committed in order to
avoid long-term confusion.  Calling both "CSN" is going to cause
not only miscommunication but muddled thinking, IMO.

> First of all each transaction is assigned local CSN (timestamp) when it is
> ready to commit. Then CSNs of all nodes are exchanged and maximal CSN is
> chosen.
> This maximum is writen as final transaction CSN and is used in visibility
> check.

Is this an implementation of some particular formal technique?  If
so, do you have a reference to a paper on it?  I get the sense that
there has been a lot written about distributed transactions, and
that it would be a mistake to ignore it, but I have not (yet)
reviewed the literature for it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Konstantin Knizhnik


Neither pg_dtm, neither pg_tsdtm supports serializable isolation level.
We implemented distributed snapshot isolation - repeatable-read isolation level.
We also do not support read-committed isolation level now.

We do not try to preserve transaction commit order at all nodes.
But in principle it can be implemented using XTM API: it allows to redefine 
function which actually sets transaction status.  pg_dtm performs 2PC here.
And in principle it is possible to enforce commits in any particular order.

Concerning CSNs, may be you are right and it is not correct to use this notion in this 
case. Actually there are many "CSNs" involved in transaction commit.
First of all each transaction is assigned local CSN (timestamp) when it is 
ready to commit. Then CSNs of all nodes are exchanged and maximal CSN is chosen.
This maximum is writen as final transaction CSN and is used in visibility check.


On 02/27/2016 01:48 AM, Kevin Grittner wrote:

On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
 wrote:


pg_tsdtm  is based on another approach: it is using system time
as CSN

Which brings up an interesting point, if we want logical
replication to be free of serialization anomalies for those using
serializable transactions, we need to support applying transactions
in an order which may not be the same as commit order -- CSN (as
such) would be the wrong thing.  If serializable transaction 1 (T1)
modifies a row and concurrent serializable transaction 2 (T2) reads
the old version of the row, and modifies something based on that,
T2 must be applied to a logical replica first even if T1 commits
before it; otherwise the logical replica could see a state not
consistent with business rules and which could not have been seen
(due to SSI) on the source database.  Any DTM API which does not
support some mechanism to rearrange the order of transactions from
commit order to some other order (based on, for example, read-write
dependencies) is not complete.  If it does support that, it gives
us a way forward for presenting consistent data on logical
replicas.

To avoid confusion, it might be best to reserve CSN for actual
commit sequence numbers, or at least values which increase
monotonically with each commit.  The term of art for what I
described above is "apparent order of execution", so maybe we want
to use AOE or AOoE for the order we choose to use in a particular
implementation.  It doesn't seem to me to be outright inaccurate
for cases where the system time on the various systems is used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Kevin Grittner

On Fri, Feb 26, 2016 at 5:37 PM, Simon Riggs  wrote:
> On 26 February 2016 at 22:48, Kevin Grittner  wrote:

>> if we want logical
>> replication to be free of serialization anomalies for those using
>> serializable transactions, we need to support applying transactions
>> in an order which may not be the same as commit order -- CSN (as
>> such) would be the wrong thing.  If serializable transaction 1 (T1)
>> modifies a row and concurrent serializable transaction 2 (T2) reads
>> the old version of the row, and modifies something based on that,
>> T2 must be applied to a logical replica first even if T1 commits
>> before it; otherwise the logical replica could see a state not
>> consistent with business rules and which could not have been seen
>> (due to SSI) on the source database.
>
> How would SSI allow that commit order?
>
> Surely there is a read-write dependency that would cause T2 to be
> aborted?

*A* read-write dependency does not cause an abort under SSI, it
takes a *pattern* of read-write dependencies which has been proven
to appear in any set of concurrent transactions which can cause a
serialization anomaly.  A read-only transaction can be part of that
pattern.  On a single database SSI can see whether a read has
caused such a problem.  If you replicate the transactions to
somewhere else and read them SSI cannot tell whether there is an
anomaly (at least, not without exchanging a lot of information that
isn't currently happening), so some other mechanism would probably
need to be used.  One possibility is to pass along information
about when things are in a state on the source that is known to be
free of anomalies if read; another would be to reorder the
application of transactions to match the apparent order of
execution.  The latter would not work for "physical" replication,
but should be fine for logical replication.  An implementation
might create a list in commit order, but not release the front of
the list for processing if it is a SERIALIZABLE transaction which
has written data until all overlapping SERIALIZABLE transactions
complete, so it can move any subsequently-committed SERIALIZABLE
transaction which read the "old" version of the data ahead of it.

>> Any DTM API which does not
>> support some mechanism to rearrange the order of transactions from
>> commit order to some other order (based on, for example, read-write
>> dependencies) is not complete.  If it does support that, it gives
>> us a way forward for presenting consistent data on logical
>> replicas.
>
> You appear to be saying that SSI allows transactions to commit in a
> non-serializable order.

Absolutely not.  If you want to understand this better, this paper
might be helpful:

http://vldb.org/pvldb/vol5/p1850_danrkports_vldb2012.pdf

> Do you have a test case?

There are a couple in this section of the Wiki page of examples:

https://wiki.postgresql.org/wiki/SSI#Read_Only_Transactions

Just picture the read-only transaction executing on a replica.

Thinking of commit sequence number as the right order to apply
transactions during replication seems to me to be a holdover from
the techniques initially developed for transaction in the 1960s --
specifically, strict two-phase locking (S2PL) is very easy to get
one's head around and when using it the apparent order of execution
always *does* match commit order.  Unfortunately S2PL performs so
poorly that it was ripped out of PostgreSQL years ago.  In general,
I think it is time we gave up on thinking that is based on it.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Álvaro Hernández Tortosa




On 27/02/16 09:19, Konstantin Knizhnik wrote:

On 02/27/2016 06:54 AM, Robert Haas wrote:


[...]



So maybe the goal for the GTM isn't to provide true serializability

across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?

Absolutely agree. There are some theoretical discussion regarding CAP 
and different distributed level of isolation.
But at practice people want to solve their tasks. Most of PostgeSQL 
used are using default isolation level: read committed although there 
are alot of "wonderful" anomalies with it.
Serialazable transaction in Oracle are actually violating fundamental 
serializability rule and still Oracle is one of ther most popular 
database in the world...
The was isolation bug in Postgres-XL which doesn't prevent from using 
it by commercial customers...


I think this might be a dangerous line of thought. While I agree 
PostgreSQL should definitely look at the market and answer questions 
that (current and prospective) users may ask, and be more practical than 
idealist, easily ditching isolation guarantees might not be a good thing.


 That Oracle is the leader with their isolation problems or that 
most people run PostgreSQL under read committed is not a good argument 
to cut the corner and just go to bare minimum (if any) isolation 
guarantees. First, because PostgreSQL has always been trusted and 
understood as a system with *strong* guarantees (whatever that means). . 
Second, because what we may perceive as OK from the market, might change 
soon. From my observations, while I agree with you most people "don't 
care" or, worse, "don't realize", is rapidly changing. More and more 
people are becoming aware of the problems of distributed systems and the 
significant consequences they may have on them.


A lot of them have been illustrated in the famous Jepsen posts. As 
an example, and a good one given that you have mentioned Galera before, 
is this one: https://aphyr.com/posts/327-jepsen-mariadb-galera-cluster 
which demonstrates how Galera fails to provide Snapshot Isolation, even 
on healthy state --despite they claim that.


As of today, I would expect any distributed system to clearly state 
its guarantees in the documentation. And them adhere to them, like for 
instance proving it with tests such as Jepsen.




So I do not say that discussing all this theoretical questions is not 
need as formally proven correctness of distributed algorithm.


I would like to see work forward here, so I really appreciate all 
your work here. I cannot give an opinion on whether the DTM API is good 
or not, but I agree with Robert a good technical discussion on these 
issues is a good, and a needed, starting point. Feedback may also help 
you avoid pitfalls that may have gone unnoticed until tons of code are 
implemented.


Academical approaches are sometimes "very academical", but studying 
them doesn't hurt either :)



Álvaro


--
Álvaro Hernández Tortosa


---
8Kdata



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-27 Thread Konstantin Knizhnik


On 02/27/2016 06:54 AM, Robert Haas wrote:

On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
 wrote:

We do not have formal prove that proposed XTM is "general enough" to handle
all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot based and
CSN  based.

I don't believe that for a minute.  For example, consider this article:


Well, I have to agree that saying that there are just two ways of providing 
distributed isolation I was not right.
There is at least one more method: conservative locking. But it will cause huge 
number of extra network messages which has to be exchanged.
Also I mostly considered solutions compatible with PostgreSQL MVCC model.

And definitely their are other approaches. Like preserving transaction commit 
order (as it is done in Galera).
Some other them can be implemented with XTM (preserving commit order), some - 
not (2PL).
I have already noticed that XTM is not allowing to implement ANY transaction 
manager.
But we have considered several approaches to distributed transaction management 
explained in the article related with really working systems.
Some of them are real production system as SAP HANA, some are just prototypes, 
but working prototypes for which authors have performed
some benchmarking and comparison with other approaches. The references you have 
mentioned are mostly theoretical description of the problem.
Nice to know it but it is hard to build some concrete implementation based on 
this articles.


Briefly answering other your questions:


For example, consider a table with a million rows spread across any number of 
servers.


It is sharding scenario, pg_tsdtm will work well in this case does not 
requiring sending a lot of extra messages.


Now consider another workload where each transaction reads a row one

one server, reads a row on another server,

It can be solved both with pg_dtm (central arbiter) and pg_tsdtm (no arbiter),
But actually you scenarios just once again proves that there can not be just 
one ideal distributed TM.


So maybe the goal for the GTM isn't to provide true serializability

across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?

Absolutely agree. There are some theoretical discussion regarding CAP and 
different distributed level of isolation.
But at practice people want to solve their tasks. Most of PostgeSQL used are using 
default isolation level: read committed although there are alot of "wonderful" 
anomalies with it.
Serialazable transaction in Oracle are actually violating fundamental 
serializability rule and still Oracle is one of ther most popular database in 
the world...
The was isolation bug in Postgres-XL which doesn't prevent from using it by 
commercial customers...

So I do not say that discussing all this theoretical questions is not need as 
formally proven correctness of distributed algorithm.
But I do not understand hot why it should prevent from providing extensible TM 
API.
Yes, we can tot do everything with it. But still we can implement many 
different approaches.
I think that it somehow proves that it is "general enough".






 






https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik

On 02/27/2016 06:57 AM, Robert Haas wrote:

On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
wrote:

pg_tsdtm is based on another approach: it is using system time as CSN and
doesn't require arbiter. In theory there is no limit for scalability. But
differences in system time and necessity to use more rounds of communication
have negative impact on performance.

How do you prevent clock skew from causing serialization anomalies?

If node receives message from "feature" it just needs to wait until this future
arrive.
Practically we just "adjust" system time in this case, moving it forward
(certainly system time is not actually changed, we just set correction value which need
to be added to system time).
This approach was discussed in the article:
http://research.microsoft.com/en-us/people/samehe/clocksi.srds2013.pdf
I hope, in this article algorithm is explained much better than I can do here.

Few notes:
1. I can not prove that our pg_tsdtm absolutely correctly implements approach
described in this article.
2. I didn't try to formally prove that our implementation can not cause some
serialization anomalies.
3. We just run various synchronization tests (including simplest debit-credit
test which breaks old version of Postgtes-XL) during several days and we didn't
get any inconsistencies.
4. We have tested pg_tsdtm both at single node, blade cluster and geographically distributed nodes (distance more than thousand kilometers: one server was in Vladivostok, another in Kaliningrad). Ping between these two servers takes about 100msec.
Performance of our benchmark drops about 100 times but there was no inconsistencies.

Also I once again want to notice that primary idea of the proposed patch was
not pg_tsdtm.
There are well know limitation of this pg_tsdtm which we will try to address
in future.
What we want is to include XTM API in PostgreSQL to be able to continue our
experiments with different transaction managers and implementing multimaster on
top of it (our first practical goal) without affecting PostgreSQL core.

If XTM patch will be included in 9.6, then we can propose our multimaster as
PostgreSQL extension and everybody can use it.
Otherwise we have to propose our own fork of Postgres which significantly
complicates using and maintaining it.

So there is no ideal solution which can work well for all cluster. This is
why it is not possible to develop just one GTM, propose it as a patch for
review and then (hopefully) commit it in Postgres core. IMHO it will never
happen. And I do not think that it is actually needed. What we need is a way
to be able to create own transaction managers as Postgres extension not
affecting its core.

This seems rather defeatist. If the code is good and reliable, why
should it not be committed to core?

Two reasons:
1. There is no ideal implementation of DTM which will fit all possible needs
and be efficient for all clusters.
2. Even if such implementation exists, still the right way of it integration is
Postgres should use kind of TM API.
I hope that everybody will agree that doing it in this way:

#ifdef PGXC
/* In Postgres-XC, stop timestamp has to follow the timeline of GTM */
xlrec.xact_time = xactStopTimestamp + GTMdeltaTimestamp;
#else
xlrec.xact_time = xactStopTimestamp;
#endif

or in this way:

xlrec.xact_time = xactUseGTM ? xactStopTimestamp + GTMdeltaTimestamp :
xactStopTimestamp;

is very very bad idea.
In OO programming we should have abstract TM interface and several
implementations of this interface, for example
MVCC_TM, 2PL_TM, Distributed_TM...
This is actually what can be done with our XTM API.
As far as Postgres is implemented in C, not in C++ we have to emulate
interfaces using structures with function pointers.
And please notice that there is completely no need to include DTM
implementation in core, as far as it is not needed for everybody.
It can be easily distributed as extension.

I have that quite soon we can propose multimaster extension which should provides functionality similar with MySQL Gallera. But even right now we have integrated pg_dtm and pg_tsdtm with pg_shard and postgres_fdw, allowing to provide distributed
consistency for them.

All arguments against XTM can be applied to any other extension API in
Postgres, for example FDW.
Is it general enough? There are many useful operations which currently are
not handled by this API. For example performing aggregation and grouping at
foreign server side. But still it is very useful and flexible mechanism,
allowing to implement many wonderful things.

That is true. And everybody is entitled to an opinion on each new
proposed hook, as to whether that hook is general or not. We have
both accepted and rejected proposed hooks in the past.

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list

Re: [HACKERS] The plan for FDW-based sharding

On Sat, Feb 27, 2016 at 1:49 AM, Konstantin Knizhnik
 wrote:
> pg_tsdtm  is based on another approach: it is using system time as CSN and
> doesn't require arbiter. In theory there is no limit for scalability. But
> differences in system time and necessity to use more rounds of communication
> have negative impact on performance.

How do you prevent clock skew from causing serialization anomalies?

> So there is no ideal solution which can work well for all cluster. This is
> why it is not possible to develop just one GTM, propose it as a patch for
> review and then (hopefully) commit it in Postgres core. IMHO it will never
> happen. And I do not think that it is actually needed. What we need is a way
> to be able to create own transaction managers as Postgres extension not
> affecting its  core.

This seems rather defeatist.  If the code is good and reliable, why
should it not be committed to core?

> All arguments against XTM can be applied to any other extension API in
> Postgres, for example FDW.
> Is it general enough? There are many useful operations which currently are
> not handled by this API. For example performing aggregation and grouping at
> foreign server side.  But still it is very useful and flexible mechanism,
> allowing to implement many wonderful things.

That is true.  And everybody is entitled to an opinion on each new
proposed hook, as to whether that hook is general or not.  We have
both accepted and rejected proposed hooks in the past.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

On Fri, Feb 26, 2016 at 10:56 PM, Konstantin Knizhnik
 wrote:
> We do not have formal prove that proposed XTM is "general enough" to handle
> all possible transaction manager implementations.
> But there are two general ways of dealing with isolation: snapshot based and
> CSN  based.

I don't believe that for a minute.  For example, consider this article:

https://en.wikipedia.org/wiki/Global_serializability

I think the neutrality of that article is *very* debatable, but it
certainly contradicts the idea that snapshots and CSNs are the only
methods of achieving global serializability.

Or consider this lecture:

http://hssl.cs.jhu.edu/~randal/416/lectures.old/ln5.2.pdf

That's a great introduction to the problem we're trying to solve here,
but again, snapshots are not mentioned, and CSNs certainly aren't
mentioned.

This write-up goes further, explaining three different methods for
ensuring global serializability, none of which mention snapshots or
CSNs:

http://heaven.eee.metu.edu.tr/~vision/LectureNotes/EE442/Ee442ch7.html

Actually, I think the second approach is basically a snapshot/CSN-type
approach, but it doesn't use that terminology and the connection to
what you are proposing is very unclear.

I think you're approaching this problem from a viewpoint that is
entirely too focused on the code that exists in PostgreSQL today.
Lots of people have done lots of academic research on how to solve
this problem, and you can't possibly say that CSNs and snapshots are
the only solution to this problem unless you haven't read any of those
papers.  The articles above aren't exceptional in mentioning neither
of the approaches that you are advocating - they are typical of the
literature in this area.  How can it be that the only solutions to
this problem are ones that are totally different from the approaches
that university professors who spend time doing research on
concurrency have spent time exploring?

I think we need to back up here and examine our underlying design
assumptions.  The goal here shouldn't necessarily be to replace
PostgreSQL's current transaction management with a distributed version
of the same thing.  We might want to do that, but I think the goal is
or should be to provide ACID semantics in a multi-node environment,
and specifically the I in ACID: transaction isolation.  Making the
existing transaction manager into something that can be spread across
multiple nodes is one way of accomplishing that.  Maybe the best one.
Certainly one that's been experimented within Postgres-XC.  But it is
often the case that an algorithm that works tolerably well on a single
machine starts performing extremely badly in a distributed
environment, because the latency of communicating between multiple
systems is vastly higher than the latency of communicating between
CPUs or cores on the same system.  So I don't think we should be
assuming that's the way forward.

For example, consider a table with a million rows spread across any
number of servers.  Consider also a series of update transactions each
of which reads exactly one row and then writes that row.  If we adopt
any solution that involves a central coordinator to arbitrate commit
ordering, this is going to require at least one and probably two
million network round trips, one per transaction to get a snapshot and
a second to commit.  But all of this is completely unnecessary.
Because each transaction touches only a single node, a perfect global
transaction manager doesn't really need to do anything at all in this
case.  The existing PostreSQL mechanisms - snapshot isolation, and SSI
if you have it turned on - will provide just as much transaction
isolation on this workload as they would on a workload that only
touched a single node.  If we design a GTM that does two million
network round trips in this scenario, we have just wasted two million
network round trips.

Now consider another workload where each transaction reads a row one
one server, reads a row on another server, and then updates the second
row.  Here, the GTM has a job to do.  If T1 reads R1, reads R2, writes
R2; and T2 concurrently reads R2, reads R1, and then writes R1, it
could happen that both transactions see the pre-update values of the
row they read first and yet both transactions go on to commit.  That's
not equivalent to any serial history, so transaction isolation is
broken.  A GTM which aims to provide true cluster-wide serializability
must do something to keep that from happening.  If all of this were
happening on a single node, those transactions would succeed if run at
READ COMMITTED but SSI would roll one of them back at SERIALIZABLE.
So maybe the goal for the GTM isn't to provide true serializability
across the cluster but some lesser degree of transaction isolation.
But then exactly which serialization anomalies are we trying to
prevent, and why is it OK to prevent those and not others?

I have seen zero discussion of any of this.  What I think we

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Simon Riggs

On 26 February 2016 at 22:48, Kevin Grittner  wrote:

> On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
>  wrote:
>
> > pg_tsdtm  is based on another approach: it is using system time
> > as CSN
>
> Which brings up an interesting point, if we want logical
> replication to be free of serialization anomalies for those using
> serializable transactions, we need to support applying transactions
> in an order which may not be the same as commit order -- CSN (as
> such) would be the wrong thing.  If serializable transaction 1 (T1)
> modifies a row and concurrent serializable transaction 2 (T2) reads
> the old version of the row, and modifies something based on that,
> T2 must be applied to a logical replica first even if T1 commits
> before it; otherwise the logical replica could see a state not
> consistent with business rules and which could not have been seen
> (due to SSI) on the source database.


How would SSI allow that commit order?

Surely there is a read-write dependency that would cause T2 to be aborted?


> Any DTM API which does not
> support some mechanism to rearrange the order of transactions from
> commit order to some other order (based on, for example, read-write
> dependencies) is not complete.  If it does support that, it gives
> us a way forward for presenting consistent data on logical
> replicas.
>

You appear to be saying that SSI allows transactions to commit in a
non-serializable order.

Do you have a test case?

-- 
Simon Riggshttp://www.2ndQuadrant.com/

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Kevin Grittner

On Fri, Feb 26, 2016 at 2:19 PM, Konstantin Knizhnik
 wrote:

> pg_tsdtm  is based on another approach: it is using system time
> as CSN

Which brings up an interesting point, if we want logical
replication to be free of serialization anomalies for those using
serializable transactions, we need to support applying transactions
in an order which may not be the same as commit order -- CSN (as
such) would be the wrong thing.  If serializable transaction 1 (T1)
modifies a row and concurrent serializable transaction 2 (T2) reads
the old version of the row, and modifies something based on that,
T2 must be applied to a logical replica first even if T1 commits
before it; otherwise the logical replica could see a state not
consistent with business rules and which could not have been seen
(due to SSI) on the source database.  Any DTM API which does not
support some mechanism to rearrange the order of transactions from
commit order to some other order (based on, for example, read-write
dependencies) is not complete.  If it does support that, it gives
us a way forward for presenting consistent data on logical
replicas.

To avoid confusion, it might be best to reserve CSN for actual
commit sequence numbers, or at least values which increase
monotonically with each commit.  The term of art for what I
described above is "apparent order of execution", so maybe we want
to use AOE or AOoE for the order we choose to use in a particular
implementation.  It doesn't seem to me to be outright inaccurate
for cases where the system time on the various systems is used.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik

On 02/26/2016 09:30 PM, Alvaro Herrera wrote:

Konstantin Knizhnik wrote:

Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
But it cause big problems both for developers, which have to permanently
synchronize their branch with master,
and, what is more important, for customers, which can not use standard
version of PostgreSQL.
It may cause problems with system certification, with running Postgres in
cloud,...
Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
wrong direction.

That's not the point, though. I don't think a Postgres clone with a GTM
solves any particular problem that's not already solved by the existing
forks. However, if you have a clone at home and you make a GTM work on
it, then you take the GTM as a patch and post it for discussion.
There's no need for hooks for that. Just make sure your GTM solves the
problem that it is supposed to solve.

Excuse me if I've missed the discussion elsewhere -- why does
PostgresPro have *two* GTMs instead of a single one?

There are many different clusters which require different approaches for
managing distributed transactions.
Some clusters do no need distributed transactions at all: if you are executing
OLAP queries on read-only database GTM will just add extra overhead.

pg_dtm uses centralized arbiter. It is similar with Postgres-XL DTM. Presence of single arbiter signficantly simplify all distributed algorithms: failure detection, global deadlock elimination, ... But at the same time arbiter is SPOF and main factor
limiting cluster scalability.

pg_tsdtm is based on another approach: it is using system time as CSN and doesn't require arbiter. In theory there is no limit for scalability. But differences in system time and necessity to use more rounds of communication have negative impact on
performance.

So there is no ideal solution which can work well for all cluster. This is why it is not possible to develop just one GTM, propose it as a patch for review and then (hopefully) commit it in Postgres core. IMHO it will never happen. And I do not think that
it is actually needed. What we need is a way to be able to create own transaction managers as Postgres extension not affecting its core.

All arguments against XTM can be applied to any other extension API in
Postgres, for example FDW.
Is it general enough? There are many useful operations which currently are not handled by this API. For example performing aggregation and grouping at foreign server side. But still it is very useful and flexible mechanism, allowing to implement many
wonderful things.

From my point of view good system should be as open and customizable as
possible, if it doesn't affect performance.
Replacing direct function calls with indirect function calls in almost all
cases can not suffer performance as well as adding hooks.
So without any extra price we get better flexibility. What's wrong with it?

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Bruce Momjian

On Fri, Feb 26, 2016 at 03:30:29PM -0300, Alvaro Herrera wrote:
> That's not the point, though.  I don't think a Postgres clone with a GTM
> solves any particular problem that's not already solved by the existing
> forks.  However, if you have a clone at home and you make a GTM work on
> it, then you take the GTM as a patch and post it for discussion.
> There's no need for hooks for that.  Just make sure your GTM solves the
> problem that it is supposed to solve.
> 
> Excuse me if I've missed the discussion elsewhere -- why does
> PostgresPro have *two* GTMs instead of a single one?

I think the issue is that a GTM that works for a low-latency network
doesn't work well for a high-latency network, so the high-latency GTM
has fewer features and guarantees.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Alvaro Herrera

Konstantin Knizhnik wrote:

> Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
> But it cause big problems both for developers, which have to permanently
> synchronize their branch with master,
> and, what is more important, for customers, which can not use standard
> version of PostgreSQL.
> It may cause problems with system certification, with running Postgres in
> cloud,...
> Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it is
> wrong direction.

That's not the point, though.  I don't think a Postgres clone with a GTM
solves any particular problem that's not already solved by the existing
forks.  However, if you have a clone at home and you make a GTM work on
it, then you take the GTM as a patch and post it for discussion.
There's no need for hooks for that.  Just make sure your GTM solves the
problem that it is supposed to solve.

Excuse me if I've missed the discussion elsewhere -- why does
PostgresPro have *two* GTMs instead of a single one?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Konstantin Knizhnik

We do not have formal prove that proposed XTM is "general enough" to 
handle all possible transaction manager implementations.
But there are two general ways of dealing with isolation: snapshot based 
and CSN  based.

pg_dtm and pg_tsdtm prove that both of them can be implemented using XTM.
If you know some approach to distributed transaction manager 
implementation, please let us know.

Otherwise your statement "is not general enough" is not concrete enough.
Postgres-XL GTM can be in principle implemented as extension based on XTM.

This API is based on existed PostgreSQL TM functions: we do not 
introduce some new abstractions.
Is it possible that some other TM function has to be encapsulated? Yes, 
it is.
But I do not see much problems with adding this function to XTM in 
future if it is actually needed.
It happens with most APIs. It is awful when API functions are changed, 
breaking application based on this API.
But as far as functions encapsulated in XTM are in any case present in 
PostgreSQL core, I do not think
that them will be changed in future unless there are some plans to 
completely rewrite Postgres transaction manager...


Yes, it is certainly possible to develop cluster by cloning PostgreSQL.
But it cause big problems both for developers, which have to permanently 
synchronize their branch with master,
and, what is more important, for customers, which can not use standard 
version of PostgreSQL.
It may cause problems with system certification, with running Postgres 
in cloud,...
Actually the history of Postgres-XL/XC and Greenplum IMHO shows that it 
is wrong direction.




On 26.02.2016 19:06, Robert Haas wrote:

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:

Right now tm is hardcoded and it's doesn't matter  "if other people might
need" at all.  We at least provide developers ("other people")  ability to
work on their implementations and the patch  is safe and doesn't sacrifices
anything in core.

I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.


And what makes us think we
really need multiple transaction managers, anyway?

If you brave enough to say that one tm-fits-all and you are able to teach
existed tm to play well  in various clustering environment during
development period, which is short, than probably we don't need  multiple
tms. But It's too perfect to believe and practical solution is to let
multiple groups to work on their solutions.

Nobody's preventing multiple groups for working on their solutions.
That's not the question.  The question is why we should install hooks
in core at this early stage without waiting to see which
implementations prove to be best and whether those hooks are actually
general enough to cater to everything people want to do.  There is
talk of integrating XC/XL work into PostgreSQL; it has a GTM.
Postgres Pro has several GTMs.  Maybe there will be others.

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.  We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.


dtms.  It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there no "other
people", who has different ideas on postgres future.  That's why we propose
this patch, let's play the game !

I don't like to play games with the architecture of PostgreSQL.



--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

On Fri, Feb 26, 2016 at 10:00 PM, Joshua D. Drake  
wrote:
> Robert, this is all a game. It is a game of who wins the intellectual prize
> to whatever problem. Who gets the market or mind share and who gets to
> pretend they win the Oscar for coolest design.

JD, I don't have a horse in this race.  I am not developing a GTM and
I would be quite happy never to have to develop a GTM.  That doesn't
mean I think we should add these proposed hooks.  I think that's just
freezing the way that potential GTMs have to interact with the rest of
the system before we actually have a solution that the community is
willing to endorse.  I don't know what problem that solves.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Joshua D. Drake


On 02/26/2016 08:06 AM, Robert Haas wrote:

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:

Right now tm is hardcoded and it's doesn't matter  "if other people might
need" at all.  We at least provide developers ("other people")  ability to
work on their implementations and the patch  is safe and doesn't sacrifices
anything in core.


I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.


Correct.

[snip]



Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.


No it didn't. It allowed MySQL people to use the tool that best fit 
their needs.



We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.


The reason people developed a bunch of external replication tools (and 
continue to) is because .Org has shown a unique lack of leadership in 
providing solutions for the problem. Historically speaking .Org was anti 
replication in core. It wasn't about who was going to be best. It was 
who was going to be best for what problem. The inclusion of the 
replication tools we have now speaks very loudly to the that lack of 
leadership.


The moment .Org showed leadership and developed a reasonable solution to 
80% of the problem, a great majority of people moved to hot standby and 
streaming replication. It is easy. It does not answer all the questions 
but it is default, in core and that gives people piece of mind. This is 
also why once PgLogical is up to -core quality and in -core, the great 
majority of people will work to dump Slony/Londiste/Insertproghere and 
use PgLogical.


If .Org was interested in showing leadership in this area, a few hackers 
would get together with a few other hackers from XL and XC (although as 
I understand it XL is further along), have a few heart to heart, mind to 
mind meetings and determine:


* Are either of these two solutions worth it?
Yes? Then let's start working on an integration plan and get it done.
No? Then let's start working on a .Org plan to solve that problem.

But that likely won't happen because NIH.



I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.


dtms.  It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there no "other
people", who has different ideas on postgres future.  That's why we propose
this patch, let's play the game !


I don't like to play games with the architecture of PostgreSQL.



Robert, this is all a game. It is a game of who wins the intellectual 
prize to whatever problem. Who gets the market or mind share and who 
gets to pretend they win the Oscar for coolest design.


Sincerely,

jD

--
Command Prompt, Inc.  http://the.postgres.company/
+1-503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Everyone appreciates your honesty, until you are honest with them.


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

On Fri, Feb 26, 2016 at 7:21 PM, Oleg Bartunov  wrote:
> Right now tm is hardcoded and it's doesn't matter  "if other people might
> need" at all.  We at least provide developers ("other people")  ability to
> work on their implementations and the patch  is safe and doesn't sacrifices
> anything in core.

I don't believe that.  When we install APIs into core, we're
committing to keep those APIs around.  And I think that we're far too
early in the development of transaction managers for PostgreSQL to
think that we know what APIs we want to commit to over the long term.

>> And what makes us think we
>> really need multiple transaction managers, anyway?
>
> If you brave enough to say that one tm-fits-all and you are able to teach
> existed tm to play well  in various clustering environment during
> development period, which is short, than probably we don't need  multiple
> tms. But It's too perfect to believe and practical solution is to let
> multiple groups to work on their solutions.

Nobody's preventing multiple groups for working on their solutions.
That's not the question.  The question is why we should install hooks
in core at this early stage without waiting to see which
implementations prove to be best and whether those hooks are actually
general enough to cater to everything people want to do.  There is
talk of integrating XC/XL work into PostgreSQL; it has a GTM.
Postgres Pro has several GTMs.  Maybe there will be others.

Frankly, I'd like to see a GTM in core at some point because I'd like
everybody who uses PostgreSQL to have access to a GTM.  What I don't
want is for every PostgreSQL company to develop its own GTM and
distribute it separately from everybody else's.  IIUC, MySQL kinda did
that with storage engines and it resulted in the fragmentation of the
community.  We've had the same thing happen with replication tools -
every PostgreSQL company develops their own set.  It would have been
better to have ONE set that was distributed by the core project so
that we didn't all do the same work over again.

I don't understand the argument that without these hooks in core,
people can't continue to work on this.  It isn't hard to work on GTM
without any core changes at all.  You just patch your copy of
PostgreSQL.  We do this all the time, for every patch.  We don't add
hooks for every patch.

> dtms.  It's time to start working on dtm, I believe. The fact you don't
> think about distributed transactions support doesn't mean there no "other
> people", who has different ideas on postgres future.  That's why we propose
> this patch, let's play the game !

I don't like to play games with the architecture of PostgreSQL.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-26 Thread Oleg Bartunov

On Fri, Feb 26, 2016 at 3:50 PM, Robert Haas  wrote:

> On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov 
> wrote:
> > I already several times pointed, that we need XTM to be able to continue
> > development in different directions, since there is no clear winner.
> > Moreover, I think there is no fits-all  solution and while I agree we
> need
> > one built-in in the core, other approaches should have ability to exists
> > without patching.
>
> I don't think I necessarily agree with that.  Transaction management
> is such a fundamental part of the system that I think making it
> pluggable is going to be really hard.  I understand that you've done
> several implementations based on your proposed API, and that's good as
> far as it goes, but how do we know that's really going to be general
> enough for what other people might need?

Right now tm is hardcoded and it's doesn't matter  "if other people might
need" at all.  We at least provide developers ("other people")  ability to
work on their implementations and the patch  is safe and doesn't sacrifices
anything in core.

> And what makes us think we
> really need multiple transaction managers, anyway?

If you brave enough to say that one tm-fits-all and you are able to teach
existed tm to play well  in various clustering environment during
development period, which is short, than probably we don't need  multiple
tms. But It's too perfect to believe and practical solution is to let
multiple groups to work on their solutions.

> Even writing one
> good distributed transaction manager seems like a really hard project
> - why would we want to write two or three or five?
>

again, right now it's simply impossible to any bright person to work on
dtms.  It's time to start working on dtm, I believe. The fact you don't
think about distributed transactions support doesn't mean there no "other
people", who has different ideas on postgres future.  That's why we propose
this patch, let's play the game !

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>

Re: [HACKERS] The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 3:05 PM, Oleg Bartunov  wrote:
> I already several times pointed, that we need XTM to be able to continue
> development in different directions, since there is no clear winner.
> Moreover, I think there is no fits-all  solution and while I agree we need
> one built-in in the core, other approaches should have ability to exists
> without patching.

I don't think I necessarily agree with that.  Transaction management
is such a fundamental part of the system that I think making it
pluggable is going to be really hard.  I understand that you've done
several implementations based on your proposed API, and that's good as
far as it goes, but how do we know that's really going to be general
enough for what other people might need?  And what makes us think we
really need multiple transaction managers, anyway?  Even writing one
good distributed transaction manager seems like a really hard project
- why would we want to write two or three or five?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-25 Thread Bruce Momjian

On Thu, Feb 25, 2016 at 01:53:12PM +0900, Michael Paquier wrote:
> > Well, as far as I know XC doesn't support data redistribution between
> > nodes and I saw good benchmarks of that, as well as XL.
> 
> XC does support that in 1.2 with a very basic approach (coded that
> years ago), though it takes an exclusive lock on the table involved.
> And actually I think what I did in this case really sucked, the effort
> was centralized on the Coordinator to gather and then redistribute the
> tuples, at least tuples that do not need to move were not moved at
> all.

Yes, there is a lot of complexity involved in sending results between
nodes.

> >> Once that is done, we can see what workloads it covers and
> >> decide if we are willing to copy the volume of code necessary
> >> to implement all supported Postgres XC or XL workloads.
> >> (The Postgres XL license now matches the Postgres license,
> >> http://www.postgres-xl.org/2015/07/license-change-and-9-5-merge/.
> >> Postgres XC has always used the Postgres license.)
> 
> Postgres-XC used the GPL license first, and has moved to PostgreSQL
> license exactly to allow Postgres core to reuse it later on if needed.

Ah, yes, I remember that now.  Thanks.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Michael Paquier

On Wed, Feb 24, 2016 at 11:34 PM, Bruce Momjian  wrote:
> On Wed, Feb 24, 2016 at 12:17:28PM +0300, Alexander Korotkov wrote:
>> Hi, Bruce!
>>
>> The important point for me is to distinguish different kind of plans:
>> implementation plan and research plan.
>> If we're talking about implementation plan then it should be proven that
>> proposed approach works in this case. I.e research should be already done.
>> If we're talking about research plan then we should realize that result is
>> unpredictable. And we would probably need to dramatically change our way.
>
> Yes, good point.  I would say FDW-based sharding is certainly still a
> research approach, but an odd one because we are adding code even while
> in research mode.  I think that is possible because the FDW improvements
> have other uses beyond sharding.
>
> I think another aspect is that we already know that modifying the
> Postgres source code can produce a useful sharding solution --- XC, XL,
> Greenplum, and CitusDB all prove that, and pg_shard does it as a plugin.
> So, we know that with unlimited code changes, it is possible.  What we
> don't know is whether it is possible with acceptable code changes, and
> how much of the feature-set can be supported this way.
>
> We had a similar case with the Windows port, where SRA (my employer at
> the time) and Nusphere both had native Windows ports of Postgres, and
> they supplied source code to help with the port.  So, in that case also,
> we knew a native Windows port was possible, and we (or at least I) could
> see the code that was required to do it.  The big question was whether a
> native Windows port could be added in a community-acceptable way, and
> the community agreed we could try if we didn't make the code messier ---
> that was a success.
>
> For pg_upgrade, I had code from EDB (my employer at the time) that kind
> of worked, but needed lots of polish, and again, I could do it in
> contrib as long as I didn't mess up the backend code --- that worked
> well too.
>
> So, I guess I am saying, the FDW/sharding thing is a research project,
> but one that is implementing code because of existing proven solutions
> and because the improvements are benefiting other use-cases beyond
> sharding.
>
> Also, in the big picture, the existence of many Postgres forks, all
> doing sharding, indicates that there is demand for this capability, and
> if we can get some this capability into Postgres we will increase the
> number of people using native Postgres.  We might also be able to reduce
> the amount of duplicate work being done in all these forks and allow
> them to more easily focus on more advanced use-cases.
>
>> This two things would work with FDW:
>> 1) Pull data from data nodes to coordinator.
>> 2) Pushdown computations from coordinator to data nodes: joins, aggregates 
>> etc.
>> It's proven and clear. This is good.
>> Another point is that these FDW advances are useful by themselves. This is 
>> good
>> too.
>>
>> However, the model of FDW assumes that communication happen only between
>> coordinator and data node. But full-weight distributed optimized can't be 
>> done
>> under this restriction, because it requires every node to communicate every
>> other node if it makes distributed query faster. And as I get, FDW approach
>> currently have no research and no particular plan for that.
>
> This is very true.  I imagine cross-node connections will certainly
> complicate the implementation and lead to significant code changes,
> which might be unacceptable.  I think we need to go with a
> non-cross-node implementation first, then if that is accepted, we can
> start to think what cross-node code changes would look like.  It
> certainly would require FDW knowledge to exist on every shard.  Some
> have suggested that FDWs wouldn't work well for cross-node connections
> or wouldn't scale and we shouldn't be using them --- I am not sure what
> to think of that.
>
>> As I get from Robert Haas's talk 
>> (https://docs.google.com/viewer?a=v=sites;
>> srcid=ZGVmYXVsdGRvbWFpbnxyb2JlcnRtaGFhc3xneDo1ZmFhYzBhNjNhNzVhMDM0)
>>
>> Before we consider repartitioning joins, we should probably get 
>> everything
>> previously discussed working first.
>> – Join Pushdown For Parallelism, FDWs
>> – PartialAggregate/FinalizeAggregate
>> – Aggregate Pushdown For Parallelism, FDWs
>> – Declarative Partitioning
>> – Parallel-Aware Append
>>
>>
>> So, as I get we didn't ever think about possibility of data redistribution
>> using FDW. Probably, something changed since that time. But I haven't heard
>> about it.
>
> No, you didn't miss it.  :-(  We just haven't gotten to studying that
> yet.  One possible outcome is that built-in Postgres has non-cross-node
> sharding, and forks of Postgres have cross-node sharding, again assuming
> cross-node sharding requires an unacceptable amount of code change.  I
> don't think anyone knows the answer yet.
>
>> On Tue, Feb 23, 2016 at

Re: [HACKERS] The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 01:02:21PM -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> > On Wed, Feb 24, 2016 at 01:08:29AM +, Simon Riggs wrote:
> 
> > > It's never been our policy to try to include major projects in single code
> > > drops. Any move of XL/XC code into PostgreSQL core would need to be done 
> > > piece
> > > by piece across many releases. XL is definitely too big for the elephant 
> > > to eat
> > > in one mouthful.
> > 
> > Is there any plan to move the XL/XC code into Postgres?  If so, I have
> > not heard of it.  I thought everyone agreed it was too much code change,
> > which is why it is a separate code tree.  Is that incorrect?
> 
> Yes, I think that's incorrect.
> 
> What was said, as I understood it, is that Postgres-XL is too big to
> merge in a single commit -- just like merging BDR would have been.
> Indulge me while I make a parallel with BDR for a bit.
> 2ndQuadrant never pushed for merging BDR in a single commit; what was
> done was to split it, and propose individual pieces for commit.  Many of
> these pieces are now already committed (event triggers, background
> workers, logical decoding, replication slots, and many others).  The
> "BDR patch" is now much smaller, and it's quite possible that we will
> see it merged someday.  Will it be different from what it was when the
> BDR project started, all those years ago?  You bet.  Having the
> prototype BDR initially was what allowed the whole plan to make sense,
> because it showed that the pieces interacted in the right ways to make
> it work as a whole.

Yes, that is my understanding too.

> (I'm not saying 2ndQuadrant is so wise to do things this way.  I'm
> pretty sure you can see the same thing in parallel query development,
> for instance.)
> 
> In the same way, Postgres-XL is far too big to merge in a single commit.
> But that doesn't mean it will never be merged.  What is more likely to
> happen instead is that some pieces of it are going to be submitted
> separately for consideration.  It is a slow process, but progress is
> real and tangible.  We know this process will yield a useful outcome,

I was not aware there was any process to merge XC/XL into Postgres, at
least from the XC/XL side.  I know there is desire to take code from
XC/XL on the FDW-sharding side.

I think the most conservative merge approach is to try to enhance
existing Postgres features first (FDWs, partitioning, parallelism),
perhaps features that didn't exist at the time XC/XL were designed. If
they work, keep them and add the XC/XL-specific parts.  If the
enhance-features approach doesn't work, we then have to consider how
much additional code will be needed.  We have to evaluate this for the
FDW-based approach too, but it is likely to be smaller, which is its
attraction.

> because the architecture has already proven by the existence of
> Postgres-XL itself.  It's the prototype that proves the overall design,
> even if the pieces change shape during the process.  (Really, it's way
> more than merely a prototype at this point because of how long it has
> matured.)

True, it is beyond a prototype.

> In contrast, we don't have a prototype for FDW-based sharding; as you
> admitted, there is no actual plan, other than "let's push FDWs in this
> direction and hope that sharding will emerge".  We don't really know
> what pieces we need or how will they interact with each other; we have a
> vague idea of a direction but there's no clear path forward.  As the
> saying goes, if you don't know where you're going, you will probably end
> up somewhere else.

I think I have covered that already.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

2016-02-24 Thread Alvaro Herrera

Bruce Momjian wrote:
> On Wed, Feb 24, 2016 at 01:08:29AM +, Simon Riggs wrote:

> > It's never been our policy to try to include major projects in single code
> > drops. Any move of XL/XC code into PostgreSQL core would need to be done 
> > piece
> > by piece across many releases. XL is definitely too big for the elephant to 
> > eat
> > in one mouthful.
> 
> Is there any plan to move the XL/XC code into Postgres?  If so, I have
> not heard of it.  I thought everyone agreed it was too much code change,
> which is why it is a separate code tree.  Is that incorrect?

Yes, I think that's incorrect.

What was said, as I understood it, is that Postgres-XL is too big to
merge in a single commit -- just like merging BDR would have been.
Indulge me while I make a parallel with BDR for a bit.
2ndQuadrant never pushed for merging BDR in a single commit; what was
done was to split it, and propose individual pieces for commit.  Many of
these pieces are now already committed (event triggers, background
workers, logical decoding, replication slots, and many others).  The
"BDR patch" is now much smaller, and it's quite possible that we will
see it merged someday.  Will it be different from what it was when the
BDR project started, all those years ago?  You bet.  Having the
prototype BDR initially was what allowed the whole plan to make sense,
because it showed that the pieces interacted in the right ways to make
it work as a whole.

(I'm not saying 2ndQuadrant is so wise to do things this way.  I'm
pretty sure you can see the same thing in parallel query development,
for instance.)

In the same way, Postgres-XL is far too big to merge in a single commit.
But that doesn't mean it will never be merged.  What is more likely to
happen instead is that some pieces of it are going to be submitted
separately for consideration.  It is a slow process, but progress is
real and tangible.  We know this process will yield a useful outcome,
because the architecture has already proven by the existance of
Postgres-XL itself.  It's the prototype that proves the overall design,
even if the pieces change shape during the process.  (Really, it's way
more than merely a prototype at this point because of how long it has
matured.)

In contrast, we don't have a prototype for FDW-based sharding; as you
admitted, there is no actual plan, other than "let's push FDWs in this
direction and hope that sharding will emerge".  We don't really know
what pieces we need or how will they interact with each other; we have a
vague idea of a direction but there's no clear path forward.  As the
saying goes, if you don't know where you're going, you will probably end
up somewhere else.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 09:34:37AM -0500, Bruce Momjian wrote:
> > I have nothing against particular FDW advances. However, it's unclear for me
> > that FDW should be the only sharding approach.
> > It's unproven that FDW can do work that Postgres XC/XL does. With FDW we can
> > have some low-hanging fruits. That's good.
> > But it's unclear we can have high-hanging fruits (like data redistribution)
> > with FDW approach. And if we can it's unclear that it would be easier than 
> > with
> > other approaches.
> > Just let's don't call this community chosen plan for implementing sharding.
> > Until we have full picture we can't select one way and reject others.
> 
> I agree.  I think the FDW approach is the only existing approach for
> built-in sharding though.  The forks of Postgres doing sharding are,
> just that, forks and just Postgres community ecosystem projects.   (Yes,
> they are open source.)  If the forks were community-chosen plans we
> hopefully would not have 5+ of them.  If FDW works, it has the potential
> to be the community-chosen plan, at least for the workloads it supports,
> because it is built into community Postgres in a way the others cannot.
> 
> That doesn't mean the forks go away, but rather their value is in doing
> things the FDW approach can't, but there are a lot of "if's" in there.

Actually, this seems similar to how we handled replication.  For years
we had multiple external replication solutions.  When we implemented
streaming replication, we knew it would become the default for workloads
it supports.  The external solutions didn't go away, but their value was
in handling workloads that streaming replication didn't support.

I think the only difference is that we knew streaming replication would
have this effect before we implemented it, while with FDW-based
sharding, we don't know.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 12:22:20PM +0300, Konstantin Knizhnik wrote:
> Sorry, but based on this plan it is possible to make a conclusion
> that there are only two possible cluster solutions for Postgres:
> XC/XL and FDW-based.  From my point of view there are  much more
> possible alternatives.
> Our main idea with XTM (eXtensible Transaction Manager API) was to
> make it possible to develop cluster solutions for Postgres as
> extensions without patching code of Postgres core. And FDW is one of
> the mechanism which makes it possible to reach this goal.

Yes, this is a good example of code reuse.

> IMHO it will be hard to implement efficient execution of complex
> OLAP queries (including cross-node joins  and aggregation) within
> FDW paradigm. It will be necessary to build distributed query
> execution plan and coordinate it execution at cluster nodes. And
> definitely we need specialized optimizer for distributed queries.
> Right now solution of the problem are provided by XL and Greenplum,
> but both are forks of Posrgres with a lot of changes in Postgres
> core. The challenge is to provide the similar functionality, but at
> extension level (using custom nodes, pluggable transaction manager,
> ...).

Agreed.

> But, as you noticed,  complex OLAP is just one of the scenarios and
> this is not the only possible way of using clusters. In some cases
> FDW-based sharding can be quite efficient. Or pg_shard approach
> which also adds sharding at extension level and in some aspects is
> more flexible than FDW-based solution. Not all scenarios require
> global transaction manager. But if one need global consistency, then
> XTM API allows to provide ACID for both approaches (and not only for
> them).

Yep.

> We currently added to commitfest our XTM patch together with
> postgres_fdw patch integrating timestamp-based DTM implementation in
> postgres_fdw. It illustrates how global consistency canbe reached
> for FDW-based sharding.
> If this XTM patch will be committed, then in 9.6 we will have wide
> flexibility to play with different distributed transaction managers.
> And it can be used for many cluster solutions.
> 
> IMHO it will be very useful to extend your classification of cluster
> use cases, more precisely  formulate demands in all cases,
> investigate  how them can be covered by existed cluster solutions
> for Postgres and which niches are still vacant. We are currently
> continue work on "multimaster" - some more convenient alternative to
> hot-standby replication. Looks like PostgreSQL is missing some
> product providing functionality similar to Oracle RAC or MySQL
> Gallera. It is yet another direction of cluster development for
> PostgreSQL.  Let's be more open and flexible.

Yes, I listed only the workloads I could think of.  It would be helpful
to list more workloads and start to decide what can be accomplished with
each approach.  I don't even know all the workloads supported by the
sharding forks of Postgres.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding

On Wed, Feb 24, 2016 at 12:35:15PM +0300, Oleg Bartunov wrote:
> I have nothing against particular FDW advances. However, it's unclear for
> me that FDW should be the only sharding approach.
> It's unproven that FDW can do work that Postgres XC/XL does. With FDW we
> can have some low-hanging fruits. That's good.
> But it's unclear we can have high-hanging fruits (like data 
> redistribution)
> with FDW approach. And if we can it's unclear that it would be easier than
> with other approaches.
> Just let's don't call this community chosen plan for implementing 
> sharding.
> Until we have full picture we can't select one way and reject others.
> 
> 
> I already several times pointed, that we need XTM to be able to continue
> development in different directions, since there is no clear winner.  
> Moreover,
> I think there is no fits-all  solution and while I agree we need one built-in
> in the core, other approaches should have ability to exists without patching.

Yep.  I think much of what we eventually add to core will be either
copied from an existing soltion, which then doesn't need to be
maintained anymore, or used by existing solutions.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I. As I am, so you will be. +
+ Roman grave inscription +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] The plan for FDW-based sharding