Re: Built-in Raft replication

2025-04-29 Thread Jim Nasby
I've always assumed there'd have to be at least one global stream, if for no other purpose than to be the source of truth about transaction commit ordering (though, I was thinking of supporting multiple streams for one database). Presumably the same could be used for shared objects. Or perhaps shar

Re: Built-in Raft replication

2025-04-23 Thread Devrim Gündüz
Hi, On Wed, 2025-04-23 at 11:48 -0500, Jim Nasby wrote: > unless we added multiple WAL streams. That would allow for splitting > WAL traffic across multiple devices as well as providing better > support for configurations that don’t replicate the entire cluster. > The current situation where delay

Re: Built-in Raft replication

2025-04-23 Thread Jim Nasby
On Apr 16, 2025, at 2:29 PM, Greg Sabino Mullane wrote: > > On Wed, Apr 16, 2025 at 2:18 AM Ashutosh Bapat > wrote: >> Users find it a waste of resources to deploy 3 big PostgreSQL instances just >> for HA where 2 suffice even if they deploy 3 lightweight DC

Re: Built-in Raft replication

2025-04-18 Thread Alastair Turner
Hi Konstantin On Wed, 16 Apr 2025 at 15:07, Konstantin Osipov wrote: > * Alastair Turner [25/04/16 15:58]: > > > > > If you use built-in failover you have to resort to 3 big Postgres > > > machines because you need 2/3 majority. Of course, you can install > > > MySQL-style arbiter - host that h

Re: Built-in Raft replication

2025-04-17 Thread Yura Sokolov
17.04.2025 00:24, Hannu Krosing wrote: > But regarding whether to use RAFT I would just define a "coordinator > API" and leave it up to the specific coordinator/consensus extension > to decide how the consensus is achieved > > > So to summarize: > > # Core should provide > > - way to move to new

Re: Built-in Raft replication

2025-04-16 Thread Hannu Krosing
On Wed, Apr 16, 2025 at 6:27 AM Tom Lane wrote: > > Andrey Borodin writes: > > I think it's what Konstantin is proposing. To have our own Raft > > implementation, without dependencies. > > Hmm, OK. I thought that the proposal involved relying on some existing > code, but re-reading the thread t

Re: Built-in Raft replication

2025-04-16 Thread Konstantin Osipov
* Greg Sabino Mullane [25/04/16 22:33]: > > Users find it a waste of resources to deploy 3 big PostgreSQL instances > > just for HA where 2 suffice even if they deploy 3 lightweight DCS > > instances. Having only some of the nodes act as DCS and others purely > > PostgreSQL nodes will reduce waste

Re: Built-in Raft replication

2025-04-16 Thread Greg Sabino Mullane
On Wed, Apr 16, 2025 at 2:18 AM Ashutosh Bapat wrote: > Users find it a waste of resources to deploy 3 big PostgreSQL instances > just for HA where 2 suffice even if they deploy 3 lightweight DCS > instances. Having only some of the nodes act as DCS and others purely > PostgreSQL nodes will reduc

Re: Built-in Raft replication

2025-04-16 Thread Yura Sokolov
16.04.2025 07:58, Andrey Borodin wrote: > Yes, shared DCS are common these days. AFAIK, we use one Zookeeper instance > per hundred Postgres clusters to coordinate pg_consuls. > > Actually, scalability is opposite to topic of this thread. Let me explain. > Currently, Postgres automatic failover t

Re: Built-in Raft replication

2025-04-16 Thread Yura Sokolov
16.04.2025 08:24, Andrey Borodin wrote: > 2. After failover, old Primary node must rejoin cluster by running pg_rewind > and following timeline switch. It is really do-able: BiHA already does it. And BiHA runs as a child process of postmaster, i.e. both postmaster and BiHA don't restart when Post

Re: Built-in Raft replication

2025-04-16 Thread Konstantin Osipov
* Alastair Turner [25/04/16 15:58]: > > > If you use built-in failover you have to resort to 3 big Postgres > > machines because you need 2/3 majority. Of course, you can install > > MySQL-style arbiter - host that had no real PGDATA, only participates in > > voting. But this is a solution to pro
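
Editorial aside: the "2/3 majority" point quoted above is plain quorum arithmetic. A minimal sketch in Python follows; the function names are illustrative only and are not part of PostgreSQL or of any proposal in this thread. A voter here is any node that takes part in voting, whether it is a full PostgreSQL instance or an arbiter with no real PGDATA.

```python
def majority(voters: int) -> int:
    """Smallest number of votes that is a strict majority of `voters`."""
    if voters < 1:
        raise ValueError("need at least one voter")
    return voters // 2 + 1


def tolerated_failures(voters: int) -> int:
    """How many voters can be lost while a majority can still be formed."""
    return voters - majority(voters)


if __name__ == "__main__":
    # Compare cluster sizes: 2 voters tolerate no failure, 3 tolerate one.
    for n in (2, 3, 5):
        print(f"voters={n} majority={majority(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
```

Under this arithmetic two voters cannot survive the loss of either one, which is why the discussion turns on whether the third voter has to be a full-size PostgreSQL machine.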

Re: Built-in Raft replication

2025-04-16 Thread Alastair Turner
On Wed, 16 Apr 2025 at 07:18, Ashutosh Bapat wrote: > On Wed, Apr 16, 2025 at 10:29 AM Andrey Borodin > wrote: > > > > If you use built-in failover you have to resort to 3 big Postgres > machines because you need 2/3 majority. Of course, you can install > MySQL-style arbiter - host that had no r

Re: Built-in Raft replication

2025-04-16 Thread Konstantin Osipov
* Andrey Borodin [25/04/16 11:06]: > > You can run bash from extension, what's the point? > > You cannot run bash that will stop backend running bash. You're right there is a chicken and egg problem when you add Raft to an existing project, and rebootstrap becomes a trick, but it's a plumbing

Re: Built-in Raft replication

2025-04-16 Thread Konstantin Osipov
* Andrey Borodin [25/04/16 11:06]: > > Andrey Borodin writes: > >> I think it's what Konstantin is proposing. To have our own Raft > >> implementation, without dependencies. > > > > Hmm, OK. I thought that the proposal involved relying on some existing > > code, but re-reading the thread that

Re: Built-in Raft replication

2025-04-16 Thread Konstantin Osipov
* Ashutosh Bapat [25/04/16 11:06]: > > My view is what Konstantin wants is automatic replication topology > > management. For some reason this technology is called HA, DCS, Raft, Paxos > > and many other scary words. But basically it manages primary_conn_info of > > some nodes to provide some f

Re: Built-in Raft replication

2025-04-16 Thread Konstantin Osipov
* Tom Lane [25/04/16 11:05]: > Nikolay Samokhvalov writes: > > This is exactly what I wanted to write as well. The idea is great. At the > > same time, I think, consensus on many decisions will be extremely hard to > > reach, so this project has a high risk of being very long. Unless it's an > >

Re: Built-in Raft replication

2025-04-16 Thread Michael Banck
Hi, On Wed, Apr 16, 2025 at 10:24:48AM +0500, Andrey Borodin wrote: > I think I can provide some reasons why it cannot be neither extension, > nor any part running within postmaster reign. > > 1. When joining cluster, there’s not PGDATA to run postmaster on top > of it. > > 2. After failover, ol

Re: Built-in Raft replication

2025-04-15 Thread Ashutosh Bapat
On Wed, Apr 16, 2025 at 11:57 AM Andrey Borodin wrote: > > > > > On 16 Apr 2025, at 11:18, Ashutosh Bapat > > wrote: > > > > Having only some of the nodes act as DCS > > and others purely PostgreSQL nodes will reduce waste of resources. > > But typically you need more DCS nodes than PostgreSQL n

Re: Built-in Raft replication

2025-04-15 Thread Andrey Borodin
> On 16 Apr 2025, at 11:18, Ashutosh Bapat wrote: > > Having only some of the nodes act as DCS > and others purely PostgreSQL nodes will reduce waste of resources. But typically you need more DCS nodes than PostgreSQL nodes. Did you mean “Having only some of nodes act as PostgreSQL and others

Re: Built-in Raft replication

2025-04-15 Thread Ashutosh Bapat
On Wed, Apr 16, 2025 at 10:29 AM Andrey Borodin wrote: > > > We may build an extension which > > has a similar role in PostgreSQL world as zookeeper in Hadoop. > > Patroni, pg_consul and others already use zookeeper, etcd and similar systems > for consensus. > Is it any better as extension than a

Re: Built-in Raft replication

2025-04-15 Thread Andrey Borodin
> On 16 Apr 2025, at 10:39, Kirill Reshke wrote: > > You can run bash from extension, what's the point? You cannot run bash that will stop backend running bash. Best regards, Andrey Borodin.

Re: Built-in Raft replication

2025-04-15 Thread Kirill Reshke
On Wed, 16 Apr 2025 at 10:25, Andrey Borodin wrote: > > I think I can provide some reasons why it cannot be neither extension, nor > any part running within postmaster reign. > > 1. When joining cluster, there’s not PGDATA to run postmaster on top of it. You can join the cluster on pg_basebackup

Re: Built-in Raft replication

2025-04-15 Thread Andrey Borodin
> On 16 Apr 2025, at 09:26, Tom Lane wrote: > > Andrey Borodin writes: >> I think it's what Konstantin is proposing. To have our own Raft >> implementation, without dependencies. > > Hmm, OK. I thought that the proposal involved relying on some existing > code, but re-reading the thread th

Re: Built-in Raft replication

2025-04-15 Thread Andrey Borodin
> On 16 Apr 2025, at 09:33, Ashutosh Bapat wrote: > > In my experience, the load of managing hundreds of replicas which all > participate in RAFT protocol becomes more than regular transaction > load. So making every replica a RAFT participant will affect the > ability to deploy hundreds of re

Re: Built-in Raft replication

2025-04-15 Thread Ashutosh Bapat
On Wed, Apr 16, 2025 at 9:37 AM Andrey Borodin wrote: > > My view is what Konstantin wants is automatic replication topology > management. For some reason this technology is called HA, DCS, Raft, Paxos > and many other scary words. But basically it manages primary_conn_info of > some nodes to p

Re: Built-in Raft replication

2025-04-15 Thread Tom Lane
Andrey Borodin writes: > I think it's what Konstantin is proposing. To have our own Raft > implementation, without dependencies. Hmm, OK. I thought that the proposal involved relying on some existing code, but re-reading the thread that was said nowhere. Still, that moves it from a large proje

Re: Built-in Raft replication

2025-04-15 Thread Andrey Borodin
> On 16 Apr 2025, at 04:19, Tom Lane wrote: > > feebly, and seems to have a bus factor of 1. Another example is the > Spencer regex engine; we thought we could depend on Tcl to be the > upstream for that, but for a decade or more they've acted as though > *we* are the upstream. I think it's

Re: Built-in Raft replication

2025-04-15 Thread Tom Lane
Nikolay Samokhvalov writes: > This is exactly what I wanted to write as well. The idea is great. At the > same time, I think, consensus on many decisions will be extremely hard to > reach, so this project has a high risk of being very long. Unless it's an > extension, at least in the beginning. Y

Re: Built-in Raft replication

2025-04-15 Thread Nikolay Samokhvalov
On Tue, Apr 15, 2025 at 8:08 AM Greg Sabino Mullane wrote: > On Mon, Apr 14, 2025 at 1:15 PM Konstantin Osipov > wrote: > >> If anyone is working on Raft already I'd be happy to discuss >> the details. I am fairly new to the PostgreSQL hackers ecosystem >> so cautious of starting work in isolati

Re: Built-in Raft replication

2025-04-15 Thread Konstantin Osipov
* Greg Sabino Mullane [25/04/15 18:08]: > > If anyone is working on Raft already I'd be happy to discuss > > the details. I am fairly new to the PostgreSQL hackers ecosystem > > so cautious of starting work in isolation/knowing there is no > > interest in accepting the feature into the trunk. > >

Re: Built-in Raft replication

2025-04-15 Thread Greg Sabino Mullane
On Mon, Apr 14, 2025 at 1:15 PM Konstantin Osipov wrote: > If anyone is working on Raft already I'd be happy to discuss > the details. I am fairly new to the PostgreSQL hackers ecosystem > so cautious of starting work in isolation/knowing there is no > interest in accepting the feature into the t

Re: Built-in Raft replication

2025-04-15 Thread Konstantin Osipov
* Yura Sokolov [25/04/15 14:02]: > I've been working in a company which uses MongoDB (3.6 and up) as their > primary storage. And it seemed to me as "God Send". Everything just worked. > Replication was as reliable as one could imagine. It outlives several > hardware incidents without manual inter

Re: Built-in Raft replication

2025-04-15 Thread Konstantin Osipov
* Aleksander Alekseev [25/04/15 13:20]: > > I am considering starting work on implementing a built-in Raft > > replication for PostgreSQL. > > Generally speaking I like the idea. The more important question IMO is > whether we want to maintain Raft within the Po

Re: Built-in Raft replication

2025-04-15 Thread Yura Sokolov
15.04.2025 14:15, Aleksander Alekseev wrote: > Hi Yura, > >> I've been working in a company which uses MongoDB (3.6 and up) as their >> primary storage. And it seemed to me as "God Send". Everything just worked. >> Replication was as reliable as one could imagine. It outlives several >> hardware i

Re: Built-in Raft replication

2025-04-15 Thread Aleksander Alekseev
Hi Yura, > I've been working in a company which uses MongoDB (3.6 and up) as their > primary storage. And it seemed to me as "God Send". Everything just worked. > Replication was as reliable as one could imagine. It outlives several > hardware incidents without manual intervention. It allowed clus

Re: Built-in Raft replication

2025-04-15 Thread Konstantin Osipov
* Yura Sokolov [25/04/15 12:02]: > > OTOH Raft needs to write its own log, and what's worse, it sometimes > > needs to remove already written parts of it (so, it is not appended > > only, unlike WAL). If you have a production system which maintains two > > kinds of logs with different semantics,

Re: Built-in Raft replication

2025-04-15 Thread Yura Sokolov
15.04.2025 13:20, Aleksander Alekseev wrote: > Hi Konstantin, > >> I am considering starting work on implementing a built-in Raft >> replication for PostgreSQL. > > Generally speaking I like the idea. The more important question IMO is > whether we want to maintain

Re: Built-in Raft replication

2025-04-15 Thread Aleksander Alekseev
Hi Konstantin, > I am considering starting work on implementing a built-in Raft > replication for PostgreSQL. Generally speaking I like the idea. The more important question IMO is whether we want to maintain Raft within the PostgreSQL core project. Building distributed systems on com

Re: Built-in Raft replication

2025-04-15 Thread Yura Sokolov
14.04.2025 20:44, Kirill Reshke wrote: > OTOH Raft needs to write its own log, and what's worse, it sometimes > needs to remove already written parts of it (so, it is not appended > only, unlike WAL). If you have a production system which maintains two > kinds of logs with different semantics, it i
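
Editorial aside: the remark quoted above, that a Raft log is not append-only, refers to how a follower handles entries that conflict with what the leader sends. A simplified sketch in Python follows, assuming an in-memory list as the log. `Entry` and `append_entries` are hypothetical names, and the real protocol's check on the preceding entry's index and term is left out for brevity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    term: int      # leader term in which the entry was created
    payload: str   # opaque command


def append_entries(log: list, start: int, new_entries: list) -> int:
    """Merge leader entries into a follower log, beginning at index `start`.

    Returns how many already-written entries were removed.
    """
    if start > len(log):
        raise ValueError("gap in log; a real follower would reject the request")
    removed = 0
    for offset, entry in enumerate(new_entries):
        idx = start + offset
        if idx < len(log):
            if log[idx].term == entry.term:
                continue  # already have this entry, nothing to do
            # Conflict: drop the existing entry and everything after it.
            removed += len(log) - idx
            del log[idx:]
        log.append(entry)
    return removed


if __name__ == "__main__":
    log = [Entry(1, "a"), Entry(1, "b"), Entry(2, "c")]
    removed = append_entries(log, 1, [Entry(3, "x"), Entry(3, "y")])
    print(removed, [e.payload for e in log])
```

The `del log[idx:]` step is the part with no counterpart in a purely appended log, which is the difference the quoted message points at.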

Re: Built-in Raft replication

2025-04-14 Thread Konstantin Osipov
* Kirill Reshke [25/04/14 20:48]: > > I am considering starting work on implementing a built-in Raft > > replication for PostgreSQL. > > > > Just some thought on top of my mind, if you need my voice here: > > I have a hard time believing the community will be p

Re: Built-in Raft replication

2025-04-14 Thread Kirill Reshke
On Mon, 14 Apr 2025 at 22:15, Konstantin Osipov wrote: > > Hi, Hi > I am considering starting work on implementing a built-in Raft > replication for PostgreSQL. > Just some thought on top of my mind, if you need my voice here: I have a hard time believing the community will be

Built-in Raft replication

2025-04-14 Thread Konstantin Osipov
Hi, I am considering starting work on implementing a built-in Raft replication for PostgreSQL. Raft's advantage is that it unifies log replication, cluster configuration/membership/topology management and initial state transfer into a single protocol. Currently the cluster configur