On Wed, Apr 16, 2025 at 9:37 AM Andrey Borodin <x4...@yandex-team.ru> wrote:
>
> My view is what Konstantin wants is automatic replication topology
> management. For some reason this technology is called HA, DCS, Raft, Paxos
> and many other scary words. But basically it manages primary_conn_info of
> some nodes to provide some fault-tolerance properties. I'd start to design
> from here, not from Raft paper.
>

In my experience, when hundreds of replicas all participate in the Raft
protocol, the load of managing them becomes greater than the regular
transaction load. So making every replica a Raft participant will limit
the ability to deploy hundreds of replicas.

We could build an extension that plays a role in the PostgreSQL world
similar to ZooKeeper's in Hadoop. It could then be used for other
distributed systems as well, such as shared-nothing clusters based on FDW.
There's already a proposal to bring CREATE SERVER to the world of logical
replication, so I see these two worlds uniting in the future.

The way I imagine it, some PostgreSQL instances that have this extension
installed will act as a Raft cluster (similar to a ZooKeeper ensemble or
an etcd cluster). A distributed system based on logical replication, FDW,
or both will use this ensemble to manage its shared state. The same
ensemble can be shared across multiple distributed clusters if it has
scaling capabilities.

--
Best Wishes,
Ashutosh Bapat