Hi everyone,

I'm revisiting a thread from 2015 
(https://www.mail-archive.com/[email protected]/msg00554.html) about 
achieving sub-second failover detection in HA clusters, and I'm curious about 
the current state of affairs nearly a decade later.

My Environment:

- Corosync 3.1.6
- Pacemaker 2.1.2
- Architecture: 2-node cluster + QDevice (also testing 3-node setups)
- Network: Dedicated physical NIC for cluster traffic (low-latency requirements)

Specific Questions:

1. With modern Corosync/Pacemaker versions, is sub-second fault detection and 
failover initiation realistically achievable in production environments?
2. Are there any published measurements or community reports of the fastest 
stable failover times achieved in practice? What is considered a reliable 
minimum detection time?
3. Have there been significant enhancements in the newer versions of Corosync 
and Pacemaker (post-2015) that specifically target detection speed and failover 
latency?
4. If sub-second detection is possible, what are the key configuration 
parameters and potential trade-offs (false positives, network sensitivity, 
resource overhead)?

Thanks in advance!

Holger Haidinger

