Hi,

Following up on the discussion we had yesterday in the Apache Geode Community meeting on the topic "Reflections on conserve-sockets setting in Apache Geode", I'd like to post here some questions that could not be fully answered during the meeting:

The Geode documentation states the following about conserve-sockets and WAN deployments in [1]:

"WAN deployments increase the messaging demands on a Geode system. To avoid hangs related to WAN messaging, always set `conserve-sockets=false` for Geode members that participate in a WAN deployment."

It also states the following about conserve-sockets and transactions in [2]:

"When you have transactions operating on EMPTY, NORMAL or PARTITION regions, make sure that conserve-sockets is set to false to avoid distributed deadlocks."

Searching the Geode tests, the only test case I have found related to deadlocks with conserve-sockets=true is:

https://github.com/apache/geode/blob/41eb49989f25607acfcbf9ac5afe3d4c0721bb35/geode-wan/src/distributedTest/java/org/apache/geode/internal/cache/wan/serial/SerialGatewaySenderDistributedDeadlockDUnitTest.java#L176

The comments in the test say that it always causes a distributed deadlock and that it is commented out. However, the test case is actually NOT commented out and, if you execute it, it passes without any failure or deadlock.

So here are my questions:

1. Could it be that deadlocks with conserve-sockets=true and WAN and/or transactions over partitioned regions were a legacy issue that has already been fixed?

2. Otherwise, could someone please provide some more information about why these deadlocks could happen? It would be great if there were test cases that showcase this possibility. It looks like a big limitation of Geode that you are forced to set conserve-sockets to false (with the implications this has on resource usage) when you are using WAN replication and/or transactions on partitioned regions.

3. Could it be that there are other elements (for example, also using CacheListeners, as Anthony Baker pointed out) that would increase the risk of hitting a distributed deadlock?
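
For reference, this is roughly the configuration the documentation is asking for. It is only a minimal sketch using java.util.Properties: the class and method names are hypothetical, and I am assuming the resulting properties are handed to the member at startup (for example through `new CacheFactory(props)`, or written out as a gemfire.properties file).

```java
import java.util.Properties;

// Hypothetical helper, for illustration only.
public class ConserveSocketsConfig {

    // Builds the member properties recommended in [1] and [2]:
    // conserve-sockets disabled for members that take part in a WAN
    // deployment or run transactions on partitioned regions.
    static Properties wanMemberProperties() {
        Properties props = new Properties();
        props.setProperty("conserve-sockets", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = wanMemberProperties();
        // Assumption: these properties would then be passed to Geode,
        // e.g. new CacheFactory(props).create(), which is not done here
        // so that the sketch runs without Geode on the classpath.
        System.out.println("conserve-sockets=" + props.getProperty("conserve-sockets"));
    }
}
```

With the default (conserve-sockets=true) none of this is needed, which is why I would like to understand whether the recommendation still applies.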

Thanks in advance,
Alberto

[1]: https://geode.apache.org/docs/guide/114/managing/monitor_tune/sockets_and_gateways.html
[2]: https://geode.apache.org/docs/guide/114/managing/monitor_tune/performance_controls_controlling_socket_use.html