rvansa commented on PR #2034: URL: https://github.com/apache/cassandra-java-driver/pull/2034#issuecomment-2837840607
@aratno

1) If you want to use CRaC, then yes: none of the dependencies (as they are used) may prevent the checkpoint. I wouldn't call this a circular dependency, but CRaC adoption does depend on the level of support within libraries/frameworks, and those communities are more motivated when CRaC is more prevalent. That's why we are talking to framework communities and are actively pushing changes (as in here) rather than expecting frameworks to jump on the train. See e.g. https://docs.azul.com/core/crac/crac-frameworks for an overview of which frameworks claim CRaC support; being listed does not mean 100% compatibility, as usually only the more common setups are tested. Also, there are workarounds. For some simple use cases you can get by with [FD policies configuration](https://docs.azul.com/core/crac/fd-policies), and if the community wants to postpone the support, e.g. until the next major version, we publish an artifact from a fork with the fix, or in this case I've created an [artifact that you can drop into your dependencies](https://mvnrepository.com/artifact/io.github.crac.org.springframework.boot/crac-spring-boot-starter/3.4.3). However, these are meant as temporary workarounds. We hope that eventually most of the fixes will be in the libraries, transparent to the users. Naturally that is simpler in stateless apps.

2) Yes, CRaC is somewhere in between GraalVM native and Leyden. Leyden can certainly offer some speedup by assuming a closed-world app and moving some operations to build time, but it is not AOT *compilation*. It certainly does not save anything your application does during boot time. In a nutshell, as it is more 'generic' it won't be able to go as far. It is up to the app developer to decide what level of improvement is sufficient and how much energy is worth putting in. If you want some numbers from a third party, check out e.g. [this Helidon blogpost](https://danielkec.github.io/blog/helidon/leyden/native-image/crac/2025/03/07/helidon-aot.html) from Oracle.
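To make "must not prevent the checkpoint" in point 1 a bit more concrete: for a library that holds network connections, this usually comes down to closing them before the checkpoint and re-establishing them after restore. The following is only a minimal sketch of that pattern, not how the driver or this PR implements it. `CheckpointHooks` is a hypothetical local stand-in for a checkpoint/restore callback pair (real code would register with the CRaC API instead), and `FakeConnectionPool` is a hypothetical pool that tracks "open" connections as plain strings so the example runs without any dependency.

```java
import java.util.ArrayList;
import java.util.List;

public class CheckpointSketch {

    /** Hypothetical stand-in for a checkpoint/restore callback pair. */
    interface CheckpointHooks {
        void beforeCheckpoint() throws Exception;
        void afterRestore() throws Exception;
    }

    /** Hypothetical pool; an "open connection" is just an endpoint string here. */
    static class FakeConnectionPool implements CheckpointHooks {
        private final List<String> endpoints;
        private final List<String> open = new ArrayList<>();

        FakeConnectionPool(List<String> endpoints) {
            this.endpoints = List.copyOf(endpoints);
            connectAll();
        }

        /** Pretend to (re)connect to every known endpoint. */
        private void connectAll() {
            open.clear();
            open.addAll(endpoints);
        }

        int openCount() {
            return open.size();
        }

        /** Nothing may stay open while the process image is taken. */
        @Override
        public void beforeCheckpoint() {
            open.clear();
        }

        /** After restore, connections are rebuilt from configuration. */
        @Override
        public void afterRestore() {
            connectAll();
        }
    }

    public static void main(String[] args) {
        FakeConnectionPool pool =
                new FakeConnectionPool(List.of("10.0.0.1:9042", "10.0.0.2:9042"));
        pool.beforeCheckpoint();
        System.out.println("open at checkpoint: " + pool.openCount());
        pool.afterRestore();
        System.out.println("open after restore: " + pool.openCount());
    }
}
```

In this shape the user of the pool never sees the checkpoint: the pool object survives, only its connections are cycled, which is the "transparent to the users" outcome described above.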
> I’m concerned that restoring a driver session from a checkpoint (rather than close + re-create) could be a source for hard-to-track bugs, due to stale topology metadata, in-progress queue state, etc. Users would also be limited in where they could restore their checkpoints, since driver internal state is dependent on the local datacenter, for example.

This is a valid concern. I would expect that stale metadata shouldn't affect correctness (distributed applications should tolerate that). Regrettably I don't have enough insight into Cassandra to speak more concretely; I am roughly basing my expectations on Infinispan, as I spent a couple of years developing that in the past.

> But if a restored session re-creates connections, then that’s likely going to dominate start-up time and make the gains of CRaC less visible.

The setup of connections is dominated by network latency, and with a local datacenter that means milliseconds, or low tens of milliseconds if multiple round trips are required for the handshake. Compare that to an overall startup time of seconds for a small application, and sometimes minutes for legacy leviathans. Anecdotally speaking, CRaC can restore an app from, say, a 200 MB image in 50-100 ms; if we're talking 200 GB apps this goes to ~5 seconds.
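As a back-of-the-envelope check of the latency argument, the sketch below compares "restore plus reconnect" against a cold start. Only the 50-100 ms restore range comes from the comment above (75 ms is its midpoint). The 1 ms round-trip time, the 4 handshake round trips, and the 3 s cold start are assumed illustrative values, not measurements, and the class and method names are hypothetical.

```java
public class RestoreLatencyEstimate {

    /** Connection setup time, modeled as round trips times round-trip time. */
    static double connectMillis(int roundTrips, double rttMillis) {
        return roundTrips * rttMillis;
    }

    /** Total time until the restored app has a usable connection again. */
    static double restoredStartupMillis(double restoreMillis, double connectMillis) {
        return restoreMillis + connectMillis;
    }

    public static void main(String[] args) {
        double rttMillis = 1.0;          // assumed: local-datacenter round trip
        int roundTrips = 4;              // assumed: handshake round trips
        double restoreMillis = 75.0;     // midpoint of the 50-100 ms quoted above
        double coldStartMillis = 3000.0; // assumed: "seconds" for a small app

        double connect = connectMillis(roundTrips, rttMillis);
        double restored = restoredStartupMillis(restoreMillis, connect);

        System.out.println("reconnect: " + connect + " ms");
        System.out.println("restore + reconnect: " + restored + " ms");
        System.out.println("cold start: " + coldStartMillis + " ms");
    }
}
```

Under these assumptions reconnecting adds a few milliseconds to a restore that is still well over an order of magnitude faster than the cold start; the conclusion would only change if the handshake needed far more round trips or the network were much slower than assumed.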