Re: [PR] Support Session suspend and resume [cassandra-java-driver]

via GitHub Tue, 29 Apr 2025 00:53:17 -0700


rvansa commented on PR #2034:
URL: 
https://github.com/apache/cassandra-java-driver/pull/2034#issuecomment-2837840607

@aratno
1) If you want to use CRaC, then yes: none of the dependencies (as they are
used) may prevent the checkpoint. I wouldn't call this a circular dependency,
but it is true that CRaC adoption depends on the level of support within
libraries/frameworks, while those communities are more motivated when CRaC is
more prevalent. That's why we are talking to framework communities and actively
pushing changes (as in here) rather than expecting frameworks to jump on the
train. See e.g. https://docs.azul.com/core/crac/crac-frameworks for an overview
of which frameworks claim CRaC support. Presence on that list does not mean
100% compatibility; usually only the more common setups are tested.
Also, there are workarounds. For some simple use-cases you can get by with
[FD policies configuration](https://docs.azul.com/core/crac/fd-policies), and
if the community wants to postpone the support, e.g. until the next major
version, we publish an artifact from a fork with the fix, or in this case I've
created an
[artifact that you can drop into your dependencies](https://mvnrepository.com/artifact/io.github.crac.org.springframework.boot/crac-spring-boot-starter/3.4.3).
However, these are meant as temporary workarounds.
We hope that eventually most of the fixes will be in the libraries,
transparent to the users. Naturally that is simpler in stateless apps.

2) Yes, CRaC is somewhere in between GraalVM native and Leyden. Leyden can
certainly offer some speedup by assuming a closed-world app and moving some
operations to build time, but it is not AOT *compilation*. It certainly does
not save anything your application does during boot time. In a nutshell, as it
is more 'generic' it won't be able to go as far. It is up to the app developer
to decide what level of improvement is sufficient and how much effort is worth
putting in.
If you want some numbers from a third party, check out e.g. [this Helidon
blogpost](https://danielkec.github.io/blog/helidon/leyden/native-image/crac/2025/03/07/helidon-aot.html)
from Oracle.

> I’m concerned that restoring a driver session from a checkpoint (rather
> than close + re-create) could be a source for hard-to-track bugs, due to stale
> topology metadata, in-progress queue state, etc. Users would also be limited in
> where they could restore their checkpoints, since driver internal state is
> dependent on the local datacenter, for example.

This is a valid concern. I would expect that stale metadata shouldn't affect
correctness (distributed applications should tolerate that). Regrettably, I
don't have enough insight into Cassandra to speak more concretely. I am
roughly basing my expectations on Infinispan, as I spent a couple of years
developing that in the past.

> But if a restored session re-creates connections, then that’s likely going
> to dominate start-up time and make the gains of CRaC less visible.

The setup of connections is dominated by network latency, and with a local
datacenter that means milliseconds, or low tens of milliseconds if multiple
roundtrips are required for the handshake. Compare that to an overall startup
time of seconds for a small application, and sometimes minutes for legacy
leviathans. Anecdotally speaking, CRaC can restore an app from, say, a 200 MB
image in 50-100 ms; if we're talking 200 GB apps this goes to ~5 seconds.
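
For illustration only, the "close + re-create" alternative from the quoted concern can be sketched as below. This is a minimal sketch under stated assumptions, not the change proposed in this PR: `CheckpointHooks` is a hypothetical stand-in that only mirrors the two callbacks of a checkpoint/restore resource (so the snippet compiles without any CRaC dependency), and `ReopeningHolder` and `FakeSession` are made-up names. A real application would pass a factory that builds the actual driver session.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for checkpoint/restore callbacks; not a real library type.
interface CheckpointHooks {
    void beforeCheckpoint();
    void afterRestore();
}

// Holds a session-like object, closes it before a checkpoint and
// re-creates it from the factory after restore.
final class ReopeningHolder<S extends AutoCloseable> implements CheckpointHooks {
    private final Supplier<S> factory;
    private S current;

    ReopeningHolder(Supplier<S> factory) {
        this.factory = factory;
        this.current = factory.get();
    }

    // Returns the live session; fails fast while it is closed for a checkpoint.
    synchronized S get() {
        if (current == null) {
            throw new IllegalStateException("session is closed for checkpoint");
        }
        return current;
    }

    @Override
    public synchronized void beforeCheckpoint() {
        if (current == null) {
            return;
        }
        try {
            // Closing releases sockets so that no open connection blocks the checkpoint.
            current.close();
        } catch (Exception e) {
            throw new IllegalStateException("failed to close session", e);
        } finally {
            current = null;
        }
    }

    @Override
    public synchronized void afterRestore() {
        // Fresh session: topology metadata and connections are rebuilt from scratch.
        if (current == null) {
            current = factory.get();
        }
    }
}

// Hypothetical session used only to make the sketch runnable.
final class FakeSession implements AutoCloseable {
    static int opened = 0;
    final int id = ++opened;
    boolean closed = false;

    @Override
    public void close() {
        closed = true;
    }
}
```

With this pattern, callers must fetch the session through `holder.get()` on every use rather than caching the reference, since the instance changes across a restore. The cost paid on restore is the full session initialization, which is the trade-off discussed above.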

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: pr-unsubscr...@cassandra.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

