You can read about the queue anti-pattern at
https://www.datastax.com/blog/cassandra-anti-patterns-queues-and-queue-datasets
Your table schema does not include the most important piece of
information: the partition keys (and clustering keys, if any). Keep in
mind that you can only efficiently query Cassandra by the exact
partition key or by the token of a partition key. Otherwise you will
have to rely on a materialized view (MV) or a secondary index, or worse,
scan the entire table (all the nodes) to find your data.
A Cassandra schema should look like this:
CREATE TABLE xyz (
    a text,
    b text,
    c timeuuid,
    d int,
    e text,
    PRIMARY KEY ((a, b), c, d)
);
The line "PRIMARY KEY" contains arguably the most important piece of
information of the table schema.
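To make that concrete, here is a rough sketch of how the primary key of
the example table xyz shapes the queries you can run. The literal values
('foo', 'bar', 'some-value') and the token bounds are placeholders I made
up for illustration, and the token example assumes the default
Murmur3Partitioner, whose tokens are 64-bit integers.

```cql
-- Efficient: the full partition key (a, b) is given, so the request
-- goes straight to the replicas that own that partition.
SELECT * FROM xyz WHERE a = 'foo' AND b = 'bar';

-- Efficient: same partition, rows read in clustering order (c, then d).
SELECT * FROM xyz WHERE a = 'foo' AND b = 'bar' ORDER BY c DESC LIMIT 10;

-- Efficient for bulk reads: walk the table one token range at a time.
SELECT * FROM xyz WHERE token(a, b) > 0 AND token(a, b) <= 1000000;

-- Inefficient: e is not part of the primary key, so this only runs with
-- ALLOW FILTERING and has to scan the whole table on all the nodes.
SELECT * FROM xyz WHERE e = 'some-value' ALLOW FILTERING;
```

Reading by topic name and status, as you describe, only falls into the
first group if those columns are the partition key.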
On 19/02/2024 06:52, Gowtham S wrote:
Hi Bowen,
> which is a well documented anti-pattern.
Can you please explain more on this? I'm not aware of it. It would be
helpful for making decisions.
Please find the below table schema
*Table schema*
TopicName - text
Partition - int
MessageUUID - text
Actual data - text
OccurredTime - Timestamp
Status - boolean
We are planning to read the table by topic name, selecting the rows
whose status is not true, and to produce those messages to the
respective topic when Kafka is live again.
Thanks and regards,
Gowtham S
On Sat, 17 Feb 2024 at 18:10, Bowen Song via user
<user@cassandra.apache.org> wrote:
Hi Gowtham,
On the face of it, it sounds like you are planning to use
Cassandra for a queue-like application, which is a well documented
anti-pattern. If that's not the case, can you please show the
table schema and some example queries?
Cheers,
Bowen
On 17/02/2024 08:44, Gowtham S wrote:
Dear Cassandra Community,
I am reaching out to seek your valuable feedback and insights on
a proposed solution we are considering for managing Kafka outages
using Cassandra.
At our organization, we heavily rely on Kafka for real-time data
processing and messaging. However, like any technology, Kafka is
susceptible to occasional outages which can disrupt our
operations and impact our services. To mitigate the impact of
such outages and ensure continuity, we are exploring the
possibility of leveraging Cassandra as a backup solution.
Our proposed approach involves storing messages in Cassandra
during Kafka outages. Subsequently, we plan to implement a
scheduler that will read from Cassandra and attempt to write
these messages back into Kafka once it is operational again.
We believe that by adopting this strategy, we can achieve the
following benefits:
1. Improved Fault Tolerance: By having a backup mechanism in
   place, we can reduce the risk of data loss and ensure
   continuity of operations during Kafka outages.
2. Enhanced Reliability: Cassandra's distributed architecture
   and built-in replication features make it well-suited for
   storing data reliably, even in the face of failures.
3. Scalability: Both Cassandra and Kafka are designed to scale
   horizontally, allowing us to handle increased loads seamlessly.
Before proceeding further with this approach, we would greatly
appreciate any feedback, suggestions, or concerns from the
community. Specifically, we are interested in hearing about:
* Potential challenges or drawbacks of using Cassandra as a
backup solution for Kafka outages.
* Best practices or recommendations for implementing such a
backup mechanism effectively.
* Any alternative approaches or technologies that we should
consider.
Your expertise and insights are invaluable to us, and we are
eager to learn from your experiences and perspectives. Please
feel free to share your thoughts or reach out to us with any
questions or clarifications.
Thank you for taking the time to consider our proposal, and we
look forward to hearing from you soon.
Thanks and regards,
Gowtham S