Hello,

I’d like to start by describing my use case and how I’d like to use Cassandra
to solve my storage needs.
We're processing a stream of events for various happenings. Every event carries
the unique happening_id of the happening it belongs to.
One happening may have many events, usually ~20-100. I’d like to store only the
latest event for each happening (each event is an incremental update and
contains all up-to-date data about the happening).
Technically, the events are streamed from Kafka, processed with Spark and saved
to Cassandra.
In Cassandra we use upserts (inserts with the same primary key). So far so
good, but then comes the tombstone...

When I insert a field with a NULL value, Cassandra creates a tombstone for this
field. As I understand it, this is for space efficiency: Cassandra doesn’t have
to remember that there is a NULL value, it just deletes the respective column,
and a delete creates a ... tombstone.
I was hoping there could be an option to tell Cassandra not to be so
space-efficient and to store the "unset" info without generating tombstones.
Something similar to inserting empty strings instead of null values:

CREATE TABLE happening (id text PRIMARY KEY, event text);
INSERT INTO happening (id, event) VALUES ('1', 'event1');
-- tombstone is generated
INSERT INTO happening (id, event) VALUES ('1', null);
-- tombstone is not generated
INSERT INTO happening (id, event) VALUES ('1', '');
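
To illustrate the behaviour I mean, here is a toy model in plain Python. It is
only a sketch of my understanding, not Cassandra code: writes are appended as
cells, a null is recorded as a tombstone cell, and reads return the newest
cell. The names TOMBSTONE, write, read and tombstones_written are made up for
this example.

```python
from itertools import count

TOMBSTONE = object()   # made-up marker standing in for a deleted cell
_clock = count(1)      # stand-in for write timestamps


def write(log, key, column, value):
    """Append one cell to the log; a None value is recorded as a tombstone."""
    cell = TOMBSTONE if value is None else value
    log.append((next(_clock), key, column, cell))


def read(log, key, column):
    """Return the newest cell for (key, column); a tombstone reads as None."""
    cells = [(ts, cell) for ts, k, c, cell in log if (k, c) == (key, column)]
    if not cells:
        return None
    newest = max(cells, key=lambda pair: pair[0])[1]
    return None if newest is TOMBSTONE else newest


def tombstones_written(log):
    """Count tombstone cells still sitting in the log."""
    return sum(1 for _, _, _, cell in log if cell is TOMBSTONE)


log = []
write(log, "1", "event", "event1")
write(log, "1", "event", None)   # null: a tombstone cell is appended
write(log, "1", "event", "")     # empty string: an ordinary cell
print(read(log, "1", "event") == "", tombstones_written(log))
```

In this model the empty-string write hides the earlier tombstone from reads,
but the tombstone cell itself stays in the log, which is the part that worries
me.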

Possible solutions:
1. Disable tombstones with gc_grace_seconds = 0, or set it to a reasonably low
value (1 hour?). Not good, since phantom data may reappear.
2. Ignore NULLs on the Spark side with "spark.cassandra.output.ignoreNulls=true".
Not good, since this will never overwrite a previously inserted event field
with an "empty" one.
3. On inserts with Spark, find all NULL values and replace them with an "empty"
equivalent (empty string for text, 0 for integer). Very inefficient, and it is
problematic to find an "empty" equivalent for some data types.
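
To make the difference between options 2 and 3 concrete, here is a minimal
sketch in plain Python (not actual Spark or connector code). The column names,
the schema dict and the EMPTY_BY_TYPE mapping are assumptions made up for the
example; rows are modelled as dicts and an upsert as a dict merge.

```python
# Hypothetical "empty" equivalents per column type; these choices are assumptions.
EMPTY_BY_TYPE = {"text": "", "int": 0, "double": 0.0, "boolean": False}


def drop_nulls(row):
    """Option 2: leave null columns out of the write entirely."""
    return {name: value for name, value in row.items() if value is not None}


def fill_nulls(row, schema):
    """Option 3: replace each null with the empty equivalent of its column type."""
    filled = {}
    for name, value in row.items():
        if value is None:
            col_type = schema[name]
            if col_type not in EMPTY_BY_TYPE:
                # Some types have no obvious "empty" value.
                raise ValueError(f"no empty equivalent for type {col_type!r}")
            value = EMPTY_BY_TYPE[col_type]
        filled[name] = value
    return filled


schema = {"id": "text", "event": "text", "score": "int"}   # hypothetical columns
stored = {"id": "1", "event": "event1", "score": 7}        # previously saved event
update = {"id": "1", "event": None, "score": 9}            # newer event, field unset

after_option_2 = {**stored, **drop_nulls(update)}          # stale "event1" survives
after_option_3 = {**stored, **fill_nulls(update, schema)}  # "event" becomes ""
print(after_option_2)
print(after_option_3)
```

With option 2 the old value of the unset field survives the upsert, while
option 3 overwrites it but needs an "empty" value for every type, which does
not exist for all of them.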

Until tombstones appeared, Cassandra was the right fit for our use case;
however, now I’m not sure we’re heading in the right direction.
Could you please give me some advice on how to solve this problem?

Thank you,
Tomas