> I’m still not sure if having tombstones vs. empty values / frozen UDTs will have the same results.
When in doubt, benchmark. Good luck, Jon On Wed, Jan 9, 2019 at 3:02 PM Tomas Bartalos <tomas.barta...@gmail.com> wrote: > Loosing atomic updates is a good point, but in my use case its not a > problem, since I always overwrite the whole record (no partitial updates). > > I’m still not sure if having tombstones vs. empty values / frozen UDTs > will have the same results. > When I update one row with 10 null columns it will create 10 tombstones. > We do OLAP processing of data stored in Cassandra with Spark. > > When Spark requests range of data, lets say 1000 rows, I can easily hit > the 10 000 tombstones threshold. > > Even if I would not hit the error threshold Spark requests would increase > the heap pressure, because tombstones have to be collected and returned to > coordinator. > > Are my assumptions correct ? > > On 4 Jan 2019, at 21:15, DuyHai Doan <doanduy...@gmail.com> wrote: > > The idea of storing your data as a single blob can be dangerous. > > Indeed, you loose the ability to perform atomic update on each column. > > In Cassandra, LWW is the rule. Suppose 2 concurrent updates on the same > row, 1st update changes column Firstname (let's say it's a Person record) > and 2nd update changes column Lastname > > Now depending on the timestamp between the 2 updates, you'll have: > > - old Firstname, new Lastname > - new Firstname, old Lastname > > having updates on columns atomically guarantees you to have new Firstname, > new Lastname > > On Fri, Jan 4, 2019 at 8:17 PM Jonathan Haddad <j...@jonhaddad.com> wrote: > >> Those are two different cases though. It *sounds like* (again, I may be >> missing the point) you're trying to overwrite a value with another value. >> You're either going to serialize a blob and overwrite a single cell, or >> you're going to overwrite all the cells and include a tombstone. >> >> When you do a read, reading a single tombstone vs a single vs is >> essentially the same thing, performance wise. >> >> In your description you said "~ 20-100 events", and you're overwriting >> the event each time, so I don't know how you go to 10K tombstones either. >> Compaction will bring multiple tombstones together for a cell in the same >> way it compacts multiple values for a single cell. >> >> I sounds to make like you're taking some advice about tombstones out of >> context and trying to apply the advice to a different problem. Again, I >> might be misunderstanding what you're doing. >> >> >> On Fri, Jan 4, 2019 at 10:49 AM Tomas Bartalos <tomas.barta...@gmail.com> >> wrote: >> >>> Hello Jon, >>> >>> I thought having tombstones is much higher overhead than just >>> overwriting values. The compaction overhead can be l similar, but I think >>> the read performance is much worse. >>> >>> Tombstones accumulate and hang for 10 days (by default) before they are >>> eligible for compaction. >>> >>> Also we have tombstone warning and error thresholds. If cassandra scans >>> more than 10 000 tombstones, she will abort the query. >>> >>> According to this article: >>> https://opencredo.com/blogs/cassandra-tombstones-common-issues/ >>> >>> "The cassandra.yaml comments explain in perfectly: *“When executing a >>> scan, within or across a partition, we need to keep the tombstones seen in >>> memory so we can return them to the coordinator, which will use them to >>> make sure other replicas also know about the deleted rows. With workloads >>> that generate a lot of tombstones, this can cause performance problems and >>> even exhaust the server heap. "* >>> >>> Regards, >>> Tomas >>> >>> On Fri, 4 Jan 2019, 7:06 pm Jonathan Haddad <j...@jonhaddad.com wrote: >>> >>>> If you're overwriting values, it really doesn't matter much if it's a >>>> tombstone or any other value, they still need to be compacted and have the >>>> same overhead at read time. >>>> >>>> Tombstones are problematic when you try to use Cassandra as a queue (or >>>> something like a queue) and you need to scan over thousands of tombstones >>>> in order to get to the real data. You're simply overwriting a row and >>>> trying to avoid a single tombstone. >>>> >>>> Maybe I'm missing something here. Why do you think overwriting a >>>> single cell with a tombstone is any worse than overwriting a single cell >>>> with a value? >>>> >>>> Jon >>>> >>>> >>>> On Fri, Jan 4, 2019 at 9:57 AM Tomas Bartalos <tomas.barta...@gmail.com> >>>> wrote: >>>> >>>>> Hello, >>>>> >>>>> I beleive your approach is the same as using spark with " >>>>> spark.cassandra.output.ignoreNulls=true" >>>>> This will not cover the situation when a value have to be overwriten >>>>> with null. >>>>> >>>>> I found one possible solution - change the schema to keep only primary >>>>> key fields and move all other fields to frozen UDT. >>>>> create table (year, month, day, id, frozen<Event>, primary key((year, >>>>> month, day), id) ) >>>>> In this way anything that is null inside event doesn't create >>>>> tombstone, since event is serialized to BLOB. >>>>> The penalty is in need of deserializing the whole Event when selecting >>>>> only few columns. >>>>> Can anyone confirm if this is good solution performance wise? >>>>> >>>>> Thank you, >>>>> >>>>> On Fri, 4 Jan 2019, 2:20 pm DuyHai Doan <doanduy...@gmail.com wrote: >>>>> >>>>>> "The problem is I can't know the combination of set/unset values" --> >>>>>> Just for this requirement, Achilles has a working solution for many years >>>>>> using INSERT_NOT_NULL_FIELDS strategy: >>>>>> >>>>>> https://github.com/doanduyhai/Achilles/wiki/Insert-Strategy >>>>>> >>>>>> Or you can use the Update API that by design only perform update on >>>>>> not null fields: >>>>>> https://github.com/doanduyhai/Achilles/wiki/Quick-Reference#updating-all-non-null-fields-for-an-entity >>>>>> >>>>>> >>>>>> Behind the scene, for each new combination of INSERT INTO >>>>>> table(x,y,z) statement, Achilles will check its prepared statement cache >>>>>> and if the statement does not exist yet, create a new prepared statement >>>>>> and put it into the cache for later re-use for you >>>>>> >>>>>> Disclaiment: I'm the creator of Achilles >>>>>> >>>>>> >>>>>> >>>>>> On Thu, Dec 27, 2018 at 10:21 PM Tomas Bartalos < >>>>>> tomas.barta...@gmail.com> wrote: >>>>>> >>>>>>> Hello, >>>>>>> >>>>>>> The problem is I can't know the combination of set/unset values. >>>>>>> From my perspective every value should be set. The event from Kafka >>>>>>> represents the complete state of the happening at certain point in >>>>>>> time. In >>>>>>> my table I want to store the latest event so the most recent state of >>>>>>> the >>>>>>> happening (in this table I don't care about the history). Actually I >>>>>>> used >>>>>>> wrong expression since its just the opposite of "incremental update", >>>>>>> every >>>>>>> event carries all data (state) for specific point of time. >>>>>>> >>>>>>> The event is represented with nested json structure. Top level >>>>>>> elements of the json are table fields with type like text, boolean, >>>>>>> timestamp, list and the nested elements are UDT fields. >>>>>>> >>>>>>> Simplified example: >>>>>>> There is a new purchase for the happening, event: >>>>>>> {total_amount: 50, items : [A, B, C, new_item], purchase_time : >>>>>>> '2018-12-27 13:30', specials: null, customer : {... }, fare_amount,...} >>>>>>> I don't know what actually happened for this event, maybe there is a >>>>>>> new item purchased, maybe some customer info have been changed, maybe >>>>>>> the >>>>>>> specials have been revoked and I have to reset them. I just need to >>>>>>> store >>>>>>> the state as it artived from Kafka, there might already be an event for >>>>>>> this happening saved before, or maybe this is the first one. >>>>>>> >>>>>>> BR, >>>>>>> Tomas >>>>>>> >>>>>>> >>>>>>> On Thu, 27 Dec 2018, 9:36 pm Eric Stevens <migh...@gmail.com wrote: >>>>>>> >>>>>>>> Depending on the use case, creating separate prepared statements >>>>>>>> for each combination of set / unset values in large INSERT/UPDATE >>>>>>>> statements may be prohibitive. >>>>>>>> >>>>>>>> Instead, you can look into driver level support for UNSET values. >>>>>>>> Requires Cassandra 2.2 or later IIRC. >>>>>>>> >>>>>>>> See: >>>>>>>> Java Driver: >>>>>>>> https://docs.datastax.com/en/developer/java-driver/3.0/manual/statements/prepared/#parameters-and-binding >>>>>>>> Python Driver: >>>>>>>> https://www.datastax.com/dev/blog/python-driver-2-6-0-rc1-with-cassandra-2-2-features#distinguishing_between_null_and_unset_values >>>>>>>> Node Driver: >>>>>>>> https://docs.datastax.com/en/developer/nodejs-driver/3.5/features/datatypes/nulls/#unset >>>>>>>> >>>>>>>> On Thu, Dec 27, 2018 at 3:21 PM Durity, Sean R < >>>>>>>> sean_r_dur...@homedepot.com> wrote: >>>>>>>> >>>>>>>>> You say the events are incremental updates. I am interpreting this >>>>>>>>> to mean only some columns are updated. Others should keep their >>>>>>>>> original >>>>>>>>> values. >>>>>>>>> >>>>>>>>> You are correct that inserting null creates a tombstone. >>>>>>>>> >>>>>>>>> Can you only insert the columns that actually have new values? >>>>>>>>> Just skip the columns with no information. (Make the insert generator >>>>>>>>> a bit >>>>>>>>> smarter.) >>>>>>>>> >>>>>>>>> Create table happening (id text primary key, event text, a text, b >>>>>>>>> text, c text); >>>>>>>>> Insert into table happening (id, event, a, b, c) values >>>>>>>>> ("MainEvent","The most complete info we have right >>>>>>>>> now","Priceless","10 >>>>>>>>> pm","Grand Ballroom"); >>>>>>>>> -- b changes >>>>>>>>> Insert into happening (id, b) values ("MainEvent","9:30 pm"); >>>>>>>>> >>>>>>>>> >>>>>>>>> Sean Durity >>>>>>>>> >>>>>>>>> >>>>>>>>> -----Original Message----- >>>>>>>>> From: Tomas Bartalos <tomas.barta...@gmail.com> >>>>>>>>> Sent: Thursday, December 27, 2018 9:27 AM >>>>>>>>> To: user@cassandra.apache.org >>>>>>>>> Subject: [EXTERNAL] Howto avoid tombstones when inserting NULL >>>>>>>>> values >>>>>>>>> >>>>>>>>> Hello, >>>>>>>>> >>>>>>>>> I’d start with describing my use case and how I’d like to use >>>>>>>>> Cassandra to solve my storage needs. >>>>>>>>> We're processing a stream of events for various happenings. Every >>>>>>>>> event have a unique happening_id. >>>>>>>>> One happening may have many events, usually ~ 20-100 events. I’d >>>>>>>>> like to store only the latest event for the same happening (Event is >>>>>>>>> an >>>>>>>>> incremental update and it contains all up-to date data about >>>>>>>>> happening). >>>>>>>>> Technically the events are streamed from Kafka, processed with >>>>>>>>> Spark an saved to Cassandra. >>>>>>>>> In Cassandra we use upserts (insert with same primary key). So >>>>>>>>> far so good, however there comes the tombstone... >>>>>>>>> >>>>>>>>> When I’m inserting field with NULL value, Cassandra creates >>>>>>>>> tombstone for this field. As I understood this is due to space >>>>>>>>> efficiency, >>>>>>>>> Cassandra doesn’t have to remember there is a NULL value, she just >>>>>>>>> deletes >>>>>>>>> the respective column and a delete creates a ... tombstone. >>>>>>>>> I was hoping there could be an option to tell Cassandra not to be >>>>>>>>> so space effective and store “unset" info without generating >>>>>>>>> tombstones. >>>>>>>>> Something similar to inserting empty strings instead of null >>>>>>>>> values: >>>>>>>>> >>>>>>>>> CREATE TABLE happening (id text PRIMARY KEY, event text); insert >>>>>>>>> into happening (‘1’, ‘event1’); — tombstone is generated insert into >>>>>>>>> happening (‘1’, null); — tombstone is not generated insert into >>>>>>>>> happening >>>>>>>>> (‘1’, '’); >>>>>>>>> >>>>>>>>> Possible solutions: >>>>>>>>> 1. Disable tombstones with gc_grace_seconds = 0 or set to >>>>>>>>> reasonable low value (1 hour ?) . Not good, since phantom data may >>>>>>>>> re-appear 2. ignore NULLs on spark side with >>>>>>>>> “spark.cassandra.output.ignoreNulls=true”. Not good since this will >>>>>>>>> never >>>>>>>>> overwrite previously inserted event field with “empty” one. >>>>>>>>> 3. On inserts with spark, find all NULL values and replace them >>>>>>>>> with “empty” equivalent (empty string for text, 0 for integer). Very >>>>>>>>> inefficient and problematic to find “empty” equivalent for some data >>>>>>>>> types. >>>>>>>>> >>>>>>>>> Until tombstones appeared Cassandra was the right fit for our use >>>>>>>>> case, however now I’m not sure if we’re heading the right direction. >>>>>>>>> Could you please give me some advice how to solve this problem ? >>>>>>>>> >>>>>>>>> Thank you, >>>>>>>>> Tomas >>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >>>>>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org >>>>>>>>> >>>>>>>>> >>>>>>>>> ________________________________ >>>>>>>>> >>>>>>>>> The information in this Internet Email is confidential and may be >>>>>>>>> legally privileged. It is intended solely for the addressee. Access >>>>>>>>> to this >>>>>>>>> Email by anyone else is unauthorized. If you are not the intended >>>>>>>>> recipient, any disclosure, copying, distribution or any action taken >>>>>>>>> or >>>>>>>>> omitted to be taken in reliance on it, is prohibited and may be >>>>>>>>> unlawful. >>>>>>>>> When addressed to our clients any opinions or advice contained in this >>>>>>>>> Email are subject to the terms and conditions expressed in any >>>>>>>>> applicable >>>>>>>>> governing The Home Depot terms of business or client engagement >>>>>>>>> letter. The >>>>>>>>> Home Depot disclaims all responsibility and liability for the >>>>>>>>> accuracy and >>>>>>>>> content of this attachment and for any damages or losses arising from >>>>>>>>> any >>>>>>>>> inaccuracies, errors, viruses, e.g., worms, trojan horses, etc., or >>>>>>>>> other >>>>>>>>> items of a destructive nature, which may be contained in this >>>>>>>>> attachment >>>>>>>>> and shall not be liable for direct, indirect, consequential or special >>>>>>>>> damages in connection with this e-mail message or its attachment. >>>>>>>>> >>>>>>>>> >>>>>>>>> --------------------------------------------------------------------- >>>>>>>>> To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org >>>>>>>>> For additional commands, e-mail: user-h...@cassandra.apache.org >>>>>>>>> >>>>>>>> >>>> >>>> -- >>>> Jon Haddad >>>> http://www.rustyrazorblade.com >>>> twitter: rustyrazorblade >>>> >>> >> >> -- >> Jon Haddad >> http://www.rustyrazorblade.com >> twitter: rustyrazorblade >> > > -- Jon Haddad http://www.rustyrazorblade.com twitter: rustyrazorblade