I agree that inserting null is not as good as omitting that column
entirely when you are confident that you are not shadowing any underlying
data. But pragmatically speaking, it doesn't sound like a small number of
incidental nulls/tombstones (< 20% of columns, otherwise CASSANDRA-3442
takes over) is going to have any practical performance impact, either in
your query patterns or in compaction.
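
For what it's worth, skipping the null columns on the client side is not much
work. Below is a rough sketch in Python, not tied to any particular driver:
build_insert and the "users" table are made-up names for illustration, and the
%s placeholder style is an assumption about how your driver binds parameters.
It is only safe when you know the omitted columns aren't shadowing existing
data.

```python
def build_insert(table, row):
    """Build an INSERT that leaves out columns whose value is None.

    Hypothetical helper: omitting a column avoids writing a tombstone for it,
    but also leaves any previously written value for that column in place.
    """
    # Keep only the columns that actually carry a value.
    present = {col: val for col, val in row.items() if val is not None}
    if not present:
        raise ValueError("row has no non-null columns to insert")

    columns = ", ".join(present)
    placeholders = ", ".join(["%s"] * len(present))  # assumed placeholder style
    cql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"
    return cql, list(present.values())


if __name__ == "__main__":
    cql, params = build_insert("users", {"id": 1, "name": "a", "email": None})
    print(cql, params)
```

The trade-off is that each distinct set of non-null columns produces a
different statement, so if you rely on prepared statements you would end up
preparing one per column combination.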

If INSERT of null values is problematic for small portions of your data,
then it stands to reason that an INSERT option instructing Cassandra to
skip tombstone creation would be an important performance optimization
(and would also address the fact that non-null collections generate
tombstones on INSERT).  Something like:

    INSERT INTO ... USING no_tombstones;


> There's thresholds (log messages, etc.) which operate on tombstone counts
> over a certain number, but not on column counts over the same number.

tombstone_warn_threshold and tombstone_failure_threshold only apply to
clustering scans, right?  I.e., tombstones don't count against those
thresholds if they are not part of the clustering key column being
considered for the non-EQ relation?  The documentation certainly implies so:

tombstone_warn_threshold
<http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_warn_threshold>
(Default: 1000) The maximum number of tombstones a query can scan before
warning.

tombstone_failure_threshold
<http://docs.datastax.com/en/cassandra/2.0/cassandra/configuration/configCassandra_yaml_r.html?scroll=reference_ds_qfg_n1r_1k__tombstone_failure_threshold>
(Default: 100000) The maximum number of tombstones a query can scan before
aborting.

On Wed, Apr 29, 2015 at 12:42 PM, Robert Coli <rc...@eventbrite.com> wrote:

> On Wed, Apr 29, 2015 at 9:16 AM, Eric Stevens <migh...@gmail.com> wrote:
>
>> In the end, inserting a tombstone into a non-clustered column shouldn't
>> be appreciably worse (if it is at all) than inserting a value instead.  Or
>> am I missing something here?
>>
>
> There's thresholds (log messages, etc.) which operate on tombstone counts
> over a certain number, but not on column counts over the same number.
>
> Given that tombstones are often smaller than data columns, sorta hard to
> understand conceptually?
>
> =Rob
>
>
