Robert Stupp created CASSANDRA-15035:
----------------------------------------

             Summary: C* 3.0 sstables w/ UDTs are corrupted in 3.11 + 4.0
                 Key: CASSANDRA-15035
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15035
             Project: Cassandra
          Issue Type: Bug
          Components: Feature/UDT, Local/SSTable
            Reporter: Robert Stupp
            Assignee: Robert Stupp
             Fix For: 3.11.5, 4.0


OSS C* 3.0 writes incorrect type information for UDTs into the 
serialization-header of each sstable.

In C* 3.0, both UDTs and tuples are always frozen. A frozen type must be 
enclosed in a “bracket” in the schema and in the serialization-header: 
{{frozen<...>}} in the {{CQL3Type}} hierarchy, or 
{{org.apache.cassandra.db.marshal.FrozenType(...)}} in the {{AbstractType}} 
hierarchy.

Since CASSANDRA-7423 (committed to C* 3.6) UDTs can also be non-frozen (= 
multi-cell).

Unfortunately, C* 3.0 does not write the 
{{org.apache.cassandra.db.marshal.FrozenType(...)}} “bracket” for UDTs into the 
{{SerializationHeader.Component}} in the {{-Statistics.db}} sstable component.

The order in which columns of a row are serialized depends on the concrete 
{{AbstractType}}. Columns with variable length types (frozen types belong to 
this category) are serialized before columns with multi-cell types (non-frozen 
types belong to that category).

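If C* 3.6 (or any newer version) reads an sstable written by C* 3.0 (up to 
3.5), it will read the type information “non-frozen UDT” from the serialization 
header. That is a technically correct reading of the header, but the data was 
written as a frozen UDT, so the reader expects a column order and layout that 
does not match what is on disk.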
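
To make the ordering mismatch concrete, here is a toy model in Python. It is only a sketch: the {{Column}} class, the column names, and the tie-break by name are assumptions for illustration, not Cassandra's actual comparator. It only shows that flipping one column from single-cell to multi-cell changes the order a reader expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    name: str
    multi_cell: bool  # True for non-frozen (multi-cell) types


def serialization_order(columns):
    # Simplified model: single-cell columns first, multi-cell columns last.
    # Sorting by name inside each group is an assumption for this sketch.
    return [c.name for c in sorted(columns, key=lambda c: (c.multi_cell, c.name))]


# How the 3.0 writer laid out the row: the UDT column "address" is frozen.
written = [Column("address", False), Column("zip", False), Column("tags", True)]

# What a 3.6+ reader assumes when the header lacks FrozenType(...) for the UDT.
assumed = [Column("address", True), Column("zip", False), Column("tags", True)]

writer_order = serialization_order(written)
reader_order = serialization_order(assumed)
print(writer_order, reader_order)
```

In this model the writer's order is address, zip, tags while the reader expects zip, address, tags, so the reader starts decoding the wrong bytes for the first column.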

This means that upgrades from C* 3.0 to C* 3.11 or 4.0, with a schema that 
uses UDTs, result in inaccessible data in those sstables. Reads against 3.0 
sstables, as well as attempts to scrub these sstables, result in a wide variety 
of errors/exceptions ({{CorruptSSTableException}}, {{EOFException}}, 
{{OutOfMemoryError}}, etc.), as usual in such cases.

Mitigation strategy in the proposed patch:
* Fix the broken serialization-headers automatically when an upgrade from C* 
3.0 is detected.
* Enhance {{sstablescrub}} to verify the serialization-header against the 
schema and allow {{sstablescrub}} to fix the UDT types according to the 
information in the schema. This does not apply to "online scrub" (e.g. nodetool 
scrub). The behavior of {{sstablescrub}} has been changed to first inspect the 
serialization-header and verify the type information against the schema. 

Differences between the schema and the sstable serialization-headers cause 
{{sstablescrub}} to error out and stop - i.e. safety first (there’s a way to 
opt-out though).
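
The "verify, then stop on any difference" behavior can be sketched as follows. This is not the code from the patch: the function names, the {{skip_check}} parameter standing in for the opt-out, and the abbreviated {{UserType(ks,address)}} string are all hypothetical. Real headers carry full type definitions.

```python
# Abbreviated, illustrative type strings (assumption: real ones are longer).
FROZEN = "org.apache.cassandra.db.marshal.FrozenType"
UDT = "org.apache.cassandra.db.marshal.UserType(ks,address)"
INT = "org.apache.cassandra.db.marshal.Int32Type"


def find_mismatches(header_types, schema_types):
    """Map each column whose header type differs from the schema type
    to the pair (header_type, schema_type)."""
    return {
        col: (header_types[col], schema_types[col])
        for col in header_types
        if col in schema_types and header_types[col] != schema_types[col]
    }


def check_header(header_types, schema_types, skip_check=False):
    """Raise on any difference unless the caller explicitly opts out."""
    mismatches = find_mismatches(header_types, schema_types)
    if mismatches and not skip_check:
        raise ValueError(
            f"serialization-header differs from schema: {sorted(mismatches)}"
        )
    return mismatches


# A 3.0-style header: the UDT column lacks the FrozenType(...) wrapper.
header = {"id": INT, "addr": UDT}
schema = {"id": INT, "addr": f"{FROZEN}({UDT})"}
```

Calling {{check_header(header, schema)}} raises here because of the {{addr}} column, while passing {{skip_check=True}} returns the differences without stopping.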

A new class {{SSTableHeaderFix}} can inspect the serialization-header 
({{SerializationHeader.Component}}) in the {{-Statistics.db}} component and 
fix the type information in those sstables for UDTs according to the schema 
information.
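
The idea behind the repair can be sketched in a few lines of Python. This is a hypothetical illustration, not the {{SSTableHeaderFix}} implementation: it works on simplified type strings, handles only a top-level UDT column, and the helper name {{fix_udt_type}} is made up for the example.

```python
FROZEN_PREFIX = "org.apache.cassandra.db.marshal.FrozenType("
USER_TYPE_PREFIX = "org.apache.cassandra.db.marshal.UserType("


def fix_udt_type(header_type, schema_type):
    """Return the type string the header should carry for one column.

    Only the case from the ticket is repaired: the header holds a bare
    UserType(...) while the schema holds FrozenType(UserType(...)) around
    the very same definition. Every other combination is left untouched.
    """
    is_bare_udt = header_type.startswith(USER_TYPE_PREFIX)
    if is_bare_udt and schema_type == FROZEN_PREFIX + header_type + ")":
        return schema_type
    return header_type


# Abbreviated, illustrative UDT definition (assumption: real ones are longer).
bare_udt = USER_TYPE_PREFIX + "ks,address)"
frozen_udt = FROZEN_PREFIX + bare_udt + ")"
int_type = "org.apache.cassandra.db.marshal.Int32Type"
```

The schema is treated as the source of truth here: a header entry is rewritten only when wrapping it in {{FrozenType(...)}} yields exactly the schema's type.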

This new class could be used during verify and before sstables are imported. 
But changes to “verify” and “import” are out of the scope of this ticket, as 
the patch is already bigger than I originally expected.

Another issue not tackled by this ticket is that the wrong ‘kind’ is written to 
the type information in {{system_schema.dropped_columns}} when a non-frozen UDT 
column is dropped. When a UDT column is dropped, the type of the dropped column 
is converted from the UDT definition to its “corresponding” tuple type 
definition. All versions currently write {{frozen<tuple<...>>}}, but for 
non-frozen UDTs it should actually just be {{tuple<...>}}. Unfortunately, there 
is nothing that could be done in this ticket to fix (or even consider) the type 
information of a dropped column. But for correctness, the tuple type should be 
a multi-cell one (only accessible for dropped UDTs though - not as something 
that a user can create as a type).
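
For illustration, the intended mapping from a dropped UDT column to its tuple type could look like the following Python sketch. The function name and its inputs are hypothetical; it only shows the result the paragraph above argues for, not how Cassandra builds the value written to {{system_schema.dropped_columns}}.

```python
def dropped_column_type(field_types, udt_is_frozen):
    """Build the CQL type string for a dropped UDT column.

    field_types: CQL type names of the UDT's fields, in declaration order.
    udt_is_frozen: whether the dropped column used a frozen UDT.
    """
    tuple_type = "tuple<" + ", ".join(field_types) + ">"
    # Keep the frozen<...> wrapper only if the UDT column itself was frozen.
    return f"frozen<{tuple_type}>" if udt_is_frozen else tuple_type


frozen_case = dropped_column_type(["text", "int"], udt_is_frozen=True)
multi_cell_case = dropped_column_type(["text", "int"], udt_is_frozen=False)
print(frozen_case, multi_cell_case)
```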



