Samuel Klock created CASSANDRA-13109:
----------------------------------------

             Summary: Lightweight transactions temporarily fail after upgrade from 2.1 to 3.0
                 Key: CASSANDRA-13109
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-13109
             Project: Cassandra
          Issue Type: Bug
            Reporter: Samuel Klock


We've observed this upgrading from 2.1.15 to 3.0.8 and from 2.1.16 to 3.0.10: 
some lightweight transactions executed on upgraded nodes fail with a 
{{ReadFailure}} error.  The following conditions seem relevant to reproducing 
it:

* The transaction must be conditioned on the current value of at least one 
column; {{IF NOT EXISTS}} transactions, for example, don't seem to be affected 
(a contrast is sketched just after this list).
* There should be a collection column (in our case, a map) defined on the table 
on which the transaction is executed.
* The transaction should be executed before sstables on the node are upgraded.  
The failure does not occur after the sstables have been upgraded (whether via 
{{nodetool upgradesstables}} or effectively via compaction).
* Upgraded nodes seem to be able to participate in lightweight transactions as 
long as they're not the coordinator.
* The values in the row being manipulated by the transaction must have always 
been manipulated via lightweight transactions (perhaps the existence of Paxos 
state for the partition is somehow relevant?).
* In 3.0.10, it _seems_ to be necessary to have the partition split across 
multiple legacy sstables.  This was not necessary to reproduce the bug in 
3.0.8 or 3.0.9.
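
For illustration, using the schema from the reproduction steps below (and 
given the other conditions above), a conditional update of the first shape 
triggers the failure on an upgraded coordinator, while the {{IF NOT EXISTS}} 
form did not in our testing.  The values here are arbitrary:
{code:sql}
-- Affected: conditioned on the current value of a column.
UPDATE test.test SET cas_target = 'x' WHERE key = 'key' IF cas_target = 'value';

-- Not affected (in our testing): conditioned only on the row's existence.
INSERT INTO test.test (key, cas_target) VALUES ('otherkey', 'x') IF NOT EXISTS;
{code}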

For applications affected by this bug, a possible workaround is to prevent 
nodes being upgraded from coordinating requests until their sstables have been 
upgraded; one way to do this is sketched below.
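
For example, on each node being upgraded, something like the following 
sequence should work, assuming clients reach the cluster only via the native 
protocol and/or Thrift (a sketch of the idea, not a procedure we've hardened):
{noformat}
# After the upgraded node starts, stop it from accepting client
# connections, and hence from coordinating client requests:
nodetool disablebinary
nodetool disablethrift

# Rewrite the legacy sstables, then re-enable client traffic:
nodetool upgradesstables
nodetool enablebinary
nodetool enablethrift
{noformat}
The node can still act as a replica for transactions coordinated elsewhere, 
which is consistent with our observation that upgraded nodes only fail when 
they coordinate.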

We're able to reproduce this when upgrading from 2.1.16 to 3.0.10 with the 
following steps on a single-node cluster using a mostly pristine 
{{cassandra.yaml}} from the source distribution.

# Start Cassandra-2.1.16 on the node.
# Create a table with a collection column and insert some data into it.
{code:sql}
CREATE KEYSPACE test WITH REPLICATION = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE test.test (key TEXT PRIMARY KEY, cas_target TEXT, some_collection MAP<TEXT, TEXT>);
INSERT INTO test.test (key, cas_target, some_collection) VALUES ('key', 'value', {}) IF NOT EXISTS;
{code}
# Flush the row to an sstable: {{nodetool flush}}.
# Update the row:
{code:sql}
UPDATE test.test SET cas_target = 'newvalue', some_collection = {} WHERE key = 'key' IF cas_target = 'value';
{code}
# Drain the node: {{nodetool drain}}.
# Stop the node, upgrade to 3.0.10, and start the node.
# Attempt to update the row again:
{code:sql}
UPDATE test.test SET cas_target = 'lastvalue' WHERE key = 'key' IF cas_target = 'newvalue';
{code}
Using {{cqlsh}}, if the error is reproduced, output like the following will be 
returned (this capture happens to re-run the step-4 conditional {{UPDATE}}):
{code:sql}
$ ./cqlsh <<< "UPDATE test.test SET cas_target = 'newvalue', some_collection = {} WHERE key = 'key' IF cas_target = 'value';"
(start: 2016-12-22 10:14:27 EST)
<stdin>:2:ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'QUORUM'}
{code}
and the following stack trace will be present in the system log:
{noformat}
WARN  15:14:28 Uncaught exception on thread Thread[SharedPool-Worker-10,10,main]: {}
java.lang.RuntimeException: java.lang.NullPointerException
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2476) ~[main/:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
Caused by: java.lang.NullPointerException: null
        at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:617) ~[main/:na]
        at org.apache.cassandra.db.rows.Row$Merger$ColumnDataReducer.getReduced(Row.java:569) ~[main/:na]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:220) ~[main/:na]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159) ~[main/:na]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
        at org.apache.cassandra.db.rows.Row$Merger.merge(Row.java:546) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:563) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator$MergeReducer.getReduced(UnfilteredRowIterators.java:527) ~[main/:na]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:220) ~[main/:na]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159) ~[main/:na]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:509) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIterators$UnfilteredRowMergeIterator.computeNext(UnfilteredRowIterators.java:369) ~[main/:na]
        at org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47) ~[main/:na]
        at org.apache.cassandra.db.partitions.AbstractBTreePartition.build(AbstractBTreePartition.java:334) ~[main/:na]
        at org.apache.cassandra.db.partitions.ImmutableBTreePartition.create(ImmutableBTreePartition.java:111) ~[main/:na]
        at org.apache.cassandra.db.partitions.ImmutableBTreePartition.create(ImmutableBTreePartition.java:94) ~[main/:na]
        at org.apache.cassandra.db.SinglePartitionReadCommand.add(SinglePartitionReadCommand.java:810) ~[main/:na]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndSSTablesInTimestampOrder(SinglePartitionReadCommand.java:760) ~[main/:na]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDiskInternal(SinglePartitionReadCommand.java:519) ~[main/:na]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryMemtableAndDisk(SinglePartitionReadCommand.java:496) ~[main/:na]
        at org.apache.cassandra.db.SinglePartitionReadCommand.queryStorage(SinglePartitionReadCommand.java:358) ~[main/:na]
        at org.apache.cassandra.db.ReadCommand.executeLocally(ReadCommand.java:394) ~[main/:na]
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1794) ~[main/:na]
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2472) ~[main/:na]
        ... 5 common frames omitted
{noformat}
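
Consistent with the third condition above, once the legacy sstables have been 
rewritten the same statement goes through; e.g., we'd expect something like:
{noformat}
$ nodetool upgradesstables test test
$ ./cqlsh <<< "UPDATE test.test SET cas_target = 'lastvalue' WHERE key = 'key' IF cas_target = 'newvalue';"

 [applied]
-----------
      True
{noformat}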

Under both 3.0.8 and 3.0.9, the {{nodetool flush}} and the additional 
{{UPDATE}} statement before upgrading to 3.0 are not necessary to reproduce 
this.  In that case (perhaps because Cassandra only has to read the data from 
one sstable?), a different stack trace appears in the log.  Here's a sample 
from 3.0.8:

{noformat}
 WARN [SharedPool-Worker-3] 2016-12-13 15:19:48,863 AbstractLocalAwareExecutorService.java (line 169) Uncaught exception on thread Thread[SharedPool-Worker-3,5,main]: {}
java.lang.RuntimeException: java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2453) ~[main/:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
Caused by: java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
        at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:531) ~[main/:na]
        at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:465) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:178) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[main/:na]
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:300) ~[main/:na]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[main/:na]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[main/:na]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[main/:na]
        at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[main/:na]
        at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[main/:na]
        at org.apache.cassandra.service.StorageProxy$LocalReadRunnable.runMayThrow(StorageProxy.java:1796) ~[main/:na]
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2449) ~[main/:na]
        ... 5 common frames omitted
 WARN [SharedPool-Worker-1] 2016-12-13 15:19:48,943 AbstractLocalAwareExecutorService.java (line 169) Uncaught exception on thread Thread[SharedPool-Worker-1,5,main]: {}
java.lang.IllegalStateException: [ColumnDefinition{name=REDACTED, type=org.apache.cassandra.db.marshal.UTF8Type, kind=REGULAR, position=-1}, ColumnDefinition{name=REDACTED2, type=org.apache.cassandra.db.marshal.MapType(org.apache.cassandra.db.marshal.UTF8Type,org.apache.cassandra.db.marshal.UTF8Type), kind=REGULAR, position=-1}] is not a subset of [REDACTED]
        at org.apache.cassandra.db.Columns$Serializer.encodeBitmap(Columns.java:531) ~[main/:na]
        at org.apache.cassandra.db.Columns$Serializer.serializeSubset(Columns.java:465) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:178) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:108) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredSerializer.serialize(UnfilteredSerializer.java:96) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:132) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:87) ~[main/:na]
        at org.apache.cassandra.db.rows.UnfilteredRowIteratorSerializer.serialize(UnfilteredRowIteratorSerializer.java:77) ~[main/:na]
        at org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$Serializer.serialize(UnfilteredPartitionIterators.java:300) ~[main/:na]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.build(ReadResponse.java:134) ~[main/:na]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:127) ~[main/:na]
        at org.apache.cassandra.db.ReadResponse$LocalDataResponse.<init>(ReadResponse.java:123) ~[main/:na]
        at org.apache.cassandra.db.ReadResponse.createDataResponse(ReadResponse.java:65) ~[main/:na]
        at org.apache.cassandra.db.ReadCommand.createResponse(ReadCommand.java:289) ~[main/:na]
        at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:47) ~[main/:na]
        at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[main/:na]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_101]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) ~[main/:na]
        at org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$LocalSessionFutureTask.run(AbstractLocalAwareExecutorService.java:136) [main/:na]
        at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) [main/:na]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
{noformat}

It's not clear to us what changed in 3.0.10 to make this behavior somewhat more 
difficult to reproduce.

We spent some time trying to track down the cause in 3.0.8, and we've 
identified a very small patch (which I will attach to this issue) that _seems_ 
to fix it.  The problem appears to be that the logic that reads data from 
legacy sstables can pull in range tombstones covering collection columns that 
weren't requested by the read, which then breaks downstream logic that doesn't 
expect those tombstones to be present in the data.  The patch attempts to 
include those tombstones only if their columns are explicitly requested.  
However, there's enough going on in that logic that it's not clear to us 
whether the change is safe, so it definitely needs review from someone 
knowledgeable about what that area of the code is intended to do.
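
To make our reading of the failure mode concrete in terms of the repro above 
(an interpretation, not something we've traced through the code): the CAS read 
only needs the conditioned column, but the legacy sstable still holds a range 
tombstone over the collection, presumably written when the statements set 
{{some_collection}} to an empty map.
{code:sql}
-- The conditioned read needs only cas_target, roughly equivalent to:
SELECT cas_target FROM test.test WHERE key = 'key';

-- But the legacy sstable also contains a range tombstone covering
-- some_collection (from the earlier "some_collection = {}" writes); the
-- legacy read path surfaces it even though some_collection was never
-- requested, which trips the downstream merge/serialization logic.
{code}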


