[ 
https://issues.apache.org/jira/browse/CASSANDRA-4835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-4835:
----------------------------------------

    Attachment: 0002-Ensure-same-timestamp-in-batches.txt
                0001-Fix-prepends-within-same-millis.txt

Alright, this is in fact a legitimate bug in prepend and is not specific to 
batches (though it's probably harder to reproduce without them). Basically, the 
logic in prepend that makes sure we always generate decreasing keys, even 
within the same millisecond, was broken. It worked within a single update, but 
was broken for successive updates in the same millisecond. Patch attached to 
fix that.
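
To illustrate the property prepend needs, here is a toy model in Python. It is not the attached patch and not Cassandra's code: the `PrependKeyGenerator` class, its `next_key` method, and the integer key layout are all hypothetical. The point is only that the "last key handed out" has to be remembered across updates, so that two prepends landing in the same millisecond still get strictly decreasing keys.

```python
class PrependKeyGenerator:
    """Hypothetical helper that hands out strictly decreasing sort keys."""

    def __init__(self) -> None:
        # State kept across updates, not just within a single update.
        self._last = None

    def next_key(self, now_millis: int) -> int:
        # Later wall-clock times map to smaller keys, so newer prepends sort first.
        candidate = -now_millis * 1000
        # Same millisecond (or a clock that did not advance): step below the last key.
        if self._last is not None and candidate >= self._last:
            candidate = self._last - 1
        self._last = candidate
        return candidate


gen = PrependKeyGenerator()
# Three successive prepends in millisecond 5, then one in millisecond 6.
keys = [gen.next_key(5), gen.next_key(5), gen.next_key(5), gen.next_key(6)]
print(keys)
```

If the generator forgot `_last` between updates, the three prepends in millisecond 5 would all receive the same key, and their relative order would no longer be determined by the order in which they were issued.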

That being said, I do think that people should be very careful about assuming 
that statements in a batch are applied in order *even within the same row*, 
because that's just not true in general. A batch applies everything "at the 
same time". So for instance:
{noformat}
BEGIN BATCH
  UPDATE user SET name = 'Goo' WHERE userid = 1;
  UPDATE user SET name = 'Foo' WHERE userid = 1;
APPLY BATCH
{noformat}
will always (that's not quite true currently, see below) end up setting 'Goo' 
as the name because, the way the reconciliation rules work, the biggest value 
wins on equal timestamps. Similarly,
{noformat}
BEGIN BATCH
  DELETE FROM user WHERE userid = 1;
  UPDATE user SET name = 'Foo' WHERE userid = 1;
APPLY BATCH
{noformat}
will always (again, see below) end up with the user deleted because, on 
timestamp ties, the tombstone wins.
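
The two tie-breaking rules above can be sketched as a small Python model. This is a simplification for illustration, not Cassandra's reconciliation code: the `Cell` type and `reconcile` function are hypothetical, a `None` value stands in for a tombstone, and plain string comparison stands in for whatever value comparison the server actually performs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cell:
    timestamp: int
    value: Optional[str]  # None models a tombstone (a deletion marker)


def reconcile(a: Cell, b: Cell) -> Cell:
    """Toy model of the rules described above (assumed, simplified)."""
    # Different timestamps: the most recent write wins.
    if a.timestamp != b.timestamp:
        return a if a.timestamp > b.timestamp else b
    # Equal timestamps: a tombstone beats a live value.
    if a.value is None:
        return a
    if b.value is None:
        return b
    # Equal timestamps, both live: the biggest value wins.
    return a if a.value >= b.value else b


goo = Cell(timestamp=10, value="Goo")
foo = Cell(timestamp=10, value="Foo")
deleted = Cell(timestamp=10, value=None)

print(reconcile(goo, foo))      # the 'Goo' cell, whatever the statement order
print(reconcile(foo, deleted))  # the tombstone
```

Note that in this model the result does not depend on the argument order, which is the sense in which the order of statements in the batch is irrelevant once the timestamps are equal.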

In other words, there was indeed a bug with prepend. Append/prepend do respect 
the order in batches within the same partition key, because we happen to 
process the statements of a batch in order and there is no good reason to do 
otherwise. But I don't think we should make that a guarantee either (as in, 
it's true now, it could change tomorrow, it's an implementation detail). So 
users shouldn't rely on it, and if the order is important, they should combine 
the operations into one statement.

Now, it is unrelated to lists, but when I said that
{noformat}
BEGIN BATCH
  UPDATE user SET name = 'Goo' WHERE userid = 1;
  UPDATE user SET name = 'Foo' WHERE userid = 1;
APPLY BATCH
{noformat}
will always end up with 'Goo', it's not quite true currently, because batches 
don't guarantee that all updates will use the same timestamp (in other words, 
the result of the batch above randomly depends on the timing of the operation). 
I think that *that* is a guarantee we should provide: unless the timestamp is 
user-provided, all statements of a batch use the same timestamp. I'm attaching 
a second patch that implements that.
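
As a rough sketch of that guarantee (not the attached patch): pick one timestamp for the whole batch and give it to every statement that does not carry its own. The function name `assign_batch_timestamps`, the `now_micros` parameter, and the use of microseconds are assumptions made for this Python illustration; `None` stands for a statement without a user-provided timestamp.

```python
import time
from typing import List, Optional


def assign_batch_timestamps(
    user_timestamps: List[Optional[int]],
    now_micros: Optional[int] = None,
) -> List[int]:
    """Return one timestamp per statement of a batch (hypothetical helper)."""
    # Read the clock once per batch, not once per statement.
    if now_micros is None:
        now_micros = int(time.time() * 1_000_000)
    # User-provided timestamps are kept; every other statement shares now_micros.
    return [ts if ts is not None else now_micros for ts in user_timestamps]


# Three statements; the second one carries its own timestamp.
print(assign_batch_timestamps([None, 42, None], now_micros=100))
```

With a shared timestamp, the outcome of two conflicting updates in a batch is decided by the tie-breaking rules alone rather than by how much time happened to pass between the statements.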

                
> Appending/Prepending items to list using BATCH
> ----------------------------------------------
>
>                 Key: CASSANDRA-4835
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4835
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 1.2.0 beta 1
>            Reporter: Krzysztof Cieslinski
>            Assignee: Sylvain Lebresne
>            Priority: Minor
>             Fix For: 1.2.0 beta 2
>
>         Attachments: 0001-Fix-prepends-within-same-millis.txt, 
> 0002-Ensure-same-timestamp-in-batches.txt
>
>
> As far as I know, there is no guarantee that commands inside a BATCH 
> block will execute in the same order as they are stored in the BATCH block. 
> But...
> I have made two tests:
> The first appends some items to an empty list, and the second prepends 
> items, also to an empty list. Both of them use UPDATE commands stored 
> in a BATCH block. 
> The results of those tests are as follows:
> First:
>       When appending new items to the list, the UPDATE commands are executed 
> in the same order as they are stored in the BATCH.
> Second:
>       When prepending new items to the list, the UPDATE commands are executed 
> in random order.  
> So, in other words, the code below:
> {code:xml}
> BEGIN BATCH
>  UPDATE... list_name = list_name + [ '1' ]  
>  UPDATE... list_name = list_name + [ '2' ]
>  UPDATE... list_name = list_name + [ '3' ] 
> APPLY BATCH;{code}
>  always results in [ '1', '2', '3' ],
>  but this code:
> {code:xml}
> BEGIN BATCH
>  UPDATE... list_name = [ '1' ] + list_name   
>  UPDATE... list_name = [ '2' ] + list_name
>  UPDATE... list_name = [ '3' ] + list_name
> APPLY BATCH;{code}
> results in a randomly ordered list, like [ '2', '1', '3' ]    (the expected 
> result is [ '3', '2', '1' ])
> So somehow, when appending items to a list, commands from the BATCH are 
> executed in the order in which they are stored, but when prepending, the 
> order is random.
