[ 
https://issues.apache.org/jira/browse/CASSANDRA-7437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14105081#comment-14105081
 ] 

Benedict edited comment on CASSANDRA-7437 at 8/21/14 6:38 AM:
--------------------------------------------------------------

In hindsight, this was pretty obvious.

We wait for modifications to the commit log segment to complete before force 
recycling it, but we don't ensure those modifications have reached the 
memtables before flushing them to mark the segment clean.

Patch attached. It gets the keyspaces with records in the segment and waits 
for any current writes to complete before flushing, and it includes a new long 
test to check this works as advertised.
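
As a rough illustration of that ordering (this is not the attached patch), the 
idea can be sketched in plain Java. Every name here (WriteBarrier, beginWrite, 
endWrite, flushAfterInFlightWrites, FlushOrderingDemo) is hypothetical and 
none of it is Cassandra's actual API. The sketch assumes a write holds a shared 
lock from its commit log append until its memtable update has been applied, so 
a flush that takes the exclusive lock cannot run in between.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch only: these are not Cassandra classes.
final class WriteBarrier {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Called by a writer thread before its commit log append.
    void beginWrite() { lock.readLock().lock(); }

    // Called by the same thread once its memtable update has been applied.
    void endWrite() { lock.readLock().unlock(); }

    // Blocks until every write that has begun has ended, then runs the flush.
    void flushAfterInFlightWrites(Runnable flush) {
        lock.writeLock().lock();
        try {
            flush.run();
        } finally {
            lock.writeLock().unlock();
        }
    }
}

final class FlushOrderingDemo {
    // Returns true if the flush observed the in-flight write in the "memtable".
    static boolean flushSawAppliedWrite() {
        WriteBarrier barrier = new WriteBarrier();
        AtomicBoolean appliedToMemtable = new AtomicBoolean(false);
        AtomicBoolean seenByFlush = new AtomicBoolean(false);
        CountDownLatch writeStarted = new CountDownLatch(1);

        Thread writer = new Thread(() -> {
            barrier.beginWrite();            // write is now in flight
            writeStarted.countDown();
            try {
                Thread.sleep(50);            // simulate a slow memtable update
                appliedToMemtable.set(true);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                barrier.endWrite();
            }
        });
        writer.start();

        try {
            writeStarted.await();            // make sure the write has begun
            barrier.flushAfterInFlightWrites(
                () -> seenByFlush.set(appliedToMemtable.get()));
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seenByFlush.get();
    }
}
```

A read/write lock is simply the shortest way to show the ordering; it also 
blocks new writes while the flush runs, which a real implementation would 
probably want to avoid.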

It may be worth mentioning that we did in fact already wait for these 
modifications to the dropped table, so the error would not have caused commit 
logs to keep accumulating; the problem is with sstables from _other keyspaces_ 
in the commit log (in this case the system keyspace), which would be cleared on 
the next flush.



>  Ensure writes have completed after dropping a table, before recycling commit 
> log segments (CASSANDRA-7437)
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-7437
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7437
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Benedict
>            Assignee: Benedict
>            Priority: Minor
>             Fix For: 2.1.0
>
>         Attachments: 7437.log, 7437.round2.txt, 7437_test.py
>
>
> I've noticed in unit test output that there are still assertions being raised 
> here, so I've taken a torch to the code path to make damned certain it cannot 
> happen in future:
> # We now wait for all running reads on a column family or writes on the 
> keyspace during a dropCf call
> # We wait for all appends to the prior commit log segments before recycling 
> them
> # We pass the list of dropped Cfs into the CL.forceRecycle call so that they 
> can definitely be marked clean after they have been marked finished
> # Finally, so that this cannot have negative consequences if it somehow still 
> happens, I've suppressed the assertion in favour of an error log message, as 
> the assertion would break correct program flow for the drop and potentially 
> result in undefined behaviour
> -(in actuality there is the slightest possibility still of a race condition 
> on read of a secondary index that causes a repair driven write, but this is a 
> really tiny race window, as I force wait for all reads after unlinking the 
> CF, so it would have to be a read that grabbed the CFS reference before it 
> was dropped, but hadn't quite started its read op yet).- In fact this is also 
> safe, as these modifications all grab a write op from the Keyspace, which has 
> to happen before they get the CFS, and also because we drop the data before 
> waiting for reads to finish on the CFS.


