[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-08-10 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680969#comment-14680969
 ] 

Stefania commented on CASSANDRA-9908:
-

If the listing of files happened before listing the temporary files we would 
have an issue. If the two listing operations happened in the opposite order, as 
we do at the moment, we would have an issue if a new transaction was started in 
between the two operations. We should address these potential problems but 
let's do it as part of CASSANDRA-7066 or a follow up ticket. I'll move the 
discussion there. 

So can we resolve this ticket?

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
>  Labels: benedict-to-commit
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit] at 
> sun.nio.fs.UnixFileSystemProvi

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-08-10 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14680208#comment-14680208
 ] 

Benedict commented on CASSANDRA-9908:
-

I've committed this as 45c04b72753a738cb4a8af7025ad4cb10a12c452.

I don't think this is quite the end of making this completely 100% safe, 
though, and relates to my comment on CASSANDRA-7066. For instance, we could 
fail to read a {{TransactionLog}} after listing a file that it deletes (because 
both are now deleted), which would result in a {{FileNotFoundException}} during 
CFS instantiation. This particular issue would not affect correctness as it 
stands, though, but I'm not confident there aren't other similar issues 
remaining that wouldn't (very rarely) cause correctness problems.

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
>  Labels: benedict-to-commit
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixExceptio

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-08-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679665#comment-14679665
 ] 

Benedict commented on CASSANDRA-9908:
-

bq. Done. However, waitForCessasion is not called by Keyspace.dropCf - just 
pointing it out, I agree that we should call it.

The table UUID means it isn't necessary to wait for correctness purposes in 
{{dropCf}}, and it is better not to wait because it makes the command itself 
more responsive. I'm genuinely a little concerned about completely preventing 
schema updates making it to the node, as {{waitForCessation}} could take a long 
time if there are huge rows (or we have a bug)

bq. what are the other risks? 

Loading old data files for the new index. But you're right, with CASSANDRA-7066 
(but not before), running {{dropSSTables()}} before this but compactions have 
stopped should be completely safe.

It may also be worth though, on supported platforms, making cleanup happen much 
earlier. We can delete files we want to be rid of immediately on any unix 
platform, since we no longer depend on the file's presence for serving reads / 
constructing new readers. It is only Windows that requires our file handles are 
all closed before we delete. This should definitely be a separate ticket though.

bq. Please disregard the old patch based on trunk, I rebased it on 
cassandra-3.0 and renamed it to 9908-3.0.

Thanks, I'll review shortly.

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
>  Labels: benedict-to-commit
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-08-09 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14679517#comment-14679517
 ] 

Stefania commented on CASSANDRA-9908:
-

bq. Thinking about it, we need to replicate a little more code from the regular 
CF drop: interruption of compactions. We should be calling both 
CompactionManager.instance.interruptCompactionForCFs and waitForCessation, to 
ensure that most references to the files are also relinquished before we signal 
successful drop.

Done. However, {{waitForCessasion}} is not called by {{Keyspace.dropCf}} - just 
pointing it out, I agree that we should call it.

bq. Then, unfortunately, that still doesn't put us in the clear, but I'm 
beginning to think it would be better and more robust to just accelerate 
SecondaryIndexes having their own UUID, which I believe is already on the cards 
for the near future. As we can still have other long running operations - such 
as keySamples which have references to sstables that can be long lived, that 
aren't compactions and won't acquiesce. So, really, we need to sleep-spin until 
they are all released. This is pretty unpleasant, and it does run the risk of 
schema changes becoming blocked indefinitely on a leaked reference.

Other than this race, which would result in a log message after this patch, 
what are the other risks? From what I understand we include temporary files 
when calculating the latest generation number in 
{{ColumnFamilyStore.createColumnFamilyStore}} but we exclude them when loading 
the sstables in the constructor of CFS. So, we should not reuse old generation 
numbers (which would be really bad) and we should not load old files (also 
quite bad). We delete the tx log file last, and we call getTemporaryFiles() 
before calling listFiles() in Directories.filter(). So can we still pick up 
temporary files by mistake? If anything we should always ensure that 
listFiles() is never called before getTemporaryFiles() by perhaps moving this 
code together? Or am I missing something completely? I do agree on using 
separate UUID for indexes by the way, I am just reasoning on how to make this 
safe given the current status.

bq. I would also prefer we at least logged an error message if we encounter the 
race condition with the NoSuchFileException. Since it definitely means we still 
have problems.

I changed the log message from WARN to ERROR.

--

Please disregard the old patch based on trunk, I rebased it on cassandra-3.0 
and renamed it to 
[9908-3.0|https://github.com/stef1927/cassandra/commits/9908-3.0].


> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
>  Labels: benedict-to-commit
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.Transaction

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-08-09 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14663311#comment-14663311
 ] 

Benedict commented on CASSANDRA-9908:
-

(It's also worth noting this race condition as a whole is not new to 
CASSANDRA-7066, but it is easier to hit now. The patch as stands should make 
the situation better than pre-7066)

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
>  Labels: benedict-to-commit
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit] at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:361)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:407)
> [junit] at 
> java.nio.file.spi.FileSystemProvider.newInputStream

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-08-08 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14662981#comment-14662981
 ] 

Benedict commented on CASSANDRA-9908:
-

Thinking about it, we need to replicate a little more code from the regular CF 
drop: interruption of compactions. We should be calling both 
{{CompactionManager.instance.interruptCompactionForCFs}} and 
{{waitForCessation}}, to ensure that most references to the files are also 
relinquished before we signal successful drop.

Then, unfortunately, that still doesn't put us in the clear, but I'm beginning 
to think it would be better and more robust to just accelerate SecondaryIndexes 
having their own UUID, which I believe is already on the cards for the near 
future. As we can still have other long running operations - such as 
{{keySamples}} which have references to sstables that can be long lived, that 
aren't compactions and won't acquiesce. So, really, we need to sleep-spin until 
they are all released. This is pretty unpleasant, and it does run the risk of 
schema changes becoming blocked indefinitely on a leaked reference.

I would also prefer we at least logged an error message if we encounter the 
race condition with the NoSuchFileException. Since it definitely means we still 
have problems.


> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
>  Labels: benedict-to-commit
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] a

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-08-03 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14651564#comment-14651564
 ] 

Benedict commented on CASSANDRA-9908:
-

Yep, that all looks good.

bq.  Isn't this already guaranteed by SecondaryIndexManager.removeIndexedColumn?

Yes, it was just a general statement, without having yet dived into this piece 
of code sufficiently, that we needed to ensure this.

(sorry for the delay, I thought I'd already responded to this)

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
>  Labels: benedict-to-commit
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit] at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:361)
> [junit] at 

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-30 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647488#comment-14647488
 ] 

Stefania commented on CASSANDRA-9908:
-

So something like this then:

{code}
public void removeIndex(ByteBuffer columnName)
{
indexCfs.keyspace.writeOrder.awaitNewBarrier();
indexCfs.forceBlockingFlush();

indexCfs.readOrdering.awaitNewBarrier();
indexCfs.invalidate();
}
{code}

bq. This should all happen after the index is no longer reachable by any new 
operations.

Here you mean operations on the base cfs of the index? Isn't this already 
guaranteed by {{SecondaryIndexManager.removeIndexedColumn}}?

Agreed on waiting for deletions in {{invalidate}}.

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-30 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14647351#comment-14647351
 ] 

Benedict commented on CASSANDRA-9908:
-

Take a look in {{Keyspace.dropCf}} for another example. In that case it is done 
to ensure there are no writes to the commit log for the table after it's 
dropped. In this case that's unimportant, but we _do_ want to be certain that 
when we call {{waitForDeletions}} we can be confident the deletions have all 
actually happened, and that the log has been cleaned up. This means waiting 
until nobody is using any of the sstables.

Really, though, we should be waiting for any writes to complete, then we should 
clean our memtable, perform a blocking flush (which waits for any in-progress 
flushes to complete), then waiting for any reads to complete, and _then_ drop 
our tables and wait for deletions. This should all happen after the index is no 
longer reachable by any new operations.

Waiting for deletions should be fine to put in {{dropSSTables}}, however I 
think it might be better put in {{invalidate}}, since there's no need to wait 
for them during truncation, for instance.


> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassa

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-29 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646991#comment-14646991
 ] 

Stefania commented on CASSANDRA-9908:
-

Thank you for checking Sam. So if we cannot rely on calling exists() just prior 
to reading the lines, we might as well not call it again (to help performance a 
bit) and wrap the exception, which is what I did in the latest commit.

bq. Probably we should at least introduce a TransactionLogs.waitForDeletions 
call after any schema change.

I'm not sure exactly where, do you mean in MigrationManager.announce()? Would 
it be bad for performance to put it in Tracker.dropSSTables()?

bq. We should also make certain dropping an index awaits a new writeOrder and 
readOrder barrier prior to this.

You mean something like this?

{code}
public void removeIndex(ByteBuffer columnName)
{
final OpOrder.Barrier readBarrier = indexCfs.readOrdering.newBarrier();
final OpOrder.Barrier writeBarrier = 
indexCfs.keyspace.writeOrder.newBarrier();

readBarrier.issue();
writeBarrier.issue();

readBarrier.await();
writeBarrier.await();

indexCfs.invalidate();
}
{code}

Also, this is not related to the transaction logs, it's to protect ongoing 
operations using the index cfs?

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassand

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-29 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645925#comment-14645925
 ] 

Benedict commented on CASSANDRA-9908:
-

Probably we should at least introduce a {{TransactionLogs.waitForDeletions}} 
call after any schema change. We should also make certain dropping an index 
awaits a new writeOrder and readOrder barrier prior to this.

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
> Fix For: 3.0 beta 1
>
> Attachments: TEST-org.apache.cassandra.db.SecondaryIndexTest.log
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit] at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:361)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:407)
> [junit] at 
> java.nio.file.spi.FileSystemProvider.newInputStream(FileSyste

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-29 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645687#comment-14645687
 ] 

Sam Tunnicliffe commented on CASSANDRA-9908:


The patch narrows the window for the race but doesn't close it completely. It 
remains possible for the async cleanup to remove the file between the existence 
check in {{TransactionData#getTemporaryFiles}} and actually reading its content 
in {{TransactionFile#getTrackedFiles}}.

 My test fails much less frequently now but I can still hit the error on 
occasion, so I added some extra logging to capture the race happening 
(extracted from the attached log):  

{noformat}
INFO  [main] 2015-07-29 09:13:50,782 TransactionLogs.java:228  Deleted txn log 
file 
build/test/cassandra/data/SecondaryIndexTest1/CompositeIndexToBeAdded-bd86e0e035c911e5951aad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_bea2b120-35c9-11e5-951a-ad83a3dc3e9c_new.log
INFO  [NonPeriodicTasks:1] 2015-07-29 09:13:50,793 TransactionLogs.java:537  
Removing files for transaction bea2b120-35c9-11e5-951a-ad83a3dc3e9c
INFO  [NonPeriodicTasks:1] 2015-07-29 09:13:50,823 TransactionLogs.java:228  
Deleted txn log file 
build/test/cassandra/data/SecondaryIndexTest1/CompositeIndexToBeAdded-bd86e0e035c911e5951aad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_bea2b120-35c9-11e5-951a-ad83a3dc3e9c_old.log
INFO  [main] 2015-07-29 09:13:50,823 TransactionLogs.java:351  Txn old log file 
build/test/cassandra/data/SecondaryIndexTest1/CompositeIndexToBeAdded-bd86e0e035c911e5951aad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_bea2b120-35c9-11e5-951a-ad83a3dc3e9c_old.log
 exists
INFO  [main] 2015-07-29 09:13:50,831 TransactionLogs.java:195  Error reading 
from txn log file 
build/test/cassandra/data/SecondaryIndexTest1/CompositeIndexToBeAdded-bd86e0e035c911e5951aad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_bea2b120-35c9-11e5-951a-ad83a3dc3e9c_old.log
java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
build/test/cassandra/data/SecondaryIndexTest1/CompositeIndexToBeAdded-bd86e0e035c911e5951aad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_bea2b120-35c9-11e5-951a-ad83a3dc3e9c_old.log
at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620) 
~[main/:na]
at 
org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:192)
 ~[main/:na]
at 
org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:352)
 [main/:na]
at 
org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:757)
 [main/:na]
at 
org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
 [main/:na]
at 
org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
 [main/:na]
at 
org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641) 
[main/:na]
at 
org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606) 
[main/:na]
at 
org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351) 
[main/:na]
at 
org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313) 
[main/:na]
at 
org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
 [main/:na]
at 
org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
 [main/:na]
at 
org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
 [main/:na]
at 
org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
 [classes/:na]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
~[na:1.8.0_45]
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
~[na:1.8.0_45]
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 ~[na:1.8.0_45]
at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
 [junit-4.6.jar:na]
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
 [junit-4.6.jar:na]
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
 [junit-4.6.jar:na]
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
 [junit-4.6.jar:na]
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28) 
[junit-4.6.jar:na]
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31) 
[junit-4.6.jar:na]
at 
org.junit.runne

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-28 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645277#comment-14645277
 ] 

Stefania commented on CASSANDRA-9908:
-

Regardless of indexes, I think it should be safe to call 
{{LifecycleTransaction.getTemporaryFiles}} at any time. Should we perhaps get 
the temporary files by using a future on the same thread performing the 
deletions or synchronize in some other way? Or just ignore the exception?

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
> Fix For: 3.0 beta 1
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit] at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:361)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:407)
> [junit] at 
> java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644539#comment-14644539
 ] 

Benedict commented on CASSANDRA-9908:
-

Doesn't that sort of obviate much of the supposed benefit of unique cfIds?

Anyway, I guess in this instance we can just make sure that all txn logs have 
been fully cleared before creating a secondary index. The rub will be if the 
cleanup fails, though (deletion on windows, for instance), or there are still 
extant readers. In either case the cleanup can't safely complete.

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
> Fix For: 3.0 beta 1
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit] at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:361)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:4

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-28 Thread Sam Tunnicliffe (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644519#comment-14644519
 ] 

Sam Tunnicliffe commented on CASSANDRA-9908:


No, an index cfs doesn't have a unique cfId, instead it inherits it from its 
parent table.

{code:title=AbstractPerColumnSecondaryIndex#newIndexMetadata}
 CFMetaData.Builder builder = CFMetaData.Builder.create(baseMetadata.ksName, 
baseMetadata.indexColumnFamilyName(def))
.withId(baseMetadata.cfId)
.addPartitionKey(def.name, 
def.type);
{code}



> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
> Fix For: 3.0 beta 1
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit] at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
> [junit] at java.nio.file.Files.newByteChannel

[jira] [Commented] (CASSANDRA-9908) Potential race caused by async cleanup of transaction log files

2015-07-28 Thread Benedict (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644500#comment-14644500
 ] 

Benedict commented on CASSANDRA-9908:
-

Hmm. So our secondary index files don't use their cfId in their directory name?

> Potential race caused by async cleanup of transaction log files
> ---
>
> Key: CASSANDRA-9908
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9908
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sam Tunnicliffe
>Assignee: Stefania
> Fix For: 3.0 beta 1
>
>
> There seems to be a potential race in the cleanup of transaction log files, 
> introduced in CASSANDRA-7066
> It's pretty hard to trigger on trunk, but it's possible to hit it via 
> {{o.a.c.db.SecondaryIndexTest#testCreateIndex}} 
> That test creates an index, then removes it to check that the removal is 
> correctly recorded, then adds the index again to assert that it gets rebuilt 
> from the existing data. 
> The removal causes the SSTables of the index CFS to be dropped, which is a 
> transactional operation and so writes a transaction log. When the drop is 
> completed and the last reference to an SSTable is released, the cleanup of 
> the transaction log is scheduled on the periodic tasks executor. The issue is 
> that re-creating the index re-creates the index CFS. When this happens, it's 
> possible for the cleanup of the txn log to have not yet happened. If so, the 
> initialization of the CFS attempts to read the log to identify any orphaned 
> temporary files. The cleanup can happen between the finding the log file and 
> reading it's contents, which results in a {{NoSuchFileException}}
> {noformat}
> [junit] java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] java.lang.RuntimeException: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:620)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionFile.getTrackedFiles(TransactionLogs.java:190)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs$TransactionData.getTemporaryFiles(TransactionLogs.java:338)
> [junit] at 
> org.apache.cassandra.db.lifecycle.TransactionLogs.getTemporaryFiles(TransactionLogs.java:739)
> [junit] at 
> org.apache.cassandra.db.lifecycle.LifecycleTransaction.getTemporaryFiles(LifecycleTransaction.java:541)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.getFilter(Directories.java:652)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:641)
> [junit] at 
> org.apache.cassandra.db.Directories$SSTableLister.list(Directories.java:606)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:351)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.(ColumnFamilyStore.java:313)
> [junit] at 
> org.apache.cassandra.db.ColumnFamilyStore.createColumnFamilyStore(ColumnFamilyStore.java:511)
> [junit] at 
> org.apache.cassandra.index.internal.CassandraIndexer.addIndexedColumn(CassandraIndexer.java:115)
> [junit] at 
> org.apache.cassandra.index.SecondaryIndexManager.addIndexedColumn(SecondaryIndexManager.java:265)
> [junit] at 
> org.apache.cassandra.db.SecondaryIndexTest.testIndexCreate(SecondaryIndexTest.java:467)
> [junit] Caused by: java.nio.file.NoSuchFileException: 
> build/test/cassandra/data:1/SecondaryIndexTest1/CompositeIndexToBeAdded-d0885f60323211e5a5e8ad83a3dc3e9c/.birthdate_index/transactions/unknowncompactiontype_d4b69fc0-3232-11e5-a5e8-ad83a3dc3e9c_old.log
> [junit] at 
> sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
> [junit] at 
> sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
> [junit] at 
> sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:361)
> [junit] at java.nio.file.Files.newByteChannel(Files.java:407)
> [junit] at 
> java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
> [junit] at java.nio.file.Files.newInputStream(Files.java:152)
> [junit] at java.nio.file.Files.newBufferedReader(Files.java:2784)
> [junit] at java.nio.file.Files.readAllLines(Fil