[jira] [Commented] (CASSANDRA-11448) Running OOS should trigger the disk failure policy

2016-04-01 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221300#comment-15221300
 ] 

Branimir Lambov commented on CASSANDRA-11448:
-

I believe we don't need it.

The reason to catch and propagate it is to be able to abort any runnables that 
were created before the one that failed. There can't be any in the singleton 
case, thus a catch-and-propagate would not be doing anything.

> Running OOS should trigger the disk failure policy
> --
>
> Key: CASSANDRA-11448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11448
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Brandon Williams
>Assignee: Branimir Lambov
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> Currently when you run OOS, this happens:
> {noformat}
> ERROR [MemtableFlushWriter:8561] 2016-03-28 01:17:37,047  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[MemtableFlushWriter:8561,5,main]   java.lang.RuntimeException: 
> Insufficient disk space to write 48 bytes 
> at 
> org.apache.cassandra.io.util.DiskAwareRunnable.getWriteDirectory(DiskAwareRunnable.java:29)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:332) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>  ~[guava-16.0.1.jar:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1120)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_66]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_66]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66]
> {noformat}
> Now your flush writer is dead and postflush tasks build up forever.  Instead 
> we should throw FSWE and trigger the failure policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11448) Running OOS should trigger the disk failure policy

2016-03-31 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220427#comment-15220427
 ] 

Joshua McKenzie commented on CASSANDRA-11448:
-

On the trunk patch, do we also need to catch FSWriteError on the [singletonList 
case|https://github.com/blambov/cassandra/blob/11448/src/java/org/apache/cassandra/db/Memtable.java#L264-L265]?

Other than that, looks reasonable to me.

> Running OOS should trigger the disk failure policy
> --
>
> Key: CASSANDRA-11448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11448
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Brandon Williams
>Assignee: Branimir Lambov
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> Currently when you run OOS, this happens:
> {noformat}
> ERROR [MemtableFlushWriter:8561] 2016-03-28 01:17:37,047  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[MemtableFlushWriter:8561,5,main]   java.lang.RuntimeException: 
> Insufficient disk space to write 48 bytes 
> at 
> org.apache.cassandra.io.util.DiskAwareRunnable.getWriteDirectory(DiskAwareRunnable.java:29)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:332) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>  ~[guava-16.0.1.jar:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1120)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_66]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_66]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66]
> {noformat}
> Now your flush writer is dead and postflush tasks build up forever.  Instead 
> we should throw FSWE and trigger the failure policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11448) Running OOS should trigger the disk failure policy

2016-03-31 Thread Branimir Lambov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219524#comment-15219524
 ] 

Branimir Lambov commented on CASSANDRA-11448:
-

Patch here:
|[2.1|https://github.com/blambov/cassandra/tree/11448-2.1]|[utest|http://cassci.datastax.com/job/blambov-11448-2.1-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11448-2.1-dtest/]|
|[2.2|https://github.com/blambov/cassandra/tree/11448-2.2]|[utest|http://cassci.datastax.com/job/blambov-11448-2.2-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11448-2.2-dtest/]|
|[3.0|https://github.com/blambov/cassandra/tree/11448-3.0]|[utest|http://cassci.datastax.com/job/blambov-11448-3.0-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11448-3.0-dtest/]|
|[trunk|https://github.com/blambov/cassandra/tree/11448]|[utest|http://cassci.datastax.com/job/blambov-11448-testall/]|[dtest|http://cassci.datastax.com/job/blambov-11448-dtest/]|
2.1 patch applies cleanly up to 3.0 (disregarding {{CHANGES.txt}} conflicts), 
trunk is different.

Changes the error thrown to {{FSWriteError}}, applies 
{{JVMStabilityInspector}}, passes it on to the post-flush instead of failing to 
trigger it, and makes sure commit log is not discarded for failed flushes. 
Trunk version also removes unused {{DiskAwareRunnable}} and fixes {{null}} 
return from {{getWriteableLocation}} not being treated properly.

> Running OOS should trigger the disk failure policy
> --
>
> Key: CASSANDRA-11448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11448
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Brandon Williams
>Assignee: Branimir Lambov
> Fix For: 2.1.x, 2.2.x, 3.0.x, 3.x
>
>
> Currently when you run OOS, this happens:
> {noformat}
> ERROR [MemtableFlushWriter:8561] 2016-03-28 01:17:37,047  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[MemtableFlushWriter:8561,5,main]   java.lang.RuntimeException: 
> Insufficient disk space to write 48 bytes 
> at 
> org.apache.cassandra.io.util.DiskAwareRunnable.getWriteDirectory(DiskAwareRunnable.java:29)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:332) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>  ~[guava-16.0.1.jar:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1120)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_66]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_66]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66]
> {noformat}
> Now your flush writer is dead and postflush tasks build up forever.  Instead 
> we should throw FSWE and trigger the failure policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11448) Running OOS should trigger the disk failure policy

2016-03-29 Thread Jack Krupansky (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215912#comment-15215912
 ] 

Jack Krupansky commented on CASSANDRA-11448:


Curious that we haven't had an acronym for Out Of Space in more common usage. 
In fact, this is the first time I've seen it. OOM is so common and so obvious, 
but OOS seems so foreign. Maybe that's because disk drives are so big these 
days that most people will now no longer come close to... running OOS on an 
HDD. SSD changes that with the (currently) much smaller drive size.

> Running OOS should trigger the disk failure policy
> --
>
> Key: CASSANDRA-11448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11448
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Brandon Williams
>Assignee: Branimir Lambov
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Currently when you run OOS, this happens:
> {noformat}
> ERROR [MemtableFlushWriter:8561] 2016-03-28 01:17:37,047  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[MemtableFlushWriter:8561,5,main]   java.lang.RuntimeException: 
> Insufficient disk space to write 48 bytes 
> at 
> org.apache.cassandra.io.util.DiskAwareRunnable.getWriteDirectory(DiskAwareRunnable.java:29)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:332) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>  ~[guava-16.0.1.jar:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1120)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_66]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_66]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66]
> {noformat}
> Now your flush writer is dead and postflush tasks build up forever.  Instead 
> we should throw FSWE and trigger the failure policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11448) Running OOS should trigger the disk failure policy

2016-03-29 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215636#comment-15215636
 ] 

Sylvain Lebresne commented on CASSANDRA-11448:
--

Fyi, it would be nice to avoid acronyms at least on the first use in tickets 
descriptions. I have personally no clue what OOS stands for so I assume our 
average user won't either and this makes searches useless.

> Running OOS should trigger the disk failure policy
> --
>
> Key: CASSANDRA-11448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11448
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Brandon Williams
>Assignee: Branimir Lambov
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Currently when you run OOS, this happens:
> {noformat}
> ERROR [MemtableFlushWriter:8561] 2016-03-28 01:17:37,047  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[MemtableFlushWriter:8561,5,main]   java.lang.RuntimeException: 
> Insufficient disk space to write 48 bytes 
> at 
> org.apache.cassandra.io.util.DiskAwareRunnable.getWriteDirectory(DiskAwareRunnable.java:29)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:332) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>  ~[guava-16.0.1.jar:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1120)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_66]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_66]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66]
> {noformat}
> Now your flush writer is dead and postflush tasks build up forever.  Instead 
> we should throw FSWE and trigger the failure policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-11448) Running OOS should trigger the disk failure policy

2016-03-28 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15214773#comment-15214773
 ] 

Jeremiah Jordan commented on CASSANDRA-11448:
-

And make sure the post flush isn't blocking forever, if someone has their 
failure policy set to ignore.

> Running OOS should trigger the disk failure policy
> --
>
> Key: CASSANDRA-11448
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11448
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Brandon Williams
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Currently when you run OOS, this happens:
> {noformat}
> ERROR [MemtableFlushWriter:8561] 2016-03-28 01:17:37,047  
> CassandraDaemon.java:229 - Exception in thread 
> Thread[MemtableFlushWriter:8561,5,main]   java.lang.RuntimeException: 
> Insufficient disk space to write 48 bytes 
> at 
> org.apache.cassandra.io.util.DiskAwareRunnable.getWriteDirectory(DiskAwareRunnable.java:29)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.db.Memtable$FlushRunnable.runMayThrow(Memtable.java:332) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) 
> ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297)
>  ~[guava-16.0.1.jar:na]
> at 
> org.apache.cassandra.db.ColumnFamilyStore$Flush.run(ColumnFamilyStore.java:1120)
>  ~[cassandra-all-2.1.12.1046.jar:2.1.12.1046]
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  ~[na:1.8.0_66]
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  ~[na:1.8.0_66]
> at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_66]
> {noformat}
> Now your flush writer is dead and postflush tasks build up forever.  Instead 
> we should throw FSWE and trigger the failure policy.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)