[jira] [Updated] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables

2016-08-31 Thread Wei Deng (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wei Deng updated CASSANDRA-11906:
-
Labels: lcs  (was: )

> Unstable JVM due too many files when anticompacting big LCS tables
> --
>
> Key: CASSANDRA-11906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11906
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefano Ortolani
> Assignee: Sean McCarthy
> Labels: lcs
> Fix For: 3.0.x
>
>
> I have recently moved from C* 2.1.x to C* 3.0.6. The setup is quite 
> heavy:
>   - 13 nodes with spinning disks
>   - ~120 GB of data per node
>   - 50% of CFs are LCS and have quite wide rows.
>   - 2/3 CFs with LCS have more than 200 SSTables
> Incremental repairs do not seem to play really well with that.
> I have been running some tests node by node by using the -pr option:
> {code:xml}
> nodetool -h localhost repair -pr keyscheme
> {code}
> and to my surprise the whole process takes quite some time (1 hour
> minimum, 8 hours if I haven't been repairing for 5/6 days).
> Yesterday I tried to run the command with the -seq option so as to 
> decrease the number of simultaneous compactions. After a while
> the node on which I was running the repair simply died during
> the anticompaction phase with the following
> exception in the logs.
> {code:xml}
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-25 21:54:21,868 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
> java.lang.RuntimeException: Failed to list files in /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396
>   at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0]
>   at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
>   at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
>   at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
>   at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
>   at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: java.lang.RuntimeException: java.nio.file.FileSystemException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: Too many open files
>   at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:622) ~[apache-cassandra-3.0.6.jar:3.0.6]
>   at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321) ~[na:1.8.0_91]
>   at 
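
The root cause reported in the trace above is "Too many open files", which means the Cassandra JVM reached its file-descriptor limit. The following is a minimal diagnostic sketch, not part of the original report, for comparing a process's open descriptor count against its soft limit while a repair is running. It assumes a Linux host with /proc mounted; the helper names parse_max_open_files and fd_usage are hypothetical and chosen only for illustration.

```python
import os
from typing import Optional, Tuple


def parse_max_open_files(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) 'Max open files' limits.

    limits_text is the content of /proc/<pid>/limits. A value of
    'unlimited' is returned as None.
    """
    def to_int(value: str) -> Optional[int]:
        return None if value == "unlimited" else int(value)

    for line in limits_text.splitlines():
        if line.startswith("Max open files"):
            # Fields: 'Max', 'open', 'files', <soft>, <hard>, 'files'
            soft, hard = line.split()[3:5]
            return to_int(soft), to_int(hard)
    raise ValueError("no 'Max open files' line found")


def fd_usage(pid: int) -> Tuple[int, Optional[int]]:
    """Return (open descriptor count, soft limit) for a Linux process.

    Reading another user's /proc/<pid>/fd usually requires running as
    that user or as root.
    """
    open_count = len(os.listdir(f"/proc/{pid}/fd"))
    with open(f"/proc/{pid}/limits") as fh:
        soft, _hard = parse_max_open_files(fh.read())
    return open_count, soft
```

Calling fd_usage with the PID of the Cassandra process during anticompaction would show how close the node is to the limit; a count that climbs toward the soft limit would be consistent with the failure described here.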

[jira] [Updated] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables

2016-08-11 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-11906:

Assignee: Sean McCarthy  (was: DS Test Eng)

> Unstable JVM due too many files when anticompacting big LCS tables
> --
>
> Key: CASSANDRA-11906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11906
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefano Ortolani
> Assignee: Sean McCarthy
> Fix For: 3.0.x

[jira] [Updated] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables

2016-08-08 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-11906:

Fix Version/s: 3.0.x

> Unstable JVM due too many files when anticompacting big LCS tables
> --
>
> Key: CASSANDRA-11906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11906
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefano Ortolani
> Assignee: DS Test Eng
> Fix For: 3.0.x

[jira] [Updated] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables

2016-08-08 Thread Marcus Eriksson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-11906:

Assignee: DS Test Eng

[~cassandra-te] could you try to reproduce this?

> Unstable JVM due too many files when anticompacting big LCS tables
> --
>
> Key: CASSANDRA-11906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11906
> Project: Cassandra
> Issue Type: Bug
> Reporter: Stefano Ortolani
> Assignee: DS Test Eng