[jira] [Updated] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables
[ https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wei Deng updated CASSANDRA-11906:
    Labels: lcs  (was: )

> Unstable JVM due too many files when anticompacting big LCS tables
> --
>
>          Key: CASSANDRA-11906
>          URL: https://issues.apache.org/jira/browse/CASSANDRA-11906
>      Project: Cassandra
>   Issue Type: Bug
>     Reporter: Stefano Ortolani
>     Assignee: Sean McCarthy
>       Labels: lcs
>      Fix For: 3.0.x
>
>
> I have recently moved from C* 2.1.x to C* 3.0.6. The setup is quite heavy:
> - 13 nodes with spinning disks
> - ~120 GB of data per node
> - 50% of CFs are LCS and have quite wide rows.
> - 2/3 CFs with LCS have more than 200 SSTables
> Incremental repairs do not seem to play really well with that.
> I have been running some tests node by node by using the -pr option:
> {code:xml}
> nodetool -h localhost repair -pr keyscheme
> {code}
> and to my surprise the whole process takes quite some time (1 hour
> minimum, 8 hours if I haven't been repairing for 5/6 days).
> Yesterday I tried to run the command with the -seq option so as to
> decrease the number of simultaneous compactions. After a while
> the node on which I was running the repair simply died during
> the anticompaction phase with the following exception in the logs.
> {code:xml}
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-25 21:54:21,868 ScheduledReporter.java:119 - RuntimeException thrown from GraphiteReporter#report. Exception was suppressed.
> java.lang.RuntimeException: Failed to list files in /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396
>     at org.apache.cassandra.db.lifecycle.LogAwareFileLister.list(LogAwareFileLister.java:57) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.db.lifecycle.LifecycleTransaction.getFiles(LifecycleTransaction.java:547) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.db.Directories$SSTableLister.filter(Directories.java:691) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.db.Directories$SSTableLister.listFiles(Directories.java:662) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.db.Directories$TrueFilesSizeVisitor.<init>(Directories.java:981) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.db.Directories.getTrueAllocatedSizeIn(Directories.java:893) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.db.Directories.trueSnapshotsSize(Directories.java:883) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.db.ColumnFamilyStore.trueSnapshotsSize(ColumnFamilyStore.java:2332) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:637) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at org.apache.cassandra.metrics.TableMetrics$32.getValue(TableMetrics.java:634) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at com.codahale.metrics.graphite.GraphiteReporter.reportGauge(GraphiteReporter.java:281) ~[metrics-graphite-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:158) ~[metrics-graphite-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) ~[metrics-core-3.1.0.jar:3.1.0]
>     at com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) ~[metrics-core-3.1.0.jar:3.1.0]
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_91]
>     at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) [na:1.8.0_91]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) [na:1.8.0_91]
>     at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) [na:1.8.0_91]
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_91]
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_91]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
> Caused by: java.lang.RuntimeException: java.nio.file.FileSystemException: /data/cassandra/data/keyschema/columnfamily-3996ce80b7ac11e48a9b6776bf484396/ma_txn_anticompactionafterrepair_f20b50d0-22bd-11e6-970f-6f22464f4624.log: Too many open files
>     at org.apache.cassandra.io.util.FileUtils.readLines(FileUtils.java:622) ~[apache-cassandra-3.0.6.jar:3.0.6]
>     at java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1321) ~[na:1.8.0_91]
>     at
[jira] [Updated] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables
[ https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Thompson updated CASSANDRA-11906:
    Assignee: Sean McCarthy  (was: DS Test Eng)
[jira] [Updated] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables
[ https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Philip Thompson updated CASSANDRA-11906:
    Fix Version/s: 3.0.x
[jira] [Updated] (CASSANDRA-11906) Unstable JVM due too many files when anticompacting big LCS tables
[ https://issues.apache.org/jira/browse/CASSANDRA-11906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Marcus Eriksson updated CASSANDRA-11906:
    Assignee: DS Test Eng

[~cassandra-te] could you try to reproduce this?