[jira] [Created] (CASSANDRA-7803) When compaction is interrupted, it leaves locked, undeletable files
Scooletz created CASSANDRA-7803:
-----------------------------------

             Summary: When compaction is interrupted, it leaves locked, undeletable files
                 Key: CASSANDRA-7803
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-7803
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: RedHat, xfs4, JNA enabled, JBOD
            Reporter: Scooletz


While testing new 2.1 features:
- incremental repairs
- leveled compaction

I interrupted a compaction, which left the following ERROR in the _system.log_:

{quote}
org.apache.cassandra.db.compaction.CompactionInterruptedException: Compaction interrupted: Compaction@152e6e70-1975-11e4-ba09-61f0d75c60c6(xx, xxx, 378505918/1993581634)bytes
	at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:174)
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74)
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:235)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_09]
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.7.0_09]
	at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.7.0_09]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ~[na:1.7.0_09]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_09]
	at java.lang.Thread.run(Thread.java:722) [na:1.7.0_09]
{quote}

Right after that, a cascade of recurring errors was emitted:

{quote}
ERROR [NonPeriodicTasks:1] 2014-08-19 13:38:41,258 SSTableDeletingTask.java:81 - Unable to delete /grid/data04/cassandra/data/xx/xxx-152e6e70197511e4ba0961f0d75c60c6/xx-xxx-ka-55058-Data.db (it will be removed on server restart; we'll also retry after GC)
{quote}

which made this node flap (as seen in the other nodes' gossiper log entries). After a restart, the node is healthy and fully operational.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
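The "retry after GC" wording in that log line suggests a deferred-deletion pattern: a file that cannot be unlinked right now (for example because an interrupted compaction may still hold a reference to it) is queued and retried later, and removed on restart at worst. Below is a minimal Java sketch of that pattern; the class and method names are illustrative, not Cassandra's actual SSTableDeletingTask implementation.

{code:java}
import java.io.File;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch of the deferred-deletion pattern described by the
// "it will be removed on server restart; we'll also retry after GC" message.
public class DeferredDeleter
{
    private final Queue<File> failed = new ConcurrentLinkedQueue<>();

    // Try to delete now; queue the file for a later retry if the delete fails.
    public void tryDelete(File file)
    {
        if (!file.delete())
        {
            System.err.printf("Unable to delete %s (will retry after GC)%n", file);
            failed.add(file);
        }
    }

    // Invoked after a GC, when stale references to the file may have been
    // collected; files that still cannot be deleted re-queue themselves.
    public void retryFailedDeletions()
    {
        for (File file; (file = failed.poll()) != null; )
            tryDelete(file);
    }
}
{code}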
[jira] [Updated] (CASSANDRA-7803) When compaction is interrupted, it leaves locked, undeletable files
     [ https://issues.apache.org/jira/browse/CASSANDRA-7803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scooletz updated CASSANDRA-7803:
--------------------------------

    Description: 

While testing new 2.1 features:
- incremental repairs
- leveled compaction

I interrupted a compaction, which left the following ERROR in the _system.log_:

{quote}
org.apache.cassandra.db.compaction.CompactionInterruptedException: Compaction interrupted: Compaction@152e6e70-1975-11e4-ba09-61f0d75c60c6(xx, xxx, 378505918/1993581634)bytes
	at org.apache.cassandra.db.compaction.CompactionTask.runWith(CompactionTask.java:174)
	at org.apache.cassandra.io.util.DiskAwareRunnable.runMayThrow(DiskAwareRunnable.java:48)
	at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
	at org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:74)
	at org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:59)
	at org.apache.cassandra.db.compaction.CompactionManager$BackgroundCompactionTask.run(CompactionManager.java:235)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) ~[na:1.7.0_09]
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.7.0_09]
	at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.7.0_09]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) ~[na:1.7.0_09]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) [na:1.7.0_09]
	at java.lang.Thread.run(Thread.java:722) [na:1.7.0_09]
{quote}

Right after that, a cascade of recurring errors was emitted until the restart:

{quote}
ERROR [NonPeriodicTasks:1] 2014-08-19 13:38:41,258 SSTableDeletingTask.java:81 - Unable to delete /grid/data04/cassandra/data/xx/xxx-152e6e70197511e4ba0961f0d75c60c6/xx-xxx-ka-55058-Data.db (it will be removed on server restart; we'll also retry after GC)
{quote}

which made this node flap (as seen in the other nodes' gossiper log entries). After a restart, the node is healthy and fully operational.
[jira] [Created] (CASSANDRA-6998) HintedHandoff - expired hints may block future hint deliveries
Scooletz created CASSANDRA-6998:
-----------------------------------

             Summary: HintedHandoff - expired hints may block future hint deliveries
                 Key: CASSANDRA-6998
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-6998
             Project: Cassandra
          Issue Type: Bug
          Components: Core
         Environment: - cluster of two DCs: DC1, DC2
- keyspace using NetworkTopologyStrategy (replication factors set for both DCs)
- heavy load (write:read, 100:1) at LOCAL_QUORUM, using the Java driver set up with DC awareness, writing to DC1
            Reporter: Scooletz
             Fix For: 2.0.3


For test purposes, DC2 was shut down for 1 day. The _hints_ table filled up with millions of rows. Now, when _HintedHandOffManager_ tries to _doDeliverHintsToEndpoint_, it queries the store with QueryFilter.getSliceFilter, which counts the deleted (TTLed) cells and throws org.apache.cassandra.db.filter.TombstoneOverwhelmingException.

Throwing this exception stops the manager from running compaction, as compaction is run only after a successful handoff. This leaves HH practically disabled until the administrator runs truncateAllHints.

Wouldn't it be better to run compaction on org.apache.cassandra.db.filter.TombstoneOverwhelmingException? That would remove the TTLed hints, leaving the whole HH mechanism in a healthy state.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
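The suggestion amounts to treating TombstoneOverwhelmingException as a trigger for hints compaction rather than as a fatal error. A hedged Java sketch of that control flow follows; deliverHints and forceHintsCompaction are illustrative stand-ins, not real HintedHandOffManager methods.

{code:java}
import java.net.InetAddress;

// Sketch of the proposed behaviour under the assumptions stated above.
public class HintDeliverySketch
{
    static class TombstoneOverwhelmingException extends RuntimeException {}

    void deliverHints(InetAddress endpoint) { /* slice-queries the hints table; may scan millions of TTLed cells */ }
    void forceHintsCompaction()             { /* purges TTL-expired (tombstoned) hints */ }

    void deliverHintsToEndpoint(InetAddress endpoint)
    {
        try
        {
            deliverHints(endpoint);
            forceHintsCompaction(); // current behaviour: compaction only runs after a successful handoff
        }
        catch (TombstoneOverwhelmingException e)
        {
            // Proposed: compact on failure too, so the next delivery attempt can
            // succeed without the operator running truncateAllHints by hand.
            forceHintsCompaction();
        }
    }
}
{code}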
[jira] [Updated] (CASSANDRA-6998) HintedHandoff - expired hints may block future hint deliveries
     [ https://issues.apache.org/jira/browse/CASSANDRA-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Scooletz updated CASSANDRA-6998:
--------------------------------

    Description: 

For test purposes, DC2 was shut down for 1 day. The _hints_ table filled up with millions of rows. Now, when _HintedHandOffManager_ tries to _doDeliverHintsToEndpoint_, it queries the store with QueryFilter.getSliceFilter, which counts the deleted (TTLed) cells and throws org.apache.cassandra.db.filter.TombstoneOverwhelmingException.

Throwing this exception stops the manager from running compaction, as compaction is run only after a successful handoff. This leaves HH practically disabled until the administrator runs truncateAllHints.

Wouldn't it be better to run compaction on org.apache.cassandra.db.filter.TombstoneOverwhelmingException? That would remove the TTLed hints, leaving the whole HH mechanism in a healthy state.

The stacktrace is:

{quote}
org.apache.cassandra.db.filter.TombstoneOverwhelmingException
	at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:201)
	at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:122)
	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:80)
	at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:72)
	at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:297)
	at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:53)
	at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1487)
	at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1306)
	at org.apache.cassandra.db.HintedHandOffManager.doDeliverHintsToEndpoint(HintedHandOffManager.java:351)
	at org.apache.cassandra.db.HintedHandOffManager.deliverHintsToEndpoint(HintedHandOffManager.java:309)
	at org.apache.cassandra.db.HintedHandOffManager.access$300(HintedHandOffManager.java:92)
	at org.apache.cassandra.db.HintedHandOffManager$4.run(HintedHandOffManager.java:530)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
	at java.lang.Thread.run(Thread.java:722)
{quote}
[jira] [Commented] (CASSANDRA-6998) HintedHandoff - expired hints may block future hint deliveries
    [ https://issues.apache.org/jira/browse/CASSANDRA-6998?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963007#comment-13963007 ]

Scooletz commented on CASSANDRA-6998:
-------------------------------------

Thx [~iamaleksey] for your quick comment; I think this problem is not solved yet, though. In the comments section of the issue you referenced, https://issues.apache.org/jira/browse/CASSANDRA-, [~gsanderson] asks:

{quote}
One quick question... this patch will hopefully prevent us getting into this state in many cases, unless you have a huge number of hints for a node that is down for a very long time: since there is no auto-compaction, a large number of hints may have expired from TTL, thus preventing any further hint delivery.
{quote}

That's the case I'm referring to. I got this. Is it covered by ? If not, is it considered a bug? I added the stacktrace to the issue.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
[jira] [Commented] (CASSANDRA-5310) New authentication module does not work in multi datacenters in case of network outage
    [ https://issues.apache.org/jira/browse/CASSANDRA-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13871801#comment-13871801 ]

Scooletz commented on CASSANDRA-5310:
-------------------------------------

Hi, will this be merged to the 2.* branch?


> New authentication module does not work in multi datacenters in case of network outage
> ---------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-5310
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-5310
>             Project: Cassandra
>          Issue Type: Improvement
>    Affects Versions: 1.2.2
>         Environment: Ubuntu 12.04
>                      Cluster of 16 nodes in 2 datacenters (8 nodes in each datacenter)
>            Reporter: jal
>            Assignee: Aleksey Yeschenko
>            Priority: Minor
>             Fix For: 1.2.3
>
>         Attachments: auth_fix_consistency.patch
>
>
> With 1.2.2, I am using the new authentication backend PasswordAuthenticator with the authorizer CassandraAuthorizer. In case of a network outage, we are no longer able to connect to Cassandra. Here is the error message we get when we try to connect through cqlsh:
>
> Traceback (most recent call last):
>   File "./cqlsh", line 2262, in <module>
>     main(*read_options(sys.argv[1:], os.environ))
>   File "./cqlsh", line 2248, in main
>     display_float_precision=options.float_precision)
>   File "./cqlsh", line 483, in __init__
>     cql_version=cqlver, transport=transport)
>   File "./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/connection.py", line 143, in connect
>   File "./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/connection.py", line 59, in __init__
>   File "./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/thrifteries.py", line 157, in establish_connection
>   File "./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py", line 455, in login
>   File "./../lib/cql-internal-only-1.4.0.zip/cql-1.4.0/cql/cassandra/Cassandra.py", line 476, in recv_login
> cql.cassandra.ttypes.AuthenticationException: AuthenticationException(why='org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level QUORUM')

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
[jira] [Commented] (CASSANDRA-5310) New authentication module does not work in multi datacenters in case of network outage
    [ https://issues.apache.org/jira/browse/CASSANDRA-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872183#comment-13872183 ]

Scooletz commented on CASSANDRA-5310:
-------------------------------------

[~jbellis] Today I reproduced it on 2.0.3 with two datacenters:
# DC1: 3 nodes with replication factor 3
# DC2: 1 node with replication factor 1

After shutting down DC1 I got

{quote}
cql.cassandra.ttypes.AuthenticationException: AuthenticationException(why='org.apache.cassandra.exceptions.UnavailableException: Cannot achieve consistency level QUORUM')
{quote}

Should I hit the user group with this?

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
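The arithmetic matches the exception, assuming the login read is executed at (non-local) QUORUM across all system_auth replicas, as the exception text indicates: with replication factors {DC1:3, DC2:1} there are 4 replicas in total, QUORUM needs 4/2 + 1 = 3 of them, and with DC1 shut down only 1 is alive. A small Java worked example:

{code:java}
// Worked example of the UnavailableException above, under the assumption
// that the auth read runs at QUORUM over all system_auth replicas in both DCs.
public class QuorumMath
{
    static int quorum(int replicas)
    {
        return replicas / 2 + 1;
    }

    public static void main(String[] args)
    {
        int rfDc1 = 3, rfDc2 = 1;
        int needed = quorum(rfDc1 + rfDc2); // (3 + 1) / 2 + 1 = 3
        int live = rfDc2;                   // DC1 is down, so only DC2's replica answers
        System.out.printf("QUORUM needs %d replicas, %d alive -> %s%n",
                          needed, live, live >= needed ? "ok" : "UnavailableException");
    }
}
{code}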
[jira] [Commented] (CASSANDRA-5310) New authentication module does not work in multi datacenters in case of network outage
    [ https://issues.apache.org/jira/browse/CASSANDRA-5310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13872646#comment-13872646 ]

Scooletz commented on CASSANDRA-5310:
-------------------------------------

Oh! Great! Thx for the clarification :)

--
This message was sent by Atlassian JIRA
(v6.1.5#6160)