[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.
[ https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832314#comment-15832314 ]
Anubhav Kale commented on CASSANDRA-4663:
OOF 1/20
> Streaming sends one file at a time serially.
>
> Key: CASSANDRA-4663
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
> Project: Cassandra
> Issue Type: Improvement
> Reporter: sankalp kohli
> Priority: Minor
> Fix For: 3.x
> Attachments: 0001-streaming-add-a-way-to-configure-the-number-of-conne.patch
>
> This is not fast enough when someone is using SSDs and maybe a 10G link. We
> should try to create multiple connections and send multiple files in
> parallel.
> The current approach underutilizes the link (even 1G).
> This change will improve the bootstrapping time of a node.
--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-8911) Consider Mutation-based Repairs
[ https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549228#comment-15549228 ]
Anubhav Kale commented on CASSANDRA-8911:
Have we tested this at large scale yet? Just curious about the future of this ticket. Thanks!
> Consider Mutation-based Repairs
>
> Key: CASSANDRA-8911
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Tyler Hobbs
> Assignee: Marcus Eriksson
> Fix For: 3.x
>
> We should consider a mutation-based repair to replace the existing streaming
> repair. While we're at it, we could do away with a lot of the complexity
> around merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
> * Instead of building an entire merkle tree up front, just send the "leaves"
> one-by-one. Instead of dealing with token ranges, make the leaves primary
> key ranges. The PK ranges would need to be contiguous, so that the start of
> each range would match the end of the previous range. (The first and last
> leaves would need to be open-ended on one end of the PK range.) This would be
> similar to doing a read with paging.
> * Once one page of data is read, compute a hash of it and send it to the
> other replicas along with the PK range that it covers and a row count.
> * When the replicas receive the hash, they perform a read over the same PK
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row
> counts don't match, in which case this can be skipped).
> * If there is a mismatch, the replica will send a mutation covering that
> page's worth of data (ignoring the row count this time) to the source node.
> Here are the advantages that I can think of:
> * With the current repair behavior of streaming, vnode-enabled clusters may
> need to stream hundreds of small SSTables. This results in increased compaction
> load on the receiving node. With the mutation-based approach, memtables
> would naturally merge these.
> * It's simple to throttle. For example, you could give a number of rows/sec
> that should be repaired.
> * It's easy to see what PK range has been repaired so far. This could make
> it simpler to resume a repair that fails midway.
> * Inconsistencies start to be repaired almost right away.
> * Less special code (?)
> * Wide partitions are no longer a problem.
> There are a few problems I can think of:
> * Counters. I don't know if this can be made safe, or if they need to be
> skipped.
> * To support incremental repair, we need to be able to read from only
> repaired sstables. Probably not too difficult to do.
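The page-level comparison in the description can be sketched as follows. This is a minimal model under stated assumptions, not a design: rows are plain key/value strings already sorted by primary key, and `PageDigest`, `Page`, `summarize`, and `needsRepair` are hypothetical names. The source summarizes one PK range as a row count plus a SHA-256 hash; the replica re-reads the same range with a limit of row count + 1, treats a count mismatch as an immediate repair, and compares hashes only when the counts agree.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical model of one "leaf" of a mutation-based repair:
// a contiguous primary-key range, its row count, and a digest of its rows.
final class PageDigest {
    // Summary the source node would send to the other replicas.
    record Page(String startKey, String endKey, int rowCount, byte[] hash) {}

    // Hash rows in primary-key order (TreeMap iteration order).
    static byte[] hashRows(NavigableMap<String, String> rows) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> row : rows.entrySet()) {
                md.update(row.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator so "ab"+"c" hashes differently from "a"+"bc"
                md.update(row.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is required on every JVM
        }
    }

    // Source side: summarize the page covering [startKey, endKey].
    static Page summarize(NavigableMap<String, String> table, String startKey, String endKey) {
        NavigableMap<String, String> page = table.subMap(startKey, true, endKey, true);
        return new Page(startKey, endKey, page.size(), hashRows(page));
    }

    // Replica side: read the same range with LIMIT rowCount + 1, then decide.
    static boolean needsRepair(NavigableMap<String, String> replica, Page remote) {
        NavigableMap<String, String> local = new TreeMap<>();
        for (Map.Entry<String, String> e
                : replica.subMap(remote.startKey(), true, remote.endKey(), true).entrySet()) {
            if (local.size() == remote.rowCount() + 1) break; // LIMIT rowCount + 1
            local.put(e.getKey(), e.getValue());
        }
        if (local.size() != remote.rowCount()) return true; // counts differ: skip hashing
        return !Arrays.equals(hashRows(local), remote.hash());
    }
}
```

In the proposal the replica would then send that page back to the source node as a mutation; the sketch stops at the repair decision.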
[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.
[ https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342248#comment-15342248 ]
Anubhav Kale commented on CASSANDRA-4663:
I ran some more tests on the original code and on the change with multiple sockets, and confirmed that the end-to-end time we see during streaming is a direct function of how long it takes the sender to push the bytes through (meaning the sender is the only "slow" entity, which makes the problem somewhat tangible).
Then I tested sending multiple files in parallel through some hacks, but, as I expected, it does not yield much improvement, mainly because {{WritableByteChannel}} is a blocking channel across threads. From the docs: "Only one write operation upon a writable channel may be in progress at any given time. If one thread initiates a write operation upon a channel then any other thread that attempts to initiate another write operation will block until the first operation is complete."
We would need to move to {{AsynchronousSocketChannel}} to get true parallelism (which is obviously a deeper change, though not impossible).
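The quoted rule applies to each channel individually. Note that `AsynchronousSocketChannel` has a comparable per-channel restriction: its Javadoc allows at most one outstanding write, and starting a second one before the first completes throws `WritePendingException`. A sketch of parallel file sending therefore probably still needs one channel per concurrent transfer. The minimal example below assumes in-memory channels (`Channels.newChannel` over a `ByteArrayOutputStream`) standing in for sockets; `ParallelSend` and `sendAll` are hypothetical names, not Cassandra code.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: send N "files" concurrently, each over its own channel,
// so no two threads ever contend for the same WritableByteChannel.
final class ParallelSend {
    static List<byte[]> sendAll(List<byte[]> files, int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<ByteArrayOutputStream> sinks = new ArrayList<>();
            List<Future<?>> pending = new ArrayList<>();
            for (byte[] file : files) {
                ByteArrayOutputStream sink = new ByteArrayOutputStream(); // stands in for a socket
                sinks.add(sink);
                pending.add(pool.submit(() -> {
                    // One channel per transfer: writes on different channels do not block each other.
                    try (WritableByteChannel ch = Channels.newChannel(sink)) {
                        ByteBuffer buf = ByteBuffer.wrap(file);
                        while (buf.hasRemaining()) {
                            ch.write(buf);
                        }
                    }
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                f.get(); // waits for completion and rethrows any transfer failure
            }
            List<byte[]> received = new ArrayList<>();
            for (ByteArrayOutputStream sink : sinks) {
                received.add(sink.toByteArray());
            }
            return received;
        } finally {
            pool.shutdown();
        }
    }
}
```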
[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.
[ https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340256#comment-15340256 ]
Anubhav Kale commented on CASSANDRA-4663:
Agree with Paulo. I don't like the number of SSTables blowing up. I will spend some time on sending multiple files at a time and see what it offers.
[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.
[ https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336787#comment-15336787 ]
Anubhav Kale commented on CASSANDRA-4663:
I made a change to RangeStreamer to create multiple StreamSessions per host (splitting the token ranges into as many chunks as there are sockets). I saw a performance improvement (in elapsed time) of ~33%. Since the same code is used for bootstrap and nodetool rebuild, it will help in both cases.
The one side effect that operators need to be aware of is the number of SSTables created on the destination, since it grows with the number of splits.
I suggest we add a -par option to the nodetool rebuild command and let operators provide the number of connections. For bootstrap, we can provide a yaml setting that defaults to 1. (If we do decide to add a yaml setting, do I need to worry about anything version-breaking?)
If that makes sense, I will create a patch for trunk.
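A minimal sketch of the splitting step, assuming the token ranges owed by one source host are dealt out round-robin into as many groups as there are connections, with each group then backing one stream session. `RangeSplitter`, `Range`, `splitForSessions`, and the round-robin policy are illustrative assumptions, not the actual `RangeStreamer` change; the patch may split differently (for example, by subdividing individual ranges).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: distribute token ranges across `connections` groups;
// each group would become its own stream session to the same host.
final class RangeSplitter {
    record Range(long left, long right) {}

    static List<List<Range>> splitForSessions(List<Range> ranges, int connections) {
        if (connections < 1) throw new IllegalArgumentException("connections must be >= 1");
        // Never create more groups than there are ranges (but always at least one).
        int groups = Math.min(connections, Math.max(1, ranges.size()));
        List<List<Range>> sessions = new ArrayList<>();
        for (int i = 0; i < groups; i++) sessions.add(new ArrayList<>());
        // Round-robin keeps group sizes within one range of each other.
        for (int i = 0; i < ranges.size(); i++) {
            sessions.get(i % groups).add(ranges.get(i));
        }
        return sessions;
    }
}
```

With one connection (the suggested default) this returns a single group, so streaming behaves as it does today; each additional group becomes another session, which is also why the receiving node ends up with more SSTables.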
[jira] [Updated] (CASSANDRA-11374) LEAK DETECTED during repair
[ https://issues.apache.org/jira/browse/CASSANDRA-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anubhav Kale updated CASSANDRA-11374:
Attachment: Leak_Logs_2.zip, Leak_Logs_1.zip
Attached Leak_Logs*.zip, which show this error on Cassandra 2.1.13 while bootstrapping. It reproduces consistently for us. Our node size is ~300 GB. The process stays up after the leak message but doesn't do much, and the node is eventually removed from gossip (so it doesn't show up in gossipinfo / status on other nodes). The only workaround seems to be letting the node boot with auto_bootstrap=false and then running nodetool rebuild.
> LEAK DETECTED during repair
>
> Key: CASSANDRA-11374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11374
> Project: Cassandra
> Issue Type: Bug
> Reporter: Jean-Francois Gosselin
> Assignee: Marcus Eriksson
> Attachments: Leak_Logs_1.zip, Leak_Logs_2.zip
>
> When running a range repair we are seeing the following LEAK DETECTED errors:
> {noformat}
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,261 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@5ee90b43) to class org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@367168611:[[OffHeapBitSet]] was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@4ea9d4a7) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1875396681:Memory@[7f34b905fd10..7f34b9060b7a) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@27a6b614) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@838594402:Memory@[7f34bae11ce0..7f34bae11d84) was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,263 Ref.java:179 - LEAK DETECTED: a reference (org.apache.cassandra.utils.concurrent.Ref$State@64e7b566) to class org.apache.cassandra.io.util.SafeMemory$MemoryTidy@674656075:Memory@[7f342deab4e0..7f342deb7ce0) was not released before the reference was garbage collected
> {noformat}
[jira] [Created] (CASSANDRA-11419) On local cassandra installations, rack-dc from ROOT/conf isn't honored.
Anubhav Kale created CASSANDRA-11419:
Summary: On local cassandra installations, rack-dc from ROOT/conf isn't honored.
Key: CASSANDRA-11419
URL: https://issues.apache.org/jira/browse/CASSANDRA-11419
Project: Cassandra
Issue Type: Bug
Reporter: Anubhav Kale
Priority: Minor
1. Get the latest sources from trunk and build in Eclipse. (I am doing this on Windows, BTW.)
2. Run from Eclipse.
3. Bug: the change in conf/cassandra-rackdc.properties isn't honored. Instead, the one in test/conf/cassandra-rackdc.properties is used.
Since yaml changes from conf/ are used, why don't we stay consistent for other files as well?
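One possible explanation, offered as an assumption rather than a diagnosis: if the snitch locates `cassandra-rackdc.properties` as a classpath resource while the yaml is loaded from an explicitly configured location, then whichever directory comes first on the Eclipse run configuration's classpath wins. The sketch below demonstrates that first-match rule with two temporary directories; `RackDcLookup` and `dcFromFirstMatch` are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical sketch: a classpath lookup returns the FIRST matching resource,
// so test/conf shadows conf whenever it is listed earlier on the classpath.
final class RackDcLookup {
    static final String FILE = "cassandra-rackdc.properties";

    static String dcFromFirstMatch(Path... classpathDirs) throws IOException {
        URL[] urls = new URL[classpathDirs.length];
        for (int i = 0; i < classpathDirs.length; i++) {
            // For an existing directory, toUri() ends with '/', which URLClassLoader treats as a directory.
            urls[i] = classpathDirs[i].toUri().toURL();
        }
        // Null parent: only the bootstrap loader and the given directories are searched.
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            URL resource = loader.getResource(FILE);
            if (resource == null) return null;
            Properties props = new Properties();
            try (InputStream in = resource.openStream()) {
                props.load(in);
            }
            return props.getProperty("dc");
        }
    }
}
```

If this is the cause, placing conf/ ahead of test/conf/ on the run configuration's classpath should make the snitch read the same directory as the yaml.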
[jira] [Updated] (CASSANDRA-11407) Proposal for simplified DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anubhav Kale updated CASSANDRA-11407:
Description:
Today's DTCS implementation has been discussed and debated in a few JIRAs already (the notable one is https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main challenges with the current approach is that it is very difficult to reason about how the "Target" class makes buckets, which makes it difficult to reason about the expected file layout on disk.
I am proposing a simplification to the current approach that keeps intact most of the DTCS properties that make it a great fit for time-series data. The simplification is as follows.
Given the min and max timestamps across all SSTables in question, start from min and make windows based on base and min_threshold. The logic in GetWindow simply tries to fit maximum-sized windows from min to max.
This keeps the DTCS properties intact, except that we don't need to wait for min_threshold windows before making a bigger one. I would argue this simplifies the algorithm to a great extent, is easy to reason about, and the end result isn't drastically different from the original DTCS in most cases.
We give up on the "alignment" logic that exists in the current implementation, but I honestly don't think it buys us a lot besides complexity.
The implementation can obviously be optimized and cleaned up more if folks think this is a good idea.
was:
Today's DTCS implementation has been discussed and debated in a few JIRAs already (the notable one is https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main challenges with the current approach is that it is very difficult to reason about how the "Target" class makes buckets, which makes it difficult to reason about the expected file layout on disk.
I am proposing a simplification to the current approach that keeps intact most of the DTCS properties that make it a great fit for time-series data. The simplification is as follows.
Given the min and max timestamps across all SSTables in question, start from min and make windows based on base and min_threshold. The logic in GetWindow simply tries to fit maximum-sized windows from min to max.
This keeps the DTCS properties intact, except that we don't need to wait for min_threshold windows before making a bigger one. I would argue this simplifies the algorithm to a great extent, is easy to reason about, and the end result isn't drastically different from the original DTCS in most cases.
We give up on the "alignment" logic in the current class, but I honestly don't think it buys us a lot besides complexity.
The implementation can obviously be optimized and cleaned up more if folks think this is a good idea.
> Proposal for simplified DTCS
>
> Key: CASSANDRA-11407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
> Project: Cassandra
> Issue Type: Improvement
> Components: Compaction
> Reporter: Anubhav Kale
> Attachments: 0001-Simple-DTCS.patch
[jira] [Created] (CASSANDRA-11407) Proposal for a simple DTCS
Anubhav Kale created CASSANDRA-11407:
Summary: Proposal for a simple DTCS
Key: CASSANDRA-11407
URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
Project: Cassandra
Issue Type: Improvement
Components: Compaction
Reporter: Anubhav Kale
Attachments: 0001-Simple-DTCS.patch
Today's DTCS implementation has been discussed and debated in a few JIRAs already (the notable one is https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main challenges with the current approach is that it is very difficult to reason about how the "Target" class makes buckets, which makes it difficult to reason about the expected file layout on disk.
I am proposing a simplification to the current approach that keeps intact most of the DTCS properties that make it a great fit for time-series data. The simplification is as follows.
Given the min and max timestamps across all SSTables in question, start from min and make windows based on base and min_threshold. The logic in GetWindow simply tries to fit maximum-sized windows from min to max.
This keeps the DTCS properties intact, except that we don't need to wait for min_threshold windows before making a bigger one. I would argue this simplifies the algorithm to a great extent, is easy to reason about, and the end result isn't drastically different from the original DTCS in most cases.
We give up on the "alignment" logic in the current class, but I honestly don't think it buys us a lot besides complexity.
The implementation can obviously be optimized and cleaned up more if folks think this is a good idea.
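A minimal sketch of the proposed window construction, under stated assumptions: window sizes are `base * min_threshold^k`, windows are tiled greedily starting from the oldest timestamp, and an optional cap (`maxWindow`, playing the role of a maximum window size) bounds growth. `SimpleDtcsWindows` and `windows` are hypothetical names; the attached patch's `GetWindow` may differ in detail.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the proposed windowing: tile [min, max] greedily,
// always using the largest size base * minThreshold^k that still fits.
final class SimpleDtcsWindows {
    record Window(long start, long end) {} // inclusive bounds

    static List<Window> windows(long min, long max, long base, int minThreshold, long maxWindow) {
        if (base < 1 || minThreshold < 2 || max < min) throw new IllegalArgumentException();
        List<Window> out = new ArrayList<>();
        long cursor = min;
        while (cursor <= max) {
            long remaining = max - cursor + 1;
            long size = base;
            // Grow by minThreshold while the next size fits in what is left and respects the cap.
            while (size <= Long.MAX_VALUE / minThreshold
                    && size * minThreshold <= remaining
                    && size * minThreshold <= maxWindow) {
                size *= minThreshold;
            }
            long end = Math.min(cursor + size - 1, max); // the newest window may be partial
            out.add(new Window(cursor, end));
            cursor = end + 1;
        }
        return out;
    }
}
```

For min = 0, max = 99, base = 4, min_threshold = 4 and no effective cap, this yields windows [0,63], [64,79], [80,95], [96,99]: large windows for old data and smaller ones toward the newest timestamps, with no alignment step.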
[jira] [Updated] (CASSANDRA-11407) Proposal for simplified DTCS
[ https://issues.apache.org/jira/browse/CASSANDRA-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anubhav Kale updated CASSANDRA-11407:
Summary: Proposal for simplified DTCS (was: Proposal for a simple DTCS)
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196342#comment-15196342 ]
Anubhav Kale commented on CASSANDRA-7276:
Attached a patch. It will require some more fit and finish, but take a look when you can. In the CompactionManager Submit* methods, I took the liberty of printing System.Cache as KS.CF instead of providing the overrides on Logger.
> Include keyspace and table names in logs where possible
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Tyler Hobbs
> Priority: Minor
> Labels: bootcamp, lhf
> Fix For: 2.1.x
> Attachments: 0001-Better-Logging-for-KS-and-CF.patch,
> 0001-Consistent-KS-and-Table-Logging.patch,
> 0001-Logging-KS-and-CF-consistently.patch,
> 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt,
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt,
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
> Most error messages and stacktraces give you no clue as to what keyspace or
> table was causing the problem. For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
>     at java.nio.Buffer.limit(Unknown Source)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
>     at org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
>     at edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
>     at edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
>     at edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
>     at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
>     at edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
>     at org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
>     at org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
>     at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
>     at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
>     at org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
>     at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
>     at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
>     at org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
>     at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
>     at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error
> messages or logs whenever possible. This includes reads, writes,
> compactions, flushes, repairs, and probably more.
[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anubhav Kale updated CASSANDRA-7276:
Attachment: 0001-Consistent-KS-and-Table-Logging.patch
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195961#comment-15195961 ]
Anubhav Kale commented on CASSANDRA-7276:
So, if the ContextualizedLogger class implements Logger and overrides all of its methods, there is a chance that developers will skip the methods that take KS/CF and just log the usual way. I am wondering whether it would make more sense not to implement Logger and to provide wrappers only for what's needed, keeping the methods that aren't KS/CF-aware to a minimum. Even this isn't bullet-proof, but it may work better. WDYT?
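The alternative proposed in the last comment, a wrapper that does not implement `Logger` and only exposes KS/CF-aware methods, might look roughly like this. It uses `java.util.logging` purely to stay self-contained (Cassandra itself logs through SLF4J), and `TableLogger`, `format`, `info`, and `error` are hypothetical names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper: every public method requires a keyspace and table,
// and the underlying logger is private, so context-free calls are not exposed.
final class TableLogger {
    private final Logger delegate;

    TableLogger(String name) {
        this.delegate = Logger.getLogger(name);
    }

    // Builds the "[keyspace.table] message" text used by every method.
    static String format(String keyspace, String table, String message) {
        return "[" + keyspace + "." + table + "] " + message;
    }

    void info(String keyspace, String table, String message) {
        delegate.info(format(keyspace, table, message));
    }

    void error(String keyspace, String table, String message, Throwable t) {
        delegate.log(Level.SEVERE, format(keyspace, table, message), t);
    }
}
```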
[jira] [Created] (CASSANDRA-11350) Max_SSTable_Age isn't really deprecated in DTCS
Anubhav Kale created CASSANDRA-11350:
Summary: Max_SSTable_Age isn't really deprecated in DTCS
Key: CASSANDRA-11350
URL: https://issues.apache.org/jira/browse/CASSANDRA-11350
Project: Cassandra
Issue Type: Bug
Components: Compaction
Environment: PROD
Reporter: Anubhav Kale
Priority: Minor
Based on the comments in https://issues.apache.org/jira/browse/CASSANDRA-10280 and the changes made to DateTieredCompactionStrategyOptions.java, the Max_SSTable_Age field is marked as deprecated. However, it is still used to filter out old SSTables in DateTieredCompactionStrategy.java. Once those tables are filtered, Max_Window_Size is used to limit how far back in time we can go (essentially how Max_SSTable_Age was used previously).
So I am somewhat confused about the exact use of these two fields. Should Max_SSTable_Age really be removed, and Max_Window_Size be used to filter old tables (in which case it should be set to 1 year as well)?
Currently, Max_SSTable_Age = 1 year and Max_Window_Size = 1 day. What is the expected behavior with these settings?
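To make the question concrete, here is a simplified model of the behavior as this report describes it, not DTCS's actual code: `max_sstable_age` first drops SSTables whose newest data is older than the cutoff, and `max_window_size` then caps how large any remaining window may grow. `DtcsOptionsModel`, `SSTable`, `candidates`, and `windowSize` are hypothetical names.

```java
import java.time.Duration;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of the two options discussed in this report.
final class DtcsOptionsModel {
    record SSTable(String name, long maxTimestampMillis) {}

    // Step 1 (max_sstable_age): ignore SSTables whose newest data is older than the cutoff.
    static List<SSTable> candidates(List<SSTable> all, long nowMillis, Duration maxSSTableAge) {
        long cutoff = nowMillis - maxSSTableAge.toMillis();
        return all.stream()
                .filter(s -> s.maxTimestampMillis() >= cutoff)
                .collect(Collectors.toList());
    }

    // Step 2 (max_window_size): a window never grows past the cap.
    static long windowSize(long requestedMillis, Duration maxWindowSize) {
        return Math.min(requestedMillis, maxWindowSize.toMillis());
    }
}
```

Under this model, max_sstable_age = 1 year and max_window_size = 1 day would mean SSTables older than a year are never compacted, and everything newer is compacted only within windows of at most one day.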
[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Anubhav Kale updated CASSANDRA-11168:
Attachment: 0001-Hinted-handoffs-fix.patch
> Hint Metrics are updated even if hinted_hand-offs=false
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
> Issue Type: Bug
> Reporter: Anubhav Kale
> Assignee: Anubhav Kale
> Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch,
> 0001-Hinted-Handoff-fix-2_2.patch, 0001-Hinted-handoff-metrics.patch,
> 0001-Hinted-handoffs-fix.patch
>
> In our PROD logs, we noticed a lot of hint metrics even though we have
> disabled hinted handoffs.
> The reason is that StorageProxy.ShouldHint has an inverted if condition.
> We should also wrap the if (hintWindowExpired) block in if
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.
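A minimal sketch of the guard the report proposes: check whether hinted handoff is enabled before anything that touches hint metrics, including the hint-window-expired branch. `HintDecision`, `shouldHint`, and `pastWindowMetric` are hypothetical stand-ins for `StorageProxy.ShouldHint` and the real metrics, whose exact code is not shown in this report.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the hint decision and one of its metrics.
final class HintDecision {
    final AtomicLong pastWindowMetric = new AtomicLong(); // stands in for a hint metric
    private final boolean hintedHandoffEnabled;
    private final long maxHintWindowMillis;

    HintDecision(boolean hintedHandoffEnabled, long maxHintWindowMillis) {
        this.hintedHandoffEnabled = hintedHandoffEnabled;
        this.maxHintWindowMillis = maxHintWindowMillis;
    }

    boolean shouldHint(long downtimeMillis) {
        // Guard first: with hinted handoff disabled, no hint metric is touched.
        if (!hintedHandoffEnabled) {
            return false;
        }
        boolean hintWindowExpired = downtimeMillis > maxHintWindowMillis;
        if (hintWindowExpired) {
            pastWindowMetric.incrementAndGet(); // only counted when hints are enabled
            return false;
        }
        return true;
    }
}
```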
[jira] [Commented] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189855#comment-15189855 ] Anubhav Kale commented on CASSANDRA-11168: -- My bad on the trunk patch. Updated. > Hint Metrics are updated even if hinted_hand-offs=false > --- > > Key: CASSANDRA-11168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-Hinted-Handoff-Fix.patch, > 0001-Hinted-Handoff-fix-2_2.patch, 0001-Hinted-handoff-metrics.patch, > 0001-Hinted-handoffs-fix.patch > > > In our PROD logs, we noticed a lot of hint metrics even though we have > disabled hinted handoffs. > The reason is StorageProxy.ShouldHint has an inverted if condition. > We should also wrap the if (hintWindowExpired) block in if > (DatabaseDescriptor.hintedHandoffEnabled()). > The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-11168: - Attachment: 0001-Hinted-Handoff-fix-2_2.patch > Hint Metrics are updated even if hinted_hand-offs=false > --- > > Key: CASSANDRA-11168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-Hinted-Handoff-Fix.patch, > 0001-Hinted-Handoff-fix-2_2.patch, 0001-Hinted-handoff-metrics.patch > > > In our PROD logs, we noticed a lot of hint metrics even though we have > disabled hinted handoffs. > The reason is StorageProxy.ShouldHint has an inverted if condition. > We should also wrap the if (hintWindowExpired) block in if > (DatabaseDescriptor.hintedHandoffEnabled()). > The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189739#comment-15189739 ] Anubhav Kale commented on CASSANDRA-11168: -- I have attached for 2.2 as well. > Hint Metrics are updated even if hinted_hand-offs=false > --- > > Key: CASSANDRA-11168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-Hinted-Handoff-Fix.patch, > 0001-Hinted-Handoff-fix-2_2.patch, 0001-Hinted-handoff-metrics.patch > > > In our PROD logs, we noticed a lot of hint metrics even though we have > disabled hinted handoffs. > The reason is StorageProxy.ShouldHint has an inverted if condition. > We should also wrap the if (hintWindowExpired) block in if > (DatabaseDescriptor.hintedHandoffEnabled()). > The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189658#comment-15189658 ] Anubhav Kale commented on CASSANDRA-7276: - Thanks for the suggestion. I did consider creating a new Logger class, but I wasn't sure if that was the recommended approach. Is this the approach we would like to go with? We went back and forth a bit on this, so it might be better to agree on the approach first before making the changes (especially because it touches so many files and requires constant rebasing). > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 0001-Better-Logging-for-KS-and-CF.patch, > 0001-Logging-KS-and-CF-consistently.patch, > 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, > cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. 
For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at 
java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
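The "new Logger class" idea discussed in the CASSANDRA-7276 comment could look roughly like the hypothetical helper below. The class name and prefix format are assumptions made for illustration; a real version would delegate to Cassandra's SLF4J logger, whereas this sketch only builds the message string so it has no dependencies.

```java
import java.util.Objects;

/**
 * Hypothetical helper: binds a keyspace/table pair once and prefixes every
 * message with it. A real version would delegate to an SLF4J Logger; this
 * sketch only builds the string so it compiles with the JDK alone.
 */
final class TableLogPrefix {
    private final String prefix;

    TableLogPrefix(String keyspace, String table) {
        this.prefix = "[" + Objects.requireNonNull(keyspace) + "."
                + Objects.requireNonNull(table) + "] ";
    }

    /** Returns the message with the keyspace/table prefix prepended. */
    String format(String message) {
        return prefix + message;
    }

    public static void main(String[] args) {
        TableLogPrefix log = new TableLogPrefix("ks1", "users");
        // prints "[ks1.users] Exception in thread Thread[MutationStage:61648,5,main]"
        System.out.println(log.format("Exception in thread Thread[MutationStage:61648,5,main]"));
    }
}
```

Centralizing the prefix in one class means each call site changes by roughly one line, which may reduce the rebasing pain mentioned in the comment.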
[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-11168: - Attachment: 0001-Hinted-handoff-metrics.patch > Hint Metrics are updated even if hinted_hand-offs=false > --- > > Key: CASSANDRA-11168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-Hinted-Handoff-Fix.patch, > 0001-Hinted-handoff-metrics.patch > > > In our PROD logs, we noticed a lot of hint metrics even though we have > disabled hinted handoffs. > The reason is StorageProxy.ShouldHint has an inverted if condition. > We should also wrap the if (hintWindowExpired) block in if > (DatabaseDescriptor.hintedHandoffEnabled()). > The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188197#comment-15188197 ] Anubhav Kale commented on CASSANDRA-11168: -- Updated patch. I'm not sure it really needs to be back-ported, though. > Hint Metrics are updated even if hinted_hand-offs=false > --- > > Key: CASSANDRA-11168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-Hinted-Handoff-Fix.patch, > 0001-Hinted-handoff-metrics.patch > > > In our PROD logs, we noticed a lot of hint metrics even though we have > disabled hinted handoffs. > The reason is StorageProxy.ShouldHint has an inverted if condition. > We should also wrap the if (hintWindowExpired) block in if > (DatabaseDescriptor.hintedHandoffEnabled()). > The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176707#comment-15176707 ] Anubhav Kale commented on CASSANDRA-11168: -- So, to make sure I understand Aleksey's thought correctly, what we want is below. Can you confirm (only increment if the hint window has expired)?
{noformat}
if (DatabaseDescriptor.hintedHandoffEnabled())
{
    Set<String> disabledDCs = DatabaseDescriptor.hintedHandoffDisabledDCs();
    if (!disabledDCs.isEmpty())
    {
        final String dc = DatabaseDescriptor.getEndpointSnitch().getDatacenter(ep);
        if (disabledDCs.contains(dc))
        {
            Tracing.trace("Not hinting {} since its data center {} has been disabled {}", ep, dc, disabledDCs);
            return false;
        }
    }
    boolean hintWindowExpired = Gossiper.instance.getEndpointDowntime(ep) > DatabaseDescriptor.getMaxHintWindow();
    if (hintWindowExpired)
    {
        // Only reached when hinted handoff is enabled, so the metric is no longer bumped when it is off
        HintsService.instance.metrics.incrPastWindow(ep);
        Tracing.trace("Not hinting {} which has been down {} ms", ep, Gossiper.instance.getEndpointDowntime(ep));
    }
    return !hintWindowExpired;
}
else
{
    return false;
}
{noformat}
> Hint Metrics are updated even if hinted_hand-offs=false > --- > > Key: CASSANDRA-11168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-Hinted-Handoff-Fix.patch > > > In our PROD logs, we noticed a lot of hint metrics even though we have > disabled hinted handoffs. > The reason is StorageProxy.ShouldHint has an inverted if condition. > We should also wrap the if (hintWindowExpired) block in if > (DatabaseDescriptor.hintedHandoffEnabled()). > The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
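To check the ordering proposed in that comment without a running node, the logic can be modelled as a small standalone class. This is a hypothetical sketch: configuration and gossip state are passed in as plain parameters instead of being read from DatabaseDescriptor and Gossiper, and the AtomicLong merely stands in for HintsService.instance.metrics.incrPastWindow(ep).

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical, dependency-free model of the proposed shouldHint() ordering.
 * All inputs are plain parameters; nothing here is Cassandra's real API.
 */
final class ShouldHintSketch {
    /** Stands in for the "past window" hint metric. */
    static final AtomicLong pastWindowCount = new AtomicLong();

    static boolean shouldHint(boolean hintedHandoffEnabled, Set<String> disabledDCs, String dc,
                              long downtimeMillis, long maxHintWindowMillis) {
        if (!hintedHandoffEnabled) {
            return false; // handoff off: return before touching any metric
        }
        if (disabledDCs.contains(dc)) {
            return false; // hints disabled for this endpoint's data center
        }
        boolean hintWindowExpired = downtimeMillis > maxHintWindowMillis;
        if (hintWindowExpired) {
            pastWindowCount.incrementAndGet(); // counted only when hinting is enabled
        }
        return !hintWindowExpired;
    }

    public static void main(String[] args) {
        long window = 3 * 60 * 60 * 1000L; // three hours; an illustrative value only
        System.out.println(shouldHint(false, Collections.emptySet(), "dc1", 2 * window, window)); // false
        System.out.println(pastWindowCount.get());                                                // 0
        System.out.println(shouldHint(true, Collections.emptySet(), "dc1", 2 * window, window));  // false
        System.out.println(pastWindowCount.get());                                                // 1
    }
}
```

The first call mirrors the PROD situation from the report: with handoff disabled, the counter stays at zero even though the endpoint has been down longer than the window.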
[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-7276: Attachment: 0001-Logging-KS-and-CF-consistently.patch Another try. Addressed comments. > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 0001-Better-Logging-for-KS-and-CF.patch, > 0001-Logging-KS-and-CF-consistently.patch, > 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, > cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) 
> at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11166) Range tombstones not accounted in tracing/cfstats
[ https://issues.apache.org/jira/browse/CASSANDRA-11166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157290#comment-15157290 ] Anubhav Kale commented on CASSANDRA-11166: -- Thanks for the update. Based on the code in SliceQueryFilter (2.1.9 tag) where the TombstoneOverwhelmingException is thrown, it appears that range tombstones don't contribute to this count. Is this the expected behavior? It seems wrong to me. So I am not sure whether this is just a logging issue or has broader implications. > Range tombstones not accounted in tracing/cfstats > - > > Key: CASSANDRA-11166 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11166 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Priority: Minor > > I noticed inconsistent behavior on deletes. Not sure if it is intentional. > The summary is: > If a table is created with TTL or if rows are inserted in a table using TTL, > when it's time to expire the row, a tombstone is generated (as expected) and > cfstats, cqlsh tracing and sstable2json show it. > However, if one executes a delete from table query followed by a select *, > neither cql tracing nor cfstats shows a tombstone being present. However, > sstable2json shows a tombstone. > Is this situation treated differently on purpose? In such a situation, does > Cassandra not have to scan tombstones (seems odd)? > Also, as a data point, if one executes a delete from table, > cqlsh tracing, nodetool cfstats, and sstable2json all show a consistent > result (tombstone being present). > As an end user, I'd assume that deleting a row either via TTL or explicitly > should show me a tombstone. Is this expectation reasonable? If not, can this > behavior be clearly documented? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
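The behavior the comment expects, where range tombstones count toward the same limit as cell tombstones, can be sketched as the hypothetical counter below. It is not Cassandra's SliceQueryFilter, the class and method names are invented, and IllegalStateException is only a stand-in for TombstoneOverwhelmingException.

```java
/**
 * Hypothetical model of a read-path tombstone counter in which range
 * tombstones count toward the failure threshold exactly like cell tombstones.
 * IllegalStateException stands in for TombstoneOverwhelmingException.
 */
final class TombstoneCounterSketch {
    private final int failureThreshold;
    private int scanned;

    TombstoneCounterSketch(int failureThreshold) {
        this.failureThreshold = failureThreshold;
    }

    /** A tombstone on a single cell. */
    void countCellTombstone() {
        increment();
    }

    /** A tombstone covering a range; the comment asks whether this contribution is missing. */
    void countRangeTombstone() {
        increment();
    }

    int scanned() {
        return scanned;
    }

    private void increment() {
        scanned++;
        if (scanned > failureThreshold) {
            throw new IllegalStateException("Scanned over " + failureThreshold + " tombstones; query aborted");
        }
    }

    public static void main(String[] args) {
        TombstoneCounterSketch counter = new TombstoneCounterSketch(2);
        counter.countCellTombstone();
        counter.countRangeTombstone();
        System.out.println(counter.scanned()); // prints 2
        try {
            counter.countRangeTombstone(); // the third tombstone exceeds the threshold of 2
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

If range tombstones were skipped in increment(), the same read would be reported as scanning fewer tombstones than it actually did, which matches the tracing/cfstats discrepancy described in the issue.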
[jira] [Commented] (CASSANDRA-11166) Inconsistent behavior on Tombstones
[ https://issues.apache.org/jira/browse/CASSANDRA-11166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154851#comment-15154851 ] Anubhav Kale commented on CASSANDRA-11166: -- Any thoughts on this? > Inconsistent behavior on Tombstones > --- > > Key: CASSANDRA-11166 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11166 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Priority: Minor > > I noticed inconsistent behavior on deletes. Not sure if it is intentional. > The summary is: > If a table is created with TTL or if rows are inserted in a table using TTL, > when it's time to expire the row, a tombstone is generated (as expected) and > cfstats, cqlsh tracing and sstable2json show it. > However, if one executes a delete from table query followed by a select *, > neither cql tracing nor cfstats shows a tombstone being present. However, > sstable2json shows a tombstone. > Is this situation treated differently on purpose? In such a situation, does > Cassandra not have to scan tombstones (seems odd)? > Also, as a data point, if one executes a delete from table, > cqlsh tracing, nodetool cfstats, and sstable2json all show a consistent > result (tombstone being present). > As an end user, I'd assume that deleting a row either via TTL or explicitly > should show me a tombstone. Is this expectation reasonable? If not, can this > behavior be clearly documented? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-11168: - Attachment: 0001-Hinted-Handoff-Fix.patch > Hint Metrics are updated even if hinted_hand-offs=false > --- > > Key: CASSANDRA-11168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-Hinted-Handoff-Fix.patch > > > In our PROD logs, we noticed a lot of hint metrics even though we have > disabled hinted handoffs. > The reason is StorageProxy.ShouldHint has an inverted if condition. > We should also wrap the if (hintWindowExpired) block in if > (DatabaseDescriptor.hintedHandoffEnabled()). > The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154819#comment-15154819 ] Anubhav Kale commented on CASSANDRA-7276: - Submitted. I tested this locally by forcing exceptions through code changes. > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 0001-Better-Logging-for-KS-and-CF.patch, > 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, > cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > 
org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-7276: Attachment: 0001-Better-Logging-for-KS-and-CF.patch > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 0001-Better-Logging-for-KS-and-CF.patch, > 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, > cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > 
org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11166) Inconsistent behavior on Tombstones
Anubhav Kale created CASSANDRA-11166: Summary: Inconsistent behavior on Tombstones Key: CASSANDRA-11166 URL: https://issues.apache.org/jira/browse/CASSANDRA-11166 Project: Cassandra Issue Type: Bug Reporter: Anubhav Kale Priority: Minor I noticed inconsistent behavior on deletes. Not sure if it is intentional. The summary is: If a table is created with TTL or if rows are inserted in a table using TTL, when it's time to expire the row, a tombstone is generated (as expected) and cfstats, cqlsh tracing and sstable2json show it. However, if one executes a delete from table query followed by a select *, neither cql tracing nor cfstats shows a tombstone being present. However, sstable2json shows a tombstone. Is this situation treated differently on purpose? In such a situation, does Cassandra not have to scan tombstones (seems odd)? Also, as a data point, if one executes a delete from table, cqlsh tracing, nodetool cfstats, and sstable2json all show a consistent result (tombstone being present). As an end user, I'd assume that deleting a row either via TTL or explicitly should show me a tombstone. Is this expectation reasonable? If not, can this behavior be clearly documented? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
Anubhav Kale created CASSANDRA-11168: Summary: Hint Metrics are updated even if hinted_hand-offs=false Key: CASSANDRA-11168 URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 Project: Cassandra Issue Type: Bug Reporter: Anubhav Kale Priority: Minor In our PROD logs, we noticed a lot of hint metrics even though we have disabled hinted handoffs. The reason is StorageProxy.ShouldHint has an inverted if condition. We should also wrap the if (hintWindowExpired) block in if (DatabaseDescriptor.hintedHandoffEnabled()) as well. The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false
[ https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-11168: - Description: In our PROD logs, we noticed a lot of hint metrics even though we have disabled hinted handoffs. The reason is StorageProxy.ShouldHint has an inverted if condition. We should also wrap the if (hintWindowExpired) block in if (DatabaseDescriptor.hintedHandoffEnabled()). The fix is easy, and I can provide a patch. was: In our PROD logs, we noticed a lot of hint metrics even though we have disabled hinted handoffs. The reason is StorageProxy.ShouldHint has an inverted if condition. We should also wrap the if (hintWindowExpired) block in if (DatabaseDescriptor.hintedHandoffEnabled()) as well. The fix is easy, and I can provide a patch. > Hint Metrics are updated even if hinted_hand-offs=false > --- > > Key: CASSANDRA-11168 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11168 > Project: Cassandra > Issue Type: Bug >Reporter: Anubhav Kale >Priority: Minor > > In our PROD logs, we noticed a lot of hint metrics even though we have > disabled hinted handoffs. > The reason is StorageProxy.ShouldHint has an inverted if condition. > We should also wrap the if (hintWindowExpired) block in if > (DatabaseDescriptor.hintedHandoffEnabled()). > The fix is easy, and I can provide a patch. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11160) Use UUID for SS Table Filenames
Anubhav Kale created CASSANDRA-11160: Summary: Use UUID for SS Table Filenames Key: CASSANDRA-11160 URL: https://issues.apache.org/jira/browse/CASSANDRA-11160 Project: Cassandra Issue Type: Improvement Reporter: Anubhav Kale Priority: Minor Today, Cassandra uses a monotonically increasing counter to generate SSTable file names. While this works in practice, wouldn't it be safer / better to use UUIDs in file names to make them truly unique? AFAIK, no code paths rely on such counters being part of file names. A specific scenario where this would really help is below: In a backup / restore model, suppose we move files out to some other storage. In that process, we can optimize by not moving files that were already backed up, using a check on file names (which we can't do easily today, because if the node went down, a file with the same name can be generated). Note that using incremental backups is not a viable option here, because we lose the benefits of compaction (as discussed in my last comment on CASSANDRA-10960). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
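The backup scenario in CASSANDRA-11160 can be sketched as follows, assuming a random UUID replaces the per-node counter in the file name. The name layout below is made up for illustration and is not Cassandra's on-disk format; the "already backed up" check is just a set of names the hypothetical backup tool has uploaded.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical sketch of the proposal: embed a random UUID instead of a
 * counter in the SSTable file name, so a backup tool can skip any file whose
 * name it has already uploaded. The name layout is illustrative only.
 */
final class UuidSSTableNames {

    static String dataFileName(String keyspace, String table) {
        return keyspace + "-" + table + "-" + UUID.randomUUID() + "-Data.db";
    }

    /** Returns true if the file still needs uploading, recording it as uploaded. */
    static boolean needsBackup(Set<String> alreadyBackedUp, String fileName) {
        return alreadyBackedUp.add(fileName); // add() returns false for a name already present
    }

    public static void main(String[] args) {
        Set<String> uploaded = new HashSet<>();
        String first = dataFileName("ks1", "events");
        System.out.println(needsBackup(uploaded, first)); // true
        System.out.println(needsBackup(uploaded, first)); // false: same name, skip it
        // A file written after a restart gets a fresh UUID, so it does not reuse
        // an earlier name the way a restarted counter could.
        System.out.println(needsBackup(uploaded, dataFileName("ks1", "events"))); // true
    }
}
```

The design relies on the name alone carrying identity; with a counter, a name seen before no longer guarantees the contents are the same file, which is the problem the report describes.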
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143017#comment-15143017 ] Anubhav Kale commented on CASSANDRA-7276: - Any thoughts here ? > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 0001-Logging-for-Keyspace-and-Tables.patch, > 2.1-CASSANDRA-7276-v1.txt, cassandra-2.1-7276-compaction.txt, > cassandra-2.1-7276.txt, cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > 
org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down
[ https://issues.apache.org/jira/browse/CASSANDRA-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-11143: - Description: We saw a problem similar to what I describe below in our PROD environment a few times. Below is a consistent repro. We can change the priority to Minor since there is a workaround, though. Using the steps from http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301, set up a two-node cluster locally. (1) Bring up both nodes. (2) Create a table, and ensure cqlsh shows it correctly on both nodes. (3) Bring down one node. (4) Drop and re-create the same table, or change the table's schema. (5) Bring up the down node. You will notice exceptions like the one below (because of the schema mismatch), and the new schema never propagates to the node that was down (meaning a select * via cqlsh continues to show the old schema for the table). I let the cluster run for an hour to see whether gossip would eventually catch up. Interestingly, if you restart the node that was down when the schema changes were made, the exception below goes away and the node picks up the new schema correctly. What is being cached such that a second restart is necessary for the node to behave correctly? ERROR 00:23:33 Configuration exception merging remote schema org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected e2839010-cf7e-11e5-a13b-fb6871b443fb) at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783) ~[main/:na] at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) ~[main/:na] at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) ~[main/:na] at org.apach was: We saw a problem similar to what I describe below in our PROD environment a few times. Below is a consistent repro.
We can change the priority to Minor since there is a workaround, though. Using the steps from http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301, set up a two-node cluster locally. (1) Bring up both nodes. (2) Create a table, and ensure cqlsh shows it correctly on both nodes. (3) Bring down one node. (4) Drop and re-create the same table, or change the table's schema. (5) Bring up the down node. You will notice exceptions like the one below (because of the schema mismatch), and the new schema never propagates to the node that was down (meaning cqlsh continues to show the old schema for the table). I let the cluster run for an hour to see whether gossip would eventually catch up. Interestingly, if you restart the node that was down when the schema changes were made, the exception below goes away and the node picks up the new schema correctly. What is being cached such that a second restart is necessary for the node to behave correctly? ERROR 00:23:33 Configuration exception merging remote schema org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected e2839010-cf7e-11e5-a13b-fb6871b443fb) at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783) ~[main/:na] at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) ~[main/:na] at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) ~[main/:na] at org.apach > Schema changes don't propagate correctly if nodes are down > -- > > Key: CASSANDRA-11143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11143 > Project: Cassandra > Issue Type: Bug > Environment: PROD >Reporter: Anubhav Kale > > We saw a problem similar to what I describe below in our PROD environment a > few times. Below is a consistent repro. We can change the priority to Minor > since there is a workaround, though.
> Using the steps from > http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301, > set up a two-node cluster locally. > (1) Bring up both nodes. > (2) Create a table, and ensure cqlsh shows it correctly on both nodes. > (3) Bring down one node. > (4) Drop and re-create the same table, or change the table's schema. > (5) Bring up the down node. > You will notice exceptions like the one below (because of the schema mismatch), and > the new schema never propagates to the node that was down (meaning a > select * via cqlsh continues to show the old schema for the table). I let the > cluster run for an hour to see whether gossip would eventually catch up. > Interestingly, if you restart the node that was down when > the schema changes were made, the
[jira] [Commented] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down
[ https://issues.apache.org/jira/browse/CASSANDRA-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141799#comment-15141799 ] Anubhav Kale commented on CASSANDRA-11143: -- After digging through the code, it appears that the cached data in CFMetaData isn't refreshed when system_schema.tables is changed in SchemaKeyspace.MergeSchema (the mutations.forEach line). This causes the check in validateCompatibility to fail. On restart, the node reloads this data from disk, so everything works correctly from that point onward. Is this the expected behavior? It seems odd to me. > Schema changes don't propagate correctly if nodes are down > -- > > Key: CASSANDRA-11143 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11143 > Project: Cassandra > Issue Type: Bug > Environment: PROD >Reporter: Anubhav Kale > > We saw a problem similar to what I describe below in our PROD environment a > few times. Below is a consistent repro. We can change the priority to Minor > since there is a workaround, though. > Using the steps from > http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301, > set up a two-node cluster locally. > (1) Bring up both nodes. > (2) Create a table, and ensure cqlsh shows it correctly on both nodes. > (3) Bring down one node. > (4) Drop and re-create the same table, or change the table's schema. > (5) Bring up the down node. > You will notice exceptions like the one below (because of the schema mismatch), and > the new schema never propagates to the node that was down (meaning a > select * via cqlsh continues to show the old schema for the table). I let the > cluster run for an hour to see whether gossip would eventually catch up. > Interestingly, if you restart the node that was down when > the schema changes were made, the exception below goes away and the node picks up > the new schema correctly. 
> What is being cached such that a second restart is necessary for the node to behave > correctly?
> ERROR 00:23:33 Configuration exception merging remote schema > org.apache.cassandra.exceptions.ConfigurationException: Column family ID > mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected > e2839010-cf7e-11e5-a13b-fb6871b443fb) > at > org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783) > ~[main/:na] > at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) > ~[main/:na] > at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) > ~[main/:na] > at org.apach -- This message was sent by Atlassian JIRA (v6.3.4#6332)
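To illustrate the analysis in the comment above, here is a toy model written under stated assumptions rather than from the Cassandra source: TableIdCache is a hypothetical class in which remote schema mutations update only a persisted map (standing in for system_schema.tables), while the ID check reads an in-memory copy that is rebuilt only at startup. The check keeps failing with an ID mismatch until reloadFromDisk() runs, which mirrors the observed "restart fixes it" symptom.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical model of the reported behavior; NOT Cassandra's CFMetaData.
final class TableIdCache {
    private final Map<String, UUID> persisted;                // stands in for system_schema.tables on disk
    private final Map<String, UUID> cached = new HashMap<>(); // in-memory copy used by validate()

    TableIdCache(Map<String, UUID> persisted) {
        this.persisted = persisted;
        reloadFromDisk();
    }

    // Mirrors what a restart does: rebuild the in-memory view from the persisted schema.
    void reloadFromDisk() {
        cached.clear();
        cached.putAll(persisted);
    }

    // Applying a remote schema mutation updates only the persisted map;
    // the cache is deliberately left stale to reproduce the symptom.
    void applyRemoteMutation(String table, UUID newId) {
        persisted.put(table, newId);
    }

    // Similar in spirit to the ID check in validateCompatibility.
    void validate(String table, UUID incomingId) {
        UUID expected = cached.get(table);
        if (expected != null && !expected.equals(incomingId)) {
            throw new IllegalStateException(String.format(
                "Column family ID mismatch (found %s; expected %s)", incomingId, expected));
        }
    }
}
```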
[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-7276: Attachment: (was: 0001-Better-Logging-for-KS-CF.patch) > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 0001-Logging-for-Keyspace-and-Tables.patch, > 2.1-CASSANDRA-7276-v1.txt, cassandra-2.1-7276-compaction.txt, > cassandra-2.1-7276.txt, cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > 
org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-7276: Attachment: 0001-Logging-for-Keyspace-and-Tables.patch > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 0001-Logging-for-Keyspace-and-Tables.patch, > 2.1-CASSANDRA-7276-v1.txt, cassandra-2.1-7276-compaction.txt, > cassandra-2.1-7276.txt, cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > 
org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down
Anubhav Kale created CASSANDRA-11143: Summary: Schema changes don't propagate correctly if nodes are down Key: CASSANDRA-11143 URL: https://issues.apache.org/jira/browse/CASSANDRA-11143 Project: Cassandra Issue Type: Bug Environment: PROD Reporter: Anubhav Kale We saw a problem similar to what I describe below in our PROD environment a few times. Below is a consistent repro. We can change the priority to Minor since there is a workaround, though. Using the steps from http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301, set up a two-node cluster locally. (1) Bring up both nodes. (2) Create a table, and ensure cqlsh shows it correctly on both nodes. (3) Bring down one node. (4) Drop and re-create the same table, or change the table's schema. (5) Bring up the down node. You will notice exceptions like the one below (because of the schema mismatch), and the new schema never propagates to the node that was down (meaning cqlsh continues to show the old schema for the table). I let the cluster run for an hour to see whether gossip would eventually catch up. Interestingly, if you restart the node that was down when the schema changes were made, the exception below goes away and the node picks up the new schema correctly. What is being cached such that a second restart is necessary for the node to behave correctly? ERROR 00:23:33 Configuration exception merging remote schema org.apache.cassandra.exceptions.ConfigurationException: Column family ID mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected e2839010-cf7e-11e5-a13b-fb6871b443fb) at org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783) ~[main/:na] at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) ~[main/:na] at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) ~[main/:na] at org.apach -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-11142) Confusing error message on schema updates when nodes are down
Anubhav Kale created CASSANDRA-11142: Summary: Confusing error message on schema updates when nodes are down Key: CASSANDRA-11142 URL: https://issues.apache.org/jira/browse/CASSANDRA-11142 Project: Cassandra Issue Type: Bug Environment: PROD Reporter: Anubhav Kale Priority: Minor Repro steps are as follows (this was tested on Windows and is a consistent repro): (1) Start a two-node cluster. (2) Ensure that "nodetool status" shows both nodes as UN on both nodes. (3) Stop Node2. (4) Ensure that "nodetool status" shows Node2 as DN. (5) Start cqlsh on Node1. (6) Create a table. (7) cqlsh times out with the message below (coming from the Python code): Warning: schema version mismatch detected, which might be caused by DOWN nodes; if this is not the case, check the schema versions of your nodes in system.local and system.peers. OperationTimedOut: errors={}, last_host=10.1.0.10 (8) Do a select * on the table whose creation just timed out. It works fine. It seems odd that cqlsh reports a timeout but no real error, even though the table is created successfully. We should either replace the timeout with a meaningful error or not report a timeout at all. I am not sure what the best approach is. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
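One plausible reading of the warning, offered as an assumption rather than as a description of the cqlsh or driver code: after a DDL statement the client compares the schema_version values read from system.local and system.peers, and a DOWN node still reports its old version, so agreement is never reached and the wait times out even though the table was created. The hypothetical SchemaAgreement.agrees() below shows how the outcome changes depending on whether down hosts are counted.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Hypothetical sketch: decide whether schema versions agree, given the versions
// read from system.local / system.peers (host -> schema_version) and the set of
// hosts currently considered up. Not the cqlsh or driver implementation.
final class SchemaAgreement {
    private SchemaAgreement() {}

    static boolean agrees(Map<String, UUID> versions, Set<String> liveHosts, boolean ignoreDownHosts) {
        Set<UUID> distinct = new HashSet<>();
        for (Map.Entry<String, UUID> e : versions.entrySet()) {
            if (ignoreDownHosts && !liveHosts.contains(e.getKey())) {
                continue; // a DOWN node cannot have received the new schema yet
            }
            distinct.add(e.getValue());
        }
        return distinct.size() <= 1; // zero or one distinct version means agreement
    }
}
```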
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137825#comment-15137825 ] Anubhav Kale commented on CASSANDRA-7276: - I will take a stab at this. > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, > cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > 
org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-7276: Attachment: 0001-Better-Logging-for-KS-CF.patch Attached a first cut of this. With the current approach, I have omitted BatchLog*VerbHandler and Repair*VerbHandler, since they operate on a collection of mutations. Supporting them would mean changing the interface to expose a collection of keyspaces rather than the single keyspace originally suggested. We can do that, but it may defeat the goal here if MessageDeliveryTask simply prints a collection of keyspaces and column families when something goes wrong. I took a stab at manually updating log statements in KFS and CompactionManager. We can add logging in other places later, once this first pass is committed (to keep merges sane). There is also the option of introducing a base class for IKeyspaceAwareVerbHandler, but I did not do that for readability's sake (otherwise the class hierarchy gets too deep). In logs, should we use KS and CF, or Keyspace and Table? I don't believe there is a consistent pattern. > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement >Reporter: Tyler Hobbs >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 0001-Better-Logging-for-KS-CF.patch, > 2.1-CASSANDRA-7276-v1.txt, cassandra-2.1-7276-compaction.txt, > cassandra-2.1-7276.txt, cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem.
For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at 
java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
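A rough sketch of the interface shape discussed in the comment, with hypothetical names (VerbHandler, KeyspaceAwareVerbHandler, DeliveryTask) that are not taken from the attached patch or from Cassandra: only handlers that can name the keyspace and tables of a message implement the richer interface, and the delivery task adds that context to its error report when it is available. Handlers that operate on collections of mutations, such as the batchlog and repair handlers mentioned above, would fall back to the plain report.

```java
import java.util.Collection;

// Hypothetical shapes loosely modeled on the comment; not Cassandra's interfaces.
interface VerbHandler<M> {
    void doVerb(M message) throws Exception;
}

// A handler that can report which keyspace and tables a message touches.
interface KeyspaceAwareVerbHandler<M> extends VerbHandler<M> {
    String keyspace(M message);
    Collection<String> tables(M message);
}

final class DeliveryTask<M> {
    private final VerbHandler<M> handler;
    private final M message;

    DeliveryTask(VerbHandler<M> handler, M message) {
        this.handler = handler;
        this.message = message;
    }

    // Returns null on success, or an error description that includes the
    // keyspace/table context when the handler can supply it.
    String run() {
        try {
            handler.doVerb(message);
            return null;
        } catch (Exception e) {
            String context = "";
            if (handler instanceof KeyspaceAwareVerbHandler) {
                @SuppressWarnings("unchecked")
                KeyspaceAwareVerbHandler<M> aware = (KeyspaceAwareVerbHandler<M>) handler;
                context = " [keyspace=" + aware.keyspace(message) + ", tables=" + aware.tables(message) + "]";
            }
            return e.getClass().getSimpleName() + context;
        }
    }
}
```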
[jira] [Commented] (CASSANDRA-10962) Cassandra should not create snapshot at restart for compactions_in_progress
[ https://issues.apache.org/jira/browse/CASSANDRA-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134875#comment-15134875 ] Anubhav Kale commented on CASSANDRA-10962: -- This does not reproduce on the latest code. Also, listsnapshots does not list any system table snapshots, by design. From StorageService.getSnapshotDetails: for (Keyspace keyspace : Keyspace.all()) { if (Schema.isSystemKeyspace(keyspace.getName())) continue; > Cassandra should not create snapshot at restart for compactions_in_progress > --- > > Key: CASSANDRA-10962 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10962 > Project: Cassandra > Issue Type: Bug > Environment: Ubuntu 14.04.3 LTS >Reporter: FACORAT >Priority: Minor > > If auto_snapshot is set to true in cassandra.yaml, each time you restart > Cassandra, a snapshot is created for system.compactions_in_progress because the > table is truncated at Cassandra startup. > However, as the data in this table is temporary, Cassandra should not create a > snapshot for this table (or maybe even for system.* tables). This would be > consistent with the fact that "nodetool listsnapshots" doesn't even list this > table. > Example: > $ nodetool listsnapshots | grep compactions > $ ls -lh > system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/snapshots/ > total 16K > drwxr-xr-x 2 cassandra cassandra 4.0K Nov 30 13:12 > 1448885530280-compactions_in_progress > drwxr-xr-x 2 cassandra cassandra 4.0K Dec 7 15:36 > 1449498977181-compactions_in_progress > drwxr-xr-x 2 cassandra cassandra 4.0K Dec 14 18:20 > 1450113621506-compactions_in_progress > drwxr-xr-x 2 cassandra cassandra 4.0K Jan 4 12:53 > 1451908396364-compactions_in_progress -- This message was sent by Atlassian JIRA (v6.3.4#6332)
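The getSnapshotDetails loop quoted above explains why "nodetool listsnapshots | grep compactions" returns nothing even though the snapshot directories exist on disk. A minimal sketch of the same filtering pattern, assuming a hypothetical SnapshotListing class and a caller-supplied set of system keyspace names (in Cassandra the check is Schema.isSystemKeyspace):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: when listing snapshots, skip keyspaces classified as
// system keyspaces. The set of system keyspace names is an input assumption.
final class SnapshotListing {
    private SnapshotListing() {}

    static List<String> listSnapshots(Map<String, List<String>> snapshotsByKeyspace,
                                      Set<String> systemKeyspaces) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : snapshotsByKeyspace.entrySet()) {
            if (systemKeyspaces.contains(e.getKey())) {
                continue; // same effect as the `continue` in the quoted loop
            }
            for (String tag : e.getValue()) {
                result.add(e.getKey() + ":" + tag);
            }
        }
        return result;
    }
}
```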
[jira] [Updated] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing
[ https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10907: - Attachment: 0001-Skip-Flush-option-for-Snapshot.patch I initially went down the route of a boolean option (I didn't quite like it myself, but it felt less awkward than inspecting array elements). I have changed that now. I also addressed the other comments and made the tests more robust. > Nodetool snapshot should provide an option to skip flushing > --- > > Key: CASSANDRA-10907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10907 > Project: Cassandra > Issue Type: Improvement > Components: Configuration > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Labels: lhf > Attachments: 0001-Skip-Flush-for-snapshots.patch, > 0001-Skip-Flush-option-for-Snapshot.patch, > 0001-Skip-Flush-option-for-Snapshot.patch, 0001-flush.patch > > > For some practical scenarios, it doesn't matter if the data is flushed to > disk before taking a snapshot. In those cases, it's better to skip the flush and > make the snapshot process faster. > It would therefore be a good idea to add this option to the snapshot command. > The wiring from nodetool to MBean to VerbHandler should be easy. > I can provide a patch if this makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing
[ https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10907: - Attachment: 0001-Skip-Flush-option-for-Snapshot.patch 0001-Skip-Flush-for-snapshots.patch Sorry about the delay. I modified the patch per the comments. There is a lot of room for cleaning up the existing methods, but I am not doing that for now. To keep things simple, I did add a Boolean to the proposed signature to detect whether a KS / CF was passed. I tested locally and ensured existing functionality is not broken. > Nodetool snapshot should provide an option to skip flushing > --- > > Key: CASSANDRA-10907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10907 > Project: Cassandra > Issue Type: Improvement > Components: Configuration > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Labels: lhf > Attachments: 0001-Skip-Flush-for-snapshots.patch, > 0001-Skip-Flush-option-for-Snapshot.patch, 0001-flush.patch > > > For some practical scenarios, it doesn't matter if the data is flushed to > disk before taking a snapshot. In those cases, it's better to skip the flush and > make the snapshot process faster. > It would therefore be a good idea to add this option to the snapshot command. > The wiring from nodetool to MBean to VerbHandler should be easy. > I can provide a patch if this makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
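A sketch of the design choice discussed in these comments, under stated assumptions: SnapshotTaker and the "skipFlush" option key are hypothetical names, and the actions list merely records what a real implementation would do. Passing options as a map rather than as a positional boolean lets further options be added later without changing the method signature. Note that when the flush is skipped, data still held in memtables is not captured by the snapshot.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a snapshot entry point that takes an options map
// instead of a positional boolean; "skipFlush" is an illustrative key name.
final class SnapshotTaker {
    final List<String> actions = new ArrayList<>(); // records steps, for illustration only

    void takeSnapshot(String tag, Map<String, String> options, String... entities) {
        boolean skipFlush = Boolean.parseBoolean(options.getOrDefault("skipFlush", "false"));
        for (String entity : entities) {
            if (!skipFlush) {
                actions.add("flush " + entity); // stands in for flushing memtables to SSTables
            }
            actions.add("snapshot " + entity + " as " + tag); // stands in for linking current SSTables
        }
    }
}
```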
[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10866: - Attachment: 0002-Dropped-Mutations-Count.patch Rebased. > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Components: Observability, Tools > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.x > > Attachments: 0001-CF-Dropped-Mutation-Stats.patch, > 0001-CFCount.patch, 0002-Dropped-Mutations-Count.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
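A minimal sketch of a per-table dropped-mutation count using only JDK classes; DroppedMutationCounters is a hypothetical name, and Cassandra's own table metrics are built on the Dropwizard (Codahale) Metrics library rather than on a class like this. LongAdder is used because increments may arrive from many threads at once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: per-table counters of dropped mutations, keyed by "keyspace.table".
final class DroppedMutationCounters {
    private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

    // Called when a mutation for the given table is dropped (e.g. it timed out in the queue).
    void onDropped(String keyspace, String table) {
        counts.computeIfAbsent(keyspace + "." + table, k -> new LongAdder()).increment();
    }

    // Returns the number of dropped mutations recorded for the table, or 0 if none.
    long count(String keyspace, String table) {
        LongAdder adder = counts.get(keyspace + "." + table);
        return adder == null ? 0L : adder.sum();
    }
}
```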
[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089947#comment-15089947 ] Anubhav Kale commented on CASSANDRA-10866: -- Thanks. > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Components: Observability, Tools > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.x > > Attachments: 0001-CF-Dropped-Mutation-Stats.patch, > 0001-CFCount.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing
[ https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083489#comment-15083489 ] Anubhav Kale commented on CASSANDRA-10907: -- Any updates here? > Nodetool snapshot should provide an option to skip flushing > --- > > Key: CASSANDRA-10907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10907 > Project: Cassandra > Issue Type: Improvement > Components: Configuration > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Labels: lhf > Attachments: 0001-flush.patch > > > For some practical scenarios, it doesn't matter if the data is flushed to > disk before taking a snapshot. In those cases, it's better to skip the flush and > make the snapshot process faster. > It would therefore be a good idea to add this option to the snapshot command. > The wiring from nodetool to MBean to VerbHandler should be easy. > I can provide a patch if this makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083486#comment-15083486 ] Anubhav Kale commented on CASSANDRA-10866: -- Any updates here ? > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Components: Observability, Tools > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.x > > Attachments: 0001-CF-Dropped-Mutation-Stats.patch, > 0001-CFCount.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder
[ https://issues.apache.org/jira/browse/CASSANDRA-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081741#comment-15081741 ] Anubhav Kale commented on CASSANDRA-10960: -- This is not about manually deleting old backup folders (that's okay). It is about C* not deleting files from backups after those files were deleted from the data folder as part of compaction. Why is that by design? Can you please elaborate? > Compaction should delete old files from incremental backups folder > -- > > Key: CASSANDRA-10960 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10960 > Project: Cassandra > Issue Type: Improvement > Components: Compaction > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > > When compaction runs the old flushed SS Tables from backups folder are not > deleted. If folks need to move the backups folder somewhere outside the > cluster, recovery becomes slower because unnecessary files need to be copied > back. > Is this behavior by design ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder
[ https://issues.apache.org/jira/browse/CASSANDRA-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale reopened CASSANDRA-10960: -- > Compaction should delete old files from incremental backups folder > -- > > Key: CASSANDRA-10960 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10960 > Project: Cassandra > Issue Type: Improvement > Components: Compaction > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > > When compaction runs the old flushed SS Tables from backups folder are not > deleted. If folks need to move the backups folder somewhere outside the > cluster, recovery becomes slower because unnecessary files need to be copied > back. > Is this behavior by design ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder
[ https://issues.apache.org/jira/browse/CASSANDRA-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081833#comment-15081833 ] Anubhav Kale commented on CASSANDRA-10960: -- Here is a scenario: Time t1: KS/CF/ holds s1.db, s2.db; KS/CF/backups/ holds s1.db, s2.db. Time t2: KS/CF/ holds s1.db, s2.db, s3.db; KS/CF/backups/ holds s1.db, s2.db, s3.db [since every flushed SSTable is also written to backups]. Time t3 (compaction ran): KS/CF/ holds s4.db; KS/CF/backups/ holds s1.db, s2.db, s3.db, s4.db. This is the existing behavior, correct? The data hasn't changed here; it is simply represented by s4. It is reasonable to keep s1, s2, s3 and s4 in backups so that folks can go back to any point in time. However, if folks want to move data from backups to somewhere outside C* and copy it back during recovery, this adds the unnecessary burden of copying the same data multiple times (copying back s4 alone should have been enough for recovery here). Does this make sense? Please let me know if I have misunderstood something. > Compaction should delete old files from incremental backups folder > -- > > Key: CASSANDRA-10960 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10960 > Project: Cassandra > Issue Type: Improvement > Components: Compaction > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > > When compaction runs the old flushed SS Tables from backups folder are not > deleted. If folks need to move the backups folder somewhere outside the > cluster, recovery becomes slower because unnecessary files need to be copied > back. > Is this behavior by design ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
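The t3 state in the scenario reduces to a set comparison between the backups folder and the live table folder. The Java sketch below is a hypothetical helper, not part of Cassandra: it takes the comment's listing at face value (including s4.db appearing in backups), uses the simplified file names from the comment rather than real multi-component SSTable names, and splits the backups into files still needed for a latest-state restore and files already superseded by compaction.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Cassandra. File names are simplified to
// "s1.db" etc. as in the scenario; real SSTables consist of several component files.
final class BackupPruner {
    private BackupPruner() {}

    // Backup files still present in the live table folder: enough to restore the latest state.
    static Set<String> neededForLatestRestore(Set<String> backups, Set<String> live) {
        Set<String> needed = new TreeSet<>(backups);
        needed.retainAll(live);
        return needed;
    }

    // Backup files whose data has since been rewritten into other SSTables by compaction.
    static Set<String> supersededByCompaction(Set<String> backups, Set<String> live) {
        Set<String> superseded = new TreeSet<>(backups);
        superseded.removeAll(live);
        return superseded;
    }

    public static void main(String[] args) {
        // State at time t3 in the scenario above.
        Set<String> live = Set.of("s4.db");
        Set<String> backups = Set.of("s1.db", "s2.db", "s3.db", "s4.db");
        System.out.println(neededForLatestRestore(backups, live));  // [s4.db]
        System.out.println(supersededByCompaction(backups, live));  // [s1.db, s2.db, s3.db]
    }
}
```

Keeping both sets lets an external backup tool ship only the first set for a fast restore while still archiving the second set for point-in-time recovery.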
[jira] [Commented] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder
[ https://issues.apache.org/jira/browse/CASSANDRA-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082020#comment-15082020 ] Anubhav Kale commented on CASSANDRA-10960: -- Thanks for the explanation. While I don't want to continue the conversation here, IMHO C* needs to support a behavior where "old" SSTables in backups are deleted whenever compaction deletes them from the actual data folders. Otherwise, too much duplicate data has to be moved back to nodes at recovery time. The specific scenario is when backups need to be moved outside of Cassandra; otherwise, the current behavior is good enough. > Compaction should delete old files from incremental backups folder > -- > > Key: CASSANDRA-10960 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10960 > Project: Cassandra > Issue Type: Improvement > Components: Compaction > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > > When compaction runs the old flushed SS Tables from backups folder are not > deleted. If folks need to move the backups folder somewhere outside the > cluster, recovery becomes slower because unnecessary files need to be copied > back. > Is this behavior by design ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder
Anubhav Kale created CASSANDRA-10960: Summary: Compaction should delete old files from incremental backups folder Key: CASSANDRA-10960 URL: https://issues.apache.org/jira/browse/CASSANDRA-10960 Project: Cassandra Issue Type: Improvement Components: Compaction Environment: PROD Reporter: Anubhav Kale Priority: Minor When compaction runs the old flushed SS Tables from backups folder are not deleted. If folks need to move the backups folder somewhere outside the cluster, recovery becomes slower because unnecessary files need to be copied back. Is this behavior by design ? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10866: - Attachment: 0001-CF-Dropped-Mutation-Stats.patch > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Components: Observability, Tools > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.x > > Attachments: 0001-CF-Dropped-Mutation-Stats.patch, > 0001-CFCount.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10866: - Attachment: (was: 0001-CF-Dropped-Mutation-Stats.patch) > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Components: Observability, Tools > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.x > > Attachments: 0001-CFCount.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073178#comment-15073178 ] Anubhav Kale commented on CASSANDRA-10866: -- Attached. Please take a look when you get a chance ! > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Components: Observability, Tools > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.x > > Attachments: 0001-CF-Dropped-Mutation-Stats.patch, > 0001-CFCount.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10866: - Attachment: 0001-CF-Dropped-Mutation-Stats.patch > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Components: Observability, Tools > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.x > > Attachments: 0001-CF-Dropped-Mutation-Stats.patch, > 0001-CFCount.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069937#comment-15069937 ] Anubhav Kale commented on CASSANDRA-10580: -- Thanks for your help in working through this. > Add latency metrics for dropped messages > > > Key: CASSANDRA-10580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10580 > Project: Cassandra > Issue Type: Improvement > Components: Coordination, Observability > Environment: Production >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.2 > > Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, > 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, > Trunk.patch > > > In our production cluster, we are seeing a large number of dropped mutations. > At a minimum, we should print the time the thread took to get scheduled > thereby dropping the mutation (We should also print the Message / Mutation so > it helps in figuring out which column family was affected). This will help > find the right tuning parameter for write_timeout_in_ms. > The change is small and is in StorageProxy.java and MessagingTask.java. I > will submit a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
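The description asks for the time a message waited before being dropped, so that write_timeout_in_ms can be tuned. A minimal Java sketch of that bookkeeping follows; it is a hypothetical model, not Cassandra's StorageProxy or MessagingService code, and it assumes each message carries a creation timestamp and is dropped once its queueing delay exceeds the configured timeout.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model of "drop and record the delay"; not Cassandra's implementation.
final class DroppedMessageTracker {
    private final long timeoutMillis;  // e.g. the configured write_timeout_in_ms
    private final AtomicLong droppedCount = new AtomicLong();
    private final AtomicLong totalDroppedDelayMillis = new AtomicLong();
    private final AtomicLong maxDroppedDelayMillis = new AtomicLong();

    DroppedMessageTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    // Called when a worker thread finally picks the message up.
    // Returns true if the message should be dropped, and records how long it waited.
    boolean checkAndRecord(long createdAtMillis, long nowMillis) {
        long delay = nowMillis - createdAtMillis;
        if (delay <= timeoutMillis) {
            return false;
        }
        droppedCount.incrementAndGet();
        totalDroppedDelayMillis.addAndGet(delay);
        maxDroppedDelayMillis.accumulateAndGet(delay, Math::max);
        return true;
    }

    long droppedCount() {
        return droppedCount.get();
    }

    long maxDroppedDelayMillis() {
        return maxDroppedDelayMillis.get();
    }

    double meanDroppedDelayMillis() {
        long n = droppedCount.get();
        return n == 0 ? 0.0 : (double) totalDroppedDelayMillis.get() / n;
    }
}
```

If the recorded delays sit only slightly above the timeout, that points at the timeout value; delays far above it point at scheduling or load problems instead.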
[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069936#comment-15069936 ] Anubhav Kale commented on CASSANDRA-10866: -- Attached 10866-Trunk.patch. > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-CFCount.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10866: - Attachment: 10866-Trunk.patch > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-CFCount.patch, 10866-Trunk.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing
[ https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070095#comment-15070095 ] Anubhav Kale commented on CASSANDRA-10907: -- For point-in-time backups, it's always somewhat unpredictable what data is backed up, especially with replication on. The concern here is the unnecessary time and resources spent on a blocking flush when it's not really required. I have provided a patch. It's possible to provide overrides in other places; I took a stab at providing them for KS and CF and did the wiring. If you prefer some other approach, let me know. > Nodetool snapshot should provide an option to skip flushing > --- > > Key: CASSANDRA-10907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10907 > Project: Cassandra > Issue Type: Improvement > Components: Configuration > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Labels: lhf > > For some practical scenarios, it doesn't matter if the data is flushed to > disk before taking a snapshot. However, it's better to save some flushing > time to make snapshot process quick. > As such, it will be a good idea to provide this option to snapshot command. > The wiring from nodetool to MBean to VerbHandler should be easy. > I can provide a patch if this makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing
[ https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10907: - Attachment: 0001-flush.patch > Nodetool snapshot should provide an option to skip flushing > --- > > Key: CASSANDRA-10907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10907 > Project: Cassandra > Issue Type: Improvement > Components: Configuration > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Labels: lhf > Attachments: 0001-flush.patch > > > For some practical scenarios, it doesn't matter if the data is flushed to > disk before taking a snapshot. However, it's better to save some flushing > time to make snapshot process quick. > As such, it will be a good idea to provide this option to snapshot command. > The wiring from nodetool to MBean to VerbHandler should be easy. > I can provide a patch if this makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10936) Provide option to repair from a data center in "nodetool repair"
Anubhav Kale created CASSANDRA-10936: Summary: Provide option to repair from a data center in "nodetool repair" Key: CASSANDRA-10936 URL: https://issues.apache.org/jira/browse/CASSANDRA-10936 Project: Cassandra Issue Type: Improvement Components: Tools Environment: PROD Reporter: Anubhav Kale Priority: Minor Sometimes it's known that the correct / latest data resides in a particular data center. It would be useful if nodetool repair could provide a "Source DC" option to source the data from. This would save a ton of network traffic. There is some discussion around this in CASSANDRA-6552. A case in point where this is handy: people may want to back up data from a designated data center (so that only one copy of the data is backed up) to remote storage (Azure / AWS). At restore time, once the data is restored to this DC, other data centers can "source" data from it through "nodetool repair -- source Foo". -- This message was sent by Atlassian JIRA (v6.3.4#6332)
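The proposed "Source DC" option essentially narrows the set of replicas that act as streaming sources for each token range. The Java sketch below illustrates only that selection step; the Replica class, SourceDcSelector, and the fail-fast behavior when the source DC holds no replica are all hypothetical and do not reflect Cassandra's repair internals or any existing nodetool flag.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical replica descriptor: an address plus the data center it lives in.
final class Replica {
    final String address;
    final String dataCenter;

    Replica(String address, String dataCenter) {
        this.address = address;
        this.dataCenter = dataCenter;
    }
}

// Hypothetical selection step for a "source DC" repair; not Cassandra's API.
final class SourceDcSelector {
    private SourceDcSelector() {}

    // For one token range, only replicas in the designated source DC send data;
    // replicas in other DCs would only receive it.
    static List<Replica> streamingSources(List<Replica> replicas, String sourceDc) {
        List<Replica> sources = replicas.stream()
                .filter(r -> r.dataCenter.equals(sourceDc))
                .collect(Collectors.toList());
        if (sources.isEmpty()) {
            // Fail fast rather than silently falling back to other DCs.
            throw new IllegalArgumentException("no replica of this range in DC " + sourceDc);
        }
        return sources;
    }
}
```

Failing fast is a design choice for the sketch: silently falling back to other data centers would defeat the point of treating one DC as the trusted copy.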
[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068505#comment-15068505 ] Anubhav Kale commented on CASSANDRA-10580: -- Any updates here? It seems another rebase might be necessary for the Trunk patch. If the patch looks okay, can the committer please take care of rebasing when it gets committed? > Add latency metrics for dropped messages > > > Key: CASSANDRA-10580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10580 > Project: Cassandra > Issue Type: Improvement > Components: Coordination, Observability > Environment: Production >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.2, 2.2.x > > Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, > 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, > Trunk.patch > > > In our production cluster, we are seeing a large number of dropped mutations. > At a minimum, we should print the time the thread took to get scheduled > thereby dropping the mutation (We should also print the Message / Mutation so > it helps in figuring out which column family was affected). This will help > find the right tuning parameter for write_timeout_in_ms. > The change is small and is in StorageProxy.java and MessagingTask.java. I > will submit a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing
[ https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066956#comment-15066956 ] Anubhav Kale commented on CASSANDRA-10907: -- I agree that what is backed up will be undefined. In my opinion, the trap is very clear here, so I don't think it can be misused. IMHO, other nodetool commands have such traps as well, so this is no different (e.g. why does scrub have an option to skip the snapshot?). That said, if you feel strongly against this, I understand and we can kill it (I can always make a local patch). BTW, I can't use incremental backups because I do not want to ship SSTable files that would have been removed as part of compaction. When compaction kicks in and deletes some files, it doesn't remove them from backups (which makes sense; otherwise the backups wouldn't be incremental). So at recovery time we are moving too many files back, which increases application downtime. If I am misunderstanding something here, please let me know! > Nodetool snapshot should provide an option to skip flushing > --- > > Key: CASSANDRA-10907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10907 > Project: Cassandra > Issue Type: Improvement > Components: Configuration > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Labels: lhf > > For some practical scenarios, it doesn't matter if the data is flushed to > disk before taking a snapshot. However, it's better to save some flushing > time to make snapshot process quick. > As such, it will be a good idea to provide this option to snapshot command. > The wiring from nodetool to MBean to VerbHandler should be easy. > I can provide a patch if this makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067289#comment-15067289 ] Anubhav Kale commented on CASSANDRA-10866: -- Thanks. I included the collection because I did not realize that the SCHEMA_* verbs aren't part of DROPPABLE_VERBS. Good point. I'll submit a rebased patch shortly. > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Environment: PROD >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Attachments: 0001-CFCount.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066779#comment-15066779 ] Anubhav Kale commented on CASSANDRA-10866: -- Any updates here ? > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Attachments: 0001-CFCount.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing
[ https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066788#comment-15066788 ] Anubhav Kale commented on CASSANDRA-10907: -- We plan to move backups off the nodes. So when a snapshot is taken, it would ideally be fast (i.e., skip the flush) so that it can be moved out as quickly as possible. We have enough replication that we can tolerate the data missing from the snapshot because the memtable wasn't flushed. Do you feel strongly against it? > Nodetool snapshot should provide an option to skip flushing > --- > > Key: CASSANDRA-10907 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10907 > Project: Cassandra > Issue Type: Improvement > Components: Configuration > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Labels: lhf > > For some practical scenarios, it doesn't matter if the data is flushed to > disk before taking a snapshot. However, it's better to save some flushing > time to make snapshot process quick. > As such, it will be a good idea to provide this option to snapshot command. > The wiring from nodetool to MBean to VerbHandler should be easy. > I can provide a patch if this makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing
Anubhav Kale created CASSANDRA-10907: Summary: Nodetool snapshot should provide an option to skip flushing Key: CASSANDRA-10907 URL: https://issues.apache.org/jira/browse/CASSANDRA-10907 Project: Cassandra Issue Type: Improvement Components: Configuration Environment: PROD Reporter: Anubhav Kale Priority: Minor For some practical scenarios, it doesn't matter if the data is flushed to disk before taking a snapshot. However, it's better to save some flushing time to make snapshot process quick. As such, it will be a good idea to provide this option to snapshot command. The wiring from nodetool to MBean to VerbHandler should be easy. I can provide a patch if this makes sense. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
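Conceptually, the requested option makes the flush step of a snapshot conditional. The Java sketch below shows only that control flow; the Table interface and its flushMemtables and linkSSTablesInto methods are hypothetical names chosen for illustration and are not Cassandra's ColumnFamilyStore or MBean API.

```java
// Hypothetical abstraction over one table; names are illustrative only.
interface Table {
    void flushMemtables();              // blocking flush of in-memory writes to SSTables
    void linkSSTablesInto(String tag);  // hard-link the current on-disk SSTables under the snapshot tag
}

// Hypothetical sketch of a snapshot with an optional skip-flush flag.
final class Snapshotter {
    private Snapshotter() {}

    static void snapshot(Table table, String tag, boolean skipFlush) {
        if (!skipFlush) {
            // Default behavior: include recent writes by flushing them first.
            table.flushMemtables();
        }
        // With skipFlush, anything still only in memtables is not captured;
        // the ticket accepts that trade-off in exchange for a faster snapshot.
        table.linkSSTablesInto(tag);
    }
}
```

Threading the single boolean from nodetool through the MBean down to a call like this is the wiring the description refers to.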
[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063020#comment-15063020 ] Anubhav Kale commented on CASSANDRA-10866: -- I have provided a patch on top of https://issues.apache.org/jira/secure/attachment/12777927/Trunk-All-Comments.patch I could add unit tests in MessagingService, but I did not see a pattern for checking metrics through unit tests, so I skipped those. The change was verified on a local installation through VisualVM. It is possible to optimize the KS/CF lookup in MessagingService.java (maybe through a Map), but I am hoping that's not necessary. > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Attachments: 0001-CFCount.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
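The comment mentions that the KS/CF lookup could be optimized with a Map. A minimal sketch of that idea in Java follows, assuming a hypothetical registry keyed by "keyspace.table" and a caller that marks every table a dropped mutation touches; it is not Cassandra's per-table metrics API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-table dropped-mutation counters; not Cassandra's metrics classes.
final class DroppedMutationCounters {
    private final ConcurrentHashMap<String, LongAdder> byTable = new ConcurrentHashMap<>();

    // Keyspace and table names cannot contain '.', so this key is unambiguous.
    private static String key(String keyspace, String table) {
        return keyspace + "." + table;
    }

    // Constant-time lookup per dropped mutation instead of scanning all keyspaces/tables.
    // A mutation spanning several tables should be marked once per table.
    void markDropped(String keyspace, String table) {
        byTable.computeIfAbsent(key(keyspace, table), k -> new LongAdder()).increment();
    }

    long droppedCount(String keyspace, String table) {
        LongAdder adder = byTable.get(key(keyspace, table));
        return adder == null ? 0L : adder.sum();
    }
}
```

LongAdder is used because dropped-message accounting can happen concurrently on many threads, where it tends to contend less than a single AtomicLong.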
[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10866: - Attachment: 0001-CFCount.patch > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > Attachments: 0001-CFCount.patch > > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063212#comment-15063212 ] Anubhav Kale commented on CASSANDRA-10580: -- Thanks. When will the decision about 2.2 be made? I think it's better to produce the 3.0 patch at that point to avoid merge conflicts again. What do you think? > Add latency metrics for dropped messages > > > Key: CASSANDRA-10580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10580 > Project: Cassandra > Issue Type: Improvement > Components: Coordination, Observability > Environment: Production >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.2, 2.2.x > > Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, > 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, > Trunk.patch > > > In our production cluster, we are seeing a large number of dropped mutations. > At a minimum, we should print the time the thread took to get scheduled > thereby dropping the mutation (We should also print the Message / Mutation so > it helps in figuring out which column family was affected). This will help > find the right tuning parameter for write_timeout_in_ms. > The change is small and is in StorageProxy.java and MessagingTask.java. I > will submit a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060800#comment-15060800 ] Anubhav Kale commented on CASSANDRA-10580: -- I have re-created 2.2-All-Comments.patch. I tested that it applies locally on a 2.2 branch pulled into another directory (via git clone http://git-wip-us.apache.org/repos/asf/cassandra.git cassandra-2.2). Can you please confirm this works correctly for you? > Add latency metrics for dropped messages > > > Key: CASSANDRA-10580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10580 > Project: Cassandra > Issue Type: Improvement > Components: Coordination, Observability > Environment: Production >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.2, 2.2.x > > Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, > 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, > Trunk.patch > > > In our production cluster, we are seeing a large number of dropped mutations. > At a minimum, we should print the time the thread took to get scheduled > thereby dropping the mutation (We should also print the Message / Mutation so > it helps in figuring out which column family was affected). This will help > find the right tuning parameter for write_timeout_in_ms. > The change is small and is in StorageProxy.java and MessagingTask.java. I > will submit a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: (was: 2.2-All-Comments.patch) > Add latency metrics for dropped messages > > > Key: CASSANDRA-10580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10580 > Project: Cassandra > Issue Type: Improvement > Components: Coordination, Observability > Environment: Production >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.2, 2.2.x > > Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, > 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, > Trunk.patch > > > In our production cluster, we are seeing a large number of dropped mutations. > At a minimum, we should print the time the thread took to get scheduled > thereby dropping the mutation (We should also print the Message / Mutation so > it helps in figuring out which column family was affected). This will help > find the right tuning parameter for write_timeout_in_ms. > The change is small and is in StorageProxy.java and MessagingTask.java. I > will submit a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: 2.2-All-Comments.patch > Add latency metrics for dropped messages > > > Key: CASSANDRA-10580 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10580 > Project: Cassandra > Issue Type: Improvement > Components: Coordination, Observability > Environment: Production >Reporter: Anubhav Kale >Assignee: Anubhav Kale >Priority: Minor > Fix For: 3.2, 2.2.x > > Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, > 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, > Trunk.patch > > > In our production cluster, we are seeing a large number of dropped mutations. > At a minimum, we should print the time the thread took to get scheduled > thereby dropping the mutation (We should also print the Message / Mutation so > it helps in figuring out which column family was affected). This will help > find the right tuning parameter for write_timeout_in_ms. > The change is small and is in StorageProxy.java and MessagingTask.java. I > will submit a patch shortly. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060307#comment-15060307 ] Anubhav Kale commented on CASSANDRA-10580: -- I am not sure what's going on with 2.2. I will look at it again today. I do the following to generate patches; can you please confirm whether that's what everyone does? 1. Change files 2. git status (confirm that the changes are correct) 3. git add . 4. git commit -m "Foo" 5. git format-patch HEAD~ About the tests: I don't think the failures are related to my change. Are those failures expected?
[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060376#comment-15060376 ] Anubhav Kale commented on CASSANDRA-10580: -- My HEAD is indeed pointing to a different commit: f1c3df0e848638735c790b4817adf6411a52a064. I pulled down 2.2 with the command below and then made my changes: git clone http://git-wip-us.apache.org/repos/asf/cassandra.git cassandra-2.2 I will dig further into why this did not work correctly.
[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061241#comment-15061241 ] Anubhav Kale commented on CASSANDRA-10580: -- That explains it. I will get you the correct patch. Thanks for the explanation.
[jira] [Updated] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: 2.2-All-Comments.patch
[jira] [Updated] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: (was: 2.2-All-Comments.patch)
[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061285#comment-15061285 ] Anubhav Kale commented on CASSANDRA-10580: -- Attached the 2.2 patch. Sorry about the mix-up.
[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059191#comment-15059191 ] Anubhav Kale commented on CASSANDRA-10580: -- Can someone review this so we don't have to deal with too many merge conflicts later? Thanks a lot!
[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10866: - Summary: Column Family should expose count metrics for dropped mutations. (was: Column Family should expose latency metrics for dropped mutations.) > Column Family should expose count metrics for dropped mutations. > > > Key: CASSANDRA-10866 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 > Project: Cassandra > Issue Type: Improvement > Environment: PROD >Reporter: Anubhav Kale >Priority: Minor > > Please take a look at the discussion in CASSANDRA-10580. This is opened so > that the latency on dropped mutations is exposed as a metric on column > families. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
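The retitled ticket asks for a count of dropped mutations per column family. That idea could look roughly like the sketch below. `PerTableDropCounters`, `recordDrop` and `dropped` are hypothetical names. Cassandra's real per-table metrics are defined in its own metrics classes and exposed over JMX, and this stdlib-only sketch does not attempt either.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/**
 * Hypothetical sketch of per-table dropped-mutation counts keyed by
 * "keyspace.table". Not Cassandra's metrics API.
 */
final class PerTableDropCounters {
    private final ConcurrentHashMap<String, LongAdder> counts = new ConcurrentHashMap<>();

    /** Record one dropped mutation against the given table. */
    void recordDrop(String keyspace, String table) {
        // computeIfAbsent creates the counter atomically on first use.
        counts.computeIfAbsent(keyspace + "." + table, k -> new LongAdder()).increment();
    }

    /** Current count for a table, or 0 if nothing has been dropped for it. */
    long dropped(String keyspace, String table) {
        LongAdder adder = counts.get(keyspace + "." + table);
        return adder == null ? 0L : adder.sum();
    }
}
```

LongAdder is used because many request threads may record drops concurrently, and it keeps increments cheap under contention. A plain count also needs only one number per table, whereas a latency distribution has to store samples.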
[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.
[ https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059214#comment-15059214 ] Anubhav Kale commented on CASSANDRA-10866: -- That makes sense. Updating the title to reflect this.
[jira] [Issue Comment Deleted] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Comment: was deleted (was: Trunk patch with all comments addressed => 0001-Mutation.patch)
[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059535#comment-15059535 ] Anubhav Kale commented on CASSANDRA-10580: -- Thanks. Attached *-All-Comments.patch for trunk and 2.2. Let me know if those look good.
[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: 2.2-All-Comments.patch Trunk-All-Comments.patch
[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: (was: 0001-Mutation.patch)
[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: 0001-Mutation.patch Trunk patch with all comments addressed => 0001-Mutation.patch
[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: 0001-Metrics.patch Patch for exposing this data as JMX metrics
[jira] [Comment Edited] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056808#comment-15056808 ] Anubhav Kale edited comment on CASSANDRA-10580 at 12/14/15 10:12 PM: - Thanks for the pointers. Please take a look at the latest patch. I have tested it with VisualVM, and the new metrics work well. Interestingly, I could not find any documentation stating that the metrics return results (getSnapshot().getMean()) in nanoseconds, so callers must do the conversion themselves. I am noting this here so other devs can save some time. I'll file a separate JIRA for making the same change on a per-keyspace/column-family basis. Let me know whether this makes sense or whether more changes are needed. was (Author: anubhavk): Patch for exposing this data as JMX metrics
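The nanoseconds pitfall noted in this comment can be reproduced with a small stand-in timer. This rests on one assumption: that it mirrors the behaviour described above for the metrics library's Timer, where durations are converted to nanoseconds on update and the snapshot mean is reported in nanoseconds. `NanoTimer` is a hypothetical, single-threaded class and is not that library's Timer.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical stand-in illustrating the unit issue only. Not thread-safe,
 * and not the metrics library's Timer class.
 */
final class NanoTimer {
    private long count;
    private long totalNanos;

    /** Durations are converted to nanoseconds on the way in. */
    void update(long duration, TimeUnit unit) {
        totalNanos += unit.toNanos(duration);
        count++;
    }

    /** Mean duration in nanoseconds, analogous to a snapshot mean. */
    double meanNanos() {
        return count == 0 ? 0.0 : (double) totalNanos / count;
    }

    /** Callers must convert explicitly; here from nanoseconds to milliseconds. */
    static double nanosToMillis(double nanos) {
        return nanos / TimeUnit.MILLISECONDS.toNanos(1);
    }
}
```

Recording 10 ms and 30 ms gives a mean of 20,000,000 (nanoseconds). A caller that expects milliseconds must divide by 1,000,000 to get 20.0.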
[jira] [Created] (CASSANDRA-10866) Column Family should expose latency metrics for dropped mutations.
Anubhav Kale created CASSANDRA-10866: Summary: Column Family should expose latency metrics for dropped mutations. Key: CASSANDRA-10866 URL: https://issues.apache.org/jira/browse/CASSANDRA-10866 Project: Cassandra Issue Type: Improvement Environment: PROD Reporter: Anubhav Kale Priority: Minor Please take a look at the discussion in CASSANDRA-10580. This is opened so that the latency on dropped mutations is exposed as a metric on column families.
[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056843#comment-15056843 ] Anubhav Kale commented on CASSANDRA-10580: -- CASSANDRA-10866 has been opened to track the per-CF work.
[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054502#comment-15054502 ] Anubhav Kale commented on CASSANDRA-10580: -- Thanks for the interesting suggestion. I actually considered going down that route when I started working on this, but I wasn't sure about the rationale / design philosophy behind adding new metrics, so I took a simpler route. Glad to see your feedback. I have attached 10580-Metrics.patch and will open a separate JIRA for doing this on a per-CF basis. I am using the ApproximateTime class wherever the timestamp plays no part in the decision to drop the mutation and is used only for logging. I hope that makes sense. I can clean up the methods in MessagingService a bit more if you like (a couple of them print the same message). I wanted to send this out first to make sure I was on the right path. Also, a question: it appears that Timer.update appends entries to the metric (which is what we want). Do you know at what point it starts dropping new entries or gives up? If there is a huge number of dropped mutations, will the timeTaken metric get skewed? To make this work per CF, I will probably pass the mutation to MessagingService.logDroppedMessages (maybe through an overload) and update the metrics on the appropriate CF. Does that make sense? If this change looks good, I would prefer to make this work per CF before preparing patches for the older branches. Let me know if that's okay. I appreciate your time and feedback!
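The question above asks what happens when a timer receives a very large number of updates. Sampling metrics generally keep a fixed-size reservoir rather than an ever-growing list, so new updates are accepted and older samples are replaced. The sketch below uses plain uniform reservoir sampling (Vitter's Algorithm R) to show that bounded-memory idea. It is not necessarily the reservoir a Cassandra timer uses; Codahale's default Timer reservoir, for instance, is exponentially decaying. `UniformReservoir` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Uniform reservoir sampling (Algorithm R). Memory stays at `capacity`
 * samples no matter how many updates arrive. Illustrative sketch only,
 * not thread-safe.
 */
final class UniformReservoir {
    private final long[] samples;
    private final Random random;
    private long seen;

    UniformReservoir(int capacity, long seed) {
        this.samples = new long[capacity];
        this.random = new Random(seed);
    }

    void update(long value) {
        seen++;
        if (seen <= samples.length) {
            // Fill phase: keep every value until the reservoir is full.
            samples[(int) (seen - 1)] = value;
        } else {
            // Pick an index uniformly in [0, seen); it lands inside the
            // reservoir with probability capacity / seen.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < samples.length) {
                samples[(int) slot] = value;
            }
        }
    }

    /** Total number of updates observed, including ones not retained. */
    long count() {
        return seen;
    }

    /** Sorted copy of the retained samples. */
    long[] snapshot() {
        int size = (int) Math.min(seen, samples.length);
        long[] copy = Arrays.copyOf(samples, size);
        Arrays.sort(copy);
        return copy;
    }
}
```

With a capacity of 100, a million updates still leave exactly 100 retained samples while `count()` reports 1,000,000. A flood of dropped mutations therefore changes which values are sampled, not how much memory the metric uses.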
[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: 10580-Metrics.patch
[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: 10580-Metrics.patch
[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: (was: 10580-Metrics.patch)
[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051604#comment-15051604 ] Anubhav Kale commented on CASSANDRA-10580: -- Attached Trunk.patch. Let me know if that looks good.
[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anubhav Kale updated CASSANDRA-10580: - Attachment: Trunk.patch
[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.
[ https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049489#comment-15049489 ] Anubhav Kale commented on CASSANDRA-10580: -- Thanks for the review. I will work on the patches. A quick question about the NullPointerException with the empty constructors: I did not see this happen in my local run. Are you suggesting this is a pattern enforced by the logging library, and that I should therefore add a constructor with (say) a boolean argument and use it wherever I was using the default constructor? Sorry, I'm just not clear on why something like this is necessary.