[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.

2017-01-20 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15832314#comment-15832314
 ] 

Anubhav Kale commented on CASSANDRA-4663:
-

OOF 1/20

> Streaming sends one file at a time serially. 
> -
>
> Key: CASSANDRA-4663
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 
> 0001-streaming-add-a-way-to-configure-the-number-of-conne.patch
>
>
> This is not fast enough when someone is using SSDs and possibly a 10G link. We 
> should try to create multiple connections and send multiple files in 
> parallel. 
> The current approach underutilizes the link (even a 1G link).
> This change will improve the bootstrapping time of a node. 
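The idea of creating multiple connections and sending several files at once can be sketched as follows. This is a minimal, hypothetical illustration, not Cassandra's streaming code: `ParallelStreamSketch`, `send`, and the round-robin assignment of files to connections are assumptions, and each "connection" is modeled as an in-memory `WritableByteChannel` (via `Channels.newChannel`) rather than a real socket. What it shows is that each worker owns its own channel, so no write ever waits behind another thread's write on a shared channel.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical sketch: stream files over several independent channels in parallel. */
class ParallelStreamSketch {

    /**
     * Assigns files round-robin to {@code connections} channels and writes each channel's
     * files on its own thread. Returns the bytes each in-memory "connection" received.
     */
    static List<byte[]> send(List<byte[]> files, int connections) {
        if (connections < 1) throw new IllegalArgumentException("connections must be >= 1");
        List<ByteArrayOutputStream> sinks = new ArrayList<>();
        List<List<byte[]>> assignment = new ArrayList<>();
        for (int c = 0; c < connections; c++) {
            sinks.add(new ByteArrayOutputStream());
            assignment.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            assignment.get(i % connections).add(files.get(i));
        }

        ExecutorService pool = Executors.newFixedThreadPool(connections);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int c = 0; c < connections; c++) {
                final int conn = c;
                pending.add(pool.submit(() -> {
                    // Each worker owns its channel, so its writes never wait on another thread.
                    try (WritableByteChannel ch = Channels.newChannel(sinks.get(conn))) {
                        for (byte[] file : assignment.get(conn)) {
                            ByteBuffer buf = ByteBuffer.wrap(file);
                            while (buf.hasRemaining()) ch.write(buf);
                        }
                    }
                    return null;
                }));
            }
            for (Future<?> f : pending) {
                try {
                    f.get(); // surface any I/O failure from a worker
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while streaming", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("streaming failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }

        List<byte[]> received = new ArrayList<>();
        for (ByteArrayOutputStream sink : sinks) received.add(sink.toByteArray());
        return received;
    }
}
```

With three files and two connections, connection 0 carries files 0 and 2 while connection 1 carries file 1, concurrently.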



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8911) Consider Mutation-based Repairs

2016-10-05 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549228#comment-15549228
 ] 

Anubhav Kale commented on CASSANDRA-8911:
-

Have we tested this at large scale yet? Just curious about the future of this 
ticket. Thanks!

> Consider Mutation-based Repairs
> ---
>
> Key: CASSANDRA-8911
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8911
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Assignee: Marcus Eriksson
> Fix For: 3.x
>
>
> We should consider a mutation-based repair to replace the existing streaming 
> repair.  While we're at it, we could do away with a lot of the complexity 
> around Merkle trees.
> I have not planned this out in detail, but here's roughly what I'm thinking:
>  * Instead of building an entire merkle tree up front, just send the "leaves" 
> one-by-one.  Instead of dealing with token ranges, make the leaves primary 
> key ranges.  The PK ranges would need to be contiguous, so that the start of 
> each range would match the end of the previous range. (The first and last 
> leaves would need to be open-ended on one end of the PK range.) This would be 
> similar to doing a read with paging.
>  * Once one page of data is read, compute a hash of it and send it to the 
> other replicas along with the PK range that it covers and a row count.
>  * When the replicas receive the hash, they perform a read over the same PK 
> range (using a LIMIT of the row count + 1) and compare hashes (unless the row 
> counts don't match, in which case this can be skipped).
>  * If there is a mismatch, the replica will send a mutation covering that 
> page's worth of data (ignoring the row count this time) to the source node.
> Here are the advantages that I can think of:
>  * With the current repair behavior of streaming, vnode-enabled clusters may 
> need to stream hundreds of small SSTables.  This results in increased 
> compaction load on the receiving node.  With the mutation-based approach, memtables 
> would naturally merge these.
>  * It's simple to throttle.  For example, you could give a number of rows/sec 
> that should be repaired.
>  * It's easy to see what PK range has been repaired so far.  This could make 
> it simpler to resume a repair that fails midway.
>  * Inconsistencies start to be repaired almost right away.
>  * Less special code (?)
>  * Wide partitions are no longer a problem.
> There are a few problems I can think of:
>  * Counters.  I don't know if this can be made safe, or if they need to be 
> skipped.
>  * To support incremental repair, we need to be able to read from only 
> repaired sstables.  Probably not too difficult to do.
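The four repair steps in the description (page through primary keys, hash each page, let the replica re-read the same PK range with a LIMIT of row count + 1, and send back a mutation on mismatch) can be sketched as a single-process model. This is hypothetical and not Cassandra code: each replica is a `TreeMap` from a string primary key to a value, SHA-256 stands in for whatever page hash would actually be used, and `PagedRepairSketch`, `read`, and `repair` are made-up names. The "mutation" is applied with a plain `putAll`, so timestamps and tombstones, which a real write path would reconcile, are ignored.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Hypothetical sketch of a paged, hash-compare-then-mutate repair loop. */
class PagedRepairSketch {

    /** Rows with startExclusive < pk <= endInclusive; a null bound means open-ended. */
    static NavigableMap<String, String> range(NavigableMap<String, String> rows,
                                              String startExclusive, String endInclusive) {
        NavigableMap<String, String> view = rows;
        if (startExclusive != null) view = view.tailMap(startExclusive, false);
        if (endInclusive != null) view = view.headMap(endInclusive, true);
        return view;
    }

    /** Copies at most {@code limit} rows of the range, like a query with a LIMIT. */
    static NavigableMap<String, String> read(NavigableMap<String, String> rows,
                                             String startExclusive, String endInclusive, int limit) {
        NavigableMap<String, String> page = new TreeMap<>();
        for (Map.Entry<String, String> e : range(rows, startExclusive, endInclusive).entrySet()) {
            if (page.size() == limit) break;
            page.put(e.getKey(), e.getValue());
        }
        return page;
    }

    /** SHA-256 over a page's keys and values in key order. */
    static byte[] hash(Map<String, String> page) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            for (Map.Entry<String, String> e : page.entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0); // separator so key "ab"/value "c" differs from "a"/"bc"
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
                md.update((byte) 0);
            }
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Repairs source from replica page by page; returns the number of mismatched pages. */
    static int repair(NavigableMap<String, String> source, NavigableMap<String, String> replica, int pageSize) {
        int mismatches = 0;
        String start = null; // first leaf is open-ended at the start
        while (true) {
            NavigableMap<String, String> page = read(source, start, null, pageSize);
            String end = page.isEmpty() ? null : page.lastKey();
            if (end != null && source.higherKey(end) == null) end = null; // last leaf is open-ended
            int rowCount = page.size();
            byte[] sourceHash = hash(page);

            // Replica side: same PK range with LIMIT rowCount + 1; hash only if counts agree.
            NavigableMap<String, String> theirs = read(replica, start, end, rowCount + 1);
            boolean mismatch = theirs.size() != rowCount || !Arrays.equals(hash(theirs), sourceHash);
            if (mismatch) {
                mismatches++;
                source.putAll(range(replica, start, end)); // whole page, ignoring the row count
            }
            if (end == null) return mismatches;
            start = end; // contiguous ranges: the next one starts where this one ended
        }
    }
}
```

Because each page's range ends where the next begins, a failed run could in principle resume from the last `start` it reached, which is one of the advantages listed above.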





[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.

2016-06-21 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15342248#comment-15342248
 ] 

Anubhav Kale commented on CASSANDRA-4663:
-

I ran some more tests on the original code and on the change with multiple 
sockets, and confirmed that the end-to-end streaming time is a direct function 
of how long it takes the sender to push bytes through (meaning the sender is the 
only "slow" entity, which makes the problem somewhat more tangible).

Then I tested sending multiple files in parallel through some hacks, but as I 
expected, it does not yield much improvement, mainly because writes to a 
{{WritableByteChannel}} block across threads. 

From the docs: "Only one write operation upon a writable channel may be in 
progress at any given time. If one thread initiates a write operation upon a 
channel then any other thread that attempts to initiate another write 
operation will block until the first operation is complete."

We would need to move to {{AsynchronousSocketChannel}} to get true parallelism 
(which is obviously a deeper change, though not impossible).


> Streaming sends one file at a time serially. 
> -
>
> Key: CASSANDRA-4663
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Priority: Minor
>
> This is not fast enough when someone is using SSDs and possibly a 10G link. We 
> should try to create multiple connections and send multiple files in 
> parallel. 
> The current approach underutilizes the link (even a 1G link).
> This change will improve the bootstrapping time of a node. 





[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.

2016-06-20 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340256#comment-15340256
 ] 

Anubhav Kale commented on CASSANDRA-4663:
-

Agreed with Paulo. I don't like the number of SSTables blowing up. I will spend 
some time on sending multiple files at a time and see what that offers. 

> Streaming sends one file at a time serially. 
> -
>
> Key: CASSANDRA-4663
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Priority: Minor
>
> This is not fast enough when someone is using SSDs and possibly a 10G link. We 
> should try to create multiple connections and send multiple files in 
> parallel. 
> The current approach underutilizes the link (even a 1G link).
> This change will improve the bootstrapping time of a node. 





[jira] [Commented] (CASSANDRA-4663) Streaming sends one file at a time serially.

2016-06-17 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-4663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336787#comment-15336787
 ] 

Anubhav Kale commented on CASSANDRA-4663:
-

I made a change to RangeStreamer to create multiple StreamSessions per host 
(splitting the token ranges into as many chunks as there are sockets). I saw a 
time-wise performance improvement of ~33%. 

Since the same code is used for bootstrap and nodetool rebuild, it will help in 
both cases. The one side effect that operators need to be aware of is the 
number of SSTables created on the destination, which grows with the number of 
splits.
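The range-splitting change might look roughly like this. It is a hypothetical sketch, not the actual RangeStreamer patch: `RangeSplitSketch`, `TokenRange`, `split`, and `perSession` are invented names, tokens are assumed to be Murmur3-style `long` values, and wrapping ranges are rejected rather than handled. Piece k of every range goes to session k, which also shows why the number of streamed SSTables grows with the number of splits: every session carries a slice of every range.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: split each token range into one chunk per stream session. */
class RangeSplitSketch {

    /** A non-wrapping token range (left, right], start-exclusive and end-inclusive. */
    static final class TokenRange {
        final long left;
        final long right;

        TokenRange(long left, long right) {
            if (left >= right) throw new IllegalArgumentException("wrapping ranges are not handled here");
            this.left = left;
            this.right = right;
        }

        @Override
        public String toString() {
            return "(" + left + "," + right + "]";
        }
    }

    /** Splits r into at most n contiguous pieces of near-equal width. */
    static List<TokenRange> split(TokenRange r, int n) {
        if (n < 1) throw new IllegalArgumentException("n must be >= 1");
        BigInteger left = BigInteger.valueOf(r.left);
        BigInteger width = BigInteger.valueOf(r.right).subtract(left); // BigInteger avoids long overflow
        List<TokenRange> pieces = new ArrayList<>();
        long prev = r.left;
        for (int i = 1; i <= n; i++) {
            long bound = left.add(width.multiply(BigInteger.valueOf(i)).divide(BigInteger.valueOf(n)))
                             .longValueExact();
            if (bound > prev) { // skip empty pieces when the range is narrower than n tokens
                pieces.add(new TokenRange(prev, bound));
                prev = bound;
            }
        }
        return pieces;
    }

    /** Piece k of every range goes to session k. */
    static List<List<TokenRange>> perSession(List<TokenRange> ranges, int sessions) {
        List<List<TokenRange>> out = new ArrayList<>();
        for (int s = 0; s < sessions; s++) out.add(new ArrayList<>());
        for (TokenRange r : ranges) {
            List<TokenRange> pieces = split(r, sessions);
            for (int k = 0; k < pieces.size(); k++) out.get(k).add(pieces.get(k));
        }
        return out;
    }
}
```

For example, splitting (0,10] three ways gives (0,3], (3,6], (6,10]; the last piece absorbs the remainder.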

I suggest we add a -par option to the nodetool rebuild command to let 
operators specify the number of connections. For bootstrap, we can provide a 
yaml setting that defaults to 1. (If we do decide to add a yaml setting, do I 
need to worry about any compatibility issues across versions?)

If that makes sense, I will create a patch for trunk. 

> Streaming sends one file at a time serially. 
> -
>
> Key: CASSANDRA-4663
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4663
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Priority: Minor
>
> This is not fast enough when someone is using SSDs and possibly a 10G link. We 
> should try to create multiple connections and send multiple files in 
> parallel. 
> The current approach underutilizes the link (even a 1G link).
> This change will improve the bootstrapping time of a node. 





[jira] [Updated] (CASSANDRA-11374) LEAK DETECTED during repair

2016-04-13 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11374:
-
Attachment: Leak_Logs_2.zip
Leak_Logs_1.zip

Attached Leak_Logs*.zip, which show this error on Cassandra 2.1.13 while 
bootstrapping. This reproduces consistently for us. Our node size is ~300 GB.

The process stays up after the leak message but doesn't do much, and the node 
is eventually removed from gossip (so it no longer shows up in gossipinfo / 
status on other nodes).

The only workaround seems to be letting the node boot with auto_bootstrap=false 
and then running nodetool rebuild.

> LEAK DETECTED during repair
> ---
>
> Key: CASSANDRA-11374
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11374
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jean-Francois Gosselin
>Assignee: Marcus Eriksson
> Attachments: Leak_Logs_1.zip, Leak_Logs_2.zip
>
>
> When running a range repair we are seeing the following LEAK DETECTED errors:
> {noformat}
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,261 Ref.java:179 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@5ee90b43) to class 
> org.apache.cassandra.utils.concurrent.WrappedSharedCloseable$1@367168611:[[OffHeapBitSet]]
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@4ea9d4a7) to class 
> org.apache.cassandra.io.util.SafeMemory$MemoryTidy@1875396681:Memory@[7f34b905fd10..7f34b9060b7a)
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,262 Ref.java:179 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@27a6b614) to class 
> org.apache.cassandra.io.util.SafeMemory$MemoryTidy@838594402:Memory@[7f34bae11ce0..7f34bae11d84)
>  was not released before the reference was garbage collected
> ERROR [Reference-Reaper:1] 2016-03-17 06:58:52,263 Ref.java:179 - LEAK 
> DETECTED: a reference 
> (org.apache.cassandra.utils.concurrent.Ref$State@64e7b566) to class 
> org.apache.cassandra.io.util.SafeMemory$MemoryTidy@674656075:Memory@[7f342deab4e0..7f342deb7ce0)
>  was not released before the reference was garbage collected
> {noformat}





[jira] [Created] (CASSANDRA-11419) On local cassandra installations, rack-dc from ROOT/conf isn't honored.

2016-03-23 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-11419:


 Summary: On local cassandra installations, rack-dc from ROOT/conf 
isn't honored.
 Key: CASSANDRA-11419
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11419
 Project: Cassandra
  Issue Type: Bug
Reporter: Anubhav Kale
Priority: Minor


1. Get the latest sources from trunk and build in Eclipse. I am doing this on 
Windows, BTW.
2. Run from Eclipse.
3. Bug: Changes to conf/cassandra-rackdc.properties aren't honored. Instead, 
test/conf/cassandra-rackdc.properties is used.

Since yaml changes from conf/ are picked up, why don't we stay consistent for 
the other files as well?






[jira] [Updated] (CASSANDRA-11407) Proposal for simplified DTCS

2016-03-22 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11407:
-
Description: 
Today's DTCS implementation has been discussed and debated in a few JIRAs 
already (the notable one is 
https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main 
challenges with the current approach is that it is very difficult to reason 
about how the "Target" class makes buckets, thus making it difficult to reason 
about the expected file layout on disk.

I am proposing a simplification to the current approach that keeps intact most 
of the DTCS properties that make it a great fit for time-series data. The 
simplification is as follows.

Given the min and max timestamps across all SSTables in question, start from 
min and make windows based on base and min_threshold. The logic in GetWindow 
simply tries to fit maximum-sized windows from min to max. 

This keeps the DTCS properties intact, except that we don't need to wait for 
min_threshold windows before making a bigger one. I would argue this simplifies 
the algorithm to a great extent, is easy to reason about, and the end result 
isn't drastically different from the original DTCS in most cases. We give up on 
the "alignment" logic that exists in the current implementation, but I honestly 
don't think it buys us much besides complexity.

The implementation can obviously be optimized and cleaned up more if folks 
think this is a good idea. 






  was:
Today's DTCS implementation has been discussed and debated in a few JIRAs 
already (the notable one is 
https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main 
challenges with the current approach is that it is very difficult to reason 
about how the "Target" class makes buckets, thus making it difficult to reason 
about the expected file layout on disk.

I am proposing a simplification to the current approach that keeps intact most 
of the DTCS properties that make it a great fit for time-series data. The 
simplification is as follows.

Given the min and max timestamps across all SSTables in question, start from 
min and make windows based on base and min_threshold. The logic in GetWindow 
simply tries to fit maximum-sized windows from min to max. 

This keeps the DTCS properties intact, except that we don't need to wait for 
min_threshold windows before making a bigger one. I would argue this simplifies 
the algorithm to a great extent, is easy to reason about, and the end result 
isn't drastically different from the original DTCS in most cases. We give up on 
the "alignment" logic in the current class, but I honestly don't think it buys 
us much besides complexity.

The implementation can obviously be optimized and cleaned up more if folks 
think this is a good idea. 







> Proposal for simplified DTCS
> 
>
> Key: CASSANDRA-11407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Anubhav Kale
> Attachments: 0001-Simple-DTCS.patch
>
>
> Today's DTCS implementation has been discussed and debated in a few JIRAs 
> already (the notable one is 
> https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main 
> challenges with the current approach is that it is very difficult to reason 
> about how the "Target" class makes buckets, thus making it difficult to 
> reason about the expected file layout on disk.
> I am proposing a simplification to the current approach that keeps intact 
> most of the DTCS properties that make it a great fit for time-series data. The 
> simplification is as follows.
> Given the min and max timestamps across all SSTables in question, start from 
> min and make windows based on base and min_threshold. The logic in GetWindow 
> simply tries to fit maximum-sized windows from min to max. 
> This keeps the DTCS properties intact, except that we don't need to wait for 
> min_threshold windows before making a bigger one. I would argue this 
> simplifies the algorithm to a great extent, is easy to reason about, and the 
> end result isn't drastically different from the original DTCS in most cases. 
> We give up on the "alignment" logic that exists in the current implementation, 
> but I honestly don't think it buys us much besides complexity.
> The implementation can obviously be optimized and cleaned up more if folks 
> think this is a good idea. 





[jira] [Created] (CASSANDRA-11407) Proposal for a simple DTCS

2016-03-22 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-11407:


 Summary: Proposal for a simple DTCS
 Key: CASSANDRA-11407
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
 Project: Cassandra
  Issue Type: Improvement
  Components: Compaction
Reporter: Anubhav Kale
 Attachments: 0001-Simple-DTCS.patch

Today's DTCS implementation has been discussed and debated in a few JIRAs 
already (the notable one is 
https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main 
challenges with the current approach is that it is very difficult to reason 
about how the "Target" class makes buckets, thus making it difficult to reason 
about the expected file layout on disk.

I am proposing a simplification to the current approach that keeps intact most 
of the DTCS properties that make it a great fit for time-series data. The 
simplification is as follows.

Given the min and max timestamps across all SSTables in question, start from 
min and make windows based on base and min_threshold. The logic in GetWindow 
simply tries to fit maximum-sized windows from min to max. 

This keeps the DTCS properties intact, except that we don't need to wait for 
min_threshold windows before making a bigger one. I would argue this simplifies 
the algorithm to a great extent, is easy to reason about, and the end result 
isn't drastically different from the original DTCS in most cases. We give up on 
the "alignment" logic in the current class, but I honestly don't think it buys 
us much besides complexity.

The implementation can obviously be optimized and cleaned up more if folks 
think this is a good idea. 
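One possible reading of the windowing rule (start at min and repeatedly fit the largest window that still fits before max) is sketched below. It is not the attached 0001-Simple-DTCS.patch, whose code is not shown here: `SimpleDtcsWindowsSketch`, `windows`, and `windowFor` are hypothetical names, and treating the candidate window sizes as base * min_threshold^k is an assumption modeled on how DTCS grows its windows.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of simplified windowing: greedily fit the largest window from min to max. */
class SimpleDtcsWindowsSketch {

    /** A half-open time window [start, end). */
    static final class Window {
        final long start;
        final long end;

        Window(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + "," + end + ")";
        }
    }

    /**
     * Covers [min, max] with windows whose sizes are base * minThreshold^k, always choosing
     * the largest size that still fits in the remaining span. The final window may overshoot
     * max when fewer than base timestamps remain.
     */
    static List<Window> windows(long min, long max, long base, int minThreshold) {
        if (base <= 0 || minThreshold < 2 || max < min) throw new IllegalArgumentException();
        List<Window> out = new ArrayList<>();
        long cur = min;
        while (cur <= max) {
            long remaining = max - cur + 1; // timestamps left to cover, max inclusive
            long size = base;
            while (size <= remaining / minThreshold) size *= minThreshold; // next tier still fits
            out.add(new Window(cur, cur + size));
            cur += size;
        }
        return out;
    }

    /** Index of the window containing ts, or -1 if none does. */
    static int windowFor(List<Window> windows, long ts) {
        for (int i = 0; i < windows.size(); i++) {
            Window w = windows.get(i);
            if (ts >= w.start && ts < w.end) return i;
        }
        return -1;
    }
}
```

With min = 0, max = 99, base = 10, and min_threshold = 4, this yields [0,40), [40,80), [80,90), [90,100): the oldest data lands in the largest windows, and a larger window is used as soon as it fits instead of after min_threshold smaller ones exist.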










[jira] [Updated] (CASSANDRA-11407) Proposal for simplified DTCS

2016-03-22 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11407:
-
Summary: Proposal for simplified DTCS  (was: Proposal for a simple DTCS)

> Proposal for simplified DTCS
> 
>
> Key: CASSANDRA-11407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11407
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
>Reporter: Anubhav Kale
> Attachments: 0001-Simple-DTCS.patch
>
>
> Today's DTCS implementation has been discussed and debated in a few JIRAs 
> already (the notable one is 
> https://issues.apache.org/jira/browse/CASSANDRA-9666). One of the main 
> challenges with the current approach is that it is very difficult to reason 
> about how the "Target" class makes buckets, thus making it difficult to 
> reason about the expected file layout on disk.
> I am proposing a simplification to the current approach that keeps intact 
> most of the DTCS properties that make it a great fit for time-series data. The 
> simplification is as follows.
> Given the min and max timestamps across all SSTables in question, start from 
> min and make windows based on base and min_threshold. The logic in GetWindow 
> simply tries to fit maximum-sized windows from min to max. 
> This keeps the DTCS properties intact, except that we don't need to wait for 
> min_threshold windows before making a bigger one. I would argue this 
> simplifies the algorithm to a great extent, is easy to reason about, and the 
> end result isn't drastically different from the original DTCS in most cases. 
> We give up on the "alignment" logic in the current class, but I honestly don't 
> think it buys us much besides complexity.
> The implementation can obviously be optimized and cleaned up more if folks 
> think this is a good idea. 





[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-03-15 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15196342#comment-15196342
 ] 

Anubhav Kale commented on CASSANDRA-7276:
-

Attached a patch. It still needs some fit and finish, but take a look when you 
can. 

In the CompactionManager Submit* methods, I took the liberty of printing 
System.Cache as KS.CF instead of providing the overrides on Logger.

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Better-Logging-for-KS-and-CF.patch, 
> 0001-Consistent-KS-and-Table-Logging.patch, 
> 0001-Logging-KS-and-CF-consistently.patch, 
> 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-03-15 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-7276:

Attachment: 0001-Consistent-KS-and-Table-Logging.patch

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Better-Logging-for-KS-and-CF.patch, 
> 0001-Consistent-KS-and-Table-Logging.patch, 
> 0001-Logging-KS-and-CF-consistently.patch, 
> 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-03-15 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15195961#comment-15195961
 ] 

Anubhav Kale commented on CASSANDRA-7276:
-

So, if the ContextualizedLogger class implements Logger and overrides all of its 
methods, there is a chance that developers skip the KS/CF-aware wrappers and 
just log the usual way. I am wondering whether it would make more sense not to 
implement Logger and to provide wrappers only for what's needed, keeping the 
non-KS/CF-aware methods to a minimum. Even this isn't bulletproof, but it may 
work better. WDYT?
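A minimal sketch of the second option: a wrapper that deliberately does not implement a logger interface, so every call site has to name a keyspace and table. Cassandra logs through SLF4J; this sketch swaps in `java.util.logging` only to stay self-contained, and the method set (`info`, `warn`, `error`) and the `[keyspace.table]` prefix are assumptions, not the attached patch.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical wrapper: only keyspace/table-aware methods exist, so plain logging is not offered. */
final class ContextualizedLogger {
    private final Logger delegate;

    private ContextualizedLogger(Logger delegate) {
        this.delegate = delegate;
    }

    static ContextualizedLogger getLogger(Class<?> clazz) {
        return new ContextualizedLogger(Logger.getLogger(clazz.getName()));
    }

    /** Prefixes a message with "[keyspace.table]" so every line names its table. */
    static String format(String keyspace, String table, String message) {
        return "[" + keyspace + "." + table + "] " + message;
    }

    void info(String keyspace, String table, String message) {
        delegate.log(Level.INFO, format(keyspace, table, message));
    }

    void warn(String keyspace, String table, String message) {
        delegate.log(Level.WARNING, format(keyspace, table, message));
    }

    void error(String keyspace, String table, String message, Throwable t) {
        delegate.log(Level.SEVERE, format(keyspace, table, message), t);
    }
}
```

The trade-off is the one raised above: because there is no keyspace-free `info(String)` to fall back on, forgetting the context becomes a compile error rather than a silent omission, at the cost of wrapping each level explicitly.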

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Better-Logging-for-KS-and-CF.patch, 
> 0001-Logging-KS-and-CF-consistently.patch, 
> 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Created] (CASSANDRA-11350) Max_SSTable_Age isn't really deprecated in DTCS

2016-03-14 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-11350:


 Summary: Max_SSTable_Age isn't really deprecated in DTCS
 Key: CASSANDRA-11350
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11350
 Project: Cassandra
  Issue Type: Bug
  Components: Compaction
 Environment: PROD
Reporter: Anubhav Kale
Priority: Minor


Based on the comments in https://issues.apache.org/jira/browse/CASSANDRA-10280 
and the changes made to DateTieredCompactionStrategyOptions.java, the 
Max_SSTable_Age field is marked as deprecated.

However, it is still used to filter out old SSTables in 
DateTieredCompactionStrategy.java. Once those SSTables are filtered, 
Max_Window_Size is used to limit how far back in time we can go (essentially how 
Max_SSTable_Age was used previously).

So I am somewhat confused about the exact purpose of these two fields. Should 
Max_SSTable_Age really be removed, and Max_Window_Size be used to filter old 
SSTables instead (in which case it should be set to 1 year as well)?

Currently, Max_SSTable_Age = 1 year and Max_Window_Size = 1 day. What is the 
expected behavior with these settings?
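To make the question concrete, here is a simplified, hypothetical model of how the two settings interact as described above. It is NOT Cassandra's DateTieredCompactionStrategy code: the SSTable class, the method names, the 1-hour base window and min_threshold = 4 are illustrative assumptions. The age filter drops SSTables whose newest write is older than the cutoff, and the window cap stops windows from growing past the maximum size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Simplified, hypothetical model of the two DTCS options; NOT Cassandra's
// DateTieredCompactionStrategy implementation.
final class DtcsModel {
    // Stand-in for an SSTable: only its newest write timestamp matters here.
    static final class SSTable {
        final String name;
        final long maxTimestampMillis;

        SSTable(String name, long maxTimestampMillis) {
            this.name = name;
            this.maxTimestampMillis = maxTimestampMillis;
        }
    }

    // Max_SSTable_Age-style filter: drop SSTables whose newest write is older than now - maxAge.
    static List<SSTable> filterOld(List<SSTable> sstables, long nowMillis, long maxAgeMillis) {
        long cutoff = nowMillis - maxAgeMillis;
        List<SSTable> kept = new ArrayList<>();
        for (SSTable s : sstables) {
            if (s.maxTimestampMillis >= cutoff) {
                kept.add(s);
            }
        }
        return kept;
    }

    // Max_Window_Size-style cap: each tier's window is minThreshold times the previous one,
    // but never larger than maxWindowMillis.
    static List<Long> windowSizes(long baseMillis, int minThreshold, long maxWindowMillis, int tiers) {
        List<Long> sizes = new ArrayList<>();
        long size = baseMillis;
        for (int i = 0; i < tiers; i++) {
            sizes.add(Math.min(size, maxWindowMillis));
            if (size < maxWindowMillis) {
                size *= minThreshold; // stop growing once the cap is reached (also avoids overflow)
            }
        }
        return sizes;
    }

    public static void main(String[] args) {
        long day = TimeUnit.DAYS.toMillis(1);
        List<SSTable> all = List.of(new SSTable("old", 10 * day), new SSTable("recent", 399 * day));
        // Settings from the issue: max age = 1 year, max window = 1 day.
        System.out.println(filterOld(all, 400 * day, 365 * day).size());
        System.out.println(windowSizes(TimeUnit.HOURS.toMillis(1), 4, day, 5));
    }
}
```

Run as-is, this prints 1 (only "recent" survives the one-year filter) and then [3600000, 14400000, 57600000, 86400000, 86400000]: windows grow from 1h to 4h to 16h and are then capped at one day. Under this model the two options act independently, which is why it is unclear whether the age filter is still needed once windows are capped.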





[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-03-10 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11168:
-
Attachment: 0001-Hinted-handoffs-fix.patch

> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch, 
> 0001-Hinted-Handoff-fix-2_2.patch, 0001-Hinted-handoff-metrics.patch, 
> 0001-Hinted-handoffs-fix.patch
>
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Commented] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-03-10 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189855#comment-15189855
 ] 

Anubhav Kale commented on CASSANDRA-11168:
--

My bad on the trunk patch. Updated. 

> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch, 
> 0001-Hinted-Handoff-fix-2_2.patch, 0001-Hinted-handoff-metrics.patch, 
> 0001-Hinted-handoffs-fix.patch
>
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-03-10 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11168:
-
Attachment: 0001-Hinted-Handoff-fix-2_2.patch

> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch, 
> 0001-Hinted-Handoff-fix-2_2.patch, 0001-Hinted-handoff-metrics.patch
>
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Commented] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-03-10 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189739#comment-15189739
 ] 

Anubhav Kale commented on CASSANDRA-11168:
--

I have attached a patch for 2.2 as well.

> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch, 
> 0001-Hinted-Handoff-fix-2_2.patch, 0001-Hinted-handoff-metrics.patch
>
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-03-10 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15189658#comment-15189658
 ] 

Anubhav Kale commented on CASSANDRA-7276:
-

Thanks for the suggestion. I did consider creating a new Logger class, but I 
wasn't sure whether that was the recommended approach.

Is this the approach we would like to go with? We went back and 
forth a bit on this, so it might be better to agree on the approach first before 
making the changes (especially because it touches so many files and requires constant 
rebasing).
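For discussion, a minimal sketch of the "new Logger class" idea, assuming a wrapper that is constructed once per table and prefixes every message with "[keyspace.table]". The class and its methods are hypothetical, not part of Cassandra, and java.util.logging is used only to keep the sketch self-contained (Cassandra itself logs through SLF4J).

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical wrapper, not part of Cassandra: prefixes every message with "[keyspace.table] ".
final class TableAwareLogger {
    private final Logger delegate;
    private final String prefix;

    TableAwareLogger(Logger delegate, String keyspace, String table) {
        this.delegate = delegate;
        this.prefix = "[" + keyspace + "." + table + "] ";
    }

    // Builds the tagged message (kept separate so it is easy to test).
    String tag(String message) {
        return prefix + message;
    }

    void warn(String message) {
        delegate.warning(tag(message));
    }

    void error(String message, Throwable t) {
        // Logger.log(Level, String, Throwable) keeps the stack trace attached to the record.
        delegate.log(Level.SEVERE, tag(message), t);
    }

    public static void main(String[] args) {
        TableAwareLogger log = new TableAwareLogger(Logger.getLogger("demo"), "ks1", "events");
        log.warn("Flush took longer than expected");
        log.error("Exception applying mutation", new IllegalArgumentException());
    }
}
```

The trade-off is that every component has to be handed the keyspace and table once, but call sites stay short. SLF4J's MDC is another option that attaches the same context per thread rather than per logger instance.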

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Better-Logging-for-KS-and-CF.patch, 
> 0001-Logging-KS-and-CF-consistently.patch, 
> 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-03-09 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11168:
-
Attachment: 0001-Hinted-handoff-metrics.patch

> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch, 
> 0001-Hinted-handoff-metrics.patch
>
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Commented] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-03-09 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15188197#comment-15188197
 ] 

Anubhav Kale commented on CASSANDRA-11168:
--

Updated the patch. I am not sure it really needs to be back-ported, though.

> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch, 
> 0001-Hinted-handoff-metrics.patch
>
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Commented] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-03-02 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15176707#comment-15176707
 ] 

Anubhav Kale commented on CASSANDRA-11168:
--

To make sure I understand Aleksey's suggestion correctly, is the code below what 
we want? Can you confirm (only increment the metric if the hint window has expired)?

if (DatabaseDescriptor.hintedHandoffEnabled())
{
    // Never hint endpoints in data centers for which hinting is disabled.
    Set<String> disabledDCs = DatabaseDescriptor.hintedHandoffDisabledDCs();
    if (!disabledDCs.isEmpty())
    {
        final String dc = DatabaseDescriptor.getEndpointSnitch().getDatacenter(ep);
        if (disabledDCs.contains(dc))
        {
            Tracing.trace("Not hinting {} since its data center {} has been disabled {}", ep, dc, disabledDCs);
            return false;
        }
    }

    // The past-window metric is only incremented when hinted handoff is enabled.
    boolean hintWindowExpired = Gossiper.instance.getEndpointDowntime(ep) > DatabaseDescriptor.getMaxHintWindow();
    if (hintWindowExpired)
    {
        HintsService.instance.metrics.incrPastWindow(ep);
        Tracing.trace("Not hinting {} which has been down {} ms", ep, Gossiper.instance.getEndpointDowntime(ep));
    }
    return !hintWindowExpired;
}
else
{
    // Hinted handoff is disabled: no hint and no metric update.
    return false;
}
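The proposed logic can be exercised outside Cassandra with a small, self-contained model. The fields, the downtime argument and the pastWindow counter are hypothetical stand-ins for DatabaseDescriptor, Gossiper and the HintsService metrics; the point is only to check that the past-window metric stays at zero when hinted handoff is disabled.

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

// Self-contained model of the shouldHint() logic proposed above. The config fields,
// downtime argument and pastWindow counter are hypothetical stand-ins, not Cassandra APIs.
final class ShouldHintModel {
    final boolean hintedHandoffEnabled;
    final Set<String> disabledDCs;
    final long maxHintWindowMillis;
    final AtomicLong pastWindow = new AtomicLong();

    ShouldHintModel(boolean hintedHandoffEnabled, Set<String> disabledDCs, long maxHintWindowMillis) {
        this.hintedHandoffEnabled = hintedHandoffEnabled;
        this.disabledDCs = disabledDCs;
        this.maxHintWindowMillis = maxHintWindowMillis;
    }

    boolean shouldHint(String dc, long downtimeMillis) {
        if (!hintedHandoffEnabled) {
            return false; // no hint and no metric update
        }
        if (disabledDCs.contains(dc)) {
            return false; // hinting disabled for this data center
        }
        boolean hintWindowExpired = downtimeMillis > maxHintWindowMillis;
        if (hintWindowExpired) {
            pastWindow.incrementAndGet(); // only reached when hinted handoff is enabled
        }
        return !hintWindowExpired;
    }

    public static void main(String[] args) {
        long threeHours = 3L * 60 * 60 * 1000;
        ShouldHintModel disabled = new ShouldHintModel(false, Set.of(), threeHours);
        disabled.shouldHint("dc1", 2 * threeHours);
        System.out.println(disabled.pastWindow.get()); // 0
        ShouldHintModel enabled = new ShouldHintModel(true, Set.of(), threeHours);
        System.out.println(enabled.shouldHint("dc1", 2 * threeHours)); // false
        System.out.println(enabled.pastWindow.get()); // 1
    }
}
```

The model uses early returns instead of the nested if/else, which should be behaviorally equivalent to the snippet above.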

> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch
>
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-02-29 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-7276:

Attachment: 0001-Logging-KS-and-CF-consistently.patch

Another try. Addressed comments. 

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Better-Logging-for-KS-and-CF.patch, 
> 0001-Logging-KS-and-CF-consistently.patch, 
> 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Commented] (CASSANDRA-11166) Range tombstones not accounted in tracing/cfstats

2016-02-22 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15157290#comment-15157290
 ] 

Anubhav Kale commented on CASSANDRA-11166:
--

Thanks for the update. 

Based on the code in SliceQueryFilter (2.1.9 tag) where the 
TombstoneOverwhelmingException is thrown, it appears that range tombstones 
don't contribute to this count. Is this the expected behavior? It seems wrong 
to me. So I am not sure whether this is just a logging issue or has broader 
implications.
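As an illustration of the behavior this comment argues for, here is a hypothetical, simplified read-path counter in which range tombstones count toward the same failure threshold as cell tombstones. It is NOT SliceQueryFilter; the Kind enum, scan() and OverwhelmingException are made-up names for the sketch.

```java
import java.util.List;

// Hypothetical, simplified tombstone counter (not SliceQueryFilter): range tombstones
// count toward the same threshold as cell tombstones.
final class TombstoneCounter {
    enum Kind { LIVE_CELL, CELL_TOMBSTONE, RANGE_TOMBSTONE }

    static final class OverwhelmingException extends RuntimeException {
        OverwhelmingException(int count) {
            super("Scanned over " + count + " tombstones");
        }
    }

    // Returns the number of tombstones scanned; throws once failThreshold is exceeded.
    static int scan(List<Kind> atoms, int failThreshold) {
        int tombstones = 0;
        for (Kind k : atoms) {
            if (k == Kind.CELL_TOMBSTONE || k == Kind.RANGE_TOMBSTONE) {
                tombstones++;
                if (tombstones > failThreshold) {
                    throw new OverwhelmingException(tombstones);
                }
            }
        }
        return tombstones;
    }

    public static void main(String[] args) {
        List<Kind> partition = List.of(Kind.LIVE_CELL, Kind.RANGE_TOMBSTONE, Kind.CELL_TOMBSTONE);
        System.out.println(scan(partition, 100)); // 2: the range tombstone is counted too
    }
}
```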

> Range tombstones not accounted in tracing/cfstats
> -
>
> Key: CASSANDRA-11166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11166
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Priority: Minor
>
> I noticed inconsistent behavior on deletes. I am not sure whether it is intentional. 
> In summary:
> If a table is created with a TTL, or if rows are inserted into a table using a TTL, 
> then when it is time to expire the row, a tombstone is generated (as expected) and 
> cfstats, cqlsh tracing and sstable2json all show it.
> However, if one executes a delete from table query followed by a select *, 
> neither cqlsh tracing nor cfstats shows a tombstone being present, although 
> sstable2json does show one.
> Is this case treated differently on purpose? In that case, does Cassandra not 
> have to scan these tombstones (which seems odd)?
> Also, as a data point, if one executes a delete  from table, 
> cqlsh tracing, nodetool cfstats, and sstable2json all show a consistent 
> result (a tombstone being present).
> As an end user, I'd assume that deleting a row either via TTL or explicitly 
> should show me a tombstone. Is this expectation reasonable? If not, can this 
> behavior be clearly documented?





[jira] [Commented] (CASSANDRA-11166) Inconsistent behavior on Tombstones

2016-02-19 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154851#comment-15154851
 ] 

Anubhav Kale commented on CASSANDRA-11166:
--

Any thoughts on this?

> Inconsistent behavior on Tombstones
> ---
>
> Key: CASSANDRA-11166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11166
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Priority: Minor
>
> I noticed inconsistent behavior on deletes. I am not sure whether it is intentional. 
> In summary:
> If a table is created with a TTL, or if rows are inserted into a table using a TTL, 
> then when it is time to expire the row, a tombstone is generated (as expected) and 
> cfstats, cqlsh tracing and sstable2json all show it.
> However, if one executes a delete from table query followed by a select *, 
> neither cqlsh tracing nor cfstats shows a tombstone being present, although 
> sstable2json does show one.
> Is this case treated differently on purpose? In that case, does Cassandra not 
> have to scan these tombstones (which seems odd)?
> Also, as a data point, if one executes a delete  from table, 
> cqlsh tracing, nodetool cfstats, and sstable2json all show a consistent 
> result (a tombstone being present).
> As an end user, I'd assume that deleting a row either via TTL or explicitly 
> should show me a tombstone. Is this expectation reasonable? If not, can this 
> behavior be clearly documented?





[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-02-19 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11168:
-
Attachment: 0001-Hinted-Handoff-Fix.patch

> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-Hinted-Handoff-Fix.patch
>
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-02-19 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154819#comment-15154819
 ] 

Anubhav Kale commented on CASSANDRA-7276:
-

Submitted. I tested this locally by forcing exceptions through code changes. 

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Better-Logging-for-KS-and-CF.patch, 
> 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-02-19 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-7276:

Attachment: 0001-Better-Logging-for-KS-and-CF.patch

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Better-Logging-for-KS-and-CF.patch, 
> 0001-Logging-for-Keyspace-and-Tables.patch, 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Created] (CASSANDRA-11166) Inconsistent behavior on Tombstones

2016-02-12 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-11166:


 Summary: Inconsistent behavior on Tombstones
 Key: CASSANDRA-11166
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11166
 Project: Cassandra
  Issue Type: Bug
Reporter: Anubhav Kale
Priority: Minor


I noticed inconsistent behavior on deletes. I am not sure whether it is intentional. 

In summary:

If a table is created with a TTL, or if rows are inserted into a table using a TTL, 
then when it is time to expire the row, a tombstone is generated (as expected) and 
cfstats, cqlsh tracing and sstable2json all show it.

However, if one executes a delete from table query followed by a select *, 
neither cqlsh tracing nor cfstats shows a tombstone being present, although 
sstable2json does show one.

Is this case treated differently on purpose? In that case, does Cassandra not 
have to scan these tombstones (which seems odd)?

Also, as a data point, if one executes a delete  from table, cqlsh 
tracing, nodetool cfstats, and sstable2json all show a consistent result 
(a tombstone being present).

As an end user, I'd assume that deleting a row either via TTL or explicitly 
should show me a tombstone. Is this expectation reasonable? If not, can this 
behavior be clearly documented?





[jira] [Created] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-02-12 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-11168:


 Summary: Hint Metrics are updated even if hinted_hand-offs=false
 Key: CASSANDRA-11168
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
 Project: Cassandra
  Issue Type: Bug
Reporter: Anubhav Kale
Priority: Minor


In our PROD logs, we noticed a lot of hint metrics even though we have disabled 
hinted handoffs.

The reason is StorageProxy.ShouldHint has an inverted if condition. We should 
also wrap the if (hintWindowExpired) block in if 
(DatabaseDescriptor.hintedHandoffEnabled()).

The fix is easy, and I can provide a patch.





[jira] [Updated] (CASSANDRA-11168) Hint Metrics are updated even if hinted_hand-offs=false

2016-02-12 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11168:
-
Description: 
In our PROD logs, we noticed a lot of hint metrics even though we have disabled 
hinted handoffs.

The reason is StorageProxy.ShouldHint has an inverted if condition. 
We should also wrap the if (hintWindowExpired) block in if 
(DatabaseDescriptor.hintedHandoffEnabled()).

The fix is easy, and I can provide a patch.

  was:
In our PROD logs, we noticed a lot of hint metrics even though we have disabled 
hinted handoffs.

The reason is StorageProxy.ShouldHint has an inverted if condition. We should 
also wrap the if (hintWindowExpired) block in if 
(DatabaseDescriptor.hintedHandoffEnabled()) as well.

The fix is easy, and I can provide a patch.


> Hint Metrics are updated even if hinted_hand-offs=false
> ---
>
> Key: CASSANDRA-11168
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11168
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Anubhav Kale
>Priority: Minor
>
> In our PROD logs, we noticed a lot of hint metrics even though we have 
> disabled hinted handoffs.
> The reason is StorageProxy.ShouldHint has an inverted if condition. 
> We should also wrap the if (hintWindowExpired) block in if 
> (DatabaseDescriptor.hintedHandoffEnabled()).
> The fix is easy, and I can provide a patch.





[jira] [Created] (CASSANDRA-11160) Use UUID for SS Table Filenames

2016-02-11 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-11160:


 Summary: Use UUID for SS Table Filenames
 Key: CASSANDRA-11160
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11160
 Project: Cassandra
  Issue Type: Improvement
Reporter: Anubhav Kale
Priority: Minor


Today, Cassandra uses a monotonically increasing counter to generate SSTable 
file names. While this works in practice, wouldn't it be safer / better to use 
UUIDs in file names so that they are truly unique? AFAIK, no code paths rely 
on these counters being part of the file names. 

A specific scenario where this would really help:

In a backup / restore model, suppose we move files out to some other storage. 
In that process, we can optimize by not moving files that were already backed 
up, using a check on file names. We can't do this reliably today because, if 
the node goes down, a file with the same name can be generated again. Note 
that using incremental backups is not a viable option here, because we lose 
the benefits of compaction (as discussed in my last comment on CASSANDRA-10960).
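To make the collision concrete, here is a small sketch. It assumes the generation counter is re-seeded at startup from the highest generation still present on disk, so a generation whose file was moved off the node can be handed out again. The `ks-tbl-<id>-Data.db` format and all helper names are hypothetical simplifications, not Cassandra's actual descriptor layout.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public final class SSTableNamingSketch {
    // Assumed re-seeding rule: next generation = highest generation on disk + 1.
    static int nextGeneration(Collection<Integer> generationsOnDisk) {
        int max = 0;
        for (int g : generationsOnDisk) {
            max = Math.max(max, g);
        }
        return max + 1;
    }

    // Simplified, hypothetical file-name formats.
    static String counterName(int generation) {
        return "ks-tbl-" + generation + "-Data.db";
    }

    static String uuidName(UUID id) {
        return "ks-tbl-" + id + "-Data.db";
    }

    // Backup optimization from the description: skip names already uploaded.
    static boolean needsUpload(Set<String> uploaded, String fileName) {
        return !uploaded.contains(fileName);
    }

    public static void main(String[] args) {
        Set<String> uploaded = new HashSet<>();
        uploaded.add(counterName(5)); // generation 5 was backed up, then moved off the node

        // After a restart only generations 1-4 remain on disk, so 5 is reused
        // for brand-new data and the name-based check wrongly skips it.
        String reused = counterName(nextGeneration(List.of(1, 2, 3, 4)));
        System.out.println(reused + " needs upload: " + needsUpload(uploaded, reused));

        // A random UUID name does not collide with anything uploaded earlier.
        String fresh = uuidName(UUID.randomUUID());
        System.out.println(fresh + " needs upload: " + needsUpload(uploaded, fresh));
    }
}
```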





[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-02-11 Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143017#comment-15143017
 ] 

Anubhav Kale commented on CASSANDRA-7276:
-

Any thoughts here?

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Logging-for-Keyspace-and-Tables.patch, 
> 2.1-CASSANDRA-7276-v1.txt, cassandra-2.1-7276-compaction.txt, 
> cassandra-2.1-7276.txt, cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Updated] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down

2016-02-10 Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-11143:
-
Description: 
We saw a problem similar to what I describe below in our PROD environment a few 
times. Below is a consistent repro. We can change the priority to Minor since 
there is a workaround, though.

Using steps from 
http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301,
 set up a two-node cluster locally. 

. Bring up both nodes.
. Create a table, and ensure cqlsh is correctly showing it on both nodes.
. Bring down one node.
. Drop and re-create the same table, or change some schema in the table.
. Bring up the down node.

You will notice exceptions like the one below (because of the schema 
mismatch), and the new schema never propagates to the node that was down 
(meaning a select * via cqlsh will continue to show the old schema for the 
table). I let the cluster run for an hour to see if gossip would somehow 
catch up. 

However, the interesting part is that if you restart the node that was down 
when the schema changes were made, the exception below goes away and it gets 
the new schema correctly. 

What is being cached such that a second restart is necessary to make it 
behave correctly?

ERROR 00:23:33 Configuration exception merging remote schema
org.apache.cassandra.exceptions.ConfigurationException: Column family ID 
mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected 
e2839010-cf7e-11e5-a13b-fb6871b443fb)
at 
org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783)
 ~[main/:na]
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) 
~[main/:na]
at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) 
~[main/:na]
at org.apach


  was:
We saw a problem similar to what I describe below in our PROD environment a few 
times. Below is a consistent repro. We can change the priority to Minor since 
there is a workaround, though.

Using steps from 
http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301,
 set up a two-node cluster locally. 

. Bring up both nodes.
. Create a table, and ensure cqlsh is correctly showing it on both nodes.
. Bring down one node.
. Drop and re-create the same table, or change some schema in the table.
. Bring up the down node.

You will notice exceptions like the one below (because of the schema 
mismatch), and the new schema never propagates to the node that was down 
(meaning cqlsh will continue to show the old schema for the table). I let the 
cluster run for an hour to see if gossip would somehow catch up. 

However, the interesting part is that if you restart the node that was down 
when the schema changes were made, the exception below goes away and it gets 
the new schema correctly. 

What is being cached such that a second restart is necessary to make it 
behave correctly?

ERROR 00:23:33 Configuration exception merging remote schema
org.apache.cassandra.exceptions.ConfigurationException: Column family ID 
mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected 
e2839010-cf7e-11e5-a13b-fb6871b443fb)
at 
org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783)
 ~[main/:na]
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) 
~[main/:na]
at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) 
~[main/:na]
at org.apach



> Schema changes don't propagate correctly if nodes are down
> --
>
> Key: CASSANDRA-11143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11143
> Project: Cassandra
>  Issue Type: Bug
> Environment: PROD
>Reporter: Anubhav Kale
>
> We saw a problem similar to what I describe below in our PROD environment a 
> few times. Below is a consistent repro. We can change the priority to Minor 
> since there is a workaround, though.
> Using steps from 
> http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301,
>  set up a two-node cluster locally. 
> . Bring up both nodes.
> . Create a table, and ensure cqlsh is correctly showing it on both nodes.
> . Bring down one node.
> . Drop and re-create the same table, or change some schema in the table.
> . Bring up the down node.
> You will notice exceptions like the one below (because of the schema 
> mismatch), and the new schema never propagates to the node that was down 
> (meaning a select * via cqlsh will continue to show the old schema for the 
> table). I let the cluster run for an hour to see if gossip would somehow 
> catch up. 
> However, the interesting part is that if you restart the node that was down 
> when the schema changes were made, the 

[jira] [Commented] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down

2016-02-10 Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15141799#comment-15141799
 ] 

Anubhav Kale commented on CASSANDRA-11143:
--

After digging through the code, it appears that the cached data in CFMetaData 
isn't refreshed when system_schema.tables is changed in 
SchemaKeyspace.mergeSchema (the mutations.forEach line). This causes the check 
in validateCompatibility to fail. 

On restart, the node reloads this data from disk, so everything works correctly 
from that point onward.

Is this the expected behavior? It seems odd to me.
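A minimal model of the behavior described above follows. It assumes the node keeps a persisted copy of table definitions (standing in for system_schema.tables) and a separate in-memory copy that the table-ID check runs against. `SchemaCacheSketch`, `TableMeta`, `mergeRemote` and `restart` are hypothetical simplifications, not Cassandra's `CFMetaData` or `SchemaKeyspace` API. The exception text only mirrors the `Column family ID mismatch` error from the description.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

public final class SchemaCacheSketch {
    // Hypothetical, simplified table metadata: keyspace, table name and table ID.
    static final class TableMeta {
        final String keyspace;
        final String table;
        final UUID id;

        TableMeta(String keyspace, String table, UUID id) {
            this.keyspace = keyspace;
            this.table = table;
            this.id = id;
        }

        String key() {
            return keyspace + "." + table;
        }
    }

    // Stand-in for the persisted schema tables.
    final Map<String, TableMeta> persisted = new HashMap<>();
    // Stand-in for the in-memory metadata the node validates against.
    final Map<String, TableMeta> cache = new HashMap<>();

    void createLocally(TableMeta m) {
        persisted.put(m.key(), m);
        cache.put(m.key(), m);
    }

    // The persisted copy is updated, but the cached copy still holds the
    // old ID, so the compatibility check fails.
    void mergeRemote(TableMeta remote) {
        persisted.put(remote.key(), remote);
        TableMeta cached = cache.get(remote.key());
        if (cached != null && !cached.id.equals(remote.id)) {
            throw new IllegalStateException("Column family ID mismatch (found "
                    + remote.id + "; expected " + cached.id + ")");
        }
        cache.put(remote.key(), remote);
    }

    // What a restart effectively does: rebuild the cache from persisted state.
    void restart() {
        cache.clear();
        cache.putAll(persisted);
    }

    public static void main(String[] args) {
        SchemaCacheSketch node = new SchemaCacheSketch();
        node.createLocally(new TableMeta("ks", "t", UUID.randomUUID()));

        // While the node was down, the table was dropped and re-created with a new ID.
        TableMeta recreated = new TableMeta("ks", "t", UUID.randomUUID());
        try {
            node.mergeRemote(recreated);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }

        node.restart();
        node.mergeRemote(recreated); // no exception: the cache now matches persisted state
        System.out.println(node.cache.get("ks.t").id.equals(recreated.id)); // true
    }
}
```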


> Schema changes don't propagate correctly if nodes are down
> --
>
> Key: CASSANDRA-11143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-11143
> Project: Cassandra
>  Issue Type: Bug
> Environment: PROD
>Reporter: Anubhav Kale
>
> We saw a problem similar to what I describe below in our PROD environment a 
> few times. Below is a consistent repro. We can change the priority to Minor 
> since there is a workaround, though.
> Using steps from 
> http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301,
>  set up a two-node cluster locally. 
> . Bring up both nodes.
> . Create a table, and ensure cqlsh is correctly showing it on both nodes.
> . Bring down one node.
> . Drop and re-create the same table, or change some schema in the table.
> . Bring up the down node.
> You will notice exceptions like the one below (because of the schema 
> mismatch), and the new schema never propagates to the node that was down 
> (meaning a select * via cqlsh will continue to show the old schema for the 
> table). I let the cluster run for an hour to see if gossip would somehow 
> catch up. 
> However, the interesting part is that if you restart the node that was down 
> when the schema changes were made, the exception below goes away and it gets 
> the new schema correctly. 
> What is being cached such that a second restart is necessary to make it 
> behave correctly?
> ERROR 00:23:33 Configuration exception merging remote schema
> org.apache.cassandra.exceptions.ConfigurationException: Column family ID 
> mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected 
> e2839010-cf7e-11e5-a13b-fb6871b443fb)
>   at 
> org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783)
>  ~[main/:na]
>   at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) 
> ~[main/:na]
>   at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) 
> ~[main/:na]
>   at org.apach





[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-02-09 Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-7276:

Attachment: (was: 0001-Better-Logging-for-KS-CF.patch)

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Logging-for-Keyspace-and-Tables.patch, 
> 2.1-CASSANDRA-7276-v1.txt, cassandra-2.1-7276-compaction.txt, 
> cassandra-2.1-7276.txt, cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-02-09 Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-7276:

Attachment: 0001-Logging-for-Keyspace-and-Tables.patch

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Logging-for-Keyspace-and-Tables.patch, 
> 2.1-CASSANDRA-7276-v1.txt, cassandra-2.1-7276-compaction.txt, 
> cassandra-2.1-7276.txt, cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Created] (CASSANDRA-11143) Schema changes don't propagate correctly if nodes are down

2016-02-09 Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-11143:


 Summary: Schema changes don't propagate correctly if nodes are down
 Key: CASSANDRA-11143
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11143
 Project: Cassandra
  Issue Type: Bug
 Environment: PROD
Reporter: Anubhav Kale


We saw a problem similar to what I describe below in our PROD environment a few 
times. Below is a consistent repro. We can change the priority to Minor since 
there is a workaround, though.

Using steps from 
http://stackoverflow.com/questions/22513979/setting-up-cassandra-multi-node-cluster-on-a-single-ubuntu-server/25348301#25348301,
 set up a two-node cluster locally. 

. Bring up both nodes.
. Create a table, and ensure cqlsh is correctly showing it on both nodes.
. Bring down one node.
. Drop and re-create the same table, or change some schema in the table.
. Bring up the down node.

You will notice exceptions like the one below (because of the schema 
mismatch), and the new schema never propagates to the node that was down 
(meaning cqlsh will continue to show the old schema for the table). I let the 
cluster run for an hour to see if gossip would somehow catch up. 

However, the interesting part is that if you restart the node that was down 
when the schema changes were made, the exception below goes away and it gets 
the new schema correctly. 

What is being cached such that a second restart is necessary to make it 
behave correctly?

ERROR 00:23:33 Configuration exception merging remote schema
org.apache.cassandra.exceptions.ConfigurationException: Column family ID 
mismatch (found 7208d260-cf8c-11e5-a13b-fb6871b443fb; expected 
e2839010-cf7e-11e5-a13b-fb6871b443fb)
at 
org.apache.cassandra.config.CFMetaData.validateCompatibility(CFMetaData.java:783)
 ~[main/:na]
at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:743) 
~[main/:na]
at org.apache.cassandra.config.Schema.updateTable(Schema.java:626) 
~[main/:na]
at org.apach






[jira] [Created] (CASSANDRA-11142) Confusing error message on schema updates when nodes are down

2016-02-09 Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-11142:


 Summary: Confusing error message on schema updates when nodes are 
down
 Key: CASSANDRA-11142
 URL: https://issues.apache.org/jira/browse/CASSANDRA-11142
 Project: Cassandra
  Issue Type: Bug
 Environment: PROD
Reporter: Anubhav Kale
Priority: Minor


Repro steps are as follows (this was tested on Windows and is a consistent 
repro):

. Start a two-node cluster.
. Ensure that "nodetool status" shows both nodes as UN on both nodes.
. Stop Node2.
. Ensure that "nodetool status" shows Node2 as DN.
. Start cqlsh on Node1.
. Create a table.
. cqlsh times out with the message below (coming from cqlsh.py):

Warning: schema version mismatch detected, which might be caused by DOWN nodes; 
if this is not the case, check the schema versions of your nodes in 
system.local and system.peers.
OperationTimedOut: errors={}, last_host=10.1.0.10
. Do a select * on the table whose creation just timed out. It works fine.

It just seems odd that we get a timeout but no real error, and yet the table 
gets created fine. We should either replace the timeout with a real error or 
not time out at all. I'm not sure what the best approach is.
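The cqlsh warning amounts to a schema agreement check across nodes. A minimal sketch of a check that could tell the two situations apart, assuming we already have each node's schema version (as recorded in `system.local` and `system.peers`) and whether that node is up. `NodeSchema`, `Agreement` and `agreement` are hypothetical names, not cqlsh or driver APIs. Surfacing `ONLY_DOWN_NODES_DIFFER` as a clear message instead of a timeout is one possible reading of the suggestion above.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.UUID;

public final class SchemaAgreementSketch {
    enum Agreement { AGREED, ONLY_DOWN_NODES_DIFFER, MISMATCH }

    // Hypothetical view of one node's schema version plus its liveness.
    static final class NodeSchema {
        final String address;
        final UUID schemaVersion;
        final boolean up;

        NodeSchema(String address, UUID schemaVersion, boolean up) {
            this.address = address;
            this.schemaVersion = schemaVersion;
            this.up = up;
        }
    }

    static Agreement agreement(List<NodeSchema> nodes) {
        Set<UUID> upVersions = new HashSet<>();
        Set<UUID> allVersions = new HashSet<>();
        for (NodeSchema n : nodes) {
            allVersions.add(n.schemaVersion);
            if (n.up) {
                upVersions.add(n.schemaVersion);
            }
        }
        if (allVersions.size() <= 1) {
            return Agreement.AGREED;
        }
        // Live nodes agree; the only disagreement comes from DOWN nodes.
        if (upVersions.size() <= 1) {
            return Agreement.ONLY_DOWN_NODES_DIFFER;
        }
        return Agreement.MISMATCH;
    }

    public static void main(String[] args) {
        UUID before = UUID.randomUUID();
        UUID after = UUID.randomUUID();
        List<NodeSchema> cluster = List.of(
                new NodeSchema("node1", after, true),    // applied the CREATE TABLE
                new NodeSchema("node2", before, false)); // DOWN, still on the old version
        System.out.println(agreement(cluster)); // ONLY_DOWN_NODES_DIFFER
    }
}
```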





[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-02-08 Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15137825#comment-15137825
 ] 

Anubhav Kale commented on CASSANDRA-7276:
-

I will take a stab at this. 

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2016-02-08 Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-7276:

Attachment: 0001-Better-Logging-for-KS-CF.patch

Attached a first cut of this.

With the current approach, I have omitted BatchLog*VerbHandler and 
Repair*VerbHandler since they operate on a collection of mutations. Including 
them would mean changing the interface to work with a collection of KS, instead 
of just the KS as originally suggested. We can do that, but it may defeat the 
purpose here if MessageDeliveryTask simply prints a collection of KS and CF 
names when something goes wrong. 

I took a stab at manually updating log statements in KFS and CompactionManager. 
We can add logs in other places later, once this first pass is committed (to 
keep merges sane).

Also, we could introduce a base class for IKeyspaceAwareVerbHandler, but I did 
not do that for readability's sake (otherwise the class hierarchy gets too 
deep).

In logs, should we use KS and CF or Keyspace and Table? I don't believe there 
is a consistent pattern today.
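A rough sketch of the shape described above, assuming a handler interface that can report the keyspace and table of the message it is processing, and a delivery task that adds them to the error log. Apart from the `IKeyspaceAwareVerbHandler` name taken from the comment, every type and method here (`VerbHandler`, `keyspaceName`, `tableName`, `MutationStub`, `deliver`) is a hypothetical simplification, not the patch's code. It uses `java.util.logging` only to stay self-contained; Cassandra itself logs through SLF4J.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public final class KeyspaceAwareLoggingSketch {
    private static final Logger logger = Logger.getLogger("MessageDeliveryTaskSketch");

    // Simplified stand-in for a verb handler that processes one message payload.
    interface VerbHandler<T> {
        void doVerb(T payload) throws Exception;
    }

    // Hypothetical: handlers that can tell which keyspace/table a payload targets.
    interface IKeyspaceAwareVerbHandler<T> extends VerbHandler<T> {
        String keyspaceName(T payload);

        String tableName(T payload);
    }

    // Hypothetical payload carrying its target keyspace and table.
    static final class MutationStub {
        final String keyspace;
        final String table;

        MutationStub(String keyspace, String table) {
            this.keyspace = keyspace;
            this.table = table;
        }
    }

    // Adds keyspace/table context when the handler can provide it.
    @SuppressWarnings("unchecked")
    static <T> String describeFailure(VerbHandler<T> handler, T payload) {
        if (handler instanceof IKeyspaceAwareVerbHandler) {
            IKeyspaceAwareVerbHandler<T> aware = (IKeyspaceAwareVerbHandler<T>) handler;
            return String.format("Exception while processing message for %s.%s",
                    aware.keyspaceName(payload), aware.tableName(payload));
        }
        return "Exception while processing message";
    }

    // Simplified delivery task: run the handler and log any failure with context.
    static <T> boolean deliver(VerbHandler<T> handler, T payload) {
        try {
            handler.doVerb(payload);
            return true;
        } catch (Exception e) {
            logger.log(Level.SEVERE, describeFailure(handler, payload), e);
            return false;
        }
    }

    public static void main(String[] args) {
        IKeyspaceAwareVerbHandler<MutationStub> handler = new IKeyspaceAwareVerbHandler<MutationStub>() {
            @Override
            public void doVerb(MutationStub m) {
                throw new IllegalArgumentException("simulated failure");
            }

            @Override
            public String keyspaceName(MutationStub m) {
                return m.keyspace;
            }

            @Override
            public String tableName(MutationStub m) {
                return m.table;
            }
        };
        // Logs "Exception while processing message for ks1.users" plus the stack trace.
        deliver(handler, new MutationStub("ks1", "users"));
    }
}
```

In this shape, handlers that touch many keyspaces (such as the batchlog and repair handlers mentioned above) would simply not implement the interface and fall back to the generic message.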

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tyler Hobbs
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 0001-Better-Logging-for-KS-CF.patch, 
> 2.1-CASSANDRA-7276-v1.txt, cassandra-2.1-7276-compaction.txt, 
> cassandra-2.1-7276.txt, cassandra-2.1.9-7276-v2.txt, cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.





[jira] [Commented] (CASSANDRA-10962) Cassandra should not create snapshot at restart for compactions_in_progress

2016-02-05 Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15134875#comment-15134875
 ] 

Anubhav Kale commented on CASSANDRA-10962:
--

This does not repro on the latest bits. 

Also, listsnapshots does not list any system table snapshots by design:

From StorageService.getSnapshotDetails:

    for (Keyspace keyspace : Keyspace.all())
    {
        if (Schema.isSystemKeyspace(keyspace.getName()))
            continue;
        ...
    }
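A minimal sketch of why the compactions_in_progress snapshots exist on disk yet never show up in "nodetool listsnapshots": the listing loop quoted above skips system keyspaces entirely. The set of names treated as system keyspaces and the `snapshotsOnDisk` map are assumptions for illustration; the real check is `Schema.isSystemKeyspace`.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class ListSnapshotsSketch {
    // Assumption: the keyspace names treated as "system" by the listing code.
    static final Set<String> SYSTEM_KEYSPACES = Set.of("system", "system_schema");

    static boolean isSystemKeyspace(String keyspace) {
        return SYSTEM_KEYSPACES.contains(keyspace.toLowerCase());
    }

    // Mirrors the quoted loop: snapshots of system keyspaces are never reported.
    static Map<String, List<String>> snapshotDetails(Map<String, List<String>> snapshotsOnDisk) {
        Map<String, List<String>> details = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : snapshotsOnDisk.entrySet()) {
            if (isSystemKeyspace(e.getKey())) {
                continue;
            }
            details.put(e.getKey(), e.getValue());
        }
        return details;
    }

    public static void main(String[] args) {
        Map<String, List<String>> onDisk = new LinkedHashMap<>();
        onDisk.put("system", List.of("1448885530280-compactions_in_progress"));
        onDisk.put("my_ks", List.of("nightly"));
        System.out.println(snapshotDetails(onDisk)); // {my_ks=[nightly]}
    }
}
```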

> Cassandra should not create snapshot at restart for compactions_in_progress
> ---
>
> Key: CASSANDRA-10962
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10962
> Project: Cassandra
>  Issue Type: Bug
> Environment: Ubuntu 14.04.3 LTS
>Reporter: FACORAT
>Priority: Minor
>
> If auto_snapshot is set to true in cassandra.yaml, each time you restart 
> Cassandra, a snapshot is created for system.compactions_in_progress, as the 
> table is truncated at Cassandra startup.
> However, as the data in this table is temporary, Cassandra should not create 
> a snapshot for this table (or maybe even for system.* tables). This would be 
> consistent with the fact that "nodetool listsnapshots" doesn't even list this 
> table.
> Example:
> $ nodetool listsnapshots | grep compactions
> $ ls -lh 
> system/compactions_in_progress-55080ab05d9c388690a4acb25fe1f77b/snapshots/
> total 16K
> drwxr-xr-x 2 cassandra cassandra 4.0K Nov 30 13:12 
> 1448885530280-compactions_in_progress
> drwxr-xr-x 2 cassandra cassandra 4.0K Dec  7 15:36 
> 1449498977181-compactions_in_progress
> drwxr-xr-x 2 cassandra cassandra 4.0K Dec 14 18:20 
> 1450113621506-compactions_in_progress
> drwxr-xr-x 2 cassandra cassandra 4.0K Jan  4 12:53 
> 1451908396364-compactions_in_progress





[jira] [Updated] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2016-01-19 Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10907:
-
Attachment: 0001-Skip-Flush-option-for-Snapshot.patch

I initially went down the route of a boolean option (I didn't quite like it 
myself, but it felt less weird than inspecting array elements). I have changed 
that now. 

Addressed other comments and made the tests more robust.
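A sketch of the options-map approach mentioned above, in place of a positional boolean. It assumes an option key named `skipFlush` and a simplified `Table` type that just records what happened to it; both are illustrative rather than the patch's exact API. When the option is set, the flush step is simply bypassed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public final class SnapshotOptionsSketch {
    // Simplified, hypothetical table that records the operations performed on it.
    static final class Table {
        final String name;
        final List<String> operations = new ArrayList<>();

        Table(String name) {
            this.name = name;
        }

        void flush() {
            operations.add("flush");
        }

        void snapshot(String tag) {
            operations.add("snapshot:" + tag);
        }
    }

    // An options map instead of a positional boolean, so further flags can be
    // added later without changing the method signature again.
    static void takeSnapshot(String tag, Map<String, String> options, List<Table> tables) {
        boolean skipFlush = Boolean.parseBoolean(options.getOrDefault("skipFlush", "false"));
        for (Table t : tables) {
            if (!skipFlush) {
                t.flush(); // write in-memory data out before the snapshot is taken
            }
            t.snapshot(tag);
        }
    }

    public static void main(String[] args) {
        Table users = new Table("users");
        takeSnapshot("fast", Map.of("skipFlush", "true"), List.of(users));
        takeSnapshot("full", Map.of(), List.of(users));
        System.out.println(users.operations); // [snapshot:fast, flush, snapshot:full]
    }
}
```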

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
> Attachments: 0001-Skip-Flush-for-snapshots.patch, 
> 0001-Skip-Flush-option-for-Snapshot.patch, 
> 0001-Skip-Flush-option-for-Snapshot.patch, 0001-flush.patch
>
>
> In some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. In those cases, it's better to skip the flush 
> to make the snapshot process quicker.
> As such, it would be a good idea to provide this option to the snapshot 
> command. The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.





[jira] [Updated] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2016-01-15 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10907:
-
Attachment: 0001-Skip-Flush-option-for-Snapshot.patch
0001-Skip-Flush-for-snapshots.patch

Sorry about the delay. I modified the patch per the comments. There is a lot of 
scope for cleaning up existing methods, but I am not doing that for now. 

I did add a Boolean to the proposed signature to detect whether a KS / CF was 
passed, to make things easy. 

Tested locally, and ensured existing functionality is not broken. 

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
> Attachments: 0001-Skip-Flush-for-snapshots.patch, 
> 0001-Skip-Flush-option-for-Snapshot.patch, 0001-flush.patch
>
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.




[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2016-01-14 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10866:
-
Attachment: 0002-Dropped-Mutations-Count.patch

Rebased.

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability, Tools
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 0001-CF-Dropped-Mutation-Stats.patch, 
> 0001-CFCount.patch, 0002-Dropped-Mutations-Count.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2016-01-08 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15089947#comment-15089947
 ] 

Anubhav Kale commented on CASSANDRA-10866:
--

Thanks.

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability, Tools
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 0001-CF-Dropped-Mutation-Stats.patch, 
> 0001-CFCount.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2016-01-05 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083489#comment-15083489
 ] 

Anubhav Kale commented on CASSANDRA-10907:
--

Any updates here?

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
> Attachments: 0001-flush.patch
>
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.




[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2016-01-05 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15083486#comment-15083486
 ] 

Anubhav Kale commented on CASSANDRA-10866:
--

Any updates here?

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability, Tools
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 0001-CF-Dropped-Mutation-Stats.patch, 
> 0001-CFCount.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Commented] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder

2016-01-04 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081741#comment-15081741
 ] 

Anubhav Kale commented on CASSANDRA-10960:
--

This is not about manually deleting old backup folders (that's okay). This is 
about C* not deleting files from backups when those files were deleted as part 
of compaction. Why is that by design? Can you please elaborate?

> Compaction should delete old files from incremental backups folder
> --
>
> Key: CASSANDRA-10960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10960
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>
> When compaction runs the old flushed SS Tables from backups folder are not 
> deleted. If folks need to move the backups folder somewhere outside the 
> cluster, recovery becomes slower because unnecessary files need to be copied 
> back. 
> Is this behavior by design ? 




[jira] [Reopened] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder

2016-01-04 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale reopened CASSANDRA-10960:
--

> Compaction should delete old files from incremental backups folder
> --
>
> Key: CASSANDRA-10960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10960
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>
> When compaction runs the old flushed SS Tables from backups folder are not 
> deleted. If folks need to move the backups folder somewhere outside the 
> cluster, recovery becomes slower because unnecessary files need to be copied 
> back. 
> Is this behavior by design ? 




[jira] [Commented] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder

2016-01-04 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15081833#comment-15081833
 ] 

Anubhav Kale commented on CASSANDRA-10960:
--

Here is a scenario:

Time t1: KS/CF/{s1.db, s2.db}, KS/CF/backups/{s1.db, s2.db}
Time t2: KS/CF/{s1.db, s2.db, s3.db}, KS/CF/backups/{s1.db, s2.db, s3.db} 
(whenever an SSTable is flushed, it is written to backups as well)
Time t3 (after compaction): KS/CF/{s4.db}, KS/CF/backups/{s1.db, s2.db, s3.db, s4.db}

This is the existing behavior, correct? The data hasn't changed here; it is 
simply represented by s4. It is reasonable to keep s1, s2, s3 and s4 in backups 
so that folks can go back to any point in time. However, if folks want to move 
data from backups somewhere outside C* and copy it back during recovery, this 
adds the unnecessary burden of copying the same data multiple times (copying 
back s4 alone should have been enough for recovery here). 

Does this make sense? Please let me know if I have misunderstood something 
here.
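To make the t1 to t3 scenario concrete, here is a small Python model of it. It assumes, as the comment describes, that both flushed and compacted SSTables end up in backups; the functions and the `prune_backups` switch are hypothetical and are not Cassandra APIs. It only shows why the set of files to copy back at recovery would shrink from four to one if backups were pruned alongside compaction.

```python
# Hypothetical model of the t1..t3 scenario. Sets of file names stand in for
# the KS/CF data directory ("live") and KS/CF/backups; nothing here is a
# Cassandra API.


def flush(live: set, backups: set, name: str) -> None:
    """A flush adds a new SSTable to the data directory and, as the scenario
    describes, to backups as well."""
    live.add(name)
    backups.add(name)


def compact(live: set, backups: set, inputs: set, output: str,
            prune_backups: bool = False) -> None:
    """Replace the compacted inputs with a single output SSTable.

    The output also shows up in backups, as in the t3 step of the scenario.
    prune_backups=True models the requested behavior: backup copies of the
    inputs that compaction removed are deleted too.
    """
    live.difference_update(inputs)
    live.add(output)
    backups.add(output)
    if prune_backups:
        backups.difference_update(inputs)


def run_scenario(prune_backups: bool) -> list:
    """Return the backup files that would have to be copied back at recovery."""
    live, backups = set(), set()
    for name in ("s1.db", "s2.db", "s3.db"):  # t1 and t2: three flushes
        flush(live, backups, name)
    compact(live, backups, {"s1.db", "s2.db", "s3.db"}, "s4.db", prune_backups)  # t3
    return sorted(backups)


print(run_scenario(prune_backups=False))  # ['s1.db', 's2.db', 's3.db', 's4.db']
print(run_scenario(prune_backups=True))   # ['s4.db']
```

The trade-off the comment acknowledges still applies: with pruning, the state before t3 can no longer be restored from backups alone.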

> Compaction should delete old files from incremental backups folder
> --
>
> Key: CASSANDRA-10960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10960
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>
> When compaction runs the old flushed SS Tables from backups folder are not 
> deleted. If folks need to move the backups folder somewhere outside the 
> cluster, recovery becomes slower because unnecessary files need to be copied 
> back. 
> Is this behavior by design ? 




[jira] [Commented] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder

2016-01-04 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15082020#comment-15082020
 ] 

Anubhav Kale commented on CASSANDRA-10960:
--

Thanks for the explanation. While I don't want to continue the conversation 
here, IMHO C* needs to enable a behavior where "old" SSTables in backups are 
deleted whenever the corresponding files are deleted from the actual data 
folders as part of compaction. Otherwise, too much duplicate data has to be 
moved back to the nodes at recovery time.
The specific scenario is when backups need to be moved outside of Cassandra; 
otherwise, the current behavior is good enough.

> Compaction should delete old files from incremental backups folder
> --
>
> Key: CASSANDRA-10960
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10960
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compaction
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>
> When compaction runs the old flushed SS Tables from backups folder are not 
> deleted. If folks need to move the backups folder somewhere outside the 
> cluster, recovery becomes slower because unnecessary files need to be copied 
> back. 
> Is this behavior by design ? 




[jira] [Created] (CASSANDRA-10960) Compaction should delete old files from incremental backups folder

2015-12-31 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-10960:


 Summary: Compaction should delete old files from incremental backups folder
 Key: CASSANDRA-10960
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10960
 Project: Cassandra
  Issue Type: Improvement
  Components: Compaction
 Environment: PROD
Reporter: Anubhav Kale
Priority: Minor


When compaction runs, the old flushed SSTables in the backups folder are not 
deleted. If folks need to move the backups folder somewhere outside the 
cluster, recovery becomes slower because unnecessary files need to be copied 
back. 

Is this behavior by design? 




[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-28 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10866:
-
Attachment: 0001-CF-Dropped-Mutation-Stats.patch

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability, Tools
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 0001-CF-Dropped-Mutation-Stats.patch, 
> 0001-CFCount.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-28 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10866:
-
Attachment: (was: 0001-CF-Dropped-Mutation-Stats.patch)

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability, Tools
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 0001-CFCount.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-28 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15073178#comment-15073178
 ] 

Anubhav Kale commented on CASSANDRA-10866:
--

Attached. Please take a look when you get a chance!

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability, Tools
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 0001-CF-Dropped-Mutation-Stats.patch, 
> 0001-CFCount.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-28 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10866:
-
Attachment: 0001-CF-Dropped-Mutation-Stats.patch

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Observability, Tools
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.x
>
> Attachments: 0001-CF-Dropped-Mutation-Stats.patch, 
> 0001-CFCount.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-23 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069937#comment-15069937
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Thanks for your help in working through this.

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.
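The description above asks for the time a message waited before being scheduled (and therefore dropped) to be recorded, along with which column family was affected. A minimal Python sketch of that idea follows, assuming a message carries its creation time on a monotonic clock; `Message`, `DroppedStats`, `process` and the `timeout_ms` parameter are hypothetical illustrations, not the code in StorageProxy.java or MessagingService, and `timeout_ms` merely stands in for the write timeout the description mentions.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    verb: str          # e.g. "MUTATION"
    table: str         # "keyspace.table" the mutation targets
    created_at: float  # seconds on a monotonic clock when the message was created


@dataclass
class DroppedStats:
    count: int = 0
    latencies_ms: list = field(default_factory=list)


def process(msg: Message, timeout_ms: float, stats: DroppedStats,
            now: Optional[float] = None) -> bool:
    """Drop msg if it waited longer than timeout_ms before being scheduled.

    Returns True if the message should be executed. For dropped messages, the
    wait time is recorded and logged together with the verb and table.
    """
    now = time.monotonic() if now is None else now
    waited_ms = (now - msg.created_at) * 1000.0
    if waited_ms > timeout_ms:
        stats.count += 1
        stats.latencies_ms.append(waited_ms)
        print(f"Dropped {msg.verb} for {msg.table} after {waited_ms:.0f} ms "
              f"(timeout {timeout_ms} ms)")
        return False
    return True


stats = DroppedStats()
process(Message("MUTATION", "ks.users", created_at=0.0), 2000, stats, now=2.5)  # dropped
process(Message("MUTATION", "ks.users", created_at=0.0), 2000, stats, now=1.0)  # executed
print(stats.count, stats.latencies_ms)  # 1 [2500.0]
```

Recording the wait time, rather than only a drop count, is what makes it possible to see how far past the timeout dropped messages typically are when tuning it.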




[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-23 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15069936#comment-15069936
 ] 

Anubhav Kale commented on CASSANDRA-10866:
--

Attached 10866-Trunk.patch.

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-CFCount.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-23 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10866:
-
Attachment: 10866-Trunk.patch

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-CFCount.patch, 10866-Trunk.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2015-12-23 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070095#comment-15070095
 ] 

Anubhav Kale commented on CASSANDRA-10907:
--

For point-in-time backups, it's always somewhat unpredictable what data is 
backed up, especially with replication on. The concern here is the unnecessary 
time and resources spent in a blocking flush when it's not really required. 

I have provided a patch. It's possible to provide overrides in other places; I 
took a stab at providing those for KS and CF and did the wiring. If you prefer 
some other approach, let me know.

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.




[jira] [Updated] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2015-12-23 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10907:
-
Attachment: 0001-flush.patch

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
> Attachments: 0001-flush.patch
>
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.




[jira] [Created] (CASSANDRA-10936) Provide option to repair from a data center in "nodetool repair"

2015-12-23 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-10936:


 Summary: Provide option to repair from a data center in "nodetool repair"
 Key: CASSANDRA-10936
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10936
 Project: Cassandra
  Issue Type: Improvement
  Components: Tools
 Environment: PROD
Reporter: Anubhav Kale
Priority: Minor


Sometimes it is known that the correct / latest data resides in a particular 
data center. It would be useful if nodetool repair provided a "Source DC" option 
to source the data from. This would save a ton of network traffic.

There is some discussion around this in CASSANDRA-6552.

A case in point where this is handy: people may want to back up data from a 
designated data center (so that only one copy of the data is backed up) to 
remote storage (Azure / AWS). At restore time, once the data is restored to 
this DC, other data centers can "source" data from it through 
"nodetool repair --source Foo".
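As a rough illustration of where the traffic savings would come from, the Python sketch below counts which replica pairs could exchange data for one token range: in a regular repair every pair of replicas is compared and may stream to each other, whereas in the proposed mode each replica outside the source DC would pull from a single source replica. The `--source` behavior, the round-robin choice of source replica, and all function names are assumptions for illustration; the sketch counts pairs, not bytes.

```python
from itertools import combinations


def full_repair_pairs(replicas: dict) -> list:
    """Every pair of replicas is compared, and any pair may exchange data."""
    return list(combinations(sorted(replicas), 2))


def source_dc_pairs(replicas: dict, source_dc: str) -> list:
    """Hypothetical "--source <DC>" mode: every replica outside source_dc pulls
    from one replica inside it (round-robin), and nothing streams back."""
    sources = sorted(node for node, dc in replicas.items() if dc == source_dc)
    if not sources:
        raise ValueError(f"no replicas in source DC {source_dc!r}")
    targets = sorted(node for node, dc in replicas.items() if dc != source_dc)
    return [(sources[i % len(sources)], target) for i, target in enumerate(targets)]


# Three replicas of one token range in each of three data centers.
replicas = {f"{dc.lower()}{i}": dc for dc in ("Foo", "Bar", "Baz") for i in (1, 2, 3)}
print(len(full_repair_pairs(replicas)))       # 36
print(len(source_dc_pairs(replicas, "Foo")))  # 6
```

Streaming only one way, from the DC known to hold the correct data, also avoids propagating stale data from the other DCs back into the restored one.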




[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-22 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15068505#comment-15068505
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Any updates here? It seems another rebase might be necessary for the trunk 
patch. If the patch looks okay, could the committer please take care of 
rebasing when it gets committed?

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.




[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2015-12-21 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066956#comment-15066956
 ] 

Anubhav Kale commented on CASSANDRA-10907:
--

I agree that what gets backed up will be undefined. In my opinion, the trap is 
very clear here, so I don't think it will be misused. IMHO, other nodetool 
commands have such traps as well, so this is no different (e.g. why does scrub 
have an option to not snapshot?). 

That said, if you feel strongly against this, I understand and we can kill this 
(I can always make a local patch).

BTW, I can't use incremental backups, because I do not want to ship SSTable 
files that would have been removed as part of compaction. When compaction kicks 
in and deletes some files, it won't remove them from backups (which makes sense, 
otherwise the backups wouldn't be incremental). So at recovery time we end up 
moving too many files back, which increases application downtime. If I am 
misunderstanding something here, please let me know!

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.




[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-21 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15067289#comment-15067289
 ] 

Anubhav Kale commented on CASSANDRA-10866:
--

Thanks. I included the Collection because I did not realize that the SCHEMA_* 
verbs aren't part of DROPPABLE_VERBS. Good point.

I'll submit a rebased patch shortly.

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
> Environment: PROD
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Attachments: 0001-CFCount.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-21 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066779#comment-15066779
 ] 

Anubhav Kale commented on CASSANDRA-10866:
--

Any updates here?

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
> Attachments: 0001-CFCount.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.




[jira] [Commented] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2015-12-21 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10907?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15066788#comment-15066788
 ] 

Anubhav Kale commented on CASSANDRA-10907:
--

We plan to move backups off the nodes. So when a snapshot is taken, it should 
ideally be fast (i.e. skip the flush) so that it can be moved out as quickly as 
possible. We have enough replication to tolerate losing the data that was still 
in unflushed memtables.

Do you feel strongly against it?

> Nodetool snapshot should provide an option to skip flushing
> ---
>
> Key: CASSANDRA-10907
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>  Labels: lhf
>
> For some practical scenarios, it doesn't matter if the data is flushed to 
> disk before taking a snapshot. However, it's better to save some flushing 
> time to make snapshot process quick.
> As such, it will be a good idea to provide this option to snapshot command. 
> The wiring from nodetool to MBean to VerbHandler should be easy. 
> I can provide a patch if this makes sense.




[jira] [Created] (CASSANDRA-10907) Nodetool snapshot should provide an option to skip flushing

2015-12-18 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-10907:


 Summary: Nodetool snapshot should provide an option to skip flushing
 Key: CASSANDRA-10907
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10907
 Project: Cassandra
  Issue Type: Improvement
  Components: Configuration
 Environment: PROD
Reporter: Anubhav Kale
Priority: Minor


In some practical scenarios, it doesn't matter whether data is flushed to disk 
before taking a snapshot, and it is better to save the flushing time to make the 
snapshot process quick.

As such, it would be a good idea to provide this option to the snapshot command. 
The wiring from nodetool to MBean to VerbHandler should be easy. 

I can provide a patch if this makes sense.
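The trade-off behind the proposed option can be sketched with a toy model: a snapshot captures only what is already on disk as SSTables, so skipping the blocking flush makes it faster but leaves out rows still sitting in the memtable. The `Table` class and its `skip_flush` parameter below are hypothetical and are not Cassandra's snapshot implementation or the proposed patch.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy table: a memtable of unflushed rows plus immutable on-disk SSTables."""
    memtable: list = field(default_factory=list)
    sstables: list = field(default_factory=list)  # each SSTable is a list of rows

    def write(self, row: str) -> None:
        self.memtable.append(row)

    def flush(self) -> None:
        if self.memtable:
            self.sstables.append(list(self.memtable))
            self.memtable.clear()

    def snapshot(self, skip_flush: bool = False) -> list:
        """Return the rows a snapshot captures: only what is already on disk."""
        if not skip_flush:
            self.flush()  # the blocking flush the proposed option would skip
        return [row for sstable in self.sstables for row in sstable]


t = Table()
t.write("a")
t.flush()
t.write("b")                        # still only in the memtable
print(t.snapshot(skip_flush=True))  # ['a']
print(t.snapshot())                 # ['a', 'b']
```

Whether the missing memtable rows matter depends on the deployment; the comments in this thread rely on replication to cover them.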




[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-17 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063020#comment-15063020
 ] 

Anubhav Kale commented on CASSANDRA-10866:
--

I have provided a patch on top of 
https://issues.apache.org/jira/secure/attachment/12777927/Trunk-All-Comments.patch

I can add unit tests in MessagingService, but I did not see a pattern for 
checking metrics through unit tests, so I skipped those. The change is verified 
on a local installation through VisualVM. 

It is possible to optimize the KS/CF lookup in MessagingService.java (maybe 
through a Map), but I am hoping that's not necessary.
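For illustration only, the Map-based lookup mentioned above could look like the Python sketch below: dropped-mutation counts keyed by (keyspace, table), so each drop is an average O(1) dictionary update rather than a search. `DroppedMutationCounts` and its methods are hypothetical names, not the classes in the attached patches.

```python
from collections import Counter


class DroppedMutationCounts:
    """Hypothetical per-table counters keyed by (keyspace, table)."""

    def __init__(self) -> None:
        self._counts = Counter()

    def mark_dropped(self, tables) -> None:
        """Count one drop for each distinct (keyspace, table) a mutation touches."""
        for key in set(tables):
            self._counts[key] += 1

    def count(self, keyspace: str, table: str) -> int:
        # Counter returns 0 for missing keys without inserting them.
        return self._counts[(keyspace, table)]


metrics = DroppedMutationCounts()
metrics.mark_dropped([("ks1", "users"), ("ks1", "events")])
metrics.mark_dropped([("ks1", "users")])
print(metrics.count("ks1", "users"), metrics.count("ks1", "events"))  # 2 1
```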

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
> Attachments: 0001-CFCount.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.





[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-17 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10866:
-
Attachment: 0001-CFCount.patch

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
> Attachments: 0001-CFCount.patch
>
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.





[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-17 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15063212#comment-15063212
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Thanks. When will the decision about 2.2 be made? I think it's better to 
produce the 3.0 patch at that point to avoid merge conflicts again. What do you 
think?

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.
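The idea in the description can be sketched as follows, under the assumption that each message records when it was created and is dropped once it has waited longer than the applicable timeout (for writes, `write_timeout_in_ms`). The helper names and return shape are hypothetical; the point is that the same elapsed time used for the drop decision is the value worth logging, together with the affected table.

```python
def check_dropped(created_at_ms, now_ms, timeout_ms):
    """Return (dropped, waited_ms) for a message queued at created_at_ms.

    Hypothetical helper: a message is dropped when it waited longer than
    timeout_ms before a thread picked it up.
    """
    waited_ms = now_ms - created_at_ms
    return waited_ms > timeout_ms, waited_ms


def format_drop(keyspace, table, waited_ms, timeout_ms):
    # Include the affected table so operators can see which one is hit.
    return (f"Dropped mutation for {keyspace}.{table}: waited {waited_ms} ms "
            f"(timeout {timeout_ms} ms)")
```

With a 2000 ms timeout, a message queued at t=1000 and picked up at t=3500 waited 2500 ms and is dropped; comparing the logged wait times against the timeout is what helps tune it.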





[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060800#comment-15060800
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

I have re-created 2.2-All-Comments.patch. I tested that it applies locally on a 
2.2 branch pulled into another directory (via git clone 
http://git-wip-us.apache.org/repos/asf/cassandra.git cassandra-2.2). Can you 
please confirm this works correctly for you?

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: (was: 2.2-All-Comments.patch)

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: 2.2-All-Comments.patch

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060307#comment-15060307
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

I am not sure what's going on with 2.2. I will look at it again today. I use the 
following steps to generate patches; can you please confirm that's what everyone 
follows?

1. Change files
2. git status (Confirm that the changes are correct)
3. git add .
4. git commit -m "Foo"
5. git format-patch HEAD~

About the tests, I don't think the failures are related to my change. Are those 
failures expected?

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060376#comment-15060376
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

My HEAD is indeed pointing to a different location: 
f1c3df0e848638735c790b4817adf6411a52a064. I pulled down 2.2 via the command below 
and then made changes.

git clone http://git-wip-us.apache.org/repos/asf/cassandra.git cassandra-2.2

I will dig some more on why this did not work correctly.

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061241#comment-15061241
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

That explains it. I will get you the correct patch. Thanks for the explanation.

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: 2.2-All-Comments.patch

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: (was: 2.2-All-Comments.patch)

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) Add latency metrics for dropped messages

2015-12-16 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15061285#comment-15061285
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Attached the 2.2 patch. Sorry about the goof-up.

> Add latency metrics for dropped messages
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination, Observability
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-15 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059191#comment-15059191
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Can someone review this so we don't have to deal with too many merge conflicts 
later? Thanks a lot!

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-15 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10866:
-
Summary: Column Family should expose count metrics for dropped mutations.  
(was: Column Family should expose latency metrics for dropped mutations.)

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.





[jira] [Commented] (CASSANDRA-10866) Column Family should expose count metrics for dropped mutations.

2015-12-15 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10866?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059214#comment-15059214
 ] 

Anubhav Kale commented on CASSANDRA-10866:
--

That makes sense. Updating title to reflect.

> Column Family should expose count metrics for dropped mutations.
> 
>
> Key: CASSANDRA-10866
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
> Project: Cassandra
>  Issue Type: Improvement
> Environment: PROD
>Reporter: Anubhav Kale
>Priority: Minor
>
> Please take a look at the discussion in CASSANDRA-10580. This is opened so 
> that the latency on dropped mutations is exposed as a metric on column 
> families.





[jira] [Issue Comment Deleted] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-15 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Comment: was deleted

(was: Trunk patch with all comments addressed => 0001-Mutation.patch)

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-15 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059535#comment-15059535
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Thanks. Attached *-All-Comments.patch for trunk and 2.2. Let me know if those 
look good.

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-15 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: 2.2-All-Comments.patch
Trunk-All-Comments.patch

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-15 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: (was: 0001-Mutation.patch)

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> 2.2-All-Comments.patch, CASSANDRA-10580-Head.patch, Trunk-All-Comments.patch, 
> Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-15 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: 0001-Mutation.patch

Trunk patch with all comments addressed => 0001-Mutation.patch

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 0001-Mutation.patch, 
> 10580-Metrics.patch, 10580.patch, CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-14 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: 0001-Metrics.patch

Patch for exposing this data as JMX metrics

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Comment Edited] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-14 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056808#comment-15056808
 ] 

Anubhav Kale edited comment on CASSANDRA-10580 at 12/14/15 10:12 PM:
-

Thanks for the pointers. Please take a look at the latest patch. I have tested 
it via VisualVM and the new metrics work well. 

It is interesting that I could not find any documentation stating that Metrics 
return results (getSnapshot().getMean()) in nanoseconds, so callers must convert 
them themselves. Noting this here so other devs can save some time on this.
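For example, converting a timer mean from nanoseconds to milliseconds before display. This is shown in Python for brevity; `mean_ms` is a hypothetical helper, and the hand-computed mean stands in for reading `getSnapshot().getMean()`.

```python
NANOS_PER_MILLI = 1_000_000


def mean_ms(durations_ns):
    """Mean of nanosecond durations, converted to milliseconds.

    Stand-in for reading a timer snapshot mean (reported in nanoseconds)
    and converting it before display.
    """
    if not durations_ns:
        return 0.0
    return sum(durations_ns) / len(durations_ns) / NANOS_PER_MILLI
```

Forgetting the division makes a 2 ms mean show up as 2,000,000.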

I'll file a separate JIRA for making the same change on a per-KS/CF basis. 

Let me know if this makes sense or if more changes are needed. 


was (Author: anubhavk):
Patch for exposing this data as JMX metrics

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Created] (CASSANDRA-10866) Column Family should expose latency metrics for dropped mutations.

2015-12-14 Thread Anubhav Kale (JIRA)
Anubhav Kale created CASSANDRA-10866:


 Summary: Column Family should expose latency metrics for dropped 
mutations.
 Key: CASSANDRA-10866
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10866
 Project: Cassandra
  Issue Type: Improvement
 Environment: PROD
Reporter: Anubhav Kale
Priority: Minor


Please take a look at the discussion in CASSANDRA-10580. This is opened so that 
the latency on dropped mutations is exposed as a metric on column families.





[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-14 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15056843#comment-15056843
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

CASSANDRA-10866 has been opened to track the per-CF work.

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 0001-Metrics.patch, 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print the time the thread took to get scheduled 
> thereby dropping the mutation (We should also print the Message / Mutation so 
> it helps in figuring out which column family was affected). This will help 
> find the right tuning parameter for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-12 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15054502#comment-15054502
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Thanks for the interesting suggestion. I actually considered going down that 
route when I started working on this, but I wasn't sure about the rationale / 
design philosophy behind adding new metrics, so I took a simpler route. 
Glad to see your feedback.

I have attached 10580-Metrics.patch and will open a separate JIRA for doing 
this on a per-CF basis. I am using the ApproximateTime class wherever the 
timestamp does not take part in the decision to drop the mutation and is only 
used for logging. I hope that makes sense.
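A minimal sketch of that split, assuming an approximate clock is simply a cached timestamp refreshed by a background task: reads cost a volatile load but can lag by roughly one refresh interval, which is fine for log lines and not for the drop decision itself. CoarseClock is a hypothetical class, not Cassandra's ApproximateTime.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical coarse clock: a daemon thread refreshes a cached wall-clock value,
// so readers only pay for a volatile read. Accuracy is bounded by the refresh rate.
final class CoarseClock implements AutoCloseable {
    private volatile long cachedMillis = System.currentTimeMillis();
    private final ScheduledExecutorService refresher;

    CoarseClock(long refreshIntervalMillis) {
        refresher = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "coarse-clock");
            t.setDaemon(true); // must not keep the JVM alive
            return t;
        });
        refresher.scheduleAtFixedRate(this::refresh, refreshIntervalMillis,
                refreshIntervalMillis, TimeUnit.MILLISECONDS);
    }

    // Also callable directly, e.g. from tests.
    void refresh() {
        cachedMillis = System.currentTimeMillis();
    }

    // Cheap, possibly stale by about one refresh interval: use for logging only.
    long currentTimeMillis() {
        return cachedMillis;
    }

    @Override
    public void close() {
        refresher.shutdownNow();
    }
}
```

The drop decision would keep using a precise source such as System.nanoTime(), so the staleness of the cached value can only affect what is printed, never whether a mutation is dropped.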

I can clean up the methods in MessagingService a bit more if you like (a couple 
of them print the same message). I wanted to send this out first to make sure 
I was on the right path.

Also, a question: it appears that Timer.update appends entries to the metric 
(which is what we want). Do you know at what point it starts dropping new 
appends / gives up? I wonder whether a huge number of dropped mutations will 
skew the timeTaken metric.
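On the Timer question: in Codahale Metrics, which Cassandra's metrics are built on, a Timer does not keep every update. Each update() feeds a fixed-size Reservoir (the no-arg Timer constructor uses an exponentially decaying one) plus a Meter for rates, so memory stays bounded and there is no point at which it "gives up"; a flood of dropped mutations only changes which samples are retained. Cassandra's registry may plug in its own reservoir, so treat this as the general model. The sketch below illustrates bounded sampling with Vitter's Algorithm R (the technique behind Metrics' UniformReservoir); UniformSample is a hypothetical stand-in, not the reservoir Cassandra configures.

```java
import java.util.Arrays;
import java.util.concurrent.ThreadLocalRandom;

// Illustrative uniform reservoir (Vitter's Algorithm R): memory is fixed at
// `capacity` values no matter how many updates are recorded.
final class UniformSample {
    private final long[] values;
    private long seen; // total number of updates ever recorded

    UniformSample(int capacity) {
        values = new long[capacity];
    }

    synchronized void update(long value) {
        seen++;
        if (seen <= values.length) {
            values[(int) (seen - 1)] = value; // fill phase: keep everything
        } else {
            // Keep the new value with probability capacity / seen,
            // evicting a uniformly chosen existing sample.
            long r = ThreadLocalRandom.current().nextLong(seen);
            if (r < values.length) {
                values[(int) r] = value;
            }
        }
    }

    synchronized long count() {
        return seen;
    }

    synchronized int size() {
        return (int) Math.min(seen, values.length);
    }

    synchronized long[] snapshot() {
        return Arrays.copyOf(values, size());
    }
}
```

After a million updates into a 1028-slot sample, count() reports 1,000,000 while only 1028 values are held; percentiles computed from the snapshot stay representative of the whole stream rather than of the first or last arrivals.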

To make this work per CF, I will probably pass the mutation to 
MessagingService.LogDroppedMessages (maybe through an overload) and update the 
metrics on the appropriate CF. Does that make sense?
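A minimal sketch of the overload idea, assuming the caller can pass the dropped mutation's keyspace and the names of the tables it touches (a single mutation can carry updates for several tables of one keyspace). DroppedMessageLog and recordDropped are hypothetical names, not the actual MessagingService methods.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: the existing path keeps its global counter, and the
// overload additionally attributes the drop to every table in the mutation.
final class DroppedMessageLog {
    private final LongAdder globalDropped = new LongAdder();
    private final Map<String, LongAdder> droppedByTable = new ConcurrentHashMap<>();

    // Existing behaviour: only the global count is updated.
    void recordDropped() {
        globalDropped.increment();
    }

    // Overload: same global update, plus one increment per affected table.
    void recordDropped(String keyspace, Collection<String> tables) {
        recordDropped();
        for (String table : tables) {
            droppedByTable.computeIfAbsent(keyspace + "." + table, k -> new LongAdder())
                          .increment();
        }
    }

    long global() {
        return globalDropped.sum();
    }

    long forTable(String keyspace, String table) {
        LongAdder a = droppedByTable.get(keyspace + "." + table);
        return a == null ? 0 : a.sum();
    }
}
```

Keeping the original method unchanged means call sites that have no mutation at hand (other message verbs) are unaffected, while mutation drops gain per-table attribution.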

If this change looks good, I would prefer to make this work per CF before 
preparing patches for the older branches. Let me know if that's okay.

Appreciate your time and feedback!

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print how long the thread took to get scheduled, since 
> that delay is what caused the mutation to be dropped (we should also print the 
> Message / Mutation, which helps in figuring out which column family was 
> affected). This will help find the right value for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-12 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: 10580-Metrics.patch

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print how long the thread took to get scheduled, since 
> that delay is what caused the mutation to be dropped (we should also print the 
> Message / Mutation, which helps in figuring out which column family was 
> affected). This will help find the right value for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-12 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: 10580-Metrics.patch

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print how long the thread took to get scheduled, since 
> that delay is what caused the mutation to be dropped (we should also print the 
> Message / Mutation, which helps in figuring out which column family was 
> affected). This will help find the right value for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-12 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: (was: 10580-Metrics.patch)

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580-Metrics.patch, 10580.patch, 
> CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print how long the thread took to get scheduled, since 
> that delay is what caused the mutation to be dropped (we should also print the 
> Message / Mutation, which helps in figuring out which column family was 
> affected). This will help find the right value for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-10 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051604#comment-15051604
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Attached Trunk.patch. Let me know if that looks good. 

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580.patch, CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print how long the thread took to get scheduled, since 
> that delay is what caused the mutation to be dropped (we should also print the 
> Message / Mutation, which helps in figuring out which column family was 
> affected). This will help find the right value for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Updated] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-10 Thread Anubhav Kale (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anubhav Kale updated CASSANDRA-10580:
-
Attachment: Trunk.patch

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580.patch, CASSANDRA-10580-Head.patch, Trunk.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print how long the thread took to get scheduled, since 
> that delay is what caused the mutation to be dropped (we should also print the 
> Message / Mutation, which helps in figuring out which column family was 
> affected). This will help find the right value for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.





[jira] [Commented] (CASSANDRA-10580) On dropped mutations, more details should be logged.

2015-12-09 Thread Anubhav Kale (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15049489#comment-15049489
 ] 

Anubhav Kale commented on CASSANDRA-10580:
--

Thanks for the review. I will work on the patches. 

Quick question about the null reference exception with the empty constructors: 
I did not see this happen in my local run. Are you suggesting this is a pattern 
enforced by the logging library, and that I should therefore add a constructor 
with (say) a bool arg and use it wherever I was using the default constructor? 
Sorry, I'm just not clear on why something like this is necessary.

> On dropped mutations, more details should be logged.
> 
>
> Key: CASSANDRA-10580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10580
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
> Environment: Production
>Reporter: Anubhav Kale
>Assignee: Anubhav Kale
>Priority: Minor
> Fix For: 3.2, 2.2.x
>
> Attachments: 10580.patch, CASSANDRA-10580-Head.patch
>
>
> In our production cluster, we are seeing a large number of dropped mutations. 
> At a minimum, we should print how long the thread took to get scheduled, since 
> that delay is what caused the mutation to be dropped (we should also print the 
> Message / Mutation, which helps in figuring out which column family was 
> affected). This will help find the right value for write_timeout_in_ms. 
> The change is small and is in StorageProxy.java and MessagingTask.java. I 
> will submit a patch shortly.




