[jira] [Updated] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations

2015-09-28 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-10392:

Description: 
It is possible to support an external tracing solution in Cassandra by moving 
the code that writes traces to the system_traces tables out of the tracing 
package into separate implementation classes, leaving abstract classes in place 
that define the interface and remaining behaviour of C* tracing.

Then, via the system property "cassandra.custom_tracing_class", the Tracing 
implementation can be swapped out for a third-party one.

An example of this is adding Zipkin tracing to Cassandra, as shown in the Summit 
[presentation|].

In addition, this patch passes the custom payload through into the tracing 
session, allowing a third-party tracing solution like Zipkin to do full-stack 
tracing from clients through and into Cassandra.

  was:
It is possible to support an external tracing solution in Cassandra by moving 
the code that writes traces to the system_traces tables out of the tracing 
package into separate implementation classes, leaving abstract classes in place 
that define the interface and remaining behaviour of C* tracing.

Then, via the system property "cassandra.custom_tracing_class", the Tracing 
implementation can be swapped out for a third-party one.

An example of this is adding Zipkin tracing to Cassandra, as shown in the Summit 
presentation.

In addition, this patch passes the custom payload through into the tracing 
session, allowing a third-party tracing solution like Zipkin to do full-stack 
tracing from clients through and into Cassandra.


> Allow Cassandra to trace to custom tracing implementations 
> ---
>
> Key: CASSANDRA-10392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10392
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: mck
>Assignee: mck
> Fix For: 3.x
>
> Attachments: 10392-trunk.txt
>
>
> It is possible to support an external tracing solution in Cassandra by moving 
> the code that writes traces to the system_traces tables out of the tracing 
> package into separate implementation classes, leaving abstract classes in 
> place that define the interface and remaining behaviour of C* tracing.
> Then, via the system property "cassandra.custom_tracing_class", the Tracing 
> implementation can be swapped out for a third-party one.
> An example of this is adding Zipkin tracing to Cassandra, as shown in the Summit 
> [presentation|].
> In addition, this patch passes the custom payload through into the tracing 
> session, allowing a third-party tracing solution like Zipkin to do full-stack 
> tracing from clients through and into Cassandra.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations

2015-09-28 Thread mck (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mck updated CASSANDRA-10392:

Description: 
It is possible to support an external tracing solution in Cassandra by moving 
the code that writes traces to the system_traces tables out of the tracing 
package into separate implementation classes, leaving abstract classes in place 
that define the interface and remaining behaviour of C* tracing.

Then, via the system property "cassandra.custom_tracing_class", the Tracing 
implementation can be swapped out for a third-party one.

An example of this is adding Zipkin tracing to Cassandra, as shown in the Summit 
[presentation|http://thelastpickle.com/files/2015-09-24-using-zipkin-for-full-stack-tracing-including-cassandra/presentation/tlp-reveal.js/tlp-cassandra-zipkin.html].
 Code for the implemented Zipkin plugin can be found at 
https://github.com/thelastpickle/cassandra-zipkin-tracing/

In addition, this patch passes the custom payload through into the tracing 
session, allowing a third-party tracing solution like Zipkin to do full-stack 
tracing from clients through and into Cassandra.

  was:
It is possible to support an external tracing solution in Cassandra by moving 
the code that writes traces to the system_traces tables out of the tracing 
package into separate implementation classes, leaving abstract classes in place 
that define the interface and remaining behaviour of C* tracing.

Then, via the system property "cassandra.custom_tracing_class", the Tracing 
implementation can be swapped out for a third-party one.

An example of this is adding Zipkin tracing to Cassandra, as shown in the Summit 
[presentation|].

In addition, this patch passes the custom payload through into the tracing 
session, allowing a third-party tracing solution like Zipkin to do full-stack 
tracing from clients through and into Cassandra.


> Allow Cassandra to trace to custom tracing implementations 
> ---
>
> Key: CASSANDRA-10392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10392
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: mck
>Assignee: mck
> Fix For: 3.x
>
> Attachments: 10392-trunk.txt
>
>
> It is possible to support an external tracing solution in Cassandra by moving 
> the code that writes traces to the system_traces tables out of the tracing 
> package into separate implementation classes, leaving abstract classes in 
> place that define the interface and remaining behaviour of C* tracing.
> Then, via the system property "cassandra.custom_tracing_class", the Tracing 
> implementation can be swapped out for a third-party one.
> An example of this is adding Zipkin tracing to Cassandra, as shown in the Summit 
> [presentation|http://thelastpickle.com/files/2015-09-24-using-zipkin-for-full-stack-tracing-including-cassandra/presentation/tlp-reveal.js/tlp-cassandra-zipkin.html].
>  Code for the implemented Zipkin plugin can be found at 
> https://github.com/thelastpickle/cassandra-zipkin-tracing/
> In addition, this patch passes the custom payload through into the tracing 
> session, allowing a third-party tracing solution like Zipkin to do full-stack 
> tracing from clients through and into Cassandra.
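For illustration, a minimal sketch of the swap-out mechanism the description 
refers to: an implementation named by the "cassandra.custom_tracing_class" 
system property is loaded reflectively, otherwise the built-in behaviour is 
kept. Only the property name comes from the ticket; the {{TraceSink}} interface 
and class names below are hypothetical stand-ins for the actual abstract 
Tracing class.

{code}
public final class TracingLoader
{
    /** Hypothetical minimal tracing abstraction, standing in for the real Tracing class. */
    public interface TraceSink
    {
        void trace(String message);
    }

    /** Default behaviour: keep writing traces to the system_traces tables (stubbed here). */
    static final class DefaultTracing implements TraceSink
    {
        public void trace(String message)
        {
            System.out.println("system_traces <- " + message);
        }
    }

    public static TraceSink load()
    {
        // e.g. started with -Dcassandra.custom_tracing_class=com.example.ZipkinTracing
        String customClass = System.getProperty("cassandra.custom_tracing_class");
        if (customClass == null)
            return new DefaultTracing();
        try
        {
            return (TraceSink) Class.forName(customClass).getDeclaredConstructor().newInstance();
        }
        catch (Exception e)
        {
            throw new RuntimeException("Cannot load custom tracing class " + customClass, e);
        }
    }

    public static void main(String[] args)
    {
        load().trace("session started");
    }
}
{code}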



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2015-09-28 Thread J.P. Eiti Kimura (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

J.P. Eiti Kimura updated CASSANDRA-7276:

Attachment: cassandra-2.1.9-7276.txt

new patch added

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Nitzan Volman
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, 
> cassandra-2.1.9-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.
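As a sketch of what the ticket asks for, the write path (and similar paths for 
compaction, flush and repair) could attach the keyspace and table to any 
exception it logs or rethrows. The class and method names below are 
illustrative, not the attached patches.

{code}
import java.nio.ByteBuffer;

public final class ContextualErrors
{
    static void applyMutation(String keyspace, String table, ByteBuffer key)
    {
        try
        {
            // ... the actual write path would go here ...
            throw new IllegalArgumentException();   // simulate the failure from the stack trace above
        }
        catch (RuntimeException e)
        {
            // Rethrow with the context the original stack trace lacks.
            throw new RuntimeException(
                String.format("Error applying mutation to %s.%s (key of %d bytes)",
                              keyspace, table, key.remaining()), e);
        }
    }

    public static void main(String[] args)
    {
        try
        {
            applyMutation("keyspace1", "standard1", ByteBuffer.allocate(16));
        }
        catch (RuntimeException e)
        {
            System.out.println(e.getMessage());   // Error applying mutation to keyspace1.standard1 (key of 16 bytes)
        }
    }
}
{code}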



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out

2015-09-28 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934590#comment-14934590
 ] 

Stefania commented on CASSANDRA-7392:
-

bq. Use a dedicated thread to update the timestamp so it isn't impacted by 
other activities

bq. I was going to suggest using the thread used by 
NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. 
However, I'm not even sure why that activity deserved its own thread. I think 
there was nothing suitable available in some version of C*, but now it could 
just use ScheduledExecutors. So maybe just a dedicated thread for updating 
ApproximateTime. I believe approximate time will find more traction over time, 
so it should be reasonably accurate when possible.

I've introduced a new periodic SES for fast jobs (sub-microsecond) and moved 
{{ApproximateTime}} and {{NanoTimeToCurrentTimeMillis}} to it.


bq. I think the timestamp field in ApproximateTime needs to be volatile.

OK
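A minimal sketch of the two points above: one task on a shared scheduled 
executor refreshes a coarse timestamp, and the field it writes is volatile so 
readers on other threads see the latest value. Names and the refresh interval 
are illustrative, not the actual patch.

{code}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public final class ApproximateTimeSketch
{
    private static volatile long approxMillis = System.currentTimeMillis();

    private static final ScheduledExecutorService FAST_PERIODIC =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "fast-periodic-tasks");
            t.setDaemon(true);
            return t;
        });

    static
    {
        // Refresh every 10ms; cheap enough to share the executor with other fast jobs.
        FAST_PERIODIC.scheduleAtFixedRate(
            () -> approxMillis = System.currentTimeMillis(), 10, 10, TimeUnit.MILLISECONDS);
    }

    public static long currentTimeMillis()
    {
        return approxMillis;
    }

    public static void main(String[] args) throws InterruptedException
    {
        Thread.sleep(50);
        System.out.println("approximate now = " + currentTimeMillis());
    }
}
{code}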

bq. Several properties don't have the "cassandra." prefix

Thanks, I accidentally dropped them during the refactoring.

bq. By polling the queue when not reporting you are increasing the bound on the 
number of retained failures and resources pinned by this reporting since 
aggregation doesn't really aggregate yet. I would just drain the queue when 
logging.

OK

bq. I think you want a count of operations that were truncated instead of a 
boolean so you can log the count.

OK

bq. Offering into the queue returns a boolean and doesn't throw, which style 
wise seems a little nicer, but that is bike shedding.

OK

bq. More bike shedding, when aggregating I would just allocate the map each 
time rather than clear it.

Done. Since we only drain when reporting, a map is now created only during 
reporting.

bq. I think you should sync logging to the debug log and logging info level to 
the regular log. Then in the regular log print a count of how many operations 
timed out since the last time you logged. That way it is easy to map between 
the two when looking at timestamps.

I've added the number of operations and the interval, and made the two messages 
partially identical; is this what you meant by "sync"? 
Bear in mind, however, that the no-spam logger will only log once every 15 
minutes.

bq. I don't think this is a correct average calculation. You want a sum and a 
count. It didn't work for the simple example I did by hand.

Done.

bq. More bike shedding, you can implement min and max as "oldValue = 
Math.min(oldValue, nextMeasurement)".

OK
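A small sketch of the aggregation points above: keep a running sum and count 
for the average rather than averaging averages, and fold min/max with 
Math.min/Math.max. Field names are illustrative.

{code}
public final class LatencyAggregate
{
    private long sum;
    private long count;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    void add(long measurementMillis)
    {
        sum += measurementMillis;
        count++;
        min = Math.min(min, measurementMillis);
        max = Math.max(max, measurementMillis);
    }

    double average()
    {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public static void main(String[] args)
    {
        LatencyAggregate agg = new LatencyAggregate();
        for (long v : new long[] { 5, 7, 12 })
            agg.add(v);
        System.out.printf("min=%d max=%d avg=%.2f%n", agg.min, agg.max, agg.average());
    }
}
{code}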

bq. Can you humor me and for Monitorable boolean checks rename to isXYZ and for 
things that might change it leave as is?

Sure, done.

bq. I think failedAt is unused now?

No, we still need it when adding a timeout to the same failed operation.

bq. If we use approximate time for timeouts can we also use it for setting the 
construction time?

I believe we can; note however that this is existing functionality we would be 
changing, as it is used by the existing logging of all dropped messages.

bq. More bike shedding. The idiom for polling a thread safe queue is to avoid 
calling isEmpty() and instead poll, checking for null, so as to avoid extra 
lock acquisitions on the queue (assuming the queue does that). Some queues do 
have cheap(er) isEmpty() calls.

OK
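A sketch of the queue idiom discussed above: drain with poll() until null 
instead of pairing isEmpty() with poll(), and only drain when it is time to 
report. The Operation type and queue name are illustrative.

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

public final class FailureDrainSketch
{
    static final class Operation
    {
        final String name;
        final long elapsedMillis;
        Operation(String name, long elapsedMillis) { this.name = name; this.elapsedMillis = elapsedMillis; }
    }

    private final ConcurrentLinkedQueue<Operation> failed = new ConcurrentLinkedQueue<>();

    void record(Operation op)
    {
        failed.offer(op);   // offer() returns a boolean rather than throwing if a bounded queue is full
    }

    List<Operation> drainForReport()
    {
        List<Operation> drained = new ArrayList<>();
        for (Operation op; (op = failed.poll()) != null; )   // one poll per element, no isEmpty()
            drained.add(op);
        return drained;
    }

    public static void main(String[] args)
    {
        FailureDrainSketch sketch = new FailureDrainSketch();
        sketch.record(new Operation("read", 5100));
        sketch.record(new Operation("range read", 7400));
        System.out.println("reporting " + sketch.drainForReport().size() + " timed-out operations");
    }
}
{code}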

> Abort in-progress queries that time out
> ---
>
> Key: CASSANDRA-7392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Stefania
>Priority: Critical
> Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node 
> is overloaded) but not queries that time out while being processed.  
> (Particularly common for index queries on data that shouldn't be indexed.)  
> Adding the latter and logging when we have to interrupt one gets us a poor 
> man's "slow query log" for free.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10256) document commitlog segment size's relationship to max write size

2015-09-28 Thread Chris Gerlt (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933898#comment-14933898
 ] 

Chris Gerlt commented on CASSANDRA-10256:
-

I have reviewed the attachment above (CASSANDRA-10256.txt [ 12762495 ]) and see 
no issues with that text.  In other words, it looks great.

(Please note this is my first contribution as a reviewer, so I don't know if I'm 
supposed to do something other than comment!)

> document commitlog segment size's relationship to max write size
> 
>
> Key: CASSANDRA-10256
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10256
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Chris Burroughs
>Priority: Trivial
>  Labels: lhf
> Attachments: CASSANDRA-10256.txt
>
>
> This is in the code: 
> https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/commitlog/CommitLog.java#L57
> But it is not part of the description in cassandra.yaml.
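As I read the linked CommitLog code, the relationship to document is that a 
single write must fit within half of one commitlog segment; the sketch below 
just spells that arithmetic out and should be confirmed against the source 
before it goes into cassandra.yaml.

{code}
public final class MaxMutationSize
{
    // Assumption from the linked code: max mutation size = commitlog segment size / 2.
    public static long maxMutationBytes(int commitlogSegmentSizeInMb)
    {
        return commitlogSegmentSizeInMb * 1024L * 1024L / 2;
    }

    public static void main(String[] args)
    {
        // With the default 32 MB segment, writes larger than 16 MB would be rejected.
        System.out.println(maxMutationBytes(32) + " bytes");
    }
}
{code}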



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10166) Fix failing tests

2015-09-28 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-10166:

Assignee: Sylvain Lebresne

> Fix failing tests
> -
>
> Key: CASSANDRA-10166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10166
> Project: Cassandra
>  Issue Type: Test
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
> Fix For: 3.0.0 rc2
>
>
> Until we find a better way to track these things, this is meant as a master 
> ticket to track open tickets regarding tests that are still failing (unit 
> tests and dtests, though at the time of this writing only dtests are still 
> failing). 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10404) Node to Node encryption transitional mode

2015-09-28 Thread Tom Lewis (JIRA)
Tom Lewis created CASSANDRA-10404:
-

 Summary: Node to Node encryption transitional mode
 Key: CASSANDRA-10404
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10404
 Project: Cassandra
  Issue Type: New Feature
  Components: Core
Reporter: Tom Lewis


Create a transitional mode for encryption that allows both encrypted and 
unencrypted node-to-node traffic during a changeover from unencrypted to 
encrypted. This avoids downtime during the switch.

 This is similar to https://issues.apache.org/jira/browse/CASSANDRA-8803, which 
is intended for client-to-node encryption.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-28 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933634#comment-14933634
 ] 

Joshua McKenzie commented on CASSANDRA-10403:
-

Adding extra configuration files w/options to switch on launch is something I'd 
be comfortable with us adding after GA so long as we leave our default alone. 
For this ticket, let's focus on just determining whether or not we feel 
reverting from G1 to CMS is appropriate for 3.0, and then move forward on a 
separate ticket for adding more intelligence to our GC configuration sourcing 
options.

For the record and my .02, I quite like the idea of us having multiple GC 
profiles out of the box with either logic to switch based on available heap, or 
via command-line for different expected workloads for instance; I think there's 
a lot we could do there to make operators' lives easier.

[~enigmacurry]: Any update on how that 100x test went?

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure

2015-09-28 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-10347:

Reviewer: Carl Yeksigian

> Bulk Loader API could not tolerate even node failure
> 
>
> Key: CASSANDRA-10347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10347
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Shenghua Wan
>Assignee: Paulo Motta
>
> When a user uses CqlBulkOutputFormat, it tries to stream to all the nodes in 
> the token range, including dead nodes, so the stream fails. 
> There was a design in the C* API to allow the stream() method to take a list 
> of hosts to ignore, but it was not utilized.
> The empty-argument stream() method is called in all existing versions of C*, 
> i.e.
> in v2.0.11, 
> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> in v2.1.5, 
> https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> and current trunk branch 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure

2015-09-28 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-10347:
--
Fix Version/s: 3.0.x
   2.2.x
   2.1.x

> Bulk Loader API could not tolerate even node failure
> 
>
> Key: CASSANDRA-10347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10347
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Shenghua Wan
>Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> When a user uses CqlBulkOutputFormat, it tries to stream to all the nodes in 
> the token range, including dead nodes, so the stream fails. 
> There was a design in the C* API to allow the stream() method to take a list 
> of hosts to ignore, but it was not utilized.
> The empty-argument stream() method is called in all existing versions of C*, 
> i.e.
> in v2.0.11, 
> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> in v2.1.5, 
> https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> and current trunk branch 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10406) Nodetool supports to rebuild from specific ranges.

2015-09-28 Thread Dikang Gu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dikang Gu updated CASSANDRA-10406:
--
Attachment: CASSANDRA-10406.patch

Patch is based on 1.2.19.

> Nodetool supports to rebuild from specific ranges.
> --
>
> Key: CASSANDRA-10406
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10406
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
> Fix For: 1.2.x
>
> Attachments: CASSANDRA-10406.patch
>
>
> Add a 'nodetool rebuildrange' command, so that if `nodetool rebuild` fails, we 
> do not need to rebuild all the ranges and can just rebuild the failed ones.
> It should be easy to port to all versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure

2015-09-28 Thread Shenghua Wan (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shenghua Wan updated CASSANDRA-10347:
-
Attachment: AbstractBulkRecordWriter.java

> Bulk Loader API could not tolerate even node failure
> 
>
> Key: CASSANDRA-10347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10347
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Shenghua Wan
>Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
> Attachments: AbstractBulkRecordWriter.java
>
>
> When a user uses CqlBulkOutputFormat, it tries to stream to all the nodes in 
> the token range, including dead nodes, so the stream fails. 
> There was a design in the C* API to allow the stream() method to take a list 
> of hosts to ignore, but it was not utilized.
> The empty-argument stream() method is called in all existing versions of C*, 
> i.e.
> in v2.0.11, 
> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> in v2.1.5, 
> https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> and current trunk branch 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure

2015-09-28 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933818#comment-14933818
 ] 

Paulo Motta commented on CASSANDRA-10347:
-

bq. Isn't using mapreduce.output.bulkoutputformat.maxfailedhosts a better way 
to do this? Does that not work for this use case?

Probably yes, but [~wanshenghua] could tell better: did you try the 
{{mapreduce.output.bulkoutputformat.maxfailedhosts}} property? 

Unfortunately I only discovered that property after implementing the new one, 
my bad. Anyway, I guess the parameters are not mutually exclusive, as you may 
still want to blacklist nodes that are alive. Since it's already implemented 
and to be consistent with the sstable loader, I think it's still valid to have 
an {{ignorehosts}} property in addition to {{maxfailedhosts}}.

> Bulk Loader API could not tolerate even node failure
> 
>
> Key: CASSANDRA-10347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10347
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Shenghua Wan
>Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> When a user uses CqlBulkOutputFormat, it tries to stream to all the nodes in 
> the token range, including dead nodes, so the stream fails. 
> There was a design in the C* API to allow the stream() method to take a list 
> of hosts to ignore, but it was not utilized.
> The empty-argument stream() method is called in all existing versions of C*, 
> i.e.
> in v2.0.11, 
> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> in v2.1.5, 
> https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> and current trunk branch 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure

2015-09-28 Thread Shenghua Wan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933906#comment-14933906
 ] 

Shenghua Wan commented on CASSANDRA-10347:
--

First, thank you for looking into this issue.

[~pauloricardomg] To your question: I have not tried the 
mapreduce.output.bulkoutputformat.maxfailedhosts property. I have read the 
source code and my understanding is that this property only gives up once a 
certain number of host connections have failed. However, I want streaming to 
continue as long as some hosts are still alive, even if that threshold is 
exceeded. Therefore, to solve the problem in my use case (skipping connections 
to lost hosts), I have implemented something like a 
"mapreduce.output.bulkoutputformat.ignorehosts" property.


> Bulk Loader API could not tolerate even node failure
> 
>
> Key: CASSANDRA-10347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10347
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Shenghua Wan
>Assignee: Paulo Motta
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> When a user uses CqlBulkOutputFormat, it tries to stream to all the nodes in 
> the token range, including dead nodes, so the stream fails. 
> There was a design in the C* API to allow the stream() method to take a list 
> of hosts to ignore, but it was not utilized.
> The empty-argument stream() method is called in all existing versions of C*, 
> i.e.
> in v2.0.11, 
> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> in v2.1.5, 
> https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> and current trunk branch 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-10399) Create default Stress tables without compact storage

2015-09-28 Thread T Jake Luciani (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

T Jake Luciani resolved CASSANDRA-10399.

Resolution: Not A Problem

You can just use a yaml file like 
[this|https://gist.github.com/tjake/3186dec175b015d9f5b]

> Create default Stress tables without compact storage 
> -
>
> Key: CASSANDRA-10399
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10399
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sebastian Estevez
>Priority: Minor
>
> ~$ cassandra-stress write
> {code}
> cqlsh> desc TABLE keyspace1.standard1
> CREATE TABLE keyspace1.standard1 (
> key blob PRIMARY KEY,
> "C0" blob,
> "C1" blob,
> "C2" blob,
> "C3" blob,
> "C4" blob
> ) WITH COMPACT STORAGE
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
> AND compression = {}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = 'NONE';
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9840) global_row_key_cache_test.py fails; loses mutations on cluster restart

2015-09-28 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-9840:
---
Assignee: Ariel Weisberg

> global_row_key_cache_test.py fails; loses mutations on cluster restart
> --
>
> Key: CASSANDRA-9840
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9840
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Shawn Kumar
>Assignee: Ariel Weisberg
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
> Attachments: node1.log, node2.log, node3.log, noseout.txt
>
>
> This test is currently failing on trunk. I've attached the test output and 
> logs. It seems that the failure of the test doesn't necessarily have anything 
> to do with global row/key caches - as on the initial loop of the test 
> [neither are 
> used|https://github.com/riptano/cassandra-dtest/blob/master/global_row_key_cache_test.py#L15]
>  and we still hit failure. The test itself fails when a second validation of 
> values after a cluster restart fails to capture deletes issued prior to the 
> restart and first successful validation. However, if I add flushes prior to 
> restarting the cluster the test completes successfully, implying an issue 
> with loss of in-memory mutations due to the cluster restart. Initially I had 
> thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the 
> fact that this test has been succeeding consistently on both 2.1 and 2.2 
> branch indicates there may be another issue at hand.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-28 Thread Joshua McKenzie (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933598#comment-14933598
 ] 

Joshua McKenzie commented on CASSANDRA-10403:
-

To me, the long-term solution of C* having the intelligence to select G1 for 
heaps over X, CMS for heaps under X makes a lot of sense, assuming test data 
shows that to be the appropriate solution.

I'd argue that what we need to do here is figure out what the sanest 
recommendation is for a default GC on 3.0, get that setup in our launch scripts 
(if necessary), and probably include the alternate set of GC configurations in 
our launch files, commented out, so people can easily swap back and forth based 
on their needs.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10405) MV updates should optionally wait for acknowledgement from view replicas

2015-09-28 Thread Carl Yeksigian (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Yeksigian updated CASSANDRA-10405:
---
Issue Type: Improvement  (was: Bug)

> MV updates should optionally wait for acknowledgement from view replicas
> 
>
> Key: CASSANDRA-10405
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10405
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Carl Yeksigian
>  Labels: materializedviews
> Fix For: 3.x
>
>
> MV updates are currently completely asynchronous in order to provide 
> parallelism of updates trying to acquire the partition lock. For some use 
> cases, leaving the MV updates asynchronous is exactly what's needed.
> However, there are some use cases where knowing that the update has either 
> succeeded or failed on the view is necessary, especially when trying to allow 
> read-your-write behavior. In those cases, we would follow the same code path 
> as asynchronous writes, but at the end wait on the acknowledgements from the 
> view replicas before acknowledging our write. This option should be for each 
> MV separately, since MVs which need the synchronous properties might be mixed 
> with MVs which do not need this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-5780) nodetool status and ring report incorrect/stale information after decommission

2015-09-28 Thread John Sumsion (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933745#comment-14933745
 ] 

John Sumsion commented on CASSANDRA-5780:
-

The only thing I wouldn't want to have happen is to accidentally issue some 
kind of truncate that in a race condition inadvertently gets replicated to the 
entire cluster.  I don't know the cassandra codebase enough to understand 
whether that risk exists when calling {{ColumnFamilyStore.truncateBlocking()}}. 
 From what I can tell, I think it's likely pretty safe because once you get 
down to StorageService, there is no cross-cluster effect of actions taken at 
that level.

Can anyone reply who knows better what cross-cluster effects 
{{truncateBlocking()}} might have?

The reason I don't have that concern with the 'system' keyspace is that it is 
never replicated.

Actually, looking into {{ColumnFamilyStore.truncateBlocking()}} makes me think 
that my proposed changes will blow up half-way through, because a side effect of 
truncating a table is writing back a "truncated at" record to the 'system.local' 
table (which we just truncated).  I guess I need to run ccm with a locally built 
cassandra and try decommissioning to see what happens (not sure how to do that).

> nodetool status and ring report incorrect/stale information after decommission
> --
>
> Key: CASSANDRA-5780
> URL: https://issues.apache.org/jira/browse/CASSANDRA-5780
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
>Reporter: Peter Haggerty
>Priority: Trivial
>  Labels: lhf, ponies, qa-resolved
> Fix For: 2.1.x
>
>
> Cassandra 1.2.6 ring of 12 instances, each with 256 tokens.
> Decommission 3 of the 12 nodes, one after another, resulting in a 9 instance ring.
> The 9 instances of cassandra that are in the ring all correctly report 
> nodetool status information for the ring and have the same data.
> After the first node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> After the second node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> "nodetool status" on "decommissioned-2nd" reports 10 nodes
> After the third node is decommissioned:
> "nodetool status" on "decommissioned-1st" reports 11 nodes
> "nodetool status" on "decommissioned-2nd" reports 10 nodes
> "nodetool status" on "decommissioned-3rd" reports 9 nodes
> The storage load information is similarly stale on the various decommissioned 
> nodes. The nodetool status and ring commands continue to return information 
> as if they were part of a cluster and they appear to return the last 
> information that they saw.
> In contrast the nodetool info command fails with an exception, which isn't 
> ideal but at least indicates that there was a failure rather than returning 
> stale information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node

2015-09-28 Thread Joel Knighton (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933782#comment-14933782
 ] 

Joel Knighton commented on CASSANDRA-10231:
---

I'm continuing to follow up, but it doesn't look like this patch fixes the 
issue.

I'll try to reproduce this with a dtest again.

> Null status entries on nodes that crash during decommission of a different 
> node
> ---
>
> Key: CASSANDRA-10231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Stefania
> Fix For: 3.0.0 rc2
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during 
> the decommission of a different node, it may start with a null entry for the 
> decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon 
> a restart of the affected node.
> This issue is further detailed in ticket 
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10399) Create default Stress tables without compact storage

2015-09-28 Thread T Jake Luciani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933538#comment-14933538
 ] 

T Jake Luciani edited comment on CASSANDRA-10399 at 9/28/15 4:37 PM:
-

You can just use a yaml file like 
[this|https://gist.github.com/tjake/3186dec175b015d9f5b9]


was (Author: tjake):
You can just use a yaml file like 
[this|https://gist.github.com/tjake/3186dec175b015d9f5b]

> Create default Stress tables without compact storage 
> -
>
> Key: CASSANDRA-10399
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10399
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sebastian Estevez
>Priority: Minor
>
> ~$ cassandra-stress write
> {code}
> cqlsh> desc TABLE keyspace1.standard1
> CREATE TABLE keyspace1.standard1 (
> key blob PRIMARY KEY,
> "C0" blob,
> "C1" blob,
> "C2" blob,
> "C3" blob,
> "C4" blob
> ) WITH COMPACT STORAGE
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
> AND compression = {}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = 'NONE';
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10298) Replaced dead node stayed in gossip forever

2015-09-28 Thread Dikang Gu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933609#comment-14933609
 ] 

Dikang Gu commented on CASSANDRA-10298:
---

[~Stefania], yeah, it looks like the same issue. Have you committed your patches 
to the 2.1 branch?

> Replaced dead node stayed in gossip forever
> ---
>
> Key: CASSANDRA-10298
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10298
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dikang Gu
>Assignee: Dikang Gu
> Fix For: 2.1.x
>
> Attachments: CASSANDRA-10298.patch
>
>
> The dead node stayed in the nodetool status:
> DN  10.210.165.55  379.76 GB  256  ?  null
> And in the log, it throws NPE when trying to remove it.
> {code}
> 2015-09-10_06:41:22.92453 ERROR 06:41:22 Exception in thread 
> Thread[GossipStage:1,5,main]
> 2015-09-10_06:41:22.92454 java.lang.NullPointerException: null
> 2015-09-10_06:41:22.92455   at 
> org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:100) 
> 2015-09-10_06:41:22.92455   at 
> org.apache.cassandra.db.HintedHandOffManager.deleteHintsForEndpoint(HintedHandOffManager.java:201)
>  
> 2015-09-10_06:41:22.92455   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:1886) 
> 2015-09-10_06:41:22.92455   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:1902) 
> 2015-09-10_06:41:22.92456   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1805)
> 2015-09-10_06:41:22.92457   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1473)
>  
> 2015-09-10_06:41:22.92457   at 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2099) 
> 2015-09-10_06:41:22.92457   at 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009) 
> 2015-09-10_06:41:22.92458   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1085) 
> 2015-09-10_06:41:22.92458   at 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  
> 2015-09-10_06:41:22.92458   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
> 2015-09-10_06:41:22.92459   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_45]
> 2015-09-10_06:41:22.92460   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_45]
> {code}
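For illustration, a minimal null-guard of the kind the stack trace above 
suggests: skip hint deletion when the endpoint no longer has a host ID instead 
of passing null into UUIDGen.decompose(). Names are illustrative; the actual 
fix is in the attached patch.

{code}
import java.util.UUID;

public final class HintDeletionGuard
{
    static void deleteHintsForEndpoint(String endpoint, UUID hostId)
    {
        if (hostId == null)
        {
            System.out.println("No host id for " + endpoint + ", nothing to delete");
            return;   // avoids the NullPointerException from decomposing a null UUID
        }
        System.out.println("Deleting hints for host id " + hostId);
    }

    public static void main(String[] args)
    {
        deleteHintsForEndpoint("10.210.165.55", null);            // the dead node with a null entry
        deleteHintsForEndpoint("10.210.165.56", UUID.randomUUID());
    }
}
{code}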



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-28 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933599#comment-14933599
 ] 

Paulo Motta commented on CASSANDRA-10403:
-

+1. As an operator I had some issues with an 8GB heap and G1GC. We should 
probably make it easy to switch by extracting gc properties to a variable, and 
provide a commented-out option with pre-filled G1 settings, and maybe mention 
something in the documentation too.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10399) Create default Stress tables without compact storage

2015-09-28 Thread Sebastian Estevez (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933663#comment-14933663
 ] 

Sebastian Estevez edited comment on CASSANDRA-10399 at 9/28/15 5:53 PM:


Yes, I can also manually create the table without COMPACT STORAGE. My point was 
we shouldn't use compact storage by default.


was (Author: sebastian.este...@datastax.com):
Yes, I can also manually create the table without COMPACT STORAGE. I guess my 
point was we shouldn't use compact storage by default.

> Create default Stress tables without compact storage 
> -
>
> Key: CASSANDRA-10399
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10399
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sebastian Estevez
>Priority: Minor
>
> ~$ cassandra-stress write
> {code}
> cqlsh> desc TABLE keyspace1.standard1
> CREATE TABLE keyspace1.standard1 (
> key blob PRIMARY KEY,
> "C0" blob,
> "C1" blob,
> "C2" blob,
> "C3" blob,
> "C4" blob
> ) WITH COMPACT STORAGE
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
> AND compression = {}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = 'NONE';
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned

2015-09-28 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-10068:
--
Description: 
This issue is reproducible through a Jepsen test of materialized views that 
crashes and decommissions nodes throughout the test.

At the conclusion of the test, a batchlog replay is initiated through nodetool 
and hits the following assertion due to a missing host ID: 
https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197

A nodetool status on the node with failed batchlog replay shows the following 
entry for the decommissioned node:
DN  10.0.0.5  ?  256  ?   null  
rack1

On the unaffected nodes, there is no entry for the decommissioned node as 
expected.

There are occasional hits of the same assertions for logs in other nodes; it 
looks like the issue might occasionally resolve itself, but one node seems to 
have the errant null entry indefinitely.

In logs for the nodes, this possibly unrelated exception also appears:
java.lang.RuntimeException: Trying to get the view natural endpoint on a 
non-data replica
at 
org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
 ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]

I have a running cluster with the issue on my machine; it is also repeatable.

Nothing stands out in the logs of the decommissioned node (n4) for me. The logs 
of each node in the cluster are attached.



  was:
This issue is reproducible through a Jepsen test of materialized views that 
crashes and decommissions nodes throughout the test.

At the conclusion of the test, a batchlog replay is initiated through nodetool 
and hits the following assertion due to a missing host ID: 
https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197

A nodetool status on the node with failed batchlog replay shows the following 
entry for the decommissioned node:
DN  10.0.0.5  ?  256  ?   null  
rack1

On the unaffected nodes, there is no entry for the decommissioned node as 
expected.

There are occasional hits of the same assertions for logs in other nodes; it 
looks like the issue might occasionally resolve itself, but one node seems to 
have the errant null entry indefinitely.

In logs for the nodes, this possibly unrelated exception also appears:
java.lang.RuntimeException: Trying to get the view natural endpoint on a 
non-data replica
at 
org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
 ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]

I have a running cluster with the issue on my machine; it is also repeatable.

Nothing stands out in the logs of the decommissioned node (n4) for me. The logs 
of each node in the cluster are attached.




> Batchlog replay fails with exception after a node is decommissioned
> ---
>
> Key: CASSANDRA-10068
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10068
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Branimir Lambov
> Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> At the conclusion of the test, a batchlog replay is initiated through 
> nodetool and hits the following assertion due to a missing host ID: 
> https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197
> A nodetool status on the node with failed batchlog replay shows the following 
> entry for the decommissioned node:
> DN  10.0.0.5  ?  256  ?   null
>   rack1
> On the unaffected nodes, there is no entry for the decommissioned node as 
> expected.
> There are occasional hits of the same assertions for logs in other nodes; it 
> looks like the issue might occasionally resolve itself, but one node seems to 
> have the errant null entry indefinitely.
> In logs for the nodes, this possibly unrelated exception also appears:
> java.lang.RuntimeException: Trying to get the view natural endpoint on a 
> non-data replica
>   at 
> org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
>  ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]
> I have a running cluster with the issue on my machine; it is also repeatable.
> Nothing stands out in the logs of the decommissioned node (n4) for me. The logs 
> of each node in the cluster are attached.

[jira] [Updated] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned

2015-09-28 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-10068:
--
Description: 
This issue is reproducible through a Jepsen test of materialized views that 
crashes and decommissions nodes throughout the test.

At the conclusion of the test, a batchlog replay is initiated through nodetool 
and hits the following assertion due to a missing host ID: 
https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197

A nodetool status on the node with failed batchlog replay shows the following 
entry for the decommissioned node:
DN  10.0.0.5  ?  256  ?   null  
rack1

On the unaffected nodes, there is no entry for the decommissioned node as 
expected.

There are occasional hits of the same assertions for logs in other nodes; it 
looks like the issue might occasionally resolve itself, but one node seems to 
have the errant null entry indefinitely.

In logs for the nodes, this possibly unrelated exception also appears:
java.lang.RuntimeException: Trying to get the view natural endpoint on a 
non-data replica
at 
org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
 ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]

I have a running cluster with the issue on my machine; it is also repeatable.

Nothing stands out in the logs of the decommissioned node (n4) for me. The logs 
of each node in the cluster are attached.



  was:
This issue is reproducible through a Jepsen test of materialized views that 
crashes and decommissions nodes throughout the test.

At the conclusion of the test, a batchlog replay is initiated through nodetool 
and hits the following assertion due to a missing host ID: 
https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197

A nodetool status on the node with failed batchlog replay shows the following 
entry for the decommissioned node:
DN  10.0.0.5  ?  256  ?   null  
rack1

On the unaffected nodes, there is no entry for the decommissioned node as 
expected.

There are occasional hits of the same assertions for logs in other nodes; it 
looks like the issue might occasionally resolve itself, but one node seems to 
have the errant null entry indefinitely.

In logs for the nodes, this possibly unrelated exception also appears:
java.lang.RuntimeException: Trying to get the view natural endpoint on a 
non-data replica
at 
org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
 ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]

I have a running cluster with the issue on my machine; it is also repeatable.

Nothing stands out in the logs of the decommissioned node (n4) for me. The logs 
of each node in the cluster are attached.




> Batchlog replay fails with exception after a node is decommissioned
> ---
>
> Key: CASSANDRA-10068
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10068
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Branimir Lambov
> Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> At the conclusion of the test, a batchlog replay is initiated through 
> nodetool and hits the following assertion due to a missing host ID: 
> https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197
> A nodetool status on the node with failed batchlog replay shows the following 
> entry for the decommissioned node:
> DN  10.0.0.5  ?  256  ?   null
>   rack1
> On the unaffected nodes, there is no entry for the decommissioned node as 
> expected.
> There are occasional hits of the same assertions for logs in other nodes; it 
> looks like the issue might occasionally resolve itself, but one node seems to 
> have the errant null entry indefinitely.
> In logs for the nodes, this possibly unrelated exception also appears:
> java.lang.RuntimeException: Trying to get the view natural endpoint on a 
> non-data replica
>   at 
> org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
>  ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]
> I have a running cluster with the issue on my machine; it is also repeatable.
> Nothing stands out in the logs of the decommissioned node (n4) for me. The logs 
> of each node in the cluster are attached.

[jira] [Resolved] (CASSANDRA-9922) Add Materialized View WHERE schema support

2015-09-28 Thread Carl Yeksigian (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Carl Yeksigian resolved CASSANDRA-9922.
---
   Resolution: Duplicate
Fix Version/s: (was: 3.x)

This was completed as part of CASSANDRA-9664.

> Add Materialized View WHERE schema support
> --
>
> Key: CASSANDRA-9922
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9922
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Carl Yeksigian
>  Labels: materializedviews
>
> In order to provide forward compatibility with the 3.x series, we should add 
> schema support for capturing the where clause of the MV.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?

2015-09-28 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta reassigned CASSANDRA-9806:
--

Assignee: Paulo Motta

> some TTL test are failing on trunk: losing data after restart? 
> ---
>
> Key: CASSANDRA-9806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9806
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alan Boudreault
>Assignee: Paulo Motta
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test and 
> ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are failing:
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/
> After some debugging, I noticed a strange behaviour. It looks like some data 
> disappears after a node restart, even if the row has no TTL set. Here is a test 
> example where I see the issue with latest trunk:
> https://gist.github.com/aboudreault/94cb552750a186ca853d



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10323) Add more MaterializedView metrics

2015-09-28 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-10323:

Assignee: Chris Lohfink

> Add more MaterializedView metrics
> -
>
> Key: CASSANDRA-10323
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10323
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: Chris Lohfink
>  Labels: lhf
> Fix For: 3.0.0 rc2
>
> Attachments: trunk-10323.txt
>
>
> We need to add more metrics to help understand where time is spent in 
> materialized view writes. We currently track the ratio of async base -> view 
> mutations that fail.
> We should also add
>   * The amount of time spent waiting for the partition lock (contention)
>   * The amount of time spent reading data 
> Any others? 
> [~carlyeks] [~jkni] 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-28 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933621#comment-14933621
 ] 

Jonathan Shook commented on CASSANDRA-10403:


I would be entirely in favor of having a separate settings file that can simply 
be sourced in. Having several related GC options sprinkled through the -env 
file is bothersome. This should apply as well to the CMS settings. Perhaps it 
should even be a soft setting, as long as the possible values are marshaled 
against any injection.

> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10405) MV updates should optionally wait for acknowledgement from view replicas

2015-09-28 Thread Carl Yeksigian (JIRA)
Carl Yeksigian created CASSANDRA-10405:
--

 Summary: MV updates should optionally wait for acknowledgement 
from view replicas
 Key: CASSANDRA-10405
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10405
 Project: Cassandra
  Issue Type: Bug
Reporter: Carl Yeksigian
 Fix For: 3.x


MV updates are currently completely asynchronous in order to provide 
parallelism of updates trying to acquire the partition lock. For some use 
cases, leaving the MV updates asynchronous is exactly what's needed.

However, there are some use cases where knowing that the update has either 
succeeded or failed on the view is necessary, especially when trying to allow 
read-your-write behavior. In those cases, we would follow the same code path as 
asynchronous writes, but at the end wait on the acknowledgements from the view 
replicas before acknowledging our write. This option should be settable for each 
MV separately, since MVs which need the synchronous properties might be mixed with 
MVs which do not need this.
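
To make the synchronous option concrete, a minimal sketch of the write path it 
implies, assuming the async path already hands back one future per view replica 
write (all names below are hypothetical, not the actual patch):

{code}
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: the per-MV "synchronous" option would reuse today's async path and
// simply wait on the view replica acknowledgements before acking the base write.
final class ViewWriteSketch
{
    static void writeBase(Runnable applyBaseWrite,
                          List<CompletableFuture<Void>> viewAcks,
                          boolean synchronousViews,
                          long timeoutMillis)
        throws TimeoutException, InterruptedException, ExecutionException
    {
        applyBaseWrite.run();            // same code path as the asynchronous case

        if (!synchronousViews)
            return;                      // current behaviour: fire and forget

        // New behaviour: acknowledge the client only after every view replica acks.
        CompletableFuture.allOf(viewAcks.toArray(new CompletableFuture[0]))
                         .get(timeoutMillis, TimeUnit.MILLISECONDS);
    }
}
{code}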



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure

2015-09-28 Thread Jeremiah Jordan (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933716#comment-14933716
 ] 

Jeremiah Jordan commented on CASSANDRA-10347:
-

Isn't using mapreduce.output.bulkoutputformat.maxfailedhosts a better way to do 
this?  Does that not work for this use case?
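
For reference, a minimal sketch of setting that property on the Hadoop job 
configuration; the property name comes from the comment above, and whether 
CqlBulkOutputFormat also exposes a typed setter is not verified here:

{code}
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

// Sketch: tolerate a couple of unreachable replicas during the bulk-load streaming
// phase by setting the property named above on the job configuration.
public class BulkLoadJobConfig
{
    public static Job newBulkLoadJob() throws IOException
    {
        Configuration conf = new Configuration();
        conf.set("mapreduce.output.bulkoutputformat.maxfailedhosts", "2");
        return Job.getInstance(conf, "cql-bulk-load");
    }
}
{code}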

> Bulk Loader API could not tolerate even node failure
> 
>
> Key: CASSANDRA-10347
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10347
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Shenghua Wan
>Assignee: Paulo Motta
>
> When user uses CqlBulkOutputFormat, it tries to stream to all the nodes in 
> the token range, which includes the dead nodes. Therefore, the stream failed. 
> There was a design in C* API to allow stream() method to have a list of 
> ignore hosts, but it was not utilized.
> The empty-argument stream() method is called in all existing versions of C*, 
> i.e.
> in v2.0.11, 
> https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> in v2.1.5, 
> https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122
> and current trunk branch 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned

2015-09-28 Thread Joel Knighton (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Knighton updated CASSANDRA-10068:
--
Description: 
This issue is reproducible through a Jepsen test of materialized views that 
crashes and decommissions nodes throughout the test.

At the conclusion of the test, a batchlog replay is initiated through nodetool 
and hits the following assertion due to a missing host ID: 
https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197

A nodetool status on the node with failed batchlog replay shows the following 
entry for the decommissioned node:
DN  10.0.0.5  ?  256  ?   null  
rack1

On the unaffected nodes, there is no entry for the decommissioned node as 
expected.

There are occasional hits of the same assertions for logs in other nodes; it 
looks like the issue might occasionally resolve itself, but one node seems to 
have the errant null entry indefinitely.

In logs for the nodes, this possibly unrelated exception also appears:
java.lang.RuntimeException: Trying to get the view natural endpoint on a 
non-data replica
at 
org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
 ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]

I have a running cluster with the issue on my machine; it is also repeatable.

Nothing stands out in the logs of the decommissioned node (n4) for me. The logs 
of each node in the cluster are attached.



  was:
This issue is reproducible through a Jepsen test of materialized views that 
crashes and decommissions nodes throughout the test.

At the conclusion of the test, a batchlog replay is initiated through nodetool 
and hits the following assertion due to a missing host ID: 
https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197

A nodetool status on the node with failed batchlog replay shows the following 
entry for the decommissioned node:
DN  10.0.0.5  ?  256  ?   null  
rack1

On the unaffected nodes, there is no entry for the decommissioned node as 
expected.

There are occasional hits of the same assertions for logs in other nodes; it 
looks like the issue might occasionally resolve itself, but one node seems to 
have the errant null entry indefinitely.

In logs for the nodes, this possibly unrelated exception also appears:
java.lang.RuntimeException: Trying to get the view natural endpoint on a 
non-data replica
at 
org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
 ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]

I have a running cluster with the issue on my machine; it is also repeatable.

Nothing stands out in the logs of the decommissioned node (n4) for me. The logs 
of each node in the cluster are attached.




> Batchlog replay fails with exception after a node is decommissioned
> ---
>
> Key: CASSANDRA-10068
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10068
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Branimir Lambov
> Attachments: n1.log, n2.log, n3.log, n4.log, n5.log
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> At the conclusion of the test, a batchlog replay is initiated through 
> nodetool and hits the following assertion due to a missing host ID: 
> https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197
> A nodetool status on the node with failed batchlog replay shows the following 
> entry for the decommissioned node:
> DN  10.0.0.5  ?  256  ?   null
>   rack1
> On the unaffected nodes, there is no entry for the decommissioned node as 
> expected.
> There are occasional hits of the same assertions for logs in other nodes; it 
> looks like the issue might occasionally resolve itself, but one node seems to 
> have the errant null entry indefinitely.
> In logs for the nodes, this possibly unrelated exception also appears:
> java.lang.RuntimeException: Trying to get the view natural endpoint on a 
> non-data replica
>   at 
> org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91)
>  ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT]
> I have a running cluster with the issue on my machine; it is also repeatable.

[jira] [Commented] (CASSANDRA-10399) Create default Stress tables without compact storage

2015-09-28 Thread Sebastian Estevez (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933663#comment-14933663
 ] 

Sebastian Estevez commented on CASSANDRA-10399:
---

Yes, I can also manually create the table without COMPACT STORAGE. I guess my 
point was we shouldn't use compact storage by default.

> Create default Stress tables without compact storage 
> -
>
> Key: CASSANDRA-10399
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10399
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Sebastian Estevez
>Priority: Minor
>
> ~$ cassandra-stress write
> {code}
> cqlsh> desc TABLE keyspace1.standard1
> CREATE TABLE keyspace1.standard1 (
> key blob PRIMARY KEY,
> "C0" blob,
> "C1" blob,
> "C2" blob,
> "C3" blob,
> "C4" blob
> ) WITH COMPACT STORAGE
> AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
> AND comment = ''
> AND compaction = {'class': 
> 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
> AND compression = {}
> AND dclocal_read_repair_chance = 0.1
> AND default_time_to_live = 0
> AND gc_grace_seconds = 864000
> AND max_index_interval = 2048
> AND memtable_flush_period_in_ms = 0
> AND min_index_interval = 128
> AND read_repair_chance = 0.0
> AND speculative_retry = 'NONE';
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out

2015-09-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933983#comment-14933983
 ] 

Ariel Weisberg commented on CASSANDRA-7392:
---

* Use a dedicated thread to update the timestamp so it isn't impacted by other 
activities
* I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, so 
make it an SES and schedule the work there. However I'm not even sure why that 
activity deserved its own thread. I think there was nothing available in some 
version of C*, but now it could just use ScheduledExecutors. So maybe just a 
dedicated thread for updating ApproximateTime. I believe approximate time will 
find more traction over time so it should be reasonably accurate when possible.
* I think the timestamp field in ApproximateTime needs to be volatile.
* Several properties don't have the "cassandra." prefix
* By polling the queue when not reporting you are increasing the bound on the 
number of retained failures and resources pinned by this reporting since 
aggregation doesn't really aggregate yet. I would just drain the queue when 
logging. 
* I think you want a count of operations that were truncated instead of a 
boolean so you can log the count.
* [Offering into the queue returns a boolean and doesn't throw, which style 
wise seems a little nicer, but that is bike 
shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126]
* More bike shedding, when aggregating I would just allocate the map each time 
rather than clear it.
* I think you should sync logging to the debug log and logging info level to 
the regular log. Then in the regular log print a count of how many operations 
timed out since the last time you logged. That way it is easy to map between 
the two when looking at timestamps.
* [I don't think this is a correct average calculation. You want a sum and a 
count. It didn't work for the simple example I did by 
hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257] 
(see the sketch after this list)
* [More bike shedding, you can implement min and max as "oldValue = 
Math.min(oldValue, 
nextMeasurement)".|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R259]
* [Can you humor me and for Monitorable boolean checks rename to isXYZ and for 
things that might change it leave as 
is?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2578da7d6bbdd276157604856543cbecR53]
* [I think failedAt is unused 
now?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R223]
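
To make the sum-and-count average and the Math.min/Math.max points above concrete, 
a small self-contained sketch (illustrative names only, not the patch itself):

{code}
// Sketch of the aggregation suggested above: keep a sum and a count for the average,
// and fold min/max with Math.min/Math.max.
final class LatencyAggregate
{
    private long sum;
    private long count;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    void add(long measurementMillis)
    {
        sum += measurementMillis;
        count++;
        min = Math.min(min, measurementMillis);
        max = Math.max(max, measurementMillis);
    }

    double average()
    {
        return count == 0 ? 0.0 : (double) sum / count;
    }
}

// And the point about the cached timestamp: the field written by the dedicated
// updater thread must be volatile so reader threads see fresh values.
final class ApproximateTimeSketch
{
    private static volatile long now = System.currentTimeMillis();

    static void tick() { now = System.currentTimeMillis(); }   // run on a dedicated thread

    static long currentTimeMillis() { return now; }
}
{code}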

> Abort in-progress queries that time out
> ---
>
> Key: CASSANDRA-7392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Stefania
>Priority: Critical
> Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node 
> is overloaded) but not queries that time out while being processed.  
> (Particularly common for index queries on data that shouldn't be indexed.)  
> Adding the latter and logging when we have to interrupt one gets us a poor 
> man's "slow query log" for free.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10323) Add more MaterializedView metrics

2015-09-28 Thread Chris Lohfink (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Lohfink updated CASSANDRA-10323:
--
Attachment: trunk-10323-v2.txt

> Add more MaterializedView metrics
> -
>
> Key: CASSANDRA-10323
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10323
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: T Jake Luciani
>Assignee: Chris Lohfink
>  Labels: lhf
> Fix For: 3.0.0 rc2
>
> Attachments: trunk-10323-v2.txt, trunk-10323.txt
>
>
> We need to add more metrics to help understand where time is spent in 
> materialized view writes. We currently track the ratio of async base -> view 
> mutations that fail.
> We should also add
>   * The amount of time spent waiting for the partition lock (contention)
>   * The amount of time spent reading data 
> Any others? 
> [~carlyeks] [~jkni] 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out

2015-09-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934099#comment-14934099
 ] 

Ariel Weisberg commented on CASSANDRA-7392:
---

Sorry noticed one more thing. Not editing because it drives observers crazy.
* [More bike shedding. The idiom for polling a thread-safe queue is to avoid 
calling isEmpty() and instead poll, checking for null, to avoid extra lock 
acquisitions (assuming the queue does that) on the 
queue.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R151].
 Some queues do have cheap(er) isEmpty() calls.
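
A tiny generic illustration of that idiom (not the actual patch):

{code}
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the idiom described above: poll and check for null rather than pairing
// isEmpty() with poll(), which can cost an extra traversal or lock acquisition.
final class DrainExample
{
    static void drain(ConcurrentLinkedQueue<Runnable> queue)
    {
        for (Runnable task; (task = queue.poll()) != null; )
            task.run();
    }
}
{code}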

> Abort in-progress queries that time out
> ---
>
> Key: CASSANDRA-7392
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7392
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Core
>Reporter: Jonathan Ellis
>Assignee: Stefania
>Priority: Critical
> Fix For: 3.x
>
>
> Currently we drop queries that time out before we get to them (because node 
> is overloaded) but not queries that time out while being processed.  
> (Particularly common for index queries on data that shouldn't be indexed.)  
> Adding the latter and logging when we have to interrupt one gets us a poor 
> man's "slow query log" for free.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7392) Abort in-progress queries that time out

2015-09-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933983#comment-14933983
 ] 

Ariel Weisberg edited comment on CASSANDRA-7392 at 9/28/15 9:02 PM:


* Use a dedicated thread to update the timestamp so it isn't impacted by other 
activities
* I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, 
so make it an SES and schedule the work there. However I'm not even sure why 
that activity deserved its own thread. I think there was nothing available in 
some version of C*, but now it could just use ScheduledExecutors. So maybe just 
a dedicated thread for updating ApproximateTime. I believe approximate time 
will find more traction over time so it should be reasonably accurate when 
possible.
* I think the timestamp field in ApproximateTime needs to be volatile.
* Several properties don't have the "cassandra." prefix
* By polling the queue when not reporting you are increasing the bound on the 
number of retained failures and resources pinned by this reporting since 
aggregation doesn't really aggregate yet. I would just drain the queue when 
logging. 
* I think you want a count of operations that were truncated instead of a 
boolean so you can log the count.
* [Offering into the queue returns a boolean and doesn't throw, which style 
wise seems a little nicer, but that is bike 
shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126]
* More bike shedding, when aggregating I would just allocate the map each time 
rather than clear it.
* I think you should sync logging to the debug log and logging info level to 
the regular log. Then in the regular log print a count of how many operations 
timed out since the last time you logged. That way it is easy to map between 
the two when looking at timestamps.
* [I don't think this is a correct average calculation. You want a sum and a 
count. It didn't work for the simple example I did by 
hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257]
* [More bike shedding, you can implement min and max as "oldValue = 
Math.min(oldValue, 
nextMeasurement)".|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R259]
* [Can you humor me and for Monitorable boolean checks rename to isXYZ and for 
things that might change it leave as 
is?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2578da7d6bbdd276157604856543cbecR53]
* [I think failedAt is unused 
now?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R223]


was (Author: aweisberg):
* Use a dedicated thread to update the timestamp so it isn't impacted by other 
activities
* I was going to suggest use the thread used by NanoTimeToCurrentTimeMillis, so 
make it an SES and schedule the work there. However I'm not even sure why that 
activity deserved it's own thread. I think there was nothing available in some 
version of C*, but now it could just use ScheduledExecutors. So maybe just a 
dedicated thread for updating ApproximateTime. I believe approximate time will 
find more traction over time so it should be reasonably accurate when possible.
* I think the timestamp field in ApproximateTime needs to be volatile.
* Several properties don't have the "cassandra." prefix
* By polling the queue when not reporting you are increasing the bound on the 
number of retained failures and resources pinned by this reporting since 
aggregation doesn't really aggregate yet. I would just drain the queue when 
logging. 
* I think you want a count of operations that were truncated instead of a 
boolean so you can log the count.
* [Offering into the queue returns a boolean and doesn't throw, which style 
wise seems a little nicer, but that is bike 
shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126]
* More bike shedding, when aggregating I would just allocate the map each time 
rather than clear it.
* I think you should sync logging to the debug log and logging info level to 
the regular log. Then in the regular log print a count of how many operations 
timed out since the last time you logged. That way it is easy to map between 
the two when looking at timestamps.
* [I don't think this is a correct average calculation. You want a sum and a 
count. I didn't work for the simple example I did by 
hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257]
* [More bike shedding, you can implement min and max as "oldValue = 
Math.min(oldValue, 

[jira] [Comment Edited] (CASSANDRA-7392) Abort in-progress queries that time out

2015-09-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933983#comment-14933983
 ] 

Ariel Weisberg edited comment on CASSANDRA-7392 at 9/28/15 9:11 PM:


* Use a dedicated thread to update the timestamp so it isn't impacted by other 
activities
* I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, 
so make it an SES and schedule the work there. However I'm not even sure why 
that activity deserved its own thread. I think there was nothing available in 
some version of C*, but now it could just use ScheduledExecutors. So maybe just 
a dedicated thread for updating ApproximateTime. I believe approximate time 
will find more traction over time so it should be reasonably accurate when 
possible.
* I think the timestamp field in ApproximateTime needs to be volatile.
* Several properties don't have the "cassandra." prefix
* By polling the queue when not reporting you are increasing the bound on the 
number of retained failures and resources pinned by this reporting since 
aggregation doesn't really aggregate yet. I would just drain the queue when 
logging. 
* I think you want a count of operations that were truncated instead of a 
boolean so you can log the count.
* [Offering into the queue returns a boolean and doesn't throw, which style 
wise seems a little nicer, but that is bike 
shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126]
* More bike shedding, when aggregating I would just allocate the map each time 
rather than clear it.
* I think you should sync logging to the debug log and logging info level to 
the regular log. Then in the regular log print a count of how many operations 
timed out since the last time you logged. That way it is easy to map between 
the two when looking at timestamps.
* [I don't think this is a correct average calculation. You want a sum and a 
count. It didn't work for the simple example I did by 
hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257]
* [More bike shedding, you can implement min and max as "oldValue = 
Math.min(oldValue, 
nextMeasurement)".|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R259]
* [Can you humor me and for Monitorable boolean checks rename to isXYZ and for 
things that might change it leave as 
is?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2578da7d6bbdd276157604856543cbecR53]
* [I think failedAt is unused 
now?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R223]
* [If we use approximate time for timeouts can we also use it for setting the 
construction 
time?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2603bbfead4cdd58e1e08b225338bda0R28]


was (Author: aweisberg):
* Use a dedicated thread to update the timestamp so it isn't impacted by other 
activities
* I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, 
so make it an SES and schedule the work there. However I'm not even sure why 
that activity deserved it's own thread. I think there was nothing available in 
some version of C*, but now it could just use ScheduledExecutors. So maybe just 
a dedicated thread for updating ApproximateTime. I believe approximate time 
will find more traction over time so it should be reasonably accurate when 
possible.
* I think the timestamp field in ApproximateTime needs to be volatile.
* Several properties don't have the "cassandra." prefix
* By polling the queue when not reporting you are increasing the bound on the 
number of retained failures and resources pinned by this reporting since 
aggregation doesn't really aggregate yet. I would just drain the queue when 
logging. 
* I think you want a count of operations that were truncated instead of a 
boolean so you can log the count.
* [Offering into the queue returns a boolean and doesn't throw, which style 
wise seems a little nicer, but that is bike 
shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126]
* More bike shedding, when aggregating I would just allocate the map each time 
rather than clear it.
* I think you should sync logging to the debug log and logging info level to 
the regular log. Then in the regular log print a count of how many operations 
timed out since the last time you logged. That way it is easy to map between 
the two when looking at timestamps.
* [I don't think this is a correct average calculation. You want a sum and a 
count. I didn't work for the simple example I did by 

[jira] [Commented] (CASSANDRA-10379) Consider using -XX:+TrustFinalNonStaticFields

2015-09-28 Thread Jonathan Ellis (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933365#comment-14933365
 ] 

Jonathan Ellis commented on CASSANDRA-10379:


Sounds reasonable.  AFAIK we're not doing any reflection tricks to defeat 
final-ness.

I found this thread on the flag, FWIW: 
http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017698.html
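
For context, the kind of "reflection trick" that would make the flag unsafe is a 
reflective write to a final instance field, e.g. (purely illustrative):

{code}
import java.lang.reflect.Field;

// Illustration only: with -XX:+TrustFinalNonStaticFields the JIT may treat final
// instance fields as constants, so a reflective write like this would be unsafe.
final class Config
{
    private final int limit = 10;

    public static void main(String[] args) throws Exception
    {
        Config c = new Config();
        Field f = Config.class.getDeclaredField("limit");
        f.setAccessible(true);
        f.setInt(c, 42);                       // defeats final-ness via reflection
        System.out.println(f.getInt(c));       // prints 42
    }
}
{code}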

> Consider using -XX:+TrustFinalNonStaticFields
> -
>
> Key: CASSANDRA-10379
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10379
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Robert Stupp
> Fix For: 3.x
>
>
> The JVM option {{-XX:+TrustFinalNonStaticFields}}, although experimental, 
> seems to improve performance a bit without any code change. Therefore I 
> propose to include it in {{cassandra-env.sh/psl}}.
> [cstar perf 
> benchmark|http://cstar.datastax.com/graph?stats=a6e75018-5ff4-11e5-bf84-42010af0688f=op_rate=1_user=1_aggregates=true=0=865.59=0=145568.5]
> The cstar test was run with 8u45.
> {noformat}
> JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions"
> JVM_OPTS="$JVM_OPTS -XX:+TrustFinalNonStaticFields"
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10401) json2sstable fails with NPE

2015-09-28 Thread Jose Martinez Poblete (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jose Martinez Poblete updated CASSANDRA-10401:
--
Description: 
We have the following table...

{noformat}
CREATE TABLE keyspace_name.table_name (
col1 text,
col2 text,
col3 text,
col4 text,
PRIMARY KEY ((col1, col2), col3)
) WITH CLUSTERING ORDER BY (col3 ASC)
{noformat}

And the following  json in a file created from sstable2json tool

{noformat}
[
{"key": "This is col1:This is col2",
 "cells": [["This is col3:","",1443217787319002],
   ["This is col3:col4","This is col4",1443217787319002]]}
]
{noformat}

Let's say we deleted that record from the DB and wanted to bring it back.
If we try to create an sstable from this data in a json file named 
test_file.json, we get an NPE 

{noformat}
-bash-4.1$ json2sstable -K elp -c table_name-3264cbe063c211e5bc34e746786b7b29 
test_file.json  
/var/lib/cassandra/data/keyspace_name/table_name-3264cbe063c211e5bc34e746786b7b29/keyspace_name-table_name-ka-1-Data.db
Importing 1 keys...
java.lang.NullPointerException
at 
org.apache.cassandra.tools.SSTableImport.getKeyValidator(SSTableImport.java:442)
at 
org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:316)
at 
org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:287)
at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:514)
ERROR: null
-bash-4.1$
{noformat}

  was:
We have the following table...

{noformat}
CREATE TABLE elp.document (
business_area_ct text,
business_id text,
document_id text,
access_level_ct text,
annotation_tx text,
author_nm text,
business_id_type_ct text,
cms_id text,
direction_ct text,
document_code_id uuid,
file_metadata_map_nm map,
last_mod_ts timestamp,
last_mod_user_id text,
official_document_ts timestamp,
repository_logical_package_no int,
repository_offset_no int,
repository_package_handle_user_id text,
repository_package_nm text,
repository_package_sequence_no int,
repository_procedural_nm text,
review_complete_in boolean,
source_system_document_id text,
source_system_nm text,
status_cd text,
vendor_nm text,
PRIMARY KEY ((business_area_ct, business_id), document_id)
) WITH CLUSTERING ORDER BY (document_id ASC)
{noformat}

And the following  json in a file created from sstable2json tool

{noformat}
[
{"key": "This is business_area_ct:This is business_id",
 "cells": [["This is document_id:","",1443217787319002],
   ["This is document_id:author_nm","This is 
autor_nm",1443217787319002]]}
]
{noformat}

Let's say we deleted that record from the DB and wanted to bring it back.
If we try to create an sstable from this json file, we get an NPE 

{noformat}
-bash-4.1$ json2sstable -K elp -c document-3264cbe063c211e5bc34e746786b7b29 
test2.json  
/var/lib/cassandra/data/elp/document-3264cbe063c211e5bc34e746786b7b29/elp-document-ka-1-Data.db
Importing 1 keys...
java.lang.NullPointerException
at 
org.apache.cassandra.tools.SSTableImport.getKeyValidator(SSTableImport.java:442)
at 
org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:316)
at 
org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:287)
at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:514)
ERROR: null
-bash-4.1$
{noformat}


> json2sstable fails with NPE
> ---
>
> Key: CASSANDRA-10401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.1.8.621
>Reporter: Jose Martinez Poblete
>
> We have the following table...
> {noformat}
> CREATE TABLE keyspace_name.table_name (
> col1 text,
> col2 text,
> col3 text,
> col4 text,
> PRIMARY KEY ((col1, col2), col3)
> ) WITH CLUSTERING ORDER BY (col3 ASC)
> {noformat}
> And the following  json in a file created from sstable2json tool
> {noformat}
> [
> {"key": "This is col1:This is col2",
>  "cells": [["This is col3:","",1443217787319002],
>    ["This is col3:col4","This is col4",1443217787319002]]}
> ]
> {noformat}
> Let's say we deleted that record from the DB and wanted to bring it back.
> If we try to create an sstable from this data in a json file named 
> test_file.json, we get an NPE 
> {noformat}
> -bash-4.1$ json2sstable -K elp -c table_name-3264cbe063c211e5bc34e746786b7b29 
> test_file.json  
> /var/lib/cassandra/data/keyspace_name/table_name-3264cbe063c211e5bc34e746786b7b29/keyspace_name-table_name-ka-1-Data.db
> Importing 1 keys...
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.tools.SSTableImport.getKeyValidator(SSTableImport.java:442)
>   at 
> 

[jira] [Created] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-28 Thread Joshua McKenzie (JIRA)
Joshua McKenzie created CASSANDRA-10403:
---

 Summary: Consider reverting to CMS GC on 3.0
 Key: CASSANDRA-10403
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
 Project: Cassandra
  Issue Type: Improvement
  Components: Config
Reporter: Joshua McKenzie
 Fix For: 3.0.0 rc2


Reference discussion on CASSANDRA-7486.

For smaller heap sizes G1 appears to have some throughput/latency issues when 
compared to CMS. With our default max heap size at 8G on 3.0, there's a strong 
argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10276) With DTCS, do STCS in windows if more than max_threshold sstables

2015-09-28 Thread JIRA

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1490#comment-1490
 ] 

Björn Hegerfors commented on CASSANDRA-10276:
-

I have one comment on this patch. If for some reason the STCS won't find 
something to compact even though bucket.size() > maxThreshold (sounds unlikely 
with default STCS options), then just skipping that window might render that 
window uncompacted for all eternity. IMO, rather than trying the next window, 
as in the latest commit, why not just return the maxThreshold smallest SSTables 
from the bucket?
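
What "return the maxThreshold smallest SSTables from the bucket" could look like, 
as a sketch only (the SSTable type and onDiskLength() are stand-ins, not the real 
compaction-strategy code):

{code}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of the fallback suggested above: instead of skipping a window whose STCS
// pass found nothing, compact the maxThreshold smallest files in that window.
final class SmallestInWindow
{
    interface SSTable { long onDiskLength(); }

    static List<SSTable> smallestInBucket(List<SSTable> bucket, int maxThreshold)
    {
        List<SSTable> sorted = new ArrayList<>(bucket);
        sorted.sort(Comparator.comparingLong(SSTable::onDiskLength));
        return sorted.subList(0, Math.min(maxThreshold, sorted.size()));
    }
}
{code}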

> With DTCS, do STCS in windows if more than max_threshold sstables
> -
>
> Key: CASSANDRA-10276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10276
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Core
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
> Fix For: 3.x, 2.1.x, 2.2.x
>
>
> To avoid constant recompaction of files in big ( > max threshold) DTCS 
> windows, we should do STCS of those files.
> Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9428) Implement hints compression

2015-09-28 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie updated CASSANDRA-9428:
---
Fix Version/s: (was: 3.0.0 rc2)
   3.0.x

> Implement hints compression
> ---
>
> Key: CASSANDRA-9428
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9428
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Aleksey Yeschenko
>Assignee: Joshua McKenzie
> Fix For: 3.0.x
>
>
> CASSANDRA-6230 is being implemented with compression in mind, but it's not 
> going to be implemented by the original ticket.
> Adding it on top should be relatively straight-forward, and important, since 
> there are several users in the wild that use compression interface for 
> encryption purposes. DSE is one of them (but isn't the only one). Losing 
> encryption capabilities would be a regression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0

2015-09-28 Thread Jonathan Shook (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933505#comment-14933505
 ] 

Jonathan Shook commented on CASSANDRA-10403:


Can we get some G1 tests with a 24+G heap to see if it's worth making this 
machine-specific? The notion of "commodity" changes with time. The settings 
need to adapt if possible.



> Consider reverting to CMS GC on 3.0
> ---
>
> Key: CASSANDRA-10403
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10403
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Config
>Reporter: Joshua McKenzie
> Fix For: 3.0.0 rc2
>
>
> Reference discussion on CASSANDRA-7486.
> For smaller heap sizes G1 appears to have some throughput/latency issues when 
> compared to CMS. With our default max heap size at 8G on 3.0, there's a 
> strong argument to be made for having CMS as the default for the 3.0 release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10166) Fix failing tests

2015-09-28 Thread Aleksey Yeschenko (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aleksey Yeschenko updated CASSANDRA-10166:
--
Summary: Fix failing tests  (was: Failing tests on cassandra 3.0 branch)

> Fix failing tests
> -
>
> Key: CASSANDRA-10166
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10166
> Project: Cassandra
>  Issue Type: Test
>Reporter: Sylvain Lebresne
> Fix For: 3.0.0 rc2
>
>
> Until we find a better way to track those things, this is meant as a master 
> ticket to track open tickets regarding tests (unit tests and dtests, though at 
> the time of this writing only dtests are still failing) that are still 
> failing. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-7486) Migrate to G1GC by default

2015-09-28 Thread Joshua McKenzie (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joshua McKenzie resolved CASSANDRA-7486.

Resolution: Fixed

Opened CASSANDRA-10403 to cover profiling and possible revert to CMS.

> Migrate to G1GC by default
> --
>
> Key: CASSANDRA-7486
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7486
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Config
>Reporter: Jonathan Ellis
> Fix For: 3.0 alpha 1
>
>
> See 
> http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning
>  and https://twitter.com/rbranson/status/482113561431265281
> May want to default 2.1 to G1.
> 2.1 is a different animal from 2.0 after moving most of memtables off heap.  
> Suspect this will help G1 even more than CMS.  (NB this is off by default but 
> needs to be part of the test.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2015-09-28 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933382#comment-14933382
 ] 

Yuki Morishita commented on CASSANDRA-7276:
---

Sure. Though the attached patches need a bit more work.
Or maybe consider using a logging context as suggested before.
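
A sketch of the logging-context alternative, using SLF4J's MDC so log lines in a 
flush or compaction path can carry the keyspace and table without touching every 
message (illustrative only, not a patch):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

// Sketch: put keyspace/table into the mapped diagnostic context around an operation
// and reference them from the log pattern (e.g. %X{keyspace}.%X{table}), instead of
// threading the names through every log call.
final class MdcExample
{
    private static final Logger logger = LoggerFactory.getLogger(MdcExample.class);

    static void flush(String keyspace, String table)
    {
        MDC.put("keyspace", keyspace);
        MDC.put("table", table);
        try
        {
            logger.info("Flushing memtable");   // pattern appends keyspace/table
        }
        finally
        {
            MDC.remove("keyspace");
            MDC.remove("table");
        }
    }
}
{code}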

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Nitzan Volman
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9774) fix sstableverify dtest

2015-09-28 Thread Sylvain Lebresne (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sylvain Lebresne updated CASSANDRA-9774:

Assignee: Jeff Jirsa

> fix sstableverify dtest
> ---
>
> Key: CASSANDRA-9774
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9774
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Jim Witschey
>Assignee: Jeff Jirsa
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> One of our dtests for {{sstableverify}} 
> ({{offline_tools_test.py:TestOfflineTools.sstableverify_test}}) is failing 
> hard on trunk ([cassci 
> history|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastCompletedBuild/testReport/offline_tools_test/TestOfflineTools/sstableverify_test/history/])
> The way the test works is by deleting an SSTable, then running 
> {{sstableverify}} on its table. In earlier versions, it successfully detects 
> this problem and outputs that it "was not released before the reference was 
> garbage collected". The test no longer finds this string in the output; 
> looking through the output of the test, it doesn't look like it reports any 
> problems at all.
> EDIT: After digging into the C* source a bit, I may have misattributed the 
> problem to {{sstableverify}}; this could be a more general memory management 
> problem, as the error text expected in the dtest is emitted by part of the 
> {{Ref}} implementation:
> https://github.com/apache/cassandra/blob/075ff5000ced24b42f3b540815cae471bee4049d/src/java/org/apache/cassandra/utils/concurrent/Ref.java#L187



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2015-09-28 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933377#comment-14933377
 ] 

Paulo Motta commented on CASSANDRA-7276:


[~yukim] mind if I take this for review as it might be related to the recent 
logging changes?

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Nitzan Volman
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10401) json2sstable fails with NPE

2015-09-28 Thread Philip Thompson (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Philip Thompson updated CASSANDRA-10401:

Tester: Jim Witschey

> json2sstable fails with NPE
> ---
>
> Key: CASSANDRA-10401
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10401
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tools
> Environment: Cassandra 2.1.8.621
>Reporter: Jose Martinez Poblete
>
> We have the following table...
> {noformat}
> CREATE TABLE elp.document (
> business_area_ct text,
> business_id text,
> document_id text,
> access_level_ct text,
> annotation_tx text,
> author_nm text,
> business_id_type_ct text,
> cms_id text,
> direction_ct text,
> document_code_id uuid,
> file_metadata_map_nm map,
> last_mod_ts timestamp,
> last_mod_user_id text,
> official_document_ts timestamp,
> repository_logical_package_no int,
> repository_offset_no int,
> repository_package_handle_user_id text,
> repository_package_nm text,
> repository_package_sequence_no int,
> repository_procedural_nm text,
> review_complete_in boolean,
> source_system_document_id text,
> source_system_nm text,
> status_cd text,
> vendor_nm text,
> PRIMARY KEY ((business_area_ct, business_id), document_id)
> ) WITH CLUSTERING ORDER BY (document_id ASC)
> {noformat}
> And the following  json in a file created from sstable2json tool
> {noformat}
> [
> {"key": "This is business_area_ct:This is business_id",
>  "cells": [["This is document_id:","",1443217787319002],
>["This is document_id:author_nm","This is 
> autor_nm",1443217787319002]]}
> ]
> {noformat}
> Let's say we deleted that record from the DB and wanted to bring it back.
> If we try to create an sstable from this json file, we get an NPE 
> {noformat}
> -bash-4.1$ json2sstable -K elp -c document-3264cbe063c211e5bc34e746786b7b29 
> test2.json  
> /var/lib/cassandra/data/elp/document-3264cbe063c211e5bc34e746786b7b29/elp-document-ka-1-Data.db
> Importing 1 keys...
> java.lang.NullPointerException
>   at 
> org.apache.cassandra.tools.SSTableImport.getKeyValidator(SSTableImport.java:442)
>   at 
> org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:316)
>   at 
> org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:287)
>   at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:514)
> ERROR: null
> -bash-4.1$
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10289) Fix cqlshlib tests

2015-09-28 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-10289:
---
Reviewer: Stefania

[~Stefania] to review

> Fix cqlshlib tests
> --
>
> Key: CASSANDRA-10289
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10289
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tests
>Reporter: Jim Witschey
>Assignee: Jim Witschey
>  Labels: cqlsh
> Fix For: 3.0.0 rc2
>
> Attachments: trunk-10289.txt
>
>
> The cqlsh tests in trunk haven't been running for a while:
> http://cassci.datastax.com/view/All_Jobs/job/trunk_cqlshlib/423/testReport/
> This looks like the driver errors that happened because of CASSANDRA-6717. 
> Not sure why it's happening now; the driver installation looks normal to me 
> on those jobs. [~mshuler]?
> There were also some changes to cqlsh itself that broke the test 
> harness, but I believe those are fixed here:
> https://github.com/mambocab/cassandra/tree/fix-cqlsh-tests
> Once the tests are running successfully on CassCI, I'll test my patch and 
> mark as patch available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?

2015-09-28 Thread Sylvain Lebresne (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933464#comment-14933464
 ] 

Sylvain Lebresne commented on CASSANDRA-9806:
-

[~aboudreault] Could you try to bisect when this started to fail?

> some TTL test are failing on trunk: losing data after restart? 
> ---
>
> Key: CASSANDRA-9806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9806
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alan Boudreault
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test and 
> ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are failing:
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/
> After some debugging, I noticed a strange behaviour. It looks like some data 
> disappear after a node restart, even if the row has no TTL set. Here is a test 
> example where I see the issue with the latest trunk:
> https://gist.github.com/aboudreault/94cb552750a186ca853d



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-9855) Make page_size configurable in cqlsh

2015-09-28 Thread Jonathan Ellis (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Ellis updated CASSANDRA-9855:
--
Reviewer: Philip Thompson

> Make page_size configurable in cqlsh
> 
>
> Key: CASSANDRA-9855
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9855
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Sylvain Lebresne
>Assignee: Ryan McGuire
>Priority: Minor
>  Labels: cqlsh
> Fix For: 2.2.x, 3.0.0 rc2
>
> Attachments: 9855.txt
>
>
> Appears we made cqlsh use paging, but the page size is hard-coded. It sounds 
> easy enough to make that configurable, by either one of:
> {noformat}
> PAGING 50;
> PAGING ON WITH PAGE_SIZE=50;
> {noformat}
> I'm sure some users may be happy with the convenience but it would also be 
> nice when we want to quickly test paging.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10383) Disable auto snapshot on selected tables.

2015-09-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910197#comment-14910197
 ] 

ASF GitHub Bot commented on CASSANDRA-10383:


GitHub user tommystendahl opened a pull request:

https://github.com/apache/cassandra/pull/54

Disable auto snapshot on selected tables (CASSANDRA-10383)



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/tommystendahl/cassandra cassandra-10383

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/cassandra/pull/54.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #54


commit c647b57abedda8e2c578cc0bab14d79ca0722b71
Author: tommy stendahl 
Date:   2015-09-28T08:24:18Z

Disable auto snapshot on selected tables (CASSANDRA-10383)




> Disable auto snapshot on selected tables.
> -
>
> Key: CASSANDRA-10383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10383
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
> Fix For: 2.1.x
>
>
> I have a use case where I would like to turn off auto snapshot for selected 
> tables, but I don't want to turn it off completely since it's a good feature. 
> Looking at the code I think it would be relatively easy to fix.
> My plan is to create a new table property named something like 
> "disable_auto_snapshot". If set to false it will prevent auto snapshot on the 
> table; if set to true, auto snapshot will be controlled by the "auto_snapshot" 
> property in cassandra.yaml. Default would be true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10383) Disable auto snapshot on selected tables.

2015-09-28 Thread Tommy Stendahl (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933082#comment-14933082
 ] 

Tommy Stendahl edited comment on CASSANDRA-10383 at 9/28/15 9:41 AM:
-

Attached a patch based on 2.1.

Sorry for the pull request, it was supposed to be on my private github fork. :-(


was (Author: tommy_s):
based on 2.1

> Disable auto snapshot on selected tables.
> -
>
> Key: CASSANDRA-10383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10383
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
> Fix For: 2.1.x
>
> Attachments: 10383.txt
>
>
> I have a use case where I would like to turn off auto snapshot for selected 
> tables, but I don't want to turn it off completely since it's a good feature. 
> Looking at the code I think it would be relatively easy to fix.
> My plan is to create a new table property named something like 
> "disable_auto_snapshot". If set to false it will prevent auto snapshot on the 
> table; if set to true, auto snapshot will be controlled by the "auto_snapshot" 
> property in cassandra.yaml. Default would be true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10383) Disable auto snapshot on selected tables.

2015-09-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933086#comment-14933086
 ] 

ASF GitHub Bot commented on CASSANDRA-10383:


Github user tommystendahl closed the pull request at:

https://github.com/apache/cassandra/pull/54


> Disable auto snapshot on selected tables.
> -
>
> Key: CASSANDRA-10383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10383
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
> Fix For: 2.1.x
>
> Attachments: 10383.txt
>
>
> I have a use case where I would like to turn off auto snapshot for selected 
> tables; I don't want to turn it off completely since it's a good feature. 
> Looking at the code, I think it would be relatively easy to fix.
> My plan is to create a new table property named something like 
> "disable_auto_snapshot". If set to false it will prevent auto snapshot on the 
> table; if set to true, auto snapshot will be controlled by the "auto_snapshot" 
> property in cassandra.yaml. The default would be true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-10383) Disable auto snapshot on selected tables.

2015-09-28 Thread Tommy Stendahl (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933082#comment-14933082
 ] 

Tommy Stendahl edited comment on CASSANDRA-10383 at 9/28/15 9:53 AM:
-

Attached a patch based on 2.1. The patch is also on github, 
[here|https://github.com/tommystendahl/cassandra/tree/cassandra-10383]

Sorry for the pull request, it was supposed to be on my private github fork. :-(


was (Author: tommy_s):
Attached a patch based on 2.1.

Sorry for the pull request, it was supposed to be on my private github fork. :-(

> Disable auto snapshot on selected tables.
> -
>
> Key: CASSANDRA-10383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10383
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
> Fix For: 2.1.x
>
> Attachments: 10383.txt
>
>
> I have a use case where I would like to turn off auto snapshot for selected 
> tables; I don't want to turn it off completely since it's a good feature. 
> Looking at the code, I think it would be relatively easy to fix.
> My plan is to create a new table property named something like 
> "disable_auto_snapshot". If set to false it will prevent auto snapshot on the 
> table; if set to true, auto snapshot will be controlled by the "auto_snapshot" 
> property in cassandra.yaml. The default would be true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-10383) Disable auto snapshot on selected tables.

2015-09-28 Thread Tommy Stendahl (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tommy Stendahl updated CASSANDRA-10383:
---
Attachment: 10383.txt

based on 2.1

> Disable auto snapshot on selected tables.
> -
>
> Key: CASSANDRA-10383
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10383
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Tommy Stendahl
>Assignee: Tommy Stendahl
> Fix For: 2.1.x
>
> Attachments: 10383.txt
>
>
> I have a use case where I would like to turn off auto snapshot for selected 
> tables; I don't want to turn it off completely since it's a good feature. 
> Looking at the code, I think it would be relatively easy to fix.
> My plan is to create a new table property named something like 
> "disable_auto_snapshot". If set to false it will prevent auto snapshot on the 
> table; if set to true, auto snapshot will be controlled by the "auto_snapshot" 
> property in cassandra.yaml. The default would be true.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[Cassandra Wiki] Update of "Committers" by RobertStupp

2015-09-28 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "Committers" page has been changed by RobertStupp:
https://wiki.apache.org/cassandra/Committers?action=diff=53=54

  ||Tyler Hobbs ||Mar 2014 ||Datastax || ||
  ||Benedict Elliott Smith ||May 2014 ||Datastax || ||
  ||Josh Mckenzie ||Jul 2014 ||Datastax || ||
- ||Robert Stupp ||Jan 2015 ||contentteam || ||
+ ||Robert Stupp ||Jan 2015 ||Independent || ||
  ||Sam Tunnicliffe ||May 2015 ||Datastax || ||
  ||Benjamin Lerer ||Jul 2015 ||Datastax || ||
  


[jira] [Commented] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?

2015-09-28 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934172#comment-14934172
 ] 

Paulo Motta commented on CASSANDRA-9806:


I forgot to mention that I also tested the attached gist and it passed.

> some TTL test are failing on trunk: losing data after restart? 
> ---
>
> Key: CASSANDRA-9806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9806
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alan Boudreault
>Assignee: Paulo Motta
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test and 
> ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are failing:
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/
> After some debugging, I noticed a strange behaviour. It looks like some data 
> disappear after a node restart, even if the row has no TTL set. Here is a test 
> example where I see the issue with the latest trunk:
> https://gist.github.com/aboudreault/94cb552750a186ca853d



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10241) Keep a separate production debug log for troubleshooting

2015-09-28 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934310#comment-14934310
 ] 

Paulo Motta commented on CASSANDRA-10241:
-

Now that we have the basic capability committed, I'd like to follow up on this 
by introducing a simple logging guideline for future system logging statements, 
based on the discussions of this thread and current practices. This guideline 
could help external and new contributors to understand the logging practices, 
and current contributors to review tickets related to logging using the new 
framework.

I've drafted an initial version for review, presented below:

*INFO*: General cluster status, operations overview. At this level a beginner 
user or operator should be able to understand most messages. 
Examples:
* Node startup and shutdown information
* User or system triggered operations overview
** Repair start and finish state
** Cleanup start and finish state
** Bootstrap start and finish state
** Index rebuild start and finish state

*DEBUG*: Low frequency state changes or message passing. Non-critical path logs 
on operation details, performance measurements or general troubleshooting 
information. At this level an advanced operator or system developer will have 
elements to investigate or detect erroneous conditions or performance 
bottlenecks, extract reproduction steps or inspect advanced operational 
information.
Examples:
* SSTable flushing
* Compactions in progress
* Gossip or schema state changes
* Operations intermediate steps
** Repair steps
** Stream session message exchanges

*WARN*: Use of suboptimal parameters or deprecated options, detection of 
degraded performance, capability limitations or missing dependencies. General 
optimization tips. At this level, an operator should be able to detect an 
imminent error condition, use of suboptimal parameters or non-critical 
configuration errors. Examples:
* Use of chunk_length_in_kb property instead of chunk_length
* GC above threshold warnings
* OpenJDK not recommended notice
* Small sstable size warning (Testing done for CASSANDRA-5727 indicates that 
performance improves up to 160MB)

*ERROR*: An unexpected error condition has occurred. Non-critical, transient or 
recovered errors might be reported at DEBUG level instead so they don't pollute 
system.log.
Examples:
 * critical errors in general (corrupted disk, read error, etc)
 * leak detection

*TRACE*:  High frequency state changes or message passing, critical path logs, 
testing or development information. This level is disabled by default, so 
everything that does not fit in the previous levels and highly verbose stuff 
must be kept at TRACE level. 
Examples:
* Failure detector checks
* Gossip digests
* CassandraServer.insert()

What do you think [~aweisberg]? After review and suggestions, if there are no 
objections, I will add this to the wiki and send an e-mail to the dev list.

After this, the next step would be to groom the current logs in a separate 
ticket so they follow the guideline.
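As a rough illustration of the guideline (plain slf4j, with made-up class and message names, not taken from the codebase), one statement per level might look like:

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class GuidelineExamples
{
    private static final Logger logger = LoggerFactory.getLogger(GuidelineExamples.class);

    void examples(String keyspace, long gcPauseMillis, Throwable corruption)
    {
        // INFO: operation overview a beginner operator can follow
        logger.info("Starting repair on keyspace {}", keyspace);

        // DEBUG: low-frequency detail for troubleshooting, off the critical path
        logger.debug("Flushing memtable for keyspace {}", keyspace);

        // WARN: imminent error condition or suboptimal configuration
        logger.warn("GC pause of {} ms is above threshold", gcPauseMillis);

        // ERROR: error condition, e.g. a corrupted sstable
        logger.error("Corrupt sstable detected", corruption);

        // TRACE: high-frequency, critical-path detail; disabled by default
        logger.trace("Received gossip digest");
    }
}
{code}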

> Keep a separate production debug log for troubleshooting
> 
>
> Key: CASSANDRA-10241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10241
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Config
>Reporter: Jonathan Ellis
>Assignee: Paulo Motta
> Fix For: 2.2.x, 3.0.0 rc2
>
> Attachments: 2.2-debug.log, 2.2-system.log, 3.0-debug.log, 
> 3.0-system.log
>
>
> [~aweisberg] had the suggestion to keep a separate debug log for aid in 
> troubleshooting, not intended for regular human consumption but where we can 
> log things that might help if something goes wrong.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10298) Replaced dead node stayed in gossip forever

2015-09-28 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934354#comment-14934354
 ] 

Stefania commented on CASSANDRA-10298:
--

No, they are both still under test; besides, we are focusing on the 3.0 branch 
for the tests. If you really need this in 2.1, then you can commit this patch.

> Replaced dead node stayed in gossip forever
> ---
>
> Key: CASSANDRA-10298
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10298
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Dikang Gu
>Assignee: Dikang Gu
> Fix For: 2.1.x
>
> Attachments: CASSANDRA-10298.patch
>
>
> The dead node stayed in the nodetool status:
> DN  10.210.165.55  379.76 GB  256 ?   null
> And in the log, it throws an NPE when trying to remove it.
> {code}
> 2015-09-10_06:41:22.92453 ERROR 06:41:22 Exception in thread 
> Thread[GossipStage:1,5,main]
> 2015-09-10_06:41:22.92454 java.lang.NullPointerException: null
> 2015-09-10_06:41:22.92455   at 
> org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:100) 
> 2015-09-10_06:41:22.92455   at 
> org.apache.cassandra.db.HintedHandOffManager.deleteHintsForEndpoint(HintedHandOffManager.java:201)
>  
> 2015-09-10_06:41:22.92455   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:1886) 
> 2015-09-10_06:41:22.92455   at 
> org.apache.cassandra.service.StorageService.excise(StorageService.java:1902) 
> 2015-09-10_06:41:22.92456   at 
> org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1805)
> 2015-09-10_06:41:22.92457   at 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1473)
>  
> 2015-09-10_06:41:22.92457   at 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2099) 
> 2015-09-10_06:41:22.92457   at 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009) 
> 2015-09-10_06:41:22.92458   at 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1085) 
> 2015-09-10_06:41:22.92458   at 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  
> 2015-09-10_06:41:22.92458   at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) 
> 2015-09-10_06:41:22.92459   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_45]
> 2015-09-10_06:41:22.92460   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_45]
> {code}
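For illustration only (a hedged fragment, not the attached CASSANDRA-10298.patch): the NPE above comes from handing a null host id to UUIDGen.decompose(), so one defensive shape of a fix is to skip hint deletion when the replaced endpoint no longer has a registered host id. The fragment assumes it lives where StorageService's tokenMetadata and logger are in scope.

{code}
// Hedged sketch, not the attached patch.
UUID hostId = tokenMetadata.getHostId(endpoint); // may be null for a replaced/stale node
if (hostId == null)
    logger.warn("No host id found for {}, skipping hint deletion", endpoint);
else
    HintedHandOffManager.instance.deleteHintsForEndpoint(endpoint);
{code}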



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal

2015-09-28 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934356#comment-14934356
 ] 

Stefania commented on CASSANDRA-10089:
--

[~jbellis] can I have a reviewer for this patch? I was not able to reproduce 
the problem despite a few rounds of CI, so I am not 100% sure what causes 
tokens to be missing during {{handleStateNormal}}, but the patch will at least 
fix the NPE. 

> NullPointerException in Gossip handleStateNormal
> 
>
> Key: CASSANDRA-10089
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10089
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Stefania
>Assignee: Stefania
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> Whilst comparing dtests for CASSANDRA-9970 I found [this failing 
> dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/]
>  in 2.2:
> {code}
> Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 
> 15:39:57,873 CassandraDaemon.java:183 - Exception in thread 
> Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat 
> org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629)
>  ~[main/:na] \tat 
> org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) 
> ~[main/:na] \tat 
> org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49)
>  ~[main/:na] \tat 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) 
> ~[main/:na] \tat 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>  ~[na:1.7.0_80] \tat 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>  ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]']
> {code}
> I wasn't able to find it on unpatched branches, but it is clearly not related 
> to CASSANDRA-9970; if anything it could have been a side effect of 
> CASSANDRA-9871.
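For context, a hedged fragment of the kind of null guard such a patch could contain (not the actual patch; assumed to sit in StorageService.getApplicationStateValue, where the stack trace above blows up):

{code}
// Hedged sketch only: return null instead of dereferencing a missing endpoint state.
EndpointState epState = Gossiper.instance.getEndpointStateForEndpoint(endpoint);
if (epState == null)
    return null;
VersionedValue value = epState.getApplicationState(ApplicationState.TOKENS);
return value == null ? null : value.value;
{code}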



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10406) Nodetool supports to rebuild from specific ranges.

2015-09-28 Thread Yuki Morishita (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934363#comment-14934363
 ] 

Yuki Morishita commented on CASSANDRA-10406:


Can you post the patch against the cassandra-2.1 branch?
1.2 and 2.0 have gone EOL, and no further development is happening on those 
versions.

> Nodetool supports to rebuild from specific ranges.
> --
>
> Key: CASSANDRA-10406
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10406
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Dikang Gu
>Assignee: Dikang Gu
> Fix For: 1.2.x
>
> Attachments: CASSANDRA-10406.patch
>
>
> Add the 'nodetool rebuildrange' command, so that if `nodetool rebuild` 
> fails, we do not need to rebuild all the ranges and can just rebuild the 
> failed ones.
> It should be easily ported to all versions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?

2015-09-28 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta resolved CASSANDRA-9806.

Resolution: Invalid

> some TTL test are failing on trunk: losing data after restart? 
> ---
>
> Key: CASSANDRA-9806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9806
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alan Boudreault
>Assignee: Paulo Motta
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test and 
> ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are failing:
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/
> After some debugging, I noticed a strange behaviour. It looks like some data 
> disappear after a node restart, even if the row has no TTL set. Here is a test 
> example where I see the issue with the latest trunk:
> https://gist.github.com/aboudreault/94cb552750a186ca853d



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?

2015-09-28 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934149#comment-14934149
 ] 

Paulo Motta commented on CASSANDRA-9806:


The error was reported on jenkins build #346. We're currently on build #625, 
and both 
[ttl_is_respected_on_delayed_replication_test|http://cassci.datastax.com/view/trunk/job/trunk_dtest/625/testReport/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/history/]
 and 
[ttl_is_respected_on_repair_test|http://cassci.datastax.com/view/trunk/job/trunk_dtest/625/testReport/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/history/]
 seem to have been consistently stable in the last builds.

The only recent failures in 
[ttl_test.py|http://cassci.datastax.com/view/trunk/job/trunk_dtest/625/testReport/ttl_test/history/]
 are related to driver connection timeouts during setup, so I increased our 
default dtest timeout from 5s to 10s, which should make these and other tests 
less flakey: [dtest PR|https://github.com/riptano/cassandra-dtest/pull/572].

> some TTL test are failing on trunk: losing data after restart? 
> ---
>
> Key: CASSANDRA-9806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9806
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alan Boudreault
>Assignee: Paulo Motta
>Priority: Blocker
> Fix For: 3.0.0 rc2
>
>
> ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test and 
> ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are failing:
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/
> http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/
> After some debugging, I noticed a strange behaviour. It looks like some data 
> disappear after a node restart, even if the row has no TTL set. Here is a test 
> example where I see the issue with the latest trunk:
> https://gist.github.com/aboudreault/94cb552750a186ca853d



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node

2015-09-28 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934321#comment-14934321
 ] 

Stefania commented on CASSANDRA-10231:
--

If you've reproduced it with your Jepsen test, could you attach the logs please?

> Null status entries on nodes that crash during decommission of a different 
> node
> ---
>
> Key: CASSANDRA-10231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Stefania
> Fix For: 3.0.0 rc2
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during 
> the decommission of a different node, it may start with a null entry for the 
> decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon 
> a restart of the affected node.
> This issue is further detailed in ticket 
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node

2015-09-28 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934358#comment-14934358
 ] 

Stefania commented on CASSANDRA-10231:
--

Also, a similar exception affects 2.1+, although the patch would be slightly 
different; see CASSANDRA-10298.

> Null status entries on nodes that crash during decommission of a different 
> node
> ---
>
> Key: CASSANDRA-10231
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10231
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Joel Knighton
>Assignee: Stefania
> Fix For: 3.0.0 rc2
>
>
> This issue is reproducible through a Jepsen test of materialized views that 
> crashes and decommissions nodes throughout the test.
> In a 5 node cluster, if a node crashes at a certain point (unknown) during 
> the decommission of a different node, it may start with a null entry for the 
> decommissioned node like so:
> DN 10.0.0.5 ? 256 ? null rack1
> This entry does not get updated/cleared by gossip. This entry is removed upon 
> a restart of the affected node.
> This issue is further detailed in ticket 
> [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10205) decommissioned_wiped_node_can_join_test fails on Jenkins

2015-09-28 Thread Stefania (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934364#comment-14934364
 ] 

Stefania commented on CASSANDRA-10205:
--

[~jbellis] we need a new reviewer, thanks.

> decommissioned_wiped_node_can_join_test fails on Jenkins
> 
>
> Key: CASSANDRA-10205
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10205
> Project: Cassandra
>  Issue Type: Test
>Reporter: Stefania
>Assignee: Stefania
> Attachments: decommissioned_wiped_node_can_join_test.tar.gz
>
>
> This test passes locally but reliably fails on Jenkins. It seems that after we 
> restart node4, it is unable to gossip with other nodes:
> {code}
> INFO  [HANDSHAKE-/127.0.0.2] 2015-08-27 06:50:42,778 
> OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.2
> INFO  [HANDSHAKE-/127.0.0.1] 2015-08-27 06:50:42,778 
> OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.1
> INFO  [HANDSHAKE-/127.0.0.3] 2015-08-27 06:50:42,778 
> OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.3
> ERROR [main] 2015-08-27 06:51:13,785 CassandraDaemon.java:635 - Exception 
> encountered during startup
> java.lang.RuntimeException: Unable to gossip with any seeds
> at 
> org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1342) 
> ~[main/:na]
> at 
> org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:518)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:763)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:687)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.StorageService.initServer(StorageService.java:570)
>  ~[main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:320) 
> [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516)
>  [main/:na]
> at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) 
> [main/:na]
> WARN  [StorageServiceShutdownHook] 2015-08-27 06:51:13,799 Gossiper.java:1453 
> - No local state or state is in silent shutdown, not announcing shutdown
> {code}
> It seems both the addresses and port numbers of the seeds are correct, so I 
> don't think the problem is the Amazon private addresses, but I might be wrong. 
> It's also worth noting that the node starts up without problems the first 
> time. The problem only occurs during a restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10406) Nodetool supports to rebuild from specific ranges.

2015-09-28 Thread Dikang Gu (JIRA)
Dikang Gu created CASSANDRA-10406:
-

 Summary: Nodetool supports to rebuild from specific ranges.
 Key: CASSANDRA-10406
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10406
 Project: Cassandra
  Issue Type: Improvement
Reporter: Dikang Gu
Assignee: Dikang Gu
 Fix For: 1.2.x


Add the 'nodetool rebuildrange' command, so that if `nodetool rebuild` fails, 
we do not need to rebuild all the ranges and can just rebuild the failed 
ones.

It should be easily ported to all versions.
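A minimal sketch of the idea, with made-up names (this is not the attached CASSANDRA-10406.patch): given the ranges the operator passes on the command line, restrict the rebuild to just those instead of streaming every local range.

{code}
import java.util.ArrayList;
import java.util.List;

public class RebuildRangeSketch
{
    /** A token range, start exclusive / end inclusive, as nodetool prints them. */
    static final class Range
    {
        final long left, right;
        Range(long left, long right) { this.left = left; this.right = right; }
        boolean sameAs(Range other) { return left == other.left && right == other.right; }
    }

    /** Keep only the local ranges the operator explicitly asked to rebuild. */
    static List<Range> rangesToRebuild(List<Range> localRanges, List<Range> requested)
    {
        List<Range> result = new ArrayList<Range>();
        for (Range local : localRanges)
            for (Range wanted : requested)
                if (local.sameAs(wanted))
                    result.add(local);
        return result;
    }
}
{code}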



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions

2015-09-28 Thread Ariel Weisberg (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933841#comment-14933841
 ] 

Ariel Weisberg commented on CASSANDRA-10228:


Cassci looks happy, as near as I can tell. This is ready for commit.

> JVMStabilityInspector should inspect cause and suppressed exceptions
> 
>
> Key: CASSANDRA-10228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
>Reporter: Benedict
>Assignee: Paul MacIntosh
>  Labels: lhf
> Fix For: 2.1.x, 2.2.x, 3.0.x
>
>
> JVMStabilityInspector only checks the outer exception, but this can wrap or 
> otherwise suppress an exception we do consider "unstable". We should check 
> all of the exceptions in an exception graph before deciding things are kosher.
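A hedged sketch of the idea (not the committed fix): walk the whole cause/suppressed graph and run the existing per-exception check on each throwable, guarding against cycles.

{code}
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

import org.apache.cassandra.utils.JVMStabilityInspector;

public class ExceptionGraphInspector
{
    static void inspectAll(Throwable t)
    {
        inspectAll(t, Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>()));
    }

    private static void inspectAll(Throwable t, Set<Throwable> seen)
    {
        if (t == null || !seen.add(t))
            return;                                    // null or already visited
        JVMStabilityInspector.inspectThrowable(t);     // existing single-throwable check
        inspectAll(t.getCause(), seen);
        for (Throwable suppressed : t.getSuppressed())
            inspectAll(suppressed, seen);
    }
}
{code}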



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (CASSANDRA-10407) Benchmark and evaluate CASSANDRA-8894 improvements

2015-09-28 Thread Aleksey Yeschenko (JIRA)
Aleksey Yeschenko created CASSANDRA-10407:
-

 Summary: Benchmark and evaluate CASSANDRA-8894 improvements
 Key: CASSANDRA-10407
 URL: https://issues.apache.org/jira/browse/CASSANDRA-10407
 Project: Cassandra
  Issue Type: Test
Reporter: Aleksey Yeschenko
 Fix For: 3.0.0 rc2


The original ticket (CASSANDRA-8894) was committed to 3.0 alpha1 two months 
ago. We need to get proper performance tests before GA.

See [~benedict]'s 
[comment|https://issues.apache.org/jira/browse/CASSANDRA-8894?focusedCommentId=14631203=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14631203]
 for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size

2015-09-28 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934494#comment-14934494
 ] 

Aleksey Yeschenko commented on CASSANDRA-8894:
--

Closing the ticket, as it was committed to 3.0 alpha1 two months ago. Opened 
a separate ticket, CASSANDRA-10407, to follow up with proper tests.

> Our default buffer size for (uncompressed) buffered reads should be smaller, 
> and based on the expected record size
> --
>
> Key: CASSANDRA-8894
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8894
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Benedict
>Assignee: Stefania
>  Labels: benedict-to-commit
> Fix For: 3.0 alpha 1
>
> Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml, 
> screenshot-1.png, screenshot-2.png
>
>
> A large contributor to buffered reads being slower than mmapped ones is likely 
> that we read a full 64Kb at once, when average record sizes may be as low as 
> 140 bytes in our stress tests. The TLB has only 128 entries on a modern core, 
> and each read will touch 32 of these, meaning we will almost never be 
> hitting the TLB and will be incurring at least 30 unnecessary misses each 
> time (as well as the other costs of larger-than-necessary accesses). When 
> working with an SSD there is little to no benefit to reading more than 4Kb at 
> once, and in either case reading more data than we need is wasteful. So, I 
> propose selecting a buffer size that is the next larger power of 2 than our 
> average record size (with a minimum of 4Kb), so that we can expect to read a 
> record in one operation. I also propose that we create a pool of these buffers 
> up-front, and that we ensure they are all exactly aligned to a virtual page, 
> so that the source and target operations each touch exactly one virtual page 
> per 4Kb of expected record size.
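The sizing rule in the last paragraph is simple enough to state as code; a minimal sketch (illustrative only, not the committed implementation):

{code}
public final class BufferSizing
{
    private static final int MIN_BUFFER_SIZE = 4096; // 4Kb floor, as proposed

    /** Next power of two at or above the average record size, with a 4Kb minimum. */
    static int bufferSizeFor(int averageRecordSize)
    {
        if (averageRecordSize <= MIN_BUFFER_SIZE)
            return MIN_BUFFER_SIZE;
        return Integer.highestOneBit(averageRecordSize - 1) << 1;
    }

    public static void main(String[] args)
    {
        System.out.println(bufferSizeFor(140));   // 4096
        System.out.println(bufferSizeFor(5000));  // 8192
        System.out.println(bufferSizeFor(65536)); // 65536
    }
}
{code}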



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-10407) Benchmark and evaluate CASSANDRA-8894 improvements

2015-09-28 Thread Aleksey Yeschenko (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934495#comment-14934495
 ] 

Aleksey Yeschenko commented on CASSANDRA-10407:
---

cc [~enigmacurry]

> Benchmark and evaluate CASSANDRA-8894 improvements
> --
>
> Key: CASSANDRA-10407
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10407
> Project: Cassandra
>  Issue Type: Test
>Reporter: Aleksey Yeschenko
> Fix For: 3.0.0 rc2
>
>
> The original ticket (CASSANDRA-8894) was committed to 3.0 alpha1 two months 
> ago. We need to get proper performance tests before GA.
> See [~benedict]'s 
> [comment|https://issues.apache.org/jira/browse/CASSANDRA-8894?focusedCommentId=14631203=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14631203]
>  for more details.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2015-09-28 Thread Paulo Motta (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934399#comment-14934399
 ] 

Paulo Motta commented on CASSANDRA-7276:


A more elegant approach would be to use logback's 
[MDC|http://logback.qos.ch/manual/mdc.html] feature, which allows thread-local 
context to be added transparently to log statements (similar to the solution 
mentioned by [~odpeer]).

We could add new CF and KS MDC placeholders to the appender layout pattern in 
logback.xml (they will be empty if not set), and set them when necessary. We 
could start by setting them in the following places:
* VerbHandlers, which contain KS and CF info
* Flush
* Compaction

Some helper methods would be nice to provide encapsulated and consistent access 
to MDC. Are you still willing to take this [~nitzanv]?
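A minimal sketch of the MDC idea (illustrative only; the placeholder keys "keyspace" and "table" are assumptions, and they would need matching %X{keyspace}/%X{table} tokens in the logback.xml layout pattern):

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.slf4j.MDC;

public class MdcExample
{
    private static final Logger logger = LoggerFactory.getLogger(MdcExample.class);

    void applyMutation(String keyspace, String table)
    {
        MDC.put("keyspace", keyspace);
        MDC.put("table", table);
        try
        {
            // keyspace/table show up in the output via the layout pattern, not the message
            logger.debug("Applying mutation");
        }
        finally
        {
            // always clear the thread-local context so pooled threads do not leak it
            MDC.remove("keyspace");
            MDC.remove("table");
        }
    }
}
{code}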

> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Nitzan Volman
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2015-09-28 Thread J.P. Eiti Kimura (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934449#comment-14934449
 ] 

J.P. Eiti Kimura commented on CASSANDRA-7276:
-

Very nice suggestion [~pauloricardomg]! We have been using MDC with logback for 
years in our platforms at Movile. It enables us to trace the whole thread 
execution context. I think it is a better approach than what we were thinking 
before :)
[~nitzanv], I think I can help as well with this task. 
[~pauloricardomg] I believe I can start to work on it as you suggested in the 
next few weeks ;) 



> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Nitzan Volman
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (CASSANDRA-7276) Include keyspace and table names in logs where possible

2015-09-28 Thread J.P. Eiti Kimura (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934449#comment-14934449
 ] 

J.P. Eiti Kimura edited comment on CASSANDRA-7276 at 9/29/15 1:14 AM:
--

Very nice suggestion [~pauloricardomg]! We have been using MDC with logback for 
years in our platforms at Movile. It enables us to trace the whole thread 
execution context. I think it is a better approach than what we were thinking 
before :)
[~nitzanv], I think I can help as well with this task. 
[~pauloricardomg] I believe I can start to work on it as you suggested in the 
next few weeks ;) 




was (Author: eitikimura):
Very nice suggestion [~pauloricardomg]! We have been using MDC with logback for 
years you our platforms at Movile. It enable us to trace all the thread 
execution context. I think It is a better approuch than we are thinking before 
:)
[~nitzanv], I think I can help as well with this task. 
[~pauloricardomg] I believe I can start to work on it as you suggested in the 
next few weeks ;) 



> Include keyspace and table names in logs where possible
> ---
>
> Key: CASSANDRA-7276
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7276
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Tyler Hobbs
>Assignee: Nitzan Volman
>Priority: Minor
>  Labels: bootcamp, lhf
> Fix For: 2.1.x
>
> Attachments: 2.1-CASSANDRA-7276-v1.txt, 
> cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt
>
>
> Most error messages and stacktraces give you no clue as to what keyspace or 
> table was causing the problem.  For example:
> {noformat}
> ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java 
> (line 198) Exception in thread Thread[MutationStage:61648,5,main]
> java.lang.IllegalArgumentException
> at java.nio.Buffer.limit(Unknown Source)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98)
> at 
> org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059)
> at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023)
> at 
> edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985)
> at 
> org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328)
> at 
> org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200)
> at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226)
> at org.apache.cassandra.db.Memtable.put(Memtable.java:173)
> at 
> org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368)
> at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333)
> at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206)
> at 
> org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56)
> at 
> org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> {noformat}
> We should try to include info on the keyspace and column family in the error 
> messages or logs whenever possible.  This includes reads, writes, 
> compactions, flushes, repairs, and probably more.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)