[jira] [Updated] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations
[ https://issues.apache.org/jira/browse/CASSANDRA-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-10392: Description: It can be possible to use an external tracing solution in Cassandra by abstracting out the writing of tracing to system_traces tables in the tracing package to separate implementation classes and leaving abstract classes in place that define the interface and behaviour otherwise of C* tracing. Then via a system property "cassandra.custom_tracing_class" the Tracing class implementation could be swapped out with something third party. An example of this is adding Zipkin tracing into Cassandra in the Summit [presentation|]. In addition this patch passes the custom payload through into the tracing session allowing a third party tracing solution like Zipkin to do full-stack tracing from clients through and into Cassandra. was: It can be possible to use an external tracing solution in Cassandra by abstracting out the writing of tracing to system_traces tables in the tracing package to separate implementation classes and leaving abstract classes in place that define the interface and behaviour otherwise of C* tracing. Then via a system property "cassandra.custom_tracing_class" the Tracing class implementation could be swapped out with something third party. An example of this is adding Zipkin tracing into Cassandra in the Summit presentation. In addition this patch passes the custom payload through into the tracing session allowing a third party tracing solution like Zipkin to do full-stack tracing from clients through and into Cassandra. 
> Allow Cassandra to trace to custom tracing implementations > --- > > Key: CASSANDRA-10392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10392 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: mck >Assignee: mck > Fix For: 3.x > > Attachments: 10392-trunk.txt > > > It can be possible to use an external tracing solution in Cassandra by > abstracting out the writing of tracing to system_traces tables in the tracing > package to separate implementation classes and leaving abstract classes in > place that define the interface and behaviour otherwise of C* tracing. > Then via a system property "cassandra.custom_tracing_class" the Tracing class > implementation could be swapped out with something third party. > An example of this is adding Zipkin tracing into Cassandra in the Summit > [presentation|]. > In addition this patch passes the custom payload through into the tracing > session allowing a third party tracing solution like Zipkin to do full-stack > tracing from clients through and into Cassandra. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10392) Allow Cassandra to trace to custom tracing implementations
[ https://issues.apache.org/jira/browse/CASSANDRA-10392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mck updated CASSANDRA-10392: Description: It can be possible to use an external tracing solution in Cassandra by abstracting out the writing of tracing to system_traces tables in the tracing package to separate implementation classes and leaving abstract classes in place that define the interface and behaviour otherwise of C* tracing. Then via a system property "cassandra.custom_tracing_class" the Tracing class implementation could be swapped out with something third party. An example of this is adding Zipkin tracing into Cassandra in the Summit [presentation|http://thelastpickle.com/files/2015-09-24-using-zipkin-for-full-stack-tracing-including-cassandra/presentation/tlp-reveal.js/tlp-cassandra-zipkin.html]. Code for the implemented Zipkin plugin can be found at https://github.com/thelastpickle/cassandra-zipkin-tracing/ In addition this patch passes the custom payload through into the tracing session allowing a third party tracing solution like Zipkin to do full-stack tracing from clients through and into Cassandra. was: It can be possible to use an external tracing solution in Cassandra by abstracting out the writing of tracing to system_traces tables in the tracing package to separate implementation classes and leaving abstract classes in place that define the interface and behaviour otherwise of C* tracing. Then via a system property "cassandra.custom_tracing_class" the Tracing class implementation could be swapped out with something third party. An example of this is adding Zipkin tracing into Cassandra in the Summit [presentation|]. In addition this patch passes the custom payload through into the tracing session allowing a third party tracing solution like Zipkin to do full-stack tracing from clients through and into Cassandra. 
> Allow Cassandra to trace to custom tracing implementations > --- > > Key: CASSANDRA-10392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10392 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: mck >Assignee: mck > Fix For: 3.x > > Attachments: 10392-trunk.txt > > > It can be possible to use an external tracing solution in Cassandra by > abstracting out the writing of tracing to system_traces tables in the tracing > package to separate implementation classes and leaving abstract classes in > place that define the interface and behaviour otherwise of C* tracing. > Then via a system property "cassandra.custom_tracing_class" the Tracing class > implementation could be swapped out with something third party. > An example of this is adding Zipkin tracing into Cassandra in the Summit > [presentation|http://thelastpickle.com/files/2015-09-24-using-zipkin-for-full-stack-tracing-including-cassandra/presentation/tlp-reveal.js/tlp-cassandra-zipkin.html]. > Code for the implemented Zipkin plugin can be found at > https://github.com/thelastpickle/cassandra-zipkin-tracing/ > In addition this patch passes the custom payload through into the tracing > session allowing a third party tracing solution like Zipkin to do full-stack > tracing from clients through and into Cassandra. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
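The pluggable mechanism described in CASSANDRA-10392 can be sketched roughly as follows. Only the system property name "cassandra.custom_tracing_class" comes from the ticket; the {{Tracing}}/{{DefaultTracing}} classes below are simplified stand-ins for Cassandra's actual tracing package, showing how an implementation class could be loaded by name and swapped for a third-party one such as a Zipkin bridge.

```java
// Hypothetical sketch of pluggable tracing: an abstract base class is
// instantiated by reflection from a system property, so a third-party
// implementation can replace the default system_traces writer.
public class TracingLoader {
    public static abstract class Tracing {
        public abstract void trace(String message);
    }

    public static class DefaultTracing extends Tracing {
        @Override
        public void trace(String message) {
            // stand-in for writing to the system_traces tables
            System.out.println("system_traces <- " + message);
        }
    }

    public static Tracing load() {
        String cls = System.getProperty("cassandra.custom_tracing_class",
                                        DefaultTracing.class.getName());
        try {
            return (Tracing) Class.forName(cls).getDeclaredConstructor().newInstance();
        } catch (Exception e) {
            throw new RuntimeException("cannot load tracing class " + cls, e);
        }
    }

    public static void main(String[] args) {
        Tracing tracing = load();  // DefaultTracing unless the property is set
        tracing.trace("execute CQL query");
    }
}
```

Running with -Dcassandra.custom_tracing_class=com.example.ZipkinTracing (a hypothetical class name) would substitute the third-party implementation without code changes.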
[jira] [Updated] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] J.P. Eiti Kimura updated CASSANDRA-7276: Attachment: cassandra-2.1.9-7276.txt new patch added > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tyler Hobbs >Assignee: Nitzan Volman >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt, > cassandra-2.1.9-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > 
org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934590#comment-14934590 ] Stefania commented on CASSANDRA-7392: - bq. Use a dedicated thread to update the timestamp so it isn't impacted by other activities bq. I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. However I'm not even sure why that activity deserved its own thread. I think there was nothing available in some version of C*, but now it could just use ScheduledExecutors. So maybe just a dedicated thread for updating ApproximateTime. I believe approximate time will find more traction over time so it should be reasonably accurate when possible. I've introduced a new periodic SES for fast jobs (sub-microsecond) and moved {{ApproximateTime}} and {{NanoTimeToCurrentTimeMillis}} to it. bq. I think the timestamp field in ApproximateTime needs to be volatile. OK bq. Several properties don't have the "cassandra." prefix Thanks, I accidentally dropped them during the refactoring. bq. By polling the queue when not reporting you are increasing the bound on the number of retained failures and resources pinned by this reporting since aggregation doesn't really aggregate yet. I would just drain the queue when logging. OK bq. I think you want a count of operations that were truncated instead of a boolean so you can log the count. OK bq. Offering into the queue returns a boolean and doesn't throw, which style-wise seems a little nicer, but that is bike shedding. OK bq. More bike shedding, when aggregating I would just allocate the map each time rather than clear it. It's done now since we only drain when reporting, a map is now created only during reporting. bq. I think you should sync logging to the debug log and logging info level to the regular log. Then in the regular log print a count of how many operations timed out since the last time you logged. 
That way it is easy to map between the two when looking at timestamps. I've added number of operations and interval and made the two messages partially identical, is this what you meant by "sync"? Bear in mind that the no spam logger will only log once every 15 minutes however. bq. I don't think this is a correct average calculation. You want a sum and a count. It didn't work for the simple example I did by hand. Done. bq. More bike shedding, you can implement min and max as "oldValue = Math.min(oldValue, nextMeasurement)". OK bq. Can you humor me and for Monitorable boolean checks rename to isXYZ and for things that might change it leave as is? Sure, done. bq. I think failedAt is unused now? No, we still need it when adding a timeout to the same failed operation. bq. If we use approximate time for timeouts can we also use it for setting the construction time? I believe we can, this is however existing functionality that we are changing as it is used by the existing logging of all dropped messages. bq. More bike shedding. The idiom for polling a thread-safe queue is to avoid calling isEmpty() and poll checking for null to avoid extra lock acquisitions (assuming the queue does that) on the queue. Some queues do have cheap(er) isEmpty() calls. OK > Abort in-progress queries that time out > --- > > Key: CASSANDRA-7392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7392 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 3.x > > > Currently we drop queries that time out before we get to them (because node > is overloaded) but not queries that time out while being processed. > (Particularly common for index queries on data that shouldn't be indexed.) > Adding the latter and logging when we have to interrupt one gets us a poor > man's "slow query log" for free. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
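The average/min/max points from the review exchange above can be illustrated with a small sketch. The class and field names here are illustrative, not Cassandra's actual Monitorable/reporting code; it just shows the sum-and-count average and the "oldValue = Math.min(oldValue, nextMeasurement)" idiom quoted in the comment.

```java
// Illustrative aggregation of timed-out operation latencies:
// a correct average needs a running sum and count, and min/max
// use the Math.min/Math.max idiom suggested by the reviewer.
public class TimeoutStats {
    long count = 0, sum = 0;
    long min = Long.MAX_VALUE, max = Long.MIN_VALUE;

    void record(long elapsedMillis) {
        count++;
        sum += elapsedMillis;                 // sum and count, not a rolling average
        min = Math.min(min, elapsedMillis);   // oldValue = Math.min(oldValue, next)
        max = Math.max(max, elapsedMillis);
    }

    double average() {
        return count == 0 ? 0 : (double) sum / count;
    }

    public static void main(String[] args) {
        TimeoutStats s = new TimeoutStats();
        for (long v : new long[] { 120, 80, 100 }) s.record(v);
        System.out.println(s.average() + " " + s.min + " " + s.max); // 100.0 80 120
    }
}
```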
[jira] [Commented] (CASSANDRA-10256) document commitlog segment size's relationship to max write size
[ https://issues.apache.org/jira/browse/CASSANDRA-10256?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933898#comment-14933898 ] Chris Gerlt commented on CASSANDRA-10256: - I have reviewed the attachment above (CASSANDRA-10256.txt [ 12762495 ]) and see no issues with that text. In other words, it looks great. (Please note this is my first contribution as a reviewer so I don't know if I'm supposed to do something other than comment!) > document commitlog segment size's relationship to max write size > > > Key: CASSANDRA-10256 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10256 > Project: Cassandra > Issue Type: Improvement > Components: Config >Reporter: Chris Burroughs >Priority: Trivial > Labels: lhf > Attachments: CASSANDRA-10256.txt > > > This is in the code: > https://github.com/apache/cassandra/blob/cassandra-2.1/src/java/org/apache/cassandra/db/commitlog/CommitLog.java#L57 > But not part of the description in cassandra.yaml -- This message was sent by Atlassian JIRA (v6.3.4#6332)
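For context, the relationship CASSANDRA-10256 wants documented is that a single mutation must fit inside a commit log segment; in the linked CommitLog.java the effective cap works out to half the configured segment size. A toy check of that constraint, assuming the 32 MB default for commitlog_segment_size_in_mb (the class and method here are illustrative, not Cassandra code):

```java
// Sketch of the commitlog-segment-size vs. max-write-size relationship:
// a write larger than half the segment size cannot be committed.
public class CommitLogLimitCheck {
    static final long SEGMENT_SIZE_MB = 32;  // commitlog_segment_size_in_mb default
    static final long MAX_MUTATION_BYTES = SEGMENT_SIZE_MB * 1024 * 1024 / 2;

    static boolean fits(long mutationBytes) {
        return mutationBytes <= MAX_MUTATION_BYTES;
    }

    public static void main(String[] args) {
        System.out.println(fits(10L * 1024 * 1024)); // 10 MB write fits: true
        System.out.println(fits(20L * 1024 * 1024)); // 20 MB write exceeds the 16 MB cap: false
    }
}
```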
[jira] [Updated] (CASSANDRA-10166) Fix failing tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-10166: Assignee: Sylvain Lebresne > Fix failing tests > - > > Key: CASSANDRA-10166 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10166 > Project: Cassandra > Issue Type: Test >Reporter: Sylvain Lebresne >Assignee: Sylvain Lebresne > Fix For: 3.0.0 rc2 > > > Until we find a better way to track those things, this is meant as a master > ticket to track open tickets regarding tests (unit tests and dtests, though at > the time of this writing only dtests are still failing). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-10404) Node to Node encryption transitional mode
Tom Lewis created CASSANDRA-10404: - Summary: Node to Node encryption transitional mode Key: CASSANDRA-10404 URL: https://issues.apache.org/jira/browse/CASSANDRA-10404 Project: Cassandra Issue Type: New Feature Components: Core Reporter: Tom Lewis Create a transitional mode for encryption that allows encrypted and unencrypted traffic node-to-node during a change over to encryption from unencrypted. This alleviates downtime during the switch. This is similar to https://issues.apache.org/jira/browse/CASSANDRA-8803 which is intended for client-to-node -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933634#comment-14933634 ] Joshua McKenzie commented on CASSANDRA-10403: - Adding extra configuration files w/options to switch on launch is something I'd be comfortable with us adding after GA so long as we leave our default alone. For this ticket, let's focus on just determining whether or not we feel reverting from G1 to CMS is appropriate for 3.0, and then move forward on a separate ticket for adding more intelligence to our GC configuration sourcing options. For the record and my .02, I quite like the idea of us having multiple GC profiles out of the box with either logic to switch based on available heap, or via command-line for different expected workloads for instance; I think there's a lot we could do there to make operators' lives easier. [~enigmacurry]: Any update on how that 100x test went? > Consider reverting to CMS GC on 3.0 > --- > > Key: CASSANDRA-10403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10403 > Project: Cassandra > Issue Type: Improvement > Components: Config >Reporter: Joshua McKenzie > Fix For: 3.0.0 rc2 > > > Reference discussion on CASSANDRA-7486. > For smaller heap sizes G1 appears to have some throughput/latency issues when > compared to CMS. With our default max heap size at 8G on 3.0, there's a > strong argument to be made for having CMS as the default for the 3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-10347: Reviewer: Carl Yeksigian > Bulk Loader API could not tolerate even node failure > > > Key: CASSANDRA-10347 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10347 > Project: Cassandra > Issue Type: Bug >Reporter: Shenghua Wan >Assignee: Paulo Motta > > When user uses CqlBulkOutputFormat, it tries to stream to all the nodes in > the token range, which includes the dead nodes. Therefore, the stream failed. > There was a design in C* API to allow stream() method to have a list of > ignore hosts, but it was not utilized. > The empty-argument stream() method is called in all existing versions of C*, > i.e. > in v2.0.11, > https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > in v2.1.5, > https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > and current trunk branch > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-10347: -- Fix Version/s: 3.0.x 2.2.x 2.1.x > Bulk Loader API could not tolerate even node failure > > > Key: CASSANDRA-10347 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10347 > Project: Cassandra > Issue Type: Bug >Reporter: Shenghua Wan >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > When user uses CqlBulkOutputFormat, it tries to stream to all the nodes in > the token range, which includes the dead nodes. Therefore, the stream failed. > There was a design in C* API to allow stream() method to have a list of > ignore hosts, but it was not utilized. > The empty-argument stream() method is called in all existing versions of C*, > i.e. > in v2.0.11, > https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > in v2.1.5, > https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > and current trunk branch > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10406) Nodetool supports to rebuild from specific ranges.
[ https://issues.apache.org/jira/browse/CASSANDRA-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dikang Gu updated CASSANDRA-10406: -- Attachment: CASSANDRA-10406.patch Patch is based on 1.2.19. > Nodetool supports to rebuild from specific ranges. > -- > > Key: CASSANDRA-10406 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10406 > Project: Cassandra > Issue Type: Improvement >Reporter: Dikang Gu >Assignee: Dikang Gu > Fix For: 1.2.x > > Attachments: CASSANDRA-10406.patch > > > Add the 'nodetool rebuildrange' command, so that if `nodetool rebuild` > failed, we do not need to rebuild all the ranges, and can just rebuild those > failed ones. > Should be easily ported to all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shenghua Wan updated CASSANDRA-10347: - Attachment: AbstractBulkRecordWriter.java > Bulk Loader API could not tolerate even node failure > > > Key: CASSANDRA-10347 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10347 > Project: Cassandra > Issue Type: Bug >Reporter: Shenghua Wan >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > Attachments: AbstractBulkRecordWriter.java > > > When user uses CqlBulkOutputFormat, it tries to stream to all the nodes in > the token range, which includes the dead nodes. Therefore, the stream failed. > There was a design in C* API to allow stream() method to have a list of > ignore hosts, but it was not utilized. > The empty-argument stream() method is called in all existing versions of C*, > i.e. > in v2.0.11, > https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > in v2.1.5, > https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > and current trunk branch > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933818#comment-14933818 ] Paulo Motta commented on CASSANDRA-10347: - bq. Isn't using mapreduce.output.bulkoutputformat.maxfailedhosts a better way to do this? Does that not work for this use case? Probably yes, but [~wanshenghua] could tell better, did you try the {{mapreduce.output.bulkoutputformat.maxfailedhosts}} property? Unfortunately I just discovered that property after implementing the new one, my bad. Anyway, I guess the parameters are not mutually exclusive, as you may still want to blacklist nodes that are alive. Since it's already implemented and to be consistent with sstable loader, I think it's still valid to have an {{ignorehosts}} property in addition to {{maxfailedhosts}}. > Bulk Loader API could not tolerate even node failure > > > Key: CASSANDRA-10347 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10347 > Project: Cassandra > Issue Type: Bug >Reporter: Shenghua Wan >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > When user uses CqlBulkOutputFormat, it tries to stream to all the nodes in > the token range, which includes the dead nodes. Therefore, the stream failed. > There was a design in C* API to allow stream() method to have a list of > ignore hosts, but it was not utilized. > The empty-argument stream() method is called in all existing versions of C*, > i.e. > in v2.0.11, > https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > in v2.1.5, > https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > and current trunk branch > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933906#comment-14933906 ] Shenghua Wan commented on CASSANDRA-10347: -- First, thank you for looking into this issue. [~pauloricardomg] To your question, I have not tried the mapreduce.output.bulkoutputformat.maxfailedhosts property. I have read the source code and thought this property only gave up when a certain number of host connections failed. However, I still want the streaming to continue if some hosts are still alive, even when exceeding the threshold. Therefore, to solve the problem of my use case (skip connecting to lost hosts), I have implemented something just like "mapreduce.output.bulkoutputformat.ignorehosts" property, e.g. > Bulk Loader API could not tolerate even node failure > > > Key: CASSANDRA-10347 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10347 > Project: Cassandra > Issue Type: Bug >Reporter: Shenghua Wan >Assignee: Paulo Motta > Fix For: 2.1.x, 2.2.x, 3.0.x > > > When user uses CqlBulkOutputFormat, it tries to stream to all the nodes in > the token range, which includes the dead nodes. Therefore, the stream failed. > There was a design in C* API to allow stream() method to have a list of > ignore hosts, but it was not utilized. > The empty-argument stream() method is called in all existing versions of C*, > i.e. > in v2.0.11, > https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > in v2.1.5, > https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > and current trunk branch > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
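The ignore-hosts behaviour discussed in this thread amounts to filtering blacklisted endpoints out of the set of stream targets before streaming begins. A minimal sketch of that filtering step; the class and method names are hypothetical and stand in for Cassandra's actual bulk record writer, which would read the ignore list from the job configuration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch of the "ignore hosts" idea: before streaming,
// drop any endpoints named in an ignore list so that dead nodes in
// the token range do not fail the whole stream.
public class IgnoreHostsFilter {
    static List<String> filter(List<String> endpoints, Set<String> ignored) {
        List<String> live = new ArrayList<>();
        for (String host : endpoints) {
            if (!ignored.contains(host)) {
                live.add(host);  // only stream to hosts not blacklisted
            }
        }
        return live;
    }

    public static void main(String[] args) {
        List<String> endpoints = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");
        Set<String> ignored = new HashSet<>(Collections.singleton("10.0.0.2"));
        System.out.println(filter(endpoints, ignored)); // [10.0.0.1, 10.0.0.3]
    }
}
```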
[jira] [Resolved] (CASSANDRA-10399) Create default Stress tables without compact storage
[ https://issues.apache.org/jira/browse/CASSANDRA-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] T Jake Luciani resolved CASSANDRA-10399. Resolution: Not A Problem You can just use a yaml file like [this|https://gist.github.com/tjake/3186dec175b015d9f5b] > Create default Stress tables without compact storage > - > > Key: CASSANDRA-10399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10399 > Project: Cassandra > Issue Type: Bug >Reporter: Sebastian Estevez >Priority: Minor > > ~$ cassandra-stress write > {code} > cqlsh> desc TABLE keyspace1.standard1 > CREATE TABLE keyspace1.standard1 ( > key blob PRIMARY KEY, > "C0" blob, > "C1" blob, > "C2" blob, > "C3" blob, > "C4" blob > ) WITH COMPACT STORAGE > AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} > AND compression = {} > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = 'NONE'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9840) global_row_key_cache_test.py fails; loses mutations on cluster restart
[ https://issues.apache.org/jira/browse/CASSANDRA-9840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-9840: --- Assignee: Ariel Weisberg > global_row_key_cache_test.py fails; loses mutations on cluster restart > -- > > Key: CASSANDRA-9840 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9840 > Project: Cassandra > Issue Type: Bug >Reporter: Shawn Kumar >Assignee: Ariel Weisberg >Priority: Blocker > Fix For: 3.0.0 rc2 > > Attachments: node1.log, node2.log, node3.log, noseout.txt > > > This test is currently failing on trunk. I've attached the test output and > logs. It seems that the failure of the test doesn't necessarily have anything > to do with global row/key caches - as on the initial loop of the test > [neither are > used|https://github.com/riptano/cassandra-dtest/blob/master/global_row_key_cache_test.py#L15] > and we still hit failure. The test itself fails when a second validation of > values after a cluster restart fails to capture deletes issued prior to the > restart and first successful validation. However, if I add flushes prior to > restarting the cluster the test completes successfully, implying an issue > with loss of in-memory mutations due to the cluster restart. Initially I had > thought this might be due to CASSANDRA-9669, but as Benedict pointed out, the > fact that this test has been succeeding consistently on both the 2.1 and 2.2 > branches indicates there may be another issue at hand. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933598#comment-14933598 ] Joshua McKenzie commented on CASSANDRA-10403: - To me, the long-term solution of C* having the intelligence to select G1 for heaps over X, CMS for heaps under X makes a lot of sense, assuming test data shows that to be the appropriate solution. I'd argue that what we need to do here is figure out what the sanest recommendation is for a default GC on 3.0, get that set up in our launch scripts (if necessary), and probably include the alternate set of GC configurations in our launch files, commented out, so people can easily swap back and forth based on their needs. > Consider reverting to CMS GC on 3.0 > --- > > Key: CASSANDRA-10403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10403 > Project: Cassandra > Issue Type: Improvement > Components: Config >Reporter: Joshua McKenzie > Fix For: 3.0.0 rc2 > > > Reference discussion on CASSANDRA-7486. > For smaller heap sizes G1 appears to have some throughput/latency issues when > compared to CMS. With our default max heap size at 8G on 3.0, there's a > strong argument to be made for having CMS as the default for the 3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
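The heap-threshold idea above ("G1 for heaps over X, CMS for heaps under X") could be sketched as below. The 16 GB cutoff and the flag strings are placeholders for illustration only; neither this ticket nor the comment fixes a value for X or a flag set.

```java
// Toy illustration of choosing a GC profile from the configured max heap:
// CMS below some cutoff X, G1 above it. The cutoff and flags are
// hypothetical, not a recommendation from the ticket.
public class GcProfileChooser {
    static final long CUTOFF_MB = 16 * 1024;  // placeholder value for "X"

    static String chooseFlags(long maxHeapMb) {
        return maxHeapMb <= CUTOFF_MB
                ? "-XX:+UseConcMarkSweepGC"   // CMS for smaller heaps
                : "-XX:+UseG1GC";             // G1 for larger heaps
    }

    public static void main(String[] args) {
        System.out.println(chooseFlags(8 * 1024));   // 8G default heap -> CMS
        System.out.println(chooseFlags(32 * 1024));  // 32G heap -> G1
    }
}
```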
[jira] [Updated] (CASSANDRA-10405) MV updates should optionally wait for acknowledgement from view replicas
[ https://issues.apache.org/jira/browse/CASSANDRA-10405?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian updated CASSANDRA-10405: --- Issue Type: Improvement (was: Bug) > MV updates should optionally wait for acknowledgement from view replicas > > > Key: CASSANDRA-10405 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10405 > Project: Cassandra > Issue Type: Improvement >Reporter: Carl Yeksigian > Labels: materializedviews > Fix For: 3.x > > > MV updates are currently completely asynchronous in order to provide > parallelism of updates trying to acquire the partition lock. For some use > cases, leaving the MV updates asynchronous is exactly what's needed. > However, there are some use cases where knowing that the update has either > succeeded or failed on the view is necessary, especially when trying to allow > read-your-write behavior. In those cases, we would follow the same code path > as asynchronous writes, but at the end wait on the acknowledgements from the > view replicas before acknowledging our write. This option should be for each > MV separately, since MVs which need the synchronous properties might be mixed > with MV which do not need this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-5780) nodetool status and ring report incorrect/stale information after decommission
[ https://issues.apache.org/jira/browse/CASSANDRA-5780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933745#comment-14933745 ] John Sumsion commented on CASSANDRA-5780: - The only thing I wouldn't want to have happen is to accidentally issue some kind of truncate that in a race condition inadvertently gets replicated to the entire cluster. I don't know the Cassandra codebase well enough to understand whether that risk exists when calling {{ColumnFamilyStore.truncateBlocking()}}. From what I can tell, it's likely pretty safe, because once you get down to StorageService, there is no cross-cluster effect of actions taken at that level. Can anyone who knows better reply about what cross-cluster effects {{truncateBlocking()}} might have? The reason I don't have that concern with the 'system' keyspace is that it is never replicated. Actually, looking into {{ColumnFamilyStore.truncateBlocking()}} makes me think that my proposed changes will blow up half-way through, because a side effect of truncating a table is writing back a "truncated at" record to the 'system.local' table (which we just truncated). I guess I need to run ccm with a locally built Cassandra and try decommissioning to see what happens (not sure how to do that). > nodetool status and ring report incorrect/stale information after decommission > -- > > Key: CASSANDRA-5780 > URL: https://issues.apache.org/jira/browse/CASSANDRA-5780 > Project: Cassandra > Issue Type: Bug > Components: Tools >Reporter: Peter Haggerty >Priority: Trivial > Labels: lhf, ponies, qa-resolved > Fix For: 2.1.x > > > Cassandra 1.2.6 ring of 12 instances, each with 256 tokens. > Decommission 3 of the 12 nodes, one after another resulting in a 9 instance ring. > The 9 instances of cassandra that are in the ring all correctly report > nodetool status information for the ring and have the same data. 
> After the first node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > After the second node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > "nodetool status" on "decommissioned-2nd" reports 10 nodes > After the third node is decommissioned: > "nodetool status" on "decommissioned-1st" reports 11 nodes > "nodetool status" on "decommissioned-2nd" reports 10 nodes > "nodetool status" on "decommissioned-3rd" reports 9 nodes > The storage load information is similarly stale on the various decommissioned > nodes. The nodetool status and ring commands continue to return information > as if they were part of a cluster and they appear to return the last > information that they saw. > In contrast the nodetool info command fails with an exception, which isn't > ideal but at least indicates that there was a failure rather than returning > stale information. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
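The failure mode John Sumsion anticipates in the comment above can be sketched as a toy model (all names here are illustrative, not Cassandra's actual implementation): if truncating any table records a "truncated at" marker back into system.local, then truncating system.local itself immediately repopulates it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the side effect discussed above: truncating a table writes
// a "truncated at" marker back into system.local, so truncating
// system.local leaves it non-empty. Names are illustrative only.
public class TruncateSketch {
    static Map<String, List<String>> tables = new HashMap<>();

    static void truncateBlocking(String table) {
        tables.get(table).clear();                        // drop all rows
        // side effect: record the truncation position in system.local
        tables.get("system.local").add("truncated_at:" + table);
    }

    public static void main(String[] args) {
        tables.put("system.local", new ArrayList<>(List.of("host_id", "tokens")));
        tables.put("system.peers", new ArrayList<>(List.of("peer1", "peer2")));

        truncateBlocking("system.peers");
        System.out.println(tables.get("system.peers").size());  // empty, as expected

        // truncating system.local re-inserts a marker into itself
        truncateBlocking("system.local");
        System.out.println(tables.get("system.local").size());  // not empty
    }
}
```

The second print shows the re-entrancy: the table that was just truncated ends up holding the marker written by its own truncation.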
[jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node
[ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933782#comment-14933782 ] Joel Knighton commented on CASSANDRA-10231: --- I'm continuing to follow-up, but it doesn't look like this patch fixes the issue. I'll try to reproduce this with a dtest again. > Null status entries on nodes that crash during decommission of a different > node > --- > > Key: CASSANDRA-10231 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10231 > Project: Cassandra > Issue Type: Bug >Reporter: Joel Knighton >Assignee: Stefania > Fix For: 3.0.0 rc2 > > > This issue is reproducible through a Jepsen test of materialized views that > crashes and decommissions nodes throughout the test. > In a 5 node cluster, if a node crashes at a certain point (unknown) during > the decommission of a different node, it may start with a null entry for the > decommissioned node like so: > DN 10.0.0.5 ? 256 ? null rack1 > This entry does not get updated/cleared by gossip. This entry is removed upon > a restart of the affected node. > This issue is further detailed in ticket > [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10399) Create default Stress tables without compact storage
[ https://issues.apache.org/jira/browse/CASSANDRA-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933538#comment-14933538 ] T Jake Luciani edited comment on CASSANDRA-10399 at 9/28/15 4:37 PM: - You can just use a yaml file like [this|https://gist.github.com/tjake/3186dec175b015d9f5b9] was (Author: tjake): You can just use a yaml file like [this|https://gist.github.com/tjake/3186dec175b015d9f5b] > Create default Stress tables without compact storage > - > > Key: CASSANDRA-10399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10399 > Project: Cassandra > Issue Type: Bug >Reporter: Sebastian Estevez >Priority: Minor > > ~$ cassandra-stress write > {code} > cqlsh> desc TABLE keyspace1.standard1 > CREATE TABLE keyspace1.standard1 ( > key blob PRIMARY KEY, > "C0" blob, > "C1" blob, > "C2" blob, > "C3" blob, > "C4" blob > ) WITH COMPACT STORAGE > AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} > AND compression = {} > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = 'NONE'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10298) Replaced dead node stayed in gossip forever
[ https://issues.apache.org/jira/browse/CASSANDRA-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933609#comment-14933609 ] Dikang Gu commented on CASSANDRA-10298: --- [~Stefania], yeah, looks like the same issue, have you committed your patches to the 2.1 branch? > Replaced dead node stayed in gossip forever > --- > > Key: CASSANDRA-10298 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10298 > Project: Cassandra > Issue Type: Bug >Reporter: Dikang Gu >Assignee: Dikang Gu > Fix For: 2.1.x > > Attachments: CASSANDRA-10298.patch > > > The dead node stayed in the nodetool status, > DN 10.210.165.55 379.76 GB 256 ? null > And in the log, it throws NPE when trying to remove it. > {code} > 2015-09-10_06:41:22.92453 ERROR 06:41:22 Exception in thread > Thread[GossipStage:1,5,main] > 2015-09-10_06:41:22.92454 java.lang.NullPointerException: null > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:100) > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.db.HintedHandOffManager.deleteHintsForEndpoint(HintedHandOffManager.java:201) > > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.service.StorageService.excise(StorageService.java:1886) > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.service.StorageService.excise(StorageService.java:1902) > 2015-09-10_06:41:22.92456 at > org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1805) > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1473) > > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2099) > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009) > 2015-09-10_06:41:22.92458 at > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1085) > 2015-09-10_06:41:22.92458 at > 
org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > > 2015-09-10_06:41:22.92458 at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > 2015-09-10_06:41:22.92459 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_45] > 2015-09-10_06:41:22.92460 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_45] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
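The NPE above originates from passing a null host ID down into {{UUIDGen.decompose}} while excising the dead node. A defensive shape for a fix, skipping hint deletion when gossip holds no host ID for the endpoint, can be sketched as follows (hypothetical names; this is not the actual patch):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Sketch of a null guard for the stack trace above: if no host ID is known
// for the endpoint being excised, skip hint deletion rather than pass null
// into UUID decomposition. All names are illustrative.
public class ExciseSketch {
    static Map<String, UUID> gossipHostIds = new HashMap<>();

    static String deleteHintsForEndpoint(String endpoint) {
        UUID hostId = gossipHostIds.get(endpoint);
        if (hostId == null)
            return "skipped: no host ID for " + endpoint; // would have thrown NPE
        return "deleted hints for " + hostId;
    }

    public static void main(String[] args) {
        gossipHostIds.put("10.210.165.54", UUID.randomUUID());
        // the replaced dead node has a null host ID entry in gossip
        System.out.println(deleteHintsForEndpoint("10.210.165.55"));
    }
}
```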
[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933599#comment-14933599 ] Paulo Motta commented on CASSANDRA-10403: - +1. As an operator I had some issues with an 8GB heap and G1GC. We should probably make it easy to switch by extracting gc properties to a variable, provide a commented-out option with pre-filled G1 settings, and maybe mention something in the documentation too. > Consider reverting to CMS GC on 3.0 > --- > > Key: CASSANDRA-10403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10403 > Project: Cassandra > Issue Type: Improvement > Components: Config >Reporter: Joshua McKenzie > Fix For: 3.0.0 rc2 > > > Reference discussion on CASSANDRA-7486. > For smaller heap sizes G1 appears to have some throughput/latency issues when > compared to CMS. With our default max heap size at 8G on 3.0, there's a > strong argument to be made for having CMS as the default for the 3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10399) Create default Stress tables without compact storage
[ https://issues.apache.org/jira/browse/CASSANDRA-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933663#comment-14933663 ] Sebastian Estevez edited comment on CASSANDRA-10399 at 9/28/15 5:53 PM: Yes, I can also manually create the table without COMPACT STORAGE. My point was we shouldn't use compact storage by default. was (Author: sebastian.este...@datastax.com): Yes, I can also manually create the table without COMPACT STORAGE. I guess my point was we shouldn't use compact storage by default. > Create default Stress tables without compact storage > - > > Key: CASSANDRA-10399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10399 > Project: Cassandra > Issue Type: Bug >Reporter: Sebastian Estevez >Priority: Minor > > ~$ cassandra-stress write > {code} > cqlsh> desc TABLE keyspace1.standard1 > CREATE TABLE keyspace1.standard1 ( > key blob PRIMARY KEY, > "C0" blob, > "C1" blob, > "C2" blob, > "C3" blob, > "C4" blob > ) WITH COMPACT STORAGE > AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} > AND compression = {} > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = 'NONE'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned
[ https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Knighton updated CASSANDRA-10068: -- Description: This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test. At the conclusion of the test, a batchlog replay is initiated through nodetool and hits the following assertion due to a missing host ID: https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 A nodetool status on the node with failed batchlog replay shows the following entry for the decommissioned node: DN 10.0.0.5 ? 256 ? null rack1 On the unaffected nodes, there is no entry for the decommissioned node as expected. There are occasional hits of the same assertions for logs in other nodes; it looks like the issue might occasionally resolve itself, but one node seems to have the errant null entry indefinitely. In logs for the nodes, this possibly unrelated exception also appears: java.lang.RuntimeException: Trying to get the view natural endpoint on a non-data replica at org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] I have a running cluster with the issue on my machine; it is also repeatable. Nothing stands out in the logs of the decommissioned node (n4) for me. The logs of each node in the cluster are attached. was: This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test. 
At the conclusion of the test, a batchlog replay is initiated through nodetool and hits the following assertion due to a missing host ID: https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 A nodetool status on the node with failed batchlog replay shows the following entry for the decommissioned node: DN 10.0.0.5 ? 256 ? null rack1 On the unaffected nodes, there is no entry for the decommissioned node as expected. There are occasional hits of the same assertions for logs in other nodes; it looks like the issue might occasionally resolve itself, but one node seems to have the errant null entry indefinitely. In logs for the nodes, this possibly unrelated exception also appears: java.lang.RuntimeException: Trying to get the view natural endpoint on a non-data replica at org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] I have a running cluster with the issue on my machine; it is also repeatable. Nothing stands out in the logs of the decommissioned node (n4) for me. The logs of each node in the cluster are attached. > Batchlog replay fails with exception after a node is decommissioned > --- > > Key: CASSANDRA-10068 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10068 > Project: Cassandra > Issue Type: Bug >Reporter: Joel Knighton >Assignee: Branimir Lambov > Attachments: n1.log, n2.log, n3.log, n4.log, n5.log > > > This issue is reproducible through a Jepsen test of materialized views that > crashes and decommissions nodes throughout the test. 
> At the conclusion of the test, a batchlog replay is initiated through > nodetool and hits the following assertion due to a missing host ID: > https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 > A nodetool status on the node with failed batchlog replay shows the following > entry for the decommissioned node: > DN 10.0.0.5 ? 256 ? null > rack1 > On the unaffected nodes, there is no entry for the decommissioned node as > expected. > There are occasional hits of the same assertions for logs in other nodes; it > looks like the issue might occasionally resolve itself, but one node seems to > have the errant null entry indefinitely. > In logs for the nodes, this possibly unrelated exception also appears: > java.lang.RuntimeException: Trying to get the view natural endpoint on a > non-data replica > at > org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) > ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] > I have a running cluster with the issue on my machine; it is also repeatable. > Nothing
[jira] [Updated] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned
[ https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Knighton updated CASSANDRA-10068: -- Description: This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test. At the conclusion of the test, a batchlog replay is initiated through nodetool and hits the following assertion due to a missing host ID: https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 A nodetool status on the node with failed batchlog replay shows the following entry for the decommissioned node: DN 10.0.0.5 ? 256 ? null rack1 On the unaffected nodes, there is no entry for the decommissioned node as expected. There are occasional hits of the same assertions for logs in other nodes; it looks like the issue might occasionally resolve itself, but one node seems to have the errant null entry indefinitely. In logs for the nodes, this possibly unrelated exception also appears: java.lang.RuntimeException: Trying to get the view natural endpoint on a non-data replica at org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] I have a running cluster with the issue on my machine; it is also repeatable. Nothing stands out in the logs of the decommissioned node (n4) for me. The logs of each node in the cluster are attached. was: This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test. 
At the conclusion of the test, a batchlog replay is initiated through nodetool and hits the following assertion due to a missing host ID: https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 A nodetool status on the node with failed batchlog replay shows the following entry for the decommissioned node: DN 10.0.0.5 ? 256 ? null rack1 On the unaffected nodes, there is no entry for the decommissioned node as expected. There are occasional hits of the same assertions for logs in other nodes; it looks like the issue might occasionally resolve itself, but one node seems to have the errant null entry indefinitely. In logs for the nodes, this possibly unrelated exception also appears: java.lang.RuntimeException: Trying to get the view natural endpoint on a non-data replica at org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] I have a running cluster with the issue on my machine; it is also repeatable. Nothing stands out in the logs of the decommissioned node (n4) for me. The logs of each node in the cluster are attached. > Batchlog replay fails with exception after a node is decommissioned > --- > > Key: CASSANDRA-10068 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10068 > Project: Cassandra > Issue Type: Bug >Reporter: Joel Knighton >Assignee: Branimir Lambov > Attachments: n1.log, n2.log, n3.log, n4.log, n5.log > > > This issue is reproducible through a Jepsen test of materialized views that > crashes and decommissions nodes throughout the test. 
> At the conclusion of the test, a batchlog replay is initiated through > nodetool and hits the following assertion due to a missing host ID: > https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 > A nodetool status on the node with failed batchlog replay shows the following > entry for the decommissioned node: > DN 10.0.0.5 ? 256 ? null > rack1 > On the unaffected nodes, there is no entry for the decommissioned node as > expected. > There are occasional hits of the same assertions for logs in other nodes; it > looks like the issue might occasionally resolve itself, but one node seems to > have the errant null entry indefinitely. > In logs for the nodes, this possibly unrelated exception also appears: > java.lang.RuntimeException: Trying to get the view natural endpoint on a > non-data replica > at > org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) > ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] > I have a running cluster with the issue on my machine; it is
[jira] [Resolved] (CASSANDRA-9922) Add Materialized View WHERE schema support
[ https://issues.apache.org/jira/browse/CASSANDRA-9922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Carl Yeksigian resolved CASSANDRA-9922. --- Resolution: Duplicate Fix Version/s: (was: 3.x) This was completed as part of CASSANDRA-9664. > Add Materialized View WHERE schema support > -- > > Key: CASSANDRA-9922 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9922 > Project: Cassandra > Issue Type: Improvement >Reporter: Carl Yeksigian > Labels: materializedviews > > In order to provide forward compatibility with the 3.x series, we should add > schema support for capturing the where clause of the MV. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?
[ https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta reassigned CASSANDRA-9806: -- Assignee: Paulo Motta > some TTL test are failing on trunk: losing data after restart? > --- > > Key: CASSANDRA-9806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9806 > Project: Cassandra > Issue Type: Bug >Reporter: Alan Boudreault >Assignee: Paulo Motta >Priority: Blocker > Fix For: 3.0.0 rc2 > > > ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test and > ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are > failing: > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/ > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/ > After some debugging, I noticed a strange behaviour. It looks like some data > disappear after a node restart, even if the row has no TTL set. Here is a test > example where I see the issue with latest trunk: > https://gist.github.com/aboudreault/94cb552750a186ca853d -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10323) Add more MaterializedView metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-10323: Assignee: Chris Lohfink > Add more MaterializedView metrics > - > > Key: CASSANDRA-10323 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10323 > Project: Cassandra > Issue Type: Improvement >Reporter: T Jake Luciani >Assignee: Chris Lohfink > Labels: lhf > Fix For: 3.0.0 rc2 > > Attachments: trunk-10323.txt > > > We need to add more metrics to help understand where time is spent in > materialized view writes. We currently track the ratio of async base -> view > mutations that fail. > We should also add > * The amount of time spent waiting for the partition lock (contention) > * The amount of time spent reading data > Any others? > [~carlyeks] [~jkni] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933621#comment-14933621 ] Jonathan Shook commented on CASSANDRA-10403: I would be entirely in favor of having a separate settings file that can simply be sourced in. Having several related GC options sprinkled through the -env file is bothersome. This should apply as well to the CMS settings. Perhaps it should even be a soft setting, as long as the possible values are marshaled against any injection. > Consider reverting to CMS GC on 3.0 > --- > > Key: CASSANDRA-10403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10403 > Project: Cassandra > Issue Type: Improvement > Components: Config >Reporter: Joshua McKenzie > Fix For: 3.0.0 rc2 > > > Reference discussion on CASSANDRA-7486. > For smaller heap sizes G1 appears to have some throughput/latency issues when > compared to CMS. With our default max heap size at 8G on 3.0, there's a > strong argument to be made for having CMS as the default for the 3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
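Jonathan's "separate settings file that can simply be sourced in" could take roughly this shape (a sketch only; the file and variable names below are hypothetical, not the actual cassandra-env.sh layout):

```shell
# Sketch of grouped, swappable GC settings. Variable and file names are
# illustrative; this is not Cassandra's actual cassandra-env.sh content.

# Default GC: CMS
JVM_GC_OPTS="-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled"

# Alternative GC: G1 (commented out; uncomment to switch)
# JVM_GC_OPTS="-XX:+UseG1GC -XX:MaxGCPauseMillis=500"

# Let operators override everything from one sourced file, if present
if [ -r "$CASSANDRA_CONF/gc.options.sh" ]; then
    . "$CASSANDRA_CONF/gc.options.sh"
fi

JVM_OPTS="$JVM_OPTS $JVM_GC_OPTS"
```

Keeping all GC flags behind a single variable makes the CMS/G1 swap a one-line change rather than edits sprinkled through the -env file.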
[jira] [Created] (CASSANDRA-10405) MV updates should optionally wait for acknowledgement from view replicas
Carl Yeksigian created CASSANDRA-10405: -- Summary: MV updates should optionally wait for acknowledgement from view replicas Key: CASSANDRA-10405 URL: https://issues.apache.org/jira/browse/CASSANDRA-10405 Project: Cassandra Issue Type: Bug Reporter: Carl Yeksigian Fix For: 3.x MV updates are currently completely asynchronous in order to provide parallelism of updates trying to acquire the partition lock. For some use cases, leaving the MV updates asynchronous is exactly what's needed. However, there are some use cases where knowing that the update has either succeeded or failed on the view is necessary, especially when trying to allow read-your-write behavior. In those cases, we would follow the same code path as asynchronous writes, but at the end wait on the acknowledgements from the view replicas before acknowledging our write. This option should be for each MV separately, since MVs which need the synchronous properties might be mixed with MVs which do not need this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
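The per-MV synchronous option described above can be sketched with a latch: view updates are dispatched asynchronously either way, and only a synchronous MV blocks the client acknowledgement on the view replicas' acks (a toy model with hypothetical names, not the proposed patch):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Toy model of the proposal above: view updates are always dispatched
// asynchronously; when the MV is flagged synchronous we additionally wait
// for the view replicas' acknowledgements before acking the client write.
public class MvWriteSketch {
    static boolean writeWithViews(int viewReplicas, boolean synchronousMv)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(viewReplicas);
        CountDownLatch viewAcks = new CountDownLatch(viewReplicas);
        for (int i = 0; i < viewReplicas; i++)
            pool.submit(viewAcks::countDown);   // simulated view replica ack
        pool.shutdown();
        if (!synchronousMv)
            return true;                        // ack the client immediately
        // synchronous MV: block (with a timeout) on the view replica acks
        return viewAcks.await(5, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(writeWithViews(3, false)); // async MV
        System.out.println(writeWithViews(3, true));  // synchronous MV
    }
}
```

Both calls succeed here; the difference is only whether the client ack waits on the latch, which is exactly the per-MV switch the ticket asks for.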
[jira] [Commented] (CASSANDRA-10347) Bulk Loader API could not tolerate even node failure
[ https://issues.apache.org/jira/browse/CASSANDRA-10347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933716#comment-14933716 ] Jeremiah Jordan commented on CASSANDRA-10347: - Isn't using mapreduce.output.bulkoutputformat.maxfailedhosts a better way to do this? Does that not work for this use case? > Bulk Loader API could not tolerate even node failure > > > Key: CASSANDRA-10347 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10347 > Project: Cassandra > Issue Type: Bug >Reporter: Shenghua Wan >Assignee: Paulo Motta > > When user uses CqlBulkOutputFormat, it tries to stream to all the nodes in > the token range, which includes the dead nodes. Therefore, the stream failed. > There was a design in C* API to allow stream() method to have a list of > ignore hosts, but it was not utilized. > The empty-argument stream() method is called in all existing versions of C*, > i.e. > in v2.0.11, > https://github.com/apache/cassandra/blob/cassandra-2.0.11/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > in v2.1.5, > https://github.com/apache/cassandra/blob/cassandra-2.1.5/src/java/org/apache/cassandra/hadoop/AbstractBulkRecordWriter.java#L122 > and current trunk branch > https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkRecordWriter.java#L241 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
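Whichever knob is used, the underlying idea is the same: filter known-dead endpoints out of the stream plan instead of failing the whole job. A minimal sketch (hypothetical names, not the actual CqlBulkRecordWriter or SSTableLoader API):

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch of the ignore-hosts idea above: drop known-dead endpoints from
// the stream plan rather than failing when one target is down.
// All names are illustrative.
public class StreamSketch {
    static List<String> planStreamTargets(List<String> tokenRangeEndpoints,
                                          Set<String> ignoredHosts) {
        return tokenRangeEndpoints.stream()
                .filter(ep -> !ignoredHosts.contains(ep))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> endpoints = List.of("10.0.0.1", "10.0.0.2", "10.0.0.3");
        Set<String> dead = Set.of("10.0.0.2");
        System.out.println(planStreamTargets(endpoints, dead));
    }
}
```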
[jira] [Updated] (CASSANDRA-10068) Batchlog replay fails with exception after a node is decommissioned
[ https://issues.apache.org/jira/browse/CASSANDRA-10068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joel Knighton updated CASSANDRA-10068: -- Description: This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test. At the conclusion of the test, a batchlog replay is initiated through nodetool and hits the following assertion due to a missing host ID: https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 A nodetool status on the node with failed batchlog replay shows the following entry for the decommissioned node: DN 10.0.0.5 ? 256 ? null rack1 On the unaffected nodes, there is no entry for the decommissioned node as expected. There are occasional hits of the same assertions for logs in other nodes; it looks like the issue might occasionally resolve itself, but one node seems to have the errant null entry indefinitely. In logs for the nodes, this possibly unrelated exception also appears: java.lang.RuntimeException: Trying to get the view natural endpoint on a non-data replica at org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] I have a running cluster with the issue on my machine; it is also repeatable. Nothing stands out in the logs of the decommissioned node (n4) for me. The logs of each node in the cluster are attached. was: This issue is reproducible through a Jepsen test of materialized views that crashes and decommissions nodes throughout the test. 
At the conclusion of the test, a batchlog replay is initiated through nodetool and hits the following assertion due to a missing host ID: https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 A nodetool status on the node with failed batchlog replay shows the following entry for the decommissioned node: DN 10.0.0.5 ? 256 ? null rack1 On the unaffected nodes, there is no entry for the decommissioned node as expected. There are occasional hits of the same assertions for logs in other nodes; it looks like the issue might occasionally resolve itself, but one node seems to have the errant null entry indefinitely. In logs for the nodes, this possibly unrelated exception also appears: java.lang.RuntimeException: Trying to get the view natural endpoint on a non-data replica at org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] I have a running cluster with the issue on my machine; it is also repeatable. Nothing stands out in the logs of the decommissioned node (n4) for me. The logs of each node in the cluster are attached. > Batchlog replay fails with exception after a node is decommissioned > --- > > Key: CASSANDRA-10068 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10068 > Project: Cassandra > Issue Type: Bug >Reporter: Joel Knighton >Assignee: Branimir Lambov > Attachments: n1.log, n2.log, n3.log, n4.log, n5.log > > > This issue is reproducible through a Jepsen test of materialized views that > crashes and decommissions nodes throughout the test. 
> At the conclusion of the test, a batchlog replay is initiated through > nodetool and hits the following assertion due to a missing host ID: > https://github.com/apache/cassandra/blob/3413e557b95d9448b0311954e9b4f53eaf4758cd/src/java/org/apache/cassandra/service/StorageProxy.java#L1197 > A nodetool status on the node with failed batchlog replay shows the following > entry for the decommissioned node: > DN 10.0.0.5 ? 256 ? null > rack1 > On the unaffected nodes, there is no entry for the decommissioned node as > expected. > There are occasional hits of the same assertions for logs in other nodes; it > looks like the issue might occasionally resolve itself, but one node seems to > have the errant null entry indefinitely. > In logs for the nodes, this possibly unrelated exception also appears: > java.lang.RuntimeException: Trying to get the view natural endpoint on a > non-data replica > at > org.apache.cassandra.db.view.MaterializedViewUtils.getViewNaturalEndpoint(MaterializedViewUtils.java:91) > ~[apache-cassandra-3.0.0-alpha1-SNAPSHOT.jar:3.0.0-alpha1-SNAPSHOT] > I have a running cluster with the issue
[jira] [Commented] (CASSANDRA-10399) Create default Stress tables without compact storage
[ https://issues.apache.org/jira/browse/CASSANDRA-10399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933663#comment-14933663 ] Sebastian Estevez commented on CASSANDRA-10399: --- Yes, I can also manually create the table without COMPACT STORAGE. I guess my point was we shouldn't use compact storage by default. > Create default Stress tables without compact storage > - > > Key: CASSANDRA-10399 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10399 > Project: Cassandra > Issue Type: Bug >Reporter: Sebastian Estevez >Priority: Minor > > ~$ cassandra-stress write > {code} > cqlsh> desc TABLE keyspace1.standard1 > CREATE TABLE keyspace1.standard1 ( > key blob PRIMARY KEY, > "C0" blob, > "C1" blob, > "C2" blob, > "C3" blob, > "C4" blob > ) WITH COMPACT STORAGE > AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}' > AND comment = '' > AND compaction = {'class': > 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'} > AND compression = {} > AND dclocal_read_repair_chance = 0.1 > AND default_time_to_live = 0 > AND gc_grace_seconds = 864000 > AND max_index_interval = 2048 > AND memtable_flush_period_in_ms = 0 > AND min_index_interval = 128 > AND read_repair_chance = 0.0 > AND speculative_retry = 'NONE'; > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
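For reference, the schema being asked for is simply the same DDL with the COMPACT STORAGE clause dropped (a sketch of the suggested default, not the actual stress-tool patch):

{code}
CREATE TABLE keyspace1.standard1 (
    key blob PRIMARY KEY,
    "C0" blob,
    "C1" blob,
    "C2" blob,
    "C3" blob,
    "C4" blob
);
{code}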
[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933983#comment-14933983 ] Ariel Weisberg commented on CASSANDRA-7392: --- * Use a dedicated thread to update the timestamp so it isn't impacted by other activities * I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. However I'm not even sure why that activity deserved its own thread. I think there was nothing available in some version of C*, but now it could just use ScheduledExecutors. So maybe just a dedicated thread for updating ApproximateTime. I believe approximate time will find more traction over time so it should be reasonably accurate when possible. * I think the timestamp field in ApproximateTime needs to be volatile. * Several properties don't have the "cassandra." prefix * By polling the queue when not reporting you are increasing the bound on the number of retained failures and resources pinned by this reporting since aggregation doesn't really aggregate yet. I would just drain the queue when logging. * I think you want a count of operations that were truncated instead of a boolean so you can log the count. * [Offering into the queue returns a boolean and doesn't throw, which style-wise seems a little nicer, but that is bike shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126] * More bike shedding, when aggregating I would just allocate the map each time rather than clear it. * I think you should sync logging to the debug log and logging info level to the regular log. Then in the regular log print a count of how many operations timed out since the last time you logged. That way it is easy to map between the two when looking at timestamps. * [I don't think this is a correct average calculation. You want a sum and a count. 
It didn't work for the simple example I did by hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257] * [More bike shedding, you can implement min and max as "oldValue = Math.min(oldValue, nextMeasurement)".|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R259] * [Can you humor me and for Monitorable boolean checks rename to isXYZ and for things that might change it leave as is?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2578da7d6bbdd276157604856543cbecR53] * [I think failedAt is unused now?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R223] > Abort in-progress queries that time out > --- > > Key: CASSANDRA-7392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7392 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 3.x > > > Currently we drop queries that time out before we get to them (because node > is overloaded) but not queries that time out while being processed. > (Particularly common for index queries on data that shouldn't be indexed.) > Adding the latter and logging when we have to interrupt one gets us a poor > man's "slow query log" for free. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
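The averaging and min/max bullets in the review above can be sketched in isolation. The class below is hypothetical (not the code under review); it just shows the sum-plus-count average and the Math.min/Math.max folding idiom the reviewer suggests:

```java
// Hypothetical summary class illustrating the review points: keep a
// sum and a count so the average is computed correctly, and fold
// min/max with Math.min/Math.max instead of branching.
public class LatencySummary {
    private long sum;
    private long count;
    private long min = Long.MAX_VALUE;
    private long max = Long.MIN_VALUE;

    public void record(long measurement) {
        sum += measurement;   // sum and count, not a running average
        count++;
        min = Math.min(min, measurement);
        max = Math.max(max, measurement);
    }

    public double average() {
        return count == 0 ? 0.0 : (double) sum / count;
    }

    public long min() { return min; }
    public long max() { return max; }
}
```

Recording 10, 20 and 30 yields an average of 20, a min of 10 and a max of 30.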
[jira] [Updated] (CASSANDRA-10323) Add more MaterializedView metrics
[ https://issues.apache.org/jira/browse/CASSANDRA-10323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chris Lohfink updated CASSANDRA-10323: -- Attachment: trunk-10323-v2.txt > Add more MaterializedView metrics > - > > Key: CASSANDRA-10323 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10323 > Project: Cassandra > Issue Type: Improvement >Reporter: T Jake Luciani >Assignee: Chris Lohfink > Labels: lhf > Fix For: 3.0.0 rc2 > > Attachments: trunk-10323-v2.txt, trunk-10323.txt > > > We need to add more metrics to help understand where time is spent in > materialized view writes. We currently track the ratio of async base -> view > mutations that fail. > We should also add > * The amount of time spent waiting for the partition lock (contention) > * The amount of time spent reading data > Any others? > [~carlyeks] [~jkni] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
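A rough sketch of the two timings proposed in the ticket, with hypothetical names (the real change is in the attached patches and would use Cassandra's existing metrics plumbing):

```java
// Hypothetical accumulator for the two timings proposed in this
// ticket: time spent waiting for the partition lock (contention) and
// time spent reading data during a materialized view write.
public class ViewWriteTimings {
    private long lockWaitNanos;
    private long readNanos;

    // Time an arbitrary step with System.nanoTime and return the elapsed nanos.
    public static long time(Runnable step) {
        long start = System.nanoTime();
        step.run();
        return System.nanoTime() - start;
    }

    public void addLockWait(long nanos) { lockWaitNanos += nanos; }
    public void addRead(long nanos) { readNanos += nanos; }
    public long lockWaitNanos() { return lockWaitNanos; }
    public long readNanos() { return readNanos; }
}
```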
[jira] [Commented] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934099#comment-14934099 ] Ariel Weisberg commented on CASSANDRA-7392: --- Sorry noticed one more thing. Not editing because it drives observers crazy. * [More bike shedding. The idiom for polling a thread safe queue is to avoid calling isEmpty() and poll checking for null to avoid extra lock acquisitions (assuming the queue does that) on the queue.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R151]. Some queues do have cheap(er) isEmpty() calls. > Abort in-progress queries that time out > --- > > Key: CASSANDRA-7392 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7392 > Project: Cassandra > Issue Type: New Feature > Components: Core >Reporter: Jonathan Ellis >Assignee: Stefania >Priority: Critical > Fix For: 3.x > > > Currently we drop queries that time out before we get to them (because node > is overloaded) but not queries that time out while being processed. > (Particularly common for index queries on data that shouldn't be indexed.) > Adding the latter and logging when we have to interrupt one gets us a poor > man's "slow query log" for free. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
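The idiom described in the comment, written out generically rather than as it appears in the branch, looks like this:

```java
import java.util.Queue;

// Drain a queue with the idiom from the comment: call poll() and
// check for null rather than guarding each iteration with isEmpty(),
// which on many concurrent queues costs an extra traversal or lock
// acquisition per loop.
public class QueueDrain {
    public static int drain(Queue<String> queue) {
        int drained = 0;
        String item;
        while ((item = queue.poll()) != null) {
            drained++; // a real consumer would process 'item' here
        }
        return drained;
    }
}
```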
[jira] [Comment Edited] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933983#comment-14933983 ] Ariel Weisberg edited comment on CASSANDRA-7392 at 9/28/15 9:02 PM: * Use a dedicated thread to update the timestamp so it isn't impacted by other activities * I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. However I'm not even sure why that activity deserved it's own thread. I think there was nothing available in some version of C*, but now it could just use ScheduledExecutors. So maybe just a dedicated thread for updating ApproximateTime. I believe approximate time will find more traction over time so it should be reasonably accurate when possible. * I think the timestamp field in ApproximateTime needs to be volatile. * Several properties don't have the "cassandra." prefix * By polling the queue when not reporting you are increasing the bound on the number of retained failures and resources pinned by this reporting since aggregation doesn't really aggregate yet. I would just drain the queue when logging. * I think you want a count of operations that were truncated instead of a boolean so you can log the count. * [Offering into the queue returns a boolean and doesn't throw, which style wise seems a little nicer, but that is bike shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126] * More bike shedding, when aggregating I would just allocate the map each time rather than clear it. * I think you should sync logging to the debug log and logging info level to the regular log. Then in the regular log print a count of how many operations timed out since the last time you logged. That way it is easy to map between the two when looking at timestamps. * [I don't think this is a correct average calculation. You want a sum and a count. 
I didn't work for the simple example I did by hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257] * [More bike shedding, you can implement min and max as "oldValue = Math.min(oldValue, nextMeasurement)".|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R259] * [Can you humor me and for Monitorable boolean checks rename to isXYZ and for things that might change it leave as is?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2578da7d6bbdd276157604856543cbecR53] * [I think failedAt is unused now?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R223] was (Author: aweisberg): * Use a dedicated thread to update the timestamp so it isn't impacted by other activities * I was going to suggest use the thread used by NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. However I'm not even sure why that activity deserved it's own thread. I think there was nothing available in some version of C*, but now it could just use ScheduledExecutors. So maybe just a dedicated thread for updating ApproximateTime. I believe approximate time will find more traction over time so it should be reasonably accurate when possible. * I think the timestamp field in ApproximateTime needs to be volatile. * Several properties don't have the "cassandra." prefix * By polling the queue when not reporting you are increasing the bound on the number of retained failures and resources pinned by this reporting since aggregation doesn't really aggregate yet. I would just drain the queue when logging. * I think you want a count of operations that were truncated instead of a boolean so you can log the count. 
* [Offering into the queue returns a boolean and doesn't throw, which style wise seems a little nicer, but that is bike shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126] * More bike shedding, when aggregating I would just allocate the map each time rather than clear it. * I think you should sync logging to the debug log and logging info level to the regular log. Then in the regular log print a count of how many operations timed out since the last time you logged. That way it is easy to map between the two when looking at timestamps. * [I don't think this is a correct average calculation. You want a sum and a count. I didn't work for the simple example I did by hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257] * [More bike shedding, you can implement min and max as "oldValue = Math.min(oldValue,
[jira] [Comment Edited] (CASSANDRA-7392) Abort in-progress queries that time out
[ https://issues.apache.org/jira/browse/CASSANDRA-7392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933983#comment-14933983 ] Ariel Weisberg edited comment on CASSANDRA-7392 at 9/28/15 9:11 PM: * Use a dedicated thread to update the timestamp so it isn't impacted by other activities * I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. However I'm not even sure why that activity deserved it's own thread. I think there was nothing available in some version of C*, but now it could just use ScheduledExecutors. So maybe just a dedicated thread for updating ApproximateTime. I believe approximate time will find more traction over time so it should be reasonably accurate when possible. * I think the timestamp field in ApproximateTime needs to be volatile. * Several properties don't have the "cassandra." prefix * By polling the queue when not reporting you are increasing the bound on the number of retained failures and resources pinned by this reporting since aggregation doesn't really aggregate yet. I would just drain the queue when logging. * I think you want a count of operations that were truncated instead of a boolean so you can log the count. * [Offering into the queue returns a boolean and doesn't throw, which style wise seems a little nicer, but that is bike shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126] * More bike shedding, when aggregating I would just allocate the map each time rather than clear it. * I think you should sync logging to the debug log and logging info level to the regular log. Then in the regular log print a count of how many operations timed out since the last time you logged. That way it is easy to map between the two when looking at timestamps. * [I don't think this is a correct average calculation. You want a sum and a count. 
I didn't work for the simple example I did by hand.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R257] * [More bike shedding, you can implement min and max as "oldValue = Math.min(oldValue, nextMeasurement)".|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R259] * [Can you humor me and for Monitorable boolean checks rename to isXYZ and for things that might change it leave as is?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2578da7d6bbdd276157604856543cbecR53] * [I think failedAt is unused now?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R223] * [If we use approximate time for timeouts can we also use it for setting the construction time?|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-2603bbfead4cdd58e1e08b225338bda0R28] was (Author: aweisberg): * Use a dedicated thread to update the timestamp so it isn't impacted by other activities * I was going to suggest using the thread used by NanoTimeToCurrentTimeMillis, so make it an SES and schedule the work there. However I'm not even sure why that activity deserved it's own thread. I think there was nothing available in some version of C*, but now it could just use ScheduledExecutors. So maybe just a dedicated thread for updating ApproximateTime. I believe approximate time will find more traction over time so it should be reasonably accurate when possible. * I think the timestamp field in ApproximateTime needs to be volatile. * Several properties don't have the "cassandra." prefix * By polling the queue when not reporting you are increasing the bound on the number of retained failures and resources pinned by this reporting since aggregation doesn't really aggregate yet. I would just drain the queue when logging. 
* I think you want a count of operations that were truncated instead of a boolean so you can log the count. * [Offering into the queue returns a boolean and doesn't throw, which style wise seems a little nicer, but that is bike shedding.|https://github.com/apache/cassandra/compare/cassandra-3.0...stef1927:7392-3.0#diff-e06002c30313f8ead63ee472617d1b10R126] * More bike shedding, when aggregating I would just allocate the map each time rather than clear it. * I think you should sync logging to the debug log and logging info level to the regular log. Then in the regular log print a count of how many operations timed out since the last time you logged. That way it is easy to map between the two when looking at timestamps. * [I don't think this is a correct average calculation. You want a sum and a count. I didn't work for the simple example I did by
[jira] [Commented] (CASSANDRA-10379) Consider using -XX:+TrustFinalNonStaticFields
[ https://issues.apache.org/jira/browse/CASSANDRA-10379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933365#comment-14933365 ] Jonathan Ellis commented on CASSANDRA-10379: Sounds reasonable. AFAIK we're not doing any reflection tricks to defeat final-ness. I found this thread on the flag, FWIW: http://mail.openjdk.java.net/pipermail/hotspot-compiler-dev/2015-April/017698.html > Consider using -XX:+TrustFinalNonStaticFields > - > > Key: CASSANDRA-10379 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10379 > Project: Cassandra > Issue Type: Improvement >Reporter: Robert Stupp > Fix For: 3.x > > > The JVM option {{-XX:+TrustFinalNonStaticFields}}, although experimental, > seems to improve performance a bit without any code change. Therefore I > propose to include it in {{cassandra-env.sh/psl}}. > [cstar perf > benchmark|http://cstar.datastax.com/graph?stats=a6e75018-5ff4-11e5-bf84-42010af0688f=op_rate=1_user=1_aggregates=true=0=865.59=0=145568.5] > The cstar test was run with 8u45. > {noformat} > JVM_OPTS="$JVM_OPTS -XX:+UnlockExperimentalVMOptions" > JVM_OPTS="$JVM_OPTS -XX:+TrustFinalNonStaticFields" > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10401) json2sstable fails with NPE
[ https://issues.apache.org/jira/browse/CASSANDRA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jose Martinez Poblete updated CASSANDRA-10401: -- Description: We have the following table... {noformat} CREATE TABLE keyspace_name.table_name ( col1 text, col2 text, col3 text, col4 text, PRIMARY KEY ((col1, col2), col3) ) WITH CLUSTERING ORDER BY (col3 ASC) {noformat} And the following json in a file created from sstable2json tool {noformat} [ {"key": "This is col1:This is col2", "cells": [["This is col3:","",1443217787319002], ["This is col3:col4","This is col4",1443217787319002]]} ] {noformat} Let's say we deleted that record from the DB and wanted to bring it back. If we try to create an sstable from this data in a json file named test_file.json, we get an NPE {noformat} -bash-4.1$ json2sstable -K elp -c table_name-3264cbe063c211e5bc34e746786b7b29 test_file.json /var/lib/cassandra/data/keyspace_name/table_name-3264cbe063c211e5bc34e746786b7b29/keyspace_name-table_name-ka-1-Data.db Importing 1 keys... java.lang.NullPointerException at org.apache.cassandra.tools.SSTableImport.getKeyValidator(SSTableImport.java:442) at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:316) at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:287) at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:514) ERROR: null -bash-4.1$ {noformat} was: We have the following table... 
{noformat} CREATE TABLE elp.document ( business_area_ct text, business_id text, document_id text, access_level_ct text, annotation_tx text, author_nm text, business_id_type_ct text, cms_id text, direction_ct text, document_code_id uuid, file_metadata_map_nm map, last_mod_ts timestamp, last_mod_user_id text, official_document_ts timestamp, repository_logical_package_no int, repository_offset_no int, repository_package_handle_user_id text, repository_package_nm text, repository_package_sequence_no int, repository_procedural_nm text, review_complete_in boolean, source_system_document_id text, source_system_nm text, status_cd text, vendor_nm text, PRIMARY KEY ((business_area_ct, business_id), document_id) ) WITH CLUSTERING ORDER BY (document_id ASC) {noformat} And the following json in a file created from sstable2json tool {noformat} [ {"key": "This is business_area_ct:This is business_id", "cells": [["This is document_id:","",1443217787319002], ["This is document_id:author_nm","This is autor_nm",1443217787319002]]} ] {noformat} Let's say we deleted that record form the DB and wanted to bring it back If we try to create an sstable from this json file, get a NPE {noformat} -bash-4.1$ json2sstable -K elp -c document-3264cbe063c211e5bc34e746786b7b29 test2.json /var/lib/cassandra/data/elp/document-3264cbe063c211e5bc34e746786b7b29/elp-document-ka-1-Data.db Importing 1 keys... 
java.lang.NullPointerException at org.apache.cassandra.tools.SSTableImport.getKeyValidator(SSTableImport.java:442) at org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:316) at org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:287) at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:514) ERROR: null -bash-4.1$ {noformat} > json2sstable fails with NPE > --- > > Key: CASSANDRA-10401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10401 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: Cassandra 2.1.8.621 >Reporter: Jose Martinez Poblete > > We have the following table... > {noformat} > CREATE TABLE keyspace_name.table_name ( > col1 text, > col2 text, > col3 text, > col4 text, > PRIMARY KEY ((col1, col2), col3) > ) WITH CLUSTERING ORDER BY (col3 ASC) > {noformat} > And the following json in a file created from sstable2json tool > {noformat} > [ > {"key": "This is col1:This is col2", > "cells": [["This is col3:","",1443217787319002], >["This is col3:col4","This is col4",1443217787319002]]} > ] > {noformat} > Let's say we deleted that record from the DB and wanted to bring it back. > If we try to create an sstable from this data in a json file named > test_file.json, we get an NPE > {noformat} > -bash-4.1$ json2sstable -K elp -c table_name-3264cbe063c211e5bc34e746786b7b29 > test_file.json > /var/lib/cassandra/data/keyspace_name/table_name-3264cbe063c211e5bc34e746786b7b29/keyspace_name-table_name-ka-1-Data.db > Importing 1 keys... > java.lang.NullPointerException > at > org.apache.cassandra.tools.SSTableImport.getKeyValidator(SSTableImport.java:442) > at >
[jira] [Created] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0
Joshua McKenzie created CASSANDRA-10403: --- Summary: Consider reverting to CMS GC on 3.0 Key: CASSANDRA-10403 URL: https://issues.apache.org/jira/browse/CASSANDRA-10403 Project: Cassandra Issue Type: Improvement Components: Config Reporter: Joshua McKenzie Fix For: 3.0.0 rc2 Reference discussion on CASSANDRA-7486. For smaller heap sizes G1 appears to have some throughput/latency issues when compared to CMS. With our default max heap size at 8G on 3.0, there's a strong argument to be made for having CMS as the default for the 3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10276) With DTCS, do STCS in windows if more than max_threshold sstables
[ https://issues.apache.org/jira/browse/CASSANDRA-10276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1490#comment-1490 ] Björn Hegerfors commented on CASSANDRA-10276: - I have one comment on this patch. If for some reason the STCS won't find something to compact even though bucket.size() > maxThreshold (sounds unlikely with default STCS options), then just skipping that window might render that window uncompacted for all eternity. IMO, rather than trying the next window, as in the latest commit, why not just return the maxThreshold smallest SSTables from the bucket? > With DTCS, do STCS in windows if more than max_threshold sstables > - > > Key: CASSANDRA-10276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10276 > Project: Cassandra > Issue Type: Sub-task > Components: Core >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson > Fix For: 3.x, 2.1.x, 2.2.x > > > To avoid constant recompaction of files in big ( > max threshold) DTCS > windows, we should do STCS of those files. > Patch here: https://github.com/krummas/cassandra/commits/marcuse/dtcs_stcs -- This message was sent by Atlassian JIRA (v6.3.4#6332)
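The alternative proposed in the comment, returning the maxThreshold smallest SSTables from the oversized bucket instead of skipping the window, can be sketched as follows (plain file sizes stand in for real SSTableReader objects; this is not the linked patch):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Pick the maxThreshold smallest entries from a bucket, so an
// oversized DTCS window always gets some compaction work instead of
// being left uncompacted when STCS declines to act on it.
public class WindowFallback {
    public static List<Long> smallest(List<Long> bucketSizes, int maxThreshold) {
        List<Long> sorted = new ArrayList<>(bucketSizes);
        sorted.sort(Comparator.naturalOrder());
        return sorted.subList(0, Math.min(maxThreshold, sorted.size()));
    }
}
```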
[jira] [Updated] (CASSANDRA-9428) Implement hints compression
[ https://issues.apache.org/jira/browse/CASSANDRA-9428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie updated CASSANDRA-9428: --- Fix Version/s: (was: 3.0.0 rc2) 3.0.x > Implement hints compression > --- > > Key: CASSANDRA-9428 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9428 > Project: Cassandra > Issue Type: Sub-task >Reporter: Aleksey Yeschenko >Assignee: Joshua McKenzie > Fix For: 3.0.x > > > CASSANDRA-6230 is being implemented with compression in mind, but it's not > going to be implemented by the original ticket. > Adding it on top should be relatively straight-forward, and important, since > there are several users in the wild that use compression interface for > encryption purposes. DSE is one of them (but isn't the only one). Losing > encryption capabilities would be a regression. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10403) Consider reverting to CMS GC on 3.0
[ https://issues.apache.org/jira/browse/CASSANDRA-10403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933505#comment-14933505 ] Jonathan Shook commented on CASSANDRA-10403: Can we get some G1 tests with a 24+G heap to see if it's worth making this machine-specific? The notion of "commodity" changes with time. The settings need to adapt if possible. > Consider reverting to CMS GC on 3.0 > --- > > Key: CASSANDRA-10403 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10403 > Project: Cassandra > Issue Type: Improvement > Components: Config >Reporter: Joshua McKenzie > Fix For: 3.0.0 rc2 > > > Reference discussion on CASSANDRA-7486. > For smaller heap sizes G1 appears to have some throughput/latency issues when > compared to CMS. With our default max heap size at 8G on 3.0, there's a > strong argument to be made for having CMS as the default for the 3.0 release. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10166) Fix failing tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aleksey Yeschenko updated CASSANDRA-10166: -- Summary: Fix failing tests (was: Failing tests on cassandra 3.0 branch) > Fix failing tests > - > > Key: CASSANDRA-10166 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10166 > Project: Cassandra > Issue Type: Test >Reporter: Sylvain Lebresne > Fix For: 3.0.0 rc2 > > > Until we find a better way to track those things, this is meant as a master > ticket to track tickets open regarding tests (unit test and dtests, though at > the time of this writing only dtest are still failing) that are still > failing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-7486) Migrate to G1GC by default
[ https://issues.apache.org/jira/browse/CASSANDRA-7486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joshua McKenzie resolved CASSANDRA-7486. Resolution: Fixed Opened CASSANDRA-10403 to cover profiling and possible revert to CMS. > Migrate to G1GC by default > -- > > Key: CASSANDRA-7486 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7486 > Project: Cassandra > Issue Type: New Feature > Components: Config >Reporter: Jonathan Ellis > Fix For: 3.0 alpha 1 > > > See > http://www.slideshare.net/MonicaBeckwith/garbage-first-garbage-collector-g1-7486gc-migration-to-expectations-and-advanced-tuning > and https://twitter.com/rbranson/status/482113561431265281 > May want to default 2.1 to G1. > 2.1 is a different animal from 2.0 after moving most of memtables off heap. > Suspect this will help G1 even more than CMS. (NB this is off by default but > needs to be part of the test.) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933382#comment-14933382 ] Yuki Morishita commented on CASSANDRA-7276: --- Sure. Though attached patches need to work a bit more. Or maybe consider using logging context as suggested before. > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tyler Hobbs >Assignee: Nitzan Volman >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > 
org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
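A minimal illustration of the request, using a hypothetical helper rather than the attached patches: every message relating to a table carries its keyspace and table name, so a stacktrace like the one above can be attributed to a specific table.

```java
// Hypothetical helper: prefix a log or exception message with the
// keyspace and table it concerns.
public class TableLogContext {
    public static String withTable(String keyspace, String table, String message) {
        return String.format("[%s.%s] %s", keyspace, table, message);
    }
}
```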
[jira] [Updated] (CASSANDRA-9774) fix sstableverify dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-9774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sylvain Lebresne updated CASSANDRA-9774: Assignee: Jeff Jirsa > fix sstableverify dtest > --- > > Key: CASSANDRA-9774 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9774 > Project: Cassandra > Issue Type: Bug >Reporter: Jim Witschey >Assignee: Jeff Jirsa >Priority: Blocker > Fix For: 3.0.0 rc2 > > > One of our dtests for {{sstableverify}} > ({{offline_tools_test.py:TestOfflineTools.sstableverify_test}}) is failing > hard on trunk ([cassci > history|http://cassci.datastax.com/view/trunk/job/trunk_dtest/lastCompletedBuild/testReport/offline_tools_test/TestOfflineTools/sstableverify_test/history/]) > The way the test works is by deleting an SSTable, then running > {{sstableverify}} on its table. In earlier versions, it successfully detects > this problem and outputs that it "was not released before the reference was > garbage collected". The test no longer finds this string in the output; > looking through the output of the test, it doesn't look like it reports any > problems at all. > EDIT: After digging into the C* source a bit, I may have misattributed the > problem to {{sstableverify}}; this could be a more general memory management > problem, as the error text expected in the dtest is emitted by part of the > {{Ref}} implementation: > https://github.com/apache/cassandra/blob/075ff5000ced24b42f3b540815cae471bee4049d/src/java/org/apache/cassandra/utils/concurrent/Ref.java#L187 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933377#comment-14933377 ] Paulo Motta commented on CASSANDRA-7276: [~yukim] mind if I take this for review as it might be related to the recent logging changes? > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tyler Hobbs >Assignee: Nitzan Volman >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > 
org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
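The improvement requested here — carrying keyspace and table context into error messages — amounts to wrapping low-level failures with that context before they surface. A minimal hedged sketch follows; `TableContextException` and `applyWithContext` are hypothetical names for illustration, not part of the attached patches.

```java
// Illustrative only: wrap a low-level failure with the keyspace and table
// it occurred in, so operators see "ks.cf" instead of a bare stack trace.
public class TableContextException extends RuntimeException {
    public TableContextException(String keyspace, String table, Throwable cause) {
        super("Error applying mutation to " + keyspace + "." + table, cause);
    }

    // Hypothetical call site on the mutation path: catch and re-wrap with context.
    public static void applyWithContext(String keyspace, String table, Runnable mutation) {
        try {
            mutation.run();
        } catch (RuntimeException e) {
            throw new TableContextException(keyspace, table, e);
        }
    }

    public static void main(String[] args) {
        try {
            applyWithContext("ks1", "cf1", () -> { throw new IllegalArgumentException("bad limit"); });
        } catch (TableContextException e) {
            System.out.println(e.getMessage()); // prints: Error applying mutation to ks1.cf1
        }
    }
}
```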
[jira] [Updated] (CASSANDRA-10401) json2sstable fails with NPE
[ https://issues.apache.org/jira/browse/CASSANDRA-10401?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Philip Thompson updated CASSANDRA-10401: Tester: Jim Witschey > json2sstable fails with NPE > --- > > Key: CASSANDRA-10401 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10401 > Project: Cassandra > Issue Type: Bug > Components: Tools > Environment: Cassandra 2.1.8.621 >Reporter: Jose Martinez Poblete > > We have the following table... > {noformat} > CREATE TABLE elp.document ( > business_area_ct text, > business_id text, > document_id text, > access_level_ct text, > annotation_tx text, > author_nm text, > business_id_type_ct text, > cms_id text, > direction_ct text, > document_code_id uuid, > file_metadata_map_nm map, > last_mod_ts timestamp, > last_mod_user_id text, > official_document_ts timestamp, > repository_logical_package_no int, > repository_offset_no int, > repository_package_handle_user_id text, > repository_package_nm text, > repository_package_sequence_no int, > repository_procedural_nm text, > review_complete_in boolean, > source_system_document_id text, > source_system_nm text, > status_cd text, > vendor_nm text, > PRIMARY KEY ((business_area_ct, business_id), document_id) > ) WITH CLUSTERING ORDER BY (document_id ASC) > {noformat} > And the following json in a file created from sstable2json tool > {noformat} > [ > {"key": "This is business_area_ct:This is business_id", > "cells": [["This is document_id:","",1443217787319002], >["This is document_id:author_nm","This is > autor_nm",1443217787319002]]} > ] > {noformat} > Let's say we deleted that record from the DB and wanted to bring it back > If we try to create an sstable from this json file, we get an NPE > {noformat} > -bash-4.1$ json2sstable -K elp -c document-3264cbe063c211e5bc34e746786b7b29 > test2.json > /var/lib/cassandra/data/elp/document-3264cbe063c211e5bc34e746786b7b29/elp-document-ka-1-Data.db > Importing 1 keys... 
> java.lang.NullPointerException > at > org.apache.cassandra.tools.SSTableImport.getKeyValidator(SSTableImport.java:442) > at > org.apache.cassandra.tools.SSTableImport.importUnsorted(SSTableImport.java:316) > at > org.apache.cassandra.tools.SSTableImport.importJson(SSTableImport.java:287) > at org.apache.cassandra.tools.SSTableImport.main(SSTableImport.java:514) > ERROR: null > -bash-4.1$ > {noformat} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10289) Fix cqlshlib tests
[ https://issues.apache.org/jira/browse/CASSANDRA-10289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-10289: --- Reviewer: Stefania [~Stefania] to review > Fix cqlshlib tests > -- > > Key: CASSANDRA-10289 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10289 > Project: Cassandra > Issue Type: Bug > Components: Tests >Reporter: Jim Witschey >Assignee: Jim Witschey > Labels: cqlsh > Fix For: 3.0.0 rc2 > > Attachments: trunk-10289.txt > > > The cqlsh tests in trunk haven't been running for a while: > http://cassci.datastax.com/view/All_Jobs/job/trunk_cqlshlib/423/testReport/ > This looks like the driver errors that happened because of CASSANDRA-6717. > Not sure why it's happening now; the driver installation looks normal to me > on those jobs. [~mshuler]? > There were also some changes to cqlsh itself that broke the test > harness, but I believe those are fixed here: > https://github.com/mambocab/cassandra/tree/fix-cqlsh-tests > Once the tests are running successfully on CassCI, I'll test my patch and > mark as patch available. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?
[ https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933464#comment-14933464 ] Sylvain Lebresne commented on CASSANDRA-9806: - [~aboudreault] Could you try to bisect when this started to fail? > some TTL test are failing on trunk: losing data after restart? > --- > > Key: CASSANDRA-9806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9806 > Project: Cassandra > Issue Type: Bug >Reporter: Alan Boudreault >Priority: Blocker > Fix For: 3.0.0 rc2 > > > ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test is > failing and ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are > failing: > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/ > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/ > After some debugging, I noticed a strange behaviour. It looks like some data > disappear after a node restart, even if the row has no TTL set. Here a test > example where I see the issue with latest trunk: > https://gist.github.com/aboudreault/94cb552750a186ca853d -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-9855) Make page_size configurable in cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-9855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Ellis updated CASSANDRA-9855: -- Reviewer: Philip Thompson > Make page_size configurable in cqlsh > > > Key: CASSANDRA-9855 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9855 > Project: Cassandra > Issue Type: Improvement >Reporter: Sylvain Lebresne >Assignee: Ryan McGuire >Priority: Minor > Labels: cqlsh > Fix For: 2.2.x, 3.0.0 rc2 > > Attachments: 9855.txt > > > Appears we made cqlsh use paging, but the page size is hard-coded. It sounds > easy enough to make that configurable, by either one of: > {noformat} > PAGING 50; > PAGING ON WITH PAGE_SIZE=50; > {noformat} > I'm sure some users may be happy with the convenience but it would also be > nice when we want to quickly test paging. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
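Mechanically, a configurable page size just changes how many rows are fetched per round trip. The following is a self-contained sketch of that slicing only, not the cqlsh or driver implementation; the class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class Paging {
    // Split a result set into pages of at most pageSize rows, as a paged
    // query would deliver them with PAGE_SIZE=pageSize.
    public static <T> List<List<T>> paginate(List<T> rows, int pageSize) {
        if (pageSize <= 0)
            throw new IllegalArgumentException("pageSize must be positive");
        List<List<T>> pages = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += pageSize)
            pages.add(new ArrayList<>(rows.subList(i, Math.min(i + pageSize, rows.size()))));
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(paginate(List.of(1, 2, 3, 4, 5), 2)); // prints: [[1, 2], [3, 4], [5]]
    }
}
```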
[jira] [Commented] (CASSANDRA-10383) Disable auto snapshot on selected tables.
[ https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14910197#comment-14910197 ] ASF GitHub Bot commented on CASSANDRA-10383: GitHub user tommystendahl opened a pull request: https://github.com/apache/cassandra/pull/54 Disable auto snapshot on selected tables (CASSANDRA-10383) You can merge this pull request into a Git repository by running: $ git pull https://github.com/tommystendahl/cassandra cassandra-10383 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/cassandra/pull/54.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #54 commit c647b57abedda8e2c578cc0bab14d79ca0722b71 Author: tommy stendahlDate: 2015-09-28T08:24:18Z Disable auto snapshot on selected tables (CASSANDRA-10383) > Disable auto snapshot on selected tables. > - > > Key: CASSANDRA-10383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10383 > Project: Cassandra > Issue Type: Improvement >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl > Fix For: 2.1.x > > > I have a use case where I would like to turn off auto snapshot for selected > tables, I don't want to turn it off completely since its a good feature. > Looking at the code I think it would be relatively easy to fix. > My plan is to create a new table property named something like > "disable_auto_snapshot". If set to false it will prevent auto snapshot on the > table, if set to true auto snapshot will be controlled by the "auto_snapshot" > property in the cassandra.yaml. Default would be true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10383) Disable auto snapshot on selected tables.
[ https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933082#comment-14933082 ] Tommy Stendahl edited comment on CASSANDRA-10383 at 9/28/15 9:41 AM: - Attached a patch based on 2.1. Sorry for the pull request, it was supposed to be on my private github fork. :-( was (Author: tommy_s): based on 2.1 > Disable auto snapshot on selected tables. > - > > Key: CASSANDRA-10383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10383 > Project: Cassandra > Issue Type: Improvement >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl > Fix For: 2.1.x > > Attachments: 10383.txt > > > I have a use case where I would like to turn off auto snapshot for selected > tables, I don't want to turn it off completely since its a good feature. > Looking at the code I think it would be relatively easy to fix. > My plan is to create a new table property named something like > "disable_auto_snapshot". If set to false it will prevent auto snapshot on the > table, if set to true auto snapshot will be controlled by the "auto_snapshot" > property in the cassandra.yaml. Default would be true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10383) Disable auto snapshot on selected tables.
[ https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933086#comment-14933086 ] ASF GitHub Bot commented on CASSANDRA-10383: Github user tommystendahl closed the pull request at: https://github.com/apache/cassandra/pull/54 > Disable auto snapshot on selected tables. > - > > Key: CASSANDRA-10383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10383 > Project: Cassandra > Issue Type: Improvement >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl > Fix For: 2.1.x > > Attachments: 10383.txt > > > I have a use case where I would like to turn off auto snapshot for selected > tables, I don't want to turn it off completely since its a good feature. > Looking at the code I think it would be relatively easy to fix. > My plan is to create a new table property named something like > "disable_auto_snapshot". If set to false it will prevent auto snapshot on the > table, if set to true auto snapshot will be controlled by the "auto_snapshot" > property in the cassandra.yaml. Default would be true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (CASSANDRA-10383) Disable auto snapshot on selected tables.
[ https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933082#comment-14933082 ] Tommy Stendahl edited comment on CASSANDRA-10383 at 9/28/15 9:53 AM: - Attached a patch based on 2.1. The patch is also on github, [here|https://github.com/tommystendahl/cassandra/tree/cassandra-10383] Sorry for the pull request, it was supposed to be on my private github fork. :-( was (Author: tommy_s): Attached a patch based on 2.1. Sorry for the pull request, it was supposed to be on my private github fork. :-( > Disable auto snapshot on selected tables. > - > > Key: CASSANDRA-10383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10383 > Project: Cassandra > Issue Type: Improvement >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl > Fix For: 2.1.x > > Attachments: 10383.txt > > > I have a use case where I would like to turn off auto snapshot for selected > tables, I don't want to turn it off completely since its a good feature. > Looking at the code I think it would be relatively easy to fix. > My plan is to create a new table property named something like > "disable_auto_snapshot". If set to false it will prevent auto snapshot on the > table, if set to true auto snapshot will be controlled by the "auto_snapshot" > property in the cassandra.yaml. Default would be true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-10383) Disable auto snapshot on selected tables.
[ https://issues.apache.org/jira/browse/CASSANDRA-10383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommy Stendahl updated CASSANDRA-10383: --- Attachment: 10383.txt based on 2.1 > Disable auto snapshot on selected tables. > - > > Key: CASSANDRA-10383 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10383 > Project: Cassandra > Issue Type: Improvement >Reporter: Tommy Stendahl >Assignee: Tommy Stendahl > Fix For: 2.1.x > > Attachments: 10383.txt > > > I have a use case where I would like to turn off auto snapshot for selected > tables, I don't want to turn it off completely since its a good feature. > Looking at the code I think it would be relatively easy to fix. > My plan is to create a new table property named something like > "disable_auto_snapshot". If set to false it will prevent auto snapshot on the > table, if set to true auto snapshot will be controlled by the "auto_snapshot" > property in the cassandra.yaml. Default would be true. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
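The interaction between the global and per-table settings proposed in this ticket is simple to state. A hedged sketch follows; for readability it models the per-table property as an enabled flag (default true, as the ticket describes) rather than the literal "disable_auto_snapshot" name, and the helper itself is illustrative, not code from the attached patch.

```java
public class AutoSnapshotPolicy {
    // A table is snapshotted on truncate/drop only if both the table-level
    // flag (default true) and the global cassandra.yaml auto_snapshot
    // setting allow it; setting the table-level flag to false vetoes the
    // snapshot regardless of the yaml setting.
    public static boolean shouldSnapshot(boolean yamlAutoSnapshot, boolean tableAutoSnapshot) {
        return yamlAutoSnapshot && tableAutoSnapshot;
    }

    public static void main(String[] args) {
        System.out.println(shouldSnapshot(true, true));  // prints: true
        System.out.println(shouldSnapshot(true, false)); // prints: false
        System.out.println(shouldSnapshot(false, true)); // prints: false
    }
}
```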
[Cassandra Wiki] Update of "Committers" by RobertStupp
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification. The "Committers" page has been changed by RobertStupp: https://wiki.apache.org/cassandra/Committers?action=diff=53=54 ||Tyler Hobbs ||Mar 2014 ||Datastax || || ||Benedict Elliott Smith ||May 2014 ||Datastax || || ||Josh Mckenzie ||Jul 2014 ||Datastax || || - ||Robert Stupp ||Jan 2015 ||contentteam || || + ||Robert Stupp ||Jan 2015 ||Independent || || ||Sam Tunnicliffe ||May 2015 ||Datastax || || ||Benjamin Lerer ||Jul 2015 ||Datastax || ||
[jira] [Commented] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?
[ https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934172#comment-14934172 ] Paulo Motta commented on CASSANDRA-9806: I forgot to mention that I also tested the attached gist and it passed. > some TTL test are failing on trunk: losing data after restart? > --- > > Key: CASSANDRA-9806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9806 > Project: Cassandra > Issue Type: Bug >Reporter: Alan Boudreault >Assignee: Paulo Motta >Priority: Blocker > Fix For: 3.0.0 rc2 > > > ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test is > failing and ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are > failing: > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/ > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/ > After some debugging, I noticed a strange behaviour. It looks like some data > disappear after a node restart, even if the row has no TTL set. Here a test > example where I see the issue with latest trunk: > https://gist.github.com/aboudreault/94cb552750a186ca853d -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10241) Keep a separate production debug log for troubleshooting
[ https://issues.apache.org/jira/browse/CASSANDRA-10241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934310#comment-14934310 ] Paulo Motta commented on CASSANDRA-10241: - Now that we have the basic capability committed, I'd like to follow up on this by introducing a simple logging guideline for future system logging statements, based on the discussions of this thread and current practices. This guideline could help external and new contributors to understand the logging practices, and current contributors to review tickets related to logging using the new framework. I've drafted an initial version for review, presented below: *INFO*: General cluster status, operations overview. At this level a beginner user or operator should be able to understand most messages. Examples: * Node startup and shutdown information * User or system triggered operations overview ** Repair start and finish state ** Cleanup start and finish state ** Bootstrap start and finish state ** Index rebuild start and finish state *DEBUG*: Low frequency state changes or message passing. Non-critical path logs on operation details, performance measurements or general troubleshooting information. At this level an advanced operator or system developer will have elements to investigate or detect erroneous conditions or performance bottlenecks, extract reproduction steps or inspect advanced operational information. Examples: * SSTable flushing * Compactions in progress * Gossip or schema state changes * Operations intermediate steps ** Repair steps ** Stream session message exchanges *WARN*: Use of suboptimal parameters or deprecated options, detection of degraded performance, capability limitations or missing dependencies. General optimization tips. At this level, an operator should be able to detect an imminent error condition, use of suboptimal parameters or non-critical configuration errors. 
Examples: * Use of chunk_length_in_kb property instead of chunk_length * GC above threshold warnings * OpenJDK not recommended notice * Small sstable size warning (Testing done for CASSANDRA-5727 indicates that performance improves up to 160MB) *ERROR*: An expected error condition has occurred. Non-critical, transient or recovered errors might be reported at DEBUG level instead so they don't pollute system.log. Examples: * critical errors in general (corrupted disk, read error, etc) * leak detection *TRACE*: High frequency state changes or message passing, critical path logs, testing or development information. This level is disabled by default, so everything that does not fit in the previous levels and highly verbose stuff must be kept at TRACE level. Examples: * Failure detector checks * Gossip digests * CassandraServer.insert() What do you think [~aweisberg]? After review and suggestions, if there are no objections, I will add this to the wiki and send an e-mail to the dev list. After this, the next step would be to groom the current logs in a separate ticket so they follow the guideline. > Keep a separate production debug log for troubleshooting > > > Key: CASSANDRA-10241 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10241 > Project: Cassandra > Issue Type: New Feature > Components: Config >Reporter: Jonathan Ellis >Assignee: Paulo Motta > Fix For: 2.2.x, 3.0.0 rc2 > > Attachments: 2.2-debug.log, 2.2-system.log, 3.0-debug.log, > 3.0-system.log > > > [~aweisberg] had the suggestion to keep a separate debug log for aid in > troubleshooting, not intended for regular human consumption but where we can > log things that might help if something goes wrong. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
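The five levels in this proposal can be summarized as a lookup from kind of event to prescribed level. The category names below are mine, condensing the examples in the proposal; this is an illustration of the guideline, not project code.

```java
public class LogGuideline {
    enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

    // Map a condensed event category to the level the guideline prescribes.
    static Level levelFor(String category) {
        switch (category) {
            case "operation-overview":     return Level.INFO;  // repair/cleanup/bootstrap start+finish
            case "operation-detail":       return Level.DEBUG; // flushes, compaction progress, repair steps
            case "suboptimal-or-degraded": return Level.WARN;  // deprecated options, GC above threshold
            case "critical-failure":       return Level.ERROR; // corrupted disk, leak detection
            case "high-frequency":         return Level.TRACE; // gossip digests, failure detector checks
            default: throw new IllegalArgumentException("unknown category: " + category);
        }
    }

    public static void main(String[] args) {
        System.out.println(levelFor("operation-detail")); // prints: DEBUG
    }
}
```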
[jira] [Commented] (CASSANDRA-10298) Replaced dead node stayed in gossip forever
[ https://issues.apache.org/jira/browse/CASSANDRA-10298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934354#comment-14934354 ] Stefania commented on CASSANDRA-10298: -- No they are both still under test, besides we are focusing on the 3.0 branch for the tests. If you really need this in 2.1 then you can commit this patch. > Replaced dead node stayed in gossip forever > --- > > Key: CASSANDRA-10298 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10298 > Project: Cassandra > Issue Type: Bug >Reporter: Dikang Gu >Assignee: Dikang Gu > Fix For: 2.1.x > > Attachments: CASSANDRA-10298.patch > > > The dead node stayed in the nodetool status, > DN 10.210.165.55379.76 GB 256 ? null > And in the log, it throws NPE when trying to remove it. > {code} > 2015-09-10_06:41:22.92453 ERROR 06:41:22 Exception in thread > Thread[GossipStage:1,5,main] > 2015-09-10_06:41:22.92454 java.lang.NullPointerException: null > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.utils.UUIDGen.decompose(UUIDGen.java:100) > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.db.HintedHandOffManager.deleteHintsForEndpoint(HintedHandOffManager.java:201) > > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.service.StorageService.excise(StorageService.java:1886) > 2015-09-10_06:41:22.92455 at > org.apache.cassandra.service.StorageService.excise(StorageService.java:1902) > 2015-09-10_06:41:22.92456 at > org.apache.cassandra.service.StorageService.handleStateLeft(StorageService.java:1805) > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1473) > > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2099) > 2015-09-10_06:41:22.92457 at > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1009) > 2015-09-10_06:41:22.92458 at > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1085) > 2015-09-10_06:41:22.92458 at > 
org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > > 2015-09-10_06:41:22.92458 at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:62) > 2015-09-10_06:41:22.92459 at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_45] > 2015-09-10_06:41:22.92460 at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_45] > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10089) NullPointerException in Gossip handleStateNormal
[ https://issues.apache.org/jira/browse/CASSANDRA-10089?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934356#comment-14934356 ] Stefania commented on CASSANDRA-10089: -- [~jbellis] can I have a reviewer for this patch? I was not able to reproduce the problem despite a few rounds of CI, so I am not 100% sure of what causes tokens to be missing during a {{handleStateNormal}} but the patch will at least fix the NPE. > NullPointerException in Gossip handleStateNormal > > > Key: CASSANDRA-10089 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10089 > Project: Cassandra > Issue Type: Bug >Reporter: Stefania >Assignee: Stefania > Fix For: 2.1.x, 2.2.x, 3.0.x > > > Whilst comparing dtests for CASSANDRA-9970 I found [this failing > dtest|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-9970-dtest/lastCompletedBuild/testReport/consistency_test/TestConsistency/short_read_test/] > in 2.2: > {code} > Unexpected error in node1 node log: ['ERROR [GossipStage:1] 2015-08-14 > 15:39:57,873 CassandraDaemon.java:183 - Exception in thread > Thread[GossipStage:1,5,main] java.lang.NullPointerException: null \tat > org.apache.cassandra.service.StorageService.getApplicationStateValue(StorageService.java:1731) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.getTokensFor(StorageService.java:1804) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.handleStateNormal(StorageService.java:1857) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onChange(StorageService.java:1629) > ~[main/:na] \tat > org.apache.cassandra.service.StorageService.onJoin(StorageService.java:2312) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.handleMajorStateChange(Gossiper.java:1025) > ~[main/:na] \tat > org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:1106) > ~[main/:na] \tat > org.apache.cassandra.gms.GossipDigestAck2VerbHandler.doVerb(GossipDigestAck2VerbHandler.java:49) > ~[main/:na] \tat > 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:66) > ~[main/:na] \tat > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > ~[na:1.7.0_80] \tat > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > ~[na:1.7.0_80] \tat java.lang.Thread.run(Thread.java:745) ~[na:1.7.0_80]'] > {code} > I wasn't able to find it on unpatched branches but it is clearly not related > to CASSANDRA-9970, if anything it could have been a side effect of > CASSANDRA-9871. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
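The defensive fix described — not letting a missing TOKENS application state surface as an NPE deep inside handleStateNormal — can be sketched as below. The types are stand-ins (plain maps and strings, not the actual EndpointState API), and the empty-collection fallback is one possible choice for illustration, not necessarily what the actual patch does.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.Map;

public class TokenLookup {
    // Stand-in for getApplicationStateValue: may legitimately return null
    // when the endpoint has not (yet) published the requested state.
    static String getApplicationStateValue(Map<String, String> endpointState, String key) {
        return endpointState.get(key);
    }

    // Before the fix, the raw value was dereferenced unconditionally, giving
    // the NPE in the quoted log. Here a missing value means "no tokens".
    static Collection<String> getTokensFor(Map<String, String> endpointState) {
        String raw = getApplicationStateValue(endpointState, "TOKENS");
        if (raw == null)
            return Collections.emptyList();
        return Collections.singletonList(raw);
    }

    public static void main(String[] args) {
        System.out.println(getTokensFor(Collections.emptyMap())); // prints: []
    }
}
```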
[jira] [Commented] (CASSANDRA-10406) Nodetool supports to rebuild from specific ranges.
[ https://issues.apache.org/jira/browse/CASSANDRA-10406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934363#comment-14934363 ] Yuki Morishita commented on CASSANDRA-10406: Can you post the patch against cassandra-2.1 branch? 1.2 and 2.0 went into EOL and no further development is happening on those versions. > Nodetool supports to rebuild from specific ranges. > -- > > Key: CASSANDRA-10406 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10406 > Project: Cassandra > Issue Type: Improvement >Reporter: Dikang Gu >Assignee: Dikang Gu > Fix For: 1.2.x > > Attachments: CASSANDRA-10406.patch > > > Add the 'nodetool rebuildrange' command, so that if `nodetool rebuild` > failed, we do not need to rebuild all the ranges, and can just rebuild those > failed ones. > Should be easily ported to all versions. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?
[ https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paulo Motta resolved CASSANDRA-9806. Resolution: Invalid > some TTL test are failing on trunk: losing data after restart? > --- > > Key: CASSANDRA-9806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9806 > Project: Cassandra > Issue Type: Bug >Reporter: Alan Boudreault >Assignee: Paulo Motta >Priority: Blocker > Fix For: 3.0.0 rc2 > > > ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test is > failing and ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are > failing: > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/ > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/ > After some debugging, I noticed a strange behaviour. It looks like some data > disappear after a node restart, even if the row has no TTL set. Here a test > example where I see the issue with latest trunk: > https://gist.github.com/aboudreault/94cb552750a186ca853d -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-9806) some TTL test are failing on trunk: losing data after restart?
[ https://issues.apache.org/jira/browse/CASSANDRA-9806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934149#comment-14934149 ] Paulo Motta commented on CASSANDRA-9806: The error was reported on jenkins build #346. We're currently on build #625, and both [ttl_is_respected_on_delayed_replication_test|http://cassci.datastax.com/view/trunk/job/trunk_dtest/625/testReport/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/history/] and [ttl_is_respected_on_repair_test|http://cassci.datastax.com/view/trunk/job/trunk_dtest/625/testReport/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/history/] seem to have been consistently stable in the last builds. The only recent failures in [ttl_test.py|http://cassci.datastax.com/view/trunk/job/trunk_dtest/625/testReport/ttl_test/history/] are related to driver connection timeouts during setup, so I increased our default dtest timeout from 5s to 10s, which should make these and other tests less flaky: [dtest PR|https://github.com/riptano/cassandra-dtest/pull/572]. > some TTL test are failing on trunk: losing data after restart? > --- > > Key: CASSANDRA-9806 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9806 > Project: Cassandra > Issue Type: Bug >Reporter: Alan Boudreault >Assignee: Paulo Motta >Priority: Blocker > Fix For: 3.0.0 rc2 > > > ttl_test.TestDistributedTTL.ttl_is_respected_on_delayed_replication_test is > failing and ttl_test.TestDistributedTTL.ttl_is_respected_on_repair_test are > failing: > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_delayed_replication_test/ > http://cassci.datastax.com/view/trunk/job/trunk_dtest/346/testReport/junit/ttl_test/TestDistributedTTL/ttl_is_respected_on_repair_test/ > After some debugging, I noticed a strange behaviour. It looks like some data > disappear after a node restart, even if the row has no TTL set. 
Here a test > example where I see the issue with latest trunk: > https://gist.github.com/aboudreault/94cb552750a186ca853d -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node
[ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934321#comment-14934321 ] Stefania commented on CASSANDRA-10231: -- If you've reproduced it with your Jepsen test could you attach the logs please? > Null status entries on nodes that crash during decommission of a different > node > --- > > Key: CASSANDRA-10231 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10231 > Project: Cassandra > Issue Type: Bug >Reporter: Joel Knighton >Assignee: Stefania > Fix For: 3.0.0 rc2 > > > This issue is reproducible through a Jepsen test of materialized views that > crashes and decommissions nodes throughout the test. > In a 5 node cluster, if a node crashes at a certain point (unknown) during > the decommission of a different node, it may start with a null entry for the > decommissioned node like so: > DN 10.0.0.5 ? 256 ? null rack1 > This entry does not get updated/cleared by gossip. This entry is removed upon > a restart of the affected node. > This issue is further detailed in ticket > [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068]. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-10231) Null status entries on nodes that crash during decommission of a different node
[ https://issues.apache.org/jira/browse/CASSANDRA-10231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934358#comment-14934358 ] Stefania commented on CASSANDRA-10231: -- A similar exception affects 2.1+ as well, although the patch would be slightly different; see CASSANDRA-10298. > Null status entries on nodes that crash during decommission of a different > node > --- > > Key: CASSANDRA-10231 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10231 > Project: Cassandra > Issue Type: Bug >Reporter: Joel Knighton >Assignee: Stefania > Fix For: 3.0.0 rc2 > > > This issue is reproducible through a Jepsen test of materialized views that > crashes and decommissions nodes throughout the test. > In a 5 node cluster, if a node crashes at a certain point (unknown) during > the decommission of a different node, it may start with a null entry for the > decommissioned node like so: > DN 10.0.0.5 ? 256 ? null rack1 > This entry does not get updated/cleared by gossip. This entry is removed upon > a restart of the affected node. > This issue is further detailed in ticket > [10068|https://issues.apache.org/jira/browse/CASSANDRA-10068].
[jira] [Commented] (CASSANDRA-10205) decommissioned_wiped_node_can_join_test fails on Jenkins
[ https://issues.apache.org/jira/browse/CASSANDRA-10205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934364#comment-14934364 ] Stefania commented on CASSANDRA-10205: -- [~jbellis] we need a new reviewer, thanks. > decommissioned_wiped_node_can_join_test fails on Jenkins > > > Key: CASSANDRA-10205 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10205 > Project: Cassandra > Issue Type: Test >Reporter: Stefania >Assignee: Stefania > Attachments: decommissioned_wiped_node_can_join_test.tar.gz > > > This test passes locally but reliably fails on Jenkins. It seems after we > restart node4, it is unable to Gossip with other nodes: > {code} > INFO [HANDSHAKE-/127.0.0.2] 2015-08-27 06:50:42,778 > OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.2 > INFO [HANDSHAKE-/127.0.0.1] 2015-08-27 06:50:42,778 > OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.1 > INFO [HANDSHAKE-/127.0.0.3] 2015-08-27 06:50:42,778 > OutboundTcpConnection.java:494 - Handshaking version with /127.0.0.3 > ERROR [main] 2015-08-27 06:51:13,785 CassandraDaemon.java:635 - Exception > encountered during startup > java.lang.RuntimeException: Unable to gossip with any seeds > at > org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1342) > ~[main/:na] > at > org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:518) > ~[main/:na] > at > org.apache.cassandra.service.StorageService.prepareToJoin(StorageService.java:763) > ~[main/:na] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:687) > ~[main/:na] > at > org.apache.cassandra.service.StorageService.initServer(StorageService.java:570) > ~[main/:na] > at > org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:320) > [main/:na] > at > org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:516) > [main/:na] > at > 
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:622) > [main/:na] > WARN [StorageServiceShutdownHook] 2015-08-27 06:51:13,799 Gossiper.java:1453 > - No local state or state is in silent shutdown, not announcing shutdown > {code} > It seems both the addresses and port number of the seeds are correct so I > don't think the problem is the Amazon private addresses but I might be wrong. > It's also worth noting that the first time the node starts up without > problems. The problem only occurs during a restart.
[jira] [Created] (CASSANDRA-10406) Nodetool support for rebuilding specific ranges.
Dikang Gu created CASSANDRA-10406: - Summary: Nodetool support for rebuilding specific ranges. Key: CASSANDRA-10406 URL: https://issues.apache.org/jira/browse/CASSANDRA-10406 Project: Cassandra Issue Type: Improvement Reporter: Dikang Gu Assignee: Dikang Gu Fix For: 1.2.x Add a 'nodetool rebuildrange' command, so that if `nodetool rebuild` fails, we do not need to rebuild all the ranges and can rebuild just the failed ones. Should be easily ported to all versions.
[jira] [Commented] (CASSANDRA-10228) JVMStabilityInspector should inspect cause and suppressed exceptions
[ https://issues.apache.org/jira/browse/CASSANDRA-10228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14933841#comment-14933841 ] Ariel Weisberg commented on CASSANDRA-10228: Cassci looks happy, as near as I can tell. This is ready for commit. > JVMStabilityInspector should inspect cause and suppressed exceptions > > > Key: CASSANDRA-10228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10228 > Project: Cassandra > Issue Type: Bug > Components: Core >Reporter: Benedict >Assignee: Paul MacIntosh > Labels: lhf > Fix For: 2.1.x, 2.2.x, 3.0.x > > > JVMStabilityInspector only checks the outer exception, but this can wrap or > otherwise suppress an exception we do consider "unstable". We should check > all of the exceptions in an exception graph before deciding things are kosher.
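The fix the ticket describes amounts to walking the whole exception graph rather than inspecting only the outermost throwable. A minimal stdlib-only sketch of that traversal follows; the `isUnstable` predicate and class name are placeholders for illustration, not the actual JVMStabilityInspector policy or code:

```java
import java.io.FileNotFoundException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Walk causes and suppressed exceptions, not just the outer throwable.
public final class ExceptionGraphSketch
{
    // Stand-in predicate for whatever the inspector considers "unstable".
    static boolean isUnstable(Throwable t)
    {
        return t instanceof OutOfMemoryError || t instanceof FileNotFoundException;
    }

    public static boolean anyUnstable(Throwable root)
    {
        Set<Throwable> seen = new HashSet<>();       // guards against cycles in the graph
        Deque<Throwable> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty())
        {
            Throwable t = stack.pop();
            if (!seen.add(t))
                continue;
            if (isUnstable(t))
                return true;
            if (t.getCause() != null)
                stack.push(t.getCause());
            for (Throwable suppressed : t.getSuppressed())
                stack.push(suppressed);
        }
        return false;
    }
}
```

An `OutOfMemoryError` wrapped inside a `RuntimeException` would be missed by an outer-only check but is found by the traversal above.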
[jira] [Created] (CASSANDRA-10407) Benchmark and evaluate CASSANDRA-8894 improvements
Aleksey Yeschenko created CASSANDRA-10407: - Summary: Benchmark and evaluate CASSANDRA-8894 improvements Key: CASSANDRA-10407 URL: https://issues.apache.org/jira/browse/CASSANDRA-10407 Project: Cassandra Issue Type: Test Reporter: Aleksey Yeschenko Fix For: 3.0.0 rc2 The original ticket (CASSANDRA-8894) was committed to 3.0 alpha1 two months ago. We need to get proper performance tests before GA. See [~benedict]'s [comment|https://issues.apache.org/jira/browse/CASSANDRA-8894?focusedCommentId=14631203=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14631203] for more details.
[jira] [Commented] (CASSANDRA-8894) Our default buffer size for (uncompressed) buffered reads should be smaller, and based on the expected record size
[ https://issues.apache.org/jira/browse/CASSANDRA-8894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934494#comment-14934494 ] Aleksey Yeschenko commented on CASSANDRA-8894: -- Closing the ticket, as it was committed to 3.0 alpha1 two months ago. Opened a separate CASSANDRA-10407 to follow up with proper tests. > Our default buffer size for (uncompressed) buffered reads should be smaller, > and based on the expected record size > -- > > Key: CASSANDRA-8894 > URL: https://issues.apache.org/jira/browse/CASSANDRA-8894 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Benedict >Assignee: Stefania > Labels: benedict-to-commit > Fix For: 3.0 alpha 1 > > Attachments: 8894_25pct.yaml, 8894_5pct.yaml, 8894_tiny.yaml, > screenshot-1.png, screenshot-2.png > > > A large contributor to slower buffered reads than mmapped is likely that we > read a full 64Kb at once, when average record sizes may be as low as 140 > bytes on our stress tests. The TLB has only 128 entries on a modern core, and > each read will touch 32 of these, meaning we are unlikely to almost ever be > hitting the TLB, and will be incurring at least 30 unnecessary misses each > time (as well as the other costs of larger than necessary accesses). When > working with an SSD there is little to no benefit reading more than 4Kb at > once, and in either case reading more data than we need is wasteful. So, I > propose selecting a buffer size that is the next larger power of 2 than our > average record size (with a minimum of 4Kb), so that we expect to read in one > operation. I also propose that we create a pool of these buffers up-front, > and that we ensure they are all exactly aligned to a virtual page, so that > the source and target operations each touch exactly one virtual page per 4Kb > of expected record size.
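The sizing heuristic from the quoted description (next power of two at or above the average record size, with a 4Kb floor) can be sketched as follows. The 64Kb ceiling, constants, and class name are illustrative assumptions for the sketch, not the committed implementation:

```java
// Sketch of the CASSANDRA-8894 buffer-sizing heuristic, not the actual code.
public final class BufferSizeSketch
{
    static final int MIN_BUFFER = 4096;   // 4Kb: one virtual page, the proposed minimum
    static final int MAX_BUFFER = 65536;  // 64Kb: the old fixed default, used here as a cap

    // Round n up to the next power of two; valid for 1 < n <= 2^30.
    static int nextPowerOf2(int n)
    {
        return Integer.highestOneBit(n - 1) << 1;
    }

    public static int bufferSize(int avgRecordSize)
    {
        if (avgRecordSize <= MIN_BUFFER)
            return MIN_BUFFER;
        return Math.min(MAX_BUFFER, nextPowerOf2(avgRecordSize));
    }
}
```

For the 140-byte average record cited in the description, this yields the 4Kb floor; a 5000-byte average would round up to 8Kb, one buffered read per record.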
[jira] [Commented] (CASSANDRA-10407) Benchmark and evaluate CASSANDRA-8894 improvements
[ https://issues.apache.org/jira/browse/CASSANDRA-10407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934495#comment-14934495 ] Aleksey Yeschenko commented on CASSANDRA-10407: --- cc [~enigmacurry] > Benchmark and evaluate CASSANDRA-8894 improvements > -- > > Key: CASSANDRA-10407 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10407 > Project: Cassandra > Issue Type: Test >Reporter: Aleksey Yeschenko > Fix For: 3.0.0 rc2 > > > The original ticket (CASSANDRA-8894) was committed to 3.0 alpha1 two months > ago. We need to get proper performance tests before GA. > See [~benedict]'s > [comment|https://issues.apache.org/jira/browse/CASSANDRA-8894?focusedCommentId=14631203=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14631203] > for more details.
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934399#comment-14934399 ] Paulo Motta commented on CASSANDRA-7276: A more elegant approach would be to use the logback [MDC|http://logback.qos.ch/manual/mdc.html] feature, which makes it possible to transparently add thread-local context to log statements (similar to the solution mentioned by [~odpeer]). We could add new CF and KS MDC placeholders to the appender layout pattern in logback.xml (they will be empty if not set), and set them when necessary. We could start by setting them in the following places: * VerbHandlers, which contain KS and CF info * Flush * Compaction Some helper methods would be nice to provide encapsulated and consistent access to MDC. Are you still willing to take this, [~nitzanv]? > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tyler Hobbs >Assignee: Nitzan Volman >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. 
For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at 
java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more.
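The MDC approach suggested in the comment can be illustrated with a stdlib-only thread-local context; real code would call org.slf4j.MDC.put/remove and reference %X{keyspace}/%X{table} in the logback.xml layout pattern. The class and key names here are hypothetical:

```java
import java.util.HashMap;
import java.util.Map;

// Stdlib-only illustration of the MDC idea: a per-thread map of diagnostic
// context that a log layout can interpolate into every statement.
public final class LogContextSketch
{
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    public static void put(String key, String value)
    {
        CONTEXT.get().put(key, value);
    }

    public static void remove(String key)
    {
        CONTEXT.get().remove(key);
    }

    // What a layout such as "%X{keyspace}/%X{table} %msg" would render.
    public static String format(String message)
    {
        Map<String, String> ctx = CONTEXT.get();
        return ctx.getOrDefault("keyspace", "") + "/" + ctx.getOrDefault("table", "") + " " + message;
    }
}
```

A flush or compaction path would put the keyspace and table before the work starts and remove them in a finally block, so the context never leaks into unrelated log lines on the same pooled thread.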
[jira] [Commented] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934449#comment-14934449 ] J.P. Eiti Kimura commented on CASSANDRA-7276: - Very nice suggestion [~pauloricardomg]! We have been using MDC with logback for years in our platforms at Movile. It enables us to trace the whole thread execution context. I think it is a better approach than what we were considering before :) [~nitzanv], I think I can help as well with this task. [~pauloricardomg] I believe I can start to work on it as you suggested in the next few weeks ;) > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tyler Hobbs >Assignee: Nitzan Volman >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. 
For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at 
java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more.
[jira] [Comment Edited] (CASSANDRA-7276) Include keyspace and table names in logs where possible
[ https://issues.apache.org/jira/browse/CASSANDRA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14934449#comment-14934449 ] J.P. Eiti Kimura edited comment on CASSANDRA-7276 at 9/29/15 1:14 AM: -- Very nice suggestion [~pauloricardomg]! We have been using MDC with logback for years in our platforms at Movile. It enables us to trace the whole thread execution context. I think it is a better approach than what we were considering before :) [~nitzanv], I think I can help as well with this task. [~pauloricardomg] I believe I can start to work on it as you suggested in the next few weeks ;) was (Author: eitikimura): Very nice suggestion [~pauloricardomg]! We have been using MDC with logback for years you our platforms at Movile. It enable us to trace all the thread execution context. I think It is a better approuch than we are thinking before :) [~nitzanv], I think I can help as well with this task. [~pauloricardomg] I believe I can start to work on it as you suggested in the next few weeks ;) > Include keyspace and table names in logs where possible > --- > > Key: CASSANDRA-7276 > URL: https://issues.apache.org/jira/browse/CASSANDRA-7276 > Project: Cassandra > Issue Type: Improvement > Components: Core >Reporter: Tyler Hobbs >Assignee: Nitzan Volman >Priority: Minor > Labels: bootcamp, lhf > Fix For: 2.1.x > > Attachments: 2.1-CASSANDRA-7276-v1.txt, > cassandra-2.1-7276-compaction.txt, cassandra-2.1-7276.txt > > > Most error messages and stacktraces give you no clue as to what keyspace or > table was causing the problem. 
For example: > {noformat} > ERROR [MutationStage:61648] 2014-05-20 12:05:45,145 CassandraDaemon.java > (line 198) Exception in thread Thread[MutationStage:61648,5,main] > java.lang.IllegalArgumentException > at java.nio.Buffer.limit(Unknown Source) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getBytes(AbstractCompositeType.java:63) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.getWithShortLength(AbstractCompositeType.java:72) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:98) > at > org.apache.cassandra.db.marshal.AbstractCompositeType.compare(AbstractCompositeType.java:35) > at > edu.stanford.ppl.concurrent.SnapTreeMap$1.compareTo(SnapTreeMap.java:538) > at > edu.stanford.ppl.concurrent.SnapTreeMap.attemptUpdate(SnapTreeMap.java:1108) > at > edu.stanford.ppl.concurrent.SnapTreeMap.updateUnderRoot(SnapTreeMap.java:1059) > at edu.stanford.ppl.concurrent.SnapTreeMap.update(SnapTreeMap.java:1023) > at > edu.stanford.ppl.concurrent.SnapTreeMap.putIfAbsent(SnapTreeMap.java:985) > at > org.apache.cassandra.db.AtomicSortedColumns$Holder.addColumn(AtomicSortedColumns.java:328) > at > org.apache.cassandra.db.AtomicSortedColumns.addAllWithSizeDelta(AtomicSortedColumns.java:200) > at org.apache.cassandra.db.Memtable.resolve(Memtable.java:226) > at org.apache.cassandra.db.Memtable.put(Memtable.java:173) > at > org.apache.cassandra.db.ColumnFamilyStore.apply(ColumnFamilyStore.java:893) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:368) > at org.apache.cassandra.db.Keyspace.apply(Keyspace.java:333) > at org.apache.cassandra.db.RowMutation.apply(RowMutation.java:206) > at > org.apache.cassandra.db.RowMutationVerbHandler.doVerb(RowMutationVerbHandler.java:56) > at > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:60) > at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) > at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) > at 
java.lang.Thread.run(Unknown Source) > {noformat} > We should try to include info on the keyspace and column family in the error > messages or logs whenever possible. This includes reads, writes, > compactions, flushes, repairs, and probably more.