[jira] [Commented] (CASSANDRA-14646) built_views entries are not removed after dropping keyspace
[ https://issues.apache.org/jira/browse/CASSANDRA-14646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584637#comment-16584637 ] ZhaoYang commented on CASSANDRA-14646: -- Thanks for reviewing > built_views entries are not removed after dropping keyspace > --- > > Key: CASSANDRA-14646 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14646 > Project: Cassandra > Issue Type: Bug > Components: Distributed Metadata, Materialized Views >Reporter: ZhaoYang >Assignee: ZhaoYang >Priority: Major > Fix For: 4.0 > > > If we restore view schema after dropping keyspace, view build won't be > triggered because it was marked as SUCCESS in {{built_views}} table. > | patch | CI | > | [trunk|https://github.com/jasonstack/cassandra/commits/mv_drop_ks] | > [utest|https://circleci.com/gh/jasonstack/cassandra/739] | > | [dtest|https://github.com/apache/cassandra-dtest/pull/36]| -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
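For context, the bookkeeping in question lives in the local {{system.built_views}} table; one way to inspect which views a node has marked as built (a cqlsh sketch, assuming the standard system-table layout):

```sql
-- Views this node considers fully built; after the fix, entries for a
-- dropped keyspace should no longer linger here.
SELECT keyspace_name, view_name FROM system.built_views;
```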
[jira] [Commented] (CASSANDRA-14631) Add RSS support for Cassandra blog
[ https://issues.apache.org/jira/browse/CASSANDRA-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584535#comment-16584535 ] Dinesh Joshi commented on CASSANDRA-14631: -- The patch generally seems to work (thanks for the Ruby bundler!), but I have run into a couple of issues. First, the RSS icon seems a bit misaligned in Safari and Chrome. !Screen Shot 2018-08-17 at 5.32.08 PM.png|width=549,height=82! Second, the RSS feed shows odd HTML characters in the "RSS Follower" app on macOS. !Screen Shot 2018-08-17 at 5.32.25 PM.png|width=492,height=336! > Add RSS support for Cassandra blog > -- > > Key: CASSANDRA-14631 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14631 > Project: Cassandra > Issue Type: Improvement > Components: Documentation and Website >Reporter: Jacques-Henri Berthemet >Assignee: Jeff Beck >Priority: Major > Attachments: 14631-site.txt, Screen Shot 2018-08-17 at 5.32.08 > PM.png, Screen Shot 2018-08-17 at 5.32.25 PM.png > > > It would be convenient to add RSS support to Cassandra blog: > [http://cassandra.apache.org/blog/2018/08/07/faster_streaming_in_cassandra.html] > And maybe also for other resources like new versions, but this ticket is > about blog. > > {quote}From: Scott Andreas > Sent: Wednesday, August 08, 2018 6:53 PM > To: [d...@cassandra.apache.org|mailto:d...@cassandra.apache.org] > Subject: Re: Apache Cassandra Blog is now live > > Please feel free to file a ticket (label: Documentation and Website). > > It looks like Jekyll, the static site generator used to build the website, > has a plugin that generates Atom feeds if someone would like to work on > adding one: [https://github.com/jekyll/jekyll-feed] > {quote}
[jira] [Updated] (CASSANDRA-14631) Add RSS support for Cassandra blog
[ https://issues.apache.org/jira/browse/CASSANDRA-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14631: - Attachment: Screen Shot 2018-08-17 at 5.32.25 PM.png
[jira] [Updated] (CASSANDRA-14631) Add RSS support for Cassandra blog
[ https://issues.apache.org/jira/browse/CASSANDRA-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14631: - Attachment: Screen Shot 2018-08-17 at 5.32.08 PM.png
[jira] [Commented] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584498#comment-16584498 ] Jeff Jirsa commented on CASSANDRA-14653: On which version was this observed? > The performance of "NonPeriodicTasks" pools defined in class > ScheduledExecutors is low > -- > > Key: CASSANDRA-14653 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14653 > Project: Cassandra > Issue Type: Improvement > Components: Compaction > Environment: Cassandra nodes: > 3 nodes, 330G physical memory per node, and four data directories (SSD) per > node. >Reporter: Peter Xie >Priority: Major > > We use Cassandra as backend storage for JanusGraph. While loading a huge data set > (~2 billion vertices, ~10 billion edges), we ran into some problems. > > At first we used STCS as the compaction strategy, but hit the exception below. We checked that > the max locked memory is unlimited and the file map count is 1 million; these values should be > enough for loading the data. Eventually we found that the problem was caused by Cassandra > consuming all of the available virtual memory, so no additional virtual memory could be mapped > by compaction tasks and the following exception was thrown: > {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.java:74 - OutOfMemory error letting the JVM handle the error: > java.lang.OutOfMemoryError: Map failed > {quote} > Switching the compaction strategy to LCS seems to resolve the virtual-memory problem, but we > then found another problem: many SSTables that had already been compacted were still retained > on disk. These old SSTables consumed so much disk space that there was not enough left for the > real data, and many files like "mc_txn_compaction_xxx.log" were created under the data > directories. > After some investigation, we found that this is caused by the "NonPeriodicTasks" thread pool, > which only ever uses one thread to process the cleanup tasks that run after compaction. The pool > is an instance of DebuggableScheduledThreadPoolExecutor, which inherits from > ScheduledThreadPoolExecutor. Reading the code of DebuggableScheduledThreadPoolExecutor shows > that it uses an unbounded task queue with a core pool size of 1. Using an unbounded queue here > seems wrong: with an unbounded queue the pool never creates additional threads, no matter how > many tasks are blocked in the queue, because the queue is never full. A bounded queue should be > used instead, so that when cleanup work is heavy, more threads are created to process it. > {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority) > { super(corePoolSize, new NamedThreadFactory(threadPoolName, priority)); setRejectedExecutionHandler(rejectedExecutionHandler); } > > public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) > { super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory); } > {quote} > Below is an example of the cleanup after one compaction: there is nearly a 3-hour delay before > the file "mc-56525" is removed. > {quote} > TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 > LifecycleTransaction.java:363 - Staging for obsolescence > BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db') > .. > TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - > removing > /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from > list of files tracked for test_2.edgestore > > TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - > Async instance tidier for > /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, > before barrier > TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - > Async instance tidier for > /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, > after barrier > TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - > Async instance tidier for > /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, > completed > {quote}
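The queue behavior the reporter describes can be demonstrated in isolation: a `java.util.concurrent.ThreadPoolExecutor` only spawns threads beyond `corePoolSize` when the work queue rejects an `offer()`, so an unbounded queue pins the pool at its core size regardless of backlog. A minimal, self-contained sketch (plain JDK classes, not Cassandra code):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class UnboundedQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        // maximumPoolSize is huge, but extra threads are only created when
        // queue.offer() fails -- and an unbounded queue never fails an offer,
        // so the pool stays pinned at corePoolSize (1) no matter the backlog.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, Integer.MAX_VALUE, 60, TimeUnit.SECONDS,
                new LinkedBlockingQueue<>());
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> {
                try { Thread.sleep(5); } catch (InterruptedException ignored) { }
            });
        }
        System.out.println("threads in pool: " + pool.getPoolSize()); // 1
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }
}
```

This is the same structural choice `ScheduledThreadPoolExecutor` makes with its internal `DelayedWorkQueue`, which is why its pool never grows past the core size.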
[jira] [Updated] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14654: --- Labels: Performance (was: ) > Reduce heap pressure during compactions > --- > > Key: CASSANDRA-14654 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 > Project: Cassandra > Issue Type: Improvement > Components: Compaction >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Major > Labels: Performance > Fix For: 4.x > > > Small partition compactions are painfully slow with a lot of overhead per > partition. There also tends to be an excess of objects created (ie > 200-700mb/s) per compaction thread. > The EncoderStats walks through all the partitions and with mergeWith it will > create a new one per partition as it walks the potentially millions of > partitions. In a test scenario of about 600byte partitions and a couple 100mb > of data this consumed ~16% of the heap pressure. Changing this to instead > mutably track the min values and create one in a EncodingStats.Collector > brought this down considerably (but not 100% since the > UnfilteredRowIterator.stats() still creates 1 per partition). > The KeyCacheKey makes a full copy of the underlying byte array in > ByteBufferUtil.getArray in its constructor. This is the dominating heap > pressure as there are more sstables. By changing this to just keeping the > original it completely eliminates the current dominator of the compactions > and also improves read performance. > Minor tweak included for this as well for operators when compactions are > behind on low read clusters is to make the preemptive opening setting a > hotprop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
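The mergeWith-per-partition allocation pattern and the mutable-collector alternative described in the ticket can be sketched with a simplified stand-in (illustrative only; `MinTimestampStats` is a hypothetical class, not Cassandra's `EncodingStats`, which tracks more than a single minimum):

```java
// Illustrative stand-in for the EncodingStats pattern: the immutable
// mergeWith style allocates a new object per partition merged, while the
// mutable Collector tracks the minimum in place and allocates once.
final class MinTimestampStats {
    final long minTimestamp;

    MinTimestampStats(long minTimestamp) { this.minTimestamp = minTimestamp; }

    // Immutable merge: one fresh allocation per partition.
    MinTimestampStats mergeWith(MinTimestampStats other) {
        return new MinTimestampStats(Math.min(minTimestamp, other.minTimestamp));
    }

    // Mutable accumulator: a single instance across millions of partitions.
    static final class Collector {
        private long min = Long.MAX_VALUE;

        void update(long timestamp) { min = Math.min(min, timestamp); }

        MinTimestampStats get() { return new MinTimestampStats(min); }
    }

    public static void main(String[] args) {
        long[] partitionTimestamps = { 50L, 30L, 90L };

        // Old style: N allocations for N partitions.
        MinTimestampStats merged = new MinTimestampStats(Long.MAX_VALUE);
        for (long ts : partitionTimestamps) merged = merged.mergeWith(new MinTimestampStats(ts));

        // Collector style: update minima in place, build the result once.
        Collector collector = new Collector();
        for (long ts : partitionTimestamps) collector.update(ts);

        System.out.println(merged.minTimestamp + " == " + collector.get().minTimestamp); // 30 == 30
    }
}
```

Both paths compute the same statistics; the collector simply trades per-partition garbage for a single mutable accumulator, which is the heap-pressure saving the ticket measured.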
[jira] [Updated] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Jirsa updated CASSANDRA-14654: --- Fix Version/s: 4.x
[jira] [Assigned] (CASSANDRA-14655) Upgrade C* to use latest guava (26.0)
[ https://issues.apache.org/jira/browse/CASSANDRA-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vinay Chella reassigned CASSANDRA-14655: Assignee: Sumanth Pasupuleti > Upgrade C* to use latest guava (26.0) > - > > Key: CASSANDRA-14655 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14655 > Project: Cassandra > Issue Type: Improvement > Components: Libraries >Reporter: Sumanth Pasupuleti >Assignee: Sumanth Pasupuleti >Priority: Minor > Fix For: 4.x > > > C* currently uses guava 23.3. This JIRA is about changing C* to use latest > guava (26.0). Originated from a discussion in the mailing list.
[jira] [Updated] (CASSANDRA-14655) Upgrade C* to use latest guava (26.0)
[ https://issues.apache.org/jira/browse/CASSANDRA-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumanth Pasupuleti updated CASSANDRA-14655: --- Fix Version/s: 4.x
[jira] [Commented] (CASSANDRA-14655) Upgrade C* to use latest guava (26.0)
[ https://issues.apache.org/jira/browse/CASSANDRA-14655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584403#comment-16584403 ] Sumanth Pasupuleti commented on CASSANDRA-14655: Github Branch: https://github.com/sumanth-pasupuleti/cassandra/tree/guava_26_trunk Failing Unit Tests: https://circleci.com/gh/sumanth-pasupuleti/cassandra/84 As confirmed by [~andrew.tolbert], the current version of the driver is incompatible with the latest Guava, which is why the unit tests are failing right now. I will resume work on the Guava upgrade once there is a new release of the driver without the Guava compatibility issue.
[jira] [Updated] (CASSANDRA-14654) Reduce heap pressure during compactions
[ https://issues.apache.org/jira/browse/CASSANDRA-14654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-14654: - Component/s: Compaction
[jira] [Created] (CASSANDRA-14655) Upgrade C* to use latest guava (26.0)
Sumanth Pasupuleti created CASSANDRA-14655: -- Summary: Upgrade C* to use latest guava (26.0) Key: CASSANDRA-14655 URL: https://issues.apache.org/jira/browse/CASSANDRA-14655 Project: Cassandra Issue Type: Improvement Components: Libraries Reporter: Sumanth Pasupuleti C* currently uses guava 23.3. This JIRA is about changing C* to use latest guava (26.0). Originated from a discussion in the mailing list. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Created] (CASSANDRA-14654) Reduce heap pressure during compactions
Chris Lohfink created CASSANDRA-14654: - Summary: Reduce heap pressure during compactions Key: CASSANDRA-14654 URL: https://issues.apache.org/jira/browse/CASSANDRA-14654 Project: Cassandra Issue Type: Improvement Reporter: Chris Lohfink Assignee: Chris Lohfink Small partition compactions are painfully slow with a lot of overhead per partition. There also tends to be an excess of objects created (ie 200-700mb/s) per compaction thread. The EncoderStats walks through all the partitions and with mergeWith it will create a new one per partition as it walks the potentially millions of partitions. In a test scenario of about 600byte partitions and a couple 100mb of data this consumed ~16% of the heap pressure. Changing this to instead mutably track the min values and create one in a EncodingStats.Collector brought this down considerably (but not 100% since the UnfilteredRowIterator.stats() still creates 1 per partition). The KeyCacheKey makes a full copy of the underlying byte array in ByteBufferUtil.getArray in its constructor. This is the dominating heap pressure as there are more sstables. By changing this to just keeping the original it completely eliminates the current dominator of the compactions and also improves read performance. Minor tweak included for this as well for operators when compactions are behind on low read clusters is to make the preemptive opening setting a hotprop. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14631) Add RSS support for Cassandra blog
[ https://issues.apache.org/jira/browse/CASSANDRA-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-14631: - Reviewer: Dinesh Joshi
[jira] [Assigned] (CASSANDRA-14631) Add RSS support for Cassandra blog
[ https://issues.apache.org/jira/browse/CASSANDRA-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi reassigned CASSANDRA-14631: Assignee: Jeff Beck
[jira] [Updated] (CASSANDRA-14631) Add RSS support for Cassandra blog
[ https://issues.apache.org/jira/browse/CASSANDRA-14631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Beck updated CASSANDRA-14631: -- Attachment: 14631-site.txt Status: Patch Available (was: Open) I added the feed plugin that enables RSS subscriptions. Since we need more gems to make this work, I set up Bundler and a Gemfile.lock to make sure the gem versions will always match up. When I generated the site locally, I noticed that the date in the published blog post doesn't match what is generated; I'm not sure whether that is an artifact of how the post was originally published.
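For reference, a typical jekyll-feed setup is small; whether the attached 14631-site.txt patch matches this exactly is not shown here, so treat it as a sketch:

```yaml
# _config.yml -- enable the Atom feed plugin (jekyll-feed serves /feed.xml)
plugins:
  - jekyll-feed

# The Gemfile correspondingly gains `gem 'jekyll-feed'`, and running
# `bundle install` records the exact version in Gemfile.lock.
```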
[jira] [Commented] (CASSANDRA-14651) No longer possible to specify cassandra_dir via pytest.ini on cassandra-dtest
[ https://issues.apache.org/jira/browse/CASSANDRA-14651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584170#comment-16584170 ] Jordan West commented on CASSANDRA-14651: - On the code, the only comment I have, which is a very minor/optional nit, is that it would be nice to encapsulate the fetching of cassandra_dir in a method. {{.pytest_cache}} also seems like a good thing to put in {{.gitignore}}; it's in [GitHub's official one for Python|https://github.com/github/gitignore/commit/f651f0d3eef062a8592e017a194e703d93f3e5c9]. > No longer possible to specify cassandra_dir via pytest.ini on cassandra-dtest > - > > Key: CASSANDRA-14651 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14651 > Project: Cassandra > Issue Type: Bug > Components: Testing >Reporter: Paulo Motta >Assignee: Paulo Motta >Priority: Trivial > > It seems like the ability to specify {{cassandra_dir}} via {{pytest.ini}}, as > [stated in the > doc|https://github.com/apache/cassandra-dtest/blame/master/README.md#L79] was > lost after CASSANDRA-14449. We should either get it back or remove it from > the doc.
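For context, the documented mechanism was a pytest.ini at the root of the cassandra-dtest checkout along these lines (a sketch; the path value is a placeholder and the dtest README is the authority):

```ini
; pytest.ini (sketch; point cassandra_dir at a local Cassandra source tree)
[pytest]
cassandra_dir = /path/to/cassandra
```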
[jira] [Commented] (CASSANDRA-14436) Add sampler for query time and expose with nodetool
[ https://issues.apache.org/jira/browse/CASSANDRA-14436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584137#comment-16584137 ] Chris Lohfink commented on CASSANDRA-14436: --- Pushed the requested changes. I'm having some issues with CircleCI and the dtests, though; I'll ask for some help with those. |[units|https://circleci.com/gh/clohfink/cassandra/298]| > Add sampler for query time and expose with nodetool > --- > > Key: CASSANDRA-14436 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14436 > Project: Cassandra > Issue Type: Improvement >Reporter: Chris Lohfink >Assignee: Chris Lohfink >Priority: Major > Labels: pull-request-available > Time Spent: 0.5h > Remaining Estimate: 0h > > Create a new {{nodetool profileload}} that functions just like toppartitions > but with more data, returning the slowest local reads and writes on the host > during a given duration and highest frequency touched partitions (same as > {{nodetool toppartitions}}). Refactor included to extend use of the sampler > for uses outside of top frequency (max instead of total sample values). > Future work to this is to include top cpu and allocations by query and > possibly tasks/cpu/allocations by stage during time window.
[jira] [Updated] (CASSANDRA-14652) Extend IAuthenticator to accept peer SSL certificates
[ https://issues.apache.org/jira/browse/CASSANDRA-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeremy Hanna updated CASSANDRA-14652: - Labels: Security (was: ) > Extend IAuthenticator to accept peer SSL certificates > - > > Key: CASSANDRA-14652 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14652 > Project: Cassandra > Issue Type: Improvement > Components: Auth >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Labels: Security > Fix For: 4.0 > > > This patch will extend the IAuthenticator interface to accept the peer's SSL > certificates. This will allow authenticator implementations to perform additional > checks on the client, if so desired.
[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584032#comment-16584032 ] Stefan Podkowinski commented on CASSANDRA-14346: I'd add a license file per jar to keep things consistent. Adding the servlet-api (dual CDDL/GPL) needs some more careful handling compared to permissively licensed deps, but shouldn't be a blocker per se, if we decide we really want to get this committed. > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.0 > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). > As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. > As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity.
[jira] [Updated] (CASSANDRA-11671) Remove check on gossip status from DynamicEndpointSnitch::updateScores
[ https://issues.apache.org/jira/browse/CASSANDRA-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-11671: Reviewer: Jason Brown > Remove check on gossip status from DynamicEndpointSnitch::updateScores > -- > > Key: CASSANDRA-11671 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11671 > Project: Cassandra > Issue Type: Improvement > Components: Coordination >Reporter: Sam Tunnicliffe >Priority: Minor > Labels: pull-request-available > Fix For: 4.x > > Time Spent: 10m > Remaining Estimate: 0h > > It seems that historically there were initialization ordering issues which > affected DES and StorageService (CASSANDRA-1756) and so a condition was added > to DES::updateScores() to ensure that SS had finished setup. In fact, the > check was actually testing whether gossip was active or not. CASSANDRA-10134 > preserved this behaviour, but it seems likely that the check can be removed > from DES completely now. If not, it can at least be switched to use > SS::isInitialized() which post CASSANDRA-10134 actually reports what its > name suggests. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
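The guard change the ticket proposes can be illustrated with a small, dependency-free sketch. Note that {{StorageServiceStub}}, the score map, and the EWMA-style merge below are hypothetical stand-ins for this example, not Cassandra's actual DynamicEndpointSnitch code:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class UpdateScoresGuard {
    // Hypothetical stand-in for StorageService; only the init flag matters here.
    static class StorageServiceStub {
        static volatile boolean initialized = false;
        static boolean isInitialized() { return initialized; }
    }

    private final Map<String, Double> scores = new ConcurrentHashMap<>();

    void updateScores(String endpoint, double latencyMillis) {
        // The ticket's suggestion: gate on setup completion, not on gossip state.
        if (!StorageServiceStub.isInitialized())
            return;
        // Illustrative score smoothing; not the real DES formula.
        scores.merge(endpoint, latencyMillis, (old, cur) -> 0.75 * old + 0.25 * cur);
    }

    public static void main(String[] args) {
        UpdateScoresGuard des = new UpdateScoresGuard();
        des.updateScores("10.0.0.1", 5.0);      // dropped: setup not finished
        StorageServiceStub.initialized = true;
        des.updateScores("10.0.0.1", 5.0);      // recorded
        System.out.println(des.scores.size());  // prints 1
    }
}
```

The point of the switch is that post CASSANDRA-10134, {{isInitialized()}} reports setup completion rather than gossip state, so a guard like the one above finally matches its stated intent.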
[jira] [Updated] (CASSANDRA-11671) Remove check on gossip status from DynamicEndpointSnitch::updateScores
[ https://issues.apache.org/jira/browse/CASSANDRA-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artsiom Yudovin updated CASSANDRA-11671: Status: Patch Available (was: Awaiting Feedback)
[jira] [Updated] (CASSANDRA-11671) Remove check on gossip status from DynamicEndpointSnitch::updateScores
[ https://issues.apache.org/jira/browse/CASSANDRA-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artsiom Yudovin updated CASSANDRA-11671: Status: Ready to Commit (was: Patch Available)
[jira] [Updated] (CASSANDRA-11671) Remove check on gossip status from DynamicEndpointSnitch::updateScores
[ https://issues.apache.org/jira/browse/CASSANDRA-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artsiom Yudovin updated CASSANDRA-11671: Status: Patch Available (was: Open) Patch available in this [pull request|https://github.com/apache/cassandra/pull/251].
[jira] [Updated] (CASSANDRA-11671) Remove check on gossip status from DynamicEndpointSnitch::updateScores
[ https://issues.apache.org/jira/browse/CASSANDRA-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artsiom Yudovin updated CASSANDRA-11671: Status: Awaiting Feedback (was: In Progress)
[jira] [Updated] (CASSANDRA-11671) Remove check on gossip status from DynamicEndpointSnitch::updateScores
[ https://issues.apache.org/jira/browse/CASSANDRA-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artsiom Yudovin updated CASSANDRA-11671: Status: In Progress (was: Ready to Commit)
[jira] [Updated] (CASSANDRA-11671) Remove check on gossip status from DynamicEndpointSnitch::updateScores
[ https://issues.apache.org/jira/browse/CASSANDRA-11671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-11671: --- Labels: pull-request-available (was: )
[jira] [Updated] (CASSANDRA-14652) Extend IAuthenticator to accept peer SSL certificates
[ https://issues.apache.org/jira/browse/CASSANDRA-14652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14652: Resolution: Fixed Status: Resolved (was: Patch Available) +1 Committed as sha {{ac1bb75867a9a878a86d9b659234f78772627287}}. Thanks! > Extend IAuthenticator to accept peer SSL certificates > - > > Key: CASSANDRA-14652 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14652 > Project: Cassandra > Issue Type: Improvement > Components: Auth >Reporter: Dinesh Joshi >Assignee: Dinesh Joshi >Priority: Major > Fix For: 4.0 > > > This patch extends the IAuthenticator interface to accept a peer's SSL > certificates. This will allow Authenticator implementations to perform > additional checks on the client, if so desired. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra git commit: Extend IAuthenticator to accept peer SSL certificates
Repository: cassandra Updated Branches: refs/heads/trunk 298416a74 -> ac1bb7586 Extend IAuthenticator to accept peer SSL certificates patch by Dinesh Joshi; reviewed by jasobrown for CASSANDRA-14652 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ac1bb758 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ac1bb758 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ac1bb758 Branch: refs/heads/trunk Commit: ac1bb75867a9a878a86d9b659234f78772627287 Parents: 298416a Author: Dinesh A. Joshi Authored: Thu Aug 16 15:01:20 2018 -0700 Committer: Jason Brown Committed: Fri Aug 17 06:43:45 2018 -0700 -- CHANGES.txt | 1 + .../apache/cassandra/auth/IAuthenticator.java | 18 +++ .../cassandra/transport/ServerConnection.java | 33 +++- 3 files changed, 51 insertions(+), 1 deletion(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac1bb758/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index 0e671b0..d906879 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Extend IAuthenticator to accept peer SSL certificates (CASSANDRA-14652) * Incomplete handling of exceptions when decoding incoming messages (CASSANDRA-14574) * Add diagnostic events for user audit logging (CASSANDRA-13668) * Allow retrieving diagnostic events via JMX (CASSANDRA-14435) http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac1bb758/src/java/org/apache/cassandra/auth/IAuthenticator.java -- diff --git a/src/java/org/apache/cassandra/auth/IAuthenticator.java b/src/java/org/apache/cassandra/auth/IAuthenticator.java index 9eb50a7..212e774 100644 --- a/src/java/org/apache/cassandra/auth/IAuthenticator.java +++ b/src/java/org/apache/cassandra/auth/IAuthenticator.java @@ -21,6 +21,8 @@ import java.net.InetAddress; import java.util.Map; import java.util.Set; +import javax.security.cert.X509Certificate; + import org.apache.cassandra.exceptions.AuthenticationException; import 
org.apache.cassandra.exceptions.ConfigurationException; @@ -65,6 +67,22 @@ public interface IAuthenticator SaslNegotiator newSaslNegotiator(InetAddress clientAddress); /** + * Provide a SASL handler to perform authentication for an single connection. SASL + * is a stateful protocol, so a new instance must be used for each authentication + * attempt. This method accepts certificates as well. Authentication strategies can + * override this method to gain access to client's certificate chain, if present. + * @param clientAddress the IP address of the client whom we wish to authenticate, or null + * if an internal client (one not connected over the remote transport). + * @param certificates the peer's X509 Certificate chain, if present. + * @return org.apache.cassandra.auth.IAuthenticator.SaslNegotiator implementation + * (see {@link org.apache.cassandra.auth.PasswordAuthenticator.PlainTextSaslAuthenticator}) + */ +default SaslNegotiator newSaslNegotiator(InetAddress clientAddress, X509Certificate[] certificates) +{ +return newSaslNegotiator(clientAddress); +} + +/** * A legacy method that is still used by JMX authentication. 
* * You should implement this for having JMX authentication through your http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac1bb758/src/java/org/apache/cassandra/transport/ServerConnection.java -- diff --git a/src/java/org/apache/cassandra/transport/ServerConnection.java b/src/java/org/apache/cassandra/transport/ServerConnection.java index d78b7c0..00e334c 100644 --- a/src/java/org/apache/cassandra/transport/ServerConnection.java +++ b/src/java/org/apache/cassandra/transport/ServerConnection.java @@ -20,8 +20,15 @@ package org.apache.cassandra.transport; import java.util.concurrent.ConcurrentHashMap; import java.util.concurrent.ConcurrentMap; +import javax.net.ssl.SSLPeerUnverifiedException; +import javax.security.cert.X509Certificate; + +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + import io.netty.channel.Channel; import com.codahale.metrics.Counter; +import io.netty.handler.ssl.SslHandler; import org.apache.cassandra.auth.IAuthenticator; import org.apache.cassandra.config.DatabaseDescriptor; import org.apache.cassandra.service.ClientState; @@ -29,6 +36,7 @@ import org.apache.cassandra.service.QueryState; public class ServerConnection extends Connection { +private static Logger logger = LoggerFactory.getLogger(ServerConnection.class); private
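The compatibility trick in the diff above — a new overload whose default body delegates to the existing method — can be seen in isolation with a simplified interface. The names below are stand-ins for this sketch; the real code lives in {{org.apache.cassandra.auth.IAuthenticator}} and uses {{javax.security.cert.X509Certificate}}:

```java
import java.net.InetAddress;
import java.security.cert.X509Certificate;

public class AuthenticatorSketch {
    // Simplified stand-in for IAuthenticator: the two-argument overload has a
    // default implementation, so pre-existing implementations keep compiling.
    interface SaslAuthenticator {
        String newSaslNegotiator(InetAddress clientAddress);

        default String newSaslNegotiator(InetAddress clientAddress, X509Certificate[] certificates) {
            // Implementations that ignore certificates need no changes.
            return newSaslNegotiator(clientAddress);
        }
    }

    public static void main(String[] args) throws Exception {
        // A "legacy" implementation that only knows the old single-argument method:
        SaslAuthenticator legacy = addr -> "negotiator-for-" + addr.getHostAddress();
        InetAddress client = InetAddress.getByName("127.0.0.1");
        // Callers can now pass a certificate chain; the default method delegates.
        System.out.println(legacy.newSaslNegotiator(client, null));
    }
}
```

This is why the patch is source- and binary-compatible: only authenticators that actually want the certificate chain override the new overload.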
[jira] [Commented] (CASSANDRA-14647) Reading cardinality from Statistics.db failed
[ https://issues.apache.org/jira/browse/CASSANDRA-14647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583927#comment-16583927 ] Romain Hardouin commented on CASSANDRA-14647: - This is not due to STCS -> LCS. I have the same behavior on one cluster with LCS and heavy writes. STCS has never been configured on it. > Reading cardinality from Statistics.db failed > - > > Key: CASSANDRA-14647 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14647 > Project: Cassandra > Issue Type: Bug > Components: Compaction > Environment: Clients are doing only writes with Local One, cluster > consists of 3 regions with RF3. > Storage is configured with jbod/XFS on 10 x 1Tb disks > IOPS limit for each disk 500 (total 5000 iops) > Bandwidth for each disk 60mb/s (600 total) > OS is Debian linux. >Reporter: Vitali Djatsuk >Priority: Major > Fix For: 3.0.x > > Attachments: cassandra_compaction_pending_tasks_7days.png > > > There is some issue with sstable metadata which is visible in system.log, the > message says: > {noformat} > WARN [Thread-6] 2018-07-25 07:12:47,928 SSTableReader.java:249 - Reading > cardinality from Statistics.db failed for > /opt/data/disk5/data/keyspace/table/mc-big-Data.db.{noformat} > Although there is no such file. > The message appeared after I've changed the compaction strategy from > SizeTiered to Leveled. Compaction strategy has been changed region by region > (total 3 regions) and it coincided with a doubling of client write > traffic. > I have tried to run nodetool scrub to rebuild the sstable, but that does not > fix the issue. > So it is very hard to define the steps to reproduce; probably they would be: > # run stress tool with write traffic > # under load change compaction strategy from SizeTiered to Leveled for the > bunch of hosts > # add more write traffic > Reading the code, it is said that if this metadata is broken, then "estimating > the keys will be done using index summary". 
> > [https://github.com/apache/cassandra/blob/cassandra-3.0.17/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java#L247] > -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
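The fallback quoted from SSTableReader can be sketched in a few lines. Everything below is a hypothetical, stubbed simplification for illustration — the real code reads serialized metadata from Statistics.db and consults an actual index summary:

```java
import java.io.IOException;

public class KeyEstimate {
    // Stub: simulates the failure mode from the bug report.
    static long readCardinalityFromStatistics() throws IOException {
        throw new IOException("Reading cardinality from Statistics.db failed");
    }

    // Stub: placeholder for an index-summary-based estimate.
    static long estimateFromIndexSummary() {
        return 1000000L;
    }

    static long estimateKeys() {
        try {
            return readCardinalityFromStatistics();
        } catch (IOException e) {
            // Mirrors the WARN-and-continue behaviour quoted in the report:
            // the node stays up and falls back to a coarser estimate.
            return estimateFromIndexSummary();
        }
    }

    public static void main(String[] args) {
        System.out.println(estimateKeys()); // prints 1000000
    }
}
```

In other words, the warning is noisy but non-fatal: the practical consequence is a less precise key-count estimate, not data loss.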
[jira] [Updated] (CASSANDRA-14574) Incomplete handling of exceptions when decoding incoming messages
[ https://issues.apache.org/jira/browse/CASSANDRA-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Brown updated CASSANDRA-14574: Resolution: Fixed Status: Resolved (was: Ready to Commit) > Incomplete handling of exceptions when decoding incoming messages > -- > > Key: CASSANDRA-14574 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14574 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging >Reporter: Aleksey Yeschenko >Assignee: Jason Brown >Priority: Major > Fix For: 4.0 > > > {{MessageInHandler.decode()}} occasionally reads the payload incorrectly, > passing the full message to {{MessageIn.read()}} instead of just the payload > bytes. > You can see the stack trace in the logs from this [CI > run|https://circleci.com/gh/iamaleksey/cassandra/437#tests/containers/38]. > {code} > Caused by: java.lang.AssertionError: null > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:351) > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:371) > at > org.apache.cassandra.db.Mutation$MutationSerializer.deserialize(Mutation.java:335) > at org.apache.cassandra.net.MessageIn.read(MessageIn.java:158) > at > org.apache.cassandra.net.async.MessageInHandler.decode(MessageInHandler.java:132) > {code} > Reconstructed, truncated stream passed to {{MessageIn.read()}}: > {{000b000743414c5f42414301002a01e1a5c9b089fd11e8b517436ee124300704005d10fc50ec}} > You can clearly see parameters in there encoded before the payload: > {{[43414c5f424143 - CAL_BAC] [01 - ONE_BYTE] [002a - 42, payload size] 01 e1 > a5 c9 b0 89 fd 11 e8 b5 17 43 6e e1 24 30 07 04 00 00 00 1d 10 fc 50 ec}} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
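The byte-level breakdown quoted above can be checked mechanically. The sketch below uses plain JDK parsing (not Cassandra's MessageIn code) and hard-codes the offsets from the ticket's own annotation — [length][name "CAL_BAC"][flag 0x01][payload size 0x002a] — so the offsets are illustrative, not a general decoder:

```java
import java.nio.charset.StandardCharsets;

public class ParamBytes {
    public static void main(String[] args) {
        // Leading bytes of the reconstructed stream quoted in the ticket.
        String hex = "000b000743414c5f42414301002a";
        byte[] data = new byte[hex.length() / 2];
        for (int i = 0; i < data.length; i++)
            data[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);

        int nameLen = ((data[2] & 0xff) << 8) | (data[3] & 0xff);              // 0x0007
        String name = new String(data, 4, nameLen, StandardCharsets.US_ASCII); // "CAL_BAC"
        int flag = data[4 + nameLen] & 0xff;                                   // 0x01 (ONE_BYTE)
        int payloadSize = ((data[5 + nameLen] & 0xff) << 8)
                        | (data[6 + nameLen] & 0xff);                          // 0x002a = 42

        System.out.println(name + " " + flag + " " + payloadSize);
    }
}
```

Running this confirms the annotation: the parameter name and the 42-byte payload size really do precede the payload, which is why handing the full message (rather than just the payload bytes) to {{MessageIn.read()}} trips the assertion.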
[jira] [Commented] (CASSANDRA-14574) Incomplete handling of exceptions when decoding incoming messages
[ https://issues.apache.org/jira/browse/CASSANDRA-14574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583874#comment-16583874 ] Jason Brown commented on CASSANDRA-14574: - Committed to c* as sha {{298416a7445aa50874caebc779ca3094b32f3e31}}, committed to dtest as sha {{6e80b1846c308bb13d0b700263c89f10caa17d28}}. Thanks, all!
cassandra-dtest git commit: Test corrupting an internode messaging connection, and ensure it reconnects.
Repository: cassandra-dtest Updated Branches: refs/heads/master e426ce1da -> 6e80b1846 Test corrupting an internode messaging connection, and ensure it reconnects. patch by jasobrown; reviewed by Dinesh Joshi for CASSANDRA-14574 Project: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/commit/6e80b184 Tree: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/tree/6e80b184 Diff: http://git-wip-us.apache.org/repos/asf/cassandra-dtest/diff/6e80b184 Branch: refs/heads/master Commit: 6e80b1846c308bb13d0b700263c89f10caa17d28 Parents: e426ce1 Author: Jason Brown Authored: Thu Aug 16 06:27:23 2018 -0700 Committer: Jason Brown Committed: Fri Aug 17 05:55:40 2018 -0700 -- byteman/corrupt_internode_messages_gossip.btm | 17 internode_messaging_test.py | 48 ++ 2 files changed, 65 insertions(+) -- http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/6e80b184/byteman/corrupt_internode_messages_gossip.btm -- diff --git a/byteman/corrupt_internode_messages_gossip.btm b/byteman/corrupt_internode_messages_gossip.btm new file mode 100644 index 000..66e4fe2 --- /dev/null +++ b/byteman/corrupt_internode_messages_gossip.btm @@ -0,0 +1,17 @@ +# +# corrupt the first gossip ACK message. we corrupt it on the way out, +# in serialize(), so it fails on deserializing. However, we also need +# to hack the serializedSize(). +# + +RULE corrupt the first gossip ACK message +CLASS org.apache.cassandra.gms.GossipDigestAckSerializer +METHOD serialize(org.apache.cassandra.gms.GossipDigestAck, org.apache.cassandra.io.util.DataOutputPlus, int) +AT ENTRY +# set flag to only run this rule once. 
+IF NOT flagged("done") +DO + flag("done"); + $2.writeInt(-1); +ENDRULE + http://git-wip-us.apache.org/repos/asf/cassandra-dtest/blob/6e80b184/internode_messaging_test.py -- diff --git a/internode_messaging_test.py b/internode_messaging_test.py new file mode 100644 index 000..d0d4d1f --- /dev/null +++ b/internode_messaging_test.py @@ -0,0 +1,48 @@ +import pytest +import logging +import time + +from dtest import Tester + +since = pytest.mark.since +logger = logging.getLogger(__name__) + +_LOG_ERR_ILLEGAL_CAPACITY = "Caused by: java.lang.IllegalArgumentException: Illegal Capacity: -1" + + +@since('4.0') +class TestInternodeMessaging(Tester): + +@pytest.fixture(autouse=True) +def fixture_add_additional_log_patterns(self, fixture_dtest_setup): +fixture_dtest_setup.ignore_log_patterns = ( +r'Illegal Capacity: -1', +r'reported message size' +) + +def test_message_corruption(self): +""" +@jira_ticket CASSANDRA-14574 + +Use byteman to corrupt an outgoing gossip ACK message, check that the recipient fails *once* on the message +but does not spin out of control trying to process the rest of the bytes in the buffer. +Then make sure normal messaging can occur after a reconnect (on a different socket, of course). +""" +cluster = self.cluster +cluster.populate(2, install_byteman=True) +cluster.start(wait_other_notice=True) + +node1, node2 = cluster.nodelist() +node1_log_mark = node1.mark_log() +node2_log_mark = node2.mark_log() + + node2.byteman_submit(['./byteman/corrupt_internode_messages_gossip.btm']) + +# wait for the deserialization error to happen on node1 +time.sleep(10) +assert len(node1.grep_log(_LOG_ERR_ILLEGAL_CAPACITY, from_mark=node1_log_mark)) == 1 + +# now, make sure node2 reconnects (and continues gossiping). 
+# node.watch_log_for() will time out if it cannot find the log entry +assert node2.grep_log('successfully connected to 127.0.0.1:7000 \(GOSSIP\)', + from_mark=node2_log_mark, filename='debug.log') - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
cassandra git commit: Incomplete handling of exceptions when decoding incoming messages
Repository: cassandra Updated Branches: refs/heads/trunk d8c451923 -> 298416a74 Incomplete handling of exceptions when decoding incoming messages patch by jasobrown; reviewed by Dinesh Joshi for CASSANDRA-14574 Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/298416a7 Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/298416a7 Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/298416a7 Branch: refs/heads/trunk Commit: 298416a7445aa50874caebc779ca3094b32f3e31 Parents: d8c4519 Author: Jason Brown Authored: Wed Jul 18 13:47:22 2018 -0700 Committer: Jason Brown Committed: Fri Aug 17 05:54:37 2018 -0700 -- CHANGES.txt | 1 + .../net/async/BaseMessageInHandler.java | 61 - .../cassandra/net/async/MessageInHandler.java | 121 +++- .../net/async/MessageInHandlerPre40.java| 137 +-- .../test/microbench/MessageOutBench.java| 6 +- .../net/async/MessageInHandlerTest.java | 65 - 6 files changed, 238 insertions(+), 153 deletions(-) -- http://git-wip-us.apache.org/repos/asf/cassandra/blob/298416a7/CHANGES.txt -- diff --git a/CHANGES.txt b/CHANGES.txt index d2970a4..0e671b0 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -1,4 +1,5 @@ 4.0 + * Incomplete handling of exceptions when decoding incoming messages (CASSANDRA-14574) * Add diagnostic events for user audit logging (CASSANDRA-13668) * Allow retrieving diagnostic events via JMX (CASSANDRA-14435) * Add base classes for diagnostic events (CASSANDRA-13457) http://git-wip-us.apache.org/repos/asf/cassandra/blob/298416a7/src/java/org/apache/cassandra/net/async/BaseMessageInHandler.java -- diff --git a/src/java/org/apache/cassandra/net/async/BaseMessageInHandler.java b/src/java/org/apache/cassandra/net/async/BaseMessageInHandler.java index 7314999..2f2a973 100644 --- a/src/java/org/apache/cassandra/net/async/BaseMessageInHandler.java +++ b/src/java/org/apache/cassandra/net/async/BaseMessageInHandler.java @@ -26,7 +26,6 @@ import java.util.Map; 
import java.util.function.BiConsumer; import com.google.common.annotations.VisibleForTesting; - import org.slf4j.Logger; import org.slf4j.LoggerFactory; @@ -40,6 +39,14 @@ import org.apache.cassandra.net.MessageIn; import org.apache.cassandra.net.MessagingService; import org.apache.cassandra.net.ParameterType; +/** + * Parses out individual messages from the incoming buffers. Each message, both header and payload, is incrementally built up + * from the available input data, then passed to the {@link #messageConsumer}. + * + * Note: this class derives from {@link ByteToMessageDecoder} to take advantage of the {@link ByteToMessageDecoder.Cumulator} + * behavior across {@link #decode(ChannelHandlerContext, ByteBuf, List)} invocations. That way we don't have to maintain + * the not-fully consumed {@link ByteBuf}s. + */ public abstract class BaseMessageInHandler extends ByteToMessageDecoder { public static final Logger logger = LoggerFactory.getLogger(BaseMessageInHandler.class); @@ -52,7 +59,8 @@ public abstract class BaseMessageInHandler extends ByteToMessageDecoder READ_PARAMETERS_SIZE, READ_PARAMETERS_DATA, READ_PAYLOAD_SIZE, -READ_PAYLOAD +READ_PAYLOAD, +CLOSED } /** @@ -77,6 +85,8 @@ public abstract class BaseMessageInHandler extends ByteToMessageDecoder final InetAddressAndPort peer; final int messagingVersion; +protected State state; + public BaseMessageInHandler(InetAddressAndPort peer, int messagingVersion, BiConsumer messageConsumer) { this.peer = peer; @@ -84,7 +94,36 @@ public abstract class BaseMessageInHandler extends ByteToMessageDecoder this.messageConsumer = messageConsumer; } -public abstract void decode(ChannelHandlerContext ctx, ByteBuf in, List out); +// redeclared here to make the method public (for testing) +@VisibleForTesting +public void decode(ChannelHandlerContext ctx, ByteBuf in, List out) throws Exception +{ +if (state == State.CLOSED) +{ +in.skipBytes(in.readableBytes()); +return; +} + +try +{ +handleDecode(ctx, in, out); +} +catch 
(Exception e) +{ +// prevent any future attempts at reading messages from any inbound buffers, as we're already in a bad state +state = State.CLOSED; + +// force the buffer to appear to be consumed, thereby exiting the ByteToMessageDecoder.callDecode() loop, +// and other
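The essence of the fix in the diff above — once a decode attempt throws, move to a CLOSED state and drain all further input so the decode loop cannot spin on the same bad bytes — can be sketched without Netty. Here {{ByteBuffer}} stands in for {{ByteBuf}}, and the one-byte "message" format is invented for the example:

```java
import java.nio.ByteBuffer;

public class FailFastDecoder {
    enum State { OPEN, CLOSED }
    private State state = State.OPEN;
    int decoded = 0;

    void decode(ByteBuffer in) {
        if (state == State.CLOSED) {
            in.position(in.limit()); // consume everything, parse nothing
            return;
        }
        try {
            handleDecode(in);
        } catch (RuntimeException e) {
            state = State.CLOSED;
            // Force the buffer to appear fully consumed, mirroring the patch's
            // exit from the ByteToMessageDecoder.callDecode() loop.
            in.position(in.limit());
        }
    }

    // Toy format: each message is one byte; a negative byte is "corrupt".
    private void handleDecode(ByteBuffer in) {
        while (in.hasRemaining()) {
            byte b = in.get();
            if (b < 0)
                throw new IllegalArgumentException("Illegal Capacity: " + b);
            decoded++;
        }
    }

    public static void main(String[] args) {
        FailFastDecoder d = new FailFastDecoder();
        d.decode(ByteBuffer.wrap(new byte[] { 1, 2, -1, 3 })); // fails at -1, closes
        d.decode(ByteBuffer.wrap(new byte[] { 4, 5 }));        // drained, ignored
        System.out.println(d.decoded + " " + d.state);         // prints "2 CLOSED"
    }
}
```

The dtest committed alongside the patch checks exactly this behaviour end to end: the recipient logs the "Illegal Capacity: -1" failure once, then reconnects on a fresh socket instead of looping over the corrupted buffer.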
[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583869#comment-16583869 ] Joseph Lynch commented on CASSANDRA-14346: -- [~spo...@gmail.com] great catch, sorry we missed those! I'm working on adding them but I just wanted to double check if we need a license per jar or we can group them (e.g. jetty, websocket)? Regarding {{javax.servlet-api}} I thought that project was dual licensed under GPLv2 as well ([maven|https://mvnrepository.com/artifact/javax.servlet/javax.servlet-api], [source code|https://github.com/javaee/glassfish/blob/master/LICENSE], [website license page|https://javaee.github.io/glassfish/LICENSE])? Also I think it's a pretty common Apache project dependency, for example [hadoop|https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common/3.1.1] has it as a compile and runtime dependency I believe. Is it still an issue if we choose to use it under the GPLv2? > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.0 > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. 
Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). > As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. > As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity. -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-14435) Diag. Events: JMX events
[ https://issues.apache.org/jira/browse/CASSANDRA-14435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-14435: --- Resolution: Fixed Fix Version/s: (was: 4.x) 4.0 Status: Resolved (was: Patch Available) Committed as a79e5903b552e40f77c! > Diag. Events: JMX events > > > Key: CASSANDRA-14435 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14435 > Project: Cassandra > Issue Type: New Feature > Components: Observability >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Major > Fix For: 4.0 > > > Nodes currently use JMX events for progress reporting on bootstrap and > repairs. This might also be an option to expose diagnostic events to external > subscribers.
[jira] [Updated] (CASSANDRA-13668) Diag. events for user audit logging
[ https://issues.apache.org/jira/browse/CASSANDRA-13668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13668: --- Resolution: Fixed Fix Version/s: (was: 4.x) 4.0 Status: Resolved (was: Patch Available) Committed as d8c45192318584! > Diag. events for user audit logging > --- > > Key: CASSANDRA-13668 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13668 > Project: Cassandra > Issue Type: Improvement > Components: Observability >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Major > Fix For: 4.0 > > > With the availability of CASSANDRA-13459, any native transport enabled client > will be able to subscribe to internal Cassandra events. External tools can > take advantage by monitoring these events in various ways. Use-cases for this > can be e.g. auditing tools for compliance and security purposes. > The scope of this ticket is to add diagnostic events that are raised around > authentication and CQL operations. These events can then be consumed and used > by external tools to implement a Cassandra user auditing solution.
[jira] [Updated] (CASSANDRA-13457) Diag. Events: Add base classes
[ https://issues.apache.org/jira/browse/CASSANDRA-13457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Podkowinski updated CASSANDRA-13457: --- Resolution: Fixed Fix Version/s: 4.0 Status: Resolved (was: Patch Available) Committed as 2846b22a70d48bae. Thanks [~michaelsembwever] and [~jasobrown] for reviewing and sharing your feedback! > Diag. Events: Add base classes > -- > > Key: CASSANDRA-13457 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13457 > Project: Cassandra > Issue Type: Sub-task > Components: Core, Observability >Reporter: Stefan Podkowinski >Assignee: Stefan Podkowinski >Priority: Major > Fix For: 4.0 > > > Base ticket for adding classes that will allow you to implement and subscribe > to events.
[1/5] cassandra git commit: Add diagnostic events base classes
Repository: cassandra
Updated Branches:
  refs/heads/trunk d3e6891ec -> d8c451923

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2846b22a/src/java/org/apache/cassandra/service/PendingRangeCalculatorServiceDiagnostics.java
--
diff --git a/src/java/org/apache/cassandra/service/PendingRangeCalculatorServiceDiagnostics.java b/src/java/org/apache/cassandra/service/PendingRangeCalculatorServiceDiagnostics.java
new file mode 100644
index 000..ec09e3f
--- /dev/null
+++ b/src/java/org/apache/cassandra/service/PendingRangeCalculatorServiceDiagnostics.java
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.service;
+
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.cassandra.diag.DiagnosticEventService;
+import org.apache.cassandra.service.PendingRangeCalculatorServiceEvent.PendingRangeCalculatorServiceEventType;
+
+/**
+ * Utility methods for diagnostic events related to {@link PendingRangeCalculatorService}.
+ */
+final class PendingRangeCalculatorServiceDiagnostics
+{
+    private static final DiagnosticEventService service = DiagnosticEventService.instance();
+
+    private PendingRangeCalculatorServiceDiagnostics()
+    {
+    }
+
+    static void taskStarted(PendingRangeCalculatorService calculatorService, AtomicInteger taskCount)
+    {
+        if (isEnabled(PendingRangeCalculatorServiceEventType.TASK_STARTED))
+            service.publish(new PendingRangeCalculatorServiceEvent(PendingRangeCalculatorServiceEventType.TASK_STARTED,
+                                                                   calculatorService,
+                                                                   taskCount.get()));
+    }
+
+    static void taskFinished(PendingRangeCalculatorService calculatorService, AtomicInteger taskCount)
+    {
+        if (isEnabled(PendingRangeCalculatorServiceEventType.TASK_FINISHED_SUCCESSFULLY))
+            service.publish(new PendingRangeCalculatorServiceEvent(PendingRangeCalculatorServiceEventType.TASK_FINISHED_SUCCESSFULLY,
+                                                                   calculatorService,
+                                                                   taskCount.get()));
+    }
+
+    static void taskRejected(PendingRangeCalculatorService calculatorService, AtomicInteger taskCount)
+    {
+        if (isEnabled(PendingRangeCalculatorServiceEventType.TASK_EXECUTION_REJECTED))
+            service.publish(new PendingRangeCalculatorServiceEvent(PendingRangeCalculatorServiceEventType.TASK_EXECUTION_REJECTED,
+                                                                   calculatorService,
+                                                                   taskCount.get()));
+    }
+
+    static void taskCountChanged(PendingRangeCalculatorService calculatorService, int taskCount)
+    {
+        if (isEnabled(PendingRangeCalculatorServiceEventType.TASK_COUNT_CHANGED))
+            service.publish(new PendingRangeCalculatorServiceEvent(PendingRangeCalculatorServiceEventType.TASK_COUNT_CHANGED,
+                                                                   calculatorService,
+                                                                   taskCount));
+    }
+
+    private static boolean isEnabled(PendingRangeCalculatorServiceEventType type)
+    {
+        return service.isEnabled(PendingRangeCalculatorServiceEvent.class, type);
+    }
+}

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2846b22a/src/java/org/apache/cassandra/service/PendingRangeCalculatorServiceEvent.java
--
diff --git a/src/java/org/apache/cassandra/service/PendingRangeCalculatorServiceEvent.java
b/src/java/org/apache/cassandra/service/PendingRangeCalculatorServiceEvent.java
new file mode 100644
index 000..3024149
--- /dev/null
+++ b/src/java/org/apache/cassandra/service/PendingRangeCalculatorServiceEvent.java
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with
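The diagnostics helpers in this commit follow a guard-then-publish pattern: isEnabled() is checked first, so the event object is only constructed and published when at least one subscriber is registered for its type. A self-contained sketch of that pattern follows; the MiniEventService name and API are ours for illustration, not Cassandra's actual DiagnosticEventService:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Minimal stand-in for an event service using the guard-then-publish pattern:
// callers check isEnabled() before building an event, so the allocation cost
// is only paid when someone is actually listening.
final class MiniEventService
{
    private final Map<Class<?>, List<Consumer<Object>>> subscribers = new ConcurrentHashMap<>();

    <E> void subscribe(Class<E> type, Consumer<E> consumer)
    {
        // Wrap the typed consumer so all subscribers share one list type.
        subscribers.computeIfAbsent(type, t -> new CopyOnWriteArrayList<>())
                   .add(event -> consumer.accept(type.cast(event)));
    }

    // Cheap check used to skip event construction entirely.
    boolean isEnabled(Class<?> type)
    {
        List<Consumer<Object>> list = subscribers.get(type);
        return list != null && !list.isEmpty();
    }

    void publish(Object event)
    {
        List<Consumer<Object>> list = subscribers.get(event.getClass());
        if (list != null)
            list.forEach(c -> c.accept(event));
    }
}
```

A caller would mirror the taskStarted() methods above: `if (svc.isEnabled(MyEvent.class)) svc.publish(new MyEvent(...))`, keeping the hot path allocation-free when diagnostics are off.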
[2/5] cassandra git commit: Add diagnostic events base classes
http://git-wip-us.apache.org/repos/asf/cassandra/blob/2846b22a/src/java/org/apache/cassandra/gms/GossiperEvent.java
--
diff --git a/src/java/org/apache/cassandra/gms/GossiperEvent.java b/src/java/org/apache/cassandra/gms/GossiperEvent.java
new file mode 100644
index 000..2de88bc
--- /dev/null
+++ b/src/java/org/apache/cassandra/gms/GossiperEvent.java
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.gms;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import javax.annotation.Nullable;
+
+import org.apache.cassandra.diag.DiagnosticEvent;
+import org.apache.cassandra.locator.InetAddressAndPort;
+
+/**
+ * DiagnosticEvent implementation for {@link Gossiper} activities.
+ */
+final class GossiperEvent extends DiagnosticEvent
+{
+    private final InetAddressAndPort endpoint;
+    @Nullable
+    private final Long quarantineExpiration;
+    @Nullable
+    private final EndpointState localState;
+
+    private final Map endpointStateMap;
+    private final boolean inShadowRound;
+    private final Map justRemovedEndpoints;
+    private final long lastProcessedMessageAt;
+    private final Set liveEndpoints;
+    private final List seeds;
+    private final Set seedsInShadowRound;
+    private final Map unreachableEndpoints;
+
+    enum GossiperEventType
+    {
+        MARKED_AS_SHUTDOWN,
+        CONVICTED,
+        REPLACEMENT_QUARANTINE,
+        REPLACED_ENDPOINT,
+        EVICTED_FROM_MEMBERSHIP,
+        REMOVED_ENDPOINT,
+        QUARANTINED_ENDPOINT,
+        MARKED_ALIVE,
+        REAL_MARKED_ALIVE,
+        MARKED_DEAD,
+        MAJOR_STATE_CHANGE_HANDLED,
+        SEND_GOSSIP_DIGEST_SYN
+    }
+
+    public GossiperEventType type;
+
+    GossiperEvent(GossiperEventType type, Gossiper gossiper, InetAddressAndPort endpoint,
+                  @Nullable Long quarantineExpiration, @Nullable EndpointState localState)
+    {
+        this.type = type;
+        this.endpoint = endpoint;
+        this.quarantineExpiration = quarantineExpiration;
+        this.localState = localState;
+
+        this.endpointStateMap = gossiper.getEndpointStateMap();
+        this.inShadowRound = gossiper.isInShadowRound();
+        this.justRemovedEndpoints = gossiper.getJustRemovedEndpoints();
+        this.lastProcessedMessageAt = gossiper.getLastProcessedMessageAt();
+        this.liveEndpoints = gossiper.getLiveMembers();
+        this.seeds = gossiper.getSeeds();
+        this.seedsInShadowRound = gossiper.getSeedsInShadowRound();
+        this.unreachableEndpoints = gossiper.getUnreachableEndpoints();
+    }
+
+    public Enum getType()
+    {
+        return type;
+    }
+
+    public HashMap toMap()
+    {
+        // be extra defensive against nulls and bugs
+        HashMap ret = new HashMap<>();
+        if (endpoint != null) ret.put("endpoint", endpoint.getHostAddress(true));
+        ret.put("quarantineExpiration", quarantineExpiration);
+        ret.put("localState", String.valueOf(localState));
+        ret.put("endpointStateMap", String.valueOf(endpointStateMap));
+        ret.put("inShadowRound", inShadowRound);
+        ret.put("justRemovedEndpoints", String.valueOf(justRemovedEndpoints));
+        ret.put("lastProcessedMessageAt", lastProcessedMessageAt);
+        ret.put("liveEndpoints", String.valueOf(liveEndpoints));
+        ret.put("seeds", String.valueOf(seeds));
+        ret.put("seedsInShadowRound", String.valueOf(seedsInShadowRound));
+        ret.put("unreachableEndpoints", String.valueOf(unreachableEndpoints));
+        return ret;
+    }
+}
\ No newline at end of file

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2846b22a/src/java/org/apache/cassandra/hints/Hint.java
--
diff --git a/src/java/org/apache/cassandra/hints/Hint.java b/src/java/org/apache/cassandra/hints/Hint.java
index b0abd50..7e4618c 100644
--- a/src/java/org/apache/cassandra/hints/Hint.java
+++ b/src/java/org/apache/cassandra/hints/Hint.java
@@ -132,7 +132,7 @@ public final class Hint
[5/5] cassandra git commit: Add diagnostic events for user audit logging
Add diagnostic events for user audit logging

patch by Stefan Podkowinski; reviewed by Mick Semb Wever for CASSANDRA-13668

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/d8c45192
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/d8c45192
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/d8c45192

Branch: refs/heads/trunk
Commit: d8c451923185841ca28e8cb1177b71edafbfd988
Parents: a79e590
Author: Stefan Podkowinski
Authored: Fri Apr 6 09:49:38 2018 +0200
Committer: Stefan Podkowinski
Committed: Fri Aug 17 14:08:37 2018 +0200
--
 CHANGES.txt                                    |   1 +
 .../org/apache/cassandra/audit/AuditEvent.java |  75 ++
 .../audit/DiagnosticEventAuditLogger.java      |  39 +++
 .../cassandra/transport/CQLUserAuditTest.java  | 253 +++
 4 files changed, 368 insertions(+)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/d8c45192/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 097e7dd..d2970a4 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Add diagnostic events for user audit logging (CASSANDRA-13668)
 * Allow retrieving diagnostic events via JMX (CASSANDRA-14435)
 * Add base classes for diagnostic events (CASSANDRA-13457)
 * Clear view system metadata when dropping keyspace (CASSANDRA-14646)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/d8c45192/src/java/org/apache/cassandra/audit/AuditEvent.java
--
diff --git a/src/java/org/apache/cassandra/audit/AuditEvent.java b/src/java/org/apache/cassandra/audit/AuditEvent.java
new file mode 100644
index 000..b21fe58
--- /dev/null
+++ b/src/java/org/apache/cassandra/audit/AuditEvent.java
@@ -0,0 +1,75 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.
The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.audit;
+
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.cassandra.diag.DiagnosticEvent;
+import org.apache.cassandra.diag.DiagnosticEventService;
+
+/**
+ * {@link AuditLogEntry} wrapper to expose audit events as {@link DiagnosticEvent}s.
+ */
+public final class AuditEvent extends DiagnosticEvent
+{
+    private final AuditLogEntry entry;
+
+    private AuditEvent(AuditLogEntry entry)
+    {
+        this.entry = entry;
+    }
+
+    static void create(AuditLogEntry entry)
+    {
+        if (isEnabled(entry.getType()))
+            DiagnosticEventService.instance().publish(new AuditEvent(entry));
+    }
+
+    private static boolean isEnabled(AuditLogEntryType type)
+    {
+        return DiagnosticEventService.instance().isEnabled(AuditEvent.class, type);
+    }
+
+    public Enum getType()
+    {
+        return entry.getType();
+    }
+
+    public String getSource()
+    {
+        return entry.getSource().toString(true);
+    }
+
+    public AuditLogEntry getEntry()
+    {
+        return entry;
+    }
+
+    public Map toMap()
+    {
+        HashMap ret = new HashMap<>();
+        if (entry.getKeyspace() != null) ret.put("keyspace", entry.getKeyspace());
+        if (entry.getOperation() != null) ret.put("operation", entry.getOperation());
+        if (entry.getScope() != null) ret.put("scope", entry.getScope());
+        if (entry.getUser() != null) ret.put("user", entry.getUser());
+        return ret;
+    }
+}
http://git-wip-us.apache.org/repos/asf/cassandra/blob/d8c45192/src/java/org/apache/cassandra/audit/DiagnosticEventAuditLogger.java
--
diff --git a/src/java/org/apache/cassandra/audit/DiagnosticEventAuditLogger.java b/src/java/org/apache/cassandra/audit/DiagnosticEventAuditLogger.java
new file mode 100644
index 000..9d586ba
--- /dev/null
+++ b/src/java/org/apache/cassandra/audit/DiagnosticEventAuditLogger.java
@@ -0,0 +1,39 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor
[4/5] cassandra git commit: Add JMX query support for diagnostic events
Add JMX query support for diagnostic events

patch by Stefan Podkowinski; reviewed by Mick Semb Wever for CASSANDRA-14435

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/a79e5903
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/a79e5903
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/a79e5903

Branch: refs/heads/trunk
Commit: a79e5903b552e40f77c151e23172f054ffb7f39e
Parents: 2846b22
Author: Stefan Podkowinski
Authored: Wed May 2 13:03:10 2018 +0200
Committer: Stefan Podkowinski
Committed: Fri Aug 17 14:07:45 2018 +0200
--
 CHANGES.txt                                     |   1 +
 .../cassandra/config/DatabaseDescriptor.java    |   1 -
 .../diag/DiagnosticEventPersistence.java        | 151
 .../cassandra/diag/DiagnosticEventService.java  |  65 ++-
 .../diag/DiagnosticEventServiceMBean.java       |  59 +++
 .../cassandra/diag/LastEventIdBroadcaster.java  | 150
 .../diag/LastEventIdBroadcasterMBean.java       |  41 +
 .../diag/store/DiagnosticEventMemoryStore.java  |  97 +++
 .../diag/store/DiagnosticEventStore.java        |  52 ++
 .../cassandra/service/StorageService.java       |   5 +-
 .../progress/jmx/JMXBroadcastExecutor.java      |  35
 .../DiagnosticEventPersistenceBench.java        |  73
 .../microbench/DiagnosticEventServiceBench.java | 103 +++
 .../config/OverrideConfigurationLoader.java     |  47 +
 .../diag/DiagnosticEventServiceTest.java        |   6 +-
 .../store/DiagnosticEventMemoryStoreTest.java   | 170 +++
 16 files changed, 1047 insertions(+), 9 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a79e5903/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index ceba843..097e7dd 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Allow retrieving diagnostic events via JMX (CASSANDRA-14435)
 * Add base classes for diagnostic events (CASSANDRA-13457)
 * Clear view system metadata when dropping keyspace (CASSANDRA-14646)
 * Allocate ReentrantLock on-demand in java11 AtomicBTreePartitionerBase (CASSANDRA-14637)
http://git-wip-us.apache.org/repos/asf/cassandra/blob/a79e5903/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
--
diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index 65a34f0..aa5ca92 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -2541,7 +2541,6 @@ public class DatabaseDescriptor
         return conf.diagnostic_events_enabled;
     }

-    @VisibleForTesting
     public static void setDiagnosticEventsEnabled(boolean enabled)
     {
         conf.diagnostic_events_enabled = enabled;

http://git-wip-us.apache.org/repos/asf/cassandra/blob/a79e5903/src/java/org/apache/cassandra/diag/DiagnosticEventPersistence.java
--
diff --git a/src/java/org/apache/cassandra/diag/DiagnosticEventPersistence.java b/src/java/org/apache/cassandra/diag/DiagnosticEventPersistence.java
new file mode 100644
index 000..7da335c
--- /dev/null
+++ b/src/java/org/apache/cassandra/diag/DiagnosticEventPersistence.java
@@ -0,0 +1,151 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.diag;
+
+import java.io.InvalidClassException;
+import java.io.Serializable;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.NavigableMap;
+import java.util.SortedMap;
+import java.util.TreeMap;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.function.Consumer;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.diag.store.DiagnosticEventMemoryStore;
+import
[3/5] cassandra git commit: Add diagnostic events base classes
Add diagnostic events base classes

patch by Stefan Podkowinski; reviewed by Mick Semb Wever for CASSANDRA-13457

Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2846b22a
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2846b22a
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2846b22a

Branch: refs/heads/trunk
Commit: 2846b22a70d48bae25203be945e02dd3b6cfda56
Parents: d3e6891
Author: Stefan Podkowinski
Authored: Thu Mar 16 12:50:52 2017 +0100
Committer: Stefan Podkowinski
Committed: Fri Aug 17 14:06:57 2018 +0200
--
 CHANGES.txt                                     |   1 +
 conf/cassandra.yaml                             |   5 +
 .../org/apache/cassandra/config/Config.java     |   3 +
 .../cassandra/config/DatabaseDescriptor.java    |  11 +
 .../schema/AlterKeyspaceStatement.java          |   5 +
 .../statements/schema/AlterTableStatement.java  |   5 +
 .../statements/schema/AlterTypeStatement.java   |   5 +
 .../statements/schema/AlterViewStatement.java   |   5 +
 .../schema/CreateAggregateStatement.java        |   5 +
 .../schema/CreateFunctionStatement.java         |   5 +
 .../statements/schema/CreateIndexStatement.java |   5 +
 .../schema/CreateKeyspaceStatement.java         |   5 +
 .../statements/schema/CreateTableStatement.java |   5 +
 .../schema/CreateTriggerStatement.java          |   5 +
 .../statements/schema/CreateTypeStatement.java  |   5 +
 .../statements/schema/CreateViewStatement.java  |   5 +
 .../schema/DropAggregateStatement.java          |   5 +
 .../schema/DropFunctionStatement.java           |   5 +
 .../statements/schema/DropIndexStatement.java   |   5 +
 .../schema/DropKeyspaceStatement.java           |   5 +
 .../statements/schema/DropTableStatement.java   |   5 +
 .../statements/schema/DropTriggerStatement.java |   5 +
 .../statements/schema/DropTypeStatement.java    |   5 +
 .../statements/schema/DropViewStatement.java    |   5 +
 .../org/apache/cassandra/dht/BootStrapper.java  |  14 +-
 .../cassandra/dht/BootstrapDiagnostics.java     |  80 +
 .../apache/cassandra/dht/BootstrapEvent.java    |  82 +
 .../NoReplicationTokenAllocator.java            |   4 +
 .../ReplicationAwareTokenAllocator.java         |   7 +-
 .../TokenAllocatorDiagnostics.java              | 195
 .../dht/tokenallocator/TokenAllocatorEvent.java | 113 +++
 .../tokenallocator/TokenAllocatorFactory.java   |   8 +-
 .../apache/cassandra/diag/DiagnosticEvent.java  |  50 +++
 .../cassandra/diag/DiagnosticEventService.java  | 291 +
 src/java/org/apache/cassandra/gms/Gossiper.java |  46 ++-
 .../cassandra/gms/GossiperDiagnostics.java      | 113 +++
 .../org/apache/cassandra/gms/GossiperEvent.java | 111 +++
 src/java/org/apache/cassandra/hints/Hint.java   |   2 +-
 .../apache/cassandra/hints/HintDiagnostics.java |  85 +
 .../org/apache/cassandra/hints/HintEvent.java   | 102 ++
 .../cassandra/hints/HintsDispatchExecutor.java  |  10 +
 .../apache/cassandra/hints/HintsDispatcher.java |  52 +--
 .../apache/cassandra/hints/HintsService.java    |  12 +-
 .../hints/HintsServiceDiagnostics.java          |  65
 .../cassandra/hints/HintsServiceEvent.java      |  71 +
 .../apache/cassandra/locator/TokenMetadata.java |   2 +
 .../locator/TokenMetadataDiagnostics.java       |  46 +++
 .../cassandra/locator/TokenMetadataEvent.java   |  62
 src/java/org/apache/cassandra/schema/Diff.java  |   5 +
 .../cassandra/schema/MigrationManager.java      |  33 +-
 .../apache/cassandra/schema/MigrationTask.java  |   5 +
 .../org/apache/cassandra/schema/Schema.java     |  34 ++
 .../schema/SchemaAnnouncementDiagnostics.java   |  60
 .../schema/SchemaAnnouncementEvent.java         | 104 ++
 .../cassandra/schema/SchemaDiagnostics.java     | 178 +++
 .../apache/cassandra/schema/SchemaEvent.java    | 318 +++
 .../schema/SchemaMigrationDiagnostics.java      |  83 +
 .../cassandra/schema/SchemaMigrationEvent.java  | 114 +++
 .../cassandra/schema/SchemaPushVerbHandler.java |   1 +
 .../service/PendingRangeCalculatorService.java  |  23 +-
 ...endingRangeCalculatorServiceDiagnostics.java |  73 +
 .../PendingRangeCalculatorServiceEvent.java     |  69
 .../diag/DiagnosticEventServiceTest.java        | 244 ++
 63 files changed, 3047 insertions(+), 40 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/cassandra/blob/2846b22a/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index d8aca56..ceba843 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Add base classes for diagnostic events (CASSANDRA-13457)
 * Clear view system metadata when dropping keyspace (CASSANDRA-14646)
 * Allocate
[jira] [Comment Edited] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583811#comment-16583811 ] Stefan Podkowinski edited comment on CASSANDRA-14346 at 8/17/18 11:52 AM: -- Please include license files like we do in {{lib/licenses}}. Adding {{javax.servlet-api}} (assuming CDDL1.1) requires special handling ([category b|https://www.apache.org/legal/resolved.html#category-x]), so it would be preferable not having to use that dependency. was (Author: spo...@gmail.com): Please include license files like we do in {{lib/licenses}}. Adding {{javax.servlet-api}} (assuming CDDL1.1) requires special handling (category b), so it would be preferable not having to use that dependency. > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.0 > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). 
> As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. > As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity.
[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra
[ https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16583811#comment-16583811 ] Stefan Podkowinski commented on CASSANDRA-14346: Please include license files like we do in {{lib/licenses}}. Adding {{javax.servlet-api}} (assuming CDDL1.1) requires special handling (category b), so it would be preferable not having to use that dependency. > Scheduled Repair in Cassandra > - > > Key: CASSANDRA-14346 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14346 > Project: Cassandra > Issue Type: Improvement > Components: Repair >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Major > Labels: 4.0-feature-freeze-review-requested, > CommunityFeedbackRequested > Fix For: 4.0 > > Attachments: ScheduledRepairV1_20180327.pdf > > > There have been many attempts to automate repair in Cassandra, which makes > sense given that it is necessary to give our users eventual consistency. Most > recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked > for ways to solve this problem. > At Netflix we've built a scheduled repair service within Priam (our sidecar), > which we spoke about last year at NGCC. Given the positive feedback at NGCC > we focussed on getting it production ready and have now been using it in > production to repair hundreds of clusters, tens of thousands of nodes, and > petabytes of data for the past six months. Also based on feedback at NGCC we > have invested effort in figuring out how to integrate this natively into > Cassandra rather than open sourcing it as an external service (e.g. in Priam). > As such, [~vinaykumarcse] and I would like to re-work and merge our > implementation into Cassandra, and have created a [design > document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing] > showing how we plan to make it happen, including the user interface. 
> As we work on the code migration from Priam to Cassandra, any feedback would > be greatly appreciated about the interface or v1 implementation features. I > have tried to call out in the document features which we explicitly consider > future work (as well as a path forward to implement them in the future) > because I would very much like to get this done before the 4.0 merge window > closes, and to do that I think aggressively pruning scope is going to be a > necessity.
[jira] [Updated] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Xie updated CASSANDRA-14653: -- Description: We use Cassandra as the backend storage for JanusGraph. While loading a large dataset (~2 billion vertices, ~10 billion edges), we ran into some problems. At first we used STCS as the compaction strategy, but hit the exception below. We checked that "max locked memory" is unlimited and the file map count is 1 million; these values should be sufficient for loading the data. Eventually we found that the problem was caused by Cassandra consuming all available virtual memory: no additional virtual memory could be used by compaction tasks, and the following exception was thrown. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.java:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So we switched the compaction strategy to LCS, which seems to resolve the virtual memory problem. But then we found another problem: many SSTables that have already been compacted are still retained on disk. These old SSTables consume so much disk space that there is not enough disk left for the real data, and many files like "mc_txn_compaction_xxx.log" are created under the data directory. After some investigation, we found that this problem is caused by the "NonPeriodicTasks" thread pool, which always uses only one thread to process cleanup tasks after compaction. This pool is an instance of DebuggableScheduledThreadPoolExecutor, which inherits from ScheduledThreadPoolExecutor. Reading the code of DebuggableScheduledThreadPoolExecutor, we found that it uses an unbounded task queue with a core pool size of 1. We think using an unbounded queue here is wrong.
With an unbounded queue, the thread pool will never grow its thread count no matter how many tasks are blocked in the queue, because an unbounded queue is never full. A bounded queue should be used here instead, so that when cleanup work is heavy, more threads are created to process it. {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority) \{ super(corePoolSize, new NamedThreadFactory(threadPoolName, priority)); setRejectedExecutionHandler(rejectedExecutionHandler); } public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) \{ super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory); } {quote} Below is an example of a cleanup task after compaction: there is a nearly three-hour delay before file "mc-56525" is removed. {quote} TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db') .. TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed {quote}
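The single-thread behavior described in the ticket can be demonstrated in isolation. This is a minimal standalone sketch, not Cassandra code (the class name `UnboundedQueueDemo` is ours): `java.util.concurrent.ScheduledThreadPoolExecutor` always backs itself with an unbounded `DelayedWorkQueue`, so the `Integer.MAX_VALUE` maximum pool size passed to the superclass constructor is never consulted and the pool never grows past `corePoolSize`, no matter how deep the backlog gets.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ScheduledThreadPoolExecutor;

public class UnboundedQueueDemo {
    // Submit several tasks to a corePoolSize=1 scheduled executor and report
    // how many worker threads exist while the extra tasks sit in the queue.
    static int observedPoolSize() throws InterruptedException {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 4; i++) {
            // The first task occupies the single worker; the other three
            // pile up in the unbounded DelayedWorkQueue.
            pool.execute(() -> {
                try { release.await(); } catch (InterruptedException ignored) {}
            });
        }
        // Because the queue is unbounded it is never "full", so the executor
        // never spawns threads beyond corePoolSize: the pool stays at 1.
        int size = pool.getPoolSize();
        release.countDown();
        pool.shutdown();
        return size;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker threads while 4 tasks pending: " + observedPoolSize());
    }
}
```

This is the standard `ThreadPoolExecutor` sizing rule: new threads beyond the core size are only created when `offer()` on the work queue fails, which an unbounded queue never does. It explains why `NonPeriodicTasks:1` is the only thread visible in the trace above even with hours of cleanup backlog.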
[jira] [Updated] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Xie updated CASSANDRA-14653: -- Description: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstables consume so many disk space, it's causing no enough disk for saving real data. and we found that many files like "mc_txn_compaction_xxx.log" are created under the data directory. After some times' investigaton, found this problem is caused by "NonPeriodicTasks" thread pools. this pools is always using only one thread for processing clean task after compaction. this thread pool is instanced with class DebuggableScheduledThreadPoolExecutor, and DebuggableScheduledThreadPoolExecutor is inherit from class ScheduledThreadPoolExecutor. By reading the code of class DebuggableScheduledThreadPoolExecutor, found DebuggableScheduledThreadPoolExecutor is using an unbound task queue, and core pool size is 1. I think it should wrong using unbound queue. 
If we using unbound queue, the thread pool wouldn't increasing thread even there so many task are blocked in queue, because unbound queue never would be full. I think here should use bound queue, so when task is heavily, more threads would created for processing them. {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority) Unknown macro: \{ super(corePoolSize, new NamedThreadFactory(threadPoolName, priority)); setRejectedExecutionHandler(rejectedExecutionHandler); } public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) Unknown macro: \{ super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory); } {quote} Below is the case about clean task after compaction. there nearly 3 hours delay for removing file "mc-56525". {quote} TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db') .. TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed {quote} was: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. 
At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstables consume so many disk space, it's causing no enough disk for saving real data. and we found that many files like
[jira] [Updated] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Xie updated CASSANDRA-14653: -- Description: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstables consume so many disk space, it's causing no enough disk for saving real data. and we found that many files like "mc_txn_compaction_xxx.log" are created under the data directory. After some times' investigaton, found this problem is caused by "NonPeriodicTasks" thread pools. this pools is always using only one thread for processing clean task and compaction. this thread pool is instanced with class DebuggableScheduledThreadPoolExecutor, and DebuggableScheduledThreadPoolExecutor is inherit from class ScheduledThreadPoolExecutor. By reading the code of class DebuggableScheduledThreadPoolExecutor, found DebuggableScheduledThreadPoolExecutor is using an unbound task queue, and core pool size is 1. Why here use the unbound queue for queuing submitted tasks? 
If we using unbound queue, the thread pool wouldn't increasing thread even there so many task are blocked in queue, because unbound queue never would be full. I think here should use bound queue, so when task is heavily, more threads would created for processing them. {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority) Unknown macro: \{ super(corePoolSize, new NamedThreadFactory(threadPoolName, priority)); setRejectedExecutionHandler(rejectedExecutionHandler); } public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) Unknown macro: \{ super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory); } {quote} Below is the case about clean task after compaction. there nearly 3 hours delay for removing file "mc-56525". {quote} TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db') .. TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed {quote} was: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. 
At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstables consume so many disk space, it's causing no enough disk for saving real data. and we found that many files like
[jira] [Updated] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Xie updated CASSANDRA-14653: -- Description: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstables consume so many disk space, it's causing no enough disk for saving real data. and we found that many files like "mc_txn_compaction_xxx.log" are created under the data directory. After some times' investigaton, found this problem is caused by "NonPeriodicTasks" thread pools. this pools is always using only one thread for processing clean task after compaction. this thread pool is instanced with class DebuggableScheduledThreadPoolExecutor, and DebuggableScheduledThreadPoolExecutor is inherit from class ScheduledThreadPoolExecutor. By reading the code of class DebuggableScheduledThreadPoolExecutor, found DebuggableScheduledThreadPoolExecutor is using an unbound task queue, and core pool size is 1. Why here use the unbound queue for queuing submitted tasks? 
If we using unbound queue, the thread pool wouldn't increasing thread even there so many task are blocked in queue, because unbound queue never would be full. I think here should use bound queue, so when task is heavily, more threads would created for processing them. {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority) Unknown macro: \{ super(corePoolSize, new NamedThreadFactory(threadPoolName, priority)); setRejectedExecutionHandler(rejectedExecutionHandler); } public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) Unknown macro: \{ super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory); } {quote} Below is the case about clean task after compaction. there nearly 3 hours delay for removing file "mc-56525". {quote} TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db') .. TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed {quote} was: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. 
At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstables consume so many disk space, it's causing no enough disk for saving real data. and we found that many files like
[jira] [Updated] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Xie updated CASSANDRA-14653: -- Description: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstables consume so many disk space, it's causing no enough disk for saving real data. we found that so many files like "mc_txn_compaction_xxx.log" are created under the data directory. After some times' investigaton, we found that this problem is caused by "NonPeriodicTasks" thread pools. this pools is always using only one thread for processing clean task and compaction. this thread pool is instanced with class DebuggableScheduledThreadPoolExecutor, and DebuggableScheduledThreadPoolExecutor is inherit from class ScheduledThreadPoolExecutor. By reading the code of class DebuggableScheduledThreadPoolExecutor, found DebuggableScheduledThreadPoolExecutor is using an unbound task queue, and core pool size is 1. Why here use the unbound queue for queuing submitted tasks? 
If we using unbound queue, the thread pool wouldn't increasing thread even there so many task are blocked in queue, because unbound queue never would be full. I think here should use bound queue, so when task is heavily, more threads would created for processing them. {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority) Unknown macro: \{ super(corePoolSize, new NamedThreadFactory(threadPoolName, priority)); setRejectedExecutionHandler(rejectedExecutionHandler); } public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) Unknown macro: \{ super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory); } {quote} Below is the case about clean task after compaction. there nearly 3 hours delay for removing file "mc-56525". {quote} TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db') .. TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed {quote} was: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. 
At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstable consume so many disk space, it's causing no enough disk for saving real data. we found that so many files
[jira] [Updated] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Xie updated CASSANDRA-14653: -- Description: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, these old sstable consume so many disk space, it's causing no enough disk for saving real data. we found that so many files like "mc_txn_compaction_xxx.log" are created under the data directory. After some times' investigaton, we found that this problem is caused by "NonPeriodicTasks" thread pools. this pools is always using only one thread for processing clean task and compaction. this thread pool is instanced with class DebuggableScheduledThreadPoolExecutor, and DebuggableScheduledThreadPoolExecutor is inherit from class ScheduledThreadPoolExecutor. By reading the code of class DebuggableScheduledThreadPoolExecutor, found DebuggableScheduledThreadPoolExecutor is using an unbound task queue, and core pool size is 1. Why here use the unbound queue for queuing submitted tasks? 
If we using unbound queue, the thread pool wouldn't increasing thread even there so many task are blocked in queue, because unbound queue never would be full. I think here should use bound queue, so when task is heavily, more threads would created for processing them. {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority) Unknown macro: \{ super(corePoolSize, new NamedThreadFactory(threadPoolName, priority)); setRejectedExecutionHandler(rejectedExecutionHandler); } public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) Unknown macro: \{ super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory); } {quote} Below is the case about clean task after compaction. there nearly 3 hours delay for removing file "mc-56525". {quote} TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db') .. TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed {quote} was: We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met some problems. 
At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "file map count" is 1 million, these values should enough for loading data. last we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrown out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, at last these old sstable consume so many disk space, it's causing no enough disk for saving real data. we found that so many
[jira] [Updated] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
[ https://issues.apache.org/jira/browse/CASSANDRA-14653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Xie updated CASSANDRA-14653:
----------------------------------
    Description:

We use Cassandra as the backend storage for JanusGraph. While loading a huge dataset (~2 billion vertices, ~10 billion edges), we ran into some problems.

At first we used STCS as the compaction strategy, but hit the exception below. We checked that "max locked memory" is unlimited and "max map count" is 1 million, which should be enough for the load. Eventually we found the problem was that all of the virtual memory had been consumed by Cassandra, so no additional virtual memory was available for the compaction task and the exception below was thrown:

{quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.java:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed{quote}

So we changed the compaction strategy to LCS, which seems to resolve the virtual memory problem. But then we found another problem: many SSTables that had already been compacted were still retained on disk. These old SSTables consume so much disk space that not enough disk is left for the real data, and many files like "mc_txn_compaction_xxx.log" are created under the data directory.

After some investigation, we found this problem is caused by the "NonPeriodicTasks" thread pool, which always uses only one thread to process the cleanup tasks that follow compaction. The pool is an instance of DebuggableScheduledThreadPoolExecutor, which inherits from ScheduledThreadPoolExecutor. Reading the code of DebuggableScheduledThreadPoolExecutor, we found that it uses an unbounded task queue with a core pool size of 1. Why use an unbounded queue for the submitted tasks?

With an unbounded queue, the pool never grows beyond the core size no matter how many tasks are backed up, because an unbounded queue is never full. I think a bounded queue should be used here, so that more threads are created to process tasks when the load is heavy.

{quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority)
{
    super(corePoolSize, new NamedThreadFactory(threadPoolName, priority));
    setRejectedExecutionHandler(rejectedExecutionHandler);
}

public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory)
{
    super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory);
}{quote}

Below is a case of the cleanup task after compaction: there is a nearly three-hour delay before the file "mc-56525" is removed.

{quote}TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db')
..
TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore
TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier
TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier
TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed{quote}
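The behaviour described above follows from how java.util.concurrent pools grow: a ThreadPoolExecutor only creates threads beyond corePoolSize when its work queue rejects an offer, and ScheduledThreadPoolExecutor hard-wires an unbounded DelayedWorkQueue, so it can never grow past its core threads. A minimal standalone sketch (not Cassandra code; the class name and pool sizes are made up for illustration) showing how a bounded queue lets a plain ThreadPoolExecutor grow under load:

```java
import java.util.concurrent.*;

public class BoundedQueueSketch {
    // Saturate a pool (core=1, max=4, bounded queue of 2) with 6 blocked
    // tasks and report how many threads it grew to.
    static int run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 4,                        // core = 1, max = 4
                60, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(2), // bounded queue: full queue triggers new threads
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch block = new CountDownLatch(1);
        for (int i = 0; i < 6; i++) {
            pool.execute(() -> {
                try { block.await(); } catch (InterruptedException ignored) { }
            });
        }
        // 1 core thread busy + 2 tasks queued + 3 extra threads created once
        // the queue filled; with an unbounded queue this would stay at 1.
        int grown = pool.getPoolSize();
        block.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return grown;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads: " + run()); // grows to 4, not stuck at 1
    }
}
```

The same six tasks submitted to a ScheduledThreadPoolExecutor with corePoolSize 1 would all sit in its unbounded DelayedWorkQueue behind a single worker, which is the single-threaded bottleneck the ticket describes.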
[jira] [Created] (CASSANDRA-14653) The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low
Peter Xie created CASSANDRA-14653: - Summary: The performance of "NonPeriodicTasks" pools defined in class ScheduledExecutors is low Key: CASSANDRA-14653 URL: https://issues.apache.org/jira/browse/CASSANDRA-14653 Project: Cassandra Issue Type: Improvement Components: Compaction Environment: Cassandra nodes : 3 nodes, 330G physical memory per node , and four data directory (ssd) per node. Reporter: Peter Xie We use cassandra as backend storage for Janusgraph. when we loading huge data (~2 billion vertex, ~10 billion edges), we met an problems. At first, we use STCS as compaction strategy , but met below exception. we checked the value of "max memory lock" is unlimited and "map count" is 1 million, these values should enough for loading. last , we found this problem is caused by the virtual memory are all cosumed by cassandra. So not additional virtual memory can be used by compaction task , and below exception is thrower out. {quote}ERROR [CompactionExecutor:267] 2018-08-09 02:28:40,952 JVMStabilityInspector.javv a:74 - OutOfMemory error letting the JVM handle the error: java.lang.OutOfMemoryError: Map failed {quote} So, we change compaction strategy to LCS, this change seems can resolve the virtual memory problem. But we found another problem : Many sstables which has been compacted are still retained on disk, at last these old sstable consume so many disk space, it's causing no enough disk for saving real data. we found that so many files like "mc_txn_compaction_xxx.log" are created under the data directory. After some times' investigaton, we found that this problem is caused by "NonPeriodicTasks" thread pools. this pools is always using only one thread for processing clean task and compaction. this thread pool is instanced with class DebuggableScheduledThreadPoolExecutor, and DebuggableScheduledThreadPoolExecutor is inherit from class ScheduledThreadPoolExecutor. 
By reading the code of class DebuggableScheduledThreadPoolExecutor, I found that it uses an unbounded task queue with a core pool size of 1. Why use an unbounded queue for the submitted tasks? With an unbounded queue, the thread pool will never grow beyond the core size even when many tasks are backed up, because an unbounded queue is never full. I think a bounded queue should be used here, so that under heavy load more threads are created to process the tasks. {quote}public DebuggableScheduledThreadPoolExecutor(int corePoolSize, String threadPoolName, int priority) { super(corePoolSize, new NamedThreadFactory(threadPoolName, priority)); setRejectedExecutionHandler(rejectedExecutionHandler); } public ScheduledThreadPoolExecutor(int corePoolSize, ThreadFactory threadFactory) { super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory); } {quote} Below is a case showing the cleanup task after compaction: there is nearly a 3-hour delay before file "mc-56525" is removed. {quote} TRACE [CompactionExecutor:81] 2018-08-16 21:22:29,664 LifecycleTransaction.java:363 - Staging for obsolescence BigTableReader(path='/sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big-Data.db') .. 
TRACE [CompactionExecutor:81] 2018-08-16 21:22:41,162 Tracker.java:165 - removing /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big from list of files tracked for test_2.edgestore TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,179 SSTableReader.java:2175 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, before barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,180 SSTableReader.java:2181 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, after barrier TRACE [NonPeriodicTasks:1] 2018-08-17 00:28:47,182 SSTableReader.java:2196 - Async instance tidier for /sdb/data/test_2/edgestore-365b0b70a05911e8806001ebe60a5ce7/mc-56525-big, completed {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
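The queueing behavior described in the report can be reproduced outside Cassandra. The sketch below (a hypothetical standalone demo, not Cassandra code) shows that a ScheduledThreadPoolExecutor with core size 1 never grows past one worker because its DelayedWorkQueue is unbounded, while a plain ThreadPoolExecutor with a bounded queue does spawn extra threads once the queue fills; the pool sizes and queue capacity here are illustrative choices.

```java
import java.util.concurrent.*;

public class QueueBehaviorDemo {
    public static void main(String[] args) throws Exception {
        // Like NonPeriodicTasks: core size 1 and an unbounded DelayedWorkQueue,
        // so the pool never grows no matter how many tasks are pending.
        ScheduledThreadPoolExecutor scheduled = new ScheduledThreadPoolExecutor(1);
        for (int i = 0; i < 8; i++) {
            scheduled.execute(() -> sleep(100));
        }
        sleep(50);
        // One worker runs; the other seven tasks wait in the queue.
        System.out.println("scheduled pool size: " + scheduled.getPoolSize());
        scheduled.shutdownNow();

        // With a bounded queue (capacity 2), the pool grows toward
        // maximumPoolSize (4) once the queue is full.
        ThreadPoolExecutor bounded = new ThreadPoolExecutor(
                1, 4, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(2));
        for (int i = 0; i < 6; i++) {
            bounded.execute(() -> sleep(100));
        }
        sleep(50);
        // 1 core thread + 2 queued tasks + 3 extra threads = pool size 4.
        System.out.println("bounded-queue pool size: " + bounded.getPoolSize());
        bounded.shutdownNow();
    }

    static void sleep(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that ScheduledThreadPoolExecutor documents that it always uses an unbounded queue, so bounding the queue would mean moving the cleanup work off a scheduled executor onto an ordinary ThreadPoolExecutor (or simply raising the core size of NonPeriodicTasks).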
[Cassandra Wiki] Update of "Committers" by BenjaminLerer
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification. The "Committers" page has been changed by BenjaminLerer: https://wiki.apache.org/cassandra/Committers?action=diff=78=79 ||Josh Mckenzie ||Jul 2014 ||Datastax ||PMC member || ||Robert Stupp ||Jan 2015 ||Datastax || || ||Sam Tunnicliffe ||May 2015 ||Apple || || - ||Benjamin Lerer ||Jul 2015 ||Datastax || || + ||Benjamin Lerer ||Jul 2015 ||Datastax ||PMC member || ||Carl Yeksigian ||Jan 2016 ||Datastax ||Also a [[http://thrift.apache.org|Thrift]] committer || ||Stefania Alborghetti ||Apr 2016 ||Datastax || || ||Jeff Jirsa ||June 2016 ||Apple|| PMC member ||