[jira] [Updated] (CASSANDRA-14559) Check for endpoint collision with hibernating nodes

2018-07-22 Thread Kurt Greaves (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14559:
-
Assignee: Vincent White
  Status: Patch Available  (was: Open)

> Check for endpoint collision with hibernating nodes 
> 
>
> Key: CASSANDRA-14559
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14559
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Vincent White
>Assignee: Vincent White
>Priority: Major
>
> I ran across an edge case when replacing a node with the same address. This 
> issue results in the node (and its tokens) being unsafely removed from gossip.
> Steps to replicate:
> 1. Create 3 node cluster.
> 2. Stop a node
> 3. Replace the stopped node with a node using the same address using the 
> replace_address flag
> 4. Stop the node before it finishes bootstrapping
> 5. Remove the replace_address flag and restart the node to resume 
> bootstrapping (if the data dir is also cleared at this point the node will 
> also generate new tokens when it starts)
> 6. Stop the node before it finishes bootstrapping again
> 7. 30 seconds later the node will be removed from gossip because it now 
> matches the check for a FatClient
> I think this is only an issue when replacing a node with the same address 
> because other replacements now use STATUS_BOOTSTRAPPING_REPLACE and leave the 
> dead node unchanged.
> I believe the simplest fix for this is to add a check that prevents a 
> non-bootstrapped node (without the replace_address flag) from starting if 
> there is a gossip entry for the same address in the hibernate state. 
> [3.11 PoC 
> |https://github.com/apache/cassandra/compare/trunk...vincewhite:check_for_hibernate_on_start]
>  
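The proposed startup check can be sketched roughly as follows. This is only an illustration of the rule described above, not the code in the linked PoC: the class name, the Status enum and the mayStart method are hypothetical, and a plain Map stands in for the real gossip state.

```java
import java.util.Map;

// Hypothetical sketch of the proposed check; none of these names come from Cassandra itself.
final class HibernateCollisionCheck {
    // Simplified stand-in for the gossip STATUS values that matter here.
    enum Status { NORMAL, BOOTSTRAPPING, HIBERNATE }

    // Returns false when a node that has not finished bootstrapping, and is not
    // started with replace_address, finds a hibernating gossip entry for its own address.
    static boolean mayStart(String address, Map<String, Status> gossip,
                            boolean bootstrapped, boolean replacing) {
        if (bootstrapped || replacing) {
            return true; // the collision rule only targets plain, non-bootstrapped starts
        }
        return gossip.get(address) != Status.HIBERNATE;
    }
}
```

With such a check, step 5 of the reproduction (restarting without the replace_address flag) would be refused rather than leading to the later FatClient removal.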



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14553) Document troubleshooting page

2018-07-22 Thread Joseph Lynch (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14553:
-
Status: Patch Available  (was: Open)

||trunk||
|[patch|https://github.com/apache/cassandra/compare/trunk...jolynch:jolynch_write_troubleshooting]|

GitHub's rendering doesn't exactly work for all Sphinx features, so 
I'm going to try to stand up a local Sphinx and see if I can build the docs to 
double check all the links and such.

I think this is ready for review if anyone has time. Probably want to start 
with [the 
index|https://github.com/apache/cassandra/compare/trunk...jolynch:jolynch_write_troubleshooting#diff-933941591a6261840299fd35ffb92cb0]
 and go from there.

> Document troubleshooting page
> -
>
> Key: CASSANDRA-14553
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14553
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Minor
>  Labels: Documentation
>
> Currently the [troubleshooting 
> docs|http://cassandra.apache.org/doc/latest/troubleshooting/] are blank. As 
> much as I like to believe Cassandra never has any problems I was thinking of 
> writing up a troubleshooting page focussing on:
>  # Finding the host(s) that are behaving badly (common error messages)
>  # Which logs exist, where they are, and what to look for in which log 
> (common error messages, gc logs, etc)
>  # Which nodetool commands can give you more information
>  # Java/Operating systems tools that can help dive deep into performance 
> issues (jstat, top, iostat, cachestat, etc) 
> Since this is going to be a fairly lengthy page I wanted to get a jira going 
> in case someone else had ideas or had already started. Also if there are any 
> large areas I missed above please comment here and I can include them.
> [~cscotta] fyi






[jira] [Comment Edited] (CASSANDRA-9608) Support Java 11

2018-07-22 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551975#comment-16551975
 ] 

Benedict edited comment on CASSANDRA-9608 at 7/22/18 11:22 PM:
---

I assume this is to handle e.g. the removal of Unsafe.monitorEnter/monitorExit?

A simple improvement to this would be to use a volatile {{ReentrantLock}} 
instead of a long thread id, that is allocated only on contention. This isn't 
quite as cheap as Unsafe.monitorEnter/monitorExit, as the inflated lock is 
never repurposed until flush.  

If we wanted, we could probably implement a partially re-usable one, which 
might be applicable elsewhere, or we could have a static ConcurrentMap to inflate only precisely when needed (though this will cost more 
CPU/memory bandwidth and allocations).

Or we could implement some static helper methods to help us lock against a 
property using a special inflated lock object, that can be used for 
synchronisation until there is no contention, and the last owning thread sets 
the property to null on completion.  This is something I could rustle up fairly 
quickly, so let me know if you want me to produce a utility class for this.
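A minimal sketch of the first idea (a volatile ReentrantLock that is only allocated on contention) might look like the following. The class and method names are hypothetical and this is not the utility class offered above; it only shows lazy inflation through a compare-and-set on a volatile field, with the lock dropped again at a point chosen by the caller (for example on flush).

```java
import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical holder; it stands in for an object that previously relied on
// Unsafe.monitorEnter/monitorExit.
final class LazyLockHolder {
    // Stays null until a thread needs to block; volatile so every thread
    // observes the lock once it has been published.
    private volatile ReentrantLock lock;

    private static final AtomicReferenceFieldUpdater<LazyLockHolder, ReentrantLock> LOCK_UPDATER =
        AtomicReferenceFieldUpdater.newUpdater(LazyLockHolder.class, ReentrantLock.class, "lock");

    // Called only on contention: allocates the lock at most once per inflation.
    ReentrantLock inflate() {
        ReentrantLock current = lock;
        if (current != null) {
            return current;
        }
        ReentrantLock created = new ReentrantLock();
        // If another thread won the race, use its lock instead of ours.
        return LOCK_UPDATER.compareAndSet(this, null, created) ? created : lock;
    }

    boolean isInflated() {
        return lock != null;
    }

    // Drops the inflated lock; the caller must ensure no thread still holds or awaits it.
    void deflate() {
        lock = null;
    }
}
```

The uncontended path never allocates, which is the property the comment is after; the cost is one extra object per contended holder until it is deflated.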


was (Author: benedict):
I assume this is to handle e.g. the removal of Unsafe.monitorEnter/monitorExit?

A simple improvement to this would be to use a volatile WaitQueue instead of a 
long thread id, that is allocated only on contention, then used as any other 
WaitQueue to manage waiting threads.  This isn't quite as cheap as 
Unsafe.monitorEnter/monitorExit, as the inflated lock is never repurposed until 
flush.  If we wanted, we could probably implement a partially re-usable one, 
which might be applicable elsewhere, or we could have a static 
ConcurrentMap to inflate only precisely when needed (though 
this will cost more CPU/memory bandwidth and allocations).

> Support Java 11
> ---
>
> Key: CASSANDRA-9608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9608
> Project: Cassandra
>  Issue Type: Task
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Minor
> Fix For: 4.x
>
> Attachments: jdk_9_10.patch
>
>
> This ticket is intended to group all issues found to support Java 9 in the 
> future.
> From what I've found out so far:
> * Maven dependency {{com.sun:tools:jar:0}} via cobertura cannot be resolved. 
> It can be easily solved using this patch:
> {code}
> - artifactId="cobertura"/>
> + artifactId="cobertura">
> +  
> +
> {code}
> * Another issue is that {{sun.misc.Unsafe}} no longer contains the methods 
> {{monitorEnter}} + {{monitorExit}}. These methods are used by 
> {{o.a.c.utils.concurrent.Locks}} which is only used by 
> {{o.a.c.db.AtomicBTreeColumns}}.
> I don't intend to start working on this yet, since Java 9 is still too early 
> in its development phase.






[jira] [Commented] (CASSANDRA-14556) Optimize streaming path in Cassandra

2018-07-22 Thread Jeff Jirsa (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552157#comment-16552157
 ] 

Jeff Jirsa commented on CASSANDRA-14556:


In the future (not this patch), would it make sense to add presence of legacy 
shards to sstable metadata? Would let us potentially take this path more often, 
and maybe we can use it for the eventual ticket where we’ll clean them up.

> Optimize streaming path in Cassandra
> 
>
> Key: CASSANDRA-14556
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14556
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Streaming and Messaging
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Major
>  Labels: Performance
> Fix For: 4.x
>
>
> During streaming, Cassandra reifies the sstables into objects. This creates 
> unnecessary garbage and slows down the whole streaming process as some 
> sstables can be transferred as a whole file rather than individual 
> partitions. The objective of the ticket is to detect when a whole sstable can 
> be transferred and skip the object reification. We can also use a zero-copy 
> path to avoid bringing data into user-space on both sending and receiving 
> side.
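As a rough illustration of the zero-copy idea, the JDK's FileChannel.transferTo can hand a whole file to another channel without reading it into Java objects first. The sketch below is not the patch for this ticket: the class and method names are hypothetical, and it assumes a blocking target channel.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not part of Cassandra's streaming code.
final class WholeFileTransfer {
    // Sends every byte of the file to the target and returns the number of bytes sent.
    // transferTo may move fewer bytes than requested, so it is called in a loop.
    // Assumes a blocking target; a non-blocking one could make no progress here.
    static long transferWholeFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = source.size();
            long position = 0;
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

Whether the transfer actually avoids user-space copies depends on the operating system and on the type of the target channel.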






[jira] [Commented] (CASSANDRA-9242) Add PerfDisableSharedMem to default JVM params

2018-07-22 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16552136#comment-16552136
 ] 

Joseph Lynch commented on CASSANDRA-9242:
-

Do we think it's worth documenting that an alternative workaround that doesn't 
break JVM tooling is to simply mount {{/tmp}} on an in-memory tmpfs? 

As I'm writing the troubleshooting docs page (CASSANDRA-14553) it's somewhat 
tricky not having standard JVM tooling like {{jstat}} around to help debug e.g. 
heap pressure.

> Add PerfDisableSharedMem to default JVM params
> --
>
> Key: CASSANDRA-9242
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9242
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Matt Stump
>Assignee: Ariel Weisberg
>Priority: Major
> Fix For: 2.2.0 beta 1
>
>
> We should add PerfDisableSharedMem to default JVM params. The JVM will save 
> stats to a memory mapped file when reaching a safepoint. This is performed 
> synchronously and the JVM remains paused while this action takes place. 
> Occasionally the OS will stall the calling thread while this happens 
> resulting in significant impact to worst case JVM pauses. By disabling the 
> save in the JVM these mysterious multi-second pauses disappear.
> The behavior is outlined in [this 
> article|http://www.evanjones.ca/jvm-mmap-pause.html]. Another manifestation 
> is significant time spent in sys during GC pauses. In [the linked 
> test|http://cstar.datastax.com/graph?stats=762d9c2a-eace-11e4-8236-42010af0688f&metric=gc_max_ms&operation=1_write&smoothing=1&show_aggregates=true&xmin=0&xmax=110.77&ymin=0&ymax=10421.4]
>  you'll notice multiple seconds spent in sys during the longest pauses.
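To confirm whether a given JVM was started with the flag, its input arguments can be inspected through the standard RuntimeMXBean. This is only a small diagnostic sketch with hypothetical class and method names; it looks for the literal option -XX:+PerfDisableSharedMem and nothing else.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical diagnostic helper, not part of Cassandra.
final class PerfSharedMemCheck {
    static final String FLAG = "-XX:+PerfDisableSharedMem";

    // True when the given JVM argument list contains the flag verbatim.
    static boolean perfSharedMemDisabled(List<String> jvmArgs) {
        return jvmArgs.contains(FLAG);
    }

    // Applies the same check to the arguments of the running JVM.
    static boolean currentJvmDisablesPerfSharedMem() {
        return perfSharedMemDisabled(ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```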






[jira] [Commented] (CASSANDRA-9608) Support Java 11

2018-07-22 Thread Benedict (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-9608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16551975#comment-16551975
 ] 

Benedict commented on CASSANDRA-9608:
-

I assume this is to handle e.g. the removal of Unsafe.monitorEnter/monitorExit?

A simple improvement to this would be to use a volatile WaitQueue instead of a 
long thread id, that is allocated only on contention, then used as any other 
WaitQueue to manage waiting threads.  This isn't quite as cheap as 
Unsafe.monitorEnter/monitorExit, as the inflated lock is never repurposed until 
flush.  If we wanted, we could probably implement a partially re-usable one, 
which might be applicable elsewhere, or we could have a static 
ConcurrentMap to inflate only precisely when needed (though 
this will cost more CPU/memory bandwidth and allocations).







[jira] [Resolved] (CASSANDRA-14580) Make PeriodicCommitLogService.blockWhenSyncLagsNanos configurable

2018-07-22 Thread Jason Brown (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown resolved CASSANDRA-14580.
-
Resolution: Fixed

Committed as sha {{176d4bac22c356c80e275dcb4040bc5cbd0da1c2}}.

> Make PeriodicCommitLogService.blockWhenSyncLagsNanos configurable
> -
>
> Key: CASSANDRA-14580
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14580
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Jason Brown
>Assignee: Jason Brown
>Priority: Minor
> Fix For: 4.0
>
>
> When using the default values for periodic commitlog, the sync is every ten 
> seconds and if there's a lag in flushing to disk, we can block for up to 15 
> seconds (sync time * 1.5). However, if you lower the sync time to 1 second, 
> for example, the block time is only 1.5 seconds (not acceptable in all 
> situations). Admittedly this is only an expert-level setting, but useful in 
> some cases.
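The arithmetic above can be checked with a small sketch. The helper below is hypothetical (its names are not Cassandra's); it reproduces the rule from the description: use the configured block time when one is set, otherwise fall back to 1.5 times the sync period.

```java
// Hypothetical helper illustrating the default described above.
final class SyncLagBlock {
    // configuredBlockMillis is null when the operator has not set a value.
    static long blockMillis(Integer configuredBlockMillis, int syncPeriodMillis) {
        return configuredBlockMillis == null
               ? (long) (syncPeriodMillis * 1.5)
               : configuredBlockMillis;
    }
}
```

With the default ten-second sync period this gives 15000 ms; with a one-second period it gives 1500 ms unless a block time is configured explicitly.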






cassandra git commit: Make PeriodicCommitLogService.blockWhenSyncLagsNanos configurable

2018-07-22 Thread jasobrown
Repository: cassandra
Updated Branches:
  refs/heads/trunk 9abeff38c -> 176d4bac2


Make PeriodicCommitLogService.blockWhenSyncLagsNanos configurable

patch by jasobrown; reviewed by Jordan West for CASSANDRA-14580


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/176d4bac
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/176d4bac
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/176d4bac

Branch: refs/heads/trunk
Commit: 176d4bac22c356c80e275dcb4040bc5cbd0da1c2
Parents: 9abeff3
Author: Jason Brown 
Authored: Fri Jul 20 16:05:18 2018 -0700
Committer: Jason Brown 
Committed: Sun Jul 22 03:20:46 2018 -0700

--
 CHANGES.txt  | 1 +
 conf/cassandra.yaml  | 4 
 src/java/org/apache/cassandra/config/Config.java | 1 +
 src/java/org/apache/cassandra/config/DatabaseDescriptor.java | 8 
 .../cassandra/db/commitlog/PeriodicCommitLogService.java | 4 +++-
 5 files changed, 17 insertions(+), 1 deletion(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/176d4bac/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index faf37ea..4ba3313 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0
+ * Make PeriodicCommitLogService.blockWhenSyncLagsNanos configurable (CASSANDRA-14580)
  * Improve logging in MessageInHandler's constructor (CASSANDRA-14576)
  * Set broadcast address in internode messaging handshake (CASSANDRA-14579)
  * Wait for schema agreement prior to building MVs (CASSANDRA-14571)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176d4bac/conf/cassandra.yaml
--
diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index 7ff056d..439b85a 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -389,6 +389,10 @@ counter_cache_save_period: 7200
 commitlog_sync: periodic
 commitlog_sync_period_in_ms: 10000
 
+# When in periodic commitlog mode, the number of milliseconds to block writes
+# while waiting for a slow disk flush to complete.
+# periodic_commitlog_sync_lag_block_in_ms: 
+
 # The size of the individual commitlog file segments.  A commitlog
 # segment may be archived, deleted, or recycled once all the data
 # in it (potentially from each columnfamily in the system) has been

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176d4bac/src/java/org/apache/cassandra/config/Config.java
--
diff --git a/src/java/org/apache/cassandra/config/Config.java b/src/java/org/apache/cassandra/config/Config.java
index d9250bb..0d4760e 100644
--- a/src/java/org/apache/cassandra/config/Config.java
+++ b/src/java/org/apache/cassandra/config/Config.java
@@ -204,6 +204,7 @@ public class Config
     public int commitlog_segment_size_in_mb = 32;
     public ParameterizedClass commitlog_compression;
     public int commitlog_max_compression_buffers_in_pool = 3;
+    public Integer periodic_commitlog_sync_lag_block_in_ms;
     public TransparentDataEncryptionOptions transparent_data_encryption_options = new TransparentDataEncryptionOptions();
 
     public Integer max_mutation_size_in_kb;

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176d4bac/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
--
diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index 91ee63a..2dc3737 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -1866,6 +1866,14 @@ public class DatabaseDescriptor
         return conf.commitlog_sync_period_in_ms;
     }
 
+    public static long getPeriodicCommitLogSyncBlock()
+    {
+        Integer blockMillis = conf.periodic_commitlog_sync_lag_block_in_ms;
+        return blockMillis == null
+               ? (long)(getCommitLogSyncPeriod() * 1.5)
+               : blockMillis;
+    }
+
     public static void setCommitLogSyncPeriod(int periodMillis)
     {
         conf.commitlog_sync_period_in_ms = periodMillis;

http://git-wip-us.apache.org/repos/asf/cassandra/blob/176d4bac/src/java/org/apache/cassandra/db/commitlog/PeriodicCommitLogService.java
--
diff --git a/src/java/org/apache/cassandra/db/commitlog/PeriodicCommitLogService.java b/src/java/org/apache/cassandra/db/commitlog/PeriodicCommitLogService.java
index efd3394..e94c616 100644
--- a/src/java/org/apache/cassandra/db/commitlog/PeriodicCommitLogService.j