[jira] [Updated] (CASSANDRA-15034) cassandra-stress fails to retry user profile insert and query operations

2020-04-13 Thread Dmitry Kropachev (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dmitry Kropachev updated CASSANDRA-15034:
-
Reviewers: Dinesh Joshi, Dmitry Kropachev  (was: Dinesh Joshi)
   Status: Review In Progress  (was: Patch Available)

Hello Dinesh,

 

Could you please take a look at the fix? It is 10 lines of code:

 

[https://github.com/apache/cassandra/pull/443/files]

 

> cassandra-stress fails to retry user profile insert and query operations 
> -
>
> Key: CASSANDRA-15034
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15034
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/stress
>Reporter: Shlomi Livne
>Assignee: Dinesh Joshi
>Priority: Normal
> Attachments: 0001-Fix-retry-of-userdefined-operations.patch, 
> stress.yaml
>
>
> cassandra-stress, when run with a user profile against a cluster, will fail 
> to retry operations when a node is killed.
> To reproduce:
> # Create a 3 node cluster with ccm
> {code:java}
> ccm create cas-3 --vnodes -n 1 --version=3.11.3{code}
> # start the cluster
> {code:java}
>  ccm start{code}
> # run
> {code:java}
> tools/bin/cassandra-stress user profile=stress.yaml n=10 ops(insert=1) 
> no-warmup cl=QUORUM -node 127.0.0.2 -rate threads=10{code}
> or run
> {code:java}
> tools/bin/cassandra-stress user profile=stress.yaml n=10 ops(simple=1) 
> no-warmup cl=QUORUM -node 127.0.0.2 -rate threads=10{code}
> # While the stress is progressing, kill a node
> {code:java}
> ccm node1 stop --not-gently{code}
> # wait for cassandra-stress to end
> For the insert case, check
> {code:java}
> ccm node2 cqlsh -e " select count(*) FROM stresscql.tb ;"{code}
> We are missing rows.
> For the query case (simple), the following errors will be reported:
> {code:java}
> java.util.NoSuchElementException
> java.util.NoSuchElementException
> java.util.NoSuchElementException
> java.util.NoSuchElementException
> java.util.NoSuchElementException
> java.util.NoSuchElementException
> java.io.IOException: Operation x10 on key(s) [49a26f35297303469236]: Error 
> executing: (NoSuchElementException)
> at 
> org.apache.cassandra.stress.Operation.error(Operation.java:127)java.util.NoSuchElementException
> at 
> org.apache.cassandra.stress.Operation.timeWithRetry(Operation.java:105)com.datastax.driver.core.exceptions.TransportException:
>  [/127.0.0.1:9042] Connection has been closed
> java.util.NoSuchElementException
> java.util.NoSuchElementException
> java.util.NoSuchElementException
> java.util.NoSuchElementException
> java.util.NoSuchElementException{code}
> profile: stress.yaml[^stress.yaml]
>  
>  
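
For reviewers, a minimal illustration of the failure mode (hypothetical class and method 
names; this is not the actual cassandra-stress code): an operation that pulls its rows 
from a one-shot iterator cannot be retried, because the retry sees an already-drained 
iterator, which matches both symptoms above (NoSuchElementException for the query case, 
silently missing rows for the insert case). Materializing the rows once and handing each 
retry a fresh iterator avoids it:

{code:java}
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class RetrySketch
{
    // Stand-in for binding one row into a prepared statement and executing it.
    static void bindAndExecute(String row) { System.out.println("executed " + row); }

    // Hypothetical user-profile operation: consumes `expectedRows` rows from the iterator.
    static void runOnce(Iterator<String> rows, int expectedRows, boolean failMidway)
    {
        for (int i = 0; i < expectedRows; i++)
        {
            String row = rows.next(); // NoSuchElementException if the iterator was drained earlier
            if (failMidway && i == 0)
                throw new RuntimeException("simulated node failure");
            bindAndExecute(row);
        }
    }

    public static void main(String[] args)
    {
        List<String> partition = List.of("row1", "row2", "row3");

        // Buggy pattern: the retry reuses the iterator the failed attempt already consumed.
        Iterator<String> oneShot = partition.iterator();
        try { runOnce(oneShot, partition.size(), true); }
        catch (Exception e) { System.out.println("first attempt failed: " + e.getMessage()); }
        try { runOnce(oneShot, partition.size(), false); }
        catch (Exception e) { System.out.println("retry failed: " + e); } // NoSuchElementException

        // Fix sketch: materialize the rows once, hand each retry attempt a fresh iterator.
        List<String> materialized = new ArrayList<>(partition);
        try { runOnce(materialized.iterator(), materialized.size(), true); }
        catch (Exception e) { System.out.println("first attempt failed: " + e.getMessage()); }
        runOnce(materialized.iterator(), materialized.size(), false);
        System.out.println("retry succeeded");
    }
}
{code}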



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15623) When running CQLSH with STDIN input, exit with error status code if script fails

2020-04-13 Thread Jacob Becker (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082812#comment-17082812
 ] 

Jacob Becker commented on CASSANDRA-15623:
--

[~djoshi], sure, no problem, here are the respective PRs:

* [For cassandra-3.11|https://github.com/apache/cassandra/pull/467]
* [For cassandra-3.0|https://github.com/apache/cassandra/pull/536]

The first one is the original PR I initially submitted (as it was based on 
cassandra-3.11); I just rebased it on upstream to include the latest changes 
and changed the PR description. The second one is new.


> When running CQLSH with STDIN input, exit with error status code if script 
> fails
> 
>
> Key: CASSANDRA-15623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15623
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Tools
>Reporter: Jacob Becker
>Assignee: Jacob Becker
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Assuming CASSANDRA-6344 has been in place for years and considering that scripts 
> submitted with the `-e` option behave in a similar fashion, it is very 
> surprising that scripts submitted via STDIN (i.e. piped in) always exit with a 
> zero status code, regardless of errors. I believe this should be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15701) Is Cassandra 3.11.3/3.11.5 affected by CVE-2019-10172?

2020-04-13 Thread wht (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082803#comment-17082803
 ] 

wht commented on CASSANDRA-15701:
-

Hi, I really want to know whether it has an impact on Cassandra 
3.11.3/3.11.5/3.11.6. Can anyone help?

> Is Cassandra 3.11.3/3.11.5 affected by CVE-2019-10172?
> -
>
> Key: CASSANDRA-15701
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15701
> Project: Cassandra
>  Issue Type: Bug
>  Components: Dependencies
>Reporter: wht
>Priority: Normal
>
> Cassandra 3.11.3/3.11.5 relies on jackson-mapper-asl-1.9.13.jar, for which the 
> vulnerability CVE-2019-10172 
> ([https://nvd.nist.gov/vuln/detail/CVE-2019-10172]) has been reported, so I 
> want to know whether it has an impact on Cassandra. Thanks!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15688) Invalid cdc_raw_directory prevents server startup

2020-04-13 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082784#comment-17082784
 ] 

Dinesh Joshi commented on CASSANDRA-15688:
--

Committed, thanks for the patch!

> Invalid cdc_raw_directory prevents server startup
> -
>
> Key: CASSANDRA-15688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15688
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Change Data Capture
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If {{cdc_raw_directory}} is set to an invalid directory, it prevents startup 
> of the server even when cdc_enabled is set to false.
> The directory can either be set directly by the {{cdc_raw_directory}} setting 
> in the configuration YAML or indirectly via the {{cassandra.storage_dir}} system 
> property, which is how I encountered it.
> It is easy to reproduce by setting {{cdc_raw_directory}} to {{notadir/notasubdir}}.
> Additionally, while investigating, I discovered that 
> {{DatabaseDescriptor.guessFileStore}} can cause a {{NullPointerException}} if 
> it runs out of parent elements before it can get a FileStore. It should instead 
> throw a more useful {{ConfigurationException}} with details on the problematic path.
>  {{guessFileStore}} is used for checks on {{commitlog_directory}}, 
> {{cdc_raw_directory}} and {{data_file_directories}}.
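
For illustration only, a minimal sketch of the two behaviours described above, using 
hypothetical names (this is not the actual {{DatabaseDescriptor}} code, and 
ConfigurationException below is a local stand-in): skip the space check entirely when 
CDC is disabled, and fail with a descriptive error instead of an NPE when the configured 
path has no existing ancestor:

{code:java}
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CdcDirectoryCheckSketch
{
    // Local stand-in for Cassandra's ConfigurationException.
    static class ConfigurationException extends RuntimeException
    {
        ConfigurationException(String message) { super(message); }
    }

    // Walk up the path until an existing ancestor is found; if we run out of parents
    // (e.g. the relative path notadir/notasubdir), throw a descriptive error instead of an NPE.
    static FileStore guessFileStore(String dir) throws IOException
    {
        Path path = Paths.get(dir);
        while (path != null)
        {
            if (Files.exists(path))
                return Files.getFileStore(path);
            path = path.getParent();
        }
        throw new ConfigurationException("Unable to find a file store for '" + dir
                                         + "'; check that the configured directory exists or can be created");
    }

    public static void main(String[] args) throws IOException
    {
        boolean cdcEnabled = false;                     // per the ticket: no check at all when CDC is off
        String cdcRawDirectory = "notadir/notasubdir";  // the invalid path from the reproduction steps

        if (cdcEnabled)
            System.out.println("cdc_raw space: " + guessFileStore(cdcRawDirectory).getUsableSpace());
        else
            System.out.println("CDC disabled; cdc_raw_directory is not space-checked at startup");
    }
}
{code}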



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15721) Any open source management tool for Apache Cassandra

2020-04-13 Thread C. Scott Andreas (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas reassigned CASSANDRA-15721:


Assignee: Chirantan

> Any open source management tool for Apache Cassandra 
> --
>
> Key: CASSANDRA-15721
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15721
> Project: Cassandra
>  Issue Type: Task
>Reporter: Chirantan
>Assignee: Chirantan
>Priority: Normal
>
> Please recommend any open source management tool for Apache Cassandra, similar 
> to OpsCenter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15721) Any open source management tool for Apache Cassandra

2020-04-13 Thread C. Scott Andreas (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-15721:
-
Resolution: Not A Problem
Status: Resolved  (was: Triage Needed)

> Any open source management tool for Apache Cassandra 
> --
>
> Key: CASSANDRA-15721
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15721
> Project: Cassandra
>  Issue Type: Task
>Reporter: Chirantan
>Priority: Normal
>
> Please recommend any open source management tool for Apache Cassandra, similar 
> to OpsCenter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15721) Any open source management tool for Apache Cassandra

2020-04-13 Thread C. Scott Andreas (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15721?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082783#comment-17082783
 ] 

C. Scott Andreas commented on CASSANDRA-15721:
--

Hi Chirantan, this bug tracker is used to track bugs in the database and 
feature development.

For questions like this, please refer to the documentation at 
[https://cassandra.apache.org|https://cassandra.apache.org/] or reach out to 
the user community via email or Slack. Information regarding how to join the 
mailing list or ASF Slack channel (#cassandra) is located here: 
https://cassandra.apache.org/community/

> Any open source management tool for Apache Cassandra 
> --
>
> Key: CASSANDRA-15721
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15721
> Project: Cassandra
>  Issue Type: Task
>Reporter: Chirantan
>Priority: Normal
>
> Please recommend any open source management tool for Apache Cassandra, similar 
> to OpsCenter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15718) Improve BatchMetricsTest

2020-04-13 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15718:
-
Reviewers: David Capwell, Dinesh Joshi  (was: David Capwell)

> Improve BatchMetricsTest 
> -
>
> Key: CASSANDRA-15718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15718
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> As noted in CASSANDRA-15582 {{BatchMetricsTest}} should test 
> {{BatchStatement.Type.COUNTER}} to cover all the {{BatchMetrics}}.  Some 
> changes were introduced to make this improvement at:
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics
> and the following suggestions were made in review (in addition to the 
> suggestion that a separate JIRA be created for this change) by [~dcapwell]:
> {quote}
> * I like the usage of BatchStatement.Type for the tests
> * honestly feel QuickTheories is better than random, but glad you added the 
> seed to all asserts =). Would still be better as a QuickTheories test since 
> you basically wrote a property anyways!
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R131
>  feel you should rename to expectedPartitionsPerLoggedBatch 
> {Count,Logged,Unlogged}
> * pre is what the value is, post is what the value is expected to be 
> (rather than what it is).
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R150
>  this doesn't look correct. the batch has distinctPartitions mutations, so 
> shouldn't max reflect that? I ran the current test in a debugger and see that 
> that is the case (aka current test is wrong).
> most of the comments are nit picks, but the last one looks like a test bug to 
> me
> {quote}
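
For reference, a QuickTheories-style property over the batch size could look roughly 
like the sketch below. This is only an illustration: applyLoggedBatchAndReadMetric is a 
hypothetical stand-in for executing a logged batch in {{BatchMetricsTest}} and reading 
back the recorded partitionsPerLoggedBatch value, while qt(), integers().between(...) 
and check(...) are the library's standard entry points.

{code:java}
import static org.quicktheories.QuickTheory.qt;
import static org.quicktheories.generators.SourceDSL.integers;

public class BatchMetricsPropertySketch
{
    // Hypothetical helper: execute a logged batch touching `partitions` distinct
    // partitions and return the last recorded partitionsPerLoggedBatch value.
    static long applyLoggedBatchAndReadMetric(int partitions)
    {
        return partitions; // placeholder so the sketch compiles and the property holds
    }

    public static void main(String[] args)
    {
        // Property: for any batch size in the range, the metric reflects the
        // number of distinct partitions written by the batch.
        qt().forAll(integers().between(1, 50))
            .check(partitions -> applyLoggedBatchAndReadMetric(partitions) == partitions);
    }
}
{code}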



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15688) Invalid cdc_raw_directory prevents server startup

2020-04-13 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15688:
-
  Since Version: 4.0
Source Control Link: 
https://github.com/apache/cassandra/commit/02c6d6540c6ab108b763a639146e74e9f8d0dd40
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Invalid cdc_raw_directory prevents server startup
> -
>
> Key: CASSANDRA-15688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15688
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Change Data Capture
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If {{cdc_raw_directory}} is set to an invalid directory, it prevents startup 
> of the server even when cdc_enabled is set to false.
> The directory can either be set directly by the {{cdc_raw_directory}} setting 
> in the configuration YAML or indirectly via the {{cassandra.storage_dir}} system 
> property, which is how I encountered it.
> It is easy to reproduce by setting {{cdc_raw_directory}} to {{notadir/notasubdir}}.
> Additionally, while investigating, I discovered that 
> {{DatabaseDescriptor.guessFileStore}} can cause a {{NullPointerException}} if 
> it runs out of parent elements before it can get a FileStore. It should instead 
> throw a more useful {{ConfigurationException}} with details on the problematic path.
>  {{guessFileStore}} is used for checks on {{commitlog_directory}}, 
> {{cdc_raw_directory}} and {{data_file_directories}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] branch trunk updated: Do not check cdc_raw_directory filesystem space if CDC disabled

2020-04-13 Thread djoshi
This is an automated email from the ASF dual-hosted git repository.

djoshi pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 02c6d65  Do not check cdc_raw_directory filesystem space if CDC 
disabled
02c6d65 is described below

commit 02c6d6540c6ab108b763a639146e74e9f8d0dd40
Author: Jon Meredith 
AuthorDate: Thu Apr 2 15:31:40 2020 -0600

Do not check cdc_raw_directory filesystem space if CDC disabled

On startup, applySimpleConfig checks disk space for cdc_raw_directory
even if cdc_enabled=false.  The cdc_raw_directory could be computed
automatically from the cassandra.storagedir property so if that
has been deliberately set to an invalid directory (e.g. to force
explicit configuration of storage paths) then the server will not
start.

Additionally this protects against an NPE while checking storage
space if misconfigured.

Patch by Jon Meredith; Reviewed by Dinesh Joshi for CASSANDRA-15688
---
 CHANGES.txt|   1 +
 .../cassandra/config/DatabaseDescriptor.java   | 108 -
 .../cassandra/config/DatabaseDescriptorTest.java   |  22 +
 .../commitlog/CommitLogSegmentManagerCDCTest.java  |   1 +
 4 files changed, 86 insertions(+), 46 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index dc04d30..38fef18 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-alpha4
+ * Do not check cdc_raw_directory filesystem space if CDC disabled 
(CASSANDRA-15688)
  * Replace array iterators with get by index (CASSANDRA-15394)
  * Minimize BTree iterator allocations (CASSANDRA-15389)
  * Add client request size server metrics (CASSANDRA-15704)
diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java 
b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index e6bee3a..d5794ae 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -79,6 +79,7 @@ import org.apache.commons.lang3.StringUtils;
 
 import static java.util.concurrent.TimeUnit.MILLISECONDS;
 import static org.apache.cassandra.io.util.FileUtils.ONE_GB;
+import static org.apache.cassandra.io.util.FileUtils.ONE_MB;
 
 public class DatabaseDescriptor
 {
@@ -539,71 +540,60 @@ public class DatabaseDescriptor
 conf.native_transport_max_concurrent_requests_in_bytes_per_ip = 
Runtime.getRuntime().maxMemory() / 40;
 }
 
-if (conf.cdc_raw_directory == null)
-{
-conf.cdc_raw_directory = storagedirFor("cdc_raw");
-}
-
-// Windows memory-mapped CommitLog files is incompatible with CDC as 
we hard-link files in cdc_raw. Confirm we don't have both enabled.
-if (FBUtilities.isWindows && conf.cdc_enabled && 
conf.commitlog_compression == null)
-throw new ConfigurationException("Cannot enable cdc on Windows 
with uncompressed commitlog.");
-
 if (conf.commitlog_total_space_in_mb == null)
 {
-int preferredSize = 8192;
-int minSize = 0;
+final int preferredSizeInMB = 8192;
 try
 {
 // use 1/4 of available space.  See discussion on #10013 and 
#10199
-minSize = 
Ints.saturatedCast((guessFileStore(conf.commitlog_directory).getTotalSpace() / 
1048576) / 4);
+final long totalSpaceInBytes = 
guessFileStore(conf.commitlog_directory).getTotalSpace();
+conf.commitlog_total_space_in_mb = 
calculateDefaultSpaceInMB("commitlog",
+ 
conf.commitlog_directory,
+ 
"commitlog_total_space_in_mb",
+ 
preferredSizeInMB,
+ 
totalSpaceInBytes, 1, 4);
+
 }
 catch (IOException e)
 {
 logger.debug("Error checking disk space", e);
-throw new ConfigurationException(String.format("Unable to 
check disk space available to %s. Perhaps the Cassandra user does not have the 
necessary permissions",
+throw new ConfigurationException(String.format("Unable to 
check disk space available to '%s'. Perhaps the Cassandra user does not have 
the necessary permissions",

conf.commitlog_directory), e);
 }
-if (minSize < preferredSize)
-{
-logger.warn("Small commitlog volume detected at {}; setting 
commitlog_total_space_in_mb to {}.  You can override this in cassandra.yaml",
-conf.commitlog_directory, minSize);
-   

[jira] [Updated] (CASSANDRA-15688) Invalid cdc_raw_directory prevents server startup

2020-04-13 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15688:
-
Status: Ready to Commit  (was: Review In Progress)

+1

> Invalid cdc_raw_directory prevents server startup
> -
>
> Key: CASSANDRA-15688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15688
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Change Data Capture
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If {{cdc_raw_directory}} is set to an invalid directory, it prevents startup 
> of the server even when cdc_enabled is set to false.
> The directory can either be set directly by the {{cdc_raw_directory}} setting 
> in the configuration YAML or indirectly via the {{cassandra.storage_dir}} system 
> property, which is how I encountered it.
> It is easy to reproduce by setting {{cdc_raw_directory}} to {{notadir/notasubdir}}.
> Additionally, while investigating, I discovered that 
> {{DatabaseDescriptor.guessFileStore}} can cause a {{NullPointerException}} if 
> it runs out of parent elements before it can get a FileStore. It should instead 
> throw a more useful {{ConfigurationException}} with details on the problematic path.
>  {{guessFileStore}} is used for checks on {{commitlog_directory}}, 
> {{cdc_raw_directory}} and {{data_file_directories}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15688) Invalid cdc_raw_directory prevents server startup

2020-04-13 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15688:
-
Reviewers: Dinesh Joshi, Dinesh Joshi  (was: Dinesh Joshi)
   Dinesh Joshi, Dinesh Joshi  (was: Dinesh Joshi)
   Status: Review In Progress  (was: Patch Available)

> Invalid cdc_raw_directory prevents server startup
> -
>
> Key: CASSANDRA-15688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15688
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Change Data Capture
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> If {{cdc_raw_directory}} is set to an invalid directory, it prevents startup 
> of the server even when cdc_enabled is set to false.
> The directory can either be set directly by the {{cdc_raw_directory}} setting 
> in the configuration YAML or indirectly via the {{cassandra.storage_dir}} system 
> property, which is how I encountered it.
> It is easy to reproduce by setting {{cdc_raw_directory}} to {{notadir/notasubdir}}.
> Additionally, while investigating, I discovered that 
> {{DatabaseDescriptor.guessFileStore}} can cause a {{NullPointerException}} if 
> it runs out of parent elements before it can get a FileStore. It should instead 
> throw a more useful {{ConfigurationException}} with details on the problematic path.
>  {{guessFileStore}} is used for checks on {{commitlog_directory}}, 
> {{cdc_raw_directory}} and {{data_file_directories}}.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15623) When running CQLSH with STDIN input, exit with error status code if script fails

2020-04-13 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082765#comment-17082765
 ] 

Dinesh Joshi commented on CASSANDRA-15623:
--

Since I've not heard anything on the dev list, I am inclined to backport your 
change with a warning in {{NEWS.txt}}. Could you please produce two separate 
commits, one for 3.0 and one for 3.11?

> When running CQLSH with STDIN input, exit with error status code if script 
> fails
> 
>
> Key: CASSANDRA-15623
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15623
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Tools
>Reporter: Jacob Becker
>Assignee: Jacob Becker
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Assuming CASSANDRA-6344 has been in place for years and considering that scripts 
> submitted with the `-e` option behave in a similar fashion, it is very 
> surprising that scripts submitted via STDIN (i.e. piped in) always exit with a 
> zero status code, regardless of errors. I believe this should be fixed.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15229) BufferPool Regression

2020-04-13 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082202#comment-17082202
 ] 

Benedict Elliott Smith edited comment on CASSANDRA-15229 at 4/14/20, 12:02 AM:
---

bq. In networking, most of the time, a buffer will be released immediately after 
allocation, and with recycleWhenFree=false a fully freed chunk will be reused 
instead of being recycled to the global list. Partial-recycle is unlikely to 
affect networking usage. I am happy to test it.

It is famously difficult to prove a negative, particularly via external 
testing.  It will be untrue in some circumstances, most notably large message 
processing (which happens asynchronously).  I would need to review the buffer 
control flow in messaging to confirm it is sufficiently low risk to modify the 
behaviour here, so I would prefer we not modify it in a way that is not easily 
verified.

bq. will it create fragmentation in system direct memory?

-Not easily completely ruled out, but given this data will be allocated mostly 
in its own virtual page space (given all allocations are much larger than a 
normal page), it hopefully shouldn't be an insurmountable problem for most 
allocators given the availability of almost unlimited virtual page space on 
modern systems.
-

edit: while this may be true, it's a bit of a stretch as I haven't looked at 
any modern allocator remotely recently, and I should not extrapolate in this 
way (however it's anyway probably not something to worry about if we're 
allocating relatively regular sizes)

bq. I tested with "Bytebuffer#allocateDirect" and "Unsafe#allocateMemory", both 
latencies are slightly worse than baseline.

Did you perform the simple optimisation of rounding up to the >= 2KiB boundary 
(for equivalent behaviour), then re-using any buffer that is correctly sized 
when evicting to make room for a new item?  It might well be possible to make 
this yet more efficient than {{BufferPool}} by reducing this boundary to e.g. 
1KiB, or perhaps as little as 512B.

So if I were doing this myself, I think I would be starting at this point and 
if necessary would move towards further reusing the buffers we already have in 
the cache - since it is already a pool of them.  I would just be looking to 
smooth out the random distribution of sizes used with e.g. a handful of queues 
each containing a single size of buffer and at most a handful of items each.  
This feels like a simpler solution to me, particularly as it does not affect 
any other pool users.

However, I’m not doing the work (nor maybe reviewing it), so if you are willing 
to at least enable the behaviour only for the ChunkCache so this change cannot 
have any unintended negative effect for those users not expected to benefit, my 
main concern will be alleviated.



was (Author: benedict):
bq. In networking, most of the time, a buffer will be released immediately after 
allocation, and with recycleWhenFree=false a fully freed chunk will be reused 
instead of being recycled to the global list. Partial-recycle is unlikely to 
affect networking usage. I am happy to test it.

It is famously difficult to prove a negative, particularly via external 
testing.  It will be untrue in some circumstances, most notably large message 
processing (which happens asynchronously).  I would need to review the buffer 
control flow in messaging to confirm it is sufficiently low risk to modify the 
behaviour here, so I would prefer we not modify it in a way that is not easily 
verified.

bq. will it create fragmentation in system direct memory?

Not easily completely ruled out, but given this data will be allocated mostly 
in its own virtual page space (given all allocations are much larger than a 
normal page), it hopefully shouldn't be an insurmountable problem for most 
allocators given the availability of almost unlimited virtual page space on 
modern systems.

bq. I tested with "Bytebuffer#allocateDirect" and "Unsafe#allocateMemory", both 
latencies are slightly worse than baseline.

Did you perform the simple optimisation of rounding up to the >= 2KiB boundary 
(for equivalent behaviour), then re-using any buffer that is correctly sized 
when evicting to make room for a new item?  It might well be possible to make 
this yet more efficient than {{BufferPool}} by reducing this boundary to e.g. 
1KiB, or perhaps as little as 512B.

So if I were doing this myself, I think I would be starting at this point and 
if necessary would move towards further reusing the buffers we already have in 
the cache - since it is already a pool of them.  I would just be looking to 
smooth out the random distribution of sizes used with e.g. a handful of queues 
each containing a single size of buffer and at most a handful of items each.  
This feels like a simpler solution to me, particularly as it does not affect 
any other pool users.

However, I’m not doing the work (nor maybe reviewing it), so if you are willing 
to at least enable the behaviour only for the ChunkCache so this change cannot 
have any unintended negative effect for those users not expected to benefit, my 
main concern will be alleviated.
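
To make the queue suggestion concrete, a rough sketch (my illustration, not 
{{BufferPool}} or ChunkCache code) of smoothing out the size distribution with a few 
single-size reuse queues, rounding each request up to a 2KiB boundary and keeping at 
most a handful of buffers per size:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;

public class SizeClassBufferCache
{
    private static final int BOUNDARY = 2048;       // round requests up to a 2KiB boundary
    private static final int MAX_PER_SIZE = 4;      // "at most a handful of items each"

    private final Map<Integer, ArrayDeque<ByteBuffer>> free = new HashMap<>();

    ByteBuffer allocate(int size)
    {
        int rounded = ((size + BOUNDARY - 1) / BOUNDARY) * BOUNDARY;
        ArrayDeque<ByteBuffer> queue = free.get(rounded);
        ByteBuffer buffer = (queue == null) ? null : queue.poll();
        if (buffer == null)
            buffer = ByteBuffer.allocateDirect(rounded);   // no pooled buffer of this size available
        buffer.clear().limit(size);
        return buffer;
    }

    // Called when a cache entry is evicted: keep a handful of buffers of each size for reuse.
    void release(ByteBuffer buffer)
    {
        ArrayDeque<ByteBuffer> queue = free.computeIfAbsent(buffer.capacity(), k -> new ArrayDeque<>());
        if (queue.size() < MAX_PER_SIZE)
            queue.offer(buffer);
        // otherwise let the direct buffer be reclaimed normally
    }
}
{code}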

[jira] [Commented] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082750#comment-17082750
 ] 

Jordan West commented on CASSANDRA-15674:
-

Ah yes. Good catch [~benedict] 

> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082749#comment-17082749
 ] 

Benedict Elliott Smith commented on CASSANDRA-15674:


I haven't looked closely at the rationale, but I'm at least a bit surprised at 
the mixed terminology.  We should probably stick to the consistent nomenclature 
of commit/abort, rather than introduce commit/rollback?

> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15674:
--
Reviewers: Jordan West  (was: David Capwell, Jordan West)

> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15674:
--
Status: In Progress  (was: Changes Suggested)

Filed CASSANDRA-15723

> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15674:
--
Reviewers: Jordan West, David Capwell  (was: David Capwell, Jordan West)
   Jordan West, David Capwell  (was: Jordan West)
   Status: Review In Progress  (was: Patch Available)

> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15674:
--
Status: Patch Available  (was: In Progress)

> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15723) Add support in in-jvm dtest for JMX values

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15723:
--
Change Category: Quality Assurance
 Complexity: Normal
 Status: Open  (was: Triage Needed)

> Add support in in-jvm dtest for JMX values
> --
>
> Key: CASSANDRA-15723
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15723
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: David Capwell
>Priority: Normal
>
> There are several tests which need to use callOnInstance to extract a metric 
> value; this makes the tests specific to the current version, so they cannot 
> be extracted out.  To help make these tests more generic, we should add 
> support for JMX values.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15723) Add support in in-jvm dtest for JMX values

2020-04-13 Thread David Capwell (Jira)
David Capwell created CASSANDRA-15723:
-

 Summary: Add support in in-jvm dtest for JMX values
 Key: CASSANDRA-15723
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15723
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/dtest
Reporter: David Capwell


There are several tests which need to use callOnInstance to extract a metric 
value; this makes the tests specific to the current version, so they cannot 
be extracted out.  To help make these tests more generic, we should add 
support for JMX values.
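
For illustration, reading a metric over JMX with the standard javax.management API 
looks roughly like the sketch below; the endpoint (localhost:7199) and the example 
ObjectName are assumptions for the sake of the example, and the exact metric would 
depend on what a given test needs:

{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxMetricReadSketch
{
    public static void main(String[] args) throws Exception
    {
        // Assumed endpoint: a locally reachable JMX port (7199 is Cassandra's default).
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url))
        {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Hypothetical example metric; the ObjectName would be chosen per test.
            ObjectName name = new ObjectName(
                "org.apache.cassandra.metrics:type=Table,keyspace=ks,scope=tbl,name=LiveDiskSpaceUsed");
            Object value = connection.getAttribute(name, "Count");
            System.out.println("LiveDiskSpaceUsed = " + value);
        }
    }
}
{code}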



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15720) Simplify the documentation

2020-04-13 Thread C. Scott Andreas (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

C. Scott Andreas updated CASSANDRA-15720:
-
Summary: Simplify the documentation  (was: Fix the documentation)

> Simplify the documentation
> --
>
> Key: CASSANDRA-15720
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15720
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: nikhil karnik
>Priority: Normal
>
> Can you fix the documentation at 
> [https://github.com/apache/cassandra/blob/trunk/doc/source/new/streaming.rst]?
> In the zero-copying section, it says "Pre-4.0, during streaming Cassandra 
> reifies the SSTables into objects." 
> The word "reifies" makes no sense. It's not easy to understand, and as an end 
> user it would help if you could update it with some simpler English. 
> I think it means "It serializes the data; the metadata needs to be re-built on 
> the receiving node", but a better and simpler explanation would help.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15722) Attach cluster type to the error message in cassandra-diff

2020-04-13 Thread Yifan Cai (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yifan Cai updated CASSANDRA-15722:
--
Change Category: Operability
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Attach cluster type to the error message in cassandra-diff
> --
>
> Key: CASSANDRA-15722
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15722
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tool/diff
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>
> The exceptions logged by cassandra-diff do not contain the cluster type. 
> It is hard to tell whether an exception was caused by a request to the SOURCE 
> or to the TARGET cluster. 
> The error message could be made more informative by including the cluster type.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15722) Attach cluster type to the error message in cassandra-diff

2020-04-13 Thread Yifan Cai (Jira)
Yifan Cai created CASSANDRA-15722:
-

 Summary: Attach cluster type to the error message in cassandra-diff
 Key: CASSANDRA-15722
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15722
 Project: Cassandra
  Issue Type: Improvement
  Components: Tool/diff
Reporter: Yifan Cai
Assignee: Yifan Cai


The exceptions logged by cassandra-diff do not contain the cluster type. 
It is hard to tell whether an exception was caused by a request to the SOURCE 
or to the TARGET cluster. 
The error message could be made more informative by including the cluster type.
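
A minimal sketch of the idea (hypothetical names, not the actual cassandra-diff code): 
tag each failure with the cluster it came from before logging or rethrowing it:

{code:java}
public class ClusterTypeErrorSketch
{
    // The two clusters a diff job talks to.
    enum ClusterType { SOURCE, TARGET }

    // Hypothetical helper: wrap a failure so the log line says which cluster was being queried.
    static RuntimeException withClusterType(ClusterType type, Throwable cause)
    {
        return new RuntimeException("Error querying " + type + " cluster: " + cause.getMessage(), cause);
    }

    public static void main(String[] args)
    {
        try
        {
            throw new RuntimeException("read timed out");
        }
        catch (RuntimeException e)
        {
            // Prints: Error querying SOURCE cluster: read timed out
            System.out.println(withClusterType(ClusterType.SOURCE, e).getMessage());
        }
    }
}
{code}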



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082696#comment-17082696
 ] 

Jordan West commented on CASSANDRA-15674:
-

Thanks for the explanations. I'm +1 on the patch. Moving the TODO to a ticket 
where it can be tracked wfm. 

> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082690#comment-17082690
 ] 

David Capwell commented on CASSANDRA-15674:
---

bq. why you chose the hooks that were added vs using the existing 
onCommit/onAbort API with a delegate or sub-class implementation used only for 
IndexSummaryRedistribution.

IndexSummaryRedistribution mutates disk in-place but relies on the metrics to 
be updated via transactions.  Right now the commit case works since 
IndexSummaryRedistribution will delete the current size (before the mutation), 
so the re-add will update the size.  In the abort case we need to know the 
deltas (they can only be computed right after mutating), so we would need a 
list of deltas to apply in onAbort.  This logic is very specific to 
IndexSummaryRedistribution and would rely on extending LifecycleTransaction to 
apply this list. I felt that commit/abort hooks were a generalization of this 
and could be used if other use cases need them.
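
As a rough illustration of the delta idea (my sketch, not the LifecycleTransaction or 
IndexSummaryRedistribution code): record each metric delta as it is applied in place, 
and replay the negated deltas if the work is aborted.

{code:java}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

public class MetricDeltaSketch
{
    private final AtomicLong liveDiskSpaceUsed = new AtomicLong();
    private final List<Long> appliedDeltas = new ArrayList<>();

    // Apply a delta immediately (the metric is mutated in place), but remember it
    // so an abort can undo it.
    void applyDelta(long delta)
    {
        liveDiskSpaceUsed.addAndGet(delta);
        appliedDeltas.add(delta);
    }

    void onCommit()
    {
        appliedDeltas.clear();            // nothing to undo
    }

    void onAbort()
    {
        // Replay the negated deltas so an interrupted redistribution cannot
        // leave the metric permanently skewed.
        for (long delta : appliedDeltas)
            liveDiskSpaceUsed.addAndGet(-delta);
        appliedDeltas.clear();
    }

    long value() { return liveDiskSpaceUsed.get(); }
}
{code}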

bq. Pending counter: can you say a bit more about how you would see an operator 
using this metric?

PendingSSTableReleases was mostly added for tests to know when everything is 
fully released (I couldn't find any other way).  It's also the reason 
liveDiskSpaceUsed != totalDiskSpaceUsed, so it also exposes to operators that 
things line up (if they are not equal then pending should be > 0; else there is 
a bug).

bq. consider renaming Listener#onPreSample to Listener#beforeResample

Done.

bq. IndexSummaryRedistribution constructor should be marked @VisibleForTest

Done

bq. DiskSpaceMetricsTest: Is the TODO on L125 still valid or can it be removed? 
Looks like the latter

There is a desire to extract the in-jvm dtest tests out so they can run against 
any version.  This TODO is mostly commenting that the test could not be written 
generically across versions since it depends on JMX values, which are not 
implemented in in-jvm dtests.  

I can file a ticket and remove the TODO.


> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15674) liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if IndexSummaryRedistribution gets interrupted

2020-04-13 Thread Jordan West (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jordan West updated CASSANDRA-15674:

Status: Changes Suggested  (was: Review In Progress)

Thanks [~dcapwell]. The primary question I had is why you chose the hooks that 
were added vs using the existing onCommit/onAbort API with a delegate or 
sub-class implementation used only for IndexSummaryRedistribution. On the 
surface that would limit the scope of the change, but may actually be more 
invasive? Anyway, I wanted to get your input and maybe [~benedict]'s as well, as 
the original API author. 

Otherwise, my other comments are all minor:
* Pending counter: can you say a bit more about how you would see an operator 
using this metric? 
* consider renaming Listener#onPreSample to Listener#beforeResample
* The 4-arity version of the IndexSummaryRedistribution constructor should be 
marked @VisibleForTest (and maybe the Listener too? — or at least some comment 
that for now it only exists for tests)
* DiskSpaceMetricsTest: Is the TODO on L125 still valid or can it be removed? 
Looks like the latter

> liveDiskSpaceUsed and totalDiskSpaceUsed get corrupted if 
> IndexSummaryRedistribution gets interrupted
> -
>
> Key: CASSANDRA-15674
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15674
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Compaction, Observability/Metrics
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> IndexSummaryRedistribution is a compaction task and as such extends Holder 
> and supports cancelation by throwing a CompactionInterruptedException.  The 
> issue is that IndexSummaryRedistribution tries to use transactions, but 
> mutates the sstable in-place; the transaction is unable to roll back.
> This would be fine (only updates summary) if it weren’t for the fact that the 
> task attempts to also mutate the two metrics liveDiskSpaceUsed and 
> totalDiskSpaceUsed; since these can’t be rolled back, any cancelation could 
> corrupt these metrics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Description: 
Sorry for the slightly irrelevant post. This is not an issue with Cassandra but 
possibly with the interaction between Cassandra and Kubernetes.

We experienced a performance degradation when running a single Cassandra 
instance inside kubeadm 1.14 compared with running the Docker container 
stand-alone.
 We used a write-only workload (YCSB benchmark workload A, load phase) with the 
following user table:

 

{{ cqlsh> create keyspace ycsb
WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
;
cqlsh> USE ycsb;
cqlsh> create table usertable (
y_id varchar primary key,
field0 varchar,
field1 varchar,
field2 varchar,
field3 varchar,
field4 varchar,
field5 varchar,
field6 varchar,
field7 varchar,
field8 varchar,
field9 varchar);}}

And using the following script:

 

{{python ./bin/ycsb load cassandra2-cql -P workloads/workloada -p 
recordcount=150 -p 
operationcount=150 -p measurementtype=raw -p 
cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
sleep 15}}

We used the following image: {{decomads/cassandra:2.2.16}}, which uses the 
official {{cassandra:2.2.16}} as base image and adds a readinessProbe to it.

We used identical Docker configuration parameters by ensuring that the output 
of {{docker inspect}} is as similar as possible. First, we ran the YCSB 
benchmark in a container that is co-located with the cassandra container in one 
pod. Kubernetes then starts these containers with network mode 
{{net=container:...}}: a separate container links up the ycsb and 
cassandra containers within the same network space so they can talk via 
localhost. By this we hope to avoid interference from the CNI network plugin.

We ran the Docker-only container within the Kubernetes node using the default 
bridge network.

We first performed the experiment on an OpenStack VM running Ubuntu 16.04 (4 GB 
RAM, 4 CPU cores, 50 GB disk) that runs on a physical node with 16 CPU cores. 
Storage, however, is Ceph and therefore distributed.

To rule out Ceph's distributed storage, we also repeated the experiment on 
minikube+VirtualBox (12 GB, 4 CPU cores, 30 GB) on a Windows 10 laptop with 4 
cores/8 logical processors and 16 GB RAM. The same performance degradation was 
measured.

Observations (on Ubuntu-OpenStack):
 * Docker:
 ** Mean response latency of the YCSB benchmark: 1.5 ms-1.7 ms
 * Kubernetes:
 ** Mean response latency of the YCSB benchmark: 2.7 ms-3 ms
 * CPU usage of the CassandraDaemon JVM is considerably lower on Kubernetes (see my 
position paper: [https://lirias.kuleuven.be/2788169?limo=0]).

Possible causes:
 * Network overhead of the virtual bridge in Kubernetes is not the cause of the 
problem, in our opinion.
 ** We repeated the experiment running the Docker-only containers inside a 
Kubernetes node, linking the containers with the --net=container: mechanism as 
closely as we could. The YCSB latency stayed the same.
 * Disk I/O bottleneck: nodetool tablestats are very similar. Cassandra 
containers are configured to write data to a filesystem that is mounted from 
the host inside the container, and exactly the same Docker mount type is used.
 ** Write latency is very stable over multiple runs:
 *** Write latency on Kubernetes for the ycsb usertable: 0.0167 ms.
 *** Write latency on Docker for the ycsb usertable: 0.0150 ms.
 ** Compaction_history/compaction_in_progress is also very similar (see 
attached files).

Do you know of any other causes that might explain the difference in reported 
YCSB response latency? Could it be that the Cassandra Session is closed by 
Kubernetes after each request? How can I diagnose this?
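
One way to sanity-check the session-reuse hypothesis is to drive the same table through a single long-lived session and compare latencies inside and outside the pod. A minimal sketch, assuming the DataStax Java driver used by YCSB's cassandra2-cql binding and the contact point/keyspace from the setup above:

{code:java}
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

// Minimal sketch: build one Cluster/Session, reuse it for many requests, and
// compare the measured per-request latency with the YCSB numbers.
public class SessionReuseCheck
{
    public static void main(String[] args)
    {
        Cluster cluster = Cluster.builder()
                                 .addContactPoint("127.0.0.1") // same contact point as the benchmark
                                 .withPort(9042)
                                 .build();
        Session session = cluster.connect("ycsb");
        try
        {
            long start = System.nanoTime();
            int requests = 10_000;
            for (int i = 0; i < requests; i++)
                session.execute("SELECT y_id FROM usertable LIMIT 1");
            double avgMicros = (System.nanoTime() - start) / 1000.0 / requests;
            System.out.printf("avg latency over one long-lived session: %.1f us%n", avgMicros);
        }
        finally
        {
            cluster.close();
        }
    }
}
{code}

If the per-request latency over one long-lived session matches the stand-alone Docker numbers even when this is run inside the pod, connection or session churn is unlikely to explain the gap.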

 

  was:
This is my first JIRA issue. Sorry if I do something  wrong in the reporting.

I experienced a performance degradation when running a single Cassandra 
instance  inside Kubernetes in comparison with running the Docker container 
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as much as possible the same.  First we got the ycsb 
benchmark in a container that is co-located with the cassandra container in one 
pod.  Kubernetes starts these containers then with network mode 
"net=container:... This is a  separate container that link up the ycsb and 
cassandra containers within the same network space so they can talk via 
localhost – by this we hope to avoid network plugin interference from the CNI 
plugin.

We ran the docker-only 

[jira] [Created] (CASSANDRA-15721) any open source management tool for Apache Cassandra

2020-04-13 Thread Chirantan (Jira)
Chirantan created CASSANDRA-15721:
-

 Summary: any open source management tool for Apache Cassandra 
 Key: CASSANDRA-15721
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15721
 Project: Cassandra
  Issue Type: Task
Reporter: Chirantan


Please recommend an open source management tool for Apache Cassandra, similar to 
OpsCenter.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Description: 
This is my first JIRA issue. Sorry if I do something  wrong in the reporting.

I experienced a performance degradation when running a single Cassandra 
instance  inside Kubernetes in comparison with running the Docker container 
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as much as possible the same.  First we got the ycsb 
benchmark in a container that is co-located with the cassandra container in one 
pod.  Kubernetes starts these containers then with network mode 
"net=container:... This is a  separate container that link up the ycsb and 
cassandra containers within the same network space so they can talk via 
localhost – by this we hope to avoid network plugin interference from the CNI 
plugin.

We ran the docker-only container within the Kubernetes node using the default 
bridge network

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
physical laptop with 4 cores/8 logical processors and 16GB RAM on and Openstack 
VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical nodes with 
16 CPU cores. Storage is Ceph.
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table:
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
 ;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);

 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
 sleep 15

Observations (On Ubuntu-OpenStack)
 * Docker:
 ** Mean average  response latency YCSB benchmark: 1,5 ms-1.7ms
 * Kubernetes
 ** Mean average response latency YCSB benchmark: 2.7 ms-3ms
 * CPU usage of the Cassandra Daemon JVM is way lower than Kubernetes (see my 
position paper: [https://lirias.kuleuven.be/2788169?limo=0)]: 

Possible  causes:
 * Network overhead of virtual bridge in container orchestrator is not the 
cause of the problem in our opinion
 ** We repeated the experiment where we ran the Docker-Only containers inside a 
Kubernetes node and we linked the containers using the --net=container: mode 
mechanisms as similar as possible as we could. The YCSB latency stayed the same.
 * Disk/io bottleneck: Nodetool tablestats are very similar
 ** Cassandra containers are configured to write data to a filesystem that is 
mounted from the host inside the container. Exactly the same Docker mount type 
is used
 ** Write latency is very stable over multiple runs
 *** Kubernetes for ycsb user table: 0.0167 ms.
 *** Write latency Docker for ycsb usertable: 0.0150 ms.
 ** Compaction_history/compaction_in_progress is also very similar (as opposed 
to earlier versions of the issue – sorry for the confusion!)

Do you know of any other causes that might explain the difference in reported 
YCSB reponse latency?

 

     

 

 

 

 

  was:
This is my first JIRA issue. Sorry if I do something  wrong in the reporting.

I experienced a performance degradation when running a single Cassandra Docker 
container  inside Kubernetes in comparison with running the Docker container 
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as much as possible the same.  First we got the ycsb 
benchmark in a container that is co-located with the cassandra container in one 
pod.  Kubernetes starts these containers then with network mode 
"net=container:... This is a  separate container that link up the ycsb and 
cassandra containers within the same network space so they can talk via 
localhost – by this we hope to avoid network plugin interference from the CNI 
plugin.

We ran the docker-only container within the Kubernetes node using the default 
bridge network

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
physical laptop with 4 cores/8 logical processors and 16GB RAM on and Openstack 
VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical nodes with 
16 CPU cores. Storage is Ceph.
 * 

[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Attachment: nodetool-compaction-history-docker-cassandra.txt

> Benchmark performance difference between Docker and Kubernetes when running 
> Cassandra:2.2.16 official Docker image
> --
>
> Key: CASSANDRA-15717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Eddy Truyen
>Priority: Normal
> Attachments: nodetool-compaction-history-docker-cassandra.txt, 
> nodetool-compaction-history-kubeadm-cassandra.txt
>
>
> This is my first JIRA issue. Sorry if I do something  wrong in the reporting.
> I experienced a performance degradation when running a single Cassandra 
> Docker container  inside Kubernetes in comparison with running the Docker 
> container stand-alone. I used the following image decomads/cassandra:2.2.16, 
> which uses cassandra:2.2.16 as base image and adds a readinessProbe to it.
> I used identical Docker configuration parameters by ensuring that the output 
> of docker inspect is as much as possible the same.  First we got the ycsb 
> benchmark in a container that is co-located with the cassandra container in 
> one pod.  Kubernetes starts these containers then with network mode 
> "net=container:... This is a  separate container that link up the ycsb and 
> cassandra containers within the same network space so they can talk via 
> localhost – by this we hope to avoid network plugin interference from the CNI 
> plugin.
> We ran the docker-only container within the Kubernetes node using the default 
> bridge network
>  Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
> physical laptop with 4 cores/8 logical processors and 16GB RAM on and 
> Openstack VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical 
> nodes with 16 CPU cores. Storage is Ceph.
>  * A write-only workload (YCSB benchmark workload A - Load phase) using the 
> following user table:
>  cqlsh> create keyspace ycsb
>  WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
>  ;
>  cqlsh> USE ycsb;
>  cqlsh> create table usertable (
>  y_id varchar primary key,
>  field0 varchar,
>  field1 varchar,
>  field2 varchar,
>  field3 varchar,
>  field4 varchar,
>  field5 varchar,
>  field6 varchar,
>  field7 varchar,
>  field8 varchar,
>  field9 varchar);
>  * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
> workloads/workloada -p recordcount=150 -p operationcount=150 -p 
> measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
> cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost 
> > 
> results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
>  sleep 15
> Observations (On Ubuntu-OpenStack)
>  * Docker:
>  ** Mean average  response latency YCSB benchmark: 1,5 ms-1.7ms
>  * Kubernetes
>  ** Mean average response latency YCSB benchmark: 2.7 ms-3ms
>  * CPU usage of the Cassandra Daemon JVM is way lower than Kubernetes (see my 
> position paper: [https://lirias.kuleuven.be/2788169?limo=0)]:  
> Possible  causes:
>  * Network overhead of virtual bridge in container orchestrator is not the 
> cause of the problem in our opinion
>  ** We repeated the experiment where we ran the Docker-Only containers inside 
> a Kubernetes node and we linked the containers using the --net=container: 
> mode mechanisms as similar as possible as we could. The YCSB latency stayed 
> the same.
>  * Disk/io bottleneck: Nodetool tablestats are very similar
>  ** Write latency is very stable over multiple runs
>  *** Kubernetes for ycsb user table: 0.0167 ms.
>  *** Write latency Docker for ycsb usertable: 0.0150 ms.
>  ** Compaction_history/compaction_in_progress is also very similar (as 
> opposed to earlier versions of the issue – sorry for the confusion!)
>  * 
>      
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Attachment: (was: nodetool-compaction-history-docker-cassandra.txt)

> Benchmark performance difference between Docker and Kubernetes when running 
> Cassandra:2.2.16 official Docker image
> --
>
> Key: CASSANDRA-15717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Eddy Truyen
>Priority: Normal
> Attachments: nodetool-compaction-history-docker-cassandra.txt, 
> nodetool-compaction-history-kubeadm-cassandra.txt
>
>
> This is my first JIRA issue. Sorry if I do something  wrong in the reporting.
> I experienced a performance degradation when running a single Cassandra 
> Docker container  inside Kubernetes in comparison with running the Docker 
> container stand-alone. I used the following image decomads/cassandra:2.2.16, 
> which uses cassandra:2.2.16 as base image and adds a readinessProbe to it.
> I used identical Docker configuration parameters by ensuring that the output 
> of docker inspect is as much as possible the same.  First we got the ycsb 
> benchmark in a container that is co-located with the cassandra container in 
> one pod.  Kubernetes starts these containers then with network mode 
> "net=container:... This is a  separate container that link up the ycsb and 
> cassandra containers within the same network space so they can talk via 
> localhost – by this we hope to avoid network plugin interference from the CNI 
> plugin.
> We ran the docker-only container within the Kubernetes node using the default 
> bridge network
>  Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
> physical laptop with 4 cores/8 logical processors and 16GB RAM on and 
> Openstack VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical 
> nodes with 16 CPU cores. Storage is Ceph.
>  * A write-only workload (YCSB benchmark workload A - Load phase) using the 
> following user table:
>  cqlsh> create keyspace ycsb
>  WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
>  ;
>  cqlsh> USE ycsb;
>  cqlsh> create table usertable (
>  y_id varchar primary key,
>  field0 varchar,
>  field1 varchar,
>  field2 varchar,
>  field3 varchar,
>  field4 varchar,
>  field5 varchar,
>  field6 varchar,
>  field7 varchar,
>  field8 varchar,
>  field9 varchar);
>  * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
> workloads/workloada -p recordcount=150 -p operationcount=150 -p 
> measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
> cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost 
> > 
> results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
>  sleep 15
> Observations (On Ubuntu-OpenStack)
>  * Docker:
>  ** Mean average  response latency YCSB benchmark: 1,5 ms-1.7ms
>  * Kubernetes
>  ** Mean average response latency YCSB benchmark: 2.7 ms-3ms
>  * CPU usage of the Cassandra Daemon JVM is way lower than Kubernetes (see my 
> position paper: [https://lirias.kuleuven.be/2788169?limo=0)]:  
> Possible  causes:
>  * Network overhead of virtual bridge in container orchestrator is not the 
> cause of the problem in our opinion
>  ** We repeated the experiment where we ran the Docker-Only containers inside 
> a Kubernetes node and we linked the containers using the --net=container: 
> mode mechanisms as similar as possible as we could. The YCSB latency stayed 
> the same.
>  * Disk/io bottleneck: Nodetool tablestats are very similar
>  ** Write latency is very stable over multiple runs
>  *** Kubernetes for ycsb user table: 0.0167 ms.
>  *** Write latency Docker for ycsb usertable: 0.0150 ms.
>  ** Compaction_history/compaction_in_progress is also very similar (as 
> opposed to earlier versions of the issue – sorry for the confusion!)
>  * 
>      
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Attachment: nodetool-compaction-history-kubeadm-cassandra.txt
nodetool-compaction-history-docker-cassandra.txt

> Benchmark performance difference between Docker and Kubernetes when running 
> Cassandra:2.2.16 official Docker image
> --
>
> Key: CASSANDRA-15717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Eddy Truyen
>Priority: Normal
> Attachments: nodetool-compaction-history-docker-cassandra.txt, 
> nodetool-compaction-history-kubeadm-cassandra.txt
>
>
> This is my first JIRA issue. Sorry if I do something  wrong in the reporting.
> I experienced a performance degradation when running a single Cassandra 
> Docker container  inside Kubernetes in comparison with running the Docker 
> container stand-alone. I used the following image decomads/cassandra:2.2.16, 
> which uses cassandra:2.2.16 as base image and adds a readinessProbe to it.
> I used identical Docker configuration parameters by ensuring that the output 
> of docker inspect is as much as possible the same.  First we got the ycsb 
> benchmark in a container that is co-located with the cassandra container in 
> one pod.  Kubernetes starts these containers then with network mode 
> "net=container:... This is a  separate container that link up the ycsb and 
> cassandra containers within the same network space so they can talk via 
> localhost – by this we hope to avoid network plugin interference from the CNI 
> plugin.
> We ran the docker-only container within the Kubernetes node using the default 
> bridge network
>  Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
> physical laptop with 4 cores/8 logical processors and 16GB RAM on and 
> Openstack VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical 
> nodes with 16 CPU cores. Storage is Ceph.
>  * A write-only workload (YCSB benchmark workload A - Load phase) using the 
> following user table:
>  cqlsh> create keyspace ycsb
>  WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
>  ;
>  cqlsh> USE ycsb;
>  cqlsh> create table usertable (
>  y_id varchar primary key,
>  field0 varchar,
>  field1 varchar,
>  field2 varchar,
>  field3 varchar,
>  field4 varchar,
>  field5 varchar,
>  field6 varchar,
>  field7 varchar,
>  field8 varchar,
>  field9 varchar);
>  * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
> workloads/workloada -p recordcount=150 -p operationcount=150 -p 
> measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
> cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost 
> > 
> results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
>  sleep 15
> Observations (On Ubuntu-OpenStack)
>  * Docker:
>  ** Mean average  response latency YCSB benchmark: 1,5 ms-1.7ms
>  * Kubernetes
>  ** Mean average response latency YCSB benchmark: 2.7 ms-3ms
>  * CPU usage of the Cassandra Daemon JVM is way lower than Kubernetes (see my 
> position paper: [https://lirias.kuleuven.be/2788169?limo=0)]:  
> Possible  causes:
>  * Network overhead of virtual bridge in container orchestrator is not the 
> cause of the problem in our opinion
>  ** We repeated the experiment where we ran the Docker-Only containers inside 
> a Kubernetes node and we linked the containers using the --net=container: 
> mode mechanisms as similar as possible as we could. The YCSB latency stayed 
> the same.
>  * Disk/io bottleneck: Nodetool tablestats are very similar
>  ** Write latency is very stable over multiple runs
>  *** Kubernetes for ycsb user table: 0.0167 ms.
>  *** Write latency Docker for ycsb usertable: 0.0150 ms.
>  ** Compaction_history/compaction_in_progress is also very similar (as 
> opposed to earlier versions of the issue – sorry for the confusion!)
>  * 
>      
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Attachment: (was: kube-adm-cassandra-nodetool-tablestats)

> Benchmark performance difference between Docker and Kubernetes when running 
> Cassandra:2.2.16 official Docker image
> --
>
> Key: CASSANDRA-15717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Eddy Truyen
>Priority: Normal
>
> This is my first JIRA issue. Sorry if I do something  wrong in the reporting.
> I experienced a performance degradation when running a single Cassandra 
> Docker container  inside Kubernetes in comparison with running the Docker 
> container stand-alone. I used the following image decomads/cassandra:2.2.16, 
> which uses cassandra:2.2.16 as base image and adds a readinessProbe to it.
> I used identical Docker configuration parameters by ensuring that the output 
> of docker inspect is as much as possible the same.  First we got the ycsb 
> benchmark in a container that is co-located with the cassandra container in 
> one pod.  Kubernetes starts these containers then with network mode 
> "net=container:... This is a  separate container that link up the ycsb and 
> cassandra containers within the same network space so they can talk via 
> localhost – by this we hope to avoid network plugin interference from the CNI 
> plugin.
> We ran the docker-only container within the Kubernetes node using the default 
> bridge network
>  Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
> physical laptop with 4 cores/8 logical processors and 16GB RAM on and 
> Openstack VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical 
> nodes with 16 CPU cores. Storage is Ceph.
>  * A write-only workload (YCSB benchmark workload A - Load phase) using the 
> following user table:
>  cqlsh> create keyspace ycsb
>  WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
>  ;
>  cqlsh> USE ycsb;
>  cqlsh> create table usertable (
>  y_id varchar primary key,
>  field0 varchar,
>  field1 varchar,
>  field2 varchar,
>  field3 varchar,
>  field4 varchar,
>  field5 varchar,
>  field6 varchar,
>  field7 varchar,
>  field8 varchar,
>  field9 varchar);
>  * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
> workloads/workloada -p recordcount=150 -p operationcount=150 -p 
> measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
> cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost 
> > 
> results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
>  sleep 15
> Observations (On Ubuntu-OpenStack)
>  * Docker:
>  ** Mean average  response latency YCSB benchmark: 1,5 ms-1.7ms
>  * Kubernetes
>  ** Mean average response latency YCSB benchmark: 2.7 ms-3ms
>  * CPU usage of the Cassandra Daemon JVM is way lower than Kubernetes (see my 
> position paper: [https://lirias.kuleuven.be/2788169?limo=0)]:  
> Possible  causes:
>  * Network overhead of virtual bridge in container orchestrator is not the 
> cause of the problem in our opinion
>  ** We repeated the experiment where we ran the Docker-Only containers inside 
> a Kubernetes node and we linked the containers using the --net=container: 
> mode mechanisms as similar as possible as we could. The YCSB latency stayed 
> the same.
>  * Disk/io bottleneck: Nodetool tablestats are very similar
>  ** Write latency is very stable over multiple runs
>  *** Kubernetes for ycsb user table: 0.0167 ms.
>  *** Write latency Docker for ycsb usertable: 0.0150 ms.
>  ** Compaction_history/compaction_in_progress is also very similar (as 
> opposed to earlier versions of the issue – sorry for the confusion!)
>  * 
>      
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Description: 
This is my first JIRA issue. Sorry if I do something  wrong in the reporting.

I experienced a performance degradation when running a single Cassandra Docker 
container  inside Kubernetes in comparison with running the Docker container 
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as much as possible the same.  First we got the ycsb 
benchmark in a container that is co-located with the cassandra container in one 
pod.  Kubernetes starts these containers then with network mode 
"net=container:... This is a  separate container that link up the ycsb and 
cassandra containers within the same network space so they can talk via 
localhost – by this we hope to avoid network plugin interference from the CNI 
plugin.

We ran the docker-only container within the Kubernetes node using the default 
bridge network

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
physical laptop with 4 cores/8 logical processors and 16GB RAM on and Openstack 
VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical nodes with 
16 CPU cores. Storage is Ceph.
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table:
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
 ;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);

 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
 sleep 15

Observations (On Ubuntu-OpenStack)
 * Docker:
 ** Mean average  response latency YCSB benchmark: 1,5 ms-1.7ms
 * Kubernetes
 ** Mean average response latency YCSB benchmark: 2.7 ms-3ms
 * CPU usage of the Cassandra Daemon JVM is way lower than Kubernetes (see my 
position paper: [https://lirias.kuleuven.be/2788169?limo=0)]:  

Possible  causes:
 * Network overhead of virtual bridge in container orchestrator is not the 
cause of the problem in our opinion
 ** We repeated the experiment where we ran the Docker-Only containers inside a 
Kubernetes node and we linked the containers using the --net=container: mode 
mechanisms as similar as possible as we could. The YCSB latency stayed the same.
 * Disk/io bottleneck: Nodetool tablestats are very similar
 ** Write latency is very stable over multiple runs
 *** Kubernetes for ycsb user table: 0.0167 ms.
 *** Write latency Docker for ycsb usertable: 0.0150 ms.
 ** Compaction_history/compaction_in_progress is also very similar (as opposed 
to earlier versions of the issue – sorry for the confusion!)
 * 

     

 

 

 

 

  was:
This is my first JIRA issue. Sorry if I do something  wrong in the reporting.

I experienced a performance degradation when running a single Cassandra Docker 
container  inside Kubernetes in comparison with running the Docker container 
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as much as possible the same.  Docker runs in bridged mode

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
physical laptop with 4 cores/8 logical processors and 16GB RAM on and Openstack 
VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical nodes with 
16 CPU cores. Storage is Ceph.
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table:
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
 ;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);

 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p 

[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Attachment: (was: docker-cassandra-nodetool-tablestats)

> Benchmark performance difference between Docker and Kubernetes when running 
> Cassandra:2.2.16 official Docker image
> --
>
> Key: CASSANDRA-15717
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/benchmark
>Reporter: Eddy Truyen
>Priority: Normal
>
> This is my first JIRA issue. Sorry if I do something  wrong in the reporting.
> I experienced a performance degradation when running a single Cassandra 
> Docker container  inside Kubernetes in comparison with running the Docker 
> container stand-alone. I used the following image decomads/cassandra:2.2.16, 
> which uses cassandra:2.2.16 as base image and adds a readinessProbe to it.
> I used identical Docker configuration parameters by ensuring that the output 
> of docker inspect is as much as possible the same.  First we got the ycsb 
> benchmark in a container that is co-located with the cassandra container in 
> one pod.  Kubernetes starts these containers then with network mode 
> "net=container:... This is a  separate container that link up the ycsb and 
> cassandra containers within the same network space so they can talk via 
> localhost – by this we hope to avoid network plugin interference from the CNI 
> plugin.
> We ran the docker-only container within the Kubernetes node using the default 
> bridge network
>  Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
> physical laptop with 4 cores/8 logical processors and 16GB RAM on and 
> Openstack VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical 
> nodes with 16 CPU cores. Storage is Ceph.
>  * A write-only workload (YCSB benchmark workload A - Load phase) using the 
> following user table:
>  cqlsh> create keyspace ycsb
>  WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
>  ;
>  cqlsh> USE ycsb;
>  cqlsh> create table usertable (
>  y_id varchar primary key,
>  field0 varchar,
>  field1 varchar,
>  field2 varchar,
>  field3 varchar,
>  field4 varchar,
>  field5 varchar,
>  field6 varchar,
>  field7 varchar,
>  field8 varchar,
>  field9 varchar);
>  * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
> workloads/workloada -p recordcount=150 -p operationcount=150 -p 
> measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
> cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost 
> > 
> results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
>  sleep 15
> Observations (On Ubuntu-OpenStack)
>  * Docker:
>  ** Mean average  response latency YCSB benchmark: 1,5 ms-1.7ms
>  * Kubernetes
>  ** Mean average response latency YCSB benchmark: 2.7 ms-3ms
>  * CPU usage of the Cassandra Daemon JVM is way lower than Kubernetes (see my 
> position paper: [https://lirias.kuleuven.be/2788169?limo=0)]:  
> Possible  causes:
>  * Network overhead of virtual bridge in container orchestrator is not the 
> cause of the problem in our opinion
>  ** We repeated the experiment where we ran the Docker-Only containers inside 
> a Kubernetes node and we linked the containers using the --net=container: 
> mode mechanisms as similar as possible as we could. The YCSB latency stayed 
> the same.
>  * Disk/io bottleneck: Nodetool tablestats are very similar
>  ** Write latency is very stable over multiple runs
>  *** Kubernetes for ycsb user table: 0.0167 ms.
>  *** Write latency Docker for ycsb usertable: 0.0150 ms.
>  ** Compaction_history/compaction_in_progress is also very similar (as 
> opposed to earlier versions of the issue – sorry for the confusion!)
>  * 
>      
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15720) Fix the documentation

2020-04-13 Thread nikhil karnik (Jira)
nikhil karnik created CASSANDRA-15720:
-

 Summary: Fix the documentation
 Key: CASSANDRA-15720
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15720
 Project: Cassandra
  Issue Type: Improvement
Reporter: nikhil karnik


Can you fix the documentation at 
[https://github.com/apache/cassandra/blob/trunk/doc/source/new/streaming.rst]?

In the zero copying section, it says "Pre-4.0, during streaming Cassandra 
reifies the SSTables into objects." 

The word "reifies" is hard to understand; as an end user, it would help if this 
could be reworded in simpler English. 

I think it means "the data is serialized and the metadata needs to be rebuilt on 
the receiving node", but a better and simpler explanation would help.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Description: 
This is my first JIRA issue. Sorry if I do something  wrong in the reporting.

I experienced a performance degradation when running a single Cassandra Docker 
container  inside Kubernetes in comparison with running the Docker container 
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as much as possible the same.  Docker runs in bridged mode

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
physical laptop with 4 cores/8 logical processors and 16GB RAM on and Openstack 
VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical nodes with 
16 CPU cores. Storage is Ceph.
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table:
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
 ;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);

 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
 sleep 15

Observations (On Ubuntu-OpenStack)
 * Docker:
 ** Mean average  response latency: 1500 us
 * Kubernetes
 ** Mean average response latency: 2700 us

Observation (On minikube): CPU usage of the CassandraDaemon JVM is 
significantly lower on Kubernetes (Sorry no precise statistics, but at least a 
difference of 10%)

 

 

 

  was:
This is my first JIRA issue. Sorry if I do something  wrong in the reporting.

I experienced a performance degradation when running a single Cassandra Docker 
container  inside Kubernetes in comparison with running the Docker container 
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as much as possible the same.  Docker runs in bridged mode

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
physical laptop with 4 cores/8 logical processors and 16GB RAM on and Openstack 
VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical nodes with 
16 CPU cores. Storage is Ceph.
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table:
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
 ;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);

 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
 sleep 15

Observations (On Ubuntu-OpenStack)
 * Docker:
 ** Mean average  response latency: 1500 us
 * Kubernetes
 ** Mean average response latency: 2700 us
 ** Average CPU usage of cassandra instance (wrt 2 cores):  
 * Nodetool tablestats
 ** There are little difference for the usertable, with an almost identical  
write latency (difference < 0.002 ms).
 ** However for the system keyspace there are quite some differences in 
read/write count and read latency (difference = 2.5 ms). More specifically,  
compaction history (see attachment the 2 tablestats output)
 *** Table: compactions_in_progress Kubernetes
 SSTable count: 2
 Space used (live): 9778
 Space used (total): 9778
 Space used by snapshots (total): 0
 Off heap memory used (total): 88
 SSTable Compression Ratio: 0.86805556
 Number of keys (estimate): 1
 Memtable cell count: 0
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 26
 Local read count: 0
 Local read latency: NaN ms
 Local write count: 26
 Local write latency: NaN ms
 Pending flushes: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 

[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Description: 
This is my first JIRA issue. Sorry if I do something  wrong in the reporting.

I experienced a performance degradation when running a single Cassandra Docker 
container  inside Kubernetes in comparison with running the Docker container 
stand-alone. I used the following image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as much as possible the same.  Docker runs in bridged mode

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on 
physical laptop with 4 cores/8 logical processors and 16GB RAM on and Openstack 
VM Ubuntu 16:04  (4GB, 4 CPU cores, 50GB), that runs on a physical nodes with 
16 CPU cores. Storage is Ceph.
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table:
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
 ;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);

 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
 sleep 15

Observations (On Ubuntu-OpenStack)
 * Docker:
 ** Mean average  response latency: 1500 us
 * Kubernetes
 ** Mean average response latency: 2700 us
 ** Average CPU usage of cassandra instance (wrt 2 cores):  
 * Nodetool tablestats
 ** There are little difference for the usertable, with an almost identical  
write latency (difference < 0.002 ms).
 ** However for the system keyspace there are quite some differences in 
read/write count and read latency (difference = 2.5 ms). More specifically,  
compaction history (see attachment the 2 tablestats output)
 *** Table: compactions_in_progress Kubernetes
 SSTable count: 2
 Space used (live): 9778
 Space used (total): 9778
 Space used by snapshots (total): 0
 Off heap memory used (total): 88
 SSTable Compression Ratio: 0.86805556
 Number of keys (estimate): 1
 Memtable cell count: 0
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 26
 Local read count: 0
 Local read latency: NaN ms
 Local write count: 26
 Local write latency: NaN ms
 Pending flushes: 0
 Bloom filter false positives: 0
 Bloom filter false ratio: 0.0
 Bloom filter space used: 32
 Bloom filter off heap memory used: 16
 Index summary off heap memory used: 56
 Compression metadata off heap memory used: 16
 Compacted partition minimum bytes: 30
 Compacted partition maximum bytes: 310
 Compacted partition mean bytes: 172
 Average live cells per slice (last five minutes): NaN
 Maximum live cells per slice (last five minutes): 0
 Average tombstones per slice (last five minutes): NaN
 Maximum tombstones per slice (last five minutes): 0
 SSTable count: 1
 Space used (live): 8921
 Space used (total): 8921
 Space used by snapshots (total): 0
 Off heap memory used (total): 76
 SSTable Compression Ratio: 0.2603359822955437
 Number of keys (estimate): 25
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 1
 *** 
 * Cassandra Logs:s
 * 
|Docker|Kubernetes|
|ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-compaction' docker-cassandra-logs
 27
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-size' 
docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c ' Writing 
Memtable-sstable' docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-usertable' docker-cassandra-logs
 45
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-compactions_in_progress' docker-cassandra-logs
 26
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-peers' 
docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-schema' docker-cassandra-logs
 24
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-local' 
docker-cassandra-logs
 6|ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-compaction' kubeadm-cassandra-logs \| more
 32
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing Memtable-size' 
kubeadm-cassandra-logs \| more
 7
 

[jira] [Updated] (CASSANDRA-15713) InstanceClassLoader fails to load with the following previously initiated loading for a different type with name "org/w3c/dom/Document"

2020-04-13 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15713:
-
Status: Ready to Commit  (was: Review In Progress)

+1

> InstanceClassLoader fails to load with the following previously initiated 
> loading for a different type with name "org/w3c/dom/Document"
> ---
>
> Key: CASSANDRA-15713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15713
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> java.lang.LinkageError: loader constraint violation: loader (instance of 
> org/apache/cassandra/distributed/shared/InstanceClassLoader) previously 
> initiated loading for a different type with name "org/w3c/dom/Document”
> This is caused when using dtest outside of the normal Cassandra context.  
> There is no API to add more exclusions so unable to work around this.
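
The general shape of the missing API is an exclusion (shared-package) filter consulted by the isolating classloader; a purely hypothetical sketch, with invented names:

{code:java}
import java.util.function.Predicate;

// Illustrative only: a configurable shared-package filter. Callers can add
// package prefixes (e.g. "org.w3c.dom.") whose classes must be loaded by the
// parent loader so both sides see the same Class object. Names are invented.
class SharedClassFilter
{
    private static volatile Predicate<String> shared = name -> name.startsWith("java.");

    static void addSharedPrefix(String prefix)
    {
        Predicate<String> previous = shared;
        shared = name -> previous.test(name) || name.startsWith(prefix);
    }

    static boolean isShared(String className)
    {
        return shared.test(className);
    }
}
{code}

An isolating loader would then check {{SharedClassFilter.isShared(name)}} before defining a class itself, delegating shared names to its parent and thereby avoiding the LinkageError above.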



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15713) InstanceClassLoader fails to load with the following previously initiated loading for a different type with name "org/w3c/dom/Document"

2020-04-13 Thread Jon Meredith (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Meredith updated CASSANDRA-15713:
-
Reviewers: Jon Meredith, Jon Meredith  (was: Jon Meredith)
   Jon Meredith, Jon Meredith
   Status: Review In Progress  (was: Patch Available)

+1, tested this mechanism on CASSANDRA-15714 instead of hard-coding 
{{org.w3c.dom}} and it worked nicely.

> InstanceClassLoader fails to load with the following previously initiated 
> loading for a different type with name "org/w3c/dom/Document"
> ---
>
> Key: CASSANDRA-15713
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15713
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> java.lang.LinkageError: loader constraint violation: loader (instance of 
> org/apache/cassandra/distributed/shared/InstanceClassLoader) previously 
> initiated loading for a different type with name "org/w3c/dom/Document”
> This is caused when using dtest outside of the normal Cassandra context.  
> There is no API to add more exclusions so unable to work around this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15712) Introduce MIDRES config in CircleCI

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15712:
--
Change Category: Quality Assurance
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Introduce MIDRES config in CircleCI
> ---
>
> Key: CASSANDRA-15712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15712
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest, Test/unit
>Reporter: Kevin Gallardo
>Priority: Normal
>
> From document: 
> [https://gist.github.com/newkek/bb79dccbe7d2f5e41b2a3daac3858fde]
> We have identified that the current HIGHRES configuration seems to require 
> resources that might not bring the best cost efficiency to the build.
> We have also identified several "good compromise" configurations that allow 
> us to get decent performance out of the test suites without going all out on 
> the big config.
> It seems it would be useful for a lot of people to have this available as a 
> {{config.yml.MIDRES}} configuration in the {{.circleci}} folder. This way we 
> do not need to argue about modifying the {{HIGHRES}} configuration so as to not 
> impact the people already using it, but can still have easy access to the 
> "compromise" config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15712) Introduce MIDRES config in CircleCI

2020-04-13 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15712?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082521#comment-17082521
 ] 

David Capwell commented on CASSANDRA-15712:
---

Looking at https://circleci.com/pricing/ I think MIDRES would still be for 
paid accounts (the largest free size is Medium)?  The main benefit is 1/2 to 1/4 
of the credits used?  Sounds reasonable.  

* Based on your doc, r=8 for unit tests still has the issue that the current 
tests are not safe to run shared.  For this ticket I think r=1 is best, but it 
would be good to improve our isolation so tests could run concurrently.  
* Python dtest with p=25, I=large seems reasonable; this would be 1/4 of the 
credits used.

Would this be maintained via another diff file?

> Introduce MIDRES config in CircleCI
> ---
>
> Key: CASSANDRA-15712
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15712
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest, Test/unit
>Reporter: Kevin Gallardo
>Priority: Normal
>
> From document: 
> [https://gist.github.com/newkek/bb79dccbe7d2f5e41b2a3daac3858fde]
> We have identified that the current HIGHRES configuration seems to require 
> resources that might not bring the best cost efficiency to the build.
> We have also identified several "good compromise" configurations that allow 
> us to get decent performance out of the test suites without going all out on 
> the big config.
> It seems it would be useful for a lot of people to have this available as a 
> {{config.yml.MIDRES}} configuration in the {{.circleci}} folder. This way we 
> do not need to argue about modifying the {{HIGHRES}} configuration so as to not 
> impact the people already using it, but can still have easy access to the 
> "compromise" config.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15719) The primary range repair should be run on every datacenter

2020-04-13 Thread Jane Deng (Jira)
Jane Deng created CASSANDRA-15719:
-

 Summary: The primary range repair should be run on every datacenter
 Key: CASSANDRA-15719
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15719
 Project: Cassandra
  Issue Type: Bug
  Components: Documentation/Blog
Reporter: Jane Deng


On the Cassandra documentation page 
[http://cassandra.apache.org/doc/latest/operating/repair.html], there is an 
explanation that causes confusion on the users' side:
{quote}The {{-pr}} flag will only repair the “primary” ranges on a node, so you 
can repair your entire cluster by running {{nodetool repair -pr}} on each node 
in a single datacenter.
{quote}
This gives the impression that "running nodetool repair -pr on each node in a single 
DC" is sufficient to do a full repair on a multi-DC cluster.

The correct guidance for primary range repair should be:
{quote}if you are using “nodetool repair -pr” you must run it on *EVERY* node 
in *EVERY* data center, no skipping allowed.
{quote}
Please make a doc fix. Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15714) Support in cassandra-in-jvm-dtest-api for replacing logback with alternate logger

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15714:
--
Reviewers: David Capwell  (was: David Capwell)
   Status: Review In Progress  (was: Patch Available)

Left a small comment on GitHub.

Overall I am OK with this.

>  Support in cassandra-in-jvm-dtest-api for replacing logback with alternate 
> logger
> --
>
> Key: CASSANDRA-15714
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15714
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Jon Meredith
>Assignee: Jon Meredith
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Not all forks use logback, and there is a (prematurely) closed ticket, 
> CASSANDRA-13212, indicating that this would be valuable.
>  
> Add support for making the log file configuration property and log file 
> pathname configurable rather than hard-coding them to logback.
>  
> Also had to add 'org.w3c.dom' to the InstanceClassLoader so that log4j2 could 
> load its configuration, but it looks like that can be handled with the changes 
> in CASSANDRA-15713
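
A minimal sketch of the kind of indirection described above, assuming the config 
property name and file path are looked up from system properties; the property 
names and default path below are purely illustrative assumptions, not the actual 
cassandra-in-jvm-dtest-api:

{code:java}
// Sketch only: resolve the logging configuration property name and config file
// path from system properties, defaulting to logback, instead of hard-coding
// them. The "cassandra.test.*" property names used here are hypothetical.
public final class InstanceLoggingOptions
{
    public static String logConfigProperty()
    {
        return System.getProperty("cassandra.test.logConfigProperty",
                                  "logback.configurationFile");
    }

    public static String logConfigPath()
    {
        return System.getProperty("cassandra.test.logConfigPath",
                                  "test/conf/logback-dtest.xml");
    }

    // Applied per instance before logging initialises, so a fork can point these
    // at log4j2 (or any other backend) instead of logback.
    public static void apply()
    {
        System.setProperty(logConfigProperty(), logConfigPath());
    }
}
{code}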



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-04-13 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-15338:
--
Reviewers: Andres de la Peña

> Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
> ---
>
> Key: CASSANDRA-15338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15338
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: CASS-15338-Docker.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example failure: 
> [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1]
>   
> {code:java}
> Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest):  FAILED
>  expected:<0> but was:<1>
>  junit.framework.AssertionFailedError: expected:<0> but was:<1>
>    at 
> org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625)
>    at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258)
>    at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231)
>    at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code}
>   
>  Looking closer at 
> org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that 
> the run method is called before 
> org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to 
> a test race condition where the CountDownLatch completes before executing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15718) Improve BatchMetricsTest

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15718:
--
Test and Documentation Plan: additional unit tests
 Status: Patch Available  (was: Open)

> Improve BatchMetricsTest 
> -
>
> Key: CASSANDRA-15718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15718
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> As noted in CASSANDRA-15582 {{BatchMetricsTest}} should test 
> {{BatchStatement.Type.COUNTER}} to cover all the {{BatchMetrics}}.  Some 
> changes were introduced to make this improvement at:
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics
> and the following suggestions were made in review (in addition to the 
> suggestion that a separate JIRA be created for this change) by [~dcapwell]:
> {quote}
> * I like the usage of BatchStatement.Type for the tests
> * honestly feel quick theories is better than random, but glad you added the 
> seed to all asserts =). Would still be better as a quick theories test since 
> you basically wrote a property anyways!
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R131
>  feel you should rename to expectedPartitionsPerLoggedBatch 
> {Count,Logged,Unlogged}
> * . pre is what the value is, post is what the value is expected to be 
> (rather than what it is).
> * 
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R150
>  this doesn't look correct. the batch has distinctPartitions mutations, so 
> shouldn't max reflect that? I ran the current test in a debugger and see that 
> that is the case (aka current test is wrong).
> most of the comments are nit picks, but the last one looks like a test bug to 
> me
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15718) Improve BatchMetricsTest

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15718:
--
Reviewers: David Capwell  (was: David Capwell)
   Status: Review In Progress  (was: Patch Available)

> Improve BatchMetricsTest 
> -
>
> Key: CASSANDRA-15718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15718
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> As noted in CASSANDRA-15582 {{BatchMetricsTest}} should test 
> {{BatchStatement.Type.COUNTER}} to cover all the {{BatchMetrics}}.  Some 
> changes were introduced to make this improvement at:
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics
> and the following suggestions were made in review (in addition to the 
> suggestion that a separate JIRA be created for this change) by [~dcapwell]:
> {quote}
> * I like the usage of BatchStatement.Type for the tests
> * honestly feel quick theories is better than random, but glad you added the 
> seed to all asserts =). Would still be better as a quick theories test since 
> you basically wrote a property anyways!
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R131
>  feel you should rename to expectedPartitionsPerLoggedBatch 
> {Count,Logged,Unlogged}
> * . pre is what the value is, post is what the value is expected to be 
> (rather than what it is).
> * 
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R150
>  this doesn't look correct. the batch has distinctPartitions mutations, so 
> shouldn't max reflect that? I ran the current test in a debugger and see that 
> that is the case (aka current test is wrong).
> most of the comments are nit picks, but the last one looks like a test bug to 
> me
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-04-13 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082478#comment-17082478
 ] 

David Capwell commented on CASSANDRA-15338:
---

Sounds great, thanks!

> Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
> ---
>
> Key: CASSANDRA-15338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15338
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: CASS-15338-Docker.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example failure: 
> [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1]
>   
> {code:java}
> Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest):  FAILED
>  expected:<0> but was:<1>
>  junit.framework.AssertionFailedError: expected:<0> but was:<1>
>    at 
> org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625)
>    at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258)
>    at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231)
>    at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code}
>   
>  Looking closer at 
> org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that 
> the run method is called before 
> org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to 
> a test race condition where the CountDownLatch completes before executing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15718) Improve BatchMetricsTest

2020-04-13 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082475#comment-17082475
 ] 

David Capwell commented on CASSANDRA-15718:
---

Count shows that the histograms are updated at the expected frequency, but it 
doesn't show that the recorded metric values are correct.  Testing the actual 
histogram data would show whether the updates are correct, so I do feel that we 
want histogram data checks as well.

Now, do I think we need to test whether the histograms from the metrics library 
are correct?  No, only our usage of them.  To be clear, since the test provides 
fixed values, the min/max are well known and should be enough to test.
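
As a rough sketch of the kind of histogram data check meant here, assuming a 
Dropwizard {{Histogram}} like the ones backing {{BatchMetrics}}; the class name 
and values below are illustrative, not the actual test code:

{code:java}
import com.codahale.metrics.ExponentiallyDecayingReservoir;
import com.codahale.metrics.Histogram;
import com.codahale.metrics.Snapshot;

import static org.junit.Assert.assertEquals;

public class HistogramDataCheckSketch
{
    public static void main(String[] args)
    {
        Histogram partitionsPerLoggedBatch = new Histogram(new ExponentiallyDecayingReservoir());

        // The test drives the batch with a fixed, well-known number of partitions...
        int distinctPartitions = 7;
        partitionsPerLoggedBatch.update(distinctPartitions);

        // ...so besides the count, the snapshot's min/max can be asserted exactly.
        assertEquals(1, partitionsPerLoggedBatch.getCount());

        Snapshot snapshot = partitionsPerLoggedBatch.getSnapshot();
        assertEquals(distinctPartitions, snapshot.getMin());
        assertEquals(distinctPartitions, snapshot.getMax());
    }
}
{code}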

> Improve BatchMetricsTest 
> -
>
> Key: CASSANDRA-15718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15718
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> As noted in CASSANDRA-15582 {{BatchMetricsTest}} should test 
> {{BatchStatement.Type.COUNTER}} to cover all the {{BatchMetrics}}.  Some 
> changes were introduced to make this improvement at:
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics
> and the following suggestions were made in review (in addition to the 
> suggestion that a separate JIRA be created for this change) by [~dcapwell]:
> {quote}
> * I like the usage of BatchStatement.Type for the tests
> * honestly feel quick theories is better than random, but glad you added the 
> seed to all asserts =). Would still be better as a quick theories test since 
> you basically wrote a property anyways!
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R131
>  feel you should rename to expectedPartitionsPerLoggedBatch 
> {Count,Logged,Unlogged}
> * . pre is what the value is, post is what the value is expected to be 
> (rather than what it is).
> * 
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R150
>  this doesn't look correct. the batch has distinctPartitions mutations, so 
> shouldn't max reflect that? I ran the current test in a debugger and see that 
> that is the case (aka current test is wrong).
> most of the comments are nit picks, but the last one looks like a test bug to 
> me
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15718) Improve BatchMetricsTest

2020-04-13 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15718:
--
Change Category: Quality Assurance
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Improve BatchMetricsTest 
> -
>
> Key: CASSANDRA-15718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15718
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> As noted in CASSANDRA-15582 {{BatchMetricsTest}} should test 
> {{BatchStatement.Type.COUNTER}} to cover all the {{BatchMetrics}}.  Some 
> changes were introduced to make this improvement at:
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics
> and the following suggestions were made in review (in addition to the 
> suggestion that a separate JIRA be created for this change) by [~dcapwell]:
> {quote}
> * I like the usage of BatchStatement.Type for the tests
> * honestly feel quick theories is better than random, but glad you added the 
> seed to all asserts =). Would still be better as a quick theories test since 
> you basically wrote a property anyways!
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R131
>  feel you should rename to expectedPartitionsPerLoggedBatch 
> {Count,Logged,Unlogged}
> * . pre is what the value is, post is what the value is expected to be 
> (rather than what it is).
> * 
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R150
>  this doesn't look correct. the batch has distinctPartitions mutations, so 
> shouldn't max reflect that? I ran the current test in a debugger and see that 
> that is the case (aka current test is wrong).
> most of the comments are nit picks, but the last one looks like a test bug to 
> me
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15718) Improve BatchMetricsTest

2020-04-13 Thread Stephen Mallette (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082376#comment-17082376
 ] 

Stephen Mallette commented on CASSANDRA-15718:
--

[~dcapwell] I've started making adjustments to use QuickTheories and, 
interestingly, the histogram assertion fails, confirming your point that the 
"current test is wrong". I started thinking about those assertions:

{code}
assertTrue(assertionMessage,partitionsPerLoggedBatchCountPre <= 
metrics.partitionsPerLoggedBatch.getSnapshot().getMax());
assertTrue(assertionMessage,partitionsPerUnloggedBatchCountPre <= 
metrics.partitionsPerUnloggedBatch.getSnapshot().getMax());
assertTrue(assertionMessage,partitionsPerCounterBatchCountPre <= 
metrics.partitionsPerCounterBatch.getSnapshot().getMax());
{code} 

and started to wonder whether the histograms need to be asserted at all. I 
mostly just copied the semantics of the old tests and didn't really stop to 
consider them. In your view, does asserting the histogram serve any purpose 
beyond what is already being asserted through the direct expectations on the 
metrics:

{code}
assertEquals(assertionMessage,partitionsPerUnloggedBatchCountPost, 
metrics.partitionsPerUnloggedBatch.getCount());
assertEquals(assertionMessage, partitionsPerLoggedBatchCountPost, 
metrics.partitionsPerLoggedBatch.getCount());
assertEquals(assertionMessage, partitionsPerCounterBatchCountPost, 
metrics.partitionsPerCounterBatch.getCount());
{code}

> Improve BatchMetricsTest 
> -
>
> Key: CASSANDRA-15718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15718
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> As noted in CASSANDRA-15582 {{BatchMetricsTest}} should test 
> {{BatchStatement.Type.COUNTER}} to cover all the {{BatchMetrics}}.  Some 
> changes were introduced to make this improvement at:
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics
> and the following suggestions were made in review (in addition to the 
> suggestion that a separate JIRA be created for this change) by [~dcapwell]:
> {quote}
> * I like the usage of BatchStatement.Type for the tests
> * honestly feel quick theories is better than random, but glad you added the 
> seed to all asserts =). Would still be better as a quick theories test since 
> you basically wrote a property anyways!
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R131
>  feel you should rename to expectedPartitionsPerLoggedBatch 
> {Count,Logged,Unlogged}
> * . pre is what the value is, post is what the value is expected to be 
> (rather than what it is).
> * 
> * 
> https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R150
>  this doesn't look correct. the batch has distinctPartitions mutations, so 
> shouldn't max reflect that? I ran the current test in a debugger and see that 
> that is the case (aka current test is wrong).
> most of the comments are nit picks, but the last one looks like a test bug to 
> me
> {quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15582) 4.0 quality testing: metrics

2020-04-13 Thread Stephen Mallette (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082367#comment-17082367
 ] 

Stephen Mallette commented on CASSANDRA-15582:
--

I've added CASSANDRA-15718 to continue the review of the {{BatchMetricsTest}}

> 4.0 quality testing: metrics
> 
>
> Key: CASSANDRA-15582
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15582
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Romain Hardouin
>Priority: Normal
> Fix For: 4.0-rc
>
> Attachments: Screen Shot 2020-04-07 at 5.47.17 PM.png
>
>
> In past releases we've unknowingly broken metrics integrations and introduced 
> performance regressions in metrics collection and reporting. We strive in 4.0 
> to not do that. Metrics should work well!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15718) Improve BatchMetricsTest

2020-04-13 Thread Stephen Mallette (Jira)
Stephen Mallette created CASSANDRA-15718:


 Summary: Improve BatchMetricsTest 
 Key: CASSANDRA-15718
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15718
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/unit
Reporter: Stephen Mallette
Assignee: Stephen Mallette


As noted in CASSANDRA-15582 {{BatchMetricsTest}} should test 
{{BatchStatement.Type.COUNTER}} to cover all the {{BatchMetrics}}.  Some 
changes were introduced to make this improvement at:

https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics

and the following suggestions were made in review (in addition to the 
suggestion that a separate JIRA be created for this change) by [~dcapwell]:

{quote}
* I like the usage of BatchStatement.Type for the tests
* honestly feel quick theories is better than random, but glad you added the 
seed to all asserts =). Would still be better as a quick theories test since 
you basically wrote a property anyways!
* 
https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R131
 feel you should rename to expectedPartitionsPerLoggedBatch 
{Count,Logged,Unlogged}
* . pre is what the value is, post is what the value is expected to be (rather 
than what it is).
* 
* 
https://github.com/apache/cassandra/compare/trunk...spmallette:CASSANDRA-15582-trunk-batchmetrics#diff-8948cec1f9d33f10b15c38de80141548R150
 this doesn't look correct. the batch has distinctPartitions mutations, so 
shouldn't max reflect that? I ran the current test in a debugger and see that 
that is the case (aka current test is wrong).
most of the comments are nit picks, but the last one looks like a test bug to me
{quote}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15338) Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest

2020-04-13 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082227#comment-17082227
 ] 

Andres de la Peña commented on CASSANDRA-15338:
---

[~dcapwell] I can start reviewing this one tomorrow, if it's ok with you

> Fix flakey testMessagePurging - org.apache.cassandra.net.ConnectionTest
> ---
>
> Key: CASSANDRA-15338
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15338
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-alpha
>
> Attachments: CASS-15338-Docker.zip
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Example failure: 
> [https://circleci.com/gh/dcapwell/cassandra/11#artifacts/containers/1]
>   
> {code:java}
> Testcase: testMessagePurging(org.apache.cassandra.net.ConnectionTest):  FAILED
>  expected:<0> but was:<1>
>  junit.framework.AssertionFailedError: expected:<0> but was:<1>
>    at 
> org.apache.cassandra.net.ConnectionTest.lambda$testMessagePurging$38(ConnectionTest.java:625)
>    at 
> org.apache.cassandra.net.ConnectionTest.doTestManual(ConnectionTest.java:258)
>    at 
> org.apache.cassandra.net.ConnectionTest.testManual(ConnectionTest.java:231)
>    at 
> org.apache.cassandra.net.ConnectionTest.testMessagePurging(ConnectionTest.java:584){code}
>   
>  Looking closer at 
> org.apache.cassandra.net.OutboundConnection.Delivery#stopAndRun it seems that 
> the run method is called before 
> org.apache.cassandra.net.OutboundConnection.Delivery#doRun which may lead to 
> a test race condition where the CountDownLatch completes before executing



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Description: 
This is my first JIRA issue. Sorry if I get something wrong in the report.

I experienced a performance degradation when running a single Cassandra Docker 
container inside Kubernetes compared with running the same Docker container 
stand-alone. I used the image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as its base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as similar as possible.  Docker runs in bridged mode.

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on a 
physical laptop with 4 cores/8 logical processors and 16GB RAM, and on an 
OpenStack Ubuntu 16.04 VM (4GB, 4 CPU cores, 50GB) that runs on a physical node 
with 16 CPU cores). Storage is Ceph.
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table:
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
 ;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);

 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
 sleep 15

Observations (On Ubuntu-OpenStack)
 * Docker:
 ** Mean average  response latency: 1500 us
 ** Average CPU usage of cassandra instances (wrt 2 cores): 42
 * Kubernetes
 ** Mean average response latency: 2700 us
 ** Average CPU usage of cassandra instance (wrt 2 cores): 32%
 * Nodetool tablestats
 ** There is little difference for the usertable, with an almost identical 
write latency (difference < 0.002 ms).
 ** However, for the system keyspace there are quite some differences in 
read/write count and read latency (difference = 2.5 ms), more specifically in 
compaction history (see the two attached tablestats outputs)
 *** Table: compaction_history Kubernetes
 SSTable count: 1
 Space used (live): 12049
 Space used (total): 12049
 Space used by snapshots (total): 0
 Off heap memory used (total): 108
 SSTable Compression Ratio: 0.25466231166368136
 Number of keys (estimate): 54 
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 12
 *** Table: compaction_history Docker
 SSTable count: 1
 Space used (live): 8921
 Space used (total): 8921
 Space used by snapshots (total): 0
 Off heap memory used (total): 76
 SSTable Compression Ratio: 0.2603359822955437
 Number of keys (estimate): 25
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 1
 * Cassandra Logs:
 * 
|Docker|Kubernetes|
|ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-compaction' docker-cassandra-logs
 27
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-size' 
docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c ' Writing 
Memtable-sstable' docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-usertable' docker-cassandra-logs
 45
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-compactions_in_progress' docker-cassandra-logs
 26
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-peers' 
docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-schema' docker-cassandra-logs
 24
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-local' 
docker-cassandra-logs
 6|ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-compaction' kubeadm-cassandra-logs \| more
 32
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing Memtable-size' 
kubeadm-cassandra-logs \| more
 7
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  ' Writing 
Memtable-sstable' kubeadm-cassandra-logs \| more
 7
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-usertable' kubeadm-cassandra-logs \| more
 45
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-compactions_in_progress' kubeadm-cassandra-logs \| more
 26
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing Memtable-peers' 
kubeadm-cassandra-logs \| more
 2
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing Memtable-schema' 
kubeadm-cassandra-logs
 17
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep 

[jira] [Updated] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eddy Truyen updated CASSANDRA-15717:

Description: 
This is my first JIRA issue. Sorry if I get something wrong in the report.

I experienced a performance degradation when running a single Cassandra Docker 
container inside Kubernetes compared with running the same Docker container 
stand-alone. I used the image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as its base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as similar as possible.  Docker runs in bridged mode.

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on a 
physical laptop with 4 cores/8 logical processors and 16GB RAM, and on an 
OpenStack Ubuntu 16.04 VM (4GB, 4 CPU cores, 50GB) that runs on a physical node 
with 16 CPU cores). Storage is Ceph.
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table:
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 1 }
 ;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);

 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
 sleep 15

Observations:
 * Docker:
 ** Mean average  response latency: 1500
 ** Average CPU usage of cassandra instances (wrt 2 cores): 42
 * Kubernetes
 ** Mean average response latency: 2700
 ** Average CPU usage of cassandra instance (wrt 2 cores): 32%
 * Nodetool tablestats
 ** There is little difference for the usertable, with an almost identical 
write latency (difference < 0.002 ms).
 ** However, for the system keyspace there are quite some differences in 
read/write count and read latency (difference = 2.5 ms), more specifically in 
compaction history (see the two attached tablestats outputs)
 *** Table: compaction_history Kubernetes
 SSTable count: 1
 Space used (live): 12049
 Space used (total): 12049
 Space used by snapshots (total): 0
 Off heap memory used (total): 108
 SSTable Compression Ratio: 0.25466231166368136
 Number of keys (estimate): 54 
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 12
 *** Table: compaction_history Docker
 SSTable count: 1
 Space used (live): 8921
 Space used (total): 8921
 Space used by snapshots (total): 0
 Off heap memory used (total): 76
 SSTable Compression Ratio: 0.2603359822955437
 Number of keys (estimate): 25
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 1
 * Cassandra Logs:
 * 
|Docker|Kubernetes|
|ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-compaction' docker-cassandra-logs
 27
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-size' 
docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c ' Writing 
Memtable-sstable' docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-usertable' docker-cassandra-logs
 45
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-compactions_in_progress' docker-cassandra-logs
 26
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-peers' 
docker-cassandra-logs
 1
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-schema' docker-cassandra-logs
 24
 ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-local' 
docker-cassandra-logs
 6|ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-compaction' kubeadm-cassandra-logs \| more
 32
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing Memtable-size' 
kubeadm-cassandra-logs \| more
 7
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  ' Writing 
Memtable-sstable' kubeadm-cassandra-logs \| more
 7
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-usertable' kubeadm-cassandra-logs \| more
 45
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-compactions_in_progress' kubeadm-cassandra-logs \| more
 26
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing Memtable-peers' 
kubeadm-cassandra-logs \| more
 2
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing Memtable-schema' 
kubeadm-cassandra-logs
 17
 ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c 'Writing Memtable-local' 

[jira] [Created] (CASSANDRA-15717) Benchmark performance difference between Docker and Kubernetes when running Cassandra:2.2.16 official Docker image

2020-04-13 Thread Eddy Truyen (Jira)
Eddy Truyen created CASSANDRA-15717:
---

 Summary: Benchmark performance difference between Docker and 
Kubernetes when running Cassandra:2.2.16 official Docker image
 Key: CASSANDRA-15717
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15717
 Project: Cassandra
  Issue Type: Bug
  Components: Test/benchmark
Reporter: Eddy Truyen
 Attachments: docker-cassandra-nodetool-tablestats, 
kube-adm-cassandra-nodetool-tablestats

This is my first JIRA issue. Sorry if I get something wrong in the report.

I experienced a performance degradation when running a single Cassandra Docker 
container inside Kubernetes compared with running the same Docker container 
stand-alone. I used the image decomads/cassandra:2.2.16, which uses 
cassandra:2.2.16 as its base image and adds a readinessProbe to it.

I used identical Docker configuration parameters by ensuring that the output of 
docker inspect is as similar as possible.  Docker runs in bridged mode.

 Experiment (repeated on minikube+VirtualBox (12GB, 4 CPU cores, 30 GB) on a 
physical laptop with 4 cores/8 logical processors and 16GB RAM, and on an 
OpenStack Ubuntu 16.04 VM (4GB, 4 CPU cores, 50GB) that runs on a physical node 
with 16 CPU cores). Storage is Ceph. 
 * A write-only workload (YCSB benchmark workload A - Load phase) using the 
following user table: 
 cqlsh> create keyspace ycsb
 WITH REPLICATION = \{'class' : 'SimpleStrategy', 'replication_factor': 3 }
;
 cqlsh> USE ycsb;
 cqlsh> create table usertable (
 y_id varchar primary key,
 field0 varchar,
 field1 varchar,
 field2 varchar,
 field3 varchar,
 field4 varchar,
 field5 varchar,
 field6 varchar,
 field7 varchar,
 field8 varchar,
 field9 varchar);



 * And using the following script: python ./bin/ycsb load cassandra2-cql -P 
workloads/workloada -p recordcount=150 -p operationcount=150 -p 
measurementtype=raw -p cassandra.connecttimeoutmillis=6 -p 
cassandra.readtimeoutmillis=6 -target 1500 -threads 20 -p hosts=localhost > 
results/cassandra-docker/cassandra-docker-load-workloada-1-records-150-rnd-1762034446.txt
sleep 15

Observations:
 * Docker:
 ** Mean average  response latency: 1500
 ** Average CPU usage of cassandra instances (wrt 2 cores): 42
 * Kubernetes
 ** Mean average response latency: 2700
 ** Average CPU usage of cassandra instance (wrt 2 cores): 32%
 * Nodetool tablestats
 ** There is little difference for the usertable, with an almost identical 
write latency (difference < 0.002 ms).
 ** However, for the system keyspace there are quite some differences in 
read/write count and read latency (difference = 2.5 ms), more specifically in 
compaction history (see the two attached tablestats outputs)
 *** Table: compaction_history Kubernetes
 SSTable count: 1
 Space used (live): 12049
 Space used (total): 12049
 Space used by snapshots (total): 0
 Off heap memory used (total): 108
 SSTable Compression Ratio: 0.25466231166368136
 Number of keys (estimate): 54 
 Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 12
 *** Table: compaction_history Docker
 SSTable count: 1
 Space used (live): 8921
 Space used (total): 8921
 Space used by snapshots (total): 0
 Off heap memory used (total): 76
 SSTable Compression Ratio: 0.2603359822955437
 Number of keys (estimate): 25
Memtable data size: 0
 Memtable off heap memory used: 0
 Memtable switch count: 1
 * Cassandra Logs:
 * 
|Docker|Kubernetes|
|ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-compaction' docker-cassandra-logs
27
ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-size' 
docker-cassandra-logs
1
ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c ' Writing 
Memtable-sstable' docker-cassandra-logs
1
ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-usertable' docker-cassandra-logs
45
ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing 
Memtable-compactions_in_progress' docker-cassandra-logs
26
ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-peers' 
docker-cassandra-logs
1
ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-schema' 
docker-cassandra-logs
24
ubuntu@k8-test-2:/data/ycsb/cassandra-docker$ grep -c 'Writing Memtable-local' 
docker-cassandra-logs
6|ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-compaction' kubeadm-cassandra-logs \| more
32
ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing Memtable-size' 
kubeadm-cassandra-logs \| more
7
ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  ' Writing 
Memtable-sstable' kubeadm-cassandra-logs \| more
7
ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-usertable' kubeadm-cassandra-logs \| more
45
ubuntu@k8-test-2:/data/ycsb/cassandra-kube$ grep -c  'Writing 
Memtable-compactions_in_progress' kubeadm-cassandra-logs 

[jira] [Commented] (CASSANDRA-15229) BufferPool Regression

2020-04-13 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082202#comment-17082202
 ] 

Benedict Elliott Smith commented on CASSANDRA-15229:


bq. In networking, most of the time, buffer will be release immediately after 
allocation and with recycleWhenFree=false, fully freed chunk will be reused 
instead of being recycled to global list. Partial-recycle is unlikely affect 
networking usage. I am happy to test it..

It is famously difficult to prove a negative, particularly via external 
testing.  It will be untrue in some circumstances, most notably large message 
processing (which happens asynchronously).  I would need to review the buffer 
control flow in messaging to confirm it is sufficiently low risk to modify the 
behaviour here, so I would prefer we not modify it in a way that is not easily 
verified.

bq. will it create fragmentation in system direct memory?

Not easily completely ruled out, but given this data will be allocated mostly 
in its own virtual page space (given all allocations are much larger than a 
normal page), it hopefully shouldn't be an insurmountable problem for most 
allocators given the availability of almost unlimited virtual page space on 
modern systems.

bq. I tested with "Bytebuffer#allocateDirect" and "Unsafe#allocateMemory", both 
latencies are slightly worse than baseline.

Did you perform the simple optimisation of rounding up to the >= 2KiB boundary 
(for equivalent behaviour), then re-using any buffer that is correctly sized 
when evicting to make room for a new item?  It might well be possible to make 
this yet more efficient than {{BufferPool}} by reducing this boundary to e.g. 
1KiB, or perhaps as little as 512B.

So if I were doing this myself, I think I would be starting at this point and 
if necessary would move towards further reusing the buffers we already have in 
the cache - since it is already a pool of them.  I would just be looking to 
smooth out the random distribution of sizes used with e.g. a handful of queues 
each containing a single size of buffer and at most a handful of items each.  
This feels like a simpler solution to me, particularly as it does not affect 
any other pool users.

However, I’m not doing the work (nor maybe reviewing it), so if you are willing 
to at least enable the behaviour only for the ChunkCache so this change cannot 
have any unintended negative effect for those users not expected to benefit, my 
main concern will be alleviated.
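
To make that suggestion concrete, a minimal sketch of the approach follows, 
assuming the cache rounds requests up to a fixed boundary and keeps a few 
evicted buffers per size for reuse; the class, sizes and bounds are illustrative 
assumptions, not the actual ChunkCache code:

{code:java}
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Sketch only: per-size queues of reusable buffers, each holding at most a
// handful of items, with requested sizes rounded up to a 2KiB boundary.
public final class SizeBucketedBufferCache
{
    private static final int BOUNDARY = 2048;     // round up to >= 2KiB
    private static final int MAX_PER_BUCKET = 4;  // "at most a handful of items each"

    private final ConcurrentMap<Integer, ArrayDeque<ByteBuffer>> free = new ConcurrentHashMap<>();

    public ByteBuffer allocate(int size)
    {
        int rounded = roundUp(size);
        ArrayDeque<ByteBuffer> queue = free.get(rounded);
        if (queue != null)
        {
            ByteBuffer reused;
            synchronized (queue)
            {
                reused = queue.pollFirst();
            }
            if (reused != null)
            {
                reused.clear();
                return reused;
            }
        }
        return ByteBuffer.allocateDirect(rounded);
    }

    // Called when an item is evicted to make room: keep the correctly sized
    // buffer around for the next allocation of the same rounded size.
    public void release(ByteBuffer buffer)
    {
        ArrayDeque<ByteBuffer> queue = free.computeIfAbsent(buffer.capacity(), k -> new ArrayDeque<>());
        synchronized (queue)
        {
            if (queue.size() < MAX_PER_BUCKET)
                queue.addLast(buffer);
            // otherwise drop it and let the GC reclaim the direct buffer
        }
    }

    private static int roundUp(int size)
    {
        return (size + BOUNDARY - 1) / BOUNDARY * BOUNDARY;
    }
}
{code}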


> BufferPool Regression
> -
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15229) BufferPool Regression

2020-04-13 Thread ZhaoYang (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17082081#comment-17082081
 ] 

ZhaoYang commented on CASSANDRA-15229:
--

{quote}
Recirculating immediately will lead to greater inefficiency in allocation, as 
we will attempt to reuse partially freed chunks in preference to entirely freed 
chunks, leading to a great deal more churn in the active blocks. This will 
affect the networking pooling as much as the chunk cache.
{quote}

In networking, most of the time a buffer will be released immediately after 
allocation, and with {{recycleWhenFree=false}} a fully freed chunk will be 
reused instead of being recycled to the global list. Partial recycling is 
unlikely to affect networking usage. I am happy to test it.

{quote}
 At the very least this behaviour should be enabled only for the ChunkCache, 
but ideally might have e.g. two queues, one with guaranteed-free chunks, 
another (perhaps for ease a superset) containing those chunks that might or 
mightn't be free.
{quote}

It's a good idea to have a separate queue and give partially freed chunks lower 
priority than fully freed chunks. That way a partially freed chunk will likely 
have more free space by the time it is reused, compared to reusing it 
immediately.

{quote}if using Unsafe.allocateMemory wouldn't be simpler, more efficient, less 
risky and produce less fragmentation.
{quote}

It is simpler, but not as efficient. Without slab allocation, will it create 
fragmentation in system direct memory?

I tested with "ByteBuffer#allocateDirect" and "Unsafe#allocateMemory"; both 
latencies are slightly worse than the baseline.

Btw, I think it'd be nice to add a new metric to track direct ByteBuffer 
allocations outside of the buffer pool, because they may be held by the chunk 
cache for a long time.
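
A rough sketch of such a metric, assuming we simply count the bytes allocated 
outside the pool; the class and method names below are illustrative, not an 
existing Cassandra metric:

{code:java}
import java.nio.ByteBuffer;
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: track direct ByteBuffer allocations that bypass the BufferPool,
// e.g. when the chunk cache falls back to ByteBuffer.allocateDirect.
public final class OffPoolAllocationTracker
{
    private static final AtomicLong BYTES_ALLOCATED_OFF_POOL = new AtomicLong();

    public static ByteBuffer allocateDirectTracked(int size)
    {
        BYTES_ALLOCATED_OFF_POOL.addAndGet(size);
        return ByteBuffer.allocateDirect(size);
    }

    // Exposed so a Gauge (or similar) can report the running total.
    public static long offPoolBytes()
    {
        return BYTES_ALLOCATED_OFF_POOL.get();
    }
}
{code}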

Chunk cache with 
[Bytebuffer.allocateDirect|https://github.com/jasonstack/cassandra/commit/c3f286c1148d13f00364872413733822a4a2c475]:
 !15229-direct.png|width=600,height=400!

Chunk cache with 
[Unsafe.allocateMemory|https://github.com/jasonstack/cassandra/commit/3dadd884ff0d8e19d3dd46a07a290762755df312]:
 !15229-unsafe.png|width=600,height=400!

> BufferPool Regression
> -
>
> Key: CASSANDRA-15229
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15229
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Caching
>Reporter: Benedict Elliott Smith
>Assignee: ZhaoYang
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
> Attachments: 15229-count.png, 15229-direct.png, 15229-hit-rate.png, 
> 15229-recirculate-count.png, 15229-recirculate-hit-rate.png, 
> 15229-recirculate-size.png, 15229-recirculate.png, 15229-size.png, 
> 15229-unsafe.png
>
>
> The BufferPool was never intended to be used for a {{ChunkCache}}, and we 
> need to either change our behaviour to handle uncorrelated lifetimes or use 
> something else.  This is particularly important with the default chunk size 
> for compressed sstables being reduced.  If we address the problem, we should 
> also utilise the BufferPool for native transport connections like we do for 
> internode messaging, and reduce the number of pooling solutions we employ.
> Probably the best thing to do is to improve BufferPool’s behaviour when used 
> for things with uncorrelated lifetimes, which essentially boils down to 
> tracking those chunks that have not been freed and re-circulating them when 
> we run out of completely free blocks.  We should probably also permit 
> instantiating separate {{BufferPool}}, so that we can insulate internode 
> messaging from the {{ChunkCache}}, or at least have separate memory bounds 
> for each, and only share fully-freed chunks.
> With these improvements we can also safely increase the {{BufferPool}} chunk 
> size to 128KiB or 256KiB, to guarantee we can fit compressed pages and reduce 
> the amount of global coordination and per-allocation overhead.  We don’t need 
> 1KiB granularity for allocations, nor 16 byte granularity for tiny 
> allocations.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org