[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-08-29 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919020#comment-16919020
 ] 

Dinesh Joshi commented on CASSANDRA-13938:
--

[~jolynch] I have assigned this to you. Thanks for volunteering :)

> Default repair is broken, crashes other nodes participating in repair (in trunk)
> ---
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Nate McCall
> Assignee: Joseph Lynch
> Priority: Urgent
> Fix For: 4.0-alpha
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last repair command follow. First, this is stdout 
> from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java (0x10274d4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: false)
> [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] failed with error Stream failed
> [2017-10-05 14:32:07,048] null
> [2017-10-05 14:32:07,050] Repair command #4 finished in 14 seconds
> error: Repair job has failed with the error message: [2017-10-05 14:32:07,048] null
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the 

[jira] [Assigned] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-08-29 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi reassigned CASSANDRA-13938:


Assignee: Joseph Lynch  (was: Jason Brown)


[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-08-29 Thread Joseph Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918975#comment-16918975
 ] 

Joseph Lynch commented on CASSANDRA-13938:
--

I might have cycles to tackle this shortly; if someone else has cycles first, 
please take it.


[jira] [Commented] (CASSANDRA-15262) server_encryption_options is not backwards compatible with 3.11

2019-08-29 Thread Joseph Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918959#comment-16918959
 ] 

Joseph Lynch commented on CASSANDRA-15262:
--

This could slip to 4.0-beta if we had to, but it is going to be annoying for 
folks testing with TLS (it was for us).

> server_encryption_options is not backwards compatible with 3.11
> ---
>
> Key: CASSANDRA-15262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15262
> Project: Cassandra
> Issue Type: Bug
> Components: Local/Config
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
>
> The current `server_encryption_options` configuration options are as follows:
> {noformat}
> server_encryption_options:
>     # set to true for allowing secure incoming connections
>     enabled: false
>     # If enabled and optional are both set to true, encrypted and unencrypted connections are handled on the storage_port
>     optional: false
>     # if enabled, will open up an encrypted listening socket on ssl_storage_port. Should be used
>     # during upgrade to 4.0; otherwise, set to false.
>     enable_legacy_ssl_storage_port: false
>     # on outbound connections, determine which type of peers to securely connect to. 'enabled' must be set to true.
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
>     # More advanced defaults below:
>     # protocol: TLS
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     # require_client_auth: false
>     # require_endpoint_verification: false
> {noformat}
> A couple of issues here:
> 1. optional defaults to false, which will break existing TLS configurations 
> for (from what I can tell) no particularly good reason
> 2. The provided protocol and cipher suites are not good ideas (in particular, 
> encouraging anyone to use CBC ciphers is a bad plan)
> I propose that before the 4.0 cut we fix up server_encryption_options and even 
> client_encryption_options:
> # Change the default {{optional}} setting to true. As the new Netty code 
> intelligently decides whether to open a TLS connection, this is the more 
> sensible default (it also saves operators a step while transitioning to TLS)
> # Update the defaults to what Netty actually defaults to
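
As a small illustration of the CBC concern, the sketch below flags cipher suites whose name marks them as CBC mode. The suite names are copied from the commented-out default quoted above; the helper name `cbc_suites` and the name-based check are assumptions made for this example, not anything Cassandra or Netty provides.

```python
# Suite names copied from the commented-out cipher_suites default quoted above.
DEFAULT_SUITES = [
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_RSA_WITH_AES_256_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA",
]


def cbc_suites(suites):
    """Return the suites whose name contains the CBC mode marker.

    Hypothetical helper: it only inspects the suite name and does not
    consult any TLS library.
    """
    return [name for name in suites if "_CBC_" in name]


if __name__ == "__main__":
    flagged = cbc_suites(DEFAULT_SUITES)
    print(f"{len(flagged)} of {len(DEFAULT_SUITES)} listed suites use CBC")
```

Every suite in the quoted default is flagged, which is the point the description makes about the current defaults.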



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15294) Allow easy use of custom security providers

2019-08-29 Thread Joseph Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918955#comment-16918955
 ] 

Joseph Lynch commented on CASSANDRA-15294:
--

Yes, I think I should have cycles to add this in after the alpha cuts. Since it 
doesn't involve any backwards-incompatible API changes, I can do it before beta. 
I'd like to add the configuration capability to 3.0/3.11/trunk if possible, but 
I think people might object to it being in 3.0. If no one objects, I'll just 
make patches for all three.

> Allow easy use of custom security providers
> ---
>
> Key: CASSANDRA-15294
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15294
> Project: Cassandra
> Issue Type: Improvement
> Components: Local/Config
> Reporter: Joseph Lynch
> Priority: Normal
>
> As more users are switching to using {{AES-GCM}} TLS they are increasingly 
> running into extremely poor performance with the JDK implementations (e.g. 
> [JDK-8046943|https://bugs.openjdk.java.net/browse/JDK-8046943]). It's not 
> just TLS either: generally speaking, Java crypto can be really slow, including, 
> for example, the MD5 hashing that powers our digests (CASSANDRA-14611).
> There have been a few community attempts to fix this via custom Java 
> security providers, for example Google's 
> [conscrypt|https://github.com/google/conscrypt] and recently Amazon's 
> [ACCP|https://github.com/corretto/amazon-corretto-crypto-provider] which are 
> basically portions of OpenSSL/BoringSSL that are statically linked in and 
> exposed via JNI. These approaches are similar in spirit to what 
> [netty-tcnative|https://github.com/netty/netty-tcnative] is doing for TLS in 
> C* trunk.
> Since there may be tradeoffs to using various providers for various functions 
> (e.g. {{conscrypt}} may be faster or slower than {{accp}} in certain use 
> cases and in other cases you may want to use JDK providers for ease of 
> upgrading) it would be useful if Cassandra supported pluggable providers per 
> use case. For example we could use {{conscrypt}} for TLS, {{accp}} for MD5 
> digesting, and the {{SUN}} provider for everything else. There is a small 
> amount of JVM wiring that needs to be done for this and it could unlock 
> 10-25% CPU capacity improvements.
> We can then use this pluggability to test different providers, and if one is 
> strictly dominant we can just check that one into libs and default to it.
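
To make the per-use-case idea concrete, here is a minimal sketch of the lookup it implies. It follows the example in the description (`conscrypt` for TLS, `accp` for MD5 digesting, `SUN` for everything else). The use-case keys and the function `provider_for` are hypothetical and do not correspond to an existing Cassandra option.

```python
# Hypothetical mapping from use case to security provider name, following the
# example in the description. The keys are illustrative, not real settings.
PROVIDERS = {
    "tls": "conscrypt",
    "md5_digest": "accp",
}
FALLBACK_PROVIDER = "SUN"


def provider_for(use_case, providers=PROVIDERS, fallback=FALLBACK_PROVIDER):
    """Return the provider configured for use_case, else the fallback."""
    return providers.get(use_case, fallback)


if __name__ == "__main__":
    for case in ("tls", "md5_digest", "anything_else"):
        print(case, "->", provider_for(case))
```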






[jira] [Updated] (CASSANDRA-15146) Transitional TLS server configuration options are overly complex

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15146:
-
Fix Version/s: 4.0-beta

> Transitional TLS server configuration options are overly complex
> 
>
> Key: CASSANDRA-15146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15146
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Encryption, Local/Config
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> It appears that, as part of the port from transitional client TLS to transitional 
> server TLS in CASSANDRA-10404 (the ability to switch a cluster to using 
> {{internode_encryption}} without listening on two ports and without downtime), 
> we carried the {{enabled}} setting over from the client implementation. I 
> believe that the {{enabled}} option is redundant with {{internode_encryption}} 
> and {{optional}} and should therefore be removed prior to the 4.0 release, 
> where we will have to start respecting that interface. 
> Current trunk yaml:
> {noformat}
> server_encryption_options:
>     # set to true for allowing secure incoming connections
>     enabled: false
>     # If enabled and optional are both set to true, encrypted and unencrypted connections are handled on the storage_port
>     optional: false
>     # if enabled, will open up an encrypted listening socket on ssl_storage_port. Should be used
>     # during upgrade to 4.0; otherwise, set to false.
>     enable_legacy_ssl_storage_port: false
>     # on outbound connections, determine which type of peers to securely connect to. 'enabled' must be set to true.
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
> {noformat}
> I propose we eliminate {{enabled}} and just use {{optional}} and 
> {{internode_encryption}} to determine the listener setup. I also propose we 
> change the default of {{optional}} to true. We could also re-name 
> {{optional}} since it's a new option but I think it's good to stay consistent 
> with the client and use {{optional}}.
> ||optional||internode_encryption||description||
> |true|none|(default) No encryption is used but if a server reaches out with it we'll use it|
> |false|dc|Encryption is required for inter-dc communication, but not intra-dc|
> |false|all|Encryption is required for all communication|
> |false|none|We only listen for unencrypted connections|
> |true|dc|Encryption is used for inter-dc communication but is not required|
> |true|all|Encryption is used for all communication but is not required|
> From these states it is clear when we should be accepting TLS connections 
> (all except for false and none) as well as when we must enforce it.
> To transition without downtime from an un-encrypted cluster to an encrypted 
> cluster the user would do the following:
> 1. After adding valid truststores, change {{internode_encryption}} to the 
> desired level of encryption (recommended {{all}}) and restart Cassandra
> 2. Change {{optional=false}} and restart Cassandra to enforce #1
> If {{optional}} defaulted to {{false}} as it does right now we'd need a third 
> restart to first change {{optional}} to {{true}}, which given my 
> understanding of the OptionalSslHandler isn't really relevant.
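
The table in the description can be read as a small decision function. The sketch below encodes it under that reading: TLS is accepted in every state except `optional=false` with `internode_encryption=none`, and encryption is enforced only when `optional` is false. The function name `listener_policy` and its return shape are assumptions for illustration, not the trunk implementation.

```python
VALID_LEVELS = ("none", "dc", "all")


def listener_policy(optional, internode_encryption):
    """Return (accept_tls, required_for) for one row of the proposed table.

    accept_tls:   whether inbound TLS connections are accepted at all.
    required_for: "none", "dc" (inter-dc traffic only) or "all", i.e. the
                  scope in which encryption is enforced rather than preferred.
    Hypothetical helper that mirrors the table; not Cassandra code.
    """
    if internode_encryption not in VALID_LEVELS:
        raise ValueError(f"unknown internode_encryption: {internode_encryption!r}")
    # Accept TLS in every state except optional=false with encryption=none.
    accept_tls = optional or internode_encryption != "none"
    # Encryption is only enforced when it is not optional.
    required_for = "none" if optional else internode_encryption
    return accept_tls, required_for


if __name__ == "__main__":
    for optional in (True, False):
        for level in VALID_LEVELS:
            print(optional, level, listener_policy(optional, level))
```

Read this way, the two-restart transition is a move from `(True, "none")` to `(True, "all")` and then to `(False, "all")`.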






[jira] [Updated] (CASSANDRA-15262) server_encryption_options is not backwards compatible with 3.11

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15262:
-
Fix Version/s: 4.0-alpha







[jira] [Updated] (CASSANDRA-14764) Evaluate 12 Node Breaking Point, compression=none, encryption=none, coalescing=off

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14764:
-
Fix Version/s: 4.0-beta

> Evaluate 12 Node Breaking Point, compression=none, encryption=none, 
> coalescing=off
> --
>
> Key: CASSANDRA-14764
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14764
> Project: Cassandra
> Issue Type: Sub-task
> Components: Legacy/Streaming and Messaging
> Reporter: Joseph Lynch
> Assignee: Vinay Chella
> Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: i-03341e1c52de6ea3e-after-queue-change.svg, 
> i-07cd92e844d66d801-after-queue-bound.svg, i-07cd92e844d66d801-hint-play.svg, 
> i-07cd92e844d66d801-uninlined-with-jvm-methods.svg, ttop.txt
>
>
> *Setup:*
>  * Cassandra: 12 (2*6) node i3.xlarge AWS instance (4 cpu cores, 30GB ram) 
> running cassandra trunk off of jasobrown/14503 jdd7ec5a2 (Jason's patched 
> internode messaging branch) vs the same footprint running 3.0.17
>  * Two datacenters with 100ms latency between them
>  * No compression, encryption, or coalescing turned on
> *Test #1:*
> ndbench sent 1.5k QPS at a coordinator level to one datacenter (RF=3*2 = 6 so 
> 3k global replica QPS) of 4kb single partition BATCH mutations at LOCAL_ONE. 
> This represents about 250 QPS per coordinator in the first datacenter or 60 
> QPS per core. The goal was to observe P99 write and read latencies under 
> various QPS.
> *Result:*
> The good news is that since the CASSANDRA-14503 changes, instead of keeping the 
> mutations on heap we put the messages into hints and don't run out of 
> memory. The bad news is that the {{MessagingService-NettyOutbound-Thread}}s 
> would occasionally enter a degraded state where they would just spin on a 
> core. I've attached flame graphs showing the CPU state as [~jasobrown] 
> applied fixes to the {{OutboundMessagingConnection}} class.
>  *Follow Ups:*
> [~jasobrown] has committed a number of fixes onto his 
> {{jasobrown/14503-collab}} branch including:
> 1. Limiting the amount of time spent dequeuing messages if they are expired 
> (previously, if messages entered the queue faster than we could dequeue them, 
> we'd just loop infinitely on the consumer side)
> 2. Don't call {{dequeueMessages}} from within {{dequeueMessages}} created 
> callbacks.
> We're continuing to use CPU flamegraphs to figure out where we're looping and 
> fixing bugs as we find them.
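
The first follow-up, bounding the time spent dequeuing expired messages, can be sketched as follows. This is not the code on the `jasobrown/14503-collab` branch: the queue layout of `(expires_at, payload)` tuples, the function name `drain_expired`, and the injectable `clock` are assumptions made to keep the example self-contained.

```python
import time
from collections import deque


def drain_expired(queue, budget_seconds, clock=time.monotonic):
    """Drop expired messages from the head of queue within a time budget.

    queue holds (expires_at, payload) tuples, with expires_at in the same
    units as clock(). Returns the number of messages dropped. The loop stops
    when the head is still live, the queue is empty, or the budget is spent,
    so a producer that outpaces the consumer cannot pin it here forever.
    """
    start = clock()
    dropped = 0
    while queue:
        now = clock()
        if now - start >= budget_seconds:
            break  # budget exhausted: yield even if expired messages remain
        expires_at, _payload = queue[0]
        if expires_at > now:
            break  # head of the queue is still live
        queue.popleft()
        dropped += 1
    return dropped


if __name__ == "__main__":
    now = time.monotonic()
    backlog = deque((now - 1.0, f"msg-{i}") for i in range(5))
    print("dropped", drain_expired(backlog, budget_seconds=0.5))
```

Passing a fake `clock` makes the budget behaviour easy to check without real sleeps.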






[jira] [Updated] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14747:
-
Fix Version/s: 4.0-beta

> Evaluate 200 node, compression=none, encryption=none, coalescing=off 
> -
>
> Key: CASSANDRA-14747
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14747
> Project: Cassandra
> Issue Type: Sub-task
> Components: Legacy/Testing
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, 
> 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg, 
> 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg, 
> 4.0_errors_showing_heap_pressure.txt, 
> 4.0_heap_histogram_showing_many_MessageOuts.txt, 
> i-0ed2acd2dfacab7c1-after-looping-fixes.svg, 
> trunk_14503_v2_cpuflamegraph.svg, trunk_vs_3.0.17_latency_under_load.png, 
> ttop_NettyOutbound-Thread_spinning.txt, 
> useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, 
> useast1e-i-08635fa1631601538_flamegraph_96node.svg, 
> useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, 
> useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg
>
>
> Tracks evaluating a 200 node cluster with all internode settings off (no 
> compression, no encryption, no coalescing).






[jira] [Updated] (CASSANDRA-14746) Ensure Netty Internode Messaging Refactor is Solid

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14746:
-
Fix Version/s: 4.0-beta

> Ensure Netty Internode Messaging Refactor is Solid
> --
>
> Key: CASSANDRA-14746
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14746
> Project: Cassandra
> Issue Type: Improvement
> Components: Legacy/Streaming and Messaging
> Reporter: Joseph Lynch
> Assignee: Joseph Lynch
> Priority: Normal
> Labels: 4.0-QA
> Fix For: 4.0, 4.0-beta
>
>
> Before we release 4.0 let's ensure that the internode messaging refactor is 
> 100% solid. As internode messaging is naturally used in many code paths and 
> widely configurable we have a large number of cluster configurations and test 
> configurations that must be vetted.
> We plan to vary the following:
>  * Version of Cassandra 3.0.17 vs 4.0-alpha
>  * Cluster sizes with *multi-dc* deployments ranging from 6 - 100 nodes
>  * Client request rates varying between 1k QPS and 100k QPS of varying sizes 
> and shapes (BATCH, INSERT, SELECT point, SELECT range, etc ...)
>  * Internode compression
>  * Internode SSL (as well as openssl vs jdk)
>  * Internode Coalescing options
> We are looking to measure the following as appropriate:
>  * Latency distributions of reads and writes (lower is better)
>  * Scaling limit, aka maximum throughput before violating p99 latency 
> deadline of 10ms @ LOCAL_QUORUM, on a fixed hardware deployment for 100% 
> writes, 100% reads and 50-50 writes+reads (higher is better)
>  * Thread counts (lower is better)
>  * Context switches (lower is better)
>  * On-CPU time of tasks (longer periods without a context switch are better)
>  * GC allocation rates / throughput for a fixed size heap (lower allocation 
> is better)
>  * Streaming recovery time for a single node failure, i.e. can Cassandra 
> saturate the NIC
>  
> The goal is that 4.0 should have better latency, more throughput, fewer 
> threads, fewer context switches, less GC allocation, and faster recovery 
> time. I'm putting Jason Brown as the reviewer since he implemented most of 
> the internode refactor.
> Current collaborators driving this QA task: Dinesh Joshi, Jordan West, Joey 
> Lynch (Netflix), Vinay Chella (Netflix)
> Owning committer(s): Jason Brown
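
The "scaling limit" metric above can be made concrete with a small helper. This is only a sketch under assumptions: the function name, the (QPS, p99) sample format and the numbers in the usage note are hypothetical and do not come from the ticket; only the 10ms p99 deadline does.

```python
from typing import Iterable, Optional, Tuple


def scaling_limit(samples: Iterable[Tuple[int, float]],
                  p99_deadline_ms: float = 10.0) -> Optional[int]:
    """Return the highest offered QPS reached before p99 first exceeds the deadline.

    `samples` is a hypothetical list of (offered_qps, observed_p99_ms) pairs,
    one per load step. Steps are examined in increasing QPS order and the scan
    stops at the first violation, so a later step that happens to recover is
    not counted.
    """
    limit = None
    for qps, p99_ms in sorted(samples):
        if p99_ms > p99_deadline_ms:
            break
        limit = qps
    return limit
```

With purely illustrative (not measured) inputs such as `[(1000, 2.0), (10000, 4.5), (50000, 9.0), (100000, 14.0)]`, the helper returns 50000, the last step that stayed within the deadline.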






[jira] [Updated] (CASSANDRA-15181) Ensure Nodes can Start and Stop

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15181:
-
Fix Version/s: 4.0-beta

> Ensure Nodes can Start and Stop
> ---
>
> Key: CASSANDRA-15181
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15181
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Legacy/Streaming and Messaging, Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Vinay Chella
>Priority: High
> Fix For: 4.0-beta
>
>
> Let's load a cluster up with data and start killing nodes. We can do hard 
> failures (node terminations) and soft failures (process kills). We plan to 
> observe the following:
> * Can nodes successfully bootstrap?
> * How long does it take to bootstrap?
> * What are the effects of TLS on and off (e.g. on stream time)?
> * Are hints properly played after a node restart?
> * Do nodes properly shut down and start back up?






[jira] [Updated] (CASSANDRA-14688) Update protocol spec and class level doc with protocol checksumming details

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14688:
-
Fix Version/s: 4.0-beta

> Update protocol spec and class level doc with protocol checksumming details
> ---
>
> Key: CASSANDRA-14688
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14688
> Project: Cassandra
>  Issue Type: Task
>  Components: Legacy/Documentation and Website
>Reporter: Sam Tunnicliffe
>Assignee: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.0, 4.0-beta
>
>
> CASSANDRA-13304 provides an option to add checksumming to the frame body of 
> native protocol messages. The native protocol spec needs to be updated to 
> reflect this ASAP. We should also verify that the javadoc comments describing 
> the on-wire format in 
> {{o.a.c.transport.frame.checksum.ChecksummingTransformer}} are up to date.






[jira] [Updated] (CASSANDRA-15228) Commit Log should not use sync markers

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15228:
-
Fix Version/s: 4.0-alpha

> Commit Log should not use sync markers
> --
>
> Key: CASSANDRA-15228
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15228
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
>
> The sync markers existed to permit file re-use.  Since we no longer re-use 
> files, they no longer provide any value.  However, they _can_ corrupt the 
> commit log for replay in the event of a process crash.  Before we release 
> 4.0, we should ideally remove the sync markers entirely.






[jira] [Commented] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements

2019-08-29 Thread Joseph Lynch (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918949#comment-16918949
 ] 

Joseph Lynch commented on CASSANDRA-14801:
--

[~benedict] do you think this should block the first alpha, or can it wait for 
beta?

> calculatePendingRanges no longer safe for multiple adjacent range movements
> ---
>
> Key: CASSANDRA-14801
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14801
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination, Legacy/Distributed Metadata
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0
>
>
> Correctness depended upon the narrowing to a {{Set}}, 
> which we no longer do - we maintain a collection of all {{Replica}}.  Our 
> {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result 
> contain the same endpoint multiple times; and our {{EndpointsForToken}} 
> obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, 
> resulting in cluster-wide failures for writes to the affected token ranges 
> for the duration of the range movement.






[jira] [Updated] (CASSANDRA-10190) Python 3 support for cqlsh

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-10190:
-
Fix Version/s: 4.0-alpha

> Python 3 support for cqlsh
> --
>
> Key: CASSANDRA-10190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Legacy/Tools
>Reporter: Andrew Pennebaker
>Assignee: Patrick Bannister
>Priority: Normal
>  Labels: cqlsh
> Fix For: 4.0-alpha
>
> Attachments: coverage_notes.txt
>
>
> Users who operate in a Python 3 environment may have trouble launching cqlsh. 
> Could we please update cqlsh's syntax to run in Python 3?
> As a workaround, users can set up pyenv, and cd to a directory with a 
> .python-version containing "2.7". But it would be nice if cqlsh supported 
> modern Python versions out of the box.






[jira] [Updated] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2019-08-29 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-15213:
-
Fix Version/s: (was: 4.0)
   4.0-beta

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, the binary search is a measurable share of the cost of the new 
> networking work (without filtering it was > 10% of the CPU used overall). 
>  We can instead compute an approximate bucket index, 
> floor(log2 n / log2 1.2), extremely cheaply, which avoids the random memory 
> access costs.
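
As a rough illustration of the last point, the sketch below compares a binary search over bucket bounds with the closed-form floor(log2 n / log2 1.2). It assumes idealised bucket lower bounds of exactly 1.2^k; the reservoir's real offsets table may differ (for example by rounding to integers), so the closed form should be read as an approximation. The function and constant names are hypothetical.

```python
import math
from bisect import bisect_right

GROWTH = 1.2  # growth factor taken from the formula in the ticket

# Assumption: idealised bucket lower bounds 1.2^0, 1.2^1, 1.2^2, ...
# The real histogram keeps its own offsets table, which may not match these.
BOUNDS = [GROWTH ** k for k in range(128)]


def bucket_by_search(n: int) -> int:
    """Index of the last bound <= n, found by binary search (n >= 1)."""
    return bisect_right(BOUNDS, n) - 1


def bucket_by_log(n: int) -> int:
    """The same index computed arithmetically, with no table lookups (n >= 1)."""
    return math.floor(math.log2(n) / math.log2(GROWTH))
```

For values such as 1, 2, 10, 100 and 1000 both functions agree on the idealised bounds; a production version would still need to correct for whatever rounding the real offsets use.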






[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-08-29 Thread Benedict (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-13938:
-
Fix Version/s: (was: 4.0)
   4.0-alpha

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Urgent
> Fix For: 4.0-alpha
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>     key text,
>     ts bigint,
>     val text,
>     PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>     CLUSTERING ORDER BY (ts DESC) AND
>     bloom_filter_fp_chance=0.01 AND
>     caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>     comment='' AND
>     dclocal_read_repair_chance=0.00 AND
>     gc_grace_seconds=864000 AND
>     read_repair_chance=0.00 AND
>     compaction={'class': 'SizeTieredCompactionStrategy'} AND
>     compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last repair command follow. First, this is stdout 
> from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java 
> (0x10274d4c0) and 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib
>  (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 
> (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with 
> repair options (parallelism: parallel, primary range: false, incremental: 
> true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: 
> [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: 
> false)
> [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 
> for range [(3074457345618258602,-9223372036854775808], 
> (-9223372036854775808,-3074457345618258603], 
> (-3074457345618258603,3074457345618258602]] failed with error Stream failed
> [2017-10-05 14:32:07,048] null
> [2017-10-05 14:32:07,050] Repair command #4 finished in 14 seconds
> error: Repair job has failed with the error message: [2017-10-05 
> 14:32:07,048] null
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error message: 
> [2017-10-05 14:32:07,048] null
> at 

[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-13938:
-
Fix Version/s: (was: 4.0)

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Urgent
> Attachments: 13938.yaml, test.sh

[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-13938:
-
Fix Version/s: 4.0

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Urgent
> Fix For: 4.0
>
> Attachments: 13938.yaml, test.sh

[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-13938:
-
Fix Version/s: (was: 4.x)
   4.0

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Jason Brown
>Priority: Urgent
> Fix For: 4.0
>
> Attachments: 13938.yaml, test.sh

[jira] [Updated] (CASSANDRA-15146) Transitional TLS server configuration options are overly complex

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15146:
-
Fix Version/s: 4.0

> Transitional TLS server configuration options are overly complex
> 
>
> Key: CASSANDRA-15146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption, Local/Config
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Fix For: 4.0
>
>
> It appears that, as part of the port from transitional client TLS to 
> transitional server TLS in CASSANDRA-10404 (the ability to switch a cluster 
> to using {{internode_encryption}} without listening on two ports and without 
> downtime), we carried the {{enabled}} setting over from the client 
> implementation. I believe that the {{enabled}} option is redundant with 
> {{internode_encryption}} and {{optional}}, and it should therefore be removed 
> prior to the 4.0 release, after which we will have to start respecting that 
> interface. 
> Current trunk yaml:
> {noformat}
> server_encryption_options:
>     # set to true for allowing secure incoming connections
>     enabled: false
>     # If enabled and optional are both set to true, encrypted and unencrypted connections are handled on the storage_port
>     optional: false
>     # if enabled, will open up an encrypted listening socket on ssl_storage_port. Should be used
>     # during upgrade to 4.0; otherwise, set to false.
>     enable_legacy_ssl_storage_port: false
>     # on outbound connections, determine which type of peers to securely connect to. 'enabled' must be set to true.
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
> {noformat}
> I propose we eliminate {{enabled}} and just use {{optional}} and 
> {{internode_encryption}} to determine the listener setup. I also propose we 
> change the default of {{optional}} to true. We could also re-name 
> {{optional}} since it's a new option but I think it's good to stay consistent 
> with the client and use {{optional}}.
> ||optional||internode_encryption||description||
> |true|none|(default) No encryption is used but if a server reaches out with 
> it we'll use it|
> |false|dc|Encryption is required for inter-dc communication, but not intra-dc|
> |false|all|Encryption is required for all communication|
> |false|none|We only listen for unencrypted connections|
> |true|dc|Encryption is used for inter-dc communication but is not required|
> |true|all|Encryption is used for all communication but is not required|
> From these states it is clear when we should be accepting TLS connections 
> (every state except {{optional=false}} with {{internode_encryption=none}}) as 
> well as when we must enforce it ({{optional=false}} with {{dc}} or {{all}}).
> To transition without downtime from an un-encrypted cluster to an encrypted 
> cluster the user would do the following:
> 1. After adding valid truststores, change {{internode_encryption}} to the 
> desired level of encryption (recommended {{all}}) and restart Cassandra
> 2. Set {{optional}} to {{false}} and restart Cassandra to enforce the level 
> chosen in step 1
> If {{optional}} defaulted to {{false}} as it does right now we'd need a third 
> restart to first change {{optional}} to {{true}}, which given my 
> understanding of the OptionalSslHandler isn't really relevant.
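
The table in the proposal above can be read as two predicates: when a node accepts inbound TLS, and when it insists on it. Below is a minimal sketch of that reading, assuming only the three {{internode_encryption}} values listed in the table. The class, enum, and method names are hypothetical and are not Cassandra's actual code.

```java
// Hypothetical names throughout; this only restates the proposed table.
final class ListenerPolicy {
    enum InternodeEncryption { NONE, DC, ALL }

    // True when the node should accept inbound TLS connections:
    // every state except optional=false with internode_encryption=none.
    static boolean acceptsTls(boolean optional, InternodeEncryption level) {
        return optional || level != InternodeEncryption.NONE;
    }

    // True when a connection to a peer must be encrypted. Only optional=false
    // enforces anything; 'dc' enforces it for cross-datacenter peers only.
    static boolean requiresTls(boolean optional, InternodeEncryption level, boolean sameDatacenter) {
        if (optional) {
            return false;
        }
        switch (level) {
            case ALL: return true;
            case DC:  return !sameDatacenter;
            default:  return false;
        }
    }

    public static void main(String[] args) {
        // Print the six rows of the table for a cross-datacenter peer.
        for (InternodeEncryption level : InternodeEncryption.values()) {
            for (boolean optional : new boolean[] { true, false }) {
                System.out.printf("optional=%b internode_encryption=%s acceptsTls=%b requiresTls=%b%n",
                        optional, level, acceptsTls(optional, level), requiresTls(optional, level, false));
            }
        }
    }
}
```

Under this reading, the two-restart transition first moves a node to optional=true with internode_encryption=all (TLS is used where offered, plaintext is still accepted) and then to optional=false (TLS is enforced).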



--
This message was sent by Atlassian Jira
(v8.3.2#803003)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15262) server_encryption_options is not backwards compatible with 3.11

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15262:
-
Fix Version/s: 4.0

> server_encryption_options is not backwards compatible with 3.11
> ---
>
> Key: CASSANDRA-15262
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15262
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
> Fix For: 4.0
>
>
> The current `server_encryption_options` configuration options are as follows:
> {noformat}
> server_encryption_options:
>     # set to true for allowing secure incoming connections
>     enabled: false
>     # If enabled and optional are both set to true, encrypted and unencrypted connections are handled on the storage_port
>     optional: false
>     # if enabled, will open up an encrypted listening socket on ssl_storage_port. Should be used
>     # during upgrade to 4.0; otherwise, set to false.
>     enable_legacy_ssl_storage_port: false
>     # on outbound connections, determine which type of peers to securely connect to. 'enabled' must be set to true.
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
>     # More advanced defaults below:
>     # protocol: TLS
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     # require_client_auth: false
>     # require_endpoint_verification: false
> {noformat}
> A couple of issues here:
> 1. {{optional}} defaults to false, which will break existing TLS configurations 
> for (from what I can tell) no particularly good reason
> 2. The provided protocol and cipher suites are not good ideas (in particular, 
> encouraging anyone to use CBC ciphers is a bad plan)
> I propose that before the 4.0 cut we fix up server_encryption_options and even 
> client_encryption_options:
> # Change the default {{optional}} setting to true. As the new Netty code 
> intelligently decides to open a TLS connection or not this is the more 
> sensible default (saves operators a step while transitioning to TLS as well)
> # Update the defaults to what netty actually defaults to
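
To compare the commented-out defaults against what a given JVM enables, an operator could print the JDK's own default TLS parameters and flag the CBC suites. This is only a sketch using the standard `javax.net.ssl` API. It assumes the JDK provider is in use and says nothing about any extra filtering Netty or netty-tcnative may apply. The `TlsDefaults` class and `cbcSuites` helper are hypothetical names.

```java
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

// Hypothetical helper, not part of Cassandra.
final class TlsDefaults {
    // Returns the suites whose name marks them as CBC-mode ciphers.
    static List<String> cbcSuites(String[] suites) {
        List<String> cbc = new ArrayList<>();
        for (String suite : suites) {
            if (suite.contains("_CBC_")) {
                cbc.add(suite);
            }
        }
        return cbc;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Defaults of the JVM's default SSLContext (JDK provider assumed).
        SSLParameters defaults = SSLContext.getDefault().getDefaultSSLParameters();
        System.out.println("protocols: " + String.join(",", defaults.getProtocols()));
        System.out.println("cipher suites: " + String.join(",", defaults.getCipherSuites()));
        System.out.println("CBC suites enabled by default: " + cbcSuites(defaults.getCipherSuites()));
    }
}
```

Passing the six suites from the commented-out `cipher_suites` line to `cbcSuites` returns all six, which is the concern raised in item 2.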






[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all

2019-08-29 Thread Joseph Lynch (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-15175:
-
Fix Version/s: 4.0

> Evaluate 200 node, compression=on, encryption=all
> -
>
> Key: CASSANDRA-15175
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15175
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Test/benchmark
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Normal
>  Labels: 4.0-QA
> Fix For: 4.0
>
> Attachments: 30x_14400cRPS-14400cWPS.svg, 
> 30x_LQ_21600cRPS-14400cWPS.svg, ShortbufferExceptions.png, 
> cassandra_comparative_performance_all_flamegraphs.html, 
> image-2019-08-06-14-20-25-140.png, odd_netty_jdk_tls_cpu_usage.png, 
> trunk_14400cRPS-14400cWPS.svg, trunk_187000cRPS-14400cWPS.svg, 
> trunk_187kcRPS_14kcWPS.png, trunk_22000cRPS-14400cWPS-jdk.svg, 
> trunk_22000cRPS-14400cWPS-openssl.svg, trunk_220kcRPS_14kcWPS.png, 
> trunk_252kcRPS-14kcWPS.png, trunk_93500cRPS-14400cWPS.svg, 
> trunk_LQ_14400cRPS-14400cWPS.svg, trunk_LQ_21600cRPS-14400cWPS.svg, 
> trunk_Q_21600cRPS-7200cWPS.svg, trunk_allocation_Q_21k_cRPS.svg, 
> trunk_vs_30x_125kcRPS_14kcWPS.png, trunk_vs_30x_14kRPS_14kcWPS_load.png, 
> trunk_vs_30x_14kcRPS_14kcWPS.png, 
> trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, 
> trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, 
> trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, 
> trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_14kcRPS_14kcWPS.png, trunk_vs_30x_LQ_21kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_64kcRPS_14kcWPS.png, trunk_vs_30x_LQ_jdk_summary.png, 
> trunk_vs_30x_LQ_openssl_21kcRPS_14kcWPS.png, 
> trunk_vs_30x_LQ_tcnative_summary.png, trunk_vs_30x_Q_21kcRPS_7200cWPS.png, 
> trunk_vs_30x_Q_36kcRPS_7200cWPS.png, trunk_vs_30x_Q_tcnative_summary.png, 
> trunk_vs_30x_summary.png, trunk_vs_30x_wEQ_rLQ_7kcRPS_22kcWPS.png, 
> trunk_vs_30x_wEQ_rLQ_7kcRPS_58kcWPS.png, 
> trunk_vs_30x_wEQ_rLQ_7kcRPS_7kcWPS.png, 
> trunk_vs_30x_write_LO_7kcRPS_108kcWPS.png, 
> trunk_vs_30x_write_LO_7kcRPS_162kcWPS.png, 
> trunk_vs_30x_write_LO_7kcRPS_72kcWPS.png, 
> trunk_vs_30x_write_LO_7kcRPS_7kcWPS.png, write_scaling_local_one_summary.png, 
> write_scaling_lq_eq_summary.png
>
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> First test is a [read scaling test|https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>  
> |Test Setup| |
> |Baseline|3.0.19 @d7d00036|
> |Candidate|trunk @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |
> Second test is a [write scaling 
> test|https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=428858608]:






[cassandra-builds] branch master updated: Add OpenJDK 11 to CentOS docker image

2019-08-29 Thread mshuler
This is an automated email from the ASF dual-hosted git repository.

mshuler pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git


The following commit(s) were added to refs/heads/master by this push:
 new 6b63414  Add OpenJDK 11 to CentOS docker image
6b63414 is described below

commit 6b63414f5101ecc02f51ebb8a8a1cd996e1df27f
Author: Michael Shuler 
AuthorDate: Thu Aug 29 13:55:54 2019 -0500

Add OpenJDK 11 to CentOS docker image
---
 docker/centos7-image.docker | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docker/centos7-image.docker b/docker/centos7-image.docker
index c622f6d..1d80216 100644
--- a/docker/centos7-image.docker
+++ b/docker/centos7-image.docker
@@ -17,6 +17,7 @@ RUN yum -y install \
git \
java-1.7.0-openjdk-devel \
java-1.8.0-openjdk-devel \
+   java-11-openjdk-devel \
make \
rpm-build \
sudo





[jira] [Updated] (CASSANDRA-15294) Allow easy use of custom security providers

2019-08-29 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15294:
-
Status: Awaiting Feedback  (was: Triage Needed)

> Allow easy use of custom security providers
> ---
>
> Key: CASSANDRA-15294
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15294
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Joseph Lynch
>Priority: Normal
>
> As more users are switching to using {{AES-GCM}} TLS they are increasingly 
> running into extremely poor performance with the JDK implementations (e.g. 
> [JDK-8046943|https://bugs.openjdk.java.net/browse/JDK-8046943]). It's not 
> just TLS either; generally speaking, Java crypto can be really slow, including 
> for example MD5 hashing which powers our digests (CASSANDRA-14611).
> There have been a few community attempts to fix this via custom Java 
> security providers, for example Google's 
> [conscrypt|https://github.com/google/conscrypt] and recently Amazon's 
> [ACCP|https://github.com/corretto/amazon-corretto-crypto-provider] which are 
> basically portions of OpenSSL/BoringSSL that are statically linked in and 
> exposed via JNI. These approaches are similar in spirit to what 
> [netty-tcnative|https://github.com/netty/netty-tcnative] is doing for TLS in 
> C* trunk.
> Since there may be tradeoffs to using various providers for various functions 
> (e.g. {{conscrypt}} may be faster or slower than {{accp}} in certain use 
> cases and in other cases you may want to use JDK providers for ease of 
> upgrading) it would be useful if Cassandra supported pluggable providers per 
> use case. For example we could use {{conscrypt}} for TLS, {{accp}} for MD5 
> digesting, and the {{SUN}} provider for everything else. There is a small 
> amount of JVM wiring that needs to be done for this and it could unlock 
> 10-25% CPU capacity improvements.
> We can then use this pluggability to test different providers and if one is 
> strictly dominant we can just check that one in in libs and default to it.
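
The per-use-case wiring described above can be sketched with the standard `java.security` API: look up a preferred provider by name for one purpose and fall back to the JVM's default ordering if it is not installed. This is a minimal sketch under stated assumptions. The `ProviderPerUseCase` class and its methods are hypothetical, and the provider name is just a string here; the names under which Conscrypt or ACCP register themselves should be taken from their own documentation.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.Security;

// Hypothetical helper, not part of Cassandra.
final class ProviderPerUseCase {
    // Returns a digest from the preferred provider when it is installed and
    // offers the algorithm; otherwise falls back to the JVM's default provider order.
    static MessageDigest digest(String algorithm, String preferredProvider) {
        try {
            Provider provider = preferredProvider == null ? null : Security.getProvider(preferredProvider);
            if (provider != null && provider.getService("MessageDigest", algorithm) != null) {
                return MessageDigest.getInstance(algorithm, provider);
            }
            return MessageDigest.getInstance(algorithm);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No provider offers " + algorithm, e);
        }
    }

    // MD5 of a UTF-8 string as lowercase hex, using the preferred provider if present.
    static String md5Hex(String input, String preferredProvider) {
        byte[] hash = digest("MD5", preferredProvider).digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // "SUN" is the JDK's built-in provider; swap in another installed provider name to compare.
        System.out.println(digest("MD5", "SUN").getProvider().getName());
        System.out.println(md5Hex("abc", "SUN"));
    }
}
```

Because every provider must produce the same digest for the same input, a check like `md5Hex("abc", name)` against a known value is a cheap sanity test before benchmarking one provider against another.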






[jira] [Commented] (CASSANDRA-15294) Allow easy use of custom security providers

2019-08-29 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918869#comment-16918869
 ] 

Dinesh Joshi commented on CASSANDRA-15294:
--

[~jolynch], given the advantages, I think this is worth adding. Do you want to 
propose a patch?

> Allow easy use of custom security providers
> ---
>
> Key: CASSANDRA-15294
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15294
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Joseph Lynch
>Priority: Normal
>
> As more users are switching to using {{AES-GCM}} TLS they are increasingly 
> running into extremely poor performance with the JDK implementations (e.g. 
> [JDK-8046943|https://bugs.openjdk.java.net/browse/JDK-8046943]). It's not 
> just TLS either; generally speaking, Java crypto can be really slow, including 
> for example MD5 hashing which powers our digests (CASSANDRA-14611).
> There have been a few community attempts to fix this via custom Java 
> security providers, for example Google's 
> [conscrypt|https://github.com/google/conscrypt] and recently Amazon's 
> [ACCP|https://github.com/corretto/amazon-corretto-crypto-provider] which are 
> basically portions of OpenSSL/BoringSSL that are statically linked in and 
> exposed via JNI. These approaches are similar in spirit to what 
> [netty-tcnative|https://github.com/netty/netty-tcnative] is doing for TLS in 
> C* trunk.
> Since there may be tradeoffs to using various providers for various functions 
> (e.g. {{conscrypt}} may be faster or slower than {{accp}} in certain use 
> cases and in other cases you may want to use JDK providers for ease of 
> upgrading) it would be useful if Cassandra supported pluggable providers per 
> use case. For example we could use {{conscrypt}} for TLS, {{accp}} for MD5 
> digesting, and the {{SUN}} provider for everything else. There is a small 
> amount of JVM wiring that needs to be done for this and it could unlock 
> 10-25% CPU capacity improvements.
> We can then use this pluggability to test different providers and if one is 
> strictly dominant we can just check that one in in libs and default to it.






[jira] [Commented] (CASSANDRA-15172) LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException

2019-08-29 Thread Shalom (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918545#comment-16918545
 ] 

Shalom commented on CASSANDRA-15172:


Thanks a lot for further clarifying [~benedict]. (I hope you enjoyed your 
vacation :) )

Just to set my mind straight, the issue is when there are mixed versions in the 
cluster, so if I upgraded all binaries to 3.11, it won't recur even if I 
haven't upgraded the SSTables yet. Is my assumption correct?

> LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException
> 
>
> Key: CASSANDRA-15172
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15172
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>Reporter: Shalom
>Assignee: Benedict
>Priority: Normal
> Fix For: 3.0.19, 3.11.5
>
>
> Hi All,
> This is the first time I've opened an issue, so apologies if I'm not following 
> the rules properly.
>  
> After upgrading a node from version 2.1.21 to 3.11.4, we've started seeing a 
> lot of AbstractLocalAwareExecutorService exceptions. This happened right 
> after the node successfully started up with the new 3.11.4 binaries. 
> {noformat}
> INFO  [main] 2019-06-05 04:41:37,730 Gossiper.java:1715 - No gossip backlog; proceeding
> INFO  [main] 2019-06-05 04:41:38,036 NativeTransportService.java:70 - Netty using native Epoll event loop
> INFO  [main] 2019-06-05 04:41:38,117 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO  [main] 2019-06-05 04:41:38,118 Server.java:156 - Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
> INFO  [main] 2019-06-05 04:41:38,179 CassandraDaemon.java:556 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
> INFO  [Native-Transport-Requests-21] 2019-06-05 04:41:39,145 AuthCache.java:161 - (Re)initializing PermissionsCache (validity period/update interval/max entries) (2000/2000/1000)
> INFO  [OptionalTasks:1] 2019-06-05 04:41:39,729 CassandraAuthorizer.java:409 - Converting legacy permissions data
> INFO  [HANDSHAKE-/10.10.10.8] 2019-06-05 04:41:39,808 OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.8
> INFO  [HANDSHAKE-/10.10.10.9] 2019-06-05 04:41:39,808 OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.9
> INFO  [HANDSHAKE-dc1_02/10.10.10.6] 2019-06-05 04:41:39,809 OutboundTcpConnection.java:561 - Handshaking version with dc1_02/10.10.10.6
> WARN  [ReadStage-2] 2019-06-05 04:41:39,857 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
>     at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
>     at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
>     at org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
>     at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
>     at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
>     at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
>     at org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
>     at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:953)
>     at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:929)
>     at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:62)
>     at 
>