[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16919020#comment-16919020 ] Dinesh Joshi commented on CASSANDRA-13938: -- [~jolynch] I have assigned this to you. Thanks for volunteering :) > Default repair is broken, crashes other nodes participating in repair (in > trunk) > > > Key: CASSANDRA-13938 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13938 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Nate McCall >Assignee: Joseph Lynch >Priority: Urgent > Fix For: 4.0-alpha > > Attachments: 13938.yaml, test.sh > > > Running through a simple scenario to test some of the new repair features, I > was not able to make a repair command work. Further, the exception seemed to > trigger a nasty failure state that basically shuts down the netty connections > for messaging *and* CQL on the nodes transferring back data to the node being > repaired. The following steps reproduce this issue consistently. 
> Cassandra stress profile (probably not necessary, but this one provides a > really simple schema and consistent data shape): > {noformat} > keyspace: standard_long > keyspace_definition: | > CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', > 'replication_factor':3}; > table: test_data > table_definition: | > CREATE TABLE test_data ( > key text, > ts bigint, > val text, > PRIMARY KEY (key, ts) > ) WITH COMPACT STORAGE AND > CLUSTERING ORDER BY (ts DESC) AND > bloom_filter_fp_chance=0.01 AND > caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND > comment='' AND > dclocal_read_repair_chance=0.00 AND > gc_grace_seconds=864000 AND > read_repair_chance=0.00 AND > compaction={'class': 'SizeTieredCompactionStrategy'} AND > compression={'sstable_compression': 'LZ4Compressor'}; > columnspec: > - name: key > population: uniform(1..5000) # 50 million records available > - name: ts > cluster: gaussian(1..50) # Up to 50 inserts per record > - name: val > population: gaussian(128..1024) # varrying size of value data > insert: > partitions: fixed(1) # only one insert per batch for individual partitions > select: fixed(1)/1 # each insert comes in one at a time > batchtype: UNLOGGED > queries: > single: > cql: select * from test_data where key = ? and ts = ? limit 1; > series: > cql: select key,ts,val from test_data where key = ? 
limit 10; > {noformat} > The commands to build and run: > {noformat} > ccm create 4_0_test -v git:trunk -n 3 -s > ccm stress user profile=./histo-test-schema.yml > ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4 > # flush the memtable just to get everything on disk > ccm node1 nodetool flush > ccm node2 nodetool flush > ccm node3 nodetool flush > # disable hints for nodes 2 and 3 > ccm node2 nodetool disablehandoff > ccm node3 nodetool disablehandoff > # stop node1 > ccm node1 stop > ccm stress user profile=./histo-test-schema.yml > ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4 > # wait 10 seconds > ccm node1 start > # Note that we are local to ccm's nodetool install 'cause repair preview is > not reported yet > node1/bin/nodetool repair --preview > node1/bin/nodetool repair standard_long test_data > {noformat} > The error outputs from the last repair command follow. First, this is stdout > from node1: > {noformat} > $ node1/bin/nodetool repair standard_long test_data > objc[47876]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java > (0x10274d4c0) and > /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib > (0x1047b64e0). One of the two will be used. Which one is undefined. 
> [2017-10-05 14:31:52,425] Starting repair command #4 > (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with > repair options (parallelism: parallel, primary range: false, incremental: > true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: > [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: > false) > [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 > for range [(3074457345618258602,-9223372036854775808], > (-9223372036854775808,-3074457345618258603], > (-3074457345618258603,3074457345618258602]] failed with error Stream failed > [2017-10-05 14:32:07,048] null > [2017-10-05 14:32:07,050] Repair command #4 finished in 14 seconds > error: Repair job has failed with the error message: [2017-10-05 > 14:32:07,048] null > -- StackTrace -- > java.lang.RuntimeException: Repair job has failed with the
[jira] [Assigned] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi reassigned CASSANDRA-13938: Assignee: Joseph Lynch (was: Jason Brown)
[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918975#comment-16918975 ] Joseph Lynch commented on CASSANDRA-13938: -- I might have cycles to tackle this shortly; if someone else has cycles first, please take it.
[jira] [Commented] (CASSANDRA-15262) server_encryption_options is not backwards compatible with 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918959#comment-16918959 ] Joseph Lynch commented on CASSANDRA-15262: -- This could slip to 4.0-beta if we had to, but it is going to be annoying for folks testing with TLS (it was for us). > server_encryption_options is not backwards compatible with 3.11 > --- > > Key: CASSANDRA-15262 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15262 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Normal > Fix For: 4.0, 4.0-alpha > > > The current `server_encryption_options` configuration options are as follows: > {noformat} > server_encryption_options: > # set to true for allowing secure incoming connections > enabled: false > # If enabled and optional are both set to true, encrypted and unencrypted > connections are handled on the storage_port > optional: false > # if enabled, will open up an encrypted listening socket on > ssl_storage_port. Should be used > # during upgrade to 4.0; otherwise, set to false. > enable_legacy_ssl_storage_port: false > # on outbound connections, determine which type of peers to securely > connect to. 'enabled' must be set to true. > internode_encryption: none > keystore: conf/.keystore > keystore_password: cassandra > truststore: conf/.truststore > truststore_password: cassandra > # More advanced defaults below: > # protocol: TLS > # store_type: JKS > # cipher_suites: > [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA] > # require_client_auth: false > # require_endpoint_verification: false > {noformat} > A couple of issues here: > 1. optional defaults to false, which will break existing TLS configurations > for (from what I can tell) no particularly good reason > 2. 
The provided protocol and cipher suites are not good ideas (in particular, > encouraging anyone to use CBC ciphers is a bad plan). > I propose that before the 4.0 cut we fix up server_encryption_options and even > client_encryption_options: > # Change the default {{optional}} setting to true. As the new Netty code > intelligently decides whether to open a TLS connection, this is the more > sensible default (and it saves operators a step while transitioning to TLS) > # Update the defaults to what Netty actually defaults to -- This message was sent by Atlassian Jira (v8.3.2#803003) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
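As a hedged illustration of the CBC concern above, a GCM-based `cipher_suites` list might look like the sketch below. The suite names are standard TLS registry identifiers, but the exact selection is an assumption for illustration, not the agreed 4.0 default.

```yaml
# Hypothetical replacement for the CBC-based suites quoted above;
# the particular selection is an assumption, not a committed default.
server_encryption_options:
  internode_encryption: all
  keystore: conf/.keystore
  keystore_password: cassandra
  truststore: conf/.truststore
  truststore_password: cassandra
  protocol: TLS
  cipher_suites:
    - TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
    - TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
    - TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384
```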
[jira] [Commented] (CASSANDRA-15294) Allow easy use of custom security providers
[ https://issues.apache.org/jira/browse/CASSANDRA-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918955#comment-16918955 ] Joseph Lynch commented on CASSANDRA-15294: -- Yes, I think after the alpha cuts I should have cycles to add this in; since it doesn't involve any backwards-incompatible API changes I can do it before beta. I'd like to add the configuration capability to 3.0/3.11/trunk if possible, but I think people might object to it being in 3.0 ... If no-one objects I'll just make patches for all three. > Allow easy use of custom security providers > --- > > Key: CASSANDRA-15294 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15294 > Project: Cassandra > Issue Type: Improvement > Components: Local/Config >Reporter: Joseph Lynch >Priority: Normal > > As more users are switching to using {{AES-GCM}} TLS they are increasingly > running into extremely poor performance with the JDK implementations (e.g. > [JDK-8046943|https://bugs.openjdk.java.net/browse/JDK-8046943]). It's not > just TLS either; generally speaking, Java crypto can be really slow, including > for example MD5 hashing, which powers our digests (CASSANDRA-14611). > There have been a few community attempts to fix this via custom Java > security providers, for example Google's > [conscrypt|https://github.com/google/conscrypt] and recently Amazon's > [ACCP|https://github.com/corretto/amazon-corretto-crypto-provider], which are > basically portions of OpenSSL/BoringSSL that are statically linked in and > exposed via JNI. These approaches are similar in spirit to what > [netty-tcnative|https://github.com/netty/netty-tcnative] is doing for TLS in > C* trunk. > Since there may be tradeoffs to using various providers for various functions > (e.g. {{conscrypt}} may be faster or slower than {{accp}} in certain use > cases, and in other cases you may want to use JDK providers for ease of > upgrading) it would be useful if Cassandra supported pluggable providers per > use case. 
For example we could use {{conscrypt}} for TLS, {{accp}} for MD5 > digesting, and the {{SUN}} provider for everything else. There is a small > amount of JVM wiring that needs to be done for this, and it could unlock > 10-25% CPU capacity improvements. > We can then use this pluggability to test different providers, and if one is > strictly dominant we can just check that one in under libs/ and default to it.
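The "small amount of JVM wiring" mentioned above can be sketched with the standard `java.security` API. A third-party provider such as Conscrypt or ACCP would be registered with `Security.insertProviderAt(provider, 1)`, but those classes are not on a stock JDK, so this hedged sketch only uses the built-in providers to show per-use-case selection; it is an illustration, not Cassandra's actual wiring.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.Provider;
import java.security.Security;

// Sketch of pluggable-provider wiring using only JDK built-ins.
// A real deployment would register e.g. Conscrypt or ACCP first:
//   Security.insertProviderAt(thirdPartyProvider, 1);
public class ProviderWiring {

    /** Name of the provider that would serve the given digest algorithm. */
    static String digestProviderName(String algorithm) throws NoSuchAlgorithmException {
        return MessageDigest.getInstance(algorithm).getProvider().getName();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Installed providers, in the priority order the JVM consults them.
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName());
        }
        // MD5 (which powers Cassandra's digests) resolves to whichever
        // provider wins by priority -- the piece that could be made pluggable.
        System.out.println("MD5 via " + digestProviderName("MD5"));
    }
}
```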
[jira] [Updated] (CASSANDRA-15146) Transitional TLS server configuration options are overly complex
[ https://issues.apache.org/jira/browse/CASSANDRA-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-15146: - Fix Version/s: 4.0-beta > Transitional TLS server configuration options are overly complex > > > Key: CASSANDRA-15146 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15146 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption, Local/Config >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Normal > Fix For: 4.0, 4.0-beta > > > It appears that as part of the port from transitional client TLS to transitional > server TLS in CASSANDRA-10404 (the ability to switch a cluster to using > {{internode_encryption}} without listening on two ports and without downtime) > we carried the {{enabled}} setting over from the client implementation. I > believe that the {{enabled}} option is redundant to {{internode_encryption}} > and {{optional}} and it should therefore be removed prior to the 4.0 release, > where we will have to start respecting that interface. > Current trunk yaml: > {noformat} > server_encryption_options: > # set to true for allowing secure incoming connections > enabled: false > # If enabled and optional are both set to true, encrypted and unencrypted > connections are handled on the storage_port > optional: false > # if enabled, will open up an encrypted listening socket on > ssl_storage_port. Should be used > # during upgrade to 4.0; otherwise, set to false. > enable_legacy_ssl_storage_port: false > # on outbound connections, determine which type of peers to securely > connect to. 'enabled' must be set to true. > internode_encryption: none > keystore: conf/.keystore > keystore_password: cassandra > truststore: conf/.truststore > truststore_password: cassandra > {noformat} > I propose we eliminate {{enabled}} and just use {{optional}} and > {{internode_encryption}} to determine the listener setup. 
I also propose we > change the default of {{optional}} to true. We could also rename > {{optional}} since it's a new option, but I think it's good to stay consistent > with the client and use {{optional}}. > ||optional||internode_encryption||description|| > |true|none|(default) No encryption is used, but if a server reaches out with > it we'll use it| > |false|dc|Encryption is required for inter-dc communication, but not intra-dc| > |false|all|Encryption is required for all communication| > |false|none|We only listen for unencrypted connections| > |true|dc|Encryption is used for inter-dc communication but is not required| > |true|all|Encryption is used for all communication but is not required| > From these states it is clear when we should be accepting TLS connections > (all except for false and none) as well as when we must enforce it. > To transition without downtime from an unencrypted cluster to an encrypted > cluster the user would do the following: > 1. After adding valid truststores, change {{internode_encryption}} to the > desired level of encryption (recommended {{all}}) and restart Cassandra > 2. Change {{optional=false}} and restart Cassandra to enforce #1 > If {{optional}} defaulted to {{false}} as it does right now we'd need a third > restart to first change {{optional}} to {{true}}, which given my > understanding of the OptionalSslHandler isn't really relevant.
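The two-restart transition described above can be sketched in yaml. This assumes the proposal is adopted ({{enabled}} removed, {{optional}} defaulting to true); the keys follow the trunk snippet quoted in the description, but the sketch is illustrative, not the shipped configuration.

```yaml
# Step 1: advertise and prefer TLS, but still accept plaintext peers
# (requires valid keystores/truststores on every node first).
server_encryption_options:
  internode_encryption: all   # desired end state
  optional: true              # plaintext still accepted during rollout
  keystore: conf/.keystore
  keystore_password: cassandra
  truststore: conf/.truststore
  truststore_password: cassandra

# Step 2, after a rolling restart of all nodes: enforce TLS.
#   optional: false
```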
[jira] [Updated] (CASSANDRA-15262) server_encryption_options is not backwards compatible with 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-15262: - Fix Version/s: 4.0-alpha
[jira] [Updated] (CASSANDRA-14764) Evaluate 12 Node Breaking Point, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-14764: - Fix Version/s: 4.0-beta > Evaluate 12 Node Breaking Point, compression=none, encryption=none, > coalescing=off > -- > > Key: CASSANDRA-14764 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14764 > Project: Cassandra > Issue Type: Sub-task > Components: Legacy/Streaming and Messaging >Reporter: Joseph Lynch >Assignee: Vinay Chella >Priority: Normal > Fix For: 4.0-beta > > Attachments: i-03341e1c52de6ea3e-after-queue-change.svg, > i-07cd92e844d66d801-after-queue-bound.svg, i-07cd92e844d66d801-hint-play.svg, > i-07cd92e844d66d801-uninlined-with-jvm-methods.svg, ttop.txt > > > *Setup:* > * Cassandra: 12 (2*6) node i3.xlarge AWS instances (4 cpu cores, 30GB ram) > running cassandra trunk off of jasobrown/14503 jdd7ec5a2 (Jason's patched > internode messaging branch) vs the same footprint running 3.0.17 > * Two datacenters with 100ms latency between them > * No compression, encryption, or coalescing turned on > *Test #1:* > ndbench sent 1.5k QPS at a coordinator level to one datacenter (RF=3*2 = 6 so > 3k global replica QPS) of 4kb single partition BATCH mutations at LOCAL_ONE. > This represents about 250 QPS per coordinator in the first datacenter, or 60 > QPS per core. The goal was to observe P99 write and read latencies under > various QPS. > *Result:* > The good news is that since the CASSANDRA-14503 changes, instead of keeping the > mutations on heap we put the messages into hints and don't run out of > memory. The bad news is that the {{MessagingService-NettyOutbound-Thread}}s > would occasionally enter a degraded state where they would just spin on a > core. I've attached flame graphs showing the CPU state as [~jasobrown] > applied fixes to the {{OutboundMessagingConnection}} class. 
> *Follow Ups:* > [~jasobrown] has committed a number of fixes onto his > {{jasobrown/14503-collab}} branch including: > 1. Limiting the amount of time spent dequeuing messages if they are expired > (previously, if messages entered the queue faster than we could dequeue them > we'd just loop infinitely on the consumer side) > 2. Don't call {{dequeueMessages}} from within callbacks created by > {{dequeueMessages}}. > We're continuing to use CPU flamegraphs to figure out where we're looping and > fixing bugs as we find them.
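The first fix listed above (bounding the time spent dropping expired messages so that a fast producer cannot pin the consumer in a loop) can be sketched generically. The queue element type and the nanosecond budget here are assumptions for illustration; this is not the actual {{OutboundMessagingConnection}} code.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Generic sketch of a time-bounded drain of expired entries: drop expired
// heads of the queue, but never exceed a wall-clock budget, so the consumer
// yields instead of spinning while producers keep racing it.
public class BoundedExpiredDrain {
    static final long BUDGET_NANOS = 2_000_000; // assumed 2ms budget

    /** Drop expired heads of the queue; returns how many were dropped. */
    static int drainExpired(Queue<Long> expiryDeadlines, long nowNanos) {
        int dropped = 0;
        long start = System.nanoTime();
        Long head;
        while ((head = expiryDeadlines.peek()) != null && head <= nowNanos) {
            expiryDeadlines.poll();
            dropped++;
            if (System.nanoTime() - start > BUDGET_NANOS)
                break; // budget exhausted: stop instead of looping forever
        }
        return dropped;
    }

    public static void main(String[] args) {
        Queue<Long> q = new ArrayDeque<>();
        q.add(5L); q.add(10L); q.add(100L); // deadlines in fake "nanos"
        int dropped = drainExpired(q, 50L);  // entries <= 50 are expired
        System.out.println("dropped=" + dropped + " remaining=" + q.size());
    }
}
```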
[jira] [Updated] (CASSANDRA-14747) Evaluate 200 node, compression=none, encryption=none, coalescing=off
[ https://issues.apache.org/jira/browse/CASSANDRA-14747?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-14747: - Fix Version/s: 4.0-beta > Evaluate 200 node, compression=none, encryption=none, coalescing=off > - > > Key: CASSANDRA-14747 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14747 > Project: Cassandra > Issue Type: Sub-task > Components: Legacy/Testing >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Normal > Fix For: 4.0-beta > > Attachments: 3.0.17-QPS.png, 4.0.1-QPS.png, > 4.0.11-after-jolynch-tweaks.svg, 4.0.12-after-unconditional-flush.svg, > 4.0.15-after-sndbuf-fix.svg, 4.0.7-before-my-changes.svg, > 4.0_errors_showing_heap_pressure.txt, > 4.0_heap_histogram_showing_many_MessageOuts.txt, > i-0ed2acd2dfacab7c1-after-looping-fixes.svg, > trunk_14503_v2_cpuflamegraph.svg, trunk_vs_3.0.17_latency_under_load.png, > ttop_NettyOutbound-Thread_spinning.txt, > useast1c-i-0e1ddfe8b2f769060-mutation-flame.svg, > useast1e-i-08635fa1631601538_flamegraph_96node.svg, > useast1e-i-08635fa1631601538_ttop_netty_outbound_threads_96nodes, > useast1e-i-08635fa1631601538_uninlinedcpuflamegraph.0_96node_60sec_profile.svg > > > Tracks evaluating a 200 node cluster with all internode settings off (no > compression, no encryption, no coalescing).
[jira] [Updated] (CASSANDRA-14746) Ensure Netty Internode Messaging Refactor is Solid
[ https://issues.apache.org/jira/browse/CASSANDRA-14746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-14746: - Fix Version/s: 4.0-beta > Ensure Netty Internode Messaging Refactor is Solid > -- > > Key: CASSANDRA-14746 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14746 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Streaming and Messaging >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Normal > Labels: 4.0-QA > Fix For: 4.0, 4.0-beta > > > Before we release 4.0 let's ensure that the internode messaging refactor is > 100% solid. As internode messaging is naturally used in many code paths and > widely configurable we have a large number of cluster configurations and test > configurations that must be vetted. > We plan to vary the following: > * Version of Cassandra 3.0.17 vs 4.0-alpha > * Cluster sizes with *multi-dc* deployments ranging from 6 - 100 nodes > * Client request rates varying between 1k QPS and 100k QPS of varying sizes > and shapes (BATCH, INSERT, SELECT point, SELECT range, etc ...) > * Internode compression > * Internode SSL (as well as openssl vs jdk) > * Internode Coalescing options > We are looking to measure the following as appropriate: > * Latency distributions of reads and writes (lower is better) > * Scaling limit, aka maximum throughput before violating p99 latency > deadline of 10ms @ LOCAL_QUORUM, on a fixed hardware deployment for 100% > writes, 100% reads and 50-50 writes+reads (higher is better) > * Thread counts (lower is better) > * Context switches (lower is better) > * On-CPU time of tasks (higher periods without context switch is better) > * GC allocation rates / throughput for a fixed size heap (lower allocation > better) > * Streaming recovery time for a single node failure, i.e. 
can Cassandra > saturate the NIC > > The goal is that 4.0 should have better latency, more throughput, fewer > threads, fewer context switches, less GC allocation, and faster recovery > time. I'm putting Jason Brown as the reviewer since he implemented most of > the internode refactor. > Current collaborators driving this QA task: Dinesh Joshi, Jordan West, Joey > Lynch (Netflix), Vinay Chella (Netflix) > Owning committer(s): Jason Brown
[jira] [Updated] (CASSANDRA-15181) Ensure Nodes can Start and Stop
[ https://issues.apache.org/jira/browse/CASSANDRA-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-15181: - Fix Version/s: 4.0-beta > Ensure Nodes can Start and Stop > --- > > Key: CASSANDRA-15181 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15181 > Project: Cassandra > Issue Type: Sub-task > Components: Legacy/Streaming and Messaging, Test/benchmark >Reporter: Joseph Lynch >Assignee: Vinay Chella >Priority: High > Fix For: 4.0-beta > > > Let's load a cluster up with data and start killing nodes. We can do hard > failures (node terminations) and soft failures (process kills). We plan to > observe the following: > * Can nodes successfully bootstrap? > * How long does it take to bootstrap? > * What are the effects of TLS on and off (e.g. on stream time)? > * Are hints properly played after a node restart? > * Do nodes properly shut down and start back up?
[jira] [Updated] (CASSANDRA-14688) Update protocol spec and class level doc with protocol checksumming details
[ https://issues.apache.org/jira/browse/CASSANDRA-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-14688: - Fix Version/s: 4.0-beta > Update protocol spec and class level doc with protocol checksumming details > --- > > Key: CASSANDRA-14688 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14688 > Project: Cassandra > Issue Type: Task > Components: Legacy/Documentation and Website >Reporter: Sam Tunnicliffe >Assignee: Sam Tunnicliffe >Priority: Normal > Fix For: 4.0, 4.0-beta > > > CASSANDRA-13304 provides an option to add checksumming to the frame body of > native protocol messages. The native protocol spec needs to be updated to > reflect this ASAP. We should also verify that the javadoc comments describing > the on-wire format in > {{o.a.c.transport.frame.checksum.ChecksummingTransformer}} are up to date.
[jira] [Updated] (CASSANDRA-15228) Commit Log should not use sync markers
[ https://issues.apache.org/jira/browse/CASSANDRA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-15228: - Fix Version/s: 4.0-alpha > Commit Log should not use sync markers > -- > > Key: CASSANDRA-15228 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15228 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log >Reporter: Benedict >Priority: Normal > Fix For: 4.0, 4.0-alpha > > > The sync markers existed to permit file re-use. Since we no longer re-use > files, they no longer provide any value. However, they _can_ corrupt the > commit log for replay in the event of a process crash. Before we release > 4.0, we should ideally remove the sync markers entirely.
[jira] [Commented] (CASSANDRA-14801) calculatePendingRanges no longer safe for multiple adjacent range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918949#comment-16918949 ] Joseph Lynch commented on CASSANDRA-14801: -- [~benedict] do you think this should block the first alpha, or can it wait for beta? > calculatePendingRanges no longer safe for multiple adjacent range movements > --- > > Key: CASSANDRA-14801 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14801 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination, Legacy/Distributed Metadata >Reporter: Benedict >Priority: Normal > Fix For: 4.0 > > > Correctness depended upon the narrowing to a {{Set}}, > which we no longer do - we maintain a collection of all {{Replica}}. Our > {{RangesAtEndpoint}} collection built by {{getPendingRanges}} can as a result > contain the same endpoint multiple times; and our {{EndpointsForToken}} > obtained by {{TokenMetadata.pendingEndpointsFor}} may fail to be constructed, > resulting in cluster-wide failures for writes to the affected token ranges > for the duration of the range movement.
[jira] [Updated] (CASSANDRA-10190) Python 3 support for cqlsh
[ https://issues.apache.org/jira/browse/CASSANDRA-10190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-10190: - Fix Version/s: 4.0-alpha > Python 3 support for cqlsh > -- > > Key: CASSANDRA-10190 > URL: https://issues.apache.org/jira/browse/CASSANDRA-10190 > Project: Cassandra > Issue Type: Improvement > Components: Legacy/Tools >Reporter: Andrew Pennebaker >Assignee: Patrick Bannister >Priority: Normal > Labels: cqlsh > Fix For: 4.0-alpha > > Attachments: coverage_notes.txt > > > Users who operate in a Python 3 environment may have trouble launching cqlsh. > Could we please update cqlsh's syntax to run in Python 3? > As a workaround, users can set up pyenv, and cd to a directory with a > .python-version containing "2.7". But it would be nice if cqlsh supported > modern Python versions out of the box.
[jira] [Updated] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-15213: - Fix Version/s: (was: 4.0) 4.0-beta > DecayingEstimatedHistogramReservoir Inefficiencies > -- > > Key: CASSANDRA-15213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15213 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Benedict >Priority: Normal > Fix For: 4.0-beta > > > * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user > schemas, and this will grow significantly under contention and user schemas > with many tables. This is because {{LongAdder}} is a very heavy class > designed for single contended values. > ** This can likely be improved significantly, without significant loss of > performance in the contended case, by simply increasing the size of our > primitive backing array and providing multiple buckets, with each thread > picking a bucket to increment, or simply multiple backing arrays. Probably a > better way still to do this would be to introduce some competition detection > to the update, much like {{LongAdder}} utilises, that increases the number of > backing arrays under competition. > ** To save memory this approach could partition the space into chunks that > are likely to be updated together, so that we do not need to duplicate the > entire array under competition. > * Similarly, binary search is costly, and a measurable cost as a share of the > new networking work (without filtering it was > 10% of the CPU used overall). > We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, > saving the random memory access costs.
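The floor(log2 n / log2 1.2) approximation mentioned in the last bullet of CASSANDRA-15213 can be sketched as below. This is an illustrative sketch only, not the patch that landed; the class and method names are hypothetical, and it uses floor(log2 n) from the bit length rather than the exact log, which is what makes it cheap:

```java
// Hypothetical sketch of the cheap bucket-index approximation described in
// CASSANDRA-15213. DecayingEstimatedHistogramReservoir's buckets grow by a
// factor of ~1.2, so the bucket for a value n is roughly
// floor(log2(n) / log2(1.2)). Long.numberOfLeadingZeros typically compiles
// down to a single LZCNT instruction, so there is no random memory access,
// unlike a binary search over the bucket-offset array.
public class FastBucketIndex {
    private static final double LOG2_OF_1_2 = Math.log(1.2) / Math.log(2);

    public static int approxIndex(long value) {
        if (value <= 0)
            return 0; // everything non-positive lands in the first bucket
        int log2 = 63 - Long.numberOfLeadingZeros(value); // floor(log2 value)
        return (int) (log2 / LOG2_OF_1_2);
    }

    public static void main(String[] args) {
        System.out.println(approxIndex(1));    // 0
        System.out.println(approxIndex(1024)); // 38, since 10 / log2(1.2) ~= 38.0
    }
}
```

Because floor(log2 n) is used, the result is only an approximation of the true bucket index, which matches the ticket's framing of the idea as a cheap approximation rather than an exact replacement for the search.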
[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benedict updated CASSANDRA-13938: - Fix Version/s: (was: 4.0) 4.0-alpha > Default repair is broken, crashes other nodes participating in repair (in > trunk) > > > Key: CASSANDRA-13938 > URL: https://issues.apache.org/jira/browse/CASSANDRA-13938 > Project: Cassandra > Issue Type: Bug > Components: Consistency/Repair >Reporter: Nate McCall >Assignee: Jason Brown >Priority: Urgent > Fix For: 4.0-alpha > > Attachments: 13938.yaml, test.sh > > > Running through a simple scenario to test some of the new repair features, I > was not able to make a repair command work. Further, the exception seemed to > trigger a nasty failure state that basically shuts down the netty connections > for messaging *and* CQL on the nodes transferring back data to the node being > repaired. The following steps reproduce this issue consistently. > Cassandra stress profile (probably not necessary, but this one provides a > really simple schema and consistent data shape): > {noformat} > keyspace: standard_long > keyspace_definition: | > CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', > 'replication_factor':3}; > table: test_data > table_definition: | > CREATE TABLE test_data ( > key text, > ts bigint, > val text, > PRIMARY KEY (key, ts) > ) WITH COMPACT STORAGE AND > CLUSTERING ORDER BY (ts DESC) AND > bloom_filter_fp_chance=0.01 AND > caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND > comment='' AND > dclocal_read_repair_chance=0.00 AND > gc_grace_seconds=864000 AND > read_repair_chance=0.00 AND > compaction={'class': 'SizeTieredCompactionStrategy'} AND > compression={'sstable_compression': 'LZ4Compressor'}; > columnspec: > - name: key > population: uniform(1..5000) # 50 million records available > - name: ts > cluster: gaussian(1..50) # Up to 50 inserts per record > - name: val > population: gaussian(128..1024) # varrying size of value 
data > insert: > partitions: fixed(1) # only one insert per batch for individual partitions > select: fixed(1)/1 # each insert comes in one at a time > batchtype: UNLOGGED > queries: > single: > cql: select * from test_data where key = ? and ts = ? limit 1; > series: > cql: select key,ts,val from test_data where key = ? limit 10; > {noformat} > The commands to build and run: > {noformat} > ccm create 4_0_test -v git:trunk -n 3 -s > ccm stress user profile=./histo-test-schema.yml > ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4 > # flush the memtable just to get everything on disk > ccm node1 nodetool flush > ccm node2 nodetool flush > ccm node3 nodetool flush > # disable hints for nodes 2 and 3 > ccm node2 nodetool disablehandoff > ccm node3 nodetool disablehandoff > # stop node1 > ccm node1 stop > ccm stress user profile=./histo-test-schema.yml > ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4 > # wait 10 seconds > ccm node1 start > # Note that we are local to ccm's nodetool install 'cause repair preview is > not reported yet > node1/bin/nodetool repair --preview > node1/bin/nodetool repair standard_long test_data > {noformat} > The error outputs from the last repair command follow. First, this is stdout > from node1: > {noformat} > $ node1/bin/nodetool repair standard_long test_data > objc[47876]: Class JavaLaunchHelper is implemented in both > /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java > (0x10274d4c0) and > /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib > (0x1047b64e0). One of the two will be used. Which one is undefined. 
> [2017-10-05 14:31:52,425] Starting repair command #4 > (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with > repair options (parallelism: parallel, primary range: false, incremental: > true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: > [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: > false) > [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 > for range [(3074457345618258602,-9223372036854775808], > (-9223372036854775808,-3074457345618258603], > (-3074457345618258603,3074457345618258602]] failed with error Stream failed > [2017-10-05 14:32:07,048] null > [2017-10-05 14:32:07,050] Repair command #4 finished in 14 seconds > error: Repair job has failed with the error message: [2017-10-05 > 14:32:07,048] null > -- StackTrace -- > java.lang.RuntimeException: Repair job has failed with the error message: > [2017-10-05 14:32:07,048] null > at
[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-13938: - Fix Version/s: (was: 4.0)
[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-13938: - Fix Version/s: 4.0
[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-13938: - Fix Version/s: (was: 4.x) 4.0
[jira] [Updated] (CASSANDRA-15146) Transitional TLS server configuration options are overly complex
[ https://issues.apache.org/jira/browse/CASSANDRA-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-15146: - Fix Version/s: 4.0 > Transitional TLS server configuration options are overly complex > > > Key: CASSANDRA-15146 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15146 > Project: Cassandra > Issue Type: Bug > Components: Feature/Encryption, Local/Config >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Normal > Fix For: 4.0 > > > It appears as part of the port from transitional client TLS to transitional > server TLS in CASSANDRA-10404 (the ability to switch a cluster to using > {{internode_encryption}} without listening on two ports and without downtime) > we carried the {{enabled}} setting over from the client implementation. I > believe that the {{enabled}} option is redundant to {{internode_encryption}} > and {{optional}} and it should therefore be removed prior to the 4.0 release > where we will have to start respecting that interface. > Current trunk yaml: > {noformat} > server_encryption_options: > > # set to true for allowing secure incoming connections > > enabled: false > > # If enabled and optional are both set to true, encrypted and unencrypted > connections are handled on the storage_port > optional: false > > > > > # if enabled, will open up an encrypted listening socket on > ssl_storage_port. Should be used > # during upgrade to 4.0; otherwise, set to false. > > enable_legacy_ssl_storage_port: false > > # on outbound connections, determine which type of peers to securely > connect to. 'enabled' must be set to true. > internode_encryption: none > > keystore: conf/.keystore > > keystore_password: cassandra > > truststore: conf/.truststore > > truststore_password: cassandra > {noformat} > I propose we eliminate {{enabled}} and just use {{optional}} and > {{internode_encryption}} to determine the listener setup. I also propose we > change the default of {{optional}} to true. 
We could also rename > {{optional}} since it's a new option but I think it's good to stay consistent > with the client and use {{optional}}. > ||optional||internode_encryption||description|| > |true|none|(default) No encryption is used but if a server reaches out with > it we'll use it| > |false|dc|Encryption is required for inter-dc communication, but not intra-dc| > |false|all|Encryption is required for all communication| > |false|none|We only listen for unencrypted connections| > |true|dc|Encryption is used for inter-dc communication but is not required| > |true|all|Encryption is used for all communication but is not required| > From these states it is clear when we should be accepting TLS connections > (all except for false and none) as well as when we must enforce it. > To transition without downtime from an unencrypted cluster to an encrypted > cluster the user would do the following: > 1. After adding valid truststores, change {{internode_encryption}} to the > desired level of encryption (recommended {{all}}) and restart Cassandra > 2. Change {{optional=false}} and restart Cassandra to enforce #1 > If {{optional}} defaulted to {{false}} as it does right now we'd need a third > restart to first change {{optional}} to {{true}}, which given my > understanding of the OptionalSslHandler isn't really relevant.
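Under the CASSANDRA-15146 proposal above, the server_encryption_options block would collapse to roughly the following. This is a sketch of the proposed shape only, assuming `enabled` is dropped and `optional` defaults to true as the ticket suggests, not the final committed yaml:

```yaml
server_encryption_options:
    # 'enabled' is gone under the proposal: whether TLS is accepted follows
    # from the two settings below.
    # Accept both encrypted and unencrypted connections on storage_port;
    # proposed new default is true.
    optional: true
    # Which outbound peer connections must use TLS: none, dc, or all.
    internode_encryption: all
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
```

With this shape, the two-restart transition described above is just: set `internode_encryption: all` and restart, then set `optional: false` and restart to enforce it.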
[jira] [Updated] (CASSANDRA-15262) server_encryption_options is not backwards compatible with 3.11
[ https://issues.apache.org/jira/browse/CASSANDRA-15262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Joseph Lynch updated CASSANDRA-15262:
-
    Fix Version/s: 4.0

> server_encryption_options is not backwards compatible with 3.11
> ---
>
>         Key: CASSANDRA-15262
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-15262
>     Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>    Reporter: Joseph Lynch
>    Assignee: Joseph Lynch
>    Priority: Normal
>     Fix For: 4.0
>
> The current `server_encryption_options` configuration options are as follows:
> {noformat}
> server_encryption_options:
>     # set to true for allowing secure incoming connections
>     enabled: false
>     # If enabled and optional are both set to true, encrypted and unencrypted connections are handled on the storage_port
>     optional: false
>     # if enabled, will open up an encrypted listening socket on ssl_storage_port. Should be used
>     # during upgrade to 4.0; otherwise, set to false.
>     enable_legacy_ssl_storage_port: false
>     # on outbound connections, determine which type of peers to securely connect to. 'enabled' must be set to true.
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
>     # More advanced defaults below:
>     # protocol: TLS
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     # require_client_auth: false
>     # require_endpoint_verification: false
> {noformat}
> A couple of issues here:
> 1. {{optional}} defaults to false, which will break existing TLS configurations for (from what I can tell) no particularly good reason
> 2. The provided protocol and cipher suites are not good ideas (in particular, encouraging anyone to use CBC ciphers is a bad plan)
> I propose that before the 4.0 cut we fix up server_encryption_options and even client_encryption_options:
> # Change the default {{optional}} setting to true. As the new Netty code intelligently decides to open a TLS connection or not, this is the more sensible default (it saves operators a step while transitioning to TLS as well)
> # Update the defaults to what Netty actually defaults to
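A sketch of what the proposed defaults might look like after both changes. This is illustrative only; the actual defaults are decided on this ticket, and the cipher-suite behavior here is an assumption (defer to the TLS library rather than hard-code a list):

```yaml
server_encryption_options:
    # Proposal #1: default optional to true so existing TLS configurations
    # keep working; the Netty layer negotiates TLS vs plaintext per connection.
    optional: true
    internode_encryption: none
    keystore: conf/.keystore
    keystore_password: cassandra
    truststore: conf/.truststore
    truststore_password: cassandra
    # Proposal #2: drop the hard-coded CBC cipher list and leave
    # cipher_suites unset, deferring to the TLS library's own defaults.
```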
[jira] [Updated] (CASSANDRA-15175) Evaluate 200 node, compression=on, encryption=all
[ https://issues.apache.org/jira/browse/CASSANDRA-15175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph Lynch updated CASSANDRA-15175: - Fix Version/s: 4.0 > Evaluate 200 node, compression=on, encryption=all > - > > Key: CASSANDRA-15175 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15175 > Project: Cassandra > Issue Type: Sub-task > Components: Test/benchmark >Reporter: Joseph Lynch >Assignee: Joseph Lynch >Priority: Normal > Labels: 4.0-QA > Fix For: 4.0 > > Attachments: 30x_14400cRPS-14400cWPS.svg, > 30x_LQ_21600cRPS-14400cWPS.svg, ShortbufferExceptions.png, > cassandra_comparative_performance_all_flamegraphs.html, > image-2019-08-06-14-20-25-140.png, odd_netty_jdk_tls_cpu_usage.png, > trunk_14400cRPS-14400cWPS.svg, trunk_187000cRPS-14400cWPS.svg, > trunk_187kcRPS_14kcWPS.png, trunk_22000cRPS-14400cWPS-jdk.svg, > trunk_22000cRPS-14400cWPS-openssl.svg, trunk_220kcRPS_14kcWPS.png, > trunk_252kcRPS-14kcWPS.png, trunk_93500cRPS-14400cWPS.svg, > trunk_LQ_14400cRPS-14400cWPS.svg, trunk_LQ_21600cRPS-14400cWPS.svg, > trunk_Q_21600cRPS-7200cWPS.svg, trunk_allocation_Q_21k_cRPS.svg, > trunk_vs_30x_125kcRPS_14kcWPS.png, trunk_vs_30x_14kRPS_14kcWPS_load.png, > trunk_vs_30x_14kcRPS_14kcWPS.png, > trunk_vs_30x_14kcRPS_14kcWPS_schedstat_delays.png, > trunk_vs_30x_156kcRPS_14kcWPS.png, trunk_vs_30x_24kcRPS_14kcWPS.png, > trunk_vs_30x_24kcRPS_14kcWPS_load.png, trunk_vs_30x_31kcRPS_14kcWPS.png, > trunk_vs_30x_62kcRPS_14kcWPS.png, trunk_vs_30x_93kcRPS_14kcWPS.png, > trunk_vs_30x_LQ_14kcRPS_14kcWPS.png, trunk_vs_30x_LQ_21kcRPS_14kcWPS.png, > trunk_vs_30x_LQ_64kcRPS_14kcWPS.png, trunk_vs_30x_LQ_jdk_summary.png, > trunk_vs_30x_LQ_openssl_21kcRPS_14kcWPS.png, > trunk_vs_30x_LQ_tcnative_summary.png, trunk_vs_30x_Q_21kcRPS_7200cWPS.png, > trunk_vs_30x_Q_36kcRPS_7200cWPS.png, trunk_vs_30x_Q_tcnative_summary.png, > trunk_vs_30x_summary.png, trunk_vs_30x_wEQ_rLQ_7kcRPS_22kcWPS.png, > trunk_vs_30x_wEQ_rLQ_7kcRPS_58kcWPS.png, > 
trunk_vs_30x_wEQ_rLQ_7kcRPS_7kcWPS.png, trunk_vs_30x_write_LO_7kcRPS_108kcWPS.png, trunk_vs_30x_write_LO_7kcRPS_162kcWPS.png, trunk_vs_30x_write_LO_7kcRPS_72kcWPS.png, trunk_vs_30x_write_LO_7kcRPS_7kcWPS.png, write_scaling_local_one_summary.png, write_scaling_lq_eq_summary.png
>
> Tracks evaluating a 192 node cluster with compression and encryption on.
> First test is a [read scaling test|https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=1336583053]
>
> |Test Setup| |
> |Baseline|3.0.19 @d7d00036|
> |Candidate|trunk @abb0e177|
> | | |
> |Workload| |
> |Write size|4kb random|
> |Read size|4kb random|
> |Per Node Data|110GiB|
> |Generator|ndbench|
> |Key Distribution|Uniform|
> |SSTable Compr|Off|
> |Internode TLS|On (jdk)|
> |Internode Compr|On|
> |Compaction|LCS (320 MiB)|
> |Repair|Off|
> | | |
> |Hardware| |
> |Instance Type|i3.xlarge|
> |Deployment|96 us-east-1, 96 eu-west-1|
> |Region node count|96|
> | | |
> |OS Settings| |
> |IO scheduler|kyber|
> |Net qdisc|tc-fq|
> |readahead|32kb|
> |Java Version|OpenJDK 1.8.0_202 (Zulu)|
> | | |
> Second test is a [write scaling test|https://docs.google.com/spreadsheets/d/1Vq_wC2q-rcG7UWim-t2leZZ4GgcuAjSREMFbG0QGy20/edit#gid=428858608]:
[cassandra-builds] branch master updated: Add OpenJDK 11 to CentOS docker image
This is an automated email from the ASF dual-hosted git repository.

mshuler pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/cassandra-builds.git

The following commit(s) were added to refs/heads/master by this push:
     new 6b63414  Add OpenJDK 11 to CentOS docker image
6b63414 is described below

commit 6b63414f5101ecc02f51ebb8a8a1cd996e1df27f
Author: Michael Shuler
AuthorDate: Thu Aug 29 13:55:54 2019 -0500

    Add OpenJDK 11 to CentOS docker image
---
 docker/centos7-image.docker | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docker/centos7-image.docker b/docker/centos7-image.docker
index c622f6d..1d80216 100644
--- a/docker/centos7-image.docker
+++ b/docker/centos7-image.docker
@@ -17,6 +17,7 @@ RUN yum -y install \
     git \
     java-1.7.0-openjdk-devel \
     java-1.8.0-openjdk-devel \
+    java-11-openjdk-devel \
     make \
     rpm-build \
     sudo
[jira] [Updated] (CASSANDRA-15294) Allow easy use of custom security providers
[ https://issues.apache.org/jira/browse/CASSANDRA-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dinesh Joshi updated CASSANDRA-15294:
-
    Status: Awaiting Feedback  (was: Triage Needed)

> Allow easy use of custom security providers
> ---
>
>         Key: CASSANDRA-15294
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-15294
>     Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>    Reporter: Joseph Lynch
>    Priority: Normal
>
> As more users are switching to using {{AES-GCM}} TLS they are increasingly running into extremely poor performance with the JDK implementations (e.g. [JDK-8046943|https://bugs.openjdk.java.net/browse/JDK-8046943]). It's not just TLS either; generally speaking, Java crypto can be really slow, including for example MD5 hashing, which powers our digests (CASSANDRA-14611).
> There have been a few community attempts to fix this via custom Java security providers, for example Google's [conscrypt|https://github.com/google/conscrypt] and recently Amazon's [ACCP|https://github.com/corretto/amazon-corretto-crypto-provider], which are basically portions of OpenSSL/BoringSSL that are statically linked in and exposed via JNI. These approaches are similar in spirit to what [netty-tcnative|https://github.com/netty/netty-tcnative] is doing for TLS in C* trunk.
> Since there may be tradeoffs to using various providers for various functions (e.g. {{conscrypt}} may be faster or slower than {{accp}} in certain use cases, and in other cases you may want to use JDK providers for ease of upgrading), it would be useful if Cassandra supported pluggable providers per use case. For example, we could use {{conscrypt}} for TLS, {{accp}} for MD5 digesting, and the {{SUN}} provider for everything else. There is a small amount of JVM wiring that needs to be done for this, and it could unlock 10-25% CPU capacity improvements.
> We can then use this pluggability to test different providers, and if one is strictly dominant we can just check that one into libs and default to it.
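The "small amount of JVM wiring" mentioned above is essentially {{java.security.Security}} provider registration. A minimal sketch of the mechanism follows; the conscrypt/ACCP insertion lines are illustrative assumptions and left commented out, so only JDK built-in providers are exercised:

```java
import java.security.MessageDigest;
import java.security.Provider;
import java.security.Security;

public class ProviderWiring {
    // Return the name of the provider that currently serves MD5 digests.
    static String md5ProviderName() throws Exception {
        return MessageDigest.getInstance("MD5").getProvider().getName();
    }

    public static void main(String[] args) throws Exception {
        // Providers are consulted in preference order; a native-backed
        // provider would be inserted at position 1, e.g. (hypothetical wiring):
        // Security.insertProviderAt(new org.conscrypt.OpenSSLProvider(), 1);
        // Security.insertProviderAt(
        //     new com.amazon.corretto.crypto.provider.AmazonCorrettoCryptoProvider(), 1);
        for (Provider p : Security.getProviders()) {
            System.out.println(p.getName());
        }
        // Per-use-case selection: pin a specific provider for one algorithm
        // instead of relying on the global preference order.
        MessageDigest pinned = MessageDigest.getInstance("MD5", "SUN");
        System.out.println("MD5 via " + pinned.getProvider().getName());
    }
}
```

Pinning per algorithm, as in the last two lines, is what would let Cassandra mix providers (e.g. one for TLS, another for digests) rather than committing to a single global order.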
[jira] [Commented] (CASSANDRA-15294) Allow easy use of custom security providers
[ https://issues.apache.org/jira/browse/CASSANDRA-15294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918869#comment-16918869 ]

Dinesh Joshi commented on CASSANDRA-15294:
--

[~jolynch], given the advantages, I think this is worth adding. Do you want to propose a patch?
[jira] [Commented] (CASSANDRA-15172) LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/CASSANDRA-15172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16918545#comment-16918545 ]

Shalom commented on CASSANDRA-15172:

Thanks a lot for further clarifying, [~benedict]. (I hope you enjoyed your vacation :) )
Just to set my mind straight: the issue occurs only when there are mixed versions in the cluster, so if I upgrade all binaries to 3.11 it won't recur, even if I haven't upgraded the SSTables yet. Is my assumption correct?

> LegacyLayout RangeTombstoneList throws IndexOutOfBoundsException
>
>         Key: CASSANDRA-15172
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-15172
>     Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Other
>    Reporter: Shalom
>    Assignee: Benedict
>    Priority: Normal
>     Fix For: 3.0.19, 3.11.5
>
> Hi All,
> This is the first time I open an issue, so apologies if I'm not following the rules properly.
>
> After upgrading a node from version 2.1.21 to 3.11.4, we've started seeing a lot of AbstractLocalAwareExecutorService exceptions. This happened right after the node successfully started up with the new 3.11.4 binaries.
> {noformat}
> INFO [main] 2019-06-05 04:41:37,730 Gossiper.java:1715 - No gossip backlog; proceeding
> INFO [main] 2019-06-05 04:41:38,036 NativeTransportService.java:70 - Netty using native Epoll event loop
> INFO [main] 2019-06-05 04:41:38,117 Server.java:155 - Using Netty Version: [netty-buffer=netty-buffer-4.0.44.Final.452812a, netty-codec=netty-codec-4.0.44.Final.452812a, netty-codec-haproxy=netty-codec-haproxy-4.0.44.Final.452812a, netty-codec-http=netty-codec-http-4.0.44.Final.452812a, netty-codec-socks=netty-codec-socks-4.0.44.Final.452812a, netty-common=netty-common-4.0.44.Final.452812a, netty-handler=netty-handler-4.0.44.Final.452812a, netty-tcnative=netty-tcnative-1.1.33.Fork26.142ecbb, netty-transport=netty-transport-4.0.44.Final.452812a, netty-transport-native-epoll=netty-transport-native-epoll-4.0.44.Final.452812a, netty-transport-rxtx=netty-transport-rxtx-4.0.44.Final.452812a, netty-transport-sctp=netty-transport-sctp-4.0.44.Final.452812a, netty-transport-udt=netty-transport-udt-4.0.44.Final.452812a]
> INFO [main] 2019-06-05 04:41:38,118 Server.java:156 - Starting listening for CQL clients on /0.0.0.0:9042 (unencrypted)...
> INFO [main] 2019-06-05 04:41:38,179 CassandraDaemon.java:556 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it
> INFO [Native-Transport-Requests-21] 2019-06-05 04:41:39,145 AuthCache.java:161 - (Re)initializing PermissionsCache (validity period/update interval/max entries) (2000/2000/1000)
> INFO [OptionalTasks:1] 2019-06-05 04:41:39,729 CassandraAuthorizer.java:409 - Converting legacy permissions data
> INFO [HANDSHAKE-/10.10.10.8] 2019-06-05 04:41:39,808 OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.8
> INFO [HANDSHAKE-/10.10.10.9] 2019-06-05 04:41:39,808 OutboundTcpConnection.java:561 - Handshaking version with /10.10.10.9
> INFO [HANDSHAKE-dc1_02/10.10.10.6] 2019-06-05 04:41:39,809 OutboundTcpConnection.java:561 - Handshaking version with dc1_02/10.10.10.6
> WARN [ReadStage-2] 2019-06-05 04:41:39,857 AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread Thread[ReadStage-2,5,main]: {}
> java.lang.ArrayIndexOutOfBoundsException: 1
>     at org.apache.cassandra.db.AbstractBufferClusteringPrefix.get(AbstractBufferClusteringPrefix.java:55)
>     at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSizeCompound(LegacyLayout.java:2545)
>     at org.apache.cassandra.db.LegacyLayout$LegacyRangeTombstoneList.serializedSize(LegacyLayout.java:2522)
>     at org.apache.cassandra.db.LegacyLayout.serializedSizeAsLegacyPartition(LegacyLayout.java:565)
>     at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:446)
>     at org.apache.cassandra.db.ReadResponse$Serializer.serializedSize(ReadResponse.java:352)
>     at org.apache.cassandra.net.MessageOut.payloadSize(MessageOut.java:171)
>     at org.apache.cassandra.net.OutboundTcpConnectionPool.getConnection(OutboundTcpConnectionPool.java:77)
>     at org.apache.cassandra.net.MessagingService.getConnection(MessagingService.java:802)
>     at org.apache.cassandra.net.MessagingService.sendOneWay(MessagingService.java:953)
>     at org.apache.cassandra.net.MessagingService.sendReply(MessagingService.java:929)
>     at org.apache.cassandra.db.ReadCommandVerbHandler.doVerb(ReadCommandVerbHandler.java:62)
>     at