[jira] [Comment Edited] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
[ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016628#comment-17016628 ] Thomas Steinmaurer edited comment on CASSANDRA-15430 at 1/16/20 7:57 AM:
-

[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 here: [https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward, originating from our Astyanax/Thrift legacy days and since moved over to CQL: a BLOB-centric model with our client-side "serializer framework". E.g.:
{noformat}
CREATE TABLE ks."cf" (
    k blob,
    n blob,
    v blob,
    PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (n ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '2'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 259200
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
{noformat}
Regarding queries: it is really just about the write path (batch message processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We have tried single-partition vs. multi-partition batches (I know, bad practice), but single-partition batches didn't have a positive impact on the write path in 3.0 either in our tests. Moving from 2.1 to 3.0 would mean adding ~30-40% more resources for us to handle the same write load sufficiently. Thanks for any help in that area!
was (Author: tsteinmaurer):
[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 here: [https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward, originating from our Astyanax/Thrift legacy days and since moved over to CQL: a BLOB-centric model with our client-side "serializer framework". E.g.:
{noformat}
CREATE TABLE ks."cf" (
    k blob,
    n blob,
    v blob,
    PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (n ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '2'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 259200
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
{noformat}
Regarding queries: it is really just about the write path (batch message processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We have tried single-partition vs. multi-partition batches (I know, bad practice), but single-partition batches didn't have a positive impact on the write path in 3.0 either in our tests. Moving from 2.1 to 3.0 would mean adding ~30-40% more resources for us to handle the same load sufficiently. Thanks for any help in that area!
> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
> Issue Type: Bug
> Reporter: Thomas Steinmaurer
> Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png
>
> In a 6-node loadtest cluster, we have been running a certain production-like workload constantly and sufficiently on 2.1.18. After upgrading one node to 3.0.18 (the remaining 5 are still on 2.1.18 after we saw the sort of regression described below), 3.0.18 is showing increased CPU usage, increased GC, high mutation stage pending tasks, dropped mutation messages ...
> Some specs; all 6 nodes are equally sized:
> * Bare metal, 32 physical cores, 512G RAM
> * Xmx31G, G1, max pause millis = 2000ms
> * cassandra.yaml basically unchanged, thus the same settings in regard to number of threads, compaction throttling etc.
> Following dashboard
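For reference, a single-partition unlogged batch against the kind of table described in the comment above keeps every statement on one partition key, so the coordinator owns the whole mutation. This is an illustrative sketch only (the blob values are made up; the real client traffic goes through the serializer framework mentioned above):
{noformat}
BEGIN UNLOGGED BATCH
    INSERT INTO ks."cf" (k, n, v) VALUES (0x01, 0x0a, 0xcafe);
    INSERT INTO ks."cf" (k, n, v) VALUES (0x01, 0x0b, 0xf00d);
APPLY BATCH;
{noformat}
A multi-partition batch would mix different {{k}} values, forcing the coordinator to fan the mutations out to several replica sets.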
[jira] [Updated] (CASSANDRA-15470) Potential Overflow in DatabaseDescriptor Functions That Convert Between KB/MB & Bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mallika Kulkarni updated CASSANDRA-15470: - Test and Documentation Plan: Unit tests written for newly added validations Status: Patch Available (was: Open) [https://github.com/apache/cassandra/pull/425] > Potential Overflow in DatabaseDescriptor Functions That Convert Between KB/MB > & Bytes > - > > Key: CASSANDRA-15470 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15470 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Jordan West >Assignee: Mallika Kulkarni >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-rc > > Time Spent: 10m > Remaining Estimate: 0h > > {{DatabaseDescriptor}} has several functions that convert between user > supplied sizes in KB/MB and bytes. These are implemented without much > consistency and, while unlikely, several have the potential to overflow since > validation on the input is missing. Meanwhile, some widen the number to a > long correctly. Options include: widening in all places or simply doing > better validation on start up — currently only the lower bound of the valid > range is checked for many of these fields. > List of Affected {{DatabaseDescriptor}} Methods: > * {{getColumnIndexSize}} > * {{getColumnIndexCacheSize}} > * {{getBatchSizeWarnThreshold}} > * {{getNativeTransportFrameBlockSize}} > * {{getRepairSessionSpaceInMegabytes}} > * {{getNativeTransportMaxFrameSize}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
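To make the overflow concrete: with a plain {{int}} multiplication, a user-supplied size of 4 GiB expressed in KB wraps around to 0, while widening to {{long}} before multiplying preserves the value. This is a minimal sketch with hypothetical method names, not the actual {{DatabaseDescriptor}} code:

```java
// Hypothetical helpers illustrating the narrow vs. widened KB-to-bytes
// conversion discussed in the ticket; names are illustrative only.
public class SizeConversion {
    // Narrow conversion: the int multiplication can silently overflow.
    static int kbToBytesNarrow(int kb) {
        return kb * 1024;
    }

    // Widened conversion: promoting to long before multiplying avoids overflow.
    static long kbToBytesWide(int kb) {
        return kb * 1024L;
    }

    public static void main(String[] args) {
        int fourGiBInKb = 4 * 1024 * 1024; // 4 GiB expressed in KB
        System.out.println(kbToBytesNarrow(fourGiBInKb)); // wraps to 0 (2^32 mod 2^32)
        System.out.println(kbToBytesWide(fourGiBInKb));   // 4294967296
    }
}
```

If the return type must stay {{int}}, the alternative the ticket mentions applies: validate the configured value against an upper bound (e.g. {{Integer.MAX_VALUE / 1024}}) at startup instead of only checking the lower bound.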
[jira] [Updated] (CASSANDRA-15470) Potential Overflow in DatabaseDescriptor Functions That Convert Between KB/MB & Bytes
[ https://issues.apache.org/jira/browse/CASSANDRA-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-15470: --- Labels: pull-request-available (was: ) > Potential Overflow in DatabaseDescriptor Functions That Convert Between KB/MB > & Bytes > - > > Key: CASSANDRA-15470 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15470 > Project: Cassandra > Issue Type: Bug > Components: Local/Config >Reporter: Jordan West >Assignee: Mallika Kulkarni >Priority: Normal > Labels: pull-request-available > Fix For: 4.0-rc > > > {{DatabaseDescriptor}} has several functions that convert between user > supplied sizes in KB/MB and bytes. These are implemented without much > consistency and, while unlikely, several have the potential to overflow since > validation on the input is missing. Meanwhile, some widen the number to a > long correctly. Options include: widening in all places or simply doing > better validation on start up — currently only the lower bound of the valid > range is checked for many of these fields. > List of Affected {{DatabaseDescriptor}} Methods: > * {{getColumnIndexSize}} > * {{getColumnIndexCacheSize}} > * {{getBatchSizeWarnThreshold}} > * {{getNativeTransportFrameBlockSize}} > * {{getRepairSessionSpaceInMegabytes}} > * {{getNativeTransportMaxFrameSize}}
[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
[ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016628#comment-17016628 ] Thomas Steinmaurer commented on CASSANDRA-15430:

[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 here: [https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward, originating from our Astyanax/Thrift legacy days and since moved over to CQL: a BLOB-centric model with our client-side "serializer framework". E.g.:
{noformat}
CREATE TABLE ks."cf" (
    k blob,
    n blob,
    v blob,
    PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (n ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '2'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 259200
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
{noformat}
Regarding queries: it is really just about the write path (batch message processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We have tried single-partition vs. multi-partition batches (I know, bad practice), but single-partition batches didn't have a positive impact on the write path in 3.0 either in our tests. Moving from 2.1 to 3.0 would mean adding ~30-40% more resources for us to handle the same load sufficiently. Thanks for any help in that area!
> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
> Issue Type: Bug
> Reporter: Thomas Steinmaurer
> Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png
>
> In a 6-node loadtest cluster, we have been running a certain production-like workload constantly and sufficiently on 2.1.18. After upgrading one node to 3.0.18 (the remaining 5 are still on 2.1.18 after we saw the sort of regression described below), 3.0.18 is showing increased CPU usage, increased GC, high mutation stage pending tasks, dropped mutation messages ...
> Some specs; all 6 nodes are equally sized:
> * Bare metal, 32 physical cores, 512G RAM
> * Xmx31G, G1, max pause millis = 2000ms
> * cassandra.yaml basically unchanged, thus the same settings in regard to number of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with metrics for all 6 nodes, the one outlier being the node upgraded to Cassandra 3.0.18.
> !dashboard.png|width=1280!
> Additionally, we see a large increase in pending tasks in the mutation stage after the upgrade:
> !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name               Active   Pending   Completed    Blocked   All Time Blocked
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage              256     81824   3360532756         0          0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage            0         0            0         0          0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                    0         0     62862266         0          0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage         0         0   2176659856         0          0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage              0         0            0         0          0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage         0         0            0         0          0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18
[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016527#comment-17016527 ] Dinesh Joshi commented on CASSANDRA-13938:
--

Hi [~aleksey], overall the code looks good. Two minor nits only; feel free to make the changes on commit.
- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - line 106, the word 'length' has a typo in the comment.

> Default repair is broken, crashes other nodes participating in repair (in trunk)
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Nate McCall
> Assignee: Aleksey Yeschenko
> Priority: Urgent
> Fix For: 4.0-alpha
> Attachments: 13938.yaml, test.sh
>
> Running through a simple scenario to test some of the new repair features, I was not able to make a repair command work. Further, the exception seemed to trigger a nasty failure state that basically shuts down the netty connections for messaging *and* CQL on the nodes transferring data back to the node being repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>     key text,
>     ts bigint,
>     val text,
>     PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>     CLUSTERING ORDER BY (ts DESC) AND
>     bloom_filter_fp_chance=0.01 AND
>     caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>     comment='' AND
>     dclocal_read_repair_chance=0.00 AND
>     gc_grace_seconds=864000 AND
>     read_repair_chance=0.00 AND
>     compaction={'class': 'SizeTieredCompactionStrategy'} AND
>     compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ?
limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat}
> The error outputs from the last repair command follow. First, this is stdout from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java (0x10274d4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: false)
> [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603],
>
[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016527#comment-17016527 ] Dinesh Joshi edited comment on CASSANDRA-13938 at 1/16/20 3:56 AM:
---

Hi [~aleksey], overall the code looks good. Two minor nits only; feel free to make the changes on commit.
- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - line 106, the word 'length' has a typo in the comment.

+1

was (Author: djoshi3):
Hi [~aleksey], overall the code looks good. Two minor nits only; feel free to make the changes on commit.
- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - line 106, the word 'length' has a typo in the comment.

> Default repair is broken, crashes other nodes participating in repair (in trunk)
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Nate McCall
> Assignee: Aleksey Yeschenko
> Priority: Urgent
> Fix For: 4.0-alpha
> Attachments: 13938.yaml, test.sh
>
> Running through a simple scenario to test some of the new repair features, I was not able to make a repair command work. Further, the exception seemed to trigger a nasty failure state that basically shuts down the netty connections for messaging *and* CQL on the nodes transferring data back to the node being repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>     key text,
>     ts bigint,
>     val text,
>     PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>     CLUSTERING ORDER BY (ts DESC) AND
>     bloom_filter_fp_chance=0.01 AND
>     caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>     comment='' AND
>     dclocal_read_repair_chance=0.00 AND
>     gc_grace_seconds=864000 AND
>     read_repair_chance=0.00 AND
>     compaction={'class': 'SizeTieredCompactionStrategy'} AND
>     compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ?
limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat}
> The error outputs from the last repair command follow. First, this is stdout from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java (0x10274d4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace
[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dinesh Joshi updated CASSANDRA-13938:
-

Status: Ready to Commit (was: Review In Progress)

> Default repair is broken, crashes other nodes participating in repair (in trunk)
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Nate McCall
> Assignee: Aleksey Yeschenko
> Priority: Urgent
> Fix For: 4.0-alpha
> Attachments: 13938.yaml, test.sh
>
> Running through a simple scenario to test some of the new repair features, I was not able to make a repair command work. Further, the exception seemed to trigger a nasty failure state that basically shuts down the netty connections for messaging *and* CQL on the nodes transferring data back to the node being repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>     key text,
>     ts bigint,
>     val text,
>     PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>     CLUSTERING ORDER BY (ts DESC) AND
>     bloom_filter_fp_chance=0.01 AND
>     caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>     comment='' AND
>     dclocal_read_repair_chance=0.00 AND
>     gc_grace_seconds=864000 AND
>     read_repair_chance=0.00 AND
>     compaction={'class': 'SizeTieredCompactionStrategy'} AND
>     compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat}
> The error outputs from the last repair command follow. First, this is stdout from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java (0x10274d4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with repair options (parallelism: parallel, primary range: false, incremental: true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: false)
> [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 for range [(3074457345618258602,-9223372036854775808], (-9223372036854775808,-3074457345618258603], (-3074457345618258603,3074457345618258602]] failed with error Stream failed
> [2017-10-05 14:32:07,048] null
> [2017-10-05 14:32:07,050] Repair command #4 finished in 14 seconds
> error: Repair job has failed with the error message: [2017-10-05 14:32:07,048] null
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error message: [2017-10-05 14:32:07,048] null
>
[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)
[ https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016527#comment-17016527 ] Dinesh Joshi edited comment on CASSANDRA-13938 at 1/16/20 3:56 AM:
---

Hi [~aleksey], overall the code looks good. Minor nits only; feel free to make the changes on commit.
- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - line 106, the word 'length' has a typo in the comment.

+1

was (Author: djoshi3):
Hi [~aleksey], overall the code looks good. Two minor nits only; feel free to make the changes on commit.
- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - line 106, the word 'length' has a typo in the comment.

+1

> Default repair is broken, crashes other nodes participating in repair (in trunk)
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Repair
> Reporter: Nate McCall
> Assignee: Aleksey Yeschenko
> Priority: Urgent
> Fix For: 4.0-alpha
> Attachments: 13938.yaml, test.sh
>
> Running through a simple scenario to test some of the new repair features, I was not able to make a repair command work. Further, the exception seemed to trigger a nasty failure state that basically shuts down the netty connections for messaging *and* CQL on the nodes transferring data back to the node being repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>     key text,
>     ts bigint,
>     val text,
>     PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>     CLUSTERING ORDER BY (ts DESC) AND
>     bloom_filter_fp_chance=0.01 AND
>     caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>     comment='' AND
>     dclocal_read_repair_chance=0.00 AND
>     gc_grace_seconds=864000 AND
>     read_repair_chance=0.00 AND
>     compaction={'class': 'SizeTieredCompactionStrategy'} AND
>     compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ?
limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat}
> The error outputs from the last repair command follow. First, this is stdout from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java (0x10274d4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace
[jira] [Commented] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
[ https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016505#comment-17016505 ] Ekaterina Dimitrova commented on CASSANDRA-15314: - I just started looking into it. I will let you know when there is a patch available for review. Thanks > Fix failing test - test_rolling_upgrade_with_internode_ssl - > upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD > - > > Key: CASSANDRA-15314 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15314 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Ekaterina Dimitrova >Priority: Normal > Labels: dtest > Fix For: 4.0-alpha > > > Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11] > > {code:java} > ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* > now UP']: INFO [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See > system.log for remainder > self = > object at 0x7f6d90d43b38> > @pytest.mark.timeout(3000) > def test_rolling_upgrade_with_internode_ssl(self): > """ > Rolling upgrade test using internode ssl. 
> """ > > self.upgrade_scenario(rolling=True, internode_ssl=True) > upgrade_tests/upgrade_through_versions_test.py:296: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario > self.upgrade_to_version(version_meta, partial=True, nodes=(node,), > internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version > node.start(wait_other_notice=240, wait_for_binary_proto=True) > ../env/src/ccm/ccmlib/node.py:751: in start > node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice) > ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240 > process = None, verbose = False, filename = 'system.log' > def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, > verbose=False, filename='system.log'): > """ > Watch the log until one or more (regular) expression are found. > This methods when all the expressions have been found or the > method > timeouts (a TimeoutError is then raised). On successful > completion, > a list of pair (line matched, match object) is returned. 
> """ > start = time.time() > tofind = [exprs] if isinstance(exprs, string_types) else exprs > tofind = [re.compile(e) for e in tofind] > matchings = [] > reads = "" > if len(tofind) == 0: > return None > > log_file = os.path.join(self.get_path(), 'logs', filename) > output_read = False > while not os.path.exists(log_file): > time.sleep(.5) > if start + timeout < time.time(): > raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", > time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be > created.".format(log_file)) > if process and not output_read: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse RuntimeError > but I'm lazy > > with open(log_file) as f: > if from_mark: > f.seek(from_mark) > > while True: > # First, if we have a process to check, then check it. > # Skip on Windows - stdout/stderr is cassandra.bat > if not common.is_win() and not output_read: > if process: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, > verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse > RuntimeError but I'm lazy > > line = f.readline() > if
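The `watch_log_for` code quoted in the traceback above is hard to follow once flattened. Its core mechanism — poll a log source for a set of regular expressions until all of them match or a deadline passes, then raise `TimeoutError` with the missing patterns — can be reduced to this minimal sketch (the names `watch_for` and `read_line` are illustrative, not ccm's actual API):

```python
import re
import time


class TimeoutError(Exception):
    """Raised when the expressions are not all found before the deadline.

    Shadows the builtin on purpose, mirroring ccmlib.node.TimeoutError.
    """


def watch_for(read_line, exprs, timeout=600.0):
    """Poll read_line() until every regex in exprs has matched, or time out.

    read_line must return the next log line, or None when no new data is
    available yet. Returns a list of (line, match) pairs on success.
    """
    tofind = [re.compile(e) for e in exprs]
    matchings = []
    deadline = time.time() + timeout
    while tofind:
        line = read_line()
        if line is None:
            # No new data: check the deadline, then back off briefly.
            if time.time() > deadline:
                raise TimeoutError(
                    "Missing: %s" % [e.pattern for e in tofind])
            time.sleep(0.05)
            continue
        for expr in list(tofind):
            m = expr.search(line)
            if m:
                tofind.remove(expr)
                matchings.append((line, m))
    return matchings
```

In the failing test, the pattern `'127.0.0.1.* now UP'` simply never appears in `system.log` within the 240-second window, so the equivalent of the `raise` branch above fires.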
[jira] [Commented] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
[ https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016450#comment-17016450 ] Vinay Chella commented on CASSANDRA-15314: -- I believe these two(TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD, TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD) are different test cases, could be failing for the same reason. > Fix failing test - test_rolling_upgrade_with_internode_ssl - > upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD > - > > Key: CASSANDRA-15314 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15314 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Ekaterina Dimitrova >Priority: Normal > Labels: dtest > Fix For: 4.0-alpha > > > Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11] > > {code:java} > ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* > now UP']: INFO [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See > system.log for remainder > self = > object at 0x7f6d90d43b38> > @pytest.mark.timeout(3000) > def test_rolling_upgrade_with_internode_ssl(self): > """ > Rolling upgrade test using internode ssl. 
> """ > > self.upgrade_scenario(rolling=True, internode_ssl=True) > upgrade_tests/upgrade_through_versions_test.py:296: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario > self.upgrade_to_version(version_meta, partial=True, nodes=(node,), > internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version > node.start(wait_other_notice=240, wait_for_binary_proto=True) > ../env/src/ccm/ccmlib/node.py:751: in start > node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice) > ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240 > process = None, verbose = False, filename = 'system.log' > def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, > verbose=False, filename='system.log'): > """ > Watch the log until one or more (regular) expression are found. > This methods when all the expressions have been found or the > method > timeouts (a TimeoutError is then raised). On successful > completion, > a list of pair (line matched, match object) is returned. 
> """ > start = time.time() > tofind = [exprs] if isinstance(exprs, string_types) else exprs > tofind = [re.compile(e) for e in tofind] > matchings = [] > reads = "" > if len(tofind) == 0: > return None > > log_file = os.path.join(self.get_path(), 'logs', filename) > output_read = False > while not os.path.exists(log_file): > time.sleep(.5) > if start + timeout < time.time(): > raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", > time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be > created.".format(log_file)) > if process and not output_read: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse RuntimeError > but I'm lazy > > with open(log_file) as f: > if from_mark: > f.seek(from_mark) > > while True: > # First, if we have a process to check, then check it. > # Skip on Windows - stdout/stderr is cassandra.bat > if not common.is_win() and not output_read: > if process: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, > verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse >
[jira] [Comment Edited] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
[ https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016450#comment-17016450 ] Vinay Chella edited comment on CASSANDRA-15314 at 1/16/20 1:33 AM: --- I believe these two(TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD, TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD) are different test cases, could be failing for the same reason. I would be happy to help with review both if you have a patch. was (Author: vinaykumarcse): I believe these two(TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD, TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD) are different test cases, could be failing for the same reason. > Fix failing test - test_rolling_upgrade_with_internode_ssl - > upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD > - > > Key: CASSANDRA-15314 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15314 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Ekaterina Dimitrova >Priority: Normal > Labels: dtest > Fix For: 4.0-alpha > > > Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11] > > {code:java} > ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* > now UP']: INFO [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See > system.log for remainder > self = > object at 0x7f6d90d43b38> > @pytest.mark.timeout(3000) > def test_rolling_upgrade_with_internode_ssl(self): > """ > Rolling upgrade test using internode ssl. 
> """ > > self.upgrade_scenario(rolling=True, internode_ssl=True) > upgrade_tests/upgrade_through_versions_test.py:296: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario > self.upgrade_to_version(version_meta, partial=True, nodes=(node,), > internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version > node.start(wait_other_notice=240, wait_for_binary_proto=True) > ../env/src/ccm/ccmlib/node.py:751: in start > node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice) > ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240 > process = None, verbose = False, filename = 'system.log' > def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, > verbose=False, filename='system.log'): > """ > Watch the log until one or more (regular) expression are found. > This methods when all the expressions have been found or the > method > timeouts (a TimeoutError is then raised). On successful > completion, > a list of pair (line matched, match object) is returned. 
> """ > start = time.time() > tofind = [exprs] if isinstance(exprs, string_types) else exprs > tofind = [re.compile(e) for e in tofind] > matchings = [] > reads = "" > if len(tofind) == 0: > return None > > log_file = os.path.join(self.get_path(), 'logs', filename) > output_read = False > while not os.path.exists(log_file): > time.sleep(.5) > if start + timeout < time.time(): > raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", > time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be > created.".format(log_file)) > if process and not output_read: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse RuntimeError > but I'm lazy > > with open(log_file) as f: > if from_mark: > f.seek(from_mark) > > while True: > # First, if we have a process to check, then check it. > # Skip on Windows - stdout/stderr is cassandra.bat > if not common.is_win() and not output_read: > if process: >
[jira] [Commented] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually
[ https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016429#comment-17016429 ] David Capwell commented on CASSANDRA-15507: --- The only thing I have thought of to solve this is to make the selection pluggable (I'd rather not mutate CQL for this) so dtest could just override the implementation. The main reason I didn't go this route was an attempt to make this less version-specific; so the cost is a potentially failing test in the future... > Test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > does not test a failing read repair and should be updated to actually > trigger a failed read repair > > > Key: CASSANDRA-15507 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15507 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > makes a few assumptions which are not valid at the moment. > 1) the write to node 1 and 2 have the same digest (they don’t, this is caused > by the timestamp being different) > 2) node 3 will participate with the read; it won’t give the fact that > org.apache.cassandra.locator.ReplicaPlans#contactForRead will speculate the > first 2 nodes always, so node 3 won’t get involved with the repair > 3) node 3 will attempt to get repaired (it won’t because its never looked at) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
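A pluggable selection hook of the kind described in the comment could look roughly like this — a hedged Python sketch, not Cassandra's actual Java `ReplicaPlans` code; every name here is hypothetical. The default strategy mirrors the behaviour the ticket describes (the coordinator always speculates the first two replicas, so replica 3 is never contacted and never read-repaired), while a dtest can swap in its own selector to force the mismatching replica into the read:

```python
def default_contact_for_read(replicas, required):
    """Default selection: always contact the first `required` replicas.

    This is the behaviour the ticket complains about: node 3 never
    participates, so it can never trigger (or receive) a read repair.
    """
    return replicas[:required]


class Coordinator:
    """Toy coordinator whose replica selection is a pluggable function."""

    def __init__(self, replicas, selector=default_contact_for_read):
        self.replicas = replicas
        self.selector = selector  # the pluggable bit: tests override this

    def contacts_for_read(self, required=2):
        return self.selector(self.replicas, required)


def force_last_replicas(replicas, required):
    """A test override that forces the tail replicas into the read,
    so a replica with stale data actually participates."""
    return replicas[-required:]
```

For example, `Coordinator(["node1", "node2", "node3"])` contacts nodes 1 and 2 by default, while passing `selector=force_last_replicas` pulls node 3 into the read — which is the knob the failing test needs.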
[jira] [Commented] (CASSANDRA-15367) Memtable memory allocations may deadlock
[ https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016398#comment-17016398 ] Benedict Elliott Smith commented on CASSANDRA-15367: Correct, except perhaps the last part. There's no need to collect more than one of these deadlocks to bring down the node. If there are no memtable flushes already in progress, then no more flushes will ever occur, because they must wait for all earlier operations to complete, including the deadlock. So from this point on no Memtable memory will ever be released. > Memtable memory allocations may deadlock > > > Key: CASSANDRA-15367 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15367 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log, Local/Memtable >Reporter: Benedict Elliott Smith >Assignee: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x > > > * Under heavy contention, we guard modifications to a partition with a mutex, > for the lifetime of the memtable. 
> * Memtables block for the completion of all {{OpOrder.Group}} started before > their flush began > * Memtables permit operations from this cohort to fall-through to the > following Memtable, in order to guarantee a precise commitLogUpperBound > * Memtable memory limits may be lifted for operations in the first cohort, > since they block flush (and hence block future memory allocation) > With very unfortunate scheduling > * A contended partition may rapidly escalate to a mutex > * The system may reach memory limits that prevent allocations for the new > Memtable’s cohort (C2) > * An operation from C2 may hold the mutex when this occurs > * Operations from a prior Memtable’s cohort (C1), for a contended partition, > may fall-through to the next Memtable > * The operations from C1 may execute after the above is encountered by those > from C2 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually
[ https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016394#comment-17016394 ] Benedict Elliott Smith commented on CASSANDRA-15507: Sure, wfm. It would be nice to try to solve the general problem of nominating nodes to be contacted, i.e. specifying the contact preference order of nodes for a coordinator (since this is going to be needed in a lot of distributed tests), but this looks to solve the clear and present problem, so no huge harm punting on that. > Test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > does not test a failing read repair and should be updated to actually > trigger a failed read repair > > > Key: CASSANDRA-15507 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15507 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > > The test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > makes a few assumptions which are not valid at the moment. > 1) the write to node 1 and 2 have the same digest (they don’t, this is caused > by the timestamp being different) > 2) node 3 will participate with the read; it won’t give the fact that > org.apache.cassandra.locator.ReplicaPlans#contactForRead will speculate the > first 2 nodes always, so node 3 won’t get involved with the repair > 3) node 3 will attempt to get repaired (it won’t because its never looked at) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15367) Memtable memory allocations may deadlock
[ https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016393#comment-17016393 ] Blake Eggleston commented on CASSANDRA-15367: - I've been trying to work out exactly how this deadlock can occur, based on your description. Could the deadlock be restated like this? For a given partition key: * a write is part of an OpGroup before a barrier set on Memtable1 (M1), but with a replay position after the final replay position set on M1 before it flushes. * So it’s forwarded to M2, while still blocking flushes on M1 * M2 has another in-flight write for this partition, it’s contended, so it’s holding the lock ** It can’t progress because it can’t allocate memory (in part because M1 can’t flush) ** It doesn’t degrade to allocating on heap because its OpOrder isn’t blocking anything. * The write stage becomes saturated with deadlocked writes like these, and no more writes can make progress > Memtable memory allocations may deadlock > > > Key: CASSANDRA-15367 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15367 > Project: Cassandra > Issue Type: Bug > Components: Local/Commit Log, Local/Memtable >Reporter: Benedict Elliott Smith >Assignee: Benedict Elliott Smith >Priority: Normal > Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x > > > * Under heavy contention, we guard modifications to a partition with a mutex, > for the lifetime of the memtable. 
> * Memtables block for the completion of all {{OpOrder.Group}} started before > their flush began > * Memtables permit operations from this cohort to fall-through to the > following Memtable, in order to guarantee a precise commitLogUpperBound > * Memtable memory limits may be lifted for operations in the first cohort, > since they block flush (and hence block future memory allocation) > With very unfortunate scheduling > * A contended partition may rapidly escalate to a mutex > * The system may reach memory limits that prevent allocations for the new > Memtable’s cohort (C2) > * An operation from C2 may hold the mutex when this occurs > * Operations from a prior Memtable’s cohort (C1), for a contended partition, > may fall-through to the next Memtable > * The operations from C1 may execute after the above is encountered by those > from C2 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
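The restatement above and the original description both describe a circular wait, which can be modeled as a wait-for graph; a minimal sketch under that reading (the node labels are illustrative shorthand for the ticket's C1/C2/M1 actors, not Cassandra internals):

```python
def find_cycle(wait_for, start):
    """Follow wait-for edges from start; return the path if it loops back
    on itself (a deadlock), or None if the chain terminates."""
    path, seen = [start], {start}
    node = start
    while node in wait_for:
        node = wait_for[node]
        path.append(node)
        if node in seen:
            return path
        seen.add(node)
    return None


# The dependency chain described in the comments: the C2 write holds the
# partition mutex and waits for memtable memory; memory waits for M1's
# flush; M1's flush waits for the fallen-through C1 operations; and those
# wait for the mutex held by the C2 write.
wait_for = {
    "C2 write (holds partition mutex)": "memtable memory",
    "memtable memory": "M1 flush",
    "M1 flush": "C1 fall-through writes",
    "C1 fall-through writes": "C2 write (holds partition mutex)",
}
```

Tracing the chain from any of these four nodes returns to its starting point, which is the deadlock: with no flush in progress, no edge in the cycle can ever be released.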
[jira] [Updated] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually t
[ https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15507: -- Test and Documentation Plan: PR: https://github.com/apache/cassandra/pull/424 CircleCI: https://circleci.com/gh/dcapwell/cassandra/tree/fixDistributedReadWritePathTestTest Status: Patch Available (was: Open) > Test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > does not test a failing read repair and should be updated to actually > trigger a failed read repair > > > Key: CASSANDRA-15507 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15507 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > makes a few assumptions which are not valid at the moment. > 1) the write to node 1 and 2 have the same digest (they don’t, this is caused > by the timestamp being different) > 2) node 3 will participate with the read; it won’t give the fact that > org.apache.cassandra.locator.ReplicaPlans#contactForRead will speculate the > first 2 nodes always, so node 3 won’t get involved with the repair > 3) node 3 will attempt to get repaired (it won’t because its never looked at) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually t
[ https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15507: -- Test and Documentation Plan: PR: https://github.com/apache/cassandra/pull/424 CircleCI: https://circleci.com/gh/dcapwell/cassandra/tree/fixDistributedReadWritePathTestTest was: PR: https://github.com/apache/cassandra/pull/424 CircleCI: https://circleci.com/gh/dcapwell/cassandra/tree/fixDistributedReadWritePathTestTest > Test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > does not test a failing read repair and should be updated to actually > trigger a failed read repair > > > Key: CASSANDRA-15507 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15507 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > Time Spent: 10m > Remaining Estimate: 0h > > The test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > makes a few assumptions which are not valid at the moment. > 1) the write to node 1 and 2 have the same digest (they don’t, this is caused > by the timestamp being different) > 2) node 3 will participate with the read; it won’t give the fact that > org.apache.cassandra.locator.ReplicaPlans#contactForRead will speculate the > first 2 nodes always, so node 3 won’t get involved with the repair > 3) node 3 will attempt to get repaired (it won’t because its never looked at) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually t
[ https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated CASSANDRA-15507: --- Labels: pull-request-available (was: ) > Test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > does not test a failing read repair and should be updated to actually > trigger a failed read repair > > > Key: CASSANDRA-15507 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15507 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > Labels: pull-request-available > > The test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > makes a few assumptions which are not valid at the moment. > 1) the write to node 1 and 2 have the same digest (they don’t, this is caused > by the timestamp being different) > 2) node 3 will participate with the read; it won’t give the fact that > org.apache.cassandra.locator.ReplicaPlans#contactForRead will speculate the > first 2 nodes always, so node 3 won’t get involved with the repair > 3) node 3 will attempt to get repaired (it won’t because its never looked at) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-15216: - Since Version: 4.0 Source Control Link: https://github.com/apache/cassandra/commit/9d2ffad6b6d09761a03aeb1a207e9780d1174046 Resolution: Fixed Status: Resolved (was: Ready to Commit) > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-alpha > > Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot > 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have setup NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
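The behaviour being enabled here — dropping requests whose coordinator-side age already exceeds the timeout — can be sketched as follows. The sketch also shows why the change assumes synchronized clocks: any skew between coordinator and replica shifts the computed age directly (function and parameter names are hypothetical, not Cassandra's actual message path):

```python
def should_process(created_at_ms, now_ms, timeout_ms, cross_node_timeout):
    """Decide whether a replica should still process an incoming request.

    created_at_ms is the coordinator's creation timestamp carried on the
    message; now_ms is the replica's local clock. With cross_node_timeout
    disabled, the replica does the work even if the coordinator has
    already given up on the response.
    """
    if not cross_node_timeout:
        return True  # untrusted clocks: always do the (possibly wasted) work
    return (now_ms - created_at_ms) <= timeout_ms
```

With a 2000 ms write timeout, a message created 3000 ms ago is dropped once the flag is on — that is the saved work. But a replica whose clock runs 3 seconds ahead would compute the same age for a brand-new message and wrongly drop it, hence the NTP assumption spelled out in NEWS.txt.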
[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-15216: - Status: Ready to Commit (was: Review In Progress) > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-alpha > > Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot > 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have setup NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brandon Williams updated CASSANDRA-15216: - Reviewers: Brandon Williams (was: Brandon Williams) Status: Review In Progress (was: Patch Available) > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-alpha > > Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot > 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have setup NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[cassandra] branch trunk updated: Set cross_node_timeout to true by default.
This is an automated email from the ASF dual-hosted git repository. brandonwilliams pushed a commit to branch trunk in repository https://gitbox.apache.org/repos/asf/cassandra.git The following commit(s) were added to refs/heads/trunk by this push: new 9d2ffad Set cross_node_timeout to true by default. 9d2ffad is described below commit 9d2ffad6b6d09761a03aeb1a207e9780d1174046 Author: Ekaterina Dimitrova AuthorDate: Mon Jan 13 14:29:38 2020 -0500 Set cross_node_timeout to true by default. Patch by Ekaterina Dimitrova, reviewed by brandonwilliams for CASSANDRA-15216 --- CHANGES.txt | 4 NEWS.txt | 6 ++ conf/cassandra.yaml | 6 +++--- src/java/org/apache/cassandra/config/Config.java | 2 +- 4 files changed, 14 insertions(+), 4 deletions(-) diff --git a/CHANGES.txt b/CHANGES.txt index b6d140c..522edf8 100644 --- a/CHANGES.txt +++ b/CHANGES.txt @@ -14,6 +14,10 @@ * Add documentation for Java 11 support in Cassandra (CASSANDRA-15428) * Integrate SJK into nodetool (CASSANDRA-12197) * Ensure that empty clusterings with kind==CLUSTERING are Clustering.EMPTY (CASSANDRA-15498) + * The flag 'cross_node_timeout' has been set as true by default. This change + is done under the assumption that users have setup NTP on their clusters or + otherwise synchronize their clocks, and that clocks are mostly in sync, since + this is a requirement for general correctness of last write wins. (CASSANDRA-15216) Merged from 3.11: * Fix nodetool compactionstats showing extra pending task for TWCS - patch implemented (CASSANDRA-15409) * Fix SELECT JSON formatting for the "duration" type (CASSANDRA-15075) diff --git a/NEWS.txt b/NEWS.txt index 86de7a4..e51203f 100644 --- a/NEWS.txt +++ b/NEWS.txt @@ -218,6 +218,12 @@ Upgrading have been set as false by default. Operators should modify them to allow the creation of new views and SASI indexes, the existing ones will continue working. See CASSANDRA-14866 for details. +- CASSANDRA-15216 - The flag 'cross_node_timeout' has been set as true by default. 
+ This change is done under the assumption that users have setup NTP on + their clusters or otherwise synchronize their clocks, and that clocks are + mostly in sync, since this is a requirement for general correctness of + last write wins. + Materialized Views --- diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml index 9a79f24..f1e5864 100644 --- a/conf/cassandra.yaml +++ b/conf/cassandra.yaml @@ -923,9 +923,9 @@ slow_query_log_timeout_in_ms: 500 # under overload conditions we will waste that much extra time processing # already-timed-out requests. # -# Warning: before enabling this property make sure to ntp is installed -# and the times are synchronized between the nodes. -cross_node_timeout: false +# Warning: It is generally assumed that users have setup NTP on their clusters, and that clocks are modestly in sync, +# since this is a requirement for general correctness of last write wins. +#cross_node_timeout: true # Set keep-alive period for streaming # This node will send a keep-alive message periodically with this period. diff --git a/src/java/org/apache/cassandra/config/Config.java b/src/java/org/apache/cassandra/config/Config.java index 8fa8e72..2d74426 100644 --- a/src/java/org/apache/cassandra/config/Config.java +++ b/src/java/org/apache/cassandra/config/Config.java @@ -108,7 +108,7 @@ public class Config public Integer streaming_connections_per_host = 1; public Integer streaming_keep_alive_period_in_secs = 300; //5 minutes -public boolean cross_node_timeout = false; +public boolean cross_node_timeout = true; public volatile long slow_query_log_timeout_in_ms = 500L; - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-15216: Impacts: Docs (was: None) Test and Documentation Plan: Documented in NEWS.txt Status: Patch Available (was: In Progress) > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-alpha > > Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot > 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have setup NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Comment Edited] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016361#comment-17016361 ] Ekaterina Dimitrova edited comment on CASSANDRA-15216 at 1/15/20 10:10 PM: --- Patch available for trunk [here|https://github.com/ekaterinadimitrova2/cassandra/tree/trunk-CASSANDRA-15216] [Pull request|https://github.com/ekaterinadimitrova2/cassandra/pull/17] Screenshots from the CI ran attached. If you look at the failures of "test all", there are some which I don't see when I am running CI on trunk but most of them are marked as flaky. I think it should be good. NEWS.txt updated as agreed earlier. was (Author: e.dimitrova): Patch available for trunk [here|https://github.com/ekaterinadimitrova2/cassandra/tree/trunk-CASSANDRA-15216] [Pull request | https://github.com/ekaterinadimitrova2/cassandra/pull/17] Screenshots from the CI ran attached. If you look at the failures of "test all", there are some which I don't see when I am running CI on trunk but most of them are marked as flaky. I think it should be good. NEWS.txt updated as agreed earlier. > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-alpha > > Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot > 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have setup NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins. 
[jira] [Commented] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016361#comment-17016361 ] Ekaterina Dimitrova commented on CASSANDRA-15216: - Patch available for trunk [here|https://github.com/ekaterinadimitrova2/cassandra/tree/trunk-CASSANDRA-15216] [Pull request | https://github.com/ekaterinadimitrova2/cassandra/pull/17] Screenshots from the CI ran attached. If you look at the failures of "test all", there are some which I don't see when I am running CI on trunk but most of them are marked as flaky. I think it should be good. NEWS.txt updated as agreed earlier. > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-alpha > > Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot > 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have setup NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default
[ https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova updated CASSANDRA-15216: Attachment: Screen Shot 2020-01-15 at 5.02.22 PM.png Screen Shot 2020-01-15 at 5.01.33 PM.png Screen Shot 2020-01-15 at 5.01.06 PM.png > Cross node message creation times are disabled by default > - > > Key: CASSANDRA-15216 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15216 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Benedict Elliott Smith >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0, 4.0-alpha > > Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot > 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png > > > This can cause a lot of wasted work for messages that have timed out on the > coordinator. We should generally assume that our users have setup NTP on > their clusters, and that clocks are modestly in sync, since it’s a > requirement for general correctness of last write wins. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually t
[ https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Capwell updated CASSANDRA-15507: -- Bug Category: Parent values: Correctness(12982)Level 1 values: Test Failure(12990) Complexity: Low Hanging Fruit Discovered By: Code Inspection Severity: Normal Status: Open (was: Triage Needed) > Test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > does not test a failing read repair and should be updated to actually > trigger a failed read repair > > > Key: CASSANDRA-15507 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15507 > Project: Cassandra > Issue Type: Bug > Components: Test/unit >Reporter: David Capwell >Assignee: David Capwell >Priority: Normal > > The test > org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest > makes a few assumptions which are not valid at the moment. > 1) the writes to nodes 1 and 2 have the same digest (they don’t; this is caused > by the timestamps being different) > 2) node 3 will participate in the read; it won’t, given that > org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates the > first 2 nodes, so node 3 won’t get involved with the repair > 3) node 3 will attempt to get repaired (it won’t, because it’s never looked at)
[jira] [Created] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually t
David Capwell created CASSANDRA-15507: - Summary: Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually trigger a failed read repair Key: CASSANDRA-15507 URL: https://issues.apache.org/jira/browse/CASSANDRA-15507 Project: Cassandra Issue Type: Bug Components: Test/unit Reporter: David Capwell Assignee: David Capwell The test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest makes a few assumptions which are not valid at the moment. 1) the writes to nodes 1 and 2 have the same digest (they don’t; this is caused by the timestamps being different) 2) node 3 will participate in the read; it won’t, given that org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates the first 2 nodes, so node 3 won’t get involved with the repair 3) node 3 will attempt to get repaired (it won’t, because it’s never looked at)
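Assumption (1) above fails because a replica's data digest covers the cell's write timestamp as well as its value, so the same value written at two different timestamps hashes differently. A minimal sketch of that effect — MD5 and the field layout here are chosen purely for illustration and are not the actual Cassandra digest code:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;

public class DigestTimestampSketch {
    // Sketch of why assumption (1) fails: a replica's data digest covers the
    // write timestamp as well as the value, so identical values written at
    // different timestamps hash differently. MD5 and this field layout are
    // chosen purely for illustration; this is not the Cassandra digest code.
    static byte[] digest(String value, long writeTimestamp) throws Exception {
        MessageDigest md = MessageDigest.getInstance("MD5");
        md.update(value.getBytes(StandardCharsets.UTF_8));
        md.update(ByteBuffer.allocate(Long.BYTES).putLong(writeTimestamp).array());
        return md.digest();
    }

    public static void main(String[] args) throws Exception {
        byte[] node1 = digest("v", 1579100000000L);
        byte[] node2 = digest("v", 1579100000001L); // same value, 1ms later
        // The coordinator sees a digest mismatch even though both replicas
        // hold the same value.
        System.out.println(Arrays.equals(node1, node2) ? "match" : "mismatch");
    }
}
```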
[jira] [Comment Edited] (CASSANDRA-12995) update hppc dependency to 0.7
[ https://issues.apache.org/jira/browse/CASSANDRA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016334#comment-17016334 ] Ekaterina Dimitrova edited comment on CASSANDRA-12995 at 1/15/20 9:24 PM: -- [~suztomo] [~brandon.williams] Thank you for raising it. As part of CASSANDRA-12197 hppc was not updated. I guess the confusion comes from the removal of the version flag in all-pom where no other package has version added. Best practices were followed as per the [official documentation |http://cassandra.apache.org/doc/latest/development/dependencies.html] while only adding new libraries in order to support SJK. Also, as pointed by [~suztomo] - hppc version in parent pom was not changed, neither a jar was updated. Also, from the code itself - Please let me know if I miss something. was (Author: e.dimitrova): [~suztomo] [~brandon.williams] Thank you for raising it. As part of CASSANDRA-12197 hppc was not updated. I guess the confusion comes from the removal of the version flag. Best practices were followed as per the [official documentation |http://cassandra.apache.org/doc/latest/development/dependencies.html] while only adding new libraries in order to support SJK. Also, as pointed by [~suztomo] - hppc version in parent pom was not changed, neither a jar was updated. Also, from the code itself - Please let me know if I miss something. > update hppc dependency to 0.7 > - > > Key: CASSANDRA-12995 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12995 > Project: Cassandra > Issue Type: Improvement > Components: Dependencies, Packaging >Reporter: Tomas Repik >Priority: Normal > Labels: easyfix > Fix For: 4.0 > > Attachments: cassandra-3.11.0-hppc.patch > > > Cassandra 3.11.0 is about to be included in Fedora. There are some tweaks to > the sources we need to do in order to successfully build it. Cassandra > depends on hppc 0.5.4, but In Fedora we have the newer version 0.7.1 Upstream > released even newer version 0.7.2. 
I attached a patch updating cassandra > sources to depend on the 0.7.1 hppc sources. It should be also compatible > with the newest upstream version. The only actual changes are the removal of > Open infix in class names. The issue was discussed in here: > https://bugzilla.redhat.com/show_bug.cgi?id=1340876 Please consider updating. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12995) update hppc dependency to 0.7
[ https://issues.apache.org/jira/browse/CASSANDRA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016334#comment-17016334 ] Ekaterina Dimitrova commented on CASSANDRA-12995: - [~suztomo] [~brandon.williams] Thank you for raising it. As part of CASSANDRA-12197 hppc was not updated. I guess the confusion comes from the removal of the version flag. Best practices were followed as per the [official documentation |http://cassandra.apache.org/doc/latest/development/dependencies.html] while only adding new libraries in order to support SJK. Also, as pointed by [~suztomo] - hppc version in parent pom was not changed, neither a jar was updated. Also, from the code itself - Please let me know if I miss something. > update hppc dependency to 0.7 > - > > Key: CASSANDRA-12995 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12995 > Project: Cassandra > Issue Type: Improvement > Components: Dependencies, Packaging >Reporter: Tomas Repik >Priority: Normal > Labels: easyfix > Fix For: 4.0 > > Attachments: cassandra-3.11.0-hppc.patch > > > Cassandra 3.11.0 is about to be included in Fedora. There are some tweaks to > the sources we need to do in order to successfully build it. Cassandra > depends on hppc 0.5.4, but In Fedora we have the newer version 0.7.1 Upstream > released even newer version 0.7.2. I attached a patch updating cassandra > sources to depend on the 0.7.1 hppc sources. It should be also compatible > with the newest upstream version. The only actual changes are the removal of > Open infix in class names. The issue was discussed in here: > https://bugzilla.redhat.com/show_bug.cgi?id=1340876 Please consider updating. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-12995) update hppc dependency to 0.7
[ https://issues.apache.org/jira/browse/CASSANDRA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016288#comment-17016288 ] Brandon Williams commented on CASSANDRA-12995: -- It looks like the issue is we have it again here: [https://github.com/apache/cassandra/blob/trunk/build.xml#L783] [~e.dimitrova] was this an oversight? > update hppc dependency to 0.7 > - > > Key: CASSANDRA-12995 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12995 > Project: Cassandra > Issue Type: Improvement > Components: Dependencies, Packaging >Reporter: Tomas Repik >Priority: Normal > Labels: easyfix > Fix For: 4.0 > > Attachments: cassandra-3.11.0-hppc.patch > > > Cassandra 3.11.0 is about to be included in Fedora. There are some tweaks to > the sources we need to do in order to successfully build it. Cassandra > depends on hppc 0.5.4, but In Fedora we have the newer version 0.7.1 Upstream > released even newer version 0.7.2. I attached a patch updating cassandra > sources to depend on the 0.7.1 hppc sources. It should be also compatible > with the newest upstream version. The only actual changes are the removal of > Open infix in class names. The issue was discussed in here: > https://bugzilla.redhat.com/show_bug.cgi?id=1340876 Please consider updating.
[jira] [Commented] (CASSANDRA-15504) INT is incompatible with previous type SMALLINT
[ https://issues.apache.org/jira/browse/CASSANDRA-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016226#comment-17016226 ] Benedict Elliott Smith commented on CASSANDRA-15504: It's even more complicated than you might think. These are some of the factors that come to mind initially, that are probably not a complete catalogue of the issues: # 3.0 format sstables persist data in a manner that requires us to know how many bytes are used, and they do not record what the type was when the sstable was written. So at minimum to support this we would need to persist this type information in all sstables (which we should anyway, but don't currently), as opposed to using the system tables. # We have to handle data from legacy sstables, which persist no information at all about what data they contain, and for which it is very possible to find poorly typed legacy information floating around from before we had proper checks, and permitted mangling of type casts to write arbitrary things # So, we'd need (1) and we'd need to ensure we didn't support any such operation until we had established that no dangerous files exist on the cluster, on any node (including refusing restoring them from backup or importing them, for instance), but wait, we're not done # Currently schema changes are also eventually consistent - this is slated to be changed, but not for some time, and it will always have eventually consistent propagation, even if there is serialized decision-making. So: what happens if a node requests data for a field that used to be a different type and _still is_ on the other node? How do we know what type we will receive? We will need to verify the schema we're communicating with for each operation between each pair of nodes. Which, again, is definitely something that is likely to be implemented in the future, but it's non-trivial, and not pressing. 
The long and the short of it is that schema behaviours were implemented back in the Wild West era of Cassandra, and it's actually a lot more involved than the implementors originally imagined. So until we have time to do it properly, we've had to disable features like this that can lead to corrupted data through misinterpretation - however unlikely it might be. That said, in the meantime it's certainly possible to do this as an operator, it just requires some annoying surgery on your cluster. Or, as I say, we'd be more than happy for a volunteer with the time to take up this task. > INT is incompatible with previous type SMALLINT > --- > > Key: CASSANDRA-15504 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15504 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Truscello >Priority: Normal > > With the release of Cassandra 3.11.5 and the fixing of CASSANDRA-14948, it > now appears that you can no longer re-add a SMALLINT column as an INT type. > This is rather surprising as any SMALLINT value should be representable by an > INT type. > The following example was run on Cassandra 3.11.5 on CentOS 7 installed from > official RedHat repo: > > > {noformat} > cqlsh> CREATE KEYSPACE demo WITH replication = {'class':'SimpleStrategy', > 'replication_factor' : 1}; > cqlsh> CREATE TABLE demo.demo_table ( > ... user_id BIGINT, > ... created TIMESTAMP, > ... points SMALLINT, > ... PRIMARY KEY (user_id, created) > ... ) WITH CLUSTERING ORDER BY (created DESC); > cqlsh> ALTER TABLE demo.demo_table DROP points; > cqlsh> ALTER TABLE demo.demo_table ADD points INT; > InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot > re-add previously dropped column 'points' of type int, incompatible with > previous type smallint"{noformat} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
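The fixed-width concern in item 1 of the list above can be seen directly: a smallint value serializes to 2 bytes and an int to 4, so a reader that assumes the current column type's width cannot safely consume cells written under the old type. A minimal sketch of that mismatch:

```java
import java.nio.ByteBuffer;

public class FixedWidthSketch {
    public static void main(String[] args) {
        // Illustration of item 1 above: a smallint cell value occupies 2 bytes
        // on disk while an int occupies 4, and 3.0-format sstables do not
        // record which type wrote the value.
        ByteBuffer asSmallint = (ByteBuffer) ByteBuffer.allocate(Short.BYTES).putShort((short) 7).flip();
        ByteBuffer asInt = (ByteBuffer) ByteBuffer.allocate(Integer.BYTES).putInt(7).flip();
        // A reader that assumes the current column type's width would try to
        // take 4 bytes where a legacy cell provides only 2, misreading the stream.
        System.out.println(asSmallint.remaining() + " vs " + asInt.remaining());
    }
}
```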
[jira] [Updated] (CASSANDRA-14740) BlockingReadRepair does not maintain monotonicity during range movements
[ https://issues.apache.org/jira/browse/CASSANDRA-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sam Tunnicliffe updated CASSANDRA-14740: Reviewers: Sam Tunnicliffe > BlockingReadRepair does not maintain monotonicity during range movements > > > Key: CASSANDRA-14740 > URL: https://issues.apache.org/jira/browse/CASSANDRA-14740 > Project: Cassandra > Issue Type: Bug > Components: Legacy/Coordination >Reporter: Benedict Elliott Smith >Assignee: Benedict Elliott Smith >Priority: Urgent > Labels: correctness > Fix For: 4.0, 4.0-beta > > > The BlockingReadRepair code introduced by CASSANDRA-10726 requires that each > of the queried nodes are written to, but pending nodes are not considered. > If there is a pending range movement, one of these writes may be ‘lost’ when > the range movement completes. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15504) INT is incompatible with previous type SMALLINT
[ https://issues.apache.org/jira/browse/CASSANDRA-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016214#comment-17016214 ] Marcus Truscello commented on CASSANDRA-15504: -- That's unfortunate to hear. However, I was thinking something a bit simpler: making SMALLINT considered "compatible" with INTs. Currently, it appears that Int32s are only compatible with themselves (they lack an {{isValueCompatibleWith}} method) but Shorts do [offer a toInt method|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/marshal/ShortType.java#L102-L105], so it should be possible to upgrade Shorts to Int32s. But if I'm understanding you correctly, that solution wouldn't work, would it? It sounds like the binary representations are being blindly deserialized to the current type, and that ShortType and Int32Type serialize to different formats. That means a fix would require A) modifying [Int32Serializer|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/serializers/Int32Serializer.java#L39-L43] to handle deserializing 2-byte shorts and B) modifying Int32Type to list itself as compatible with ShortType. Handling _all_ type conversions in that manner would be terrible, but doing it for fixed-size integer types doesn't sound unreasonable. > INT is incompatible with previous type SMALLINT > --- > > Key: CASSANDRA-15504 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15504 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Truscello >Priority: Normal > > With the release of Cassandra 3.11.5 and the fixing of CASSANDRA-14948, it > now appears that you can no longer re-add a SMALLINT column as an INT type. > This is rather surprising as any SMALLINT value should be representable by an > INT type.
> The following example was run on Cassandra 3.11.5 on CentOS 7 installed from > official RedHat repo: > > > {noformat} > cqlsh> CREATE KEYSPACE demo WITH replication = {'class':'SimpleStrategy', > 'replication_factor' : 1}; > cqlsh> CREATE TABLE demo.demo_table ( > ... user_id BIGINT, > ... created TIMESTAMP, > ... points SMALLINT, > ... PRIMARY KEY (user_id, created) > ... ) WITH CLUSTERING ORDER BY (created DESC); > cqlsh> ALTER TABLE demo.demo_table DROP points; > cqlsh> ALTER TABLE demo.demo_table ADD points INT; > InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot > re-add previously dropped column 'points' of type int, incompatible with > previous type smallint"{noformat} > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
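The widening deserialization described in point (A) of the comment above could look roughly like the following. This is purely hypothetical — {{deserializeWidened}} does not exist in the Cassandra codebase, and the earlier comments in this thread explain why such widening is unsafe without knowing which type wrote the bytes:

```java
import java.nio.ByteBuffer;

public class WideningSketch {
    // Hypothetical sketch of point (A) above: an Int32-style deserializer that
    // also accepts legacy 2-byte smallint payloads and sign-extends them.
    // This method does not exist in Cassandra; the thread explains why such
    // widening is unsafe without knowing which type wrote the bytes.
    static int deserializeWidened(ByteBuffer bytes) {
        switch (bytes.remaining()) {
            case 2: return bytes.getShort(bytes.position()); // legacy smallint, sign-extended
            case 4: return bytes.getInt(bytes.position());   // regular int
            default: throw new IllegalArgumentException("expected 2 or 4 bytes, got " + bytes.remaining());
        }
    }

    public static void main(String[] args) {
        ByteBuffer legacy = (ByteBuffer) ByteBuffer.allocate(2).putShort((short) -5).flip();
        ByteBuffer current = (ByteBuffer) ByteBuffer.allocate(4).putInt(100000).flip();
        System.out.println(deserializeWidened(legacy) + " " + deserializeWidened(current));
    }
}
```

Part (B), marking Int32Type value-compatible with ShortType, would additionally have to hold across all the sstable-format and schema-propagation caveats listed in the previous comment.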
[jira] [Commented] (CASSANDRA-12995) update hppc dependency to 0.7
[ https://issues.apache.org/jira/browse/CASSANDRA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016208#comment-17016208 ] Tomo Suzuki commented on CASSANDRA-12995: - [~brandon.williams] I don't see the ticket or associated PR [cassandra-dtest#55|https://github.com/apache/cassandra-dtest/pull/55] touched {{com.carrotsearch:hppc dependency}} dependency. And I still see the below in https://github.com/apache/cassandra/blob/82dc720/build.xml#L577 {noformat} {noformat} > update hppc dependency to 0.7 > - > > Key: CASSANDRA-12995 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12995 > Project: Cassandra > Issue Type: Improvement > Components: Dependencies, Packaging >Reporter: Tomas Repik >Priority: Normal > Labels: easyfix > Fix For: 4.0 > > Attachments: cassandra-3.11.0-hppc.patch > > > Cassandra 3.11.0 is about to be included in Fedora. There are some tweaks to > the sources we need to do in order to successfully build it. Cassandra > depends on hppc 0.5.4, but In Fedora we have the newer version 0.7.1 Upstream > released even newer version 0.7.2. I attached a patch updating cassandra > sources to depend on the 0.7.1 hppc sources. It should be also compatible > with the newest upstream version. The only actual changes are the removal of > Open infix in class names. The issue was discussed in here: > https://bugzilla.redhat.com/show_bug.cgi?id=1340876 Please consider updating. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Updated] (CASSANDRA-15455) Upgrade com.carrotsearch:hppc dependency
[ https://issues.apache.org/jira/browse/CASSANDRA-15455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomo Suzuki updated CASSANDRA-15455: Resolution: Duplicate Status: Resolved (was: Triage Needed) [~gus] Thanks. I'm closing this ticket. > Upgrade com.carrotsearch:hppc dependency > > > Key: CASSANDRA-15455 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15455 > Project: Cassandra > Issue Type: Improvement > Components: Dependencies >Reporter: Tomo Suzuki >Priority: Normal > > Upgrade com.carrotsearch:hppc dependency. > Current version 0.5 causes diamond dependency conflict with other dependency > (via Elasticsearch) in Apache Beam. > https://gist.github.com/suztomo/6fe16f6bda526aab97e879feac70309d -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Issue Comment Deleted] (CASSANDRA-15455) Upgrade com.carrotsearch:hppc dependency
[ https://issues.apache.org/jira/browse/CASSANDRA-15455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomo Suzuki updated CASSANDRA-15455: Comment: was deleted (was: [~brandon.williams] I don't see the ticket or associated PR [cassandra-dtest#55|https://github.com/apache/cassandra-dtest/pull/55] touched {{com.carrotsearch:hppc dependency}} dependency. And I still see the below in https://github.com/apache/cassandra/blob/82dc720/build.xml#L577 {noformat} {noformat} ) > Upgrade com.carrotsearch:hppc dependency > > > Key: CASSANDRA-15455 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15455 > Project: Cassandra > Issue Type: Improvement > Components: Dependencies >Reporter: Tomo Suzuki >Priority: Normal > > Upgrade com.carrotsearch:hppc dependency. > Current version 0.5 causes diamond dependency conflict with other dependency > (via Elasticsearch) in Apache Beam. > https://gist.github.com/suztomo/6fe16f6bda526aab97e879feac70309d -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15455) Upgrade com.carrotsearch:hppc dependency
[ https://issues.apache.org/jira/browse/CASSANDRA-15455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016205#comment-17016205 ] Tomo Suzuki commented on CASSANDRA-15455: - [~brandon.williams] I don't see the ticket or associated PR [cassandra-dtest#55|https://github.com/apache/cassandra-dtest/pull/55] touched {{com.carrotsearch:hppc dependency}} dependency. And I still see the below in https://github.com/apache/cassandra/blob/82dc720/build.xml#L577 {noformat} {noformat} > Upgrade com.carrotsearch:hppc dependency > > > Key: CASSANDRA-15455 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15455 > Project: Cassandra > Issue Type: Improvement > Components: Dependencies >Reporter: Tomo Suzuki >Priority: Normal > > Upgrade com.carrotsearch:hppc dependency. > Current version 0.5 causes diamond dependency conflict with other dependency > (via Elasticsearch) in Apache Beam. > https://gist.github.com/suztomo/6fe16f6bda526aab97e879feac70309d -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15499) Internode message builder does not add trace header
[ https://issues.apache.org/jira/browse/CASSANDRA-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016203#comment-17016203 ] Yifan Cai commented on CASSANDRA-15499: --- Thanks [~aleksey]. Agree that {{withParams()}} is the intended way to add the trace headers and other fields, especially after seeing the unit test cases. However, I feel it is quite easy to forget adding the trace headers, and the check-and-add logic is applicable to all outgoing messages. It, to a certain degree, becomes an intrinsic step of building a message. Does that sound like a valid argument? Regarding the helper, if the above does not sound good, we could probably add a method in the builder, say {{withTracingMaybe(tracing: Tracing)}}, and call this method when building every message... > Internode message builder does not add trace header > --- > > Key: CASSANDRA-15499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15499 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Labels: pull-request-available > Fix For: 4.x > > > The messages built with the {{Builder}} > ({{org.apache.cassandra.net.Message.Builder}}) do not have the trace header > when tracing is enabled. > Consequently, no tracing session gets propagated to other nodes, and the > tracing function is broken. > The set of static {{out*}} methods provided (to create an outbound > message) in Message do not have the issue. They can properly add the trace > header when necessary. > To be clear, only the {{Builder}} missed adding the tracing header and it > should be fixed to be consistent with the {{out*}} methods.
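The builder helper suggested in the comment above could be sketched as follows. The names are illustrative only — this is not the real {{org.apache.cassandra.net.Message.Builder}} API, just a demonstration of a header that is added only when a tracing session is active:

```java
import java.util.HashMap;
import java.util.Map;

public class TracingBuilderSketch {
    // Stand-in for the global tracing session state.
    static String currentSessionId = null;

    // Hypothetical builder helper along the lines of the withTracingMaybe(...)
    // suggestion above: the trace header is added only when a session is
    // active, so callers cannot forget it. These names are illustrative, not
    // the real org.apache.cassandra.net.Message.Builder API.
    static class Builder {
        final Map<String, String> params = new HashMap<>();

        Builder withTracingMaybe() {
            if (currentSessionId != null)
                params.put("TRACE_SESSION", currentSessionId);
            return this;
        }

        Map<String, String> build() { return params; }
    }

    public static void main(String[] args) {
        boolean untraced = new Builder().withTracingMaybe().build().containsKey("TRACE_SESSION");
        currentSessionId = "session-1";
        boolean traced = new Builder().withTracingMaybe().build().containsKey("TRACE_SESSION");
        System.out.println(untraced + " " + traced);
    }
}
```

Aleksey's alternative — keeping {{build()}} free of global state and making the {{withParams()}} caller responsible — would move this conditional to a helper invoked at each call site instead.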
[jira] [Commented] (CASSANDRA-15499) Internode message builder does not add trace header
[ https://issues.apache.org/jira/browse/CASSANDRA-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016193#comment-17016193 ] Aleksey Yeschenko commented on CASSANDRA-15499: --- The code looks fine, but conceptually I would prefer it to be the responsibility of {{withParams()}} caller to build the correct map, and not have {{build()}} reach out to global state to set it implicitly if avoidable. Add a helper for other callers if needed? > Internode message builder does not add trace header > --- > > Key: CASSANDRA-15499 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15499 > Project: Cassandra > Issue Type: Bug > Components: Messaging/Internode >Reporter: Yifan Cai >Assignee: Yifan Cai >Priority: Normal > Labels: pull-request-available > Fix For: 4.x > > > The messages built with the {{Builder}} > ({{org.apache.cassandra.net.Message.Builder}}) do not have the trace header > when tracing is enabled. > Consequently, no tracing session gets propagated to other nodes, and the > tracing function is broken. > The set of static {{out*}} methods provided (to create an out-bounding > message) in Message do not have the issue. They can properly add the trace > header when necessary. > To be clear, only the {{Builder}} missed adding the tracing header and it > should be fixed to be consistent with the {{out*}} methods. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
[jira] [Commented] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
[ https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016186#comment-17016186 ] Ekaterina Dimitrova commented on CASSANDRA-15314: - [~vinaykumarcse], please, correct me if I'm wrong but I think this one is a duplicate of CASSANDRA-15315? Shall we close this one and work on the other one? > Fix failing test - test_rolling_upgrade_with_internode_ssl - > upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD > - > > Key: CASSANDRA-15314 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15314 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Ekaterina Dimitrova >Priority: Normal > Labels: dtest > Fix For: 4.0-alpha > > > Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11] > > {code:java} > ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* > now UP']: INFO [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See > system.log for remainder > self = > object at 0x7f6d90d43b38> > @pytest.mark.timeout(3000) > def test_rolling_upgrade_with_internode_ssl(self): > """ > Rolling upgrade test using internode ssl. 
> """ > > self.upgrade_scenario(rolling=True, internode_ssl=True) > upgrade_tests/upgrade_through_versions_test.py:296: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario > self.upgrade_to_version(version_meta, partial=True, nodes=(node,), > internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version > node.start(wait_other_notice=240, wait_for_binary_proto=True) > ../env/src/ccm/ccmlib/node.py:751: in start > node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice) > ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240 > process = None, verbose = False, filename = 'system.log' > def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, > verbose=False, filename='system.log'): > """ > Watch the log until one or more (regular) expression are found. > This methods when all the expressions have been found or the > method > timeouts (a TimeoutError is then raised). On successful > completion, > a list of pair (line matched, match object) is returned. 
> """ > start = time.time() > tofind = [exprs] if isinstance(exprs, string_types) else exprs > tofind = [re.compile(e) for e in tofind] > matchings = [] > reads = "" > if len(tofind) == 0: > return None > > log_file = os.path.join(self.get_path(), 'logs', filename) > output_read = False > while not os.path.exists(log_file): > time.sleep(.5) > if start + timeout < time.time(): > raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", > time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be > created.".format(log_file)) > if process and not output_read: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse RuntimeError > but I'm lazy > > with open(log_file) as f: > if from_mark: > f.seek(from_mark) > > while True: > # First, if we have a process to check, then check it. > # Skip on Windows - stdout/stderr is cassandra.bat > if not common.is_win() and not output_read: > if process: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, > verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse > RuntimeError but I'm lazy > >
[jira] [Commented] (CASSANDRA-15499) Internode message builder does not add trace header
[ https://issues.apache.org/jira/browse/CASSANDRA-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016149#comment-17016149 ] Yifan Cai commented on CASSANDRA-15499: --- The message serializer should be fine. Tracing headers are placed in the {{params}}. Both {{toPre40FailureResponse}} and {{toPost40FailureResponse}} copy the {{params}}, so any existing trace header fields should have been copied.
[jira] [Assigned] (CASSANDRA-15315) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Tru
[ https://issues.apache.org/jira/browse/CASSANDRA-15315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova reassigned CASSANDRA-15315: --- Assignee: Ekaterina Dimitrova > Fix failing test - test_rolling_upgrade_with_internode_ssl - > upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD > --- > > Key: CASSANDRA-15315 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15315 > Project: Cassandra > Issue Type: Bug > Components: Test/dtest >Reporter: Vinay Chella >Assignee: Ekaterina Dimitrova >Priority: Normal > Fix For: 4.0-alpha > > > Example failure: > [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11] > [https://circleci.com/gh/vinaykumarchella/cassandra/451#tests/containers/11] > {code:java} > ccmlib.node.TimeoutError: 06 Sep 2019 20:21:39 [node2] Missing: ['127.0.0.1.* > now UP']: INFO [HANDSHAKE-/127.0.0.1] 2019-09-06 20:17:43,8. See > system.log for remainder > self = > object at 0x7fbb75245a90> > @pytest.mark.timeout(3000) > def test_rolling_upgrade_with_internode_ssl(self): > """ > Rolling upgrade test using internode ssl. 
> """ > > self.upgrade_scenario(rolling=True, internode_ssl=True) > upgrade_tests/upgrade_through_versions_test.py:296: > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario > self.upgrade_to_version(version_meta, partial=True, nodes=(node,), > internode_ssl=internode_ssl) > upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version > node.start(wait_other_notice=240, wait_for_binary_proto=True) > ../env/src/ccm/ccmlib/node.py:751: in start > node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice) > ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive > self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, > filename=filename) > _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ > _ > self = > exprs = ['127.0.0.1.* now UP'], from_mark = 151813, timeout = 240 > process = None, verbose = False, filename = 'system.log' > def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, > verbose=False, filename='system.log'): > """ > Watch the log until one or more (regular) expression are found. > This methods when all the expressions have been found or the > method > timeouts (a TimeoutError is then raised). On successful > completion, > a list of pair (line matched, match object) is returned. 
> """ > start = time.time() > tofind = [exprs] if isinstance(exprs, string_types) else exprs > tofind = [re.compile(e) for e in tofind] > matchings = [] > reads = "" > if len(tofind) == 0: > return None > > log_file = os.path.join(self.get_path(), 'logs', filename) > output_read = False > while not os.path.exists(log_file): > time.sleep(.5) > if start + timeout < time.time(): > raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", > time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be > created.".format(log_file)) > if process and not output_read: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse RuntimeError > but I'm lazy > > with open(log_file) as f: > if from_mark: > f.seek(from_mark) > > while True: > # First, if we have a process to check, then check it. > # Skip on Windows - stdout/stderr is cassandra.bat > if not common.is_win() and not output_read: > if process: > process.poll() > if process.returncode is not None: > self.print_process_output(self.name, process, > verbose) > output_read = True > if process.returncode != 0: > raise RuntimeError() # Shouldn't reuse > RuntimeError but I'm lazy > > line = f.readline() > if line: >
[jira] [Assigned] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
[ https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ekaterina Dimitrova reassigned CASSANDRA-15314: --- Assignee: Ekaterina Dimitrova
[jira] [Commented] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016122#comment-17016122 ] Alex Petrov commented on CASSANDRA-15506: - Thank you for the patch! +1, LGTM! > Run in-jvm upgrade dtests in circleci > - > > Key: CASSANDRA-15506 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15506 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > > We should run the in-jvm upgrade dtests in circleci
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016077#comment-17016077 ] Benedict Elliott Smith commented on CASSANDRA-15213: Except if we use the cleaner {{stripedIndex}} calculation, we might need to go up to e.g. 8x stripes, in which case we'd need to throw {{23}} into the mix. That seems to take us up to 16x, with {{29}} taking us all the way to 64, which is way over provisioning stripes. > DecayingEstimatedHistogramReservoir Inefficiencies > -- > > Key: CASSANDRA-15213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15213 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Benedict Elliott Smith >Assignee: Jordan West >Priority: Normal > Fix For: 4.0-beta > > > * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user > schemas, and this will grow significantly under contention and user schemas > with many tables. This is because {{LongAdder}} is a very heavy class > designed for single contended values. > ** This can likely be improved significantly, without significant loss of > performance in the contended case, by simply increasing the size of our > primitive backing array and providing multiple buckets, with each thread > picking a bucket to increment, or simply multiple backing arrays. Probably a > better way still to do this would be to introduce some competition detection > to the update, much like {{LongAdder}} utilises, that increases the number of > backing arrays under competition. > ** To save memory this approach could partition the space into chunks that > are likely to be updated together, so that we do not need to duplicate the > entire array under competition. > * Similarly, binary search is costly and a measurable cost as a share of the > new networking work (without filtering it was > 10% of the CPU used overall). 
> We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, > to save the random memory access costs.
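The cheap approximation mentioned in the ticket description, floor(log2 n / log2 1.2), can be sketched with an integer log2 derived from a leading-zero count. This is only an illustration of the idea under assumed names; it is not the actual {{DecayingEstimatedHistogramReservoir}} bucket-offset code.

```java
// Illustrative sketch (hypothetical class/method names): approximate
// floor(log2 n / log2 1.2) without a floating-point logarithm on the hot path.
class FastBucketIndex
{
    private static final double LOG2_1_2 = Math.log(1.2) / Math.log(2.0); // ~0.2630

    /** Exact reference formula, for comparison: floor(log2(n) / log2(1.2)), n >= 1. */
    static int exact(long n)
    {
        return (int) Math.floor((Math.log(n) / Math.log(2.0)) / LOG2_1_2);
    }

    /** Cheap variant: integer floor(log2 n) via a leading-zero count, then one divide. */
    static int approximate(long n)
    {
        int log2 = 63 - Long.numberOfLeadingZeros(n); // floor(log2 n) for n >= 1
        return (int) (log2 / LOG2_1_2);
    }

    public static void main(String[] args)
    {
        System.out.println(approximate(1));    // 0
        System.out.println(approximate(1024)); // 38: floor(10 / 0.2630...)
    }
}
```

Note the approximation floors log2(n) before dividing, so it deviates from the exact formula for values between powers of two; as the description says, the goal is a cheap approximation, not an exact index.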
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016076#comment-17016076 ] Jordan West commented on CASSANDRA-15213: - Sounds good. Thanks for the test / proof.
[jira] [Comment Edited] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016055#comment-17016055 ] Jordan West edited comment on CASSANDRA-15213 at 1/15/20 3:15 PM: -- Thanks. I'll start exploring that approach. I implemented a version using {{Integer.reverse}} (which distributed well) but didn't find an approach using it that didn't involve an extra read/load (was looking for something more along the lines of a simple calculation like this). Will report back with my testing results / findings. was (Author: jrwest): Thanks. I'll start exploring that approach. I implemented a version using {{Integer.reverse}} (which distributed well) but didn't find an approach using it that didn't involve an extra read/load (was looking for something more along the lines of a simple calculation like this). Will report back with my testing results / findings. EDIT: 17 divides several potential custom bucket sizes, including 102, 170, and 204. To satisfy the last requirement I think we need to pick a prime such that prime * 2 > max bucket count.
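The conflict condition discussed above reduces to a divisibility check: since the stride is prime, the mapping (i * prime) % size visits every slot exactly once unless the prime divides size (i.e. gcd(prime, size) > 1), which is why 17 conflicts with bucket counts like 102, 170, and 204. A small sketch with illustrative names:

```java
import java.math.BigInteger;

// Illustrative check (hypothetical names): a prime stride covers all slots of
// (i * prime) % size exactly once iff gcd(prime, size) == 1, i.e. the prime
// does not divide the slot count.
class PrimeStrideCheck
{
    static boolean coversAllSlots(int prime, int size)
    {
        return BigInteger.valueOf(prime).gcd(BigInteger.valueOf(size)).intValue() == 1;
    }

    public static void main(String[] args)
    {
        System.out.println(coversAllSlots(17, 102)); // false: 102 = 6 * 17
        System.out.println(coversAllSlots(17, 204)); // false: 204 = 12 * 17
        System.out.println(coversAllSlots(19, 102)); // true
    }
}
```

This matches the suggestion above: picking a prime with prime * 2 > max bucket count means no bucket count up to the maximum (other than the prime itself) can be a multiple of it.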
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016070#comment-17016070 ] Benedict Elliott Smith commented on CASSANDRA-15213: Also fwiw, it looks like the primes 17 and 19 are sufficient, so we can literally just try either of those. Proof:
{code}
int[] primes = new int[] { 17, 19 };
BitSet sizeWithoutConflict = new BitSet();
for (int prime : primes)
{
    for (int size = 1 ; size < 238 ; ++size)
    {
        BitSet conflict = new BitSet();
        boolean hasConflict = false;
        for (int i = 0 ; i < size ; ++i)
        {
            if (conflict.get((i * prime) % size))
                hasConflict = true;
            conflict.set((i * prime) % size);
        }
        if (!hasConflict)
            sizeWithoutConflict.set(size);
    }
}
for (int size = 1 ; size < 238 ; ++size)
{
    if (!sizeWithoutConflict.get(size))
        System.out.println(size);
}
{code}
[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016055#comment-17016055 ] Jordan West commented on CASSANDRA-15213: - Thanks. I'll start exploring that approach. I implemented a version using {{Integer.reverse}} (which distributed well) but didn't find an approach using it that didn't involve extra reads (was looking for something more along the lines of a simple calculation like this). Will report back with my testing results / findings.
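The striping idea from the ticket description — a primitive backing array with multiple buckets, each writing thread picking its own stripe so concurrent increments rarely collide — might be sketched roughly as follows. The layout and names are illustrative assumptions, not Cassandra's implementation, and {{AtomicLongArray}} stands in for whatever primitive storage the real reservoir would use.

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Hedged sketch: one logical histogram bucket is spread over several stripes;
// writers increment only their own stripe, readers sum across stripes.
class StripedCounters
{
    private final int stripes;
    private final int buckets;
    private final AtomicLongArray counts; // stripes * buckets longs

    StripedCounters(int buckets, int stripes)
    {
        this.buckets = buckets;
        this.stripes = stripes;
        this.counts = new AtomicLongArray(buckets * stripes);
    }

    /** Writer derives a stripe from its thread id, so different threads tend to spread out. */
    void increment(int bucket)
    {
        int stripe = (int) (Thread.currentThread().getId() % stripes);
        counts.incrementAndGet(stripe * buckets + bucket);
    }

    /** Reader sums the stripes to recover the logical count for a bucket. */
    long get(int bucket)
    {
        long sum = 0;
        for (int s = 0; s < stripes; s++)
            sum += counts.get(s * buckets + bucket);
        return sum;
    }

    public static void main(String[] args)
    {
        StripedCounters c = new StripedCounters(4, 2);
        c.increment(1);
        c.increment(1);
        System.out.println(c.get(1)); // 2
    }
}
```

Compared with one {{LongAdder}} per bucket, this keeps a single flat array per histogram, which is where the memory saving described above would come from; competition detection (growing the stripe count under contention) is omitted here.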
[jira] [Comment Edited] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies
[ https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016055#comment-17016055 ] Jordan West edited comment on CASSANDRA-15213 at 1/15/20 2:52 PM: -- Thanks. I'll start exploring that approach. I implemented a version using {{Integer.reverse}} (which distributed well) but didn't find an approach using it that didn't involve an extra read/load (was looking for something more along the lines of a simple calculation like this). Will report back with my testing results / findings. was (Author: jrwest): Thanks. I'll start exploring that approach. I implemented a version using {{Integer.reverse}} (which distributed well) but didn't find an approach using it that didn't involve extra reads (was looking for something more along the lines of a simple calculation like this). Will report back with my testing results / findings. > DecayingEstimatedHistogramReservoir Inefficiencies > -- > > Key: CASSANDRA-15213 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15213 > Project: Cassandra > Issue Type: Bug > Components: Observability/Metrics >Reporter: Benedict Elliott Smith >Assignee: Jordan West >Priority: Normal > Fix For: 4.0-beta > > > * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user > schemas, and this will grow significantly under contention and user schemas > with many tables. This is because {{LongAdder}} is a very heavy class > designed for single contended values. > ** This can likely be improved significantly, without significant loss of > performance in the contended case, by simply increasing the size of our > primitive backing array and providing multiple buckets, with each thread > picking a bucket to increment, or simply multiple backing arrays. Probably a > better way still to do this would be to introduce some competition detection > to the update, much like {{LongAdder}} utilises, that increases the number of > backing arrays under competition. 
> ** To save memory, this approach could partition the space into chunks that > are likely to be updated together, so that we do not need to duplicate the > entire array under contention. > * Similarly, binary search is costly, and a measurable cost as a share of the > new networking work (without filtering it was > 10% of the CPU used overall). > We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, > saving the random memory access costs. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org For additional commands, e-mail: commits-h...@cassandra.apache.org
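The striped-counter idea described in the ticket (multiple buckets, each thread picking one to increment) can be sketched as follows. This is a minimal illustration, not Cassandra's actual implementation; the class name and fixed stripe count are invented for the example, whereas {{LongAdder}} grows its cell array dynamically under contention:

```java
import java.util.concurrent.atomic.AtomicLongArray;

// Minimal sketch of a striped counter: threads spread increments across
// several cells to reduce CAS contention; reads sum all cells.
final class StripedCounter {
    private static final int STRIPES = 8; // illustrative; must be a power of two here

    private final AtomicLongArray cells = new AtomicLongArray(STRIPES);

    void increment() {
        // each thread consistently picks a stripe derived from its id,
        // so two threads rarely contend on the same cell
        int idx = (int) (Thread.currentThread().getId() & (STRIPES - 1));
        cells.incrementAndGet(idx);
    }

    long sum() {
        long total = 0;
        for (int i = 0; i < STRIPES; i++)
            total += cells.get(i);
        return total; // approximate under concurrent updates, exact at quiescence
    }

    public static void main(String[] args) {
        StripedCounter c = new StripedCounter();
        for (int i = 0; i < 5; i++) c.increment();
        System.out.println(c.sum()); // prints 5
    }
}
```

The memory/contention trade-off in the ticket is exactly the choice of stripe count: a small fixed array is cheap but contended, while per-counter dynamic growth (as {{LongAdder}} does) costs heap.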
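The floor(log2 n / log2 1.2) approximation mentioned above can be computed without a binary search from a single count-leading-zeros instruction. A rough sketch (the class and method names are invented; using the integer floor of log2 makes the result exact only at powers of two, so a real replacement would keep a few fractional mantissa bits):

```java
// Approximate bucket index for geometrically growing histogram buckets
// (factor 1.2) without binary search: floor(log2 n / log2 1.2).
final class BucketIndex {
    private static final double LOG2_OF_1_2 = 0.2630344058337938; // log2(1.2)

    static int approxIndex(long value) {
        if (value <= 0)
            return 0;
        // integer floor(log2 value) via a cheap CLZ, no memory access
        int log2 = 63 - Long.numberOfLeadingZeros(value);
        return (int) (log2 / LOG2_OF_1_2);
    }

    public static void main(String[] args) {
        System.out.println(approxIndex(1024)); // prints 38
    }
}
```

For example, value = 1024 gives log2 = 10 and 10 / 0.26303 ≈ 38.0, so the index is 38, matching floor(log2 1024 / log2 1.2) while avoiding the random memory accesses of a binary search over the bucket offsets.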
[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18
[ https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015996#comment-17015996 ] Benedict Elliott Smith commented on CASSANDRA-15430: [~tsteinmaurer] it would help if you could post the schema and example queries you are submitting to the cluster. There may be a mitigation for this specific workload in a later version of Cassandra, or in the forthcoming 4.0, that you could backport. I would also be happy to take a look at the JFR logs if we can find somewhere shared to put them. > Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations > compared to 2.1.18 > > > Key: CASSANDRA-15430 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15430 > Project: Cassandra > Issue Type: Bug >Reporter: Thomas Steinmaurer >Priority: Normal > Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png > > > In a 6-node load-test cluster, we have been running a production-like > workload steadily and without problems on 2.1.18. After upgrading one node to > 3.0.18 (the remaining 5 are still on 2.1.18 after seeing the regression > described below), 3.0.18 is showing increased CPU usage, increased GC, high > mutation-stage pending tasks, dropped mutation messages ... > Some specs. All 6 nodes equally sized: > * Bare metal, 32 physical cores, 512G RAM > * Xmx31G, G1, max pause millis = 2000ms > * cassandra.yaml basically unchanged, thus the same settings in regard to > number of threads, compaction throttling etc. > The following dashboard shows highlighted areas (CPU, suspension) with > metrics for all 6 nodes, the one outlier being the node upgraded to Cassandra > 3.0.18. > !dashboard.png|width=1280! > Additionally we see a large increase in pending tasks in the mutation stage > after the upgrade: > !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout and 0 for cross node timeout
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool Name               Active  Pending   Completed  Blocked  All Time Blocked
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - MutationStage              256    81824  3360532756        0                 0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ViewMutationStage            0        0           0        0                 0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - ReadStage                    0        0    62862266        0                 0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - RequestResponseStage         0        0  2176659856        0                 0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - ReadRepairStage              0        0           0        0                 0
> INFO [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - CounterMutationStage         0        0           0        0                 0
> ...
> {noformat}
> Judging from a 15min JFR session for both 3.0.18 and 2.1.18 on a different node, at a high level it looks like the code path underneath {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 3.0.18 compared to 2.1.18.
> !jfr_allocations.png!
> Left => 3.0.18
> Right => 2.1.18
> Zipped, the JFRs exceed the 60MB limit for attaching directly to the ticket. I can upload them if there is another destination available.
[jira] [Commented] (CASSANDRA-15504) INT is incompatible with previous type SMALLINT
[ https://issues.apache.org/jira/browse/CASSANDRA-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015843#comment-17015843 ] Benedict Elliott Smith commented on CASSANDRA-15504: Permitting this was previously a bug: while it's absolutely possible to reinterpret the data on disk as an {{INT}}, this isn't what happens, and the binary representations are not automatically interpreted correctly. The entirety of our type management in this regard could do with modernising, as it should anyway be possible to re-add columns as often as you like, with whatever type you like, but in a distributed system this is more hassle than you might imagine. We'd more than welcome a contribution moving in this direction, but my prediction is that the active contributor community does not have the resources to dedicate to this specific issue at present. There might be some middle ground that could be achieved more readily, at least a convenience mechanism to force re-adding after expunging the old data via compaction. But again, I don't think this is a priority, so you would have to take a look yourself. > INT is incompatible with previous type SMALLINT > --- > > Key: CASSANDRA-15504 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15504 > Project: Cassandra > Issue Type: Bug >Reporter: Marcus Truscello >Priority: Normal > > With the release of Cassandra 3.11.5 and the fixing of CASSANDRA-14948, it > now appears that you can no longer re-add a SMALLINT column as an INT type. > This is rather surprising, as any SMALLINT value should be representable by > an INT type. > The following example was run on Cassandra 3.11.5 on CentOS 7, installed from > the official RedHat repo:
> {noformat}
> cqlsh> CREATE KEYSPACE demo WITH replication = {'class':'SimpleStrategy', 'replication_factor' : 1};
> cqlsh> CREATE TABLE demo.demo_table (
>    ...     user_id BIGINT,
>    ...     created TIMESTAMP,
>    ...     points SMALLINT,
>    ...     PRIMARY KEY (user_id, created)
>    ... ) WITH CLUSTERING ORDER BY (created DESC);
> cqlsh> ALTER TABLE demo.demo_table DROP points;
> cqlsh> ALTER TABLE demo.demo_table ADD points INT;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot re-add previously dropped column 'points' of type int, incompatible with previous type smallint"
> {noformat}
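The incompatibility boils down to serialized widths: a CQL smallint value serializes to 2 bytes while an int serializes to 4, so existing cells written under the old type cannot simply be read back under the new one. A standalone illustration of just that width mismatch (this is not Cassandra code; plain java.nio stands in for the cell serializers, and the class name is invented):

```java
import java.nio.ByteBuffer;

// Illustrates why a dropped smallint column can't be re-read as int:
// the same logical value has different serialized widths on disk.
public final class WidthDemo {
    static ByteBuffer serializeSmallint(short v) {
        return (ByteBuffer) ByteBuffer.allocate(2).putShort(v).flip();
    }

    static ByteBuffer serializeInt(int v) {
        return (ByteBuffer) ByteBuffer.allocate(4).putInt(v).flip();
    }

    public static void main(String[] args) {
        ByteBuffer oldCell = serializeSmallint((short) 7); // 2 bytes on disk
        ByteBuffer newCell = serializeInt(7);              // 4 bytes expected
        System.out.println(oldCell.remaining() + " vs " + newCell.remaining()); // prints 2 vs 4
        // an int deserializer handed the old 2-byte cell would fail validation,
        // which is why the ALTER is now rejected up front
    }
}
```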
[jira] [Updated] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-15506: Reviewers: Alex Petrov > Run in-jvm upgrade dtests in circleci > - > > Key: CASSANDRA-15506 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15506 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > > We should run the in-jvm upgrade dtests in circleci
[jira] [Comment Edited] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015788#comment-17015788 ] Marcus Eriksson edited comment on CASSANDRA-15506 at 1/15/20 9:57 AM: -- [patch|https://github.com/krummas/cassandra/commits/marcuse/upgradedtests-circle], [circleci run|https://circleci.com/workflow-run/36459d4c-431e-409d-8e32-97a0b2e14648] This patch adds a step that builds the dtest jars for all current versions and stores them in the workspace to avoid rebuilding them when testing, but so far it only runs the tests sequentially. The patch also fixes an issue with the `TestLocator` script: it would always exit with status code 0, which made the build green in CircleCI even when there were failures. was (Author: krummas): [patch|https://github.com/krummas/cassandra/commits/marcuse/upgradedtests-circle] [circleci run|https://circleci.com/workflow-run/36459d4c-431e-409d-8e32-97a0b2e14648] This patch adds a step that builds the dtest jars for all current versions and stores them in the workspace to avoid rebuilding them when testing, but so far it only runs the tests sequentially. The patch also fixes an issue with the `TestLocator` script: it would always exit with status code 0, which made the build green in CircleCI even when there were failures. > Run in-jvm upgrade dtests in circleci > - > > Key: CASSANDRA-15506 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15506 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > > We should run the in-jvm upgrade dtests in circleci
[jira] [Updated] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-15506: Test and Documentation Plan: circleci runs Status: Patch Available (was: Open) [patch|https://github.com/krummas/cassandra/commits/marcuse/upgradedtests-circle] [circleci run|https://circleci.com/workflow-run/36459d4c-431e-409d-8e32-97a0b2e14648] This patch adds a step that builds the dtest jars for all current versions and stores them in the workspace to avoid rebuilding them when testing, but so far it only runs the tests sequentially. The patch also fixes an issue with the `TestLocator` script: it would always exit with status code 0, which made the build green in CircleCI even when there were failures. > Run in-jvm upgrade dtests in circleci > - > > Key: CASSANDRA-15506 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15506 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > > We should run the in-jvm upgrade dtests in circleci
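The `TestLocator` fix follows a general pattern worth noting: CI systems key off the process exit status, so a test runner must map failures to a non-zero code or the job always looks green. A minimal sketch of the pattern (the class and method names are invented for illustration; this is not the actual script, which is not Java):

```java
// Sketch of exit-status propagation: a runner that unconditionally
// exits 0 makes CI report success regardless of test results.
public final class ExitStatusDemo {
    // map a failure count to a process exit status
    static int exitCode(int failedTests) {
        return failedTests == 0 ? 0 : 1;
    }

    public static void main(String[] args) {
        int failed = Integer.parseInt(args.length > 0 ? args[0] : "0");
        // propagate the result to the invoking CI job;
        // calling System.exit(0) unconditionally here would reproduce the bug
        System.exit(exitCode(failed));
    }
}
```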
[jira] [Created] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci
Marcus Eriksson created CASSANDRA-15506: --- Summary: Run in-jvm upgrade dtests in circleci Key: CASSANDRA-15506 URL: https://issues.apache.org/jira/browse/CASSANDRA-15506 Project: Cassandra Issue Type: Improvement Components: Test/dtest Reporter: Marcus Eriksson Assignee: Marcus Eriksson We should run the in-jvm upgrade dtests in circleci
[jira] [Updated] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci
[ https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcus Eriksson updated CASSANDRA-15506: Change Category: Quality Assurance Complexity: Low Hanging Fruit Fix Version/s: 4.x 3.11.x 3.0.x Status: Open (was: Triage Needed) > Run in-jvm upgrade dtests in circleci > - > > Key: CASSANDRA-15506 > URL: https://issues.apache.org/jira/browse/CASSANDRA-15506 > Project: Cassandra > Issue Type: Improvement > Components: Test/dtest >Reporter: Marcus Eriksson >Assignee: Marcus Eriksson >Priority: Normal > Fix For: 3.0.x, 3.11.x, 4.x > > > We should run the in-jvm upgrade dtests in circleci
[jira] [Created] (CASSANDRA-15505) Add message interceptors to in-jvm dtests
Alex Petrov created CASSANDRA-15505: --- Summary: Add message interceptors to in-jvm dtests Key: CASSANDRA-15505 URL: https://issues.apache.org/jira/browse/CASSANDRA-15505 Project: Cassandra Issue Type: New Feature Components: Test/dtest Reporter: Alex Petrov Assignee: Alex Petrov Currently we only have the means to filter messages in in-jvm tests. We need a facility to intercept and modify messages between nodes for testing purposes.