[jira] [Comment Edited] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-15 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016628#comment-17016628
 ] 

Thomas Steinmaurer edited comment on CASSANDRA-15430 at 1/16/20 7:57 AM:
-

[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 
here: 
[https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward, originating from our Astyanax/Thrift 
legacy days and later moved over to CQL: a BLOB-centric model with our own 
client-side "serializer framework".

E.g.:
{noformat}
CREATE TABLE ks."cf" (
    k blob,
    n blob,
    v blob,
    PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (n ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '2'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 259200
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
{noformat}

Regarding queries: it is really just about the write path (batch message 
processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We 
have tried single-partition vs. multi-partition batches (I know, bad practice), 
but in our tests single-partition batches didn't have a positive impact on the 
3.0 write path either.

Moving from 2.1 to 3.0 would mean adding ~30-40% more resources for us to 
handle the same write load sufficiently. Thanks for any help in that area!


was (Author: tsteinmaurer):
[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 
here: 
[https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward, originating from our Astyanax/Thrift 
legacy days and later moved over to CQL: a BLOB-centric model with our own 
client-side "serializer framework".

E.g.:
{noformat}
CREATE TABLE ks."cf" (
    k blob,
    n blob,
    v blob,
    PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (n ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '2'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 259200
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
{noformat}

Regarding queries: it is really just about the write path (batch message 
processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We 
have tried single-partition vs. multi-partition batches (I know, bad practice), 
but in our tests single-partition batches didn't have a positive impact on the 
3.0 write path either.

Moving from 2.1 to 3.0 would mean adding ~30-40% more resources for us to 
handle the same load sufficiently. Thanks for any help in that area!

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png
>
>
> In a 6-node loadtest cluster, we had been constantly and sufficiently running 
> a certain production-like workload with 2.1.18. After upgrading one node to 
> 3.0.18 (the remaining 5 are still on 2.1.18 after we saw the sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increased 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some specs; all 6 nodes are equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus the same settings with regard to 
> the number of threads, compaction throttling etc.
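For reference, those heap/GC settings correspond to jvm options along these 
lines (a sketch; the exact jvm.options line-up on the nodes may differ, and 
-Xms is an assumption):
{noformat}
-Xms31G
-Xmx31G
-XX:+UseG1GC
-XX:MaxGCPauseMillis=2000
{noformat}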
> Following dashboard 

[jira] [Updated] (CASSANDRA-15470) Potential Overflow in DatabaseDescriptor Functions That Convert Between KB/MB & Bytes

2020-01-15 Thread Mallika Kulkarni (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mallika Kulkarni updated CASSANDRA-15470:
-
Test and Documentation Plan: Unit tests written for newly added validations
 Status: Patch Available  (was: Open)

[https://github.com/apache/cassandra/pull/425]

> Potential Overflow in DatabaseDescriptor Functions That Convert Between KB/MB 
> & Bytes
> -
>
> Key: CASSANDRA-15470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15470
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jordan West
>Assignee: Mallika Kulkarni
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{DatabaseDescriptor}} has several functions that convert between user 
> supplied sizes in KB/MB and bytes. These are implemented without much 
> consistency and, while unlikely, several have the potential to overflow since 
> validation on the input is missing. Meanwhile, some widen the number to a 
> long correctly. Options include: widening in all places or simply doing 
> better validation on start up — currently only the lower bound of the valid 
> range is checked for many of these fields.
> List of Affected {{DatabaseDescriptor}} Methods:
>  * {{getColumnIndexSize}}
>  * {{getColumnIndexCacheSize}}
>  * {{getBatchSizeWarnThreshold}}
>  * {{getNativeTransportFrameBlockSize}}
>  * {{getRepairSessionSpaceInMegabytes}}
>  * {{getNativeTransportMaxFrameSize}}
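To make the overflow concrete, here is a small self-contained sketch (the 
method and constant names are hypothetical, not the actual 
{{DatabaseDescriptor}} code): converting a user-supplied KB count to bytes with 
plain int arithmetic wraps silently, while widening to long before the multiply 
does not.
{code:java}
public class SizeConversionOverflow
{
    // hypothetical user-supplied setting: 4 GiB expressed in KB
    static final int SIZE_IN_KB = 4 * 1024 * 1024;

    // int arithmetic: 4194304 * 1024 == 2^32, which wraps to 0 as an int
    static int toBytesNarrow(int kb)
    {
        return kb * 1024;
    }

    // widening to long before the multiply preserves the value
    static long toBytesWidened(int kb)
    {
        return kb * 1024L;
    }

    public static void main(String[] args)
    {
        System.out.println(toBytesNarrow(SIZE_IN_KB));  // prints 0
        System.out.println(toBytesWidened(SIZE_IN_KB)); // prints 4294967296
    }
}
{code}
The startup-validation alternative mentioned above amounts to rejecting any 
configured value larger than Integer.MAX_VALUE / 1024 before the narrow 
multiply can run.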



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15470) Potential Overflow in DatabaseDescriptor Functions That Convert Between KB/MB & Bytes

2020-01-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15470:
---
Labels: pull-request-available  (was: )

> Potential Overflow in DatabaseDescriptor Functions That Convert Between KB/MB 
> & Bytes
> -
>
> Key: CASSANDRA-15470
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15470
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Config
>Reporter: Jordan West
>Assignee: Mallika Kulkarni
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>
> {{DatabaseDescriptor}} has several functions that convert between user 
> supplied sizes in KB/MB and bytes. These are implemented without much 
> consistency and, while unlikely, several have the potential to overflow since 
> validation on the input is missing. Meanwhile, some widen the number to a 
> long correctly. Options include: widening in all places or simply doing 
> better validation on start up — currently only the lower bound of the valid 
> range is checked for many of these fields.
> List of Affected {{DatabaseDescriptor}} Methods:
>  * {{getColumnIndexSize}}
>  * {{getColumnIndexCacheSize}}
>  * {{getBatchSizeWarnThreshold}}
>  * {{getNativeTransportFrameBlockSize}}
>  * {{getRepairSessionSpaceInMegabytes}}
>  * {{getNativeTransportMaxFrameSize}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-15 Thread Thomas Steinmaurer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016628#comment-17016628
 ] 

Thomas Steinmaurer commented on CASSANDRA-15430:


[~benedict], please try to download the JFR files for both 2.1.18 and 3.0.18 
here: 
[https://dynatrace-my.sharepoint.com/:f:/p/thomas_steinmaurer/EoFkdBH-WnlOmuGZ4hL_8PwByBTQLwhtlBGBLW_0y3P9rg?e=uKlr6W]

The data model is pretty straightforward, originating from our Astyanax/Thrift 
legacy days and later moved over to CQL: a BLOB-centric model with our own 
client-side "serializer framework".

E.g.:
{noformat}
CREATE TABLE ks."cf" (
    k blob,
    n blob,
    v blob,
    PRIMARY KEY (k, n)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (n ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '2'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.0
    AND default_time_to_live = 0
    AND gc_grace_seconds = 259200
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = 'NONE';
{noformat}

Regarding queries: it is really just about the write path (batch message 
processing) in Cassandra 2.1 vs. 3.0, as outlined in the issue description. We 
have tried single-partition vs. multi-partition batches (I know, bad practice), 
but in our tests single-partition batches didn't have a positive impact on the 
3.0 write path either.

Moving from 2.1 to 3.0 would mean adding ~30-40% more resources for us to 
handle the same load sufficiently. Thanks for any help in that area!

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png
>
>
> In a 6-node loadtest cluster, we had been constantly and sufficiently running 
> a certain production-like workload with 2.1.18. After upgrading one node to 
> 3.0.18 (the remaining 5 are still on 2.1.18 after we saw the sort of 
> regression described below), 3.0.18 is showing increased CPU usage, increased 
> GC, high mutation stage pending tasks, dropped mutation messages ...
> Some specs; all 6 nodes are equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus the same settings with regard to 
> the number of threads, compaction throttling etc.
> The following dashboard shows highlighted areas (CPU, suspension) with 
> metrics for all 6 nodes; the one outlier is the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally, we see a large increase in pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> Name   Active   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15min JFR session for both, 3.0.18 

[jira] [Commented] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2020-01-15 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016527#comment-17016527
 ] 

Dinesh Joshi commented on CASSANDRA-13938:
--

Hi [~aleksey], Overall the code looks good. Two minor nits only. Feel free to 
make changes on commit.

- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out 
as a constant? I think it's used in multiple locations (see the sketch below).
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - Line 106, the word 'length' has a typo in the 
comment.
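A sketch of the constant extraction being suggested, with illustrative names 
only (not the actual {{CompressedInputStream}} internals):
{code:java}
public class BufferResizing
{
    // one named constant instead of a bare 1.5 literal at each call site
    private static final double GROWTH_FACTOR = 1.5;

    static int grownCapacity(int currentCapacity)
    {
        return (int) (currentCapacity * GROWTH_FACTOR);
    }

    public static void main(String[] args)
    {
        System.out.println(grownCapacity(64)); // prints 96
    }
}
{code}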

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Aleksey Yeschenko
>Priority: Urgent
> Fix For: 4.0-alpha
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is 
> not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last repair command follow. First, this is stdout 
> from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java 
> (0x10274d4c0) and 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib
>  (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 
> (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with 
> repair options (parallelism: parallel, primary range: false, incremental: 
> true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: 
> [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: 
> false)
> [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 
> for range [(3074457345618258602,-9223372036854775808], 
> (-9223372036854775808,-3074457345618258603], 
> 

[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2020-01-15 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016527#comment-17016527
 ] 

Dinesh Joshi edited comment on CASSANDRA-13938 at 1/16/20 3:56 AM:
---

Hi [~aleksey], Overall the code looks good. Two minor nits only. Feel free to 
make changes on commit.

- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out 
as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - Line 106, the word 'length' has a typo in the 
comment.

+1


was (Author: djoshi3):
Hi [~aleksey], Overall the code looks good. Two minor nits only. Feel free to 
make changes on commit.

- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out 
as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - Line 106, the word 'length' has a typo in the 
comment.

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Aleksey Yeschenko
>Priority: Urgent
> Fix For: 4.0-alpha
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is 
> not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last repair command follow. First, this is stdout 
> from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java 
> (0x10274d4c0) and 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib
>  (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 
> (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace 

[jira] [Updated] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2020-01-15 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-13938:
-
Status: Ready to Commit  (was: Review In Progress)

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Aleksey Yeschenko
>Priority: Urgent
> Fix For: 4.0-alpha
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is 
> not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last repair command follow. First, this is stdout 
> from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java 
> (0x10274d4c0) and 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib
>  (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 
> (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace standard_long with 
> repair options (parallelism: parallel, primary range: false, incremental: 
> true, job threads: 1, ColumnFamilies: [test_data], dataCenters: [], hosts: 
> [], previewKind: NONE, # of ranges: 3, pull repair: false, force repair: 
> false)
> [2017-10-05 14:32:07,045] Repair session 7e2e8e80-a98e-11e7-ad86-cbd2801b8de2 
> for range [(3074457345618258602,-9223372036854775808], 
> (-9223372036854775808,-3074457345618258603], 
> (-3074457345618258603,3074457345618258602]] failed with error Stream failed
> [2017-10-05 14:32:07,048] null
> [2017-10-05 14:32:07,050] Repair command #4 finished in 14 seconds
> error: Repair job has failed with the error message: [2017-10-05 
> 14:32:07,048] null
> -- StackTrace --
> java.lang.RuntimeException: Repair job has failed with the error message: 
> [2017-10-05 14:32:07,048] null
>   

[jira] [Comment Edited] (CASSANDRA-13938) Default repair is broken, crashes other nodes participating in repair (in trunk)

2020-01-15 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016527#comment-17016527
 ] 

Dinesh Joshi edited comment on CASSANDRA-13938 at 1/16/20 3:56 AM:
---

Hi [~aleksey], Overall the code looks good. Minor nits only. Feel free to make 
changes on commit.

- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out 
as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - Line 106, the word 'length' has a typo in the 
comment.

+1


was (Author: djoshi3):
Hi [~aleksey], Overall the code looks good. Two minor nits only. Feel free to 
make changes on commit.

- {{CompressedInputStream}} - could you pull the resizing multiplier (1.5) out 
as a constant? I think it's used in multiple locations.
- {{CompressedInputStream::chunkBytesRead}} can be package private.
- {{RebufferingInputStream}} - Line 106, the word 'length' has a typo in the 
comment.

+1

> Default repair is broken, crashes other nodes participating in repair (in 
> trunk)
> 
>
> Key: CASSANDRA-13938
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13938
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair
>Reporter: Nate McCall
>Assignee: Aleksey Yeschenko
>Priority: Urgent
> Fix For: 4.0-alpha
>
> Attachments: 13938.yaml, test.sh
>
>
> Running through a simple scenario to test some of the new repair features, I 
> was not able to make a repair command work. Further, the exception seemed to 
> trigger a nasty failure state that basically shuts down the netty connections 
> for messaging *and* CQL on the nodes transferring back data to the node being 
> repaired. The following steps reproduce this issue consistently.
> Cassandra stress profile (probably not necessary, but this one provides a 
> really simple schema and consistent data shape):
> {noformat}
> keyspace: standard_long
> keyspace_definition: |
>   CREATE KEYSPACE standard_long WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor':3};
> table: test_data
> table_definition: |
>   CREATE TABLE test_data (
>   key text,
>   ts bigint,
>   val text,
>   PRIMARY KEY (key, ts)
>   ) WITH COMPACT STORAGE AND
>   CLUSTERING ORDER BY (ts DESC) AND
>   bloom_filter_fp_chance=0.01 AND
>   caching={'keys':'ALL', 'rows_per_partition':'NONE'} AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=864000 AND
>   read_repair_chance=0.00 AND
>   compaction={'class': 'SizeTieredCompactionStrategy'} AND
>   compression={'sstable_compression': 'LZ4Compressor'};
> columnspec:
>   - name: key
>     population: uniform(1..5000) # 50 million records available
>   - name: ts
>     cluster: gaussian(1..50) # Up to 50 inserts per record
>   - name: val
>     population: gaussian(128..1024) # varying size of value data
> insert:
>   partitions: fixed(1) # only one insert per batch for individual partitions
>   select: fixed(1)/1 # each insert comes in one at a time
>   batchtype: UNLOGGED
> queries:
>   single:
>     cql: select * from test_data where key = ? and ts = ? limit 1;
>   series:
>     cql: select key,ts,val from test_data where key = ? limit 10;
> {noformat}
> The commands to build and run:
> {noformat}
> ccm create 4_0_test -v git:trunk -n 3 -s
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=15s -rate threads=4
> # flush the memtable just to get everything on disk
> ccm node1 nodetool flush
> ccm node2 nodetool flush
> ccm node3 nodetool flush
> # disable hints for nodes 2 and 3
> ccm node2 nodetool disablehandoff
> ccm node3 nodetool disablehandoff
> # stop node1
> ccm node1 stop
> ccm stress user profile=./histo-test-schema.yml 
> ops\(insert=20,single=1,series=1\) duration=45s -rate threads=4
> # wait 10 seconds
> ccm node1 start
> # Note that we are local to ccm's nodetool install 'cause repair preview is 
> not reported yet
> node1/bin/nodetool repair --preview
> node1/bin/nodetool repair standard_long test_data
> {noformat} 
> The error outputs from the last repair command follow. First, this is stdout 
> from node1:
> {noformat}
> $ node1/bin/nodetool repair standard_long test_data
> objc[47876]: Class JavaLaunchHelper is implemented in both 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/bin/java 
> (0x10274d4c0) and 
> /Library/Java/JavaVirtualMachines/jdk1.8.0_101.jdk/Contents/Home/jre/lib/libinstrument.dylib
>  (0x1047b64e0). One of the two will be used. Which one is undefined.
> [2017-10-05 14:31:52,425] Starting repair command #4 
> (7e1a9150-a98e-11e7-ad86-cbd2801b8de2), repairing keyspace 

[jira] [Commented] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD

2020-01-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016505#comment-17016505
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15314:
-

I just started looking into it. I will let you know when there is a patch 
available for review. Thanks 

> Fix failing test - test_rolling_upgrade_with_internode_ssl - 
> upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
> -
>
> Key: CASSANDRA-15314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15314
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Ekaterina Dimitrova
>Priority: Normal
>  Labels: dtest
> Fix For: 4.0-alpha
>
>
> Example failure: 
> [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11]
>  
> {code:java}
> ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* 
> now UP']: INFO  [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See 
> system.log for remainder
> self = <upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD 
>  object at 0x7f6d90d43b38>
> @pytest.mark.timeout(3000)
> def test_rolling_upgrade_with_internode_ssl(self):
> """
> Rolling upgrade test using internode ssl.
> """
> >   self.upgrade_scenario(rolling=True, internode_ssl=True)
> upgrade_tests/upgrade_through_versions_test.py:296: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario
> self.upgrade_to_version(version_meta, partial=True, nodes=(node,), 
> internode_ssl=internode_ssl)
> upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version
> node.start(wait_other_notice=240, wait_for_binary_proto=True)
> ../env/src/ccm/ccmlib/node.py:751: in start
> node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice)
> ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240
> process = None, verbose = False, filename = 'system.log'
> def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log'):
> """
> Watch the log until one or more (regular) expressions are found.
> This method returns when all the expressions have been found or the method
> times out (a TimeoutError is then raised). On successful completion,
> a list of pairs (line matched, match object) is returned.
> """
> start = time.time()
> tofind = [exprs] if isinstance(exprs, string_types) else exprs
> tofind = [re.compile(e) for e in tofind]
> matchings = []
> reads = ""
> if len(tofind) == 0:
> return None
> 
> log_file = os.path.join(self.get_path(), 'logs', filename)
> output_read = False
> while not os.path.exists(log_file):
> time.sleep(.5)
> if start + timeout < time.time():
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", 
> time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be 
> created.".format(log_file))
> if process and not output_read:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse RuntimeError 
> but I'm lazy
> 
> with open(log_file) as f:
> if from_mark:
> f.seek(from_mark)
> 
> while True:
> # First, if we have a process to check, then check it.
> # Skip on Windows - stdout/stderr is cassandra.bat
> if not common.is_win() and not output_read:
> if process:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, 
> verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse 
> RuntimeError but I'm lazy
> 
> line = f.readline()
> if 

[jira] [Commented] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD

2020-01-15 Thread Vinay Chella (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016450#comment-17016450
 ] 

Vinay Chella commented on CASSANDRA-15314:
--

I believe these two (TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD and 
TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD) are 
different test cases that could be failing for the same reason.

> Fix failing test - test_rolling_upgrade_with_internode_ssl - 
> upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
> -
>
> Key: CASSANDRA-15314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15314
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Ekaterina Dimitrova
>Priority: Normal
>  Labels: dtest
> Fix For: 4.0-alpha
>
>
> Example failure: 
> [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11]
>  
> {code:java}
> ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* 
> now UP']: INFO  [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See 
> system.log for remainder
> self = <upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD 
>  object at 0x7f6d90d43b38>
> @pytest.mark.timeout(3000)
> def test_rolling_upgrade_with_internode_ssl(self):
> """
> Rolling upgrade test using internode ssl.
> """
> >   self.upgrade_scenario(rolling=True, internode_ssl=True)
> upgrade_tests/upgrade_through_versions_test.py:296: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario
> self.upgrade_to_version(version_meta, partial=True, nodes=(node,), 
> internode_ssl=internode_ssl)
> upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version
> node.start(wait_other_notice=240, wait_for_binary_proto=True)
> ../env/src/ccm/ccmlib/node.py:751: in start
> node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice)
> ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240
> process = None, verbose = False, filename = 'system.log'
> def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log'):
> """
> Watch the log until one or more (regular) expressions are found.
> This method returns when all the expressions have been found or the method
> times out (a TimeoutError is then raised). On successful completion,
> a list of pairs (line matched, match object) is returned.
> """
> start = time.time()
> tofind = [exprs] if isinstance(exprs, string_types) else exprs
> tofind = [re.compile(e) for e in tofind]
> matchings = []
> reads = ""
> if len(tofind) == 0:
> return None
> 
> log_file = os.path.join(self.get_path(), 'logs', filename)
> output_read = False
> while not os.path.exists(log_file):
> time.sleep(.5)
> if start + timeout < time.time():
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", 
> time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be 
> created.".format(log_file))
> if process and not output_read:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse RuntimeError 
> but I'm lazy
> 
> with open(log_file) as f:
> if from_mark:
> f.seek(from_mark)
> 
> while True:
> # First, if we have a process to check, then check it.
> # Skip on Windows - stdout/stderr is cassandra.bat
> if not common.is_win() and not output_read:
> if process:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, 
> verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse 
> 

[jira] [Comment Edited] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD

2020-01-15 Thread Vinay Chella (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016450#comment-17016450
 ] 

Vinay Chella edited comment on CASSANDRA-15314 at 1/16/20 1:33 AM:
---

I believe these two (TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD and 
TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD) are 
different test cases that could be failing for the same reason. I would be 
happy to help review both if you have a patch.


was (Author: vinaykumarcse):
I believe these two (TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD and 
TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD) are 
different test cases that could be failing for the same reason.

> Fix failing test - test_rolling_upgrade_with_internode_ssl - 
> upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
> -
>
> Key: CASSANDRA-15314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15314
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Ekaterina Dimitrova
>Priority: Normal
>  Labels: dtest
> Fix For: 4.0-alpha
>
>
> Example failure: 
> [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11]
>  
> {code:java}
> ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* 
> now UP']: INFO  [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See 
> system.log for remainder
> self = <upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD 
>  object at 0x7f6d90d43b38>
> @pytest.mark.timeout(3000)
> def test_rolling_upgrade_with_internode_ssl(self):
> """
> Rolling upgrade test using internode ssl.
> """
> >   self.upgrade_scenario(rolling=True, internode_ssl=True)
> upgrade_tests/upgrade_through_versions_test.py:296: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario
> self.upgrade_to_version(version_meta, partial=True, nodes=(node,), 
> internode_ssl=internode_ssl)
> upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version
> node.start(wait_other_notice=240, wait_for_binary_proto=True)
> ../env/src/ccm/ccmlib/node.py:751: in start
> node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice)
> ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240
> process = None, verbose = False, filename = 'system.log'
> def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log'):
> """
> Watch the log until one or more (regular) expressions are found.
> This method returns when all the expressions have been found or the method
> times out (a TimeoutError is then raised). On successful completion,
> a list of pairs (line matched, match object) is returned.
> """
> start = time.time()
> tofind = [exprs] if isinstance(exprs, string_types) else exprs
> tofind = [re.compile(e) for e in tofind]
> matchings = []
> reads = ""
> if len(tofind) == 0:
> return None
> 
> log_file = os.path.join(self.get_path(), 'logs', filename)
> output_read = False
> while not os.path.exists(log_file):
> time.sleep(.5)
> if start + timeout < time.time():
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", 
> time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be 
> created.".format(log_file))
> if process and not output_read:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse RuntimeError 
> but I'm lazy
> 
> with open(log_file) as f:
> if from_mark:
> f.seek(from_mark)
> 
> while True:
> # First, if we have a process to check, then check it.
> # Skip on Windows - stdout/stderr is cassandra.bat
> if not common.is_win() and not output_read:
> if process:
> 

[jira] [Commented] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually

2020-01-15 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016429#comment-17016429
 ] 

David Capwell commented on CASSANDRA-15507:
---

The only thing I have thought of to solve this is to make the selection 
pluggable (I'd rather not mutate CQL for this) so the dtest could just override 
the implementation. The main reason I didn't go this route was an attempt to 
make this less specific to a version; so the cost is a potentially failing 
test in the future...

> Test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  does not test a failing read repair and should be updated to actually 
> trigger a failed read repair
> 
>
> Key: CASSANDRA-15507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  makes a few assumptions which are not valid at the moment.
> 1) the writes to nodes 1 and 2 have the same digest (they don’t; this is 
> caused by the timestamps being different)
> 2) node 3 will participate in the read (it won’t, given that 
> org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates the 
> first 2 nodes, so node 3 never gets involved in the repair)
> 3) node 3 will attempt to get repaired (it won’t because it’s never looked at)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15367) Memtable memory allocations may deadlock

2020-01-15 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016398#comment-17016398
 ] 

Benedict Elliott Smith commented on CASSANDRA-15367:


Correct, except perhaps the last part.  There's no need to collect more than 
one of these deadlocks to bring down the node.  If there are no memtable 
flushes already in progress, then no more flushes will ever occur, because they 
must wait for all earlier operations to complete, including the deadlock.  So 
from this point on no Memtable memory will ever be released.
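As an illustration only, the cycle can be reduced to three threads and plain 
JDK primitives; every name below is made up for the sketch, and Cassandra's 
actual OpOrder/Memtable machinery is of course more involved. A C2 write holds 
the partition lock while waiting for memory, the memory can only be freed by 
M1's flush, and the flush waits on a C1 write queued behind the same lock, so 
all three threads hang:
{code:java}
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

public class FlushDeadlockSketch
{
    static final Semaphore memtableMemory = new Semaphore(0);       // memory budget, already exhausted
    static final ReentrantLock partitionLock = new ReentrantLock(); // contended partition's mutex

    public static void main(String[] args) throws InterruptedException
    {
        // C2 write: takes the contended partition lock, then stalls on memory
        Thread c2Write = new Thread(() -> {
            partitionLock.lock();
            try {
                memtableMemory.acquireUninterruptibly(); // waits on a flush that never finishes
            } finally {
                partitionLock.unlock();
            }
        });

        // C1 write that fell through to the next memtable: it needs the lock held above
        Thread c1Write = new Thread(() -> {
            partitionLock.lock(); // blocks forever behind the C2 write
            partitionLock.unlock();
        });

        // M1 flush: may only release memory once all earlier (C1) operations complete
        Thread m1Flush = new Thread(() -> {
            try {
                c1Write.join(); // never returns: the cycle is closed
            } catch (InterruptedException e) {
                return;
            }
            memtableMemory.release();
        });

        c2Write.start();
        Thread.sleep(100); // let the C2 write take the lock first
        c1Write.start();
        m1Flush.start();
    }
}
{code}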

> Memtable memory allocations may deadlock
> 
>
> Key: CASSANDRA-15367
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log, Local/Memtable
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex, 
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before 
> their flush began
> * Memtables permit operations from this cohort to fall-through to the 
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort, 
> since they block flush (and hence block future memory allocation)
> With very unfortunate scheduling
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new 
> Memtable’s cohort (C2) 
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, 
> may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those 
> from C2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually

2020-01-15 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016394#comment-17016394
 ] 

Benedict Elliott Smith commented on CASSANDRA-15507:


Sure, wfm.  It would be nice to try to solve the general problem of nominating 
nodes to be contacted, i.e. specifying the contact preference order of nodes 
for a coordinator (since this is going to be needed in a lot of distributed 
tests), but this looks to solve the clear and present problem, so no huge harm 
punting on that.

> Test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  does not test a failing read repair and should be updated to actually 
> trigger a failed read repair
> 
>
> Key: CASSANDRA-15507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  makes a few assumptions which are not valid at the moment.
> 1) the writes to nodes 1 and 2 have the same digest (they don’t; this is 
> caused by the timestamps being different)
> 2) node 3 will participate in the read (it won’t, given that 
> org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates the 
> first 2 nodes, so node 3 never gets involved in the repair)
> 3) node 3 will attempt to get repaired (it won’t because it’s never looked at)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15367) Memtable memory allocations may deadlock

2020-01-15 Thread Blake Eggleston (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016393#comment-17016393
 ] 

Blake Eggleston commented on CASSANDRA-15367:
-

I've been trying to work out exactly how this deadlock can occur, based on your 
description. Could the deadlock be restated like this?

For a given partition key:
 * a write is part of an OpGroup before a barrier set on Memtable1 (M1), but 
with a replay position after the final replay position set on M1 before it 
flushes.
 * So it’s forwarded to M2, while still blocking flushes on M1.
 * M2 has another in-flight write for this partition; it’s contended, so it’s 
holding the lock.
 ** It can’t progress because it can’t allocate memory (in part because M1 
can’t flush).
 ** It doesn’t degrade to allocating on heap because its OpOrder isn’t blocking 
anything.
 * The write stage becomes saturated with deadlocked writes like these, and no 
more writes are processed.

> Memtable memory allocations may deadlock
> 
>
> Key: CASSANDRA-15367
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15367
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Commit Log, Local/Memtable
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Normal
> Fix For: 4.0, 2.2.x, 3.0.x, 3.11.x
>
>
> * Under heavy contention, we guard modifications to a partition with a mutex, 
> for the lifetime of the memtable.
> * Memtables block for the completion of all {{OpOrder.Group}} started before 
> their flush began
> * Memtables permit operations from this cohort to fall-through to the 
> following Memtable, in order to guarantee a precise commitLogUpperBound
> * Memtable memory limits may be lifted for operations in the first cohort, 
> since they block flush (and hence block future memory allocation)
> With very unfortunate scheduling
> * A contended partition may rapidly escalate to a mutex
> * The system may reach memory limits that prevent allocations for the new 
> Memtable’s cohort (C2) 
> * An operation from C2 may hold the mutex when this occurs
> * Operations from a prior Memtable’s cohort (C1), for a contended partition, 
> may fall-through to the next Memtable
> * The operations from C1 may execute after the above is encountered by those 
> from C2



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually t

2020-01-15 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15507:
--
Test and Documentation Plan: 
PR: https://github.com/apache/cassandra/pull/424
CircleCI: 
https://circleci.com/gh/dcapwell/cassandra/tree/fixDistributedReadWritePathTestTest
 Status: Patch Available  (was: Open)

> Test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  does not test a failing read repair and should be updated to actually 
> trigger a failed read repair
> 
>
> Key: CASSANDRA-15507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  makes a few assumptions which are not valid at the moment.
> 1) the writes to nodes 1 and 2 have the same digest (they don’t; this is 
> caused by the timestamps being different)
> 2) node 3 will participate in the read (it won’t, given that 
> org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates the 
> first 2 nodes, so node 3 never gets involved in the repair)
> 3) node 3 will attempt to get repaired (it won’t because it’s never looked at)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually t

2020-01-15 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15507:
--
Test and Documentation Plan: 
PR: https://github.com/apache/cassandra/pull/424

CircleCI: 
https://circleci.com/gh/dcapwell/cassandra/tree/fixDistributedReadWritePathTestTest

  was:
PR: https://github.com/apache/cassandra/pull/424
CircleCI: 
https://circleci.com/gh/dcapwell/cassandra/tree/fixDistributedReadWritePathTestTest


> Test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  does not test a failing read repair and should be updated to actually 
> trigger a failed read repair
> 
>
> Key: CASSANDRA-15507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> The test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  makes a few assumptions which are not valid at the moment.
> 1) the writes to nodes 1 and 2 have the same digest (they don’t; this is 
> caused by the timestamps being different)
> 2) node 3 will participate in the read (it won’t, given that 
> org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates 
> the first 2 nodes, so node 3 won’t get involved in the repair)
> 3) node 3 will attempt to get repaired (it won’t, because it’s never looked at)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually trigger a failed read repair

2020-01-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15507:
---
Labels: pull-request-available  (was: )

> Test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  does not test a failing read repair and should be updated to actually 
> trigger a failed read repair
> 
>
> Key: CASSANDRA-15507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>
> The test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  makes a few assumptions which are not valid at the moment.
> 1) the writes to nodes 1 and 2 have the same digest (they don’t; this is 
> caused by the timestamps being different)
> 2) node 3 will participate in the read (it won’t, given that 
> org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates 
> the first 2 nodes, so node 3 won’t get involved in the repair)
> 3) node 3 will attempt to get repaired (it won’t, because it’s never looked at)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default

2020-01-15 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15216:
-
  Since Version: 4.0
Source Control Link: 
https://github.com/apache/cassandra/commit/9d2ffad6b6d09761a03aeb1a207e9780d1174046
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
> Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot 
> 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.
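
The knob in question is {{cross_node_timeout}} in cassandra.yaml (the commit 
appears further down in this digest). A minimal sketch of the setting, 
assuming clocks are kept in sync via NTP; the comments are illustrative:

{noformat}
# cassandra.yaml - sketch; after CASSANDRA-15216 this defaults to true.
# Lets replicas drop internode requests whose cross-node creation time shows
# they already timed out on the coordinator. Requires node clocks to be
# modestly in sync (e.g. via NTP), which last-write-wins already assumes.
cross_node_timeout: true
{noformat}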



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default

2020-01-15 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15216:
-
Status: Ready to Commit  (was: Review In Progress)

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
> Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot 
> 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default

2020-01-15 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15216:
-
Reviewers: Brandon Williams  (was: Brandon Williams)
   Status: Review In Progress  (was: Patch Available)

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
> Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot 
> 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra] branch trunk updated: Set cross_node_timeout to true by default.

2020-01-15 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 9d2ffad  Set cross_node_timeout to true by default.
9d2ffad is described below

commit 9d2ffad6b6d09761a03aeb1a207e9780d1174046
Author: Ekaterina Dimitrova 
AuthorDate: Mon Jan 13 14:29:38 2020 -0500

Set cross_node_timeout to true by default.

Patch by Ekaterina Dimitrova, reviewed by brandonwilliams for
CASSANDRA-15216
---
 CHANGES.txt  | 4 ++++
 NEWS.txt | 6 ++++++
 conf/cassandra.yaml  | 6 +++---
 src/java/org/apache/cassandra/config/Config.java | 2 +-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index b6d140c..522edf8 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -14,6 +14,10 @@
  * Add documentation for Java 11 support in Cassandra (CASSANDRA-15428)
  * Integrate SJK into nodetool (CASSANDRA-12197)
  * Ensure that empty clusterings with kind==CLUSTERING are Clustering.EMPTY 
(CASSANDRA-15498)
+ * The flag 'cross_node_timeout' has been set as true by default. This change
+   is done under the assumption that users have setup NTP on their clusters or
+   otherwise synchronize their clocks, and that clocks are mostly in sync, 
since
+   this is a requirement for general correctness of last write wins. 
(CASSANDRA-15216)
 Merged from 3.11:
  * Fix nodetool compactionstats showing extra pending task for TWCS - patch 
implemented (CASSANDRA-15409)
  * Fix SELECT JSON formatting for the "duration" type (CASSANDRA-15075)
diff --git a/NEWS.txt b/NEWS.txt
index 86de7a4..e51203f 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -218,6 +218,12 @@ Upgrading
   have been set as false by default. Operators should modify them to allow 
the
   creation of new views and SASI indexes, the existing ones will continue 
working.
   See CASSANDRA-14866 for details.
+- CASSANDRA-15216 - The flag 'cross_node_timeout' has been set as true by 
default.
+  This change is done under the assumption that users have setup NTP on
+  their clusters or otherwise synchronize their clocks, and that clocks are
+  mostly in sync, since this is a requirement for general correctness of
+  last write wins.
+
 
 Materialized Views
 ---
diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index 9a79f24..f1e5864 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -923,9 +923,9 @@ slow_query_log_timeout_in_ms: 500
 # under overload conditions we will waste that much extra time processing 
 # already-timed-out requests.
 #
-# Warning: before enabling this property make sure to ntp is installed
-# and the times are synchronized between the nodes.
-cross_node_timeout: false
+# Warning: It is generally assumed that users have setup NTP on their 
clusters, and that clocks are modestly in sync, 
+# since this is a requirement for general correctness of last write wins.
+#cross_node_timeout: true
 
 # Set keep-alive period for streaming
 # This node will send a keep-alive message periodically with this period.
diff --git a/src/java/org/apache/cassandra/config/Config.java 
b/src/java/org/apache/cassandra/config/Config.java
index 8fa8e72..2d74426 100644
--- a/src/java/org/apache/cassandra/config/Config.java
+++ b/src/java/org/apache/cassandra/config/Config.java
@@ -108,7 +108,7 @@ public class Config
 public Integer streaming_connections_per_host = 1;
 public Integer streaming_keep_alive_period_in_secs = 300; //5 minutes
 
-public boolean cross_node_timeout = false;
+public boolean cross_node_timeout = true;
 
 public volatile long slow_query_log_timeout_in_ms = 500L;
 


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default

2020-01-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-15216:

Impacts: Docs  (was: None)
Test and Documentation Plan: Documented in NEWS.txt
 Status: Patch Available  (was: In Progress)

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
> Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot 
> 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15216) Cross node message creation times are disabled by default

2020-01-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016361#comment-17016361
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-15216 at 1/15/20 10:10 PM:
---

Patch available for trunk 
[here|https://github.com/ekaterinadimitrova2/cassandra/tree/trunk-CASSANDRA-15216]

[Pull request|https://github.com/ekaterinadimitrova2/cassandra/pull/17]

Screenshots from the CI run are attached. If you look at the failures of "test 
all", there are some that I don't see when running CI on trunk, but most of 
them are marked as flaky. I think it should be good. NEWS.txt updated as 
agreed earlier. 


was (Author: e.dimitrova):
Patch available for trunk 
[here|https://github.com/ekaterinadimitrova2/cassandra/tree/trunk-CASSANDRA-15216]

[Pull request | https://github.com/ekaterinadimitrova2/cassandra/pull/17]

Screenshots from the CI run are attached. If you look at the failures of "test 
all", there are some that I don't see when running CI on trunk, but most of 
them are marked as flaky. I think it should be good. NEWS.txt updated as 
agreed earlier. 

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
> Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot 
> 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15216) Cross node message creation times are disabled by default

2020-01-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016361#comment-17016361
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15216:
-

Patch available for trunk 
[here|https://github.com/ekaterinadimitrova2/cassandra/tree/trunk-CASSANDRA-15216]

[Pull request | https://github.com/ekaterinadimitrova2/cassandra/pull/17]

Screenshots from the CI run are attached. If you look at the failures of "test 
all", there are some that I don't see when running CI on trunk, but most of 
them are marked as flaky. I think it should be good. NEWS.txt updated as 
agreed earlier. 

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
> Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot 
> 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15216) Cross node message creation times are disabled by default

2020-01-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-15216:

Attachment: Screen Shot 2020-01-15 at 5.02.22 PM.png
Screen Shot 2020-01-15 at 5.01.33 PM.png
Screen Shot 2020-01-15 at 5.01.06 PM.png

> Cross node message creation times are disabled by default
> -
>
> Key: CASSANDRA-15216
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15216
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0, 4.0-alpha
>
> Attachments: Screen Shot 2020-01-15 at 5.01.06 PM.png, Screen Shot 
> 2020-01-15 at 5.01.33 PM.png, Screen Shot 2020-01-15 at 5.02.22 PM.png
>
>
> This can cause a lot of wasted work for messages that have timed out on the 
> coordinator.  We should generally assume that our users have set up NTP on 
> their clusters, and that clocks are modestly in sync, since it’s a 
> requirement for general correctness of last write wins.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually trigger a failed read repair

2020-01-15 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15507:
--
 Bug Category: Parent values: Correctness(12982)Level 1 values: Test 
Failure(12990)
   Complexity: Low Hanging Fruit
Discovered By: Code Inspection
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  does not test a failing read repair and should be updated to actually 
> trigger a failed read repair
> 
>
> Key: CASSANDRA-15507
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15507
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>
> The test 
> org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
>  makes a few assumptions which are not valid at the moment.
> 1) the writes to nodes 1 and 2 have the same digest (they don’t; this is 
> caused by the timestamps being different)
> 2) node 3 will participate in the read (it won’t, given that 
> org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates 
> the first 2 nodes, so node 3 won’t get involved in the repair)
> 3) node 3 will attempt to get repaired (it won’t, because it’s never looked at)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15507) Test org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest does not test a failing read repair and should be updated to actually trigger a failed read repair

2020-01-15 Thread David Capwell (Jira)
David Capwell created CASSANDRA-15507:
-

 Summary: Test 
org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
 does not test a failing read repair and should be updated to actually trigger 
a failed read repair
 Key: CASSANDRA-15507
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15507
 Project: Cassandra
  Issue Type: Bug
  Components: Test/unit
Reporter: David Capwell
Assignee: David Capwell


The test 
org.apache.cassandra.distributed.test.DistributedReadWritePathTest#failingReadRepairTest
 makes a few assumptions which are not valid at the moment.

1) the writes to nodes 1 and 2 have the same digest (they don’t; this is 
caused by the timestamps being different)
2) node 3 will participate in the read (it won’t, given that 
org.apache.cassandra.locator.ReplicaPlans#contactForRead always speculates 
the first 2 nodes, so node 3 won’t get involved in the repair)
3) node 3 will attempt to get repaired (it won’t, because it’s never looked at)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-12995) update hppc dependency to 0.7

2020-01-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016334#comment-17016334
 ] 

Ekaterina Dimitrova edited comment on CASSANDRA-12995 at 1/15/20 9:24 PM:
--

[~suztomo] [~brandon.williams]

Thank you for raising it. As part of CASSANDRA-12197 hppc was not updated.

I guess the confusion comes from the removal of the version flag in all-pom, 
where no other package has a version added.

Best practices were followed as per the 
[official documentation|http://cassandra.apache.org/doc/latest/development/dependencies.html] 
while only adding the new libraries needed to support SJK.

Also, as pointed out by [~suztomo], the hppc version in the parent pom was not 
changed, nor was the jar updated.

Also, from the code itself - 



Please let me know if I missed something.


was (Author: e.dimitrova):
[~suztomo] [~brandon.williams]

Thank you for raising it. As part of CASSANDRA-12197 hppc was not updated.

I guess the confusion comes from the removal of the version flag.

Best practices were followed as per the 
[official documentation|http://cassandra.apache.org/doc/latest/development/dependencies.html] 
while only adding the new libraries needed to support SJK.

Also, as pointed out by [~suztomo], the hppc version in the parent pom was not 
changed, nor was the jar updated.

Also, from the code itself - 



Please let me know if I missed something.

> update hppc dependency to 0.7
> -
>
> Key: CASSANDRA-12995
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12995
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies, Packaging
>Reporter: Tomas Repik
>Priority: Normal
>  Labels: easyfix
> Fix For: 4.0
>
> Attachments: cassandra-3.11.0-hppc.patch
>
>
> Cassandra 3.11.0 is about to be included in Fedora. There are some tweaks we 
> need to make to the sources in order to build it successfully. Cassandra 
> depends on hppc 0.5.4, but in Fedora we have the newer version 0.7.1, and 
> upstream has released an even newer version, 0.7.2. I attached a patch 
> updating the cassandra sources to depend on the 0.7.1 hppc sources; it should 
> also be compatible with the newest upstream version. The only actual changes 
> are the removal of the Open infix in class names. The issue was discussed 
> here: https://bugzilla.redhat.com/show_bug.cgi?id=1340876 Please consider 
> updating.
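
The "removal of the Open infix" amounts to a mechanical rename on the consumer 
side. A hedged before/after sketch (class names per hppc's 0.7 release notes; 
the surrounding class is made up for illustration):

{code:java}
import com.carrotsearch.hppc.IntHashSet; // was com.carrotsearch.hppc.IntOpenHashSet in 0.5.x

public class HppcRenameExample
{
    public static void main(String[] args)
    {
        // hppc 0.7 dropped the "Open" infix: IntOpenHashSet -> IntHashSet, etc.
        IntHashSet tokens = new IntHashSet();
        tokens.add(42);
        System.out.println(tokens.contains(42)); // true; the API is otherwise unchanged here
    }
}
{code}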



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12995) update hppc dependency to 0.7

2020-01-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016334#comment-17016334
 ] 

Ekaterina Dimitrova commented on CASSANDRA-12995:
-

[~suztomo] [~brandon.williams]

Thank you for raising it. As part of CASSANDRA-12197 hppc was not updated.

I guess the confusion comes from the removal of the version flag.

Best practices were followed as per the 
[official documentation|http://cassandra.apache.org/doc/latest/development/dependencies.html] 
while only adding the new libraries needed to support SJK.

Also, as pointed out by [~suztomo], the hppc version in the parent pom was not 
changed, nor was the jar updated.

Also, from the code itself - 



Please let me know if I missed something.

> update hppc dependency to 0.7
> -
>
> Key: CASSANDRA-12995
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12995
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies, Packaging
>Reporter: Tomas Repik
>Priority: Normal
>  Labels: easyfix
> Fix For: 4.0
>
> Attachments: cassandra-3.11.0-hppc.patch
>
>
> Cassandra 3.11.0 is about to be included in Fedora. There are some tweaks we 
> need to make to the sources in order to build it successfully. Cassandra 
> depends on hppc 0.5.4, but in Fedora we have the newer version 0.7.1, and 
> upstream has released an even newer version, 0.7.2. I attached a patch 
> updating the cassandra sources to depend on the 0.7.1 hppc sources; it should 
> also be compatible with the newest upstream version. The only actual changes 
> are the removal of the Open infix in class names. The issue was discussed 
> here: https://bugzilla.redhat.com/show_bug.cgi?id=1340876 Please consider 
> updating.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12995) update hppc dependency to 0.7

2020-01-15 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016288#comment-17016288
 ] 

Brandon Williams commented on CASSANDRA-12995:
--

It looks like the issue is that we have it again here: 
[https://github.com/apache/cassandra/blob/trunk/build.xml#L783]. 
[~e.dimitrova], was this an oversight?

> update hppc dependency to 0.7
> -
>
> Key: CASSANDRA-12995
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12995
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies, Packaging
>Reporter: Tomas Repik
>Priority: Normal
>  Labels: easyfix
> Fix For: 4.0
>
> Attachments: cassandra-3.11.0-hppc.patch
>
>
> Cassandra 3.11.0 is about to be included in Fedora. There are some tweaks we 
> need to make to the sources in order to build it successfully. Cassandra 
> depends on hppc 0.5.4, but in Fedora we have the newer version 0.7.1, and 
> upstream has released an even newer version, 0.7.2. I attached a patch 
> updating the cassandra sources to depend on the 0.7.1 hppc sources; it should 
> also be compatible with the newest upstream version. The only actual changes 
> are the removal of the Open infix in class names. The issue was discussed 
> here: https://bugzilla.redhat.com/show_bug.cgi?id=1340876 Please consider 
> updating.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15504) INT is incompatible with previous type SMALLINT

2020-01-15 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016226#comment-17016226
 ] 

Benedict Elliott Smith commented on CASSANDRA-15504:


It's even more complicated than you might think.  These are some of the factors 
that come to mind initially; they are probably not a complete catalogue of the 
issues:

# 3.0 format sstables persist data in a manner that requires us to know how 
many bytes are used, and they do not record what the type was when the sstable 
was written.  So at minimum to support this we would need to persist this type 
information in all sstables (which we should anyway, but don't currently), as 
opposed to using the system tables.
# We have to handle data from legacy sstables, which persist no information at 
all about what data they contain, and for which it is very possible to find 
poorly typed legacy information floating around from before we had proper 
checks, when we still permitted mangled type casts that could write arbitrary 
things
# So, we'd need (1), and we'd need to ensure we didn't support any such 
operation until we had established that no dangerous files exist on the 
cluster, on any node (including refusing to restore them from backup or to 
import them, for instance) - but wait, we're not done
# Currently schema changes are also eventually consistent - this is slated to 
be changed, but not for some time, and it will always have eventually 
consistent propagation, even if there is serialized decision-making.  So: what 
happens if a node requests data for a field that used to be a different type 
and _still is_ on the other node?  How do we know what type we will receive?  
We will need to verify the schema we're communicating with for each operation 
between each pair of nodes.  Which, again, is definitely something that is 
likely to be implemented in the future, but it's non-trivial, and not pressing.

The long and the short of it is that schema behaviours were implemented back in 
the Wild West era of Cassandra, and it's actually a lot more involved than the 
implementors originally imagined.  So until we have time to do it properly, 
we've had to disable features like this that can lead to corrupted data through 
misinterpretation - however unlikely it might be.

That said, in the meantime it's certainly possible to do this as an operator, 
it just requires some annoying surgery on your cluster.  Or, as I say, we'd be 
more than happy for a volunteer with the time to take up this task.
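
To make point 1 concrete, a generic {{ByteBuffer}} sketch (deliberately not 
Cassandra's actual sstable code) of why a reader that does not know the 
written type's byte width cannot safely reinterpret the bytes:

{code:java}
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;

public class WidthMismatchExample
{
    public static void main(String[] args)
    {
        // A smallint is persisted as 2 bytes; an int reader expects 4.
        ByteBuffer persisted = ByteBuffer.allocate(2).putShort((short) 7);
        persisted.flip();

        try
        {
            persisted.getInt(); // only 2 of the 4 expected bytes exist
        }
        catch (BufferUnderflowException e)
        {
            System.out.println("cannot read a smallint's bytes as an int: " + e);
        }
        // In a longer stream there is no exception: the reader silently consumes
        // the next cell's bytes, which is exactly the misinterpretation risk above.
    }
}
{code}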

> INT is incompatible with previous type SMALLINT
> ---
>
> Key: CASSANDRA-15504
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15504
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Truscello
>Priority: Normal
>
> With the release of Cassandra 3.11.5 and the fixing of CASSANDRA-14948, it 
> now appears that you can no longer re-add a SMALLINT column as an INT type.  
> This is rather surprising as any SMALLINT value should be representable by an 
> INT type.
> The following example was run on Cassandra 3.11.5 on CentOS 7 installed from 
> official RedHat repo:
>  
>  
> {noformat}
> cqlsh> CREATE KEYSPACE demo WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor' : 1};
> cqlsh> CREATE TABLE demo.demo_table (
>    ...   user_id BIGINT,
>    ...   created TIMESTAMP,
>    ...   points  SMALLINT,
>    ...   PRIMARY KEY (user_id, created)
>    ... ) WITH CLUSTERING ORDER BY (created DESC);
> cqlsh> ALTER TABLE demo.demo_table DROP points;
> cqlsh> ALTER TABLE demo.demo_table ADD  points INT;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
> re-add previously dropped column 'points' of type int, incompatible with 
> previous type smallint"{noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14740) BlockingReadRepair does not maintain monotonicity during range movements

2020-01-15 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14740?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-14740:

Reviewers: Sam Tunnicliffe

> BlockingReadRepair does not maintain monotonicity during range movements
> 
>
> Key: CASSANDRA-14740
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14740
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Coordination
>Reporter: Benedict Elliott Smith
>Assignee: Benedict Elliott Smith
>Priority: Urgent
>  Labels: correctness
> Fix For: 4.0, 4.0-beta
>
>
> The BlockingReadRepair code introduced by CASSANDRA-10726 requires that each 
> of the queried nodes are written to, but pending nodes are not considered.  
> If there is a pending range movement, one of these writes may be ‘lost’ when 
> the range movement completes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15504) INT is incompatible with previous type SMALLINT

2020-01-15 Thread Marcus Truscello (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016214#comment-17016214
 ] 

Marcus Truscello commented on CASSANDRA-15504:
--

That's unfortunate to hear.  However, I was thinking of something a bit 
simpler: making SMALLINT be considered "compatible" with INT.

Currently, it appears that Int32s are only compatible with themselves (they 
lack an {{isValueCompatibleWith}} method), but Shorts do [offer a toInt 
method|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/db/marshal/ShortType.java#L102-L105], 
so it should be possible to upgrade Shorts to Int32s.

But if I'm understanding you correctly, that solution wouldn't work, would it?  
It sounds like the binary representations are being blindly deserialized to the 
current type, and that ShortType and Int32Type serialize to different formats.  
That means a fix would require A) modifying 
[Int32Serializer|https://github.com/apache/cassandra/blob/06209037ea56b5a2a49615a99f1542d6ea1b2947/src/java/org/apache/cassandra/serializers/Int32Serializer.java#L39-L43]
 to handle deserializing 2-byte shorts and B) modifying Int32Type to list 
itself as compatible with ShortType.

Handling _all_ type conversions in that manner would be terrible, but doing it 
for fixed-size integer types doesn't sound unreasonable.
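
A hedged sketch of A) and B) as they might land in 
org.apache.cassandra.db.marshal.Int32Type; the method names approximate the 
{{AbstractType}} API and this is not a tested patch:

{code:java}
// B) declare int value-compatible with a previously-written smallint
@Override
protected boolean isValueCompatibleWithInternal(AbstractType<?> previous)
{
    return this == previous || previous instanceof ShortType;
}

// A) width-tolerant read: 2 bytes for a legacy smallint cell, 4 for an int
public static int toIntTolerant(ByteBuffer bytes)
{
    return bytes.remaining() == 2 ? bytes.getShort(bytes.position())
                                  : bytes.getInt(bytes.position());
}
{code}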

> INT is incompatible with previous type SMALLINT
> ---
>
> Key: CASSANDRA-15504
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15504
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Truscello
>Priority: Normal
>
> With the release of Cassandra 3.11.5 and the fixing of CASSANDRA-14948, it 
> now appears that you can no longer re-add a SMALLINT column as an INT type.  
> This is rather surprising as any SMALLINT value should be representable by an 
> INT type.
> The following example was run on Cassandra 3.11.5 on CentOS 7 installed from 
> official RedHat repo:
>  
>  
> {noformat}
> cqlsh> CREATE KEYSPACE demo WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor' : 1};
> cqlsh> CREATE TABLE demo.demo_table (
>    ...   user_id BIGINT,
>    ...   created TIMESTAMP,
>    ...   points  SMALLINT,
>    ...   PRIMARY KEY (user_id, created)
>    ... ) WITH CLUSTERING ORDER BY (created DESC);
> cqlsh> ALTER TABLE demo.demo_table DROP points;
> cqlsh> ALTER TABLE demo.demo_table ADD  points INT;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
> re-add previously dropped column 'points' of type int, incompatible with 
> previous type smallint"{noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12995) update hppc dependency to 0.7

2020-01-15 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016208#comment-17016208
 ] 

Tomo Suzuki commented on CASSANDRA-12995:
-

[~brandon.williams] I don't see that the ticket or the associated PR 
[cassandra-dtest#55|https://github.com/apache/cassandra-dtest/pull/55] touched 
the {{com.carrotsearch:hppc}} dependency.

And I still see the below in 
https://github.com/apache/cassandra/blob/82dc720/build.xml#L577

{noformat}

{noformat}


> update hppc dependency to 0.7
> -
>
> Key: CASSANDRA-12995
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12995
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies, Packaging
>Reporter: Tomas Repik
>Priority: Normal
>  Labels: easyfix
> Fix For: 4.0
>
> Attachments: cassandra-3.11.0-hppc.patch
>
>
> Cassandra 3.11.0 is about to be included in Fedora. There are some tweaks we 
> need to make to the sources in order to build it successfully. Cassandra 
> depends on hppc 0.5.4, but in Fedora we have the newer version 0.7.1, and 
> upstream has released an even newer version, 0.7.2. I attached a patch 
> updating the cassandra sources to depend on the 0.7.1 hppc sources; it should 
> also be compatible with the newest upstream version. The only actual changes 
> are the removal of the Open infix in class names. The issue was discussed 
> here: https://bugzilla.redhat.com/show_bug.cgi?id=1340876 Please consider 
> updating.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15455) Upgrade com.carrotsearch:hppc dependency

2020-01-15 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated CASSANDRA-15455:

Resolution: Duplicate
Status: Resolved  (was: Triage Needed)

[~gus] Thanks. I'm closing this ticket.

> Upgrade com.carrotsearch:hppc dependency
> 
>
> Key: CASSANDRA-15455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15455
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies
>Reporter: Tomo Suzuki
>Priority: Normal
>
> Upgrade com.carrotsearch:hppc dependency.
> Current version 0.5 causes diamond dependency conflict with other dependency 
> (via Elasticsearch) in Apache Beam.
> https://gist.github.com/suztomo/6fe16f6bda526aab97e879feac70309d



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Issue Comment Deleted] (CASSANDRA-15455) Upgrade com.carrotsearch:hppc dependency

2020-01-15 Thread Tomo Suzuki (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomo Suzuki updated CASSANDRA-15455:

Comment: was deleted

(was: [~brandon.williams] I don't see that the ticket or the associated PR 
[cassandra-dtest#55|https://github.com/apache/cassandra-dtest/pull/55] touched 
the {{com.carrotsearch:hppc}} dependency.

And I still see the below in 
https://github.com/apache/cassandra/blob/82dc720/build.xml#L577

{noformat}

{noformat}
)

> Upgrade com.carrotsearch:hppc dependency
> 
>
> Key: CASSANDRA-15455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15455
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies
>Reporter: Tomo Suzuki
>Priority: Normal
>
> Upgrade com.carrotsearch:hppc dependency.
> Current version 0.5 causes diamond dependency conflict with other dependency 
> (via Elasticsearch) in Apache Beam.
> https://gist.github.com/suztomo/6fe16f6bda526aab97e879feac70309d



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15455) Upgrade com.carrotsearch:hppc dependency

2020-01-15 Thread Tomo Suzuki (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016205#comment-17016205
 ] 

Tomo Suzuki commented on CASSANDRA-15455:
-

[~brandon.williams] I don't see that the ticket or the associated PR 
[cassandra-dtest#55|https://github.com/apache/cassandra-dtest/pull/55] touched 
the {{com.carrotsearch:hppc}} dependency.

And I still see the below in 
https://github.com/apache/cassandra/blob/82dc720/build.xml#L577

{noformat}

{noformat}


> Upgrade com.carrotsearch:hppc dependency
> 
>
> Key: CASSANDRA-15455
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15455
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Dependencies
>Reporter: Tomo Suzuki
>Priority: Normal
>
> Upgrade com.carrotsearch:hppc dependency.
> Current version 0.5 causes diamond dependency conflict with other dependency 
> (via Elasticsearch) in Apache Beam.
> https://gist.github.com/suztomo/6fe16f6bda526aab97e879feac70309d



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15499) Internode message builder does not add trace header

2020-01-15 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016203#comment-17016203
 ] 

Yifan Cai commented on CASSANDRA-15499:
---

Thanks [~aleksey]. 

Agree that {{withParams()}} is the intended way to add the trace headers and 
other fields, especially after seeing the unit test cases. However, I feel it 
is quite easy to forget to add the trace headers, and the check-and-add logic 
is applicable to all outgoing messages. To a certain degree, it becomes an 
intrinsic step of building a message. Does that sound like a valid argument?

Regarding a helper: if the above does not sound good, we could probably add a 
method to the builder, say {{withTracingMaybe(tracing: Tracing)}}, and call it 
when building every message...
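
A hedged sketch of that helper as it might sit on {{Message.Builder}}; 
{{withTracingMaybe}} is the proposal itself, not existing API, and the 
{{ParamType}}/{{TraceState}} plumbing is approximated:

{code:java}
// Only attaches the trace header when a tracing session is actually active,
// so every call site can invoke it unconditionally while building a message.
public Builder<T> withTracingMaybe(Tracing tracing)
{
    TraceState state = tracing.get();
    if (state != null)
        withParam(ParamType.TRACE_SESSION, state.sessionId);
    return this;
}
{code}

Call sites would then chain it unconditionally, e.g. 
{{Message.builder(verb, payload).withTracingMaybe(Tracing.instance)}}.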

> Internode message builder does not add trace header
> ---
>
> Key: CASSANDRA-15499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15499
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.x
>
>
> The messages built with the {{Builder}} 
> ({{org.apache.cassandra.net.Message.Builder}}) do not have the trace header 
> when tracing is enabled. 
> Consequently, no tracing session gets propagated to other nodes, and the 
> tracing function is broken. 
> The set of static {{out*}} methods provided (to create an outbound 
> message) in Message do not have the issue. They can properly add the trace 
> header when necessary. 
> To be clear, only the {{Builder}} missed adding the tracing header and it 
> should be fixed to be consistent with the {{out*}} methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15499) Internode message builder does not add trace header

2020-01-15 Thread Aleksey Yeschenko (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016193#comment-17016193
 ] 

Aleksey Yeschenko commented on CASSANDRA-15499:
---

The code looks fine, but conceptually I would prefer it to be the 
responsibility of the {{withParams()}} caller to build the correct map, and 
not have {{build()}} reach out to global state to set it implicitly if 
avoidable. Add a helper for other callers if needed?
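
A hedged sketch of that caller-side shape: the caller assembles the full param 
map (trace header included) and hands it to {{withParams()}}, so {{build()}} 
never consults global state. Types and the builder entry point are 
approximated, and {{verb}}/{{payload}} are assumed to be in scope:

{code:java}
Map<ParamType, Object> params = new EnumMap<>(ParamType.class);
TraceState state = Tracing.instance.get();
if (state != null)
    params.put(ParamType.TRACE_SESSION, state.sessionId); // caller opts in explicitly

Message<?> message = Message.builder(verb, payload)
                            .withParams(params)
                            .build();
{code}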

> Internode message builder does not add trace header
> ---
>
> Key: CASSANDRA-15499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15499
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.x
>
>
> The messages built with the {{Builder}} 
> ({{org.apache.cassandra.net.Message.Builder}}) do not have the trace header 
> when tracing is enabled. 
> Consequently, no tracing session gets propagated to other nodes, and the 
> tracing function is broken. 
> The set of static {{out*}} methods provided (to create an outbound 
> message) in Message do not have the issue. They can properly add the trace 
> header when necessary. 
> To be clear, only the {{Builder}} missed adding the tracing header and it 
> should be fixed to be consistent with the {{out*}} methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD

2020-01-15 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016186#comment-17016186
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15314:
-

[~vinaykumarcse], please correct me if I'm wrong, but I think this one is a 
duplicate of CASSANDRA-15315. Shall we close this one and work on the other one?

> Fix failing test - test_rolling_upgrade_with_internode_ssl - 
> upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
> -
>
> Key: CASSANDRA-15314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15314
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Ekaterina Dimitrova
>Priority: Normal
>  Labels: dtest
> Fix For: 4.0-alpha
>
>
> Example failure: 
> [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11]
>  
> {code:java}
> ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* 
> now UP']: INFO  [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See 
> system.log for remainder
> self = <upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD 
> object at 0x7f6d90d43b38>
> @pytest.mark.timeout(3000)
> def test_rolling_upgrade_with_internode_ssl(self):
> """
> Rolling upgrade test using internode ssl.
> """
> >   self.upgrade_scenario(rolling=True, internode_ssl=True)
> upgrade_tests/upgrade_through_versions_test.py:296: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario
> self.upgrade_to_version(version_meta, partial=True, nodes=(node,), 
> internode_ssl=internode_ssl)
> upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version
> node.start(wait_other_notice=240, wait_for_binary_proto=True)
> ../env/src/ccm/ccmlib/node.py:751: in start
> node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice)
> ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240
> process = None, verbose = False, filename = 'system.log'
> def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log'):
> """
> Watch the log until one or more (regular) expression are found.
> This methods when all the expressions have been found or the 
> method
> timeouts (a TimeoutError is then raised). On successful 
> completion,
> a list of pair (line matched, match object) is returned.
> """
> start = time.time()
> tofind = [exprs] if isinstance(exprs, string_types) else exprs
> tofind = [re.compile(e) for e in tofind]
> matchings = []
> reads = ""
> if len(tofind) == 0:
> return None
> 
> log_file = os.path.join(self.get_path(), 'logs', filename)
> output_read = False
> while not os.path.exists(log_file):
> time.sleep(.5)
> if start + timeout < time.time():
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", 
> time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be 
> created.".format(log_file))
> if process and not output_read:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse RuntimeError 
> but I'm lazy
> 
> with open(log_file) as f:
> if from_mark:
> f.seek(from_mark)
> 
> while True:
> # First, if we have a process to check, then check it.
> # Skip on Windows - stdout/stderr is cassandra.bat
> if not common.is_win() and not output_read:
> if process:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, 
> verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse 
> RuntimeError but I'm lazy
> 
> 

[jira] [Commented] (CASSANDRA-15499) Internode message builder does not add trace header

2020-01-15 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016149#comment-17016149
 ] 

Yifan Cai commented on CASSANDRA-15499:
---

The message serializer should be good.

Tracing headers are placed in the {{params}}. Both {{toPre40FailureResponse}} 
and {{toPost40FailureResponse}} copy the {{params}}, so any existing trace 
header fields should be copied.

> Internode message builder does not add trace header
> ---
>
> Key: CASSANDRA-15499
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15499
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.x
>
>
> The messages built with the {{Builder}} 
> ({{org.apache.cassandra.net.Message.Builder}}) do not have the trace header 
> when tracing is enabled. 
> Consequently, no tracing session gets propagated to other nodes, and the 
> tracing function is broken. 
> The set of static {{out*}} methods provided (to create an outbound 
> message) in Message do not have the issue. They can properly add the trace 
> header when necessary. 
> To be clear, only the {{Builder}} missed adding the tracing header and it 
> should be fixed to be consistent with the {{out*}} methods.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15315) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD

2020-01-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova reassigned CASSANDRA-15315:
---

Assignee: Ekaterina Dimitrova

> Fix failing test - test_rolling_upgrade_with_internode_ssl - 
> upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD
> ---
>
> Key: CASSANDRA-15315
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15315
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> Example failure:
> [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11]
> [https://circleci.com/gh/vinaykumarchella/cassandra/451#tests/containers/11]
> {code:java}
> ccmlib.node.TimeoutError: 06 Sep 2019 20:21:39 [node2] Missing: ['127.0.0.1.* 
> now UP']: INFO  [HANDSHAKE-/127.0.0.1] 2019-09-06 20:17:43,8. See 
> system.log for remainder
> self = <upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_RandomPartitioner_EndsAt_Trunk_HEAD 
> object at 0x7fbb75245a90>
> @pytest.mark.timeout(3000)
> def test_rolling_upgrade_with_internode_ssl(self):
> """
> Rolling upgrade test using internode ssl.
> """
> >   self.upgrade_scenario(rolling=True, internode_ssl=True)
> upgrade_tests/upgrade_through_versions_test.py:296: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario
> self.upgrade_to_version(version_meta, partial=True, nodes=(node,), 
> internode_ssl=internode_ssl)
> upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version
> node.start(wait_other_notice=240, wait_for_binary_proto=True)
> ../env/src/ccm/ccmlib/node.py:751: in start
> node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice)
> ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> exprs = ['127.0.0.1.* now UP'], from_mark = 151813, timeout = 240
> process = None, verbose = False, filename = 'system.log'
> def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log'):
> """
> Watch the log until one or more (regular) expression are found.
> This methods when all the expressions have been found or the 
> method
> timeouts (a TimeoutError is then raised). On successful 
> completion,
> a list of pair (line matched, match object) is returned.
> """
> start = time.time()
> tofind = [exprs] if isinstance(exprs, string_types) else exprs
> tofind = [re.compile(e) for e in tofind]
> matchings = []
> reads = ""
> if len(tofind) == 0:
> return None
> 
> log_file = os.path.join(self.get_path(), 'logs', filename)
> output_read = False
> while not os.path.exists(log_file):
> time.sleep(.5)
> if start + timeout < time.time():
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", 
> time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be 
> created.".format(log_file))
> if process and not output_read:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse RuntimeError 
> but I'm lazy
> 
> with open(log_file) as f:
> if from_mark:
> f.seek(from_mark)
> 
> while True:
> # First, if we have a process to check, then check it.
> # Skip on Windows - stdout/stderr is cassandra.bat
> if not common.is_win() and not output_read:
> if process:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, 
> verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse 
> RuntimeError but I'm lazy
> 
> line = f.readline()
> if line:
> 

[jira] [Assigned] (CASSANDRA-15314) Fix failing test - test_rolling_upgrade_with_internode_ssl - upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD

2020-01-15 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova reassigned CASSANDRA-15314:
---

Assignee: Ekaterina Dimitrova

> Fix failing test - test_rolling_upgrade_with_internode_ssl - 
> upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD
> -
>
> Key: CASSANDRA-15314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15314
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Vinay Chella
>Assignee: Ekaterina Dimitrova
>Priority: Normal
>  Labels: dtest
> Fix For: 4.0-alpha
>
>
> Example failure: 
> [https://circleci.com/gh/vinaykumarchella/cassandra/468#tests/containers/11]
>  
> {code:java}
> ccmlib.node.TimeoutError: 06 Sep 2019 20:23:57 [node2] Missing: ['127.0.0.1.* 
> now UP']: INFO  [HANDSHAKE-/127.0.0.1] 2019-09-06 20:20:01,7. See 
> system.log for remainder
> self = <upgrade_tests.upgrade_through_versions_test.TestProtoV4Upgrade_AllVersions_EndsAt_Trunk_HEAD 
> object at 0x7f6d90d43b38>
> @pytest.mark.timeout(3000)
> def test_rolling_upgrade_with_internode_ssl(self):
> """
> Rolling upgrade test using internode ssl.
> """
> >   self.upgrade_scenario(rolling=True, internode_ssl=True)
> upgrade_tests/upgrade_through_versions_test.py:296: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> upgrade_tests/upgrade_through_versions_test.py:352: in upgrade_scenario
> self.upgrade_to_version(version_meta, partial=True, nodes=(node,), 
> internode_ssl=internode_ssl)
> upgrade_tests/upgrade_through_versions_test.py:456: in upgrade_to_version
> node.start(wait_other_notice=240, wait_for_binary_proto=True)
> ../env/src/ccm/ccmlib/node.py:751: in start
> node.watch_log_for_alive(self, from_mark=mark, timeout=wait_other_notice)
> ../env/src/ccm/ccmlib/node.py:568: in watch_log_for_alive
> self.watch_log_for(tofind, from_mark=from_mark, timeout=timeout, 
> filename=filename)
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = 
> exprs = ['127.0.0.1.* now UP'], from_mark = 150742, timeout = 240
> process = None, verbose = False, filename = 'system.log'
> def watch_log_for(self, exprs, from_mark=None, timeout=600, process=None, 
> verbose=False, filename='system.log'):
> """
> Watch the log until one or more (regular) expression are found.
> This methods when all the expressions have been found or the 
> method
> timeouts (a TimeoutError is then raised). On successful 
> completion,
> a list of pair (line matched, match object) is returned.
> """
> start = time.time()
> tofind = [exprs] if isinstance(exprs, string_types) else exprs
> tofind = [re.compile(e) for e in tofind]
> matchings = []
> reads = ""
> if len(tofind) == 0:
> return None
> 
> log_file = os.path.join(self.get_path(), 'logs', filename)
> output_read = False
> while not os.path.exists(log_file):
> time.sleep(.5)
> if start + timeout < time.time():
> raise TimeoutError(time.strftime("%d %b %Y %H:%M:%S", 
> time.gmtime()) + " [" + self.name + "] Timed out waiting for {} to be 
> created.".format(log_file))
> if process and not output_read:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse RuntimeError 
> but I'm lazy
> 
> with open(log_file) as f:
> if from_mark:
> f.seek(from_mark)
> 
> while True:
> # First, if we have a process to check, then check it.
> # Skip on Windows - stdout/stderr is cassandra.bat
> if not common.is_win() and not output_read:
> if process:
> process.poll()
> if process.returncode is not None:
> self.print_process_output(self.name, process, 
> verbose)
> output_read = True
> if process.returncode != 0:
> raise RuntimeError()  # Shouldn't reuse 
> RuntimeError but I'm lazy
> 
> line = f.readline()
> if line:
> reads = reads + line
> for e in tofind:
>

[jira] [Commented] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci

2020-01-15 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016122#comment-17016122
 ] 

Alex Petrov commented on CASSANDRA-15506:
-

Thank you for the patch!

+1, LGTM!

> Run in-jvm upgrade dtests in circleci
> -
>
> Key: CASSANDRA-15506
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15506
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We should run the in-jvm upgrade dtests in circleci



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2020-01-15 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016077#comment-17016077
 ] 

Benedict Elliott Smith commented on CASSANDRA-15213:


Except that if we use the cleaner {{stripedIndex}} calculation, we might need to go 
up to e.g. 8x stripes, in which case we'd need to throw {{23}} into the mix.  
That seems to take us up to 16x, with {{29}} taking us all the way to 64, which 
is far more stripes than we would ever provision.
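
For illustration, a minimal sketch of the kind of prime-stride mapping under discussion; the method name and exact shape here are assumptions, not the committed calculation:

{code}
// Hypothetical sketch: map a logical slot onto a physical slot with a prime
// stride. Whenever gcd(prime, size) == 1, i -> (i * prime) % size visits
// every slot exactly once, so no two logical slots collide.
static int stripedIndex(int logicalIndex, int prime, int size)
{
    return (logicalIndex * prime) % size;
}
{code}

The {{BitSet}} proof quoted in a comment further down verifies that 17 or 19 achieves this for every size below 238.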

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict Elliott Smith
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly and a measurable cost as a share of the 
> new networking work (without filtering it was > 10% of the CPU used overall). 
>  We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, 
> to save the random memory access costs.
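
To make the last quoted bullet concrete: a sketch, assuming the base-1.2 bucket growth named above, of how floor(log2 n / log2 1.2) might be computed without a binary search (the method name is hypothetical):

{code}
// floor(log2(n)) is a single numberOfLeadingZeros instruction; dividing by
// log2(1.2) ~= 0.26303 turns it into an approximate base-1.2 bucket index.
static int approxBucketIndex(long value)
{
    int log2 = 63 - Long.numberOfLeadingZeros(Math.max(1L, value));
    return (int) (log2 / 0.2630344058337938);
}
{code}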



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2020-01-15 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016076#comment-17016076
 ] 

Jordan West commented on CASSANDRA-15213:
-

Sounds good. Thanks for the test / proof. 

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict Elliott Smith
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly and a measurable cost as a share of the 
> new networking work (without filtering it was > 10% of the CPU used overall). 
>  We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, 
> to save the random memory access costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2020-01-15 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016055#comment-17016055
 ] 

Jordan West edited comment on CASSANDRA-15213 at 1/15/20 3:15 PM:
--

Thanks. I'll start exploring that approach. I implemented a version using 
{{Integer.reverse}} (which distributed well) but didn't find an approach using 
it that didn't involve an extra read/load (was looking for something more along 
the lines of a simple calculation like this). Will report back with my testing 
results / findings. 




was (Author: jrwest):
Thanks. I'll start exploring that approach. I implemented a version using 
{{Integer.reverse}} (which distributed well) but didn't find an approach using 
it that didn't involve an extra read/load (was looking for something more along 
the lines of a simple calculation like this). Will report back with my testing 
results / findings. 

EDIT: 17 divides several potential custom bucket sizes, including 102, 170, and 
204. To satisfy the last requirement I think we need to pick a prime such that 
prime * 2 > max bucket count. 

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict Elliott Smith
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly and a measurable cost as a share of the 
> new networking work (without filtering it was > 10% of the CPU used overall). 
>  We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, 
> to save the random memory access costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2020-01-15 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016055#comment-17016055
 ] 

Jordan West edited comment on CASSANDRA-15213 at 1/15/20 3:14 PM:
--

Thanks. I'll start exploring that approach. I implemented a version using 
{{Integer.reverse}} (which distributed well) but didn't find an approach using 
it that didn't involve an extra read/load (was looking for something more along 
the lines of a simple calculation like this). Will report back with my testing 
results / findings. 

EDIT: 17 divides several potential custom bucket sizes, including 102, 170, and 
204. To satisfy the last requirement I think we need to pick a prime such that 
prime * 2 > max bucket count. 


was (Author: jrwest):
Thanks. I'll start exploring that approach. I implemented a version using 
{{Integer.reverse}} (which distributed well) but didn't find an approach using 
it that didn't involve an extra read/load (was looking for something more along 
the lines of a simple calculation like this). Will report back with my testing 
results / findings. 

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict Elliott Smith
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly and a measurable cost as a share of the 
> new networking work (without filtering it was > 10% of the CPU used overall). 
>  We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, 
> to save the random memory access costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2020-01-15 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016070#comment-17016070
 ] 

Benedict Elliott Smith commented on CASSANDRA-15213:


Also fwiw, it looks like the primes 17 and 19 are sufficient, so we can 
literally just try either of those. Proof:

{code}
int[] primes = new int[] { 17, 19 };
BitSet sizeWithoutConflict = new BitSet(); // java.util.BitSet
for (int prime : primes)
{
    for (int size = 1 ; size < 238 ; ++size)
    {
        // A size is conflict-free for this prime when i -> (i * prime) % size
        // visits every index exactly once, i.e. the stride is a bijection.
        BitSet conflict = new BitSet();
        boolean hasConflict = false;
        for (int i = 0 ; i < size ; ++i)
        {
            if (conflict.get((i * prime) % size))
                hasConflict = true;
            conflict.set((i * prime) % size);
        }
        if (!hasConflict)
            sizeWithoutConflict.set(size);
    }
}
// Print any size for which neither prime avoids collisions; this prints
// nothing, i.e. 17 and 19 together cover every size below 238.
for (int size = 1 ; size < 238 ; ++size)
{
    if (!sizeWithoutConflict.get(size))
        System.out.println(size);
}
{code}

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict Elliott Smith
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly and a measurable cost as a share of the 
> new networking work (without filtering it was > 10% of the CPU used overall). 
>  We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, 
> to save the random memory access costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2020-01-15 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016055#comment-17016055
 ] 

Jordan West commented on CASSANDRA-15213:
-

Thanks. I'll start exploring that approach. I implemented a version using 
{{Integer.reverse}} (which distributed well) but didn't find an approach using 
it that didn't involve extra reads (was looking for something more along the 
lines of a simple calculation like this). Will report back with my testing 
results / findings. 
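
For context, a hypothetical reconstruction of the {{Integer.reverse}} approach described above (not the version that was actually benchmarked):

{code}
// Bit-reversing a monotonically increasing per-thread counter spreads
// consecutive values across the whole int range, which distributes well;
// the counter itself is the extra read/load mentioned above.
static int reverseStripe(int perThreadCounter, int stripeCount)
{
    return (Integer.reverse(perThreadCounter) >>> 1) % stripeCount;
}
{code}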

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict Elliott Smith
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly and a measurable cost as a share of the 
> new networking work (without filtering it was > 10% of the CPU used overall). 
>  We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, 
> to save the random memory access costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15213) DecayingEstimatedHistogramReservoir Inefficiencies

2020-01-15 Thread Jordan West (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17016055#comment-17016055
 ] 

Jordan West edited comment on CASSANDRA-15213 at 1/15/20 2:52 PM:
--

Thanks. I'll start exploring that approach. I implemented a version using 
{{Integer.reverse}} (which distributed well) but didn't find an approach using 
it that didn't involve an extra read/load (was looking for something more along 
the lines of a simple calculation like this). Will report back with my testing 
results / findings. 


was (Author: jrwest):
Thanks. I'll start exploring that approach. I implemented a version using 
{{Integer.reverse}} (which distributed well) but didn't find an approach using 
it that didn't involve extra reads (was looking for something more along the 
lines of a simple calculation like this). Will report back with my testing 
results / findings. 

> DecayingEstimatedHistogramReservoir Inefficiencies
> --
>
> Key: CASSANDRA-15213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Observability/Metrics
>Reporter: Benedict Elliott Smith
>Assignee: Jordan West
>Priority: Normal
> Fix For: 4.0-beta
>
>
> * {{LongAdder}} introduced to trunk consumes 9MiB of heap without user 
> schemas, and this will grow significantly under contention and user schemas 
> with many tables.  This is because {{LongAdder}} is a very heavy class 
> designed for single contended values.  
>  ** This can likely be improved significantly, without significant loss of 
> performance in the contended case, by simply increasing the size of our 
> primitive backing array and providing multiple buckets, with each thread 
> picking a bucket to increment, or simply multiple backing arrays.  Probably a 
> better way still to do this would be to introduce some competition detection 
> to the update, much like {{LongAdder}} utilises, that increases the number of 
> backing arrays under competition.
>  ** To save memory this approach could partition the space into chunks that 
> are likely to be updated together, so that we do not need to duplicate the 
> entire array under competition.
>  * Similarly, binary search is costly and a measurable cost as a share of the 
> new networking work (without filtering it was > 10% of the CPU used overall). 
>  We can compute an approximation floor(log2 n / log2 1.2) extremely cheaply, 
> to save the random memory access costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15430) Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations compared to 2.1.18

2020-01-15 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015996#comment-17015996
 ] 

Benedict Elliott Smith commented on CASSANDRA-15430:


[~tsteinmaurer] it would help if you could post the schema and example queries 
you are submitting to the cluster.  It might be that a later version of 
Cassandra, or the forthcoming 4.0, contains a mitigation for your specific 
workload that you could backport.  I would also be happy to take a look at the 
JFR logs if we can find somewhere shared to put them.

> Cassandra 3.0.18: BatchMessage.execute - 10x more on-heap allocations 
> compared to 2.1.18
> 
>
> Key: CASSANDRA-15430
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15430
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Thomas Steinmaurer
>Priority: Normal
> Attachments: dashboard.png, jfr_allocations.png, mutation_stage.png
>
>
> In a 6-node load-test cluster, we have been running a certain 
> production-like workload constantly and sufficiently with 2.1.18. After 
> upgrading one node to 3.0.18 (the remaining 5 are still on 2.1.18 after we 
> saw the sort of regression described below), 3.0.18 shows increased CPU 
> usage, increased GC, high mutation-stage pending tasks, dropped mutation 
> messages ...
> Some specs; all 6 nodes are equally sized:
>  * Bare metal, 32 physical cores, 512G RAM
>  * Xmx31G, G1, max pause millis = 2000ms
>  * cassandra.yaml basically unchanged, thus the same settings with regard to 
> number of threads, compaction throttling, etc.
> The following dashboard shows highlighted areas (CPU, suspension) with 
> metrics for all 6 nodes; the one outlier is the node upgraded to Cassandra 
> 3.0.18.
>  !dashboard.png|width=1280!
> Additionally we see a large increase on pending tasks in the mutation stage 
> after the upgrade:
>  !mutation_stage.png!
> And dropped mutation messages, also confirmed in the Cassandra log:
> {noformat}
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:24,780 MessagingService.java:1022 - 
> MUTATION messages were dropped in last 5000 ms: 41552 for internal timeout 
> and 0 for cross node timeout
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,157 StatusLogger.java:52 - Pool 
> NameActive   Pending  Completed   Blocked  All Time 
> Blocked
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> MutationStage   256 81824 3360532756 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ViewMutationStage 0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,168 StatusLogger.java:56 - 
> ReadStage 0 0   62862266 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> RequestResponseStage  0 0 2176659856 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> ReadRepairStage   0 0  0 0
>  0
> INFO  [ScheduledTasks:1] 2019-11-15 08:24:25,169 StatusLogger.java:56 - 
> CounterMutationStage  0 0  0 0
>  0
> ...
> {noformat}
> Judging from a 15-minute JFR session for both 3.0.18 and 2.1.18 on a 
> different node, at a high level it looks like the code path underneath 
> {{BatchMessage.execute}} is producing ~ 10x more on-heap allocations in 
> 3.0.18 compared to 2.1.18.
>  !jfr_allocations.png!
> Left => 3.0.18
>  Right => 2.1.18
> The zipped JFRs exceed the 60MB limit for attaching directly to the ticket. 
> I can upload them if another destination is available.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15504) INT is incompatible with previous type SMALLINT

2020-01-15 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015843#comment-17015843
 ] 

Benedict Elliott Smith commented on CASSANDRA-15504:


This was previously a bug: while it's absolutely possible to reinterpret the 
data on disk as an {{INT}}, this isn't what happens, and the binary 
representations are not interpreted correctly automatically.  The entirety of 
our type management in this regard could do with modernising, as it should 
anyway be possible to re-add columns as often as you like, with whatever type 
you like, but in a distributed system this is more hassle than you might 
imagine.
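
To make the incompatibility concrete: Cassandra serializes a {{smallint}} as 2 bytes and an {{int}} as 4, so the stored bytes cannot simply be reread under the new type. An illustrative sketch (not Cassandra code):

{code}
import java.nio.ByteBuffer;

public class WidthDemo
{
    public static void main(String[] args)
    {
        ByteBuffer asSmallint = ByteBuffer.allocate(2).putShort((short) 42); // 00 2A
        ByteBuffer asInt = ByteBuffer.allocate(4).putInt(42);                // 00 00 00 2A
        System.out.println(asSmallint.position() + " vs " + asInt.position()); // 2 vs 4
    }
}
{code}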

We'd more than welcome a contribution moving in this direction, but my 
prediction is that the active contributor community does not have the resources 
to dedicate to this specific issue at present.

It might be that there is some middle ground that could be achieved more 
readily, at least a convenience mechanism to force a re-add after expunging 
the old data via compaction.  But again, I don't think this is a priority, so 
you would have to take a look yourself.

> INT is incompatible with previous type SMALLINT
> ---
>
> Key: CASSANDRA-15504
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15504
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Truscello
>Priority: Normal
>
> With the release of Cassandra 3.11.5 and the fixing of CASSANDRA-14948, it 
> now appears that you can no longer re-add a SMALLINT column as an INT type.  
> This is rather surprising as any SMALLINT value should be representable by an 
> INT type.
> The following example was run on Cassandra 3.11.5 on CentOS 7, installed from 
> the official RedHat repo:
>  
>  
> {noformat}
> cqlsh> CREATE KEYSPACE demo WITH replication = {'class':'SimpleStrategy', 
> 'replication_factor' : 1};
> cqlsh> CREATE TABLE demo.demo_table (
>    ...   user_id BIGINT,
>    ...   created TIMESTAMP,
>    ...   points  SMALLINT,
>    ...   PRIMARY KEY (user_id, created)
>    ... ) WITH CLUSTERING ORDER BY (created DESC);
> cqlsh> ALTER TABLE demo.demo_table DROP points;
> cqlsh> ALTER TABLE demo.demo_table ADD  points INT;
> InvalidRequest: Error from server: code=2200 [Invalid query] message="Cannot 
> re-add previously dropped column 'points' of type int, incompatible with 
> previous type smallint"{noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci

2020-01-15 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-15506:

Reviewers: Alex Petrov

> Run in-jvm upgrade dtests in circleci
> -
>
> Key: CASSANDRA-15506
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15506
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We should run the in-jvm upgrade dtests in circleci



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci

2020-01-15 Thread Marcus Eriksson (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17015788#comment-17015788
 ] 

Marcus Eriksson edited comment on CASSANDRA-15506 at 1/15/20 9:57 AM:
--

[patch|https://github.com/krummas/cassandra/commits/marcuse/upgradedtests-circle],
 [circleci 
run|https://circleci.com/workflow-run/36459d4c-431e-409d-8e32-97a0b2e14648]

This patch adds a step to build the dtest jars for all current versions and 
stores them in the workspace to avoid rebuilding when testing, but so far it 
only runs the tests sequentially.

The patch also fixes an issue with the `TestLocator` script - it would always 
exit with status code 0, which made the build green in CircleCI even when 
there were failures.



was (Author: krummas):
[patch|https://github.com/krummas/cassandra/commits/marcuse/upgradedtests-circle]
 [circleci 
run|https://circleci.com/workflow-run/36459d4c-431e-409d-8e32-97a0b2e14648]

This patch adds a step to build the dtest jars for all current versions and 
stores them in the workspace to avoid rebuilding when testing, but so far it 
only runs the tests sequentially.

The patch also fixes an issue with the `TestLocator` script - it would always 
exit with status code 0, which makes the build green in cci even if there are 
failures.


> Run in-jvm upgrade dtests in circleci
> -
>
> Key: CASSANDRA-15506
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15506
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We should run the in-jvm upgrade dtests in circleci



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci

2020-01-15 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-15506:

Test and Documentation Plan: circleci runs
 Status: Patch Available  (was: Open)

[patch|https://github.com/krummas/cassandra/commits/marcuse/upgradedtests-circle]
 [circleci 
run|https://circleci.com/workflow-run/36459d4c-431e-409d-8e32-97a0b2e14648]

This patch adds a step to build the dtest jars for all current versions and 
stores them in the workspace to avoid rebuilding when testing, but so far it 
only runs the tests sequentially.

The patch also fixes an issue with the `TestLocator` script - it would always 
exit with status code 0, which made the build green in CircleCI even when 
there were failures.


> Run in-jvm upgrade dtests in circleci
> -
>
> Key: CASSANDRA-15506
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15506
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We should run the in-jvm upgrade dtests in circleci



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci

2020-01-15 Thread Marcus Eriksson (Jira)
Marcus Eriksson created CASSANDRA-15506:
---

 Summary: Run in-jvm upgrade dtests in circleci
 Key: CASSANDRA-15506
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15506
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/dtest
Reporter: Marcus Eriksson
Assignee: Marcus Eriksson


We should run the in-jvm upgrade dtests in circleci



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15506) Run in-jvm upgrade dtests in circleci

2020-01-15 Thread Marcus Eriksson (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marcus Eriksson updated CASSANDRA-15506:

Change Category: Quality Assurance
 Complexity: Low Hanging Fruit
  Fix Version/s: 4.x
 3.11.x
 3.0.x
 Status: Open  (was: Triage Needed)

> Run in-jvm upgrade dtests in circleci
> -
>
> Key: CASSANDRA-15506
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15506
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We should run the in-jvm upgrade dtests in circleci



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Created] (CASSANDRA-15505) Add message interceptors to in-jvm dtests

2020-01-15 Thread Alex Petrov (Jira)
Alex Petrov created CASSANDRA-15505:
---

 Summary: Add message interceptors to in-jvm dtests
 Key: CASSANDRA-15505
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15505
 Project: Cassandra
  Issue Type: New Feature
  Components: Test/dtest
Reporter: Alex Petrov
Assignee: Alex Petrov


Currently we only have the means to filter messages in in-jvm tests. We need a 
facility to intercept and modify messages between nodes for testing 
purposes.
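
A hypothetical shape such a facility could take, for illustration only; the actual API introduced by the patch may differ:

{code}
// Sketch: inspect a serialized message in flight between two nodes and
// return a (possibly rewritten) payload, or null to drop it entirely.
public interface IMessageInterceptor
{
    byte[] intercept(int fromNodeId, int toNodeId, int verb, byte[] payload);
}
{code}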



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org