[jira] [Assigned] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread rsasupport (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rsasupport reassigned CASSANDRA-15608:
--

Assignee: Brandon Williams  (was: William Saar)

> full repair getting failed for 1 large table version 3.11.2
> ---
>
> Key: CASSANDRA-15608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: rsasupport
>Assignee: Brandon Williams
>Priority: Urgent
> Attachments: system.log
>
>
> nodetool repair -pr test table1
> [2020-02-28 17:56:00,192] Starting repair command #1 
> (7a634f80-5a25-11ea-8b5a-3b5edc400697), repairing keyspace test with repair 
> options (parallelism: parallel, primary range: true, incremental: true, job 
> threads: 1, ColumnFamilies: [table1], dataCenters: [], hosts: [], # of 
> ranges: 256, pull repair: false)
> [2020-02-28 17:56:17,173] Repair session 7ae020a1-5a25-11ea-8b5a-3b5edc400697 
> for range [(-1398494450650640175,-1398484603620354999]] finished (progress: 
> 1%)
> [2020-02-28 17:56:22,633] Repair session 7ad3c491-5a25-11ea-8b5a-3b5edc400697 
> for range [(-2055080187802596770,-2055069591393124910]] finished (progress: 
> 2%)
> [2020-02-28 17:56:23,888] Repair session 7ace4651-5a25-11ea-8b5a-3b5edc400697 
> for range [(-5916076129714268351,-5916054112636649354]] finished (progress: 
> 2%)
> [2020-02-28 17:56:27,675] Repair session 7ad83161-5a25-11ea-8b5a-3b5edc400697 
> for range [(3262419392837600448,3262453314317655259]] finished (progress: 3%)
> [2020-02-28 17:56:31,674] Repair session 7ad32851-5a25-11ea-8b5a-3b5edc400697 
> for range [(2146167582342729236,2146207858477993492]] finished (progress: 3%)
> [2020-02-28 17:56:52,964] Repair session 7af15eb0-5a25-11ea-8b5a-3b5edc400697 
> for range [(-7088663015651548617,-7088601571353808807]] finished (progress: 
> 3%)
> [2020-02-28 17:57:03,237] Repair session 7acf30b2-5a25-11ea-8b5a-3b5edc400697 
> for range [(-5392084277480362928,-5391992322619247178]] finished (progress: 
> 4%)
> [2020-02-28 17:57:08,937] Repair session 7ada5443-5a25-11ea-8b5a-3b5edc400697 
> for range [(-2247846360993532740,-2247745485112718002]] finished (progress: 
> 4%)
> [2020-02-28 17:57:14,102] Repair session 7ae1a740-5a25-11ea-8b5a-3b5edc400697 
> for range [(4804514170425290662,4804611019437533765]] finished (progress: 5%)
> [2020-02-28 17:57:18,190] Repair session 7ae6fe71-5a25-11ea-8b5a-3b5edc400697 
> for range [(4989724097640378549,4989803009764075856]] finished (progress: 5%)
> [2020-02-28 17:57:27,549] Repair session 7acfccf1-5a25-11ea-8b5a-3b5edc400697 
> for range [(4512226173347132723,4512312572318114003]] finished (progress: 5%)
> [2020-02-28 17:57:37,534] Repair session 7ae1ce50-5a25-11ea-8b5a-3b5edc400697 
> for range [(7146381406763943813,7146516394221309657]] finished (progress: 6%)
> Feb 28, 2020 5:59:00 PM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
> Read timed out
> [2020-02-28 17:59:36,419] Repair session 7aeacf01-5a25-11ea-8b5a-3b5edc400697 
> for range [(4717535938324360043,4717683452072792471]] finished (progress: 6%)
> Current setup Details:
> Allocated Memory: 128gb
> No of nodes:12 DC1 and 12 DC2
> |*Sr No*|*Parameter Name*|*Old value*|*New Value*|*File Name*|
> |1|Min Heap Size -Xms|-Xms32G|-Xms64G|jvm.options|
> |2|Max Heap Size -Xmx|-Xms32G|-Xms64G|jvm.options|
> |3|Young generation size -Xmn|#-Xmn800M|-Xmn3072M|jvm.options|
> |4|GC SETTINGS|CMS|G1GC|jvm.options|
> |5|Disk Read_ahead_kb|4096|64|/sys/class/block/sdd/queue/read_ahead_kb|
> |6|Disk Rotational|1|1|/sys/class/block/sdd/queue/rotational|
> |7|block devices attributes|8192|128|blockdev --setra 128 /dev/sdd (To set it)|
> |8|commitlog_segment_size_in_mb|32|64|cassandra.yaml|
> |9|memtable_heap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
> |10|memtable_offheap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
> |11|phi_convict_threshold|#8|12|cassandra.yaml|
>  
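The progress output quoted above can be checked for stalls mechanically. The following is a minimal sketch, not part of the original report: it assumes the `nodetool repair` progress lines keep the shape shown in the description, and the names `parse_sessions`, `largest_gap` and `SAMPLE` are hypothetical helpers introduced only for illustration. It re-joins the wrapped quote lines, extracts the timestamp of each finished session, and reports the longest pause between two consecutive sessions.

```python
import re
from datetime import datetime

# Matches progress lines of the shape quoted in the report, e.g.
# [2020-02-28 17:56:17,173] Repair session <id> for range [...] finished (progress: 1%)
LINE_RE = re.compile(
    r"\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\] Repair session (\S+) "
    r".*?finished \(progress:\s*(\d+)%\)"
)

# A few lines copied from the description, including the mail quoting and wrapping.
SAMPLE = """\
> [2020-02-28 17:57:27,549] Repair session 7acfccf1-5a25-11ea-8b5a-3b5edc400697 
> for range [(4512226173347132723,4512312572318114003]] finished (progress: 5%)
> [2020-02-28 17:57:37,534] Repair session 7ae1ce50-5a25-11ea-8b5a-3b5edc400697 
> for range [(7146381406763943813,7146516394221309657]] finished (progress: 6%)
> Feb 28, 2020 5:59:00 PM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
> Read timed out
> [2020-02-28 17:59:36,419] Repair session 7aeacf01-5a25-11ea-8b5a-3b5edc400697 
> for range [(4717535938324360043,4717683452072792471]] finished (progress: 6%)
"""


def parse_sessions(text):
    """Return a list of (timestamp, session_id, progress_percent) tuples."""
    # Strip the "> " quote prefix and join wrapped lines into one long string.
    flat = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    sessions = []
    for match in LINE_RE.finditer(flat):
        stamp, millis, session_id, progress = match.groups()
        when = datetime.strptime(f"{stamp}.{millis}", "%Y-%m-%d %H:%M:%S.%f")
        sessions.append((when, session_id, int(progress)))
    return sessions


def largest_gap(sessions):
    """Return (gap, previous_session_id, next_session_id) for the longest pause."""
    gaps = [(b[0] - a[0], a[1], b[1]) for a, b in zip(sessions, sessions[1:])]
    return max(gaps) if gaps else None


if __name__ == "__main__":
    gap, before, after = largest_gap(parse_sessions(SAMPLE))
    print(f"longest pause: {gap} between {before} and {after}")
```

On the sample, the longest pause falls between the last session before the `SocketTimeoutException` warning and the first one after it, which is the window worth correlating with the attached system.log.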



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread rsasupport (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rsasupport reassigned CASSANDRA-15608:
--

Assignee: William Saar

> full repair getting failed for 1 large table version 3.11.2
> ---
>
> Key: CASSANDRA-15608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: rsasupport
>Assignee: William Saar
>Priority: Urgent
> Attachments: system.log
>
>






[jira] [Updated] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread rsasupport (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rsasupport updated CASSANDRA-15608:
---
Status: Open  (was: Resolved)

> full repair getting failed for 1 large table version 3.11.2
> ---
>
> Key: CASSANDRA-15608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: rsasupport
>Priority: Urgent
> Attachments: system.log
>
>






[jira] [Commented] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread rsasupport (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17048173#comment-17048173
 ] 

rsasupport commented on CASSANDRA-15608:


Are there any recommended changes?

Please find the jvm.options below:

###
# jvm.options #
# #
# - all flags defined here will be used by cassandra to startup the JVM #
# - one flag should be specified per line #
# - lines that do not start with '-' will be ignored #
# - only static flags are accepted (no variables or parameters) #
# - dynamic flags will be appended to these on cassandra-env #
###

##
# STARTUP PARAMETERS #
##

# Uncomment any of the following properties to enable specific startup parameters

# In a multi-instance deployment, multiple Cassandra instances will independently assume that all
# CPU processors are available to it. This setting allows you to specify a smaller set of processors
# and perhaps have affinity.
#-Dcassandra.available_processors=number_of_processors

# The directory location of the cassandra.yaml file.
#-Dcassandra.config=directory

# Sets the initial partitioner token for a node the first time the node is started.
#-Dcassandra.initial_token=token

# Set to false to start Cassandra on a node but not have the node join the cluster.
#-Dcassandra.join_ring=true|false

# Set to false to clear all gossip state for the node on restart. Use when you have changed node
# information in cassandra.yaml (such as listen_address).
#-Dcassandra.load_ring_state=true|false

# Enable pluggable metrics reporter. See Pluggable metrics reporting in Cassandra 2.0.2.
#-Dcassandra.metricsReporterConfigFile=file

# Set the port on which the CQL native transport listens for clients. (Default: 9042)
#-Dcassandra.native_transport_port=port

# Overrides the partitioner. (Default: org.apache.cassandra.dht.Murmur3Partitioner)
#-Dcassandra.partitioner=partitioner

# To replace a node that has died, restart a new node in its place specifying the address of the
# dead node. The new node must not have any data in its data directory, that is, it must be in the
# same state as before bootstrapping.
#-Dcassandra.replace_address=listen_address or broadcast_address of dead node

# Allow restoring specific tables from an archived commit log.
#-Dcassandra.replayList=table

# Allows overriding of the default RING_DELAY (30000ms), which is the amount of time a node waits
# before joining the ring.
#-Dcassandra.ring_delay_ms=ms

# Set the port for the Thrift RPC service, which is used for client connections. (Default: 9160)
#-Dcassandra.rpc_port=port

# Set the SSL port for encrypted communication. (Default: 7001)
#-Dcassandra.ssl_storage_port=port

# Enable or disable the native transport server. See start_native_transport in cassandra.yaml.
# cassandra.start_native_transport=true|false

# Enable or disable the Thrift RPC server. (Default: true)
#-Dcassandra.start_rpc=true/false

# Set the port for inter-node communication. (Default: 7000)
#-Dcassandra.storage_port=port

# Set the default location for the trigger JARs. (Default: conf/triggers)
#-Dcassandra.triggers_dir=directory

# For testing new compaction and compression strategies. It allows you to experiment with different
# strategies and benchmark write performance differences without affecting the production workload.
#-Dcassandra.write_survey=true

# To disable configuration via JMX of auth caches (such as those for credentials, permissions and
# roles). This will mean those config options can only be set (persistently) in cassandra.yaml
# and will require a restart for new values to take effect.
#-Dcassandra.disable_auth_caches_remote_configuration=true

# To disable dynamic calculation of the page size used when indexing an entire partition (during
# initial index build/rebuild). If set to true, the page size will be fixed to the default of
# 10000 rows per page.
#-Dcassandra.force_default_indexing_page_size=true


# GENERAL JVM SETTINGS #


# enable assertions. highly suggested for correct application functionality.
-ea

# enable thread priorities, primarily so we can give periodic tasks
# a lower priority to avoid interfering with client workload
-XX:+UseThreadPriorities

# allows lowering thread priority without being root on linux - probably
# not necessary on Windows but doesn't harm anything.
# see http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workar
-XX:ThreadPriorityPolicy=42

# Enable heap-dump if there's an OOM
-XX:+HeapDumpOnOutOfMemoryError

# Per-thread stack size.
-Xss256k

# Larger interned string table, for gossip's benefit (CASSANDRA-6410)
-XX:StringTableSize=1000003

# Make sure all memory is faulted and zeroed on startup.
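
The header of the pasted jvm.options states that one flag goes on each line and that lines not starting with '-' are ignored. The sketch below only illustrates that stated rule so the effective flags can be listed from a pasted file; it is not the code Cassandra itself uses at startup, and `active_jvm_flags` and `SAMPLE_OPTIONS` are hypothetical names introduced for the example.

```python
# A short excerpt in the same style as the pasted jvm.options.
SAMPLE_OPTIONS = """\
# Set to false to start Cassandra on a node but not have the node join the cluster.
#-Dcassandra.join_ring=true|false

# Enable heap-dump if there's an OOM
-XX:+HeapDumpOnOutOfMemoryError

# Per-thread stack size.
-Xss256k
"""


def active_jvm_flags(text):
    """Return the lines that begin with '-', i.e. the flags that are not commented out."""
    flags = []
    for raw in text.splitlines():
        line = raw.rstrip()
        # Comments, blank lines and commented-out flags ("#-D...") are skipped.
        if line.startswith("-"):
            flags.append(line)
    return flags


if __name__ == "__main__":
    for flag in active_jvm_flags(SAMPLE_OPTIONS):
        print(flag)
```

Listing the active flags this way makes it easier to compare the heap and GC settings actually in effect against the old and new values given in the issue description.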

[jira] [Assigned] (CASSANDRA-15605) Broken dtest replication_test.py::TestSnitchConfigurationUpdate

2020-02-28 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reassigned CASSANDRA-15605:


Assignee: Brandon Williams

> Broken dtest replication_test.py::TestSnitchConfigurationUpdate
> ---
>
> Key: CASSANDRA-15605
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15605
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Sam Tunnicliffe
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> Noticed this failing on a couple of CI runs and repros when running trunk 
> locally and on CircleCI
> 2 or 3 tests are consistently failing:
>  * {{test_rf_expand_gossiping_property_file_snitch}}
>  * {{test_rf_expand_property_file_snitch}}
>  * {{test_move_forwards_between_and_cleanup}}
> [https://circleci.com/workflow-run/f23f13a9-bbdc-4764-8336-109517e137f1]






[jira] [Commented] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI

2020-02-28 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047951#comment-17047951
 ] 

David Capwell commented on CASSANDRA-15611:
---

Ignore my comment from before, I just saw you put this under Sidecar!

> Build and Test with both Java 8 & 11 in Circle CI
> -
>
> Key: CASSANDRA-15611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15611
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> We currently only build and test with Java 8.  We should ensure Java 11 is 
> fully supported for both builds and testing in CircleCI.






[jira] [Issue Comment Deleted] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI

2020-02-28 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15611:
--
Comment: was deleted

(was: [~rustyrazorblade] are you saying Circle CI?  If so we do java 8 and 11 
in Circle CI (though java 11 doesn't do as much).  There is some confusion in 
the UI since they are two different pipelines, so you need to find the java 11 
pipeline (or w/e circle ci calls it).

I do believe that Jenkins is only 8 though; [see 
here|https://builds.apache.org/job/Cassandra-trunk/], 
[here|https://builds.apache.org/job/Cassandra-trunk-artifacts/], and 
[here|https://github.com/apache/cassandra-builds/blob/master/jenkins-dsl/cassandra_job_dsl_seed.groovy#L8].
  There looks to be a flag to override, so not sure I can prove it; but either way 
since there is only one job for each stage, it looks like it's one or the other.)

> Build and Test with both Java 8 & 11 in Circle CI
> -
>
> Key: CASSANDRA-15611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15611
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> We currently only build and test with Java 8.  We should ensure Java 11 is 
> fully supported for both builds and testing in CircleCI.






[jira] [Commented] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI

2020-02-28 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047950#comment-17047950
 ] 

David Capwell commented on CASSANDRA-15611:
---

[~rustyrazorblade] are you saying Circle CI?  If so we do java 8 and 11 in 
Circle CI (though java 11 doesn't do as much).  There is some confusion in the 
UI since they are two different pipelines, so you need to find the java 11 
pipeline (or w/e circle ci calls it).

I do believe that Jenkins is only 8 though; [see 
here|https://builds.apache.org/job/Cassandra-trunk/], 
[here|https://builds.apache.org/job/Cassandra-trunk-artifacts/], and 
[here|https://github.com/apache/cassandra-builds/blob/master/jenkins-dsl/cassandra_job_dsl_seed.groovy#L8].
  There looks to be a flag to override, so not sure I can prove it; but either way 
since there is only one job for each stage, it looks like it's one or the other.

> Build and Test with both Java 8 & 11 in Circle CI
> -
>
> Key: CASSANDRA-15611
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15611
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> We currently only build and test with Java 8.  We should ensure Java 11 is 
> fully supported for both builds and testing in CircleCI.






[jira] [Created] (CASSANDRA-15611) Build and Test with both Java 8 & 11 in Circle CI

2020-02-28 Thread Jon Haddad (Jira)
Jon Haddad created CASSANDRA-15611:
--

 Summary: Build and Test with both Java 8 & 11 in Circle CI
 Key: CASSANDRA-15611
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15611
 Project: Cassandra
  Issue Type: Improvement
  Components: Sidecar
Reporter: Jon Haddad
Assignee: Jon Haddad


We currently only build and test with Java 8.  We should ensure Java 11 is 
fully supported for both builds and testing in CircleCI.






[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent

2020-02-28 Thread Dinesh Joshi (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047930#comment-17047930
 ] 

Dinesh Joshi commented on CASSANDRA-15564:
--

[~dcapwell] LGTM +1

> Refactor repair coordinator so errors are consistent
> 
>
> Key: CASSANDRA-15564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15564
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Consistency/Repair
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> This is to split the change in CASSANDRA-15399 so the refactor is isolated 
> out.
> Currently the repair coordinator special cases the exit cases at each call 
> site; this makes it so that errors can be inconsistent and there are cases 
> where proper complete isn't done (proper notifications, and forgetting to 
> update ActiveRepairService).
> [Circle 
> CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]






[jira] [Updated] (CASSANDRA-15610) Upgrade Sidecar Gradle Dependencies

2020-02-28 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-15610:
---
Test and Documentation Plan: Unit tests
 Status: Patch Available  (was: Open)

Patch here: https://github.com/apache/cassandra-sidecar/pull/6

> Upgrade Sidecar Gradle Dependencies
> ---
>
> Key: CASSANDRA-15610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There's a few out of date dependencies in the Gradle config, some of which 
> are hindering us from fully supporting Java 11.   
> * Gradle can be upgraded to 6.2
> * Findbugs is abandoned and is Java 8 only, spotbugs is the supported 
> replacement
> * Old guava transitive dependency from the Java Driver
> * Java driver is 3.6, 3.8 is more current
> There's probably more to do but this is a quick starting point.






[jira] [Updated] (CASSANDRA-15610) Upgrade Sidecar Gradle Dependencies

2020-02-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated CASSANDRA-15610:
---
Labels: pull-request-available  (was: )

> Upgrade Sidecar Gradle Dependencies
> ---
>
> Key: CASSANDRA-15610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>  Labels: pull-request-available
>
> There's a few out of date dependencies in the Gradle config, some of which 
> are hindering us from fully supporting Java 11.   
> * Gradle can be upgraded to 6.2
> * Findbugs is abandoned and is Java 8 only, spotbugs is the supported 
> replacement
> * Old guava transitive dependency from the Java Driver
> * Java driver is 3.6, 3.8 is more current
> There's probably more to do but this is a quick starting point.






[jira] [Updated] (CASSANDRA-15610) Upgrade Sidecar Gradle Dependencies

2020-02-28 Thread Dinesh Joshi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-15610:
-
Reviewers: Dinesh Joshi

> Upgrade Sidecar Gradle Dependencies
> ---
>
> Key: CASSANDRA-15610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> There's a few out of date dependencies in the Gradle config, some of which 
> are hindering us from fully supporting Java 11.   
> * Gradle can be upgraded to 6.2
> * Findbugs is abandoned and is Java 8 only, spotbugs is the supported 
> replacement
> * Old guava transitive dependency from the Java Driver
> * Java driver is 3.6, 3.8 is more current
> There's probably more to do but this is a quick starting point.






[jira] [Updated] (CASSANDRA-15610) Upgrade Sidecar Gradle Dependencies

2020-02-28 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-15610:
---
Change Category: Operability
 Complexity: Low Hanging Fruit
 Status: Open  (was: Triage Needed)

> Upgrade Sidecar Gradle Dependencies
> ---
>
> Key: CASSANDRA-15610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> There's a few out of date dependencies in the Gradle config, some of which 
> are hindering us from fully supporting Java 11.   
> * Gradle can be upgraded to 6.2
> * Findbugs is abandoned and is Java 8 only, spotbugs is the supported 
> replacement
> * Old guava transitive dependency from the Java Driver
> * Java driver is 3.6, 3.8 is more current
> There's probably more to do but this is a quick starting point.






[jira] [Updated] (CASSANDRA-15610) Upgrade Sidecar Gradle Dependencies

2020-02-28 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-15610:
---
Description: 
There's a few out of date dependencies in the Gradle config, some of which are 
hindering us from fully supporting Java 11.   

* Gradle can be upgraded to 6.2
* Findbugs is abandoned and is Java 8 only, spotbugs is the supported 
replacement
* Old guava transitive dependency from the Java Driver
* Java driver is 3.6, 3.8 is more current

There's probably more to do but this is a quick starting point.

  was:
There's a few out of date dependencies in the Gradle config, some of which are 
hindering us from fully supporting Java 11.   

* Gradle can be upgraded to 6.2
* Findbugs is abandoned and is Java 8 only
* Old guava transitive dependency from the Java Driver
* Java driver is 3.6, 3.8 is more current

There's probably more to do but this is a quick starting point.


> Upgrade Sidecar Gradle Dependencies
> ---
>
> Key: CASSANDRA-15610
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15610
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Sidecar
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> There are a few out-of-date dependencies in the Gradle config, some of which 
> are hindering us from fully supporting Java 11.
> * Gradle can be upgraded to 6.2
> * FindBugs is abandoned and is Java 8 only; SpotBugs is the supported 
> replacement
> * Old Guava transitive dependency from the Java Driver
> * Java driver is 3.6; 3.8 is more current
> There's probably more to do, but this is a quick starting point.






[jira] [Created] (CASSANDRA-15610) Upgrade Sidecar Gradle Dependencies

2020-02-28 Thread Jon Haddad (Jira)
Jon Haddad created CASSANDRA-15610:
--

 Summary: Upgrade Sidecar Gradle Dependencies
 Key: CASSANDRA-15610
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15610
 Project: Cassandra
  Issue Type: Improvement
  Components: Sidecar
Reporter: Jon Haddad
Assignee: Jon Haddad


There are a few out-of-date dependencies in the Gradle config, some of which are 
hindering us from fully supporting Java 11.

* Gradle can be upgraded to 6.2
* FindBugs is abandoned and is Java 8 only
* Old Guava transitive dependency from the Java Driver
* Java driver is 3.6; 3.8 is more current

There's probably more to do, but this is a quick starting point.






[jira] [Updated] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent

2020-02-28 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15564:
--
Status: Patch Available  (was: Review In Progress)

Based on the Slack discussion, I should move this to Patch Available (PA) and 
reviewers should move it to Review In Progress. Trying to figure out how to do this.

> Refactor repair coordinator so errors are consistent
> 
>
> Key: CASSANDRA-15564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15564
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Consistency/Repair
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> This is to split the change in CASSANDRA-15399 so the refactor is isolated 
> out.
> Currently the repair coordinator special cases the exit cases at each call 
> site; this makes it so that errors can be inconsistent and there are cases 
> where proper complete isn't done (proper notifications, and forgetting to 
> update ActiveRepairService).
> [Circle 
> CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]
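
The idea of handling every exit case in one place, rather than at each call 
site, can be illustrated with a small sketch. This is not the patch from this 
ticket; the class and method names (RepairCoordinator, run, _complete) are 
hypothetical, and the set named active only stands in for the bookkeeping that 
ActiveRepairService would do.

```python
from typing import Callable, List, Set, Tuple


class RepairCoordinator:
    """Hypothetical coordinator that funnels every exit path through _complete."""

    def __init__(self) -> None:
        self.notifications: List[Tuple[str, str]] = []  # (repair id, outcome)
        self.active: Set[str] = set()                   # stand-in for active-repair bookkeeping

    def run(self, repair_id: str, task: Callable[[], None]) -> bool:
        """Run a repair task; success and failure share one completion routine."""
        self.active.add(repair_id)
        try:
            task()
        except Exception as exc:
            self._complete(repair_id, f"failed: {exc}")
            return False
        self._complete(repair_id, "success")
        return True

    def _complete(self, repair_id: str, outcome: str) -> None:
        # Single place that sends the notification and clears the active entry,
        # so no call site can forget either step.
        self.notifications.append((repair_id, outcome))
        self.active.discard(repair_id)
```

With this shape, a task that raises still produces a notification and is still 
removed from the active set, which is the consistency property the description 
asks for.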






[jira] [Created] (CASSANDRA-15609) Update patches doc to remove references to sending patches attached to JIRA

2020-02-28 Thread David Capwell (Jira)
David Capwell created CASSANDRA-15609:
-

 Summary: Update patches doc to remove references to sending 
patches attached to JIRA
 Key: CASSANDRA-15609
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15609
 Project: Cassandra
  Issue Type: Improvement
Reporter: David Capwell


When a new person joins and goes through our docs they see the process is to 
create patches and attach to JIRA; this is not the process most people follow.

We should update the document to encourage newer norms to make it easier for 
new developers to contribute.






[jira] [Updated] (CASSANDRA-15609) Update patches doc to remove references to sending patches attached to JIRA

2020-02-28 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15609:
--
Change Category: Code Clarity
 Complexity: Low Hanging Fruit
Component/s: Documentation/Website
 Status: Open  (was: Triage Needed)

> Update patches doc to remove references to sending patches attached to JIRA
> ---
>
> Key: CASSANDRA-15609
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15609
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation/Website
>Reporter: David Capwell
>Priority: Normal
>
> When a new person joins and goes through our docs they see the process is to 
> create patches and attach to JIRA; this is not the process most people follow.
> We should update the document to encourage newer norms to make it easier for 
> new developers to contribute.






[jira] [Updated] (CASSANDRA-15605) Broken dtest replication_test.py::TestSnitchConfigurationUpdate

2020-02-28 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15605:
--
 Bug Category: Parent values: Correctness(12982)Level 1 values: Test 
Failure(12990)
   Complexity: Normal
Discovered By: Unit Test
 Severity: Normal
   Status: Open  (was: Triage Needed)

> Broken dtest replication_test.py::TestSnitchConfigurationUpdate
> ---
>
> Key: CASSANDRA-15605
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15605
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> Noticed this failing on a couple of CI runs; it reproduces when running trunk 
> locally and on CircleCI.
> Two or three tests are consistently failing:
>  * {{test_rf_expand_gossiping_property_file_snitch}}
>  * {{test_rf_expand_property_file_snitch}}
>  * {{test_move_forwards_between_and_cleanup}}
> [https://circleci.com/workflow-run/f23f13a9-bbdc-4764-8336-109517e137f1]






[jira] [Commented] (CASSANDRA-15564) Refactor repair coordinator so errors are consistent

2020-02-28 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047782#comment-17047782
 ] 

David Capwell commented on CASSANDRA-15564:
---

Here are the [latest CI 
runs|https://circleci.com/workflow-run/9c487328-53c3-4f14-a097-5d19faddfbd8]

[Unit test 
failed|https://app.circleci.com/jobs/github/dcapwell/cassandra/749/tests] - 
this is not my fault! See CASSANDRA-15308
[python dtest 
failed|https://app.circleci.com/jobs/github/dcapwell/cassandra/751/tests] - 
this is not my fault! See CASSANDRA-15605

Given the only failures are known failures, CI is good.

> Refactor repair coordinator so errors are consistent
> 
>
> Key: CASSANDRA-15564
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15564
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Consistency/Repair
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
>  Labels: pull-request-available
>  Time Spent: 10h 50m
>  Remaining Estimate: 0h
>
> This is to split the change in CASSANDRA-15399 so the refactor is isolated 
> out.
> Currently the repair coordinator special cases the exit cases at each call 
> site; this makes it so that errors can be inconsistent and there are cases 
> where proper complete isn't done (proper notifications, and forgetting to 
> update ActiveRepairService).
> [Circle 
> CI|https://circleci.com/gh/dcapwell/cassandra/tree/bug%2FrepairCoordinatorJmxConsistency]






[jira] [Updated] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-15608:
-
Resolution: Not A Bug
Status: Resolved  (was: Triage Needed)

> full repair getting failed for 1 large table version 3.11.2
> ---
>
> Key: CASSANDRA-15608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: rsasupport
>Priority: Urgent
> Attachments: system.log
>
>
> nodetool repair -pr test table1
> [2020-02-28 17:56:00,192] Starting repair command #1 
> (7a634f80-5a25-11ea-8b5a-3b5edc400697), repairing keyspace test with repair 
> options (parallelism: parallel, primary range: true, incremental: true, job 
> threads: 1, ColumnFamilies: [table1], dataCenters: [], hosts: [], # of 
> ranges: 256, pull repair: false)
> [2020-02-28 17:56:17,173] Repair session 7ae020a1-5a25-11ea-8b5a-3b5edc400697 
> for range [(-1398494450650640175,-1398484603620354999]] finished (progress: 
> 1%)
> [2020-02-28 17:56:22,633] Repair session 7ad3c491-5a25-11ea-8b5a-3b5edc400697 
> for range [(-2055080187802596770,-2055069591393124910]] finished (progress: 
> 2%)
> [2020-02-28 17:56:23,888] Repair session 7ace4651-5a25-11ea-8b5a-3b5edc400697 
> for range [(-5916076129714268351,-5916054112636649354]] finished (progress: 
> 2%)
> [2020-02-28 17:56:27,675] Repair session 7ad83161-5a25-11ea-8b5a-3b5edc400697 
> for range [(3262419392837600448,3262453314317655259]] finished (progress: 3%)
> [2020-02-28 17:56:31,674] Repair session 7ad32851-5a25-11ea-8b5a-3b5edc400697 
> for range [(2146167582342729236,2146207858477993492]] finished (progress: 3%)
> [2020-02-28 17:56:52,964] Repair session 7af15eb0-5a25-11ea-8b5a-3b5edc400697 
> for range [(-7088663015651548617,-7088601571353808807]] finished (progress: 
> 3%)
> [2020-02-28 17:57:03,237] Repair session 7acf30b2-5a25-11ea-8b5a-3b5edc400697 
> for range [(-5392084277480362928,-5391992322619247178]] finished (progress: 
> 4%)
> [2020-02-28 17:57:08,937] Repair session 7ada5443-5a25-11ea-8b5a-3b5edc400697 
> for range [(-2247846360993532740,-2247745485112718002]] finished (progress: 
> 4%)
> [2020-02-28 17:57:14,102] Repair session 7ae1a740-5a25-11ea-8b5a-3b5edc400697 
> for range [(4804514170425290662,4804611019437533765]] finished (progress: 5%)
> [2020-02-28 17:57:18,190] Repair session 7ae6fe71-5a25-11ea-8b5a-3b5edc400697 
> for range [(4989724097640378549,4989803009764075856]] finished (progress: 5%)
> [2020-02-28 17:57:27,549] Repair session 7acfccf1-5a25-11ea-8b5a-3b5edc400697 
> for range [(4512226173347132723,4512312572318114003]] finished (progress: 5%)
> [2020-02-28 17:57:37,534] Repair session 7ae1ce50-5a25-11ea-8b5a-3b5edc400697 
> for range [(7146381406763943813,7146516394221309657]] finished (progress: 6%)
> Feb 28, 2020 5:59:00 PM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
> Read timed out
> [2020-02-28 17:59:36,419] Repair session 7aeacf01-5a25-11ea-8b5a-3b5edc400697 
> for range [(4717535938324360043,4717683452072792471]] finished (progress: 6%)
> Current setup Details:
> Allocated Memory: 128gb
> No of nodes:12 DC1 and 12 DC2
> |*Sr No*|*Parameter Name*|*Old value*|*New Value*|*File Name*|
> |1|Min Heap Size -Xms|-Xms32G|-Xms64G|jvm.options|
> |2|Max Heap Size -Xmx|-Xms32G|-Xms64G|jvm.options|
> |3|Young generation size -Xmn|#-Xmn800M|-Xmn3072M|jvm.options|
> |4|GC SETTINGS|CMS|G1GC|jvm.options|
> |5|Disk Read_ahead_kb|4096|64|/sys/class/block/sdd/queue/read_ahead_kb|
> |6|Disk Rotational|1|1|/sys/class/block/sdd/queue/rotational|
> |7|block devices attributes|8192|128|blockdev --setra 128 /dev/sdd (To set 
> it)|
> |8|commitlog_segment_size_in_mb|32|64|cassandra.yaml|
> |9|memtable_heap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
> |10|memtable_offheap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
> |11|phi_convict_threshold|#8|12|cassandra.yaml|
>  






[jira] [Commented] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047709#comment-17047709
 ] 

Brandon Williams commented on CASSANDRA-15608:
--

This is not a bug and is better served on the user list or the ASF slack 
#cassandra channel.  This node is horribly misconfigured (for example, setting 
new size with G1) and is suffering because of it:

bq. WARN  [GossipTasks:1] 2020-02-28 18:01:47,596 FailureDetector.java:288 - 
Not marking nodes down due to local pause of 127331804873 > 50

That's roughly 2 minutes and 7 seconds (127331804873 ns) that the entire JVM was 
frozen, which is going to cause all kinds of problems.  This isn't specific to repair.
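
As a quick check of the pause length, here is a minimal sketch that converts 
the value from the WARN line into minutes and seconds. It assumes the logged 
number is in nanoseconds; the function name format_pause is only illustrative.

```python
# Pause value copied from the FailureDetector WARN line quoted above.
# Assumption: the value is reported in nanoseconds.
PAUSE_NS = 127_331_804_873


def format_pause(pause_ns: int) -> str:
    """Render a nanosecond duration as whole minutes plus fractional seconds."""
    total_seconds = pause_ns / 1_000_000_000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{int(minutes)} min {seconds:.1f} s"


print(format_pause(PAUSE_NS))  # prints "2 min 7.3 s"
```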


> full repair getting failed for 1 large table version 3.11.2
> ---
>
> Key: CASSANDRA-15608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: rsasupport
>Assignee: Nate McCall
>Priority: Urgent
> Attachments: system.log
>
>
> nodetool repair -pr test table1
> [2020-02-28 17:56:00,192] Starting repair command #1 
> (7a634f80-5a25-11ea-8b5a-3b5edc400697), repairing keyspace test with repair 
> options (parallelism: parallel, primary range: true, incremental: true, job 
> threads: 1, ColumnFamilies: [table1], dataCenters: [], hosts: [], # of 
> ranges: 256, pull repair: false)
> [2020-02-28 17:56:17,173] Repair session 7ae020a1-5a25-11ea-8b5a-3b5edc400697 
> for range [(-1398494450650640175,-1398484603620354999]] finished (progress: 
> 1%)
> [2020-02-28 17:56:22,633] Repair session 7ad3c491-5a25-11ea-8b5a-3b5edc400697 
> for range [(-2055080187802596770,-2055069591393124910]] finished (progress: 
> 2%)
> [2020-02-28 17:56:23,888] Repair session 7ace4651-5a25-11ea-8b5a-3b5edc400697 
> for range [(-5916076129714268351,-5916054112636649354]] finished (progress: 
> 2%)
> [2020-02-28 17:56:27,675] Repair session 7ad83161-5a25-11ea-8b5a-3b5edc400697 
> for range [(3262419392837600448,3262453314317655259]] finished (progress: 3%)
> [2020-02-28 17:56:31,674] Repair session 7ad32851-5a25-11ea-8b5a-3b5edc400697 
> for range [(2146167582342729236,2146207858477993492]] finished (progress: 3%)
> [2020-02-28 17:56:52,964] Repair session 7af15eb0-5a25-11ea-8b5a-3b5edc400697 
> for range [(-7088663015651548617,-7088601571353808807]] finished (progress: 
> 3%)
> [2020-02-28 17:57:03,237] Repair session 7acf30b2-5a25-11ea-8b5a-3b5edc400697 
> for range [(-5392084277480362928,-5391992322619247178]] finished (progress: 
> 4%)
> [2020-02-28 17:57:08,937] Repair session 7ada5443-5a25-11ea-8b5a-3b5edc400697 
> for range [(-2247846360993532740,-2247745485112718002]] finished (progress: 
> 4%)
> [2020-02-28 17:57:14,102] Repair session 7ae1a740-5a25-11ea-8b5a-3b5edc400697 
> for range [(4804514170425290662,4804611019437533765]] finished (progress: 5%)
> [2020-02-28 17:57:18,190] Repair session 7ae6fe71-5a25-11ea-8b5a-3b5edc400697 
> for range [(4989724097640378549,4989803009764075856]] finished (progress: 5%)
> [2020-02-28 17:57:27,549] Repair session 7acfccf1-5a25-11ea-8b5a-3b5edc400697 
> for range [(4512226173347132723,4512312572318114003]] finished (progress: 5%)
> [2020-02-28 17:57:37,534] Repair session 7ae1ce50-5a25-11ea-8b5a-3b5edc400697 
> for range [(7146381406763943813,7146516394221309657]] finished (progress: 6%)
> Feb 28, 2020 5:59:00 PM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
> Read timed out
> [2020-02-28 17:59:36,419] Repair session 7aeacf01-5a25-11ea-8b5a-3b5edc400697 
> for range [(4717535938324360043,4717683452072792471]] finished (progress: 6%)
> Current setup Details:
> Allocated Memory: 128gb
> No of nodes:12 DC1 and 12 DC2
> |*Sr No*|*Parameter Name*|*Old value*|*New Value*|*File Name*|
> |1|Min Heap Size -Xms|-Xms32G|-Xms64G|jvm.options|
> |2|Max Heap Size -Xmx|-Xms32G|-Xms64G|jvm.options|
> |3|Young generation size -Xmn|#-Xmn800M|-Xmn3072M|jvm.options|
> |4|GC SETTINGS|CMS|G1GC|jvm.options|
> |5|Disk Read_ahead_kb|4096|64|/sys/class/block/sdd/queue/read_ahead_kb|
> |6|Disk Rotational|1|1|/sys/class/block/sdd/queue/rotational|
> |7|block devices attributes|8192|128|blockdev --setra 128 /dev/sdd (To set 
> it)|
> |8|commitlog_segment_size_in_mb|32|64|cassandra.yaml|
> |9|memtable_heap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
> |10|memtable_offheap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
> |11|phi_convict_threshold|#8|12|cassandra.yaml|
>  




[jira] [Assigned] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reassigned CASSANDRA-15608:


 Authors:   (was: Nate McCall)
Assignee: (was: Nate McCall)

> full repair getting failed for 1 large table version 3.11.2
> ---
>
> Key: CASSANDRA-15608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: rsasupport
>Priority: Urgent
> Attachments: system.log
>
>
> nodetool repair -pr test table1
> [2020-02-28 17:56:00,192] Starting repair command #1 
> (7a634f80-5a25-11ea-8b5a-3b5edc400697), repairing keyspace test with repair 
> options (parallelism: parallel, primary range: true, incremental: true, job 
> threads: 1, ColumnFamilies: [table1], dataCenters: [], hosts: [], # of 
> ranges: 256, pull repair: false)
> [2020-02-28 17:56:17,173] Repair session 7ae020a1-5a25-11ea-8b5a-3b5edc400697 
> for range [(-1398494450650640175,-1398484603620354999]] finished (progress: 
> 1%)
> [2020-02-28 17:56:22,633] Repair session 7ad3c491-5a25-11ea-8b5a-3b5edc400697 
> for range [(-2055080187802596770,-2055069591393124910]] finished (progress: 
> 2%)
> [2020-02-28 17:56:23,888] Repair session 7ace4651-5a25-11ea-8b5a-3b5edc400697 
> for range [(-5916076129714268351,-5916054112636649354]] finished (progress: 
> 2%)
> [2020-02-28 17:56:27,675] Repair session 7ad83161-5a25-11ea-8b5a-3b5edc400697 
> for range [(3262419392837600448,3262453314317655259]] finished (progress: 3%)
> [2020-02-28 17:56:31,674] Repair session 7ad32851-5a25-11ea-8b5a-3b5edc400697 
> for range [(2146167582342729236,2146207858477993492]] finished (progress: 3%)
> [2020-02-28 17:56:52,964] Repair session 7af15eb0-5a25-11ea-8b5a-3b5edc400697 
> for range [(-7088663015651548617,-7088601571353808807]] finished (progress: 
> 3%)
> [2020-02-28 17:57:03,237] Repair session 7acf30b2-5a25-11ea-8b5a-3b5edc400697 
> for range [(-5392084277480362928,-5391992322619247178]] finished (progress: 
> 4%)
> [2020-02-28 17:57:08,937] Repair session 7ada5443-5a25-11ea-8b5a-3b5edc400697 
> for range [(-2247846360993532740,-2247745485112718002]] finished (progress: 
> 4%)
> [2020-02-28 17:57:14,102] Repair session 7ae1a740-5a25-11ea-8b5a-3b5edc400697 
> for range [(4804514170425290662,4804611019437533765]] finished (progress: 5%)
> [2020-02-28 17:57:18,190] Repair session 7ae6fe71-5a25-11ea-8b5a-3b5edc400697 
> for range [(4989724097640378549,4989803009764075856]] finished (progress: 5%)
> [2020-02-28 17:57:27,549] Repair session 7acfccf1-5a25-11ea-8b5a-3b5edc400697 
> for range [(4512226173347132723,4512312572318114003]] finished (progress: 5%)
> [2020-02-28 17:57:37,534] Repair session 7ae1ce50-5a25-11ea-8b5a-3b5edc400697 
> for range [(7146381406763943813,7146516394221309657]] finished (progress: 6%)
> Feb 28, 2020 5:59:00 PM ClientCommunicatorAdmin Checker-run
> WARNING: Failed to check the connection: java.net.SocketTimeoutException: 
> Read timed out
> [2020-02-28 17:59:36,419] Repair session 7aeacf01-5a25-11ea-8b5a-3b5edc400697 
> for range [(4717535938324360043,4717683452072792471]] finished (progress: 6%)
> Current setup Details:
> Allocated Memory: 128gb
> No of nodes:12 DC1 and 12 DC2
> |*Sr No*|*Parameter Name*|*Old value*|*New Value*|*File Name*|
> |1|Min Heap Size -Xms|-Xms32G|-Xms64G|jvm.options|
> |2|Max Heap Size -Xmx|-Xms32G|-Xms64G|jvm.options|
> |3|Young generation size -Xmn|#-Xmn800M|-Xmn3072M|jvm.options|
> |4|GC SETTINGS|CMS|G1GC|jvm.options|
> |5|Disk Read_ahead_kb|4096|64|/sys/class/block/sdd/queue/read_ahead_kb|
> |6|Disk Rotational|1|1|/sys/class/block/sdd/queue/rotational|
> |7|block devices attributes|8192|128|blockdev --setra 128 /dev/sdd (To set 
> it)|
> |8|commitlog_segment_size_in_mb|32|64|cassandra.yaml|
> |9|memtable_heap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
> |10|memtable_offheap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
> |11|phi_convict_threshold|#8|12|cassandra.yaml|
>  






[jira] [Updated] (CASSANDRA-15536) 4.0 Quality: Components and Test Plans

2020-02-28 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15536:
--
Description: 
Migrated from 
[cwiki|https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality:+Components+and+Test+Plans]

 

The overarching goal of the 4.0 release is that Cassandra 4.0 should be at a 
state where major users would run it in production when it is cut. To gain this 
confidence there are various ongoing testing efforts involving correctness, 
performance, and ease of use. In this page we try to coordinate and identify 
blockers for subsystems before we can release 4.0

For each component we strive to have shepherds and contributors involved. 
Shepherds should be committers or knowledgeable component owners and are 
responsible for driving their blocking tickets to completion and ensuring 
quality in their claimed area, while contributors have signed up to help verify 
that subsystem by running tests or contributing fixes. Shepherds also ideally 
help set testing standards and ensure that we meet a high standard of quality 
in their claimed area.

{color:#de350b}(For now, we will overload "assignee == shepherd", and 
"reviewer(s) == contributors" so we don't have to change fields in JIRA.){color}

If you are interested in contributing to testing 4.0, please add your name as a 
reviewer and get involved in the tracking ticket and the dev list/IRC 
discussions involving that component. Reach out to the assignee on the ticket 
to get context and get oriented with the work.
h3. Targeted Components / Subsystems

We've tried to collect some of the major components or subsystems that we want 
to ensure work properly towards having a great 4.0 release. If you think 
something is missing please add it. Better yet volunteer to contribute to 
testing it!
h4. Internode Messaging

In 4.0 we're getting a new Netty based inter-node communication system 
(CASSANDRA-8457). As internode messaging is vital to the correctness and 
performance of the database we should make sure that all forms (TLS, 
compressed, low latency, high latency, etc ...) of internode messaging function 
correctly.
h4. Test Infrastructure / Automation: Diff Testing

Diff testing is a form of model-based testing in which two clusters are 
exhaustively compared to assert identity. To support Apache Cassandra 4.0 
validation, contributors have developed cassandra-diff. This is a Spark 
application that distributes the token range over a configurable number of 
Spark executors, then parallelizes randomized forward and reverse reads with 
varying paging sizes to read and compare every row present in the cluster, 
persisting a record of mismatches for investigation. This methodology has been 
instrumental to identifying data loss, data corruption, and incorrect response 
issues introduced in early Cassandra 3.0 releases.

cassandra-diff and associated documentation can be found at: 
[https://github.com/apache/cassandra-diff]. Contributors are encouraged to run 
diff tests against clusters they manage and report issues to ensure workload 
diversity across the project.
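
The two ideas in the description, splitting the token range across workers and 
comparing rows from two clusters, can be sketched as follows. This is not 
cassandra-diff's code: the function names are hypothetical, the token bounds 
assume the Murmur3 partitioner, and plain dictionaries stand in for the rows 
read from each cluster.

```python
from typing import Dict, List, Tuple

# Assumption: Murmur3Partitioner, whose tokens span the signed 64-bit range.
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(splits: int) -> List[Tuple[int, int]]:
    """Divide [MIN_TOKEN, MAX_TOKEN] into contiguous inclusive sub-ranges."""
    if splits < 1:
        raise ValueError("splits must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1
    bounds = [MIN_TOKEN + (total * i) // splits for i in range(splits + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(splits)]


def diff_rows(source: Dict[str, dict], target: Dict[str, dict]) -> List[str]:
    """Return the keys whose rows differ or exist on only one side."""
    return [
        key
        for key in sorted(source.keys() | target.keys())
        if source.get(key) != target.get(key)
    ]
```

Each sub-range could be handed to one worker, which reads the same range from 
both clusters and records the keys returned by diff_rows for later 
investigation.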
h4. System Tables and Internal Schema

This task covers a review of and minor bug fixes to local and distributed 
system keyspaces. Planned work in this area is now complete.
h4. Source Audit and Performance Testing: Streaming

This task covers an audit of the Streaming implementation in Apache Cassandra 
4.0. In this release, contributors have implemented full-SSTable streaming to 
improve performance and reduce memory pressure. Internode messaging changes 
implemented in CASSANDRA-15066 adjacent to streaming suggested that review of 
the streaming implementation itself may be desirable. Prior work also covered 
performance testing of full-SSTable streaming.
h4. Test Infrastructure / Automation: "Harry"

CASSANDRA-15348 - Harry: generator library and extensible framework for fuzz 
testing Apache Cassandra TRIAGE NEEDED

Harry is a component for fuzz testing and verification of Apache Cassandra 
clusters at scale. Harry makes it possible to run tests that validate the state of 
both dense nodes (to test the local read-write path) and large clusters (to test the 
distributed read-write path), and to do so efficiently. Harry defines a model that 
holds the state of the database, generators that produce reproducible, 
pseudo-random schemas, mutations, and queries, and a validator that asserts the 
correctness of the model following execution of generated traffic. See 
CASSANDRA-15348 for additional details.
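
The model, generator, and validator roles described above can be illustrated 
with a toy sketch. This is not Harry's API: every name here is hypothetical, 
and an in-memory dictionary stands in for the database under test.

```python
import random
from typing import Dict, List, Tuple

Mutation = Tuple[str, int, int]  # (operation, key, value)


def generate_mutations(seed: int, count: int) -> List[Mutation]:
    """Produce a reproducible pseudo-random sequence of writes and deletes."""
    rng = random.Random(seed)  # same seed always yields the same sequence
    mutations = []
    for _ in range(count):
        operation = rng.choice(["write", "delete"])
        mutations.append((operation, rng.randrange(10), rng.randrange(1000)))
    return mutations


def apply_mutation(store: Dict[int, int], mutation: Mutation) -> None:
    """Apply one mutation to a key/value store (the model or the stand-in database)."""
    operation, key, value = mutation
    if operation == "write":
        store[key] = value
    else:
        store.pop(key, None)


def validate(model: Dict[int, int], actual: Dict[int, int]) -> List[int]:
    """Return the keys where the observed state disagrees with the model."""
    return sorted(
        key for key in model.keys() | actual.keys()
        if model.get(key) != actual.get(key)
    )
```

Because the generator is seeded, a failing run can be replayed exactly by 
reusing the same seed, which is what makes a reported mismatch debuggable.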
h4. Local Read/Write Path: IndexInfo (CASSANDRA-11206)

Users upgrading from Cassandra 3.0.x to trunk will pick up CASSANDRA-11206 in 
the process. Contributors to 4.0 testing and validation have allocated time to 
testing and validation of these changes via source audit and implementation of 
property-based tests (currently underway). The majority of planned work here 

[jira] [Commented] (CASSANDRA-15536) 4.0 Quality: Components and Test Plans

2020-02-28 Thread Josh McKenzie (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047623#comment-17047623
 ] 

Josh McKenzie commented on CASSANDRA-15536:
---

Cleaned up ticket to indicate active tracking status now. Reached out to 
[~cscotta] on slack about deprecating wiki article and linking here as I don't 
have wiki edit access.

> 4.0 Quality: Components and Test Plans
> --
>
> Key: CASSANDRA-15536
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15536
> Project: Cassandra
>  Issue Type: Epic
>  Components: Test/benchmark, Test/dtest, Test/fuzz, Test/unit
>Reporter: Josh McKenzie
>Assignee: Josh McKenzie
>Priority: High
> Fix For: 4.0
>
>
> Migrated from 
> [cwiki|https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality:+Components+and+Test+Plans]
>  
> The overarching goal of the 4.0 release is that Cassandra 4.0 should be at a 
> state where major users would run it in production when it is cut. To gain 
> this confidence there are various ongoing testing efforts involving 
> correctness, performance, and ease of use. In this page we try to coordinate 
> and identify blockers for subsystems before we can release 4.0
> For each component we strive to have shepherds and contributors involved. 
> Shepherds should be committers or knowledgeable component owners and are 
> responsible for driving their blocking tickets to completion and ensuring 
> quality in their claimed area, while contributors have signed up to help 
> verify that subsystem by running tests or contributing fixes. Shepherds also 
> ideally help set testing standards and ensure that we meet a high standard of 
> quality in their claimed area.
> {color:#de350b}(For now, we will overload "assignee == shepherd", and 
> "reviewer(s) == contributors" so we don't have to change fields in 
> JIRA.){color}
> If you are interested in contributing to testing 4.0, please add your name as 
> a reviewer and get involved in the tracking ticket and the dev list/IRC 
> discussions involving that component. Reach out to the assignee on the ticket 
> to get context and get oriented with the work.
> h3. Targeted Components / Subsystems
> We've tried to collect some of the major components or subsystems that we 
> want to ensure work properly towards having a great 4.0 release. If you think 
> something is missing please add it. Better yet volunteer to contribute to 
> testing it!
> h4. Internode Messaging
> In 4.0 we're getting a new Netty based inter-node communication system 
> (CASSANDRA-8457). As internode messaging is vital to the correctness and 
> performance of the database we should make sure that all forms (TLS, 
> compressed, low latency, high latency, etc ...) of internode messaging 
> function correctly.
> h4. Test Infrastructure / Automation: Diff Testing
> Diff testing is a form of model-based testing in which two clusters are 
> exhaustively compared to assert identity. To support Apache Cassandra 4.0 
> validation, contributors have developed cassandra-diff. This is a Spark 
> application that distributes the token range over a configurable number of 
> Spark executors, then parallelizes randomized forward and reverse reads with 
> varying paging sizes to read and compare every row present in the cluster, 
> persisting a record of mismatches for investigation. This methodology has 
> been instrumental to identifying data loss, data corruption, and incorrect 
> response issues introduced in early Cassandra 3.0 releases.
> cassandra-diff and associated documentation can be found at: 
> [https://github.com/apache/cassandra-diff]. Contributors are encouraged to 
> run diff tests against clusters they manage and report issues to ensure 
> workload diversity across the project.
> h4. System Tables and Internal Schema
> This task covers a review of and minor bug fixes to local and distributed 
> system keyspaces. Planned work in this area is now complete.
> h4. Source Audit and Performance Testing: Streaming
> This task covers an audit of the Streaming implementation in Apache Cassandra 
> 4.0. In this release, contributors have implemented full-SSTable streaming to 
> improve performance and reduce memory pressure. Internode messaging changes 
> implemented in CASSANDRA-15066 adjacent to streaming suggested that review of 
> the streaming implementation itself may be desirable. Prior work also covered 
> performance testing of full-SSTable streaming.
> h4. Test Infrastructure / Automation: "Harry"
> CASSANDRA-15348 - Harry: generator library and extensible framework for fuzz 
> testing Apache Cassandra TRIAGE NEEDED
> Harry is a component for fuzz testing and verification of Apache 
> Cassandra clusters at scale. Harry makes it possible to run tests that 
> validate state of both dense nodes 

[jira] [Updated] (CASSANDRA-15536) 4.0 Quality: Components and Test Plans

2020-02-28 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15536:
--
Description: 
The overarching goal of the 4.0 release is that Cassandra 4.0 should be in a 
state where major users would run it in production when it is cut. To gain this 
confidence, there are various ongoing testing efforts involving correctness, 
performance, and ease of use. On this page we try to coordinate and identify 
blockers for subsystems before we can release 4.0.

For each component we strive to have shepherds and contributors involved. 
Shepherds should be committers or knowledgeable component owners and are 
responsible for driving their blocking tickets to completion and ensuring 
quality in their claimed area, while contributors have signed up to help verify 
that subsystem by running tests or contributing fixes. Shepherds also ideally 
help set testing standards and ensure that we meet a high standard of quality 
in their claimed area.

{color:#de350b}(For now, we will overload "assignee == shepherd", and 
"reviewer(s) == contributors" so we don't have to change fields in JIRA.){color}

If you are interested in contributing to testing 4.0, please add your name as a 
reviewer and get involved in the tracking ticket and the dev list/IRC 
discussions involving that component. Reach out to the assignee on the ticket 
to get context and get oriented with the work.
h3. Targeted Components / Subsystems

We've tried to collect some of the major components or subsystems that we want 
to ensure work properly towards having a great 4.0 release. If you think 
something is missing, please add it. Better yet, volunteer to contribute to 
testing it!
h4. Internode Messaging

In 4.0 we're getting a new Netty-based internode communication system 
(CASSANDRA-8457). As internode messaging is vital to the correctness and 
performance of the database, we should make sure that all forms (TLS, 
compressed, low latency, high latency, etc.) of internode messaging function 
correctly.
h4. Test Infrastructure / Automation: Diff Testing

Diff testing is a form of model-based testing in which two clusters are 
exhaustively compared to assert identity. To support Apache Cassandra 4.0 
validation, contributors have developed cassandra-diff. This is a Spark 
application that distributes the token range over a configurable number of 
Spark executors, then parallelizes randomized forward and reverse reads with 
varying paging sizes to read and compare every row present in the cluster, 
persisting a record of mismatches for investigation. This methodology has been 
instrumental in identifying data loss, data corruption, and incorrect response 
issues introduced in early Cassandra 3.0 releases.

cassandra-diff and associated documentation can be found at: 
[https://github.com/apache/cassandra-diff]. Contributors are encouraged to run 
diff tests against clusters they manage and report issues to ensure workload 
diversity across the project.
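
As a rough illustration of the approach described above, the following Python sketch splits the token range into contiguous pieces and compares two clusters piece by piece. It is not cassandra-diff's code: the function names are hypothetical, the two clusters are stand-in dictionaries keyed by token, and the per-split work runs serially rather than on Spark executors. Randomized forward and reverse reads and varying page sizes are left out.

```python
MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def split_token_range(num_splits):
    """Cut the signed 64-bit token range into contiguous (start, end] splits."""
    total = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (total * i) // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def diff_split(source, target, split):
    """Compare every row whose token falls in one split; return mismatches."""
    start, end = split
    tokens = {t for t in source if start < t <= end}
    tokens |= {t for t in target if start < t <= end}
    mismatches = []
    for token in sorted(tokens):
        if token not in target:
            mismatches.append((token, "missing_in_target"))
        elif token not in source:
            mismatches.append((token, "missing_in_source"))
        elif source[token] != target[token]:
            mismatches.append((token, "value_mismatch"))
    return mismatches


def diff_clusters(source, target, num_splits):
    """Run the per-split comparison over the whole ring (serially here)."""
    report = []
    for split in split_token_range(num_splits):
        report.extend(diff_split(source, target, split))
    return report


# Toy data: token -> row value for each "cluster".
source = {-5: "a", 10: "b", 99: "c"}
target = {-5: "a", 10: "B", 200: "d"}
print(diff_clusters(source, target, num_splits=4))
```

Because each split is compared independently, the splits could be handed to separate workers, and the returned list plays the role of the persisted mismatch record.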
h4. System Tables and Internal Schema

This task covers a review of and minor bug fixes to local and distributed 
system keyspaces. Planned work in this area is now complete.
h4. Source Audit and Performance Testing: Streaming

This task covers an audit of the Streaming implementation in Apache Cassandra 
4.0. In this release, contributors have implemented full-SSTable streaming to 
improve performance and reduce memory pressure. Internode messaging changes 
implemented in CASSANDRA-15066 adjacent to streaming suggested that review of 
the streaming implementation itself may be desirable. Prior work also covered 
performance testing of full-SSTable streaming.
h4. Test Infrastructure / Automation: "Harry"

CASSANDRA-15348 - Harry: generator library and extensible framework for fuzz 
testing Apache Cassandra TRIAGE NEEDED

Harry is a component for fuzz testing and verification of Apache Cassandra 
clusters at scale. Harry makes it possible to run tests that validate the state 
of both dense nodes (to test the local read-write path) and large clusters (to 
test the distributed read-write path), and to do so efficiently. Harry defines a 
model that holds the state of the database, generators that produce 
reproducible, pseudo-random schemas, mutations, and queries, and a validator 
that asserts the correctness of the model following execution of generated 
traffic. See CASSANDRA-15348 for additional details.
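
The model, generator, and validator roles described above can be sketched in a few lines of Python. This is only an illustration of the idea and not Harry's API: `generate_mutations`, `ToyStore`, and `run_and_validate` are hypothetical names, and an in-memory dictionary stands in for the database under test. The `ignore_deletes` flag injects a bug so the validator has something to catch.

```python
import random


def generate_mutations(seed, count, num_keys=8):
    """Generator: the same seed always yields the same operation sequence."""
    rng = random.Random(seed)
    ops = []
    for _ in range(count):
        key = rng.randrange(num_keys)
        if rng.random() < 0.25:
            ops.append(("delete", key, None))
        else:
            ops.append(("write", key, rng.randrange(1000)))
    return ops


class ToyStore:
    """Stand-in for the system under test; can be made deliberately buggy."""

    def __init__(self, ignore_deletes=False):
        self.rows = {}
        self.ignore_deletes = ignore_deletes

    def execute(self, op):
        kind, key, value = op
        if kind == "write":
            self.rows[key] = value
        elif not self.ignore_deletes:
            self.rows.pop(key, None)

    def read(self, key):
        return self.rows.get(key)


def run_and_validate(ops, store):
    """Apply ops to both the model and the store; return keys that disagree."""
    model = {}
    for kind, key, value in ops:
        if kind == "write":
            model[key] = value
        else:
            model.pop(key, None)
        store.execute((kind, key, value))
    keys = {key for _, key, _ in ops}
    return sorted(k for k in keys if model.get(k) != store.read(k))
```

Since the generator is seeded, any failure found by `run_and_validate` can be replayed by reusing the same seed and operation count.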
h4. Local Read/Write Path: IndexInfo (CASSANDRA-11206)

Users upgrading from Cassandra 3.0.x to trunk will pick up CASSANDRA-11206 in 
the process. Contributors to 4.0 testing and validation have allocated time to 
testing and validation of these changes via source audit and implementation of 
property-based tests (currently underway). The majority of planned work here is 
complete, with a final set of perf tests in progress. No correctness issues 
were identified via the source audit and 

[jira] [Updated] (CASSANDRA-15536) 4.0 Quality: Components and Test Plans

2020-02-28 Thread Josh McKenzie (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh McKenzie updated CASSANDRA-15536:
--
Summary: 4.0 Quality: Components and Test Plans  (was: (JIRA WORKFLOW TEST) 
4.0 Quality: Components and Test Plans)

> 4.0 Quality: Components and Test Plans
> --
>
> Key: CASSANDRA-15536
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15536
> Project: Cassandra
>  Issue Type: Epic
>  Components: Test/benchmark, Test/dtest, Test/fuzz, Test/unit
>Reporter: Josh McKenzie
>Assignee: Josh McKenzie
>Priority: High
> Fix For: 4.0
>
>
> [ALPHA TEST]
> {color:#de350b} This is a test to shift the test tracking and work from 
> [cwiki|https://cwiki.apache.org/confluence/display/CASSANDRA/4.0+Quality:+Components+and+Test+Plans]
>  into JIRA to unify our workflow and reduce friction to collaboration on all 
> the work going into 4.0. The goal of this exercise is to see if this works as 
> a value-add replacement for the old model.{color}
> -
>  The overarching goal of the 4.0 release is that Cassandra 4.0 should be in a 
> state where major users would run it in production when it is cut. To gain 
> this confidence, there are various ongoing testing efforts involving 
> correctness, performance, and ease of use. On this page we try to coordinate 
> and identify blockers for subsystems before we can release 4.0.
> For each component we strive to have shepherds and contributors involved. 
> Shepherds should be committers or knowledgeable component owners and are 
> responsible for driving their blocking tickets to completion and ensuring 
> quality in their claimed area, while contributors have signed up to help 
> verify that subsystem by running tests or contributing fixes. Shepherds also 
> ideally help set testing standards and ensure that we meet a high standard of 
> quality in their claimed area.
> {color:#de350b}(For now, we will overload "assignee == shepherd", and 
> "reviewer(s) == contributors" so we don't have to change fields in 
> JIRA.){color}
> -If you are interested in contributing to testing 4.0, please add your name 
> as a contributor and get involved in the tracking ticket and the dev 
> list/IRC discussions involving that component.-
>  {color:#de350b}For now - please treat these tickets as read-only until such 
> time as we discuss this approach on the dev ML.{color}
> h3. Targeted Components / Subsystems
> We've tried to collect some of the major components or subsystems that we 
> want to ensure work properly towards having a great 4.0 release. If you think 
> something is missing, please add it. Better yet, volunteer to contribute to 
> testing it!
> h4. Internode Messaging
>  In 4.0 we're getting a new Netty-based internode communication system 
> (CASSANDRA-8457). As internode messaging is vital to the correctness and 
> performance of the database, we should make sure that all forms (TLS, 
> compressed, low latency, high latency, etc.) of internode messaging 
> function correctly.
> h4. Test Infrastructure / Automation: Diff Testing
>  Diff testing is a form of model-based testing in which two clusters are 
> exhaustively compared to assert identity. To support Apache Cassandra 4.0 
> validation, contributors have developed cassandra-diff. This is a Spark 
> application that distributes the token range over a configurable number of 
> Spark executors, then parallelizes randomized forward and reverse reads with 
> varying paging sizes to read and compare every row present in the cluster, 
> persisting a record of mismatches for investigation. This methodology has 
> been instrumental in identifying data loss, data corruption, and incorrect 
> response issues introduced in early Cassandra 3.0 releases.
> cassandra-diff and associated documentation can be found at: 
> [https://github.com/apache/cassandra-diff]. Contributors are encouraged to 
> run diff tests against clusters they manage and report issues to ensure 
> workload diversity across the project.
> h4. System Tables and Internal Schema
>  This task covers a review of and minor bug fixes to local and distributed 
> system keyspaces. Planned work in this area is now complete.
> h4. Source Audit and Performance Testing: Streaming
>  This task covers an audit of the Streaming implementation in Apache 
> Cassandra 4.0. In this release, contributors have implemented full-SSTable 
> streaming to improve performance and reduce memory pressure. Internode 
> messaging changes implemented in CASSANDRA-15066 adjacent to streaming 
> suggested that review of the streaming implementation itself may be 
> desirable. Prior work also covered performance testing of full-SSTable 
> streaming.
> h4. Test Infrastructure / Automation: "Harry"
>  CASSANDRA-15348 - 

[jira] [Updated] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread rsasupport (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rsasupport updated CASSANDRA-15608:
---
Complexity: Challenging
   Impacts: None,Clients  (was: None)
  Platform: All,Linux  (was: All)
  Severity: Critical

> full repair getting failed for 1 large table version 3.11.2
> ---
>
> Key: CASSANDRA-15608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: rsasupport
>Assignee: Nate McCall
>Priority: Urgent
> Attachments: system.log
>
>



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread rsasupport (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

rsasupport reassigned CASSANDRA-15608:
--

Assignee: Nate McCall

> full repair getting failed for 1 large table version 3.11.2
> ---
>
> Key: CASSANDRA-15608
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Repair, Consistency/Streaming
>Reporter: rsasupport
>Assignee: Nate McCall
>Priority: Normal
> Attachments: system.log
>
>






[jira] [Created] (CASSANDRA-15608) full repair getting failed for 1 large table version 3.11.2

2020-02-28 Thread rsasupport (Jira)
rsasupport created CASSANDRA-15608:
--

 Summary: full repair getting failed for 1 large table version 
3.11.2
 Key: CASSANDRA-15608
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15608
 Project: Cassandra
  Issue Type: Bug
  Components: Consistency/Repair, Consistency/Streaming
Reporter: rsasupport
 Attachments: system.log

nodetool repair -pr test table1
[2020-02-28 17:56:00,192] Starting repair command #1 
(7a634f80-5a25-11ea-8b5a-3b5edc400697), repairing keyspace test with repair 
options (parallelism: parallel, primary range: true, incremental: true, job 
threads: 1, ColumnFamilies: [table1], dataCenters: [], hosts: [], # of ranges: 
256, pull repair: false)
[2020-02-28 17:56:17,173] Repair session 7ae020a1-5a25-11ea-8b5a-3b5edc400697 
for range [(-1398494450650640175,-1398484603620354999]] finished (progress: 1%)
[2020-02-28 17:56:22,633] Repair session 7ad3c491-5a25-11ea-8b5a-3b5edc400697 
for range [(-2055080187802596770,-2055069591393124910]] finished (progress: 2%)
[2020-02-28 17:56:23,888] Repair session 7ace4651-5a25-11ea-8b5a-3b5edc400697 
for range [(-5916076129714268351,-5916054112636649354]] finished (progress: 2%)
[2020-02-28 17:56:27,675] Repair session 7ad83161-5a25-11ea-8b5a-3b5edc400697 
for range [(3262419392837600448,3262453314317655259]] finished (progress: 3%)
[2020-02-28 17:56:31,674] Repair session 7ad32851-5a25-11ea-8b5a-3b5edc400697 
for range [(2146167582342729236,2146207858477993492]] finished (progress: 3%)
[2020-02-28 17:56:52,964] Repair session 7af15eb0-5a25-11ea-8b5a-3b5edc400697 
for range [(-7088663015651548617,-7088601571353808807]] finished (progress: 3%)
[2020-02-28 17:57:03,237] Repair session 7acf30b2-5a25-11ea-8b5a-3b5edc400697 
for range [(-5392084277480362928,-5391992322619247178]] finished (progress: 4%)
[2020-02-28 17:57:08,937] Repair session 7ada5443-5a25-11ea-8b5a-3b5edc400697 
for range [(-2247846360993532740,-2247745485112718002]] finished (progress: 4%)
[2020-02-28 17:57:14,102] Repair session 7ae1a740-5a25-11ea-8b5a-3b5edc400697 
for range [(4804514170425290662,4804611019437533765]] finished (progress: 5%)
[2020-02-28 17:57:18,190] Repair session 7ae6fe71-5a25-11ea-8b5a-3b5edc400697 
for range [(4989724097640378549,4989803009764075856]] finished (progress: 5%)
[2020-02-28 17:57:27,549] Repair session 7acfccf1-5a25-11ea-8b5a-3b5edc400697 
for range [(4512226173347132723,4512312572318114003]] finished (progress: 5%)
[2020-02-28 17:57:37,534] Repair session 7ae1ce50-5a25-11ea-8b5a-3b5edc400697 
for range [(7146381406763943813,7146516394221309657]] finished (progress: 6%)
Feb 28, 2020 5:59:00 PM ClientCommunicatorAdmin Checker-run
WARNING: Failed to check the connection: java.net.SocketTimeoutException: Read 
timed out
[2020-02-28 17:59:36,419] Repair session 7aeacf01-5a25-11ea-8b5a-3b5edc400697 
for range [(4717535938324360043,4717683452072792471]] finished (progress: 6%)

Current setup Details:

Allocated Memory: 128gb

No of nodes:12 DC1 and 12 DC2
|*Sr No*|*Parameter Name*|*Old value*|*New Value*|*File Name*|
|1|Min Heap Size -Xms|-Xms32G|-Xms64G|jvm.options|
|2|Max Heap Size -Xmx|-Xms32G|-Xms64G|jvm.options|
|3|Young generation size -Xmn|#-Xmn800M|-Xmn3072M|jvm.options|
|4|GC SETTINGS|CMS|G1GC|jvm.options|
|5|Disk Read_ahead_kb|4096|64|/sys/class/block/sdd/queue/read_ahead_kb|
|6|Disk Rotational|1|1|/sys/class/block/sdd/queue/rotational|
|7|block devices attributes|8192|128|blockdev --setra 128 /dev/sdd (To set it)|
|8|commitlog_segment_size_in_mb|32|64|cassandra.yaml|
|9|memtable_heap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
|10|memtable_offheap_space_in_mb|1/4 the size of the heap|8192|cassandra.yaml|
|11|phi_convict_threshold|#8|12|cassandra.yaml|

 






[jira] [Created] (CASSANDRA-15607) Incorrect Outcome returned when acquiring capacity for incoming message

2020-02-28 Thread Sam Tunnicliffe (Jira)
Sam Tunnicliffe created CASSANDRA-15607:
---

 Summary: Incorrect Outcome returned when acquiring capacity for 
incoming message
 Key: CASSANDRA-15607
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15607
 Project: Cassandra
  Issue Type: Bug
  Components: Messaging/Internode
Reporter: Sam Tunnicliffe


When acquiring capacity to process an inbound internode message, a failure to 
allocate from the endpoint-specific reserve returns the wrong {{Outcome}}. This 
means we only ever register with {{globalWaitQueue}}, although this probably 
doesn't actually ever get detected as the global and endpoint queues are always 
signalled together.
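
To make the report easier to follow, here is a small Python sketch of the intended behaviour, not the Cassandra implementation. The `Limit` class, the `acquire_capacity` function, the outcome names, and the order in which the two reserves are tried are all assumptions made for illustration. The point is only that a failure on the endpoint-specific reserve should be reported as an endpoint shortage, so the caller can register with the matching wait queue.

```python
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    INSUFFICIENT_ENDPOINT = "insufficient_endpoint"
    INSUFFICIENT_GLOBAL = "insufficient_global"


class Limit:
    """A simple byte budget with try-allocate and release."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def try_allocate(self, nbytes):
        if self.used + nbytes > self.capacity:
            return False
        self.used += nbytes
        return True

    def release(self, nbytes):
        self.used -= nbytes


def acquire_capacity(endpoint_reserve, global_reserve, nbytes):
    """Reserve nbytes from both budgets, reporting which one ran out."""
    if not global_reserve.try_allocate(nbytes):
        return Outcome.INSUFFICIENT_GLOBAL
    if not endpoint_reserve.try_allocate(nbytes):
        # Undo the global allocation and name the reserve that actually failed.
        global_reserve.release(nbytes)
        return Outcome.INSUFFICIENT_ENDPOINT
    return Outcome.SUCCESS
```

A caller would then pick the endpoint or global wait queue based on the returned value; returning the global outcome in the second branch would reproduce the symptom described in this ticket.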






[jira] [Comment Edited] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-02-28 Thread Jorge Bay (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047432#comment-17047432
 ] 

Jorge Bay edited comment on CASSANDRA-15299 at 2/28/20 10:27 AM:
-

Let me know if I can help in any way with this ticket (early review / tests).


was (Author: jorgebg):
Let me know if I can help in anyway with this ticket (early review / tests).

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> ---
>
> Key: CASSANDRA-15299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Client
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Normal
>  Labels: protocolv5
> Fix For: 4.0-beta
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it 
> introduced checksumming/CRC32 to request and response bodies. It’s an 
> important step forward, but it doesn’t cover the entire stream. In 
> particular, the message header is not covered by a checksum or a CRC, which 
> poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of 
> computing the CRC32 on the bytes written on the wire - losing the properties 
> of the CRC32. In some cases, due to this sequencing, attempting to decompress 
> a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, 
> also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for 
> an explanation of the two points above.
> Separately, there are some long-standing issues with the protocol - since 
> *way* before CASSANDRA-13304. Importantly, both checksumming and compression 
> operate on individual message bodies rather than frames of multiple complete 
> messages. In reality, this has several important additional downsides. To 
> name a couple:
> # For compression, we are getting poor compression ratios for smaller 
> messages - when operating on tiny sequences of bytes. In reality, for most 
> small requests and responses we are discarding the compressed value as it’d 
> be larger than the uncompressed one - incurring both redundant allocations 
> and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 
> 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the 
> checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we 
> should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode 
> messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, 
> and that we do it before native protocol v5 graduates from beta. See 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java
>  and 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.
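
The proposal above can be illustrated with a short Python sketch. It is not the native protocol v5 wire format and not the layout used by FrameDecoderCrc or FrameDecoderLZ4: the frame layout, the function names, and the big-endian field encoding are assumptions chosen for the example, and zlib stands in for LZ4 because it ships with the Python standard library. What it shows is several messages packed into one frame, with the CRC32 computed over the bytes actually written on the wire (header included) and checked before any decompression is attempted.

```python
import struct
import zlib


def encode_frame(messages):
    """Pack several messages into one compressed frame with a CRC32 trailer."""
    body = b"".join(struct.pack(">I", len(m)) + m for m in messages)
    payload = zlib.compress(body)
    header = struct.pack(">I", len(payload))
    # The CRC covers the header and the compressed bytes as sent on the wire.
    crc = zlib.crc32(header + payload)
    return header + payload + struct.pack(">I", crc)


def decode_frame(frame):
    """Verify the CRC first, then decompress and split into messages."""
    if len(frame) < 8:
        raise ValueError("frame too short")
    (length,) = struct.unpack(">I", frame[:4])
    if len(frame) != 8 + length:
        raise ValueError("length mismatch")
    covered, trailer = frame[:4 + length], frame[4 + length:]
    (expected,) = struct.unpack(">I", trailer)
    if zlib.crc32(covered) != expected:
        raise ValueError("CRC mismatch")
    # Only bytes that passed the CRC check ever reach the decompressor.
    body = zlib.decompress(covered[4:])
    messages, pos = [], 0
    while pos < len(body):
        (size,) = struct.unpack(">I", body[pos:pos + 4])
        messages.append(body[pos + 4:pos + 4 + size])
        pos += 4 + size
    return messages
```

In this sketch the four CRC bytes are paid once per frame rather than once per message, and the compressor sees the concatenated messages instead of each tiny body on its own.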






[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-02-28 Thread Jorge Bay (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17047432#comment-17047432
 ] 

Jorge Bay commented on CASSANDRA-15299:
---

Let me know if I can help in anyway with this ticket (early review / tests).

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> ---
>
> Key: CASSANDRA-15299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Client
>Reporter: Aleksey Yeschenko
>Assignee: Aleksey Yeschenko
>Priority: Normal
>  Labels: protocolv5
> Fix For: 4.0-beta
>
>


