[jira] [Updated] (CASSANDRA-16217) Minimal 4.0 COMPACT STORAGE backport

2020-12-07 Thread Alex Petrov (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov updated CASSANDRA-16217:

Resolution: Fixed
Status: Resolved  (was: Open)

> Minimal 4.0 COMPACT STORAGE backport
> 
>
> Key: CASSANDRA-16217
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16217
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/CQL
>Reporter: Alex Petrov
>Assignee: Alex Petrov
>Priority: Normal
> Fix For: 4.0-beta4
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> There are several behavioural changes related to compact storage, and these 
> differences are larger than most of us anticipated: we first thought there 
> would only be the “appearing column”, but there are also implicit nulls in 
> clusterings and row-vs-column deletion semantics.
> Some of the recent issues on the subject are CASSANDRA-16048, which allows 
> ignoring these differences, and CASSANDRA-15811, which tried to improve the 
> user experience for anyone still using compact storage.
> Easily reproducible differences are:
> (1) hidden columns show up, which breaks SELECT * queries (a small 
> driver-level sketch of this one follows below)
>  (2) DELETE v and UPDATE v WITH TTL result in row removals in 
> non-dense compact tables (CASSANDRA-16069)
>  (3) INSERT allows skipping clusterings, which are filled with nulls by 
> default.
> Some of these are tricky to support, as 15811 has shown. Anyone on the OSS 
> side who wants to upgrade to 4.0 while still using compact storage may be 
> affected by being forced into one of these behaviours.
> Possible solutions are to document these behaviours, or to bring back a 
> minimal subset of COMPACT STORAGE functionality to keep supporting them.
> It looks like it is possible to keep some of the functionality related to the 
> DENSE flag and allow it to be present in 4.0, but only for these three (and 
> potentially related, though not directly visible) cases.
> [~e.dimitrova] since you were working on the removal of compact storage, I 
> wanted to reassure you that this is not a revert of your patch. On the 
> contrary: your patch was instrumental in identifying the right places.
> cc [~slebresne] [~aleksey] [~benedict] [~marcuse]
> |[patch|https://github.com/apache/cassandra/pull/785]|[ci|https://app.circleci.com/pipelines/github/ifesdjeen/cassandra?branch=13994-followup]|
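
A minimal driver-level sketch of difference (1), in Python for illustration only; the keyspace/table names and the single local node are assumptions, and the snippet is not part of the patch linked above:

{code}
# Hypothetical illustration of difference (1): once COMPACT STORAGE is dropped,
# the internal 'column1' and 'value' columns become visible and change what
# SELECT * returns.
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(['127.0.0.1']).connect()
session.execute("CREATE KEYSPACE IF NOT EXISTS ks WITH replication = "
                "{'class': 'SimpleStrategy', 'replication_factor': '1'}")
# COMPACT STORAGE can only be declared on a 2.x/3.x node.
session.execute("CREATE TABLE IF NOT EXISTS ks.t (pk int PRIMARY KEY, v int) "
                "WITH COMPACT STORAGE")
session.execute("INSERT INTO ks.t (pk, v) VALUES (1, 10)")
session.execute("ALTER TABLE ks.t DROP COMPACT STORAGE")
# Expect extra 'column1'/'value' columns (NULL) alongside pk and v.
print(list(session.execute("SELECT * FROM ks.t")))
{code}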



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[cassandra-dtest] branch trunk updated: Follow-up: remove tests deprecated by CASSANDRA-16217

2020-12-07 Thread ifesdjeen
This is an automated email from the ASF dual-hosted git repository.

ifesdjeen pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra-dtest.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 6f97913  Follow-up: remove tests deprecated by CASSANDRA-16217
 new 6428562  Merge pull request #104 from 
ifesdjeen/CASSANDRA-16217-followup
6f97913 is described below

commit 6f97913eef843d449afd4cabaa7e78a73e9b226b
Author: Alex Petrov 
AuthorDate: Fri Nov 20 16:21:43 2020 +0100

Follow-up: remove tests deprecated by CASSANDRA-16217

Patch by Alex Petrov; reviewed by Ekaterina Dimitrova for CASSANDRA-16217
---
 upgrade_tests/upgrade_compact_storage.py | 116 ---
 1 file changed, 116 deletions(-)

diff --git a/upgrade_tests/upgrade_compact_storage.py b/upgrade_tests/upgrade_compact_storage.py
index 6732285..46b8e8e 100644
--- a/upgrade_tests/upgrade_compact_storage.py
+++ b/upgrade_tests/upgrade_compact_storage.py
@@ -53,25 +53,6 @@ class TestUpgradeSuperColumnsThrough(Tester):
         cluster.start()
         return cluster
 
-    def test_upgrade_compact_storage(self):
-        cluster = self.prepare(cassandra_version='github:apache/cassandra-3.0')
-        node = self.cluster.nodelist()[0]
-        session = self.patient_cql_connection(node, row_factory=dict_factory)
-
-        session.execute("CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy','replication_factor': '1' };")
-        session.execute("CREATE TABLE ks.compact_table (pk int PRIMARY KEY, col1 int, col2 int) WITH COMPACT STORAGE")
-
-        for i in range(1, 5):
-            session.execute("INSERT INTO ks.compact_table (pk, col1, col2) VALUES ({i}, {i}, {i})".format(i=i))
-
-        self.upgrade_to_version(VERSION_TRUNK, wait=False)
-        self.fixture_dtest_setup.allow_log_errors = True
-
-        time.sleep(5)
-        # After restart, it won't start
-        errors = len(node.grep_log("Compact Tables are not allowed in Cassandra starting with 4.0 version"))
-        assert errors > 0
-
     def test_mixed_cluster(self):
         cluster = self.prepare(num_nodes=2, cassandra_version=VERSION_311)
         node1, node2 = self.cluster.nodelist()
@@ -113,40 +94,6 @@ class TestUpgradeSuperColumnsThrough(Tester):
         assert (list(session.execute("SELECT * FROM ks.compact_table WHERE pk = 1")) ==
                 [{'col2': 1, 'pk': 1, 'column1': None, 'value': None, 'col1': 1}])
 
-    def test_force_readd_compact_storage(self):
-        cluster = self.prepare(cassandra_version=VERSION_311)
-        node = self.cluster.nodelist()[0]
-        session = self.patient_cql_connection(node, row_factory=dict_factory)
-
-        session.execute("CREATE KEYSPACE ks WITH replication = {'class': 'SimpleStrategy','replication_factor': '1' };")
-        session.execute("CREATE TABLE ks.compact_table (pk int PRIMARY KEY, col1 int, col2 int) WITH COMPACT STORAGE")
-
-        for i in range(1, 5):
-            session.execute("INSERT INTO ks.compact_table (pk, col1, col2) VALUES ({i}, {i}, {i})".format(i=i))
-
-        session.execute("ALTER TABLE ks.compact_table DROP COMPACT STORAGE")
-
-        self.upgrade_to_version(VERSION_TRUNK, wait=True)
-
-        session = self.patient_cql_connection(node, row_factory=dict_factory)
-        session.execute("update system_schema.tables set flags={} where keyspace_name='ks' and table_name='compact_table';")
-
-        assert (list(session.execute("SELECT * FROM ks.compact_table WHERE pk = 1")) ==
-                [{'col2': 1, 'pk': 1, 'column1': None, 'value': None, 'col1': 1}])
-
-        self.fixture_dtest_setup.allow_log_errors = True
-
-        node.stop(wait_other_notice=False)
-        node.set_install_dir(version=VERSION_TRUNK)
-        try:
-            node.start(wait_other_notice=False, wait_for_binary_proto=False, verbose=False)
-        except NodeError:
-            print("error")  # ignore
-        time.sleep(5)
-        # After restart, it won't start
-        errors = len(node.grep_log("Compact Tables are not allowed in Cassandra starting with 4.0 version"))
-        assert errors > 0
-
     def test_upgrade_with_dropped_compact_storage_index(self):
         cluster = self.prepare(cassandra_version=VERSION_311)
         node = self.cluster.nodelist()[0]
@@ -178,66 +125,3 @@ class TestUpgradeSuperColumnsThrough(Tester):
                 [{'col1': '50', 'column1': None, 'pk': '5', 'value': None}])
         assert (list(session.execute("SELECT * FROM ks.compact_table WHERE pk = '5'")) ==
                 [{'col1': '50', 'column1': None, 'pk': '5', 'value': None}])
-
-    def test_downgrade_after_failed_upgrade(self):
-        """
-        The purpose of this test is to show that after verifying early in the startup process in Cassandra 4.0 that
-        COMPACT STORAGE was not removed prior start of an upgrade, users can still successfully 

[jira] [Commented] (CASSANDRA-16143) Streaming fails when s SSTable writer finish() exceeds internode_tcp_user_timeout

2020-12-07 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245600#comment-17245600
 ] 

David Capwell commented on CASSANDRA-16143:
---

+1

> Streaming fails when s SSTable writer finish() exceeds 
> internode_tcp_user_timeout
> -
>
> Key: CASSANDRA-16143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16143
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> tl;dr The internode TCP user timeout that provides more responsive detection 
> of dead nodes for internode messages will cause streaming to fail if system 
> calls to fsync/fdatasync exceed the timeout (default 30s).
> To work around this, explicitly set internode_tcp_user_timeout to something 
> longer than fsync/fdatasync can take, or to zero to revert to the operating 
> system default.
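
A small sketch of that workaround, assuming the 4.0 yaml key name {{internode_tcp_user_timeout_in_ms}}, a stock config path, and a 60s ceiling; adjust all three for your environment:

{code}
# Hypothetical sketch: raise the internode TCP user timeout above the worst
# observed fsync/fdatasync stall, or set it to 0 to fall back to the OS default.
import yaml  # pip install pyyaml

path = '/etc/cassandra/cassandra.yaml'              # assumed location
with open(path) as f:
    conf = yaml.safe_load(f)
conf['internode_tcp_user_timeout_in_ms'] = 60000    # or 0 for the OS default
with open(path, 'w') as f:
    yaml.safe_dump(conf, f)                         # note: drops yaml comments
{code}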
> Details:
> While bootstrapping a replacement 4.0beta3 node in an existing cluster, 
> bootstrap streaming repeatedly failed with the streaming follower logging
> {code:java}
> ERROR 2020-09-10T14:29:34,711 [NettyStreaming-Outbound-1.1.1.1.7000:1] 
> org.apache.cassandra.streaming.StreamSession:693 - [Stream 
> #7cb67c00-f3ac-11ea-b940-f7836f164528] Streaming error occurred on session 
> with peer 1.1.1.1:7000
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:142)
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:90)
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:138)
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:89)
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:180)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:40)
>at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender$FileStreamTask.run(NettyStreamingMessageSender.java:347)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.50.Final.jar:4.1.50.Final]
>at java.lang.Thread.run(Thread.java:834) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:78)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender$FileStreamTask.run(NettyStreamingMessageSender.java:348)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>at java.util.concurrent.FutureTask.run(FutureTask.java:264) 
> [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> 

[jira] [Updated] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml

2020-12-07 Thread Jeremy Hanna (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-16315:
-
Description: 
Since CASSANDRA-7551, we gave the following advice for setting 
{{concurrent_compactors}}:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this.  While it's true 
that one can increase {{concurrent_compactors}} to improve the efficiency of 
compactions on machines with more cpu cores, the context switching with random 
IO and GC associated with bringing compaction data into the heap will work 
against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of 
{{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, so 
that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using SSD based storage, you can increase the number of 
{{concurrent_compactors}}.  However be aware that using too many concurrent 
compactors can have a detrimental effect such as GC pressure, more context 
switching among compactors and realtime operations, and more random IO pulling 
data for different compactions.  It's best to test and measure with your 
workload and hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write 
latency/throughput targets to see where the tipping point is - considering some 
constants around memory and heap size and configuration to keep it simple.

  was:
Since CASSANDRA-7551, we gave the following advice for setting 
concurrent_compactors:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this.  While it's true 
that one can increase {{concurrent_compactors}} to improve the efficiency of 
compactions on machines with more cpu cores, the context switching with random 
IO and GC associated with bringing compaction data into the heap will work 
against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of 
{{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, so 
that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using SSD based storage, you can increase the number of 
{{concurrent_compactors}}.  However be aware that using too many concurrent 
compactors can have a detrimental effect such as GC pressure, more context 
switching among compactors and realtime operations, and more random IO pulling 
data for different compactions.  It's best to test and measure with your 
workload and hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write 
latency/throughput targets to see where the tipping point is - considering some 
constants around memory and heap size and configuration to keep it simple.


> Remove bad advice on concurrent compactors from cassandra.yaml
> --
>
> Key: CASSANDRA-16315
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16315
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Jeremy Hanna
>Priority: Normal
>
> Since CASSANDRA-7551, we gave the following advice for setting 
> {{concurrent_compactors}}:
> {code}
> # If your data directories are backed by SSD, you should increase this
> # to the number of cores.
> {code}
> However, in practice there are a number of problems with this.  While it's 
> true that one can increase {{concurrent_compactors}} to improve the efficiency 
> of compactions on machines with more cpu cores, the context switching with 
> random IO and GC associated with bringing compaction data into the heap will 
> work against the additional parallelism.
> This has caused problems for those who have taken this advice literally.
> I propose that we adjust this language to give a limit on the number of 
> {{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, 
> so that new users do not stumble when reviewing whether to change defaults.
> See also CASSANDRA-7139 for a discussion on considerations.
> I see two short-term options to avoid new user pain:
> 1. Change the language to say something like 

[jira] [Updated] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml

2020-12-07 Thread Jeremy Hanna (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-16315:
-
Description: 
Since CASSANDRA-7551, we gave the following advice for setting 
concurrent_compactors:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this.  While it's true 
that one can increase {{concurrent_compactors}} to improve the efficiency of 
compactions on machines with more cpu cores, the context switching with random 
IO and GC associated with bringing compaction data into the heap will work 
against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of 
{{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, so 
that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using fast SSD, you can increase the number of {{concurrent_compactors}}.  
However be aware that using too many concurrent compactors can have a 
detrimental effect such as GC pressure, more context switching among compactors 
and realtime operations, and more random IO pulling data for different 
compactions.  It's best to test and measure with your workload and hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write 
latency/throughput targets to see where the tipping point is - considering some 
constants around memory and heap size and configuration to keep it simple.

  was:
Since CASSANDRA-7551, we gave the following advice for setting 
concurrent_compactors:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this.  While it's true 
that one can increase concurrent_compactors to improve the efficiency of 
compactions on machines with more cpu cores, the context switching with random 
IO and GC associated with bringing compaction data into the heap will work 
against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of 
concurrent_compactors for this setting, both in the 3.x line and in trunk, so 
that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using fast SSD, you can increase the number of {{concurrent_compactors}}.  
However be aware that using too many concurrent compactors can have a 
detrimental effect such as GC pressure, more context switching among compactors 
and realtime operations, and more random IO pulling data for different 
compactions.  It's best to test and measure with your workload and hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write 
latency/throughput targets to see where the tipping point is - considering some 
constants around memory and heap size and configuration to keep it simple.


> Remove bad advice on concurrent compactors from cassandra.yaml
> --
>
> Key: CASSANDRA-16315
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16315
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Jeremy Hanna
>Priority: Normal
>
> Since CASSANDRA-7551, we gave the following advice for setting 
> concurrent_compactors:
> {code}
> # If your data directories are backed by SSD, you should increase this
> # to the number of cores.
> {code}
> However, in practice there are a number of problems with this.  While it's 
> true that one can increase {{concurrent_compactors}} to improve the efficiency 
> of compactions on machines with more cpu cores, the context switching with 
> random IO and GC associated with bringing compaction data into the heap will 
> work against the additional parallelism.
> This has caused problems for those who have taken this advice literally.
> I propose that we adjust this language to give a limit on the number of 
> {{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, 
> so that new users do not stumble when reviewing whether to change defaults.
> See also CASSANDRA-7139 for a discussion on considerations.
> I see two short-term options to avoid new user pain:
> 1. Change the language to say something like this:
> {quote}
> When using fast SSD, 

[jira] [Updated] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml

2020-12-07 Thread Jeremy Hanna (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16315?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-16315:
-
Description: 
Since CASSANDRA-7551, we gave the following advice for setting 
concurrent_compactors:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this.  While it's true 
that one can increase {{concurrent_compactors}} to improve the efficiency of 
compactions on machines with more cpu cores, the context switching with random 
IO and GC associated with bringing compaction data into the heap will work 
against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of 
{{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, so 
that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using SSD based storage, you can increase the number of 
{{concurrent_compactors}}.  However be aware that using too many concurrent 
compactors can have a detrimental effect such as GC pressure, more context 
switching among compactors and realtime operations, and more random IO pulling 
data for different compactions.  It's best to test and measure with your 
workload and hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write 
latency/throughput targets to see where the tipping point is - considering some 
constants around memory and heap size and configuration to keep it simple.

  was:
Since CASSANDRA-7551, we gave the following advice for setting 
concurrent_compactors:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this.  While it's true 
that one can increase {{concurrent_compactors}} to improve the efficiency of 
compactions on machines with more cpu cores, the context switching with random 
IO and GC associated with bringing compaction data into the heap will work 
against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of 
{{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, so 
that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using fast SSD, you can increase the number of {{concurrent_compactors}}.  
However be aware that using too many concurrent compactors can have a 
detrimental effect such as GC pressure, more context switching among compactors 
and realtime operations, and more random IO pulling data for different 
compactions.  It's best to test and measure with your workload and hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write 
latency/throughput targets to see where the tipping point is - considering some 
constants around memory and heap size and configuration to keep it simple.


> Remove bad advice on concurrent compactors from cassandra.yaml
> --
>
> Key: CASSANDRA-16315
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16315
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Local/Config
>Reporter: Jeremy Hanna
>Priority: Normal
>
> Since CASSANDRA-7551, we gave the following advice for setting 
> concurrent_compactors:
> {code}
> # If your data directories are backed by SSD, you should increase this
> # to the number of cores.
> {code}
> However, in practice there are a number of problems with this.  While it's 
> true that one can increase {{concurrent_compactors}} to improve the efficiency 
> of compactions on machines with more cpu cores, the context switching with 
> random IO and GC associated with bringing compaction data into the heap will 
> work against the additional parallelism.
> This has caused problems for those who have taken this advice literally.
> I propose that we adjust this language to give a limit on the number of 
> {{concurrent_compactors}} for this setting, both in the 3.x line and in trunk, 
> so that new users do not stumble when reviewing whether to change defaults.
> See also CASSANDRA-7139 for a discussion on considerations.
> I see two short-term options to avoid new user pain:
> 1. Change the language to say something like this:
> {quote}
> 

[jira] [Created] (CASSANDRA-16315) Remove bad advice on concurrent compactors from cassandra.yaml

2020-12-07 Thread Jeremy Hanna (Jira)
Jeremy Hanna created CASSANDRA-16315:


 Summary: Remove bad advice on concurrent compactors from 
cassandra.yaml
 Key: CASSANDRA-16315
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16315
 Project: Cassandra
  Issue Type: Improvement
  Components: Local/Config
Reporter: Jeremy Hanna


Since CASSANDRA-7551, we gave the following advice for setting 
concurrent_compactors:

{code}
# If your data directories are backed by SSD, you should increase this
# to the number of cores.
{code}

However, in practice there are a number of problems with this.  While it's true 
that one can increase concurrent_compactors to improve the efficiency of 
compactions on machines with more cpu cores, the context switching with random 
IO and GC associated with bringing compaction data into the heap will work 
against the additional parallelism.

This has caused problems for those who have taken this advice literally.

I propose that we adjust this language to give a limit on the number of 
concurrent_compactors for this setting, both in the 3.x line and in trunk, so 
that new users do not stumble when reviewing whether to change defaults.

See also CASSANDRA-7139 for a discussion on considerations.

I see two short-term options to avoid new user pain:

1. Change the language to say something like this:

{quote}
When using fast SSD, you can increase the number of {{concurrent_compactors}}.  
However be aware that using too many concurrent compactors can have a 
detrimental effect such as GC pressure, more context switching among compactors 
and realtime operations, and more random IO pulling data for different 
compactions.  It's best to test and measure with your workload and hardware.
{quote}

2. Do some significant testing of compaction efficiency and read/write 
latency/throughput targets to see where the tipping point is - considering some 
constants around memory and heap size and configuration to keep it simple.
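
To make option 1 concrete, a small Python sketch of the kind of bounded heuristic the reworded advice points at; the cap of 8 and the floor of 2 are assumptions for illustration, not a project decision:

{code}
# Hypothetical heuristic: scale compactors with cores on SSD-backed data
# directories, but keep a cap so compaction parallelism does not drown
# realtime operations in GC pressure, context switching and random IO.
import os

def suggested_concurrent_compactors(cap=8):
    cores = os.cpu_count() or 2
    return max(2, min(cap, cores))

print(suggested_concurrent_compactors())
{code}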



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16314) nodetool cleanup not working

2020-12-07 Thread AaronTrazona (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AaronTrazona updated CASSANDRA-16314:
-
Resolution: (was: Not A Problem)
Status: Open  (was: Resolved)

> nodetool cleanup not working
> 
>
> Key: CASSANDRA-16314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16314
> Project: Cassandra
>  Issue Type: Bug
>Reporter: AaronTrazona
>Priority: Low
> Attachments: image-2020-12-07-09-23-02-002.png, 
> image-2020-12-07-09-23-33-788.png, image-2020-12-07-09-24-54-453.png, 
> image-2020-12-07-09-26-28-702.png
>
>
> Hi,
>  
> After setting up the 3 nodes, I want to free up the disk on my first 
> node since the previous data is still there.
> This is the nodetool status before running nodetool cleanup:
> !image-2020-12-07-09-23-02-002.png!
> When I run nodetool cleanup:
> !image-2020-12-07-09-23-33-788.png!
> After I run nodetool cleanup, I check whether the node freed up the space. 
> This is the result:
> !image-2020-12-07-09-24-54-453.png!
> It seems that nodetool cleanup is not working well.
> Cassandra version and Java version:
> !image-2020-12-07-09-26-28-702.png!
> Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16314) nodetool cleanup not working

2020-12-07 Thread AaronTrazona (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

AaronTrazona updated CASSANDRA-16314:
-
Severity: Low

> nodetool cleanup not working
> 
>
> Key: CASSANDRA-16314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16314
> Project: Cassandra
>  Issue Type: Bug
>Reporter: AaronTrazona
>Priority: Low
> Attachments: image-2020-12-07-09-23-02-002.png, 
> image-2020-12-07-09-23-33-788.png, image-2020-12-07-09-24-54-453.png, 
> image-2020-12-07-09-26-28-702.png
>
>
> Hi,
>  
> After setting up the 3 nodes, I want to free up the disk on my first 
> node since the previous data is still there.
> This is the nodetool status before running nodetool cleanup:
> !image-2020-12-07-09-23-02-002.png!
> When I run nodetool cleanup:
> !image-2020-12-07-09-23-33-788.png!
> After I run nodetool cleanup, I check whether the node freed up the space. 
> This is the result:
> !image-2020-12-07-09-24-54-453.png!
> It seems that nodetool cleanup is not working well.
> Cassandra version and Java version:
> !image-2020-12-07-09-26-28-702.png!
> Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16314) nodetool cleanup not working

2020-12-07 Thread AaronTrazona (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245589#comment-17245589
 ] 

AaronTrazona commented on CASSANDRA-16314:
--

Hi Romain,

 

This is the scenario: I had 1 node at first (159), and then decided to add 2 new 
nodes, so I added 201 and 116. The data is now being distributed across the 3 
nodes. I want to remove the old data on 159, so I ran "nodetool -h 
10.147.18.159 cleanup", but after that the load is still the same, 2MiB.

 

Let me know whether I have correctly understood what nodetool cleanup is used for.



Thanks.

> nodetool cleanup not working
> 
>
> Key: CASSANDRA-16314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16314
> Project: Cassandra
>  Issue Type: Bug
>Reporter: AaronTrazona
>Priority: Normal
> Attachments: image-2020-12-07-09-23-02-002.png, 
> image-2020-12-07-09-23-33-788.png, image-2020-12-07-09-24-54-453.png, 
> image-2020-12-07-09-26-28-702.png
>
>
> Hi,
>  
> After setting up the 3 nodes, I want to free up the disk on my first 
> node since the previous data is still there.
> This is the nodetool status before running nodetool cleanup:
> !image-2020-12-07-09-23-02-002.png!
> When I run nodetool cleanup:
> !image-2020-12-07-09-23-33-788.png!
> After I run nodetool cleanup, I check whether the node freed up the space. 
> This is the result:
> !image-2020-12-07-09-24-54-453.png!
> It seems that nodetool cleanup is not working well.
> Cassandra version and Java version:
> !image-2020-12-07-09-26-28-702.png!
> Thanks.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16143) Streaming fails when s SSTable writer finish() exceeds internode_tcp_user_timeout

2020-12-07 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245585#comment-17245585
 ] 

Yifan Cai commented on CASSANDRA-16143:
---

Thanks David! 
Pushed a new commit. 

bq. should we use NoSpamLogger vs normal logger?

Using {{NoSpamLogger}} sounds better. The warning message is emitted when the 
disk goes slow, and in that case we can expect a lot of those warnings. Logging 
every occurrence does not provide additional help from the operators' 
perspective. I have changed the logging interval to 1 minute.
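
For readers unfamiliar with the idea, a rough Python analogy of what rate-limited logging buys here; this is an illustration only, not Cassandra's {{NoSpamLogger}} API:

{code}
import logging
import time

_last_emit = {}

def no_spam_warn(logger, key, msg, interval_s=60.0):
    # Emit the warning for `key` at most once per interval, so a persistently
    # slow disk yields one warning per minute rather than one per flush.
    now = time.monotonic()
    if now - _last_emit.get(key, 0.0) >= interval_s:
        _last_emit[key] = now
        logger.warning(msg)

no_spam_warn(logging.getLogger("streaming"), "slow-flush",
             "flush took longer than internode_tcp_user_timeout")
{code}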

> Streaming fails when s SSTable writer finish() exceeds 
> internode_tcp_user_timeout
> -
>
> Key: CASSANDRA-16143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16143
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> tl;dr The internode TCP user timeout that provides more responsive detection 
> of dead nodes for internode messages will cause streaming to fail if system 
> calls to fsync/fdatasync exceed the timeout (default 30s).
> To work around this, explicitly set internode_tcp_user_timeout to something 
> longer than fsync/fdatasync can take, or to zero to revert to the operating 
> system default.
> Details:
> While bootstrapping a replacement 4.0beta3 node in an existing cluster, 
> bootstrap streaming repeatedly failed with the streaming follower logging
> {code:java}
> ERROR 2020-09-10T14:29:34,711 [NettyStreaming-Outbound-1.1.1.1.7000:1] 
> org.apache.cassandra.streaming.StreamSession:693 - [Stream 
> #7cb67c00-f3ac-11ea-b940-f7836f164528] Streaming error occurred on session 
> with peer 1.1.1.1:7000
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:142)
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:90)
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:138)
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:89)
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:180)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:40)
>at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender$FileStreamTask.run(NettyStreamingMessageSender.java:347)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.50.Final.jar:4.1.50.Final]
>at java.lang.Thread.run(Thread.java:834) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:78)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender$FileStreamTask.run(NettyStreamingMessageSender.java:348)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>

[jira] [Commented] (CASSANDRA-16143) Streaming fails when s SSTable writer finish() exceeds internode_tcp_user_timeout

2020-12-07 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245562#comment-17245562
 ] 

David Capwell commented on CASSANDRA-16143:
---

Finished first round of review, overall LGTM; only minor comments.

1) typo in docs; have an extra 0
2) should we use NoSpamLogger vs normal logger?

> Streaming fails when s SSTable writer finish() exceeds 
> internode_tcp_user_timeout
> -
>
> Key: CASSANDRA-16143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16143
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> tl;dr The internode TCP user timeout that provides more responsive detection 
> of dead nodes for internode messages will cause streaming to fail if system 
> calls to fsync/fdatasync exceed the timeout (default 30s).
> To work around this, explicitly set internode_tcp_user_timeout to something 
> longer than fsync/fdatasync can take, or to zero to revert to the operating 
> system default.
> Details:
> While bootstrapping a replacement 4.0beta3 node in an existing cluster, 
> bootstrap streaming repeatedly failed with the streaming follower logging
> {code:java}
> ERROR 2020-09-10T14:29:34,711 [NettyStreaming-Outbound-1.1.1.1.7000:1] 
> org.apache.cassandra.streaming.StreamSession:693 - [Stream 
> #7cb67c00-f3ac-11ea-b940-f7836f164528] Streaming error occurred on session 
> with peer 1.1.1.1:7000
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:142)
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:90)
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:138)
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:89)
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:180)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:40)
>at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender$FileStreamTask.run(NettyStreamingMessageSender.java:347)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.50.Final.jar:4.1.50.Final]
>at java.lang.Thread.run(Thread.java:834) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:78)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender$FileStreamTask.run(NettyStreamingMessageSender.java:348)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>at java.util.concurrent.FutureTask.run(FutureTask.java:264) 
> [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> 

[jira] [Commented] (CASSANDRA-16079) Improve dtest runtime

2020-12-07 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245558#comment-17245558
 ] 

Adam Holmberg commented on CASSANDRA-16079:
---

 Everything looks pretty good to me. Thanks Mick for exploring this through 
multiple iterations!

I put a handful of comments on the ccm changes. Nothing too crazy -- most are 
"take 'em or leave 'em"
Tentative +1 from me on both ccm and dtest changes.

Have dtests been run on any of the previous supported C* branches? I tried 
kicking a few off and the dtests all show "blocked" in CircleCI. I've not tried 
to run Circle on earlier branches before, so I wasn't sure what to expect.

> Improve dtest runtime
> -
>
> Key: CASSANDRA-16079
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16079
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Adam Holmberg
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Screenshot 2020-09-19 at 12.32.21.png
>
>
> A recent ticket, CASSANDRA-13701, changed the way dtests run, resulting in a 
> [30% increase in run 
> time|https://www.mail-archive.com/dev@cassandra.apache.org/msg15606.html]. 
> While that change was accepted, we wanted to spin out a ticket to optimize 
> dtests in an attempt to gain back some of that runtime.
> At this time we don't have concrete improvements in mind, so the first order 
> of business for this ticket will be to analyze the current state of things and 
> try to identify some valuable optimizations. Once the problems are understood, 
> we will break the work down into subtasks.
> Some areas to consider:
> * cluster reuse
> * C* startup optimizations
> * Tests that should be ported to in-JVM dtest or even unit tests



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16311) Extend the exclusion of replica filtering protection to other indices instead of just SASI

2020-12-07 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic updated CASSANDRA-16311:
--
Description: 
CASSANDRA-8272 introduced a check that tells whether an index is a SASI index; 
if it is, replica filtering protection is not triggered.

There might be other custom index implementations which also do not support 
filtering protection; they are not necessarily SASI indices, however, and it is 
currently not possible to exclude them.

 

https://github.com/apache/cassandra/pull/844

  was:
CASSANDRA-8272 introduced a check that tells whether an index is a SASI index; 
if it is, replica filtering protection is not triggered.

There might be other custom index implementations which also do not support 
filtering protection; they are not necessarily SASI indices, however, and it is 
currently not possible to exclude them.


> Extend the exclusion of replica filtering protection to other indices instead 
> of just SASI
> --
>
> Key: CASSANDRA-16311
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16311
> Project: Cassandra
>  Issue Type: Task
>  Components: Feature/2i Index, Feature/SASI
>Reporter: Stefan Miklosovic
>Assignee: Stefan Miklosovic
>Priority: Normal
>
> CASSANDRA-8272 introduced a check that tells whether an index is a SASI 
> index; if it is, replica filtering protection is not triggered.
> There might be other custom index implementations which also do not support 
> filtering protection; they are not necessarily SASI indices, however, and it 
> is currently not possible to exclude them.
>  
> https://github.com/apache/cassandra/pull/844



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-16290) Consistency can be violated when bootstrap or decommission is resumed after node restart

2020-12-07 Thread Stefan Miklosovic (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Miklosovic reassigned CASSANDRA-16290:
-

Assignee: (was: Stefan Miklosovic)

> Consistency can be violated when bootstrap or decommission is resumed after 
> node restart
> 
>
> Key: CASSANDRA-16290
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16290
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Bootstrap and Decommission
>Reporter: Paulo Motta
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>
> Since CASSANDRA-12008, successfully transferred ranges during decommission 
> are saved in the {{system.transferred_ranges}} table. This allows skipping 
> ranges already transferred when a failed decommission is retried with 
> {{nodetool decommission}}.
> If instead of resuming the decommission, an operator restarts the node, waits 
> N minutes and then performs a new decommission, the previously transferred 
> ranges will be skipped during streaming, and any writes received by the 
> decommissioned node during these N minutes will not be replicated to the new 
> range owner, which violates consistency.
> This issue is analogous to the issue mentioned [on this 
> comment|https://issues.apache.org/jira/browse/CASSANDRA-8838?focusedCommentId=16900234=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16900234]
>  for resumable bootstrap (CASSANDRA-8838).
> In order to prevent consistency violations we should clear the 
> {{system.transferred_ranges}} state during node restart, and maybe add a 
> system property to disable that. While we're at it, we should change the 
> default of {{-Dcassandra.reset_bootstrap_progress}} to {{true}} to clear the 
> {{system.available_ranges}} state by default when a bootstrapping node is 
> restarted.
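
For reference, the decommission bookkeeping in question can be inspected with a couple of lines of driver code; the local contact point is an assumption, and clearing this state on restart is what the ticket proposes, not what this snippet does:

{code}
# Hypothetical sketch: look at the resumable-decommission state saved by
# CASSANDRA-12008 on the node being decommissioned.
from cassandra.cluster import Cluster  # pip install cassandra-driver

session = Cluster(['127.0.0.1']).connect()
for row in session.execute("SELECT * FROM system.transferred_ranges"):
    print(row)
{code}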



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-16143) Streaming fails when s SSTable writer finish() exceeds internode_tcp_user_timeout

2020-12-07 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16143:
--
Reviewers: Adam Holmberg, Benjamin Lerer, Berenguer Blasi, David Capwell  
(was: Adam Holmberg, Benjamin Lerer, Berenguer Blasi)

> Streaming fails when s SSTable writer finish() exceeds 
> internode_tcp_user_timeout
> -
>
> Key: CASSANDRA-16143
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16143
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Internode
>Reporter: Jon Meredith
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> tl;dr The internode TCP user timeout that provides more responsive detection 
> of dead nodes for internode messages will cause streaming to fail if system 
> calls to fsync/fdatasync exceed the timeout (default 30s).
> To work around this, explicitly set internode_tcp_user_timeout to something 
> longer than fsync/fdatasync can take, or to zero to revert to the operating 
> system default.
> Details:
> While bootstrapping a replacement 4.0beta3 node in an existing cluster, 
> bootstrap streaming repeatedly failed with the streaming follower logging
> {code:java}
> ERROR 2020-09-10T14:29:34,711 [NettyStreaming-Outbound-1.1.1.1.7000:1] 
> org.apache.cassandra.streaming.StreamSession:693 - [Stream 
> #7cb67c00-f3ac-11ea-b940-f7836f164528] Streaming error occurred on session 
> with peer 1.1.1.1:7000
> org.apache.cassandra.net.AsyncChannelOutputPlus$FlushException: The channel 
> this output stream was writing to has been closed
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.propagateFailedFlush(AsyncChannelOutputPlus.java:200)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitUntilFlushed(AsyncChannelOutputPlus.java:158)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.waitForSpace(AsyncChannelOutputPlus.java:140)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.beginFlush(AsyncChannelOutputPlus.java:97)
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.lambda$writeToChannel$0(AsyncStreamingOutputPlus.java:142)
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.lambda$write$0(CassandraCompressedStreamWriter.java:90)
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.writeToChannel(AsyncStreamingOutputPlus.java:138)
>at 
> org.apache.cassandra.db.streaming.CassandraCompressedStreamWriter.write(CassandraCompressedStreamWriter.java:89)
>at 
> org.apache.cassandra.db.streaming.CassandraOutgoingFile.write(CassandraOutgoingFile.java:180)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage.serialize(OutgoingStreamMessage.java:87)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:45)
>at 
> org.apache.cassandra.streaming.messages.OutgoingStreamMessage$1.serialize(OutgoingStreamMessage.java:34)
>at 
> org.apache.cassandra.streaming.messages.StreamMessage.serialize(StreamMessage.java:40)
>at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender$FileStreamTask.run(NettyStreamingMessageSender.java:347)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>  [?:?]
>at 
> io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
>  [netty-all-4.1.50.Final.jar:4.1.50.Final]
>at java.lang.Thread.run(Thread.java:834) [?:?]
>Suppressed: java.nio.channels.ClosedChannelException
>at 
> org.apache.cassandra.net.AsyncStreamingOutputPlus.doFlush(AsyncStreamingOutputPlus.java:78)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.flush(AsyncChannelOutputPlus.java:229)
>at 
> org.apache.cassandra.net.AsyncChannelOutputPlus.close(AsyncChannelOutputPlus.java:248)
>at 
> org.apache.cassandra.streaming.async.NettyStreamingMessageSender$FileStreamTask.run(NettyStreamingMessageSender.java:348)
>at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
>at java.util.concurrent.FutureTask.run(FutureTask.java:264) 
> [?:?]
>at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>  [?:?]
>at 
> 

[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-12-07 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245464#comment-17245464
 ] 

David Capwell commented on CASSANDRA-16213:
---

Thanks [~samt]!

Will wait for [~paulo] to review.

> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated with multiple node 
> failures.
> A simplified case to trigger this is the following:
> *) Have an N-node cluster
> *) Shut down all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace the seed)
> *) Host replace the N-1th node -> this will fail with the above error
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-12-07 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245458#comment-17245458
 ] 

Sam Tunnicliffe commented on CASSANDRA-16213:
-

+1 from me too, thanks for incorporating my suggestions.

Nit typo: {{HostReplacementOfDowedClusterTest}}

> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated with multiple node 
> failures.
> A simplified case to trigger this is the following:
> *) Have an N-node cluster
> *) Shut down all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace the seed)
> *) Host replace the N-1th node -> this will fail with the above error
> The reason this happens is that the N-1th node isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Updated] (CASSANDRA-15369) Fake row deletions and range tombstones, causing digest mismatch and sstable growth

2020-12-07 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15369:
---
Fix Version/s: (was: 4.0-beta)
   4.0-beta4
   4.0

> Fake row deletions and range tombstones, causing digest mismatch and sstable 
> growth
> ---
>
> Key: CASSANDRA-15369
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15369
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Benedict Elliott Smith
>Assignee: Zhao Yang
>Priority: Normal
> Fix For: 4.0, 4.0-beta4
>
>
> As assessed in CASSANDRA-15363, we generate fake row deletions and fake 
> tombstone markers under various circumstances:
>  * If we perform a clustering key query (or select a compact column):
>  ** Serving from a {{Memtable}}, we will generate fake row deletions
>  ** Serving from an sstable, we will generate fake row tombstone markers
>  * If we perform a slice query, we will generate only fake row tombstone 
> markers for any range tombstone that begins or ends outside of the limit of 
> the requested slice
>  * If we perform a multi-slice or IN query, this will occur for each 
> slice/clustering
> Unfortunately, these different behaviours can lead to very different data 
> stored in sstables until a full repair is run.  When we read-repair, we only 
> send these fake deletions or range tombstones.  A fake row deletion, a 
> clustering RT and a slice RT each produce a different digest.  So for each 
> single point lookup we can produce a digest mismatch twice, and until a full 
> repair is run we can encounter an unlimited number of digest mismatches 
> across different overlapping queries.
> Relatedly, this seems to be a more problematic variant of our atomicity failures 
> caused by our monotonic reads, since RTs can have an atomic effect across (up 
> to) the entire partition, whereas the propagation may happen on an 
> arbitrarily small portion.  If the RT exists on only one node, this could 
> plausibly lead to a fairly problematic scenario if that node fails before the 
> range can be repaired.
> At the very least, this behaviour can lead to an almost unlimited amount of 
> extraneous data being stored until the range is repaired and compaction 
> happens to overwrite the sub-range RTs and row deletions.






[jira] [Updated] (CASSANDRA-16213) Cannot replace_address /X because it doesn't exist in gossip

2020-12-07 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-16213:
--
Status: Review In Progress  (was: Changes Suggested)

> Cannot replace_address /X because it doesn't exist in gossip
> 
>
> Key: CASSANDRA-16213
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16213
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip, Cluster/Membership
>Reporter: David Capwell
>Assignee: David Capwell
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We see this exception around nodes crashing and trying to do a host 
> replacement; this error appears to be correlated with multiple node 
> failures.
> A simplified case to trigger this is the following:
> *) Have an N-node cluster
> *) Shut down all N nodes
> *) Bring up N-1 nodes (at least 1 seed, else replace the seed)
> *) Host replace the remaining node -> this will fail with the above
> The reason this happens is that the node being replaced isn’t gossiping anymore, and 
> the existing nodes do not have its details in gossip (but do have the details in 
> the peers table), so the host replacement fails as the node isn’t known in 
> gossip.
> This affects all versions (tested 3.0 and trunk, assume 2.2 as well)






[jira] [Comment Edited] (CASSANDRA-16079) Improve dtest runtime

2020-12-07 Thread Michael Semb Wever (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244710#comment-17244710
 ] 

Michael Semb Wever edited comment on CASSANDRA-16079 at 12/7/20, 6:02 PM:
--

bq. Did you notice any speed improvements? All runs seem to be more or less the 
same when I look.

The objective of this ticket is, from the description: "…optimize dtests in an 
attempt to gain back some of that runtime."

If runs look to have more or less the same performance, given they now 
include 13701, then that is a win.

Here is another run, again showing performance consistent with (or possibly 
better than) trunk: 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/262/

So… as long as we have made improvements that mostly offset the cost introduced 
by 13701, I believe this ticket should be good to go. That is, this ticket 
is a blocker to 13701 getting merged into 4.0, hence a blocker 
to 4.0. I don't have a problem with further identified dtest performance 
improvements being spun out to separate tickets (that don't block 4.0).

What are your thoughts [~Bereng], [~aholmber], [~paulo]?

For me the only remaining question here is whether there are other dtests that 
should go through the runtime token allocation strategy rather than using the 
pre-generated tokens, that is, using the cluster env variable 
{{'CASSANDRA_TOKEN_PREGENERATION_DISABLED'}}.


was (Author: michaelsembwever):
bq. Did you notice any speed improvements? All runs seem to be more or less the 
same when I look.

The objective of this ticket is, from the description: "…optimize dtests in an 
attempt to gain back some of that runtime."

If runs look to have more or less the same performance, given they now 
include 13701, then that is a win.

Here is another run, again showing performance consistent with (or possibly 
better than) trunk: 
https://ci-cassandra.apache.org/view/patches/job/Cassandra-devbranch-dtest/262/

So… as long as we have made improvements that mostly offset the cost introduced 
by 13701, I believe this ticket should be good to go. That is, this ticket 
is a blocker to 13701 getting merged into 4.0, hence a blocker 
to 4.0. I don't have a problem with further identified dtest performance 
improvements being spun out to separate tickets (that don't block 4.0).

What are your thoughts [~Bereng], [~aholmber], [~pauloricardomg]?

For me the only remaining question here is whether there are other dtests that 
should go through the runtime token allocation strategy rather than using the 
pre-generated tokens, that is, using the cluster env variable 
{{'CASSANDRA_TOKEN_PREGENERATION_DISABLED'}}.

> Improve dtest runtime
> -
>
> Key: CASSANDRA-16079
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16079
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Adam Holmberg
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Screenshot 2020-09-19 at 12.32.21.png
>
>
> A recent ticket, CASSANDRA-13701, changed the way dtests run, resulting in a 
> [30% increase in run 
> time|https://www.mail-archive.com/dev@cassandra.apache.org/msg15606.html]. 
> While that change was accepted, we wanted to spin out a ticket to optimize 
> dtests in an attempt to gain back some of that runtime.
> At this time we don't have concrete improvements in mind, so the first order 
> of business for this ticket will be to analyze the current state of things 
> and try to identify some valuable optimizations. Once the problems are understood, we 
> will break down subtasks to divide the work.
> Some areas to consider:
> * cluster reuse
> * C* startup optimizations
> * Tests that should be ported to in-JVM dtest or even unit tests






[jira] [Commented] (CASSANDRA-16079) Improve dtest runtime

2020-12-07 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245376#comment-17245376
 ] 

Adam Holmberg commented on CASSANDRA-16079:
---

Thanks for the ping. I'm looking at it now.

(also paging [~paulo] with his active handle)

> Improve dtest runtime
> -
>
> Key: CASSANDRA-16079
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16079
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Adam Holmberg
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Screenshot 2020-09-19 at 12.32.21.png
>
>
> A recent ticket, CASSANDRA-13701, changed the way dtests run, resulting in a 
> [30% increase in run 
> time|https://www.mail-archive.com/dev@cassandra.apache.org/msg15606.html]. 
> While that change was accepted, we wanted to spin out a ticket to optimize 
> dtests in an attempt to gain back some of that runtime.
> At this time we don't have concrete improvements in mind, so the first order 
> of business for this ticket will be to analyze the current state of things 
> and try to identify some valuable optimizations. Once the problems are understood, we 
> will break down subtasks to divide the work.
> Some areas to consider:
> * cluster reuse
> * C* startup optimizations
> * Tests that should be ported to in-JVM dtest or even unit tests






[jira] [Updated] (CASSANDRA-16079) Improve dtest runtime

2020-12-07 Thread Adam Holmberg (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Holmberg updated CASSANDRA-16079:
--
Reviewers: Adam Holmberg
   Status: Review In Progress  (was: Patch Available)

> Improve dtest runtime
> -
>
> Key: CASSANDRA-16079
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16079
> Project: Cassandra
>  Issue Type: Improvement
>  Components: CI
>Reporter: Adam Holmberg
>Assignee: Michael Semb Wever
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Screenshot 2020-09-19 at 12.32.21.png
>
>
> A recent ticket, CASSANDRA-13701, changed the way dtests run, resulting in a 
> [30% increase in run 
> time|https://www.mail-archive.com/dev@cassandra.apache.org/msg15606.html]. 
> While that change was accepted, we wanted to spin out a ticket to optimize 
> dtests in an attempt to gain back some of that runtime.
> At this time we don't have concrete improvements in mind, so the first order 
> of business for this ticket will be to analyze the current state of things 
> and try to identify some valuable optimizations. Once the problems are understood, we 
> will break down subtasks to divide the work.
> Some areas to consider:
> * cluster reuse
> * C* startup optimizations
> * Tests that should be ported to in-JVM dtest or even unit tests






[jira] [Assigned] (CASSANDRA-16078) Performance regression for queries accessing multiple rows

2020-12-07 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams reassigned CASSANDRA-16078:


Assignee: Brandon Williams

> Performance regression for queries accessing multiple rows
> --
>
> Key: CASSANDRA-16078
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16078
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination
>Reporter: David Capwell
>Assignee: Brandon Williams
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: image.png
>
>
> This is a spin-off from CASSANDRA-16036.
> In testing 4.0 relative to 3.0* I found that queries which access multiple 
> rows have a noticeable performance decrease; two queries were used in the 
> test (more may be impacted, others might not): querying a partition (the table has 
> clustering keys) with LIMIT, and querying clustering keys using an IN clause.
> In the below graphs the green line is 3.0 and the other lines are 4.0 (with 
> and without chunk cache).
> Partition with LIMIT
> !https://issues.apache.org/jira/secure/attachment/13009751/clustering-slice_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009750/clustering-slice_latency_under90_selects_baseline.png!
> Cluster with IN clause
> !https://issues.apache.org/jira/secure/attachment/13009749/clustering-in-clause_latency_selects_baseline.png!
> !https://issues.apache.org/jira/secure/attachment/13009748/clustering-in-clause_latency_under90_selects_baseline.png!






[jira] [Updated] (CASSANDRA-16314) nodetool cleanup not working

2020-12-07 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16314:
-
Resolution: Not A Problem
Status: Resolved  (was: Triage Needed)

> nodetool cleanup not working
> 
>
> Key: CASSANDRA-16314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16314
> Project: Cassandra
>  Issue Type: Bug
>Reporter: AaronTrazona
>Priority: Normal
> Attachments: image-2020-12-07-09-23-02-002.png, 
> image-2020-12-07-09-23-33-788.png, image-2020-12-07-09-24-54-453.png, 
> image-2020-12-07-09-26-28-702.png
>
>
> Hi,
>  
> After setting up the 3 clusters, I want to free up the disk on my first 
> cluster since 
> the previous data is still there.
> This is the nodetool status before running nodetool cleanup:
> !image-2020-12-07-09-23-02-002.png!
> When I run nodetool cleanup:
> !image-2020-12-07-09-23-33-788.png!
> After I run nodetool cleanup, I check whether the node freed up the space. 
> This is the result:
> !image-2020-12-07-09-24-54-453.png!
> It seems that nodetool cleanup is not working well.
> Cassandra version and Java version:
> !image-2020-12-07-09-26-28-702.png!
> Thanks.






[jira] [Commented] (CASSANDRA-16190) Add tests for streaming metrics

2020-12-07 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16190?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245281#comment-17245281
 ] 

Adam Holmberg commented on CASSANDRA-16190:
---

Note: Yifan bootstrapped a streaming metrics test in CASSANDRA-16143 that may 
serve as a basis for this ticket.

> Add tests for streaming metrics
> ---
>
> Key: CASSANDRA-16190
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16190
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/python
>Reporter: Benjamin Lerer
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We currently have no tests that check the streaming metrics






[jira] [Commented] (CASSANDRA-16192) Add more tests to cover compaction metrics

2020-12-07 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245253#comment-17245253
 ] 

Benjamin Lerer commented on CASSANDRA-16192:


[~marcuse], [~jjirsa] while looking into the {{pendingTasks}} metric I 
discovered that it relies on 
{{AbstractCompactionStrategy.getEstimatedRemainingTasks()}} to determine the 
number of compactions remaining to perform.
I found out that the way {{getEstimatedRemainingTasks()}} works is different 
depending on the strategy. {{LeveledCompactionStrategy}} will recompute the 
number of remaining tasks each time it is called, while other strategies will 
compute the number of remaining tasks in {{getNextBackgroundSSTables}} and will 
cache the value. That value will be the one returned by 
{{getEstimatedRemainingTasks()}} until {{getNextBackgroundSSTables}} is called 
again.

That approach of caching the number of remaining tasks makes the 
{{pendingTasks}} metric inaccurate, as it does not take into account newly 
flushed SSTables or compacted ones until {{getNextBackgroundSSTables}} is 
called again.
It also looks like the cached number of remaining tasks is not modified when we 
force a major compaction, and that as a consequence the number of remaining 
tasks will only be updated once the major compaction is completely done and 
automatic compactions are re-enabled.

It is not clear to me how accurate we want the {{pendingTasks}} metric to be, 
and I would like to have your opinion on that point.

Another point I wanted to raise is that, due to the way 
{{getEstimatedRemainingTasks()}} works, it looks like we base our compaction 
prioritisation on outdated estimations (see 
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/db/compaction/CompactionStrategyHolder.java#L109).


> Add more tests to cover compaction metrics
> --
>
> Key: CASSANDRA-16192
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16192
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Benjamin Lerer
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> Some compaction metrics do not seem to be tested.
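
To make the caching behaviour described in the comment above concrete, here is 
a deliberately simplified sketch (invented class and method names, not the 
actual {{AbstractCompactionStrategy}} code) of why {{pendingTasks}} can go 
stale between calls to the background-compaction path:

{code:java}
// Simplified model of the caching behaviour described above; not the real code.
import java.util.List;

abstract class CachedEstimateStrategy
{
    private volatile int cachedRemainingTasks; // only refreshed in getNextBackgroundSSTables()

    // Called by the compaction executor when it looks for new work. In this model it is
    // the ONLY place the estimate is recomputed, so newly flushed or freshly compacted
    // SSTables are invisible to the metric until the executor asks for work again.
    List<String> getNextBackgroundSSTables(int gcBefore)
    {
        List<String> candidates = pickCandidates(gcBefore);
        cachedRemainingTasks = estimateRemainingTasks(candidates);
        return candidates;
    }

    // Backs the pendingTasks metric: returns whatever was cached last time, unlike a
    // strategy that recomputes the estimate on every call.
    int getEstimatedRemainingTasks()
    {
        return cachedRemainingTasks;
    }

    abstract List<String> pickCandidates(int gcBefore);

    abstract int estimateRemainingTasks(List<String> candidates);
}
{code}

In this simplified model the estimate only changes when the compaction executor 
next asks for work, which is why new flushes, finished compactions, or a forced 
major compaction are not reflected in the metric immediately.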






[jira] [Comment Edited] (CASSANDRA-13607) InvalidRequestException: Key may not be empty

2020-12-07 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245140#comment-17245140
 ] 

Benjamin Lerer edited comment on CASSANDRA-13607 at 12/7/20, 11:16 AM:
---

[~Marinho] Your stacktrace is a DSE stacktrace. Please raise the problem with the 
DSE support team.

[~anuragh] Based on the stacktrace it seems that something was wrong with the 
username provided to connect to the C* cluster. Now, the version that you use, 
{{2.1.7}}, is really old; the latest 2.1 version is {{2.1.22}}. You should try 
upgrading to the latest version.

For information, 2.1 and 2.2 will reach EOL once 4.0 is out, which will 
probably happen at the beginning of 2021.


was (Author: blerer):
[~Marinho] Your stacktrace is a DSE stacktrace. Please raise the problem with DSE 
support.

[~anuragh] Based on the stacktrace it seems that something was wrong with the 
username provided to connect to the C* cluster. Now, the version that you use, 
{{2.1.7}}, is really old; the latest 2.1 version is {{2.1.22}}. You should try 
upgrading to the latest version.

For information, 2.1 and 2.2 will reach EOL once 4.0 is out, which will 
probably happen at the beginning of 2021.

> InvalidRequestException: Key may not be empty
> -
>
> Key: CASSANDRA-13607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13607
> Project: Cassandra
>  Issue Type: Bug
>Reporter: anuragh
>Priority: Urgent
> Fix For: 2.1.x
>
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens  OwnsHost ID 
>   Rack
> UN  10.146.80.71   175.01 GB  256 ?   
> 2a926f37-9b37-4915-9739-d04befec4f6b  rack1
> UN  10.146.81.149  105.71 GB  256 ?   
> 79b06413-fad7-4d23-aafa-28e8f2b54400  rack1
> UN  10.146.80.56   180.43 GB  256 ?   
> b5eb237f-9edf-4963-9f2c-4e39d6af0f0a  rack1
> UN  10.146.80.62   174.89 GB  256 ?   
> a7ba177f-0f21-4f24-b6fe-77937d005498  rack1
> Here I am adding this node (10.146.81.149) to the ring, and the seed node is 
> 10.146.80.56.
> This new node is not JMX enabled but all the other 3 are JMX enabled.
> Is the below error due to JMX?
> ERROR [SharedPool-Worker-11] 2017-06-15 07:15:46,939 Message.java:538 - 
> Unexpected exception during request; channel = [id: 0x57764a3b, 
> /10.146.144.86:47336 => /10.146.81.149:9042]
> java.lang.AssertionError: 
> org.apache.cassandra.exceptions.InvalidRequestException: Key may not be empty
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:125)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:291)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:81)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439)
>  [apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
>  [apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_60]
>   at 
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>  [apache-cassandra-2.1.7.jar:2.1.7]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-2.1.7.jar:2.1.7]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Key may 
> not be empty
>   at 
> org.apache.cassandra.cql3.QueryProcessor.validateKey(QueryProcessor.java:197) 
> ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:360)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.getPageableCommand(SelectStatement.java:253)
>  

[jira] [Commented] (CASSANDRA-13607) InvalidRequestException: Key may not be empty

2020-12-07 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245140#comment-17245140
 ] 

Benjamin Lerer commented on CASSANDRA-13607:


[~Marinho] Your stacktrace is a DSE stacktrace. Please raise the problem with DSE 
support.

[~anuragh] Based on the stacktrace it seems that something was wrong with the 
username provided to connect to the C* cluster. Now, the version that you use, 
{{2.1.7}}, is really old; the latest 2.1 version is {{2.1.22}}. You should try 
upgrading to the latest version.

For information, 2.1 and 2.2 will reach EOL once 4.0 is out, which will 
probably happen at the beginning of 2021.

> InvalidRequestException: Key may not be empty
> -
>
> Key: CASSANDRA-13607
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13607
> Project: Cassandra
>  Issue Type: Bug
>Reporter: anuragh
>Priority: Urgent
> Fix For: 2.1.x
>
>
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  AddressLoad   Tokens  OwnsHost ID 
>   Rack
> UN  10.146.80.71   175.01 GB  256 ?   
> 2a926f37-9b37-4915-9739-d04befec4f6b  rack1
> UN  10.146.81.149  105.71 GB  256 ?   
> 79b06413-fad7-4d23-aafa-28e8f2b54400  rack1
> UN  10.146.80.56   180.43 GB  256 ?   
> b5eb237f-9edf-4963-9f2c-4e39d6af0f0a  rack1
> UN  10.146.80.62   174.89 GB  256 ?   
> a7ba177f-0f21-4f24-b6fe-77937d005498  rack1
> Here I am adding this node (10.146.81.149) to the ring, and the seed node is 
> 10.146.80.56.
> This new node is not JMX enabled but all the other 3 are JMX enabled.
> Is the below error due to JMX?
> ERROR [SharedPool-Worker-11] 2017-06-15 07:15:46,939 Message.java:538 - 
> Unexpected exception during request; channel = [id: 0x57764a3b, 
> /10.146.144.86:47336 => /10.146.81.149:9042]
> java.lang.AssertionError: 
> org.apache.cassandra.exceptions.InvalidRequestException: Key may not be empty
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:125)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:291)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:81)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:439)
>  [apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:335)
>  [apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_60]
>   at 
> org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164)
>  [apache-cassandra-2.1.7.jar:2.1.7]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-2.1.7.jar:2.1.7]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> Caused by: org.apache.cassandra.exceptions.InvalidRequestException: Key may 
> not be empty
>   at 
> org.apache.cassandra.cql3.QueryProcessor.validateKey(QueryProcessor.java:197) 
> ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.getSliceCommands(SelectStatement.java:360)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.getPageableCommand(SelectStatement.java:253)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:213)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:118)
>  ~[apache-cassandra-2.1.7.jar:2.1.7]
>   ... 12 common frames omitted
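
For readers hitting the same error: the stack trace above fails inside 
{{QueryProcessor.validateKey}}, which rejects an empty partition key. In this 
code path the partition key is the user name that {{PasswordAuthenticator}} 
looks up, so authenticating with an empty user name produces exactly this 
message. A minimal sketch of that kind of check (illustrative only, not the 
Cassandra source):

{code:java}
// Illustrative only -- not the Cassandra source. Shows the kind of check that
// produces "Key may not be empty" when the authentication lookup is handed an
// empty user name as its partition key.
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public final class KeyValidationSketch
{
    static void validateKey(ByteBuffer key)
    {
        if (key == null || key.remaining() == 0)
            throw new IllegalArgumentException("Key may not be empty");
    }

    public static void main(String[] args)
    {
        validateKey(ByteBuffer.wrap("cassandra".getBytes(StandardCharsets.UTF_8))); // fine
        try
        {
            validateKey(ByteBuffer.allocate(0)); // empty user name -> empty partition key
        }
        catch (IllegalArgumentException e)
        {
            System.out.println(e.getMessage()); // "Key may not be empty", as in the trace above
        }
    }
}
{code}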




[jira] [Commented] (CASSANDRA-16314) nodetool cleanup not working

2020-12-07 Thread Romain Hardouin (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245134#comment-17245134
 ] 

Romain Hardouin commented on CASSANDRA-16314:
-

Hi [~dnamicro], there is nothing wrong with cleanup. Since you didn't add new 
nodes to the cluster, the command has nothing to do.
{quote}nodetool cleanup - Triggers the immediate cleanup of keys *no longer 
belonging to a node*.
{quote}
[https://cassandra.apache.org/doc/latest/tools/nodetool/cleanup.html]
{quote}
h2. Cleanup data after range movements

As a safety measure, Cassandra does not automatically remove data from nodes 
that “lose” part of their token range due to a range movement operation 
(bootstrap, move, replace). Run *{{nodetool cleanup}}* on the nodes that lost 
ranges to the joining node when you are satisfied the new node is up and 
working. If you do not do this the old data will still be counted against the 
load on that node.
{quote}
[https://cassandra.apache.org/doc/latest/operating/topo_changes.html]
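
To illustrate the range-movement point, here is a toy sketch (purely 
illustrative, not Cassandra's actual cleanup implementation) of what cleanup 
conceptually does: it rewrites local data keeping only keys whose token still 
falls inside a range the node owns.

{code:java}
// Toy model of cleanup: drop keys whose token is no longer in an owned range.
// Purely illustrative -- invented types, not Cassandra code.
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class CleanupSketch
{
    static final class Range
    {
        final long left, right;
        Range(long left, long right) { this.left = left; this.right = right; }
        // Simplified ownership test; ignores ring wraparound.
        boolean contains(long token) { return token > left && token <= right; }
    }

    // Keep only the entries whose token the node still owns.
    static Map<Long, byte[]> cleanup(Map<Long, byte[]> localData, List<Range> ownedRanges)
    {
        return localData.entrySet().stream()
                        .filter(e -> ownedRanges.stream().anyMatch(r -> r.contains(e.getKey())))
                        .collect(Collectors.toMap(Map.Entry::getKey, Map.Entry::getValue));
    }

    public static void main(String[] args)
    {
        Map<Long, byte[]> data = Map.of(10L, new byte[]{ 1 }, 60L, new byte[]{ 2 });
        List<Range> owned = List.of(new Range(0, 50)); // node now owns only (0, 50]
        System.out.println(cleanup(data, owned).keySet()); // [10] -- token 60 no longer owned
    }
}
{code}

If the owned ranges never changed (no node was added, moved, or replaced), 
every key passes the filter and cleanup frees nothing, which matches the 
behaviour reported here.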


> nodetool cleanup not working
> 
>
> Key: CASSANDRA-16314
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16314
> Project: Cassandra
>  Issue Type: Bug
>Reporter: AaronTrazona
>Priority: Normal
> Attachments: image-2020-12-07-09-23-02-002.png, 
> image-2020-12-07-09-23-33-788.png, image-2020-12-07-09-24-54-453.png, 
> image-2020-12-07-09-26-28-702.png
>
>
> Hi,
>  
> After setting up the 3 clusters, I want to free up the disk on my first 
> cluster since 
> the previous data is still there.
> This is the nodetool status before running nodetool cleanup:
> !image-2020-12-07-09-23-02-002.png!
> When I run nodetool cleanup:
> !image-2020-12-07-09-23-33-788.png!
> After I run nodetool cleanup, I check whether the node freed up the space. 
> This is the result:
> !image-2020-12-07-09-24-54-453.png!
> It seems that nodetool cleanup is not working well.
> Cassandra version and Java version:
> !image-2020-12-07-09-26-28-702.png!
> Thanks.


