[jira] [Updated] (CASSANDRA-15820) tools/bin/fqltool doesn't work on all distributions

2020-05-19 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15820:
---
Reviewers: Michael Semb Wever, Michael Semb Wever  (was: Michael Semb Wever)
   Status: Review In Progress  (was: Patch Available)

> tools/bin/fqltool doesn't work on all distributions
> ---
>
> Key: CASSANDRA-15820
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15820
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/fql
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> The line
> {code}
> if [ ! $1 ]; then break; fi
> {code}
> doesn't work on all OSes/Linux distributions (e.g. a bare Ubuntu 18.04) with 
> {{#!/bin/sh}}, causing {{fqltool}} to fail. The fix is quite simple.
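For context, a portable sketch of the kind of fix implied (assuming the loop simply stops at the first empty argument; the actual patch may differ):

```shell
#!/bin/sh
# `if [ ! $1 ]; then break; fi` expands to `[ ! ]` when $1 is unset or empty;
# some /bin/sh implementations (e.g. dash on a bare Ubuntu 18.04) evaluate
# that differently than bash. Quoting the parameter and using an explicit
# -z test is well-defined POSIX and behaves the same everywhere.
consume_args() {
    while :; do
        if [ -z "${1:-}" ]; then break; fi
        printf 'arg: %s\n' "$1"
        shift
    done
}
consume_args foo bar
```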



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15820) tools/bin/fqltool doesn't work on all distributions

2020-05-19 Thread Michael Semb Wever (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Semb Wever updated CASSANDRA-15820:
---
Status: Ready to Commit  (was: Review In Progress)

+1 (verified and tested)

> tools/bin/fqltool doesn't work on all distributions
> ---
>
> Key: CASSANDRA-15820
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15820
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/fql
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> The line
> {code}
> if [ ! $1 ]; then break; fi
> {code}
> doesn't work on all OSes/Linux distributions (e.g. a bare Ubuntu 18.04) with 
> {{#!/bin/sh}}, causing {{fqltool}} to fail. The fix is quite simple.






[jira] [Commented] (CASSANDRA-15820) tools/bin/fqltool doesn't work on all distributions

2020-05-19 Thread Eduard Tudenhoefner (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17110956#comment-17110956
 ] 

Eduard Tudenhoefner commented on CASSANDRA-15820:
-

+1

> tools/bin/fqltool doesn't work on all distributions
> ---
>
> Key: CASSANDRA-15820
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15820
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/fql
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> The line
> {code}
> if [ ! $1 ]; then break; fi
> {code}
> doesn't work on all OSes/Linux distributions (e.g. a bare Ubuntu 18.04) with 
> {{#!/bin/sh}}, causing {{fqltool}} to fail. The fix is quite simple.






[jira] [Assigned] (CASSANDRA-15819) nodetool enablefullquerylog doesn't allow caller to make non-blocking

2020-05-19 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi reassigned CASSANDRA-15819:
---

Assignee: Berenguer Blasi

> nodetool enablefullquerylog doesn't allow caller to make non-blocking
> -
>
> Key: CASSANDRA-15819
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15819
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: David Capwell
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>
> {code}
> $ ./bin/nodetool enablefullquerylog --path /tmp/deleteme --blocking false
> Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
> nodetool: Found unexpected parameters: [false]
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> The root cause is that booleans are special-cased in airlift, so any time 
> {{--blocking}} is set it gets turned on.
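For illustration, a minimal plain-Java sketch of the parsing difference (names hypothetical; the airline library nodetool uses treats a boolean option as a zero-arity flag, so the trailing value is never consumed):

```java
import java.util.ArrayList;
import java.util.List;

public class BlockingFlag {
    // Flag-style parsing: a boolean option consumes no value, so the trailing
    // "false" is left over as an unexpected positional parameter -- mirroring
    // "nodetool: Found unexpected parameters: [false]".
    static List<String> leftoverWithFlag(String[] args) {
        List<String> leftover = new ArrayList<>();
        for (String a : args) {
            if (!a.equals("--blocking")) leftover.add(a); // flag takes no value
        }
        return leftover;
    }

    // String-typed option: the parser consumes the following token as the
    // option's value, so Boolean.parseBoolean can honor "false".
    static boolean blockingAsString(String[] args) {
        boolean blocking = true; // default: blocking enabled
        for (int i = 0; i < args.length; i++) {
            if (args[i].equals("--blocking") && i + 1 < args.length) {
                blocking = Boolean.parseBoolean(args[++i]);
            }
        }
        return blocking;
    }
}
```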






[jira] [Commented] (CASSANDRA-15789) Rows can get duplicated in mixed major-version clusters and after full upgrade

2020-05-19 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111027#comment-17111027
 ] 

Alex Petrov commented on CASSANDRA-15789:
-

+1 from me as well. 

Two tiny nits that can be fixed on commit: 
  * two getters ({{getCheckForDuplicateRowsDuringReads}} and 
{{getCheckForDuplicateRowsDuringCompaction}}) return {{void}}
  * 
[toIter|https://github.com/apache/cassandra/compare/trunk...krummas:15789-3.11#diff-c43c377976893dc7ae62e89072946ecbR141]
 can be replaced by {{Iterators#forArray()}}

I have some questions / meta-discussions:

  * In this patch, we're using an 
[executor|https://github.com/apache/cassandra/compare/trunk...krummas:15789-3.11#diff-32fe9b86f85fea958f137ab7862ec522R42]
 that doesn't get shut down. Should we use non-periodic tasks for them?
  * we're setting {{snapshot_on_duplicate_row_detection}} via config and 
{{diagnostic_snapshot_interval_nanos}} via system property. I don't mind 
having it as-is in this case, but we should generally try to consolidate the 
way we're managing configuration. 

> Rows can get duplicated in mixed major-version clusters and after full upgrade
> --
>
> Key: CASSANDRA-15789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15789
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
>Priority: Normal
>
> In a mixed 2.X/3.X major version cluster a sequence of row deletes, 
> collection overwrites, paging, and read repair can cause 3.X nodes to split 
> individual rows into several rows with identical clustering. This happens due 
> to 2.X paging and RT semantics, and a 3.X {{LegacyLayout}} deficiency.
> To reproduce, set up a 2-node mixed major version cluster with the following 
> table:
> {code}
> CREATE TABLE distributed_test_keyspace.tlb (
> pk int,
> ck int,
> v map<text, text>,
> PRIMARY KEY (pk, ck)
> );
> {code}
> 1. Using either node as the coordinator, delete the row with ck=2 using 
> timestamp 1
> {code}
> DELETE FROM tbl USING TIMESTAMP 1 WHERE pk = 1 AND ck = 2;
> {code}
> 2. Using either node as the coordinator, insert the following 3 rows:
> {code}
> INSERT INTO tbl (pk, ck, v) VALUES (1, 1, {'e':'f'}) USING TIMESTAMP 3;
> INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'g':'h'}) USING TIMESTAMP 3;
> INSERT INTO tbl (pk, ck, v) VALUES (1, 3, {'i':'j'}) USING TIMESTAMP 3;
> {code}
> 3. Flush the table on both nodes
> 4. Using the 2.2 node as the coordinator, force read repair by querying the 
> table with page size = 2:
>  
> {code}
> SELECT * FROM tbl;
> {code}
> 5. Overwrite the row with ck=2 using timestamp 5:
> {code}
> INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'k':'l'}) USING TIMESTAMP 5;
> {code}
> 6. Query the 3.0 node and observe the split row:
> {code}
> cqlsh> select * from distributed_test_keyspace.tlb ;
>  pk | ck | v
> ++
>   1 |  1 | {'e': 'f'}
>   1 |  2 | {'g': 'h'}
>   1 |  2 | {'k': 'l'}
>   1 |  3 | {'i': 'j'}
> {code}
> This happens because the read to query the second page ends up generating the 
> following mutation for the 3.0 node:
> {code}
> ColumnFamily(tbl -{deletedAt=-9223372036854775808, localDeletion=2147483647,
>  ranges=[2:v:_-2:v:!, deletedAt=2, localDeletion=1588588821]
> [2:v:!-2:!,   deletedAt=1, localDeletion=1588588821]
> [3:v:_-3:v:!, deletedAt=2, localDeletion=1588588821]}-
>  [2:v:63:false:1@3,])
> {code}
> Which on 3.0 side gets incorrectly deserialized as
> {code}
> Mutation(keyspace='distributed_test_keyspace', key='0001', modifications=[
>   [distributed_test_keyspace.tbl] key=1 
> partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 
> columns=[[] | [v]]
> Row[info=[ts=-9223372036854775808] ]: ck=2 | del(v)=deletedAt=2, 
> localDeletion=1588588821, [v[c]=d ts=3]
> Row[info=[ts=-9223372036854775808] del=deletedAt=1, 
> localDeletion=1588588821 ]: ck=2 |
> Row[info=[ts=-9223372036854775808] ]: ck=3 | del(v)=deletedAt=2, 
> localDeletion=1588588821
> ])
> {code}
> {{LegacyLayout}} correctly interprets a range tombstone whose start and 
> finish {{collectionName}} values don't match as a wrapping fragment of a 
> legacy row deletion that's being interrupted by a collection deletion 
> (correctly) - see 
> [code|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1874-L1889].
>  Quoting the comment inline:
> {code}
> // Because of the way RangeTombstoneList work, we can have a tombstone where 
> only one of
> // the bound has a collectionName. That happens if we have a big tombstone A 
> (spanning one
> // or multiple 

[jira] [Commented] (CASSANDRA-15805) Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones interacts with collection tombstones

2020-05-19 Thread Sylvain Lebresne (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111019#comment-17111019
 ] 

Sylvain Lebresne commented on CASSANDRA-15805:
--

Thanks for the review. I addressed the comments, squash-cleaned, 'merged' into 
3.11 and started CI (first try at https://ci-cassandra.apache.org, not sure how 
that will go).

||branch||CI||
| [3.0|https://github.com/pcmanus/cassandra/commits/C-15805-3.0] | 
[ci-cassandra 
#122|https://ci-cassandra.apache.org/job/Cassandra-devbranch/122/] |
| [3.11|https://github.com/pcmanus/cassandra/commits/C-15805-3.11] | 
[ci-cassandra 
#123|https://ci-cassandra.apache.org/job/Cassandra-devbranch/123/] |


> Potential duplicate rows on 2.X->3.X upgrade when multi-rows range tombstones 
> interacts with collection tombstones
> --
>
> Key: CASSANDRA-15805
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15805
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/SSTable
>Reporter: Sylvain Lebresne
>Assignee: Sylvain Lebresne
>Priority: Normal
> Fix For: 3.0.x, 3.11.x
>
>
> The legacy reading code ({{LegacyLayout}} and 
> {{UnfilteredDeserializer.OldFormatDeserializer}}) does not handle correctly 
> the case where a range tombstone covering multiple rows interacts with a 
> collection tombstone.
> A simple example of this problem is if one runs on 2.X:
> {noformat}
> CREATE TABLE t (
>   k int,
>   c1 text,
>   c2 text,
>   a text,
> b set<text>,
>   c text,
>   PRIMARY KEY((k), c1, c2)
> );
> // Delete all rows where c1 is 'A'
> DELETE FROM t USING TIMESTAMP 1 WHERE k = 0 AND c1 = 'A';
> // Inserts a row covered by that previous range tombstone
> INSERT INTO t(k, c1, c2, a, b, c) VALUES (0, 'A', 'X', 'foo', {'whatever'}, 
> 'bar') USING TIMESTAMP 2;
> // Delete the collection of that previously inserted row
> DELETE b FROM t USING TIMESTAMP 3 WHERE k = 0 AND c1 = 'A' and c2 = 'X';
> {noformat}
> If the above is run on 2.X (with everything either flushed into the same 
> sstable or compacted together), it will result in the inserted row being 
> duplicated (one part containing the {{a}} column, the other the {{c}} one).
> I will note that this is _not_ a duplicate of CASSANDRA-15789 and this 
> reproduce even with the fix to {{LegacyLayout}} of this ticket. That said, 
> the additional code added to CASSANDRA-15789 to force merging duplicated rows 
> if they are produced _will_ end up fixing this as a consequence (assuming 
> there is no variation of this problem that leads to other visible issues than 
> duplicated rows). That said, I "think" we'd still rather fix the source of 
> the issue.






[jira] [Commented] (CASSANDRA-15789) Rows can get duplicated in mixed major-version clusters and after full upgrade

2020-05-19 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111052#comment-17111052
 ] 

Sam Tunnicliffe commented on CASSANDRA-15789:
-

{quote} * In this patch, we're using an 
[executor|https://github.com/apache/cassandra/compare/trunk...krummas:15789-3.11#diff-32fe9b86f85fea958f137ab7862ec522R42]
 that doesn't get shut down. Should we use non-periodic tasks for 
them?{quote}

This is to be explicit about making the snapshot task execution single-threaded, 
to ensure that only a single snapshot per prefix can be triggered on a replica. 
Non-periodic tasks should be, and most likely always are, effectively 
single-threaded, but that isn't explicitly guaranteed.
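The guarantee being relied on can be sketched deterministically (illustrative code, not Cassandra's actual class): a dedicated single-threaded executor serializes its tasks, so at most one "snapshot" ever runs at a time.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SingleThreadedSnapshots {
    // Runs `tasks` no-op "snapshots" on a single-threaded executor and
    // reports the maximum number observed executing concurrently.
    static int maxObservedConcurrency(int tasks) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {
                int now = running.incrementAndGet();
                maxSeen.accumulateAndGet(now, Math::max);
                running.decrementAndGet();
            });
        }
        executor.shutdown();
        try {
            executor.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // With a single worker thread, tasks cannot overlap.
        return maxSeen.get();
    }
}
```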

{quote} * we're setting snapshot_on_duplicate_row_detection via config and 
diagnostic_snapshot_interval_nanos via system property. I don't mind having it 
as-is in this case, but we should generally try to consolidate the way we're 
managing configuration. {quote}

{{diagnostic_snapshot_interval_nanos}} is purely for testing, so it didn't feel 
necessary to make that accessible to operators. We could subclass 
{{DiagnosticSnapshotService}} for testing instead, but it didn't seem too hacky 
to use a system property here.

> Rows can get duplicated in mixed major-version clusters and after full upgrade
> --
>
> Key: CASSANDRA-15789
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15789
> Project: Cassandra
>  Issue Type: Bug
>  Components: Consistency/Coordination, Local/Memtable, Local/SSTable
>Reporter: Aleksey Yeschenko
>Assignee: Marcus Eriksson
>Priority: Normal
>
> In a mixed 2.X/3.X major version cluster a sequence of row deletes, 
> collection overwrites, paging, and read repair can cause 3.X nodes to split 
> individual rows into several rows with identical clustering. This happens due 
> to 2.X paging and RT semantics, and a 3.X {{LegacyLayout}} deficiency.
> To reproduce, set up a 2-node mixed major version cluster with the following 
> table:
> {code}
> CREATE TABLE distributed_test_keyspace.tlb (
> pk int,
> ck int,
> v map<text, text>,
> PRIMARY KEY (pk, ck)
> );
> {code}
> 1. Using either node as the coordinator, delete the row with ck=2 using 
> timestamp 1
> {code}
> DELETE FROM tbl USING TIMESTAMP 1 WHERE pk = 1 AND ck = 2;
> {code}
> 2. Using either node as the coordinator, insert the following 3 rows:
> {code}
> INSERT INTO tbl (pk, ck, v) VALUES (1, 1, {'e':'f'}) USING TIMESTAMP 3;
> INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'g':'h'}) USING TIMESTAMP 3;
> INSERT INTO tbl (pk, ck, v) VALUES (1, 3, {'i':'j'}) USING TIMESTAMP 3;
> {code}
> 3. Flush the table on both nodes
> 4. Using the 2.2 node as the coordinator, force read repair by querying the 
> table with page size = 2:
>  
> {code}
> SELECT * FROM tbl;
> {code}
> 5. Overwrite the row with ck=2 using timestamp 5:
> {code}
> INSERT INTO tbl (pk, ck, v) VALUES (1, 2, {'k':'l'}) USING TIMESTAMP 5;
> {code}
> 6. Query the 3.0 node and observe the split row:
> {code}
> cqlsh> select * from distributed_test_keyspace.tlb ;
>  pk | ck | v
> ++
>   1 |  1 | {'e': 'f'}
>   1 |  2 | {'g': 'h'}
>   1 |  2 | {'k': 'l'}
>   1 |  3 | {'i': 'j'}
> {code}
> This happens because the read to query the second page ends up generating the 
> following mutation for the 3.0 node:
> {code}
> ColumnFamily(tbl -{deletedAt=-9223372036854775808, localDeletion=2147483647,
>  ranges=[2:v:_-2:v:!, deletedAt=2, localDeletion=1588588821]
> [2:v:!-2:!,   deletedAt=1, localDeletion=1588588821]
> [3:v:_-3:v:!, deletedAt=2, localDeletion=1588588821]}-
>  [2:v:63:false:1@3,])
> {code}
> Which on 3.0 side gets incorrectly deserialized as
> {code}
> Mutation(keyspace='distributed_test_keyspace', key='0001', modifications=[
>   [distributed_test_keyspace.tbl] key=1 
> partition_deletion=deletedAt=-9223372036854775808, localDeletion=2147483647 
> columns=[[] | [v]]
> Row[info=[ts=-9223372036854775808] ]: ck=2 | del(v)=deletedAt=2, 
> localDeletion=1588588821, [v[c]=d ts=3]
> Row[info=[ts=-9223372036854775808] del=deletedAt=1, 
> localDeletion=1588588821 ]: ck=2 |
> Row[info=[ts=-9223372036854775808] ]: ck=3 | del(v)=deletedAt=2, 
> localDeletion=1588588821
> ])
> {code}
> {{LegacyLayout}} correctly interprets a range tombstone whose start and 
> finish {{collectionName}} values don't match as a wrapping fragment of a 
> legacy row deletion that's being interrupted by a collection deletion 
> (correctly) - see 
> [code|https://github.com/apache/cassandra/blob/cassandra-3.0/src/java/org/apache/cassandra/db/LegacyLayout.java#L1874-L1889].
>  Quoting the comment inline:
> {code}
> // Because of the way 

[jira] [Commented] (CASSANDRA-14200) NullPointerException when dumping sstable with null value for timestamp column

2020-05-19 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111075#comment-17111075
 ] 

Jacek Lewandowski commented on CASSANDRA-14200:
---

You do not even need to use JSON to reproduce this; just pass an empty string as 
a timestamp in cqlsh.

> NullPointerException when dumping sstable with null value for timestamp column
> --
>
> Key: CASSANDRA-14200
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14200
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Simon Zhou
>Assignee: Simon Zhou
>Priority: Normal
> Fix For: 3.0.x
>
>
> We have an sstable whose schema has a column of type timestamp and it's not 
> part of primary key. When dumping the sstable using sstabledump there is NPE 
> like this:
> {code:java}
> Exception in thread "main" java.lang.NullPointerException
> at java.util.Calendar.setTime(Calendar.java:1770)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:943)
> at java.text.SimpleDateFormat.format(SimpleDateFormat.java:936)
> at java.text.DateFormat.format(DateFormat.java:345)
> at 
> org.apache.cassandra.db.marshal.TimestampType.toJSONString(TimestampType.java:93)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeCell(JsonTransformer.java:442)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeColumnData(JsonTransformer.java:376)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializeRow(JsonTransformer.java:280)
> at 
> org.apache.cassandra.tools.JsonTransformer.serializePartition(JsonTransformer.java:215)
> at java.util.stream.ForEachOps$ForEachOp$OfRef.accept(ForEachOps.java:184)
> at java.util.stream.ReferencePipeline$2$1.accept(ReferencePipeline.java:175)
> at java.util.Iterator.forEachRemaining(Iterator.java:116)
> at 
> java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
> at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at 
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at 
> java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
> at 
> java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
> at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
> at org.apache.cassandra.tools.JsonTransformer.toJson(JsonTransformer.java:104)
> at org.apache.cassandra.tools.SSTableExport.main(SSTableExport.java:242){code}
> The reason is that we use a null Date when there is no value for this column:
> {code}
> public Date deserialize(ByteBuffer bytes)
> {
> return bytes.remaining() == 0 ? null : new 
> Date(ByteBufferUtil.toLong(bytes));
> }
> {code}
> It seems that we should not deserialize columns with null values.
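A sketch of the null guard that would avoid the NPE (hypothetical shape, mirroring the {{deserialize}} snippet above; the real fix belongs in {{TimestampType#toJSONString}}):

```java
import java.nio.ByteBuffer;
import java.text.SimpleDateFormat;
import java.util.Date;

public class TimestampJson {
    // Same shape as TimestampType's deserialize: an empty buffer means
    // "no value", so return null rather than a bogus Date.
    static Date deserialize(ByteBuffer bytes) {
        return bytes.remaining() == 0 ? null : new Date(bytes.getLong(bytes.position()));
    }

    // Guarding here avoids Calendar.setTime(null) throwing inside
    // SimpleDateFormat.format when the cell has no value.
    static String toJsonString(ByteBuffer bytes) {
        Date date = deserialize(bytes);
        if (date == null) {
            return "null"; // emit the JSON null literal instead of crashing
        }
        return '"' + new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS").format(date) + '"';
    }
}
```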






[jira] [Comment Edited] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-05-19 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091505#comment-17091505
 ] 

Benjamin Lerer edited comment on CASSANDRA-15667 at 5/19/20, 12:36 PM:
---

Jenkins CI runs:
|[4.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/126/]|[3.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/125/]|[3.11|https://ci-cassandra.apache.org/job/Cassandra-devbranch/124/]|


was (Author: blerer):
Jenkins CI runs:
|[4.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/74/]|[3.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/75/]|[3.11|https://ci-cassandra.apache.org/job/Cassandra-devbranch/76/]|

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: log_bootstrap_resumable
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, as apparent in some spurious dtest failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].
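The inconsistency can be pictured with a deterministic sketch (illustrative names, not the actual Cassandra classes): the completeness check consults one view of session state while per-session status lives in a second view that may lag behind it.

```java
import java.util.HashMap;
import java.util.Map;

public class CompletenessRace {
    // Two views of the same sessions, updated at different moments, as with
    // StreamSession state vs SessionInfo. true = session complete.
    final Map<String, Boolean> sessionState = new HashMap<>();
    final Map<String, Boolean> sessionInfo  = new HashMap<>();

    boolean hasActiveSessions() { return sessionState.containsValue(false); }

    boolean allInfoComplete()   { return !sessionInfo.containsValue(false); }

    // A maybeComplete()-style check that mixes both views: when the second
    // view lags the first, "no active sessions" and "some info incomplete"
    // can be observed together -- exactly the racy window.
    boolean observesInconsistency() {
        return !hasActiveSessions() && !allInfoComplete();
    }

    // Deterministic demonstration: the session finished, but its info entry
    // has not been updated yet.
    static boolean demoLaggingViews() {
        CompletenessRace r = new CompletenessRace();
        r.sessionState.put("s1", true);
        r.sessionInfo.put("s1", false);
        return r.observesInconsistency();
    }
}
```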






[jira] [Commented] (CASSANDRA-15794) Upgraded C* (4.x) fail to start because of Compact Tables & dropping compact tables in downgraded C* (3.11.4) introduces non-existent columns

2020-05-19 Thread Alex Petrov (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1717#comment-1717
 ] 

Alex Petrov commented on CASSANDRA-15794:
-

[~Zhuqi1108] unfortunately, I do not see an easy solution for this problem. If 
we do [CASSANDRA-15811] "the right way", it is rather easy to get rid of the 
{{value}} column, since it's just a regular column which we can drop. However, 
getting rid of the {{column1}} column is harder, since it's a clustering key. 
It gets even trickier to do this without disruptions in the cluster, since we 
have to rewrite sstables. 

bq. Instead of directly blocking cassandra startup, is it possible to correctly 
drop existing compact storage in 4.x

We've made a decision to prevent 4.0 from starting since it was clearly 
announced that there will be _no_ Thrift support in 4.0. There's no reason not 
to cut it out in 4.0, since we'll have to drop it eventually, and now is a 
great time to do so.

bq. if we stick to ask users to downgrade to 3.x and drop compact storage, can 
we not generate new commit log before we hit the error in 4.x, so that we avoid 
blocking 3.x from starting?

That's a good point, and I'll look into how to make sure we don't write any 
commit log messages in such a case. If you would like to take over and check how 
to do this, you're more than welcome. 

bq. block or warn users when they try to introduce compact storage in 3.x? 

That's also a good point. We should add a log and a client warning. Would you 
like to do that? I'm not sure if blocking them from doing that is legitimate, 
even though I don't understand why anyone would use compact storage today. 

> Upgraded C* (4.x) fail to start because of Compact Tables & dropping compact 
> tables in downgraded C* (3.11.4) introduces non-existent columns
> -
>
> Key: CASSANDRA-15794
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15794
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Zhuqi Jin
>Priority: Normal
>
> We tried to test upgrading a 3.11.4 C* cluster to 4.x and run into the 
> following problems. 
>  * We started a single 3.11.4 C* node. 
>  * We ran cassandra-stress like this
> {code:java}
> ./cassandra-stress write n=30 -rate threads=10 -node 172.17.0.2 {code}
>  * We stopped this node, and started a C* node running C* compiled from trunk 
> (git commit: e394dc0bb32f612a476269010930c617dd1ed3cb)
>  * New C* failed to start with the following error message
> {code:java}
> ERROR [main] 2020-05-07 00:58:18,503 CassandraDaemon.java:245 - Error while 
> loading schema: ERROR [main] 2020-05-07 00:58:18,503 CassandraDaemon.java:245 
> - Error while loading schema: java.lang.IllegalArgumentException: Compact 
> Tables are not allowed in Cassandra starting with 4.0 version. Use `ALTER ... 
> DROP COMPACT STORAGE` command supplied in 3.x/3.11 Cassandra in order to 
> migrate off Compact Storage. at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:965)
>  at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:924)
>  at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:883)
>  at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspacesWithout(SchemaKeyspace.java:874)
>  at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchNonSystemKeyspaces(SchemaKeyspace.java:862)
>  at org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:102) at 
> org.apache.cassandra.schema.Schema.loadFromDisk(Schema.java:91) at 
> org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:241) 
> at 
> org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:653)
>  at 
> org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:770)Exception
>  (java.lang.IllegalArgumentException) encountered during startup: Compact 
> Tables are not allowed in Cassandra starting with 4.0 version. Use `ALTER ... 
> DROP COMPACT STORAGE` command supplied in 3.x/3.11 Cassandra in order to 
> migrate off Compact Storage.ERROR [main] 2020-05-07 00:58:18,520 
> CassandraDaemon.java:792 - Exception encountered during 
> startupjava.lang.IllegalArgumentException: Compact Tables are not allowed in 
> Cassandra starting with 4.0 version. Use `ALTER ... DROP COMPACT STORAGE` 
> command supplied in 3.x/3.11 Cassandra in order to migrate off Compact 
> Storage. at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchTable(SchemaKeyspace.java:965)
>  at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchTables(SchemaKeyspace.java:924)
>  at 
> org.apache.cassandra.schema.SchemaKeyspace.fetchKeyspace(SchemaKeyspace.java:883)
>  at 
> 

[jira] [Updated] (CASSANDRA-8272) 2ndary indexes can return stale data

2020-05-19 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-8272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-8272:
--
Reviewers: Benjamin Lerer, ZhaoYang, Benjamin Lerer  (was: Benjamin Lerer, 
ZhaoYang)
   Status: Review In Progress  (was: Patch Available)

> 2ndary indexes can return stale data
> 
>
> Key: CASSANDRA-8272
> URL: https://issues.apache.org/jira/browse/CASSANDRA-8272
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/2i Index
>Reporter: Sylvain Lebresne
>Assignee: Andres de la Peña
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 3.0.x, 4.0-beta
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> When replicas return 2ndary index results, it's possible for a single replica 
> to return a stale result and that result will be sent back to the user, 
> potentially failing the CL contract.
> For instance, consider 3 replicas A, B and C, and the following situation:
> {noformat}
> CREATE TABLE test (k int PRIMARY KEY, v text);
> CREATE INDEX ON test(v);
> INSERT INTO test(k, v) VALUES (0, 'foo');
> {noformat}
> with every replica up to date. Now, suppose that the following queries are 
> done at {{QUORUM}}:
> {noformat}
> UPDATE test SET v = 'bar' WHERE k = 0;
> SELECT * FROM test WHERE v = 'foo';
> {noformat}
> then, if A and B acknowledge the insert but C responds to the read before 
> having applied it, the now-stale result will be returned (since 
> C will return it and A or B will return nothing).
> A potential solution would be that when we read a tombstone in the index (and 
> provided we make the index inherit the gcGrace of its parent CF), instead of 
> skipping that tombstone, we'd insert in the result a corresponding range 
> tombstone.  






[jira] [Comment Edited] (CASSANDRA-15667) StreamResultFuture check for completeness is inconsistent, leading to races

2020-05-19 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17091505#comment-17091505
 ] 

Benjamin Lerer edited comment on CASSANDRA-15667 at 5/19/20, 12:41 PM:
---

Jenkins CI runs:
|[4.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/128/]|[3.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/127/]|[3.11|https://ci-cassandra.apache.org/job/Cassandra-devbranch/124/]|


was (Author: blerer):
Jenkins CI runs:
|[4.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/126/]|[3.0|https://ci-cassandra.apache.org/job/Cassandra-devbranch/125/]|[3.11|https://ci-cassandra.apache.org/job/Cassandra-devbranch/124/]|

> StreamResultFuture check for completeness is inconsistent, leading to races
> ---
>
> Key: CASSANDRA-15667
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15667
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Streaming and Messaging
>Reporter: Sergio Bossa
>Assignee: Massimiliano Tomassi
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: log_bootstrap_resumable
>
>
> {{StreamResultFuture#maybeComplete()}} uses 
> {{StreamCoordinator#hasActiveSessions()}} to determine if all sessions are 
> completed, but then accesses each session state via 
> {{StreamCoordinator#getAllSessionInfo()}}: this is inconsistent, as the 
> former relies on the actual {{StreamSession}} state, while the latter on the 
> {{SessionInfo}} state, and the two are concurrently updated with no 
> coordination whatsoever.
> This leads to races, as apparent in some spurious dtest failures, such as 
> {{TestBootstrap.resumable_bootstrap_test}} in CASSANDRA-15614 cc 
> [~e.dimitrova].






[jira] [Commented] (CASSANDRA-15773) Add test to cover metrics related to the BufferPool

2020-05-19 Thread Stephen Mallette (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1747#comment-1747
 ] 

Stephen Mallette commented on CASSANDRA-15773:
--

I was just wondering if anything else is needed on this one for it to be 
merged. Thanks.

> Add test to cover metrics related to the BufferPool
> ---
>
> Key: CASSANDRA-15773
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15773
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> At this time there do not appear to be unit tests to validate 
> {{BufferPoolMetrics}}.






[jira] [Commented] (CASSANDRA-15788) Add tests to cover CacheMetrics

2020-05-19 Thread Stephen Mallette (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1752#comment-1752
 ] 

Stephen Mallette commented on CASSANDRA-15788:
--

Does anyone have a moment to review this one (and trigger the full range of 
tests)? Thanks.

> Add tests to cover CacheMetrics
> ---
>
> Key: CASSANDRA-15788
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15788
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> {{CacheMetrics}} and {{ChunkCacheMetrics}} do not have unit tests covering 
> them.  {{CachingBench}} seems to provide some coverage but those tests (which 
> don't appear to run as part of the standard run of unit tests) are failing 
> and do not assert against all defined metrics, nor do they seem to assert 
> code in {{InstrumentingCache}} which also incremented metrics. 






[jira] [Updated] (CASSANDRA-15820) tools/bin/fqltool doesn't work on all distributions

2020-05-19 Thread Robert Stupp (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-15820:
-
  Since Version: 4.0-alpha
Source Control Link: 
https://github.com/apache/cassandra/commit/ec07cd7e76c93bf713618f381480f500f6c4e62f
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Thanks!

Committed as 
[ec07cd7e76c93bf713618f381480f500f6c4e62f|https://github.com/apache/cassandra/commit/ec07cd7e76c93bf713618f381480f500f6c4e62f]
 to [trunk|https://github.com/apache/cassandra/tree/trunk].


> tools/bin/fqltool doesn't work on all distributions
> ---
>
> Key: CASSANDRA-15820
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15820
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/fql
>Reporter: Robert Stupp
>Assignee: Robert Stupp
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> The line
> {code}
> if [ ! $1 ]; then break; fi
> {code}
> doesn't work on all OSes/Linux distributions (e.g. a bare Ubuntu 18.04) with 
> {{#!/bin/sh}}, causing {{fqltool}} to fail. The fix is quite simple.






[jira] [Updated] (CASSANDRA-15819) nodetool enablefullquerylog doesn't allow caller to make non-blocking

2020-05-19 Thread Berenguer Blasi (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Berenguer Blasi updated CASSANDRA-15819:

Test and Documentation Plan: See PR
 Status: Patch Available  (was: In Progress)

> nodetool enablefullquerylog doesn't allow caller to make non-blocking
> -
>
> Key: CASSANDRA-15819
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15819
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: David Capwell
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> $ ./bin/nodetool enablefullquerylog --path /tmp/deleteme --blocking false
> Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
> nodetool: Found unexpected parameters: [false]
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> The root cause is that booleans are special-cased in airlift, so any time 
> --blocking is set it gets turned on.






[cassandra] branch trunk updated: Fix tools/bin/fqltool for all shells

2020-05-19 Thread snazy
This is an automated email from the ASF dual-hosted git repository.

snazy pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new ec07cd7  Fix tools/bin/fqltool for all shells
ec07cd7 is described below

commit ec07cd7e76c93bf713618f381480f500f6c4e62f
Author: Robert Stupp 
AuthorDate: Tue May 19 07:00:41 2020 +0200

Fix tools/bin/fqltool for all shells

patch by Robert Stupp; reviewed by Mick Semb Wever and Eduard Tudenhöfner 
for CASSANDRA-15820
---
 CHANGES.txt   | 1 +
 tools/bin/fqltool | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index f430f5c..43aef72 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-alpha5
+ * Fix tools/bin/fqltool for all shells (CASSANDRA-15820)
  * Fix clearing of legacy size_estimates (CASSANDRA-15776)
  * Update port when reconnecting to pre-4.0 SSL storage (CASSANDRA-15727)
  * Only calculate dynamicBadnessThreshold once per loop in 
DynamicEndpointSnitch (CASSANDRA-15798)
diff --git a/tools/bin/fqltool b/tools/bin/fqltool
index a34128e..dc49e50 100755
--- a/tools/bin/fqltool
+++ b/tools/bin/fqltool
@@ -52,7 +52,7 @@ ARGS=""
 JVM_ARGS=""
 while true
 do
-  if [ ! $1 ]; then break; fi
+  if [ "x" = "x$1" ]; then break; fi
   case $1 in
 -D*)
   JVM_ARGS="$JVM_ARGS $1"

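The one-line change above can be sanity-checked in isolation. The following is a minimal sketch (the argument values are made up for illustration) of the same loop with the portable test; the original `[ ! $1 ]` relies on word-splitting of the unquoted `$1` and on `!` parsing that varies across `/bin/sh` implementations, whereas prefixing both sides with a literal keeps the test well-formed everywhere:

```shell
#!/bin/sh
# Hypothetical argument list, just for illustration.
set -- -Dfoo=bar --verbose

# Loop until $1 is empty/unset. Comparing "x" against "x$1" is the
# classic portable emptiness test: it never leaves `[` with a missing
# or operator-like operand, regardless of what $1 contains.
while true
do
  if [ "x" = "x$1" ]; then break; fi
  echo "arg: $1"
  shift
done
```

POSIX does define the one- and two-argument forms of `test`, but the prefixed-literal comparison predates that and behaves identically on every historical sh, which is presumably why it was chosen here.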




[jira] [Commented] (CASSANDRA-15677) Topology events are not sent to clients if the nodes use the same network interface

2020-05-19 Thread Bryn Cooke (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1791#comment-1791
 ] 

Bryn Cooke commented on CASSANDRA-15677:


I'm having a go at this.
The dtest code needs tweaking to allow tests on the same interface.

> Topology events are not sent to clients if the nodes use the same network 
> interface
> ---
>
> Key: CASSANDRA-15677
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15677
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client
>Reporter: Alan Boudreault
>Priority: Normal
>  Labels: pull-request-available
> Fix For: 4.0-rc
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> *This bug only happens when the cassandra nodes are configured to use a 
> single network interface (ip) but different ports.  See CASSANDRA-7544.*
> Issue: The topology events aren't sent to clients. The problem is that the 
> port is not taken into account when determining if we send it or not:
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/transport/Server.java#L624
> To reproduce:
> {code}
> # I think the cassandra-test branch is required to get the -S option 
> (USE_SINGLE_INTERFACE)
> ccm create -n4 local40 -v 4.0-alpha2 -S
> {code}
>  
> Then run this small python driver script:
> {code}
> import time
> from cassandra.cluster import Cluster
> cluster = Cluster()
> session = cluster.connect()
> while True:
> print(cluster.metadata.all_hosts())
> print([h.is_up for h in cluster.metadata.all_hosts()])
> time.sleep(5)
> {code}
> Then decommission a node:
> {code}
> ccm node2 nodetool disablebinary
> ccm node2 nodetool decommission
> {code}
>  
> You should see that the node is never removed from the client cluster 
> metadata and the reconnector started.






[jira] [Comment Edited] (CASSANDRA-15819) nodetool enablefullquerylog doesn't allow caller to make non-blocking

2020-05-19 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111418#comment-17111418
 ] 

David Capwell edited comment on CASSANDRA-15819 at 5/19/20, 6:30 PM:
-

patch LGTM, running CI now 
https://app.circleci.com/pipelines/github/dcapwell/cassandra?branch=review%2FCASSANDRA-15819


was (Author: dcapwell):
patch LGTM, running CI now.

> nodetool enablefullquerylog doesn't allow caller to make non-blocking
> -
>
> Key: CASSANDRA-15819
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15819
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: David Capwell
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> $ ./bin/nodetool enablefullquerylog --path /tmp/deleteme --blocking false
> Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
> nodetool: Found unexpected parameters: [false]
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> The root cause is that booleans are special-cased in airlift, so any time 
> --blocking is set it gets turned on.






[jira] [Commented] (CASSANDRA-15582) 4.0 quality testing: metrics

2020-05-19 Thread Stephen Mallette (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111405#comment-17111405
 ] 

Stephen Mallette commented on CASSANDRA-15582:
--

Just added CASSANDRA-15821 that presents some analysis on how published metrics 
match up to documentation. 

> 4.0 quality testing: metrics
> 
>
> Key: CASSANDRA-15582
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15582
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Josh McKenzie
>Assignee: Romain Hardouin
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: Screen Shot 2020-04-07 at 5.47.17 PM.png
>
>
> In past releases we've unknowingly broken metrics integrations and introduced 
> performance regressions in metrics collection and reporting. We strive in 4.0 
> to not do that. Metrics should work well!






[jira] [Created] (CASSANDRA-15821) Metrics Documentation Enhancements

2020-05-19 Thread Stephen Mallette (Jira)
Stephen Mallette created CASSANDRA-15821:


 Summary: Metrics Documentation Enhancements
 Key: CASSANDRA-15821
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15821
 Project: Cassandra
  Issue Type: Improvement
  Components: Documentation/Website
Reporter: Stephen Mallette


CASSANDRA-15582 involves quality around metrics and it was mentioned that 
reviewing and [improving 
documentation|https://github.com/apache/cassandra/blob/trunk/doc/source/operating/metrics.rst]
 around metrics would fall into that scope. Please consider some of this 
analysis in determining what improvements to make here:

Please see [this 
spreadsheet|https://docs.google.com/spreadsheets/d/1iPWfCMIG75CI6LbYuDtCTjEOvZw-5dyH-e08bc63QnI/edit?usp=sharing]
 that itemizes almost all of cassandra's metrics and whether they are 
documented or not (and other notes). That spreadsheet is "almost all" because 
there are some metrics that don't seem to initialize as part of Cassandra 
startup (I was able to trigger some to initialize, but not all were immediately 
obvious). The missing metrics seem to be related to the following:

* ThreadPool metrics - only some initialize at startup; the list of those that 
do follows below
* Streaming Metrics
* HintedHandoff Metrics
* HintsService Metrics

Here are the ThreadPool scopes that get listed:

{code}
AntiEntropyStage
CacheCleanupExecutor
CompactionExecutor
GossipStage
HintsDispatcher
MemtableFlushWriter
MemtablePostFlush
MemtableReclaimMemory
MigrationStage
MutationStage
Native-Transport-Requests
PendingRangeCalculator
PerDiskMemtableFlushWriter_0
ReadStage
Repair-Task
RequestResponseStage
Sampler
SecondaryIndexManagement
ValidationExecutor
ViewBuildExecutor
{code}

I noticed that Keyspace Metrics have this note: "Most of these metrics are the 
same as the Table Metrics above, only they are aggregated at the Keyspace 
level." I think I've isolated the metrics present on tables but not on 
keyspaces to specifically be:

{code}
BloomFilterFalsePositives
BloomFilterFalseRatio
BytesAnticompacted
BytesFlushed
BytesMutatedAnticompaction
BytesPendingRepair
BytesRepaired
BytesUnrepaired
CompactionBytesWritten
CompressionRatio
CoordinatorReadLatency
CoordinatorScanLatency
CoordinatorWriteLatency
EstimatedColumnCountHistogram
EstimatedPartitionCount
EstimatedPartitionSizeHistogram
KeyCacheHitRate
LiveSSTableCount
MaxPartitionSize
MeanPartitionSize
MinPartitionSize
MutatedAnticompactionGauge
PercentRepaired
RowCacheHitOutOfRange
RowCacheHit
RowCacheMiss
SpeculativeSampleLatencyNanos
SyncTime
WaitingOnFreeMemtableSpace
DroppedMutations
{code}

Someone with greater knowledge of this area might consider it worth the effort 
to see if any of these metrics should be aggregated to the keyspace level in 
case they were inadvertently missed. In any case, perhaps the documentation 
could easily now reflect which metric names could be expected on Keyspace.

The DroppedMessage metrics have a much larger set of scopes than what was 
documented:

{code}
ASYMMETRIC_SYNC_REQ
BATCH_REMOVE_REQ
BATCH_REMOVE_RSP
BATCH_STORE_REQ
BATCH_STORE_RSP
CLEANUP_MSG
COUNTER_MUTATION_REQ
COUNTER_MUTATION_RSP
ECHO_REQ
ECHO_RSP
FAILED_SESSION_MSG
FAILURE_RSP
FINALIZE_COMMIT_MSG
FINALIZE_PROMISE_MSG
FINALIZE_PROPOSE_MSG
GOSSIP_DIGEST_ACK
GOSSIP_DIGEST_ACK2
GOSSIP_DIGEST_SYN
GOSSIP_SHUTDOWN
HINT_REQ
HINT_RSP
INTERNAL_RSP
MUTATION_REQ
MUTATION_RSP
PAXOS_COMMIT_REQ
PAXOS_COMMIT_RSP
PAXOS_PREPARE_REQ
PAXOS_PREPARE_RSP
PAXOS_PROPOSE_REQ
PAXOS_PROPOSE_RSP
PING_REQ
PING_RSP
PREPARE_CONSISTENT_REQ
PREPARE_CONSISTENT_RSP
PREPARE_MSG
RANGE_REQ
RANGE_RSP
READ_REPAIR_REQ
READ_REPAIR_RSP
READ_REQ
READ_RSP
REPAIR_RSP
REPLICATION_DONE_REQ
REPLICATION_DONE_RSP
REQUEST_RSP
SCHEMA_PULL_REQ
SCHEMA_PULL_RSP
SCHEMA_PUSH_REQ
SCHEMA_PUSH_RSP
SCHEMA_VERSION_REQ
SCHEMA_VERSION_RSP
SNAPSHOT_MSG
SNAPSHOT_REQ
SNAPSHOT_RSP
STATUS_REQ
STATUS_RSP
SYNC_REQ
SYNC_RSP
TRUNCATE_REQ
TRUNCATE_RSP
VALIDATION_REQ
VALIDATION_RSP
_SAMPLE
_TEST_1
_TEST_2
_TRACE
{code}

I suppose I may yet be missing some metrics, as my knowledge of what's 
available is limited to what I can get from JMX after Cassandra initialization 
(and some initial starting commands) and what's in the documentation. If 
something is present that is missing from both, then I won't know it's there. 
Anyway, perhaps this issue can help build some discussion around the 
improvements that might be made given the analysis provided so far.






[jira] [Commented] (CASSANDRA-15773) Add test to cover metrics related to the BufferPool

2020-05-19 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111412#comment-17111412
 ] 

David Capwell commented on CASSANDRA-15773:
---

[~djoshi] can we get your eyes?

> Add test to cover metrics related to the BufferPool
> ---
>
> Key: CASSANDRA-15773
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15773
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Stephen Mallette
>Assignee: Stephen Mallette
>Priority: Normal
>
> At this time there do not appear to be unit tests to validate 
> {{BufferPoolMetrics}}.






[jira] [Updated] (CASSANDRA-15819) nodetool enablefullquerylog doesn't allow caller to make non-blocking

2020-05-19 Thread David Capwell (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Capwell updated CASSANDRA-15819:
--
Reviewers: David Capwell, David Capwell  (was: David Capwell)
   David Capwell, David Capwell
   Status: Review In Progress  (was: Patch Available)

patch LGTM, running CI now.

> nodetool enablefullquerylog doesn't allow caller to make non-blocking
> -
>
> Key: CASSANDRA-15819
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15819
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: David Capwell
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> $ ./bin/nodetool enablefullquerylog --path /tmp/deleteme --blocking false
> Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
> nodetool: Found unexpected parameters: [false]
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> The root cause is that booleans are special-cased in airlift, so any time 
> --blocking is set it gets turned on.






[jira] [Created] (CASSANDRA-15822) DOC - Improve C* configuration docs

2020-05-19 Thread Lorina Poland (Jira)
Lorina Poland created CASSANDRA-15822:
-

 Summary: DOC - Improve C* configuration docs
 Key: CASSANDRA-15822
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15822
 Project: Cassandra
  Issue Type: Improvement
  Components: Documentation/Website
Reporter: Lorina Poland
Assignee: Lorina Poland


Two sections, Getting started > Configuring  and Configuration, could use 
improvement. 

Adding information about some of the other config files besides cassandra.yaml 
is one key goal. 

 

At the risk of contaminating one ticket with another project, I started 
creating a separate glossary file, so that key terms in configuration can be 
linked to a glossary that describes terms.






[jira] [Updated] (CASSANDRA-15822) DOC - Improve C* configuration docs

2020-05-19 Thread Lorina Poland (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorina Poland updated CASSANDRA-15822:
--
Reviewers: Jon Haddad

> DOC - Improve C* configuration docs
> ---
>
> Key: CASSANDRA-15822
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15822
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation/Website
>Reporter: Lorina Poland
>Assignee: Lorina Poland
>Priority: Normal
>
> Two sections, Getting started > Configuring  and Configuration, could use 
> improvement. 
> Adding information about some of the other config files besides 
> cassandra.yaml is one key goal. 
>  
> At the risk of contaminating one ticket with another project, I started 
> creating a separate glossary file, so that key terms in configuration can be 
> linked to a glossary that describes terms.






[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-05-19 Thread Olivier Michallat (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111564#comment-17111564
 ] 

Olivier Michallat commented on CASSANDRA-15299:
---

OK, not much to add on the field sizes, I hope the future will prove you right.
{quote}The only real difference between having this in the header and nominally 
being part of the body is that the integrity of the header is protected
{quote}
The integrity of the payload is protected as well. Is it that the guarantee is 
stronger for the header because it is shorter?

 

Speaking of the header, I think there is an issue in the way it's currently 
encoded. It's probably a bit early for code reviews, but I want to make sure I 
understand this correctly. If I use {{FrameEncoderCrc}} for a self-contained, 
uncompressed outer frame with a 4-byte payload:
{code:java}
ByteBuffer buffer = ByteBuffer.allocate(6);
FrameEncoderCrc.writeHeader(buffer, true, 4);
System.out.println(Bytes.toHexString(buffer)); {code}
The buffer ends up with {{0x04000201f9f2}}. In binary that's 
{{000001000000000000000010000000011111100111110010}}. If I align the first 24 
bits with the spec's diagram, the length is in a weird order, and the flag is 
not in the right position:
{code:java}
 0   1   2   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0|0|0 0 0 0 1 0| CRC24...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ {code}
That's because we do a byte-by-byte reversal, but our fields are not aligned on 
byte boundaries.

I think the simplest fix would be to build the header data in network order, 
and only reverse it when we compute the CRC:
{code:java}
int header3b = dataLength;
header3b <<= 1;
if (isSelfContained)
header3b |= 1;
header3b <<= 6; // padding
int crc = crc24(Integer.reverseBytes(header3b)>>8, 3);

// Same as put3b, but big-endian
frame.put(0, (byte)(header3b >>> 16));
frame.put(1, (byte)(header3b >>> 8) );
frame.put(2, (byte) header3b);

put3b(frame, 3, crc);
{code}
This produces {{0x000240979ee2}}, or in binary:
{code:java}
  0   1   2   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0|1|0 0 0 0 0 0| CRC24...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
{code}
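For what it's worth, the proposed big-endian layout can be checked outside the codebase. This is a hypothetical Python rendering of the packing step, not the actual Cassandra implementation; the field widths (17-bit length, 1 self-contained flag bit, 6 padding bits) are taken from the diagrams above:

```python
def header3b(data_length: int, self_contained: bool) -> bytes:
    """Pack the 17-bit payload length, the self-contained flag and
    6 padding bits into 3 bytes in network (big-endian) order."""
    h = data_length
    h = (h << 1) | (1 if self_contained else 0)  # append flag bit
    h <<= 6                                      # 6 padding bits
    return h.to_bytes(3, "big")

# A self-contained frame with a 4-byte payload yields header bytes
# 0x000240, matching the second diagram above.
assert header3b(4, True).hex() == "000240"
```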

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> ---
>
> Key: CASSANDRA-15299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Client
>Reporter: Aleksey Yeschenko
>Assignee: Sam Tunnicliffe
>Priority: Normal
>  Labels: protocolv5
> Fix For: 4.0-alpha
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it 
> introduced checksumming/CRC32 to request and response bodies. It’s an 
> important step forward, but it doesn’t cover the entire stream. In 
> particular, the message header is not covered by a checksum or a crc, which 
> poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of 
> computing the CRC32 on the bytes written on the wire - losing the properties 
> of the CRC32. In some cases, due to this sequencing, attempting to decompress 
> a corrupt stream can cause a segfault by LZ4.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, 
> also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for 
> explanation for the two points above.
> Separately, there are some long-standing issues with the protocol - since 
> *way* before CASSANDRA-13304. Importantly, both checksumming and compression 
> operate on individual message bodies rather than frames of multiple complete 
> messages. In reality, this has several important additional downsides. To 
> name a couple:
> # For compression, we are getting poor compression ratios for smaller 
> messages - when operating on tiny sequences of bytes. In reality, for most 
> small requests and responses we are discarding the compressed value as it’d 
> be smaller than the uncompressed one - incurring both redundant allocations 
> and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 
> 4 bytes extra is *a lot* for an empty write response, for example.
> To address 

[jira] [Updated] (CASSANDRASC-23) Set up structure for handling multiple Cassandra versions

2020-05-19 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-23?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRASC-23:
--
Reviewers: Dinesh Joshi, Vinay Chella  (was: Dinesh Joshi)

> Set up structure for handling multiple Cassandra versions
> -
>
> Key: CASSANDRASC-23
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-23
> Project: Sidecar for Apache Cassandra
>  Issue Type: Improvement
>  Components: Configuration
>Reporter: Jon Haddad
>Assignee: Jon Haddad
>Priority: Normal
>
> The first sidecar release will be for Cassandra 4.0, but one of the project 
> goals is to be able to handle multiple versions.  This will be especially 
> important in mixed version clusters, or even mixed version 1:N sidecar to C* 
> nodes (see CASSANDRASC-17).
> This JIRA is to lay the foundational work for supporting multiple Cassandra 
> versions. 
> Each Cassandra version (for example \{{cassandra40}}, \{{cassandra50}}) will 
> be in a separate Gradle subproject, implementing a common interface defined 
> in a \{{common}} subproject.  A common cassandra integration testing 
> framework will be able to test every version to ensure each version adheres 
> to the expected interface.
> Each module will define the minimum compatibility it supports, so if someone 
> (for example) wants to implement a Cassandra30 module for their own cluster, 
> they will have the ability to do so.






[jira] [Moved] (CASSANDRASC-24) C* Management process

2020-05-19 Thread Jon Haddad (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRASC-24?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad moved CASSANDRA-14395 to CASSANDRASC-24:
---

Key: CASSANDRASC-24  (was: CASSANDRA-14395)
Project: Sidecar for Apache Cassandra  (was: Cassandra)

> C* Management process
> -
>
> Key: CASSANDRASC-24
> URL: https://issues.apache.org/jira/browse/CASSANDRASC-24
> Project: Sidecar for Apache Cassandra
>  Issue Type: New Feature
>Reporter: Dinesh Joshi
>Assignee: Dinesh Joshi
>Priority: Normal
> Attachments: Looking towards an Official Cassandra Sidecar - 
> Netflix.pdf
>
>
> I would like to propose amending Cassandra's architecture to include a 
> management process. The detailed description is here: 
> https://docs.google.com/document/d/1UV9pE81NaIUF3g4L1wxq09nT11AkSQcMijgLFwGsY3s/edit
> I'd like to propose seeding this with a few simple use-cases such as Health 
> Checks, Bulk Commands with a simple REST API interface. 






[jira] [Commented] (CASSANDRA-15146) Transitional TLS server configuration options are overly complex

2020-05-19 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111588#comment-17111588
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15146:
-

[~micarlise], CASSANDRA-15234 is almost done. Please let me know if I can help 
you with this one. 

> Transitional TLS server configuration options are overly complex
> 
>
> Key: CASSANDRA-15146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Encryption, Local/Config
>Reporter: Joey Lynch
>Assignee: M Carlise
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> It appears as part of the port from transitional client TLS to transitional 
> server TLS in CASSANDRA-10404 (the ability to switch a cluster to using 
> {{internode_encryption}} without listening on two ports and without downtime) 
> we carried the {{enabled}} setting over from the client implementation. I 
> believe that the {{enabled}} option is redundant to {{internode_encryption}} 
> and {{optional}} and it should therefore be removed prior to the 4.0 release 
> where we will have to start respecting that interface. 
> Current trunk yaml:
> {noformat}
> server_encryption_options:
>     # set to true for allowing secure incoming connections
>     enabled: false
>     # If enabled and optional are both set to true, encrypted and unencrypted
>     # connections are handled on the storage_port
>     optional: false
>     # if enabled, will open up an encrypted listening socket on
>     # ssl_storage_port. Should be used during upgrade to 4.0; otherwise,
>     # set to false.
>     enable_legacy_ssl_storage_port: false
>     # on outbound connections, determine which type of peers to securely
>     # connect to. 'enabled' must be set to true.
>     internode_encryption: none
>     keystore: conf/.keystore
>     keystore_password: cassandra
>     truststore: conf/.truststore
>     truststore_password: cassandra
> {noformat}
> I propose we eliminate {{enabled}} and just use {{optional}} and 
> {{internode_encryption}} to determine the listener setup. I also propose we 
> change the default of {{optional}} to true. We could also re-name 
> {{optional}} since it's a new option but I think it's good to stay consistent 
> with the client and use {{optional}}.
> ||optional||internode_encryption||description||
> |true|none|(default) No encryption is used but if a server reaches out with 
> it we'll use it|
> |false|dc|Encryption is required for inter-dc communication, but not intra-dc|
> |false|all|Encryption is required for all communication|
> |false|none|We only listen for unencrypted connections|
> |true|dc|Encryption is used for inter-dc communication but is not required|
> |true|all|Encryption is used for all communication but is not required|
> From these states it is clear when we should be accepting TLS connections 
> (all except for false and none) as well as when we must enforce it.
> To transition without downtime from an un-encrypted cluster to an encrypted 
> cluster the user would do the following:
> 1. After adding valid truststores, change {{internode_encryption}} to the 
> desired level of encryption (recommended {{all}}) and restart Cassandra
>  2. Change {{optional=false}} and restart Cassandra to enforce #1
> If {{optional}} defaulted to {{false}} as it does right now we'd need a third 
> restart to first change {{optional}} to {{true}}, which given my 
> understanding of the OptionalSslHandler isn't really relevant.






[jira] [Commented] (CASSANDRA-15797) Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest

2020-05-19 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111591#comment-17111591
 ] 

Ekaterina Dimitrova commented on CASSANDRA-15797:
-

Thanks [~yifanc] and [~vinaykumarcse]
Was this committed, or is there anything else that needs to be done here? 

> Fix flaky BinLogTest - org.apache.cassandra.utils.binlog.BinLogTest
> ---
>
> Key: CASSANDRA-15797
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15797
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/unit
>Reporter: Jon Meredith
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0-alpha
>
>
> An internal CI system is failing BinLogTest somewhat frequently under JDK11.  
> Configuration was recently changed to reduce the number of cores the tests 
> run with, however it is reproducible on an 8 core laptop.
> {code}
> [junit-timeout] OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC 
> was deprecated in version 9.0 and will likely be removed in a future release.
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest
> [junit-timeout] WARNING: An illegal reflective access operation has occurred
> [junit-timeout] WARNING: Illegal reflective access by 
> net.openhft.chronicle.core.Jvm (file:/.../lib/chronicle-core-1.16.4.jar) to 
> field java.nio.Bits.RESERVED_MEMORY
> [junit-timeout] WARNING: Please consider reporting this to the maintainers of 
> net.openhft.chronicle.core.Jvm
> [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of 
> further illegal reflective access operations
> [junit-timeout] WARNING: All illegal access operations will be denied in a 
> future release
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests 
> run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 13.895 sec
> [junit-timeout]
> [junit-timeout] Testcase: 
> testPutAfterStop(org.apache.cassandra.utils.binlog.BinLogTest): FAILED
> [junit-timeout] expected: but 
> was:
> [junit-timeout] junit.framework.AssertionFailedError: expected: but 
> was:
> [junit-timeout] at 
> org.apache.cassandra.utils.binlog.BinLogTest.testPutAfterStop(BinLogTest.java:431)
> [junit-timeout] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> [junit-timeout] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> [junit-timeout] at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> [junit-timeout]
> [junit-timeout]
> [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED
> {code}
> There's also a different failure under JDK8
> {code}
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests 
> run: 13, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 15.273 sec
> [junit-timeout]
> [junit-timeout] Testcase: 
> testBinLogStartStop(org.apache.cassandra.utils.binlog.BinLogTest):  FAILED
> [junit-timeout] expected:<2> but was:<0>
> [junit-timeout] junit.framework.AssertionFailedError: expected:<2> but was:<0>
> [junit-timeout] at 
> org.apache.cassandra.utils.binlog.BinLogTest.testBinLogStartStop(BinLogTest.java:172)
> [junit-timeout]
> [junit-timeout]
> [junit-timeout] Test org.apache.cassandra.utils.binlog.BinLogTest FAILED
> {code}
> Reproducer
> {code}
> PASSED=0; time  { while ant testclasslist -Dtest.classlistprefix=unit 
> -Dtest.classlistfile=<(echo 
> org/apache/cassandra/utils/binlog/BinLogTest.java); do PASSED=$((PASSED+1)); 
> echo PASSED $PASSED; done }; echo FAILED after $PASSED runs.
> {code}
> In the last four attempts it has taken 31, 38, 27 and 10 rounds respectively 
> under JDK11 and took 51 under JDK8 (about 15 minutes).
> I have not tried running in a cpu-limited container or anything like that yet.
> Additionally, this went past in the logs a few times (under JDK11).  No idea 
> if it's just an artifact of weird test setup, or something more serious.
> {code}
> [junit-timeout] WARNING: Please consider reporting this to the maintainers of 
> net.openhft.chronicle.core.Jvm
> [junit-timeout] WARNING: Use --illegal-access=warn to enable warnings of 
> further illegal reflective access operations
> [junit-timeout] WARNING: All illegal access operations will be denied in a 
> future release
> [junit-timeout] Testsuite: org.apache.cassandra.utils.binlog.BinLogTest Tests 
> run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 12.839 sec
> [junit-timeout]
> [junit-timeout] java.lang.Throwable: 1e53135d-main creation ref-count=1
> [junit-timeout] at 
> 

[cassandra] branch trunk updated: Add isTransient to SSTableMetadataView

2020-05-19 Thread snazy
This is an automated email from the ASF dual-hosted git repository.

snazy pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new 3f689e9  Add isTransient to SSTableMetadataView
3f689e9 is described below

commit 3f689e93768ea670f7a8351ec30128dd4b410c9c
Author: Ekaterina Dimitrova 
AuthorDate: Wed May 13 15:07:56 2020 -0400

Add isTransient to SSTableMetadataView

patch by Sequoyha Pelletier; reviewed by Ekaterina Dimitrova for 
CASSANDRA-15806
---
 CHANGES.txt| 1 +
 src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java | 1 +
 2 files changed, 2 insertions(+)

diff --git a/CHANGES.txt b/CHANGES.txt
index 43aef72..56212f9 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 4.0-alpha5
+ * Add isTransient to SSTableMetadataView (CASSANDRA-15806)
  * Fix tools/bin/fqltool for all shells (CASSANDRA-15820)
  * Fix clearing of legacy size_estimates (CASSANDRA-15776)
  * Update port when reconnecting to pre-4.0 SSL storage (CASSANDRA-15727)
diff --git a/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java 
b/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java
index eb8670c..6366fd5 100755
--- a/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java
+++ b/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java
@@ -432,6 +432,7 @@ public class SSTableMetadataViewer
 field("ClusteringTypes", clusteringTypes.toString());
 field("StaticColumns", FBUtilities.toString(statics));
 field("RegularColumns", FBUtilities.toString(regulars));
+field("IsTransient", stats.isTransient);
 }
 }
 


-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-19 Thread Dor Laor (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111693#comment-17111693
 ] 

Dor Laor commented on CASSANDRA-15823:
--

In Scylla we see a similar need and there is a suggestion to use UUID 
identification instead of IPs: [https://github.com/scylladb/scylla/issues/6403]

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it usually comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections or VPN tunnels, and with a big chunk of time ensure that 
> pod-to-pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services, they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.
>  
> _Notes_
> C* already has the concept of broadcast addresses vs those which are bound on 
> the node. This approach _could_ be leveraged to provide the behavior we're 
> looking for, but then the broadcast values would need to be pre-computed 
> _*and match*_ across all k8s control planes. By using hostnames the 
> underlying IP address does not matter and will most likely be different in 
> each cluster.

[jira] [Updated] (CASSANDRA-15783) test_optimized_primary_range_repair - transient_replication_test.TestTransientReplication

2020-05-19 Thread Blake Eggleston (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-15783:

Status: Ready to Commit  (was: Review In Progress)

+1

> test_optimized_primary_range_repair - 
> transient_replication_test.TestTransientReplication
> -
>
> Key: CASSANDRA-15783
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15783
> Project: Cassandra
>  Issue Type: Bug
>  Components: Test/dtest
>Reporter: Ekaterina Dimitrova
>Assignee: Ekaterina Dimitrova
>Priority: Normal
> Fix For: 4.0-alpha
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Dtest failure.
> Example:
> https://app.circleci.com/pipelines/github/ekaterinadimitrova2/cassandra/118/workflows/9e57522d-52fa-4d44-88d8-5cec0e87f517/jobs/585/tests



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15819) nodetool enablefullquerylog doesn't allow caller to make non-blocking

2020-05-19 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111658#comment-17111658
 ] 

David Capwell commented on CASSANDRA-15819:
---

CI didn't pass, but the failures look like known flaky tests and an intermittent 
issue with Circle CI and Docker images.  

LGTM +1.

Need another review as well.

> nodetool enablefullquerylog doesn't allow caller to make non-blocking
> -
>
> Key: CASSANDRA-15819
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15819
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/nodetool
>Reporter: David Capwell
>Assignee: Berenguer Blasi
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {code}
> $ ./bin/nodetool enablefullquerylog --path /tmp/deleteme --blocking false
> Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
> nodetool: Found unexpected parameters: [false]
> See 'nodetool help' or 'nodetool help <command>'.
> {code}
> The root cause is that booleans are special-cased in airlift, so any time 
> --blocking is set it gets turned on.
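The failure mode can be illustrated generically. This is not airlift's actual code, just a hypothetical sketch of why a parser that treats boolean options as zero-arity flags turns the flag on and leaves the trailing "false" behind as an unexpected positional parameter:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlagDemo {
    // Hypothetical parser: boolean options consume no following argument
    // (zero arity), so a value placed after them is never attached to the
    // flag and falls through as an unexpected positional parameter.
    static List<String> parse(Map<String, Boolean> flags, String... args) {
        List<String> unexpected = new ArrayList<>();
        for (String arg : args) {
            if (flags.containsKey(arg)) {
                flags.put(arg, true);   // mere presence turns the flag on
            } else {
                unexpected.add(arg);    // "false" ends up here, not in the flag
            }
        }
        return unexpected;
    }

    public static void main(String[] args) {
        Map<String, Boolean> flags = new HashMap<>();
        flags.put("--blocking", false);
        List<String> leftover = parse(flags, "--blocking", "false");
        // Mirrors "nodetool: Found unexpected parameters: [false]"
        System.out.println("--blocking=" + flags.get("--blocking"));
        System.out.println("unexpected=" + leftover);
    }
}
```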



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-19 Thread Jeff Jirsa (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17111724#comment-17111724
 ] 

Jeff Jirsa commented on CASSANDRA-15823:


> Cassandra is protected here as we already have logic in place to update peers 
> when we come up with the same host id, but a different IP address.


This definitely isn’t true / strictly safe. In fact it’s trivial to violate 
consistency / lose data by swapping the IP of two Pods/instances on the same 
host. 

We really need everything to be based on UUIDs, not ip or port or host name. 
And we really really really shouldn’t assume that dns is universally available 
or correct (because that’s just not always true, even in 2020).


> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it usually comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections or VPN tunnels, and with a big chunk of time ensure that 
> pod-to-pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services, they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.

[jira] [Created] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-19 Thread Christopher Bradford (Jira)
Christopher Bradford created CASSANDRA-15823:


 Summary: Support for networking via identity instead of IP
 Key: CASSANDRA-15823
 URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
 Project: Cassandra
  Issue Type: Improvement
Reporter: Christopher Bradford


TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
resolution to different IP addresses per DC that may then be forwarded to nodes 
on remote networks without requiring node to node IP connectivity for cross-dc 
links.

 

This approach should not affect existing deployments as those could continue to 
use IPs as the hostname and skip resolution.

With orchestration platforms like Kubernetes and the usage of ephemeral 
containers in environments today we should consider some changes to how we 
handle the tracking of nodes and their network location. Currently we maintain 
a mapping between host ids and IP addresses.

 

With traditional infrastructure, if a node goes down it usually comes back up 
with the same IP. In some environments this contract may be explicit with 
virtual IPs that may move between hosts. In newer deployments, like on 
Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
assigned an IP address at start time. Should the pod be restarted or scheduled 
on a different host there is no guarantee we would have the same IP. Cassandra 
is protected here as we already have logic in place to update peers when we 
come up with the same host id, but a different IP address.

 

There are ways to get Kubernetes to assign a specific IP per Pod. Most 
recommendations involve the use of a service per pod. Communication with the 
fixed service IP would automatically forward to the associated pod, regardless 
of address. We _could_ use this approach, but it seems like this would 
needlessly create a number of extra resources in our k8s cluster to get around 
the problem. Which, to be fair, doesn't seem like much of a problem with the 
aforementioned mitigations built into C*.

 

So what is the _actual_ problem? *Cross-region, cross-cloud, hybrid-deployment 
connectivity between pods is a pain.* This can be solved with significant 
investment by those who want to deploy these types of topologies. You can 
definitely configure connectivity between clouds over dedicated connections or 
VPN tunnels, and with a big chunk of time ensure that pod-to-pod connectivity 
just works even if those pods are managed by separate control planes, but that 
again requires time and talent. There are a number of edge cases to support 
between the ever so slight, but very important, differences in cloud vendor 
networks.

 

Recently there have been a number of innovations that aid in the deployment and 
operation of these types of applications on Kubernetes. Service meshes support 
distributed microservices running across multiple k8s cluster control planes in 
disparate networks. Instead of directly connecting to IP addresses of remote 
services, they use a hostname. With this approach, hostname traffic may 
then be routed to a proxy that sends traffic over the WAN (sometimes with mTLS) 
to another proxy pod in the remote cluster which then forwards the data along 
to the correct pod in that network. (See attached diagrams)

 

Which brings us to the point of this ticket. Instead of mapping host ids to 
IPs, use hostnames (and update the underlying address periodically instead of 
caching indefinitely). This allows resolution to different IP addresses per DC 
(k8s cluster) that may then be forwarded to nodes (pods) on remote networks 
(k8s clusters) without requiring node to node (pod to pod) IP connectivity 
between them. Traditional deployments can still function like they do today 
(even if operators opt to keep using IPs as identifiers instead of hostnames). 
This proxy approach is then enabled like those we see in service meshes.

 

_Notes_

C* already has the concept of broadcast addresses vs those which are bound on 
the node. This approach _could_ be leveraged to provide the behavior we're 
looking for, but then the broadcast values would need to be pre-computed _*and 
match*_ across all k8s control planes. By using hostnames the underlying IP 
address does not matter and will most likely be different in each cluster.

 

I recognize the title may be a bit misleading as we would obviously still 
communicate over TCP/IP, but it concisely conveys the point.
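The "resolve the hostname at connection time instead of caching an IP indefinitely" idea can be sketched as follows. This is illustrative only, not Cassandra code; the class and host id are hypothetical, and note that the JVM itself still caches positive DNS lookups according to the `networkaddress.cache.ttl` security property:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PeerResolver {
    // host id -> hostname: the stable identity is the hostname, not an IP
    private final Map<String, String> hostnameByHostId = new ConcurrentHashMap<>();

    public void register(String hostId, String hostname) {
        hostnameByHostId.put(hostId, hostname);
    }

    // Re-resolve on each connection attempt, so a pod that comes back with a
    // new address is picked up (subject to the JVM's own DNS cache TTL).
    public InetAddress resolve(String hostId) throws UnknownHostException {
        String hostname = hostnameByHostId.get(hostId);
        if (hostname == null)
            throw new UnknownHostException("unknown host id: " + hostId);
        return InetAddress.getByName(hostname);
    }

    public static void main(String[] args) throws Exception {
        PeerResolver resolver = new PeerResolver();
        resolver.register("node-a", "localhost"); // hypothetical host id
        System.out.println(resolver.resolve("node-a").getHostAddress());
    }
}
```

A real implementation would also need to handle resolution failures and periodic refresh; the sketch only shows where the indirection lives.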



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15823) Support for networking via identity instead of IP

2020-05-19 Thread Christopher Bradford (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Bradford updated CASSANDRA-15823:
-
Attachment: consul-mesh-gateways.png
linkerd-service-mirroring.svg
istio-multicluster-with-gateways.svg

> Support for networking via identity instead of IP
> -
>
> Key: CASSANDRA-15823
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15823
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Christopher Bradford
>Priority: Normal
> Attachments: consul-mesh-gateways.png, 
> istio-multicluster-with-gateways.svg, linkerd-service-mirroring.svg
>
>
> TL;DR: Instead of mapping host ids to IPs, use hostnames. This allows 
> resolution to different IP addresses per DC that may then be forwarded to 
> nodes on remote networks without requiring node to node IP connectivity for 
> cross-dc links.
>  
> This approach should not affect existing deployments as those could continue 
> to use IPs as the hostname and skip resolution.
> 
> With orchestration platforms like Kubernetes and the usage of ephemeral 
> containers in environments today we should consider some changes to how we 
> handle the tracking of nodes and their network location. Currently we 
> maintain a mapping between host ids and IP addresses.
>  
> With traditional infrastructure, if a node goes down it usually comes back 
> up with the same IP. In some environments this contract may be explicit with 
> virtual IPs that may move between hosts. In newer deployments, like on 
> Kubernetes, this contract is not possible. Pods (analogous to nodes) are 
> assigned an IP address at start time. Should the pod be restarted or 
> scheduled on a different host there is no guarantee we would have the same 
> IP. Cassandra is protected here as we already have logic in place to update 
> peers when we come up with the same host id, but a different IP address.
>  
> There are ways to get Kubernetes to assign a specific IP per Pod. Most 
> recommendations involve the use of a service per pod. Communication with the 
> fixed service IP would automatically forward to the associated pod, 
> regardless of address. We _could_ use this approach, but it seems like this 
> would needlessly create a number of extra resources in our k8s cluster to get 
> around the problem. Which, to be fair, doesn't seem like much of a problem 
> with the aforementioned mitigations built into C*.
>  
> So what is the _actual_ problem? *Cross-region, cross-cloud, 
> hybrid-deployment connectivity between pods is a pain.* This can be solved 
> with significant investment by those who want to deploy these types of 
> topologies. You can definitely configure connectivity between clouds over 
> dedicated connections or VPN tunnels, and with a big chunk of time ensure that 
> pod-to-pod connectivity just works even if those pods are managed by separate 
> control planes, but that again requires time and talent. There are a number 
> of edge cases to support between the ever so slight, but very important, 
> differences in cloud vendor networks.
>  
> Recently there have been a number of innovations that aid in the deployment 
> and operation of these types of applications on Kubernetes. Service meshes 
> support distributed microservices running across multiple k8s cluster control 
> planes in disparate networks. Instead of directly connecting to IP addresses 
> of remote services, they use a hostname. With this approach, hostname 
> traffic may then be routed to a proxy that sends traffic over the WAN 
> (sometimes with mTLS) to another proxy pod in the remote cluster which then 
> forwards the data along to the correct pod in that network. (See attached 
> diagrams)
>  
> Which brings us to the point of this ticket. Instead of mapping host ids to 
> IPs, use hostnames (and update the underlying address periodically instead of 
> caching indefinitely). This allows resolution to different IP addresses per 
> DC (k8s cluster) that may then be forwarded to nodes (pods) on remote 
> networks (k8s clusters) without requiring node to node (pod to pod) IP 
> connectivity between them. Traditional deployments can still function like 
> they do today (even if operators opt to keep using IPs as identifiers instead 
> of hostnames). This proxy approach is then enabled like those we see in 
> service meshes.
>  
> _Notes_
> C* already has the concept of broadcast addresses vs those which are bound on 
> the node. This approach _could_ be leveraged to provide the behavior we're 
> looking for, but then the broadcast values would need to be pre-computed 
> _*and match*_ across all k8s control planes. By using hostnames the 
> underlying IP address does not matter and will most likely be different in 
> each cluster.

[jira] [Updated] (CASSANDRA-15676) flaky test testWriteUnknownResult- org.apache.cassandra.distributed.test.CasWriteTest

2020-05-19 Thread Robert Stupp (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-15676:
-
Source Control Link: 
https://github.com/apache/cassandra/commit/fdcd0dff216d9e1ad242be1a7d5be3ef67044ac3
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

+1 from me as well

Thanks for the patch!

Committed as 
[fdcd0dff216d9e1ad242be1a7d5be3ef67044ac3|https://github.com/apache/cassandra/commit/fdcd0dff216d9e1ad242be1a7d5be3ef67044ac3]
 to [trunk|https://github.com/apache/cassandra/tree/trunk].


> flaky test testWriteUnknownResult- 
> org.apache.cassandra.distributed.test.CasWriteTest
> -
>
> Key: CASSANDRA-15676
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15676
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Kevin Gallardo
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: Screen Shot 2020-05-07 at 7.25.19 PM.png
>
>
> Failure observed in: 
> https://app.circleci.com/pipelines/github/newkek/cassandra/33/workflows/54007cf7-4424-4ec1-9655-665f6044e6d1/jobs/187/tests
> {noformat}
> testWriteUnknownResult - org.apache.cassandra.distributed.test.CasWriteTest
> junit.framework.AssertionFailedError: Expecting cause to be 
> CasWriteUncertainException
>   at 
> org.apache.cassandra.distributed.test.CasWriteTest.testWriteUnknownResult(CasWriteTest.java:257)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-15806) C* 4.0 is missing a way to identify transient/non-transient SSTables on disk

2020-05-19 Thread Robert Stupp (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-15806:
-
  Since Version: 4.0-alpha
Source Control Link: 
https://github.com/apache/cassandra/commit/3f689e93768ea670f7a8351ec30128dd4b410c9c
 Resolution: Fixed
 Status: Resolved  (was: Ready to Commit)

Thanks!

Committed as 
[3f689e93768ea670f7a8351ec30128dd4b410c9c|https://github.com/apache/cassandra/commit/3f689e93768ea670f7a8351ec30128dd4b410c9c]
 to [trunk|https://github.com/apache/cassandra/tree/trunk].


> C* 4.0 is missing a way to identify transient/non-transient SSTables on disk
> 
>
> Key: CASSANDRA-15806
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15806
> Project: Cassandra
>  Issue Type: Bug
>  Components: Tool/sstable
>Reporter: sequoyha pelletier
>Assignee: sequoyha pelletier
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: 15806-4.0.txt
>
>
> Currently, there is no way to identify SSTables that were created as
> transient replicated data. Even though the feature is experimental, we
> should open it up for those who want to experiment. This seems to be an
> oversight. I have added the missing line of code to the
> SSTableMetadataViewer and will attach a patch shortly.






[jira] [Updated] (CASSANDRA-15676) flaky test testWriteUnknownResult- org.apache.cassandra.distributed.test.CasWriteTest

2020-05-19 Thread Robert Stupp (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-15676:
-
Reviewers: Ekaterina Dimitrova, Yifan Cai, Robert Stupp  (was: Ekaterina Dimitrova, Robert Stupp, Yifan Cai)
   Ekaterina Dimitrova, Yifan Cai, Robert Stupp  (was: Ekaterina Dimitrova, Yifan Cai)
   Status: Review In Progress  (was: Patch Available)

> flaky test testWriteUnknownResult- 
> org.apache.cassandra.distributed.test.CasWriteTest
> -
>
> Key: CASSANDRA-15676
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15676
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Kevin Gallardo
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: Screen Shot 2020-05-07 at 7.25.19 PM.png
>
>
> Failure observed in: 
> https://app.circleci.com/pipelines/github/newkek/cassandra/33/workflows/54007cf7-4424-4ec1-9655-665f6044e6d1/jobs/187/tests
> {noformat}
> testWriteUnknownResult - org.apache.cassandra.distributed.test.CasWriteTest
> junit.framework.AssertionFailedError: Expecting cause to be CasWriteUncertainException
>   at org.apache.cassandra.distributed.test.CasWriteTest.testWriteUnknownResult(CasWriteTest.java:257)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}






[jira] [Updated] (CASSANDRA-15676) flaky test testWriteUnknownResult- org.apache.cassandra.distributed.test.CasWriteTest

2020-05-19 Thread Robert Stupp (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-15676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-15676:
-
Status: Ready to Commit  (was: Review In Progress)

> flaky test testWriteUnknownResult- 
> org.apache.cassandra.distributed.test.CasWriteTest
> -
>
> Key: CASSANDRA-15676
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15676
> Project: Cassandra
>  Issue Type: Task
>  Components: Test/dtest
>Reporter: Kevin Gallardo
>Assignee: Gianluca Righetto
>Priority: Normal
> Fix For: 4.0-alpha
>
> Attachments: Screen Shot 2020-05-07 at 7.25.19 PM.png
>
>
> Failure observed in: 
> https://app.circleci.com/pipelines/github/newkek/cassandra/33/workflows/54007cf7-4424-4ec1-9655-665f6044e6d1/jobs/187/tests
> {noformat}
> testWriteUnknownResult - org.apache.cassandra.distributed.test.CasWriteTest
> junit.framework.AssertionFailedError: Expecting cause to be CasWriteUncertainException
>   at org.apache.cassandra.distributed.test.CasWriteTest.testWriteUnknownResult(CasWriteTest.java:257)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {noformat}






[cassandra] branch trunk updated: Fixed non-deterministic test in CasWriteTest

2020-05-19 Thread snazy
This is an automated email from the ASF dual-hosted git repository.

snazy pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/trunk by this push:
 new fdcd0df  Fixed non-deterministic test in CasWriteTest
fdcd0df is described below

commit fdcd0dff216d9e1ad242be1a7d5be3ef67044ac3
Author: Gianluca Righetto 
AuthorDate: Wed May 20 06:30:29 2020 +0200

Fixed non-deterministic test in CasWriteTest

patch by Gianluca Righetto; reviewed by Ekaterina Dimitrova & Yifan Cai for CASSANDRA-15676
---
 .../cassandra/distributed/test/CasWriteTest.java   | 40 +++++++++++++++++++++++-----------------
 1 file changed, 23 insertions(+), 17 deletions(-)

diff --git 
a/test/distributed/org/apache/cassandra/distributed/test/CasWriteTest.java 
b/test/distributed/org/apache/cassandra/distributed/test/CasWriteTest.java
index 1d886cf..81b52f7 100644
--- a/test/distributed/org/apache/cassandra/distributed/test/CasWriteTest.java
+++ b/test/distributed/org/apache/cassandra/distributed/test/CasWriteTest.java
@@ -32,6 +32,7 @@ import java.util.function.Consumer;
 import java.util.function.Function;
 import java.util.function.Supplier;
 
+import com.google.common.util.concurrent.Uninterruptibles;
 import org.junit.After;
 import org.junit.AfterClass;
 import org.junit.Assert;
@@ -249,28 +250,33 @@ public class CasWriteTest extends TestBaseImpl
     @Test
     public void testWriteUnknownResult()
     {
-        while (true)
-        {
-            cluster.filters().reset();
-            int pk = pkGen.getAndIncrement();
-            cluster.filters().verbs(Verb.PAXOS_PROPOSE_REQ.id).from(1).to(3).messagesMatching((from, to, msg) -> {
+        cluster.filters().reset();
+        int pk = pkGen.getAndIncrement();
+        CountDownLatch ready = new CountDownLatch(1);
+        cluster.filters().verbs(Verb.PAXOS_PROPOSE_REQ.id).from(1).to(2, 3).messagesMatching((from, to, msg) -> {
+            if (to == 2)
+            {
                 // Inject a single CAS request in-between prepare and propose phases
                 cluster.coordinator(2).execute(mkCasInsertQuery((a) -> pk, 1, 2),
                                                ConsistencyLevel.QUORUM);
-                return false;
-            }).drop();
-
-            try
-            {
-                cluster.coordinator(1).execute(mkCasInsertQuery((a) -> pk, 1, 1), ConsistencyLevel.QUORUM);
-            }
-            catch (Throwable t)
-            {
-                Assert.assertEquals("Expecting cause to be CasWriteUncertainException",
-                                    CasWriteUnknownResultException.class.getCanonicalName(), t.getClass().getCanonicalName());
-                return;
+                ready.countDown();
+            } else {
+                Uninterruptibles.awaitUninterruptibly(ready);
             }
+            return false;
+        }).drop();
+
+        try
+        {
+            cluster.coordinator(1).execute(mkCasInsertQuery((a) -> pk, 1, 1), ConsistencyLevel.QUORUM);
+        }
+        catch (Throwable t)
+        {
+            Assert.assertEquals("Expecting cause to be CasWriteUnknownResultException",
+                                CasWriteUnknownResultException.class.getCanonicalName(), t.getClass().getCanonicalName());
+            return;
         }
+        Assert.fail("Expecting test to throw a CasWriteUnknownResultException");
     }
 
     // every invokation returns a query with an unique pk
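
The patch replaces the old retry loop with a CountDownLatch that orders the two intercepted propose messages: the callback for node 2 injects the competing CAS and releases the latch, while the callback for node 3 blocks until that injection has happened. A minimal standalone sketch of the same ordering technique (class and event names here are illustrative, not Cassandra's API):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;

public class LatchOrderingSketch
{
    // Runs two "message callbacks" on separate threads and returns the
    // order in which their side effects were observed. The latch makes
    // that order deterministic regardless of thread scheduling.
    static List<String> run()
    {
        CountDownLatch ready = new CountDownLatch(1);
        List<String> events = Collections.synchronizedList(new ArrayList<>());

        // Stand-in for the propose message addressed to node 2: perform
        // the injected CAS first, then release the latch.
        Thread toNode2 = new Thread(() -> {
            events.add("inject CAS on node 2");
            ready.countDown();
        });

        // Stand-in for the propose message addressed to node 3: block
        // until node 2's injection has happened, then drop the message.
        Thread toNode3 = new Thread(() -> {
            try { ready.await(); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            events.add("drop propose to node 3");
        });

        toNode3.start(); // deliberately started in the "wrong" order
        toNode2.start();
        try
        {
            toNode2.join();
            toNode3.join();
        }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        return events;
    }

    public static void main(String[] args)
    {
        // countDown() happens-after the first add, and await() returns only
        // after countDown(), so the observed order is fixed.
        System.out.println(run());
    }
}
```

This is why the fix removes the `while (true)` loop: instead of retrying until the injected CAS happens to land between prepare and propose, the latch guarantees that interleaving on every run.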

