[jira] [Created] (CASSANDRA-16252) centos 7

2020-11-06 Thread Karim Chowdhury (Jira)
Karim Chowdhury created CASSANDRA-16252:
---

 Summary: centos 7 
 Key: CASSANDRA-16252
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16252
 Project: Cassandra
  Issue Type: Bug
Reporter: Karim Chowdhury


We are using CentOS 7. We have a 10-node Cassandra cluster and are currently 
facing issues with one of the nodes, which fails with the following error:

Fatal exception in thread Thread[CompactionExecutor:1,1,main]
java.io.IOError: java.io.EOFException

or

java.lang.AssertionError: Added column does not sort as the last column

I have already tried removing the data folder, running scrub, and running 
repair. The error happens again after 24 hours.

Need help.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-15214) OOMs caught and not rethrown

2020-11-06 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227604#comment-17227604
 ] 

David Capwell commented on CASSANDRA-15214:
---

+1 from me, with one small comment; see the PR.

I tested this patch by breaking byte buffer allocation so that it runs out of 
direct memory. In doing so I found an edge case in the client code (.transport 
package); once that is fixed, both the client and internode paths shut down 
on OOM.

> OOMs caught and not rethrown
> 
>
> Key: CASSANDRA-15214
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15214
> Project: Cassandra
>  Issue Type: Bug
>  Components: Messaging/Client, Messaging/Internode
>Reporter: Benedict Elliott Smith
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 4.0, 4.0-rc
>
> Attachments: oom-experiments.zip
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Netty (at least, and perhaps elsewhere in Executors) catches all exceptions, 
> so presently there is no way to ensure that an OOM reaches the JVM handler to 
> trigger a crash/heap dump.
> It may be that the simplest, most consistent way to do this would be to have a 
> single thread spawned at startup that waits for any exceptions we must 
> propagate to the Runtime.
> We could probably submit a patch upstream to Netty, but for a guaranteed, 
> future-proof approach, it may be worth paying the cost of a single thread.
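The single-thread idea above can be sketched in Java as follows. {{FatalErrorPropagator}} and its methods are hypothetical names, not a Cassandra or Netty API; the sketch assumes that catch blocks which currently swallow errors call {{propagate}}, and that the handler passed in at startup performs the crash/heap-dump step itself.

```java
import java.util.Objects;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper (not a Cassandra or Netty API): one daemon thread, started
// at startup, that rethrows the first fatal error handed to it so the error reaches
// an uncaught-exception handler even if the code that caught it swallows exceptions.
final class FatalErrorPropagator
{
    private final AtomicReference<Throwable> first = new AtomicReference<>();
    private final CountDownLatch signalled = new CountDownLatch(1);
    private final Thread thread;

    FatalErrorPropagator(Thread.UncaughtExceptionHandler handler)
    {
        thread = new Thread(this::awaitAndRethrow, "fatal-error-propagator");
        thread.setDaemon(true);
        // Assumed to perform the crash / heap-dump step.
        thread.setUncaughtExceptionHandler(handler);
    }

    void start()
    {
        thread.start();
    }

    // Called from catch blocks that would otherwise swallow the error.
    // Only the first error is kept; later calls are no-ops.
    void propagate(Throwable t)
    {
        Objects.requireNonNull(t);
        if (first.compareAndSet(null, t))
            signalled.countDown();
    }

    private void awaitAndRethrow()
    {
        try
        {
            signalled.await();
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return;
        }
        Throwable t = first.get();
        // Rethrowing terminates this thread and invokes its uncaught-exception handler.
        if (t instanceof Error)
            throw (Error) t;
        throw new RuntimeException(t);
    }

    void awaitTermination() throws InterruptedException
    {
        thread.join();
    }
}
```

Holding a single slot rather than a queue avoids growing a collection while memory is already exhausted, and because the thread exits after rethrowing, any later errors are ignored by design.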






[jira] [Updated] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.

2020-11-06 Thread Brandon Williams (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Brandon Williams updated CASSANDRA-16248:
-
Resolution: Fixed
Status: Resolved  (was: Open)

Works for me too; that commit fixed it.

> GossipTest hangs until timeout, then fails.
> ---
>
> Key: CASSANDRA-16248
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16248
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Startup and Shutdown, Messaging/Internode, 
> Test/dtest/java
>Reporter: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.0-beta4
>
>
> A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}}
> * There seems to have been a merge/commit race between CASSANDRA-16146 
> ([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e])
>  and CASSANDRA-15935 
> ([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]).
>  The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, 
> but the latter changed the method signature, so the interception is never 
> actually injected. As a result, a latch in the test is never counted down and 
> the test hangs until it times out.
> * After fixing the test code, it still hangs due to changes to 
> {{server_encryption_options}} initialization in CASSANDRA-16144 
> ([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]).
>  It appears to cause an incorrect keystore location to be specified, 
> which makes instance startup fail, again leaving the test hanging 
> until it times out. I don't have the cycles to dig into this further right 
> now, but reverting that commit (and applying the test fix above) restores the 
> green bar.
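One way to catch this class of failure early is to check that a name-and-arity matcher actually selects a method before the test relies on it. The Java sketch below uses plain reflection rather than ByteBuddy; {{InterceptionGuard}} and {{FakeStorageService}} are hypothetical names, and {{FakeStorageService}} only mimics the two-argument {{bootstrap}} shape seen in the GossipTest fix. It is not how the jvm-dtest framework works.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical guard: fail fast when a name-and-arity matcher selects no method,
// instead of silently installing nothing (the failure mode described above).
final class InterceptionGuard
{
    // Hypothetical stand-in whose bootstrap method has the newer two-argument shape.
    static final class FakeStorageService
    {
        public boolean bootstrap(Collection<?> tokens, long bootstrapTimeoutMillis)
        {
            return true;
        }
    }

    // All declared methods with the given name and parameter count.
    static List<Method> matching(Class<?> type, String name, int arity)
    {
        return Arrays.stream(type.getDeclaredMethods())
                     .filter(m -> m.getName().equals(name) && m.getParameterCount() == arity)
                     .collect(Collectors.toList());
    }

    static void requireMatch(Class<?> type, String name, int arity)
    {
        if (matching(type, name, arity).isEmpty())
            throw new IllegalStateException("No " + type.getSimpleName() + "#" + name
                                            + " taking " + arity + " argument(s); interception would not apply");
    }
}
```

A matcher still keyed on one argument would make {{requireMatch}} throw immediately, instead of the test hanging on a latch that is never counted down. A similar up-front check could be applied to the ByteBuddy matcher itself.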






[jira] [Commented] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.

2020-11-06 Thread David Capwell (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227583#comment-17227583
 ] 

David Capwell commented on CASSANDRA-16248:
---

{code}
./ci-test org/apache/cassandra/distributed/test/GossipTest
...
testclasslist:
 [echo] Number of test runners: 1
[mkdir] Created dir: /Users/davidcapwell/src/github/apache/cassandra-trunk/build/test/cassandra
[mkdir] Created dir: /Users/davidcapwell/src/github/apache/cassandra-trunk/build/test/output
[junit-timeout] Picked up _JAVA_OPTIONS: -Djava.net.preferIPv4Stack=true
[junit-timeout] Testsuite: org.apache.cassandra.distributed.test.GossipTest
[junit-timeout] Testsuite: org.apache.cassandra.distributed.test.GossipTest 
Tests run: 2, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 56.095 sec
[junit-timeout]

BUILD SUCCESSFUL
Total time: 1 minute 37 seconds
{code}

Trunk is working for me again; thanks for the commit, [~yifanc] and 
[~brandon.williams].







[jira] [Commented] (CASSANDRA-15299) CASSANDRA-13304 follow-up: improve checksumming and compression in protocol v5-beta

2020-11-06 Thread Sam Tunnicliffe (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-15299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227575#comment-17227575
 ] 

Sam Tunnicliffe commented on CASSANDRA-15299:
-

I just pushed a commit that renames {{o.a.c.t.Frame}} to {{Envelope}}, which 
IMO greatly reduces the cognitive friction here. There are no client-facing 
changes involved; the renaming is purely internal. (Aside from docs: I've 
updated the WIP asciidoc on V5 framing, and will get the main protocol spec 
updated in CASSANDRA-14688 ASAP.)

> CASSANDRA-13304 follow-up: improve checksumming and compression in protocol 
> v5-beta
> ---
>
> Key: CASSANDRA-15299
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15299
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Messaging/Client
>Reporter: Aleksey Yeschenko
>Assignee: Sam Tunnicliffe
>Priority: Normal
>  Labels: protocolv5
> Fix For: 4.0-alpha
>
> Attachments: Process CQL Frame.png, V5 Flow Chart.png
>
>
> CASSANDRA-13304 made an important improvement to our native protocol: it 
> introduced checksumming/CRC32 to request and response bodies. It’s an 
> important step forward, but it doesn’t cover the entire stream. In 
> particular, the message header is not covered by a checksum or a CRC, which 
> poses a correctness issue if, for example, {{streamId}} gets corrupted.
> Additionally, we aren’t quite using CRC32 correctly, in two ways:
> 1. We are calculating the CRC32 of the *decompressed* value instead of 
> computing the CRC32 on the bytes written on the wire - losing the properties 
> of the CRC32. In some cases, due to this sequencing, attempting to decompress 
> a corrupt stream can cause LZ4 to segfault.
> 2. When using CRC32, the CRC32 value is written in the incorrect byte order, 
> also losing some of the protections.
> See https://users.ece.cmu.edu/~koopman/pubs/KoopmanCRCWebinar9May2012.pdf for 
> an explanation of the two points above.
> Separately, there are some long-standing issues with the protocol - since 
> *way* before CASSANDRA-13304. Importantly, both checksumming and compression 
> operate on individual message bodies rather than frames of multiple complete 
> messages. In reality, this has several important additional downsides. To 
> name a couple:
> # For compression, we are getting poor compression ratios for smaller 
> messages - when operating on tiny sequences of bytes. In reality, for most 
> small requests and responses we are discarding the compressed value as it’d 
> be larger than the uncompressed one - incurring both redundant allocations 
> and compressions.
> # For checksumming and CRC32 we pay a high overhead price for small messages. 
> 4 bytes extra is *a lot* for an empty write response, for example.
> To address the correctness issue of {{streamId}} not being covered by the 
> checksum/CRC32 and the inefficiency in compression and checksumming/CRC32, we 
> should switch to a framing protocol with multiple messages in a single frame.
> I suggest we reuse the framing protocol recently implemented for internode 
> messaging in CASSANDRA-15066 to the extent that its logic can be borrowed, 
> and that we do it before native protocol v5 graduates from beta. See 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderCrc.java
>  and 
> https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/net/FrameDecoderLZ4.java.
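The two CRC32 points and the multi-message framing proposal can be illustrated with a small Java sketch. It packs several complete messages into one frame, compresses the whole payload, and computes the CRC32 over the bytes as written on the wire (length header included), so a corrupt frame is rejected before anything reaches the decompressor. This is not the {{FrameEncoderCrc}}/{{FrameDecoderLZ4}} design: {{java.util.zip.Deflater}} stands in for LZ4, and the frame layout and little-endian byte order are assumptions made only for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Sketch only: Deflater stands in for LZ4; the frame layout
// [int length][compressed payload][int crc32] and little-endian order are assumptions.
final class CrcFrameSketch
{
    static ByteBuffer encode(List<byte[]> messages) throws IOException
    {
        // Pack several complete messages into one payload: [int size][bytes]...
        ByteArrayOutputStream packed = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(packed);
        for (byte[] m : messages)
        {
            out.writeInt(m.length);
            out.write(m);
        }
        out.flush();
        byte[] wire = deflate(packed.toByteArray());
        // The CRC covers the bytes as written on the wire, length header included.
        ByteBuffer frame = ByteBuffer.allocate(4 + wire.length + 4).order(ByteOrder.LITTLE_ENDIAN);
        frame.putInt(wire.length).put(wire);
        ByteBuffer covered = frame.duplicate();
        covered.flip();
        CRC32 crc = new CRC32();
        crc.update(covered);
        frame.putInt((int) crc.getValue());
        frame.flip();
        return frame;
    }

    static List<byte[]> decode(ByteBuffer frame) throws IOException, DataFormatException
    {
        ByteBuffer in = frame.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int length = in.getInt();
        if (length < 0 || length > in.remaining() - 4)
            throw new IOException("Corrupt length header");
        ByteBuffer covered = frame.duplicate();
        covered.limit(covered.position() + 4 + length);
        CRC32 crc = new CRC32();
        crc.update(covered);
        byte[] wire = new byte[length];
        in.get(wire);
        if (in.getInt() != (int) crc.getValue())
            throw new IOException("CRC mismatch; refusing to decompress");
        // Only verified bytes ever reach the decompressor.
        DataInputStream data = new DataInputStream(new ByteArrayInputStream(inflate(wire)));
        List<byte[]> messages = new ArrayList<>();
        while (data.available() > 0)
        {
            byte[] m = new byte[data.readInt()];
            data.readFully(m);
            messages.add(m);
        }
        return messages;
    }

    private static byte[] deflate(byte[] input)
    {
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!deflater.finished())
            out.write(chunk, 0, deflater.deflate(chunk));
        deflater.end();
        return out.toByteArray();
    }

    private static byte[] inflate(byte[] input) throws DataFormatException
    {
        Inflater inflater = new Inflater();
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[256];
        while (!inflater.finished())
        {
            int n = inflater.inflate(chunk);
            if (n == 0 && (inflater.needsInput() || inflater.needsDictionary()))
                throw new DataFormatException("Truncated or unexpected stream");
            out.write(chunk, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }
}
```

Because CRC32 detects every single-bit error, flipping one bit in the compressed payload makes {{decode}} throw before {{inflate}} ever runs, and the 4-byte CRC is paid once per frame rather than once per message.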






[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics

2020-11-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227573#comment-17227573
 ] 

Andres de la Peña commented on CASSANDRA-16183:
---

Great, thanks. Overall the approach looks good to me. I have added a few 
initial minor comments; I'll finish my review early next week. 

> Add tests to cover ClientRequest metrics 
> -
>
> Key: CASSANDRA-16183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Benjamin Lerer
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We do not have tests that cover the ClientRequest metrics:
> * ClientRequestMetrics
> * CASClientRequestMetrics
> * CASClientWriteRequestMetrics
> * ClientWriteRequestMetrics
> * ViewWriteMetrics






[jira] [Commented] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.

2020-11-06 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227552#comment-17227552
 ] 

Brandon Williams commented on CASSANDRA-16248:
--

I fixed the tests in e5ab8c1951; cc [~yifanc]







[cassandra] branch cassandra-3.0 updated: Fix tests broken by CASSANDRA-16146

2020-11-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.0
in repository https://gitbox.apache.org/repos/asf/cassandra.git


The following commit(s) were added to refs/heads/cassandra-3.0 by this push:
 new e5ab8c1  Fix tests broken by CASSANDRA-16146
e5ab8c1 is described below

commit e5ab8c1951384b9ddf0df9f1d4d49b4c9dfc188f
Author: yifan-c 
AuthorDate: Tue Nov 3 15:30:30 2020 -0800

Fix tests broken by CASSANDRA-16146

Patch by Yifan Cai, reviewed by brandonwilliams for CASSANDRA-16146
---
 src/java/org/apache/cassandra/service/StorageService.java   | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java |  2 ++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java
index 7645091..c4309f8 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -1225,10 +1225,21 @@ public class StorageService extends NotificationBroadcasterSupport implements IE
 }
 
 @VisibleForTesting // only used by test
-public void setMovingModeUnsafe() {
+public void setMovingModeUnsafe()
+{
 setMode(Mode.MOVING, true);
 }
 
+/**
+ * Only used in jvm dtest when not using GOSSIP.
+ * See org.apache.cassandra.distributed.impl.Instance#initializeRing(org.apache.cassandra.distributed.api.ICluster)
+ */
+@VisibleForTesting
+public void setNormalModeUnsafe()
+{
+setMode(Mode.NORMAL, true);
+}
+
 private void setMode(Mode m, boolean log)
 {
 setMode(m, null, log);
diff --git a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
index 4f799ee..f72661d 100644
--- a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
@@ -690,6 +690,8 @@ public class Instance extends IsolatedExecutor implements IInvokableInstance
 // check that all nodes are in token metadata
 for (int i = 0; i < tokens.size(); ++i)
 assert storageService.getTokenMetadata().isMember(hosts.get(i).getAddress());
+
+storageService.setNormalModeUnsafe();
 }
 catch (Throwable e) // UnknownHostException
 {





[cassandra] 01/01: Merge branch 'cassandra-3.0' into cassandra-3.11

2020-11-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 94f940cc50c72bcd819098e97548ab28d576bfac
Merge: 3200bcf e5ab8c1
Author: Brandon Williams 
AuthorDate: Fri Nov 6 11:42:01 2020 -0600

Merge branch 'cassandra-3.0' into cassandra-3.11

 src/java/org/apache/cassandra/service/StorageService.java   | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java |  2 ++
 .../distributed/test/ClientNetworkStopStartTest.java|  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)

diff --cc test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
index da0731e,da0731e..0aabc8c
--- a/test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
+++ b/test/distributed/org/apache/cassandra/distributed/test/ClientNetworkStopStartTest.java
@@@ -51,6 -51,6 +51,7 @@@ public class ClientNetworkStopStartTes
  @Test
  public void stopStartThrift() throws IOException, TException
  {
++// GOSSIP is needed in order to initServer correctly.
  try (Cluster cluster = init(Cluster.build(1).withConfig(c -> c.with(Feature.NATIVE_PROTOCOL)).start()))
  {
  IInvokableInstance node = cluster.get(1);





[cassandra] branch trunk updated (0700d79 -> 9ac9a93)

2020-11-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 0700d79  Circleci should run cqlshlib tests as well
 new e5ab8c1  Fix tests broken by CASSANDRA-16146
 new 94f940c  Merge branch 'cassandra-3.0' into cassandra-3.11
 new 9ac9a93  Merge branch 'cassandra-3.11' into trunk

The 3 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/java/org/apache/cassandra/service/StorageService.java   | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java |  1 +
 .../org/apache/cassandra/distributed/test/GossipTest.java   | 12 ++--
 3 files changed, 15 insertions(+), 11 deletions(-)





[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-06 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227544#comment-17227544
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

Committed, thanks!

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At a high level, {{StorageService#setGossipTokens}} sets the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stopping and starting 
> gossip) overrides the actual gossip state.
> 
> This can happen in the following scenario:
> # Bootstrap fails. The gossip state remains {{BOOT}} / {{JOINING}} and 
> code execution exits StorageService#initServer.
> # The operator runs nodetool to stop and restart gossip. The gossip state gets 
> flipped to {{NORMAL}}.
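The scenario above can be modelled with a toy Java class. {{GossipReenableSketch}} and its methods are hypothetical and only mirror the states named in the ticket; the guarded variant shows the expected outcome (a node whose bootstrap failed keeps advertising {{BOOT}} after gossip is restarted), not the committed fix.

```java
// Hypothetical model, not Cassandra's StorageService: names and states are illustrative.
final class GossipReenableSketch
{
    enum Mode { STARTING, JOINING, NORMAL }

    private Mode mode = Mode.STARTING;
    private String advertisedStatus = "";

    void bootstrapStarted()
    {
        mode = Mode.JOINING;
        advertisedStatus = "BOOT";
    }

    void joinedRing()
    {
        mode = Mode.NORMAL;
        advertisedStatus = "NORMAL";
    }

    // Behaviour described in the ticket: restarting gossip republishes NORMAL unconditionally.
    void reenableGossipBlindly()
    {
        advertisedStatus = "NORMAL";
    }

    // Guarded variant: only a node that has actually joined re-advertises NORMAL;
    // otherwise the status it already had (e.g. BOOT after a failed bootstrap) is kept.
    void reenableGossip()
    {
        if (mode == Mode.NORMAL)
            advertisedStatus = "NORMAL";
    }

    String advertisedStatus()
    {
        return advertisedStatus;
    }
}
```

Running the ticket's two steps (bootstrap starts but never completes, then gossip is restarted) leaves the blind variant advertising {{NORMAL}} and the guarded variant still advertising {{BOOT}}.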






[cassandra] branch cassandra-3.11 updated (3200bcf -> 94f940c)

2020-11-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a change to branch cassandra-3.11
in repository https://gitbox.apache.org/repos/asf/cassandra.git.


from 3200bcf  Merge branch 'cassandra-3.0' into cassandra-3.11
 new e5ab8c1  Fix tests broken by CASSANDRA-16146
 new 94f940c  Merge branch 'cassandra-3.0' into cassandra-3.11

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 src/java/org/apache/cassandra/service/StorageService.java   | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java |  2 ++
 .../distributed/test/ClientNetworkStopStartTest.java|  1 +
 3 files changed, 15 insertions(+), 1 deletion(-)





[cassandra] 01/01: Merge branch 'cassandra-3.11' into trunk

2020-11-06 Thread brandonwilliams
This is an automated email from the ASF dual-hosted git repository.

brandonwilliams pushed a commit to branch trunk
in repository https://gitbox.apache.org/repos/asf/cassandra.git

commit 9ac9a9343540e67f4609f75dd5199b2a66624488
Merge: 0700d79 94f940c
Author: Brandon Williams 
AuthorDate: Fri Nov 6 11:43:23 2020 -0600

Merge branch 'cassandra-3.11' into trunk

 src/java/org/apache/cassandra/service/StorageService.java   | 13 -
 .../org/apache/cassandra/distributed/impl/Instance.java |  1 +
 .../org/apache/cassandra/distributed/test/GossipTest.java   | 12 ++--
 3 files changed, 15 insertions(+), 11 deletions(-)

diff --cc src/java/org/apache/cassandra/service/StorageService.java
index d7d3ebe,7dea7a0..47f82b8
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@@ -1464,6 -1488,16 +1465,16 @@@ public class StorageService extends Not
  setMode(Mode.MOVING, true);
  }
  
+ /**
+  * Only used in jvm dtest when not using GOSSIP.
 - * See org.apache.cassandra.distributed.impl.Instance#initializeRing(org.apache.cassandra.distributed.api.ICluster)
++ * See org.apache.cassandra.distributed.impl.Instance#startup(org.apache.cassandra.distributed.api.ICluster)
+  */
+ @VisibleForTesting
+ public void setNormalModeUnsafe()
+ {
+ setMode(Mode.NORMAL, true);
+ }
+ 
  private void setMode(Mode m, boolean log)
  {
  setMode(m, null, log);
diff --cc test/distributed/org/apache/cassandra/distributed/impl/Instance.java
index 4c778f1,50aea0b..2fc7044
--- a/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
+++ b/test/distributed/org/apache/cassandra/distributed/impl/Instance.java
@@@ -481,13 -554,7 +481,14 @@@ public class Instance extends IsolatedE
  }
  else
  {
 -initializeRing(cluster);
 +cluster.stream().forEach(peer -> {
 +if (cluster instanceof Cluster)
 +GossipHelper.statusToNormal((IInvokableInstance) peer).accept(this);
 +else
 +GossipHelper.unsafeStatusToNormal(this, (IInstance) peer);
 +});
 +
++StorageService.instance.setNormalModeUnsafe();
  }
  
  StorageService.instance.ensureTraceKeyspace();
diff --cc test/distributed/org/apache/cassandra/distributed/test/GossipTest.java
index a162ebf,32ecb95..1b6a004
--- a/test/distributed/org/apache/cassandra/distributed/test/GossipTest.java
+++ b/test/distributed/org/apache/cassandra/distributed/test/GossipTest.java
@@@ -19,17 -19,17 +19,13 @@@
  package org.apache.cassandra.distributed.test;
  
  import java.io.Closeable;
--import java.net.InetAddress;
  import java.util.Collection;
  import java.util.concurrent.CountDownLatch;
  import java.util.concurrent.ExecutorService;
  import java.util.concurrent.Executors;
  import java.util.concurrent.Future;
  import java.util.concurrent.TimeUnit;
--import java.util.concurrent.locks.LockSupport;
--import java.util.stream.Collectors;
  
--import com.google.common.collect.Iterables;
  import com.google.common.util.concurrent.Uninterruptibles;
  import org.junit.Assert;
  import org.junit.Test;
@@@ -39,11 -39,11 +35,7 @@@ import net.bytebuddy.dynamic.loading.Cl
  import net.bytebuddy.implementation.MethodDelegation;
  import org.apache.cassandra.dht.Token;
  import org.apache.cassandra.distributed.Cluster;
--import org.apache.cassandra.gms.ApplicationState;
--import org.apache.cassandra.gms.EndpointState;
--import org.apache.cassandra.gms.Gossiper;
  import org.apache.cassandra.service.StorageService;
--import org.apache.cassandra.utils.FBUtilities;
  
  import static net.bytebuddy.matcher.ElementMatchers.named;
  import static net.bytebuddy.matcher.ElementMatchers.takesArguments;
@@@ -61,13 -132,13 +53,13 @@@ public class GossipTest extends TestBas
  if (nodeNumber != 2)
  return;
  new ByteBuddy().rebase(StorageService.class)
--   .method(named("bootstrap").and(takesArguments(1)))
++   .method(named("bootstrap").and(takesArguments(2)))
 
.intercept(MethodDelegation.to(BBBootstrapInterceptor.class))
 .make()
 .load(cl, ClassLoadingStrategy.Default.INJECTION);
  }
  
--public static boolean bootstrap(Collection tokens) throws Exception
++public static boolean bootstrap(Collection tokens, long bootstrapTimeoutMillis)
  {
  bootstrapStart.countDown();
  Uninterruptibles.awaitUninterruptibly(bootstrapReady);



[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-06 Thread Yifan Cai (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227536#comment-17227536
 ] 

Yifan Cai commented on CASSANDRA-16146:
---

Sure thing. I just added a comment to the unsafe method in each branch. 







[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics

2020-11-06 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227533#comment-17227533
 ] 

Adam Holmberg commented on CASSANDRA-16183:
---

I also appreciate using PRs for review. Here's one against my fork:
https://github.com/aholmberg/cassandra-dtest/pull/1







[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics

2020-11-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227474#comment-17227474
 ] 

Andres de la Peña commented on CASSANDRA-16183:
---

We neither merge nor require PRs, but I find them useful for review 
comments, and in my experience it's common to have them for that purpose (see 
[here|https://github.com/apache/cassandra/pulls] and 
[here|https://github.com/apache/cassandra-dtest/pulls]), although tickets 
without a PR are not unusual either. I think that without a PR we can only add 
comments on each of the nine individual commits, but not on the overall diff, 
so it's difficult to get a global view. Also, I'm not sure whether those 
comments would survive a squash, as they do with PRs. WDYT? I can create the 
PR if you don't object, or perhaps we can squash the changes and comment on a 
single commit.







[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position

2020-11-06 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16249:
---
Fix Version/s: (was: 4.0.x)
   4.0-beta

> ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
> ---
>
> Key: CASSANDRA-16249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16249
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer 
> position when calculating the final offset for reading data
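The bug class described above can be reproduced with plain `java.nio.ByteBuffer` calls: the absolute `getShort(int index)` reads at an index counted from the start of the buffer and ignores `position()`. The sketch below is hypothetical (the class and method names are not Cassandra's `ByteBufferAccessor`) and assumes the accessor's offset is meant to be relative to the buffer's current position.

```java
import java.nio.ByteBuffer;

// Hypothetical helpers illustrating CASSANDRA-16249; not Cassandra's actual code.
public class UnsignedShortSketch {
    // Buggy variant: absolute getShort(int) ignores buf.position(),
    // so a buffer with a non-zero position is read at the wrong bytes.
    static int getUnsignedShortIgnoringPosition(ByteBuffer buf, int offset) {
        return buf.getShort(offset) & 0xFFFF;
    }

    // Fixed variant: translate the relative offset into an absolute index
    // by adding the buffer's current position first.
    static int getUnsignedShort(ByteBuffer buf, int offset) {
        return buf.getShort(buf.position() + offset) & 0xFFFF;
    }

    public static void main(String[] args) {
        // Bytes 0..1 are a prefix to skip; bytes 2..3 hold the value 0xFFFE (big-endian).
        ByteBuffer buf = ByteBuffer.wrap(new byte[] {0x00, 0x01, (byte) 0xFF, (byte) 0xFE});
        buf.position(2); // the logical value starts at index 2
        System.out.println(getUnsignedShortIgnoringPosition(buf, 0)); // reads bytes 0..1: 1
        System.out.println(getUnsignedShort(buf, 0));                 // reads bytes 2..3: 65534
    }
}
```

Absolute gets also leave `position()` unchanged, which is why the error is silent: nothing else about the buffer looks wrong after the bad read.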






[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position

2020-11-06 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16249:
---
Reviewers: Benjamin Lerer

> ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
> ---
>
> Key: CASSANDRA-16249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16249
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer 
> position when calculating the final offset for reading data






[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics

2020-11-06 Thread Adam Holmberg (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227463#comment-17227463
 ] 

Adam Holmberg commented on CASSANDRA-16183:
---

I just linked a diff in the previous comment. Here:
https://github.com/apache/cassandra-dtest/compare/trunk...aholmberg:CASSANDRA-16183

Do you want me to create an actual PR? I was under the impression we don't 
actually take those on the mirrors.

> Add tests to cover ClientRequest metrics 
> -
>
> Key: CASSANDRA-16183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Benjamin Lerer
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We do not have tests that cover the ClientRequest metrics:
> * ClientRequestMetrics
> * CASClientRequestMetrics
> * CASClientWriteRequestMetrics
> * ClientWriteRequestMetrics
> * ViewWriteMetrics






[jira] [Commented] (CASSANDRA-16183) Add tests to cover ClientRequest metrics

2020-11-06 Thread Jira


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227455#comment-17227455
 ] 

Andres de la Peña commented on CASSANDRA-16183:
---

[~aholmber] is there a PR for the dtest patch?

> Add tests to cover ClientRequest metrics 
> -
>
> Key: CASSANDRA-16183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Benjamin Lerer
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We do not have tests that cover the ClientRequest metrics:
> * ClientRequestMetrics
> * CASClientRequestMetrics
> * CASClientWriteRequestMetrics
> * ClientWriteRequestMetrics
> * ViewWriteMetrics






[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-06 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227449#comment-17227449
 ] 

Benedict Elliott Smith commented on CASSANDRA-12126:


Yes, that sounds like a great idea, and I really appreciate you offering to 
take that to the list. I'll chime in with any necessary details to help inform 
the decision, but will try not to influence it otherwise. I don't have a strong 
opinion about which of those four options we select, except that my experiments 
do suggest (3) is perhaps dangerous for some of our users. It's probably a 
trade-off that should be made with careful business consideration and 
experimentation by each end user.

As far as delaying 4.0 is concerned, that's probably also a matter of community 
decision-making. We could have a patch, reviewed by multiple committers, posted 
in fairly short order, perhaps before we exit beta. This work will have had much 
greater validation than the current implementation, but publishing all of this 
validation work will take longer. That is likely also achievable before GA, but 
we might have to invert our process a little. Perhaps this is acceptable, given 
the balance of correctness and regression we're considering as an alternative, 
but given my proximity to the work (and that I also don't have a strong position 
either way), I would prefer to let others make that call.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes only to B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue, 
> which is how learners can find out whether a majority of the acceptors have 
> accepted the proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors and some acceptors, but not a majority, have something in flight, 
> we have no way of knowing whether it was accepted by a majority of acceptors. 
> So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority agrees that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will ensure the value written in step 1 is never 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 
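Steps 1 to 3 above can be replayed with a toy single-decree Paxos model. This is a hedged sketch, not Cassandra's implementation: the `Acceptor` class, integer ballots, omitting the commit phase, and modelling the "empty commit" as a `null` value are all assumptions made for illustration. It relies on the standard Paxos rule that a proposer re-proposes the accepted value with the highest ballot it hears about.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical toy model of the CASSANDRA-12126 scenario; not Cassandra's Paxos classes.
public class CasReadSketch {
    static final class Acceptor {
        final String name;
        int promised = 0;            // highest ballot promised so far
        int acceptedBallot = 0;      // 0 means nothing has been accepted
        String acceptedValue = null; // null with acceptedBallot > 0 means an empty proposal

        Acceptor(String name) { this.name = name; }

        boolean prepare(int ballot) {
            if (ballot <= promised) return false;
            promised = ballot;
            return true;
        }

        boolean accept(int ballot, String value) {
            if (ballot < promised) return false;
            promised = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
    }

    // Serial read against a quorum: prepare with a new ballot, then re-propose the accepted
    // value with the highest ballot, if any. With proposeEmptyWhenIdle (the proposed fix),
    // the quorum accepts an empty proposal when nothing is in flight.
    static Optional<String> serialRead(List<Acceptor> quorum, int ballot, boolean proposeEmptyWhenIdle) {
        Acceptor highest = null;
        for (Acceptor a : quorum) {
            if (!a.prepare(ballot)) throw new IllegalStateException("ballot rejected by " + a.name);
            if (a.acceptedBallot > 0 && (highest == null || a.acceptedBallot > highest.acceptedBallot)) {
                highest = a;
            }
        }
        if (highest != null) {
            String value = highest.acceptedValue;
            for (Acceptor a : quorum) a.accept(ballot, value); // complete the in-flight proposal
            return Optional.ofNullable(value);
        }
        if (proposeEmptyWhenIdle) {
            for (Acceptor a : quorum) a.accept(ballot, null); // outrank anything a minority accepted
        }
        return Optional.empty();
    }

    // Replays steps 1-3 and returns what the step 3 read observes.
    static Optional<String> run(boolean withFix) {
        Acceptor a = new Acceptor("A"), b = new Acceptor("B"), c = new Acceptor("C");
        // Step 1: all three promise ballot 1, but only A accepts "v1".
        for (Acceptor x : List.of(a, b, c)) x.prepare(1);
        a.accept(1, "v1");
        // Step 2: a serial read reaching only B and C finds nothing in flight.
        if (serialRead(List.of(b, c), 2, withFix).isPresent()) throw new IllegalStateException();
        // Step 3: a later serial read reaches A and B.
        return serialRead(List.of(a, b), 3, withFix);
    }

    public static void main(String[] args) {
        System.out.println("without fix: " + run(false)); // Optional[v1]
        System.out.println("with fix:    " + run(true));  // Optional.empty
    }
}
```

Without the empty proposal, the step 3 read surfaces `v1` even though the earlier step 2 read returned nothing. With it, B's acceptance of the empty proposal at ballot 2 outranks A's ballot 1, so `v1` is never observed.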






[jira] [Commented] (CASSANDRA-16146) Node state incorrectly set to NORMAL after nodetool disablegossip and enablegossip during bootstrap

2020-11-06 Thread Brandon Williams (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227448#comment-17227448
 ] 

Brandon Williams commented on CASSANDRA-16146:
--

LGTM, supernit: I think it's a good idea to have some comment around unsafe 
methods, but that's easy enough to add on commit.

> Node state incorrectly set to NORMAL after nodetool disablegossip and 
> enablegossip during bootstrap
> ---
>
> Key: CASSANDRA-16146
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16146
> Project: Cassandra
>  Issue Type: Bug
>  Components: Cluster/Gossip
>Reporter: Yifan Cai
>Assignee: Yifan Cai
>Priority: Normal
> Fix For: 3.0.x, 3.11.x, 4.0-beta3
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> At a high level, {{StorageService#setGossipTokens}} sets the gossip state to 
> {{NORMAL}} blindly. Therefore, re-enabling gossip (stopping and starting 
> gossip) overrides the actual gossip state.
>   
> It can happen in the following scenario:
> # Bootstrap fails. The gossip state remains {{BOOT}} / {{JOINING}} and code 
> execution exits {{StorageService#initServer}}.
> # The operator runs nodetool to stop and restart gossip. The gossip state gets 
> flipped to {{NORMAL}}.
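A minimal sketch of the difference between setting the state blindly and guarding it. This is purely illustrative: `GossipRestartSketch`, `restartBlindly`, `restartGuarded` and the `joinedRing` flag are hypothetical names, not the {{StorageService}} API or the committed fix.

```java
// Hypothetical model of CASSANDRA-16146; not Cassandra's StorageService code.
public class GossipRestartSketch {
    enum State { BOOT, JOINING, NORMAL }

    // Behaviour described in the ticket: re-enabling gossip always publishes NORMAL,
    // whatever state the node was actually in.
    static State restartBlindly(State current) {
        return State.NORMAL;
    }

    // Guarded alternative: publish NORMAL only if the node has actually joined the ring;
    // otherwise keep publishing the state it already had.
    static State restartGuarded(State current, boolean joinedRing) {
        return joinedRing ? State.NORMAL : current;
    }

    public static void main(String[] args) {
        State afterFailedBootstrap = State.JOINING; // bootstrap failed, initServer exited
        System.out.println(restartBlindly(afterFailedBootstrap));        // NORMAL (wrong)
        System.out.println(restartGuarded(afterFailedBootstrap, false)); // JOINING
    }
}
```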






[jira] [Updated] (CASSANDRA-16251) SSTableLoader documentation needs improvement

2020-11-06 Thread Ekaterina Dimitrova (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ekaterina Dimitrova updated CASSANDRA-16251:

Change Category: Semantic
 Complexity: Normal
Component/s: Documentation/Website
   Priority: Low  (was: Normal)
 Status: Open  (was: Triage Needed)

> SSTableLoader documentation needs improvement
> -
>
> Key: CASSANDRA-16251
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16251
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation/Website
>Reporter: Ekaterina Dimitrova
>Priority: Low
>
> SSTableLoader documentation is unclear. 
> Offline/online usage, the expected directories, and the steps to use it are 
> unclear and can be confusing for a new user.
> /CC [~lor...@datastax.com]






[jira] [Created] (CASSANDRA-16251) SSTableLoader documentation needs improvement

2020-11-06 Thread Ekaterina Dimitrova (Jira)
Ekaterina Dimitrova created CASSANDRA-16251:
---

 Summary: SSTableLoader documentation needs improvement
 Key: CASSANDRA-16251
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16251
 Project: Cassandra
  Issue Type: Improvement
Reporter: Ekaterina Dimitrova


SSTableLoader documentation is unclear. 
Offline/online usage, the expected directories, and the steps to use it are 
unclear and can be confusing for a new user.

/CC [~lor...@datastax.com]






[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2020-11-06 Thread Ekaterina Dimitrova (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227423#comment-17227423
 ] 

Ekaterina Dimitrova commented on CASSANDRA-14013:
-

??To be honest, the documentation I found on the SSTableloader is pretty 
confusing and I imagine that some people might try to use it directly on the C* 
data directories, in which case the table directory will contain the TableID. 
This case is more or less the same as {{1.}} above.??

To support this: the first time I read about SSTableLoader and tried to use it, 
I did exactly what you said and got really frustrated :-) 

I will open a ticket for [~lor...@datastax.com] to do her magic :-) 

> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Gregor Uhlenheuer
>Assignee: Stefan Miklosovic
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am posting this bug in the hope of discovering the stupid mistake I am 
> making, because I can't imagine a reasonable explanation for the behavior I 
> see right now :-)
> In short, I observe data loss in a keyspace called *snapshots* after 
> restarting the Cassandra service. Say I have 1000 records in a table called 
> *snapshots.test_idx*; after a restart the table has fewer entries or is even 
> empty.
> My somewhat "mysterious" observation is that it happens only in a keyspace 
> called *snapshots*...
> h3. Steps to reproduce
> These steps to reproduce show the described behavior in "most" attempts (not 
> every single time though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill 
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point me to the obvious mistake I am making :-)
> This happened to me using both Cassandra 3.9 and 3.11.0






[jira] [Assigned] (CASSANDRA-16250) LongSharedExecutorPoolTest.testPromptnessOfExecution burn test is flaky

2020-11-06 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer reassigned CASSANDRA-16250:
--

Assignee: Benjamin Lerer

> LongSharedExecutorPoolTest.testPromptnessOfExecution burn test is flaky
> ---
>
> Key: CASSANDRA-16250
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16250
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/burn
>Reporter: Benjamin Lerer
>Assignee: Benjamin Lerer
>Priority: Normal
>
> In the burn tests, {{LongSharedExecutorPoolTest.testPromptnessOfExecution}} 
> fails regularly with the following stacktrace:
> {code}
> junit.framework.AssertionFailedError
>   at 
> org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution(LongSharedExecutorPoolTest.java:213)
>   at 
> org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution(LongSharedExecutorPoolTest.java:102)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> {code}






[jira] [Created] (CASSANDRA-16250) LongSharedExecutorPoolTest.testPromptnessOfExecution burn test is flaky

2020-11-06 Thread Benjamin Lerer (Jira)
Benjamin Lerer created CASSANDRA-16250:
--

 Summary: LongSharedExecutorPoolTest.testPromptnessOfExecution burn 
test is flaky
 Key: CASSANDRA-16250
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16250
 Project: Cassandra
  Issue Type: Improvement
  Components: Test/burn
Reporter: Benjamin Lerer


In the burn tests, {{LongSharedExecutorPoolTest.testPromptnessOfExecution}} 
fails regularly with the following stacktrace:

{code}
junit.framework.AssertionFailedError
at 
org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution(LongSharedExecutorPoolTest.java:213)
at 
org.apache.cassandra.concurrent.LongSharedExecutorPoolTest.testPromptnessOfExecution(LongSharedExecutorPoolTest.java:102)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
{code}







[jira] [Commented] (CASSANDRA-16171) Remove Windows scripts

2020-11-06 Thread Yuki Morishita (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227386#comment-17227386
 ] 

Yuki Morishita commented on CASSANDRA-16171:


Sorry for the delay.

Updated my PR with:
 * removal of the install instructions from README
 * removal of .bat/.ps1 references from build.xml / rpm spec

I think we don't have to touch CHANGES/NEWS.

I left make.bat for the docs; it is for development only and does not go into 
release artifacts. 

> Remove Windows scripts
> --
>
> Key: CASSANDRA-16171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16171
> Project: Cassandra
>  Issue Type: Task
>  Components: Packaging
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As per the email thread on the cassandra-dev mailing list[1], remove Windows 
> scripts from Cassandra 4.0 onwards due to the lack of maintenance and tests.
> 1: https://www.mail-archive.com/dev@cassandra.apache.org/msg15583.html 






[jira] [Updated] (CASSANDRA-16171) Remove Windows scripts

2020-11-06 Thread Yuki Morishita (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-16171:
---
Status: Patch Available  (was: In Progress)

> Remove Windows scripts
> --
>
> Key: CASSANDRA-16171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16171
> Project: Cassandra
>  Issue Type: Task
>  Components: Packaging
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As per the email thread on the cassandra-dev mailing list[1], remove Windows 
> scripts from Cassandra 4.0 onwards due to the lack of maintenance and tests.
> 1: https://www.mail-archive.com/dev@cassandra.apache.org/msg15583.html 






[jira] [Updated] (CASSANDRA-16171) Remove Windows scripts

2020-11-06 Thread Yuki Morishita (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16171?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuki Morishita updated CASSANDRA-16171:
---
Status: In Progress  (was: Changes Suggested)

> Remove Windows scripts
> --
>
> Key: CASSANDRA-16171
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16171
> Project: Cassandra
>  Issue Type: Task
>  Components: Packaging
>Reporter: Yuki Morishita
>Assignee: Yuki Morishita
>Priority: Normal
> Fix For: 4.0-rc
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> As per the email thread on the cassandra-dev mailing list[1], remove Windows 
> scripts from Cassandra 4.0 onwards due to the lack of maintenance and tests.
> 1: https://www.mail-archive.com/dev@cassandra.apache.org/msg15583.html 






[jira] [Comment Edited] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227379#comment-17227379
 ] 

Benjamin Lerer edited comment on CASSANDRA-12126 at 11/6/20, 12:44 PM:
---

It seems to me that there are several options here:
# Try to use your proposal for 4.0 if the community has the appetite for it. 
The main issue there is some potential extra delay for 4.0.
# Do nothing for 4.0, meaning do not commit the patch. We have lived a long 
time with that issue and we can probably wait a bit more for a proper solution.
# Commit the patch as is, fixing the correctness issue but potentially 
introducing some performance issues until we release a better solution.
# Change the patch to default to the current behavior, but allow people to 
enable the new one if correctness is a problem for them.

Maybe we should start a discussion on the mailing list and see what other 
people think.

I can take care of that next week if you think it is a good idea.


was (Author: blerer):
It seems to me that there are several options here:
# Try to use your proposal for 4.0 if the community has the appetite for it. 
The main issue there is some potential extra delay for 4.0.
# Do nothing for 4.0, meaning do not commit the patch. We have lived a long 
time with that issue and we can probably wait a bit more for a proper solution.
# Commit the patch as is, fixing the correctness issue but potentially 
introducing some performance issues until we release a better solution.
# Change the patch to default to the current behavior, but allow people to 
enable the new one if correctness is a problem for them.

Maybe we should start a discussion on the mailing list and see what other 
people think.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes only to B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue, 
> which is how learners can find out whether a majority of the acceptors have 
> accepted the proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors and some acceptors, but not a majority, have something in flight, 
> we have no way of knowing whether it was accepted by a majority of acceptors. 
> So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority agrees that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will ensure the value written in step 1 is never 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 






[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227379#comment-17227379
 ] 

Benjamin Lerer commented on CASSANDRA-12126:


It seems to me that there are several options here:
# Try to use your proposal for 4.0 if the community has the appetite for it. 
The main issue there is some potential extra delay for 4.0.
# Do nothing for 4.0, meaning do not commit the patch. We have lived a long 
time with that issue and we can probably wait a bit more for a proper solution.
# Commit the patch as is, fixing the correctness issue but potentially 
introducing some performance issues until we release a better solution.
# Change the patch to default to the current behavior, but allow people to 
enable the new one if correctness is a problem for them.

Maybe we should start a discussion on the mailing list and see what other 
people think.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS Reads. Here is how it can happen with RF=3:
> 1) You issue a CAS Write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS Read and it goes only to B and C. You won't be able to read the 
> value written in step 1. This step behaves as if nothing is in flight. 
> 3) Issue another CAS Read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS Write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue, 
> which is how learners can find out whether a majority of the acceptors have 
> accepted the proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> whether it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors and some acceptors, but not a majority, have something in flight, 
> we have no way of knowing whether it was accepted by a majority of acceptors. 
> So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority agrees that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will ensure the value written in step 1 is never 
> visible afterwards. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 






[jira] [Updated] (CASSANDRA-16183) Add tests to cover ClientRequest metrics

2020-11-06 Thread Jira


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andres de la Peña updated CASSANDRA-16183:
--
Reviewers: Andres de la Peña

> Add tests to cover ClientRequest metrics 
> -
>
> Key: CASSANDRA-16183
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16183
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/dtest/java
>Reporter: Benjamin Lerer
>Assignee: Adam Holmberg
>Priority: Normal
> Fix For: 4.0-beta
>
>
> We do not have tests that cover the ClientRequest metrics:
> * ClientRequestMetrics
> * CASClientRequestMetrics
> * CASClientWriteRequestMetrics
> * ClientWriteRequestMetrics
> * ViewWriteMetrics






[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position

2020-11-06 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16249:
--
Test and Documentation Plan: Run regression tests
 Status: Patch Available  (was: In Progress)

> ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
> ---
>
> Key: CASSANDRA-16249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16249
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
> Fix For: 4.0.x
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> {{ByteBufferAccessor.readUnsignedShort}} does not include the current buffer 
> position when calculating the final offset for reading data






[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position

2020-11-06 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16249:
--
  Fix Version/s: 4.0.x
Source Control Link: https://github.com/apache/cassandra/pull/811







[jira] [Updated] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position

2020-11-06 Thread Jacek Lewandowski (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jacek Lewandowski updated CASSANDRA-16249:
--
 Bug Category: Parent values: Correctness(12982) Level 1 values: 
Unrecoverable Corruption / Loss(13161)
   Complexity: Low Hanging Fruit
  Component/s: Legacy/Core
Discovered By: Code Inspection
 Severity: Critical
   Status: Open  (was: Triage Needed)

> ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position
> ---
>
> Key: CASSANDRA-16249
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16249
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Jacek Lewandowski
>Assignee: Jacek Lewandowski
>Priority: Normal
>
> {{ByteBufferAccessor.getUnsignedShort}} does not include the current buffer 
> position when calculating the final offset for reading data.






[jira] [Created] (CASSANDRA-16249) ByteBufferAccessor.getUnsignedShort ignores ByteBuffer position

2020-11-06 Thread Jacek Lewandowski (Jira)
Jacek Lewandowski created CASSANDRA-16249:
-

 Summary: ByteBufferAccessor.getUnsignedShort ignores ByteBuffer 
position
 Key: CASSANDRA-16249
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16249
 Project: Cassandra
  Issue Type: Bug
Reporter: Jacek Lewandowski
Assignee: Jacek Lewandowski


{{ByteBufferAccessor.getUnsignedShort}} does not include the current buffer 
position when calculating the final offset for reading data.






[jira] [Updated] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.

2020-11-06 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16248:

 Bug Category: Parent values: Correctness(12982) Level 1 values: Test 
Failure(12990)
   Complexity: Normal
Discovered By: Unit Test
Fix Version/s: 4.0-beta4
 Severity: Critical
   Status: Open  (was: Triage Needed)

> GossipTest hangs until timeout, then fails.
> ---
>
> Key: CASSANDRA-16248
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16248
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Startup and Shutdown, Messaging/Internode, 
> Test/dtest/java
>Reporter: Sam Tunnicliffe
>Priority: Normal
> Fix For: 4.0-beta4
>
>
> A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}}:
> * There seems to have been a merge/commit race between CASSANDRA-16146 
> ([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e])
> and CASSANDRA-15935 
> ([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]).
> The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, 
> but the latter changed the method signature, so the interception never 
> actually gets injected. As a result, a latch in the test is never counted 
> down, and the test hangs until timeout.
> * After fixing the test code, it still hangs due to changes to 
> {{server_encryption_options}} initialization in CASSANDRA-16144 
> ([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]).
> These changes appear to cause an incorrect keystore location to be specified, 
> which makes instance startup fail, again leaving the test hanging 
> until it times out. I don't have the cycles to dig into this further right 
> now, but reverting that commit (and applying the test fix above) restores the 
> green bar.
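One way a test could fail fast instead of hanging, sketched with plain JDK reflection rather than ByteBuddy. {{FakeService}} and its signatures are hypothetical stand-ins, not the real {{StorageService::bootstrap}}; the sketch assumes the interception is written against a fixed parameter list. Checking that the target method still exists, and waiting on the latch with a bound, turns a silent non-injection into an immediate failure:

```java
import java.util.Collection;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

final class InterceptionGuardSketch {
    // Hypothetical stand-in whose method gained a parameter in a refactor.
    // This is NOT the real StorageService; the signatures are invented for illustration.
    static final class FakeService {
        boolean bootstrap(Collection<String> tokens, long timeoutMillis) {
            return true;
        }
    }

    // Returns true only if a method with exactly these parameter types is declared on the type.
    static boolean targetExists(Class<?> type, String name, Class<?>... paramTypes) {
        try {
            type.getDeclaredMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Bounded wait: returns false on timeout instead of blocking indefinitely.
    static boolean awaitBounded(CountDownLatch latch, long millis) throws InterruptedException {
        return latch.await(millis, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        // The interception was written against an older, one-argument signature.
        System.out.println(targetExists(FakeService.class, "bootstrap", Collection.class));
        System.out.println(targetExists(FakeService.class, "bootstrap", Collection.class, long.class));

        // Nothing ever counts this latch down, mirroring an interception that was never injected.
        CountDownLatch latch = new CountDownLatch(1);
        System.out.println(awaitBounded(latch, 100));
    }
}
```

Here the stale one-argument signature no longer matches, so {{targetExists}} returns false, and the bounded {{await}} returns false after roughly 100 ms instead of blocking until the suite-level timeout.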






[jira] [Updated] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.

2020-11-06 Thread Sam Tunnicliffe (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sam Tunnicliffe updated CASSANDRA-16248:

Description: 
A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}}:

* There seems to have been a merge/commit race between CASSANDRA-16146 
([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e])
and CASSANDRA-15935 
([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]).
The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, but 
the latter changed the method signature, so the interception never actually gets 
injected. As a result, a latch in the test is never counted down, and the test 
hangs until timeout.
* After fixing the test code, it still hangs due to changes to 
{{server_encryption_options}} initialization in CASSANDRA-16144 
([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]).
These changes appear to cause an incorrect keystore location to be specified, which 
makes instance startup fail, again leaving the test hanging until it 
times out. I don't have the cycles to dig into this further right now, but 
reverting that commit (and applying the test fix above) restores the green bar.



  was:
A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}}:

* There seems to have been a merge/commit race between CASSANDRA-16146 
([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e])
and CASSANDRA-15935 
([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]).
The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, but 
the latter changed the method signature, so the interception never actually gets 
injected. As a result, a latch in the test is never counted down, and the test 
hangs until timeout.
* After fixing the test code, it still hangs due to changes to 
{{server_encryption_options}} initialization in CASSANDRA-16144 
([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]).
These changes appear to cause an incorrect keystore location to be specified, which 
makes instance startup fail, again leaving the test hanging until it 
times out. I don't have the cycles to dig into this further right now, but 
reverting that commit (and applying the test fix above) restores the green bar.




> GossipTest hangs until timeout, then fails.
> ---
>
> Key: CASSANDRA-16248
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16248
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local/Startup and Shutdown, Messaging/Internode, 
> Test/dtest/java
>Reporter: Sam Tunnicliffe
>Priority: Normal
>
> A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}}:
> * There seems to have been a merge/commit race between CASSANDRA-16146 
> ([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e])
> and CASSANDRA-15935 
> ([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]).
> The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, 
> but the latter changed the method signature, so the interception never 
> actually gets injected. As a result, a latch in the test is never counted 
> down, and the test hangs until timeout.
> * After fixing the test code, it still hangs due to changes to 
> {{server_encryption_options}} initialization in CASSANDRA-16144 
> ([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]).
> These changes appear to cause an incorrect keystore location to be specified, 
> which makes instance startup fail, again leaving the test hanging 
> until it times out. I don't have the cycles to dig into this further right 
> now, but reverting that commit (and applying the test fix above) restores the 
> green bar.






[jira] [Created] (CASSANDRA-16248) GossipTest hangs until timeout, then fails.

2020-11-06 Thread Sam Tunnicliffe (Jira)
Sam Tunnicliffe created CASSANDRA-16248:
---

 Summary: GossipTest hangs until timeout, then fails.
 Key: CASSANDRA-16248
 URL: https://issues.apache.org/jira/browse/CASSANDRA-16248
 Project: Cassandra
  Issue Type: Bug
  Components: Local/Startup and Shutdown, Messaging/Internode, 
Test/dtest/java
Reporter: Sam Tunnicliffe


A couple of recent updates appear to have broken {{o.a.c.d.t.GossipTest}}:

* There seems to have been a merge/commit race between CASSANDRA-16146 
([{{fee7a108}}|https://github.com/apache/cassandra/commit/fee7a10823da1e29bd0e6504fea9679389180c9e])
and CASSANDRA-15935 
([{{41952a2f}}|https://github.com/apache/cassandra/commit/41952a2f73ba5198250f64beba8f7ff1203204ab]).
The former adds a ByteBuddy interception on {{StorageService::bootstrap}}, but 
the latter changed the method signature, so the interception never actually gets 
injected. As a result, a latch in the test is never counted down, and the test 
hangs until timeout.
* After fixing the test code, it still hangs due to changes to 
{{server_encryption_options}} initialization in CASSANDRA-16144 
([{{f293376a}}|https://github.com/apache/cassandra/commit/f293376aa8dd315a208ef2f03bdcb7a84dcc675c]).
These changes appear to cause an incorrect keystore location to be specified, which 
makes instance startup fail, again leaving the test hanging until it 
times out. I don't have the cycles to dig into this further right now, but 
reverting that commit (and applying the test fix above) restores the green bar.








[jira] [Commented] (CASSANDRA-12126) CAS Reads Inconsistencies

2020-11-06 Thread Benedict Elliott Smith (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227354#comment-17227354
 ] 

Benedict Elliott Smith commented on CASSANDRA-12126:


To some extent that is all up for debate.


My plan so far has been to avoid interfering with the 4.0 release, so I have been 
working towards targeting 4.x. This would also permit time to produce 
documentation and reach out to the list to begin the slow handshake to see if 
the project wants the work, and in what manner. However, the main body of work 
is essentially complete, so it is possible that this could be brought forward 
if there were appetite.
As to the target version, it would be possible to target 3.0+, at least for the 
portion of the work that would encompass this issue, without a great deal of 
effort. The project's appetite would be the main decider, as it's a significant 
body of work.


The main contribution would be a parallel implementation of the same 
underlying Paxos algorithm that is able to run concurrently alongside the existing one 
(supporting live migration), but with several latency improvements as well as 
several correctness fixes. Alongside this is related work to guarantee 
linearizability across range movements, in the form of modifications to repair, 
bootstrap, replace, etc.


Related to this work are several patches to wider Cassandra to support 
automated verification of its correctness, by permitting deterministic 
simulation of Cassandra clusters with adversarial ordering of events. We have 
so far simulated billions of transactions to verify its linearizability. I 
anticipate that these patches will be useful for the project's overall goal of 
improving quality, but they are themselves quite significant and will require 
their own discussions around timeline and scope.

> CAS Reads Inconsistencies 
> --
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
>  Issue Type: Bug
>  Components: Feature/Lightweight Transactions, Legacy/Coordination
>Reporter: Sankalp Kohli
>Assignee: Sylvain Lebresne
>Priority: Normal
>  Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with 
> CAS reads. Here is how it can happen with RF=3:
> 1) You issue a CAS write and it fails in the propose phase. Machine A replies 
> true to a propose and saves the commit in the accepted field. The other two 
> machines, B and C, do not get to the accept phase. 
> The current state is that machine A has this commit in the paxos table as 
> accepted but not committed, and B and C do not. 
> 2) Issue a CAS read and it goes only to B and C. You won't be able to read the 
> value written in step 1. From this read's point of view, nothing is in flight. 
> 3) Issue another CAS read and it goes to A and B. Now we will discover that 
> there is something in flight from A and will propose and commit it with the 
> current ballot. Now we can read the value written in step 1 as part of this 
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value 
> written in step 1. 
> 4) Issue a CAS write and it involves only B and C. This will succeed and 
> commit a different value than step 1. The step 1 value will never be seen 
> again and was never seen before. 
> If you read Lamport's "Paxos Made Simple" paper, section 2.3 talks about this 
> issue, which is how learners can find out whether a majority of the acceptors 
> have accepted the proposal. 
> In step 3, it is correct that we propose the value again, since we don't know 
> if it was accepted by a majority of acceptors. When we ask a majority of 
> acceptors and some of them, but not a majority, have something in flight, we 
> have no way of knowing whether it was accepted by a majority of acceptors. 
> So this behavior is correct. 
> However, we need to fix step 2, since it causes reads to not be linearizable 
> with respect to writes and other reads. In this case, we know that a majority 
> of acceptors have no in-flight commit, which means a majority confirms that 
> nothing was accepted by a majority. I think we should run a propose step here 
> with an empty commit, which will make the write from step 1 permanently 
> invisible. 
> With this fix, we will either see the data written in step 1 on the next 
> serial read or never see it, which is what we want. 
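The scenario in the description can be replayed as a toy simulation. This is a hypothetical, heavily simplified model (single partition, integer ballots, no commit phase or paxos table), not Cassandra's Paxos implementation; the flag {{proposeEmptyWhenIdle}} models the empty-commit proposal suggested for step 2:

```java
import java.util.List;
import java.util.Optional;

final class CasReadSketch {
    // Hypothetical, simplified acceptor state for a single partition.
    // Commits are not modelled; "in flight" means accepted but not committed.
    static final class Acceptor {
        int promised = 0;            // highest ballot promised so far
        int acceptedBallot = 0;      // 0 means nothing accepted
        String acceptedValue = null; // null models an empty proposal

        void prepare(int ballot) {
            if (ballot > promised) promised = ballot;
        }

        boolean accept(int ballot, String value) {
            if (ballot < promised) return false;
            promised = ballot;
            acceptedBallot = ballot;
            acceptedValue = value;
            return true;
        }
    }

    // Serial read through a quorum; returns the in-flight value it re-proposed, if any.
    static Optional<String> serialRead(List<Acceptor> quorum, int ballot, boolean proposeEmptyWhenIdle) {
        quorum.forEach(a -> a.prepare(ballot));
        Acceptor latest = null;
        for (Acceptor a : quorum) {
            if (a.acceptedBallot > 0 && (latest == null || a.acceptedBallot > latest.acceptedBallot)) {
                latest = a;
            }
        }
        if (latest != null) {
            // Step 3: re-propose the highest-ballot in-flight proposal with the current ballot.
            String value = latest.acceptedValue;
            quorum.forEach(a -> a.accept(ballot, value));
            return Optional.ofNullable(value);
        }
        // Suggested fix for step 2: the quorum saw nothing in flight, so accept an
        // empty proposal at this ballot, superseding any minority-accepted value.
        if (proposeEmptyWhenIdle) quorum.forEach(a -> a.accept(ballot, null));
        return Optional.empty();
    }

    public static void main(String[] args) {
        for (boolean fix : new boolean[] {false, true}) {
            Acceptor a = new Acceptor(), b = new Acceptor(), c = new Acceptor();
            a.prepare(1);
            a.accept(1, "v1"); // step 1: only A accepts; B and C never reach accept
            Optional<String> read1 = serialRead(List.of(b, c), 2, fix); // step 2
            Optional<String> read2 = serialRead(List.of(a, b), 3, fix); // step 3
            System.out.println("fix=" + fix + " read1=" + read1 + " read2=" + read2);
        }
    }
}
```

With the flag off, the first serial read sees nothing while the second re-proposes and returns {{v1}}. With it on, the empty proposal accepted by B and C at ballot 2 outranks A's ballot-1 proposal, so {{v1}} is never surfaced by any later read.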






[jira] [Comment Edited] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227347#comment-17227347
 ] 

Benjamin Lerer edited comment on CASSANDRA-14013 at 11/6/20, 11:29 AM:
---

Trying to summarize the problem:
# SSTables used within the C* data directories should be within the data 
directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and 
the table directories should be in the form {{-}}. In this 
case the problem comes mainly from keyspaces being named {{backups}} or 
{{snapshots}}.
# Files coming from SSTableLoader should be outside of the data directories and 
the table name should be without the TableID. In this case, keyspaces and 
tables with a {{backups}} or {{snapshots}} name will have issues.

To be honest, the documentation I found on SSTableLoader is pretty 
confusing, and I imagine that some people might try to use it directly on the C* 
data directories, in which case the table directory will contain the TableID. 
This case is essentially the same as {{1.}} above.

[~stefan.miklosovic] As you pointed out, there are several scenarios that we 
never tested: {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag 
name, and SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested 
by CASSANDRA-16235). The patch should add some tests for those scenarios.
We should also probably test {{nodetool refresh}} with a {{snapshots}} or 
{{backups}} keyspace.

Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235.


was (Author: blerer):
Trying to summarize the problem:
# SSTables used within the C* data directories should be within the data 
directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and 
the table directories should be in the form {{-}}. In this 
case the problem comes mainly from keyspaces being named {{backups}} or 
{{snapshots}}.
# Files coming from SSTableLoader should be outside of the data directories and 
the table name should be without the TableID. In this case, keyspaces and 
tables with a {{backups}} or {{snapshots}} name will have issues.

To be honest, the documentation I found on SSTableLoader is pretty 
confusing, and I imagine that some people might try to use it directly on the C* 
data directories, in which case the table directory will contain the TableID. 
This case is essentially the same as {{1.}} above.

[~stefan.miklosovic] As you pointed out, there are several scenarios that we 
never tested: {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag 
name, and SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested 
by CASSANDRA-16235). The patch should add some tests for those scenarios.
We should also probably test {{nodetool refresh}} with a {{snapshots}} or 
{{backups}} keyspace.

Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235.

> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Gregor Uhlenheuer
>Assignee: Stefan Miklosovic
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am posting this bug in the hope of discovering the stupid mistake I am making, 
> because I can't imagine a reasonable explanation for the behavior I see right now 
> :-)
> In short, I observe data loss in a keyspace called *snapshots* after 
> restarting the Cassandra service. Say I have 1000 records in a table 
> called *snapshots.test_idx*; then, after a restart, the table has fewer entries or 
> is even empty.
> My somewhat "mysterious" observation is that it happens only in a keyspace 
> called *snapshots*...
> h3. Steps to reproduce
> These steps to reproduce show the described behavior in "most" attempts (not 
> every single time though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill 
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point me to the obvious mistake I am making :-)
> This happened to me using both Cassandra 3.9 and 3.11.0.
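To illustrate the ambiguity described in the summary comment above, here is a hypothetical, simplified classifier; it is not Cassandra's actual path-parsing code, and the layout {{<data_dir>/<keyspace>/<table>-<id>/[backups | snapshots/<tag>]/<sstable>}} is an assumption of the sketch. Inferring the kind of SSTable from nearby directory names misreads a live SSTable in a keyspace called {{snapshots}}, whereas interpreting the path relative to a known data directory (the role {{DatabaseDescriptor.getAllDataFileLocations()}} plays in the comment) does not:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class SSTablePathSketch {
    enum Kind { LIVE, BACKUP, SNAPSHOT, UNKNOWN }

    // Naive: infer the kind purely from directory names next to the file.
    static Kind naive(Path sstable) {
        Path parent = sstable.getParent();
        if (parent == null || parent.getFileName() == null) return Kind.UNKNOWN;
        if (parent.getFileName().toString().equals("backups")) return Kind.BACKUP;
        Path grandParent = parent.getParent();
        if (grandParent != null && grandParent.getFileName() != null
                && grandParent.getFileName().toString().equals("snapshots")) return Kind.SNAPSHOT;
        return Kind.LIVE;
    }

    // Anchored: count path components relative to a known data directory.
    static Kind anchored(Path dataDir, Path sstable) {
        if (!sstable.startsWith(dataDir)) return Kind.UNKNOWN;
        Path rel = dataDir.relativize(sstable); // keyspace/table-id/[...]/sstable
        int n = rel.getNameCount();
        if (n == 3) return Kind.LIVE;
        if (n == 4 && rel.getName(2).toString().equals("backups")) return Kind.BACKUP;
        if (n == 5 && rel.getName(2).toString().equals("snapshots")) return Kind.SNAPSHOT;
        return Kind.UNKNOWN;
    }

    public static void main(String[] args) {
        // Hypothetical paths; "1234" stands in for a table id.
        Path dataDir = Paths.get("/var/lib/cassandra/data");
        Path live = dataDir.resolve("snapshots/test_idx-1234/nb-1-big-Data.db");
        Path snap = dataDir.resolve("ks/test_idx-1234/snapshots/tag1/nb-1-big-Data.db");
        System.out.println(naive(live) + " " + anchored(dataDir, live)); // SNAPSHOT LIVE
        System.out.println(naive(snap) + " " + anchored(dataDir, snap)); // SNAPSHOT SNAPSHOT
    }
}
```

The naive check also misreads an SSTableLoader-style directory named {{backups}} (no table id suffix) as a backup directory, which matches the second case in the summary.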




[jira] [Comment Edited] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227347#comment-17227347
 ] 

Benjamin Lerer edited comment on CASSANDRA-14013 at 11/6/20, 11:30 AM:
---

Trying to summarize the problem:
# SSTables used within the C* data directories should be within the data 
directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and 
the table directories should be in the form {{-}}. In this 
case the problem comes mainly from keyspaces being named {{backups}} or 
{{snapshots}}.
# Files coming from SSTableLoader should be outside of the data directories and 
the table name should be without the TableID. In this case, keyspaces and 
tables with a {{backups}} or {{snapshots}} name will have issues.

To be honest, the documentation I found on SSTableLoader is pretty 
confusing, and I imagine that some people might try to use it directly on the C* 
data directories, in which case the table directory will contain the TableID. 
This case is essentially the same as {{1.}} above.

[~stefan.miklosovic] As you pointed out, there are several scenarios that we 
never tested: {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag 
name, and SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested 
by CASSANDRA-16235). The patch should add some tests for those scenarios.
We should also probably test {{nodetool refresh}} with a {{snapshots}} or 
{{backups}} keyspace.

Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235.


was (Author: blerer):
Trying to summarize the problem:
# SSTables used within the C* data directories should be within the data 
directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and 
the table directories should be in the form {{-}}. In this 
case the problem comes mainly from keyspaces being named {{backups}} or 
{{snapshots}}.
# Files coming from SSTableLoader should be outside of the data directories and 
the table name should be without the TableID. In this case, keyspaces and 
tables with a {{backups}} or {{snapshots}} name will have issues.

To be honest, the documentation I found on SSTableLoader is pretty 
confusing, and I imagine that some people might try to use it directly on the C* 
data directories, in which case the table directory will contain the TableID. 
This case is essentially the same as {{1.}} above.

[~stefan.miklosovic] As you pointed out, there are several scenarios that we 
never tested: {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag 
name, and SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested 
by CASSANDRA-16235). The patch should add some tests for those scenarios.
We should also probably test {{nodetool refresh}} with a {{snapshots}} or 
{{backups}} keyspace.

Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235.





[jira] [Comment Edited] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227347#comment-17227347
 ] 

Benjamin Lerer edited comment on CASSANDRA-14013 at 11/6/20, 11:29 AM:
---

Trying to summarize the problem:
# SSTables used within the C* data directories should be within the data 
directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and 
the table directories should be in the form {{-}}. In this 
case the problem comes mainly from keyspaces being named {{backups}} or 
{{snapshots}}.
# Files coming from SSTableLoader should be outside of the data directories and 
the table name should be without the TableID. In this case, keyspaces and 
tables with a {{backups}} or {{snapshots}} name will have issues.

To be honest, the documentation I found on SSTableLoader is pretty 
confusing, and I imagine that some people might try to use it directly on the C* 
data directories, in which case the table directory will contain the TableID. 
This case is essentially the same as {{1.}} above.

[~stefan.miklosovic] As you pointed out, there are several scenarios that we 
never tested: {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag 
name, and SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested 
by CASSANDRA-16235). The patch should add some tests for those scenarios.
We should also probably test {{nodetool refresh}} with a {{snapshots}} or 
{{backups}} keyspace.

Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235.


was (Author: blerer):
Trying to summarize the problem:
# SSTables used within the C* data directories should be within the data 
directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and 
the table directories should be in the form {{-}}. In this 
case the problem comes mainly from keyspaces being named {{backups}} or 
{{snapshots}}.
# Files coming from SSTableLoader should be outside of the data directories and 
the table name should be without the TableID. In this case, keyspaces and 
tables with a {{backups}} or {{snapshots}} name will have issues.

To be honest, the documentation I found on SSTableLoader is pretty 
confusing, and I imagine that some people might try to use it directly on the C* 
data directories, in which case the table directory will contain the TableID. 
This case is essentially the same as {{1.}} above.

[~stefan.miklosovic] As you pointed out, there are several scenarios that we 
never tested: {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag 
name, and SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested 
by CASSANDRA-16235). The patch should add some tests for those scenarios.
We should also probably test {{nodetool refresh}} with a {{snapshots}} or 
{{backups}} keyspace.

Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235.





[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227347#comment-17227347
 ] 

Benjamin Lerer commented on CASSANDRA-14013:


Trying to summarize the problem:
# SSTables used within the C* data directories should be within the data 
directories returned by {{DatabaseDescriptor.getAllDataFileLocations()}} and 
the table directories should be in the form {{-}}. In this 
case the problem comes mainly from keyspaces being named {{backups}} or 
{{snapshots}}.
# Files coming from SSTableLoader should be outside of the data directories and 
the table name should be without the TableID. In this case, keyspaces and 
tables with a {{backups}} or {{snapshots}} name will have issues.

To be honest, the documentation I found on SSTableLoader is pretty 
confusing, and I imagine that some people might try to use it directly on the C* 
data directories, in which case the table directory will contain the TableID. 
This case is essentially the same as {{1.}} above.

[~stefan.miklosovic] As you pointed out, there are several scenarios that we 
never tested: {{nodetool snapshot}} with a {{snapshots}} or {{backups}} tag 
name, and SSTableLoader for a {{snapshots}} table (the {{backups}} name was tested 
by CASSANDRA-16235). The patch should add some tests for those scenarios.
We should also probably test {{nodetool refresh}} with a {{snapshots}} or 
{{backups}} keyspace.

Pinging [~e.dimitrova] as she was involved in CASSANDRA-16235.

> Data loss in snapshots keyspace after service restart
> -
>
> Key: CASSANDRA-14013
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14013
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Gregor Uhlenheuer
>Assignee: Stefan Miklosovic
>Priority: Normal
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> I am posting this bug in the hope of discovering the stupid mistake I am 
> making, because I can't imagine a reasonable explanation for the behavior I 
> see right now :-)
> In short, I observe data loss in a keyspace called *snapshots* after 
> restarting the Cassandra service. Say I have 1000 records in a table 
> called *snapshots.test_idx*; after a restart, the table has fewer entries or 
> is even empty.
> My somewhat "mysterious" observation is that it happens only in a keyspace 
> called *snapshots*...
> h3. Steps to reproduce
> These steps reproduce the described behavior in "most" attempts (not 
> every single time, though).
> {code}
> # create keyspace
> CREATE KEYSPACE snapshots WITH replication = {'class': 'SimpleStrategy', 
> 'replication_factor': 1};
> # create table
> CREATE TABLE snapshots.test_idx (key text, seqno bigint, primary key(key));
> # insert some test data
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1', 1);
> ...
> INSERT INTO snapshots.test_idx (key,seqno) values ('key1000', 1000);
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 1000
> # restart service
> kill 
> cassandra -f
> # count entries
> SELECT count(*) FROM snapshots.test_idx;
> 0
> {code}
> I hope someone can point out the obvious mistake I am making :-)
> This happened to me with both Cassandra 3.9 and 3.11.0.



[jira] [Commented] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227268#comment-17227268
 ] 

Benjamin Lerer commented on CASSANDRA-14013:


{quote}That is not true{quote}

You are right, I should open my eyes properly ;-)

Then, unless I am mistaken (again ;-)), you cannot rely on 
{{DatabaseDescriptor.getAllDataFileLocations()}}, as those directories will not 
be the ones in which the SSTableLoader's input directory is stored.
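The point about SSTableLoader input can be sketched the same way. This hypothetical Java helper (invented names, not Cassandra's `BulkLoader` code) assumes the loader's `<keyspace>/<table>` input convention and takes both names from the last two path components, without consulting `DatabaseDescriptor.getAllDataFileLocations()`. Under that assumption, a directory loaded into keyspace `snapshots` and a snapshot tagged `test_idx` inside a data directory yield the same pair, so the path suffix alone cannot tell them apart.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch, not Cassandra's BulkLoader code: derive keyspace and table
// from an SSTableLoader-style input directory, assuming ".../<keyspace>/<table>"
// and no knowledge of the node's data directories.
public class LoaderInputSketch {
    /** Returns {keyspace, table} taken from the last two path components. */
    static String[] keyspaceAndTable(Path inputDir) {
        Path table = inputDir.getFileName();
        Path parent = inputDir.getParent();
        Path keyspace = parent == null ? null : parent.getFileName();
        if (table == null || keyspace == null)
            throw new IllegalArgumentException("expected .../<keyspace>/<table>: " + inputDir);
        return new String[] { keyspace.toString(), table.toString() };
    }

    public static void main(String[] args) {
        // Loader input for keyspace "snapshots", table "test_idx".
        String[] a = keyspaceAndTable(Paths.get("/tmp/load/snapshots/test_idx"));
        // Snapshot tagged "test_idx" of table ks.t inside a data directory.
        String[] b = keyspaceAndTable(Paths.get("/var/lib/cassandra/data/ks/t-1234/snapshots/test_idx"));
        // Both lines print "snapshots.test_idx".
        System.out.println(a[0] + "." + a[1]);
        System.out.println(b[0] + "." + b[1]);
    }
}
```

Telling the two cases apart therefore needs some information beyond the input path's last two components, which is why the data directory locations matter for case 1 but cannot be used for case 2.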



[jira] [Comment Edited] (CASSANDRA-14013) Data loss in snapshots keyspace after service restart

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227268#comment-17227268
 ] 

Benjamin Lerer edited comment on CASSANDRA-14013 at 11/6/20, 9:23 AM:
--

{quote}That is not true{quote}

You are right, I should open my eyes properly ;-)

Then, unless I am mistaken (again ;-)), we cannot rely on 
{{DatabaseDescriptor.getAllDataFileLocations()}}, as those directories will not 
be the ones in which the SSTableLoader's input directory is stored.




was (Author: blerer):
{quote}That is not true{quote}

You are right, I should open my eyes properly ;-)

Then, unless I am mistaken (again ;-)), you cannot rely on 
{{DatabaseDescriptor.getAllDataFileLocations()}}, as those directories will not 
be the ones in which the SSTableLoader's input directory is stored.



[jira] [Commented] (CASSANDRA-14925) DecimalSerializer.toString() can be used as OOM attack

2020-11-06 Thread Jacek Lewandowski (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227257#comment-17227257
 ] 

Jacek Lewandowski commented on CASSANDRA-14925:
---

When is it going to be merged?

> DecimalSerializer.toString() can be used as OOM attack 
> ---
>
> Key: CASSANDRA-14925
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14925
> Project: Cassandra
>  Issue Type: Bug
>  Components: Legacy/Core
>Reporter: Zhao Yang
>Assignee: Zhao Yang
>Priority: Low
>
> Currently, {{DecimalSerializer.toString(value)}} uses 
> {{BigDecimal.toPlainString()}}, which generates a huge string for values with 
> a large scale.
>  
> {code:java}
> BigDecimal d = new BigDecimal("1e-" + (Integer.MAX_VALUE - 6));
> d.toPlainString(); // oom{code}
>  
> Proposal: use {{BigDecimal.toString()}} when the scale is larger than 100, a 
> threshold configurable via {{-Dcassandra.decimal.maxscaleforstring}}.
>  
> | patch | circle-ci |
> |[trunk|https://github.com/jasonstack/cassandra/commits/decimal-tostring-trunk]|[unit|https://circleci.com/gh/jasonstack/cassandra/751?utm_campaign=vcs-integration-link_medium=referral_source=github-build-link]|
> The code should apply cleanly to 3.0+.
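A minimal sketch of the proposed behavior, assuming the threshold is read with `Integer.getInteger` from the `cassandra.decimal.maxscaleforstring` system property named in the proposal (default 100). The class and method names are hypothetical and this is not the committed patch; it also guards large negative scales (such as `1e+1000000`), which the proposal does not mention.

```java
import java.math.BigDecimal;

// Hypothetical sketch of the proposal, not the committed patch.
public class DecimalToStringSketch {
    // Property name taken from the proposal; 100 is the proposed default.
    static final int MAX_SCALE_FOR_PLAIN_STRING =
            Integer.getInteger("cassandra.decimal.maxscaleforstring", 100);

    static String toDisplayString(BigDecimal value) {
        int scale = value.scale();
        // Guarding large negative scales too is an assumption beyond the proposal.
        if (scale > MAX_SCALE_FOR_PLAIN_STRING || scale < -MAX_SCALE_FOR_PLAIN_STRING)
            return value.toString();      // scientific notation, e.g. "1E-2147483641"
        return value.toPlainString();     // plain notation, e.g. "1.5"
    }

    public static void main(String[] args) {
        System.out.println(toDisplayString(new BigDecimal("1.5")));
        System.out.println(toDisplayString(new BigDecimal("1e-" + (Integer.MAX_VALUE - 6))));
    }
}
```

For the value from the example above, `toString()` returns the short `1E-2147483641` instead of trying to expand more than two billion digits, so `main` prints `1.5` and then `1E-2147483641`.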



[jira] [Commented] (CASSANDRA-16192) Add more tests to cover compaction metrics

2020-11-06 Thread Mohamed Zafraan (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227243#comment-17227243
 ] 

Mohamed Zafraan commented on CASSANDRA-16192:
-

[~blerer] Sorry. Must have done so by accident.

> Add more tests to cover compaction metrics
> --
>
> Key: CASSANDRA-16192
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16192
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Test/unit
>Reporter: Benjamin Lerer
>Assignee: Mohamed Zafraan
>Priority: Normal
> Fix For: 4.0-beta
>
> Attachments: 0001-added-unit-tests-to-cover-compaction-metrics.patch
>
>
> Some compaction metrics do not seem to be tested.



[jira] [Updated] (CASSANDRA-16192) Add more tests to cover compaction metrics

2020-11-06 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16192:
---
Status: Patch Available  (was: Review In Progress)




[jira] [Commented] (CASSANDRA-16192) Add more tests to cover compaction metrics

2020-11-06 Thread Benjamin Lerer (Jira)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227234#comment-17227234
 ] 

Benjamin Lerer commented on CASSANDRA-16192:


[~mohamed_zafraan] The reviewers of a patch should be different people from the 
ones who created the patch. Consequently, you cannot put yourself as a 
reviewer. :-)




[jira] [Updated] (CASSANDRA-16192) Add more tests to cover compaction metrics

2020-11-06 Thread Benjamin Lerer (Jira)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-16192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benjamin Lerer updated CASSANDRA-16192:
---
Reviewers: Adam Holmberg  (was: Adam Holmberg, Mohamed Zafraan)
