[jira] [Commented] (CASSANDRA-14847) improvement of nodetool status -r

2018-10-29 Thread Nate McCall (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668068#comment-16668068
 ] 

Nate McCall commented on CASSANDRA-14847:
-

This seems reasonable to me. [~fyamashi] can you verify that the patch applies 
cleanly on the following branches: cassandra-3.0, cassandra-3.11 and trunk?

> improvement of nodetool status -r
> -
>
> Key: CASSANDRA-14847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14847
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Fumiya Yamashita
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: 3.11.1.patch
>
>
> Hello,
> When using "nodetool status -r", I found that the response time grows with 
> the number of vnodes.
> In my test environment, with num_tokens set to 256 and 6 nodes, the response 
> takes about 60 seconds.
> It turned out that the findMaxAddressLength method in Status.java is causing 
> the delay.
> Although it only needs the maximum address length across the vnodes, it loops 
> `tokenrange * vnode` times, which is redundant.
> To avoid looking up the same host name over and over, I modified it to check 
> with a hash.
> In my environment, the response time dropped from 60 seconds to 2 seconds.
> I have attached the patch; please take a look.
>  Thank you
> {code:java}
> [before]
> Datacenter: dc1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns (effective) Host ID Rack
> UN *** 559.32 KB 256 48.7% 0555746a-60c2-4717-b042-94ba951ef679 ***
> UN *** 721.48 KB 256 51.4% 1af4acb6-e0a0-4bcb-8bba-76ae2e225cd5 ***
> UN *** 699.98 KB 256 48.3% 5215c728-9b80-4e3c-b46b-c5b8e5eb753f ***
> UN *** 691.65 KB 256 48.1% 57da4edf-4acb-474d-b26c-27f048c37bd6 ***
> UN *** 705.66 KB 256 52.8% 07520eab-47d2-4f5d-aeeb-f6e599c9b084 ***
> UN *** 610.87 KB 256 50.7% 6b39acaf-6ed6-42e4-a357-0d258bdf87b7 ***
> time : 66s
> [after]
> Datacenter: dc1
> ===
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> -- Address Load Tokens Owns (effective) Host ID Rack
> UN *** 559.32 KB 256 48.7% 0555746a-60c2-4717-b042-94ba951ef679 ***
> UN *** 721.48 KB 256 51.4% 1af4acb6-e0a0-4bcb-8bba-76ae2e225cd5 ***
> UN *** 699.98 KB 256 48.3% 5215c728-9b80-4e3c-b46b-c5b8e5eb753f ***
> UN *** 691.65 KB 256 48.1% 57da4edf-4acb-474d-b26c-27f048c37bd6 ***
> UN *** 705.66 KB 256 52.8% 07520eab-47d2-4f5d-aeeb-f6e599c9b084 ***
> UN *** 610.87 KB 256 50.7% 6b39acaf-6ed6-42e4-a357-0d258bdf87b7 ***
> time : 2s
> {code}
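The deduplication described in the report above can be sketched as follows. This is a minimal illustration, not the attached 3.11.1.patch: the token-to-endpoint map and the names `MaxAddressLength` and `maxAddressLength` are assumptions. It makes a single pass over the tokens instead of a nested loop over token ranges and vnodes, and a `HashSet` ensures each distinct host address is measured only once.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MaxAddressLength {
    // Returns the length of the longest host address in a hypothetical
    // token -> endpoint map. Each distinct host is measured only once,
    // no matter how many tokens (vnodes) it owns.
    static int maxAddressLength(Map<String, String> tokenToEndpoint) {
        Set<String> seen = new HashSet<>();
        int max = 0;
        for (String host : tokenToEndpoint.values()) {
            // Set.add returns false for a host that has already been measured.
            if (seen.add(host)) {
                max = Math.max(max, host.length());
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // 3 hosts x 256 vnodes each
        String[] hosts = {"10.0.0.1", "10.0.0.22", "192.168.100.3"};
        Map<String, String> ring = new LinkedHashMap<>();
        for (int t = 0; t < hosts.length * 256; t++) {
            ring.put("token-" + t, hosts[t % hosts.length]);
        }
        System.out.println(maxAddressLength(ring)); // prints 13
    }
}
```

With 3 hosts owning 256 tokens each, the loop visits 768 entries but measures only 3 addresses.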



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14851) Blog Post: "Introducing Transient Replication"

2018-10-29 Thread Nate McCall (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16668044#comment-16668044
 ] 

Nate McCall commented on CASSANDRA-14851:
-

Post looks good. We just posted content from CASSANDRA-14835, so I'd like to hold 
this until Monday so that our publishing is more gradual. WDYT 
[~aweisberg]?

> Blog Post: "Introducing Transient Replication"
> --
>
> Key: CASSANDRA-14851
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14851
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Minor
>  Labels: blog
> Attachments: introducing_transient_replication.patch
>
>
> This is a blog post introducing transient replication. The attached patch 
> (patch compatible) applies to the website repo (outside the project's 
> primary Git repo).
> An SVN patch containing the post is attached.






[jira] [Updated] (CASSANDRA-14835) Blog Post: "Audit Logging in Apache Cassandra 4.0"

2018-10-29 Thread Nate McCall (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate McCall updated CASSANDRA-14835:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Posted. Thanks again [~vinaykumarcse]!

> Blog Post: "Audit Logging in Apache Cassandra 4.0"
> --
>
> Key: CASSANDRA-14835
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14835
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Vinay Chella
>Assignee: Vinay Chella
>Priority: Minor
>  Labels: blog
> Attachments: 14835_audit_logging_cassandra.patch, 
> 14835_auditlog_blog_rendered.png
>
>
> This is a blog post about the Audit Logging feature in Apache Cassandra 
> 4.0 (CASSANDRA-12151). 
> For now I am sharing the Google Doc link for review; as soon as we 
> finalize it, I will send the SVN patch with markdown.






[jira] [Commented] (CASSANDRA-14835) Blog Post: "Audit Logging in Apache Cassandra 4.0"

2018-10-29 Thread Vinay Chella (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667952#comment-16667952
 ] 

Vinay Chella commented on CASSANDRA-14835:
--

Thank you. Yes, potentially that could have been the reason. 







[jira] [Updated] (CASSANDRA-14780) Avoid creating empty compaction tasks after truncate

2018-10-29 Thread Kurt Greaves (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14780:
-
Issue Type: Bug  (was: Improvement)

> Avoid creating empty compaction tasks after truncate
> 
>
> Key: CASSANDRA-14780
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14780
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Minor
> Fix For: 4.0
>
>
> There is a chance we create an empty {{RepairFinishedCompactionTask}} after a 
> table is truncated.
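One way to avoid such a task is to check for an empty input before constructing it. This is a hypothetical sketch, assuming the empty task arises because the set of sstables selected for it is empty after the truncate; `RepairFinishedTaskFactory`, `Task` and `createTask` are illustrative names, not the actual CASSANDRA-14780 fix.

```java
import java.util.List;
import java.util.Optional;

public class RepairFinishedTaskFactory {
    // Hypothetical stand-in for a compaction task over a set of sstables.
    record Task(List<String> sstables) {}

    // Returns no task at all when there is nothing to operate on, for
    // example because a truncate removed every candidate sstable.
    static Optional<Task> createTask(List<String> candidateSSTables) {
        if (candidateSSTables.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new Task(List.copyOf(candidateSSTables)));
    }

    public static void main(String[] args) {
        System.out.println(createTask(List.of()).isPresent());            // false
        System.out.println(createTask(List.of("sstable-a")).isPresent()); // true
    }
}
```

Returning `Optional.empty()` forces callers to handle the "nothing to do" case explicitly instead of scheduling a no-op task.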






[jira] [Updated] (CASSANDRA-14780) Avoid creating empty compaction tasks after truncate

2018-10-29 Thread Kurt Greaves (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14780:
-
Fix Version/s: (was: 4.x)
   4.0

> Avoid creating empty compaction tasks after truncate
> 
>
> Key: CASSANDRA-14780
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14780
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Minor
> Fix For: 4.0
>
>
> There is a chance we create an empty {{RepairFinishedCompactionTask}} after a 
> table is truncated.






[jira] [Updated] (CASSANDRA-14829) Make stop-server.bat wait for Cassandra to terminate

2018-10-29 Thread Kurt Greaves (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kurt Greaves updated CASSANDRA-14829:
-
Assignee: Georg Dietrich
  Status: Patch Available  (was: Open)

> Make stop-server.bat wait for Cassandra to terminate
> 
>
> Key: CASSANDRA-14829
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14829
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Packaging
> Environment: Windows 10
>Reporter: Georg Dietrich
>Assignee: Georg Dietrich
>Priority: Minor
>  Labels: easyfix, windows
> Fix For: 3.11.x, 4.x, 4.0.x
>
>
> While administering a single-node Cassandra installation on Windows, I noticed 
> that the stop-server.bat script returns before the Cassandra process has 
> actually terminated. For use cases like a script that shuts Cassandra down, 
> backs up the data directory without having to worry about open files, and then 
> restarts it, it would be good to make stop-server.bat wait for Cassandra to 
> terminate.
> All that is needed is to change "start /B powershell /file ..." to 
> "start /WAIT /B powershell /file ..." (an additional /WAIT parameter) in 
> apache-cassandra-3.11.3\bin\stop-server.bat.
> Does this sound reasonable?
> Here is the pull request: https://github.com/apache/cassandra/pull/287






[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-10-29 Thread Joseph Lynch (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667881#comment-16667881
 ] 

Joseph Lynch commented on CASSANDRA-14358:
--

{quote}[Why add protocol version 
here?|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-14358#diff-1560ed3bf5675f8ec0b1b35198debe15R41]
{quote}
Because otherwise the test fails the {{protocolVersion}} check instead of the 
{{sendBufferSize}} check, which is what (I believe) the test was trying to 
exercise. Before this patch, I believe the unit test was testing the wrong thing.
{quote}[The acceptable range of values differs, but the tests for both are 
-1|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-14358#diff-f5e4f8d3a95c98844b371ba1d1e98285R224].
 I just want to confirm what the underlying tunables actually support.
{quote}
Correct: {{TCP_USER_TIMEOUT}} can be set to zero, which picks up the OS-level 
default.
{noformat}
   TCP_USER_TIMEOUT (since Linux 2.6.37)
          This option takes an unsigned int as an argument.  When the value
          is greater than 0, it specifies the maximum amount of time in
          milliseconds that transmitted data may remain unacknowledged
          before TCP will forcibly close the corresponding connection and
          return ETIMEDOUT to the application.  If the option value is
          specified as 0, TCP will to use the system default.
{noformat}
The connection timeout setting also [supports 
zero|https://netty.io/4.0/api/io/netty/channel/ChannelConfig.html#setConnectTimeoutMillis-int-]
 (meaning disabled), but IMO users should never disable connection timeouts. A 
user could plausibly set it to the same value as their RPC timeout, or even 
higher, but I don't think turning it off ever makes sense.
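The value rules discussed in this exchange can be captured in a small validation helper. This is a hypothetical sketch: `TimeoutValidation` and its methods are illustrative names, not Cassandra's configuration code. It encodes the rules stated above: `TCP_USER_TIMEOUT` is an unsigned int where 0 means "use the system default", so only negative or out-of-range values such as -1 are rejected; a Netty connect timeout of 0 means "disabled", which the sketch also rejects, following the view that connection timeouts should never be turned off.

```java
public class TimeoutValidation {
    // TCP_USER_TIMEOUT takes an unsigned int of milliseconds; 0 means
    // "use the system default", so only negative or too-large values are invalid.
    static void validateTcpUserTimeout(long millis) {
        if (millis < 0 || millis > 0xFFFF_FFFFL) {
            throw new IllegalArgumentException("tcp user timeout out of range: " + millis);
        }
    }

    // For Netty's connect timeout, 0 disables the timeout; this sketch refuses
    // 0 (and negative values) so the connect timeout can never be switched off.
    static void validateConnectTimeout(int millis) {
        if (millis <= 0) {
            throw new IllegalArgumentException("connect timeout must be positive: " + millis);
        }
    }

    // Small helper: true if the check passes, false if it throws.
    static boolean isValid(Runnable check) {
        try {
            check.run();
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid(() -> validateTcpUserTimeout(0)));    // true: OS default
        System.out.println(isValid(() -> validateTcpUserTimeout(-1)));   // false
        System.out.println(isValid(() -> validateConnectTimeout(0)));    // false: would disable
        System.out.println(isValid(() -> validateConnectTimeout(2000))); // true
    }
}
```

Keeping the two validators separate reflects that the same value (0) means "default" for one tunable and "off" for the other.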

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. The smallest cluster I've seen the problem on 
> is 12 nodes; it reproduces more reliably on 40+ node clusters, and 300-node 
> clusters consistently reproduce it on at least one node.
> So all the connections are SSL, and we're connecting on the internal IP 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested
> Fix For: 4.0, 2.1.x, 2.2.x, 3.0.x, 3.11.x
>
> Attachments: 10 Minute Partition.pdf
>
>
> edit summary: This primarily impacts networks with stateful firewalls such as 
> AWS. I'm working on a proper patch for trunk but unfortunately it relies on 
> the Netty refactor in 4.0 so it will be hard to backport to previous 
> versions. A workaround for earlier versions is to set the 
> {{net.ipv4.tcp_retries2}} sysctl to ~5. This can be done with the following:
> {code}
> $ cat /etc/sysctl.d/20-cassandra-tuning.conf
> net.ipv4.tcp_retries2=5
> $ # Reload all sysctls
> $ sysctl --system
> {code}
> Original Bug Report:
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTcpConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before 

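To see why the CASSANDRA-14358 description above suggests lowering `net.ipv4.tcp_retries2`, it helps to estimate how long Linux keeps retransmitting unacknowledged data before giving up. The sketch below follows the hypothetical-timeout model described for `tcp_retries2` in the Linux kernel's `ip-sysctl` documentation, assuming the default minimum RTO of 200 ms and maximum RTO of 120 s; real timeouts depend on the measured RTT, so treat the results as lower bounds. The class and method names are illustrative.

```java
public class TcpRetries2Timeout {
    // Assumed Linux defaults: TCP_RTO_MIN = 200 ms and TCP_RTO_MAX = 120 s.
    static final long RTO_MIN_MS = 200;
    static final long RTO_MAX_MS = 120_000;

    // Hypothetical (lower-bound) time before TCP gives up on unacknowledged
    // data after `retries` retransmissions, with the RTO doubling from
    // RTO_MIN until it is capped at RTO_MAX.
    static long hypotheticalTimeoutMillis(int retries) {
        // Doublings before the RTO hits the cap: floor(log2(120000 / 200)) = 9
        int thresh = 63 - Long.numberOfLeadingZeros(RTO_MAX_MS / RTO_MIN_MS);
        if (retries <= thresh) {
            return ((2L << retries) - 1) * RTO_MIN_MS;
        }
        return ((2L << thresh) - 1) * RTO_MIN_MS + (retries - thresh) * RTO_MAX_MS;
    }

    public static void main(String[] args) {
        for (int retries : new int[] {5, 8, 15}) {
            System.out.println("tcp_retries2=" + retries + " -> "
                    + hypotheticalTimeoutMillis(retries) / 1000.0 + " s");
        }
    }
}
```

Under these assumptions the default of 15 gives about 924.6 s (roughly 15 minutes) of retransmissions, while 5 gives about 12.6 s, which is why lowering it shortens how long a stuck connection can go unnoticed.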
[jira] [Created] (CASSANDRA-14858) Validate range tombstones and clustering order on the write path

2018-10-29 Thread Blake Eggleston (JIRA)
Blake Eggleston created CASSANDRA-14858:
---

 Summary: Validate range tombstones and clustering order on the 
write path
 Key: CASSANDRA-14858
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14858
 Project: Cassandra
  Issue Type: Bug
Reporter: Blake Eggleston
Assignee: Blake Eggleston


Some fuzz testing I've been doing uncovered some situations where invalid 
tombstones could be written and/or unfiltereds could be written to sstables out 
of clustering order. Both of these caused pretty severe database breakage. 
Although CQL statement logic should (in theory) prevent users from triggering 
these bugs in normal use, it's possible there's an edge case somewhere that 
could write data like this. We should add some validation in the lower levels 
of the storage layer to protect it from bugs / malicious input from the higher 
levels.






svn commit: r1845181 - in /cassandra/site: publish/blog/2018/10/29/ publish/blog/2018/10/29/audit_logging_cassandra.html publish/blog/index.html publish/feed.xml src/_posts/2018-10-29-audit_logging_ca

2018-10-29 Thread zznate
Author: zznate
Date: Mon Oct 29 22:53:37 2018
New Revision: 1845181

URL: http://svn.apache.org/viewvc?rev=1845181=rev
Log:
CASSANDRA-14835 - Audit Logging in 4.0 blog post from Vinay Chella

Added:
cassandra/site/publish/blog/2018/10/29/
cassandra/site/publish/blog/2018/10/29/audit_logging_cassandra.html
cassandra/site/src/_posts/2018-10-29-audit_logging_cassandra.markdown
Modified:
cassandra/site/publish/blog/index.html
cassandra/site/publish/feed.xml

Added: cassandra/site/publish/blog/2018/10/29/audit_logging_cassandra.html
URL: 
http://svn.apache.org/viewvc/cassandra/site/publish/blog/2018/10/29/audit_logging_cassandra.html?rev=1845181=auto
==
--- cassandra/site/publish/blog/2018/10/29/audit_logging_cassandra.html (added)
+++ cassandra/site/publish/blog/2018/10/29/audit_logging_cassandra.html Mon Oct 
29 22:53:37 2018
@@ -0,0 +1,358 @@
+  Audit Logging in Apache Cassandra 4.0
+Posted on October 29, 2018 by the Apache Cassandra Community
+
+  Database audit logging is an industry-standard tool for enterprises to
+capture critical data change events including what data changed and who
+triggered the event. These captured records can then be reviewed later
+to ensure compliance with regulatory, security and operational policies.
+
+Prior to Apache Cassandra 4.0, the open source community did not have a
+good way of tracking such critical database activity. With this goal in
+mind, Netflix implemented
+CASSANDRA-12151 (https://issues.apache.org/jira/browse/CASSANDRA-12151)
+so that users of Cassandra would have a simple yet powerful audit
+logging tool built into their database out of the box.
+
+Why are Audit Logs Important?
+
+Audit logging database activity is one of the key components for making
+a database truly ready for the enterprise. Audit logging is generally
+useful, but enterprises frequently use it for:
+
+
+  Regulatory compliance with laws and standards such as SOX (https://en.wikipedia.org/wiki/Sarbanes%E2%80%93Oxley_Act), PCI (https://en.wikipedia.org/wiki/Payment_Card_Industry_Data_Security_Standard)
 and GDPR (https://en.wikipedia.org/wiki/General_Data_Protection_Regulation), among others. These types of compliance are crucial for companies that are traded on 
public stock exchanges, hold payment information such as credit cards, or 
retain private user information.
+  Security compliance. Companies often have strict rules for what data can 
be accessed by which employees, both to protect the privacy of users and 
to limit the probability of a data breach.
+  Debugging complex data corruption bugs such as those found in massively 
distributed microservice architectures like Netflix’s.
+
+
+Why is Audit Logging Difficult?
+
+Implementing a simple logger in the request (inbound/outbound) path
+sounds easy, but the devil is in the details. In particular, the “fast
+path” of a database, where audit logging must operate, strives to do as
+little as humanly possible so that users get the fastest and most
+scalable database system possible. While implementing Cassandra audit
+logging, we had to ensure that the audit log infrastructure does not
+take up excessive CPU or IO resources from the actual database execution
+itself. However, one cannot simply optimize only for performance because
+that may compromise the guarantees of the audit 

[jira] [Commented] (CASSANDRA-14835) Blog Post: "Audit Logging in Apache Cassandra 4.0"

2018-10-29 Thread Nate McCall (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667852#comment-16667852
 ] 

Nate McCall commented on CASSANDRA-14835:
-

Assigned and toggled to patch submitted (not being in the patch-submitted state 
may have been the issue?)

 







[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-10-29 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667851#comment-16667851
 ] 

Ariel Weisberg commented on CASSANDRA-14358:


[Why add protocol version 
here?|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-14358#diff-1560ed3bf5675f8ec0b1b35198debe15R41]
[The acceptable range of values differs, but the tests for both are 
-1|https://github.com/apache/cassandra/compare/trunk...jolynch:CASSANDRA-14358#diff-f5e4f8d3a95c98844b371ba1d1e98285R224].
 I just want to confirm what the underlying tunables actually support.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. The smallest cluster I've seen the problem on 
> is 12 nodes; it reproduces more reliably on 40+ node clusters, and 300-node 
> clusters consistently reproduce it on at least one node.
> So all the connections are SSL, and we're connecting on the internal IP 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Assignee: Joseph Lynch
>Priority: Major
>  Labels: 4.0-feature-freeze-review-requested
> Fix For: 4.0, 2.1.x, 2.2.x, 3.0.x, 3.11.x
>
> Attachments: 10 Minute Partition.pdf
>
>
> edit summary: This primarily impacts networks with stateful firewalls such as 
> AWS. I'm working on a proper patch for trunk but unfortunately it relies on 
> the Netty refactor in 4.0 so it will be hard to backport to previous 
> versions. A workaround for earlier versions is to set the 
> {{net.ipv4.tcp_retries2}} sysctl to ~5. This can be done with the following:
> {code}
> $ cat /etc/sysctl.d/20-cassandra-tuning.conf
> net.ipv4.tcp_retries2=5
> $ # Reload all sysctls
> $ sysctl --system
> {code}
> Original Bug Report:
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTcpConnection}} threads can just be stuck never making any progress 
> until the OS intervenes.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHOs. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failed}}, I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> 

[jira] [Updated] (CASSANDRA-14835) Blog Post: "Audit Logging in Apache Cassandra 4.0"

2018-10-29 Thread Nate McCall (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nate McCall updated CASSANDRA-14835:

Reviewer: Nate McCall
  Status: Patch Available  (was: Open)







[jira] [Updated] (CASSANDRA-14841) Unknown column coordinator_port during deserialization

2018-10-29 Thread Dinesh Joshi (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dinesh Joshi updated CASSANDRA-14841:
-
Reviewers: Dinesh Joshi

> Unknown column coordinator_port during deserialization
> --
>
> Key: CASSANDRA-14841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14841
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Ariel Weisberg
>Priority: Major
>
> When upgrading from 3.x to 4.0, I get exceptions on the old nodes once the 
> first 4.0 node starts up. I have tested upgrading from both 3.0.15 and 
> 3.11.3 and get the same problem.
>  
> {noformat}
> 2018-10-22T11:12:05.060+0200 ERROR 
> [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228 
> Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during 
> deserialization
> at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at 
> org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> at 
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> at 
> org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> at 
> org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) 
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]
> at 
> org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94)
>  ~[apache-cassandra-3.11.3.jar:3.11.3]{noformat}
> I think it was introduced by CASSANDRA-7544.
>  






[jira] [Commented] (CASSANDRA-14849) some empty/invalid bounds aren't caught by SelectStatement

2018-10-29 Thread Blake Eggleston (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667764#comment-16667764
 ] 

Blake Eggleston commented on CASSANDRA-14849:
-

Nice, that's much more succinct. Pushed up your changes, plus some expansion of 
the unit test.

> some empty/invalid bounds aren't caught by SelectStatement
> --
>
> Key: CASSANDRA-14849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14849
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.0
>
>
> Nonsensical clustering bounds like "c >= 100 AND c < 100" aren't converted to 
> Slices.NONE as they should be. Although this seems to be completely benign, 
> it is technically incorrect and complicates some testing since it can cause 
> memtables and sstables to return different results for the same data for 
> these bounds in some cases.
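The emptiness rule behind this issue can be sketched as follows, assuming a single `int` clustering column; `SliceBounds`, `Bound` and `isEmpty` are hypothetical names, not Cassandra's `Slice`/`Slices` API. A pair of bounds that satisfies `isEmpty` selects nothing, which is what converting it to `Slices.NONE` expresses. The sketch treats values as ordered but not discrete, so it does not flag cases like `c > 99 AND c < 100` on an integer column.

```java
public class SliceBounds {
    // Hypothetical bound on a single int clustering column.
    record Bound(int value, boolean inclusive) {}

    // A slice selects nothing when its start sorts after its end, or when
    // both bounds sit on the same value and at least one of them is exclusive.
    static boolean isEmpty(Bound start, Bound end) {
        if (start.value() != end.value()) {
            return start.value() > end.value();
        }
        return !(start.inclusive() && end.inclusive());
    }

    public static void main(String[] args) {
        // c >= 100 AND c < 100
        System.out.println(isEmpty(new Bound(100, true), new Bound(100, false)));  // true
        // c >= 100 AND c <= 100
        System.out.println(isEmpty(new Bound(100, true), new Bound(100, true)));   // false
        // c > 200 AND c < 100
        System.out.println(isEmpty(new Bound(200, false), new Bound(100, false))); // true
    }
}
```

Detecting these cases once, when the statement is built, lets memtables and sstables both see the same "no slices" request instead of each interpreting inverted bounds in its own way.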






[jira] [Updated] (CASSANDRA-14761) Rename speculative_retry to match additional_write_policy

2018-10-29 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14761?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-14761:
---
Status: Patch Available  (was: In Progress)

[trunk 
change|https://github.com/apache/cassandra/compare/trunk...aweisberg:14761-trunk?expand=1]
[CircleCI|https://circleci.com/gh/aweisberg/cassandra/tree/14761-trunk]

> Rename speculative_retry to match additional_write_policy
> -
>
> Key: CASSANDRA-14761
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14761
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Core
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Major
> Fix For: 4.0
>
>
> It's not really speculative. This commit is where it was last named, and it 
> shows what needs to be updated: 
> https://github.com/aweisberg/cassandra/commit/e1df8e977d942a1b0da7c2a7554149c781d0e6c3






[jira] [Updated] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 16kb

2018-10-29 Thread Jon Haddad (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad updated CASSANDRA-13241:
---
Reviewer: Jon Haddad

> Lower default chunk_length_in_kb from 64kb to 16kb
> --
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>Assignee: Ariel Weisberg
>Priority: Major
> Attachments: CompactIntegerSequence.java, 
> CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
>
> A chunk size that is too low may waste some disk space. A chunk size that is 
> too high may lead to massive overreads and can have a critical impact on 
> overall system performance.
> In my case, the default chunk size led to peak read I/O of up to 1 GB/s and 
> average reads of 200 MB/s. After lowering the chunk size (aligned with read 
> ahead, of course), average read I/O dropped below 20 MB/s, to roughly 
> 10-15 MB/s.
> The risk of (physical) overreads increases as the (page cache size) / 
> (total data size) ratio decreases.
> Large chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, 
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460 GB data, 
> 128 GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so there is no 
> "dynamic snitch magic": https://cl.ly/3E0t1T1z2c0J
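Here is a back-of-the-envelope sketch of the overread effect described above. Its assumptions are not from the report. A point read of a small row is assumed to read and decompress the single compressed chunk that holds the row, and nothing is served from the page cache. The 200-byte row and 2:1 compression ratio are made-up illustrative numbers. `ChunkOverread` and `amplification` are hypothetical names.

```java
public final class ChunkOverread {

    /**
     * Approximate disk bytes read per byte of row requested, assuming the whole
     * compressed chunk holding the row is read and nothing comes from page cache.
     */
    static double amplification(int chunkKiB, double compressedFraction, int rowBytes) {
        double compressedChunkBytes = chunkKiB * 1024.0 * compressedFraction;
        return compressedChunkBytes / rowBytes;
    }

    public static void main(String[] args) {
        int rowBytes = 200;               // hypothetical small, skinny row
        double compressedFraction = 0.5;  // hypothetical 2:1 compression
        for (int chunkKiB : new int[] {64, 16, 4}) {
            System.out.printf("%2d KiB chunks: ~%.0fx overread%n",
                    chunkKiB, amplification(chunkKiB, compressedFraction, rowBytes));
        }
    }
}
```

Under these assumptions, the amplification scales linearly with `chunk_length_in_kb`. Going from 64 KiB to 16 KiB therefore cuts the bytes read per small-row lookup by about 4x. Read-ahead that is larger than the chunk adds more overread on top of this.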






[jira] [Commented] (CASSANDRA-14841) Unknown column coordinator_port during deserialization

2018-10-29 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667641#comment-16667641
 ] 

Ariel Weisberg commented on CASSANDRA-14841:


[trunk 
code|https://github.com/apache/cassandra/compare/trunk...aweisberg:14841-trunk?expand=1]
[CircleCI|https://circleci.com/gh/aweisberg/cassandra/tree/14841-trunk]

> Unknown column coordinator_port during deserialization
> --
>
> Key: CASSANDRA-14841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14841
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Ariel Weisberg
>Priority: Major
>
> When upgrading from 3.x to 4.0, I get exceptions on the old nodes once the 
> first 4.0 node starts up. I have tested upgrading from both 3.0.15 and 
> 3.11.3 and get the same problem.
>  
> {noformat}
> 2018-10-22T11:12:05.060+0200 ERROR [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228 Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during deserialization
> at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94) ~[apache-cassandra-3.11.3.jar:3.11.3]{noformat}
> I think it was introduced by CASSANDRA-7544.
>  






[jira] [Updated] (CASSANDRA-14841) Unknown column coordinator_port during deserialization

2018-10-29 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-14841:
---
Status: Patch Available  (was: Open)

> Unknown column coordinator_port during deserialization
> --
>
> Key: CASSANDRA-14841
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14841
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Assignee: Ariel Weisberg
>Priority: Major
>
> When upgrading from 3.x to 4.0, I get exceptions on the old nodes once the 
> first 4.0 node starts up. I have tested upgrading from both 3.0.15 and 
> 3.11.3 and get the same problem.
>  
> {noformat}
> 2018-10-22T11:12:05.060+0200 ERROR [MessagingService-Incoming-/10.216.193.244] CassandraDaemon.java:228 Exception in thread Thread[MessagingService-Incoming-/10.216.193.244,5,main]
> java.lang.RuntimeException: Unknown column coordinator_port during deserialization
> at org.apache.cassandra.db.Columns$Serializer.deserialize(Columns.java:452) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.db.filter.ColumnFilter$Serializer.deserialize(ColumnFilter.java:482) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:760) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.db.ReadCommand$Serializer.deserialize(ReadCommand.java:697) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.io.ForwardingVersionedSerializer.deserialize(ForwardingVersionedSerializer.java:50) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.MessageIn.read(MessageIn.java:123) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.IncomingTcpConnection.receiveMessage(IncomingTcpConnection.java:192) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.IncomingTcpConnection.receiveMessages(IncomingTcpConnection.java:180) ~[apache-cassandra-3.11.3.jar:3.11.3]
> at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:94) ~[apache-cassandra-3.11.3.jar:3.11.3]{noformat}
> I think it was introduced by CASSANDRA-7544.
>  






[jira] [Comment Edited] (CASSANDRA-14857) Use a more space efficient representation for compressed chunk offsets

2018-10-29 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667621#comment-16667621
 ] 

Ariel Weisberg edited comment on CASSANDRA-14857 at 10/29/18 7:28 PM:
--

For posterity, here is some example code implementing a summing representation 
that uses ~25% of the space of storing 8-byte values:
https://github.com/apache/cassandra/compare/trunk...aweisberg:14857-trunk?expand=1


was (Author: aweisberg):
For posterity some example code implementing a summing representation that uses 
~25% of the space of storing 8 byte values.
https://github.com/apache/cassandra/compare/trunk...aweisberg:14841-trunk?expand=1

> Use a more space efficient representation for compressed chunk offsets
> --
>
> Key: CASSANDRA-14857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compression
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 4.x
>
>
> CASSANDRA-13241 proposes a few implementations, and in IRC there was 
> discussion of using a variable-length representation. I like the fixed-length 
> ones because they yield a lot of benefit while remaining simple and easy to 
> understand and debug.






[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 16kb

2018-10-29 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667629#comment-16667629
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


[trunk 
patch|https://github.com/apache/cassandra/compare/trunk...aweisberg:13241-trunk?expand=1]
[CircleCI|https://circleci.com/gh/aweisberg/cassandra/tree/13241-trunk]

I updated the patch to only include a change of the default from 64kb to 16kb. 
I could use a reviewer.

> Lower default chunk_length_in_kb from 64kb to 16kb
> --
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>Assignee: Ariel Weisberg
>Priority: Major
> Attachments: CompactIntegerSequence.java, 
> CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
>
> A chunk size that is too low may waste some disk space. A chunk size that is 
> too high may lead to massive overreads and can have a critical impact on 
> overall system performance.
> In my case, the default chunk size led to peak read I/O of up to 1 GB/s and 
> average reads of 200 MB/s. After lowering the chunk size (aligned with read 
> ahead, of course), average read I/O dropped below 20 MB/s, to roughly 
> 10-15 MB/s.
> The risk of (physical) overreads increases as the (page cache size) / 
> (total data size) ratio decreases.
> Large chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, 
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460 GB data, 
> 128 GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so there is no 
> "dynamic snitch magic": https://cl.ly/3E0t1T1z2c0J






[jira] [Issue Comment Deleted] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 16kb

2018-10-29 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-13241:
---
Comment: was deleted

(was: [trunk 
patch|https://github.com/apache/cassandra/compare/trunk...aweisberg:13241-trunk?expand=1]
[CircleCI|https://circleci.com/gh/aweisberg/cassandra/tree/13241-trunk]

I updated the patch to only include a change of the default from 64kb to 16kb. 
I could use a reviewer.)

> Lower default chunk_length_in_kb from 64kb to 16kb
> --
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>Assignee: Ariel Weisberg
>Priority: Major
> Attachments: CompactIntegerSequence.java, 
> CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
>
> A chunk size that is too low may waste some disk space. A chunk size that is 
> too high may lead to massive overreads and can have a critical impact on 
> overall system performance.
> In my case, the default chunk size led to peak read I/O of up to 1 GB/s and 
> average reads of 200 MB/s. After lowering the chunk size (aligned with read 
> ahead, of course), average read I/O dropped below 20 MB/s, to roughly 
> 10-15 MB/s.
> The risk of (physical) overreads increases as the (page cache size) / 
> (total data size) ratio decreases.
> Large chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, 
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460 GB data, 
> 128 GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so there is no 
> "dynamic snitch magic": https://cl.ly/3E0t1T1z2c0J






[jira] [Commented] (CASSANDRA-13241) Lower default chunk_length_in_kb from 64kb to 16kb

2018-10-29 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-13241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667623#comment-16667623
 ] 

Ariel Weisberg commented on CASSANDRA-13241:


[trunk 
patch|https://github.com/apache/cassandra/compare/trunk...aweisberg:13241-trunk?expand=1]
[CircleCI|https://circleci.com/gh/aweisberg/cassandra/tree/13241-trunk]

I updated the patch to only include a change of the default from 64kb to 16kb. 
I could use a reviewer.

> Lower default chunk_length_in_kb from 64kb to 16kb
> --
>
> Key: CASSANDRA-13241
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13241
> Project: Cassandra
>  Issue Type: Wish
>  Components: Core
>Reporter: Benjamin Roth
>Assignee: Ariel Weisberg
>Priority: Major
> Attachments: CompactIntegerSequence.java, 
> CompactIntegerSequenceBench.java, CompactSummingIntegerSequence.java
>
>
> A chunk size that is too low may waste some disk space. A chunk size that is 
> too high may lead to massive overreads and can have a critical impact on 
> overall system performance.
> In my case, the default chunk size led to peak read I/O of up to 1 GB/s and 
> average reads of 200 MB/s. After lowering the chunk size (aligned with read 
> ahead, of course), average read I/O dropped below 20 MB/s, to roughly 
> 10-15 MB/s.
> The risk of (physical) overreads increases as the (page cache size) / 
> (total data size) ratio decreases.
> Large chunk sizes are mostly appropriate for bigger payloads per request, but 
> if the model consists mostly of small rows or small result sets, the read 
> overhead with a 64kb chunk size is insanely high. This applies, for example, 
> to (small) skinny rows.
> Please also see here:
> https://groups.google.com/forum/#!topic/scylladb-dev/j_qXSP-6-gY
> To give you some insight into what a difference it can make (460 GB data, 
> 128 GB RAM):
> - Latency of a quite large CF: https://cl.ly/1r3e0W0S393L
> - Disk throughput: https://cl.ly/2a0Z250S1M3c
> - This shows that the request distribution remained the same, so there is no 
> "dynamic snitch magic": https://cl.ly/3E0t1T1z2c0J






[jira] [Comment Edited] (CASSANDRA-14857) Use a more space efficient representation for compressed chunk offsets

2018-10-29 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667621#comment-16667621
 ] 

Ariel Weisberg edited comment on CASSANDRA-14857 at 10/29/18 7:12 PM:
--

For posterity, here is some example code implementing a summing representation 
that uses ~25% of the space of storing 8-byte values:
https://github.com/apache/cassandra/compare/trunk...aweisberg:14841-trunk?expand=1


was (Author: aweisberg):
For posterity some example code implementing a summing representation that uses 
+25% of the space of storing 8 byte values.
https://github.com/apache/cassandra/compare/trunk...aweisberg:14841-trunk?expand=1

> Use a more space efficient representation for compressed chunk offsets
> --
>
> Key: CASSANDRA-14857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compression
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 4.x
>
>
> CASSANDRA-13241 proposes a few implementations, and in IRC there was 
> discussion of using a variable-length representation. I like the fixed-length 
> ones because they yield a lot of benefit while remaining simple and easy to 
> understand and debug.






[jira] [Commented] (CASSANDRA-14857) Use a more space efficient representation for compressed chunk offsets

2018-10-29 Thread Ariel Weisberg (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667621#comment-16667621
 ] 

Ariel Weisberg commented on CASSANDRA-14857:


For posterity, here is some example code implementing a summing representation 
that uses +25% of the space of storing 8-byte values:
https://github.com/apache/cassandra/compare/trunk...aweisberg:14841-trunk?expand=1

> Use a more space efficient representation for compressed chunk offsets
> --
>
> Key: CASSANDRA-14857
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14857
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Compression
>Reporter: Ariel Weisberg
>Assignee: Ariel Weisberg
>Priority: Minor
> Fix For: 4.x
>
>
> CASSANDRA-13241 proposes a few implementations, and in IRC there was 
> discussion of using a variable-length representation. I like the fixed-length 
> ones because they yield a lot of benefit while remaining simple and easy to 
> understand and debug.






[jira] [Created] (CASSANDRA-14857) Use a more space efficient representation for compressed chunk offsets

2018-10-29 Thread Ariel Weisberg (JIRA)
Ariel Weisberg created CASSANDRA-14857:
--

 Summary: Use a more space efficient representation for compressed 
chunk offsets
 Key: CASSANDRA-14857
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14857
 Project: Cassandra
  Issue Type: Improvement
  Components: Compression
Reporter: Ariel Weisberg
Assignee: Ariel Weisberg
 Fix For: 4.x


CASSANDRA-13241 proposes a few implementations, and in IRC there was discussion 
of using a variable-length representation. I like the fixed-length ones because 
they yield a lot of benefit while remaining simple and easy to understand and 
debug.
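Here is a minimal sketch of one fixed-length "summing" layout. It assumes that consecutive compressed-chunk offsets are less than 64 KiB apart, so each gap fits in 16 bits. That holds for 16 KiB chunks but not necessarily for incompressible 64 KiB chunks. The layout stores a full 8-byte offset every 64 chunks plus a 2-byte gap per chunk, which is about 2.1 bytes per chunk, or roughly a quarter of a plain `long[]`. `SummingOffsets` and the checkpoint interval are hypothetical, and this is not the code on the linked branch.

```java
/**
 * Hypothetical sketch of a "summing" chunk-offset index (not the code on the
 * linked branch). Offsets are rebuilt by adding 16-bit gaps to a 64-bit
 * checkpoint that is stored once every CHECKPOINT_INTERVAL chunks.
 */
public final class SummingOffsets {
    static final int CHECKPOINT_INTERVAL = 64;

    private final long[] checkpoints; // absolute offsets of chunks 0, 64, 128, ...
    private final char[] gaps;        // offset[i] - offset[i - 1] as unsigned 16-bit

    SummingOffsets(long[] offsets) {
        int n = offsets.length;
        checkpoints = new long[(n + CHECKPOINT_INTERVAL - 1) / CHECKPOINT_INTERVAL];
        gaps = new char[n];
        for (int i = 0; i < n; i++) {
            if (i % CHECKPOINT_INTERVAL == 0) {
                checkpoints[i / CHECKPOINT_INTERVAL] = offsets[i];
            } else {
                long gap = offsets[i] - offsets[i - 1];
                if (gap < 0 || gap > Character.MAX_VALUE) {
                    throw new IllegalArgumentException("gap does not fit in 16 bits at chunk " + i);
                }
                gaps[i] = (char) gap;
            }
        }
    }

    /** Offset of chunk i: its checkpoint plus every gap recorded since then. */
    long offset(int i) {
        int block = i / CHECKPOINT_INTERVAL;
        long result = checkpoints[block];
        for (int j = block * CHECKPOINT_INTERVAL + 1; j <= i; j++) {
            result += gaps[j];
        }
        return result;
    }

    /** Payload bytes, ignoring JVM array headers. */
    long sizeInBytes() {
        return checkpoints.length * 8L + gaps.length * 2L;
    }

    public static void main(String[] args) {
        int chunks = 10_000;
        long[] offsets = new long[chunks];
        for (int i = 1; i < chunks; i++) {
            offsets[i] = offsets[i - 1] + 9_000 + (i % 7) * 1_000; // made-up compressed sizes
        }
        SummingOffsets index = new SummingOffsets(offsets);
        for (int i = 0; i < chunks; i++) {
            if (index.offset(i) != offsets[i]) {
                throw new AssertionError("mismatch at chunk " + i);
            }
        }
        System.out.printf("%d bytes instead of %d for a long[]%n", index.sizeInBytes(), chunks * 8L);
    }
}
```

The trade-off is that a lookup sums up to 63 gaps instead of doing a single array read. A smaller checkpoint interval makes lookups cheaper but requires more 8-byte checkpoints.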






[jira] [Updated] (CASSANDRA-14813) Crash frequently due to fatal error caused by "StubRoutines::updateBytesCRC32"

2018-10-29 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-14813:
---
Attachment: (was: image.png)

> Crash frequently due to fatal error caused by "StubRoutines::updateBytesCRC32"
> --
>
> Key: CASSANDRA-14813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14813
> Project: Cassandra
>  Issue Type: Bug
> Environment: *OS:* 
> {code:java}
> CentOS release 6.9 (Final){code}
> *JAVA:*
> {noformat}
> java version "1.8.0_101"
>  Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
>  Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode){noformat}
>  
> *Memory:*
> {noformat}
> 256GB{noformat}
> *CPU:*
> {noformat}
> 4*  Intel® Xeon® Processor E5-2620 v4{noformat}
> *DISK:*
> {noformat}
> Filesystem Size Used Avail Use% Mounted on
>  /dev/sda3 423G 28G 374G 7% /
>  tmpfs 126G 68K 126G 1% /dev/shm
>  /dev/sda1 240M 40M 188M 18% /boot
>  /dev/sdb 3.7T 33M 3.7T 1% /mpp-data/c/cache
>  /dev/sdc 3.7T 2.7T 984G 74% /mpp-data/c/data00
>  /dev/sdd 3.7T 2.5T 1.2T 68% /mpp-data/c/data01
>  /dev/sde 3.7T 2.7T 1.1T 72% /mpp-data/c/data02
>  /dev/sdf 3.7T 2.5T 1.2T 68% /mpp-data/c/data03
>  /dev/sdg 3.7T 2.4T 1.3T 66% /mpp-data/c/data04
>  /dev/sdh 3.7T 2.6T 1.2T 69% /mpp-data/c/data05
>  /dev/sdi 3.7T 2.6T 1.2T 70% /mpp-data/c/data06{noformat}
>Reporter: Jinchao Zhang
>Priority: Major
>  Labels: StubRoutines, crash, updateBytesCRC32
> Attachments: f3.png, f4.png, hs1.png, hs2.png, hs_err_pid26350.log
>
>
> Recently, we encountered the same problem described by CASSANDRA-14283 in our 
> production system, which runs on Cassandra 3.11.2. We noticed that this issue 
> has been resolved in Cassandra 3.11.3 (CASSANDRA-14284), so we upgraded our 
> system to 3.11.3. However, the crashes became even more frequent, as shown in 
> the following screenshots; the reduced hs_err file is posted here as well.






[jira] [Updated] (CASSANDRA-14813) Crash frequently due to fatal error caused by "StubRoutines::updateBytesCRC32"

2018-10-29 Thread Ariel Weisberg (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ariel Weisberg updated CASSANDRA-14813:
---
Attachment: image.png

> Crash frequently due to fatal error caused by "StubRoutines::updateBytesCRC32"
> --
>
> Key: CASSANDRA-14813
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14813
> Project: Cassandra
>  Issue Type: Bug
> Environment: *OS:* 
> {code:java}
> CentOS release 6.9 (Final){code}
> *JAVA:*
> {noformat}
> java version "1.8.0_101"
>  Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
>  Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode){noformat}
>  
> *Memory:*
> {noformat}
> 256GB{noformat}
> *CPU:*
> {noformat}
> 4*  Intel® Xeon® Processor E5-2620 v4{noformat}
> *DISK:*
> {noformat}
> Filesystem Size Used Avail Use% Mounted on
>  /dev/sda3 423G 28G 374G 7% /
>  tmpfs 126G 68K 126G 1% /dev/shm
>  /dev/sda1 240M 40M 188M 18% /boot
>  /dev/sdb 3.7T 33M 3.7T 1% /mpp-data/c/cache
>  /dev/sdc 3.7T 2.7T 984G 74% /mpp-data/c/data00
>  /dev/sdd 3.7T 2.5T 1.2T 68% /mpp-data/c/data01
>  /dev/sde 3.7T 2.7T 1.1T 72% /mpp-data/c/data02
>  /dev/sdf 3.7T 2.5T 1.2T 68% /mpp-data/c/data03
>  /dev/sdg 3.7T 2.4T 1.3T 66% /mpp-data/c/data04
>  /dev/sdh 3.7T 2.6T 1.2T 69% /mpp-data/c/data05
>  /dev/sdi 3.7T 2.6T 1.2T 70% /mpp-data/c/data06{noformat}
>Reporter: Jinchao Zhang
>Priority: Major
>  Labels: StubRoutines, crash, updateBytesCRC32
> Attachments: f3.png, f4.png, hs1.png, hs2.png, hs_err_pid26350.log
>
>
> Recently, we encountered the same problem described by CASSANDRA-14283 in our 
> production system, which runs on Cassandra 3.11.2. We noticed that this issue 
> has been resolved in Cassandra 3.11.3 (CASSANDRA-14284), so we upgraded our 
> system to 3.11.3. However, the crashes became even more frequent, as shown in 
> the following screenshots; the reduced hs_err file is posted here as well.






[jira] [Created] (CASSANDRA-14856) Thoroughly test how the GossipingPropertyFileSnitch (GPFS) behaves when fields go missing in gossip

2018-10-29 Thread Jeremy Hanna (JIRA)
Jeremy Hanna created CASSANDRA-14856:


 Summary: Thoroughly test how the GossipingPropertyFileSnitch 
(GPFS) behaves when fields go missing in gossip
 Key: CASSANDRA-14856
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14856
 Project: Cassandra
  Issue Type: Test
  Components: Configuration
Reporter: Jeremy Hanna


This follows up on the [dev list discussion|https://lists.apache.org/thread.html/998f5f674ba244c4003893364ee068651da40902559ee1e50bb1c602@%3Cdev.cassandra.apache.org%3E] 
about deprecating the PropertyFileSnitch (PFS). It appears that fields still 
sometimes go missing from gossip. In theory, and anecdotally in practice, that 
shouldn't cause problems for GPFS. However, it would be nice to test more 
thoroughly the effects that missing gossip fields have on the things that rely 
on them (e.g. GPFS). That would let us deprecate/remove PFS more confidently, 
since we would have more confidence in how solid GPFS is.






[jira] [Updated] (CASSANDRA-14847) improvement of nodetool status -r

2018-10-29 Thread Jeremy Hanna (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14847?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-14847:
-
Description: 
Hello,

When using "nodetool status -r", I found that the response time grows with the 
number of vnodes.
In my test environment, with num_tokens set to 256 and 6 nodes, the response 
takes about 60 seconds.

It turned out that the findMaxAddressLength method in Status.java is causing 
the delay. Although it only needs the maximum address length, it loops 
`tokenrange * vnode` times, which is redundant.

To avoid processing the same host names over and over, I modified it to check 
for duplicates with a hash.
In my environment, the response time has been reduced from 60 seconds to 2 
seconds.

I have attached the patch; please take a look.
Thank you
{code:java}
[before]
Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN *** 559.32 KB 256 48.7% 0555746a-60c2-4717-b042-94ba951ef679 ***
UN *** 721.48 KB 256 51.4% 1af4acb6-e0a0-4bcb-8bba-76ae2e225cd5 ***
UN *** 699.98 KB 256 48.3% 5215c728-9b80-4e3c-b46b-c5b8e5eb753f ***
UN *** 691.65 KB 256 48.1% 57da4edf-4acb-474d-b26c-27f048c37bd6 ***
UN *** 705.66 KB 256 52.8% 07520eab-47d2-4f5d-aeeb-f6e599c9b084 ***
UN *** 610.87 KB 256 50.7% 6b39acaf-6ed6-42e4-a357-0d258bdf87b7 ***

time : 66s

[after]
Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN *** 559.32 KB 256 48.7% 0555746a-60c2-4717-b042-94ba951ef679 ***
UN *** 721.48 KB 256 51.4% 1af4acb6-e0a0-4bcb-8bba-76ae2e225cd5 ***
UN *** 699.98 KB 256 48.3% 5215c728-9b80-4e3c-b46b-c5b8e5eb753f ***
UN *** 691.65 KB 256 48.1% 57da4edf-4acb-474d-b26c-27f048c37bd6 ***
UN *** 705.66 KB 256 52.8% 07520eab-47d2-4f5d-aeeb-f6e599c9b084 ***
UN *** 610.87 KB 256 50.7% 6b39acaf-6ed6-42e4-a357-0d258bdf87b7 ***

time : 2s
{code}
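Here is a minimal sketch of the deduplication idea described above. It is not the attached 3.11.1.patch. It assumes the expensive step is resolving or formatting an endpoint (with `-r`, nodetool shows host names instead of IPs) and that the method sees one endpoint per token. `MaxAddressLength`, `naive` and `deduplicated` are hypothetical names. A `HashSet` ensures each distinct endpoint is resolved only once, so the expensive work scales with the number of nodes rather than nodes × num_tokens.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

public final class MaxAddressLength {

    /** Naive: resolves the endpoint once per token, so cost grows with num_tokens. */
    static int naive(Collection<String> endpointPerToken, Function<String, String> resolve) {
        int max = 0;
        for (String endpoint : endpointPerToken) {
            max = Math.max(max, resolve.apply(endpoint).length());
        }
        return max;
    }

    /** Deduplicated: a HashSet ensures each distinct endpoint is resolved once. */
    static int deduplicated(Collection<String> endpointPerToken, Function<String, String> resolve) {
        Set<String> seen = new HashSet<>();
        int max = 0;
        for (String endpoint : endpointPerToken) {
            if (seen.add(endpoint)) { // add() returns false for an endpoint already seen
                max = Math.max(max, resolve.apply(endpoint).length());
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // 6 nodes x 256 vnodes, mirroring the report; the addresses are made up.
        List<String> endpointPerToken = new ArrayList<>();
        for (int token = 0; token < 256; token++) {
            for (int node = 1; node <= 6; node++) {
                endpointPerToken.add("10.0.0." + node);
            }
        }
        int[] calls = {0};
        Function<String, String> fakeResolve = ip -> { calls[0]++; return ip + ".example"; };

        int a = naive(endpointPerToken, fakeResolve);
        int naiveCalls = calls[0];
        calls[0] = 0;
        int b = deduplicated(endpointPerToken, fakeResolve);
        System.out.println((a == b) + ": " + naiveCalls + " vs " + calls[0] + " resolve calls");
    }
}
```

Both versions return the same width. In this setup, the naive loop resolves 1,536 times and the deduplicated loop resolves 6 times, while the cheap iteration over tokens remains.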

  was:
Hello,

When using "nodetool -r", I found a problem that the response time becomes 
longer depending on the number of vnodes.
 In my testing environment, when the num_token is 256 and the number of nodes 
is 6, the response takes about 60 seconds.

It turned out that the findMaxAddressLength method in status.java is causing 
the delay.
 Despite only obtaining the maximum length of the address by the number of 
vnodes, `tokenrange * vnode` times also loop processing, there is redundancy.

To prevent duplicate host names from being referenced every time, I modified to 
check with hash.
 In my environment, the response time has been reduced from 60 seconds to 2 
seconds.

I attached the patch, so please check it.
 Thank you
{code:java}
[before]
Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN *** 559.32 KB 256 48.7% 0555746a-60c2-4717-b042-94ba951ef679 ***
UN *** 721.48 KB 256 51.4% 1af4acb6-e0a0-4bcb-8bba-76ae2e225cd5 ***
UN *** 699.98 KB 256 48.3% 5215c728-9b80-4e3c-b46b-c5b8e5eb753f ***
UN *** 691.65 KB 256 48.1% 57da4edf-4acb-474d-b26c-27f048c37bd6 ***
UN *** 705.66 KB 256 52.8% 07520eab-47d2-4f5d-aeeb-f6e599c9b084 ***
UN *** 610.87 KB 256 50.7% 6b39acaf-6ed6-42e4-a357-0d258bdf87b7 ***

time : 66s

[after]
Datacenter: dc1
===
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN *** 559.32 KB 256 48.7% 0555746a-60c2-4717-b042-94ba951ef679 ***
UN *** 721.48 KB 256 51.4% 1af4acb6-e0a0-4bcb-8bba-76ae2e225cd5 ***
UN *** 699.98 KB 256 48.3% 5215c728-9b80-4e3c-b46b-c5b8e5eb753f ***
UN *** 691.65 KB 256 48.1% 57da4edf-4acb-474d-b26c-27f048c37bd6 ***
UN *** 705.66 KB 256 52.8% 07520eab-47d2-4f5d-aeeb-f6e599c9b084 ***
UN *** 610.87 KB 256 50.7% 6b39acaf-6ed6-42e4-a357-0d258bdf87b7 ***

time : 2s
{code}


> improvement of nodetool status -r
> -
>
> Key: CASSANDRA-14847
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14847
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Fumiya Yamashita
>Priority: Major
> Fix For: 3.11.x
>
> Attachments: 3.11.1.patch
>
>
> Hello,
> When using "nodetool status -r", I found that the response time grows with 
> the number of vnodes.
> In my test environment, with num_tokens set to 256 and 6 nodes, the response 
> takes about 60 seconds.
> It turned out that the findMaxAddressLength method in Status.java is causing 
> the delay.
>  Despite only obtaining the 

[jira] [Updated] (CASSANDRA-14848) When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non seed nodes

2018-10-29 Thread Jeremy Hanna (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-14848:
-
Labels: security  (was: )

> When upgrading 3.11.3->4.0 using SSL 4.0 nodes does not connect to old non 
> seed nodes
> -
>
> Key: CASSANDRA-14848
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14848
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Tommy Stendahl
>Priority: Major
>  Labels: security
>
> When upgrading from 3.11.3 to 4.0 with server encryption enabled, the new 4.0
> node connects only to the 3.11.3 seed node; no connections are established
> to the non-seed nodes still on the old version.
> I have four nodes: *.242 is upgraded to 4.0, *.243 and *.244 are 3.11.3
> non-seed nodes, and *.246 is a 3.11.3 seed. After starting the 4.0 node, I get
> the following nodetool status output on the different nodes:
> {noformat}
> *.242
> -- Address Load Tokens Owns (effective) Host ID Rack
> UN 10.216.193.242 1017.77 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 RAC1
> DN 10.216.193.243 743.32 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 RAC1
> DN 10.216.193.244 711.54 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 RAC1
> UN 10.216.193.246 659.81 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 RAC1
> *.243 and *.244
> -- Address Load Tokens Owns (effective) Host ID Rack
> DN 10.216.193.242 657.4 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 RAC1
> UN 10.216.193.243 471 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 RAC1
> UN 10.216.193.244 471.71 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 RAC1
> UN 10.216.193.246 388.54 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 RAC1
> *.246
> -- Address Load Tokens Owns (effective) Host ID Rack
> UN 10.216.193.242 657.4 KiB 256 75,1% 7d278e14-d549-42f3-840d-77cfd852fbf4 RAC1
> UN 10.216.193.243 471 KiB 256 74,8% 5586243a-ca74-4125-8e7e-09e82e23c4e5 RAC1
> UN 10.216.193.244 471.71 KiB 256 75,2% c155e262-b898-4e86-9e1d-d4d0f97e88f6 RAC1
> UN 10.216.193.246 388.54 KiB 256 74,9% 502dd00f-fc02-4024-b65f-b98ba3808291 RAC1
> {noformat}
>  
> I have built 4.0 with wire tracing activated, and in my config
> storage_port=12700 and ssl_storage_port=12701. In the log I can see that the
> 4.0 node starts connecting to the 3.11.3 seed node on the storage_port but
> quickly switches to the ssl_storage_port. When connecting to the non-seed
> nodes, however, it never switches to the ssl_storage_port.
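The behavior described above can be modeled as a small decision function. In that behavior, the outbound connection moves from storage_port to ssl_storage_port only once the peer is recognized as a pre-4.0 node. This is a hypothetical sketch. `PeerPortSelector`, `choosePort`, and both version constants are assumptions for illustration; the value 11 comes from the debug log quoted below. It is not Cassandra's `OutboundMessagingConnection` logic. It only shows how a peer whose version is never learned would stay on storage_port. That would match the symptom for *.243 and *.244, but the sketch does not claim to identify the actual cause.

```java
// Hypothetical model of the port choice visible in the logs; the names and
// version constants are assumptions, not Cassandra's actual implementation.
final class PeerPortSelector {

    // Assumed messaging version reported by 3.11 peers ("peer version is 11").
    static final int PRE_40_VERSION = 11;
    // Assumed messaging version of 4.0 nodes.
    static final int VERSION_40 = 12;

    // Picks the outbound port for an encrypted connection. knownVersion is
    // null while the peer's messaging version has not been learned yet.
    static int choosePort(Integer knownVersion, int storagePort, int sslStoragePort) {
        if (knownVersion != null && knownVersion < VERSION_40) {
            // A known pre-4.0 peer expects encrypted traffic on ssl_storage_port.
            return sslStoragePort;
        }
        // Otherwise stay on storage_port; in this model an unknown version
        // never triggers the switch.
        return storagePort;
    }

    public static void main(String[] args) {
        int storagePort = 12700;    // storage_port from the report
        int sslStoragePort = 12701; // ssl_storage_port from the report
        System.out.println("seed, version known: "
                + choosePort(PRE_40_VERSION, storagePort, sslStoragePort));
        System.out.println("non-seed, version unknown: "
                + choosePort(null, storagePort, sslStoragePort));
    }
}
```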
> {noformat}
> >grep 193.246 system.log | grep Outbound
> 2018-10-25T10:57:36.799+0200 [MessagingService-NettyOutbound-Thread-4-1] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x2f0e5e55] CONNECT: /10.216.193.246:12700
> 2018-10-25T10:57:36.902+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c] CONNECT: /10.216.193.246:12701
> 2018-10-25T10:57:36.905+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c, L:/10.216.193.242:37252 - R:10.216.193.246/10.216.193.246:12701] ACTIVE
> 2018-10-25T10:57:36.906+0200 [MessagingService-NettyOutbound-Thread-4-2] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x9e81f62c, L:/10.216.193.242:37252 - R:10.216.193.246/10.216.193.246:12701] WRITE: 8B
> >grep 193.243 system.log | grep Outbound
> 2018-10-25T10:57:38.438+0200 [MessagingService-NettyOutbound-Thread-4-3] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0xd8f1d6c4] CONNECT: /10.216.193.243:12700
> 2018-10-25T10:57:38.540+0200 [MessagingService-NettyOutbound-Thread-4-4] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0xfde6cc9f] CONNECT: /10.216.193.243:12700
> 2018-10-25T10:57:38.694+0200 [MessagingService-NettyOutbound-Thread-4-5] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x7e87fc4e] CONNECT: /10.216.193.243:12700
> 2018-10-25T10:57:38.741+0200 [MessagingService-NettyOutbound-Thread-4-7] INFO i.n.u.internal.logging.Slf4JLogger:101 info [id: 0x39395296] CONNECT: /10.216.193.243:12700
> {noformat}
>  
> When I had debug logging activated and started the 4.0 node, I could see that
> it switches port for *.246 but not for *.243 and *.244.
> {noformat}
> >grep DEBUG system.log| grep OutboundMessagingConnection | grep maybeUpdateConnectionId
> 2018-10-25T13:12:56.095+0200 [ScheduledFastTasks:1] DEBUG o.a.c.n.a.OutboundMessagingConnection:314 maybeUpdateConnectionId changing connectionId to 10.216.193.246:12701 (GOSSIP), with a different port for secure communication, because peer version is 11
> 2018-10-25T13:12:58.100+0200 [ReadStage-1] DEBUG o.a.c.n.a.OutboundMessagingConnection:314 maybeUpdateConnectionId changing connectionId to 10.216.193.246:12701 

[jira] [Updated] (CASSANDRA-14850) Make it possible to connect with user/pass + port in fqltool replay

2018-10-29 Thread Jeremy Hanna (JIRA)


 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeremy Hanna updated CASSANDRA-14850:
-
Labels: fqltool security  (was: )

> Make it possible to connect with user/pass + port in fqltool replay
> ---
>
> Key: CASSANDRA-14850
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14850
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
>  Labels: fqltool, security
> Fix For: 4.x
>
>
> We also need to close the executor service
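Closing an executor service, as the description asks, usually follows the standard `shutdown()` / `awaitTermination()` / `shutdownNow()` sequence from `java.util.concurrent`. The sketch below shows that generic idiom only. `ExecutorShutdown` and `closeExecutor` are hypothetical names, and this is not fqltool's replay code, which the issue does not show.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Generic executor shutdown idiom; names are hypothetical, not fqltool's API.
final class ExecutorShutdown {

    // Stops accepting new tasks, waits (bounded) for submitted tasks to finish,
    // and forces cancellation if they do not complete in time.
    // Returns true if the executor terminated.
    static boolean closeExecutor(ExecutorService executor, long timeout, TimeUnit unit) {
        executor.shutdown(); // reject new tasks; already submitted tasks keep running
        try {
            if (!executor.awaitTermination(timeout, unit)) {
                executor.shutdownNow(); // interrupt tasks that are still running
                return executor.awaitTermination(timeout, unit);
            }
            return true;
        } catch (InterruptedException e) {
            executor.shutdownNow();
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newFixedThreadPool(2);
        executor.submit(() -> System.out.println("replaying"));
        System.out.println("terminated: " + closeExecutor(executor, 5, TimeUnit.SECONDS));
    }
}
```

Bounding the wait keeps a stuck task from hanging the tool forever. The `shutdownNow()` fallback relies on tasks responding to interruption.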



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-14849) some empty/invalid bounds aren't caught by SelectStatement

2018-10-29 Thread Aleksey Yeschenko (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667251#comment-16667251
 ] 

Aleksey Yeschenko edited comment on CASSANDRA-14849 at 10/29/18 2:05 PM:
-

I think the patch is correct, but I have a few issues with it.

1. We are inlining a copy of {{ClusteringComparator.compare()}}, ish, into
{{Slice.isEmpty()}}. If the former changes somehow, there is a risk of
forgetting to apply the same change to the inlined version.
2. There is duplication of work. After making the regular {{compare()}} call,
we are going through the motions again in the common case.
3. There are multiple returns with some nesting involved, which makes it a bit
trickier to follow than necessary.

I *think* essentially we are just lacking a {{cmp == 0 && one of the bounds is 
exclusive}} condition, and the whole method can be simplified quite a bit 
(relatively). Pushed an illustration/review branch with those issues handled 
[here|https://github.com/iamaleksey/cassandra/commits/14849-review].
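The missing condition suggested above (`cmp == 0` with at least one exclusive bound) can be illustrated with a simplified model. This is a sketch that assumes a single integer clustering column. `SliceModel` is a hypothetical class, not Cassandra's `Slice`, and it ignores multi-column prefixes and the real `ClusteringComparator`. It uses the bound pair from the issue description, `c >= 100 AND c < 100`.

```java
// Hypothetical single-column model of a clustering slice; Cassandra's real
// Slice works on ClusteringBound prefixes compared by a ClusteringComparator.
final class SliceModel {
    final int start;
    final boolean startInclusive;
    final int end;
    final boolean endInclusive;

    SliceModel(int start, boolean startInclusive, int end, boolean endInclusive) {
        this.start = start;
        this.startInclusive = startInclusive;
        this.end = end;
        this.endInclusive = endInclusive;
    }

    // Empty if the start sorts after the end, or if both bounds sit on the same
    // value and at least one of them is exclusive (the missing condition).
    boolean isEmpty() {
        int cmp = Integer.compare(start, end);
        return cmp > 0 || (cmp == 0 && !(startInclusive && endInclusive));
    }

    public static void main(String[] args) {
        // c >= 100 AND c < 100: equal bounds, end exclusive, nothing can match.
        System.out.println(new SliceModel(100, true, 100, false).isEmpty());
        // c >= 100 AND c <= 100: selects exactly c = 100.
        System.out.println(new SliceModel(100, true, 100, true).isEmpty());
    }
}
```

Running `main` prints `true` and then `false`. Only the first slice should collapse to the equivalent of Slices.NONE.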


was (Author: iamaleksey):
I think the patch is correct, but I have a few issues with it.

1. We are inlining a copy of {{ClusteringComparator.compare()}}, ish, into
{{Slices.isEmpty()}}. If the former changes somehow, there is a risk of
forgetting to apply the same change to the inlined version.
2. There is duplication of work. After making the regular {{compare()}} call,
we are going through the motions again in the common case.
3. There are multiple returns with some nesting involved, which makes it a bit
trickier to follow than necessary.

I *think* essentially we are just lacking a {{cmp == 0 && one of the bounds is 
exclusive}} condition, and the whole method can be simplified quite a bit 
(relatively). Pushed an illustration/review branch with those issues handled 
[here|https://github.com/iamaleksey/cassandra/commits/14849-review].

> some empty/invalid bounds aren't caught by SelectStatement
> --
>
> Key: CASSANDRA-14849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14849
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.0
>
>
> Nonsensical clustering bounds like "c >= 100 AND c < 100" aren't converted to
> Slices.NONE as they should be. Although this seems to be completely benign,
> it is technically incorrect and complicates some testing, since it can cause
> memtables and sstables to return different results for the same data for
> these bounds in some cases.






[jira] [Commented] (CASSANDRA-14849) some empty/invalid bounds aren't caught by SelectStatement

2018-10-29 Thread Aleksey Yeschenko (JIRA)


[ 
https://issues.apache.org/jira/browse/CASSANDRA-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667251#comment-16667251
 ] 

Aleksey Yeschenko commented on CASSANDRA-14849:
---

I think the patch is correct, but I have a few issues with it.

1. We are inlining a copy of {{ClusteringComparator.compare()}}, ish, into
{{Slices.isEmpty()}}. If the former changes somehow, there is a risk of
forgetting to apply the same change to the inlined version.
2. There is duplication of work. After making the regular {{compare()}} call,
we are going through the motions again in the common case.
3. There are multiple returns with some nesting involved, which makes it a bit
trickier to follow than necessary.

I *think* essentially we are just lacking a {{cmp == 0 && one of the bounds is 
exclusive}} condition, and the whole method can be simplified quite a bit 
(relatively). Pushed an illustration/review branch with those issues handled 
[here|https://github.com/iamaleksey/cassandra/commits/14849-review].

> some empty/invalid bounds aren't caught by SelectStatement
> --
>
> Key: CASSANDRA-14849
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14849
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Blake Eggleston
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.0
>
>
> Nonsensical clustering bounds like "c >= 100 AND c < 100" aren't converted to
> Slices.NONE as they should be. Although this seems to be completely benign,
> it is technically incorrect and complicates some testing, since it can cause
> memtables and sstables to return different results for the same data for
> these bounds in some cases.


