[jira] [Updated] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-03-30 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14358:
-
Description: 
I've been trying to debug nodes not being able to see each other during longer 
(~5 minute+) Cassandra restarts, which can contribute to {{UnavailableExceptions}} 
during rolling restarts of 3.0.x and 2.1.x clusters for us. I think I finally 
have a lead. It appears that prior to trunk (with the awesome Netty refactor) we 
do not set socket connect timeouts on SSL connections (in 2.1.x, 3.0.x, or 
3.11.x), nor, as far as I can tell, do we set {{SO_TIMEOUT}} on outbound 
connections. I believe this means that we could potentially block forever on 
{{connect}} or {{recv}} syscalls, and we could block forever on the SSL 
handshake as well. I think the OS will protect us somewhat (and that may be 
what's causing the eventual timeout), but given the right network conditions our 
{{OutboundTcpConnection}} threads can just be stuck, never making any progress, 
until the OS intervenes.

I have attached some logs of such a network partition during a rolling restart 
where an old node in the cluster has a completely foobarred 
{{OutboundTcpConnection}} for ~10 minutes before finally getting a 
{{java.net.SocketException: Connection timed out (Write failed)}} and 
immediately successfully reconnecting. I conclude that the old node is the 
problem because the new node (the one that restarted) is sending ECHOs to the 
old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
node's ECHOs, but the new node is never getting the ECHOs. This appears, to 
me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
stuck and can't make any forward progress. By the time we could notice this and 
slap TRACE logging on, the only thing we see is ~10 minutes later a 
{{SocketException}} inside {{writeConnected}}'s flush and an immediate 
recovery. It is interesting to me that the exception happens in 
{{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
failure}} I believe that this can't be a connection reset), because my 
understanding is that we should have a fully handshaked SSL connection at that 
point in the code.

Current theory:
 # "New" node restarts,  "Old" node calls 
[newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
 # Old node starts [creating a 
new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
 SSL socket 
 # SSLSocket calls 
[createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
 which conveniently calls connect with a default timeout of "forever". We could 
hang here forever until the OS kills us.
 # If we continue, we get to 
[writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
 which eventually calls 
[flush|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L341]
 on the output stream and also can hang forever. I think the probability is 
especially high when a node is restarting and is overwhelmed with SSL 
handshakes and such.

I don't fully understand the attached traceback as it appears we are getting a 
{{Connection Timeout}} from a {{send}} failure (my understanding is you can 
only get a connection timeout prior to a send), but I think it's reasonable 
that we have a timeout configuration issue. I'd like to try to make Cassandra 
robust to networking issues like this via maybe:
 # Change the {{SSLSocket}} {{getSocket}} methods to provide connection 
timeouts of 2s (equivalent to trunk's 
[timeout|https://github.com/apache/cassandra/blob/11496039fb18bb45407246602e31740c56d28157/src/java/org/apache/cassandra/net/async/NettyFactory.java#L329])
 # Appropriately set recv timeouts via {{SO_TIMEOUT}}, maybe something like 2 
minutes (in old versions via 
[setSoTimeout|https://docs.oracle.com/javase/8/docs/api/java/net/Socket.html#setSoTimeout-int-],
 in trunk via 
[SO_TIMEOUT|http://netty.io/4.0/api/io/netty/channel/ChannelOption.html#SO_TIMEOUT])
 # Since we can't set send timeouts afaik (thanks Java) maybe we can have some 
kind of watchdog that ensures {{OutboundTcpConnection}} is making progress in its 
queue and if it doesn't make any progress for ~30s-1m, forces a disconnect (rough 
sketches of these ideas follow below).
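
To make #1-#3 concrete, here are two minimal sketches. They are illustrative 
only (class names, constants, and hooks are made up for the example) and are not 
the actual SSLFactory / OutboundTcpConnectionPool code or a proposed patch:

{code:java}
// Sketch of #1 and #2: create an unconnected socket, connect with an explicit
// timeout, bound blocking reads with SO_TIMEOUT, then layer TLS on top so the
// handshake can no longer block indefinitely. Constants are illustrative.
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;

public final class BoundedSslSocketSketch
{
    private static final int CONNECT_TIMEOUT_MILLIS = 2_000;   // roughly trunk's Netty connect timeout
    private static final int READ_TIMEOUT_MILLIS = 120_000;    // ~2 minute SO_TIMEOUT for recv

    public static SSLSocket open(SSLContext ctx, InetAddress peer, int port) throws IOException
    {
        Socket raw = new Socket();                                               // unconnected
        raw.connect(new InetSocketAddress(peer, port), CONNECT_TIMEOUT_MILLIS);  // #1: bounded connect
        raw.setSoTimeout(READ_TIMEOUT_MILLIS);                                   // #2: bounded reads

        // the handshake reads are now also subject to the SO_TIMEOUT above
        SSLSocket ssl = (SSLSocket) ctx.getSocketFactory()
                                       .createSocket(raw, peer.getHostAddress(), port, true);
        ssl.startHandshake();
        return ssl;
    }
}
{code}

And a rough take on the watchdog in #3, assuming the connection exposes a 
counter of completed messages and a pending-queue size (both hypothetical hooks):

{code:java}
// Sketch of #3: periodically check whether the connection drained anything from
// its queue; if messages are pending but no progress was made since the last
// check, force a disconnect so the connection gets re-established.
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

public final class ProgressWatchdogSketch
{
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private long lastCompleted = -1;

    public void watch(LongSupplier completedCount, LongSupplier pendingCount,
                      Runnable disconnect, long checkIntervalSeconds)
    {
        scheduler.scheduleAtFixedRate(() -> {
            long completed = completedCount.getAsLong();
            if (pendingCount.getAsLong() > 0 && completed == lastCompleted)
                disconnect.run();              // stuck: nothing sent for a whole interval
            lastCompleted = completed;
        }, checkIntervalSeconds, checkIntervalSeconds, TimeUnit.SECONDS);
    }
}
{code}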

If anyone has insight or suggestions, I'd be grateful. I am going to rule out 
whether this is keepalive duration by setting tcp_keepalive_probes to like 1 and 
maybe tcp_retries2 to like 8 to get more information about the 

[jira] [Updated] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-03-30 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14358:
-
Description: 
I've been trying to debug nodes not being able to see each other during longer 
(~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can contribute to 
{{UnavailableExceptions}} during rolling restarts of 3.0.x and 2.1.x clusters 
for us. I think I finally have a lead. It appears that prior to trunk (with the 
awesome Netty refactor) we do not set socket connect timeouts on SSL 
connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set {{SO_TIMEOUT}} as far as 
I can tell on outbound connections either. I believe that this means that we 
could potentially block forever on {{connect}} or {{recv}} syscalls, and we 
could block forever on the SSL Handshake as well. I think that the OS will 
protect us somewhat (and that may be what's causing the eventual timeout) but I 
think that given the right network conditions our {{OutboundTCPConnection}} 
threads can just be stuck forever never making any progress.

I have attached some logs of such a network partition during a rolling restart 
where an old node in the cluster has a completely foobarred 
{{OutboundTcpConnection}} for ~10 minutes before finally getting a 
{{java.net.SocketException: Connection timed out (Write failed)}} and 
immediately successfully reconnecting. I conclude that the old node is the 
problem because the new node (the one that restarted) is sending ECHOs to the 
old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
node's ECHOs, but the new node is never getting the ECHOs. This appears, to 
me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
stuck and can't make any forward progress. By the time we could notice this and 
slap TRACE logging on, the only thing we see is ~10 minutes later a 
{{SocketException}} inside {{writeConnected}}'s flush and an immediate 
recovery. It is interesting to me that the exception happens in 
{{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
failure}} I believe that this can't be a connection reset), because my 
understanding is that we should have a fully handshaked SSL connection at that 
point in the code.

Current theory:
 # "New" node restarts,  "Old" node calls 
[newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
 # Old node starts [creating a 
new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
 SSL socket 
 # SSLSocket calls 
[createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
 which conveniently calls connect with a default timeout of "forever". We could 
hang here forever until the OS kills us.
 # If we continue, we get to 
[writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
 which eventually calls 
[flush|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L341]
 on the output stream and also can hang forever. I think the probability is 
especially high when a node is restarting and is overwhelmed with SSL 
handshakes and such.

I don't fully understand the attached traceback as it appears we are getting a 
{{Connection Timeout}} from a {{send}} failure (my understanding is you can 
only get a connection timeout prior to a send), but I think it's reasonable 
that we have a timeout configuration issue. I'd like to try to make Cassandra 
robust to networking issues like this via maybe:
 # Change the {{SSLSocket}} {{getSocket}} methods to provide connection 
timeouts of 2s (equivalent to trunk's 
[timeout|https://github.com/apache/cassandra/blob/11496039fb18bb45407246602e31740c56d28157/src/java/org/apache/cassandra/net/async/NettyFactory.java#L329])
 # Appropriately set recv timeouts via {{SO_TIMEOUT}}, maybe something like 2 
minutes (in old versions via 
[setSoTimeout|https://docs.oracle.com/javase/8/docs/api/java/net/Socket.html#setSoTimeout-int-],
 in trunk via 
[SO_TIMEOUT|http://netty.io/4.0/api/io/netty/channel/ChannelOption.html#SO_TIMEOUT])
 # Since we can't set send timeouts afaik (thanks java) maybe we can have some 
kind of watchdog that ensures OutboundTcpConnection is making progress in its 
queue and if it doesn't make any progress for ~30s-1m, forces a disconnect.

If anyone has insight or suggestions, I'd be grateful. I am going to rule out 
whether this is keepalive duration by setting tcp_keepalive_probes to like 1 and 
maybe tcp_retries2 to like 8 to get more information about the state of the tcp 

[jira] [Updated] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-03-30 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14358:
-
Environment: 
Cassandra 2.1.19 (also reproduced on 3.0.15), running with 
{{internode_encryption: all}} and the EC2 multi region snitch on Linux 4.13 
within the same AWS region. The smallest cluster I've seen the problem on is 12 
nodes; it reproduces more reliably on 40+ node clusters, and 300 node clusters 
consistently reproduce it on at least one node.

So all the connections are SSL and we're connecting on the internal IP 
addresses (not the public endpoint ones).

Potentially relevant sysctls:
{noformat}
/proc/sys/net/ipv4/tcp_syn_retries = 2
/proc/sys/net/ipv4/tcp_synack_retries = 5
/proc/sys/net/ipv4/tcp_keepalive_time = 7200
/proc/sys/net/ipv4/tcp_keepalive_probes = 9
/proc/sys/net/ipv4/tcp_keepalive_intvl = 75
/proc/sys/net/ipv4/tcp_retries2 = 15
{noformat}

  was:
Cassandra 2.1.19 (also reproduced on 3.0.15), running with 
{{internode_encryption: all}} and the EC2 multi region snitch on Linux 4.13 
within the same AWS region. The smallest cluster I've seen the problem on is 12 
nodes; it reproduces more reliably on 40+ node clusters, and 300 node clusters 
consistently reproduce it on at least one node.

So all the connections are SSL and we're connecting on the internal IP 
addresses (not the public endpoint ones).

Potentially relevant sysctls:
{noformat}
/proc/sys/net/ipv4/tcp_syn_retries = 2
/proc/sys/net/ipv4/tcp_synack_retries = 5
/proc/sys/net/ipv4/tcp_keepalive_time = 7200
/proc/sys/net/ipv4/tcp_keepalive_probes = 9
/proc/sys/net/ipv4/tcp_keepalive_intvl = 75
{noformat}


> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. The smallest cluster I've seen the problem on 
> is 12 nodes; it reproduces more reliably on 40+ node clusters, and 300 node 
> clusters consistently reproduce it on at least one node.
> So all the connections are SSL and we're connecting on the internal IP 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> /proc/sys/net/ipv4/tcp_retries2 = 15
> {noformat}
>Reporter: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{recv}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck forever never making any 
> progress.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHOs. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see 

[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-03-30 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421098#comment-16421098
 ] 

Joseph Lynch commented on CASSANDRA-14346:
--

[~bdeggleston]
{quote}I think the problems that exist in C* with regard to understanding the 
state of repairs and streams, and the inability to cancel them without 
restarting nodes are orthogonal to talking about the best approach to 
coordinate them. 
{quote}
I disagree. When you're inside Cassandra you have one process lifecycle and 
don't have to do IPC via JMX (which is, honestly speaking, really bad IPC). A 
concrete example: when the outside process restarts, it loses all active JMX 
connections and therefore loses track of all repairs, and it can't get them 
back. We'd have to implement some kind of more robust IPC than JMX (e.g. 
CASSANDRA-12944) for this to ever work well imo. On the other hand, when the 
scheduler is inside the same process, we don't have to solve IPC, just 
inter-thread communication, which is much easier.
{quote}As far as I’m aware, it’s not currently possible for a repair to 
determine if it’s taking a long time, finished with a lost notification, or 
stuck somewhere. So that’s really a limitation in the design of how cassandra 
does individual streams and repair sessions that should be solved regardless, 
and not really an argument in favor of one approach or the other.
{quote}
I definitely agree this is a big problem either way, and I think the core idea 
of our proposal is to keep units of work small so that if we do have to cancel 
or lose them it's not a big deal. Hopefully with robust incremental repair this 
won't be as big an issue, because the occasional full-range repair can just do 
super small subranges and not worry about streaming too many sstables, since 
incremental repairs have theoretically already repaired most of the data.

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Major
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focused on getting it production-ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-03-30 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14358:
-
Description: 
I've been trying to debug nodes not being able to see each other during longer 
(~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can contribute to 
{{UnavailableExceptions}} during rolling restarts of 3.0.x and 2.1.x clusters 
for us. I think I finally have a lead. It appears that prior to trunk (with the 
awesome Netty refactor) we do not set socket connect timeouts on SSL 
connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set {{SO_TIMEOUT}} as far as 
I can tell on outbound connections either. I believe that this means that we 
could potentially block forever on {{connect}} or {{recv}} syscalls, and we 
could block forever on the SSL Handshake as well. I think that the OS will 
protect us somewhat (and that may be what's causing the eventual timeout) but I 
think that given the right network conditions our {{OutboundTCPConnection}} 
threads can just be stuck forever never making any progress.

I have attached some logs of such a network partition during a rolling restart 
where an old node in the cluster has a completely foobarred 
{{OutboundTcpConnection}} for ~10 minutes before finally getting a 
{{java.net.SocketException: Connection timed out (Write failed)}} and 
immediately successfully reconnecting. I conclude that the old node is the 
problem because the new node (the one that restarted) is sending ECHOs to the 
old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
node's ECHOs, but the new node is never getting the ECHOs. This appears, to 
me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
stuck and can't make any forward progress. By the time we could notice this and 
slap TRACE logging on, the only thing we see is ~10 minutes later a 
{{SocketException}} inside {{writeConnected}}'s flush and an immediate 
recovery. It is interesting to me that the exception happens in 
{{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
failure}} I believe that this can't be a connection reset), because my 
understanding is that we should have a fully handshaked SSL connection at that 
point in the code.

Current theory:
 # "New" node restarts,  "Old" node calls 
[newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
 # Old node starts [creating a 
new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
 SSL socket 
 # SSLSocket calls 
[createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
 which conveniently calls connect with a default timeout of "forever". We could 
hang here forever until the OS kills us.
 # If we continue, we get to 
[writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
 which eventually calls 
[flush|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L341]
 on the output stream and also can hang forever. I think the probability is 
especially high when a node is restarting and is overwhelmed with SSL 
handshakes and such.

I don't fully understand the attached traceback as it appears we are getting a 
{{Connection Timeout}} from a {{send}} failure (my understanding is you can 
only get a connection timeout prior to a send), but I think it's reasonable 
that we have a timeout configuration issue. I'd like to try to make Cassandra 
robust to networking issues like this via maybe:
 # Change the {{SSLSocket}} {{getSocket}} methods to provide connection 
timeouts of 2s (equivalent to trunk's 
[timeout|https://github.com/apache/cassandra/blob/11496039fb18bb45407246602e31740c56d28157/src/java/org/apache/cassandra/net/async/NettyFactory.java#L329])
 # Appropriately set recv timeouts via {{SO_TIMEOUT}}, maybe something like 2 
minutes (in old versions via 
[setSoTimeout|https://docs.oracle.com/javase/8/docs/api/java/net/Socket.html#setSoTimeout-int-],
 in trunk via 
[SO_TIMEOUT|http://netty.io/4.0/api/io/netty/channel/ChannelOption.html#SO_TIMEOUT])
 # Since we can't set send timeouts afaik (thanks java) maybe we can have some 
kind of watchdog that ensures OutboundTcpConnection is making progress in its 
queue and if it doesn't make any progress for ~30s-1m, forces a disconnect.

If anyone has insight or suggestions, I'd be grateful. I am going to rule out 
whether this is keepalive duration by setting tcp_keepalive_probes to like 1, and 
get more information about the state of the tcp connections the next time this 

[jira] [Commented] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-03-30 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421087#comment-16421087
 ] 

Joseph Lynch commented on CASSANDRA-14358:
--

It's also worth noting that the non-SSL connections have the same problem; it's 
just unlikely, I think, that the destination server gets as overloaded and drops 
a handshake.

> OutboundTcpConnection can hang for many minutes when nodes restart
> --
>
> Key: CASSANDRA-14358
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
> Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
> with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
> 4.13 within the same AWS region. The smallest cluster I've seen the problem on 
> is 12 nodes; it reproduces more reliably on 40+ node clusters, and 300 node 
> clusters consistently reproduce it on at least one node.
> So all the connections are SSL and we're connecting on the internal IP 
> addresses (not the public endpoint ones).
> Potentially relevant sysctls:
> {noformat}
> /proc/sys/net/ipv4/tcp_syn_retries = 2
> /proc/sys/net/ipv4/tcp_synack_retries = 5
> /proc/sys/net/ipv4/tcp_keepalive_time = 7200
> /proc/sys/net/ipv4/tcp_keepalive_probes = 9
> /proc/sys/net/ipv4/tcp_keepalive_intvl = 75
> {noformat}
>Reporter: Joseph Lynch
>Priority: Major
> Attachments: 10 Minute Partition.pdf
>
>
> I've been trying to debug nodes not being able to see each other during 
> longer (~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can 
> contribute to {{UnavailableExceptions}} during rolling restarts of 3.0.x and 
> 2.1.x clusters for us. I think I finally have a lead. It appears that prior 
> to trunk (with the awesome Netty refactor) we do not set socket connect 
> timeouts on SSL connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set 
> {{SO_TIMEOUT}} as far as I can tell on outbound connections either. I believe 
> that this means that we could potentially block forever on {{connect}} or 
> {{send}} syscalls, and we could block forever on the SSL Handshake as well. I 
> think that the OS will protect us somewhat (and that may be what's causing 
> the eventual timeout) but I think that given the right network conditions our 
> {{OutboundTCPConnection}} threads can just be stuck forever never making any 
> progress.
> I have attached some logs of such a network partition during a rolling 
> restart where an old node in the cluster has a completely foobarred 
> {{OutboundTcpConnection}} for ~10 minutes before finally getting a 
> {{java.net.SocketException: Connection timed out (Write failed)}} and 
> immediately successfully reconnecting. I conclude that the old node is the 
> problem because the new node (the one that restarted) is sending ECHOs to the 
> old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
> node's ECHOs, but the new node is never getting the ECHOs. This appears, to 
> me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
> stuck and can't make any forward progress. By the time we could notice this 
> and slap TRACE logging on, the only thing we see is ~10 minutes later a 
> {{SocketException}} inside {{writeConnected}}'s flush and an immediate 
> recovery. It is interesting to me that the exception happens in 
> {{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
> failure}} I believe that this can't be a connection reset), because my 
> understanding is that we should have a fully handshaked SSL connection at 
> that point in the code.
> Current theory:
>  # "New" node restarts,  "Old" node calls 
> [newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
>  # Old node starts [creating a 
> new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
>  SSL socket 
>  # SSLSocket calls 
> [createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
>  which conveniently calls connect with a default timeout of "forever". We 
> could hang here forever until the OS kills us.
>  # If we continue, we get to 
> [writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
>  which eventually calls 
> [flush|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L341]
>  on the output stream and also can hang 

[jira] [Resolved] (CASSANDRA-14001) Gossip after node restart can take a long time to converge about "down" nodes in large clusters

2018-03-30 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch resolved CASSANDRA-14001.
--
Resolution: Cannot Reproduce

Closing since I don't think the issue described here is actually the problem.

> Gossip after node restart can take a long time to converge about "down" nodes 
> in large clusters
> ---
>
> Key: CASSANDRA-14001
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14001
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Lifecycle
>Reporter: Joseph Lynch
>Priority: Minor
>
> When nodes restart in a large cluster, they mark all nodes as "alive", which 
> first calls {{markDead}} and then creates an {{EchoMessage}} and in the 
> callback to that marks the node as alive. This works great, except when that 
> initial echo fails for whatever reason and that node is marked as dead, in which 
> case it will remain dead for a long while.
> We mostly see this on 100+ node clusters, and almost always when nodes are in 
> different datacenters that have unreliable network connections (e.g., 
> cross-region in AWS), and I think that it comes down to a combination of:
> 1. Only a node itself can mark another node as "UP"
> 2. Nodes only gossip with dead nodes with probability {{#dead / (#live +1)}}
> In particular the algorithm in #2 leads to long convergence times because the 
> number of dead nodes is typically very small compared to the cluster size. My 
> back of the envelope model of this algorithm indicates that for a 100 node 
> cluster this would take an average of ~50 seconds with a stdev of 50 seconds, 
> which means we might be waiting _minutes_ for the nodes to gossip with each 
> other. I'm modeling this as the minimum of two [geometric 
> distributions|https://en.wikipedia.org/wiki/Geometric_distribution] with 
> parameter {{p=1/#nodes}}, yielding a geometric distribution with parameter 
> {{p=1-(1-1/#nodes)^2}}. So for a 100 node cluster:
> {noformat}
> 100 node cluster =>
> X = Pr(node1 gossips with node2) = geom(0.01)
> Y = Pr(node 2 gossips with node1) = geom(0.01)
> Z = min(X or Y) = geom(1 - (1 - 0.01)^2) = geom(0.02)
> E[Z] = 1/0.02 = 50
> V[Z] = (1-0.02)/(0.02)^2 = 2450
> 1000 node cluster ->
> Z = geom(1 - (1 - 0.001)^2) = geom(0.002)
> E[Z] = 500
> V[Z] = 24500
> {noformat}
> Since we gossip every second that means that on expectation in a 100 node 
> cluster these nodes would see each other after about a minute and in a 
> thousand node cluster, after ~8 minutes. For 100 node clusters the variance 
> is astounding, and means that in particular edge cases we might be waiting 
> hours before these nodes gossip with each other.
> I'm thinking of writing a patch which either:
> # Makes gossip order a shuffled list that includes dead nodes a la [swim 
> gossip|https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf]. This would 
> make it so that we waste some rounds on dead nodes but guarantee linear 
> bounding of gossip.
> # Adds an endpoint that re-triggers gossip with all nodes. Operators could 
> call this after a restart a few times if they detect a gossip inconsistency.
> # Bounding the probability we gossip with a dead node at some reasonable 
> number like 1/10 or something. This might cause a lot of gossip load when a 
> node is actually down for large clusters, but would also act to bound the 
> variance.
> # Something else?
> I've got a WIP 
> [branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:force_gossip]
>  on 3.11 which implements options #1 and #2, but I can reduce/change/modify 
> as needed if people think there is a better way. The patch doesn't pass tests 
> yet but I'm not going to change/add the tests unless we think moving to time 
> bounded gossip for down nodes is a good idea.
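
To sanity check the numbers above, here is a tiny standalone sketch (an 
editorial illustration, not code from the ticket) that recomputes E[Z] and the 
standard deviation from the same geometric model for 100 and 1000 node clusters:

{code:java}
// Z = min of two geometric variables with parameter 1/n, i.e. geometric with
// p = 1 - (1 - 1/n)^2; with one gossip round per second, E[Z] is the expected
// number of seconds before the two nodes gossip with each other.
public final class GossipConvergenceEstimate
{
    public static void main(String[] args)
    {
        for (int nodes : new int[] { 100, 1000 })
        {
            double p = 1.0 - Math.pow(1.0 - 1.0 / nodes, 2);
            double expectedSeconds = 1.0 / p;                 // ~50s and ~500s
            double stdevSeconds = Math.sqrt((1.0 - p) / (p * p));
            System.out.printf("%d nodes: p=%.4f E[Z]=%.0fs stdev=%.0fs%n",
                              nodes, p, expectedSeconds, stdevSeconds);
        }
    }
}
{code}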



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14001) Gossip after node restart can take a long time to converge about "down" nodes in large clusters

2018-03-30 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421084#comment-16421084
 ] 

Joseph Lynch commented on CASSANDRA-14001:
--

After digging deeply, I think the evidence indicates this is not an issue with 
Gossip, but just with how we establish connections on startup. I think we're 
just hitting a combination of CASSANDRA-13993 and CASSANDRA-14001. I'm going to 
close this out since I don't think the issue is Gossip related. 

> Gossip after node restart can take a long time to converge about "down" nodes 
> in large clusters
> ---
>
> Key: CASSANDRA-14001
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14001
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Lifecycle
>Reporter: Joseph Lynch
>Priority: Minor
>
> When nodes restart in a large cluster, they mark all nodes as "alive", which 
> first calls {{markDead}} and then creates an {{EchoMessage}} and in the 
> callback to that marks the node as alive. This works great, except when that 
> initial echo fails for whatever reason and that node is marked as dead, in which 
> case it will remain dead for a long while.
> We mostly see this on 100+ node clusters, and almost always when nodes are in 
> different datacenters that have unreliable network connections (e.g., 
> cross-region in AWS), and I think that it comes down to a combination of:
> 1. Only a node itself can mark another node as "UP"
> 2. Nodes only gossip with dead nodes with probability {{#dead / (#live +1)}}
> In particular the algorithm in #2 leads to long convergence times because the 
> number of dead nodes is typically very small compared to the cluster size. My 
> back of the envelope model of this algorithm indicates that for a 100 node 
> cluster this would take an average of ~50 seconds with a stdev of 50 seconds, 
> which means we might be waiting _minutes_ for the nodes to gossip with each 
> other. I'm modeling this as the minimum of two [geometric 
> distributions|https://en.wikipedia.org/wiki/Geometric_distribution] with 
> parameter {{p=1/#nodes}}, yielding a geometric distribution with parameter 
> {{p=1-(1-1/#nodes)^2}}. So for a 100 node cluster:
> {noformat}
> 100 node cluster =>
> X = Pr(node1 gossips with node2) = geom(0.01)
> Y = Pr(node 2 gossips with node1) = geom(0.01)
> Z = min(X or Y) = geom(1 - (1 - 0.01)^2) = geom(0.02)
> E[Z] = 1/0.02 = 50
> V[Z] = (1-0.02)/(0.02)^2 = 2450
> 1000 node cluster ->
> Z = geom(1 - (1 - 0.001)^2) = geom(0.002)
> E[Z] = 500
> V[Z] = 24500
> {noformat}
> Since we gossip every second that means that on expectation in a 100 node 
> cluster these nodes would see each other after about a minute and in a 
> thousand node cluster, after ~8 minutes. For 100 node clusters the variance 
> is astounding, and means that in particular edge cases we might be waiting 
> hours before these nodes gossip with each other.
> I'm thinking of writing a patch which either:
> # Makes gossip order a shuffled list that includes dead nodes a la [swim 
> gossip|https://www.cs.cornell.edu/~asdas/research/dsn02-swim.pdf]. This would 
> make it so that we waste some rounds on dead nodes but guarantee linear 
> bounding of gossip.
> # Adds an endpoint that re-triggers gossip with all nodes. Operators could 
> call this after a restart a few times if they detect a gossip inconsistency.
> # Bounding the probability we gossip with a dead node at some reasonable 
> number like 1/10 or something. This might cause a lot of gossip load when a 
> node is actually down for large clusters, but would also act to bound the 
> variance.
> # Something else?
> I've got a WIP 
> [branch|https://github.com/apache/cassandra/compare/cassandra-3.11...jolynch:force_gossip]
>  on 3.11 which implements options #1 and #2, but I can reduce/change/modify 
> as needed if people think there is a better way. The patch doesn't pass tests 
> yet but I'm not going to change/add the tests unless we think moving to time 
> bounded gossip for down nodes is a good idea.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-03-30 Thread Joseph Lynch (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph Lynch updated CASSANDRA-14358:
-
Description: 
I've been trying to debug nodes not being able to see each other during longer 
(~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can contribute to 
{{UnavailableExceptions}} during rolling restarts of 3.0.x and 2.1.x clusters 
for us. I think I finally have a lead. It appears that prior to trunk (with the 
awesome Netty refactor) we do not set socket connect timeouts on SSL 
connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set {{SO_TIMEOUT}} as far as 
I can tell on outbound connections either. I believe that this means that we 
could potentially block forever on {{connect}} or {{send}} syscalls, and we 
could block forever on the SSL Handshake as well. I think that the OS will 
protect us somewhat (and that may be what's causing the eventual timeout) but I 
think that given the right network conditions our {{OutboundTCPConnection}} 
threads can just be stuck forever never making any progress.

I have attached some logs of such a network partition during a rolling restart 
where an old node in the cluster has a completely foobarred 
{{OutboundTcpConnection}} for ~10 minutes before finally getting a 
{{java.net.SocketException: Connection timed out (Write failed)}} and 
immediately successfully reconnecting. I conclude that the old node is the 
problem because the new node (the one that restarted) is sending ECHOs to the 
old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
node's ECHOs, but the new node is never getting the ECHOs. This appears, to 
me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
stuck and can't make any forward progress. By the time we could notice this and 
slap TRACE logging on, the only thing we see is ~10 minutes later a 
{{SocketException}} inside {{writeConnected}}'s flush and an immediate 
recovery. It is interesting to me that the exception happens in 
{{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
failure}} I believe that this can't be a connection reset), because my 
understanding is that we should have a fully handshaked SSL connection at that 
point in the code.

Current theory:
 # "New" node restarts,  "Old" node calls 
[newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
 # Old node starts [creating a 
new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
 SSL socket 
 # SSLSocket calls 
[createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
 which conveniently calls connect with a default timeout of "forever". We could 
hang here forever until the OS kills us.
 # If we continue, we get to 
[writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
 which eventually calls 
[flush|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L341]
 on the output stream and also can hang forever. I think the probability is 
especially high when a node is restarting and is overwhelmed with SSL 
handshakes and such.

I don't fully understand the attached traceback as it appears we are getting a 
{{Connection Timeout}} from a {{send}} failure (my understanding is you can 
only get a connection timeout prior to a send), but I think it's reasonable 
that we have a timeout configuration issue. I'd like to try to make Cassandra 
robust to networking issues like this via maybe:
 # Change the {{SSLSocket}} {{getSocket}} methods to provide connection 
timeouts of 2s (equivalent to trunk's 
[timeout|https://github.com/apache/cassandra/blob/11496039fb18bb45407246602e31740c56d28157/src/java/org/apache/cassandra/net/async/NettyFactory.java#L329])
 # Appropriately set recv timeouts via {{SO_TIMEOUT}}, maybe something like 2 
minutes (in old versions via 
[setSoTimeout|https://docs.oracle.com/javase/8/docs/api/java/net/Socket.html#setSoTimeout-int-],
 in trunk via 
[SO_TIMEOUT|http://netty.io/4.0/api/io/netty/channel/ChannelOption.html#SO_TIMEOUT])
 # Since we can't set send timeouts afaik (thanks java) maybe we can have some 
kind of watchdog that ensures OutboundTcpConnection is making progress in its 
queue and if it doesn't make any progress for ~30s-1m, forces a disconnect.

If anyone has insight or suggestions, I'd be grateful. I am going to rule out 
whether this is keepalive duration by setting tcp_keepalive_probes to like 1, and 
get more information about the state of the tcp connections the next time this 

[jira] [Created] (CASSANDRA-14358) OutboundTcpConnection can hang for many minutes when nodes restart

2018-03-30 Thread Joseph Lynch (JIRA)
Joseph Lynch created CASSANDRA-14358:


 Summary: OutboundTcpConnection can hang for many minutes when 
nodes restart
 Key: CASSANDRA-14358
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14358
 Project: Cassandra
  Issue Type: Bug
  Components: Streaming and Messaging
 Environment: Cassandra 2.1.19 (also reproduced on 3.0.15), running 
with {{internode_encryption: all}} and the EC2 multi region snitch on Linux 
4.13 within the same AWS region. The smallest cluster I've seen the problem on is 
12 nodes; it reproduces more reliably on 40+ node clusters, and 300 node clusters 
consistently reproduce it on at least one node.

So all the connections are SSL and we're connecting on the internal IP 
addresses (not the public endpoint ones).

Potentially relevant sysctls:
{noformat}
/proc/sys/net/ipv4/tcp_syn_retries = 2
/proc/sys/net/ipv4/tcp_synack_retries = 5
/proc/sys/net/ipv4/tcp_keepalive_time = 7200
/proc/sys/net/ipv4/tcp_keepalive_probes = 9
/proc/sys/net/ipv4/tcp_keepalive_intvl = 75
{noformat}
Reporter: Joseph Lynch
 Attachments: 10 Minute Partition.pdf

I've been trying to debug nodes not being able to see each other during longer 
(~5 minute+) Cassandra restarts in 3.0.x and 2.1.x which can contribute to 
{{UnavailableExceptions}} during rolling restarts of 3.0.x and 2.1.x clusters 
for us. I think I finally have a lead. It appears that prior to trunk (with the 
awesome Netty refactor) we do not set socket connect timeouts on SSL 
connections (in 2.1.x, 3.0.x, or 3.11.x) nor do we set {{SO_TIMEOUT}} as far 
as I can tell on outbound connections either. I believe that this means that we 
could potentially block forever on {{connect}} or {{send}} syscalls, and we 
could block forever on the SSL Handshake as well. I think that the OS will 
protect us somewhat (and that may be what's causing the eventual timeout) but I 
think that given the right network conditions our {{OutboundTCPConnection}} 
threads can just be stuck forever never making any progress.

I have attached some logs of such a network partition during a rolling restart 
where an old node in the cluster has a completely foobarred 
{{OutboundTcpConnection}} for ~10 minutes before finally getting a 
{{java.net.SocketException: Connection timed out (Write failed)}} and 
immediately successfully reconnecting. I conclude that the old node is the 
problem because the new node (the one that restarted) is sending ECHOs to the 
old node, and the old node is sending ECHOs and REQUEST_RESPONSES to the new 
node's ECHOs, but the new node is never getting the ECHOs. This appears, to 
me, to indicate that the old node's {{OutboundTcpConnection}} thread is just 
stuck and can't make any forward progress. By the time we could notice this and 
slap TRACE logging on, the only thing we see is ~10 minutes later a 
{{SocketException}} inside {{writeConnected}}'s flush and an immediate 
recovery. It is interesting to me that the exception happens in 
{{writeConnected}} and it's a _connection timeout_ (and since we see {{Write 
failure}} I believe that this can't be a connection reset), because my 
understanding is that we should have a fully handshaked SSL connection at that 
point in the code.

Current theory:
 # "New" node restarts,  "Old" node calls 
[newSocket|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L433]
 # Old node starts [creating a 
new|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java#L141]
 SSL socket 
 # SSLSocket calls 
[createSocket|https://github.com/apache/cassandra/blob/cassandra-3.11/src/java/org/apache/cassandra/security/SSLFactory.java#L98],
 which conveniently calls connect with a default timeout of "forever". We could 
hang here forever until the OS kills us.
 # If we continue, we get to 
[writeConnected|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L263]
 which eventually calls 
[flush|https://github.com/apache/cassandra/blob/6f30677b28dcbf82bcd0a291f3294ddf87dafaac/src/java/org/apache/cassandra/net/OutboundTcpConnection.java#L341]
 on the output stream and also can hang forever. I think the probability is 
especially high when a node is restarting and is overwhelmed with SSL 
handshakes and such.

I don't fully understand the attached traceback as it appears we are getting a 
{{Connection Timeout}} from a {{send}} failure (my understanding is you can 
only get a connection timeout prior to a send), but I think it's reasonable 
that we have a timeout configuration issue. I'd like to try to make Cassandra 
robust to networking issues like this via maybe:
 # Change the {{SSLSocket}} {{getSocket}} methods to provide 

[jira] [Assigned] (CASSANDRA-14354) rename ColumnFamilyStoreCQLHelper to TableCQLHelper

2018-03-30 Thread Jon Haddad (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jon Haddad reassigned CASSANDRA-14354:
--

Assignee: Venkata Harikrishna Nukala

> rename ColumnFamilyStoreCQLHelper to TableCQLHelper
> ---
>
> Key: CASSANDRA-14354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14354
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jon Haddad
>Assignee: Venkata Harikrishna Nukala
>Priority: Major
> Attachments: 14354-trunk.txt
>
>
> Seems like a simple 1:1 rename.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14357) A Simple List of New Major Features Desired for Version 4.0

2018-03-30 Thread Kenneth Brotman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421050#comment-16421050
 ] 

Kenneth Brotman commented on CASSANDRA-14357:
-

You took the core stakeholder group, the users themselves, out of the 
discussion about major features for the next release! 



> A Simple List of New Major Features Desired for Version 4.0
> ---
>
> Key: CASSANDRA-14357
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14357
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Kenneth Brotman
>Priority: Major
>
> Just list any desired new major features for 4.0 that you want added.  I will 
> maintain a compiled list somewhere on this Jira as well.  Don't worry about 
> any steps beyond this.  Don't make any judgements about or make any comments 
> at all about what others add. 
> No judgments at this point.  This is a list of everyone's suggestions.  Add 
> your suggestions for new major features you desire be added for version 4.0 
> only.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Comment Edited] (CASSANDRA-10726) Read repair inserts should not be blocking

2018-03-30 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421049#comment-16421049
 ] 

Blake Eggleston edited comment on CASSANDRA-10726 at 3/30/18 11:14 PM:
---

[trunk|https://github.com/bdeggleston/cassandra/tree/10726-v2]
 [dtests|https://github.com/bdeggleston/cassandra-dtest/tree/10726]
 [tests|https://circleci.com/workflow-run/7c271901-a224-4326-bb32-cd75f218ce96]

The patch makes these 2 changes to read repair behavior.

After a digest mismatch, data requests are sent to all participants in the 
original request, but only CL.blockFor responses are required to proceed (used 
to be CL.ALL, which would be 3/3 if we speculated). The followup data read will 
now also speculatively read from another replica if it's looking like one may 
not respond, and another is available (i.e. we didn't speculate on the digest 
requests).

When sending repair mutations, we now only block on CL.blockFor acks (used to 
be CL.ALL). We will now also speculatively send a repair mutation to an 
additional node with the combined contents of all unacked mutations if it looks 
like one may not respond.

(C* branch is written on top of CASSANDRA-14353, so that's a dependency, but 
should get committed soon)
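
For illustration, here is a minimal sketch of the "block on CL.blockFor, then 
speculate" pattern described above. It is an editorial example with made-up 
names, not the actual patch; the latch is assumed to be created with count == 
blockFor and counted down once per replica response:

{code:java}
// Wait briefly for the expected acks; if they don't all arrive, send one extra
// speculative request/mutation to another replica and keep waiting, but only up
// to the overall request timeout (instead of blocking on CL.ALL).
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public final class SpeculativeAckSketch
{
    public static boolean awaitWithSpeculation(CountDownLatch blockForAcks,
                                               long speculateAfterMillis,
                                               long timeoutMillis,
                                               Runnable sendToExtraReplica) throws InterruptedException
    {
        // most of the time the blockFor responses arrive quickly on their own
        if (blockForAcks.await(speculateAfterMillis, TimeUnit.MILLISECONDS))
            return true;

        // a replica looks like it may not respond: speculatively involve another one
        sendToExtraReplica.run();

        // keep waiting for blockFor acks, bounded by the remaining request timeout
        return blockForAcks.await(timeoutMillis - speculateAfterMillis, TimeUnit.MILLISECONDS);
    }
}
{code}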


was (Author: bdeggleston):
[trunk|https://github.com/bdeggleston/cassandra/tree/10726-v2]
 [dtests|https://github.com/bdeggleston/cassandra-dtest/tree/10726]
 [tests|https://circleci.com/workflow-run/7c271901-a224-4326-bb32-cd75f218ce96]

The patch makes these 2 changes to read repair behavior.

After a digest mismatch, data requests are sent to all participants in the 
original request, but only CL.blockFor responses are required to proceed (used 
to be CL.ALL, which would be 3/3 if we speculated). The followup data read will 
now also speculatively read from another replica if it's looking like one may 
not respond, and another is available.

When sending repair mutations, we now only block on CL.blockFor acks (used to 
be CL.ALL). We will now also speculatively send a repair mutation to an 
additional node with the contents of all unacked mutations if it looks like one 
may not respond.

(C* branch is written on top of CASSANDRA-14353, so that's a dependency, but 
should get committed soon)

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Richard Low
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Updated] (CASSANDRA-10726) Read repair inserts should not be blocking

2018-03-30 Thread Blake Eggleston (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-10726:

Status: Patch Available  (was: Awaiting Feedback)

[trunk|https://github.com/bdeggleston/cassandra/tree/10726-v2]
 [dtests|https://github.com/bdeggleston/cassandra-dtest/tree/10726]
 [tests|https://circleci.com/workflow-run/7c271901-a224-4326-bb32-cd75f218ce96]

The patch makes these 2 changes to read repair behavior.

After a digest mismatch, data requests are sent to all participants in the 
original request, but only CL.blockFor responses are required to proceed (used 
to be CL.ALL, which would be 3/3 if we speculated). The followup data read will 
now also speculatively read from another replica if it's looking like one may 
not respond, and another is available.

When sending repair mutations, we now only block on CL.blockFor acks (used to 
be CL.ALL). We will now also speculatively send a repair mutation to an 
additional node with the contents of all unacked mutations if it looks like one 
may not respond.

(C* branch is written on top of CASSANDRA-14353, so that's a dependency, but 
should get committed soon)

> Read repair inserts should not be blocking
> --
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>Reporter: Richard Low
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Assigned] (CASSANDRA-12106) Add ability to blacklist a CQL partition so all requests are ignored

2018-03-30 Thread Geoffrey Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Geoffrey Yu reassigned CASSANDRA-12106:
---

Assignee: Sumanth Pasupuleti  (was: Geoffrey Yu)

> Add ability to blacklist a CQL partition so all requests are ignored
> 
>
> Key: CASSANDRA-12106
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12106
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: Geoffrey Yu
>Assignee: Sumanth Pasupuleti
>Priority: Minor
> Fix For: 4.x
>
> Attachments: 12106-trunk.txt
>
>
> Sometimes reads/writes to a given partition may cause problems due to the 
> data present. It would be useful to have a manual way to blacklist such 
> partitions so all read and write requests to them are rejected.






[jira] [Commented] (CASSANDRA-14354) rename ColumnFamilyStoreCQLHelper to TableCQLHelper

2018-03-30 Thread Venkata Harikrishna Nukala (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421038#comment-16421038
 ] 

Venkata Harikrishna Nukala commented on CASSANDRA-14354:


[~rustyrazorblade] Can you assign this ticket to me? I have uploaded the patch 
to this ticket; please review it.

> rename ColumnFamilyStoreCQLHelper to TableCQLHelper
> ---
>
> Key: CASSANDRA-14354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14354
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jon Haddad
>Priority: Major
> Attachments: 14354-trunk.txt
>
>
> Seems like a simple 1:1 rename.






[jira] [Updated] (CASSANDRA-14354) rename ColumnFamilyStoreCQLHelper to TableCQLHelper

2018-03-30 Thread Venkata Harikrishna Nukala (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venkata Harikrishna Nukala updated CASSANDRA-14354:
---
Attachment: 14354-trunk.txt

> rename ColumnFamilyStoreCQLHelper to TableCQLHelper
> ---
>
> Key: CASSANDRA-14354
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14354
> Project: Cassandra
>  Issue Type: Sub-task
>Reporter: Jon Haddad
>Priority: Major
> Attachments: 14354-trunk.txt
>
>
> Seems like a simple 1:1 rename.






[jira] [Commented] (CASSANDRA-14357) A Simple List of New Major Features Desired for Version 4.0

2018-03-30 Thread Kenneth Brotman (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421021#comment-16421021
 ] 

Kenneth Brotman commented on CASSANDRA-14357:
-

Jason,

What are you doing!  It should start with the users, in less detailed, simple 
"user" language.  The dev step would have come quickly enough.  I guess you're 
the expert.  In the future, consult in private; never do things publicly like 
that.

Kenneth Brotman



> A Simple List of New Major Features Desired for Version 4.0
> ---
>
> Key: CASSANDRA-14357
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14357
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Kenneth Brotman
>Priority: Major
>
> Just list any desired new major features for 4.0 that you want added.  I will 
> maintain a compiled list somewhere on this Jira as well.  Don't worry about 
> any steps beyond this.  Don't make any judgements about or make any comments 
> at all about what others add. 
> No judgments at this point.  This is a list of everyone's suggestions.  Add 
> your suggestions for new major features you desire be added for version 4.0 
> only.






[jira] [Commented] (CASSANDRA-14116) Refactor repair

2018-03-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16421015#comment-16421015
 ] 

Jason Brown commented on CASSANDRA-14116:
-

sgtm. +1

> Refactor repair
> ---
>
> Key: CASSANDRA-14116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14116
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Repair
>Reporter: Dikang Gu
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.0
>
>
> As part of the pluggable storage engine effort, we'd like to modularize the 
> repair related code, make it to be independent from existing storage engine 
> implementation details.
> For now, refer to 
> https://docs.google.com/document/d/1suZlvhzgB6NIyBNpM9nxoHxz_Ri7qAm-UEO8v8AIFsc
>  for high level designs.






[jira] [Commented] (CASSANDRA-7839) Support standard EC2 naming conventions in Ec2Snitch

2018-03-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420995#comment-16420995
 ] 

Jason Brown commented on CASSANDRA-7839:


Also, I realized the current call site for {{IEndpointSnitch#validate}} is not 
very good: it uses cluster metadata from gossip, but gossip has not been 
enabled yet at that point! We have, however, executed the shadow round of 
gossip, and we have the cluster metadata after the shadow round completes. 
Thus, at those call sites we should call {{IEndpointSnitch#validate}} with the 
shadow round data. Added a patch for that.

> Support standard EC2 naming conventions in Ec2Snitch
> 
>
> Key: CASSANDRA-7839
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7839
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Gregory Ramsperger
>Assignee: Jason Brown
>Priority: Major
>  Labels: docs-impacting
> Attachments: CASSANDRA-7839-aws-naming-conventions.patch
>
>
> The EC2 snitches use datacenter and rack naming conventions inconsistent with 
> those presented in Amazon EC2 APIs as region and availability zone. A 
> discussion of this is found in CASSANDRA-4026. This has not been changed for 
> valid backwards compatibility reasons. Using SnitchProperties, it is possible 
> to switch between the legacy naming and the full, AWS-style naming. 
> Proposal:
> * introduce a property (ec2_naming_scheme) to switch naming schemes.
> * default to the current/legacy naming scheme
> * add support for a new scheme ("standard") which is consistent with AWS 
> conventions
> ** data centers will be the region name, including the number
> ** racks will be the availability zone name, including the region name
> Examples:
> * *legacy*: datacenter is the part of the availability zone name preceding 
> the last "\-" when the zone ends in \-1, and includes the number if not \-1. 
> Rack is the portion of the availability zone name following the last "\-".
> ** us-west-1a => dc: us-west, rack: 1a
> ** us-west-2b => dc: us-west-2, rack: 2b
> * *standard*: datacenter is the part of the availability zone name preceding 
> the zone letter. Rack is the entire availability zone name.
> ** us-west-1a => dc: us-west-1, rack: us-west-1a
> ** us-west-2b => dc: us-west-2, rack: us-west-2b
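
Purely as an illustration of the two schemes described above (standalone code, 
not the Ec2Snitch implementation), the parsing difference looks roughly like 
this:

{code}
// Illustrative sketch of the legacy vs "standard" naming schemes described in
// the proposal above; standalone code, not the Ec2Snitch implementation.
public class Ec2NamingSketch
{
    // legacy: rack is the zone suffix; dc drops a trailing "-1" but keeps other numbers
    static String[] legacy(String az)
    {
        int dash = az.lastIndexOf('-');
        String region = az.substring(0, dash);                    // e.g. "us-west"
        String zone = az.substring(dash + 1);                     // e.g. "2b"
        String number = zone.substring(0, zone.length() - 1);     // e.g. "2"
        String dc = number.equals("1") ? region : region + "-" + number;
        return new String[] { dc, zone };
    }

    // standard: dc is the zone name minus the letter; rack is the full zone name
    static String[] standard(String az)
    {
        return new String[] { az.substring(0, az.length() - 1), az };
    }

    public static void main(String[] args)
    {
        for (String az : new String[] { "us-west-1a", "us-west-2b" })
        {
            String[] l = legacy(az), s = standard(az);
            System.out.printf("%s -> legacy dc=%s rack=%s | standard dc=%s rack=%s%n",
                              az, l[0], l[1], s[0], s[1]);
        }
    }
}
{code}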






[jira] [Comment Edited] (CASSANDRA-7839) Support standard EC2 naming conventions in Ec2Snitch

2018-03-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420995#comment-16420995
 ] 

Jason Brown edited comment on CASSANDRA-7839 at 3/30/18 10:09 PM:
--

Also, I realized the current call site for {{IEndpointSnitch#validate}} is not 
very good: it uses cluster metadata from gossip, but gossip has not been 
enabled yet at that point! We have, however, executed the shadow round of 
gossip, and we have the cluster metadata after the shadow round completes. 
Thus, at those call sites we should call {{IEndpointSnitch#validate}} with the 
shadow round data. Added a commit for that.


was (Author: jasobrown):
Also, I realized the current call site for {{IEndpointSnitch#validate}} is not 
very good: it uses cluster metadata from gossip, but gossip has not been 
enabled yet at that point! We have, however, executed the shadow round of 
gossip, and we have the cluster metadata after the shadow round completes. 
Thus, at those call sites we should call {{IEndpointSnitch#validate}} with the 
shadow round data. Added a patch for that.

> Support standard EC2 naming conventions in Ec2Snitch
> 
>
> Key: CASSANDRA-7839
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7839
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Gregory Ramsperger
>Assignee: Jason Brown
>Priority: Major
>  Labels: docs-impacting
> Attachments: CASSANDRA-7839-aws-naming-conventions.patch
>
>
> The EC2 snitches use datacenter and rack naming conventions inconsistent with 
> those presented in Amazon EC2 APIs as region and availability zone. A 
> discussion of this is found in CASSANDRA-4026. This has not been changed for 
> valid backwards compatibility reasons. Using SnitchProperties, it is possible 
> to switch between the legacy naming and the full, AWS-style naming. 
> Proposal:
> * introduce a property (ec2_naming_scheme) to switch naming schemes.
> * default to the current/legacy naming scheme
> * add support for a new scheme ("standard") which is consistent with AWS 
> conventions
> ** data centers will be the region name, including the number
> ** racks will be the availability zone name, including the region name
> Examples:
> * *legacy*: datacenter is the part of the availability zone name preceding 
> the last "\-" when the zone ends in \-1, and includes the number if not \-1. 
> Rack is the portion of the availability zone name following the last "\-".
> ** us-west-1a => dc: us-west, rack: 1a
> ** us-west-2b => dc: us-west-2, rack: 2b
> * *standard*: datacenter is the part of the availability zone name preceding 
> the zone letter. Rack is the entire availability zone name.
> ** us-west-1a => dc: us-west-1, rack: us-west-1a
> ** us-west-2b => dc: us-west-2, rack: us-west-2b






[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-03-30 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420981#comment-16420981
 ] 

Blake Eggleston commented on CASSANDRA-14346:
-

I think the problems that exist in C* with regard to understanding the state of 
repairs and streams, and the inability to cancel them without restarting nodes, 
are orthogonal to the question of the best approach to coordinate them. 

As far as I’m aware, it’s not currently possible to determine whether a repair 
is simply taking a long time, finished but its notification was lost, or is 
stuck somewhere. That’s really a limitation in the design of how Cassandra runs 
individual streams and repair sessions, one that should be solved regardless, 
and not really an argument in favor of one approach or the other.

The auth concern makes sense, but unless this completely removes the need for 
any sort of sidecar process, you’ll still have to deal with it.

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Major
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Resolved] (CASSANDRA-14357) A Simple List of New Major Features Desired for Version 4.0

2018-03-30 Thread Jason Brown (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown resolved CASSANDRA-14357.
-
Resolution: Invalid

This is better handled on the dev@ mailing list, not JIRA.

> A Simple List of New Major Features Desired for Version 4.0
> ---
>
> Key: CASSANDRA-14357
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14357
> Project: Cassandra
>  Issue Type: Wish
>Reporter: Kenneth Brotman
>Priority: Major
>
> Just list any desired new major features for 4.0 that you want added.  I will 
> maintain a compiled list somewhere on this Jira as well.  Don't worry about 
> any steps beyond this.  Don't make any judgements about or make any comments 
> at all about what others add. 
> No judgments at this point.  This is a list of everyone's suggestions.  Add 
> your suggestions for new major features you desire be added for version 4.0 
> only.






[jira] [Commented] (CASSANDRA-12151) Audit logging for database activity

2018-03-30 Thread Dinesh Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420963#comment-16420963
 ] 

Dinesh Joshi commented on CASSANDRA-12151:
--

[~spo...@gmail.com] The {{BinLog}} does what I suggested: it deletes old files 
once you exceed the allocated storage quota. From my reading, it seems that it 
segments logs by time (hourly by default).
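
For readers unfamiliar with that retention model, here is a minimal sketch of 
size-capped, time-segmented logging (illustrative only, not the {{BinLog}} 
code): roll to a new segment every hour and drop the oldest segments once the 
directory exceeds its quota.

{code}
import java.io.File;
import java.util.Arrays;
import java.util.Comparator;

// Minimal sketch of size-capped, time-segmented retention: roll to a new segment
// every hour and delete the oldest segments once the directory exceeds its quota.
// Illustrative only; not the actual BinLog implementation.
public class SegmentedLogSketch
{
    private final File dir;
    private final long quotaBytes;

    SegmentedLogSketch(File dir, long quotaBytes)
    {
        this.dir = dir;
        this.quotaBytes = quotaBytes;
    }

    File currentSegment()
    {
        long hour = System.currentTimeMillis() / 3_600_000L;   // hourly segments
        return new File(dir, "audit-" + hour + ".log");
    }

    void enforceQuota()
    {
        File[] segments = dir.listFiles((d, name) -> name.endsWith(".log"));
        if (segments == null)
            return;                                             // nothing written yet
        Arrays.sort(segments, Comparator.comparingLong(File::lastModified));
        long total = Arrays.stream(segments).mapToLong(File::length).sum();
        for (File oldest : segments)
        {
            if (total <= quotaBytes)
                break;
            total -= oldest.length();
            oldest.delete();                                    // drop the oldest segment first
        }
    }

    public static void main(String[] args)
    {
        SegmentedLogSketch log = new SegmentedLogSketch(new File("audit"), 16L << 20); // 16 MB quota
        System.out.println("current segment: " + log.currentSegment());
        log.enforceQuota();
    }
}
{code}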

> Audit logging for database activity
> ---
>
> Key: CASSANDRA-12151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12151
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: stefan setyadi
>Assignee: Vinay Chella
>Priority: Major
> Fix For: 4.x
>
> Attachments: 12151.txt, CASSANDRA_12151-benchmark.html, 
> DesignProposal_AuditingFeature_ApacheCassandra_v1.docx
>
>
> We would like a way to enable Cassandra to log database activity being done 
> on our server.
> It should show username, remote address, timestamp, action type, keyspace, 
> column family, and the query statement.
> It should also be able to log connection attempts and changes to users/roles.
> I was thinking of making a new keyspace and inserting an entry for every 
> activity that occurs.
> Then it would be possible to query for specific activity or for queries 
> targeting a specific keyspace and column family.






[jira] [Commented] (CASSANDRA-14116) Refactor repair

2018-03-30 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420961#comment-16420961
 ] 

Blake Eggleston commented on CASSANDRA-14116:
-

bq. The {{Predicate}} in the original includes the additional clauses on the if 
statement

{{SnapshotTask}} isn't run in incremental repairs (see {{RepairJob#run}}), so 
that line would always evaluate to true.

bq. Is this a decent time to switch from the guava Predicate and 
ListenableFuture types to the JDK equivalents?

I'm not personally concerned with it, and I think it would be better as its 
own ticket if/when we decide to do it.

> Refactor repair
> ---
>
> Key: CASSANDRA-14116
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14116
> Project: Cassandra
>  Issue Type: Sub-task
>  Components: Repair
>Reporter: Dikang Gu
>Assignee: Blake Eggleston
>Priority: Major
> Fix For: 4.0
>
>
> As part of the pluggable storage engine effort, we'd like to modularize the 
> repair related code, make it to be independent from existing storage engine 
> implementation details.
> For now, refer to 
> https://docs.google.com/document/d/1suZlvhzgB6NIyBNpM9nxoHxz_Ri7qAm-UEO8v8AIFsc
>  for high level designs.






[jira] [Created] (CASSANDRA-14357) A Simple List of New Major Features Desired for Version 4.0

2018-03-30 Thread Kenneth Brotman (JIRA)
Kenneth Brotman created CASSANDRA-14357:
---

 Summary: A Simple List of New Major Features Desired for Version 
4.0
 Key: CASSANDRA-14357
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14357
 Project: Cassandra
  Issue Type: Wish
Reporter: Kenneth Brotman


Just list any desired new major features for 4.0 that you want added.  I will 
maintain a compiled list somewhere on this Jira as well.  Don't worry about any 
steps beyond this.  Don't make any judgements about or make any comments at all 
about what others add. 

No judgments at this point.  This is a list of everyone's suggestions.  Add 
your suggestions for new major features you desire be added for version 4.0 
only.






[jira] [Commented] (CASSANDRA-7839) Support standard EC2 naming conventions in Ec2Snitch

2018-03-30 Thread Jason Brown (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-7839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420957#comment-16420957
 ] 

Jason Brown commented on CASSANDRA-7839:


[~jolynch]: made a bunch of small changes based on your first review. Please 
take a look at the next commit on the same branch (sha 
{{2fa6b1a37eba67ba20be85de5f40e68d5e875e4c}})

> Support standard EC2 naming conventions in Ec2Snitch
> 
>
> Key: CASSANDRA-7839
> URL: https://issues.apache.org/jira/browse/CASSANDRA-7839
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Gregory Ramsperger
>Assignee: Jason Brown
>Priority: Major
>  Labels: docs-impacting
> Attachments: CASSANDRA-7839-aws-naming-conventions.patch
>
>
> The EC2 snitches use datacenter and rack naming conventions inconsistent with 
> those presented in Amazon EC2 APIs as region and availability zone. A 
> discussion of this is found in CASSANDRA-4026. This has not been changed for 
> valid backwards compatibility reasons. Using SnitchProperties, it is possible 
> to switch between the legacy naming and the full, AWS-style naming. 
> Proposal:
> * introduce a property (ec2_naming_scheme) to switch naming schemes.
> * default to the current/legacy naming scheme
> * add support for a new scheme ("standard") which is consistent with AWS 
> conventions
> ** data centers will be the region name, including the number
> ** racks will be the availability zone name, including the region name
> Examples:
> * *legacy*: datacenter is the part of the availability zone name preceding 
> the last "\-" when the zone ends in \-1, and includes the number if not \-1. 
> Rack is the portion of the availability zone name following the last "\-".
> ** us-west-1a => dc: us-west, rack: 1a
> ** us-west-2b => dc: us-west-2, rack: 2b
> * *standard*: datacenter is the part of the availability zone name preceding 
> the zone letter. Rack is the entire availability zone name.
> ** us-west-1a => dc: us-west-1, rack: us-west-1a
> ** us-west-2b => dc: us-west-2, rack: us-west-2b






[jira] [Commented] (CASSANDRA-14349) Untracked CDC segment files are not deleted after replay

2018-03-30 Thread Jay Zhuang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420955#comment-16420955
 ] 

Jay Zhuang commented on CASSANDRA-14349:


Hi [~shichao.an], nice find. It would be great to have a dtest for that: just 
restart the node a few times and check whether there is any orphaned commit log 
in the {{cdc_raw}} directory. Any non-active commit log that doesn't have an 
idx file should be considered orphaned.

cc. [~JoshuaMcKenzie]
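
The orphan check itself is easy to sketch (standalone illustration, not the 
proposed dtest; it assumes the index file shares the segment's base name with a 
{{_cdc.idx}} suffix):

{code}
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Standalone illustration of the orphan check, not the proposed dtest: flag commit
// log segments in cdc_raw that have no companion index file. Assumes the index
// file is named <segment base name>_cdc.idx.
public class OrphanedCdcSegmentCheck
{
    static List<File> findOrphans(File cdcRaw)
    {
        List<File> orphans = new ArrayList<>();
        File[] segments = cdcRaw.listFiles((dir, name) -> name.endsWith(".log"));
        if (segments == null)
            return orphans;                 // directory missing or unreadable
        for (File segment : segments)
        {
            String name = segment.getName();
            String base = name.substring(0, name.length() - ".log".length());
            if (!new File(cdcRaw, base + "_cdc.idx").exists())
                orphans.add(segment);       // no index file: nothing will ever clean this up
        }
        return orphans;
    }

    public static void main(String[] args)
    {
        File cdcRaw = new File(args.length > 0 ? args[0] : "cdc_raw");
        findOrphans(cdcRaw).forEach(f -> System.out.println("orphaned: " + f));
    }
}
{code}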

> Untracked CDC segment files are not deleted after replay
> 
>
> Key: CASSANDRA-14349
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14349
> Project: Cassandra
>  Issue Type: Bug
>  Components: Local Write-Read Paths
>Reporter: Shichao An
>Assignee: Shichao An
>Priority: Minor
>
> When CDC is enabled, a hard link to each commit log file will be created in 
> the cdc_raw directory. Commit logs with CDC mutations will also have cdc 
> index files created along with the hard links; these are intended for the 
> consumer to handle and clean up.
> However, if we don't produce any CDC traffic, those hard links in cdc_raw 
> will never be cleaned up (because hard links will still be created, without 
> the index files), whereas the real original commit logs are correctly deleted 
> after replay during process startup. This results in many untracked hard 
> links in cdc_raw if we restart the Cassandra process many times. I am able to 
> reproduce it with CCM on the trunk version, which has the CASSANDRA-12148 
> changes.
> This seems to be a bug in handleReplayedSegment of the commit log segment 
> manager, which neglects to take care of CDC commit logs. I will attach a 
> patch here.






[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-03-30 Thread Joseph Lynch (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420838#comment-16420838
 ] 

Joseph Lynch commented on CASSANDRA-14346:
--

[~adejanovski] thanks for the feedback; having more eyes on this is 
appreciated. I think my general response to you is "yes, this is a very hard 
problem, and this solution does not entirely solve it, but let's take it 
incrementally". I'm hoping to get a basic version into 4.0, *marked explicitly 
as experimental* and disabled by default. Users can then start using it if they 
like, and we can iterate, fixing bugs and adding features as we go. We will 
absolutely have global kill switches to turn it off. I'll reply to your 
specific points in the doc and, if I get a chance, copy them back here.

[~bdeggleston]
{quote}
The problem I see with distributing control of cluster level operations like 
repair to the nodes themselves is that it’s more complicated to do correctly 
than it is with a separate management process. You have to deal with failure 
scenarios, internode coordinations, etc, etc. It seems like one of the benefits 
of having a sidecar project like reaper or priam is that you can dispense with 
a lot of the complexity that comes with designing around single points of 
failure, and simplify your management logic.
{quote}
I think some important context is that we just finished implementing this as a 
per-node sidecar (in Priam). Having done it with a sidecar, I really think 
external processes of any kind are, generally speaking, the wrong way to do it. 
The short version is "JMX is really bad". In particular, reasoning about stuck 
vs lost repairs (especially when JMX connections temporarily fail and you then 
lose notifications for all the repairs you are running) is extremely difficult. 
We probably have 2k loc just dealing with edge cases when the sidecar restarts 
but Cassandra does not (you have to wait for Cassandra to finish any existing 
repairs and guess which ones those were), when Cassandra restarts but the 
sidecar does not (you have to wait for Cassandra to come back healthy and 
possibly time it out), when a Cassandra repair thread gets stuck forever and 
never makes any progress, etc. Fundamentally a sidecar can't reach in and say 
"hey, you should be heartbeating constantly, and if you stop making progress I 
will kill you". You also have to manage configuration of repair through an 
additional table rather than table configs, and you have to credential the 
sidecar so it can speak to both JMX and CQL. The main benefit of a sidecar imo 
is that it can use a different Cassandra cluster to coordinate all cluster 
repairs, but I think if we did it right we might be able to have the 
in-Cassandra implementation do this as well. 
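
To make the "heartbeat or be killed" point concrete, an in-process watchdog is 
easy to sketch (hypothetical, simplified code, not the proposed implementation); 
this is exactly the kind of enforcement an external JMX-driven sidecar cannot do:

{code}
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch of an in-process watchdog: a repair task must heartbeat
// regularly; if it stops making progress, the coordinator cancels it.
public class RepairWatchdogSketch
{
    public static void main(String[] args) throws Exception
    {
        ExecutorService repairs = Executors.newSingleThreadExecutor();
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        AtomicLong lastHeartbeat = new AtomicLong(System.nanoTime());

        Future<?> repair = repairs.submit(() -> {
            for (int range = 0; range < 100; range++)
            {
                // ... repair one token range ...
                lastHeartbeat.set(System.nanoTime());        // progress heartbeat
                sleep(range == 5 ? 10_000 : 10);              // simulate getting stuck
            }
        });

        long timeoutNanos = TimeUnit.SECONDS.toNanos(2);
        watchdog.scheduleAtFixedRate(() -> {
            if (System.nanoTime() - lastHeartbeat.get() > timeoutNanos)
            {
                System.out.println("repair stuck, cancelling");
                repair.cancel(true);                          // interrupt the stuck task
            }
        }, 1, 1, TimeUnit.SECONDS);

        try { repair.get(); } catch (CancellationException e) { System.out.println("cancelled"); }
        repairs.shutdownNow();
        watchdog.shutdownNow();
    }

    private static void sleep(long millis)
    {
        try { Thread.sleep(millis); }
        catch (InterruptedException e) { Thread.currentThread().interrupt(); throw new RuntimeException(e); }
    }
}
{code}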

{quote}
Maybe a better solution here is to provide an official sidecar ops tool for 
cassandra? It’s not trivial suggestion, I know, but every cluster needs one. I 
also feel like there's some momentum building around the idea in the developer 
community. I think it would be worth it to talk about that, before going too 
far with this.
{quote}
I liked this idea when Sankalp and Dinesh proposed it to us last week, and I 
still like it a lot. I'll keep this in mind during the port so that if we do 
end up with a sidecar by 4.0 we can easily switch to it if we decide that's 
better. I personally think it would only be better if we moved all the internal 
repair state out of Cassandra into the sidecar (similar to moving all the 
internal compaction state out into the sidecar).
 

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Major
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would 

[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-03-30 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420800#comment-16420800
 ] 

Blake Eggleston commented on CASSANDRA-14346:
-

First, I’ve only skimmed the design doc and Alexander’s comments. I’m also not 
really familiar with how Priam or Reaper are implemented. So if I’m missing 
something obvious, let me know.

The problem I see with distributing control of cluster level operations like 
repair to the nodes themselves is that it’s more complicated to do correctly 
than it is with a separate management process. You have to deal with failure 
scenarios, internode coordinations, etc, etc. It seems like one of the benefits 
of having a sidecar project like reaper or priam is that you can dispense with 
a lot of the complexity that comes with designing around single points of 
failure, and simplify your management logic. 

Maybe a better solution here is to provide an official sidecar ops tool for 
Cassandra? It’s not a trivial suggestion, I know, but every cluster needs one. I 
also feel like there's some momentum building around the idea in the developer 
community. I think it would be worth it to talk about that, before going too 
far with this.

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Major
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-03-30 Thread Alexander Dejanovski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420677#comment-16420677
 ] 

Alexander Dejanovski commented on CASSANDRA-14346:
--

I was told that my comments sounded like I'm strongly opposed to this ticket, 
which is absolutely not the case, so I'll sum up my thoughts here: 
 * Coordinated repair is a must-have and should be the first thing that's 
implemented
 * Scheduling and (especially) auto scheduling will require more thought and 
discussion IMHO, at least as long as incremental repair has not proved to be 
bulletproof in 4.0 (we still have to see it running in production for a while). 
Once we can repair any table/keyspace in just a few minutes things will be very 
different.
 * Based on what the Apache Cassandra project went through with new features 
lately, I wouldn't rush into enabling all of this by default and would take a 
more cautious approach for 4.0.

On a side note, because one might think I'm biased in this conversation (ahem, 
monologue so far), removing boilerplate from Reaper by having features like 
computing the splits or coordinating the repair jobs handled by Cassandra 
internally would actually make me VERY happy.

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Major
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.






[jira] [Commented] (CASSANDRA-13665) nodetool clientlist

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420673#comment-16420673
 ] 

ASF GitHub Bot commented on CASSANDRA-13665:


Github user clohfink closed the pull request at:

https://github.com/apache/cassandra/pull/190


> nodetool clientlist
> ---
>
> Key: CASSANDRA-13665
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13665
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jon Haddad
>Assignee: Chris Lohfink
>Priority: Major
>
> There should exist a nodetool command that lists each client connection. 
> Ideally it would display the following:
>  * host
>  * protocol version
>  * user logged in as
>  * current keyspace
>  * total queries executed
>  * ssl connections






[jira] [Commented] (CASSANDRA-14202) Assertion error on sstable open during startup should invoke disk failure policy

2018-03-30 Thread Blake Eggleston (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420667#comment-16420667
 ] 

Blake Eggleston commented on CASSANDRA-14202:
-

+1

> Assertion error on sstable open during startup should invoke disk failure 
> policy
> 
>
> Key: CASSANDRA-14202
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14202
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We should catch all exceptions when opening sstables on startup and invoke 
> the disk failure policy






[jira] [Updated] (CASSANDRA-14202) Assertion error on sstable open during startup should invoke disk failure policy

2018-03-30 Thread Blake Eggleston (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Blake Eggleston updated CASSANDRA-14202:

Status: Ready to Commit  (was: Patch Available)

> Assertion error on sstable open during startup should invoke disk failure 
> policy
> 
>
> Key: CASSANDRA-14202
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14202
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: Marcus Eriksson
>Assignee: Marcus Eriksson
>Priority: Major
> Fix For: 3.0.x, 3.11.x, 4.x
>
>
> We should catch all exceptions when opening sstables on startup and invoke 
> the disk failure policy






[jira] [Comment Edited] (CASSANDRA-6719) redesign loadnewsstables

2018-03-30 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420655#comment-16420655
 ] 

Jordan West edited comment on CASSANDRA-6719 at 3/30/18 4:11 PM:
-

[~krummas], patch looks great. Comments below (mostly minor):
 * Since {{nodetool refresh}} is being deprecated instead of modified, it 
would be nice to maintain as much of the same functionality as before. 
Additional options could be introduced to address the two areas where I noticed 
changes:
 ** FSUtils.handleCorruptSSTable/handleFSError are no longer called
 ** Row cache invalidation was not previously performed — this is a good thing 
regardless, so maybe skip an option for this one. 

 * If using {{nodetool refresh}} with JBOD, the work of counting keys per 
boundary is done just to be thrown away.
 * Minor/naming nit: consider renaming {{CFS#loadSSTables}}’s dirPath -> 
srcPath and {{findBestDiskAndInvalidCache}}’s path -> srcPath
 * Minor/usability nit: I couldn’t find many cases where 
{{@Option(required=true)}} is used. WDYT about moving the path to a positional 
argument, since it's required and this command does not take a variable number 
of positional args?
 * Minor/usability nit: Instead of noVerify=true,noVerifyTokens=false being an 
invalid state, make noVerify=true imply noVerifyTokens=true. 
 * The JavaDoc for {{CFS.loadNewSSTables}} should be updated to point to the 
new {{StorageService.loadSSTables}}. 
 * The comment on CFS#L861 is useful but out of place. 
 * Minor/naming nit: The naming of the “allKeys” variable in 
{{ImportTest#testImportInvalidateCache}} is misleading. 
 * Minor nits in {{ImportTest#testBestDisk}}:
 ** Instead of hardcoding token values, what about using e.g. 
{{t.compareTo(mock.getDiskBoundaries().positions.get(0).getToken()) <= 0}}?
 ** Are you intentionally leaving the Random seed hardcoded?


was (Author: jrwest):
[~krummas], patch looks great. comments below (mostly minor):
 * Since \{{nodetool refresh }}is being deprecated, instead of modified, it 
would be nice to maintain as much of the same functionality as before. 
Additional options could be introduced to address the two areas I noticed 
changes:
 ** FSUtils.handleCorruptSSTable/handleFSError are no longer called
 ** Row cache invalidation was not previously performed — this is a good thing 
regardless, so maybe skip an option for this one. 

 * If using {{nodetool refresh}} with JDOB, the counting keys per boundary work 
is done just to throw it away.
 * Minor/naming nit: consider renaming {{CFS#loadSSTables}}’s dirPath -> 
srcPath and {{findBestDiskAndInvalidCache}}’s path -> srcPath
 * Minor/usability nit: I couldn’t find many cases where 
{{@Option(required=true)}} is used. WDYT about moving the path to a positional 
argument since its required and this command does not take a variable number of 
positional args?
 * Minor/usability nit: Instead of noVerify=true,noVerifyTokens=false being an 
invalid state, make noVerify=true imply noVerifyTokens=true. 
 * The JavaDoc for {{CFS.loadNewSSTables}} should be updated to point to the 
new {{StorageService.loadSSTables}}. 
 * The comment on CFS#L861 is useful but out of place. 
 * Minor/naming nit: The naming of the “allKeys” variable in 
{{ImportTest#testImportInvalidateCache}} is misleading. 
 * Minor nits in {{ImportTest#testBestDisk}}:
 ** Instead of hardcoding token values what about using e.g. 
{{t.compareTo(mock.getDiskBoundaries().positions.get(0).getToken()) <= 0}}?
 ** Are you intentionally leaving the Random seed hardcoded?

> redesign loadnewsstables
> 
>
> Key: CASSANDRA-6719
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6719
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Marcus Eriksson
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: 6719.patch
>
>
> CFSMBean.loadNewSSTables scans data directories for new sstables dropped 
> there by an external agent.  This is dangerous because of possible filename 
> conflicts with existing or newly generated sstables.
> Instead, we should support leaving the new sstables in a separate directory 
> (specified by a parameter, or configured as a new location in yaml) and take 
> care of renaming as necessary automagically.






[jira] [Comment Edited] (CASSANDRA-6719) redesign loadnewsstables

2018-03-30 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420655#comment-16420655
 ] 

Jordan West edited comment on CASSANDRA-6719 at 3/30/18 4:11 PM:
-

[~krummas], patch looks great. comments below (mostly minor):
 * Since \{{nodetool refresh }}is being deprecated, instead of modified, it 
would be nice to maintain as much of the same functionality as before. 
Additional options could be introduced to address the two areas I noticed 
changes:
 ** FSUtils.handleCorruptSSTable/handleFSError are no longer called
 ** Row cache invalidation was not previously performed — this is a good thing 
regardless, so maybe skip an option for this one. 

 * If using {{nodetool refresh}} with JDOB, the counting keys per boundary work 
is done just to throw it away.
 * Minor/naming nit: consider renaming {{CFS#loadSSTables}}’s dirPath -> 
srcPath and {{findBestDiskAndInvalidCache}}’s path -> srcPath
 * Minor/usability nit: I couldn’t find many cases where 
{{@Option(required=true)}} is used. WDYT about moving the path to a positional 
argument since its required and this command does not take a variable number of 
positional args?
 * Minor/usability nit: Instead of noVerify=true,noVerifyTokens=false being an 
invalid state, make noVerify=true imply noVerifyTokens=true. 
 * The JavaDoc for {{CFS.loadNewSSTables}} should be updated to point to the 
new {{StorageService.loadSSTables}}. 
 * The comment on CFS#L861 is useful but out of place. 
 * Minor/naming nit: The naming of the “allKeys” variable in 
{{ImportTest#testImportInvalidateCache}} is misleading. 
 * Minor nits in {{ImportTest#testBestDisk}}:
 ** Instead of hardcoding token values what about using e.g. 
{{t.compareTo(mock.getDiskBoundaries().positions.get(0).getToken()) <= 0}}?
 ** Are you intentionally leaving the Random seed hardcoded?


was (Author: jrwest):
[~krummas], patch looks great. comments below (mostly minor):
 * Since {{nodetool refresh }}is being deprecated, instead of modified, it 
would be nice to maintain as much of the same functionality as before. 
Additional options could be introduced to address the two areas I noticed 
changes:
 * FSUtils.handleCorruptSSTable/handleFSError are no longer called
 * Row cache invalidation was not previously performed — this is a good thing 
regardless, so maybe skip an option for this one. 


 * If using {{nodetool refresh}} with JDOB, the counting keys per boundary work 
is done just to throw it away.
 * Minor/naming nit: consider renaming {{CFS#loadSSTables}}’s dirPath -> 
srcPath and {{findBestDiskAndInvalidCache}}’s path -> srcPath
 * Minor/usability nit: I couldn’t find many cases where 
{{@Option(required=true)}} is used. WDYT about moving the path to a positional 
argument since its required and this command does not take a variable number of 
positional args?
 * Minor/usability nit: Instead of noVerify=true,noVerifyTokens=false being an 
invalid state, make noVerify=true imply noVerifyTokens=true. 
 * The JavaDoc for {{CFS.loadNewSSTables}} should be updated to point to the 
new {{StorageService.loadSSTables}}. 
 * The comment on CFS#L861 is useful but out of place. 
 * Minor/naming nit: The naming of the “allKeys” variable in 
{{ImportTest#testImportInvalidateCache}} is misleading. 
 * Minor nits in {{ImportTest#testBestDisk}}:
 * Instead of hardcoding token values what about using e.g. 
{{t.compareTo(mock.getDiskBoundaries().positions.get(0).getToken()) <= 0}}?
 * Are you intentionally leaving the Random seed hardcoded?

> redesign loadnewsstables
> 
>
> Key: CASSANDRA-6719
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6719
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Marcus Eriksson
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: 6719.patch
>
>
> CFSMBean.loadNewSSTables scans data directories for new sstables dropped 
> there by an external agent.  This is dangerous because of possible filename 
> conflicts with existing or newly generated sstables.
> Instead, we should support leaving the new sstables in a separate directory 
> (specified by a parameter, or configured as a new location in yaml) and take 
> care of renaming as necessary automagically.






[jira] [Commented] (CASSANDRA-6719) redesign loadnewsstables

2018-03-30 Thread Jordan West (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-6719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420655#comment-16420655
 ] 

Jordan West commented on CASSANDRA-6719:


[~krummas], patch looks great. comments below (mostly minor):
 * Since {{nodetool refresh }}is being deprecated, instead of modified, it 
would be nice to maintain as much of the same functionality as before. 
Additional options could be introduced to address the two areas I noticed 
changes:
 * FSUtils.handleCorruptSSTable/handleFSError are no longer called
 * Row cache invalidation was not previously performed — this is a good thing 
regardless, so maybe skip an option for this one. 


 * If using {{nodetool refresh}} with JDOB, the counting keys per boundary work 
is done just to throw it away.
 * Minor/naming nit: consider renaming {{CFS#loadSSTables}}’s dirPath -> 
srcPath and {{findBestDiskAndInvalidCache}}’s path -> srcPath
 * Minor/usability nit: I couldn’t find many cases where 
{{@Option(required=true)}} is used. WDYT about moving the path to a positional 
argument since its required and this command does not take a variable number of 
positional args?
 * Minor/usability nit: Instead of noVerify=true,noVerifyTokens=false being an 
invalid state, make noVerify=true imply noVerifyTokens=true. 
 * The JavaDoc for {{CFS.loadNewSSTables}} should be updated to point to the 
new {{StorageService.loadSSTables}}. 
 * The comment on CFS#L861 is useful but out of place. 
 * Minor/naming nit: The naming of the “allKeys” variable in 
{{ImportTest#testImportInvalidateCache}} is misleading. 
 * Minor nits in {{ImportTest#testBestDisk}}:
 * Instead of hardcoding token values what about using e.g. 
{{t.compareTo(mock.getDiskBoundaries().positions.get(0).getToken()) <= 0}}?
 * Are you intentionally leaving the Random seed hardcoded?

> redesign loadnewsstables
> 
>
> Key: CASSANDRA-6719
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6719
> Project: Cassandra
>  Issue Type: New Feature
>  Components: Tools
>Reporter: Jonathan Ellis
>Assignee: Marcus Eriksson
>Priority: Minor
>  Labels: lhf
> Fix For: 4.x
>
> Attachments: 6719.patch
>
>
> CFSMBean.loadNewSSTables scans data directories for new sstables dropped 
> there by an external agent.  This is dangerous because of possible filename 
> conflicts with existing or newly generated sstables.
> Instead, we should support leaving the new sstables in a separate directory 
> (specified by a parameter, or configured as a new location in yaml) and take 
> care of renaming as necessary automagically.






[jira] [Updated] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor

2018-03-30 Thread Michael Burman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Burman updated CASSANDRA-14356:
---
Fix Version/s: 4.0
Reproduced In: 4.0
   Status: Patch Available  (was: Open)

> LWTs keep failing in trunk after immutable refactor
> ---
>
> Key: CASSANDRA-14356
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14356
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), 
> Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 
> 14:01)
>Reporter: Michael Burman
>Priority: Major
> Fix For: 4.0
>
> Attachments: CASSANDRA-14356.diff
>
>
> In the PaxosState, the original assert check is in the form of:
> assert promised.update.metadata() == accepted.update.metadata() && 
> accepted.update.metadata() == mostRecentCommit.update.metadata();
> However, after the change to make TableMetadata immutable this no longer 
> holds, as these instances are not necessarily (or ever) the same. This causes 
> the LWTs to fail although they're still correctly targeting the same table.
> From IRC:
>  It's a bug alright. Though really, the assertion should be on the 
> metadata ids, because TableMetadata#equals does more than what we want.
>  That is, replacing it with .equals() is not ok. That would throw 
> on any change to a table's metadata, while the spirit of the assertion was to 
> sanity check that both updates were on the same table.
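
A minimal illustration of the direction suggested on IRC (comparing stable 
table ids rather than object identity; the class and field names below are 
hypothetical, not the PaxosState code):

{code}
import java.util.Objects;
import java.util.UUID;

// Hypothetical sketch: compare the table ids carried by the three commits instead
// of relying on reference equality of (now immutable, re-created) metadata objects.
// Run with -ea so the assert is enabled.
public class PaxosMetadataCheckSketch
{
    static final class TableRef
    {
        final UUID id;          // stable per table, survives metadata rebuilds
        TableRef(UUID id) { this.id = id; }
    }

    static void sanityCheck(TableRef promised, TableRef accepted, TableRef mostRecent)
    {
        // identity (==) breaks once metadata objects are re-created on schema reload,
        // and a full equals() would be too strict (any ALTER would trip it);
        // comparing the stable table id checks exactly what the assert intends.
        assert Objects.equals(promised.id, accepted.id) && Objects.equals(accepted.id, mostRecent.id)
            : "Paxos commits refer to different tables";
    }

    public static void main(String[] args)
    {
        UUID id = UUID.randomUUID();
        // distinct instances representing the same table, e.g. before and after a schema reload
        sanityCheck(new TableRef(id), new TableRef(id), new TableRef(id));
        System.out.println("same table id accepted");
    }
}
{code}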






[jira] [Updated] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor

2018-03-30 Thread Michael Burman (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Burman updated CASSANDRA-14356:
---
Attachment: CASSANDRA-14356.diff

> LWTs keep failing in trunk after immutable refactor
> ---
>
> Key: CASSANDRA-14356
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14356
> Project: Cassandra
>  Issue Type: Bug
>  Components: Core
> Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), 
> Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 
> 14:01)
>Reporter: Michael Burman
>Priority: Major
> Attachments: CASSANDRA-14356.diff
>
>
> In the PaxosState, the original assert check is in the form of:
> assert promised.update.metadata() == accepted.update.metadata() && 
> accepted.update.metadata() == mostRecentCommit.update.metadata();
> However, after the change to make TableMetadata immutable this no longer 
> holds, as these instances are not necessarily (or ever) the same. This causes 
> the LWTs to fail although they're still correctly targeting the same table.
> From IRC:
>  It's a bug alright. Though really, the assertion should be on the 
> metadata ids, because TableMetadata#equals does more than what we want.
>  That is, replacing it with .equals() is not ok. That would throw 
> on any change to a table's metadata, while the spirit of the assertion was to 
> sanity check that both updates were on the same table.






[jira] [Updated] (CASSANDRA-14318) Fix query pager DEBUG log leak causing hit in paged reads throughput

2018-03-30 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-14318:

Summary: Fix query pager DEBUG log leak causing hit in paged reads 
throughput  (was: Debug logging can create massive performance issues)

> Fix query pager DEBUG log leak causing hit in paged reads throughput
> 
>
> Key: CASSANDRA-14318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14318
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Major
>  Labels: lhf, performance
> Fix For: 2.2.13
>
> Attachments: cassandra-2.2-debug.yaml, debuglogging.png, flame22 
> nodebug sjk svg.png, flame22-nodebug-sjk.svg, flame22-sjk.svg, 
> flame_graph_snapshot.png
>
>
> Debug logging can in many cases (especially very low latency ones) introduce 
> significant overhead on the read path in 2.2, as we've seen when upgrading 
> clusters from 2.0 to 2.2.
> The performance impact was especially noticeable in the client-side metrics, 
> where p99 could go up to 10 times higher, while the ClientRequest metrics 
> recorded by Cassandra didn't show any overhead.
> Below are latencies recorded on the client side, first with debug logging on 
> and then without it:
> !debuglogging.png!  
> We generated a flame graph before turning off debug logging that shows the 
> read call stack is dominated by debug logging: 
> !flame_graph_snapshot.png!
> I've attached the original flame graph for exploration.
> Once disabled, the new flame graph shows that the read call stack gets 
> extremely thin, which is further confirmed by client-recorded metrics: 
> !flame22 nodebug sjk svg.png!
> The query pager code has been reworked since 3.0 and it looks like the 
> log.debug() calls are gone there, but for 2.2 users, and to prevent such 
> issues from appearing with default settings, I really think debug logging 
> should be disabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[01/10] cassandra git commit: Downgrade logger.debug calls to logger.trace in the read path

2018-03-30 Thread paulo
Repository: cassandra
Updated Branches:
  refs/heads/cassandra-2.2 53b6116d5 -> ac77e5e77
  refs/heads/cassandra-3.0 68079e4b2 -> 2d2b1a71f
  refs/heads/cassandra-3.11 18278e422 -> 6f30677b2
  refs/heads/trunk c22ee2bd4 -> b08b4dcc7


Downgrade logger.debug calls to logger.trace in the read path

Patch by Alexander Dejanovski; Reviewed by Paulo Motta for CASSANDRA-14318


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ac77e5e7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ac77e5e7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ac77e5e7

Branch: refs/heads/cassandra-2.2
Commit: ac77e5e7742548f7c7c25da3923841f59d4b2713
Parents: 53b6116
Author: Alexander Dejanovski 
Authored: Tue Mar 27 12:05:27 2018 +0200
Committer: Paulo Motta 
Committed: Fri Mar 30 12:10:05 2018 -0300

--
 CHANGES.txt | 1 +
 .../apache/cassandra/service/pager/AbstractQueryPager.java  | 9 +
 .../org/apache/cassandra/service/pager/SliceQueryPager.java | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac77e5e7/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 2e45b85..4828517 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.13
+ * Fix query pager DEBUG log leak causing hit in paged reads throughput 
(CASSANDRA-14318)
  * Backport circleci yaml (CASSANDRA-14240)
 Merged from 2.1:
  * CVE-2017-5929 Security vulnerability in Logback warning in NEWS.txt 
(CASSANDRA-14183)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac77e5e7/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
--
diff --git 
a/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java 
b/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
index 02623eb..46d4a3e 100644
--- a/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
+++ b/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
@@ -86,13 +86,13 @@ abstract class AbstractQueryPager implements QueryPager
 
 if (rows.isEmpty())
 {
-logger.debug("Got empty set of rows, considering pager exhausted");
+logger.trace("Got empty set of rows, considering pager exhausted");
 exhausted = true;
 return Collections.emptyList();
 }
 
 int liveCount = getPageLiveCount(rows);
-logger.debug("Fetched {} live rows", liveCount);
+logger.trace("Fetched {} live rows", liveCount);
 
 // Because SP.getRangeSlice doesn't trim the result (see SP.trim()), 
liveCount may be greater than what asked
 // (currentPageSize). This would throw off the paging logic so we trim 
the excess. It's not extremely efficient
@@ -109,7 +109,8 @@ abstract class AbstractQueryPager implements QueryPager
 // we still need to return the current page)
 if (liveCount < currentPageSize)
 {
-logger.debug("Got result ({}) smaller than page size ({}), 
considering pager exhausted", liveCount, currentPageSize);
+logger.trace("Got result ({}) smaller than page size ({}), 
considering pager exhausted", liveCount,
+currentPageSize);
 exhausted = true;
 }
 
@@ -130,7 +131,7 @@ abstract class AbstractQueryPager implements QueryPager
 remaining++;
 }
 
-logger.debug("Remaining rows to page: {}", remaining);
+logger.trace("Remaining rows to page: {}", remaining);
 
 if (!isExhausted())
 shouldFetchExtraRow = recordLast(rows.get(rows.size() - 1));

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac77e5e7/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java
--
diff --git a/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java 
b/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java
index 1a2fc6c..3420831 100644
--- a/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java
+++ b/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java
@@ -89,7 +89,7 @@ public class SliceQueryPager extends AbstractQueryPager 
implements SinglePartiti
 if (lastReturned != null)
 filter = filter.withUpdatedStart(lastReturned, cfm);
 
-logger.debug("Querying next page of slice query; new filter: {}", 
filter);
+logger.trace("Querying next page of slice query; new filter: {}", 
filter);
 ReadCommand pageCmd = command.withUpdatedFilter(filter);
 return 

[jira] [Updated] (CASSANDRA-14318) Debug logging can create massive performance issues

2018-03-30 Thread Paulo Motta (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-14318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Paulo Motta updated CASSANDRA-14318:

   Resolution: Fixed
 Reviewer: Paulo Motta
Fix Version/s: (was: 3.11.x)
   (was: 4.x)
   (was: 3.0.x)
   (was: 2.2.x)
   2.2.13
   Status: Resolved  (was: Patch Available)

{quote}I think anyone performing benchmarks for Cassandra changes should be 
aware that the predefined mode isn't relevant and that a user defined test 
should be used (maybe we should create one that would be used as standard 
benchmark).
{quote}
Good find! Can you check if this is the case in trunk, and if so maybe open a 
lhf ticket to change that?
{quote}For the record, the same tests on 3.11.2 didn't show any notable 
performance difference between debug on and off
{quote}
Nice to know we managed to handle all debug/verbose log leaks there. It will be 
easier to maintain this after CASSANDRA-14326.
{quote}here's the patch if you're willing to review/commit it, and the unit 
test results in CircleCI.
{quote}
Thanks for the patch, experiments and analysis! Even though 2.2 is in critical 
fixes only mode, 50% is a significant performance hit on throughput for this 
workload, and since the patch is pretty simple I don't see a reason not to 
commit it.

CI looks good. I added a CHANGES.txt note and committed as 
{{ac77e5e7742548f7c7c25da3923841f59d4b2713}} to the cassandra-2.2 branch.

> Debug logging can create massive performance issues
> ---
>
> Key: CASSANDRA-14318
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14318
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Alexander Dejanovski
>Assignee: Alexander Dejanovski
>Priority: Major
>  Labels: lhf, performance
> Fix For: 2.2.13
>
> Attachments: cassandra-2.2-debug.yaml, debuglogging.png, flame22 
> nodebug sjk svg.png, flame22-nodebug-sjk.svg, flame22-sjk.svg, 
> flame_graph_snapshot.png
>
>
> Debug logging can in many cases (especially very low latency ones) add 
> substantial overhead on the read path in 2.2, as we've seen when upgrading 
> clusters from 2.0 to 2.2.
> The performance impact was especially noticeable on the client side metrics, 
> where p99 could go up to 10 times higher, while ClientRequest metrics 
> recorded by Cassandra didn't show any overhead.
> Below are the latencies recorded on the client side, first with debug logging 
> on and then without it:
> !debuglogging.png!  
> We generated a flame graph before turning off debug logging that shows the 
> read call stack is dominated by debug logging:
> !flame_graph_snapshot.png!
> I've attached the original flame graph for exploration.
> Once disabled, the new flame graph shows that the read call stack gets 
> extremely thin, which is further confirmed by client recorded metrics:
> !flame22 nodebug sjk svg.png!
> The query pager code has been reworked since 3.0 and it looks like 
> log.debug() calls are gone there, but for 2.2 users, and to prevent such 
> issues from appearing with default settings, I really think debug logging 
> should be disabled by default.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[02/10] cassandra git commit: Downgrade logger.debug calls to logger.trace in the read path

2018-03-30 Thread paulo
Downgrade logger.debug calls to logger.trace in the read path

Patch by Alexander Dejanovski; Reviewed by Paulo Motta for CASSANDRA-14318


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ac77e5e7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ac77e5e7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ac77e5e7

Branch: refs/heads/cassandra-3.0
Commit: ac77e5e7742548f7c7c25da3923841f59d4b2713
Parents: 53b6116
Author: Alexander Dejanovski 
Authored: Tue Mar 27 12:05:27 2018 +0200
Committer: Paulo Motta 
Committed: Fri Mar 30 12:10:05 2018 -0300

--
 CHANGES.txt | 1 +
 .../apache/cassandra/service/pager/AbstractQueryPager.java  | 9 +
 .../org/apache/cassandra/service/pager/SliceQueryPager.java | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac77e5e7/CHANGES.txt
--
diff --git a/CHANGES.txt b/CHANGES.txt
index 2e45b85..4828517 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,5 @@
 2.2.13
+ * Fix query pager DEBUG log leak causing hit in paged reads throughput 
(CASSANDRA-14318)
  * Backport circleci yaml (CASSANDRA-14240)
 Merged from 2.1:
  * CVE-2017-5929 Security vulnerability in Logback warning in NEWS.txt 
(CASSANDRA-14183)

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac77e5e7/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
--
diff --git 
a/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java 
b/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
index 02623eb..46d4a3e 100644
--- a/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
+++ b/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
@@ -86,13 +86,13 @@ abstract class AbstractQueryPager implements QueryPager
 
 if (rows.isEmpty())
 {
-logger.debug("Got empty set of rows, considering pager exhausted");
+logger.trace("Got empty set of rows, considering pager exhausted");
 exhausted = true;
 return Collections.emptyList();
 }
 
 int liveCount = getPageLiveCount(rows);
-logger.debug("Fetched {} live rows", liveCount);
+logger.trace("Fetched {} live rows", liveCount);
 
 // Because SP.getRangeSlice doesn't trim the result (see SP.trim()), 
liveCount may be greater than what asked
 // (currentPageSize). This would throw off the paging logic so we trim 
the excess. It's not extremely efficient
@@ -109,7 +109,8 @@ abstract class AbstractQueryPager implements QueryPager
 // we still need to return the current page)
 if (liveCount < currentPageSize)
 {
-logger.debug("Got result ({}) smaller than page size ({}), 
considering pager exhausted", liveCount, currentPageSize);
+logger.trace("Got result ({}) smaller than page size ({}), 
considering pager exhausted", liveCount,
+currentPageSize);
 exhausted = true;
 }
 
@@ -130,7 +131,7 @@ abstract class AbstractQueryPager implements QueryPager
 remaining++;
 }
 
-logger.debug("Remaining rows to page: {}", remaining);
+logger.trace("Remaining rows to page: {}", remaining);
 
 if (!isExhausted())
 shouldFetchExtraRow = recordLast(rows.get(rows.size() - 1));

http://git-wip-us.apache.org/repos/asf/cassandra/blob/ac77e5e7/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java
--
diff --git a/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java 
b/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java
index 1a2fc6c..3420831 100644
--- a/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java
+++ b/src/java/org/apache/cassandra/service/pager/SliceQueryPager.java
@@ -89,7 +89,7 @@ public class SliceQueryPager extends AbstractQueryPager 
implements SinglePartiti
 if (lastReturned != null)
 filter = filter.withUpdatedStart(lastReturned, cfm);
 
-logger.debug("Querying next page of slice query; new filter: {}", 
filter);
+logger.trace("Querying next page of slice query; new filter: {}", 
filter);
 ReadCommand pageCmd = command.withUpdatedFilter(filter);
 return localQuery
  ? 
Collections.singletonList(pageCmd.getRow(Keyspace.open(command.ksName)))


-
To unsubscribe, e-mail: 

[08/10] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2018-03-30 Thread paulo
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6f30677b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6f30677b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6f30677b

Branch: refs/heads/cassandra-3.11
Commit: 6f30677b28dcbf82bcd0a291f3294ddf87dafaac
Parents: 18278e4 2d2b1a7
Author: Paulo Motta 
Authored: Fri Mar 30 12:15:37 2018 -0300
Committer: Paulo Motta 
Committed: Fri Mar 30 12:15:37 2018 -0300

--

--



-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[10/10] cassandra git commit: Merge branch 'cassandra-3.11' into trunk

2018-03-30 Thread paulo
Merge branch 'cassandra-3.11' into trunk


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/b08b4dcc
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/b08b4dcc
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/b08b4dcc

Branch: refs/heads/trunk
Commit: b08b4dcc72a8d9b4fa5b92dae97ba527161f130d
Parents: c22ee2b 6f30677
Author: Paulo Motta 
Authored: Fri Mar 30 12:16:19 2018 -0300
Committer: Paulo Motta 
Committed: Fri Mar 30 12:16:19 2018 -0300

--

--



-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[06/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0

2018-03-30 Thread paulo
Merge branch 'cassandra-2.2' into cassandra-3.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2d2b1a71
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2d2b1a71
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2d2b1a71

Branch: refs/heads/cassandra-3.0
Commit: 2d2b1a71f8bd8dbd069cb2bc321936e819baa9ad
Parents: 68079e4 ac77e5e
Author: Paulo Motta 
Authored: Fri Mar 30 12:15:17 2018 -0300
Committer: Paulo Motta 
Committed: Fri Mar 30 12:15:17 2018 -0300

--

--



-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[05/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0

2018-03-30 Thread paulo
Merge branch 'cassandra-2.2' into cassandra-3.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2d2b1a71
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2d2b1a71
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2d2b1a71

Branch: refs/heads/cassandra-3.11
Commit: 2d2b1a71f8bd8dbd069cb2bc321936e819baa9ad
Parents: 68079e4 ac77e5e
Author: Paulo Motta 
Authored: Fri Mar 30 12:15:17 2018 -0300
Committer: Paulo Motta 
Committed: Fri Mar 30 12:15:17 2018 -0300

--

--



-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[07/10] cassandra git commit: Merge branch 'cassandra-2.2' into cassandra-3.0

2018-03-30 Thread paulo
Merge branch 'cassandra-2.2' into cassandra-3.0


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/2d2b1a71
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/2d2b1a71
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/2d2b1a71

Branch: refs/heads/trunk
Commit: 2d2b1a71f8bd8dbd069cb2bc321936e819baa9ad
Parents: 68079e4 ac77e5e
Author: Paulo Motta 
Authored: Fri Mar 30 12:15:17 2018 -0300
Committer: Paulo Motta 
Committed: Fri Mar 30 12:15:17 2018 -0300

--

--



-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[04/10] cassandra git commit: Downgrade logger.debug calls to logger.trace in the read path

2018-03-30 Thread paulo
Downgrade logger.debug calls to logger.trace in the read path

Patch by Alexander Dejanovski; Reviewed by Paulo Motta for CASSANDRA-14318


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ac77e5e7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ac77e5e7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ac77e5e7

Branch: refs/heads/trunk
Commit: ac77e5e7742548f7c7c25da3923841f59d4b2713
Parents: 53b6116
Author: Alexander Dejanovski 
Authored: Tue Mar 27 12:05:27 2018 +0200
Committer: Paulo Motta 
Committed: Fri Mar 30 12:10:05 2018 -0300

--
 CHANGES.txt | 1 +
 .../apache/cassandra/service/pager/AbstractQueryPager.java  | 9 +
 .../org/apache/cassandra/service/pager/SliceQueryPager.java | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)
--




-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For 

[09/10] cassandra git commit: Merge branch 'cassandra-3.0' into cassandra-3.11

2018-03-30 Thread paulo
Merge branch 'cassandra-3.0' into cassandra-3.11


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/6f30677b
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/6f30677b
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/6f30677b

Branch: refs/heads/trunk
Commit: 6f30677b28dcbf82bcd0a291f3294ddf87dafaac
Parents: 18278e4 2d2b1a7
Author: Paulo Motta 
Authored: Fri Mar 30 12:15:37 2018 -0300
Committer: Paulo Motta 
Committed: Fri Mar 30 12:15:37 2018 -0300

--

--



-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[03/10] cassandra git commit: Downgrade logger.debug calls to logger.trace in the read path

2018-03-30 Thread paulo
Downgrade logger.debug calls to logger.trace in the read path

Patch by Alexander Dejanovski; Reviewed by Paulo Motta for CASSANDRA-14318


Project: http://git-wip-us.apache.org/repos/asf/cassandra/repo
Commit: http://git-wip-us.apache.org/repos/asf/cassandra/commit/ac77e5e7
Tree: http://git-wip-us.apache.org/repos/asf/cassandra/tree/ac77e5e7
Diff: http://git-wip-us.apache.org/repos/asf/cassandra/diff/ac77e5e7

Branch: refs/heads/cassandra-3.11
Commit: ac77e5e7742548f7c7c25da3923841f59d4b2713
Parents: 53b6116
Author: Alexander Dejanovski 
Authored: Tue Mar 27 12:05:27 2018 +0200
Committer: Paulo Motta 
Committed: Fri Mar 30 12:10:05 2018 -0300

--
 CHANGES.txt | 1 +
 .../apache/cassandra/service/pager/AbstractQueryPager.java  | 9 +
 .../org/apache/cassandra/service/pager/SliceQueryPager.java | 2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)
--




-
To unsubscribe, e-mail: 

[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-03-30 Thread Alexander Dejanovski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420577#comment-16420577
 ] 

Alexander Dejanovski commented on CASSANDRA-14346:
--

Two other issues with automated scheduling of repairs would be:
 * Rolling upgrades: all repairs would have to be terminated and schedules 
stopped as soon as the cluster is running mixed versions
 * Expansion to new DCs: if repair triggers during the expansion to a new DC 
before rebuild has fully ended on all nodes, the cluster will be crushed by the 
entropy that repair will find. Since many users will not be aware that the 
cluster is constantly repairing itself, this is likely to happen a lot.

The latter could be mitigated if a rebuild is detected and appropriate measures 
are taken. I'm not sure how we can detect this flawlessly though, and there 
would still be many cases where the cluster has been expanded but rebuild isn't 
started right after.

It could be argued that any scheduled repair system is subject to the same 
caveats, but the difference is that those systems are set up by a user, not by 
the database itself, which should then be responsible for protecting itself 
against such scenarios.

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Major
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual consistency. Most 
> recently CASSANDRA-10070, CASSANDRA-8911 and CASSANDRA-13924 have all looked 
> for ways to solve this problem.
> At Netflix we've built a scheduled repair service within Priam (our sidecar), 
> which we spoke about last year at NGCC. Given the positive feedback at NGCC 
> we focussed on getting it production ready and have now been using it in 
> production to repair hundreds of clusters, tens of thousands of nodes, and 
> petabytes of data for the past six months. Also based on feedback at NGCC we 
> have invested effort in figuring out how to integrate this natively into 
> Cassandra rather than open sourcing it as an external service (e.g. in Priam).
> As such, [~vinaykumarcse] and I would like to re-work and merge our 
> implementation into Cassandra, and have created a [design 
> document|https://docs.google.com/document/d/1RV4rOrG1gwlD5IljmrIq_t45rz7H3xs9GbFSEyGzEtM/edit?usp=sharing]
>  showing how we plan to make it happen, including the user interface.
> As we work on the code migration from Priam to Cassandra, any feedback would 
> be greatly appreciated about the interface or v1 implementation features. I 
> have tried to call out in the document features which we explicitly consider 
> future work (as well as a path forward to implement them in the future) 
> because I would very much like to get this done before the 4.0 merge window 
> closes, and to do that I think aggressively pruning scope is going to be a 
> necessity.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-14346) Scheduled Repair in Cassandra

2018-03-30 Thread Alexander Dejanovski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-14346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420504#comment-16420504
 ] 

Alexander Dejanovski commented on CASSANDRA-14346:
--

I really like the idea of making repair something that is coordinated by the 
cluster instead of being node-centric as it is today.
This is how it should be implemented, and external tools should only add 
features on top of this. nodetool really should be doing this by default.
I globally agree with the state machine that is detailed (haven't spent that 
much time on it though...)

I disagree with point 6 of the doc's Resiliency section, that adding nodes 
won't impact the repair: it will change the token ranges, and some of the 
splits will now spread across different replicas, which will make them 
unsuitable for repair (think of clusters with 256 vnodes per node).
You either have to cancel the repair or recompute the remaining splits to move 
on with the job.

I would add a feature to your nodetool repairstatus command that allows 
listing only the currently running repairs.

Then, I think the approach of implementing a fully automated, seamless, 
continuous repair that "just works" without user intervention is unsafe in the 
wild; there are too many caveats.
There are many different types of cluster out there and some of them just 
cannot run repair without careful tuning or monitoring (if at all).
The current design shows no backpressure mechanism to ensure that further 
running sequences won't harm the cluster because it's already running late on 
compactions (be it due to overstreaming, entropy, or just the activity of the 
cluster).
Repairing by table will add a lot of overhead over repairing a list of tables 
(or all of them) in a single session, unless multiple repairs at once on a node 
are allowed, which won't allow safely terminating a single repair.
It is also unclear in the current design whether repair can be disabled for 
select tables (like "type: none").
The proposal doesn't seem to involve any change to how "nodetool repair" 
behaves. Will it be changed to use the state machine and coordinate throughout 
the cluster?

Trying to replace external tools with built-in features has its limits I think, 
and currently the design gives such external tools (be it Reaper, the DataStax 
repair service, Priam, ...) only limited control.
To make an analogy that was seen recently on the ML, it's as if you implemented 
automatic spreading of configuration changes from within Cassandra instead of 
relying on tools like Chef or Puppet.
You'll still need global tools to manage repairs over several clusters anyway, 
which a Cassandra built-in feature cannot (and should not) provide.

My point is that making repair smarter and coordinated within Cassandra is a 
great idea and I support it 100%, but the current design makes it too automated 
and the defaults could easily lead to severe performance problems without the 
user triggering anything.
I also don't know how it could be made to work alongside user-defined repairs, 
as you'll need to force-terminate some sessions.

To summarize, I would put aside the scheduling features and implement the 
coordinated repairs by splits within Cassandra. The StorageServiceMBean should 
evolve to allow manually setting the number of splits per node, or rely on a 
number of splits generated by Cassandra itself.
Then it should also be possible to track progress externally by listing splits 
(sequences) through JMX, and to pause/resume select repair runs.
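
A hypothetical sketch of the JMX surface that suggestion implies; none of these 
methods exist in StorageServiceMBean today, the interface only illustrates the 
shape of the controls (assumes java.util.List and java.util.UUID imports):

    public interface CoordinatedRepairMBean
    {
        void setRepairSplitsPerNode(int splits);   // manual override; otherwise Cassandra picks a value
        int getRepairSplitsPerNode();
        List<String> listRepairSequences();        // current splits/sequences and their state, for external tracking
        void pauseRepairRun(UUID runId);           // pause a selected repair run
        void resumeRepairRun(UUID runId);          // resume it later
    }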

Also, the current design should evolve to allow a single sequence to include 
multiple token ranges. We have that feature waiting to be merged in Reaper to 
group token ranges that have the same replicas, in order to reduce the overhead 
of vnodes.
Starting with 3.0, repair jobs can be triggered with multiple token ranges that 
will be executed as a single session if the replicas are the same for all. So, 
to prevent having to change the data model in the future, I'd suggest storing a 
list of token ranges instead of just one.
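
A minimal sketch of that grouping, assuming ranges with identical replica sets 
can be repaired in one session; this is illustrative only and is neither the 
Reaper nor the Cassandra implementation:

    import java.util.ArrayList;
    import java.util.Collection;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;
    import java.util.function.Function;

    final class ReplicaGrouping
    {
        // Groups ranges whose replica sets are equal, so each group can run as a
        // single repair session instead of one session per vnode range.
        static <R> Map<Set<String>, List<R>> groupByReplicas(Collection<R> ranges,
                                                             Function<R, Set<String>> replicasOf)
        {
            Map<Set<String>, List<R>> groups = new HashMap<>();
            for (R range : ranges)
                groups.computeIfAbsent(replicasOf.apply(range), k -> new ArrayList<>()).add(range);
            return groups;
        }
    }
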
Repair events should also be tracked in a separate table, to avoid overwriting 
the last event each time (one thing Reaper currently sucks at as well).

I'll go back to the document soon and add my comments there.

 

Cheers

> Scheduled Repair in Cassandra
> -
>
> Key: CASSANDRA-14346
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14346
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Repair
>Reporter: Joseph Lynch
>Priority: Major
>  Labels: CommunityFeedbackRequested
> Fix For: 4.0
>
> Attachments: ScheduledRepairV1_20180327.pdf
>
>
> There have been many attempts to automate repair in Cassandra, which makes 
> sense given that it is necessary to give our users eventual 

[jira] [Created] (CASSANDRA-14356) LWTs keep failing in trunk after immutable refactor

2018-03-30 Thread Michael Burman (JIRA)
Michael Burman created CASSANDRA-14356:
--

 Summary: LWTs keep failing in trunk after immutable refactor
 Key: CASSANDRA-14356
 URL: https://issues.apache.org/jira/browse/CASSANDRA-14356
 Project: Cassandra
  Issue Type: Bug
  Components: Core
 Environment: OpenJDK Runtime Environment (build 1.8.0_161-b14), 
Cassandra 4.0 commit c22ee2bd451d030e99cfb65be839bbc735a5352f (29.3.2018 14:01)
Reporter: Michael Burman


In the PaxosState, the original assert check is in the form of:

assert promised.update.metadata() == accepted.update.metadata() && 
accepted.update.metadata() == mostRecentCommit.update.metadata();

However, after the change making TableMetadata immutable this no longer holds, 
as these instances are not necessarily (or ever) the same. This causes the 
LWTs to fail although they're still correctly targeting the same table.

From IRC:

 It's a bug alright. Though really, the assertion should be on the 
metadata ids, cause TableMetadata#equals does more than what we want.
 That is, replacing it with .equals() is not ok. That would throw on 
any change to a table metadata, while the spirit of the assertion was to 
sanity check that both updates were on the same table.
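
A minimal sketch of the suggested check, comparing table ids instead of object 
identity; metadata() and TableId follow trunk naming, but the helper itself is 
hypothetical and is not the committed patch:

    // Compare TableId values rather than TableMetadata references; the references
    // stopped being identical once TableMetadata became immutable and is rebuilt
    // on schema changes. Assumes imports of Commit and TableId.
    private static void assertSameTable(Commit promised, Commit accepted, Commit mostRecentCommit)
    {
        TableId id = promised.update.metadata().id;
        assert id.equals(accepted.update.metadata().id)
            && id.equals(mostRecentCommit.update.metadata().id)
            : "Paxos promised/accepted/committed updates must target the same table";
    }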



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12151) Audit logging for database activity

2018-03-30 Thread Stefan Podkowinski (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420389#comment-16420389
 ] 

Stefan Podkowinski commented on CASSANDRA-12151:


We should not provide solutions that we know will have significant performance 
issues on busy production systems. I don’t mind keeping the FileAuditLogger 
class, as it’s really trivial. But please remove references to it from 
cassandra.yaml and the corresponding entries in logback.xml (don’t like to have 
an empty audit.log file on all nodes either). Let’s nudge users to use 
BinAuditLogger right away as the recommended solution.

If you don’t think adding an option like include_auditlog_types should be 
necessary, that’s fine. But then let me use my own implementation for filtering 
logs, if my requirements are different. This means that I’d have to be able to 
specify additional parameters (e.g. custom filtering options) along with the 
class name for my implementation. So I'd suggest using ParameterizedClass for 
the logger in cassandra.yaml to make that easily possible.

Looking at IAuditLogger and thinking about how to filter log events makes me a 
bit worried about the design in general there. We keep generating AuditLogEntry 
instances and create unnecessary garbage, even if we’re only interested in some 
specific entry types. Maybe we should move filtering either into the 
IAuditLogger implementation or make it possible to use a custom AuditLogFilter 
as well (IAuditLogger.getLogFilter() ?).

Just looking at AuditLogFilter makes me also think that the isFiltered logic 
should be reconsidered, as null values will cause entries to always pass, even 
if the include set is not empty.
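
To illustrate the concern, a simplified include-list check; this is a 
hypothetical helper, not the actual AuditLogFilter code:

    import java.util.Set;

    final class IncludeFilterSketch
    {
        static boolean isIncluded(String value, Set<String> includeSet)
        {
            if (includeSet.isEmpty())
                return true;    // no filter configured: everything passes
            if (value == null)
                return false;   // a null value should not bypass an explicit include list
            return includeSet.contains(value);
        }
    }
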
{quote}How do you suggest we approach this problem? We cannot keep the audit 
log files around indefinitely. Perhaps we can specify a disk quota for audit 
log files in the configuration? We can expose this setting in cassandra.yaml?
{quote}
How does this work with BinLogger? I haven't looked at that part in detail yet. 
Does it overwrite old entries automatically? How would you suggest users should 
archive logs from there?

> Audit logging for database activity
> ---
>
> Key: CASSANDRA-12151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12151
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: stefan setyadi
>Assignee: Vinay Chella
>Priority: Major
> Fix For: 4.x
>
> Attachments: 12151.txt, CASSANDRA_12151-benchmark.html, 
> DesignProposal_AuditingFeature_ApacheCassandra_v1.docx
>
>
> we would like a way to enable cassandra to log database activity being done 
> on our server.
> It should show username, remote address, timestamp, action type, keyspace, 
> column family, and the query statement.
> it should also be able to log connection attempt and changes to the 
> user/roles.
> I was thinking of making a new keyspace and insert an entry for every 
> activity that occurs.
> Then It would be possible to query for specific activity or a query targeting 
> a specific keyspace and column family.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-13665) nodetool clientlist

2018-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-13665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420351#comment-16420351
 ] 

ASF GitHub Bot commented on CASSANDRA-13665:


Github user mhartopo commented on a diff in the pull request:

https://github.com/apache/cassandra/pull/190#discussion_r178269292
  
--- Diff: src/java/org/apache/cassandra/tools/nodetool/ClientStats.java ---
@@ -0,0 +1,69 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.tools.nodetool;
+
+import java.util.List;
+import java.util.Map;
+import java.util.Map.Entry;
+
+import org.apache.cassandra.tools.NodeProbe;
+import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
+import org.apache.cassandra.tools.nodetool.formatter.TableBuilder;
+
+import io.airlift.airline.Command;
+import io.airlift.airline.Option;
+
+@Command(name = "clientstats", description = "Print information about 
connected clients")
+public class ClientStats extends NodeToolCmd
+{
+@Option(title = "list_connections", name = "--all", description = 
"Lists all connections")
+private boolean listConnections = false;
+
+@Override
+public void execute(NodeProbe probe)
+{
+if (listConnections)
+{
+List<Map<String, String>> clients = (List<Map<String, String>>) probe.getClientMetric("connections");
+if (!clients.isEmpty())
+{
+TableBuilder table = new TableBuilder();
+table.add("Address", "SSL", "Version", "User", "Keyspace", 
"Requests");
+for (Map<String, String> conn : clients)
+{
+table.add(conn.get("address"), conn.get("ssl"), 
conn.get("version"), 
+  conn.get("user"), conn.get("keyspace"), 
conn.get("requests"));
+}
+table.printTo(System.out);
+System.out.println();
+}
+}
+
+Map<String, Integer> connectionsByUser = (Map<String, Integer>) 
probe.getClientMetric("connectedNativeClientsByUser");
+int total = connectionsByUser.values().stream().reduce(0, 
Integer::sum);
+System.out.println("Total connected clients: " + total);
+System.out.println();
+TableBuilder table = new TableBuilder();
+table.add("User", "Connections");
+for (Entry<String, Integer> entry : connectionsByUser.entrySet())
+{
+table.add(entry.getKey(), entry.getValue().toString());
+}
+table.printTo(System.out);
+}
+}
--- End diff --

don't forget newline at the EOF


> nodetool clientlist
> ---
>
> Key: CASSANDRA-13665
> URL: https://issues.apache.org/jira/browse/CASSANDRA-13665
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Tools
>Reporter: Jon Haddad
>Assignee: Chris Lohfink
>Priority: Major
>
> There should exist a nodetool command that lists each client connection. 
> Ideally it would display the following:
>  * host
>  * protocol version
>  * user logged in as
>  * current keyspace
>  * total queries executed
>  * ssl connections



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-12151) Audit logging for database activity

2018-03-30 Thread Dinesh Joshi (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420323#comment-16420323
 ] 

Dinesh Joshi commented on CASSANDRA-12151:
--

Hey [~vinaykumarcse], thank you for making the code changes. Minor nit on the 
whitespacing but other than that LGTM.


Hey [~spo...@gmail.com], 
{quote}Rotating out log files by simply deleting them is probably also not what 
you'd expect from a auditing solution.
{quote}
How do you suggest we approach this problem? We cannot keep the audit log files 
around indefinitely. Perhaps we can specify a disk quota for audit log files in 
the configuration? We can expose this setting in cassandra.yaml?

> Audit logging for database activity
> ---
>
> Key: CASSANDRA-12151
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12151
> Project: Cassandra
>  Issue Type: New Feature
>Reporter: stefan setyadi
>Assignee: Vinay Chella
>Priority: Major
> Fix For: 4.x
>
> Attachments: 12151.txt, CASSANDRA_12151-benchmark.html, 
> DesignProposal_AuditingFeature_ApacheCassandra_v1.docx
>
>
> we would like a way to enable cassandra to log database activity being done 
> on our server.
> It should show username, remote address, timestamp, action type, keyspace, 
> column family, and the query statement.
> it should also be able to log connection attempt and changes to the 
> user/roles.
> I was thinking of making a new keyspace and insert an entry for every 
> activity that occurs.
> Then It would be possible to query for specific activity or a query targeting 
> a specific keyspace and column family.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org



[jira] [Commented] (CASSANDRA-9261) Prepare and Snapshot for repairs should use higher timeouts for expiring map

2018-03-30 Thread Pranav Jindal (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-9261?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16420309#comment-16420309
 ] 

Pranav Jindal commented on CASSANDRA-9261:
--

[~kohlisankalp] not exactly sure, but repair hung after these errors.

> Prepare and Snapshot for repairs should use higher timeouts for expiring map
> 
>
> Key: CASSANDRA-9261
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9261
> Project: Cassandra
>  Issue Type: Improvement
>Reporter: sankalp kohli
>Assignee: sankalp kohli
>Priority: Minor
> Fix For: 2.1.6, 2.2.0 beta 1
>
> Attachments: 0001-make-prepare-snapshot-timeout-to-1-hour.patch, 
> trunk_9261.txt
>
>
> We wait for 1 hour after sending the prepare message, but the expiring map 
> will remove it after the RPC timeout.
> For the snapshot during repair, we only wait for the RPC timeout.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: commits-unsubscr...@cassandra.apache.org
For additional commands, e-mail: commits-h...@cassandra.apache.org