[jira] [Updated] (CASSANDRA-12906) Update doco with new getting started with contribution section

2016-11-12 Thread Ben Slater (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Slater updated CASSANDRA-12906:
---
Status: Patch Available  (was: Open)

Draft updates are in the attached patch. The substantial changes are all in the 
new file (gettingstarted.rst); the other changes only update the index and add 
labels (anchors) to the existing files for linking.

> Update doco with new getting started with contribution section
> --
>
> Key: CASSANDRA-12906
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12906
> Project: Cassandra
>  Issue Type: Improvement
>  Components: Documentation and Website
>Reporter: Ben Slater
>Assignee: Ben Slater
>Priority: Minor
> Attachments: CASSANDRA_12906-trunk.patch
>
>
> Following discussion on the mailing list about how to get more community 
> input, there seemed to be agreement that adding some doco emphasising 
> contributions other than creating new features would be a good idea.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (CASSANDRA-12906) Update doco with new getting started with contribution section

2016-11-12 Thread Ben Slater (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ben Slater updated CASSANDRA-12906:
---
Attachment: CASSANDRA_12906-trunk.patch



[jira] [Created] (CASSANDRA-12906) Update doco with new getting started with contribution section

2016-11-12 Thread Ben Slater (JIRA)
Ben Slater created CASSANDRA-12906:
--

 Summary: Update doco with new getting started with contribution 
section
 Key: CASSANDRA-12906
 URL: https://issues.apache.org/jira/browse/CASSANDRA-12906
 Project: Cassandra
  Issue Type: Improvement
  Components: Documentation and Website
Reporter: Ben Slater
Assignee: Ben Slater
Priority: Minor


Following discussion on the mailing list about how to get more community input, 
there seemed to be agreement that adding some doco emphasising contributions 
other than creating new features would be a good idea.





[jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset

2016-11-12 Thread Bing Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15660915#comment-15660915
 ] 

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 6:21 AM:
---

[~pauloricardomg] To answer your questions
*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed 
streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; 
about 7 out of 30 did. I searched the system.log on every node for 
"java.net.SocketException: Connection reset" around the timestamp at which the 
"initiator" (the host that was running repair) reported the failure. I can 
confirm that those failures all pointed back to the initiator, e.g. {noformat}
ERROR [STREAM-IN-/52.220.127.181:7001] 2016-11-10 22:23:31,196 
StreamSession.java:529 - [Stream #49719e00-a794-11e6-be90-f1ad7e862a5b] 
Streaming error occurred on session with peer 52.220.127.181
...
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 
StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] 
Streaming error occurred on session with peer *initiator-public-ip*
{noformat}
*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended: {noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}
*Q:* Also, can you paste full debug.log sample of a node with this error?
*A:* Here is the debug.log file from a remote machine (*Note* _the IPs and 
timestamps in the log file differ from those in the original bug report, as 
this is from another round of testing_): debug.log.2016-11-10_2319.gz
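
The log search described in the first answer can be scripted. What follows is a 
minimal Python sketch, not the procedure actually used on this ticket. It 
assumes each log entry sits on a single line (the entries quoted above are 
wrapped by the mail archive), and both the function name 
peers_with_stream_errors and the regular expression are illustrative, derived 
only from the sample lines in this thread.

```python
import re
from collections import Counter

# Matches the StreamSession error line quoted in this thread. The pattern is
# an assumption based on that sample, not a documented log format.
STREAM_ERROR = re.compile(
    r"\[Stream #(?P<stream>[0-9a-f-]+)\] "
    r"Streaming error occurred on session with peer (?P<peer>\S+)"
)


def peers_with_stream_errors(lines):
    """Count streaming-error log lines per peer address."""
    counts = Counter()
    for line in lines:
        match = STREAM_ERROR.search(line)
        if match:
            counts[match.group("peer")] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from this thread and joined onto one line each.
    sample = [
        "ERROR [STREAM-IN-/52.220.127.181:7001] 2016-11-10 22:23:31,196 "
        "StreamSession.java:529 - "
        "[Stream #49719e00-a794-11e6-be90-f1ad7e862a5b] "
        "Streaming error occurred on session with peer 52.220.127.181",
        "javax.net.ssl.SSLException: java.net.SocketException: Connection reset",
    ]
    for peer, count in peers_with_stream_errors(sample).items():
        print(peer, count)
```

Running the same function over the system.log of every node and comparing the 
resulting peers is one way to check whether the failures all point back to a 
single host.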



[jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset

2016-11-12 Thread Bing Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15660915#comment-15660915
 ] 

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 6:03 AM:
---

[~pauloricardomg] To answer your questions
*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed 
streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; 
about 7 out of 30 did. I searched the system.log on every node for 
"java.net.SocketException: Connection reset" around the timestamp at which the 
"initiator" (the host that was running repair) reported the failure. I can 
confirm that those failures all pointed back to the initiator, e.g. {noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 
StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] 
Streaming error occurred on session with peer *initiator-public-ip*
{noformat}
*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended: {noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}
*Q:* Also, can you paste full debug.log sample of a node with this error?
*A:* Here is the debug.log file from a remote machine (*Note* _the IPs and 
timestamps in the log file differ from those in the original bug report, as 
this is from another round of testing_): debug.log.2016-11-10_2319.gz
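
As a rough illustration of what the keepalive values quoted above imply, the 
Python sketch below parses the sysctl output and estimates how long an idle 
connection to an unresponsive peer survives before Linux drops it. The helper 
names are hypothetical, and the estimate is the usual approximation (idle time 
plus probes multiplied by the probe interval), not a measurement from this 
cluster.

```python
def parse_sysctl(text):
    """Parse `key = value` lines from sysctl output into a dict of ints."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = int(value.strip())
    return settings


def keepalive_detection_seconds(settings):
    """Approximate seconds before an idle, unresponsive peer is dropped."""
    # The first probe goes out after tcp_keepalive_time seconds of idleness;
    # the connection is dropped once tcp_keepalive_probes probes, sent
    # tcp_keepalive_intvl seconds apart, have gone unanswered.
    return (
        settings["net.ipv4.tcp_keepalive_time"]
        + settings["net.ipv4.tcp_keepalive_probes"]
        * settings["net.ipv4.tcp_keepalive_intvl"]
    )


# Values copied from the sysctl output quoted in this comment.
SYSCTL_OUTPUT = """\
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
"""

print(keepalive_detection_seconds(parse_sysctl(SYSCTL_OUTPUT)))  # 90
```

With these settings a dead peer on an otherwise idle connection is detected 
after roughly 90 seconds (60 + 3 * 10).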



[jira] [Issue Comment Deleted] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset

2016-11-12 Thread Bing Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Wu updated CASSANDRA-12886:

Comment: was deleted

(was: I lost track of the debug.log and some of the system.logs. And the admins 
just added the tcp_keepalive settings. So I will try to repro this issue 
from a clean slate. )

> Streaming failed due to SSL Socket connection reset
> ---
>
> Key: CASSANDRA-12886
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
> Project: Cassandra
>  Issue Type: Bug
>Reporter: Bing Wu
> Attachments: debug.log.2016-11-10_2319.gz
>
>
> While running "nodetool repair", I see many instances of 
> "javax.net.ssl.SSLException: java.net.SocketException: Connection reset" in 
> system.logs on some nodes in the cluster. Timestamps correspond to streaming 
> source/initiator's error messages of "sync failed between ..."
> Setup: 
> - Cassandra 3.7.01 
> - CentOS 6.7 in AWS (multi-region)
> - JDK version: {noformat}
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> {noformat}
> - cassandra.yaml:
> {noformat}
> server_encryption_options:
>     internode_encryption: all
>     keystore: [path]
>     keystore_password: [password]
>     truststore: [path]
>     truststore_password: [password]
>     # More advanced defaults below:
>     # protocol: TLS
>     # algorithm: SunX509
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     require_client_auth: false
> {noformat}
> Error messages in system.log on the target host:
> {noformat}
> ERROR [STREAM-OUT-/54.247.111.232:7001] 2016-11-07 07:30:56,475 StreamSession.java:529 - [Stream #e14abcb0-a4bb-11e6-9758-55b9ac38b78e] Streaming error occurred on session with peer 54.247.111.232
> javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
>     at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1541) ~[na:1.8.0_102]
>     at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553) ~[na:1.8.0_102]
>     at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71) ~[na:1.8.0_102]
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_102]
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_102]
>     at org.apache.cassandra.io.util.WrappedDataOutputStreamPlus.flush(WrappedDataOutputStreamPlus.java:66) ~[apache-cassandra-3.7.0.jar:3.7.0]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:371) [apache-cassandra-3.7.0.jar:3.7.0]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) [apache-cassandra-3.7.0.jar:3.7.0]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
> Caused by: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
> {noformat}





[jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset

2016-11-12 Thread Bing Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15660915#comment-15660915
 ] 

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 5:46 AM:
---

[~pauloricardomg] To answer your questions
*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed 
streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; 
about 7 out of 30 did. I searched the system.log on every node for 
"java.net.SocketException: Connection reset" around the timestamp at which the 
"initiator" (the host that was running repair) reported the failure. I can 
confirm that those failures all pointed back to the initiator, e.g. {noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 
StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] 
Streaming error occurred on session with peer *initiator-public-ip*
{noformat}
*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended: {noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}
*Q:* Also, can you paste full debug.log sample of a node with this error?
*A:* Here is the debug.log file on a remote machine (not the initiator) 
debug.log.2016-11-10_2319.gz



[jira] [Updated] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset

2016-11-12 Thread Bing Wu (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bing Wu updated CASSANDRA-12886:

Attachment: debug.log.2016-11-10_2319.gz



[jira] [Commented] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset

2016-11-12 Thread Bing Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15660915#comment-15660915
 ] 

Bing Wu commented on CASSANDRA-12886:
-

[~pauloricardomg] To answer your questions
*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed 
streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; 
about 7 out of 30 did. I searched the system.log on every node for 
"java.net.SocketException: Connection reset" around the timestamp at which the 
"initiator" (the host that was running repair) reported the failure. I can 
confirm that those failures all pointed back to the initiator, e.g. {noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 
StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] 
Streaming error occurred on session with peer *initiator-public-ip*
{noformat}
*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended: {noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}
*Q:* Also, can you paste full debug.log sample of a node with this error?
*A:* Here is the debug.log file on a remote machine (not the initiator)



[jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset

2016-11-12 Thread Bing Wu (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15653288#comment-15653288
 ] 

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 5:29 AM:
---

I lost track of the debug.log and some of the system.logs. And the admins just 
added the tcp_keepalive settings. So I will try to repro this issue from a 
clean slate. 


was (Author: bing1wu):
[~pjrmoreira] I lost track of the debug.log and some of the system.logs. And the 
admins just added the tcp_keepalive settings. So I will try to repro this 
issue from a clean slate. 



[jira] [Updated] (CASSANDRA-10726) Read repair inserts should not be blocking

2016-11-12 Thread Jason Brown (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-10726:

Assignee: Xiaolong Jiang  (was: Nachiket Patil)

> Read repair inserts should not be blocking
> --
>
>         Key: CASSANDRA-10726
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>     Project: Cassandra
>  Issue Type: Improvement
>  Components: Coordination
>    Reporter: Richard Low
>    Assignee: Xiaolong Jiang
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.
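
The second option in the description above (return success for the read even if the repair write times out) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not Cassandra code: `read_with_repair`, `repair_writes` and `repair_timeout_s` are hypothetical names, and a thread pool stands in for the mutation stage.

```python
import concurrent.futures
import threading


def read_with_repair(read_result, repair_writes, executor, repair_timeout_s=0.2):
    """Return read_result even if some repair writes are slow or fail.

    repair_writes is a list of zero-argument callables, one per out-of-date
    replica (hypothetical stand-ins for the repair mutations).
    """
    futures = [executor.submit(write) for write in repair_writes]
    # Wait briefly so healthy replicas are usually repaired before returning,
    # but never let a slow or failing write turn into a read failure.
    done, not_done = concurrent.futures.wait(futures, timeout=repair_timeout_s)
    failed = sum(1 for f in done if f.exception() is not None)
    stats = {"acked": len(done) - failed, "failed": failed, "pending": len(not_done)}
    return read_result, stats


def demo():
    release = threading.Event()

    def fast_write():
        return "ok"

    def stuck_write():
        # Simulates an overloaded replica: blocks until the read has returned.
        release.wait()
        return "late"

    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as pool:
        try:
            return read_with_repair({"k": "v"}, [fast_write, stuck_write], pool)
        finally:
            release.set()  # let the stuck write finish so the pool can shut down
```

In this sketch the read still succeeds while one repair write is pending; the returned counters are only there so that a caller could log or meter the writes that did not complete in time.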





[jira] [Resolved] (CASSANDRA-12813) NPE in auth for bootstrapping node

2016-11-12 Thread Alex Petrov (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alex Petrov resolved CASSANDRA-12813.
-
Resolution: Fixed

> NPE in auth for bootstrapping node
> --
>
>         Key: CASSANDRA-12813
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-12813
>     Project: Cassandra
>  Issue Type: Bug
>    Reporter: Charles Mims
>    Assignee: Alex Petrov
>     Fix For: 2.2.9, 3.0.10, 3.10
>
>
> {code}
> ERROR [SharedPool-Worker-1] 2016-10-19 21:40:25,991 Message.java:617 - 
> Unexpected exception during request; channel = [id: 0x15eb017f, / omitted>:40869 => /10.0.0.254:9042]
> java.lang.NullPointerException: null
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:144)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:86)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>   at 
> org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:182)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>   at 
> org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78)
>  ~[apache-cassandra-3.0.9.jar:3.0.9]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513)
>  [apache-cassandra-3.0.9.jar:3.0.9]
>   at 
> org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407)
>  [apache-cassandra-3.0.9.jar:3.0.9]
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324)
>  [netty-all-4.0.23.Final.jar:4.0.23.Final]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_101]
>   at 
> org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164)
>  [apache-cassandra-3.0.9.jar:3.0.9]
>   at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) 
> [apache-cassandra-3.0.9.jar:3.0.9]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101]
> {code}
> I have a node that has been joining for around 24 hours.  My application is 
> configured with the IP address of the joining node in the list of nodes to 
> connect to (ruby driver), and I have been getting around 200 events of this 
> NPE per hour.  I removed the IP of the joining node from the list of nodes 
> for my app to connect to and the errors stopped.





[jira] [Commented] (CASSANDRA-12813) NPE in auth for bootstrapping node

2016-11-12 Thread Alex Petrov (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660436#comment-15660436
 ] 

Alex Petrov commented on CASSANDRA-12813:
-

Fixed dtests to catch the problem 
[here|https://github.com/riptano/cassandra-dtest/pull/1382].






[jira] [Updated] (CASSANDRA-12905) streaming issues with 3.9 (repair)

2016-11-12 Thread Nir Zilka (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Nir Zilka updated CASSANDRA-12905:
--
Description: 
Hello,

I performed two upgrades to the current cluster (currently 15 nodes),
first it was 2.2.5.1 and repair worked flawlessly,
second upgrade was to 3.0.9 (with upgradesstables) and also repair worked well,
then I upgraded 2 weeks ago to 3.9, and the repair problems started.

There are several types of errors in the system.log (on different nodes):

- Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
- Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation timed 
out - received only 0 responses
- Remote peer xxx.xxx.xxx.xxx failed stream session
- Session completed with the following error
org.apache.cassandra.streaming.StreamException: Stream failed



I use the 3.9 default configuration with cluster-specific adjustments (3 seeds, 
GossipingPropertyFileSnitch).
streaming_socket_timeout_in_ms is the default (8640).

I'm worried about consistency problems while I'm not able to run repair.

Any ideas?

Thanks,
Nir.


  was:
Hello,

I performed two upgrades to the current cluster (currently 15 nodes),
first it was 2.2.5.1 and repair worked flawlessly,
second upgrade was to 3.0.9 (with upgradesstables) and also repair worked well,
then I upgraded 2 weeks ago to 3.9, and the repair problems started.

There are several types of errors in the system.log:

- Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
- Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation timed 
out - received only 0 responses
- Remote peer xxx.xxx.xxx.xxx failed stream session
- Session completed with the following error
org.apache.cassandra.streaming.StreamException: Stream failed



I use the 3.9 default configuration with cluster-specific adjustments (3 seeds, 
GossipingPropertyFileSnitch).
streaming_socket_timeout_in_ms is the default (8640).

I'm worried about consistency problems while I'm not able to run repair.

Any ideas?

Thanks,
Nir.



> streaming issues with 3.9 (repair)
> --
>
>         Key: CASSANDRA-12905
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
>     Project: Cassandra
>  Issue Type: Bug
>  Components: Streaming and Messaging
> Environment: centos 6.7 x86_64
>    Reporter: Nir Zilka
>     Fix For: 3.9
>
>
> Hello,
> I performed two upgrades to the current cluster (currently 15 nodes),
> first it was 2.2.5.1 and repair worked flawlessly,
> second upgrade was to 3.0.9 (with upgradesstables) and also repair worked 
> well,
> then I upgraded 2 weeks ago to 3.9, and the repair problems started.
> There are several types of errors in the system.log (on different nodes):
> - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
> - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation 
> timed out - received only 0 responses
> - Remote peer xxx.xxx.xxx.xxx failed stream session
> - Session completed with the following error
> org.apache.cassandra.streaming.StreamException: Stream failed
> 
> I use the 3.9 default configuration with cluster-specific adjustments (3 
> seeds, GossipingPropertyFileSnitch).
> streaming_socket_timeout_in_ms is the default (8640).
> I'm worried about consistency problems while I'm not able to run repair.
> Any ideas?
> Thanks,
> Nir.





[jira] [Created] (CASSANDRA-12905) streaming issues with 3.9 (repair)

2016-11-12 Thread Nir Zilka (JIRA)
Nir Zilka created CASSANDRA-12905:
-

    Summary: streaming issues with 3.9 (repair)
        Key: CASSANDRA-12905
        URL: https://issues.apache.org/jira/browse/CASSANDRA-12905
    Project: Cassandra
 Issue Type: Bug
 Components: Streaming and Messaging
Environment: centos 6.7 x86_64
   Reporter: Nir Zilka
    Fix For: 3.9


Hello,

I performed two upgrades to the current cluster (currently 15 nodes),
first it was 2.2.5.1 and repair worked flawlessly,
second upgrade was to 3.0.9 (with upgradesstables) and also repair worked well,
then I upgraded 2 weeks ago to 3.9, and the repair problems started.

There are several types of errors in the system.log:

- Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx
- Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation timed 
out - received only 0 responses
- Remote peer xxx.xxx.xxx.xxx failed stream session
- Session completed with the following error
org.apache.cassandra.streaming.StreamException: Stream failed



I use the 3.9 default configuration with cluster-specific adjustments (3 seeds, 
GossipingPropertyFileSnitch).
streaming_socket_timeout_in_ms is the default (8640).

I'm worried about consistency problems while I'm not able to run repair.

Any ideas?

Thanks,
Nir.






[Cassandra Wiki] Update of "ThirdPartySupport" by winguzone

2016-11-12 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for 
change notification.

The "ThirdPartySupport" page has been changed by winguzone:
https://wiki.apache.org/cassandra/ThirdPartySupport?action=diff&rev1=49&rev2=50

Comment:
3rd party support

  
  {{http://www.decisivelabs.com.au/img/platforms/instaclustr-l...@2x.png}} 
[[https://www.instaclustr.com/?cid=casspp|Instaclustr]] provides managed Apache 
Cassandra hosting on Amazon Web Services. Instaclustr dramatically reduces 
administration overheads and support costs by providing automated deployment, 
backups, cluster balancing and performance tuning.
  
+ {{https://winguzone.com/wp-content/uploads/2016/11/IconWiki.png}} 
[[https://winguzone.com/?utm_source=apwiki|Winguzone]] provides affordable cost 
effective Apache Cassandra clusters in major Clouds.
  
  
{{https://opencredo.com/wp-content/uploads/2013/07/OpenCredo-Logo-Alt-CMYK-Process-Converted-300x72.png}}
 [[https://opencredo.com|OpenCredo]] is a pragmatic hands-on software and 
devOps consultancy with a wealth of experience in open source technologies. We 
are Datastax Certified experts and have been working with Cassandra since 2012. 
And so through our real-world experience, we can provide expertise in both 
Apache Cassandra and DataStax Enterprise. Contact us at i...@opencredo.com
  


[jira] [Commented] (CASSANDRA-11723) Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to blame)

2016-11-12 Thread Stefano Ortolani (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659647#comment-15659647
 ] 

Stefano Ortolani commented on CASSANDRA-11723:
--

Hi [~snazy], I haven't managed to test it yet, since the issue only occurred in 
production.
I might be able to backport the version from 14.04 and test it during the next 
upgrade cycle (3.0.10 should be imminent).
I will keep you posted.

> Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to 
> blame)
> --
>
>         Key: CASSANDRA-11723
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-11723
>     Project: Cassandra
>  Issue Type: Bug
>    Reporter: Stefano Ortolani
>     Fix For: 3.0.x
>
>
> Upgrade seems fine, but any restart of the node might lead to a situation 
> where the node just dies after 30 seconds / 1 minute. 
> Nothing in the logs besides many "FailureDetector.java:456 - Ignoring 
> interval time of 3000892567 for /10.12.a.x" output every second (against all 
> other nodes) in debug.log plus some spurious GraphiteErrors/ReadRepair 
> notifications:
> {code:xml}
> DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
> Ignoring interval time of 2373187360 for /10.12.a.x
> DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - 
> Ignoring interval time of 2000276196 for /10.12.a.y
> DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - 
> Digest mismatch:
> org.apache.cassandra.service.DigestMismatchException: Mismatch for key 
> DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) 
> (d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718)
>   at 
> org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) 
> ~[apache-cassandra-3.0.5.jar:3.0.5]
>   at 
> org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - 
> Ignoring interval time of 3000299340 for /10.12.33.5
> ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 
> ScheduledReporter.java:119 - RuntimeException thrown from 
> GraphiteReporter#report. Exception was suppressed.
> java.lang.IllegalStateException: Unable to compute ceiling for max when 
> histogram overflowed
>   at 
> org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>   at 
> org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103)
>  ~[apache-cassandra-3.0.5.jar:3.0.5]
>   at 
> com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252)
>  ~[metrics-graphite-3.1.0.jar:3.1.0]
>   at 
> com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166)
>  ~[metrics-graphite-3.1.0.jar:3.1.0]
>   at 
> com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) 
> ~[metrics-core-3.1.0.jar:3.1.0]
>   at 
> com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) 
> ~[metrics-core-3.1.0.jar:3.1.0]
>   at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) 
> [na:1.8.0_60]
>   at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) 
> [na:1.8.0_60]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
>  [na:1.8.0_60]
>   at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
>  [na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  [na:1.8.0_60]
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  [na:1.8.0_60]
>   at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60]
> {code}
> I know this is not much but nothing else gets to dmesg or to any other log. 
> Any suggestion how to debug this further?
> I upgraded two nodes so far, and it happened on both nodes.
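
To get an overview of the repeated FailureDetector messages quoted above, the ignored intervals can be pulled out of debug.log with a few lines of Python. This is a minimal sketch: the helper name `ignored_intervals` is hypothetical, and it assumes the logged interval is in nanoseconds, which is consistent with the values shown (roughly 2 to 3 seconds) but is not stated in the report.

```python
import re

# Matches the message text as it appears in the quoted debug.log excerpt.
PATTERN = re.compile(r"Ignoring interval time of (\d+) for (/\S+)")


def ignored_intervals(log_text):
    """Yield (peer, seconds) for every 'Ignoring interval time' message.

    Assumption: the logged value is in nanoseconds.
    """
    # Log entries may be wrapped and quoted with "> ", so drop the quote
    # prefix and join everything into a single line before matching.
    flat = " ".join(line.lstrip("> ").strip() for line in log_text.splitlines())
    for match in PATTERN.finditer(flat):
        yield match.group(2), int(match.group(1)) / 1e9


if __name__ == "__main__":
    sample = (
        "DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 -\n"
        "Ignoring interval time of 2373187360 for /10.12.a.x\n"
    )
    for peer, seconds in ignored_intervals(sample):
        print(f"{peer}: {seconds:.2f}s")
```

Grouping the result by peer would show whether the long intervals affect every node or only a few, which may help narrow down whether the restarted node itself is stalling.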





[jira] [Commented] (CASSANDRA-12244) progress in compactionstats is reported wrongly for view builds

2016-11-12 Thread ZhaoYang (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659573#comment-15659573
 ] 

ZhaoYang commented on CASSANDRA-12244:
--

[~muru] thanks for the comments. I updated the commit: 
https://github.com/jasonstack/cassandra/commit/4339a960fe9cd2b42dc80192570d39287d6c9157

> progress in compactionstats is reported wrongly for view builds
> ---
>
>         Key: CASSANDRA-12244
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-12244
>     Project: Cassandra
>  Issue Type: Bug
>    Reporter: Tom van der Woerdt
>    Assignee: ZhaoYang
>    Priority: Minor
>      Labels: lhf
>     Fix For: 3.0.9
>
>
> In the view build progress given by compactionstats, there are several 
> issues:
> {code}
> id                                     compaction type   keyspace     table     completed   total       unit     progress
> 038d3690-4dbe-11e6-b207-21ec388d48e6   View build        mykeyspace   mytable   844 bytes   967 bytes   ranges   87.28%
> Active compaction remaining time : n/a
> {code}
> 1) Those are ranges, not bytes.
> 2) It's not at 87.28%, it's at ~4%. The method for calculating progress in 
> Cassandra is wrong: it neglects to sort the tokens it's iterating through 
> (ViewBuilder.java) and thus ends up with a random number.
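
The effect described in point 2 can be illustrated with a small sketch. It is not the ViewBuilder code: the function names are hypothetical, token ranges are modelled as non-wrapping `(start, end]` integer pairs, and the build is assumed to proceed in token order. Counting the ranges that come before the one containing the last built token only gives a meaningful figure if the ranges are sorted first.

```python
def ranges_completed(ranges, last_token):
    """Count the ranges that precede the one containing last_token,
    in the order the ranges are given."""
    for index, (start, end) in enumerate(ranges):
        if start < last_token <= end:
            return index
    return len(ranges)


def view_build_progress(ranges, last_token):
    """Return (completed, total) in units of ranges, sorting the ranges first."""
    ordered = sorted(ranges)
    return ranges_completed(ordered, last_token), len(ordered)


if __name__ == "__main__":
    ranges = [(300, 400), (0, 100), (200, 300), (100, 200)]
    # Without sorting, the position of the current range is arbitrary.
    print(ranges_completed(ranges, 150), "of", len(ranges))
    # With sorting, the count reflects how far the build has actually advanced.
    print(view_build_progress(ranges, 150))
```

With the unsorted list the sketch reports 3 of 4 ranges done, while the sorted version reports 1 of 4, which mirrors the gap between the displayed and the real progress in the report. The result is expressed in ranges, matching point 1.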





[jira] [Resolved] (CASSANDRA-9738) Migrate key-cache to be fully off-heap

2016-11-12 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp resolved CASSANDRA-9738.
-
   Resolution: Won't Fix
Fix Version/s: (was: 3.x)

Closed as won't-fix.

We already have CASSANDRA-11206 to handle large partitions and CASSANDRA-9754 
is also making progress. Both tickets make this one somewhat superfluous.

> Migrate key-cache to be fully off-heap
> --
>
>         Key: CASSANDRA-9738
>         URL: https://issues.apache.org/jira/browse/CASSANDRA-9738
>     Project: Cassandra
>  Issue Type: Sub-task
>    Reporter: Robert Stupp
>    Assignee: Robert Stupp
>
> Key cache still uses a concurrent map on-heap. This could go to off-heap and 
> feels doable now after CASSANDRA-8099.
> Evaluation should be done in advance based on a POC to prove that pure 
> off-heap counter cache buys a performance and/or gc-pressure improvement.
> In theory, elimination of on-heap management of the map should buy us some 
> benefit.





[jira] [Updated] (CASSANDRA-11723) Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to blame)

2016-11-12 Thread Robert Stupp (JIRA)

 [ 
https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Stupp updated CASSANDRA-11723:
-
Status: Awaiting Feedback  (was: Open)






[jira] [Commented] (CASSANDRA-11723) Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to blame)

2016-11-12 Thread Robert Stupp (JIRA)

[ 
https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15659364#comment-15659364
 ] 

Robert Stupp commented on CASSANDRA-11723:
--

[~ostefano], did upgrading to recent jemalloc work for you?



