[jira] [Updated] (CASSANDRA-12906) Update doco with new getting started with contribution section
[ https://issues.apache.org/jira/browse/CASSANDRA-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Slater updated CASSANDRA-12906:
-----------------------------------
    Status: Patch Available  (was: Open)

Draft updates are in the attached patch. Substantial changes are only in the new file (gettingstarted.rst); the other changes just update the index and add labels (anchors) in the existing files for linking.

> Update doco with new getting started with contribution section
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-12906
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12906
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Documentation and Website
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>            Priority: Minor
>         Attachments: CASSANDRA_12906-trunk.patch
>
> Following discussion on the mailing list about how to get more community
> input, it seemed to be agreed that adding some doco emphasising contributions
> other than creating new features would be a good idea.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12906) Update doco with new getting started with contribution section
[ https://issues.apache.org/jira/browse/CASSANDRA-12906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Slater updated CASSANDRA-12906:
-----------------------------------
    Attachment: CASSANDRA_12906-trunk.patch

> Update doco with new getting started with contribution section
> ---------------------------------------------------------------
>
>                 Key: CASSANDRA-12906
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12906
>            Reporter: Ben Slater
>            Assignee: Ben Slater
>         Attachments: CASSANDRA_12906-trunk.patch
[jira] [Created] (CASSANDRA-12906) Update doco with new getting started with contribution section
Ben Slater created CASSANDRA-12906:
-----------------------------------

             Summary: Update doco with new getting started with contribution section
                 Key: CASSANDRA-12906
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12906
             Project: Cassandra
          Issue Type: Improvement
          Components: Documentation and Website
            Reporter: Ben Slater
            Assignee: Ben Slater
            Priority: Minor

Following discussion on the mailing list about how to get more community input, it seemed to be agreed that adding some doco emphasising contributions other than creating new features would be a good idea.
[jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset
[ https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660915#comment-15660915 ]

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 6:21 AM:
---------------------------------------------------------------

[~pauloricardomg] To answer your questions:

*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; about 7 out of 30 nodes did. I used a combination of "java.net.SocketException: Connection reset" and the timestamp at which the "initiator" (the host that was running repair) reported the failure to search the system.log on every node. I can confirm those failures all pointed back to the initiator, e.g.
{noformat}
ERROR [STREAM-IN-/52.220.127.181:7001] 2016-11-10 22:23:31,196 StreamSession.java:529 - [Stream #49719e00-a794-11e6-be90-f1ad7e862a5b] Streaming error occurred on session with peer 52.220.127.181
...
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}

*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended:
{noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}

*Q:* Also, can you paste a full debug.log sample of a node with this error?
*A:* Here is the debug.log file from a remote machine (*Note*: _the IPs/timestamps in the log file differ from those in the original bug report, as this is from another round of testing_): debug.log.2016-11-10_2319.gz

was (Author: bing1wu):
[~pauloricardomg] To answer your questions:

*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; about 7 out of 30 nodes did. I used a combination of "java.net.SocketException: Connection reset" and the timestamp at which the "initiator" (the host that was running repair) reported the failure to search the system.log on every node. I can confirm those failures all pointed back to the initiator, e.g.
{noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}

*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended:
{noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}

*Q:* Also, can you paste a full debug.log sample of a node with this error?
*A:* Here is the debug.log file from a remote machine (*Note*: _the IPs/timestamps in the log file differ from those in the original bug report, as this is from another round of testing_): debug.log.2016-11-10_2319.gz

> Streaming failed due to SSL Socket connection reset
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bing Wu
>         Attachments: debug.log.2016-11-10_2319.gz
>
> While running "nodetool repair", I see many instances of
> "javax.net.ssl.SSLException: java.net.SocketException: Connection reset" in
> system.logs on some nodes in the cluster. Timestamps correspond to the
> streaming source/initiator's error messages of "sync failed between ..."
> Setup:
> - Cassandra 3.7.01
> - CentOS 6.7 in AWS (multi-region)
> - JDK version:
> {noformat}
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> {noformat}
> - cassandra.yaml:
> {noformat}
> server_encryption_options:
>     internode_encryption: all
>     keystore: [path]
>     keystore_password: [password]
>     truststore: [path]
>     truststore_password: [password]
>     # More advanced defaults below:
>     # protocol: TLS
>     # algorithm: SunX509
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     require_client_auth: false
> {noformat}
> Error messages in system.log on the target host:
> {noformat}
> ERROR [STREAM-OUT-/54.247.111.232:7001] 2016-11-07 07:30:56,475 StreamSession.java:529 - [Stream #e14abcb0-a4bb-11e6-9758-55b9ac38b78e] Streaming error occurred on session with peer 54.247.111.232
> javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
>     at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1541) ~[na:1.8.0_102]
>     at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553) ~[na:1.8.0_102]
>     at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71) ~[na:1.8.0_102]
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_102]
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_102]
>     at org.apache.cassandra.io.util.WrappedDataOutputStreamPlus.flush(WrappedDataOutputStreamPlus.java:66) ~[apache-cassandra-3.7.0.jar:3.7.0]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:371) [apache-cassandra-3.7.0.jar:3.7.0]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) [apache-cassandra-3.7.0.jar:3.7.0]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
> Caused by: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
> {noformat}
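The log correlation described in the comment above, matching the initiator's failure timestamp against "Connection reset" lines in each node's system.log, can be sketched as a small shell helper. The function name, demo log path, and sample log lines below are illustrative placeholders, not taken from the ticket; on a real cluster the input would be each node's own system.log.

```shell
#!/bin/sh
# Sketch of the search described above: filter a node's system.log for
# connection-reset lines, then narrow to the minute at which the repair
# initiator reported "sync failed". Paths and log contents are placeholders.
find_resets() {
    logfile=$1      # e.g. /var/log/cassandra/system.log on each node
    ts_prefix=$2    # e.g. "2016-11-10 22:23" from the initiator's log
    grep 'java.net.SocketException: Connection reset' "$logfile" | grep "$ts_prefix"
}

# demo against a synthetic two-line log
cat > /tmp/demo_system.log <<'EOF'
ERROR [STREAM-IN-/10.0.0.1:7001] 2016-11-10 22:23:31,196 StreamSession.java:529 - javax.net.ssl.SSLException: java.net.SocketException: Connection reset
INFO  [CompactionExecutor:1] 2016-11-10 22:24:00,000 unrelated activity
EOF

find_resets /tmp/demo_system.log "2016-11-10 22:23"   # prints only the ERROR line
```

Run per node (for example over ssh in a loop) and note which hosts the surviving lines point back to; in the report above they all pointed at the repair initiator.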
[jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset
[ https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660915#comment-15660915 ]

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 6:03 AM:
---------------------------------------------------------------

[~pauloricardomg] To answer your questions:

*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; about 7 out of 30 nodes did. I used a combination of "java.net.SocketException: Connection reset" and the timestamp at which the "initiator" (the host that was running repair) reported the failure to search the system.log on every node. I can confirm those failures all pointed back to the initiator, e.g.
{noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}

*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended:
{noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}

*Q:* Also, can you paste a full debug.log sample of a node with this error?
*A:* Here is the debug.log file from a remote machine (*Note*: _the IPs/timestamps in the log file differ from those in the original bug report, as this is from another round of testing_): debug.log.2016-11-10_2319.gz

was (Author: bing1wu):
[~pauloricardomg] To answer your questions:

*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; about 7 out of 30 nodes did. I used a combination of "java.net.SocketException: Connection reset" and the timestamp at which the "initiator" (the host that was running repair) reported the failure to search the system.log on every node. I can confirm those failures all pointed back to the initiator, e.g.
{noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}

*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended:
{noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}

*Q:* Also, can you paste a full debug.log sample of a node with this error?
*A:* Here is the debug.log file on a remote machine (not the initiator): debug.log.2016-11-10_2319.gz

> Streaming failed due to SSL Socket connection reset
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bing Wu
>         Attachments: debug.log.2016-11-10_2319.gz
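The keepalive values quoted in the answer above (60 / 3 / 10) can be checked mechanically against a sysctl dump. The helper below is an illustrative sketch (the function name is invented, not from the ticket); it parses `sysctl`-style output and reports whether the three settings match those values.

```shell
#!/bin/sh
# Sketch: compare a sysctl-style dump against the keepalive values quoted
# above (time=60, probes=3, intvl=10). Prints "OK" or "NOT OK".
check_keepalive() {
    # $1 = text in "net.ipv4.tcp_keepalive_time = 60" format
    echo "$1" | awk '
        /tcp_keepalive_time/   && $3 != 60 { bad = 1 }
        /tcp_keepalive_probes/ && $3 != 3  { bad = 1 }
        /tcp_keepalive_intvl/  && $3 != 10 { bad = 1 }
        END { print (bad ? "NOT OK" : "OK") }'
}

# demo with the values pasted in the comment; on a live node you would feed in
#   sudo sysctl -A | grep net.ipv4.tcp_keep
check_keepalive "net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10"   # prints "OK"
```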
[jira] [Issue Comment Deleted] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset
[ https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Wu updated CASSANDRA-12886:
--------------------------------
    Comment: was deleted

(was: I lost track of the debug.log and some of the system.logs. And the admins just added the tcp_keepalive settings. So I will try to repro this issue from a clean slate.)

> Streaming failed due to SSL Socket connection reset
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bing Wu
>         Attachments: debug.log.2016-11-10_2319.gz
>
> While running "nodetool repair", I see many instances of
> "javax.net.ssl.SSLException: java.net.SocketException: Connection reset" in
> system.logs on some nodes in the cluster. Timestamps correspond to the
> streaming source/initiator's error messages of "sync failed between ..."
> Setup:
> - Cassandra 3.7.01
> - CentOS 6.7 in AWS (multi-region)
> - JDK version:
> {noformat}
> java version "1.8.0_102"
> Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
> {noformat}
> - cassandra.yaml:
> {noformat}
> server_encryption_options:
>     internode_encryption: all
>     keystore: [path]
>     keystore_password: [password]
>     truststore: [path]
>     truststore_password: [password]
>     # More advanced defaults below:
>     # protocol: TLS
>     # algorithm: SunX509
>     # store_type: JKS
>     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
>     require_client_auth: false
> {noformat}
> Error messages in system.log on the target host:
> {noformat}
> ERROR [STREAM-OUT-/54.247.111.232:7001] 2016-11-07 07:30:56,475 StreamSession.java:529 - [Stream #e14abcb0-a4bb-11e6-9758-55b9ac38b78e] Streaming error occurred on session with peer 54.247.111.232
> javax.net.ssl.SSLException: Connection has been shutdown: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
>     at sun.security.ssl.SSLSocketImpl.checkEOF(SSLSocketImpl.java:1541) ~[na:1.8.0_102]
>     at sun.security.ssl.SSLSocketImpl.checkWrite(SSLSocketImpl.java:1553) ~[na:1.8.0_102]
>     at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:71) ~[na:1.8.0_102]
>     at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82) ~[na:1.8.0_102]
>     at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140) ~[na:1.8.0_102]
>     at org.apache.cassandra.io.util.WrappedDataOutputStreamPlus.flush(WrappedDataOutputStreamPlus.java:66) ~[apache-cassandra-3.7.0.jar:3.7.0]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.sendMessage(ConnectionHandler.java:371) [apache-cassandra-3.7.0.jar:3.7.0]
>     at org.apache.cassandra.streaming.ConnectionHandler$OutgoingMessageHandler.run(ConnectionHandler.java:342) [apache-cassandra-3.7.0.jar:3.7.0]
>     at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
> Caused by: javax.net.ssl.SSLException: java.net.SocketException: Connection reset
> {noformat}
[jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset
[ https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660915#comment-15660915 ]

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 5:46 AM:
---------------------------------------------------------------

[~pauloricardomg] To answer your questions:

*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; about 7 out of 30 nodes did. I used a combination of "java.net.SocketException: Connection reset" and the timestamp at which the "initiator" (the host that was running repair) reported the failure to search the system.log on every node. I can confirm those failures all pointed back to the initiator, e.g.
{noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}

*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended:
{noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}

*Q:* Also, can you paste a full debug.log sample of a node with this error?
*A:* Here is the debug.log file on a remote machine (not the initiator): debug.log.2016-11-10_2319.gz

was (Author: bing1wu):
[~pauloricardomg] To answer your questions:

*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; about 7 out of 30 nodes did. I used a combination of "java.net.SocketException: Connection reset" and the timestamp at which the "initiator" (the host that was running repair) reported the failure to search the system.log on every node. I can confirm those failures all pointed back to the initiator, e.g.
{noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}

*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended:
{noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}

*Q:* Also, can you paste a full debug.log sample of a node with this error?
*A:* Here is the debug.log file on a remote machine (not the initiator)

> Streaming failed due to SSL Socket connection reset
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bing Wu
>         Attachments: debug.log.2016-11-10_2319.gz
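The keystore and truststore that server_encryption_options points at have to be created out of band. Below is an illustrative sketch of one way to do that with the JDK's keytool, using a self-signed certificate per node; the alias, DN, passwords, and filenames are placeholders, and production clusters would more commonly use CA-signed certificates.

```shell
#!/bin/sh
# Sketch only: build a per-node JKS keystore (matching store_type: JKS above)
# plus a shared truststore. All names and passwords are placeholders.
command -v keytool >/dev/null 2>&1 || { echo "keytool (JDK) not found; skipping"; exit 0; }

cd "$(mktemp -d)"

# per-node key pair in a JKS keystore
keytool -genkeypair -keyalg RSA -keysize 2048 -validity 365 \
        -alias node1 -dname "CN=node1, OU=cluster, O=example, C=US" \
        -keystore node1-keystore.jks -storepass changeit -keypass changeit

# export the node's public certificate and import it into a shared truststore
keytool -exportcert -alias node1 -keystore node1-keystore.jks \
        -storepass changeit -file node1.cer
keytool -importcert -noprompt -alias node1 -file node1.cer \
        -keystore cluster-truststore.jks -storepass changeit

ls node1-keystore.jks cluster-truststore.jks
```

Each node's certificate would be imported into every node's truststore (or a single truststore distributed to all nodes), and the resulting paths and passwords go into the keystore/truststore fields of cassandra.yaml.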
[jira] [Updated] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset
[ https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bing Wu updated CASSANDRA-12886:
--------------------------------
    Attachment: debug.log.2016-11-10_2319.gz

> Streaming failed due to SSL Socket connection reset
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bing Wu
>         Attachments: debug.log.2016-11-10_2319.gz
[jira] [Commented] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset
[ https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15660915#comment-15660915 ]

Bing Wu commented on CASSANDRA-12886:
-------------------------------------

[~pauloricardomg] To answer your questions:

*Q:* Can you check if the source/destination STREAM-(IN/OUT)-IP of failed streams is the same throughout the cluster?
*A:* Yes. More details: not all nodes in the cluster reported the SSL failure; about 7 out of 30 nodes did. I used a combination of "java.net.SocketException: Connection reset" and the timestamp at which the "initiator" (the host that was running repair) reported the failure to search the system.log on every node. I can confirm those failures all pointed back to the initiator, e.g.
{noformat}
ERROR [StreamConnectionEstablisher:6] 2016-11-10 22:23:30,303 StreamSession.java:529 - [Stream #496c6de0-a794-11e6-bf13-7df2869901ea] Streaming error occurred on session with peer *initiator-public-ip*
{noformat}

*Q:* What are your tcp_keepalive settings? (see tuning guide here)
*A:* They are what's recommended:
{noformat}
$ sudo sysctl -A | grep net.ipv4.tcp_keep
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_probes = 3
net.ipv4.tcp_keepalive_intvl = 10
{noformat}

*Q:* Also, can you paste a full debug.log sample of a node with this error?
*A:* Here is the debug.log file on a remote machine (not the initiator)

> Streaming failed due to SSL Socket connection reset
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bing Wu
[jira] [Comment Edited] (CASSANDRA-12886) Streaming failed due to SSL Socket connection reset
[ https://issues.apache.org/jira/browse/CASSANDRA-12886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15653288#comment-15653288 ]

Bing Wu edited comment on CASSANDRA-12886 at 11/13/16 5:29 AM:
---------------------------------------------------------------

I lost track of the debug.log and some of the system.logs. And the admins just added the tcp_keepalive settings. So I will try to repro this issue from a clean slate.

was (Author: bing1wu):
[~pjrmoreira] I lost track of the debug.log and some of the system.logs. And the admins just added the tcp_keepalive settings. So I will try to repro this issue from a clean slate.

> Streaming failed due to SSL Socket connection reset
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12886
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12886
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Bing Wu
[jira] [Updated] (CASSANDRA-10726) Read repair inserts should not be blocking
[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jason Brown updated CASSANDRA-10726:
------------------------------------
    Assignee: Xiaolong Jiang  (was: Nachiket Patil)

> Read repair inserts should not be blocking
> -------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Xiaolong Jiang
>
> Today, if there's a digest mismatch in a foreground read repair, the insert
> to update out-of-date replicas is blocking. This means that if it fails, the
> read fails with a timeout. If a node is dropping writes (maybe it is
> overloaded, or the mutation stage is backed up for some other reason), all
> reads to a replica set could fail. Further, replicas dropping writes get more
> out of sync, so they will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not
> be blocking, or we should return success for the read even if the write times
> out.
[jira] [Resolved] (CASSANDRA-12813) NPE in auth for bootstrapping node
[ https://issues.apache.org/jira/browse/CASSANDRA-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alex Petrov resolved CASSANDRA-12813. - Resolution: Fixed > NPE in auth for bootstrapping node > -- > > Key: CASSANDRA-12813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12813 > Project: Cassandra > Issue Type: Bug >Reporter: Charles Mims >Assignee: Alex Petrov > Fix For: 2.2.9, 3.0.10, 3.10 > > > {code} > ERROR [SharedPool-Worker-1] 2016-10-19 21:40:25,991 Message.java:617 - > Unexpected exception during request; channel = [id: 0x15eb017f, / omitted>:40869 => /10.0.0.254:9042] > java.lang.NullPointerException: null > at > org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:144) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:86) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:182) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513) > [apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407) > [apache-cassandra-3.0.9.jar:3.0.9] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > 
io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_101] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [apache-cassandra-3.0.9.jar:3.0.9] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-3.0.9.jar:3.0.9] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101] > {code} > I have a node that has been joining for around 24 hours. My application is > configured with the IP address of the joining node in the list of nodes to > connect to (ruby driver), and I have been getting around 200 events of this > NPE per hour. I removed the IP of the joining node from the list of nodes > for my app to connect to and the errors stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
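The NPE itself is the familiar pattern of dereferencing a credentials lookup that came back empty — here because auth state is not fully available while the node is still bootstrapping. An illustrative guard, as a Python sketch (not the actual PasswordAuthenticator code; the function and names are hypothetical):

```python
class AuthenticationException(Exception):
    pass

def get_authenticated_user(lookup_credentials, username, password):
    """Fail with a clear authentication error instead of dereferencing a
    missing record -- the rough analogue of guarding the null that surfaced
    as an NPE in doAuthenticate on a still-bootstrapping node."""
    record = lookup_credentials(username)
    if record is None:
        raise AuthenticationException(
            "credentials for %r unavailable (unknown user or auth not ready)" % username)
    if record != password:  # stand-in for a real salted-hash comparison
        raise AuthenticationException("bad credentials")
    return username
```

On the client side, the practical workaround is what the reporter did: keep joining nodes out of the driver's contact list until they finish bootstrapping.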
[jira] [Commented] (CASSANDRA-12813) NPE in auth for bootstrapping node
[ https://issues.apache.org/jira/browse/CASSANDRA-12813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15660436#comment-15660436 ] Alex Petrov commented on CASSANDRA-12813: - Fixed dtests to catch the problem [here|https://github.com/riptano/cassandra-dtest/pull/1382]. > NPE in auth for bootstrapping node > -- > > Key: CASSANDRA-12813 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12813 > Project: Cassandra > Issue Type: Bug >Reporter: Charles Mims >Assignee: Alex Petrov > Fix For: 2.2.9, 3.0.10, 3.10 > > > {code} > ERROR [SharedPool-Worker-1] 2016-10-19 21:40:25,991 Message.java:617 - > Unexpected exception during request; channel = [id: 0x15eb017f, / omitted>:40869 => /10.0.0.254:9042] > java.lang.NullPointerException: null > at > org.apache.cassandra.auth.PasswordAuthenticator.doAuthenticate(PasswordAuthenticator.java:144) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator.authenticate(PasswordAuthenticator.java:86) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator.access$100(PasswordAuthenticator.java:54) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.auth.PasswordAuthenticator$PlainTextSaslAuthenticator.getAuthenticatedUser(PasswordAuthenticator.java:182) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.messages.AuthResponse.execute(AuthResponse.java:78) > ~[apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:513) > [apache-cassandra-3.0.9.jar:3.0.9] > at > org.apache.cassandra.transport.Message$Dispatcher.channelRead0(Message.java:407) > [apache-cassandra-3.0.9.jar:3.0.9] > at > io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333) > 
[netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext.access$700(AbstractChannelHandlerContext.java:32) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > io.netty.channel.AbstractChannelHandlerContext$8.run(AbstractChannelHandlerContext.java:324) > [netty-all-4.0.23.Final.jar:4.0.23.Final] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_101] > at > org.apache.cassandra.concurrent.AbstractLocalAwareExecutorService$FutureTask.run(AbstractLocalAwareExecutorService.java:164) > [apache-cassandra-3.0.9.jar:3.0.9] > at org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > [apache-cassandra-3.0.9.jar:3.0.9] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_101] > {code} > I have a node that has been joining for around 24 hours. My application is > configured with the IP address of the joining node in the list of nodes to > connect to (ruby driver), and I have been getting around 200 events of this > NPE per hour. I removed the IP of the joining node from the list of nodes > for my app to connect to and the errors stopped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-12905) streaming issues with 3.9 (repair)
[ https://issues.apache.org/jira/browse/CASSANDRA-12905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nir Zilka updated CASSANDRA-12905: -- Description: Hello, I performed two upgrades to the current cluster (currently 15 nodes): the first was to 2.2.5.1 and repair worked flawlessly, the second was to 3.0.9 (with upgradesstables) and repair also worked well; then I upgraded 2 weeks ago to 3.9 - and the repair problems started. There are several error types in the system.log (different nodes): - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation timed out - received only 0 responses - Remote peer xxx.xxx.xxx.xxx failed stream session - Session completed with the following error org.apache.cassandra.streaming.StreamException: Stream failed I use the 3.9 default configuration with cluster-specific settings (3 seeds, GossipingPropertyFileSnitch). streaming_socket_timeout_in_ms is the default (8640). I'm worried about consistency problems while I'm not running repair. Any ideas? Thanks, Nir. was: Hello, I performed two upgrades to the current cluster (currently 15 nodes): the first was to 2.2.5.1 and repair worked flawlessly, the second was to 3.0.9 (with upgradesstables) and repair also worked well; then I upgraded 2 weeks ago to 3.9 - and the repair problems started. There are several error types in the system.log: - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation timed out - received only 0 responses - Remote peer xxx.xxx.xxx.xxx failed stream session - Session completed with the following error org.apache.cassandra.streaming.StreamException: Stream failed I use the 3.9 default configuration with cluster-specific settings (3 seeds, GossipingPropertyFileSnitch). streaming_socket_timeout_in_ms is the default (8640). 
I'm worried about consistency problems while I'm not running repair. Any ideas? Thanks, Nir. > streaming issues with 3.9 (repair) > -- > > Key: CASSANDRA-12905 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12905 > Project: Cassandra > Issue Type: Bug > Components: Streaming and Messaging > Environment: centos 6.7 x86_64 >Reporter: Nir Zilka > Fix For: 3.9 > > > Hello, > I performed two upgrades to the current cluster (currently 15 nodes): > the first was to 2.2.5.1 and repair worked flawlessly, > the second was to 3.0.9 (with upgradesstables) and repair also worked > well; > then I upgraded 2 weeks ago to 3.9 - and the repair problems started. > There are several error types in the system.log (different nodes): > - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx > - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation > timed out - received only 0 responses > - Remote peer xxx.xxx.xxx.xxx failed stream session > - Session completed with the following error > org.apache.cassandra.streaming.StreamException: Stream failed > > I use the 3.9 default configuration with cluster-specific settings (3 > seeds, GossipingPropertyFileSnitch). > streaming_socket_timeout_in_ms is the default (8640). > I'm worried about consistency problems while I'm not running repair. > Any ideas? > Thanks, > Nir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CASSANDRA-12905) streaming issues with 3.9 (repair)
Nir Zilka created CASSANDRA-12905: - Summary: streaming issues with 3.9 (repair) Key: CASSANDRA-12905 URL: https://issues.apache.org/jira/browse/CASSANDRA-12905 Project: Cassandra Issue Type: Bug Components: Streaming and Messaging Environment: centos 6.7 x86_64 Reporter: Nir Zilka Fix For: 3.9 Hello, I performed two upgrades to the current cluster (currently 15 nodes): the first was to 2.2.5.1 and repair worked flawlessly, the second was to 3.0.9 (with upgradesstables) and repair also worked well; then I upgraded 2 weeks ago to 3.9 - and the repair problems started. There are several error types in the system.log: - Sync failed between /xxx.xxx.xxx.xxx and /xxx.xxx.xxx.xxx - Streaming error occurred on session with peer xxx.xxx.xxx.xxx Operation timed out - received only 0 responses - Remote peer xxx.xxx.xxx.xxx failed stream session - Session completed with the following error org.apache.cassandra.streaming.StreamException: Stream failed I use the 3.9 default configuration with cluster-specific settings (3 seeds, GossipingPropertyFileSnitch). streaming_socket_timeout_in_ms is the default (8640). I'm worried about consistency problems while I'm not running repair. Any ideas? Thanks, Nir. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[Cassandra Wiki] Update of "ThirdPartySupport" by winguzone
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Cassandra Wiki" for change notification. The "ThirdPartySupport" page has been changed by winguzone: https://wiki.apache.org/cassandra/ThirdPartySupport?action=diff=49=50 Comment: 3rd party support {{http://www.decisivelabs.com.au/img/platforms/instaclustr-l...@2x.png}} [[https://www.instaclustr.com/?cid=casspp|Instaclustr]] provides managed Apache Cassandra hosting on Amazon Web Services. Instaclustr dramatically reduces administration overheads and support costs by providing automated deployment, backups, cluster balancing and performance tuning. + {{https://winguzone.com/wp-content/uploads/2016/11/IconWiki.png}} [[https://winguzone.com/?utm_source=apwiki|Winguzone]] provides affordable, cost-effective Apache Cassandra clusters in major clouds. {{https://opencredo.com/wp-content/uploads/2013/07/OpenCredo-Logo-Alt-CMYK-Process-Converted-300x72.png}} [[https://opencredo.com|OpenCredo]] is a pragmatic, hands-on software and DevOps consultancy with a wealth of experience in open source technologies. We are DataStax Certified experts and have been working with Cassandra since 2012. Through our real-world experience, we can provide expertise in both Apache Cassandra and DataStax Enterprise. Contact us at i...@opencredo.com
[jira] [Commented] (CASSANDRA-11723) Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to blame)
[ https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659647#comment-15659647 ] Stefano Ortolani commented on CASSANDRA-11723: -- Hi [~snazy], didn't manage to test it yet since the issue was taking place in production only. I might be able to backport the version from 14.04 and test it during the next upgrade cycle (3.0.10 should be imminent). I will keep you posted. > Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to > blame) > -- > > Key: CASSANDRA-11723 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11723 > Project: Cassandra > Issue Type: Bug >Reporter: Stefano Ortolani > Fix For: 3.0.x > > > Upgrade seems fine, but any restart of the node might lead to a situation > where the node just dies after 30 seconds / 1 minute. > Nothing in the logs besides many "FailureDetector.java:456 - Ignoring > interval time of 3000892567 for /10.12.a.x" output every second (against all > other nodes) in debug.log plus some spurious GraphiteErrors/ReadRepair > notifications: > {code:xml} > DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - > Ignoring interval time of 2373187360 for /10.12.a.x > DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - > Ignoring interval time of 2000276196 for /10.12.a.y > DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - > Digest mismatch: > org.apache.cassandra.service.DigestMismatchException: Mismatch for key > DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) > (d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718) > at > org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - > Ignoring interval time of 3000299340 for /10.12.33.5 > ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 > ScheduledReporter.java:119 - RuntimeException thrown from > GraphiteReporter#report. Exception was suppressed. > java.lang.IllegalStateException: Unable to compute ceiling for max when > histogram overflowed > at > org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252) > ~[metrics-graphite-3.1.0.jar:3.1.0] > at > com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166) > ~[metrics-graphite-3.1.0.jar:3.1.0] > at > com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) > ~[metrics-core-3.1.0.jar:3.1.0] > at > com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) > ~[metrics-core-3.1.0.jar:3.1.0] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_60] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [na:1.8.0_60] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [na:1.8.0_60] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > 
[na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {code} > I know this is not much but nothing else gets to dmesg or to any other log. > Any suggestion how to debug this further? > I upgraded two nodes so far, and it happened on both nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
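The "histogram overflowed" error in the metrics reporter is worth understanding separately from the jemalloc instability: Cassandra's EstimatedHistogram uses bucket boundaries that grow by roughly 20% per step, and a recorded value beyond the last bucket lands in an overflow slot, after which the max (and hence the Graphite report) can no longer be computed from the buckets. A simplified sketch of that layout (assumption: the ~1.2 growth factor, modelled loosely on EstimatedHistogram, not a copy of it):

```python
def bucket_offsets(n):
    """Bucket boundaries growing ~20% per step, in the spirit of
    Cassandra's EstimatedHistogram (simplified sketch)."""
    offsets = [1]
    while len(offsets) < n:
        last = offsets[-1]
        offsets.append(max(last + 1, int(round(last * 1.2))))
    return offsets

def record(counts, offsets, value):
    """Place value into its bucket; return True when it overflows past the
    last boundary, after which a bucket-based max is meaningless."""
    for i, bound in enumerate(offsets):
        if value <= bound:
            counts[i] += 1
            return False
    return True  # overflowed
```

In the report above the overflow is a symptom (some latency or size metric blew past the largest bucket during the instability), not the cause of the node dying.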
[jira] [Commented] (CASSANDRA-12244) progress in compactionstats is reported wrongly for view builds
[ https://issues.apache.org/jira/browse/CASSANDRA-12244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659573#comment-15659573 ] ZhaoYang commented on CASSANDRA-12244: -- [~muru] thanks for the comments. I updated the commit: https://github.com/jasonstack/cassandra/commit/4339a960fe9cd2b42dc80192570d39287d6c9157 > progress in compactionstats is reported wrongly for view builds > --- > > Key: CASSANDRA-12244 > URL: https://issues.apache.org/jira/browse/CASSANDRA-12244 > Project: Cassandra > Issue Type: Bug >Reporter: Tom van der Woerdt >Assignee: ZhaoYang >Priority: Minor > Labels: lhf > Fix For: 3.0.9 > > > In the view build progress given by compactionstats, there are several issues > : > {code} > id compaction type keyspace > table completed total unit progress >038d3690-4dbe-11e6-b207-21ec388d48e6View build mykeyspace > mytable 844 bytes 967 bytes ranges 87.28% > Active compaction remaining time :n/a > {code} > 1) those are ranges, not bytes > 2) it's not at 87.28%, it's at ~4%. the method for calculating progress in > Cassandra is wrong: it neglects to sort the tokens it's iterating through > (ViewBuilder.java) and thus ends up with a random number. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
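The second issue in the ticket — progress computed while iterating tokens in unsorted order — is easy to reproduce in miniature. If the reported fraction is derived from each visited token's position in the token range but the tokens arrive in arbitrary order, the number jumps around instead of climbing monotonically (illustrative sketch; the token values are made up):

```python
def progress_after_each(tokens):
    """Report 'progress' after each processed token as that token's relative
    position in the token range -- roughly what a ring-position-based
    estimate does."""
    lo, hi = min(tokens), max(tokens)
    span = float(hi - lo) or 1.0
    return [(t - lo) / span for t in tokens]

tokens = [812, 41, 607, 233, 999, 105, 478, 720, 2, 356]  # arbitrary order

unsorted_progress = progress_after_each(tokens)        # jumps up and down
sorted_progress = progress_after_each(sorted(tokens))  # climbs 0.0 -> 1.0
```

Sorting before iterating (the fix the ticket points at in ViewBuilder.java) makes the estimate monotonic; reporting ranges with a "bytes" unit label is the separate first issue.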
[jira] [Resolved] (CASSANDRA-9738) Migrate key-cache to be fully off-heap
[ https://issues.apache.org/jira/browse/CASSANDRA-9738?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp resolved CASSANDRA-9738. - Resolution: Won't Fix Fix Version/s: (was: 3.x) Closed as won't-fix. We already have CASSANDRA-11206 to handle large partitions and CASSANDRA-9754 is also making progress. Both tickets make this one somehow superfluous. > Migrate key-cache to be fully off-heap > -- > > Key: CASSANDRA-9738 > URL: https://issues.apache.org/jira/browse/CASSANDRA-9738 > Project: Cassandra > Issue Type: Sub-task >Reporter: Robert Stupp >Assignee: Robert Stupp > > Key cache still uses a concurrent map on-heap. This could go to off-heap and > feels doable now after CASSANDRA-8099. > Evaluation should be done in advance based on a POC to prove that pure > off-heap counter cache buys a performance and/or gc-pressure improvement. > In theory, elimination of on-heap management of the map should buy us some > benefit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (CASSANDRA-11723) Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to blame)
[ https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Stupp updated CASSANDRA-11723: - Status: Awaiting Feedback (was: Open) > Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to > blame) > -- > > Key: CASSANDRA-11723 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11723 > Project: Cassandra > Issue Type: Bug >Reporter: Stefano Ortolani > Fix For: 3.0.x > > > Upgrade seems fine, but any restart of the node might lead to a situation > where the node just dies after 30 seconds / 1 minute. > Nothing in the logs besides many "FailureDetector.java:456 - Ignoring > interval time of 3000892567 for /10.12.a.x" output every second (against all > other nodes) in debug.log plus some spurious GraphiteErrors/ReadRepair > notifications: > {code:xml} > DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - > Ignoring interval time of 2373187360 for /10.12.a.x > DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - > Ignoring interval time of 2000276196 for /10.12.a.y > DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - > Digest mismatch: > org.apache.cassandra.service.DigestMismatchException: Mismatch for key > DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) > (d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718) > at > org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 
FailureDetector.java:456 - > Ignoring interval time of 3000299340 for /10.12.33.5 > ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 > ScheduledReporter.java:119 - RuntimeException thrown from > GraphiteReporter#report. Exception was suppressed. > java.lang.IllegalStateException: Unable to compute ceiling for max when > histogram overflowed > at > org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252) > ~[metrics-graphite-3.1.0.jar:3.1.0] > at > com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166) > ~[metrics-graphite-3.1.0.jar:3.1.0] > at > com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) > ~[metrics-core-3.1.0.jar:3.1.0] > at > com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) > ~[metrics-core-3.1.0.jar:3.1.0] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_60] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [na:1.8.0_60] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [na:1.8.0_60] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {code} > I know this is not much but nothing else gets to dmesg or to any other log. > Any suggestion how to debug this further? 
> I upgraded two nodes so far, and it happened on both nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11723) Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to blame)
[ https://issues.apache.org/jira/browse/CASSANDRA-11723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15659364#comment-15659364 ] Robert Stupp commented on CASSANDRA-11723: -- [~ostefano], did upgrading to recent jemalloc work for you? > Cassandra upgrade from 2.1.11 to 3.0.5 leads to unstable nodes (jemalloc to > blame) > -- > > Key: CASSANDRA-11723 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11723 > Project: Cassandra > Issue Type: Bug >Reporter: Stefano Ortolani > Fix For: 3.0.x > > > Upgrade seems fine, but any restart of the node might lead to a situation > where the node just dies after 30 seconds / 1 minute. > Nothing in the logs besides many "FailureDetector.java:456 - Ignoring > interval time of 3000892567 for /10.12.a.x" output every second (against all > other nodes) in debug.log plus some spurious GraphiteErrors/ReadRepair > notifications: > {code:xml} > DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - > Ignoring interval time of 2373187360 for /10.12.a.x > DEBUG [GossipStage:1] 2016-05-05 22:29:03,921 FailureDetector.java:456 - > Ignoring interval time of 2000276196 for /10.12.a.y > DEBUG [ReadRepairStage:24] 2016-05-05 22:29:03,990 ReadCallback.java:234 - > Digest mismatch: > org.apache.cassandra.service.DigestMismatchException: Mismatch for key > DecoratedKey(-152946356843306763, e859fdd2f264485f42030ce261e4e12e) > (d6e617ece3b7bec6138b52b8974b8cab vs 31becca666a62b3c4b2fc0bab9902718) > at > org.apache.cassandra.service.DigestResolver.resolve(DigestResolver.java:85) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.service.ReadCallback$AsyncRepairRunner.run(ReadCallback.java:225) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > 
DEBUG [GossipStage:1] 2016-05-05 22:29:04,841 FailureDetector.java:456 - > Ignoring interval time of 3000299340 for /10.12.33.5 > ERROR [metrics-graphite-reporter-1-thread-1] 2016-05-05 22:29:05,692 > ScheduledReporter.java:119 - RuntimeException thrown from > GraphiteReporter#report. Exception was suppressed. > java.lang.IllegalStateException: Unable to compute ceiling for max when > histogram overflowed > at > org.apache.cassandra.utils.EstimatedHistogram.rawMean(EstimatedHistogram.java:231) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > org.apache.cassandra.metrics.EstimatedHistogramReservoir$HistogramSnapshot.getMean(EstimatedHistogramReservoir.java:103) > ~[apache-cassandra-3.0.5.jar:3.0.5] > at > com.codahale.metrics.graphite.GraphiteReporter.reportHistogram(GraphiteReporter.java:252) > ~[metrics-graphite-3.1.0.jar:3.1.0] > at > com.codahale.metrics.graphite.GraphiteReporter.report(GraphiteReporter.java:166) > ~[metrics-graphite-3.1.0.jar:3.1.0] > at > com.codahale.metrics.ScheduledReporter.report(ScheduledReporter.java:162) > ~[metrics-core-3.1.0.jar:3.1.0] > at > com.codahale.metrics.ScheduledReporter$1.run(ScheduledReporter.java:117) > ~[metrics-core-3.1.0.jar:3.1.0] > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) > [na:1.8.0_60] > at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) > [na:1.8.0_60] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) > [na:1.8.0_60] > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) > [na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > [na:1.8.0_60] > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > [na:1.8.0_60] > at java.lang.Thread.run(Thread.java:745) [na:1.8.0_60] > {code} > I know this is not much but nothing else gets to dmesg or to any other log. 
> Any suggestion how to debug this further? > I upgraded two nodes so far, and it happened on both nodes. -- This message was sent by Atlassian JIRA (v6.3.4#6332)