Re: Issue with New Production Cluster

2017-09-29 Thread Michael Shuler
"Connection reset by peer" is almost certainly network issues. Same error:

https://github.com/netty/netty/issues/5993

mtr - ping/trace tool to find possible flaky switch/router
tcpdump and/or wireshark - tools to gather and observe network packets

-- 
Michael

On 09/29/2017 10:38 AM, Thakrar, Jayesh wrote:
> Other things you can do to test for network issues (as a non-sysadmin
> user) include the following commands:
> 
>  
> 
> "netstat -i"
> 
> This will show all the network interfaces (including any bonded
> interfaces) and the in/out traffic (as rx/tx) and any errors in them.
> 
> Note that these are cumulative counters, so you need to do a number of
> samplings to ensure that the errors, if any, are not old ones.
> 
>  
> 
> "netstat -s"
> 
> This shows more extended network-level stats; the output varies by
> OS flavor and version.
> 
> But many of the stats are kind of self-explanatory.
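
Because the `netstat -i` counters are cumulative, one way to tell old errors from new ones is to take two samples and compare them. The following is only a minimal sketch in Python, assuming a Linux host where `/proc/net/dev` exposes per-interface rx/tx error and drop counters. The function names (`parse_proc_net_dev`, `counter_deltas`) are hypothetical and not from this thread.

```python
# Column positions (after the "iface:" prefix) in Linux /proc/net/dev.
FIELDS = {"rx_errs": 2, "rx_drop": 3, "tx_errs": 10, "tx_drop": 11}


def parse_proc_net_dev(text):
    """Return {interface: {counter_name: value}} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        values = rest.split()
        if len(values) < 12:
            continue  # not a counter row we understand
        counters[name.strip()] = {key: int(values[idx]) for key, idx in FIELDS.items()}
    return counters


def counter_deltas(before, after):
    """Return only the counters that increased between two samples."""
    deltas = {}
    for iface, now in after.items():
        prev = before.get(iface, {})
        changed = {
            key: value - prev.get(key, 0)
            for key, value in now.items()
            if value > prev.get(key, 0)
        }
        if changed:
            deltas[iface] = changed
    return deltas
```

To use it, read `/proc/net/dev` once, wait a while (say 10 seconds), read it again, and pass both parsed samples to `counter_deltas`. An empty result suggests that any errors shown by `netstat -i` are old ones.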
> 
>  
> 
> *From: *Jeff Jirsa 
> *Date: *Friday, September 29, 2017 at 10:32 AM
> *To: *cassandra 
> *Subject: *Re: Issue with New Production Cluster
> 
>  
> 
> I don't know what logging is available driver side. I'd probably
> write a shell script to ping all three servers to see if I can prove
> there's a network problem outside of Cassandra first.
> 
>  
> 
> On Fri, Sep 29, 2017 at 8:19 AM, Jonathan Baynes
> <jonathan.bay...@tradeweb.com> wrote:
> 
> Thank you Jeff, you are a credit to this community.. very helpful
> 
>  
> 
> Last question: is there any logging I can turn on (especially with
> the Cassandra driver 2.5.0) that would give more insight into what's
> going on, such as whether I'm getting any failed connections at the
> Cassandra layer?
> 
>  
> 
>     Thanks
> 
> J
> 
>  
> 
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* 29 September 2017 16:07
> 
> 
> *To:* cassandra
> *Subject:* Re: Issue with New Production Cluster
> 
>  
> 
> The failure detector is seeing updates every 2.1-2.5  seconds, which
> it will ignore because it's over the 2 second default failure
> detector interval.
> 
>  
> 
> If there's no load, there's no reason it shouldn't be seeing it far
> more frequently.
> 
>  
> 
> If you're not seeing any signs of GC pauses (like GCInspector log
> lines), then it seems like you've got a network interface flapping -
> maybe STP is triggering on a port, or a bad cable/interface, or
> something along those lines. 
> 
>  
> 
> On Fri, Sep 29, 2017 at 7:32 AM, Jonathan Baynes
> <jonathan.bay...@tradeweb.com> wrote:
> 
> Hi Jeff,
> 
>  
> 
> This is version 3.0.11. Being run on Oracle Red Hat Linux.
> 
>  
> 
> If I retry immediately it fails; if I leave it for 20 minutes, as I
> have just now, and retry, it works. (?!?!)
> 
> I've checked all the logs (system and debug), and in the logs I have
> this:
> 
>  
> 
> DEBUG [GossipStage:1] 2017-09-29 15:28:04,390
> FailureDetector.java:456 - Ignoring interval time of 2102011879
> for /10.172.181.62
> 
> INFO  [SharedPool-Worker-1] 2017-09-29 15:28:07,382 Message.java:615
> - Unexpected exception during request; channel = [id: 0x963c334d,
> L:/10.172.117.61:9042 - R:/10.172.117.21:51853]
> 
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed: Connection reset by peer
> at io.netty.channel.unix.FileDescriptor.readAddres

Re: Issue with New Production Cluster

2017-09-29 Thread Jeff Jirsa
I don't know what logging is available driver side. I'd probably write
a shell script to ping all three servers to see if I can prove there's a
network problem outside of Cassandra first.
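
A rough sketch of such a check, written in Python rather than shell and using a plain TCP connect instead of an ICMP ping (a swapped-in technique, chosen because it needs no special privileges and exercises the same port the client uses). Port 9042 and the 5-second timeout are taken from the log lines and the cqlsh error in this thread. The function names and the round/pause defaults are illustrative assumptions only.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Try one TCP connect; return (ok, seconds_elapsed, error_text)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:  # covers refused, reset, unreachable and timeout
        return False, time.monotonic() - start, str(exc)


def probe_all(hosts, port=9042, rounds=3, pause=1.0, timeout=5.0):
    """Probe every host `rounds` times; return a list of result tuples."""
    results = []
    for attempt in range(rounds):
        for host in hosts:
            ok, elapsed, error = probe(host, port, timeout)
            results.append((attempt, host, ok, round(elapsed, 3), error))
        if attempt < rounds - 1:
            time.sleep(pause)  # space the rounds out to catch intermittent drops
    return results
```

Calling `probe_all([...])` with the addresses of your own nodes and a large `rounds` value, run from the client machine, should show whether connects fail or slow down intermittently even when Cassandra itself is idle.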

On Fri, Sep 29, 2017 at 8:19 AM, Jonathan Baynes <
jonathan.bay...@tradeweb.com> wrote:

> Thank you Jeff, you are a credit to this community.. very helpful
>
>
>
> Last question: is there any logging I can turn on (especially with the
> Cassandra driver 2.5.0) that would give more insight into what's going
> on, such as whether I'm getting any failed connections at the Cassandra
> layer?
>
>
>
> Thanks
>
> J
>
>
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* 29 September 2017 16:07
>
> *To:* cassandra
> *Subject:* Re: Issue with New Production Cluster
>
>
>
> The failure detector is seeing updates every 2.1-2.5  seconds, which it
> will ignore because it's over the 2 second default failure detector
> interval.
>
>
>
> If there's no load, there's no reason it shouldn't be seeing it far more
> frequently.
>
>
>
> If you're not seeing any signs of GC pauses (like GCInspector log lines),
> then it seems like you've got a network interface flapping - maybe STP is
> triggering on a port, or a bad cable/interface, or something along those
> lines.
>
>
>
> On Fri, Sep 29, 2017 at 7:32 AM, Jonathan Baynes <
> jonathan.bay...@tradeweb.com> wrote:
>
> Hi Jeff,
>
>
>
> This is version 3.0.11. Being run on Oracle Red Hat Linux.
>
>
>
> If I retry immediately it fails; if I leave it for 20 minutes, as I have
> just now, and retry, it works. (?!?!)
>
> I've checked all the logs (system and debug), and in the logs I have this:
>
>
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:04,390 FailureDetector.java:456 -
> Ignoring interval time of 2102011879 for /10.172.181.62
>
> INFO  [SharedPool-Worker-1] 2017-09-29 15:28:07,382 Message.java:615 -
> Unexpected exception during request; channel = [id: 0x963c334d,
> L:/10.172.117.61:9042 - R:/10.172.117.21:51853]
>
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed: Connection reset by peer
> at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source)
> ~[netty-all-4.0.44.Final.jar:4.0.44.Final]
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:07,389 FailureDetector.java:456 -
> Ignoring interval time of 2085147992 for /10.172.117.63
>
> INFO  [SharedPool-Worker-2] 2017-09-29 15:28:07,400 Message.java:615 -
> Unexpected exception during request; channel = [id: 0xa7633234,
> L:/10.172.117.61:9042 ! R:/10.172.117.21:51818]
>
> io.netty.channel.unix.Errors$NativeIoException: syscall:read(...)()
> failed: Connection reset by peer
> at io.netty.channel.unix.FileDescriptor.readAddress(...)(Unknown Source)
> ~[netty-all-4.0.44.Final.jar:4

Re: Issue with New Production Cluster

2017-09-29 Thread Jeff Jirsa
2017-09-29 15:28:30,306 FailureDetector.java:456 -
> Ignoring interval time of 2000326847 for /10.172.181.63
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:32,393 FailureDetector.java:456 -
> Ignoring interval time of 2086445998 for /10.172.181.62
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:32,393 FailureDetector.java:456 -
> Ignoring interval time of 2086481727 for /10.172.181.63
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:35,290 FailureDetector.java:456 -
> Ignoring interval time of 2363746015 for /10.172.117.62
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:35,290 FailureDetector.java:456 -
> Ignoring interval time of 2151813348 for /10.172.117.63
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:37,393 FailureDetector.java:456 -
> Ignoring interval time of 2103432914 for /10.172.181.62
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:40,394 FailureDetector.java:456 -
> Ignoring interval time of 2255155834 for /10.172.117.63
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:41,394 FailureDetector.java:456 -
> Ignoring interval time of 2086966679 for /10.172.117.62
>
> DEBUG [GossipStage:1] 2017-09-29 15:28:41,394 FailureDetector.java:456 -
> Ignoring interval time of 2087037617 for /10.172.181.63
>
> INFO  [HANDSHAKE-/10.172.181.61] 2017-09-29 15:28:47,132
> OutboundTcpConnection.java:523 - Handshaking version with /10.172.181.61
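
The interval values in these DEBUG lines appear to be nanoseconds (2086445998 would be roughly 2.09 seconds), which is consistent with the 2.1-2.5 second range and the 2 second threshold mentioned elsewhere in this thread. A small Python sketch for summarizing them per peer follows. It assumes nanosecond units and unwrapped log lines as they would appear in the raw debug log, and the function names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches the tail of lines such as:
#   ... FailureDetector.java:456 - Ignoring interval time of 2102011879 for /10.172.181.62
PATTERN = re.compile(r"Ignoring interval time of (\d+) for /([0-9.]+)")


def ignored_intervals(log_text):
    """Return {peer_ip: [interval_seconds, ...]} for every matching log line."""
    per_peer = defaultdict(list)
    for nanos, peer in PATTERN.findall(log_text):
        per_peer[peer].append(int(nanos) / 1e9)  # assumption: value is in nanoseconds
    return dict(per_peer)


def summarize(per_peer):
    """Return {peer_ip: (count, min_seconds, max_seconds)}."""
    return {
        peer: (len(values), min(values), max(values))
        for peer, values in per_peer.items()
    }
```

Feeding the contents of the debug log to `summarize(ignored_intervals(text))` gives a quick view of which peers are affected and how far over 2 seconds the gaps run, which may be easier to hand to a network team than raw log lines.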
>
>
>
>
>
> The handshake at 15:28 is about the same time as I just tried it and it
> worked. So is it unable to handshake with the nodes?
>
>
>
> Any suggestions would be great, I have to go to the Network team with
> this, but I’m not sure where to start…
>
>
>
> *From:* Jeff Jirsa [mailto:jji...@gmail.com]
> *Sent:* 29 September 2017 14:54
> *To:* cassandra
> *Subject:* Re: Issue with New Production Cluster
>
>
>
> What version?
>
> If you retry immediately, does it reconnect?
>
> Anything in the logs?
>
>
>
> What you describe is atypical - timeouts on queries can (and will) happen
> occasionally under load, but timeout on connect is atypical. Any sign of
> networking issues/slowness/dns problems/etc?
>
>
>
>
>
> On Fri, Sep 29, 2017 at 6:43 AM, Jonathan Baynes <
> jonathan.bay...@tradeweb.com> wrote:
>
> Hi Community,
>
>
>
> I have a 6 node ring, covering 2 DC’s, the ring isn’t being used yet and
> we are just in the connectivity and testing phase. So the boxes are NOT
> under any load.
>
>
>
> I’ve gone to connect to CQLSH this afternoon and I’ve had this returned:
>
>
>
> cqlsh xx.xxx.xxx.xx -u cassandra -p cassandra
>
> Connection error: ('Unable to connect to any servers', {'xx.xxx.xxx.xx':
> OperationTimedOut('errors=Timed out creating connection (5 seconds),
> last_host=None',)})
>
>
>
> Status of the service is “Running”
>
> Nodetool status is UP for all nodes.
>
> Tested telnet on all ports and they are all up and connecting.
>
> Checked the PID is up for cassandra (it is)
>
>
>
> Any idea how I can debug the cause of this further?
>
>
>
>
>
> Thanks
>
> J
>
>
>
>
>
>
>
> *Jonathan Baynes*
>
> DBA
> Tradeweb Europe Limited
>
> Moor Place  •  1 Fore Street Avenue  •  London EC2Y 9DT

RE: Issue with New Production Cluster

2017-09-29 Thread Jonathan Baynes
 Handshake at 15:28 is about the same time as I just tried it and it worked. 
So is it unable to handshake with the nodes?

Any suggestions would be great, I have to go to the Network team with this, but 
I’m not sure where to start…

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: 29 September 2017 14:54
To: cassandra
Subject: Re: Issue with New Production Cluster

What version?
If you retry immediately, does it reconnect?
Anything in the logs?

What you describe is atypical - timeouts on queries can (and will) happen 
occasionally under load, but timeout on connect is atypical. Any sign of 
networking issues/slowness/dns problems/etc?


On Fri, Sep 29, 2017 at 6:43 AM, Jonathan Baynes 
<jonathan.bay...@tradeweb.com> wrote:
Hi Community,

I have a 6 node ring, covering 2 DC’s, the ring isn’t being used yet and we are 
just in the connectivity and testing phase. So the boxes are NOT under any load.

I’ve gone to connect to CQLSH this afternoon and I’ve had this returned:

cqlsh xx.xxx.xxx.xx -u cassandra -p cassandra
Connection error: ('Unable to connect to any servers', {'xx.xxx.xxx.xx': 
OperationTimedOut('errors=Timed out creating connection (5 seconds), 
last_host=None',)})

Status of the service is “Running”
Nodetool status is UP for all nodes.
Tested telnet on all ports and they are all up and connecting.
Checked the PID is up for cassandra (it is)

Any idea how I can debug the cause of this further?


Thanks
J



Jonathan Baynes
DBA
Tradeweb Europe Limited
Moor Place  •  1 Fore Street Avenue  •  London EC2Y 9DT
P +44 (0)20 77760988  •  F +44 (0)20 7776 3201  •  M +44 (0) xx
jonathan.bay...@tradeweb.com

A leading marketplace for electronic fixed income, derivatives and ETF trading




This e-mail may contain confidential and/or privileged information. If you are 
not the intended recipient (or have received this e-mail in error) please 
notify the sender immediately and destroy it. Any unauthorized copying, 
disclosure or distribution of the material in this e-mail is strictly 
forbidden. Tradeweb reserves the right to monitor all e-mail communications 
through its networks. If you do not wish to receive marketing emails about our 
products / services, please let us know by contacting us, either by email at 
contac...@tradeweb.com or by writing to us at 
the registered office of Tradeweb in the UK, which is: Tradeweb Europe Limited 
(company number 3912826), 1 Fore Street Avenue London EC2Y 9DT. 
To see our privacy policy, visit our website @ www.tradeweb.com.



Re: Issue with New Production Cluster

2017-09-29 Thread Jeff Jirsa
What version?
If you retry immediately, does it reconnect?
Anything in the logs?

What you describe is atypical - timeouts on queries can (and will) happen
occasionally under load, but timeout on connect is atypical. Any sign of
networking issues/slowness/dns problems/etc?


On Fri, Sep 29, 2017 at 6:43 AM, Jonathan Baynes <
jonathan.bay...@tradeweb.com> wrote:

> Hi Community,
>
>
>
> I have a 6 node ring, covering 2 DC’s, the ring isn’t being used yet and
> we are just in the connectivity and testing phase. So the boxes are NOT
> under any load.
>
>
>
> I’ve gone to connect to CQLSH this afternoon and I’ve had this returned:
>
>
>
> cqlsh xx.xxx.xxx.xx -u cassandra -p cassandra
>
> Connection error: ('Unable to connect to any servers', {'xx.xxx.xxx.xx':
> OperationTimedOut('errors=Timed out creating connection (5 seconds),
> last_host=None',)})
>
>
>
> Status of the service is “Running”
>
> Nodetool status is UP for all nodes.
>
> Tested telnet on all ports and they are all up and connecting.
>
> Checked the PID is up for cassandra (it is)
>
>
>
> Any idea how I can debug the cause of this further?
>
>
>
>
>
> Thanks
>
> J


Issue with New Production Cluster

2017-09-29 Thread Jonathan Baynes
Hi Community,

I have a 6 node ring, covering 2 DC's, the ring isn't being used yet and we are 
just in the connectivity and testing phase. So the boxes are NOT under any load.

I've gone to connect to CQLSH this afternoon and I've had this returned:

cqlsh xx.xxx.xxx.xx -u cassandra -p cassandra
Connection error: ('Unable to connect to any servers', {'xx.xxx.xxx.xx': 
OperationTimedOut('errors=Timed out creating connection (5 seconds), 
last_host=None',)})

Status of the service is "Running"
Nodetool status is UP for all nodes.
Tested telnet on all ports and they are all up and connecting.
Checked the PID is up for cassandra (it is)

Any idea how I can debug the cause of this further?


Thanks
J



Jonathan Baynes
DBA
Tradeweb Europe Limited
Moor Place  *  1 Fore Street Avenue  *  London EC2Y 9DT
P +44 (0)20 77760988  *  F +44 (0)20 7776 3201  *  M +44 (0) xx
jonathan.bay...@tradeweb.com

A leading marketplace for electronic 
fixed income, derivatives and ETF trading



