[JIRA] (JENKINS-48955) master-slave connection getting terminated once in every 12 hours and recovered after 1 minute
Eugene Chepurniy commented on JENKINS-48955

Ivan Fernandez Calvo, I'm going to give this solution a chance and provide feedback here. Thanks for staying in touch.

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)
--
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[JIRA] (JENKINS-48955) master-slave connection getting terminated once in every 12 hours and recovered after 1 minute
Eugene Chepurniy edited a comment on JENKINS-48955

We are still experiencing the described problems. What was done, among other actions:

1. The ping thread was disabled in Jenkins (https://wiki.jenkins.io/display/JENKINS/Ping+Thread).
2. SELinux was completely disabled on the slaves (getenforce outputs `Disabled`).
3. All possible timeouts were increased.
4. The Java version was set to be the same on the agents and the server.

The most helpful action was disabling SELinux: the number of SSH failures decreased by a factor of 10.

[~stibi] FYI.
[JIRA] (JENKINS-48955) master-slave connection getting terminated once in every 12 hours and recovered after 1 minute
Eugene Chepurniy commented on JENKINS-48955

We are still experiencing the described problems. What was done, among other actions:

1. The ping thread was disabled in Jenkins (https://wiki.jenkins.io/display/JENKINS/Ping+Thread).
2. SELinux was completely disabled on the slaves (getenforce outputs `Disabled`).
3. All possible timeouts were increased.

The most helpful action was disabling SELinux: the number of SSH failures decreased by a factor of 10.

Martin Stiborský FYI.
[JIRA] (JENKINS-48955) master-slave connection getting terminated once in every 12 hours and recovered after 1 minute
Eugene Chepurniy commented on JENKINS-48955

Ivan Fernandez Calvo, thanks for your responses. Network issues of any kind are excluded (or at least very unlikely): both the server and the agents are in the same AWS VPC and have 10 Gigabit networking enabled. Most of the time (99.99%) the agents perform well without any issues. There are only 2 executors per agent, and each agent is an m4.xlarge instance (with 16 GB of RAM). The Jenkins agent starts with the default config. No OOMs or agent crashes were spotted. I'm going to follow your suggestion and turn agent logging on to see whether any additional information can be gathered. And yes, we have a pretty high load on the agents, but I'm not sure it is high enough to interrupt SSH connections.

This message was sent by Atlassian JIRA (v7.10.1#710002-sha1:6efc396)
[JIRA] (JENKINS-48955) master-slave connection getting terminated once in every 12 hours and recovered after 1 minute
Eugene Chepurniy commented on JENKINS-48955

Prakash G, thanks for the comment, but in my case the IPs are managed by AWS, so overlapping addresses are impossible.
[JIRA] (JENKINS-48955) master-slave connection getting terminated once in every 12 hours and recovered after 1 minute
Eugene Chepurniy edited a comment on JENKINS-48955

The same problem was found:

{code:java}
ERROR: Connection terminated
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
ERROR: Socket connection to SSH server was lost
java.net.SocketTimeoutException: The connect timeout expired
	at com.trilead.ssh2.Connection$1.run(Connection.java:762)
	at com.trilead.ssh2.util.TimeoutService$TimeoutThread.run(TimeoutService.java:91)
Slave JVM has not reported exit code before the socket was lost
[08/15/18 06:37:34] [SSH] Connection closed.
{code}

Jenkins app: 2.136
SSH Agent Plugin: 1.16
SSH Slaves Plugin: 1.26
EC2 Fleet Plugin (v.1.1.7) was used to prepare the agents.

No network issues were found. The failure reproduces frequently, 20-30 minutes after a successful agent startup. The AWS spot instances were not stopped or terminated during the observed failures.

While the bug is being investigated, is there any way to make failed stages restart transparently when agent connectivity issues occur?
[JIRA] (JENKINS-48955) master-slave connection getting terminated once in every 12 hours and recovered after 1 minute
Eugene Chepurniy reopened an issue

Jenkins / JENKINS-48955

Change By: Eugene Chepurniy
Resolution: Cannot Reproduce (removed)
Status: Resolved -> Reopened
[JIRA] (JENKINS-48955) master-slave connection getting terminated once in every 12 hours and recovered after 1 minute
Eugene Chepurniy commented on JENKINS-48955

The same problem was found:

ERROR: Connection terminated
java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2680)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3155)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:861)
	at java.io.ObjectInputStream.<init>(ObjectInputStream.java:357)
	at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:48)
	at hudson.remoting.Command.readFrom(Command.java:140)
	at hudson.remoting.Command.readFrom(Command.java:126)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:36)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
ERROR: Socket connection to SSH server was lost
java.net.SocketTimeoutException: The connect timeout expired
	at com.trilead.ssh2.Connection$1.run(Connection.java:762)
	at com.trilead.ssh2.util.TimeoutService$TimeoutThread.run(TimeoutService.java:91)
Slave JVM has not reported exit code before the socket was lost
[08/15/18 06:37:34] [SSH] Connection closed.

Jenkins app: 2.136
SSH Agent Plugin: 1.16
EC2 Fleet Plugin (v.1.1.7) was used to prepare the agents.

No network issues were found. The failure reproduces frequently, 20-30 minutes after a successful agent startup. The AWS spot instances were not stopped or terminated during the observed failures.

While the bug is being investigated, is there any way to make failed stages restart transparently when agent connectivity issues occur?
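
To make the question about transparent restarts concrete, here is a minimal sketch of the behavior being asked for: rerun a stage when, and only when, it fails because the agent connection dropped. It is not Jenkins code and is not taken from this thread. `AgentConnectionLost`, `run_with_retry`, and `flaky_stage` are hypothetical names invented for illustration, and the sketch assumes the stage is safe to run again from the start. In a real pipeline the built-in `retry` step would probably be the first thing to try, though whether it reacts to agent disconnects may depend on the Jenkins and plugin versions in use.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class AgentConnectionLost(Exception):
    """Hypothetical error standing in for a dropped agent channel."""


def run_with_retry(stage: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Run `stage`, rerunning it only when the agent connection is lost."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return stage()
        except AgentConnectionLost:
            if attempt == attempts:
                raise  # out of attempts: surface the failure to the caller
            time.sleep(delay_s)  # give the agent time to reconnect
    raise AssertionError("unreachable")


# Simulated stage that loses its agent twice before succeeding.
calls = {"n": 0}


def flaky_stage() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise AgentConnectionLost("Unexpected termination of the channel")
    return "SUCCESS"


result = run_with_retry(flaky_stage, attempts=3)
```

Any other exception raised by the stage propagates immediately, so genuine build failures are not hidden by the retry.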