[JIRA] (JENKINS-53879) EC2 workers terminated before connection can be established, only on v1.40
FABRIZIO MANFREDI commented on JENKINS-53879 Re: EC2 workers terminated before connection can be established, only on v1.40
The revert is in progress and the root cause is under investigation; the problem does not happen with all Jenkins versions. Can you upload the slave's connection log to JENKINS-53876? What is the operating system of the slave node, and what is the default shell of the user on the slave?
--
This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d). You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[JIRA] (JENKINS-53879) EC2 workers terminated before connection can be established, only on v1.40
Stephen Rosen closed an issue as Duplicate: Jenkins / JENKINS-53879 EC2 workers terminated before connection can be established, only on v1.40
Change By: Stephen Rosen
Status: Open → Closed
Resolution: Duplicate
[JIRA] (JENKINS-53879) EC2 workers terminated before connection can be established, only on v1.40
Stephen Rosen commented on JENKINS-53879 Re: EC2 workers terminated before connection can be established, only on v1.40
It very well could be a duplicate; apologies if so. I didn't find that issue when looking at open issues, probably because I was looking for things about "premature termination" not "failure to connect". Let's mark it as a dup and close, and if I find it persists after that bug is fixed, I'll reopen.
[JIRA] (JENKINS-53879) EC2 workers terminated before connection can be established, only on v1.40
Basil Peace commented on JENKINS-53879 Re: EC2 workers terminated before connection can be established, only on v1.40
Isn't it a duplicate of JENKINS-53876?
[JIRA] (JENKINS-53879) EC2 workers terminated before connection can be established, only on v1.40
Stephen Rosen created an issue: Jenkins / JENKINS-53879 EC2 workers terminated before connection can be established, only on v1.40

Issue Type: Bug
Assignee: FABRIZIO MANFREDI
Components: ec2-plugin
Created: 2018-10-02 19:34
Environment: Jenkins ver. 2.138.1, Ubuntu 14.04
Priority: Major
Reporter: Stephen Rosen

I just did my monthly round of plugin upgrades and found that EC2 workers were failing to connect. Downgrading this plugin, and only this plugin, to v1.39 resolved the issue, so I'm fairly confident this is the source of the behavior.

When I watched the activity, I saw worker nodes spinning up correctly, based on our various labels and job requirements. They get to the running state in EC2, but shortly thereafter (< 1 minute) they are terminated. I believe this is happening during the guest OS boot time: I tried polling for connections on port 22 from the master node and never got a success, and our workers are all configured to run sshd open to the master on 22.

All of our nodes are configured with Launch Timeout in Seconds = 300, but this failure was very consistent, and the last time I measured our launch times they were around 3.5 minutes, which is within that limit.

If I had to venture a guess, I would say that something has changed in terms of the leniency with which nodes are treated as they start, and their boot times are being counted differently. If someone can confirm that that's what changed, I'll probably just up my timeouts to 10 minutes and walk away, but I'm not confident that would work.
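
For anyone wanting to reproduce the port 22 check described above, here is a minimal Python sketch of that kind of polling. It is not the script used in the report and has nothing to do with how the ec2-plugin itself connects; the function name `wait_for_port` and all of its parameters are hypothetical, with the 300 second default chosen only to mirror the Launch Timeout mentioned in the report.

```python
import socket
import time


def wait_for_port(host, port=22, timeout_s=300.0, interval_s=5.0,
                  connect_timeout_s=3.0):
    """Poll host:port until a TCP connection succeeds or timeout_s elapses.

    Hypothetical helper for illustration only.
    Returns the elapsed seconds on success, or None if the deadline passes.
    """
    start = time.monotonic()
    while True:
        try:
            # A successful TCP handshake only shows that something is
            # listening (e.g. sshd); it does not prove SSH login works.
            with socket.create_connection((host, port),
                                          timeout=connect_timeout_s):
                return time.monotonic() - start
        except OSError:
            # Refused, unreachable, or timed out: the node is not ready yet.
            pass
        if time.monotonic() - start >= timeout_s:
            return None
        time.sleep(interval_s)
```

Run from the master against a freshly launched worker's address (for example `wait_for_port("192.0.2.10")`, a placeholder IP), a `None` result means the port never opened within the window, while a number gives a rough measure of how long sshd took to come up.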