[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-04-19 Thread ifernandezca...@cloudbees.com (JIRA)

Ivan Fernandez Calvo assigned an issue to Ivan Fernandez Calvo

Jenkins / JENKINS-49736
Remote ssh slave disconnection

Change By: Ivan Fernandez Calvo
Assignee: Ivan Fernandez Calvo

This message was sent by Atlassian JIRA (v7.3.0#73011-sha1:3c73d0e)

-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-04-19 Thread ifernandezca...@cloudbees.com (JIRA)

Ivan Fernandez Calvo closed an issue as Not A Defect

The Trilead library is really sensitive to network performance/issues.

Jenkins / JENKINS-49736
Remote ssh slave disconnection

Change By: Ivan Fernandez Calvo
Status: Open -> Closed
Resolution: Not A Defect

[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-04-18 Thread ch...@preface.co.uk (JIRA)

Chris Amis commented on JENKINS-49736

Re: Remote ssh slave disconnection

I said earlier that the 1G connection made things better; after some time now without another occurrence, I think it fixed it. I did not just change the switch: we stripped the whole lot down and rebuilt it in a new home, and we checked that all cables were nice and snug and so on. My guess is that we must have had a bad connection and the Jenkins SSH connection was really susceptible to it (remember that the tests would run the full 60 hours if I ran them from a cmd prompt on the Jenkins machine).

Chris


[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-04-18 Thread franc...@aichelbaum.com (JIRA)

Francois Aichelbaum commented on JENKINS-49736

Re: Remote ssh slave disconnection

Hi Ivan Fernandez Calvo,

On our side, we have had the very same issues since the beginning of January and the kernel patches for Spectre/Meltdown. The sysctl settings on Jenkins (core) and the various slaves have been tweaked multiple times without luck. Your figures are not the best fit for our setup and the number of jobs we run, and they did not help either when we tried them (those values are very conservative and not well suited to production environments with high load).

Cheers


[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-04-18 Thread ifernandezca...@cloudbees.com (JIRA)

Ivan Fernandez Calvo commented on JENKINS-49736

Re: Remote ssh slave disconnection

Did you try to tune the keepalive in the TCP stack? The default values are terrible:

sysctl -w net.ipv4.tcp_keepalive_time=120
sysctl -w net.ipv4.tcp_keepalive_intvl=30
sysctl -w net.ipv4.tcp_keepalive_probes=8
sysctl -w net.ipv4.tcp_fin_timeout=30
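
As a rough illustration of what these values change (this note is not part of the original comment): for a socket that has keepalive enabled, an idle connection to a dead peer is given up after roughly tcp_keepalive_time + tcp_keepalive_intvl * tcp_keepalive_probes seconds. The sketch below only does that arithmetic. The helper name detect_secs is hypothetical, and the 7200/75/9 figures are assumed to be the stock Linux defaults.

```shell
#!/bin/sh
# detect_secs is a hypothetical helper, not a real tool. It estimates how
# many seconds pass before the kernel gives up on an idle, dead peer.
# Arguments: keepalive_time keepalive_intvl keepalive_probes (all in seconds/count)
detect_secs() {
    echo $(( $1 + $2 * $3 ))
}

detect_secs 7200 75 9   # assumed stock Linux defaults: prints 7875 (about 2h11m)
detect_secs 120 30 8    # values from the sysctl commands above: prints 360 (6 minutes)
```

Two caveats: values set with sysctl -w are lost on reboot unless they are also written to the sysctl configuration, and tcp_fin_timeout does not take part in this calculation, since it limits how long a closed connection lingers in FIN-WAIT-2.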


[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-03-13 Thread o.v.nenas...@gmail.com (JIRA)

Oleg Nenashev assigned an issue to Unassigned

Bulk issue update: The plugin connectivity is still unstable from what I see in this and other reports. The recent patches in 1.24-1.25 probably caused some extra instability by getting rid of interlocks between the agent connection and termination logic. Apparently this impacts some reconnection scenarios due to race conditions. Unfortunately, I do not have the capacity to work on the plugin in the medium term, so for now I am unassigning issues from myself. Ivan Fernandez Calvo was very kind to take ownership of the plugin and to handle some of the workload in it. He will probably have some capacity to review the backlog I was unable to triage.

Jenkins / JENKINS-49736
Remote ssh slave disconnection

Change By: Oleg Nenashev
Assignee: Oleg Nenashev

[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-03-07 Thread ch...@preface.co.uk (JIRA)

Chris Amis commented on JENKINS-49736

Re: Remote ssh slave disconnection

We have managed to improve matters very much by putting all the machines on a single 1G switch. Before this we had Jenkins-1G-100M-Slave, which does not sound too bad. The 1G switch is the same device; the 100M is a big Cisco job in a rack. We have plans to move the Jenkins server into the cloud; should I say we cannot risk it?


[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-03-01 Thread ch...@preface.co.uk (JIRA)

Chris Amis commented on JENKINS-49736

Re: Remote ssh slave disconnection

I updated the problem box to the latest LTS release, the same as the working box. Much improved, but it still fails 50% of the time. I suspect that one of the corporate forced updates that happened on the day it went wrong is the culprit:

http://support.microsoft.com/?kbid=4056894
http://support.microsoft.com/?kbid=4056568
http://support.microsoft.com/?kbid=4054176
http://support.microsoft.com/?kbid=4054998

Somehow something has made the Jenkins SSH comms more susceptible to something.


[JIRA] (JENKINS-49736) Remote ssh slave disconnection

2018-02-26 Thread ch...@preface.co.uk (JIRA)

Chris Amis created an issue

Jenkins / JENKINS-49736
Remote ssh slave disconnection

Issue Type: Bug
Assignee: Oleg Nenashev
Components: ssh-slaves-plugin
Created: 2018-02-26 10:25
Environment: Jenkins 2.89.4 on Windows 7. ssh slaves, 14.04 LTS and 16.04 LTS.
Labels: slave
Priority: Major
Reporter: Chris Amis

This is a bit of an epic, sorry.

For a year or so we have had a system configured that, after a build, triggers an ssh slave (14.04 LTS) in a test chamber to run hours of tests: an hour after a build, 12 hours overnight and 60 hours over the weekend. All stable and happy.

For the last couple of months we have been recreating the setup for a new product, pretty much a copy/paste of the existing system except that this one is on 16.04 LTS. Here is the twist: adding the new system seems to have broken the old one. I have logs of about 600 good jobs up to 16th Feb; after that, about 2 in 50 have worked. I cannot even work out which end is failing.

Over the last weekend I disconnected Jenkins from the slaves (I changed the addresses, not just took them offline). On the Jenkins server I opened 2 command windows, and in those windows I ran ssh connections to the slaves and executed the commands manually to run 60 hour tests on both. Both stayed up for 60 hours and ran to conclusion. I tried automated again this morning: broken in minutes.

I spotted a new version of the ssh plugin this morning, so I upgraded from 1.25.1 to 1.26, with the same results. Any ideas...

The jenkins log looks like this