I found a solution to this shortly after posting my question. It seems
Jenkins uses a cache directory in the SSH user's home directory for
caching jars when the agent runs. All of my slaves use the same SSH
account to launch the agent. That account authenticates via LDAP and has
a home directory on a central fileserver that all the slave nodes mount.
So when the first connection is established, it apparently locks the
cache directory, blocking subsequent use of it by the other slaves.
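To illustrate the failure mode, here is a minimal sketch of two "agents" contending for one lock in a shared directory. This is not remoting's actual locking code, and /tmp/demo-cache is just a stand-in path (the real jar cache lives under the SSH user's home directory); it only demonstrates why the first connection wins and the rest are shut out:

```shell
#!/bin/sh
# Two "agents" sharing one NFS home directory both try to use the same
# cache dir. /tmp/demo-cache stands in for the shared cache location.
CACHE=/tmp/demo-cache/jars
mkdir -p "$CACHE"

# First agent grabs an exclusive lock on the cache and holds it a while.
flock "$CACHE/.lock" sleep 2 &

sleep 0.5  # give the first agent time to acquire the lock

# Second agent tries the same lock non-blockingly and is shut out,
# just like the slaves that never managed to connect.
if flock -n "$CACHE/.lock" true; then
    echo "cache is free"
else
    echo "cache is locked"
fi
wait
```

Run on any Linux box with util-linux flock(1), this prints "cache is locked" for the second agent.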
This appears to be new behavior, as we had been using this setup for
years without trouble. The solution was to set -Duser.home=
in the "JVM Options" field in the advanced configuration for each node.
Since that directory is always on the local disk of the slave node,
there should be no further file-locking problems.
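For reference, the "JVM Options" field in the node's advanced configuration takes standard JVM flags, so the setting looks something like the following (the path /home/jenkins-agent is just an example; use whatever local directory your agents actually run from):

```
-Duser.home=/home/jenkins-agent
```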
It would have saved me some pain if the error logging had been
descriptive enough to identify this problem; "I/O error" or "null" was
not very useful for debugging. And since we hadn't had problems in the
past, it didn't occur to me that this might be an issue. Regardless, it
seems to be resolved now, and hopefully this post will help somebody in
the future.
Seth
On 8/28/19 11:34 AM, Seth Galitzer wrote:
For the last two weeks, I have not been able to launch the remoting
agent on Linux slaves. The server version is 2.191, running on Debian
9.9 (stretch), installed from the jenkins.io repo. Slaves are Ubuntu
18.04 (bionic) with openjdk-8 installed. Eventually, one slave will
start, but none of the rest will. Between reboots or restarts of the
Jenkins server, the slave that successfully connects is different each
time. There are 22 Linux slaves in total. The working directory is on
local disk for each slave. The SSH user comes from LDAP.
Can somebody help me figure out what is blocking the start of the agents?
Thanks.
Seth
Sample server log:
2019-08-28 16:11:35.107+ [id=740] SEVERE hudson.slaves.ChannelPinger#install: Failed to set up a ping for linux64-santos13-minion
java.io.IOException: Closing all channels
    at com.trilead.ssh2.channel.Channel.setReasonClosed(Channel.java:333)
    at com.trilead.ssh2.channel.ChannelManager.closeChannel(ChannelManager.java:289)
    at com.trilead.ssh2.channel.ChannelManager.closeAllChannels(ChannelManager.java:269)
    at com.trilead.ssh2.Connection.close(Connection.java:536)
    at com.trilead.ssh2.Connection.close(Connection.java:530)
    at hudson.plugins.sshslaves.SSHLauncher.cleanupConnection(SSHLauncher.java:511)
    at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:484)
    at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:297)
    at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
    at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
Caused: java.io.IOException: SSH channel is closed
    at com.trilead.ssh2.channel.ChannelManager.ioException(ChannelManager.java:1540)
    at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:373)
    at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
    at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:68)
    at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:89)
    at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:62)
    at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
    at hudson.remoting.Channel.send(Channel.java:721)
    at hudson.remoting.Request.call(Request.java:213)
    at hudson.remoting.Channel.call(Channel.java:954)
    at hudson.slaves.ChannelPinger.install(ChannelPinger.java:115)
    at hudson.slaves.ChannelPinger.preOnline(ChannelPinger.java:98)
    at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:667)
    at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:435)
    at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:607)
    at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:113)
    at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:441)
    at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:406)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
2019-08-28 16:11:35.107+ [id=846] INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel linux64-santos14-minion
java.io.EOFException
    at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
    at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
    at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
    at java.io.ObjectInputStream.&lt;init&gt;(ObjectInputStream.java:358)
    at hudson.remoting.ObjectInputStreamEx.&lt;init&gt;(ObjectInputStreamEx.java:49)