Re: unable to launch remoting agent on slaves

2019-08-29 Thread Seth Galitzer
I found a solution to this shortly after posting my question. It seems 
Jenkins caches remoting jars in a directory under the SSH user's homedir 
when the agent runs. All of my slaves use the same SSH account to launch 
the agent. That account authenticates via LDAP and has a homedir on a 
central fileserver that all the slave nodes mount. So when the first 
agent connects, it appears to lock the cache dir, blocking subsequent 
use of it by the other nodes.
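
For reference (this is my understanding of the remoting defaults, so 
verify against your version): the agent derives its jar cache location 
from the user.home system property, something like ~/.jenkins/cache/jars. 
A tiny sketch to see where that resolves on a node:

// Prints where the remoting jar cache would land for the current
// user.home. The ~/.jenkins/cache/jars layout is an assumption on my
// part; run with -Duser.home=/some/local/dir to see the path move.
public class ShowJarCachePath {
    public static void main(String[] args) {
        String home = System.getProperty("user.home");
        System.out.println(java.nio.file.Paths.get(home, ".jenkins", "cache", "jars"));
    }
}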


This appears to be new behavior, as we have been using this setup for 
years without trouble. The solution was to set -Duser.home=<working 
directory> in the JVM Options field in the advanced configuration for 
each node. Since the working directory is always on the local disk of 
the slave node, there should be no further file locking problems.
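
For example, the JVM Options field on each node now holds something 
along these lines (the path is just an illustration; use whatever local 
directory your agent working dir lives in):

-Duser.home=/var/lib/jenkins-agent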


It would have saved me some pain if the error logging had been 
descriptive enough to identify this problem; "I/O error" or "null" was 
not much help in debugging. And since we hadn't had problems in the 
past, it didn't occur to me that this might be the issue. Regardless, it 
seems to be resolved now, and hopefully this post will help somebody in 
the future.


Seth

On 8/28/19 11:34 AM, Seth Galitzer wrote:
For the last two weeks, I have been unable to launch the remoting agent 
on Linux slaves. The server version is 2.191, running on Debian 9.9 
(stretch), installed from the jenkins.io repo. Slaves are Ubuntu 18.04 
(bionic) with openjdk-8 installed. Eventually one slave will start, but 
none of the rest will. Between reboots or restarts of the Jenkins 
server, the slave that successfully connects is different each time. 
There are 22 Linux slaves total. The working directory is on local disk 
for each slave. The SSH user is from LDAP.


Can somebody help me figure out what is blocking the start of the agents?

Thanks.
Seth


unable to launch remoting agent on slaves

2019-08-28 Thread Seth Galitzer
For the last two weeks, I have been unable to launch the remoting agent on 
Linux slaves. The server version is 2.191, running on Debian 9.9 (stretch), 
installed from the jenkins.io repo. Slaves are Ubuntu 18.04 (bionic) with 
openjdk-8 installed. Eventually one slave will start, but none of the rest 
will. Between reboots or restarts of the Jenkins server, the slave that 
successfully connects is different each time. There are 22 Linux slaves 
total. The working directory is on local disk for each slave. The SSH user 
is from LDAP.

Can somebody help me figure out what is blocking the start of the agents?

Thanks.
Seth

Sample server log:
2019-08-28 16:11:35.107+ [id=740] SEVERE hudson.slaves.ChannelPinger#install: Failed to set up a ping for linux64-santos13-minion
java.io.IOException: Closing all channels
at com.trilead.ssh2.channel.Channel.setReasonClosed(Channel.java:333)
at com.trilead.ssh2.channel.ChannelManager.closeChannel(ChannelManager.java:289)
at com.trilead.ssh2.channel.ChannelManager.closeAllChannels(ChannelManager.java:269)
at com.trilead.ssh2.Connection.close(Connection.java:536)
at com.trilead.ssh2.Connection.close(Connection.java:530)
at hudson.plugins.sshslaves.SSHLauncher.cleanupConnection(SSHLauncher.java:511)
at hudson.plugins.sshslaves.SSHLauncher.launch(SSHLauncher.java:484)
at hudson.slaves.SlaveComputer$1.call(SlaveComputer.java:297)
at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
at jenkins.security.ImpersonatingExecutorService$2.call(ImpersonatingExecutorService.java:71)
Caused: java.io.IOException: SSH channel is closed
at com.trilead.ssh2.channel.ChannelManager.ioException(ChannelManager.java:1540)
at com.trilead.ssh2.channel.ChannelManager.sendData(ChannelManager.java:373)
at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:63)
at com.trilead.ssh2.channel.ChannelOutputStream.write(ChannelOutputStream.java:68)
at hudson.remoting.ChunkedOutputStream.sendFrame(ChunkedOutputStream.java:89)
at hudson.remoting.ChunkedOutputStream.sendBreak(ChunkedOutputStream.java:62)
at hudson.remoting.ChunkedCommandTransport.writeBlock(ChunkedCommandTransport.java:46)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.write(AbstractSynchronousByteArrayCommandTransport.java:46)
at hudson.remoting.Channel.send(Channel.java:721)
at hudson.remoting.Request.call(Request.java:213)
at hudson.remoting.Channel.call(Channel.java:954)
at hudson.slaves.ChannelPinger.install(ChannelPinger.java:115)
at hudson.slaves.ChannelPinger.preOnline(ChannelPinger.java:98)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:667)
at hudson.slaves.SlaveComputer.setChannel(SlaveComputer.java:435)
at hudson.plugins.sshslaves.SSHLauncher.startAgent(SSHLauncher.java:607)
at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:113)
at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:441)
at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:406)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2019-08-28 16:11:35.107+ [id=846] INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel linux64-santos14-minion
java.io.EOFException
at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2681)
at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3156)
at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:862)
at java.io.ObjectInputStream.<init>(ObjectInputStream.java:358)
at hudson.remoting.ObjectInputStreamEx.<init>(ObjectInputStreamEx.java:49)
at hudson.remoting.Command.readFrom(Command.java:140)
at hudson.remoting.Command.readFrom(Command.java:126)
at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:35)
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
Caused: java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)

Sample remoting.log on slave:
Aug 28, 2019 11:11:35 AM hudson.remoting.SynchronousCommandTransport$ReaderThread run
INFO: I/O error in channel channel
java.io.IOException: Unexpected termination of the channel
at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:77)
Caused by: java.io.EOFException
at java.base/java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2763)
at java.base/java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:3258)
at java.base/java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:873)
at