[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-03-03 Thread yoann.dubre...@gmail.com (JIRA)














































Yoann Dubreuil
 commented on  JENKINS-26947


Unattended wait in the remoting code















Just created a PR: https://github.com/jenkinsci/maven-plugin/pull/39



























This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira







-- 
You received this message because you are subscribed to the Google Groups Jenkins Issues group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-03-03 Thread te...@java.net (JIRA)














































James Nord
 commented on  JENKINS-26947


Unattended wait in the remoting code















FWIW the original report has nothing to do with packet corruption - just the channel dying.

You can get the same results with a "kill -9" on the slave.





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-03-03 Thread yoann.dubre...@gmail.com (JIRA)














































Yoann Dubreuil
 commented on  JENKINS-26947


Unattended wait in the remoting code















Yes that's right. I found the problem when playing with netem, hence the bug report.

It's a bug in the Maven plugin. When the upstream channel is closed, the Maven channel stays around. I will post a PR shortly.
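To make the described failure mode concrete, here is a minimal sketch of a child channel that is closed when its upstream channel closes. It is not the code from the Maven plugin or from the PR: `SimpleChannel`, `onClose`, and `openChild` are hypothetical names invented for this illustration, and the real remoting API may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

class ChannelCloseSketch {
    /** Hypothetical minimal channel; it is not hudson.remoting.Channel. */
    static class SimpleChannel {
        private final List<Runnable> closeListeners = new ArrayList<>();
        private boolean closed;

        /** Registers a callback to run once when this channel closes. */
        void onClose(Runnable listener) {
            synchronized (this) {
                if (!closed) {
                    closeListeners.add(listener);
                    return;
                }
            }
            listener.run(); // already closed: notify immediately
        }

        /** Marks the channel closed and notifies listeners exactly once. */
        void close() {
            List<Runnable> toRun;
            synchronized (this) {
                if (closed) {
                    return;
                }
                closed = true;
                toRun = new ArrayList<>(closeListeners);
            }
            for (Runnable listener : toRun) {
                listener.run();
            }
        }

        synchronized boolean isClosed() {
            return closed;
        }
    }

    /** Opens a child whose lifetime is tied to the parent (assumed design). */
    static SimpleChannel openChild(SimpleChannel parent) {
        SimpleChannel child = new SimpleChannel();
        parent.onClose(child::close);
        return child;
    }

    public static void main(String[] args) {
        SimpleChannel upstream = new SimpleChannel();
        SimpleChannel linked = openChild(upstream);
        SimpleChannel unlinked = new SimpleChannel(); // models a channel that "stays around"

        upstream.close();
        System.out.println("linked child closed: " + linked.isClosed());
        System.out.println("unlinked child closed: " + unlinked.isClosed());
    }
}
```

The unlinked channel models the situation reported here: nothing tells it that the upstream is gone, so anything waiting on it keeps waiting.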





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-02-25 Thread te...@java.net (JIRA)












































 
James Nord
 edited a comment on  JENKINS-26947


Unattended wait in the remoting code
















Possibly a duplicate of JENKINS-10840.

Something strange is going on with Docker and tc.

With 2 freestyle builds I see a failure and the slave is disconnected with:

java.io.IOException: remote file operation failed: /home/jenkins/data/jenkins-slave.exe at hudson.remoting.Channel@7407d0f5:docker_ssh: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.FilePath.act(FilePath.java:985)
	at hudson.FilePath.act(FilePath.java:967)
	at hudson.FilePath.exists(FilePath.java:1435)
	at org.jenkinsci.modules.windows_slave_installer.SlaveExeUpdater$1.call(SlaveExeUpdater.java:46)
	at org.jenkinsci.modules.windows_slave_installer.SlaveExeUpdater$1.call(SlaveExeUpdater.java:37)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:549)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:751)
	at hudson.FilePath.act(FilePath.java:978)
	... 9 more
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
	at java.io.ObjectInputStream.init(ObjectInputStream.java:299)
	at hudson.remoting.ObjectInputStreamEx.init(ObjectInputStreamEx.java:40)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
ERROR: Socket connection to SSH server was lost
java.io.IOException: Peer sent DISCONNECT message (reason code 2): Packet corrupt
	at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:766)
	at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:489)
	at java.lang.Thread.run(Thread.java:745)



But a single-bit packet corruption should cause the packet to be thrown away by the OS layer due to a TCP checksum mismatch, and it should not be seen by the application.
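The reasoning about the checksum can be checked with a small sketch. The code below implements the 16-bit one's-complement Internet checksum (RFC 1071) over a bare payload and confirms that every single-bit flip changes it. It is a simplification: a real TCP checksum also covers the TCP header and a pseudo-header, and the class and method names are made up for this example.

```java
import java.nio.charset.StandardCharsets;

class InternetChecksumSketch {
    /** 16-bit one's-complement checksum as described in RFC 1071. */
    static int checksum(byte[] data) {
        long sum = 0;
        for (int i = 0; i < data.length; i += 2) {
            int hi = data[i] & 0xFF;
            int lo = (i + 1 < data.length) ? (data[i + 1] & 0xFF) : 0; // pad odd length with zero
            sum += (hi << 8) | lo;
        }
        while ((sum >> 16) != 0) {
            sum = (sum & 0xFFFF) + (sum >> 16); // fold the carry back in
        }
        return (int) (~sum & 0xFFFF);
    }

    /** Flips each bit in turn and reports whether the checksum always changes. */
    static boolean everySingleBitFlipDetected(byte[] data) {
        int original = checksum(data);
        for (int bit = 0; bit < data.length * 8; bit++) {
            byte[] copy = data.clone();
            copy[bit / 8] ^= (byte) (1 << (bit % 8));
            if (checksum(copy) == original) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] payload = "remoting payload".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("checksum: 0x%04x%n", checksum(payload));
        System.out.println("all single-bit flips detected: " + everySingleBitFlipDetected(payload));
    }
}
```

This only supports the single-bit case. The same checksum can miss some multi-bit errors, and the sketch says nothing about where in the network stack the corruption is injected relative to checksum verification.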

The other interesting thing is that a build can be running fine and only dies when a new build is kicked off. From what KK said, I would not expect a problem in setting up a new channel in the multiplex to fail the other channels.





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-02-25 Thread te...@java.net (JIRA)














































James Nord
 commented on  JENKINS-26947


Unattended wait in the remoting code















Possibly a duplicate of JENKINS-10840.

Something strange is going on with Docker and tc.

With 2 freestyle builds I see a failure and the slave is disconnected with:
java.io.IOException: remote file operation failed: /home/jenkins/data/jenkins-slave.exe at hudson.remoting.Channel@7407d0f5:docker_ssh: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.FilePath.act(FilePath.java:985)
	at hudson.FilePath.act(FilePath.java:967)
	at hudson.FilePath.exists(FilePath.java:1435)
	at org.jenkinsci.modules.windows_slave_installer.SlaveExeUpdater$1.call(SlaveExeUpdater.java:46)
	at org.jenkinsci.modules.windows_slave_installer.SlaveExeUpdater$1.call(SlaveExeUpdater.java:37)
	at jenkins.util.ContextResettingExecutorService$2.call(ContextResettingExecutorService.java:46)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
Caused by: hudson.remoting.ChannelClosedException: channel is already closed
	at hudson.remoting.Channel.send(Channel.java:549)
	at hudson.remoting.Request.call(Request.java:129)
	at hudson.remoting.Channel.call(Channel.java:751)
	at hudson.FilePath.act(FilePath.java:978)
	... 9 more
Caused by: java.io.IOException: Unexpected termination of the channel
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:50)
Caused by: java.io.EOFException
	at java.io.ObjectInputStream$PeekInputStream.readFully(ObjectInputStream.java:2325)
	at java.io.ObjectInputStream$BlockDataInputStream.readShort(ObjectInputStream.java:2794)
	at java.io.ObjectInputStream.readStreamHeader(ObjectInputStream.java:801)
	at java.io.ObjectInputStream.init(ObjectInputStream.java:299)
	at hudson.remoting.ObjectInputStreamEx.init(ObjectInputStreamEx.java:40)
	at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
	at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)
ERROR: Socket connection to SSH server was lost
java.io.IOException: Peer sent DISCONNECT message (reason code 2): Packet corrupt
	at com.trilead.ssh2.transport.TransportManager.receiveLoop(TransportManager.java:766)
	at com.trilead.ssh2.transport.TransportManager$1.run(TransportManager.java:489)
	at java.lang.Thread.run(Thread.java:745)

But a single-bit packet corruption should cause the packet to be thrown away by the OS layer due to a TCP checksum mismatch, and it should not be seen by the application.

The other interesting thing is that a build can be running fine and only dies when a new build is kicked off. From what KK said, I would not expect a problem in setting up a new channel in the multiplex to fail the other channels.





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-02-25 Thread te...@java.net (JIRA)














































James Nord
 commented on  JENKINS-26947


Unattended wait in the remoting code















Have you disabled the PingThread at all?

AFAICT netem does not kill the connection - the remote end will be retransmitting the packets - and as such the channel is not closed.
The PingThread should eventually notice this (10 minute interval + 4 minute timeout), so after at most 14 minutes the connection should be killed and this thread unblocked.
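The 14-minute figure is the sum of the two numbers quoted above: in the worst case the link stalls right after a successful ping, the next ping starts one interval later, and it is declared failed one timeout after that. A tiny sketch of that arithmetic, with a made-up class name and the interval and timeout taken from this comment rather than read from any Jenkins configuration:

```java
import java.time.Duration;

class PingDetectionBound {
    /** Worst case: the stall begins just after a ping succeeded. */
    static Duration worstCase(Duration pingInterval, Duration pingTimeout) {
        return pingInterval.plus(pingTimeout);
    }

    public static void main(String[] args) {
        Duration bound = worstCase(Duration.ofMinutes(10), Duration.ofMinutes(4));
        System.out.println("worst-case detection delay: " + bound.toMinutes() + " minutes");
    }
}
```

For comparison, the thread dump mentioned in the reply to this comment was taken 30 minutes after the disconnection, which is well beyond this bound.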





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-02-25 Thread yoann.dubre...@gmail.com (JIRA)














































Yoann Dubreuil
 commented on  JENKINS-26947


Unattended wait in the remoting code















No, I did not disable the ping thread. In fact, I did nothing special: I just started a fresh Jenkins instance and connected it to this Docker slave. I took the thread dump 30 minutes after the disconnection. I will relaunch the test this afternoon to see whether it ever times out.





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-02-24 Thread yoann.dubre...@gmail.com (JIRA)














































Yoann Dubreuil
 updated  JENKINS-26947


Unattended wait in the remoting code
















Change By:


Yoann Dubreuil
(24/Feb/15 10:32 PM)




Description:


I found a way to trigger a remoting problem using TCP fault injection with netem. I'm able to trigger this wait call at hudson.remoting.Request.call(Request.java:146):

{code}
while(response==null && !channel.isInClosed())
  // I don't know exactly when this can happen, as pendingCalls are cleaned up by Channel,
  // but in production I've observed that in rare occasion it can block forever, even after a channel
  // is gone. So be defensive against that.
  wait(30*1000);
{code}

When this wait is triggered, the running build is stuck and consumes an executor. It loops over and over on the wait.

To reproduce, set up an SSH slave using the attached Dockerfile, and set up netem on the docker0 bridge like this:

{code}
tc qdisc add dev docker0 root netem
tc qdisc change dev docker0 root netem corrupt 1
{code}

Testing requires running the job once before configuring netem: because netem settings are applied to all network streams, the build could otherwise fail while downloading Maven dependencies. I just launched a Maven build of an example project to trigger the problem. It might be a Maven-specific problem...

To remove the netem settings, just run tc qdisc del dev docker0 root.

I've attached the Dockerfile, the command I used to launch it and a thread dump of a stuck Jenkins master.





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-02-13 Thread dan...@beckweb.net (JIRA)














































Daniel Beck
 commented on  JENKINS-26947


Unattended wait in the remoting code















Is this a security issue? E.g. is this exploitable by third parties to disrupt a network-reachable Jenkins service?





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-02-13 Thread yoann.dubre...@gmail.com (JIRA)














































Yoann Dubreuil
 commented on  JENKINS-26947


Unattended wait in the remoting code















You must be on the path of the network stream to be able to change the packet content. Even if you are able to get there, the SSH protocol protects the content of the stream. You would only be able to trigger a disconnection, but at this stage, I bet a lot of other network services are in danger.

So for me, it's not a security issue.
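The point about SSH protecting the stream can be illustrated with a message authentication code: a receiver that checks a keyed MAC rejects a packet whose bytes were altered in transit, so tampering leads to a rejected packet rather than to accepted, modified content. The sketch below uses HmacSHA256 from the standard Java library purely as an example. The algorithm actually used in an SSH session is negotiated, and the key, payload, and class name here are invented for the demonstration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

class PacketMacSketch {
    /** Computes an HMAC-SHA256 tag over the payload with the given key. */
    static byte[] mac(byte[] key, byte[] payload) {
        try {
            Mac hmac = Mac.getInstance("HmacSHA256");
            hmac.init(new SecretKeySpec(key, "HmacSHA256"));
            return hmac.doFinal(payload);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    /** Returns true only if the tag matches the payload (constant-time compare). */
    static boolean verify(byte[] key, byte[] payload, byte[] tag) {
        return MessageDigest.isEqual(mac(key, payload), tag);
    }

    public static void main(String[] args) {
        byte[] key = "demo-session-key".getBytes(StandardCharsets.UTF_8); // made-up key
        byte[] payload = "build step output".getBytes(StandardCharsets.UTF_8);
        byte[] tag = mac(key, payload);

        byte[] corrupted = payload.clone();
        corrupted[3] ^= 0x01; // flip a single bit, as an on-path attacker or netem might

        System.out.println("intact packet accepted: " + verify(key, payload, tag));
        System.out.println("corrupted packet accepted: " + verify(key, corrupted, tag));
    }
}
```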





























[JIRA] [remoting] (JENKINS-26947) Unattended wait in the remoting code

2015-02-12 Thread yoann.dubre...@gmail.com (JIRA)














































Yoann Dubreuil
 created  JENKINS-26947


Unattended wait in the remoting code















Issue Type:


Bug



Assignee:


Unassigned


Attachments:


Dockerfile, launch.sh, stacktrace.txt



Components:


remoting



Created:


12/Feb/15 10:36 PM



Description:


I found a way to trigger a remoting problem using TCP fault injection with netem. I'm able to trigger this wait call at hudson.remoting.Request.call(Request.java:146):

{{
while(response==null && !channel.isInClosed())
  // I don't know exactly when this can happen, as pendingCalls are cleaned up by Channel,
  // but in production I've observed that in rare occasion it can block forever, even after a channel
  // is gone. So be defensive against that.
  wait(30*1000);
}}

When this wait is triggered, the running build is stuck and consumes an executor. It loops over and over on the wait.
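The behaviour of such a loop can be modeled in isolation. The following is a minimal sketch and not the remoting source: `PendingCall`, `complete`, and `markClosed` are hypothetical names, and the 30-second wait is shortened to a few milliseconds so the example finishes quickly. It shows that the timed wait only wakes the thread periodically. The loop itself exits only when a response arrives or the closed flag is set, which matches the stuck executor described above.

```java
class GuardedWaitSketch {
    /** Hypothetical stand-in for a pending remote call; not hudson.remoting.Request. */
    static class PendingCall {
        private Object response; // set when the remote side answers
        private boolean closed;  // set when the channel is known to be closed

        /** Blocks until a response arrives or the channel is marked closed. */
        synchronized Object await(long pollMillis) throws InterruptedException {
            while (response == null && !closed) {
                wait(pollMillis); // wakes up periodically, then re-checks the condition
            }
            return response;
        }

        synchronized void complete(Object value) {
            response = value;
            notifyAll();
        }

        synchronized void markClosed() {
            closed = true;
            notifyAll();
        }
    }

    /** Returns true if the waiter stayed blocked until the channel was marked closed. */
    static boolean staysBlockedUntilClosed() {
        PendingCall call = new PendingCall();
        Thread waiter = new Thread(() -> {
            try {
                call.await(20);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        try {
            waiter.join(200);                    // several poll periods pass
            boolean wasStuck = waiter.isAlive(); // no response, not closed: still looping
            call.markClosed();
            waiter.join(2000);
            return wasStuck && !waiter.isAlive();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("blocked until closed: " + staysBlockedUntilClosed());
    }
}
```

If nothing ever marks the channel closed, as in the reported scenario, the waiter in this sketch would keep looping in the same way.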

To reproduce, set up an SSH slave using the attached Dockerfile, and set up netem on the docker0 bridge like this:

tc qdisc add dev docker0 root netem
tc qdisc change dev docker0 root netem corrupt 1

Testing requires running the job once before configuring netem: because netem settings are applied to all network streams, the build could otherwise fail while downloading Maven dependencies. I just launched a Maven build of an example project to trigger the problem. It might be a Maven-specific problem...

To remove the netem settings, just run tc qdisc del dev docker0 root.

I've attached the Dockerfile, the command I used to launch it and a thread dump of a stuck Jenkins master.




Environment:


Linux




Project:


Jenkins



Priority:


Minor



Reporter:


Yoann Dubreuil
























