[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase
Simon Jagoe commented on JENKINS-13330 Jenkins slave hangs in post build phase I am working on a system that uses COM to communicate with a backend service. There are some test cases that hang in certain rare cases, which cause other (concurrent) builds of the same project to not complete, hanging in the build post-processing. While we are working on trying to prevent hanging in our test cases, I do not think Jenkins should be waiting before sending emails. It makes Jenkins not very robust against problematic builds. In my opinion the build system should expect problems in the user scripts and be able to handle that. This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira -- You received this message because you are subscribed to the Google Groups Jenkins Issues group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase
Simon Jagoe edited a comment on JENKINS-13330 Jenkins slave hangs in post build phase I am working on a system that uses COM to communicate with a backend service. There are some test cases that hang in certain rare cases when using COM, which cause other (concurrent) builds of the same project to not complete, hanging in the build post-processing. While we are working on trying to prevent hanging in our test cases, I do not think Jenkins should be waiting before sending emails. It makes Jenkins not very robust against problematic builds. In my opinion the build system should expect problems in the user scripts and be able to handle that. This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira -- You received this message because you are subscribed to the Google Groups Jenkins Issues group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/groups/opt_out.
[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase
Stephen Morrison commented on JENKINS-13330 Jenkins slave hangs in post build phase I have seen the same thing. One job of type A hanging (because of a problem in the script) causes all other jobs of type A to hang on completion. This only happens for me when all the Job As were triggered from the same Job B. i.e. Job B triggers 8 Job As. One Job A hangs. All the other Job As will also hang even though they have completed successfully. I think that's a bug, jobs should be able to run in parallel without having dependencies on the success (or otherwise) of other jobs of the same type. This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase
[ https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kohsuke Kawaguchi updated JENKINS-13330: Description: We have an intermittent problem with slaves hanging AFTER the job itself is finished. In the post processing step (?) what we see is that the console log has this line: Description set: vap_current_iter-2012_03_29_19_01_03 And then nothing. Usually, it will look like this: Description set: prod_pull-2012_03_28_19_01_03 Notifying upstream build armada_Launch_prod_pull #13 of job completion Project armada_Launch_prod_pull still waiting for 1 builds to complete Notifying upstream projects of job completion Notifying upstream of completion: armada_Launch_prod_pull #13 Finished: SUCCESS I setup a logger for hudson.model.Run, and it currently has this : {noformat} at java.lang.Thread.run(Thread.java:619) Mar 30, 2012 12:44:00 PM hudson.model.Run run INFO: galleon_allUnit #1134 main build action completed: SUCCESS Mar 30, 2012 12:44:00 PM hudson.model.Run setResult FINE: galleon_allUnit #1134 : result is set to SUCCESS java.lang.Exception at hudson.model.Run.setResult(Run.java:352) at hudson.model.Run.run(Run.java:1410) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:238) {noformat} Repeated for every hung slave. The main hudson log doesn't have any additional information. Disconnecting the slave has no effect. Trying to do an orderly shutdown of Jenkins has no effect (jenkins actually appears to hang on shutdown). The only way we have found to recover is to kill -9 the tomcat process. The tread dump for one of the slaves (they are all the same) is: {noformat} Thread Dump Channel reader thread: channel Channel reader thread: channel Id=9 Group=main RUNNABLE (in native) at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:199) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked java.io.BufferedInputStream@1ae615a at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030) main main Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.Channel@e1d5ea at java.lang.Object.wait(Object.java:485) at hudson.remoting.Channel.join(Channel.java:766) at hudson.remoting.Launcher.main(Launcher.java:420) at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366) at hudson.remoting.Launcher.run(Launcher.java:206) at hudson.remoting.Launcher.main(Launcher.java:168) Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Id=10 Group=main TIMED_WAITING at java.lang.Thread.sleep(Native Method) at hudson.remoting.PingThread.run(PingThread.java:86) Pipe writer thread: channel Pipe writer thread: channel Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358) at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907) at java.lang.Thread.run(Thread.java:619) pool-1-thread-267 pool-1-thread-267 Id=285 Group=main RUNNABLE at sun.management.ThreadImpl.dumpThreads0(Native Method) at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374) at hudson.Functions.getThreadInfos(Functions.java:872) at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93) at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89) at hudson.remoting.UserRequest.perform(UserRequest.java:118) at hudson.remoting.UserRequest.perform(UserRequest.java:48) at hudson.remoting.Request$2.run(Request.java:287) at
[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase
[ https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=163389#comment-163389 ] Kohsuke Kawaguchi commented on JENKINS-13330: - In the thread dump, #2284 is waiting for the build to complete, then later concurrent builds are waiting for that to finish before it sends out its failure notification. Jenkins slave hangs in post build phase --- Key: JENKINS-13330 URL: https://issues.jenkins-ci.org/browse/JENKINS-13330 Project: Jenkins Issue Type: Bug Components: master-slave, slave-status Environment: RHEL 5, both master and all slaves. Jenkins is running inside of Tomcat Reporter: Clark Wright Priority: Critical Attachments: jenkins-stall-threaddump.gz, jenkins-stall-threaddump.gz, Screenshot-galleon_allIntegration #1196 Console [Jenkins] - Mozilla Firefox.png We have an intermittent problem with slaves hanging AFTER the job itself is finished. In the post processing step (?) what we see is that the console log has this line: Description set: vap_current_iter-2012_03_29_19_01_03 And then nothing. Usually, it will look like this: Description set: prod_pull-2012_03_28_19_01_03 Notifying upstream build armada_Launch_prod_pull #13 of job completion Project armada_Launch_prod_pull still waiting for 1 builds to complete Notifying upstream projects of job completion Notifying upstream of completion: armada_Launch_prod_pull #13 Finished: SUCCESS I setup a logger for hudson.model.Run, and it currently has this : {noformat} at java.lang.Thread.run(Thread.java:619) Mar 30, 2012 12:44:00 PM hudson.model.Run run INFO: galleon_allUnit #1134 main build action completed: SUCCESS Mar 30, 2012 12:44:00 PM hudson.model.Run setResult FINE: galleon_allUnit #1134 : result is set to SUCCESS java.lang.Exception at hudson.model.Run.setResult(Run.java:352) at hudson.model.Run.run(Run.java:1410) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:238) {noformat} Repeated for every hung slave. The main hudson log doesn't have any additional information. Disconnecting the slave has no effect. Trying to do an orderly shutdown of Jenkins has no effect (jenkins actually appears to hang on shutdown). The only way we have found to recover is to kill -9 the tomcat process. The tread dump for one of the slaves (they are all the same) is: {noformat} Thread Dump Channel reader thread: channel Channel reader thread: channel Id=9 Group=main RUNNABLE (in native) at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:199) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked java.io.BufferedInputStream@1ae615a at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030) main main Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.Channel@e1d5ea at java.lang.Object.wait(Object.java:485) at hudson.remoting.Channel.join(Channel.java:766) at hudson.remoting.Launcher.main(Launcher.java:420) at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366) at hudson.remoting.Launcher.run(Launcher.java:206) at hudson.remoting.Launcher.main(Launcher.java:168) Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Id=10 Group=main TIMED_WAITING at java.lang.Thread.sleep(Native Method) at hudson.remoting.PingThread.run(PingThread.java:86) Pipe writer thread: channel Pipe writer thread: channel Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed at sun.misc.Unsafe.park(Native Method) - waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158) at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925) at
[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase
[ https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=163390#comment-163390 ] Kohsuke Kawaguchi commented on JENKINS-13330: - #2284 is waiting for the shell script build step to complete. Later builds are past that point, but those are blocked at the point of e-mail notification, waiting for the previous build to complete (or else it won't be able to decide what to send out in the e-mail.) So at this point, my reading of this is that this is not a bug in Jenkins but simply a hang in the user script, but if you suspect otherwise, please wait for the next hang to occur, then obtain the thread dump both on the master and the slave, and check what the first blocked build is doing. As I've explained above, concurrent executions of the same job can block if the earlier build is blocked, so we need to focus on the root cause, which is why the first of the blocked builds is blocking. Jenkins slave hangs in post build phase --- Key: JENKINS-13330 URL: https://issues.jenkins-ci.org/browse/JENKINS-13330 Project: Jenkins Issue Type: Bug Components: master-slave, slave-status Environment: RHEL 5, both master and all slaves. Jenkins is running inside of Tomcat Reporter: Clark Wright Priority: Critical Attachments: jenkins-stall-threaddump.gz, jenkins-stall-threaddump.gz, Screenshot-galleon_allIntegration #1196 Console [Jenkins] - Mozilla Firefox.png We have an intermittent problem with slaves hanging AFTER the job itself is finished. In the post processing step (?) what we see is that the console log has this line: Description set: vap_current_iter-2012_03_29_19_01_03 And then nothing. Usually, it will look like this: Description set: prod_pull-2012_03_28_19_01_03 Notifying upstream build armada_Launch_prod_pull #13 of job completion Project armada_Launch_prod_pull still waiting for 1 builds to complete Notifying upstream projects of job completion Notifying upstream of completion: armada_Launch_prod_pull #13 Finished: SUCCESS I setup a logger for hudson.model.Run, and it currently has this : {noformat} at java.lang.Thread.run(Thread.java:619) Mar 30, 2012 12:44:00 PM hudson.model.Run run INFO: galleon_allUnit #1134 main build action completed: SUCCESS Mar 30, 2012 12:44:00 PM hudson.model.Run setResult FINE: galleon_allUnit #1134 : result is set to SUCCESS java.lang.Exception at hudson.model.Run.setResult(Run.java:352) at hudson.model.Run.run(Run.java:1410) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:238) {noformat} Repeated for every hung slave. The main hudson log doesn't have any additional information. Disconnecting the slave has no effect. Trying to do an orderly shutdown of Jenkins has no effect (jenkins actually appears to hang on shutdown). The only way we have found to recover is to kill -9 the tomcat process. The tread dump for one of the slaves (they are all the same) is: {noformat} Thread Dump Channel reader thread: channel Channel reader thread: channel Id=9 Group=main RUNNABLE (in native) at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:199) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked java.io.BufferedInputStream@1ae615a at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030) main main Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.Channel@e1d5ea at java.lang.Object.wait(Object.java:485) at hudson.remoting.Channel.join(Channel.java:766) at hudson.remoting.Launcher.main(Launcher.java:420) at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366) at hudson.remoting.Launcher.run(Launcher.java:206) at hudson.remoting.Launcher.main(Launcher.java:168) Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Id=10 Group=main TIMED_WAITING at java.lang.Thread.sleep(Native Method) at
[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase
[ https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=162921#comment-162921 ] Randall Schulz commented on JENKINS-13330: -- I am experiencing what I believe to be the same problem. Because of the timing of the most recent occurrence of the event on my system, I had 8 jobs in this stalled state. Most interestingly, I discovered that stopping just one of these jobs (as in clicking the red X icon in the Build Executor Status section) caused not only that one job to terminate, 4 others went with it. When I stopped one of the remaining 3 jobs, they all disappeared. Perhaps this is a clue. I have attached a thread dump file. Jenkins slave hangs in post build phase --- Key: JENKINS-13330 URL: https://issues.jenkins-ci.org/browse/JENKINS-13330 Project: Jenkins Issue Type: Bug Components: master-slave, slave-status Environment: RHEL 5, both master and all slaves. Jenkins is running inside of Tomcat Reporter: Clark Wright Priority: Critical Attachments: jenkins-stall-threaddump.gz, Screenshot-galleon_allIntegration #1196 Console [Jenkins] - Mozilla Firefox.png We have an intermittent problem with slaves hanging AFTER the job itself is finished. In the post processing step (?) what we see is that the console log has this line: Description set: vap_current_iter-2012_03_29_19_01_03 And then nothing. Usually, it will look like this: Description set: prod_pull-2012_03_28_19_01_03 Notifying upstream build armada_Launch_prod_pull #13 of job completion Project armada_Launch_prod_pull still waiting for 1 builds to complete Notifying upstream projects of job completion Notifying upstream of completion: armada_Launch_prod_pull #13 Finished: SUCCESS I setup a logger for hudson.model.Run, and it currently has this : at java.lang.Thread.run(Thread.java:619) Mar 30, 2012 12:44:00 PM hudson.model.Run run INFO: galleon_allUnit #1134 main build action completed: SUCCESS Mar 30, 2012 12:44:00 PM hudson.model.Run setResult FINE: galleon_allUnit #1134 : result is set to SUCCESS java.lang.Exception at hudson.model.Run.setResult(Run.java:352) at hudson.model.Run.run(Run.java:1410) at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46) at hudson.model.ResourceController.execute(ResourceController.java:88) at hudson.model.Executor.run(Executor.java:238) Repeated for every hung slave. The main hudson log doesn't have any additional information. Disconnecting the slave has no effect. Trying to do an orderly shutdown of Jenkins has no effect (jenkins actually appears to hang on shutdown). The only way we have found to recover is to kill -9 the tomcat process. The tread dump for one of the slaves (they are all the same) is: Thread Dump Channel reader thread: channel Channel reader thread: channel Id=9 Group=main RUNNABLE (in native) at java.io.FileInputStream.readBytes(Native Method) at java.io.FileInputStream.read(FileInputStream.java:199) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked java.io.BufferedInputStream@1ae615a at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249) at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542) at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552) at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297) at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351) at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030) main main Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea at java.lang.Object.wait(Native Method) - waiting on hudson.remoting.Channel@e1d5ea at java.lang.Object.wait(Object.java:485) at hudson.remoting.Channel.join(Channel.java:766) at hudson.remoting.Launcher.main(Launcher.java:420) at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366) at hudson.remoting.Launcher.run(Launcher.java:206) at hudson.remoting.Launcher.main(Launcher.java:168) Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Id=10 Group=main TIMED_WAITING at java.lang.Thread.sleep(Native Method) at hudson.remoting.PingThread.run(PingThread.java:86) Pipe writer thread: channel Pipe writer thread: channel Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed at sun.misc.Unsafe.park(Native Method) - waiting on