[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase

2013-03-25 Thread sja...@enthought.com (JIRA)

Simon Jagoe
 commented on  JENKINS-13330


Jenkins slave hangs in post build phase

I am working on a system that uses COM to communicate with a backend service. Some of our test cases hang on rare occasions, which prevents other (concurrent) builds of the same project from completing: they hang in the build post-processing.

While we are working on preventing the hangs in our test cases, I do not think Jenkins should be waiting before sending emails. It makes Jenkins fragile in the face of problematic builds. In my opinion the build system should expect problems in user scripts and be able to handle them.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-- 
You received this message because you are subscribed to the Google Groups Jenkins Issues group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.




[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase

2013-03-25 Thread sja...@enthought.com (JIRA)

Simon Jagoe
 edited a comment on  JENKINS-13330


Jenkins slave hangs in post build phase

I am working on a system that uses COM to communicate with a backend service. Some of our test cases hang on rare occasions when using COM, which prevents other (concurrent) builds of the same project from completing: they hang in the build post-processing.

While we are working on preventing the hangs in our test cases, I do not think Jenkins should be waiting before sending emails. It makes Jenkins fragile in the face of problematic builds. In my opinion the build system should expect problems in user scripts and be able to handle them.





[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase

2012-07-25 Thread stephen.morri...@intecbilling.com (JIRA)

Stephen Morrison
 commented on  JENKINS-13330


Jenkins slave hangs in post build phase

I have seen the same thing. One job of type A hanging (because of a problem in the script) causes all other jobs of type A to hang on completion. This only happens for me when all the Job As were triggered from the same Job B, e.g. Job B triggers 8 Job As and one Job A hangs: all the other Job As will also hang even though they have completed successfully.

I think that's a bug: jobs should be able to run in parallel without depending on the success (or otherwise) of other jobs of the same type.







[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase

2012-05-31 Thread k...@kohsuke.org (JIRA)

 [ 
https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kohsuke Kawaguchi updated JENKINS-13330:


Description: 
We have an intermittent problem with slaves hanging AFTER the job itself is 
finished. In the post processing step (?) what we see is that the console log 
has this line:

Description set: vap_current_iter-2012_03_29_19_01_03

And then nothing. Usually, it will look like this:

Description set: prod_pull-2012_03_28_19_01_03
Notifying upstream build armada_Launch_prod_pull #13 of job completion
Project armada_Launch_prod_pull still waiting for 1 builds to complete
Notifying upstream projects of job completion
Notifying upstream of completion: armada_Launch_prod_pull #13
Finished: SUCCESS

I set up a logger for hudson.model.Run, and it currently shows this:

{noformat}
at java.lang.Thread.run(Thread.java:619)

Mar 30, 2012 12:44:00 PM hudson.model.Run run
INFO: galleon_allUnit #1134 main build action completed: SUCCESS
Mar 30, 2012 12:44:00 PM hudson.model.Run setResult
FINE: galleon_allUnit #1134 : result is set to SUCCESS
java.lang.Exception
at hudson.model.Run.setResult(Run.java:352)
at hudson.model.Run.run(Run.java:1410)
at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
at hudson.model.ResourceController.execute(ResourceController.java:88)
at hudson.model.Executor.run(Executor.java:238)
{noformat}

Repeated for every hung slave.

The main hudson log doesn't have any additional information.

Disconnecting the slave has no effect.

Trying to do an orderly shutdown of Jenkins has no effect (Jenkins actually 
appears to hang on shutdown).

The only way we have found to recover is to kill -9 the Tomcat process.

The thread dump for one of the slaves (they are all the same) is:

{noformat}
Thread Dump
Channel reader thread: channel

Channel reader thread: channel Id=9 Group=main RUNNABLE (in native)
at java.io.FileInputStream.readBytes(Native Method)
at java.io.FileInputStream.read(FileInputStream.java:199)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
-  locked java.io.BufferedInputStream@1ae615a
at java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
at java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
at java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)


main

main Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea
at java.lang.Object.wait(Native Method)
-  waiting on hudson.remoting.Channel@e1d5ea
at java.lang.Object.wait(Object.java:485)
at hudson.remoting.Channel.join(Channel.java:766)
at hudson.remoting.Launcher.main(Launcher.java:420)
at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
at hudson.remoting.Launcher.run(Launcher.java:206)
at hudson.remoting.Launcher.main(Launcher.java:168)


Ping thread for channel hudson.remoting.Channel@e1d5ea:channel

Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Id=10 Group=main TIMED_WAITING
at java.lang.Thread.sleep(Native Method)
at hudson.remoting.PingThread.run(PingThread.java:86)


Pipe writer thread: channel

Pipe writer thread: channel Id=12 Group=main WAITING on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
at sun.misc.Unsafe.park(Native Method)
-  waiting on java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:158)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:1925)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:358)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:947)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)


pool-1-thread-267

pool-1-thread-267 Id=285 Group=main RUNNABLE
at sun.management.ThreadImpl.dumpThreads0(Native Method)
at sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:374)
at hudson.Functions.getThreadInfos(Functions.java:872)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:93)
at hudson.util.RemotingDiagnostics$GetThreadDump.call(RemotingDiagnostics.java:89)
at hudson.remoting.UserRequest.perform(UserRequest.java:118)
at hudson.remoting.UserRequest.perform(UserRequest.java:48)
at hudson.remoting.Request$2.run(Request.java:287)
at 

[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase

2012-05-31 Thread k...@kohsuke.org (JIRA)

[ 
https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=163389#comment-163389
 ] 

Kohsuke Kawaguchi commented on JENKINS-13330:
-

In the thread dump, #2284 is waiting for the build to complete, and the later 
concurrent builds are waiting for it to finish before they send out their 
failure notifications.
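
The blocking pattern described above can be modelled in a few lines of Java. This is only an illustrative sketch, not Jenkins code: the class `ConcurrentBuildModel`, the nested `Build` type and the method `awaitPrevious` are hypothetical names, and the timeout is an assumption added for the example (the behaviour reported in this issue is an unbounded wait).

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConcurrentBuildModel {

    /** Hypothetical stand-in for one build of a job; not a Jenkins class. */
    static final class Build {
        final int number;
        final Build previous; // earlier build of the same job, or null
        final CountDownLatch completed = new CountDownLatch(1);

        Build(int number, Build previous) {
            this.number = number;
            this.previous = previous;
        }

        /**
         * Waits up to timeoutMillis for the previous build to complete.
         * Returns true if this build may go on to send its notification.
         */
        boolean awaitPrevious(long timeoutMillis) {
            if (previous == null) {
                return true;
            }
            try {
                return previous.completed.await(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt flag
                return false;
            }
        }

        /** Marks this build as complete, releasing any later build waiting on it. */
        void finish() {
            completed.countDown();
        }
    }

    public static void main(String[] args) {
        Build first = new Build(1, null);
        Build second = new Build(2, first);

        // The first build is still "hung" in its build step, so the second cannot notify.
        System.out.println("second can notify: " + second.awaitPrevious(100)); // false

        first.finish();
        System.out.println("second can notify: " + second.awaitPrevious(100)); // true
    }
}
```

With an unbounded wait in place of the timeout, the second build would stay blocked for as long as the first one hangs, which matches the symptom reported here.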

 Jenkins slave hangs in post build phase
 ---

 Key: JENKINS-13330
 URL: https://issues.jenkins-ci.org/browse/JENKINS-13330
 Project: Jenkins
  Issue Type: Bug
  Components: master-slave, slave-status
 Environment: RHEL 5, both master and all slaves.
 Jenkins is running inside of Tomcat
Reporter: Clark Wright
Priority: Critical
 Attachments: jenkins-stall-threaddump.gz, 
 jenkins-stall-threaddump.gz, Screenshot-galleon_allIntegration #1196 Console 
 [Jenkins] - Mozilla Firefox.png



[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase

2012-05-31 Thread k...@kohsuke.org (JIRA)

[ 
https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=163390#comment-163390
 ] 

Kohsuke Kawaguchi commented on JENKINS-13330:
-

#2284 is waiting for the shell script build step to complete. Later builds are 
past that point, but they are blocked at the point of e-mail notification, 
waiting for the previous build to complete (otherwise they cannot decide 
what to send out in the e-mail).

So at this point, my reading is that this is not a bug in Jenkins but simply a 
hang in the user script. If you suspect otherwise, please wait for the next 
hang to occur, then obtain thread dumps on both the master and the slave, and 
check what the first blocked build is doing.

As I've explained above, concurrent executions of the same job can block if the 
earlier build is blocked, so we need to focus on the root cause, which is why 
the first of the blocked builds is blocking.
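
For readers unsure how such a thread dump is produced: the JDK's `jstack <pid>` tool prints one for a running JVM, and the same information is available in-process through the standard `java.lang.management` API. The following is a minimal sketch under that assumption; `ThreadDumpSketch` and `dump` are hypothetical names, and the output format only approximates the dumps quoted in this issue.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumpSketch {

    /** Returns a plain-text dump of all live threads in the current JVM. */
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // Only request lock details when the JVM supports reporting them.
        boolean monitors = mx.isObjectMonitorUsageSupported();
        boolean synchronizers = mx.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : mx.dumpAllThreads(monitors, synchronizers)) {
            out.append(info.getThreadName())
               .append(" Id=").append(info.getThreadId())
               .append(' ').append(info.getThreadState());
            if (info.getLockName() != null) {
                // The lock or condition this thread is blocked or waiting on.
                out.append(" on ").append(info.getLockName());
            }
            out.append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("\tat ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

In a dump like this, the thread to look for is the executor running the earliest stuck build: its stack shows which step it is blocked in, while the later builds show up as waiting on it.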


[JIRA] (JENKINS-13330) Jenkins slave hangs in post build phase

2012-05-18 Thread rsch...@box.com (JIRA)

[ 
https://issues.jenkins-ci.org/browse/JENKINS-13330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=162921#comment-162921
 ] 

Randall Schulz commented on JENKINS-13330:
--

I am experiencing what I believe to be the same problem. Because of the timing 
of the most recent occurrence on my system, I had 8 jobs in this stalled state. 
Most interestingly, I discovered that stopping just one of these jobs (by 
clicking the red X icon in the Build Executor Status section) terminated not 
only that job but 4 others with it. When I stopped one of the remaining 3 jobs, 
all of them disappeared.

Perhaps this is a clue.

I have attached a thread dump file.

 Jenkins slave hangs in post build phase
 ---

 Key: JENKINS-13330
 URL: https://issues.jenkins-ci.org/browse/JENKINS-13330
 Project: Jenkins
  Issue Type: Bug
  Components: master-slave, slave-status
 Environment: RHEL 5, both master and all slaves.
 Jenkins is running inside of Tomcat
Reporter: Clark Wright
Priority: Critical
 Attachments: jenkins-stall-threaddump.gz, 
 Screenshot-galleon_allIntegration #1196 Console [Jenkins] - Mozilla 
 Firefox.png


 We have an intermittent problem with slaves hanging AFTER the job itself is 
 finished. In the post processing step (?) what we see is that the console log 
 has this line:
 Description set: vap_current_iter-2012_03_29_19_01_03
 And then nothing. Usually, it will look like this:
 Description set: prod_pull-2012_03_28_19_01_03
 Notifying upstream build armada_Launch_prod_pull #13 of job completion
 Project armada_Launch_prod_pull still waiting for 1 builds to complete
 Notifying upstream projects of job completion
 Notifying upstream of completion: armada_Launch_prod_pull #13
 Finished: SUCCESS
 I set up a logger for hudson.model.Run, and it currently shows this:
 at java.lang.Thread.run(Thread.java:619)
 Mar 30, 2012 12:44:00 PM hudson.model.Run run
 INFO: galleon_allUnit #1134 main build action completed: SUCCESS
 Mar 30, 2012 12:44:00 PM hudson.model.Run setResult
 FINE: galleon_allUnit #1134 : result is set to SUCCESS
 java.lang.Exception
 at hudson.model.Run.setResult(Run.java:352)
 at hudson.model.Run.run(Run.java:1410)
 at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:46)
 at hudson.model.ResourceController.execute(ResourceController.java:88)
 at hudson.model.Executor.run(Executor.java:238)
 Repeated for every hung slave.
 The main hudson log doesn't have any additional information.
 Disconnecting the slave has no effect.
 Trying to do an orderly shutdown of Jenkins has no effect (Jenkins actually 
 appears to hang on shutdown).
 The only way we have found to recover is to kill -9 the Tomcat process.
 The thread dump for one of the slaves (they are all the same) is:
 Thread Dump
 Channel reader thread: channel
 Channel reader thread: channel Id=9 Group=main RUNNABLE (in native)
 at java.io.FileInputStream.readBytes(Native Method)
 at java.io.FileInputStream.read(FileInputStream.java:199)
 at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
 at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
 -  locked java.io.BufferedInputStream@1ae615a
 at 
 java.io.ObjectInputStream$PeekInputStream.peek(ObjectInputStream.java:2249)
 at 
 java.io.ObjectInputStream$BlockDataInputStream.peek(ObjectInputStream.java:2542)
 at 
 java.io.ObjectInputStream$BlockDataInputStream.peekByte(ObjectInputStream.java:2552)
 at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1297)
 at java.io.ObjectInputStream.readObject(ObjectInputStream.java:351)
 at hudson.remoting.Channel$ReaderThread.run(Channel.java:1030)
 main
 main Id=1 Group=main WAITING on hudson.remoting.Channel@e1d5ea
 at java.lang.Object.wait(Native Method)
 -  waiting on hudson.remoting.Channel@e1d5ea
 at java.lang.Object.wait(Object.java:485)
 at hudson.remoting.Channel.join(Channel.java:766)
 at hudson.remoting.Launcher.main(Launcher.java:420)
 at hudson.remoting.Launcher.runWithStdinStdout(Launcher.java:366)
 at hudson.remoting.Launcher.run(Launcher.java:206)
 at hudson.remoting.Launcher.main(Launcher.java:168)
 Ping thread for channel hudson.remoting.Channel@e1d5ea:channel
 Ping thread for channel hudson.remoting.Channel@e1d5ea:channel Id=10 
 Group=main TIMED_WAITING
 at java.lang.Thread.sleep(Native Method)
 at hudson.remoting.PingThread.run(PingThread.java:86)
 Pipe writer thread: channel
 Pipe writer thread: channel Id=12 Group=main WAITING on 
 java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@14263ed
 at sun.misc.Unsafe.park(Native Method)
 -  waiting on