[JIRA] (JENKINS-51057) EventDispatcher and ConcurrentLinkedQueue ate my JVM

2019-03-24 Thread airad...@gmail.com (JIRA)
Álvaro Iradier commented on JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM
 

  
 
 
 
 

 
Hi Maxfield Stewart, yes, I also got the exception after some tests in our production environment and had to wrap the call in a try-catch. I just updated my comment with the latest version of the script we are running. The dispatcher.stop() is probably not necessary, as I almost always get an exception, but it won't hurt. I also noticed that the "Dispatcher is not in HTTP Sessions. Clearing" branch was not executed most of the time, so I copied the cleanup part (unsubscribeAll() and stop()) into the "Clearing retryQueue with..." branch. After a couple of weeks running, I examined the thread dumps and saw no trace of any leak involving EventDispatcher$Retry elements, so for us it is working. Looking forward to a fix in the official upstream version of the plugin.
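The defensive pattern described in the comment above — clear the queue and unsubscribe first, and treat a throwing stop() as expected — can be sketched standalone. FakeDispatcher below is a hypothetical stand-in, not the plugin's real EventDispatcher:

```java
import java.util.concurrent.ConcurrentLinkedQueue;

public class DefensiveCleanup {

    // Hypothetical stand-in for the plugin's dispatcher: stop() may throw,
    // mirroring the "I mostly always get an exception" observation above.
    static class FakeDispatcher {
        final ConcurrentLinkedQueue<String> retryQueue = new ConcurrentLinkedQueue<>();
        void unsubscribeAll() { /* no-op in this sketch */ }
        void stop() { throw new IllegalStateException("already stopped"); }
    }

    // Clear and unsubscribe first, then attempt stop(); a failing stop()
    // must not abort the rest of the cleanup.
    static boolean cleanup(FakeDispatcher d) {
        d.retryQueue.clear();
        d.unsubscribeAll();
        try {
            d.stop();
        } catch (Exception ex) {
            // Expected most of the time; ignore and carry on.
        }
        return d.retryQueue.isEmpty();
    }

    public static void main(String[] args) {
        FakeDispatcher d = new FakeDispatcher();
        d.retryQueue.add("retry-event");
        System.out.println("cleaned: " + cleanup(d)); // prints "cleaned: true"
    }
}
```

Because stop() is attempted last and wrapped in its own try-catch, the queue is emptied even when the dispatcher refuses to stop cleanly.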
 

  
 
 
 
 

 
 
 

 
 
 This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)  
 

  
 

   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] (JENKINS-51057) EventDispatcher and ConcurrentLinkedQueue ate my JVM

2019-03-24 Thread airad...@gmail.com (JIRA)
Álvaro Iradier edited a comment on JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM
 

  
 
 
 
 

 
After many tests and cleanup scripts, we were still able to find some EventDispatcher$Retry elements in the heap, but no subscribers in the GuavaPubSubBus and no EventDispatchers in the HttpSessions. Some additional checks in the plugin made me notice that, since AsyncEventDispatcher was being used, the EventDispatchers might still be referenced by asyncContexts or threads that were never completed or released. Finally, I also added a dispatcher.stop() call to my cleanup script, and it looks like there are no traces of leaked EventDispatcher$Retry classes in the heap anymore; we are still observing our instance and analyzing heap dumps, so it is too soon to confirm. Just in case it helps, this is the cleanup script we are running daily:

import org.jenkinsci.plugins.pubsub.PubsubBus;
import org.jenkinsci.plugins.ssegateway.sse.*;

def dryRun = false
this.bus = PubsubBus.getBus();

// change visibility of retryQueue so that we can use reflection instead...
def retryQueueField = org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.getDeclaredField('retryQueue')
retryQueueField.setAccessible(true)

def dispatcherCount = 0
def dispatchersList = []

//Build a list of EventDispatchers in all existing HTTP sessions
println "DISPATCHERS IN HTTP SESSIONS"
println ""
def sessions = Jenkins.instance.servletContext.this$0._sessionHandler._sessionCache._sessions
sessions.each { id, session ->
  def eventDispatchers = EventDispatcherFactory.getDispatchers(session)
  if (eventDispatchers) {
    eventDispatchers.each { dispatcherId, dispatcher ->
      dispatchersList.add(dispatcherId)
      def retryQueue = retryQueueField.get(dispatcher) // Need to use reflection since retryQueue is private in super class...
      if (retryQueue.peek() != null) {
        def oldestAge = (System.currentTimeMillis() - retryQueue.peek().timestamp)/1000
        println "Dispatcher: " + dispatcher.getClass().getName() + " - " + dispatcher.id + " with " + retryQueue.size() + " events, oldest is " + oldestAge + " seconds."
      } else {
        println "Dispatcher: " + dispatcher.getClass().getName() + " - " + dispatcher.id + " with no retryEvents"
      }
    }
  }
}

println "There are " + dispatchersList.size() + " dispatchers in HTTP sessions"
println ""

//Find all subscribers in bus
println "DISPATCHERS IN PUBSUBBUS"
println ""
this.bus.subscribers.each { channelSubscriber, guavaSubscriber ->
  if (channelSubscriber.getClass().getName().equals('org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$SSEChannelSubscriber')) {
    dispatcherCount++
    def dispatcher = channelSubscriber.this$0
    def retryQueue = retryQueueField.get(dispatcher) // Need to use reflection since retryQueue is private in super class...
    if (retryQueue.peek() != null) {
      def oldestAge = (System.currentTimeMillis() - retryQueue.peek().timestamp)/1000
      println "Dispatcher: " + dispatcher.id + " with " + retryQueue.size() + " events, oldest is " + oldestAge + " seconds."
      if (oldestAge > 300) {
        println "  Clearing retryQueue with " + retryQueue.size() + " events"
        if (!dryRun) {
          retryQueue.clear()
          dispatcher.unsubscribeAll()
          try {
            dispatcher.stop()
          } catch (Exception ex) {
            println "  !! Exception stopping AsynDispatcher"
          }
        } else {
          println "  Ignoring, dryrun"
        }
      }
    } else {
      println "Dispatcher: " + dispatcher.id + " with no retryEvents"
    }

[JIRA] (JENKINS-51057) EventDispatcher and ConcurrentLinkedQueue ate my JVM

2019-03-08 Thread airad...@gmail.com (JIRA)
Álvaro Iradier commented on JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM
 

  
 
 
 
 

 
After many tests and cleanup scripts, we were still able to find some EventDispatcher$Retry elements in the heap, but no subscribers in the GuavaPubSubBus and no EventDispatchers in the HttpSessions. Some additional checks in the plugin made me notice that, since AsyncEventDispatcher was being used, the EventDispatchers might still be referenced by asyncContexts or threads that were never completed or released. Finally, I also added a dispatcher.stop() call to my cleanup script, and it looks like there are no traces of leaked EventDispatcher$Retry classes in the heap anymore; we are still observing our instance and analyzing heap dumps, so it is too soon to confirm. Just in case it helps, this is the cleanup script we are running daily:

 

import org.jenkinsci.plugins.pubsub.PubsubBus;
import org.jenkinsci.plugins.ssegateway.sse.*;

def dryRun = false
this.bus = PubsubBus.getBus();

// change visibility of retryQueue so that we can use reflection instead...
def retryQueueField = org.jenkinsci.plugins.ssegateway.sse.EventDispatcher.getDeclaredField('retryQueue')
retryQueueField.setAccessible(true)

def dispatcherCount = 0
def dispatchersList = []

//Build a list of EventDispatchers in all existing HTTP sessions
println "DISPATCHERS IN HTTP SESSIONS"
println ""
def sessions = Jenkins.instance.servletContext.this$0._sessionHandler._sessionCache._sessions
sessions.each { id, session ->
  def eventDispatchers = EventDispatcherFactory.getDispatchers(session)
  if (eventDispatchers) {
    eventDispatchers.each { dispatcherId, dispatcher ->
      dispatchersList.add(dispatcherId)
      def retryQueue = retryQueueField.get(dispatcher) // Need to use reflection since retryQueue is private in super class...
      if (retryQueue.peek() != null) {
        def oldestAge = (System.currentTimeMillis() - retryQueue.peek().timestamp)/1000
        println "Dispatcher: " + dispatcher.getClass().getName() + " - " + dispatcher.id + " with " + retryQueue.size() + " events, oldest is " + oldestAge + " seconds."
      } else {
        println "Dispatcher: " + dispatcher.getClass().getName() + " - " + dispatcher.id + " with no retryEvents"
      }
    }
  }
}

println "There are " + dispatchersList.size() + " dispatchers in HTTP sessions"
println ""

//Find all subscribers in bus
println "DISPATCHERS IN PUBSUBBUS"
println ""
this.bus.subscribers.each { channelSubscriber, guavaSubscriber ->
  if (channelSubscriber.getClass().getName().equals('org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$SSEChannelSubscriber')) {
    dispatcherCount++
    def dispatcher = channelSubscriber.this$0
    def retryQueue = retryQueueField.get(dispatcher) // Need to use reflection since retryQueue is private in super class...
    if (retryQueue.peek() != null) {
      def oldestAge = (System.currentTimeMillis() - retryQueue.peek().timestamp)/1000
      println "Dispatcher: " + dispatcher.id + " with " + retryQueue.size() + " events, oldest is " + oldestAge + " seconds."
      if (oldestAge > 300) {
        println "  Clearing retryQueue with " + retryQueue.size() + " events"
        if (!dryRun) {
          retryQueue.clear()
        } else {
          println "  Ignoring, dryrun"
        }
      }
    } else {
      println "Dispatcher: " + dispatcher.id + " with no retryEvents"
    }

[JIRA] (JENKINS-51057) EventDispatcher and ConcurrentLinkedQueue ate my JVM

2019-03-07 Thread airad...@gmail.com (JIRA)
Álvaro Iradier commented on JENKINS-51057

Re: EventDispatcher and ConcurrentLinkedQueue ate my JVM
 

  
 
 
 
 

 
Just my 2 cents... Jon Sten, I tried your cleanup script with mixed results. It was able to empty the retryQueues, but we still had a big leak. After isolating the problem, I noticed that if the user's HTTP session expires, the script is not able to find the leaking EventDispatchers. They are, however, still in the GuavaPubSubBus subscribers map, and can be seen with:

 

import org.jenkinsci.plugins.pubsub.PubsubBus;
this.bus = PubsubBus.getBus();
this.bus.subscribers.each{ channelSubscriber, guavaSubscriber -> println channelSubscriber }
println "Done"
 

This outputs some lines containing org.jenkinsci.plugins.ssegateway.sse.EventDispatcher$SSEChannelSubscriber@3472875b, which is the inner class that implements the ChannelSubscriber interface. I suspect the memory is still leaking because it is being referenced through this subscriber. I saw there is already a PR (https://github.com/jenkinsci/sse-gateway-plugin/pull/27) and a fork of the plugin (https://github.com/taboola/sse-gateway-plugin). The approach of this fix is to change the handling of retries so that they are aborted and the queue cleared after some amount of time or number of retries. However, I wonder whether the cleanup should instead be done when the HttpSession is destroyed. Currently, in https://github.com/jenkinsci/sse-gateway-plugin/blob/master/src/main/java/org/jenkinsci/plugins/ssegateway/sse/EventDispatcher.java, the cleanup code for sessionDestroyed is:

 

/**
 * Http session listener.
 */
@Extension
public static final class SSEHttpSessionListener extends HttpSessionListener {
    @Override
    public void sessionDestroyed(HttpSessionEvent httpSessionEvent) {
        try {
            Map dispatchers = EventDispatcherFactory.getDispatchers(httpSessionEvent.getSession());
            try {
                for (EventDispatcher dispatcher : dispatchers.values()) {
                    try {
                        dispatcher.unsubscribeAll();
                    } catch (Exception e) {
                        LOGGER.log(Level.FINE, "Error during unsubscribeAll() for dispatcher " + dispatcher.getId() + ".", e);
                    }
                }
            } finally {
                dispatchers.clear();
            }
        } catch (Exception e) {
            LOGGER.log(Level.FINE, "Error during session cleanup. The session has probably timed out.", e);
        }
    }
}
 

Although I can see that unsubscribeAll() is called for every dispatcher, I am missing a call to retryQueue.clear(). But I am just guessing here... can anyone who knows the internals of the plugin confirm this? I might try forking the plugin and testing a modified version. Regards.
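A standalone sketch of what such a fix might look like, using hypothetical stand-in classes rather than the plugin's real EventDispatcher, so the clear-on-sessionDestroyed idea can be seen in isolation:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of the suggested fix: on session destruction, not only unsubscribe
// each dispatcher but also drain its retryQueue, so queued Retry objects
// cannot stay reachable through lingering references.
public class SessionCleanupSketch {

    // Hypothetical stand-in for the plugin's EventDispatcher.
    static class Dispatcher {
        final ConcurrentLinkedQueue<Object> retryQueue = new ConcurrentLinkedQueue<>();
        boolean subscribed = true;
        void unsubscribeAll() { subscribed = false; }
    }

    static void sessionDestroyed(Map<String, Dispatcher> dispatchers) {
        try {
            for (Dispatcher d : dispatchers.values()) {
                try {
                    d.unsubscribeAll();
                } finally {
                    d.retryQueue.clear();   // the call the comment above finds missing
                }
            }
        } finally {
            dispatchers.clear();
        }
    }

    public static void main(String[] args) {
        Map<String, Dispatcher> dispatchers = new HashMap<>();
        Dispatcher d = new Dispatcher();
        d.retryQueue.add(new Object());
        dispatchers.put("d1", d);
        sessionDestroyed(dispatchers);
        System.out.println("queue empty: " + d.retryQueue.isEmpty()
                + ", map empty: " + dispatchers.isEmpty()); // prints "queue empty: true, map empty: true"
    }
}
```

Putting the clear() in a finally block means the queue is drained even if unsubscribing throws, matching the defensive style of the existing listener.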
 

  

[JIRA] (JENKINS-50138) Job and executor freeze and become zombie due to groovy issue

2018-03-13 Thread airad...@gmail.com (JIRA)
Álvaro Iradier created an issue

Jenkins / JENKINS-50138

Job and executor freeze and become zombie due to groovy issue
 

  
 
 
 
 

 
Issue Type: Bug
Assignee: Unassigned
Components: pipeline
Created: 2018-03-13 11:49
Environment: Jenkins 2.73.3, Jenkins 2.89.4 LTS, Jenkins 2.111
Priority: Major
Reporter: Álvaro Iradier
 

  
 
 
 
 

 
Related to this issue reported to the Groovy team: https://issues.apache.org/jira/browse/GROOVY-8507. The groovyc compiler hangs and freezes if an enum is declared nested inside another enum. The problem is that when this happens, the Job and the executor in Jenkins become zombie threads, and they need to be killed by finding the thread name on the Jenkins console and stopping both the executor and the Job threads, as described here: https://stackoverflow.com/questions/14456592/how-to-stop-an-unstoppable-zombie-job-on-jenkins-without-restarting-the-server. The red cross icon next to the executor does not cancel the Job. I think the processes should not become zombies that are impossible to kill; this situation should either be detected, or the user should be allowed to kill the frozen Job.

How to reproduce: create a Pipeline-type job and include the following code in the Pipeline:

 

enum OuterEnum {
  VALUE,
 enum InnerEnum {
  A
 }
}

node {
 stage('Break') {
  echo '' + "${OuterEnum.values()}"
 }
}
 

 The Job and the executor will freeze, and there is no way to kill them from the UI.
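The workaround referenced above boils down to locating the stuck threads by name and forcing them to stop. A generic, standalone Java sketch of that idea follows; the "Executor #" name is illustrative, and a thread truly wedged inside the Groovy compiler may not respond to interrupts at all, which is consistent with the red cross failing too:

```java
import java.util.Map;

// Generic sketch of the "find the thread by name and kill it" workaround.
// Here the victim thread is interruptible; a genuinely hung compiler thread
// may ignore interrupts, in which case only a restart (or unsafe internals)
// can reclaim the executor.
public class KillByName {

    // Scan all live threads and interrupt those whose name matches the prefix.
    public static int interruptByPrefix(String prefix) {
        int hits = 0;
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            if (t.getName().startsWith(prefix)) {
                t.interrupt();
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) throws Exception {
        Thread zombie = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                // interrupted: exit, simulating a killed executor thread
            }
        }, "Executor #zombie");
        zombie.start();
        Thread.sleep(100);                       // let it reach sleep()
        int n = interruptByPrefix("Executor #");
        zombie.join(5_000);
        System.out.println("interrupted " + n + ", alive=" + zombie.isAlive());
    }
}
```

In a real Jenkins instance this kind of scan would be run from the script console against the controller JVM, which is exactly what the linked StackOverflow answer walks through.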