[JIRA] (JENKINS-49816) swarm node says connected succesffuly, but master has placed it offline
Jeff Thompson closed an issue as Cannot Reproduce

Jenkins / JENKINS-49816 swarm node says connected succesffuly, but master has placed it offline

Change By: Jeff Thompson
Status: Open -> Closed
Resolution: Cannot Reproduce

This message was sent by Atlassian Jira (v7.11.2#711002-sha1:fdc329d)
--
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group. To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
[JIRA] (JENKINS-49816) swarm node says connected succesffuly, but master has placed it offline
Jeff Thompson commented on JENKINS-49816:

I have no idea what's going on here and I haven't seen any other similar reports. Unfortunately there isn't enough information in the logs to figure anything out. It might be possible to figure more out with finer-grained logging but that would make it difficult to sift through it all to find something interesting. As Alex Gray notes it's common enough to be a nuisance but not frequent enough to be a big problem. Without more info there's not much to take action on. I don't know there's much reason to hold this ticket open.
[JIRA] (JENKINS-49816) swarm node says connected succesffuly, but master has placed it offline
Alex Gray commented on JENKINS-49816:

I was able to reproduce this locally by running a script that basically:

1. runs the "java -jar swarm ..." command to connect the slave to the master
2. verifies it was connected to the master

Out of a few thousand loops, it occasionally fails to connect (same logs I posted originally: basically, "INFO: Connected" is not present in the logs). Maybe if this failed 1/10 times it could be considered a bug, but at 1/1000, probably not.
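
The reproduction loop described in this comment could look roughly like the following Python sketch. The actual script was not posted, so everything here is an assumption: the function names, the idea of verifying the connection by looking for "INFO: Connected" in the client's output (the comment does not say how verification was done), and the timeout values are all hypothetical.

```python
import subprocess
import time
from pathlib import Path
from typing import Callable, List, Sequence

CONNECTED_MARKER = "INFO: Connected"  # marker the reporter looks for in the client log


def log_shows_connected(log_text: str) -> bool:
    """True if the client output contains the 'connected' marker."""
    return CONNECTED_MARKER in log_text


def attempt_connect(cmd: Sequence[str], log_path: str,
                    wait_seconds: float = 120.0, poll_seconds: float = 1.0) -> bool:
    """Start the client, wait for the marker to appear in its log, then stop it."""
    with open(log_path, "w") as log:
        proc = subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
    try:
        deadline = time.monotonic() + wait_seconds
        while time.monotonic() < deadline:
            if log_shows_connected(Path(log_path).read_text(errors="replace")):
                return True
            if proc.poll() is not None:
                break  # client exited on its own; do one final check below
            time.sleep(poll_seconds)
        return log_shows_connected(Path(log_path).read_text(errors="replace"))
    finally:
        if proc.poll() is None:
            proc.terminate()
        proc.wait()


def run_trials(attempt: Callable[[int], bool], trials: int) -> List[int]:
    """Call attempt(i) for each trial and return the indices that failed."""
    return [i for i in range(trials) if not attempt(i)]


def failure_rate(failures: List[int], trials: int) -> float:
    """Fraction of trials that failed (0.0 when no trials were run)."""
    return len(failures) / trials if trials else 0.0
```

A run might wire these together as `run_trials(lambda i: attempt_connect(cmd, f"swarm-{i}.log"), 1000)`, where `cmd` is something like `["java", "-jar", "swarm-client.jar", "-master", "https://jenkins.example.com/", "-retry", "30"]`. The jar name and URL are placeholders; `-master` and `-retry` are taken from the command line shown in the original report.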
[JIRA] (JENKINS-49816) swarm node says connected succesffuly, but master has placed it offline
Oleg Nenashev commented on JENKINS-49816:

Well, with the current info I cannot say whether it is a defect or not.
[JIRA] (JENKINS-49816) swarm node says connected succesffuly, but master has placed it offline
Alex Gray commented on JENKINS-49816:

Unfortunately, those are the only logs. I am spinning up pretty hefty AWS spot instances (c4.xlarge) when this happens. Out of 1000 spin-ups from the same AMI, I get a few of these failures. Most of our "regular" nodes are t2.small and they don't have any issues, but we only spin up a few of those per month, so it's probably not related to any type of load. We also have New Relic metrics and there is no load when this is happening.

The workaround is pretty easy: I have a cron job that runs every 5 minutes, and if the last line in the log is not "INFO: Connected" after a few minutes, I restart the swarm jar and it works.

I would rate this ticket as "low priority", since it's easy to detect and easy to work around.
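
The cron workaround described in this comment was not posted, but a minimal Python sketch of the same idea might look like this. It assumes the client's output is redirected to a log file, uses the file's modification time as a stand-in for "after a few minutes", and takes the restart command as a parameter; the names, the 180-second grace period, and the restart mechanism are all hypothetical.

```python
import subprocess
import time
from pathlib import Path
from typing import Sequence

CONNECTED_MARKER = "INFO: Connected"  # last line of a healthy log, per the report
GRACE_SECONDS = 180                   # assumed value for "a few minutes"


def last_nonempty_line(text: str) -> str:
    """Return the last non-blank line of the log, stripped, or '' if there is none."""
    for line in reversed(text.splitlines()):
        if line.strip():
            return line.strip()
    return ""


def needs_restart(log_text: str, idle_seconds: float,
                  grace_seconds: float = GRACE_SECONDS) -> bool:
    """True when the log has been idle past the grace period without ending in the marker."""
    if idle_seconds < grace_seconds:
        return False  # the client may still be starting up; leave it alone
    return last_nonempty_line(log_text) != CONNECTED_MARKER


def check_and_restart(log_path: Path, restart_cmd: Sequence[str]) -> bool:
    """Run one watchdog pass; return True if a restart was triggered."""
    text = log_path.read_text(errors="replace")
    idle = time.time() - log_path.stat().st_mtime  # seconds since the last log write
    if needs_restart(text, idle):
        subprocess.run(list(restart_cmd), check=True)
        return True
    return False
```

Scheduled every 5 minutes from cron, a wrapper would call something like `check_and_restart(Path("/var/log/swarm-client.log"), ["systemctl", "restart", "swarm-client"])`. Both the path and the service name are placeholders, not details from the report.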
[JIRA] (JENKINS-49816) swarm node says connected succesffuly, but master has placed it offline
Oleg Nenashev commented on JENKINS-49816:

Is there anything else in the master log? Maybe the agent becomes unavailable due to heavy classloading or something similar.
[JIRA] (JENKINS-49816) swarm node says connected succesffuly, but master has placed it offline
Alex Gray updated an issue

Jenkins / JENKINS-49816 swarm node says connected succesffuly, but master has placed it offline

Change By: Alex Gray

We spin up 1000's of nodes with swarm per month.

Every month we encounter a few scenarios where the swarm agent says it connected successfully, but the jenkins master does not show it.

The node has these logs (notice it does not say "INFO: Connected", which it usually does):

{panel:title=Swarm Logs}
INFO: Client.main invoked with: [-name eod-us-west-2_spot_m3.xlarge-i-03918a0ef1ef6d8be -description Created by Swarm. InstanceID=i-03918a0ef1ef6d8be AmiId=ami-a030b2d8 -executors 1 -fsroot /mnt/ope/ws -labels eod-us-west-2_spot_m3.xlarge -master https://jenkins.clearcare.it/ -mode normal -retry 30 -username s...@clearcareonline.com -password nJ0yuLYBcOJE -disableSslVerification]
Feb 28, 2018 7:49:57 PM hudson.plugins.swarm.Client run
INFO: Discovering Jenkins master
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See [http://www.slf4j.org/codes.html#StaticLoggerBinder] for further details.
Feb 28, 2018 7:50:14 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to [https://jenkins.clearcare.it/] ea7ab441-78d0-4548-a571-5feaae0be121 with ID fd8127ce
Feb 28, 2018 7:50:14 PM hudson.plugins.swarm.SwarmClient getCsrfCrumb
SEVERE: Could not obtain CSRF crumb. Response code: 404
Feb 28, 2018 7:50:15 PM hudson.remoting.jnlp.Main createEngine
INFO: Setting up slave: eod-us-west-2_spot_m3.xlarge-i-03918a0ef1ef6d8be-fd8127ce
Feb 28, 2018 7:50:15 PM hudson.remoting.jnlp.Main$CuiListener
INFO: Jenkins agent is running in headless mode.
Feb 28, 2018 7:50:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Locating server among [https://jenkins.foo.it/]
Feb 28, 2018 7:50:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Agent discovery successful
  Agent address: jenkins.foo.it
  Agent port: 30001
  Identity: c9:5a:43:aa:0e:bc:16:0a:c5:92:09:91:03:46:f7:ec
Feb 28, 2018 7:50:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Handshaking
Feb 28, 2018 7:50:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Connecting to jenkins.foo.it:30001
Feb 28, 2018 7:50:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Trying protocol: JNLP4-connect
Feb 28, 2018 7:50:15 PM hudson.remoting.jnlp.Main$CuiListener status
INFO: Remote identity confirmed: c9:5a:43:aa:0e:bc:16:0a:c5:92:09:91:03:46:f7:ec
{panel}

On the master logs, I see this:

WARNING: Making eod-us-west-2_spot_m3.xlarge-i-03918a0ef1ef6d8be-fd8127ce offline because it’s not responding

Restarting the java process does the trick, but I hate manually doing this.

It seems the swarm jar gets stuck after the log, "Remote identity confirmed". Again, out of 1000 times a month, this issue occurs maybe 2-4 times.
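
To see where a given client stalled, one option is to scan its log for the milestone messages that appear in the excerpt above and report the furthest one reached. The following Python sketch does that. The milestone list is copied from this one log plus the "INFO: Connected" line the reporter expects at the end; treating these strings as a fixed, ordered sequence is an assumption, and other Swarm or remoting versions may log different text.

```python
from typing import Optional

# Milestones in the order they appear in the log excerpt from this report.
# The final entry is the line the reporter says a healthy client prints last.
STAGES = [
    "Discovering Jenkins master",
    "Attempting to connect to",
    "Setting up slave",
    "Locating server among",
    "Agent discovery successful",
    "Handshaking",
    "INFO: Connecting to",
    "Trying protocol",
    "Remote identity confirmed",
    "INFO: Connected",
]


def furthest_stage(log_text: str) -> Optional[str]:
    """Return the latest milestone from STAGES found in the log, or None."""
    reached = None
    for stage in STAGES:
        if stage in log_text:
            reached = stage
    return reached


def stuck_after_identity(log_text: str) -> bool:
    """True for the failure pattern in this report: identity confirmed, never connected."""
    return furthest_stage(log_text) == "Remote identity confirmed"
```

For the log in this report, `furthest_stage` would return "Remote identity confirmed", which matches the reporter's observation about where the swarm jar appears to get stuck.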
[JIRA] (JENKINS-49816) swarm node says connected succesffuly, but master has placed it offline
Alex Gray created an issue

Jenkins / JENKINS-49816 swarm node says connected succesffuly, but master has placed it offline

Issue Type: Bug
Assignee: Oleg Nenashev
Components: swarm-plugin
Created: 2018-03-01 02:42
Environment: Jenkins ver. 2.89.4, Swarm 3.9
Priority: Major
Reporter: Alex Gray

We spin up 1000's of nodes with swarm per month.

Every month we encounter a few scenarios where the swarm agent says it connected successfully, but the jenkins master does not show it.

The node has these logs (notice it does not say "INFO: Connected", which it usually does):

Swarm Logs
INFO: Client.main invoked with: [-name eod-us-west-2_spot_m3.xlarge-i-03918a0ef1ef6d8be -description Created by Swarm. InstanceID=i-03918a0ef1ef6d8be AmiId=ami-a030b2d8 -executors 1 -fsroot /mnt/ope/ws -labels eod-us-west-2_spot_m3.xlarge -master https://jenkins.clearcare.it/ -mode normal -retry 30 -username s...@clearcareonline.com -password nJ0yuLYBcOJE -disableSslVerification]
Feb 28, 2018 7:49:57 PM hudson.plugins.swarm.Client run
INFO: Discovering Jenkins master
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
Feb 28, 2018 7:50:14 PM hudson.plugins.swarm.Client run
INFO: Attempting to connect to https://jenkins.clearcare.it/