Re: Frequent "Disconnected computer for node" messages in jenkins logs

2020-02-16 Thread kuisathaverat
On Sun, 16 Feb 2020 at 14:48, Vincent Massol wrote:

> This is why I've been asking from the beginning if it's normal :) Usually
> when there's a stack trace it's not really normal. But it happens so
> frequently that the only explanation I can think of is that it's the normal
> behavior of Jenkins.
>

So you see an exception on every disconnection? I've never seen this behavior
with the Docker plugin; I've seen the disconnection messages, but without an
exception.

Regards,
Iván Fernández Calvo
https://www.linkedin.com/in/iv%C3%A1n-fern%C3%A1ndez-calvo-21425033

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-users/CAKo5Qrr4%3D3cmCgfKe_MkzjByO7m88znQdFWuV1tYRrZ2Nkmy7A%40mail.gmail.com.


Re: Frequent "Disconnected computer for node" messages in jenkins logs

2020-02-16 Thread Vincent Massol
Hi Ivan,
 

> so the only difference is the stack trace of the exception, the log level 
> is the same.


Is it possible that you misunderstood the data at  
https://up1.xwikisas.com/#vI0VAypIpe_tD9LrQRTdMA ? :)

As is mentioned there, ALL of the lines are the same as the ones at the top 
(i.e. they all have a stack trace). I just didn't include the full lines for the 
sake of space ;)

This is why I've been asking from the beginning if it's normal :) Usually 
when there's a stack trace it's not really normal. But it happens so 
frequently that the only explanation I can think of is that it's the normal 
behavior of Jenkins.

WDYT?

Thanks again!
-Vincent



Re: Frequent "Disconnected computer for node" messages in jenkins logs

2020-02-16 Thread Iván Fernández Calvo



> On 16 Feb 2020, at 14:15, Vincent Massol wrote:
> 
> In both cases it'll be reported as INFO in the logs too. Right?

It seems so. I hadn't noticed that the exception is also an INFO message, so 
the only difference is the stack trace of the exception, the log level is the 
same.



Re: Frequent "Disconnected computer for node" messages in jenkins logs

2020-02-16 Thread Vincent Massol

>
> After seeing the log, I understand you are asking about the INFO messages that 
> report that a Docker agent has been disconnected. IIRC those messages are normal; 
> they only report the Docker agent status, 
>
>
Thanks for your reply. Let me make sure I understand. So the Docker Cloud 
plugin will spawn new Jenkins Docker agents. It'll stop the agents by using 
the DockerContainerWatchdog thread, which regularly tries to connect to the 
agent and, when it fails, removes the agent. This is what happened in the 
following example:

2020-02-14 09:13:37.434+ [id=268432] INFO i.j.docker.DockerTransientNode$1#println: Disconnected computer for node 'Jenkins SSH Slave a3-0094ebcvu7jkf'.
2020-02-14 09:13:37.434+ [id=268243] INFO h.r.SynchronousCommandTransport$ReaderThread#run: I/O error in channel Jenkins SSH Slave a3-0094ebcvu7jkf
java.net.SocketException: Socket closed
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
    at java.net.SocketInputStream.read(SocketInputStream.java:171)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at io.jenkins.docker.client.DockerMultiplexedInputStream.readInternal(DockerMultiplexedInputStream.java:48)
    at io.jenkins.docker.client.DockerMultiplexedInputStream.read(DockerMultiplexedInputStream.java:30)
    at hudson.remoting.FlightRecorderInputStream.read(FlightRecorderInputStream.java:91)
    at hudson.remoting.ChunkedInputStream.readHeader(ChunkedInputStream.java:72)
    at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:103)
    at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
    at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
    at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:63)
2020-02-14 09:13:37.440+ [id=268432] INFO i.j.docker.DockerTransientNode$1#println: Removed Node for node 'Jenkins SSH Slave a3-0094ebcvu7jkf'.

So it means that whether the agent finishes its work or whether there's a 
connection issue between the Jenkins master and the agent, it'll be reported 
the same way in the jenkins.log file. Basically it's the same mechanism for 
stopping an agent that has finished its work and for handling a connection error. 
In both cases it'll be reported as INFO in the logs too. Right?
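
One way to check this against the full log is to count, per agent, how many "Disconnected computer for node" lines and how many "I/O error in channel" lines appear. The sketch below is only an illustration. It assumes each log record sits on a single line (as in an unwrapped jenkins.log, not the wrapped form pasted in this email), and the names summarize, DISCONNECTED and IO_ERROR are made up for the example.

```python
import re
from collections import Counter

# Pattern for the docker-plugin message quoted above; captures the agent name.
DISCONNECTED = re.compile(r"Disconnected computer for node '([^']+)'")
# Pattern for the remoting message that precedes the stack trace.
IO_ERROR = re.compile(r"I/O error in channel (.+?)\s*$")


def summarize(lines):
    """Return two Counters keyed by agent name: disconnections and I/O errors."""
    disconnects, io_errors = Counter(), Counter()
    for line in lines:
        match = DISCONNECTED.search(line)
        if match:
            disconnects[match.group(1)] += 1
            continue
        match = IO_ERROR.search(line)
        if match:
            io_errors[match.group(1)] += 1
    return disconnects, io_errors


# Two records taken from the example above, joined onto single lines.
sample = [
    "2020-02-14 09:13:37.434+ [id=268432] INFO i.j.docker.DockerTransientNode$1#println: "
    "Disconnected computer for node 'Jenkins SSH Slave a3-0094ebcvu7jkf'.",
    "2020-02-14 09:13:37.434+ [id=268243] INFO h.r.SynchronousCommandTransport$ReaderThread#run: "
    "I/O error in channel Jenkins SSH Slave a3-0094ebcvu7jkf",
]
disconnects, io_errors = summarize(sample)
for agent, count in disconnects.items():
    print(agent, "disconnects:", count, "I/O errors:", io_errors[agent])
```

If the two counts match for every agent, each disconnection comes with a stack trace, so the log alone cannot tell a finished agent from a broken connection.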

> you can change the verbosity level of the Java package in the log configuration 
> to omit those types of messages if they bother you


Indeed that could be interesting. But I guess it means we would also no longer be 
able to see the real communication errors between master and agents.

Thanks a lot for your help. If you could confirm this it would be great; 
I'd be able to move forward and move to the next problems (we have plenty 
of intermittent errors to figure out ;)).

Have a great weekend
-Vincent



Re: Frequent "Disconnected computer for node" messages in jenkins logs

2020-02-15 Thread Ivan Fernandez Calvo
After seeing the log, I understand you are asking about the INFO messages that 
report that a Docker agent has been disconnected. IIRC those messages are normal; 
they only report the Docker agent status. You can change the verbosity level of 
the Java package in the log configuration to omit those types of messages if they 
bother you.
As for the other message, the InterruptedException, this looks like an issue, 
but there is not much info to troubleshoot it. You have to monitor those 
errors and try to find something in common: always the same job, the same Docker 
image, the same resources, ... The most common issue is a resource problem; in 
those cases the container is killed because of an OOM error. You can check whether 
this is the case if you can run docker inspect on the container.
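
As a rough illustration of that last check: docker inspect prints a JSON array with one object per container, and each object has a State section with an OOMKilled flag. The sketch below reads that flag from output that has already been captured. The function name oom_killed and the sample data are made up for the example, and it only helps if the container still exists when docker inspect is run.

```python
import json


def oom_killed(inspect_output: str) -> bool:
    """Return True if any container in `docker inspect` output has State.OOMKilled set."""
    containers = json.loads(inspect_output)  # docker inspect prints a JSON array
    return any(c.get("State", {}).get("OOMKilled", False) for c in containers)


# Hypothetical, heavily trimmed output for a container killed by the OOM killer.
sample = '[{"Id": "abc", "State": {"Status": "exited", "OOMKilled": true, "ExitCode": 137}}]'
print(oom_killed(sample))  # prints True
```

The input could come from something like subprocess.run(["docker", "inspect", name], capture_output=True, text=True).stdout, where name is the container name or ID.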




Re: Frequent "Disconnected computer for node" messages in jenkins logs

2020-02-14 Thread Vincent Massol
Thanks Ivan. We're not using SSH agents but Docker Cloud (the agents are 
provisioned on the fly as Docker containers).

I was indeed looking for how to turn on some debugging on the agent side 
but I couldn't find anything. Also the agent docker container is removed 
once the job is finished so it seems even harder to get some info about 
what's going on.

What I wanted to know is whether what we're experiencing is normal 
behavior for Jenkins or not. I'm asking because a lot of our jobs run fine 
every day but we still have several that are killed in mid-air 
every day. For example, if I take agent 6 (a6) from 
https://up1.xwikisas.com/#vI0VAypIpe_tD9LrQRTdMA I can see it was 
terminated on 2020-02-10 at:
* 4:44
* 5:06
* 5:24
* 7:45
* 10:06
* 10:24
* etc
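
To look for a pattern in those times, the gaps between consecutive terminations can be computed. This is only a small sketch: the function name gaps_in_minutes is made up, and it assumes all times fall on the same day and are listed in order.

```python
from datetime import datetime


def gaps_in_minutes(times):
    """Return the gap in minutes between each pair of consecutive H:MM times."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    return [int((later - earlier).total_seconds() // 60)
            for earlier, later in zip(parsed, parsed[1:])]


# Termination times of agent a6 on 2020-02-10, as listed above.
terminations = ["4:44", "5:06", "5:24", "7:45", "10:06", "10:24"]
print(gaps_in_minutes(terminations))  # prints [22, 18, 141, 141, 18]
```

The gaps could then be compared with when jobs start and finish on that agent.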

Now I don't think we have that many job failures every day. It's more like 
1 or 2 per day. So I'm not sure what to think of it. 

I was trying to investigate why we see the following regularly (every day) 
in our CI job logs:

Cannot contact Jenkins SSH Slave a6-009448n7sqon4: java.lang.InterruptedException
Agent Jenkins SSH Slave a6-009448n7sqon4 was deleted; cancelling node body
Could not connect to Jenkins SSH Slave a6-009448n7sqon4 to send interrupt signal to process

And then I discovered what I've pasted at 
https://up1.xwikisas.com/#vI0VAypIpe_tD9LrQRTdMA by looking at the jenkins 
master log file and I went "wow, how come there are so many disconnections".

Any idea is most welcome!

Thanks a lot
-Vincent


On Friday, 14 February 2020 at 19:50:27 UTC+1, Ivan Fernandez Calvo wrote:
>
> The ping thread and some monitoring tasks run every 4 min. I think the 
> disconnections happen before that process runs, but because there is no 
> activity on these agents, they are not detected until the ping thread runs. 
> So I guess you have half-closed connections; I mean, the agent closes the 
> connection but the master does not receive the reset packet. If you are 
> using SSH agents, you can enable verbose mode on the sshd server to 
> monitor what the heck happens; see 
> https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#common-info-needed-to-troubleshooting-a-bug
>



Re: Frequent "Disconnected computer for node" messages in jenkins logs

2020-02-14 Thread Ivan Fernandez Calvo
The ping thread and some monitoring tasks run every 4 min. I think the 
disconnections happen before that process runs, but because there is no activity 
on these agents, they are not detected until the ping thread runs. So I guess you 
have half-closed connections; I mean, the agent closes the connection but the 
master does not receive the reset packet. If you are using SSH agents, you can 
enable verbose mode on the sshd server to monitor what the heck happens; see 
https://github.com/jenkinsci/ssh-slaves-plugin/blob/master/doc/TROUBLESHOOTING.md#common-info-needed-to-troubleshooting-a-bug
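
To illustrate why an idle, half-closed connection goes unnoticed until a ping is sent, here is a minimal sketch of an application-level ping with a timeout. It is not how Jenkins remoting implements its ping thread: the ping function and the b"ping"/b"pong" payloads are made up for the example, and a local socket pair stands in for the master and agent.

```python
import socket


def ping(sock: socket.socket, timeout: float) -> bool:
    """Send a ping and wait up to `timeout` seconds for the matching reply.

    The b"ping"/b"pong" payloads are a made-up protocol for this sketch,
    not what Jenkins remoting actually sends.
    """
    sock.settimeout(timeout)
    try:
        sock.sendall(b"ping")
        return sock.recv(4) == b"pong"
    except OSError:
        # Covers a timeout (peer silent) and errors such as a broken pipe.
        return False


if __name__ == "__main__":
    # A connected pair of local sockets stands in for master and agent.
    master_side, agent_side = socket.socketpair()
    # The "agent" never answers. From the master's side nothing fails
    # until a ping goes unanswered, which is when the problem shows up.
    print(ping(master_side, timeout=0.5))  # prints False after about 0.5 s
    master_side.close()
    agent_side.close()
```

With a check like this running at a fixed interval, a connection that died silently is reported at the next ping rather than at the moment it actually broke.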



Re: Frequent "Disconnected computer for node" messages in jenkins logs

2020-02-14 Thread Victor Martinez
I've seen those stack traces with some other cloud node providers in 
Jenkins. 

Not sure whether that comes from the implementation in Jenkins core, from the 
docker-plugin itself, or from some specific design.
