> I increased idle timeout from 10min to 60min.

Was it around the time this [2] job failed recently?

  16:14:44 ++ sleep 184s
  16:16:29 FATAL: command execution failed

Vratko.

[2] https://jenkins.fd.io/job/csit-vpp-perf-verify-master-3n-hsw/335/console
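
As a quick sanity check on the timestamps quoted in this thread, the sketch below computes the gap between each "sleep" line and the following "FATAL" line and compares it with the old 10-minute idle timeout. The timestamps and sleep lengths are copied from the console excerpts in these messages; the helper names (elapsed, summarize) are made up for illustration, and it assumes both timestamps of a pair fall on the same day.

```python
from datetime import datetime, timedelta

# Old idle timeout mentioned in the thread (before the change to 60 min).
IDLE_TIMEOUT = timedelta(minutes=10)

# (sleep start, requested sleep in seconds, FATAL timestamp),
# copied from the console excerpts quoted in this thread.
EXCERPTS = [
    ("16:14:44", 184, "16:16:29"),  # build 335 (link [2])
    ("09:32:54", 184, "09:33:09"),  # build 333
    ("09:26:36", 197, "09:32:20"),  # excerpt quoted further down the thread
]


def elapsed(start: str, end: str) -> timedelta:
    """Time between two HH:MM:SS stamps, assuming the same day."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


def summarize(start: str, sleep_s: int, fatal: str) -> dict:
    """Compare the sleep-to-FATAL gap with the sleep length and the timeout."""
    gap = elapsed(start, fatal)
    return {
        "gap_s": int(gap.total_seconds()),
        "sleep_finished": gap >= timedelta(seconds=sleep_s),
        "within_old_timeout": gap < IDLE_TIMEOUT,
    }


for row in EXCERPTS:
    print(row, summarize(*row))
```

The gaps come out as 105 s, 15 s and 344 s, all under 10 minutes, and in two of the three cases the failure arrived before the requested sleep would have finished.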

-----Original Message-----
From: [email protected] <[email protected]> On Behalf Of Kenny Paul via RT
Sent: Tuesday, 2019-April-30 18:22
To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) <[email protected]>
Cc: [email protected]; [email protected]
Subject: [csit-dev] [FD.io Helpdesk #73486] Jenkins.fd.io network issues


I increased idle timeout from 10min to 60min. Let's see if that makes any 
difference.

Regards,

--
Anton Baranov
Sr. System Operations Engineer
The Linux Foundation

On Tue Apr 30 10:03:46 2019, [email protected] wrote:
> >> interleaved by quick periods of activity
> 
> >>> 09:26:36 ++ sleep 197s
> 
> > send any keepalive packages
> 
> I always assumed the console outputs are enough to keep jnlp 
> connection alive.
> 
> Also, I believe this failure over weekend has hit multiple jobs at 
> once.
> 
> For example https://jenkins.fd.io/job/csit-vpp-perf-verify-master-3n-hsw/333/console
>   09:32:54 ++ sleep 184s
>   09:33:09 FATAL: command execution failed
> 
> Vratko.
> 
> -----Original Message-----
> From: [email protected] <[email protected]> On Behalf Of Kenny Paul via RT
> Sent: Tuesday, 2019-April-30 15:57
> To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) <[email protected]>
> Cc: [email protected]; [email protected]
> Subject: [csit-dev] [FD.io Helpdesk #73486] Jenkins.fd.io network issues
> 
> Hello Vratko,
> 
> Thank you for the explanation. I'm wondering: within that period of time 
> when reservation was unsuccessful (~40min), does the job keep the jnlp 
> connection alive (send any keepalive packages)?
> 
> I checked the haproxy node where jnlp is running and I don't see any 
> DOWN notification for it.
> 
> Thanks,
> --
> Anton Baranov
> Sr. System Operations Engineer
> The Linux Foundation
> 
> On Tue Apr 30 09:27:56 2019, [email protected] wrote:
> > > 05:26:36 mkdir: cannot create directory '/tmp/reservation_dir': File exists
> >
> > That error is expected; it just means the testbed is currently used 
> > by another job, so this job should sleep a while and try again.
> >
> > > the job was waiting (sleep) from 04:45:12 til 05:26:36
> >
> > I believe my browser is showing me UTC timestamps, which show values 
> > larger by 4 hours.
> >
> > > we have 10m idle timeout
> >
> > The ~3m periods of sleep are interleaved by quick periods of 
> > activity, so we usually do not hit the timeout.
> >
> > But the final sleep probably took longer for some reason
> >
> > 09:26:36 ++ sleep 197s
> > 09:32:20 FATAL: command execution failed
> >
> > and something bad has happened in less than 6 minutes.
> > So it does not look like the 10m timeout.
> >
> > Vratko.
> >
> > -----Original Message-----
> > From: [email protected] <[email protected]> On Behalf Of Kenny Paul via RT
> > Sent: Tuesday, 2019-April-30 15:09
> > To: Jan Gelety -X (jgelety - PANTHEON TECHNOLOGIES at Cisco) <[email protected]>
> > Cc: [email protected]; [email protected]
> > Subject: [csit-dev] [FD.io Helpdesk #73486] Jenkins.fd.io network issues
> >
> > Hello Jan
> >
> > From logs I see that the job was waiting (sleep) from 04:45:12 til
> > 05:26:36 which could cause jnlp session to time out as we have 10m 
> > idle timeout (client and server side) set on jenkins.fd.io
> >
> > Could you check that error:
> >
> > 05:26:36 Reservation unsuccessful:
> > 05:26:36 mkdir: cannot create directory '/tmp/reservation_dir': File exists
> >
> > Cheers,
> >
> > --
> > Anton Baranov
> > Sr. System Operations Engineer
> > The Linux Foundation
> >
> > On Mon Apr 29 02:58:28 2019, [email protected] wrote:
> > > Hello,
> > >
> > > We are experiencing quite a lot of network issues when running 
> > > CSIT tests for the 19.04 report:
> > >
> > > Caused: hudson.remoting.ChannelClosedException: Channel "unknown":
> > > Remote call on JNLP4-connect connection from
> > > vex-yul-rot-ingress-1.ci.codeaurora.org/10.30.48.3:41068 failed.
> > > The channel is closing down or has closed down
> > >
> > > https://jenkins.fd.io/job/csit-vpp-perf-verify-1904-3n-hsw/13/console
> > >
> > > Could you please have a look at it?
> > >
> > > Thank you very much.
> > >
> > > Regards,
> > > Jan
> >
> >
> 
> 
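
The "mkdir: cannot create directory '/tmp/reservation_dir': File exists" message discussed in this thread is what a directory-based lock looks like when somebody else already holds it. The actual CSIT reservation script is not shown in these messages, so the following is only a minimal sketch of the general pattern (try mkdir, sleep, try again); the function names, the attempt count and the sleep length are all hypothetical.

```python
import os
import tempfile
import time


def try_reserve(lock_dir: str) -> bool:
    """Try to take the lock once; creating a directory either succeeds or
    fails with FileExistsError, so only one caller can win."""
    try:
        os.mkdir(lock_dir)
        return True
    except FileExistsError:
        return False


def reserve_with_retry(lock_dir: str, attempts: int, sleep_s: float) -> bool:
    """Retry the reservation, sleeping between attempts (hypothetical policy)."""
    for _ in range(attempts):
        if try_reserve(lock_dir):
            return True
        time.sleep(sleep_s)
    return False


def release(lock_dir: str) -> None:
    """Give the lock back by removing the directory."""
    os.rmdir(lock_dir)


if __name__ == "__main__":
    # Use a throwaway directory instead of a real /tmp/reservation_dir.
    with tempfile.TemporaryDirectory() as base:
        lock = os.path.join(base, "reservation_dir")
        print(reserve_with_retry(lock, attempts=3, sleep_s=0.01))  # lock is free
        print(reserve_with_retry(lock, attempts=3, sleep_s=0.01))  # lock is held
        release(lock)
        print(reserve_with_retry(lock, attempts=3, sleep_s=0.01))  # free again
```

With this pattern a job that finds the testbed busy produces no output while it sleeps, which is why the idle timeout on the connection matters.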


