Re: leap second outage

2015-07-01 Thread Justin Paine via NANOG
Any confirmation if the AWS outage was leap second-related?


Justin Paine
Head of Trust & Safety
CloudFlare Inc.
PGP KeyID: 57B6 0114 DE0B 314D


On Tue, Jun 30, 2015 at 8:32 PM, Dovid Bender do...@telecurve.com wrote:
 I read that, and also that it happens at midnight local time, since that's
 when you have the extra second. I know a large carrier in Israel is down.
 Waiting for confirmation on whether it's leap-second related.

 --Original Message--
 From: Stefan
 Sender: NANOG
 To: frnk...@iname.com
 Cc: nanog@nanog.org
 Subject: Re: leap second outage
 Sent: Jun 30, 2015 23:30

 This was supposed to have happened @midnight UTC, right? Meaning that we
 are past that event. Under which scenarios should people be concerned about
 midnight local time? Lots of confusing messages flying all over...
 On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

 We experienced our first leap second outage -- our SHE (super head end) is
 using (old) Motorola encoders and we lost those video channels.  They
 restarted all those encoders to restore service.

 Frank



 Regards,

 Dovid


Re: leap second outage

2015-07-01 Thread Jimmy Hess
On Wed, Jul 1, 2015 at 12:38 AM, Mikael Abrahamsson swm...@swm.pp.se wrote:
 quickly. Either we should abolish the leap second or we should make leap
 second adjustments (back and forth) on a monthly basis to exercise the code.

See, maybe there should some day be building codes for
commercially marketed software that require a minimum of independent
formal testing, done by licensed independent testers, including
leap seconds and such. ^_^

Leap second issues are possibly rare and intermittent; therefore,
having a few per month does not necessarily give adequate exposure
to code paths that may go wrong during an insert/delete event.

There has never been a negative leap second, only insertions, so how
deletions are implemented might expose new bugs, since there hasn't
been one before. And you can only have one leap per 24 hours,
positive or negative; pick one.

Shouldn't this kind of 'exercise' be done during the QA process
before releasing new system software, rather than mucking with clock
accuracy?

There is a recent article with some Leap Second  'stress testing' code:
  https://access.redhat.com/articles/199563


Test methods are readily available, so there ought to be
little legitimate excuse for anyone writing serious software that has
long-running processes or threads not to include evaluation for
possible leap second issues and other possible clock-related issues,
such as clock stepping, DST, and Year 2038, in their standard smoke
tests.
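
As one small illustration of the clock-related checks listed above, a smoke test can pin down where a signed 32-bit time_t runs out. This is only a sketch in Python, not taken from the Red Hat article; the helper name wrap_int32 is made up for illustration.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def wrap_int32(seconds: int) -> int:
    """Emulate storing a second count in a signed 32-bit integer (hypothetical helper)."""
    return (seconds + 2**31) % 2**32 - 2**31


# Last representable second, and what the counter reads one second later.
last_good = EPOCH + timedelta(seconds=INT32_MAX)
after_wrap = EPOCH + timedelta(seconds=wrap_int32(INT32_MAX + 1))

print(last_good.isoformat())   # 2038-01-19T03:14:07+00:00
print(after_wrap.isoformat())  # 1901-12-13T20:45:52+00:00
```

A real smoke test would feed timestamps on both sides of that boundary into the code under test rather than just printing them.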

 --
 Mikael Abrahamsson    email: swm...@swm.pp.se
--
-JH


Re: leap second outage

2015-07-01 Thread Johnny Eriksson
Mikael Abrahamsson swm...@swm.pp.se wrote:
 This is similar to the jiffy counter wrapping: since it doesn't happen
 that often, it's not commonly tested for. A good approach is to start the
 jiffy counter so it wraps after 10 minutes of uptime. That way you'll run
 into any bugs quickly. Either we should abolish the leap second or we should
 make leap second adjustments (back and forth) on a monthly basis to
 exercise the code.

You could do this, move back on even-numbered months and forward on odd.

Any real adjustment could be done via inhibiting the monthly change...

 This is a hard sell though...

'fraid so.

 Mikael Abrahamsson    email: swm...@swm.pp.se

--Johnny


RE: leap second outage

2015-07-01 Thread frnkblk
Yes, it happened at 7 pm Central (00:00 UTC).

 

From: Stefan [mailto:netfort...@gmail.com] 
Sent: Tuesday, June 30, 2015 10:30 PM
To: frnk...@iname.com
Cc: nanog@nanog.org
Subject: Re: leap second outage

 

This was supposed to have happened @midnight UTC, right? Meaning that we are 
past that event. Under which scenarios should people be concerned about 
midnight local time? Lots of confusing messages flying all over... 

On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

We experienced our first leap second outage -- our SHE (super head end) is
using (old) Motorola encoders and we lost those video channels.  They
restarted all those encoders to restore service.

Frank



RE: leap second outage

2015-07-01 Thread frnkblk
And just 12.5% of them required TLC. =)

-Original Message-
From: NANOG [mailto:nanog-boun...@nanog.org] On Behalf Of frnk...@iname.com
Sent: Wednesday, July 01, 2015 7:05 AM
To: 'Stefan'
Cc: nanog@nanog.org
Subject: RE: leap second outage

Yes, it happened at 7 pm Central (00:00 UTC).

 

From: Stefan [mailto:netfort...@gmail.com] 
Sent: Tuesday, June 30, 2015 10:30 PM
To: frnk...@iname.com
Cc: nanog@nanog.org
Subject: Re: leap second outage

 

This was supposed to have happened @midnight UTC, right? Meaning that we are 
past that event. Under which scenarios should people be concerned about 
midnight local time? Lots of confusing messages flying all over... 

On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

We experienced our first leap second outage -- our SHE (super head end) is
using (old) Motorola encoders and we lost those video channels.  They
restarted all those encoders to restore service.

Frank





Re: leap second outage

2015-07-01 Thread Tim Raphael
No, it was a route leak by a colo provider (Axcelx) downstream.

Regards,

Tim Raphael

 On 1 Jul 2015, at 11:37 am, Justin Paine via NANOG nanog@nanog.org wrote:
 
 Any confirmation if the AWS outage was leap second-related?
 
 
 Justin Paine
 Head of Trust & Safety
 CloudFlare Inc.
 PGP KeyID: 57B6 0114 DE0B 314D
 
 
 On Tue, Jun 30, 2015 at 8:32 PM, Dovid Bender do...@telecurve.com wrote:
 I read that, and also that it happens at midnight local time, since that's
 when you have the extra second. I know a large carrier in Israel is down.
 Waiting for confirmation on whether it's leap-second related.
 
 --Original Message--
 From: Stefan
 Sender: NANOG
 To: frnk...@iname.com
 Cc: nanog@nanog.org
 Subject: Re: leap second outage
 Sent: Jun 30, 2015 23:30
 
 This was supposed to have happened @midnight UTC, right? Meaning that we
 are past that event. Under which scenarios should people be concerned about
 midnight local time? Lots of confusing messages flying all over...
 On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:
 
 We experienced our first leap second outage -- our SHE (super head end) is
 using (old) Motorola encoders and we lost those video channels.  They
 restarted all those encoders to restore service.
 
 Frank
 
 Regards,
 
 Dovid


Re: leap second outage

2015-07-01 Thread Harlan Stenn
Mikael Abrahamsson writes:
 This is similar to the jiffy counter wrapping: since it doesn't happen
 that often, it's not commonly tested for. A good approach is to start the
 jiffy counter so it wraps after 10 minutes of uptime. That way you'll run
 into any bugs quickly. Either we should abolish the leap second or we should
 make leap second adjustments (back and forth) on a monthly basis to
 exercise the code.
 
 This is a hard sell though...

and it's perversely interesting.  It would even be tolerable when the
difference between UTC and UT1 is such that the insertions and deletions
maintain the +/- .9 s difference.  There would even be enough time to
warn folks about this.
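
The condition described above can be made concrete with a little arithmetic. The following is a rough sketch only: it assumes the usual convention DUT1 = UT1 - UTC, assumes an inserted leap second raises DUT1 by one second and a deleted one lowers it by one second, and ignores Earth-rotation drift between the two events. The function name alternating_ok is hypothetical.

```python
LIMIT = 0.9  # seconds; the |UT1 - UTC| bound mentioned above


def alternating_ok(dut1: float, first: str = "insert") -> bool:
    """Return True if a practice insert/delete pair keeps |DUT1| within LIMIT.

    dut1 is UT1 - UTC in seconds before the first adjustment. Drift between
    the two adjustments is ignored (a simplifying assumption).
    """
    step = 1.0 if first == "insert" else -1.0
    after_first = dut1 + step
    after_second = after_first - step  # the opposite adjustment undoes the first
    return all(abs(x) <= LIMIT for x in (dut1, after_first, after_second))


print(alternating_ok(-0.5))            # True: -0.5 -> +0.5 -> -0.5
print(alternating_ok(+0.5))            # False: inserting first would give +1.5
print(alternating_ok(+0.5, "delete"))  # True: +0.5 -> -0.5 -> +0.5
```

Under these assumptions, a practice pair only fits when DUT1 is already at least 0.1 s away from zero and the first adjustment moves it through zero rather than away from it.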

H


Re: leap second outage

2015-07-01 Thread Colin Johnston
Oracle Linux did this:
Jul  1 02:01:29 oraclelinux ntpd[600]: 0.0.0.0 061c 0c clock_step -1.006445 s
Jul  1 02:01:29 oraclelinux ntpd[600]: 0.0.0.0 0615 05 clock_sync
Jul  1 02:01:29 oraclelinux systemd: Time has been changed
Jul  1 02:01:30 oraclelinux ntpd[600]: 0.0.0.0 c618 08 no_sys_peer
all seemed fine after this

Sophos UTM did this:
2015:07:01-00:59:59 cloudsophosvm kernel: [653957.707421] Clock: inserting leap second 23:59:60 UTC
all seemed fine after this
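
For anyone wanting to check a fleet for the same behaviour, log lines like the Oracle Linux ones above can be scanned for clock steps. This is a minimal sketch that assumes the ntpd message format is exactly as quoted; other ntpd versions or syslog layouts may differ, and the function name find_clock_steps is made up for illustration.

```python
import re

# Pattern for the ntpd "clock_step" event line quoted above; the exact
# format on other ntpd versions is an assumption.
STEP_RE = re.compile(r"ntpd\[\d+\]: .* clock_step ([+-]?\d+\.\d+) s")


def find_clock_steps(lines):
    """Return the step sizes (in seconds) found in syslog-style lines."""
    steps = []
    for line in lines:
        match = STEP_RE.search(line)
        if match:
            steps.append(float(match.group(1)))
    return steps


log = [
    "Jul  1 02:01:29 oraclelinux ntpd[600]: 0.0.0.0 061c 0c clock_step -1.006445 s",
    "Jul  1 02:01:29 oraclelinux ntpd[600]: 0.0.0.0 0615 05 clock_sync",
    "Jul  1 02:01:29 oraclelinux systemd: Time has been changed",
]
print(find_clock_steps(log))  # [-1.006445]
```

A step of roughly -1 s, as here, suggests that ntpd corrected the clock after the fact, whereas the Sophos line shows the kernel inserting 23:59:60 itself.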


Colin




Re: leap second outage

2015-07-01 Thread Harlan Stenn
Jimmy Hess writes:
 On Wed, Jul 1, 2015 at 12:38 AM, Mikael Abrahamsson swm...@swm.pp.se wrote:
  quickly. Either we should abolish the leap second or we should make leap
  second adjustments (back and forth) on a monthly basis to exercise the code.
 
 See, maybe there should some day be building codes for
 commercially marketed software that require a minimum of independent
 formal testing, done by licensed independent testers, including
 leap seconds and such. ^_^

And NTF's Certification and Compliance programs are going to do this.
At least as soon as NTF has the resources to get this moving.

 Leap second issues are possibly rare and intermittent; therefore,
 having a few per month does not necessarily give adequate exposure
 to code paths that may go wrong during an insert/delete event.

If they happened every 6 months that would be often enough, but
the earth hasn't slowed down that much yet.  There will be enough times
that we could insert or delete one every month and still have |UTC-UT1|
be under .9 seconds.

If it was announced that starting in 6 months' time we'll be inserting
or deleting a leap second every month or so that would give folks enough
time to prep for it, and I'm pretty confident that the leap-second would
soon become a non-event.

 There has never been a negative leap second, only insertions, so how
 deletions are implemented might expose new bugs, since there hasn't
 been one before. And you can only have one leap per 24 hours,
 positive or negative; pick one.

Yup.

 Shouldn't this kind of 'exercise' be done during the QA process
 before releasing new system software, rather than mucking with clock
 accuracy?

Leap second handling is a mechanism question.  Which one to choose is
a policy question.  IMO, a vendor should provide adequate mechanism.
The customer should get to choose policy.

 There is a recent article with some Leap Second  'stress testing' code:
   https://access.redhat.com/articles/199563
 
 
 Test methods are readily available, so there ought to be
 little legitimate excuse for anyone writing serious software that has
 long-running processes or threads not to include evaluation for
 possible leap second issues and other possible clock-related issues,
 such as clock stepping, DST, and Year 2038, in their standard smoke
 tests.

Yes.  And even so, testing these things takes time and equipment.
-- 
Harlan Stenn st...@ntp.org
http://networktimefoundation.org - be a member!


Re: leap second outage

2015-06-30 Thread Dovid Bender
I read that, and also that it happens at midnight local time, since that's
when you have the extra second. I know a large carrier in Israel is down.
Waiting for confirmation on whether it's leap-second related.

--Original Message--
From: Stefan
Sender: NANOG
To: frnk...@iname.com
Cc: nanog@nanog.org
Subject: Re: leap second outage
Sent: Jun 30, 2015 23:30

This was supposed to have happened @midnight UTC, right? Meaning that we
are past that event. Under which scenarios should people be concerned about
midnight local time? Lots of confusing messages flying all over...
On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

 We experienced our first leap second outage -- our SHE (super head end) is
 using (old) Motorola encoders and we lost those video channels.  They
 restarted all those encoders to restore service.

 Frank



Regards,

Dovid


Re: leap second outage

2015-06-30 Thread Nicholas Suan
Correct, the leap second gets inserted at midnight UTC.

 Leap seconds can be introduced in UTC at the end of the months of December
 or June, depending on the evolution of UT1-TAI. Bulletin C is mailed every
 six months, either to announce a time step in UTC or to confirm that there
 will be no time step at the next possible date.

ftp://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat
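
To clear up the "midnight UTC versus midnight local time" confusion, the single UTC instant can be converted to a few local clocks. This is a small sketch; the fixed offsets below are hard-coded summer-time offsets for this date only and are an assumption, not a general time zone lookup.

```python
from datetime import datetime, timedelta, timezone

# The instant right after the inserted second, 2015-07-01 00:00:00 UTC.
after_leap = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)

# Fixed summer-time offsets, valid for this date only (assumption).
zones = {
    "US Central (CDT)": timezone(timedelta(hours=-5)),
    "US Eastern (EDT)": timezone(timedelta(hours=-4)),
    "Israel (IDT)": timezone(timedelta(hours=3)),
}

for name, tz in zones.items():
    print(name, after_leap.astimezone(tz).strftime("%Y-%m-%d %H:%M"))
# US Central (CDT) 2015-06-30 19:00
# US Eastern (EDT) 2015-06-30 20:00
# Israel (IDT) 2015-07-01 03:00
```

The leap second happens once, at the same instant everywhere; only the wall-clock label differs.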

On Tue, Jun 30, 2015 at 11:30 PM, Stefan netfort...@gmail.com wrote:
 This was supposed to have happened @midnight UTC, right? Meaning that we
 are past that event. Under which scenarios should people be concerned about
 midnight local time? Lots of confusing messages flying all over...
 On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

 We experienced our first leap second outage -- our SHE (super head end) is
 using (old) Motorola encoders and we lost those video channels.  They
 restarted all those encoders to restore service.

 Frank




leap second outage

2015-06-30 Thread frnkblk
We experienced our first leap second outage -- our SHE (super head end) is
using (old) Motorola encoders and we lost those video channels.  They
restarted all those encoders to restore service.

Frank



Re: leap second outage

2015-06-30 Thread Stefan
This was supposed to have happened @midnight UTC, right? Meaning that we
are past that event. Under which scenarios should people be concerned about
midnight local time? Lots of confusing messages flying all over...
On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

 We experienced our first leap second outage -- our SHE (super head end) is
 using (old) Motorola encoders and we lost those video channels.  They
 restarted all those encoders to restore service.

 Frank




Re: leap second outage

2015-06-30 Thread Dovid Bender
No. Someone leaked some routes: 
https://mobile.twitter.com/Axcelx/status/616058414746202113


Regards,

Dovid

-Original Message-
From: Justin Paine jus...@cloudflare.com
Date: Tue, 30 Jun 2015 20:37:06 
To: do...@telecurve.com
Cc: Stefan netfort...@gmail.com; NANOG nanog-boun...@nanog.org; 
frnk...@iname.com; nanog@nanog.org
Subject: Re: leap second outage

Any confirmation if the AWS outage was leap second-related?


Justin Paine
Head of Trust & Safety
CloudFlare Inc.
PGP KeyID: 57B6 0114 DE0B 314D


On Tue, Jun 30, 2015 at 8:32 PM, Dovid Bender do...@telecurve.com wrote:
 I read that, and also that it happens at midnight local time, since that's
 when you have the extra second. I know a large carrier in Israel is down.
 Waiting for confirmation on whether it's leap-second related.

 --Original Message--
 From: Stefan
 Sender: NANOG
 To: frnk...@iname.com
 Cc: nanog@nanog.org
 Subject: Re: leap second outage
 Sent: Jun 30, 2015 23:30

 This was supposed to have happened @midnight UTC, right? Meaning that we
 are past that event. Under which scenarios should people be concerned about
 midnight local time? Lots of confusing messages flying all over...
 On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

 We experienced our first leap second outage -- our SHE (super head end) is
 using (old) Motorola encoders and we lost those video channels.  They
 restarted all those encoders to restore service.

 Frank



 Regards,

 Dovid


Re: leap second outage

2015-06-30 Thread Josh Luthman
That is my understanding as well.  The event was about 3.5 hours ago.


Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

On Tue, Jun 30, 2015 at 11:30 PM, Stefan netfort...@gmail.com wrote:

 This was supposed to have happened @midnight UTC, right? Meaning that we
 are past that event. Under which scenarios should people be concerned about
 midnight local time? Lots of confusing messages flying all over...
 On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

  We experienced our first leap second outage -- our SHE (super head end) is
  using (old) Motorola encoders and we lost those video channels.  They
  restarted all those encoders to restore service.
 
  Frank
 
 



Re: leap second outage

2015-06-30 Thread Jean-Francois Mezei
On 15-07-01 00:47, Harlan Stenn wrote:

 What I'm about to say may not be as stupid as it sounds:  The problems
 here aren't problems for cases where it's not a problem.  It is a
 problem where it *is* a problem.

In fairness, systems should be used to NTP making adjustments to the
system clock of a second or less.

However, systems that expect tightly synchronized clocks would
want all the nodes to make the NTP adjustment at the same time.



Re: leap second outage

2015-06-30 Thread Harlan Stenn
Joe writes:
 A leap second causing issues? For about 40 years now, there have been
 these leap seconds with no real issue. All of these are go-forwards

No, they're all go-backwards events.  That's no big deal to things
that don't care about monotonic time, or to folks who aren't in
violation of something if their timestamps are off by a second.
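
A short sketch of why this matters for code that does care. POSIX-style timestamps have no slot for 23:59:60, so the leap second and the following midnight share one value, and wall-clock time can appear to repeat or step back. The usual remedy is to measure intervals with a monotonic clock. The helper name timed is made up for illustration.

```python
import calendar
import time

# POSIX time has no slot for 23:59:60, so the leap second and the following
# midnight map to the same timestamp.
leap = calendar.timegm((2015, 6, 30, 23, 59, 60, 0, 0, 0))
midnight = calendar.timegm((2015, 7, 1, 0, 0, 0, 0, 0, 0))
print(leap, midnight, leap == midnight)  # 1435708800 1435708800 True


def timed(func):
    """Run func and return its duration measured on the monotonic clock."""
    start = time.monotonic()
    func()
    return time.monotonic() - start


elapsed = timed(lambda: sum(range(100_000)))
print(elapsed >= 0.0)  # True; monotonic readings never go backwards
```

Durations computed from time.time() across the leap second could come out a second short or even negative, which is the kind of bug that only shows up where it *is* a problem.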

What I'm about to say may not be as stupid as it sounds:  The problems
here aren't problems for cases where it's not a problem.  It is a
problem where it *is* a problem.

It's a case where one person's signal is another person's noise.

H


Re: leap second outage

2015-06-30 Thread Joe
A leap second causing issues? For about 40 years now, there have been
these leap seconds with no real issue. All of these are go-forwards,
and even MS AD (I believe) treats them as a little bump (nothing to see
here, move along). So unless you have a really tight VPN (one that is
not standards-conforming), I'd hope that nothing has happened, and if it
did, chances are it's either coincidence or intentional.
I certainly hope I am around to collect on the
https://en.wikipedia.org/wiki/Year_2038_problem for retirement.
I think we've all seen the big to-do regarding Y2K to know better.
Maybe I am wrong, but...

Just my 2¢s
-Joe

On Tue, Jun 30, 2015 at 10:42 PM, Nicholas Suan ns...@nonexiste.net wrote:
 Correct, the leap second gets inserted at midnight UTC.

  Leap seconds can be introduced in UTC at the end of the months of December
  or June, depending on the evolution of UT1-TAI. Bulletin C is mailed every
  six months, either to announce a time step in UTC or to confirm that there
  will be no time step at the next possible date.

 ftp://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat

 On Tue, Jun 30, 2015 at 11:30 PM, Stefan netfort...@gmail.com wrote:
 This was supposed to have happened @midnight UTC, right? Meaning that we
 are past that event. Under which scenarios should people be concerned about
 midnight local time? Lots of confusing messages flying all over...
 On Jun 30, 2015 10:13 PM, frnk...@iname.com wrote:

 We experienced our first leap second outage -- our SHE (super head end) is
 using (old) Motorola encoders and we lost those video channels.  They
 restarted all those encoders to restore service.

 Frank





-- 
-Joe
920-530-3631


Re: leap second outage

2015-06-30 Thread Mikael Abrahamsson

On Wed, 1 Jul 2015, Jean-Francois Mezei wrote:

However, systems that expect tightly synchronized clocks would
want all the nodes to make the NTP adjustment at the same time.


This is both an operating system and application problem.

http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time
http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time

This is similar to the jiffy counter wrapping: since it doesn't happen
that often, it's not commonly tested for. A good approach is to start the
jiffy counter so it wraps after 10 minutes of uptime. That way you'll run
into any bugs quickly. Either we should abolish the leap second or we should
make leap second adjustments (back and forth) on a monthly basis to
exercise the code.
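
The jiffy trick can be sketched as follows. This is an illustration in Python, loosely modeled on the wrap-safe comparison idea behind the Linux kernel's time_after() macro; the 32-bit width, the tick rate, and the variable names are assumptions made for the example.

```python
MASK = 0xFFFFFFFF  # emulate a 32-bit unsigned tick counter


def time_after(a: int, b: int) -> bool:
    """Wrap-safe 'a is later than b' for 32-bit counters.

    Valid while the two values are less than 2**31 ticks apart.
    """
    diff = (a - b) & MASK
    return diff != 0 and diff < 0x80000000


TICKS_PER_SEC = 100                  # hypothetical tick rate
WRAP_AFTER = 10 * 60 * TICKS_PER_SEC
start = (MASK + 1) - WRAP_AFTER      # initial value that wraps after 10 minutes

deadline = (start + WRAP_AFTER + 500) & MASK  # lands just past the wrap
now = start

print(deadline < now)             # True: a naive comparison says "already expired"
print(time_after(deadline, now))  # True: the wrap-safe comparison gets it right
```

Starting the counter close to its wrap point means any code using the naive comparison misbehaves within minutes of boot instead of after weeks of uptime.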


This is a hard sell though...

--
Mikael Abrahamsson    email: swm...@swm.pp.se