Re: leap second outage

2015-07-01 Thread Harlan Stenn
Jimmy Hess writes:
> On Wed, Jul 1, 2015 at 12:38 AM, Mikael Abrahamsson  wrote:
> > quickly. Either we should abolish the leap second or we should make leap
> > second adjustments (back and forth) on a monthly basis to exercise the code.
> 
> See, maybe there should someday be building codes for
> commercially marketed software that require minimum formal
> testing by licensed independent testers, including
> leap seconds and such. ^_^

And NTF's Certification and Compliance programs are going to do this.
At least as soon as NTF has the resources to get this moving.

> Leap second issues are possibly rare and intermittent; therefore,
> having a few per month does not necessarily give adequate exposure
> to code paths that may go wrong during an insert/delete event.

If they happened every six months, that would be often enough, but
the Earth hasn't slowed down that much yet. There will be enough times
when we could insert or delete one every month and still keep |UTC-UT1|
under 0.9 seconds.
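A rough sketch of the arithmetic behind that claim, in Python. It assumes DUT1 = UT1 - UTC, that an inserted leap second raises DUT1 by 1 s and a deleted one lowers it by 1 s, and it ignores UT1's slow drift between months. The helper names and starting values are hypothetical, not IERS data.

```python
# Sketch: can a monthly insert/delete cycle keep |UT1 - UTC| under 0.9 s?
# Assumptions: DUT1 = UT1 - UTC; an inserted leap second delays UTC by 1 s,
# so DUT1 rises by 1 s, and a deleted one lowers DUT1 by 1 s. UT1's slow
# drift between months is ignored to keep the arithmetic visible.

LIMIT = 0.9  # seconds, the |UT1 - UTC| bound mentioned in the thread


def dut1_after(start: float, steps: list[int]) -> list[float]:
    """Return DUT1 after each monthly step (+1 = insert, -1 = delete)."""
    values = []
    dut1 = start
    for step in steps:
        dut1 += step
        values.append(dut1)
    return values


def stays_within_limit(start: float, steps: list[int]) -> bool:
    """True if |DUT1| < LIMIT at the start and after every step."""
    return all(abs(v) < LIMIT for v in [start, *dut1_after(start, steps)])


if __name__ == "__main__":
    cycle = [+1, -1] * 6  # a hypothetical year of alternating leaps
    for start in (-0.5, -0.05, 0.3):
        print(start, stays_within_limit(start, cycle))
```

With an insertion first, the alternating cycle only fits when DUT1 starts strictly between -0.9 s and -0.1 s, so it works at some times and not at others.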

If it were announced that "starting in 6 months' time, we'll be inserting
or deleting a leap second every month or so," that would give folks enough
time to prep for it, and I'm pretty confident that the leap second would
soon become a non-event.

> There's never been a negative leap second, only insertions, but how
> deletions are implemented might expose new bugs, since there hasn't
> been one before. And you can only have one leap per 24 hours,
> positive or negative; pick one.

Yup.

> And shouldn't this kind of 'exercise' be done during the QA process
> before releasing new system software, rather than mucking with clock
> accuracy?

Leap second handling is a "mechanism" question. Which one to choose is
a "policy" question. IMO, a vendor should provide adequate mechanism.
The customer should get to choose policy.
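To make the mechanism/policy split concrete, here is a hedged Python sketch: one mechanism that knows where a positive leap second falls, and two interchangeable policies a customer could choose between, a POSIX-style step (a repeated second) and a linear smear. The smear window, the function names, and the simplified time input are hypothetical illustrations, not any vendor's actual implementation.

```python
# Sketch of "mechanism vs. policy" for one positive leap second.
# Mechanism: knowing the instant at which the leap second happens (LEAP_AT).
# Policy: how the system clock presents the extra second to applications.
# Input t counts elapsed SI seconds and equals POSIX time before the leap;
# this is a simplification, not how any particular kernel tracks time.
from typing import Callable

# POSIX time of 2015-07-01 00:00:00 UTC; in t, the inserted 23:59:60 starts here.
LEAP_AT = 1435708800

Policy = Callable[[float], float]


def step_policy(t: float) -> float:
    """POSIX-style step: the clock falls back 1 s when 23:59:60 begins,
    so the readings for 23:59:59 are produced twice."""
    return t if t < LEAP_AT else t - 1


def make_smear_policy(window: float) -> Policy:
    """Linear smear: absorb the extra second evenly over `window` seconds that
    end when the leap second ends. The window length is a hypothetical choice."""
    start = LEAP_AT + 1 - window

    def smear(t: float) -> float:
        fraction = min(max((t - start) / window, 0.0), 1.0)
        return t - fraction

    return smear


def system_clock(t: float, policy: Policy) -> float:
    """The vendor supplies the mechanism; the customer picks the policy."""
    return policy(t)


if __name__ == "__main__":
    smear = make_smear_policy(window=86400.0)
    for t in (LEAP_AT - 0.5, LEAP_AT + 0.5, LEAP_AT + 1.5):
        print(t, system_clock(t, step_policy), system_clock(t, smear))
```

step_policy returns the same reading for two different real seconds; the smear never repeats a reading but runs slightly slow across its window. Either can sit on top of the same leap second mechanism.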

> There is a recent article with some leap second 'stress testing' code:
>   https://access.redhat.com/articles/199563
> 
> 
> Test methods are readily available, so there ought to be
> little legitimate excuse for anyone writing serious software that has
> long-running processes or threads not to include evaluation for
> possible leap second issues and other clock-related issues
> such as clock stepping, DST, and Year 2038 in their standard smoke
> tests.

Yes.  And even so, testing these things takes time and equipment.
-- 
Harlan Stenn 
http://networktimefoundation.org - be a member!


Re: leap second outage

2015-07-01 Thread Tim Raphael
No, it was a route leak by a colo provider (Axcelx) downstream.

Regards,

Tim Raphael

> On 1 Jul 2015, at 11:37 am, Justin Paine via NANOG  wrote:
> 
> Any confirmation if the AWS outage was leap second-related?


RE: leap second outage

2015-07-01 Thread frnkblk
And just 12.5% of them required TLC. =)

-Original Message-
From: NANOG [mailto:nanog-boun...@nanog.org] On Behalf Of frnk...@iname.com
Sent: Wednesday, July 01, 2015 7:05 AM
To: 'Stefan'
Cc: nanog@nanog.org
Subject: RE: leap second outage

Yes, happened at 7 pm Central (00:00 UTC).

Re: leap second outage

2015-07-01 Thread Justin Paine via NANOG
Any confirmation if the AWS outage was leap second-related?


Justin Paine
Head of Trust & Safety
CloudFlare Inc.
PGP KeyID: 57B6 0114 DE0B 314D


On Tue, Jun 30, 2015 at 8:32 PM, Dovid Bender  wrote:
> I read that it happens at midnight local time, since that's when you have the
> extra second. I know a large carrier in Israel is down. Waiting for confirmation
> on whether it's leap second related.


Re: leap second outage

2015-07-01 Thread Jimmy Hess
On Wed, Jul 1, 2015 at 12:38 AM, Mikael Abrahamsson  wrote:
> quickly. Either we should abolish the leap second or we should make leap
> second adjustments (back and forth) on a monthly basis to exercise the code.

See, maybe there should someday be building codes for
commercially marketed software that require minimum formal
testing by licensed independent testers, including
leap seconds and such. ^_^

Leap second issues are possibly rare and intermittent; therefore,
having a few per month does not necessarily give adequate exposure
to code paths that may go wrong during an insert/delete event.

There's never been a negative leap second, only insertions, but how
deletions are implemented might expose new bugs, since there hasn't
been one before. And you can only have one leap per 24 hours,
positive or negative; pick one.
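For illustration, a small Python sketch (helper names are hypothetical) of what each kind of leap looks like on a UTC clock: a positive leap second adds 23:59:60, making an 86,401-second day, while a negative one would skip 23:59:59, making an 86,399-second day. It only lists labels and does not model how any kernel implements either case.

```python
# Sketch: the UTC second labels shown around midnight on a leap-second day.
# A positive leap second inserts 23:59:60 (an 86,401-second day); a negative
# one deletes 23:59:59 (an 86,399-second day). Only one leap, of one sign,
# can occur at the end of a given day.


def day_length(leap: int) -> int:
    """Length in SI seconds of a UTC day ending with `leap` (+1, 0 or -1)."""
    if leap not in (-1, 0, 1):
        raise ValueError("at most one leap second, positive or negative, per day")
    return 86400 + leap


def labels_before_midnight(leap: int, count: int = 3) -> list[str]:
    """Return the last `count` second labels of the day, then '00:00:00'."""
    day_length(leap)  # reuse the validation: only -1, 0 or +1 is allowed
    last_second = 59 + leap  # 60 for an insertion, 58 for a deletion
    seconds = range(last_second - count + 1, last_second + 1)
    return [f"23:59:{s:02d}" for s in seconds] + ["00:00:00"]


if __name__ == "__main__":
    for leap in (+1, -1):
        print(leap, day_length(leap), labels_before_midnight(leap))
```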

And shouldn't this kind of 'exercise' be done during the QA process
before releasing new system software, rather than mucking with clock
accuracy?

There is a recent article with some leap second 'stress testing' code:
  https://access.redhat.com/articles/199563


Test methods are readily available, so there ought to be
little legitimate excuse for anyone writing serious software that has
long-running processes or threads not to include evaluation for
possible leap second issues and other clock-related issues
such as clock stepping, DST, and Year 2038 in their standard smoke
tests.
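As one example of such a smoke test, here is a hedged Python sketch of a Year 2038 check: whether a timestamp still fits in a signed 32-bit time_t, and where an overflowing 32-bit clock would land. The helper names are hypothetical; the limits follow from two's-complement arithmetic.

```python
# Sketch of a Year 2038 smoke check: does a timestamp still fit in a signed
# 32-bit time_t? Helper names are hypothetical; the limits are standard.
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1


def fits_in_time32(ts: int) -> bool:
    """True if `ts` can be packed as a signed 32-bit integer."""
    try:
        struct.pack("<i", ts)
    except struct.error:
        return False
    return True


def to_utc(ts: int) -> datetime:
    """Convert POSIX seconds to UTC without relying on the platform's time_t."""
    return EPOCH + timedelta(seconds=ts)


def wrapped_time32(ts: int) -> int:
    """What a signed 32-bit time_t would hold after two's-complement overflow."""
    return struct.unpack("<i", struct.pack("<I", ts & 0xFFFFFFFF))[0]


if __name__ == "__main__":
    print(to_utc(INT32_MAX))                       # 2038-01-19 03:14:07+00:00
    print(to_utc(wrapped_time32(INT32_MAX + 1)))   # 1901-12-13 20:45:52+00:00
```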

> --
> Mikael Abrahamssonemail: swm...@swm.pp.se
--
-JH


RE: leap second outage

2015-07-01 Thread frnkblk
Yes, happened at 7 pm Central (00:00 UTC).

From: Stefan [mailto:netfort...@gmail.com]
Sent: Tuesday, June 30, 2015 10:30 PM
To: frnk...@iname.com
Cc: nanog@nanog.org
Subject: Re: leap second outage

This was supposed to have happened @midnight UTC, right? Meaning that we are
past that event. Under which scenarios should people be concerned about 
midnight local time? Lots of confusing messages flying all over... 

On Jun 30, 2015 10:13 PM, <frnk...@iname.com> wrote:

We experienced our first leap second outage -- our SHE (super head end) is
using (old) Motorola encoders and we lost those video channels.  They
restarted all those encoders to restore service.

Frank



Re: leap second outage

2015-07-01 Thread Johnny Eriksson
Mikael Abrahamsson  wrote:
> This is similar to the jiffycounter wrapping, since this doesn't happen 
> that often, it's not commonly tested for. Good way is to start the jiffy 
> counter so it wraps after 10 minutes of uptime. That way you'll run into 
> any bugs quickly. Either we should abolish the leap second or we should 
> make leap second adjustments (back and forth) on a monthly basis to 
> exercise the code.

You could do this: move back on even-numbered months and forward on odd-numbered ones.

Any real adjustment could be made by inhibiting the monthly change...
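Johnny's scheme sketched in Python, purely as an illustration. It assumes "move back" means an inserted second (the clock repeats a second) and "forward" a deleted one; the month-to-direction mapping and the function names are hypothetical.

```python
# Sketch of the proposed schedule: one leap second every month, alternating
# direction, so the handling code is exercised constantly while the net
# effect is zero. A genuine adjustment is made by skipping ("inhibiting")
# one scheduled step. The even/odd mapping below is an assumption.
from collections.abc import Iterable


def scheduled_step(month: int) -> int:
    """+1 (insert, clock moves back) on even months, -1 (delete) on odd months."""
    return +1 if month % 2 == 0 else -1


def year_of_steps(inhibit: Iterable[int] = ()) -> dict[int, int]:
    """Map month (1-12) to its leap step, with inhibited months set to 0."""
    skipped = set(inhibit)
    return {m: 0 if m in skipped else scheduled_step(m) for m in range(1, 13)}


def net_adjustment(steps: dict[int, int]) -> int:
    """Net leap seconds applied over the schedule."""
    return sum(steps.values())


if __name__ == "__main__":
    print(net_adjustment(year_of_steps()))             # 0
    print(net_adjustment(year_of_steps(inhibit=[7])))  # 1
```

Over a full year the steps cancel out; inhibiting one odd-month deletion leaves a net +1 s, and inhibiting an even-month insertion leaves -1 s.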

> This is a hard sell though...

'fraid so.

> Mikael Abrahamssonemail: swm...@swm.pp.se

--Johnny


Re: leap second outage

2015-06-30 Thread Colin Johnston
Oracle Linux did this:
Jul  1 02:01:29 oraclelinux ntpd[600]: 0.0.0.0 061c 0c clock_step -1.006445 s
Jul  1 02:01:29 oraclelinux ntpd[600]: 0.0.0.0 0615 05 clock_sync
Jul  1 02:01:29 oraclelinux systemd: Time has been changed
Jul  1 02:01:30 oraclelinux ntpd[600]: 0.0.0.0 c618 08 no_sys_peer
All seemed fine after this.

Sophos UTM did this:
2015:07:01-00:59:59 cloudsophosvm kernel: [653957.707421] Clock: inserting leap second 23:59:60 UTC
All seemed fine after this.
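The kernel message shows the label 23:59:60, but POSIX time_t assumes every day has 86,400 seconds and has no slot for it. A small sketch with Python's standard library shows the consequence: `calendar.timegm` does plain arithmetic on the fields, so 23:59:60 maps to the same timestamp as the following midnight, while `datetime` rejects the label outright.

```python
# Sketch: POSIX timestamps cannot distinguish 23:59:60 from the next 00:00:00.
import calendar
from datetime import datetime

leap = calendar.timegm((2015, 6, 30, 23, 59, 60, 0, 0, 0))
midnight = calendar.timegm((2015, 7, 1, 0, 0, 0, 0, 0, 0))
print(leap, midnight, leap == midnight)  # 1435708800 1435708800 True

try:
    datetime(2015, 6, 30, 23, 59, 60)
except ValueError as err:
    print("datetime rejects the leap second:", err)
```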


Colin




Re: leap second outage

2015-06-30 Thread Harlan Stenn
Mikael Abrahamsson writes:
> This is similar to jiffy counter wrapping: since it doesn't happen
> that often, it's not commonly tested for. A good way is to start the jiffy
> counter so it wraps after 10 minutes of uptime. That way you'll run into
> any bugs quickly. Either we should abolish the leap second or we should
> make leap second adjustments (back and forth) on a monthly basis to
> exercise the code.
> 
> This is a hard sell though...

And it's perversely interesting. It would even be tolerable when the
difference between UTC and UT1 is such that the insertions and deletions
keep it within ±0.9 s. There would even be enough time to
warn folks about this.

H


Re: leap second outage

2015-06-30 Thread Mikael Abrahamsson

On Wed, 1 Jul 2015, Jean-Francois Mezei wrote:

> However, systems that expect tightly synchronized clocks would
> want all the nodes to make the NTP adjustment at the same time.


This is both an operating system and application problem.

http://infiniteundo.com/post/25326999628/falsehoods-programmers-believe-about-time
http://infiniteundo.com/post/25509354022/more-falsehoods-programmers-believe-about-time

This is similar to jiffy counter wrapping: since it doesn't happen
that often, it's not commonly tested for. A good way is to start the jiffy
counter so it wraps after 10 minutes of uptime. That way you'll run into
any bugs quickly. Either we should abolish the leap second or we should
make leap second adjustments (back and forth) on a monthly basis to
exercise the code.
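The jiffy trick can be sketched in a few lines of Python. This models a 32-bit tick counter started so it wraps after ten minutes (HZ = 100 is an assumed tick rate), plus a wraparound-safe "is a after b" comparison in the style of the Linux kernel's `time_after()` macro, which looks at the signed difference rather than the raw values. The Linux kernel does something similar at boot: its initial jiffies value makes the 32-bit counter wrap five minutes after boot.

```python
# Sketch: start a 32-bit tick counter close to its wrap point so that
# wraparound bugs show up minutes after boot instead of after weeks.
HZ = 100                                   # assumed ticks per second
MASK = 0xFFFFFFFF                          # 32-bit counter
INITIAL_TICKS = (-(10 * 60 * HZ)) & MASK   # wraps after 10 minutes of uptime


def ticks_at(uptime_seconds: int) -> int:
    """Counter value after `uptime_seconds`, wrapping at 2**32."""
    return (INITIAL_TICKS + uptime_seconds * HZ) & MASK


def naive_after(a: int, b: int) -> bool:
    """Buggy comparison: gives the wrong answer once the counter wraps."""
    return a > b


def wrap_safe_after(a: int, b: int) -> bool:
    """True if tick a is later than tick b, valid while they are < 2**31 apart.
    Same idea as Linux's time_after(): check the signed 32-bit difference."""
    diff = (a - b) & MASK
    return diff != 0 and diff < 2**31


if __name__ == "__main__":
    before, after = ticks_at(599), ticks_at(601)
    print(naive_after(after, before), wrap_safe_after(after, before))  # False True
```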


This is a hard sell though...

--
Mikael Abrahamssonemail: swm...@swm.pp.se


Re: leap second outage

2015-06-30 Thread Jean-Francois Mezei
On 15-07-01 00:47, Harlan Stenn wrote:

> What I'm about to say may not be as stupid as it sounds:  The problems
> here aren't problems for cases where it's not a problem.  It is a
> problem where it *is* a problem.

In fairness, systems should be used to NTP making adjustments to the
system clock of a second or less.

However, systems that expect tightly synchronized clocks would
want all the nodes to make the NTP adjustment at the same time.



Re: leap second outage

2015-06-30 Thread Harlan Stenn
Joe writes:
> A leap second causing issues? For about 40 years now, there have been
> these leap seconds with no real issue. All of these are "go-forwards"

No, they're all "go-backwards" events.  That's no big deal to things
that don't care about monotonic time, or to folks who aren't in
violation of something if their timestamps are off by a second.
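The monotonic-time point can be sketched in Python. An inserted leap second, or a step like the `clock_step -1.006445 s` in the Oracle Linux ntpd log above, moves the wall clock backward, so an interval computed from two wall-clock readings can come out negative. The fake readings below are illustrative assumptions; `time.monotonic()` is Python's standard monotonic clock, which cannot go backward.

```python
# Sketch: measure durations with a monotonic clock, not the wall clock.
import time
from typing import Callable


def wall_elapsed(readings: list[float]) -> float:
    """Elapsed time computed naively from the first and last wall-clock readings."""
    return readings[-1] - readings[0]


def timed(fn: Callable[[], object]) -> float:
    """Time a call with time.monotonic(), a clock that cannot go backward."""
    start = time.monotonic()
    fn()
    return time.monotonic() - start


# Hypothetical wall-clock readings taken 0.5 s apart; just before the third
# reading the clock is stepped back by 1.006445 s, as in the ntpd log.
wall = [100.0, 100.5, 101.0 - 1.006445]

print(wall_elapsed(wall))  # negative, although about 1 s of real time passed
print(timed(lambda: sum(range(100_000))) >= 0.0)  # True
```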

What I'm about to say may not be as stupid as it sounds:  The problems
here aren't problems for cases where it's not a problem.  It is a
problem where it *is* a problem.

It's a case where one person's signal is another person's noise.

H


Re: leap second outage

2015-06-30 Thread Joe
A leap second causing issues? For about 40 years now, there have been
these leap seconds with no real issue. All of these are "go-forwards"
and even MS AD (I believe) treats them as a little bump (nothing to see
here, move along). So unless you have a really tight VPN (not standards
conforming), I'd hope that nothing has happened, and if it did, chances
are it's either coincidence or intentional.
I certainly hope I am around to collect on the
https://en.wikipedia.org/wiki/Year_2038_problem for retirement.
I think we've all seen the "big to-do" regarding Y2K to know better.
Maybe I am wrong, but...

Just my 2¢
-Joe

On Tue, Jun 30, 2015 at 10:42 PM, Nicholas Suan  wrote:
> Correct, the leap second gets inserted at midnight UTC.
>
> "Leap seconds can be introduced in UTC at the end of the months of December
>  or June, depending on the evolution of UT1-TAI. Bulletin C is mailed every
>  six months, either to announce a time step in UTC or to confirm that there
>  will be no time step at the next possible date."
>
> ftp://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat
>



-- 
-Joe
920-530-3631


Re: leap second outage

2015-06-30 Thread Nicholas Suan
Correct, the leap second gets inserted at midnight UTC.

"Leap seconds can be introduced in UTC at the end of the months of December
 or June, depending on the evolution of UT1-TAI. Bulletin C is mailed every
 six months, either to announce a time step in UTC or to confirm that there
 will be no time step at the next possible date."

ftp://hpiers.obspm.fr/iers/bul/bulc/bulletinc.dat
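Since the step happens at the end of June 30 UTC, its local time depends on the zone, which is why it showed up at 7 pm Central. A quick Python sketch converting that instant; the fixed offsets are assumptions for summer 2015 (CDT = UTC-5, EDT = UTC-4, Israel's IDT = UTC+3) rather than a time-zone database lookup.

```python
# Sketch: the leap second ended at 2015-07-01 00:00:00 UTC; show that
# instant in a few local zones. Offsets are hard-coded summer-2015 values.
from datetime import datetime, timedelta, timezone

END_OF_LEAP = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)

ZONES = {
    "US Central (CDT)": timezone(timedelta(hours=-5)),
    "US Eastern (EDT)": timezone(timedelta(hours=-4)),
    "Israel (IDT)": timezone(timedelta(hours=3)),
}

for name, tz in ZONES.items():
    print(f"{name:18} {END_OF_LEAP.astimezone(tz):%Y-%m-%d %H:%M}")
```

That prints 2015-06-30 19:00 for Central, 2015-06-30 20:00 for Eastern and 2015-07-01 03:00 for Israel, so "midnight local time" only lines up with the leap second in zones at UTC+0 that season.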

On Tue, Jun 30, 2015 at 11:30 PM, Stefan  wrote:
> This was supposed to have happened @midnight UTC, right? Meaning that we
> are past that event. Under which scenarios should people be concerned about
> midnight local time? Lots of confusing messages flying all over...
> On Jun 30, 2015 10:13 PM,  wrote:
>
>> We experienced our first leap second outage -- our SHE (super head end) is
>> using (old) Motorola encoders and we lost those video channels.  They
>> restarted all those encoders to restore service.
>>
>> Frank
>>
>>


Re: leap second outage

2015-06-30 Thread Dovid Bender
No. Someone leaked some routes: 
https://mobile.twitter.com/Axcelx/status/616058414746202113


Regards,

Dovid

-Original Message-
From: Justin Paine 
Date: Tue, 30 Jun 2015 20:37:06 
To: 
Cc: Stefan; NANOG; 
; 
Subject: Re: leap second outage

Any confirmation if the AWS outage was leap second-related?


Re: leap second outage

2015-06-30 Thread Josh Luthman
That is my understanding as well.  The event was about 3.5 hours ago.


Josh Luthman
Office: 937-552-2340
Direct: 937-552-2343
1100 Wayne St
Suite 1337
Troy, OH 45373

On Tue, Jun 30, 2015 at 11:30 PM, Stefan  wrote:

> This was supposed to have happened @midnight UTC, right? Meaning that we
> are past that event. Under which scenarios should people be concerned about
> midnight local time? Lots of confusing messages flying all over...
> On Jun 30, 2015 10:13 PM,  wrote:
>
> > We experienced our first leap second outage -- our SHE (super head end) is
> > using (old) Motorola encoders and we lost those video channels.  They
> > restarted all those encoders to restore service.
> >
> > Frank
> >
> >
>


Re: leap second outage

2015-06-30 Thread Dovid Bender
I read that it happens at midnight local time, since that's when you have the
extra second. I know a large carrier in Israel is down. Waiting for confirmation
on whether it's leap second related.

--Original Message--
From: Stefan
Sender: NANOG
To: frnk...@iname.com
Cc: nanog@nanog.org
Subject: Re: leap second outage
Sent: Jun 30, 2015 23:30

This was supposed to have happened @midnight UTC, right? Meaning that we
are past that event. Under which scenarios should people be concerned about
midnight local time? Lots of confusing messages flying all over...
On Jun 30, 2015 10:13 PM,  wrote:

> We experienced our first leap second outage -- our SHE (super head end) is
> using (old) Motorola encoders and we lost those video channels.  They
> restarted all those encoders to restore service.
>
> Frank
>
>

Regards,

Dovid


Re: leap second outage

2015-06-30 Thread Stefan
This was supposed to have happened @midnight UTC, right? Meaning that we
are past that event. Under which scenarios should people be concerned about
midnight local time? Lots of confusing messages flying all over...
On Jun 30, 2015 10:13 PM,  wrote:

> We experienced our first leap second outage -- our SHE (super head end) is
> using (old) Motorola encoders and we lost those video channels.  They
> restarted all those encoders to restore service.
>
> Frank
>
>