Re: Segmentation fault running ntpd

2015-11-04 Thread Ian Lepore
On Wed, 2015-11-04 at 17:49 -0800, Doug Hardie wrote:
> > On 4 November 2015, at 08:15, Mark Martinec <
> > mark.martinec+free...@ijs.si> wrote:
> > 
> > Upgrading 10.2-RELEASE-p6 to 10.2-RELEASE-p7 now solved ntpd
> > crashes
> > (apparently fixed by: FreeBSD Errata Notice FreeBSD-EN-15:20.vm).
> > 
> > Thanks!!!
> > 
> >  Mark
> > 
> 
> ntpdc hangs when you run the peers command on 9.3; eventually it
> reports no response from the server.  However, ntpq works just fine
> and nagios is able to get the status without problems.  Neither of
> those worked properly before.
> 
> — Doug

The protocol used by ntpdc is no longer supported by ntpd; that
change came along for the ride with the security fixes and bugfixes
that were recently merged back to the 9 and 10 branches.

Everything that can be done with ntpdc on older releases can now be
done using ntpq with the new release.  The ntpdc program itself is
still present so that you can still administer remote servers running
older code, since they won't be able to do everything via ntpq.

-- Ian

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Segmentation fault running ntpd

2015-11-04 Thread Doug Hardie

> On 4 November 2015, at 08:15, Mark Martinec wrote:
> 
> Upgrading 10.2-RELEASE-p6 to 10.2-RELEASE-p7 now solved ntpd crashes
> (apparently fixed by: FreeBSD Errata Notice FreeBSD-EN-15:20.vm).
> 
> Thanks!!!
> 
>  Mark
> 

ntpdc hangs when you run the peers command on 9.3; eventually it reports no
response from the server.  However, ntpq works just fine and nagios is able to
get the status without problems.  Neither of those worked properly before.

— Doug


Re: Segmentation fault running ntpd

2015-11-04 Thread Mark Martinec

Upgrading 10.2-RELEASE-p6 to 10.2-RELEASE-p7 now solved ntpd crashes
(apparently fixed by: FreeBSD Errata Notice FreeBSD-EN-15:20.vm).

Thanks!!!

  Mark


On 2015-11-01 10:31, Andre Albsmeier wrote:

On Fri, 30-Oct-2015 at 19:47:59 +0100, Mark Martinec wrote:

Not sure if it's the same issue, but it sure looks like it is.

I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
to 10.2-RELEASE-p6, i.e. freebsd-update essentially just
replaced /usr/sbin/ntpd with a new one; then I restarted
ntpd.

On all hosts but one this was successful: the new ntpd starts
fine and works normally. But on one of these machines the
ntpd process immediately crashes with SIGSEGV. That machine
has an Intel Xeon CPU. It is not apparent to me in what way
this machine differs from the others.


I'll add my observations here:

I am using an ntp.conf with a single server entry:

server ntp.some.domain.org

ntp.some.domain.org is a CNAME pointing to gate.some.domain.org
and the latter contains an A record pointing to 192.168.128.1.

After updating 9.3-STABLE to the latest version (one which includes ntp
4.2.8p4), ntpd crashes:

Nov 1 09:38:38 voyager kernel: pid 4443 (ntpd), uid 0: exited on signal 11


This happens in line 871 of ntpd.c where mlockall() is called:

&& 0 != mlockall(MCL_CURRENT|MCL_FUTURE))

It does NOT crash with MCL_FUTURE only.
It does crash with MCL_CURRENT only.

When adding

rlimit memlock -1

to ntp.conf it does NOT crash (as mlockall() won't be called anymore).

When specifying the IP address (192.168.128.1) as the server it
does NOT crash.

When specifying gate.some.domain.org as the server it also does
NOT crash. tcpdump shows in this case:

09:49:59.542310 IP 192.168.128.2.21102 > 192.168.128.1.53: 7639+ A?
gate.some.domain.org. (41)
09:49:59.542578 IP 192.168.128.1.53 > 192.168.128.2.21102: 7639* 1/1/0
A 192.168.128.1 (71)
09:49:59.542612 IP 192.168.128.2.52455 > 192.168.128.1.53: 42047+
? gate.some.domain.org. (41)
09:49:59.542792 IP 192.168.128.1.53 > 192.168.128.2.52455: 42047* 0/1/0 
(88)


When reverting the server entry back to ntp.some.domain.org
it crashes and tcpdump shows:

09:36:05.172552 IP 192.168.128.2.17836 > 192.168.128.1.53: 49768+ A?
ntp.some.domain.org. (40)
09:36:05.173320 IP 192.168.128.1.53 > 192.168.128.2.17836: 49768*
2/1/0 CNAME gate.some.domain.org., A 192.168.128.1 (89)
09:36:05.173361 IP 192.168.128.2.22611 > 192.168.128.1.53: 63808+
? ntp.some.domain.org. (40)
09:36:05.173595 IP 192.168.128.1.53 > 192.168.128.2.22611: 63808*
1/1/0 CNAME gate.some.domain.org. (106)

The probability of crashing increases with the speed and the
number of cores of the machine: on my old single-core Pentiums
it never crashes; on my quad-core i7-3770K it always crashes.

The (asynchronous) name resolution starts at line 3876 of
ntp_config.c:

getaddrinfo_sometime(curr_peer->addr->address,

If we put the mlockall() call directly before this line, the
crash is gone.

Maybe you want to play around with rlimit, CNAMEs, IPs and
so on...

-Andre

Anyone else seeing this?

On 2015-10-30 12:34, David Wolfskill wrote:
> On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
>> David Wolfskill  writes:
>> > ...
>> > bound to 172.17.1.245 -- renewal in 43200 seconds.
>> > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
>> > Starting Network: lo0 em0 iwn0 lagg0.
>> > ...
>>
>> Did you find a solution?  I'm wondering if the ntpd problems people
>> are
>> reporting on freebsd-security@ are related.  I vaguely recall hearing
>> that this had been traced to a pthread bug, but can't find anything
>> about it in commit logs or mailing list archives.
>> 
>
> I don't recall finding "a solution" per se; that said, I also don't
> recall seeing an occurrence of the above for enough time that I'm not
> sure when I sent that message. :-}
>
> As a reality check:
>
> g1-252(11.0-C)[1] ls -lT /*.core
> -rw-r--r--  1 root  wheel  13783040 Aug 18 04:19:03 2015 /ntpd.core
> g1-252(11.0-C)[2]
>
> So -- among other points -- my last sighting of whatever was causing
> that was the day I built:
>
> FreeBSD 11.0-CURRENT #157  r286880M/286880:1100079: Tue Aug 18
> 04:45:25 PDT 2015
> r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
>
> Note that the machines where I run head get updated daily (unless
> there's enough of a problem with head that I can't build it or can't
> boot it (and I'm unable to circumvent the issue within a reasonable
> time)) -- and while I do attempt to run ntpd on the machines, the above
> failure is more "annoying" than "crippling" in my particular case.
>
> And I'm presently running:
>
> FreeBSD 11.0-CURRENT #227  r290138M/290138:1100084: Thu Oct 29
> 05:12:58 PDT 2015
> r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
>
> and building head @r290190 as I type.
>
> And FWIW, I *suspect* that one of the issues involved (in my case)
> was a ... lack 

Re: Segmentation fault running ntpd

2015-11-01 Thread Andre Albsmeier
On Fri, 30-Oct-2015 at 19:47:59 +0100, Mark Martinec wrote:
> Not sure if it's the same issue, but it sure looks like it is.
> 
> I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
> to 10.2-RELEASE-p6, i.e. freebsd-update essentially just
> replaced /usr/sbin/ntpd with a new one; then I restarted
> ntpd.
> 
> On all hosts but one this was successful: the new ntpd starts
> fine and works normally. But on one of these machines the
> ntpd process immediately crashes with SIGSEGV. That machine
> has an Intel Xeon CPU. It is not apparent to me in what way
> this machine differs from the others.

I'll add my observations here:

I am using an ntp.conf with a single server entry:

server ntp.some.domain.org

ntp.some.domain.org is a CNAME pointing to gate.some.domain.org
and the latter contains an A record pointing to 192.168.128.1.

After updating 9.3-STABLE to the latest version (one which includes ntp
4.2.8p4), ntpd crashes:

Nov 1 09:38:38 voyager kernel: pid 4443 (ntpd), uid 0: exited on signal 11

This happens in line 871 of ntpd.c where mlockall() is called:

&& 0 != mlockall(MCL_CURRENT|MCL_FUTURE))

It does NOT crash with MCL_FUTURE only.
It does crash with MCL_CURRENT only.

When adding

rlimit memlock -1

to ntp.conf it does NOT crash (as mlockall() won't be called anymore).
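
For anyone who wants to poke at the flag combinations outside of ntpd, here is a minimal sketch in C. It only reports whether each mlockall() call succeeds under the current RLIMIT_MEMLOCK; it is not ntpd code and it will not by itself reproduce the SIGSEGV described above. The helper names (try_mlockall, print_memlock_limit, report_mlockall_variants) are made up for this illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/time.h>
#include <sys/resource.h>
#include <sys/mman.h>

/* Hypothetical helper: call mlockall() with the given flags, release the
 * locks again, and return 0 on success or the errno value on failure. */
int try_mlockall(int flags)
{
    if (mlockall(flags) != 0)
        return errno;
    munlockall();
    return 0;
}

/* Hypothetical helper: print the soft RLIMIT_MEMLOCK limit, which caps how
 * much memory an unprivileged process may lock. Returns 0 on success. */
int print_memlock_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("RLIMIT_MEMLOCK: unlimited\n");
    else
        printf("RLIMIT_MEMLOCK: %llu bytes\n",
               (unsigned long long)rl.rlim_cur);
    return 0;
}

/* Try the three flag combinations from the experiment and print the
 * outcome of each call. */
void report_mlockall_variants(void)
{
    static const struct {
        int flags;
        const char *name;
    } variants[] = {
        { MCL_FUTURE, "MCL_FUTURE" },
        { MCL_CURRENT, "MCL_CURRENT" },
        { MCL_CURRENT | MCL_FUTURE, "MCL_CURRENT|MCL_FUTURE" },
    };
    size_t i;

    for (i = 0; i < sizeof(variants) / sizeof(variants[0]); i++) {
        int err = try_mlockall(variants[i].flags);

        printf("mlockall(%s): %s\n", variants[i].name,
               err == 0 ? "ok" : strerror(err));
    }
}
```

Calling print_memlock_limit() followed by report_mlockall_variants() from a small main() shows which combinations the kernel accepts for the current user. A failure with ENOMEM or EPERM usually just means the memlock limit is too low for an unprivileged process.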

When specifying the IP address (192.168.128.1) as the server it
does NOT crash.

When specifying gate.some.domain.org as the server it also does
NOT crash. tcpdump shows in this case:

09:49:59.542310 IP 192.168.128.2.21102 > 192.168.128.1.53: 7639+ A? 
gate.some.domain.org. (41)
09:49:59.542578 IP 192.168.128.1.53 > 192.168.128.2.21102: 7639* 1/1/0 A 
192.168.128.1 (71)
09:49:59.542612 IP 192.168.128.2.52455 > 192.168.128.1.53: 42047+ ? 
gate.some.domain.org. (41)
09:49:59.542792 IP 192.168.128.1.53 > 192.168.128.2.52455: 42047* 0/1/0 (88)

When reverting the server entry back to ntp.some.domain.org
it crashes and tcpdump shows:

09:36:05.172552 IP 192.168.128.2.17836 > 192.168.128.1.53: 49768+ A? 
ntp.some.domain.org. (40)
09:36:05.173320 IP 192.168.128.1.53 > 192.168.128.2.17836: 49768* 2/1/0 CNAME 
gate.some.domain.org., A 192.168.128.1 (89)
09:36:05.173361 IP 192.168.128.2.22611 > 192.168.128.1.53: 63808+ ? 
ntp.some.domain.org. (40)
09:36:05.173595 IP 192.168.128.1.53 > 192.168.128.2.22611: 63808* 1/1/0 CNAME 
gate.some.domain.org. (106)
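
To check from user space what the resolver makes of a name such as the CNAME above, a small sketch like the following may help. It assumes nothing about ntpd internals: it simply calls getaddrinfo() with AI_CANONNAME and AF_UNSPEC (so both IPv4 and IPv6 addresses are requested) and copies out the canonical name. The function name canonical_name is hypothetical.

```c
#include <netdb.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: resolve host and copy its canonical name into out.
 * Returns 0 on success or a getaddrinfo() error code (see gai_strerror()). */
int canonical_name(const char *host, char *out, size_t outlen)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    if (out == NULL || outlen == 0)
        return EAI_FAIL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;    /* request IPv4 and IPv6 addresses */
    hints.ai_socktype = SOCK_DGRAM; /* NTP runs over UDP */
    hints.ai_flags = AI_CANONNAME;  /* ask for the canonical name */

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0)
        return rc;

    /* Only the first result carries ai_canonname; fall back to the input
     * name if the resolver did not supply one. */
    snprintf(out, outlen, "%s",
             res->ai_canonname != NULL ? res->ai_canonname : host);
    freeaddrinfo(res);
    return 0;
}
```

With the setup described above, passing "ntp.some.domain.org" would be expected to yield the name the CNAME points to, while passing the IP address involves no DNS query at all.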

The probability of crashing increases with the speed and the
number of cores of the machine: on my old single-core Pentiums
it never crashes; on my quad-core i7-3770K it always crashes.

The (asynchronous) name resolution starts at line 3876 of
ntp_config.c:

getaddrinfo_sometime(curr_peer->addr->address,

If we put the mlockall() call directly before this line, the
crash is gone.
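
The reordering experiment can be sketched in isolation as follows. This is an illustration of the ordering only, under the assumption that a plain thread doing a blocking getaddrinfo() is a fair stand-in for ntpd's asynchronous resolver; it is not the ntpd code path and is not offered as a fix. The names resolve_job, resolver_thread and lock_then_resolve are invented for the example.

```c
#include <errno.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/mman.h>
#include <sys/socket.h>

/* Hypothetical job record shared with the resolver thread. */
struct resolve_job {
    const char *host;
    int gai_rc; /* getaddrinfo() return code */
};

/* Thread body: a blocking lookup that stands in for an asynchronous
 * resolver. Port "123" is the NTP port, given numerically. */
static void *resolver_thread(void *arg)
{
    struct resolve_job *job = arg;
    struct addrinfo hints;
    struct addrinfo *res = NULL;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_DGRAM;
    job->gai_rc = getaddrinfo(job->host, "123", &hints, &res);
    if (job->gai_rc == 0)
        freeaddrinfo(res);
    return NULL;
}

/* Lock memory first, then start the lookup thread. Returns 0 if the thread
 * ran, otherwise the pthread_create() error number. *lock_errno receives 0
 * or the errno from mlockall(); *gai_rc receives the lookup result. */
int lock_then_resolve(const char *host, int lock_flags,
                      int *lock_errno, int *gai_rc)
{
    struct resolve_job job;
    pthread_t tid;
    int rc;

    job.host = host;
    job.gai_rc = EAI_FAIL;

    /* Step 1: lock memory while the process is still single-threaded. */
    *lock_errno = (mlockall(lock_flags) != 0) ? errno : 0;

    /* Step 2: only now start the thread that performs the lookup. */
    rc = pthread_create(&tid, NULL, resolver_thread, &job);
    if (rc == 0) {
        pthread_join(tid, NULL);
        *gai_rc = job.gai_rc;
    }

    if (*lock_errno == 0)
        munlockall(); /* undo the lock taken in step 1 */
    return rc;
}
```

A call such as lock_then_resolve("ntp.some.domain.org", MCL_CURRENT | MCL_FUTURE, &lock_errno, &gai_rc) mirrors the flags quoted above; build with -pthread. Swapping steps 1 and 2 gives the ordering in which the lock is taken while the lookup may already be in flight.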

Maybe you want to play around with rlimit, CNAMEs, IPs and
so on...

-Andre

Anyone else seeing this?
> On 2015-10-30 12:34, David Wolfskill wrote:
> > On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
> >> David Wolfskill  writes:
> >> > ...
> >> > bound to 172.17.1.245 -- renewal in 43200 seconds.
> >> > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> >> > Starting Network: lo0 em0 iwn0 lagg0.
> >> > ...
> >> 
> >> Did you find a solution?  I'm wondering if the ntpd problems people 
> >> are
> >> reporting on freebsd-security@ are related.  I vaguely recall hearing
> >> that this had been traced to a pthread bug, but can't find anything
> >> about it in commit logs or mailing list archives.
> >> 
> > 
> > I don't recall finding "a solution" per se; that said, I also don't
> > recall seeing an occurrence of the above for enough time that I'm not
> > sure when I sent that message. :-}
> > 
> > As a reality check:
> > 
> > g1-252(11.0-C)[1] ls -lT /*.core
> > -rw-r--r--  1 root  wheel  13783040 Aug 18 04:19:03 2015 /ntpd.core
> > g1-252(11.0-C)[2]
> > 
> > So -- among other points -- my last sighting of whatever was causing
> > that was the day I built:
> > 
> > FreeBSD 11.0-CURRENT #157  r286880M/286880:1100079: Tue Aug 18
> > 04:45:25 PDT 2015
> > r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
> > 
> > Note that the machines where I run head get updated daily (unless
> > there's enough of a problem with head that I can't build it or can't
> > boot it (and I'm unable to circumvent the issue within a reasonable
> > time)) -- and while I do attempt to run ntpd on the machines, the above
> > failure is more "annoying" than "crippling" in my particular case.
> > 
> > And I'm presently running:
> > 
> > FreeBSD 11.0-CURRENT #227  r290138M/290138:1100084: Thu Oct 29
> > 05:12:58 PDT 2015
> > r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
> > 
> > and building head @r290190 as I type.
> > 
> > And FWIW, I *suspect* that one of the issues involved (in my case)
> > was a ... lack of determinism ... in events involving getting the
> > (wireless) 

Re: Segmentation fault running ntpd

2015-10-30 Thread NGie Cooper

> On Oct 30, 2015, at 01:42, Dag-Erling Smørgrav  wrote:
> 
> David Wolfskill  writes:
>> ...
>> bound to 172.17.1.245 -- renewal in 43200 seconds.
>> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
>> Starting Network: lo0 em0 iwn0 lagg0.
>> ...
> 
> Did you find a solution?  I'm wondering if the ntpd problems people are
> reporting on freebsd-security@ are related.  I vaguely recall hearing
> that this had been traced to a pthread bug, but can't find anything
> about it in commit logs or mailing list archives.

https://svnweb.freebsd.org/changeset/base/287591 ?

Re: Segmentation fault running ntpd

2015-10-30 Thread Dag-Erling Smørgrav
NGie Cooper  writes:
> Dag-Erling Smørgrav  writes:
> > David Wolfskill  writes:
> > > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> > Did you find a solution?  [...]
> https://svnweb.freebsd.org/changeset/base/287591 ?

Are you certain?  The commit message does not mention David or ntpd.

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Re: Segmentation fault running ntpd

2015-10-30 Thread Dag-Erling Smørgrav
David Wolfskill  writes:
> ...
> bound to 172.17.1.245 -- renewal in 43200 seconds.
> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> Starting Network: lo0 em0 iwn0 lagg0.
> ...

Did you find a solution?  I'm wondering if the ntpd problems people are
reporting on freebsd-security@ are related.  I vaguely recall hearing
that this had been traced to a pthread bug, but can't find anything
about it in commit logs or mailing list archives.

DES
-- 
Dag-Erling Smørgrav - d...@des.no

Re: Segmentation fault running ntpd

2015-10-30 Thread Dag-Erling Smørgrav
Franco Fichtner  writes:
> Well, it’s on stable/10 since September 16 and somebody reported that
> this particular branch would not trigger the crash along with HEAD,
> but any 10.x would.  Can’t find the reference right now though.

OK, we should do an EN with that patch then, but we may have to include
some of the other recent commits to the vm_map.c, which seem (at a quick
glance) to be related.

DES
-- 
Dag-Erling Smørgrav - d...@des.no
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: Segmentation fault running ntpd

2015-10-30 Thread NGie Cooper

> On Oct 30, 2015, at 02:32, Franco Fichtner  wrote:
> 
> Well, it’s on stable/10 since September 16 and somebody reported that
> this particular branch would not trigger the crash along with HEAD,
> but any 10.x would.  Can’t find the reference right now though.

You’re right. My Mail.app search-fu was failing me for a minute.


r287846 | kib | 2015-09-15 21:20:39 -0700 (Tue, 15 Sep 2015) | 4 lines

MFC r287591:
There is no reason in the current kernel to disallow write access to
the COW wired entry if the entry permissions allow it.  Remove the check.


Re: Segmentation fault running ntpd

2015-10-30 Thread Franco Fichtner
Well, it’s on stable/10 since September 16 and somebody reported that
this particular branch would not trigger the crash along with HEAD,
but any 10.x would.  Can’t find the reference right now though.


> On 30 Oct 2015, at 10:24 am, NGie Cooper  wrote:
> 
> 
>> On Oct 30, 2015, at 02:18, Franco Fichtner  wrote:
>> 
>> Hi all,
>> 
>> I did a quick test build and this seems to solve the ntpd crash issue
>> on top of releng/10.1.
> 
> Makes sense … looking through my email r287591 was never MFCed back to 
> stable/9 or stable/10 :/ .
> HTH,
> -NGie


Re: Segmentation fault running ntpd

2015-10-30 Thread Matthew Seaman
On 10/30/15 09:32, Franco Fichtner wrote:
> Well, it’s on stable/10 since September 16 and somebody reported that
> this particular branch would not trigger the crash along with HEAD,
> but any 10.x would.  Can’t find the reference right now though.

That was me, amongst others.  There are threads on security@ and questions@.

>> On 30 Oct 2015, at 10:24 am, NGie Cooper  wrote:
>>
>>
>>> On Oct 30, 2015, at 02:18, Franco Fichtner  wrote:
>>>
>>> Hi all,
>>>
>>> I did a quick test build and this seems to solve the ntpd crash issue
>>> on top of releng/10.1.
>>
>> Makes sense … looking through my email r287591 was never MFCed back to 
>> stable/9 or stable/10 :/ .
>> HTH,
>> -NGie

There were two problems reported:

1) ntpdc and ntpq would crash -- this was reported for 9.3-STABLE -- I
don't think it affected other releases, and was diagnosed as due to a
pthreads linking issue.  Solved for 9.x in r290044 and r290046

2) ntpd SEGV's on startup on 10.2-RELEASE-p6 (possibly others).
Curiously, so does net/ntp from ports, but only on the second startup.
Exactly the same ntp package seems to run and restart just fine on
recent 10-STABLE though.  As does the base system ntpd.

Cheers,

Matthew









Re: Segmentation fault running ntpd

2015-10-30 Thread NGie Cooper

> On Oct 30, 2015, at 02:05, Dag-Erling Smørgrav  wrote:
> 
> NGie Cooper  writes:
>> Dag-Erling Smørgrav  writes:
>>> David Wolfskill  writes:
>>>> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
>>> Did you find a solution?  [...]
>> https://svnweb.freebsd.org/changeset/base/287591 ?
> 
> Are you certain?  The commit message does not mention David or ntpd.

That commit was pretty involved. Peter documented the issue in the thread 
titled "ABORT! ABORT! Re: HEADS UP: this month's cluster refresh" that was sent 
to the internal list.

Re: Segmentation fault running ntpd

2015-10-30 Thread Franco Fichtner
Hi all,

I did a quick test build and this seems to solve the ntpd crash issue
on top of releng/10.1.


Cheers,
Franco

> On 30 Oct 2015, at 10:09 am, NGie Cooper  wrote:
> 
> 
>> On Oct 30, 2015, at 02:05, Dag-Erling Smørgrav  wrote:
>> 
>> NGie Cooper  writes:
>>> Dag-Erling Smørgrav  writes:
>>>> David Wolfskill  writes:
>>>>> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
>>>> Did you find a solution?  [...]
>>> https://svnweb.freebsd.org/changeset/base/287591 ?
>> 
>> Are you certain?  The commit message does not mention David or ntpd.
> 
> That commit was pretty involved. Peter documented the issue in the thread 
> titled "ABORT! ABORT! Re: HEADS UP: this month's cluster refresh" that was 
> sent to the internal list.


Re: Segmentation fault running ntpd

2015-10-30 Thread NGie Cooper

> On Oct 30, 2015, at 02:18, Franco Fichtner  wrote:
> 
> Hi all,
> 
> I did a quick test build and this seems to solve the ntpd crash issue
> on top of releng/10.1.

Makes sense … looking through my email r287591 was never MFCed back to stable/9 
or stable/10 :/ .
HTH,
-NGie

Re: Segmentation fault running ntpd

2015-10-30 Thread Mark Martinec

Not sure if it's the same issue, but it sure looks like it is.

I have upgraded a couple of hosts (amd64) from 10.2-RELEASE-p5
to 10.2-RELEASE-p6, i.e. freebsd-update essentially just
replaced /usr/sbin/ntpd with a new one; then I restarted
ntpd.

On all hosts but one this was successful: the new ntpd starts
fine and works normally. But on one of these machines the
ntpd process immediately crashes with SIGSEGV. That machine
has an Intel Xeon CPU. It is not apparent to me in what way
this machine differs from the others.

I played with some variations of ntpd on that host; here are
some findings:

- the new ntpd (that came with 10.2-RELEASE-p6) runs fine
  if it does *not* daemonize, i.e. ntpd with an option -n or -d
  stays attached to a terminal and works fine; the same
  happens when run under ktrace -d -i ntpd  ... it works fine,
  even when it daemonizes;

- the ntpd built from fresh net/ntp-devel behaves exactly
  the same: crashes on that machine when it daemonizes

- a previous ntpd (from 10.2-RELEASE-p5) works fine,
  so I ended up downgrading ntpd to that previous version
  on that machine. Also a ntpd from a recent 10-STABLE
  when copied to that host runs fine there!

I haven't tried yet to build it with debugging, or capture
a core dump.

Puzzling...

   Mark



On 2015-10-30 12:34, David Wolfskill wrote:

On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:

David Wolfskill  writes:
> ...
> bound to 172.17.1.245 -- renewal in 43200 seconds.
> pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> Starting Network: lo0 em0 iwn0 lagg0.
> ...

Did you find a solution?  I'm wondering if the ntpd problems people 
are

reporting on freebsd-security@ are related.  I vaguely recall hearing
that this had been traced to a pthread bug, but can't find anything
about it in commit logs or mailing list archives.



I don't recall finding "a solution" per se; that said, I also don't
recall seeing an occurrence of the above for enough time that I'm not
sure when I sent that message. :-}

As a reality check:

g1-252(11.0-C)[1] ls -lT /*.core
-rw-r--r--  1 root  wheel  13783040 Aug 18 04:19:03 2015 /ntpd.core
g1-252(11.0-C)[2]

So -- among other points -- my last sighting of whatever was causing
that was the day I built:

FreeBSD 11.0-CURRENT #157  r286880M/286880:1100079: Tue Aug 18
04:45:25 PDT 2015
r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

Note that the machines where I run head get updated daily (unless
there's enough of a problem with head that I can't build it or can't
boot it (and I'm unable to circumvent the issue within a reasonable
time)) -- and while I do attempt to run ntpd on the machines, the above
failure is more "annoying" than "crippling" in my particular case.

And I'm presently running:

FreeBSD 11.0-CURRENT #227  r290138M/290138:1100084: Thu Oct 29
05:12:58 PDT 2015
r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

and building head @r290190 as I type.

And FWIW, I *suspect* that one of the issues involved (in my case)
was a ... lack of determinism ... in events involving getting the
(wireless) network connectivity into a usable state as part of the
initial transition to multi-user mode.  (I only have evidence at
the moment of the issue on my laptop; my build machine, which only
uses a wired NIC, has no /ntpd.core file.  It and my laptop are updated
pretty much in lock-step; it runs a completely GENERIC kernel, while
the laptop runs a modestly customized one based on GENERIC.)

Peace,
david


Re: Segmentation fault running ntpd

2015-10-30 Thread David Wolfskill
On Fri, Oct 30, 2015 at 09:42:07AM +0100, Dag-Erling Smørgrav wrote:
> David Wolfskill  writes:
> > ...
> > bound to 172.17.1.245 -- renewal in 43200 seconds.
> > pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
> > Starting Network: lo0 em0 iwn0 lagg0.
> > ...
> 
> Did you find a solution?  I'm wondering if the ntpd problems people are
> reporting on freebsd-security@ are related.  I vaguely recall hearing
> that this had been traced to a pthread bug, but can't find anything
> about it in commit logs or mailing list archives.
> 

I don't recall finding "a solution" per se; that said, I also don't
recall seeing an occurrence of the above for enough time that I'm not
sure when I sent that message. :-}

As a reality check:

g1-252(11.0-C)[1] ls -lT /*.core
-rw-r--r--  1 root  wheel  13783040 Aug 18 04:19:03 2015 /ntpd.core
g1-252(11.0-C)[2] 

So -- among other points -- my last sighting of whatever was causing
that was the day I built:

FreeBSD 11.0-CURRENT #157  r286880M/286880:1100079: Tue Aug 18 04:45:25 PDT 
2015 r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

Note that the machines where I run head get updated daily (unless
there's enough of a problem with head that I can't build it or can't
boot it (and I'm unable to circumvent the issue within a reasonable
time)) -- and while I do attempt to run ntpd on the machines, the above
failure is more "annoying" than "crippling" in my particular case.

And I'm presently running:

FreeBSD 11.0-CURRENT #227  r290138M/290138:1100084: Thu Oct 29 05:12:58 PDT 
2015 r...@g1-252.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64

and building head @r290190 as I type.

And FWIW, I *suspect* that one of the issues involved (in my case)
was a ... lack of determinism ... in events involving getting the
(wireless) network connectivity into a usable state as part of the
initial transition to multi-user mode.  (I only have evidence at
the moment of the issue on my laptop; my build machine, which only
uses a wired NIC, has no /ntpd.core file.  It and my laptop are updated
pretty much in lock-step; it runs a completely GENERIC kernel, while
the laptop runs a modestly customized one based on GENERIC.)

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who would murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Segmentation fault running ntpd

2015-07-30 Thread Alfred Perlstein
Adrian, the crash we are seeing here is very easily reproducible. Grab 
our private ports repo, revert my most recent revert, and build.  It 
appears to show up multiple times per day somehow in our configuration.


On 7/28/15 7:25 PM, Adrian Chadd wrote:

On 28 July 2015 at 16:09, David Wolfskill da...@catwhisker.org wrote:

On Tue, Jul 28, 2015 at 04:05:33PM -0700, Adrian Chadd wrote:

Is this still happening for you?


g1-245(10.2-P)[4] ls -lT /S4/ntpd.core
-rw-r--r--  1 root  wheel  13783040 Jul 28 04:56:28 2015 /S4/ntpd.core

Apparently so, yes.

(/S4 is where I have the head root file system mounted when I'm not
running from slice 4.)

Hm, is there any way you can get symbols for it?

I don't think I can easily get symbols for the crash we have, but my
crash is also deep in malloc code..


-a





Re: Segmentation fault running ntpd

2015-07-28 Thread David Wolfskill
On Tue, Jul 28, 2015 at 04:05:33PM -0700, Adrian Chadd wrote:
> Is this still happening for you?
>

g1-245(10.2-P)[4] ls -lT /S4/ntpd.core 
-rw-r--r--  1 root  wheel  13783040 Jul 28 04:56:28 2015 /S4/ntpd.core

Apparently so, yes.

(/S4 is where I have the head root file system mounted when I'm not
running from slice 4.)

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Segmentation fault running ntpd

2015-07-28 Thread Adrian Chadd
Is this still happening for you?


-a


On 24 July 2015 at 06:03, David Wolfskill da...@catwhisker.org wrote:
 On Sun, Jul 19, 2015 at 11:36:00AM -0700, David Wolfskill wrote:
 On Sun, Jul 19, 2015 at 10:24:11AM -0600, Ian Lepore wrote:
  ...
  Was there anything (at all) in /var/log/messages about ntpd?  Even the
  routine messages (such as what interfaces it binds to) might give a bit
  of a clue about how far it got in its init before it died.
  

 Sorry; there might have been something yesterday...
 If I do get another recurrence, I'll try to gather a bit more
 information.
 

 OK; got another one.

 This time, I have the complete /var/log/messages for a verbose boot,
 from that boot to just a bit after the ntpd crash; it's in
 http://www.catwhisker.org/~david/FreeBSD/head; as of the moment, that
 contains:

 [PARENTDIR] Parent Directory -
 [   ] CANARY  2015-03-22 10:03   15K
 [   ] CANARY.gz   2015-03-22 10:03  6.3K
 [   ] ntpd.core   2015-07-24 05:31   13M
 [   ] ntpd.core.gz2015-07-24 05:31  124K
 [TXT] ntpd_crash_msgs.txt 2015-07-24 05:40  138K
 [   ] ntpd_crash_msgs.txt.gz  2015-07-24 05:40   19K

 This was running:

 FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #133  
> r285836M/285836:1100077: Fri Jul 24 05:24:41 PDT 2015
> r...@g1-245.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64
>
>
> Trying gdb /usr/obj/usr/src/usr.sbin/ntp/ntpd/ntpd ntpd.core still
> doesn't help much:
>
> This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols
> found)...
> Core was generated by `ntpd'.
> Program terminated with signal 11, Segmentation fault.
> Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
> ...
> Loaded symbols for /libexec/ld-elf.so.1
> #0  0x0008011cd6a0 in sbrk () from /lib/libc.so.7
> [New Thread 801c07400 (LWP 100133/unknown)]
> [New Thread 801c06400 (LWP 100132/unknown)]
> (gdb) bt
> #0  0x0008011cd6a0 in sbrk () from /lib/libc.so.7
> #1  0x0008ccbd4f34 in ?? ()
> #2  0x0005 in ?? ()
> #3  0x000801800448 in ?? ()
> #4  0x0008011ca888 in sbrk () from /lib/libc.so.7
> #5  0x0008018000c8 in ?? ()
> #6  0x0008018000c0 in ?? ()
> #7  0x0208 in ?? ()
> #8  0x000801c32fb0 in ?? ()
> #9  0x0001 in ?? ()
> #10 0x000801cc20c8 in ?? ()
> #11 0x0030 in ?? ()
> #12 0x000801cc20c8 in ?? ()
> #13 0x7fffe480 in ?? ()
> #14 0x0008011cd240 in sbrk () from /lib/libc.so.7
> #15 0x0280 in ?? ()
> #16 0x0008014bbc70 in malloc_message () from /lib/libc.so.7
> #17 0x0008018000c0 in ?? ()
> #18 0x000801800448 in ?? ()
> #19 0x0032 in ?? ()
> #20 0x000801800458 in ?? ()
> #21 0x0008014bbc68 in malloc_message () from /lib/libc.so.7
> #22 0x000801cc2000 in ?? ()
> #23 0x0008014bba60 in malloc_message () from /lib/libc.so.7
> #24 0x000801cc20d8 in ?? ()
> #25 0x00a0 in ?? ()
> #26 0x0208 in ?? ()
> #27 0x7fffe4d0 in ?? ()
> #28 0x0008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
> Previous frame inner to this frame (corrupt stack?)
> (gdb)
>
>
> I am presently suspecting that it's a bit dependent on ... well, timing.
>
> I have my ~/.xsession set up so that once I've entered the passphrase(s)
> for my SSH private key(s), scripts start running to establish
> connections to other machines -- e.g., open an xterm locally, ssh
> over to my mailhub and (re-)establish a tmux session on that machine
> where I run mutt to read & write email (such as this message).
>
> While that almost always Just Works in stable/10, it's rather ...
> spottier ... in head -- I'd say it's about a 50% probability that it will
> work, vs. the ssh connection attempt hanging, and eventually timing out.
> But if I've waited (say) 30 seconds or so, I can establish such a
> connection easily.
>
> Granted, I am using wireless (802.11), but I get a sense that things
> are claimed to be ready to go a bit prematurely -- at least sometimes.
>
> Peace,
> david
> --
> David H. Wolfskill  da...@catwhisker.org
> Those who murder in the name of God or prophet are blasphemous cowards.
>
> See http://www.catwhisker.org/~david/publickey.gpg for my public key.
___
freebsd-current@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Segmentation fault running ntpd

2015-07-28 Thread David Wolfskill
On Tue, Jul 28, 2015 at 04:25:45PM -0700, Adrian Chadd wrote:
> ...
> Hm, is there any way you can get symbols for it?

Well, I could set CFLAGS+= -g in /etc/make.conf & do a clean build, then
try to re-create it (& point gdb at the objects in /usr/obj/obj/*) --
would that do?

> I don't think I can easily get symbols for the crash we have, but my
> crash is also deep in malloc code..
>

Coincidence?  Inquiring minds want to know :-}

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Segmentation fault running ntpd

2015-07-28 Thread Eric van Gyzen

WITH_DEBUG_FILES=1  (IIRC)
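
A minimal sketch of setting that knob, assuming it belongs in a src.conf-style file (usually /etc/src.conf) and that world is rebuilt and reinstalled afterwards. The helper name `enable_debug_files` is made up for illustration; check src.conf(5) on the release in question before relying on it.

```shell
#!/bin/sh
# Sketch only: ensure a src.conf-style file enables WITH_DEBUG_FILES.
# The file to edit is passed as the first argument.
enable_debug_files() {
    conf=$1
    # If the knob is already present, leave the file untouched.
    if grep -q '^WITH_DEBUG_FILES=' "$conf" 2>/dev/null; then
        return 0
    fi
    echo 'WITH_DEBUG_FILES=1' >> "$conf"
}

# Example (commented out, since rebuilding world takes a long time):
# enable_debug_files /etc/src.conf
# cd /usr/src && make buildworld && make installworld
```

The function is idempotent, so running it from a provisioning script more than once does not duplicate the line.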

On 7/28/15 6:35 PM, Adrian Chadd wrote:

> There's some way in stable/10 and -head to get it to install debug
> symbols for things. Maybe it's only libraries, but you'll at least
> want that in there so you can get stack traces through libc.
>
> -adrian





Re: Segmentation fault running ntpd

2015-07-28 Thread Adrian Chadd
There's some way in stable/10 and -head to get it to install debug
symbols for things. Maybe it's only libraries, but you'll at least
want that in there so you can get stack traces through libc.



-adrian


Re: Segmentation fault running ntpd

2015-07-28 Thread Adrian Chadd
On 28 July 2015 at 16:09, David Wolfskill <da...@catwhisker.org> wrote:
> On Tue, Jul 28, 2015 at 04:05:33PM -0700, Adrian Chadd wrote:
> > Is this still happening for you?
> >
>
> g1-245(10.2-P)[4] ls -lT /S4/ntpd.core
> -rw-r--r--  1 root  wheel  13783040 Jul 28 04:56:28 2015 /S4/ntpd.core
>
> Apparently so, yes.
>
> (/S4 is where I have the head root file system mounted when I'm not
> running from slice 4.)

Hm, is there any way you can get symbols for it?

I don't think I can easily get symbols for the crash we have, but my
crash is also deep in malloc code..


-a


Re: Segmentation fault running ntpd

2015-07-24 Thread David Wolfskill
On Sun, Jul 19, 2015 at 11:36:00AM -0700, David Wolfskill wrote:
> On Sun, Jul 19, 2015 at 10:24:11AM -0600, Ian Lepore wrote:
> > ...
> > Was there anything (at all) in /var/log/messages about ntpd?  Even the
> > routine messages (such as what interfaces it binds to) might give a bit
> > of a clue about how far it got in its init before it died.
> >
>
> Sorry; there might have been something yesterday...
> If I do get another recurrence, I'll try to gather a bit more
> information.
>

OK; got another one.

This time, I have the complete /var/log/messages for a verbose boot,
from that boot to just a bit after the ntpd crash; it's in
http://www.catwhisker.org/~david/FreeBSD/head; as of the moment, that
contains:

[PARENTDIR] Parent Directory -   
[   ] CANARY  2015-03-22 10:03   15K  
[   ] CANARY.gz   2015-03-22 10:03  6.3K  
[   ] ntpd.core   2015-07-24 05:31   13M  
[   ] ntpd.core.gz2015-07-24 05:31  124K  
[TXT] ntpd_crash_msgs.txt 2015-07-24 05:40  138K  
[   ] ntpd_crash_msgs.txt.gz  2015-07-24 05:40   19K  

This was running:

FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #133  
r285836M/285836:1100077: Fri Jul 24 05:24:41 PDT 2015 
r...@g1-245.catwhisker.org:/common/S4/obj/usr/src/sys/CANARY  amd64


Trying gdb /usr/obj/usr/src/usr.sbin/ntp/ntpd/ntpd ntpd.core still
doesn't help much:

This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols 
found)...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
...
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0008011cd6a0 in sbrk () from /lib/libc.so.7
[New Thread 801c07400 (LWP 100133/unknown)]
[New Thread 801c06400 (LWP 100132/unknown)]
(gdb) bt
#0  0x0008011cd6a0 in sbrk () from /lib/libc.so.7
#1  0x0008ccbd4f34 in ?? ()
#2  0x0005 in ?? ()
#3  0x000801800448 in ?? ()
#4  0x0008011ca888 in sbrk () from /lib/libc.so.7
#5  0x0008018000c8 in ?? ()
#6  0x0008018000c0 in ?? ()
#7  0x0208 in ?? ()
#8  0x000801c32fb0 in ?? ()
#9  0x0001 in ?? ()
#10 0x000801cc20c8 in ?? ()
#11 0x0030 in ?? ()
#12 0x000801cc20c8 in ?? ()
#13 0x7fffe480 in ?? ()
#14 0x0008011cd240 in sbrk () from /lib/libc.so.7
#15 0x0280 in ?? ()
#16 0x0008014bbc70 in malloc_message () from /lib/libc.so.7
#17 0x0008018000c0 in ?? ()
#18 0x000801800448 in ?? ()
#19 0x0032 in ?? ()
#20 0x000801800458 in ?? ()
#21 0x0008014bbc68 in malloc_message () from /lib/libc.so.7
#22 0x000801cc2000 in ?? ()
#23 0x0008014bba60 in malloc_message () from /lib/libc.so.7
#24 0x000801cc20d8 in ?? ()
#25 0x00a0 in ?? ()
#26 0x0208 in ?? ()
#27 0x7fffe4d0 in ?? ()
#28 0x0008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
Previous frame inner to this frame (corrupt stack?)
(gdb) 


I am presently suspecting that it's a bit dependent on ... well, timing.

I have my ~/.xsession set up so that once I've entered the passphrase(s)
for my SSH private key(s), scripts start running to establish
connections to other machines -- e.g., open an xterm locally, ssh
over to my mailhub and (re-)establish a tmux session on that machine
where I run mutt to read & write email (such as this message).

While that almost always Just Works in stable/10, it's rather ...
spottier ... in head -- I'd say it's about a 50% probability that it will
work, vs. the ssh connection attempt hanging, and eventually timing out.
But if I've waited (say) 30 seconds or so, I can establish such a
connection easily.

Granted, I am using wireless (802.11), but I get a sense that things
are claimed to be ready to go a bit prematurely -- at least sometimes.
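
One way to cope with connection attempts that fail right after login is to retry rather than rely on a single attempt. The following is a minimal sketch, not what the ~/.xsession scripts described above actually do; the `retry` helper, the attempt count, the delay, and the host name `mailhub` are all assumptions for illustration.

```shell
#!/bin/sh
# retry MAX DELAY CMD [ARGS...]
# Run CMD until it succeeds, at most MAX times, sleeping DELAY seconds
# between attempts.  Returns 0 on the first success, 1 if every attempt fails.
retry() {
    max=$1
    delay=$2
    shift 2
    n=1
    while :; do
        "$@" && return 0
        [ "$n" -ge "$max" ] && return 1
        n=$((n + 1))
        sleep "$delay"
    done
}

# Example (commented out so the sketch has no side effects):
# retry 6 5 ssh -o ConnectTimeout=10 mailhub true
```

With six attempts five seconds apart, this covers roughly the 30-second window mentioned above; it works around the symptom and says nothing about why the network is reported ready early.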

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Re: Segmentation fault running ntpd

2015-07-19 Thread Ian Lepore
On Sat, 2015-07-18 at 05:09 -0700, David Wolfskill wrote:

Re: Segmentation fault running ntpd

2015-07-19 Thread David Wolfskill
On Sun, Jul 19, 2015 at 10:24:11AM -0600, Ian Lepore wrote:
> ...
> Was there anything (at all) in /var/log/messages about ntpd?  Even the
> routine messages (such as what interfaces it binds to) might give a bit
> of a clue about how far it got in its init before it died.
>

Sorry; there might have been something yesterday, but what with the
(verbose) reboots after builds, I rolled over all of my
/var/log/messages* files; the earliest record I still have is from Jul
19 06:00:00 (UTC-0700), and I did not get a recurrence today.

(The one from yesterday wasn't the first I had seen -- I wanted to wait
until I had some hope that the issue was reproducible before whining
about it. :-})

If I do get another recurrence, I'll try to gather a bit more
information.

Thanks!

Peace,
david
-- 
David H. Wolfskill  da...@catwhisker.org
Those who murder in the name of God or prophet are blasphemous cowards.

See http://www.catwhisker.org/~david/publickey.gpg for my public key.




Segmentation fault running ntpd

2015-07-18 Thread David Wolfskill
Lousy timing (no pun intended -- it's early in the day for me),
given the recent MFC, but as I was booting my laptop to yesterday's
head:

FreeBSD g1-245.catwhisker.org 11.0-CURRENT FreeBSD 11.0-CURRENT #127  
r285652M/285652:1100077: Fri Jul 17 04:30:16 PDT 2015 
r...@g1-245.catwhisker.org:/common/S3/obj/usr/src/sys/CANARY  amd64

to build today's head (@r285670; still in progress as I type), I
happened to note [Oh, great -- we can no longer copy/paste from
console now??!?  Fine, I'll transcribe by hand :-(]:

...
bound to 172.17.1.245 -- renewal in 43200 seconds.
pid 544 (ntpd), uid 0: exited on signal 11 (core dumped)
Starting Network: lo0 em0 iwn0 lagg0.
...

Trying to examine the /ntpd.core, I see:
root@g1-245:/ # gdb `which ntpd` ntpd.core 
GNU gdb 6.1.1 [FreeBSD]
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "amd64-marcel-freebsd"...(no debugging symbols 
found)...
Core was generated by `ntpd'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /lib/libm.so.5...(no debugging symbols found)...done.
Loaded symbols for /lib/libm.so.5
Reading symbols from /lib/libcrypto.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libcrypto.so.7
Reading symbols from /lib/libthr.so.3...(no debugging symbols found)...done.
Loaded symbols for /lib/libthr.so.3
Reading symbols from /lib/libc.so.7...(no debugging symbols found)...done.
Loaded symbols for /lib/libc.so.7
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols found)...done.
Loaded symbols for /libexec/ld-elf.so.1
#0  0x0008011cd6a0 in sbrk () from /lib/libc.so.7
[New Thread 801c07400 (LWP 100122/unknown)]
[New Thread 801c06400 (LWP 100120/unknown)]
(gdb) bt
#0  0x0008011cd6a0 in sbrk () from /lib/libc.so.7
#1  0x0008ccbd4f34 in ?? ()
#2  0x0005 in ?? ()
#3  0x000801800448 in ?? ()
#4  0x0008011ca888 in sbrk () from /lib/libc.so.7
#5  0x0008018000c8 in ?? ()
#6  0x0008018000c0 in ?? ()
#7  0x0208 in ?? ()
#8  0x000801c32fb0 in ?? ()
#9  0x0001 in ?? ()
#10 0x000801cc20c8 in ?? ()
#11 0x0030 in ?? ()
#12 0x000801cc20c8 in ?? ()
#13 0x7fffe480 in ?? ()
#14 0x0008011cd240 in sbrk () from /lib/libc.so.7
#15 0x0280 in ?? ()
#16 0x0008014bbc70 in malloc_message () from /lib/libc.so.7
#17 0x0008018000c0 in ?? ()
#18 0x000801800448 in ?? ()
#19 0x0032 in ?? ()
#20 0x000801800458 in ?? ()
#21 0x0008014bbc68 in malloc_message () from /lib/libc.so.7
#22 0x000801cc2000 in ?? ()
---Type <return> to continue, or q <return> to quit---
#23 0x0008014bba60 in malloc_message () from /lib/libc.so.7
#24 0x000801cc20d8 in ?? ()
#25 0x00a0 in ?? ()
#26 0x0208 in ?? ()
#27 0x7fffe4d0 in ?? ()
#28 0x0008011bdd7a in _malloc_thread_cleanup () from /lib/libc.so.7
Previous frame inner to this frame (corrupt stack?)
(gdb) 

which seems... well, not especially useful, as far as I can tell.
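
Because most frames above are unresolved (`?? ()`), and the few names shown come from a libc without debugging symbols, it can help to quantify how much of a backtrace gdb actually resolved, for example to compare a trace before and after a rebuild with symbols. The sketch below only counts lines in saved gdb output; `bt_summary` is a hypothetical helper name and it does not talk to gdb itself.

```shell
#!/bin/sh
# bt_summary: read gdb "bt" output on stdin and print two numbers:
# the total number of frame lines and how many of them are "?? ()".
bt_summary() {
    awk '
        /^#[0-9]+ / {
            total++
            if ($0 ~ /in \?\? \(\)/) unresolved++
        }
        END { printf "%d %d\n", total, unresolved }
    '
}

# Example (commented out; bt.txt would be a saved copy of the output above):
# bt_summary < bt.txt
```

A high share of unresolved frames, together with the "corrupt stack?" warning, suggests the trace should not be read as a real call chain until symbols are available.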


This is (as mentioned above) on my laptop; as such, it is expected to
wander from one network to another.  Accordingly:

* Since it could be connected to a network I do not control, I use a
  packet filter (IPFW, in my case) to reduce my exposure from a
  possibly-hostile network.

* Rather than enabling ntpd in /etc/rc.conf, I use
  /etc/dhclient-exit-hooks to start ntpd after the laptop has a DHCP
  lease.  (For networks I control, I also set up the DHCP server to
  advertise what NTP server the DHCP clients should use, but the code in
  dhclient-exit-hooks merely prefers that, rather than requiring it.)

* In my world-view -- at least for networks I control -- DNS zone files
  are the Source of Truth with respect to hostname <-> IP address
  correspondence, and Dynamic DNS is Evil.  I populate my zone files
  with appropriate A & PTR records so that every assignable DHCP
  address has a PTR record, and the hostname to which it points has
  an A record that points back to that IP address.  Accordingly, I
  also use /etc/dhclient-exit-hooks so the laptop can find out what
  its hostname is, and set it accordingly.
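
The "prefer the advertised NTP server, but do not require it" logic described above could look roughly like the following. This is a sketch under stated assumptions, not the actual hook code (which is not shown in this message): it assumes dhclient-script exports `$reason` and, when the ntp-servers option was requested and offered, `$new_ntp_servers`; the helper names and the fallback server are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for /etc/dhclient-exit-hooks.

FALLBACK_NTP="pool.ntp.org"   # assumption: any default server you trust

# pick_ntp_server ADVERTISED FALLBACK
# Print the first DHCP-advertised server if there is one, else FALLBACK.
pick_ntp_server() {
    advertised=$1
    fallback=$2
    set -- $advertised        # split the advertised list on whitespace
    if [ $# -gt 0 ]; then
        echo "$1"
    else
        echo "$fallback"
    fi
}

# lease_acquired REASON
# Succeed only for dhclient states in which a lease is in hand.
lease_acquired() {
    case "$1" in
        BOUND|RENEW|REBIND|REBOOT) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage inside the hook (commented out so the sketch has no side effects):
# if lease_acquired "$reason"; then
#     server=$(pick_ntp_server "$new_ntp_servers" "$FALLBACK_NTP")
#     # ... start ntpd configured to use "$server" ...
# fi
```

Keeping the selection in a small function makes the "prefers, not requires" behaviour easy to check without a DHCP server.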

Mind, I've been doing the above for well over a decade, so that doesn't
qualify as new.

And most of the time, it Just Works (which is a significant reason I
keep doing it).

A couple of other things that are more recent, and possibly of
relevance:

* As alluded to above, I have the em0 & wlan0 (iwn(4)) NICs set up using
  Link Aggregation in failover mode.  In practice, I rarely use
  the em0 (wired) NIC -- I had originally done that based on a
  misperception of how I thought things were set up at work, and
  then just left the configuration alone and