Re: Supported graphics (in HEAD)

2022-05-12 Thread Tom Ivar Helbekkmo
I wrote:

> I didn't think of [trying other Nvidia boards].

Tried two even older Nvidia boards I happened to have on the shelf, but
the kernel crashed during autoconfiguration with both of them.  Then I
found this cute little Radeon board, a really old one, and now I'm happy
as a clam.  Never had so much free memory on this workstation, there's
no sign of any leakage, and 1080p video plays just fine.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Supported graphics (in HEAD)

2022-05-10 Thread Tom Ivar Helbekkmo
matthew green  writes:

> can you file a PR about this?

Done.  kern/56826

> i don't see the problem on 750 or 730 cards.

I didn't think of that...  I do have a couple of other graphics boards
on the shelf, but as they're all Nvidia, I figured it didn't make sense
to try them.  Guess I'll do some experimenting tonight, to see if one of
the others can avoid this problem for me.  (They're even older than the
one I'm using, but it's not as if I need anything super fancy.  I'd like
to be able to use Youtube in the browser from time to time, but that's
the most advanced graphics need I have.)

-tih


Re: Supported graphics (in HEAD)

2022-05-08 Thread Tom Ivar Helbekkmo
Robert Elz  writes:

> Any advice?

Well, in my experience, Nvidia is probably something you only want if
you have lots of RAM in your workstation.  In HEAD, there's a lot of
memory leaking going on - every change to the image on the monitor leaks
kmem-04096 items, and on my 1920x1080 monitor, watching a YouTube video
in Firefox leaks 200-300 of those per second.

Of course, I only notice because I have a mere 4 GiB of RAM in this
workstation, which is more than plenty for the first couple of hours of
work (firefox and a few terminal windows, using the browser as little as
possible, and completely avoiding video), but demands a daily reboot.

-tih


Re: Disappearing mouse buttons

2022-02-09 Thread Tom Ivar Helbekkmo
n...@netbsd.org writes:

> Nope, too low level. Take a look at how buttons are processed here:
>
> xsrc/external/mit/xf86-input-ws/dist/src/ws.c

Thanks - I'll take a look at it tonight, instrument it a bit, and see
what I can discover.

Interestingly, the modification I suggested didn't make any difference;
it still announces five buttons and Z axis when the uhid attaches.

-tih


Re: Disappearing mouse buttons

2022-02-09 Thread Tom Ivar Helbekkmo
n...@netbsd.org writes:

> I think you're looking in the wrong area - the most likely culprit is
> a change I made to Xorg where the default driver for pointing devices
> will be "ws" instead of "mouse" (this is primarily useful for
> touchscreens and touchpads, and mirrors a similar change for OpenBSD).

That was, indeed, it!  Thank you so much for identifying it for me!  :)

Does this mean that our /dev/wsmouse isn't working properly?  Having a
device that exhibits this behaviour, should I be taking a closer look at
sys/dev/wscons/wsmouse.c?  Or possibly lower down in the stack, like
sys/dev/usb/ums.c, or even sys/dev/hid/hidms.c, as it reports "5 buttons
and Z dir", where I assume it should be claiming (at least) 8 buttons,
seeing as my problem button is button 8 when it's working...?

I notice that in sys/dev/hid/hidms.c, we count the buttons thus:

   for (i = 1; i <= MAX_BUTTONS; i++)
      if (!hid_locate(desc, size, HID_USAGE2(HUP_BUTTON, i),
          id, hid_input, &ms->hidms_loc_btn[i - 1], 0))
         break;
   [...]
   ms->nbuttons = i - 1;

Now, if my device is claiming buttons 1, 2, 3, 4, 5, and 8, it would
seem that this code would end up ignoring button 8, just like I see in
the dmesg output (and in X, when it uses the wsmouse/ums/hid stack).
Assuming that the "mouse" driver in X talks more directly to the device
about its configuration, a slight modification to the above to find the
highest numbered button present, instead of the last one contiguously
numbered from 1, would be interesting.  I'll try it, and see.

-tih


Re: Disappearing mouse buttons

2022-02-08 Thread Tom Ivar Helbekkmo
Michael  writes:

> Does it still work as expected with an older -current?

Yup.  It still works the way it used to when I plug it into my Pinebook,
that still runs a -current from early September, and likewise with my
Linux laptop.  The dmesg output from the Pinebook is identical: it says
"5 buttons and Z dir" there, too, but there, the four actual buttons
generate (according to xev) buttons 1, 2, 3, and 8, and the scroll wheel
4 and 5.  So the difference between the September -current and a fresh
one is that the fourth button now generates button 4 events instead of
8.  No change after it's been observed to work as expected on Linux and
on slightly older NetBSD; it's still button 4 when plugged back into the
workstation with amd64-current on it.  (And the change, there, occurred
exactly when I upgraded it from that September 9th version, with nothing
else changed.)

-tih


Disappearing mouse buttons

2022-02-08 Thread Tom Ivar Helbekkmo
I'm looking for something that I can't find...  The "Kensington Expert
Mouse" trackball on my amd64-current workstation has four buttons and a
scroll wheel, and these used to generate all different button presses -
so at least six distinct buttons.  This was still working with a
-current from back at the start of September.

After a fresh update, however, it now reports thus:

uhidev1 at uhub4 port 1 configuration 1 interface 0
uhidev1: vendor 047d (0x047d) Kensington Expert Mouse (0x1020), rev 2.00/1.06, 
addr 4, iclass 3/1
ums0 at uhidev1: 5 buttons and Z dir
wsmouse0 at ums0 mux 0

With this, I suddenly have the fourth button mapped as a duplicate of
one of the scroll wheel directions.  The wsmouse mapping is "1 2 3 4 5",
and the fourth button generates button 4; the scroll wheel 4 and 5.  The
fourth button, then, is just single steps of one scroll direction, while
it used to give me the "back" function normally associated with one of
the thumb buttons on most modern mice.

I've been looking at changes in sys/dev/usb, sys/dev/hid, and
sys/dev/wscons, but I can't find anything that looks relevant.

If anyone has any hints for me, that would be appreciated!

-tih


MIDI with Java on -current?

2021-10-13 Thread Tom Ivar Helbekkmo
I have this Java application (JSynthLib) that needs to talk MIDI with my
synthesizers.  I've previously run it with older versions of the JRE,
using the LinuxCharDevMidiProvider that came with it - but that no
longer works with the current environments.  All I really need is a
standard interface between the official MIDI bits in the JRE and the
NetBSD /dev/rmidi stuff...

Anyone know of something like that?

-tih


Partial reads on unix domain sockets

2021-04-07 Thread Tom Ivar Helbekkmo
Some time last year, probably late summer or autumn, a change was made
that caused transfer of small chunks of data over unix domain sockets to
have a higher chance of resulting in a read() getting only part of the
chunk.

While there is no guarantee of a one to one relationship between writes
and reads, it seems that some applications expect this.  In my case, it
was jack (pkgsrc/audio/jack) that failed.  It comes with, among other
things, a daemon, jackd, and a library for use by clients wishing to
connect to it.  Communication between jackd and its clients became
impossible with this change, because the code in jack expects to be able
to exchange C structs between server and clients.  The jackd server has
a thread that uses poll() to wait for available packets from clients,
and when something arrives, it is read with code like this example:

   if (read (client_fd, &req, sizeof(req)) != sizeof(req)) {
      jack_error ("cannot read ACK connection request from client");
      return -1;
   }

The client_fd is an open unix domain stream socket, and it is *not* in
non-blocking mode.  The structs being transferred are of various sizes,
and can, from a casual inspection of the header files, be up to a couple
of hundred bytes long.

Data is written to the sockets using code like this:

   if (write (reply_fd, &req, sizeof(req)) < (ssize_t)sizeof(req)) {
      jack_error ("cannot write request result to client");
      return -1;
   }

Meanwhile, in the client library, the code at the other end of this
communication is simply:

   if (write (fd, &req, sizeof(req)) != sizeof(req)) {
      jack_error ("cannot write event connect request to server (%s)",
                  strerror (errno));
      close (fd);
      return -1;
   }

   if (read (fd, &res, sizeof(res)) != sizeof(res)) {
      jack_error ("cannot read event connect result from server (%s)",
                  strerror (errno));
      close (fd);
      return -1;
   }

Obviously, poll() will return, with information about available data,
before the entire chunk written by the other end is available.

I haven't filed a PR on this, as it isn't technically an error in
NetBSD.  However, if there is a wide-spread belief out there that code
such as this will "just work" (I'm guessing it "just works" on Linux,
just like it does on NetBSD < 10), and it's not otherwise detrimental to
have the data from a single write() call all be available to the reader
of the socket before triggering a select() or poll() that's waiting for
it, then maybe such an adjustment should be considered.

-tih


Re: Is anyone using the wg(4) driver with a commerical VPN service??

2020-10-26 Thread Tom Ivar Helbekkmo
Brad Spencer  writes:

> Just wondering if anyone has succeeded in getting the wg(4) network
> driver working with one of the commercial VPN providers?  I attempted it
> with one in particular, with an admittedly slightly older -current and did
> not succeed in getting it working.

Haven't tried that, no - but I suspect you've configured something
wrong, as I know that our wg works reliably with the WireGuard(tm)
implementations in Linux and Android.  (Well, either that, or you hit
one of the short windows of -current when something was wrong.)

-tih


Re: cmake hanging

2020-09-20 Thread Tom Ivar Helbekkmo
Chavdar Ivanov  writes:

> je_malloc_mutex_lock_slow is seen in both traces in one of the
> threads (weird that all of them are trying to mknod...). The same call
> is seen in my trace.

Ditto for mine.

-tih


Re: cmake hanging

2020-09-01 Thread Tom Ivar Helbekkmo
Chavdar Ivanov  writes:

> I am having the same cmake hangs as in this thread. I've attached the
> gdb 'thread apply all bt' output (collected with script).

That looks suspiciously similar to the hangs I'm seeing with dhcpd on
amd64-current (note the mutex lock attempt, while nothing else looks
very interesting):

(gdb) thread apply all bt

Thread 7 (process 12269):
#0  0x7e573a2a892a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7e573e80a9a9 in pthread_cond_timedwait (cond=0x7e573fd67d08, 
mutex=0x7e573fd67cd8, abstime=0x0) at /usr/src/lib/libpthread/pthread_cond.c:167
#2  0x7e573f01ed2e in isc_app_ctxrun (ctx=0x7e573fd67c80) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/app.c:340
#3  0x5664e624 in dispatch () at 
/usr/src/external/mpl/dhcp/lib/common/../../dist/common/dispatch.c:121
#4  0x5668aae7 in main (argc=, argv=) at 
/usr/src/external/mpl/dhcp/bin/server/../../dist/server/dhcpd.c:1114

Thread 6 (process 21463):
#0  0x7e573a2a892a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7e573e80a9a9 in pthread_cond_timedwait (cond=0x7e5740037850, 
mutex=0x7e5740037800, abstime=0x0) at /usr/src/lib/libpthread/pthread_cond.c:167
#2  0x7e573f02a0a4 in dispatch (threadid=, 
manager=0x7e5740036800) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/task.c:1059
#3  run (queuep=) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/task.c:1346
#4  0x7e573e80bee0 in pthread__create_tramp (cookie=0x7e574002e000) at 
/usr/src/lib/libpthread/pthread.c:560
#5  0x7e573a2924e0 in ?? () from /usr/lib/libc.so.12
#6  0x0040 in ?? ()
#7  0x7e573980 in ?? ()
#8  0x001003a0efff in ?? ()
#9  0x7e57396000c0 in ?? ()
#10 0x001fff40 in ?? ()
#11 0x in ?? ()

Thread 5 (process 19116):
#0  0x7e573a244d8a in _sys___kevent50 () from /usr/lib/libc.so.12
#1  0x7e573e8079d8 in __kevent50 (fd=, ev=ev@entry=0x0, 
nev=nev@entry=0, rev=rev@entry=0x7e5740008000, nrev=nrev@entry=64, 
ts=ts@entry=0x0) at /usr/src/lib/libpthread/pthread_cancelstub.c:176
#2  0x7e573f0223c4 in netthread (uap=0x7e5740039800) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/unix/socket.c:3519
#3  0x7e573e80bee0 in pthread__create_tramp (cookie=0x7e574002fc00) at 
/usr/src/lib/libpthread/pthread.c:560
#4  0x7e573a2924e0 in ?? () from /usr/lib/libc.so.12
#5  0x0060 in ?? ()
#6  0x7e573940 in ?? ()
#7  0x002003a0efff in ?? ()
#8  0x7e5737a000c0 in ?? ()
#9  0x003fff40 in ?? ()
Backtrace stopped: Cannot access memory at address 0x7e5737800028

Thread 4 (process 25506):
#0  0x7e573a2a892a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7e573e80a9a9 in pthread_cond_timedwait (cond=0x7e574003a868, 
mutex=0x7e574003a810, abstime=0x0) at /usr/src/lib/libpthread/pthread_cond.c:167
#2  0x7e573f028631 in run (uap=0x7e574003a800) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/timer.c:650
#3  0x7e573e80bee0 in pthread__create_tramp (cookie=0x7e5740031800) at 
/usr/src/lib/libpthread/pthread.c:560
#4  0x7e573a2924e0 in ?? () from /usr/lib/libc.so.12
Backtrace stopped: Cannot access memory at address 0x7e57367f

Thread 3 (process 26000):
#0  0x7e573a2a892a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7e573e80a9a9 in pthread_cond_timedwait (cond=0x7e573fd69cd0, 
mutex=0x7e573fd69c80, abstime=0x0) at /usr/src/lib/libpthread/pthread_cond.c:167
#2  0x7e573f02a0a4 in dispatch (threadid=, 
manager=0x7e573fd68c80) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/task.c:1059
#3  run (queuep=) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/task.c:1346
#4  0x7e573e80bee0 in pthread__create_tramp (cookie=0x7e5740033400) at 
/usr/src/lib/libpthread/pthread.c:560
#5  0x7e573a2924e0 in ?? () from /usr/lib/libc.so.12
Backtrace stopped: Cannot access memory at address 0x7e57357e

Thread 2 (process 24520):
#0  0x7e573a2a892a in ___lwp_park60 () from /usr/lib/libc.so.12
#1  0x7e573e809791 in pthread__mutex_lock_slow 
(ptm=ptm@entry=0x7e573fd71158, ts=ts@entry=0x0) at 
/usr/src/lib/libpthread/pthread_mutex.c:363
#2  0x7e573e809a44 in pthread_mutex_lock (ptm=0x7e573fd71158) at 
/usr/src/lib/libpthread/pthread_mutex.c:215
#3  pthread_mutex_lock (ptm=ptm@entry=0x7e573fd71158) at 
/usr/src/lib/libpthread/pthread_mutex.c:196
#4  0x7e573f0224a8 in process_fd (writeable=false, readable=true, 
fd=, thread=0x7e573fd6bc80) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/unix/socket.c:3259
#5  process_fds (nevents=, events=, 
thread=0x7e573fd6bc80) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/unix/socket.c:3332
#6  netthread (uap=0x7e573fd6bc80) at 
/usr/src/external/mpl/bind/lib/libisc/../../dist/lib/isc/unix/socket.c:3621
#7  0x7e573e80bee0 in pthread__create_tramp (cookie=0x7e5740027000) at 
/usr/src/lib/libpthread/pthread.c:560
#8  0x7e573a2924e0 in ?? 

Re: WireGuard in NetBSD

2020-08-21 Thread Tom Ivar Helbekkmo
Robert Swindells  writes:

> Does it work for IPv6 ?

Yes, it does.  I've been using it all along, with my main NetBSD server
system as a hub, and various NetBSD and Linux laptops, Android phones,
and the NetBSD system at our mountain cabin connecting to it.  It's been
very well behaved on amd64 and aarch64 -- I never bothered to figure out
how to make it work on 32 bit architectures, but it seems Taylor has.

-tih


Re: USB keyboard input overrun on EHCI?

2020-03-03 Thread Tom Ivar Helbekkmo
I wrote:

> I'll test a kernel without that "else if" block tonight.

After discussion on IRC, what I'll try first of all is to change

   callout_reset(&sc->sc_delay, 1, ukbd_delayed_decode, sc);

to

   callout_reset(&sc->sc_delay, 0, ukbd_delayed_decode, sc);

inside the "else if" block.  We only need to move the handling of the
keyboard event out of the interrupt frame; there's no need to delay it,
so unless this causes other problems for keyboard initiated DDB entry,
it should be the right thing to do, anyway.

-tih


Re: USB keyboard input overrun on EHCI?

2020-03-03 Thread Tom Ivar Helbekkmo
Brian Buhrow  writes:

>   hello.  My recollection may be slightly wrong here since I'm
> still running NetBSD-5 in most cases, but my understanding is that
> ehci(4) connected devices are all USB-2.0 and for slower devices, the
> uhci(4) or ohci(4) hub drivers provide service.

Ah, it's not different hardware as much as different protocol choices?
I do notice that on the two laptops, where the keyboard works well,
there are "handing over full speed device" messages when connecting it,
whereas on the workstation, where it can overrun the input, this does
not occur.  Maybe I should go through the BIOS configuration with a fine
tooth comb, to see if there's something there that limits devices in
what level of USB functionality they can negotiate?

Meanwhile, I think I've found something more interesting (thanks to
debugging guidance from riastradh@): in /sys/dev/usb/ukbd.c, there is
this block of code, in the interrupt handler:

    if ((sc->sc_flags & FLAG_DEBOUNCE) && !(sc->sc_flags & FLAG_POLLING)) {
        /*
         * Some keyboards have a peculiar quirk.  They sometimes
         * generate a key up followed by a key down for the same
         * key after about 10 ms.
         * We avoid this bug by holding off decoding for 20 ms.
         */
        sc->sc_data = *ud;
        callout_reset(&sc->sc_delay, hz / 50, ukbd_delayed_decode, sc);
#ifdef DDB
    } else if (sc->sc_console_keyboard && !(sc->sc_flags & FLAG_POLLING)) {
        /*
         * For the console keyboard we can't deliver CTL-ALT-ESC
         * from the interrupt routine.  Doing so would start
         * polling from inside the interrupt routine and that
         * loses bigtime.
         */
        sc->sc_data = *ud;
        callout_reset(&sc->sc_delay, 1, ukbd_delayed_decode, sc);
#endif
    } else {
        ukbd_decode(sc, ud);
    }

If I read this correctly, the first bit says "for certain keyboards, we
have to wait 20ms after a change before accepting it, because it may be
a switch bounce, which will have cleared by then".  Fine - not relevant
to me.  The next bit, though, is.  It says that if we're the console
keyboard, we always wait 10ms, and then handle the keyboard event (if it
is still relevant).  This means, though, that if we get the next event
from the keyboard within 10ms of the last one, we may lose an event.

I haven't carefully measured the output from my new keyboard when it's
generating sequences of keypresses, but by just eyeballing it, I've
estimated it to be in the vicinity of 50cps, or 10ms per event - and
a difference between the workstation that's being overrun and the two
laptops that aren't, is, of course, that on the former my new keyboard
is the console keyboard.

I'll test a kernel without that "else if" block tonight.

-tih


USB keyboard input overrun on EHCI?

2020-03-03 Thread Tom Ivar Helbekkmo
This is a problem that I believe has been present for a while, but has
recently (since the new year) become worse:

I have, for a long time, been using a bar code scanner to register books
on LibraryThing, and I've been noticing that my amd64 home workstation
has had a tendency to drop a digit from the injected ISBN from time to
time - I guess it's typically lost one digit from a 10 digit ISBN on
something like every third scan.

To check the scanner, I tried it on a Dell laptop, and, more recently,
on a Pinebook, and it shows no problems on either.

Recently, I bought an Ergodox-EZ keyboard.  The BIOS in the keyboard is
able to generate Unicode characters using the standard input method that
is implemented in e.g. GTK 2.  This transmits Ctrl-Shift-U, followed by
the hex digits of the Unicode code point, and then a space character.

Running a kernel from (IIRC) January 2nd, this mostly worked fine.  Then
I updated to one from February 21st, and now my workstation only manages
to receive a Unicode character once every dozen tries or thereabout.

So, the problem that was already present got a lot worse - and on the
Dell laptop, and on the Pinebook, running kernels built from the same
sources, everything is still fine.

The difference between the systems (apart from two of them being amd64,
and one aarch64) is that the Pinebook has OHCI, the Dell UHCI, and the
(failing) workstation EHCI hardware.

The transmit speed of the keyboard is low.  I haven't measured it, but
it looks to be in the 50cps range, meaning 20ms between characters.
(Estimated by watching the keyboard transmit a longer string.)

I'd appreciate hints as to where I should be looking for this problem.

-tih


Re: mpv coredump

2020-02-16 Thread Tom Ivar Helbekkmo
Thomas Klausner  writes:

>> To generate a diff of this commit:
>> cvs rdiff -u -r1.164 -r1.165 src/lib/libpthread/pthread.c
>> cvs rdiff -u -r1.101 -r1.102 src/lib/libpthread/pthread_int.h
>> cvs rdiff -u -r1.74 -r1.75 src/lib/libpthread/pthread_mutex.c
>> cvs rdiff -u -r1.18 -r1.19 src/lib/libpthread/pthread_tsd.c
>
> If I revert this, mpv works again.

Also, the GNU zip utilities (gzip, gunzip, gzcat) just hang (in state
"parked") when run on aarch64 after the above commit.  Reverting it lets
them work correctly again.

(Noticed while installing an upgrade - the upgrade stopped progressing
after base.tgz had been unpacked.)

-tih


Re: crashes in amd64 8.99.51/9.99.2 with panic: pr_find_pagehead: [npfcn4pl]

2019-08-07 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo  writes:

> Geoff Wing  writes:
>
>> 8.99.51 crash:
>> panic: pr_find_pagehead: [npfcn4pl] item 0x98a0b89491b8 poolid 182 != 181
>
> I'm seeing these on amd64 and aarch64, both current at the time that the
> release branch for version 9 was created.

...but no longer, after rmind supplied patches that christos committed. :)

-tih


Re: crashes in amd64 8.99.51/9.99.2 with panic: pr_find_pagehead: [npfcn4pl]

2019-08-04 Thread Tom Ivar Helbekkmo
Geoff Wing  writes:

> 8.99.51 crash:
> panic: pr_find_pagehead: [npfcn4pl] item 0x98a0b89491b8 poolid 182 != 181

I'm seeing these on amd64 and aarch64, both current at the time that the
release branch for version 9 was created.  The crashes seem to happen
when there's quite a bit of disk activity: both systems crash during the
nightly jobs cron run, and a surefire way to provoke a crash is to
install a new version of the system, by running 'tar xzpf' on each of
the set files in turn.  (I've tried, several times, to get a complete
distribution from the above mentioned point in time successfully
installed on these, but still don't have all the X stuff completed.)

-tih


Re: Recent USB changes broke kernel memory allocation

2019-02-11 Thread Tom Ivar Helbekkmo
Jaromír Doleček  writes:

> Fixed now. If you update the tree to have sys/dev/usb/umass.c rev.
> 1.174 you'll get the fixed files.

That did the trick!  Thanks!  :)

-tih


Recent USB changes broke kernel memory allocation

2019-02-10 Thread Tom Ivar Helbekkmo
It seems that changes made to USB code on February 7th broke the kernel
memory allocation arena.  After that point, it is enough to insert a USB
memory stick into my amd64 laptop, and then remove it, to make the
kernel crash.  It seems the changes to the allocating and freeing calls
got a bit messed up, leading to internal disagreements about item sizes,
at least in the umass code:

: dejah# ;cd /var/crash
: dejah# ;dmesg -N netbsd.26 -M netbsd.26.core | tail -23
[  1525.390177] umass0: SMI Corporation (0x90c) USB DISK (0x1000), rev 
2.00/11.00, addr 2
[  1525.390177] umass0: using SCSI over Bulk-Only
[  1525.390177] scsibus0 at umass0: 2 targets, 1 lun per target
[  1525.660323] sd0 at scsibus0 target 0 lun 0:  disk 
removable
[  1525.660323] sd0: 3864 MB, 7872 cyl, 16 head, 63 sec, 512 bytes/sect x 
7913472 sectors
[  1537.266612] sd0: detached
[  1537.266612] scsibus0: detached
[  1537.266612] panic: kmem_free(0x8412b3188208, 8) != allocated size 472
[  1537.266612] cpu1: Begin traceback...
[  1537.266612] vpanic() at netbsd:vpanic+0x16f
[  1537.266612] snprintf() at netbsd:snprintf
[  1537.266612] kmem_alloc() at netbsd:kmem_alloc
[  1537.266612] umass_detach() at netbsd:umass_detach+0xe1
[  1537.266612] config_detach() at netbsd:config_detach+0x121
[  1537.266612] usb_disconnect_port() at netbsd:usb_disconnect_port+0xb8
[  1537.266612] uhub_explore() at netbsd:uhub_explore+0x221
[  1537.266612] usb_discover.isra.2() at netbsd:usb_discover.isra.2+0x68
[  1537.266612] usb_event_thread() at netbsd:usb_event_thread+0x77
[  1537.266612] cpu1: End traceback...

[  1537.266612] dumping to dev 0,1 (offset=1472, size=1045482):
[  1537.266612] dump 
: dejah# ;gdb netbsd.gdb
GNU gdb (GDB) 8.0.1
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from netbsd.gdb...done.
(gdb) target kvm netbsd.26.core
0x80222d75 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0)
at /usr/src/sys/arch/amd64/amd64/machdep.c:726
726 dumpsys();
(gdb) bt
#0  0x80222d75 in cpu_reboot (howto=howto@entry=260, 
bootstr=bootstr@entry=0x0)
at /usr/src/sys/arch/amd64/amd64/machdep.c:726
#1  0x809ec2c7 in vpanic (fmt=fmt@entry=0x813f8838 
"kmem_free(%p, %zu) != allocated size %zu", 
ap=ap@entry=0x84806a1d5d78) at /usr/src/sys/kern/subr_prf.c:335
#2  0x809ec35e in panic (fmt=fmt@entry=0x813f8838 
"kmem_free(%p, %zu) != allocated size %zu")
at /usr/src/sys/kern/subr_prf.c:254
#3  0x809e1944 in kmem_size_check (sz=8, p=0x8412b3188200) at 
/usr/src/sys/kern/subr_kmem.c:549
#4  kmem_intr_free (p=0x8412b3188200, requested_size=8) at 
/usr/src/sys/kern/subr_kmem.c:337
#5  0x8047d794 in umass_detach (self=, flags=1) at 
/usr/src/sys/dev/usb/umass.c:844
#6  0x809d337b in config_detach (dev=dev@entry=0x8412a6f78908, 
flags=flags@entry=1)
at /usr/src/sys/kern/subr_autoconf.c:1748
#7  0x804697df in usb_disconnect_port (up=up@entry=0x84129e303210, 
parent=, 
flags=flags@entry=1) at /usr/src/sys/dev/usb/usb_subr.c:1665
#8  0x8046a3a2 in uhub_explore (dev=0x84129e2fae20) at 
/usr/src/sys/dev/usb/uhub.c:637
#9  0x80463e47 in usb_discover (sc=, sc=) 
at /usr/src/sys/dev/usb/usb.c:1004
#10 0x80463f0e in usb_event_thread (arg=0x84129e16bf68) at 
/usr/src/sys/dev/usb/usb.c:562
#11 0x802097c7 in lwp_trampoline ()
#12 0x in ?? ()
(gdb) up
#1  0x809ec2c7 in vpanic (fmt=fmt@entry=0x813f8838 
"kmem_free(%p, %zu) != allocated size %zu", 
ap=ap@entry=0x84806a1d5d78) at /usr/src/sys/kern/subr_prf.c:335
335 cpu_reboot(bootopt, NULL);
(gdb) up
#2  0x809ec35e in panic (fmt=fmt@entry=0x813f8838 
"kmem_free(%p, %zu) != allocated size %zu")
at /usr/src/sys/kern/subr_prf.c:254
254 vpanic(fmt, ap);
(gdb) up
#3  0x809e1944 in kmem_size_check (sz=8, p=0x8412b3188200) at 
/usr/src/sys/kern/subr_kmem.c:549
549 panic("kmem_free(%p, %zu) != allocated size %zu",
(gdb) list
544 
545 hd = (struct kmem_header *)p;
546 hsz = hd->size;
547 
548 if (hsz != sz) {
549 panic("kmem_free(%p, %zu) != allocated size %zu",
550 (const uint8_t *)p + SIZE_SIZE, 

Re: Kernel crash trying to use union mount

2019-01-19 Thread Tom Ivar Helbekkmo
"J. Hannken-Illjes"  writes:

> Please show your mounted file systems.

: barsoom# ;mount
/dev/ld0a on / type ffs (local)
/dev/ld0f on /var type ffs (log, local)
/dev/ld0e on /usr type ffs (log, NFS exported, local)
/dev/ld1f on /var/pgsql/data type ffs (log, local)
/dev/ld2e on /usr/local type ffs (log, NFS exported, local)
mfs:463 on /tmp type mfs (synchronous, local)
tmpfs on /var/shm type tmpfs (local)
kernfs on /kern type kernfs (local)
ptyfs on /dev/pts type ptyfs (local)
procfs on /proc type procfs (local)
/dev/ld1e on /u type ffs (log, local)
/dev/dk0 on /m/store type ffs (log, local)
: barsoom# ;

My union mount was of an initially empty directory under /usr/local onto
/usr/src - so a directory from /dev/ld2e on top of one from /dev/ld0e.

-tih


Kernel crash trying to use union mount

2019-01-18 Thread Tom Ivar Helbekkmo
I just had a really weird crash on a NetBSD/amd64-current system,
running a kernel 8.99.30 from January 2nd.  Here's what happened:

I was going to experiment with a rather large set of changes to the
local copy of the source tree, which I'd want to revert afterwards, so I
created a directory on another file system, and mounted it on top of
/usr/src with mount_union.  I then copied a 10MiB diff into /usr/src/.
That went well - the file was visible in /usr/src/, and I observed that
it was correctly stored in the auxiliary directory, as expected.

Then I tried reading the file from /usr/src/, and the system immediately
crashed, and dumped core, with the panic:

kernel diagnostic assertion "fli->fli_trans_cnt > 0" failed: file "/usr/src/sys/kern/vfs_trans.c", line 451

I had an emacs running, and the crash happened while emacs was
attempting to tab-autocomplete the name of the file for me, so it hadn't
even gotten around to reading the file itself.

The really weird thing was that when it had finished dumping, having
counted down to 1, and printed "successful" on the (serial) console, it
just sat there, completely unresponsive - but still routing packets
(it's my main server, and my gateway to the Internet)!  I let it do this
for a while, and verified that I could connect to TCP ports on it from
inside and outside (as I was still logged on the NetBSD IRC server, I
had someone there check from outside for me), but userland was obviously
not running, so there was no response from the connection.

After a bit, I hit NMI on the front panel, and it dropped nicely into
the kernel debugger.  Backtraces from each of the four CPUs:

cache_lock_cpus()
cache_reclaim()
cache_thread()

sched_pstats()
uvm_scheduler()
sysctl_alloc()

kpause()
sigsuspend1()
sys___sigsuspend14()
syscall()
--- syscall (number 294) ---

x86_stihlt()
acpicpu_cstate_idle_enter()
acpicpu_cstate_idle()
idle_loop()
cpu_hatch()

I rebooted, and found that the file I'd copied into the union mount was
complete and intact in the directory I had union mounted onto /usr/src/,
so it obviously got correctly written.  Running crash(8) on the core
dump shows:

: barsoom# ;crash -M netbsd.73.core -N netbsd.73
Crash version 8.99.30, image version 8.99.30.
System panicked: kernel diagnostic assertion "fli->fli_trans_cnt > 0" failed: file "/usr/src/sys/kern/vfs_trans.c", line 451
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at 9a73cceb1690
vpanic() at vpanic+0x178
ch_voltag_convert_in() at ch_voltag_convert_in
fstrans_done() at fstrans_done+0x126
VOP_UNLOCK() at VOP_UNLOCK+0x5b
vput() at vput+0x11
union_lookup1() at union_lookup1+0xfe
union_lookup() at union_lookup+0xa2
VOP_LOOKUP() at VOP_LOOKUP+0x52
lookup_once() at lookup_once+0x1ef
namei_tryemulroot() at namei_tryemulroot+0x45f
namei() at namei+0x29
fd_nameiat.isra.2() at fd_nameiat.isra.2+0x36
do_sys_statat() at do_sys_statat+0x87
sys_fstatat() at sys_fstatat+0x2d
syscall() at syscall+0x173
--- syscall (number 466) ---
7f7ff5c3f61a:

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


sys/arch/evbarm/conf/majors.evbarm missing in -current

2018-09-30 Thread Tom Ivar Helbekkmo
Running 'postinstall check' on a Raspberry Pi 3B+ with a current (well,
as of just before the openssl upgrade of a few days ago) evbarm64
installation complains about a missing majors file:

makedev check:
ERROR: can't find majors file '/usr/src/sys/arch/evbarm/conf/majors.evbarm'

and

ptyfsoldnodes check:
Cannot find device major numbers for pty master and slave

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: A couple of amd64-current crashes

2018-04-27 Thread Tom Ivar Helbekkmo
Martin Husemann  writes:

> Is that uhci a companion of a xhci? Which version of sys/dev/usb/xhci.c
> do you have and can you post a full dmesg please?

No xhci in this system, no.  The uhci usage here is serial communication
with a Z-Wave device that implements ucom0, as seen in dmesg.boot:

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 8.99.14 (DEJAH) #67: Wed Apr 25 01:18:55 CEST 2018

r...@barsoom.hamartun.priv.no:/usr/obj/sys/arch/amd64/compile.amd64/DEJAH
total memory = 4083 MB
avail memory = 3941 MB
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
running cgd selftest aes-xts-256 aes-xts-512 done
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Dell Inc. Latitude E6400  
mainbus0 (root)
ACPI: RSDP 0x000FB9C0 24 (v02 DELL  )
ACPI: XSDT 0xDF451E00 6C (v01 DELL   M09  27DD0604 ASL  0061)
ACPI: FACP 0xDF451C9C F4 (v04 DELL   M09  27DD0604 ASL  0061)
Firmware Warning (ACPI): 32/64X length mismatch in FADT/Gpe0Block: 128/64 (20180313/tbfadt-642)
ACPI: DSDT 0xDF452400 006E14 (v02 INT430 SYSFexxx 1001 INTL 20050624)
ACPI: FACS 0xDF460C00 40
ACPI: HPET 0xDF451F00 38 (v01 DELL   M09  0001 ASL  0061)
ACPI:  0xDF460400 30 (v01 DELL   M09  27DD0604 ASL  0061)
ACPI: APIC 0xDF452000 68 (v01 DELL   M09  27DD0604 ASL  0047)
ACPI: ASF! 0xDF451C00 6A (v32 DELL   M09  27DD0604 ASL  0061)
ACPI: MCFG 0xDF451FC0 3C (v16 DELL   M09  27DD0604 ASL  0061)
ACPI: TCPA 0xDF452300 32 (v01  ASL  )
ACPI: SLIC 0xDF45209C 000176 (v01 DELL   M09  27DD0604 ASL  0061)
ACPI: SSDT 0xDF4502EB 00066C (v01 PmRef  CpuPm3000 INTL 20050624)
ACPI: 2 ACPI AML tables successfully acquired and loaded
ioapic0 at mainbus0 apid 2: pa 0xfec0, version 0x20, 24 pins
cpu0 at mainbus0 apid 0
cpu0: Intel(R) Core(TM)2 Duo CPU T9600  @ 2.80GHz, id 0x1067a
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 1
cpu1: Intel(R) Core(TM)2 Duo CPU T9600  @ 2.80GHz, id 0x1067a
cpu1: package 0, core 1, smt 0
acpi0 at mainbus0: Intel ACPICA 20180313
acpi0: X/RSDT: OemId , AslId 
acpi0: MCFG: segment 0, bus 0-63, address 0xf800
ACPI: Dynamic OEM Table Load:
ACPI: SSDT 0xE401073FA990 0002C3 (v01 PmRef  BspIst   3000 INTL 20050624)
ACPI: Dynamic OEM Table Load:
ACPI: SSDT 0xE4011ED85010 0005C6 (v01 PmRef  BspCst   3001 INTL 20050624)
ACPI: Dynamic OEM Table Load:
ACPI: SSDT 0xE4010740B010 0001D7 (v01 PmRef  ApIst3000 INTL 20050624)
ACPI: Dynamic OEM Table Load:
ACPI: SSDT 0xE4010742DA48 8D (v01 PmRef  ApCst3000 INTL 20050624)
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed0-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
acpiec0 at acpi0 (ECDV, PNP0C09-0): io 0x930,0x934
pckbc1 at acpi0 (PS2M, PNP0F13) (aux port): irq 12
pckbc2 at acpi0 (KBC, PNP0303) (kbd port): io 0x60,0x64,0x62,0x66 irq 1
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43,0x50-0x53 irq 2
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61,0x63,0x65,0x67
spkr0 at pcppi1: PC Speaker
wsbell at spkr0 not configured
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
MB4 (PNP0C01) at acpi0 not configured
UAR1 (PNP0501) at acpi0 not configured
DIGC (PNP0501) at acpi0 not configured
ECP (PNP0401) at acpi0 not configured
TPM (BCM0102) at acpi0 not configured
TCM (ZIC0101) at acpi0 not configured
FTPM (PNP0C01) at acpi0 not configured
acpivga0 at acpi0 (VID): ACPI Display Adapter
acpiout0 at acpivga0 (CRT, 0x0100): ACPI Display Output Device
acpiout1 at acpivga0 (LCD, 0x0110): ACPI Display Output Device
acpiout1: brightness levels: [0,6,13,20,26,33,40,46,53,60,66,73,80,86,93,100]
acpiout2 at acpivga0 (DVI, 0x0112): ACPI Display Output Device
acpiout3 at acpivga0 (DVI2, 0x0111): ACPI Display Output Device
acpiout4 at acpivga0 (DP, 0x0113): ACPI Display Output Device
acpiout5 at acpivga0 (DP2, 0x0114): ACPI Display Output Device
acpivga0: connected output devices:
acpivga0:   0x0100 (acpiout0): Ext. Monitor, head 0
acpivga0:   0x0110 (acpiout1): LCD Panel, head 0
acpivga0:   0x0111 (acpiout3): Unknown Output Device, head 0
acpivga0:   0x0112 (acpiout2): Unknown Output Device, head 0
acpivga0:   0x0113 (acpiout4): Unknown Output Device, head 0
acpivga0:   0x0114 (acpiout5): Unknown Output Device, head 0
MB2 (PNP0C01) at acpi0 not configured
MB3 (PNP0C01) at acpi0 not configured
MB1 (PNP0C01) 

A couple of amd64-current crashes

2018-04-27 Thread Tom Ivar Helbekkmo
I'm playing with home automation software, so I'm suddenly doing a lot
of USB communication.  Here are a couple of crashes with -current as of
a couple of days ago:

: dejah# ;crash -N netbsd.9 -M netbsd.9.core
Crash version 8.99.14, image version 8.99.14.
System panicked: kernel diagnostic assertion "cv_is_valid(cv)" failed: file "/usr/src/sys/kern/kern_condvar.c", line 224
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
_KERNEL_OPT_ACPI_SCANPCI() at _KERNEL_OPT_ACPI_SCANPCI+0x8
vpanic() at vpanic+0x178
ch_voltag_convert_in() at ch_voltag_convert_in
cv_unsleep() at cv_unsleep+0x7d
sigpost() at sigpost+0x20c
kpsignal2() at kpsignal2+0x5ad
kill1() at kill1+0xf2
sys_kill() at sys_kill+0x60
syscall() at syscall+0x208
--- syscall (number 37) ---
7d422ec3e7ba:
crash> 

: dejah# ;crash -N netbsd.10 -M netbsd.10.core
Crash version 8.99.14, image version 8.99.14.
System panicked: kernel diagnostic assertion "maxp != 0" failed: file "/usr/src/sys/dev/usb/uhci.c", line 2128
Backtrace from time of crash is available.
crash> bt
_KERNEL_OPT_NARCNET() at 0
?() at e40107b5f190
vpanic() at vpanic+0x178
ch_voltag_convert_in() at ch_voltag_convert_in
uhci_reset_std_chain.isra.5() at uhci_reset_std_chain.isra.5+0x3de
uhci_device_intr_start() at uhci_device_intr_start+0xb0
usbd_start_next() at usbd_start_next+0xe7
uhci_softintr() at uhci_softintr+0x203
usb_soft_intr() at usb_soft_intr+0x38
softint_dispatch() at softint_dispatch+0xee
DDB lost frame for Xsoftintr+0x4f, trying 0x8000672400f0
Xsoftintr() at Xsoftintr+0x4f
--- interrupt ---
0:
crash> 

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Tom Ivar Helbekkmo
Thomas Klausner  writes:

> On Tue, Apr 24, 2018 at 08:56:48AM +0100, Roy Marples wrote:
>> Saying this, from what I'm hearing this only happens at boot time, so we
>> could potentially shrink the buffer back down again if we need to consider
>> dynamically growing it in the kernel as well. No idea if that's even
>> possible or what performance impact it would have.
>
> I had an application report an UDP error with "no buffer space
> available". I don't remember the exact error, sorry. But it was
> definitely some time after system start.
>  Thomas

I keep getting those, and have been for a long, long time:

Apr 24 02:44:27 barsoom openvpn[301]: write UDPv4: No buffer space available (code=55)
Apr 24 05:54:47 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:08 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 08:53:09 barsoom openvpn[292]: write UDPv4: No buffer space available (code=55)
Apr 24 10:15:09 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 10:45:14 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 11:35:18 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
Apr 24 13:15:12 barsoom openvpn[305]: write UDPv4: No buffer space available (code=55)
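
Error 55 is ENOBUFS on NetBSD.  As an illustration only (this is not what
openvpn does, and the helper name sendto_retry is made up), an application
that would rather ride out a short burst of these than log each one could
wrap its send in a small retry loop along these lines:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <time.h>

/*
 * Hypothetical helper: retry sendto() a few times when the kernel
 * reports ENOBUFS, sleeping a little longer before each new attempt.
 * Any other error, or success, is returned to the caller at once.
 */
ssize_t sendto_retry(int fd, const void *buf, size_t len,
                     const struct sockaddr *to, socklen_t tolen,
                     int max_tries)
{
    struct timespec pause = { 0, 1000000 };   /* start at 1 ms */

    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        ssize_t n = sendto(fd, buf, len, 0, to, tolen);
        if (n >= 0 || errno != ENOBUFS)
            return n;
        nanosleep(&pause, NULL);
        if (pause.tv_nsec < 100000000)        /* stop doubling near 100 ms */
            pause.tv_nsec *= 2;
    }
    errno = ENOBUFS;                          /* still out of buffer space */
    return -1;
}
```

For UDP, simply dropping the datagram is often just as reasonable; the
retry count and the delays above are arbitrary choices.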

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: $ sudo ifconfig run0 up list scan

2018-02-03 Thread Tom Ivar Helbekkmo
Andrew Cagney  writes:

> I'm guessing it should list both my and a few near by networks?  I'm
> getting no output and wpa_supplicant never gets past scanning.

Same here.  On my Raspberry Pi model B+:

NetBSD 8.99.10 (OTIUM) #7: Mon Jan  1 14:03:12 CET 2018

r...@barsoom.hamartun.priv.no:/usr/obj/sys/arch/evbarm/compile.evbarm/OTIUM
[...]
run0 at uhub1 port 3
run0: Ralink (0x1044) 802.11 n WLAN (0x800d), rev 2.00/1.01, addr 5
run0: MAC/BBP RT3070 (rev 0x0201), RF RT3020 (MIMO 1T1R), address 6c:f0:6c:f0:bf:3e
run0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
run0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps

I've been unable to get wpa_supplicant to connect this to anything --
and 'ifconfig run0 list scan' generates no output.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Heads-up: RPI FDTisation committed

2018-01-02 Thread Tom Ivar Helbekkmo
Ryo ONODERA  writes:

> -current of 201801020600Z boots fine on my Raspberry Pi.
> Maybe
> https://mail-index.netbsd.org/source-changes/2018/01/01/msg090821.html
> helps me.

That's the change that made mine start working again.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Build failing after recent vadvise changes

2017-12-26 Thread Tom Ivar Helbekkmo
Christos Zoulas  writes:

> That's right, and the dependencies files point to them. Remember you
> need to do this both in the regular libc and compat.

Then the recent change to UPDATING by Martin Husemann should probably be
amended slightly, so it no longer states that a build without "-u" will do.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Build failing after recent vadvise changes

2017-12-26 Thread Tom Ivar Helbekkmo
Christos Zoulas  writes:

> You need to make cleandir in libc and rebuild.

I thought that was what a full build without "-u" did, but it obviously
isn't.  What I've now done is physically remove everything in the obj
directories under libc, and start a new build.  (The "vadvise.S" files
were left, even after the cleandir pass of the build, so I guess they
were no longer supposed to be used after the change, but still picked up
in production rules by make?)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Build failing after recent vadvise changes

2017-12-26 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Kamil Rytarowski <n...@gmx.com> writes:
>
>> It has been documented in doc/UPDATING:
>> [...]
>> +or a one time build without -u will do.
>
> ...which is what I did.  It didn't do.  :)

Interestingly, the i386 build got past libc itself, and failed here:

--- cat ---
#  link  cat/cat
/usr/tools/bin/i486--netbsdelf-gcc --sysroot=/usr/arena/i386  -pie  
-shared-libgcc  -Wl,-z,relro -Wl,--warn-shared-textrel -o cat  cat.o  
-Wl,-dynamic-linker=/libexec/ld.elf_so -Wl,-rpath,/lib  -L=/lib 
/usr/arena/i386/lib/libc.so: undefined reference to `SYS_vadvise'
collect2: error: ld returned 1 exit status

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Build failing after recent vadvise changes

2017-12-26 Thread Tom Ivar Helbekkmo
Kamil Rytarowski  writes:

> It has been documented in doc/UPDATING:
> [...]
> + or a one time build without -u will do.

...which is what I did.  It didn't do.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Build failing after recent vadvise changes

2017-12-26 Thread Tom Ivar Helbekkmo
I just updated from CVS again today, and, following the advice in
UPDATING, started a build without -u.  That didn't do the trick:

/usr/tools/bin/x86_64--netbsd-gcc -nodefaultlibs -shared -Wl,-soname,libc.so.12 
-Wl,--warn-shared-textrel -Wl,-Map=libc.so.12.map -Wl,-z,initfirst   
--sysroot=/usr/arena/amd64 -Wl,-z,relro -Wl,--warn-shared-textrel  -o 
libc.so.12.209.tmp  -Wl,-rpath,/lib  -L=/lib -Wl,-x  -Wl,--whole-archive 
libc_pic.a  -Wl,--no-whole-archive -lgcc 
/usr/tools/lib/gcc/x86_64--netbsd/5.5.0/../../../../x86_64--netbsd/bin/ld: 
libc_pic.a(vadvise.pico): relocation R_X86_64_32 against undefined symbol 
`SYS_vadvise' can not be used when making a shared object; recompile with -fPIC
/usr/tools/lib/gcc/x86_64--netbsd/5.5.0/../../../../x86_64--netbsd/bin/ld: 
final link failed: Nonrepresentable section on output
collect2: error: ld returned 1 exit status

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: 8.99.9 hangs

2017-12-25 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> The hang is hard enough that hitting the NMI switch doesn't do
> anything, which is interesting.

Sorry, that's not correct: it does acknowledge the NMI; it just doesn't
manage to get into DDB.  I get (manually copied from a photograph):

fatal non-maskable interruptfatal non-maskable interruptfatal non-maskable 
interrupt in supervisor mode
fatal non-maskable interrupt in supervisor mode
trap type 9 code 0 rip 0x[...] cs 0x9 rflags 0x202 cr2 0x[...] ilevel 0x8 rsp 
0x[...]
curlwp 0x[...] pid 1248.2 lowest kstack 0x[...]

...but no prompt from DDB, and no response to keypresses on the console.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: 8.99.9 hangs

2017-12-24 Thread Tom Ivar Helbekkmo
Thomas Klausner  writes:

> After updating to 8.99.9 I've experienced strange hangs. The keyboard
> and mouse don't work any longer, and it doesn't react to the power
> button, so I have to reset.

Same here.  It was really bad with a version from about a week ago, but
after updating on the 19th, so I got the changes from ozaki-r@ related
to multiprocessor safety, it got much better.  Still happens, though, on
the system I'm attaching dmesg.boot for.

The hang is hard enough that hitting the NMI switch doesn't do anything,
which is interesting.

And while on that topic: the current handling of NMI on the amd64
multiprocessor platform seems not quite right: we get output from each
processor saying that it's responding to the interrupt, and continuing
afterwards doesn't work, either.  I've played with it a bit, and have
something that at least lets just one CPU actually handle the NMI, and
where continuing works right.  A new NMI after resuming doesn't have any
effect, though, so I guess the non-maskable interrupt is.  :)

If someone who knows how this stuff actually works would like to look at
the code, and what I've done with it, I'm attaching my current diff.
I won't be surprised if I'm doing this all wrong...

-tih
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 8.99.9 (BARSOOM) #29: Sun Dec 24 11:05:05 CET 2017

r...@barsoom.hamartun.priv.no:/usr/obj/sys/arch/amd64/compile.amd64/BARSOOM
total memory = 8191 MB
avail memory = 7931 MB
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
running cgd selftest aes-xts-256 aes-xts-512 done
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Dell Computer Corporation PowerEdge 2850
mainbus0 (root)
ACPI: RSDP 0x000FD5B0 14 (v00 DELL  )
ACPI: RSDT 0x000FD5C4 38 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: FACP 0x000FD620 74 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: DSDT 0xBFFC 003CCD (v01 DELL   PE BKC   0001 MSFT 010E)
ACPI: FACS 0xBFFCFC00 40
ACPI: APIC 0x000FD694 E0 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: SPCR 0x000FD774 50 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: HPET 0x000FD7C4 38 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: MCFG 0x000FD7FC 3C (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: 1 ACPI AML tables successfully acquired and loaded
ioapic0 at mainbus0 apid 8: pa 0xfec0, version 0x20, 24 pins
ioapic1 at mainbus0 apid 9: pa 0xfec8, version 0x20, 24 pins
ioapic2 at mainbus0 apid 10: pa 0xfec83000, version 0x20, 24 pins
ioapic3 at mainbus0 apid 11: pa 0xfec84000, version 0x20, 24 pins
cpu0 at mainbus0 apid 0
cpu0: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 6
cpu1: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu1: package 3, core 0, smt 0
cpu2 at mainbus0 apid 1
cpu2: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu2: package 0, core 0, smt 1
cpu3 at mainbus0 apid 7
cpu3: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu3: package 3, core 0, smt 1
acpi0 at mainbus0: Intel ACPICA 20171110
acpi0: X/RSDT: OemId , AslId 
acpi0: MCFG: segment 0, bus 0-255, address 0xe000
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed0-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
pcppi1 at acpi0 (SPK, PNP0800): io 0x61
spkr0 at pcppi1: PC Speaker
wsbell at spkr0 not configured
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x5f irq 0
FDC (PNP0700) at acpi0 not configured
COMA (PNP0501) at acpi0 not configured
MBIO (PNP0C01) at acpi0 not configured
NIPM (IPI0001) at acpi0 not configured
acpivga0 at acpi0 (EVGA): ACPI Display Adapter
PEHB (PNP0C02) at acpi0 not configured
ACPI: Enabled 1 GPEs in block 00 to 1F
attimer1: attached to pcppi1
ipmi0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 8086 product 3590 (rev. 0x09)
ppb0 at pci0 dev 2 function 0: vendor 8086 product 3595 (rev. 0x09)
ppb0: PCI Express capability version 1  x8 @ 2.5GT/s
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
ppb1 at pci1 dev 0 function 0: vendor 8086 product 0330 (rev. 0x06)
ppb1: PCI Express capability version 1 
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
amr0 at pci2 dev 14 function 0: AMI RAID 
amr0: interrupting at ioapic1 pin 14
amr0: firmware 5B2D, BIOS H435, 

Re: What's with pkgsrc and checksums?

2017-12-22 Thread Tom Ivar Helbekkmo
Leonardo Taccari  writes:

> The main difference between `make mps' and manually invoking sha1(1)
> is that the RCS $NetBSD$ keywords are deleted in the former, and hence
> why the SHA1 checksum differs.

Ah, and that's why the latter suddenly failed for me when I modified an
existing patch file instead of creating a new one, as I've always done
in the past!  Now it makes sense.  :)
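
To make the difference concrete, here is a small sketch.  It assumes that
the pkgsrc side drops every line containing "$NetBSD" before hashing, and
it uses FNV-1a purely as a short stand-in for SHA1; the function names are
made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a over n bytes, continuing from state h (stand-in for SHA1). */
uint32_t fnv1a(uint32_t h, const char *p, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        h ^= (uint8_t)p[i];
        h *= 16777619u;
    }
    return h;
}

/*
 * Digest a patch file held in memory, one line at a time.  With
 * skip_rcs_lines set, lines containing "$NetBSD" are left out, which
 * is the assumed pkgsrc behaviour; with it clear, every byte is
 * hashed, like running a digest tool on the file by hand.
 */
uint32_t patch_digest(const char *text, int skip_rcs_lines)
{
    uint32_t h = 2166136261u;               /* FNV offset basis */

    assert(text != NULL);
    while (*text != '\0') {
        const char *nl = strchr(text, '\n');
        size_t n = nl ? (size_t)(nl - text) + 1 : strlen(text);
        int is_rcs = 0;

        /* look for the 7-byte keyword inside this line only */
        for (size_t i = 0; i + 7 <= n; i++) {
            if (memcmp(text + i, "$NetBSD", 7) == 0) {
                is_rcs = 1;
                break;
            }
        }
        if (!(skip_rcs_lines && is_rcs))
            h = fnv1a(h, text, n);
        text += n;
    }
    return h;
}
```

A brand new patch with a bare, unexpanded keyword and the same patch after
CVS has expanded the keyword hash differently with skip_rcs_lines clear,
and identically with it set.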

> Apart from what Joerg said, and depending on the use case,
> LOCALPATCHES can probably be quite useful as well, and IMHO handier.

That's really useful - thanks for the tip!

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: What's with pkgsrc and checksums?

2017-12-21 Thread Tom Ivar Helbekkmo
Joerg Sonnenberger  writes:

> That's never really been supported... The correct target is makepatchsum
> (mps) or makedistinfo.

Ah, "make distinfo" in the relevant pkgsrc directory.  That works; it
makes /usr/pkgsrc/mk/checksum/checksum claim that the checksum is wrong
(and it also disagrees with the checksum algorithm the distfile claims
has been used), but the actual "make patch" process works.

I'll just squeeze my eyes tightly closed and use it.  Thanks!  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


What's with pkgsrc and checksums?

2017-12-21 Thread Tom Ivar Helbekkmo
I'm used to being able to add local patches to pkgsrc, by putting the
patch file in the patches/ subdirectory of the package, and then just
doing "cd patches/; sha1 patch-my.new.patch >> ../distinfo".

Now, these fail like this (in this particular case, I've modified a
patch file, so I've replaced its line in distinfo with the right one):

=> Applying pkgsrc patches for SDL2-2.0.7
**
Ignoring patch file /m/barsoom/pkgsrc/devel/SDL2/patches/patch-src_joystick_bsd_SDL__sysjoystick.c: invalid checksum
**
ERROR: Patching failed due to modified or broken patch file(s):
ERROR: /m/barsoom/pkgsrc/devel/SDL2/patches/patch-src_joystick_bsd_SDL__sysjoystick.c

OK, so I check:

: thuvia# ;/usr/pkgsrc/mk/checksum/checksum ../distinfo patch-src_joystick_bsd_SDL__sysjoystick.c
=> Checksum SHA1 OK for patch-src_joystick_bsd_SDL__sysjoystick.c

So what's changed?  I've been doing this for years...

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Incompatible struct in_pktinfo (kern/48166)

2017-12-11 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> [...] PR#48166, which is more than four years old, but still open.

Christos showed me in a private email that the problem has since been
all but solved.  The rest, I'm pretty sure I've got a good handle on.
I'll prepare a patch for a couple of suggested improvements, and some
corrections and clarifications to the documentation.

Back with more in a few days.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Incompatible struct in_pktinfo (kern/48166)

2017-12-08 Thread Tom Ivar Helbekkmo
I decided to see exactly what changes it would take to get the latest
PowerDNS suite to compile under NetBSD-current, which turns out to be
"not a lot".  One of the problems, though, is at our end, and concerns
more than just PowerDNS.  It has to do with how a UDP service on a
multi-homed machine gets its responses properly sent out *from* the
correct address using the IP_PKTINFO socket option.  This was addressed
in PR#48166, which is more than four years old, but still open.

Before I create a ticket for the PowerDNS folks with the set of patches
I'd like them to apply to their code, to make it build and run on NetBSD
"out of the box", I thought I'd see if maybe this one could be handled
at our end, as it seems to me it ought to be.

There's a nice summary of how it's used in the top answer here:
https://stackoverflow.com/questions/3062205/setting-the-source-ip-for-a-udp-socket
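
For illustration, the receiving half of that technique might look roughly
like the sketch below.  It is not taken from PowerDNS or from the PR, and
the helper names enable_pktinfo and recv_with_dst are made up here.  It
asks for IP_PKTINFO on a UDP socket and pulls each datagram's destination
address out of the control message; that is the address a reply should be
sent from.  The sending half attaches the same kind of control message to
sendmsg(), but which field carries the source address differs between
systems (Linux uses ipi_spec_dst), so it is left out.

```c
#define _GNU_SOURCE             /* glibc: expose struct in_pktinfo */
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Ask the stack to attach an IP_PKTINFO control message to each datagram. */
int enable_pktinfo(int fd)
{
    int on = 1;

    return setsockopt(fd, IPPROTO_IP, IP_PKTINFO, &on, sizeof on);
}

/*
 * Receive one datagram and store the local address it was sent to in
 * *dst.  Returns the payload length, or -1 on error or when no
 * IP_PKTINFO control message came with the datagram.
 */
ssize_t recv_with_dst(int fd, void *buf, size_t len,
                      struct sockaddr_in *peer, struct in_addr *dst)
{
    union {
        struct cmsghdr align;   /* forces suitable alignment */
        char space[CMSG_SPACE(sizeof(struct in_pktinfo))];
    } ctl;
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    assert(buf != NULL && peer != NULL && dst != NULL);
    memset(&msg, 0, sizeof msg);
    msg.msg_name = peer;
    msg.msg_namelen = sizeof *peer;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.space;
    msg.msg_controllen = sizeof ctl.space;

    n = recvmsg(fd, &msg, 0);
    if (n < 0)
        return -1;

    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c != NULL;
         c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == IPPROTO_IP && c->cmsg_type == IP_PKTINFO) {
            struct in_pktinfo pi;

            memcpy(&pi, CMSG_DATA(c), sizeof pi);
            *dst = pi.ipi_addr;     /* destination address of the datagram */
            return n;
        }
    }
    return -1;
}
```

A server bound to the wildcard address would call enable_pktinfo() once
after bind(), then use recv_with_dst() in place of recvfrom().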

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Build failing on new gdb?

2017-11-29 Thread Tom Ivar Helbekkmo
Trying to update to a fresh current, I get:

#create  libgdb/ada-lang.d
CC=/usr/tools/bin/x86_64--netbsd-c++\ -std=gnu++11\ -Wno-error=stack-protector 
/usr/tools/bin/nbmkdep -f ada-lang.d.tmp  --  --sysroot=/usr/arena/amd64 
-D_KERNTYPES -I/usr/src/external/gpl3/gdb/lib/libgdb 
-I/usr/src/external/gpl3/gdb/lib/libgdb/arch/x86_64 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/config 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/common 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/gnulib/import 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/include/opcode 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/libdecnumber 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../libbfd/arch/x86_64 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../libdecnumber/arch/x86_64 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/bfd 
-I/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/include -Dsighandler_t=sig_t 
-DTARGET_SYSTEM_ROOT=\"\" -DTARGET_SYSTEM_ROOT_RELOCATABLE=0 
-DBINDIR=\"/usr/bin\" -DLOCALEDIR=\"/usr/share/locale\" -DHAVE_CONFIG_H -DTUI=1 
-D_KERNTYPES -D_KERNTYPES 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/ada-lang.c &&  mv 
ada-lang.d.tmp ada-lang.d
In file included from /usr/arena/amd64/usr/include/g++/chrono:35:0,
 from 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/utils.h:26,
 from 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/defs.h:751,
 from 
/usr/src/external/gpl3/gdb/lib/libgdb/../../dist/gdb/ada-lang.c:21:
/usr/arena/amd64/usr/include/g++/bits/c++0x_warning.h:32:2: error: #error This 
file requires compiler and library support for the ISO C++ 2011 standard. This 
support must be enabled with the -std=c++11 or -std=gnu++11 compiler options.
 #error This file requires compiler and library support \
  ^
nbmkdep: compile failed.
*** [ada-lang.d] Error code 1
nbmake[9]: stopped in /usr/src/external/gpl3/gdb/lib/libgdb

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Remove fortune quotes attributed to or providing admiration of Adolf Hitler [pr bin/52735]

2017-11-18 Thread Tom Ivar Helbekkmo
m...@netbsd.org writes:

> For any fortune quote you add, you may remove another, no questions
> asked.
>
> For you, that means you can get rid of things you find even slightly
> offensive without needing to convince another person of it being
> "offensive enough".

I assume this was intended as ironic sarcasm.  Let me point out how that
seldom works out well on mailing lists -- or, for that matter, on USENET.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Crash related to VLANs in Oct 18th -current

2017-10-24 Thread Tom Ivar Helbekkmo
Roy Marples  writes:

> The caveat is that we now need to ARP announce the address during
> reboot to ensure dhcpcd gets the reply on an active interface.

I assume it'll only send a gratuitous ARP announcement for an address
whose lease is still active?  :)

> Let me know how it works for you.

Running with your latest patch now, and it's working fine for my simple
configuration, at least.

Thanks again, Roy!

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Tom Ivar Helbekkmo
Roy Marples  writes:

> And you can stop the kernel from doing this too if not using dhcpcd
> ndp -i wm0 -- -auto_linklocal

Thanks, Roy!  I wasn't aware of that possibility.  Here's my modified
/etc/ifconfig.wm0:

!/usr/sbin/ndp -i wm0 -- -auto_linklocal
up
media 100baseTX mediaopt full-duplex
ip4csum tcp4csum udp4csum

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Anyway, I've updated my local copy with your patch to improve the UDP
> error logging, and it's running now.  It'll be interesting to see what
> it says -- but I guess, since my network interface does checksumming in
> hardware, the reason for the error message is just that dhcpcd sees the
> packet from dhcpd on the local host before it gets a checksum.

Yup, that seems to be what it is:

Oct 23 14:18:44 barsoom dhcpcd[2232]: vlan3: UDP checksum failure from 172.27.202.1
Oct 23 14:18:44 barsoom dhcpcd[2232]: wm0: UDP checksum failure from 172.27.202.1
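
As a rough sketch of why a locally captured packet fails the check when
the hardware is meant to fill the checksum in later: the code below is a
generic RFC 1071 ones' complement checksum, not dhcpcd's code, and the
function names are made up.  The buffer stands in for the pseudo-header
plus UDP datagram, with the checksum at an even byte offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 ones' complement sum over a buffer, folded to 16 bits. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;

    assert(data != NULL || len == 0);
    while (len > 1) {
        sum += ((uint32_t)data[0] << 8) | data[1];
        data += 2;
        len -= 2;
    }
    if (len == 1)                       /* odd trailing byte, zero padded */
        sum += (uint32_t)data[0] << 8;
    while (sum >> 16)                   /* fold the carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* A buffer that already carries a correct checksum sums to zero. */
int checksum_ok(const uint8_t *data, size_t len)
{
    return inet_checksum(data, len) == 0;
}

/* What the offload engine does later: zero the field, sum, store. */
void fill_checksum(uint8_t *pkt, size_t len, size_t off)
{
    uint16_t c;

    assert(off % 2 == 0 && off + 2 <= len);
    pkt[off] = 0;
    pkt[off + 1] = 0;
    c = inet_checksum(pkt, len);
    pkt[off] = (uint8_t)(c >> 8);
    pkt[off + 1] = (uint8_t)(c & 0xff);
}
```

A reader that sees the buffer before fill_checksum() has run finds only a
placeholder in the checksum field, so checksum_ok() fails; the same buffer
passes once the field has been filled in.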

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Tom Ivar Helbekkmo
Roy Marples  writes:

> I don't know anything about 802.1q trunks.
> How can I tell that it is one, and why shouldn't it have a local address?

Maybe it should, at that?  I was just a bit surprised.  I've been
thinking of 802.1q trunk end points as something other than network
interfaces, but of course they're not: there's no reason why the same
physical network link shouldn't be able to carry both tagged and
untagged packets.

...although it's probably not a good idea, most of the time.  :)

> Even if dhcpcd is not used, if IPv6 is enabled in the kernel and
> auto-link local is set for the interface (which it is by default and it
> looks like you've not disabled it in ifconfig.wm0) then you would get
> this address anyway.

That's a system wide sysctl, isn't it?  Not a per interface thing?
Anyway, it doesn't make a difference, what with a trunk in practice
always being a point to point link between trunk ports on VLAN handling
devices.

Anyway, I've updated my local copy with your patch to improve the UDP
error logging, and it's running now.  It'll be interesting to see what
it says -- but I guess, since my network interface does checksumming in
hardware, the reason for the error message is just that dhcpcd sees the
packet from dhcpd on the local host before it gets a checksum.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Tom Ivar Helbekkmo
Roy Marples  writes:

> This normally indicates a UDP checksum failure.
> [...]
> Maybe try disabling hardware processing of UDP checksums on the interface?

Ah, I should have said: the packets it's complaining about are being
sent by the local host's dhcpd to a DHCP client on another VLAN, so they
haven't even hit the interface yet, I think.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Tom Ivar Helbekkmo
Kengo NAKAHARA  writes:

> Hmm..., sorry, I am not sure about this problem from that information.
> Could you get tcpdump? Of course, if it is not a problem, please do it.

tcpdump seems to show that this is dhcpcd listening on other interfaces
than the one I'm trying to keep it on.  Looking at wm0 is weird, though,
as even though that's the trunk, tcpdump can't see any 802.1q packets
there, but it does see everything as normal traffic.  I guess that has
to do with the hardware acceleration for 802.1q?
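
A small Python sketch of what a capture tool has to look for, assuming
the standard 802.1Q layout (TPID 0x8100 right after the two MAC
addresses, followed by a 16-bit tag).  If the hardware strips the tag
before the capture point, the frame is indistinguishable from ordinary
untagged traffic.  The frames and the strip_tag() helper are
illustrative only and do not describe what wm(4) actually does.

```python
import struct

TPID_8021Q = 0x8100


def vlan_id(frame: bytes):
    """Return the 12-bit VLAN ID of a tagged frame, or None if untagged."""
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != TPID_8021Q:
        return None
    (tci,) = struct.unpack_from("!H", frame, 14)
    return tci & 0x0FFF


def strip_tag(frame: bytes) -> bytes:
    """Remove the 4-byte 802.1Q tag, as tag-stripping hardware might."""
    if vlan_id(frame) is None:
        return frame
    return frame[:12] + frame[16:]


dst_mac = bytes.fromhex("ffffffffffff")
src_mac = bytes.fromhex("001372f70006")
tagged = dst_mac + src_mac + struct.pack("!HHH", TPID_8021Q, 10, 0x0800) + b"payload"

print(vlan_id(tagged))              # VLAN 10 is visible on the wire format
print(vlan_id(strip_tag(tagged)))   # after stripping, no tag is left to see
```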

>> roy@n.o
>
> I think the issue seems to be related to DHCP. Could you think of any
> other way to solve it?

Has something changed that makes dhcpcd now insist on listening to all
interfaces (including the 802.1q trunk)?  Can I make it not do that?

Oh, and I notice that IPv6 generates a local address on wm0, as on
everything else.  That just looks weird on an 802.1q trunk.  Is there a
way to make it not do that?

# cat /etc/ifconfig.wm0

up
media 100baseTX mediaopt full-duplex
ip4csum tcp4csum udp4csum

# ifconfig wm0

wm0: flags=0x8843 mtu 1500
capabilities=2bf80
capabilities=2bf80
capabilities=2bf80
enabled=3f00
enabled=3f00
ec_capabilities=7
ec_enabled=3
address: 00:13:72:f7:00:06
media: Ethernet 100baseTX full-duplex
status: active
inet6 fe80::213:72ff:fef7:6%wm0/64 flags 0x0 scopeid 0x1

Which VLAN is that IPv6 address on, anyway?  :)
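
For reference, the address shown is the usual link-local address built
from the interface's MAC address using the modified EUI-64 scheme (flip
the universal/local bit, insert ff:fe in the middle, prepend fe80::/64).
A short Python sketch of that derivation; the function name is made up
for illustration:

```python
import ipaddress


def link_local_from_mac(mac: str) -> ipaddress.IPv6Address:
    """Derive the fe80::/64 modified EUI-64 address for a MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02                       # flip the universal/local bit
    eui64 = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + eui64)


print(link_local_from_mac("00:13:72:f7:00:06"))  # fe80::213:72ff:fef7:6
```

The result matches the inet6 line in the ifconfig output above, so the
address belongs to wm0 itself rather than to any of the VLANs.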

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Crash related to VLANs in Oct 18th -current

2017-10-22 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> That did the trick!  Thank you!  :)

I'm actually wondering if there may be something else strange going on.
Everything works fine -- but I have this dhcpcd running, because one of
my VLANs is connected to a network where this machine has to accept a
DHCP provisioned IP address from a server.  I run "dhcpcd -q vlan9", and
also give it a configuration file that should keep it from doing
anything I don't want:

allowinterfaces vlan9
interface vlan9
background
persistent
hostname_short
nogateway
nohook resolv.conf, wpa_supplicant, hostname, ntp.conf
script /usr/bin/true

However, after this last upgrade, I keep getting messages from dhcpcd
about other interfaces, where this host is the DHCP server, like:

Oct 22 16:48:28 barsoom dhcpcd[16236]: vlan2: invalid UDP packet from 172.27.201.1
Oct 22 16:48:28 barsoom dhcpcd[16236]: wm0: invalid UDP packet from 172.27.201.1

This happens every time a host on one of the other VLANs gets an address
from the local DHCP server, and I get this pair of messages; one for the
VLAN in question, one for wm0, which is the vlanif with the trunk on it.

Running 8.99.1 from about two months ago, these messages did not occur.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Crash related to VLANs in Oct 18th -current

2017-10-20 Thread Tom Ivar Helbekkmo
Kengo NAKAHARA  writes:

> Could you try the following patch?

That did the trick!  Thank you!  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Crash related to VLANs in Oct 18th -current

2017-10-19 Thread Tom Ivar Helbekkmo
I just updated to a fresh -current yesterday, and am running it on a
couple of amd64 systems.  It crashes during boot on the third one,
though, the one that has VLANs.

It configures wm0 thus:

# cat ifconfig.wm0
up
media 100baseTX mediaopt full-duplex
ip4csum tcp4csum udp4csum

...and then goes on to create a number of VLANs, by this pattern:

# cat ifconfig.vlan0
create
vlan 10 vlanif wm0
ip4csum tcp4csum udp4csum
inet 193.71.27.8 prefixlen 27
inet6 2001:8c0:c904:10::8 prefixlen 64

...and so on.  I set up five of those VLANs, and a split second later
(copied by hand from a photograph of a console terminal, as for some
reason I didn't get a valid crash dump) (the first line is truncated):

panic: kernel diagnostic assertion "(vlanid & ~ETHER_VLAN_MASK) == 0" failed: f
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
wm_rxeof() at netbsd:wm_rxeof+0x88f
wm_intr_legacy() at netbsd:wm_intr_legacy+0xa1
intr_biglock_wrapper() [...]

The KASSERT is in the vlan_set_tag() function in sys/net/if_ether.h.
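
To show what that assertion tests, here is a Python sketch of the
802.1Q tag control field: 3 bits of priority, 1 drop-eligible bit, and a
12-bit VLAN ID.  The mask value 0x0FFF is an assumption based on that
12-bit ID, not copied from if_ether.h, and the sketch does not claim to
identify where the offending value came from.

```python
ETHER_VLAN_MASK = 0x0FFF   # assumption: mask covering the 12-bit VLAN ID


def split_tci(tci: int):
    """Split a 16-bit tag control field into (priority, dei, vlan_id)."""
    pcp = (tci >> 13) & 0x7
    dei = (tci >> 12) & 0x1
    vid = tci & ETHER_VLAN_MASK
    return pcp, dei, vid


def passes_vlanid_check(value: int) -> bool:
    """Same test as the KASSERT: no bits set outside the VLAN ID mask."""
    return (value & ~ETHER_VLAN_MASK) == 0


tci = (3 << 13) | 10           # priority 3, VLAN 10
print(split_tci(tci))
print(passes_vlanid_check(10))    # a bare VLAN ID passes
print(passes_vlanid_check(tci))   # a full tag with priority bits does not
```

So any caller that hands over the whole tag, priority bits included,
instead of just the VLAN ID would trip a check of this shape.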

The interface looks like this:

wm0: flags=0x8843 mtu 1500
capabilities=2bf80
capabilities=2bf80
capabilities=2bf80
enabled=3f00
enabled=3f00
ec_capabilities=7
ec_enabled=3
address: 00:13:72:f7:00:06
media: Ethernet 100baseTX full-duplex
status: active

I'm also running with ALTQ configured in the kernel, but I don't believe
it has been activated by the startup script at this time.

/var/run/dmesg.boot (with older kernel) appended below.

-tih

Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 8.99.1 (BARSOOM) #14: Fri Jul 21 13:05:26 CEST 2017

r...@barsoom.hamartun.priv.no:/usr/obj/sys/arch/amd64/compile.amd64/BARSOOM
total memory = 8191 MB
avail memory = 7932 MB
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
running cgd selftest aes-xts-256 aes-xts-512 done
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Dell Computer Corporation PowerEdge 2850
mainbus0 (root)
ACPI: RSDP 0x000FD5B0 14 (v00 DELL  )
ACPI: RSDT 0x000FD5C4 38 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: FACP 0x000FD620 74 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: DSDT 0xBFFC 003CCD (v01 DELL   PE BKC   0001 MSFT 010E)
ACPI: FACS 0xBFFCFC00 40
ACPI: APIC 0x000FD694 E0 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: SPCR 0x000FD774 50 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: HPET 0x000FD7C4 38 (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: MCFG 0x000FD7FC 3C (v01 DELL   PE BKC   0001 MSFT 010A)
ACPI: 1 ACPI AML tables successfully acquired and loaded
ioapic0 at mainbus0 apid 8: pa 0xfec0, version 0x20, 24 pins
ioapic1 at mainbus0 apid 9: pa 0xfec8, version 0x20, 24 pins
ioapic2 at mainbus0 apid 10: pa 0xfec83000, version 0x20, 24 pins
ioapic3 at mainbus0 apid 11: pa 0xfec84000, version 0x20, 24 pins
cpu0 at mainbus0 apid 0
cpu0: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu0: package 0, core 0, smt 0
cpu1 at mainbus0 apid 6
cpu1: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu1: package 3, core 0, smt 0
cpu2 at mainbus0 apid 1
cpu2: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu2: package 0, core 0, smt 1
cpu3 at mainbus0 apid 7
cpu3: Intel(R) Xeon(TM) CPU 3.00GHz, id 0xf43
cpu3: package 3, core 0, smt 1
acpi0 at mainbus0: Intel ACPICA 20170303
acpi0: X/RSDT: OemId , AslId 
acpi0: MCFG: segment 0, bus 0-255, address 0xe000
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed0-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
pcppi1 at acpi0 (SPK, PNP0800): io 0x61
spkr0 at pcppi1: PC Speaker
wsbell at spkr0 not configured
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x5f irq 0
FDC (PNP0700) at acpi0 not configured
COMA (PNP0501) at acpi0 not configured
MBIO (PNP0C01) at acpi0 not configured
NIPM (IPI0001) at acpi0 not configured
acpivga0 at acpi0 (EVGA): ACPI Display Adapter
PEHB (PNP0C02) at acpi0 not configured
ACPI: Enabled 1 GPEs in block 00 to 1F
attimer1: attached to pcppi1
ipmi0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, 

Re: SSL/TLS and certificates

2017-05-27 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> This is after installing packages mozilla-rootcerts and
> mozilla-rootcerts-openssl, and configuring Postfix with
> smtpd_tls_CAfile = /etc/ssl/certs/ca-certificates.crt

...and there's my silly mistake right there: "smtpd_tls_CAfile" instead
of "smtp_tls_CAfile".

It works just fine when configured correctly.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: npf bug(?)

2017-03-30 Thread Tom Ivar Helbekkmo
6b...@6bone.informatik.uni-leipzig.de writes:

> Thanks for the patch. Unfortunately, I could not apply it to -7. I've
> tested -current. The problem seems to be solved.

Here it is, adjusted for the "netbsd-7" cvs tag:

--- sys/net/npf/npf_handler.c~  2017-03-30 17:26:49.458901595 +0200
+++ sys/net/npf/npf_handler.c   2017-03-30 17:29:52.833241529 +0200
@@ -146,7 +146,7 @@
npf_conn_t *con;
npf_rule_t *rl;
npf_rproc_t *rp;
-   int error, retfl;
+   int error, retfl, flags;
int decision;
 
/*
@@ -164,9 +164,17 @@
rp = NULL;
 
/* Cache everything.  Determine whether it is an IP fragment. */
-   if (__predict_false(npf_cache_all() & NPC_IPFRAG)) {
+   flags = npf_cache_all();
+   if (__predict_false(flags & NPC_IPFRAG)) {
/*
-* Pass to IPv4 or IPv6 reassembly mechanism.
+* We pass IPv6 fragments unconditionally
+* The first IPv6 fragment is not marked as such
+* and passes through the filter
+*/
+   if (flags & NPC_IP6)
+   return 0;
+   /*
+* Pass to IPv4 reassembly mechanism.
 */
error = npf_reassembly(, mp);
if (error) {
--- sys/net/npf/npf_inet.c~ 2017-03-30 17:27:07.661343255 +0200
+++ sys/net/npf/npf_inet.c  2017-03-30 17:30:45.721564537 +0200
@@ -352,6 +352,7 @@
case (IPV6_VERSION >> 4): {
struct ip6_hdr *ip6;
struct ip6_ext *ip6e;
+   struct ip6_frag *ip6f;
size_t off, hlen;
 
ip6 = nbuf_ensure_contig(nbuf, sizeof(struct ip6_hdr));
@@ -384,8 +385,21 @@
hlen = (ip6e->ip6e_len + 1) << 3;
break;
case IPPROTO_FRAGMENT:
+   ip6f = nbuf_ensure_contig(nbuf, sizeof(*ip6f));
+   if (ip6f == NULL)
+   return 0;
+   /*
+* We treat the first fragment as a regular
+* packet and then we pass the rest of the
+* fragments unconditionally. This way if
+* the first packet passes the rest will
+* be able to reassembled, if not they will
+* be ignored. We can do better later.
+*/
+   if (ntohs(ip6f->ip6f_offlg & IP6F_OFF_MASK) != 0)
+   flags |= NPC_IPFRAG;
+
hlen = sizeof(struct ip6_frag);
-   flags |= NPC_IPFRAG;
break;
case IPPROTO_AH:
hlen = (ip6e->ip6e_len + 2) << 2;
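
The test the patch adds (is the fragment offset non-zero?) can be
sketched outside the kernel like this.  It is a Python illustration of
the IPv6 fragment header layout, not NPF code.  The constants are
written in host byte order because the field is unpacked first, and the
sample headers are made up.

```python
import struct

IP6F_OFF_MASK = 0xFFF8    # upper 13 bits: offset, in units of 8 octets
IP6F_MORE_FRAG = 0x0001   # lowest bit: more fragments follow


def parse_frag_header(hdr: bytes):
    """Return (next_header, offset_in_bytes, more_fragments, ident)."""
    next_header, _reserved, offlg, ident = struct.unpack("!BBHI", hdr[:8])
    # The offset counts 8-octet units and sits 3 bits up, so masking
    # alone already yields the offset in bytes.
    return next_header, offlg & IP6F_OFF_MASK, bool(offlg & IP6F_MORE_FRAG), ident


def is_first_fragment(hdr: bytes) -> bool:
    """Only the first fragment has offset zero and carries the L4 header."""
    return parse_frag_header(hdr)[1] == 0


first = struct.pack("!BBHI", 17, 0, IP6F_MORE_FRAG, 0x1234)
later = struct.pack("!BBHI", 17, 0, 181 << 3, 0x1234)

print(parse_frag_header(first), is_first_fragment(first))
print(parse_frag_header(later), is_first_fragment(later))
```

Only the first fragment can be matched against port-based rules, which
is why the patch filters it normally and lets the later ones through.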


-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: Error/warning message from rc.d/npf

2017-03-23 Thread Tom Ivar Helbekkmo
Paul Goyette  writes:

> See PR kern/51818 for more details - it seems that the second
> "element" in $ext_if is ignored, and the ruleset is applied only to
> the first "element".

I'm guessing tun0 doesn't exist at the time npf is loaded, and a
workaround would be to reload it after starting the process that creates
that interface.

I don't know what npf does (or what I think it should do, for that
matter) when interfaces that are mentioned in the configuration file,
but do not exist at startup, later get created.  Some such interfaces
may be locked to a particular purpose every time, while others may get
created and destroyed from time to time, but for different purposes at
different times.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: PulseAudio and OSS audio of recent NetBSD-current

2017-03-13 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> After the latest upgrade (from 7.99.59 to 7.99.64), it stopped working
> for me, as well.  I've gone back to using OSS directly instead.  :)

Just to add some detail: I was using pulseaudio (mostly from Firefox and
Audacious), and could no longer get any sound out of the workstation.
After fiddling with it for a while, I tried to make it use a USB sound
"card" instead of the built-in one, but that just hung up my USB with a
"ehci_sync_hc: timed out" message, forcing me to reboot.  No keyboard
or mouse is kind of bothersome on a workstation.  ;)

I'm now running my sound sources configured to use JACK (I was already
running jackd to route MIDI between synths) with its OSS backend, which
works.  However, if I accidentally run something else that also wants to
use OSS directly, I have to reboot the workstation to get sound again.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: PulseAudio and OSS audio of recent NetBSD-current

2017-03-12 Thread Tom Ivar Helbekkmo
Johnny Billquist  writes:

>> ...and *man*, I'd forgotten how much 16-bit audio sucks.  :)
>
> I guess that means you detest CDs like nothing else. :-)

Yup.  Compared, side-by-side, to a good vinyl recording, a CD sucks.

But you've got a point.  Falling back to OSS gives horrible sound
quality from (lossless) rips of CDs, which are, as you say, 16 bit to
begin with.  So what's the big difference from Pulse, then?

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: PulseAudio and OSS audio of recent NetBSD-current

2017-03-12 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> After the latest upgrade (from 7.99.59 to 7.99.64), it stopped working
> for me, as well.  I've gone back to using OSS directly instead.  :)

...and *man*, I'd forgotten how much 16-bit audio sucks.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: PulseAudio and OSS audio of recent NetBSD-current

2017-03-12 Thread Tom Ivar Helbekkmo
Ryo ONODERA  writes:

> I would like to know if my problem is my environment specific
> or not.

After the latest upgrade (from 7.99.59 to 7.99.64), it stopped working
for me, as well.  I've gone back to using OSS directly instead.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Tom Ivar Helbekkmo
Ryota Ozaki  writes:

> Hmm. Where did the first crash happen? In re(4) or NFS or ffs?

No idea, I'm afraid.  I'll just have to provoke another one.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Have done so, and have been building stuff for two or three hours with
> no accidents.  Looking good so far, in other words.

It crashed just now, though.  Unfortunately, it crashed again during
boot, and ended up overwriting the old core dump before saving it.  :(

The bootup crash looks like the ones I got earlier, with a twist:

panic: ffs_sync: rofs mod, fs=/
cpu1: Begin traceback...
WARNING: SPL NOT LOWERED ON SYSCALL 4056 5 EXIT 6eb05070 6
WARNING: SPL NOT LOWERED ON TRAP EXIT 6 0
vpanic() at WARNING: SPL NOT LOWERED ON TRAP EXIT 6 0
netbsd:vpanic+0x140
WARNING: SPL NOT LOWERED ON SYSCALL 0 1869754096 EXIT 6f722ef0 6
WARNING: SPL NOT LOWERED ON SYSCALL 0 1869754096 EXIT 6f722ef0 6
snprintf() at netbsd:snprintf
WARNING: SPL NOT LOWERED ON SYSCALL 0 1869771712 EXIT 6f7273c0 6
WARNING: SPL NOT LOWERED ON SYSCALL 0 1869771712 EXIT 6f7273c0 6
ffs_sync() at netbsd:ffs_sync+0x26b
VFS_SYNC() at netbsd:VFS_SYNC+0x1c
WARNING: SPL NOT LOWERED ON SYSCALL 0 1869771712 EXIT 6f7273c0 6
sched_sync() at netbsd:sched_sync+0x27b
cpu1: End traceback...
WARNING: SPL NOT LOWERED ON SYSCALL 0 1869771712 EXIT 6f7273c0 6

After this, it took a while to regain control, as it crashed on every
boot, like this:

panic: ffs_newvnode: dup alloc ino=104717 on /: mode 81a4/81a4 gen e04e1e3/e04e1e3 size 0 blocks 4
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
ffs_newvnode() at netbsd:ffs_newvnode+0x5c3
vcache_new() at netbsd:vcache_new+0x80
ufs_makeinode() at netbsd:ufs_makeinode+0x38
ufs_create() at netbsd:ufs_create+0x31
VOP_CREATE() at netbsd:VOP_CREATE+0x3d
vn_open() at netbsd:vn_open+0x351
do_open() at netbsd:do_open+0x112
do_sys_openat() at netbsd:do_sys_openat+0x68
sys_open() at netbsd:sys_open+0x24
syscall() at netbsd:syscall+0x1d8
--- syscall (number 5) ---
780a9a43df7a:
cpu0: End traceback...

I got rid of that by booting to single user, and removing the file.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> I will -- I'll just let it finish building firefox first.

Had to give up on that -- too unstable.

> Once that's done, I'll apply your patch to sys/net/if.c, and start
> some heavy building over NFS again.

Have done so, and have been building stuff for two or three hours with
no accidents.  Looking good so far, in other words.  I'll let it keep
running -- I'm planning to start a build of Libre Office before going to
bed.  That should keep the system occupied through the night.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Tom Ivar Helbekkmo
Ryota Ozaki  writes:

> Oops. Reverting the commit makes no sense. Please ignore the second
> request.

OK!  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


A few crashes with yesterday's amd64-current -- IPv6 related?

2017-03-05 Thread Tom Ivar Helbekkmo
I updated again yesterday, and it seems at least one stability issue
has been introduced since 7.99.59, which I was running before this.

The first crash came when I was trying to shut down to single user after
booting the new kernel with the existing userland.  I *think* it was
triggered by the kernel missing the correct module directory; I caught a
glimpse of it trying to access a module to connect to the console, and I
later discovered that my ttys file had console enabled instead of ttyE0:

panic: kernel diagnostic assertion "(kpreempt_disabled() || cpu_softintr_p() || ISSET(curlwp->l_pflag, LP_BOUND))" failed: file "/usr/src/sys/kern/subr_psref.c", line 291 passive references are CPU-local, but preemption is enabled and the caller is not in a softint or CPU-bound LWP
cpu1: Begin traceback...
vpanic() at netbsd:vpanic+0x140
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
psref_release() at netbsd:psref_release+0xf8
ip_setmoptions() at netbsd:ip_setmoptions+0x269
ip_ctloutput() at netbsd:ip_ctloutput+0x1ee
rip_ctloutput() at netbsd:rip_ctloutput+0xee
rip_ctloutput_wrapper() at netbsd:rip_ctloutput_wrapper+0x2c
sosetopt() at netbsd:sosetopt+0x67
sys_setsockopt() at netbsd:sys_setsockopt+0x91
syscall() at netbsd:syscall+0x1d8
--- syscall (number 105) ---
7eb0dacdb16a:
cpu1: End traceback...

Then it crashed during boot, seemingly related to fsck:

panic: ffs_sync: rofs mod, fs=/
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
ffs_sync() at netbsd:ffs_sync+0x26b
VFS_SYNC() at netbsd:VFS_SYNC+0x1c
sched_sync() at netbsd:sched_sync+0x27b
cpu0: End traceback...

Anyway, I installed the complete updated userland on the machine, and
started updating a bunch of packages from source, with all disk activity
over NFS over UDP over IPv6.  After about three hours:

panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file "/usr/src/sys/dev/ic/rtl8169.c", line 1380
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
re_txeof() at netbsd:re_txeof+0x250
re_intr() at netbsd:re_intr+0x11b
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
--- interrupt ---
x86_mwait() at netbsd:x86_mwait+0xd
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
idle_loop() at netbsd:idle_loop+0x18c
cpu0: End traceback...
uvm_fault(0xfe80cbca48c0, 0x0, 2) -> e
fatal page fault in supervisor mode
trap type 6 code 2 rip 8095500b cs 8 rflags 10282 cr2 84 ilevel 8 rsp fe8040afea80
curlwp 0xfe804dedaa20 pid 20873.1 lowest kstack 0xfe8040afb2c0

Once more, it crashed during boot, just like after the first crash:

panic: ffs_sync: rofs mod, fs=/
cpu1: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
ffs_sync() at netbsd:ffs_sync+0x26b
VFS_SYNC() at netbsd:VFS_SYNC+0x1c
sched_sync() at netbsd:sched_sync+0x27b
cpu1: End traceback...

I tried to continue building packages over NFS, but this happened again:

panic: kernel diagnostic assertion "txq->txq_mbuf != NULL" failed: file "/usr/src/sys/dev/ic/rtl8169.c", line 1380
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
ch_voltag_convert_in() at netbsd:ch_voltag_convert_in
re_txeof() at netbsd:re_txeof+0x250
re_intr() at netbsd:re_intr+0x11b
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x1d
Xintr_ioapic_edge19() at netbsd:Xintr_ioapic_edge19+0xee
--- interrupt ---
x86_mwait() at netbsd:x86_mwait+0xd
acpicpu_cstate_idle_enter() at netbsd:acpicpu_cstate_idle_enter+0xdb
acpicpu_cstate_idle() at netbsd:acpicpu_cstate_idle+0xb6
idle_loop() at netbsd:idle_loop+0x18c
cpu0: End traceback...

This is when I pointed WRKOBJDIR to a local scratch directory in
/etc/mk.conf, thus reducing the amount of network traffic severely.
It's now building happily.  :)

I've noticed quite a few IPv6 changes, lately.  Might these mbuf related
assertions have something to do with that?

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: NFS, UDP, and IPv6 don't play nice together

2017-02-03 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> In fact, the very specific combination that doesn't work is NFS using
> UDP over IPv6.  That fails when writes are attempted, if the client is
> running -current.  The other three protocol combinations work fine.

Simplest way I've found to duplicate this:

In /etc/fstab:
sirius:/var/spool/bacula /m/bacula nfs rw,bg,intr,mntudp,noauto

This results in a UDP mount over IPv6.  (I need to use the explicit IPv4
address instead of the name to get an IPv4 mount.)  Then,

# mount /m/bacula
# df
[...]
sirius:/var/spool/bacula 0 0 0 100% /m/bacula
barsoom# stat /m/bacula
stat: /m/bacula: lstat: Input/output error

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


NFS, UDP, and IPv6 don't play nice together

2017-02-02 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> ...and with NFS over TCP, writing works without hanging.  :)

In fact, the very specific combination that doesn't work is NFS using
UDP over IPv6.  That fails when writes are attempted, if the client is
running -current.  The other three protocol combinations work fine.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: OpenVPN causes fresh -current to crash

2017-01-24 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Hm. Maybe I should change to a TCP mount, and see what happens...

...and with NFS over TCP, writing works without hanging.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: OpenVPN causes fresh -current to crash

2017-01-24 Thread Tom Ivar Helbekkmo
Ryota Ozaki  writes:

>>> The latest pfil.c (v1.34) should fix the panic. Could you try it?
>>
>> I'll give it a go tonight, and report back.

I re-introduced the change that I previously rolled back to get things
working, and then upgraded pfil.c to 1.34 and built a new kernel.  This
worked fine -- you've obviously corrected the problem.  :)

About the NFS hang:

> Can you get DDB? If you can, you can know where the processes hang up:
>   db> ps # you can get LWP addresses of ld and ls
>   db> bt/a  # you can get their stack traces

Noted - but I haven't been able to get into DDB.  I thought Ctrl-Alt-Esc
in the first console (the Ctrl-Alt-F1 one) should do it, but it doesn't.

> The hang may happen depending on a NIC. Which NIC do you use?

re0 at pci2 dev 0 function 0: RealTek 8168/8111 PCIe Gigabit Ethernet

> And please let me know NFS options of the client and the server?

Not much.  Server:

nfs_server=YES and nfsd_flags="-n 16" in rc.conf.

Client:

nfs_client=YES in rc.conf, and "rw,bg,intr" as mount options.

Hm. Maybe I should change to a TCP mount, and see what happens...

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: OpenVPN causes fresh -current to crash

2017-01-23 Thread Tom Ivar Helbekkmo
Ryota Ozaki  writes:

> The latest pfil.c (v1.34) should fix the panic. Could you try it?

I'll give it a go tonight, and report back.

Meanwhile, do you think this ongoing MPSAFE work may have some unwanted
consequences for NFS?  There's a problem that's been around for at least
a couple of months, but that I only discovered the other day -- I was
running with kernels from late October then, and the problem I observed
is still there after upgrading.

Reading NFS file systems is no problem, which is why I didn't notice it
before, but writing hangs.  Here's an example: I started compiling a C
source file directly to an executable on an NFS mounted file system
(server and client both amd64 running fresh -current).  The compile pass
is fine, but when the ld end of the pipeline wants to write the
executable, it hangs.  So I try to do a 'df' in another terminal, and it
hangs.  Finally, I simply attempt to make 'ls -l [target executable]'
show me if it's written anything yet, and that hangs, too: after an
attempt to write has hung the communication up, reads no longer work,
either:

 UID   PID  PPID CPU PRI NI   VSZ  RSS WCHAN  STAT TTY        TIME COMMAND
   0 22179 22678   0 124  0 33344 5136 netio  D+   pts/17  0:00.01 ld [...]
 501 21370 21006 516  85  0  8952 1144 nfsrcv I+   pts/18  0:00.00 df
 501 21710     1   0 127  0  8964 1116 tstile D    pts/20- 0:00.00 /bin/ls [...]

Once I have something with "tstile" in the "WCHAN" column, I know that
I can't just reboot the machine: it's going to take a hard reset.
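
Spotting such processes can be automated.  Below is a minimal Python
sketch that scans "ps -l" style output for a given wait channel.  The
sample text is the listing above with its columns separated by single
spaces (my reading of where the fields fall) and the elided arguments
dropped; the function name is made up.

```python
PS_OUTPUT = """\
UID PID PPID CPU PRI NI VSZ RSS WCHAN STAT TTY TIME COMMAND
0 22179 22678 0 124 0 33344 5136 netio D+ pts/17 0:00.01 ld
501 21370 21006 516 85 0 8952 1144 nfsrcv I+ pts/18 0:00.00 df
501 21710 1 0 127 0 8964 1116 tstile D pts/20- 0:00.00 /bin/ls
"""


def pids_waiting_on(wchan: str, ps_text: str):
    """Return the PIDs whose WCHAN column equals the given wait channel.

    Caveat: this splits on whitespace, so it assumes every row has a
    non-empty WCHAN; a running process with a blank WCHAN would shift
    the columns and be misread.
    """
    lines = ps_text.splitlines()
    columns = lines[0].split()
    pid_col = columns.index("PID")
    wchan_col = columns.index("WCHAN")
    pids = []
    for line in lines[1:]:
        # Keep the command (last column) in one piece.
        fields = line.split(None, len(columns) - 1)
        if len(fields) > wchan_col and fields[wchan_col] == wchan:
            pids.append(int(fields[pid_col]))
    return pids


print(pids_waiting_on("tstile", PS_OUTPUT))
```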

Oh, and it's the client that hangs; the server seems to be just fine,
and a reboot of the client makes NFS reads behave normally again.  On
the server, the output file got created, but is zero bytes.  The error
logged on the client when it gets stuck is this console output:

nfs send error 64 for barsoom:/usr/local

...and then the normal "nfs server not responding" messages in syslog
after that, of course.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: OpenVPN causes fresh -current to crash

2017-01-22 Thread Tom Ivar Helbekkmo
Martin Husemann  writes:

> Could you try backing out this change and see if it helps?
>
> http://mail-index.netbsd.org/source-changes/2017/01/16/msg081115.html

That did the trick.  I've rebooted a few times, now, and the system
comes up as it should, with no incident, every time.  Thanks!  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


OpenVPN causes fresh -current to crash

2017-01-22 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Didn't go so well.  My main machine does routing between several VLANs,
> using Quagga to manage the routing, NPF and ALTQ for traffic management,
> and OpenVPN for tunnels from remote devices, all the while offering a
> number of network services internally.
>
> After updating to a fresh current, attempting to enable NPF will crash
> the machine, as will starting OpenVPN.  The latter causes a crash the
> moment it tries to create a tun interface.

It's a little more complex than that.

With NPF enabled, the machine will sometimes boot, sometimes not.  It
may hang just after enabling NPF, or it may get hung later in the boot
process -- seemingly mostly while doing stuff with USB.  Turning the
machine fully off and on again before a reboot attempt seems to increase
the chance of a successful boot, but it's still about fifty/fifty.  If
it does boot completely, it seems to be stable after that.

OpenVPN, on the other hand, will reliably crash the system.  I'm running
openvpn-2.3.6nb2 from pkgsrc, compiled about a year ago.  It's set up to
create three tunnels, and to (like the rest of the system) route IPv4
and IPv6 over them.  When it starts, the kernel immediately panics while
handling a syscall number 5 for the openvpn process.  The following is
copied by hand, because a recursive panic causes the attempt to dump
core to disk to fail:

panic: kernel diagnostic assertion "(kpreempt_disabled() ||
cpu_softintr_p() || ISSET(curlwp->l_pflag, LP_BOUND))" failed:
file "/usr/src/sys/kern/subr_psref.c", line 291
passive references are CPU-local, but preemption is enabled and the
caller is not in a softint or CPU-bound LWP

Backtrace:

vpanic()
ch_voltag_convert_in()
psref_release()
pfil_run_arg.isra.0()
if_initialize()
if_attach()
tun_clone_create()
tunopen()
cdev_open()
spec_open()
VOP_OPEN()
vn_open()
do_open()
do_sys_openat()
sys_open()
syscall()

This is with a NetBSD/amd64-current, updated from cvs yesterday.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: wm devices don't work under current amd64

2017-01-21 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Christos Zoulas <chris...@zoulas.com> writes:
>
>> Right now it seems to be a good time to upgrade for example...
>
> That's what I'm hoping for - I started a couple of hours ago.  :)

Didn't go so well.  My main machine does routing between several VLANs,
using Quagga to manage the routing, NPF and ALTQ for traffic management,
and OpenVPN for tunnels from remote devices, all the while offering a
number of network services internally.

After updating to a fresh current, attempting to enable NPF will crash
the machine, as will starting OpenVPN.  The latter causes a crash the
moment it tries to create a tun interface.

Unfortunately, these crashes only cause a traceback to quickly scroll
past on the console, instead of the proper core dump I'm used to.  Not
sure what has caused this change...

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: wm devices don't work under current amd64

2017-01-21 Thread Tom Ivar Helbekkmo
Christos Zoulas  writes:

> Right now it seems to be a good time to upgrade for example...

That's what I'm hoping for - I started a couple of hours ago.  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: wm devices don't work under current amd64

2017-01-20 Thread Tom Ivar Helbekkmo
Christos Zoulas  writes:

> I don't know about that. It is pretty stable with me...

I guess I worded that a bit clumsily.  :)  I meant that there seem to be
a number of rather deep changes going on, accompanied by more reports of
crashes than I'm used to seeing on current-users.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: wm devices don't work under current amd64

2017-01-20 Thread Tom Ivar Helbekkmo
Christos Zoulas  writes:

> Perhaps we want a lock?

OK, that makes sense - and might explain why my problem returned.
(Especially as it's not quite the same as before, and the differences
may well be locking related.)  But should I pursue this on 7.99.39, or
should I upgrade to the latest -current, and see what happens then?
I've been reluctant to upgrade, lately, because I'm under the impression
that -current is extremely unstable -- and I don't really have time
right now to have my primary systems crash on a daily basis...  :)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: wm devices don't work under current amd64

2017-01-20 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> However, another amd64 system that doesn't use VLANs, and is an NFS
> client, is unable to write to NFS file systems if it runs a kernel
> with the patch applied.

Don't mind me - after a little while, the problem returned on this
system, even without the patch.  So I've got something else going on.

Sorry about the false alarm!

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: wm devices don't work under current amd64

2017-01-20 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Masanobu SAITOH <msai...@execsw.org> writes:
>
>>  Please test the latest -current. knakahara found a problem:
>
> That worked fine!  No longer any need for the tcpdump hack.  :)
>
> (I didn't get the latest -current; I just added those patches to 7.99.39.)

Correction: it works *almost* fine.  Turns out that those patches alone
let my main amd64 system boot without the tcpdump hack, and work well
as, among other things, an NFS server.  However, another amd64 system
that doesn't use VLANs, and is an NFS client, is unable to write to NFS
file systems if it runs a kernel with the patch applied.

Patch on NFS server, not on client: no problem.
Patch on NFS server and client: writing to NFS hangs.
Patch on NFS client, not on server: writing to NFS hangs.

I guess the patch depends on other changes after 7.99.39...

Just to be sure we agree what we're discussing, this is the patch:

--- sys/net/if_ethersubr.c  10 Jan 2017 05:42:34 -  1.234
+++ sys/net/if_ethersubr.c  13 Jan 2017 06:11:56 -  1.235
@@ -1475,10 +1475,6 @@
     int error;
     struct ethercom *ec = (void *)ifp;
 
-    /* Already have VLAN's do nothing. */
-    if (ec->ec_nvlans != 0)
-        return 0;
-
     /* Parent does not support VLAN's */
     if ((ec->ec_capabilities & ETHERCAP_VLAN_MTU) == 0)
         return -1;
--- sys/net/if_vlan.c   15 Dec 2016 09:28:06 -  1.93
+++ sys/net/if_vlan.c   13 Jan 2017 06:11:56 -  1.94
@@ -313,10 +313,12 @@
         ifv->ifv_encaplen = ETHER_VLAN_ENCAP_LEN;
         ifv->ifv_mintu = ETHERMIN;
 
-        if (ec->ec_nvlans == 0) {
+        if (ec->ec_nvlans++ == 0) {
             if ((error = ether_enable_vlan_mtu(p)) >= 0) {
-                if (error)
+                if (error) {
+                    ec->ec_nvlans--;
                     return error;
+                }
                 ifv->ifv_mtufudge = 0;
             } else {
                 /*
@@ -329,7 +331,6 @@
                 ifv->ifv_mtufudge = ifv->ifv_encaplen;
             }
         }
-        ec->ec_nvlans++;
 
         /*
          * If the parent interface can do hardware-assisted

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: wm devices don't work under current amd64

2017-01-13 Thread Tom Ivar Helbekkmo
Masanobu SAITOH  writes:

>  Please test the latest -current. knakahara found a problem:

That worked fine!  No longer any need for the tcpdump hack.  :)

(I didn't get the latest -current; I just added those patches to 7.99.39.)

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Leap seconds and date(1)

2016-12-31 Thread Tom Ivar Helbekkmo
Happy new year! :)

Being awake, I decided to observe the leap second.  Not much luck.
Turns out that date(1) shows :59 for two seconds, instead of going to
the correct :60 during the extra second.  Is that correct behaviour?
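
One possible explanation, sketched in Python rather than anything from the NetBSD sources, and assuming the clock is kept on the POSIX timescale (an assumption; the message does not say how the clock was configured): POSIX time has no slot for a leap second, so the second before and the second after midnight are adjacent values, and a formatter fed from such a clock never sees a 60.

```python
import calendar
import time

# POSIX time_t values for the two seconds surrounding the leap second.
before = calendar.timegm((2016, 12, 31, 23, 59, 59))
after = calendar.timegm((2017, 1, 1, 0, 0, 0))
print(after - before)  # 1: there is no time_t value left over for 23:59:60

# The formatting layer itself has no problem with a 60th second when it is
# handed one explicitly (weekday 5 = Saturday, day 366 of a leap year).
leap = (2016, 12, 31, 23, 59, 60, 5, 366, 0)
print(time.strftime("%H:%M:%S", leap))  # 23:59:60
```

So a repeated :59 would be consistent with the kernel holding the clock back for one second, while :60 would need a timescale or zoneinfo set that actually counts leap seconds.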

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: panic enabling ipfilter (Dec 27)

2016-12-28 Thread Tom Ivar Helbekkmo
Geoff Wing  writes:

> Unfortunately my machine is mostly headless and I can't get dmesg saved
> after reboot.
> [...]
> Is "bt;sync" better?

If you want the dmesg output, you do want the core dump:

# dmesg -N /var/crash/netbsd.42 -M /var/crash/netbsd.42.core

...and for backtraces after the fact (your path will vary):

# gdb /sys/arch/amd64/compile/obj.amd64/GENERIC/netbsd.gdb
(gdb) target kvm /var/crash/netbsd.42.core
(gdb) bt

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: File descriptor leak involving kqueue

2016-12-18 Thread Tom Ivar Helbekkmo
Eric Haszlakiewicz  writes:

> For cross-OS support you'll need to add a configure check for that.
> In some environments it seems that the memory leak can't be fixed,
> since res_ndestroy doesn't exist.

Looks that way -- Linux doesn't even have res_nclose(), so I guess
programmers who decide to go this low-level need to adapt their code
carefully to the systems it is to run on.

Now, why the opendmarc people decided to use these calls in the first
place, I don't understand.  They create a new resolver, perform a single
query using it, and tear it back down.  If it were to do something
special, like, say, setting RES_BLAST instead of RES_ROTATE, I could
understand it.  This is just weird.  Is there something I'm not seeing?

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: File descriptor leak involving kqueue

2016-12-18 Thread Tom Ivar Helbekkmo
Joerg Sonnenberger  writes:

> It seems pretty obvious that OpenDMARC is not correctly managing
> resources. It creates an on-stack res_state, initializes it with
> res_ninit, but never destroys it.

Ah!  I didn't catch that - thanks!  I'll modify OpenDMARC to use
res_ndestroy() instead, now.

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: File descriptor leak involving kqueue

2016-12-18 Thread Tom Ivar Helbekkmo
Robert Swindells  writes:

> I think that close() of a socket can leak the kevent(2) structures
> if there are some still active.

No, I think something else is going on, and it's in the resolver.

In lib/libc/resolv/res_send.c, in the function res_nsend(), res_check()
is called, which in turn calls __res_vinit().  (Not every time, but my
ktrace shows that each leaking of a kqueue fd is preceded by such a
call, and the re-reading of /etc/resolv.conf.)

Now, __res_vinit() unconditionally does this:

statp->_u._ext.ext->kq = kqueue1(O_CLOEXEC);

It seems to me that either this needs to check for a kqueue already
existing (e.g. statp->_u._ext.ext->kq > 0), or it's assuming that if
__res_vinit() gets called more than once, everything has been torn down
properly first.  If so, I suspect that at least one of the calls to
res_nclose() in res_nsend() should really be a call to res_ndestroy(),
which does close the kqueue.
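
The two options above can be sketched in Python.  This is not the libc code: ResolverState, vinit_unguarded, vinit_guarded and destroy are hypothetical names, a descriptor on the null device stands in for kqueue1(), and using -1 as the "no kqueue yet" marker is an assumption of the sketch.

```python
import os


class ResolverState:
    """Hypothetical stand-in for res_state; only tracks the kqueue fd."""

    def __init__(self):
        self.kq = -1

    def vinit_unguarded(self):
        # Mirrors the unconditional assignment: any previous fd is forgotten.
        self.kq = os.open(os.devnull, os.O_RDONLY)  # stand-in for kqueue1()

    def vinit_guarded(self):
        # Only create a descriptor if none is held yet.
        if self.kq < 0:
            self.kq = os.open(os.devnull, os.O_RDONLY)

    def destroy(self):
        # What a full teardown has to do: close the fd and mark it unset.
        if self.kq >= 0:
            os.close(self.kq)
            self.kq = -1


def distinct_fds(init_name, calls=5):
    """Run the chosen init `calls` times and count the descriptors created."""
    state = ResolverState()
    seen = set()
    for _ in range(calls):
        getattr(state, init_name)()
        seen.add(state.kq)
    # Clean up everything this experiment opened.
    for fd in seen:
        os.close(fd)
    return len(seen)
```

Calling distinct_fds("vinit_unguarded") creates one descriptor per call, which is the leak pattern; distinct_fds("vinit_guarded") keeps reusing the first one.  Calling destroy() between re-initializations would have the same effect as the guard.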

Thoughts?

-tih
-- 
Most people who graduate with CS degrees don't understand the significance
of Lisp.  Lisp is the most important idea in computer science.  --Alan Kay


Re: File descriptor leak involving kqueue

2016-12-13 Thread Tom Ivar Helbekkmo
Robert Swindells  writes:

> Could you try with the following patch ?

Running with it now - but not seeing any occurrences of your knote
messages.

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


File descriptor leak involving kqueue

2016-12-13 Thread Tom Ivar Helbekkmo
Lately, I'm running my postfix with opendkim and opendmarc milters (both
from pkgsrc).  Something about opendmarc is bleeding the system empty of
file handles, and I'd appreciate some help thinking about how to find
out exactly what's going on.

fstat shows me this:

opendmar opendmarc  12624   wd /          64910 drwxr-xr-x    1024 r
opendmar opendmarc  12624    0 /          43280 crw-rw-rw-    null rw
opendmar opendmarc  12624    1 /          43280 crw-rw-rw-    null rw
opendmar opendmarc  12624    2 /          43280 crw-rw-rw-    null rw
opendmar opendmarc  12624    3* unix dgram  <-> /var/run/log [using]
opendmar opendmarc  12624    4* internet stream tcp localhost:ddi-tcp-6
opendmar opendmarc  12624    5* internet stream tcp localhost:52067 <-> localhost:ddi-tcp-6
opendmar opendmarc  12624    6* kqueue pending 0
opendmar opendmarc  12624    7* kqueue pending 0
opendmar opendmarc  12624    8* kqueue pending 0
opendmar opendmarc  12624    9* kqueue pending 0
opendmar opendmarc  12624   10* kqueue pending 0
opendmar opendmarc  12624   11* kqueue pending 0
opendmar opendmarc  12624   12* kqueue pending 0

...and then the list of kqueue lines just grows and grows over time.

Doing a ktrace on the process seems to indicate that the kqueue use has
to do with the resolver.  The pattern is like this:

  7013 31 opendmarc CALL  open(0x73e95572c060,0x40,0x1b6)
  7013 31 opendmarc NAMI  "/etc/resolv.conf"
  7013 31 opendmarc RET   open 9
  7013 31 opendmarc CALL  __fstat50(9,0x73e9522d9c00)
  7013 31 opendmarc RET   __fstat50 0
  7013 31 opendmarc CALL  read(9,0x73e950d37000,0x4000)
  7013 31 opendmarc GIO   fd 9 read 47 bytes
   "nameserver 193.71.27.8\nsearch hamartun.priv.no\n"
  7013 31 opendmarc RET   read 47/0x2f
  7013 31 opendmarc CALL  read(9,0x73e950d37000,0x4000)
  7013 31 opendmarc GIO   fd 9 read 0 bytes
   ""
  7013 31 opendmarc RET   read 0
  7013 31 opendmarc CALL  fcntl(9,F_DUPFD_CLOEXEC,0x73e95613d000)
  7013 31 opendmarc RET   fcntl -1 errno 22 Invalid argument
  7013 31 opendmarc CALL  close(9)
  7013 31 opendmarc RET   close 0
  7013 31 opendmarc CALL  __fstat50(0x,0x73e9522d9e50)
  7013 31 opendmarc RET   __fstat50 -1 errno 9 Bad file descriptor
  7013 31 opendmarc CALL  kqueue1(0x40)
  7013 31 opendmarc RET   kqueue1 9
  7013 31 opendmarc CALL  __kevent50(9,0x73e9522d9e20,1,0,0,0x73e95572c080)
  7013 31 opendmarc RET   __kevent50 -1 errno 9 Bad file descriptor
  7013 31 opendmarc CALL  __gettimeofday50(0x73e9522d9e50,0)
  7013 31 opendmarc RET   __gettimeofday50 0
  7013 31 opendmarc CALL  __kevent50(9,0,0,0x73e9522d9860,1,0x73e95572c080)
  7013 31 opendmarc RET   __kevent50 0
  7013 31 opendmarc CALL  __socket30(2,0x1002,0)
  7013 31 opendmarc RET   __socket30 10/0xa
  7013 31 opendmarc CALL  connect(0xa,0x73e9522da6d4,0x10)
  7013 31 opendmarc RET   connect 0
  7013 31 opendmarc CALL  sendto(0xa,0x73e9522da010,0x24,0,0,0)
  7013 31 opendmarc MISC  msghdr: [name=0x0, namelen=0, iov=0xfe81015b8e50, iovlen=1, control=0x0, controllen=2166179032, flags=0]
  7013 31 opendmarc GIO   fd 10 wrote 36 bytes
   "\^]I\^A\0\0\^A\0\0\0\0\0\0\^F_dmarc\akeithf4\^Ccom\0\0\^P\0\^A"
  7013 31 opendmarc RET   sendto 36/0x24
  7013 31 opendmarc CALL  __clock_gettime50(0,0x73e9522d9890)
  7013 31 opendmarc RET   __clock_gettime50 0
  7013 31 opendmarc CALL  poll(0x73e9522d9970,1,0x1388)
  7013 31 opendmarc RET   poll 1
  7013 31 opendmarc CALL  recvfrom(0xa,0x73e9522da8f0,0x2000,0,0x73e9522d9990,0x73e9522d9964)
  7013 31 opendmarc MISC  msghdr: [name=0x0, namelen=29, iov=0xfe81015b8e40, iovlen=1, control=0x0, controllen=3021220544, flags=0]
  7013 31 opendmarc GIO   fd 10 read 100 bytes
   "\^]I\M^A\M^C\0\^A\0\0\0\^A\0\0\^F_dmarc\akeithf4\^Ccom\0\0\^P\0\^A\M-@\^S\0\^F\0\^A\0\0\a\b\0004\^Cns1\fdigita\
    locean\M-@\^[\nhostmaster\M-@\^SW\M^X\^XC\0\0*0\0\0\^N\^P\0 :\M^@\0\0\a\b"
  7013 31 opendmarc MISC  mbsoname: [193.71.27.8]
  7013 31 opendmarc RET   recvfrom 100/0x64
  7013 31 opendmarc CALL  close(0xa)
  7013 31 opendmarc RET   close 0
  7013 31 opendmarc CALL  __gettimeofday50(0x73e9522d9da0,0)
  7013 31 opendmarc RET   __gettimeofday50 0
  7013 31 opendmarc CALL  getpid
  7013 31 opendmarc RET   getpid 7013/0x1b65, 1
  7013 31 opendmarc CALL  __gettimeofday50(0x73e9522d9d30,0)
  7013 31 opendmarc RET   __gettimeofday50 0

...and then it reads /etc/resolv.conf again, to do another lookup.  This
time, the kqueue call returns fd 10 (which has just been closed, after
being used for the socket to talk to the name server).  Next time this
happens, we get 11, then 12, and so on...
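
The creeping descriptor numbers follow directly from lowest-free-descriptor allocation.  A toy Python model of the trace, with FdTable and lookup as made-up names and the assumption that descriptors 0-8 are already in use before the first lookup:

```python
# Toy model of a per-process descriptor table: like the kernel, it always
# hands out the lowest free descriptor number.
class FdTable:
    def __init__(self, already_open):
        self.open = set(already_open)

    def allocate(self):
        fd = 0
        while fd in self.open:
            fd += 1
        self.open.add(fd)
        return fd

    def close(self, fd):
        self.open.discard(fd)


def lookup(table):
    """One resolver round trip as seen in the trace; returns the kqueue fd."""
    conf = table.allocate()      # open("/etc/resolv.conf")
    table.close(conf)            # close() after reading it
    kq = table.allocate()        # kqueue1(O_CLOEXEC), never closed
    sock = table.allocate()      # socket() to the name server
    table.close(sock)            # close() after the reply arrives
    return kq


table = FdTable(range(9))        # fds 0-8 in use before the first lookup
leaked = [lookup(table) for _ in range(4)]
print(leaked)                    # [9, 10, 11, 12]
```

Each lookup permanently occupies the lowest free slot, so the next one starts one descriptor higher.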

It looks like the fd returned by the kqueue system call is never
closed.  I'm guessing /usr/src/lib/libc/resolv/res_send.c is where the
action is, but it's strange that opendmarc 

Re: xorg.conf is read but not acted on correctly

2016-12-11 Thread Tom Ivar Helbekkmo
co...@sdf.org writes:

> Try this:

Cool!  I've got a working console again. Thanks!  :)

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


Re: xorg.conf is read but not acted on correctly

2016-12-10 Thread Tom Ivar Helbekkmo
co...@sdf.org writes:

> Try this:

I will! ...tomorrow, when I'm back home with my build system.  I'm at
our mountain cabin right now, with a cell phone as my modem/router.  :)

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


Re: xorg.conf is read but not acted on correctly

2016-12-10 Thread Tom Ivar Helbekkmo
co...@sdf.org writes:

> (I dunno why it would be different on NetBSD than linux...
> the xorg bits are identical).

It's probably at a lower level.  NetBSD fails to display the (text)
console correctly with the nouveau driver, so I have to wait anxiously
for the psychedelic, misaligned scan lines to eventually be replaced by
a proper xdm login screen...  :)

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


Re: xorg.conf is read but not acted on correctly

2016-12-10 Thread Tom Ivar Helbekkmo
I just tried attaching an external monitor to my old Dell Latitude E6400
laptop.  The plug is VGA style. Linux happily tells me I have two
monitors, with different resolutions, and it finds the correct maximum
resolution of the external monitor.  NetBSD (using the nouveau driver)
decides to run both in parallel, and since it can't figure out the
resolution of the external monitor, it selects 640x480 for the pair.

Is there a simple xorg.conf I can create to help resolve this?

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


Re: The NPF firewall leaks! (was Re: in_cksum: out of data)

2016-12-09 Thread Tom Ivar Helbekkmo
Mindaugas Rasiukevicius  writes:

> I agree that this is not really intuitive and the documentation did
> not clarify this either.

Yes, the documentation should be changed to state that when you
explicitly specify tcp and stateful, you get the s/safr set.  Most
importantly, the examples (npf.conf(5)  and /usr/share/examples/npf)
should be corrected so they show the safest way to set things up.

I must say, NPF is a joy to use.  Even more sysadmin-friendly than PF.

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


The NPF firewall leaks! (was Re: in_cksum: out of data)

2016-12-06 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> So far, I have just one improvement suggestion for npf: the ability to
> use sets instead of singletons in rules is great, but needs to be
> extended to letting sets of addresses and networks cross address
> families.

I now have one more.  I accidentally created a leak in my npf
configuration, partially caused by looking at the example in the man
page npf.conf(5).

I've got several VLANs, one of them connected to the outside world, and
the others to internal networks with various levels of trust.  To limit
access among them, I've configured npf to handle each VLAN by allowing
all outbound traffic, statefully, while limiting inbound traffic to the
particular connections I want to allow.

The groups typically follow this pattern:

group "vlan10" on $vlan10 {
        pass stateful out final all
        pass in final proto tcp to $somehost port $someservices
        pass in final proto udp to $somehost port $otherservices
        block return in final all
}

Can you spot the vulnerability?

Some of the attack software that probes well-known ports to look for
holes will respond to a TCP RST by sending a new TCP SYN from the very
same source port.  Guess what npf does then?  :)

Yup, the TCP RST sent by the last line of the above example gets
permitted out by the rule in the first line, updating the connection
state -- and the next connection attempt is permitted.

I had to change the above to this:

group "vlan10" on $vlan10 {
        pass stateful out final proto tcp flags S/SAFR all
        pass out final proto tcp all
        pass stateful out final all
        pass in final proto tcp to $somehost port $someservices
        pass in final proto udp to $somehost port $otherservices
        block return in final all
}
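
Why the SYN-only check matters can be shown with a toy model in Python.  This is a sketch under loud assumptions: ToyFirewall, probe_twice and the addresses are invented for illustration, state is keyed on the address/port pair alone, and none of this is how NPF really tracks connections.  Only the flag test itself (masked bits must equal exactly S) is meant literally.

```python
SYN, ACK, FIN, RST = 0x02, 0x10, 0x01, 0x04
SAFR = SYN | ACK | FIN | RST


def flags_match(pkt_flags, want=SYN, mask=SAFR):
    """'flags S/SAFR': of the masked bits, exactly the wanted ones are set."""
    return (pkt_flags & mask) == want


class ToyFirewall:
    """Grossly simplified: state is keyed on the address/port pair only."""

    def __init__(self, syn_only):
        self.syn_only = syn_only
        self.states = set()

    def outbound_tcp(self, local, remote, flags):
        # "pass stateful out": record state, optionally only for a bare SYN.
        if not self.syn_only or flags_match(flags):
            self.states.add((local, remote))
        return True  # outbound traffic is always passed

    def inbound_tcp(self, remote, local, flags):
        # Packets matching existing state bypass the rules; the toy rule set
        # has no inbound pass rule for this port, so everything else is blocked.
        return (local, remote) in self.states


def probe_twice(fw):
    attacker, victim = ("198.51.100.9", 40000), ("192.0.2.7", 23)
    first = fw.inbound_tcp(attacker, victim, SYN)   # blocked, RST goes out
    fw.outbound_tcp(victim, attacker, RST | ACK)
    second = fw.inbound_tcp(attacker, victim, SYN)  # same source port again
    return first, second
```

With syn_only=False (the first rule set) probe_twice returns (False, True): the outgoing RST created state and the retry gets in.  With syn_only=True (the second rule set) it returns (False, False).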

It's fine and all, but I tend to think that the simplistic first version
might automatically expand to the code in the second one.  In fact, the
documentation seems to agree with me:

 By default, a stateful rule implies SYN-only flag check ("flags
 S/SAFR") for the TCP packets.  It is not advisable to change this
 behavior; however, it can be overridden with the flags keyword.

The code or the documentation needs to change.  I vote for the code.  :)

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


Re: in_cksum: out of data

2016-11-25 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Thanks, Christos!  Then I'll be converting from pf to npf+altqd over the
> weekend, I suppose.  Hmpf!  It's been only, what?, a few years? since I
> converted from ipfilter to pf!  ;)

Well, that was quick and easy!

I've now gone from pf to npf+altqd, and the transition was very simple
and straightforward.  More importantly, the resulting npf.conf is so
much more readable: the grouping of rules by interfaces really helps.
Even more importantly: IPv6 now works properly, since it's not dropping
fragments, and my gateway system has stopped complaining about not being
able to send UDP packets because of a shortage of mbufs!

A big "thank you" to rmind!

Of course, I lost a bit of functionality by not having the ALTQ rules
integrated into the firewall configuration, but since I was really just
tuning my ISP uplink to stop congestion and ensure responsiveness for
interactive traffic, it wasn't so hard to write a proper altq.conf.

So far, I have just one improvement suggestion for npf: the ability to
use sets instead of singletons in rules is great, but needs to be
extended to letting sets of addresses and networks cross address
families.  I'd like to be able to do this:

$myhost = { 193.71.27.7, 2001:8c0:c904:10::7 }
$myservices = { https, smtp }
pass in proto tcp to $myhost port $myservices

Instead, I have to say:

$myhost_v4 = 193.71.27.7
$myhost_v6 = 2001:8c0:c904:10::7
$myservices = { https, smtp }
pass in proto tcp to $myhost_v4 port $myservices
pass in proto tcp to $myhost_v6 port $myservices
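
Until npf can do that itself, the split can be generated mechanically.  A minimal Python sketch, where expand_rules and its parameters are hypothetical helpers of mine and not part of npf or npfctl; it only prints configuration text in the style shown above:

```python
import ipaddress


def expand_rules(hosts, services, var="myhost"):
    """Split a mixed-family host set and emit one npf-style rule per family."""
    by_family = {4: [], 6: []}
    for host in hosts:
        by_family[ipaddress.ip_address(host).version].append(host)

    lines = []
    for version, addrs in by_family.items():
        if not addrs:
            continue
        value = addrs[0] if len(addrs) == 1 else "{ " + ", ".join(addrs) + " }"
        lines.append(f"${var}_v{version} = {value}")
    lines.append("$myservices = { " + ", ".join(services) + " }")
    for version, addrs in by_family.items():
        if addrs:
            lines.append(f"pass in proto tcp to ${var}_v{version} port $myservices")
    return lines


for line in expand_rules(["193.71.27.7", "2001:8c0:c904:10::7"], ["https", "smtp"]):
    print(line)
```

For the two addresses above this reproduces the five-line workaround exactly.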

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


Re: in_cksum: out of data

2016-11-24 Thread Tom Ivar Helbekkmo
Christos Zoulas  writes:

>>Or, to ask more specifically: what *is* the preferred/suggested software
>>firewall to use in a NetBSD system these days?
>
> npf...

Thanks, Christos!  Then I'll be converting from pf to npf+altqd over the
weekend, I suppose.  Hmpf!  It's been only, what?, a few years? since I
converted from ipfilter to pf!  ;)

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


Re: in_cksum: out of data

2016-11-24 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Are there any plans for upgrading pf to a more current version (the
> IPv6 support has been improved), or is the idea that one should
> transition to npf + altqd?

Or, to ask more specifically: what *is* the preferred/suggested software
firewall to use in a NetBSD system these days?

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


Re: in_cksum: out of data

2016-11-21 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> I'm using pf on it -- is this just a consequence of pf not supporting
> fragmented IPv6 packets?

That seems to be what it is.  Are there any plans for upgrading pf to a
more current version (the IPv6 support has been improved), or is the
idea that one should transition to npf + altqd?

-tih
-- 
I like long walks, especially when they are taken by people who annoy me.


in_cksum: out of data

2016-11-17 Thread Tom Ivar Helbekkmo
I've started using IPv6, and since then, my main NetBSD system, which is
my router connecting the ISP uplink to my various internal networks, has
been sporadically emitting the message "in_cksum: out of data".

I'm using pf on it -- is this just a consequence of pf not supporting
fragmented IPv6 packets?  Quoting from the pf.conf man page:

Currently, only IPv4 fragments are supported and IPv6 fragments are
blocked unconditionally.

-tih
-- 
Elections cannot be allowed to change anything.  --Dr. Wolfgang Schäuble


Re: dhcpcd build failure

2016-10-17 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> Martin Husemann <mar...@duskware.de> writes:
>
>> I would completely remove $OBJDIR/external/bsd/dhcpcd [...]

It turns out that's not the only one that needs to go.  It's being
separately built for a number of ramdisks, as well, and the same problem
occurs there.  So, before a "-u" build after the change:

find /usr/obj -type d -name dhcpcd | xargs rm -rf
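
For anyone who would rather see what is about to be deleted first, here is a rough Python equivalent with a dry-run default.  prune_objdirs is a hypothetical helper name, and unlike the find pipeline it does not descend into a matching directory to look for nested ones:

```python
import os
import shutil


def prune_objdirs(root, name="dhcpcd", dry_run=True):
    """Find every directory called `name` under `root`; delete unless dry_run."""
    matches = []
    for dirpath, dirnames, _ in os.walk(root):
        if name in dirnames:
            target = os.path.join(dirpath, name)
            matches.append(target)
            dirnames.remove(name)  # do not descend into a doomed directory
            if not dry_run:
                shutil.rmtree(target)
    return sorted(matches)
```

Calling prune_objdirs("/usr/obj") only lists the candidates; passing dry_run=False removes them.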

-tih
-- 
Elections cannot be allowed to change anything.  --Dr. Wolfgang Schäuble


dhcpcd build failure

2016-10-16 Thread Tom Ivar Helbekkmo
/usr/src/UPDATING says:

20161009:
a new version of dhcpcd has been imported with slightly changed
build infrastructure. When doing a build.sh -u this requires
pruning the external/bsd/dhcpcd objdir.

However, even with this pruning, the build fails:

--- dhcpcd-embedded.d ---
#create  dhcpcd/dhcpcd-embedded.d
CC=/usr/tools/bin/x86_64--netbsd-gcc /usr/tools/bin/nbmkdep -f 
dhcpcd-embedded.d.tmp  --   -std=gnu99 --sysroot=/usr/arena/amd64 
-DHAVE_CONFIG_H -D_OPENBSD_SOURCE -DSMALL -DINET -DINET6 -DDHCP6 
-I/usr/src/external/bsd/dhcpcd/dist 
-I/usr/src/distrib/amd64/ramdisks/ramdisk/obj.amd64/dhcpcd  -D_FORTIFY_SOURCE=2 
/usr/src/external/bsd/dhcpcd/dist/dhcpcd-embedded.c &&  mv 
dhcpcd-embedded.d.tmp dhcpcd-embedded.d
x86_64--netbsd-gcc: error: /usr/src/external/bsd/dhcpcd/dist/dhcpcd-embedded.c: No such file or directory
x86_64--netbsd-gcc: fatal error: no input files
compilation terminated.
nbmkdep: compile failed.
*** [dhcpcd-embedded.d] Error code 1

The file in question has, by this time, been created in the obj
directory, as it should.

-tih
-- 
Elections cannot be allowed to change anything.  --Dr. Wolfgang Schäuble


Re: dump(8) 4.3BSD syntax

2016-10-07 Thread Tom Ivar Helbekkmo
Thomas Klausner  writes:

> dump still supports, but doesn't document, the 4.3BSD style syntax.
>
> Is there someone who still finds that useful and thinks it should stay
> supported, or is it perhaps time to lose that bit of compatibility?

I hope that dump, just like ps, will keep the backward compatibility
permanently.  I for one would have to concentrate really hard to manage
to use the new syntax...

Ditching it would probably break a lot of existing scripts, too.

-tih
-- 
Elections cannot be allowed to change anything.  --Dr. Wolfgang Schäuble


Re: USB serial problems

2016-10-03 Thread Tom Ivar Helbekkmo
Tom Ivar Helbekkmo <t...@hamartun.priv.no> writes:

> It's still doing it.

But not after updating sys/dev/usb/ucom.c to revision 1.114, which was
committed by Nick Hudson today.  Previously, it would reliably hang on
my second attempt to run a test program that opened and read the USB
serial port, after aborting the program with ^C the first time.  I've
now done that about twenty times in a row, with no incident.

-tih
-- 
Elections cannot be allowed to change anything.  --Dr. Wolfgang Schäuble

