Re: Problems with dhcpcd

2023-10-19 Thread Roy Marples
  On Mon, 09 Oct 2023 12:41:30 +0100  Roy Marples  wrote --- 
 >   On Mon, 09 Oct 2023 11:33:16 +0100  Roy Marples  wrote --- 
 >  >   On Sun, 08 Oct 2023 21:58:54 +0100  Lloyd Parkes  wrote --- 
 >  >  > 
 >  >  > 
 >  >  > On 8/10/23 15:30, Lloyd Parkes wrote:
 >  >  > I found the problem. The syslog function in /libexec/dhcpcd-run-hooks 
 >  >  > tries to echo text to stdout/stderr and the shell script gets killed 
 >  >  > with SIGPIPE when it's being run in the background.
 >  >  > 
 >  >  > Commenting out the lines
 >  >  > 
 >  >  >  case "$lvl" in
 >  >  >  err|error)  echo "$interface: $*" >&2;;
 >  >  >  *)  echo "$interface: $*";;
 >  >  >  esac
 >  >  > 
 >  >  > allows the script to run correctly.
 >  >  > 
 >  >  > Adding the command 'trap "" PIPE' to /libexec/dhcpcd-run-hooks is 
 >  >  > another way that allows the script to run correctly.
 >  > 
 >  > That's interesting. So I'm looking at two bugs here then
 >  > 1) Why is SIGPIPE being raised in the first place
 >  > 2) Why is it not being captured as an error and logged by dhcpcd.
 >  > 
 >  > As best I can tell, even forcing stdout and stderr to /dev/null doesn't
 >  > help here.
 >  > What else could this be?
 > 
 > 2) is fixed by this patch.
 > Now dhcpcd correctly reports a broken pipe from running the script.
 
I've just landed dhcpcd-10.0.4 into -current and pkgsrc which fixes this issue.
Sorry for the delay.
Let me know if it works for you!

Roy


Re: Problems with dhcpcd

2023-10-09 Thread Roy Marples
  On Mon, 09 Oct 2023 11:33:16 +0100  Roy Marples  wrote --- 
 >   On Sun, 08 Oct 2023 21:58:54 +0100  Lloyd Parkes  wrote --- 
 >  > 
 >  > 
 >  > On 8/10/23 15:30, Lloyd Parkes wrote:
 >  > I found the problem. The syslog function in /libexec/dhcpcd-run-hooks 
 >  > tries to echo text to stdout/stderr and the shell script gets killed 
 >  > with SIGPIPE when it's being run in the background.
 >  > 
 >  > Commenting out the lines
 >  > 
 >  >  case "$lvl" in
 >  >  err|error)  echo "$interface: $*" >&2;;
 >  >  *)  echo "$interface: $*";;
 >  >  esac
 >  > 
 >  > allows the script to run correctly.
 >  > 
 >  > Adding the command 'trap "" PIPE' to /libexec/dhcpcd-run-hooks is 
 >  > another way that allows the script to run correctly.
 > 
 > That's interesting. So I'm looking at two bugs here then
 > 1) Why is SIGPIPE being raised in the first place
 > 2) Why is it not being captured as an error and logged by dhcpcd.
 > 
 > As best I can tell, even forcing stdout and stderr to /dev/null doesn't help 
 > here.
 > What else could this be?

2) is fixed by this patch.
Now dhcpcd correctly reports a broken pipe from running the script.

https://github.com/NetworkConfiguration/dhcpcd/commit/617a3ae207898a968bccd1e40a299fbfa6a4cc52
diff --git a/src/script.c b/src/script.c
index 2ef99e38..69297a46 100644
--- a/src/script.c
+++ b/src/script.c
@@ -681,6 +681,21 @@ send_interface(struct fd_list *fd, const struct interface *ifp, int af)
     return retval;
 }
 
+static int
+script_status(const char *script, int status)
+{
+
+    if (WIFEXITED(status)) {
+        if (WEXITSTATUS(status))
+            logerrx("%s: %s: WEXITSTATUS %d",
+                __func__, script, WEXITSTATUS(status));
+    } else if (WIFSIGNALED(status))
+        logerrx("%s: %s: %s",
+            __func__, script, strsignal(WTERMSIG(status)));
+
+    return WEXITSTATUS(status);
+}
+
 static int
 script_run(struct dhcpcd_ctx *ctx, char **argv)
 {
@@ -699,13 +714,7 @@ script_run(struct dhcpcd_ctx *ctx, char **argv)
                 break;
             }
         }
-        if (WIFEXITED(status)) {
-            if (WEXITSTATUS(status))
-                logerrx("%s: %s: WEXITSTATUS %d",
-                    __func__, argv[0], WEXITSTATUS(status));
-        } else if (WIFSIGNALED(status))
-            logerrx("%s: %s: %s",
-                __func__, argv[0], strsignal(WTERMSIG(status)));
+        status = script_status(argv[0], status);
     }
 
     return WEXITSTATUS(status);
@@ -763,9 +772,13 @@ script_runreason(const struct interface *ifp, const char *reason)
 
 #ifdef PRIVSEP
     if (ctx->options & DHCPCD_PRIVSEP) {
-        if (ps_root_script(ctx,
-            ctx->script_buf, (size_t)buflen) == -1)
+        ssize_t err;
+
+        err = ps_root_script(ctx, ctx->script_buf, (size_t)buflen);
+        if (err == -1)
             logerr(__func__);
+        else
+            script_status(ctx->script, (int)err);
         goto send_listeners;
     }
 #endif
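
For readers less familiar with wait statuses, the kind of reporting that
script_status() does in the patch can be sketched in isolation. This is only an
illustration, not dhcpcd code: describe_status() and its message strings are
hypothetical, and it formats into a caller-supplied buffer instead of calling
dhcpcd's logerrx().

```c
#define _POSIX_C_SOURCE 200809L	/* for strsignal() */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: describe a status as returned by waitpid(2).
 * Returns the exit code if the child exited, or -1 otherwise.
 */
int
describe_status(int status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);

	if (WIFEXITED(status)) {
		/* Normal termination: the exit code is meaningful. */
		snprintf(buf, len, "exited with status %d",
		    WEXITSTATUS(status));
		return WEXITSTATUS(status);
	}
	if (WIFSIGNALED(status)) {
		/* Killed by a signal, e.g. SIGPIPE: name the signal. */
		snprintf(buf, len, "killed by signal: %s",
		    strsignal(WTERMSIG(status)));
		return -1;
	}
	snprintf(buf, len, "unrecognised status %d", status);
	return -1;
}
```

A script killed by SIGPIPE takes the WIFSIGNALED branch, which is the case the
patch makes visible in the log.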



Re: Problems with dhcpcd

2023-10-09 Thread Roy Marples
  On Sun, 08 Oct 2023 21:58:54 +0100  Lloyd Parkes  wrote --- 
 > 
 > 
 > On 8/10/23 15:30, Lloyd Parkes wrote:
 > I found the problem. The syslog function in /libexec/dhcpcd-run-hooks 
 > tries to echo text to stdout/stderr and the shell script gets killed 
 > with SIGPIPE when it's being run in the background.
 > 
 > Commenting out the lines
 > 
 >  case "$lvl" in
 >  err|error)  echo "$interface: $*" >&2;;
 >  *)  echo "$interface: $*";;
 >  esac
 > 
 > allows the script to run correctly.
 > 
 > Adding the command 'trap "" PIPE' to /libexec/dhcpcd-run-hooks is 
 > another way that allows the script to run correctly.

That's interesting. So I'm looking at two bugs here then
1) Why is SIGPIPE being raised in the first place
2) Why is it not being captured as an error and logged by dhcpcd.

As best I can tell, even forcing stdout and stderr to /dev/null doesn't help 
here.
What else could this be?
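
The general mechanism behind Lloyd's observation can be reproduced outside
dhcpcd. The sketch below is in C only to keep it self-contained (the hook script
itself is shell), and it does not explain where the broken pipe in dhcpcd comes
from. A child writes to a pipe that has no reader: with the default disposition
it is killed by SIGPIPE, while with SIGPIPE ignored (which is what
'trap "" PIPE' does in sh) the write fails with EPIPE and the child carries on.
write_to_closed_pipe() and the exit code 42 are made up for the example.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that writes to a pipe with no reader
 * and return the child's raw wait status, or -1 on setup failure.
 */
int
write_to_closed_pipe(int ignore_sigpipe)
{
	int fds[2], status = 0;
	pid_t pid;

	if (pipe(fds) == -1)
		return -1;
	close(fds[0]);		/* nobody will ever read from the pipe */

	pid = fork();
	if (pid == -1) {
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		signal(SIGPIPE, ignore_sigpipe ? SIG_IGN : SIG_DFL);
		/* With SIG_DFL the child dies here; with SIG_IGN we get EPIPE. */
		if (write(fds[1], "x", 1) == -1 && errno == EPIPE)
			_exit(42);
		_exit(0);
	}

	close(fds[1]);
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	/* Without WUNTRACED the child has either exited or been killed. */
	assert(WIFEXITED(status) || WIFSIGNALED(status));
	return status;
}
```

Calling it with 0 should yield a status for which WIFSIGNALED is true and
WTERMSIG is SIGPIPE; calling it with 1 should yield a normal exit with code 42.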

Roy


Re: Problems with dhcpcd

2023-10-06 Thread Roy Marples
> I've installed 10.99.9 from about a day ago onto an old Raspberry Pi and I 
> just can't get it to correctly set its hostname from DHCP. (I have removed 
> the hostname=rpi from /etc/rc.conf).
> What I have discovered so far is that if I manually run "dhcpcd -d" then no 
> hostname gets set. If I run "dhcpcd -d -B" then the hostname does get set. 
> This doesn't make sense.

You're correct, this does not make sense.

> Here are the logs from the failed run (console and /var/log/messages). Dhcpcd 
> doesn't seem to be running the hooks for "CARRIER", which is something 
> that does happen with "dhcpcd -d -B". Interestingly, the message "executing: 
> /libexec/dhcpcd-run-hooks ..." is not logged to syslog by either invocation 
> of dhcpcd.

syslog.conf doesn't log debug messages to /var/log/messages by default - you 
need to enable that.
An alternative is to put `logfile /var/log/dhcpcd.log` into /etc/dhcpcd.conf 
and look there.

> Sep 30 22:15:40  dhcpcd[331]: usmsc0: rebinding lease of 10.0.1.53
> Sep 30 22:15:46  dhcpcd[331]: usmsc0: leased 10.0.1.53 for 86400 seconds
> Sep 30 22:15:52  dhcpcd[331]: usmsc0: adding route to 10.0.1.0/24
> Sep 30 22:15:52  dhcpcd[331]: usmsc0: adding default route via 10.0.1.1

So it took 12 seconds to complete the DHCP transaction and validate the 
addresses are good before applying the DHCP lease.
Without -B, dhcpcd will fork to the background right away so any assignments 
from the DHCP lease won't apply right away.

Is this what you are seeing? Is the hostname even there? You can examine the 
contents of your leases with `dhcpcd -U`.

I have only just imported dhcpcd-10.0.3 to -current.
Unlikely to address this exact issue (if there is one yet), but you never know.

Roy Marples



Re: zfs howto

2021-02-14 Thread Roy Marples

On 14/02/2021 09:35, J. Hannken-Illjes wrote:

The trigger is '-maproot' with group(s), first bug is mountd leaving
'cr_gid' as -2 and setting the first group list member to 10 in this case.

Second bug is ZFS setting illegal group id -2 aka 4294967294 to GID_NOBODY
with id -2.  Later this illegal id leads to null pointer dereference
in zfs_log_create() at zfs_log.c:297 "lr->lr_gid = fuidp->z_fuid_group"
where fuidp is NULL.

With the attached diff the ZFS bug gets fixed and your export works.
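
The "-2 aka 4294967294" above is simply -2 reinterpreted as an unsigned 32-bit
value. A tiny sketch, assuming a 32-bit unsigned group id type (uint32_t stands
in for gid_t here, and example_gid_from_int() is a hypothetical name):

```c
#include <assert.h>
#include <stdint.h>

/* Checked at compile time: -2 wraps to 2^32 - 2. */
static_assert((uint32_t)-2 == 4294967294u, "-2 wraps to 2^32 - 2");

/* Hypothetical helper: convert a signed id the way an unsigned gid would. */
uint32_t
example_gid_from_int(int id)
{
	return (uint32_t)id;
}
```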


That fixes my full root export to the ERLite as well - thanks!
I don't have any group or mapping options, so I guess the hardcoded defaults 
failed.

Could we get ZFS not to actually panic in this instance?
I feel somewhat uncomfortable with ZFS hardcoding these values from user 
editable configs as well but don't have any good ideas for that.


Roy


Re: zfs howto

2021-02-12 Thread Roy Marples

On 12/02/2021 14:44, Greg Troxel wrote:


Long ago I rototilled the zfs howto, adding far more questions than
answers.  I just did another rototill pass.

   https://wiki.netbsd.org/zfs/

While many \todos remain, the biggest questions I have are about NFS:

   If I want to export a zfs filesystem over NFS, what specifically do I
   need to do.

   Does the crash bug referenced in the NFS section still exist in
   current current?  (It's still open)
   http://gnats.netbsd.org/55042


It crashes when my ERLite tries to mount / NFS at the checking root phase.

Roy


Re: Automated report: NetBSD-current/i386 build failure

2021-02-03 Thread Roy Marples

On 03/02/2021 17:55, Ryo ONODERA wrote:

Exactly. It happens in dtrace userland build.


Fixed. Sorry about that.

Maybe we should not define CTASSERT ourselves and just use __CTASSERT to avoid 
this in the future?
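
One way to picture the suggestion: a header that wants a compile-time check
avoids defining the generic name CTASSERT at all, so it cannot collide with
another header's definition. Since __CTASSERT is NetBSD-specific, the sketch
below uses C11 static_assert behind a uniquely named macro instead.
EXAMPLE_CTASSERT, struct example_in_addr and example_in_addr_size() are all
hypothetical names, not anything from the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Uniquely named wrapper, so it cannot clash with someone else's CTASSERT. */
#define EXAMPLE_CTASSERT(expr) static_assert(expr, #expr)

/* Stand-in for struct in_addr, kept local to the example. */
struct example_in_addr {
	uint32_t addr;
};

/* Fails the build, not the program, if the layout ever changes. */
EXAMPLE_CTASSERT(sizeof(struct example_in_addr) == 4);

size_t
example_in_addr_size(void)
{
	return sizeof(struct example_in_addr);
}
```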


Roy


Re: Automated report: NetBSD-current/i386 build failure

2021-02-03 Thread Roy Marples

On 03/02/2021 14:42, Ryo ONODERA wrote:

Hi,

It seems that CTASSERT in netinet/in.h conflicts with
CTASSERT in external/cddl/osnet/dist/uts/common/sys/debug.h.

Ryo ONODERA  writes:


Hi,

However I have gotten another failure:

--- dt_print.pico ---
In file included from /usr/src/external/cddl/osnet/sys/sys/debug.h:51,
                 from /usr/src/external/cddl/osnet/sys/sys/uio.h:64,
                 from /usr/world/9.99/amd64/dest/usr/include/sys/socket.h:99,
                 from /usr/src/external/cddl/osnet/lib/libdtrace/../../dist/lib/libdtrace/common/dt_print.c:76:
/usr/world/9.99/amd64/dest/usr/include/netinet/in.h:162:1: error: macro "__CTASSERT" passed 2 arguments, but takes just 1
  162 | CTASSERT(sizeof(struct in_addr) == 4);
      | ^~~~


I cannot replicate this?
I'm just building a stock kernel - what extra options do I need?

Roy


Re: Help with libcurses and lynx under NetBSD-9 and -current?

2021-02-02 Thread Roy Marples

On 02/02/2021 09:44, Brett Lymn wrote:

Why don't you post your $TERMCAP and infocmp output here?



Umm I don't have a problem with using terminfo.  I am more interested in
working out why lynx is misbehaving in window.  I suspect that is
something I did wrong when I fixed another PR to do with the input
routines not preserving the cursor location.


That was mainly for Brian in case there is something we can spot that's wrong 
with his $TERMCAP string.


Roy


Re: Help with libcurses and lynx under NetBSD-9 and -current?

2021-02-01 Thread Roy Marples

On 01/02/2021 09:53, Brett Lymn wrote:

The TERMCAP variable has some severe limitations, the worst being that it can
only be 256 bytes in size, which was more than adequate for a vt100
definition, but a modern colour xterm definition simply won't fit in that
space. terminfo does not have these limitations.


Are you sure about that?
I don't think libterminfo imposes any length on $TERMCAP other than those 
translating to terminfo.


Not ruling out any errors with the conversion though.

You can verify $TERMCAP using infocmp.

$ echo $TERMCAP
dw|vt52|DEC vt52: :cr=^M:do=^J:nl=^J:bl=^G: :le=^H:bs:cd=\EJ:ce=\EK:cl=\EH\EJ: 
:cm=\EY%+ %+ :co#80:li#24: :nd=\EC:ta=^I:pt:sr=\EI:up=\EA: 
:ku=\EA:kd=\EB:kr=\EC:kl=\ED:kb=^H:

$ infocmp dw
# Reconstructed from $TERMCAP
dw|vt52|DEC vt52,
cols#80, lines#24,
bel=^G, clear=\EH\EJ, cr=^M, cub1=^H, cud1=^J, cuf1=\EC,
cup=\EY%p1%{32}%+%c%p2%{32}%+%c, cuu1=\EA, ed=\EJ, el=\EK, ht=^I,
ind=^J, kbs=^H, kcub1=\ED, kcud1=\EB, kcuf1=\EC, kcuu1=\EA, nel=^M^J,
ri=\EI,
$

Why don't you post your $TERMCAP and infocmp output here?

Roy


Re: Routing socket issue?

2021-01-31 Thread Roy Marples

Hi Frank :)

On 31/01/2021 07:58, Frank Kardel wrote:

For example I fail to see how RTM_LOSING helps that because it won't change
how ntpd would configure itself.

Well if I read the comment right I am inclined to differ here:
In in_pcb.c we find:
/*
  * Check for alternatives when higher level complains
  * about service problems.  For now, invalidate cached
  * routing information.  If the route was created dynamically
  * (by a redirect), time to try a default gateway again.
  */
in_losing(struct inpcb *inp)

and the call is in tcp_timer.c:
     /*
  * If losing, let the lower level know and try for
  * a better route.  Also, if we backed off this far,
  * our srtt estimate is probably bogus.  Clobber it
  * so we'll take the next rtt measurement as our srtt;
  * move the current srtt into rttvar to keep the current
  * retransmit times until then.
  */

As ntpd acts after a grace period the routing engine may have corrected this 
situation and routing may indeed change.
ntpd's interactions with peers can take up to 1024s so it is good to attempt in 
a best effort way to keep the internal local address/socket state close to the 
current state.
It is likely though that there have been routing messages like 
RTM_CHANGE/ADD/DELETE before that and RTM_LOSING is not providing additional 
information at that point.


Right, RTM_LOSING is just informational.
If any routing does change then we get RTM_CHANGE/ADD/DELETE etc.





As NTP doesn't bring interfaces up or down, RTM_IFANNOUNCE is useless as well.
If the interface does vanish, any addresses on it will be reported via 
RTM_DELADDR.
RTM_IFINFO is also questionable as commentary in the code is that it only 
cares about addresses.



Well I read
ntp_io.c
     /*
  * we are keen on new and deleted addresses and
  * if an interface goes up and down or routing
  * changes
  */
not as being interested in addresses only.

Also keep in mind that at this point routing messages are processed in a loop 
and the action here

     timer_interfacetimeout(current_time + UPDATE_GRACE);

just sets the variable for the next interface+local address update run. This is 
very cheap. The grace period will batch multiple routing messages together. An 
explicit routing message flush is, from my point of view, code clutter here, as 
the socket is effectively drained in the loop at the cost of examining the 
msg_type and setting a variable. Not much gained here.


OK, we'll keep RTM_IFINFO but drop RTM_IFANNOUNCE.
The point is trying to eliminate the overflow message entirely.

I mean, if you want to argue against any of that then I would suggest why even 
bother filtering or looking at overflow at all?
Shrink the code - any activity on the routing socket, drain it ignoring all 
errors, start the interface update timer.

That would be an option but we should react only on known events. There may be 
one or two events that could be removed from the list after examination as other 
messages can cover for them. Keep in mind that this is a portable code section 
and the code tries to be on the fail safe, robust side for the goal of 
address/routing tracking, so adjusting it to a particular implementation may 
break on other OS implementations.


Well, Dragonfly (prior to my patches there) and by extension FreeBSD (not 
checked to see if that changed) both emit RTM_DELADDR before RTM_IFANNOUNCE (i.e. 
the wrong order) so when they do overflow you never see RTM_IFANNOUNCE to say the 
interface vanished. Hence there is zero point in listening for it for ntp.






As for the message: IMHO it does not need to be logged at all (DPRINTF/maybe 
LOGDEBUG at most) because the overflow should and does just trigger ntpd to 
reevaluate the interface/routing configuration.


This information is not important at all for normal operation as the effects 
are correctly mitigated.


I changed it to LOG_DEBUG as well as removing RTM_LOSING and RTM_IFANNOUNCE as 
discussed above.




Great.

BTW: does the current code revert to (fail safe) periodic interface scanning 
if the routing socket is being disabled (happens when an unexpected error 
code is returned from read(2))?


No.

The socket is non blocking so the only error to ignore here would be EINTR.
Any other errors are due to bad programming IMO.

Could be bad programming, but I prefer ntpd being forgiving against hiccups 
by reverting to periodic scanning when we disable the routing socket. That is a 
fail safe strategy and would also warrant a log message as it is an unusual event.


EINTR is now ignored.
I'll find time to restore periodic scanning later.
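
The read-side handling discussed in this thread (retry on EINTR, treat "No
buffer space available" as an overflow that should trigger an interface update)
can be sketched as below. This is an illustration only, not the ntpd code:
drain_fd(), set_nonblocking() and the drain_result values are hypothetical
names, and it works on any non-blocking descriptor so it can be tried without a
routing socket.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

enum drain_result { DRAIN_OK, DRAIN_OVERFLOW, DRAIN_ERROR };

/* Put fd into non-blocking mode; returns 0 on success, -1 on error. */
int
set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1 ? -1 : 0;
}

/*
 * Read until the descriptor has nothing more to offer.
 * *nreads is set to the number of successful reads.
 */
enum drain_result
drain_fd(int fd, size_t *nreads)
{
	char buf[2048];
	ssize_t n;

	assert(nreads != NULL);
	*nreads = 0;
	for (;;) {
		n = read(fd, buf, sizeof(buf));
		if (n > 0) {
			(*nreads)++;	/* a real caller would inspect the message */
			continue;
		}
		if (n == 0)
			return DRAIN_OK;	/* end of file */
		if (errno == EINTR)
			continue;		/* interrupted: just retry */
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return DRAIN_OK;	/* drained */
		if (errno == ENOBUFS)
			return DRAIN_OVERFLOW;	/* messages lost: rescan interfaces */
		return DRAIN_ERROR;		/* candidate for periodic scanning */
	}
}
```

In this sketch a caller would start the interface update timer on either
DRAIN_OK with at least one read or DRAIN_OVERFLOW, and could fall back to
periodic scanning on DRAIN_ERROR.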

Roy


Re: Routing socket issue?

2021-01-30 Thread Roy Marples

On 30/01/2021 18:27, Paul Goyette wrote:

On Sat, 30 Jan 2021, Roy Marples wrote:


On 30/01/2021 15:12, Paul Goyette wrote:

I thought we took care of the buffer-space issue a long time ago, but
today I've gotten about a dozen of these:

...
Jan 30 05:20:11 speedy ntpd[3146]: routing socket reports: No buffer
space available


I recently added a patch to enable the diagnostic AND take action on it.
We can change the upstream default from LOG_ERR to LOG_DEBUG or maybe their 
custom DPRINTF though if you think that would help reduce the noise.


Not concerned about noise, just wanted to make sure we didn't have a
regression slip by.  As long as the message is deliberate, I'm not too
worried.


Just to be clear on this, we have the framework to filter out routing messages 
we don't need, to stop overflow from happening, and we can also detect when 
overflow still happens.
ntpd now does both; before, it only filtered. I didn't change what it was 
interested in, though, and now I'm curious why it needs to be interested in 
actual routing changes for interface/address discovery, as I'm pretty sure we can 
drop that.


As we enable this in more applications we just have to make some choices - 
filter more out vs increasing buffer size vs just discarding the error if the 
prior two are not feasible.


Roy


Re: Routing socket issue?

2021-01-30 Thread Roy Marples

On 30/01/2021 18:27, Paul Goyette wrote:

On Sat, 30 Jan 2021, Roy Marples wrote:


On 30/01/2021 15:12, Paul Goyette wrote:

I thought we took care of the buffer-space issue a long time ago, but
today I've gotten about a dozen of these:

...
Jan 30 05:20:11 speedy ntpd[3146]: routing socket reports: No buffer
space available


I recently added a patch to enable the diagnostic AND take action on it.
We can change the upstream default from LOG_ERR to LOG_DEBUG or maybe their 
custom DPRINTF though if you think that would help reduce the noise.


Not concerned about noise, just wanted to make sure we didn't have a
regression slip by.  As long as the message is deliberate, I'm not too
worried.


Well, currently other apps such as dhcpcd still log an error when the routing 
socket overflows, but with a more helpful message.


I think we can just change it to:
   routing socket overflowed - will update interfaces

Happy with that?

To alleviate the issue we could also stop ntpd from listening to routing changes, 
as that has no bearing on how it discovers interfaces and addresses as far as I 
can tell.

Frank ok with that?

Roy


Re: Routing socket issue?

2021-01-30 Thread Roy Marples

On 30/01/2021 15:12, Paul Goyette wrote:

I thought we took care of the buffer-space issue a long time ago, but
today I've gotten about a dozen of these:

...
Jan 30 05:20:11 speedy ntpd[3146]: routing socket reports: No buffer
space available


I recently added a patch to enable the diagnostic AND take action on it.
We can change the upstream default from LOG_ERR to LOG_DEBUG or maybe their 
custom DPRINTF though if you think that would help reduce the noise.


Roy


Re: Help with libcurses and lynx under NetBSD-9 and -current?

2021-01-27 Thread Roy Marples

On 27/01/2021 17:52, Christos Zoulas wrote:

In article ,
RVP   wrote:

This might be due to the fact that window(1) relies on setting a
custom TERMCAP environment variable to inform programs running
under it of the term. capabilities it supports, and the curses
library no longer makes use of that.

With ncurses, building it with the `--enable-termcap' option
makes it use the TERMCAP variable if it is set in the environment.

The ncurses(w) in pkgsrc is not built with that option, so, I
compiled the latest ncurses from source with that option added
and lynx -show_cursor worked just fine under window(1).

-RVP


I think we can make our libterminfo do the same by shuffling a few ifdefs
around :-)


No need for that.

TERMINFO_COMPILE is defined unless built with SMALLPROG.

So $TERMCAP from the environment is respected by default after installation. 
See terminfo(5) for more details, as $TERMINFO will take precedence if it is 
also set.


Roy


Re: Using wg(4) with a commerical VPN provider

2020-11-11 Thread Roy Marples

On 11/11/2020 01:49, Brad Spencer wrote:

@@ -2352,6 +2361,7 @@
if (*af == AF_INET) {
packet_len = ntohs(ip->ip_len);
} else {
+#ifdef INET6
const struct ip6_hdr *ip6;
  
  		if (__predict_false(decrypted_len < sizeof(struct ip6_hdr)))



Might be better to roll it into a case statement.
Could wg one day work with a third address family?
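
To illustrate the shape being suggested, here is a userland sketch, not the
wg(4) code: a switch on the address family with one case per family, so that a
further family would only need another case. example_packet_len() is a
hypothetical function that reads the length fields straight from the raw
header bytes; in the kernel the AF_INET6 case would sit inside #ifdef INET6.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>

/*
 * Hypothetical helper: total packet length taken from an IPv4 or IPv6
 * header, or -1 if the family is unknown or the buffer is too short.
 */
long
example_packet_len(int af, const uint8_t *pkt, size_t len)
{
	assert(pkt != NULL);

	switch (af) {
	case AF_INET:
		if (len < 20)		/* minimum IPv4 header */
			return -1;
		/* Bytes 2-3: total length, network byte order. */
		return (long)((pkt[2] << 8) | pkt[3]);
	case AF_INET6:
		if (len < 40)		/* fixed IPv6 header */
			return -1;
		/* Bytes 4-5: payload length, excluding the 40 byte header. */
		return 40 + (long)((pkt[4] << 8) | pkt[5]);
	default:
		return -1;		/* unsupported address family */
	}
}
```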

Roy


Re: Automated report: NetBSD-current/i386 test failure (l2tp)

2020-10-25 Thread Roy Marples

On 23/10/2020 08:25, Andreas Gustafsson wrote:

Roy Marples wrote:

This is rump crashing and I don't know why.


If the rump kernel crashes in the test, that likely means the real
kernel will crash in actual use.


I can't get a backtrace to tell me where the problem is.


I managed to get one this way:

   sysctl -w kern.defcorename="/tmp/%n.core"
   cd /usr/tests/net/if_l2tp
   ./t_l2tp l2tp_basic_ipv4overipv4
   gdb rump_server /tmp/rump_server.core

It looks like this:


Thanks for that, it should now be fixed.

Roy


Re: Automated report: NetBSD-current/i386 test failure (l2tp)

2020-10-22 Thread Roy Marples

Hi Andreas

On 22/10/2020 09:00, Andreas Gustafsson wrote:

Hi Roy,

On Oct 16, the NetBSD Test Fixture wrote:

The newly failing test cases are:

 net/if_l2tp/t_l2tp:l2tp_basic_ipv4overipv4
 net/if_l2tp/t_l2tp:l2tp_basic_ipv4overipv6
 net/if_l2tp/t_l2tp:l2tp_basic_ipv6overipv4
 net/if_l2tp/t_l2tp:l2tp_basic_ipv6overipv6
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_ah_hmacsha512
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_ah_null
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_esp_null
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_transport_esp_rijndaelcbc
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_ah_hmacsha512
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_ah_null
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_esp_null
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv4_tunnel_esp_rijndaelcbc
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_ah_hmacsha512
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_ah_null
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_esp_null
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_transport_esp_rijndaelcbc
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_ah_hmacsha512
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_ah_null
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_esp_null
 net/ipsec/t_ipsec_l2tp:ipsec_l2tp_ipv6_tunnel_esp_rijndaelcbc


These are still failing as of 2020.10.21.15.12.15, and the commit that
triggered the failures has now been identified:

   2020.10.15.02.54.10 roy src/sys/net/if_l2tp.c 1.44

For logs, see

   
http://www.gson.org/netbsd/bugs/build/amd64/commits-2020.10.html#2020.10.15.02.54.10


This is rump crashing and I don't know why.
I can't get a backtrace to tell me where the problem is.

Roy


Re: Automated report: NetBSD-current/i386 test failure

2020-10-16 Thread Roy Marples

On 16/10/2020 15:54, NetBSD Test Fixture wrote:

This is an automatically generated notice of new failures of the
NetBSD test suite.

The newly failing test cases are:

 net/if_wg/t_basic:wg_basic_ipv6_over_ipv4
 net/if_wg/t_basic:wg_basic_ipv6_over_ipv6
 net/if_wg/t_basic:wg_payload_sizes_ipv6_over_ipv4
 net/if_wg/t_basic:wg_payload_sizes_ipv6_over_ipv6


These should now be fixed.

Roy


Re: Automated report: NetBSD-current/i386 test failure

2020-10-14 Thread Roy Marples

On 14/10/2020 07:15, Andreas Gustafsson wrote:

On Oct 8, the NetBSD Test Fixture wrote:

The newly failing test cases are:

 net/carp/t_basic:carp_handover_ipv4_halt_carpdevip
 net/carp/t_basic:carp_handover_ipv4_halt_nocarpdevip
 net/carp/t_basic:carp_handover_ipv4_ifdown_carpdevip
 net/carp/t_basic:carp_handover_ipv4_ifdown_nocarpdevip
 net/carp/t_basic:carp_handover_ipv6_halt_carpdevip
 net/carp/t_basic:carp_handover_ipv6_ifdown_carpdevip


These were fixed on Oct 8, but then broken again on Oct 12:

   
http://releng.netbsd.org/b5reports/i386/commits-2020.10.html#2020.10.12.11.07.27


Fixed here:
https://mail-index.netbsd.org/source-changes/2020/10/14/msg122921.html

Note, if `ident /sbin/ifconfig | grep ifconfig.c` shows revision r1.243 - r1.247 
then ifconfig will likely crash with this change on carp interfaces.

This has been resolved in r1.248

Roy


Re: gdb - undefined reference to `std::__1::codecvt::id'

2020-09-29 Thread Roy Marples

On 29/09/2020 20:26, Christos Zoulas wrote:

Or use gcc instead of clang :-)


Ew


Re: gdb - undefined reference to `std::__1::codecvt::id'

2020-09-29 Thread Roy Marples

On 29/09/2020 17:13, Kamil Rytarowski wrote:

The basesystem libc++ is too old for C++ applications like GDB.


I find that dubious as we have the new gdb building fine on amd64 and i386 with 
gnu compiler according to our test runs.

Unless the machine has a local override.

This is clang compiler.


A workaround is to force old GDB.


I've just disabled building GDB for the time being.

Roy


gdb - undefined reference to `std::__1::codecvt::id'

2020-09-29 Thread Roy Marples

#  link  gdb/gdb
/usr/tools/bin/x86_64--netbsd-clang++ --sysroot=/ -Wl,--warn-shared-textrel 
-Wl,-z,relro   -pie  -o gdb  gdb.o  -Wl,-rpath-link,/lib  -L=/lib 
-L/home/roy/src/hg/src/external/gpl3/gdb/lib/libgdb/obj.amd64 -lgdb 
-L/home/roy/src/hg/src/external/gpl3/gdb/lib/libopcodes/obj.amd64 -lopcodes 
-L/home/roy/src/hg/src/external/gpl3/gdb/lib/libbfd/obj.amd64 -lbfd 
-L/home/roy/src/hg/src/external/gpl3/gdb/lib/libdecnumber/obj.amd64 -ldecnumber 
-L/home/roy/src/hg/src/external/gpl3/gdb/lib/libgdbsupport/obj.amd64 
-lgdbsupport  -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libctf/obj.amd64 
-lctf  -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libgnulib/obj.amd64 -lgnulib 
 -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libreadline/obj.amd64 -lreadline 
-lterminfo  -L/home/roy/src/hg/src/external/gpl3/gdb/lib/libiberty/obj.amd64 
-liberty -lexpat -llzma -lz -lcurses -lintl -lm -lkvm -lutil
/usr/tools/bin/x86_64--netbsd-ld: 
/home/roy/src/hg/src/external/gpl3/gdb/lib/libgdb/obj.amd64/libgdb.a(string_view-selftests.o): 
in function `std::__1::basic_filebuf<char, std::__1::char_traits<char> >::basic_filebuf()':
string_view-selftests.c:(.text._ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC2Ev[_ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC2Ev]+0x94): 
undefined reference to `std::__1::codecvt::id'
/usr/tools/bin/x86_64--netbsd-ld: 
string_view-selftests.c:(.text._ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC2Ev[_ZNSt3__113basic_filebufIcNS_11char_traitsIcEEEC2Ev]+0xc4): 
undefined reference to `std::__1::codecvt::id'
/usr/tools/bin/x86_64--netbsd-ld: 
/home/roy/src/hg/src/external/gpl3/gdb/lib/libgdb/obj.amd64/libgdb.a(string_view-selftests.o): 
in function `std::__1::basic_filebuf<char, std::__1::char_traits<char> >::imbue(std::__1::locale const&)':
string_view-selftests.c:(.text._ZNSt3__113basic_filebufIcNS_11char_traitsIcEEE5imbueERKNS_6localeE[_ZNSt3__113basic_filebufIcNS_11char_traitsIcEEE5imbueERKNS_6localeE]+0x13): 
undefined reference to `std::__1::codecvt::id'
x86_64--netbsd-clang: error: linker command failed with exit code 1 (use -v to 
see invocation)

*** Error code 1

What went wrong?
My very limited knowledge of C++ and google-fu says codecvt should be part of 
libc++?


Roy


Re: Automated report: NetBSD-current/i386 test failure

2020-09-23 Thread Roy Marples

On 23/09/2020 11:42, NetBSD Test Fixture wrote:

This is an automatically generated notice of a new failure of the
NetBSD test suite.

The newly failing test case is:

 net/if/t_ifconfig:ifconfig_options

The above test failed in each of the last 4 test runs, and passed in
at least 26 consecutive runs before that.

The following commits were made between the last successful test and
the failed test:

 2020.09.23.02.09.18 roy src/sbin/ifconfig/ifconfig.8,v 1.120
 2020.09.23.02.09.18 roy src/sbin/ifconfig/ifconfig.c,v 1.244
 2020.09.23.02.32.04 roy src/usr.sbin/ifwatchd/ifwatchd.c,v 1.44

Logs can be found at:

 
http://releng.NetBSD.org/b5reports/i386/commits-2020.09.html#2020.09.23.02.32.04



Fixed.

Roy


Re: Automated report: NetBSD-current/i386 test failure

2020-09-19 Thread Roy Marples




On 20/09/2020 04:40, Robert Elz wrote:

 Date:Sun, 20 Sep 2020 04:02:45 +0100
 From:Roy Marples 
 Message-ID:  <51d2f8dc-d059-5eae-9899-5c91539d1...@marples.name>

   | The test case just needed fixing.

That is not uncommon after changes elsewhere.

   | The ping to an invalid address caused the ARP entry to enter INCOMPLETE ->
   | WAITDELETE state and this hung over into the next test, causing this entry
   | to take too long to validly resolve.

Why?   If a failed ARP (or ND) causes problems for a later request
(incl of the same addr) which should work (that is, any problems at all,
including delays) then I'd consider the implementation broken (not the test).


RFC 7048 specifies that repeated failures back off exponentially.
Because the server is not reset, the backoff may bleed into subsequent tests for 
the same address, which is why this test was sometimes failing.
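
As a rough illustration of why a failed resolution can delay a later one, here
is a sketch of a capped exponential backoff. It is not the NetBSD
implementation: example_backoff_ms() is a hypothetical function, and the base,
multiplier and cap mentioned in the note after the code are values picked for
the example.

```c
#include <assert.h>

/*
 * Hypothetical helper: delay before retransmission number `attempt`
 * (0 = first), growing by `multiple` each time and capped at max_ms.
 */
unsigned int
example_backoff_ms(unsigned int base_ms, unsigned int multiple,
    unsigned int attempt, unsigned int max_ms)
{
	unsigned int t = base_ms;

	assert(multiple >= 1);
	if (t >= max_ms)
		return max_ms;
	while (attempt-- > 0) {
		/* Stop before the multiplication could exceed the cap. */
		if (t > max_ms / multiple)
			return max_ms;
		t *= multiple;
	}
	return t;
}
```

With a 1000 ms base, a multiplier of 3 and a 60000 ms cap, attempts 0 to 4 give
1000, 3000, 9000, 27000 and 60000 ms. A test that runs while an earlier failure
for the same address is already a few steps in would see the longer delays.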




   | The solution is after a deliberate fail

And if it wasn't a deliberate fail?  Perhaps being just a fraction of a
second too quick, and attempting a ping (or ssh, or something) just before
the destination becomes reachable (either because it was down, unconfigured,
or the net link between them wasn't functional), and


ATF timings on an emulated environment cannot be that precise.
See PR 43997 for more details.



   | to remove the ARP entry for the address

if the user doing this isn't root, and cannot just remove ARP entries?

Maybe I'm misunderstanding the actual scenario, but it seems to me
that things aren't working as well now as they were before (the timing
in the qemu tests hasn't changed recently - not since the nvmm version
started being used - but before the arp implementation change, it used
to work reliably).


By reliably you mean that a successful ARP resolution lasts for 20 minutes, which 
we don't have any tests for?
If anything the tests we have are more reliable than before as I have not 
adjusted any timings.


Roy


Re: Automated report: NetBSD-current/i386 test failure

2020-09-19 Thread Roy Marples

On 13/09/2020 23:10, Robert Elz wrote:

 Date:Sun, 13 Sep 2020 22:14:00 +0100
 From:Roy Marples 
 Message-ID:  


   | >| >  net/arp/t_arp:arp_proxy_arp_pub
   | >| >  net/arp/t_arp:arp_proxy_arp_pubproxy
   | >
   | > Those two are still failing.
   |
   | Works fine on my box.
   | Can you say how they are failing?

See:

http://releng.netbsd.org/b5reports/i386/2020/2020.09.13.15.27.25/test.html#net_arp_t_arp_arp_proxy_arp_pub


The test case just needed fixing.
Basically the issue was that the test kernel was slow but the test cases were 
fast.
The ping to an invalid address caused the ARP entry to enter INCOMPLETE -> 
WAITDELETE state and this hung over into the next test, causing this entry to take 
too long to validly resolve.


The solution is, after a deliberate fail, to remove the ARP entry for the address 
and ignore the exit code if the entry has naturally expired / been removed.


This fixes all the test case fallout from the ARP -> ND merge and has now 
survived several test runs.


The ND cache expiration test which intermittently fails is based on exact 
timings. A future patch will add jitter to NS, which will cause this test to fail 
more often.

Ideas on how to solve it welcome.

Roy


Re: arp: ioctl(SIOCGNBRINFO): Inappropriate ioctl for device

2020-09-16 Thread Roy Marples

On 16/09/2020 10:23, Thomas Klausner wrote:

On Wed, Sep 16, 2020 at 11:10:55AM +0200, Martin Husemann wrote:

On Wed, Sep 16, 2020 at 11:05:49AM +0200, Thomas Klausner wrote:

The one with 192.168.0.x configured is wm0. (I only have an lo0 except for 
that.)


Strange, your kernel is newer or same age as your userland?


My kernel is from September 4. Since there was no version bump I
assumed that I could install a newer userland (with gcc9) without
problems.


Kernel bumped for you.

Roy


Re: Automated report: NetBSD-current/i386 test failure

2020-09-13 Thread Roy Marples

On 13/09/2020 22:07, Robert Elz wrote:

 Date:Sun, 13 Sep 2020 20:06:45 +0100
 From:Roy Marples 
 Message-ID:  <9e977478-d209-2dbb-49d9-3fa9acd25...@marples.name>

   | >  net/arp/t_arp:arp_cache_expiration_10s
   | >  net/arp/t_arp:arp_cache_expiration_5s

Those two are "fixed" (if you can call deleted fixed).


I call them "replaced".
arp_cache_expiration is the mirror of the ndp equivalent.



   | >  net/arp/t_arp:arp_command

That looks OK now.

   | >  net/arp/t_arp:arp_proxy_arp_pub
   | >  net/arp/t_arp:arp_proxy_arp_pubproxy

Those two are still failing.


Works fine on my box.
Can you say how they are failing?

Roy


Re: Automated report: NetBSD-current/i386 test failure

2020-09-13 Thread Roy Marples

On 12/09/2020 22:57, NetBSD Test Fixture wrote:

This is an automatically generated notice of new failures of the
NetBSD test suite.

The newly failing test cases are:

 net/arp/t_arp:arp_cache_expiration_10s
 net/arp/t_arp:arp_cache_expiration_5s
 net/arp/t_arp:arp_command
 net/arp/t_arp:arp_proxy_arp_pub


> This is an automatically generated notice of a new failure of the
> NetBSD test suite.
>
> The newly failing test case is:
>
>  net/arp/t_arp:arp_proxy_arp_pubproxy

These should now be fixed

Roy


Re: Automated report: NetBSD-current/i386 build failure

2020-09-12 Thread Roy Marples

On 12/09/2020 07:40, NetBSD Test Fixture wrote:

This is an automatically generated notice of a NetBSD-current/i386
build failure.

The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
using sources from CVS date 2020.09.12.01.36.26.

An extract from the build.sh output follows:

 --- dependall-include ---
 
/tmp/bracket/build/2020.09.12.01.36.26-i386/tools/lib/gcc/i486--netbsdelf/8.4.0/../../../../i486--netbsdelf/bin/ld:
 /tmp/bracket/build/2020.09.12.01.36.26-i386/destdir/usr/lib/librumpnet_net.so: 
undefined reference to `rumpns_nd_set_timer'
 
/tmp/bracket/build/2020.09.12.01.36.26-i386/tools/lib/gcc/i486--netbsdelf/8.4.0/../../../../i486--netbsdelf/bin/ld:
 /tmp/bracket/build/2020.09.12.01.36.26-i386/destdir/usr/lib/librumpnet_net.so: 
undefined reference to `rumpns_nd_resolve'
 
/tmp/bracket/build/2020.09.12.01.36.26-i386/tools/lib/gcc/i486--netbsdelf/8.4.0/../../../../i486--netbsdelf/bin/ld:
 /tmp/bracket/build/2020.09.12.01.36.26-i386/destdir/usr/lib/librumpnet_net.so: 
undefined reference to `rumpns_nd_nud_hint'
 
/tmp/bracket/build/2020.09.12.01.36.26-i386/tools/lib/gcc/i486--netbsdelf/8.4.0/../../../../i486--netbsdelf/bin/ld:
 /tmp/bracket/build/2020.09.12.01.36.26-i386/destdir/usr/lib/librumpnet_net.so: 
undefined reference to `rumpns_nd_attach_domain'
 collect2: error: ld returned 1 exit status
 *** [t_socket] Error code 1
 nbmake[8]: stopped in 
/tmp/bracket/build/2020.09.12.01.36.26-i386/src/tests/include/sys
 --- dependall-sys ---
 --- dependall-bootxx ---


This should now be fixed in sys/rump/net/lib/libnet/Makefile r1.33

Roy


Re: Automated report: NetBSD-current/i386 build failure

2020-06-18 Thread Roy Marples

On 12/06/2020 16:13, NetBSD Test Fixture wrote:

This is an automatically generated notice of a NetBSD-current/i386
build failure.

The failure occurred on babylon5.netbsd.org, a NetBSD/amd64 host,
using sources from CVS date 2020.06.12.11.21.36.

An extract from the build.sh output follows:

 --- dependall-libagr ---
 --- if_agrether_hash.d ---
 #create  libagr/if_agrether_hash.d
 
CC=/tmp/bracket/build/2020.06.12.11.21.36-i386/tools/bin/i486--netbsdelf-gcc 
/tmp/bracket/build/2020.06.12.11.21.36-i386/tools/bin/nbmkdep -f 
if_agrether_hash.d.tmp  --   -std=gnu99   
--sysroot=/tmp/bracket/build/2020.06.12.11.21.36-i386/destdir -DCOMPAT_50 
-DCOMPAT_60 -DCOMPAT_70 -DCOMPAT_80 -DCOMPAT_90 -nostdinc -imacros 
/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../include/opt/opt_rumpkernel.h
 -I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr -I. 
-I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../../../common/include
 
-I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../include
 
-I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../include/opt
 
-I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../../arch
 
-I/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../..
 -DDIAGNOSTIC -DKTRACE -D_FORTIFY_SOURCE=2 
/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libagr/../../../../net/agr/if_agrether_hash.c
 &&  mv -f if_agrether_hash.d.tmp if_agrether_hash.d
 --- dependall-libnet ---
 
/tmp/bracket/build/2020.06.12.11.21.36-i386/src/sys/rump/net/lib/libnet/../../../../netinet6/in6.c:112:10:
 fatal error: compat/netinet6/nd6.h: No such file or directory
  #include 
   ^~~


Fixed here:
https://mail-index.netbsd.org/source-changes/2020/06/12/msg118239.html

Roy


Re: problem w/dhcpcd vs libsupc++.a on sparc?

2020-04-21 Thread Roy Marples

On 21/04/2020 09:07, Roy Marples wrote:

On 21/04/2020 00:02, John D. Baker wrote:

On Mon, 20 Apr 2020, r...@marples.name wrote:


Anyway the patch linked below should fix this.
https://roy.marples.name/cgit/dhcpcd.git/patch/?id=1dc1fce7ae7b4c106a8eb631ed92ab1ed8e86bbc 



I'm waiting for feedback on a few more issues, so hopefully you can
clarify the patch works before I import a fixed dhcpcd.


The patch appears only to be for the 'dhcpcd' in -current.  The version
of 'dhcpcd' in netbsd-9 is rather different in that area.

Building sparc-current again and will test soon.  Need equivalent patch
for 'dhcpcd' in netbsd-9.


Sorry, I thought it was -current.
Here is patch for netbsd-9:
https://roy.marples.name/cgit/dhcpcd.git/commit/?h=dhcpcd-8&id=ff78692ef3e74f8f7de2883db541de915c295e07


I've submitted a pullup for netbsd-9 with more fixes thanks to mrg@ and nick@
Hopefully Martin can address it soon!

-current and pkgsrc have already been fixed.

Roy


Re: problem w/dhcpcd vs libsupc++.a on sparc?

2020-04-21 Thread Roy Marples

On 21/04/2020 00:02, John D. Baker wrote:

On Mon, 20 Apr 2020, r...@marples.name wrote:


Anyway the patch linked below should fix this.
https://roy.marples.name/cgit/dhcpcd.git/patch/?id=1dc1fce7ae7b4c106a8eb631ed92ab1ed8e86bbc

I'm waiting for feedback on a few more issues, so hopefully you can
clarify the patch works before I import a fixed dhcpcd.


The patch appears only to be for the 'dhcpcd' in -current.  The version
of 'dhcpcd' in netbsd-9 is rather different in that area.

Building sparc-current again and will test soon.  Need equivalent patch
for 'dhcpcd' in netbsd-9.


Sorry, I thought it was -current.
Here is patch for netbsd-9:
https://roy.marples.name/cgit/dhcpcd.git/commit/?h=dhcpcd-8&id=ff78692ef3e74f8f7de2883db541de915c295e07

Roy


Re: HEADS UP: dhcpcd gains privilege separation support

2020-04-04 Thread Roy Marples

Hi Oskar

On 04/04/2020 07:40, os...@fessel.org wrote:

Am 02.04.2020 um 15:07 schrieb Roy Marples :


could it be that this

_dhcpcd has been added to master.passwd and group, so please update your local 
ones before upgrading.
Once installed, you should stop dhcpcd running and then invoke postinstall so 
that the old dhcpcd files (duid, secret, leases, etc) are moved to the chroot 
directory.
Then you can start dhcpcd and it will pick up where it left off.


relates to this build failure:
===  1 extra files in DESTDIR  =
Files in DESTDIR but missing from flist.
File is obsolete or flist is out of date ?
--
./var/db/dhcpcd
=  end of 1 extra files  ===
when running:
===> build.sh command:./build.sh -j 24 -M /hurz/obj -O /hurz/obj -X 
/hurz/xsrc -x release kernel=ZAPPA-pf kernel=XEN3_DOM0 kernel=XEN3_DOMU
===> build.sh started:Fri Apr  3 21:54:54 CEST 2020
===> NetBSD version:  9.99.52
===> MACHINE: amd64
===> MACHINE_ARCH:x86_64
===> Build platform:  NetBSD 9.99.50 amd64
===> HOST_SH: /bin/sh
===> MAKECONF file:   /etc/mk.conf
===> TOOLDIR path:/hurz/obj/tooldir.NetBSD-9.99.50-amd64
===> DESTDIR path:/hurz/obj/destdir.amd64
===> RELEASEDIR path: /hurz/obj/releasedir
===> Updated makewrapper: 
/hurz/obj/tooldir.NetBSD-9.99.50-amd64/bin/nbmake-amd64
—
with sources sup’ed just 30 minutes before the build start?

The same with sources from tonight midnight CEST.

Or did I miss something besides deleting everything in DESTDIR and RELEASEDIR?


I forgot to say that /var/db/dhcpcd has been deprecated for new builds so it 
needs to be removed.
A check through our set lists tells me that we can't obsolete directories so I 
think it needs to be manual.
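
A minimal sketch of that manual cleanup, assuming the stale directory lives under the build's DESTDIR as in the report above. The function name is hypothetical; it only uses rmdir so that anything unexpected inside the directory is reported instead of deleted.

```shell
# Sketch: remove the stale var/db/dhcpcd directory from a build DESTDIR.
# The DESTDIR path is supplied by the caller.
remove_stale_dhcpcd_dir() {
    destdir=$1
    stale="$destdir/var/db/dhcpcd"

    if [ ! -d "$stale" ]; then
        echo "nothing to do: $stale does not exist"
        return 0
    fi
    # rmdir only succeeds on an empty directory, so files left behind
    # are reported rather than silently deleted.
    if rmdir "$stale" 2>/dev/null; then
        echo "removed $stale"
    else
        echo "$stale is not empty, inspect it manually" >&2
        return 1
    fi
}
```

For the build quoted above this would be called as `remove_stale_dhcpcd_dir /hurz/obj/destdir.amd64`.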


Roy


HEADS UP: dhcpcd gains privilege separation support

2020-04-02 Thread Roy Marples
_dhcpcd has been added to master.passwd and group, so please update your local 
ones before upgrading.
Once installed, you should stop dhcpcd running and then invoke postinstall so 
that the old dhcpcd files (duid, secret, leases, etc) are moved to the chroot 
directory.

Then you can start dhcpcd and it will pick up where it left off.
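
The order of operations can be sketched roughly as below. This is an illustration under assumptions, not an official procedure: the helper names are hypothetical, and the postinstall item name "dhcpcd" is assumed (running "postinstall list" shows the items that actually exist on a given system). Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch of the upgrade order described above.
# DRY_RUN=1 prints each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_dhcpcd_privsep() {
    # Stop the running dhcpcd first.
    run /etc/rc.d/dhcpcd stop
    # Assumption: the postinstall item is named "dhcpcd". This is the step
    # that moves duid, secret, leases, etc. to the chroot directory.
    run postinstall fix dhcpcd
    # Start again; dhcpcd picks up where it left off.
    run /etc/rc.d/dhcpcd start
}
```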

Roy


Re: Automated report: NetBSD-current/i386 test failure

2020-03-31 Thread Roy Marples

On 31/03/2020 12:22, NetBSD Test Fixture wrote:

The newly failing test case is:

 usr.bin/infocmp/t_terminfo:basic


This error in infocmp is now fixed.

Roy


Re: ZFS on root - almost there

2020-02-25 Thread Roy Marples

On 25/02/2020 21:40, Chavdar Ivanov wrote:

On Tue, 25 Feb 2020 at 20:14, Roy Marples  wrote:


On 22/02/2020 19:22, Roy Marples wrote:

https://wiki.netbsd.org/wiki/RootOnZFS/


Updated the wiki and the ramdisk - either the bootloader needs to load the
modules via boot.cfg or the modules need to be built into the kernel.


I don't get it - with my present, still 9.99.47 setup, I am able to
load modules:


Because we can label a GPT partition "boot".
However, that won't work for MBR based systems.
It's also not very friendly if you have any other OS present which might, for 
similar reasons, have a partition named boot as well.



There's just no easy way to load the modules from the ramdisk without putting
them inside the ramdisk  and I think too many people would forget to
re-build the ramdisk or put it against the wrong kernel.


So is the option of loading them as per the above no longer available?


No it's not.
It is however available from the source history if you really want it.

As a last metric, since reverting back to letting the bootloader load the 
modules, zpool is no longer panicking randomly. Or the randomness just hasn't 
struck yet!


Roy


Re: ZFS on root - almost there

2020-02-25 Thread Roy Marples

On 22/02/2020 19:22, Roy Marples wrote:

https://wiki.netbsd.org/wiki/RootOnZFS/


Updated the wiki and the ramdisk - either the bootloader needs to load the 
modules via boot.cfg or the modules need to be built into the kernel.
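
For the boot.cfg route, a possible sketch is shown below. It assumes the same "load solaris;load zfs" commands and ramdisk name used elsewhere in this thread; the menu title and the helper name are arbitrary, and the entry is only appended if it is not already present.

```shell
# Sketch: add a boot.cfg menu entry that loads the ZFS modules before
# booting the ramdisk. The menu title is arbitrary.
ZFS_MENU='menu=Boot ZFS root:load solaris;load zfs;fs /ramdisk-zfsroot.fs;boot'

add_zfs_menu() {
    bootcfg=$1
    # -F: fixed string, -x: match the whole line, -q: quiet.
    # Do nothing if the entry already exists.
    if grep -Fxq "$ZFS_MENU" "$bootcfg" 2>/dev/null; then
        return 0
    fi
    printf '%s\n' "$ZFS_MENU" >> "$bootcfg"
}
```

Calling `add_zfs_menu /boot.cfg` twice leaves a single entry, which keeps the helper safe to re-run after a ramdisk rebuild.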


There's just no easy way to load the modules from the ramdisk without putting 
them inside the ramdisk  and I think too many people would forget to 
re-build the ramdisk or put it against the wrong kernel.


Also, I've updated the minimum required kernel to 9.99.48 as Taylor R Campbell 
has kindly fixed the problem of writing to the FFS boot device from the ZFS 
chroot :)


Roy


Re: ZFS on root - almost there

2020-02-23 Thread Roy Marples

On 23/02/2020 11:56, Chavdar Ivanov wrote:

On Sun, 23 Feb 2020 at 05:17, Roy Marples  wrote:


On 22/02/2020 21:19, Chavdar Ivanov wrote:

I just noticed - the error message from the sysctl command was that
the string was too long:


Sync up, build a new ramdisk and install it.
Should be fixed now.


It is indeed. So far, this was a VirtualBox guest, 2 CPUs, 4GB memory,
EFI enabled. X works fine with the VirtualBox additions installed.

The access to the boot partition (/dev/dk1) fails on umount; it
usually hangs, but I had once a sudden reset.


That should also be fixed if you sync up :)

Roy


Re: ZFS on root - almost there

2020-02-22 Thread Roy Marples

On 22/02/2020 21:19, Chavdar Ivanov wrote:

I just noticed - the error message from the sysctl command was that
the string was too long:


Sync up, build a new ramdisk and install it.
Should be fixed now.

Roy


Re: ZFS on root - almost there

2020-02-22 Thread Roy Marples

On 22/02/2020 19:06, Chavdar Ivanov wrote:

On Sat, 22 Feb 2020 at 18:03, Chavdar Ivanov  wrote:


On Sat, 22 Feb 2020 at 17:03, Roy Marples  wrote:


On 22/02/2020 16:56, Chavdar Ivanov wrote:

Surely I have missed and/or misunderstood some of the above, but I am getting:
...
Starting ZFS on root boot strapper
Copying needed kernel modules from NAME=boot:/stand/amd64/9.99.47/modules
mount: no match for 'boot': No such process
/mnt//stand/amd64/9.99.47/modules/zfs/solaris.kmod not found!
/mnt//stand/amd64/9.99.47/modules/zfs/zfs.kmod not found!
umount: /mnt: not currently mounted

Importing rpool, mounting and pivoting
internal error: failed to initialize ZFS library


It seems it tries to mount the small ufs root on /mnt using
'NAME=boot' label, but the label created by the standard installed is
some GUID.


Wups!
I missed an instruction step to ensure the label of the FFS partition is boot.
gpt label -i 1 -l boot wd0
Replace 1 with the partition index and wd0 with the device.


It worked; the only problem I am still having is adding swap; both for
a zvol and a gpt partition I get:

...

nzfs# swapctl -a /dev/zvol/dsk/rpool/SWAP
swapctl: /dev/zvol/dsk/rpool/SWAP: Device not configured
nzfs# swapctl -a /dev/dk5
swapctl: /dev/dk5: Device not configured

Could be something I've screwed during the installation, but can't
figure it out.


The next problem is that one can't load any modules; is this by design
or I have again made some mistake? Only the two modules prior to
pivoting are seen - solaris and zfs; after that one gets, e.g.:

➜ ~ ls -l /stand/amd64/9.99.47/modules/dtrace/dtrace.kmod
-r--r--r-- 1 root wheel 320120 Feb 20 10:19
/stand/amd64/9.99.47/modules/dtrace/dtrace.kmod
➜ ~ modload dtrace
modload: dtrace: No such file or directory
➜ ~ uname -a
NetBSD nzfs 9.99.47 NetBSD 9.99.47 (GENERIC) #10: Sat Feb 22 14:18:50
GMT 2020 
sysbuild@ymir:/home/sysbuild/amd64/obj/home/sysbuild/src/sys/arch/amd64/compile/GENERIC amd64





Does this patch help?

Roy

Index: zfsroot.rc
===
RCS file: /cvsroot/src/distrib/common/zfsroot.rc,v
retrieving revision 1.1
diff -u -p -r1.1 zfsroot.rc
--- zfsroot.rc  22 Feb 2020 09:53:47 -  1.1
+++ zfsroot.rc  22 Feb 2020 19:30:43 -
@@ -51,6 +51,13 @@ done
 /sbin/umount "$modmnt"
 echo

+# Point the modulepath to /altroot
+mpath="$(sysctl -n kern.module.path)"
+case "$mpath" in
+/altroot/\*)   ;;
+*) sysctl -w kern.module.path="/altroot/$mpath";;
+esac
+
 echo "Importing $rpool, mounting and pivoting"
 # If we can mount the ZFS root partition to /altroot
 # then chroot to it and start /etc/rc


Re: ZFS on root - almost there

2020-02-22 Thread Roy Marples

On 22/02/2020 11:27, Roy Marples wrote:

On 14/02/2020 12:58, Roy Marples wrote:

So I thought I would have a go at setting up ZFS on root.


I've now committed enough to manually build a ramdisk to set this all up.
Quick instruction steps which I'll document on a web page later:


https://wiki.netbsd.org/wiki/RootOnZFS/

Roy


Re: ZFS on root - almost there

2020-02-22 Thread Roy Marples

On 22/02/2020 16:56, Chavdar Ivanov wrote:

Surely I have missed and/or misunderstood some of the above, but I am getting:
...
Starting ZFS on root boot strapper
Copying needed kernel modules from NAME=boot:/stand/amd64/9.99.47/modules
mount: no match for 'boot': No such process
/mnt//stand/amd64/9.99.47/modules/zfs/solaris.kmod not found!
/mnt//stand/amd64/9.99.47/modules/zfs/zfs.kmod not found!
umount: /mnt: not currently mounted

Importing rpool, mounting and pivoting
internal error: failed to initialize ZFS library


It seems it tries to mount the small ufs root on /mnt using
'NAME=boot' label, but the label created by the standard installed is
some GUID.


Wups!
I missed an instruction step to ensure the label of the FFS partition is boot.
gpt label -i 1 -l boot wd0
Replace 1 with the partition index and wd0 with the device.

We do it like so to avoid the user needing to load the solaris and zfs modules 
in boot.cfg.
Ideally we should teach sysctl to have kern.boot_device alongside 
kern.root_device to avoid this need.
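
The labelling step can be wrapped in a small helper, sketched here with a follow-up "gpt show -l" to confirm the label took effect. The helper names are hypothetical, and the index and device are whatever applies locally. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch: label a GPT partition "boot" and list the labels afterwards.
# DRY_RUN=1 prints each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

label_boot_partition() {
    index=$1    # partition index, e.g. 1
    disk=$2     # device, e.g. wd0
    run gpt label -i "$index" -l boot "$disk"
    # -l shows partition labels instead of partition types.
    run gpt show -l "$disk"
}
```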


Roy


Re: ZFS on root - almost there

2020-02-22 Thread Roy Marples

On 14/02/2020 12:58, Roy Marples wrote:

So I thought I would have a go at setting up ZFS on root.


I've now committed enough to manually build a ramdisk to set this all up.
Quick instruction steps which I'll document on a web page later:

Compile the ramdisk
cd src/distrib/amd64/ramdisks/ramdisk-zfsroot
nbmake-amd64

Ensure you are using GPT and not MBR. If you need to change, dd the disk
using /dev/zero as source for about 32k and then the installer will ask you if 
you want MBR or GPT. Once set, it will not prompt to change it again.


Use the installer to do a normal installation, extracting base, modules and 
rescue sets to a small FFS partition (I chose 2G). Do not allow the installer to 
use the rest of the disk.

Drop to the prompt and copy the ramdisk you made earlier to /
Edit /boot.cfg and add this menu item:
menu=Boot ZFS root:fs /ramdisk-zfsroot.fs;boot

Create a ZFS pool on another partition called rpool.
Create the ZFS root filesystem called rpool/ROOT.
zfs set mountpoint=legacy rpool/ROOT
This step is important - the only downside is if you want to create any ZFS 
datasets in rpool/ROOT you need to either set mountpoints in /etc/fstab or 
specify them explicitly, as they will otherwise inherit legacy from ROOT.

Extract the sets you want to rpool/ROOT.
Create dev on rpool/ROOT, copy MAKEDEV from /dev to it, cd to it and run 
./MAKEDEV all

Copy your /etc/fstab to rpool/ROOT/etc, but remove the / entry.
Ensure that rc.conf is setup in rpool/ROOT/etc and it has zfs=YES
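
The pool, dataset and /dev steps above could be scripted roughly as follows. This is a sketch under assumptions: the helper names, the example partition (dk1) and the temporary mount point (/altroot) are all hypothetical, and set extraction plus the fstab and rc.conf edits are left as manual steps. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# Sketch of the pool, dataset and /dev steps.
# DRY_RUN=1 prints each command instead of executing it.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_zfs_root() {
    part=$1     # partition for the pool, e.g. dk1 (hypothetical)
    mnt=$2      # temporary mount point, e.g. /altroot (hypothetical)

    run zpool create rpool "$part"
    run zfs create rpool/ROOT
    # Legacy mountpoint: the dataset is mounted via mount(8), not by zfs.
    run zfs set mountpoint=legacy rpool/ROOT
    run mount -t zfs rpool/ROOT "$mnt"

    # Extract the wanted sets into "$mnt" at this point (not shown).

    # Populate /dev on the new root.
    run mkdir -p "$mnt/dev"
    run cp /dev/MAKEDEV "$mnt/dev/"
    run sh -c 'cd "$1" && ./MAKEDEV all' sh "$mnt/dev"
}
```

Running it once with DRY_RUN=1, for example `DRY_RUN=1; create_zfs_root dk1 /altroot`, is a cheap way to review the command sequence before touching a disk.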

You should now be good to go!

WARNING: There seems to be a bug where, once you have booted into a ZFS root, 
mounting any device and writing to it makes the system hang when trying to 
unmount it. This is not a fault with the ramdisk, but rather with how ZFS works 
with device nodes on ZFS. So to update the kernel, boot into the FFS partition 
and copy from the ZFS partition rather than doing it within the ZFS root.


Once that is fixed I might look into trying to automate some of this in our 
installer.


Good luck!

Roy


Re: ZFS on root - almost there

2020-02-16 Thread Roy Marples

On 14/02/2020 12:58, Roy Marples wrote:

So I thought I would have a go at setting up ZFS on root.


I now have a ramdisk-zfsroot configured!
With just the kernel and modules on the partition I can put this in boot.cfg

menu=Load ZFS Root:load solaris;load zfs;fs /ramdisk-zfsroot.fs;boot

Sadly though zpool cannot find my pool :(
I suspect this is because the bootdevice is now the ramdisk md0 rather than 
wd0a. Is there any way of educating the zfs module about this?


Roy


ZFS on root - almost there

2020-02-14 Thread Roy Marples

So I thought I would have a go at setting up ZFS on root.

Thanks to hannken@ it now boots :)
However, it panics at shutdown (or halt). Screen capture of the panic here:
http://www.netbsd.org/~roy/netbsd-zfs-panic.jpg

Now, what I did during the initial setup was to adjust the mountpoint of 
tank/ROOT/usr to /usr - ie relative to the chroot.


The bootstrap phase is this in /etc/rc

fsck -y /
zfs mount tank/ROOT
mount -t null /dev /tank/ROOT/dev
mount -t null / /tank/ROOT/altroot # this doesn't appear to work
sysctl -w init.root=/tank/ROOT

This works fine, we enter the chroot
For the time being I've disabled fsck_root and adjusted zfs to load all mounts.

We now get to the login with minimal errors and all appears to work.
You can see the mountlist inside the chroot at the top of the screen capture.

If some kind person can fix this panic then I can copy across my live home site 
setup (web server, email, etc) and really test it out.


Roy


Re: Recent if_stat changes have broken sysutils/xosview

2020-02-09 Thread Roy Marples

On 09/02/2020 01:52, Jason Thorpe wrote:



On Feb 8, 2020, at 4:04 PM, Paul Goyette  wrote:

The package no longer builds.  Fails with (among others)

error: 'struct ifnet' has no member named 'if_ibytes'; did you mean 'if_index'?


"struct ifnet" is private to the kernel.  This application should be using the 
properly exported data that's available via ioctls for this purpose.


We have far too many kernel only things exposed to userland.

A constant beef of mine is that we #define if_type in sys/net/if.h which causes 
a conflict when building hostapd/wpa_supplicant as they have an enum if_type.


If we can resolve this it would make me a lot happier!
I tried to have a go solving this about a year ago, but gave up due to some 
userland stuff like this no longer working.

If we can solve it via ioctl then awesome.

Roy


Re: Converting termcap entries to terminfo entries

2019-10-23 Thread Roy Marples

Hi Brian

On 22/10/2019 23:14, Brian Buhrow wrote:

hello.  I'm in the process of building NetBSD-9.0 systems in an effort
to consider upgrading from my fleet of NetBSD-5.2 systems to NetBSD-9.  As
a long time window(1) user, I have a termcap entry for the window terminal
type that I use on systems that I ssh into from window(1) panes.  It is my
practice to put a termcap and a terminfo database in my home directory on
such systems, so that regardless of whether a program at the far end wants
termcap or terminfo, it will be able to draw on the screen in full screen
mode.  what I need is a way of converting the termcap entries I have into a
terminfo source file that tic(1) can compile into a .cdb file which can be
used on NetBSD-9 systems.  I have  an older version of captoinfo(1) from
the ncurses pkg, but it produces binary terminfo output unsuitable for the
tic(1) program.  I'm fully aware that window(1) has been deprecated in favor
of tmux(1), but I haven't climbed the learning curve of tmux(1) yet and I'm
not sure it does everything I get from the window(1) program.
So, can someone tell me what program I should use to convert termcap
files into terminfo source files suitable for the new terminfo libraries in
NetBSD-8 and 9?


We don't have any specific program as such, but terminfo(5) has a 
section "Fetching Compiled Descriptions"


If the environment variable TERMCAP is available and does not begin with
a slash (`/') then it will be translated into terminfo and compiled as
above.  If its name matches TERM then it is used.

So you can use infocmp(1) like so:

$ TERM=captest TERMCAP="captest|:al=3*\E^R:am:bl=^G:cd=16*\E^C:ce=16\E^U:cl=2*^L:cm=\Ea%+ %+ :" infocmp

# Reconstructed from $TERMCAP
captest,
am,
bel=^G, clear=\f$<2*/>, cr=^M, cud1=^J,
cup=\Ea%p1%{32}%+%c%p2%{32}%+%c, ed=\E\003$<16*/>, el=\E\025$<16/>,
ht=^I, il1=\E\022$<3*/>, ind=^J, kbs=^H, kcub1=^H, kcud1=^J,
nel=^M^J,

I don't know how accurate the conversion will be for you as it's not 
entirely a 1-1 mapping and I think some assumptions are made (I've not 
looked at the source for a while), but hopefully it's good enough.


Might be time consuming with many termcap entries to convert, but it 
should be scriptable at least. Is this good enough for you?
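
One way such a script might look is sketched below. It assumes a termcap file where continuation lines end in a backslash and comment lines start with "#"; the function name and the INFOCMP override variable are hypothetical. Each entry is handed to infocmp through TERMCAP with TERM set to the entry's first name, as described above.

```shell
# Sketch: print terminfo source for every entry in a termcap file.
# INFOCMP is a hypothetical override hook; it defaults to infocmp.
termcap_to_terminfo() {
    capfile=$1

    # Join backslash-continued lines into one logical entry per line,
    # dropping comments and blank lines.
    awk '
        /^#/ || /^[ \t]*$/ { next }
        { sub(/^[ \t]+/, "") }
        /\\$/ { sub(/\\$/, ""); buf = buf $0; next }
        { print buf $0; buf = "" }
    ' "$capfile" |
    while IFS= read -r entry; do
        # The first name is everything before the first "|" or ":".
        name=${entry%%[|:]*}
        # TERM must match the entry name for TERMCAP to be used.
        TERM="$name" TERMCAP="$entry" ${INFOCMP:-infocmp} </dev/null
    done
}
```

Redirecting the output, for example `termcap_to_terminfo ~/.termcap > window.ti`, would give a source file to feed to tic(1); how faithful the result is depends on the same conversion caveats mentioned above.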


Roy


Re: dhcpcd ignores "force_hostname=YES" on diskless clients?

2019-09-22 Thread Roy Marples

On 21/09/2019 03:01, John D. Baker wrote:

On Fri, 20 Sep 2019, John D. Baker wrote:


(Before the recent imports of later versions of 'dhcpcd', it failed to
obtain the FQDN on a sparc system and set the hostname as "localhost".)


The diskless SPARC system works properly now.

Will have to check other diskless clients (amd64, i386, evbmips).

All diskless client "dhcpcd.conf" files have only the following changes
from default:

   comment out "hostname" directive
   un-comment "ntp_servers" option
   add "env force_hostname=YES"


If the hostname is "localhost" then dhcpcd won't send it.
If the hostname is "localhost" then dhcpcd will set the hostname given 
via DHCP.


So the only config change you should need to make by default is 
uncommenting ntp_servers.


Roy


Re: build issue: _REENTRANT redefined

2019-09-06 Thread Roy Marples

On 06/09/2019 11:34, Thomas Klausner wrote:

I guess I have to turn off the gcc build as well, but for now I'd like
to have both compilers...


I've not been able to build both for many years now.
As my need for building xen packages out-weighs my social want for LLVM, 
I currently only use gcc :(


Roy


Re: NetBSD on a wireless router?

2019-08-17 Thread Roy Marples

On 16/08/2019 04:28, Jason Thorpe wrote:



On Aug 15, 2019, at 8:15 PM, John Franklin  wrote:

because I usually use the Ubiquiti APs for WiFi.  For WiFi performance and 
management on a budget, they’re hard to beat.


+1. I use Ubiquiti to cover the 3 levels of my house + back yard, and it works 
flawlessly (total of 4 APs to do the job).


Another +1 for Ubiquiti.

I have a UAP-AC-Pro with stock firmware plugged into my Ubiquiti 
EdgeRouter Lite which in turn runs NetBSD as the router itself.


The range of the UAP-AC-Pro is pretty amazing compared to anything else 
I've seen at consumer prices.


Roy


Re: CVS commit: src/usr.sbin/postinstall

2019-06-13 Thread Roy Marples

On 13/06/2019 09:00, Manuel Bouyer wrote:

On Thu, Jun 13, 2019 at 06:17:29AM +0300, Valery Ushakov wrote:

[...]
I've been using etcupdate for ages so I only ever really used
postinstall to fix "obsolete" and "catpages".  etcupdate -a has some
kinks and may be we should concentrate on fixing those instead?


I *never* used etcupdate, so for me it's better to have a working postinstall
(I have a PR about it: install/52349, which may have been fixed by the
recent change)


I used etc-update once and accidentally overwrote master.passwd
Never used it since, far too risky.

Roy



Re: ipv6 broken

2019-05-13 Thread Roy Marples

On 13/05/2019 13:34, Christos Zoulas wrote:

In article <332662e7-3c78-5d1b-ce05-8c86806f7...@marples.name>,
Roy Marples   wrote:

On 13/05/2019 03:00, Christos Zoulas wrote:

dhcpcd says:

May 13 01:47:01 [79]: wm0: ipv6_start: Can't assign requested address


dhcpcd should say duplicated address based on the below, but that's
just cosmetic really.


Kernel says:

[13.119958] wm0: link state DOWN (was UNKNOWN)
[16.261560] wm0: link state UP (was DOWN)
[17.283056] wm0: DAD duplicate address fe80:1::56bf:64ff:fe92:10c8 from 00:17:10:87:19:87:46:66
[17.283056] wm0: possible hardware address duplication detected, disable IPv6
[17.426267] wm1: link state UP (was UNKNOWN)
[17.427269] Cannot enable an interface with a link-local address marked duplicate.

Assuming this is -current either our nonce code is broken or there
really is a duplicate address from hardware address 00:17:10:87:19:87:46:66

Regardless, we need more data.


Reverting the nd6 changes, and in particular the "is this needed" part, makes
the DAD message stop. But the "Can't assign requested address" error remains.


Which ND6 changes specifically?


Re: ipv6 broken

2019-05-13 Thread Roy Marples

On 13/05/2019 03:00, Christos Zoulas wrote:

dhcpcd says:

May 13 01:47:01 [79]: wm0: ipv6_start: Can't assign requested address


dhcpcd should say duplicated address based on the below, but that's 
just cosmetic really.



Kernel says:

[13.119958] wm0: link state DOWN (was UNKNOWN)
[16.261560] wm0: link state UP (was DOWN)
[17.283056] wm0: DAD duplicate address fe80:1::56bf:64ff:fe92:10c8 from 
00:17:10:87:19:87:46:66
[17.283056] wm0: possible hardware address duplication detected, disable 
IPv6
[17.426267] wm1: link state UP (was UNKNOWN)
[17.427269] Cannot enable an interface with a link-local address marked 
duplicate.


Assuming this is -current either our nonce code is broken or there 
really is a duplicate address from hardware address 00:17:10:87:19:87:46:66


Regardless, we need more data.

Roy


Re: "route_enqueue: queue full, dropped message" blast from a 8.99.32 amd64 domU

2019-05-10 Thread Roy Marples

On 10/05/2019 00:40, Greg A. Woods wrote:
[Thu May  9 09:24:08 2019][ 6442662.0806318] route_enqueue: queue full, dropped message


There were thousands of identical lines, all separated by a few 
microseconds.  No doubt this spew was the real cause of the apparent 
interrupt storm and the resulting sluggishness.


https://nxr.netbsd.org/xref/src/sys/net/rtsock_shared.c#1602

I would imagine that if an interface is interrupting that much then it's 
constantly sending messages to route(4) to say that it's up/down and 
addresses are detached/tentative in a tight loop. The queueing mechanism 
has a fixed length and while we go out of our way to notify userland if 
there's an error sending these messages, we can't send this one at all 
so we just log it.


So it's an artifact of your interrupt storm, but not the cause.

Roy


Re: Automated report: NetBSD-current/i386 build failure

2019-05-07 Thread Roy Marples
I think Christos has kindly fixed this for me.

Roy

Re: Automated report: NetBSD-current/i386 build failure

2019-01-22 Thread Roy Marples

On 22/01/2019 20:30, Andreas Gustafsson wrote:

The NetBSD Test Fixture wrote:

 --- dhcpcd_make ---
 cc1: all warnings being treated as errors
 *** [dhcpcd.o] Error code 1


More relevant error messages from earlier in the log:

 --- dhcpcd_make ---
 
/tmp/bracket/build/2019.01.22.17.41.06-i386/src/external/bsd/dhcpcd/dist/src/dhcpcd.c:
 In function 'dhcpcd_handlecarrier':
 
/tmp/bracket/build/2019.01.22.17.41.06-i386/src/external/bsd/dhcpcd/dist/src/dhcpcd.c:768:6:
 error: implicit declaration of function 'ipv4ll_reset' 
[-Werror=implicit-function-declaration]
   ipv4ll_reset(ifp);
   ^~~~



Should be fixed now.

Roy


Re: failed to create llentry

2018-11-21 Thread Roy Marples

On 22/11/2018 00:36, Greg Troxel wrote:

Roy Marples  writes:


On 21/11/2018 19:51, co...@sdf.org wrote:

-B -M -c /etc/wpa_supplicant.conf -s seem like really good flags,
thanks.
(are they good enough to be a default? right now anyone using wifi has
to have wpa_supplicant_flags set, so we can't break their usage)


Yes and no.
We would need to ship a default wpa_supplicant.conf - probably
enabling the default socket so wpa_cli(8) just works and commented out
entries for connecting to any open ap and a specific ap with psk.

We might want to enable (but commented out maybe to start with) the
ability for instructions over the control socket to configure
wpa_supplicant.conf as well.
This would be handy for applications like dhcpcd-{gtk,qt}

Then, the user just has to set wpa_supplicant=YES in rc.conf and
voila, wireless network setup with X11 and a systray application
becomes a lot easier for the end user to setup.


I am unclear on the fine points, but in general find wpa_supplicant to
be way too painful to deal with.  It really seems like it should be able
to be started by default,


It is painful without a good setup, yes.
It can be started by default if the user so chooses.

So I see sysinst network config coming down to this:
Auto-start wireless Y/N
Auto-configure addresses Y/N
If auto-start wireless is Y, or autoconfigure addresses is N, spawn 
dhcpcd-curses to handle both.


You don't actually pick an interface by default.
I don't even propose we have an advanced section - you want anything 
more, drop to the shell and do it.

ifconfig and route are not hard, neither is editing resolv.conf.
Job done.


and exit if no wifi interfaces,


Why?
Hotplugging of wifi is a thing.
Pinebooks are a really good example of having no networking at boot.
I generally plug the stick and ethernet dongle/cable in after boot.


and have some
command-line wifi_choose program that prints out a list of SSIDs, takes
a number, and asks for a password, and stores both the ssid and the
password, and next time just connects.  Sort of like how a mac works
clicking on the wifi icon, but command line. And a gui version would be
fine too of course.  To me this is the biggest NetBSD wifi usability
issue, or perhaps it's just behind USB wifi adaptors being slightly
flaky.


By GUI you mean X11 based? dhcpcd-{gtk,qt} satisfy this on BSD at least.
dhcpcd-curses is a thing, but it's currently just a monitor.
Now I have a pinebook I can concentrate on fixing some recent 
dhcpcd/netbsd/platform bitrot with shared IP address and then work on 
dhcpcd-curses once more now I have a working NetBSD environment with 
wireless once again.


Roy


Re: failed to create llentry

2018-11-21 Thread Roy Marples

On 21/11/2018 19:51, co...@sdf.org wrote:

-B -M -c /etc/wpa_supplicant.conf -s seem like really good flags,
thanks.
(are they good enough to be a default? right now anyone using wifi has
to have wpa_supplicant_flags set, so we can't break their usage)


Yes and no.
We would need to ship a default wpa_supplicant.conf - probably enabling 
the default socket so wpa_cli(8) just works and commented out entries 
for connecting to any open ap and a specific ap with psk.


We might want to enable (but maybe commented out to start with) the 
ability for instructions over the control socket to configure 
wpa_supplicant.conf as well.

This would be handy for applications like dhcpcd-{gtk,qt}

Then, the user just has to set wpa_supplicant=YES in rc.conf and voila, 
wireless network setup with X11 and a systray application becomes a lot 
easier for the end user to set up.



I can't unplug my card because it's PCI.

I'll try to investigate next time it happens


Another way of restarting things is to down/up the interface.

ifconfig urtwn down up

Does wonders - both wpa_supplicant and dhcpcd will react to this.
There should be no need to kill anything with prejudice.

Roy


Re: failed to create llentry

2018-11-21 Thread Roy Marples

On 21/11/2018 18:55, co...@sdf.org wrote:

I don't like debugging problems with daemonized processes.
wpa_supplicant for example prints nothing to syslog. the messages it
gives to stdout are informative.


wpa_supplicant(8) says

-s  Send log messages through syslog(3) instead of to the terminal.


I'm quite grumpy about networking in netbsd in general.


I'm actually very happy.
For example my remote ssh sessions persist without dropping when the 
carrier goes down/up.


Heck, my dhcp lease died on my pinebook half an hour ago and building 
pkgsrc entirely over nfs just carried on working again without the blink 
of an eye.

It's not magic, it's NetBSD.

Roy


Re: failed to create llentry

2018-11-21 Thread Roy Marples

On 21/11/2018 17:18, co...@sdf.org wrote:

I use wpa_supplicant and dhcpcd. When dhcpcd fails to configure the
network I start doing it manually. I don't really pay attention to when
the errors occur but I'll try to keep a closer track about when they
start.

dhcpcd will mysteriously fail while I am connected with wpa_supplicant,


How does it mysteriously fail?


so I'd kill it and do:
pkill -9 dhcpcd


That's quite harsh.


route -n flush
route -n flushall


dhcpcd -k
should do this (and remove any addresses or anything else it added) if 
you don't pkill -9 it.



ifconfig iwm0 local-ip-i-should-have
route add default gateway

Usually when these problems happen one of the following occurs too:
- wpa_supplicant will complain it can't assign an address every hour or
   so, and network traffic will stop for a bit


wpa_supplicant doesn't assign any kind of address by itself.
Can you post some context?


- I'll accidentally restart wpa_supplicant before killing all network
   traffic and get a kernel panic


Backtrace would be nice.


I guess wpa_supplicant does more than I want to do and run into
conflicts with manual setup.


Often my urtwn firmware fails for some unknown reason. It's not the most 
stable stick on my network, but it works in my pinebook.
My solution is to remove and insert the stick until the firmware loads 
correctly.

To allow this to work, I set up wpa_supplicant in plug and play mode.

wpa_supplicant_flags="-B -M -c /etc/wpa_supplicant.conf"

This tells wpa_supplicant to background, match any interface and use the 
stated config file.

dhcpcd runs with default flags and config.

I've been plugging in and removing in no set order the usb wifi stick 
and a usb ethernet dongle and it just works * (there is an issue with IP 
address sharing, unsure if platform, dhcpcd or kernel issue - I'll be 
fixing this once I have a working desktop on the pinebook).


* Sometimes either interface gets an IPv4LL address which means carrier 
is "UP" but there's another issue such as firmware failure or the 
ethernet over power adapter needs a reboot. In any case, no manual 
address setup or routing is needed.


Roy


Re: Travel router part 2

2018-09-05 Thread Roy Marples

On 05/09/2018 14:59, D'Arcy Cain wrote:

On 2018-09-05 08:03 AM, Roy Marples wrote:

and have a named configured to use the forwarders in
/etc/namedb/forwarders.  Whatever the ISP dhcp gives me is stuffed into
the forwarders and used as last resort.  This has been a robust solution
for many open wireless access points.


Since NetBSD-6, dhclient-script has shipped with resolvconf(8) support
that will do that for you.


Do you run dhclient if you have PPPOE set up?  Once the interface is up
I already have an IP address so what does dhclient do?

Note that the router is also the DHCP server so it also has static IPs
on the internal interfaces.  At some point I will add a second wifi card
to connect to campground wifi or tether to my phone.  At that point
dhclient will probably be set up to talk to that second wifi since it
will replace the PPPOE connection.


I don't use dhclient :)

When I last looked into this, I had pppoe and dhcpcd running at the same 
time, always up. Both fed their dns info into resolvconf(8) which then 
configured the results as unbound(8) forwarders.


You can tell resolvconf which interfaces take priority over others.
You might need to tell dhcpcd to use the destination as the default route 
and prefer the pppoe interface via metrics if the default isn't to your 
liking.
dhcpcd will support resolvconf without any changes, but our pppoe isn't 
so friendly and you do need to call it yourself via the up and down actions.


Doing it like so, I don't have to manually do anything other than 
connect to a wireless point or plug an ethernet cable in once the system 
is booted.


As the world moves forward, I would encourage using dhcpcd just because 
it will also handle DHCPv6 prefix delegation should you need that.
You can do it with dhclient as well .. but it needs an awful lot of 
hand holding to work well.


Roy


Re: Travel router part 2

2018-09-05 Thread Roy Marples

On 04/09/2018 23:21, Brett Lymn wrote:

On Sun, Sep 02, 2018 at 11:55:58AM -0400, D'Arcy Cain wrote:


Any thoughts on picking up the DNS servers?  It's not too bad because my
DHCP server can be modified as needed so it is only one location and in
any case I always include Google's public servers.



I have found that DNS can be problematic when travelling, some places
force you through their DNS regardless and do all sorts of lossage.
what I do on my laptop is run a local named and configure forwarders to
the DNS provided so I can override some of the random lossage.  I have a
dhclient (yeah, old habits..) enter script that does:

restore_resolv_conf() {
    # We don't want /etc/resolv.conf changed
    # So this is an empty function
    return 0
}

make_resolv_conf() {
    if [ -f /etc/namedb/forwarders ]
    then
        mv /etc/namedb/forwarders /etc/namedb/forwarders.old
    fi

    printf "forwarders { " > /etc/namedb/forwarders
    for nameserver in $new_domain_name_servers
    do
        printf "%s; " ${nameserver} >> /etc/namedb/forwarders
    done
    echo "};" >> /etc/namedb/forwarders
    echo "forward only;" >> /etc/namedb/forwarders

    pkill -HUP named
    return 0
}

and have a named configured to use the forwarders in
/etc/namedb/forwarders.  Whatever the ISP dhcp gives me is stuffed into
the forwarders and used as last resort.  This has been a robust solution
for many open wireless access points.


Since NetBSD-6, dhclient-script has shipped with resolvconf(8) support 
that will do that for you.


Roy


Re: if_addrflags6: Can't assign requested address

2018-08-20 Thread Roy Marples

On 18/08/2018 03:43, Roy Marples wrote:

On 18/08/2018 03:29, SAITOH Masanobu wrote:

This patch worked. if_addrflags6's error messages disappeared.


:)



Before this patch,

Aug 18 01:00:58 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 01:30:59 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 02:01:01 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 02:31:03 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 03:01:04 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 03:31:05 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 04:01:06 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 04:31:08 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 05:01:09 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 05:31:11 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 06:01:11 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 06:31:12 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 07:01:14 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 07:31:15 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 08:01:16 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 08:31:16 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument


This error message appeared every 30 minutes, but it also disappeared
with this patch.


That's avoiding the broken IP_PKTINFO implementation in NetBSD-7 - can't 
use it to send.


Committed. Pullups requested to -7 and -8.

Roy


Re: if_addrflags6: Can't assign requested address

2018-08-17 Thread Roy Marples

On 18/08/2018 03:29, SAITOH Masanobu wrote:

This patch worked. if_addrflags6's error messages disappeared.


:)



Before this patch,


Aug 18 01:00:58 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 01:30:59 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 02:01:01 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 02:31:03 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 03:01:04 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 03:31:05 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 04:01:06 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 04:31:08 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 05:01:09 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 05:31:11 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 06:01:11 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 06:31:12 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 07:01:14 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 07:31:15 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 08:01:16 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument
Aug 18 08:31:16 amd64-n7 dhcpcd[250]: wm1: dhcp_sendudp: Invalid argument


This error message appeared every 30 minutes, but it also disappeared
with this patch.


That's avoiding the broken IP_PKTINFO implementation in NetBSD-7 - can't 
use it to send.


Roy


Re: if_addrflags6: Can't assign requested address

2018-08-17 Thread Roy Marples

On 17/08/2018 10:08, Roy Marples wrote:

On 17/08/2018 09:04, Masanobu SAITOH wrote:

wm2: carrier lost
wm2: executing `/libexec/dhcpcd-run-hooks' NOCARRIER
wm2: deleting address fe80::1392:4012:56d8:a7a2
wm2: if_addrflags6: Can't assign requested address
wm2: if_addrflags6: Can't assign requested address
wm2: if_addrflags6: Can't assign requested address
wm2: if_addrflags6: Can't assign requested address
wm2: carrier acquired
wm2: executing `/libexec/dhcpcd-run-hooks' CARRIER


This helps.
I never saw this because on NetBSD-8, we have addrflags available in 
ifa_msghdr when sent over route(4). This does not exist on NetBSD-7 so 
we need to make an ioctl per address to work out the flags. Sadly, this 
is racy and this is what happens:


Something adds an address.
Kernel announces new address to route(4).
Something deletes this address.
Kernel announces the address deleted to route(4).

dhcpcd reads the address added message from route(4) *after* the address 
has been deleted from the kernel. Because dhcpcd needs the address flags 
at this point, an ioctl is made to the deleted address and boom, error.


Luckily dhcpcd handles it correctly and it's just noise.
Please test the attached patch to silence it.
If you can verify it works, let me know and I'll push a new version out.


Since then I've discovered two more critical issues with dhcpcd-7 on 
NetBSD-7.

1) Broken IP_PKTINFO implementation
2) Invalid RTA_BRD in RTM_NEWADDR messages for new addresses
Both of these have already been fixed in -8 and -current and neither 
looks suitable for a pullup and dhcpcd needs a workaround for both anyway.
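
Both workarounds only need to be compiled in for NetBSD-7 and older. A minimal sketch of such a version gate, assuming the MMmmrrpp00 encoding of __NetBSD_Version__ from <sys/param.h> (so 8.0 is 800000000). The macro and function names below are made up for illustration and are not part of dhcpcd:

```c
#include <assert.h>

/*
 * Illustrative only: build a value in the MMmmrrpp00 layout used by
 * __NetBSD_Version__, e.g. 800000000 for NetBSD 8.0.
 */
#define TOY_NETBSD_VERSION(major, minor) \
	((major) * 100000000UL + (minor) * 1000000UL)

/*
 * Hypothetical helper: returns 1 if the given version predates 8.0
 * and so would need the NetBSD-7 workarounds, 0 otherwise.
 */
static int
needs_netbsd7_workaround(unsigned long version)
{
	assert(version != 0);	/* 0 means "not NetBSD", do not ask */
	return version < TOY_NETBSD_VERSION(8, 0);
}
```

In real code the same comparison would sit in an #if on __NetBSD_Version__ so the workaround disappears entirely on newer kernels.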


A better patch attached and I'll hopefully get this pushed out over the 
weekend.


Roy
diff --git a/src/dhcp.c b/src/dhcp.c
index 7a6749d4..1e9fe186 100644
--- a/src/dhcp.c
+++ b/src/dhcp.c
@@ -86,6 +86,11 @@
 #define IPDEFTTL 64 /* RFC1340 */
 #endif
 
+/* NetBSD-7 has an incomplete IP_PKTINFO implementation. */
+#if defined(__NetBSD_Version__) && __NetBSD_Version__ < 800000000
+#undef IP_PKTINFO
+#endif
+
 /* Assert the correct structure size for on wire */
 __CTASSERT(sizeof(struct ip)   == 20);
 __CTASSERT(sizeof(struct udphdr)   == 8);
diff --git a/src/if-bsd.c b/src/if-bsd.c
index c3c95ba6..cdd959a6 100644
--- a/src/if-bsd.c
+++ b/src/if-bsd.c
@@ -1103,9 +1103,32 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam)
sin = (const void *)rti_info[RTAX_NETMASK];
mask.s_addr = sin != NULL && sin->sin_family == AF_INET ?
sin->sin_addr.s_addr : INADDR_ANY;
+
+#if defined(__NetBSD_Version__) && __NetBSD_Version__ < 800000000
+   /* NetBSD-7 and older send an invalid broadcast address.
+* So we need to query the actual address to get
+* the right one. */
+   {
+   struct in_aliasreq ifra;
+
+   memset(&ifra, 0, sizeof(ifra));
+   strlcpy(ifra.ifra_name, ifp->name,
+   sizeof(ifra.ifra_name));
+   ifra.ifra_addr.sin_family = AF_INET;
+   ifra.ifra_addr.sin_len = sizeof(ifra.ifra_addr);
+   ifra.ifra_addr.sin_addr = addr;
+   if (ioctl(ctx->pf_inet_fd, SIOCGIFALIAS, &ifra) == -1) {
+   if (errno != EADDRNOTAVAIL)
+   logerr("%s: SIOCGIFALIAS", __func__);
+   break;
+   }
+   bcast = ifra.ifra_broadaddr.sin_addr;
+   }
+#else
sin = (const void *)rti_info[RTAX_BRD];
bcast.s_addr = sin != NULL && sin->sin_family == AF_INET ?
sin->sin_addr.s_addr : INADDR_ANY;
+#endif
 
 #if defined(__FreeBSD__) || defined(__DragonFly__)
/* FreeBSD sends RTM_DELADDR for each assigned address
@@ -1134,8 +1157,8 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam)
if (ifam->ifam_type == RTM_DELADDR)
addrflags = 0 ;
else if ((addrflags = if_addrflags(ifp, &addr, NULL)) == -1) {
-   logerr("%s: if_addrflags: %s",
-   ifp->name, inet_ntoa(addr));
+   if (errno != EADDRNOTAVAIL)
+   logerr("%s: if_addrflags", __func__);
break;
}
 #endif
@@ -1160,7 +1183,8 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam)
if (ifam->ifam_type == RTM_DELADDR)
addrflags = 0;
else if ((addrflags = if_addrflags6(ifp, &addr6, NULL)) == -1) {
-   logerr("%s: if_addrflags6", ifp->name);
+   if (errno != EADDRNOTAVAIL)
+   logerr("%s: if_addrfl

Re: if_addrflags6: Can't assign requested address

2018-08-17 Thread Roy Marples

On 17/08/2018 09:04, Masanobu SAITOH wrote:

wm2: carrier lost
wm2: executing `/libexec/dhcpcd-run-hooks' NOCARRIER
wm2: deleting address fe80::1392:4012:56d8:a7a2
wm2: if_addrflags6: Can't assign requested address
wm2: if_addrflags6: Can't assign requested address
wm2: if_addrflags6: Can't assign requested address
wm2: if_addrflags6: Can't assign requested address
wm2: carrier acquired
wm2: executing `/libexec/dhcpcd-run-hooks' CARRIER


This helps.
I never saw this because on NetBSD-8, we have addrflags available in 
ifa_msghdr when sent over route(4). This does not exist on NetBSD-7 so 
we need to make an ioctl per address to work out the flags. Sadly, this 
is racy and this is what happens:


Something adds an address.
Kernel announces new address to route(4).
Something deletes this address.
Kernel announces the address deleted to route(4).

dhcpcd reads the address added message from route(4) *after* the address 
has been deleted from the kernel. Because dhcpcd needs the address flags 
at this point, an ioctl is made to the deleted address and boom, error.
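
The ordering above can be modelled with a toy program. This is only a sketch of the race, not dhcpcd code: the "kernel" is a single flag, the per-address ioctl is a stand-in function, and every name below is hypothetical. It shows why EADDRNOTAVAIL on a queued "address added" message is expected noise rather than a fault.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: the "kernel" holds at most one address, present or not. */
struct toy_kernel {
	bool addr_present;
};

/* Stand-in for the per-address ioctl: fails once the address is gone. */
static int
toy_addrflags(const struct toy_kernel *k)
{
	if (!k->addr_present) {
		errno = EADDRNOTAVAIL;
		return -1;
	}
	return 0;	/* address exists, no flags set */
}

/*
 * Handle a queued "address added" message.
 * Returns true if the failure deserves a log line, false otherwise.
 */
static bool
toy_handle_newaddr(const struct toy_kernel *k)
{
	assert(k != NULL);

	if (toy_addrflags(k) == -1)
		return errno != EADDRNOTAVAIL;	/* stale message: stay quiet */
	return false;
}
```

Any other errno from the lookup would still be reported, so a genuine failure is not hidden by the filter.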


Luckily dhcpcd handles it correctly and it's just noise.
Please test the attached patch to silence it.
If you can verify it works, let me know and I'll push a new version out.

Thanks

Roy
diff --git a/src/if-bsd.c b/src/if-bsd.c
index c3c95ba6..c03e4f6d 100644
--- a/src/if-bsd.c
+++ b/src/if-bsd.c
@@ -1134,8 +1134,8 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam)
if (ifam->ifam_type == RTM_DELADDR)
addrflags = 0 ;
else if ((addrflags = if_addrflags(ifp, &addr, NULL)) == -1) {
-   logerr("%s: if_addrflags: %s",
-   ifp->name, inet_ntoa(addr));
+   if (errno != EADDRNOTAVAIL)
+   logerr("%s: if_addrflags", __func__);
break;
}
 #endif
@@ -1160,7 +1160,8 @@ if_ifa(struct dhcpcd_ctx *ctx, const struct ifa_msghdr *ifam)
if (ifam->ifam_type == RTM_DELADDR)
addrflags = 0;
else if ((addrflags = if_addrflags6(ifp, &addr6, NULL)) == -1) {
-   logerr("%s: if_addrflags6", ifp->name);
+   if (errno != EADDRNOTAVAIL)
+   logerr("%s: if_addrflags6", __func__);
break;
}
 #endif
diff --git a/src/if.c b/src/if.c
index eaebefa5..c1c81eb6 100644
--- a/src/if.c
+++ b/src/if.c
@@ -240,7 +240,7 @@ if_learnaddrs(struct dhcpcd_ctx *ctx, struct if_head *ifs,
addrflags = if_addrflags(ifp, &addr->sin_addr,
ifa->ifa_name);
if (addrflags == -1) {
-   if (errno != EEXIST)
+   if (errno != EEXIST && errno != EADDRNOTAVAIL)
logerr("%s: if_addrflags: %s",
__func__,
inet_ntoa(addr->sin_addr));
@@ -266,7 +266,7 @@ if_learnaddrs(struct dhcpcd_ctx *ctx, struct if_head *ifs,
addrflags = if_addrflags6(ifp, &sin6->sin6_addr,
ifa->ifa_name);
if (addrflags == -1) {
-   if (errno != EEXIST)
+   if (errno != EEXIST && errno != EADDRNOTAVAIL)
logerr("%s: if_addrflags6", __func__);
continue;
}


Re: Running out of buffers?

2018-08-11 Thread Roy Marples

On 11/08/2018 16:41, Roy Marples wrote:

On 07/08/2018 17:54, Andreas Gustafsson wrote:

On April 28, Roy Marples wrote:

On 27/04/2018 23:58, Robert Elz wrote:
We really need to turn off the error on recv() by default - and allow it
to be turned on by applications that actually want to deal with this.


Why should we special case reporting this error instead of others?
While NetBSD might be the first BSD to report ENOBUFS for recv(), it's
certainly not the first OS to do so.


I suspect NetBSD may be the first and only to return ENOBUFS for recv()
on ordinary UDP sockets, and that this has broken BIND, which is
treating ENOBUFS on UDP recv() as an unrecoverable error; see PR misc/53421
and http://mail-index.netbsd.org/tech-kern/2018/08/07/msg023815.html .


There is not enough information to say for sure.
This could be a non validated address, the behaviour would be as described.


If there actually existed another OS that exhibited this behavior,
then surely BIND would have exposed the issue long ago, and either
BIND or the OS in case would have been fixed.

Please restore the old behavior, at least for UDP sockets.


Try reading the bind sources:
https://github.com/NetBSD/src/blob/trunk/external/bsd/bind/dist/lib/isc/unix/socket.c#L1923 



I'll quote it here for good measure:
ALWAYS_HARD(ENOBUFS, ISC_R_NORESOURCES);
/* Should never get this one but it was seen. */

That part of the code was imported into NetBSD over 10 years ago, which 
massively pre-dates my recent change to recv() on NetBSD.


Similar code in unbound (which I use extensively without issue) also 
tests for ENOBUFS, which again pre-dates my recv() change but instead 
works around the error instead of just calling it a day.


I've not found where bind opens the socket yet, but hopefully as it hard 
aborts specifically for ENOBUFS on recv it will ensure a large enough 
buffer is allocated for recv - unbound sets the maximum possible for 
reference.


So I found the code here:
https://github.com/isc-projects/bind9/blob/master/lib/isc/unix/socket.c#L309

/*%
 * The size to raise the receive buffer to (from BIND 8).
 */
#ifdef TUNE_LARGE
#ifdef sun
#define RCVBUFSIZE (1*1024*1024)
#else
#define RCVBUFSIZE (16*1024*1024)
#endif
#else
#define RCVBUFSIZE (32*1024)
#endif /* TUNE_LARGE */

So maybe bind just needs to be compiled with TUNE_LARGE set?
https://github.com/NetBSD/src/blob/trunk/external/bsd/bind/include/config.h#L597

Roy


Re: Running out of buffers?

2018-08-11 Thread Roy Marples

On 07/08/2018 17:54, Andreas Gustafsson wrote:

On April 28, Roy Marples wrote:

On 27/04/2018 23:58, Robert Elz wrote:

We really need to turn off the error on recv() by default - and allow it
to be turned on by applications that actually want to deal with this.


Why should we special case reporting this error instead of others?
While NetBSD might be the first BSD to report ENOBUFS for recv(), it's
certainly not the first OS to do so.


I suspect NetBSD may be the first and only to return ENOBUFS for recv()
on ordinary UDP sockets, and that this has broken BIND, which is
treating ENOBUFS on UDP recv() as an unrecoverable error; see PR misc/53421
and http://mail-index.netbsd.org/tech-kern/2018/08/07/msg023815.html .


There is not enough information to say for sure.
This could be a non validated address, the behaviour would be as described.


If there actually existed another OS that exhibited this behavior,
then surely BIND would have exposed the issue long ago, and either
BIND or the OS in case would have been fixed.

Please restore the old behavior, at least for UDP sockets.


Try reading the bind sources:
https://github.com/NetBSD/src/blob/trunk/external/bsd/bind/dist/lib/isc/unix/socket.c#L1923

I'll quote it here for good measure:
ALWAYS_HARD(ENOBUFS, ISC_R_NORESOURCES);
/* Should never get this one but it was seen. */

That part of the code was imported into NetBSD over 10 years ago, which 
massively pre-dates my recent change to recv() on NetBSD.


Similar code in unbound (which I use extensively without issue) also 
tests for ENOBUFS, which again pre-dates my recv() change but instead 
works around the error instead of just calling it a day.


I've not found where bind opens the socket yet, but hopefully as it hard 
aborts specifically for ENOBUFS on recv it will ensure a large enough 
buffer is allocated for recv - unbound sets the maximum possible for 
reference.


Roy


Re: if_addrflags6: Can't assign requested address

2018-08-11 Thread Roy Marples

Hi

On 08/08/2018 03:13, Masanobu SAITOH wrote:

  Hi.

  While testing netbsd-7, I've noticed dhcpcd put the following
message:

Configuring network interfaces: wm0wm0: if_addrflags6: Can't assign requested address

wm0: if_addrflags6: Can't assign requested address
wm0: if_addrflags6: Can't assign requested address
wm0: if_addrflags6: Can't assign requested address


  Can we ignore this message, or is it a real problem?

/etc/dhcpcd.conf is the default.


I just got back and cannot replicate this issue with the latest netbsd-7 
sources which ship with dhcpcd-7.0.7.


I use a XEN DOMU for testing. Can you provide more information please?

Roy




Re: if_addrflags6: Can't assign requested address

2018-08-08 Thread Roy Marples
That's a real problem
 I'm away from any test bed until next week so I'll try and look at it then.

Can you add debug to dhcpcd and maybe a logfile directive and attach the result 
to a reply please?

Roy

On 8 August 2018 04:13:30 CEST, Masanobu SAITOH  wrote:
>  Hi.
>
>  While testing netbsd-7, I've noticed dhcpcd put the following
>message:
>
>> Configuring network interfaces: wm0wm0: if_addrflags6: Can't assign
>requested address
>> wm0: if_addrflags6: Can't assign requested address
>> wm0: if_addrflags6: Can't assign requested address
>> wm0: if_addrflags6: Can't assign requested address
>
>  Can we ignore this message, or is it a real problem?
>
>/etc/dhcpcd.conf is the default.
>
>-- 
>---
> SAITOH Masanobu (msai...@execsw.org
>  msai...@netbsd.org)

-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: Cross-building release on MacOSX for amd64 fails...

2018-06-04 Thread Roy Marples

On 04/06/2018 17:11, K. Schreiner wrote:


...like so:

`progress.ro' is up to date.
 compile  dhcpcd/dhcp.o
/u/NetBSD/src/external/bsd/dhcpcd/dist/src/dhcp.c: In function 
'dhcp_arp_probed':
/u/NetBSD/src/external/bsd/dhcpcd/dist/src/dhcp.c:2105:2: error: implicit 
declaration of function 'ipv4ll_drop' [-Werror=implicit-function-declaration]
   ipv4ll_drop(ifp);
   ^~~
cc1: all warnings being treated as errors

*** Failed target:  dhcp.o
*** Failed command: /u/NetBSD/arch/amd64/TOOLS/bin/x86_64--netbsd-gcc -Os 
-fno-asynchronous-unwind-tables -pipe -fstack-protector -Wstack-protector 
--param ssp-buffer-size=1 -std=gnu99 -Wall -Wstrict-prototypes 
-Wmissing-prototypes -Wpointer-arith -Wno-sign-compare -Wsystem-headers 
-Wno-traditional -Wa,--fatal-warnings -Wreturn-type -Wswitch -Wshadow 
-Wcast-qual -Wwrite-strings -Wextra -Wno-unused-parameter -Wno-sign-compare 
-Wold-style-definition -Wconversion -Wsign-compare -Wformat=2 
-Wno-format-zero-length -Werror --sysroot=/u/NetBSD/arch/amd64/dest 
-DHAVE_CONFIG_H -D_OPENBSD_SOURCE -DSMALL -DARP -DINET -DINET6 -DDHCP6 
-I/u/NetBSD/src/external/bsd/dhcpcd/include 
-I/u/NetBSD/src/external/bsd/dhcpcd/dist/src 
-I/u/NetBSD/arch/amd64/obj/distrib/amd64/ramdisks/ramdisk/dhcpcd 
-D_FORTIFY_SOURCE=2 -c /u/NetBSD/src/external/bsd/dhcpcd/dist/src/dhcp.c
*** Error code 1

Stop.


My bad!

Should be fixed now.

Roy


Re: Running out of buffers?

2018-05-01 Thread Roy Marples


On 01/05/2018 20:21, Roy Marples wrote:

Another patch.
This time to handle a reported overflow listening to ND6.


This one actually works

Index: sys/netinet6/in6_proto.c
===
RCS file: /cvsroot/src/sys/netinet6/in6_proto.c,v
retrieving revision 1.122
diff -u -p -r1.122 in6_proto.c
--- sys/netinet6/in6_proto.c15 Mar 2018 08:15:21 -  1.122
+++ sys/netinet6/in6_proto.c1 May 2018 19:33:42 -
@@ -597,7 +597,7 @@ int pmtu_expire = 60*10;
  * Nominal space allocated to a raw ip socket.
  */
 #define RIPV6SNDQ   8192
-#define RIPV6RCVQ   8192
+#define RIPV6RCVQ   16384

 u_long rip6_sendspace = RIPV6SNDQ;
 u_long rip6_recvspace = RIPV6RCVQ;



Re: Running out of buffers?

2018-05-01 Thread Roy Marples

On 27/04/2018 21:34, Roy Marples wrote:

Hi Paul

On 27/04/2018 04:09, Paul Goyette wrote:

I've got lots of memory, so I don't understand what buffers are not
available.  Ever since upgrading to my current system (sources dated
2018-03-20 11:25:00 UTC), I've been seeing these messages at random
intervals:


Can you test the below patches please?
The kernel part bumps the default raw socket buffer from 8k to 16k
At least my ERLITE no longer complains about route socket overflow on boot.

The patch to syslogd ensures that the logpath socket receive buffer is a 
minimum of 16k - the current default is 4k.


Hopefully this fixes the issues and won't impact small memory devices 
too much.


Another patch.
This time to handle a reported overflow listening to ND6.

Index: sys/netinet6/in6_proto.c
===
RCS file: /cvsroot/src/sys/netinet6/in6_proto.c,v
retrieving revision 1.122
diff -u -p -r1.122 in6_proto.c
--- sys/netinet6/in6_proto.c15 Mar 2018 08:15:21 -  1.122
+++ sys/netinet6/in6_proto.c1 May 2018 19:18:22 -
@@ -597,7 +597,7 @@ int pmtu_expire = 60*10;
  * Nominal space allocated to a raw ip socket.
  */
 #define RIPV6SNDQ   8192
-#define RIPV6RCVQ   8192
+#define RIPV6RCV2   16384

 u_long rip6_sendspace = RIPV6SNDQ;
 u_long rip6_recvspace = RIPV6RCVQ;


Re: Running out of buffers?

2018-04-28 Thread Roy Marples

On 27/04/2018 23:58, Robert Elz wrote:

 Date:Fri, 27 Apr 2018 21:34:49 +0100
 From:Roy Marples <r...@marples.name>
 Message-ID:  <df3a6231-f417-1d79-f135-bef0fe2f5...@marples.name>

   | Hopefully this fixes the issues and won't impact small memory devices
   | too much.

While those are probably useful changes to make, they don't fix anything,
merely make it less likely.


Until we can dynamically size the buffer in the kernel on demand you are 
correct.



We really need to turn off the error on recv() by default - and allow it
to be turned on by applications that actually want to deal with this.


Why should we special case reporting this error instead of others?
While NetBSD might be the first BSD to report ENOBUFS for recv(), it's 
certainly not the first OS to do so.


Looking at Paul's logs, ntpd is reporting this a fair bit.
Looking at ntpd, it already *has* logic to deal exclusively with this 
error - it logs it and continues. Any other error and it closes the 
socket and gives up.
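
As a rough sketch of that kind of policy (not ntpd's actual code), a datagram receive loop can map errno to an action after recv() returns -1. The enum, the function name and the choice to retry on EINTR/EAGAIN are assumptions made for illustration; only the "ENOBUFS is logged and survived, anything else is fatal" split comes from the description above.

```c
#include <assert.h>
#include <errno.h>

/* What a datagram receive loop should do after recv() fails. */
enum recv_action {
	RECV_RETRY,	/* transient condition: call recv() again */
	RECV_RESYNC,	/* datagrams were dropped: log it, keep the socket */
	RECV_FATAL	/* anything else: close the socket and give up */
};

/* Hypothetical helper: map the errno of a failed recv() to an action. */
static enum recv_action
classify_recv_error(int err)
{
	assert(err != 0);	/* only meaningful after recv() returned -1 */

	switch (err) {
	case EINTR:
	case EAGAIN:
		return RECV_RETRY;
	case ENOBUFS:
		/* The receive buffer overflowed; the data is gone, the socket is fine. */
		return RECV_RESYNC;
	default:
		return RECV_FATAL;
	}
}
```

A caller that gets RECV_RESYNC should assume it missed messages and refresh whatever state it was tracking from another source.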


Roy


Re: Running out of buffers?

2018-04-27 Thread Roy Marples

Hi Paul

On 27/04/2018 04:09, Paul Goyette wrote:

I've got lots of memory, so I don't understand what buffers are not
available.  Ever since upgrading to my current system (sources dated
2018-03-20 11:25:00 UTC), I've been seeing these messages at random
intervals:


Can you test the below patches please?
The kernel part bumps the default raw socket buffer from 8k to 16k
At least my ERLITE no longer complains about route socket overflow on boot.

The patch to syslogd ensures that the logpath socket receive buffer is a 
minimum of 16k - the current default is 4k.


Hopefully this fixes the issues and won't impact small memory devices 
too much.


Roy
Index: sys/net/raw_cb.h
===
RCS file: /cvsroot/src/sys/net/raw_cb.h,v
retrieving revision 1.28
diff -u -p -r1.28 raw_cb.h
--- sys/net/raw_cb.h25 Sep 2017 01:56:22 -  1.28
+++ sys/net/raw_cb.h27 Apr 2018 20:30:55 -
@@ -57,7 +57,7 @@ struct rawcb {
  * Nominal space allocated to a raw socket.
  */
 #define RAWSNDQ 8192
-#define RAWRCVQ 8192
+#define RAWRCVQ 16384
 
 LIST_HEAD(rawcbhead, rawcb);
 
Index: usr.sbin/syslogd/syslogd.c
===
RCS file: /cvsroot/src/usr.sbin/syslogd/syslogd.c,v
retrieving revision 1.124
diff -u -p -r1.124 syslogd.c
--- usr.sbin/syslogd/syslogd.c  10 Sep 2017 17:01:07 -  1.124
+++ usr.sbin/syslogd/syslogd.c  27 Apr 2018 20:30:56 -
@@ -75,6 +75,9 @@ __RCSID("$NetBSD: syslogd.c,v 1.124 2017
 #include "syslogd.h"
 #include "extern.h"
 
+/* Minimum size of the logpath socket buffer */
+#define RCVBUFLEN   16384
+
 #ifndef DISABLE_SIGN
 #include "sign.h"
 struct sign_global_t GlobalSign = {
@@ -480,6 +483,9 @@ getgroup:
die(0, 0, NULL);
}
for (j = 0, pp = LogPaths; *pp; pp++, j++) {
+   int buflen;
+   socklen_t socklen = sizeof(buflen);
+
DPRINTF(D_NET, "Making unix dgram socket `%s'\n", *pp);
unlink(*pp);
memset(&sunx, 0, sizeof(sunx));
@@ -493,6 +499,19 @@ getgroup:
die(0, 0, NULL);
}
DPRINTF(D_NET, "Listening on unix dgram socket `%s'\n", *pp);
+   if (getsockopt(funix[j], SOL_SOCKET, SO_RCVBUF,
+  &buflen, &socklen) == -1) {
+   logerror("getsockopt: SO_RCVBUF: `%s'", *pp);
+   continue;
+   }
+   if (buflen >= RCVBUFLEN)
+   continue;
+   buflen = RCVBUFLEN;
+   if (setsockopt(funix[j], SOL_SOCKET, SO_RCVBUF,
+  &buflen, socklen) == -1) {
+   logerror("setsockopt: SO_RCVBUF: `%s'", *pp);
+   continue;
+   }
}
 
if ((fklog = open(_PATH_KLOG, O_RDONLY, 0)) < 0) {


Re: Running out of buffers?

2018-04-27 Thread Roy Marples

On 27/04/2018 09:45, Patrick Welche wrote:

The very odd situation in which I saw those buffer overflows, is simply
on a home machine, so flaky home broadband, running a pkg_rolling-replace.
The machine has 32G of memory, but from your message that is irrelevant.
The urtwn0 was struggling (that's new BTW) and I kept having to
/etc/rc.d/wpa_supplicant restart. While texlive was being downloaded,
I hit ctrl-C, and then saw the reams of buffer messages. In terms of
routing, there is just 1 default route. Maybe all the wpa_supplicant
restarts and dhcpcd kicking in helped? (Doesn't really fit the picture...)


Which application logged the error? Both wpa_supplicant and dhcpcd look 
at route(4).

dhcpcd will note it and call getifaddrs(3) to resync the state of affairs.
wpa_supplicant will just log the error and continue.
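
As a rough sketch of the "note it and resync" approach described above (this is not dhcpcd's actual code; the function names resync_addresses and read_or_resync are made up for illustration), a reader of an event socket can treat ENOBUFS as "messages were dropped" and rebuild its view from getifaddrs(3):

```c
#include <errno.h>
#include <ifaddrs.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: count the addresses currently configured, as a
 * stand-in for a full state resync.  Returns -1 if getifaddrs(3) fails. */
int
resync_addresses(void)
{
	struct ifaddrs *ifaddrs, *ifa;
	int n = 0;

	if (getifaddrs(&ifaddrs) == -1)
		return -1;
	for (ifa = ifaddrs; ifa != NULL; ifa = ifa->ifa_next) {
		if (ifa->ifa_addr != NULL)
			n++;
	}
	freeifaddrs(ifaddrs);
	return n;
}

/* Hypothetical helper: read one message from fd.  If the kernel reports
 * ENOBUFS, some messages were lost, so resync from getifaddrs(3) instead
 * of trusting the event stream.
 * Returns 0 if a message was read, 1 if a resync was done, -1 otherwise. */
int
read_or_resync(int fd, void *buf, size_t len)
{
	if (recv(fd, buf, len, 0) != -1)
		return 0;
	if (errno != ENOBUFS)
		return -1;
	return resync_addresses() == -1 ? -1 : 1;
}
```

A real client would compare the getifaddrs(3) result against its own address list rather than just counting entries; the point is only that the overflow is recoverable once it is reported.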

Roy


Re: Running out of buffers?

2018-04-27 Thread Roy Marples



On 27/04/2018 07:05, Robert Elz wrote:

 Date:Fri, 27 Apr 2018 05:18:16 +0100
 From:Roy Marples <r...@marples.name>
 Message-ID:  <58598dae-238e-44a5-e74f-bbb2fdd7b...@marples.name>

   | No-one has yet weighed in on how this should be resolved.

Go back to silently discarding the error (at least, by default).
Datagram type services (which is what the routing socket,
and others like it, are) generally are just "best effort" with
no error reporting at all.   The higher level protocol needs to
cope.

However, some kind of sockioctl() to enable error reporting
would be OK, for applications that actually need to know
(but they still need to cope with data lost for other reasons.)

What might be interesting to discover however, is just why
there are so many routing socket errors with the buffer space
exhausted - particularly on a huge system like Paul's.  I would
have expected this to be rare assuming everything else is
working correctly.


My understanding (and I've not looked, so could be wrong) is that the 
routing socket gets a 2k buffer by default regardless of how big your 
memory is.


Since NetBSD-5, I've modified the kernel to announce IPv6 address state 
changes, introduced IPv4 address state changes which are also announced 
AND added a layer of compat to the more generic RTM messages so that 
interface address changes can report back PID and flags. In other words, 
while a 2k buffer might have been fine for NetBSD-4 (and we'll never 
really know because overflow errors were silently dropped) it might not 
be fine for a router with many addresses that all become active when the
internet decides to work. This is an important part because of all the
NetBSD machines I have, the routing socket only overflows on my ERLITE 
router. The other physical servers, laptops and VMs I have do not.


Roy


Re: Running out of buffers?

2018-04-26 Thread Roy Marples

On 27/04/2018 04:09, Paul Goyette wrote:
I've got lots of memory, so I don't understand what buffers are not 
available.  Ever since upgrading to my current system (sources dated 
2018-03-20 11:25:00 UTC), I've been seeing these messages at random 
intervals:


Apr 23 05:51:33 speedy ntpd[526]: routing socket reports: No buffer 
space available


This may come as some surprise, but the only change is that the error is
now logged. Previously it was silently discarded.

No-one has yet weighed in on how this should be resolved.



I never saw them with a previous kernel (from March 3rd), so it
would seem that something changed between the 3rd and 20th.

Is anyone else seeing similar?

Any clues on what changed?

The situation doesn't seem fatal (at least, not yet), but I'd like
to mitigate the condition before it gets worse.  :)


Ideas welcome!
The only one stop solution I can think of is increasing the default
buffer size, but this might adversely affect small memory systems.


Thanks in advance for any suggestions.


Looking forward to hearing some!

Roy


Re: -current cloner interfaces broken/gone/unusable

2018-04-26 Thread Roy Marples

On 23/04/2018 23:34, Robert Swindells wrote:


Frank Kardel  wrote:

using -current as of 20180421 (NetBSD 8.99.14 (GENERIC) #0: Sat Apr 21
23:01:29 UTC 2018
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64)

no cloning interfaces are visible:

gateway# ifconfig -l
ixg0 ixg1 ixg2 ixg3 lo0 tun0 tun1
gateway# ifconfig -C
ifconfig: SIOCIFGCLONERS for count: Device not configured
gateway# ifconfig vlan0 create
ifconfig: clone_command: Device not configured
ifconfig: exec_matches: Device not configured
gateway#

This does not seem to be a desirable state - any clues what broke here ?


It looks to be the test for a valid interface name in
sys/compat/common/uipc_syscalls_50.c that is causing this, I think it
should only be done when the ioctl command is SIOCGIFDATA or SIOCZIFDATA.

This works for me but is a bit ugly:

Index: uipc_syscalls_50.c
===
RCS file: /cvsroot/src/sys/compat/common/uipc_syscalls_50.c,v
retrieving revision 1.4
diff -u -r1.4 uipc_syscalls_50.c
--- uipc_syscalls_50.c  12 Apr 2018 18:50:13 -  1.4
+++ uipc_syscalls_50.c  23 Apr 2018 22:33:14 -
@@ -63,9 +63,17 @@
 struct ifnet *ifp;
 int error;
  
-   ifp = ifunit(ifdr->ifdr_name);
-   if (ifp == NULL)
-   return ENXIO;
+   switch (cmd) {
+   case SIOCGIFDATA:
+   case SIOCZIFDATA:
+   ifp = ifunit(ifdr->ifdr_name);
+   if (ifp == NULL)
+   return ENXIO;
+   break;
+   default:
+   ifp = NULL;
+   break;
+   }
  
 switch (cmd) {
 case SIOCGIFDATA:



Committed, thanks

Roy


Re: Automated report: NetBSD-current/i386 test failure

2018-04-24 Thread Roy Marples

On 24/04/2018 15:27, Martin Husemann wrote:

On Mon, Apr 23, 2018 at 08:51:52AM +, NetBSD Test Fixture wrote:

This is an automatically generated notice of new failures of the
NetBSD test suite.

The newly failing test cases are:

 net/ndp/t_ra:ra_basic

[..]

 2018.04.20.11.25.39 roy src/usr.sbin/rtadvd/rtadvd.c,v 1.63
 2018.04.20.11.31.54 roy src/usr.sbin/rtadvd/rtadvd.c,v 1.64
 2018.04.20.13.27.45 roy src/usr.sbin/rtadvd/timer.c,v 1.16
 2018.04.20.13.27.45 roy src/usr.sbin/rtadvd/timer.h,v 1.10
 2018.04.20.15.29.19 roy src/usr.sbin/rtadvd/config.c,v 1.39
 2018.04.20.15.57.23 roy src/usr.sbin/rtadvd/config.c,v 1.40
 2018.04.20.15.57.23 roy src/usr.sbin/rtadvd/rtadvd.c,v 1.65
 2018.04.20.15.57.23 roy src/usr.sbin/rtadvd/rtadvd.h,v 1.17
 2018.04.20.15.59.17 roy src/usr.sbin/rtadvd/timer.c,v 1.17
 2018.04.20.16.07.48 roy src/usr.sbin/rtadvd/timer.c,v 1.18
 2018.04.20.16.18.18 roy src/usr.sbin/rtadvd/rtadvd.h,v 1.18
 2018.04.20.16.37.17 roy src/usr.sbin/rtadvd/rtadvd.conf.5,v 1.19
 2018.04.20.16.37.17 roy src/usr.sbin/rtadvd/rtadvd.h,v 1.19


The test seems to assume that rtadvd will send at least one RA at
startup, but the kernel does not count any in this case.

So the test case loops in "await_RA" with a current RA count of 0, until
ATF times the whole process out.


Maybe the kernel is rejecting the RA somehow?
I've checked and rtadvd is still sending 3 unsolicited RA's - one pretty 
much ASAP and two 15 seconds afterwards, all with some randomisation.


Enable nd6_debug on the rump kernel expecting to process them and check 
for errors.


Roy


Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Roy Marples

Hi Tom

On 24/04/2018 12:39, Tom Ivar Helbekkmo wrote:

Thomas Klausner <t...@giga.or.at> writes:


On Tue, Apr 24, 2018 at 08:56:48AM +0100, Roy Marples wrote:

Saying this, from what I'm hearing this only happens at boot time, so we
could potentially shrink the buffer back down again if we need to consider
dynamically growing it in the kernel as well. No idea if that's even
possible or what performance impact it would have.


I had an application report an UDP error with "no buffer space
available". I don't remember the exact error, sorry. But it was
definitely some time after system start.
  Thomas


I keep getting those, and have been for a long, long time:

Apr 24 02:44:27 barsoom openvpn[301]: write UDPv4: No buffer space available 
(code=55)
Apr 24 05:54:47 barsoom openvpn[292]: write UDPv4: No buffer space available 
(code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available 
(code=55)
Apr 24 07:24:54 barsoom openvpn[292]: write UDPv4: No buffer space available 
(code=55)
Apr 24 08:53:08 barsoom openvpn[292]: write UDPv4: No buffer space available 
(code=55)
Apr 24 08:53:09 barsoom openvpn[292]: write UDPv4: No buffer space available 
(code=55)
Apr 24 10:15:09 barsoom openvpn[305]: write UDPv4: No buffer space available 
(code=55)
Apr 24 10:45:14 barsoom openvpn[305]: write UDPv4: No buffer space available 
(code=55)
Apr 24 11:35:18 barsoom openvpn[305]: write UDPv4: No buffer space available 
(code=55)
Apr 24 13:15:12 barsoom openvpn[305]: write UDPv4: No buffer space available 
(code=55)


This is unrelated to the issue at hand.

That's an upstream issue - the send and write family calls have been 
returning ENOBUFS for quite a while on all OS's I know of.
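
For applications that want to cope with this themselves, a bounded retry is one option. The following is only a sketch under that assumption (it is not what openvpn does, and send_retry_enobufs is a made-up name):

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: send one datagram, retrying up to max_tries times
 * while the kernel reports ENOBUFS.  Returns the result of the last
 * send(2); errno is whatever that last attempt set.  With max_tries < 1
 * nothing is sent and -1 is returned. */
ssize_t
send_retry_enobufs(int fd, const void *buf, size_t len, int max_tries)
{
	ssize_t n = -1;
	int i;

	for (i = 0; i < max_tries; i++) {
		if (i > 0)
			(void)poll(NULL, 0, 10); /* wait 10ms for queues to drain */
		n = send(fd, buf, len, 0);
		if (n != -1 || errno != ENOBUFS)
			break;
	}
	return n;
}
```

Whether retrying is appropriate depends on the protocol; for a best-effort datagram service, dropping the packet and letting the higher layer recover is just as valid.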


Roy


Re: -current cloner interfaces broken/gone/unusable

2018-04-24 Thread Roy Marples

On 24/04/2018 08:26, Martin Husemann wrote:

On Tue, Apr 24, 2018 at 07:30:04AM +0200, Frank Kardel wrote:

syslogd has sometimes issues with /var/run/log
2018-04-24T05:13:34.542548+00:00 gateway syslogd 408 - - recvfrom() unix
`/var/run/log': No buffer space available


This is a separate change and unrelated to compatibility. It happens
with up to date binaries as well. I think it was a silent bug before
and has now been made more verbose. Still pretty annoying and happens
for me on various machines on every boot. Roy, did you have a chance to
look at it?


Not yet no. But yes, in all releases prior it was a silent bug on all 
types of socket and in all the BSDs as well. I know, I checked - only 
OpenBSD has an overflow check like this and they solve that with a magic 
message on route(4) only which is just yuck as it makes the problem worse.


I only have one machine where I can reliably repro this, my erlite and 
that only happens because route(4) overflows (detected in dhcpcd) as 
it's a router and the box isn't up yet and a load of address validation 
flows over the socket when the link comes up. This is a good thing, 
because dhcpcd can then react to the error and sync its state using
getifaddrs().


I think the easiest fix is to increase the default size of the socket 
buffer. Where this is done, I don't know but could find out if pushed.

This would fix everything if the default buffer was big enough.

Saying this, from what I'm hearing this only happens at boot time, so we 
could potentially shrink the buffer back down again if we need to 
consider dynamically growing it in the kernel as well. No idea if that's 
even possible or what performance impact it would have.


The last option is to increase the socket buffer size in all affected 
applications using ioctl (or is it setsockopt?). But to what value I 
don't know. Trial and error?
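
For reference, the per-application route is setsockopt(2) with SO_RCVBUF. A minimal sketch, assuming a hypothetical helper name ensure_rcvbuf and leaving the actual value open, might look like:

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: grow the receive buffer of fd to at least min_len
 * bytes, leaving it alone if it is already that large.
 * Returns 0 on success, -1 on error. */
int
ensure_rcvbuf(int fd, int min_len)
{
	int buflen;
	socklen_t socklen = sizeof(buflen);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &buflen, &socklen) == -1)
		return -1;
	if (buflen >= min_len)
		return 0;
	buflen = min_len;
	return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &buflen, sizeof(buflen));
}
```

Checking the current size first means a larger system default is never shrunk; the kernel may still cap or round the value it is asked for.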


Roy


Re: dhcpcd vs dhclient: part II (fxp0)

2018-03-08 Thread Roy Marples

On 08/03/2018 14:50, John D. Baker wrote:

I see this behavior with all fxp(4) interfaces under 'dhcpcd'.  The
carrier status from the interface bounces and causes 'dhcpcd' to
repeatedly configure/unconfigure the interface.

The workaround I use is to add:

   interface fxp0
 nolink

to my "/etc/dhcpcd.conf" file or pass "-K" option on command line.


This should be alleviated somewhat in -current and -8 with the new 
address handling.
The link will still flap, but dhcpcd will now persist the lease and give 
time for the lease to be renewed before expiring it and going into 
discovery once more.


Roy


Re: DHCP client: dhclient vs dhcpcd ?

2018-02-02 Thread Roy Marples

On 02/02/2018 12:24, Riccardo Mottola wrote:
does dhpcd share code/features with dhcpcd found on other systems beyond 
the name?


Every feature bar one found in ISC dhclient can be found in dhcpcd.
The one missing feature is the ability of the DHCP client to directly 
update a DNS server with its FQDN/hostname. My view is that this is
best left to the DHCP server to handle the update as frequently both are 
on the same machine and thus no extra security is needed.


dhcpcd also supports a very similar script environment - most of the 
variables have the same name and format.


No code is shared at all between the two projects.

Do you have any details on why dhcpcd failed and how dhclient worked, 
like say packet captures?
I can probably guess though - some DHCP servers only work when the 
client id is in a format they know. However, this is not RFC 
compliant. Luckily dhcpcd can be configured to send a client id the
DHCP server does like, but this is not out of the box config, but is 
documented in said config.


Likewise dhclient doesn't work on links where a clientid is required 
out of the box. 


I too essentially always use dhclient, it works, while I had issues with 
dhcpcd. I have not yet had a situation where the opposite is true, but I am not
doubting there are.


dhcpcd not working where the others work has another unpleasant side 
effect: not working during the installer.


I will see if I can find again a network where dhcpcd fails.. I hope it
was like at home, at the office or at my parents, so it is something 
easily to reproduce. If it is a network on a customer's site I might not 
have access to it anymore.. who knows which one it was!


One thing I did not try at the time is to see if it was a purely dhcp 
issue or also network card dependent (e.g. wireless would work, wired
not). I seem to remember that fiddling with the media type helped on
the wired network, but I might be confusing the issues and in any case
dhclient "did all the magic" dhcpcd not :)


I don't think dhclient does anything special with the media.
I will say that we have a PR where dhcpcd fails during the installer and 
dhclient works, but this turned out to be a hardware failure with the 
carrier detection. Swapping out the faulty hardware made the problem go 
away. dhcpcd is very sensitive to carrier up/down events.


This has improved a lot in NetBSD-8 thanks to moving IPv4 DaD from 
dhcpcd into the kernel which allows dhcpcd to maintain the lease on the 
interface if the link flaps and still be a good network citizen.


Roy


Re: DHCP client: dhclient vs dhcpcd ?

2018-02-01 Thread Roy Marples

Hi Thomas

On 01/02/2018 07:17, Thomas Mueller wrote:

On Wed, Jan 31, 2018 at 1:18 PM, KIRIHARA Masaharu  wrote:

NetBSD has two DHCP clients; dhclient(8) and dhcpcd(8).
What's the difference?
Which is better to use?


I'm biased as I maintain dhcpcd, but dhcpcd is better in every way.

 
On Wed, 31 Jan 2018 13:47:42 +0100, Benny Siegert responded:



I agree that this is confusing. dhclient is the older tool, while
dhcpcd has been created by a NetBSD developer, is newer and smaller. I
have run into situations (on Google Compute Engine for instance) where
dhclient was unable to interpret some of the more modern DHCP
features.
  

I recommend using dhcpcd :)


I have read about NetBSD planning to drop dhclient in favor of dhcpcd.

I have had installations where dhcpcd succeeded where dhclient failed, and (7.99.1 
amd64) where dhclient succeeded where dhcpcd failed.
Failure means not being able to set up the internet connection even if the 
command ran without error messages.


Do you have any details on why dhcpcd failed and how dhclient worked, 
like say packet captures?
I can probably guess though - some DHCP servers only work when the 
client id is in a format they know. However, this is not RFC compliant. 
Luckily dhcpcd can be configured to send a client id the DHCP server
does like, but this is not out of the box config, but is documented in 
said config.


Likewise dhclient doesn't work on links where a clientid is required out 
of the box.



I have also had a situation where neither dhcpcd nor dhclient could establish 
the internet connection, but I was able to connect by using ifconfig and route 
directly.


More details please.



I notice NetBSD's dhclient is very big while FreeBSD's dhclient is much 
smaller, like

$ ls -l /sbin/dhclient
-r-xr-xr-x  1 root  wheel  100056 Jul 31  2017 /sbin/dhclient
$ ls -l /media/zip0/sbin/dh*
-r-xr-xr-x  1 root  wheel  5352184 Jun 20  2017 /media/zip0/sbin/dhclient
-r-xr-xr-x  1 root  wheel 6221 Jun 20  2017 /media/zip0/sbin/dhclient-script
-r-xr-xr-x  1 root  wheel   299176 Jun 20  2017 /media/zip0/sbin/dhcpcd

running from FreeBSD 11.1-STABLE where /media/zip0 is mount point for NetBSD 
8.99.1 installation.

FreeBSD uses dhclient in base system, which does not include dhcpcd.


FreeBSD dhclient is based on OpenBSD one, which is basically a very 
stripped down and old ISC dhclient which supports DHCPv4 only and isn't 
extendable.
NetBSD ships a more modern and non stripped down ISC dhclient which is 
more bloaty and extendable but offers more features like say DHCPv6.


For a fair comparison dhcpcd can be compiled for DHCPv4 only (like 
FreeBSD and OpenBSD) and it is currently 120k on i386. But even then, 
that includes the control socket code AND custom DHCP option parsing 
code to pass to shell scripts which cannot currently be stripped out.


But frankly, with your numbers above, a client with all the features 
dhcpcd has and only weighing in at 299176 on disk is pretty impressive 
- newer versions in more recent NetBSD are smaller still.


Roy


Re: Crash related to VLANs in Oct 18th -current

2017-10-24 Thread Roy Marples

On 24/10/17 23:34, Roy Marples wrote:



On 24/10/17 23:30, Roy Marples wrote:

On 24/10/17 13:27, Tom Ivar Helbekkmo wrote:

Roy Marples <r...@marples.name> writes:


This should only happen when dhcpcd is restarted.


I just checked, and when I restart dhcpcd (from current, with your
latest patch manually added), it correctly does a gratuitous arp
announcement on the right VLAN -- while the UDP checksum error messages
are comfortably absent.  :)


You should also apply this subsequent patch picked up during further 
testing:
https://roy.marples.name/git/dhcpcd.git/commit/?id=8dc83479f50e2ed8b51c5a9383d27367bea1ecea 


Whups :)
Minor change here also:
https://roy.marples.name/git/dhcpcd.git/commit/?id=b091529ddd7d0541548b0a41e78a84bcc65364ef 


Must be a bad hair day.
https://roy.marples.name/git/dhcpcd.git/commit/?id=9ab9c8f51d05a0cb07d1ce641eabfdab61cb107d
https://roy.marples.name/git/dhcpcd.git/commit/?id=621d35c15337577c154ca549aedf4649cc524ba9

I think that should be it now.
All test cases on all platforms currently passing.

Roy


Re: Crash related to VLANs in Oct 18th -current

2017-10-24 Thread Roy Marples



On 24/10/17 23:30, Roy Marples wrote:

On 24/10/17 13:27, Tom Ivar Helbekkmo wrote:

Roy Marples <r...@marples.name> writes:


This should only happen when dhcpcd is restarted.


I just checked, and when I restart dhcpcd (from current, with your
latest patch manually added), it correctly does a gratuitous arp
announcement on the right VLAN -- while the UDP checksum error messages
are comfortably absent.  :)


You should also apply this subsequent patch picked up during further 
testing:
https://roy.marples.name/git/dhcpcd.git/commit/?id=8dc83479f50e2ed8b51c5a9383d27367bea1ecea 


Whups :)
Minor change here also:
https://roy.marples.name/git/dhcpcd.git/commit/?id=b091529ddd7d0541548b0a41e78a84bcc65364ef


Re: Crash related to VLANs in Oct 18th -current

2017-10-24 Thread Roy Marples

On 24/10/17 13:27, Tom Ivar Helbekkmo wrote:

Roy Marples <r...@marples.name> writes:


This should only happen when dhcpcd is restarted.


I just checked, and when I restart dhcpcd (from current, with your
latest patch manually added), it correctly does a gratuitous arp
announcement on the right VLAN -- while the UDP checksum error messages
are comfortably absent.  :)


You should also apply this subsequent patch picked up during further testing:
https://roy.marples.name/git/dhcpcd.git/commit/?id=8dc83479f50e2ed8b51c5a9383d27367bea1ecea

Roy


Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Roy Marples

On 23/10/2017 12:18, Roy Marples wrote:

On 23/10/2017 11:28, Tom Ivar Helbekkmo wrote:

Has something changed that makes dhcpcd now insist on listening to all
interfaces (including the 802.1q trunk)?


Yes.
I will try and improve the logic so it's only the relevant interfaces.
The change was made to allow IP address sharing on many interfaces via
DHCP without actually removing the IP address from the non active
interfaces.
This might have been over-zealous on my part.


Can I make it not do that?


Currently not, no.
Hopefully I can change it so that no toggle for it is needed.


Patch here to make it not do this anymore:
https://roy.marples.name/git/dhcpcd.git/commit/?id=c72da9a1ce60d006136c5aa3e1c923d96761a171

The caveat is that we now need to ARP announce the address during reboot 
to ensure dhcpcd gets the reply on an active interface.


Let me know how it works for you.

Roy



Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Roy Marples
On 23/10/2017 14:08, Thor Lancelot Simon wrote:
> I think it is safe to say that an interface which is participating
> in an interface stack such as vlan or agr should never be given an
> address unless the user has explicitly configured the system to do
> so.  The sane default is to give addresses to the leaf interfaces
> only (e.g. vlan) not the root nor intermediate nodes (wm, agr, etc --
> noting of course that any of these interfaces _could_ be the leaf,
> but in fact are not).

The mere act of bringing an interface up will generally assign it an
IPv6 link-local address.
dhcpcd doesn't change this behaviour.

Luckily this can be disabled in dhcpcd quite easily:
# Global default is IPv6 on all interfaces
interface wm0
noipv6 # Disable IPv6 on wm0

Or reverse the logic
noipv6 # Disable IPv6 globally
interface wm0
ipv6 # Enable IPv6 for wm0

Or just disallow the interface entirely:
denyinterfaces wm0

Or just allow some interfaces whilst denying others:
allowinterfaces wm0

And you can stop the kernel from doing this too if not using dhcpcd
ndp -i wm0 -- -auto_linklocal

Roy


Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Roy Marples
On 23/10/2017 11:28, Tom Ivar Helbekkmo wrote:
> Has something changed that makes dhcpcd now insist on listening to all
> interfaces (including the 802.1q trunk)?

Yes.
I will try and improve the logic so it's only the relevant interfaces.
The change was made to allow IP address sharing on many interfaces via
DHCP without actually removing the IP address from the non active
interfaces.
This might have been over-zealous on my part.

> Can I make it not do that?

Currently not, no.
Hopefully I can change it so that no toggle for it is needed.

> Oh, and I notice that IPv6 generates a local address on wm0, as on
> everything else.  That just looks weird on an 802.1q trunk.  Is there a
> way to make it not do that?

I don't know anything about 802.1q trunks.
How can I tell that it is one, and why shouldn't it have a local address?

> 
> # cat /etc/ifconfig.wm0
> 
> up
> media 100baseTX mediaopt full-duplex
> ip4csum tcp4csum udp4csum
> 
> # ifconfig wm0
> 
> wm0: flags=0x8843 mtu 1500
> capabilities=2bf80
> capabilities=2bf80
> capabilities=2bf80
> enabled=3f00
> enabled=3f00
> ec_capabilities=7
> ec_enabled=3
> address: 00:13:72:f7:00:06
> media: Ethernet 100baseTX full-duplex
> status: active
> inet6 fe80::213:72ff:fef7:6%wm0/64 flags 0x0 scopeid 0x1
> 
> Which VLAN is that IPv6 address on, anyway?  :)

No idea.
It's the address belonging to wm0 interface.
See my earlier query.

Even if dhcpcd is not used, if IPv6 is enabled in the kernel and
auto-link local is set for the interface (which it is by default and it
looks like you've not disabled it in ifconfig.wm0) then you would get
this address anyway.

Roy



Re: Crash related to VLANs in Oct 18th -current

2017-10-23 Thread Roy Marples
On 23/10/2017 07:42, Kengo NAKAHARA wrote:
> Hi,
> 
> On 2017/10/22 23:56, Tom Ivar Helbekkmo wrote:
>> Tom Ivar Helbekkmo  writes:
>>
>>> That did the trick!  Thank you!  :)
> 
> Thank you for your testing!
> 
>> I'm actually wondering if there may be something else strange going on.
>> Everything works fine -- but I have this dhcpcd running, because one of
>> my VLANs is connected to a network where this machine has to accept a
>> DHCP provisioned IP address from a server.  I run "dhcpcd -q vlan9", and
>> also give it a configuration file that should keep it from doing
>> anything I don't want:
>>
>> allowinterfaces vlan9
>> interface vlan9
>> background
>> persistent
>> hostname_short
>> nogateway
>> nohook resolv.conf, wpa_supplicant, hostname, ntp.conf
>> script /usr/bin/true

You could use script /dev/null or maybe just script by itself, then
dhcpcd won't even try and call the script. Which makes it more efficient.

>>
>> However, after this last upgrade, I keep getting messages from dhcpcd
>> about other interfaces, where this host is the DHCP server, like:
>>
>> Oct 22 16:48:28 barsoom dhcpcd[16236]: vlan2: invalid UDP packet from
>> 172.27.201.1
>> Oct 22 16:48:28 barsoom dhcpcd[16236]: wm0: invalid UDP packet from
>> 172.27.201.1
>>
>> This happens every time a host on one of the other VLANs gets an address
>> from the local DHCP server, and I get this pair of messages; one for the
>> VLAN in question, one for wm0, which is the vlanif with the trunk on it.
>>
>> Running 8.99.1 from about two months ago, these messages did not occur.

This normally indicates a UDP checksum failure.
For future versions, I've improved the message here:
https://roy.marples.name/git/dhcpcd.git/commit/?id=53bad6f740d66108c7412a492819e4c7e17bff51
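
For anyone unfamiliar with what is being checked, the core of the UDP checksum is the Internet checksum from RFC 1071. The sketch below shows only that ones' complement sum; a real UDP check also covers a pseudo-header (source and destination address, protocol, length), and this is not dhcpcd's code. The name inet_cksum16 is made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: Internet checksum (RFC 1071), i.e. the ones'
 * complement of the ones' complement sum of the data taken as 16-bit
 * big-endian words.  Intended for packet-sized inputs (up to 64k). */
uint16_t
inet_cksum16(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;

	while (len > 1) {
		sum += (uint32_t)data[0] << 8 | data[1];
		data += 2;
		len -= 2;
	}
	if (len == 1)
		sum += (uint32_t)data[0] << 8; /* pad the odd byte with zero */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16); /* fold the carries back in */
	return (uint16_t)~sum;
}
```

For the example bytes in RFC 1071 (00 01 f2 03 f4 f5 f6 f7) this yields 0x220d, and running it over data that already includes a correct checksum yields 0, which is how a receiver validates a packet.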

> 
> Hmm..., sorry, I am not sure about this problem from that information.
> Could you get tcpdump? Of course, if it is not a problem, please do it.
> 
> 
>> roy@n.o
> 
> I think the issue seems to be related to DHCP. Could you think of any
> other way to solve it?

Maybe try disabling hardware processing of UDP checksums on the interface?

Roy


Re: Any actions regarding WPA2 vulnerabilities

2017-10-17 Thread Roy Marples

On 16/10/2017 20:40, m...@netbsd.org wrote:

On Mon, Oct 16, 2017 at 06:26:09PM +0200, Dmitry Salychev wrote:

Hi, guys.

Are there patches for these WPA2 vulnerabilities? Are there affected ports?
I haven't seen any message regarding the subject. Thanks.

Regards,

- Dmitry


Hi,

We rely on wpa_supplicant/hostapd for WPA2. They have released a set of
patches and spz@ already patched -current, it is also pullup-8 #324,
pullup-7 #1517, pullup-6 #1507.


Many thanks to spz@ for the fast application of patches!


Re: -current vs MKINET6=NO

2017-08-12 Thread Roy Marples

On 12/08/2017 06:09, Geoff Wing wrote:

Hi,
the following files need changes to build a full tree with MKINET6=NO

external/apache2/mDNSResponder/dist/mDNSPosix/mDNSUNP.c
external/bsd/dhcpcd/dist/src/dhcpcd.c
external/bsd/dhcpcd/dist/src/if-bsd.c
external/bsd/tcpdump/bin/Makefile

mDNSUNP.c needs
#include 
for some IFF_* definitions.

dhcpcd stuff needs quite a few changes to remove calls to ip6 stuff


dhcpcd patch is quite simple.
https://dev.marples.name/rDHC32ee94da8d8c9d15a28a92ddb6760baf2c87fd23

And one to build without INET (not that we have a knob for that atm)
https://dev.marples.name/rDHC90fabbf1826344d53835f7054655792baf7aa0b4

I'll look into importing a new dhcpcd with these changes and many others 
this weekend.


Roy


Re: long delay getting address from ISP w/-current dhcpcd

2017-07-25 Thread Roy Marples
Hi John

On 24/07/2017 21:46, John D. Baker wrote:
> On Mon, 24 Jul 2017, John D. Baker wrote:
> 
>> Now that it has generated a new DUID (and once the ISP's DHCP server
>> issues a lease for it), I'll need to be sure and copy the "duid" file
>> as "/etc/dhcpcd.duid" for the netbsd-7 installation on the CF card.
>> Then, an update to netbsd-8 will migrate it to "/var/db/dhcpcd/duid".
> 
> This seems to have been a case of different DUID values between the
> local disk (CF) installation and the NFS-root installation and the ISP's
> behavior when being presented with a DUID of which it doesn't yet have
> (or no-longer has) a record.
> 
> Copying the "/var/db/dhcpcd/duid" file from the -current NFS install to
> "/etc/dhcpcd.duid" on the netbsd-7 CF install ensured that either case
> would get an address quickly.  The subsequent upgrade to 8.0_BETA migrated
> the "/etc/dhcpcd.duid" file to "/var/db/dhcpcd/duid" and everything is
> working nicely now.

Glad it's working nicely for you now!
Maybe we should put something about this change in some upgrade notes if
we have any?

> Now, to keep this behavior in mind should I put my old SS5-based router
> back into service or replace it with an ERLITE or one of the supported
> RouterBoard products.

Now you know the root cause, whatever you put in place is entirely your
choice.
I myself run an ERLITE router with dhcpcd to negotiate stuff (although
just from the cable modem which has its own DHCP server) - but it boots
entirely off the USB stick and not NFS as some like to do.

Roy


Re: Seg Faults building devel/gobject-inspection with latest 8.0_BETA

2017-06-22 Thread Roy Marples

On 21/06/2017 23:25, D'Arcy Cain wrote:

On 06/21/17 17:02, D'Arcy Cain wrote:

On 06/21/17 11:04, D'Arcy Cain wrote:

#0  0x70e05a605663 in ?? ()
#1  0x70e05a200585 in ?? () from /usr/pkg/lib/libgthread-2.0.so.0
#2  0x70e0665e9b60 in ?? ()
#3  0x70e05a200669 in _fini () from /usr/pkg/lib/libgthread-2.0.so.0
#4  0x in ?? ()

I am deleting glib2 and dependents and trying again.


Same thing.  Rebuilding everything with symbols to further debug.


Well, I half expected this.  I built with symbols and it didn't dump 
core.  Now what?


Pretty sure this is a toolchain issue.
I solved this by using clang instead of gcc - yes I realise it's not 
everyone's idea of a fix.


Roy


Re: HEADS-UP: /bin/sh memory management bug fixes committed

2017-06-17 Thread Roy Marples

On 18/06/2017 00:22, Robert Elz wrote:

 Date:Sat, 17 Jun 2017 14:09:02 -0700
 From:Alistair Crooks 
 Message-ID:  

Re: NetBSD-current amd64 with dhcpcd connects only par

2017-06-15 Thread Roy Marples
On 14/06/2017 23:59, Robert Nestor wrote:
> Not sure if this helps but noticed the user is using a cable modem connected 
> to Time Warner.  I have the same type of connection on my amd64 system and 
> I’ve noticed similar issues of not always being able to get connected via 
> dhcp.  And like the original poster I sometimes switch between systems using 
> this connection.  Sometimes it works and sometimes it doesn’t.
> 
> What I’ve found is that I can’t get a new connection to Time Warner as long 
> as it thinks my previous lease hasn’t expired or been released.  I either 
> have to release the connection before switching between systems (NetBSD, 
> FreeBSD, Linux, etc) or I have to reboot the Time Warner cable modem which 
> seems to force the return of the lease on my assigned IP address.  When I do 
> this I’ve always been successful getting a dhcp setup using either dhclient 
> or dhcpcd.
> 

This could be because each dhcp client on each OS uses a different
ClientID inside the DHCP transaction.

At least with dhcpcd you can control this on each one by sharing the
DUID file between hosts. It doesn't change once written (although if
dhcpcd has to generate it afresh, the new one will differ each time).
If the MAC address changes then specify the same IAID for each interface
in dhcpcd.conf across each OS as well. Example:

# NetBSD: bge0 is eth0 on Linux
interface bge0
iaid 0xdeadbeef

# Linux: eth0 is bge0 on NetBSD
interface eth0
iaid 0xdeadbeef

Roy


Re: NetBSD-current amd64 with dhcpcd connects only partially

2017-06-14 Thread Roy Marples
On 14/06/2017 03:03, Thomas Mueller wrote:
> FreeBSD uses dhclient, which works when the driver is OK (re or rsu).

Does this imply that on FreeBSD the driver sometimes fails?
dhcpcd (an older version) is in the FreeBSD ports tree as well as
another reference point - you could try that also.

But at this point it doesn't sound like a dhcpcd issue, it's elsewhere.

Roy

