Re: bind is'nt compilable MKKERBEROS=no

2024-03-10 Thread Havard Eidnes
> dependall ===> lib/../external/mpl/bind/lib/libdns
>  create  libdns/gssapictx.d
> /u/NetBSD/src.ks/external/mpl/bind/lib/libdns/../../dist/lib/dns/gssapictx.c:24:10:
>  fatal error: gssapi/gssapi.h: No such file or directory
>    24 | #include <gssapi/gssapi.h>
>       |          ^
> compilation terminated.
> nbmkdep: compile failed.

Hm, yes...  Our in-tree BIND ships with a "static" config.h file
which has "#define HAVE_GSSAPI 1", and while BIND has a configure
option --without-gssapi which would probably make HAVE_GSSAPI not
be defined, the configure script isn't used in our setup.

You could try to comment out HAVE_GSSAPI in the include/config.h
file and see how that goes.

Regards,

- Håvard


syslog, ENOBUFS and non-C implementations

2024-03-05 Thread Havard Eidnes
Hi,

over the last couple of months I have seen at least two non-C
(rust and python) implementations of syslog() equivalent
functionality causing applications written in those languages to
become brittle.

The reason, I hear you ask?

In C, the return type of syslog() is void, so it can't return any
error.  Our C implementation makes a reasonable attempt at
re-trying in the face of OS-errors:

/*
 * If the send() failed, there are two likely scenarios:
 *  1) syslogd was restarted
 *  2) /dev/log is out of socket buffer space
 * We attempt to reconnect to /dev/log to take care of
 * case #1 and keep send()ing data to cover case #2
 * to give syslogd a chance to empty its socket buffer.
 */
for (tries = 0; tries < MAXTRIES; tries++) {
    if (send(data->log_file, tbuf, cnt, 0) != -1)
        break;
    if (errno != ENOBUFS) {
        disconnectlog_r(data);
        connectlog_r(data);
    } else
        (void)usleep(1);
}

and if the number of retries is exceeded, our C code tries to
send the syslog message to the console instead.

However, the rust and python implementations have the possibility
of returning errors or raising exceptions, but the applications
using those syslog-like functions are evidently unprepared to
deal with any errors from that functionality, causing those
applications to exit if an error occurred during syslog'ing.

This has caused me to file

  https://github.com/Geal/rust-syslog/issues/79

which has not seen any activity or comments since I submitted it.
This issue caused the net/routinator program (an RPKI validator
written in rust) to exit if I had turned up the logging level
"too high" (to trigger the ENOBUFS condition) when running it on
NetBSD.  I worked around this issue by dialing down the syslog
level in my routinator configuration.


The python issue I'm having is that similarly, the
sysutils/py-borgmatic package is not prepared to handle errors
from syslog'ing, causing it to exit with this error message:

--- Logging error ---
Traceback (most recent call last):
  File "/usr/pkg/lib/python3.10/logging/handlers.py", line 987, in emit
    self.socket.send(msg)
OSError: [Errno 55] No buffer space available

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/pkg/lib/python3.10/logging/handlers.py", line 991, in emit
    self.socket.send(msg)
OSError: [Errno 55] No buffer space available
Call stack:
  File "/usr/pkg/bin/borgmatic", line 33, in <module>
    sys.exit(load_entry_point('borgmatic==1.8.5', 'console_scripts', 'borgmatic')())
  File "/usr/pkg/lib/python3.10/site-packages/borgmatic/commands/borgmatic.py", line 894, in main
    logger.handle(log)

This is operationally troublesome, to say the least, for
potentially long-running programs.

Since this points to the syslog code inside python itself (in
this case python 3.10), I tried replicating parts of the C
semantics and eliminating the raising of exceptions in the syslog
code by applying the attached local patch.  I've not yet tried to
submit this one upstream, and I'm still testing it locally.
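
For illustration only, the retry-then-drop idea borrowed from the C
code can be sketched in Python as below.  This is not the attached
patch: send_with_retry, its parameters and the 1 ms delay are
made-up names and values for this example, and the caller is
assumed to supply its own send and reconnect callables.

```python
import errno
import time


def send_with_retry(send, reconnect, msg, max_tries=10, delay=0.001):
    """Try to deliver msg; return True if sent, False if it was dropped.

    send and reconnect are caller-supplied callables (hypothetical
    names); nothing here is taken from logging.handlers.
    """
    for _ in range(max_tries):
        try:
            send(msg)
            return True
        except OSError as err:
            if err.errno == errno.ENOBUFS:
                # Receiver is behind: give it a moment to drain its buffer.
                time.sleep(delay)
            else:
                # Possibly a restarted syslogd: try a fresh connection.
                try:
                    reconnect()
                except OSError:
                    pass
    # Retry budget exhausted: drop the message instead of raising.
    return False
```

The point of returning a boolean rather than raising is that the
caller can ignore the result, which is what code written with the C
syslog() semantics in mind does anyway.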


IMHO, having long-running programs become brittle just because
logging failed is just ... silly, and it appears that almost
every programmer out there expects the C semantics that
syslog()'ing never fails.

Also: is there something particular we are doing on the NetBSD
end of things which contributes to this problem?  Don't other
OSes return ENOBUFS if syslogd isn't able to keep up with consuming
the messages at the receiving end?


Other comments?


Regards,

- Håvard
$NetBSD$

Introduce code to re-try sending of log message up to
10 times, and drop messages if the retry count is exceeded,
instead of raising an error.  Calling code is seldom prepared
to handle exceptions from syslog-like functions, and becomes
needlessly brittle if syslog-ing can raise an exception.

--- Lib/logging/handlers.py.orig 2024-03-05 08:27:17.479574742 +
+++ Lib/logging/handlers.py
@@ -25,6 +25,7 @@ To use, simply 'import logging.handlers'
 
 import io, logging, socket, os, pickle, struct, time, re
 from stat import ST_DEV, ST_INO, ST_MTIME
+import errno
 import queue
 import threading
 import copy
@@ -983,16 +984,39 @@ class SysLogHandler(logging.Handler):
             msg = msg.encode('utf-8')
             msg = prio + msg
             if self.unixsocket:
-                try:
-                    self.socket.send(msg)
-                except OSError:
-                    self.socket.close()
-                    self._connect_unixsocket(self.address)
-                    self.socket.send(msg)
+                tries = 10
+                while tries > 0:
+                    try:
+                        self.socket.send(msg)
+                    except OSError as err:
+                        tries -= 1
+  

Installing 10.0_RC1 using GPT/UEFI/RaidFrame

2023-11-29 Thread Havard Eidnes
Hi,

I recently had reason to install NetBSD on a new (to me) server.
This server had previously run Linux/Debian with a software RAID
setup over two drives.

The dmesg of the server is visible at

  https://dmesgd.nycbug.org/index.cgi?do=view&id=7403

I wanted to continue to use software RAID with this host, but
also wanted to install NetBSD.  I also wanted to try out the new
features in sysinst, which allow you to configure a raidframe
setup without having to escape to the shell to do so.

The partitioning was already:

sd0: 558 GB, 241845 cyl, 3 head, 1615 sec, 512 bytes/sect x 1172123568 sectors
sd0: GPT GUID: 498d796a-a7cd-4fd4-85cc-0b825e15a291
  dk0 at sd0: "", 524288 blocks at 2048, type: msdos
  dk1 at sd0: "", 1171595264 blocks at 526336, type: raidframe
sd1: 558 GB, 241845 cyl, 3 head, 1615 sec, 512 bytes/sect x 1172123568 sectors
sd1: GPT GUID: df5880c8-f9b8-42f2-90e9-430371502ee5
  dk2 at sd1: "", 524288 blocks at 2048, type: msdos
  dk3 at sd1: "", 1171595264 blocks at 526336, type: raidframe

Well, this was after I ran the installer, of course; the type for
dk1 and dk3 was different from "raidframe" when I started.  The
dk0 and dk2 wedges are UEFI boot partitions, and dk1 and dk3 are
(here) available for data / raidframe components.

Using sysinst's "Utility Menu" -> "Partition a disk", I could
first change the "type" of dk1 and dk3 to "RAID", and could then
use the menu entry "Create software RAID" to create a raid set,
adding the two drives and configuring it as raid1 (mirror).

Somewhat unfortunately, when that's done, the raidframe set is
initialized "synchronously" in sysinst.  In my case I had to
leave the machine sitting for about 60 minutes before the next
step could be performed.  It could be argued that the
initialization could continue in the background (it does that
anyway) -- it's only the display of the progress which is
synchronous.

When it then came time to do the installation, "raid0" was now
visible as a possible choice for installation disk (which I
picked).  The installation of NetBSD itself was largely
uneventful, and followed a familiar and well-trodden path.

It then became time to try to boot from the newly installed
system.  This was when the first surprise hit.  Sysinst had not
dealt with the UEFI boot partitions, so the Debian boot bits were
still left in dk0 and dk2, and of course it brought up Grub, and
not NetBSD.  So I had to boot up using the USB key with the UEFI
image again, escape to the shell, and do

# mount -t msdos /dev/dk0 /mnt
# cd /mnt/EFI
# mkdir boot
# rm -rf debian
# cp /usr/mdec/bootx64.efi /mnt/EFI/boot/
# cd /
# umount /mnt

and repeat for dk2.

This gets us booting NetBSD.  However, the kernel when it comes
up says

WARNING: findroot: double match for boot device (bootinfo/bootwedge:sd0 bootinfo/bootwedge:sd1)

and the kernel then proceeds to interactively ask for the root
file system partition, swap partition, and root file system type.
Obviously that is undesirable.

Thinking that the raidframe disk had a normal disk label, I
replied "raid0a", "raid0b" and defaulted the last, but that got
me errno=16 == EBUSY.  The kernel therefore asked once more.
If I insisted, and replied "raid0a", "raid0b" and default once
more, I got a kernel panic with "locking against myself". Fun.

However, what I had not noticed was that raid0 had been equipped
with a GPT partition table, so new dk-devices had been created
for the GPT partitions, and presumably that caused the "EBUSY"
error.  Live and learn.

However, we still want to avoid having to interactively respond
to which file system to use for root.  At this point, several
paths are available, some of them not without hurdles.

1) It turns out that the raidframe set is not set to either of
   "softroot" or "forceroot", and that setting it to "softroot"
   via

   # raidctl -A softroot raid0

   is sufficient to get us booting directly into multi-user
   without console interaction.

   (Forceroot is best avoided, as it will force the root file
   system to the raid set even when you boot e.g. from a USB
   key for recovery, should the need arise.)

2) The above manual installation of NetBSD UEFI boot bits does
   not install a boot.cfg file, which could be done.  That file
   is apparently supposed to be in /EFI/NetBSD/boot.cfg in the
   UEFI boot file system.

   It is possible to label the wedges with more user-friendly
   names (as I've done).  First list the partitions via

   # gpt show raid0

   Pick up the 'index' value of the FFS partition, and label it
   and the swap partition -- in my case the FFS is '2' and swap
   is '3':

   # gpt label -l NetBSD-root -i 2 raid0
   # gpt label -l NetBSD-swap -i 3 raid0

   Verify with "gpt show -l raid0".

   These labels can apparently be used in /boot.cfg, like so:

   menu=Boot normally:rndseed /var/db/entropy-file;boot NAME=NetBSD-root:/netbsd

   and ... also in /etc/fstab.

   However, when you set your own labels, you *must* also edit
   

Re: new rust

2023-10-16 Thread Havard Eidnes
>>for i in 0..u64::MAX {
>>match libc::_cpuset_isset(i, set) {
>> [...]
>> but ... under which conditions would it seg-fault inside that
>> function?
>
> What does the Rust impl. of _cpuset_isset() look like? Does it
> take ints by any chance and you're passing a u64 to it here. A C
> compiler will complain if you use `-m32', but, that's all. Don't
> know how the Rust FFI will handle this. That's all I can think
> of...

The relevant rust definitions were (from
vendor/libc/src/unix/bsd/netbsdlike/netbsd/mod.rs):

pub type cpuid_t = u64;

extern "C" {
    pub fn _cpuset_isset(cpu: cpuid_t, set: *const cpuset_t) -> ::c_int;
}

Of these, the cpuid_t was wrong, because in C it is

typedef unsigned long   cpuid_t;

(from the system headers), and that's a 32-bit type on ILP32 ports.
On such systems the 64-bit cpu argument takes up the two 32-bit
argument slots which the "actual" 32-bit libc reads as cpu and
set, so on the first iteration this would cause rust to do the
equivalent of _cpuset_isset(0, NULL), which is of course going to
cause an immediate NULL pointer de-reference.
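
To make the mismatch concrete, here is a deliberately simplified
model in Python using the struct module.  It treats the arguments
as a flat little-endian sequence of 32-bit slots, which is only an
approximation of a real ILP32 calling convention (arguments may be
passed in registers), and callee_view and the pointer value are
made up for the example.

```python
import struct


def callee_view(cpu, set_ptr):
    """Model what a 32-bit callee sees when the caller passes cpu as u64."""
    # Caller side (wrong binding): 64-bit cpu id, then a 32-bit pointer.
    words = struct.pack("<QI", cpu, set_ptr)
    # Callee side (C prototype): 32-bit unsigned long, then a 32-bit
    # pointer, i.e. it only looks at the first two 32-bit slots.
    return struct.unpack_from("<II", words)


# 0x12345678 is an arbitrary non-NULL stand-in for the cpuset pointer.
cpu, set_ptr = callee_view(0, 0x12345678)
print(hex(cpu), hex(set_ptr))  # prints: 0x0 0x0
```

The real pointer never reaches the callee in this model; the upper
half of the 64-bit CPU number is read in its place.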

This is now all on the way to being fixed, since this pull request
has been accepted and applied upstream:

  https://github.com/rust-lang/libc/pull/3386

and I've applied this patch to the various "rust libc*" versions
vendored inside rust, and have re-built the 1.72.1 bits with this
fix as well.

>> Debugging the C program reveals that pthread_getaffinity_np() has
>> done exactly nothing to the "cset" contents as near as I can
>> tell, the "bits" entry doesn't change.
>
> pthread_getaffinity_np() _can_ be used to get the no. of "online"
> CPUs on both Linux and FreeBSD, but it looks (from my perusal just
> now) like threads default to no affinity on NetBSD and the scheduler
> just picks whatever CPUs available for it--unless the affinity is
> explicitly set, in which case it's inherited.
>
> I think you should just use sysconf(_SC_NPROCESSORS_ONLN) or the
> equivalent on NetBSD.

That threads default to no affinity on NetBSD matches what I'm
seeing and hearing.  However, the affinity set *can* be tweaked
by schedctl (which appears to require root privileges).

The fallback code in rust already does as you suggest: if the
probe for the number of CPUs the thread has affinity to is 0, the
code probes for _SC_NPROCESSORS_ONLN, and if that returns < 1,
then probes for HW_NCPU.
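
The same fallback order can be sketched in Python, purely as an
illustration of the logic and not as a rendering of the rust code.
available_cpus is a made-up name, os.sched_getaffinity only exists
on some platforms (hence the getattr), and os.cpu_count() is used
here as a stand-in for the HW_NCPU sysctl, which the Python
standard library does not expose directly.

```python
import os


def available_cpus():
    """Return a CPU count >= 1, trying affinity, then sysconf, then a default."""
    # Step 1: size of the affinity set, where the platform offers it.
    getaff = getattr(os, "sched_getaffinity", None)
    if getaff is not None:
        try:
            n = len(getaff(0))
        except OSError:
            n = 0
        if n > 0:
            return n
    # Step 2: number of processors currently online.
    try:
        n = os.sysconf("SC_NPROCESSORS_ONLN")
    except (AttributeError, ValueError, OSError):
        n = -1
    if n >= 1:
        return n
    # Step 3: last resort (stand-in for the HW_NCPU sysctl).
    return os.cpu_count() or 1


print(available_cpus())
```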

Regards,

- Håvard


Re: new rust (was: gdb issues?)

2023-10-11 Thread Havard Eidnes
> Program terminated with signal SIGSEGV, Segmentation fault.
...
> #0  0x60d0fe74 in _cpuset_isset () from /usr/lib/libc.so.12
> #1  0x03d2bf8c in std::sys::unix::thread::available_parallelism ()

...

> At least it gives a bit of clue about where to go looking for the
> null pointer de-reference, so that's at least something...

This gets me to

work/rustc-1.73.0-src/library/std/src/sys/unix/thread.rs

which says:

#[cfg(target_os = "netbsd")]
{
    unsafe {
        let set = libc::_cpuset_create();
        if !set.is_null() {
            let mut count: usize = 0;
            if libc::pthread_getaffinity_np(libc::pthread_self(), libc::_cpuset_size(set), set) == 0 {
                for i in 0..u64::MAX {
                    match libc::_cpuset_isset(i, set) {
                        -1 => break,
                        0 => continue,
                        _ => count = count + 1,
                    }
                }
            }
            libc::_cpuset_destroy(set);
            if let Some(count) = NonZeroUsize::new(count) {
                return Ok(count);
            }
        }
    }
}

which on the surface looks innocent enough, and this is as near
as I can tell the same code as in rust 1.72.1, while the code in
1.71.1 is different, and falls back to using sysctl with this
code (the bootstrap program may be linked with the "old" standard
library, so the problem may have been in 1.72.1 too):

let mut cpus: libc::c_uint = 0;
let mut cpus_size = crate::mem::size_of_val(&cpus);

unsafe {
    cpus = libc::sysconf(libc::_SC_NPROCESSORS_ONLN) as libc::c_uint;
}

// Fallback approach in case of errors or no hardware threads.
if cpus < 1 {
    let mut mib = [libc::CTL_HW, libc::HW_NCPU, 0, 0];
    let res = unsafe {
        libc::sysctl(
            mib.as_mut_ptr(),
            2,
            &mut cpus as *mut _ as *mut _,
            &mut cpus_size as *mut _ as *mut _,
            ptr::null_mut(),
            0,
        )
    };

    // Handle errors if any.
    if res == -1 {
        return Err(io::Error::last_os_error());
    } else if cpus == 0 {
        return Err(io::const_io_error!(io::ErrorKind::NotFound, "The number of hardware threads is not known for the target platform"));
    }
}
Ok(unsafe { NonZeroUsize::new_unchecked(cpus as usize) })

(Actually, the fallback code is there in 1.73.0 and 1.72.1 too,
it's just not used due to the addition of the netbsd-specific
section above...)

The cpuset(3) man page says

 cpuset_isset(cpu, set)
  Checks if CPU specified by cpu is set in the CPU-set set.
  Returns the positive number if set, zero if not set, and -1 if
  cpu is invalid.

but ... under which conditions would it seg-fault inside that function?
Looking at the C code in common doesn't reveal anything frightening...

However, an attempt at a trivial re-implementation "to count
CPUs" in this manner in C does not trigger this issue on any of
my "problematic" platforms (or on amd64 for that matter):

#include <pthread.h>
#include <sched.h>
#include <stdio.h>

int
main(int argc, char **argv)
{
    int count = 0;
    cpuset_t *cset;
    int i;
    int ret;

    cset = cpuset_create();
    if (cset != NULL) {
        cpuset_zero(cset);
        if (pthread_getaffinity_np(pthread_self(),
            cpuset_size(cset), cset) == 0)
        {
            for (i = 0; i<256; i++) {
                ret = cpuset_isset(i, cset);
                if (ret == -1)
                    break;
                if (ret == 0)
                    continue;
                count++;
            }
        }
    }
    printf("cpus: %d\n", count);
    return 0;
}

but also fails to count the number of CPUs (prints 0). So what
am I (and/or rust) doing wrong?  Or ... is this code simply wrong
anyway, and we need to re-instate the 1.71.1 code path by ripping
out the NetBSD-specific section quoted above?

Meanwhile, the warning in the pthread_getaffinity_np man page is
ignored:

 Portable applications should not use the pthread_setaffinity_np() and
 pthread_getaffinity_np() functions.

Although it could perhaps be argued that rust isn't all that
portable..., and perhaps in particular this piece of code?

Debugging the C program 

Re: gdb issues?

2023-10-11 Thread Havard Eidnes
Hi,

following up on my own message, I finally had the presence of
mind to look at what gdb on armv7 would tell me, if anything,
because that build failed as well.

And... it tells quite a bit more than the other two:

armv7: {2} gdb /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap work/rustc-1.73.0-src/bootstrap.core
GNU gdb (GDB) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later 
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "armv7--netbsdelf-eabihf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
.
Find the GDB manual and other documentation resources online at:
.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap...
[New process 1]
Core was generated by `bootstrap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x60d0fe74 in _cpuset_isset () from /usr/lib/libc.so.12
warning: Unsupported auto-load script at offset 0 in section .debug_gdb_scripts
of file /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) where
#0  0x60d0fe74 in _cpuset_isset () from /usr/lib/libc.so.12
#1  0x03d2bf8c in std::sys::unix::thread::available_parallelism ()
#2  0x03cff460 in std::thread::available_parallelism ()
#3  0x0383ed74 in ::augment_args::DEFAULT_VALUE::{{closure}} () at flags.rs:110
#4  0x0347d7a8 in core::ops::function::FnOnce::call_once ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.72.0-src/library/core/src/ops/function.rs:250
#5  0x0347f29c in core::ops::function::FnOnce::call_once ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.72.0-src/library/core/src/ops/function.rs:250
#6  0x033f87c8 in once_cell::sync::Lazy::force::{{closure}} ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1212
#7  0x033f8be0 in once_cell::sync::OnceCell::get_or_init::{{closure}} ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1023
#8  0x0383c624 in once_cell::imp::OnceCell::initialize::{{closure}} ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/imp_std.rs:85
#9  0x03cdf5d8 in core::ops::function::impls:: for  F>::call_mut ()
#10 0x03ce0d98 in once_cell::imp::initialize_or_wait ()
#11 0x0383c010 in once_cell::imp::OnceCell::initialize ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/imp_std.rs:81
#12 0x033f9880 in once_cell::sync::OnceCell::get_or_try_init ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1063
#13 0x033f89b0 in once_cell::sync::OnceCell::get_or_init ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1023
#14 0x033f86a8 in once_cell::sync::Lazy::force ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1211
#15 0x033f8580 in  as core::ops::deref::Deref>::deref ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/once_cell-1.12.0/src/lib.rs:1221
#16 0x03414ff4 in ::augment_args () at flags.rs:110
#17 0x0340eee0 in ::command () at flags.rs:33
#18 0x0383d784 in clap_builder::derive::Parser::parse_from ()
    at /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/vendor/clap_builder-4.2.4/src/derive.rs:52
#19 0x0340e7a4 in bootstrap::flags::Flags::parse () at flags.rs:199
#20 0x033d73c8 in bootstrap::config::Config::parse_inner () at config.rs:1117
#21 0x03681018 in bootstrap::config::Config::parse () at config.rs:1113
#22 0x03381578 in bootstrap::main () at bin/main.rs:20
(gdb) i reg
r0             0x0                 0
r1             0x0                 0
r2             0x0                 0
r3             0x1                 1
r4             0x0                 0
r5             0x60ed41e8          1626161640
r6             0x1                 1
r7             0x0                 0
r8             0x0                 0
r9             0x7ff64328          2146845480
r10            0x3d61425           64361509
r11            0x7ff642cc          2146845388
r12            0x7ff642d0          2146845392
sp             0x7ff642c0          0x7ff642c0
lr             0x3d2bf8c           64143244
pc             0x60d0fe74          0x60d0fe74 <_cpuset_isset+36>
cpsr           0x20030010          537067536
(gdb) x/i 0x60d0fe74
=> 0x60d0fe74 <_cpuset_isset+36>:   ldr r3, [r1, r2, lsl #2]
(gdb) 

At least it gives a bit of clue about where to go looking for the
null pointer de-reference, so that's at least something...
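
As a worked reading of the faulting instruction: for
"ldr r3, [r1, r2, lsl #2]" the load address is r1 + (r2 << 2),
and with both r1 and r2 zero in the register dump that comes out
as address 0.  The small Python helper below (ldr_address is just a
name made up for this note) spells out that arithmetic.

```python
def ldr_address(base, index, shift):
    """Effective address of 'ldr rX, [base, index, lsl #shift]' (32-bit)."""
    return (base + (index << shift)) & 0xFFFFFFFF


# r1 and r2 as shown in the register dump.
r1, r2 = 0x0, 0x0
print(hex(ldr_address(r1, r2, 2)))  # prints: 0x0
```

So the base register r1, which holds the second argument on this
platform, is the NULL pointer here.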

Meanwhile, the arm64/9.0 

gdb issues?

2023-10-10 Thread Havard Eidnes
Hi,

I have recently had a bear of a time getting the new rust which
landed in pkgsrc-wip the other day to build natively on several
of the targets we support for NetBSD.

The problem is that the "bootstrap" program (a rust executable)
lands on its nose with a SIGSEGV, and dumps core (without leaving
a discernible error message in the build log, so I had to ktrace
to find *that* out, argh!)

However, it appears that gdb has problems dealing with the
combination of the executable and the core file.  I see similar
problems on the following platforms:  NetBSD/macppc 10.0_BETA and
NetBSD/i386 9.3.

I'm beginning to wonder if it's my "gdb driving skills" which are
lacking, or whether it really works this poorly in other NetBSD
contexts as well...

The symptom looks like this on macppc 10.0_BETA:

: {18} gdb /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap work/rustc-1.73.0-src/bootstrap.core
GNU gdb (GDB) 11.0.50.20200914-git
...
Reading symbols from /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap...
[New process 19376]

warning: Error reading shared library list entry at 0x4b

warning: Error reading shared library list entry at 0x4b
Core was generated by `bootstrap'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0xfdc52444 in ?? ()
warning: Unsupported auto-load script at offset 0 in section .debug_gdb_scripts
of file /usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap.
Use `info auto-load python-scripts [REGEXP]' to list them.
(gdb) i reg
r0             0xbe2b18            12462872
r1             0xfffd4bf0          4294790128
r2             0xfdbbd008          4256944136
r3             0x0                 0
r4             0x0                 0
r5             0xfdedc1f8          4260217336
r6             0xfdedc1f8          4260217336
r7             0x0                 0
r8             0x36                54
r9             0x0                 0
r10            0x1                 1
r11            0xfdc52408          4257555464
r12            0xfdef9400          4260336640
r13            0xf9fda0            16383392
r14            0xc37e74            12811892
r15            0x8                 8
r16            0xc37f39            12812089
r17            0xc                 12
r18            0xc37f45            12812101
r19            0xb                 11
r20            0xc37f50            12812112
r21            0x5                 5
r22            0xc37f55            12812117
r23            0x11                17
r24            0xc37f66            12812134
r25            0x0                 0
r26            0x1                 1
r27            0x0                 0
r28            0xfdedc1f8          4260217336
r29            0xfffd4c80          4294790272
r30            0xfde6c584          4259759492
r31            0x4                 4
pc             0xfdc52444          0xfdc52444
msr
cr             0x42000248          1107296840
lr             0xfdc52414          0xfdc52414
ctr            0xfdc52408          4257555464
xer            0x0                 0
fpscr          0xfff80000          -524288
vscr
vrsave
(gdb) i target

Symbols from "/usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/build/bootstrap/debug/bootstrap".
Local core dump file:
`/usr/pkgsrc/wip/rust/work/rustc-1.73.0-src/bootstrap.core', file type elf32-powerpc.
0x00010000 - 0x00f29000 is load0
0x00f41000 - 0x00fa0000 is load1
0x00fa0000 - 0x00fa03c8 is load2a
0x00fa03c8 - 0x00fa1000 is load2b
0xfd600000 - 0xfd608128 is load3a
0xfd608128 - 0xfd610000 is load3b
0xfd610000 - 0xfd800000 is load4
0xfda18000 - 0xfda2c000 is load5
0xfda2c000 - 0xfda2d2dc is load6a
0xfda2d2dc - 0xfda4c000 is load6b
0xfda4c000 - 0xfda4d41c is load7a
0xfda4d41c - 0xfda58000 is load7b
0xfda58000 - 0xfda5834c is load8a
0xfda5834c - 0xfda74000 is load8b
0xfda74000 - 0xfda765c0 is load9a
0xfda765c0 - 0xfda88000 is load9b
0xfda88000 - 0xfda88384 is load10a
0xfda88384 - 0xfda8c000 is load10b
0xfda8c000 - 0xfda8cb7c is load11a
0xfda8cb7c - 0xfda98000 is load11b
0xfda98000 - 0xfda981b4 is load12a
0xfda981b4 - 0xfdab4000 is load12b
0xfdab4000 - 0xfdab52e0 is load13a
0xfdab52e0 - 0xfdac8000 is load13b
0xfdac8000 - 0xfdac85bc is load14a
0xfdac85bc - 0xfdad4000 is load14b
0xfdad4000 - 0xfdad414c is load15a
0xfdad414c - 0xfdaf0000 is load15b
0xfdaf0000 - 0xfdaf04c4 is load16a
0xfdaf04c4 - 0xfdaf4000 is load16b
0xfdaf4000 - 0xfdaf407c is load17a
0xfdaf407c - 0xfdaf8000 is load17b
0xfdaf8000 - 0xfdaf8278 is load18a
0xfdaf8278 - 0xfdafc000 is load18b
0xfdafc000 - 0xfdafc23c is load19a
0xfdafc23c - 0xfdb10000 is load19b
0xfdb10000 - 0xfdb10120 is load20a

Re: NetBSD 10.0 timeline and branch status

2023-09-10 Thread Havard Eidnes
>> Unfortunately the additional shared library changes require
>> another round of package rebuilds from scratch.  Everyone
>> building packages against netbsd-10: please start a new round
>> from scratch.
>
> Does that mean the pkgsrc-2023Q2 binary packages for 10_BETA 2
> that have been published recently are useless on a new 10_BETA
> install?
>
> That's too bad. I was looking forward to using those packages
> to set up some new CI build machines. Should I wait for the
> 2023Q3 builds then?

With the caveat that I've not verified this myself, I *think*
this depends on how you got to where you are at the moment.  If
your current up-to-date 10.0 BETA using openssl 3 was initially
installed in the pre-openssl3 era and subsequently upgraded, the
old major versions of the shared libraries should still be
installed, and you should then in theory be able to use packages
from a binary package repository from the pre-openssl3 era.

If, on the other hand, you're talking about a fresh install of a
10.0 BETA with openssl 3, you will not be able to use binary
packages built in the pre-openssl3 era.

In theory we could consider preparing a "compat 10.0 beta pre-
openssl3" package similar to compat90 etc., but I'm a little
uncertain whether we have "archived binaries" to produce these
from...

Regards,

- Håvard


Re: Building old systems

2023-04-20 Thread Havard Eidnes
>> Indeed, this (without -O) works.  The key is the HOST_CFLAGS
>> variable; I was thinking of just CFLAGS at first.
> 
> I have had some luck with compiling old systems with -V
> HOST_CFLAGS=-fcommon.
> 
> That only goes so far into the past, however.  I thought the
> next step would be to try building even older systems with the
> compiler from oldest successful build.  So, I tried setting
> HOST_CC and HOST_CXX to point to an oldish, but successfully
> built toolset that also successfully compiled its own kernel.
> That is, I run build.sh with -V
> HOST_CC=/path/to/oldish/tooldir/bin/armv7--netbsdelf-eabihf-gcc
> (and similarly for HOST_CXX), where those compilers are not the
> native systems compilers but ones used in a successful kernel
> build for a checkout somewhat after the target checkout.
>
> That has led to errors like the following:
>
>   /path/to/oldish/tooldir/bin/ld: cannot find crt0.o: No such file or directory
>
> I'm guessing this means that some other environment variables
> are not set so the compiler is looking in the wrong place, but
> BUILDING is not helping me think of what that would be.  All
> paths are absolute, except they do include ".." in them.
>
> What should I be configuring to make sure that a tooldir
> compiler is usable?

What I think I would do would be to use HOST_CFLAGS=-fcommon to
do a "./build.sh ... tools" build.  This should, at least in
theory, result in a self-consistent and hopefully working
toolchain of the same vintage as the kernel you are trying to
build.  Try this especially if it's somewhat uncertain where the
/path/to/oldish/tooldir was built from.

Best regards,

- Håvard


Re: Building old systems

2023-04-19 Thread Havard Eidnes
>> It might be better to use corresponding older tools to build older
>> kernels.  Modern gcc switched to -fno-common by default, so if you
>> want to compile an older kernel that has multiple variable definitions
>> you will need to arrange for -fcommon option to be used.
> 
> Is there any way to do that with a current system and build.sh?
> Neither setting it with -V or with env works.

Just glancing at build.sh from -current here, but since this is while
building nbmake, it's early in the process, and it's the host's
toolchain you will be using by necessity.

You don't say which variable you tried setting via -V (I'm wondering a
little what it is with folks who won't be concrete...).

It could be that you should try with

-V HOST_CFLAGS="-O -fcommon"

or possibly with a variant with

-V HOST_LDFLAGS=-fcommon

possibly as an addition, possibly as the only "extra" setting, since
IIUC the -fcommon flag influences the behaviour during linking.

Regards,

- Håvard


clang-built NetBSD and rust

2023-03-27 Thread Havard Eidnes
Hi,

a user contacted me about having a freshly installed version of
NetBSD-current for amd64 built with clang, and a failure to run
the provided "bootstrap kit" for rust, with the following error:

/usr/lib/libgcc_s.so.1: version GCC_3.3 required by /tmp/pkgsrc/wip/rust/work/rust-bootstrap/bin/cargo not found

Now, the NetBSD/amd64 bootstrap is built "upstream", i.e. it is
not one of the kits I maintain for NetBSD, and I'm pretty certain
that bootstrap kit is built on NetBSD 8.x:

: {115} pwd
/usr/pkgsrc/wip/rust/work/rust-bootstrap/bin
: {116} file cargo
cargo: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /usr/libexec/ld.elf_so, for NetBSD 8.0, with debug_info, not stripped
: {117} ldd cargo
cargo:
-lpthread.1 => /usr/lib/libpthread.so.1
-lc.12 => /usr/lib/libc.so.12
-lgcc_s.1 => /usr/lib/libgcc_s.so.1
-lm.0 => /usr/lib/libm.so.0
: {118} uname -ps
NetBSD x86_64
: {119}

(That's the 1.67.1 bootstrap kit, used for 1.68.*.)

Now, I'm pretty sure this "bootstrap kit" has already been built with
the equivalent of "CONFIGURE_ARGS+= --enable-cargo-native-static"
(which is in our options.mk), which links nghttp2 and curl statically,
but this does not eliminate the dynamic reference to libgcc_s.so.1.

I've looked at

  ftp://nyftp.netbsd.org/pub/NetBSD-daily/HEAD-llvm/latest/amd64/binary/sets/

and specifically the base.tar.xz file, and it doesn't look like it has
libgcc_s at all.  Apparently MKGCC=yes will build it, but that
doesn't appear to be the default (which is probably intentional).

So ... what should I/we do about this?  Do we need a separate rust
bootstrap kit built with clang and built with a clang-built "target
root"?  It looks like there's no netbsd-9 nor netbsd-10 built with
clang, only HEAD?

Regards,

- Håvard


Re: How to build only one part of NetBSD system

2023-01-22 Thread Havard Eidnes
>> As always before such an operation, "do the kernel first".
>
> How do you do the kernel first without building the userland to
> build the updated tools?

The "do the kernel first" is sort of a "general warning".
Whether it is strictly needed depends on what version user-land
and what version source you are trying to mix.

Ultimately, if you e.g. install a -current libc and still run a
9.3 kernel, I predict that you are going to have a hard time
recovering from the mistake.

As for the build error, it's difficult to say.  Building the
kernel typically requires that it *not* be done against the
(possibly old) headers of the running system, and possibly also
with the compiler and tools in the "tools" build.sh result.

It's difficult to know what failed in your case without
significantly more details.

Regards,

- Håvard


Re: How to build only one part of NetBSD system

2023-01-21 Thread Havard Eidnes
>> I tried going into libexec/ld.elf_so and running "make
>> install" but that didn't work or even come close.
>
> It would be something like:
>
>   cd src/libexec/ld.elf_so
>   ${TOOLDIR}/bin/nbmake-${arch} dependall
>   ${TOOLDIR}/bin/nbmake-${arch} install

and if the tool nbmake was built with a different DESTDIR
configured e.g. via the "-D" option to build.sh, you would need

${TOOLDIR}/bin/nbmake-${arch} DESTDIR= dependall
${TOOLDIR}/bin/nbmake-${arch} DESTDIR= install

to build this against the currently installed system and to
install it.
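
As a rough sketch of the two steps above, wrapped in a function:
the name rebuild_dir, the DRYRUN switch and the argument order are
made up for illustration, and it always passes an empty DESTDIR,
i.e. it targets the currently installed system.

```shell
#!/bin/sh
# Sketch only: run "dependall" and then "install" in one source
# directory using the nbmake wrapper from the tools build.
# rebuild_dir and DRYRUN are hypothetical names, not part of build.sh.
rebuild_dir() {
    dir=$1        # e.g. src/libexec/ld.elf_so
    tooldir=$2    # where build.sh put the tools
    arch=$3       # suffix of the nbmake wrapper
    nbmake="$tooldir/bin/nbmake-$arch"
    for target in dependall install; do
        if [ -n "${DRYRUN:-}" ]; then
            # Only show what would be run.
            echo "cd $dir && $nbmake DESTDIR= $target"
        else
            (cd "$dir" && "$nbmake" DESTDIR= "$target") || return 1
        fi
    done
}
```

Setting DRYRUN=1 before calling it only prints the two commands, which
is a cheap way to check the paths before touching the installed system.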

As always before such an operation, "do the kernel first".

Regards,

- Håvard


Re: macppc system wedging under memory pressure

2022-09-09 Thread Havard Eidnes
Well,

following up on my own posting of yesterday evening.

There's good and not so good news: the good news is that my G4
Mac Mini running -current finally managed to build rust-1.62.1
from pkgsrc-current (using llvm from pkgsrc, not the internal
one).  The bad news is that I don't have a definitive explanation
of what caused my earlier problems, even though I'm pretty sure
it was VM-related.

Based at least partially on suggestions from fellow NetBSD
developers, I've made the following adjustments to this host's
setup:

Reduced kern.maxvnodes from the default of around 55000 to 1
(that's from an earlier similar experience from the i386 port).

I had earlier added 1GB swap space as a file which I removed as
swap space using swapctl.  Swap space ties up some part of
physical memory to keep track of the swap space.  (I already had
a 2GB swap partition which is sufficient, it turns out.)

I made the following adjustments to vm settings:

vm.filemax=20  (down from 50)
vm.filemin=5   (down from 10)
vm.execmin=5   (hm, already at 5?)
vm.anonmax=50  (down from 80)
vm.anonmin=5   (down from 10)

Apparently the *min values are what made the difference; I believe
I made the *max adjustments earlier without succeeding.  Ref.
info in the sysctl(7) man page.
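
For reference, the vm.* settings listed above could be applied in
one go roughly like this.  It is a sketch: the function name and the
DRYRUN switch are invented here, and kern.maxvnodes is deliberately
left out.  The same name=value lines can also go into
/etc/sysctl.conf to survive a reboot.

```shell
#!/bin/sh
# Sketch only: apply the vm.* values from the post with sysctl -w.
# apply_vm_tuning and DRYRUN are hypothetical names.
apply_vm_tuning() {
    for setting in vm.filemax=20 vm.filemin=5 vm.execmin=5 \
                   vm.anonmax=50 vm.anonmin=5; do
        if [ -n "${DRYRUN:-}" ]; then
            echo "sysctl -w $setting"
        else
            sysctl -w "$setting" || return 1
        fi
    done
}
```

Run as root; with DRYRUN=1 set it only prints the commands.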

Regards,

- Håvard


macppc system wedging under memory pressure

2022-09-08 Thread Havard Eidnes
Hi,

I'm running NetBSD-current on one of my 1G Mac Mini G4 systems,
doing pkgsrc bulk building.

This go-around I've managed to build llvm, and next up is rust.  This
is proving to be difficult -- my system will consistently wedge its
user-land (still responding to ping, no response on the console or any
ongoing ssh sessions; well, not entirely correct, it will echo one
carriage-return on the console with a newline, but then that is wedged
as well).  Also, I have still not managed to break into DDB on this
system, so each and every time I have to power-cycle the box.  This
also means that all I have to go on is output from "top -s 1", "vmstat
1" and "systat vm", and this is the latest information I got from
these programs when it wedged just now:

load averages:  1.10,  1.13,  1.05;   up 0+02:01:45  21:59:52
103 threads: 5 idle, 6 runnable, 90 sleeping, 1 zombie, 1 on CPU
CPU states:  1.0% user,  5.9% nice, 93.1% system,  0.0% interrupt,  0.0% idle
Memory: 559M Act, 274M Inact, 12M Wired, 186M Exec, 162M File, 36K Free
Swap: 3026M Total, 80M Used, 2951M Free / Pools: 134M Used

  PID   LID USERNAME PRI STATE       TIME   WCPU    CPU NAME      COMMAND
 6376 26281 1138      78 RUN         2:03 89.10% 88.96% rustc     rustc
    0   109 root     126 pgdaemon    0:20 15.48% 15.48% pgdaemon  [system]
  733   733 he        85 poll        0:14  2.93%  2.93% -         sshd
  164   164 he        85 RUN         0:06  1.17%  1.17% -         systat

Notice the rather small amount of "Free" memory, and the rather
high rate of system CPU.  The "vmstat 1" output for the last few
seconds:

 procsmemory  page   disk faults  cpu
 r b  avmfre  flt  re  pi   po   fr   sr w0   in   sy  cs us sy id
 1 0   634804   4164 1869   0   00 1358 1358  0  2800 425 97  3  0
 3 0   637876   1016  786   0   0000  0  2130 410 99  1  0
 2 0   636336   2512  816   4   00 1192 1202  0  3260 508 98  2  0
 2 0   633448   5456  617   0   00 1355 1371  0  2280 374 99  1  0
 2 0   634964   3780  430   0   0000  0  2500 452 98  2  0
 2 0   635988   2740  260   0   0000  0  2610 496 98  2  0
 2 0   637396   1376  386   0   0000  0  3000 459 97  3  0
 2 0   634912   4060  775   0   00 1354 1354  0  1900 245 100 0 0
 2 0   636940   2308  437   0   0000  0  2500 415 100 0 0
 2 0   637912   1064  473   0   0000  0  2510 406 100 0 0
 2 0   633580   5408  175   0   00 1262 1270  0  2540 403 99  1  0
 2 0   637288   1740 1002   0   0000  0  2780 521 97  3  0
 2 0   634340   4324  713   0   00 1354 1357  0  2960 471 96  4  0
 2 0   636388   2160  540   0   0000  0  2160 361 98  2  0
 2 0   637412   1116  258   0   0000  0  2540 405 98  2  0
 2 0   637556   4872  178  12   0  996 1122 42861  4  3070 442 30 70  0
 2 0   638064   9620 1105   3   0 1228 1228 2305 70  4110 667 19 81  0
 2 0   639624   7416  550   0   0000  0  3190 584 97  3  0
 2 0   644744   2200 1299   0   0000  0  2790 416 93  7  0
 6 0   646924   2716  537   0   0 1356  672 2403 14  4120 497 35 65  0
 4 0   654792 36 2022  32   0 1354 1366 7910 91  2410 6735 7 93  0

while "systat vm" doesn't really give any more information than
the above:

6 usersLoad  1.10  1.13  1.05  Thu Sep  8 21:59:51

Proc:r  d  sCsw  Traps SysCal  Intr   Soft  Fault PAGING   SWAPPING
 8 3355471  302 75398 in  out   in  out
ops64
  68.2% Sy   0.0% Us  31.8% Ni   0.0% In   0.0% Idpages  1027
|||||||||||
==forks
  fkppw
Anon   509096  50%   zero472 Interrupts   fksvm
Exec   190804  18%   wired   12000   100 cpu0 clock   pwait
File   166072  16%   inact  280984   openpic irq 29   relck
Meta82832   2%   bufs 6500   openpic irq 63   rlkok
 (kB)real   swaponly  free38 openpic irq 39 1 noram
Active 570368  73812  2716   openpic irq 4011 ndcpy
Namei Sys-cache Proc-cache   167 openpic irq 41   fltcp
Calls hits% hits %   167 gem0 interrupts  397 zfod
66  100   cow
  256 fmin
  Disks: cd0 wd0  341 ftarg
 seeksitarg
 xfers14

Re: CVS commit: src

2022-05-25 Thread Havard Eidnes
>> All of these applications depends on the "MROUTING" kernel option,
>> it seems, which is mostly default-off, except for a few (tending
>> on the more obscure side) kernel configs. I wonder if anyone
>> knows the history there.
>
> I'm not really sure why MROUTING is default off [...]

Isn't MROUTING the kernel support to act as a multicast router,
typically by using the DVMRP protocol, implemented in mrouted?

People operating multicast networks today typically do that using
real routers and variants of PIM.  I don't think NetBSD ever got
the ability to do PIM, nor IGMP3 for that matter, I vaguely seem
to recall there being licensing issues involved, at least for the
latter.

Typically, MROUTING isn't needed to *use* most of the non-
routing-focused multicast-based tools, e.g. I run a dbeacon
instance with just a normal GENERIC kernel using any-source PIM.

Regards,

- Håvard


ATA / IDE drive details

2022-02-19 Thread Havard Eidnes
Hi,

lately I have been dusting off and testing a Cobalt RAQ2 I bought
a while ago.  Fan's been replaced (the old one spun only
intermittently and made noises...).  It has a

viaide0: VIA Technologies VT82C586 (Apollo VP) ATA33 controller

and an internal 40-pin connector for a normal 3.5" PATA drive.

It seems that this system is a bit finicky with what drives it
wants to cooperate fully with.  It came with a Quantum Fireball
4.3G drive, which works.  I've tried a couple of other spinning
drives (a 40G WD drive, a 750G drive), where the install goes OK,
but the system doesn't succeed in booting from them (for unknown
reasons).

I've also tried an SD-to-IDE adapter and also a 40-to-44-pin
adapter with both a laptop drive with spinning disk, and an mSATA
in a 44-pin adapter.  Some of these work better than others --
the mSATA variant downgrades pretty quickly to "no DMA" (while
reporting "lost interrupt"), while I had to do three or four
attempts with the laptop drive before the installation would
complete without errors.

During this process it's of course relevant to know what transfer
modes the drive and controller combination supports, and what's
used.  However, in the console log, I typically find

[   4.0899446] wd0 at atabus0 drive 0
[   4.0899446] wd0: 
[   4.1006106] wd0: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors

while if I log in and do "dmesg", more is revealed:

[ 4.089945] wd0 at atabus0 drive 0
[ 4.089945] wd0: 
[ 4.100611] wd0: drive supports 16-sector PIO transfers, LBA48 addressing
[ 4.100611] wd0: 149 GB, 310101 cyl, 16 head, 63 sec, 512 bytes/sect x 312581808 sectors
[ 4.109958] wd0: 32-bit data port
[ 4.109958] wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100)
[ 4.109958] wd0(viaide0:0:0): using PIO mode 4, Ultra-DMA mode 2 (Ultra/33) (using DMA)

These extra lines are typically printed with "aprint_verbose()"
variants in sys/dev/ata/ata.c.  My question is basically: is
there a good reason why these extra lines are not instead printed
with the "aprint_normal()" variants?

I'm not sure there is a possibility to turn on verbose booting
with the cobalt, to get these also in the console log.  And then
the user needs to know that that's needed to get these "extra"
lines in the console log.

This certainly caught me by surprise, and confused me while
trying to understand if there's a pattern of what works and what
doesn't.
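
Until that changes, one way to get at the relevant lines is to
filter the kernel message buffer after logging in.  A small sketch
(the function name ata_modes is made up, and the pattern is only
matched against the message wording shown above):

```shell
#!/bin/sh
# Sketch only: keep the "drive supports ..." and "using ..." lines
# that the wd(4) attachment prints, reading dmesg text on stdin.
# ata_modes is a hypothetical helper name.
ata_modes() {
    grep -E 'wd[0-9]+[^ ]*: (drive supports|using) '
}
```

Typical use would be "dmesg | ata_modes", or feeding it
/var/run/dmesg.boot.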

Regards,

- Håvard


Re: HP DL380p Gen8 interrupt storm

2021-09-03 Thread Havard Eidnes
> Try booting with the CD drive disabled (via userconf)

with "disable cd*", basically no change:

stest: {5} vmstat -i
interrupt  total   rate
TLB shootdown   1693 28
cpu0 timer  5716 95
msix2 vec 0 1465 24
msix6 vec 0 2438 40
ioapic0 pin 21    45  0
ioapic0 pin 20    52  0
ioapic0 pin 17  1095 181481
ioapic0 pin 4   1136 18
Total   10901440 181690

stest: {6} 

With "disable pciide*" as well, well, gone:

stest: {1} vmstat -i
interrupt   total rate
TLB shootdown1736   27
cpu0 timer   6120   95
msix2 vec 0  2168   33
msix6 vec 0  2559   39
ioapic0 pin 21 44    0
ioapic0 pin 20 83    1
ioapic0 pin 41153   18
Total   13863  216

stest: {2} 

Seems disk I/O is quite a bit snappier now.

Looks like a custom kernel is in the cards for this one.

(I had to sneak in "-c" via /boot.cfg, because the second-stage
boot loader didn't want to listen to my serial console input,
while the first-stage boot selector did...)

As for the actual root cause: input sought.

Thanks,

- Håvard


HP DL380p Gen8 interrupt storm

2021-09-03 Thread Havard Eidnes
Hi,

one machine I'm testing NetBSD on feels sort of sluggish, which
is strange because it's got lots of RAM (128GB) and a pair of
Xeon(R) CPU E5-2650 CPUs, for a total of 16 physical cores and 32
with hyperthreading.

It looks like one of the CPUs is using most of its time doing
interrupt processing, "systat vm" often shows * in "Intr" and
I have a constant buzz of 6.3% System CPU:

Proc:r  d  sCsw  Traps SysCal  Intr   Soft  Fault PAGING   SWAPPING
 1 7557281355 * 64277 in  out   in  out
ops
   6.3% Sy   0.0% Us   0.0% Ni   0.1% In  93.6% Idpages
|||||||||||
=== 2 forks
2 fkppw
Checking further:

stest: {8} vmstat -i
interrupt   total   rate
TLB shootdown 4677209  0
cpu0 timer 1046424629 99
msix2 vec 0  62702425  5
msix6 vec 0   3294854  0
ioapic0 pin 21 84  0
ioapic0 pin 20   21074226  2
ioapic0 pin 17  3344590700017 319462
ioapic0 pin 4   12722  0
Total   3345728886166 319570

stest: {9} grep 'ioapic0 pin 17' /var/run/dmesg.boot
pciide0: using ioapic0 pin 17 for native-PCI interrupt
stest: {10}

pciide0 only has the built-in CD drive, if I see correctly.
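
To spot such a source without eyeballing the table, the "vmstat -i"
output can be filtered on the rate column.  This is a sketch; the
function name and the default threshold of 10000/s are arbitrary
choices made for illustration.

```shell
#!/bin/sh
# Sketch only: print interrupt sources whose rate (last column of
# "vmstat -i") exceeds a threshold.  Source names may contain spaces,
# so the name is everything before the last two fields.
# hot_interrupts is a hypothetical helper name.
hot_interrupts() {
    threshold=${1:-10000}
    awk -v limit="$threshold" '
        NR > 1 && $1 != "Total" && $NF + 0 > limit {
            name = $1
            for (i = 2; i <= NF - 2; i++) name = name " " $i
            printf "%s: %d interrupts/s\n", name, $NF
        }'
}
```

Use as "vmstat -i | hot_interrupts", optionally with another
threshold as the first argument.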

Full dmesg attached below.

Any hints about what's going on and how to further diagnose and
eventually cure it?

Regards,

- Håvard
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018, 2019, 2020, 2021 The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 9.99.81 (GENERIC) #2: Wed May  5 11:59:20 UTC 2021
h...@stest.urc.uninett.no:/usr/obj/sys/arch/amd64/compile/GENERIC
total memory = 127 GB
avail memory = 123 GB
entropy: entering seed from bootloader with 256 bits of entropy
entropy: ready
timecounter: Timecounters tick every 10.000 msec
Kernelized RAIDframe activated
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
HP ProLiant DL380p Gen8
mainbus0 (root)
ACPI: RSDP 0x000F4F00 24 (v02 HP)
ACPI: XSDT 0xBDDAED00 EC (v01 HP ProLiant 0002 ??   
162E)
ACPI: FACP 0xBDDAEE40 F4 (v03 HP ProLiant 0002 ??   
162E)
Firmware Warning (ACPI): Invalid length for FADT/Pm1aControlBlock: 32, using 
default 16 (20210331/tbfadt-742)
Firmware Warning (ACPI): Invalid length for FADT/Pm2ControlBlock: 32, using 
default 8 (20210331/tbfadt-742)
ACPI: DSDT 0xBDDAEF40 0026DC (v01 HP DSDT 0001 INTL 
20030228)
ACPI: FACS 0xBDDAC140 40
ACPI: SPCR 0xBDDAC180 50 (v01 HP SPCRRBSU 0001 ??   
162E)
ACPI: MCFG 0xBDDAC200 3C (v01 HP ProLiant 0001  
)
ACPI: HPET 0xBDDAC240 38 (v01 HP ProLiant 0002 ??   
162E)
ACPI:  0xBDDAC280 64 (v02 HP ProLiant 0002 ??   
162E)
ACPI: SPMI 0xBDDAC300 40 (v05 HP ProLiant 0001 ??   
162E)
ACPI:  0xBDDAC340 000230 (v01 HP ProLiant 0001 ??   
162E)
ACPI: APIC 0xBDDAC580 00026A (v01 HP ProLiant 0002  
)
ACPI: SRAT 0xBDDAC800 000750 (v01 HP Proliant 0001 ??   
162E)
ACPI:  0xBDDACF80 000176 (v01 HP ProLiant 0001 ??   
162E)
ACPI:  0xBDDAD100 30 (v01 HP ProLiant 0001 ??   
162E)
ACPI:  0xBDDAD140 BC (v01 HP ProLiant 0001 ??   
162E)
ACPI: DMAR 0xBDDAD200 000558 (v01 HP ProLiant 0001 ??   
162E)
ACPI:  0xBDDAEC40 30 (v01 HP ProLiant 0001  
)
ACPI: PCCT 0xBDDAEC80 6E (v01 HP Proliant 0001 PH   
504D)
ACPI: SSDT 0xBDDB1640 0007EA (v01 HP DEV_PCI1 0001 INTL 
20120503)
ACPI: SSDT 0xBDDB1E40 000103 (v03 HP CRSPCI0  0002 HP   
0001)
ACPI: SSDT 0xBDDB1F80 98 (v03 HP CRSPCI1  0002 HP   
0001)
ACPI: SSDT 0xBDDB2040 00038A (v02 HP riser0   0002 INTL 
20030228)
ACPI: SSDT 0xBDDB2400 000536 (v03 HP riser1a  0002 INTL 
20030228)
ACPI: SSDT 0xBDDB2940 000537 (v03 HP riser2a  0002 INTL 
20030228)
ACPI: SSDT 0xBDDB2E80 000BB9 (v01 HP pcc  0001 INTL 
20120503)
ACPI: SSDT 0xBDDB3A40 000377 (v01 HP pmab 0001 INTL 
20120503)
ACPI: SSDT 0xBDDB3DC0 005524 (v01 HP pcc2 0001 INTL 
20120503)
ACPI: SSDT 0xBDDB9300 004604 (v01 INTEL  PPM RCM  0001 INTL 
20061109)
ACPI: 11 ACPI AML tables successfully acquired and loaded

Re: Problem reports for version control systems

2021-05-02 Thread Havard Eidnes
> I suspect what is commonly the problem here is related to the fact
> that cvs has such a phase at the beginning where it is scanning
> through the file system, which can take quite a while. Some NAT
> devices along the path sometimes have timeouts on existing connections
> that if no traffic is happening for a while, they are dropped, even
> though there hasn't been any FINs on the connection.
> So a connection that just don't have any traffic for a while are hit
> by this, which is exactly the pattern you have with cvs.

This is the reason some of my ~/.ssh/config files have

host *
  ServerAliveInterval 240

in them. :)

Regards,

- Håvard


Re: regarding the changes to kernel entropy gathering

2021-04-04 Thread Havard Eidnes
>> > No amount of uptime and activity was increasing the entropy in my
>> > system before I patched it.
>>
>> As I understand it, entropy was being contributed.  What wasn't
>> happening was the random driver code recognizing and acknowledging that
>> entropy, because it had no way to tell how much of it there really was.
>
> Clearly there was no entropy being contributed in any way shape or form.

Well.  That depends on what you mean by "entropy".

Samples are still being collected and mixed into the pool from your
listed sources.  By your statement, that should have contributed some
"randomness" into the pool, some might call this "entropy".

However, since the quality of those samples is unknown, and there is
no reliable model to estimate the actual quality of those samples,
they are in NetBSD-current not being counted as contributing to the
"entropy estimate" counter.  That is by design: the entropy estimation
is now quite conservative, as you have noticed.

I also presented a workaround for this problem; if you are reasonably
certain that you actually have mixed in a sufficient number of bits of
sufficient quality into the randomness pool (see "rndctl -l -v"), you
can do

# dd if=/dev/urandom of=/dev/random count=1

since if this is done by root, it counts as the "/dev/random" source,
and the bits fed in there by root are counted 1:1 as contributing to
the entropy estimate.  After this, your system will not block anymore
reading on /dev/random, and this state of affairs will be preserved
across reboots as long as you save and restore the entropy pool on
reboot.  (Which, admittedly, requires r/w storage for the relevant
file/directory, ref. your other ongoing thread.)
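
Put together, the workaround plus saving the pool could look
something like the following sketch.  The function name and DRYRUN
switch are made up, and /var/db/entropy-file is assumed to be the
seed file your rc.conf uses; adjust if yours differs.

```shell
#!/bin/sh
# Sketch only: credit the pool from /dev/urandom (only do this if
# you trust what has been mixed in so far), then save a seed file.
# credit_entropy and DRYRUN are hypothetical names; the default
# seed file path is an assumption.
credit_entropy() {
    seedfile=${1:-/var/db/entropy-file}
    run() {
        if [ -n "${DRYRUN:-}" ]; then echo "$*"; else "$@"; fi
    }
    run dd if=/dev/urandom of=/dev/random count=1 &&
    run rndctl -S "$seedfile"
}
```

This has to run as root for the written bits to be counted; with
DRYRUN=1 it only prints the two commands.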

> I told the system to "count" the entropy being gathered by the
> appropriate driver(s), but it was being ignored entirely.

Well, you are now counting bitstrings of unknown and perhaps dubious
quality as contributing 1:1 to the "entropy estimate".  It's by design
that we don't do that anymore.

> After my fix the system behaved as I told it to.

But now with perhaps questionable estimation of the actual entropy
sitting in your pool.

Regards,

- Håvard


Re: regarding the changes to kernel entropy gathering

2021-04-04 Thread Havard Eidnes
>> My question is, how can we tell what random sources a system actually
>> has, i.e. is there some flag that cpuctl identify shows when a system
>> has RDRAND/RDSEED?
>
> What about architectures that have nothing like RDRAND/RDSEED?  Are
> they, effectively, totally unsupported now?

Nope, not entirely.  But they have to be seeded once.  If they
have storage which survives reboots, and entropy is saved and
restored on reboot, they will be ~fine.

Systems without persistent storage and also without RDRAND/RDSEED
will however be ... a more challenging problem.

Regards,

- Håvard


Re: regarding the changes to kernel entropy gathering

2021-04-04 Thread Havard Eidnes
>> Do note, the existing randomness sources are still being sampled and
>> mixed into the pool, so even if the starting state from the saved
>> entropy may be known (by violating the security of the storage),
>> it's still not possible to predict the complete stream of randomness
>> data once the system has seen a bit of uptime (given that there are
>> actual other sources of (unverified) entropy which aren't all of too
>> low quality).
>
> No amount of uptime and activity was increasing the entropy in my system
> before I patched it.

Indeed, that's also compatible with what I wrote.  The samples
from whatever sources you have are still being mixed into the
pool, but they are not being counted as contributing to the
entropy estimate, because the quality of the samples is at best
unknown.

> The unpatched implementation completely and entirely prevents
> the system from ever using any of those sources, despite
> showing that they are enabled for use.

As far as I know, those samples are still being *used*, but as
said above, not counted as contributing to the entropy count.

A possible workaround is, once you have some uptime and some bits
mixed into the pool, you can do:

% su
# dd if=/dev/urandom of=/dev/random count=1

If you then ensure that the entropy is saved and restored on
reboot, as is typically done, after this initialization, neither
/dev/random nor /dev/urandom will block(!).  Ref. the attachment.
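
To see the difference between "sampled" and "counted" for yourself,
the verbose source listing can be summed up.  A sketch, assuming the
"samples = N" / "bits = N" line format that "rndctl -lv" prints in
the attachment; the function name is made up.

```shell
#!/bin/sh
# Sketch only: total the per-source sample counts and the bits those
# samples were counted for, from "rndctl -lv" output on stdin.
# rnd_summary is a hypothetical helper name.
rnd_summary() {
    awk '
        /samples = / { samples += $NF }
        /bits = /    { bits += $NF }
        END { printf "%d samples collected, %d bits credited\n", samples, bits }'
}
```

Use as "rndctl -lv | rnd_summary".  Many samples with few or no
credited bits is what the conservative estimator is expected to show.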

>> Besides, the implementation has been thoroughly vetted.  E.g. the
>> reference [7] from the wikipedia article states in the conclusion on
>> page 20
>>
>>Overall, the Ivy Bridge RNG is a robust design with a large
>>margin of safety that ensures good random data is generated even
>>if the Entropy Source is not operating as well as predicted.
>
> "design" != implementation

Well, if I'm not mistaken, the actual implementation was tested,
not just a theoretical study of the design.  And, as I said,
thermal noise is one of the well-known physical systems which
provide actual entropy.

I am still of the fairly firm belief that the mistrust in the
hardware vendors' ability to make a reasonable and robust
implementation is without foundation.

Regards,

- Håvard
b# uname -a
NetBSD b.urc.uninett.no 9.99.81 NetBSD 9.99.81 (GENERIC) #0: Sat Apr  3 
23:24:06 UTC 2021  
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/i386/compile/GENERIC i386
b# rndctl -l
Source Bits Type  Flags
/dev/random   0 ???  estimate, collect, v
wd0   0 disk estimate, collect, v, t, dt
cd0   0 disk estimate, collect, v, t, dt
cpu0  0 vm   estimate, collect, v, t, dv
hardclock 0 skew estimate, collect, t
bge0  0 net  estimate, v, t, dt
system-power  0 power estimate, collect, v, t, dt
autoconf  0 ???  estimate, collect, t
seed            256 ???  estimate, collect, v
b# rndctl -s
0 bits mixed into pool
  256 bits currently stored in pool (max 256)
0 bits of entropy discarded due to full pool
0 hard-random bits generated
0 pseudo-random bits generated
b# dd if=/dev/random of=/dev/null count=1024
0+1024 records in
64+0 records out
32768 bytes transferred in 0.021 secs (1560380 bytes/sec)
b# rndctl -s
0 bits mixed into pool
  256 bits currently stored in pool (max 256)
0 bits of entropy discarded due to full pool
0 hard-random bits generated
0 pseudo-random bits generated
b#
b# rndctl -lv
Source Bits Type  Flags
/dev/random   0 ???  estimate, collect, v
Dt samples = 0
Dt bits = 0
Dv samples = 0
Dv bits = 0
wd0   0 disk estimate, collect, v, t, dt
Dt samples = 5559
Dt bits = 0
Dv samples = 5869
Dv bits = 0
cd0   0 disk estimate, collect, v, t, dt
Dt samples = 0
Dt bits = 0
Dv samples = 0
Dv bits = 0
cpu0  0 vm   estimate, collect, v, t, dv
Dt samples = 183
Dt bits = 0
Dv samples = 183
Dv bits = 0
hardclock 0 skew estimate, collect, t
Dt samples = 853
Dt bits = 0
Dv samples = 0
Dv bits = 0
bge0  0 net  estimate, v, t, dt
Dt samples = 0
Dt bits = 0
Dv samples = 0
Dv bits = 0
system-power  0 power estimate, collect, v, t, dt
Dt samples = 0
Dt bits = 0
Dv samples = 0
Dv bits = 0
autoconf  0 ???  estimate, collect, t
Dt samples = 43
Dt bits = 0
Dv samples = 0
Dv bits = 0
seed            256 ???  estimate, collect, v
Dt samples = 0
Dt bits = 0
Dv samples = 1
Dv bits = 256
b#


Re: regarding the changes to kernel entropy gathering

2021-04-04 Thread Havard Eidnes
> Is that file encrypted?

As I understand it, no.

> I think I'd prefer possibly insecure, but difficult to obtain from outside
> like disk drive interrupt timing low order bits than that.   Regardless of
> how unproven that method might be.

Do note, the existing randomness sources are still being sampled and
mixed into the pool, so even if the starting state from the saved
entropy may be known (by violating the security of the storage),
it's still not possible to predict the complete stream of randomness
data once the system has seen a bit of uptime (given that there are
actual other sources of (unverified) entropy which aren't all of too
low quality).

However, in the new scheme of things, because most of the
traditional sources have unknown quality, and we have no reliable
method to estimate how much "actual entropy" those sources
provide, they no longer count towards the *estimate* of what is
now a lower bound on the "real" entropy available in the pool.

> Lastly, why would anyone presume that RDRAND generates less predictable
> bits (less predictable to someone who knows how it works) than any of
> the other methods that are used.

Looking at

  https://en.wikipedia.org/wiki/RDRAND

and the reference [3] at

  http://software.intel.com/sites/default/files/m/d/4/1/d/8/441_Intel_R__DRNG_Software_Implementation_Guide_final_Aug7.pdf

reveals that the on-chip entropy source samples thermal noise on
the chip, ref. page 12 where it says:

   The Entropy Source runs asynchronously on a self-timed circuit
   and uses thermal noise within the silicon to output a random
   stream of bits at the rate of 3GHz.

That bitstream is then fed through a "AES-CBC-MAC" based
conditioner and an AES-CTR based deterministic random bit
generator, before the result is given to the user via RDRAND.

If I'm not very much mistaken, thermal noise is one of the well-
known physical sources of actual entropy.

Besides, the implementation has been thoroughly vetted.  E.g. the
reference [7] from the wikipedia article states in the conclusion on
page 20

   Overall, the Ivy Bridge RNG is a robust design with a large
   margin of safety that ensures good random data is generated even
   if the Entropy Source is not operating as well as predicted.

Personally, I as a non-expert can't find anything which seems
overly worrisome with RDRAND.

Best regards,

- Håvard


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-31 Thread Havard Eidnes
> On Wed, Mar 31, 2021 at 12:12:31AM +, Taylor R Campbell wrote:
>> This is false.  If the VM host provided a viornd(4) device then NetBSD
>> would automatically collect, and count, entropy from the host, with no
>> manual intervention.
>
> I would love to see instructions how to do this - I have not seen a working
> virond(4) in any of my Xen domU (but that is a very limited sample).

This isn't with Xen, and isn't on -current, but this is what I
do for my emulated arm64 system, where the emulator runs on
NetBSD/amd64 8.0:

#!/bin/sh
SMP=4
MEM=8g
qemu-system-aarch64 -M virt -cpu cortex-a57 -smp $SMP -m $MEM \
  -drive if=none,file=disk.img,id=hd0 -device virtio-blk-device,drive=hd0 \
  -netdev type=user,id=net0,hostfwd=tcp::-:22,ipv6=off \
  -nographic \
  -device virtio-net-device,netdev=net0,mac=00:11:66:33:44:55 \
  -device virtio-rng-device \
  -kernel netbsd.img -append root=ld4a

and the booted system is NetBSD/aarch64 9.0 with the unmodified
GENERIC64 kernel:

arm64# rndctl -l
Source Bits Type  Flags
cpu3   7824 vm   estimate, collect, v, t, dv
cpu2   8983 vm   estimate, collect, v, t, dv
cpu1   8351 vm   estimate, collect, v, t, dv
cpu0  12436 vm   estimate, collect, v, t, dv
ld4 8440476 disk estimate, collect, v, t, dt
viornd0        4096 rng  estimate, collect, v
system-power  0 power estimate, collect, v, t, dt
autoconf 72 ???  estimate, collect, t, dt
printf            0 ???  collect
callout 116 skew estimate, collect, v, dv
arm64#
arm64# dmesg | grep rnd
[ 1.10] viornd0 at virtio29: Features: 0x1000
arm64# 
arm64# dmesg | grep virtio29
[ 1.10] virtio29 at simplebus0
[ 1.10] viornd0 at virtio29: Features: 0x1000
[ 1.10] virtio29: allocated 32768 byte for virtqueue 0 for Entropy 
request, size 1024
[ 1.10] virtio29: interrupting on GIC irq 77
arm64# 

When I get to booting a past-rng-rework kernel, I'm fairly
certain that only the input from viornd0 will remain as a source
with "estimate" in the flags field.  Of course, any saved and
restored entropy will also count towards the estimate.
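
A quick way to check whether a guest actually got such a device is
to look for a source of type "rng" in the listing.  A sketch (the
function name is made up; it assumes the three leading columns of
"rndctl -l" are Source, Bits and Type as shown above):

```shell
#!/bin/sh
# Sketch only: list the sources of type "rng" and their bit counts
# from "rndctl -l" output on stdin.  rng_sources is a hypothetical
# helper name.
rng_sources() {
    awk 'NR > 1 && $3 == "rng" { print $1, $2 }'
}
```

Use as "rndctl -l | rng_sources"; empty output means no hardware or
virtual RNG source attached.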


That said, it doesn't look like the amd64 XEN3_DOMU kernel has either
of virtio* or viornd* configured, they're only in the GENERIC and ALL
kernel configs.  Also, I don't know what has to happen on the XEN
"host side" to provide those devices; virtio* is apparently supposed
to be made visible via the pci bus (looking at amd64's GENERIC), but
by the looks of it, XEN only does "pci passthrough" to physical
devices (looking at the comments near the commented-out "pci" config
statements in XEN3_DOMU), so no "emulated" PCI bus where the host can
provide the host-side of the randomness virtual device?

Regards,

- Håvard


Re: -current tar(1) breakage

2021-03-27 Thread Havard Eidnes
>> (gdb) print *cv
>> $1 = {cv_shared = 0x6c652e73, cv_closure = 0x761713184050}
>> (gdb) print *cv->shared
>> There is no member named shared.
>> (gdb) print *cv->cv_shared
>> Cannot access memory at address 0x6c652e73
>> (gdb) print *cv->cv_shared->ci_ops
>> Cannot access memory at address 0x6c652e73
>> (gdb)
>
> 0x6c652e73 == "s.el" (ascii) so it sounds like something
> uninitialized/overwritten.

Hmm...  I thought progress had been made on getting the
address sanitizer feature of gcc and/or clang to work on NetBSD?
(ref. "gcc -fsanitize=address")

However, my previous attempt at using that feature on netbsd-9
was unfortunately not met with success (I tried with gcc) -- I
ended up with "rpl_malloc()" as undefined -- we don't define that
but it's supposedly defined in a Linux-based environment...

I didn't try with clang for the program I was looking at, and
I've also not tried on -current.

In principle, tar ought to be a simpler program to get to run
than the multi-threaded program I was trying to get going...
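
As an aside, the decoding of 0x6c652e73 into "s.el" quoted above is
easy to reproduce.  A sketch, assuming a little-endian machine (so
the lowest-addressed byte is the last pair of hex digits); the
function name is made up.

```shell
#!/bin/sh
# Sketch only: print the bytes of a hex value as ASCII in
# little-endian memory order.  decode_le is a hypothetical helper
# name; non-printable bytes are emitted as-is.
decode_le() {
    hex=${1#0x}
    # Pad to an even number of hex digits.
    case ${#hex} in *[13579]) hex=0$hex ;; esac
    out=
    while [ -n "$hex" ]; do
        rest=${hex%??}            # everything but the last byte
        byte=${hex#"$rest"}       # the last two hex digits
        out="$out$(printf "\\$(printf '%03o' "$((0x$byte))")")"
        hex=$rest
    done
    printf '%s\n' "$out"
}
```

"decode_le 0x6c652e73" gives back the four characters from the
quoted analysis, which supports the idea that a string ended up
where a pointer was expected.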

Regards,

- Håvard




Re: lang/rust build fails

2020-05-16 Thread Havard Eidnes
> The failure doesn't give much of a clue about what's happened.
> The last lines in the build.log are:
>
> running: /pkg_comp/work/pkg/lang/rust/work/rust-bootstrap/bin/cargo build 
> --manifest-path 
> /pkg_comp/work/pkg/lang/rust/work/rustc-1.42.0-src/src/bootstrap/Cargo.toml 
> --frozen
>Compiling proc-macro2 v0.4.30
> 
> At that point there's nothing consuming CPU time in the build
> and everything seems to be waiting on something to happen that
> never does.  I've left the system in that state for about 24
> hours and still no progress.
>
> Any clues? Could this be something related to some of the
> recent kernel changes?

I think I've seen something similar to this in my rebuilds:
sometimes cargo is present and waiting, and there is at least one
rustc child which is in "zombie" process state.  It's as if cargo
didn't "get the message" that the child process is done, and its
status is available for collection.

"ps axd" will show the process relationships.

I don't have an explanation why this happens, though, or a fix
for the problem.
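
To catch this state without reading the whole process tree, the
zombies and their parents can be picked out of ps output.  A sketch;
the function name is made up and it expects the four columns pid,
ppid, stat and comm in that order.

```shell
#!/bin/sh
# Sketch only: report zombie processes and their parent pids from
# "ps -axo pid,ppid,stat,comm" output on stdin.  zombies is a
# hypothetical helper name.
zombies() {
    awk 'NR > 1 && $3 ~ /^Z/ {
        printf "zombie pid %s (%s), parent pid %s\n", $1, $4, $2
    }'
}
```

Use as "ps -axo pid,ppid,stat,comm | zombies"; a rustc zombie whose
parent is a sleeping cargo would match the situation described.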

Regards,

- Håvard


Re: netbsd-7 build error?

2019-01-03 Thread Havard Eidnes
>> I just updated one of my source trees to netbsd-7, and did a
>> fresh rebuild (empty obj and dest, host oldish 7.1_STABLE), but
>> got:
>
> For what value of "just"? You need something from today around noon UTC
> or newer.

My update was done 13:06 UTC+1, but via a cvs mirror, so possibly
a few hours earlier than that.

>> Anyone have any guess why I saw this build error but it's not
>> seen in the autobuild?
>
> There has been no -7 build since the pullup that broke it (but
> you see the lossage in yesterdays -8 builds).

Aha, that explains it, thanks!  I suspected it was some local
oddity of my own doing; faith in self restored.

Regards,

- Håvard


netbsd-7 build error?

2019-01-03 Thread Havard Eidnes
Hi,

I just updated one of my source trees to netbsd-7, and did a
fresh rebuild (empty obj and dest, host oldish 7.1_STABLE), but
got:

compile  libc/compat___msgctl13.o
In file included from /usr/src/lib/libc/compat/sys/compat___msgctl13.c:48:0:
/usr/src/sys/compat/sys/msg.h: In function '__native_to_msqid_ds13':
/usr/src/sys/compat/sys/msg.h:111:2: error: implicit declaration of function 
'memset' [-Werror=implicit-function-declaration]
  memset(omsqbuf, 0, sizeof(*omsqbuf));
  ^
/usr/src/sys/compat/sys/msg.h:111:2: error: incompatible implicit declaration 
of built-in function 'memset' [-Werror]
/usr/src/sys/compat/sys/msg.h: In function '__native_to_msqid_ds14':
/usr/src/sys/compat/sys/msg.h:153:2: error: incompatible implicit declaration 
of built-in function 'memset' [-Werror]
  memset(omsqbuf, 0, sizeof(*omsqbuf));
  ^
cc1: all warnings being treated as errors

*** Failed target:  compat___msgctl13.o

I see the auto-build has not seen this error.  Locally I did

Index: sys/compat/sys/msg.h
===================================================================
RCS file: /cvsroot/src/sys/compat/sys/msg.h,v
retrieving revision 1.4.40.1
diff -u -r1.4.40.1 msg.h
--- sys/compat/sys/msg.h2 Jan 2019 15:25:29 -   1.4.40.1
+++ sys/compat/sys/msg.h3 Jan 2019 15:42:37 -
@@ -22,6 +22,7 @@
 #ifndef _COMPAT_SYS_MSG_H_
 #define _COMPAT_SYS_MSG_H_
 
+#include 
 #include 
 /*
  * Old message queue data structure used before NetBSD 1.5.

Anyone have any guess why I saw this build error but it's not
seen in the autobuild?

- Håvard


Re: Rust, pkgsrc

2018-11-06 Thread Havard Eidnes
> Yes. I had a similar problem.  The build would fill up the
> /tmp/ directory and die from exhausted resources.  I had /tmp/
> created with tmpfs and had a constraint of 64M.  The answer for
> me was to create /tmp in /etc/fstab with tmpfs and no size
> constraint.  Then Rust would build, but it still took a long
> time.

Yes, rust is an absolute Pig resource-wise.  Not only does it
carry a copy of llvm inside itself (it most probably has to), it
also carries a number of other packages inside.  Plus, the build
builds most parts at least twice, if I've been able to observe
correctly.

When building on NetBSD/amd64 8.0, I noticed that the work/
directory after a "make" consumes in the order of 10G disk space,
possibly more when cross-building (my current work/ is 13G, which
is an unfinished cross-build because I hit build issues...).

It is conceivable that the storage could be reduced somewhat (but
probably not by much?) by tweaking src/bootstrap/bootstrap.py to
say -Cdebuginfo=0 instead of 2 for RUSTFLAGS.

So that you have an idea what to expect:

My amd64 build host has 8G real memory, and a 2G tmpfs /tmp, and
... it didn't run out of space anywhere :) On this particular
host (i7 3rd-gen, 4 real cores, 8 w/HT, pkgsrc and system on SSD)
the build of 1.29.2 completed in a little over 2 hours wall-clock
time, csh's "time" report at the end of the build was

40468.007u 1958.277s 2:04:42.18 567.0%  0+0k 10302+100556io 129329pf+0w

So, yes, the build makes fairly good use of the multiple cores;
notice the 567.0%, which, if I've understood correctly, indicates
approx. 5.7* parallelism on average.
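
For anyone wanting to redo that arithmetic: the percentage is (user +
system CPU time) divided by wall-clock time.  A minimal sketch, with
the three figures copied from the "time" line above; the function
names are made up for illustration:

```python
def wall_seconds(elapsed):
    """Convert csh's [h:]mm:ss.ss elapsed field to seconds."""
    seconds = 0.0
    for part in elapsed.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def average_parallelism(user_s, sys_s, elapsed):
    """CPU seconds consumed per wall-clock second."""
    return (user_s + sys_s) / wall_seconds(elapsed)


if __name__ == "__main__":
    # Figures from the csh "time" report quoted above.
    ratio = average_parallelism(40468.007, 1958.277, "2:04:42.18")
    print(f"average parallelism: {ratio:.2f} ({ratio * 100:.1f}%)")
```

On an 8-thread host this ratio cannot go above 8, so an average of
about 5.7 means the build kept most of the hardware threads busy
most of the time.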

The last version I managed to build on one of my NetBSD/macppc
8.0 machines (a single-core 1.5GHz G4 Mac Mini, 1GB memory) was
1.29.2; the build took nearly 29 hours wall-clock time.  This one
doesn't have a tmpfs, and has a single file system with ~40G
free, so it didn't run into any barriers on the /tmp front
either.

Regards,

- Håvard


Re: llvm self-tests looping(?)

2018-04-09 Thread Havard Eidnes
>>> I don't see an ATF machine for powerpc, there shall be one available.
>>>
>>> http://releng.netbsd.org/test-results.html
>>
>> Mm, OK, doing the tests on netbsd-8 on this MacMini G4 should be
>> fairly straight-forward.
>
> Not so.  The machine wedged partway through the tests, it's in
> the office and I'm at home...

It looks like one of the net/icmp/t_ping* tests caused the wedge.
I commented them out from Atffile, and the tests now completed.

Regards,

- Håvard


Re: llvm self-tests looping(?)

2018-04-07 Thread Havard Eidnes
>> I don't see an ATF machine for powerpc, there shall be one available.
>>
>> http://releng.netbsd.org/test-results.html
>
> Mm, OK, doing the tests on netbsd-8 on this MacMini G4 should be
> fairly straight-forward.

Not so.  The machine wedged partway through the tests, it's in
the office and I'm at home...

Regards,

- Håvard


Re: llvm self-tests looping(?)

2018-04-07 Thread Havard Eidnes
>> Hm, I am suspecting that nobody has actually tested whether
>> backtrace() really works on NetBSD/powerpc...  I'll write a
>> simple test of that in C tomorrow.
>
> Yes, this looks more like dysfunctional backtrace(3).
>
> We have got an ATF test for this:
>
>   tests/lib/libexecinfo/t_backtrace.c
>
> If it will work, it's worth to add a scenario that fails for ppc.

Hmm, that test seems to be working just fine, sigh!

ambrosia: {6} ./t_backtrace backtrace_fmt_basic
t_backtrace: WARNING: Running test cases without atf-run(1) is unsupported
t_backtrace: WARNING: No isolation nor timeout control is being applied; you 
may get unexpected failures; see atf-test-case(4)
got nptrs=19 ncalls=12 (min_frames: 4, max_frames: 9)
backtrace is:
#0: myfunc3
#1: myfunc2
#2: myfunc1
#3: myfunc1
#4: myfunc1
#5: myfunc1
#6: myfunc1
#7: myfunc1
#8: myfunc1
#9: myfunc1
#10: myfunc1
#11: myfunc1
#12: myfunc1
#13: myfunc1
#14: myfunc
#15: atfu_backtrace_fmt_basic_body
#16: atf_tc_run
#17: atf_tp_run
#18: atf_tp_main
passed
ambrosia: {7} pwd
/usr/tests/lib/libexecinfo
ambrosia: {8} 

> I don't see an ATF machine for powerpc, there shall be one available.
>
> http://releng.netbsd.org/test-results.html

Mm, OK, doing the tests on netbsd-8 on this MacMini G4 should be
fairly straight-forward.

>> On the other hand, the backtrace gdb was able to provide
>> decidedly looks incomplete -- the program's main function is not
>> opendir() (!), and maybe this has something to do with it?
>
> This is a bug, it's really a signal trampoline. This needs to be fixed
> in GDB.. it's on my TODO list.

OK, good.

>> It doesn't look like the SupportTests program is multi-threaded,
>> although it is linked with -lpthread:
>
> It's common in the LLVM environment to link with everything that could
> be useful.. like libm, librt, libpthread, libdl [for !NetBSD] etc.

Mm, ok.

- Håvard


Re: llvm self-tests looping(?)

2018-04-04 Thread Havard Eidnes
> On 01.04.2018 16:53, Havard Eidnes wrote:
>> And some of the internal functions in libexecinfo are apparently
>> static, so not present in the symbol table for display in the
>> debugger, making debugging all that much harder.
>> 
>> Sigh!
>> 
>> Hints, anyone?
>
> There is an internal LLVM support for unwinding backtrace with an
> attempt to print a meaningful information on a crash signal.

Right...

> I assume that there is a crash in the unwinder code causing recursive
> execution of a signal handler.

:(

Would not that cause the stack to overflow?

> There was also a post-6.0 patch:
>
> Fix llvm-config --system-libs output on FreeBSD and NetBSD
>
> https://github.com/llvm-mirror/llvm/commit/daf294622383687adc281dd695acf4533caf0357
>
> Not sure if it is of any help, but it's worth to backport it to 6.0.

I applied that locally to the 5.0.1 package I'm building from.

I also updated netbsd-8 source, rebuilt kernel and user-land and
installed them, then rebuilt llvm, and tried to re-run the tests.
The pointers to which library is which have changed a little (and
are probably more accurate now).  Included in the update was, I
think, an update of gcc from 5.4.0 to 5.5.0.  Yes, my 8.0_BETA
snapshot was from the middle of last year...

(gdb) where
#0  0xfbe5a96c in memcpy () from /usr/lib/libc.so.12
#1  0xfbed46a0 in ?? () from /usr/lib/libgcc_s.so.1
#2  0xfbed4cfc in ?? () from /usr/lib/libgcc_s.so.1
#3  0xfbed5bc8 in _Unwind_Backtrace () from /usr/lib/libgcc_s.so.1
#4  0xfc0d0c1c in backtrace () from /usr/lib/libexecinfo.so.0
#5  0xfc56d8d8 in llvm::sys::PrintStackTrace(llvm::raw_ostream&) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#6  0xfc56dc60 in PrintStackTraceSignalHandler(void*) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#7  0xfc56bacc in llvm::sys::RunSignalHandlers() ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#8  0xfc56bd4c in SignalHandler(int) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#9  0xfbdda03c in opendir () from /usr/lib/libc.so.12
Backtrace stopped: frame did not save the PC
(gdb) 

How can I see if it repeatedly hits a signal? I tried:

(gdb) b SignalHandler
Breakpoint 1 at 0xfc56bb80
(gdb) c
Continuing.

and ... nothing.

So I interrupted and tried some single-stepping:

(gdb) si
0xfbed474c in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4750 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4754 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4758 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed475c in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4760 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4764 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47ac in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b0 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b4 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b8 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47bc in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4744 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4748 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed474c in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4750 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4754 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4758 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed475c in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4760 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4764 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47ac in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b0 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b4 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b8 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47bc in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4744 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4748 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed474c in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4750 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4754 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4758 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed475c in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4760 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4764 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47ac in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b0 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b4 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47b8 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed47bc in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4744 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4748 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed474c in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4750 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4754 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed4758 in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xfbed475c in ?? () from /usr/lib/libgcc_s.so.1
(gdb) 
0xf

Re: llvm self-tests looping(?)

2018-04-03 Thread Havard Eidnes
>> And ... as a follow-up I thought I'd check whether "make test" in
>> lang/llvm (5.0.1nb1) works on NetBSD/amd64 8.0_BETA.  And while the
>> selftest setup seems to work fine on this platform, there are quite a
>> few unexpected failures:
>>
>>   Expected Passes: 20309
>>   Expected Failures  : 130
>>   Unsupported Tests  : 786
>>   Unexpected Failures: 211
>>
>> and it seems a lot of the tests simply crash.  An example:
>
> Most of that fixed in 6.0, MPROTECT violation in JIT code.

I would have never guessed that as the root cause given the
diagnostics emitted.  This smells like software which tries to be
too clever by half.

Regards,

- Håvard


Re: llvm self-tests looping(?)

2018-04-02 Thread Havard Eidnes
And ... as a follow-up I thought I'd check whether "make test" in
lang/llvm (5.0.1nb1) works on NetBSD/amd64 8.0_BETA.  And while the
selftest setup seems to work fine on this platform, there are quite a
few unexpected failures:

  Expected Passes: 20309
  Expected Failures  : 130
  Unsupported Tests  : 786
  Unexpected Failures: 211

and it seems a lot of the tests simply crash.  An example:


Testing: 0 .. 10.. 20.. 30.. 40.. 50.. 60
FAIL: LLVM :: ExecutionEngine/test-interp-vec-logical.ll (13657 of 21436)
 TEST 'LLVM :: ExecutionEngine/test-interp-vec-logical.ll' 
FAILED 
Script:
--
/usr/pkgsrc/lang/llvm/work/build/./bin/lli 
/usr/pkgsrc/lang/llvm/work/llvm-5.0.0.src/test/ExecutionEngine/test-interp-vec-logical.ll
 > /dev/null
--
Exit Code: 139

Command Output (stderr):
--
#0 0x7138327c701a llvm::sys::PrintStackTrace(llvm::raw_ostream&) 
(/usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so+0x7c701a)
#1 0x7138327c536f llvm::sys::RunSignalHandlers() 
(/usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so+0x7c536f)
#2 0x7138327c54b0 SignalHandler(int) 
(/usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so+0x7c54b0)
Stack dump:
0.  Program arguments: /usr/pkgsrc/lang/llvm/work/build/./bin/lli 
/usr/pkgsrc/lang/llvm/work/llvm-5.0.0.src/test/ExecutionEngine/test-interp-vec-logical.ll
 
/usr/pkgsrc/lang/llvm/work/build/test/ExecutionEngine/Output/test-interp-vec-logical.ll.script:
 line 1: 18054 Segmentation fault  (core dumped) 
/usr/pkgsrc/lang/llvm/work/build/./bin/lli 
/usr/pkgsrc/lang/llvm/work/llvm-5.0.0.src/test/ExecutionEngine/test-interp-vec-logical.ll
 > /dev/null

However, this doesn't appear to actually give enough information to
zero in on what the actual problem is, sigh!

Isn't the backtracer able to trace through signal delivery events?

I find there's a number of core files left after the selftests have
run:

d3: {10} find work -name '*.core'
work/build/test/MCJITTests.core
work/build/test/OrcJITTests.core
work/build/test/SupportTests.core
work/build/test/BugPoint/opt.core
work/build/test/ExecutionEngine/MCJIT/lli.core
work/build/test/ExecutionEngine/OrcMCJIT/lli.core
work/build/test/ExecutionEngine/lli.core
d3: {11}

but looking at the last one doesn't really give much more useful
information:

d3: {11} gdb -q work/build/./bin/lli
Reading symbols from work/build/./bin/lli...(no debugging symbols found)...done.
(gdb) core-file work/build/test/ExecutionEngine/lli.core
[New process 1]
Core was generated by `lli'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x701d30805000 in main ()
(gdb) x/i 0x701d30805000
=> 0x701d30805000 :   xor    %eax,%eax
(gdb) where
#0  0x701d30805000 in main ()
#1  0x701d2e95bf17 in llvm::MCJIT::runFunction(llvm::Function*, 
llvm::ArrayRef) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#2  0x701d2e9378ed in 
llvm::ExecutionEngine::runFunctionAsMain(llvm::Function*, 
std::vector const&, char const* 
const*) () from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#3  0x0041cad6 in main ()
(gdb) i reg
rax            0x701d2e95be9a   123270637928090
rbx            0x7f7fffecd600   140187731285504
rcx            0x701d3080b800   123270670104576
rdx            0xff4350da       -12365606
rsi            0x20             32
rdi            0x701d2d1236e0   123270612530912
rbp            0x20             0x20
rsp            0x7f7fffecd458   0x7f7fffecd458
r8             0x0              0
r9             0x701d2d1151c0   123270612472256
r10            0x701d30805000   123270670077952
r11            0x207            519
r12            0x701d30805000   123270670077952
r13            0x701d2d16c000   123270612828160
r14            0x701d2d1236e0   123270612530912
r15            0x701d2c558b00   123270600166144
rip            0x701d30805000   0x701d30805000
eflags         0x10246          [ PF ZF IF RF ]
cs             0x47             71
ss             0x3f             63
ds             0x3f             63
es             0x3f             63
fs             0x0              0
gs             0x0              0
(gdb) 

?!?  SEGV on "xor %eax,%eax"?!?  I don't think so...

Regards,

- Håvard


llvm self-tests looping(?)

2018-04-01 Thread Havard Eidnes
Hi,

I decided it might be a good idea to run the self-tests in llvm
5.0.1 on powerpc.  However, after the test and utilities are
built, it appears to spin while doing the first test.  The run
log shows:

[100%] Built target LLVMHello_exports
[100%] Built target LLVMHello
Scanning dependencies of target check-llvm
[100%] Running the LLVM regression tests
-- Testing: 21465 tests, 1 threads --
Testing: 0 ..

and the process is spinning consuming CPU (well, it's stopped in
the debugger here):

  PID USERNAME PRI NICE   SIZE   RES STATE  TIME   WCPUCPU COMMAND
0 root 1260 0K   22M pgdaemon 600:47  0.00%  0.00% [system]
19597 root  28443M   15M STOP 306:27  0.00%  0.00% SupportTests

and the debugger points towards libexecinfo:

ambrosia# gdb -q ./work/build/unittests/Support/SupportTests
Reading symbols from ./work/build/unittests/Support/SupportTests...(no 
debugging symbols found)...done.
(gdb) attach 19597
Attaching to program: 
/usr/pkgsrc/lang/llvm/work/build/unittests/Support/SupportTests, process 19597
Couldn't get registers: Device busy.
Couldn't get registers: Device busy.
(gdb) Reading symbols from /usr/lib/libpthread.so.1...(no debugging symbols 
found)...done.
Reading symbols from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so...(no 
debugging symbols found)...done.
Reading symbols from /usr/lib/librt.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libexecinfo.so.0...(no debugging symbols 
found)...done.
Reading symbols from /usr/lib/libterminfo.so.1...(no debugging symbols 
found)...done.
Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libstdc++.so.7...(no debugging symbols 
found)...done.
Reading symbols from /usr/lib/libm.so.0...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libgcc_s.so.1...(no debugging symbols 
found)...done.
Reading symbols from /usr/lib/libc.so.12...(no debugging symbols found)...done.
Reading symbols from /usr/lib/libedit.so.3...(no debugging symbols 
found)...done.
Reading symbols from /usr/lib/libelf.so.2...(no debugging symbols found)...done.
Reading symbols from /usr/libexec/ld.elf_so...(no debugging symbols 
found)...done.
[Switching to LWP 1]
0xfc1d3fcc in ?? () from /usr/lib/libexecinfo.so.0

(gdb) where
#0  0xfc1d3fcc in ?? () from /usr/lib/libexecinfo.so.0
#1  0xfc1d4514 in ?? () from /usr/lib/libexecinfo.so.0
#2  0xfc1d53e0 in _Unwind_Backtrace () from /usr/lib/libexecinfo.so.0
#3  0xfc1d128c in backtrace () from /usr/lib/libexecinfo.so.0
#4  0xfc673ac4 in llvm::sys::PrintStackTrace(llvm::raw_ostream&) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#5  0xfc673e4c in PrintStackTraceSignalHandler(void*) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#6  0xfc671c08 in llvm::sys::RunSignalHandlers() ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#7  0xfc671e88 in SignalHandler(int) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#8  0xfbe2af58 in opendir () from /usr/lib/libc.so.12
Backtrace stopped: frame did not save the PC
(gdb) i thread
  Id   Target Id Frame 
* 1LWP 1 0xfbeb4c4c in memcpy () from /usr/lib/libc.so.12
(gdb)

and an earlier backtrace of the same process gave

(gdb) where
#0  0xfbeb4c4c in memcpy () from /usr/lib/libc.so.12
#1  0xfc1d2c20 in ?? () from /usr/lib/libexecinfo.so.0
#2  0xfc1d3470 in ?? () from /usr/lib/libexecinfo.so.0
#3  0xfc1d52e8 in _Unwind_Backtrace () from /usr/lib/libexecinfo.so.0
#4  0xfc1d128c in backtrace () from /usr/lib/libexecinfo.so.0
#5  0xfc673ac4 in llvm::sys::PrintStackTrace(llvm::raw_ostream&) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#6  0xfc673e4c in PrintStackTraceSignalHandler(void*) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#7  0xfc671c08 in llvm::sys::RunSignalHandlers() ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#8  0xfc671e88 in SignalHandler(int) ()
   from /usr/pkgsrc/lang/llvm/work/build/lib/libLLVM-5.0.so
#9  0xfbe2af58 in opendir () from /usr/lib/libc.so.12
Backtrace stopped: frame did not save the PC
(gdb) 

Setting a breakpoint on the next instruction in backtrace():

(gdb) x/i 0xfc1d128c
   0xfc1d128c :   lwz     r3,16(r1)
(gdb) x/i backtrace
   0xfc1d1250 :   stwu    r1,-48(r1)
(gdb) x/20i
   0xfc1d1254 :   mflr    r0
   0xfc1d1258 :   bcl     20,4*cr7+so,0xfc1d125c
   0xfc1d125c :   li      r9,-1
   0xfc1d1260 :   stw     r30,40(r1)
   0xfc1d1264 :   mflr    r30
   0xfc1d1268 :   stw     r3,8(r1)
   0xfc1d126c :   stw     r4,12(r1)
   0xfc1d1270 :   addi    r4,r1,8
   0xfc1d1274 :   addis   r30,r30,2
   0xfc1d1278 :   stw     r9,16(r1)
   0xfc1d127c :   addi    r30,r30,-448
   0xfc1d1280 :   stw     r0,52(r1)

Re: ssh, HPN extension and TCP auto-tuning

2017-09-21 Thread Havard Eidnes
Hi,

> 
> # Improves TCP performance significantly with ssh.
> net.inet.tcp.recvbuf_auto=1
> net.inet.tcp.sendbuf_auto=1
> net.inet.tcp.sendbuf_max=16777216
> net.inet.tcp.recvbuf_max=16777216

Thanks for the suggestions, and I've done some initial
adjustments with beneficial results.  I was a bit more
conservative and went for a 1MB sendbuf_max / recvbuf_max.

One thing I didn't see was any corresponding adjustment of
kern.sbmax; doesn't it also need to be as large as you want the
TCP window to be able to grow?

Best regards,

- Håvard


ssh, HPN extension and TCP auto-tuning

2017-09-20 Thread Havard Eidnes
Hi,

the OpenSSH in NetBSD has for quite a while had the "high-
performance networking" patches applied.

However, despite this, we are observing rather low performance
when copying files over a distance, e.g. we have a pair of hosts
running netbsd-7 code, placed some 14-15ms apart, where scp'ing a
file only manages to give around 2.6MB/s.

Doing a tcpdump and an analysis using tcptrace + looking at the
result with xplot reveals that the TCP window never climbs above
the default 32KB size.

This is when the scp client is pushing the file to the remote
server.

However, when you copy "in the other direction", i.e. when the
remote sshd is the one which is pushing the file across the
network, we get an average of 8.4MB/s when copying a 143MB large
file, and a tcpdump + tcptrace reveals that in this case the
system's automatic tuning of the TCP window is indeed kicking
into action.
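
As a rough cross-check of those two figures: a sender stuck with a
fixed window can have at most one window in flight per round trip, so
its throughput is bounded by window / RTT.  A minimal sketch, assuming
that the 14-15ms separation is the round-trip time and that 1MB means
10^6 bytes (both are assumptions, and the function names are made up
for illustration):

```python
def window_limited_throughput(window_bytes, rtt_s):
    """Upper bound on throughput (bytes/s) with a fixed TCP window."""
    return window_bytes / rtt_s


def window_needed(throughput_bytes_per_s, rtt_s):
    """Window (bytes) needed to sustain a given throughput."""
    return throughput_bytes_per_s * rtt_s


if __name__ == "__main__":
    for rtt_ms in (14, 15):
        rtt = rtt_ms / 1000.0
        bound = window_limited_throughput(32 * 1024, rtt)
        need = window_needed(8.4e6, rtt)
        print(f"RTT {rtt_ms}ms: a 32KB window caps at "
              f"{bound / 1e6:.2f}MB/s; 8.4MB/s needs about "
              f"{need / 1024:.0f}KB of window")
```

Under those assumptions the 32KB bound lands a little above 2MB/s,
the same ballpark as the push figure, while the pull figure needs a
window several times the default.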

The same behaviour can be observed with the scp client from
8.0_BETA: pushing with scp is slow, pulling with scp from the
remote server is quite a bit faster.  I'm going to guess that
"pushing with scp" is the most often used mode, as you may get
file name completion in that case...

If, on the other hand, I bump the recvspace and sendspace on the
two involved hosts, so that the scp client has a larger default
send space, performance improves, but again, TCP auto-tuning does
not appear to be kicking in.

Am I alone in seeing this?

I must say I'm puzzled by the result.

The configuration on both systems is pretty much "stock", and
the network is not the bottleneck in my case.

Admittedly, the OpenSSH in netbsd-7 is quite old, and the HPN
patches are probably of the same vintage, and I've not checked if
a newer combination on that front will improve matters; I may do
that next.

Regards,

- Håvard


aac / ld interaction

2017-08-03 Thread Havard Eidnes
Hi,

this afternoon I attempted to upgrade NetBSD from 7.1_STABLE to
8.0_BETA on an amd64-running machine in our lab.  It has an aac
controller, probed like so:

aac0 at pci6 dev 0 function 0: IBM ServeRAID 8k
aac0: interrupting at ioapic0 pin 17
aac0: Enabling 64-bit address support
aac0: Enable 64-bit array support
aac0: New comm. interface enabled
aac0: MIPS 5KC at 250MHz, 32MB mem (16MB cache), optional battery not installed
ld0 at aac0 unit 0: RAID 1 (Mirror)
ld0: 232 GB, 30378 cyl, 255 head, 63 sec, 512 bytes/sect x 488036352 sectors
ld1 at aac0 unit 1: RAID 10
ld1: 465 GB, 60757 cyl, 255 head, 63 sec, 512 bytes/sect x 976072704 sectors

Apparently, somewhere between those two kernel versions, the 'ld'
driver was MPified, while the aac driver was not.

The symptom I noticed this with was that I managed to unpack the
'base' set from the 8.0_BETA build, but 29% into the 'comp' set, the
machine decided to panic with a rather mysterious null pointer
de-reference inside the aac driver, pointing (by the looks of it) to
the SIMPLEQ_FIRST macro use inside aac_ccb_enqueue().

By this time I had a machine with an 8-based user-land but with a
brittle 8 kernel, and of course the 7-based kernel doesn't want
anything to do with the 8-based ifconfig, so by the looks of it, the
machine is going to need me to physically visit it to rescue it out of
this bind, even though I have remote serial console.

I don't know the MPify-state for the other sub-drivers under 'ld'
(I see aac, cac, icp, mlx and nvme), but this should probably get
some proper attention before 8.0 is released...

And, I have it on good authority that simply replacing splbio() /
splx() with mutex_enter() / mutex_exit() isn't going to work, because
there are a number of things which are not permitted in a mutex-based
critical section which are perfectly fine in an splbio()-protected
section, such as doing malloc(), using bus_* functions etc. etc.  So
the conversion is decidedly not trivial, and certainly above my
current skill level.

So, "Help!"

Regards,

- Håvard


Re: rnd entropy estimate running low?

2017-01-31 Thread Havard Eidnes
>> Meanwhile the hardware random generator sits there unused.
>
> Does it sit there completely unused, or did it get used a little at
> boot time?

It generated some bits at boot time, but apparently not early
enough, because on each reboot the kernel log looks like this:

...
total memory = 1024 MB
avail memory = 1007 MB
sysctl_createv: sysctl_create(machine_arch) returned 17
rnd: callout attached as an entropy source (collecting)
rnd: initialised (4096) with counter
rnd: printf attached as an entropy source (collecting without estimation)
rnd: autoconf attached as an entropy source (collecting)
rnd: WARNING! initial entropy low (5).
rnd: starting statistical RNG test, entropy = 6.
rnd: statistical RNG test done, entropy = 6.
rnd: entropy estimate 0 bits
rnd: asking source callout for 512 bytes
rnd: WARNING! initial entropy low (0).
rnd: entropy estimate 0 bits
rnd: asking source callout for 512 bytes
rnd: system-power attached as an entropy source (collecting)
mainbus0 (root)
cpu0 at mainbus0 core 0: 1536 MHz Cortex-A5 r0p1 (Cortex V7A core)
...

I'm assuming this is because this happens too early: the rng
device hasn't been detected this early in the boot process,
there's no file system accessible at this stage to re-initialize
the kernel rng from, and the boot loader doesn't have a way to
work around this.

(This is more a platform-specific problem, I think, and
tangential to what I discussed initially.)

>> I would have thought it would make more sense to keep the "bits
>> currently stored in pool" more "topped up", and that a re-fill
>> could with benefit be done before the estimate crept down towards
>> zero?  Especially if you have a half-way decent hardware random
>> generator at hand?
>
> Actually, no.  One basic conceit of modern symmetric cryptography is
> that from a single small uniform random 256-bit secret, you can derive
> an arbitrarily large uniform random secret.  `Entropy depletion' does
> not really exist as a meaningful concept in modern cryptography.
>
> The entropy accounting that we currently do is a holdover from days of
> yore when the folklore supported it, but the natural information-
> theoretic interpretation of the folklore actually leads to worse
> attacks in practice -- see the rnd(4) man page for details.  So while
> we haven't gotten rid of the kooky accounting, it doesn't really mean
> anything to see the numbers go down.
>
> There is a limit to the output produced by, e.g., AES-CTR, arising
> from the PRP approximation to a PRF and the birthday paradox, and
> there are some US federal government standards (NIST SP800-90A, in
> particular) about PRNG constructions that Thor wanted to make it easy
> to follow, which is why we rekey cprng(9) after a relatively small
> amount of output -- but that happens much slower than the entropy
> accounting you're looking at, and is not reported to userland.

OK, I'll buy the crypto argument at face value.  However, our
code still behaves differently depending on whether the entropy
estimate is able to "satisfy" the request being processed or not.
So under this description that is also a holdover from older
versions of this code?

It may be coincidental, but this box when it sits otherwise
mostly idle and only does ntp for a long while sometimes logs

Kernel RNG "231 0 1" monobit test FAILURE: 10300 ones
cprng 231 0 1: failed statistical RNG test
...
Kernel RNG "15965 0 4" runs test FAILURE: too many runs of 4 1s (386 >= 384)
cprng 15965 0 4: failed statistical RNG test
...
Kernel RNG "27778 0 3" poker test failure: parameter X = 2.9280
cprng 27778 0 3: failed statistical RNG test
...
Kernel RNG "6647 0 3" poker test failure: parameter X = 47.2720
cprng 6647 0 3: failed statistical RNG test
...
Kernel RNG "24153 0 3" long run test FAILURE: Run of 29 0s found
cprng 24153 0 3: failed statistical RNG test
...
Kernel RNG "2551 0 4" poker test failure: parameter X = 47.60320
cprng 2551 0 4: failed statistical RNG test
...

Admittedly, these are spread over a larger time period, and a
couple of them were the result of provocation by dumping data
from /dev/random with dd.
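
For reference, the monobit failure at the top of that list is the
simplest of these checks: count the 1 bits in a 20000-bit block and
fail if the count falls outside a fixed band.  A minimal sketch,
assuming the kernel applies the FIPS 140-2 limits (9725 < ones <
10275), which is at least consistent with 10300 ones being reported
as a failure.  This is an illustration only, not the kernel's code:

```python
import os

BLOCK_BYTES = 2500  # 20000 bits per test block
# Assumed limits (FIPS 140-2 monobit test); not taken from the kernel.
MONOBIT_LOW, MONOBIT_HIGH = 9725, 10275


def count_ones(block):
    """Number of 1 bits in a bytes object."""
    return sum(bin(byte).count("1") for byte in block)


def monobit_ok(block):
    """True if the block's count of 1 bits lies inside the band."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("monobit test expects a 2500-byte block")
    return MONOBIT_LOW < count_ones(block) < MONOBIT_HIGH


if __name__ == "__main__":
    sample = os.urandom(BLOCK_BYTES)
    print(count_ones(sample), "ones, pass =", monobit_ok(sample))
```

Since the band is finite, even a perfectly good generator will land
outside it once in a while.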

Regards,

- Håvard


rnd entropy estimate running low?

2017-01-12 Thread Havard Eidnes
Hi,

on a couple of arm boxes I have I've been observing the
development of the entropy estimate, what "rndctl -s" calls "bits
currently stored in pool" over time.

I've also tried to read some of the code to understand the
behaviour.

If I understand correctly, randomness sources come in two basic
flavours: those which offer up randomness samples based on
(possibly external) events, and those which only provide samples
when "asked" to do so.  The hardware randomness generator on my
amlogic arm boards appears to fall into the latter category.

On a system with few other active randomness sources (e.g. FS
activity or keyboard / mouse activity), it appears that the "bits
currently stored in pool" will be allowed to decrease quite close
to zero (or even *to* zero) before the polled sources are
queried, via e.g. rnd_extract() only triggering a rnd_getmore()
if it could not initially fulfill the request.  The same also
appears to hold for rnd_tryextract().

Meanwhile the hardware random generator sits there unused.

I would have thought it would make more sense to keep the "bits
currently stored in pool" more "topped up", and that a re-fill
could with benefit be done before the estimate crept down towards
zero?  Especially if you have a half-way decent hardware random
generator at hand?
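
To make the question concrete, here is a toy model of the two refill
policies: asking the polled source only once a request cannot be met,
versus topping up whenever the estimate drops below a low-water mark.
Every name and number in it is made up for illustration; it does not
model the actual rnd code.

```python
def count_shortfalls(requests, capacity=4096, refill=4096, low_water=0):
    """Count requests that found fewer bits in the pool than they asked for.

    All parameters are hypothetical; low_water=0 disables the
    proactive top-up, leaving only the on-demand refill.
    """
    pool = capacity
    shortfalls = 0
    for need in requests:
        if pool < need:
            # On-demand policy: the polled source is asked only now.
            shortfalls += 1
            pool = min(capacity, pool + refill)
        pool -= min(pool, need)
        if pool < low_water:
            # Proactive policy: top up before the estimate nears zero.
            pool = min(capacity, pool + refill)
    return shortfalls


if __name__ == "__main__":
    demand = [256] * 100  # one hundred 256-bit requests
    print("on-demand only:", count_shortfalls(demand))
    print("low-water 1024:", count_shortfalls(demand, low_water=1024))
```

In this model the on-demand policy regularly leaves a request facing
a short pool, while the low-water policy never does; whether that
difference matters in practice is exactly the question.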

(This has been observed with both 7.99.47 and 7.99.58, fwiw.)

Regards,

- Håvard


ODROID-C1 networking problems?

2017-01-12 Thread Havard Eidnes
Hi,

I have a couple of ODROID C1 boxes.  One of them appears to have
intermittent networking problems, in particular with receiving
packets.

droid# uname -a
NetBSD droid.urc.uninett.no 7.99.58 NetBSD 7.99.58 (ODROID-C1) #0: Thu Jan 12 
10:12:54 CET 2017  
h...@mt.urc.uninett.no:/u/build/HEAD/obj/evbearmv7hf-el/sys/arch/evbarm/compile/ODROID-C1
 evbarm
droid# dmesg | grep awge0
awge0 at amlogicio0: Gigabit Ethernet Controller
awge0: interrupting on irq 40
awge0: Ethernet address: 00:1e:06:20:1b:50
rgephy0 at awge0 phy 0: RTL8169S/8110S/8211 1000BASE-T media interface, rev. 6
rgephy1 at awge0 phy 1: RTL8169S/8110S/8211 1000BASE-T media interface, rev. 6
droid#

While pinging this box from a remote host (which should give 1pps
in), repeating

   vmstat -i | grep 'irq 40'

does not show a count which increases by 1 per second:

droid# vmstat -i | grep 'irq 40'
armgic (cpu0) irq 40 490
droid# vmstat -i | grep 'irq 40'
armgic (cpu0) irq 40 500
droid# vmstat -i | grep 'irq 40'
armgic (cpu0) irq 40 510
droid# vmstat -i | grep 'irq 40'
armgic (cpu0) irq 40 520
droid# 

(There were several seconds between each of these.)

Tcpdump'ing on a neighboring host when it tries to ARP for this
neighbor on the same LAN shows that it sends ARP packets fine,
but appears to be "deaf" to the reception of the responses, and
the ARP requests are just retransmitted until the ODROID gives
up.

This same kernel worked fine for a while, I was midway into
checking out pkgsrc when it crashed (for other reasons), but now
I can't get it to be "live" on our network at all.  I've done
both warm and cold reboots.  Ifconfig says everything is fine:

droid# ifconfig awge0
awge0: flags=0x8843 mtu 1500
ec_capabilities=1
ec_enabled=0
address: 00:1e:06:20:1b:50
media: Ethernet autoselect (1000baseT full-duplex)
status: active
inet6 fe80::21e:6ff:fe20:1b50%awge0 prefixlen 64 scopeid 0x2
inet 158.38.39.50 netmask 0xff00 broadcast 158.38.39.255
droid#

It has link etc.

droid# netstat -in
Name  Mtu   Network   Address  Ipkts IerrsOpkts Oerrs Colls
lo0   33180 14 0   14 0 0
lo0   33180 127/0 127.0.0.1   14 0   14 0 0
lo0   33180 default   ::1 14 0   14 0 0
lo0   33180 default   fe80::1 14 0   14 0 0
awge0 1500  00:1e:06:20:1b:50   22 0   93 0 0
awge0 1500  default   fe80::21e:6ff:fe2   22 0   93 0 0
awge0 1500  158/0 158.38.39.5022 0   93 0 0
droid#

The non-zero parts of "netstat -s" says:

droid# netstat -s | grep -v -w 0
...
udp:
25 datagrams received
25 delivered
14 PCB hash misses
67 datagrams output
ip:
25 total packets received
25 packets for this host
67 packets sent from this host
ip6:
11 total packets received
11 packets for this host
20 packets sent from this host
Input packet histogram:
UDP: 11
Mbuf statistics:
3 one mbufs
two or more mbuf:
lo0 = 8
icmp6:
Output packet histogram:
multicast listener report: 8
neighbor solicitation: 1
udp6:
11 datagrams received
11 delivered
11 datagrams output
arp:
42 packets sent
42 request packets
42 packets deferred pending ARP resolution
42 dropped

and that's it.
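
A note on the "grep -v -w 0" filter used above, as a small sketch: -w makes 0 match only as a whole word, so counters such as 10 or 1773 survive, header lines without numbers are kept, and any line that mentions a standalone zero anywhere is dropped.  The function name nonzero is just an illustrative wrapper.

```sh
# Hypothetical wrapper around the filter used above.
nonzero() {
    grep -v -w 0
}

# Example with made-up counter lines:
printf '%s\n' 'arp:' '42 packets sent' '0 packets received' '10 dropped' | nonzero
```

In this example only the "0 packets received" line is removed, which is why the absence of any "packets received" line in the arp section above is itself the interesting data point.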

My other ODROID, which runs 7.99.47, is newly booted, and works
fine at the moment, says

arp:
3 packets sent
3 request packets
1775 packets received
2 reply packets
1773 valid request packets
1773 broadcast/multicast packets
3 packets with null source IP address
2 packets deferred pending ARP resolution
2 sent

I reverted to the same 7.99.47 kernel on the "problem" ODROID,
and on first boot it worked on the network, but I had a crash
(file system related), and after the auto-reboot it has again
fallen "off the net".

Has anyone seen anything like this before?

Does anyone have suggestions for what more debugging is needed to
narrow down the actual cause?

Regards,

- Håvard


Re: gdb.old is broken

2016-10-15 Thread Havard Eidnes
> And while I'm on a roll I might as well promote -P as well. I think
> that unless you know what you are doing, -d and -P is probably
> switches you always want to apply when you do cvs update.

I agree -- that's why my ~/.cvsrc contains:

update -d -P
diff -u
rdiff -u

Regards,

- Håvard


Re: agr issue in netbsd-7

2015-08-17 Thread Havard Eidnes
 On Thu, Jul 30, 2015 at 10:25:36PM +0200, Havard Eidnes wrote:
  I tried to configure a port channel (agr0).
  When I configure the port channel only with bnx0 or only with bnx1
  everything works. If I use bnx0 and bnx1, the Cisco switch sets one of
  the two links to suspended mode.

 If I'm not terribly mistaken, the problem is that both physical
 interfaces are supposed to pick one of the ethernet addresses and use
 it as the source MAC for all the traffic passed on the aggregate
 logical interface.  Apparently, the bnx driver in NetBSD doesn't (yet)
 have the ability to change the source MAC address.

 you should be able to change it manually with ifconfig (or put the
 appropriate commands in /etc/ifconfig.bnx*) so that both use the same address.
 I've done this in the past and it worked.

Really?  OK, I may have to test that again.  But why doesn't that
happen automatically?  On another host I run an agr0 interface over a
wm0/wm1 combination, and I don't have to tweak anything manually wrt.
MAC addresses to make that work.

I should perhaps mention that I've been testing agr with bnx on
netbsd-6 code.

Regards,

- Håvard


Re: agr issue in netbsd-7

2015-07-30 Thread Havard Eidnes
 I tried to configure a port channel (agr0).
 When I configure the port channel only with bnx0 or only with bnx1
 everything works. If I use bnx0 and bnx1, the Cisco switch sets one of
 the two links to suspended mode.

If I'm not terribly mistaken, the problem is that both physical
interfaces are supposed to pick one of the ethernet addresses and use
it as the source MAC for all the traffic passed on the aggregate
logical interface.  Apparently, the bnx driver in NetBSD doesn't (yet)
have the ability to change the source MAC address.

 Partner's information:

   LACP portAdmin  Oper   Port Port
 Port Flags Priority Dev ID Age key Key Number State
 Gi7/44 SA 32768 0019.b9b0.f145 14s 0x0 0xD0 0x1 0x3D
 Gi7/46 SA 32768 0019.b9b0.f143 14s 0x0 0xD0 0x4 0xD

 Maybe the problem is the device ID. I think the device ID should be
 the same for all ports in a port channel.

Yep, I think that's correct.

Regards,

- Håvard


Re: amd64/7.0_BETA fails to probe USB devices?

2015-05-28 Thread Havard Eidnes
 I've recently been installing NetBSD on a new Lenovo RD350
 server.  I first tried booting from USB disk and from a USB
 CD-ROM drive, and both the install kernels loaded just fine.
 However, the boot medium was not probed by the 7.0_BETA amd64
 kernel. [...]

 Anyone else seeing something similar, or is this particular to
 this chipset combination?  Anything I can do to debug this
 further?

 This is PR kern/48494 :(

 As a workaround you can boot with -2 (disable ACPI).

Does anyone have any good ideas as to how to proceed with finding
a proper solution to this problem?

As you saw, disabling ACPI has its own issues (caused panic).

It seems to me that ACPICA 20131218 is quite old, and I see
traces of newer versions out there, but I have not tinkered
with that code before so have no idea how much effort it would
take to upgrade it.  ...and there's no guarantee that doing so
will actually fix the problem, of course.

Is there any more data which can be gathered to narrow in on the
root cause of this problem?

Regards,

- Håvard


Re: amd64/7.0_BETA fails to probe USB devices?

2015-04-30 Thread Havard Eidnes
> A kernel compiled with options USB_DEBUG doesn't provide any
> more information that I can see [...]

Additionally setting usbdebug and uhubdebug to 10 resulted in
the attached boot messages.  cd0 is still not probed.

Regards,

- Håvard
Apr 28 23:39:11 tos-res su: he to root on /dev/pts/2

Password:
Login incorrect or refused on this terminal.
login: root
Apr 28 23:40:12 tos-res login: ROOT LOGIN (root) on tty console
Last login: Tue Apr 28 13:56:37 2015 on console
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 7.0_BETA (USB_DEBUG) #0: Tue Apr 28 13:58:53 UTC 2015

Welcome to NetBSD!

This system is running a beta release of the NetBSD operating system, aimed
at stabilizing the next formal release.  It is close to formal release quality,
but may still contain bugs, even serious ones.  Please bear this in mind and
use the system with care.

You are encouraged to test this version as thoroughly as possible.  Should you
encounter any problem, please report it back to the development team using the
send-pr(1) utility (requires a working MTA).  If yours is not properly set up,
use the web interface at: http://www.NetBSD.org/support/send-pr.html

Thank you for helping us test and improve this beta NetBSD release.

You have new mail.
Terminal type is vt100.
We recommend that you create a non-root account and use su(1) for root access.
tos-res# shutdown -r now
Shutdown NOW!
shutdown: [pid 2676]
tos-res# wall: You have write permission turned off; no reply possible
   
*** FINAL System shutdown message from r...@tos-res.uninett.no ***   
System going down IMMEDIATELY  
   
   
Apr 28 23:40:17 tos-res shutdown: reboot by root: 

System shutdown time has arrived

About to run shutdown hooks...
Stopping cron.
Waiting for PIDS: 1951.
Stopping inetd.
Waiting for PIDS: 1630.
Saved entropy to /var/db/entropy-file.
Removing block-type swap devices
swapctl: removing /dev/sd0b as swap device
Tue Apr 28 23:40:23 UTC 2015

Done running shutdown hooks.
Apr 28 23:40:29 tos-res syslogd[560]: Exiting on signal 15
syncing disks... done
uhub3: detached
uhub2: detached
ukphy3: detached
ukphy2: detached
ukphy1: detached
ukphy0: detached
wm3: detached
wm2: detached
wm1: detached
wm0: detached
atabus5: detached
atabus4: detached
atabus3: detached
atabus2: detached
atabus1: detached
atabus0: detached
pci7: detached
pci6: detached
pci5: detached
pci4: detached
pci3: detached
pci1: detached
sysbeep0: detached
midi0: detached
ppb6: detached
ppb5: detached
ppb4: detached
ppb3: detached
ppb2: detached
ppb0: detached
pchb0: detached
audio0: detached
pci10: detached
sd0: detached
scsibus0: detached
mfi0: detached
pci2: detached
ppb1: detached
rebooting...

 NetBSD/x86 BIOS Boot, Revision 5.9 (from NetBSD 6.1.3)
 Memory: 635/1933196 k

 1. Boot normally
 2. Boot single user
 3. Disable ACPI
 4. Disable ACPI and SMP
 5. Drop to boot prompt

Choose an option; RETURN for default; SPACE to stop countdown.
Option 1 will be chosen in 5 seconds. 0 seconds. 

Option: [1]:5
type ? or help for help.
> boot netbsd.usb

Re: amd64/7.0_BETA fails to probe USB devices?

2015-04-29 Thread Havard Eidnes
 I've recently been installing NetBSD on a new Lenovo RD350
 server.  I first tried booting from USB disk and from a USB
 CD-ROM drive, and both the install kernels loaded just fine.
 However, the boot medium was not probed by the 7.0_BETA amd64
 kernel. [...]

 Anyone else seeing something similar, or is this particular to
 this chipset combination?  Anything I can do to debug this
 further?

 This is PR kern/48494 :(

 As a workaround you can boot with -2 (disable ACPI).

In my case that ends in tears:

coretemp4 at cpu4: thermal sensor, 1 C resolution
coretemp5 at cpu5: thermal sensor, 1 C resolution
coretemp6 at cpu6: thermal sensor, 1 C resolution
coretemp7 at cpu7: thermal sensor, 1 C resolution
fatal privileged instruction fault in supervisor mode
trap type 0 code 0 rip fe887df40d81 cs 8 rflags 10a82 cr2 0 ilevel 8 rsp 80f9f2c5
curlwp 0xfe813a2a0240 pid 0.58 lowest kstack 0xfe813a2ab2c0
keriels0p wvilingd  setrndt fo  auvi trato sedele..Stopped in pid 0.58 (system) at fe887df40d81:   invalid address
db{0} tra
?() at fe887df40d81
db{0} x/i fe887df40d81
fe887df40d81:   invalid address
db{0}

Regards,

- Håvard


Re: amd64/7.0_BETA fails to probe USB devices?

2015-04-29 Thread Havard Eidnes
> This is PR kern/48494 :(

I tried a -current kernel, no improvement.

I also tried the change which was suggested in the PR (and then
reverted), and that also made no difference.

Regards,

- Håvard


amd64/7.0_BETA fails to probe USB devices?

2015-04-28 Thread Havard Eidnes
Hi,

I've recently been installing NetBSD on a new Lenovo RD350
server.  I first tried booting from USB disk and from a USB
CD-ROM drive, and both the install kernels loaded just fine.
However, the boot medium was not probed by the 7.0_BETA amd64
kernel.

The kernel on the NetBSD 6.1.3 CD-ROM install media which I had
lying around, however, *did* probe the boot media, like so:

uhub0 at usb0: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
...
uhub2 at uhub0 port 1: vendor 0x8087 product 0x800a, class 9/0, rev 2.00/0.05, addr 2
uhub2: single transaction translator
uhub3 at uhub1 port 1: vendor 0x8087 product 0x8002, class 9/0, rev 2.00/0.05, addr 2
uhub3: single transaction translator
umass0 at uhub3 port 2 configuration 1 interface 0
umass0: TSSTcorp USB Mass Storage Device, rev 2.00/6.10, addr 3
atapibus0 at umass0: 2 targets
cd0 at atapibus0 drive 0: TSSTcorp, CDDVDW SE-S084B, TS00 cdrom removable
uhub3: device problem, disabling port 3
...

I suspect the device problem is the USB keyboard, but I had
already switched over to use the serial port as console, so that
doesn't matter.

I ended up using the 6.1.3 installer to install 7.0_BETA, and
that worked reasonably well once I realized which of the 4
ethernet ports in my server worked in 6.1.3 (not the on-board
ones, which are wm's of the I250-T1 variant).

Now, after having booted up the 7.0_BETA kernel, I can confirm
that the kernel does not react when I plug a USB flash disk in,
and the USB CD-ROM drive which is still plugged in is also not
detected.

Anyone else seeing something similar, or is this particular to
this chipset combination?  Anything I can do to debug this
further?

A kernel compiled with options USB_DEBUG doesn't provide any
more information that I can see, it just says:

uhub0 at usb0: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub1 at usb1: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
...
uhub2 at uhub0 port 1: vendor 0x8087 product 0x800a, class 9/0, rev 2.00/0.05, addr 2
uhub2: single transaction translator
uhub3 at uhub1 port 1: vendor 0x8087 product 0x8002, class 9/0, rev 2.00/0.05, addr 2
uhub3: single transaction translator
...

and that's it, even though the CD drive from above is still
plugged in.

What changed in the USB device detection between 6.1.3 and
7.0_BETA which might explain this?


Regards,

- Håvard


7.0_BETA and 7.99.10 fail to find USB devices

2015-04-27 Thread Havard Eidnes
Hi,

I've recently had occasion to try to install NetBSD on a new
Lenovo RD350 1U server.  I've tried the following versions and
boot device combinations:

7.0_BETA from USB flash disk
7.0_BETA from a USB CD-ROM
7.99.10 from a USB CD-ROM
6.1.3 from a USB CD-ROM

I've also hooked up a Dell USB keyboard to the host to be able to
interact with the boot loader, but all the bootups were done with
a serial console.

(That the RAID device finds a disk in some of the latter ones is
me figuring out how to configure it via the firmware.)

The problem I'm seeing is that neither 7.0_BETA nor 7.99.10
manages to probe the USB flash disk (when booted via that),
the USB CD-ROM, or the USB keyboard, and both end up prompting

root device: 

after loading and running the OS.

6.1.3 on the other hand manages to find the USB CD-ROM and the
keyboard.

The kernel boot messages from the last three combinations
mentioned above are attached below.

Anyone have any idea what the problem might be?
Should I send-pr this problem?

Regards,

- Håvard
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 7.0_BETA (GENERIC.201504240620Z)
total memory = 32610 MB
avail memory = 31647 MB
kern.module.path=/stand/amd64/7.0/modules
mainbus0 (root)
ACPI: RSDP 0xf0530 24 (v02 LENOVO)
ACPI: XSDT 0x766c80b0 E4 (v01 LENOVO SV-INT   0126 AMI  00010013)
ACPI: FACP 0x766f56b0 00010C (v05 LENOVO SV-INT   0126 AMI  00010013)
ACPI: DSDT 0x766c8228 02D486 (v02 LENOVO SV-INT   0126 INTL 20091013)
ACPI: FACS 0x7764bf80 40
ACPI: APIC 0x766f57c0 000138 (v03 LENOVO SV-INT   0126 AMI  00010013)
ACPI: FPDT 0x766f58f8 44 (v01 LENOVO SV-INT   0126 AMI  00010013)
ACPI: FIDT 0x766f5940 9C (v01 LENOVO SV-INT   0126 AMI  00010013)
ACPI: SPMI 0x766f59e0 40 (v05 LENOVO SV-INT   0126 AMI. )
ACPI: MCFG 0x766f5a20 3C (v01 LENOVO SV-INT   0126 MSFT 0097)
ACPI: UEFI 0x766f5a60 42 (v01 LENOVO SV-INT   0126  )
ACPI: BDAT 0x766f5aa8 30 (v01 LENOVO SV-INT   0126 INTL 20091013)
ACPI: HPET 0x766f5ad8 38 (v01 LENOVO SV-INT   0126 INTL 20091013)
ACPI: MSCT 0x766f5b10 90 (v01 LENOVO SV-INT   0126 INTL 20091013)
ACPI: PMCT 0x766f5ba0 64 (v01 LENOVO SV-INT   0126 INTL 20091013)
ACPI: SLIT 0x766f5c08 2D (v01 LENOVO SV-INT   0126 INTL 20091013)
ACPI: SRAT 0x766f5c38 000E58 (v03 LENOVO SV-INT   0126 INTL 20091013)
ACPI: WDDT 0x766f6a90 40 (v01 LENOVO SV-INT   0126 INTL 20091013)
ACPI: SSDT 0x766f6ad0 00EE25 (v01 LENOVOPmMgt 0126 INTL 20120913)
ACPI: SLIC 0x767058f8 000176 (v01 ??  0126 AMI  00010013)
ACPI: SSDT 0x76705a70 001BAF (v02 LENOVO SpsNm0126 INTL 20120913)
ACPI: SSDT 0x76707620 64 (v02 LENOVO SpsNvs   0126 INTL 20120913)
ACPI: PRAD 0x76707688 000102 (v02 LENOVO SV-INT   0126 INTL 20120913)
ACPI: DMAR 0x76707790 E6 (v01 LENOVO SV-INT   0126 INTL 20091013)
ACPI: HEST 0x76707878 0002BC (v01 LENOVO SV-INT   0126 INTL 0001)
ACPI: BERT 0x76707b38 30 (v01 LENOVO SV-INT   0126 INTL 0001)
ACPI: ERST 0x76707b68 000230 (v01 LENOVO SV-INT   0126 INTL 0001)
ACPI: EINJ 0x76707d98 000150 (v01 LENOVO SV-INT   0126 INTL 0001)
ACPI: All ACPI Tables successfully acquired
cpu0 at mainbus0 apid 0: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu1 at mainbus0 apid 2: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu2 at mainbus0 apid 4: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu3 at mainbus0 apid 6: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu4 at mainbus0 apid 8: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu5 at mainbus0 apid 10: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu6 at mainbus0 apid 12: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu7 at mainbus0 apid 14: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu8 at mainbus0 apid 1: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu9 at mainbus0 apid 3: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu10 at mainbus0 apid 5: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu11 at mainbus0 apid 7: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu12 at mainbus0 apid 9: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu13 at mainbus0 apid 11: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu14 at mainbus0 apid 13: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
cpu15 at mainbus0 apid 15: Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, id 0x306f2
ioapic0 at mainbus0 apid 1
ioapic1 at mainbus0 apid 2
acpi0 at mainbus0: Intel ACPICA 20131218
hpet0 at acpi0: high precision event timer (mem 0xfed0-0xfed00400)
SCK0 (ACPI0004) at acpi0 not 

X11 on Lenovo T430

2014-12-23 Thread Havard Eidnes
Hi,

I'm running netbsd-7 code on my new Lenovo T430 laptop.  I'm
using code from November 27 at the moment, with the DRM/KMS
kernel, and there are a few glitches:

1) Sometimes the rendering of images e.g. in a web browser
   (firefox) is mangled / interlaced (not sure how to best
   describe it).  Sometimes causing a re-paint fixes the glitch,
   sometimes it doesn't (by the looks of it).

2) Sometimes X11 appears to hang, and I get these messages in my
   kernel message buffer:

drm: stuck on blitter ring
drm: GPU HANG: ecode 2:0x87d0fff5, reason: Ring hung, action: reset
drm: GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
drm: Please file a _new_ bug report on bugs.freedesktop.org against DRI - DRM/Intel
drm: drm/i915 developers can then reassign to the right component if it's not a kernel issue.
drm: The gpu crash dump is required to analyze gpu hangs, so please always attach it.
drm: GPU crash dump saved to /sys/class/drm/card0/error
i915drmkms0: interrupting at ioapic0 pin 16 (pci0@pci::00:02.0)
drm: Enabling RC6 states: RC6 on, RC6p on, RC6pp off

(There's no GPU crash dump that I can find anywhere...)

The hardware is probed by the kernel as:

i915drmkms0 at pci0 dev 2 function 0: Intel Ivy Bridge Integrated Graphics Device (rev. 0x09)
drm: Memory usable by graphics device = 2048M
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
i915drmkms0: interrupting at ioapic0 pin 16 (i915)
drm: GMBUS [i915 gmbus dpb] timed out, falling back to bit banging on pin 5
intelfb0 at i915drmkms0
i915drmkms0: info: registered panic notifier
intelfb0: framebuffer at 0x800118fe6000, size 1600x900, depth 32, stride 6400
wsdisplay0 at intelfb0 kbdmux 1: console (default, vt100 emulation), using wskbd0
wsmux1: connecting to wsdisplay0

and pcictl says:

000:02:0: Intel Ivy Bridge Integrated Graphics Device (VGA display, revision 0x09)
or
000:02:0: 0x01668086 (0x0309)

I'm about to update to newer netbsd-7 code (just updated my
source tree), unless someone tells me no-no, don't do that.

Regards,

- Håvard


Re: panic with 7.0_BETA

2014-10-27 Thread Havard Eidnes
> Another datapoint maybe. I've manage to rebuild a few 7.0_BETAs
> without scratching the old obj files and all arch i386 has
> survived those upgrades. The only system that has crashed for
> me was arch amd64. Was that what you where running on your
> machine too?

Yep, testing with NetBSD/amd64.

- Håvard



GCCs mm_malloc.h?

2014-10-21 Thread Havard Eidnes
Hi,

for some odd reason that I've not yet found, the file
/usr/include/gcc-4.5/mm_malloc.h is being included by one of the
configure tests for the net/libcmis package, and configure is
failing with this error:

/usr/include/gcc-4.5/mm_malloc.h:34:64: error: declaration of 'int posix_memalign(void**, size_t, size_t) throw ()' throws different exceptions
/usr/include/stdlib.h:237:6: error: from previous declaration 'int posix_memalign(void**, size_t, size_t)'

The mm_malloc.h header is itself including stdlib.h, the section
of code looks like this:

#include <stdlib.h>

/* We can't depend on <stdlib.h> since the prototype of posix_memalign
   may not be visible.  */
#ifndef __cplusplus
extern int posix_memalign (void **, size_t, size_t);
#else
extern "C" int posix_memalign (void **, size_t, size_t) throw ();
#endif

So why the throw()?!?  I didn't know C code could throw C++
exceptions, but I've not been following in detail what GCC is up
to lately...

Including this file on NetBSD/amd64 6.1.5 (my particular case)
will therefore always fail if you're compiling C++.

What's the appropriate fix?

This is causing net/libcmis to fail to build which is in turn
causing net/libreoffice4 to fail to build...

Regards,

- Håvard


Re: Problemns after updating from 6.1.4 to 6.1.5

2014-10-18 Thread Havard Eidnes
  It's not at all clear to me where maildrop directory is.  And it is 
  also not clear to me why this is broken, since I took great pains to 
  avoid modifying the postfix {master,main}.cf files during etcupdate.

 I hit that last week - I think it is a change postfix...

 $ ls -ld /var/spool/postfix/maildrop
 drwx-wx--- 2 postfix maildrop 512 Oct 18 10:12
 /var/spool/postfix/maildrop

 Yep, that's what my machine says, too.  (Identical except for the
 mtime.)

 But, since postdrop runs as setgid=maildrop it should be able to write
 the files:

 $ ls -l `which postdrop`
 -r-xr-xr-x  1 root  wheel  183109 Sep 30 00:38 /usr/sbin/postdrop

 Any clue on how to fix?

Yesterday I upgraded a local host (via local src build) to 6.1.5
and have:

# ls -l `which postdrop`
-r-xr-sr-x  1 root  maildrop  183109 Oct 17 13:41 /usr/sbin/postdrop
# 
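
The difference from the listing quoted above is the "s" in the group-execute position (-r-xr-sr-x versus -r-xr-xr-x), together with the group maildrop.  A small sketch for checking that in a script follows; the function name mode_has_setgid is made up, and on a live system "[ -g /usr/sbin/postdrop ]" tests the same bit directly.

```sh
# Hypothetical helper: succeed if an "ls -l" style mode string has the
# setgid bit set, i.e. 's' or 'S' in the group-execute position.
mode_has_setgid() {
    case "$1" in
        ??????[sS]*) return 0 ;;
        *)           return 1 ;;
    esac
}

# The two mode strings quoted in this thread:
mode_has_setgid "-r-xr-sr-x" && echo "setgid set"
mode_has_setgid "-r-xr-xr-x" || echo "setgid missing"
```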

Regards,

- Håvard


Re: Problemns after updating from 6.1.4 to 6.1.5

2014-10-18 Thread Havard Eidnes
> Ooopppsss!
>
>
> red-face
>
> Seems like I forgot the -p option when untarring the distribution
> sets!

Been there, done that, wasn't too pleasant... :)
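
For reference, a sketch of extracting a set with permissions preserved.  The -p flag asks tar to keep the recorded file modes instead of applying the umask; in this thread, leaving it out is what lost the setgid bit on postdrop.  The function name extract_set and the example path are illustrative only.

```sh
# Hypothetical helper: extract a gzip-compressed set into a target
# directory, keeping the recorded file modes (-p).
extract_set() {
    tar -xzpf "$1" -C "$2"
}

# Usage (illustrative paths): extract_set /path/to/base.tgz /
```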

Regards,

- Håvard


Re: Intel C602 chipset support?

2013-08-28 Thread Havard Eidnes
> For reference I include the dmesg outputs which I could capture
> since the network device *is* supported.

Bah, forgot the -current dmesg.  Here it is.

- Håvard
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.

NetBSD 6.99.23 (GENERIC) #0: Tue Aug 27 12:01:33 UTC 2013

bui...@b6.netbsd.org:/home/builds/ab/HEAD/amd64/201308271010Z-obj/home/builds/ab/HEAD/src/sys/arch/amd64/compile/GENERIC
total memory = 16351 MB
avail memory = 15860 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter i8254 frequency 1193182 Hz quality 100
Supermicro X9DRW (0123456789)
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu1 at mainbus0 apid 2: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu2 at mainbus0 apid 4: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu3 at mainbus0 apid 6: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu4 at mainbus0 apid 8: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu5 at mainbus0 apid 10: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu6 at mainbus0 apid 32: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu7 at mainbus0 apid 34: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu8 at mainbus0 apid 36: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu9 at mainbus0 apid 38: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu10 at mainbus0 apid 40: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu11 at mainbus0 apid 42: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu12 at mainbus0 apid 1: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu13 at mainbus0 apid 3: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu14 at mainbus0 apid 5: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu15 at mainbus0 apid 7: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu16 at mainbus0 apid 9: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu17 at mainbus0 apid 11: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu18 at mainbus0 apid 33: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu19 at mainbus0 apid 35: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu20 at mainbus0 apid 37: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu21 at mainbus0 apid 39: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu22 at mainbus0 apid 41: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
cpu23 at mainbus0 apid 43: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, id 0x206d7
ioapic0 at mainbus0 apid 0: pa 0xfec0, version 0x20, 24 pins
ioapic1 at mainbus0 apid 2: pa 0xfec01000, version 0x20, 24 pins
ioapic2 at mainbus0 apid 3: pa 0xfec4, version 0x20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20110623
acpi0: X/RSDT: OemId SUPERM,SMCI--MB,0001, AslId AMI ,00010013
acpi0: SCI interrupting at int 9
timecounter: Timecounter ACPI-Fast frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed0-0xfed00400)
timecounter: Timecounter hpet0 frequency 14318180 Hz quality 2000
IOH (PNP0C01) at acpi0 not configured
VTDR (PNP0C02) at acpi0 not configured
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x43 irq 0
pcppi1 at acpi0 (SPKR, PNP0800): io 0x61
midi0 at pcppi1: PC speaker
sysbeep0 at pcppi1
RMSC (PNP0C02) at acpi0 not configured
SIO1 (PNP0C02) at acpi0 not configured
pckbc1 at acpi0 (PS2M, PNP0F03) (aux port): io 0x60,0x64 irq 12
SIO2 (PNP0C02) at acpi0 not configured
UA11 (PNP0501) at acpi0 not configured
UA12 (PNP0501) at acpi0 not configured
SPMI (IPI0001) at acpi0 not configured
PCH (PNP0C01) at acpi0 not configured
CWDT (INT3F0D) at acpi0 not configured
IOH1 (PNP0C01) at acpi0 not configured
VTDR (PNP0C02) at acpi0 not configured
acpibut0 at acpi0 (PWRB, PNP0C0C-170): ACPI Power Button
RMEM (PNP0C01) at acpi0 not configured
acpiwmi0 at acpi0 (SRIO, PNP0C14-0): ACPI WMI Interface
acpiwmibus at acpiwmi0 not configured
SCK0 (ACPI0004) at acpi0 not configured
SCK1 (ACPI0004) at acpi0 not configured
OMSC (PNP0C02) at acpi0 not configured
attimer1: attached to pcppi1
ipmi0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: vendor 0x8086 product 0x3c00 (rev. 0x07)
ppb0 at pci0 dev 1 function 0: vendor 0x8086 product 0x3c02 (rev. 0x07)
ppb0: PCI Express 2.0 Root Port of PCI-E Root Complex x4 @ 2.5Gb/s
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
ppb1 at pci0 dev 1 function 1: vendor 0x8086 product 0x3c03 (rev. 0x07)
ppb1: PCI Express 2.0 Root Port of PCI-E Root Complex x4 @ 8.0Gb/s
ppb1: link is x4 @ 5.0Gb/s
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci2 dev 0 function 0: 

Re: Build broken for evbmips-{eb,el} - librumphijack

2013-08-24 Thread Havard Eidnes
> With sources up-to-date, I'm seeing:
>
> # build  librumphijack/librumphijack.so.0.0
...
> librumphijack.a(hijack.o):(.eh_frame+0x1c): warning: relocation emitted against readonly section

Me too, for all mips ports, apparently.  I've not seen this error
message before, and would like hints on how it ought to be fixed.

Regards,

- Håvard


Re: Patch for boot loader: menu command

2013-07-26 Thread Havard Eidnes
  How about change boot to use whatever boot.cfg says is the default
  boot option, and boot ...anything... to be the override-the-menu
  command?

 How does the attached look?  It should implement the suggested idea.

 The changes you propose sound like a good idea, however, I'm not sure
 I completely understand how the bootloader is changing.  Do you mind
 prototyping the interaction in an email?  I.e., simulate a typescript
 of someone operating bootloader with the changes in place?  [It may be
 less trouble to create an actual typescript. :-)]

Sure, I'll give it a try:

[ First boot-up will display on VGA: ]

Welcome to the NetBSD 5.3 installation CD
=========================================

ACPI should work on all modern and legacy hardware, however if you have 
problems, please try disabling it.

If you encounter problems on hardware manufactured after 1998 with ACPI
enabled, please file a problem report including output from the 'dmesg'
command.

1) Install NetBSD
2) Install NetBSD (no ACPI)
3) Install NetBSD (no ACPI, no SMP)
4) Drop to boot prompt

Choose an option; RETURN for default; SPACE to stop countdown.

Option 1 will be chosen in 30

[ press 4 using the PC keyboard ]

type ? or help for help.
> consdev com0

[ ...and on the com0 serial port you will get: ]

Welcome to the NetBSD 5.3 installation CD
=========================================

ACPI should work on all modern and legacy hardware, however if you have 
problems, please try disabling it.

If you encounter problems on hardware manufactured after 1998 with ACPI
enabled, please file a problem report including output from the 'dmesg'
command.
 

[ Now, with my diff, if you simply do "boot", then instead of
  just cycling through the kernel names the boot loader knows
  about, you will get the command from the default menu
  entry: ]

> boot
command(s): load /miniroot.kmod;boot netbsd
booting cd0a:netbsd
10089136+518916+618576 [521152+509152]=0xbb28d8
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010
The NetBSD Foundation, Inc.  All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All

etc.


This is with the slightly modified diff, attached.

One problem is that if boot.cfg only contains "boot" as a
command, that will create an infinite loop in the boot loader.  I
can add a static int to allow bootdefault()'s body to be invoked
only once, breaking this loop.

I still think it's a good idea to add the "menu" command; in that
case the interaction could go something like this:


[ First boot-up will display on VGA: ]

Welcome to the NetBSD 5.3 installation CD
=========================================

ACPI should work on all modern and legacy hardware, however if you have 
problems, please try disabling it.

If you encounter problems on hardware manufactured after 1998 with ACPI
enabled, please file a problem report including output from the 'dmesg'
command.

1) Install NetBSD
2) Install NetBSD (no ACPI)
3) Install NetBSD (no ACPI, no SMP)
4) Drop to boot prompt

Choose an option; RETURN for default; SPACE to stop countdown.

Option 1 will be chosen in 30

[ press 4 using the PC keyboard ]

type ? or help for help.
> consdev com0

[ ...and on the com0 serial port you will get: ]

Welcome to the NetBSD 5.3 installation CD
=========================================

ACPI should work on all modern and legacy hardware, however if you have 
problems, please try disabling it.

If you encounter problems on hardware manufactured after 1998 with ACPI
enabled, please file a problem report including output from the 'dmesg'
command.
 
> menu
Welcome to the NetBSD 5.3 installation CD
=========================================

ACPI should work on all modern and legacy hardware, however if you have 
problems, please try disabling it.

If you encounter problems on hardware manufactured after 1998 with ACPI
enabled, please file a problem report including output from the 'dmesg'
command.

1) Install NetBSD
2) Install NetBSD (no ACPI)
3) Install NetBSD (no ACPI, no SMP)
4) Drop to boot prompt

Choose an option; RETURN for default; SPACE to stop countdown.

Option 1 will be chosen in 30

[ and you can then choose whichever option you want ]

Regards,

- Håvard
Index: boot/boot2.c
===================================================================
RCS file: /cvsroot/src/sys/arch/i386/stand/boot/boot2.c,v
retrieving revision 1.58
diff -u -r1.58 boot2.c
--- boot/boot2.c	4 Aug 2012 03:51:27 -0000	1.58
+++ boot/boot2.c	26 Jul 2013 18:26:50 -0000
@@ -439,6 +439,10 @@
 			bootit(filename, howto, tell);
 		} else {
 			int i;
+
+#ifndef SMALL
+			bootdefault();
+#endif
 			for (i = 0; i < NUMNAMES; i++)