Re: poll(): IN/OUT vs {RD,WR}NORM

2024-05-29 Thread Brian Buhrow
Hello.  I just did a quick scan of the telnetd sources from NetBSD-5 and
there are interesting notes in there about all of this and how urgent data is
used, or not used, in different cases.  A check of -current sources shows the
same notes and code regarding all of this in telnetd.


-Brian


Re: Forcing a USB device to "ugen"

2024-03-26 Thread Brian Buhrow
Isn't it possible to do most of what Jason proposes by using the drvctl
interface to detach a driver from a specific USB device?  Then some glue could
be added to the ugen driver to allow it to be attached to arbitrary devices
using the same drvctl interface.  That seems a lot easier to me than building a
registry of devices and device IDs, which will be woefully out of date before
it gets published.  It also has the advantage of allowing the user to do
creative stuff that the developers didn't think of.  Am I missing something
obvious?

-thanks
-Brian


Re: drm.4 man page and import of X11 drm-kms.7 and al.

2023-10-17 Thread Brian Buhrow
hello.  As someone who regularly works on the kernel, yes, I have kernel
sources, but having man pages that provide some explanation of various parts of
the source is very helpful.  So, I would support making man pages available as
is proposed.  Man pages are a great supplement to source code, not a substitute.
-thanks
-Brian



Re: Notes on kern/57133

2023-10-12 Thread Brian Buhrow
Hello.  As a follow-up on this bug, I now know what's causing the panic to get
triggered.  The issue is that if a request gets requeued, resid gets set to 0,
which causes the KASSERT to fire.  So the question is, is it always the case
that if a request gets requeued, resid gets set to 0, or is that only in
certain cases?  A related question: has it always been the case that resid gets
reset to 0 on a requeue, but the mpii(4) driver didn't use to check for this
condition?  Or is the resetting of resid to 0 a new phenomenon?  Note that I did
a search of all the scsi/atapi drivers in the tree and I couldn't find another
instance where this check is done.  So, I'm inclined to think this check is
relatively new and it's just that no one has run into it before.
While it would be interesting to know why these particular requests are getting
requeued, and I'll try to figure that out, if it is true that resid gets set to
0 on requeue, then this check is just wrong in the mpii(4) driver.

Any thoughts from anyone?

-Brian


Re: kern.boottime drift after boot?

2023-10-10 Thread Brian Buhrow
Hello David.  Does it always drift backward?  And it looks like it drifted by
one second.  Does it ever drift by more than one second?
If not, then a workaround in your script would be to assume that if boottime is
less than 2 seconds from your stored time, then you haven't rebooted.
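The suggested workaround is a tolerance comparison rather than an exact match.
A minimal sketch, assuming the script (or a small helper it calls) already has
the stored and current kern.boottime values as seconds since the epoch; the
helper name same_boot() is hypothetical, and the 2-second window comes from the
suggestion above.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: treat two kern.boottime readings (seconds since the
 * epoch) as the same boot if they differ by less than `tolerance` seconds,
 * absorbing the small drift seen in this thread. */
static bool same_boot(time_t stored, time_t current, time_t tolerance)
{
    time_t diff = current > stored ? current - stored : stored - current;
    return diff < tolerance;
}
```

Taking the absolute difference means the check works whichever way the value
drifts, which matters since it isn't yet known whether it only drifts backward.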

-Brian



Re: Notes on kern/57133

2023-10-06 Thread Brian Buhrow
hello.  Following up on this thread, I have more diagnostic output.  

The problem occurs when a call to scsipi_get_opcodeinfo() is made for a
device.  By the time the request hits the adapter, as shown in the output below,
the discrepancy between xs->resid and xs->datalen has already occurred.  I can
see where xs->resid is set equal to xs->datalen in scsipi_execute_xs(), but I
haven't found where xs->resid gets zeroed out before it makes it to the mpii(4)
driver.  Clearly I'm missing something, so I'll keep looking.  I don't think
it's corruption or a threading problem, but, rather, something that's different
about this particular request path.  One change I can try is to put the
diagnostic printf higher in the scsipi_request function, in case all the
loading of data into the IOC's ccb buffers is corrupting something.
In the meantime, if anyone has ideas, I'm all ears.

-thanks
-Brian


The below output is a snippet from the probing of sd0 on mpii0.
See the e-mail in kern/57133 for a full dmesg output on this card.


[  10.5750642] sd0: tagged queueing
[  10.7650647] mpii0(scsipi_request): resid = 0, datalen = 16384
[  10.7750642] mpii0(scsipi_request): SCSI command info is: 0xa3 0c 80 00 00 00 00 00 40 00 00 00
[  10.7750642] mpii0(scsipi_request): sx_control = 0x1020, XS_CTL_POLL = 0x2
[  10.7850642] mpii0: resid = 0, datalen = 16384
[  10.7950642] mpii0: SCSI command info is: 0xa3 0c 80 00 00 00 00 00 40 00 00 00


Re: Notes on kern/57133

2023-10-04 Thread Brian Buhrow
hello Edgar.  I agree with your analysis.  I'll put the check in, recompile my
test kernel and see what we get.  I suspect it is an error or a race in the
mpii(4) driver.  I notice the call to scsipi_get_opcodeinfo only happens if the
device in question reports as a SCSI V3 device.  Is there a way to easily check
the SCSI version that a given device reports?  scsictl identify doesn't seem to
do it.
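For reference, a SCSI device reports the standard it claims conformance to in
byte 2 (the VERSION field) of its standard INQUIRY data.  A small userland
sketch, assuming you already have the raw INQUIRY response in a buffer (how to
fetch it, e.g. through a pass-through ioctl, is not shown); the helper names
are hypothetical and the value table follows the T10 SPC standards.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the ANSI version from standard INQUIRY data (byte 2), or -1 if the
 * buffer is too short.  Older standards split byte 2 into ISO/ECMA/ANSI
 * subfields, so only the ANSI bits (2-0) are kept here. */
static int inquiry_ansi_version(const uint8_t *inq, size_t len)
{
    if (len < 3)
        return -1;
    return inq[2] & 0x07;
}

/* Human-readable name for the values of that field. */
static const char *ansi_version_name(int v)
{
    switch (v) {
    case 0: return "no standard claimed";
    case 1: return "SCSI-1";
    case 2: return "SCSI-2";
    case 3: return "SPC (SCSI-3)";
    case 4: return "SPC-2";
    case 5: return "SPC-3";
    case 6: return "SPC-4";
    case 7: return "SPC-5";
    default: return "unknown";
    }
}
```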
-thanks
-Brian


Notes on kern/57133

2023-10-04 Thread Brian Buhrow
hello everyone!  Recently I ran across the problem described in kern/57133
with the mpii(4) driver.  I patched sys/dev/pci/mpii.c to provide more
diagnostic information when the condition occurs and to avoid the panic.  I
also provided output showing the SCSI command that triggers the condition.
Could someone look at the output below, or check out the bug report in gnats,
and provide details on what this command is?  I also looked through the kernel
sources and don't see any other drivers that implement the assertion that
triggered the original panic which created the bug report.  Any reason I
shouldn't commit the workaround for the mpii(4) driver?  I realize it doesn't
fix the actual problem, but it will allow folks to use systems with NetBSD that
present the problem and will provide diagnostic details which can be reported
back to the community for further analysis.


Thoughts?
-Brian


[  10.5574554] sd0 at scsibus0 target 0 lun 0:  disk fixed
[  10.5574554] sd0: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
[  10.5681363] dk0 at sd0: "1adc589e-4f32-11ee-b97c-00259036fd2e", 7814033005 blocks at 34, type: raidframe
[  10.5853388] sd0: tagged queueing
[  10.7574556] mpii0: resid = 0, datalen = 16384
[  10.7574556] mpii0: SCSI command info is: 0xa3 0c 80 00 00 00 00 00 40 00 00 00
sd1 at scsibus0 target 1 lun 0:  disk fixed
[  10.7774550] sd1: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
[  10.8374548] mssd1: tagged queueing
[  11.0074552] mpii0: resid = 0, datalen = 16384
[  11.0074552] mpii0: SCSI command info is: 0xa3 0c 80 00 00 00 00 00 40 00 00 00
sd2 at scsibus0 target 2 lun 0:  disk fixed
[  11.0274552] sd2: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
[  11.0574559] sd2: 3907019088 trailing sectors not covered by disklabel
[  11.0674553] sd2: tagged queueing
[  11.2574559] mpii0: resid = 0, datalen = 16384
[  11.2574559] mpii0: SCSI command info is: 0xa3 0c 80 00 00 00 00 00 40 00 00 00
sd3 at scsibus0 target 5 lun 0:  disk fixed
[  11.2774557] sd3: 3726 GB, 3815448 cyl, 16 head, 127 sec, 512 bytes/sect x 7814037168 sectors
[  11.2874554] dk1 at sd3: "549e5298-5113-11ee-910e-00259036fd2e", 7814033005 blocks at 34, type: raidframe
[  11.2974554] sd3: tagged queueing
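The command in these logs can be decoded by hand: opcode 0xa3 is MAINTENANCE
IN, and service action 0x0c selects REPORT SUPPORTED OPERATION CODES, which
fits the later finding that the request comes from scsipi_get_opcodeinfo().  A
minimal decoding sketch, assuming the 12-byte CDB layout from the T10 SPC-4
standard; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a REPORT SUPPORTED OPERATION CODES CDB (SPC-4 layout). */
struct rsoc_cdb {
    uint8_t  opcode;          /* byte 0: 0xa3 = MAINTENANCE IN */
    uint8_t  service_action;  /* byte 1, bits 4-0: 0x0c = RSOC */
    bool     rctd;            /* byte 2, bit 7: return timeouts descriptor */
    uint8_t  reporting_opts;  /* byte 2, bits 2-0: 0 = report all commands */
    uint32_t alloc_len;       /* bytes 6-9, big-endian allocation length */
};

static struct rsoc_cdb decode_rsoc(const uint8_t cdb[12])
{
    struct rsoc_cdb d;

    d.opcode = cdb[0];
    d.service_action = cdb[1] & 0x1f;
    d.rctd = (cdb[2] & 0x80) != 0;
    d.reporting_opts = cdb[2] & 0x07;
    d.alloc_len = (uint32_t)cdb[6] << 24 | (uint32_t)cdb[7] << 16 |
                  (uint32_t)cdb[8] << 8 | cdb[9];
    return d;
}
```

Fed the bytes from the log (0xa3 0c 80 00 00 00 00 00 40 00 00 00), it reports
RCTD set, reporting option 0, and an allocation length of 16384, the same value
as datalen in the diagnostic output.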



Re: typo in raidN.conf leading to allegedly failed component

2023-09-12 Thread Brian Buhrow
hello Edgar.  What you could do to improve that situation is exactly what you
did, except use raidctl -C on the last configuration step.  Then, instead of
having to recopy the entire contents of the array, all you need to do is
recalculate the parity.

-thanks
-Brian



Re: Maxphys on -current?

2023-08-04 Thread Brian Buhrow
hello.  Michael's e-mail explains the behavior I'm seeing when trying
different block sizes with NetBSD and FreeBSD.
The transcripts below show transfers of the same number of bytes using 1m and
64k block sizes for NetBSD-9.99.77/amd64 and FreeBSD-13.1/amd64.  NetBSD is
using SATA3 disks with NCQ enabled and FreeBSD is using SATA3 disks with
command queueing enabled.  The block size doesn't change the speed of the
transfers on either system.  Interestingly enough, however, the FreeBSD
performance is markedly worse on this test.

-thanks
-Brian


NetBSD-9.99.77/amd64 with SATA3 disk
# dd if=/dev/rwd0a of=/dev/null bs=1m count=5
5+0 records in
5+0 records out
5242880 bytes transferred in 292.078 secs (179502735 bytes/sec)
# dd if=/dev/rwd0a of=/dev/null bs=65536 count=80
80+0 records in
80+0 records out
5242880 bytes transferred in 292.067 secs (179509496 bytes/sec)

FreeBSD-13.1/AMD64 with SATA3 disk
# dd if=/dev/da4 of=/dev/null bs=1m count=5
5+0 records in
5+0 records out
5242880 bytes transferred in 322.807433 secs (162415095 bytes/sec)
# dd if=/dev/da4 of=/dev/null bs=65536 count=80
80+0 records in
80+0 records out
5242880 bytes transferred in 322.433936 secs (162603232 bytes/sec)
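The measurement dd makes here can be reproduced in C when you want to script
block-size comparisons: read a fixed number of blocks of a given size and time
the loop.  A minimal userland sketch, assuming POSIX read(2) and
clock_gettime(2); read_blocks() is a hypothetical name, and for a raw disk you
would pass a device path like the /dev/rwd0a used above.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

/* Read `count` blocks of `bs` bytes from `path`, dd-style.  Stores the
 * elapsed wall time in *secs and returns the number of bytes read, or -1 on
 * error.  Stops early on EOF or a short read. */
static long long read_blocks(const char *path, size_t bs, long count,
    double *secs)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    char *buf = malloc(bs);
    if (buf == NULL) {
        close(fd);
        return -1;
    }

    struct timespec t0, t1;
    long long total = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < count; i++) {
        ssize_t n = read(fd, buf, bs);
        if (n < 0) {
            total = -1;
            break;
        }
        total += n;
        if ((size_t)n < bs)     /* EOF or short read */
            break;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *secs = (double)(t1.tv_sec - t0.tv_sec) +
        (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
    free(buf);
    close(fd);
    return total;
}
```

Throughput is then total / secs, which is the bytes/sec figure dd prints.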


Maxphys on -current?

2023-08-03 Thread Brian Buhrow
hello.  I know that this has been a very long-term project, but I'm wondering
about the status of this effort.  I note that FreeBSD-13 has a MAXPHYS value of
1048576 bytes.  Have we found other ways to get more throughput from ATA disks
that obviate the need for this setting which I'm not aware of?
If not, is anyone working on this project?  The wiki page says the project is
stalled.

Any thoughts or news would be greatly appreciated.

-thanks
-Brian



Re: unable to create xfer table DMA map for drive 0, error=12

2023-08-03 Thread Brian Buhrow
hello.  The error says you ran out of memory (error 12 is ENOMEM).  My guess
is your machine has been running for a while and the contiguous memory
available for the kernel to allocate has become fragmented, leading to the
issue.  I seem to remember versions of NetBSD earlier than V8 were prone to
this issue.  It may very well be that NetBSD is still prone to this issue,
though I've not seen it for a while, even on my NetBSD-5 fleet of machines.  My
guess is a reboot will fix the issue.
-thanks
-Brian



Re: flock(2): locking against itself?

2023-03-30 Thread Brian Buhrow
hello.  You probably already researched this, but it looks like newterm() is
in the curses library in NetBSD-5, so getting it to work in NetBSD-1.4T
shouldn't be that difficult.
-Brian



Re: flock(2): locking against itself?

2023-03-30 Thread Brian Buhrow
Hello.  Yes, I realized the error I'd made after I sent you the e-mail.  I
wonder if you could utilize a pty to do what you need to get two different tty
structures, one blocking for curses, and the other non-blocking?
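The error in the dup(2)/dup2(2) suggestion is that duplicated descriptors share
one open file description, and O_NONBLOCK is a file status flag stored there,
so setting it on the copy sets it on the original as well.  A small
demonstration, using a pipe as a stand-in for the terminal so it runs anywhere
(an assumption of this sketch); nonblock_is_shared() is a hypothetical helper.

```c
#include <fcntl.h>
#include <unistd.h>

/* Set O_NONBLOCK on a dup(2) of `fd`, then report whether the original
 * descriptor sees the flag too.  Returns 1 if shared, 0 if not, -1 on
 * error.  Note: this leaves O_NONBLOCK set on `fd`, which is the point. */
static int nonblock_is_shared(int fd)
{
    int copy = dup(fd);
    if (copy < 0)
        return -1;

    int flags = fcntl(copy, F_GETFL);
    if (flags < 0 || fcntl(copy, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(copy);
        return -1;
    }

    int orig = fcntl(fd, F_GETFL);
    close(copy);
    if (orig < 0)
        return -1;
    return (orig & O_NONBLOCK) != 0;
}
```

A separate open(2), of the device itself or of a pty as suggested here, yields
its own open file description and therefore its own file status flags.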
-thanks
-Brian



Re: flock(2): locking against itself?

2023-03-30 Thread Brian Buhrow
hello.  I may be missing something in your curses non-blocking case, but
can't you work around the issue by setting up an independent file descriptor,
and hence tty structure, by performing a dup2(2) on stdin and then closing the
original stdin file descriptor?
Then, of course, you can dup(2) the newly opened file descriptor back into the
stdin position for purposes of your program.

Just a thought; I haven't tried it.

-thanks
-Brian


Possible problem with WAPBL on FFSV1

2023-03-22 Thread Brian Buhrow
Hello.  Recently I saw a panic on two different 9.2_stable machines involving
the filesystem.  The two machines in question are virtual machines running
under Xen, but I don't think that's relevant here.  While I'm not sure what the
initial panic message was, since they were rebooted by an external monitoring
script, the result was that they would continually panic when a specific
directory was accessed.

Setup:
Both machines are running a single FFSV1 root filesystem and one directory has
over 32,000 files in it.  This directory is continually appended to and, once a
day, files are purged from it.  When the panic occurs, the systems reboot,
replay their WAPBL logs, don't check the filesystem and, once they access the
very large directory, panic again.

Once I brought up the VMs single-user and ran fsck against the root
filesystem, fsck complained that the very large directory contained empty
blocks.  I told it to clean the filesystem and it advised me to run fsck
against the same filesystem again when I was done with the current run.  I
did, it checked out okay, and things seem to be running again without a
problem.

My concern here is that the filesystem can get into a state that WAPBL's
journal cannot correct.  Since I'm pretty sure there was no memory corruption
or disk corruption, especially not on two different VMs, I'm wondering if
anyone else has seen a problem with WAPBL on FFSV1 with large directories?

-thanks
-Brian



Re: building 9.1 kernel with /usr/src elsewhere?

2023-03-07 Thread Brian Buhrow
hello.  I regularly build kernels outside of the /usr/src location.  My
technique is to install the source in some location, /usr/local/netbsd/src-91
for example, then put my configuration file in:
/usr/local/netbsd/src-91/sys/arch//conf/
Then:
cd /usr/local/netbsd/src-91/sys/arch//conf
config 
cd ../compile/
make -j 4 >& make.log
And, I'm off to the races.

Does that not work for you?
-Brian



Vmstat -s on -current and -10 shows no local-cpu page allocations under Xen -- is that correct?

2023-02-22 Thread Brian Buhrow
Hello.  I'm running a series of machines, both Xen VMs and machines running
on bare metal, and I notice that on my systems running NetBSD-9.99.77,
vmstat -s shows that local-cpu page allocations are never available on the Xen
VMs (see below).  Yet on machines running on bare metal, local-cpu page
allocations are often available, which is what I would expect.  A hand
inspection of the sys/uvm kernel sources leads me to suspect this condition
still exists under today's -current and the -10 branch of the source tree.  The
Xen VM I'm showing is configured with 2 VCPUs and the bare-metal example I'm
showing has 4 CPUs.  My questions are as follows:

1.  What does the local-cpu stat actually reference?  Is it counting the number
of times a CPU requested a page of memory and a page was available from the
local per-CPU free list?

2.  What's special about Xen that makes no local-cpu allocations available?
(For the record, I see this behavior on all the 9.99.77 Xen machines I have
running.)  As a point of reference, under NetBSD-9.2, local-cpu allocations are
available much of the time under Xen.

3.  Is this a design decision or is it an actual bug?

-thanks
-Brian



2908605631 pagealloc desired color avail
242456994 pagealloc desired color not avail
0 pagealloc local cpu avail
3151062625 pagealloc local cpu not avail
18763 faults with no memory


355234902 pagealloc desired color avail
   250631 pagealloc desired color not avail
310440195 pagealloc local cpu avail
 45045338 pagealloc local cpu not avail
0 faults with no memory


re: devsw_detach is failing -- is this a manifestation of PR kern/56962?

2023-02-12 Thread Brian Buhrow
hello Matthew.  That line ensures that if a device driver has unloaded and
reloaded its bdevsw or cdevsw interfaces, it gets assigned the same major
numbers that it had when it was first loaded on the system.  If a device driver
has never loaded its bdevsw or cdevsw interfaces on a system and the major
number is dynamically assigned, this works as well, because execution of the
for loop exits due to there being an insufficient number of conv structures
available when the device driver loads for the first time, so this test is
never executed.  Once the driver loads for the first time, the conv structure
is initialized with the correct major numbers and this test will pass the next
time the device driver loads its bdevsw or cdevsw interfaces.  There is one
caveat here that was always true, but will absolutely be enforced with this
patch.  That is, the caller of devsw_attach() should ensure the bmajor and
cmajor pointers they pass into the function point to different variables in
their function, even if they do not implement the bdevsw interface.  If they
fail to do this, they'll get EINVAL back from the devsw_attach() call if they
load their bdevsw or cdevsw interfaces on the system more than once.
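To make the aliasing caveat concrete, here is a simplified userland model of
just the lines quoted in this thread: the patched fill-in of *bmajor and
*cmajor, and the consistency check that follows.  It is not the real
subr_devsw.c code; the struct, the function name, and the assumption that a
driver without a bdevsw is remembered with d_bmajor == -1 are all illustrative.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the conv entry remembered from an earlier
 * attach of the same driver name. */
struct conv_entry {
    const char *d_name;
    int d_bmajor;   /* assumed -1 when the driver has no bdevsw */
    int d_cmajor;
};

/* Model of the patched logic: fill in unset majors from the remembered
 * entry, then require both to match it. */
static int reuse_majors(const struct conv_entry *conv, const void *bdev,
    int *bmajor, int *cmajor)
{
    if (bdev != NULL && *bmajor < 0)
        *bmajor = conv->d_bmajor;
    if (*cmajor < 0)
        *cmajor = conv->d_cmajor;
    if (*bmajor != conv->d_bmajor || *cmajor != conv->d_cmajor)
        return EINVAL;
    return 0;
}
```

With separate variables, a reload picks up the remembered cmajor and succeeds.
If the same variable is passed for both, the assignment to *cmajor also changes
*bmajor, so the check no longer matches and EINVAL comes back, which is the
caveat described above.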

-thanks
-Brian

On Feb 13,  2:47pm, matthew green wrote:
} Subject: re: devsw_detach is failing -- is this a manifestation of PR kern
} your change seems to fix a clear bug to me.
} 
} > -   if (*bmajor < 0)
} > +   if ((bdev != NULL) && (*bmajor < 0)) 
} > *bmajor = conv->d_bmajor;
} 
} there's also this line i'm curious about, just below:
} 
} if (*bmajor != conv->d_bmajor || *cmajor != conv->d_cmajor) {
} error = EINVAL;
} goto out;
} 
} should the first part also depend upon either bdev != NULL or
} perhaps (*bmajor >= 0 && bdev == NULL) as the following code
} uses...
} 
} 
} .mrg.
>-- End of excerpt from matthew green




Re: devsw_detach is failing -- is this a manifestation of PR kern/56962?

2023-02-10 Thread Brian Buhrow
hello.  Following up on this issue, I've discovered the problem with
devsw_attach is that if one is reattaching a previously detached driver and
that driver does not implement a bdev interface, devsw_attach returns an EINVAL
error.  The following patch fixes this problem.  Any reason I should not commit
this change and request pullups for NetBSD-9 and NetBSD-10?

-thanks
-Brian


Index: subr_devsw.c
===
RCS file: /cvsroot/src/sys/kern/subr_devsw.c,v
retrieving revision 1.49
diff -u -r1.49 subr_devsw.c
--- subr_devsw.c29 Oct 2022 10:52:36 -  1.49
+++ subr_devsw.c10 Feb 2023 19:11:24 -
@@ -1,4 +1,4 @@
-/* $NetBSD$*/
+/* $NetBSD: subr_devsw.c,v 1.49 2022/10/29 10:52:36 riastradh Exp $ */
 
 /*-
  * Copyright (c) 2001, 2002, 2007, 2008 The NetBSD Foundation, Inc.
@@ -69,7 +69,7 @@
  */
 
 #include 
-__KERNEL_RCSID(0, "$NetBSD$");
+__KERNEL_RCSID(0, "$NetBSD: subr_devsw.c,v 1.49 2022/10/29 10:52:36 riastradh Exp $");
 
 #ifdef _KERNEL_OPT
 #include "opt_dtrace.h"
@@ -397,7 +397,7 @@
if (conv->d_name == NULL || strcmp(devname, conv->d_name) != 0)
continue;
 
-   if (*bmajor < 0)
+   if ((bdev != NULL) && (*bmajor < 0)) 
*bmajor = conv->d_bmajor;
if (*cmajor < 0)
*cmajor = conv->d_cmajor;


devsw_detach is failing -- is this a manifestation of PR kern/56962?

2023-02-01 Thread Brian Buhrow
Hello.  I've found that after I perform a modunload on the module I'm working
on, I cannot then modload the module because devsw_attach fails to attach the
same module again.  And, yes, the module calls devsw_detach before it unloads.


I'm running NetBSD-9.99.77, which predates the PR kern/56962 fixes.  Is this
behavior expected prior to the fix for this bug, or am I running into
something else?

-thanks
-Brian



Re: Finding the slot in the ioconf table a module attaches to?

2023-02-01 Thread Brian Buhrow
hello.  Okay.  That is helpful.  Passing -1 in as the cmajor number to the
devsw_attach() function does, in fact, assign a reasonable major number which
seems to work.  I use the cdevsw_lookup_major() function to retrieve the
assigned number and print it for the user.
So, thankfully, things aren't as broken as I momentarily feared.


-thanks
-Brian



Re: Finding the slot in the ioconf table a module attaches to?

2023-02-01 Thread Brian Buhrow
hello.  Mouse's question is the same as mine.  It's fine to have a
dynamically assigned major number for the module, but since we don't have a
dynamic /dev, that number needs to be static enough that devices can be created
in the /dev tree.  The ability to request a preferred major number from the
kernel in the module itself partially mitigates this issue, in the sense that
one can be pretty confident of getting the same number whenever a given system
boots.  But it does mean that after a kernel update, a new device driver
statically linked into the kernel, or a change in the module load order, the
major device number assigned to a given module and the number on the node in
the /dev tree can get out of sync.
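Detecting that drift from userland is straightforward: compare the major number
recorded in the /dev node with the one the module reports (for example, the
value printed after cdevsw_lookup_major()).  A minimal sketch; node_major() is
a hypothetical helper, and the expected number still has to come from the
module, since no kernel interface for it is assumed here.

```c
#include <sys/types.h>
#include <sys/stat.h>
#if defined(__linux__)
#include <sys/sysmacros.h>   /* major() lives here on glibc */
#endif

/* Return the major number of the device node at `path`, or -1 if it cannot
 * be stat'ed or is not a character or block device. */
static long node_major(const char *path)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISCHR(st.st_mode) && !S_ISBLK(st.st_mode))
        return -1;
    return (long)major(st.st_rdev);
}
```

A MAKEDEV hook could use such a comparison to decide when a stale node needs to
be removed and recreated with mknod(2).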

It looks like the devpubd(8) program partially solves this problem, at
least for the installation of a new module.  It can be configured, in
conjunction with modifications to the /dev/MAKEDEV script, to create new device
nodes.  However, as currently implemented, devpubd doesn't pass the major
number to the MAKEDEV script.  So the MAKEDEV script would need to have some
intelligence about how to get the major number from the kernel, probably by
reading a sysctl variable.  Also, it doesn't look like the devpubd program
currently deletes any devices from the /dev tree.  This is a problem since, if
the major number changes, the MAKEDEV script, as currently implemented and as
devpubd currently calls it, won't modify an existing device in the /dev tree.

Having said that, it looks like the easiest way to make this problem better
is to extend the functionality of the devpubd program to allow it to
dynamically add and delete devices from the /dev tree and to provide a
standardized mechanism for extracting the major number from the kernel and
passing it to the MAKEDEV script.

-thanks
-Brian



Re: Finding the slot in the ioconf table a module attaches to?

2023-02-01 Thread Brian Buhrow
hello Brad.  I thought the idea behind modules was that you didn't need to
rebuild a kernel to add devices to the ioconf table?  And, in fact, under the
old module framework, that is, NetBSD-5 and earlier, you could add devices and
major numbers to the table without having to rebuild the kernel.  If, in fact,
I need to rebuild the kernel to add device drivers to it, then I submit our
module framework is fatally broken.  So, I'll hope that isn't the case and
proceed.  If I figure out how to do it, I'll post here so others won't have to
climb that learning curve using the same path.
-thanks
-Brian



Re: Finding the slot in the ioconf table a module attaches to?

2023-02-01 Thread Brian Buhrow
hello.  Following up on my own post, I found the mechanism by which the
cdevsw structure gets tied to the ioconf table in NetBSD-5.  It's done with:


MOD_DEV("zaptel", "zaptel", NULL, -1, _cdevsw, ZT_MAJOR)

This macro has been removed from the new module framework.  Can someone point
me in the correct direction as to where to look for the replacement for this
macro in the new module framework?

I'm not finding it and it seems like it should be simple.
-thanks
-Brian



Finding the slot in the ioconf table a module attaches to?

2023-01-31 Thread Brian Buhrow
hello.  Following up on my module question of last night, I now have the
module loading and unloading successfully.  However, when I try to open the
devices I've associated with the module, I get a "Device not configured"
error.

Under NetBSD-5, major number 196 was available, and the open worked.
Under NetBSD-9.99.77, major #196 is taken, so I've elected to use major #222
instead.

Under both copies of the module source code (working NetBSD-5 and
work-in-progress NetBSD-9), I have something that looks like:



dev_type_open(ztopen);
const struct cdevsw zaptel_cdevsw = {
ztopen, noclose, noread, nowrite, noioctl, nostop, notty,
nopoll, nommap, nokqfilter, nodiscard, 0
};


But I don't see how this gets associated with the device table in either
version.

Since I don't see any errors, I'm assuming it's getting attached to the cdevsw
table but, somehow, it's not using the major number I think it is.  I believe,
but am not sure, that under NetBSD-5 it's just using the next available major
number, which happens to be 196.  Obviously, that's different under
NetBSD-9.x.  So my question is, is there a way to examine the cdevsw table on a
running system and figure out which major number it's choosing?  There doesn't
appear to be a way to hard-code the major number, but I could be missing that
since we didn't do that on the older version.
-thanks
-Brian



Re: How to build a custom module under NetBSD--9.x and newer?

2023-01-31 Thread Brian Buhrow
hello.  Thank you so much for the answer.  Yes, the problem was that I had two
source files that declared they were a module.  Since they were modules of
different names, I assumed they could all be compiled into one object file.  A
small change to the code has allowed me to compile the module and load it into
the kernel.  Now I will proceed with my testing.
Thank you again for the quick heads-up.
-Brian



How to build a custom module under NetBSD--9.x and newer?

2023-01-30 Thread Brian Buhrow
Hello.  I'm working on porting an old zaptel kernel module, which I've used
under NetBSD-3 and NetBSD-5 for years, to a 9-current from January 2021.  I
have the module built, using the bsd.kmodule.mk make file from /usr/share, but
when I try to perform a modload on the final .kmod file, I get the following
output:

DEBUG: module: Loading module from /stand/amd64/9.99.77/modules/zaptel/zaptel.kmod
DEBUG: module: Loading plist from /stand/amd64/9.99.77/modules/zaptel/zaptel.plist
DEBUG: module: plist load returned error 2 for `/stand/amd64/9.99.77/modules/zaptel/zaptel.kmod'
WARNING: module error: `link_set_modules' section wrong size (got 16, wanted 8)
WARNING: module error: cannot fetch info for `zaptel', error 8

This looks like some kind of error in the linking phase of the process, but
I'm not sure how to go about debugging it.

Maybe someone can give me a clue on what to try next?  Is there a flag I need
to specify the correct section size?
In looking at the make.log below, the VERSION environment variable is
non-existent in the build environment I'm using.  Could this be the issue?  If
it is, what should that environment variable contain?

-thanks
-Brian



#   compile  zaptel/zaptel.o
gcc -O2 -g   -std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes 
-Wpointer-arith -Wno-sign-compare  -Wsystem-headers   -Wno-traditional   
-Wa,--fatal-warnings  -Wreturn-type -Wswitch -Wshadow -Wcast-qual 
-Wwrite-strings -Wextra -Wno-unused-parameter -Wno-sign-compare -Werror   
-ffreestanding  -fno-strict-aliasing -Wno-pointer-sign -mno-red-zone -mno-mmx 
-mno-sse -mno-avx -msoft-float -mcmodel=kernel -fno-omit-frame-pointer   
-I/usr/src/common/include -DDIAGNOSTIC -I/usr/home/buhrow/src/asterisk/zaptel 
-I/usr/local/netbsd/src-9977/sys -I/usr/src/common/include -DDIAGNOSTIC  
-nostdinc -I. -I/home/buhrow/src/asterisk/zaptel-netbsd-20230118/zaptel 
-isystem /usr/home/buhrow/src/asterisk -isystem 
/usr/home/buhrow/src/asterisk/arch -isystem 
/usr/home/buhrow/src/asterisk/../common/include -D_KERNEL -D_MODULE 
-DSYSCTL_INCLUDE_DESCR -c -Wno-error=implicit-fallthrough   zaptel.c -o 
zaptel.o.o
ctfconvert -L VERSION -o zaptel.o zaptel.o.o && rm -f zaptel.o.o
#   compile  zaptel/ztdummy.o
gcc -O2 -g   -std=gnu99 -Wall -Wstrict-prototypes -Wmissing-prototypes 
-Wpointer-arith -Wno-sign-compare  -Wsystem-headers   -Wno-traditional   
-Wa,--fatal-warnings  -Wreturn-type -Wswitch -Wshadow -Wcast-qual 
-Wwrite-strings -Wextra -Wno-unused-parameter -Wno-sign-compare -Werror   
-ffreestanding  -fno-strict-aliasing -Wno-pointer-sign -mno-red-zone -mno-mmx 
-mno-sse -mno-avx -msoft-float -mcmodel=kernel -fno-omit-frame-pointer   
-I/usr/src/common/include -DDIAGNOSTIC -I/usr/home/buhrow/src/asterisk/zaptel 
-I/usr/local/netbsd/src-9977/sys -I/usr/src/common/include -DDIAGNOSTIC  
-nostdinc -I. -I/home/buhrow/src/asterisk/zaptel-netbsd-20230118/zaptel 
-isystem /usr/home/buhrow/src/asterisk -isystem 
/usr/home/buhrow/src/asterisk/arch -isystem 
/usr/home/buhrow/src/asterisk/../common/include -D_KERNEL -D_MODULE 
-DSYSCTL_INCLUDE_DESCR -c ztdummy.c -o ztdummy.o.o
ctfconvert -L VERSION -o ztdummy.o ztdummy.o.o && rm -f ztdummy.o.o
#  link  zaptel/zaptel.kmod
gcc  -Wl,--warn-shared-textrel -Wl,-z,relro -nostdlib -r 
-Wl,-T,/usr/libdata/ldscripts/kmodule,-d  -Wl,-Map=zaptel.kmod.map  -o 
zaptel.kmod zaptel.o ztdummy.o
ctfmerge -t -L VERSION -o zaptel.kmod zaptel.o ztdummy.o


Re: ALTQ cannot be stopped Was: Fwd: 10-BETA : some network issues

2023-01-06 Thread Brian Buhrow
Hello Joel.  I'm not sure this is a new problem.  I've seen similar
behavior on NetBSD-5.2.  It seems to happen on systems where there is a
good deal of traffic traversing the network at the time the stop is
requested.
-thanks
-Brian



Re: Enable to send packets on if_loop via bpf

2022-11-09 Thread Brian Buhrow
hello.  Just as a matter of consistency, it seems like using network byte
order would be a better choice, since it would match other interfaces on the
system.
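Assuming the field under discussion is the 4-byte address-family word that
precedes each packet on the loopback link (the tcpdump link-type list documents
DLT_NULL as host byte order and DLT_LOOP as network byte order), encoding it in
network order looks like this.  A minimal sketch; loop_header() is a
hypothetical name.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/* Build the 4-byte address-family header that precedes a packet on a
 * loopback/null link, encoded in network byte order (big-endian). */
static void loop_header(uint8_t hdr[4], uint32_t af)
{
    uint32_t be = htonl(af);
    memcpy(hdr, &be, sizeof be);
}
```

With AF_INET (2) this always yields the bytes 00 00 00 02; host order on a
little-endian machine would instead yield 02 00 00 00, which is the ambiguity a
fixed network order avoids.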
-thanks
-Brian



Re: debugging a kernel that doesn't start

2022-09-12 Thread Brian Buhrow
hello.  Another thing to try is to see if you can get to the boot prompt and
boot the kernel with various options, e.g. -a, -c, and possibly -2 (to disable
ACPI).  If -c gets you to a driver selection prompt, then you know the kernel is
loaded and ready for you to disable drivers.  If you don't get that far, then
I'd say it's a boot loader issue.  It could be the way the boot loader
interacts with the BIOS, but without more info, it's hard to know what,
exactly, is going wrong.
-Brian



Re: Anyone recall the dreaded tstile issue?

2022-07-19 Thread Brian Buhrow
hello.  Is your git process multithreaded, or just forking in the traditional
manner?  If it's multithreaded, then you'll want stack traces for each thread.
To check for this, I'd see whether your process is linked against libpthread,
just in case something is spawning threads without your knowledge.

-thanks
-Brian



Re: Anyone recall the dreaded tstile issue?

2022-07-17 Thread Brian Buhrow
Hello.  As I remember, and the web can probably confirm, running LOCKDEBUG
under 5.x doesn't work at all!
I think you'll find a question on this very point from me some years ago in our
archives.
-thanks
-Brian



Re: Anyone recall the dreaded tstile issue?

2022-07-15 Thread Brian Buhrow
hello.  If memory serves, this problem was discussed relative to NetBSD-5
when Andrew Doran was working on the SMP improvements to the kernel.  As manu
pointed out, it could be a result of a number of scenarios.  My takeaway from
all the discussion was that the best way to find the problem was to look at all
the processes that weren't in tstile wait and see what they're doing.
Everything in tstile wait is basically waiting in line for another resource
that's currently in use.  In other words, a lot of tstile processes is a
symptom of a problem, not the problem itself.  It sounds like collecting stack
traces on the processes that are in puffsrpl wait would be a good start, and
that might give you a clue as to what might be getting stuck.  It also might be
a good idea to get stack traces on all of the kernel threads to see what
they're doing.  I'm guessing there's some deadlock between puffs and some of the
other filesystem code on the system.

-thanks
-Brian


Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Brian Buhrow
hello.  In looking at my vmstat -m output, I see:

mclpl   211228146028146 14109 1407435   187 0 524288  35

I see no failures and the number of nmbclusters is: 524288

Yet this machine has displayed this message about 6 times since it was
rebooted about 5 hours ago.

Am I missing something?
-thanks
-Brian



Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Brian Buhrow
hello.  One strange thing I notice on this particular system that seems to
be different from the other systems I'm running is that the request count on
the mclpl line is incrementing at a pretty fast rate, whereas on other systems
the request rate is, more or less, constant over time, with occasional bursts
of requests.  Even so, there are no failures noted, even though the driver says
it's failed to get an rx cluster a few times since the system was booted.
For example, since the last message I wrote, the mclpl line now looks like:

mclpl   211229471029440 14801 1476239   187 0 524288   8


Maybe this incrementing thing isn't a big deal, but it jumps right out as
being different.
-thanks
-Brian



Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Brian Buhrow
hello.  In looking at the if_xennet_xenbus.c file, I see where
if_xennetrxbuf_cache is initialized, but I don't see where data is put into
it before it's requested.  Is the idea that the items in the cache are
supposed to be provided by the backend, i.e. the dom0?  Is it possible that
dom0 isn't providing enough rx requests to satisfy the traffic it's sending
us?  I think I understand what's supposed to happen once traffic begins
flowing: rx requests come in, and if_xennet_xenbus processes them and
pushes them back into the if_xennetrxbuf_cache cache.  What I don't
understand is how the initial cache gets populated with free rx requests to
use in order to get things started.

-thanks
-Brian



Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Brian Buhrow
Hello.  I'm running a number of NetBSD-9 and -current (as of 99.77) amd64
domU machines on a couple of different servers with FreeBSD as dom0.  I'm
getting the following messages from the kernel:
xennet0: rx no cluster
Much of the time, these messages seem harmless, but occasionally the
network locks up on machines that display this message.

In looking at the source code, I gather that this is a pool allocation
failure in if_xennet_xenbus.c, but I don't understand which memory resource
it's running out of, or whether there is a way to increase that resource.
In general, the domUs in question seem to have plenty of memory, and I
don't see a lot of memory pressure from other tasks on the systems.

Has anyone else seen these messages on their domU machines, and does
anyone have ideas on how to correct the issue?
-thanks
-Brian



Re: mfii hanging on boot

2022-06-20 Thread Brian Buhrow
hello.  Given that the system was running fine under 9.2 and you haven't
updated anything immediately before it began hanging, I'm inclined to
think you have a hardware issue.  Some questions:

1.  Does the RAID card have any battery-backed cache on it?  If so, what is
the state of the battery?

2.  What happens if you pull the battery and try to boot the system?

3.  Have you checked the BIOS of the machine?  Perhaps its motherboard
battery is going dead and the BIOS settings are corrupt and need to be
reset?

4.  Is it possible to temporarily disconnect the drives attached to the
RAID card to see if it boots without the attached disks?

5.  Does the behavior persist if you move the RAID card to a different PCI
slot?

-thanks
-Brian


Re: killed: out of swap

2022-06-15 Thread Brian Buhrow
hello.  One algorithm that I think might be a good option, and that would
address the concerns I've seen on this topic, is to kill the process with
the latest start time when looking for resources to free.  Obviously this
would fail in the case where a long-running process suddenly starts
consuming memory, but in general, it seems that badly behaved processes
typically begin behaving badly right from the start.  In any case, it
should address the issue of X or syslogd getting killed, since they would
often have older start times than the processes that cause the trouble.

-Brian


Re: killed: out of swap

2022-06-14 Thread Brian Buhrow
hello.  Is this something madvise(2) could be extended to do?
-thanks
-Brian

On Jun 14,  2:47pm, Mouse wrote:
} Subject: Re: killed: out of swap
} >> What might be interesting is a way to influence the order in which
} >> processes are chosen to kill...
} > I don't see any realistic way of doing anything with that.  It's
} > basically the first process that tries to allocate another page when
} > there are no more.  There are no other processes at that moment in
} > time that have the problem, so why should any of them be considered?
} 
} To answer that, consider the original poster's situation:
} 
}   > I have a program that keeps malloc()ing (and scribbling a bit
}   > into the allocated memory) until malloc() fails. The
}   > intention is to put pressure on the VM system to find out how
}   > much pool cache memory it can reclaim.
} 
} Such a program would be a prime candidate for declaring itself a
} preferred out-of-swap victim.  SunOS chill(1) - or was it chill(8)? -
} might be another example, though that's of minimal relevance to NetBSD.
} 
} It probably wouldn't be easy - the process which incurred the page
} fault would have to be put to sleep pending the death of the victim
} process - but it could provide for much better behaviour in situations
} like this.
} 
} Perhaps even better would be a way for userland to tell the kernel
} "pretend you're under severe RAM pressure and free what you can"
} without needing to actually run the system out of pages.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Slightly off topic, question about git

2022-06-05 Thread Brian Buhrow
Hello.  At the risk of raising the debate about which version control
system we should use, I have a question about git, as well as a comment
about it relative to the NetBSD source tree.  I should preface my comments
with the caveat that I am not by any means a git expert and, in fact, I'm
barely able to get anything I want out of it.  With that said, here are my
questions and observations.  I'd be interested to know how others work
around these issues and/or what you think of my observations.

1.  In CVS, I can do something like:
cvs log sys/dev/pci/if_bge.c
and be given a complete history of the changes to that file, as well as a
list of all the branches that file participates in and which versions apply
to each branch.  And I can do this without having to download all of the
history of that file onto my local storage.
It seems like the only way to do this with a git repository is to download
the entire source tree, along with its history and branches, using git
clone with an infinite depth.  Is this correct?  If not, how can I see all
the branches of a given repository without having to download the entire
repository?

2.  Also, in my exploration of git, it seems like the git log command shows
all the commits for each tag, rather than the comments for a specific file
or object in the repository.  Again, is this correct?

If I am correct in my guesses about how git works, it seems like I would
have to download the entire history of the NetBSD source tree if I want to
browse its branches or the commit history for any given file.  That is a
lot of overhead for examining tiny portions of the tree, relatively
speaking, assuming we move to git for our version control system.  It
strikes me that requiring this much storage space from developers would be
a regression from what we currently do.  Since I think we're smarter than
that, and since we have very smart people on our development team, I want
to understand what I'm missing about git that would let me avoid
downloading the entire history of the source tree from day one while still
retaining access to that history over time.

-thanks
-Brian



Re: Complete lock-up from using pkgsrc/net/darkstat

2022-05-26 Thread Brian Buhrow
hello.  You're right, I missed that bit of data.  I thought the machines
were hanging in mid-operation and "dying" in the night.

If shutting the process down hangs the system, that makes me think the
program is closing its file descriptors, especially the network ones,
differently from the way most programs do.  I wonder whether the system
hangs if darkstat is sent a SIGKILL, thus not allowing it to control its
death.  If it doesn't hang under that circumstance, then I'd definitely be
interested in understanding how it shuts things down when it receives a
signal it can catch.

-Brian
On May 26,  5:17pm, Mouse wrote:
} Subject: Re: Complete lock-up from using pkgsrc/net/darkstat
} > I'd be very surprised if the problem is running interfaces in
} Go back and read the original post - the hang is not during operation;
} the hang is upon killing darkstat, whether during otherwise-normal
} operation or during shutdown.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: Complete lock-up from using pkgsrc/net/darkstat

2022-05-26 Thread Brian Buhrow
hello.  Mouse's thought is a good one as well.  I'll note, for example,
that it's pretty easy to send a machine to hang heaven with vi(1).  Edit a
very large file, say one 3 GB in size.  Then pick a starting line a quarter
of the way down the file and mark it.  Move down the file and pick a line
three quarters of the way through the file.  Now try to delete the lines
from your starting line to your ending line.  Vi will balloon to a huge
virtual memory size and, most likely, cause the NetBSD host to hang hard
with no messages to the console.  It seems like UVM should kill such
processes, and in some cases it will, but in my experience it often does
not, with host-killing results.  The same unhappy result can also be
achieved by killing a long-running named process that has grown to a huge
virtual memory size.  In this case, the system is doing something similar
to what Mouse describes for core files, but instead it's doing it as part
of the process cleanup procedure.

-thanks
-Brian



Re: Complete lock-up from using pkgsrc/net/darkstat

2022-05-26 Thread Brian Buhrow
hello.  I'd be very surprised if the problem is running interfaces in
promiscuous mode.  Anyone using bridge(4) or VLANs, for example, is running
their interfaces in promiscuous mode, as is anyone running a DHCP server or
any sort of arpwatch.  I find it highly unlikely that the network stack is
that broken.  I'm guessing it's a memory issue with darkstat: specifically,
that it has a memory leak that runs the system it runs on out of RAM.  I
bet if you add a ton of swap to a system on which you run darkstat, you'll
find it runs longer before it hangs, and I'm guessing you'll notice there
is a lot of swap in use before it hangs.  NetBSD has never been that good
at dealing with memory shortages.

-thanks
-Brian


Re: RAIDframe: reconstucting a temporarily lost drive (was: SATA rescan)

2021-06-16 Thread Brian Buhrow
hello.  I think you can add the new drive as a spare and then reconstruct
to it without rebooting.  Once the reconstruction is done, you'll need to
reboot to get the spare drive back in the component1 position.  So, yes, a
reboot is necessary, but only one, and it can be done at your leisure any
time after the reconstruction is complete.  There have been times when I've
stretched this process out over weeks, simply to accommodate maintenance
windows.
-thanks
-Brian


Re: Cisco USB serial console compatiblity?

2021-06-03 Thread Brian Buhrow
Hello.  Thank you very much.  Indeed, umodem(4) recognized the device, but
I missed the dmesg output.
Thanks for the tip.
-Brian



Cisco USB serial console compatiblity?

2021-06-02 Thread Brian Buhrow
hello.  Does anyone know which USB serial chip the Cisco USB serial
console is closest to in our USB serial drivers list?  The vendor code for
the device I'm talking about is 0x05a6 and the product code is 0x0009.  It
doesn't look like we have a driver that recognizes these codes, but I'm
thinking it should be easy enough to add them to an existing driver.  The
question is, which driver most closely supports these devices?  My simple
Google search didn't turn up much except that the Linux cdc_acm driver
seems to support it.  I think this driver is for serial class USB devices.
Do we have a generic serial class driver, like we do for class Ethernet USB
devices?
Any thoughts on this would be greatly appreciated.
-thanks
-Brian



Re: Devices.

2021-06-01 Thread Brian Buhrow
hello David.  What I don't see in your proposal is a way of implementing a
dynamic device filesystem.  NetBSD, and possibly OpenBSD, are the last
Unix-like OSes that I'm aware of that use static special files in their
filesystems to point to devices.  If your proposal were extended with the
idea that we have a device filesystem, then wouldn't we need only one
implementation of a filesystem that knows anything about devices?  When
you want a device tree in a chroot, you just mount the device filesystem on
that chroot.  That filesystem would need a utility that converts a
configuration file or database into static permissions and ownership, but
that could be done on a per-instance basis, i.e. have a configuration that
describes /dev at the root of the system, and other configurations that
describe chrooted dev environments, or /dev directories inside application
directories.  The only case that doesn't cover, that I'm aware of, is what
to do about sockets and FIFOs, which are commonly scattered all over
filesystem trees.
In any case, with the advent of GPT named partitions, ZFS volumes, USB
pluggable devices and other dynamic device environments, not having a
dynamic device filesystem is becoming a real detriment.  The way we handle
ZFS is just goofy and fraught with bugs in dynamic situations.  I'd like to
see a more integrated approach to this problem.

-thanks
-Brian



Re: 9.1: boot-time delay?

2021-05-25 Thread Brian Buhrow
Hello.  I suppose it's not possible to switch the SATA controller to AHCI
mode in the BIOS, so that it attaches as ahcisata, on the long-delay
machines?  I'm guessing this is some quirk of the pciide(4) and piixide(4)
drivers.  Not to be too flip, but do you expect these machines to reboot
frequently in production?  If not, then I'd probably live with the delay on
reboot, as at this point I'd be concerned that any fix I came up with for
it would have implications down the road which might be more serious and
more impactful on operations.  I certainly understand the need to know
what's going on, but if a machine only reboots once or twice a year in
production, then ...

Any newer hardware should have AHCI-capable SATA controllers, so this
problem should go away as the hardware ages out.

-thanks
-Brian

On May 25,  3:33pm, Mouse wrote:
} Subject: Re: 9.1: boot-time delay?
} >> Will HZ=1000 be sufficient and does that reduce the boot time?
} > The latter is a good question which is likely to hint at possible
} > causes.  I'll experiment with various HZ values and see what happens.
} 
} At HZ=8000, the delay (based on the bracketed numbers) is almost
} exactly 22 seconds.
} 
} At HZ=4000, it's almost exactly 10 seconds.
} 
} At HZ=2000, it's almost exactly 4.3 seconds.
} 
} Based on a quadratic fit to those three data points, all I need to do
} is set HZ to 0 and it will reach the rest of the boot 1.2 seconds
} before it gets to uhub3.  Clearly, this machine includes resublimated
} thiotimoline somewhere in its hardware makeup.
} 
} Seriously, though...
} 
} I've just heard from one of the other people working with this.  I was
} told that, on a different hardware platform, the delay is gone even
} with HZ=8000.  I have an instance of that platform among my development
} machines; I tried it and I see the same thing, even with a bit-for-bit
} identical kernel.
} 
} I spent a little time playing with boot -c and disabling various
} things, as suggested by Martin Husemann upthread.
} 
} Disabling ehci* (what that hardware uses for USB) shuts off all USB
} support, of course.  It does nothing for the delay.
} 
} Disabling piixide* still gets the delay, but it also still finds wd0;
} it just attaches pciide* instead of piixide*.
} 
} Disabling both piixide* and pciide* gets rid of wd* _and_ gets rid of
} the delay.  (It doesn't boot fully, of course, because it has no root
} device.  But it reaches the root device prompt after some four seconds
} instead of 25-plus.)
} 
} Disabling wd does not fix the delay, but it doesn't completely
} eliminate wd; I still see "wd at ... not configured" messages, so it's
} found in some sense.  (I haven't tried completely removing wd from the
} kernel config.)
} 
} The machine without the delay has wd drives, but they attach at
} ahcisata instead of piixide/pciide.  The piixide attachment (the
} machine that exhibits the delay) is
} 
} piixide0 at pci0 dev 31 function 2: Intel 6 Series Serial ATA Controller (rev. 0x05)
} piixide0: bus-master DMA support present
} piixide0: primary channel configured to native-PCI mode
} piixide0: using ioapic0 pin 19 for native-PCI interrupt
} atabus0 at piixide0 channel 0
} piixide0: secondary channel configured to native-PCI mode
} atabus1 at piixide0 channel 1
} ...
} piixide1 at pci0 dev 31 function 5: Intel 6 Series Serial ATA Controller (rev. 0x05)
} piixide1: bus-master DMA support present
} piixide1: primary channel wired to native-PCI mode
} piixide1: using ioapic0 pin 19 for native-PCI interrupt
} atabus2 at piixide1 channel 0
} piixide1: secondary channel wired to native-PCI mode
} atabus3 at piixide1 channel 1
} ...
} wd0 at atabus0 drive 0
} wd0: 
} wd0: drive supports 16-sector PIO transfers, LBA48 addressing
} wd0: 465 GB, 969021 cyl, 16 head, 63 sec, 512 bytes/sect x 976773168 sectors (0 bytes/physsect; first aligned sector: 8)
} wd0: 32-bit data port
} wd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 5 (Ultra/100), WRITE DMA FUA, NCQ (32 tags)
} wd0(piixide0:0:0): using PIO mode 4, Ultra-DMA mode 5 (Ultra/100) (using DMA), WRITE DMA FUA EXT
} wd1 at atabus2 drive 0
} wd1: 
} wd1: drive supports 1-sector PIO transfers, LBA48 addressing
} wd1: 7641 MB, 15525 cyl, 16 head, 63 sec, 512 bytes/sect x 15649200 sectors
} wd1: 32-bit data port
} wd1: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags)
} wd1(piixide1:0:0): using PIO mode 4, Ultra-DMA mode 6 (Ultra/133) (using DMA), WRITE DMA FUA EXT
} 
} In contrast, the ahcisata attachment, on the machine with no delay, is
} 
} ahcisata0 at pci0 dev 23 function 0: vendor 8086 product a102 (rev. 0x31)
} ahcisata0: 64-bit DMA
} ahcisata0: AHCI revision 1.31, 4 ports, 32 slots, CAP 0xe734ff43
} ahcisata0: interrupting at msi1 vec 0
} atabus0 at ahcisata0 channel 0
} atabus1 at ahcisata0 channel 1
} atabus2 at ahcisata0 channel 2
} atabus3 at ahcisata0 channel 3
} ...
} ahcisata0 port 0: device present, speed: 6.0Gb/s
} ahcisata0 port 3: PHY offline
} 

Re: 9.1: boot-time delay?

2021-05-25 Thread Brian Buhrow
Hello.  Is there a reason you need a frequency that high?  Will HZ=1000 be
sufficient, and does that reduce the boot time?
Also, while I don't entirely understand all the timing mechanisms inside
NetBSD, it seems that if you do need a higher-frequency clock, I'd suggest
HZ=10000, since it's a factor of 100 times the default, rather than an odd
80 times the default.
-thanks
-Brian

On May 25,  9:44am, Mouse wrote:
} Subject: Re: 9.1: boot-time delay?
} Last week, I wrote, here, of a delay when booting 9.1
} 
} >>> [ 3.288539] uhub2: 4 ports with 4 removable, self powered
} >>> [ 3.288539] uhub3: 6 ports with 6 removable, self powered
} >>> [25.272567] wd0 at atabus0 drive 0
} >>> [25.273568] wd0: 
} 
} and, in a later mail,
} 
} > [A]s soon as I sent my mail and started looking at subsetting diffs,
} > I discovered the diffs between the installer kernel and the
} > operational kernel were far smaller than I remembered (and mostly
} > irrelevant):
} 
} It looks as though this, of all things, is the relevant line:
} 
} > +options	HZ=8000
} 
} My first reaction is that something somewhere is delaying by a fixed
} number of ticks rather than doing arithmetic with hz, but that would
} make the difference go in the other direction.  I'm having trouble
} coming up with a plausible scenario that would explain the delay
} getting _longer_ with HZ=8000.  The most plausible thing I've thought
} of is that some delay is being computed based on hz and then getting
} treated as milliseconds instead of ticks or some such and getting
} multiplied by hz again.
} 
} Anyone have any thoughts on possible ways to track this one down?  I'm
} going to be doing what I can, but any thoughts anyone has would be
} welcomed; I am only minimally familiar with 9.1's kernel internals.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: I think I've found why Xen domUs can't mount some file-backed disk images! (vnd(4) hides labels!)

2021-04-10 Thread Brian Buhrow
hello.  This must be some kind of regression that's been around a while.
I'm running a Xen dom0 with NetBSD-5.2 and xen-3.3.2, very old, but vnd(4)
does expose the entire file to the domUs, including FreeBSD 11 and 12,
without any corruption or booting issues.
Do you know when this trouble began?
-thanks
-Brian



Re: regarding the changes to kernel entropy gathering

2021-04-04 Thread Brian Buhrow
Hello.  As I understand it, Greg ran into this problem on a Xen domU.  In
checking my NetBSD-9 system running as a domU under xen-4.14.1, I see no
rdrand or rdseed feature exposed to domUs by Xen.  This observation is
confirmed by the Xen command line reference page:
https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

So, it seems the best answer is to update our documentation to say that
the Xen hypervisor, by default, doesn't provide the rdrand and rdseed
instructions to Xen guests, and NetBSD doesn't trust the random sources
provided by the xennet(4) and xbd(4) drivers.  Therefore, the only way to
get randomness working for the first time on a newly installed domU is to
write 32 bytes to /dev/random.
-thanks
-Brian



Re: 9.1: no wsmouse events...sometimes.

2021-03-09 Thread Brian Buhrow
hello.  Following up on your observations, what if you work in the
opposite direction?  That is, after everything is booted and working, never
mind the magic that makes it go, you explicitly detach the mouse from the
wsmux it picks by default and explicitly reattach it to the mux you want
before starting your application?  I have the following in my /etc/rc.local
to detach the bell from all the auxiliary sound cards it grabs hold of at
boot time.  This isn't exactly what you want, but similar commands
concerning the mouse, rather than the bell, might get you something that
works on boot without requiring unplugging and replugging the mouse to get
things going.

-thanks
-Brian

# Turn off the secondary bells on the system  (BB 01/28/2021)
/usr/sbin/wsmuxctl -f 1 -r wsbell1
/usr/sbin/wsmuxctl -f 1 -r wsbell2


Re: ACPI related performance trouble

2021-02-25 Thread Brian Buhrow
hello.  I wonder if you've compared the BIOS settings on both machines.
While the BIOS may be the same version, it's possible the settings are not
identical.  This is strongly suggested by the fact that one of your
machines shows a serial number in its machdep.dmi output, while the other
does not.  Doing that comparison will be tedious, but I think it's worth
it.  Perhaps the easiest way is to reset the slow machine to factory
defaults, then do your comparison, screen by screen, with the fast machine.

-Brian

On Feb 25,  2:38pm, Emmanuel Dreyfus wrote:
} Subject: Re: ACPI related performance trouble
} Joerg Sonnenberger  wrote:
} 
} > Have you compared the machdep sysctl?
} 
} Here it is. 
} 
} --- sysctl.glutamine
} +++ sysctl.leucine


Re: 9.1: no wsmouse events...sometimes.

2021-02-23 Thread Brian Buhrow
hello.  I think a clue lies in a snippet of the wsmux(4) man page:

 If a wskbd(4), wsbell(4), or wsmouse(4) device is opened despite having a
 mux it will be detached from the mux.

If I had to wager a guess as to what's going on, I'd say that wsmouse0 gets
attached to wsmux2 when the system comes up, but by the time you unplug the
mouse and plug it in again, your application has the /dev/wsmouse0 device
open, which causes the newly attached mouse not to get attached to the mux.
Even though your application is a replacement for init, it still runs too
late to intercept the attaching of the mouse to the mux, which happens
before any processes are spawned.  I wonder if you can fix this by using
drvctl to detach the mouse, then usbdevctl to reprobe the USB bus and
reattach it, and thus activate it for your application.  Alternatively, I
think I misread your original message and you're opening wsmux2, not
wsmouse0, with your application.  If that's true, then this snippet from
the manual suggests that if you open wsmouse0 rather than wsmux2, you can
force the mouse input to come to your application directly, rather than
going to the mux.  If my second scenario is correct, then I'd say there's a
bug in the system, since I think in that case the mouse is getting attached
to wsmux0 despite your kernel config entry, at least on boot-up.

In any case, I'm sure I only half know what I'm talking about, and even
that half might be suspect.  With that said, perhaps this will give you a
clue on something to try next.

-thanks
-Brian




Re: 9.1: no wsmouse events...sometimes.

2021-02-23 Thread Brian Buhrow
hello.  Silly question.  How are you forcing the mouse to use mux 2?  Is it
a custom kernel, a sysctl variable, or an unused mouse plugged into the
machine elsewhere?
-thanks
-Brian



Re: X vs serial console?

2021-02-10 Thread Brian Buhrow
hello.  I'm guessing you're running into the same issue I did with the
intelfb driver.  My solution was to build a kernel with the genfb(4) driver
included and the intelfb driver excluded.  This gives me VGA video at the
expense of a bit of X performance, which I didn't care about anyway.
-thanks
-Brian



Re: X vs serial console?

2021-02-09 Thread Brian Buhrow
hello.  What video driver/kernel driver are you using to drive the
display?  I had a similar problem using the intelfb(4) driver.  I had to
revert to the genfb(4) driver to get the VGA port to work properly.  The
DisplayPort/HDMI ports worked fine under the intelfb(4) driver, but the VGA
port would switch off as soon as the BIOS lost control of the card.
-thanks
-Brian




Re: Issues with intelfb(4) and USB keyboards

2020-12-22 Thread Brian Buhrow
hello.  My apologies for such a naive question, but what driver should I
use for the keyboard?
The default, as created by X -configure, is "kbd".
I'm thinking it should be "wskbd".  Is that correct?
In any case, the keyboard doesn't appear to work when X is running, though
it works fine, using a USB keyboard and the wskbd driver, when I'm using
the console.

I'll try the xorg.conf section you suggest and, if that doesn't work, I'll
try the genfb driver.

I'm hoping there will be an easy fix for the i915drmkms driver soon.
Again, thanks for the help.
-Brian



Re: Issues with intelfb(4) and USB keyboards

2020-12-22 Thread Brian Buhrow
hello.  Below is the output of edid-decode against the data I extracted
using your wsedid program.  While I don't remember the exact numbers for
this monitor, they look reasonable, and there is a dotclock value in there.
The only thing that looks odd is that it says the monitor is attached to
the DisplayPort, which is incorrect.  It's attached to the 15-pin VGA port.
Perhaps that's normal; I'm not familiar with this graphics adapter.
Anyway, what do you think?
And, yes, this monitor really was made in the year 2000.
-thanks
-Brian


edid-decode (hex):

00 ff ff ff ff ff ff 00 4d d9 90 03 d1 24 7b 00
2a 0a 01 04 a5 25 1b 96 e3 0c c9 a0 57 47 9b 27
12 48 4c ff ff 80 c2 80 a9 4f 81 59 81 4f 71 59
61 59 45 59 31 59 86 3d 00 c0 51 00 30 40 40 a0
13 00 60 08 11 00 00 1e 00 00 00 fd 00 30 78 1e
60 1e 00 0a 20 20 20 20 20 20 00 00 00 fc 00 53
4f 4e 59 20 43 50 44 2d 45 34 30 30 00 00 00 ff
00 38 30 37 30 33 35 33 0a 20 20 20 20 20 00 8f



Block 0, Base EDID:
  EDID Structure Version & Revision: 1.4
  Vendor & Product Identification:
    Manufacturer: SNY
    Model: 912
    Serial Number: 8070353
    Made in: week 42 of 2000
  Basic Display Parameters & Features:
    Digital display
    Bits per primary color channel: 8
    DisplayPort interface
    Maximum image size: 37 cm x 27 cm
    Gamma: 2.50
    DPMS levels: Standby Suspend Off
    Supported color formats: RGB 4:4:4
    First detailed timing includes the native pixel format and preferred refresh rate
    Display is continuous frequency
  Color Characteristics:
    Red  : 0.6250, 0.3398
    Green: 0.2802, 0.6054
    Blue : 0.1552, 0.0703
    White: 0.2832, 0.2978
  Established Timings I & II:
    IBM     :   720x400    70.082 Hz   9:5    31.467 kHz  28.320 MHz
    IBM     :   720x400    87.850 Hz   9:5    39.444 kHz  35.500 MHz
    DMT 0x04:   640x480    59.940 Hz   4:3    31.469 kHz  25.175 MHz
    Apple   :   640x480    66.667 Hz   4:3    35.000 kHz  30.240 MHz
    DMT 0x05:   640x480    72.809 Hz   4:3    37.861 kHz  31.500 MHz
    DMT 0x06:   640x480    75.000 Hz   4:3    37.500 kHz  31.500 MHz
    DMT 0x08:   800x600    56.250 Hz   4:3    35.156 kHz  36.000 MHz
    DMT 0x09:   800x600    60.317 Hz   4:3    37.879 kHz  40.000 MHz
    DMT 0x0a:   800x600    72.188 Hz   4:3    48.077 kHz  50.000 MHz
    DMT 0x0b:   800x600    75.000 Hz   4:3    46.875 kHz  49.500 MHz
    Apple   :   832x624    74.551 Hz   4:3    49.726 kHz  57.284 MHz
    DMT 0x0f:  1024x768i   86.958 Hz   4:3    35.522 kHz  44.900 MHz
    DMT 0x10:  1024x768    60.004 Hz   4:3    48.363 kHz  65.000 MHz
    DMT 0x11:  1024x768    70.069 Hz   4:3    56.476 kHz  75.000 MHz
    DMT 0x12:  1024x768    75.029 Hz   4:3    60.023 kHz  78.750 MHz
    DMT 0x24:  1280x1024   75.025 Hz   5:4    79.976 kHz 135.000 MHz
    Apple   :  1152x870    75.062 Hz 192:145  68.681 kHz 100.000 MHz
  Standard Timings:
    CVT     :  1800x1440   59.911 Hz   5:4    89.447 kHz 218.250 MHz (EDID 1.4 source)
    GTF     :  1800x1440   60.000 Hz   5:4    89.400 kHz 219.566 MHz (EDID 1.3 source)
    DMT 0x36:  1600x1200   75.000 Hz   4:3    93.750 kHz 202.500 MHz
    DMT 0x21:  1280x960    85.002 Hz   4:3    85.938 kHz 148.500 MHz
    CVT     :  1280x960    74.857 Hz   4:3    75.231 kHz 130.000 MHz (EDID 1.4 source)
    GTF     :  1280x960    75.000 Hz   4:3    75.150 kHz 129.859 MHz (EDID 1.3 source)
    CVT     :  1152x864    84.790 Hz   4:3    77.159 kHz 119.750 MHz (EDID 1.4 source)
    GTF     :  1152x864    85.000 Hz   4:3    77.095 kHz 119.651 MHz (EDID 1.3 source)
    DMT 0x13:  1024x768    84.997 Hz   4:3    68.677 kHz  94.500 MHz
    DMT 0x0c:   800x600    85.061 Hz   4:3    53.674 kHz  56.250 MHz
    DMT 0x07:   640x480    85.008 Hz   4:3    43.269 kHz  36.000 MHz
  Detailed Timing Descriptors:
    DTD 1:  1280x1024   85.024 Hz   5:4    91.146 kHz 157.500 MHz (352 mm x 264 mm)
                 Hfront   64 Hsync 160 Hback 224 Hpol P
                 Vfront    1 Vsync   3 Vback  44 Vpol P
  Display Range Limits:
    Monitor ranges (GTF): 48-120 Hz V, 30-96 kHz H, max dotclock 300 MHz
    Display Product Name: 'SONY CPD-E400'
    Display Product Serial Number: '8070353'
Checksum: 0x8f


Re: Issues with intelfb(4) and USB keyboards

2020-12-22 Thread Brian Buhrow
hello.  Thanks for the help.  I'll follow your instructions and report back.
-Brian




Re: Issues with intelfb(4) and USB keyboards

2020-12-21 Thread Brian Buhrow
hello.  Sorry for the delay in responding.  Yes, I have a very old monitor,
an old Sony Trinitron multisync.  I can probably find a dotclock value to
use in this configuration, but I don't know how to put it in the i915drmkms
driver.  Can I put in a number with sysctl?
-thanks
-Brian



Re: Issues with intelfb(4) and USB keyboards

2020-12-19 Thread Brian Buhrow
hello.  Thanks for the quick explanation.  It still doesn't work, but the
errors are different.
Any ideas what to try next?  How does one set the dot clock?
-thanks
-Brian

[ 1.00] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[ 1.00] 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[ 1.00] 2018, 2019, 2020 The NetBSD Foundation, Inc.  All rights reserved.
[ 1.00] Copyright (c) 1982, 1986, 1989, 1991, 1993
[ 1.00] The Regents of the University of California.  All rights reserved.

[ 1.00] NetBSD 9.1_STABLE (MIRKWOOD) #0: Wed Dec 16 23:49:40 PST 2020
[ 1.00] buh...@loth-9.nfbcal.org:/usr/local/netbsd/src-90/sys/arch/amd64/compile/MIRKWOOD
[ 1.00] total memory = 16088 MB
[ 1.00] avail memory = 15594 MB
[ 1.00] WARNING: module error: module `msdos' pushed by boot loader already exists
[ 1.00] cpu_rng: RDSEED
[ 1.00] rnd: seeded with 256 bits
[ 1.00] timecounter: Timecounters tick every 10.000 msec
[ 1.00] Kernelized RAIDframe activated
[ 1.00] running cgd selftest aes-xts-256 aes-xts-512 done
[ 1.00] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
[ 1.03] efi: systbl at pa dbd17018
[ 1.03] Dell Inc. OptiPlex 5050
[ 1.03] mainbus0 (root)
[ 1.03] ACPI: RSDP 0xD0ED9000 24 (v02 DELL  )
[ 1.03] ACPI: XSDT 0xD0ED90B8 F4 (v01 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: FACP 0xD0F009F8 00010C (v05 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: DSDT 0xD0ED9240 0277B4 (v02 DELL   CBX3 01072009 INTL 20160422)
[ 1.03] ACPI: FACS 0xDB80DF00 40
[ 1.03] ACPI: APIC 0xD0F00B08 84 (v03 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: FPDT 0xD0F00B90 44 (v01 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: FIDT 0xD0F00BD8 AC (v01 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: MCFG 0xD0F00C88 3C (v01 DELL   CBX3 01072009 MSFT 0097)
[ 1.03] ACPI: HPET 0xD0F00CC8 38 (v01 DELL   CBX3 01072009 AMI. 0005000B)
[ 1.03] ACPI: SSDT 0xD0F00D00 003176 (v02 SaSsdt SaSsdt   3000 INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F03E78 0025A5 (v02 PegSsd PegSsdt  1000 INTL 20160422)
[ 1.03] ACPI: HPET 0xD0F06420 38 (v01 INTEL  SKL  0001 MSFT 005F)
[ 1.03] ACPI: SSDT 0xD0F06458 000DE5 (v02 INTEL  Ther_Rvp 1000 INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F07240 0008F6 (v02 INTEL  DELL_SFF  INTL 20160422)
[ 1.03] ACPI: UEFI 0xD0F07B38 42 (v01   )
[ 1.03] ACPI: SSDT 0xD0F07B80 000EDE (v02 CpuRef CpuSsdt  3000 INTL 20160422)
[ 1.03] ACPI: LPIT 0xD0F08A60 94 (v01 INTEL  SKL   MSFT 005F)
[ 1.03] ACPI: SSDT 0xD0F08AF8 000141 (v02 INTEL  HdaDsp    INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F08C40 00029F (v02 INTEL  sensrhub  INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F08EE0 003002 (v02 INTEL  PtidDevc 1000 INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F0BEE8 00050D (v02 INTEL  TbtTypeC  INTL 20160422)
[ 1.03] ACPI: DBGP 0xD0F0C3F8 34 (v01 INTEL   0002 MSFT 005F)
[ 1.03] ACPI: DBG2 0xD0F0C430 54 (v00 INTEL   0002 MSFT 005F)
[ 1.03] ACPI: MSDM 0xD0F0C488 55 (v03 DELL   CBX3 06222004 AMI  00010013)
[ 1.03] ACPI: SLIC 0xD0F0C4E0 000176 (v03 DELL   CBX3 01072009 MSFT 00010013)
[ 1.03] ACPI: TCPA 0xD0F0C658 32 (v02 ALASKA NAPAASF   MSFT 0113)
[ 1.03] ACPI: ASF! 0xD0F0C690 A0 (v32 INTEL   HCG 0001 TFSM 000F4240)
[ 1.03] ACPI: BGRT 0xD0F0C730 38 (v00 ??  01072009 AMI  00010013)
[ 1.03] ACPI: DMAR 0xD0F0C768 A8 (v01 INTEL  SKL  0001 INTL 0001)
[ 1.03] ACPI: 10 ACPI AML tables successfully acquired and loaded
[ 1.03] ioapic0 at mainbus0 apid 2: pa 0xfec0, version 0x20, 120 pins
[ 1.03] cpu0 at mainbus0 apid 0
[ 1.03] cpu0: CPU base freq 32 Hz
[ 1.03] cpu0: CPU max freq 36 Hz
[ 1.03] cpu0: TSC freq CPUID 319200 Hz
[ 1.03] cpu0: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, id 0x506e3
[ 1.03] cpu0: package 0, core 0, smt 0
[ 1.03] cpu1 at mainbus0 apid 2
[ 1.03] cpu1: Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz, id 0x506e3
[ 1.03] cpu1: package 0, core 1, smt 0
[ 1.03] cpu2 at mainbus0 apid 4
[ 1.03] cpu2: 

Issues with intelfb(4) and USB keyboards

2020-12-19 Thread Brian Buhrow
hello.  I'm working to get a new installation of NetBSD-9.1 working on
a Dell Optiplex 5050.  This machine has an Intel HD Graphics 530 video chip in
it with 2 DisplayPort ports, 1 HDMI port and 1 VGA port attached to it.  I'm trying
to get the VGA port working and I'm having trouble with that.  When the
machine boots, the BIOS sets up the display as it should and the VGA port
works.  When the kernel takes over, it configures the intelfb(4) device, the
screen goes black and the monitor claims there's no signal.  Dmesg is shown below, but
it looks like one of two things is happening:

1.  The i915DRM driver is incorrectly assigning the video output to one of the
DisplayPort jacks.

2.  The i915DRM driver is unable to properly detect the supported clock rates
on the VGA-connected monitor and it's setting a resolution that's out of
range for the monitor.

If it helps, I can log in on the console without the screen, and I get a
window of 64 lines x 160 characters.
That seems like a lot of text on a VGA screen.

I don't understand the significance, if any, of the dmesg output from the i915
driver, so I'm wondering if anyone might be able to help me track this issue
down?

-thanks
-Brian


[ 1.00] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[ 1.00] 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[ 1.00] 2018, 2019, 2020 The NetBSD Foundation, Inc.  All rights reserved.
[ 1.00] Copyright (c) 1982, 1986, 1989, 1991, 1993
[ 1.00] The Regents of the University of California.  All rights reserved.

[ 1.00] NetBSD 9.1_STABLE (MIRKWOOD) #0: Wed Dec 16 23:49:40 PST 2020
[ 1.00] buh...@loth-9.nfbcal.org:/usr/local/netbsd/src-90/sys/arch/amd64/compile/MIRKWOOD
[ 1.00] total memory = 16088 MB
[ 1.00] avail memory = 15594 MB
[ 1.00] WARNING: module error: module `msdos' pushed by boot loader already exists
[ 1.00] cpu_rng: RDSEED
[ 1.00] rnd: seeded with 256 bits
[ 1.00] timecounter: Timecounters tick every 10.000 msec
[ 1.00] Kernelized RAIDframe activated
[ 1.00] running cgd selftest aes-xts-256 aes-xts-512 done
[ 1.00] timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
[ 1.03] efi: systbl at pa dbd17018
[ 1.03] Dell Inc. OptiPlex 5050
[ 1.03] mainbus0 (root)
[ 1.03] ACPI: RSDP 0xD0ED9000 24 (v02 DELL  )
[ 1.03] ACPI: XSDT 0xD0ED90B8 F4 (v01 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: FACP 0xD0F009F8 00010C (v05 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: DSDT 0xD0ED9240 0277B4 (v02 DELL   CBX3 01072009 INTL 20160422)
[ 1.03] ACPI: FACS 0xDB80DF00 40
[ 1.03] ACPI: APIC 0xD0F00B08 84 (v03 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: FPDT 0xD0F00B90 44 (v01 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: FIDT 0xD0F00BD8 AC (v01 DELL   CBX3 01072009 AMI  00010013)
[ 1.03] ACPI: MCFG 0xD0F00C88 3C (v01 DELL   CBX3 01072009 MSFT 0097)
[ 1.03] ACPI: HPET 0xD0F00CC8 38 (v01 DELL   CBX3 01072009 AMI. 0005000B)
[ 1.03] ACPI: SSDT 0xD0F00D00 003176 (v02 SaSsdt SaSsdt   3000 INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F03E78 0025A5 (v02 PegSsd PegSsdt  1000 INTL 20160422)
[ 1.03] ACPI: HPET 0xD0F06420 38 (v01 INTEL  SKL  0001 MSFT 005F)
[ 1.03] ACPI: SSDT 0xD0F06458 000DE5 (v02 INTEL  Ther_Rvp 1000 INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F07240 0008F6 (v02 INTEL  DELL_SFF  INTL 20160422)
[ 1.03] ACPI: UEFI 0xD0F07B38 42 (v01   )
[ 1.03] ACPI: SSDT 0xD0F07B80 000EDE (v02 CpuRef CpuSsdt  3000 INTL 20160422)
[ 1.03] ACPI: LPIT 0xD0F08A60 94 (v01 INTEL  SKL   MSFT 005F)
[ 1.03] ACPI: SSDT 0xD0F08AF8 000141 (v02 INTEL  HdaDsp    INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F08C40 00029F (v02 INTEL  sensrhub  INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F08EE0 003002 (v02 INTEL  PtidDevc 1000 INTL 20160422)
[ 1.03] ACPI: SSDT 0xD0F0BEE8 00050D (v02 INTEL  TbtTypeC  INTL 20160422)
[ 1.03] ACPI: DBGP 0xD0F0C3F8 34 (v01 INTEL   0002 MSFT 005F)
[ 1.03] ACPI: DBG2 0xD0F0C430 54 (v00 INTEL   0002 MSFT 005F)
[ 1.03] ACPI: MSDM 0xD0F0C488 55 (v03 DELL   CBX3 06222004 AMI  00010013)
[ 1.03] ACPI: SLIC 0xD0F0C4E0 000176 (v03 DELL   CBX3 01072009 MSFT 00010013)
[ 1.03] ACPI: TCPA 0xD0F0C658 

Re: Problems with hdaudio(4) on Dell Optiplex-5050

2020-12-16 Thread Brian Buhrow
hello.  Further investigation into this issue reveals that Mouse is
half right as to the problem itself.  The problem, just to reiterate, is
that when I plug a headphone into the headphone jack of my Dell Optiplex
5050, the sound is distorted as though the left and right channels are
shorted together.  And, Mouse's observation that this could be hardware
related is correct.
The issue seems to be that the jack on the front of this machine is a
4-pin jack: left channel at tip, right channel at ring 1, microphone at
ring 2 and ground at sleeve.  If only it were that simple.  Yes, it's a
4-conductor jack, but the Realtek chip can be programmed to interpret the
signals from the jack in various modes, i.e. headphone mode, where ring 2
and sleeve are logically shorted together, CTIA mode, where ring 2 is
ground and sleeve is microphone, and OMTP mode, where ring 2 and the sleeve
are logically reversed from CTIA mode.  (I think I have that description correct,
but I'm still trying to understand all the choices here.)
The point is that while our driver puts the audio through the right
output device at the right time, in the case of the headphone jack, it
doesn't set the mode of the jack at all, and the default mode is not the
right one for the operation of headphones.  To accomplish this task,
additional bits need to be read and written to registers outside of the
HD Audio specification.  I've downloaded the Linux kernel sources to try and
gain an understanding of how to do this and translate these additional
operations into our hdaudio(4) framework.  I wonder, however, if someone on
this list might be able to help me understand the moving parts of our
driver enough to help me figure out how to write the extra operations
necessary to give these additional instructions to the Realtek chips.  In
terms of an interface, I'm thinking the audioctl(1) interface could be used
to allow the user to select which mode these jacks are in.  That would, for
example, allow a user to select a headset with microphone, headphones, or a
line in or out mode.

If someone has already done this work, then I'm happy to use that, or
to test if they are ready to commit it.

Thoughts, or suggestions welcome.

-thanks
-Brian

On Dec 8,  7:43am, Mouse wrote:
} Subject: Re: Problems with hdaudio(4) on Dell Optiplex-5050
} > [I]f a stereo headphone is plugged in to the jack on the front of the
} > machine, the audio switches to that jack, but the two channels are
} > mixed together in such a way as to cause them to cancel each other
} > out on each headphone.
} 
} I'll suggest another possible cause, if I've understood the symptom
} correctly.
} 
} Stereo headphones use three conductors: one ear signal, the other ear
} signal, and common ground.  I have occasionally had the ground
} connection break (usually for fixable reasons, in my cases).  This
} causes the electrical circuit to amount to both ears in series between
} the two signal conductors, leading to an effectively monaural sound
} that amounts to the difference between the two channels.  If there's a
} defect in the jack or its connections (which I can imagine being common
} across a product line or manufacturer), this could at least potentially
} be responsible for the symptom you're hearing.
} 
} Of course, it also could be software, as you suggest.  One experiment
} that might help resolve this occurs to me.  When I've had a machine
} crash while playing sound, it usually keeps playing the last short time
} (50ms?) in a loop.  If you break to ddb while sound is playing, I'd
} expect the same thing to happen.  In that state, the driver isn't going
} to be changing anything, so plugging headphones in then might help you
} identify whether it's a software issue or not.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: Problems with hdaudio(4) on Dell Optiplex-5050

2020-12-10 Thread Brian Buhrow
hello.  Investigating this further reveals the following additional
information and yields more specific questions which I hope someone can help
me with.

The FreeBSD-12 driver for this audio chip only knows how to drive the
line out jack on the back of the machine.  If I modify hdafg.c to remove
the fix posted in revision 1.16 of hdafg.c, the NetBSD driver behaves the same
way.  That is, only the line out jack is active.  However, in both the FreeBSD
and NetBSD cases, the line out provides good stereo output.

When I plug in a headphone set when audio is playing through the
internal speaker, using the headphone jack on the front of the machine, I
get a momentary blip where the sound is correct through the headphones
before the cross channel cancelation takes effect.  
Others have described the same problem on Dell equipment, so I don't think
this is a hardware problem, though folks on the thread referenced below
also thought it might be a hardware problem.

I've downloaded and tried to wade through the HD Audio specification
from Intel, and it definitely helps my understanding of how the driver
works, but I still don't quite understand how codec streams get connected
to output pins and where the associations between related pins happen.  I
think the problem is that when the presence detect happens on the headphone
jack and the presence handler is called in hdafg.c, the only thing that
routine does is enable output on the headphone jack, without making any
other changes to the stream.  What I think needs to happen, and this is
where my understanding begins to falter, is that in addition to changing
the setting of the output device, the codec->output device association
needs to change to reflect the changed characteristics of the newly enabled
output device.  I have a similar audio chip on a Dell Windows 10 laptop and
I can tell by the way it behaves that it definitely alters these
associations on the fly as one switches between internal speakers and
headphones.  The Windows driver also asks me whether I've plugged in a
headphone or external speaker, or something else, so there's definitely
some kind of software control over the jack and what it's being used for.
It looks like our driver needs some work to increase the sophistication of
this process.  Right now, for example, there appears to be no way to attach a
microphone device.
In fact, as I think about it, I wonder if that's part of the problem.  I
wonder if there's an additional contact in the jack that is to be used for
a microphone that's supposed to be disabled in software when a headphone or
speaker is attached?

Any enlightenment anyone can shed on this work would be greatly
appreciated.

-thanks
-Brian


https://www.unitedbsd.com/d/286-bad-audio-on-netbsd


uaudio(4) seems to be broken in NetBSD-9.1

2020-12-10 Thread Brian Buhrow
hello.  I just filed PR kern/55856, which documents the failure of
uaudio(4) devices under NetBSD-9.1.  These are devices which work fine
under FreeBSD-12 and previous versions of NetBSD, including NetBSD-3, 4 and
5.  In fact, I'm sure I've used these devices on NetBSD versions as early
as NetBSD-2 and, I think, even earlier.  I suspect this is a regression
from the new audio subsystem that went into NetBSD-9.  In any case, if
anyone has thoughts on how to go about resolving this, I'd be very
interested.

-thanks
-Brian



Re: Reparenting processes?

2020-12-08 Thread Brian Buhrow
hello.  Isn't what you want to do very similar to what happens when a
process goes into the background and the parent dies?  I.e. its parent gets
reassigned to init, pid 1.  That's a special case, but it seems like you
could create a syscall that does that work and, as long as the target parent
has the same uid as the calling process, you should be able to re-use a lot
of the code that does that inheritance for backgrounded processes.  Then,
when the new parent does a wait, it just gets the pid of the newly
transferred child, assuming the child exits at some point.

Just a thought.
-Brian

On Dec 8,  2:36pm, Mouse wrote:
} Subject: Re: Reparenting processes?
} > One complication I can think of: what happens to the original parent process...
} 
} If the original parent is not in a wait() at the time, I don't see this
} as an issue; the child just evaporates.  (See below.)
} 
} If the original parent is in a wait(), and this is either its last
} child or it is specifically being waited for (either by pid or pgid),
} I'm not sure.  I'd have to think about that.  Possibly a new wait
} status (WIFREPARENTED?).  Possibly it just turns into whatever would
} have happened if the wait*() had been done immediately after the
} reparenting.  Possibly something else.
} 
} The new parent learning about the new child is not something I've been
} worried about, because what I've been imagining requires active
} collaboration between the old and new parents for the move to happen at
} all, meaning that each parent is expecting the change and can update
} whatever internal data structures it needs to.
} 
} Yes, post-reparenting, the new parent can wait for the child just like
} any other child it (then) has.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Problems with hdaudio(4) on Dell Optiplex-5050

2020-12-08 Thread Brian Buhrow
hello.  I'm trying to get the audio headphone jack working properly on
a Dell Optiplex 5050 with NetBSD-9.1 with sources from December 2, 2020.
When sound comes through the built-in speaker, all works fine.  However, if
a stereo headphone is plugged in to the jack on the front of the machine,
the audio switches to that jack, but the two channels are mixed together in
such a way as to cause them to cancel each other out on each headphone.  I
think what's going wrong is something like the following:

1.  When the sound card comes up, the speaker is the active device, so both
channels are combined to drive the speaker in monaural mode.  Then, when a
headphone is plugged in, the sound card is switched to two channels, but
the two channels are still mixed.  From what I gather reading the source,
the initial setup is how the BIOS sets things up, so the driver just
leaves things as they are and switches the output from speaker to headphone
when the headphone is connected.  I've seen reports of similar problems
with the NetBSD driver and other Dell models, so I'm confident it's a
software problem.  I looked at the FreeBSD-12 driver to see what they do,
and it appears they attach all the jacks as different devices, but they
also don't seem to switch modes, so their driver might have the same issue.
Does anyone have any ideas what I should look at to resolve this
issue?
-thanks
-Brian

NetBSD 9.1_STABLE (GENERIC) #0: Mon Dec  7 14:11:20 PST 2020

buh...@loth-9.nfbcal.org:/usr/local/netbsd/src-90/sys/arch/amd64/compile/GENERIC

 . . .

hdaudio0 at pci0 dev 31 function 3: HD Audio Controller
hdaudio0: interrupting at msi2 vec 0
hdafg0 at hdaudio0: vendor 10ec product 0255
hdafg0: DAC00 2ch: Speaker [Jack]
hdafg0: 2ch/0ch 44100Hz 48000Hz 96000Hz 192000Hz PCM16 PCM20 PCM24 AC3
audio0 at hdafg0: playback, capture, full duplex, independent
audio0: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for playback
audio0: slinear_le:16 2ch 48000Hz, blk 1920 bytes (10ms) for recording
spkr0 at audio0: PC Speaker (synthesized)
wsbell at spkr0 not configured
hdafg1 at hdaudio0: vendor 8086 product 2809
hdafg1: DP00 8ch: Digital Out [Jack]
hdafg1: 8ch/0ch 48000Hz PCM16*

-thanks
-Brian



Re: Reparenting processes?

2020-12-07 Thread Brian Buhrow
hello.  Depending on why you're doing this and what you're running, I
wonder if either screen(1) using the detach feature will do what you want,
or if tmux(1) using a similar detach feature will do what you want.  They
don't actually reparent, but do allow you to log out of one shell and then
log into another and reattach the session you had running before you logged
out initially.

-Brian
On Dec 7,  8:54pm, Mouse wrote:
} Subject: Reparenting processes?
} I've been thinking about building a way to move a job between shells,
} in particular between one window, ssh session, whatever, and another.
} 
} Obviously, this will involve much hackery of existing facilities.  For
} example, it involves switching controlling terminals.  I expect I can
} manage most of it without too much trouble - though that's said without
} having actually tried it.
} 
} But the real bugaboo in my mind is reparenting processes.
} 
} I'm writing here to ask two questions.
} 
} One is: does anyone have experience trying this, or otherwise have any
} reports on attempts to do this, to share?  I'd be interested in any.
} 
} The other is: is there any security property that such a facility would
} break badly?  (There are lots of minor things that might break, such as
} potentially changing terminal type, but that's basically no different
} from telling the terminal emulator to change emulation on the fly,
} which my terminal emulator has been capable of for decades.  I haven't
} come up with any major security issues yet, but that means little.)
} 
} I'm willing to require - indeed, I expect to _want_ to require - that
} the two parents (old and new) be cooperating in this endeavour.  I'm
} willing to require that all three processes be running with the same
} root, though it'd be nice if that weren't necessary.  It would address
} some, though not all, of the desired use cases if all three processes
} have to be running with the same UID.  It would even be of _some_ use
} if it required cooperation from root in some form (a fourth process,
} maybe?).
} 
} As far as I can tell, reparenting currently is limited to (1) init
} inheriting orphans and (2) sharply limited partial reparenting due to
} ptrace().
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: Issues with older wd* / IDE on various platforms

2020-11-13 Thread Brian Buhrow
hello John.  I'm not sure if this addresses your issue, but I'm
looking at an issue with talking to IDE disks on HVM Xen domains.  I've
been looking at the following commit to see if it fixes my issue.  It's
possible this is what breaks your drives.  Actually, there are several
commits around that time frame that might be related.  This one references
changes to wd.c, but there are related changes to ata_wdc.c that are
probably more relevant.

Hope this helps.
-thanks
-Brian
http://mail-index.NetBSD.org/source-changes/2020/05/24/msg117668.html


Re: RAIDframe: what if a disc fails during copyback

2020-10-29 Thread Brian Buhrow
hello.  In my experience, the copyback feature never worked.  I
found I had to reboot, turning the hot spare C into component C, add the
replaced B as a new hot spare, reconstruct to it, and reboot again to get
everything back into its proper place.  I forget the exact problem I ran
into, but I think it had something to do with not being able to add another
hot spare when one was in use, or the system not recognizing the replaced
component B as a valid thing to copy back to.  If you search the archives,
I'm sure you can find the exchange between Greg and me on the topic.  The
result of that conversation was, as I remember it, something like,
yes, it's broken and if you'd like to fix it, be my guest.
So, I'd be curious to know if you can do the copyback without having
to reboot and, once done, how things work.

-thanks
-Brian

On Oct 29,  7:37pm, Edgar Fuß wrote:
} Subject: Re: RAIDframe: what if a disc fails during copyback
} There still seems to be confusion on what I did.
} 
} Let A and B be the two original components, C a spare (in the cupboard) 
} and B' be B with the new firmware.
} 
} I start with A and B as the two components of a RAID-1.
} Now B failes. I have a degraded RAID with A alone.
} I plug in C, scsictl scsibus0 scan all all it, add it as a hot spare 
} (raidctl -a C) and initiate a reconstruction (raidctl -F B).
} Now I'm redundant again with A and C. Since I didn't re-boot, RAIDframe 
} knows that B has failed and C is a used spare.
} I now actually un-plug B, plug it into another machine, do some testing 
} (verifying that it may reset on writes), install new firmware, do further 
} testing (verifying it now doesn't reset on writes) and am about to 
} re-plug it into the original server (which won't notice it ever disappeared 
} or that B has turned into B'---as far as this question is concerned, 
} I could have done all this in the original server).
} What I'm now intending to do is to raidctl -B (with A, B' and C installed, 
} of course). After that, I intend to raidctl -r C, then 
} scsictl scsibus0 detach C and finally un-plug C and put it back into the 
} cupboard again.
} 
} My question was about 1. B', 2. C or 3. A failing during the copyback.
} 
} > there was a crop of bad Seagate 500GB disks for a while and they had 
} > a tendency to fail en masse at the same time.
} My working hypothesis for some five years now is that all Seagate discs 
} are bad and bound to fail. We had a series of SATA 250G (the example above 
} is about SAS 146K) drives that failed the same way (dozens of them), 
} got most of them replaced on warranty and had the replacements failing 
} the same way again.
>-- End of excerpt from Edgar Fuß




Re: RAIDframe: what if a disc fails during copyback

2020-10-29 Thread Brian Buhrow
Hello.  Note that RAIDframe's notion of a hot spare is somewhat
different than other software RAID systems in that once you reboot after
copying to a hot spare, that hot spare becomes just another component in
the RAID set.  In other words, it loses its hot spare designation and you
should treat it as you would any other component.  That means that raidctl
-R, which reconstructs in place onto an existing component, can be used to
replace the spare with the original disk now that you have it repaired.

Assuming the original component, 'A' in Mouse's example, is still good,
if 'B' fails during the reconstruction, you're left with a single-component
RAID 1 system again.  If 'A' fails during the copy, you're left with some
corrupt data, though the system will not panic and you'll be able to
salvage what you can from the RAID.  Unfortunately, I've been caught in
this situation more times than I'd like to say -- there was a crop of bad
Seagate 500GB disks for a while and they had a tendency to fail en masse at
the same time.

-thanks
-Brian

On Oct 29,  1:53pm, Mouse wrote:
} Subject: Re: RAIDframe: what if a disc fails during copyback
} > In a RAIDframe RAID-1, a disc failed and I reconstructed on a spare.
} > Now I want to replace the failed component (actually by the same
} > disc, which needed a firmware update) and want to copyback to it.
} 
} So, let me make sure I understand you correctly.
} 
} So you have drives A, B, and C.  A and B were live.  Let's say B is the
} one that failed.  You reconstructed onto C and have been running with A
} and C.
} 
} Now you have a new B (which in this case is the same hardware with new
} firmware) and want to put it back into service.  I'm not sure whether
} you want to put it into service in place of A or in place of C.  I'm
} going to assume C.
} 
} So, you'd pull C, replace it with B, and initiate a reconstruct, which
} for RAID 1 means copying from A to B.  Right?
} 
} > How will RAIDframe behave if, during the copyback:
} > 1. The replaced component fails
} 
} Is this B?  Or C?  Because it sounds to me as though C would be out of
} service at this point.
} 
} > 2. The spare fails
} 
} Which is "the spare"?  Are you running with a hot spare?  I think a hot
} spare failing means nothing until/unless RAIDframe tries to fall back
} on it.
} 
} > 3. The other, non-replaced component fails?
} 
} That would be A?
} 
} > Specifically: Is there any scenario (other than more than one disc
} > failing) that will put the RAID into a non-redundant state?  I guess
} > 3. may?
} 
} For RAID 1 in general, as soon as you have only one non-failed drive,
} you have no redundancy.  Based on the assumption that RAIDframe RAID 1
} cannot handle more than two drives (always true as far as I know, and
} the 9.0 raidctl(8) manpage says it's still true as of 9.0), this means
} that
} 
} - If B fails while copying back to it, you are back to non-redundant
}   operation on A.
} 
} - If A fails while copying back, you have no operational set.  Your
}   only real option is to pull A and B, connect C alone, and fall back
}   to the state of things as of when you pulled it; then re-add A or B
}   and copyback from C.
} 
} - If C fails while copying from A to B, nothing in particular happens
}   except that you don't have the hot spare you thought you did.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: wait(2) and SIGCHLD

2020-08-14 Thread Brian Buhrow
hello.  I think Mouse said it best.  There is a difference between
SIG_DFL and SIG_IGN, which is how you can avoid being signaled when a child
exits, but wait(2) will still wait for a child if you call it.

Hope that helps.
-Brian

On Aug 14, 10:10am, Brian Buhrow wrote:
} Subject: Re: wait(2) and SIGCHLD
}   Hello.  I'm not sure I've completely understood your question, but I
} think you're confusing the issue of whether a child posts a SIGCHLD signal
} when it exits versus whether the current process that's calling wait(2)
} receives a SIGCHLD when a child exits.  The default behavior, as I
} understand it, is that if a process has children, by default, it will not
} get signaled if those children terminate.  However, if that process then
} calls wait(2), it will hang until a child terminates, regardless of whether
} it's configured to receive the SIGCHLD or not.  In that instance, I think
} the man page is wrong, at least if code I have running is to be believed.  So,
} I think there's no difference between the default ignoring of the SIGCHLD
} signal and explicitly ignoring it.
} -Brian
} 
} On Aug 14,  1:51pm, Edgar Fuß wrote:
} } Subject: wait(2) and SIGCHLD
} } I'm confused regarding the behaviour of wait(2) wrt. SIGCHLD handling.
} } 
} } The wait(2) manpage says:
} } 
} }   wait() will fail and return immediately if:
} }   [ECHILD]   The calling process has no existing unwaited-for child
} }              processes; or no status from the terminated child
} }              process is available because the calling process has
} }              asked the system to discard such status by ignoring
} }              the signal SIGCHLD or setting the flag SA_NOCLDWAIT
} }              for that signal.
} } 
} } However, ignore is the default handler for SIGCHLD.
} } 
} } So does the
} } because the calling process has asked the system
} } to discard such status by ignoring the signal SIGCHLD
} } mean that explicitly ignoring SIGCHLD is different from ignoring it per default?
} >-- End of excerpt from Edgar Fuß
} 
} 
>-- End of excerpt from Brian Buhrow




Re: wait(2) and SIGCHLD

2020-08-14 Thread Brian Buhrow
Hello.  I'm not sure I've completely understood your question, but I
think you're confusing the issue of whether a child posts a SIGCHLD signal
when it exits versus whether the current process that's calling wait(2)
receives a SIGCHLD when a child exits.  The default behavior, as I
understand it, is that if a process has children, by default, it will not
get signaled if those children terminate.  However, if that process then
calls wait(2), it will hang until a child terminates, regardless of whether
it's configured to receive the SIGCHLD or not.  In that instance, I think
the man page is wrong, at least if code I have running is to be believed.  So,
I think there's no difference between the default ignoring of the SIGCHLD
signal and explicitly ignoring it.
-Brian

On Aug 14,  1:51pm, Edgar Fuß wrote:
} Subject: wait(2) and SIGCHLD
} I'm confused regarding the behaviour of wait(2) wrt. SIGCHLD handling.
} 
} The wait(2) manpage says:
} 
}   wait() will fail and return immediately if:
}   [ECHILD]   The calling process has no existing unwaited-for child
}              processes; or no status from the terminated child
}              process is available because the calling process has
}              asked the system to discard such status by ignoring
}              the signal SIGCHLD or setting the flag SA_NOCLDWAIT
}              for that signal.
} 
} However, ignore is the default handler for SIGCHLD.
} 
} So does the
}   because the calling process has asked the system
}   to discard such status by ignoring the signal SIGCHLD
} mean that explicitly ignoring SIGCHLD is different from ignoring it per default?
>-- End of excerpt from Edgar Fuß
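For anyone who wants to check this on their own system rather than rely on the man page: the quoted wait(2) text follows POSIX (XSI), which says that explicitly setting SIGCHLD to SIG_IGN makes terminated children disappear without becoming zombies, so wait() blocks until no children remain and then fails with ECHILD, while under the default disposition wait() returns the child's pid. The sketch below assumes a POSIX system with sigaction(2); wait_with_disposition() is a hypothetical helper that forks a child which exits at once and reports what wait(2) did.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: install the given SIGCHLD disposition (SIG_DFL or
 * SIG_IGN), fork a child that exits immediately, and return what wait(2)
 * returned.  If wait() fails, its errno is stored in *err (else *err = 0).
 * Returns -2 if the test itself could not be set up.
 */
static pid_t
wait_with_disposition(void (*disp)(int), int *err)
{
	struct sigaction sa, old;
	pid_t child, r;

	sa.sa_handler = disp;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGCHLD, &sa, &old) == -1)
		return -2;

	child = fork();
	if (child == -1) {
		(void)sigaction(SIGCHLD, &old, NULL);
		return -2;
	}
	if (child == 0)
		_exit(0);		/* child: terminate right away */

	*err = 0;
	r = wait(NULL);		/* parent: collect (or fail to collect) status */
	if (r == -1)
		*err = errno;

	(void)sigaction(SIGCHLD, &old, NULL);	/* restore previous disposition */
	return r;
}
```

Calling it once with SIG_DFL and once with SIG_IGN and comparing the results shows whether a given system treats the default and the explicit "ignore" differently.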




Re: AES leaks, cgd ciphers, and vector units in the kernel

2020-06-18 Thread Brian Buhrow
hello.  Another question.
Does Xen advertise and allow the use of these instructions on PV and PVH
domUs?

-thanks
-Brian



Re: AES leaks, cgd ciphers, and vector units in the kernel

2020-06-18 Thread Brian Buhrow
hello.  I have what may be a silly question.  Does this change mean
that i386 users won't have AES capabilities in the kernel at all going
forward?  (I gather that's true for architectures like SPARC, but I'm
assuming the AES code we did have didn't run very well on SPARC anyway.)
However, it seems we still have a number of i386 users, and I'm wondering
where this leaves them?
-thanks
-Brian



Re: 8.0 USB "device problem, disabling port"

2020-03-11 Thread Brian Buhrow
hello.  I've looked at this error message in the 5.2 stack and, if you
look in there, there's a lot of hand waving when the code encounters
conditions it doesn't understand.  That is to say, the error path doesn't
do a lot to diagnose the trouble, i.e. figure out if it's a power problem,
a speed matching issue or something else.  That said, I seem to encounter
it most when the port in question is behind a cascade of hub devices,
suggesting that stacks of nested hub devices are problematic for our stack.
I have a Dell workstation here, for example, where USB flash disks plugged
into the front panel ports of the machine generate these errors, but if the
flash sticks are plugged into ports on the back, all is well.
My suggestion is to look in the -current source to see if the uhub
code has gotten smarter about handling error conditions.  The USB stack
has received a lot of attention since 5.2 and 8.x were released.  Perhaps
you can take some ideas from there and back-port them to your tree, and thus
get better diagnostics and, possibly, working fixes for the problem itself.
If that's not fruitful, I suggest looking at the FreeBSD USB stack source
code for similar inspiration.


I'm sure I've said nothing you don't already know, but hopefully it
helps spark an idea anyway.

-Brian

On Mar 11, 10:41am, Mouse wrote:
} Subject: 8.0 USB "device problem, disabling port"
} In 8.0, and at least one earlier version, when the kernel doesn't like
} a USB device for some reason the message generated just says "device
} problem, disabling port" (with the hub name and port number).
} 
} Would I be out of place to suggest that this is a little...light on
} useful information?  It's admittedly not on a par with the infamous
} Nifty Doorways Eleventy upgrade message, but it's definitely further in
} that direction than I'd like.  (As I used to tell the first-level
} people back when I was doing second-level support, "it doesn't work" is
} not a usable problem report.)
} 
} In the tree I'm using at work, I've just thrown printfs into
} usbd_new_device.  But that's a bit ugly, probably not the best approach
} (though I can certainly send a diff for what I have if anyone wants).
} Thoughts?
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse
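As an illustration of the kind of extra detail the error path could report, here is a userland sketch that decodes a hub port status word (wPortStatus, bit layout from the USB 2.0 hub class definition) into a hint about whether a failure looks like a power problem or a speed/signalling problem. The helper names and the wording of the hints are hypothetical; this is not the NetBSD uhub code, and a real diagnosis would also want the USB error code of the request that failed.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* wPortStatus bits from the USB 2.0 hub class definition. */
#define PS_CONNECTION   0x0001
#define PS_ENABLE       0x0002
#define PS_OVER_CURRENT 0x0008
#define PS_RESET        0x0010
#define PS_POWER        0x0100
#define PS_LOW_SPEED    0x0200
#define PS_HIGH_SPEED   0x0400

/* Hypothetical: map a port status word to a short diagnostic hint. */
static const char *
port_problem_hint(uint16_t status)
{
	if (status & PS_OVER_CURRENT)
		return "over-current (power problem?)";
	if (!(status & PS_POWER))
		return "port not powered";
	if (!(status & PS_CONNECTION))
		return "device disconnected during setup";
	if (status & PS_RESET)
		return "port still in reset";
	if (!(status & PS_ENABLE))
		return "port not enabled after reset (speed/signalling problem?)";
	return "port looks healthy; failure was later in enumeration";
}

/* Hypothetical: build a more informative version of the console message. */
static void
format_port_problem(char *buf, size_t len, const char *hub, int port,
    uint16_t status)
{
	const char *speed = (status & PS_LOW_SPEED) ? "low" :
	    (status & PS_HIGH_SPEED) ? "high" : "full";

	snprintf(buf, len, "%s port %d: device problem, disabling port "
	    "(status 0x%04x, %s speed: %s)", hub, port, (unsigned)status,
	    speed, port_problem_hint(status));
}
```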




Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Brian Buhrow
hello.  I make heavy use of the COMPAT_XYZ functions and have done so
for many years.  As Mouse says, it's what makes NetBSD very usable and
easy to maintain.  If that functionality left NetBSD, it would reduce its
value significantly.
I understand it's a lot of work to maintain this functionality and
there are a lot of subtle interactions between the modules as they relate
to security, but it is a real time saver in terms of being able to maintain
OS levels while continuing to be able to use working applications and
knowing that the next upgrade of the OS isn't going to break some critical
service in my shop.
One implication of your proposal is that you'll disable the autoload
functionality, users will turn it back on, use it, and be more vulnerable
than they are now because the primary developers aren't concerned with making
things work or secure anymore.  If I remember the discussion from a couple
of years ago, there was some distinction about the invasiveness of each
compat option and its relative security threat.  I think a blanket
disabling of the compat options is too big of a hammer and a more nuanced
approach should be taken.
-thanks

On Sep 26, 10:22am, Mouse wrote:
} Subject: Re: Proposal, again: Disable autoload of compat_xyz modules
} >>> Keeping them enabled for the <1% users interested means keeping
} >>> vulnerabilities for the >99% who don't use these features.
} >> Are the usage numbers really that extreme?  Where'd you get them?  I
} >> didn't think there were any mechanisms in place that would allow
} >> tracking compat usage.
} > No, there is no strict procedure to monitor compat usage, and there
} > never will be.  Maybe it's not <1%, but rather 1.5%; or maybe it's
} > 5%, 10%, 15%.
} 
} > Who cares, exactly?
} 
} The short answer is "anyone who wants NetBSD to be useful".
} 
} If it really is only a tiny fraction - under ten people, say - then,
} sure, yank it out.  If it's 90%, removing it would lose most of the
} userbase, possibly provoke a fork.  15%, 40%, I don't think there is a
} hard line between "pull it" and "keep it", and even if there were I'm
} not sure it would matter because it appears nobody knows what the
} actual use rate is anyway.
} 
} > This compat topic has been discussed over and over, and the
} > conclusion is systematically that these compat options cause immense
} > trouble for little actual use.
} 
} Except the "little actual use" is, apparently, nothing but various wild
} guesses at the actual proportion.  Based on what I've seen in this
} thread, it looks as though the use rate is around 1/2 (two users, two
} non-users) - but, of course, that has no statistical validity; the
} sample is ludicrously small and entirely self-selected.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: Removing PF

2019-04-01 Thread Brian Buhrow
hello.  I can explain this feature.  I use it in two ways.

I use a number of public IP addresses from one ISP, whose service is
delivered through a VPN via another ISP.  Except for the traffic destined
for the far end of the VPN itself, I want all traffic to get routed through
the VPN and for any state information to be tracked through the VPN.  Below
is an example configuration.
The second way I use this is to do essentially the same thing, but to also
explicitly set the next-hop address (the far end of the VPN) for the
packets routed through the tunnel.


Let me know if I can explain this more clearly.

-thanks
-Brian


First example:

#   $NetBSD: pf.conf,v 1.3 2005/03/15 16:05:03 peter Exp $
#   $OpenBSD: pf.conf,v 1.28 2004/04/29 21:03:09 frantzen Exp $
#
# See pf.conf(5) and /usr/share/examples/pf for syntax and examples.
# Remember to set net.inet.ip.forwarding=1 and/or net.inet6.ip6.forwarding=1
# in /etc/sysctl.conf if packets are to be forwarded between interfaces.

ext_if="vlan20"
int_if="wm0"
vpn_if="tun0"

#set options here
set block-policy drop
set skip on $vpn_if
set skip on lo0

scrub in all 

#internal filtering rules
pass in quick on $int_if from $int_if:network to $int_if:network 
pass in quick on $int_if from $int_if:network to 10.157.0.0/16 
pass in quick on $int_if from $int_if:network to 172.209.0.0/18 
pass in quick on $int_if from $int_if:network to 10.0.0.0/8 
#Pass internal network traffic through the VPN to expose it to the Internet
pass in quick on $int_if route-to $vpn_if from $int_if:network to any keep state
pass out quick on $int_if from $int_if:network to $int_if:network 
pass out quick on $int_if from 10.157.0.0/16 to $int_if:network 
pass out quick on $int_if from 172.209.0.0/18 to $int_if:network 
pass out quick on $int_if from 10.0.0.0/8 to $int_if:network 
# The next two lines are the traffic we let into the network unsolicited.
pass out quick on $int_if reply-to $vpn_if inet proto udp to 10.157.230.0/24 port 1723 keep state
pass out quick on $int_if reply-to $vpn_if inet proto tcp to 10.157.230.11 port { 22, 25, 53, 80, 110, 143, 443, 587, 993, 995 } keep state
block out quick on $int_if reply-to $vpn_if inet proto udp to $int_if:network port { 19, 111, 137, 161, 5060 }
block out quick on $int_if from any to any 

#Let all pass through the VPN tunnel interface
pass in quick on $vpn_if from any to any 
pass out quick on $vpn_if from any to any 

#external filtering rules
pass out quick on $ext_if from ($ext_if) to any keep state
pass in quick on $ext_if inet proto icmp from any to ($ext_if) icmp-type echoreq keep state

block all

Second example:

# Allow the back office to keep using old addresses (06/26/2017)
pass in quick on $dmz_if from $dmz_if:network to $dmz_if:network no state
pass in quick on $dmz_if from $dmz_if:network to $private_if:network no state
#Pass internal network traffic through the VPN to expose it to the Internet
pass in quick on $dmz_if route-to { ($vpn_if 10.157.9.105) } from $dmz_if:network to any keep state
pass out quick on $dmz_if from $dmz_if:network to $dmz_if:network no state
pass out quick on $dmz_if from $private_if:network to $dmz_if:network no state
# The next line lets unsolicited inbound traffic come into the network.
pass out quick on $dmz_if reply-to { ($vpn_if 10.157.9.105) } from any to $dmz_if:network keep state
block out quick on $dmz_if from any to any no state
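The behavior these configurations rely on can be sketched as follows: when the first packet of a connection matches a reply-to rule with keep state, the state entry records the interface replies must use, and packets travelling in the reverse direction are sent out that interface instead of following the routing table. This is a hypothetical, heavily simplified C model (fixed-size table, IPv4 only, names invented here), not pf's actual state code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified IPv4 connection key (addresses in host order). */
struct conn_key {
	uint32_t src, dst;
	uint16_t sport, dport;
	uint8_t  proto;
};

/* A state entry remembers which interface replies must leave through. */
struct conn_state {
	struct conn_key key;
	char reply_if[16];
	int in_use;
};

#define NSTATES 64
static struct conn_state states[NSTATES];

static int
key_eq(const struct conn_key *a, const struct conn_key *b)
{
	return a->src == b->src && a->dst == b->dst &&
	    a->sport == b->sport && a->dport == b->dport &&
	    a->proto == b->proto;
}

/* Called when a connection's first packet matches "reply-to ... keep state". */
static int
state_insert(const struct conn_key *k, const char *reply_if)
{
	for (int i = 0; i < NSTATES; i++) {
		if (!states[i].in_use) {
			states[i].key = *k;
			snprintf(states[i].reply_if, sizeof(states[i].reply_if),
			    "%s", reply_if);
			states[i].in_use = 1;
			return 0;
		}
	}
	return -1;	/* table full */
}

/*
 * A reply travels in the opposite direction, so swap the endpoints before
 * the lookup.  Without matching state, the normal route (default_if) wins.
 */
static const char *
route_for_reply(const struct conn_key *reply, const char *default_if)
{
	struct conn_key fwd = {
		.src = reply->dst, .dst = reply->src,
		.sport = reply->dport, .dport = reply->sport,
		.proto = reply->proto,
	};

	for (int i = 0; i < NSTATES; i++)
		if (states[i].in_use && key_eq(&states[i].key, &fwd))
			return states[i].reply_if;
	return default_if;
}
```

In terms of the first example, this is the job of the reply-to rules: replies to connections that arrived through tun0 go back out tun0 rather than following the normal route.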




Re: Removing PF

2019-04-01 Thread Brian Buhrow
Hello.  Yes, I listed the features below upthread which are important
to me.  They exist in pf, and, as far as I can tell, not in npf.
-thanks
-Brian


--- Forwarded mail from "Brian Buhrow" 

Hello.  I will examine the documentation for npf again, but here are
the issues I think we should resolve with npf before we rip out pf or ipf.


1.  The documentation for npf is still pretty incomplete.  The example
files are there, but the comments, the last time I looked, were not as
complete as I think they should be.  The man pages should also explain
the syntax of the configuration files in more detail, again with examples.

2.  For me to use npf at all, I absolutely need to have the
route-through/reply-to feature.  Pf has that feature.  


-Brian




Re: Removing PF

2019-03-30 Thread Brian Buhrow
Hello.  What folks are saying is that they can't do things with NPF
that they're currently doing with PF or IPF.  It isn't a matter of having a
strong preference, it's that, in my case for example, I cannot make the
setups I currently use in production work with NPF.  And, worse for NPF,
when I point this out or ask for suggestions to implement the features I
need in NPF to make the switch, I'm met with silence, or, yes, NPF doesn't
currently do that.  Would I like a secure, mp-safe firewall?  Yes, but I
need one that has the functionality I currently rely on and, if it's not a
newer version of PF, then I need enough documentation, and documented
examples, to map a migration from the configurations I'm using and which
work, to ones which work in the new environment.  What's reiterated in this
conversation by, not just me, but at least two other folks who have joined
this conversation, is that we're not there yet.  For example, someone asked
about doing ftp-proxy with NPF.  Max claims it's been in NPF since 2011.
Ok, ironic that he himself didn't know about that until he looked into the
matter, but did he post a link on how to use this 8 year old feature in
NPF?  No, he did not.  If it's there, how do we use it?  Where can we read
about how to configure it?
That's not the only example, but it's a concrete one we've seen in
this thread over the past 24 hours.  If you want to see NPF embraced, I
suggest the following:

1.  Make a list of the features missing from NPF that people want in PF and
IPF.

2.  Make a branch of the NetBSD tree and implement those features using the
NPF framework.

3.  When you have those features in and working to your satisfaction,
discussion can be taken up about merging them back into head.

This approach means you don't have to worry about PF or IPF in your
branch and you can concentrate on getting NPF up to speed.
When it comes time to merge the newly featureful NPF back into head, then
Core can decide if it's ready to drop support for PF and/or IPF.
Those of us who are using PF or IPF know we're dealing with older
technology that has issues, but, I'll say it again, those technologies
currently do things that NPF apparently doesn't do.  You're just not going
to get many takers with a promise of less functionality, even if that less
functional alternative is more secure.  You will see this play out over and
over again in a myriad of environments.

For the sake of discussion, if I were to take on the effort of
importing the current version of PF and if I went so far as to do the work
to make it MP-safe, would you still object to having PF in the tree?  If
the answer is yes, then we're probably at an impasse in this discussion.

My point is that you're not giving us a real choice here.  You're
asking us to sign on to a proposal that doesn't deliver functionality we
currently have, however flawed it may be.  I think we really do want to
achieve a common goal; the question is how to get there.  Make NPF dance
and sing with feature parity with PF and IPF, and then we can discuss.  But,
I repeat myself.  And, as a final note, I'll add that it may be that
FreeBSD is discussing dropping PF as we are, but they haven't done it yet
and my guess is that they won't until they have a fully functional
replacement.  I think that needs to be true for us as well.

-thanks
-Brian




Re: Removing PF

2019-03-29 Thread Brian Buhrow
Hello.  I will examine the documentation for npf again, but here are
the issues I think we should resolve with npf before we rip out pf or ipf.


1.  The documentation for npf is still pretty incomplete.  The example
files are there, but the comments, the last time I looked, were not as
complete as I think they should be.  The man pages should also explain
the syntax of the configuration files in more detail, again with examples.
The features available in npf should be more fully enumerated.  It does me
no good to have a feature in npf if I don't know whether it exists or how to use
it.  MSS clamping, recently discussed on the mailing list, is one example:
Maxime says that feature exists, but Patrick says he doesn't know how to
use it and he didn't even know it was there.

2.  For me to use npf at all, I absolutely need to have the
route-through/reply-to feature.  Pf has that feature.  I have 2
choices at this point if I want to continue using NetBSD as a routing
system: Keep pf working and performant in modern versions of NetBSD on my
own or teach npf how to do route-through/reply-to. Or, I could switch OS's.
How many other users/developers are in this position?  I don't think we
know.

3.  The announcement should be included in the NetBSD-8 documentation to
say that pf and ipf are deprecated in NetBSD-8 and will be removed in
NetBSD-9.  In my view, it is insufficient to only include an announcement
like this in the -current documentation.  It should be added to the
last_minute and major changes file for NetBSD-8.


-Brian


On Mar 29,  8:19pm, Matt Sporleder wrote:
} Subject: Re: Removing PF
} 
} 
} What features, exactly, are missing?
>-- End of excerpt from Matt Sporleder




Re: Removing PF

2019-03-29 Thread Brian Buhrow
hello.  My suggestion is, rather than strongly advocating for the
removal of pf, which your message readily concedes has more functionality
than npf does at this time, why not marshal an effort to get the missing
features from pf into npf?  I totally understand that the work to improve
NPF's scalability and robustness is vitally important, but if you want to
get consensus on removing pf, let's first get the additional functionality
into NPF.  Once done, we can discuss removing pf again.  At the risk of
being extremely rude, although that is not my intention, if as much effort
had been put into improving the functionality of NPF as has been put into
the effort to get pf removed, I dare say NPF would now be up to PF in
functionality and the conversation would be easier to have.

Right now, NPF doesn't do what I need and from what I can tell in the
documentation, I'm not sure how to write the functionality into NPF.  That
leaves me with the options of porting PF to a newer version of NetBSD on my
own, with no support from fellow NetBSD users, running an older version of
NetBSD as long as is feasible, or switching to another OS.  You say the
effort to port a modern version of PF to NetBSD is greater than writing the
additional functionality into NPF.  Yet, somehow, after years, that
functionality is not in NPF.  If it were that easy, I think it would have
been done.  And, pray tell, why is it harder to maintain a modern version
of PF than NPF?  If, for the sake of discussion, a modern version of PF were
imported and it were modified to scale with the new features of the network
stack in NetBSD, what would make it harder to maintain than NPF, which has
absolutely no market share outside of NetBSD?  If it is simply that you
don't want to do it, that's fine, but fundamentally, I don't really see
what's harder about maintaining one piece of code versus another.  The
documentation for PF is much more complete than it is for NPF, and when I
looked into the NPF source code, it looked like I would have to grow a lot
of features in NPF to gain parity with PF.  And, once done, all those
features would have to be maintained going forward, not to mention that
there would be absolutely no testing outside of the NetBSD environment with
NPF.  And, even if the PF code in NetBSD differs from PF under OpenBSD,
there would be enough similarity that some cross testing and patching would
be helpful.
I know you fundamentally disagree with this analysis and I understand
you are more familiar with the specific code in question than I am, but I
think the fundamental problem here is that the NetBSD network stack is in
flux as folks work to MP-ify it and, the truth is, any firewall
implementation in the stack is going to require a lot of effort to keep up
with those changes.  As you've noted, firewalls are hard to write and
maintain.  NPF might be a better solution than PF or IPF, but it certainly
doesn't have as many eyes on it and there are a lot of folks who know how
to use one or both of the other solutions who don't have a clue about how
NPF works or should be configured.  I think dismissing that knowledge from
the equation is a mistake and should factor heavily in the decision about
which way we should go.
I also think Core should consider funding a project to either add the
missing functionality to NPF, along with improving the documentation, or
funding a project to bring PF up to modern standards of functionality and
scalability.

-thanks
-Brian


Re: kernel frameworks for jumbo frame

2019-03-10 Thread Brian Buhrow
hello.  I'm not saying anything that anyone here doesn't already know,
but I'll add that Linux seems to have taken the position that all ethernet
interfaces should be called eth0, eth1, etc. This is fine as far as it
goes, but when you have to start figuring which physical hardware goes with
each interface, things get interesting.  Also, Jason's point about
interface numbering stability is well taken.  Fairly late versions of Linux
still don't have this right, which could be disastrous if, for example,
you're running a firewall with this level of instability and, after a
reboot, the inside and outside interfaces are reversed.  
I wonder if, rather than hiding the driver names, like wm0, bge0,
nfe1, etc. the busy work could be avoided by allowing the user to configure
aliases for said drivers, similar to what was done for disk names.

-thanks
-Brian
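A userland sketch of the alias idea, assuming a small hypothetical table that maps site-chosen names such as "inside" and "outside" onto driver names; anything that isn't an alias is passed through unchanged and then checked with if_nametoindex(3), which returns 0 when no such interface exists. The table contents and function names are made up for illustration.

```c
#include <assert.h>
#include <net/if.h>
#include <stddef.h>
#include <string.h>

struct if_alias {
	const char *alias;	/* stable, user-chosen name */
	const char *ifname;	/* driver name it currently refers to */
};

/* Hypothetical alias table; a real system would read this from config. */
static const struct if_alias aliases[] = {
	{ "inside",  "wm0" },
	{ "outside", "bge0" },
	{ NULL, NULL }
};

/* Return the driver name for an alias, or the name itself if not an alias. */
static const char *
alias_to_ifname(const struct if_alias *tab, const char *name)
{
	for (; tab->alias != NULL; tab++)
		if (strcmp(tab->alias, name) == 0)
			return tab->ifname;
	return name;
}

/* Resolve alias or driver name to an interface index; 0 means not present. */
static unsigned int
alias_to_index(const struct if_alias *tab, const char *name)
{
	return if_nametoindex(alias_to_ifname(tab, name));
}
```

With this split, a firewall configuration could refer to "inside" and "outside" while the mapping to driver names lives in one place.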



Re: RFC: New userspace fetch/store API

2019-02-24 Thread Brian Buhrow
hello.  It strikes me that if you've already written the code to fetch
and store 8, 16, 32 and sometimes 64 bit values, it should be committed.
If other OSes have this functionality, then applications will use
it, and subsequent porting of such applications to NetBSD will be easier if
the functionality is there.  It may seem like I'm arguing we should do it
because others did, and perhaps I am to some extent, but having an easy
platform to port applications to makes NetBSD much more attractive than it
would be if a bunch of functionality that's expected by application authors
is missing.

Just my 2 cents.
-thanks
-Brian
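To make the idea concrete, here is a userland mock of a fixed-size fetch/store pair in the style the RFC describes: each call either succeeds and returns 0 or returns an error code such as EFAULT. The names mock_ufetch_32() and mock_ustore_32() are hypothetical, and the requirement that addresses be naturally aligned is an assumption of this mock. It only rejects NULL and misaligned addresses; a real kernel implementation would also have to recover from a fault on an unmapped user address, which plain userland code cannot do.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>

/* Assumed rule: the address must be non-NULL and naturally aligned. */
static int
mock_addr_ok(const void *uaddr, size_t size)
{
	return uaddr != NULL && ((uintptr_t)uaddr % size) == 0;
}

/* Fetch a 32-bit value from "user" address uaddr into *valp. */
static int
mock_ufetch_32(const void *uaddr, uint32_t *valp)
{
	if (!mock_addr_ok(uaddr, sizeof(*valp)))
		return EFAULT;
	memcpy(valp, uaddr, sizeof(*valp));
	return 0;
}

/* Store a 32-bit value to "user" address uaddr. */
static int
mock_ustore_32(void *uaddr, uint32_t val)
{
	if (!mock_addr_ok(uaddr, sizeof(val)))
		return EFAULT;
	memcpy(uaddr, &val, sizeof(val));
	return 0;
}
```

The 8-, 16- and 64-bit variants would follow the same shape with the corresponding integer type.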



Re: scsipi: physio split the request

2018-12-27 Thread Brian Buhrow
hello.  Just out of curiosity, why did the tls-maxphys branch never
get merged with head once the work was done or mostly done?
-thanks
-Brian



Re: Missing compat_43 stuff for netbsd32?

2018-09-11 Thread Brian Buhrow
hello.  I think, but am not certain, that old NetBSD-0.8 binaries
might use some of those 4.3 syscalls.  I have a few binaries I'm still
using from NetBSD-0.9A days which may use those syscalls as well.  I know
that I use COMPAT_43 in all of my kernel configs, and I have this vague
recollection that when I forgot to include that in one build, those
binaries broke.
-thanks
-Brian

On Sep 11,  1:19pm, Mouse wrote:
} Subject: Re: Missing compat_43 stuff for netbsd32?
} > I believe COMPAT_43 is not NetBSD 4.3 it's BSD 4.3.
} 
} I think so too.  Did NetBSD 4.3 ever exist?  According to
} /ftp/pub/NetBSD-archive on ftp.n.o, 4.0 and 4.0.1 were the only 4.x
} versions.  (Anonymous FTPing to ftp.n.o, I see /NetBSD-archive, but
} it's empty - apparently the archive is in /pub/NetBSD-archive.  I have
} no idea why /NetBSD-archive is there.)
} 
} > Did the 80386 even exist when Berkeley published BSD 4.3?
} 
} According to Wikipedia, the 80386 was introduced in 1985 and 4.3
} was released in June 1986, so, yes, it did.  But between .5 and 1.5
} years is not really long enough for it to be plausible that 4.3 ran on
} the '386.  I don't _think_ BSD ran on the '386 until the Jolitzes, but
} I wasn't close to that effort, so I don't really know.
} 
} > It's probably only useful for running ancient SunOS 4.x binaries,
} > maybe Ultrix, Irix or OSF-1 depending on how closely they followed
} > BSD 4.3.
} 
} That's pretty much my own perspective on COMPAT_43.  Probably should
} have been called COMPAT_BSD43 or some such
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: ddb input via IPMI virtual console

2018-08-07 Thread Brian Buhrow
Hello.  Sorry, my description wasn't clear.
Since you have an IPMI-capable server, you should be able to turn serial
port redirection on in the BIOS such that com1 (from NetBSD's point of
view) becomes a virtual port which is accessible using the ipmitool
program.  You would do something like:

ipmitool -H 10.10.1.3 -U ADMIN -I lanplus sol activate
After you enter the password, you should be connected to the virtual
serial port where you can see output or type input.  Since this is a serial
port as far as NetBSD is concerned, DDB should work.
This is a separate session from your virtual console, so you can run
it in a separate window.

Change the username and IP address shown above to match
your setup.


To get NetBSD to use that serial port as a console, you'd do something
like:

cd /usr/mdec
installboot -v -o speed=115200 -o console=com1 /dev/boot bootxx_ffsv


-Brian

On Aug 7, 11:14am, Edgar Fuß wrote:
} Subject: Re: ddb input via IPMI virtual console
} > how about using a serial console in the kernel and then using ipmitool 
} > to talk to DDB when/if the machine goes down?
} I don't have a serial wire through the firewall.
} 
} > but kernel messages won't go there.
} It would be an awful drawback not to see the kernel messages on a physical 
} console.
>-- End of excerpt from Edgar Fuß




Re: repeated panics in mutex_vector_enter (from unp_thread)

2018-08-06 Thread Brian Buhrow
Hello.  If you've not made any changes to the hardware or software and
things were stable, but now they're not, have you tried looking at the
hardware logs kept by the IPMI BMC board?  It could be there is a hardware
problem that's triggering this problem, rather than a specific software
bug.  Bad memory?  Failing cache controller?

-Brian

On Aug 6,  2:16pm, Edgar Fuß wrote:
} Subject: repeated panics in mutex_vector_enter (from unp_thread)
} Since a few days, I'm experiencing repeated panics in mutex_vector_enter.
} Nothing was changed to the server in question, probably, it's experiencing more
} load/forks than before. The machine is still on 6.1, but I can't tell whether
} the problem is version specific.
} 
} The tracebacks look similar (the third and fourth columns in the "show reg" output are
} from subsequent panics; missing values mean the same as the first one):
} 
} uvm_fault(0x8076d460, 0x0, 1) -> e
} fatal page fault in supervisor mode
} trap type 6 code 0 rip 8027663b cs 8 rflags 10286 cr2  8 cpl 0 rsp fe811d83bbc0
} kernel: page fault trap, code=0
} Stopped in pid 0.67 (system) at netbsd:mutex_vector_enter+0x32c:  movq    18(%rdx),%rax
} db{5}> bt
} mutex_vector_enter() at netbsd:mutex_vector_enter+0x32c
} unp_thread() at netbsd:unp_thread+0x2eb
} db{5}> show reg
} ds64Y ?
} esd2a06405?
} fs269d563 563
} gs    0
} rdi   fe834689f040  fe83471f1700  fe83481f1700
} rsi   1000
} rbp   fe811d83bc20  fe811d811c20  fe811d811c20
} rbx   fe834689f040  fe83481f1700  fe83481f1700
} rdx   fff0
} rcx   fff0
} rax   fe811d7ed2a0
} r8    fe811d7ed2a0
} r9    0
} r10   0
} r11   2   1   1
} r12   0
} r13   0
} r14   fe811d7ed2a0
} r15   0
} rip   fff8027663b mutex_vector_enter+0x32c
} cs    8
} rflags 10286
} rsp   fe811d83bbc0  fe811d811bc0  fe811d811bc0
} ss    0
} netbsd:mutex_vector_enter+0x32c:  movq    18(%rdx),%rax
} db{5}> reboot 4
} 
} The machine than hangs hard, I need to press reset.
>-- End of excerpt from Edgar Fuß




Re: ddb input via IPMI virtual console

2018-08-06 Thread Brian Buhrow
hello.  Since you're using IPMI, how about using a serial console in
the kernel and then using ipmitool to talk to DDB when/if the machine goes
down?  You should be able to do this and still have a virtual VGA console,
but kernel messages won't go there.  This also has the advantage that you
can run script(1) and capture DDB output and/or panic messages for
posterity if you need them.

-Brian

On Aug 6,  2:27pm, Edgar Fuß wrote:
} Subject: ddb input via IPMI virtual console
} It looks like my IPMI implementation always emulates a USB keyboard on 
} the virtual console. The real keyboards are PS/2 and I can't change that 
} because it runs on a wire physically passing a /real/ firewall, e. g.
} a constructive element of the building designed to confine a possible fire 
} in the server room. It's close to prohibitively expensive to install another 
} (USB) cable through that and I didn't think about it when I ordered power,
} VGA and PS/2 cables to be routed through the firewall.
} 
} Can I have ddb input multiplexed from both PS/2 and USB?
>-- End of excerpt from Edgar Fuß




Re: compat code function pointers

2018-03-19 Thread Brian Buhrow
hello.  I like the compat.* filenames as I think they're more
descriptive than the junk names.  If the contents of these files are just
for device links, you might even use
kern_devcompat.*, which is even more descriptive.

Just my 2 cents.
-thanks
-Brian

On Mar 18,  9:00pm, Christos Zoulas wrote:
} Subject: compat code function pointers
} 
} Hi,
} 
} Paul and I have been working towards separating the compat code from the
} main kernel and eliminating all the COMPAT_XX ifdefs. This will allow
} the compat modules to work properly even if the kernel is not compiled
} with the compat options.
} 
} The general approach to this is to move all the compat code out of the
} main kernel leaving behind function pointers that typically point to
} noops (enosys) that the compat code can then repoint to the compat code
} functions when it loads (and re-point them to noops when it unloads).
} 
} The question came up where to store the function pointers for device
} drivers that are not always present in the kernel. We could use weak
} references in the compat code so that the code links, but the kernel
} linker does not currently support weak pointers. Another alternative
} is to create a file in the kernel that contains those function pointers.
} 
} Paul suggested:
} 
}   src/sys/kern/kern_junk.c
}   src/sys/sys/kern_junk.h
} 
} I suggested:
} 
}   src/sys/kern/kern_compat.c
}   src/sys/sys/compat.h
} 
} Opinions?
} 
} christos
>-- End of excerpt from Christos Zoulas
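The function-pointer approach Christos describes can be sketched in a few lines of C: the base code always calls through a pointer that defaults to an ENOSYS stub, and the compat module points it at its own implementation on load and back at the stub on unload. All names here are hypothetical, and a real kernel would also have to make sure no thread is still executing inside the module when the pointer is reset, which this single-threaded sketch ignores.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef int (*compat_hook_t)(unsigned long cmd, void *data);

/* Default target, in the spirit of enosys(): the feature isn't loaded. */
static int
compat_hook_stub(unsigned long cmd, void *data)
{
	(void)cmd;
	(void)data;
	return ENOSYS;
}

/* The pointer left behind in the main "kernel". */
static compat_hook_t compat_ioctl_hook = compat_hook_stub;

/* What the compat "module" would provide (hypothetical behavior). */
static int
compat_ioctl_impl(unsigned long cmd, void *data)
{
	(void)data;
	return cmd == 42 ? 0 : EINVAL;
}

static void compat_module_load(void)   { compat_ioctl_hook = compat_ioctl_impl; }
static void compat_module_unload(void) { compat_ioctl_hook = compat_hook_stub; }

/* Base code never references the compat code directly. */
static int
base_ioctl(unsigned long cmd, void *data)
{
	return (*compat_ioctl_hook)(cmd, data);
}
```

Because base_ioctl() only knows about the pointer, the base code links whether or not the compat code is present, which is the point of the design.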




Re: kernel condvars: how to use?

2017-12-07 Thread Brian Buhrow
hello.  I don't consider myself an expert with this stuff, but I've
spent quite a bit of time converting kernel code from using spl()/splx() in
5.2 to using mutexes and condvars with some success.  Here are some notes
that may be helpful in your work:

1.  You must initialize mutexes, as Taylor noted, and pick the kind of
mutex you want and the level at which you want it to run.  Adaptive
mutexes (the kind created with IPL_NONE) can't be acquired from a hardware
interrupt handler; a mutex shared with such a handler needs to be a spin
mutex created at that handler's IPL.
If you're sharing your IPL with other drivers that may want to hold mutexes
at the same level, particularly ones that might be doing some operation
like copying data into or out of user space, it's good to structure your
code in such a way that you won't try to run when they do.

2.  Initialize your condvar with cv_init().

3.  Many times, you can replace s = spl(ipl) with mutex_enter(my_mutex) and
splx(s) with mutex_exit(my_mutex).  It's generally considered to be a very
bad idea to sleep while holding a mutex, much as it is while executing code
inside an spl'd stanza.

4.  If you run into lock contention when debugging your code, pay careful
attention to who holds the lock at the time of the panic.  I found times
when I was locking against myself in non-obvious manners.  

5.  Read the manual pages for mutex(9), condvar(9) and rwlock(9), and, when
done, read them again.  Then, look at working examples in the code.
Eventually, it will click in your head and begin to make sense.

Hope these imprecise notes are somewhat helpful.


-Brian

On Dec 7,  6:24pm, Mouse wrote:
} Subject: kernel condvars: how to use?
} I'm trying to write some kernel code, interlocking between an interrupt
} (in my case, a callout()-called function) and a driver read() function.
} I'm using 5.2, so if this is because of bugs that have been fixed since
} then, that's useful information.  (And anyone who isn't interested
} because I'm on such an old version need read no further.)
} 
} I noted that the interfaces I have historically used for this - spl*(),
} sleep(), and wakeup() - are documented as deprecated in favour of
} condvar(9), mutex(9), and rwlock(9).  So I wrote some code using a
} condvar and a mutex, and the system promptly deadlocked.  I got into
} ddb, which told me it was inside intr_biglock_wrapper():
} 
} db{0}> tr
} breakpoint() at netbsd:breakpoint+0x5
} comintr() at netbsd:comintr+0x53a
} Xintr_ioapic_edge7() at netbsd:Xintr_ioapic_edge7+0xeb
} --- interrupt ---
} x86_pause() at netbsd:x86_pause
} intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x16
} Xintr_ioapic_level5() at netbsd:Xintr_ioapic_level5+0xf3
} --- interrupt ---
} x86_pause() at netbsd:x86_pause+0x2
} cdev_poll() at netbsd:cdev_poll+0x6d
} VOP_POLL() at netbsd:VOP_POLL+0x5e
} pollcommon() at netbsd:pollcommon+0x265
} sys_poll() at netbsd:sys_poll+0x5b
} syscall() at netbsd:syscall+0xb9
} db{0}> 
} 
} On reflection, I think I know why.  Userland's syscall handler took the
} mutex in preparation for cv_wait_sig(), the interrupt happens, my code
} is called (verified with a printf), and it tries to take the same mutex
} so it can cv_broadcast().  Of course, the mutex is held and, because
} it's held by code which can't run until the interrupt handler exits,
} will never be released.  Then, when a hardware interrupt hit, it found
} the biglock held.
} 
} Clearly, I'm doing something wrong.  But I can't see what.  I can't see
} how to use the condvar/mutex primitives without provoking the above
} failure mode.  And they appear to still be the current recommended way,
} based on what I could find, so I'm presumably just missing something.
} Any hints what?
} 
} I can of course provide more information if it would help, but I'm not
} sure what would be useful to mention here.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: Proposal: Disable autoload of compat_xyz modules

2017-08-02 Thread Brian Buhrow
Hello.  My feeling is that the cost of requiring a modload to use
compat_linux and compat_linux32 is fine.  My concern is that by taking it
out of the GENERIC kernel configuration, we lose the regular testing, such
as it is, with the daily builds.  Sure, the module gets built, but it could
be a while before it gets loaded and run by the test harness.  Today, with
these modules in GENERIC, the modules get loaded as a matter of course.
Is there a way to rig our test harness so that you can take the modules out
of the GENERIC kernel configuration and still do more than compile-time
test them?

-thanks
-Brian



Re: Making a BSD system more braille friendly

2017-01-02 Thread Brian Buhrow
Hello Enrico.  What I did was the following:

1.  Apply the patches in the brltty distribution to the screen package in
the pkgsrc distribution.

2.  Build and install the patched version of screen using the tools.

3.  Build brltty using the ugen(4) driver and the oss sound system.

4.  Install brltty, making it owned by root and setting the setuid bit.  (Make
sure you only do this on a machine to which only you have access, since
brltty is not designed to be setuid and is surely a security risk on a
multi-user system.)

5.  Now, when you log into the system, you can run screen, then brltty and
you'll have a fully accessible braille terminal.  In fact, you can create a
shell script in your home directory that you can execute out of your
.profile or .login file that will execute these things for you so that once
you type your password, your braille display will spring to life.

Brltty has all the drivers for the various braille displays it supports
built in, so you don't need to worry about building special support for
your braille display.

-Brian



Re: ugen vs interfaces

2016-11-15 Thread Brian Buhrow
Hello.  A couple of items that might help you with your issue.

1.  The way things are supposed to work, I think, is that you open the
control interface, interface 0, and query it for the interfaces the
particular device supports.  Once you get a list of supported interfaces,
you can then open the specific endpoint units for those devices using the
open(2) call.  If you read /usr/src/sys/dev/usb/usbdi.c and the include
files it references, you'll begin to get the idea.  If you want to see how
all this machinery works from the userland side of the house, I suggest
reading the NetBSD code from the libusb1-1 package.  There is some coverage
of this topic in the usb(4) man pages, and some web searches will turn up
additional notes as well.

2.  If you're interested, I have a heavily re-worked ugen(4) driver for
NetBSD-5 which I've not yet ported to NetBSD-current which fixes a number
of issues with the ugen(4) driver.  Specifically, the stock ugen(4) driver
coalesces reads and writes from userland into fewer reads and writes to and
from the devices themselves.  For libraries, such as the libimobiledevice
library for Apple products, this behavior is fatal, in that libusb wants to
be able to control, precisely, what's read from and written to devices and
when.  Also, my changes allow poll(2)/select(2) to actually work on ugen
devices.
If you're interested, I'm happy to make my patches available.  I've been
using them heavily for about 6 months, but more testing is always good.
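
The last step of item 1, opening an endpoint unit, comes down to building
the right device path.  ugen(4) names endpoint EE of ugen unit N
/dev/ugenN.EE, with /dev/ugenN.00 being the control endpoint on which the
descriptor-walking ioctls are issued.  The helper below is a hypothetical
sketch (ugen_endpoint_path() is not an existing NetBSD function); it
rejects endpoint numbers above 15 because, as Mouse notes below, the
endpoint occupies the low four bits of the minor number.  It does not
address which interface an endpoint belongs to, which is the open question
in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the /dev/ugenN.EE path for endpoint `ep' of ugen unit `unit'.
 * Endpoint 0 (/dev/ugenN.00) is the control endpoint.
 * Returns 0 on success, -1 if an argument is out of range or the
 * buffer is too small for the whole name.
 */
int
ugen_endpoint_path(char *buf, size_t len, int unit, int ep)
{
	int n;

	if (unit < 0 || ep < 0 || ep > 15)	/* endpoint fits in 4 bits */
		return -1;
	n = snprintf(buf, len, "/dev/ugen%d.%02d", unit, ep);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

For example, unit 1 endpoint 3 yields /dev/ugen1.03, which is then passed
to open(2).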

-thanks
-Brian

On Nov 15,  3:18pm, Mouse wrote:
} Subject: ugen vs interfaces
} I'm trying to use ugen - on 5.2, but a look at cvsweb (ugen.4 and
} ugen.c) makes me think current is identical in these regards.  I've
} forced the device of interest to configure as ugen ("flags 1") and my
} code can find it (bus 3 device 3, at the moment, for example).
} 
} The example code I'm working from was designed (of course :-þ) for
} Linux, and seems to use interface 2.  But I can't see how to tell
} NetBSD I want to use interface 2.  I can _find_ interface 2, by walking
} interfaces and endpoints on the control endpoint.  But I can't figure
} out how to tell the kernel I want to do bulk I/O on endpoint 3 of
} interface 2.  The ugen device name specifies the bus and device (via
} the ugen instance, in the high bits) and the endpoint number (in the
} low four bits).  But I can't see anywhere I can set the interface
} number.  I can't see any way to set anything on the control endpoint so
} that opening the /dev/ugen* devices will come up using the interface I
} want and I can't see anything I can do with the endpoint file
} descriptor once opened that will cause it to switch to using the
} interface I want.
} 
} Surely I'm just missing something...but what?
} 
} I can, of course, give more details if they'd help.  But I suspect
} there's just something simple I'm not getting.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: ugen vs interfaces

2016-11-15 Thread Brian Buhrow
Hello.   Setting aside your confusion about interfaces and how they
map to device nodes in the /dev tree for the moment, I'm curious what
you're trying to do with this thing that can't be done by using it when it
attaches as ucom(4) and umodem(4) devices?  If you get a ucom(4) device,
then talking to it is just like talking to a tty(4) port, using /dev/ttyU0
or /dev/ttyU1, as appropriate.  Are you just trying to understand the
interface thing or is there a specific task this thing does that the
ucom(4) driver doesn't let you get at?
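
Since a ucom(4) device behaves like any tty(4) port, the ordinary
termios(4) calls apply.  The sketch below opens a node such as /dev/ttyU0
and switches it to raw 8-bit mode.  The function names, the choice to set
CLOCAL (ignore modem status lines) and the speed argument are assumptions
for illustration, not something this thread specifies.  The
flag-twiddling is kept in its own function so it can be checked without
any hardware attached.

```c
#include <assert.h>
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/* Raw mode: no line editing, echo, signal chars or CR/NL mapping; 8 bits. */
int
ucom_make_raw(struct termios *t, speed_t speed)
{
	t->c_iflag &= ~(tcflag_t)(IGNBRK | BRKINT | PARMRK | ISTRIP |
	    INLCR | IGNCR | ICRNL | IXON);
	t->c_oflag &= ~(tcflag_t)OPOST;
	t->c_lflag &= ~(tcflag_t)(ECHO | ECHONL | ICANON | ISIG | IEXTEN);
	t->c_cflag &= ~(tcflag_t)(CSIZE | PARENB);
	t->c_cflag |= CS8 | CLOCAL | CREAD;
	t->c_cc[VMIN] = 1;	/* read() returns once at least 1 byte arrives */
	t->c_cc[VTIME] = 0;	/* no inter-byte timeout */
	if (cfsetispeed(t, speed) == -1 || cfsetospeed(t, speed) == -1)
		return -1;
	return 0;
}

/* Open a tty node such as /dev/ttyU0 in raw mode; returns fd or -1. */
int
ucom_open(const char *path, speed_t speed)
{
	struct termios t;
	int fd = open(path, O_RDWR | O_NOCTTY);

	if (fd == -1)
		return -1;
	if (tcgetattr(fd, &t) == -1 || ucom_make_raw(&t, speed) == -1 ||
	    tcsetattr(fd, TCSANOW, &t) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

After ucom_open("/dev/ttyU0", B115200) succeeds, plain read(2) and
write(2) on the descriptor talk to the device.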
-thanks
-Brian

On Nov 15,  4:03pm, Mouse wrote:
} Subject: Re: ugen vs interfaces
} > 1.  The way things are supposed to work, I think, is that you open
} > the control interface, interface 0, and query it for the interfaces
} > the particular device supports.  Once you get a list of supported
} > interfaces, you can then open the specific endpoint units for those
} > devices using the open(2) call.
} 
} That's more or less what I was expecting.  But where it goes wrong for
} me is that I don't know what I can open() to get, say, "interface 2
} endpoint 3".  I can get "endpoint 3" by opening /dev/ugen%d.03 for
} some suitable %d, but I don't know how/where to specify "interface 2".
} 
} ugen does have an "interface" locator, but I've been unable to make
} that work.  Without special kernel config, the device shows up as
} 
} umodem0 at uhub8 port 2 configuration 1 interface 0
} umodem0: Texas Instruments In-Circuit Debug Interface, rev 1.10/1.00, addr 3, iclass 2/2
} umodem0: data interface 1, has no CM over data, has break
} umodem0: status change notification available
} ucom0 at umodem0
} 
} and if I configure
} 
} ugen1 at uhub? port ? vendor 0x1cbe product 0x00fd flags 1
} 
} then it shows up as a ugen - that's the form in which I've had the most
} success so far, walking the interfaces and endpoints as the ugen(4)
} manpage sketches.  But I have been unable to get a ugen to attach at
} interface 2.  My latest attempt is
} 
} ugen1 at uhub? port ? vendor 0x1cbe product 0x00fd configuration 1 interface 0 flags 1
} ugen2 at uhub? port ? vendor 0x1cbe product 0x00fd configuration ? interface 2 flags 1
} 
} which attaches no ugens anywhere and still attaches umodem0 and ucom0
} as above.
} 
} > 2.  If you're interested, I have a heavily re-worked ugen(4) driver
} > for NetBSD-5 which I've not yet ported to NetBSD-current which fixes
} > a number of issues with the ugen(4) driver.
} 
} Well, unless it lets me get hold of interface 2 I doubt it will
} actually help much here, but it won't hurt to try.
} 
} I'll ping you offlist.
} 
} /~\ The ASCII   Mouse
} \ / Ribbon Campaign
}  X  Against HTML  mo...@rodents-montreal.org
} / \ Email! 7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B
>-- End of excerpt from Mouse




Re: Plan: journalling fixes for WAPBL

2016-09-21 Thread Brian Buhrow
hello.  Does this discussion imply that the WAPBL log/journaling
function is broken in NetBSD-current?  Are we back to straight FFS as it
was before the days of WAPBL or softdep?  Please tell me I'm mistaken about
this.  If I'm not, that's quite a regression, even from NetBSD-5 where both
WAPBL log and softdep work quite well.
-thanks
-Brian



Re: Spinning down sata disk before poweroff

2016-06-17 Thread Brian Buhrow
hello.  I think this is already handled in wd.c.  In wd_shutdown() we
see:

	wd_flushcache(wd, AT_POLL);
	if ((how & RB_POWERDOWN) == RB_POWERDOWN)
		wd_standby(wd, AT_POLL);
	return true;

So, wd_standby only gets called if RB_POWERDOWN is sent in the howto
argument.  It seems like the howto argument isn't getting set to
RB_POWERDOWN in this case and the question is why isn't it set?
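
The comparison against the full mask in that excerpt matters because, if I
read <sys/reboot.h> correctly, RB_POWERDOWN is defined as RB_HALT plus an
extra bit, so a plain non-zero test would also put the disk in standby on
an ordinary halt.  The sketch below uses local copies of the flag values
(prefixed EX_ to make clear they are not the system's definitions; verify
them against your tree) and hypothetical function names.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Local copies of reboot(2) flags as I recall them from NetBSD's
 * <sys/reboot.h>; check your own tree.  RB_POWERDOWN includes RB_HALT.
 */
#define EX_RB_AUTOBOOT	0x000
#define EX_RB_HALT	0x008
#define EX_RB_POWERDOWN	(EX_RB_HALT | 0x800)

/* The wd_shutdown() test: every bit of RB_POWERDOWN must be present. */
bool
want_standby(int how)
{
	return (how & EX_RB_POWERDOWN) == EX_RB_POWERDOWN;
}

/* A naive test for contrast: fires on a plain halt as well. */
bool
want_standby_naive(int how)
{
	return (how & EX_RB_POWERDOWN) != 0;
}
```

So a reboot (no flags) and a plain halt both skip wd_standby(), and only a
request that carries the full RB_POWERDOWN value reaches it.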

-thanks
-Brian

On Jun 17, 11:59am, David Young wrote:
} Subject: Re: Spinning down sata disk before poweroff
} On Fri, Jun 17, 2016 at 01:23:36PM +0200, Manuel Bouyer wrote:
} > On Fri, Jun 17, 2016 at 01:49:43AM +, Anindya Mukherjee wrote:
} > > Hi,
} > > 
} > > I'm running NetBSD 7.0.1_PATCH (GENERIC) amd64 on a Dell laptop. Almost everything is working perfectly, except the fact that every time I shutdown using the -p switch, the hard drive makes a loud click sound as the system powers off. I checked the SMART status (atactl and smartctl) and after every poweroff the Power_Off-Retract-Count parameter increases by 1.
} > > 
} > > I did some searching on the web and came across PR #21531 where this issue was discussed from 2003-2008 and finally a patch was committed which resolved the issue by sending the STANDBY_IMMEDIATE command to the disk before powering off. Since then the code has been refactored, but it is present in src/sys/dev/ata/wd.c line 1970 (wd_shutdown) which calls line 1848 (wd_standby). This seemed strange since the disk was definitely not being spun down.
} > > 
} > > I attached a remote gdb instance and stepped through the code during shutdown, breaking on wd_flushcache() which is always called. The code path taken is wdclose()->wdlastclose() (lines 1029, 1014). I can see that the cache is flushed but then the device is deleted in line 1023. Subsequently, power is cut off during shutdown, causing an emergency retract. So, it seems at least for newer sata disks the spindown code is not being called. I'm fairly new to NetBSD code so there is a chance I read this wrong, so feel free to correct me.
} > > 
} > > Ideally I'd like the disk to spin down during poweroff (-p) and halt (-h), perhaps settable using a sysctl, but not during a reboot (-r). I am planning to patch wdlastclose() as an experiment to run the spindown code to see if it stops the click. Is this a known issue, worthy of a PR? I can file one. I can also volunteer a patch once I have tested it on my laptop. Comments welcome!
} > 
} > 
} > So the disk is not powered off because it's detached before the pmf framework
} > has a chance to power it off (see amd64/amd64/machdep.c:cpu_reboot()).
} > That's bad.
} > Doing the poweroff in wdlastclose() is bad because then you'll have a
} > poweroff/powerup cycle for a reboot, or even on unmount/mount events if this
} > is not your root device. This can be harmful for some disks (this has already
} > been discussed).
} > 
} > The (untested) attached patch should fix this by calling pmf before detach;
} > can you give it a try ?
} 
} Careful!  The alternation of detaching devices and unmounting
} filesystems is purposeful.  You can have devices such as vnd(4) backed
} by filesystems backed by further devices.
} 
} It's possible that unmounting a filesystem will counteract the PMF
} shutdown.
} 
} A less intrusive change that's likely to work pretty well, I think, is
} to introduce a new flag, DETACH_REBOOT or DETACH_STAY_POWERED, that's
} passed to config_detach_all() by cpu_reboot() when the RB_* flags
} indicate a reboot is happening.  Then, in the wd(4) detach routine, put
} the device into standby mode if the flag is not set.
} 
} Dave
} 
} -- 
} David Young //\ Trestle Technology Consulting
} (217) 721-9981  Urbana, IL   http://trestle.tech/
>-- End of excerpt from David Young
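
David's proposal can be sketched as two small decisions: cpu_reboot() adds
the new flag to what it passes config_detach_all() when the RB_* flags
indicate a reboot (that is, when RB_HALT is not set), and the wd(4) detach
routine issues standby only when that flag is absent.  DETACH_STAY_POWERED
is one of the names suggested in the proposal, not an existing NetBSD
constant; its value, the local reboot flag copies and both function names
are assumptions, and any other detach flags cpu_reboot() passes are
omitted.

```c
#include <assert.h>
#include <stdbool.h>

/* Local copies of reboot(2) flags; verify against <sys/reboot.h>. */
#define EX_RB_HALT		0x008
#define EX_RB_POWERDOWN		(EX_RB_HALT | 0x800)

/* Hypothetical flag from the proposal; value chosen for illustration. */
#define DETACH_STAY_POWERED	0x100

/* What cpu_reboot() would add to its config_detach_all() flags. */
int
reboot_detach_flags(int howto)
{
	int flags = 0;

	if ((howto & EX_RB_HALT) == 0)	/* neither halt nor poweroff: reboot */
		flags |= DETACH_STAY_POWERED;
	return flags;
}

/* What the wd(4) detach routine would check before issuing standby. */
bool
wd_detach_wants_standby(int detach_flags)
{
	return (detach_flags & DETACH_STAY_POWERED) == 0;
}
```

This gives the behaviour Anindya asked for: standby on halt and poweroff,
and no power cycle on a reboot.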



