Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1

2020-07-05 Thread Brian Buhrow
Hello.  I agree with Mouse, except that I also think it would be very
helpful and useful to have a serial console on USB-only devices.  I wonder
if we could make the console a virtual device which is attached dynamically
to a USB serial port if and when one is available.  That would let the system
think it has a console, but one would only see it once the kernel and the
USB subsystem are up.  Yes, I get that this would make watching things boot
challenging, but by the time you get to single-user mode, the kernel is
fully up and running and USB is, or should be, available.

Thoughts?
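
As a way to picture the idea, here is a minimal userland sketch in C.  It
is not NetBSD kernel code: the names vcons, vcons_attach, vcons_detach,
vcons_putc and the in-memory capture backend are all hypothetical, and
counting (rather than buffering) output written while no port is attached
is an assumption made to keep the example short.

```c
#include <stddef.h>

/* Hypothetical backend hook; a real kernel would use its own console ops. */
typedef void (*vcons_putc_fn)(void *cookie, char c);

/* The virtual console: always present, backend optional. */
struct vcons {
    vcons_putc_fn backend;  /* NULL while no USB serial port is attached */
    void         *cookie;   /* backend-private state */
    size_t        dropped;  /* characters written while detached */
};

/* Called when a USB serial port shows up. */
void vcons_attach(struct vcons *vc, vcons_putc_fn fn, void *cookie)
{
    vc->backend = fn;
    vc->cookie = cookie;
}

/* Called when the adapter is unplugged; the console itself stays valid. */
void vcons_detach(struct vcons *vc)
{
    vc->backend = NULL;
    vc->cookie = NULL;
}

/* Writers never fail: output is forwarded if possible, counted otherwise. */
void vcons_putc(struct vcons *vc, char c)
{
    if (vc->backend != NULL)
        vc->backend(vc->cookie, c);
    else
        vc->dropped++;
}

/* Stand-in backend for experimenting: captures output in memory. */
struct capture {
    char   buf[64];
    size_t len;
};

void capture_putc(void *cookie, char c)
{
    struct capture *cap = cookie;

    if (cap->len < sizeof(cap->buf) - 1) {
        cap->buf[cap->len++] = c;
        cap->buf[cap->len] = '\0';
    }
}
```

In this sketch, detaching simply reverts to counting lost characters, which
is one possible answer to what should happen when the adapter is unplugged;
buffering the early output and replaying it on attach would be another.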


daily CVS update output

2020-07-05 Thread NetBSD source update


Updating src tree:
P src/share/man/man4/speaker.4
P src/sys/arch/arm/sunxi/sunxi_nand.c
P src/sys/arch/sparc64/dev/ffb.c
P src/sys/arch/sparc64/sparc64/autoconf.c
P src/sys/arch/x86/x86/fpu.c
P src/sys/dev/pci/ciss_pci.c
P src/sys/dev/pci/machfb.c
P src/sys/dev/pci/radeonfb.c
P src/sys/sys/fstypes.h
P src/sys/ufs/ffs/ffs_vfsops.c
P src/usr.sbin/cpuctl/arch/aarch64.c

Updating xsrc tree:


Killing core files:



Updating release-8 src tree (netbsd-8):

Updating release-8 xsrc tree (netbsd-8):



Updating release-9 src tree (netbsd-9):

Updating release-9 xsrc tree (netbsd-9):




Updating file list:
-rw-rw-r--  1 srcmastr  netbsd  40089841 Jul  6 03:10 ls-lRA.gz


Re: NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1

2020-07-05 Thread Mouse
> That said, what would it take to wire the NetBSD console to a USB
> serial adapter?

"Too much".  USB is horrible from this perspective; you need quite a lot
of infrastructure to do, well, pretty much anything over USB.  Take a
look at the code sometime.  In the particular case of the console,
you'd also have to figure out what to do with the console if the
hardware goes away.

It's one of the reasons I dislike USB, and closely related to most of
the others.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML        mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: 9.99.69 panic - libcrypto changes?

2020-07-05 Thread Taylor R Campbell
> Date: Sun, 5 Jul 2020 11:45:48 +0100
> From: Chavdar Ivanov 
> 
> panic: fpudna from userland, ip 0x7c16e87b95ca, trapframe 0xce01527ec000
> cpu0: Begin traceback...
> vpanic() at netbsd:vpanic+0x152
> snprintf() at netbsd:snprintf
> fpu_set_default_cw() at netbsd:fpu_set_default_cw
> cpu0: End traceback...

This and the earlier panic you reported may be fixed by

https://mail-index.netbsd.org/source-changes/2020/07/06/msg119081.html

(The performance regression with WPA/WPA2 remains -- I'm working on
that.)


NetBSD-7.0 boots OK and NetBSD-8.0 hangs/crashes during boot on a MacBook7,1

2020-07-05 Thread Greg A. Woods
So, in my ongoing NetBSD on a MacBook saga

NetBSD-7.2 boots fine from USB on the MacBook Pro (MacBook7,1) (with the
help of rEFIT on a second USB stick).

NetBSD-8.2 and newer, including the most recent -current, hang during
boot, and the kernel messages appear to have torn video:

 http://www.planix.ca/~woods/macbookpro-netbsd-boot-fail.jpg


However today I discovered that NetBSD-8.0 will often boot with the
kernel messages properly visible in nice green on black in a full
52(?)-line display, but it hangs or crashes.  (It is not reliable at
booting though -- sometimes the boot loader just hangs without printing
anything.)

If the boot loader does work though, and if I boot "normally" it just
hangs, with the last message being:

pci0 at mainbus0 bus0: configuration mode 1

The caps-lock button is dead so I think the machine is well and truly
frozen in a CPU loop (the CPU is hot, the fan runs fast).

I'm guessing NetBSD-8.2 and everything more recent is also hanging at
this same spot, but with the busted video mode it's hard to tell for
sure.

If I boot 8.0 with ACPI turned off (boot option #2 or from the boot
prompt "boot -2"), it crashes into ddb after getting a bit further, but
there are many errors about not being able to map PCI interrupts.

If I boot 8.0 with "-vx", there are quite a number of "invalid config
space" messages after the pci0 attachment:

pci0 at mainbus0 bus0: configuration mode 1
acpi0: MCFG: 000:00:0: invalid config space (cfg[0x100]=0x, alias=false)

The second and third numbers change in each following message, and in
two of those messages the cfg[0x100] number is 0x.

So it looks like ACPI is necessary, but support for using it in this
MacBook7,1 is broken somehow.

I can post a full-res photo of the screen in one or more or all of these
states if someone wants to see it.

In any case, what might have been changed after 8.0 that broke the video
output?  Where do I look?  Is amd64 video now the genfb(4) device code?
Or is it still vga(4)?  If it's genfb(4), then I do see commits about
doing anti-aliasing, and maybe the video junk I see could possibly be
explained by such a thing.  If I can get 7.2 installed (likely), so that
I need only drop a kernel in place instead of building the whole
installimage and writing the damn slow USB stick with a whole install
image every time, then maybe I'll be able to try bisecting changes to
get the video working right again.

I really wish modern PC vendors were not still so bloody stupid with
their firmware as to make it impossible to talk to them via a serial
port of some kind (e.g. a USB serial adapter as console would be
awesome!).  That said, what would it take to wire the NetBSD console to
a USB serial adapter?

In lieu of that it would be nice if hitting ^S on the keyboard would at
least pause the kernel messages from scrolling by during boot, but I get
that such a thing might be a bit hard to arrange for in NetBSD.

--
Greg A. Woods 

Kelowna, BC +1 250 762-7675   RoboHack 
Planix, Inc.  Avoncote Farms 




USB storage transfers halt when usbdevs is run: hardware bug or software bug?

2020-07-05 Thread Greg A. Woods
USB storage device transfers freeze when usbdevs is run:  hardware bug
or software bug?

While I was doing a "gzcat < *.gz > /dev/rsd2d", where sd2 was a USB
memory stick, I happened to run "usbdevs -dv" and the writes to the USB
device froze, and indeed the writing process was stuck in the kernel (I
couldn't even stop it with ^Z).

Luckily yanking the stick out seemed to unfreeze and kill the process
and clean everything up nicely and I was able to re-insert it and re-do
the write to it without incident.

This is on an amd64 server running 9.99.64.

Upon removal and subsequent re-insertion the kernel said the following
(but was silent before this when usbdevs ran):

[ 193334.306434] umass0: BBB reset failed, IOERROR
[ 193334.306434] umass0: BBB bulk-in clear stall failed, IOERROR
[ 193334.318288] umass0: BBB bulk-out clear stall failed, IOERROR
[ 193334.318288] umass0: BBB reset failed, IOERROR
[ 193334.329223] umass0: BBB bulk-in clear stall failed, IOERROR
[ 193334.329223] umass0: BBB bulk-out clear stall failed, IOERROR
[ 193334.341024] umass0: BBB reset failed, IOERROR
[ 193334.341024] umass0: BBB bulk-in clear stall failed, IOERROR
[ 193334.351781] umass0: BBB bulk-out clear stall failed, IOERROR
[ 193334.357775] sd2d: error writing fsbn 4053632 of 4053632-4053759 (sd2 bn 4053632; cn 4021 tn 7 sn 23)
[ 193334.366963] umass0: BBB reset failed, IOERROR
[ 193334.366963] umass0: BBB bulk-in clear stall failed, IOERROR
[ 193334.378283] umass0: BBB bulk-out clear stall failed, IOERROR
[ 193334.378283] umass0: BBB reset failed, IOERROR
[ 193334.389225] umass0: BBB bulk-in clear stall failed, IOERROR
[ 193334.389225] umass0: BBB bulk-out clear stall failed, IOERROR
[ 193334.401026] umass0: BBB reset failed, IOERROR
[ 193334.401026] umass0: BBB bulk-in clear stall failed, IOERROR
[ 193334.411782] umass0: BBB bulk-out clear stall failed, IOERROR
[ 193334.417780] umass0: BBB reset failed, IOERROR
[ 193334.417780] sd2(umass0:0:0:0): generic HBA error
[ 193334.426444] sd2: detached
[ 193334.426444] scsibus1: detached
[ 193334.426444] umass0: detached
[ 193334.436445] umass0: at uhub6 port 2 (addr 5) disconnected

reinsertion:

[ 193341.516925] umass0 at uhub6 port 2 configuration 1 interface 0
[ 193341.516925] umass0: SMI Corporation (0x090c) USB DISK (0x1000), rev 2.00/11.00, addr 5
[ 193341.526926] umass0: using SCSI over Bulk-Only
[ 193341.526926] scsibus1 at umass0: 2 targets, 1 lun per target
[ 193342.366983] sd2 at scsibus1 target 0 lun 0:  disk removable
[ 193342.376985] sd2: 7712 MB, 15744 cyl, 16 head, 63 sec, 512 bytes/sect x 15794176 sectors
[ 193342.386986] sd2: GPT GUID: d1e3490c-b0e6-42e9-9d9e-3ac286a0f7e0
[ 193342.396989] dk6 at sd2: "EFI system", 262144 blocks at 2048, type: msdos
[ 193342.396989] dk7 at sd2: "d3aa0396-d911-4aac-baa8-f2478557d31a", 7544832 blocks at 264192, type: ffs


I'm guessing it's a software bug with bad locking order somewhere.

--
Greg A. Woods 

Kelowna, BC +1 250 762-7675   RoboHack 
Planix, Inc.  Avoncote Farms 




Re: atexit(), dlclose() and more atexit()

2020-07-05 Thread Kamil Rytarowski
On 05.07.2020 19:42, Robert Elz wrote:
> Date: Tue, 30 Jun 2020 13:43:00 +0200
> From: Kamil Rytarowski
> Message-ID:
> 
> I had been ignoring this discussion, but on cleaning up some
> unread list e-mail, I saw this nonsense, and this is just going too far.
> 
>   | This is an extension and extensions are allowed.
> 
> That's absolutely true, but that doesn't relieve the implementation of
> the need to follow what the standard does require.
> 
> And in this case that is:
> 
>   At normal program termination, all functions registered by the
>   atexit( ) function shall be called, in the reverse order of their
>   registration,
> 

This is extended to the behavior of "at dlclose() or a normal program
termination".

> That is, when the program ends, *every*  function registered by atexit()
> must be called - there is nothing there which ever suggests "except if it
> has already been called".  That isn't there, because atexit() functions
> are only expected to be called when the process exits (code can explicitly
> call such a function, independently, if it wants to of course).
> 
> Not only must the functions be called, the order in which they are to be
> called is specified, so if program does atexit(A), then dlopen(L), and in
> the init function for L, we get atexit(B), after which (after the dlopen and
> the init functions are done) the program does atexit(C), then at
> program termination time, the atexit processing must call C, and then B,
> and then A; B must not be called (as part of atexit processing) before C.
> 
> I really cannot see how you can possibly mangle the operations and remain
> compliant with the standard (nor how any other implementation can).
> 

A literal, unextended implementation of the standard turned out to be
impractical. All/most mainstream implementations (all other BSDs, Win, Mac,
GNU, Solaris, ...) have diverged from it over the last 20 years.

> In another message ka...@netbsd.org said:
>   | Technically atexit() != __cxa_atexit(), but the "atexit-registered function"
>   | mechanism is in place and defined for early DSO unload in C++.
> 
> No-one cares who invented what when, but the very existence of
> __cxa_atexit (which not just technically is != atexit, it simply
> isn't atexit) would be because atexit() could not be sanely coerced
> to work for the purpose intended.   Don't you think that if atexit()
> would work, they wouldn't simply have used it, instead of inventing
> a new (similarly named) function for the purpose?

atexit() is a direct subset of __cxa_atexit() and they are asked to
share the same internal implementation.

> 
> Back to the initial message:
>   | Another option would be to make dlclose() no-op and keep atexit(3)
>   | operational, but this is certainly not what we want.
> 
> Actually, that one is a possible solution.  dlclose() is not required
> to do anything at all.   While having it never do anything isn't what
> we'd want, having it do nothing if there is a pending atexit function
> from the dynamic object (or even simply one registered by the dynamic
> object - though the problematic case, as I understand it, is when the
> function has been removed and so can no longer sensibly be called) is
> not a ridiculous suggestion.
> 
> If a dynamic library has registered an atexit function, its obvious
> intent is that it will remain loaded until the program exits, and so
> in that case making dlclose(), if called, do nothing seems like an
> entirely sensible idea.
> 

That would be progress over our current behavior, which always
crashes... but it would still be harmful in serious usage.

> kre
> 






Re: atexit(), dlclose() and more atexit()

2020-07-05 Thread nia
On Mon, Jul 06, 2020 at 12:42:55AM +0700, Robert Elz wrote:
> Actually, that one is a possible solution.  dlclose() is not required
> to do anything at all.   While having it never do anything isn't what
> we'd want, having it do nothing if there is a pending atexit function
> from the dynamic object (or even simply one registered by the dynamic
> object - though the problematic case, as I understand it, is when the
> function has been removed and so can no longer sensibly be called) is
> not a ridiculous suggestion.

There is precedent for dlclose doing nothing - it's a no-op in musl libc,
by design.

There are obvious downsides (servers with reloadable modules suddenly
have memory leaks) but that's arguably not critical.

Their justification can be found here:

https://wiki.musl-libc.org/functional-differences-from-glibc.html#Unloading-libraries


Re: atexit(), dlclose() and more atexit()

2020-07-05 Thread Robert Elz
Date: Tue, 30 Jun 2020 13:43:00 +0200
From: Kamil Rytarowski
Message-ID:

I had been ignoring this discussion, but on cleaning up some
unread list e-mail, I saw this nonsense, and this is just going too far.

  | This is an extension and extensions are allowed.

That's absolutely true, but that doesn't relieve the implementation of
the need to follow what the standard does require.

And in this case that is:

    At normal program termination, all functions registered by the
    atexit( ) function shall be called, in the reverse order of their
    registration,

That is, when the program ends, *every*  function registered by atexit()
must be called - there is nothing there which ever suggests "except if it
has already been called".  That isn't there, because atexit() functions
are only expected to be called when the process exits (code can explicitly
call such a function, independently, if it wants to of course).

Not only must the functions be called, the order in which they are to be
called is specified, so if program does atexit(A), then dlopen(L), and in
the init function for L, we get atexit(B), after which (after the dlopen and
the init functions are done) the program does atexit(C), then at
program termination time, the atexit processing must call C, and then B,
and then A; B must not be called (as part of atexit processing) before C.

I really cannot see how you can possibly mangle the operations and remain
compliant with the standard (nor how any other implementation can).
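
The ordering argument above can be made concrete with a toy model.  This is
only a sketch, not libc's or rtld's implementation: the model_* names and
the integer "owner" tag are hypothetical, and handlers are represented by a
one-character label instead of a function pointer so that the call order
can be recorded and inspected.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model, not NetBSD's libc: a fixed-size stack of exit handlers,
 * each tagged with the (hypothetical) object that registered it. */
#define MAX_HANDLERS 32

struct handler {
    char name;      /* label recorded when the handler "runs" */
    int  owner;     /* 0 = main program, >0 = a loaded library */
    int  pending;   /* cleared once the handler has run */
};

static struct handler handlers[MAX_HANDLERS];
static size_t nhandlers;
static char trace[MAX_HANDLERS + 1];    /* order in which handlers ran */
static size_t ntrace;

/* Models atexit()/__cxa_atexit(): push a handler; 0 on success. */
int model_register(char name, int owner)
{
    if (nhandlers == MAX_HANDLERS)
        return -1;
    handlers[nhandlers++] = (struct handler){ name, owner, 1 };
    return 0;
}

/* Run pending handlers newest-first; all of them, or one owner's only. */
static void run_matching(int owner, int all)
{
    for (size_t i = nhandlers; i-- > 0; ) {
        if (handlers[i].pending && (all || handlers[i].owner == owner)) {
            assert(ntrace < MAX_HANDLERS);
            handlers[i].pending = 0;
            trace[ntrace++] = handlers[i].name;
            trace[ntrace] = '\0';
        }
    }
}

/* Models the dlclose()-time extension: run only that object's handlers. */
void model_dlclose(int owner) { run_matching(owner, 0); }

/* Models normal program termination: run whatever is still pending. */
void model_exit(void) { run_matching(0, 1); }

/* Labels of the handlers run so far, in call order. */
const char *model_trace(void) { return trace; }

/* Forget all registrations and the recorded trace. */
void model_reset(void)
{
    nhandlers = 0;
    ntrace = 0;
    memset(trace, 0, sizeof(trace));
}
```

With A (program), B (library 1) and C (program) registered in that order,
model_exit() alone records "CBA", the order the standard text describes.
Calling model_dlclose(1) first records "BCA": B runs before C, which is
exactly the reordering being objected to.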

In another message ka...@netbsd.org said:
  | Technically atexit() != __cxa_atexit(), but the "atexit-registered function"
  | mechanism is in place and defined for early DSO unload in C++. 

No-one cares who invented what when, but the very existence of
__cxa_atexit (which not just technically is != atexit, it simply
isn't atexit) would be because atexit() could not be sanely coerced
to work for the purpose intended.   Don't you think that if atexit()
would work, they wouldn't simply have used it, instead of inventing
a new (similarly named) function for the purpose?

Back to the initial message:
  | Another option would be to make dlclose() no-op and keep atexit(3)
  | operational, but this is certainly not what we want.

Actually, that one is a possible solution.  dlclose() is not required
to do anything at all.   While having it never do anything isn't what
we'd want, having it do nothing if there is a pending atexit function
from the dynamic object (or even simply one registered by the dynamic
object - though the problematic case, as I understand it, is when the
function has been removed and so can no longer sensibly be called) is
not a ridiculous suggestion.

If a dynamic library has registered an atexit function, its obvious
intent is that it will remain loaded until the program exits, and so
in that case making dlclose(), if called, do nothing seems like an
entirely sensible idea.
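
A minimal sketch of that policy, under stated assumptions: struct dso and
dso_close() are hypothetical names for illustration, and the three fields
are invented bookkeeping, not how NetBSD's runtime linker actually tracks
loaded objects.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping for one loaded object; not rtld's real data. */
struct dso {
    int  refcount;          /* outstanding dlopen() references */
    int  pending_atexit;    /* handlers it registered that have not run */
    bool mapped;            /* still present in the address space */
};

/* Sketch of the suggested policy: drop the reference, but only unmap
 * when no atexit handler could still point into the object.
 * Returns true if the object was actually unloaded. */
bool dso_close(struct dso *d)
{
    if (d->refcount > 0)
        d->refcount--;
    if (d->refcount == 0 && d->pending_atexit == 0 && d->mapped) {
        d->mapped = false;
        return true;
    }
    return false;   /* kept resident; exit-time handlers stay callable */
}
```

An object that never called atexit() is unloaded as usual; one with a
pending handler stays mapped until the process exits, so the handler's code
is still there when exit processing reaches it.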

kre



Re: 9.99.69 panic - libcrypto changes?

2020-07-05 Thread Chavdar Ivanov
Thank you for the detailed response (I can't claim to understand it
completely, of course).  A saved kernel from 9.99.68 still lets me
work with the machine as before; I updated it yesterday and got
another - perhaps identical - panic when downloading mail with
Thunderbird:

panic: fpudna from userland, ip 0x7c16e87b95ca, trapframe 0xce01527ec000
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x152
snprintf() at netbsd:snprintf
fpu_set_default_cw() at netbsd:fpu_set_default_cw
cpu0: End traceback...

dumping to dev 168,15 (offset=8, size=4152523):
dump autoconfiguration error: ahcisata0 port 3: clearing WDCTL_RST failed for drive 0
WARNING: negative runtime; monotonic clock has gone backwards
wddump: device timed out
i/o error


rebooting...

(and no core dump, of course).


On Sat, 4 Jul 2020 at 21:19, Taylor R Campbell  wrote:
>
> > Date: Thu, 2 Jul 2020 23:09:16 +0100
> > From: Chavdar Ivanov 
> >
> > On amd64 9.99.69 from yesterday I get:
> > [...]
> > System panicked: fpudna from kernel, ip 0x802292af, trapframe
> > 0xbe013c564a50
> > [...]
> > Xtrap07() at Xtrap07+0xbd
> > aesni_enc_impl() at aesni_enc_impl+0x1c
> > rijndaelEncrypt() at rijndaelEncrypt+0x4b
> > ccmp_init_blocks() at ccmp_init_blocks+0xe8
> > [...]
>
> I am investigating.  There must be a bug somewhere in the x86 vector
> register state management I used to allow the kernel to use
> AES-NI, but I'm not yet sure what it is.
>
> > My WiFi link (iwm) is also visibly slower than usual.
..
> > happened while I was running 'pkgin upgrade' over an NFS mount through
> > the iwm adapter.
>
> This is likely an unintended side effect of my recent AES rework
> (https://mail-index.netbsd.org/tech-kern/2020/06/18/msg026505.html).
>
> For systems where we can take advantage of hardware AES support, like
> yours, after every call into the AES subsystem, the kernel will zero
> the vector registers to avoid leaking secrets through Spectre-class
> speculative execution attacks.
>
> Although your kernel is evidently now taking advantage of hardware
> support for AES (the x86 AES-NI CPU instructions), which is much
> faster than software AES, the logic in our 802.11 stack to compute
> CCMP (the authenticated cipher used in your WPA setup) calls the AES
> block cipher one block at a time.
>
> So it's zeroing all the vector registers for every 16 bytes of data in
> every frame -- twice, because AES-CCM involves two block cipher calls
> for every block of data (one for the AES-CBC-MAC authenticator, one
> for the AES-CTR encryption pad).  I expect this is the source of the
> slowdown you're witnessing.
>
>
> There are a few ways we could work around this:
>
> 1. Push the AES-CCM computation into the AES subsystem, so we only
>    zero the vector registers once per frame, or once per mbuf segment.
>    This requires a bit of work but if I can find CCMP test vectors
>    then it shouldn't be too hard.  At worst, it will require redoing
>    when the wifi branch is merged.
>
> 2. Push ieee80211_crypto_* into a worker thread, and use
>
>    to avoid zeroing the vector registers.  However, this may require
>    some design changes in the 802.11 stack and it's not clear that
>    they're the right changes or that this can be done quickly.
>
> 3. Invent a new nestable transaction mechanism to defer zeroing the
>    vector registers.  However, there might also be a penalty to
>    enabling or disabling the fpu, so it might not solve the whole
>    problem, and it is not entirely clear what it should mean in an MI
>    context.
>
> Another approach, of course, is to simply use an open wifi network
> instead -- generally hop-by-hop authenticated encryption like WPA is
> not worth much compared to end-to-end authenticated encryption like
> TLS, SSH, or Wireguard.
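
To put a rough number on the per-block cost described in the quoted
explanation, here is a small C sketch.  The function names are
hypothetical, and it counts payload blocks only (the extra blocks CCM
processes for the nonce and header are ignored), so treat the result as a
lower bound rather than a measurement.

```c
#include <stddef.h>

#define AES_BLOCK_BYTES 16

/* Number of 16-byte AES blocks needed to cover len payload bytes. */
size_t aes_blocks(size_t len)
{
    return (len + AES_BLOCK_BYTES - 1) / AES_BLOCK_BYTES;
}

/* Register-zeroing passes when the block cipher is called one block at a
 * time: two calls per block, one for CBC-MAC and one for the CTR pad. */
size_t zeroings_per_block_calls(size_t len)
{
    return 2 * aes_blocks(len);
}

/* Register-zeroing passes if the whole CCM computation for a frame were
 * done inside a single call into the AES subsystem (workaround 1). */
size_t zeroings_per_frame_call(size_t len)
{
    (void)len;      /* independent of frame size */
    return 1;
}
```

For an illustrative 1500-byte payload, aes_blocks() gives 94 blocks, so the
per-block scheme zeroes the registers 188 times where the per-frame scheme
would do it once.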

Chavdar


--