Re: Forcing a USB device to "ugen"

2024-03-26 Thread Manuel Bouyer
On Tue, Mar 26, 2024 at 12:25:07AM +, Taylor R Campbell wrote:
> > Date: Mon, 25 Mar 2024 19:47:31 -0400
> > From: Greg Troxel 
> > 
> > Jason Thorpe  writes:
> > 
> > > I should be able to do this with OpenOCD (pkgsrc/devel/openocd), but
> > > libftdi1 fails to find the device because libusb1 only deals in
> > > "ugen".
> > 
> > Is that fundamental, in that ugen has ioctls that are ugen-ish that
> > uftdi does not?   I am guessing you thought about fixing libusb1.
> 
> It is possible that we could kludge some horrible hacks into ucom(4)
> to pass /dev/ttyU* ioctls through to uftdi(4), but not all USB drivers
> even have a /dev node that could be hacked up in that way.  Really,
> there is a general fundamental limitation with NetBSD's USB stack:
> user programs have no way to take over USB devices from kernel
> drivers.
> 
> We should really expose a /dev/ugen* instance for _every_ USB device;
> those that have kernel drivers attached have only limited access via
> /dev/ugen* (no reads, writes, transfer ioctls, etc.), until you do
> ioctl(USB_KICK_OUT_KERNEL_DRIVER) or whatever, at which point the
> kernel driver will detach and the user program can take over instead
> and use the full ugen(4) API.
> 
> This is how it works in other systems like Linux with
> USBDEVFS_CLAIMINTERFACE, and that's the model that libusb is built
> around.  It's a nontrivial change to our USB stack requiring some care
> to get right, but this is far and away the biggest shortcoming of our
> USB stack and we should unquestionably do it.

Strongly seconded. 

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: PVH boot with qemu

2024-01-08 Thread Manuel Bouyer
On Mon, Jan 08, 2024 at 10:03:07PM +0100, Emile 'iMil' Heitor wrote:
> 
> This morning I was given the idea of having the possibility to build a
> Xen-free kernel but still GENPVH capable.
> This doesn't impact GENERIC which is still able to boot both Xen and
> GENPVH with the following configuration:
> 
> options   XENPVHVM
> options   XEN
> hypervisor*   at mainbus? # Xen hypervisor
> xenbus*   at hypervisor?  # Xen virtual bus
> xencons*  at hypervisor?  # Xen virtual console
> ...
> 
> Now for GENPVH only we would have a unique kernel configuration
> option:
> 
> options GENPVH
> 
> The only drawback I see is that it adds quite some ifn?def's GENPVH.
> 
> Here's the patch: https://imil.net/NetBSD/noxen.patch
> 
> Does this look reasonable to you?

in consinit.c you have:

+#if defined(XENPVHVM) || defined(GENPVH)
+#ifndef GENPVH
 	if (vm_guest == VM_GUEST_XENPVH) {
 		if (xen_pvh_consinit() != 0)
 			return;
 		/* fallback to native console selection, useful for dom0 PVH */
 	}
+#endif

Shouldn't the #ifndef GENPVH really be #ifdef XENPVHVM?

In the same way, the #ifndef GENPVH in xen_machdep.c should be either
#ifdef XENPVHVM or #ifdef XEN, because we probably want to build kernels
with both XENPVHVM and GENPVH.
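
To illustrate the concern, here is a minimal userland sketch (not the real
consinit.c; the stub xen_pvh_consinit() and the two helper functions are
invented for the example) of a kernel configured with both XENPVHVM and
GENPVH. With the guard as posted, the Xen console probe is compiled out as
soon as GENPVH is defined; guarding on XENPVHVM keeps it.

```c
#include <assert.h>

/* Pretend kernel config: both options enabled at the same time. */
#define XENPVHVM
#define GENPVH

enum vm_guest_type { VM_GUEST_NO, VM_GUEST_XENPVH, VM_GUEST_GENPVH };

/* Stand-in for the real function; nonzero means the Xen console attached. */
static int xen_pvh_consinit(void) { return 1; }

/* Guard as posted: the Xen branch vanishes once GENPVH is defined. */
static int consinit_as_posted(enum vm_guest_type vm_guest)
{
#if defined(XENPVHVM) || defined(GENPVH)
#ifndef GENPVH
	if (vm_guest == VM_GUEST_XENPVH && xen_pvh_consinit() != 0)
		return 1;	/* Xen console selected */
#endif
#endif
	(void)vm_guest;
	return 0;		/* native console selection */
}

/* Suggested guard: the Xen branch depends only on XENPVHVM. */
static int consinit_suggested(enum vm_guest_type vm_guest)
{
	assert(vm_guest <= VM_GUEST_GENPVH);	/* sanity check on the enum */
#if defined(XENPVHVM) || defined(GENPVH)
#ifdef XENPVHVM
	if (vm_guest == VM_GUEST_XENPVH && xen_pvh_consinit() != 0)
		return 1;	/* Xen console selected */
#endif
#endif
	return 0;		/* native console selection */
}
```

In this configuration a Xen PVH guest gets the native path from
consinit_as_posted() but the Xen console from consinit_suggested(); a GENPVH
guest gets the native path from both.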

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: PVH boot with qemu

2023-12-11 Thread Manuel Bouyer
On Mon, Dec 11, 2023 at 10:22:18AM +0100, Emile `iMil' Heitor wrote:
> 
> Hi Manuel,
> 
> On Mon, 11 Dec 2023, Manuel Bouyer wrote:
> 
> > #ifndef GENPVH
> > 	/* get a page for HYPERVISOR_shared_info */
> > 	addl	$PAGE_SIZE, %ebx
> > 	addl	$PGOFSET,%ebx
> > 	andl	$~PGOFSET,%ebx
> > 	movl	$RELOC(HYPERVISOR_shared_info_pa),%ebp
> > 	movl	%ebx,(%ebp)
> > 	movl	$0,4(%ebp)
> > #endif
> > 
> > How can this work on Xen when GENPVH is defined ?
> > Shouldn't this be made conditional on vm_guest == VM_GUEST_XENPVH ?
> 
> Well, the point is that you don't define GENPVH when using Xen; PVH using
> qemu and friends needs neither HYPERVISOR_shared_info nor any of the
> hypercall portion of the code. A big chunk of Xen-related code is
> ifndef'ed on GENPVH in hypervisor.c, and I was planning on isolating GENPVH
> so there are as few ifdefs as possible.
> 
> Or would you prefer the same kernel to be able to boot in both XENPVH and
> GENPVH modes? I am focusing on making the resulting kernel smaller but this
> could be done also.

Yes, right now GENERIC can be used on bare-metal, PVHVM and XENPVH.
It would be good to have GENERIC working on GENPVH too.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: PVH boot with qemu

2023-12-11 Thread Manuel Bouyer
On Mon, Dec 11, 2023 at 08:26:01AM +0100, Emile `iMil' Heitor wrote:
> 
> Here is a clean(er) patch 
> https://github.com/NetBSD/src/compare/trunk...NetBSDfr:NetBSD-src:GENPVH
> 
> Rationale
> 
> Like previously explained, locore.S expects start_info being passed by the
> calling hypervisor on %ebx to be located at the end of the symbol table.
> Qemu and Firecracker don't follow this rule which is not part of the
> official Xen ABI https://xenbits.xen.org/docs/unstable/misc/pvh.html
> 
> What our patch first does is make memory mapping loops happy by copying
> the start_info structure where it's expected.
> After that, memory locations and boot parameters are correctly found and
> boot can proceed.
> Of course, the hypervisor not being Xen, a lot of Xen-related code is
> useless, hence the new VM_GUEST_GENPVH (for Generic PVH) vm_guest type,
> as first suggested by Manuel, and a new kernel option, GENPVH.
> I kept the Xen code structure as it was and changed very little code, only
> some || vm_guest == VM_GUEST_GENPVH and a couple #ifndef GENPVH.
> 
> In order to build a Generic PVH kernel, the following options are needed
> 
> #Xen PV support for PVH and HVM guests
> options XENPVHVM
> options XEN
> # Generic PVH support (qemu, firecracker...)
> options GENPVH
> 
> I've added 
> https://github.com/NetBSDfr/NetBSD-src/blob/GENPVH/sys/arch/amd64/conf/MICROVM
> as an example config file.
> I'll probably end up ditching XENPVHVM and XEN but there's still quite
> some work in there.
> 
> We still need to check if we didn't break anything on Xen side and test
> Firecracker. FYI qemu-system-x86_64 also works with the "microvm"
> machine type.
> 
> Feedback very welcome.


Hello,
I don't understand this part:


#ifndef GENPVH
	/* get a page for HYPERVISOR_shared_info */
	addl	$PAGE_SIZE, %ebx
	addl	$PGOFSET,%ebx
	andl	$~PGOFSET,%ebx
	movl	$RELOC(HYPERVISOR_shared_info_pa),%ebp
	movl	%ebx,(%ebp)
	movl	$0,4(%ebp)
#endif

How can this work on Xen when GENPVH is defined ?
Shouldn't this be made conditional on vm_guest == VM_GUEST_XENPVH ?
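
For reference, a rough C rendering of what that assembly computes, with the
decision taken at run time instead of by #ifndef. This is only a sketch: it
assumes the usual 4 KiB x86 page (PAGE_SIZE 4096, PGOFSET PAGE_SIZE - 1), and
the function names and enum values are illustrative, not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096u			/* assumed x86 page size */
#define PGOFSET   (PAGE_SIZE - 1)

enum vm_guest_type { VM_GUEST_NO, VM_GUEST_XENPVH, VM_GUEST_GENPVH };

/* addl $PAGE_SIZE,%ebx; addl $PGOFSET,%ebx; andl $~PGOFSET,%ebx */
static uint32_t shared_info_page(uint32_t ebx)
{
	return (ebx + PAGE_SIZE + PGOFSET) & ~PGOFSET;
}

/*
 * Run-time variant of the guarded block: only a Xen PVH guest gets a
 * page for HYPERVISOR_shared_info.  Returns the (possibly advanced)
 * value of %ebx; *shared_info_pa receives the page address, or 0.
 */
static uint32_t reserve_shared_info(enum vm_guest_type vm_guest,
    uint32_t ebx, uint64_t *shared_info_pa)
{
	if (vm_guest != VM_GUEST_XENPVH) {
		*shared_info_pa = 0;
		return ebx;
	}
	ebx = shared_info_page(ebx);
	assert((ebx & PGOFSET) == 0);	/* result is page aligned */
	*shared_info_pa = ebx;		/* movl %ebx,(%ebp); movl $0,4(%ebp) */
	return ebx;
}
```

With such a test, one kernel could take the Xen path or skip it depending on
what it detects at boot, which is what a GENERIC kernel would need.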

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: PVH boot with qemu

2023-11-29 Thread Manuel Bouyer
On Wed, Nov 29, 2023 at 12:41:53PM +0100, Emile `iMil' Heitor wrote:
> On Wed, 29 Nov 2023, Manuel Bouyer wrote:
> 
> > Of course, this is *not* a Xen VM, so no surprise that start_xen32
> > isn't working.
> 
> I'm just sharing the progress here, in case someone is interested. If this
> is annoying, I'll just keep it to myself until I post an -hypothetical-
> final patch, and sorry for the noise.

It's not annoying. But I think the first step would be to find a way to
have Xen and non-Xen entry points. Hacking on start_xen32 doesn't look like
the way to go.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: PVH boot with qemu

2023-11-29 Thread Manuel Bouyer
On Wed, Nov 29, 2023 at 08:22:32AM +0100, Emile `iMil' Heitor wrote:
> On Thu, 23 Nov 2023, Emile `iMil' Heitor wrote:
> 
> > It seems we have a similar problem to the second bullet point Colin Percival
> > noted here
> > https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html
> > When removing the hvm_start_info address save portion, the sym mapping
> > doesn't fall into an infinite loop anymore.
> > Not yet sure how to fix that, I'll have a look at FreeBSD's commits on this
> > matter.
> 
> And so it was, in locore.S:start_xen32, this assumption is wrong when the
> entrypoint is called from qemu:
> 
> 	/*
> 	 * save addr of the hvm_start_info structure. This is also the end
> 	 * of the symbol table
> 	 */
> 
> this makes esym point to an address (%ebx + KERNBASE) which is not the
> end of the symbol table.
> Same goes with eblob which is calculated relative to %ebx.
> A friend of mine, Gregory in CC, found that putting those 2 (esym and eblob)
> to 0 made the paging init go fine as both tests (l.660 and 667) will trigger
> jz 1f and keep %edi to __kernel_end.
> This brings us to init_xen_early(), which is failing but that's another story.

Of course, this is *not* a Xen VM, so no surprise that start_xen32
isn't working.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: PVH boot with qemu

2023-11-13 Thread Manuel Bouyer
On Mon, Nov 13, 2023 at 06:37:01AM +0100, Emile `iMil' Heitor wrote:
> 
> I first asked guidance in port-xen@ but the topic doesn't seem to have much
> success, I'll try my chances here.
> 
> I am trying to make NetBSD/amd64 boot in PVH mode with qemu, using qemu's
> -kernel flag. The kernel does start executing thanks to the first step
> explained here 
> https://www.daemonology.net/blog/2022-10-18-FreeBSD-Firecracker.html
> i.e. adding PVH entry point to the kernel ELF notes.
> 
>#define ELFNOTE(name, type, desctype, descdata...) \
>   -.pushsection .note.name;   \
>   +.pushsection .note.name, "a", @note;   \
>  .align 4 ;   \
>  .long 2f - 1f/* namesz */;   \
>  .long 4f - 3f/* descsz */;   \
>   @@ -588,6 +603,8 @@ next:   pop %edi
>   movl%eax,(%ebp)
> 
> The start_xen32 entrypoint is then found, and the kernel starts, but falls into
> an infinite loop in locore.S when mapping symbols and preloaded modules,
> more precisely, in the fillkpt_nox macro. I assume %ecx is wrong or the region
> corrupted for some reason. 
> https://github.com/NetBSD/src/blob/trunk/sys/arch/amd64/amd64/locore.S#L738

I don't think you can use start_xen32 as is, as it expects a Xen environment.
You may need to write a new start routine, or make a difference between Xen
vs non-Xen in the existing one.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Unexpected out of memory kills when running parallel find instances over millions of files

2023-10-21 Thread Manuel Bouyer
On Fri, Oct 20, 2023 at 10:26:05PM +0200, Reinoud Zandijk wrote:
> Hi,
> 
> On Thu, Oct 19, 2023 at 11:20:02AM +0200, Mateusz Guzik wrote:
> > Running 20 find(1) instances, where each has a "private" tree with
> > millions of files runs into trouble with the kernel killing them (and
> > others):
> > [   785.194378] UVM: pid 1998.1998 (find), uid 0 killed: out of swap
> > [   785.194378] UVM: pid 2010.2010 (find), uid 0 killed: out of swap
> > [   785.224675] UVM: pid 1771.1771 (top), uid 0 killed: out of swap
> > [   785.285291] UVM: pid 1960.1960 (zsh), uid 0 killed: out of swap
> > [   785.376172] UVM: pid 2013.2013 (find), uid 0 killed: out of swap
> > [   785.416572] UVM: pid 1760.1760 (find), uid 0 killed: out of swap
> > [   785.416572] UVM: pid 1683.1683 (tmux), uid 0 killed: out of swap
> > 
> > This should not be happening -- there is tons of reusable RAM as
> > virtually all of the vnodes getting here are immediately recyclable.
> > 
> > $elsewhere I got a report of a workload with hundreds of millions of
> > files which get walked in parallel -- a number high enough that it
> > does not fit in RAM on boxes which run it. Out of curiosity I figured
> > I'll check how others are doing on the front, but key is that this is
> > not a made up problem.
> 
> I can second that. I have had UVM killing my X11 when visiting millions of
> files; it might have been using rump but I am not sure.
> 
> What struck me was that swap was maxed out but systat showed something like
> 40gb as `File'. I haven't looked at the Meta percentage but it wouldn't
> surprise me if that was also high. Just some random snippet:

I've seen it too, although it didn't end up killing processes.
But the nightly jobs (the usual daily/security + backup) end up pushing lots
of processes to swap, while the file cache grows to more than half the
RAM (I have 16GB). As a result the machine is really slow and none of the
nightly jobs complete before morning.

Decreasing kern.maxvnodes helps a lot.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: GPT attributes in dkwedge [PATCH]

2023-09-16 Thread Manuel Bouyer
On Sat, Sep 16, 2023 at 08:17:16AM +0200, Martin Husemann wrote:
> [...]
> But the more general solution (which would be just as easy for the end
> user, but more flexible) is to add support for a rootdev statement
> in boot.cfg and then put the label name or the guid there. Similar
> to evbarm taking a root=dev argument passed from the bootloader.

This is already there; it's used by Xen.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: sdmmc question.

2023-05-24 Thread Manuel Bouyer
On Wed, May 24, 2023 at 09:39:11AM +, Taylor R Campbell wrote:
> > Date: Tue, 23 May 2023 22:54:13 -0700
> > From: Phil Nelson 
> > 
> >I'm presuming that we'll need something in the dev/spi/files.spi,
> > but haven't figured out what to say to get it to work.   And I'm
> > assuming there is a .c file that needs to implement the interface
> > between the sdmmc and the spi, but I'm not sure.   Do we need
> > something in another place?  
> 
> Guessing someone will need to write a driver for sdmmc at spi
> following Chapter 7: SPI Mode of the SD specification:
> 
> https://www.sdcard.org/downloads/pls/pdf/?p=Part1_Physical_Layer_Simplified_Specification_Ver9.00.jpg=Part1_Physical_Layer_Simplified_Specification_Ver9.00.pdf=EN_SS1_9
> 
> You'll probably want to create a driver, say `sdspi', at
> sys/dev/spi/sd_spi.c that implements struct sdmmc_chip_functions and
> does config_found with sdmmcbus_attach_args and .iattr = "sdmmcbus",
> and with

I have done something like that for NetBSD/tsarmips:
https://www-soc.lip6.fr/svn/netbsdtsar/trunk/netbsd-8/src/sys/arch/tsarmips/

In this case I had a SPI driver called vcispi:
device vcispi : sdmmcbus
attach vcispi at cluster
file arch/tsarmips/soclib/vcispi.c  vcispi

But in my case the sdmmc commands were handled in the vcispi driver (and
for some part maybe in hardware - I don't remember the details).
This may help, but you'll probably need a sdspi to make the translation
between sdmmc commands and our SPI drivers.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: entropy: pid 17200 (python) blocking due to lack of entropy

2023-03-24 Thread Manuel Bouyer
On Wed, Mar 22, 2023 at 04:38:26PM +, Taylor R Campbell wrote:
> > Date: Wed, 22 Mar 2023 17:18:45 +0100
> > From: Manuel Bouyer 
> > 
> > I did this but it didn't unblock the python process. It did tell me:
> > # rndctl -L /tmp/foo
> > rndctl: no entropy in seed
> > Also I had a /var/db/entropy-file, but maybe without entropy.
> > But /tmp/foo should have some, it was generated on a host with a hardware 
> > RNG:
> > rdrand 1024  2 rngestimate, collect, v
> 
> Can you please share a complete transcript?

I tried to reproduce the problem, but in 3 pbulk runs the python builds
didn't hang ...

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: getextattr system.test != getextattr system test

2023-03-23 Thread Manuel Bouyer
On Thu, Mar 23, 2023 at 12:51:35PM +0100, Manuel Bouyer wrote:
> hello,
> trying to set up glusterfs on NetBSD I ran into issues with extended 
> attributes
> (on a FFSv2ea). Basically glusterfs tries to setup attributes in the
> trusted namespace, and if I understood it properly it does so using
> getextattr(1) & friends. The error message is:
> Staging of operation 'Volume Create' failed on localhost : Failed to set 
> extended attributes trusted.glusterfs.volume-id, reason: Attribute not found 
> 
> So I tested from command line:
> # setextattr system test blash /data/glfs1/vol0
> # getextattr system test /data/glfs1/vol0
> /data/glfs1/vol0	blash
> # getextattr system.test /data/glfs1/vol0
> getextattr: /data/glfs1/vol0: failed: Attribute not found
> 
> from getextattr's man page, I understand that getextattr system test and
> getextattr system.test should be equivalent. But actually
> "getextattr system test" tries to read attribute test in namespace system,
> while "getextattr system.test" tries to read attribute system.test in
> namespace system.
> 
> Is it the expected behavior (i.e. did I misinterpret the man page) ?
> Or should getextattr strip the namespace component in the second use
> case ?

After more investigation it appears that my glusterfs issue isn't related
(I was wrong saying it uses *extattr(1): it uses the syscalls); there
is a kernel bug, more on this later.

Back to system test vs system.test: I think it's intended.
Both system.test and trusted.test point to the same namespace in our
implementation, and if we stripped the namespace from the name then
system.test and trusted.test would point to the same object, which is not
expected.
So what is expected is: "system.test" and "system system.test" are the same.
This is how the FFSv2ea implementation behaves too.

The man page probably needs some clarification.
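
The collision argument above can be sketched as follows. This is a toy model,
not the extattr code: the enum, the struct and the parse_* helpers are
hypothetical, and the only behaviour taken from the message is that the
"system." and "trusted." prefixes select the same namespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum toy_ns { NS_USER, NS_SYSTEM };	/* hypothetical stand-ins */

struct attr_key {
	enum toy_ns ns;
	const char *name;	/* attribute name as it would be stored */
};

/* Prefix selects the namespace; the full string is kept as the name. */
static struct attr_key parse_keep(const char *arg)
{
	struct attr_key k = { NS_USER, arg };

	if (strncmp(arg, "system.", 7) == 0 ||
	    strncmp(arg, "trusted.", 8) == 0)
		k.ns = NS_SYSTEM;
	return k;
}

/* Alternative that strips the prefix from the stored name. */
static struct attr_key parse_strip(const char *arg)
{
	struct attr_key k = parse_keep(arg);

	if (k.ns == NS_SYSTEM) {
		const char *dot = strchr(arg, '.');
		assert(dot != NULL);	/* guaranteed by the prefix match */
		k.name = dot + 1;
	}
	return k;
}

/* Two keys name the same object if namespace and name both match. */
static int same_object(struct attr_key a, struct attr_key b)
{
	return a.ns == b.ns && strcmp(a.name, b.name) == 0;
}
```

With parse_keep(), "system.test" and "trusted.test" remain distinct objects;
with parse_strip() they collapse into one, which is the unwanted outcome.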

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


getextattr system.test != getextattr system test

2023-03-23 Thread Manuel Bouyer
hello,
trying to set up glusterfs on NetBSD I ran into issues with extended attributes
(on a FFSv2ea). Basically glusterfs tries to setup attributes in the
trusted namespace, and if I understood it properly it does so using
getextattr(1) & friends. The error message is:
Staging of operation 'Volume Create' failed on localhost : Failed to set 
extended attributes trusted.glusterfs.volume-id, reason: Attribute not found 

So I tested from command line:
# setextattr system test blash /data/glfs1/vol0
# getextattr system test /data/glfs1/vol0
/data/glfs1/vol0	blash
# getextattr system.test /data/glfs1/vol0
getextattr: /data/glfs1/vol0: failed: Attribute not found

from getextattr's man page, I understand that getextattr system test and
getextattr system.test should be equivalent. But actually
"getextattr system test" tries to read attribute test in namespace system,
while "getextattr system.test" tries to read attribute system.test in
namespace system.

Is it the expected behavior (i.e. did I misinterpret the man page) ?
Or should getextattr strip the namespace component in the second use
case ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: entropy: pid 17200 (python) blocking due to lack of entropy

2023-03-22 Thread Manuel Bouyer
On Wed, Mar 22, 2023 at 03:59:18PM +, Taylor R Campbell wrote:
> > Date: Wed, 22 Mar 2023 16:33:55 +0100
> > From: Manuel Bouyer 
> > 
> > I upgraded a Xen guest from -7 to 10, and ran into:
> > entropy: pid 17200 (python) blocking due to lack of entropy
> > 
> > how do I get out of this ? I tried various things with rndctl, including
> > copying /var/db/entropy-file from another host (with hardware RNG),
> 
> If you copy /var/db/entropy-file from another host (or, better, create
> a new one with `rndctl -S') _and load it_ with `rndctl -L' on this
> host, this will add nonzero entropy to the system; then use
> `/etc/rc.d/random_seed stop' to save it to disk for the next boot in
> case you shut down uncleanly.

I did this but it didn't unblock the python process. It did tell me:
# rndctl -L /tmp/foo
rndctl: no entropy in seed
Also I had a /var/db/entropy-file, but maybe without entropy.
But /tmp/foo should have some, it was generated on a host with a hardware RNG:
rdrand     1024  2 rngestimate, collect, v

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


entropy: pid 17200 (python) blocking due to lack of entropy

2023-03-22 Thread Manuel Bouyer
Hello
I upgraded a Xen guest from -7 to 10, and ran into:
entropy: pid 17200 (python) blocking due to lack of entropy

how do I get out of this ? I tried various things with rndctl, including
copying /var/db/entropy-file from another host (with hardware RNG),
and ping -f to generate a fair amount of network traffic, but nothing seems
to work. I can't switch any of my devices to estimate.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Dell PERC H330: no disks, no volumes

2022-09-14 Thread Manuel Bouyer
On Wed, Sep 14, 2022 at 11:07:23AM +0200, Edgar Fuß wrote:
> Oh, I wasn't aware the H330 and HBA330 are different devices!
> 
> > There is a PERC H330 and a PERC HBA330 and the Dell PERC9 user manual
> > (includes the H330) says you can boot it in HBA mode. Not sure if
> > that means that you can choose the firmware.
> Oh well. So the HBA330 is a PowerEdge RAID Controller that isn't a RAID 
> controller? Thanks, Dell marketing!
> 
> > -> This is attaching a H330 (RAID version) and it gets the mfii driver.
> > mfii0 at pci1 dev 0 function 0: "PERC H330 Mini", firmware 25.5.9.0001
> OK, remains the question why I don't see any discs in bioctl.
> 
> On startup, the machine utters the following:
> 
>   PowerEdge Expandable RAID Controller BIOS
>   Copyright(c) 2016 Avago Technologies
>   Press  to Run Configuration Utility
>   HA -0 (Bus 1 Dev 0) PERC H330 Mini
>   FW package: 25.5.0.0001
> 
> 
>   0 Non-RAID Disk(s) found on the host adapter.
>   0 Non-RAID Disk(s) handled by BIOS
> 
>   0 Virtual Disk(s) found on the host adapter.
> 
>   0 Virtual Disk(s) handled by BIOS
> 
> Is this normal? The only place I see discs being recognized is in the BIOS 
> setup's controller setup.

It's not normal, and this explains why NetBSD doesn't see any disks.
I don't remember the details (and it depends on the controller version),
but you need to have physical disks assigned to one (or more) RAID volume,
and then the RAID volume has to be exported as one (or more) virtual disks.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: debugging a kernel that doesn't start

2022-09-12 Thread Manuel Bouyer
On Mon, Sep 12, 2022 at 10:04:24PM +0200, Edgar Fuß wrote:
> > If you can set up a serial console, it may make things much easier.
> I do have a serial port on the machine.
> 
> > I almost always use serial consoles on dev machines; I don't remember the
> > details but doing the equivalent of a putchar very early was possible.
> Is the BIOS still available or how does that work?

Basically it just requires an outb to the serial port's TX register
(well, you also need to busy-wait for the transmit to complete,
or characters will be lost). This is simple enough to have it working
early in boot, as early as the consinit() call in init_x86_64().

I'm almost sure I used printf() in init_x86_64 with a serial console
(but after the consinit() call of course).
I definitely used it in pmap_bootstrap().
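
The outb-plus-busy-wait idea can be sketched like this. It is a userland
model, not kernel code: inb()/outb() here are fakes backed by a small buffer
so the logic can be exercised, and it assumes a 16550-style UART at the
conventional COM1 address that the firmware or boot loader has already
configured. In a real kernel the two accessors would be the port I/O
instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define COM1_BASE 0x3f8		/* conventional PC COM1 address (assumed) */
#define COM_THR   0		/* transmit holding register offset */
#define COM_LSR   5		/* line status register offset */
#define LSR_THRE  0x20		/* transmit holding register empty */

/* Fake port space so the sketch runs outside a kernel. */
static char     fake_tx[64];
static size_t   fake_txlen;
static unsigned fake_busy;	/* number of LSR reads that report "busy" */

static uint8_t inb(uint16_t port)
{
	if (port == COM1_BASE + COM_LSR) {
		if (fake_busy > 0) {
			fake_busy--;
			return 0;	/* transmitter still busy */
		}
		return LSR_THRE;
	}
	return 0;
}

static void outb(uint16_t port, uint8_t val)
{
	if (port == COM1_BASE + COM_THR && fake_txlen < sizeof(fake_tx))
		fake_tx[fake_txlen++] = (char)val;
}

/* Wait until the UART can accept a byte, then write it. */
static void early_putc(char c)
{
	while ((inb(COM1_BASE + COM_LSR) & LSR_THRE) == 0)
		continue;	/* busy-wait, otherwise characters are lost */
	outb(COM1_BASE + COM_THR, (uint8_t)c);
}

static void early_puts(const char *s)
{
	assert(s != NULL);
	while (*s != '\0')
		early_putc(*s++);
}
```

Because it needs nothing but two port accesses, a routine of this shape can
be called well before the normal console code is available.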

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: debugging a kernel that doesn't start

2022-09-12 Thread Manuel Bouyer
On Mon, Sep 12, 2022 at 10:02:33PM +0200, Edgar Fuß wrote:
> > Have you tried booting a custom kernel with some drivers removed?
> No. I wouldn't know which drivers to remove.
> The problem is the Kernel utters absolutely nothing, so it must hang very, 
> very early.
> 
> > have you tried an uncompressed one?
> No, but I guess the official install image (on a USB key) is supposed to 
> work as-is, no?
> 
> > The simplest way to debug something is using a serial port, do you have
> > access to the one on this machine?
> Yes, there is one. It seems to sort-of mirror the on-screen messages up to 
> the point the NetBSD boot runs. I tried
>   consdev com0,9600
> from the boot prompt but that hung the machine.

On some systems I have to set the ioaddr too.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: debugging a kernel that doesn't start

2022-09-12 Thread Manuel Bouyer
On Mon, Sep 12, 2022 at 09:17:52PM +0200, Edgar Fuß wrote:
> I'm trying to run NetBSD on a Dell PowerEdge R6515, and the kernel is being 
> loaded (PXE or USB) but then the machine hangs hard.
> 
> What's the way to debug a kernel that hangs so early that you can't printf 
> or drop into ddb? I guess that's a phenomenon quite common for a new port 
> or changes to locore.s (or whatever that's called today), but it's completely 
> new to me.
> 
> I have virtually no clue about PeCee hardware. At the point the kernel is 
> started, are BIOS routines still available?

If you can set up a serial console, it may make things much easier.
I almost always use serial consoles on dev machines; I don't remember the
details but doing the equivalent of a putchar very early was possible.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: [PATCH] Re: Profiling broken on x86?

2022-07-08 Thread Manuel Bouyer
On Fri, Jul 08, 2022 at 12:17:25PM +, Emmanuel Dreyfus wrote:
> On Fri, Jul 08, 2022 at 11:55:13AM +0200, Manuel Bouyer wrote:
> > I think it's more than this; it's to use the right instruction on the
> > right CPU. But I don't remember the details
> 
> This tests a CPU features at boot time, and replaces a generic implementation
> by a CPU-optimized one. 
> 
> Indeed it brings maximum performance because there is no CPU feature test
> at runtime, but it comes with a price. Profiling and debugging are broken,
> for instance.

That's not the whole story. See the Opteron workaround for example.
It enables SMAP if available, too. 

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: [PATCH] Re: Profiling broken on x86?

2022-07-08 Thread Manuel Bouyer
On Fri, Jul 08, 2022 at 09:10:54AM +, Emmanuel Dreyfus wrote:
> On Thu, Jul 07, 2022 at 12:53:33AM +, Emmanuel Dreyfus wrote:
> > This is NetBSD 9.2_STABLE on amd64. If the kernel is built with
> > profiling (either config -p, or makeoptions PROF="-pg" and options
> > GPROF), it crashes at boot with this panic:
> > 
> > panic: patchfunc: sizes do not match (from=0x80f0d720)
> 
> Here is a workaround, but perhaps the concept of patching the 
> binary is to be questioned. Is it just to save the performance
> of a boolean test?

I think it's more than this; it's to use the right instruction on the
right CPU. But I don't remember the details

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Manuel Bouyer
On Thu, Jun 23, 2022 at 01:48:55PM -0700, Brian Buhrow wrote:
>   hello.  In looking at the if_xennet_xenbus.c file, I see where the 
> if_xennetrxbuf_cache is
> initialized, but I don't see where data is put into it before it's requested. 
>  Is the idea that
> the items in the cache are supposed to be provided by the backend, i.e. the 
> dom0?  Is it
> possible that dom0 isn't providing enough rx requests to satisfy the traffic 
> it's sending us? I
> think I understand what's supposed to happen once traffic begins flowing: rx
> requests come in, if_xennet_xenbus processes them and pushes them back into
> the if_xennetrxbuf_cache cache.  What I don't understand is how the
> initial cache gets populated with free rx requests to use in order to get
> things started.

A pool cache has a backing pool. If there's no item in the pool cache, it
gets some memory from its backing pool.
The point of the cache here is to keep the physical address of items around,
so it doesn't have to be computed again.
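
As a toy illustration of that behaviour (this is not the pool_cache(9) API;
every name below is made up, malloc() stands in for the backing pool and
toy_vtophys() for the address translation), a cache that falls back to its
backing allocator when empty and remembers each item's physical address:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define CACHE_SLOTS 8

struct item {
	void     *va;
	uintptr_t pa;		/* "physical address", computed once */
};

struct toy_cache {
	struct item free_items[CACHE_SLOTS];
	int nfree;
	int backing_allocs;	/* how often the backing pool was used */
};

/* Stand-in for a costly virtual-to-physical lookup. */
static uintptr_t toy_vtophys(void *va)
{
	return (uintptr_t)va ^ 0x1000;
}

/* Take an item from the cache, or from the backing pool if it is empty. */
static int toy_cache_get(struct toy_cache *c, size_t size, struct item *out)
{
	if (c->nfree > 0) {
		*out = c->free_items[--c->nfree];	/* pa already known */
		return 0;
	}
	out->va = malloc(size);
	if (out->va == NULL)
		return -1;		/* backing pool exhausted */
	out->pa = toy_vtophys(out->va);
	c->backing_allocs++;
	return 0;
}

/* Return an item; it keeps its address for the next user. */
static void toy_cache_put(struct toy_cache *c, struct item it)
{
	assert(it.va != NULL);
	if (c->nfree < CACHE_SLOTS)
		c->free_items[c->nfree++] = it;
	else
		free(it.va);
}
```

The first get on an empty cache goes to the backing pool, which is how the
cache gets populated without anyone filling it in advance; later gets reuse
returned items together with their stored address.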

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Manuel Bouyer
On Thu, Jun 23, 2022 at 12:54:59PM -0700, Brian Buhrow wrote:
>   hello.  In looking at my vmstat-m output, I see:
> 
> mclpl   211228146028146 14109 1407435   187 0 524288  
> 35
> 
> I see no failures and the number of nmbclusters is: 524288
> 
> yet, this machine has displayed this message about 6 times since it was 
> rebooted about 5 hours
> ago.
> 
> Am I missing something?

OK, so this is -current; it is the if_xennetrxbuf_cache pool cache which
is failing. This one has no limits.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Periodic messages on NetBSD-9 and -current: xennet0: rx no cluster

2022-06-23 Thread Manuel Bouyer
On Thu, Jun 23, 2022 at 12:29:11PM -0700, Brian Buhrow wrote:
>   Hello.  I'm running a number of NetBSD-9 and -current as of 99.77 
> amd64 domu machines on
> a couple of different servers with FreeBSD as dom0.  I'm getting the 
> following messages from
> the kernel: 
> xennet0: rx no cluster
> Much of the time, these messages seem harmless, but occasionally, the network 
> locks up on
> machines that display this message.
> 
> In looking at the source code, I get that this is a pool allocation failure in
> if_xennet_xenbus.c, but I don't understand which memory resource it's running 
> out of and if
> there is a way to increase that resource.  In general, the domu's in question 
> seem to have
> plenty of memory and I don't see a lot of memory pressure for other tasks on 
> the systems.
> 
>   Has anyone else seen these messages on their domu machines and does 
> anyone have ideas on
> how to correct the issue?

It's running out of mbuf clusters; this is the mclpl in vmstat -m

You can try increasing kern.mbuf.nmbclusters, or if that fails, rebuilding
a kernel with
options NMBCLUSTERS=
e.g.
options NMBCLUSTERS=65536

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: procfs files vs symlink

2022-01-12 Thread Manuel Bouyer
On Wed, Jan 12, 2022 at 08:10:44AM -0500, Christos Zoulas wrote:
> 
> 
> > On Jan 12, 2022, at 8:08 AM, Manuel Bouyer  wrote
> >> 
> >> Where do you get the reference to the original inode? Try it...
> > 
> > I don't understand. My patch doesn't change the procfs behavior for 
> > non-linux
> > binaries so I can't see the problem.
> 
> Ah I did not understand that, sorry. Anyway it is fine then, although I would
> like to understand first why linux binaries break... Do they do readlink(2)
> unconditionally and then they don't handle the error condition?

Yes, I guess that's it, but I don't have the sources. I suspect they do this to
find the absolute path of the binary.
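
What such a caller would see can be sketched as follows. This only shows the
POSIX behaviour of readlink(2) on a path that is not a symbolic link; that
the Linux binary does this unconditionally is a guess, and the helper names
are invented for the example.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 0 and fills buf if path is a symlink, else -1 with errno set. */
static int resolve_link(const char *path, char *buf, size_t len)
{
	ssize_t n;

	assert(len > 0);
	n = readlink(path, buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';		/* readlink() does not NUL-terminate */
	return 0;
}

/*
 * Create a regular file, call readlink() on it and return the errno
 * that results (0 if readlink unexpectedly succeeded, -1 on setup error).
 */
static int readlink_errno_on_regular_file(void)
{
	char tmpl[] = "/tmp/rlXXXXXX";
	char buf[256];
	int err = 0;
	int fd = mkstemp(tmpl);

	if (fd < 0)
		return -1;
	if (resolve_link(tmpl, buf, sizeof(buf)) < 0)
		err = errno;	/* save before close()/unlink() */
	close(fd);
	unlink(tmpl);
	return err;
}
```

A program that treats any readlink() failure as fatal, instead of falling
back when the error is EINVAL, would break on entries that are not symlinks.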

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: procfs files vs symlink

2022-01-12 Thread Manuel Bouyer
On Wed, Jan 12, 2022 at 07:56:09AM -0500, Christos Zoulas wrote:
> 
> 
> > On Jan 12, 2022, at 7:54 AM, Manuel Bouyer  wrote:
> > 
> > you can still do this, as long as you're not using a linux ln(1) binary.
> 
> Where do you get the reference to the original inode? Try it...

I don't understand. My patch doesn't change the procfs behavior for non-linux
binaries so I can't see the problem.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: procfs files vs symlink

2022-01-12 Thread Manuel Bouyer
On Wed, Jan 12, 2022 at 12:44:57PM -, Christos Zoulas wrote:
> In article ,
> Manuel Bouyer   wrote:
> >-=-=-=-=-=-
> >
> >On Fri, Jan 07, 2022 at 03:20:04PM +0100, Manuel Bouyer wrote:
> >> Hello
> >> I'm trying to get a linux binary to run on NetBSD, as stated in this thread
> >> http://mail-index.netbsd.org/current-users/2022/01/06/msg041891.html
> >> 
> >> Now I hit an issue where the linux process does a readlink() on a procfs
> >> file and gets EINVAL.
> >> It seems that this is because, on linux all files in /proc//fd/ are
> >> symlinks, while on NetBSD they are some kind of hard links.
> >> E.g. on linux:
> >> bip:/dsk/l1/misc/bouyer/HEAD/clean/src>ls -l /proc/$$/fd/
> >> total 0
> >> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 0 -> /dev/null
> >> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 1 -> /dev/null
> >> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 15 -> /dev/pts/11
> >> lrwx-- 1 bouyer ita-iatos 64 Jan  7 14:13 16 -> /dev/pts/11
> >> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 17 -> /dev/pts/11
> >> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 18 -> /dev/pts/11
> >> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 19 -> /dev/pts/11
> >> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 2 -> /dev/null
> >> 
> >> On NetBSD:
> >> armandeche:/local/armandeche1/bouyer>/emul/linux/bin/ls -l /proc/$$/fd/
> >> total 0
> >> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 15
> >> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 16
> >> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 17
> >> 
> >> Any idea on how to properly fix it ?
> >
> >The attached diff changes the procfs behavior to match the linux one, for
> >linux processes:
> >comore:/home/bouyer>ls -l /proc/self/fd/
> >total 1
> >crw--w  1 bouyer  tty5, 0 Jan 11 11:08 0
> >crw--w  1 bouyer  tty5, 0 Jan 11 11:08 1
> >crw--w  1 bouyer  tty5, 0 Jan 11 11:08 2
> >lr-xr-xr-x  1 bouyer  staff   512 Jan 11 11:08 3 -> /home/bouyer
> >
> >ls: /proc/self/fd//4: Invalid argument
> >lr-xr-xr-x  1 bouyer  staff 0 Jan 11 11:08 4
> >comore:/home/bouyer>/emul/linux/bin/ls -l /proc/self/fd/
> >total 0
> >lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 0 -> /dev/ttyp0
> >lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 1 -> /dev/ttyp0
> >lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 2 -> /dev/ttyp0
> >lr-xr-xr-x 1 bouyer staff 0 Jan 11 11:08 3 -> /
> >
> >and my linux binaries seems to work properly now
> >
> >would it be OK to commit ?
> 
> Err, no :-) The previous behavior uses the original inode from the filesystem.
> This means that you can undelete files:

you can still do this, as long as you're not using a linux ln(1) binary.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: procfs files vs symlink

2022-01-12 Thread Manuel Bouyer
On Wed, Jan 12, 2022 at 07:19:14PM +0700, Robert Elz wrote:
> Date:Tue, 11 Jan 2022 22:20:15 +0100
> From:    Manuel Bouyer 
> Message-ID:  
> 
>   | > What causes that EINVAL?
>   |
>   |
>   | I'm not sure (somneone suggested that the file descriptor has been closed
>   | when ls tries to fstat() it, but I can't confirm this).
> 
> That should generate EBADF not EINVAL.  Attempting readlink()
> on something that is not a symlink, and various other possibilities
> like that would be more probable.  EINVAL isn't listed as a possible
> error return from [f]stat ... not that that guarantees that it cannot
> happen, particularly from within emulation code.

Actually it's not fstat() but readlink(), as shown by ktrace.
To me it's not clear why readlink() fails on this file descriptor.
But it's not caused by my patch.
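
For reference, the case mentioned in the quoted text (readlink() on
something that is not a symlink) can be reproduced from userland. This is
only a sketch; the helper name readlink_errno() is made up for
illustration. It returns 0 when readlink(2) succeeds and the errno value
otherwise. POSIX specifies EINVAL when the path exists but is not a
symbolic link.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: 0 if path is a readable symlink, else the errno. */
int
readlink_errno(const char *path)
{
	char buf[1024];

	if (readlink(path, buf, sizeof(buf)) == -1)
		return errno;
	return 0;
}
```

Calling it on a regular file should give EINVAL, and on a missing path
ENOENT, which is one way to tell the two failures apart in a ktrace.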

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: procfs files vs symlink

2022-01-11 Thread Manuel Bouyer
On Tue, Jan 11, 2022 at 08:00:53PM +, David Holland wrote:
> On Tue, Jan 11, 2022 at 11:10:24AM +0100, Manuel Bouyer wrote:
>  > The attached diff changes the procfs behavior to match the linux one, for
>  > linux processes:
>  > comore:/home/bouyer>ls -l /proc/self/fd/
>  > total 1
>  > crw--w  1 bouyer  tty5, 0 Jan 11 11:08 0
>  > crw--w  1 bouyer  tty5, 0 Jan 11 11:08 1
>  > crw--w  1 bouyer  tty5, 0 Jan 11 11:08 2
>  > lr-xr-xr-x  1 bouyer  staff   512 Jan 11 11:08 3 -> /home/bouyer
>  > 
>  > ls: /proc/self/fd//4: Invalid argument
>  > lr-xr-xr-x  1 bouyer  staff 0 Jan 11 11:08 4
> 
> What causes that EINVAL?


I'm not sure (someone suggested that the file descriptor has been closed
when ls tries to fstat() it, but I can't confirm this). Anyway, it also
happens without my patch - see my mail on current-users about the linux binary
issue.

> 
> also beware -- the linux world expects regular files to have canonical
> paths, and that's just not true elsewhere and can't really be papered
> over.

I didn't check all the cases, but it's enough to make my binaries run.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: procfs files vs symlink

2022-01-11 Thread Manuel Bouyer
On Tue, Jan 11, 2022 at 11:34:45AM +0100, Martin Husemann wrote:
> On Tue, Jan 11, 2022 at 11:10:24AM +0100, Manuel Bouyer wrote:
> > +static inline bool
> > +procfs_proc_is_linux_compat(void)
> > +{
> > +   const char *emulname = curlwp->l_proc->p_emul->e_name;
> > +   return (strncmp(emulname, "linux", 5) == 0);
> > +}
> 
> Not a big deal, but wouldn't it be better to give this behaviour a
> symbolic name and use a bit in e_flags for it? This seems to be mostly
> unused so far (or I did something wrong when searching for it).

Maybe, but there's already code like this in procfs. I just cut-n-pasted
it into a function for my needs (and we could use this function in other
places in procfs as well, but that would be a different commit).
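
To make the trade-off concrete, here is a userland stand-in for the check
in the quoted diff. It is a sketch only: emul_name_is_linux() is a
hypothetical name, and it takes the emulation name as an argument instead
of reading curlwp. The point is that strncmp() with a length of 5 is a
prefix match, so any emulation whose name starts with "linux" is accepted.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Userland stand-in for the kernel check: true if the emulation name
 * starts with "linux" (prefix match, not an exact comparison).
 */
bool
emul_name_is_linux(const char *emulname)
{
	return strncmp(emulname, "linux", 5) == 0;
}
```

With this, "linux" and a name such as "linux32" both match, while "netbsd"
does not. A flag bit in e_flags, as suggested in the quoted text, would
make that grouping explicit instead of relying on the name.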

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: procfs files vs symlink

2022-01-11 Thread Manuel Bouyer
On Fri, Jan 07, 2022 at 03:20:04PM +0100, Manuel Bouyer wrote:
> Hello
> I'm trying to get a linux binary to run on NetBSD, as stated in this thread
> http://mail-index.netbsd.org/current-users/2022/01/06/msg041891.html
> 
> Now I hit an issue where the linux process does a readlink() on a procfs
> file and gets EINVAL.
> It seems that this is because, on linux all files in /proc//fd/ are
> symlinks, while on NetBSD they are some kind of hard links.
> E.g. on linux:
> bip:/dsk/l1/misc/bouyer/HEAD/clean/src>ls -l /proc/$$/fd/
> total 0
> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 0 -> /dev/null
> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 1 -> /dev/null
> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 15 -> /dev/pts/11
> lrwx-- 1 bouyer ita-iatos 64 Jan  7 14:13 16 -> /dev/pts/11
> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 17 -> /dev/pts/11
> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 18 -> /dev/pts/11
> lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 19 -> /dev/pts/11
> lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 2 -> /dev/null
> 
> On NetBSD:
> armandeche:/local/armandeche1/bouyer>/emul/linux/bin/ls -l /proc/$$/fd/
> total 0
> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 15
> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 16
> crw--w 1 bouyer tty 3, 13 Jan  7 15:19 17
> 
> Any idea on how to properly fix it ?

The attached diff changes the procfs behavior to match the linux one, for
linux processes:
comore:/home/bouyer>ls -l /proc/self/fd/
total 1
crw--w  1 bouyer  tty5, 0 Jan 11 11:08 0
crw--w  1 bouyer  tty5, 0 Jan 11 11:08 1
crw--w  1 bouyer  tty5, 0 Jan 11 11:08 2
lr-xr-xr-x  1 bouyer  staff   512 Jan 11 11:08 3 -> /home/bouyer

ls: /proc/self/fd//4: Invalid argument
lr-xr-xr-x  1 bouyer  staff 0 Jan 11 11:08 4
comore:/home/bouyer>/emul/linux/bin/ls -l /proc/self/fd/
total 0
lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 0 -> /dev/ttyp0
lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 1 -> /dev/ttyp0
lr-xr-xr-x 1 root   wheel 0 Jan 11 11:08 2 -> /dev/ttyp0
lr-xr-xr-x 1 bouyer staff 0 Jan 11 11:08 3 -> /

and my linux binaries seem to work properly now

would it be OK to commit ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--
Index: sys/miscfs/procfs/procfs.h
===
RCS file: /cvsroot/src/sys/miscfs/procfs/procfs.h,v
retrieving revision 1.80
diff -u -p -u -r1.80 procfs.h
--- sys/miscfs/procfs/procfs.h  29 Apr 2020 07:18:24 -  1.80
+++ sys/miscfs/procfs/procfs.h  11 Jan 2022 10:06:44 -
@@ -213,6 +213,14 @@ struct mount;
 
 struct proc *procfs_proc_find(struct mount *, pid_t);
 bool procfs_use_linux_compat(struct mount *);
+
+static inline bool
+procfs_proc_is_linux_compat(void)
+{
+   const char *emulname = curlwp->l_proc->p_emul->e_name;
+   return (strncmp(emulname, "linux", 5) == 0);
+}
+
 int procfs_proc_lock(struct mount *, int, struct proc **, int);
 void procfs_proc_unlock(struct proc *);
 int procfs_allocvp(struct mount *, struct vnode **, pid_t, pfstype, int);
Index: sys/miscfs/procfs/procfs_vfsops.c
===
RCS file: /cvsroot/src/sys/miscfs/procfs/procfs_vfsops.c,v
retrieving revision 1.110
diff -u -p -u -r1.110 procfs_vfsops.c
--- sys/miscfs/procfs/procfs_vfsops.c   28 Dec 2020 22:36:16 -  1.110
+++ sys/miscfs/procfs/procfs_vfsops.c   11 Jan 2022 10:06:44 -
@@ -343,7 +343,8 @@ procfs_loadvnode(struct mount *mp, struc
 * We make symlinks for directories
 * to avoid cycles.
 */
-   if (vxp->v_type == VDIR)
+   if (vxp->v_type == VDIR ||
+   procfs_proc_is_linux_compat())
goto symlink;
vp->v_type = vxp->v_type;
break;
Index: sys/miscfs/procfs/procfs_vnops.c
===
RCS file: /cvsroot/src/sys/miscfs/procfs/procfs_vnops.c,v
retrieving revision 1.220
diff -u -p -u -r1.220 procfs_vnops.c
--- sys/miscfs/procfs/procfs_vnops.c8 Dec 2021 20:11:54 -   1.220
+++ sys/miscfs/procfs/procfs_vnops.c11 Jan 2022 10:06:44 -
@@ -1142,7 +1142,8 @@ procfs_lookup(void *v)
fvp = fp->f_vnode;
 
/* Don't show directories */
-   if (fp->f_type == DTYPE_VNODE && fvp->v_type != VDIR) {
+   if (fp->f_type == DTYPE_VNODE && fvp->v_type != VDIR &&
+   !procfs_proc_is_linux_compat()) {
vref(fvp);
close

procfs files vs symlink

2022-01-07 Thread Manuel Bouyer
Hello
I'm trying to get a linux binary to run on NetBSD, as stated in this thread
http://mail-index.netbsd.org/current-users/2022/01/06/msg041891.html

Now I hit an issue where the linux process does a readlink() on a procfs
file and gets EINVAL.
It seems that this is because, on linux all files in /proc//fd/ are
symlinks, while on NetBSD they are some kind of hard links.
E.g. on linux:
bip:/dsk/l1/misc/bouyer/HEAD/clean/src>ls -l /proc/$$/fd/
total 0
lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 0 -> /dev/null
lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 1 -> /dev/null
lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 15 -> /dev/pts/11
lrwx-- 1 bouyer ita-iatos 64 Jan  7 14:13 16 -> /dev/pts/11
lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 17 -> /dev/pts/11
lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 18 -> /dev/pts/11
lrwx-- 1 bouyer ita-iatos 64 Jan  7 15:16 19 -> /dev/pts/11
lr-x-- 1 bouyer ita-iatos 64 Jan  7 14:13 2 -> /dev/null

On NetBSD:
armandeche:/local/armandeche1/bouyer>/emul/linux/bin/ls -l /proc/$$/fd/
total 0
crw--w 1 bouyer tty 3, 13 Jan  7 15:19 15
crw--w 1 bouyer tty 3, 13 Jan  7 15:19 16
crw--w 1 bouyer tty 3, 13 Jan  7 15:19 17

Any idea on how to properly fix it ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: SATA rescan?

2021-06-15 Thread Manuel Bouyer
On Tue, Jun 15, 2021 at 09:28:22PM +0200, Edgar Fuß wrote:
> Is there a way (short of re-booting) to re-scan a SATA port for a disc absent 
> (or dysfunctional) during the boot? I.e., something like scsictl rescan?

drvctl -r -a ata_hl atabusX

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Manuel Bouyer
On Wed, May 26, 2021 at 05:35:53PM +1000, matthew green wrote:
> Manuel Bouyer writes:
> > On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> > > bou...@antioche.eu.org (Manuel Bouyer) writes:
> > > 
> > > >Another issue could be mstohz() called with a delay too short;
> > > >mstohz() will round it up to 1 tick.
> > > 
> > > 
> > > #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> > > 
> > > If mstohz() would round up to full ticks, it could actually avoid
> > > some pitfalls. But it doesn't.
> >
> > indeed. But in this case the problem will show up with smaller hz, not
> > larger. So I think we can rule out this case.
> 
> it's hztoms() that is the problem here.
> 
> #  define hztoms(t) ((unsigned int)(((t) + 0ul) * 1000ul / hz))
> 
> ah... this is the new one.  the old one was:
> 
> #define hztoms(t) \
>  (__predict_false((t) >= 0x2) ? \
>  ((t +0u) / hz) * 1000u : \
>  ((t +0u) * 1000u) / hz)
> 
> looks like christos fixed it in 2019.

I'm not sure how christos's change could be a fix. I introduced hztoms()
and mstohz() to avoid integer overflow for large values, and it looks like
christos reintroduced it for 32-bit platforms :(
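
The overflow can be shown in userland by forcing 32-bit arithmetic with
uint32_t. This is a sketch under assumptions: both function names are
hypothetical, hz is passed as a parameter, and the guard threshold is
computed from UINT32_MAX rather than copied from the old macro.

```c
#include <stdint.h>

/* Multiply first: exact for small t, but t * 1000 wraps in 32 bits. */
uint32_t
hztoms_mulfirst32(uint32_t t, uint32_t hz)
{
	return (uint32_t)(t * UINT32_C(1000)) / hz;
}

/* Divide first when t * 1000 would wrap: coarser, but no overflow. */
uint32_t
hztoms_guarded32(uint32_t t, uint32_t hz)
{
	if (t > UINT32_MAX / UINT32_C(1000))
		return (t / hz) * UINT32_C(1000);
	return (t * UINT32_C(1000)) / hz;
}
```

For t = 5000000 ticks at hz = 100 the correct answer is 50000000 ms. The
guarded version returns that, while the multiply-first version wraps and
returns 7050327.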

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-26 Thread Manuel Bouyer
On Tue, May 25, 2021 at 10:46:04PM -, Michael van Elst wrote:
> bou...@antioche.eu.org (Manuel Bouyer) writes:
> 
> >Another issue could be mstohz() called with a delay too short;
> >mstohz() will round it up to 1 tick.
> 
> 
> #  define mstohz(ms) ((unsigned int)((ms + 0ul) * hz / 1000ul))
> 
> If mstohz() would round up to full ticks, it could actually avoid
> some pitfalls. But it doesn't.

indeed. But in this case the problem will show up with smaller hz, not
larger. So I think we can rule out this case.
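
A small sketch of the arithmetic, with hz passed explicitly so that it
can be tried in userland. mstohz_trunc() follows the quoted macro;
mstohz_roundup() is a hypothetical variant, not something in the tree.

```c
/* Same arithmetic as the quoted macro, with hz passed explicitly. */
unsigned int
mstohz_trunc(unsigned long ms, unsigned long hz)
{
	return (unsigned int)(ms * hz / 1000ul);
}

/* Hypothetical variant: round up, so a non-zero delay is >= 1 tick. */
unsigned int
mstohz_roundup(unsigned long ms, unsigned long hz)
{
	return (unsigned int)((ms * hz + 999ul) / 1000ul);
}
```

With hz = 100, a 5 ms delay truncates to 0 ticks but rounds up to 1.
With hz = 8000, 1 ms is 8 ticks either way, which is why truncation to 0
is a small-hz problem rather than a large-hz one.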

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: 9.1: boot-time delay? [WORKAROUND FOUND]

2021-05-25 Thread Manuel Bouyer
On Tue, May 25, 2021 at 04:04:56PM -0400, Mouse wrote:
> > I suppose it's not possible to configure ahcisata in the BIOS on the
> > long-delay machines?
> 
> Thank you very much!  Yes.  That is possible - and it fixes the delay.
> I would not have thought to look for that; I would not have expected
> piixide and ahcisata to be similar enough that a BIOS setting could
> personality-swap between them.
> 
> > I'm guessing this is some quirk of the pciide(4) and piixide(4)
> > drivers.
> 
> Sounds like; they presumably have a bug somewhere in some delay
> calculation.  But at least I have something approaching a workaround.

The reset and probe procedure is different between ide and ahci.
The problem is probably in this area.

Actually the root cause may be a delay too short, not too long.
AFAIK the code uses mstohz() everywhere to compute tick values, except
maybe a few cases where we want a really short delay and we use 1.
This is where a very high HZ may cause a too short delay.
Another issue could be mstohz() called with a delay too short;
mstohz() will round it up to 1 tick.

But I think you will need to instrument the ide probe in dev/ic/wdc.c.
It's been a while since I last looked at this, but I think the code
you want to look at is wdc_drvprobe().

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: 9.1: boot-time delay?

2021-05-25 Thread Manuel Bouyer
On Tue, May 25, 2021 at 10:21:40PM +0200, Christoph Badura wrote:
> On Wed, May 26, 2021 at 05:02:52AM +1000, matthew green wrote:
> > > > +optionsHZ=8000
> > this can become a problem due to integer division.
> > 
> > any number of ticks less than hz (8000) will be rounded
> > down to 0 in a number of places now, where as before it
> > was only less than 100.  i've seen this trip up in the
> > kernel before, and sometimes that '0' means 'poll', and
> > sometimes it means 'sleep forever'.
> 
> > a lot of places in the kernel *do* avoid (eg, with adding
> > hz-1 and then dividing by hz) but there are a number that
> > do not...
> 
> So, should we introduce a CPP macro or an inline function that abstracts
> the common code away in a way that avoids such rounding down and
> produces correct results?

We already have mstohz() and hztoms()

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: ZFS L2ARC on NetBSD-9

2021-04-20 Thread Manuel Bouyer
On Mon, Apr 19, 2021 at 05:59:39PM +, Andrew Parker wrote:
> [...]
> 
> Oops.  I completely misread how the return value of l2arc_write_interval is 
> used so that patch doesn't make any sense.  But adding the printf suggested 
> earlier results in this just after boot:
> 
> 
> [14.600107] WARNING: ZFS on NetBSD is under development
> [14.650039] ZFS filesystem version: 5
> [14.650039] wait 100
> [15.690043] wait 96
> [17.840054] wait 0
> 
> The l2arc then seems to hang indefinitely as I never see another l2 feed 
> after the "wait 0" message:


yes, because '0' means "infinity" for cv_timedwait().
This should be changed to make sure that cv_timedwait() is called with
a timeout value of at least 1, or (maybe better) skip cv_timedwait() if
the value is 0.
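
The first option could look like the helper below. It is a sketch only:
clamp_wait_ticks() is a hypothetical name and not part of the ZFS or
kernel code, and the caller would pass its result to cv_timedwait()
instead of the raw value.

```c
/*
 * Hypothetical helper: never hand 0 (or a negative value) to a wait
 * primitive where a timeout of 0 means "wait forever".
 */
int
clamp_wait_ticks(int ticks)
{
	return ticks < 1 ? 1 : ticks;
}
```

With the values from the quoted log, 100 and 96 pass through unchanged
and the final 0 becomes a one-tick wait.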

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: I think I've found why Xen domUs can't mount some file-backed disk images! (vnd(4) hides labels!)

2021-04-11 Thread Manuel Bouyer
On Sun, Apr 11, 2021 at 01:28:46PM -, Michael van Elst wrote:
> bou...@antioche.eu.org (Manuel Bouyer) writes:
> 
> >The size of the disk is indeed 790528 in the xenstore (and the dom0's
> >kernel message) but I don't know where this comes from.
> 
> >The file is definitively 791121 sectors long:
> 
> vnd computes a fake geometry based on 1MB cylinders.

Why does this truncate the total number of sectors of the vnd ?
There's no reason to do so.
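
The numbers are at least consistent with rounding down to whole
cylinders. A 1MB cylinder of 512-byte sectors is 2048 sectors, and
791121 rounded down to a multiple of 2048 is 790528. The sketch below is
just that arithmetic (the function name is hypothetical); it has not
been checked against the vnd source.

```c
#include <stdint.h>

/* Sectors that fit in whole cylinders; the remainder is dropped. */
uint64_t
whole_cylinder_sectors(uint64_t total_sectors, uint64_t sectors_per_cyl)
{
	return (total_sectors / sectors_per_cyl) * sectors_per_cyl;
}
```

whole_cylinder_sectors(791121, 2048) gives 790528, i.e. 386 cylinders,
with the last 593 sectors of the image left out.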

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: I think I've found why Xen domUs can't mount some file-backed disk images! (vnd(4) hides labels!)

2021-04-11 Thread Manuel Bouyer
On Sat, Apr 10, 2021 at 03:17:35PM -0700, Greg A. Woods wrote:
> [...]
> # fdisk -F /images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img
> Disk: /images/FreeBSD-12.2-RELEASE-amd64-mini-memstick.img
> NetBSD disklabel disk geometry:
> cylinders: 49, heads: 255, sectors/track: 63 (16065 sectors/cylinder)
> total sectors: 791121, bytes/sector: 512
> 
> BIOS disk geometry:
> cylinders: 49, heads: 255, sectors/track: 63 (16065 sectors/cylinder)
> total sectors: 791121
> 
> Partitions aligned to 16065 sector boundaries, offset 63
> 
> Partition table:
> 0: EFI system partition (sysid 239)
> start 1, size 1600 (1 MB, Cyls 0/0/2-0/25/26)
> 1: FreeBSD or 386BSD or old NetBSD (sysid 165)
> start 1601, size 789520 (386 MB, Cyls 0/25/27-49/62/30), Active
> 2: 
> 3: 
> First active partition: 1
> Drive serial number: 2425393296 (0x90909090)
> 
> # fdisk vnd0
> fdisk: primary partition table invalid, no magic in sector 0
> fdisk: Cannot determine the number of heads
> Disk: /dev/rvnd0d
> NetBSD disklabel disk geometry:
> cylinders: 4096, heads: 64, sectors/track: 32 (2048 sectors/cylinder)
> total sectors: 8388608, bytes/sector: 512
> 
> BIOS disk geometry:
> cylinders: 522, heads: 255, sectors/track: 63 (16065 sectors/cylinder)
> total sectors: 8388608
> 
> Partitions aligned to 16065 sector boundaries, offset 63
> 
> Partition table:
> 0: 
> 1: 
> 2: 
> 3: 
> Bootselector disabled.
> No active partition.
> Drive serial number: 0 (0x)

I can't reproduce this fdisk/disklabel on netbsd-9 nor -current.
fdisk on vnd0 gives me the same partition table as on the file.
FreeBSD fails to boot with the same error message.
The size of the disk is indeed 790528 in the xenstore (and the dom0's
kernel message) but I don't know where this comes from.
xbdback uses getdiskinfo() to get the device's size.
In vnd, the size comes from a VOP_GETATTR() on the file, so it looks
like VOP_GETATTR() returns the wrong size.
The file is definitely 791121 sectors long:
#dd if=FreeBSD-12.2-RELEASE-amd64-mini-memstick.img.orig of=FreeBSD-12.2-RELEASE-amd64-mini-memstick.img
791121+0 records in
791121+0 records out
#ls -l FreeBSD-12.2-RELEASE-amd64-mini-memstick.img
-rw-r--r--  1 root  wheel  405053952 Apr 11 11:56 FreeBSD-12.2-RELEASE-amd64-mini-memstick.img
#expr 405053952 / 512
791121

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: regarding the changes to kernel entropy gathering

2021-04-05 Thread Manuel Bouyer
On Mon, Apr 05, 2021 at 09:30:16AM -0700, Greg A. Woods wrote:
> At Mon, 5 Apr 2021 10:46:19 +0200, Manuel Bouyer  
> wrote:
> Subject: Re: regarding the changes to kernel entropy gathering
> >
> > If I understood it properly, there's no need for such a knob.
> > echo 0123456789abcdef0123456789abcdef > /dev/random
> >
> > will get you back to the state we had in netbsd-9, with (pseudo-)randomness
> > collected from devices.
> 
> Well, no, not quite so much randomness.  Definitely pseudo though!
> 
> My patch on the other hand can at least inject some real randomness into
> the entropy pool, even if it is observable or influenceable by nefarious
> dudes who might be hiding out in my garage.

As I understand it, once /dev/random has been seeded, randomness
from other devices will be taken into account (with or without your patch).

In your case, /dev/random reads did block because it didn't get
an initial seed.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: regarding the changes to kernel entropy gathering

2021-04-05 Thread Manuel Bouyer
On Sun, Apr 04, 2021 at 06:47:23PM -0700, Brian Buhrow wrote:
> Hello.  As I understand it, Greg ran into this problem on a xen domu.  In 
> checking my NetBSD-9
> system running as a domu under xen-4.14.1, there is no rdrand or rdseed 
> feature exposed to
> domu's by xen.  This observation is confirmed by looking at the xen command 
> line reference
> page: https://xenbits.xen.org/docs/unstable/misc/xen-command-line.html

Actually, if the CPU supports rdrand or rdseed, they are available
to domUs:
cpu0: Running on hypervisor: Xen
cpu0: "Intel(R) Xeon(R) Silver 4208 CPU @ 2.10GHz"
cpu0: Intel Xeon Scalable (Skylake, Cascade Lake, Copper Lake) (686-class)
cpu0: family 0x6 model 0x55 stepping 0x7 (id 0x50657)
[...]
cpu0: features1 0xf6f81203
cpu0: features2 0x810
cpu0: features5 0xd18f2369

Source Bits Type  Flags
xbd04010273 disk estimate, collect, v, t, dt
xennet0   0 net  v, t, dt
cpu0  88774 vm   estimate, collect, v, t, dv
system-power  0 power estimate, collect, v, t, dt
autoconf  1 ???  estimate, collect, t, dt
printf0 ???  collect
callout 108 skew estimate, collect, v, dv
cpurng 4096 rng  estimate, collect, v


-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: regarding the changes to kernel entropy gathering

2021-04-05 Thread Manuel Bouyer
On Mon, Apr 05, 2021 at 01:16:56AM +, RVP wrote:
> [...]
> Hmm. I have to say, that now I find myself not disagreeing with Greg's
> point of view: Maybe NetBSD's default is too strict and a knob like
> kern.entropy.use_pooh_poohed_sources=1 would not be a bad thing for
> some users--with all appropriate sysinst warnings of course.

If I understood it properly, there's no need for such a knob.
echo 0123456789abcdef0123456789abcdef > /dev/random

will get you back to the state we had in netbsd-9, with (pseudo-)randomness
collected from devices.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: UVM behavior under memory pressure

2021-04-01 Thread Manuel Bouyer
On Thu, Apr 01, 2021 at 01:13:05PM -0700, Greg A. Woods wrote:
> At Thu, 1 Apr 2021 21:03:37 +0200, Manuel Bouyer  
> wrote:
> Subject: UVM behavior under memory pressure
> >
> > Of course the system is very slow
> > Shouldn't UVM choose, in this case, to reclaim pages from the file cache
> > for the process data ?
> > I'm using the default vm.* sysctl values.
> 
> I almost never use the default vm.* values.
> 
> I would guess the main problem for your system's memory requirements, at
> the time you showed it, is that the default for vm.anonmin is way too
> low and so raising vm.anonmin might help.  If vm.anonmin isn't high
> enough then the pager won't sacrifice other requirements already in play
> for anon pages.

Yes, I understand this. But, in an emergency situation like this one (there
is no free ram, swap is full, openscad eventually gets killed),
I would expect the pager to reclaim pages where it can;
like file cache (down to vm.filemin, I agree it shouldn't go down to 0).

In my case, vm.anonmax is at 80%, and I suspect it was not reached
(I tried to increase it to 90% but this didn't change anything).

I don't know what was in the file cache; in the meantime, its usage is down
to 39M. Maybe firefox had some background maintenance running ...
And now openscad can complete its rendering :)

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


UVM behavior under memory pressure

2021-04-01 Thread Manuel Bouyer
hello
on a system running netbsd-9 from mid-august, I ended up in a state where
the system has little free memory, no free swap and almost 40% of RAM used
by the file cache:

load averages:  9.00,  5.02,  2.86;   up 0+11:30:5620:57:39
97 processes: 2 runnable, 91 sleeping, 1 stopped, 3 on CPU
CPU states:  0.7% user,  0.0% nice, 96.5% system,  0.0% interrupt,  2.6% idle
Memory: 4987M Act, 2436M Inact, 123M Wired, 198M Exec, 2918M File, 4216K Free
Swap: 520M Total, 520M Used, 4K Free

  PID USERNAME PRI NICE   SIZE   RES STATE  TIME   WCPUCPU COMMAND
0 root   00 0K   30M CPU/3 11:27  0.00%   122% [system]
 1009 bouyer430  2978M  275M parked/0   9:32 27.78% 27.78% firefox68
 1519 bouyer260  2583M  141M RUN/0  0:37 20.90% 20.90% firefox68
 1243 bouyer250  2912M  350M RUN/2  2:56 60.96% 20.12% firefox68
 7341 bouyer260  3221M 2807M CPU/2  5:07 17.09% 17.09% openscad
  729 bouyer760   202M   39M select/3  26:03 16.65% 16.65% X

Of course the system is very slow 
Shouldn't UVM choose, in this case, to reclaim pages from the file cache
for the process data ?
I'm using the default vm.* sysctl values.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: nothing contributing entropy in Xen domUs? or dom0!!!

2021-04-01 Thread Manuel Bouyer
On Thu, Apr 01, 2021 at 04:13:59AM +, RVP wrote:
> > [...]
> 
> Does this /etc/entropy-file match what's there in your /boot.cfg?

irrelevant for Xen, as Xen uses the multiboot protocol.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-04-01 Thread Manuel Bouyer
On Wed, Mar 31, 2021 at 09:58:48PM -0400, Thor Lancelot Simon wrote:
> On Wed, Mar 31, 2021 at 11:24:07AM +0200, Manuel Bouyer wrote:
> > On Tue, Mar 30, 2021 at 10:42:53PM +, Taylor R Campbell wrote:
> > > 
> > > There are no virtual RNG devices on the system in question, according
> > > to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
> > > taught to expose a virtio-rng device to the guest?
> > 
> > There is no such thing in Xen.
> 
> Is the CPU so old that it doesn't have RDRAND / RDSEED, or is Xen perhaps
> masking these CPU features from the guest?

Is there an easy way to test, on a netbsd-9 system, if the instruction is
present and working ?
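
One possible userland check for "present" is to ask CPUID directly. The
sketch below assumes a GCC or clang toolchain (for <cpuid.h>) and uses a
made-up function name. It only reports whether the RDRAND feature bit
(CPUID leaf 1, ECX bit 30) is advertised to the guest; it does not prove
that the instruction returns good data.

```c
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* 1 if CPUID advertises RDRAND, 0 if not, -1 if not built for x86. */
int
cpu_has_rdrand(void)
{
#if defined(__x86_64__) || defined(__i386__)
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
		return 0;
	return (int)((ecx >> 30) & 1);	/* CPUID.01H:ECX bit 30 */
#else
	return -1;
#endif
}
```

Run inside the domU, a result of 0 would suggest the hypervisor is
masking the feature or the CPU lacks it.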

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-31 Thread Manuel Bouyer
On Tue, Mar 30, 2021 at 10:42:53PM +, Taylor R Campbell wrote:
> > Date: Tue, 30 Mar 2021 23:53:43 +0200
> > From: Manuel Bouyer 
> > 
> > On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> > > [...]
> > > 
> > > Perhaps the answer is that nothing seems to be contributing anything to
> > > the entropy pool.  No matter what device I exercise, none of the numbers
> > > in the following changes:
> > 
> > yes, it's been this way since the rnd rototill. Virtual devices are
> > not trusted.
> > 
> > The only way is to manually seed the pool.
> 
> This is false.  The virtual RNG drivers (viornd(4) [1], rump
> hyperentropy [2], maybe others) all assume the VM host provides
> samples with full entropy.  This has always been the case, and this
> didn't change at all in the rototill last year.
> 
> There are no virtual RNG devices on the system in question, according
> to the quoted `rndctl -l' output.  Perhaps the VM host needs to be
> taught to expose a virtio-rng device to the guest?

There is no such thing in Xen.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: nothing contributing entropy in Xen domUs? (causing python3.7 rebuild to get stuck in kernel in "entropy" during an "import" statement)

2021-03-30 Thread Manuel Bouyer
On Tue, Mar 30, 2021 at 02:40:18PM -0700, Greg A. Woods wrote:
> [...]
> 
> Perhaps the answer is that nothing seems to be contributing anything to
> the entropy pool.  No matter what device I exercise, none of the numbers
> in the following changes:

yes, it's been this way since the rnd rototill. Virtual devices are
not trusted.

The only way is to manually seed the pool.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: checking for a closed socket

2021-02-02 Thread Manuel Bouyer
On Tue, Feb 02, 2021 at 07:47:09PM +, Roy Marples wrote:
> On 02/02/2021 19:43, Manuel Bouyer wrote:
> > On Tue, Feb 02, 2021 at 07:39:39PM +, Roy Marples wrote:
> > > On 02/02/2021 18:20, Manuel Bouyer wrote:
> > > > Hello,
> > > > I've been debugging an issue wuth Xen, where xenstored loops at 100%
> > > > CPU on poll(2).
> > > > after code analysis it's looping on closed Unix socket desriptors.
> > > >   From what I understood the code expect poll(2) to return something
> > > > different from POLLIN when the remote end of the socket is
> > > > closed (it checks for (~(POLLOUT|POLLIN)) to it could be either
> > > > POLLERR or POLLHUP I guess - or eventually POLLRDHUP which we don't 
> > > > have).
> > > > 
> > > > Who is right here, linux or NetBSD (linux claims to be posix, while
> > > > our man page doens't mention it) ?
> > > > 
> > > > Is there a way to check if a connection has been closed without a 
> > > > read() ?
> > > 
> > > Oddly enough I was looking at this in poll myself recently.
> > > As it turns out, if the socket has a remote end (which sockets do) then 
> > > the
> > > EOF or hangup needs to be read via POLLIN.
> > > If thre is no remote end (like say a file descriptor) then you get 
> > > POLLHUP.
> > > 
> > > This is common with other OS's and select(2).
> > > The rationale is that the peer closing the socket (ie shutdown()) need is
> > > sent on the wire and is thus a read(), whereas the local file desciptor is
> > > an event.
> > > 
> > > So short answer - no. You need to read() and get the error that way.
> > 
> > I didn't mention it specifically, but the socket used by Xen is a
> > Unix (connected) socket. but I guess it's no different from network sockets.
> 
> Correct.
> 
> If you use kqueue(2) and EVFILT_READ on a socket then our man page claims
> that EV_EOF is set in flags.
> 
> Is that of any use to you?
> It does mean using a BSD only interface though, changing from poll to kqueue.

thanks. This will require code reorg but it may be an option.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: checking for a closed socket

2021-02-02 Thread Manuel Bouyer
On Tue, Feb 02, 2021 at 02:07:01PM -0500, Mouse wrote:
> [...]
> > Is there a way to check if a connection has been closed without a
> > read() ?
> 
> ioctl(FIONREAD), perhaps?  But I suspect you mean "without an
> additional syscall", in which case I suspect there is not.

The additional syscall isn't an issue; the extra read may be.
Hmm, now that I'm thinking about it, as poll(2) won't return if
POLLIN isn't in the watched events, that won't work anyway.

> 
> I think the theory is: why does it matter?  If you're not going to try
> to do I/O on it, then why do you care?  And if you are, then can't the
> check for the peer having closed be implicit in the I/O attempt?
> 
> Is there some reason it's difficult to do it that way?

The code can put a socket in "ignoring" state for various reasons
(some of them are countermeasures against DoS, I guess). In this state
the code doesn't want to read it, but still wants to be notified when the
remote end is closed (or the file descriptor will stay forever).

I think we should close the socket in any case (read error, or protocol
error) but this is something I have to check with Xen developers.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: checking for a closed socket

2021-02-02 Thread Manuel Bouyer
On Tue, Feb 02, 2021 at 07:39:39PM +, Roy Marples wrote:
> On 02/02/2021 18:20, Manuel Bouyer wrote:
> > Hello,
> > I've been debugging an issue with Xen, where xenstored loops at 100%
> > CPU on poll(2).
> > After code analysis, it's looping on closed Unix socket descriptors.
> > From what I understood, the code expects poll(2) to return something
> > different from POLLIN when the remote end of the socket is
> > closed (it checks for (~(POLLOUT|POLLIN)), so it could be either
> > POLLERR or POLLHUP I guess - or possibly POLLRDHUP, which we don't have).
> > 
> > Who is right here, Linux or NetBSD (Linux claims to be POSIX, while
> > our man page doesn't mention it) ?
> > 
> > Is there a way to check if a connection has been closed without a read() ?
> 
> Oddly enough I was looking at this in poll myself recently.
> As it turns out, if the socket has a remote end (which sockets do) then the
> EOF or hangup needs to be read via POLLIN.
> If there is no remote end (like say a file descriptor) then you get POLLHUP.
> 
> This is common with other OS's and select(2).
> The rationale is that the peer closing the socket (i.e. shutdown()) is
> sent on the wire and is thus a read(), whereas the local file descriptor is
> an event.
> 
> So short answer - no. You need to read() and get the error that way.

I didn't mention it specifically, but the socket used by Xen is a
Unix (connected) socket, but I guess it's no different from network sockets.
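
For illustration only, here is a minimal sketch of a check that copes with
both conventions described above: systems that flag the hangup in revents,
and systems that only report POLLIN and leave the EOF to be read. The
function names peer_closed() and peer_closed_selftest() are hypothetical and
are not taken from xenstored. It peeks with MSG_PEEK so that no data is
consumed, and it assumes MSG_DONTWAIT is available.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait up to timeout_ms for activity on a connected
 * socket. Returns 1 if the peer has closed, 0 if the socket still looks
 * open (timeout or data pending), -1 on error with errno set.
 */
static int
peer_closed(int fd, int timeout_ms)
{
	struct pollfd pfd;
	char c;
	ssize_t n;
	int r;

	assert(fd >= 0);
	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	r = poll(&pfd, 1, timeout_ms);
	if (r <= 0)
		return r;	/* 0: timeout, -1: poll(2) failed */

	if (pfd.revents & POLLNVAL) {
		errno = EBADF;
		return -1;
	}
	/* Systems that report the hangup as an event of its own. */
	if (pfd.revents & (POLLHUP | POLLERR))
		return 1;

	/* Systems that only report POLLIN: peek, so nothing is consumed. */
	n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);
	if (n == 0)
		return 1;	/* EOF: the peer closed */
	if (n > 0 || errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
		return 0;	/* data pending, or nothing to report yet */
	return -1;
}

/* Self-check on a socketpair: 0 if both observations match expectations. */
static int
peer_closed_selftest(void)
{
	int sv[2], before, after;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;
	before = peer_closed(sv[0], 0);		/* peer open, no data */
	close(sv[1]);
	after = peer_closed(sv[0], 100);	/* peer gone */
	close(sv[0]);
	return (before == 0 && after == 1) ? 0 : 1;
}
```

Note the limitation: on a POLLIN-only system, unread data queued ahead of the
EOF hides the close until that data has been drained, so this does not by
itself cover a socket that is deliberately left unread.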

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


checking for a closed socket

2021-02-02 Thread Manuel Bouyer
Hello,
I've been debugging an issue with Xen, where xenstored loops at 100%
CPU on poll(2).
After code analysis, it's looping on closed Unix socket descriptors.
From what I understood, the code expects poll(2) to return something
different from POLLIN when the remote end of the socket is
closed (it checks for (~(POLLOUT|POLLIN)), so it could be either
POLLERR or POLLHUP I guess - or possibly POLLRDHUP, which we don't have).

Who is right here, Linux or NetBSD (Linux claims to be POSIX, while
our man page doesn't mention it) ?

Is there a way to check if a connection has been closed without a read() ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: cmake hangs on kqueue

2021-01-12 Thread Manuel Bouyer
On Tue, Jan 12, 2021 at 01:20:39PM +0100, Martin Husemann wrote:
> On Tue, Jan 12, 2021 at 01:11:00PM +0100, Manuel Bouyer wrote:
> > I think I've seen some mails about a similar problem in the past few months
> > but I don't remember the details (and couldn't find a PR about it either).
> 
> That was supposed to be fixed by ticket #907, which got pulled up on
> May 13 2020.

Thanks, I thought it was a kernel issue.
We have these fixes in the build environment.

The build env makes heavy use of tmpfs (the packages $WORKDIR are in
a large tmpfs for example), could it be related ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


cmake hangs on kqueue

2021-01-12 Thread Manuel Bouyer
Hello,
on a box building packages, running with a -current amd64 kernel and
9.0_STABLE userland, I see pbulk build hanging.
All related processes are in wait state, except cmake which is in kqueue,
and this is reproducible (if I kill this cmake instance, pbulk will
proceed with the next package until it hangs on another cmake instance):
 100  9698 22944 57478  78  0 127384   8620 kqueue  Il+  pts/0   0:00.04 /usr/pkg/bin/cmake -E cmake_autogen /work/graphics/gwenview/work/gwenview-20.04.1/_KDE_build/tests/auto/CMakeFiles/imagescalertest_autogen.dir/AutogenInfo.json

The kernel is
9.99.77 built on Sun Jan 10 08:56:13 UTC 2021 (I guess from up to date
sources, but I didn't build it myself and I don't have access to the sources).
Userland is
NetBSD 9.0_STABLE/amd64
  Build date   Thu Jun 11 22:49:34 UTC 2020

I think I've seen some mails about a similar problem in the past few months
but I don't remember the details (and couldn't find a PR about it either).

Does this ring a bell for anyone ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: NVMM missing opcode REPE CMPS implementation

2020-11-15 Thread Manuel Bouyer
On Sun, Nov 15, 2020 at 12:23:00PM +0100, Reinoud Zandijk wrote:
> [...]
> 
> IIRC, XEN uses Qemu too (regretfully) :-/

It depends. It uses qemu for (PV)HVM guests only. PV or PVH guests don't
need it.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Kernel panic with usbverbose (?)

2020-10-10 Thread Manuel Bouyer
On Sat, Oct 10, 2020 at 02:30:43PM +0200, BERTRAND Joël wrote:
> Michael van Elst a écrit :
> > joel.bertr...@systella.fr (=?UTF-8?Q?BERTRAND_Jo=c3=abl?=) writes:
> > 
> >>No. This PID is a kernel thread (ioflush). This morning, I have seen
> >> several panics in ioflush.
> > 
> > Kernel threads all run as PID=0.
> 
>   Of course. I don't know what was 160.1. But since this morning, I've
> only seen panics in 0.175. Another one :
> 
> [  8900.078912] sd0d: error writing fsbn 1151945 of
> 1151945-1151961 (sd0 bn 1151945; cn 5622287 tn 36 sn 17)
> [  8900.078912] sd0d: error writing fsbn 1151962 of
> 1151962-11514445008 (sd0 bn 1151962; cn 5622287 tn 37 sn 2)
> [  8900.078912] sd0d: error writing fsbn 11514445009 of
> 11514445009-11514445072 (sd0 bn 11514445009; cn 5622287 tn 38 sn 17)
> [  8900.078912] sd0d: error writing fsbn 11514445073 of
> 11514445073-11514445089 (sd0 bn 11514445073; cn 5622287 tn 40 sn 17)
> [  8900.078912] sd0d: error writing fsbn 11514445090 of
> 11514445090-11514445104 (sd0 bn 11514445090; cn 5622287 tn 41 sn 2)
> [  8900.078912] sd0d: error writing fsbn 11514445105 of
> 11514445105-11514445168 (sd0 bn 11514445105; cn 5622287 tn 41 sn 17)
> [  8900.078912] sd0d: error writing fsbn 11514445169 (sd0 bn
> 11514445169; cn 5622287 tn 43 sn 17)
> [  8909.342671] uvm_fault(0x8151f760, 0xba0020ad, 1) -> e
> [  8909.342671] fatal page fault in supervisor mode
> [  8909.342671] trap type 6 code 0 rip 0x8026aee8 cs 0x8 rflags
> 0x10286 cr2 0xba0020ad0070 ilevel 0 rsp 0xba013cc18b90
> [  8909.342671] curlwp 0x9974adb6e900 pid 0.175 lowest kstack
> 0xba013cc162c0
> [  8909.342671] panic: trap
> [  8909.342671] cpu5: Begin traceback...
> [  8909.342671] vpanic() at netbsd:vpanic+0x160
> [  8909.342671] snprintf() at netbsd:snprintf
> [  8909.342671] startlwp() at netbsd:startlwp
> [  8909.342671] alltraps() at netbsd:alltraps+0xbb
> [  8909.342671] dk_start() at netbsd:dk_start+0x102
> [  8909.342671] spec_strategy() at netbsd:spec_strategy+0xa7
> [  8909.342671] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x4c
> [  8909.352675] dkstart() at netbsd:dkstart+0x184
> [  8909.352675] spec_strategy() at netbsd:spec_strategy+0xa7
> [  8909.352675] VOP_STRATEGY() at netbsd:VOP_STRATEGY+0x4c
> [  8909.352675] wapbl_buffered_write_async() at
> netbsd:wapbl_buffered_write_async+0x7d
> [  8909.352675] wapbl_buffered_write() at netbsd:wapbl_buffered_write+0xdf
> [  8909.352675] wapbl_circ_write() at netbsd:wapbl_circ_write+0x103
> [  8909.352675] wapbl_flush() at netbsd:wapbl_flush+0x26f
> [  8909.352675] ffs_sync() at netbsd:ffs_sync+0x20a
> [  8909.362679] VFS_SYNC() at netbsd:VFS_SYNC+0x35
> [  8909.362679] sched_sync() at netbsd:sched_sync+0x98
> [  8909.362679] cpu5: End traceback...

So your disk is bad, with write errors, and it looks like there is
a bug in dk in this case

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: /dev/random issue

2020-10-01 Thread Manuel Bouyer
On Thu, Oct 01, 2020 at 09:39:18AM -0700, Paul Goyette wrote:
> 
> 
> > > On another machine with working random number generator (nearly all modernish
> > > amd64 machines have that) do:
> > > 
> > >   dd if=/dev/random of=/tmp/file bs=32 count=1
> > > 
> > > then scp the file over and dd it into /dev/random:
> > > 
> > >   dd if=/tmp/file of=/dev/random bs=32 count=1
> > > 
> > > This will be preserved across reboots, so it is a one-time only fix.
> > 
> > OK. But how is it preserved across reboot ? Where does the kernel store it ?
> 
> Shutdown process will store a new seed file

Ah OK, so it's preserved on shutdown(8), not reboot(2),
which basically means that one should not use reboot, halt or poweroff
any more ...

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: /dev/random issue

2020-10-01 Thread Manuel Bouyer
On Thu, Oct 01, 2020 at 06:11:20PM +0200, Martin Husemann wrote:
> On Thu, Oct 01, 2020 at 05:57:12PM +0200, Manuel Bouyer wrote:
> > Source Bits Type  Flags
> > /dev/random   0 ???  estimate, collect, v
> [..]
> > seed  0 ???  estimate, collect, v
> 
> No random number generator and you did not seed the machine.

That doesn't explain why the other sources of entropy, which were working
before, are not working any more.

> 
> On another machine with working random number generator (nearly all modernish
> amd64 machines have that) do:
> 
>   dd if=/dev/random of=/tmp/file bs=32 count=1
> 
> then scp the file over and dd it into /dev/random:
> 
>   dd if=/tmp/file of=/dev/random bs=32 count=1
> 
> This will be preserved across reboots, so it is a one-time only fix.

OK. But how is it preserved across reboot ? Where does the kernel store it ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


/dev/random issue

2020-10-01 Thread Manuel Bouyer
Hello,
I just got what looks like a /dev/random issue on HEAD.
A python process, part of the glib2 build, hangs on entropy.
I have enabled all the possible sources but rndctl shows '0'
for everything:
Source Bits Type  Flags
/dev/random   0 ???  estimate, collect, v
uhid1 0 tty  estimate, collect, v, t, dt
uhid0 0 tty  estimate, collect, v, t, dt
ums0  0 tty  estimate, collect, v, t, dt
ukbd0 0 tty  estimate, collect, v, t, dt
wd0   0 disk estimate, collect, v, t, dt
cpu3  0 vm   estimate, collect, v, t, dv
cpu2  0 vm   estimate, collect, v, t, dv
cpu1  0 vm   estimate, collect, v, t, dv
cpu0  0 vm   estimate, collect, v, t, dv
re0   0 net  estimate, collect, v, t, dt
aibs0--+12-Volt   0 power estimate, collect, v, t, dv, dt
aibs0--+5-Volta   0 power estimate, collect, v, t, dv, dt
aibs0--+3.3-Vol   0 power estimate, collect, v, t, dv, dt
aibs0-Vcore-Vol   0 power estimate, collect, v, t, dv, dt
aibs0-MB-Temper   0 env  estimate, collect, v, t, dv, dt
aibs0-CPU-Tempe   0 env  estimate, collect, v, t, dv, dt
aibs0-POWER-FAN   0 env  estimate, collect, v, t, dv, dt
aibs0-CHASSIS2    0 env  estimate, collect, v, t, dv, dt
aibs0-CHASSIS1    0 env  estimate, collect, v, t, dv, dt
aibs0-CPU-FAN-S   0 env  estimate, collect, v, t, dv, dt
system-power  0 power estimate, collect, v, t, dt
autoconf  0 ???  estimate, collect, t
seed  0 ???  estimate, collect, v

This is kernel and userland from NetBSD-Daily/HEAD/202009281900Z/

Any idea ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Logging a kernel message when blocking on entropy

2020-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2020 at 03:23:59PM +0300, Andreas Gustafsson wrote:
> Manuel Bouyer wrote:
> > > In -current, entropy does not run out.
> > 
> > So, how can it block ?
> 
> When there's too little entropy to begin with.  Once you have
> gathered enough, it unblocks, and never blocks again.
> 
> This is assuming default settings.  If you actually want entropy
> to run out, you can do "sysctl -w kern.entropy.depletion=1", but
> there's no good reason to ever do that outside of testing.
> -- 
> Andreas Gustafsson, g...@gson.org

On Tue, Sep 22, 2020 at 02:25:25PM +0200, Martin Husemann wrote:
> On Tue, Sep 22, 2020 at 02:12:19PM +0200, Manuel Bouyer wrote:
> > So, how can it block ?
> 
> When the system never had enough entropy.
> 
> I would consider this a bug in the setup of the system, but as of now we do
> not deal with it at all during installation, and on systems that are not
> installed (bootable images) it is even harder.
> 
> Sysinst will complain about it soon (and offer options to help fix it).

OK, so the printf should never happen when the system has been properly
configured. In this case I have no objection.
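
As a side note, a program can find out whether it would block for this
reason before it actually blocks. The sketch below is only an illustration
and assumes the system provides getrandom(2) with GRND_NONBLOCK (declared in
<sys/random.h>); entropy_ready() is a made-up name.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/random.h>

/*
 * Hypothetical helper: returns 1 if the kernel can hand out random bytes
 * right now, 0 if a blocking request would have to wait for the initial
 * seeding, -1 on any other error (errno set by getrandom).
 */
static int
entropy_ready(void)
{
	unsigned char buf[16];
	ssize_t n;

	/* GRND_NONBLOCK: fail with EAGAIN instead of sleeping. */
	n = getrandom(buf, sizeof(buf), GRND_NONBLOCK);
	if (n > 0)
		return 1;
	if (n < 0 && errno == EAGAIN)
		return 0;
	return -1;
}
```

A build tool could call this once at startup and print its own hint about
seeding the machine, instead of hanging silently.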

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Logging a kernel message when blocking on entropy

2020-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2020 at 03:05:34PM +0300, Andreas Gustafsson wrote:
> Manuel Bouyer wrote:
> > If you run a dd on /dev/random I guess the system will run out of
> > entropy pretty fast.
> 
> In -current, entropy does not run out.

So, how can it block ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Logging a kernel message when blocking on entropy

2020-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2020 at 02:50:37PM +0300, Andreas Gustafsson wrote:
> Manuel Bouyer wrote:
> > I think we should find and remove these (or make them conditional)
> > instead of adding unconditional new ones
> 
> It would already be conditional in the sense that it's only printed if
> the system has no entropy.  If a multi-user system is lacking entropy,
> a user spamming the console about it is doing the administrator a
> favor.

If you run a dd on /dev/random I guess the system will run out of
entropy pretty fast.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Logging a kernel message when blocking on entropy

2020-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2020 at 02:31:50PM +0300, Andreas Gustafsson wrote:
> Manuel Bouyer wrote:
> > I'm not sure we want a user-triggerable kernel printf enabled by default.
> > This could be used to DOS the system (especially on serial consoles)
> 
> You can already trigger kernel printfs as an unprivileged user.
> The first one that comes to mind is "sorry, pid %d was killed:
> orphaned traced process", but I'm sure there are many others.

I think we should find and remove these (or make them conditional)
instead of adding unconditional new ones.
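
One possible way to make such a message conditional is to rate-limit it, in
the spirit of ratecheck(9) in the kernel. The following is a user-space
sketch of the idea only, not kernel code; struct msg_limiter and
msg_allowed() are hypothetical names, and the caller passes in the current
time so the logic stays easy to test.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical limiter state for one kind of message. */
struct msg_limiter {
	time_t last;		/* when a message was last let through */
	time_t min_interval;	/* seconds required between two messages */
	bool primed;		/* false until the first message */
};

/*
 * Returns true if a message may be emitted at time 'now', and records it.
 * Returns false if the previous message is less than min_interval old.
 */
static bool
msg_allowed(struct msg_limiter *ml, time_t now)
{
	if (ml->primed && now - ml->last < ml->min_interval)
		return false;
	ml->last = now;
	ml->primed = true;
	return true;
}
```

With a 60 second interval, a user looping on the triggering operation gets
at most one console line per minute, which limits the damage on a slow
serial console.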

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Logging a kernel message when blocking on entropy

2020-09-22 Thread Manuel Bouyer
On Tue, Sep 22, 2020 at 01:07:04PM +0300, Andreas Gustafsson wrote:
> All,
> 
> The following patch will cause a kernel message to be logged when a
> process blocks on /dev/random or some other randomness API.  It may
> help some users befuddled by pkgsrc builds blocking on /dev/random,
> and I'm finding it useful when testing changes aimed at fixing PR
> 55659.
> 
> OK to commit?

I'm not sure we want a user-triggerable kernel printf enabled by default.
This could be used to DoS the system (especially on serial consoles).

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: uvm object page remove question

2020-05-16 Thread Manuel Bouyer
On Sat, May 16, 2020 at 05:24:32AM -0700, Chuck Silvers wrote:
> On Wed, May 13, 2020 at 08:20:15PM +0200, Manuel Bouyer wrote:
> > Hello,
> > for Xen I need some non-standard VM operation: the tools want to map
> > some Xen objects for which we don't have a physical address.
> > The map/unmap operations are done with hypercalls which do the
> > page table update. In my implementation the tools ask the kernel to
> > do this via an ioctl on /kern/xen/privcmds
> > 
> > When a tool wants to map one of these Xen objects, it first does a
> > mmap() to get some virtual space, and then an ioctl to map it.
> > The ioctl allocates a uvm_object and uvm_map() it for the range.
> > The pgo_fault() handler will map the VA to the Xen object using the
> > dedicated hypercall; this works fine.
> > 
> > But I have a problem for unmap: when uvm wants to remove the mapping
> > (either because the process called munmap() or because it exited),
> > pmap_remove() will try to clear the page table entries for this
> > special mapping, and Xen will kill the VM. This has to be done with
> > the right hypercall.
> > 
> > I see 2 ways to fix this: add a pgo_remove() and have UVM call it,
> > or intercept the pmap_remove() calls at the pmap level, and check
> > the VA against our uvm_objects.
> > 
> > Is there something I missed to get a custom page remove for uvm objects ?
> > What would be the best way to handle this ?
> 
> I don't think you're missing anything...
> currently the kernel is always able to clear a PTE by simply writing a zero.
> 
> There are more cases to consider, eg. pmap_protect() on this magic mapping
> would probably kill the VM too, since that will also try to modify the PTE.

Not sure; maybe changing the protection bits would be allowed.

We're already more or less in this world with pmap_protect_ma() which takes
the remote domain as (optional) argument, just for this.

> 
> Are all of these mappings single pages?

Some are single pages, but some also contain several pages (I guess for e.g.
framebuffer, or loading the initial code).

> If not, does xen allow removing
> one page of a magic mapping and leaving the rest in place?

Yes

> 
> Why does xen feel it necessary to kill the VM in these cases
> where the guest is reducing its access to these magic pages?
> This would be a lot easier to deal with if xen just let the guest
> operate on these PTEs the same way it operates on other PTEs.

Sure. But I think Xen wants to track the usage, and we may
have to take some extra action on unmap too (like a notification sent to
the remote domain).

Also, the exact way to do the mapping depends on the type of the remote domain
(PV vs HVM).

> 
> What are these magic mappings for, exactly?

Various memory mapped structures (like the xenstore), emulated devices
(e.g. framebuffer).


> 
> Would it be practical to just map the magic mappings in the kernel
> and then have the tools use ioctls to access the magic mappings?
> That would avoid all of these problems.

This would require very deep modifications to the tools.


For now I implemented this re-using the hooks in x86/pmap.c for NVMM
and ept tables (pm_remove and pm_data) and it's enough for my needs.

I have another question: when we implement a uvm_object, is it possible to
map the whole range at uvm_map() time ?
If you look at the IOCTL_PRIVCMD_MMAPBATCH code in sys/arch/xen/xen/privcmd.c
we validate the machine addresses using a temporary mapping in kernel space.
The mapping in userland itself is done when a fault occurs.
It would be easier to map the whole range at ioctl() time, so that a
fault never occurs.
I guess I would need to allocate virtual pages and register them in the map ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: PCI: disable I/O or mem before probing BAR size

2020-05-05 Thread Manuel Bouyer
On Mon, May 04, 2020 at 09:17:28PM +0200, Manuel Bouyer wrote:
> Hello,
> while trying to boot a Xen PVH kernel as dom0, I found that Xen doesn't
> allow changing memory-mapped PCI BARs if memory decode is enabled in the
> command register. FreeBSD disables I/O or memory decoding in the command
> register before writing 0xffffffff for probing the BAR size/type, and
> re-enables it after. The attached patch (forget the printf, they're for
> debugging an issue with Xen only) does something similar for NetBSD.
> With this (and a few other hacks) I can boot a GENERIC kernel as
> PVH dom0. It also still boots fine on my core i5 laptop.
> 
> Does anyone see a problem with this ?

FYI I committed this, with an updated comment as suggested by Mouse.
Thanks for the replies.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: PCI: disable I/O or mem before probing BAR size

2020-05-04 Thread Manuel Bouyer
On Mon, May 04, 2020 at 04:09:41PM -0400, Mouse wrote:
> > while trying to boot a Xen PVH kernel as dom0, I found that Xen
> > doesn't allow changing memory-mapped PCI BARs if memory decode is
> > enabled in the command register.
> 
> Is this permitted behaviour for a PCI device according to the PCI
> specs?

No idea. But the FreeBSD comment says that bad things could happen
when writing arbitrary values with decoding enabled (they don't mention Xen
specifically), and it makes sense to me.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


PCI: disable I/O or mem before probing BAR size

2020-05-04 Thread Manuel Bouyer
Hello,
while trying to boot a Xen PVH kernel as dom0, I found that Xen doesn't
allow changing memory-mapped PCI BARs if memory decode is enabled in the
command register. FreeBSD disables I/O or memory decoding in the command
register before writing 0xffffffff for probing the BAR size/type, and
re-enables it after. The attached patch (forget the printf, they're for
debugging an issue with Xen only) does something similar for NetBSD.
With this (and a few other hacks) I can boot a GENERIC kernel as
PVH dom0. It also still boots fine on my core i5 laptop.
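
To make the ordering concrete, here is a small self-contained simulation of
the sequence. It is not the patch: struct fake_dev and the functions
bar_write() and probe_mem_bar_size() are hypothetical stand-ins for a device
with a single 32-bit memory BAR, and it only assumes the standard layout
(command register bit 1 is memory enable, the low 4 bits of a memory BAR are
flags).

```c
#include <stdint.h>

#define CMD_IO_ENABLE	0x1u	/* command register bit 0 */
#define CMD_MEM_ENABLE	0x2u	/* command register bit 1 */

/* Simulated device: one memory BAR plus a command register. */
struct fake_dev {
	uint32_t csr;		/* command register */
	uint32_t bar;		/* currently programmed BAR value */
	uint32_t size;		/* BAR size in bytes, power of two, >= 16 */
	int bad_writes;		/* BAR writes seen while decoding was on */
};

/* The device keeps only the address bits above its size; flags stay put. */
static void
bar_write(struct fake_dev *d, uint32_t v)
{
	if (d->csr & CMD_MEM_ENABLE)
		d->bad_writes++;
	d->bar = (v & ~(d->size - 1)) | (d->bar & 0xfu);
}

static uint32_t
probe_mem_bar_size(struct fake_dev *d)
{
	uint32_t csr, address, mask, addr_bits;

	csr = d->csr;
	address = d->bar;
	d->csr = csr & ~CMD_MEM_ENABLE;	/* disable decoding first */
	bar_write(d, 0xffffffffu);	/* write all-ones */
	mask = d->bar;			/* read back the size mask */
	bar_write(d, address);		/* restore the original BAR */
	d->csr = csr;			/* restore the command register */

	addr_bits = mask & ~0xfu;	/* drop the type/prefetch flags */
	return addr_bits & (~addr_bits + 1);	/* lowest set bit is the size */
}
```

The bad_writes counter plays the role of Xen here: it stays at zero only if
decoding is switched off before the BAR is touched and switched back on
after the original value has been restored.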

Does anyone see a problem with this ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--
Index: dev/pci/pci_map.c
===================================================================
RCS file: /cvsroot/src/sys/dev/pci/pci_map.c,v
retrieving revision 1.39
diff -u -p -u -r1.39 pci_map.c
--- dev/pci/pci_map.c   2 Dec 2019 17:13:13 -0000   1.39
+++ dev/pci/pci_map.c   4 May 2020 19:06:39 -0000
@@ -49,7 +49,7 @@ static int
 pci_io_find(pci_chipset_tag_t pc, pcitag_t tag, int reg, pcireg_t type,
 bus_addr_t *basep, bus_size_t *sizep, int *flagsp)
 {
-   pcireg_t address, mask;
+   pcireg_t address, mask, csr;
int s;
 
if (reg < PCI_MAPREG_START ||
@@ -75,9 +75,14 @@ pci_io_find(pci_chipset_tag_t pc, pcitag
 */
s = splhigh();
address = pci_conf_read(pc, tag, reg);
+   /* Disable decoding via the command register before writing all-1 */
+   csr = pci_conf_read(pc, tag, PCI_COMMAND_STATUS_REG);
+   pci_conf_write(pc, tag, PCI_COMMAND_STATUS_REG,
+   csr & ~PCI_COMMAND_IO_ENABLE) ;
pci_conf_write(pc, tag, reg, 0xffffffff);
mask = pci_conf_read(pc, tag, reg);
pci_conf_write(pc, tag, reg, address);
+   pci_conf_write(pc, tag, PCI_COMMAND_STATUS_REG, csr);
splx(s);
 
if (PCI_MAPREG_TYPE(address) != PCI_MAPREG_TYPE_IO) {
@@ -107,6 +112,7 @@ pci_mem_find(pci_chipset_tag_t pc, pcita
pcireg_t address, mask, address1 = 0, mask1 = 0xffffffff;
uint64_t waddress, wmask;
int s, is64bit, isrom;
+   pcireg_t csr;
 
is64bit = (PCI_MAPREG_MEM_TYPE(type) == PCI_MAPREG_MEM_TYPE_64BIT);
isrom = (reg == PCI_MAPREG_ROM);
@@ -138,6 +144,11 @@ pci_mem_find(pci_chipset_tag_t pc, pcita
 */
s = splhigh();
address = pci_conf_read(pc, tag, reg);
+   csr = pci_conf_read(pc, tag, PCI_COMMAND_STATUS_REG);
+   /* Disable decoding via the command register before writing all-1 */
+   printf("mem_find write csr D 0x%x 0x%x\n", csr, address);
+   pci_conf_write(pc, tag, PCI_COMMAND_STATUS_REG,
+   csr & ~PCI_COMMAND_MEM_ENABLE) ;
pci_conf_write(pc, tag, reg, 0xffffffff);
mask = pci_conf_read(pc, tag, reg);
pci_conf_write(pc, tag, reg, address);
@@ -149,6 +160,8 @@ pci_mem_find(pci_chipset_tag_t pc, pcita
pci_conf_write(pc, tag, reg + 4, address1);
}
}
+   printf("mem_find write csr 0x%x 0x%x 0x%x\n", csr, address, mask);
+   pci_conf_write(pc, tag, PCI_COMMAND_STATUS_REG, csr);
splx(s);
 
if (!isrom) {
@@ -240,14 +253,27 @@ pci_mapreg_type(pci_chipset_tag_t pc, pc
 int
 pci_mapreg_probe(pci_chipset_tag_t pc, pcitag_t tag, int reg, pcireg_t *typep)
 {
-   pcireg_t address, mask;
+   pcireg_t address, mask, csr;
int s;
 
s = splhigh();
address = pci_conf_read(pc, tag, reg);
+   /* Disable decoding via the command register before writing all-1 */
+   csr = pci_conf_read(pc, tag, PCI_COMMAND_STATUS_REG);
+   if (PCI_MAPREG_TYPE(address) == PCI_MAPREG_TYPE_IO) {
+   pci_conf_write(pc, tag, PCI_COMMAND_STATUS_REG,
+   csr & ~PCI_COMMAND_IO_ENABLE);
+   } else {
+   printf("pci_mapreg_probe: write c/s D MEM 0x%x 0x%x\n", csr, address);
+   pci_conf_write(pc, tag, PCI_COMMAND_STATUS_REG,
+   csr & ~PCI_COMMAND_MEM_ENABLE);
+   }
pci_conf_write(pc, tag, reg, 0xffffffff);
mask = pci_conf_read(pc, tag, reg);
pci_conf_write(pc, tag, reg, address);
+   if (PCI_MAPREG_TYPE(address) == PCI_MAPREG_TYPE_MEM) 
+   printf("pci_mapreg_probe: write c/s E 0x%x 0x%x 0x%x\n", csr, address, mask);
+   pci_conf_write(pc, tag, PCI_COMMAND_STATUS_REG, csr);
splx(s);
 
if (mask == 0) /* unimplemented mapping register */


kernfs from Xen code

2020-04-26 Thread Manuel Bouyer
Hello,
I've been told that MODULAR kernels don't build any more, because
Xen code (which is built by default) registers kernfs nodes.

Xen domU can work without kernfs; the kernfs node (/kern/xen/xenbus) allows
using xenstore tools from the domU.

I don't see a compile-time option to detect if kernfs is being compiled in.

I see the following options:
- Declare that KERNFS is mandatory if you compile Xen in.
- Add a needs-flag to kernfs, so Xen can omit its kernfs code when
  compiled without kernfs.
- Move Xen's kernfs code to kernfs; this would need some stub functions
  when Xen is not compiled in.

And maybe more I didn't think about.

For me the second option would be best; I can live with the first one too.
The third is ugly.

Any idea/comment ?


-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: ipfilter crash in latest netbsd-9

2020-04-18 Thread Manuel Bouyer
On Sat, Apr 18, 2020 at 03:59:45AM +0200, Emmanuel Dreyfus wrote:
> Hello
> 
> After upgrading to 9.0, I experienced crashes when enabling 
> ipfilter (backtrace below). I tried latest netbsd-9 kernel without 
> improvement. 
> 
> Is this a known pending issue?

AFAIK no. I use ipf on netbsd-9 and didn't notice issues.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


detecting userconf disable

2020-04-17 Thread Manuel Bouyer
Hello,
As part of my Xen HVM work, I'd like to be able to disable Xen
support from userconf to boot as a plain x86 emulated host.

The obvious way would be to disable the
hypervisor* at mainbus? # Xen hypervisor
device. But, for CPU setup I need to do some Xen setup before attaching the
hypervisor device. And I need CPUs to be set up to attach the hypervisor
device. So I have a xen_hvm_init() function called early.
Is there a way, in this function, to detect if hypervisor has been disabled by
userconf ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: All (?) network tests failing

2020-03-30 Thread Manuel Bouyer
On Mon, Mar 30, 2020 at 08:28:10PM +0200, Martin Husemann wrote:
> On Mon, Mar 30, 2020 at 02:25:01PM -0400, Christos Zoulas wrote:
> > What is your build host?
> > I am running the latest build I installed built from NetBSD/current to 
> > NetBSD/current.
> 
> I see the same fallout on a NetBSD-current build on a NetBSD-current
> (but it crept in delayed, probably because something did not get immediately
> rebuilt by build.sh -u).

I also see the same with builds from releng.netbsd.org:
http://www-soc.lip6.fr/~bouyer/NetBSD-tests/xen/HEAD/amd64/202003270130Z_atf.html#failed-tcs-summary

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Changes to reduce pressure on uvm_pageqlock

2019-12-07 Thread Manuel Bouyer
On Fri, Dec 06, 2019 at 11:33:27PM +0000, Andrew Doran wrote:
> [...]
> As to the reason why: at the start of November on my machine system time for
> a kernel build was 3200-3300s.  With this plus the remaining changes it's
> down to 850s so far and with !DIAGNOSTIC about 580s.

good job ! thanks !

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal: validate FFS root inode during the mount.

2019-11-20 Thread Manuel Bouyer
On Wed, Nov 20, 2019 at 10:11:14AM -0500, Mouse wrote:
> > During the fuzzing of FFS filesystem, we had a couple of issues
> > caused by corrupted inode fields.  [...]
> 
> > To make sure that corrupted mount won't cause harm to the user, I
> > want to add function to validate root inode on mount step (after
> > superblock validation)
> 
> Don't you have more or less the same issue with every other non-free
> inode in the filesystem?  The only thing I can see that's special about
> the root inode in this regard is that it is the only inode that is used
> immediately upon mount.

I think the point is, when the root inode is corrupted, you can't unmount
the filesystem.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-27 Thread Manuel Bouyer
On Fri, Sep 27, 2019 at 10:57:12AM +0200, Jaromír Doleček wrote:
> [...]
> Given the history, to me it's completely clear compat_linux shouldn't
> be on by default. Any possible linux-specific exploits should only be
> problem for people actually explicitly enabling it. Let's just stop
> pretending that we'd setup any kind of reasonable testing suite for
> this - it has not been done in last >20 years, it's even less likely
> to happen now that most of the major use cases are actually moot.
> 
> As Maya suggested, let's keep this concentrated on COMPAT_LINUX only
> to avoid further bikeshed flogging, so basically I propose doing this:
> 1) Comment out COMPAT_LINUX from all kernels configs for all archs
> which support modular
> 2) Disable autoload for compat_linux, requiring the user to explicitly
> configure system to load it. No extra sysctl.
> 
> Any major and specific objections?

not from me.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 07:39:45PM +0200, Martin Husemann wrote:
> On Thu, Sep 26, 2019 at 07:21:17PM +0200, Kamil Rytarowski wrote:
> > As I have proposed. MUSL+LTP for catching functional regressions/bugs
> > AND fuzzing to catch crashes can be good enough to keep it trusted. The
> > kernel certainly needs a lot of bug fixes, but instead of disabling this
> > crucial feature it is better to find a way to make it more trusted.
> 
> Indeed. If we have a toolchain in pkgsrc (or some comparable easy setup)
> and a test suite, we can build more trust for some compat feature.

If the toolchain can be in pkgsrc, then it should be much easier.
At some point I could build linux binaries running a linux gcc under
compat_linux.

Maybe it's just a matter of adding a suse_toolchain package ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 06:45:44PM +0200, Maxime Villard wrote:
> Le 26/09/2019 à 18:15, Manuel Bouyer a écrit :
> > On Thu, Sep 26, 2019 at 05:28:11PM +0200, Maxime Villard wrote:
> > > Le 26/09/2019 à 17:25, Brian Buhrow a écrit :
> > > > [...]
> > > > One implication of your proposal is that you'll disable the autoload
> > > > functionality, users will turn it back on, use it, and be more vulnerable
> > > > than they are now because the primary developers aren't concerned with making
> > > > things work or secure anymore.
> > > 
> > > Nobody is making compat_linux work, nobody is making compat_linux secure.
> 
> My experience with a Linux program using forks and signals is that it does not
> work at all; the children get killed randomly for no reason, the parent doesn't
> receive the signals, and after some time everything needs to be killed and
> restarted to work again. Completely broken. I didn't manage to find where
> exactly the problem was.
> 
> Under reasonable assumptions, compat_linux indeed used to be the most
> maintained compat layer we had. This isn't the case anymore. Under reasonable
> assumptions as well, it has a marginal use case, and can be disabled.
> 
> Maybe Manuel can understand that for a minute? Or is he still looking for
> evidence that I'm not the Pope?

I'm convinced you're not the pope.
You don't seem to understand that I use compat_linux on a regular
basis. Probably not with the same software as you, but it does work for me.

I forgot to mention, I also use it with opera.


> 
> > Secure, I don't know.
> 
> Well Manuel can pretend everything he wants, but when it comes to security and
> compat_linux, we're entering the world of facts, and not reasonable assumptions.

I never contested your facts on this topic. "I don't know" just means that:
I don't know. I've not looked at the code, nor at any changes in this
area for a long time.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 04:40:33PM +0200, Maxime Villard wrote:
> > 
> > Actually this is not clear. We have linux binaries in pkgsrc.
> 
> ... And? We have 22000 packages in pkgsrc.

How is that relevant? I install fewer than 200, but suse_base is among them.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 05:28:11PM +0200, Maxime Villard wrote:
> On 26/09/2019 at 17:25, Brian Buhrow wrote:
> > [...]
> > One implication of your proposal is that you'll disable the autoload
> > functionality, users will turn it back on, use it, and be more vulnerable
> > than they are now because the primary developers aren't concerned with making
> > things work or secure anymore.
> 
> Nobody is making compat_linux work, nobody is making compat_linux secure.

Actually it *does* work. Secure, I don't know.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 04:52:35PM +0200, Maxime Villard wrote:
> On 26/09/2019 at 16:47, Manuel Bouyer wrote:
> > On Thu, Sep 26, 2019 at 04:40:33PM +0200, Maxime Villard wrote:
> > > > 
> > > > Actually this is not clear. We have linux binaries in pkgsrc.
> > > 
> > > ... And? We have 22000 packages in pkgsrc.
> > 
> > How is it relevant ? I install less than 200. But there is suse_base in them
> 
> The real question is how the fact that there are linux binaries in
> pkgsrc is relevant. Yes, there are packages. And? Most of them are

It's relevant because they are in the binary repositories, so anyone
can install them with pkg_add or pkgin. And they don't even have to know
these are Linux binaries.

> completely outdated and haven't been updated in the last 10 years.

They were useful 10 years ago; that doesn't make them useless now.
For example I use them to run eagle, or microchip compilers, on
a regular basis.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 05:10:01PM +0200, Maxime Villard wrote:
> issues for a clearly marginal use case, and given the current general
                       ^^^^^^^^

This is where we disagree. You guess it's marginal but there's no
evidence of that (and there's no evidence of the opposite either).

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 05:18:34PM +0200, Maxime Villard wrote:
> On 26/09/2019 at 17:15, Manuel Bouyer wrote:
> > On Thu, Sep 26, 2019 at 05:10:01PM +0200, Maxime Villard wrote:
> > > issues for a clearly marginal use case, and given the current general
> >                        ^^^^^^^^
> > 
> > This is where we disagree. You guess it's marginal but there's no
> > evidence of that (and there's no evidence of the opposite either).
> 
> Can you provide evidence that it is used by the majority of the users?

As I said, I don't have evidence either way.

> And that therefore keeping vulnerabilities for 100% of the people is
> legitimate?

I never said it is. What I'm saying is that your claim about very few people
using the compat modules may be wrong. This is different.

(and to make it clear: I don't care if modules autoload or not; I'm not using
modules).

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 04:29:52PM +0200, Maxime Villard wrote:
> On 26/09/2019 at 16:22, Mouse wrote:
> > > > > Keeping them enabled for the <1% users interested means keeping
> > > > > vulnerabilities for the >99% who don't use these features.
> > > > Are the usage numbers really that extreme?  Where'd you get them?  I
> > > > didn't think there were any mechanisms in place that would allow
> > > > tracking compat usage.
> > > No, there is no strict procedure to monitor compat usage, and there
> > > never will be.  Maybe it's not <1%, but rather 1.5%; or maybe it's
> > > 5%, 10%, 15%.
> > 
> > > Who cares, exactly?
> > 
> > The short answer is "anyone who wants NetBSD to be useful".
> > 
> > If it really is only a tiny fraction - under ten people, say - then,
> > sure, yank it out.  If it's 90%, removing it would lose most of the
> > userbase, possibly provoke a fork.  15%, 40%, I don't think there is a
> > hard line between "pull it" and "keep it", and even if there were I'm
> > not sure it would matter because it appears nobody knows what the
> > actual use rate is anyway.
> 
> What is known, however, is that 100% of the users are affected by the
> vulnerabilities. So, do we keep these things enabled by default just
> because "uh we don't know so we shouldn't do anything"? Even as it's
> already been clear that the majority doesn't use compat_linux?

Actually this is not clear. We have linux binaries in pkgsrc.

> Is it such a Herculean effort to type "modload compat_linux" for the
> people that want to use Linux binaries? In order to keep the majority
> safe from the bugs and vulnerabilities?

Maybe some of them don't even know they are using compat_linux ...

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Manuel Bouyer
On Thu, Sep 26, 2019 at 01:47:06PM +, m...@netbsd.org wrote:
> Since the subject is a bit generic, I'd like to note that not all
> compats are equal. compat netbsd-6 and newer is more widely used due
> to programming languages either using custom asm (Go) or binary
> bootstrap (Ada).
> 
> These are fairly easy to fix and I don't mind helping with the work but
> it would be nice for someone to run a before/after bulk build of pkgsrc
> to see what breaks.

Sure. Also, I use NETBSD_COMPAT* on a regular basis for upgrades (boot the new
kernel with the old userland, then upgrade the userland).

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: non-module build fail

2019-09-10 Thread Manuel Bouyer
On Tue, Sep 10, 2019 at 05:01:43PM +0100, Robert Swindells wrote:
> 
> Manuel Bouyer  wrote:
> >On Tue, Sep 10, 2019 at 04:38:46PM +0100, Robert Swindells wrote:
> >> 
> >> Manuel Bouyer  wrote:
> >> >I'm trying to build an evbarm kernel with
> >> >options SLJIT
> >> >options BPFJIT
> >> 
> >> Are you building a 32 or 64-bit kernel ?
> >
> >32-bits.
> 
> Try this:
> 
> Index: files.generic
> ===================================================================
> RCS file: /cvsroot/src/sys/arch/evbarm/conf/files.generic,v
> retrieving revision 1.7
> diff -u -r1.7 files.generic
> --- files.generic   11 Jun 2019 13:01:48 -  1.7
> +++ files.generic   10 Sep 2019 15:53:24 -
> @@ -26,3 +26,9 @@
>  include "arch/arm/vexpress/files.vexpress"
>  include "arch/arm/virt/files.virt"
>  include "arch/arm/xilinx/files.zynq"
> +
> +#
> +# Stack-less Just-In-Time compiler
> +#
> +
> +include "external/bsd/sljit/conf/files.sljit"

Yes it works, thanks.
But as this doesn't look evbarm-specific, I wonder if it should be in
sys/arch/arm/conf/files.arm?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: non-module build fail

2019-09-10 Thread Manuel Bouyer
On Tue, Sep 10, 2019 at 05:43:51PM +0200, Manuel Bouyer wrote:
> On Tue, Sep 10, 2019 at 04:38:46PM +0100, Robert Swindells wrote:
> > 
> > Manuel Bouyer  wrote:
> > >I'm trying to build a evbarm kernel with 
> > >options SLJIT
> > >options BPFJIT
> > 
> > Are you building a 32 or 64-bit kernel ?
> 
> 32-bits.

It looks like what's missing is
include "external/bsd/sljit/conf/files.sljit"

in files.generic
Or maybe it should be in sys/arch/arm/conf/files.arm ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: non-module build fail

2019-09-10 Thread Manuel Bouyer
On Tue, Sep 10, 2019 at 04:38:46PM +0100, Robert Swindells wrote:
> 
> Manuel Bouyer  wrote:
> >I'm trying to build a evbarm kernel with 
> >options SLJIT
> >options BPFJIT
> 
> Are you building a 32 or 64-bit kernel ?

32-bits.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: /dev/random is hot garbage

2019-07-22 Thread Manuel Bouyer
On Mon, Jul 22, 2019 at 07:56:02PM +0200, Havard Eidnes wrote:
> > So, try forcing scp over IPv4 maybe?
> 
> Actually, configuring IPv6 in my case (I didn't have that
> already) works much better:
> 
> rust-std-1.32.0-i686-unknown-netbsd.tar.gz
>      72,048,862 100%  184.72kB/s    0:06:20 (xfr#3, to-chk=22/69)
> rust-std-1.32.0-powerpc-unknown-netbsd.tar.gz
>      69,624,858 100%  165.15kB/s    0:06:51 (xfr#4, to-chk=21/69)
> rust-std-1.32.0-sparc64-unknown-netbsd.tar.gz
>      71,612,611 100%  162.35kB/s    0:07:10 (xfr#5, to-chk=20/69)

This is also the average speed I got from pkgbuild.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: /dev/random is hot garbage

2019-07-21 Thread Manuel Bouyer
On Sun, Jul 21, 2019 at 07:20:08PM +, Taylor R Campbell wrote:
> > Date: Sun, 21 Jul 2019 20:52:52 +0200
> > From: Manuel Bouyer 
> > 
> > /dev/random actually works as documented and if rust wants /dev/urandom
> > behavior it should use /dev/urandom. Also I'd like to have it explained why
> > a compiler needs that many random bits.
> 
> The difference is that /dev/random may block, and if it blocks, it
> doesn't wake up until the entropy pool is seeded.  In contrast,
> /dev/urandom never blocks, even if the entropy pool has not yet been
> seeded.
> 
> There is no reason in modern cryptography to read more than one byte
> from /dev/random ever in a single application; once you have done
> that, or confirmed some other way that the entropy pool is seeded,
> you should generate keys from /dev/urandom.
> 
> What Rust's vendor/rand library seems to guarantee for its callers is
> that it won't return any data until the entropy pool has been seeded,
> and then it will return arbitrarily much data without ever blocking
> again.  It does this by reading a single byte from /dev/random, and
> then generating keys from /dev/urandom.
> 
> This is _locally_ sensible for a library that may have many users
> beyond a compiler.  But what seems to be happening (although I haven't
> dived into the build process myself to confirm) is that many
> subprocesses in the build process are _independently_ initializing the
> Rust vendor/rand library -- reading one byte from /dev/random, which
> sometimes blocks.

I suspect that is the problem: there were several rust processes, all
blocked on /dev/random.

> 
> In a build chroot, or in a Xen guest, where you aren't handling any
> secrets (e.g., no sshd except on the local network, no package
> signing, ), you can replace /dev/random by a symlink to
> /dev/urandom and the build will never block.

Actually, I have no idea what the requirements for a full pbulk build are
in this area. We'll certainly want to do package signing at some point.
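
For reference, the pattern described in the quoted text (block once on
/dev/random to confirm the pool is seeded, then take all key material from
/dev/urandom) can be sketched in C as below. This is only a sketch assuming a
POSIX system; read_exact() and seeded_random_bytes() are hypothetical helper
names, not code from Rust's rand library or from NetBSD.

```c
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: read exactly len bytes from path.
 * Returns 0 on success, -1 on any error or short read. */
int
read_exact(const char *path, unsigned char *buf, size_t len)
{
	size_t done = 0;
	int fd = open(path, O_RDONLY);

	if (fd == -1)
		return -1;
	while (done < len) {
		ssize_t n = read(fd, buf + done, len - done);

		if (n == -1 && errno == EINTR)
			continue;	/* interrupted by a signal, retry */
		if (n <= 0) {		/* error or unexpected EOF */
			close(fd);
			return -1;
		}
		done += (size_t)n;
	}
	close(fd);
	return 0;
}

/* Hypothetical helper: wait (possibly blocking) until the entropy pool
 * is seeded by reading one byte from /dev/random, then fill the key
 * from /dev/urandom, which never blocks. */
int
seeded_random_bytes(unsigned char *key, size_t len)
{
	unsigned char probe;

	if (read_exact("/dev/random", &probe, 1) == -1)
		return -1;
	return read_exact("/dev/urandom", key, len);
}
```

If every subprocess of a build runs the one-byte probe on its own, each of
them can block separately, which would match the several blocked rust
processes seen here.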

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: /dev/random is hot garbage

2019-07-21 Thread Manuel Bouyer
On Sun, Jul 21, 2019 at 06:57:30PM +, m...@netbsd.org wrote:
> On Sun, Jul 21, 2019 at 08:52:52PM +0200, Manuel Bouyer wrote:
> > On Sun, Jul 21, 2019 at 06:43:04PM +, m...@netbsd.org wrote:
> > > On Sun, Jul 21, 2019 at 11:55:23AM -0400, Greg Troxel wrote:
> > > > Another approach, harder, is to create a xenrnd(4) pseudodevice and
> > > > hypervisor call that gets bits from the host's /dev/random and injects
> > > > them as if from a hardware rng.
> > > > 
> > > > 
> > > 
> > > > That requires the ability to coordinate "please run this backported patch"
> > > to whoever does the package builds. Since we don't let anyone volunteer
> > > for tasks and would rather have highly critical things rely on people
> > > who stopped having NetBSD time about 5 years ago, that's not going to
> > > happen.
> > 
> > No, that's not the problem.
> > Lots of nonsense has been written in this thread.
> > /dev/random actually works as documented and if rust wants /dev/urandom
> > behavior it should use /dev/urandom. Also I'd like to have it explained why
> > a compiler needs that many random bits.
> > 
> > BTW, while talking about package availability, when will the bootstrap
> > kit for i386 be available?
> 
> ftp://golden-delicious.urc.uninett.no/pub/rust/rust-std-1.35.0-i686-unknown-netbsd.tar.gz

Why is it not on ftp.NetBSD.org, with the other bootstrap kits?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: /dev/random is hot garbage

2019-07-21 Thread Manuel Bouyer
On Sun, Jul 21, 2019 at 06:43:04PM +, m...@netbsd.org wrote:
> On Sun, Jul 21, 2019 at 11:55:23AM -0400, Greg Troxel wrote:
> > Another approach, harder, is to create a xenrnd(4) pseudodevice and
> > hypervisor call that gets bits from the host's /dev/random and injects
> > them as if from a hardware rng.
> > 
> > 
> 
> That requires the ability to coordinate "please run this backported patch"
> to whoever does the package builds. Since we don't let anyone volunteer
> for tasks and would rather have highly critical things rely on people
> who stopped having NetBSD time about 5 years ago, that's not going to
> happen.

No, that's not the problem.
Lots of nonsense has been written in this thread.
/dev/random actually works as documented and if rust wants /dev/urandom
behavior it should use /dev/urandom. Also I'd like to have it explained why
a compiler needs that many random bits.

BTW, while talking about package availability, when will the bootstrap
kit for i386 be available?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Interface description support

2019-06-25 Thread Manuel Bouyer
On Tue, Jun 25, 2019 at 10:43:32AM +0200, Michael van Elst wrote:
> On Tue, Jun 25, 2019 at 09:49:46AM +0200, Manuel Bouyer wrote:
> > On Mon, Jun 24, 2019 at 09:56:35PM -, Michael van Elst wrote:
> > > IMHO such functionality doesn't belong into the kernel, it's much easier
> > > to have a configuration syntax with variables or macros to achieve the
> > > same.
> > 
> > Except it would make it harder to use in e.g. packet filters.
> > The interface may not exist when the packet filter rule file is parsed
> > (e.g. in a Xen dom0)
> 
> For some packet filters that's not even a question as these are
> attached to specific interfaces. For new interfaces you need
> to load new rules and that can be handled in userland.
> 
> npf, working in the IP layer, needs to filter packets according to
> interface. That allows more complex matching in the kernel, which
> makes it easier to use. But is pushing complexity into the kernel
> the right thing?

I think so. For example, on a Xen dom0, you'd need to reload the
config file each time a virtual interface is created or destroyed.
That's what I do right now but it has lots of problems (including
xl timing out on domUs with lots of interfaces, because the machinery
to patch and reload the ipf config file takes too much time).
Using interface name aliases in the config file (and down to the kernel)
would solve this nicely.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Interface description support

2019-06-25 Thread Manuel Bouyer
On Mon, Jun 24, 2019 at 09:56:35PM -, Michael van Elst wrote:
> e...@math.uni-bonn.de (Edgar Fuß) writes:
> 
> >Or is there an argument that descriptions ought to be able to look like names?
> 
> These are not really descriptions but aliases and people like to
> use them as such. If you allow arbitrary strings (even with special
> characters, whitespace or Unicode) then you need to adapt the syntax
> of configuration files that include interface names. It's easier
> to restrict such aliases to common identifiers as this is what
> parsers expect.
> 
> IMHO such functionality doesn't belong into the kernel, it's much easier
> to have a configuration syntax with variables or macros to achieve the
> same.

Except it would make it harder to use in e.g. packet filters.
The interface may not exist when the packet filter rule file is parsed
(e.g. in a Xen dom0)

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Interface description support

2019-06-24 Thread Manuel Bouyer
On Mon, Jun 24, 2019 at 04:59:04AM -0700, Jason Thorpe wrote:
> 
> > On Jun 24, 2019, at 4:29 AM, Manuel Bouyer  wrote:
> > 
> > I'd say that we should explicitly mention if we're looking up a name or
> > a description, to avoid confusion. For example if wm0 has description
> > "external if" we should be able to write in ipf:
> > 
> > block in on wm0 from any to any
> > or
> > block in on intf_desc "external if" from any to any
> > 
> > Same with netstat:
> > netstat -I wm0
> > netstat -D "external if"
> > 
> > and so on ...
> 
> I think that severely limits the utility of the description field.

I'm not sure why.
AFAIK this is already a different ioctl to retrieve the description, so
the tools will have to be changed anyway.

Alternatively we could make things like if_nametoindex(3) match on description
as well as interface name, but this opens a whole can of worms:
if_indextoname(3) will not return the same string, and what should we
do with if_nameindex(3) ?

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Interface description support

2019-06-24 Thread Manuel Bouyer
On Mon, Jun 24, 2019 at 04:20:59AM -0700, Jason Thorpe wrote:
> 
> > On Jun 24, 2019, at 12:15 AM, Manuel Bouyer  wrote:
> > 
> > I'd like to see this in NetBSD. I'd also like packet filters to be able
> > to use the description instead of the name for interfaces. This would make
> > my life much easier for e.g. ipfilter in Xen dom0, where the domU's virtual
> > interfaces have unpredictable names.
> 
> I agree, we should be able to use the description as a means of looking up 
> the interface.  However, because descriptions can be arbitrary, you need to 
> have some rules around them:
> 
> 1- Duplicate descriptions are not allowed (should return EEXIST if an attempt 
> is made to set a duplicate).

Sure

> 
> 2- In order to prevent unpredictable behavior in the presence of name-"wm0" 
> and description-"wm0" being associated with different interfaces, the 
> hardware name should always take priority when looking up an interface.

I'd say that we should explicitly mention if we're looking up a name or
a description, to avoid confusion. For example if wm0 has description
"external if" we should be able to write in ipf:

block in on wm0 from any to any
or
block in on intf_desc "external if" from any to any

Same with netstat:
netstat -I wm0
netstat -D "external if"

and so on ...
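
To make the idea concrete, here is a rough userland model in C of separate
name and description lookups, including the EEXIST rule for duplicate
descriptions. It is only a sketch: struct ifent and the if_by_name(),
if_by_descr() and if_set_descr() functions are hypothetical illustrations,
not existing kernel or libc interfaces.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of an interface table entry; not a kernel structure. */
struct ifent {
	const char *name;	/* driver name, e.g. "wm0" */
	char descr[64];		/* free-form description, "" if unset */
};

/* Look up by interface name only. */
struct ifent *
if_by_name(struct ifent *tab, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].name, name) == 0)
			return &tab[i];
	return NULL;
}

/* Look up by description only; never falls back to names. */
struct ifent *
if_by_descr(struct ifent *tab, size_t n, const char *descr)
{
	if (descr[0] == '\0')
		return NULL;	/* an unset description matches nothing */
	for (size_t i = 0; i < n; i++)
		if (strcmp(tab[i].descr, descr) == 0)
			return &tab[i];
	return NULL;
}

/* Set a description, refusing duplicates with EEXIST. */
int
if_set_descr(struct ifent *tab, size_t n, struct ifent *ifp, const char *descr)
{
	struct ifent *other = if_by_descr(tab, n, descr);

	if (other != NULL && other != ifp)
		return EEXIST;
	if (strlen(descr) >= sizeof(ifp->descr))
		return ENAMETOOLONG;
	strcpy(ifp->descr, descr);
	return 0;
}
```

Because the caller always says which namespace it means, a description of
"wm0" on another interface is harmless: if_by_name() and if_by_descr() simply
return different entries, and no priority rule is needed.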

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: Interface description support

2019-06-24 Thread Manuel Bouyer
On Mon, Jun 24, 2019 at 01:52:18PM +0900, KUSABA Takeshi wrote:
> Hi,
> 
> I would like to propose a feature: a description of a network interface.
> (FreeBSD and OpenBSD support this feature.)
> 
> Here is a patch.
> 
> https://gist.githubusercontent.com/maleic1618/f4881717cbd3b1e3182984f9773b1001/raw/114b5d6c4fbe1fd9cd473fdc802b003c072c2794/description.patch
> 
> The summary of this patch is:
> 
> ioctl(2):
> - support the commands SIOCGIFDESCR/SIOCSIFDESCR to get/set the description.
> 
> ifconfig(8):
> - add the commands description,descr/-description,-descr to set/clear the
> description.
> 
> 
> Could you comment on this patch?
> I hope that this patch will be merged.

I'd like to see this in NetBSD. I'd also like packet filters to be able
to use the description instead of the name for interfaces. This would make
my life much easier for e.g. ipfilter in Xen dom0, where the domU's virtual
interfaces have unpredictable names.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: evbarm hang

2019-04-19 Thread Manuel Bouyer
[redirected to port-arm]

On Fri, Apr 19, 2019 at 12:47:11PM +0200, Manuel Bouyer wrote:
> On Fri, Apr 19, 2019 at 08:33:00PM +1000, matthew green wrote:
> > > So here's our deadlock: cpu 0 holds the kernel lock and wants the pool spin
> > > mutex; cpu 1 holds the spin mutex and wants the kernel lock.
> > 
> > AFAICT, cpu 1 is at fault here.  this locking order is
> > backwards.
> > 
> > not sure why arm32 pmap operations on the kernel map are
> > with the kernel lock instead of another lock.  the arm64
> > version of this code doesn't take any lock.
> 
> Yes, this looks like it can cause lots of problems.
> 
> Indeed, I suspect the kernel_lock here could be replaced with pm_lock as
> aarch64 does. I will try this.

Here's what I found.
I think a per-pmap lock is needed even for the kernel pmap, because
the l2_dtable (and related) structures contain statistics that
need to be kept coherent. For user pmaps, pm_lock protects them.
For the kernel pmap we can't use pm_lock: as pmap_kenter/pmap_kremove
can be called from interrupt context, it needs a mutex at IPL_VM.
So I added a kpm_lock kmutex, and use it in pmap_acquire_pmap_lock() for
the kernel pmap.

Then we need to be careful not to call a path that could sleep with this mutex
held. This needs some adjustments in pmap_enter() (don't call
pool_get()/pool_put() with the pmap locked) and pmap_remove()
(defer pool_put() until it is safe to release the pmap lock).
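
The general shape of that deferral, as I understand it, can be shown with a
small userland sketch: unlink entries while the lock is held, and only hand
them back to the allocator once the lock has been dropped. Everything here is
a stand-in and an assumption on my part (a toy lock that just records whether
it is held, malloc/free instead of pool(9), made-up names); it is not the
code from the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a spin mutex: only records whether it is held. */
struct toy_lock { int held; };

struct pv {			/* hypothetical pv_entry-like list node */
	struct pv *next;
};

static struct toy_lock map_lock;
static struct pv *mappings;	/* list protected by map_lock */
int frees_under_lock;		/* counts violations of the rule */

static void lock(struct toy_lock *l)   { assert(!l->held); l->held = 1; }
static void unlock(struct toy_lock *l) { assert(l->held); l->held = 0; }

/* Stand-in for pool_put(): must not be called with map_lock held. */
static void
release_pv(struct pv *pv)
{
	if (map_lock.held)
		frees_under_lock++;
	free(pv);
}

/* Allocate before taking the lock, link the entry while holding it. */
void
add_mapping(void)
{
	struct pv *pv = malloc(sizeof(*pv));

	if (pv == NULL)
		return;
	lock(&map_lock);
	pv->next = mappings;
	mappings = pv;
	unlock(&map_lock);
}

/* Unlink everything under the lock, free only after dropping it.
 * Returns the number of entries released. */
int
remove_all_mappings(void)
{
	struct pv *pending, *pv;
	int n = 0;

	lock(&map_lock);
	pending = mappings;	/* detach the whole list */
	mappings = NULL;
	unlock(&map_lock);

	while ((pv = pending) != NULL) {
		pending = pv->next;
		release_pv(pv);
		n++;
	}
	return n;
}
```

The point of the local pending list is that the allocator is never entered
with the lock held, so its own locks cannot end up ordered after this one.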

While there, also use splvm() instead of splhigh() in pmap_growkernel(),
as x86 does.
Also, rename pmap_lock to pmap_pg_lock. As this can be used with the
kernel pmap lock held, this also needs to be an IPL_VM kmutex (in all cases).

It also includes the patch I posted earlier today to work around deadlocks
in ddb.

I've been running a DIAGNOSTIC+LOCKDEBUG kernel on my lime2 allwinner A20
board, and it did build a few packages (using distcc, and a SATA disk for
local storage) without problems, so at this point it looks like
an improvement. I'll keep it running over the week-end and unless I get
negative feedback, I'll commit it early next week.


-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--
Index: arm32/pmap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/arm/arm32/pmap.c,v
retrieving revision 1.371
diff -u -p -u -r1.371 pmap.c
--- arm32/pmap.c   28 Oct 2018 14:59:17 -  1.371
+++ arm32/pmap.c   19 Apr 2019 15:55:30 -
@@ -217,6 +217,10 @@
 
 #include 
 
+#ifdef DDB
+#include 
+#endif
+
 __KERNEL_RCSID(0, "$NetBSD: pmap.c,v 1.371 2018/10/28 14:59:17 skrll Exp $");
 
 //#define PMAP_DEBUG
@@ -516,7 +520,8 @@ static size_t cnptes;
 #endif
 vaddr_t memhook;   /* used by mem.c & others */
 kmutex_t memlock __cacheline_aligned;  /* used by mem.c & others */
-kmutex_t pmap_lock __cacheline_aligned;
+kmutex_t pmap_pg_lock __cacheline_aligned;
+kmutex_t kpm_lock __cacheline_aligned;
 extern void *msgbufaddr;
 int pmap_kmpages;
 /*
@@ -538,9 +543,14 @@ vaddr_t pmap_directlimit;
 static inline void
 pmap_acquire_pmap_lock(pmap_t pm)
 {
+#if defined(MULTIPROCESSOR) && defined(DDB)
+	if (__predict_false(db_onproc != NULL))
+		return;
+#endif
+	
 	if (pm == pmap_kernel()) {
 #ifdef MULTIPROCESSOR
-		KERNEL_LOCK(1, NULL);
+		mutex_enter(&kpm_lock);
 #endif
 	} else {
 		mutex_enter(pm->pm_lock);
@@ -550,9 +560,13 @@ pmap_acquire_pmap_lock(pmap_t pm)
 static inline void
 pmap_release_pmap_lock(pmap_t pm)
 {
+#if defined(MULTIPROCESSOR) && defined(DDB)
+	if (__predict_false(db_onproc != NULL))
+		return;
+#endif
 	if (pm == pmap_kernel()) {
 #ifdef MULTIPROCESSOR
-		KERNEL_UNLOCK_ONE(NULL);
+		mutex_exit(&kpm_lock);
 #endif
 	} else {
 		mutex_exit(pm->pm_lock);
@@ -562,20 +576,20 @@ pmap_release_pmap_lock(pmap_t pm)
 static inline void
 pmap_acquire_page_lock(struct vm_page_md *md)
 {
-	mutex_enter(&pmap_lock);
+	mutex_enter(&pmap_pg_lock);
 }
 
 static inline void
 pmap_release_page_lock(struct vm_page_md *md)
 {
-	mutex_exit(&pmap_lock);
+	mutex_exit(&pmap_pg_lock);
 }
 
 #ifdef DIAGNOSTIC
 static inline int
 pmap_page_locked_p(struct vm_page_md *md)
 {
-	return mutex_owned(&pmap_lock);
+	return mutex_owned(&pmap_pg_lock);
 }
 #endif
 
@@ -3057,6 +3071,10 @@ pmap_enter(pmap_t pm, vaddr_t va, paddr_
 #else
 	const bool vector_page_p = (va == vector_page);
 #endif
+	struct pmap_page *pp = pmap_pv_tracked(pa);
+	struct pv_entry *new_pv = NULL;
+	struct pv_entry *old_pv = NULL;
+	int error = 0;
 
 	UVMHIST_FUNC(__func__); UVMHIST_CALLED(maphist);
 
@@ -3072,6 +3090,12 @@ pmap_enter(pmap_t pm, vaddr_t va, paddr_
 * test for a managed page by checking pg != NU

Re: evbarm hang

2019-04-19 Thread Manuel Bouyer
On Fri, Apr 19, 2019 at 08:33:00PM +1000, matthew green wrote:
> > So here's our deadlock: cpu 0 holds the kernel lock and wants the pool spin
> > mutex; cpu 1 holds the spin mutex and wants the kernel lock.
> 
> AFAICT, cpu 1 is at fault here.  this locking order is
> backwards.
> 
> not sure why arm32 pmap operations on the kernel map are
> with the kernel lock instead of another lock.  the arm64
> version of this code doesn't take any lock.

Yes, this looks like it can cause lots of problems.

Indeed, I suspect the kernel_lock here could be replaced with pm_lock as
aarch64 does. I will try this.
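
As an aside, the inversion quoted above (kernel lock then pool mutex on one
CPU, pool mutex then kernel lock on the other) is the kind of thing a simple
rank check catches. A minimal sketch in C, assuming made-up rank values and a
per-CPU "highest rank held" field; this is an illustration only, it ignores
lock release, and it is not how LOCKDEBUG is implemented:

```c
#include <stdbool.h>

/* Hypothetical ranks: a lock may only be taken while every lock already
 * held has a lower rank. */
enum lock_rank { RANK_NONE = 0, RANK_KERNEL_LOCK = 1, RANK_POOL_MUTEX = 2 };

struct cpu_state {
	enum lock_rank highest;	/* highest rank currently held by this CPU */
};

/* Returns true and records the lock if taking `want` respects the order;
 * returns false if it would invert the order (a potential deadlock). */
bool
order_ok_acquire(struct cpu_state *cpu, enum lock_rank want)
{
	if (want <= cpu->highest)
		return false;
	cpu->highest = want;
	return true;
}
```

With these ranks, cpu 0's sequence is accepted and cpu 1's second acquisition
is rejected, which matches the "cpu 1 is at fault" reading above.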

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--


Re: evbarm hang

2019-04-19 Thread Manuel Bouyer
On Fri, Apr 19, 2019 at 11:13:40AM +0100, Nick Hudson wrote:
> On 19/04/2019 10:10, Manuel Bouyer wrote:
> > Overnight lockdebug did find something:
> > login: [ 1908.3939406] Mutex error: mutex_vector_enter,504: spinout
> > 
> > [ 1908.3939406] lock address : 0x90b79074 type     : spin
> > [ 1908.3939406] initialized  : 0x8041601c
> > [ 1908.3939406] shared holds :          0 exclusive: 1
> > [ 1908.3939406] shares wanted:          0 exclusive: 1
> > [ 1908.3939406] current cpu  :          0 last held: 1
> > [ 1908.3939406] current lwp  : 0x91fc3760 last held: 0x91fc26e0
> > [ 1908.3939406] last locked* : 0x80416668 unlocked : 0x804169e8
> > [ 1908.3939406] owner field  : 0x00010500 wait/spin: 0/1
> > 
> > [ 1908.4626458] panic: LOCKDEBUG: Mutex error: mutex_vector_enter,504: spinout
> > [ 1908.4626458] cpu0: Begin traceback...
> > [ 1908.4626458] 0x9e4a192c: netbsd:db_panic+0x14
> > [ 1908.4626458] 0x9e4a1944: netbsd:vpanic+0x194
> > [ 1908.4626458] 0x9e4a195c: netbsd:snprintf
> > [ 1908.4626458] 0x9e4a199c: netbsd:lockdebug_more
> > [ 1908.4626458] 0x9e4a19d4: netbsd:lockdebug_abort+0xc0
> > [ 1908.4626458] 0x9e4a19f4: netbsd:mutex_abort+0x34
> > [ 1908.4626458] 0x9e4a1a64: netbsd:mutex_enter+0x580
> > [ 1908.4626458] 0x9e4a1abc: netbsd:pool_get+0x70
> > [ 1908.4626458] 0x9e4a1b0c: netbsd:pool_cache_get_slow+0x1f4
> > [ 1908.4626458] 0x9e4a1b5c: netbsd:pool_cache_get_paddr+0x288
> > [ 1908.4626458] 0x9e4a1b7c: netbsd:m_clget+0x34
> > [ 1908.4626458] 0x9e4a1bdc: netbsd:dwc_gmac_intr+0x194
> > [ 1908.4626458] 0x9e4a1bf4: netbsd:gic_fdt_intr+0x2c
> > [ 1908.4626458] 0x9e4a1c1c: netbsd:pic_dispatch+0x110
> > [ 1908.4626458] 0x9e4a1c7c: netbsd:armgic_irq_handler+0xf4
> > [ 1908.4626458] 0x9e4a1db4: netbsd:irq_entry+0x68
> > [ 1908.4626458] 0x9e4a1dec: netbsd:tcp_send_wrapper+0x9c
> > [ 1908.4626458] 0x9e4a1e84: netbsd:sosend+0x6fc
> > [ 1908.4626458] 0x9e4a1eac: netbsd:soo_write+0x3c
> > [ 1908.4626458] 0x9e4a1f04: netbsd:dofilewrite+0x7c
> > [ 1908.4626458] 0x9e4a1f34: netbsd:sys_write+0x5c
> > [ 1908.4626458] 0x9e4a1fac: netbsd:syscall+0x12c
> > [ 1908.4626458] cpu0: End traceback...
> > 
> 
> Does show event tell you if dwc_gmac interrupts are being distributed to
> both cpus?

Looks like they are.

> I think we need to prevent or protect against this.

But the IRQ handler is not registered as MPSAFE, so it should run under the
kernel lock.

-- 
Manuel Bouyer 
 NetBSD: 26 ans d'experience feront toujours la difference
--

