Re: NVMM sync up from DragonFlyBSD

2024-01-27 Thread David Brownlee
On Sat, 27 Jan 2024 at 11:28, Emile 'iMil' Heitor  wrote:
>
>
> I've synced up our NVMM code (kmod, lib and tool) with its current
> state in DragonFlyBSD; if you're using NVMM on NetBSD as your
> hypervisor you might want to give it a try
> https://github.com/NetBSD/src/compare/trunk...NetBSDfr:NetBSD-src:nvmm
>
> I've also added a vmware compatible CPU frequency cpuid leaf which can
> be used to get CPU frequency from the host instead of spending
> 100ms in DELAY(). Qemu knows how to expose it via the -cpu host,+invtsc
> flag.
>

Looks like the patch applies almost cleanly to -10, just a small
hand-patchable section at the top of both
sys/dev/nvmm/x86/nvmm_x86_{svm,vmx}.c
Going to give it a spin there :)

Thanks!

David


Re: PVH boot with qemu

2023-12-06 Thread David Brownlee
On Wed, 6 Dec 2023 at 11:37, Emile `iMil' Heitor  wrote:
>
> I got it working.
>
> NetBSD/amd64 kernel booting in PVH mode straight from qemu -kernel flag.
> It now needs a lot of cleaning as it's basically a PoC, but here's a
> WIP patch if anyone's interested in hacking into it.
>
> https://imil.net/NetBSD/qemu-pvh.patch
>
> Let me rephrase: I *know* it is ugly at the moment. I *will* make it
> clean, just wanted to share the joy ;)
>
> Cheers,

*excellent* work!

David


Re: kern.boottime drift after boot?

2023-10-10 Thread David Brownlee
On Tue, 10 Oct 2023 at 18:07, Robert Elz  wrote:
>
> Date:    Tue, 10 Oct 2023 12:42:48 +0100
> From:    David Brownlee
>
>   | I have a system which records the output of "sysctl -n kern.boottime"
>   | as part of a dhcpcd-exit.hook to ensure some processing only occurs
>   | once per boot.
>
> Cron's @reboot might help with that, that's its purpose.  See crontab(5)

(More context) - It's used as a dhcpcd-exit.hook to ensure some
services are enabled only after an interface has an IP address, so in
this case cron @reboot would not fit.

>   |  kern.boottime (KERN_BOOTTIME)
>   |  A struct timespec structure is returned.  This structure contains
>   |  the time that the system was booted.  That time is defined (for
>   |  this purpose) to be the time at which the kernel first started
>   |  accumulating clock ticks.
>
> That's correct, the issue is that the kernel doesn't really know what the
> time is, early in the boot sequence, it just takes a guess based either
> upon the RTC if the system has one (those tend not to be very accurate),
> or the last mod time of the root filesystem (much less accurate) otherwise.
>
> As Crystal said, as the system time is corrected, the kernel can form a
> better idea of what the time actually was when the system booted, based upon
> the corrections that are being made to the current time of day.
>
> kern.boottime always contains the time that the system believes it was
> booted, as best it knows what that was.   The man page section you quoted
> above is correct, and doesn't need updating.

The manpage is correct, but incomplete. On reading it without
understanding the implementation, there is ambiguity as to whether
kern.boottime will be constant for any given boot (unless I've missed
something elsewhere in the page). Furthermore it has the unfortunate
behaviour of 'usually' appearing to be constant, which leads to easy
assumptions, based on lack of clarity. Particularly as it would be
easy to have an implementation which did have a constant per boot
value, which would be more useful in some ways and less in others (I'm
not arguing at this point that we should switch to such an
implementation)

David


kern.boottime drift after boot?

2023-10-10 Thread David Brownlee
I have a system which records the output of "sysctl -n kern.boottime"
as part of a dhcpcd-exit.hook to ensure some processing only occurs
once per boot.

Except... it doesn't quite work as the value appears to not be
constant for a given boot.

I have one case where the value recorded by dhcpcd-exit.hook called during rc is
1696860044
but the value of sysctl -n kern.boottime is now
1696860043

sysctl(7) states:

 kern.boottime (KERN_BOOTTIME)
 A struct timespec structure is returned.  This structure contains
 the time that the system was booted.  That time is defined (for
 this purpose) to be the time at which the kernel first started
 accumulating clock ticks.

I'm assuming it's calculated as some form of offset - maybe it would
be better as an absolute value. If not, and it can drift about, then
I'll at least update the manpage and look for a different mechanism
for my 'once per boot' :)
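
One possible workaround, if kern.boottime is kept as the marker, is to
compare the recorded and current values with a small tolerance rather
than for equality. The sketch below is only an illustration: the helper
name and the idea of a fixed slop are assumptions, not anything NetBSD
provides, and a large clock correction (for example on a machine
without an RTC) could still exceed whatever slop is chosen.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of NetBSD: treat two kern.boottime
 * samples (in seconds) as belonging to the same boot if they differ
 * by no more than `slop` seconds.
 */
static bool
same_boot(int64_t recorded, int64_t current, int64_t slop)
{
	int64_t diff;

	assert(slop >= 0);
	diff = recorded > current ? recorded - current : current - recorded;
	return diff <= slop;
}
```

With the two values seen above (1696860044 and 1696860043) any slop of
one second or more would treat them as the same boot.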

David


Re: GPT attributes in dkwedge [PATCH]

2023-09-18 Thread David Brownlee
On Mon, 18 Sept 2023 at 18:21, Martin Husemann  wrote:
>
> On Mon, Sep 18, 2023 at 06:14:58PM +0100, David Brownlee wrote:
> > Specifically in the absence of any other information (empty devname?
> > etc), would it not be reasonable to fall back to the bootme marked
> > filesystem as a root filesystem candidate? I'm thinking about
> > minimally configured disks moving between machines
>
> No, the bootme is usually on the EFI system partition, which is also
> usually not set up as a root partition for NetBSD :-)

Ah - that doesn't seem to match the FreeBSD usage of bootme - do we
have something comparable to that?
https://man.freebsd.org/cgi/man.cgi?query=gptboot&apropos=0&sektion=8&manpath=FreeBSD+13.2-RELEASE+and+Ports&arch=default&format=html

Our gpt(8) states "bootme flag is used to indicate which partition
should be booted by UEFI boot code", which could be read either way.


Re: GPT attributes in dkwedge [PATCH]

2023-09-18 Thread David Brownlee
On Sun, 17 Sept 2023 at 23:25, Robert Elz  wrote:

> [...]
>
> That is what you MUST NOT do, BOOTME has nothing whatever to do
> with what is root.   That's the part that must be done some other way.
>
> (The bit where the flag was copied into the wedge info was just a
> layer violation, and easy to avoid, as your patch showed, that was
> never the real issue.)
>
> Fortunately, it seems (as demonstrated by later discussion) that the
> "other way" already exists, and none of this is needed at all.

Apologies for a potentially dumb question.

Specifically in the absence of any other information (empty devname?
etc), would it not be reasonable to fall back to the bootme marked
filesystem as a root filesystem candidate? I'm thinking about
minimally configured disks moving between machines

Thanks

David


Re: [GSoC] Emulating missing Linux syscalls project questions

2023-03-22 Thread David Brownlee
On Sun, 19 Mar 2023 at 04:22, Theodore Preduta  wrote:
>
> > The Linux Test Project (http://linux-test-project.github.io) would help
> > not only with finding missing syscalls, but also with finding bugs /
> > missing functionality in the existing Linux emul code.
>
> Yes this is a great idea!  Although my interpretation of the project
> idea is that the expectations are that the binary is functional by the
> end of the summer.  I obviously will not be able to implement all
> missing syscalls by the end of the summer, so I would have to draw an
> arbitrary line as to what I would/would not try to implement.
>
> Which brings me to my next comment.
>
> > It would be nice to have this running on NetBSD.
>
> In what way exactly does the LTP not function on NetBSD?  I tried it
> today and (after a few hours of troubleshooting) seemingly got it to work.
>
> Some assorted notes about what I did/what it took to get it to work:
>
> - I only looked at system call tests (so for all I know the other types
> of tests could be what you're referring to).
>
> - The actual testcases themselves can be trivially (just add -static)
> statically compiled on any Linux distro and can be run individually just
> fine, but the rest of the testing infrastructure cannot (because glibc).
> (Most of my time spent on this was dealing with glibc versions)
>
> - Otherwise you can compile everything normally on OpenSUSE 15.4, and
> with suse15_base installed the binaries will *almost* just work.
>
> - The ltp-pan binary does depend on /dev/kmsg (which doesn't currently
> exist in the emul code), but only writes to it, so touch
> /emul/linux/dev/kmsg is sufficient to trick it into working.
>
> - As expected, lots of tests fail, but also lots of tests pass!  I
> haven't looked too hard into the failing tests (yet), but I didn't find
> anything too surprising in the list of failing tests.
>
> Overall, I did enjoy going down this rabbit hole!  It definitely taught
> me a few new things about how the emul subsystem behaves.

I think making progress on running LTP on NetBSD (*), and fixing up a
useful subset of missing or incomplete syscall implementations would
make an _excellent_ GSoC project :)

*: Even if it's "Build all the test infrastructure on Linux, then run
it on NetBSD under Linux emulation", or "Build and run it on Linux,
with the individual tests run as a ssh to a NetBSD box" - as the more
interesting part is what the tests show

David


Re: Add five new escape sequences to wscons

2023-01-18 Thread David Brownlee
On Mon, 16 Jan 2023 at 17:20, Valery Ushakov  wrote:
>
> On Mon, Jan 16, 2023 at 09:18:53 -0300, Crystal Kolipe wrote:
>
> > It's useful, because these sequences correspond to the terminfo
> > capabilities rin, indn, vpa, hpa, and cbt as defined in the xterm
> > terminfo entry.  With these sequences implemented, it becomes
> > slightly more practical to set TERM=xterm when connecting to remote
> > systems that don't have a comprehensive terminfo database.
>
> Why is it desirable to set specifically TERM=xterm instead of, say,
> vt220, or whichever vt entry describes wscons the closest?
>
> For multi-line scroll the patch just calls scrollup/scrolldown, but
> that's not what the single-line scroll commands do (see
> wsemul_vt100.c)
>
> I'm actually not entirely convinced that it's even correct to describe
> vt220 as having sf/ind scrolling capabilities, b/c the vt220 scrolling
> sequences take the scrolling region into account and the terminfo
> capabilities for scrolling are defined to operate on the whole screen
> as far as I can tell.
>
> So in its current form I don't think this patch is suitable and I'm
> not convinced it's needed at all.

Technically the wscons terminal type is wsvt25, an extended ANSI
compatible terminal, already supporting more sequences than vt100.

Having it also support a useful subset of xterm, providing it doesn't
add an excessive amount of complexity, seems like a useful addition,
particularly if other systems also have a "wscons" with similar
additional handling.

Double checking some of the new capabilities may well be a good idea,
plus noting in comments that they exactly match xterm behaviour, and a
short note in the wscons manpage - I don't have enough ANSI/terminfo
context to add anything directly useful on that point.

Thanks

David


Re: SCSI Polled io fixes for mac68k with PDMA enabled.

2023-01-03 Thread David Brownlee
On Fri, 30 Dec 2022 at 01:08, Izumi Tsutsui  wrote:
>   I don't know a history why mac68k has both MI ncr5380 (GENERICSBC) and
>   the own NCR5380 driver (GENERIC with dev/mac68k5380.c) for years.

Some models only worked with dev/mac68k5380 and some only worked with
MI ncr5380, but there was never an intersection of people able to look
at it and affected machines.

I'm tempted to say that by now, if we can confirm MI ncr5380 works on any
useful subset of machines, it should be the one in GENERIC, with a
GENERICNCR provided for the alternate.

David


Re: SCSI Polled io fixes for mac68k with PDMA enabled.

2022-12-27 Thread David Brownlee
On Thu, 22 Dec 2022 at 11:47, Nathanial Sloss
 wrote:
>
> Hi,
>
> I've found while working with mac68k that devices that require polled io scsi
> transfers would fail with sbc pdma (pseudo dma) when reading and
> writing to the respective device.
>
> I've found that virtual devices for rascsi and the scsi2sd drives would fail
> using pdma.  For virtual disks they would fail always when writing to the
> device, reads were ok.
>
> For virtual ethernet devices they would fail for reading and writing.
>
> To address this I've introduced a flag for sbc.4 PDMA_NO_WRITE which would
> fall back consistently to polled io when writing to the device.
>
> The patch also checks the current scsi transfer control flag XS_CTL_POLL and,
> if it is set, does not use PDMA for that particular transfer.
>
> Please see:
>
> ftp.netbsd.org/pub/NetBSD/misc/nat/sbc_poll_fix.diff
>
> Any objections?

Might it be possible to detect this at runtime - potentially by
trapping a timeout and downgrading to polled io?
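
To make the suggestion a little more concrete, here is a rough sketch
of what a runtime downgrade could look like. Everything in it is
hypothetical: the state structure, the function names and the
one-timeout-is-enough policy are assumptions for illustration and do
not reflect the real sbc(4) code or the patch above. The
`poll_requested` argument stands in for XS_CTL_POLL being set on the
transfer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum xfer_mode { XFER_PDMA, XFER_POLLED };

/* Hypothetical per-target state; the real sbc(4) softc differs. */
struct target_state {
	bool	pdma_disabled;	/* set once a PDMA transfer has timed out */
};

/* Decide how to run the next transfer for this target. */
static enum xfer_mode
choose_mode(const struct target_state *st, bool poll_requested)
{
	assert(st != NULL);
	/* Polled requests and downgraded targets never use PDMA. */
	if (poll_requested || st->pdma_disabled)
		return XFER_POLLED;
	return XFER_PDMA;
}

/* Called when a PDMA transfer times out: downgrade this target. */
static void
note_pdma_timeout(struct target_state *st)
{
	assert(st != NULL);
	st->pdma_disabled = true;
}
```

The cost of this approach is one timed-out transfer per affected
device, which would then need to be retried in polled mode.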

David


Re: Dell PERC H330: no disks, no volumes

2022-09-15 Thread David Brownlee
On Thu, 15 Sept 2022 at 19:27, Brad Spencer  wrote:
>
> In the foggy recesses of my memory this is Just How It Is Done.  At my
> final $DAYJOB we had a set of systems that had some PERC controller in
> them.  The desire was to present the raw disks to Hadoop and the only
> way that could be done was to create a virtual disk for each physical
> device.  There was no other option available to us.

I was annoyed enough by this behaviour to swap out the PERC on my old
T320 for another model, specifically one for which I could find
generic LSI firmware, so it would expose the 8 disks directly to
NetBSD (for ZFS use)

mpii0: SAS9217-8i, firmware 20.0.7.0, MPI 2.0

David


Re: Debugging/fixing a kernel stalled not crashing

2022-08-19 Thread David Brownlee
Tangentially...

If it's an issue picking up the root filesystem, you could boot an
INSTALL type kernel with a built in ramdisk with dhcpcd and sshd
enabled, and see if you can ssh into the box (I think someone had
pre-built arm images which did just that, so the code should be out
there :)

David


Re: killed: out of swap

2022-06-15 Thread David Brownlee
On Wed, 15 Jun 2022 at 08:31, Johnny Billquist  wrote:
>
> On 2022-06-15 06:57, Michael van Elst wrote:
> > b...@softjar.se (Johnny Billquist) writes:
> >
> >> I don't see any realistic way of doing anything with that.
> >> It's basically the first process that tries to allocate another page
> >> when there are no more. There are no other processes at that moment in
> >> time that have the problem, so why should any of them be considered?
> >
> > They might be the reason for the memory shortage. You can prefer large
> > processes as victims or protect system services to keep the system
> > manageable.
>
> So when one process tries to grow, you'd kill a process that currently
> has no issues in running? Which means you might end up killing a lot of
> non-problematic processes because of one runaway process? Seems to me to
> not be a good decision.

As opposed to the process which had a successful malloc some time ago
and is running without issues, and is just about to try to use some of
its existing allocation?

Both options are wrong in some cases. Having a way to influence the
order in which processes are chosen would seem to be the best way to
end up with a better outcome. The existing behaviour should remain an
option, but (at least for me) it would not be the one chosen
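
As a sketch of what "influencing the order" might mean, the following
picks a victim by an administrator-assigned priority, breaking ties by
resident size. None of this exists in NetBSD: the structure, the
`kill_prio` field and the convention that a negative value means
"protected" are all assumptions made up for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process record; none of this exists in NetBSD. */
struct candidate {
	int	pid;
	int	kill_prio;	/* higher = kill sooner; < 0 = protected */
	size_t	resident_pages;
};

/*
 * Return the index of the process to kill: the highest kill_prio wins,
 * ties go to the larger resident set.  Returns -1 if every candidate
 * is protected, in which case a caller could fall back to the existing
 * behaviour of killing the process whose allocation failed.
 */
static int
choose_victim(const struct candidate *c, size_t n)
{
	int best = -1;

	assert(c != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (c[i].kill_prio < 0)
			continue;
		if (best == -1 ||
		    c[i].kill_prio > c[best].kill_prio ||
		    (c[i].kill_prio == c[best].kill_prio &&
		     c[i].resident_pages > c[best].resident_pages))
			best = (int)i;
	}
	return best;
}
```

Leaving every process at the same priority makes this degrade to
"kill the largest process", which is one of the policies mentioned
earlier in the thread.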

David


Re: killed: out of swap

2022-06-14 Thread David Brownlee
On Tue, 14 Jun 2022 at 13:33, Robert Elz  wrote:
>
> NetBSD implements overcommitted swap - many processes malloc()
> (or mmap() which that really becomes in the current implementation)
> far more memory than they're ever going to actually use.  It is only
> when some real physical memory is required (rather than simply a marker
> "zero filled page might be required here") that the system actually
> allocates any real resources.   Similarly pages mapped from a file only
> need swap space if they're altered - otherwise the file serves as the
> backing store for it.
>
> Once upon a time there was a method to turn overcommitted swap off, and
> require actual allocations (of RAM or swap) to be made for all reserved
> (virtual) memory.  I used to enable that all the time - but I haven't seen
> any mention of it in ages, and the mechanism might no longer still exist.

What might be interesting is a way to influence the order in which
processes are chosen to kill...

David


Re: Slightly off topic, question about git

2022-06-06 Thread David Brownlee
On Mon, 6 Jun 2022 at 06:59, Brian Buhrow  wrote:
>
> Hello.  At the risk of raising the debate about which version control
> system we should use, I have a question about git, as well as a comment
> about it relative to the NetBSD source tree.  I should preface my
> comments with the caveat that I am not by any means a git expert, and,
> in fact, I'm barely able to get anything I want out of it.  With that
> said, here are my questions and observations.  I'd be interested to
> know how others work around these issues and/or what you think of my
> observations.
>
> 1.  In CVS, I can do something like:
> cvs log sys/dev/pci/if_bge.c
> and be given a complete history of the changes to that file, as well as
> a list of all the branches that file participates in and which versions
> apply to each branch.  And, I can do this without having to download
> all of the history of that file onto my local storage.
> It seems like the only way to do this with a git repository is to
> download the entire source tree, along with its history and branches,
> using git clone with an infinite depth.  Is this correct?  If not, how
> can I see all the branches of a given repository without having to
> download the entire repository?

git inherently looks at the local copy of the repo. So your options are
- have a local copy
- ssh to somewhere with a local copy
- use a web tool or similar to browse

> 2.  Also, in my exploration of git, it seems like the git log command
> shows all the commits for each tag, rather than the comments for a
> specific file or object in the repository.  Again, is this correct?

You can do either or both - "git log trunk" "git log build.sh" or "git
log trunk build.sh"

As an aside, I have an alias of gl -> "git log --name-status" as I
really prefer to see the filenames changed in each commit

> If I am correct in my guesses about how git works, it seems like I
> would have to download the entire history of the NetBSD source tree if
> I want to browse its branches, or the commit history for any given
> file.  This is a lot of overhead to examine tiny portions of the tree,
> relatively speaking, assuming we move to git for our version control
> system.  It strikes me that requiring this much storage space from
> developers, would be a regression from what we currently do.  Since I
> think we're smarter than that and since we have very smart people on
> our development team, I want to understand what it is that I don't get
> about git that precludes me from having to download the entire history
> of the source tree from day one while still retaining access to that
> history over time.

"It's a feature". Half :) - Seriously though, the ability to actually
browse and search the full history of a source tree as git allows
compared to the godawful eye-of-the-needle view that CVS provides is a
very valuable benefit of the tradeoff of having a local history. When
looking at source tree history I use a cloned copy of the github src,
then apply to the CVS tree as needed.

For people with limited resources it will be a pain, though there are
any number of services which provide remote web access to git trees.
Having said that, the ever increasing memory requirements of modern
gcc are a much bigger pain for limited resources with a relatively
smaller benefit.

I suspect most of this also works with s/git/hg/ assuming NetBSD
switches to a mercurial repo

David


Re: High kernel time, page scan rate & reclaims?

2021-12-05 Thread David Brownlee
On Sun, 5 Dec 2021 at 05:42, Paul Ripke  wrote:
>
> For the archives, since I just got annoyed again by the behaviour (I'm
> running netbsd-9), this was likely fixed in:
>
>  PR kern/54209: NetBSD 8 large memory performance extremely low
>  PR kern/54210: NetBSD-8 processes presumably not exiting
>  PR kern/54727: writing a large file causes unreasonable system behaviour
>
> in -current, and will be in netbsd-10.

Just curious, but would you be willing to test boot a current kernel
(with every other file unchanged) to see if it does resolve everything
for you? If you can reproduce it currently in single user mode even
better as it narrows the test even further :)

David


Re: [PATCH] Move DRM-driver firmware from base to its own set, gpufw

2021-09-23 Thread David Brownlee
On Thu, 23 Sept 2021 at 17:57, Robert Swindells  wrote:
>
> David Brownlee  wrote:
> >
> >If gpu firmware is somewhat special, is there any sense in moving it
> >to /usr/libdata/firmware/gpu/... ?
>
> No.
>
> It needs to be in /libdata so that it is guaranteed to be on the boot
> filesystem.

Apologies - read that as libdata/firmware -> libdata/firmware/gpu

David


Re: [PATCH] Move DRM-driver firmware from base to its own set, gpufw

2021-09-23 Thread David Brownlee
If gpu firmware is somewhat special, is there any sense in moving it
to /usr/libdata/firmware/gpu/... ?

David


Re: Some changes to autoconfiguration APIs

2021-08-01 Thread David Brownlee
As an alternative to switching config_found() to a C99 init variant...

Code could be added to a tool which processes the source (cough cough
"lint") to scan config_found() calls and pick up semantically invalid
parameter uses

David


Re: Some changes to autoconfiguration APIs

2021-08-01 Thread David Brownlee
On Sun, 1 Aug 2021 at 22:47, Jason Thorpe  wrote:
>
> > On Aug 1, 2021, at 1:56 PM, Mouse  wrote:
> >
> >>>  config_found(CF_VERSION, self, whatever, &(const struct cfargs){
> >>>  .search = ...,
> >>>  .locators = ...,
> >>>  });
> >
> >> What do you propose should be the behavior if the versions don't
> >> match?  I h$
> >
> > I thought the mail you replied to said, though admittedly partly by
> > implication:
> >
> >>> config_found() needs to check passed cf_version and convert for old
> >>> versions.  We are still left with a long tail of conversion code in
> >>> config_found(), but callers Just Work.
>
> Right, "callers Just Work" is carrying a lot of water here.  I want to
> know specifically how people think it should behave.  For example: What
> should happen in the case of a semantic conflict that can't be resolved
> during conversion?
>
> (If you can't tell, I'm a bit annoyed about folks having plenty of
> energy to express their distaste with one solution, only to float a
> hand-wavy alternative lacking specifics that also has flaws; sorry,
> abs@, I'm not trying to pick on you here...).

Not at all - my goal was to propose a potential alternative, and
poking at gaps helps evaluation :)

As I see it:

1) netbsd-9 had an API which provided some degree of type safety, but
was the result of accreting a baroque combination of functions and
parameters to the point where it was difficult to use correctly - and
the tree had any number of examples which were actively wrong, and
would fail at runtime, mostly with misbehaviour, but potentially with
panics

2) current has an API which is much easier to understand and use, has
a nice degree of forward compatibility, though introduces some
potential misuse cases which can only be detected at runtime - as a
deliberate tradeoff to achieve a simple, compat calling API given the
limitations of C

3) This email takes one of Taylor's suggestions and hangs an explicit
version on the calls, which should give reasonable forward
compatibility (not as good as 2, but better than 1), keeps his
improved type safety, to hopefully give a more limited set of cases
which would fail at runtime (mis-specified cfargs contents, and cases
where a valid cfargs_v1 cannot be converted into a current cfargs)

Focussing on 2 & 3, the runtime issues are

a) Tag params missing value params & similar (applicable to 2)
b) Semantically valid options which do not make sense (applicable to 2 & 3)
For both of these the kernel can panic, or fail the attach with a
nasty loud message (which I rather prefer), but we have the same
runtime issue to handle for both 2 & 3

c) Parameters which made sense for an earlier version of the kernel
API, but do not now (applicable to 2 & 3)
The obvious reply is "Don't do that", but if for some reason we have
to, option 3 potentially has an advantage here, as for example the
conversion code called by config_found() can know that the "search"
value in cfargs_v1 needs to be swizzled differently to that of
cfargs_v2

tl;dr - all options allow code to call into config with bad data,
which it has to handle (panic or log & fail attach), we can only try
to reduce, not eliminate that.

(Let me know if I've reduced the hand waving in the right area :)

David


Re: Some changes to autoconfiguration APIs

2021-08-01 Thread David Brownlee
On Sun, 1 Aug 2021 at 21:50, Jason Thorpe  wrote:
>
> > On Aug 1, 2021, at 12:48 PM, David Brownlee  wrote:
> >
> > Possible  thought to provide type safety with automatic versioning.
> >
> > Use C99 initializers with a CF_VERSION define. When cfargs changes we
> > bump CF_VERSION.
> >
> > config_found() needs to check passed cf_version and convert for old
> > versions. We are still left with a long tail of conversion code in
> > config_found(), but callers Just Work.
> >
> >   config_found(CF_VERSION, self, whatever, &(const struct cfargs){
> >   .search = ...,
> >   .locators = ...,
> >   });
>
> I would probably hide it in a macro (part of what I object to about
> this method, which was floated before, is that it is needlessly
> verbose).
>
> What do you propose should be the behavior if the versions don't match?
> I have an idea in mind, but I want to hear a concrete proposal first.

Well, we're well into perl TMTOWTDI territory here, but my first
thought would be:
- We start with CF_VERSION 1 and struct cfargs
- when bumping from 1 to 2, copy the existing cfargs to cfargs_v1 then
update, and add a convert_from_cfargs_v1 function
- config_found() starts by checking if cf_version != CF_VERSION and
calls convert_from_cfargs_v1 as needed
- when bumping from 2 to 3, repeat with _v2, plus update
convert_from_cfargs_v1, and add a new case to the start of
config_found()
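
A minimal sketch of the steps above, as they might look just after the
first bump from 1 to 2. The struct layouts are simplified stand-ins
rather than the real cfargs, the added `iattr` field is only an
example of "something that changed in v2", and `cfargs_normalize()` is
a hypothetical name for the check that would sit at the start of
config_found().

```c
#include <assert.h>
#include <stddef.h>

#define CF_VERSION 2

/* Hypothetical v1 layout, kept only so old callers can be converted. */
struct cfargs_v1 {
	int		(*search)(void *);
	const int	*locators;
};

/* Hypothetical current (v2) layout: one field added at the bump. */
struct cfargs {
	int		(*search)(void *);
	const int	*locators;
	const char	*iattr;
};

static void
convert_from_cfargs_v1(const struct cfargs_v1 *old, struct cfargs *cur)
{
	cur->search = old->search;
	cur->locators = old->locators;
	cur->iattr = NULL;	/* did not exist in v1 */
}

/*
 * Stand-in for the version check at the top of config_found().
 * Returns 0 and fills *out on success, -1 for an unknown version.
 */
static int
cfargs_normalize(int cf_version, const void *args, struct cfargs *out)
{
	assert(args != NULL && out != NULL);
	switch (cf_version) {
	case 1:
		convert_from_cfargs_v1(args, out);
		return 0;
	case CF_VERSION:
		*out = *(const struct cfargs *)args;
		return 0;
	default:
		return -1;
	}
}
```

Note that passing the arguments through `const void *` is exactly
where type safety is given up again: the version number is the only
thing telling the callee which layout it has been handed.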

David


Re: Some changes to autoconfiguration APIs

2021-08-01 Thread David Brownlee
On Sun, 1 Aug 2021 at 15:57, Jason Thorpe  wrote:
>
> > On Aug 1, 2021, at 5:15 AM, Martin Husemann  wrote:
> >
> > On Mon, May 10, 2021 at 10:30:09PM -0700, Jason Thorpe wrote:
> >>
> >>> On May 10, 2021, at 7:58 PM, matthew green  wrote:
> >>>
> >>> please, can we revert and re-do with a type-safe API.
> >>
> >> I don't plan to revert, but I will consider a betterly-typed API
> >> that's not extremely cumbersome to use.  I am not a fan of Taylor's
> >> proposals.  Concrete proposals welcome.
> >
> > Ping?
> >
> > A decision on this API needs to happen before the netbsd-10 branch
> > (this is on the branch blocker list) - we need to either backout or move
> > forward some way.
>
> The situation hasn’t changed.  I’m still waiting for concrete proposals.
>

Possible  thought to provide type safety with automatic versioning.

Use C99 initializers with a CF_VERSION define. When cfargs changes we
bump CF_VERSION.

config_found() needs to check passed cf_version and convert for old
versions. We are still left with a long tail of conversion code in
config_found(), but callers Just Work.

   config_found(CF_VERSION, self, whatever, &(const struct cfargs){
   .search = ...,
   .locators = ...,
   });

David


Re: 9.1: boot-time delay?

2021-05-18 Thread David Brownlee
On Tue, 18 May 2021 at 20:02, Mouse  wrote:
>
> I'm dealing with a turnkey product running under 9.1/amd64.  On certain
> hardware, there is a pause, almost exactly 22 seconds, during autoconf.
> I'm trying to eliminate it.  A sufficiently cut-down kernel does the
> job, but another cut-down kernel doesn't.  I'm trying to track down
> what's responsible.  (The kernel that eliminates the pause is used by
> the installer; the one that doesn't is the one that's used in normal
> operation.  Unless it turns out to be something essential for
> operation, I'd like to cut it out of the operational kernel.)
[...]
> [ 3.288539] uhub2: 4 ports with 4 removable, self powered
> [ 3.288539] uhub3: 6 ports with 6 removable, self powered
> [25.272567] wd0 at atabus0 drive 0
> [25.273568] wd0: 

I'd take a long hard look at what ata or atapi devices were configured
in the kernel - smells like a timeout (though I would have expected 30
seconds...) Though that seems obvious enough to have already been
checked :-p

David


Re: one remaining mystery about the FreeBSD domU failure on NetBSD XEN3_DOM0

2021-04-16 Thread David Brownlee
On Fri, 16 Apr 2021 at 08:41, Greg A. Woods  wrote:

> What else is different?  What am I missing?  What could be different in
> NetBSD current that could cause a FreeBSD domU to (mis)behave this way?
> Could the fault still be in the FreeBSD drivers -- I don't see how as
> the same root problem caused corruption in both HVM and PVH domUs.

Random data collection thoughts:

- Can you reproduce it on tiny partitions (to speed up testing)
- If you newfs, shutdown the DOMU, then copy off the data from the
DOM0 does it pass FreeBSD fsck on a native boot
- Alternatively if you newfs an image on a native FreeBSD box and copy
to the DOM0 does the DOMU fsck fail
- Potentially based on results above - does it still happen with a
reboot between the newfs and fsck
- Can you ktrace whichever of newfs or fsck to see exactly what it's
writing (tiny *tiny* filesystem for the win here :)

David


Re: Bounties for xhci features: scatter-gather, suspend/resume

2021-03-31 Thread David Brownlee
On Fri, 26 Mar 2021 at 09:22, nia  wrote:
>
> On Thu, Mar 25, 2021 at 08:36:25PM +, co...@sdf.org wrote:
> > Hi all,
> >
> > I'd like to offer bounties for the following.
> > I am also utilizing the wiki to make it easy for others to add their own
> > bounties: http://wiki.netbsd.org/projects/funded/
> >
> > 
> >
> > xHCI resume support
> >
> > xhci is everywhere, and for many machines, it's the only remaining step
> > for a flawless suspend/resume experience.
> > xhci_{suspend,resume} are unimplemented, and devices do not work after
> > resume.
> >
> > (Contact nia in http://gnats.netbsd.org/56050 for actual hardware testing)
> >
> > I can offer a bounty of $200 for this.
> > Offer valid until 1/July/2021.
>
> Offering another $100 for this, with the "win condition" being
> working suspend on a lenovo x250. A regression with resume on
> this machine may have been introduced in -current, it can resume
> successfully 100% of the time with -9.

I can add another $200 for working suspend/resume on a Thinkpad T480
(-current resume does not complete - it prints a few "WARNING: TSC
time went backwards by 2650670841" type lines, not sure of -9 state).

David


Re: X vs serial console?

2021-02-09 Thread David Brownlee
On Tue, 9 Feb 2021 at 17:59, Mouse  wrote:
>
> I don't know whether this is kernel or X11.  There are things pointing
> each way.
>
> At work, I've got 9.1 on an amd64 machine.  When I boot it normally -
> console on screen/keyboard - X works fine.
>
> But I'm having an issue.  The machine is rebooting on me, sometimes,
> and I don't know whether it's some kind of quasi-spontaneous hard-reset
> or whether it's a panic.  But, with X on the console, I can't tell
> whether there's a panic or not.
>
> So, I booted it with serial console.  But now, X doesn't seem to work.
> There are a number of curious things involved.

I think NetBSD would really benefit from a way to reparent the console
device at runtime (I appreciate this comment does not directly help in
any way at this point :)

AFAIK X requires a wsdisplay to run on - which you don't seem to get
with a serial console. I wonder if it might be possible to run it on
genfb?

Those dmesg outputs are _so_ different, that something seems very much
off - can you get dmesg.boot from both cases?

David


Re: zfs panic in zfs:vdev_disk_open.part.4

2020-11-30 Thread David Brownlee
On Sat, 28 Nov 2020 at 19:50, Yorick Hardy  wrote:
>
> Dear Juergen,
>
> Of course! I had a slight disaster with my CVS checkout, I will commit
> and request a pullup once I have completed a new checkout.

Can confirm change and pullup fix the issue I was seeing - many thanks!

# uname -v
NetBSD 9.1_STABLE (GENERIC) #0: Sun Nov 29 11:41:49 UTC 2020
mkre...@mkrepro.netbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC
# zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
iris0  7.25T  67.9G  7.18T         -     0%     0%  1.00x  ONLINE  -

:)

David


zfs panic in zfs:vdev_disk_open.part.4

2020-11-22 Thread David Brownlee
I'm seeing a (new?) panic on netbsd-9 with zfs. It seems to trigger
when a newly created zfs pool attempts to be mounted:

panic: vrelel: bad ref count
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x160
vcache_reclaim() at netbsd:vcache_reclaim
vrelel() at netbsd:vrelel+0x22e
vdev_disk_open.part.4() at zfs:vdev_disk_open.part.4+0x44e
vdev_open() at zfs:vdev_open+0x9e
vdev_open_children() at zfs:vdev_open_children+0x39
vdev_root_open() at zfs:vdev_root_open+0x33
vdev_open() at zfs:vdev_open+0x9e
vdev_create() at zfs:vdev_create+0x1b
spa_create() at zfs:spa_create+0x28c
zfs_ioc_pool_create() at zfs:zfs_ioc_pool_create+0x19b
zfsdev_ioctl() at zfs:zfsdev_ioctl+0x265
nb_zfsdev_ioctl() at zfs:nb_zfsdev_ioctl+0x38
VOP_IOCTL() at netbsd:VOP_IOCTL+0x54
vn_ioctl() at netbsd:vn_ioctl+0xa5
sys_ioctl() at netbsd:sys_ioctl+0x5ab
syscall() at netbsd:syscall+0x157
--- syscall (number 54) ---
7e047af6822a:
cpu0: End traceback...

Is anyone seeing anything similar? (I continue to have a bunch of other
boxes which use zfs without issue.)

David


Re: Sample boot.cfg for upgraded systems (rndseed & friends)

2020-09-22 Thread David Brownlee
On Tue, 22 Sep 2020 at 18:02, Jonathan A. Kollasch
 wrote:
>
> On Tue, Sep 22, 2020 at 05:53:49PM +0100, David Brownlee wrote:
> > Should NetBSD be shipping a default boot.cfg in /usr/share/examples
> > (*) - thinking primarily of people who have upgraded from earlier
> > NetBSD versions.
> >
> > I was looking to add in rndseed & just generally sync with the latest
> > version but there doesn't seem to be a default example shipped with
> > the system
> >
>
> /boot.cfg is already shipped as part of the 'etc' set, and is handled by
> etcupdate(8) like any other configuration file.

Ah, thanks, excellent... then I have a different question :-p

What would people think of installing an original copy of the etc set
in /usr/share/examples/etc or similar - it's 4.9M extracted and ~500K
compressed and the ability to compare what is on the system to what it
was shipped with would have saved me so much effort over the years :)


David


Sample boot.cfg for upgraded systems (rndseed & friends)

2020-09-22 Thread David Brownlee
Should NetBSD be shipping a default boot.cfg in /usr/share/examples
(*) - thinking primarily of people who have upgraded from earlier
NetBSD versions.

I was looking to add in rndseed & just generally sync with the latest
version but there doesn't seem to be a default example shipped with
the system

*: Or /usr/mdec, or /etc/... ?

David


Re: Logging a kernel message when blocking on entropy

2020-09-22 Thread David Brownlee
On Tue, 22 Sep 2020 at 12:35, Manuel Bouyer  wrote:
>
> On Tue, Sep 22, 2020 at 02:31:50PM +0300, Andreas Gustafsson wrote:
> > Manuel Bouyer wrote:
> > > I'm not sure we want a user-triggerable kernel printf enabled by default.
> > > This could be used to DOS the system (especially on serial consoles)
> >
> > You can already trigger kernel printfs as an unprivileged user.
> > The first one that comes to mind is "sorry, pid %d was killed:
> > orphaned traced process", but I'm sure there are many others.
>
> I think we should find and remove these (or make them conditional)
> instead of adding unconditional new ones

Maybe a standard way of rate limiting such messages, including
indicating how many were skipped due to rate limiting when the next
one gets printed?
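
To make the idea concrete, here is a minimal userland sketch of such a
limiter. Everything in it (struct msg_ratelimit, ratelimited_msg, the
millisecond clock passed in by the caller) is a hypothetical name for
illustration, not an existing kernel interface; the only point is that the
suppressed count is carried along and reported with the next message that
does get through.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-message rate-limit state; not an existing kernel API. */
struct msg_ratelimit {
	uint64_t last_ms;	/* time the message was last printed */
	uint64_t interval_ms;	/* minimum gap between printed messages */
	unsigned int skipped;	/* messages suppressed since the last print */
	int primed;		/* nonzero once a message has been printed */
};

/*
 * Decide whether msg may be printed at time now_ms.  If so, format it
 * into buf (appending the number of suppressed messages, if any) and
 * return 1; the caller then prints buf.  Otherwise count it and return 0.
 */
static int
ratelimited_msg(struct msg_ratelimit *rl, uint64_t now_ms,
    char *buf, size_t len, const char *msg)
{
	assert(rl != NULL && buf != NULL && msg != NULL);

	if (rl->primed && now_ms - rl->last_ms < rl->interval_ms) {
		rl->skipped++;
		return 0;
	}
	if (rl->skipped > 0)
		snprintf(buf, len, "%s (%u similar messages skipped)",
		    msg, rl->skipped);
	else
		snprintf(buf, len, "%s", msg);
	rl->skipped = 0;
	rl->last_ms = now_ms;
	rl->primed = 1;
	return 1;
}
```

With a 1000 ms interval, a burst of three identical messages would print
the first one, and the next one allowed through would carry "(2 similar
messages skipped)".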

David


Re: fsck updating but not fixing filesystem

2020-08-25 Thread David Brownlee
On Mon, 24 Aug 2020 at 09:04, David Brownlee  wrote:
>
> On Sun, 23 Aug 2020 at 20:50, David Holland  wrote:
> >
> > On Sun, Aug 23, 2020 at 08:14:31PM +0100, David Brownlee wrote:
> >  >
> >  > This time I've run fsck -f repeatedly and each time it marks the
> >  > filesystem as clean, but the next run finds another issue.
> >  >
> >  > This is netbsd-9 amd64 stable from nyftp, DELL, PERC H710P controller,
> >  > running RAID1.
> >
> > Are you sure the raid is clean? If it's not you can get bizarre
> > behavior like this depending on which side of it any given read is
> > serviced from. (That is: any given fsck run will see some of one
> > version and some of the other and make some changes, which may or may
> > not be consistent with what it sees the next time, and it all might
> > converge or might not...)

Hardware raid for the win... Or in this case not.

Took a block copy of the filesystem to another device and it comes
up clean on fsck. I'm... a little annoyed at what purported to be a
relatively nice Dell PERC raid card - battery backup an' all.

Thanks David - I should have known better than to trust hardware... Now I
just need to work out the best way to get to a trustworthy system  :)

David


Re: fsck updating but not fixing filesystem

2020-08-24 Thread David Brownlee
On Mon, 24 Aug 2020 at 11:46, Mouse  wrote:
>
> > I think the general consensus is that ffs can be inconsistent in ways
> > fsck is unable to detect.
>
> ...much less fix.  Yes.  When I was doing the program that eventually
> got massaged into resize_ffs, during development I had some filesystems
> that were definitely corrupted but that fsck was happy with.  (I rather
> wish I'd saved some of them as test cases, but I didn't.)

Sounds like there is an interesting fuzzing project in there for
someone - make a filesystem image and then repeatedly damage it, then
see if fsck can fix it, then if you get a rump panic when moving
everything around, and then re-run fsck to see if it indicates any new
issues :)

(So far 3.5TB of my original RAID1 filesystem transferred to a plain
disk, so should be able to run some A/B fsck tests later today to
establish if the raid controller is the issue in this case)

David


Re: fsck updating but not fixing filesystem

2020-08-24 Thread David Brownlee
On Sun, 23 Aug 2020 at 20:50, David Holland  wrote:
>
> On Sun, Aug 23, 2020 at 08:14:31PM +0100, David Brownlee wrote:
>  >
>  > This time I've run fsck -f repeatedly and each time it marks the
>  > filesystem as clean, but the next run finds another issue.
>  >
>  > This is netbsd-9 amd64 stable from nyftp, DELL, PERC H710P controller,
>  > running RAID1.
>
> Are you sure the raid is clean? If it's not you can get bizarre
> behavior like this depending on which side of it any given read is
> serviced from. (That is: any given fsck run will see some of one
> version and some of the other and make some changes, which may or may
> not be consistent with what it sees the next time, and it all might
> converge or might not...)

No problems are indicated by envstat for mfii, or in the BIOS setup
interface (Careful phrasing there).

However, I have a spare 8TB disk I can attach to the onboard ahcisata,
dd the filesystem across and re-run the fsck to confirm.
(I may be a little while in following up with that result :)

On Sun, 23 Aug 2020 at 21:26, Michael Cheponis
 wrote:
>[...]
> Then I was wondering: given today's disks are mostly lying to the software 
> about how they're (internally) configured --- is there a 'better' FFS
> (FFSv3 ?) that would better map to today's disks?  Might there be a better 
> FFSvN for SSDs vs big HDs?  Or just wait till ZFS is up to snuff?

I would seriously consider ZFS - I have a couple of other boxes
running ZFS, but this particular one panics if any zpool is mounted in
multiuser (kern/55602)

David


fsck updating but not fixing filesystem

2020-08-23 Thread David Brownlee
I have a reasonably large ffs filesystem (7.4TB, 35,459,874 files)
used as a dirvish backup target (dirvish creates a hardlink tree copy
of the previous backup, and then runs rsync over it to provide
relatively space efficient backups).

One of the rsync processes hung, and upon reboot fsck checked the
filesystem and marked it clean, but after a while it happened again,
and then again a third time.

This time I've run fsck -f repeatedly and each time it marks the
filesystem as clean, but the next run finds another issue.

This is netbsd-9 amd64 stable from nyftp, DELL, PERC H710P controller,
running RAID1.

filesystem was mounted -o log, which could have contributed to getting
into this state, but presumably fsck should be able to get it out?
(Waves hands and mumbles "triple indirect blocks")

Each fsck run takes a little over 2 hours to complete (hence the
desire to run with -o log)

A sample is below.

** /dev/rdk5
** File system is already clean
** Last Mounted on /home/media
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED  I=112567242  OWNER=1000 MODE=40775
SIZE=1536 MTIME=Jun  8 17:11 2020
DIR=?

SALVAGE? yes

MISSING '.'  I=112567242  OWNER=1000 MODE=40775
SIZE=1536 MTIME=Jun  8 17:11 2020
DIR=?

FIX? yes

MISSING '..'  I=112567242  OWNER=1000 MODE=40775
SIZE=1536 MTIME=Jun  8 17:11 2020
DIR=/.backup/server1/20200628/tree/opt/server/backup/source/e7/0154904991e7bc764e08dbcd93b5/8c

FIX? yes

** Phase 3 - Check Connectivity
** Phase 4 - Check Reference Counts
LINK COUNT FILE I=67564638  OWNER=1000 MODE=100664
SIZE=14190 MTIME=May 13 03:14 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564639  OWNER=1000 MODE=100664
SIZE=45384 MTIME=May 13 03:19 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564640  OWNER=1000 MODE=100664
SIZE=52785 MTIME=May 13 03:18 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564641  OWNER=1000 MODE=100664
SIZE=56018 MTIME=May 13 03:24 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564642  OWNER=1000 MODE=100664
SIZE=34840 MTIME=May 13 03:34 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564643  OWNER=1000 MODE=100664
SIZE=87961 MTIME=May 13 03:31 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564644  OWNER=1000 MODE=100664
SIZE=24847 MTIME=May 13 03:42 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564645  OWNER=1000 MODE=100664
SIZE=43803 MTIME=May 13 03:44 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564646  OWNER=1000 MODE=100664
SIZE=55538 MTIME=May 13 03:50 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564647  OWNER=1000 MODE=100664
SIZE=64131 MTIME=May 13 04:05 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564648  OWNER=1000 MODE=100664
SIZE=32730 MTIME=May 13 04:00 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564649  OWNER=1000 MODE=100664
SIZE=35156 MTIME=May 13 04:50 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564650  OWNER=1000 MODE=100664
SIZE=91008 MTIME=May 13 05:04 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=67564651  OWNER=1000 MODE=100664
SIZE=15127 MTIME=Jun  8 17:11 2020   COUNT 10 SHOULD BE 9
ADJUST? yes

LINK COUNT FILE I=103736490  OWNER=1000 MODE=100664
SIZE=12134 MTIME=Mar 17 01:08 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736491  OWNER=1000 MODE=100664
SIZE=12007 MTIME=Mar 17 01:08 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736492  OWNER=1000 MODE=100664
SIZE=13711 MTIME=Mar 17 01:13 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736493  OWNER=1000 MODE=100664
SIZE=5313 MTIME=Mar 17 01:14 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736494  OWNER=1000 MODE=100664
SIZE=9659 MTIME=Mar 17 01:14 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736495  OWNER=1000 MODE=100664
SIZE=32231 MTIME=Mar 17 01:19 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736496  OWNER=1000 MODE=100664
SIZE=50302 MTIME=Mar 17 01:19 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736497  OWNER=1000 MODE=100664
SIZE=56209 MTIME=Mar 17 01:20 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736498  OWNER=1000 MODE=100664
SIZE=18932 MTIME=Mar 17 01:20 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736499  OWNER=1000 MODE=100664
SIZE=47033 MTIME=Mar 17 01:21 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736500  OWNER=1000 MODE=100664
SIZE=20355 MTIME=Mar 17 01:21 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736501  OWNER=1000 MODE=100664
SIZE=5218 MTIME=Mar 17 01:22 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736502  OWNER=1000 MODE=100664
SIZE=12071 MTIME=Mar 17 01:24 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736503  OWNER=1000 MODE=100664
SIZE=51133 MTIME=Mar 17 01:25 2020   COUNT 12 SHOULD BE 11
ADJUST? yes

LINK COUNT FILE I=103736504 

Re: modules item #14 revisited

2019-12-07 Thread David Brownlee
Very much like this - would assume that modules.tgz goes away?

Could logical extensions to this be:
a) Allow including a miniroot as a separate file
b) Use ustarfs to allow handling this layout of kernel, modules and/or
miniroot as an (optionally compressed) tar file

Thanks

David


Re: Adding an ioctl to check for disklabel existence

2019-10-03 Thread David Brownlee
While I agree NetBSD needs to support and work well with GPT in order
to interoperate with other systems, there is also prior art in
extending disklabel to 64bits - OpenBSD did this back in 2007 (though
there were a fair few follow up commits to clean up the fallout :)

https://github.com/openbsd/src/commit/ddfcbf38c8ab6225a6b172d829aa957007d2587f#diff-192d23728acf9d8a70ab7259784d4162

David


Linux emulation epoll support?

2019-05-17 Thread David Brownlee
Is anyone working on epoll() or inotify support for compat_linux?

Most recent Linux binaries seem to expect epoll() to be available.

I noticed that there was some work towards it in FreeBSD...
https://wiki.freebsd.org/linux-kernel

Thanks :)

David


Re: Proposal: new audio framework

2019-04-02 Thread David Brownlee
On Tue, 2 Apr 2019 at 08:11, Tetsuya Isaki  wrote:
> Here are the details:
>
> On -current, as you know, the blocksize is decided as follows:
>  1. audio layer selects some size and asks the hardware driver for it.
> This is round_blocksize interface.
>  2. if hardware driver cannot accept the size (for example, DMA
> restrictions), hardware driver returns desired new size.
>  3. audio layer accepts it unconditionally.
>
> Due to Step3's behavior and rumors (or obsoleted restriction?) that
> block size must be a power of two, many drivers return the different
> size even if proposed size is acceptable.
>
> AUDIO2 internal takes a block-oriented strategy, not a bytestream-
> oriented, for performance and simplicity.
> So, AUDIO2 changed it as follows:
>  a1. audio layer calculates suitable blocksize from its hardware
>  precision(stride), channels, frequency and blk_ms (= block
>  length in msec) parameters.
>  a2. and then ask it to hardware driver.  It's round_blocksize.
>  a3. But if the hardware driver returns the other size, audio layer
>  cannot accept it because proposed size was calculated from
>  hardware encoding.
> At the moment, I have no good idea for this case. :(

If the various hardware restrictions are simple enough that they could
be encoded as a small struct - eg
boolean power_of_two;
unsigned int min_size;
unsigned int max_size;
then would it be reasonable for AUDIO2 to request the restrictions
from the driver and adjust blk_ms or other parameters until it finds a
fit?


Re: setting DDB_COMMANDONENTER="bt" by default

2018-02-15 Thread David Brownlee
On 15 February 2018 at 17:51, Manuel Bouyer  wrote:

> On Thu, Feb 15, 2018 at 01:19:31AM +, Sevan Janiyan wrote:
> >
> >
> > > On 15 Feb 2018, at 01:09, Paul Goyette  wrote:
> > >
> > > Sounds like a good case for a custom kernel.  Not sure that such a
> > > specific situation would warrant turning this on for everyone...
> >
> > We do have this set by default on some config files albeit with
> > differing commands to run e.g xen kernels.
>
> The problem with setting it by default is that the important information
> (the panic message, or the function where the fault happened) may
> be scrolled out of the screen by the stack trace. So I wouldn't
> recommend activating it by default.
> the Xen kernels are a special case, because the console output
> happens in an environment where it's easy to scroll back.
>

Is there some useful variant where the panic message is shown again at the
end of the stack trace, or the stack trace defaults to a very small number
of entries?

David


Re: Proposal: Disable autoload of compat_xyz modules

2017-08-03 Thread David Brownlee
On 3 August 2017 at 12:11, Maxime Villard  wrote:

> Le 03/08/2017 à 10:42, matthew green a écrit :
>
>> Otherwise it has to be balanced.

>>>
>>> Certainly. It does not seem to me that moving compat_linux* into modules
>>> is in any way illegitimate or unbalanced. That's the opinion I was
>>> stating.
>>>
>>
>> if you want to move useful and used by a large number of users
>> functionality out of GENERIC and into modules then first perhaps
>> you should consider fixing modules.
>>
>> there are a large number of basic functionality issues that no
>> one pushing modules has solved yet.  for a start, see lukem's
>> original proposal about having a kernel+modules container,
>> the functionality of which is _essential_ before it's going
>> to be considered OK to remove standard functionality from
>> GENERIC.
>>
>
> If your argument now is that there are technical difficulties that make
> switching to a module approach a complicated business, beyond the
> simplistic "I don't want to type modload" stuff - which I don't agree
> with -, then that's a fair point.
>
> As I said, doing this work certainly involves, among others, finding a way
> to remove the many #ifdefs spread across the tree; and having tried to do
> so two years ago, I know it is a painful work.
>
>> claiming that compat_linux isn't a major piece of usability
>> is simply ignoring reality.
>>
>
> I have never claimed it is not used. It is an important feature, but it
> also happens to have many places that need special care, which regularly
> turn out to be exploitable. If we can reduce the attack surface and at the
> same time keep the feature nearby, in a balanced way that does not impose
> too much burden on the regular users, then we should do it. But that's
> indeed ignoring the technical difficulties behind achieving this goal.
>
>

How about a sysctl to enable/disable any non netbsd_ compat usage?
With it off, compat code in GENERIC will not be run and (non netbsd32 etc)
compat modules will not be loaded.

David


Re: DISKLABEL_EI option for system with MBR

2017-02-15 Thread David Brownlee
On 12 February 2017 at 11:57, Rin Okuyama  wrote:
> Michael, Martin, thank you for letting me know about wedge(4).
> It is exactly what I need! It is more portable than my patch.
> I withdraw the patch and the PR.

I think that DISKLABEL_EI would still be a good idea - as it would
make other endian disklabels Just Work for people (including easy
fstab usage)


Re: /dev/sdN -> /dev/sdN[cd] (was: port-amd64/51216: Can't create wedges on a large (3TB) disk, gpt is ok but dkctl gives an error message)

2016-06-07 Thread David Brownlee
On 7 June 2016 at 10:00, Robert Elz  wrote:
> Date:        Mon, 6 Jun 2016 18:35:43 +0200
> From:        Edgar Fuß
> Message-ID:  <20160606163542.gr5...@trav.math.uni-bonn.de>
>
>   | > ie /dev/wd1 is a link to /dev/wd1d on i386 (etc) or /dev/wd1c (on sparc etc)
>   | YES.
>
> I offer attached alternate patches, the first makes /dev/wd0 as a chrdev
> and the second as a link.
>
> I do not have all the various architectures that have the various different
> strategies for naming and minor-numbering disk devices to test this thoroughly
> though, but what I have tested seems to work, and the changes (both versions)
> are so simple they seem unlikely to fail (and if they do, the effect would
> just be that the new nodes would not be correct, all the ones we're used to
> having would be fine, so simply removing the bogus ones would return the
> universe to its current state.)
>
> I prefer the chrdev version ... it is robust against removal of the ?dNx
> node names, which (sometime later, after tools/scripts have been adapted
> not to seek out the ?dN[cd] device names explicitly) might be something to
> do on a system using GPT and wedges (or even disklabel wedge autodiscovery).
> It also will provoke any lingering bugs if anything is currently relying on
> vnode locking for device exclusivity (with two different vnodes for the same
> underlying device).  But either version should work (only one of them
> of course!)
>
> Either version consumes 2 more names, and inodes, per disk device configured.
>
> Opinions?

Also would prefer the chrdev version. We probably want to ensure these
are added to install media as well (which may push some of them over a
current inode limit but that is much less of a tweak than the ongoing
kernel growth :)


Re: Locking strategy for device deletion (also see PR kern/48536)

2016-06-07 Thread David Brownlee
On 7 June 2016 at 11:28, Paul Goyette  wrote:
> Can anyone suggest a reliable way to ensure that a device-driver module can
> be _really_ safely detached?
>
> The module could theoretically maintain an open/ref counter, but making this
> MP-safe is "difficult"!  Even if the module were to provide a mutex to
> control increment/decrement of its counter, there's still a problem:
>
> Thread 1 initiates a module-unload, which takes the mutex
>
> Thread 2 attempts to open the device (or one of its units), attempts to
> grab the mutex, and waits
>
> Back in thread 1, the driver's module unload code determines that it is safe
> to unload (no current activities queued, no current opens), so it
> goes forward and unmaps the module - including the mutex!
>
> If the unload code releases the mutex, then thread 2 resumes, at an address
> which has been unmapped, leading to all sorts of bad-stuff(tm).
> (And, if the unload code doesn't bother to release the mutex before
> destroying it, then thread 2 stalls indefinitely.)
>
> There currently doesn't seem to be a safe way to unload driver modules.
>
>
> Any good MP-safe suggestions?

Other than having the mutex be for a nullable pointer to the device
which persists after driver detach and is reattached when the driver
reattaches, which adds an extra pointer dereference for every use...
:/


Re: RAIDframe raidN device order

2016-04-20 Thread David Brownlee
On 20 April 2016 at 10:22, Edgar Fuß  wrote:
> When I configure my RAIDframe devices using raidN.conf, I may run into the
> problem that after a reboot, the MPT controller may have assigned new pseudo
> SCSI Target ID to my SAS discs, so they get different sdN numbers and the
> array may fail to configure.
> The solution seems to be to use auto-configuration for the arrays. But then,
> how do I know which raidN device is which array?
> Is there any way to use UUIDs to solve the problem?

tl;dr - the Right Thing should Just Happen

autoconfig will try to persist the raidN number, so once you have them
set up you should be able to renumber any and all devices which provide
raid partitions and have everything still work with the same raidN
numbers (providing you can still boot the kernel if you are booting
from raid :)

The only time the raidN number will change is if you have two
autoconfig devices with the same number (eg: when adding an already
set up raid to a machine which overlaps with an existing raid).


Re: Simplify bridge(4)

2016-02-16 Thread David Brownlee
On 15 February 2016 at 04:01, Ryota Ozaki  wrote:
> On Sat, Feb 13, 2016 at 7:19 AM, Mouse  wrote:
>> Sounds to me as though the most sensible way to model that would be to
>> give the address to the bridge interface itself.
>>
>> I don't think I've tried that.  If it does not work, is there any
>> particular reason to add vether(4) rather than making it work?  If it
>> does work, what functionality would vether(4) provide over it?
>
> It's a design choice. FreeBSD adopts extending bridge(4) to assign
> IP addresses and OpenBSD adopts vether(4). Both work and neither
> is wrong.
>
> I prefer vether's approach because it keeps bridge(4) simple still
> providing the same functionality of extending bridge itself.

I think NetBSD supporting vether would also fix a couple of (at least
interesting to some :) related use cases.

a) Single interface machine running xen which needs the xen VMs on an
internal network with dhcp and VPN/NAT on the external interface (this
becomes quickly brain twisting and the solution is to plug in an
additional ethernet card, just to act as the bridge endpoint)

b) Running an emulator (which expects to tap onto an ethernet
interface) on a machine with only a wifi interface


Re: i386 vs radeondrmkms problem - isa attachments suck

2015-03-02 Thread David Brownlee
On 28 February 2015 at 09:44, matthew green m...@eterna.com.au wrote:


 hi folks.

 i've been trying to find a least-ugly solution to the radeondrmkms
 on i386 problem.  quick summary of what's wrong:

 radeondrmkms doesn't complete attachments (and most
 importantly create a wsdisplay) until mountroot completes.
 this means it happens quite late in boot.  in i386 GENERIC,
 vga@isa and pcdisplay@isa are still enabled and they will
 attach to the legacy vga device, and present a wsdisplay0
 to the system.  later, radeon0 attaches, and we get a
 wsdisplay1 that has taken over the console output.

 this leaves us with a non-working console output, and the inability
 to run X11 even if accessed remotely.

 my first attempt (that is currently committed), made the radeondrmkms
 driver attempt to map the isa vga registers to reserve them from the
 vga@isa, and while that worked on my serial console machine, it does
 not work on a normal system due to x86 consinit() attaching the
 basic vga console driver (so we get early console output.)  in this
 case, it has already mapped these registers (ie, radeon is unable
 to map them) and the later real attachment knows not to attempt it
 again.  so that method doesn't work.

 we could have the vga driver detach itself at the right point, but
 that leaves the console detached for quite a while, during the time
 that drm is getting setup (ie, we'd miss several of its early
 messages.)  that seems less than desirable.

 it was suggested having a fake driver to attach instead of vga and
 thus avoiding the second phase of vga attachment, however this does
 not work due to the way isa indirect attachment works.  the first
 match routine that returns non-zero is attached, and the order of
 routines called seems to be something config(1) generates.  so having
 a radeon@isa that returns a higher priority does nothing if the
 ordering is bad.  this means that the current expectation of eg,
 the vga@isa vs pcdisplay@isa drivers (where vga returns a higher
 match) is not used, it just happens that the cfdata[] array has the
 vga@isa entry before pcdisplay@isa.

Is the ignoring of attach priority a general characteristic of
indirect buses, and might it make sense for config to be able to
explicitly prioritise the order of the cfdata[] entries? I know uebayasi@
has been rototilling config and wondered if he could be interested...
:)

 this is not a problem for the old drm code, as it does not create
 wsdisplay itself, but relies on the vga driver to do so.

 (see isa.c:isasearch() config_match*() call for where the first match
 to return non-zero is used.)

 any one have any other ideas?  at this point to make DRMKMS work for
 i386 on -7, i think we may have to create a LEGACY kernel that has
 the vga|pcdisplay@isa drivers (and probably no drm at all?), and turn
 these devices off in GENERIC itself, but perhaps someone has a less
 ugly idea.


Re: posix message queues and multiple receivers

2013-12-05 Thread David Brownlee
On 3 December 2013 22:45, David Laight da...@l8s.co.uk wrote:
 On Tue, Nov 26, 2013 at 01:32:44PM -0500, Mouse wrote:

 When serving a request takes nontrivial time, and multiple requests can
 usefully be in progress at once, it is useful - it typically improves
 performance - to have multiple workers serving requests.  NFS, as
 mentioned above, is a fairly good example (in these respects).

 Except that NFS is a bad example, and mostly should have a single server.

 If you could arrange a NFS server for each disk spindle you might win.

 But what tends to happen is that the disk 'elevator' algorithm makes
 one of the server processes wait ages for its disk access to complete,
 by which time the client has timed out and resubmitted the RPC request.
 The effect is that a slightly overloaded NFS server hits a catastrophic
 overload and transfer rates become almost zero.

 Run a single nfsd and it all works much better.

On that basis should the NetBSD default be changed from -n 4?


Re: high load, no bottleneck

2013-09-24 Thread David Brownlee
http://www.math.uni-bonn.de/people/ef/dotcache/ has a typo in the
first subheading Dotache :)

On 24 September 2013 13:38, Edgar Fuß e...@math.uni-bonn.de wrote:
 We want fsync to do a disk sync, and clients are unlikely to be fixable.
 In my case, the culprit was SQLite used by browsers and dropbox.
 As these were not fixable, I ended up writing a system that re-directs these
 SQLite files to local storage 
 (http://www.math.uni-bonn.de/people/ef/dotcache).

 RMW?
 Read-Modify-Write.
 On a RAID 4/5, writing anything that's not an entire stripe needs either to
 read the rest of the stripe (to be able to compute the new parity) before
 writing the modified part and the parity; or it (if you modify less than half
 the stripe) reads both the old data and old parity to compute the new parity.
 You don't have that on RAID 1, of course.
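
For anyone who has not seen it spelled out, the small-write case above
works because parity is a plain XOR: the new parity is the old parity
XOR the old data XOR the new data. A minimal sketch (function names are
mine, not RAIDframe's) showing that the shortcut gives the same result
as recomputing parity over the whole stripe:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Full-stripe parity: XOR of every data block in the stripe. */
static void
parity_full(uint8_t *parity, const uint8_t *const *data, size_t ndata,
    size_t len)
{
	assert(parity != NULL && data != NULL);

	for (size_t i = 0; i < len; i++) {
		uint8_t p = 0;

		for (size_t d = 0; d < ndata; d++)
			p ^= data[d][i];
		parity[i] = p;
	}
}

/*
 * Read-modify-write update for one block: needs only the old data and
 * the old parity, i.e. two reads before the two writes.
 */
static void
parity_rmw(uint8_t *parity, const uint8_t *old_data, const uint8_t *new_data,
    size_t len)
{
	assert(parity != NULL && old_data != NULL && new_data != NULL);

	for (size_t i = 0; i < len; i++)
		parity[i] ^= old_data[i] ^ new_data[i];
}
```

Either way a partial-stripe write costs extra reads before the writes,
which is the overhead a RAID 1 set avoids.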


Re: high load, no bottleneck

2013-09-24 Thread David Brownlee
crap, apologies for the non checked return address.

In the interest of trying to make a relevant reply - doesn't nfs3
support differing COMMIT sync levels which could be leveraged for
this? (assuming your server is stable :)

aside
I recall using NFS for file storage at Dreamworks in the late '90s and
discovering the reason that the SGI file server boxes outperformed
everything else is that they lied to the client and indicated data had
been synced to disk as soon as it hit memory.

Wonderful performance feature... until someone insisted on putting
known buggy ATM drivers into production which could give up to a GB of
lost data when the fileservers panicked...
/aside


Re: divergence of ffs flags

2013-09-07 Thread David Brownlee
On 3 September 2013 03:04, David Holland dholland-t...@netbsd.org wrote:
 It seems that FreeBSD's and NetBSD's ffs superblock flags have been
 allowed to diverge:
[...]
 -#define FS_SUJ         0x008   /* Filesystem using softupdate journal */
 +#define FS_INDEXDIRS   0x008   /* kernel supports indexed directories */
[...]
 -#define FS_NFS4ACLS    0x100   /* file system has NFSv4 ACLs enabled */
 -#define FS_INDEXDIRS   0x200   /* kernel supports indexed directories */
 -#define FS_TRIM        0x400   /* issue BIO_DELETE for deleted blocks */
 +#define FS_DOWAPBL     0x100   /* Write ahead physical block logging */
 +#define FS_DOQUOTA2    0x200   /* in-filesystem quotas */

What are the options?

I assume we can version something in the superblock so new NetBSD and
FreeBSD code could resolve the overlaps, but that doesn't help old
code...

Pick new values for the conflicting flags, ask FreeBSD to reserve
them, add code to support both versions to all branches, and then in a
release or so migrate across to them?
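
A sketch of what "support both versions" could look like during such a
transition, assuming the code can somehow tell which system wrote the
superblock (that marker is exactly the open question, so fs_origin below
is purely hypothetical, as are the IC_* in-core names). The on-disk bit
values are the ones quoted in the diff above, written as they appear
there.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical marker for which system's flag layout is on disk. */
enum fs_origin { ORIGIN_NETBSD, ORIGIN_FREEBSD };

/* Hypothetical neutral in-core flags, one bit per feature. */
#define IC_SUJ		0x01u
#define IC_INDEXDIRS	0x02u
#define IC_NFS4ACLS	0x04u
#define IC_TRIM		0x08u
#define IC_WAPBL	0x10u
#define IC_QUOTA2	0x20u

/* Translate the overlapping on-disk bits according to their origin. */
static uint32_t
fs_flags_decode(enum fs_origin o, uint32_t disk)
{
	uint32_t ic = 0;

	assert(o == ORIGIN_NETBSD || o == ORIGIN_FREEBSD);

	if (o == ORIGIN_FREEBSD) {
		if (disk & 0x008) ic |= IC_SUJ;
		if (disk & 0x100) ic |= IC_NFS4ACLS;
		if (disk & 0x200) ic |= IC_INDEXDIRS;
		if (disk & 0x400) ic |= IC_TRIM;
	} else {
		if (disk & 0x008) ic |= IC_INDEXDIRS;
		if (disk & 0x100) ic |= IC_WAPBL;
		if (disk & 0x200) ic |= IC_QUOTA2;
	}
	return ic;
}
```

Once both sides agree on non-overlapping values the translation table
shrinks to the identity for new filesystems and is only needed for old
ones.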


Re: netbsd32 emulation in driver open() or read()

2011-08-31 Thread David Brownlee
On 30 August 2011 16:05, Manuel Bouyer bou...@antioche.eu.org wrote:

 On Tue, Aug 30, 2011 at 10:19:20AM -0400, Christos Zoulas wrote:
  On Aug 30,  3:18pm, bou...@antioche.eu.org (Manuel Bouyer) wrote:
  -- Subject: Re: netbsd32 emulation in driver open() or read()
 
  |  Yes, look at PK_32 in the process flags. If you are going to do this,
  |  please look at what FreeBSD did with bpf_ts/bpf_xhdr and the time
  |  format changes and do the same (provide timespec/bintime etc). This
  |  is how they handle compatibility mode too.
  |
  | This is related to the BIOCSTSTAMP ioctl isn't it ? I can see how it's
  | used in kernel but I couldn't find it in userland. So, to me it looks
  | like the old bpf_hdr is used most of the time ...
  | I'm not sure if it's worth implementing BIOCSTSTAMP (and we have to
  | assure compat for bpf_hdr anyway)
 
  Might as well bite the bullet and do the whole thing because with 10Gb+
  ethernet what we have now just does not cut it.

 This is not only the BIOCSTSTAMP that we need then, but also the zero-copy
 stuff, and probably more. And userland tools to use it (because AFAIK
 freebsd's tcpdump still uses the old bpf_hdr ...)

 That may be nice to have, but won't help with my problem which is
 getting a N32 mips binary to talk to a N64 kernel.


If the structure were versioned to have fixed-size 64-bit timestamps, then
the problem goes away for new code, though it does leave a COMPAT50 issue
for older code...
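
Roughly what I have in mind, as a C sketch. The struct and field names are
hypothetical (this is not the real bpf_hdr or bpf_xhdr); the point is only
that every field has a fixed width, so a 32-bit binary and a 64-bit kernel
agree on the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical versioned capture header. Only fixed-width types are
 * used, so the layout does not depend on the size of long or time_t
 * in the ABI the binary was built for.
 */
struct xcap_hdr {
	uint32_t xh_version;	/* layout version, bumped on change */
	uint32_t xh_hdrlen;	/* header length including padding */
	int64_t  xh_sec;	/* seconds, always 64 bits */
	uint64_t xh_frac;	/* fraction of a second, always 64 bits */
	uint32_t xh_caplen;	/* captured length */
	uint32_t xh_datalen;	/* original packet length */
};

/* Compile-time checks that the layout is what we expect. */
static_assert(sizeof(struct xcap_hdr) == 32, "unexpected header size");
static_assert(offsetof(struct xcap_hdr, xh_sec) == 8, "timestamp offset");
```

The version field is what would let a later layout change be detected
rather than guessed at.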


Re: The default system module area path

2011-08-10 Thread David Brownlee
On 10 August 2011 10:53, Marc Balmer mbal...@netbsd.org wrote:
 Currently, we install kernel modules under the following path

 /stand/arch/release/name/name.kmod

 The duplication of the name probably was meant to prevent escaping the
 path when a module name like ../../../foo was given on the commandline.

 I recently changed the module loading behaviour so that a module that is
 loaded from the default system module area must not, and can not,
 contain a path separator character.

 Therefore I suggest that we install modules into

 /stand/arch/release/name.kmod


Seems excessively sane... (+1)
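
As a rough sketch of the rule described above (names containing a path
separator are refused, then the flat path is built), in C. The function
name and arguments are illustrative only and not the kernel's actual code.

```c
#include <stdio.h>
#include <string.h>

/*
 * Build the proposed /stand/arch/release/name.kmod path, refusing
 * module names that are empty or contain a path separator.
 * Returns 0 on success, -1 if the name is rejected or does not fit.
 */
static int
module_default_path(char *buf, size_t len, const char *arch,
    const char *release, const char *name)
{
	int n;

	if (name[0] == '\0' || strchr(name, '/') != NULL)
		return -1;
	n = snprintf(buf, len, "/stand/%s/%s/%s.kmod", arch, release, name);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With the separator check in place a name like ../../../foo is rejected
before any path is formed, so the extra directory level buys nothing.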


Re: Catweasel driver

2010-02-06 Thread David Brownlee
On 6 February 2010 13:33, Frank Wille fr...@phoenix.owl.de wrote:
 Joerg Sonnenberger wrote:

 Can't you use the approach e.g. of the wpi(4) driver and load the
 firmware image from the filesystem?

 No, firmload(9) would not really be an option, because I want to detect
 connected devices, like a keyboard and floppy disk drives during boot.
 Without the firmware this would be impossible.

firmload can currently load from a set of directories. Has anyone
considered extending firmload to optionally load from memory as well -
possibly an included ramdisk image? That would allow the choice of
building in firmware images which could be loaded at boot and then the
memory released. Plus it keeps a single consistent API.

Just a thought :)
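
Something along these lines, as a very rough C sketch. Everything here is
hypothetical (table, names, demo image); it is not the firmload(9) API,
just the "try memory first, else fall back to the directories" idea.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for a firmware image built into the kernel. */
struct builtin_fw {
	const char *name;
	const unsigned char *data;
	size_t size;
};

/* Placeholder image contents, for illustration only. */
static const unsigned char demo_image[] = { 0xde, 0xad, 0xbe, 0xef };

static const struct builtin_fw builtin_fw_table[] = {
	{ "demo-fw", demo_image, sizeof(demo_image) },
};

/*
 * Look for an image in memory first. A NULL return means the caller
 * would fall back to the existing filesystem search.
 */
static const struct builtin_fw *
builtin_fw_lookup(const char *name)
{
	size_t n = sizeof(builtin_fw_table) / sizeof(builtin_fw_table[0]);

	for (size_t i = 0; i < n; i++)
		if (strcmp(builtin_fw_table[i].name, name) == 0)
			return &builtin_fw_table[i];
	return NULL;
}
```

A driver would not need to know which source the image came from, which is
the "single consistent API" part.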


Hang with heavy build on cgd and MP

2010-01-24 Thread David Brownlee
I have a largish java app which hangs my netbsd-5 amd64 X60s when
building on cgd. It normally happens on the second consecutive build,
using the native openjdk7.

I originally noticed it on cgd on dk, but removing
DKWEDGE_METHOD_BSDLABEL didn't affect matters. Moving the tree from
the cgd partition on to a normal one (on the same disk) avoids the
issue, as does using cpuctl to take one of the two cpu cores offline.

It hangs hard - I can't drop into ddb. This is happening under sources
from a day or so ago, and from early January.

Does anyone have any thoughts?

Thanks


Re: check reprogram PCI BAR

2010-01-19 Thread David Brownlee
2010/1/19 Manuel Bouyer bou...@antioche.eu.org:
 On Tue, Jan 19, 2010 at 12:57:30PM -0600, David Young wrote:
 On Tue, Jan 19, 2010 at 12:57:57PM +0100, Manuel Bouyer wrote:
  On Tue, Jan 19, 2010 at 12:50:55PM +0100, Christoph Egger wrote:
   Why are the *FIXUP options disabled by default in x86 kernels?
 
  Because on some systems it reprograms the BARs in a way which doesn't
  work. I'm not sure the kernel can do this in a reasonable and safe
  way anyway, it would need detailed knowledge of the hardware,
  which may not be available.

 What detailed knowledge do you have in mind?

 For example, devices for which the kernel has no drivers, but which still
 have registers mapped in I/O or memory space.
 I'm sure PC hardware also have a few fun things I don't know about :)

[Cutting across from a concurrent thread on another list...]

It would be nice if the *FIXUP options could be made runtime
configurable, so they could be enabled by 'boot -c' - would allow
people to get at them from a stock GENERIC...