Re: panic options

2023-09-12 Thread Mouse
>> There is a call to panic() if the kernel detects that there is no
>> console device found; I would like to make this call just reboot
>> without dropping into ddb.

> Well.  Not completely what you ask for, but...

> sysctl -w ddb.onpanic=0 (or even -1)

Well, notice that the original post speaks of panicking when "there is no
console device found", which sounds to me like something that happens
during boot, much earlier than /etc/sysctl.conf can take effect.

I'd say, rather, DDB_ONPANIC=0 in the kernel config and then
ddb.onpanic=1 during boot.
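In kernel-config-plus-sysctl form, that suggestion would look something like
this (a sketch only; DDB_ONPANIC and ddb.onpanic are the real knobs, but check
options(4) and ddb(4) for the exact spelling and semantics):

```
# kernel config file: early panics (before userland is up) just reboot
options 	DDB_ONPANIC=0

# /etc/sysctl.conf: once booted, let later panics drop into DDB again
ddb.onpanic=1
```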

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML	mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: panic options

2023-09-12 Thread Johnny Billquist

On 2023-09-12 23:55, Robert Swindells wrote:


Is there any way to get panic(9) to behave differently in some places
than others?

There is a call to panic() if the kernel detects that there is no
console device found; I would like to make this call just reboot
without dropping into ddb.

The amdgpu driver fails to initialize about 9 times in 10 for me, so I
would like to reduce the amount of typing needed.


Well. Not completely what you ask for, but...

sysctl -w ddb.onpanic=0 (or even -1)

You could place that in /etc/sysctl.conf as well. See sysctl(7) for some 
more details.
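For example (a sketch; see ddb(4) for the precise meaning of 0 versus -1):

```
# /etc/sysctl.conf
# 0  = on panic, reboot instead of entering DDB
# -1 = as 0, but skip the panic-time DDB output as well (see ddb(4))
ddb.onpanic=0
```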


  Johnny

--
Johnny Billquist  || "I'm on a bus
  ||  on a psychedelic trip
email: b...@softjar.se ||  Reading murder books
pdp is alive! ||  tryin' to stay hip" - B. Idol


Re: panic options

2023-09-12 Thread Michael van Elst
r...@fdy2.co.uk (Robert Swindells) writes:

>There is a call to panic() if the kernel detects that there is no
>console device found; I would like to make this call just reboot
>without dropping into ddb.

Not without modifications.

You could include the nullcons (i.e. boot without a console), detect the
situation later in an rc script, and reboot.



panic options

2023-09-12 Thread Robert Swindells


Is there any way to get panic(9) to behave differently in some places
than others?

There is a call to panic() if the kernel detects that there is no
console device found; I would like to make this call just reboot
without dropping into ddb.

The amdgpu driver fails to initialize about 9 times in 10 for me, so I
would like to reduce the amount of typing needed.


Re: raidctl -A softroot and a failed component

2023-09-12 Thread Michael van Elst
e...@math.uni-bonn.de (Edgar Fuß) writes:

>I had a RAIDframe level 1 RAID with the first component marked as failed, e.g.
>   component0: failed
>   /dev/dkN: optimal
>and although the set was configured -A softroot, the kernel didn't configure 
>raid0a as the root file system, presumably because the dk numbers didn't match.

The kernel will collect disks into raid sets based on the RAIDframe label
and use the partitions or wedges found on those raid sets. It then checks
each raid set for whether booted_device matches a component device and chooses
partition 0 from it (wedges don't have partitions, but "partition 0"
is the wedge itself).

IMHO the buildroothack should just go away, same for lots of the "guesswork"
done by the machdep code.

If you use unique wedge names, you can specify the root volume by name
at least on some archs, e.g. x86 and arm are fine.




Re: typo in raidN.conf leading to allegedly failed component

2023-09-12 Thread Brian Buhrow
Hello Edgar.  What you could do to improve that situation is exactly
what you did, except use raidctl -C on the last configuration step.
Then, instead of having to recopy the entire contents of the array,
all you need to do is recalculate the parity.
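As a hedged sketch of that recovery sequence (file and device names taken from
Edgar's mail; raidctl -C forces configuration despite the stale "failed"
label, raidctl -i rewrites parity):

```
# after correcting the typo in /tmp/raid0.conf
raidctl -u raid0                   # unconfigure the mis-configured set
raidctl -C /tmp/raid0.conf raid0   # force-configure, ignoring component status
raidctl -i raid0                   # rewrite parity instead of recopying a component
```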

-thanks
-Brian



typo in raidN.conf leading to allegedly failed component

2023-09-12 Thread Edgar Fuß
I set up a server with a RAIDframe level 1 RAID and forgot raidctl -A softroot.
So I booted an installation kernel via PXE, typed in a /tmp/raid0.conf and did
raidctl -c /tmp/raid0.conf raid0, only I mistyped the name of the first
component. That led to "hosed component", but worse, it failed that component and
apparently marked the first component as failed on the label of the second.
So after raidctl -u raid0, correcting my typo and raidctl -c /tmp/raid0.conf raid0,
I ended up with a failed first component that didn't really fail.

Can that be improved?


raidctl -A softroot and a failed component

2023-09-12 Thread Edgar Fuß
I had a RAIDframe level 1 RAID with the first component marked as failed, e.g.
component0: failed
/dev/dkN: optimal
and although the set was configured -A softroot, the kernel didn't configure 
raid0a as the root file system, presumably because the dk numbers didn't match.
I was sitting in front of the console, so I could easily type raid0a etc.,
but this would have prevented an unattended boot.

I'm afraid little can be done about that weird situation?


Re: GPT attributes in dkwedgeq

2023-09-12 Thread Emmanuel Dreyfus
On Tue, Sep 12, 2023 at 06:33:34PM +0200, Martin Husemann wrote:
> > Right, it is bad design. Re-parsing the GPT in raidframe code seems wrong
> > too. Shall I modify struct dkwedge_info to add a union for the underlying
> > partition entry? That way, we can look for specific details such as a
> > bootme flag.
> 
> You mean the struct gpt_ent?

I mean something like this:

	union {
		struct gpt_ent              gpt;
		struct disklabel            disklabel;
		struct mbr                  mbr;
		struct apple_part_map_entry apm;
		(...)
	} part_entry;

> But Michael's question is a good one - why can't the bootloader deal
> with all this and pass e.g. the start offset of the root partition
> via (optional) boot params?

I will have a look at that, but I am not sure how it works with multiboot.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: GPT attributes in dkwedgeq

2023-09-12 Thread Martin Husemann
On Tue, Sep 12, 2023 at 04:22:04PM +, Emmanuel Dreyfus wrote:
> Right, it is bad design. Re-parsing the GPT in raidframe code seems wrong
> too. Shall I modify struct dkwedge_info to add a union for the underlying
> partition entry? That way, we can look for specific details such as a
> bootme flag.

You mean the struct gpt_ent?

That would be a serious abstraction violation; the wedge itself is supposed
to abstract away all the persistent storage schemes that could have led to
its creation (and GPT is just one way to get there).

But Michael's question is a good one - why can't the bootloader deal
with all this and pass e.g. the start offset of the root partition
via (optional) boot params?

Martin


Re: GPT attributes in dkwedgeq

2023-09-12 Thread Emmanuel Dreyfus
On Tue, Sep 12, 2023 at 11:07:33AM -, Michael van Elst wrote:
> A flag in GPT that is supposed to be used by a bootloader now causes
> changes in the kernel disk infrastructure to be used for a magic solution
> limited to the raidframe driver ?

Right, it is bad design. Re-parsing the GPT in raidframe code seems wrong
too. Shall I modify struct dkwedge_info to add a union for the underlying
partition entry? That way, we can look for specific details such as a
bootme flag.

That would require a compat layer for pre-netbsd-10 userland.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: vfs_lockf changes (was: CVS commit: src)

2023-09-12 Thread Andrew Doran
On Mon, Sep 11, 2023 at 03:29:39PM +, Martin Husemann wrote:

> On Sun, Sep 10, 2023 at 02:45:53PM +, Andrew Doran wrote:
> > Module Name:src
> > Committed By:   ad
> > Date:   Sun Sep 10 14:45:53 UTC 2023
> > 
> > Modified Files:
> > src/common/lib/libc/gen: radixtree.c
> > src/sys/kern: init_main.c kern_descrip.c kern_lwp.c kern_mutex_obj.c
> > kern_resource.c kern_rwlock_obj.c kern_turnstile.c subr_kcpuset.c
> > vfs_cwd.c vfs_init.c vfs_lockf.c
> > src/sys/rump/librump/rumpkern: rump.c
> > src/sys/rump/librump/rumpvfs: rump_vfs.c
> > src/sys/sys: namei.src
> > src/sys/uvm: uvm_init.c uvm_map.c uvm_readahead.c
> > 
> > Log Message:
> > - Do away with separate pool_cache for some kernel objects that have no 
> > special
> >   requirements and use the general purpose allocator instead.  On one of my
> >   test systems this makes for a small (~1%) but repeatable reduction in 
> > system
> >   time during builds presumably because it decreases the kernel's cache /
> >   memory bandwidth footprint a little.
> > - vfs_lockf: cache a pointer to the uidinfo and put mutex in the data 
> > segment.
> 
> Hi Andrew, 
> 
> most (all?) vfs_lockf tests are now crashing, e.g.
> 
> https://netbsd.org/~martin/evbearmv7hf-atf/469_atf.html#failed-tcs-summary

I backed out these changes since I don't have time to debug this week. 
Apologies for the interruption and thanks a lot for the reports.

Andrew
 
> 
> Core was generated by `t_vnops'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  rumpuser_mutex_spin_p (mtx=0x0) at 
> /work/src/lib/librumpuser/rumpuser_pth.c:166
> [Current thread is 1 (process 18892)]
> #0  rumpuser_mutex_spin_p (mtx=0x0) at 
> /work/src/lib/librumpuser/rumpuser_pth.c:166
> #1  0x71c672dc in mutex_enter (mtx=0x71d78fc0 ) at 
> /work/src/lib/librump/../../sys/rump/librump/rumpkern/locks.c:164
> #2  0x71d3dd58 in lf_advlock (ap=0x7ff897b8, head=0x6cdc7140, size=<optimized out>) at /work/src/lib/librumpvfs/../../sys/rump/../kern/vfs_lockf.c:896
> #3  0x71d51bec in VOP_ADVLOCK (vp=0x6cde0600, id=<optimized out>, op=<optimized out>, fl=<optimized out>, flags=64) at /work/src/lib/librumpvfs/../../sys/rump/../kern/vnode_if.c:1867
> #4  0x71be5f64 in do_fcntl_lock (fd=fd@entry=0, cmd=cmd@entry=8, 
> fl=fl@entry=0x7ff89818) at 
> /work/src/lib/librump/../../sys/rump/../kern/sys_descrip.c:285
> #5  0x71be6154 in sys_fcntl (l=<optimized out>, uap=0x7ff898fc, retval=<optimized out>) at /work/src/lib/librump/../../sys/rump/../kern/sys_descrip.c:365
> #6  0x71c70d9c in sy_call (rval=0x7ff898f4, uap=0x7ff898fc, l=0x6ce09c80, 
> sy=0x71cbf6f0 ) at 
> /work/src/lib/librump/../../sys/rump/../sys/syscallvar.h:65
> #7  sy_invoke (code=0, rval=0x7ff898f4, uap=0x7ff898fc, l=0x6ce09c80, 
> sy=0x71cbf6f0 ) at 
> /work/src/lib/librump/../../sys/rump/../sys/syscallvar.h:94
> #8  rump_syscall (num=0, num@entry=92, data=0x7ff898fc, 
> data@entry=0x7ff898f4, dlen=dlen@entry=12, retval=0x7ff898f4, 
> retval@entry=0x7ff898ec) at 
> /work/src/lib/librump/../../sys/rump/librump/rumpkern/rump.c:782
> #9  0x71c63e8c in rump___sysimpl_fcntl (fd=fd@entry=0, cmd=cmd@entry=8, 
> arg=arg@entry=0x7ff899e8) at 
> /work/src/lib/librump/../../sys/rump/librump/rumpkern/rump_syscalls.c:1322
> #10 0x007f8120 in fcntl_getlock_pids (tc=0x855234 
> , mp=0x8329c0 "/mnt") at 
> /work/src/tests/fs/vfs/t_vnops.c:941
> #11 0x00811134 in atfu_ext2fs_fcntl_getlock_pids_body (tc=0x855234 
> ) at /work/src/tests/fs/vfs/t_vnops.c:1085
> #12 0x71ae8658 in atf_tc_run (tc=0x855234 
> , resfile=) at 
> /work/src/external/bsd/atf/dist/atf-c/tc.c:1024
> #13 0x71ae4f84 in run_tc (exitcode=<optimized out>, p=0x7ff8a840, tp=0x7ff8a83c) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:510
> #14 controlled_main (exitcode=<optimized out>, add_tcs_hook=0x7ff8a86c, argv=<optimized out>, argc=<optimized out>) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:580
> #15 atf_tp_main (argc=<optimized out>, argv=<optimized out>, add_tcs_hook=0x7ff8a86c) at /work/src/external/bsd/atf/dist/atf-c/detail/tp_main.c:610
> #16 0x007e6aa4 in ___start ()
> 
> Martin


Re: GPT attributes in dkwedgeq

2023-09-12 Thread Michael van Elst
mar...@duskware.de (Martin Husemann) writes:

>   if (flags & DKW_FLAGS_BOOTME)
>   rf_boot_from_filesystem_starting_at(dkw.offset)


A flag in GPT that is supposed to be used by a bootloader now causes
changes in the kernel disk infrastructure to be used for a magic solution
limited to the raidframe driver ?




Re: GPT attributes in dkwedgeq

2023-09-12 Thread Martin Husemann
On Tue, Sep 12, 2023 at 08:35:00AM +, Emmanuel Dreyfus wrote:
> How are we supposed to discover the start block number? All rf_buildroothack()
> knows is dk_nwedges from struct disk. It gets struct dkwedge_info using
> dkwedge_find_by_parent(), whose second argument seems to be the first dk
> device number to inspect. Why can't a dkwedge_get_flags() use the wedge number
> as well?

The whole wedge in-kernel API confuses me, sorry if this makes
the discussion more complex.

I was expecting something like:

	for (size_t ndx = 0; ndx < dk_nwedges; ndx++) {
		struct dkwedge_info dkw;
		unsigned flags;
		device_t t = dkwedge_find_by_parent(parent, ndx);
>>>		dkwedge_get_info(t, &dkw);
		flags = dkwedge_get_flags(t);
		if (flags & DKW_FLAGS_BOOTME)
			rf_boot_from_filesystem_starting_at(dkw.offset)
	...

but we don't seem to have something like dkwedge_get_info.

> The prototypes could closely match dkwedge_find_by_parent():
> int dkwedge_get_flags(const char *parent_name, size_t *dknum, int *flags);
> int dkwedge_set_flags(const char *parent_name, size_t *dknum, int flags);
> 
> This size_t * type seems odd, but it is what dkwedge_find_by_parent() uses.

For boot purposes, and given that we have already discovered all wedges on
this disk and (for now) the list is stable: yes, it could.


Martin


Re: GPT attributes in dkwedgeq

2023-09-12 Thread Emmanuel Dreyfus
On Tue, Sep 12, 2023 at 09:33:23AM +0200, Martin Husemann wrote:
> You could use the block number where the wedge starts instead, but it probably
> does not matter a lot and we won't see concurrent changes to the gpt for
> this use case :-)

How are we supposed to discover the start block number? All rf_buildroothack()
knows is dk_nwedges from struct disk. It gets struct dkwedge_info using
dkwedge_find_by_parent(), whose second argument seems to be the first dk
device number to inspect. Why can't a dkwedge_get_flags() use the wedge number
as well?

The prototypes could closely match dkwedge_find_by_parent():
int dkwedge_get_flags(const char *parent_name, size_t *dknum, int *flags);
int dkwedge_set_flags(const char *parent_name, size_t *dknum, int flags);

This size_t * type seems odd, but it is what dkwedge_find_by_parent() uses.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: GPT attributes in dkwedgeq

2023-09-12 Thread Martin Husemann
On Tue, Sep 12, 2023 at 07:29:57AM +, Emmanuel Dreyfus wrote:
> On Tue, Sep 12, 2023 at 09:27:50AM +0200, Martin Husemann wrote:
> > partnum here is what gpt(8) calls index?
> 
> I was more thinking about wedge number: wedgenum would be a better name.

I don't mind the name, just wanted to have semantics clear.
You could use the block number where the wedge starts instead, but it probably
does not matter a lot and we won't see concurrent changes to the gpt for
this use case :-)

Martin


Re: GPT attributes in dkwedgeq

2023-09-12 Thread Emmanuel Dreyfus
On Tue, Sep 12, 2023 at 09:27:50AM +0200, Martin Husemann wrote:
> partnum here is what gpt(8) calls index?

I was more thinking about wedge number: wedgenum would be a better name.

-- 
Emmanuel Dreyfus
m...@netbsd.org


Re: GPT attributes in dkwedgeq

2023-09-12 Thread Martin Husemann
On Tue, Sep 12, 2023 at 07:21:10AM +, Emmanuel Dreyfus wrote:
> 2b) struct dkwedge_softc is private, we could add a field here
> without being intrusive, it would just need accessor functions:
> int dkwedge_set_flags(struct disk *pdk, int partnum, int flags);
> int dkwedge_get_flags(struct disk *pdk, int partnum, int *flags);

partnum here is what gpt(8) calls index?

This sounds like the least intrusive and flexible way to me.

Martin


GPT attributes in dkwedgeq

2023-09-12 Thread Emmanuel Dreyfus
Hello

Context: if a RAIDframe set contains a GPT, it does not honour the
bootme attribute when looking for the root partition. The current
behavior hardcodes the use of the first partition.

Fixing this requires rf_buildroothack() to know about the bootme
GPT attribute, but that information is not available from struct 
dkwedge_info.  I see the following ways to address the problem:

1) Read the GPT again in rf_buildroothack() and walk it looking
for the bootme attribute. This involves duplicating a lot of
code from sys/dev/dkwedge/dkwedge_gpt.c. I gave it a try and it
works, but I am not happy with doing the job twice, both in 
source code and at runtime.

2) Make sys/dev/dkwedge/dkwedge_gpt.c record the bootme flag
when it walks the GPT. The problem is where can we store it?

2a) We could create an enhanced struct dkwedge_info2; it could
have a flags field, or even a union field for the
underlying partition entry (MBR, BSD label, GPT...). But since
struct dkwedge_info is known to userland, that would not be
an easy change.

2b) struct dkwedge_softc is private, we could add a field here
without being intrusive, it would just need accessor functions:
int dkwedge_set_flags(struct disk *pdk, int partnum, int flags);
int dkwedge_get_flags(struct disk *pdk, int partnum, int *flags);

3) Another idea?

I would favor solution 2b. For now it would just be a flags field
with only the bootme flag, but since struct dkwedge_softc is private,
we can easily change that to something more complicated later, if
needed.
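To make 2b concrete, here is a rough userland sketch of the proposed
accessors. Everything in it is a mock: struct disk is reduced to a toy wedge
table, sc_flags and DKW_FLAGS_BOOTME are assumed names, and there is no
locking - it only illustrates the shape of the API, not the real kernel code.

```c
#include <errno.h>
#include <stddef.h>

#define DKW_FLAGS_BOOTME 0x01	/* hypothetical flag bit */

/*
 * Mock stand-ins for the kernel structures; option 2b keeps the
 * real flags field inside the private struct dkwedge_softc.
 */
struct dkwedge_softc {
	int sc_flags;
};

struct disk {
	size_t dk_nwedges;
	struct dkwedge_softc *dk_wedges[16];	/* toy fixed-size table */
};

/* Proposed accessor: store the flags of wedge `partnum` on disk `pdk`. */
int
dkwedge_set_flags(struct disk *pdk, int partnum, int flags)
{
	if (partnum < 0 || (size_t)partnum >= pdk->dk_nwedges)
		return ENODEV;
	pdk->dk_wedges[partnum]->sc_flags = flags;
	return 0;
}

/* Proposed accessor: fetch the flags of wedge `partnum` on disk `pdk`. */
int
dkwedge_get_flags(struct disk *pdk, int partnum, int *flags)
{
	if (partnum < 0 || (size_t)partnum >= pdk->dk_nwedges)
		return ENODEV;
	*flags = pdk->dk_wedges[partnum]->sc_flags;
	return 0;
}
```

rf_buildroothack() could then iterate partnum from 0 to dk_nwedges-1, call
dkwedge_get_flags(), and pick the first wedge with DKW_FLAGS_BOOTME set
instead of hardcoding partition 0.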

Opinions?

-- 
Emmanuel Dreyfus
m...@netbsd.org