from:"Cy Schubert"

Re: Panic after update main-n269202-4e7aa03b7076 -> n269230-f6f67f58c19d

2024-04-09 Thread Cy Schubert

Cy Schubert writes:
> In message , Gleb Smirnoff writes:
> > On Tue, Apr 09, 2024 at 07:02:11PM +0200, FreeBSD User wrote:
> > F> The crash is still present on the most recent checked out sources as of minutes ago.
> > F> I just checked out on HEAD the latest commits (see below, just for the record and to prevent
> > F> being wrong here).
> > F> 
> > F> [...]
> > F> commit 841cf52595b6a6b98e266b63e54a7cf6fb6ca73e (HEAD -> main, origin/main, origin/HEAD)
> >
> > Is the crash same or different? Can you please share backtrace?
>
> The new panic is:
>
> Fatal trap 12: page fault while in kernel mode
> cpuid = 3; apic id = 03
> fault virtual address   = 0x28
> fault code  = supervisor read data, page not present
> instruction pointer = 0x20:0x80729d8d
> stack pointer   = 0x28:0xfe00b59c0a70
> frame pointer   = 0x28:0xfe00b59c0aa0
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= interrupt enabled, resume, IOPL = 0
> current process = 2697 (rpcbind)
> rdi: f80004fcd720 rsi:  rdx: fe00b59c0b68
> rcx:   r8: 0001  r9: 3b9ac9e0
> rax: 3b9aca00 rbx: fe00b59c0b68 rbp: fe00b59c0aa0
> r10: 0020 r11:  r12: 
> r13: 0020 r14: 0020 r15: f80004fcd720
> trap number = 12
> panic: page fault
> cpuid = 3
> time = 1712682162
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00b59c0760
> vpanic() at vpanic+0x135/frame 0xfe00b59c0890
> panic() at panic+0x43/frame 0xfe00b59c08f0
> trap_fatal() at trap_fatal+0x40b/frame 0xfe00b59c0950
> trap_pfault() at trap_pfault+0x46/frame 0xfe00b59c09a0
> calltrap() at calltrap+0x8/frame 0xfe00b59c09a0
> --- trap 0xc, rip = 0x80729d8d, rsp = 0xfe00b59c0a70, rbp = 0xfe00b59c0aa0 ---
> uiomove_faultflag() at uiomove_faultflag+0x9d/frame 0xfe00b59c0aa0
> uipc_soreceive_stream_or_seqpacket() at uipc_soreceive_stream_or_seqpacket+0x38c/frame 0xfe00b59c0b30
> soreceive() at soreceive+0x2f/frame 0xfe00b59c0b50
> clnt_vc_soupcall() at clnt_vc_soupcall+0x139/frame 0xfe00b59c0c00
> sorwakeup_locked() at sorwakeup_locked+0x98/frame 0xfe00b59c0c20
> uipc_sosend_stream_or_seqpacket() at uipc_sosend_stream_or_seqpacket+0x58e/frame 0xfe00b59c0ce0
> sousrsend() at sousrsend+0x5f/frame 0xfe00b59c0d40
> dofilewrite() at dofilewrite+0x7f/frame 0xfe00b59c0d90
> sys_write() at sys_write+0xb3/frame 0xfe00b59c0e00
> amd64_syscall() at amd64_syscall+0x115/frame 0xfe00b59c0f30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe00b59c0f30
> --- syscall (4, FreeBSD ELF64, write), rip = 0x1d82f79281a, rsp = 0x1d82c63be78, rbp = 0x1d82c63bee0 ---
> Uptime: 39s
> Dumping 515 out of 7969 MB:..4%..13%..22%..32%..41%..53%..63%..72%..81%..91%
>
> (kgdb) bt
> #0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
> #1  doadump (textdump=textdump@entry=1) at /opt/src/git-src/sys/kern/kern_shutdown.c:404
> #2  0x806bd7d9 in kern_reboot (howto=260) at /opt/src/git-src/sys/kern/kern_shutdown.c:524
> #3  0x806bdcf2 in vpanic (fmt=0x80ae0f0d "%s", ap=ap@entry=0xfe00b59c08d0) at /opt/src/git-src/sys/kern/kern_shutdown.c:976
> #4  0x806bdb43 in panic (fmt=) at /opt/src/git-src/sys/kern/kern_shutdown.c:892
> #5  0x80a597fb in trap_fatal (frame=0xfe00b59c09b0, eva=40) at /opt/src/git-src/sys/amd64/amd64/trap.c:950
> #6  0x80a59846 in trap_pfault (frame=, usermode=false, signo=, ucode=) at /opt/src/git-src/sys/amd64/amd64/trap.c:758
> #7  
> #8  uiomove_faultflag (cp=0xf80004fcd720, n=32, uio=uio@entry=0xfe00b59c0b68, nofault=nofault@entry=0) at /opt/src/git-src/sys/kern/subr_uio.c:240
> #9  0x80729ce9 in uiomove (cp=0xf80004fcd720, n=0, uio=uio@entry=0xfe00b59c0b68) at /opt/src/git-src/sys/kern/subr_uio.c:193
> #10 0x80774f1c in uipc_soreceive_stream_or_seqpacket (so=0xf800361f4000, psa=, uio=0xfe00b59c0b68, mp0=, controlp=0xfe00b59c0bc0, flagsp=0xfe00b59c0ba8)
>  at /opt/src/git-src/sys/kern/uipc_usrreq.c:1420
> #11 0x8076d4ff in soreceive (so=0xf80004fcd720, so@entry=0xf800361f4000, psa=psa@entry=0x0, uio=uio@entry=0xfe00b59c0b68, mp0=0x0, mp0@entry=0xfe00b59c0bb8, controlp=0x1, controlp@entry=0xfe0

Re: kernel crash in tcp_subr.c:2386

2024-02-12 Thread Cy Schubert

In message <20240212193044.e089d...@slippy.cwsent.com>, Cy Schubert writes:
> In message <625e0ea4-9413-45ad-b05c-500833a1d...@freebsd.org>, tuexen@freebsd.org writes:
> > > On Feb 12, 2024, at 10:36, Alexander Leidinger wrote:
> > >
> > > Hi,
> > >
> > > I got a coredump with sources from 2024-02-10-144617 (GMT+0100):
> > Hi Alexander,
> >
> > we are aware of this problem, but haven't found a way to reproduce it.
> > Do you know how to reproduce this?
>
> I've reproduced this by rebooting any one of my machines in my basement. 
> The other machines will panic as below.
>
> I've reverted the three tcp timer commits, expecting one of them to be the 
> cause.

Another data point:

I build on a build machine and NFS mount /usr/obj on my other machines. 
Another symptom of this problem is that the NFS share will appear 
corrupted, and df -h -t nfs will sometimes not display the mounted NFS share. 
When it does not immediately end in a kernel page fault, random kernel memory 
can be overwritten, resulting in bizarre behaviour prior to the panic.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: kernel crash in tcp_subr.c:2386

2024-02-12 Thread Cy Schubert

> > No locals.
> > #15 0x808597d6 in syscallenter (td=0xf8068ef99740)
> >at /space/system/usr_src/sys/amd64/amd64/../../kern/subr_syscall.c:186
> >se = 0x80a48330 
> >p = 0xfe07f29995c0
> >sa = 0xf8068ef99b30
> >error = 
> >sy_thr_static = 
> >traced = 
> > #16 amd64_syscall (td=0xf8068ef99740, traced=0)
> >at /space/system/usr_src/sys/amd64/amd64/trap.c:1192
> >ksi = {ksi_link = {tqe_next = 0xfe08a079ef30,
> >tqe_prev = 0x808588af }, ksi_info = {
> >si_signo = 1, si_errno = 0, si_code = 2015268872, si_pid = -512,
> >si_uid = 2398721856, si_status = -2042,
> >si_addr = 0xfe08a079ef40, si_value = {sival_int = -1602621824,
> >  sival_ptr = 0xfe08a079ee80, sigval_int = -1602621824,
> >  sigval_ptr = 0xfe08a079ee80}, _reason = {_fault = {
> >_trapno = 1489045984}, _timer = {_timerid = 1489045984,
> >_overrun = 17999}, _mesgq = {_mqd = 1489045984}, _poll = {
> >_band = 77306605406688}, _capsicum = {_syscall = 1489045984},
> >  __spare__ = {__spare1__ = 77306605406688, __spare2__ = {
> >  1489814048, 17999, 208, 0, 0, 0, 992191072,
> >  ksi_flags = 975329968, ksi_sigq = 0x8082f8f3 }
> > #17 
> > No locals.
> > #18 0x3af13b17fc9a in ?? ()
> > No symbol table info available.
> > Backtrace stopped: Cannot access memory at address 0x3af13a225ab8
> > ---snip---
> >
> > Any ideas?
> >
> > Due to another issue in userland, I updated to 2024-02-11-212006, but I have the above mentioned version and core still in a BE if needed
> >
> > Bye,
> > Alexander.
>
>




Re: noatime on ufs2

2024-01-30 Thread Cy Schubert

In message <3f6cf45c-3d34-4da6-9b81-337eb70bb...@karels.net>, Mike Karels writes:
> On 30 Jan 2024, at 15:48, Cy Schubert wrote:
>
> > In message , Rick Macklem writes:
> >> On Tue, Jan 30, 2024 at 10:49 AM Mike Karels wrote:
> >>>
> >>> On 30 Jan 2024, at 3:00, Olivier Certner wrote:
> >>>
> >>>> Hi Warner,
> >>>>
> >>>>> I strongly oppose this notion to control this from loader.conf. Root is
> >>>>> mounted read-only, so it doesn't matter. That's why I liked Mike's
> >>>>> suggestion: root isn't special.
> >>>>
> >>>> Then in fact there is nothing to oppose.  You've just said yourself that root is mounted first read-only.  As Mike already said, it is remounted r/w in userland later in the boot process.  I just re-checked the code, because I only had a vague recollection of all this, and can confirm.
> >>>>
> >>>> I mentioned the need to modify '/etc/loader.conf' as a possible consequence, not as a goal.  Given what we have established, there is no need to change it at all.
> >>>>
> >>>> The root FS is thus in no way more special in the sysctl proposal than with Mike's (assuming it doesn't rely on sysctl), this is an independent property due to the boot process design.
> >>>
> >>> With the possible exception that the sysctl mechanism might then have to
> >>> apply to mount update.
> >>>
> >>>>>>> It also seems undesirable to add a sysctl to control a value that the
> >>>>>>> kernel doesn't use.
> >>>>>>
> >>>>>> The kernel has to use it to guarantee some uniform behavior irrespective
> >>>>>> of the mount being performed through mount(8) or by a direct call to
> >>>>>> nmount(2).  I think this consistency is important.  Perhaps all
> >>>>>> auto-mounters and mount helpers always run mount(8) and never deal with
> >>>>>> nmount(2), I would have to check (I seem to remember that, a long time ago,
> >>>>>> when nmount(2) was introduced as an enhancement over mount(2), the stance
> >>>>>> was that applications should use mount(8) and not nmount(2) directly).
> >>>>>> Even if there were no obvious callers of nmount(2), I would be a bit
> >>>>>> uncomfortable with this discrepancy in behavior.
> >>>
> >>> Based on a quick git grep, it looks like most of the things in base use
> >>> nmount(2), not mount(2).  If they use mount(8), then it's not a problem
> >>> because mount(8) would be the first thing to get things right.  If, by
> >>> mount helpers, you mean things like mount_nfs and mount_mfs, then mount(8)
> >>> uses them rather than the reverse.  I also don't remember any admonition
> >>> not to use nmount(2).  mount(8) has a limited set of file system types that
> >>> it handles directly.
> >>>
> >>>>> I disagree. I think Mike's suggestion was better and dealt with POLA and
> >>>>> POLA breaking in a sane way. If the default is applied universally in user
> >>>>> space, then we need not change the kernel at all.
> >>>>
> >>>> I think applying the changes to userland only is really a bad idea.  I've already explained why, but going to do it again in case you missed that.  If you have counter-arguments, fine, but I would like to see them.
> >>>>
> >>>> Changing userland only causes a discrepancy between mount(8) and nmount(2).  Even if the project would take a stance that nmount(2) is not a public API and mount(8) must always be used, the system call will still be there.  And if it's not supposed to be used, what's the problem with changing it as well?
> >>>
> >>> I don't think that stance has been taken; nmount(2) is certainly documented.
> >>> But I think that user level changes are required in both cases.  First, for
> >>> the kernel to do the right thing, it needs to

Re: noatime on ufs2

2024-01-30 Thread Cy Schubert

> > atime ..." is given on the command line, noatime will not be included in
> > the kernel options.  The kernel can't tell why, whether nothing was specified
> > or the option was explicit.  In theory, three states can be encoded using
> > nmount; options could include "atime", "noatime", or neither.  But that's
> > not what the current user level does, so changes are required.  Given that,
> > it makes the most sense to have mount(8) and others to incorporate the
> > default into their operation, and just give the kernel the answer.  btw,
> > see mntopts(3) for where this code would go.
> These days most mount options are parsed in the kernel via vfs_getopts(),
> but not "atime". It appears that "(no)atime" sets/clears MNT_NOATIME in
> userspace via the getmntopts() function that lives in
> /usr/src/sbin/mount/getmntopts.c.
>
> I think this is mostly cruft left over from the mount(2)->nmount(2) conversion,
> for generic options that cover all file systems.
>
> Personally, I like the idea of the addition of a defaults line in
> fstab(5), but am not sure what needs to be done for things like auto mounting?

automountd will require the addition of options to the existing configuration. 
am-utils users can add a default line. The alternative is the addition of a 
"default" specification, which would make it incompatible with Linux and 
Solaris. Currently our autofs is 100% compatible (minus the /net bug) with both.

>
> I'll admit I do not see what the default value of "(no)atime" is, so long as it
> can be overridden on a per mount basis. A change to what the installer sets,
> seems fine to me.
>
> rick
>

[...]
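
The three-state idea quoted above (an explicit "atime", an explicit "noatime", or neither) can be sketched in a few lines of sh. This is only an illustration of how a userland tool might fold a default into the option list before handing the answer to the kernel; resolve_atime and its DEFAULT argument are hypothetical names and are not part of mount(8) or mntopts(3).

```shell
# resolve_atime OPTIONS DEFAULT   (hypothetical helper, for illustration only)
#   OPTIONS: comma-separated mount options as given by the user
#   DEFAULT: "atime" or "noatime", an assumed site-wide default
# Prints the setting that would be passed on to the kernel.
resolve_atime() {
    result=$2                       # state 3: nothing given, use the default
    old_ifs=$IFS
    IFS=,
    for opt in $1; do
        case $opt in
            atime|noatime) result=$opt ;;   # states 1 and 2: explicit choice wins
        esac
    done
    IFS=$old_ifs
    echo "$result"
}
```

For example, resolve_atime "rw,atime" noatime keeps the explicit "atime", while resolve_atime "rw" noatime falls back to the default.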



Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-27 Thread Cy Schubert

On January 26, 2024 7:13:15 PM PST, Ed Maste  wrote:
>On Wed, 24 Jan 2024 at 15:43, Julian H. Stacey  wrote:
>>
>> Probably many do, clueless there's a proposal to remove them,
>> as many wont be tracking lists (I havent been tracking lately,
>> focused on moving home, other will have other distractions)
>
>As Rod suggested I'll have the tools emit a warning when they are run,
>so that those users will become aware.
>https://reviews.freebsd.org/D43585
>https://reviews.freebsd.org/D43586
>

We can also point people to the two new ports.



Pardon the typos. Small keyboard in use.

Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-25 Thread Cy Schubert

In message <20240125101308.92e93...@slippy.cwsent.com>, Cy Schubert writes:
> In message <84c6f3b1-58b3-44f8-aeaf-35f78e059...@quip.cz>, Miroslav Lachman writes:
> > On 25/01/2024 06:50, Cy Schubert wrote:
> > > In message
> >
> > >>
> > >> What can they do that gpart can't do?
> > > 
> > > This was quite a while ago, booted off my recovery USB attempting to repair
> > > some self caused damage. The ability to edit (vi) a file with starting
> > > addresses and lengths, visually using bsdlabel, was suited to my panicked
> > > state as I worked to recover the machine.
> > > 
> > > A visual view of columns of a bsdlabel, editing a label using vi, checking
> > > and double checking numbers before committing them is handy. The visual
> > > format and the ability to adjust the numbers in an editor before committing
> > > them is handy. You can't do this with gpart, as it's transactional. And
> > > bsdinstall doesn't give one the opportunity to check the numbers in detail
> > > on a console before committing them.
> >
> > If you really like your editor of choice to edit partition table, you 
> > can use gpart backup and gpart restore like this:
> >
> > gpart backup ada0 > ada0.part
> > vi ada0.part
> > gpart restore -F -l ada0 < ada0.part
>
> That would work.
>
> >
> > > Maybe a good GSoC project may be to replace bsdlabel's direct writes to
> > > disk with geom calls. Though, it doesn't need to be bsdlabel, but some kind
> > > of utility that displays the existing label in an editor session where
> > > changes can be made, using the editor, and committed. This could even be an
> > > enhancement to bsdinstall: call it expert mode or whatever.
> >
> > Manipulating partition table in editor session can be achieved by few 
> > lines of shell script as a wrapper around gpart backup & gpart restore.
>
> Or just build a gpart edit mode with the functions used to implement backup 
> and restore. Excellent idea. Thank you. A small project to work on.
>
> >
> > Kind regards
> > Miroslav Lachman

A freebsd-bsdlabel port has been created making way for its removal.



Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-25 Thread Cy Schubert

In message <84c6f3b1-58b3-44f8-aeaf-35f78e059...@quip.cz>, Miroslav Lachman writes:
> On 25/01/2024 06:50, Cy Schubert wrote:
> > In message
>
>
> >>
> >> What can they do that gpart can't do?
> > 
> > This was quite a while ago, booted off my recovery USB attempting to repair
> > some self caused damage. The ability to edit (vi) a file with starting
> > addresses and lengths, visually using bsdlabel, was suited to my panicked
> > state as I worked to recover the machine.
> > 
> > A visual view of columns of a bsdlabel, editing a label using vi, checking
> > and double checking numbers before committing them is handy. The visual
> > format and the ability to adjust the numbers in an editor before committing
> > them is handy. You can't do this with gpart, as it's transactional. And
> > bsdinstall doesn't give one the opportunity to check the numbers in detail
> > on a console before committing them.
>
> If you really like your editor of choice to edit partition table, you 
> can use gpart backup and gpart restore like this:
>
> gpart backup ada0 > ada0.part
> vi ada0.part
> gpart restore -F -l ada0 < ada0.part

That would work.

>
> > Maybe a good GSoC project may be to replace bsdlabel's direct writes to
> > disk with geom calls. Though, it doesn't need to be bsdlabel, but some kind
> > of utility that displays the existing label in an editor session where
> > changes can be made, using the editor, and committed. This could even be an
> > enhancement to bsdinstall: call it expert mode or whatever.
>
> Manipulating partition table in editor session can be achieved by few 
> lines of shell script as a wrapper around gpart backup & gpart restore.

Or just build a gpart edit mode with the functions used to implement backup 
and restore. Excellent idea. Thank you. A small project to work on.
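
A minimal sketch of the wrapper idea discussed above, assuming the text format written by gpart backup can be edited by hand and fed back to gpart restore. The function name edit_geom and the GPART override are made up for illustration (the override only exists so the flow can be dry-run with a stand-in command); this is not an existing gpart feature.

```shell
#!/bin/sh
# edit_geom GEOM   (hypothetical helper, not part of gpart)
# Dump the partition table of GEOM, open it in $EDITOR, and restore it
# only if the file was actually changed.
edit_geom() {
    geom=$1
    tmp=$(mktemp "${TMPDIR:-/tmp}/gpartedit.XXXXXX") || return 1

    # Save the current table in the text format gpart restore understands.
    "${GPART:-gpart}" backup "$geom" > "$tmp" || { rm -f "$tmp"; return 1; }
    cp "$tmp" "$tmp.orig"

    # Let the user check and adjust the numbers before anything is written.
    "${EDITOR:-vi}" "$tmp"

    if cmp -s "$tmp" "$tmp.orig"; then
        echo "no changes; $geom left untouched"
    else
        # -F destroys the existing table first, -l restores the labels.
        "${GPART:-gpart}" restore -F -l "$geom" < "$tmp"
    fi
    rc=$?
    rm -f "$tmp" "$tmp.orig"
    return $rc
}
```

Called as edit_geom ada0, it behaves like the three commands quoted above, with the added safeguard that an unmodified file never reaches gpart restore.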

>
> Kind regards
> Miroslav Lachman



Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-25 Thread Cy Schubert

In message <2369865.bDOn7JOVgO@ravel>, Olivier Certner writes:
> From: Olivier Certner 
> To: Cy Schubert 
> Subject: Re: Removing fdisk and bsdlabel (legacy partition tools)
> Date: Thu, 25 Jan 2024 10:43:18 +0100
> Message-ID: <2369865.bDOn7JOVgO@ravel>
> In-Reply-To: <20240125055019.ccf1...@slippy.cwsent.com>
> MIME-Version: 1.0
>
> Hi,
>
> > A visual view of columns of a bsdlabel, editing a label using vi, checking 
> > and double checking numbers before committing them is handy. The visual 
> > format and the ability to adjust the numbers in an editor before committing
> > them is handy. You can't do this with gpart, as it's transactional. And 
> > bsdinstall doesn't give one the opportunity to check the numbers in detail 
> > on a console before committing them.
>
> You seem to want to be able to stack a number of modifications before actually
> pushing them.  Actually, gpart(8) already can do that!  Please see the
> "OPERATIONAL FLAGS" section in gpart(8).
>
> In between your tentative modifications, just use 'gpart show' to see where you
> stand.

gpart(8) should have a vi mode. That is different from having changes 
pending and committing them. A person is still entering commands rather 
than doing something like editing a spreadsheet, which is what editing a 
file is kind of like. Even something like this would do:

gpart show ada0s2 > some_file
vi some_file
gpart batch ada0s2 < some_file
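
For comparison, the pending-change workflow from the quoted reply looks roughly like the sketch below. The geom name, partition type and size are placeholders, stage_partition and the GPART override are hypothetical, and -f x is used on the assumption that any flag other than the default C leaves the change pending, as the OPERATIONAL FLAGS section of gpart(8) describes. It is still command-driven, which is the point being made above.

```shell
# stage_partition GEOM   (hypothetical helper; parameters are placeholders)
# Queue a new partition without writing it, then show the tentative layout.
stage_partition() {
    geom=$1
    # -f x: hold the change pending instead of committing it immediately.
    "${GPART:-gpart}" add -t freebsd-ufs -s 10G -f x "$geom" || return 1
    # Pending changes are reflected here but are not yet on disk.
    "${GPART:-gpart}" show "$geom"
}
# Afterwards either write or discard the pending changes:
#   gpart commit ada0
#   gpart undo ada0
```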



Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-24 Thread Cy Schubert

In message , Warner Losh writes:
>
> On Wed, Jan 24, 2024, 10:07 PM Cy Schubert wrote:
>
> > In message <202401242347.40onlwkz099...@gndrsh.dnsmgr.net>, "Rodney W.
> > Grimes"
> > writes:
> > > > I would agree personally, to moving to ports (eg ports/sysutils) with
> > > > a DEPRECATED in the DESCR or something, or better yet a Make
> > > > invokation event to say "superceded, here is how to proceed against
> > > > advice") or something.
> > >
> > > They are totally useless as ports when your booted from install
> > > media and working from a standalone shell.  These are the exact
> > > times you want things like fdisk and bsdlabel so you can figure
> > > out wtf is going on, and bsdinstall is NOT gona help you.
> >
> > This is certainly a good point.
> >
>
> What can they do that gpart can't do?

This was quite a while ago, booted off my recovery USB attempting to repair 
some self caused damage. The ability to edit (vi) a file with starting 
addresses and lengths, visually using bsdlabel, was suited to my panicked 
state as I worked to recover the machine.

A visual view of columns of a bsdlabel, editing a label using vi, checking 
and double checking numbers before committing them is handy. The visual 
format and the ability to adjust the numbers in an editor before committing 
them is handy. You can't do this with gpart, as it's transactional. And 
bsdinstall doesn't give one the opportunity to check the numbers in detail 
on a console before committing them.

Maybe a good GSoC project may be to replace bsdlabel's direct writes to 
disk with geom calls. Though, it doesn't need to be bsdlabel, but some kind 
of utility that displays the existing label in an editor session where 
changes can be made, using the editor, and committed. This could even be an 
enhancement to bsdinstall: call it expert mode or whatever.


Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-24 Thread Cy Schubert

In message , "Patrick M. Hausen" writes:
> Hi all,
>
> > On 25.01.2024 at 00:47, Rodney W. Grimes wrote:
> >
> >> I would agree personally, to moving to ports (eg ports/sysutils) with
> >> a DEPRECATED in the DESCR or something, or better yet a Make
> >> invokation event to say "superceded, here is how to proceed against
> >> advice") or something.
> >
> > They are totally useless as ports when your booted from install
> > media and working from a standalone shell.  These are the exact
> > times you want things like fdisk and bsdlabel so you can figure
> > out wtf is going on, and bsdinstall is NOT gona help you.
> >
> > I know there are a boat load of people that have built there
> > own installers for VM's and stuff, running UFS and I bet you
> > they are using MBR disks too.  PLEASE do not kick these tiny
> > little and very usable and pretty univeral (as far as I know
> > ALL BSD's have fdisk and bsdlabel/disklabel) tools out of
> > the base system.
> >
> > The world is NOT 2TB nvme drives with GPT, EFI and ZFS,
> > yours might not be, but I am pretty certain I am not
> > alone in this other world.
>
> I totally undestand that point, but what exactly do these tools do that
> gpart cannot? On MBR disks? With BSD partitions?
>
> Ever since I found out that gpart can manage *all* on-disk partition formats
> I have not been using anything else. You can create your MBR partitions
> and BSD labels just fine with gpart. At least in all situations I encountered,
> there might of course be edge cases I simply don't know.

On occasion, when trying to manipulate a disk label, gpart will refuse to. 
This usually happens when creating or manipulating a label on a zvol that one 
doesn't want to use on the host system because it is destined for use in a VM. It's simpler 
to create the partitions and labels beforehand, attach the zvol to the VM, 
boot and install (or test) within the VM. In this case one doesn't even 
care if geom sees the "disk" or its partitions on the host because the 
"disk" is destined for use in a VM.

I've created zvols for use by various VMs in this manner.

I agree with Rod's remark that when one is in panic mode working through a 
difficult situation extra tools, not fewer, can help.

Regarding extra tools, I do maintain a full copy of FreeBSD on a USB disk, 
in order to recover from catastrophic situations. They're extremely rare, 
the last of which was the result of a commit that broke loader (or was it the 
boot blocks -- I can't remember the exact details anymore) in 12 or 
13-CURRENT. The extra tools came in handy as I worked through the mess.

>
> gpart is not the "GPT partition tool". It's the universal swiss army knife
> "GEOM partition tool" for all disk partitioning in any format supported.
>
> Kind regards,
> Patrick
>


Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-24 Thread Cy Schubert

In message <202401242347.40onlwkz099...@gndrsh.dnsmgr.net>, "Rodney W. Grimes" writes:
> > I would agree personally, to moving to ports (eg ports/sysutils) with
> > a DEPRECATED in the DESCR or something, or better yet a Make
> > invokation event to say "superceded, here is how to proceed against
> > advice") or something.
>
> They are totally useless as ports when your booted from install
> media and working from a standalone shell.  These are the exact
> times you want things like fdisk and bsdlabel so you can figure
> out wtf is going on, and bsdinstall is NOT gona help you.

This is certainly a good point.

>
> I know there are a boat load of people that have built there
> own installers for VM's and stuff, running UFS and I bet you
> they are using MBR disks too.  PLEASE do not kick these tiny
> little and very usable and pretty univeral (as far as I know
> ALL BSD's have fdisk and bsdlabel/disklabel) tools out of
> the base system. 
>
> The world is NOT 2TB nvme drives with GPT, EFI and ZFS,
> yours might not be, but I am pretty certain I am not
> alone in this other world.
>
> > -G
> > 
> > On Thu, Jan 25, 2024 at 3:30 AM Warner Losh wrote:
> > >
> > > On Wed, Jan 24, 2024 at 8:45 AM Ed Maste wrote:
> > >>
> > >> MBR (PC BIOS) partition tables were historically maintained with
> > >> fdisk(8), but gpart(8) has long been the preferred method for working
> > >> with partition tables of all types. fdisk has been declared as
> > >> obsolete in the man page since 2015. Similarly BSD disklabels were
> > >> historically maintained with bsdlabel. It does not yet have a
> > >> deprecation notice - I have proposed a man page addition in
> > >> https://reviews.freebsd.org/D43563.
> > >>
> > >> I would like to disconnect these from the build, and subsequently
> > >> remove them. This is prompted by a recent bsdlabel bug report which
> > >> uncovered a longstanding buffer overflow in that tool. Effort is much
> > >> better focused on contemporary, maintained tools rather than
> > >> investigating issues in deprecated ones. Removing these tools would
> > >> happen in FreeBSD 15 only (no change in 14 or 13).
> > >>
> > >> Code review to disconnect fdisk: https://reviews.freebsd.org/D43575
> > >>
> > >> Note that this effort is limited to these maintenance tools only -
> > >> there is no change to kernel or gpart support for MBR or BSD
> > >> disklablel partitioning. That said, MBR partitioning and BSD
> > >> disklabels are best considered legacy formats and should be avoided
> > >> for new installations, if possible.
> > >>
> > >> If anyone is using fdisk and/or bsdlabel rather than gpart I would
> > >> appreciate knowing what is preventing you from using the contemporary
> > >> tools.
> > >
> > >
> > > nanobsd's legacy.sh still is using disklabel in two spots.
> > >
> > > But one is to just do gpart create -s bsd and the other is to display it. Easy
> > > to fix, but even easier to delete legacy.sh entirely. It's not really needed any
> > > more and was a product of CHS addressing... Now that we use LBA, it's
> > > better to use the new embedded ones. Even at $WORK where we kinda
> > > use legacy, we replace the partitioning stuff with our own custom thing...
> > >
> > > Those are the only users in the tree, but not for long :)
> > >
> > > fdisk was good, but somewhere around the CHS -> LBA transition things
> > > got weird with it, and for really big disks there were reports of issues that
> > > I could never encounter when I set out to fix them... Most likely due to a
> > > mismatch in the CHS data and the LBA data being recorded in the MBR.
> > > The in-kernel gpart copes so much better.
> > >
> > > I wouldn't object to making these ports, but both these programs use 'sekret'
> > > bits from the kernel that might not remain exposed as we clean things up.
> > > Though the IOCTLs they do (or used to do) may no longer be relevant. It's
> > > been so long that I've forgotten
> > >
> > > Warner
> -- 
> Rod Grimes rgri...@freebsd.org
>



Re: Removing fdisk and bsdlabel (legacy partition tools)

2024-01-24 Thread Cy Schubert

In message , Ed Maste writes:
> MBR (PC BIOS) partition tables were historically maintained with
> fdisk(8), but gpart(8) has long been the preferred method for working
> with partition tables of all types. fdisk has been declared as
> obsolete in the man page since 2015. Similarly BSD disklabels were
> historically maintained with bsdlabel. It does not yet have a
> deprecation notice - I have proposed a man page addition in
> https://reviews.freebsd.org/D43563.
>
> I would like to disconnect these from the build, and subsequently
> remove them. This is prompted by a recent bsdlabel bug report which
> uncovered a longstanding buffer overflow in that tool. Effort is much
> better focused on contemporary, maintained tools rather than
> investigating issues in deprecated ones. Removing these tools would
> happen in FreeBSD 15 only (no change in 14 or 13).
>
> Code review to disconnect fdisk: https://reviews.freebsd.org/D43575
>
> Note that this effort is limited to these maintenance tools only -
> there is no change to kernel or gpart support for MBR or BSD
> disklablel partitioning. That said, MBR partitioning and BSD
> disklabels are best considered legacy formats and should be avoided
> for new installations, if possible.
>
> If anyone is using fdisk and/or bsdlabel rather than gpart I would
> appreciate knowing what is preventing you from using the contemporary
> tools.
>

We need to fix the kern.geom.debugflags sysctl foot shooting option so that 
it works. (Not that bsdlabel or fdisk worked around the issue.) Otherwise 
one is left with booting to single user, or from alternate media if that 
doesn't work.

I do have a patch that circumvents the problem. I haven't looked at it in 
years and it probably needs some cleanup, though.
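
For context, the knob in question is normally set as sketched below. geom(4) documents bit 0x10 (16) of kern.geom.debugflags as "allow foot shooting"; whether setting it has the intended effect is exactly what this message questions. The function name and the SYSCTL override are hypothetical and only there so the logic can be exercised without touching a live system.

```shell
# allow_foot_shooting   (hypothetical helper; must run as root on FreeBSD)
# Set the 0x10 "allow foot shooting" bit while keeping other debug flags.
allow_foot_shooting() {
    old=$("${SYSCTL:-sysctl}" -n kern.geom.debugflags) || return 1
    "${SYSCTL:-sysctl}" kern.geom.debugflags=$((old | 16))
}
```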



Re: NFSv4 crash of CURRENT

2024-01-14 Thread Cy Schubert

In message , Rick Macklem writes:
> On Sat, Jan 13, 2024 at 12:39 PM Ronald Klop wrote:
> >
> >
> > From: FreeBSD User
> > Date: 13 January 2024 19:34
> > To: FreeBSD CURRENT
> > Subject: NFSv4 crash of CURRENT
> >
> > Hello,
> >
> > running CURRENT client (FreeBSD 15.0-CURRENT #4 main-n267556-69748e62e82a: Sat Jan 13 18:08:32
> > CET 2024 amd64). One NFSv4 server is same OS revision as the mentioned client, other is FreeBSD
> > 13.2-RELEASE-p8. Both offer NFSv4 filesystems, non-kerberized.
> >
> > I can crash the client reproducibly by accessing the one or other NFSv4 FS (a simple ls -la).
> > The NFSv4 FS is backed by ZFS (if this matters). I do not have physical access to the client
> > host, luckily the box recovers.
> Did you rebuild both the nfscommon and nfscl modules from the same sources?
> I did a commit to main that changes the interface between these two
> modules and did bump the
> __FreeBSD_version to 1500010, which should cause both to be rebuilt.
> (If you have "options NFSCL" in your kernel config, both should have
> been rebuilt as a part of
> the kernel build.)
>

Is anyone by chance seeing autofs in the backtrace too?


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: CDE on FreeBSD 14.0 Release

2023-11-28 Thread Cy Schubert

In message <20231127172506.horde.iibmvttnl5om_j0h4cvd...@webmail.in-berlin.de>,
"Rolf M. Dietze" writes:
> Hi,
>
> might be that I am on the wrong list for this, since my Problem
> exists on 14.0Release.
>
> After an out of the box install of FreeBSD 14.0 Release and following
> https://forums.freebsd.org/threads/setting-up-common-desktop-environment-for-modern-use.69475/

First, unless you plan on using the CDE calendar, you don't need the dtspc 
entry in inetd.conf.

I'm currently running CDE, installed using pkg, on 14.0-RELEASE using 
dtlogin at $JOB. At home I use CDE on 15-CURRENT. I use xdm at home instead 
because dtlogin doesn't support PAM* but xdm does. Caveat, if you use xdm 
you will need to add dtsession to your .xsession file.

* My home network uses MIT KRB5 to serve passwords and LDAP to serve 
UID/GID information. This requires pam_krb5, and as dtlogin is not PAM 
aware it can only work with accounts in /etc/passwd. Hence xdm. I'll 
probably submit a pull request to the CDE development team one day but 
considering all the other things on my plate, PAM support within dtlogin is 
pretty low on my priority list.

> I am stuck with CDE. CDE loads, dtlogin starts and presents the login
> or greeter window, but upon logging in I get a popup telling
> "The desktop messaging system could not be started". Guess I am
> missing some config steps. Any pointer for further reading?
> I had CDE running on 13.2Release

Make sure your /etc/hosts has an entry for localhost. Also make sure your 
machine's PTR record is correct and that it matches its A record. Most 
likely your hostname doesn't resolve to any IP address on your network. 
Either add your hostname with its correct IP to /etc/hosts or, if using 
DHCP, put your hostname ahead of localhost in the localhost entry in 
/etc/hosts, like this:

127.0.0.1  my_hostname_whatever_it_is localhost localhost.my.domain

Also make sure rpcbind is running.
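
One way to check the /etc/hosts advice above is to look the hostname up in
the file's text. The following is a minimal Python sketch, not part of CDE
or FreeBSD; the helper name hosts_entries_for is made up for illustration,
and the sample text simply reuses the example line above.

```python
def hosts_entries_for(hostname, hosts_text):
    """Return the addresses in hosts-file text whose line names hostname."""
    addresses = []
    for line in hosts_text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        if hostname in names:
            addresses.append(address)
    return addresses


# Sample text in /etc/hosts format (illustrative, not read from disk).
sample = (
    "# local names\n"
    "127.0.0.1  my_hostname_whatever_it_is localhost localhost.my.domain\n"
    "::1        localhost\n"
)
print(hosts_entries_for("my_hostname_whatever_it_is", sample))
```

To check a real machine, pass the contents of /etc/hosts and the output of
hostname(1) instead of the sample values. An empty list means the hostname
has no entry in the file.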

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0


autofs -hosts maps

2023-11-17 Thread Cy Schubert

Hi,

The discussion about NFS exports of ZFS snapshots prompted me to play 
around with -hosts maps on my network. -hosts maps are mounted on /net.

I've discovered that -hosts maps don't work with most shares but do with 
others. I've only played with this for a few minutes so I don't fully 
understand why some maps work and others don't. Some of the underlying 
directories that don't work are ZFS while others are UFS.

Yet, auto_home maps mounting the same directories do work. And mounting 
the shares by hand (using mount_nfs) also works.

Just putting this out there should someone else have noticed this.

I'll play around with this a little over the weekend.



-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



Re: revision not displayed in a2440348eed7

2023-11-08 Thread Cy Schubert

On Wed, 8 Nov 2023 15:14:34 +0100
Marek Zarychta  wrote:

> On 8.11.2023 at 14:10, Marek Zarychta wrote:
> >
> > On 27.09.2023 at 01:07, Tomoaki AOKI wrote:
> >> On Tue, 26 Sep 2023 15:19:46 -0700
> >> Cy Schubert  wrote:
> >>  
> >>> In message <20230926231431.20f42fec1075c3980446c...@dec.sakura.ne.jp>,
> >>> Tomoaki
> >>> AOKI writes:  
> >>>> On Tue, 26 Sep 2023 15:48:50 +0200
> >>>> Marek Zarychta  wrote:
> >>>>  
> >>>>> On 26.09.2023 at 13:30, KIRIYAMA Kazuhiko wrote:
> >>>>>> At least up to 15.0-CURRENT, nothing has happend by
> >>>>>> WITHOUT_REPRODUCIBLE_BUILD=yes. Something has changed in
> >>>>>> 15.0-CURRENT at some time. I've rebuilded with 3fb80f1476c7,
> >>>>>> but revision not showed by `uname -a' ;-(
> >>>>>>
> >>>>>> What changed   
> >>>>> Nothing changed. Perhaps your build system can't check git hash ? If
> >>>>> your sources are from git repository, you need at least git-lite
> >>>>> installed and full git repository available on build machine. If you
> >>>>> checked out the repository with gitup and have gitup installed, it
> >>>>> should also work. It won't work if your build machine has access to
> >>>>> only a part of the repository like worktree.
> >>>>>
> >>>>> Cheers
> >>>>>
> >>>>> -- 
> >>>>> Marek Zarychta  
> >>>> Just a possibility, but copying src tree to directory other than the
> >>>> directory where checked out from git repo and building there could
> >>>> lose track with git hash.
> >>>>
> >>>> Another possibility is that if you build src with any user other than
> >>>> the one owning local (pulled) git repo could also lose track with git
> >>>> hash. For example, if I `git log HEAD` with regular user and the local
> >>>> repo is pulled by root, it fails. No special configuration is done.
> >>>>
> >>>> % git log HEAD
> >>>> fatal: detected dubious ownership in repository at '/usr/src'
> >>>> To add an exception for this directory, call:
> >>>>
> >>>>  git config --global --add safe.directory /usr/src
> >>>>
> >>>>  
> >>> This could be due to e6dc6a27230, which was committed this morning. 
> >>> There
> >>> is discussion on the src commits ML (dev-commits-src-all,
> >>> dev-commits-src-main) about reverting the change.
> >>>
> >>>
> >>> -- 
> >>> Cheers,
> >>> Cy Schubert 
> >>> FreeBSD UNIX:     Web: https://FreeBSD.org
> >>> NTP:       Web: https://nwtime.org
> >>>
> >>>     e^(i*pi)+1=0  
> >> Would be unrelated here, unfortunately.
> >> As the subject says, the commit the original reporter is bitten at (not
> >> bi-sected) is at a2440348eed7, which is before e6dc6a27230.  
> >
> > Let's refresh this thread. It looks like (at least for stable/14) 
> > build system doesn't hardcode revision into the kernel anymore. Last 
> > time it worked for me was just after branching stable/14. Today I tried 
> > to build kernel from sources mounted over NFS and I ended up with:
> >
> > # strings /usr/obj/usr/src/amd64.amd64/sys/BSDONDELL/kernel | grep 
> > 14.0-STABLE
> > @(#)FreeBSD 14.0-STABLE #6 -dirty: Tue Nov  7 14:04:35 CET 2023
> > FreeBSD 14.0-STABLE #6 -dirty: Tue Nov  7 14:04:35 CET 2023
> > 14.0-STABLE
> >
> > the source repository is updated, consistent, but mounted read-only 
> > over NFS
> >
> > /usr/src# git status
> > On branch stable/14
> > Your branch is up to date with 'origin/stable/14'.
> >
> > Untracked files:
> >   (use "git add ..." to include in what will be committed)
> >     sys/amd64/conf/BSDONDELL
> >
> > It took 2.53 seconds to enumerate untracked files.
> > See 'git help status' for information on how to improve this.
> >
> > nothing added to commit but untracked files present (use "git add" to 
> > track)
> >
> >
> > Any clues what could be wrong ? Does /usr/src/  require write 
> > permissions now ?  
> 
> 
> I am sorry for the false alarm. It looks like using META MODE prevented 
> updating this info. Af

Re: Kernel with INVARIANTS panicing if drm is loaded

2023-11-06 Thread Cy Schubert

,
> arg=0x80a472c8 ) at /usr/src/sys/dev/vt/vt_core.c:1018
> #30 0x8078ffcf in atkbd_intr (kbd=0x80cef898 ,
> arg=) at /usr/src/sys/dev/atkbdc/atkbd.c:565
> #31 0x804b1376 in intr_event_execute_handlers (ie=0xf800010ece00,
> p=) at /usr/src/sys/kern/kern_intr.c:1205
> #32 ithread_execute_handlers (ie=0xf800010ece00, p=)
> at /usr/src/sys/kern/kern_intr.c:1218
> #33 ithread_loop (arg=arg@entry=0xf80001c5aea0)
> at /usr/src/sys/kern/kern_intr.c:1306
> #34 0x804adae2 in fork_exit (
> callout=0x804b1120 , arg=0xf80001c5aea0,
> frame=0xfe00ce259f40) at /usr/src/sys/kern/kern_fork.c:1160
> #35 
> #36 0x0b88 in ?? ()
> Backtrace stopped: Cannot access memory at address 0xbc7
> (kgdb)
>
>
>
>
>
>

Can you submit a PR for this? GFP_KERNEL is an alias for M_WAITOK, which is 
verboten when intel_atomic_state_alloc() makes its call to kzalloc(), an 
alias for kmalloc().


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



Re: how to set vfs.zfs.arc.max in 15-current ?

2023-10-12 Thread Cy Schubert

In message , void writes:
> Is there a new way to set arc.max in 15-current?
>
> It's no longer settable (except to "0") in main-n265801 (Oct 7th)
> while multiuser.
>
> # sysctl vfs.zfs.arc.max=8589934592
> vfs.zfs.arc.max: 0
> sysctl: vfs.zfs.arc.max=8589934592: Invalid argument

Try reducing your arc.max by an order of magnitude. This suggests that it's 
probably failing in param_set_arc_max() in the val >= arc_all_memory() 
comparison.
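
For illustration, the comparison mentioned above can be modelled as follows.
This is a simplified Python sketch, not the OpenZFS code: the function name
arc_max_is_accepted is hypothetical, only the val >= arc_all_memory() test
is modelled (the real handler may apply further limits), treating 0 as
always accepted is an assumption based on the report, and the memory size
is an example value.

```python
GIB = 1024 ** 3


def arc_max_is_accepted(val, all_memory):
    """Model of the single check named above: reject val >= all_memory.

    Assumption: 0 means "use the default" and is always accepted, which
    matches the report that only 0 could be set.
    """
    if val == 0:
        return True
    return val < all_memory


requested = 8589934592            # the value from the report
print(requested == 8 * GIB)       # True: the request is exactly 8 GiB

# Hypothetical machine where arc_all_memory() reports just under 8 GiB.
all_memory = 8 * GIB - 1
print(arc_max_is_accepted(requested, all_memory))        # False
print(arc_max_is_accepted(requested // 10, all_memory))  # True
```

If the machine has 8 GiB of RAM or less, a request for exactly 8 GiB would
fail this test, which is why a smaller value is worth trying first.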


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



Re: revision not displayed in a2440348eed7

2023-09-26 Thread Cy Schubert

In message <20230926231431.20f42fec1075c3980446c...@dec.sakura.ne.jp>,
Tomoaki AOKI writes:
> On Tue, 26 Sep 2023 15:48:50 +0200
> Marek Zarychta  wrote:
>
> > On 26.09.2023 at 13:30, KIRIYAMA Kazuhiko wrote:
> > > At least up to 15.0-CURRENT, nothing has happend by
> > > WITHOUT_REPRODUCIBLE_BUILD=yes. Something has changed in
> > > 15.0-CURRENT at some time. I've rebuilded with 3fb80f1476c7,
> > > but revision not showed by `uname -a' ;-(
> > >
> > > What changed 
> > 
> > Nothing changed. Perhaps your build system can't check git hash ? If 
> > your sources are from git repository, you need at least git-lite 
> > installed and full git repository available on build machine. If you 
> > checked out the repository with gitup and have gitup installed, it 
> > should also work. It won't work if your build machine has access to 
> > only a part of the repository like worktree.
> > 
> > Cheers
> > 
> > -- 
> > Marek Zarychta
>
> Just a possibility, but copying src tree to directory other than the
> directory where checked out from git repo and building there could
> lose track with git hash.
>
> Another possibility is that if you build src with any user other than
> the one owning local (pulled) git repo could also lose track with git
> hash. For example, if I `git log HEAD` with regular user and the local
> repo is pulled by root, it fails. No special configuration is done.
>
> % git log HEAD
> fatal: detected dubious ownership in repository at '/usr/src'
> To add an exception for this directory, call:
>
> git config --global --add safe.directory /usr/src
>
>

This could be due to e6dc6a27230, which was committed this morning. There 
is discussion on the src commits ML (dev-commits-src-all, 
dev-commits-src-main) about reverting the change.
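
To tell at a glance whether a kernel version string carries a git revision,
a pattern match is enough. The sketch below is Python; the regular
expression is written against strings of the form branch-nCOUNT-HASH as
they appear in uname output posted to this list (for example
main-n264924-e2340276fc73). The helper name git_revision and the exact
pattern are assumptions for illustration, not part of the FreeBSD build.

```python
import re

# Assumed format: branch name, "-n", commit count, "-", abbreviated hash.
REVISION_RE = re.compile(r"\b([\w/.]+)-n(\d+)-([0-9a-f]{7,40})\b")


def git_revision(version):
    """Return (branch, count, hash) if version embeds a revision, else None."""
    m = REVISION_RE.search(version)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), m.group(3)


with_rev = ("FreeBSD 14.0-ALPHA2 amd64 1400096 #0 "
            "main-n264924-e2340276fc73: Sun Aug 20 21:28:44 PDT 2023")
without_rev = "FreeBSD 14.0-STABLE #6 -dirty: Tue Nov  7 14:04:35 CET 2023"

print(git_revision(with_rev))
print(git_revision(without_rev))
```

The first string yields a revision tuple; the second, which has only
"-dirty" after the build number, yields None.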


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



Re: ZFS Panics Still

2023-09-11 Thread Cy Schubert

On Tue, 12 Sep 2023 05:29:41 +0100
Graham Perrin  wrote:

> On 12/09/2023 00:17, Cy Schubert wrote:
> 
> > … poudriere …  
> 
> > panic: vm_page_dequeue_deferred: page 0xfe000b7e9748 has unexpected 
> > queue state
> > …  
> <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=265795> is for arm64. 
> Should we broaden the hardware field, there?

Probably.

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

ZFS Panics Still

2023-09-11 Thread Cy Schubert

294, a = {{flags = 16, queue = 255 '\377', 
  act_count = 0 '\000'}, _bits = 16711696}, order = 13 '\r', 
  pool = 0 '\000', flags = 0 '\000', oflags = 0 '\000', psind = 0 '\000', 
  segind = 5 '\005', valid = 255 '\377', dirty = 0 '\000'}
(kgdb) 

At frame 13 *vfsp contains:

$7 = {mnt_vfs_ops = 1, mnt_kern_flag = 1090847177, mnt_flag = 268439568, 
  mnt_pcpu = 0xfe010d84cbc0, mnt_rootvnode = 0x0, 
  mnt_vnodecovered = 0xf8008dc691c0, 
  mnt_op = 0x83bb3080 , 
  mnt_vfc = 0x83bb3228 , mnt_mtx = {lock_object = {
  lo_name = 0x80abf68d "struct mount mtx", lo_flags = 16973824, 
  lo_data = 0, lo_witness = 0xf8021fd75b00}, mtx_lock = 0}, 
  mnt_gen = 1, mnt_list = {tqe_next = 0x0, tqe_prev = 0xfe00c45f2168}, 
  mnt_syncer = 0x0, mnt_ref = 27, mnt_nvnodelist = {
tqh_first = 0xf800665c5000, tqh_last = 0xf8007cc90aa8}, 
  mnt_nvnodelistsize = 24, mnt_writeopcount = 1, mnt_opt = 0xf80084d56cd0, 
  mnt_optnew = 0x0, mnt_stat = {f_version = 538182936, f_type = 222, 
f_flags = 268439568, f_bsize = 512, f_iosize = 131072, 
f_blocks = 251997486, f_bfree = 248369646, f_bavail = 248369646, 
f_files = 248516350, f_ffree = 248369646, f_syncwrites = 0, 
f_asyncwrites = 0, f_syncreads = 0, f_asyncreads = 0, 
f_nvnodelistsize = 113, f_spare0 = 0, f_spare = {0, 0, 0, 0, 0, 0, 0, 0, 
  0}, f_namemax = 255, f_owner = 0, f_fsid = {val = {-313067424, 
1444670686}}, f_charspare = '\000' , 
f_fstypename = "zfs", '\000' , 
f_mntfromname = "bob/poudriere/bob/jails/HEADi386-new-ports-ref/04", '\000' 
, 
f_mntonname = "/poudriere/bob/data/.m/HEADi386-new-ports/04", '\000' 
}, mnt_cred = 0xf800c83cb200, mnt_data = 
0xf800b713e000, 
  mnt_time = 0, mnt_iosize_max = 65536, mnt_export = 0x0, mnt_label = 0x0, 
  mnt_hashseed = 1242221059, mnt_lockref = 0, mnt_secondary_writes = 0, 
  mnt_secondary_accwrites = 0, mnt_susp_owner = 0x0, mnt_exjail = 0x0, 
  mnt_gjprovider = 0x0, mnt_listmtx = {lock_object = {
  lo_name = 0x80b1539e "struct mount vlist mtx", 
  lo_flags = 16973824, lo_data = 0, lo_witness = 0xf8021fd82a80}, 
mtx_lock = 0}, mnt_lazyvnodelist = {tqh_first = 0x0, 
tqh_last = 0xfe00da21d550}, mnt_lazyvnodelistsize = 0, 
  mnt_upper_pending = 0, mnt_explock = {lock_object = {
  lo_name = 0x80b6167f "explock", lo_flags = 108199936, 
  lo_data = 0, lo_witness = 0xf8021fd82880}, lk_lock = 1, 
lk_exslpfail 
[...]
tqh_first = 0x0, tqh_last = 0xfe00da21d590}, mnt_notify = {
tqh_first = 0x0, tqh_last = 0xfe00da21d5a0}, mnt_taskqueue_link = {
stqe_next = 0x0}, mnt_taskqueue_flags = 0, mnt_unmount_retries = 0}


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: Possible regression in main causing poor performance

2023-09-05 Thread Cy Schubert

In message , Mark Millard writes:
> On Sep 5, 2023, at 08:58, Cy Schubert  wrote:
>
> > In message <20230830204406.24fd...@slippy.cwsent.com>, Cy Schubert writes:
> >> In message <20230830184426.gm1...@freebsd.org>, Glen Barber writes:
> >>>
> >>> On Mon, Aug 28, 2023 at 06:06:09PM -0700, Mark Millard wrote:
> >>>> Has any more been learned about this? Is it still an issue?
> >>>>
> >>>
> >>> I rebooted the machine before the ALPHA3 builds with no other changes,
> >>> and the overall times for 14.x builds went back to normal.  I do not
> >>> like to experiment with builders during a release cycle, but as we are
> >>> going to have 15.x snapshots available moving forward, I will not reboot
> >>> that machine next week in hopes to get some useful data.
> >>>
> >>> If my memory serves correctly, mm@ has a pending ZFS import from
> >>> upstream for both main and stable/14 pending.  Whether or not that will
> >>> resolve any issue here, I do not know.
> >>
> >> Two of my poudriere builder machines have experienced different panics
> >> since the ZFS import two days ago. The problems have been documented on the
> >> -current list.
> >
> > Just an update.
> >
> > The three pull requests amotin@ pointed to did resolve all my problems. A
> > subsequent update which included the latest ZFS commits worked just as
> > well, without any new regressions. AFAIAC this problem has been resolved.
> >
> > The random email corruptions have also been resolved.
> >
> >
> > -- 
> > Cheers,
> > Cy Schubert 
> > FreeBSD UNIX: Web:  https://FreeBSD.org
> > NTP:   Web:  https://nwtime.org
> >
> > e^(i*pi)+1=0
> >
> >
> >
> >
> > =C2=9C9O8
>
> The just-above quoted line looks like a corruption to me.
>

Hmm. Just to rule out that a build of the exmh2 and nmh-devel packages 
might have been corrupt, I've rebuilt the two and will continue to monitor.

This email was sent by a rebuilt exmh2 and nmh-devel.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



Re: Possible regression in main causing poor performance

2023-08-30 Thread Cy Schubert

In message <20230830184426.gm1...@freebsd.org>, Glen Barber writes:
> 
>
> On Mon, Aug 28, 2023 at 06:06:09PM -0700, Mark Millard wrote:
> > Has any more been learned about this? Is it still an issue?
> >
>
> I rebooted the machine before the ALPHA3 builds with no other changes,
> and the overall times for 14.x builds went back to normal.  I do not
> like to experiment with builders during a release cycle, but as we are
> going to have 15.x snapshots available moving forward, I will not reboot
> that machine next week in hopes to get some useful data.
>
> If my memory serves correctly, mm@ has a pending ZFS import from
> upstream for both main and stable/14 pending.  Whether or not that will
> resolve any issue here, I do not know.

Two of my poudriere builder machines have experienced different panics 
since the ZFS import two days ago. The problems have been documented on the 
-current list.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0




Another ZFS Panic -- buffer modified while frozen

2023-08-30 Thread Cy Schubert

A different panic on a different amd64 machine also running poudriere but 
building amd64 packages. Exmh was just started, displaying back to my 
laptop at the time of panic.

panic: buffer modified while frozen!
cpuid = 1
time = 1693417762
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe008e67fba0
vpanic() at vpanic+0x132/frame 0xfe008e67fcd0
panic() at panic+0x43/frame 0xfe008e67fd30
arc_cksum_verify() at arc_cksum_verify+0x12c/frame 0xfe008e67fd80
arc_buf_destroy_impl() at arc_buf_destroy_impl+0x6f/frame 0xfe008e67fdc0
arc_buf_destroy() at arc_buf_destroy+0xd5/frame 0xfe008e67fdf0
dbuf_destroy() at dbuf_destroy+0x60/frame 0xfe008e67fe40
dbuf_evict_one() at dbuf_evict_one+0x176/frame 0xfe008e67fe70
dbuf_evict_thread() at dbuf_evict_thread+0x345/frame 0xfe008e67fef0
fork_exit() at fork_exit+0x82/frame 0xfe008e67ff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe008e67ff30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 3h46m10s
Dumping 1962 out of 8122 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
57  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) bt
#0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
#1  doadump (textdump=textdump@entry=1)
at /opt/src/git-src/sys/kern/kern_shutdown.c:405
#2  0x806c1b30 in kern_reboot (howto=260)
at /opt/src/git-src/sys/kern/kern_shutdown.c:526
#3  0x806c202f in vpanic (
fmt=0x83d82b7c "buffer modified while frozen!", 
ap=ap@entry=0xfe008e67fd10)
at /opt/src/git-src/sys/kern/kern_shutdown.c:970
#4  0x806c1dd3 in panic (fmt=)
at /opt/src/git-src/sys/kern/kern_shutdown.c:894
#5  0x83ae5f2c in arc_cksum_verify (buf=0xf80188cde180)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/arc.c:1475
#6  0x83ae99ff in arc_buf_destroy_impl (
buf=buf@entry=0xf80188cde180)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/arc.c:3113
#7  0x83ae9625 in arc_buf_destroy (buf=0xf80188cde180, 
tag=tag@entry=0xf80104a534c8)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/arc.c:3889
#8  0x83b0eee0 in dbuf_destroy (db=db@entry=0xf80104a534c8)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/dbuf.c:2983
#9  0x83b17996 in dbuf_evict_one ()
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/dbuf.c:781
#10 0x83b0c345 in dbuf_evict_thread (unused=)
at /opt/src/git-src/sys/contrib/openzfs/module/zfs/dbuf.c:819
#11 0x80677ab2 in fork_exit (
callout=0x83b0c000 , arg=0x0, 
frame=0xfe008e67ff40) at /opt/src/git-src/sys/kern/kern_fork.c:1160
#12 
(kgdb) 


FreeBSD cwsys 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #4 
komquats-n265089-b22aae410bc7: Wed Aug 30 04:38:24 PDT 2023 
root@cwsys:/export/obj/opt/src/git-src/amd64.amd64/sys/BREAK2 amd64

Almost the same configuration as the other machine.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



Re: ZFS Page Derefrence

2023-08-30 Thread Cy Schubert

In message , Mark Johnston writes:
> On Tue, Aug 29, 2023 at 07:08:35PM -0700, Cy Schubert wrote:
> > Hi
> > 
> > Just got the following panic on an amd64 machine running poudriere building
> > i386 packages.
> > 
> > panic: vm_page_dequeue_deferred: page 0xfe000b222808 has unexpected 
> > queue state
> > [...]
> > 
> > uname reports,
> > 
> > FreeBSD bob 15.0-CURRENT FreeBSD 15.0-CURRENT amd64 150 #1 
> > komquats-n265075-2e8edbc285cf: Tue Aug 29 03:51:59 PDT 2023 
> > root@cwsys:/export/obj/opt/src/git-src/amd64.amd64/sys/BREAK2 amd64
> > 
> > My BREAK2 kernel removes devices I don't use and enables keystrokes to 
> > interrupt the system from the console (conserver). Local patches affect 
> > ipfilter only.
> > 
> > Head of core.txt:
> > 
> > __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
> > 57  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
> > (kgdb) #0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:57
> > #1  doadump (textdump=textdump@entry=1)
> > at /opt/src/git-src/sys/kern/kern_shutdown.c:405
> > #2  0x806c1b30 in kern_reboot (howto=260)
> > at /opt/src/git-src/sys/kern/kern_shutdown.c:526
> > #3  0x806c202f in vpanic (
> > fmt=0x80b5da55 "%s: page %p has unexpected queue state",
> > ap=ap@entry=0xfe00bf55d770)
> > at /opt/src/git-src/sys/kern/kern_shutdown.c:970
> > #4  0x806c1dd3 in panic (fmt=)
> > at /opt/src/git-src/sys/kern/kern_shutdown.c:894
> > #5  0x809daab2 in vm_page_dequeue_deferred (m=,
> > m@entry=0xfe000b222808) at /opt/src/git-src/sys/vm/vm_page.c:3790
> > #6  0x809ddfeb in vm_page_free_prep (m=m@entry=0xfe000b222808)
> > at /opt/src/git-src/sys/vm/vm_page.c:3928
>
> Could you please print/x *m from this frame?

Sure.

(kgdb) print/x *m
$1 = {plinks = {q = {tqe_next = 0x, 
  tqe_prev = 0x}, s = {ss = {
sle_next = 0x}}, memguard = {p = 
0x,
  v = 0x}, uma = {slab = 0x, 
  zone = 0x}}, listq = {tqe_next = 0x, 
tqe_prev = 0x}, object = 0x0, pindex = 0x572c, 
  phys_addr = 0x1b67d5000, md = {pv_list = {tqh_first = 0x0, 
  tqh_last = 0xfe000b222840}, pv_gen = 0xf4a, pat_mode = 0x6}, 
  ref_count = 0x0, busy_lock = 0xfffe, a = {{flags = 0x10, queue = 
0xff,
  act_count = 0x0}, _bits = 0xff0010}, order = 0xd, pool = 0x0, 
  flags = 0x1, oflags = 0x0, psind = 0x0, segind = 0x5, valid = 0xff, 
  dirty = 0x0}
(kgdb) 
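
The a field in the dump above packs flags, queue and act_count into _bits.
A minimal Python sketch of unpacking it follows. The 16/8/8-bit layout is
inferred from the values printed above (flags = 0x10, queue = 0xff,
_bits = 0xff0010); the names PGA_DEQUEUE (0x0010) and PQ_NONE (255) are
recalled from sys/vm/vm_page.h and should be treated as assumptions to be
checked against the tree in use.

```python
def decode_astate(bits):
    """Split a 32-bit _bits value into (flags, queue, act_count)."""
    flags = bits & 0xFFFF             # low 16 bits
    queue = (bits >> 16) & 0xFF       # next 8 bits
    act_count = (bits >> 24) & 0xFF   # top 8 bits
    return flags, queue, act_count


PGA_DEQUEUE = 0x0010  # assumed value, see the note above
PQ_NONE = 255         # assumed value, see the note above

flags, queue, act_count = decode_astate(0xFF0010)
print(hex(flags), queue, act_count)
print(bool(flags & PGA_DEQUEUE), queue == PQ_NONE)
```

If those names are right, the page has a dequeue request pending while it
is on no queue, which would be consistent with the "unexpected queue state"
panic message.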


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



ZFS Page Derefrence

2023-08-29 Thread Cy Schubert

dule/os/freebsd/zfs/kmod_core.c:168
#16 0x8054b482 in devfs_ioctl (ap=0xfe00bf55dc50)
at /opt/src/git-src/sys/fs/devfs/devfs_vnops.c:933
#17 0x807cf032 in vn_ioctl (fp=0xf801909f6870,
com=, data=0xfe00bf55dd50,
active_cred=0xf800b6ed0b00, td=)
at /opt/src/git-src/sys/kern/vfs_vnops.c:1701
#18 0x8054bb5e in devfs_ioctl_f (fp=,
fp@entry=,
com=,
com@entry=,
data=,
data@entry=,
cred=,
cred@entry=,
td=,
td@entry=)
at /opt/src/git-src/sys/fs/devfs/devfs_vnops.c:864
#19 0x8073aca6 in fo_ioctl (fp=0xf801909f6870, com=3222821401,
data=, active_cred=, td=0xfe00c3e43900)
at /opt/src/git-src/sys/sys/file.h:366
#20 kern_ioctl (td=td@entry=0xfe00c3e43900, fd=4,
com=com@entry=3222821401, data=,
data@entry=0xfe00bf55dd50 "\017")
at /opt/src/git-src/sys/kern/sys_generic.c:805
#21 0x8073a9b2 in sys_ioctl (td=0xfe00c3e43900,
uap=0xfe00c3e43d00) at /opt/src/git-src/sys/kern/sys_generic.c:713
#22 0x80a73a88 in syscallenter (td=)
at /opt/src/git-src/sys/amd64/amd64/../../kern/subr_syscall.c:187
#23 amd64_syscall (td=0xfe00c3e43900, traced=0)
at /opt/src/git-src/sys/amd64/amd64/trap.c:1197
#24 
#25 0x191264a4fbca in ?? ()
Backtrace stopped: Cannot access memory at address 0x19125ca905c8
(kgdb)

*vp looks good.

Dump is available if needed.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0




Re: Possible issue with linux xattr support?

2023-08-27 Thread Cy Schubert

On August 27, 2023 12:55:23 PM PDT, Felix Palmen  wrote:
>* Dmitry Chagin  [20230827 22:46]:
>> On Sun, Aug 27, 2023 at 07:59:32PM +0200, Felix Palmen wrote:
>> > * Dmitry Chagin  [20230827 20:54]:
>> > > 1. which fs are you using?
>> > 
>> > ZFS.
>> > 
>> > > 2. jailed?
>> > 
>> > Yes, this is during building ports with poudriere.
>> > 
>> 
>> I think it's a weird prohibition on changing system namespace extattr
>> attributes, look to comments in extattr_check_cred()
>
>Maybe that's when I should finally start trying to understand the stuff
>in src.git ;)
>
>> I can fix this by completely disabling extattr for jailed proc,
>> however, it's gonna be bullshit, though
>
>Would probably be better than nothing. AFAIK, "Linux jails" are used a
>lot, probably with userlands from distributions actually using xattr.
>
>Cheers, Felix
>

If we are to break it to fix a problem, maybe a sysctl to enable/disable then?


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX:Web:  https://FreeBSD.org
NTP: Web:  https://nwtime.org
e^(i*pi)+1=0

Pardon the typos. Small keyboard in use.

Re: kabylake + drm-515-kmod/drm-510-kmod hangs

2023-08-21 Thread Cy Schubert

In message <76275772-a9c3-ed59-5fb3-47a13d2a6...@nomadlogic.org>, Pete 
Wright writes:
> hey there,
> i've got a kabylake laptop that i've been using with drm-kmod for 
> several years without much hassle.  after upgrading to a new CURRENT 
> this weekend I've found that when loading either the 510 or 515 drm-kmod 
> kernel modules my system will hang.
>
> unfortunately i am not getting a panic or crash, the screen stops 
> updating and i am unable to ping or SSH into the system.  interestingly 
> the capslock LED still toggles but doing a CTL+ALT+DEL does not seem to 
> do anything useful and i have to manually power cycle.
>
> any tips for finding out what's going on?  i've booted the system with 
> verbose dmesg output, and loaded the module with "kldload -v" but do not 
> get any useful output.
>
> here's the uname:
> FreeBSD colony 14.0-ALPHA2 FreeBSD 14.0-ALPHA2 amd64 1400096 #0 
> main-n264924-e2340276fc73: Sun Aug 20 21:28:44 PDT 2023 
> pete@colony:/usr/obj/usr/home/pete/git/freebsd/amd64.amd64/sys/GENERIC amd64
>
>
> these are the log messages i see before the system locks up:
> Aug 21 10:40:34 colony kernel: iic0:  on iicbus0
> Aug 21 10:40:35 colony kernel: drmn0:  on vgapci0
> Aug 21 10:40:35 colony kernel: vgapci0: child drmn0 requested pci_enable_io
> Aug 21 10:40:35 colony syslogd: last message repeated 1 times
> Aug 21 10:40:35 colony kernel: [drm] Unable to create a private tmpfs 
> mount, hugepage support will be disabled(-19).
> Aug 21 10:40:35 colony kernel: [drm] Got stolen memory base 0x4b80, 
> size 0x400
> Aug 21 10:40:35 colony kernel: lkpi_iic0:  on drmn0
> Aug 21 10:40:35 colony kernel: iicbus1:  on lkpi_iic0
> Aug 21 10:40:35 colony kernel: iic1:  on iicbus1
> Aug 21 10:40:35 colony kernel: lkpi_iic1:  on drmn0
> Aug 21 10:40:35 colony kernel: iicbus2:  on lkpi_iic1
> Aug 21 10:40:35 colony kernel: iic2:  on iicbus2
> Aug 21 10:40:35 colony kernel: lkpi_iic2:  on drmn0
> Aug 21 10:40:35 colony kernel: iicbus3:  on lkpi_iic2
> Aug 21 10:40:35 colony kernel: iic3:  on iicbus3
> Aug 21 10:40:35 colony kernel: lkpi_iic3:  on drmn0
> Aug 21 10:40:35 colony kernel: iicbus4:  on lkpi_iic3
> Aug 21 10:40:35 colony kernel: iic4:  on iicbus4
>
>
>
> cheers,
> -pete
>
> -- 
> Pete Wright
> p...@nomadlogic.org
> @nomadlogicLA
>

Rebuilding drm-51[05]-kmod after an update to LinuxKPI affecting the ABI 
used by the drm modules is required. Typically I get a kernel panic on a 
page fault when this occurs. Depending on how memory is laid out on your 
system you may get a hang instead.

You need to install the new kernel and world first. Disable xdm, gdm, any 
other *dm, or simply don't use startx. From a text console session rebuild 
the drm port and reinstall it.

I use poudriere here. My procedure is to update the poudriere jail, rebuild 
the port (-C option) and pkg upgrade -f or pkg install -f. Use this 
approach if you use poudriere.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0



Re: Defaulting serial communication to 115200 bps for FreeBSD 14

2023-08-15 Thread Cy Schubert

Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0


In message 
, Ed Maste writes:
> FreeBSD currently uses 9600 bps as the default for serial
> communication -- in the boot loader, kernel serial console, /etc/ttys,
> and so on. This was consistent with most equipment in the 90s, when
> these defaults were established. Today 115200 bps seems to be much
> more common, and I'm proposing that we make it the default for FreeBSD
> 14.0.
>
> I have a review open: https://reviews.freebsd.org/D36295. There are a
> few minor nits in the review to be addressed still but assuming
> there's general agreement I'll iterate on those and commit this in a
> few logical chunks.
>

There should probably be an UPDATING entry for those who use boot0 to 
revert back to 9600 in that case.

Re: ZFS deadlock in 14

2023-08-12 Thread Cy Schubert

bb112e in sleepq_wait (wchan=, 
> wchan@entry=0xf80108fe1540, pri=, pri@entry=0) at 
> /usr/src/sys/kern/subr_sleepqueue.c:660
>#4  0x80ade224 in _cv_wait (cvp=0xf80108fe1540, 
> lock=0xf80108fe14d0) at /usr/src/sys/kern/kern_condvar.c:146
>#5  0x820b383b in txg_wait_synced_impl (dp=0xf80108fe1000, 
> txg=8751529, txg@entry=0, wait_sig=wait_sig@entry=0) at 
> /usr/src/sys/contrib/openzfs/module/zfs/txg.c:726
>#6  0x820b31eb in txg_wait_synced (dp=, 
> txg=, txg@entry=0) at 
> /usr/src/sys/contrib/openzfs/module/zfs/txg.c:736
>#7  0x81fa5fc5 in zfsvfs_teardown (zfsvfs=0xf81ab3c81000, 
> unmounting=unmounting@entry=0) at 
> /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1661
>#8  0x81fa5db9 in zfs_suspend_fs (zfsvfs=) at 
> /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/zfs_vfsops.c:1954
>#9  0x821680ff in zfs_ioc_rollback (fsname=0xfe0301913000 
> "zroot-default-ref/03", fsname@entry= available>, innvl=, innvl@entry= is not available>, 
>outnvl=0xf81601748640, outnvl@entry= is not available>) at /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:4401
>#10 0x82163836 in zfsdev_ioctl_common (vecnum=vecnum@entry=25, 
> zc=zc@entry=0xfe0301913000, flag=flag@entry=0) at 
> /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:7798
>#11 0x81f969aa in zfsdev_ioctl (dev=, 
> zcmd=, zcmd@entry= available>, arg=0xfe02fd546d50 "\017", arg@entry= value is not available>, flag=, td=)
>at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:168
>#12 0x809dc9cc in devfs_ioctl (ap=0xfe02fd546c40) at 
> /usr/src/sys/fs/devfs/devfs_vnops.c:935
>#13 0x80c5cac0 in vn_ioctl (fp=0xf81e9207f0a0, com= out>, data=0xfe02fd546d50, active_cred=0xf8026a65a900, 
> td=) at /usr/src/sys/kern/vfs_vnops.c:1697
>#14 0x809dd07e in devfs_ioctl_f (fp=, fp@entry= reading variable: value is not available>, com=, 
> com@entry=, 
> data=, data@entry= available>, 
>cred=, cred@entry= available>, td=, td@entry= available>) at /usr/src/sys/fs/devfs/devfs_vnops.c:866
>#15 0x80bca1ce in fo_ioctl (fp=0xf81e9207f0a0, com=3222821401, 
> data=, active_cred=, td=) at 
> /usr/src/sys/sys/file.h:367
>#16 kern_ioctl (td=td@entry=0xfe0314249020, fd=, 
> com=com@entry=3222821401, data=, data@entry=0xfe02fd546d50 
> "\017") at /usr/src/sys/kern/sys_generic.c:807
>#17 0x80bc9f64 in sys_ioctl (td=0xfe0314249020, 
> td@entry=, 
> uap=0xfe0314249420, uap@entry= available>) at /usr/src/sys/kern/sys_generic.c:715
>#18 0x8104d8e0 in syscallenter (td=) at 
> /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
>#19 amd64_syscall (td=0xfe0314249020, traced=0) at 
> /usr/src/sys/amd64/amd64/trap.c:1199
>#20 
>#21 0x05c8e125953a in ?? ()
>Backtrace stopped: Cannot access memory at address 0x5c8d89c8018
>
>DES

Yes, this is the same panic my poudriere builder building amd64 packages gets.
The poudriere builder building i386 packages, also running on amd64, gets a
different panic. I'm on my phone and don't have a keyboard to look up the PR
number.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX:Web:  https://FreeBSD.org
NTP: Web:  https://nwtime.org
e^(i*pi)+1=0

Pardon the typos. Small keyboard in use.

Re: ZFS deadlock in 14

2023-08-11 Thread Cy Schubert

The poudriere build machine building amd64 packages also panicked. But with:

Dumping 2577 out of 8122 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

__curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:59
59  __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  __curthread () at /opt/src/git-src/sys/amd64/include/pcpu_aux.h:59
#1  doadump (textdump=textdump@entry=1)
at /opt/src/git-src/sys/kern/kern_shutdown.c:407
#2  0x806c10e0 in kern_reboot (howto=260)
at /opt/src/git-src/sys/kern/kern_shutdown.c:528
#3  0x806c15df in vpanic (
fmt=0x80b6c5f5 "%s: possible deadlock detected for %p (%s), blocked for %d ticks\n", ap=ap@entry=0xfe008e698e90)
at /opt/src/git-src/sys/kern/kern_shutdown.c:972
#4  0x806c1383 in panic (fmt=)
at /opt/src/git-src/sys/kern/kern_shutdown.c:896
#5  0x8064a5ea in deadlkres ()
at /opt/src/git-src/sys/kern/kern_clock.c:201
#6  0x80677632 in fork_exit (callout=0x8064a2c0 ,
arg=0x0, frame=0xfe008e698f40)
at /opt/src/git-src/sys/kern/kern_fork.c:1162
#7  
(kgdb)

This is consistent with PR/271945. Reducing -J to 1 or 5:1 circumvents this 
panic.

This is certainly a different panic from the one experienced on the 
poudriere builder building i386 packages. Both machines run in amd64 mode.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

    e^(i*pi)+1=0


Cy Schubert writes:
> This is new. Instead of affecting the machine with poudriere building amd64 
> packages, it affected the other machine with poudriere building i386 
> packages. This is new since the two recent ZFS patches.
>
> Don't get me wrong, the two new patches have resulted in I believe better 
> availability of the poudriere machine building amd64 packages. I doubt the 
> two patches caused this but they may have exposed this problem, probably 
> fixed by another patch or two.
>
> Sorry, there was no dump produced by this panic. I'll need to check the 
> config of this machine, swap is a gmirror, which it doesn't like to dump 
> to. Below are serial console messages captured by conserver.
>
> panic: vm_page_dequeue_deferred: page 0xfe00028fb0d0 has unexpected queue state
> cpuid = 3
> time = 1691807572
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe00c50bc600
> vpanic() at vpanic+0x132/frame 0xfe00c50bc730
> panic() at panic+0x43/frame 0xfe00c50bc790
> vm_page_dequeue_deferred() at vm_page_dequeue_deferred+0xb2/frame 0xfe00c50bc7a0
> vm_page_free_prep() at vm_page_free_prep+0x11b/frame 0xfe00c50bc7c0
> vm_page_free_toq() at vm_page_free_toq+0x12/frame 0xfe00c50bc7f0
> vm_object_page_remove() at vm_object_page_remove+0xb6/frame 0xfe00c50bc850
> vn_pages_remove_valid() at vn_pages_remove_valid+0x48/frame 0xfe00c50bc880
> zfs_rezget() at zfs_rezget+0x35/frame 0xfe00c50bca60
> zfs_resume_fs() at zfs_resume_fs+0x1c8/frame 0xfe00c50bcab0
> zfs_ioc_rollback() at zfs_ioc_rollback+0x157/frame 0xfe00c50bcb00
> zfsdev_ioctl_common() at zfsdev_ioctl_common+0x612/frame 0xfe00c50bcbc0
> zfsdev_ioctl() at zfsdev_ioctl+0x12a/frame 0xfe00c50bcbf0
> devfs_ioctl() at devfs_ioctl+0xd2/frame 0xfe00c50bcc40
> vn_ioctl() at vn_ioctl+0xc2/frame 0xfe00c50bccb0
> devfs_ioctl_f() at devfs_ioctl_f+0x1e/frame 0xfe00c50bccd0
> kern_ioctl() at kern_ioctl+0x286/frame 0xfe00c50bcd30
> sys_ioctl() at sys_ioctl+0x152/frame 0xfe00c50bce00
> amd64_syscall() at amd64_syscall+0x138/frame 0xfe00c50bcf30
> fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe00c50bcf30
> --- syscall (54, FreeBSD ELF64, ioctl), rip = 0x20938296107a, rsp = 0x209379aeee18, rbp = 0x209379aeee90 ---
> Uptime: 42m33s
> Automatic reboot in 15 seconds - press a key on the console to abort
> Rebooting...
> cpu_reset: Restarting BSP
> cpu_reset_proxy: Stopped CPU 3
>
>
> -- 
> Cheers,
> Cy Schubert 
> FreeBSD UNIX: Web:  https://FreeBSD.org
> NTP:   Web:  https://nwtime.org
>
>   e^(i*pi)+1=0
>
>
> Cy Schubert writes:
> > I haven't experienced any problems (yet) either.
> >
> >
> > -- 
> > Cheers,
> > Cy Schubert 
> > FreeBSD UNIX: Web:  https://FreeBSD.org
> > NTP:   Web:  https://nwtime.org
> >
> > e^(i*pi)+1=0
> >
> >
> > Kevin Bowling writes:
> > > The two MFVs on head have improved/fixed stability with po

Re: ZFS deadlock in 14

2023-08-10 Thread Cy Schubert

I haven't experienced any problems (yet) either.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0


Kevin Bowling writes:
> The two MFVs on head have improved/fixed stability with poudriere for
> me on 48-core bare metal.
>
> On Thu, Aug 10, 2023 at 6:37 AM Cy Schubert wrote:
> >
> > Kevin Bowling writes:
> > > Possibly https://github.com/openzfs/zfs/commit/2cb992a99ccadb78d97049b40bd442eb4fdc549d
> > >
> > > On Tue, Aug 8, 2023 at 10:08 AM Dag-Erling Smørgrav wrote:
> > > >
> > > > At some point between 42d088299c (4 May) and f0c9703301 (26 June), a
> > > > deadlock was introduced in ZFS.  It is still present as of 9c2823bae9 (4
> > > > August) and is 100% reproducible just by starting poudriere bulk in a
> > > > 16-core VM and waiting a few hours until deadlkres kicks in.  In the
> > > > latest instance, deadlkres complained about a bash process:
> > > >
> > > > #0  sched_switch (td=td@entry=0xfe02fb1d8000, flags=flags@entry=259) at /usr/src/sys/kern/sched_ule.c:2299
> > > > #1  0x80b5a0a3 in mi_switch (flags=flags@entry=259) at /usr/src/sys/kern/kern_synch.c:550
> > > > #2  0x80babcb4 in sleepq_switch (wchan=0xf818543a9e70, pri=64) at /usr/src/sys/kern/subr_sleepqueue.c:609
> > > > #3  0x80babb8c in sleepq_wait (wchan=, pri=<unavailable>) at /usr/src/sys/kern/subr_sleepqueue.c:660
> > > > #4  0x80b1c1b0 in sleeplk (lk=lk@entry=0xf818543a9e70, flags=flags@entry=2121728, ilk=ilk@entry=0x0, wmesg=wmesg@entry=0x8222a054 "zfs", pri=, pri@entry=64, timo=timo@entry=6, queue=1) at /usr/src/sys/kern/kern_lock.c:310
> > > > #5  0x80b1a23f in lockmgr_slock_hard (lk=0xf818543a9e70, flags=2121728, ilk=, file=0x812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057, lwa=0x0) at /usr/src/sys/kern/kern_lock.c:705
> > > > #6  0x80c59ec3 in VOP_LOCK1 (vp=0xf818543a9e00, flags=2105344, file=0x812544fb "/usr/src/sys/kern/vfs_subr.c", line=3057) at ./vnode_if.h:1120
> > > > #7  _vn_lock (vp=vp@entry=0xf818543a9e00, flags=2105344, file=, line=, line@entry=3057) at /usr/src/sys/kern/vfs_vnops.c:1815
> > > > #8  0x80c4173d in vget_finish (vp=0xf818543a9e00, flags=, vs=vs@entry=VGET_USECOUNT) at /usr/src/sys/kern/vfs_subr.c:3057
> > > > #9  0x80c1c9b7 in cache_lookup (dvp=dvp@entry=0xf802cd02ac40, vpp=vpp@entry=0xfe046b20ac30, cnp=cnp@entry=0xfe046b20ac58, tsp=tsp@entry=0x0, ticksp=ticksp@entry=0x0) at /usr/src/sys/kern/vfs_cache.c:2086
> > > > #10 0x80c2150c in vfs_cache_lookup (ap= >) at /usr/src/sys/kern/vfs_cache.c:3068
> > > > #11 0x80c32c37 in VOP_LOOKUP (dvp=0xf802cd02ac40, vpp=0xfe046b20ac30, cnp=0xfe046b20ac58) at ./vnode_if.h:69
> > > > #12 vfs_lookup (ndp=ndp@entry=0xfe046b20abd8) at /usr/src/sys/kern/vfs_lookup.c:1266
> > > > #13 0x80c31ce1 in namei (ndp=ndp@entry=0xfe046b20abd8) at /usr/src/sys/kern/vfs_lookup.c:689
> > > > #14 0x80c52090 in kern_statat (td=0xfe02fb1d8000, flag=, fd=-100, path=0xa75b480e070  t access memory at address 0xa75b480e070>, pathseg=pathseg@entry=UIO_USERSPACE, sbp=sbp@entry=0xfe046b20ad18)
> > > >     at /usr/src/sys/kern/vfs_syscalls.c:2441
> > > > #15 0x80c52797 in sys_fstatat (td=, uap=0xfffffe02fb1d8400) at /usr/src/sys/kern/vfs

Re: ZFS deadlock in 14

2023-08-10 Thread Cy Schubert

ff8204075b in dsl_dataset_rollback (fsname= >, fsname@entry=0xfe0401d15000 "zroot/poudriere/jails/13amd64-default-ref/15", tosnap=, owner=, result=result@entry=0xf81c826a9ea0)
> >     at /usr/src/sys/contrib/openzfs/module/zfs/dsl_dataset.c:3261
> > #10 0x82168dd9 in zfs_ioc_rollback (fsname=0xfe0401d15000 "zroot/poudriere/jails/13amd64-default-ref/15", fsname@entry= ading variable: value is not available>, innvl=, innvl@entry=,
> >     outnvl=0xf81c826a9ea0, outnvl@entry= le: value is not available>) at /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:4405
> > #11 0x82164522 in zfsdev_ioctl_common (vecnum=vecnum@entry=25, zc=zc@entry=0xfe0401d15000, flag=flag@entry=0) at /usr/src/sys/contrib/openzfs/module/zfs/zfs_ioctl.c:7798
> > #12 0x81f97fca in zfsdev_ioctl (dev=, zcmd=, zcmd@entry= ble>, arg=0xfe02fb827d50 "\017", arg@entry=  value is not available>, flag=, td=)
> >     at /usr/src/sys/contrib/openzfs/module/os/freebsd/zfs/kmod_core.c:168
> > #13 0x809d6212 in devfs_ioctl (ap=0xfe02fb827c50) at /usr/src/sys/fs/devfs/devfs_vnops.c:935
> > #14 0x80c585f2 in vn_ioctl (fp=0xf8052cdd80f0, com= ptimized out>, data=0xfe02fb827d50, active_cred=0xf80122ab1e00, td=) at /usr/src/sys/kern/vfs_vnops.c:1704
> > #15 0x809d68ee in devfs_ioctl_f (fp=, fp@entry=, com=, com@entry=, data= lable>, data@entry=,
> >     cred=, cred@entry=  is not available>, td=, td@entry=  value is not available>) at /usr/src/sys/fs/devfs/devfs_vnops.c:866
> > #16 0x80bc57e6 in fo_ioctl (fp=0xf8052cdd80f0, com=3222821401, data=, active_cred=, td=0xfe0422ef8560) at /usr/src/sys/sys/file.h:367
> > #17 kern_ioctl (td=td@entry=0xfe0422ef8560, fd=4, com=com@entry=3222821401, data=, data@entry=0xfffffe02fb827d50 "\017") at /usr/src/sys/kern/sys_generic.c:807
> > #18 0x80bc54f2 in sys_ioctl (td=0xfe0422ef8560, uap=0xfe0422ef8960) at /usr/src/sys/kern/sys_generic.c:715
> > #19 0x81049398 in syscallenter (td=) at /usr/src/sys/amd64/amd64/../../kern/subr_syscall.c:190
> > #20 amd64_syscall (td=0xfe0422ef8560, traced=0) at /usr/src/sys/amd64/amd64/trap.c:1199
[...]

The backtrace looks different, though it certainly smells like PR/271945.

I've had panics similar to PR/271945 on an amd64 machine with a mirrored
zpool with four vdevs running poudriere with amd64 jails. My other amd64
machine, with a mirrored zpool with two vdevs and i386 jails, has no such
issue. All other workloads are unaffected.

On the affected machine, running poudriere bulk with -J N:1 circumvents the
issue, so far. There were two openzfs cherry-picks this morning. I intend
to try them against a full bulk build later today.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: dhclient unable to negotiate on WPA2-Enterprise network (eduroam)

2023-07-01 Thread Cy Schubert

Pull request #787. I can look at it.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0


Naman Sood writes:
> Hi,
>
> wpa_supplicant-devel unfortunately did not fix my problem. However, applying 
> this patch did:
> https://github.com/freebsd/freebsd-src/commit/b393d862dc78a99203455b01e685fb2108e51b05.
>
> Thanks,
> Naman.
> (they/them)
>
> On Sat, Jul 1, 2023, at 00:14, Cy Schubert wrote:
> > On Fri, 30 Jun 2023 10:56:54 -0700
> > Cy Schubert wrote:
> > 
> > > Can you try wpa_supplicant-devel? It was updated last week. The -devel
> > > port tracks the latest WPA development.
> > > 
> > > 
> > 
> > Now that I'm back at home, looking at hostap (our upstream w1.fi) commit
> > logs, there have been a few OpenSSL 3.0 patches applied to wpa since
> > wpa_supplicant/hostapd 2.10 was imported into FreeBSD base (on Jan 18,
> > 2022). Try the wpa_supplicant-devel port, it's current to the latest
> > upstream w1.fi commit. If it fixes your problem, I will import it into
> > FreeBSD base as well.
> > 
> > I will backport a few patches applied to base into both ports next week.
> > 
> > 
> > -- 
> > Cheers,
> > Cy Schubert 
> > FreeBSD UNIX: Web:  https://FreeBSD.org
> > NTP:   Web:  https://nwtime.org
> > 
> > e^(i*pi)+1=0
> > 
> >

Re: dhclient unable to negotiate on WPA2-Enterprise network (eduroam)

2023-06-30 Thread Cy Schubert

On Fri, 30 Jun 2023 10:56:54 -0700
Cy Schubert  wrote:

> Can you try wpa_supplicant-devel? It was updated last week. The -devel port 
> tracks the latest WPA development. 
> 
> 

Now that I'm back at home, looking at hostap (our upstream w1.fi) commit
logs, there have been a few OpenSSL 3.0 patches applied to wpa since
wpa_supplicant/hostapd 2.10 was imported into FreeBSD base (on Jan 18,
2022). Try the wpa_supplicant-devel port, it's current to the latest
upstream w1.fi commit. If it fixes your problem, I will import it into
FreeBSD base as well.

I will backport a few patches applied to base into both ports next week.

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: dhclient unable to negotiate on WPA2-Enterprise network (eduroam)

2023-06-30 Thread Cy Schubert

Can you try wpa_supplicant-devel? It was updated last week. The -devel port 
tracks the latest WPA development. 


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX:Web:  https://FreeBSD.org
NTP: Web:  https://nwtime.org
e^(i*pi)+1=0

Pardon the typos. Small keyboard in use.

Re: Following a panic (271945): zpool status reports 1 data error but identifies no file

2023-06-11 Thread Cy Schubert

On June 11, 2023 5:58:49 AM PDT, Miroslav Lachman <000.f...@quip.cz> wrote:
>On 11/06/2023 14:02, Graham Perrin wrote:
>> See below, should I begin scrubbing? Or (before I begin) might zdb reveal 
>> something useful?
>> 
>> The supposed error was observable after 
>> <https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271945>
>> 
>> 271945 – panic: deadlres_td_sleep_q: possible deadlock detected for 
>> 0xfe0133324ac0 (stat), blocked for 1801328 ticks
>>
>
>[..]
>
>> errors: Permanent errors have been detected in the following files:
>
>
>Can it be that the error was in file which is deleted now? Or was in snapshot 
>which was already destroyed by some automatic script?
>
>Kind regards
>Miroslav Lachman
>
>

Zpool export/import or reboot may fix this.
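
For anyone who wants to script the export/import cycle suggested above,
here is a minimal sketch. It only builds the command lines; it does not run
them. The pool name "tank" used in the usage note is hypothetical, and the
helper name export_import_commands is made up for illustration. The only
zpool(8) subcommands assumed are export, import and "status -v".

```python
from typing import List


def export_import_commands(pool: str) -> List[List[str]]:
    """Build the zpool(8) command sequence for an export/import cycle.

    The helper is illustrative only: it returns argument lists and leaves
    running them (as root, on a pool that is not in use) to the caller.
    """
    if not pool or pool.startswith("-"):
        raise ValueError("pool name must be non-empty and must not look like an option")
    return [
        ["zpool", "export", pool],        # release the pool
        ["zpool", "import", pool],        # bring it back
        ["zpool", "status", "-v", pool],  # re-check the error report
    ]
```

Calling export_import_commands("tank") returns three argument lists that
can be passed one by one to subprocess.run(). A pool that holds the running
root filesystem cannot be exported this way, which is where the reboot comes
in.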


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX:Web:  https://FreeBSD.org
NTP: Web:  https://nwtime.org
e^(i*pi)+1=0

Pardon the typos. Small keyboard in use.

Re: another crash and going forward with zfs

2023-04-17 Thread Cy Schubert

Pawel Jakub Dawidek writes:
> On 4/18/23 05:14, Mateusz Guzik wrote:
> > On 4/17/23, Pawel Jakub Dawidek  wrote:
> >> Correct me if I'm wrong, but from my understanding there were zero
> >> problems with block cloning when it wasn't in use or now disabled.
> >>
> >> The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
> >> avoid mess like this and give us more time to sort all the problems out
> >> while making it easy for people to try it.
> >>
> >> If there is no plan to revert the whole import, I don't see what value
> >> removing just block cloning will bring if it is now disabled by default
> >> and didn't cause any problems when disabled.
> >>
> > 
> > The feature definitely was not properly stress tested and what not and
> > trying to do it keeps running into panics. Given the complexity of the
> > feature I would expect there are many bug lurking, some of which
> > possibly related to the on disk format. Not having to deal with any of
> > this is can be arranged as described above and is imo the most
> > sensible route given the timeline for 14.0
>
> Block cloning doesn't create, remove or modify any on-disk data until it 
> is in use.
>
> Again, if we are not going to revert the whole merge, I see no point in 
> reverting block cloning as until it is enabled, its code is not 
> executed. This allow people who upgraded the pools to do nothing special 
> and it will allow people to test it easily.

In this case zpool upgrade and zpool status should report that no feature 
upgrades are available instead of enticing users to run zpool upgrade. The 
userland zpool command should test for this sysctl and print nothing 
regarding block_cloning. I can see a scenario where a user upgrades their 
pools with zpool upgrade, notices the sysctl and does the unthinkable. Not 
only would this fill the mailing lists with angry chatter, it would also 
spawn a number of PRs and give us a lot of bad press for data loss.

Should we keep the new ZFS in 14, we should:

1. Make sure that zpool(8) does not mention or offer block_cloning in any 
way if the sysctl is disabled.

2. Print a cautionary note in release notes advising people not to enable 
this experimental sysctl. Maybe even have it print "(experimental)" to warn 
users that it will hurt.

3. Update the man pages to caution that block_cloning is experimental and 
unstable.

It's not enough to have a sysctl without hiding block_cloning completely 
from view. Only expose it in zpool(8) when the sysctl is enabled. Let's 
avoid people mistakenly enabling it.
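
To make the proposal concrete, here is a rough sketch of the gating logic
only. It is not how zpool(8) is implemented. The sysctl name
vfs.zfs.bclone_enabled comes from this thread; the function names, the idea
of filtering a plain list of feature names, and treating an unreadable
sysctl as disabled are all my assumptions for illustration.

```python
import subprocess
from typing import Callable, Iterable, List

BCLONE_SYSCTL = "vfs.zfs.bclone_enabled"  # sysctl named in this thread


def read_sysctl(name: str) -> str:
    """Return a sysctl value as text by running `sysctl -n NAME`."""
    result = subprocess.run(
        ["sysctl", "-n", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def visible_features(
    features: Iterable[str],
    reader: Callable[[str], str] = read_sysctl,
) -> List[str]:
    """Hide block_cloning unless the sysctl reports it enabled.

    Hypothetical policy: if the sysctl cannot be read at all, behave as
    if block cloning were disabled.
    """
    try:
        enabled = reader(BCLONE_SYSCTL) == "1"
    except (OSError, subprocess.CalledProcessError):
        enabled = False
    return [f for f in features if f != "block_cloning" or enabled]
```

With the sysctl at 0, visible_features(["zilsaxattr", "block_cloning"])
would list only zilsaxattr, so nothing would invite the user to upgrade for
block_cloning. The reader argument exists so the logic can be exercised
without touching the real sysctl.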

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Cy Schubert

In message <5a47f62d-0e78-4c3e-84c0-45eeb03c7...@yahoo.com>, Mark Millard 
writes:
> On Apr 15, 2023, at 07:36, Cy Schubert wrote:
>
> > In message <20230415115452.08911...@thor.intern.walstatt.dynvpn.de>,
> > FreeBSD User writes:
> >> On Thu, 13 Apr 2023 22:18:04 -0700
> >> Mark Millard wrote:
> >>
> >>> On Apr 13, 2023, at 21:44, Charlie Li wrote:
> >>>
> >>>> Mark Millard wrote:
> >>>>> FYI: in my original report for a context that has never had
> >>>>> block_cloning enabled, I reported BOTH missing files and
> >>>>> file content corruption in the poudriere-devel bulk build
> >>>>> testing. This predates:
> >>>>> https://people.freebsd.org/~pjd/patches/brt_revert.patch
> >>>>> but had the changes from:
> >>>>> https://github.com/openzfs/zfs/pull/14739/files
> >>>>> The files were missing from packages installed to be used
> >>>>> during a port's build. No other types of examples of missing
> >>>>> files happened. (But only 11 ports failed.)
> >>>> I also don't have block_cloning enabled. "Missing files" prior to
> >>>> brt_revert may actually be present, but as the corruption also messes
> >>>> with the file(1) signature, some tools like ldconfig report them as
> >>>> missing.
> >>>
> >>> For reference, the specific messages that were not explicit
> >>> null-byte complaints were (some shown with a little context):
> >>>
> >>>
> >>> ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not found
> >>> ===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg
> >>> [CA72_ZFS] Installing libxml2-2.10.3_1...
> >>> [CA72_ZFS] Extracting libxml2-2.10.3_1: .. done
> >>> ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found
> >>> (/usr/local/lib/libxml2.so) . . .
> >>> [CA72_ZFS] Extracting libxslt-1.1.37: .. done
> >>> ===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found
> >>> (/usr/local/lib/libxslt.so) ===>   Returning to build of py39-lxml-4.9.2
> >>> . . .
> >>> ===>  Configuring for py39-lxml-4.9.2
> >>> Building lxml version 4.9.2.
> >>> Building with Cython 0.29.33.
> >>> Error: Please make sure the libxml2 and libxslt development packages are installed.
> >>>
> >>>
> >>> [CA72_ZFS] Extracting libunistring-1.1: .. done
> >>> ===>   libidn2-2.3.4 depends on shared library: libunistring.so - not found
> >>>
> >>>
> >>>
> >>> [CA72_ZFS] Extracting gmp-6.2.1: .. done
> >>> ===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found
> >>>
> >>>
> >>> ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
> >>> ===>   Installing existing package /packages/All/gmp-6.2.1.pkg
> >>> [CA72_ZFS] Installing gmp-6.2.1...
> >>> the most recent version of gmp-6.2.1 is already installed
> >>> ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
> >>> *** Error code 1
> >>>
> >>>
> >>> autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
> >>>
> >>>
> >>> checking for GNU
> >>> M4 that supports accurate traces... configure: error: no acceptable m4 could be found in
> >>> $PATH. GNU M4 1.4.6 or later is required; 1.4.16 or newer is recommended.
> >>> GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
> >>> Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
> >>>
> >>>
> >>> ld: error: /usr/local/lib/libblkid.a: unknown file type
> >>>
> >>>
> >>> ===
> >>> Mark Millard
> >>> marklmi at yahoo.com
> >>>
> >>>
> >>
> >> Hello
> >>
> >> what is the current status of fixing/mitigating this disastrous bug? Especially f
> >>

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Cy Schubert

On Sat, 15 Apr 2023 18:07:34 +0200
Florian Smeets  wrote:

> On 15.04.23 17:51, FreeBSD User wrote:
> > On Sat, 15 Apr 2023 07:36:25 -0700
> > Cy Schubert wrote:
> >>
> >> With an up-to-date tree + pjd@'s "Fix data corruption when cloning embedded
> >> blocks. #14739" patch I didn't have any issues, except for email messages
> >> with corruption in my sent directory, nowhere else. I'm still investigating
> >> the email messages issue. IMO one is generally safe to run poudriere on the
> >> latest ZFS with the additional patch.  
> 
> This is also my current observation. I have 2 hosts where I was 
> unfortunate enough to update at the wrong time. I currently *think* that 
> I'm *not* seeing data corruption with head from April 12th and this 
> patch 
> https://github.com/openzfs/zfs/commit/d3a6e5ca3b2f684132238ca968bf0b96f17ec7e1.diff
>  
> applied.
> 
> One pool has been upgraded with feature@block_cloning and the other hasn't.
> > 
> > FreeBSD 14.0-CURRENT #8 main-n262175-5ee1c90e50ce: Sat Apr 15 07:57:16 CEST 
> > 2023 amd64
> > 
> > The box is crashing while trying to update ports with the well known issue:
> > 
> > Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
> >   
> On the pool that has block_cloning enabled I see the above insta panic 
> when poudriere starts building. I found a workaround though:
> 
> --- /usr/local/share/poudriere/include/fs.sh.orig 2023-04-15 18:03:50.090823000 +0200
> +++ /usr/local/share/poudriere/include/fs.sh  2023-04-15 18:04:04.144736000 +0200
> @@ -295,7 +295,6 @@
>   fi
> 
>   zfs clone -o mountpoint=${mnt} \
> - -o sync=disabled \
>   -o atime=off \
>   -o compression=off \
>   ${fs}@${snap} \
> 
> With this workaround I was able to build thousands of packages without 
> panics or failures due to data corruption.

Thanks for this. I'll test this next week. One should be able to test
this by hand to capture a dump.

> 
> Florian



-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Cy Schubert

In message <20230415175218.777d0...@thor.intern.walstatt.dynvpn.de>,
FreeBSD User writes:
> On Sat, 15 Apr 2023 07:36:25 -0700
> Cy Schubert wrote:
>
> > In message <20230415115452.08911...@thor.intern.walstatt.dynvpn.de>,
> > FreeBSD User writes:
> > > On Thu, 13 Apr 2023 22:18:04 -0700
> > > Mark Millard wrote:
> > >  
> > > > On Apr 13, 2023, at 21:44, Charlie Li  wrote:
> > > >   
> > > > > Mark Millard wrote:
> > > > >> FYI: in my original report for a context that has never had
> > > > >> block_cloning enabled, I reported BOTH missing files and
> > > > >> file content corruption in the poudriere-devel bulk build
> > > > >> testing. This predates:
> > > > >> https://people.freebsd.org/~pjd/patches/brt_revert.patch
> > > > >> but had the changes from:
> > > > >> https://github.com/openzfs/zfs/pull/14739/files
> > > > >> The files were missing from packages installed to be used
> > > > >> during a port's build. No other types of examples of missing
> > > > >> files happened. (But only 11 ports failed.)
> > > > > I also don't have block_cloning enabled. "Missing files" prior to
> > > > > brt_revert may actually be present, but as the corruption also messes
> > > > > with the file(1) signature, some tools like ldconfig report them as
> > > > > missing.
> > > > 
> > > > For reference, the specific messages that were not explicit
> > > > null-byte complaints were (some shown with a little context):
> > > > 
> > > >   
> > > > ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not found
> > > > ===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg
> > > > [CA72_ZFS] Installing libxml2-2.10.3_1...
> > > > [CA72_ZFS] Extracting libxml2-2.10.3_1: .. done  
> > > > ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found  
> > > > (/usr/local/lib/libxml2.so) . . .
> > > > [CA72_ZFS] Extracting libxslt-1.1.37: .. done  
> > > > ===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found  
> > > > (/usr/local/lib/libxslt.so) ===>   Returning to build of py39-lxml-4.9.2
> > > > . . .
> > > > ===>  Configuring for py39-lxml-4.9.2
> > > > Building lxml version 4.9.2.
> > > > Building with Cython 0.29.33.
> > > > Error: Please make sure the libxml2 and libxslt development packages are installed.
> > > > 
> > > > 
> > > > [CA72_ZFS] Extracting libunistring-1.1: .. done  
> > > > ===>   libidn2-2.3.4 depends on shared library: libunistring.so - not found
> > > 
> > > > 
> > > > 
> > > > [CA72_ZFS] Extracting gmp-6.2.1: .. done  
> > > > ===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found
> > > > 
> > > >   
> > > > ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
> > > > ===>   Installing existing package /packages/All/gmp-6.2.1.pkg
> > > > [CA72_ZFS] Installing gmp-6.2.1...
> > > > the most recent version of gmp-6.2.1 is already installed  
> > > > ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
> > > > *** Error code 1
> > > > 
> > > > 
> > > > autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
> > > > 
> > > > 
> > > > checking for GNU
> > > > M4 that supports accurate traces... configure: error: no acceptable m4 could be found in
> > > > $PATH. GNU M4 1.4.6 or later is required; 1.4.16 or newer is recommended.
> > > > GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
> > > > Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
> > > > 
> > > > 
> > > > ld: error: /usr/local/lib/libblkid.a: unknown file type
> > > > 
> > > > 
> > > > ===
> > > > Mark Millard
> > > > marklmi at yahoo.com
> > > > 
> > > >   
> > >
> > > Hello 
> > >
> > > what is the recent status of fixing/mitigating this desatr

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Cy Schubert

rectory, nowhere else. I'm still investigating 
the email messages issue. IMO one is generally safe to run poudriere on the 
latest ZFS with the additional patch.

My tests of the additional patch concluded that it resolved my last 
problems, except for the sent email problem I'm still investigating. I'm 
sure there's a simple explanation for it, i.e. the email thread was 
corrupted by the EXDEV regression which cannot be fixed by anything, even 
reverting to the previous ZFS -- the data in those files will remain 
damaged regardless.
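
For anyone who wants to check their own data, a comparison along the lines
of the `cp -R` test discussed in this thread can be sketched as follows. It
walks a source tree and reports files whose copy is missing or has different
content. The function names are made up for this example, SHA-256 is just a
convenient choice of digest, and symlinks are skipped; none of this is the
exact procedure anyone in the thread ran.

```python
import hashlib
import os
from typing import List


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_files(src: str, dst: str) -> List[str]:
    """List regular files under src that are missing from dst or differ.

    Paths in the result are relative to src and sorted. Symlinks are
    ignored to keep the sketch simple.
    """
    bad = []
    for root, _dirs, files in os.walk(src):
        for name in files:
            src_path = os.path.join(root, name)
            if os.path.islink(src_path) or not os.path.isfile(src_path):
                continue
            rel = os.path.relpath(src_path, src)
            dst_path = os.path.join(dst, rel)
            if not os.path.isfile(dst_path):
                bad.append(rel)  # copy is missing
            elif file_digest(src_path) != file_digest(dst_path):
                bad.append(rel)  # copy exists but content differs
    return sorted(bad)
```

An empty list from differing_files(source, copy) means every regular file
in the source has a byte-identical copy. It says nothing about files that
were already damaged before the copy was made.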

I cannot speak to the others who have had poudriere and other issues. I 
never had any problems with poudriere on top of the new ZFS.

WRT reverting pools with block_cloning enabled to pools without it, your 
only option is to back up your pool and recreate it without block_cloning. 
Then restore your data.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert

Mateusz Guzik writes:
> On 4/13/23, Cy Schubert  wrote:
> > On Thu, 13 Apr 2023 19:54:42 +0900
> > Paweł Jakub Dawidek wrote:
> >
> >> On Apr 13, 2023, at 16:10, Cy Schubert wrote:
> >> >
> >> > In message <20230413070426.8a54f...@slippy.cwsent.com>, Cy Schubert
> >> > writes:
> >> > In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert
> >> > writes:
> >> >> In message , Mark
> >> >> Millard
> >> >>> write
> >> >>> s:
> >> >>> [This just puts my prior reply's material into Cy's
> >> >>>> adjusted resend of the original. The To/Cc should
> >> >>>> be coomplete this time.]
> >> >>>>
> >> >>>> On Apr 12, 2023, at 22:52, Cy Schubert  =
> =3D
> >> >>>> wrote:
> >> >>>>
> >> >>>> In message , Mark =
> =3D
> >> >>>>> Millard=3D20
> >> >>>> write
> >> >>>>> s:
> >> >>>>> From: Charlie Li  wrote on
> >> >>>>>> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> >> >>>>>> =3D20
> >> >>>>>> Charlie Li wrote:
> >> >>>>>>> Mateusz Guzik wrote:
> >> >>>>>>>> can you please test poudriere with
> >> >>>>>>>>> https://github.com/openzfs/zfs/pull/14739/files
> >> >>>>>>>>> =3D20
> >> >>>>>>>>> After applying, on the md(4)-backed pool regardless of =3D3D
> >> >>>>>>>> block_cloning,=3D3D20
> >> >>>>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. =
> =3D
> >> >>>>>>>> Will=3D3D20=3D3D
> >> >>>> =3D20
> >> >>>>>> report back on poudriere results (no block_cloning).
> >> >>>>>>>> =3D3D20
> >> >>>>>>>> As for poudriere, build failures are still rolling in. These ar=
> e
> >> >>>>>>>> =3D
> >> >>>>>>> (and=3D3D20=3D3D
> >> >>>> =3D20
> >> >>>>>> have been) entirely random on every run. Some examples from this =
> =3D
> >> >>>>>>> run:
> >> >>>> =3D3D20
> >> >>>>>>> lang/php81:
> >> >>>>>>> - post-install: @${INSTALL_DATA}
> >> >>>>>>> ${WRKSRC}/php.ini-development=3D3D20
> >> >>>>>>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf =3D3D
> >> >>>>>>> ${STAGEDIR}/${PREFIX}/etc
> >> >>>>>> - consumers fail to build due to corrupted php.conf packaged
> >> >>>>>>> =3D3D20
> >> >>>>>>> devel/ninja:
> >> >>>>>>> - phase: stage
> >> >>>>>>> - install -s -m 555=3D3D20
> >> >>>>>>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja=3D3D20
> >> >>>>>>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> >> >>>>>>> - consumers fail to build due to corrupted bin/ninja packaged
> >> >>>>>>> =3D3D20
> >> >>>>>>> devel/netsurf-buildsystem:
> >> >>>>>>> - phase: stage
> >> >>>>>>> - mkdir -p=3D3D20
> >> >>>>>>> =3D3D
> >> >>>>>>> =3D
> >> >>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local=
> /share/n
> >> >>>> e=3D
> >> >> =3D3D
> >> >>>> tsurf-buildsystem/makefiles=3D3D20
> >> >>>>>> =3D3D
> >> >>>>>>> =3D
> >> >>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local=
> /share/n
> >> >>>> e=3D
> >> >> =3D3D
> >> >>>> tsurf-buildsystem/testtools
> >> >>>>>> for M in Makefile.top Makefile.tools Makefile.subdir =3D3D
> >> >>>>>>> Makefile.pkgconfig=3D3D20
> >> >>>>>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.op

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert

On Thu, 13 Apr 2023 19:54:42 +0900
Paweł Jakub Dawidek  wrote:

> On Apr 13, 2023, at 16:10, Cy Schubert  wrote:
> > 
> > In message <20230413070426.8a54f...@slippy.cwsent.com>, Cy Schubert writes:
> > In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert writes:
> >> In message , Mark Millard
> >>> write
> >>> s:
> >>> [This just puts my prior reply's material into Cy's
> >>>> adjusted resend of the original. The To/Cc should
> >>>> be complete this time.]
> >>>> 
> >>>> On Apr 12, 2023, at 22:52, Cy Schubert  =
> >>>> wrote:
> >>>> 
> >>>> In message , Mark =
> >>>>> Millard=20
> >>>> write
> >>>>> s:
> >>>>> From: Charlie Li  wrote on
> >>>>>> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> >>>>>> =20
> >>>>>> Charlie Li wrote:
> >>>>>>> Mateusz Guzik wrote:
> >>>>>>>> can you please test poudriere with
> >>>>>>>>> https://github.com/openzfs/zfs/pull/14739/files
> >>>>>>>>> =20
> >>>>>>>>> After applying, on the md(4)-backed pool regardless of =3D
> >>>>>>>> block_cloning,=3D20
> >>>>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. =
> >>>>>>>> Will=3D20=3D
> >>>> =20
> >>>>>> report back on poudriere results (no block_cloning).
> >>>>>>>> =3D20
> >>>>>>>> As for poudriere, build failures are still rolling in. These are =
> >>>>>>> (and=3D20=3D
> >>>> =20
> >>>>>> have been) entirely random on every run. Some examples from this =
> >>>>>>> run:
> >>>> =3D20
> >>>>>>> lang/php81:
> >>>>>>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development=3D20
> >>>>>>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf =3D
> >>>>>>> ${STAGEDIR}/${PREFIX}/etc
> >>>>>> - consumers fail to build due to corrupted php.conf packaged
> >>>>>>> =3D20
> >>>>>>> devel/ninja:
> >>>>>>> - phase: stage
> >>>>>>> - install -s -m 555=3D20
> >>>>>>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja=3D20
> >>>>>>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> >>>>>>> - consumers fail to build due to corrupted bin/ninja packaged
> >>>>>>> =3D20
> >>>>>>> devel/netsurf-buildsystem:
> >>>>>>> - phase: stage
> >>>>>>> - mkdir -p=3D20
> >>>>>>> =3D
> >>>>>>> =
> >>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/n
> >>>> e=
> >> =3D
> >>>> tsurf-buildsystem/makefiles=3D20
> >>>>>> =3D
> >>>>>>> =
> >>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/n
> >>>> e=
> >> =3D
> >>>> tsurf-buildsystem/testtools
> >>>>>> for M in Makefile.top Makefile.tools Makefile.subdir =3D
> >>>>>>> Makefile.pkgconfig=3D20
> >>>>>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> >>>>>>> cp makefiles/$M=3D20
> >>>>>>> =3D
> >>>>>>> =
> >>>>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/n
> >>>> e=
> >> =3D
> >>>> tsurf-buildsystem/makefiles/;=3D20
> >>>>>> \
> >>>>>>> done
> >>>>>>> - graphics/libnsgif fails to build due to NUL characters in=3D20
> >>>>>>> Makefile.{clang,subdir}, causing nothing to link
> >>>>>>> =20
> >>>>>> Summary: I have problems building ports into packages
> >>>>>> via poudriere-devel use despite being fully updated/patched
> >>>>>> (as of when I started the experiment), never having enabled
> >>>>>> block_cloning ( still using openzfs-2.1-freebsd ).
> >>>>>> =20
> >>>

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert

In message <20230413070426.8a54f...@slippy.cwsent.com>, Cy Schubert writes:
> In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert writes:
> > In message , Mark Millard writes:
> > > [This just puts my prior reply's material into Cy's
> > > adjusted resend of the original. The To/Cc should
> > > be complete this time.]
> > >
> > > On Apr 12, 2023, at 22:52, Cy Schubert  wrote:
> > >
> > > > In message , Mark Millard writes:
> > > >> From: Charlie Li  wrote on
> > > >> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> > > >>
> > > >>> Charlie Li wrote:
> > > >>>> Mateusz Guzik wrote:
> > > >>>>> can you please test poudriere with
> > > >>>>> https://github.com/openzfs/zfs/pull/14739/files
> > > >>>>>
> > > >>>> After applying, on the md(4)-backed pool regardless of block_cloning,
> > > >>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
> > > >>>> report back on poudriere results (no block_cloning).
> > > >>>>
> > > >>> As for poudriere, build failures are still rolling in. These are (and
> > > >>> have been) entirely random on every run. Some examples from this run:
> > > >>>
> > > >>> lang/php81:
> > > >>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
> > > >>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> > > >>> - consumers fail to build due to corrupted php.conf packaged
> > > >>>
> > > >>> devel/ninja:
> > > >>> - phase: stage
> > > >>> - install -s -m 555
> > > >>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
> > > >>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> > > >>> - consumers fail to build due to corrupted bin/ninja packaged
> > > >>>
> > > >>> devel/netsurf-buildsystem:
> > > >>> - phase: stage
> > > >>> - mkdir -p
> > > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
> > > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> > > >>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
> > > >>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> > > >>> cp makefiles/$M
> > > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
> > > >>> \
> > > >>> done
> > > >>> - graphics/libnsgif fails to build due to NUL characters in
> > > >>> Makefile.{clang,subdir}, causing nothing to link
> > > >>
> > > >> Summary: I have problems building ports into packages
> > > >> via poudriere-devel use despite being fully updated/patched
> > > >> (as of when I started the experiment), never having enabled
> > > >> block_cloning ( still using openzfs-2.1-freebsd ).
> > > >>
> > > >> In other words, I can confirm other reports that have
> > > >> been made.
> > > >>
> > > >> The details follow.
> > > >>
> > > >>
> > > >> [Written as I was working on setting up for the experiments
> > > >> and then executing those experiments, adjusting as I went
> > > >> along.]
> > > >>
> > > >> I've run my own tests in a context that has never had the
> > > >> zpool upgrade and that jump from before the openzfs import to
> > > >> after the existing commi

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert

In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert writes:
> In message , Mark Millard writes:
> > [This just puts my prior reply's material into Cy's
> > adjusted resend of the original. The To/Cc should
> > be complete this time.]
> >
> > On Apr 12, 2023, at 22:52, Cy Schubert  wrote:
> >
> > > In message , Mark Millard writes:
> > >> From: Charlie Li  wrote on
> > >> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> > >>
> > >>> Charlie Li wrote:
> > >>>> Mateusz Guzik wrote:
> > >>>>> can you please test poudriere with
> > >>>>> https://github.com/openzfs/zfs/pull/14739/files
> > >>>>>
> > >>>> After applying, on the md(4)-backed pool regardless of block_cloning,
> > >>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
> > >>>> report back on poudriere results (no block_cloning).
> > >>>>
> > >>> As for poudriere, build failures are still rolling in. These are (and
> > >>> have been) entirely random on every run. Some examples from this run:
> > >>>
> > >>> lang/php81:
> > >>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
> > >>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> > >>> - consumers fail to build due to corrupted php.conf packaged
> > >>>
> > >>> devel/ninja:
> > >>> - phase: stage
> > >>> - install -s -m 555
> > >>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
> > >>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> > >>> - consumers fail to build due to corrupted bin/ninja packaged
> > >>>
> > >>> devel/netsurf-buildsystem:
> > >>> - phase: stage
> > >>> - mkdir -p
> > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
> > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> > >>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
> > >>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> > >>> cp makefiles/$M
> > >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
> > >>> \
> > >>> done
> > >>> - graphics/libnsgif fails to build due to NUL characters in
> > >>> Makefile.{clang,subdir}, causing nothing to link
> > >>
> > >> Summary: I have problems building ports into packages
> > >> via poudriere-devel use despite being fully updated/patched
> > >> (as of when I started the experiment), never having enabled
> > >> block_cloning ( still using openzfs-2.1-freebsd ).
> > >>
> > >> In other words, I can confirm other reports that have
> > >> been made.
> > >>
> > >> The details follow.
> > >>
> > >>
> > >> [Written as I was working on setting up for the experiments
> > >> and then executing those experiments, adjusting as I went
> > >> along.]
> > >>
> > >> I've run my own tests in a context that has never had the
> > >> zpool upgrade and that jump from before the openzfs import to
> > >> after the existing commits for trying to fix openzfs on
> > >> FreeBSD. I report on the sequence of activities getting to
> > >> the point of testing as well.
> > >>
> > >> By personal policy I keep my (non-temporary) pools compatible
> > >> with what the most recent ??.?-RELEASE supports, using
> > >> openzfs-2.1-freebsd for now. The pools involved below have
> > >> never had a zpool upgrade from where they started. (I've no
> > >> pools that have ever had a zpool upgrade.)

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert

In message , Mark Millard writes:
> [This just puts my prior reply's material into Cy's
> adjusted resend of the original. The To/Cc should
> be complete this time.]
>
> On Apr 12, 2023, at 22:52, Cy Schubert  wrote:
>
> > In message , Mark Millard writes:
> >> From: Charlie Li  wrote on
> >> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> >>
> >>> Charlie Li wrote:
> >>>> Mateusz Guzik wrote:
> >>>>> can you please test poudriere with
> >>>>> https://github.com/openzfs/zfs/pull/14739/files
> >>>>>
> >>>> After applying, on the md(4)-backed pool regardless of block_cloning,
> >>>> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
> >>>> report back on poudriere results (no block_cloning).
> >>>>
> >>> As for poudriere, build failures are still rolling in. These are (and
> >>> have been) entirely random on every run. Some examples from this run:
> >>>
> >>> lang/php81:
> >>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
> >>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> >>> - consumers fail to build due to corrupted php.conf packaged
> >>>
> >>> devel/ninja:
> >>> - phase: stage
> >>> - install -s -m 555
> >>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
> >>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> >>> - consumers fail to build due to corrupted bin/ninja packaged
> >>>
> >>> devel/netsurf-buildsystem:
> >>> - phase: stage
> >>> - mkdir -p
> >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
> >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> >>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
> >>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> >>> cp makefiles/$M
> >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
> >>> \
> >>> done
> >>> - graphics/libnsgif fails to build due to NUL characters in
> >>> Makefile.{clang,subdir}, causing nothing to link
> >>
> >> Summary: I have problems building ports into packages
> >> via poudriere-devel use despite being fully updated/patched
> >> (as of when I started the experiment), never having enabled
> >> block_cloning ( still using openzfs-2.1-freebsd ).
> >>
> >> In other words, I can confirm other reports that have
> >> been made.
> >>
> >> The details follow.
> >>
> >>
> >> [Written as I was working on setting up for the experiments
> >> and then executing those experiments, adjusting as I went
> >> along.]
> >>
> >> I've run my own tests in a context that has never had the
> >> zpool upgrade and that jump from before the openzfs import to
> >> after the existing commits for trying to fix openzfs on
> >> FreeBSD. I report on the sequence of activities getting to
> >> the point of testing as well.
> >>
> >> By personal policy I keep my (non-temporary) pools compatible
> >> with what the most recent ??.?-RELEASE supports, using
> >> openzfs-2.1-freebsd for now. The pools involved below have
> >> never had a zpool upgrade from where they started. (I've no
> >> pools that have ever had a zpool upgrade.)
> >>
> >> (Temporary pools are rare for me, such as this investigation.
> >> But I'm not testing block_cloning or anything new this time.)
> >>
> >> I'll note that I use zfs for bectl, not for redundancy. So
> >> my evidence is more limited in that respect.
> >>
> >> The activities were done on a HoneyComb (16 Cortex-A72 cores).
> >> The system has and supports ECC RAM, 64 GiBytes of RAM are
> >> present.
> >>
> >> I started by duplicating my normal zfs environment to an

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-12 Thread Cy Schubert

> 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
> Author: John Baldwin 
> Commit: John Baldwin 
> CommitDate: 2023-03-25 00:06:40 +
> branch: main
> merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
> merge-base: CommitDate: 2023-03-25 00:06:40 +
> n613214 (--first-parent --count for merge-base)
>
> poudriere attempted to build 476 packages, starting
> with pkg (in order to build the 56 that I explicitly
> indicate that I want). It is my normal set of ports.
> The form of building is biased to allowing a high
> load average compared to the number of hardware
> threads (same as cores here): each builder is allowed
> to use the full count of hardware threads. The build
> used USE_TMPFS="data" instead of the USE_TMPFS=all I
> normally use on the build machine involved.
>
> And it produced some random errors during the attempted
> builds. A type of example that is easy to interpret
> without further exploration is:
>
> pkg_resources.extern.packaging.requirements.InvalidRequirement: Parse error at "'\x00\x00\x00\x00\x00\x00\x00\x00'": Expected W:(0-9A-Za-z)
>
> A fair number of errors are of the form: the build
> installing a previously built package for use in the
> builder but later the builder can not find some file
> from the package's installation.
>
> Another error reported was:
>
> ld: error: /usr/local/lib/libblkid.a: unknown file type
>
> For reference:
>
> [main-CA72-bulk_a-default] [2023-04-12_20h45m32s] [committing:] Queued: 476 Built: 252 Failed: 11  Skipped: 213 Ignored: 0   Fetched: 0   Tobuild: 0   Time: 00:37:52
>
> I started another build that tried to build 224 packages:
> the 11 failed and 213 skipped.
>
> Just 1 package built that failed before:
>
> [00:04:58] [09] [00:04:15] Finished databases/sqlite3@default | sqlite3-3.41.0_1,1: Success
>
> It seems to be the only one where the original failure was not
> an example of complaining about the missing/corrupted content
> of a package install used for building. So it is an example
> of randomly varying behavior.
>
> That, in turn, allowed:
>
> [00:04:58] [01] [00:00:00] Building security/nss | nss-3.89
>
> to build but everything else failed or was skipped.
>
> The sqlite3 vs. other failure difference suggests that writes
> have random problems but later reads reliably see the problem
> that resulted (before the content is deleted).
>
>
> After the above:
>
> # zpool status
>   pool: zroot
>  state: ONLINE
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zroot       ONLINE       0     0     0
>           da0p8     ONLINE       0     0     0
>
> errors: No known data errors
>
>> # zpool status
>   pool: zroot
>  state: ONLINE
>   scan: scrub repaired 0B in 00:16:25 with 0 errors on Wed Apr 12 22:15:39 2023
> config:
>
>         NAME        STATE     READ WRITE CKSUM
>         zroot       ONLINE       0     0     0
>           da0p8     ONLINE       0     0     0
>
> errors: No known data errors
>
>
> ===
> Mark Millard
> marklmi at yahoo.com


Let's try this again. Claws-mail didn't include the list address in the 
header. Trying to reply, again, using exmh instead.


Did your pools suffer the EXDEV problem? The EXDEV also corrupted files.

I think that, without sufficient investigation, we risk jumping to
conclusions. I've taken an extremely cautious approach, rolling back
snapshots (as much as possible, i.e. poudriere datasets) when EXDEV
corruption was encountered.

I did not roll back any snapshots in my MH mail directory. Rolling back
snapshots of my MH maildir would result in loss of email. I have to
live with that corruption. Corrupted files in my outgoing sent email
directory remain:

slippy$ ugrep -cPa '\x00' ~/.Mail/note | grep -c :1 
53
slippy$ 

There are 53 corrupted files in my note log of 9913 emails. Those files
will never be fixed. They were corrupted by the EXDEV bug. No new ZFS
version or ZFS patch can retroactively remove the corruption from those
files.
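
For anyone wanting to repeat that kind of check without ugrep, here is a
minimal sketch in Python. It assumes that "corrupted" means "contains at
least one NUL byte", which is reasonable for text mail but is an assumption,
and the function name files_with_nul is made up for illustration. It only
looks at regular files directly inside the given directory.

```python
from pathlib import Path


def files_with_nul(directory):
    """Return the regular files directly under `directory` that contain a NUL byte.

    Hypothetical helper: "contains NUL" is used as a stand-in for "corrupted".
    """
    hits = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            # Read in chunks so large files are never loaded whole.
            while chunk := handle.read(1 << 16):
                if b"\x00" in chunk:
                    hits.append(path)
                    break
    return hits
```

Calling len(files_with_nul(some_directory)) gives a count comparable to the
one above, and the returned list names the affected files.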

But my poudriere files, because the snapshots were rolled back, were
"repaired" by the rolled back snapshots.

I'm not convinced that there is presently active corruption since
the problem has been fixed. I am convinced that whatever corruption
was written at the time will remain forever or until those files
are deleted or replaced -- just like my email files written to disk at
the time.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: CURRENT: Panic VERIFY(!zil_replaying(zilog, tx)) failed (and crashing)

2023-04-11 Thread Cy Schubert

In message <20230411142831.db824...@slippy.cwsent.com>, Cy Schubert writes:
> In message <434b83db-f6bb-436f-8aa5-385730d20...@dawidek.net>, Paweł Jakub Dawidek writes:
> >
> > > On Apr 11, 2023, at 11:31, Cy Schubert  wrote:
> > >
> > > In message <20230409161436.5412fa6e@thor.intern.walstatt.dynvpn.de>,
> > > FreeBSD User writes:
> > >> On Sun, 9 Apr 2023 14:37:03 +0200
> > >> Mateusz Guzik  wrote:
> > >>
> > >>>> On 4/9/23, FreeBSD User  wrote:
> > >>>>> Today, after upgrading to FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b:
> > >>>>> Sun Apr  9
> > >>>>> 12:01:02 CEST 2023  amd64, AND upgrading ZPOOLs via
> > >>>>>
> > >>>>> zpool upgrade POOLNAME
> > >>>>>
> > >>>>> some boxes keep crashing when starting compiler runs (the trigger is
> > >>>>> different on boxes).
> > >>>>>
> > >>>>> ZFS module is statically compiled into the kernel (if this is of
> > >>>>> importance)
> > >>>>>
> > >>>>> Last known good was:
> > >>>>>
> > >>>>> [...]
> > >>>>> Apr  9 07:10:04 <0.2> thor kernel: FreeBSD 14.0-CURRENT #7
> > >>>>> main-n262051-75379ea2e461: Sun Apr
> > >>>>> 9 00:12:57 CEST 2023 Apr  9 07:10:04 <0.2> thor kernel:
> > >>>>> root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR amd64 Apr  9 07:10:04 <0.2>
> > >>>>> thor kernel:
> > >>>>> FreeBSD clang version 15.0.7 (https://github.com/llvm/llvm-project.git
> > >>>>> llvmorg-15.0.7-0-g8dfdcc7b7bf6) Apr  9 07:10:04 <0.2> thor kernel:
> > >>>>> VT(efifb): resolution
> > >>>>> 2560x1440 Apr  9 07:10:04 <0.2> thor kernel: module zfsctrl already
> > >>>>> present!
> > >>>>> [...]
> > >>>>>
> > >>>>> The file /var/crash/info.X
> > >>>>>
> > >>>>> contains:
> > >>>>>
> > >>>>> [...]
> > >>>>>
> > >>>>> root@thor:/var/crash # more info.2
> > >>>>> Dump header from device: /dev/gpt/swap
> > >>>>>  Architecture: amd64
> > >>>>>  Architecture Version: 2
> > >>>>>  Dump Length: 1095192576
> > >>>>>  Blocksize: 512
> > >>>>>  Compression: none
> > >>>>>  Dumptime: 2023-04-09 11:43:41 +
> > >>>>>  Hostname: thor.local
> > >>>>>  Magic: FreeBSD Kernel Dump
> > >>>>>  Version String: FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b: Sun Apr  9 12:01:02 CEST 2023
> > >>>>>    root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR
> > >>>>>  Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
> > >>>>>
> > >>>>>  Dump Parity: 2961465682
> > >>>>>  Bounds: 2
> > >>>>>  Dump Status: good
> > >>>>>
> > >>>>> Until reconfigured for more debug stuff I do not have more to present.
> > >>>>>
> > >>>>> I remember, now really scared, that there was a HEADSUP in the list regarding
> > >>>>> some serious ZFS
> > >>>>> problems - I didn't find it right now.
> > >>>>>
> > >>>>> Thanks in advance,
> > >>>>>
> > >>>
> > >>> That's fallout from the new block cloning feature, adding the author
> > >>>
> > >>
> > >> Thanks.
> > >>
> > >> As of this moment, all systems with the newest kernel and the new ZFS option
> > >> enabled crash -
> > >> the reason is mostly in different ZFS datasets. I guess there is no way back
> > >> once this faulty
> > >> option is enabled?
> > >
> > > I've run a test on a scratch pool here, first without block_cloning
> > > enabled, then with. There was no corruption when block_cloning was
> > > disabled. There was corruption when block_cloning was enabled.
> > >
> > > I don't know of any way to revert, nor is there any way to fix or
> > > recover the corrupted blocks.
> >
> > Is the corruption still present after EXDEV fixes?
>
> Yes and no.
>
> Yes, there is corruption when block_cloning is enabled.
>
> There is no corruption when block_cloning is disabled.

I should add some detail to this.

The corruption experienced when block cloning is disabled was fixed by:

- eb1feadc201a
- e2d997d1cbb9
- d012836fb616 (specifically this commit)
- 20be1b4fc4b7

When block_cloning is enabled, the pool is corrupted. This has not been 
fixed.
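
Since the outcome depends on whether block_cloning is enabled on a given
pool, it can help to check the feature state before testing. The sketch
below is only an illustration: it shells out to `zpool get` for the
feature@block_cloning property, and the helper names are made up here. The
states it accepts (disabled, enabled, active) are the ones ZFS reports for
pool features. A feature cannot be switched back to disabled once enabled,
which is why there is no way to revert.

```python
import subprocess

# States that ZFS reports for a pool feature flag.
FEATURE_STATES = {"disabled", "enabled", "active"}


def parse_feature_state(raw):
    """Validate the value column printed by `zpool get -H -o value feature@...`."""
    state = raw.strip()
    if state not in FEATURE_STATES:
        raise ValueError(f"unexpected feature state: {state!r}")
    return state


def block_cloning_state(pool):
    """Ask zpool(8) for the state of feature@block_cloning on `pool`.

    Hypothetical helper; requires a zpool binary that knows the feature.
    """
    result = subprocess.run(
        ["zpool", "get", "-H", "-o", "value", "feature@block_cloning", pool],
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_feature_state(result.stdout)
```

For example, block_cloning_state("zroot") would return one of the three
states on a system whose zpool supports the feature, and raise otherwise.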


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: CURRENT: Panic VERIFY(!zil_replaying(zilog, tx)) failed (and crashing)

2023-04-11 Thread Cy Schubert

In message <434b83db-f6bb-436f-8aa5-385730d20...@dawidek.net>, Paweł Jakub Dawidek writes:
>
> > On Apr 11, 2023, at 11:31, Cy Schubert  wrote:
> >
> > In message <20230409161436.5412fa6e@thor.intern.walstatt.dynvpn.de>,
> > FreeBSD User writes:
> >> On Sun, 9 Apr 2023 14:37:03 +0200
> >> Mateusz Guzik  wrote:
> >>
> >>>> On 4/9/23, FreeBSD User  wrote:
> >>>>> Today, after upgrading to FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b:
> >>>>> Sun Apr  9
> >>>>> 12:01:02 CEST 2023  amd64, AND upgrading ZPOOLs via
> >>>>>
> >>>>> zpool upgrade POOLNAME
> >>>>>
> >>>>> some boxes keep crashing when starting compiler runs (the trigger is
> >>>>> different on boxes).
> >>>>>
> >>>>> ZFS module is statically compiled into the kernel (if this is of
> >>>>> importance)
> >>>>>
> >>>>> Last known good was:
> >>>>>
> >>>>> [...]
> >>>>> Apr  9 07:10:04 <0.2> thor kernel: FreeBSD 14.0-CURRENT #7
> >>>>> main-n262051-75379ea2e461: Sun Apr
> >>>>> 9 00:12:57 CEST 2023 Apr  9 07:10:04 <0.2> thor kernel:
> >>>>> root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR amd64 Apr  9 07:10:04 <0.2>
> >>>>> thor kernel:
> >>>>> FreeBSD clang version 15.0.7 (https://github.com/llvm/llvm-project.git
> >>>>> llvmorg-15.0.7-0-g8dfdcc7b7bf6) Apr  9 07:10:04 <0.2> thor kernel:
> >>>>> VT(efifb): resolution
> >>>>> 2560x1440 Apr  9 07:10:04 <0.2> thor kernel: module zfsctrl already
> >>>>> present!
> >>>>> [...]
> >>>>>
> >>>>> The file /var/crash/info.X
> >>>>>
> >>>>> contains:
> >>>>>
> >>>>> [...]
> >>>>>
> >>>>> root@thor:/var/crash # more info.2
> >>>>> Dump header from device: /dev/gpt/swap
> >>>>>  Architecture: amd64
> >>>>>  Architecture Version: 2
> >>>>>  Dump Length: 1095192576
> >>>>>  Blocksize: 512
> >>>>>  Compression: none
> >>>>>  Dumptime: 2023-04-09 11:43:41 +
> >>>>>  Hostname: thor.local
> >>>>>  Magic: FreeBSD Kernel Dump
> >>>>>  Version String: FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b: Sun Apr  9 12:01:02 CEST 2023
> >>>>>    root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR
> >>>>>  Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
> >>>>>
> >>>>>  Dump Parity: 2961465682
> >>>>>  Bounds: 2
> >>>>>  Dump Status: good
> >>>>>
> >>>>> Until reconfigured for more debug stuff I do not have more to present.
> >>>>>
> >>>>> I remember, now really scared, that there was a HEADSUP in the list regarding
> >>>>> some serious ZFS
> >>>>> problems - I didn't find it right now.
> >>>>>
> >>>>> Thanks in advance,
> >>>>>
> >>>
> >>> That's fallout from the new block cloning feature, adding the author
> >>>
> >>
> >> Thanks.
> >>
> >> As of this moment, all systems with the newest kernel and the new ZFS option
> >> enabled crash -
> >> the reason is mostly in different ZFS datasets. I guess there is no way back
> >> once this faulty
> >> option is enabled?
> >
> > I've run a test on a scratch pool here, first without block_cloning
> > enabled, then with. There was no corruption when block_cloning was
> > disabled. There was corruption when block_cloning was enabled.
> >
> > I don't know of any way to revert, nor is there any way to fix or
> > recover the corrupted blocks.
>
> Is the corruption still present after EXDEV fixes?

Yes and no.

Yes, there is corruption when block_cloning is enabled.

There is no corruption when block_cloning is disabled.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: CURRENT: Panic VERIFY(!zil_replaying(zilog, tx)) failed (and crashing)

2023-04-10 Thread Cy Schubert

In message <20230409161436.5412f...@thor.intern.walstatt.dynvpn.de>, FreeBSD User writes:
> On Sun, 9 Apr 2023 14:37:03 +0200
> Mateusz Guzik  wrote:
>
> > On 4/9/23, FreeBSD User  wrote:
> > > Today, after upgrading to FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b:
> > > Sun Apr  9
> > > 12:01:02 CEST 2023  amd64, AND upgrading ZPOOLs via
> > >
> > > zpool upgrade POOLNAME
> > >
> > > some boxes keep crashing when starting compiler runs (the trigger is
> > > different on boxes).
> > >
> > > ZFS module is statically compiled into the kernel (if this is of
> > > importance)
> > >
> > > Last known good was:
> > >
> > > [...]
> > > Apr  9 07:10:04 <0.2> thor kernel: FreeBSD 14.0-CURRENT #7
> > > main-n262051-75379ea2e461: Sun Apr
> > > 9 00:12:57 CEST 2023 Apr  9 07:10:04 <0.2> thor kernel:
> > > root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR amd64 Apr  9 07:10:04 <0.2>
> > > thor kernel:
> > > FreeBSD clang version 15.0.7 (https://github.com/llvm/llvm-project.git
> > > llvmorg-15.0.7-0-g8dfdcc7b7bf6) Apr  9 07:10:04 <0.2> thor kernel:
> > > VT(efifb): resolution
> > > 2560x1440 Apr  9 07:10:04 <0.2> thor kernel: module zfsctrl already
> > > present!
> > > [...]
> > >
> > > The file /var/crash/info.X
> > >
> > > contains:
> > >
> > > [...]
> > >
> > > root@thor:/var/crash # more info.2
> > > Dump header from device: /dev/gpt/swap
> > >   Architecture: amd64
> > >   Architecture Version: 2
> > >   Dump Length: 1095192576
> > >   Blocksize: 512
> > >   Compression: none
> > >   Dumptime: 2023-04-09 11:43:41 +
> > >   Hostname: thor.local
> > >   Magic: FreeBSD Kernel Dump
> > >   Version String: FreeBSD 14.0-CURRENT #8 main-n262052-0d4038e3012b: Sun Apr  9 12:01:02 CEST 2023
> > >     root@thor:/usr/obj/usr/src/amd64.amd64/sys/THOR
> > >   Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
> > >
> > >   Dump Parity: 2961465682
> > >   Bounds: 2
> > >   Dump Status: good
> > >
> > > Until reconfigured for more debug stuff I do not have more to present.
> > >
> > > I remember, now really scared, that there was a HEADSUP in the list regarding
> > > some serious ZFS problems - I didn't find it right now.
> > >
> > > Thanks in advance,
> > >  
> > 
> > That's fallout from the new block cloning feature, adding the author
> > 
>
> Thanks.
>
> As of this moment, all systems with the newest kernel and the new ZFS option
> enabled crash -
> the reason is mostly in different ZFS datasets. I guess there is no way back
> once this faulty option is enabled?

I've run a test on a scratch pool here, first without block_cloning 
enabled, then with. There was no corruption when block_cloning was 
disabled. There was corruption when block_cloning was enabled.

I don't know of any way to revert, nor is there any way to fix or
recover the corrupted blocks.
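
This message does not spell out the test procedure. A minimal sketch of the
kind of check described (copy a tree onto the scratch pool, then compare
every file byte for byte against the original) could look like the
following Python. The function name and the assumption that the copy was
made beforehand, e.g. with `cp -R`, are illustrative and not taken from the
actual test.

```python
import filecmp
from pathlib import Path


def differing_files(source, copy):
    """List relative paths of files under `source` that differ or are missing in `copy`.

    Hypothetical helper for a copy-then-compare corruption check.
    """
    source, copy = Path(source), Path(copy)
    bad = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = copy / rel
        # shallow=False forces a byte-by-byte comparison instead of trusting stat().
        if not dst.is_file() or not filecmp.cmp(src, dst, shallow=False):
            bad.append(rel)
    return bad
```

An empty list means every file in the source tree has an identical
counterpart in the copy; any entries are candidates for corruption.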


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: trpt(8) to be decomissioned

2022-11-04 Thread Cy Schubert

In message , Gleb Smirnoff writes:
>   Max,
>
> the reason I want to retire it is not that it consumes 40 Kb
> in the repository.  The reason is that knows kernel structures,
> and fails to compile after changes to them.  So the tool that
> nobody uses requires special care when working on TCP.  The
> kernel headers disclose the structures for trpt (with some
> protection with _WANT_TCPCB, though) and some software from
> ports (not calling names!) would start using them too. Now a
> kernel developer needs to care not only about trpt, but
> about this software, too.

I recall that when Bryan Cantrill came to one of the local hotels here to
announce Solaris 9, he said that Solaris truss was now an app that called
DTrace functions. If people feel the need for a trpt-like utility, would it
be an idea to write it using DTrace calls? Could it be a GSoC project? It
would be kind of neat for a co-op student or someone to get their feet wet
with systems programming.

I typically use DTrace when snooping around looking for that proverbial 
needle in a haystack. And TCPDEBUG seems to be one of those things that 
DTrace was designed to replace.

It would be a good project for an up-and-coming developer who is still in
school to work on.

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: morse(6) sound

2022-10-30 Thread Cy Schubert

In message 
, Nuno Teixeira writes:
> Hello all,
>
> Is there any way to get sound from morse(6) without speaker(4) device?

My question is, why is this still in base? Shouldn't it be a port? I don't 
think this software is of interest to the majority of FreeBSD users out 
there and would be a perfect candidate for migration to ports.

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Meta Mode (was: Re: BOOT CRASH -- Current -CURRENT)

2022-10-02 Thread Cy Schubert

In message 
, Warner Losh writes:
>
> On Sat, Oct 1, 2022 at 9:06 PM Larry Rosenman  wrote:
>
> > On 10/01/2022 10:04 pm, Warner Losh wrote:
> >
> > Do you have a /boot tarball that can be loaded in a VM that recreates the
> > problem (along with a clean hash)?
> >
> > But before you try that, have you tried a completely clean rebuild of the
> > kernel to preclude the possibility that something is somehow cross threaded?
> >
> > Warner
> >
> > On Sat, Oct 1, 2022 at 8:39 PM Larry Rosenman  wrote:
> >
> >
> > ❯ more info.11
> > Dump header from device: /dev/mfid0p3
> >Architecture: amd64
> >Architecture Version: 2
> >Dump Length: 126748815
> >Blocksize: 512
> >Compression: zstd
> >Dumptime: 2022-10-01 21:26:40 -0500
> >Hostname:
> >Magic: FreeBSD Kernel Dump
> >Version String: FreeBSD 14.0-CURRENT #168
> > ler/freebsd-main-changes-n258354-6cdd871ebc4: Sat Oct  1 21:13:01 CDT
> > 2022
> >  r...@borg.lerctr.org:/usr/obj/usr/src/amd64.amd64/sys/LER-MINIMAL
> >Panic String: page fault
> >Dump Parity: 501115454
> >Bounds: 11
> >Dump Status: good
> >
> > I do have source and debug stuff, BUT kgdb croaks on me.
> >
> > I *CAN* give access to the machine.
> >
> > the console backtrace showed something about the kld load of
> > dependencies.
> >
> >
> >
> > --
> > Larry Rosenman http://people.freebsd.org/~ler
> > Phone: +1 214-642-9640 E-Mail: l...@freebsd.org
> > US Mail: 5708 Sabbia Dr, Round Rock, TX 78665-2106
> >
> > let me wipe /usr/obj, and rebuild everything (I *DO* use meta-mode).
> >
>
> I've had fewer problems with it than non-meta mode, but this looks like a
> 'corruption' or 'cross threaded' crash I've chased in the past that went
> away with a rebuild. So it's better to be sure...

I think so too. What may appear to be a gratuitous rebuild of llvm, for 
example, is in fact meta mode rebuilding because of some makefile change. 
Without meta mode I've experienced odd weirdnesses that are fixed through a 
subsequent clean build.

I just started using meta mode again this week after a few years' hiatus to 
see if it addresses the occasional weird behaviour due to something not 
being rebuilt when it should have been.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: Header symbols that shouldn't be visible to ports?

2022-09-07 Thread Cy Schubert

In message 
, Alan Somers writes:
> On Sat, Sep 3, 2022 at 11:10 PM Konstantin Belousov  wrote:
> >
> > On Sat, Sep 03, 2022 at 10:19:12AM -0600, Alan Somers wrote:
> > > Our /usr/include headers define a lot of symbols that are used by
> > > critical utilities in the base system like ps and ifconfig, but aren't
> > > stable across major releases.  Since they aren't stable, utilities
> > > built for older releases won't run correctly on newer ones.  Would it
> > > make sense to guard these symbols so they can't be used by programs in
> > > the ports tree?  There is some precedent for that, for example
> > > _WANT_SOCKET and _WANT_MNTOPTNAMES.
> > _WANT_SOCKET is clearly about exposing parts of the kernel definitions
> > for userspace code that wants to dig into kernel structures.  Similarly
> > for _WANT_MNTOPTNAMES, but in fact this thing is quite stable.  The
> > definitions are guarded by additional defines not due to their instability,
> > but because using them in userspace requires (much) more preparation from
> > userspace environment, which is either not trivial (_WANT_SOCKET) or
> > contradicts the standardized use of the header (_WANT_MNTOPTNAMES +
> > sys/mount.h).
> >
> > >
> > > In particular, I'm thinking about symbols like the following:
> > > MINCORE_SUPER
> > Why this symbol should be hidden?  It is implementation-defined and
> > intended to be exposed to userspace.  All MINCORE_* not only MINCORE_SUPER
> > are under BSD_VISIBLE braces, because POSIX does not define the symbols.
>
> Because it isn't stable.  It changed for example in rev 847ab36bf22
> for 13.0.  Programs using the older value (including virtually every
> Rust program) won't work on 13.0 and later.
>
> >
> > > TDF_*
> > These symbols come from the non-standard header sys/proc.h.  If userspace
> > includes the header, it is already outside any formal standard, and I
> > do not see a reason to make the implementation more convoluted there.
> >
> > > PRI_MAX*
> > > PRI_MIN*
> > > PI_*, PRIBIO, PVFS, etc
> > > IFCAP_*
> > These are all implementation-specific and come from non-standard headers,
> > unless I am mistaken, then please correct me.
> >
> > > RLIM_NLIMITS
> > > IFF_*
> > Same.
> >
> > > *_MAXID
> > This is too broad.
>
> I'm talking about symbols like IPV6CTL_MAXID, which record the size of
> sysctl lists.  Obviously, these symbols can't be stable, and probably
> aren't useful outside of the base system.
>
> >
> > >
> > > Clearly delineating private symbols like this would ease the
> > > maintenance burden on languages that rely on FFI, like Ruby and Rust.
> > > FFI basically assumes that symbols once defined will never change.
> >
> > Why e.g. sys/proc.h is ever consumed by FFI wrappers?
>
> I should add a little detail.  Rust uses FFI to access C functions,
> and #define'd constants are redefined in the Rust bindings.  For most
> Rust programs, the build process doesn't check the contents of
> /usr/include in any way.  Instead, all of that stuff is hard-coded in
> the Rust bindings.  That makes cross-compiling a breeze!  But it does
> cause problems when the C library changes.  Adding a new symbol, like
> copy_file_range, isn't so bad.  If your Rust program doesn't use it,
> then the Rust binding will become an unused symbol and get eliminated
> by the linker.  If your Rust program does use it OTOH, then it will be
> resolved by the dynamic linker at runtime - if you're running on
> FreeBSD 13 or newer.  Otherwise, your program will fail to run.  A
> bigger problem is with symbols that change.  For example, the 64-bit
> inode stuff.  Rust programs still use a FreeBSD 11 ABI (we're working
> on that).  But other symbols change more frequently.  Things like
> PRI_MAX_REALTIME can change between any two releases.  That creates a
> big maintenance burden to keep track of them in the FFI bindings.  And
> they also aren't very useful in cross-compiled programs targeting a
> FreeBSD 11 ABI.  Instead, they really need to have bindings
> automatically generated at build time.  That's possible, but it's not
> the default.

This is exactly what happened with DMD D. When 64-bit statfs was introduced 
all DMD D compiled programs failed to run and recompiling didn't help. The 
DMD upstream failed to understand the problem. Eventually the port had to 
be removed.

>
> So what the Rust community really needs is a way to know which symbols
> will be stable across releases, and which might vary.  Are you
> suggesting that anything from a non-POSIX header file should be
> considered variable?
>

Rust and every other community.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: security/clamav: /var/run on TMPFS renders the port broken by design

2022-08-29 Thread Cy Schubert

In message <20220829082514.63926...@thor.intern.walstatt.dynvpn.de>,
FreeBSD User writes:
> Am Sun, 28 Aug 2022 06:11:20 -0700
> Cy Schubert  schrieb:
>
> > In message <20220828130107.1a76d54a.gre...@freebsd.org>, Michael Gmelin 
> > writes:
> > > 
> > >
> > >
> > > On Sun, 28 Aug 2022 03:21:24 -0700
> > > Cy Schubert  wrote:
> > >  
> > > > > In message <16b4-76a1-4e46-b7c3-60492d379...@freebsd.org>,
> > > > > Michael Gmelin writes:
> > > > > >
> > > > > > > On 28. Aug 2022, at 10:42, free...@oldach.net wrote:
> > > > > > >
> > > > > > > Cy Schubert wrote on Sat, 27 Aug 2022 17:26:38 +0200
> > > > > > > (CEST):
> > > > > > >> As stated before in this thread, replacing /var/run with tmpfs
> > > > > > >> is not a supported configuration.
> > > > > > >
> > > > > > > Not supported? What is the purpose of /etc/rc.d/var then? That
> > > > > > > creates a tmpfs backed /var, populates it through mtree, and
> > > > > > > makes a proper /var/run available.
> > > > > > >
> > > > > > > However it doesn't (yet) create /var/run/clamav of course.
> > > > > > >
> > > > > > > It would be fairly easy to extend /etc/rc.d/var by a logic that
> > > > > > > walks through /usr/local/etc/mtree/* and runs mtree on each of
> > > > > > > the files found as needed. All that the security/clamav port
> > > > > > > would need to do then is to drop an appropriate small mtree
> > > > > > > file as /usr/local/etc/mtree/clamav. From a port's perspective
> > > > > > > that is the same logic as dropping service scripts as
> > > > > > > /usr/local/etc/rc.d/clamav-*.
> > > > > >
> > > > > > From a user's perspective, it would be preferable to have this
> > > > > > happen at service start though, as (unlike in the setup
> > > > > > described) reboots don't happen that frequently, but files in
> > > > > > /var/run might get deleted manually. Maybe some rc framework
> > > > > > based solution would make sense, e.g., a variable `mtree_files`,
> > > > > > which, if set, is applied in the default start_precmd. Besides
> > > > > > being more resilient, this would also have the advantage that all
> > > > > > required file systems should be available at that point and the
> > > > > > separation between system and ports would be more clear. Another
> > > > > > advantage would be that directories are only created for services
> > > > > > that are actually enabled/started.
> > > > 
> > > > Unfortunately this requires all ports to include an mtree file.
> > > > Relying on port maintainers who are human to ensure that these files
> > > > are created and updated when ports are created and maintained will
> > > > result in more human error. I've learned over my long career to rely
> > > > more on automation than human beings. Automation [should] never fail
> > > > and when it does it does temporarily until the bug is found and
> > > > fixed. Human beings inconsistently fail.
> > > > 
> > > > If it were an auto-discovery script that created an mtree file as
> > > > part of the packaging process, it would be another matter. But this
> > > > optional solution path should be discussed on ports@, not here.
> > > > 
> > > >   
> > >
> > > I don't have much skin in the game, but I created a little proof of
> > > concept to allow further discussion (which is not ports-specific, as it
> > > works for all service scripts):
> > >
> > > https://reviews.freebsd.org/D36385  
> > 
> > I've been toying with the idea for a few months but never bothered to
> > create a review or even a script for that matter.
> > 
> > >
> > > This basically allows both system admins and port maintainers to
> > > create mtree files in /usr/local/etc/mtree (or /etc/mtree, as it's
> > > always relative to the service script called) which are automatically
> > > applied on service start. It's non-intrusive and doesn't require any
> > > sweeping changes to existing ports/services.  
> > 
> >

Re: security/clamav: /var/run on TMPFS renders the port broken by design

2022-08-28 Thread Cy Schubert

In message <20220828130107.1a76d54a.gre...@freebsd.org>, Michael Gmelin 
writes:
> 
>
>
> On Sun, 28 Aug 2022 03:21:24 -0700
> Cy Schubert  wrote:
>
> > In message <16b4-76a1-4e46-b7c3-60492d379...@freebsd.org>,
> > Michael Gmelin writes:
> > >
> > > > On 28. Aug 2022, at 10:42, free...@oldach.net wrote:
> > > >
> > > > Cy Schubert wrote on Sat, 27 Aug 2022 17:26:38 +0200
> > > > (CEST):
> > > >> As stated before in this thread, replacing /var/run with tmpfs
> > > >> is not a supported configuration.
> > > >
> > > > Not supported? What is the purpose of /etc/rc.d/var then? That
> > > > creates a tmpfs backed /var, populates it through mtree, and
> > > > makes a proper /var/run available.
> > > >
> > > > However it doesn't (yet) create /var/run/clamav of course.
> > > >
> > > > It would be fairly easy to extend /etc/rc.d/var by a logic that
> > > > walks through /usr/local/etc/mtree/* and runs mtree on each of
> > > > the files found as needed. All that the security/clamav port
> > > > would need to do then is to drop an appropriate small mtree
> > > > file as /usr/local/etc/mtree/clamav. From a port's perspective
> > > > that is the same logic as dropping service scripts as
> > > > /usr/local/etc/rc.d/clamav-*.
> > >
> > > From a user's perspective, it would be preferable to have this
> > > happen at service start though, as (unlike in the setup
> > > described) reboots don't happen that frequently, but files in
> > > /var/run might get deleted manually. Maybe some rc framework
> > > based solution would make sense, e.g., a variable `mtree_files`,
> > > which, if set, is applied in the default start_precmd. Besides
> > > being more resilient, this would also have the advantage that all
> > > required file systems should be available at that point and the
> > > separation between system and ports would be more clear. Another
> > > advantage would be that directories are only created for services
> > > that are actually enabled/started.
> > 
> > Unfortunately this requires all ports to include an mtree file.
> > Relying on port maintainers who are human to ensure that these files
> > are created and updated when ports are created and maintained will
> > result in more human error. I've learned over my long career to rely
> > more on automation than human beings. Automation [should] never fail
> > and when it does it does temporarily until the bug is found and
> > fixed. Human beings inconsistently fail.
> > 
> > If it were an auto-discovery script that created an mtree file as
> > part of the packaging process, it would be another matter. But this
> > optional solution path should be discussed on ports@, not here.
> > 
> > 
>
> I don't have much skin in the game, but I created a little proof of
> concept to allow further discussion (which is not ports-specific, as it
> works for all service scripts):
>
> https://reviews.freebsd.org/D36385

I've been toying with the idea for a few months but never bothered to
create a review or even a script for that matter.

>
> This basically allows both system admins and port maintainers to
> create mtree files in /usr/local/etc/mtree (or /etc/mtree, as it's
> always relative to the service script called) which are automatically
> applied on service start. It's non-intrusive and doesn't require any
> sweeping changes to existing ports/services.

Understood that this is a manual process.

>
> In this specific case, the requester could create
> /usr/local/etc/mtree/clamav-clamd with the required content (or
> persuade the port maintainer to include that file).
>
> You could of course add some construct to the ports framework that
> picks up certain directories from the package list automatically and
> places them into an mtree file as part of the build or installation
> process. But that would be an additional feature on top of this change.

Someone could. Personally, I think that's a lot of work compared to simply 
saving the state of /var/run at shutdown and restoring it at boot. I can't 
speak for the ports management though.

>
> This is meant to inspire more discussions, I'm not trying to force
> anything in. ;)

Agreed.

I cobbled something up yesterday that saves the directory tree state of 
/var/run prior to shutdown (or manually) and restores it at boot.

https://reviews.freebsd.org/D36386

People can try it out if they want. If there's enough interest I'd be 
willing to commit it.

We have a few options on the table and probably more. The ports 
infrastructure option is probably the most work. Adding functionality to 
all the ports that use /var/run is also a lot of work and if relying on 
individual porters, will likely take some time and be varied in 
implementation and robustness.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: security/clamav: /var/run on TMPFS renders the port broken by design

2022-08-28 Thread Cy Schubert

In message <16b4-76a1-4e46-b7c3-60492d379...@freebsd.org>, Michael
Gmelin writes:
>
> > On 28. Aug 2022, at 10:42, free...@oldach.net wrote:
> >
> > Cy Schubert wrote on Sat, 27 Aug 2022 17:26:38 +0200 (CEST):
> >> As stated before in this thread, replacing /var/run with tmpfs is not a
> >> supported configuration.
> >
> > Not supported? What is the purpose of /etc/rc.d/var then? That creates a
> > tmpfs backed /var, populates it through mtree, and makes a proper
> > /var/run available.
> >
> > However it doesn't (yet) create /var/run/clamav of course.
> >
> > It would be fairly easy to extend /etc/rc.d/var by a logic that walks
> > through /usr/local/etc/mtree/* and runs mtree on each of the files found
> > as needed. All that the security/clamav port would need to do then is to
> > drop an appropriate small mtree file as /usr/local/etc/mtree/clamav.
> > From a port's perspective that is the same logic as dropping service
> > scripts as /usr/local/etc/rc.d/clamav-*.
>
> From a user's perspective, it would be preferable to have this happen at
> service start though, as (unlike in the setup described) reboots don't
> happen that frequently, but files in /var/run might get deleted manually.
> Maybe some rc framework based solution would make sense, e.g., a variable
> `mtree_files`, which, if set, is applied in the default start_precmd.
> Besides being more resilient, this would also have the advantage that all
> required file systems should be available at that point and the separation
> between system and ports would be more clear. Another advantage would be
> that directories are only created for services that are actually
> enabled/started.

Unfortunately this requires all ports to include an mtree file. Relying on 
port maintainers who are human to ensure that these files are created and 
updated when ports are created and maintained will result in more human 
error. I've learned over my long career to rely more on automation than 
human beings. Automation [should] never fail and when it does it does 
temporarily until the bug is found and fixed. Human beings inconsistently 
fail.

If it were an auto-discovery script that created an mtree file as part of 
the packaging process, it would be another matter. But this optional 
solution path should be discussed on ports@, not here.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: security/clamav: /var/run on TMPFS renders the port broken by design

2022-08-28 Thread Cy Schubert

In message <202208280842.27s8gdxn055...@nuc.oldach.net>, Helge Oldach 
writes:
> Cy Schubert wrote on Sat, 27 Aug 2022 17:26:38 +0200 (CEST):
> > As stated before in this thread, replacing /var/run with tmpfs is not a
> > supported configuration.
>
> Not supported? What is the purpose of /etc/rc.d/var then? That creates a
> tmpfs backed /var, populates it through mtree, and makes a proper /var/run
> available.
>
> However it doesn't (yet) create /var/run/clamav of course.
>
> It would be fairly easy to extend /etc/rc.d/var by a logic that walks
> through /usr/local/etc/mtree/* and runs mtree on each of the files found as
> needed. All that the security/clamav port would need to do then is to drop
> an appropriate small mtree file as /usr/local/etc/mtree/clamav. From a
> port's perspective that is the same logic as dropping service scripts as
> /usr/local/etc/rc.d/clamav-*.
>
> Kind regards
> Helge

This is because you don't have a /var/run/clamav yet. Unfortunately
this does not retroactively create /var/run/clamav.

My new copy of the script, attached, also does not retroactively create the 
directory. Create the directory by hand. Use your server. Reboot and the 
directories will be recreated.

If converting from UFS or ZFS /var/run, simply add the tmpfs mountpoint 
after adding and enabling the script and reboot. (I prefix all locally 
written scripts with kq-).

Remember, this does not retroactively create /var/run/clamav if it doesn't
already exist. It only makes mounting a tmpfs /var/run a possible option.


#!/bin/sh

# PROVIDE: kq-var-run
# REQUIRE: zfs tmp
# BEFORE: FILESYSTEMS

. /etc/rc.subr

name=kq_var_run
rcvar=kq_var_run_enable
extra_commands="load save"
start_cmd="kq_var_run_start"
load_cmd="kq_var_run_load"
save_cmd="kq_var_run_save"
stop_cmd="kq_var_run_stop"

load_rc_config $name

# Set defaults
: ${kq_var_run_enable:="NO"}
: ${kq_var_run_mtree:="/var/db/mtree/BSD.var-run.mtree"}
: ${kq_var_run_autosave:="YES"}

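kq_var_run_load() {
    # Recreate the saved hierarchy, if a saved mtree file exists.
    test -f ${kq_var_run_mtree} &&
        mtree -U -i -q -f ${kq_var_run_mtree} -p /var/run > /dev/null
}

kq_var_run_save() {
    # Make sure the directory that holds the mtree file exists
    # (create the parent directory, not the file name itself).
    if [ ! -d $(dirname ${kq_var_run_mtree}) ]; then
        mkdir -p $(dirname ${kq_var_run_mtree})
    fi
    mtree -dcbj -p /var/run > ${kq_var_run_mtree}
}

kq_var_run_start() {
    # Only act when /var/run is a tmpfs mount.
    df -ttmpfs /var/run > /dev/null 2>&1 &&
        kq_var_run_load
}

kq_var_run_stop() {
    # Save the hierarchy at shutdown when autosave is enabled.
    df -ttmpfs /var/run > /dev/null 2>&1 &&
        checkyesno kq_var_run_autosave &&
        kq_var_run_save
}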

run_rc_command "$1"

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: security/clamav: /var/run on TMPFS renders the port broken by design

2022-08-27 Thread Cy Schubert

In message <20220827082638.57901a72@slippy>, Cy Schubert writes:
> On Sat, 27 Aug 2022 15:38:44 +0200
> Juraj Lutter  wrote:
>
> > > On 27 Aug 2022, at 15:27, Michael Gmelin  wrote:
> > >
> > >> On 27. Aug 2022, at 15:18, free...@oldach.net wrote:
> > >>
> > >> Michael Gmelin wrote on Sat, 27 Aug 2022 15:02:04 +0200 (CEST):
> > >>> (you're removing /var/run, which shouldn't be removed
> > >>
> > >> Not quite. It's actually not uncommon to boot with an empty /var. Please
> > >> see /etc/rc.d/var and related.
> > >
> > > That’s a good point.
> > >
> > >> The request that ports/packages should consider this case is not exactly
> > >> unreasonable IMO.
> > >
> > > If I was the maintainer, I would simply add the code to create the
> > > directory for robustness sake (I for one deleted subdirs in /var/run more
> > > than once and would expect a port to fix this on restart, also to make
> > > sure correct permissions are applied). But since it doesn’t seem like
> > > this is going to happen, adding a custom rc file would be a viable short
> > > term workaround for the requester.
> > >
> > > I like the idea of having something like tmpfiles.d, it would also help
> > > port maintainers (could also be done as a port).
> > >
> >
> > As I have stated in one of those PR: clamd creates file in two locations:
> >
> > - PidFile
> > - LocalSocket
> >
> > Both the locations could be checked by rc.d script in clamd.conf (also
> > freshclam eventually) and respective directories can be created from within
> > start_precmd()
> >
> > otis
> >
> > —
> > Juraj Lutter
> > o...@freebsd.org
> >
>
> As stated before in this thread, replacing /var/run with tmpfs is not a
> supported configuration. However if users wish to replace /var/run
> with tmpfs they can create an rc script (I put my extra rc scripts in
> /etc/local/rc.d) to create the hierarchy.
> If one does this they can either use mtree(1) to create the hierarchy
> or simply take a snapshot (find /var/run -type d | cpio -o >
> /etc/local/my_var_run.cpio), having their rc script recreate the
> hierarchy (using cpio -i < /etc/local/my_var_run.cpio). The archive
> should be periodically updated as needed, probably through a
> shutdown script.
>
> One will notice that /etc/mtree/BSD.var.dist shows us what is created
> in /var/run by default during installworld.
>
> The change requested is not specifically for an individual port but
> essentially a FreeBSD-wide infrastructure change. I don't think this
> is reasonable without a lot of consideration about what will be broken
> during the process of changing build and boot processes and the
> potential POLA fallout from such a change. A change like this needs to
> be architected.
>
> I don't think this is the mailing list to discuss this topic. This
> should be discussed on ports@. Not here. Maybe it should be moved there
> as this is a ports not a base O/S issue.

This will resolve the problem:

#!/bin/sh

# PROVIDE: kq-var-run
# REQUIRE: zfs tmp
# BEFORE: FILESYSTEMS

. /etc/rc.subr

name=kq_var_run
rcvar=kq_var_run_enable
extra_commands="update create"
start_cmd="kq_var_run_start"
create_cmd="kq_var_run_create"
update_cmd="kq_var_run_create"
# stop_cmd="kq_var_run_create"

load_rc_config $name

# Set defaults
: ${kq_var_run_enable:="NO"}
: ${kq_var_run_mtree:="/etc/local/mtree/KQ.var-run.mtree"}

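kq_var_run_start() {
    # -U makes mtree create missing directories instead of only
    # reporting them.
    df -ttmpfs /var/run > /dev/null 2>&1 &&
        mtree -U -f ${kq_var_run_mtree} -p /var/run
}

kq_var_run_create() {
    # Save the current directory tree of /var/run as an mtree file.
    mtree -cbdj -p /var/run > ${kq_var_run_mtree}
}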

run_rc_command "$1"

A person could add stop_cmd="kq_var_run_create" to save the /var/run mtree 
at shutdown instead of manually. Works with tmpfs /var/run.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: security/clamav: /var/run on TMPFS renders the port broken by design

2022-08-27 Thread Cy Schubert

On Sat, 27 Aug 2022 15:38:44 +0200
Juraj Lutter  wrote:

> > On 27 Aug 2022, at 15:27, Michael Gmelin  wrote:
> > 
> > 
> >   
> >> On 27. Aug 2022, at 15:18, free...@oldach.net wrote:
> >> 
> >> Michael Gmelin wrote on Sat, 27 Aug 2022 15:02:04 +0200 (CEST):  
> >>> (you're removing /var/run, which shouldn't be removed  
> >> 
> >> Not quite. It's actually not uncommon to boot with an empty /var. Please 
> >> see /etc/rc.d/var and related.  
> > 
> > That’s a good point.
> >   
> >> The request that ports/packages should consider this case is not exactly 
> >> unreasonable IMO.
> >>   
> > 
> > If I was the maintainer, I would simply add the code to create the 
> > directory for robustness sake (I for one deleted subdirs in /var/run more 
> > than once and would expect a port to fix this on restart, also to make sure 
> > correct permissions are applied). But since it doesn’t seem like this is 
> > going to happen, adding a custom rc file would be a viable short term 
> > workaround for the requester.
> > 
> > I like the idea of having something like tmpfiles.d, it would also help 
> > port maintainers (could also be done as a port).
> >   
> 
> As I have stated in one of those PR: clamd creates file in two locations:
> 
> - PidFile
> - LocalSocket
> 
> Both the locations could be checked by rc.d script in clamd.conf (also 
> freshclam eventually) and respective directories can be created from within 
> start_precmd()
> 
> otis
> 
> —
> Juraj Lutter
> o...@freebsd.org
> 
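Juraj's suggestion above (look up PidFile and LocalSocket in clamd.conf and create their directories from start_precmd) might be sketched roughly as follows. This is not the port's actual rc.d script; the function names clamd_runtime_dirs and clamd_precmd are hypothetical, and the sketch assumes the usual "Keyword value" layout of clamd.conf. Ownership and permissions of the created directories are deliberately left out.

```shell
#!/bin/sh
# Hypothetical helper (an assumption, not part of the clamav port):
# print the parent directory of each PidFile and LocalSocket path
# configured in a clamd.conf-style file. Commented-out lines such as
# "#PidFile ..." are ignored because their first field does not match.
clamd_runtime_dirs() {
    awk '$1 == "PidFile" || $1 == "LocalSocket" { print $2 }' "$1" |
    while IFS= read -r path; do
        dirname "$path"
    done | sort -u
}

# Hypothetical precmd: create any of those directories that are missing.
# Setting owner and mode is left to the caller in this sketch.
clamd_precmd() {
    clamd_runtime_dirs "$1" | while IFS= read -r dir; do
        [ -d "$dir" ] || mkdir -p "$dir"
    done
}
```

An rc.d script could call something like clamd_precmd /usr/local/etc/clamd.conf from its start_precmd, so the directories would be recreated on every service start rather than only at boot.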

As stated before in this thread, replacing /var/run with tmpfs is not a
supported configuration. However if users wish to replace /var/run
with tmpfs they can create an rc script (I put my extra rc scripts in
/etc/local/rc.d) to create the hierarchy.
If one does this they can either use mtree(1) to create the hierarchy
or simply take a snapshot (find /var/run -type d | cpio -o >
/etc/local/my_var_run.cpio), having their rc script recreate the
hierarchy (using cpio -i < /etc/local/my_var_run.cpio). The archive
should be periodically updated as needed, probably through a
shutdown script.
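
The snapshot-and-restore idea described above can also be sketched with nothing but find and mkdir. This is a simplified stand-in for the mtree(1) or cpio approach, not a replacement for it: the helper names save_tree and restore_tree are hypothetical, and unlike mtree this sketch records only directory names, not owners or modes.

```shell
#!/bin/sh
# Hypothetical helpers (assumptions for illustration only): record and
# recreate the directory skeleton under a given root.

# save_tree ROOT LISTFILE
# Write every directory under ROOT to LISTFILE, relative to ROOT.
save_tree() {
    (cd "$1" && find . -type d | sort) > "$2"
}

# restore_tree ROOT LISTFILE
# Recreate each listed directory under ROOT if it is missing.
restore_tree() {
    while IFS= read -r dir; do
        mkdir -p "$1/$dir"
    done < "$2"
}
```

A shutdown script could call save_tree on /var/run, and a boot-time script could call restore_tree after the tmpfs is mounted; the list file has to live on persistent storage for that to work.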

One will notice that /etc/mtree/BSD.var.dist shows us what is created
in /var/run by default during installworld.

The change requested is not specifically for an individual port but
essentially a FreeBSD-wide infrastructure change. I don't think this
is reasonable without a lot of consideration about what will be broken
during the process of changing build and boot processes and the
potential POLA fallout from such a change. A change like this needs to
be architected.

I don't think this is the mailing list to discuss this topic. This
should be discussed on ports@. Not here. Maybe it should be moved there
as this is a ports not a base O/S issue.

-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0

Re: DTrace Error

2022-07-31 Thread Cy Schubert

In message <20220725153706.a2bb...@slippy.cwsent.com>, Cy Schubert writes:
> In message , Mark Johnston writes:
> > On Sun, Jul 24, 2022 at 10:07:19AM -0700, Cy Schubert wrote:
> > > In message <20220724030857.b57f...@slippy.cwsent.com>, Cy Schubert writes:
> > > > In message <20220723185533.9ea7d...@slippy.cwsent.com>, Cy Schubert writes:
> > > > > In message , Mark Johnston writes:
> > > > > > On Sat, Jul 23, 2022 at 07:14:44AM -0700, Cy Schubert wrote:
> > > > > > > In message <20220723035223.57cd...@slippy.cwsent.com>, Cy Schubert writes:
> > > > > > > > I'm not sure if this is because my obj tree needs a fresh rebuild and
> > > > > > > > reinstall or if this is a legitimate problem. Regardless of the dtrace
> > > > > > > > command entered, whether it be a fbt or sdt, the following error occurs:
> > > > > > > >
> > > > > > > > slippy# dtrace -n 'fbt::ieee80211_vap_setup:entry { printf("entering ieee80211_vap_setup\n"); }'
> > > > > > > > dtrace: invalid probe specifier fbt::ieee80211_vap_setup:entry {
> > > > > > > > printf("entering ieee80211_vap_setup\n"); }: "/usr/lib/dtrace/psinfo.d",
> > > > > > > > line 1: failed to copy type of 'pr_gid': Conflicting type is already defined
> > > > > > > > slippy#
> > > > > > > >
> > > > > > > > Old DTrace scripts I've used months or even years ago also fail with the
> > > > > > > > same error. It's not this one probe. All probes result in the pr_gid error.
> > > > > > > >
> > > > > > > > I'm currently rebuilding my "prod" tree from scratch with the hope that
> > > > > > > > it's simply something out of sync. But, should it not be, has anyone else
> > > > > > > > encountered this lately?
> > > > > > >
> > > > > > > A full clean rebuild and installworld/kernel did not change the result.
> > > > > > > This is a new problem.
> > > > > >
> > > > > > I don't see any such problem on a system built from commit 151abc80cde,
> > > > > > using GENERIC.  Are you using a custom kernel config?  Which kernel
> > > > > > modules do you have loaded?
> > > > >
> > > > > [...]
> > > >
> > > > chuck@ emailed me privately suggesting a roll back to cb2ae6163174. The
> > > > problem is fixed. I'm creating a special branch that reverts only the llvm
> > > > commits since then.
> > >
> > > llvm 14 is not the problem. There must be something else after cb2ae6163174
> > > that is causing the regression.
> >
> > Are you able to bisect?  I spent a bit of time trying to replicate the
> > problem based on your kernel config, without any luck yet.
>
> How fortuitous is this email. I just rebooted my sandbox again and 
> discovered this is related to non-INVARIANT kernels. Enabling INVARIANTS 
> "fixes" dtrace. There must be some commit since cb2ae6163174 that affected 
> non-INVARIANT kernels. As to which one, I'm not sure yet.

The commit that introduced the regression in non-INVARIANTS kernels is 
2449b9e5fe565be757a4b29093fd1c9c6ffcf3c9. Looking at the diff, I don't
see how it caused the problem, but reverting it locally addresses the
regression. (Of course, one needs to disable building the mac_ddb module
for the build to succeed.)

Without looking at it closer, I suspect that dtrace could be sensitive to 
one of the struct changes.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e**(i*pi)+1=0

Re: DTrace Error

2022-07-25 Thread Cy Schubert

In message , Mark Johnston writes:
> On Sun, Jul 24, 2022 at 10:07:19AM -0700, Cy Schubert wrote:
> > In message <20220724030857.b57f...@slippy.cwsent.com>, Cy Schubert writes:
> > > In message <20220723185533.9ea7d...@slippy.cwsent.com>, Cy Schubert writes:
> > > > In message , Mark Johnston writes:
> > > > > On Sat, Jul 23, 2022 at 07:14:44AM -0700, Cy Schubert wrote:
> > > > > > In message <20220723035223.57cd...@slippy.cwsent.com>, Cy Schubert writes:
> > > > > > > I'm not sure if this is because my obj tree needs a fresh rebuild and
> > > > > > > reinstall or if this is a legitimate problem. Regardless of the dtrace
> > > > > > > command entered, whether it be a fbt or sdt, the following error occurs:
> > > > > > >
> > > > > > > slippy# dtrace -n 'fbt::ieee80211_vap_setup:entry { printf("entering 
> > > > > > > ieee80211_vap_setup\n"); }'
> > > > > > > dtrace: invalid probe specifier fbt::ieee80211_vap_setup:entry {
> > > > > > > printf("entering ieee80211_vap_setup\n"); }: "/usr/lib/dtrace/psinfo.d",
> > > > > > > line 1: failed to copy type of 'pr_gid': Conflicting type is already defined
> > > > > > > slippy#
> > > > > > >
> > > > > > > Old DTrace scripts I've used months or even years ago also fail with the
> > > > > > > same error. It's not this one probe. All probes result in the pr_gid error.
> > > > > > >
> > > > > > > I'm currently rebuilding my "prod" tree from scratch with the hope that
> > > > > > > it's simply something out of sync. But, should it not be, has anyone else
> > > > > > > encountered this lately?
> > > > > >
> > > > > > A full clean rebuild and installworld/kernel did not change the result.
> > > > > > This is a new problem.
> > > > >
> > > > > I don't see any such problem on a system built from commit 151abc80cde,
> > > > > using GENERIC.  Are you using a custom kernel config?  Which kernel
> > > > > modules do you have loaded?
> > > >
> > > > [...]
> > >
> > > chuck@ emailed me privately suggesting a roll back to cb2ae6163174. The
> > > problem is fixed. I'm creating a special branch that reverts only the llvm
> > > commits since then.
> >
> > llvm 14 is not the problem. There must be something else after cb2ae6163174
> > that is causing the regression.
>
> Are you able to bisect?  I spent a bit of time trying to replicate the
> problem based on your kernel config, without any luck yet.

How fortuitous this email is. I just rebooted my sandbox again and 
discovered this is related to non-INVARIANTS kernels. Enabling INVARIANTS 
"fixes" dtrace. There must be some commit since cb2ae6163174 that affected 
non-INVARIANTS kernels. As to which one, I'm not sure yet.
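
For anyone trying to reproduce this, a minimal sketch of how one might check whether a kernel config includes INVARIANTS. It assumes the kernel was built with INCLUDE_CONFIG_FILE, so that `sysctl -n kern.conftxt` returns the config text; the helper name `has_option` is made up for illustration and is not part of FreeBSD.

```sh
#!/bin/sh
# has_option NAME: read kernel config text on stdin and succeed if it
# contains an "options NAME" (or "options NAME=value") line.
# Hypothetical helper for illustration only.
has_option() {
    grep -Eq "^options[[:space:]]+$1(=.*)?[[:space:]]*\$"
}

# Example use on a live system (assumes INCLUDE_CONFIG_FILE):
#   sysctl -n kern.conftxt | has_option INVARIANTS && echo "INVARIANTS kernel"
```

The match is anchored on the whole option name, so INVARIANT_SUPPORT on its own does not count as INVARIANTS.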


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e**(i*pi)+1=0

Re: DTrace Error

2022-07-24 Thread Cy Schubert

In message <20220724030857.b57f...@slippy.cwsent.com>, Cy Schubert writes:
> In message <20220723185533.9ea7d...@slippy.cwsent.com>, Cy Schubert writes:
> > In message , Mark Johnston writes:
> > > On Sat, Jul 23, 2022 at 07:14:44AM -0700, Cy Schubert wrote:
> > > > In message <20220723035223.57cd...@slippy.cwsent.com>, Cy Schubert writes:
> > > > > I'm not sure if this is because my obj tree needs a fresh rebuild and
> > > > > reinstall or if this is a legitimate problem. Regardless of the dtrace
> > > > > command entered, whether it be a fbt or sdt, the following error occurs:
> > > > >
> > > > > slippy# dtrace -n 'fbt::ieee80211_vap_setup:entry { printf("entering 
> > > > > ieee80211_vap_setup\n"); }'
> > > > > dtrace: invalid probe specifier fbt::ieee80211_vap_setup:entry {
> > > > > printf("entering ieee80211_vap_setup\n"); }: "/usr/lib/dtrace/psinfo.d",
> > > > > line 1: failed to copy type of 'pr_gid': Conflicting type is already defined
> > > > > slippy#
> > > > >
> > > > > Old DTrace scripts I've used months or even years ago also fail with the
> > > > > same error. It's not this one probe. All probes result in the pr_gid error.
> > > > >
> > > > > I'm currently rebuilding my "prod" tree from scratch with the hope that
> > > > > it's simply something out of sync. But, should it not be, has anyone else
> > > > > encountered this lately?
> > > >
> > > > A full clean rebuild and installworld/kernel did not change the result.
> > > > This is a new problem.
> > >
> > > I don't see any such problem on a system built from commit 151abc80cde,
> > > using GENERIC.  Are you using a custom kernel config?  Which kernel
> > > modules do you have loaded?
> >
> > [...]
>
> chuck@ emailed me privately suggesting a roll back to cb2ae6163174. The 
> problem is fixed. I'm creating a special branch that reverts only the llvm 
> commits since then.

llvm 14 is not the problem. There must be something else after cb2ae6163174 
that is causing the regression.
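
A rough sketch of how the search between the known-good commit and the current tree could be set up with git bisect. The function names and the /usr/src default are assumptions, not something from this thread; the only details taken from it are the good commit cb2ae6163174 and the pr_gid error text.

```sh
#!/bin/sh
# Hypothetical helpers for bisecting the pr_gid regression.

# Mark the known-good and known-bad ends of the search.
# $1 is the source tree (assumed default: /usr/src).
bisect_start() {
    cd "${1:-/usr/src}" || return 1
    git bisect start &&
    git bisect bad HEAD &&
    git bisect good cb2ae6163174
}

# Read dtrace's combined output on stdin and print "bad" if it contains
# the pr_gid error reported in this thread, "good" otherwise.
bisect_verdict() {
    if grep -q "failed to copy type of 'pr_gid'"; then
        echo bad
    else
        echo good
    fi
}

# After building, installing and booting each candidate revision:
#   git bisect "$(dtrace -n 'BEGIN { exit(0); }' 2>&1 | bisect_verdict)"
```

Each step needs a rebuild and reboot, so the verdict is effectively fed in by hand. Note that `bisect_verdict` also prints "good" when dtrace fails for some unrelated reason, so the output is worth a look before trusting it.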


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e**(i*pi)+1=0

Re: DTrace Error

2022-07-23 Thread Cy Schubert

In message <20220723185533.9ea7d...@slippy.cwsent.com>, Cy Schubert writes:
> In message , Mark Johnston writes:
> > On Sat, Jul 23, 2022 at 07:14:44AM -0700, Cy Schubert wrote:
> > > In message <20220723035223.57cd...@slippy.cwsent.com>, Cy Schubert writes:
> > > > I'm not sure if this is because my obj tree needs a fresh rebuild and
> > > > reinstall or if this is a legitimate problem. Regardless of the dtrace
> > > > command entered, whether it be a fbt or sdt, the following error occurs:
> > > >
> > > > slippy# dtrace -n 'fbt::ieee80211_vap_setup:entry { printf("entering 
> > > > ieee80211_vap_setup\n"); }'
> > > > dtrace: invalid probe specifier fbt::ieee80211_vap_setup:entry {
> > > > printf("entering ieee80211_vap_setup\n"); }: "/usr/lib/dtrace/psinfo.d",
> > > > line 1: failed to copy type of 'pr_gid': Conflicting type is already defined
> > > > slippy#
> > > >
> > > > Old DTrace scripts I've used months or even years ago also fail with the
> > > > same error. It's not this one probe. All probes result in the pr_gid error.
> > > >
> > > > I'm currently rebuilding my "prod" tree from scratch with the hope that
> > > > it's simply something out of sync. But, should it not be, has anyone else
> > > > encountered this lately?
> > > 
> > > A full clean rebuild and installworld/kernel did not change the result. 
> > > This is a new problem.
> >
> > I don't see any such problem on a system built from commit 151abc80cde,
> > using GENERIC.  Are you using a custom kernel config?  Which kernel
> > modules do you have loaded?
>
> [...]

chuck@ emailed me privately suggesting a roll back to cb2ae6163174. The 
problem is fixed. I'm creating a special branch that reverts only the llvm 
commits since then.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e**(i*pi)+1=0

Re: DTrace Error

2022-07-23 Thread Cy Schubert

In message , Mark Johnston writes:
> On Sat, Jul 23, 2022 at 07:14:44AM -0700, Cy Schubert wrote:
> > In message <20220723035223.57cd...@slippy.cwsent.com>, Cy Schubert writes:
> > > I'm not sure if this is because my obj tree needs a fresh rebuild and 
> > > reinstall or if this is a legitimate problem. Regardless of the dtrace 
> > > command entered, whether it be a fbt or sdt, the following error occurs:
> > >
> > > slippy# dtrace -n 'fbt::ieee80211_vap_setup:entry { printf("entering 
> > > ieee80211_vap_setup\n"); }'
> > > dtrace: invalid probe specifier fbt::ieee80211_vap_setup:entry { 
> > > printf("entering ieee80211_vap_setup\n"); }: "/usr/lib/dtrace/psinfo.d", 
> > > line 1: failed to copy type of 'pr_gid': Conflicting type is already defined
> > > slippy# 
> > >
> > > Old DTrace scripts I've used months or even years ago also fail with the 
> > > same error. It's not this one probe. All probes result in the pr_gid error.
> > >
> > > I'm currently rebuilding my "prod" tree from scratch with the hope that 
> > > it's simply something out of sync. But, should it not be, has anyone else
> > > encountered this lately?
> > 
> > A full clean rebuild and installworld/kernel did not change the result. 
> > This is a new problem.
>
> I don't see any such problem on a system built from commit 151abc80cde,
> using GENERIC.  Are you using a custom kernel config?  Which kernel
> modules do you have loaded?

The kernel config is custom. Here is what is reported by the kernel through 
strings:

options CONFIG_AUTOGENERATED
makeoptions WITH_CTF=1
makeoptions DEBUG=-g
options BREAK_TO_DEBUGGER
options SW_WATCHDOG
options DIRECTIO
options KDB_UNATTENDED
options IICHID_SAMPLING
options HID_DEBUG
options EVDEV_SUPPORT
options USB_DEBUG
options ATH_ENABLE_11N
options AH_AR5416_INTERRUPT_MITIGATION
options IEEE80211_SUPPORT_MESH
options IEEE80211_DEBUG
options SC_PIXEL_MODE
options PPS_SYNC
options COMPAT_LINUXKPI
options PCI_IOV
options PCI_HP
options IOMMU
options EARLY_AP_STARTUP
options SMP
options NETGDB
options NETDUMP
options DEBUGNET
options ZSTDIO
options GZIO
options EKCD
options VERBOSE_SYSINIT=0
options MALLOC_DEBUG_MAXZONES=8
options QUEUE_MACRO_DEBUG_TRASH
options DEADLKRES
options GDB
options FULL_BUF_TRACKING
options DDB
options BUF_TRACKING
options KDB_TRACE
options KDB
options RCTL
options RACCT_DEFAULT_TO_DISABLED
options RACCT
options INCLUDE_CONFIG_FILE
options DDB_CTF
options KDTRACE_HOOKS
options KDTRACE_FRAME
options MAC
options CAPABILITIES
options CAPABILITY_MODE
options AUDIT
options HWPMC_HOOKS
options KBD_INSTALL_CDEV
options PRINTF_BUFR_SIZE=128
options _KPOSIX_PRIORITY_SCHEDULING
options SYSVSEM
options SYSVMSG
options SYSVSHM
options STACK
options KTRACE
options SCSI_DELAY=5000
options COMPAT_FREEBSD13
options COMPAT_FREEBSD12
options COMPAT_FREEBSD11
options COMPAT_FREEBSD10
options COMPAT_FREEBSD9
options COMPAT_FREEBSD7
options COMPAT_FREEBSD6
options COMPAT_FREEBSD5
options COMPAT_FREEBSD4
options COMPAT_FREEBSD32
options EFIRT
options GEOM_LABEL
options GEOM_RAID
options TMPFS
options PSEUDOFS
options PROCFS
options CD9660
options MSDOSFS
options NFS_ROOT
options NFSLOCKD
options NFSD
options NFSCL
options MD_ROOT
options QUOTA
options UFS_GJOURNAL
options UFS_DIRHASH
options UFS_ACL
options SOFTUPDATES
options FFS
options KERN_TLS
options SCTP_SUPPORT
options TCP_RFC7413
options TCP_HHOOK
options TCP_BLACKBOX
options TCP_OFFLOAD
options FIB_ALGO
options ROUTE_MPATH
options IPSEC_SUPPORT
options INET6
options INET
options VIMAGE
options PREEMPTION
options NUMA
options SCHED_ULE
options NEW_PCIB
options CC_NEWRENO
options GEOM_PART_GPT
options GEOM_PART_MBR
options GEOM_PART_EBR
options GEOM_PART_BSD
options KDB_TRACE
device  isa
device  mem
device  io
device  uart_ns8250
device  acpi
device  pci
device  fdc
device  ahci
device  ata
device  siis
device  ahc
device  scbus
device  ch
device  da
device  sa
device  cd
device  pass
device  ses
device  atkbdc
device  atkbd
device  psm
device  vga
device  splash
device  sc
device  vt
device  vt_vga
device  vt_efifb
device  vt_vbefb
device  uart
device  puc
device  iflib
device  igc
device  axp
device  miibus
device  crypto
device  aesni
device  loop
device  padlock_rng
device  rdrand_rng
device  ether
device  md
device  firmware
device  xz
device  bpf
device  uhci
device  ohci
device  ehci
device  xhci
device  usb
device  ukbd
device  umass
device  virtio
device  virtio_pci
device  vtnet
device  virtio_blk
device  virtio_scsi
device  virtio_balloon
device  kvm_clock
device  xentimer
device  evdev
device  uinput
device  hid

Kernel modules are:

slippy# kldstat
Id Refs AddressSize Name
 1  185 0x8020  10290a8 kernel
 21 0x8122a000 36c

Re: DTrace Error

2022-07-23 Thread Cy Schubert

In message <20220723035223.57cd...@slippy.cwsent.com>, Cy Schubert writes:
> I'm not sure if this is because my obj tree needs a fresh rebuild and 
> reinstall or if this is a legitimate problem. Regardless of the dtrace 
> command entered, whether it be a fbt or sdt, the following error occurs:
>
> slippy# dtrace -n 'fbt::ieee80211_vap_setup:entry { printf("entering 
> ieee80211_vap_setup\n"); }'
> dtrace: invalid probe specifier fbt::ieee80211_vap_setup:entry { 
> printf("entering ieee80211_vap_setup\n"); }: "/usr/lib/dtrace/psinfo.d", 
> line 1: failed to copy type of 'pr_gid': Conflicting type is already defined
> slippy# 
>
> Old DTrace scripts I've used months or even years ago also fail with the 
> same error. It's not this one probe. All probes result in the pr_gid error.
>
> I'm currently rebuilding my "prod" tree from scratch with the hope that 
> it's simply something out of sync. But, should it not be, has anyone else 
> encountered this lately?

A full clean rebuild and installworld/kernel did not change the result. 
This is a new problem.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e**(i*pi)+1=0

DTrace Error

2022-07-22 Thread Cy Schubert

I'm not sure if this is because my obj tree needs a fresh rebuild and 
reinstall or if this is a legitimate problem. Regardless of the dtrace 
command entered, whether it be a fbt or sdt, the following error occurs:

slippy# dtrace -n 'fbt::ieee80211_vap_setup:entry { printf("entering 
ieee80211_vap_setup\n"); }'
dtrace: invalid probe specifier fbt::ieee80211_vap_setup:entry { 
printf("entering ieee80211_vap_setup\n"); }: "/usr/lib/dtrace/psinfo.d", 
line 1: failed to copy type of 'pr_gid': Conflicting type is already defined
slippy# 

Old DTrace scripts I've used months or even years ago also fail with the 
same error. It's not this one probe. All probes result in the pr_gid error.

I'm currently rebuilding my "prod" tree from scratch with the hope that 
it's simply something out of sync. But, should it not be, has anyone else 
encountered this lately?


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e**(i*pi)+1=0

Re: Loader can't find /boot/ua/loader.lua on UFS after main-n255828-18054d0220c

2022-05-30 Thread Cy Schubert

In message , Warner Losh writes:
> On Mon, May 30, 2022 at 8:14 AM Toomas Soome  wrote:
>
> >
> >
> > On 30. May 2022, at 17:06, Warner Losh  wrote:
> >
> >
> >
> > On Mon, May 30, 2022 at 4:26 AM David Wolfskill 
> > wrote:
> >
> >> On Mon, May 30, 2022 at 08:40:10AM +0300, Toomas Soome wrote:
> >> > ...
> >> > Does loader_4th have same issue?
> >> > 
> >>
> >> I don't know; I hadn't tried it.  I will do so later today & report
> >> back.
> >>
> >
> > So if it's only one system, and it's only UFS, then what does fsck of that
> > UFS system tell you?
> > The loader can't find its UFS filesystem to read the configuration from.
> > So either its having trouble
> > finding the device (unlikely since that code hasn't changed in a long
> > time), or its having heartburn
> > with the UFS system for some reason it's being silent about (within the
> > realm of possibilities because
> > there might be an unknown edge case in Kirks recent  UFS integrity
> > changes). I suspect that the 4th
> > boot loader will have the same issue, but a different error message.
> >
> > Others have reported issues with GELI, but that's not in play here, If I'm
> > reading this correctly. Right?
> >
> > Warner
> >
> >
> > Ye, thats why I was asking about loader_4th. I’m trying to spot the issue
> > from ufs image sample.
> >
>
> I thought it was a good suggestion. My guess on it not working wasn't to
> imply it wasn't.

Backing out 076002f24d35962f0d21f44bfddd34ee4d7f015d restored the one 
machine of mine that did have the problem. The other three were fine with 
that commit.

To summarize, things I tried:

- Reinstall all boot blocks.
- set currdev to my USB rescue disk, ls works, boots fine
- boot from my USB rescue disk, set currdev to the boot disk, boots
- boot from the USB rescue disk, copy /boot/loader* to the boot disk, works 
around the problem.
- Revert 076002f24d35962f0d21f44bfddd34ee4d7f015d resolves the problem.

Other data points:

My three AMD machines on Asus motherboards had no problem with the commit. 
My Acer laptop with Intel CPU suffered the same problem. Could it be that 
malloc() worked on the Asus/AMD machines while it failed on the Acer/Intel 
laptop?

If my hunch that this is caused by a malloc() failure is correct, would it 
be a good idea to print a nasty warning when malloc failures occur in the 
loader? Silently failing and then behaving weirdly is more of a POLA 
violation than a nasty message. Failing that, a loader variable to enable 
verbose messages might help in debugging these kinds of problems. Again, 
this assumes that a malloc() failure is what actually happened.

-- 
Cheers,
Cy Schubert  or 
FreeBSD UNIX: Web:  http://www.FreeBSD.org
NTP:   Web:  https://nwtime.org

e**(i*pi)+1=0

Re: Considering stepping down from all of my FreeBSD responsibilities

2022-04-01 Thread Cy Schubert

In message <20220401064816.gs60...@eureka.lemis.com>, Greg 'groggy' Lehey 
writes:
> On Friday,  1 April 2022 at  5:58:39 +, Alexey Dokuchaev wrote:
> > On Fri, Apr 01, 2022 at 02:20:31PM +0900, Yasuhiro Kimura wrote:
> >> Hi Glen,
> >>
> >> From: Glen Barber 
> >> Subject: Considering stepping down from all of my FreeBSD responsibilities
> >> Date: Fri, 1 Apr 2022 00:15:02 +
> >>
> >>> Dear community,
> >>>
> >>> Given the mental toll the past two years or so have taken on me, I have
> >>> decided to step down from all of my "hats" within the Project, and take
> >>> some time to sort out what my future looks like going forward.
> >>>
> >>> Happy April 1st.  I'm not going anywhere.  :-)
> >>
> >> We are waiting for the announce of FreeBSD 2.2.10-RELEASE. :-)
> >>
> >> Cf. https://lists.freebsd.org/pipermail/freebsd-announce/2006-April/001055.html
> >
> > I don't think 2.2.10 is warranted.
>
> Agreed.  The upgrade isn't sufficiently important.
>
> How about 2.2.9.1?

I had a different, more sinister thought: announcing that we've moved from 
BSDL to GPLv3 to be more like Linux.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.

Re: Deprecating ISA sound cards

2022-03-18 Thread Cy Schubert

In message <20220319022405.ga29...@lonesome.com>, Mark Linimon writes:
> Anyone objecting to this, be careful, I might ship a pile of such
> things to you from the depths of the closets :-)

<<=1


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.

Re: DTrace Brokenness [Solved]

2022-03-18 Thread Cy Schubert

A full clean build resolved the problem. It was likely some incompatible 
CTF or possibly some other patch that touched DTrace that left my obj tree 
in an inconsistent state.
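
For reference, a minimal sketch of what a full clean build might look like. The function name, the /usr/src default and the job count are assumptions; `cleanworld`, `buildworld` and `buildkernel` are the usual top-level make targets in the FreeBSD source tree.

```sh
#!/bin/sh
# clean_build [SRCDIR] [JOBS]: discard the object tree and rebuild world
# and kernel from scratch. Hypothetical wrapper for illustration only.
clean_build() {
    src=${1:-/usr/src}
    jobs=${2:-1}
    cd "$src" || return 1
    make cleanworld &&
    make -j"$jobs" buildworld &&
    make -j"$jobs" buildkernel
}

# Installation is deliberately left out. The usual sequence afterwards is
# "make installkernel", a reboot, then "make installworld".
```

The function changes directory, so calling it in a subshell, for example `( clean_build /usr/src 8 )`, keeps the caller's working directory intact.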


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.


In message <20220318234704.6c14...@slippy.cwsent.com>, Cy Schubert writes:
> It's been a while (~ 4-6 months) since I've last used dtrace. Needing to 
> use it again today scripts that worked before fail to.
>
> A first example:
>
> cwfw# cat dt10.d
> #!/usr/sbin/dtrace -s
>
> fbt::ipf_check:entry {
>   parintf("%x\n", (void) arg[1]);
> }
>
> cwfw# 
>
> Results in this error:
>
> cwfw# ./dt10.d 
> dtrace: failed to compile script ./dt10.d: "/usr/lib/dtrace/psinfo.d", line 
> 1: cannot translate from "struct thread *" to "lwpsinfo_t *"
> cwfw# 
>
> Another example,
>
> slippy# cat dtrace.d 
> #!/usr/sbin/dtrace -s
>
> fbt::uma_reclaim:entry {
>   printf("in uma_reclaim\n");
> }
> slippy# 
>
> Results in the same error:
>
> slippy# ./dtrace.d 
> dtrace: failed to compile script ./dtrace.d: "/usr/lib/dtrace/psinfo.d", 
> line 1: cannot translate from "struct thread *" to "lwpsinfo_t *"
> slippy# 
>
>
> A variation of the second example,
>
> slippy# cat dtrace.sh 
> #!/bin/sh -
> dtrace -n 'fbt::uma_reclaim:entry { printf("in uma_reclaim\n"); }'
> slippy# 
>
> Results in two errors, the first being that the -n option results in an 
> invalid probe specified and the second being the struct thread * error.
>
> slippy# ./dtrace.sh 
> dtrace: invalid probe specifier fbt::uma_reclaim:entry { printf("in 
> uma_reclaim\n"); }: "/usr/lib/dtrace/psinfo.d", line 1: cannot translate 
> from "struct thread *" to "lwpsinfo_t *"
> slippy# 
>
> I'm not sure if this is related to 2d5d2a986ce or something else.
>
>
> -- 
> Cheers,
> Cy Schubert 
> FreeBSD UNIX: Web:  https://FreeBSD.org
> NTP:   Web:  https://nwtime.org
>
>   The need of the many outweighs the greed of the few.
>
>
>

DTrace Brokenness

2022-03-18 Thread Cy Schubert

It's been a while (~4-6 months) since I last used dtrace. Needing to use it 
again today, I find that scripts that worked before now fail.

A first example:

cwfw# cat dt10.d
#!/usr/sbin/dtrace -s

fbt::ipf_check:entry {
  printf("%x\n", (void) arg[1]);
}

cwfw# 

Results in this error:

cwfw# ./dt10.d 
dtrace: failed to compile script ./dt10.d: "/usr/lib/dtrace/psinfo.d", line 
1: cannot translate from "struct thread *" to "lwpsinfo_t *"
cwfw# 

Another example,

slippy# cat dtrace.d 
#!/usr/sbin/dtrace -s

fbt::uma_reclaim:entry {
printf("in uma_reclaim\n");
}
slippy# 

Results in the same error:

slippy# ./dtrace.d 
dtrace: failed to compile script ./dtrace.d: "/usr/lib/dtrace/psinfo.d", 
line 1: cannot translate from "struct thread *" to "lwpsinfo_t *"
slippy# 


A variation of the second example,

slippy# cat dtrace.sh 
#!/bin/sh -
dtrace -n 'fbt::uma_reclaim:entry { printf("in uma_reclaim\n"); }'
slippy# 

This results in two errors: the first is that the -n option reports an 
invalid probe specifier, and the second is the same struct thread * error.

slippy# ./dtrace.sh 
dtrace: invalid probe specifier fbt::uma_reclaim:entry { printf("in 
uma_reclaim\n"); }: "/usr/lib/dtrace/psinfo.d", line 1: cannot translate 
from "struct thread *" to "lwpsinfo_t *"
slippy# 

I'm not sure if this is related to 2d5d2a986ce or something else.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.

Re: Dragonfly Mail Agent (dma) in the base system

2022-02-05 Thread Cy Schubert

In message , David Chisnall writes:
> On 30/01/2022 14:01, michael.osi...@siemens.com wrote:
> > Sendmail: The biggest problem is that authentication strictly requires 
> > Cyrus SASL, even for stupid ones like PLAIN/LOGIN, according to the 
> > handbook you must recompile sendmail from base with Cyrus SASL from 
> > ports to make this possible. A showstopper actually, for two reasons:
> > 1. I don't like mixing base and ports, it just creates a messy system.
> > 2. While this may work with hosts, when you have jails running off a 
> > RELEASE in Bastille this obviously will not work.
> > Not going to work with sendmail easily.
>
> I think this is a critical point: at the moment, we're paying the cost 
> of having a full-featured MTA in the base system, without getting most 
> of the benefits.  Around 2003, I hit exactly this problem.  The 
> instructions after update were slightly terrifying: after each base 
> system or ports update, I potentially had to recompile my own sendmail.
>
> There's now a sendmail+sasl configuration in packages and so I was 
> incredibly happy to be able to move away from using sendmail in base. 
> Now I have two copies of sendmail on some machines.  The one in ports, 
> for compatibility reasons, looks for config in /etc/mail not under 
> LOCALBASE, which is a layering violation and means that freebsd-update 
> periodically tries to corrupt my config.
>
> I have no strong opinions about where we move to, but moving *from* 
> shipping a limited sendmail in base would make me very happy.

I'd like to add, proceed cautiously. I've been running postfix on my 
external gateway for a couple of decades but recently migrated all but one 
of my internal machines from sendmail to postfix. There were a couple of 
hiccups along the way. In one case there was a mail loop of at(1) jobs 
which required the tweak of a procmail rule. In the second case nmh submits 
mail to localhost:587 requiring altering master.cf. nmh uses only that port 
though it can pipe directly to the sendmail binary when built that way. If 
dma doesn't support SMTP submission, we may need to review various port 
default options or whether ports even support it.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.

Re: git: 5e6a2d6eb220 - main - Reapply: move libc++ from /usr/lib to /lib

2021-12-30 Thread Cy Schubert

d.
>
> https://ci.freebsd.org and https://ci.freebsd.org show
> successful builds at this point.
>
>
> It looks like Cy may need to report more about the context
> for the reported build failure.

It was a NO_CLEAN build. A CLEAN build resolved it.

There were no mods to this, my prod tree, except for some upcoming ipfilter 
commits intended for the new year.

One would think a META_MODE build would also fail if NO_CLEAN fails.

Sorry for the late reply. There are other things here that needed some 
urgent attention.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.

Re: current now panics when starting VBox VM

2021-11-02 Thread Cy Schubert

In message <36923A7F-23DE-490D-B1FA-A8B064740BD6@unrelenting.technology>,
Greg V via freebsd-current writes:
> 
>
> On November 2, 2021 5:16:35 PM GMT+03:00, Michael Butler via freebsd-current 
>  wrote:
> >On current as of this morning (I haven't tried to bisect yet) ..
> >
> >  .. with either graphics/drm-devel-kmod or graphics/drm-current-kmod, 
> >trying to start a VirtualBox VM triggers this panic ..
> >
>
> >#16 0x80c81fc8 at calltrap+0x8
> >#17 0x808b4d69 at sysctl_kern_proc_pathname+0xc9
>
> something something https://reviews.freebsd.org/D32738 ? sysctl_kern_proc_pathname
> was touched recently there.
>
> (Also can someone commit https://reviews.freebsd.org/D30174 ? These
> warning-filled reports are unreadable >_<)

Usually the first thing to do with virtualbox is rebuild it. That usually 
fixes any panics I experience here. Of course, make sure your virtualbox 
ports subdirs are fully patched, as it's an opportune time to update it too.
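
A sketch of such a rebuild from an up-to-date ports tree, assuming the kernel module comes from the emulators/virtualbox-ose-kmod port under /usr/ports. The function name is made up; `clean`, `build`, `deinstall` and `install` are standard ports make targets.

```sh
#!/bin/sh
# rebuild_vbox_kmod [PORTDIR]: rebuild and reinstall the VirtualBox kernel
# module against the currently installed kernel sources.
# Hypothetical wrapper; the default port directory is an assumption.
rebuild_vbox_kmod() {
    port=${1:-/usr/ports/emulators/virtualbox-ose-kmod}
    cd "$port" || return 1
    make clean &&
    make build &&
    make deinstall &&
    make install
}
```

Building before deinstalling means a failed build leaves the old module in place. The freshly installed module only takes effect once it is reloaded, typically at the next boot.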


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.

Re: [HEADSUP] making /bin/sh the default shell for root

2021-09-29 Thread Cy Schubert

In message <20210922083645.4vnoajyvwq6wf...@aniel.nours.eu>, Baptiste
Daroussin writes:
> Hello,
>
> TL;DR: this is not a proposal to deorbit csh from base!!!
>
> For years now, csh is the default root shell for FreeBSD, csh can be confusing
> as a default shell for many as all other unix like settled on a bourne shell
> compatible interactive shell: zsh, bash, or variant of ksh.
>
> Recently our sh(1) has receive update to make it more user friendly in
> interactive mode:
> * command completion (thanks pstef@)
> * improvement in the emacs mode, to make it behave by default like other shells
> * improvement in the vi mode (in particular the vi edit to respect $EDITOR)
> * support for history as described by POSIX.
>
> This makes it a usable shell by default, which is why I would like to propose to
> make it the default shell for root starting FreeBSD 14.0-RELEASE (not MFCed)
>
> If no strong arguments has been raised until October 15th, I will make this
> proposal happen.
>
> Again just in case: THIS IS NOT A PROPOSAL TO REMOVE CSH FROM BASE!

Having used /bin/sh as my root shell on all my FreeBSD machines, here and 
at $JOB, except for only one, I feel this is perfectly reasonable.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.

Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-08 Thread Cy Schubert

In message <86ilzaoc0z@shiori.com.br>, Filipe da Silva Santos via
current writes:
> Hi, thank you for the support, and sorry for the late feedback.
>
> The procedure `wpa_poststart' didn't solve the regression on my system.
> wlan0 doesn't seem to come up.
>
> I have the following output on my log:
>
> | Sep  8 19:09:43 misaka wpa_supplicant[23325]: ioctl[SIOCS80211, op=103, val=0, arg_len=128]: Operation now in progress
> | Sep  8 19:09:43 misaka wpa_supplicant[23325]: wlan0: CTRL-EVENT-SCAN-FAILED ret=-1 retry=1
>
> Here is a sanitized version of my wpa_supplicant.conf:
>
> ctrl_interface=/var/run/wpa_supplicant
> eapol_version=1
> ap_scan=1
> fast_reauth=1
> country=BR
> network={
> ssid=""
> psk=""
> priority=5
> }
> [...]
>
> and rc.conf related settings:
>
> [...]
> ifconfig_wlan0="WPA powersave 10.0.0.110 netmask 0xff00 broadcast 10.0.0.255"
> defaultrouter="10.0.0.1"
>
> wlans_iwm0="wlan0"
> create_args_wlan0="country BR regdomain FCC"
> [...]
>
> The last fix still works, although `sleep' isn't necessary.
>
> @@ -29,6 +29,7 @@
>  }
>  
>  wpa_poststart() {
> + ifconfig ${ifn} down
>   ifconfig ${ifn} up
>  }
>

d06d7eb09131 has taken care of this.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.

Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-08 Thread Cy Schubert

Sorry for the breakage.




In message 
, Idwer Vollering writes:
> I can confirm this commit has addressed the wpa_supplicant 'breakage'.
> Thanks for fixing it.
>
> Op di 7 sep. 2021 om 19:37 schreef Cy Schubert :
> >
> > In message , David Wolfskill writes:
> >
> > > On Tue, Sep 07, 2021 at 10:13:23AM +0200, Jakob Alvermark wrote:
> > > > ...
> > > > wlan0 does not associate after boot. (This is with iwm, AC 9260)
> > > >
> > > > My workaround is simply 'ifconfig wlan0 up'.
> > > >
> > > > After a few seconds wpa_supplicant associates and another few seconds
> > > > later I have a DHCP IP address.
> > > >
> > >
> > > I just tried that (running main-n249159-bb61ccd530b7), and that (also)
> > > works for me -- in case that data point is of use.
> >
> > Hi,
> >
> > Commit 5fcdc19a8111 has addressed this.
> >
> >
> >
> >

Re: killall, symlinks, and signal delivery?

2021-09-07 Thread Cy Schubert

On September 7, 2021 3:42:53 PM PDT, Steve Kargl wrote:
>I have stumbled upon a quandary, which I hope someone
>can shed some light upon.  In my day job, I often
>generate a sequence of images and display these images
>with ImageMagick's display command.  From my csh prompt,
>a quick and dirty foreach() loop
>
>% foreach i (*.png)
>> display $i &
>> sleep 3
>> end
>
>Instead of moving the cursor to each image and hitting
>'q' to close the images.  I normally kill all of the
>processes at one time.  This used to work:
>
>% killall display
>
>Now I get, for example,
>
>% display z.miff &
>% killall display
>No matching processes belonging to you were found
>% ps -Ukargl | grep display
>19463  1  S0:00.02 display z.miff (magick)
>19465  1  S+   0:00.00 grep display
>% ls -l /usr/local/bin/display 
>lrwxr-xr-x  1 root  wheel  - 6 Jun  1 14:18 /usr/local/bin/display@ -> magick
>
>So, there are two possibilities:
>(1) display was once an independent program and not a
>symlink to magick.  Thus, killall just worked. Or,
>(2) killall no longer works because command associated
>with process 19463 is not really 'display' and the
>symlink isn't resolved to actually kill 'magick'.
>
>So, just checking (2), here.  Is this a change in behavior
>for FreeBSD?
>

It's likely your app is replacing its process name (argv[0]) with something else.
ps auxww may give you a hint as to what it is now.
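
A quick way to compare the two names, as a rough sketch only: the helper name proc_names is made up here, and it assumes a ps(1) that accepts the POSIX -o comm= and -o args= keywords. As far as I know killall matches on the command name rather than the full argument string, so if the two columns differ that would be consistent with the "No matching processes" message above.

```sh
# Hypothetical helper (not an existing tool): print the command name and
# the full argument vector for one PID so the two can be compared.
proc_names() {
    # comm= prints the command name, args= the command with its arguments.
    # The trailing '=' suppresses the column headers.
    ps -o comm= -o args= -p "$1"
}

# Example, using the PID from the ps output quoted above:
#   proc_names 19463
```

If the command name turns out to be magick, then killall magick would be the thing to try.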

-- 
Pardon the typos and autocorrect, small keyboard in use. 
Cy Schubert 
FreeBSD UNIX:  Web: https://www.FreeBSD.org

The need of the many outweighs the greed of the few.

Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-07 Thread Cy Schubert

In message , David Wolfskill writes:

> On Tue, Sep 07, 2021 at 10:13:23AM +0200, Jakob Alvermark wrote:
> > ...
> > wlan0 does not associate after boot. (This is with iwm, AC 9260)
> >
> > My workaround is simply 'ifconfig wlan0 up'.
> >
> > After a few seconds wpa_supplicant associates and another few seconds
> > later I have a DHCP IP address.
> >
>
> I just tried that (running main-n249159-bb61ccd530b7), and that (also)
> works for me -- in case that data point is of use.

Hi,

Commit 5fcdc19a8111 has addressed this.

Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-06 Thread Cy Schubert

In message <86tuix4cys@shiori.com.br>, Filipe da Silva Santos writes:
> > I'll have more questions later (need to start working on another job) but
> > I'd like to learn more about your configuration to understand why it works
> > at boot for myself and philip@ and not for you and the others here on
> > -current who have experienced the same issue. Understanding what triggers
> > this will go a long way to resolving it.
>
> Hello, Cy,
> I have an Intel AC 3168 and can reproduce both problem and solution.
>
> I'd love to help with testing and info with the new version.

Can you also try the security/wpa_supplicant and security/wpa_supplicant-devel
ports, both without the ifconfig mitigation patch? This should confirm
that this is an upstream problem and not in the FreeBSD Makefiles.
This would be of great help, as I cannot reproduce the problem at boot, only
after boot using service netif (which the old wpa_supplicant 2.9 also had).

An additional confirmation that the -devel port has the same problem would 
help a lot.


Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-06 Thread Cy Schubert

In message <2780735.ssxfcku...@sigill.theweb.org.ua>, "Oleg V. Nauman" 
writes:
> On 2021 M09 6, Mon 20:31:33 EEST Cy Schubert wrote:
> > One last favour to ask, can you try this with the wpa_supplicant-devel
> > port, please? I'm trying to narrow down if this is related to the options
> > in usr.sbin/wpa/Makefile.inc or an upstream problem. If this behaves the
> > same using wpa_supplicant-devel, this tells me to look at the code instead
> > of Makefiles.
> > 
> > I can reproduce the service netif restart problem using the old
> > wpa_supplicant 2.9, so at least here there is no change in behaviour.
> > Though on my sandbox machine the ifconfig down/up is not required -- though
> > even the older wpa_supplicant 2.9 behaves the same on my laptop, (no
> > regression experienced here).
> > 
> > To help point to either Makefile.inc or contrib/wpa, can you please try the
> > wpa_supplicant-devel port. This will tell me where to look next.
>
> I can confirm that wpa_supplicant from the security/wpa_supplicant-devel port
> demonstrates the same behavior as wpa_supplicant from base: "ifconfig wlan0
> down ; sleep 5 ; ifconfig wlan0 up" mitigates the wlan association issue.

Thank you.

This is an issue that I'll need to chase down with our upstream. In the
meantime, while I work on this and bring it to upstream's attention, this
should circumvent the issue:

diff --git a/libexec/rc/rc.d/wpa_supplicant b/libexec/rc/rc.d/wpa_supplicant
index 8a86fec90e4d..cfe5f1ab27c6 100755
--- a/libexec/rc/rc.d/wpa_supplicant
+++ b/libexec/rc/rc.d/wpa_supplicant
@@ -12,6 +12,7 @@
 
 name="wpa_supplicant"
 desc="WPA/802.11i Supplicant for wireless network devices"
+start_postcmd="wpa_poststart"
 rcvar=
 
 ifn="$2"
@@ -27,6 +28,12 @@ is_ndis_interface()
 	esac
 }
 
+wpa_poststart() {
+   ifconfig ${ifn} down
+   sleep 3
+   ifconfig ${ifn} up
+}
+
 if is_wired_interface ${ifn} ; then
 	driver="wired"
 elif is_ndis_interface ${ifn} ; then
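
For trying the same down/up cycle by hand, outside the rc script, something along these lines should do. This is only a sketch: the function name bounce_iface and the optional delay argument are made up for illustration and are not part of the patch.

```sh
# Hypothetical helper mirroring wpa_poststart in the patch above: take the
# interface down, wait, then bring it back up. Needs root when run for real.
bounce_iface() {
    ifn="$1"
    delay="${2:-3}"    # seconds between down and up; 3 matches the patch
    ifconfig "$ifn" down || return 1
    sleep "$delay"
    ifconfig "$ifn" up
}

# Example:
#   bounce_iface wlan0      # 3 second pause
#   bounce_iface wlan0 0    # no pause
```

Passing 0 as the delay is a cheap way to check whether the pause matters at all on a given machine.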

I'll have more questions later (need to start working on another job) but 
I'd like to learn more about your configuration to understand why it works 
at boot for myself and philip@ and not for you and the others here on 
-current who have experienced the same issue. Understanding what triggers 
this will go a long way to resolving it.

(cc'd philip@)

BTW, my laptop is configured so that wlan0 (iwn0) and bge0 are members of 
lagg0. Whereas on my sandbox wlan0 (ath0) is used directly.



Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-06 Thread Cy Schubert

One last favour to ask, can you try this with the wpa_supplicant-devel 
port, please? I'm trying to narrow down if this is related to the options 
in usr.sbin/wpa/Makefile.inc or an upstream problem. If this behaves the 
same using wpa_supplicant-devel, this tells me to look at the code instead 
of Makefiles.

I can reproduce the service netif restart problem using the old
wpa_supplicant 2.9, so at least here there is no change in behaviour.
On my sandbox machine the ifconfig down/up is not required, though
even the older wpa_supplicant 2.9 behaves the same on my laptop (no
regression experienced here).

To help point to either Makefile.inc or contrib/wpa, can you please try the
wpa_supplicant-devel port? This will tell me where to look next.

Fifteen seconds isn't needed. Two or three, even no wait, will do.


In message <3000346.zmr5pbt...@sigill.theweb.org.ua>, "Oleg V. Nauman" 
writes:
> On 2021 M09 6, Mon 18:41:13 EEST Cy Schubert wrote:
> > I changed mine to be the same as yours. I can connect. (I use iwn(4) and
> > ath(4) here.)
>
>  a ) regular reboot - wlan can not associate
>  b ) service netif restart - wlan can not associate
>  c ) service netif stop wlan0 ; service netif start wlan0 - wlan can not 
> associate
>  d ) ifconfig wlan0 down; sleep 15 ; ifconfig wlan0 up - wlan associated
>  e ) regular reboot with ifconfig wlan0 down; sleep 15 ; ifconfig wlan0 up 
> added to /etc/rc.local - wlan associated
>
> Thank you.
>
> > 
> > Do you reboot every time you test or simply this?
> > 
> > service netif stop wlan0
> > service netif start wlan0
> > 
> > If simply above, does a reboot have it work again?
> > 
> > The reason I ask is, I discovered, today, a quirk in 14-CURRENT, regardless
> > of the wpa_supplicant installed. It will always associate following a
> > reboot however when running the above two commands to stop and start wlan0
> > I can reproduce your problem. The workaround for now is when running the
> > above two commands to also ifconfig wlan0 down; ifconfig wlan0 up.
> > 
> > Can you try ifconfig wlan0 down; ifconfig wlan0 up after stopping/starting
> > wlan0? You may need to wait 2-3 seconds between down and up.
>
>

Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-06 Thread Cy Schubert

I changed mine to be the same as yours. I can connect. (I use iwn(4) and 
ath(4) here.)

Do you reboot every time you test, or simply run this?

service netif stop wlan0
service netif start wlan0

If simply the above, does a reboot make it work again?

The reason I ask is that I discovered today a quirk in 14-CURRENT, regardless
of the wpa_supplicant installed. It will always associate following a
reboot; however, when running the above two commands to stop and start wlan0
I can reproduce your problem. The workaround for now is, when running the
above two commands, to also run ifconfig wlan0 down; ifconfig wlan0 up.

Can you try ifconfig wlan0 down; ifconfig wlan0 up after stopping/starting 
wlan0? You may need to wait 2-3 seconds between down and up.

If this occurs at boot, try the ifconfig down and up anyway (to help narrow 
down the problem).






In message 
, Idwer Vollering writes:
> There's no core dump in /, wpa_supplicant connects to 802.11b/g/n
> (there's no way to lock this, instead of having a mix of standards) on
> 2,4GHz.
>
> /etc/wpa_supplicant.conf:
> network={
> ssid="some ssid"
> scan_ssid=1
> key_mgmt=WPA-PSK
> psk="some key"
> }
>
> Op ma 6 sep. 2021 om 15:23 schreef Cy Schubert :
> >
> > In message , Idwer Vollering writes:
> > > Op ma 6 sep. 2021 om 07:53 schreef Cy Schubert :
> > > >
> > > > In message <2838567.hhqauc6...@sigill.theweb.org.ua>, "Oleg V. Nauman"
> > > > writes:
> > > > > On 2021 M09 5, Sun 15:52:50 EEST David Wolfskill wrote:
> > > > > > > Sorry I hadn't noticed this yesterday (so I could have reported it
> > > > > > > then), but after updating the "head" slice of my laptop from:
> > > > > > >
> > > > > > > FreeBSD g1-51.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #340
> > > > > > > main-n249128-a0c64a443e4c: Fri Sep  3 04:06:12 PDT 2021
> > > > > > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > > > > > > amd64 1400032 1400032
> > > > > > >
> > > > > > > to:
> > > > > > >
> > > > > > > FreeBSD g1-51.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #341
> > > > > > > main-n249146-cb5c07649aa0: Sat Sep  4 04:28:27 PDT 2021
> > > > > > > r...@g1-51.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > > > > > > amd64 1400032 1400032
> > > > > > >
> > > > > > > I find that while the em0 NIC still works, wlan0 (iwn(4) HW) does not:
> > > > > > > the WLAN LED doesn't light up.
> > > > > >
> > > > > >  I am also experiencing issues with wlan after my current update to
> > > > > > 1f7a6325fe1b. I have checked ath(4) , run(4), rtwn(4) and all of them
> > > > > > demonstrating the same behavior  - wlan can not associate.
> > > > > > You can mitigate it by using security/wpa_supplicant from ports as replacement
> > > > > > of wpa_supplicant in base.
> > > > > >
> > > > > > .
> > > > > > >
> > > > > > > I note that exactly the same hardware works OK in stable/12 and stable/13.
> > > > > > >
> > > > > > > Peace,
> > > > > > > david
> > > > > >
> > > > >
> > > > > Can you grep wpa_supplicant in /var/log/messages? This will give us a clue.
> > > >
> > > > wpa_supplicant stops in wpa_driver_bsd_scan() -
> > > > https://github.com/freebsd/freebsd-src/blob/bd452dcbede69b1862c769f244948f94b86448b5/contrib/wpa/src/drivers/driver_bsd.c#L1315
> > >
> > > Here's some selected output from /var/log/messages.
> > >
> > > Before (built from commit a0c64a443e4cae67a5eea3a61a47d746866de3ee):
> > >
> > > Sep  6 13:29:40  wpa_supplicant[45348]: Successfully
> > > initialized wpa_supplicant
> > > Sep  6 13:29:40  wpa_supplicant[45348]: ioctl[SIOCS80211,
> > > op=20, val=0, arg_len=7]: Invalid argument
> > > Sep  6 13:29:40  syslogd: last message repeated 1 times
> > > Sep  6 13:29:46  wpa_supplicant[45349]: wlan1: Trying to
> > > associate with  (SSID='' freq=2447 MHz)
> > > Sep  6 13:29:46  wpa_supplicant[45349]: Failed to add
> > > supported

Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-06 Thread Cy Schubert

In message 
, Idwer Vollering writes:
> Op ma 6 sep. 2021 om 07:53 schreef Cy Schubert :
> >
> > In message <2838567.hhqauc6...@sigill.theweb.org.ua>, "Oleg V. Nauman"
> > writes:
> > > On 2021 M09 5, Sun 15:52:50 EEST David Wolfskill wrote:
> > > > Sorry I hadn't noticed this yesterday (so I could have reported it
> > > > then), but after updating the "head" slice of my laptop from:
> > > >
> > > > FreeBSD g1-51.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #340
> > > > main-n249128-a0c64a443e4c: Fri Sep  3 04:06:12 PDT 2021
> > > > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > > > amd64 1400032 1400032
> > > >
> > > > to:
> > > >
> > > > FreeBSD g1-51.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #341
> > > > main-n249146-cb5c07649aa0: Sat Sep  4 04:28:27 PDT 2021
> > > > r...@g1-51.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY
> > > > amd64 1400032 1400032
> > > >
> > > > I find that while the em0 NIC still works, wlan0 (iwn(4) HW) does not:
> > > > the WLAN LED doesn't light up.
> > >
> > >  I am also experiencing issues with wlan after my current update to
> > > 1f7a6325fe1b. I have checked ath(4) , run(4), rtwn(4) and all of them
> > > demonstrating the same behavior  - wlan can not associate.
> > > You can mitigate it by using security/wpa_supplicant from ports as replacement
> > > of wpa_supplicant in base.
> > >
> > > .
> > > >
> > > > I note that exactly the same hardware works OK in stable/12 and stable/13.
> > > >
> > > > Peace,
> > > > david
> > >
> >
> > Can you grep wpa_supplicant in /var/log/messages? This will give us a clue.
>
> wpa_supplicant stops in wpa_driver_bsd_scan() -
> https://github.com/freebsd/freebsd-src/blob/bd452dcbede69b1862c769f244948f94b86448b5/contrib/wpa/src/drivers/driver_bsd.c#L1315
>
> Here's some selected output from /var/log/messages.
>
> Before (built from commit a0c64a443e4cae67a5eea3a61a47d746866de3ee):
>
> Sep  6 13:29:40  wpa_supplicant[45348]: Successfully
> initialized wpa_supplicant
> Sep  6 13:29:40  wpa_supplicant[45348]: ioctl[SIOCS80211,
> op=20, val=0, arg_len=7]: Invalid argument
> Sep  6 13:29:40  syslogd: last message repeated 1 times
> Sep  6 13:29:46  wpa_supplicant[45349]: wlan1: Trying to
> associate with  (SSID='' freq=2447 MHz)
> Sep  6 13:29:46  wpa_supplicant[45349]: Failed to add
> supported operating classes IE
> Sep  6 13:29:46  kernel: wlan1: link state changed to UP
> Sep  6 13:29:46  wpa_supplicant[45349]: wlan1: Associated with  >
> Sep  6 13:29:46  dhclient[45401]: send_packet: No buffer
> space available
> Sep  6 13:29:46  wpa_supplicant[45349]: wlan1: WPA: Key
> negotiation completed with  [PTK=CCMP GTK=CCMP]
> Sep  6 13:29:46  wpa_supplicant[45349]: wlan1:
> CTRL-EVENT-CONNECTED - Connection to  completed [id=0 id_str=]
>
> After (built from main):
>
> Sep  6 12:19:50  wpa_supplicant[1236]: Successfully
> initialized wpa_supplicant
> Sep  6 12:19:50  kernel: wlan1: Ethernet address: 
> Sep  6 12:19:50  wpa_supplicant[1236]: ioctl[SIOCS80211,
> op=20, val=0, arg_len=7]: Invalid argument
> Sep  6 12:19:50  syslogd: last message repeated 1 times
> Sep  6 12:19:50  wpa_supplicant[1237]: wlan1:
> CTRL-EVENT-SCAN-FAILED ret=-1 retry=1

Is there a wpa_supplicant.core dump in / ?

Can you also send me a sanitized copy of wpa_supplicant.conf, please? I'm 
interested in the lines proto=, key_mgmt=, pairwise=, group=, eap=, and 
phase2=. You may not be using eap= or phase2=, which is fine. I'd like to 
see if there are any differences from what was tested. Though, looking at 
your outputs above you're probably using something like:

proto=RSN WPA
key_mgmt=WPA-PSK
pairwise=CCMP
group=CCMP

Is this correct?
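
If it is easier than sanitizing the whole file, a filter like this would pull out just the lines I'm after. It is a rough sketch; the function name wpa_conf_summary is made up, and the path is whatever your wpa_supplicant.conf happens to be.

```sh
# Hypothetical helper: print only the negotiation-related settings from a
# wpa_supplicant.conf. Lines such as ssid= and psk= are not matched, so
# nothing secret ends up in the output.
wpa_conf_summary() {
    grep -E '^[[:space:]]*(proto|key_mgmt|pairwise|group|eap|phase2)=' "$1"
}

# Example:
#   wpa_conf_summary /etc/wpa_supplicant.conf
```

No output simply means none of those settings are present and the defaults are in use.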

If you try ports/security/wpa_supplicant-devel (same codebase as in 
14-CURRENT), does it work? (ports/security/wpa_supplicant is the old 2.9 
codebase.)

What is your AP set for? 802.11g, 802.11n, 802.11ac?



Re: wlan0 no longer functional after n249128-a0c64a443e4c -> n249146-cb5c07649aa0

2021-09-05 Thread Cy Schubert

In message <2838567.hhqauc6...@sigill.theweb.org.ua>, "Oleg V. Nauman" 
writes:
> On 2021 M09 5, Sun 15:52:50 EEST David Wolfskill wrote:
> > Sorry I hadn't noticed this yesterday (so I could have reported it
> > then), but after updating the "head" slice of my laptop from:
> > 
> > FreeBSD g1-51.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #340
> > main-n249128-a0c64a443e4c: Fri Sep  3 04:06:12 PDT 2021
> > r...@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY 
> > amd64 1400032 1400032
> > 
> > to:
> > 
> > FreeBSD g1-51.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #341
> > main-n249146-cb5c07649aa0: Sat Sep  4 04:28:27 PDT 2021
> > r...@g1-51.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY 
> > amd64 1400032 1400032
> > 
> > I find that while the em0 NIC still works, wlan0 (iwn(4) HW) does not:
> > the WLAN LED doesn't light up.
>
>  I am also experiencing issues with wlan after my current update to 
> 1f7a6325fe1b. I have checked ath(4) , run(4), rtwn(4) and all of them 
> demonstrating the same behavior  - wlan can not associate.
> You can mitigate it by using security/wpa_supplicant from ports as replacement
> of wpa_supplicant in base.
>
> .
> > 
> > I note that exactly the same hardware works OK in stable/12 and stable/13.
> > 
> > Peace,
> > david
>

Can you grep wpa_supplicant in /var/log/messages? This will give us a clue.
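
Something like the following would do, as a minimal sketch. The helper name wpa_log_tail and the default of 20 lines are arbitrary choices for illustration; the log path defaults to /var/log/messages as mentioned above.

```sh
# Hypothetical helper: show the most recent wpa_supplicant lines from a log.
wpa_log_tail() {
    log="${1:-/var/log/messages}"   # log file to search
    count="${2:-20}"                # how many matching lines to keep
    grep 'wpa_supplicant' "$log" | tail -n "$count"
}

# Example:
#   wpa_log_tail                       # last 20 matches in /var/log/messages
#   wpa_log_tail /var/log/messages 50  # last 50 matches
```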



Re: drm-kmod kernel crash fatal trap 12

2021-06-10 Thread Cy Schubert

In message <4894bd36-92bd-596e-cc18-cd3e6aafe...@selasky.org>, Hans Petter 
Selasky writes:
> On 6/9/21 4:43 PM, Thomas Laus wrote:
> > I updated my system this morning to main-n247260-dc318a4ffab June 9 2012
> > and the first boot after the kernel was loaded I received:
> > 
> > 'fatal trap 12' fault virtual address = 0x0
> > fault code = supervisor write data, page not present
> > instruction pointer = 0x20:0x82fc3d1b
> > stack pointer = 0x28:0xfe011aea3330
> > frame pointer = 0x28:0xfe011aea3370
> > code segment = base 0x0 limit 0x, type 0x1b
> > DPL 0,pres 1, long 1, def 32 0, gran 1
> > processor eflags = interrupt enabled, resume, IOPL = 0
> > current process = 1187 (kldload)
> > trap number = 12
> > 
> > I hand copied the screen display since I was not able to generate a
> > crash dump to /var/crash on a zfs file system.
> > 
> > I am rebuilding the GENERIC kernel since the crash was using the NODEBUG
> > version.  This is 100 percent repeatable.
> > 
> > Tom
> > 
>
> Make sure you also re-build the drm-kmod module.

And while you're at it, update your copy of the drm-* port to the latest.



Re: wpa_supplicant: SIGBUS after main-n247052-d40cd26a86a7 -> main-n247092-ec7b47fc81b2

2021-06-01 Thread Cy Schubert

In message <72a5e40f-c973-473c-b2a4-acdd28685...@yahoo.com>, Mark Millard 
writes:
> Cy Schubert  wrote on:
> Date: Tue, 01 Jun 2021 14:02:06 -0700 :
>
> > Can you provide me with a backtrace, using the bt command, please.
>
>
> That was in the original message from David W. A copy was
> in the reply that you sent to the list as well:
>
> > > (gdb) bt
> > > #0  0x010fb34f in wpa_sm_rx_eapol ()
> > > #1  0x010f3afe in l2_packet_receive ()
> > > #2  0x01122ef3 in eloop_run ()
> > > #3  0x010b44a8 in wpa_supplicant_run ()
> > > #4  0x0109fdec in main ()
>
> But it also had this report about the context:
>
> > > (No debugging symbols found in /usr/obj/usr/src/amd64.amd64/usr.sbin/wpa/wpa_supplicant/wpa_supplicant)
>
>
> So it was apparently a non-debug build without symbols, limiting
> the information that is available.

Correct. We have debug symbols now and are chasing it down. I suspect a 
static function address in a structure may be incorrect.



Re: wpa_supplicant: SIGBUS after main-n247052-d40cd26a86a7 -> main-n247092-ec7b47fc81b2

2021-06-01 Thread Cy Schubert

Can you provide me with a backtrace, using the bt command, please.
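
If you'd rather not do it interactively, this should print the backtrace in one go. It is a sketch only: the function name core_backtrace is made up, and the core file path in the example is an assumption (use wherever your core actually landed). The gdb options used are -batch and -ex.

```sh
# Hypothetical helper: print a backtrace from a core file non-interactively.
core_backtrace() {
    # $1 is the executable, $2 is the core file.
    # -batch makes gdb exit after running its commands; -ex bt runs "bt".
    gdb -batch -ex bt "$1" "$2"
}

# Example (core path is an assumption):
#   core_backtrace /usr/sbin/wpa_supplicant /wpa_supplicant.core
```

A binary built with debug symbols will give far more useful frames than a stripped one.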





In message , David Wolfskill writes:
> Reading symbols from /usr/obj/usr/src/amd64.amd64/usr.sbin/wpa/wpa_supplicant/wpa_supplicant...
> (No debugging symbols found in /usr/obj/usr/src/amd64.amd64/usr.sbin/wpa/wpa_supplicant/wpa_supplicant)
> [New LWP 100168]
> Core was generated by `/usr/sbin/wpa_supplicant -s -B -i wlan0 -c /etc/wpa_supplicant.conf -D bsd -P /v'.
> Program terminated with signal SIGBUS, Bus error.
> --Type  for more, q to quit, c to continue without paging--
> #0  0x010fb34f in wpa_sm_rx_eapol ()
> (gdb) bt
> #0  0x010fb34f in wpa_sm_rx_eapol ()
> #1  0x010f3afe in l2_packet_receive ()
> #2  0x01122ef3 in eloop_run ()
> #3  0x010b44a8 in wpa_supplicant_run ()
> #4  0x0109fdec in main ()
> (gdb)
>
> wlan0 is an iwn(4) device, in this case.  Not yet sure how reproducible
> this is, but wpa_supplicant's issue(s) do not (yet) seem to prevent the
> machine from using the network (as I'm typing on the laptop's keyboard
> to write this).
>
> uname strings: yesterday:
>
> FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #258 main-n247052-d40cd26a86a7: Mon May 31 05:48:18 PDT 2021 root@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY  amd64 1400018 1400018
>
> today:
>
> FreeBSD g1-55.catwhisker.org 14.0-CURRENT FreeBSD 14.0-CURRENT #259 main-n247092-ec7b47fc81b2: Tue Jun  1 04:49:26 PDT 2021 root@g1-55.catwhisker.org:/common/S4/obj/usr/src/amd64.amd64/sys/CANARY  amd64 1400018 1400018
>
> (Though the laptop did just lose connectivity; checking /var/log/messages:
>
> <6>1 2021-06-01T12:06:15.336865+00:00 g1-55.catwhisker.org kernel - - - wlan0: link state changed to DOWN
> <2>1 2021-06-01T12:09:26.811751+00:00 g1-55.catwhisker.org kernel - - - if_delmulti_locked: detaching ifnet instance 0xf800126d6800
> <2>1 2021-06-01T12:09:26.811773+00:00 g1-55.catwhisker.org syslogd - - - last message repeated 5 times
> <6>1 2021-06-01T12:09:26.811774+00:00 g1-55.catwhisker.org kernel - - - lo0: link state changed to DOWN
> <27>1 2021-06-01T12:09:27.317032+00:00 g1-55.catwhisker.org dhclient 441 - - My address (172.17.1.55) was deleted, dhclient exiting
> <2>1 2021-06-01T12:09:27.317474+00:00 g1-55.catwhisker.org kernel - - - if_delmulti_locked: detaching ifnet instance 0xf800129a1800
>
> I tried "sudo service netif restart" and that brought the connection back
> (for now, anyway).
>
> As the laptop is a machine that I connect to networks I do not
> control, it uses packet filtering (ipfw, which I've been using since
> Whistle Communications, ca. 1998).
>
> The build typescript will be up at
> https://www.catwhisker.org/~david/FreeBSD/history/laptop.14_build_typescript.txt
> shortly.
>
> Peace,
> david
> -- 
> David H. Wolfskill  da...@catwhisker.org
> Claiming that Donald Trump won the 2020 election is the opposite of
> patriotism.  Make of that what you will.
>
> See https://www.catwhisker.org/~david/publickey.gpg for my public key.
>

Re: boot loader blank screen

2021-01-06 Thread Cy Schubert

In message <63a29589-22b5-495b-8e0d-14e13091d...@yahoo.com>, Mark Millard 
writes:
> 
>
> On 2021-Jan-5, at 17:54, David Wolfskill  wrote:
>
> > On Wed, Jan 06, 2021 at 12:46:08AM +0200, Toomas Soome wrote:
> >> ... 
> >>> the 58661b3ba9eb should hopefully fix the loader text mode issue, it would be cool if you can verify:)
> >>> 
> >>> thanks,
> >>> toomas
> >> 
> >> I think, I got it fixed (at least idwer did confirm for his system, thanks). If you can test this patch: http://148-52-235-80.sta.estpak.ee/0001-loader-rewrite-font-install.patch it would be really nice.
> >> 
> >> thanks,
> >> toomas
> > 
> > I tested with each of the following "stanzas' in /boot/loader.conf,
> > using vt (vs. syscons) in each case (though that breaks video reset
> > on resume after suspend):
> > 
> > . . .
>
> I've done no experiments with an explicit vbe_max_resolution
> setting. My context for hw.vga.textmode="0" shows up as
> 1920x1200. (I do have the font for this set to 8x16, making
> for lots of character cells across and down.)
>
>
> For the below I do not have hw.vga.textmode="0".
>
> > # hw.vga.textmode="0"

textmode=1 doesn't work either. Been using it for years and this is the 
first time it's borked.

> > # vbe_max_resolution=1280x800
> > 
> > (That is, not specifying anything for hw.vga.textmode or
> > vbe_max_resolution.)
> > 
> > This boots OK, but I see no kernel probe messages or single- to
> > multi-user mode messages.  I can use (e.g.) Ctl+Alt+F2 to switch to
> > vty1, see a "login: " prompt, and that (also) works.  (This is the
> > initial symptom I had reported.)
>
> So I tried commenting out hw.vga.textmode="1" and I saw everything
> I expected in my context. Whiteish on black background (or at least
> something very dark). I did not take videos to do detailed
> inspections.

Didn't work for me. Then again my old eyes didn't detect much difference in 
contrast.

>
>
> > hw.vga.textmode="1"
> > # vbe_max_resolution=1280x800
> > 
> > This works -- boots OK, and I see kernel probe () messages; this is a
> > text console (mostly blue text; some white, against a dark background.
> > It's a medium-light blue, so it's easy enough to read (unlike a navy
> > blue, for example).
>
> FYI: whiteish on black (or at least something very dark)
> in my hw.vga.textmode="1" context. I saw everything here
> as well. I did not take videos to do detailed inspections.
>
> I did not notice any way to tell hw.vga.textmode="1" from
> having no hw.vga.textmode assignment at all. But, again,
> I did not set up for an after-the-fact detailed review of
> what is displayed.

Everything becomes normal when X starts, except for the three machines 
downstairs which don't use X.




___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"

Re: firewall choice

2020-11-27 Thread Cy Schubert

In message 
, grarpamp writes:
> >>> What's the "best" [1] choice for firewalling these days
> >>> There's pf, ipf and ipfw.
> >>
> >>This question comes up over years.
> >>
> >>Consider starting and joining with people to create
> >>a comparison page on the FreeBSD Wiki,
> >>both a feature / capability comparison table,
> >>and contextual paragraphs.
> >>A mini project like that can help many users
> >>and add their researches to it.
> >
> > I'd be happy to if I knew where to start/how to start/is there a guide.
>
> Starting a wiki is here...
> https://wiki.freebsd.org/
> https://wiki.freebsd.org/AboutWiki
>
> Which falls under larger handbook doc area...
> https://lists.freebsd.org/mailman/listinfo/freebsd-doc
>
> Much of comparison would pull from man pages.
>
> Could also come from posting a call for input / announce
> to questions, hackers, forum, etc.
>
> Wiki should not duplicate admin info from here...
> https://www.freebsd.org/doc/en_US.ISO8859-1/books/handbook/firewalls.html
> But would cover this handbook bullet item that is
> not actually covered in the handbook (which
> could link out to the wiki page for that)...
> "- The differences between the firewalls built into FreeBSD."
>
> A full comparison would also want to note and point to
> upstream sources, and have a table of which filter systems
> are supported going forward in each unix OS (the *BSD
> flavors including DragonFly ipfw3 pf, Linux netfilter+nftables,
> Illumos).

pf was originally written when Darren Reed took a job at Sun. He changed 
the license on ipf at the time. FreeBSD moved it (and other software) to 
contrib, as did NetBSD (in their own way). OpenBSD wrote pf in the space of 
a week in reaction to the license change.

>
> And cover layer2 capabilities, switching, bridging, ipv6,
> nat, rate limits / shape / queue, proxy, arbitrary rewriting
> and routing hooks, etc.
>
> NetBSD where ipf was last released has deprecated
> both ipf and pf in favor of npf. While upstream devel and
> maintenance on ipf has died, pf still lives on at OpenBSD.

It's hardly deprecated in NetBSD. Christos Zoulas and I have exchanged a 
fair bit of code.

Darren Reed released and maintained IPF through the Australian National 
University. NetBSD imported it, like we do here at FreeBSD, into their src 
tree.

>
> Anyone can start. Have fun.

My ipf work is documented at https://wiki.freebsd.org/IPFilter.



-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.




Re: svn.freebsd.org

2020-11-11 Thread Cy Schubert

In message <20201112.042716.381474736225590586.y...@utahime.org>, 
Yasuhiro KIMURA writes:
> At first, lagging has disappeared now.
>
> From: "Bjoern A. Zeeb" 
> Subject: Re: svn.freebsd.org
> Date: Wed, 11 Nov 2020 19:12:06 +
>
> > svn.freebsd.org is geolocated imho; so unless you'll tell people to
> > which IPv6/IPv4 address you are connecting it'll be harder to track
> > this down if it is not all mirrors.
>
> I use 192.50.199.249. But svnweb.freebsd.org had also been lagging. So
> It doesn't seem the problem was specific to one mirror.

It's happened a few times this week. Last time was yesterday morning PDT. I 
didn't notice the exact time though. And correct, it's not happening now. 
It updated at approximately 1130U (that's 1930Z).

I also noticed that the github repo via the website listed the same commit 
as its latest as well.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.



Re: svn.freebsd.org

2020-11-11 Thread Cy Schubert

In message <0398cede-609f-4789-b056-2809712f9...@lists.zabbadoz.net>, 
"Bjoern A. Zeeb" writes:
> On 11 Nov 2020, at 18:47, Yasuhiro KIMURA wrote:
>
> > From: Cy Schubert 
> > Subject: svn.freebsd.org
> > Date: Wed, 11 Nov 2020 10:20:55 -0800
> >
> >> I've noticed that svn.freebsd.org has been lagging with commits from
> >> repo.freebsd.org. Is this a change or is there something broken? (I use
> >> svn.freebsd.org as the source of truth at $JOB.)
> >>
> >> At the moment svn.freebsd.org is at r367589 while repo.freebsd.org is at
> >> r367596.
> >
> > Not only src but also ports has been lagging. Currently
> > https://svn.freebsd.org/ports/ is r554896 but I received commit
> > message of r554908 from svn-ports-all ML.
> >
> > Also this is not first time. Though I can't remember exactly when,
> > similar situation happened within a week.
>
> svn.freebsd.org is geolocated imho; so unless you'll tell people to 
> which IPv6/IPv4 address you are connecting it'll be harder to track 
> this down if it is not all mirrors.

slippy$ nslookup svn.freebsd.org
Server:         127.0.0.1
Address:        127.0.0.1#53

Non-authoritative answer:
svn.freebsd.org canonical name = svnmir.geo.freebsd.org.
Name:   svnmir.geo.freebsd.org
Address: 96.47.72.69
Name:   svnmir.geo.freebsd.org
Address: 2610:1c1:1:606c::e6a:0

slippy$ 


Located on West Coast Canada.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.



svn.freebsd.org

2020-11-11 Thread Cy Schubert

I've noticed that svn.freebsd.org has been lagging with commits from 
repo.freebsd.org. Is this a change or is there something broken? (I use 
svn.freebsd.org as the source of truth at $JOB.)

At the moment svn.freebsd.org is at r367589 while repo.freebsd.org is at 
r367596.
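
The size of the lag follows directly from the two revision numbers. A
minimal sketch, assuming a hypothetical helper named rev_lag; the svn URLs
in the trailing comment are assumptions for illustration, not something
stated in this message:

```sh
#!/bin/sh
# rev_lag is a hypothetical helper: it reports how many revisions the
# mirror ($1) is behind the master ($2). A leading "r" is accepted.
rev_lag() {
    echo $(( ${2#r} - ${1#r} ))
}

# The two revisions quoted above.
rev_lag r367589 r367596

# With network access, the revisions could be fetched along these lines
# (Subversion 1.9 or later; both URLs are assumed):
#   svn info --show-item revision https://svn.freebsd.org/base/head
#   svn info --show-item revision svn+ssh://repo.freebsd.org/base/head
```

For the revisions above this reports a lag of 7 commits.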


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.



Re: CURRENT failing at contrib/unbound/util/config_file.c:122:20

2020-11-07 Thread Cy Schubert

In message , Pete Wright writes:
> wondering if anyone else is having this error building CURRENT today:
>
>
> --- config_file.o ---
> /usr/home/pete/git/freebsd/contrib/unbound/util/config_file.c:122:20: 
> error: use of undeclared identifier 'UNBOUND_DNS_OVER_HTTPS_PORT'
>         cfg->https_port = UNBOUND_DNS_OVER_HTTPS_PORT;
>                           ^
> 1 error generated.
> --- all_subdir_lib/ncurses ---
>
>
> my last commit from the github mirror is:
>
> commit efb48d58bee75fdb221adece8ef5a13cede99e8c (HEAD -> master, 
> origin/master, origin/HEAD)
> Author: tuexen 
> Date:   Sat Nov 7 21:17:49 2020 +
>
>     The ioctl() calls using FIONREAD, FIONWRITE, FIONSPACE, and SIOCATMARK
>     access the socket send or receive buffer. This is not possible for
>     listening sockets since r319722.
>     Because send()/recv() calls fail on listening sockets, fail also ioctl()
>     indicating EINVAL.
>
> so not sure if it's been found or if this is a real issue.

No such problem here.

What do you see on line 1397 of /usr/src/usr.sbin/unbound/config.h?

Also, uname -a, please.

And, git status usr.sbin/unbound, looking for local mods. Your cwd will 
need to be the root of your git tree.
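
The three checks above can be collected into one short script. This is
only a sketch: the SRC variable, its /usr/src default, and the show_line
helper are assumptions made for illustration. Point SRC at your own
checkout if it lives elsewhere.

```sh
#!/bin/sh
# Sketch of the checks requested above. SRC and show_line are
# illustrative names, not part of the FreeBSD build system.
SRC="${SRC:-/usr/src}"

# Print line $1 of file $2.
show_line() {
    sed -n "${1}p" "$2"
}

cfg="$SRC/usr.sbin/unbound/config.h"
if [ -r "$cfg" ]; then
    # Line 1397 is the line asked about above.
    show_line 1397 "$cfg"
    # Also search by name, in case the line number has shifted.
    grep -n 'UNBOUND_DNS_OVER_HTTPS_PORT' "$cfg" ||
        echo "UNBOUND_DNS_OVER_HTTPS_PORT not found in $cfg"
fi

# Report the running kernel.
uname -a

# Look for local modifications; git must run from the root of the tree.
if [ -d "$SRC/.git" ]; then
    (cd "$SRC" && git status usr.sbin/unbound)
fi
```

If the grep finds no definition, the config.h in the tree does not match
the unbound sources being compiled, which would be consistent with the
undeclared-identifier error quoted above.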


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.






Re: OpenZFS: kldload zfs.ko freezes on i386 4GB memory

2020-10-30 Thread Cy Schubert

In message 
, Matthew Macy writes:
> On Fri, Oct 30, 2020 at 4:50 PM Cy Schubert wrote:
> >
> > In message <20201030233138.gd34...@zxy.spb.ru>, Slawa Olhovchenkov writes:
> > > On Fri, Oct 30, 2020 at 04:00:55PM -0700, Cy Schubert wrote:
> > >
> > > > > > > More stresses memory usually refers to performance penalty.
> > > > > > > Usually way for better performance is reduce memory access.
> > > > > >
> > > > > > The reason filesystems (UFS, ZFS, EXT4, etc.) cache is to avoid disk
> > > > > > accesses. Nanoseconds vs milliseconds.
> > > > >
> > > > > I mean compared ZoL ZFS ARC vs old (BSD/Opensolaris/Illumos) ZFS ARC.
> > > > > Any reason to rise ARC hit rate in ZoL case?
> > > >
> > > > That's what hit rate is. It's a memory access instead of a disk access.
> > > > That's what you want.
> > >
> > > Is ZoL ARC hit rate rise from FreeBSD ARC hit rate?
> >
> > We don't know that. You should be able to find out by running some tests
> > that would populate your ARC and run the test again. I see that my
> > -DNO_CLEAN buildworlds run faster, when I run them a second or third time
> > after making a minor edit, than they did before. Thus I assume it uses
> > memory more efficiently. By default it stores more metadata in ARC, 75%
> > instead of IIRC 25% by default.
> >
> > Getting back to your original question. A more efficient ARC would exercise
> > your memory more intensely because you are replacing disk reads with memory
> > reads. And as I said before the old ZFS "found" weak RAM on three separate
> > occasions in three different machines over the last ten years. You're
> > advised to replace the marginal memory.
>
> Ryan has been able to reproduce this in a VM with 4GB, similarly a VM
> with 2GB loads just fine. It would seem that 4GB triggers a bug in
> limit handling. We're hoping that we can simply lower one of the
> default limits on i386 and make the problem go away.
>
> Please don't shoot the messenger when I observe that, generally
> speaking, i386 is considered a self supported platform due to ZFS
> general inability to perform well with limited memory or KVA. Long
> mode has been available on virtually all processors shipped since
> 2006.

Yes, I was able to use ZFS on a 2 GB Pentium-M (i386) laptop for many 
years. ZFS worked well with a little tuning on such a small machine. Last 
time I booted it was late last year or early this year. It's in a drawer 
right now. I'll try to pull it out this coming week to test it out.

Serendipitous that I was thinking about pulling out that old laptop to test 
out the new ZFS just last week.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.





Re: OpenZFS: kldload zfs.ko freezes on i386 4GB memory

2020-10-30 Thread Cy Schubert

In message <20201030233138.gd34...@zxy.spb.ru>, Slawa Olhovchenkov writes:
> On Fri, Oct 30, 2020 at 04:00:55PM -0700, Cy Schubert wrote:
>
> > > > > More stresses memory usually refers to performance penalty.
> > > > > Usually way for better performance is reduce memory access.
> > > > 
> > > > The reason filesystems (UFS, ZFS, EXT4, etc.) cache is to avoid disk 
> > > > accesses. Nanoseconds vs milliseconds.
> > >
> > > I mean compared ZoL ZFS ARC vs old (BSD/Opensolaris/Illumos) ZFS ARC.
> > > Any reason to rise ARC hit rate in ZoL case?
> > 
> > That's what hit rate is. It's a memory access instead of a disk access. 
> > That's what you want.
>
> Is ZoL ARC hit rate rise from FreeBSD ARC hit rate?

We don't know that. You should be able to find out by running some tests 
that would populate your ARC and run the test again. I see that my 
-DNO_CLEAN buildworlds run faster, when I run them a second or third time 
after making a minor edit, than they did before. Thus I assume it uses 
memory more efficiently. By default it stores more metadata in ARC, 75% 
instead of IIRC 25% by default.

Getting back to your original question. A more efficient ARC would exercise 
your memory more intensely because you are replacing disk reads with memory 
reads. And as I said before the old ZFS "found" weak RAM on three separate 
occasions in three different machines over the last ten years. You're 
advised to replace the marginal memory.
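
One way to put a number on the hit rate discussed above is to read the
ARC counters before and after a test run. A minimal sketch, assuming the
kstat.zfs.misc.arcstats.hits and kstat.zfs.misc.arcstats.misses sysctls
are present on the system under test; hit_rate is a hypothetical helper
name:

```sh
#!/bin/sh
# hit_rate is a hypothetical helper: it prints hits / (hits + misses)
# as a percentage with two decimals, or "n/a" when both are zero.
hit_rate() {
    awk -v h="$1" -v m="$2" 'BEGIN {
        t = h + m
        if (t == 0) { print "n/a"; exit }
        printf "%.2f\n", 100 * h / t
    }'
}

# Read the cumulative ARC counters if this system exports them.
if sysctl -n kstat.zfs.misc.arcstats.hits >/dev/null 2>&1; then
    hits=$(sysctl -n kstat.zfs.misc.arcstats.hits)
    misses=$(sysctl -n kstat.zfs.misc.arcstats.misses)
    echo "ARC hit rate since boot: $(hit_rate "$hits" "$misses")%"
fi
```

The counters are cumulative since boot, so to compare two builds, sample
them before and after each run and pass the differences to hit_rate.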


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.



Re: OpenZFS: kldload zfs.ko freezes on i386 4GB memory

2020-10-30 Thread Cy Schubert

In message <20201030224734.gh2...@zxy.spb.ru>, Slawa Olhovchenkov writes:
> On Fri, Oct 30, 2020 at 03:34:10PM -0700, Cy Schubert wrote:
>
> > In message <20201030220809.gg2...@zxy.spb.ru>, Slawa Olhovchenkov writes:
> > > On Fri, Oct 30, 2020 at 01:53:10PM -0700, Cy Schubert wrote:
> > >
> > > > In message <20201030204622.gf2...@zxy.spb.ru>, Slawa Olhovchenkov writes:
> > > > > On Thu, Oct 29, 2020 at 08:13:00PM -0700, Cy Schubert wrote:
> > > > >
> > > > > > In message , qroxana writes:
> > > > > > > 
> > > > > > > Hi,
> > > > > > >
> > > > > > > I have an old i386 machine running r364479. After upgrading to
> > > > > > > r367045, running kldload zfs.ko freezes the whole system.
> > > > > > >
> > > > > > > I also tried to replace the 4GB memory with another 2GB one
> > > > > > > and kldload zfs.ko works without freezing the machine.
> > > > > > 
> > > > > > ZFS ARC stresses memory. I've found a number of bad RAM chips over
> > > > > > the years using ZFS.
> > > > > > 
> > > > > > The OpenZFS upgrade significantly changed how it manages ARC. It's
> > > > > > likely that prior to the OpenZFS upgrade your memory wasn't stressed
> > > > > > to the point of failure. You can try to mask the problem by reducing
> > > > > > your RAM clock rate or increase one of the other latency settings in
> > > > > > your BIOS. However, again, this only masks an already weak RAM chip.
> > > > >
> > > > > Sounds like performance drop and regression
> > > > 
> > > > How so. Please explain.
> > >
> > > More stresses memory usually refers to performance penalty.
> > > Usually way for better performance is reduce memory access.
> > 
> > The reason filesystems (UFS, ZFS, EXT4, etc.) cache is to avoid disk 
> > accesses. Nanoseconds vs milliseconds.
>
> I mean compared ZoL ZFS ARC vs old (BSD/Opensolaris/Illumos) ZFS ARC.
> Any reason to rise ARC hit rate in ZoL case?

That's what hit rate is. It's a memory access instead of a disk access. 
That's what you want.


-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

The need of the many outweighs the greed of the few.


