Re: How to upgrade an EOL FreeBSD release or how to make it working again

2024-01-15 Thread John F Carr
Judging by a commit message, FreeBSD on the ARM Chromebook already didn't
work when support was removed in 2019.

>RK* Exynos* and Meson*/Odroid* don't even work with current
>source code, if someone wants to make them work again they
>better use the Linux DTS.
https://cgit.freebsd.org/src/commit?id=9dfa2a54684978d1d6cef67bbf6242e825801f18

I have one of the "snow" Chromebooks.  The warnings in the web page
https://wiki.freebsd.org/arm/Chromebook led me not to try FreeBSD.
None of the many bugs seemed likely ever to be fixed.  I'm not using it,
so I could try an experiment, but fighting with u-boot is not how I want
to spend my days.  Even the popular Raspberry Pi takes skill or luck.

(So "build an armv6 world and copy X, Y, and Z to the DOS partition
on your USB drive" is the kind of advice I need to supplement the old
Chromebook wiki page.)

There is at least a little value in getting it to work because the armv6
code is bit rotting and will go away entirely unless people use it.

John Carr


> On Jan 15, 2024, at 10:59, Mario Marietto  wrote:
> 
> Hello to everyone.
> 
> I'm trying to install FreeBSD 14 natively on my ARM Chromebook, model
> xe303c12. I've found only one tutorial that teaches how to do that:
> 
> https://wiki.freebsd.org/arm/Chromebook
> 
> The problem is that it ends with the installation of FreeBSD 11, which is
> long EOL.
> I can't use it as is. I need to upgrade it to 14, but I'm on 32-bit ARM,
> which is Tier 2, so I can't upgrade it automatically using the
> freebsd-update script. It is also true that I can't install 14 directly on
> that machine, as you can read below:
> 
> 
> 
> 
> I've looked all around and found the pkgbase tool, which I'm discussing on
> the FreeBSD forum, to see whether it makes 11 usable or upgradable. It
> does not seem to be the proper tool to achieve my goal. Do you have any
> suggestions that can help me? Thanks.
> 
> -- 
> Mario.




make installworld fails because /usr/include/c++/v1/__tuple is a file

2023-12-10 Thread John F Carr
On arm64, running CURRENT from two weeks ago, I updated to

  c711af772782 Bump __FreeBSD_version for llvm 17.0.6 merge

and built and installed from source.  make installworld failed:

  install: target directory `/usr/include/c++/v1/__tuple/' does not exist

That pathname is a file:

  -r--r--r--  1 root wheel 20512 Feb 15  2023 /usr/include/c++/v1/__tuple

Early in make output is

  mtree -deU -i -f /usr/src/etc/mtree/BSD.include.dist -p /usr/include
  ./c++/v1/__algorithm/pstl_backends missing (created)
  [...]
  ./c++/v1/__tuple missing (not created: File exists)

Should I remove the file and try again, or is there a more elegant fix?

The word "tuple" does not appear in UPDATING.
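Whether deleting the file is the right fix is the open question here, but the condition mtree reports can at least be detected before running installworld. A minimal sketch, assuming only POSIX lstat(2); the helper name blocks_directory() is hypothetical and is not part of mtree or the build system:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if path exists but is not a directory
 * (the case mtree reports as "not created: File exists"), 0 if it is a
 * directory or does not exist yet, and -1 for any other lstat() error.
 * lstat() does not follow symlinks, so a symlink also counts as blocking.
 */
static int blocks_directory(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) == -1)
        return errno == ENOENT ? 0 : -1;
    return S_ISDIR(sb.st_mode) ? 0 : 1;
}
```

On the system described above, blocks_directory("/usr/include/c++/v1/__tuple") would be expected to return 1, since that path is a regular file.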




sscanf change prevents build of CURRENT

2023-08-30 Thread John F Carr
I had a problem yesterday and today rebuilding a -CURRENT system from source:

  --- magic.mgc ---
  ./mkmagic magic
  magic, 4979: Warning: Current entry does not yet have a description for adding a MIME type
  mkmagic: could not find any valid magic files!

The cause was an sscanf call unexpectedly failing to parse its input.  This
caused the mkmagic program (an internal tool used to build the magic number
table for file(1)) to fail.

If I link mkmagic against the static libc.a in /usr/obj, then it works.  So my
installed libc.so is broken and the latest source works.  I think.  My
installed kernel is at 76edfabbecde, the end of the binary integer parsing
commit series, so my libc should be the same.

The program below demonstrates the bug.  See src/contrib/file/src for context.

I am trying to manually compile a working mkmagic and restart the build to
get unstuck.

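#include <stdint.h>
#include <stdio.h>

struct guid {
  uint32_t data1;
  uint16_t data2;
  uint16_t data3;
  uint8_t data4[8];
};

int main(int argc, char *argv[])
{
  struct guid g = {0, 0, 0, {0}};
  const char *text = "75B22630-668E-11CF-A6D9-00AA0062CE6C";

  if (argc > 1)
    text = argv[1];

  /* Same conversion pattern as the GUID parsing in src/contrib/file/src. */
  int count =
      sscanf(text,
             "%8x-%4hx-%4hx-%2hhx%2hhx-%2hhx%2hhx%2hhx%2hhx%2hhx%2hhx",
             &g.data1, &g.data2, &g.data3, &g.data4[0], &g.data4[1],
             &g.data4[2], &g.data4[3], &g.data4[4], &g.data4[5],
             &g.data4[6], &g.data4[7]);

  /* Print the number of conversions followed by the parsed fields. */
  fprintf(stdout,
          "[%d]:\n%08x-%04hx-%04hx-%02hhx%02hhx-%02hhx%02hhx%02hhx%02hhx%02hhx%02hhx\n",
          count,
          g.data1, g.data2, g.data3, g.data4[0], g.data4[1],
          g.data4[2], g.data4[3], g.data4[4], g.data4[5],
          g.data4[6], g.data4[7]);

  /* Exit status 0 only if all 11 fields were converted. */
  return count != 11;
}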




Re: shell hung in fork system call

2023-07-10 Thread John F Carr



> On Jul 9, 2023, at 19:59, Konstantin Belousov  wrote:
> 
> On Sun, Jul 09, 2023 at 11:36:03PM +0000, John F Carr wrote:
>> 
>> 
>>> On Jul 9, 2023, at 19:25, Konstantin Belousov  wrote:
>>> 
>>> On Sun, Jul 09, 2023 at 10:41:27PM +0000, John F Carr wrote:
>>>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some 
>>>> irrelevant local changes, four 64 bit ARM processors, make.conf sets 
>>>> CPUTYPE?=cortex-a57.
>>>> 
>>>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in 
>>>> the middle of fork().
>>>> 
>>>> From the terminal:
>>>> 
>>>> # git log --oneline --|more
>>>> ^C^C^C
>>>> load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>>>> load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>>>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>>>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>>>> 
>>>> According to ps -d on another terminal the shell has no children:
>>>> 
>>>>   PID TT  STAT   TIME COMMAND
>>>> [...]
>>>>   873 u0  IWs 0:00.00 `-- login [pam] (login)
>>>>   874 u0  I   0:00.17   `-- -sh (sh)
>>>> 95504 u0  I   0:00.01 `-- su -
>>>> 95505 u0  D+  0:00.05   `-- -su (sh)
>>>> [...]
>>>> 
>>>> Nothing on the (115200 bps serial) console.  No change in system 
>>>> performance.
>>>> 
>>>> The system is busy copying a large amount of data from the network to a 
>>>> ZFS pool on spinning disks.  The git|more pipeline could have taken some 
>>>> time to get going while I/O requests worked their way through the queue.  
>>>> It would not have touched the busy pool, only the zroot pool on an SSD.
>>>> 
>>>> Has anything changed recently that might cause this?
>>> 
>>> There was some change around fork, but your sleep seems to be not from
>>> that change.  Can you show the wait channel for the process?  Do something
>>> like
>>> $ ps alxww
>>> 
>> 
>>  UID   PID  PPID  C PRI NI   VSZ   RSS MWCHAN   STAT TT     TIME COMMAND
>>    0 95505 95504  2  20  0 13508  2876 fork     D+   u0  0:00.13 -su (sh)
>> 
>> This is probably the same information displayed as [fork] in the output from 
>> ^T.
>> 
>> Does it correspond to the source line
>> 
>> pause("fork", hz / 2);
>> 
>> ?
> 
> Yes, it is rate-limiting code.  Still it is interesting to see the whole
> ps output.
> 
> Do you have 7a70f17ac4bd64dc1a5020f in your source?

No, I do not have that commit.

The comment mentions livelock.  CPU use as reported by iostat did not change 
after the process hung.
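For reference when reading the wait channel: pause(9) in the FreeBSD kernel sleeps for a timeout given in clock ticks, and hz is the number of ticks per second, so pause("fork", hz / 2) is roughly a half-second sleep each time that code path runs. The sketch below only reproduces that tick arithmetic in user space; ticks_to_timespec() and rate_limit_pause() are hypothetical names, and nanosleep(2) stands in for the kernel sleep.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Convert a timeout in clock ticks to a timespec, given hz ticks per second. */
static struct timespec ticks_to_timespec(long ticks, long hz)
{
    struct timespec ts;

    ts.tv_sec = ticks / hz;
    ts.tv_nsec = (ticks % hz) * (1000000000L / hz);
    return ts;
}

/* User-space stand-in for pause("fork", hz / 2): sleep about half a second. */
static void rate_limit_pause(long hz)
{
    struct timespec ts = ticks_to_timespec(hz / 2, hz);

    nanosleep(&ts, NULL);
}
```

Whatever hz is configured to, hz / 2 ticks works out to 0.5 s (exactly so when hz divides 10^9 evenly), which is why a single pass through that code should not stall a shell for long on its own.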






Re: shell hung in fork system call

2023-07-09 Thread John F Carr



> On Jul 9, 2023, at 19:25, Konstantin Belousov  wrote:
> 
> On Sun, Jul 09, 2023 at 10:41:27PM +0000, John F Carr wrote:
>> Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some 
>> irrelevant local changes, four 64 bit ARM processors, make.conf sets 
>> CPUTYPE?=cortex-a57.
>> 
>> I typed ^C while /bin/sh was starting a pipeline and my shell got hung in 
>> the middle of fork().
>> 
>> From the terminal:
>> 
>> # git log --oneline --|more
>> ^C^C^C
>> load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>> load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
>> mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
>> fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
>> 
>> According to ps -d on another terminal the shell has no children:
>> 
>>  PID TT  STAT   TIME COMMAND
>> [...]
>>  873 u0  IWs 0:00.00 `-- login [pam] (login)
>>  874 u0  I   0:00.17   `-- -sh (sh)
>> 95504 u0  I   0:00.01 `-- su -
>> 95505 u0  D+  0:00.05   `-- -su (sh)
>> [...]
>> 
>> Nothing on the (115200 bps serial) console.  No change in system performance.
>> 
>> The system is busy copying a large amount of data from the network to a ZFS 
>> pool on spinning disks.  The git|more pipeline could have taken some time to 
>> get going while I/O requests worked their way through the queue.  It would 
>> not have touched the busy pool, only the zroot pool on an SSD.
>> 
>> Has anything changed recently that might cause this?
> 
> There was some change around fork, but your sleep seems to be not from
> that change.  Can you show the wait channel for the process?  Do something
> like
> $ ps alxww
> 

 UID   PID  PPID  C PRI NI   VSZ   RSS MWCHAN   STAT TT     TIME COMMAND
   0 95505 95504  2  20  0 13508  2876 fork     D+   u0  0:00.13 -su (sh)

This is probably the same information displayed as [fork] in the output from ^T.

Does it correspond to the source line

pause("fork", hz / 2);

?




shell hung in fork system call

2023-07-09 Thread John F Carr
Kernel and system at a146207d66f320ed239c1059de9df854b66b55b7 plus some 
irrelevant local changes, four 64 bit ARM processors, make.conf sets 
CPUTYPE?=cortex-a57.

I typed ^C while /bin/sh was starting a pipeline and my shell got hung in the 
middle of fork().

From the terminal:

# git log --oneline --|more
^C^C^C
load: 3.26  cmd: sh 95505 [fork] 5308.67r 0.00u 0.03s 0% 2860k
mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 
load: 3.16  cmd: sh 95505 [fork] 5311.75r 0.00u 0.03s 0% 2860k
mi_switch+0x198 sleepq_switch+0xfc sleepq_timedwait+0x40 _sleep+0x264 
fork1+0x67c sys_fork+0x34 do_el0_sync+0x4c8 handle_el0_sync+0x44 

According to ps -d on another terminal the shell has no children:

  PID TT  STAT   TIME COMMAND
[...]
  873 u0  IWs 0:00.00 `-- login [pam] (login)
  874 u0  I   0:00.17   `-- -sh (sh)
95504 u0  I   0:00.01 `-- su -
95505 u0  D+  0:00.05   `-- -su (sh)
[...]

Nothing on the (115200 bps serial) console.  No change in system performance.

The system is busy copying a large amount of data from the network to a ZFS 
pool on spinning disks.  The git|more pipeline could have taken some time to 
get going while I/O requests worked their way through the queue.  It would not 
have touched the busy pool, only the zroot pool on an SSD.

Has anything changed recently that might cause this?





Re: For snapshot builds: armv7 chroot on aarch64 has kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin hung up [in getpid?], unkillable, prevents reboot

2023-07-07 Thread John F Carr
On Jul 6, 2023, at 20:42, Mike Karels  wrote:
> 
> 
> Thanks for isolating this.  Let me know when you have the bug number.
> I just tested a fix (the compat code drops the reference on the current
> address space an extra time, probably freeing it).
> 
> Mike

The bug was introduced in January 2022.  It allows 32-bit binaries to crash a
64-bit system when COMPAT_FREEBSD32 is on.  Test coverage of the buggy function
(sysctl_kern_proc_vm_layout) was added at the same time.

There should be routine runs of 32-bit test suites on 64-bit systems.  Although
i386 and armv7 are tier 2 systems, the tier 1 COMPAT_FREEBSD32 kernel code
needs to be exercised.  This bug was discovered only by manually running tests
in the right environment, 17 months after automated testing could have
discovered it.





Re: For snapshot builds: armv7 chroot on aarch64 has kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin hung up [in getpid?], unkillable, prevents reboot

2023-07-06 Thread John F Carr



> On Jun 25, 2023, at 20:16, Mark Millard  wrote:
> 
> Using the likes of:
> 
> FreeBSD-14.0-CURRENT-arm64-aarch64-ROCK64-20230622-b95d2237af40-263748.img
> and:
> FreeBSD-14.0-CURRENT-arm-armv7-GENERICSD-20230622-b95d2237af40-263748.img
> 
> I have shown the following behavior after setting up storage
> media based on them. (This was a test that my builds were not
> odd for the issue.)
> 
> Boot the aarch64 media and log in. (Note: I logged in
> as root.)
> 
> mount the armv7 media (-noatime is just my habit)
> and then put it to use:
> 
> # mount -onoatime /dev/da1s2a /mnt
> 
> # chroot /mnt/
> 
> # kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin
> sys/kern/kern_copyin:kern_copyin  ->  
> 
> On the serial console:
> 
> # ps -xu
> USER  PID   %CPU %MEM   VSZ  RSS TT  STAT STARTED      TIME COMMAND
> root   11 1498.4  0.0     0  256  -  RNL  23:24   542:52.92 [idle]
> root 1174  100.0  0.0     0   16  -  Rs   23:37     0:00.00 /usr/tests/sys/kern/kern_copyin -vunprivileged-user=tests -r/tmp/kyua.9YUttj/2/result.atf kern_copyin
> root    0    0.0  0.0     0 1616  -  DLs  23:24     0:00.50 [kernel]
> root    1    0.0  0.0 11704 1288  -  ILs  23:24     0:00.02 /sbin/init
> root    2    0.0  0.0     0  256  -  WL   23:24     0:00.26 [clock]
> root    3    0.0  0.0     0  272  -  DL   23:24     0:00.00 [crypto]
> root    4    0.0  0.0     0   80  -  DL   23:24     0:00.95 [cam]
> root    5    0.0  0.0     0   16  -  DL   23:24     0:00.00 [busdma]
> root    6    0.0  0.0     0   16  -  DL   23:24     0:00.03 [rand_harvestq]
> root    7    0.0  0.0     0   48  -  DL   23:24     0:00.06 [pagedaemon]
> root    8    0.0  0.0     0   16  -  DL   23:24     0:00.00 [vmdaemon]
> root    9    0.0  0.0     0  160  -  DL   23:24     0:00.38 [bufdaemon]
> root   10    0.0  0.0     0   16  -  DL   23:24     0:00.00 [audit]
> root   12    0.0  0.0     0  880  -  WL   23:24     0:11.81 [intr]
> root   13    0.0  0.0     0   48  -  DL   23:24     0:00.04 [geom]
> root   14    0.0  0.0     0   16  -  DL   23:24     0:00.00 [sequencer 00]
> root   15    0.0  0.0     0  160  -  DL   23:24     0:06.42 [usb]
> root   16    0.0  0.0     0   16  -  DL   23:24     0:00.10 [acpi_thermal]
> root   17    0.0  0.0     0   16  -  DL   23:24     0:00.00 [acpi_cooling0]
> root   18    0.0  0.0     0   16  -  DL   23:24     0:00.04 [syncer]
> root   19    0.0  0.0     0   16  -  DL   23:24     0:00.00 [vnlru]
> root  671    0.0  0.0 13260 2600  -  Is   23:25     0:00.00 dhclient: system.syslog (dhclient)
> root  674    0.0  0.0 13260 2752  -  Is   23:25     0:00.00 dhclient: dpni0 [priv] (dhclient)
> root  761    0.0  0.0 14572 3972  -  Ss   23:25     0:00.02 /sbin/devd
> root  964    0.0  0.0 12832 2764  -  Is   23:25     0:00.02 /usr/sbin/syslogd -s
> root 1033    0.0  0.0 13012 2604  -  Ss   23:25     0:00.01 /usr/sbin/cron -s
> root 1058    0.0  0.0 21052 8308  -  Is   23:25     0:00.01 sshd: /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
> root 1078    0.0  0.0 21288 9304  -  Is   23:26     0:00.09 sshd: root@pts/0 (sshd)
> root 1175    0.0  0.0 21288 9496  -  Is   23:37     0:00.04 sshd: root@pts/1 (sshd)
> root 1074    0.0  0.0 13380 3008 u0  Is   23:25     0:00.01 login [pam] (login)
> root 1075    0.0  0.0 13460 3292 u0  S    23:25     0:00.02 -sh (sh)
> root 1233    0.0  0.0 13588 3016 u0  R+   00:00     0:00.00 ps -xu
> root 1081    0.0  0.0 13460 3328  0  Is   23:26     0:00.02 -sh (sh)
> root 1170    0.0  0.0  5788 2884  0  I    23:36     0:00.02 /bin/sh -i
> root 1172    0.0  0.0 10408 7192  0  I+   23:37     0:00.30 kyua test -k /usr/tests/Kyuafile sys/kern/kern_copyin
> root 1178    0.0  0.0 13460 3320  1  Is+  23:38     0:00.01 -sh (sh)
> 
> 1174 is stuck, even if one waits for 30min+.
> kill and kill -9 will not kill 1174.
> 
> "shutdown -r now" hangs before the reboot happens
> and reports: "some processes would not die".
> 
> An interesting property is that ps and top disagree
> about 1174 CPU usage: ps 100%, top 0%. But top also
> indicates 1174 always has CPU0 "STATE". (Across
> tests CPUn varies but within a test it has
> a fixed n.)
> 
> I have also seen ps "STAT" being RXs.
> 
> The following is from my earlier activity with my own
> builds involved, here 1119, not the 1174 from above.
> truss reports "getpid()" as the last thing the stuck
> process did.
> 
> . . .
> 1119: 0.588983953 fstatat(AT_FDCWD,"/usr/tests/sys/kern/kern_copyin",{ mode=-r-xr-xr-x ,inode=111756,size=9776,blksize=10240 },AT_SYMLINK_NOFOLLOW) = 0 (0x0)
> 1119: 0.589065030 mmap(0x0,20480,PROT_READ|PROT_WRITE,MAP_PRIVATE|MAP_ANON|MAP_ALIGNED(12),-1,0x0) = 1074188288 (0x4006d000)
> 1119: 0.589227544 openat(AT_FDCWD,"/tmp/kyua.aBQv6E/2/result.atf",O_WRONLY|O_CREAT|O_TRUNC,0644) = 3 (0x3)
> 1119: 0.589276503 getpid()  = 1119 (0x45f)
> 
> 
> 
> For reference, from inside an armv7 chroot session
> before doing such a test:
> 
> # uname -apKU
> FreeBSD generic 

Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit

2023-06-26 Thread John F Carr


> On Jun 26, 2023, at 04:32, Mark Millard  wrote:
> 
> On Jun 24, 2023, at 17:25, Mark Millard  wrote:
> 
>> On Jun 24, 2023, at 14:26, John F Carr  wrote:
>> 
>>> 
>>>> On Jun 24, 2023, at 13:00, Mark Millard  wrote:
>>>> 
>>>> The running system build is a non-debug build (but
>>>> with symbols not stripped).
>>>> 
>>>> The HoneyComb's console log shows:
>>>> 
>>>> . . .
>>>> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
>>>> GEOM_NOP: Device md0.nop removed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>>> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
>>>> GEOM_NOP: Device md0.nop removed.
>>>> GEOM_NOP: Device md0.nop created.
>>>> GEOM_NOP: Device md0.nop removed.
>>>> Fatal data abort:
>>>> x0: a02506e64400
>>>> x1: 0001ea401880 (g_raid3_post_sync + 3a145f8)
>>>> x2:   4b
>>>> x3: a343932b0b22fb30
>>>> x4:0
>>>> x5:  3310b0d062d0e1d
>>>> x6: 1d0e2d060d0b3103
>>>> x7:0
>>>> x8: ea325df8
>>>> x9: 0001eec946d0 ($d.6 + 0)
>>>> x10: 0001ea401880 (g_raid3_post_sync + 3a145f8)
>>>> x11:0
>>>> x12:0
>>>> x13: 00cd8960 (lock_class_mtx_sleep + 0)
>>>> x14:0
>>>> x15: a02506e64405
>>>> x16: 0001eec94860 (_DYNAMIC + 160)
>>>> x17: 0063a450 (ifc_attach_cloner + 0)
>>>> x18: 0001eb290400 (g_raid3_post_sync + 48a3178)
>>>> x19: 0001eec94600 (vnet_epair_init_vnet_init + 0)
>>>> x20: 00fa5b68 (vnet_sysinit_sxlock + 18)
>>>> x21: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x22: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x23: a042e500
>>>> x24: a042e500
>>>> x25: 00ce0788 (linker_lookup_set_desc + 0)
>>>> x26: a0203cdef780
>>>> x27: 0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
>>>> x28: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
>>>> x29: 0001eb290430 (g_raid3_post_sync + 48a31a8)
>>>> sp: 0001eb290400
>>>> lr: 0001eec82a4c ($x.1 + 3c)
>>>> elr: 0001eec82a60 ($x.1 + 50)
>>>> spsr: 6045
>>>> far: 0002d8fba4c8
>>>> esr: 9646
>>>> panic: vm_fault failed: 0001eec82a60 error 1
>>>> cpuid = 14
>>>> time = 1687625470
>>>> KDB: stack backtrace:
>>>> db_trace_self() at db_trace_self
>>>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>>>> vpanic() at vpanic+0x13c
>>>> panic() at panic+0x44
>>>> data_abort() at data_abort+0x2fc
>>>> handle_el1h_sync() at handle_el1h_sync+0x14
>>>> --- exception, esr 0x9646
>>>> $x.1() at $x.1+0x50
>>>> vnet_register_sysinit() at vnet_register_sysinit+0x114
>>>> linker_load_module() at linker_load_module+0xae4
>>>> kern_kldload() at kern_kldload+0xfc
>>>> sys_kldload() at sys_kldload+0x60
>>>> do_el0_sync() at do_el0_sync+0x608
>>>> handle_el0_sync() at handle_el0_sync+0x44
>>>> --- exception, esr 0x5600
>>>> KDB: enter: panic
>>>> [ thread pid 70419 tid 101003 ]
>>>> Stopped at  kdb_enter+0x44: str xzr, [x19, #3200]
>>>> db> 
>>> 
>>> The failure appears to be initializing module if_epair.
>> 
>> Yep: trying:
>> 
>> # kldload if_epair.ko
>> 
>> was enough to cause the crash. (Just a HoneyComb context at
>> that point.)
>> 
>> I tried media dd'd from the recent main snapshot, booting the
>> same system. No crash. I moved my build boot media to some
>> other systems and tested them: crashes. I tried my boot media
>> built optimized for Cortex-A53 or Cortex-X1C/Cortex-A78C
>> instead of Cortex-A72: no crashes. (But only one system can
>> use the X1C/A78C code in that build.)
>> 
>> So variation testing only gets the crashes for my builds
>> that are code-optimized for Cortex-A72's. The same source
>> tree vinta

Re: aarch64 main-n263493-4e8d558c9d1c-dirty (so: 2023-Jun-10) Kyuafile run: "Fatal data abort" crash during vnet_register_sysinit

2023-06-24 Thread John F Carr


> On Jun 24, 2023, at 13:00, Mark Millard  wrote:
> 
> The running system build is a non-debug build (but
> with symbols not stripped).
> 
> The HoneyComb's console log shows:
> 
> . . .
> GEOM_STRIPE: Device stripe.IMfBZr destroyed.
> GEOM_NOP: Device md0.nop created.
> g_vfs_done():md0.nop[READ(offset=5885952, length=8192)]error = 5
> GEOM_NOP: Device md0.nop removed.
> GEOM_NOP: Device md0.nop created.
> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
> g_vfs_done():md0.nop[READ(offset=5935104, length=4096)]error = 5
> GEOM_NOP: Device md0.nop removed.
> GEOM_NOP: Device md0.nop created.
> GEOM_NOP: Device md0.nop removed.
> Fatal data abort:
>  x0: a02506e64400
>  x1: 0001ea401880 (g_raid3_post_sync + 3a145f8)
>  x2:   4b
>  x3: a343932b0b22fb30
>  x4:0
>  x5:  3310b0d062d0e1d
>  x6: 1d0e2d060d0b3103
>  x7:0
>  x8: ea325df8
>  x9: 0001eec946d0 ($d.6 + 0)
> x10: 0001ea401880 (g_raid3_post_sync + 3a145f8)
> x11:0
> x12:0
> x13: 00cd8960 (lock_class_mtx_sleep + 0)
> x14:0
> x15: a02506e64405
> x16: 0001eec94860 (_DYNAMIC + 160)
> x17: 0063a450 (ifc_attach_cloner + 0)
> x18: 0001eb290400 (g_raid3_post_sync + 48a3178)
> x19: 0001eec94600 (vnet_epair_init_vnet_init + 0)
> x20: 00fa5b68 (vnet_sysinit_sxlock + 18)
> x21: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x22: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x23: a042e500
> x24: a042e500
> x25: 00ce0788 (linker_lookup_set_desc + 0)
> x26: a0203cdef780
> x27: 0001eec94698 (__set_sysinit_set_sym_if_epairmodule_sys_init + 0)
> x28: 00d8e000 (sdt_vfs_vop_vop_spare4_return + 0)
> x29: 0001eb290430 (g_raid3_post_sync + 48a31a8)
>  sp: 0001eb290400
>  lr: 0001eec82a4c ($x.1 + 3c)
> elr: 0001eec82a60 ($x.1 + 50)
> spsr: 6045
> far: 0002d8fba4c8
> esr: 9646
> panic: vm_fault failed: 0001eec82a60 error 1
> cpuid = 14
> time = 1687625470
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> data_abort() at data_abort+0x2fc
> handle_el1h_sync() at handle_el1h_sync+0x14
> --- exception, esr 0x9646
> $x.1() at $x.1+0x50
> vnet_register_sysinit() at vnet_register_sysinit+0x114
> linker_load_module() at linker_load_module+0xae4
> kern_kldload() at kern_kldload+0xfc
> sys_kldload() at sys_kldload+0x60
> do_el0_sync() at do_el0_sync+0x608
> handle_el0_sync() at handle_el0_sync+0x44
> --- exception, esr 0x5600
> KDB: enter: panic
> [ thread pid 70419 tid 101003 ]
> Stopped at  kdb_enter+0x44: str xzr, [x19, #3200]
> db> 

The failure appears to occur while initializing the if_epair module.  I see no
recent changes in that module that would be likely to break initialization:

a9bfd080d09a if_epair: do not transmit packets that exceed the interface MTU
4d846d260e2b spdx: The BSD-2-Clause-FreeBSD identifier is obsolete, drop -FreeBSD
a6b55ee6be15 net: replace IFF_KNOWSEPOCH with IFF_NEEDSEPOCH
c69ae8419734 if_epair: also remove vlan metadata from mbufs
29c9b1673305 epair: Remove unneeded includes and sort some of the rest








Can't build with INVARIANTS but not WITNESS

2022-04-27 Thread John F Carr
My -CURRENT kernel has INVARIANTS (inherited from GENERIC) but not WITNESS:

include GENERIC
ident   STRIATUS
nooptions   WITNESS
nooptions   WITNESS_SKIPSPIN

My kernel build fails:

/usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:102:13: error: variable 'line' set but not used [-Werror,-Wunused-but-set-variable]
int flags, line __diagused;
           ^
/usr/home/jfc/freebsd/src/sys/kern/vfs_lookup.c:101:14: error: variable 'file' set but not used [-Werror,-Wunused-but-set-variable]
const char *file __diagused;

The problem is that __diagused expands to nothing if INVARIANTS _or_ WITNESS is
defined, but the variables in vfs_lookup.c are only used if WITNESS is defined.

#if defined(INVARIANTS) || defined(WITNESS)
#define __diagused
#else
#define __diagused  __unused
#endif

I think this code is trying to be too clever and causing more trouble than it 
prevents.  Change the || to &&, or replace __diagused with __unused everywhere.
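The first suggested fix (|| to &&) can be sketched outside the kernel. This is a minimal illustration, not the real sys/cdefs.h: diagused_sketch and lookup_sketch() are hypothetical names, and __attribute__((__unused__)) stands in for __unused. The unused attribute only suppresses the warning and does not remove the variable, so tagging a variable that is in fact used is generally harmless with the usual warning flags.

```c
#include <stdio.h>

/*
 * Sketch of the proposed change: the marker expands to nothing only when
 * BOTH options are configured; every other build tags the variable unused.
 */
#if defined(INVARIANTS) && defined(WITNESS)
#define diagused_sketch
#else
#define diagused_sketch __attribute__((__unused__))
#endif

/* Mimics the vfs_lookup.c pattern: file and line are consumed only by WITNESS. */
static int lookup_sketch(void)
{
    const char *file diagused_sketch;
    int line diagused_sketch;

    file = __FILE__;
    line = __LINE__;
#ifdef WITNESS
    printf("lock taken at %s:%d\n", file, line);
#endif
    return 0;
}
```

Compiled with -DINVARIANTS alone, -DWITNESS alone, or both, this should produce no set-but-not-used diagnostic for file and line.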





Emacs tramp mode doesn't work with CURRENT

2020-01-28 Thread John F Carr
I use emacs tramp mode, which opens an ssh connection to a remote machine for
file access.  It works with Linux and FreeBSD 12.1 hosts, but not with CURRENT.
There has been a change in the way characters are echoed by the shell: 12.1
treats a run of consecutive backspaces as an atomic unit, while CURRENT
processes them one at a time.  This is not necessarily a bug, but it is a
nuisance and, independently, it is suboptimal.

I would like to blame libedit, which has changed since 12.1.  I didn't see any
changes in the pty code, and the problem happens with at least two different
shells.  It could also be caused by a change to sshd or something I haven't
thought of.

Here is a longer explanation.

Emacs tramp mode opens an ssh connection to a remote machine.  It doesn't want 
to see input echoed back so it runs

stty -inlcr -onlcr -echo kill '^U' erase '^H'

This doesn't do anything useful if a shell is running in line editing mode 
(raw) instead of using the tty (cooked).  So tramp falls back to a hack to 
detect echoed input.  It sends "_echo" followed by a string of backspace 
characters.  "_echo" is unlikely to appear in program output.
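In termios terms, the stty command tramp sends clears three flags and sets two control characters. A minimal sketch of the equivalent changes; apply_tramp_stty() is a hypothetical name, and in real use the struct would be read with tcgetattr() and installed with tcsetattr(fd, TCSANOW, &t). As noted above, none of this helps once a line-editing shell takes over the tty and does its own echoing.

```c
#define _XOPEN_SOURCE 700
#include <termios.h>

/*
 * Apply to a struct termios roughly what
 *   stty -inlcr -onlcr -echo kill '^U' erase '^H'
 * does.  Only the fields named by that command are changed.
 */
static void apply_tramp_stty(struct termios *t)
{
    t->c_iflag &= ~(tcflag_t)INLCR;   /* -inlcr: don't map NL to CR on input */
    t->c_oflag &= ~(tcflag_t)ONLCR;   /* -onlcr: don't map NL to CR-NL on output */
    t->c_lflag &= ~(tcflag_t)ECHO;    /* -echo: don't echo input characters */
    t->c_cc[VKILL] = 0x15;            /* kill '^U' (Ctrl-U) */
    t->c_cc[VERASE] = 0x08;           /* erase '^H' (Ctrl-H, backspace) */
}
```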

Here is the next command after the initial stty:

_echo^H^H^H^H^Hstty icanon erase ^H cols 32767_echo^H^H^H^H^H

The groups of 5 ^H represent 5 backspace characters and the lone ^H in the 
middle is a two character sequence for stty.

The terminal output from a 12.1 system is

_echo^H ^H^H ^H^H ^H^H ^H^H ^Hstty icanon erase ^H cols 32767_echo^H ^H^H ^H^H ^H^H ^H^H ^H
#$ 

where again the middle ^H is a two character sequence and the others are 
backspace characters.  There is a carriage return between the two lines.  "#$ " 
is the shell prompt set by tramp.
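The 12.1 transcript shows each backspace echoed as the three-character sequence ^H, space, ^H: move back, overwrite the character with a blank, move back again. This is the usual visual erase (what termios calls ECHOE). A minimal sketch of that echo rule, as a hypothetical helper that is not taken from libedit or the tty driver and ignores details such as backspacing past the start of the line:

```c
#include <stddef.h>

/*
 * Hypothetical helper: echo input the way the 12.1 transcript shows it,
 * rendering each backspace (0x08) as BS SP BS.  out must have room for
 * 3 * strlen(in) + 1 bytes.  Returns the number of bytes written,
 * excluding the terminating NUL.
 */
static size_t echo_with_erase(const char *in, char *out)
{
    size_t n = 0;

    for (; *in != '\0'; in++) {
        if (*in == '\b') {
            out[n++] = '\b';
            out[n++] = ' ';
            out[n++] = '\b';
        } else {
            out[n++] = *in;
        }
    }
    out[n] = '\0';
    return n;
}
```

Fed "_echo" followed by five backspaces, it produces "_echo" followed by five ^H-space-^H groups, matching the start of the 12.1 output; the CURRENT output instead redraws the whole line after every backspace.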

The terminal output from a CURRENT system is

_echo
#$ _ech ^H
#$ _ec ^H
#$ _e ^H
#$ _ ^H
#$  ^Hstty icanon erase ^H cols 32767_echo
#$ stty icanon erase ^H cols 32767_ech ^H
#$ stty icanon erase ^H cols 32767_ec ^H
#$ stty icanon erase ^H cols 32767_e ^H
#$ stty icanon erase ^H cols 32767_ ^H
#$ stty icanon erase ^H cols 32767 ^H
#$ 

with carriage returns between lines.  This does not make sense to emacs.

I tried both /bin/sh and /bin/csh as shells and tramp didn't work with either.
I put set +V and set +E in my .profile, thinking that would turn off line
editing, but there was no change.  Probably the shell still takes raw input.

A possible complicating factor is that the CURRENT machines are both 64-bit ARM
and the 12.1 machine is amd64.  On arm64 plain char is unsigned; on amd64 it is
signed.  It shouldn't matter, but I haven't tried 12.1 on ARM, so I can't swear
it works.
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"