Re: panic(s) in ZFS on CURRENT

2023-06-08 Thread Gleb Smirnoff
On Thu, Jun 08, 2023 at 07:56:07PM -0700, Gleb Smirnoff wrote:
T> I'm switching to an INVARIANTS kernel right now and will see if that panics earlier.

This is what I got with INVARIANTS:

panic: VERIFY3(dev->l2ad_hand <= dev->l2ad_evict) failed (225142071296 <= 225142063104)

cpuid = 17
time = 1686286015
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2c/frame 0xfe0160dcea90
kdb_backtrace() at kdb_backtrace+0x46/frame 0xfe0160dceb40
vpanic() at vpanic+0x21f/frame 0xfe0160dcebe0
spl_panic() at spl_panic+0x4d/frame 0xfe0160dcec60
l2arc_write_buffers() at l2arc_write_buffers+0xcda/frame 0xfe0160dcedf0
l2arc_feed_thread() at l2arc_feed_thread+0x547/frame 0xfe0160dceec0
fork_exit() at fork_exit+0x122/frame 0xfe0160dcef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe0160dcef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
Uptime: 1m4s
Dumping 5473 out of 65308 MB:..1%..11%..21%..31%..41%..51%..61%..71%..81%..91%

(kgdb) frame 4
#4  0x804342ea in l2arc_write_buffers (spa=0xfe022e942000, 
dev=0xfe023116a000, target_sz=16777216)
at /usr/src/FreeBSD/sys/contrib/openzfs/module/zfs/arc.c:9445
9445            ASSERT3U(dev->l2ad_hand, <=, dev->l2ad_evict);
(kgdb) p dev
$1 = (l2arc_dev_t *) 0xfe023116a000
(kgdb) p dev->l2ad_hand 
$2 = 225142071296
(kgdb) p dev->l2ad_evict
$3 = 225142063104
(kgdb) p *dev
value of type `l2arc_dev_t' requires 66136 bytes, which is more than max-value-size

I've never before seen kgdb refuse to print a structure because it's reported to be too big.
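In case it helps anyone else hitting the same wall: this is gdb's
max-value-size guard, which kgdb inherits. Assuming the kgdb build
exposes the standard gdb setting, raising the limit should let the
structure print:

  (kgdb) set max-value-size unlimited
  (kgdb) p *dev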

-- 
Gleb Smirnoff



panic(s) in ZFS on CURRENT

2023-06-08 Thread Gleb Smirnoff
  Hi,

I got several panics on my desktop running eb2b00da564, which is
after the latest OpenZFS merge.

#1 (couple cores with this backtrace)

--- trap 0x9, rip = 0x803ab94b, rsp = 0xfe022e45ed30, rbp = 
0xfe022e45ed50 ---
buf_hash_insert() at buf_hash_insert+0xab/frame 0xfe022e45ed50
arc_write_done() at arc_write_done+0xfa/frame 0xfe022e45ed90
zio_done() at zio_done+0xf0b/frame 0xfe022e45ee00
zio_execute() at zio_execute+0x9f/frame 0xfe022e45ee40
taskqueue_run_locked() at taskqueue_run_locked+0x181/frame 0xfe022e45eec0
taskqueue_thread_loop() at taskqueue_thread_loop+0xc3/frame 0xfe022e45eef0
fork_exit() at fork_exit+0x7d/frame 0xfe022e45ef30
fork_trampoline() at fork_trampoline+0xe/frame 0xfe022e45ef30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
(kgdb) frame 7
#7  buf_hash_insert (hdr=hdr@entry=0xf8001b21fa28, 
lockp=lockp@entry=0xfe022e45ed60)
at /usr/src/FreeBSD/sys/contrib/openzfs/module/zfs/arc.c:1062
1062            if (HDR_EQUAL(hdr->b_spa, &hdr->b_dva, hdr->b_birth, fhdr))
(kgdb) p hdr
$1 = (arc_buf_hdr_t *) 0xf8001b21fa28
(kgdb) p *hdr
$2 = {b_dva = {dva_word = {16, 20406677952}}, b_birth = 38447120, b_type = 
ARC_BUFC_METADATA, b_complevel = 255 '\377', b_reserved1 = 0 '\000', 
  b_reserved2 = 0, b_hash_next = 0x0, 
  b_flags = (ARC_FLAG_L2CACHE | ARC_FLAG_IO_IN_PROGRESS | 
ARC_FLAG_BUFC_METADATA | ARC_FLAG_HAS_L1HDR | ARC_FLAG_COMPRESSED_ARC | 
ARC_FLAG_COMPRESS_0 | ARC_FLAG_COMPRESS_1 | ARC_FLAG_COMPRESS_2 | 
ARC_FLAG_COMPRESS_3), b_psize = 8, b_lsize = 32, b_spa = 1230587331341359116, 
b_l2hdr = {
b_dev = 0x0, b_daddr = 0, b_hits = 0, b_arcs_state = ARC_STATE_ANON, 
b_l2node = {list_next = 0x0, list_prev = 0x0}}, b_l1hdr = {b_cv = {
  cv_description = 0x80bb5b02 "hdr->b_l1hdr.b_cv", cv_waiters = 0}, 
b_byteswap = 10 '\n', b_state = 0x80ef23c0 , 
b_arc_node = {list_next = 0x0, list_prev = 0x0}, b_arc_access = 0, 
b_mru_hits = 0, b_mru_ghost_hits = 0, b_mfu_hits = 0, 
b_mfu_ghost_hits = 0, b_bufcnt = 1, b_buf = 0xf80003139d80, b_refcnt = 
{rc_count = 2}, b_acb = 0x0, b_pabd = 0xf80a35dc6480}, 
  b_crypt_hdr = {b_rabd = 0x10, b_ot = 2744191968, b_ebufcnt = 4, b_dsobj = 
38340866, b_salt = "\001\000\000\000\000\000\000", 
b_iv = "\000\000\000\000\000\000\000\000\220\000\026\017", b_mac = "\b\000 
\000\f\230\262m\250\354\023\021\000\000\000"}}
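
For context, arc.c:1062 is inside buf_hash_insert()'s walk of the hash
collision chain; paraphrased from the OpenZFS sources (the exact revision
may differ slightly):

	for (fhdr = buf_hash_table.ht_table[idx], i = 0; fhdr != NULL;
	    fhdr = fhdr->b_hash_next, i++) {
		if (HDR_EQUAL(hdr->b_spa, &hdr->b_dva, hdr->b_birth, fhdr))
			return (fhdr);
	}

On amd64, trap 0x9 is a general protection fault, which would be
consistent with dereferencing a corrupted fhdr/b_hash_next pointer
somewhere along that chain.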

#2 (single core)

--- trap 0x9, rip = 0x803ab94b, rsp = 0xfe0256158780, rbp = 
0xfe02561587a0 ---
buf_hash_insert() at buf_hash_insert+0xab/frame 0xfe02561587a0
arc_hdr_realloc() at arc_hdr_realloc+0x138/frame 0xfe0256158800
arc_read() at arc_read+0x2dc/frame 0xfe02561588b0
dbuf_read() at dbuf_read+0xb3e/frame 0xfe02561589f0
dmu_buf_hold() at dmu_buf_hold+0x46/frame 0xfe0256158a30
zap_cursor_retrieve() at zap_cursor_retrieve+0x167/frame 0xfe0256158a90
zfs_freebsd_readdir() at zfs_freebsd_readdir+0x383/frame 0xfe0256158cc0
VOP_READDIR_APV() at VOP_READDIR_APV+0x1f/frame 0xfe0256158ce0
kern_getdirentries() at kern_getdirentries+0x186/frame 0xfe0256158dd0
sys_getdirentries() at sys_getdirentries+0x29/frame 0xfe0256158e00
amd64_syscall() at amd64_syscall+0x100/frame 0xfe0256158f30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0256158f30
(kgdb) frame 7
#7  buf_hash_insert (hdr=0xf80c906b03e8, lockp=lockp@entry=0x0) at 
/usr/src/FreeBSD/sys/contrib/openzfs/module/zfs/arc.c:1062
1062            if (HDR_EQUAL(hdr->b_spa, &hdr->b_dva, hdr->b_birth, fhdr))
(kgdb) p *hdr
$1 = {b_dva = {dva_word = {16, 19965896928}}, b_birth = 36629088, b_type = 
ARC_BUFC_METADATA, b_complevel = 0 '\000', b_reserved1 = 0 '\000', 
  b_reserved2 = 0, b_hash_next = 0x0, 
  b_flags = (ARC_FLAG_BUFC_METADATA | ARC_FLAG_HAS_L1HDR | ARC_FLAG_HAS_L2HDR | 
ARC_FLAG_COMPRESSED_ARC | ARC_FLAG_COMPRESS_1), b_psize = 5, 
  b_lsize = 5, b_spa = 3583499065027950438, b_l2hdr = {b_dev = 
0xfe02306c8000, b_daddr = 4917395456, b_hits = 0, 
b_arcs_state = ARC_STATE_MRU, b_l2node = {list_next = 0xf801313fc9b0, 
list_prev = 0xf801313fca70}}, b_l1hdr = {b_cv = {
  cv_description = 0x80bb5b02 "hdr->b_l1hdr.b_cv", cv_waiters = 0}, 
b_byteswap = 10 '\n', 
b_state = 0x80f02900 , b_arc_node = {list_next = 0x0, 
list_prev = 0x0}, b_arc_access = 0, b_mru_hits = 0, 
b_mru_ghost_hits = 0, b_mfu_hits = 0, b_mfu_ghost_hits = 0, b_bufcnt = 0, 
b_buf = 0x0, b_refcnt = {rc_count = 0}, b_acb = 0x0, b_pabd = 0x0}, 
  b_crypt_hdr = {b_rabd = 0x10, b_ot = 2786027712, b_ebufcnt = 4, b_dsobj = 
36629088, b_salt = "\001\000\000\000\000\000\000", 
b_iv = "\240\2769$\001\370\377\377\220\000\036\017", b_mac = "\b\000 
\000fw\357\327i%\2731\000\200l0"}}

#3 (not ZFS but VFS; could it be related?)

--- trap 0x9, rip = 0x80801408, rsp = 0xfe02348cbcc0, rbp = 
0xfe02348cbcf0 ---
pwd_chdir() at pwd_chdir+0x28/frame 0xfe02348cbcf0
kern_chdir() at kern_chdir+0x169/frame 

Re: CURRENT snapshot won't boot due to missing ZFS feature

2023-06-08 Thread Warner Losh
On Thu, Jun 8, 2023, 11:18 AM Michael Gmelin  wrote:

>
>
> On Thu, 8 Jun 2023 19:06:23 +0200
> Michael Gmelin  wrote:
>
> > On Thu, 8 Jun 2023 16:20:12 +
> > Glen Barber  wrote:
> >
> > > On Thu, Jun 08, 2023 at 06:11:15PM +0200, Michael Gmelin wrote:
> > > > Hi,
> > > >
> > > > I didn't dig into this yet.
> > > >
> > > > After installing the current 14-snapshot (June 1st) in a
> > > > bhyve-vm, I get this on boot:
> > > >
> > > >   ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
> > > >
> > > > (booting stops at this point)
> > > >
> > > > Seems like the boot loader is missing this recently added feature.
> > > >
> > >
> > > Can you try today's snapshot?  They are propagated to most mirrors
> > > now.
> > >
> >
> > Tried today's snapshot, same problem.
> >
> >   # reboot
> >   Waiting (max 60 seconds) for system process `vnlru' to stop... done
> >   Waiting (max 60 seconds) for system process `syncer' to stop...
> >   Syncing disks, vnodes remaining... 0 0 0 0 done
> >   All buffers synced.
> >   Uptime: 4m14s
> >   Consoles: userboot
> >
> >   FreeBSD/amd64 User boot lua, Revision 1.2
> >   ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
> >   ERROR: cannot open /boot/lua/loader.lua: no such file or directory.
> >
> >
> >   Type '?' for a list of commands, 'help' for more detailed help.
> >   OK
> >
> >
> > That's after installing CURRENT in a fresh vm managed by vm-bhyve
> > using bsdinstall's automatic ZFS option.
> >
>
> Thinking about this, it's possible that vm-bhyve is using the zfs boot
> loader from the host machine.
>
> Please consider this noise, unless you hear from me again.
>

Yes. It does. This can be an unfortunate design choice at times.
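
If the guest really is being booted with the host's userboot, one
possible workaround (an assumption about the setup, not something
verified in this thread) is to switch the guest to UEFI firmware in its
vm-bhyve config, bypassing the host loader entirely:

  # hypothetical guest config excerpt; needs the edk2 bhyve UEFI
  # firmware package installed on the host
  loader="uefi"

Alternatively, updating the host's /boot/userboot.so to a build that
knows the new feature flag should also work.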

Warner

Best
> Michael
>
> --
> Michael Gmelin
>
>


Re: CURRENT snapshot won't boot due to missing ZFS feature

2023-06-08 Thread Michael Gmelin



On Thu, 8 Jun 2023 19:06:23 +0200
Michael Gmelin  wrote:

> On Thu, 8 Jun 2023 16:20:12 +
> Glen Barber  wrote:
> 
> > On Thu, Jun 08, 2023 at 06:11:15PM +0200, Michael Gmelin wrote:  
> > > Hi,
> > > 
> > > I didn't dig into this yet.
> > > 
> > > After installing the current 14-snapshot (June 1st) in a
> > > bhyve-vm, I get this on boot:
> > > 
> > >   ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
> > > 
> > > (booting stops at this point)
> > > 
> > > Seems like the boot loader is missing this recently added feature.
> > > 
> > 
> > Can you try today's snapshot?  They are propagated to most mirrors
> > now.
> >   
> 
> Tried today's snapshot, same problem.
> 
>   # reboot
>   Waiting (max 60 seconds) for system process `vnlru' to stop... done
>   Waiting (max 60 seconds) for system process `syncer' to stop... 
>   Syncing disks, vnodes remaining... 0 0 0 0 done
>   All buffers synced.
>   Uptime: 4m14s
>   Consoles: userboot  
> 
>   FreeBSD/amd64 User boot lua, Revision 1.2
>   ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
>   ERROR: cannot open /boot/lua/loader.lua: no such file or directory.
> 
> 
>   Type '?' for a list of commands, 'help' for more detailed help.
>   OK 
> 
> 
> That's after installing CURRENT in a fresh vm managed by vm-bhyve
> using bsdinstall's automatic ZFS option.
> 

Thinking about this, it's possible that vm-bhyve is using the zfs boot
loader from the host machine.

Please consider this noise, unless you hear from me again.

Best
Michael

-- 
Michael Gmelin



OpenSSL 3.0 in the base system update

2023-06-08 Thread Ed Maste
As previously mentioned[1], FreeBSD 14.0 will include OpenSSL 3.0.  We
expect to merge the update to main in the near future (within the next
week or two) and are ready for wider testing.

Supported by the FreeBSD Foundation, Pierre Pronchery has been working
on the update in the src tree, with assistance from Enji Cooper
(ngie@), and me (emaste@). Thanks to Antoine Brodin (antoine@) and
Muhammad Moinur Rahman (bofh@) for ports exp-runs and
fixes/workarounds and to Dag-Erling (des@) for updating ldns in the
base system.

## Base system compatibility status

Most of the base system is ready for a seamless switch to OpenSSL 3.0.
For several components we've added `-DOPENSSL_API_COMPAT=0x10100000L`
to CFLAGS to specify the API version, which avoids deprecation
warnings from OpenSSL 3.0. Changes have also been made to avoid
OpenSSL APIs already deprecated in OpenSSL 1.1. We can continue the
process of updating to contemporary APIs after OpenSSL 3.0 is in the
tree.
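
To illustrate what that flag does (a minimal sketch, not code from the
src tree): defining the macro before including OpenSSL headers selects
the API level the code is written against, so interfaces deprecated
after that level don't trigger OpenSSL 3.0's deprecation warnings.

  /* Hypothetical example: declare that this file targets the
   * OpenSSL 1.1.0 API, silencing 3.0's deprecation warnings for
   * 1.1-era calls. */
  #define OPENSSL_API_COMPAT 0x10100000L
  #include <openssl/evp.h>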

Additional changes are still required for libarchive and seven
Kerberos-related libraries or tools. Workarounds are ready to go along
with the OpenSSL 3 import, and proper fixes are in progress in the
upstream projects.

A segfault from `openssl x509` in the i386 ports exp-run is under
investigation and needs to be addressed prior to the merge.

## Ports compatibility

With bofh@'s recent www/node18 and www/node20 patches the ports tree
is in reasonable shape for OpenSSL 3.0 in the base system. The exp-run
(link below) has a list of the failing ports, and I've emailed all of
the maintainers as a heads-up. None of the remaining failures are
responsible for a large number of skipped ports (i.e., the failures
are either leaf ports or are responsible for only a small number of
skipped ports). I expect that some or many of these will need to be
addressed after the change lands in the src tree.

## Call for testing

We welcome feedback from anyone willing to test the work in progress.
Pierre's update can be obtained from the pull request[2] or by
fetching the branch[3]. If desired I will provide a large diff against
main.
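
One possible way to check out the branch for local testing (assuming a
plain git clone of freebsd-src; the remote URL and branch name are taken
from link [3] below):

  git remote add khorben https://github.com/khorben/freebsd-src.git
  git fetch khorben khorben/openssl-3.0.9
  git checkout FETCH_HEAD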

## Links

- Base system OpenSSL 3.0 update tracking PR:
  https://bugs.freebsd.org/271615

- Ports exp-run with OpenSSL 3.0 in the base system:
  https://bugs.freebsd.org/271656

[1] https://lists.freebsd.org/archives/freebsd-current/2023-May/003609.html
[2] https://github.com/freebsd/freebsd-src/pull/760
[3] https://github.com/khorben/freebsd-src/tree/khorben/openssl-3.0.9



Re: CURRENT snapshot won't boot due to missing ZFS feature

2023-06-08 Thread Michael Gmelin



On Thu, 8 Jun 2023 16:20:12 +
Glen Barber  wrote:

> On Thu, Jun 08, 2023 at 06:11:15PM +0200, Michael Gmelin wrote:
> > Hi,
> > 
> > I didn't dig into this yet.
> > 
> > After installing the current 14-snapshot (June 1st) in a bhyve-vm, I
> > get this on boot:
> > 
> >   ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
> > 
> > (booting stops at this point)
> > 
> > Seems like the boot loader is missing this recently added feature.
> >   
> 
> Can you try today's snapshot?  They are propagated to most mirrors
> now.
> 

Tried today's snapshot, same problem.

  # reboot
  Waiting (max 60 seconds) for system process `vnlru' to stop... done
  Waiting (max 60 seconds) for system process `syncer' to stop... 
  Syncing disks, vnodes remaining... 0 0 0 0 done
  All buffers synced.
  Uptime: 4m14s
  Consoles: userboot  

  FreeBSD/amd64 User boot lua, Revision 1.2
  ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
  ERROR: cannot open /boot/lua/loader.lua: no such file or directory.


  Type '?' for a list of commands, 'help' for more detailed help.
  OK 


That's after installing CURRENT in a fresh vm managed by vm-bhyve using
bsdinstall's automatic ZFS option.

Best
Michael

-- 
Michael Gmelin



Re: CURRENT snapshot won't boot due to missing ZFS feature

2023-06-08 Thread Yuri
Yuri wrote:
> Michael Gmelin wrote:
>> Hi,
>>
>> I didn't dig into this yet.
>>
>> After installing the current 14-snapshot (June 1st) in a bhyve-vm, I
>> get this on boot:
>>
>>   ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
>>
>> (booting stops at this point)
>>
>> Seems like the boot loader is missing this recently added feature.
> 
> Are you sure it was June 1st's?  I saw this problem on:
> 
> FreeBSD-14.0-CURRENT-amd64-20230427-60167184abd5-262599-disc1.iso
> 
> ...but it was fixed since (for me, at least):
> 
> FreeBSD-14.0-CURRENT-amd64-20230504-4194bbb34c60-262746-disc1.iso

Trying to remember, I think I hit "send" too soon; it was actually
20230504 that had the problem, and I think I had to use the previous one
to install.  Sorry for the noise.



Re: CURRENT snapshot won't boot due to missing ZFS feature

2023-06-08 Thread Yuri
Michael Gmelin wrote:
> Hi,
> 
> I didn't dig into this yet.
> 
> After installing the current 14-snapshot (June 1st) in a bhyve-vm, I
> get this on boot:
> 
>   ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
> 
> (booting stops at this point)
> 
> Seems like the boot loader is missing this recently added feature.

Are you sure it was June 1st's?  I saw this problem on:

FreeBSD-14.0-CURRENT-amd64-20230427-60167184abd5-262599-disc1.iso

...but it was fixed since (for me, at least):

FreeBSD-14.0-CURRENT-amd64-20230504-4194bbb34c60-262746-disc1.iso



Re: CURRENT snapshot won't boot due to missing ZFS feature

2023-06-08 Thread Glen Barber
On Thu, Jun 08, 2023 at 06:11:15PM +0200, Michael Gmelin wrote:
> Hi,
> 
> I didn't dig into this yet.
> 
> After installing the current 14-snapshot (June 1st) in a bhyve-vm, I
> get this on boot:
> 
>   ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2
> 
> (booting stops at this point)
> 
> Seems like the boot loader is missing this recently added feature.
> 

Can you try today's snapshot?  They are propagated to most mirrors now.

Glen





CURRENT snapshot won't boot due to missing ZFS feature

2023-06-08 Thread Michael Gmelin
Hi,

I didn't dig into this yet.

After installing the current 14-snapshot (June 1st) in a bhyve-vm, I
get this on boot:

  ZFS: unsupported feature: com.klarasystems:vdev_zaps_v2

(booting stops at this point)

Seems like the boot loader is missing this recently added feature.

Best
Michael

-- 
Michael Gmelin



Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)

2023-06-08 Thread Rebecca Cran

On 6/8/23 05:48, Warner Losh wrote:



On Thu, Jun 8, 2023, 4:35 AM Rebecca Cran  wrote:

It's ZFS, using the default options when creating it via the FreeBSD
installer so I presume TRIM is enabled. Without a reliable way to
reproduce the error I'm not sure disabling TRIM will help at the
moment.

I don't think there's any newer firmware for it.


PCIe gen 4 has a higher error rate, so that needs to be managed with
retries.  There's a whole protocol to do that, which Linux implements.
I suspect the time has come for us to do so too.  There's some code
floating around that I'll have to track down.


Thanks. I dropped the configuration down to PCIe gen 3 and the errors 
have so far gone away.



nda0: nvme version 1.3 x8 (max x8) lanes PCIe Gen3 (max Gen4) link
nda1: nvme version 1.3 x4 (max x4) lanes PCIe Gen3 (max Gen4) link

--
Rebecca Cran




Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)

2023-06-08 Thread Warner Losh
On Thu, Jun 8, 2023, 4:35 AM Rebecca Cran  wrote:

> It's ZFS, using the default options when creating it via the FreeBSD
> installer so I presume TRIM is enabled. Without a reliable way to
> reproduce the error I'm not sure disabling TRIM will help at the moment.
>
> I don't think there's any newer firmware for it.
>

PCIe gen 4 has a higher error rate, so that needs to be managed with
retries.  There's a whole protocol to do that, which Linux implements. I
suspect the time has come for us to do so too.  There's some code floating
around that I'll have to track down.

Warner

-- 
>
> Rebecca Cran
>
>
> On 6/8/23 04:25, Tomek CEDRO wrote:
> > What filesystem? Is TRIM enabled on that drive? Have you tried
> > disabling TRIM? I had a similar SSD-related problem on a Samsung SSD
> > a long time ago that was related to TRIM. Maybe the drive firmware
> > can be updated too? :-)
> >
> > --
> > CeDeROM, SQ7MHZ, http://www.tomek.cedro.info
>
>


Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)

2023-06-08 Thread Rebecca Cran
It's ZFS, using the default options when creating it via the FreeBSD 
installer so I presume TRIM is enabled. Without a reliable way to 
reproduce the error I'm not sure disabling TRIM will help at the moment.


I don't think there's any newer firmware for it.


--

Rebecca Cran


On 6/8/23 04:25, Tomek CEDRO wrote:
What filesystem? Is TRIM enabled on that drive? Have you tried
disabling TRIM? I had a similar SSD-related problem on a Samsung SSD a
long time ago that was related to TRIM. Maybe the drive firmware can be
updated too? :-)


--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info




Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)

2023-06-08 Thread Tomek CEDRO
What filesystem? Is TRIM enabled on that drive? Have you tried disabling
TRIM? I had a similar SSD-related problem on a Samsung SSD a long time
ago that was related to TRIM. Maybe the drive firmware can be updated
too? :-)
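
For reference, OpenZFS exposes automatic TRIM as a pool property; a
sketch of checking and disabling it (pool name zroot assumed):

  # zpool get autotrim zroot
  # zpool set autotrim=off zroot

Manual trims via `zpool trim` would still be possible afterwards.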

--
CeDeROM, SQ7MHZ, http://www.tomek.cedro.info


Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)

2023-06-08 Thread Rebecca Cran

On 6/8/23 00:24, Warner Losh wrote:

PCIe 3 or PCIe 4?


PCIe 4.


nda0 at nvme0 bus 0 scbus0 target 0 lun 1
nda0: 
nda0: Serial Number S55KNC0TC00168
nda0: nvme version 1.3 x8 (max x8) lanes PCIe Gen4 (max Gen4) link
nda0: 6104710MB (12502446768 512 byte sectors)

--

Rebecca Cran




Re: Seemingly random nvme (nda) write error on new drive (retries exhausted)

2023-06-08 Thread Warner Losh
On Wed, Jun 7, 2023 at 11:12 PM Rebecca Cran  wrote:

> I got a seemingly random nvme data transfer error on my new arm64 Ampere
> Altra machine, which has a Samsung PM1735 PCIe AIC NVMe drive.
>
> Since it's a new drive and smartctl doesn't show any errors I thought it
> might be worth mentioning here.
>
> I'm running 14.0-CURRENT FreeBSD 14.0-CURRENT #0 main-n263139-baef3a5b585f.
>
>
> dmesg contains:
>
> nvme0: WRITE sqid:16 cid:126 nsid:1 lba:2550684560 len:8
> nvme0: DATA TRANSFER ERROR (00/04) crd:0 m:0 dnr:0 sqid:16 cid:126 cdw0:0
> (nda0:nvme0:0:0:1): WRITE. NCB: opc=1 fuse=0 nsid=1 prp1=0 prp2=0
> cdw=98085b90 0 7 0 0 0
> (nda0:nvme0:0:0:1): CAM status: CCB request completed with an error
> (nda0:nvme0:0:0:1): Error 5, Retries exhausted
>
>
> nvmecontrol identify nvme0 shows:
>
> Vendor ID:   144d
> Subsystem Vendor ID: 144d
> Model Number:SAMSUNG MZPLJ6T4HALA-7
> Firmware Version:EPK9CB5Q
> Recommended Arb Burst:   8
> IEEE OUI Identifier: 00 25 38
> Multi-Path I/O Capabilities: Multiple controllers, Multiple ports
> Max Data Transfer Size:  131072 bytes
> Sanitize Crypto Erase:   Supported
> Sanitize Block Erase:Supported
> Sanitize Overwrite:  Not Supported
> Sanitize NDI:Not Supported
> Sanitize NODMMAS:Undefined
> Controller ID:   0x0041
> Version: 1.3.0
>

PCIe 3 or PCIe 4?

So the only documented reason for this error is if we set up the memory
wrong, such that the drive couldn't start a transfer from the specified
address. This seems weird to me... But the prior paragraph talks about
other types of aborts that need software intervention. If this is a
transient error, then maybe we should retry it as part of the data
recovery, unless the "do not retry" bit is set, which it isn't. I wonder
whether this is retried 5 times or not before generating the error...
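
For readers decoding that dmesg line: the fields map onto the NVMe
completion status word. A small self-contained sketch of the decoding,
based on the bit layout in the NVMe spec (not FreeBSD driver code):

  #include <stdint.h>
  #include <stdio.h>

  /*
   * Decode an NVMe completion status word the way the
   * "(00/04) crd:0 m:0 dnr:0" dmesg line presents it.  Layout per the
   * NVMe spec: Completion Queue Entry DW3 bits 31:16, where bit 0 of
   * the 16-bit field is the phase tag.
   */
  static void
  nvme_status_decode(uint16_t status)
  {
          uint8_t sc  = (status >> 1) & 0xff;  /* Status Code */
          uint8_t sct = (status >> 9) & 0x7;   /* Status Code Type */
          uint8_t crd = (status >> 12) & 0x3;  /* Command Retry Delay */
          uint8_t m   = (status >> 14) & 0x1;  /* More */
          uint8_t dnr = (status >> 15) & 0x1;  /* Do Not Retry */

          printf("(%02x/%02x) crd:%u m:%u dnr:%u\n", sct, sc, crd, m, dnr);
          if (!dnr)
                  printf("DNR clear: the spec permits the host to retry\n");
  }

  int
  main(void)
  {
          /* SCT 0 (generic), SC 4 (Data Transfer Error), flags clear. */
          nvme_status_decode(4 << 1);
          return (0);
  }

So by the spec's own encoding, the failed WRITE was a retryable, generic
data transfer error, which fits Warner's point about host-side retries.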

Warner