Re: USB Disk Stalls on -current

2022-02-06 Thread Warner Losh
On Mon, Feb 7, 2022, 12:51 AM grarpamp  wrote:

> Yes, some USB hw is very flaky,
> but ZFS can work great on these...
>
> https://www.youtube.com/watch?v=7z526m1jvls
> https://www.youtube.com/watch?v=dougISKs2vQ
> https://vimeo.com/13758987
> https://www.youtube.com/watch?v=1zIoK_9UzHk


Doesn't help the performance hiccups though. If you are waiting for a drive
to spin up or come out of power save mode or a USB interface chip to reset,
the filesystem isn't going to matter.

ZFS does a good job recovering in the face of this, but it's not without a
performance hit.

Warner


Re: USB Disk Stalls on -current

2022-02-06 Thread grarpamp
Yes, some USB hw is very flaky,
but ZFS can work great on these...

https://www.youtube.com/watch?v=7z526m1jvls
https://www.youtube.com/watch?v=dougISKs2vQ
https://vimeo.com/13758987
https://www.youtube.com/watch?v=1zIoK_9UzHk



FW: pciconf -lbvV crashes kernel main-8d72c409c - 2022-02-07

2022-02-06 Thread Michael Jung
Hi:

Here are the kernel.full files some of you asked for.  Let me know what else
may be helpful to test.

Thanks!

Michael Jung

Notes below * (UPDATED)

* Started fresh

Installed FreeBSD-14.0-CURRENT-amd64-20220113-0910a41ef3b-252413-disc1.iso
with its accompanying source tree. Built kernel/world, installed them, rebooted and

http://mail.mikej.com/core.txt.0
http://mail.mikej.com/info.0
http://mail.mikej.com/kernel.full.0
http://mail.mikej.com/vmcore.0

* I deleted all contents of /usr/obj, updated /usr/src via git to
b6724f7004c, built fresh, and

14.0-CURRENT FreeBSD 14.0-CURRENT #1 main-n252997-b6724f7004c

http://mail.mikej.com/core.txt.1
http://mail.mikej.com/info.1
http://mail.mikej.com/kernel.full.1
http://mail.mikej.com/vmcore.1


* It's the '-V' switch that triggers the panic
* I did note that pciconf does not identify my CPU correctly (see below)
* The panic always occurs after ix0 output from pciconf -lbvV (see below)
ix1 has always followed next with -lbv - it has never been present 
with -lbvV
* Switching the LSI 9208-16E and Intel dual port X520 PCI slots did not
help
* ixl1 definitely works using the exact same SFP+ adapter and cable
* Removing the Intel X520 82599ES allows pciconf -lbvV to run
* I have a single port version of this Intel X520 which works

It seems to me that when pciconf iterates to ix1@pci0:2:0:1 the problem always
occurs.

I build like this:

nice make -DWITH_META_MODE -DFAST_DEPEND -DWITHOUT_CLEAN -sj12 buildworld
nice make -DWITH_META_MODE -DFAST_DEPEND -DWITHOUT_CLEAN -sj12 buildkernel

root@draid:/var/crash # cat /etc/make.conf
WITH_CCACHE_BUILD=yes
CCACHE_DIR=/root/.ccache
root@draid:/var/crash

There is no src.conf or sysctl.conf, and I have only added 'filemon_load="YES"'
to /boot/loader.conf



* I did note that pciconf does not identify my CPU correctly

It is actually an Intel Core i7-3770K - a program in ports that gets it
right is sysutils/cpufetch.



* Here is "-vlb"


root@draid:/home/mikej # pciconf -vlb

pciconf -lvb
hostb0@pci0:0:0:0:  class=0x06 rev=0x09 hdr=0x00 vendor=0x8086 device=0x0150 subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel Corporation'
    device     = 'Xeon E3-1200 v2/3rd Gen Core processor DRAM Controller'
    class      = bridge
    subclass   = HOST-PCI
pcib1@pci0:0:1:0:   class=0x060400 rev=0x09 hdr=0x01 vendor=0x8086 device=0x0151 subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel Corporation'
    device     = 'Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
pcib2@pci0:0:1:1:   class=0x060400 rev=0x09 hdr=0x01 vendor=0x8086 device=0x0155 subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel Corporation'
    device     = 'Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port'
    class      = bridge
    subclass   = PCI-PCI
none0@pci0:0:22:0:  class=0x078000 rev=0x04 hdr=0x00 vendor=0x8086 device=0x1c3a subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family MEI Controller'
    class      = simple comms
    bar   [10] = type Memory, range 64, base 0xf7f0b000, size 16, enabled
ehci0@pci0:0:26:0:  class=0x0c0320 rev=0x05 hdr=0x00 vendor=0x8086 device=0x1c2d subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family USB Enhanced Host Controller'
    class      = serial bus
    subclass   = USB
    bar   [10] = type Memory, range 32, base 0xf7f08000, size 1024, enabled
hdac1@pci0:0:27:0:  class=0x040300 rev=0x05 hdr=0x00 vendor=0x8086 device=0x1c20 subvendor=0x1043 subdevice=0x8469
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family High Definition Audio Controller'
    class      = multimedia
    subclass   = HDA
    bar   [10] = type Memory, range 64, base 0xf7f0, size 16384, enabled
pcib3@pci0:0:28:0:  class=0x060400 rev=0xb5 hdr=0x01 vendor=0x8086 device=0x1c10 subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 1'
    class      = bridge
    subclass   = PCI-PCI
pcib4@pci0:0:28:4:  class=0x060400 rev=0xb5 hdr=0x01 vendor=0x8086 device=0x1c18 subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 5'
    class      = bridge
    subclass   = PCI-PCI
pcib5@pci0:0:28:5:  class=0x060400 rev=0xb5 hdr=0x01 vendor=0x8086 device=0x1c1a subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel Corporation'
    device     = '6 Series/C200 Series Chipset Family PCI Express Root Port 6'
    class      = bridge
    subclass   = PCI-PCI
pcib6@pci0:0:28:6:  class=0x060400 rev=0xb5 hdr=0x01 vendor=0x8086 device=0x1c1c subvendor=0x1043 subdevice=0x844d
    vendor     = 'Intel

Re: USB Disk Stalls on -current

2022-02-06 Thread Warner Losh
On Sun, Feb 6, 2022 at 11:32 PM grarpamp  wrote:

> > Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 28
> > 00 36 69 02 6e 00 00 80 00
> > Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB
> > request completed with an error
> > Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying command,
> > 2 more tries remain
>
> Isn't there a mechanism for kernel/cam/driver to issue a
> sense request to fetch and print the actual error...
>

We do that, but since this is a timeout, there's no sense to
print (otherwise we'd print the sense here). We definitely report
those errors for things like media error, etc.

>
> assuming, which is fine, that the bus or devices aren't
> already locked up, in reset, etc such that a sense
> would go unfulfilled or already be cleared.
>

I'm pretty sure the problem here is that the disks are timing out
for some reason. Many USB drives are designed for occasional
use, and often have aggressive power saving modes, which are
known to hiccup like this. And many USB-to-SATA bridge chips
have been known to glitch out under load (sometimes transparently,
sometimes not).

Warner


Re: USB Disk Stalls on -current

2022-02-06 Thread grarpamp
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 28
> 00 36 69 02 6e 00 00 80 00
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB
> request completed with an error
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying command,
> 2 more tries remain

Isn't there a mechanism for kernel/cam/driver to issue a
sense request to fetch and print the actual error...
assuming, which is fine, that the bus or devices aren't
already locked up, in reset, etc such that a sense
would go unfulfilled or already be cleared.



Re: USB Disk Stalls on -current

2022-02-06 Thread Graham Perrin

On 06/02/2022 17:14, Sean Bruno wrote:

… the clanking/grinding sound of the spinning rust on my desk 
completely stops, the encoding of the video files stops (so it's
waiting for a read to complete)…


On 06/02/2022 19:02, Sean Bruno wrote:

… assuming that I have a fairly dodgy USB device, as the pauses seem 
to correspond to this from CAM being emitted:


Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 
28 00 36 69 02 6e 00 00 80 00
Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB 
request completed with an error
Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying 
command, 2 more tries remain


Things resume after this is emitted, but there is a substantial 
(multiple minutes) pause here. …


Yep, for a pause of that length (without a logged exhaustion) I'd 
suspect the hard disk drive.


Over the years, I've seen 'CCB request completed with an error' often 
enough, and in various situations, with _good_ media e.g. 
, to get a sense (for myself) of whether 
there's a marginal cable, marginal port at the computer, marginal port 
at the drive, or some combination of those three things.


Can you get S.M.A.R.T. data?

An extended self-test might not expose an issue that becomes evident 
with sustained _writes_ (although I note your opening post comment about 
waiting for a read to complete).


HTH
Graham




buildworld failed

2022-02-06 Thread qroxana
I know running make install for /usr/src/tools/build/test-includes can fix this,
but this still fails on a newly installed 14.0-CURRENT.

--- test-includes ---
cd /usr/src/tools/build/test-includes; MACHINE_ARCH=aarch64 MACHINE=arm64 
CPUTYPE= CC="cc -target aarch64-unknown-freebsd14.0 
--sysroot=/usr/obj/usr/src/arm64.aarch64/tmp 
-B/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin -target 
aarch64-unknown-freebsd14.0 --sysroot=/usr/obj/usr/src/arm64.aarch64/tmp 
-B/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin" CXX="c++ -target 
aarch64-unknown-freebsd14.0 --sysroot=/usr/obj/usr/src/arm64.aarch64/tmp 
-B/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin -target 
aarch64-unknown-freebsd14.0 --sysroot=/usr/obj/usr/src/arm64.aarch64/tmp 
-B/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin" CPP="cpp -target 
aarch64-unknown-freebsd14.0 --sysroot=/usr/obj/usr/src/arm64.aarch64/tmp 
-B/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin -target 
aarch64-unknown-freebsd14.0 --sysroot=/usr/obj/usr/src/arm64.aarch64/tmp 
-B/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin" AS="as" AR="ar" ELFCTL="elfctl" 
LD="ld" LLVM_LINK="" NM=nm OBJCOPY="objcopy" RANLIB=ranlib STRINGS= SIZE="size" 
STRIPBIN="strip" INSTALL="install -U" 
PATH=/usr/obj/usr/src/arm64.aarch64/tmp/bin:/usr/obj/usr/src/arm64.aarch64/tmp/usr/sbin:/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin:/usr/obj/usr/src/arm64.aarch64/tmp/legacy/usr/sbin:/usr/obj/usr/src/arm64.aarch64/tmp/legacy/usr/bin:/usr/obj/usr/src/arm64.aarch64/tmp/legacy/bin:/usr/obj/usr/src/arm64.aarch64/tmp/legacy/usr/libexec::/usr/obj/usr/src/arm64.aarch64/tmp/bin:/usr/obj/usr/src/arm64.aarch64/tmp/usr/sbin:/usr/obj/usr/src/arm64.aarch64/tmp/usr/bin:/usr/obj/usr/src/arm64.aarch64/tmp/legacy/usr/sbin:/usr/obj/usr/src/arm64.aarch64/tmp/legacy/usr/bin:/usr/obj/usr/src/arm64.aarch64/tmp/legacy/bin:/usr/obj/usr/src/arm64.aarch64/tmp/legacy/usr/libexec::/sbin:/bin:/usr/sbin:/usr/bin
 SYSROOT=/usr/obj/usr/src/arm64.aarch64/tmp make 
DESTDIR=/usr/obj/usr/src/arm64.aarch64/tmp test-includes
--- sys/abi_compat.c ---
--- sys/acct.c ---
--- sys/acl.c ---
--- sys/aio.c ---
--- sys/abi_compat.c ---
echo "#include " > sys/abi_compat.c
sh: cannot create sys/abi_compat.c: No such file or directory
*** [sys/abi_compat.c] Error code 2

make[4]: stopped in /usr/src/tools/build/test-includes
--- sys/acct.c ---
echo "#include " > sys/acct.c
sh: cannot create sys/acct.c: No such file or directory
*** [sys/acct.c] Error code 2

make[4]: stopped in /usr/src/tools/build/test-includes
--- sys/aio.c ---
echo "#include " > sys/aio.c
sh: cannot create sys/aio.c: No such file or directory
*** [sys/aio.c] Error code 2

make[4]: stopped in /usr/src/tools/build/test-includes
--- sys/acl.c ---
echo "#include " > sys/acl.c
sh: cannot create sys/acl.c: No such file or directory
*** [sys/acl.c] Error code 2

Re: Dragonfly Mail Agent (dma) in the base system

2022-02-06 Thread Jamie Landeg-Jones
Cy Schubert  wrote:

> In message <202202061553.216fr0yt071...@donotpassgo.dyslexicfish.net>,
> Jamie Landeg-Jones writes:
> > Cy Schubert  wrote:
> >
> > > dma doesn't support SMTP submission, we may need to review various port 
> > > default options or whether ports even support it.
> >
> > Good catch.
>
> You misquoted me. Read my email again!

Sorry, I read it again, but it still reads to me as "some ports only work via
SMTP submission, so they will need to be looked at."

I suggested an alternative: "emulating" the SMTP submission
functionality instead (though maybe in a better way than my suggested hack).

After all, it isn't just ports - there could be other third party stuff
that only works via submission too.

So, to avoid breaking functionality, SMTP submission is something we should
think about continuing to support, hence my use of the phrase "good catch".

Is this not correct?

cheers, Jamie

> > Would a suitable workaround be to parse the dma.conf file for the SMARTHOST
> > address, and then set up a simple tcp proxy on the local submission port to
> > that?
>
> Your comment is based on a false premise.



Re: USB Disk Stalls on -current

2022-02-06 Thread Mehmet Erol Sanliturk
On Sun, Feb 6, 2022 at 10:11 PM Warner Losh  wrote:

>
>
> On Sun, Feb 6, 2022 at 12:02 PM Sean Bruno  wrote:
>
>>
>>
>> >
>> >
>> > So there's some tools you can use. For usb, there's usbdump that can
>> > get you the USB transactions. I've not used it enough to give more
>> details
>> > here. This will let you know what's going on, and when, on the USB
>> endpoint.
>> >
>> > You can also enable the CAM_IOSCHED stuff. This will allow you to get
>> > latency
>> > measurements for 'requests in the sim' which basically will tell you
>> > what your
>> > latency spread is for the drives. This will tell you if things are
>> > getting caught
>> > up in the USB layer, or after CAM's da driver completes the I/O request
>> > (granted, that's almost certainly not happening, but it will help you
>> > figure out
>> > what's going on and put numbers to the oddities you are seeing).
>> >
> >> > Also, make sure you have good cables. I've had lots of hiccups over the
>> > years from dodgy USB cables. Also make sure you have good, high quality
>> > enclosures. Many from the USB2 time-period are sketchy at best and I
>> > went through several at one point trying to find a good one. I'd be
>> > tempted to
>> > get USB 3 enclosures. I've had better luck with USB3 gear than USB2 gear
>> > here, but you need a USB-3 controller to get USB-3 speeds which might
>> not
>> > be compatible with the NUC's built-in stuff (though my NUC has one USB3
>> > port, there's lots of different models).
>> >
>> > Usually, though, I see weirdness associated with dmesg messages from
> >> > usb, cam, etc when the hardware is on the sketchy end.
>> >
>> > Warner
>>
>> I'm assuming that I have a fairly dodgy USB device, as the pauses seem
>> to correspond to this from CAM being emitted:
>>
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 28
>> 00 36 69 02 6e 00 00 80 00
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB
>> request completed with an error
>> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying command,
>> 2 more tries remain
>>
>>
>> Things resume after this is emitted, but there is a substantial
>> (multiple minutes) pause here.  I would assume that timeouts would fire
>> much quicker.
>>
>
> The default timeout is 60s.
>
> You can reduce that substantially by setting kern.cam.da.default_timeout
> to a smaller value. Disk operations complete within 5s these days,
> except spin-ups. Heck, nearly all complete within 500ms. You
> might try setting this value to maybe 3 or 5 or 10 to see if that helps the
> hiccups without introducing extra retries when the load is heavy. The
> smaller values give a faster recovery, but too small a number may result
> in timeouts and errors under load. I think you need to set this as a
> tunable.
>
> Warner
>



Are your external disks "green", i.e. of the "energy saver" kind?

If they are, they go to sleep when they have not been used for a while,
and waking them up takes time. That causes significant delays, because
every access after such a sleep has to wait for the wake-up.

Another problem is that external USB disks are slower than internal
(non-energy-saver) SATA disks.

When response time is important, such "green" disks are best avoided.



Mehmet Erol Sanliturk


Re: USB Disk Stalls on -current

2022-02-06 Thread Warner Losh
On Sun, Feb 6, 2022 at 12:02 PM Sean Bruno  wrote:

>
>
> >
> >
> > So there's some tools you can use. For usb, there's usbdump that can
> > get you the USB transactions. I've not used it enough to give more
> details
> > here. This will let you know what's going on, and when, on the USB
> endpoint.
> >
> > You can also enable the CAM_IOSCHED stuff. This will allow you to get
> > latency
> > measurements for 'requests in the sim' which basically will tell you
> > what your
> > latency spread is for the drives. This will tell you if things are
> > getting caught
> > up in the USB layer, or after CAM's da driver completes the I/O request
> > (granted, that's almost certainly not happening, but it will help you
> > figure out
> > what's going on and put numbers to the oddities you are seeing).
> >
> > Also, make sure you have good cables. I've had lots of hiccups over the
> > years from dodgy USB cables. Also make sure you have good, high quality
> > enclosures. Many from the USB2 time-period are sketchy at best and I
> > went through several at one point trying to find a good one. I'd be
> > tempted to
> > get USB 3 enclosures. I've had better luck with USB3 gear than USB2 gear
> > here, but you need a USB-3 controller to get USB-3 speeds which might not
> > be compatible with the NUC's built-in stuff (though my NUC has one USB3
> > port, there's lots of different models).
> >
> > Usually, though, I see weirdness associated with dmesg messages from
> > usb, cam, etc when the hardware is on the sketchy end.
> >
> > Warner
>
> I'm assuming that I have a fairly dodgy USB device, as the pauses seem
> to correspond to this from CAM being emitted:
>
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 28
> 00 36 69 02 6e 00 00 80 00
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB
> request completed with an error
> Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying command,
> 2 more tries remain
>
>
> Things resume after this is emitted, but there is a substantial
> (multiple minutes) pause here.  I would assume that timeouts would fire
> much quicker.
>

The default timeout is 60s.

You can reduce that substantially by setting kern.cam.da.default_timeout
to a smaller value. Disk operations complete within 5s these days,
except spin-ups. Heck, nearly all complete within 500ms. You
might try setting this value to maybe 3 or 5 or 10 to see if that helps the
hiccups without introducing extra retries when the load is heavy. The
smaller values give a faster recovery, but too small a number may result
in timeouts and errors under load. I think you need to set this as a
tunable.
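
As a rough illustration of why a stall can run to minutes: if every attempt
on a command has to run into the timeout before it is retried, the worst-case
stall is about the timeout multiplied by the number of attempts. The sketch
below works that arithmetic. The helper name is made up for this example (it
is not a CAM function), and the retry count you feed it is an assumption; the
thread does not say how many retries are configured.

```c
#include <assert.h>
#include <limits.h>

/*
 * Worst-case stall for one command, assuming every attempt runs into the
 * timeout: the first try plus each retry costs one full timeout.
 * Hypothetical helper for illustration; this is not a CAM function.
 */
static unsigned
worst_case_stall(unsigned timeout_s, unsigned retries)
{
	assert(retries < UINT_MAX);
	assert(timeout_s <= UINT_MAX / (retries + 1));	/* no overflow */
	return (timeout_s * (retries + 1));
}
```

With the 60s default and an assumed 4 retries, worst_case_stall(60, 4) is
300 seconds, which would be consistent with a multi-minute pause; with a 5s
timeout the same case drops to 25 seconds.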

Warner


Re: USB Disk Stalls on -current

2022-02-06 Thread Sean Bruno







> So there are some tools you can use. For usb, there's usbdump that can
> get you the USB transactions. I've not used it enough to give more details
> here. This will let you know what's going on, and when, on the USB endpoint.
>
> You can also enable the CAM_IOSCHED stuff. This will allow you to get
> latency measurements for 'requests in the sim' which basically will tell
> you what your latency spread is for the drives. This will tell you if
> things are getting caught up in the USB layer, or after CAM's da driver
> completes the I/O request (granted, that's almost certainly not happening,
> but it will help you figure out what's going on and put numbers to the
> oddities you are seeing).
>
> Also, make sure you have good cables. I've had lots of hiccups over the
> years from dodgy USB cables. Also make sure you have good, high quality
> enclosures. Many from the USB2 time-period are sketchy at best and I
> went through several at one point trying to find a good one. I'd be
> tempted to get USB 3 enclosures. I've had better luck with USB3 gear than
> USB2 gear here, but you need a USB-3 controller to get USB-3 speeds which
> might not be compatible with the NUC's built-in stuff (though my NUC has
> one USB3 port, there's lots of different models).
>
> Usually, though, I see weirdness associated with dmesg messages from
> usb, cam, etc when the hardware is on the sketchy end.
>
> Warner


I'm assuming that I have a fairly dodgy USB device, as the pauses seem 
to correspond to this from CAM being emitted:


Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): READ(10). CDB: 28 
00 36 69 02 6e 00 00 80 00
Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): CAM status: CCB 
request completed with an error
Feb  6 11:56:43 alice kernel: (da0:umass-sim1:1:0:0): Retrying command, 
2 more tries remain



Things resume after this is emitted, but there is a substantial 
(multiple minutes) pause here.  I would assume that timeouts would fire 
much quicker.
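
A note on reading the log: the READ(10) CDB shown above can be decoded by
hand. The sketch below assumes the standard READ(10) layout (opcode 0x28,
a 4-byte big-endian LBA in bytes 2..5, a 2-byte big-endian transfer length
in bytes 7..8). decode_read10 and struct read10_fields are names made up for
this example, not CAM interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a 10-byte READ(10) command descriptor block. */
struct read10_fields {
	uint32_t lba;		/* bytes 2..5, big-endian */
	uint16_t blocks;	/* bytes 7..8, big-endian */
};

/*
 * Decode a READ(10) CDB.  Returns 0 on success, -1 if the opcode is not
 * 0x28.  Illustrative helper; the names are not taken from CAM.
 */
static int
decode_read10(const uint8_t cdb[10], struct read10_fields *out)
{
	assert(cdb != NULL && out != NULL);
	if (cdb[0] != 0x28)
		return (-1);
	out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
	    ((uint32_t)cdb[4] << 8) | (uint32_t)cdb[5];
	out->blocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
	return (0);
}

/* The CDB from the log: 28 00 36 69 02 6e 00 00 80 00 */
static const uint8_t logged_cdb[10] = {
	0x28, 0x00, 0x36, 0x69, 0x02, 0x6e, 0x00, 0x00, 0x80, 0x00
};
```

For the logged CDB this gives LBA 0x3669026e (912851566) and a transfer
length of 0x0080 (128 blocks). With the 512-byte sectors that da0 reports
elsewhere in this thread, that is a 64 KiB read that lies well inside the
3907029168-sector disk, so the request itself looks ordinary.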


sean



Re: USB Disk Stalls on -current

2022-02-06 Thread Warner Losh
On Sun, Feb 6, 2022 at 10:15 AM Sean Bruno  wrote:

> I'm doing something "gross" with ZFS & Plex on a little Intel NUC that I
> have here at the house to provide me with a nice little NAS at home.
> I'm using 2x USB2 external disks as the mirror.
>
> I noted that the two USB2 disks I'm using in a mirror seem to "stall"
> from time to time and it's not clear to me why.
>
> I'd like to poke further into the USB system but I'm not sure where I
> should start to see if there is something amiss with the hardware (e.g.
> the disks suck) or if FreeBSD is losing track of something during I/O
> leading to a stall/timeout.
>
> I'm not seeing data loss or anything, I just note from time to time
> during large file transfers that the clanking/grinding sound of the
> spinning rust on my desk completely stops, the encoding of the video
> files stops (so it's waiting for a read to complete) and it gets much
> quieter in my office.  :-)
>

So there are some tools you can use. For usb, there's usbdump that can
get you the USB transactions. I've not used it enough to give more details
here. This will let you know what's going on, and when, on the USB endpoint.

You can also enable the CAM_IOSCHED stuff. This will allow you to get
latency measurements for 'requests in the sim' which basically will tell
you what your latency spread is for the drives. This will tell you if
things are getting caught up in the USB layer, or after CAM's da driver
completes the I/O request (granted, that's almost certainly not happening,
but it will help you figure out what's going on and put numbers to the
oddities you are seeing).

Also, make sure you have good cables. I've had lots of hiccups over the
years from dodgy USB cables. Also make sure you have good, high quality
enclosures. Many from the USB2 time-period are sketchy at best and I
went through several at one point trying to find a good one. I'd be
tempted to get USB 3 enclosures. I've had better luck with USB3 gear than
USB2 gear here, but you need a USB-3 controller to get USB-3 speeds which
might not be compatible with the NUC's built-in stuff (though my NUC has
one USB3 port, there's lots of different models).

Usually, though, I see weirdness associated with dmesg messages from
usb, cam, etc when the hardware is on the sketchy end.

Warner


Re: USB Disk Stalls on -current

2022-02-06 Thread Sean Bruno




On 2/6/22 10:52, Mehmet Erol Sanliturk wrote:



On Sun, Feb 6, 2022 at 8:15 PM Sean Bruno wrote:


I'm doing something "gross" with ZFS & Plex on a little Intel NUC that I
have here at the house to provide me with a nice little NAS at home.
I'm using 2x USB2 external disks as the mirror.

I noted that the two USB2 disks I'm using in a mirror seem to "stall"
from time to time and it's not clear to me why.

I'd like to poke further into the USB system but I'm not sure where I
should start to see if there is something amiss with the hardware (e.g.
the disks suck) or if FreeBSD is losing track of something during I/O
leading to a stall/timeout.

I'm not seeing data loss or anything, I just note from time to time
during large file transfers that the clanking/grinding sound of the
spinning rust on my desk completely stops, the encoding of the video
files stops (so it's waiting for a read to complete) and it gets much
quieter in my office.  :-)

sean



I encountered a similar case in Fedora Linux with an external USB 2.0 disk.
When the external disk was connected to a USB 1.? port, loading the
operating system was terribly slow, although some parts occasionally ran
at normal speed.

You may want to check the versions of your USB ports to make sure they match.
The USB port on the board may be 2.0 while the connected chassis USB port
is 1.?, as in my chassis.

Once the external USB disk was connected to the chassis USB 2.0 port,
everything became normal.


Mehmet Erol Sanliturk






I see them all come up as 480 Mbps / USB 2.0, if usbconfig and the driver
attach messages are anything to go by.  Disk read/write perf when running
seems to approach 40MB/s, so I think it's running pretty close to the
correct speed.

...
ugen0.7:  at usbus0
umass1 on uhub0
umass1:  on usbus0
da0 at umass-sim1 bus 1 scbus4 target 0 lun 0
da0:  Fixed Direct Access SCSI device
da0: 40.000MB/s transfers
da0: 1907729MB (3907029168 512 byte sectors)
da0: quirks=0x2
Root mount waiting for: usbus0

ugen0.8:  at usbus0
umass2 on uhub0
umass2:  on usbus0
da1 at umass-sim2 bus 2 scbus5 target 0 lun 0
da1:  Fixed Direct Access SPC-2 SCSI device
da1: Serial Number ABCDEF0123456847
da1: 40.000MB/s transfers
da1: 1907729MB (3907029168 512 byte sectors)
da1: quirks=0x2

...
ugen0.7:  at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) pwr=ON (2mA)
ugen0.8:  at usbus0, cfg=0 md=HOST spd=HIGH 
(480Mbps) pwr=ON (500mA)

...
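
As a quick sanity check on the figures above, the arithmetic can be written
out. This is only a sketch: the helper names are made up for illustration,
and it treats 480 Mbps as a raw signalling rate, so it says nothing about
how much of that is lost to protocol overhead.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a signalling rate in Mbit/s to MB/s (decimal, 8 bits per byte). */
static double
mbit_to_mbyte(double mbit_per_s)
{
	return (mbit_per_s / 8.0);
}

/* Capacity in MiB (2^20 bytes) from a sector count and a sector size. */
static uint64_t
capacity_mib(uint64_t sectors, uint32_t sector_size)
{
	assert(sector_size != 0);
	return (sectors * sector_size / (1024 * 1024));
}
```

mbit_to_mbyte(480.0) is 60 MB/s, so the observed 40MB/s is about two thirds
of the raw USB 2.0 rate. capacity_mib(3907029168, 512) is 1907729, which
matches the "1907729MB" that da0 and da1 report.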

sean



Re: USB Disk Stalls on -current

2022-02-06 Thread Mehmet Erol Sanliturk
On Sun, Feb 6, 2022 at 8:15 PM Sean Bruno  wrote:

> I'm doing something "gross" with ZFS & Plex on a little Intel NUC that I
> have here at the house to provide me with a nice little NAS at home.
> I'm using 2x USB2 external disks as the mirror.
>
> I noted that the two USB2 disks I'm using in a mirror seem to "stall"
> from time to time and it's not clear to me why.
>
> I'd like to poke further into the USB system but I'm not sure where I
> should start to see if there is something amiss with the hardware (e.g.
> the disks suck) or if FreeBSD is losing track of something during I/O
> leading to a stall/timeout.
>
> I'm not seeing data loss or anything, I just note from time to time
> during large file transfers that the clanking/grinding sound of the
> spinning rust on my desk completely stops, the encoding of the video
> files stops (so it's waiting for a read to complete) and it gets much
> quieter in my office.  :-)
>
> sean
>
>

I encountered a similar case in Fedora Linux with an external USB 2.0 disk.
When the external disk was connected to a USB 1.? port, loading the
operating system was terribly slow, although some parts occasionally ran
at normal speed.

You may want to check the versions of your USB ports to make sure they match.
The USB port on the board may be 2.0 while the connected chassis USB port
is 1.?, as in my chassis.
Once the external USB disk was connected to the chassis USB 2.0 port,
everything became normal.


Mehmet Erol Sanliturk


USB Disk Stalls on -current

2022-02-06 Thread Sean Bruno
I'm doing something "gross" with ZFS & Plex on a little Intel NUC that I 
have here at the house to provide me with a nice little NAS at home. 
I'm using 2x USB2 external disks as the mirror.


I noted that the two USB2 disks I'm using in a mirror seem to "stall" 
from time to time and it's not clear to me why.


I'd like to poke further into the USB system but I'm not sure where I 
should start to see if there is something amiss with the hardware (e.g. 
the disks suck) or if FreeBSD is losing track of something during I/O 
leading to a stall/timeout.


I'm not seeing data loss or anything, I just note from time to time 
during large file transfers that the clanking/grinding sound of the 
spinning rust on my desk completely stops, the encoding of the video 
files stops (so it's waiting for a read to complete) and it gets much
quieter in my office.  :-)


sean



Re: Dragonfly Mail Agent (dma) in the base system

2022-02-06 Thread Jamie Landeg-Jones
Cy Schubert  wrote:

> dma doesn't support SMTP submission, we may need to review various port 
> default options or whether ports even support it.

Good catch.

Would a suitable workaround be to parse the dma.conf file for the SMARTHOST
address, and then set up a simple tcp proxy on the local submission port to
that?



Re: pciconf -lbvV crashes kernel main-8d72c409c

2022-02-06 Thread Stefan Esser
Am 06.02.22 um 01:19 schrieb Michael Jung:
> Dump header from device: /dev/ada0p2
> Architecture: amd64
> Architecture Version: 2
> Dump Length: 900231168
> Blocksize: 512
> Compression: none
> Dumptime: 2022-02-04 15:48:08 -0500
> Hostname: draid.mikej.com
> Magic: FreeBSD Kernel Dump
> Version String: FreeBSD 14.0-CURRENT #1 main-8d72c409c: Thu Feb 3 18:14:01 
> EST 2022
> mikej@draid:/usr/obj/usr/src/amd64.amd64/sys/GENERIC
> Panic String: length mismatch
> Dump Parity: 1692982593
> Bounds: 2
> Dump Status: good

This is caused by the following code fragments:

	/*
	 * Calculate the amount of space needed in the data buffer.  An
	 * identifier element is always present followed by the read-only
	 * and read-write keywords.
	 */
	len = sizeof(struct pci_vpd_element) + strlen(vpd->vpd_ident);
	for (i = 0; i < vpd->vpd_rocnt; i++)
		len += sizeof(struct pci_vpd_element) + vpd->vpd_ros[i].len;
	for (i = 0; i < vpd->vpd_wcnt; i++)
		len += sizeof(struct pci_vpd_element) + vpd->vpd_w[i].len;
[...]
	vpd_user = lvio->plvi_data;
[...]
	vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
	vpd_element.pve_flags = 0;
	for (i = 0; i < vpd->vpd_rocnt; i++) {
		vpd_element.pve_keyword[0] = vpd->vpd_ros[i].keyword[0];
		vpd_element.pve_keyword[1] = vpd->vpd_ros[i].keyword[1];
		vpd_element.pve_datalen = vpd->vpd_ros[i].len;
		error = copyout(&vpd_element, vpd_user, sizeof(vpd_element));
		if (error)
			return (error);
		error = copyout(vpd->vpd_ros[i].value, vpd_user->pve_data,
		    vpd->vpd_ros[i].len);
		if (error)
			return (error);
		vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
	}
	vpd_element.pve_flags = PVE_FLAG_RW;
	for (i = 0; i < vpd->vpd_wcnt; i++) {
		vpd_element.pve_keyword[0] = vpd->vpd_w[i].keyword[0];
		vpd_element.pve_keyword[1] = vpd->vpd_w[i].keyword[1];
		vpd_element.pve_datalen = vpd->vpd_w[i].len;
		error = copyout(&vpd_element, vpd_user, sizeof(vpd_element));
		if (error)
			return (error);
		error = copyout(vpd->vpd_w[i].value, vpd_user->pve_data,
		    vpd->vpd_w[i].len);
		if (error)
			return (error);
		vpd_user = PVE_NEXT_LEN(vpd_user, vpd_element.pve_datalen);
	}
	KASSERT((char *)vpd_user - (char *)lvio->plvi_data == len,
	    ("length mismatch"));

The KASSERT triggered, indicating that a different amount of data has been
fetched than has previously been calculated.

It would be interesting to compare the pre-computed "len" and the actual
amount of data (i.e. the operands of == in the KASSERT).

The definition of PVE_NEXT_LEN looks correct, but in order to completely
understand what the issue is, a dump of the VPD range should be analyzed
(or you could add trace output to both the calculation of "len" and to
the fetching of the VPD data that advances vpd_user).
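
The comparison suggested above can be tried out in userland. What follows is
a minimal sketch, not the kernel code: struct vpd_elem_hdr is a stand-in for
struct pci_vpd_element whose field names and widths are assumptions (check
sys/pciio.h for the real definition), and struct vpd_kw and both functions
are hypothetical. It computes the size once the way "len" is computed and
once the way vpd_user is advanced, i.e. by the datalen stored in the element
header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed element header: keyword, flags, then a one-byte data length. */
struct vpd_elem_hdr {
	char	keyword[2];
	uint8_t	flags;
	uint8_t	datalen;
};

/* One keyword as held in memory: a tag and a data length. */
struct vpd_kw {
	char	keyword[2];
	size_t	len;
};

/* First pass: the precomputed size, mirroring the "len" calculation. */
static size_t
expected_len(const char *ident, const struct vpd_kw *ro, size_t rocnt,
    const struct vpd_kw *rw, size_t rwcnt)
{
	size_t len;

	assert(ident != NULL);
	len = sizeof(struct vpd_elem_hdr) + strlen(ident);
	for (size_t i = 0; i < rocnt; i++)
		len += sizeof(struct vpd_elem_hdr) + ro[i].len;
	for (size_t i = 0; i < rwcnt; i++)
		len += sizeof(struct vpd_elem_hdr) + rw[i].len;
	return (len);
}

/*
 * Second pass: advance an offset the way the copy loop advances vpd_user,
 * by header size plus the datalen stored in the header.
 */
static size_t
walked_len(const char *ident, const struct vpd_kw *ro, size_t rocnt,
    const struct vpd_kw *rw, size_t rwcnt)
{
	struct vpd_elem_hdr hdr = { { 0, 0 }, 0, 0 };
	size_t off;

	hdr.datalen = (uint8_t)strlen(ident);
	off = sizeof(hdr) + hdr.datalen;
	for (size_t i = 0; i < rocnt; i++) {
		hdr.datalen = (uint8_t)ro[i].len;	/* may narrow */
		off += sizeof(hdr) + hdr.datalen;
	}
	for (size_t i = 0; i < rwcnt; i++) {
		hdr.datalen = (uint8_t)rw[i].len;	/* may narrow */
		off += sizeof(hdr) + hdr.datalen;
	}
	return (off);
}
```

For well-formed lengths the two values agree. Under the assumed one-byte
datalen, a length above 255 makes them diverge; that is one hypothetical way
to hit the assertion, not a diagnosis of this panic.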

Regards, STefan

PS: You may want to build a kernel with the attached patch, which prints
the calculated lengths after each element that is added to "len".
The KASSERT will only trigger if the actual length exceeds the expected
value, and the printf() output should go to the console device.
My system does not seem to have a single device that provides VPD,
therefore the patch has only been compile tested ...

diff --git a/sys/dev/pci/pci_user.c b/sys/dev/pci/pci_user.c
index a5f849e85c2d..c771db0b5070 100644
--- a/sys/dev/pci/pci_user.c
+++ b/sys/dev/pci/pci_user.c
@@ -565,6 +565,7 @@ pci_list_vpd(device_t dev, struct pci_list_vpd_io *lvio)
 	size_t len;
 	int error, i;
 
+	printf("%p / %p\n", lvio->plvi_data, PVE_NEXT_LEN(lvio->plvi_data, 1));
 	vpd = pci_fetch_vpd_list(dev);
 	if (vpd->vpd_reg == 0 || vpd->vpd_ident == NULL)
 		return (ENXIO);
@@ -575,10 +576,15 @@ pci_list_vpd(device_t dev, struct pci_list_vpd_io *lvio)
 	 * and read-write keywords.
 	 */
 	len = sizeof(struct pci_vpd_element) + strlen(vpd->vpd_ident);
-	for (i = 0; i < vpd->vpd_rocnt; i++)
+	printf("LEN(%d): %lu\n", -1, len);
+	for (i = 0; i < vpd->vpd_rocnt; i++) {
 		len += sizeof(struct pci_vpd_element) + vpd->vpd_ros[i].len;
-	for (i = 0; i < vpd->vpd_wcnt; i++)
+		printf("LEN(%d): %lu\n", i, len);
+	}
+	for (i = 0; i < vpd->vpd_wcnt; i++) {
 		len += sizeof(struct pci_vpd_element) + vpd->vpd_w[i].len;
+		printf("LEN(%d): %lu\n", i, len);
+	}
 
 	if (lvio->plvi_len == 0) {
 		lvio->plvi_len = len;
@@ -606,6 +612,7 @@ pci_list_vpd(device_t dev, struct