Re: ugen(4) communication issues with UPS (nut) blazer_usb and nutdrv_qx

2021-04-22 Thread xs
Errata:

A few minutes later the problem occured on 6.8

It was looking stable but:
Apr 23 00:04:49 libre /bsd: ugen0 detached
Apr 23 00:04:56 libre /bsd: ugen0 at uhub0 port 2 "INNO TECH USB to
Serial" rev 1.10/0.02 addr 2
Apr 23 00:06:08 libre upsmon[1619]: UPS ups0@localhost on battery
Apr 23 00:06:18 libre upsmon[1619]: UPS ups0@localhost on line power
Apr 23 00:07:50 libre pkg_add: Added gnuwatch-3.2.8p0
Apr 23 00:21:24 libre upsd[80608]: Data for UPS [ups0] is stale - check
driver
Apr 23 00:21:26 libre upsmon[1619]: Poll UPS [ups0@localhost] failed -
Data stale
Apr 23 00:21:26 libre upsmon[1619]: Communications with UPS
ups0@localhost lost
Apr 23 00:21:31 libre upsmon[1619]: Poll UPS [ups0@localhost] failed -
Data stale
Apr 23 00:22:06 libre last message repeated 7 times

restarting upsd made it re-connect, however still no `clear endpoints
failed!` in dmesg.


On Fri, 2021-04-23 at 00:17 +0200, xs wrote:
> My x230 laptop was setup to do the testing with 6.8.
> 
> First observations:
> 
> - There's no `ugen_clear_iface_eps: clear endpoints failed!` messages
> - I haven't managed to trigger the bug on 6.8
> - I'm still getting trouble to make upsd able to detect the ups at
> boot
> 
> I have to replug the USB then it works.
> 
> x230 laptop:
> OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
>    
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8360439808 (7973MB)
> avail mem = 8092012544 (7717MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xbff35020 (18 entries)
> bios0: vendor coreboot version "CBET4000 4.11-1765-g4bd6927388-dirty"
> date 03/24/2020
> bios0: LENOVO 23256CG
> acpi0 at bios0: ACPI 4.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP SSDT MCFG TCPA APIC DMAR HPET
> acpi0: wakeup devices HDEF(S4) EHC1(S4) EHC2(S4) XHC_(S4) SLPB(S3)
> LID_(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xf000, bus 0-63
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.87 MHz, 06-3a-09
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE
> 36
> ,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MW
> AI
> T,DS-
> CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,PO
> PC
> NT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,
> FS
> GSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVE
> OP
> T,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.44 MHz, 06-3a-09
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE
> 36
> ,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MW
> AI
> T,DS-
> CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,PO
> PC
> NT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,
> FS
> GSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVE
> OP
> T,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.44 MHz, 06-3a-09
> cpu2:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE
> 36
> ,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MW
> AI
> T,DS-
> CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,PO
> PC
> NT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,
> FS
> GSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVE
> OP
> T,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 1, package 0
> cpu3 at mainbus0: apid 3 (application processor)
> cpu3: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.44 MHz, 06-3a-09
> cpu3:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE
> 36
> ,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MW
> AI
> T,DS-
> CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,PO
> PC
> NT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,
> FS
> GSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVE
> OP
> T,MELTDOWN
> cpu3: 256KB 64b/line 8-way L2 cache
> cpu3: smt 1, core 1, package 0
> ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
> acpihpet0 at acpi0: 14318179 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 at acpi0: bus 1 (RP01)
> acpiprt2 at acpi0: bus 2 (RP02)
> acpiprt3 at acpi0: bus 3 (RP03)
> acpiprt4 at acpi0: bus -1 (RP04)
> acpiprt5 at acpi0: bus -1 (RP05)
> acp

Re: ugen(4) communication issues with UPS (nut) blazer_usb and nutdrv_qx

2021-04-22 Thread xs
My x230 laptop was setup to do the testing with 6.8.

First observations:

- There's no `ugen_clear_iface_eps: clear endpoints failed!` messages
- I haven't managed to trigger the bug on 6.8
- I'm still getting trouble to make upsd able to detect the ups at boot

I have to replug the USB then it works.

x230 laptop:
OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
   
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8360439808 (7973MB)
avail mem = 8092012544 (7717MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xbff35020 (18 entries)
bios0: vendor coreboot version "CBET4000 4.11-1765-g4bd6927388-dirty"
date 03/24/2020
bios0: LENOVO 23256CG
acpi0 at bios0: ACPI 4.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT MCFG TCPA APIC DMAR HPET
acpi0: wakeup devices HDEF(S4) EHC1(S4) EHC2(S4) XHC_(S4) SLPB(S3)
LID_(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-63
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.87 MHz, 06-3a-09
cpu0:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36
,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAI
T,DS-
CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPC
NT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FS
GSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOP
T,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.44 MHz, 06-3a-09
cpu1:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36
,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAI
T,DS-
CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPC
NT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FS
GSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOP
T,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.44 MHz, 06-3a-09
cpu2:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36
,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAI
T,DS-
CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPC
NT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FS
GSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOP
T,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz, 2893.44 MHz, 06-3a-09
cpu3:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36
,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAI
T,DS-
CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPC
NT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,FS
GSBASE,SMEP,ERMS,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOP
T,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 24 pins
acpihpet0 at acpi0: 14318179 Hz
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus 1 (RP01)
acpiprt2 at acpi0: bus 2 (RP02)
acpiprt3 at acpi0: bus 3 (RP03)
acpiprt4 at acpi0: bus -1 (RP04)
acpiprt5 at acpi0: bus -1 (RP05)
acpiprt6 at acpi0: bus -1 (RP06)
acpiprt7 at acpi0: bus -1 (RP07)
acpiprt8 at acpi0: bus -1 (RP08)
acpiec0 at acpi0
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
acpiac0 at acpi0: AC unit online
acpibat0 at acpi0: BAT0 model "45N1029" serial   338 type LION oem
"LGC"
acpibat1 at acpi0: BAT1 not present
acpibtn0 at acpi0: SLPB
acpibtn1 at acpi0: LID_
acpithinkpad0 at acpi0: version 1.0
acpicmos0 at acpi0
tpm0 at acpi0 TPM_ addr 0xfed4/0x5000, device 0x104a rev 0x4e
"PNP0C0B" at acpi0 not configured
"BOOT" at acpi0 not configured
acpicpu0 at acpi0: C3(200@90 mwait.1@0x30), C2(500@63 mwait.1@0x10),
C1(1000@1 mwait.1), PSS
acpicpu1 at acpi0: C3(200@90 mwait.1@0x30), C2(500@63 mwait.1@0x10),
C1(1000@1 mwait.1), PSS
acpicpu2 at acpi0: C3(200@90 mwait.1@0x30), C2(500@63 mwait.1@0x10),
C1(1000@1 mwait.1), PSS
acpicpu3 at acpi0: C3(200@90 mwait.1@0x30), C2(500@63 mwait.1@0x10),
C1(1000@1 mwait.1), PSS
acpitz0 at acpi0: critical temperature is 100 degC
acpipwrres0 at acpi0: FPWR, resource for FAN_
acpitz1 at acpi0: critical temperature is 99 degC
acpivideo0 at acpi0: GFX0
acpivout0 at acpivideo0: LCD0
cpu0: using VERW MDS workaround (except on v

Re: ugen(4) communication issues with UPS (nut) blazer_usb and nutdrv_qx

2021-04-22 Thread Stuart Henderson
On 2021/04/22 22:52, xs wrote:
> - I've seen mentions of usb_quirks.c for usbhid driver in
>   /usr/local/share/doc/pkg-readmes/nut
> ```
> The option with fewest side-effects is to add the following entries to
> the table in /sys/dev/usb/usb_quirks.c and build a new kernel:
> 
> { USB_VENDOR_APC, USB_PRODUCT_APC_UPS, ANY, { UQ_BAD_HID }},
> { USB_VENDOR_APC, USB_PRODUCT_APC_UPS5G, ANY, { UQ_BAD_HID }},
> ```
> 
> - Would it be useful ?

No. That is for the case where a USB attaches to the uhid/upd drivers;
sometimes a device can still work with NUT when that happens, but often
not.

> - Is this related:
>   https://www.mail-archive.com/tech@openbsd.org/msg62221.html ?

Since it's attaching to ugen: no.

> - What should I try to get this USB communication reliable ?

Did it work any better with an older version?
Userland access to USB devices is definitely not perfect in OpenBSD.

btw, dmesg was missing from your mail.



ugen(4) communication issues with UPS (nut) blazer_usb and nutdrv_qx

2021-04-22 Thread xs
I'm having an USB communication issue with ugen and the blazer_usb NUT
driver on the last snapshot 6.9 GENERIC.MP#473 amd64.  Notably the
'ugen_clear_iface_eps: clear endpoints failed!' message and
communication loss with the UPS.

Here is a link to my dmesg: https://clbin.com/pc1dk

- NUT configuration is working normally until the following points
- Sometimes upsd can't access the device at boot, I have to manually
  restart the service
- "randomly" The communication fails and I have to restart the service
  too
- The device can be totally unaccessible after some restart, needing to
  replug the USB

Other points that could be useful:
- This UPS USB interface was working without any issue on FreeBSD and
  ugen
- The UPS is supported by nut with blazer_usb:
 
https://networkupstools.org/ddl/PowerWalker/Line-Interactive_VI_1400.html

It's mentioned it should work with the NUT nutdrv_qx driver too so I'm
trying with this one too, see below.

Below are detailed information of the issue:

The tail of /var/log/messages with comments:
Apr 22 19:30:25 se-h1 /bsd: sd5: 457860MB, 512 bytes/sector, 937697393
sectors
Apr 22 19:30:25 se-h1 /bsd: root on sd5a (468563ff41793b32.a) swap on
sd5b dump on sd5b
Apr 22 19:30:25 se-h1 /bsd: inteldrm0: 1024x768, 32bpp
Apr 22 19:30:25 se-h1 /bsd: wsdisplay0 at inteldrm0 mux 1: console
(std, vt100 emulation), using wskbd0
Apr 22 19:30:25 se-h1 /bsd: wsdisplay0: screen 1-5 added (std, vt100
emulation)
Apr 22 19:30:25 se-h1 ntpd[96719]: constraints configured but none
available
Apr 22 19:30:41 se-h1 ntpd[43289]: no reply received in time, skipping
initial time setting
Apr 22 19:30:41 se-h1 savecore: no core dump
## The following message happens when upsd tries to access the device
##
Apr 22 19:30:47 se-h1 /bsd: ugen_clear_iface_eps: clear endpoints
failed! 
Apr 22 19:30:56 se-h1 reorder_kernel: kernel relinking done
## At this point upsd can communicate to my device without any issue,
everything works as expected ##
## But suddenly, ten minutes later: ##
Apr 22 19:40:53 se-h1 blazer_usb[31504]: Communications with UPS lost:
status read failed!
Apr 22 19:40:55 se-h1 upsd[80896]: Data for UPS [ups0] is stale - check
driver
Apr 22 19:40:59 se-h1 upsmon[82703]: Poll UPS [ups0@localhost] failed -
Data stale
Apr 22 19:40:59 se-h1 upsmon[82703]: Communications with UPS
ups0@localhost lost
Apr 22 19:41:04 se-h1 upsmon[82703]: Poll UPS [ups0@localhost] failed -
Data stale
Apr 22 19:41:39 se-h1 last message repeated 7 times
Apr 22 19:42:24 se-h1 last message repeated 9 times
## So what I do there is rcctl restart upsd (sometimes I need to do it
several times) ##
Apr 22 19:42:27 se-h1 upsd[80896]: mainloop: Interrupted system call
Apr 22 19:42:29 se-h1 upsmon[82703]: Poll UPS [ups0@localhost] failed -
Write error: Broken pipe
## Again: ##
Apr 22 19:42:32 se-h1 /bsd: ugen_clear_iface_eps: clear endpoints
failed!
Apr 22 19:42:34 se-h1 upsmon[82703]: Communications with UPS
ups0@localhost established
## Back to an operational state ##

My device is 0665:5161
$ doas usbdevs -v
Controller /dev/usb0:
addr 01: 8086: Intel, xHCI root hub
 super speed, self powered, config 1, rev 1.00
 driver: uhub0
addr 02: 0665:5161 INNO TECH, USB to Serial # --- this one --- #
 low speed, power 100 mA, config 1, rev 0.02, iSerial 20100826
 driver: ugen0
addr 03: 1058:25a1 Western Digital, Elements 25A1
 super speed, power 224 mA, config 1, rev 10.14, iSerial
ommited
 driver: umass0
addr 04: 1058:1078 Western Digital, Elements 1078
 super speed, power 224 mA, config 1, rev 10.65, iSerial
ommited
 driver: umass1
addr 05: 1058:25a2 Western Digital, Elements 25A2
 super speed, power 224 mA, config 1, rev 10.04, iSerial
ommited
 driver: umass2
Controller /dev/usb1:
addr 01: 8086: Intel, EHCI root hub
 high speed, self powered, config 1, rev 1.00
 driver: uhub1
addr 02: 8087:8008 Intel, Rate Matching Hub
 high speed, self powered, config 1, rev 0.04
 driver: uhub3
Controller /dev/usb2:
addr 01: 8086: Intel, EHCI root hub
 high speed, self powered, config 1, rev 1.00
 driver: uhub2
addr 02: 8087:8000 Intel, Rate Matching Hub
 high speed, self powered, config 1, rev 0.04
 driver: uhub4

_ups permissions are set with /etc/rc.local
$ cat /etc/rc.local   
# ups0 ugen0 permissions for USB access
chgrp _ups /dev/usb0 /dev/ugen0*   
chmod g+rw /dev/usb0 /dev/ugen0*   

When access works:
$ upsc ups0
battery.charge: 100
battery.voltage: 13.70
battery.voltage.high: 13.00
battery.voltage.low: 10.40
battery.voltage.nominal: 12.0
device.type: ups
driver.flag.nolock: enabled
driver.name: blazer_usb
driver.parameter.pollinterval: 2
driver.parameter.port: /dev/ugen0
driver.parameter.productid: 5161
driver.parameter.protocol: megatec
driver.parameter.synchronous: yes
driver.parameter.vendorid: 0665
driver.version: 2.