Re: bsd.rd hangs on boot; bsd.mp works
On Mon, Dec 28, 2020 at 06:09:57PM +1100, Jonathan Gray wrote: > On Mon, Dec 28, 2020 at 04:31:09AM +, James Cook wrote: > > On Mon, Dec 28, 2020 at 01:19:19PM +1100, Jonathan Gray wrote: > > > On Sun, Dec 27, 2020 at 07:48:58PM +, James Cook wrote: > > > > > from your earlier bsd.mp dmesg: > > > > > > > > > > cpu0: Enhanced SpeedStep 16268 MHz: speeds: 1601, 1600, 1500, 1400, > > > > > 1300, 1200, 1100, 1000, 900, 800, 700, 600, 500, 400 MHz > > > > > > > > > > 1601 is variable/turbo speed the others are fixed. When running in > > > > > turbo mode getting the current frequency involves msrs which > > > > > hw.cpuspeed doesn't do. The initial 'x MHz:' is from cpuspeed not from > > > > > the acpi table. > > > > > > > > > > On modern machines these values are from acpicpu(4)/acpi _PSS. > > > > > > > > > > Can you show the output of running with the following diff to dump the > > > > > performance counter control values? > > > > > > > > > > MSR_PERF_FIXED_CTR_CTRL 0x38d > > > > > MSR_PERF_GLOBAL_CTRL 0x38f > > > > > > > > > > on a broadwell laptop this shows > > > > > > > > > > cpu0 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603 > > > > > cpu0 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3 > > > > > cpu0 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0 > > > > > cpu0 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf > > > > > cpu1 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603 > > > > > cpu1 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3 > > > > > cpu1 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0 > > > > > cpu1 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf > > > > > cpu2 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603 > > > > > cpu2 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3 > > > > > cpu2 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0 > > > > > cpu2 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf > > > > > cpu3 cpu_freq_ctr cpuid 0x0a eax 0x7300403 ebx 0x0 ecx 0x0 edx 0x603 > > > > > cpu3 cpu_freq_ctr perf ver 3 gp ctrs 4 fixed 3 > > > > > cpu3 cpu_freq_ctr MSR_PERF_FIXED_CTR_CTRL 0x0 > > > > > cpu3 cpu_freq_ctr MSR_PERF_GLOBAL_CTRL 0xf > > > > > > > > Here's dmesg with that patch. > > > > > > Thanks, here is another diff to try. It sets the bit to be enabled > > > not just for ring 0, fixes clearing a value and shows the values > > > read out of the counter. > > > > > > Index: sys/arch/amd64/amd64/identcpu.c > > > === > > > RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v > > > retrieving revision 1.117 > > > diff -u -p -r1.117 identcpu.c > > > --- sys/arch/amd64/amd64/identcpu.c 13 Sep 2020 05:57:28 - > > > 1.117 > > > +++ sys/arch/amd64/amd64/identcpu.c 28 Dec 2020 02:03:27 - > > > @@ -411,6 +411,9 @@ via_update_sensor(void *args) > > > } > > > #endif > > > > > > +#define MSR_PERF_GLOBAL_STATUS 0x38e > > > +#define MSR_PERF_GLOBAL_INUSE 0x392 > > > + > > > uint64_t > > > cpu_freq_ctr(struct cpu_info *ci) > > > { > > > @@ -421,13 +424,22 @@ cpu_freq_ctr(struct cpu_info *ci) > > > CPUIDEDX_NUM_FC(cpu_perf_edx) <= 1) > > > return (0); > > > > > > + if ((cpu_perf_eax & CPUIDEAX_VERID) > 3) { > > > + msr = rdmsr(MSR_PERF_GLOBAL_INUSE); > > > + printf("%s %s MSR_PERF_GLOBAL_INUSE 0x%llx\n", > > > ci->ci_dev->dv_xname, > > > + __func__, msr); > > > + } > > > + msr = rdmsr(MSR_PERF_GLOBAL_STATUS); > > > + printf("%s %s MSR_PERF_GLOBAL_STATUS before 0x%llx\n", > > > ci->ci_dev->dv_xname, > > > + __func__, msr); > > > + > > > msr = rdmsr(MSR_PERF_FIXED_CTR_CTRL); > > > if (msr & MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_MASK)) { > > > /* some hypervisor is dicking us around */ > > > return (0); > > > } > > > > > > - msr |= MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_1); > > > + msr |= MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_ANY); > > > wrmsr(MSR_PERF_FIXED_CTR_CTRL, msr); > > > > > > msr = rdmsr(MSR_PERF_GLOBAL_CTRL) | MSR_PERF_GLOBAL_CTR1_EN; > > > @@ -437,13 +449,20 @@ cpu_freq_ctr(struct cpu_info *ci) > > > delay(10); > > > count = rdmsr(MSR_PERF_FIXED_CTR1); > > > > > > + msr = rdmsr(MSR_PERF_GLOBAL_STATUS); > > > + printf("%s %s MSR_PERF_GLOBAL_STATUS after 0x%llx\n", > > > ci->ci_dev->dv_xname, > > > + __func__, msr); > > > + > > > msr = rdmsr(MSR_PERF_FIXED_CTR_CTRL); > > > - msr &= MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_MASK); > > > + msr &= ~MSR_PERF_FIXED_CTR_FC(1, MSR_PERF_FIXED_CTR_FC_MASK); > > > wrmsr(MSR_PERF_FIXED_CTR_CTRL, msr); > > > > > > msr = rdmsr(MSR_PERF_GLOBAL_CTRL); > > > msr &= ~MSR_PERF_GLOBAL_CTR1_EN; > > > wrmsr(MSR_PERF_GLOBAL_CTRL, msr); > > > + > > > + printf("%s %s count %lld last_count %lld freq %lld\n", > > > ci->ci_dev->dv_xname, > > > + __func__, count, last_count, ((count - last_count) * 10)); > > > > > > return ((count - last_count) * 10); > > > } > > > > Here's the end of /var/log/messages, recording a boot with the
Re: Unable to use Canon MG3250 USB scanner more than once
On Mon, Dec 28, 2020 at 08:55:33PM +, miko...@kucharski.name wrote: > > I've started adding printfs to xhci_device_generic_transfer() but that > slowed code path to the point that problem disappeared and scanner > started to work reliably, but (obviously) scan got really slow, from > couple of seconds to about 4 minutes. > ..and printfs to xhci_device_generic_start() -- Regards, Mikolaj
Unable to use Canon MG3250 USB scanner more than once
>Synopsis: After first successful scan on Canon MG3250 scanner, all next >ugenread()s timeout >Category: kernel >Environment: System : OpenBSD 6.8 Details : OpenBSD 6.8-current (GENERIC.MP) #246: Sun Dec 27 18:17:02 MST 2020 dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP Architecture: OpenBSD.amd64 Machine : amd64 >Description: When I use scanner first time via scanimage: scanimage --device-name pixma:04A91762_860FE4 \ --format png --mode gray \ --resolution 300 --output-file scan.png above command works and PNG file is properly saved with scanned image. When I want to scan again command fails: scanimage: sane_read: Operation was canceled When increasing libusb debug logs example iteration of failed attempt to scan looks as follows: [ 0.584877] [0003ef68] libusb: debug [libusb_alloc_transfer] transfer 0xea40237d150 [ 0.584902] [0003ef68] libusb: debug [libusb_submit_transfer] transfer 0xea40237d150 [ 0.584921] [0003ef68] libusb: debug [obsd_submit_transfer] [ 0.584979] [0003ef68] libusb: debug [_access_endpoint] endpoint 8 mode 0 [ 1.579537] [0003ef68] libusb: debug [_errno_to_libusb] error: Operation timed out (60) [ 1.579601] [0003ef68] libusb: debug [libusb_free_transfer] transfer 0xea40237d150 [ 1.579696] [0003ef68] libusb: debug [libusb_clear_halt] endpoint 88 [ 1.579862] [0003ef68] libusb: debug [obsd_clear_halt] On the kernel side my debug printfs look as follow: 2020-12-27T12:29:39.939Z pce-0067 /bsd: MMM>> dv_xname=ugen0 endpt=8 flag=0 [ugenread()|ugen.c|644] 2020-12-27T12:29:39.940Z pce-0067 /bsd: MMM dv_xname=ugen0 endpt=8 [ugen_do_read()|ugen.c|484] 2020-12-27T12:29:39.940Z pce-0067 /bsd: MMM dv_xname=ugen0 endpt=8 xfertype=2 [ugen_do_read()|ugen.c|503] 2020-12-27T12:29:39.940Z pce-0067 /bsd: MMM start UE_BULK [ugen_do_read()|ugen.c|549] 2020-12-27T12:29:42.997Z pce-0067 /bsd: MMM ! polling=0 flags=16 err=0 xfer->flags=6 xfer->status=15 xfer->done=1 [usbd_transfer()|usbdi.c|375] 2020-12-27T12:29:42.998Z pce-0067 /bsd: MMM final polling=0 xfer->flags=6 xfer->status=15 [usbd_transfer()|usbdi.c|389] 2020-12-27T12:29:42.998Z pce-0067 /bsd: MMM dv_xname=ugen0 endpt=8 err=15 n=1024 timeout=3000 flags=6 [ugen_do_read()|ugen.c|565] 2020-12-27T12:29:42.998Z pce-0067 /bsd: MMM end UE_BULK [ugen_do_read()|ugen.c|582] 2020-12-27T12:29:42.998Z pce-0067 /bsd: MMM<< dv_xname=ugen0 endpt=8 error=60 [ugenread()|ugen.c|647] It maybe hard to see from above but the core of it is that in usbdi.c file, in function usbd_transfer(): 358} else { 359while (!xfer->done) { 360flags = PRIBIO|(xfer->flags & USBD_CATCH ? PCATCH : 0); 361 362err = tsleep_nsec(xfer, flags, "usbsyn", INFSLP); 363if (err && !xfer->done) { 364usbd_abort_pipe(pipe); 365if (err == EINTR) 366xfer->status = USBD_INTERRUPTED; 367else 368xfer->status = USBD_TIMEOUT; 369} 370} 371} above function tsleep_nsec() on line 362 times out and xfer->status is set to USBD_TIMEOUT on line 368. When tsleep_nsec() times out, xhci_timeout() and xhci_timeout_task() from xhci.c are executed. I've started adding printfs to xhci_device_generic_transfer() but that slowed code path to the point that problem disappeared and scanner started to work reliably, but (obviously) scan got really slow, from couple of seconds to about 4 minutes. This is where I got stuck as I got out of ideas how to drill this down further in efficient manner. Decided to look on the mailing list and found: - https://marc.info/?t=15815036211=1=2 - https://marc.info/?t=15872902191=1=2 In above email threads I found patch quoted below in the Fix section. >How-To-Repeat: - boot OpenBSD - power on scanner - use scanimage to scan a picture - when first scan succeeds - any another scan fails with tsleep_nsec() on line 362 from usbdi.c function usbd_transfer() is timing out - unplugging from USB scanner is needed to be able to scan again - or power cycle scanner - or power cycle OpenBSD - basically ugen(4) detach / attach is needed >Fix: Per email threads on tech@ from above marc.info links below is not proper solution, but a workaround: --8<-- Index: dev/usb/ugen.c === RCS file: /cvs/src/sys/dev/usb/ugen.c,v retrieving revision 1.109 diff -u -p -u -r1.109 ugen.c --- dev/usb/ugen.c 25 Dec 2020 12:59:52 - 1.109 +++ dev/usb/ugen.c 27 Dec 2020 22:38:59 - @@ -441,6 +442,7 @@
Re: Memory leak with getaddrinfo()
On Mon, Dec 28, 2020 at 08:54:46PM +0100, Theo Buehler wrote: > On Sun, Dec 27, 2020 at 08:44:52PM +0100, Otto Moerbeek wrote: > > Hi, > > > > So here the diff that just fixes the mem leak on thread exit. It does > > not contains the TLS init part, I'd like to do that differently than > > what I did in the version I posted earlier. > > > > As stated before, this makes the getaddrino(3) test program run in > > constant memory. > > > > ok? > > Please wrap this line at ,void: > > > -_thread_tag_storage(void **tag, void *storage, size_t sz, void *err) > > +_thread_tag_storage(void **tag, void *storage, size_t sz, void (*dt)(void > > *),void *err) > > This diff made firefox 84.0.1 crash frequently on my main laptop (about > once every 10-15 minutes) while iridium was still stable and I could not > observe any other fallout. > > The backtrace always looked like this: OK, the two problems cannot be solved independently. I'll come up with a new diff -Otto > > #0 thrkill () at /tmp/-:3 > #1 0x00ae9eab90ee in nsProfileLock::FatalSignalHandler (signo=11, > info=0xae12f05420, context=0xae12f05330) > at > /usr/obj/ports/firefox-84.0.1/firefox-84.0.1/toolkit/profile/nsProfileLock.cpp:168 > #2 0x00ae9f5aada7 in WasmTrapHandler (signum=11, info=0xae12f05420, > context=0xae12f05330) at > /usr/obj/ports/firefox-84.0.1/firefox-84.0.1/js/src/wasm/WasmSignalHandlers.cpp:980 > #3 > #4 _asr_ctx_unref (ac=0xdfdfdfdfdfdfdfdf) at /usr/src/lib/libc/asr/asr.c:401 > #5 _asr_resolver_done_tp (arg=0xadc8b2eec0) at > /usr/src/lib/libc/asr/asr.c:141 > #6 0x00ae5f6d0025 in _rthread_tls_destructors (thread=0xae81077440) at > /usr/src/lib/libc/thread/rthread_tls.c:182 > #7 0x00ae5f65ec13 in _libc_pthread_exit (retval=) at > /usr/src/lib/libc/thread/rthread.c:150 > #8 0x00ae39f038d9 in _rthread_start (v=) at > /usr/src/lib/librthread/rthread.c:97 > #9 0x00ae5f6e1c8a in __tfork_thread () at > /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 > (gdb) >
Re: Memory leak with getaddrinfo()
On Sun, Dec 27, 2020 at 08:44:52PM +0100, Otto Moerbeek wrote: > Hi, > > So here the diff that just fixes the mem leak on thread exit. It does > not contains the TLS init part, I'd like to do that differently than > what I did in the version I posted earlier. > > As stated before, this makes the getaddrino(3) test program run in > constant memory. > > ok? Please wrap this line at ,void: > -_thread_tag_storage(void **tag, void *storage, size_t sz, void *err) > +_thread_tag_storage(void **tag, void *storage, size_t sz, void (*dt)(void > *),void *err) This diff made firefox 84.0.1 crash frequently on my main laptop (about once every 10-15 minutes) while iridium was still stable and I could not observe any other fallout. The backtrace always looked like this: #0 thrkill () at /tmp/-:3 #1 0x00ae9eab90ee in nsProfileLock::FatalSignalHandler (signo=11, info=0xae12f05420, context=0xae12f05330) at /usr/obj/ports/firefox-84.0.1/firefox-84.0.1/toolkit/profile/nsProfileLock.cpp:168 #2 0x00ae9f5aada7 in WasmTrapHandler (signum=11, info=0xae12f05420, context=0xae12f05330) at /usr/obj/ports/firefox-84.0.1/firefox-84.0.1/js/src/wasm/WasmSignalHandlers.cpp:980 #3 #4 _asr_ctx_unref (ac=0xdfdfdfdfdfdfdfdf) at /usr/src/lib/libc/asr/asr.c:401 #5 _asr_resolver_done_tp (arg=0xadc8b2eec0) at /usr/src/lib/libc/asr/asr.c:141 #6 0x00ae5f6d0025 in _rthread_tls_destructors (thread=0xae81077440) at /usr/src/lib/libc/thread/rthread_tls.c:182 #7 0x00ae5f65ec13 in _libc_pthread_exit (retval=) at /usr/src/lib/libc/thread/rthread.c:150 #8 0x00ae39f038d9 in _rthread_start (v=) at /usr/src/lib/librthread/rthread.c:97 #9 0x00ae5f6e1c8a in __tfork_thread () at /usr/src/lib/libc/arch/amd64/sys/tfork_thread.S:84 (gdb)
Re: X11 blanking problem related to page flipping?
On Sat, Nov 21, 2020 at 10:16:23AM +0100, Antoine Jacoutot wrote: >> I've been having an intermittent problem for some months with my X11 >> screen blanking (i.e. going totally black -- however, the mouse still >> displays and changes shape depending on what it's hovering over!). >> Switching to Ctrl+Alt+F1 and back fixes it, though once in a while causes >> X11 to crash. Sometimes this doesn't happen for a few days, at which point >> I think I've fixed it. Sometimes it happens several times in 10 minutes. >> There is no obvious trigger for it: I might have a web browser loaded or >> not, or ffmpeg running or not etc etc. > I have been seeing the same. I can usually reproduce it by having lots of > chromium tabs with heavy web sites opened (like clicking on 5 or 6 links > from google news). A follow-up to this, since the problem still occurs. I've been playing around with not running Chrome or Iridium, and I think Antoine has put his finger on the nub of the problem: if I don't use Chrome or Iridium, the problem never manifests. At some point after using them, I then have intermittent blanking, which sometimes needs a console switch to fix, and perhaps 1/3 of the time a console fix kills X. I don't know what Chrome or Iridium are doing that triggers this problem, but nothing else I run seems to trigger it. Laurie