Fatal protection fault when installing under qemu/kvm

2023-07-26 Thread Tom Lawlor
Hello

I have tried to install OpenBSD under QEMU/KVM, but while the installer is
installing the sets it triggers a protection fault.

[image: image.png]

The vm has:
1 vCPU
1 GiB RAM

It is running on a Zen 3 processor.

Steps to reproduce

1. Press Enter to accept the automatic (default) option at every prompt
2. Protection fault occurs


Realtek 8169sc chipset not compatible

2023-05-16 Thread Tom Hsiung
Hello, sir

My name is Tom. I recently installed an OpenBSD system at home, and it is
great. I bought a PCI Ethernet card, a tp-link tg3269c, but I could not
bring it up. Here is the data:

ifconfig

re0: flags=8802 mtu 1500
	lladdr xx:xx:xx:xx:xx:xx
	index 1 priority 0 llprio 3
	media: Ethernet none (none)

dmesg

re0 at pci1 dev 7 function 0 "Realtek 8168" rev 0x10re0: reset never completed!
re0: unknown bus speed, assume 33MHz
: unknown ASIC (0x7c80), apic 2 int 10, address ff:ff:ff:ff:ff:ff
re0: no PHY found!
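
A note on the dmesg output above: ff:ff:ff:ff:ff:ff is the Ethernet
broadcast address, not a station address, so the driver most likely did not
read a real address from the card (this is an interpretation, the log does
not say so directly). A minimal C sketch of such a plausibility check
follows. The names mac_is_plausible and mac_selfcheck are hypothetical and
are not part of the re(4) driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the re(4) driver: report whether a
 * 6-byte Ethernet address could be a real station (unicast) address.
 */
bool
mac_is_plausible(const uint8_t mac[6])
{
	uint8_t any = 0;
	int i;

	for (i = 0; i < 6; i++)
		any |= mac[i];

	if (any == 0)
		return false;	/* 00:00:00:00:00:00 means "unset" */
	if (mac[0] & 0x01)
		return false;	/* group bit set: multicast or broadcast */
	return true;
}

/* Checks the address from the log and an arbitrary unicast example. */
void
mac_selfcheck(void)
{
	const uint8_t from_log[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
	const uint8_t example[6] = { 0x00, 0x11, 0x22, 0x33, 0x44, 0x55 };

	assert(!mac_is_plausible(from_log));
	assert(mac_is_plausible(example));
}
```

The address from the log fails the check because the group bit of its first
octet is set.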


There is a driver available for Unix systems, but it is only for FreeBSD.
Could you advise me on how to get this device working? Thank you.


Detailed device information:
https://static.tp-link.com/res/down/doc/TG-3269_V3_UG.pdf


Tom


Re: Kernel panic involving drm

2021-07-13 Thread Tom Murphy
Hi Jonathan,

On Tue, Jul 13, 2021 at 01:13:03PM +1000, Jonathan Gray wrote:
> On Mon, Jul 12, 2021 at 06:22:36PM +0000, Tom Murphy wrote:
> > I had firefox open (various tabs/windows) and was playing a 3D game
> > (games/quakespasm) and after a random amount of time I got a hard lock up,
> > but the second time it happened I was able to get into a ddb prompt. I've
> > added the panic message and trace and dmesg.
> > 
> > I don't have a serial console on this laptop so had to transcribe this by
> > hand from a photo I took on my phone. (Is there an easier way to save
> > these?)
> 
> It is possible to get a trace out of a crash dump, see crash(8).
> But yes serial or amt sol is easier.

Thanks! I'll have a closer look at crash(8).

> > panic: kernel diagnostic assertion "to_ticks >= 0" failed: file
> > "/usr/src/sys/kern/kern_timeout.c", line 299
> > Stopped at  db_enter+0x10:  popq   %rbp
> >     TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
> > *395451  18070      0     0x14000      0x200    0K drmtskl
> >   61931  46160      0     0x14000      0x200    2  drmwq
> >  185485  53694      0     0x14000      0x200    1  drmwq
> >  284820  77991      0     0x14000      0x200    3  drmwq
> > db_enter() at db_enter+0x10
> > panic(81e5f243) at panic+0xbf
> > __assert(81ec940a,81eb2a3e,12b,81ec1795) at __assert+0x2b
> > timeout_add(80bf0410,) at timeout_add+0x1cc
> > process_csb(80bf) at process_csb+0x38b
> > execlists_submission_tasklet(80bf) at execlists_submission_tasklet+0x48
> > tasklet_run(80bf03c0) at tasklet_run+0x44
> > taskq_thread(80220f00) at taskq_thread+0x81
> > end trace frame: 0x0, count: 7
> 
> The timeout_add() call comes from i915_utils.c set_timer_ms()
>   mod_timer(t, jiffies + timeout ?: 1);
> 
> can you try this patch?
> 
> Index: sys/dev/pci/drm/include/linux/timer.h
> ===================================================================
> RCS file: /cvs/src/sys/dev/pci/drm/include/linux/timer.h,v
> retrieving revision 1.6
> diff -u -p -r1.6 timer.h
> --- sys/dev/pci/drm/include/linux/timer.h	7 Jul 2021 02:38:36 -0000	1.6
> +++ sys/dev/pci/drm/include/linux/timer.h	13 Jul 2021 02:54:26 -0000
> @@ -24,10 +24,20 @@
>  #include 
>  #include 
>  
> -#define mod_timer(x, y)	timeout_add((x), ((y) - jiffies))
>  #define del_timer_sync(x)	timeout_del_barrier((x))
>  #define del_timer(x)		timeout_del((x))
>  #define timer_pending(x)	timeout_pending((x))
> +
> +static inline int
> +mod_timer(struct timeout *to, unsigned long j)
> +{
> +	int ticks = j - jiffies;
> +	if (ticks <= 0) {
> +		timeout_del(to);
> +		return timeout_add(to, 1);
> +	}
> +	return timeout_add(to, ticks);
> +}
>  
>  static inline unsigned long
>  round_jiffies_up(unsigned long j)

This patch seems to work for me. I did some fairly rigorous testing
with it and tried to recreate the conditions that made it crash;
however, I wasn't able to trigger the kernel panic, so that is a good
sign!

Thanks,
Tom
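
For readers following the thread, the arithmetic the patch changes can be
modelled in user space. The old macro passed (deadline - jiffies) straight
to timeout_add(), so a deadline that was already in the past produced a
negative tick count and tripped the "to_ticks >= 0" assertion. Note also
that in the quoted call "jiffies + timeout ?: 1" the ?: operator binds less
tightly than +, so a timeout of 0 yields a deadline of jiffies, not 1. The
sketch below is only a model under those assumptions: jiffies is a plain
variable here, and ticks_until and ticks_until_selfcheck are hypothetical
names, not kernel functions.

```c
#include <assert.h>

/* Stand-in for the kernel tick counter; a plain variable in this model. */
unsigned long jiffies;

/*
 * Hypothetical user-space model of the arithmetic in the patched
 * mod_timer(): turn an absolute deadline into a relative tick count and
 * never return less than 1. The int conversion mirrors the patch.
 */
int
ticks_until(unsigned long deadline)
{
	int ticks = deadline - jiffies;

	if (ticks <= 0)
		return 1;	/* deadline is now or already in the past */
	return ticks;
}

/* Exercises a future deadline, a deadline of "now", and a past one. */
void
ticks_until_selfcheck(void)
{
	jiffies = 1000;
	assert(ticks_until(1005) == 5);
	assert(ticks_until(1000) == 1);
	assert(ticks_until(990) == 1);
}
```

The last case is the one that matters for the panic: without the clamp the
result would be -10 ticks.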



Kernel panic involving drm

2021-07-12 Thread Tom Murphy
I had firefox open (various tabs/windows) and was playing a 3D game 
(games/quakespasm) and after a random amount of time I got a hard lock 
up, but the second time it happened I was able to get into a ddb prompt. 
I've added the panic message and trace and dmesg.


I don't have a serial console on this laptop so had to transcribe this 
by hand from a photo I took on my phone. (Is there an easier way to save 
these?)


Let me know if you need any more information.

Thanks,
Tom

panic: kernel diagnostic assertion "to_ticks >= 0" failed: file 
"/usr/src/sys/kern/kern_timeout.c", line 299

Stopped at  db_enter+0x10:  popq   %rbp
    TID    PID    UID     PRFLAGS     PFLAGS  CPU  COMMAND
*395451  18070      0     0x14000      0x200    0K drmtskl
  61931  46160      0     0x14000      0x200    2  drmwq
 185485  53694      0     0x14000      0x200    1  drmwq
 284820  77991      0     0x14000      0x200    3  drmwq
db_enter() at db_enter+0x10
panic(81e5f243) at panic+0xbf
__assert(81ec940a,81eb2a3e,12b,81ec1795) at __assert+0x2b
timeout_add(80bf0410,) at timeout_add+0x1cc
process_csb(80bf) at process_csb+0x38b
execlists_submission_tasklet(80bf) at execlists_submission_tasklet+0x48
tasklet_run(80bf03c0) at tasklet_run+0x44
taskq_thread(80220f00) at taskq_thread+0x81
end trace frame: 0x0, count: 7
https://www.openbsd.org/ddb.html describes the minimum info required in bug
reports.  Insufficient info makes it difficult to find and fix bugs.

trace:

db_enter() at db_enter+0x10
panic(81e5f243) at panic+0xbf
__assert(81ec940a,81eb2a3e,12b,81ec1795) at __assert+0x2b
timeout_add(80bf0410,) at timeout_add+0x1cc
process_csb(80bf) at process_csb+0x38b
execlists_submission_tasklet(80bf) at execlists_submission_tasklet+0x48
tasklet_run(80bf03c0) at tasklet_run+0x44
taskq_thread(80220f00) at taskq_thread+0x81
end trace frame: 0x0, count: -8
dmesg:

OpenBSD 6.9-current (GENERIC.MP) #121: Fri Jul  9 12:22:52 MDT 2021
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17040445440 (16251MB)
avail mem = 16507965440 (15743MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries)
bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017
bios0: PC Specialist LTD N13xWU
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT 
SSDT SSDT DBGP DBG2 DMAR BGRT ASF! WSMT
acpi0: wakeup devices RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) 
PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) 
PXSX(S4) RP24(S4) PXSX(S4) [...]

acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.77 MHz, 06-8e-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN

cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.06 MHz, 06-8e-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN

cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.06 MHz, 06-8e-0a
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,L

Re: Bridge with lots of (90) vlans as ports does not forward frames after restart

2020-06-16 Thread Tom Smyth
This issue is resolved in 6.7-release and 6.7-stable (and, I assume, in
-current).

Thanks for your help. I hope the details I sent earlier are of some help
(they were stuck in my drafts folder).

Thanks again,
Tom Smyth

On Wed, 17 Jun 2020 at 01:35, Tom Smyth wrote:

> Hello,
> Sorry for the delay in replying. I don't get the opportunity to reboot
> that box often, and with the COVID-19 crisis
> I'm trying to minimise maintenance that impacts customers.
>
> I also tried to split the VLANs across an additional ix(4)
> parent interface, but this did not change the behaviour.
>
> Please find the output of the diagnostic commands below.
>
> #netstat -nI bridge101
>
> netstat -nI bridge101  (after reboot (broken))
> ngabr# cat netstat-nI-bridge101-afterreboot
> Name      Mtu   Network Address     Ipkts Ifail    Opkts Ofail Colls
> bridge101 1500                      51546     0  2813599     0     0
>
> ngabr# cat netstat-nIbridge101-postnetstart  (after netstart (fixed))
> Name      Mtu   Network Address     Ipkts Ifail    Opkts Ofail Colls
> bridge101 1500                    5119709     0  8565403     0     0
>
> ###
> #netstat -s
>
> ngabr# cat netstat-s-afterreboot  (broken)
> ip:
> 3999 total packets received
> 0 bad header checksums
> 0 with size smaller than minimum
> 0 with data size < data length
> 0 with header length < data size
> 0 with data length < header length
> 0 with bad options
> 0 with incorrect version number
> 0 fragments received
> 0 fragments dropped (duplicates or out of space)
> 0 malformed fragments dropped
> 0 fragments dropped after timeout
> 0 packets reassembled ok
> 237 packets for this host
> 0 packets for unknown/unsupported protocol
> 0 packets forwarded
> 3092 packets not forwardable
> 0 redirects sent
> 10977 packets sent from this host
> 0 packets sent with fabricated ip header
> 0 output packets dropped due to no bufs, etc.
> 11122 output packets discarded due to no route
> 0 output datagrams fragmented
> 0 fragments created
> 0 datagrams that can't be fragmented
> 0 fragment floods
> 0 packets with ip length > max ip packet size
> 0 tunneling packets that can't find gif
> 0 datagrams with bad address in header
> 3656 input datagrams software-checksummed
> 2837975 output datagrams software-checksummed
> 3335 multicast packets which we don't join
> icmp:
> 225 calls to icmp_error
> 0 errors not generated because old message was icmp
> 0 errors not generated because of rate limitation
> Output packet histogram:
> destination unreachable: 225
> 0 messages with bad code fields
> 0 messages < minimum length
> 0 bad checksums
> 0 messages with bad length
> 0 echo requests to broadcast/multicast rejected
> 0 message responses generated
> igmp:
> 0 messages received
> 0 messages received with too few bytes
> 0 messages received with bad checksum
> 0 membership queries received
> 0 membership queries received with invalid field(s)
> 0 membership reports received
> 0 membership reports received with invalid field(s)
> 0 membership reports received for groups to which we belong
> 0 membership reports sent
> ipencap:
> 0 total input packets
> 0 total output packets
> 0 packets shorter than header shows
> 0 packets dropped due to policy
> 0 packets with possibly spoofed local addresses
> 0 packets were dropped due to full output queue
> 0 input bytes
> 0 output bytes
> 0 protocol family mismatches
> 0 attempts to use tunnel with unspecified endpoint(s)
> tcp:
> 54 packets sent
> 24 data packets (2022 bytes)
> 0 data packets (0 bytes) retransmitted
> 0 fast retransmitted packets
> 18 ack-only packets (18 delayed)
> 0 URG only packets
> 0 window probe packets
> 0 window update packets
> 12 control packets
> 0 packets software-checksummed
> 51 packets received
> 21 acks (for 1956 bytes)
> 6 duplicate acks
> 0 acks for unsent data
> 

Re: Bridge with lots of (90) vlans as ports does not forward frames after restart

2020-06-16 Thread Tom Smyth
  0 no route
0 administratively prohibited
0 beyond scope
0 address unreachable
0 port unreachable
0 packet too big
0 time exceed transit
0 time exceed reassembly
0 erroneous header field
0 unrecognized next header
0 unrecognized option
0 redirect
0 unknown
0 message responses generated
0 messages with too many ND options
0 messages with bad ND options
0 bad neighbor solicitation messages
0 bad neighbor advertisement messages
0 bad router solicitation messages
0 bad router advertisement messages
0 bad redirect messages
0 path MTU changes
rip6:
0 messages received
0 checksum calculations on inbound
0 messages with bad checksum
0 messages dropped due to no socket
0 multicast messages dropped due to no socket
0 messages dropped due to full socket buffers
0 delivered
0 datagrams output


On Fri, 20 Mar 2020 at 10:55, Tom Smyth  wrote:
>
> Hi Stefan,
> I have attached the hostname.bridge101,
> and the output of ifconfig bridge101 (both after reboot and after the
> restart of the interface
>
> The server is in production so I'll run those commands you requested,
> I will build up a current box
> to drop in and test also Thanks
> Tom Smyth
>
> On Fri, 20 Mar 2020 at 07:54, Stefan Sperling  wrote:
> >
> > On Fri, Mar 20, 2020 at 01:50:42AM +0000, Tom Smyth wrote:
> > > >Synopsis:  > > >reboot >
> > > >Category: 
> > > >Environment:
> > > System  : OpenBSD 6.6
> > > Details : OpenBSD 6.6 (GENERIC.MP) #7: Thu Mar 12 11:55:22 MDT 2020
> > > r...@syspatch-66-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > > Architecture: OpenBSD.amd64
> > > Machine : amd64
> > > >Description:
> > >  > > protected ports
> > >  the bridge will not forward frames however if you run sh
> > > /etc/netstart bridgex it does
> > > start to forward
> > > ifconfig bridge101 after reboot compared with ifconfig bridge101 after
> > > the restart of the interface
> > > using sh /etc/netstart  appear very similar, the vlans appear to be
> > > members of the bridge fine
> > > both the bridge appears to learn mac addresses on ports both after
> > > reboot and after manual restart
> > > of interface)
> > > the only difference that I observed was the interface index
> > > after reboot bridge index was 6
> > > after restarting the interface the bridge index was 98 >
> > > >How-To-Repeat:
> > >  > >  reboot the machine >
> > > >Fix:
> > >  > > I have added sh /etc/netstart bridge101 to /etc/rc.local
> >
> > Tom, could you share your hostname.bridge101 file?
> > That might make it easier for others to reproduce the issue.
> >
> > Is there any obvious difference in the counters displayed by
> > netstat -nI bridge101 or netstat -s in the working vs non-working states?
> >
> > And does it also happen on -current?
>
>
>
> --
> Kindest regards,
> Tom Smyth.



--
Kindest regards,
Tom Smyth.



Re: Bridge with lots of (90) vlans as ports does not forward frames after restart

2020-03-20 Thread Tom Smyth
Hi Stefan,
I have attached the hostname.bridge101
and the output of ifconfig bridge101 (both after reboot and after the
restart of the interface).

The server is in production so I'll run those commands you requested.
I will also build up a -current box
to drop in and test. Thanks,
Tom Smyth

On Fri, 20 Mar 2020 at 07:54, Stefan Sperling  wrote:
>
> On Fri, Mar 20, 2020 at 01:50:42AM +0000, Tom Smyth wrote:
> > >Synopsis: 
> > >Category: 
> > >Environment:
> > System  : OpenBSD 6.6
> > Details : OpenBSD 6.6 (GENERIC.MP) #7: Thu Mar 12 11:55:22 MDT 2020
> > r...@syspatch-66-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> >  > protected ports
> >  the bridge will not forward frames however if you run sh
> > /etc/netstart bridgex it does
> > start to forward
> > ifconfig bridge101 after reboot compared with ifconfig bridge101 after
> > the restart of the interface
> > using sh /etc/netstart  appear very similar, the vlans appear to be
> > members of the bridge fine
> > both the bridge appears to learn mac addresses on ports both after
> > reboot and after manual restart
> > of interface)
> > the only difference that I observed was the interface index
> > after reboot bridge index was 6
> > after restarting the interface the bridge index was 98 >
> > >How-To-Repeat:
> >  >  reboot the machine >
> > >Fix:
> >  > I have added sh /etc/netstart bridge101 to /etc/rc.local
>
> Tom, could you share your hostname.bridge101 file?
> That might make it easier for others to reproduce the issue.
>
> Is there any obvious difference in the counters displayed by
> netstat -nI bridge101 or netstat -s in the working vs non-working states?
>
> And does it also happen on -current?



-- 
Kindest regards,
Tom Smyth.
bridge101: flags=41
index 6 llprio 3
groups: bridge
priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto rstp
designated: id 00:00:00:00:00:00 priority 0
vlan3982 flags=3
port 8 ifpriority 0 ifcost 0 protected 1
vlan3983 flags=3
port 9 ifpriority 0 ifcost 0 protected 1
vlan3984 flags=3
port 10 ifpriority 0 ifcost 0 protected 1
vlan3985 flags=3
port 11 ifpriority 0 ifcost 0 protected 1
vlan3986 flags=3
port 12 ifpriority 0 ifcost 0 protected 1
vlan3987 flags=3
port 13 ifpriority 0 ifcost 0 protected 1
vlan3988 flags=3
port 14 ifpriority 0 ifcost 0 protected 1
vlan3989 flags=3
port 15 ifpriority 0 ifcost 0 protected 1
vlan3990 flags=3
port 16 ifpriority 0 ifcost 0 protected 1
vlan3991 flags=3
port 17 ifpriority 0 ifcost 0 protected 1
vlan3992 flags=3
port 18 ifpriority 0 ifcost 0 protected 1
vlan3993 flags=3
port 19 ifpriority 0 ifcost 0 protected 1
vlan3994 flags=3
port 20 ifpriority 0 ifcost 0 protected 1
vlan3995 flags=3
port 21 ifpriority 0 ifcost 0 protected 1
vlan3996 flags=3
port 22 ifpriority 0 ifcost 0 protected 1
vlan3997 flags=3
port 23 ifpriority 0 ifcost 0 protected 1
vlan3998 flags=3
port 24 ifpriority 0 ifcost 0 protected 1
vlan3999 flags=3
port 25 ifpriority 0 ifcost 0 protected 1
vlan4000 flags=3
port 26 ifpriority 0 ifcost 0 protected 1
vlan4001 flags=3
port 27 ifpriority 0 ifcost 0 protected 1
vlan4002 flags=3
port 28 ifpriority 0 ifcost 0 protected 1
vlan4003 flags=3
port 29 ifpriority 0 ifcost 0 protected 1
vlan4004 flags=3
port 30 ifpriority 0 ifcost 0 protected 1
vlan4005 flags=3
port 31 ifpriority 0 ifcost 0 protected 1
vlan4006 flags=3
port 32 ifpriority 0 ifcost 0 protected 1
vlan4007 flags=3
port 33 ifpriority 0 ifcost 0 protected 1
vlan4008 flags=3
port 34 ifpriority 0 ifcost 0 protected 1
vlan4009 flags=3
port 35 ifpriority 0 ifcost 0 protected 1
vlan4010 flags=3
port 36 ifpriority 0 ifcost 0 protected 1
vlan4011 flags=3
port 37 ifpriority 0 ifcost 0 protected 1
vlan4012 flags=3
port 38 ifpriority 0 ifcost 0 protected 1
vlan4013 flags=3
port 39 ifpriority 0 ifcost 0 protected 1
vlan4014 flags=3
port 40 ifpriority 0 ifcost 0 protecte

Re: Bridge with lots of (90) vlans as ports does not forward frames after restart

2020-03-20 Thread Tom Smyth
Hello, Ryan
I had suspected the syspatches, but reverting the few that I thought I had
applied since early January didn't seem to work. My focus was diverted
when I found that restarting the interface seemed to work with the latest
patches applied.
Thanks


On Fri, 20 Mar 2020 at 10:41, Ryan Freeman  wrote:
>
> On Fri, Mar 20, 2020 at 03:30:52AM -0700, Ryan Freeman wrote:
> > On Fri, Mar 20, 2020 at 08:54:40AM +0100, Stefan Sperling wrote:
> > > On Fri, Mar 20, 2020 at 01:50:42AM +0000, Tom Smyth wrote:
> > > > >Synopsis:  > > > >reboot >
> > > > >Category: 
> > > > >Environment:
> > > > System  : OpenBSD 6.6
> > > > Details : OpenBSD 6.6 (GENERIC.MP) #7: Thu Mar 12 11:55:22 MDT 2020
> > > > r...@syspatch-66-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > >
> > > > Architecture: OpenBSD.amd64
> > > > Machine : amd64
> > > > >Description:
> > > >  > > > protected ports
> > > >  the bridge will not forward frames however if you run sh
> > > > /etc/netstart bridgex it does
> > > > start to forward
> > > > ifconfig bridge101 after reboot compared with ifconfig bridge101 after
> > > > the restart of the interface
> > > > using sh /etc/netstart  appear very similar, the vlans appear to be
> > > > members of the bridge fine
> > > > both the bridge appears to learn mac addresses on ports both after
> > > > reboot and after manual restart
> > > > of interface)
> > > > the only difference that I observed was the interface index
> > > > after reboot bridge index was 6
> > > > after restarting the interface the bridge index was 98 >
> > > > >How-To-Repeat:
> > > >  > > >  reboot the machine >
> > > > >Fix:
> > > >  > > > I have added sh /etc/netstart bridge101 to /etc/rc.local
> > >
> > > Tom, could you share your hostname.bridge101 file?
> > > That might make it easier for others to reproduce the issue.
> > >
> > > Is there any obvious difference in the counters displayed by
> > > netstat -nI bridge101 or netstat -s in the working vs non-working states?
> > >
> > > And does it also happen on -current?
> > >
> >
> > Hey,
> >
> > I just experienced something very similar to this last night on two
> > separate hosts running amd64 6.6.  I had rebooted the first machine
> > after installing syspatch 021, 022 and noted i couldn't reach the
> > vm after reboot.
>
> Gah, that should say syspatch 022_sysctl, 023_sosplice.  Apologies.



-- 
Kindest regards,
Tom Smyth.



Bridge with lots of (90) vlans as ports does not forward frames after restart

2020-03-19 Thread Tom Smyth
SENDBUG: -*- sendbug -*-
SENDBUG: Lines starting with `SENDBUG' will be removed automatically.
SENDBUG:
SENDBUG: Choose from the following categories:
SENDBUG:
SENDBUG: system user library documentation kernel alpha amd64 arm hppa
i386 m88k mips64 powerpc sh sparc sparc64 vax
SENDBUG:
SENDBUG:
To: bugs@openbsd.org
Subject: hostname.bridge with 90 protected ports does not forward on reboot
From: tom.sm...@wirelessconnect.eu
Cc: tom.sm...@wirelessconnect.eu
Reply-To: tom.sm...@wirelessconnect.eu

>Synopsis: 
>Category: 
>Environment:
System  : OpenBSD 6.6
Details : OpenBSD 6.6 (GENERIC.MP) #7: Thu Mar 12 11:55:22 MDT 2020
r...@syspatch-66-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:

>How-To-Repeat:

>Fix:


SENDBUG: dmesg, pcidump, acpidump and usbdevs are attached.
SENDBUG: Feel free to delete or use the -D flag if they contain
sensitive information.

dmesg:
OpenBSD 6.6 (GENERIC.MP) #7: Thu Mar 12 11:55:22 MDT 2020

r...@syspatch-66-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 2130526208 (2031MB)
avail mem = 2053304320 (1958MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf5920 (9 entries)
bios0: vendor SeaBIOS version
"rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org" date 04/01/2014
bios0: QEMU Standard PC (Q35 + ICH9, 2009)
acpi0 at bios0: ACPI 3.0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC SSDT HPET MCFG
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 591.81 MHz, 06-3e-04
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,TSC_ADJUST,SMEP,ERMS,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,SSBD,ARAT,XSAVEOPT,MELTDOWN
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
64b/line 16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 618.40 MHz, 06-3e-04
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,TSC_ADJUST,SMEP,ERMS,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,SSBD,ARAT,XSAVEOPT,MELTDOWN
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
64b/line 16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 615.32 MHz, 06-3e-04
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,TSC_ADJUST,SMEP,ERMS,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,SSBD,ARAT,XSAVEOPT,MELTDOWN
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
64b/line 16-way L2 cache
cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 616.93 MHz, 06-3e-04
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,TSC_ADJUST,SMEP,ERMS,UMIP,MD_CLEAR,IBRS,IBPB,STIBP,SSBD,ARAT,XSAVEOPT,MELTDOWN
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
64b/line 16-way L2 cache
cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: smt 0, core 3, package 0
ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
acpihpet0 at acpi0: 1 Hz
acpimcfg0 at acpi0
acpimcfg0: addr 0xb000, bus 0-255
acpiprt0 at acpi0: bus 0 (PCI0)
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no _STA method
no 

Re: Errors in X - Build Date Mar 14 2020 01:48:26 UTC

2020-03-14 Thread Tom Murphy
On Sat, Mar 14, 2020 at 03:53:58PM +0100, Theo Buehler wrote:
> On Sat, Mar 14, 2020 at 02:01:23PM +0000, Tom Murphy wrote:
> > Hi,
> > 
> >   I've narrowed it down.
> >   Further testing shows it's not kernel, but Xenocara. If I
> >   install xbase66.tgz from 12th March, it's fine. If I use the
> >   newest snapshot xbase66.tgz, X crashes.
> 
> There was only one xenocara commit in that window:
> https://marc.info/?l=openbsd-cvs=158413257531932=2
> 
> I managed to reproduce a cwm crash once while playing around with the
> set-title option in tmux, but have trouble doing so now.

It looks like this line just under the variable declarations
in client_set_name makes it all blow up:

free(cc->name);

Nothing checks whether cc->name is NULL before that call.

Yes, I am using cwm as my window manager.

Thanks,
Tom
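
A side note on the free(cc->name) observation: in standard C, free(NULL) is
defined to do nothing, so a missing NULL check alone would not make the
call crash. The call is unsafe when the pointer was never initialized or
has already been freed, which may be what is happening here (an assumption,
the thread does not confirm the root cause). The sketch below shows the
safe pattern. struct sketch_client and the sketch_* functions are
hypothetical stand-ins and are not cwm's code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a window manager client; only the name field. */
struct sketch_client {
	char *name;
};

/* Allocate a client and set name to NULL explicitly. */
struct sketch_client *
sketch_client_new(void)
{
	struct sketch_client *cc = malloc(sizeof(*cc));

	if (cc != NULL)
		cc->name = NULL;	/* malloc() does not zero memory */
	return cc;
}

/* Replace the stored name; safe on first use since free(NULL) is a no-op. */
int
sketch_set_name(struct sketch_client *cc, const char *name)
{
	size_t len = strlen(name) + 1;
	char *copy = malloc(len);

	if (copy == NULL)
		return -1;
	memcpy(copy, name, len);
	free(cc->name);		/* NULL on the first call, old copy later */
	cc->name = copy;
	return 0;
}

/* Sets the name twice to exercise both the NULL and non-NULL paths. */
void
sketch_selfcheck(void)
{
	struct sketch_client *cc = sketch_client_new();

	assert(cc != NULL);
	assert(sketch_set_name(cc, "xterm") == 0);
	assert(sketch_set_name(cc, "tmux") == 0);
	assert(strcmp(cc->name, "tmux") == 0);
	free(cc->name);
	free(cc);
}
```

If the structure came from an allocator that does not zero memory and name
was never assigned, the first free() would act on an indeterminate pointer,
which is undefined behaviour.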



Re: Errors in X - Build Date Mar 14 2020 01:48:26 UTC

2020-03-14 Thread Tom Murphy
Hi,

  I've narrowed it down.
  Further testing shows it's not kernel, but Xenocara. If I
  install xbase66.tgz from 12th March, it's fine. If I use the
  newest snapshot xbase66.tgz, X crashes.

  Thanks,
  Tom



Re: Errors in X - Build Date Mar 14 2020 01:48:26 UTC

2020-03-14 Thread Tom Murphy
Hi,

   I tested -current src with and without jsg@'s commits from:
   https://marc.info/?l=openbsd-cvs=158415441706312=2

   Both kernels work fine.

   It looks like the problem was introduced between the March 12 and
   the March 13/14th snapshot.

   So I don't think it's those commits, it's something to do with the
   snapshot.

   Thanks,
   Tom



Errors in X - Build Date Mar 14 2020 01:48:26 UTC

2020-03-14 Thread Tom Murphy
Hi,

  I upgraded from the March 8th snapshot to March 14th one and
  when running various apps in X, X crashes and goes back to Xenodm
  login screen.

  1. Run xterm, then run tmux in it:
     xterm: fatal IO error 35 (Resource temporarily unavailable) or
     KillClient on X server ":0"

     X crashes back to Xenodm

  2. Run Firefox:
     Gdk-Message: 10:34:14.918: /usr/local/lib/firefox/firefox: Fatal
     IO error 35 (Resource temporarily unavailable) on X server :0.
     Gdk-Message: 10:34:14.961: firefox: Fatal IO error 9 (Bad file
     descriptor) on X server :0.
     Gdk-Message: 10:34:14.954: /usr/local/lib/firefox/firefox: Fatal
     IO error 35 (Resource temporarily unavailable) on X server :0.
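
  The numbers in the messages above are errno values: on OpenBSD 35 is
  EAGAIN and 9 is EBADF, which matches the text in parentheses. They are
  what the clients see once their connection to the X server has gone
  away, so they are a symptom rather than the cause. A small C sketch for
  looking up the text follows. It uses the symbolic names because the
  numeric value of EAGAIN differs between systems. xio_error_text and
  xio_selfcheck are hypothetical helper names.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: return the system's description for one of the two
 * errno values seen in the client messages. Pass nonzero for EBADF,
 * zero for EAGAIN.
 */
const char *
xio_error_text(int is_badf)
{
	return strerror(is_badf ? EBADF : EAGAIN);
}

/* Confirms that both lookups produce a non-empty description. */
void
xio_selfcheck(void)
{
	assert(strlen(xio_error_text(0)) > 0);
	assert(strlen(xio_error_text(1)) > 0);
}
```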

  I downgraded to Build date: 1583972840 - Thu Mar 12 00:27:20 UTC 2020
  and that seems to have fixed the problem.

  I see some commits for the Intel display driver by jsg@ on the 13th 
  March.  Not sure if related...

  Dmesg attached below.

  Thanks,
  Tom

OpenBSD 6.6-current (RAMDISK_CD) #49: Fri Mar 13 19:47:01 MDT 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/RAMDISK_CD
real mem = 17040445440 (16251MB)
avail mem = 16519979008 (15754MB)
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries)
bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017
bios0: PC Specialist LTD N13xWU
acpi0 at bios0: ACPI 6.0
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT 
DBGP DBG2 DMAR BGRT ASF! WSMT
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.74 MHz, 06-8e-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
cpu at mainbus0: not configured
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 120 pins
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG0)
acpiprt2 at acpi0: bus -1 (PEG1)
acpiprt3 at acpi0: bus -1 (PEG2)
acpiprt4 at acpi0: bus -1 (RP17)
acpiprt5 at acpi0: bus -1 (RP18)
acpiprt6 at acpi0: bus -1 (RP19)
acpiprt7 at acpi0: bus -1 (RP20)
acpiprt8 at acpi0: bus -1 (RP21)
acpiprt9 at acpi0: bus -1 (RP22)
acpiprt10 at acpi0: bus -1 (RP23)
acpiprt11 at acpi0: bus -1 (RP24)
acpiprt12 at acpi0: bus 1 (RP01)
acpiprt13 at acpi0: bus -1 (RP02)
acpiprt14 at acpi0: bus -1 (RP03)
acpiprt15 at acpi0: bus -1 (RP04)
acpiprt16 at acpi0: bus 58 (RP05)
acpiprt17 at acpi0: bus 59 (RP06)
acpiprt18 at acpi0: bus -1 (RP07)
acpiprt19 at acpi0: bus -1 (RP08)
acpiprt20 at acpi0: bus -1 (RP09)
acpiprt21 at acpi0: bus -1 (RP10)
acpiprt22 at acpi0: bus -1 (RP11)
acpiprt23 at acpi0: bus -1 (RP12)
acpiprt24 at acpi0: bus -1 (RP13)
acpiprt25 at acpi0: bus -1 (RP14)
acpiprt26 at acpi0: bus -1 (RP15)
acpiprt27 at acpi0: bus -1 (RP16)
acpiec0 at acpi0
acpicpu at acpi0 not configured
acpitz at acpi0 not configured
"PNP0A08" at acpi0 not configured
acpicmos0 at acpi0
"PNP0C14" at acpi0 not configured
"INT33A1" at acpi0 not configured
"PNPC000" at acpi0 not configured
"PNP0C0C" at acpi0 not configured
"PNP0C0E" at acpi0 not configured
"PNP0C0D" at acpi0 not configured
"ACPI0003" at acpi0 not configured
"PNP0C0A" at acpi0 not configured
"PNP0C14" at acpi0 not configured
cpu0: using Skylake AVX MDS workaround
pci0 at mainbus0 bus 0
pchb0 at pci0 dev 0 function 0 "Intel Core 8G Host" rev 0x08
"Intel UHD Graphics 620" rev 0x07 at pci0 dev 2 function 0 not configured
xhci0 at pci0 dev 20 function 0 "Intel 100 Series xHCI" rev 0x21: msi, xHCI 1.0
usb0 at xhci0: USB revision 3.0
uhub0 at usb0 configuration 1 interface 0 "Intel xHCI root hub" rev 3.00/1.00 
addr 1
"Intel 100 Series Thermal" rev 0x21 at pci0 dev 20 function 2 not configured
"Intel 100 Series MEI" rev 0x21 at pci0 dev 22 function 0 not configured
ahci0 at pci0 dev 23 function 0 "Intel 100 Series AHCI" rev 0x21: msi, AHCI 
1.3.1
ahci0: port 0: 6.0Gb/s
scsibus0 at ahci0: 32 targets
sd0 at scsibus0 targ 0 lun 0:  naa.5001b448b66ac709
sd0: 476940MB, 512 bytes/sector, 976773168 sectors, thin
ppb0 at pci0 dev 28 function 0 "Intel 100 Series PCIE" rev 0xf1: msi

Re: 6.5 stable SMP (2 processor) Kernel panic when ifconfig is called and there are 2 bridges are running in two different Rdomains

2019-05-23 Thread Tom Smyth
I'll have a go at enabling WITNESS in the kernel and re-running...


On Thu, 23 May 2019 at 12:41, Martin Pieuchot  wrote:

> On 23/05/19(Thu) 02:29, Tom Smyth wrote:
> > Hello
> > Kernel panic when ifconfig is called and 2 bridges are running in two
> > different rdomains
>
> Your trace shows that a context switch occurs while a thread is still
> holding a mutex.  It isn't clear which thread is that.
>
> The trace shows that a userland process gets preempted via Xsyscall/ast
> /preempt.  Which raises the question: is the assertwaitok() happening
> too late?
>
> If you can easily reproduce the problem, please enable WITNESS in your
> kernel and try again.
>


-- 
Kindest regards,
Tom Smyth.


Re: 6.5 stable SMP (2 processor) Kernel panic when ifconfig is called and there are 2 bridges are running in two different Rdomains

2019-05-22 Thread Tom Smyth
hello

I can confirm that a bridge with a virtio vio interface seems to be the
only interface affected by this bug ...

I have tried em vlan tap vmx also and these seem to be bug free

I hope this helps

On Thu, 23 May 2019 at 03:36, Tom Smyth 
wrote:

> Hello all,
>
> the issue was not related to 2 bridges, or 2 rdomains as previously
> thought
>
> it seems to be related to the bridge and the virtio network driver vio
>
> by adding 1 virtio  vio interface into just 1 bridge
> and then running ifconfig bridge causes the kernel panic
>
> dmesg is shown below
>
> WARNING: SPL NOT LOWERED ON SYSCALL 54 3 EXIT 0 9
> Stopped at  savectx+0xb1:   movl$0,%gs:0x508
> ddb{0}> rebooting...
> OpenBSD 6.5 (GENERIC.MP) #3: Sat Apr 13 14:48:43 MDT 2019
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 4278042624 (4079MB)
> avail mem = 4138766336 (3947MB)
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf69e0 (10 entries)
> bios0: vendor SeaBIOS version "
> rel-1.9.3-0-ge2fc41e-prebuilt.qemu-project.org" date 04/01/2014
> bios0: QEMU Standard PC (i440FX + PIIX, 1996)
> acpi0 at bios0: rev 0
> acpi0: sleep states S3 S4 S5
> acpi0: tables DSDT FACP APIC HPET SRAT
> acpi0: wakeup devices
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 486.78 MHz, 06-3e-04
> cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,SMEP,ERMS,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
> 64b/line 16-way L2 cache
> cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> cpu0: apic clock running at 999MHz
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 681.80 MHz, 06-3e-04
> cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,SMEP,ERMS,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB
> 64b/line 16-way L2 cache
> cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
> cpu1: smt 0, core 1, package 0
> ioapic0 at mainbus0: apid 0 pa 0xfec0, version 11, 24 pins
> acpihpet0 at acpi0: 1 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpicpu0 at acpi0: C1(@1 halt!)
> acpicpu1 at acpi0: C1(@1 halt!)
> "ACPI0006" at acpi0 not configured
> acpipci0 at acpi0 PCI0: _OSC failed
> acpicmos0 at acpi0
> "PNP0A06" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "PNP0A06" at acpi0 not configured
> "QEMU0002" at acpi0 not configured
> "ACPI0010" at acpi0 not configured
> pvbus0 at mainbus0: KVM
> pvclock0 at pvbus0
> pci0 at mainbus0 bus 0
> pchb0 at pci0 dev 0 function 0 "Intel 82441FX" rev 0x02
> pcib0 at pci0 dev 1 function 0 "Intel 82371SB ISA" rev 0x00
> pciide0 at pci0 dev 1 function 1 "Intel 82371SB IDE" rev 0x00: DMA,
> channel 0 wired to compatibility, channel 1 wired to compatibility
> pciide0: channel 0 disabled (no drives)
> atapiscsi0 at pciide0 channel 1 drive 0
> scsibus1 at atapiscsi0: 2 targets
> cd0 at scsibus1 targ 0 lun 0:  ATAPI 5/cdrom
> removable
> cd0(pciide0:1:0): using PIO mode 4, DMA mode 2
> uhci0 at pci0 dev 1 function 2 "Intel 82371SB USB" rev 0x01: apic 0 int 11
> piixpm0 at pci0 dev 1 function 3 "Intel 82371AB Power" rev 0x03: apic 0
> int 9
> iic0 at piixpm0
> vga1 at pci0 dev 2 function 0 "Cirrus Logic CL-GD5446" rev 0x00
> wsdisplay0 at vga1 mux 1: console (80x25, vt100 emulation)
> wsdisplay0: screen 1-5 added (80x25, vt100 emulation)
> virtio0 at pci0 dev 5 function 0 "Qumranet Virtio SCSI" rev 0x00
> vioscsi0 at virtio0: qsize 128
> scsibus2 at vioscsi0: 255 targets
> sd0 at scsibus2 targ 0 lun 0:  SCSI3 0/direct
> fixed
> sd0: 10240MB, 512 bytes/sector, 20971520 sectors, thin
> virtio0: msix shared
> virtio1 at pci0 dev 18 funct

Re: 6.5 stable SMP (2 processor) Kernel panic when ifconfig is called and there are 2 bridges are running in two different Rdomains

2019-05-22 Thread Tom Smyth
 at ppb1 bus 2
isa0 at pcib0
isadma0 at isa0
fdc0 at isa0 port 0x3f0/6 irq 6 drq 2
pckbc0 at isa0 port 0x60/5 irq 1 irq 12
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0: console keyboard, using wsdisplay0
pms0 at pckbc0 (aux slot)
wsmouse0 at pms0 mux 0
pcppi0 at isa0 port 0x61
spkr0 at pcppi0
usb0 at uhci0: USB revision 1.0
uhub0 at usb0 configuration 1 interface 0 "Intel UHCI root hub" rev
1.00/1.00 addr 1
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (1daaa231ebc31d47.a) swap on sd0b dump on sd0b
WARNING: / was not properly unmounted
fd0 at fdc0 drive 1: density unknown

I hope this helps,


Tom Smyth





On Thu, 23 May 2019 at 02:47, Tom Smyth 
wrote:

> Further testing shows that rdomains and the bridge may not be the issue,
> as starting both bridges from boot in the same rdomain (rdomain 0)
> and then running ifconfig bridge caused the crash.
>
> I'll try to isolate the exact config that is causing the issue further.
>
> On Thu, 23 May 2019 at 02:29, Tom Smyth 
> wrote:
>
>> Hello
>> Kernel panic when ifconfig is called and 2 bridges are running in two
>> different rdomains
>>
>> cat /etc/hostname.bridge240
>> rdomain 240
>> maxaddr 16384
>> timeout 300
>> up
>> add vlan1097 blocknonip vlan1097 -stp vlan1097
>> add vlan240 blocknonip vlan240 -stp vlan240
>>
>>
>> cat /etc/hostname.bridge101
>>
>>  cat /root/hostname.bridge101
>> maxaddr 16384
>> timeout 300
>> up
>> add vio2 blocknonip vio2 -stp vio2
>> add vlan101 blocknonip vlan101 -stp vlan101
>>
>>
>> the crash does not happen when the two bridges are in the same rdomain.
>>
>> the crash seems to occur if  i run
>> ifconfig bridge
>>
>>
>> ifconfig bridge
>> bridge240: flags=41
>> index 7 llprio 3
>> groups: bridge
>> priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto
>> rstp
>> designated: id 00:00:00:00:00:00 priority 0
>> vlan240 flags=7
>> port 14 ifpriority 0 ifcost 0
>> vlan1097 flags=7
>> port 13 ifpriority 0 ifcost 0
>> Addresses (max cache: 16384, timeout: 300):
>> 7a:cb:d2:c3:28:70 vlan1097 1 flags=0<>
>> bridge101: flags=41
>> index 16 llprio 3
>> groups: bridge
>> priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto
>> rstp
>> designated: id 00:00:00:00:00:00 priority 0
>> vlan101 flags=7
>> port 11 ifpriority 0 ifcost 0
>> vio2 flags=7
>> port 3 ifpriority 0 ifcost 0
>> Addresses (max cache: 16384, timeout: 300):
>>
>> No further output is observed and there is a panic message on the VGA
>> console
>>
>> I have attached images of the panic and I have tried to follow ddb
>> output debugging as best I can
>>
>> Thanks
>> Tom smyth
>>
>>
>>
>>
>>
>> --
>> Kindest regards,
>> Tom Smyth.
>>
>
>
> --
> Kindest regards,
> Tom Smyth.
>


-- 
Kindest regards,
Tom Smyth.


Re: 6.5 stable SMP (2 processor) Kernel panic when ifconfig is called and there are 2 bridges are running in two different Rdomains

2019-05-22 Thread Tom Smyth
Further testing shows that rdomains and the bridge may not be the issue,
as starting both bridges from boot in the same rdomain (rdomain 0)
and then running ifconfig bridge caused the crash.

I'll try to isolate the exact config that is causing the issue further.

On Thu, 23 May 2019 at 02:29, Tom Smyth 
wrote:

> Hello
> Kernel panic when ifconfig is called and 2 bridges are running in two
> different rdomains
>
> cat /etc/hostname.bridge240
> rdomain 240
> maxaddr 16384
> timeout 300
> up
> add vlan1097 blocknonip vlan1097 -stp vlan1097
> add vlan240 blocknonip vlan240 -stp vlan240
>
>
> cat /etc/hostname.bridge101
>
>  cat /root/hostname.bridge101
> maxaddr 16384
> timeout 300
> up
> add vio2 blocknonip vio2 -stp vio2
> add vlan101 blocknonip vlan101 -stp vlan101
>
>
> the crash does not happen when the two bridges are in the same rdomain.
>
> the crash seems to occur if  i run
> ifconfig bridge
>
>
> ifconfig bridge
> bridge240: flags=41
> index 7 llprio 3
> groups: bridge
> priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto
> rstp
> designated: id 00:00:00:00:00:00 priority 0
> vlan240 flags=7
> port 14 ifpriority 0 ifcost 0
> vlan1097 flags=7
> port 13 ifpriority 0 ifcost 0
> Addresses (max cache: 16384, timeout: 300):
> 7a:cb:d2:c3:28:70 vlan1097 1 flags=0<>
> bridge101: flags=41
> index 16 llprio 3
> groups: bridge
> priority 32768 hellotime 2 fwddelay 15 maxage 20 holdcnt 6 proto
> rstp
> designated: id 00:00:00:00:00:00 priority 0
> vlan101 flags=7
> port 11 ifpriority 0 ifcost 0
> vio2 flags=7
> port 3 ifpriority 0 ifcost 0
> Addresses (max cache: 16384, timeout: 300):
>
> No further output is observed and there is a panic message on the VGA
> console
>
> I have attached images of the panic and I have tried to follow ddb  output
> debugging as best I can
>
> Thanks
> Tom smyth
>
>
>
>
>
> --
> Kindest regards,
> Tom Smyth.
>


-- 
Kindest regards,
Tom Smyth.


acme-client: renewal fails

2019-01-28 Thread tom
>Synopsis:  acme-client: renewal fails
>Category:  system
>Environment:
System  : OpenBSD 6.4
Details : OpenBSD 6.4 (GENERIC.MP) #364: Thu Oct 11 13:30:23 MDT 
2018
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Renewal fails:
# acme-client -vv lists.dl6tom.de
acme-client: /etc/acme/letsencrypt-privkey.pem: loaded RSA account key
acme-client: /etc/ssl/lists.dl6tom.de.crt: certificate renewable: -42 days left
acme-client: /etc/ssl/private/lists.dl6tom.de.key: loaded RSA domain key
acme-client: https://acme-v01.api.letsencrypt.org/directory: directories
acme-client: acme-v01.api.letsencrypt.org: DNS: 104.111.246.175
acme-client: transfer buffer: [{ "0wdNjYxn8kA": 
"https://community.letsencrypt.org/t/adding-random-entries-to-the-directory/33417",
 "key-change": "https://acme-v01.api.letsencrypt.org/acme/key-change", "meta": 
{ "caaIdentities": [ "letsencrypt.org" ], "terms-of-service": 
"https://letsencrypt.org/documents/LE-SA-v1.2-November-15-2017.pdf", "website": 
"https://letsencrypt.org" }, "new-authz": 
"https://acme-v01.api.letsencrypt.org/acme/new-authz", "new-cert": 
"https://acme-v01.api.letsencrypt.org/acme/new-cert", "new-reg": 
"https://acme-v01.api.letsencrypt.org/acme/new-reg", "revoke-cert": 
"https://acme-v01.api.letsencrypt.org/acme/revoke-cert" }] (658 bytes)
acme-client: https://acme-v01.api.letsencrypt.org/acme/new-authz: req-auth: 
lists.dl6tom.de
acme-client: acme-v01.api.letsencrypt.org: cached
acme-client: acme-v01.api.letsencrypt.org: cached
acme-client: transfer buffer: [{ "identifier": { "type": "dns", "value": 
"lists.dl6tom.de" }, "status": "pending", "expires": "2019-01-29T18:19:20Z", 
"challenges": [ { "type": "tls-alpn-01", "status": "pending", "uri": 
"https://acme-v01.api.letsencrypt.org/acme/challenge/IibpqF0ckn28LYY5bfA-_qbAlYsWq-DJcQlAw0SWCE0/11749882442",
 "token": "v8oZc_-YhBHNLCaALLEBZ03hEl--KM63pMdqixg_9Io" }, { "type": "http-01", 
"status": "pending", "uri": 
"https://acme-v01.api.letsencrypt.org/acme/challenge/IibpqF0ckn28LYY5bfA-_qbAlYsWq-DJcQlAw0SWCE0/11749882443",
 "token": "yW3-6mo2IK-ZASKPB6lV6rPq1qbvfP1NdUE9AV0xRTs" }, { "type": 
"tls-sni-01", "status": "pending", "uri": 
"https://acme-v01.api.letsencrypt.org/acme/challenge/IibpqF0ckn28LYY5bfA-_qbAlYsWq-DJcQlAw0SWCE0/11749882444",
 "token": "yfhU9kYZg5wHaRlxLmg6m_DWgzzEdwUnztXAKBmhE6w" }, { "type": "dns-01", 
"status": "pending", "uri": 
"https://acme-v01.api.letsencrypt.org/acme/challenge/IibpqF0ckn28LYY5bfA-_qbAlYsWq-DJcQlAw0SWCE0/11749882445",
 "token": "iDBP2CeNpp0r5NCWTbpKUoiBOSZz8cJN8HphHRVXULk" } ], "combinations": [ 
[ 2 ], [ 0 ], [ 1 ], [ 3 ] ] }] (1271 bytes)
acme-client: /var/www/acme/yW3-6mo2IK-ZASKPB6lV6rPq1qbvfP1NdUE9AV0xRTs: created
acme-client: 
https://acme-v01.api.letsencrypt.org/acme/challenge/IibpqF0ckn28LYY5bfA-_qbAlYsWq-DJcQlAw0SWCE0/11749882443:
 challenge
acme-client: acme-v01.api.letsencrypt.org: cached
acme-client: acme-v01.api.letsencrypt.org: cached
acme-client: transfer buffer: [{ "type": "http-01", "status": "pending", "uri": 
"https://acme-v01.api.letsencrypt.org/acme/challenge/IibpqF0ckn28LYY5bfA-_qbAlYsWq-DJcQlAw0SWCE0/11749882443",
 "token": "yW3-6mo2IK-ZASKPB6lV6rPq1qbvfP1NdUE9AV0xRTs", "keyAuthorization": 
"yW3-6mo2IK-ZASKPB6lV6rPq1qbvfP1NdUE9AV0xRTs.YJLLEKdoM4e4WocQ9C9xvXqa6dAO4zUn6hdCgEgIfBs"
 }] (337 bytes)
acme-client: 
https://acme-v01.api.letsencrypt.org/acme/challenge/IibpqF0ckn28LYY5bfA-_qbAlYsWq-DJcQlAw0SWCE0/11749882443:
 status
acme-client: acme-v01.api.letsencrypt.org: cached
acme-client: https://acme-v01.api.letsencrypt.org/acme/new-cert: certificate
acme-client: acme-v01.api.letsencrypt.org: cached
acme-client: acme-v01.api.letsencrypt.org: cached
acme-client: https://acme-v01.api.letsencrypt.org/acme/new-cert: bad HTTP: 403
acme-client: transfer buffer: [{ "type": "urn:acme:error:unauthorized", 
"detail": "Error creating new cert :: authorizations for these names not found 
or expired: lists.dl6tom.de", "status": 403 }] (171 bytes)
acme-client: bad exit: netproc(61794): 1

/var/www/logs/access.log says:
default 66.133.109.36 - - [22/Jan/2019:19:19:31 +0100] "GET 
/.well-known/acme-challenge/yW3-6mo2IK-ZASKPB6lV6rPq1qbvfP1NdUE9AV0xRTs 
HTTP/1.1" 404 0

I fetched the acme-client source and modified it to not delete the token (sorry, 
I could not find the post pointing to the "status: pending" problem). Now I get:
# acme-client -vv lists.dl6tom.de
acme-client: /etc/acme/letsencrypt-privkey.pem: loaded RSA account key
acme-client: /etc/ssl/lists.dl6tom.de.crt: certificate renewable: -42 days left
acme-client: /etc/ssl/private/lists.dl6tom.de.key: loaded RSA domain key
acme-client: https://acme-v01.api.letsencrypt.org/directory: directories
acme-client: acme-v01.api.letsencrypt.org: DNS: 104.111.246.175
acme-client: transfer buffer: [{ "K7_kgkaQbu0": 

Re: 6.4 openBGPD Segfault caused by filters referencing undeclared prefix-set

2018-11-18 Thread Tom Smyth
Hello Stuart

Thanks for the helpful advice on giving a better crash report. I will do
that going forward...

On Sun 18 Nov 2018, 10:50 Stuart Henderson wrote:

> On 2018/11/18 08:58, Tom Smyth wrote:
> > I have attached the coredump
>
> Generally the coredump by itself isn't that useful to others as it
> needs the right binary to go with it, a backtrace is usually better:
>
> gdb /usr/sbin/bgpd /path/to/bgpd.core
> bt
>
> For reference, in some cases multiple processes will crash, because
> the filenames are all "bgpd.core" they can overwrite each other.
> See the bottom of the sysctl manpage for a way around that.
>
>


Re: 6.4 openBGPD Segfault caused by filters referencing undeclared prefix-set

2018-11-18 Thread Tom Smyth
Just to confirm, I'm running 6.4 GENERIC.MP#364 amd64.
On Sun, 18 Nov 2018 at 08:58, Tom Smyth  wrote:
>
> Hello,
> I was configuring an openbsd 6.4 bgpd router using the
> /etc/examples/bgpd.conf as a template
>
> If you comment out the  prefix-set mynetworks
> # list of networks that may be originated by our ASN
> #prefix-set mynetworks {\
> #192.0.2.0/24\
> #2001:db8:abcd::/48\
> #}
>
> and leave filters that depend on the prefix-set
> in the bgpd.conf file
>
> bgpd segfaults on startup  as shown below
>
> corertfw2# bgpd -dv
> startup
> rereading config
> session engine ready
> ASN = "62129"
> peer closed imsg connection
> Segmentation fault (core dumped)
> SE: Lost connection to parent
> session engine exiting
> corertfw2# route decision engine ready
> peer closed imsg connection
> fatal in RDE: Lost connection to parent
>
>
> I have attached the coredump
>
> Removing filters that reference undeclared prefix-set mynetworks
> resolved the issue ...
>
> --
> Kindest regards,
> Tom Smyth
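
The failure mode described above (a filter rule that names a prefix-set whose declaration has been commented out) can be caught before starting bgpd with a small check. The following is only a rough Python sketch: the helper name undeclared_prefix_sets is made up, the sample rule line is hypothetical, and the parsing is far simpler than the real bgpd.conf grammar (it only looks for "prefix-set NAME {" declarations and "prefix-set NAME" references, ignoring everything after a '#').

```python
import re

# "prefix-set NAME {" at the start of a line is treated as a declaration.
DECL = re.compile(r"^\s*prefix-set\s+(\S+)\s*\{")
# Any other "prefix-set NAME" is treated as a reference from a filter rule.
REF = re.compile(r"\bprefix-set\s+(\S+)")


def undeclared_prefix_sets(conf_text):
    """Return sorted names that are referenced but never declared."""
    declared, referenced = set(), set()
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0]  # drop comments, as bgpd would
        match = DECL.match(line)
        if match:
            declared.add(match.group(1))
            continue
        referenced.update(REF.findall(line))
    return sorted(referenced - declared)


# Hypothetical config fragment: declaration commented out, rule left in place.
sample = """\
#prefix-set mynetworks {
#	192.0.2.0/24
#}
allow to ebgp prefix-set mynetworks
"""

print(undeclared_prefix_sets(sample))
```

For the fragment above this reports mynetworks as undeclared; with the declaration uncommented the list is empty.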



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-09-26 Thread Tom Murphy
Hi Martin,
On Wed, Sep 19, 2018 at 10:36:41AM -0300, Martin Pieuchot wrote:
> At that moment you don't need to plug back the phone.  The ugen(4)
> driver has a flawed logic where it waits 1min for all the transfers
> to finish.  So just wait until you see:
> 
> usb_detach_wait: ugen1 didn't detach
> 
> What would help a lot is the output of:
> 
> # ps -lAk
> 
> After unplugging the phone but before seeing "usb_detach_wait".
> 
> If you can also get this output after seeing the message & before
> killing the server that would also help.
> 
> Cheers,
> Martin

OK I managed to capture the ps from before usb_detach_wait and after.

Here's the dmesg-related stuff:

1) Plug in phone:

xhci0: port=7 change=0x80
xhci0: port=7 change=0x80
xhci0: xhci_cmd_slot_control
xhci0: dev 3, input=0xff00766a6000 slot=0xff00766a6020 
ep0=0xff00766a6040
xhci0: dev 3, setting DCBAA to 0x766a7000
xhci_pipe_init: pipe=0x8154f000 addr=0 depth=1 port=7 speed=3 dev 3 dci 
1 (epAddr=0x0)
xhci0: xhci_cmd_set_address BSR=1
xhci0: xhci_cmd_set_address BSR=0
xhci0: dev 3 addr 3
ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4
ugen_set_config: ugen1 to configno 1, sc=0x815c8000
ugen_set_config: ifaceno 0
ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x815c8468
ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x815c8310

2) Start adb:

ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=1 endpt=0
ugenioctl: cmd=80045572
ugenioctl: cmd=c020556f
xhci0: short xfer 0xff047d8afc30 for 48
ugenioctl: cmd=80045572
ugenioctl: cmd=c020556f
xhci0: short xfer 0xff047d8afc30 for 51
ugenopen: flag=3, mode=8192, unit=1 endpt=1
ugenopen: sc=0x815c8000, endpt=1, dir=0, sce=0x815c8310
xhci_pipe_init: pipe=0x81586000 addr=4 depth=1 port=7 speed=3 dev 3 dci 
2 (epAddr=0x1)
xhci0: xhci_cmd_configure_ep dev 3
ugenopen: flag=3, mode=8192, unit=1 endpt=1
ugenopen: sc=0x815c8000, endpt=1, dir=0, sce=0x815c8310
xhci_pipe_init: pipe=0x815cf000 addr=4 depth=1 port=7 speed=3 dev 3 dci 
2 (epAddr=0x1)
xhci0: xhci_cmd_configure_ep dev 3
ugenopen: sc=0x815c8000, endpt=1, dir=1, sce=0x815c8468
xhci_pipe_init: pipe=0x815d addr=4 depth=1 port=7 speed=3 dev 3 dci 
3 (epAddr=0x81)
xhci0: xhci_cmd_configure_ep dev 3
ugenioctl: cmd=80045572
ugen1: ugenwrite: 1
ugenwrite: transfer 24 bytes
ugenopen: sc=0x815c8000, endpt=1, dir=1, sce=0x815c8468
xhci_pipe_init: pipe=0x815d1000 addr=4 depth=1 port=7 speed=3 dev 3 dci 
3 (epAddr=0x81)
xhci0: xhci_cmd_configure_ep dev 3
xhci0: wrong trb index (4294967040) max is 255
ugenioctl: cmd=80045572
ugenioctl: cmd=80045571
ugen1: ugenread: 1
ugenread: start transfer 24 bytes
ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
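
The cmd= values in the ugenioctl lines above look like ordinary BSD ioctl request words. A minimal Python sketch for splitting one into direction, parameter length, group and number follows. It assumes the values are printed in hexadecimal and use the usual <sys/ioccom.h> bit layout; the helper name decode_ioctl is made up for illustration, and no attempt is made to map the numbers back to symbolic USB_* names.

```python
# Bit layout assumed from <sys/ioccom.h>.
IOC_VOID = 0x20000000     # no parameters
IOC_OUT = 0x40000000      # kernel copies parameters out to userland
IOC_IN = 0x80000000       # kernel copies parameters in from userland
IOC_DIRMASK = 0xE0000000
IOCPARM_MASK = 0x1FFF     # parameter length lives in bits 16..28

DIRECTIONS = {
    IOC_VOID: "void",
    IOC_OUT: "out",
    IOC_IN: "in",
    IOC_IN | IOC_OUT: "inout",
}


def decode_ioctl(cmd_hex):
    """Split a hex ioctl request word into its fields."""
    cmd = int(cmd_hex, 16)
    return {
        "dir": DIRECTIONS.get(cmd & IOC_DIRMASK, "unknown"),
        "len": (cmd >> 16) & IOCPARM_MASK,
        "group": chr((cmd >> 8) & 0xFF),
        "num": cmd & 0xFF,
    }


for value in ("80045572", "80045571", "c020556f"):
    print(value, decode_ioctl(value))
```

Under those assumptions all three requests belong to group 'U'; 80045572 and 80045571 pass a 4-byte argument into the kernel (numbers 114 and 113), and c020556f passes a 32-byte structure both ways (number 111).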

3) Full dmesg

OpenBSD 6.4-beta (CUSTOM) #6: Wed Sep 26 21:00:22 BST 2018
tom@freya.pertho.local:/usr/src/sys/arch/amd64/compile/CUSTOM
real mem = 17040445440 (16251MB)
avail mem = 16514740224 (15749MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries)
bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017
bios0: PC Specialist LTD N13xWU
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT 
DBGP DBG2 DMAR
BGRT ASF! WSMT
acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) 
PXSX(S4) RP20(S4)
PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) PXSX(S4) RP24(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.76 MHz, 06-8e-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX1

Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-09-12 Thread Tom Murphy
On Wed, Sep 12, 2018 at 12:55:01PM -0300, Martin Pieuchot wrote:
> Hello Tom,
> 
> On 08/09/18(Sat) 12:07, Tom Murphy wrote:
> > On Thu, Sep 06, 2018 at 01:06:50PM -0300, Martin Pieuchot wrote:
> > > Tom, as I said previously you've found a race in the ugen(4) driver.
> > > 
> > > That's the symptom:
> > > 
> > > > [...]
> > > > usb_detach_wait: ugen1 didn't detach
> > > 
> > > To be able to understand which race we are chasing, could you rebuild a
> > > kernel with UGEN_DEBUG defined and set `ugendebug' to 6?
> >  
> > OK here's the output per each step. Below that will be the dmesg and the 
> > backtrace.
> 
> Thanks a lot, but I need the same outputs with both UGEN_DEBUG and
> XHCI_DEBUG, and of course `ugendebug' set to 6 :)
> 
> The interaction between ugen(4) the stack and xhci(4) is what will tell
> us where is the use-after-free :)

Oh, sorry about that; I had replaced XHCI_DEBUG with UGEN_DEBUG.

Anyway, here's the testing under a kernel with both turned on and `ugendebug'
set to 6:

1. Plugging in phone

xhci0: port=7 change=0x80
xhci0: port=7 change=0x80
xhci0: xhci_cmd_slot_control
xhci0: dev 3, input=0xff0077164000 slot=0xff0077164020 
ep0=0xff0077164040
xhci0: dev 3, setting DCBAA to 0x77165000
xhci_pipe_init: pipe=0x81596000 addr=0 depth=1 port=7 speed=3 dev 3 dci 
1 (epAddr=0x0)
xhci0: xhci_cmd_set_address BSR=1
xhci0: xhci_cmd_set_address BSR=0
xhci0: dev 3 addr 3
ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4
ugen_set_config: ugen1 to configno 1, sc=0x81534000
ugen_set_config: ifaceno 0
ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81534468
ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81534310

2. Starting adb

ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=1 endpt=0
ugenioctl: cmd=80045572
ugenioctl: cmd=c020556f
xhci0: short xfer 0xff047d8afe10 for 48
ugenioctl: cmd=80045572
ugenioctl: cmd=c020556f
xhci0: short xfer 0xff047d8afe10 for 51
ugenopen: flag=3, mode=8192, unit=1 endpt=1
ugenopen: sc=0x81534000, endpt=1, dir=0, sce=0x81534310
xhci_pipe_init: pipe=0x8155 addr=4 depth=1 port=7 speed=3 dev 3 dci 
2 (epAddr=0x1)
xhci0: xhci_cmd_configure_ep dev 3
ugenopen: flag=3, mode=8192, unit=1 endpt=1
ugenopen: sc=0x81534000, endpt=1, dir=0, sce=0x81534310
xhci_pipe_init: pipe=0x815e2000 addr=4 depth=1 port=7 speed=3 dev 3 dci 
2 (epAddr=0x1)
xhci0: xhci_cmd_configure_ep dev 3
ugenopen: sc=0x81534000, endpt=1, dir=1, sce=0x81534468
xhci_pipe_init: pipe=0x815e3000 addr=4 depth=1 port=7 speed=3 dev 3 dci 
3 (epAddr=0x81)
xhci0: xhci_cmd_configure_ep dev 3
ugenioctl: cmd=80045572
ugenioctl: cmd=80045571
ugen1: ugenread: 1
ugenread: start transfer 24 bytes
ugenopen: sc=0x81534000, endpt=1, dir=1, sce=0x81534468
xhci_pipe_init: pipe=0x815e4000 addr=4 depth=1 port=7 speed=3 dev 3 dci 
3 (epAddr=0x81)
xhci0: xhci_cmd_configure_ep dev 3
ugenioctl: cmd=80045572
ugen1: ugenwrite: 1
ugenwrite: transfer 24 bytes
ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
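
As a reading aid for the xhci_pipe_init lines above: the dci value printed next to each epAddr matches the xHCI device context index rule, where the index is twice the endpoint number, plus one for IN endpoints and for the control endpoint. A small Python sketch of that rule (the function name endpoint_dci is made up; this is a restatement of the arithmetic, not the driver's code):

```python
def endpoint_dci(ep_addr, is_control=False):
    """Device context index for a USB endpoint address.

    ep_addr is the bEndpointAddress byte: low 4 bits are the endpoint
    number, bit 7 (0x80) is set for IN endpoints.
    """
    number = ep_addr & 0x0F
    is_in = bool(ep_addr & 0x80)
    return number * 2 + (1 if (is_control or is_in) else 0)


# The three combinations that appear in the log above.
print(endpoint_dci(0x00, is_control=True))  # default control pipe
print(endpoint_dci(0x01))                   # endpoint 1, OUT
print(endpoint_dci(0x81))                   # endpoint 1, IN
```

This gives 1, 2 and 3, agreeing with "dci 1 (epAddr=0x0)", "dci 2 (epAddr=0x1)" and "dci 3 (epAddr=0x81)" in the log, so the later xhci_cmd_stop_ep on dci 2 concerns the OUT endpoint.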

3. Unplugged phone

ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=0 endpt=0
xhci0: port=7 change=0x80
ugen_detach: sc=0x81534000 flags=1
xhci_abort_xfer: xfer=0xff047d8afc30 status=IN_PROGRESS err=CANCELLED 
actlen=0 len=24 idx=0
xhci0: xhci_cmd_stop_ep dev 3 dci 2
xhci0: wrong trb index (4294967040) max is 255
xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 2

4. Plug back in phone

xhci0: port=7 change=0x80
usb_detach_wait: ugen1 didn't detach
ugenclose: close control
ugenclose: endpt=1 dir=0 sce=0x81534310
xhci0: xhci_cmd_configure_ep dev 3
ugenclose: endpt=1 dir=1 sce=0x81534468
xhci0: xhci_cmd_configure_ep dev 3
ugen1 detached
xhci0: xhci_cmd_configure_ep dev 3
xhci0: xhci_cmd_slot_control
xhci0: port=7 change=0x80
xhci0: xhci_cmd_slot_control
xhci0: dev 4, input=0xff00764c5000 slot=0xff00764c5020 
ep0=0xff00764c5040
xhci0: dev 4, setting DCBAA to 0x77164000
xhci_pipe_init: pipe=0x815db000 addr=0 depth=1 port=7 speed=3 dev 4 dci 
1 (epAddr=0x0)
xhci0: xhci_cmd_set_address BSR=1
xhci0: xhci_cmd_set_address BSR=0
xhci0: dev 4 addr 4
ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4
ugen_set_config: ugen1 to configno 1, sc=0x81534000
ugen_set_config: ifaceno 0
ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81534468
ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81534310
ugenopen: flag=3, mode=8192, u

Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-09-08 Thread Tom Murphy
Hi Martin,

On Thu, Sep 06, 2018 at 01:06:50PM -0300, Martin Pieuchot wrote:
> Tom, as I said previously you've found a race in the ugen(4) driver.
> 
> That's the symptom:
> 
> > [...]
> > usb_detach_wait: ugen1 didn't detach
> 
> To be able to understand which race we are chasing, could you rebuild a
> kernel with UGEN_DEBUG defined and set `ugendebug' to 6?
 
OK, here's the output for each step. Below that are the dmesg and the 
backtrace.

1. Plug in phone:

ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4
ugen_set_config: ugen1 to configno 1, sc=0x81533000
ugen_set_config: ifaceno 0
ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81533468
ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81533310

2. Run adb start-server:

ugenopen: flag=3, mode=8192, unit=0 endpt=0
ugenopen: flag=3, mode=8192, unit=1 endpt=0
ugenioctl: cmd=80045572
ugenioctl: cmd=c020556f
ugenioctl: cmd=80045572
ugenioctl: cmd=c020556f
ugenopen: flag=3, mode=8192, unit=1 endpt=1
ugenopen: sc=0x81533000, endpt=1, dir=0, sce=0x81533310
ugenopen: flag=3, mode=8192, unit=1 endpt=1
ugenopen: sc=0x81533000, endpt=1, dir=0, sce=0x81533310
ugenopen: sc=0x81533000, endpt=1, dir=1, sce=0x81533468
ugenioctl: cmd=80045572
ugenioctl: cmd=80045571
ugen1: ugenread: 1
ugenread: start transfer 24 bytes


This message starts repeating over and over:
ugenopen: flag=3, mode=8192, unit=0 endpt=0

3. Disconnect phone:

ugen_detach: sc=0x81533000 flags=1
xhci0: wrong trb index (4294967040) max is 255
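
A note on the number in that message: 4294967040 is 2^32 - 256, i.e. 0xffffff00, which is what -256 looks like when a 32-bit value is printed as unsigned. The short Python check below only verifies that arithmetic; whether the driver really computed a negative index here is an assumption, not something the log proves.

```python
def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as two's-complement signed."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


idx = 4294967040  # from "wrong trb index (4294967040) max is 255"
print(hex(idx))
print(as_signed32(idx))
```

This prints 0xffffff00 and -256.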

4. Run adb kill-server

usb_detach_wait: ugen1 didn't detach
ugenclose: close control
ugenclose: endpt=1 dir=0 sce=0x81533310
ugenclose: endpt=1 dir=1 sce=0x81533468
ugen1 detached
ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 4
ugen_set_config: ugen1 to configno 1, sc=0x81533000
ugen_set_config: ifaceno 0
ugen_set_config: endptno 0, endpt=0x81(1,128), sce=0x81533468
ugen_set_config: endptno 1, endpt=0x01(1,0), sce=0x81533310

Right after the above ugen messages, the kernel hits a protection fault trap:

kernel: protection fault trap, code=0
Stopped at   xhci_abort_xfer+0x57:cmpb $0,0x471(%r14)
ddb{1}> bt
xhci_abort_xfer(8f61ab7a60c70bc6,4) at xhci_abort_xfer+0x57
usbd_transfer(4369088e1033fbfc) at usbd_transfer+0x24d
ugen_do_read(ba86656ba01aa6cf,800031bf8570,81533000,ff047e7b2a20)
at ugen_do_read+0x3eb
ugenread(a7bc233819d8f0fa,800031bf8570,800031bf8470) at ugenread+0x47
spec_read(6cdfcf35dbaa936b) at spec_read+0xab
VOP_READ(bbd6b618d0407947,e27a94ce45ee2e22,ff047e7b2a20,ff03)
at VOP_READ+0x49
dofilereadv(408a5e296dd9b5f,30,8000fffe79f8,3,800031bf86a0) at
dofilereadv+0xe0
sys_read(bbd6b618d0080a08,e27a94ce45ee2e22,18) at sys_read+0x5c
syscall(6cdfcf35db6c51e6) at syscall+0x32a
Xsyscall(6,3,17bc55b14750,3,1,17bc84df5a00) at Xsyscall+0x128
end of kernel
end trace frame: 0x17bc9ef24890, count: -11
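
Because the trace was pasted with some frames wrapped onto a second line, it can be handy to reduce it to a plain call chain. A minimal Python sketch, assuming each frame ends in "at function+0xoffset" as in the trace above (the helper name frames is made up, and only the bt output should be fed to it, not the "Stopped at" line):

```python
import re

# "at name+0xNN", where the whitespace after "at" may include a line break.
FRAME = re.compile(r"\bat\s+([A-Za-z_]\w*)\+0x([0-9a-f]+)")


def frames(trace_text):
    """Return (function, offset) pairs in the order they appear."""
    return [(name, int(off, 16)) for name, off in FRAME.findall(trace_text)]


# Three frames copied from the trace above, including one wrapped frame.
trace = """\
xhci_abort_xfer(8f61ab7a60c70bc6,4) at xhci_abort_xfer+0x57
usbd_transfer(4369088e1033fbfc) at usbd_transfer+0x24d
dofilereadv(408a5e296dd9b5f,30,8000fffe79f8,3,800031bf86a0) at
dofilereadv+0xe0
"""

for name, offset in frames(trace):
    print(f"{name}+{offset:#x}")
```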


Dmesg:

OpenBSD 6.4-beta (CUSTOM) #3: Sat Sep  8 10:43:09 BST 2018
tom@freya.pertho.local:/sys/arch/amd64/compile/CUSTOM
real mem = 17040445440 (16251MB)
avail mem = 16514748416 (15749MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries)
bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017
bios0: PC Specialist LTD N13xWU
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT 
DBGP DBG2 DMAR BGRT ASF! WSMT
acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) 
PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) 
PXSX(S4) RP24(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.76 MHz, 06-8e-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,D

Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-09-05 Thread Tom Murphy
Hi Martin,

On Wed, Sep 05, 2018 at 11:19:56AM -0300, Martin Pieuchot wrote:
> On 05/09/18(Wed) 09:51, Tom Murphy wrote:
> > [...] 
> > I physically unplug the phone and the kernel starts generating the xhci0: 
> > timeout aborting transfer messages in a loop.
> 
> Aah!  So the messages appear *after* you unplugged the device.  That
> makes sense.  Does the diff below help?  I just committed the previous
> one so make sure you have current sources.
> 
> Index: xhci.c
> ===
> RCS file: /cvs/src/sys/dev/usb/xhci.c,v
> retrieving revision 1.88
> diff -u -p -r1.88 xhci.c
> --- xhci.c5 Sep 2018 14:03:28 -   1.88
> +++ xhci.c5 Sep 2018 14:17:39 -
> @@ -2055,8 +2055,13 @@ xhci_abort_xfer(struct usbd_xfer *xfer, 
>  	xp->aborted_xfer = xfer;
>  
>  	/* Stop the endpoint and wait until the hardware says so. */
> -	if (xhci_cmd_stop_ep(sc, xp->slot, xp->dci))
> +	if (xhci_cmd_stop_ep(sc, xp->slot, xp->dci)) {
>  		DPRINTF(("%s: error stopping endpoint\n", DEVNAME(sc)));
> +		/* Assume the device is gone. */
> +		xfer->status = status;
> +		usb_transfer_complete(xfer);
> +		return;
> +	}
>  
>  	/*
>  	 * The transfer was already completed when we stopped the

I've just tested the patch. It does fix the behavior when unplugging the device.
However, I can still get the kernel to crash when I do the following:

1) Start adb
2) Connect the phone
3) Try connecting to phone with adb shell (it doesn't work; it says the device is offline)
4) Disconnect phone
5) Reconnect phone
6) run: 'adb kill-server'

Then a kernel protection fault happens. The dmesg, backtrace and XHCI debug
entries are below.

dmesg:

OpenBSD 6.4-beta (CUSTOM) #1: Wed Sep  5 19:44:41 BST 2018
tom@freya.pertho.local:/sys/arch/amd64/compile/CUSTOM
real mem = 17040445440 (16251MB)
avail mem = 16514736128 (15749MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries)
bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017
bios0: PC Specialist LTD N13xWU
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT 
DBGP DBG2 DMAR BGRT ASF! WSMT
acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) 
PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) 
PXSX(S4) RP24(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.47 MHz, 06-8e-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i7-8550U CPU @ 1.

Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-09-05 Thread Tom Murphy
On Tue, Sep 04, 2018 at 03:24:19PM -0300, Martin Pieuchot wrote:
> On 31/08/18(Fri) 09:19, Tom Murphy wrote:
> > [...] 
> > Here's the dmesg from a XHCI_DEBUG kernel before it crashes (it appears
> > to loop quite a few times before the kernel protection fault kicks in.)
> 
> You're now hitting a bug in ugen(4), because you're detaching a device
> while ugen_do_read() is sleeping for a transfer to finish.  This is
> definitely something that needs to be fixed.
> 
> However...
> 
> > xhci0: timeout aborting transfer
> > xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED 
> > actlen=0 len=24 idx=0
> 
> How do you generate this message?

I plug in the Android phone and start up adb with 'adb start-server'.
For whatever reason, the phone comes back as "error: device offline". (I can
only actually connect via adb when the phone is booted into recovery mode, so
this may be a bug in LineageOS, but that's not really relevant to the bug here.)

I physically unplug the phone and the kernel starts generating the
"xhci0: timeout aborting transfer" messages in a loop. After a while the
protection fault drops me into ddb.

I don't think the transfer ever finishes, because the device doesn't respond
to it in the first place. So unplugging the device is what makes it fail.
If I keep the phone plugged in, the kernel protection fault doesn't show up
until I unplug it.
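
The detach-while-sleeping problem described above can be illustrated in
userland. The following is only a sketch under stated assumptions: pthread
mutexes and condition variables stand in for the kernel's sleep/wakeup, and
struct dev, dev_read(), dev_detach() and detach_while_reading() are
hypothetical names, not the real ugen(4) code. It shows one common way to make
detach wait until a sleeping reader has left before the device state is torn
down.

```c
#include <errno.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical device state; not the real struct ugen_softc. */
struct dev {
	pthread_mutex_t	mtx;
	pthread_cond_t	xfer_cv;	/* reader sleeps here for a transfer */
	pthread_cond_t	drain_cv;	/* detach sleeps here for readers */
	int		dying;		/* set once detach has started */
	int		refcnt;		/* readers currently inside dev_read */
	int		freed;		/* stands in for freeing the state */
};

/* Blocking read: returns EIO if the device goes away while sleeping. */
int
dev_read(struct dev *d)
{
	pthread_mutex_lock(&d->mtx);
	if (d->dying) {
		pthread_mutex_unlock(&d->mtx);
		return EIO;
	}
	d->refcnt++;
	while (!d->dying)		/* no data ever arrives in this sketch */
		pthread_cond_wait(&d->xfer_cv, &d->mtx);
	if (--d->refcnt == 0)
		pthread_cond_signal(&d->drain_cv);
	pthread_mutex_unlock(&d->mtx);
	return EIO;
}

/* Detach: wake sleepers, then wait until none of them still uses d. */
void
dev_detach(struct dev *d)
{
	pthread_mutex_lock(&d->mtx);
	d->dying = 1;
	pthread_cond_broadcast(&d->xfer_cv);
	while (d->refcnt > 0)
		pthread_cond_wait(&d->drain_cv, &d->mtx);
	d->freed = 1;			/* only now would state be released */
	pthread_mutex_unlock(&d->mtx);
}

static void *
reader(void *arg)
{
	return (void *)(intptr_t)dev_read(arg);
}

/* Start one reader, detach underneath it, return the reader's error. */
int
detach_while_reading(struct dev *d)
{
	pthread_t t;
	void *ret;

	pthread_mutex_init(&d->mtx, NULL);
	pthread_cond_init(&d->xfer_cv, NULL);
	pthread_cond_init(&d->drain_cv, NULL);
	d->dying = d->refcnt = d->freed = 0;
	pthread_create(&t, NULL, reader, d);
	dev_detach(d);
	pthread_join(t, &ret);
	return (int)(intptr_t)ret;
}
```

Calling detach_while_reading() on an uninitialized struct dev should return
EIO whichever thread runs first. The sketch does not cover a reader that
arrives after detach has finished and the state is really gone.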



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-31 Thread Tom Murphy
Hi Martin,

On Thu, Aug 30, 2018 at 03:38:25PM -0300, Martin Pieuchot wrote:
> On 30/08/18(Thu) 18:15, Tom Murphy wrote:
> > On Thu, Aug 30, 2018 at 10:30:04AM -0300, Martin Pieuchot wrote:
> > > On 30/08/18(Thu) 14:00, Tom Murphy wrote:
> > > > On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote:
> > > > > On 28/08/18(Tue) 22:22, Tom Murphy wrote:
> > > > > > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote:
> > > > > > > Hello Tom,
> > > > > > > 
> > > > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote:
> > > > > > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote:
> > > > > > > > > On 2018-08-25 21:40:57, Tom Murphy  wrote:
> > > > > > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
> > > > > > > > > > >  I've narrowed it down. 
> > > > > > > > > > >
> > > > > > > > > > >Last kernel where adb works:  June 24 09:59:46 MDT 2018
> > > > > > > > > > >1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
> > > > > > > 
> > > > > > > The real problem is in the xhci(4) driver.  When a command with a
> > > > > > > timeout is submitted we should ensure no other command is enqueued
> > > > > > > before continuing.  Sadly the driver did not include any mechanism
> > > > > > > to serialize command submissions.  Diff below does that and should
> > > > > > > fix your problem.
> > > > > > > 
> > > > > > > Can you try it on top of -current?  Make sure you have no diff
> > > > > > > reverted.
> > > > > > 
> > > > > > Hi,
> > > > > > 
> > > > > >   I think I spoke a little too soon. I found a case where it
> > > > > > started printing xhci0: command timeout over and over until
> > > > > > eventually the kernel panics with a protection fault. I couldn't
> > > > > > catch the backtrace properly, but it looked around the same area
> > > > > > as this original bug report.
> > > > > 
> > > > > Without backtrace I can't make progress.
> > > > 
> > > > Apologies for the delay. Just found time to reproduce this. Here's
> > > > a backtrace:
> > > 
> > > Almost, can you send the full dmesg with the backtrace at the end?
> > 
> > Hi, Sorry, here's the dmesg with the backtrace.
> 
> Is it the live dmesg?  I don't see any 'xhci0: command timeout'.  Btw
> this message doesn't exist so I can't understand which code path is
> triggering the problem.  Could you build a kernel with XHCI_DEBUG
> enabled, reproduce the page fault and send the dmesg (at least the last
> 10 lines before crashing) + the trace?

Here's the dmesg from a XHCI_DEBUG kernel before it crashes (it appears
to loop quite a few times before the kernel protection fault kicks in.)

xhci0: timeout aborting transfer
xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED 
actlen=0 len=24 idx=0
xhci0: xhci_cmd_stop_ep dev 3 dci 3
xhci0: event error code=19, result=33
trb=0x800031d7bc58 (0x684400b0 0x1300 0x3008401)
xhci0: error stopping endpoint
xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3
xhci0: timeout aborting transfer
xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED 
actlen=0 len=24 idx=0
xhci0: xhci_cmd_stop_ep dev 3 dci 3
xhci0: event error code=19, result=33
trb=0x800031d7bc58 (0x684400d0 0x1300 0x3008401)
xhci0: error stopping endpoint
xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3
xhci0: timeout aborting transfer
xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED 
actlen=0 len=24 idx=0
xhci0: xhci_cmd_stop_ep dev 3 dci 3
xhci0: event error code=19, result=33
trb=0x800031d7bc58 (0x6844 0x1300 0x3008401)
xhci0: error stopping endpoint
xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3
xhci0: timeout aborting transfer
xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED 
actlen=0 len=24 idx=0
xhci0: xhci_cmd_stop_ep dev 3 dci 3
xhci0: event error code=19, result=33
trb=0x800031d7bc58 (0x68440020 0x1300 0x3008401)
xhci0: error stopping endpoint
xhci0: xhci_cmd_set_tr_deq_async dev 3 dci 3
xhci0: timeout aborting transfer
xhci_abort_xfer: xfer=0xff047d8afe10 status=IN_PROGRESS err=CANCELLED 
actlen=0 len=24 idx=0
xhci0: xhci_cmd_stop_ep dev 3 dci 3
xhci0: event error code=19, result=33

Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-30 Thread Tom Murphy
On Thu, Aug 30, 2018 at 10:30:04AM -0300, Martin Pieuchot wrote:
> On 30/08/18(Thu) 14:00, Tom Murphy wrote:
> > Hi Martin,
> > 
> > On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote:
> > > On 28/08/18(Tue) 22:22, Tom Murphy wrote:
> > > > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote:
> > > > > Hello Tom,
> > > > > 
> > > > > On 28/08/18(Tue) 11:10, Tom Murphy wrote:
> > > > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote:
> > > > > > > On 2018-08-25 21:40:57, Tom Murphy  wrote:
> > > > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
> > > > > > > > >  I've narrowed it down. 
> > > > > > > > >
> > > > > > > > >Last kernel where adb works:  June 24 09:59:46 MDT 2018
> > > > > > > > >1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
> > > > > 
> > > > > The real problem is in the xhci(4) driver.  When a command with a
> > > > > timeout is submitted we should ensure no other command is enqueued
> > > > > before continuing.  Sadly the driver did not include any mechanism
> > > > > to serialize command submissions.  Diff below does that and should
> > > > > fix your problem.
> > > > > 
> > > > > Can you try it on top of -current?  Make sure you have no diff
> > > > > reverted.
> > > > 
> > > > Hi,
> > > > 
> > > >   I think I spoke a little too soon. I found a case where it
> > > > started printing xhci0: command timeout over and over until
> > > > eventually the kernel panics with a protection fault. I couldn't
> > > > catch the backtrace properly, but it looked around the same area
> > > > as this original bug report.
> > > 
> > > Without backtrace I can't make progress.
> > 
> > Apologies for the delay. Just found time to reproduce this. Here's
> > a backtrace:
> 
> Almost, can you send the full dmesg with the backtrace at the end?

Hi, Sorry, here's the dmesg with the backtrace.

OpenBSD 6.4-beta (GENERIC.MP) #1: Tue Aug 28 21:00:29 BST 2018
tom@freya.pertho.local:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17040445440 (16251MB)
avail mem = 16514805760 (15749MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries)
bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017
bios0: PC Specialist LTD N13xWU
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT 
DBGP DBG2 DMAR BGRT ASF! WSMT
acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) 
PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) 
PXSX(S4) RP24(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.75 MHz, 06-8e-0a
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.07 MHz, 06-8e-0a
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,S

Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-30 Thread Tom Murphy
Hi Martin,

On Wed, Aug 29, 2018 at 10:44:51AM -0300, Martin Pieuchot wrote:
> On 28/08/18(Tue) 22:22, Tom Murphy wrote:
> > On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote:
> > > Hello Tom,
> > > 
> > > On 28/08/18(Tue) 11:10, Tom Murphy wrote:
> > > > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote:
> > > > > On 2018-08-25 21:40:57, Tom Murphy  wrote:
> > > > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
> > > > > > >  I've narrowed it down. 
> > > > > > >
> > > > > > >Last kernel where adb works:  June 24 09:59:46 MDT 2018
> > > > > > >1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
> > > 
> > > The real problem is in the xhci(4) driver.  When a command with a
> > > timeout is submitted we should ensure no other command is enqueued
> > > before continuing.  Sadly the driver did not include any mechanism
> > > to serialize command submissions.  Diff below does that and should
> > > fix your problem.
> > > 
> > > Can you try it on top of -current?  Make sure you have no diff
> > > reverted.
> > 
> > Hi,
> > 
> >   I think I spoke a little too soon. I found a case where it
> > started printing xhci0: command timeout over and over until
> > eventually the kernel panics with a protection fault. I couldn't
> > catch the backtrace properly, but it looked around the same area
> > as this original bug report.
> 
> Without backtrace I can't make progress.

Apologies for the delay. Just found time to reproduce this. Here's
a backtrace:

kernel: protection fault trap, code=0
Stopped at xhci_abort_xfer+0x57:  cmpb  $0,0x471(%r14)
ddb{1}> bt
xhci_abort_xfer(9cbfd0450eaf1f9c,4) at xhci_abort_xfer+0x57
usbd_transfer(b41ac578436f71d1) at usbd_transfer+0x24d
ugen_do_read(7fd9672719de8ccd,800031d0,8153,ff047e7b2ba0) at ugen_do_read+0x347
ugenread(d30250dfbb30abef,800031d0,800031d66560) at ugenread+0x47
spec_read(ae30accb64c92cb4) at spec_read+0xab
VOP_READ(1be36585d83bbddf,2d04c5610d671db7,ff047e7b2ba0,ff04) at VOP_READ+0x49
vn_read(f8ef5df9c7a62c48,ff03ff215788,10) at vn_read+0xf5
dofilereadv(1dd31abb163e61f3,30,8000fffea028,3,800031d66790) at dofilereadv+0xe0
sys_read(cc3368a684a7a8f,2d04c5610d671db7,18) at sys_read+0x5c
syscall(eec44dfe2a55ecc1) at syscall+0x32a
Xsyscall(6,3,2b360799d50,3,1,2b3d0a18e00) at Xsyscall+0x128
end of kernel
end trace frame: 0x2b39d0546f0, count: -11

I managed to trigger it by disconnecting the phone, reconnecting it,
then running adb kill-server to re-run adb start-server to see if it
picked up on the phone and the panic happened right after
'adb kill-server'.

Please let me know if there's anything more you'd like me to test.

Thanks,
Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-28 Thread Tom Murphy
On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote:
> Hello Tom,
> 
> On 28/08/18(Tue) 11:10, Tom Murphy wrote:
> > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote:
> > > On 2018-08-25 21:40:57, Tom Murphy  wrote:
> > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
> > > > >  I've narrowed it down. 
> > > > >
> > > > >Last kernel where adb works:  June 24 09:59:46 MDT 2018
> > > > >1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
> 
> The real problem is in the xhci(4) driver.  When a command with a
> timeout is submitted we should ensure no other command is enqueued
> before continuing.  Sadly the driver did not include any mechanism
> to serialize command submissions.  Diff below does that and should
> fix your problem.
> 
> Can you try it on top of -current?  Make sure you have no diff
> reverted.

Hi,

  I think I spoke a little too soon. I found a case where it
started printing xhci0: command timeout over and over until
eventually the kernel panics with a protection fault. I couldn't
catch the backtrace properly, but it looked around the same area
as this original bug report.

Thanks,
Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-28 Thread Tom Murphy
Hi Martin,

On Tue, Aug 28, 2018 at 04:20:41PM -0300, Martin Pieuchot wrote:
> Hello Tom,
> 
> On 28/08/18(Tue) 11:10, Tom Murphy wrote:
> > On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote:
> > > On 2018-08-25 21:40:57, Tom Murphy  wrote:
> > > > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
> > > > >  I've narrowed it down. 
> > > > >
> > > > >Last kernel where adb works:  June 24 09:59:46 MDT 2018
> > > > >1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
> 
> The real problem is in the xhci(4) driver.  When a command with a
> timeout is submitted we should ensure no other command is enqueued
> before continuing.  Sadly the driver did not include any mechanism
> to serialize command submissions.  Diff below does that and should
> fix your problem.
> 
> Can you try it on top of -current?  Make sure you have no diff
> reverted.

I've just tested this and was able to send some files to the phone
with adb over USB without any issues.

The rwlock diff did the trick. Many thanks for this!
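
As an illustration of what "serialize command submissions" means in the quoted
explanation, here is a minimal userland sketch. It assumes a pthread mutex as a
stand-in for the kernel rwlock; struct cmd_ring, cmd_submit() and
run_submitters() are hypothetical names for illustration and are not taken from
xhci.c or from the diff that was tested.

```c
#include <pthread.h>

/* Hypothetical stand-in for a controller command ring. */
struct cmd_ring {
	pthread_mutex_t	lock;		/* serializes submissions */
	int		in_flight;	/* commands currently outstanding */
	int		max_in_flight;	/* highest value ever observed */
	int		completed;	/* total commands finished */
};

/* Submit one command and wait for it; only one caller at a time. */
static void
cmd_submit(struct cmd_ring *r)
{
	pthread_mutex_lock(&r->lock);
	r->in_flight++;
	if (r->in_flight > r->max_in_flight)
		r->max_in_flight = r->in_flight;
	/* A real driver would notify the hardware and sleep here. */
	r->in_flight--;
	r->completed++;
	pthread_mutex_unlock(&r->lock);
}

static void *
worker(void *arg)
{
	int i;

	for (i = 0; i < 1000; i++)
		cmd_submit(arg);
	return NULL;
}

/* Run up to 16 submitter threads; returns the peak number in flight. */
int
run_submitters(struct cmd_ring *r, int nthreads)
{
	pthread_t t[16];
	int i;

	if (nthreads > 16)
		nthreads = 16;
	pthread_mutex_init(&r->lock, NULL);
	r->in_flight = r->max_in_flight = r->completed = 0;
	for (i = 0; i < nthreads; i++)
		pthread_create(&t[i], NULL, worker, r);
	for (i = 0; i < nthreads; i++)
		pthread_join(t[i], NULL);
	pthread_mutex_destroy(&r->lock);
	return r->max_in_flight;
}
```

Because every submission holds the lock from enqueue to completion, the peak
number of commands in flight stays at one no matter how many threads submit.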

Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-28 Thread Tom Murphy
On Tue, Aug 28, 2018 at 02:49:38PM +0900, Bryan Linton wrote:
> [CCing visa@ in]
> 
> On 2018-08-25 21:40:57, Tom Murphy  wrote:
> > On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
> > >  I've narrowed it down. 
> > >
> > >Last kernel where adb works:  June 24 09:59:46 MDT 2018
> > >1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
> > >
> > > [...]
> > >
> > >  I'm going to look at the commits next.
> > >
> > >-Tom
> > 
> > I can verify that this commit is what makes the kernel panic when adb is
> > run and an Android device is connected to the machine with ADB enabled:
> > 
> > https://marc.info/?l=openbsd-cvs=152996258723362=2
> > 
> > CVSROOT:     /cvs
> > Module name: src
> > Changes by:  v...@cvs.openbsd.org 2018/06/25 10:06:27
> > 
> > Modified files:
> > sys/kern   : vfs_syscalls.c 
> > lib/libc/sys   : dup.2 
> > 
> > Log message:
> > During open(2), release the fdp lock before calling vn_open(9).
> > This lets other threads of the process modify the file descriptor
> > table even if the vn_open(9) call blocks.
> > 
> > The change has an effect on dup2(2) and dup3(2). If the new descriptor
> > is the same as the one reserved by an unfinished open(2), the system
> > call will fail with error EBUSY. The accept(2) system call already
> > behaves like this.
> > 
> > Issue pointed out by art@ via mpi@
> > 
> > Tested in a bulk build by ajacoutot@
> > OK mpi@
> > 
> > * * *
> > 
> > I tested kernels compiled just before that commit and right after, and that
> > commit makes the kernel panic.
> > 
> 
> I can also confirm that reverting this patch fixes the kernel
> panics when launching ADB for me as well.  I'm currently syncing
> my phone to my HDD as I type this.
> 
> I'm still building against kernel sources from here:
> OpenBSD 6.3-current (GENERIC.MP) #163: Mon Jul 30 12:45:31 MDT 2018
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
> So fair warning, my tree is still a little bit out of date (I'm
> planning on upgrading to a newer snap this weekend if I have the
> time) but, as stated above, I can at least confirm that reverting
> this patch fixes the panics for me.
> 
> -- 
> Bryan

Hi,

  I've checked out the src from -current and reverted the 25th June 2018 commit
(just src/sys/kern/vfs_syscalls.c, to test), and I'm now able to transfer files
to and from my Android phone over USB without any panics. This is the diff to
revert. I don't know what effect this will have on the kernel overall, as I see
in the commit messages that there is some refactoring going on (if I read them
correctly).

Index: sys/kern/vfs_syscalls.c
===
RCS file: /cvs/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.304
diff -u -p -u -p -r1.304 vfs_syscalls.c
--- sys/kern/vfs_syscalls.c 20 Aug 2018 16:00:22 -  1.304
+++ sys/kern/vfs_syscalls.c 28 Aug 2018 10:07:56 -
@@ -1007,8 +1007,6 @@ doopenat(struct proc *p, int fd, const c
fdplock(fdp);
if ((error = falloc(p, , )) != 0)
goto out;
-   fdpunlock(fdp);
-
flags = FFLAGS(oflags);
if (flags & FREAD) {
ni_pledge |= PLEDGE_RPATH;
@@ -1035,7 +1033,6 @@ doopenat(struct proc *p, int fd, const c
flags &= ~O_TRUNC;  /* Must do truncate ourselves */
}
if ((error = vn_open(, flags, cmode)) != 0) {
-   fdplock(fdp);
if (error == ENODEV &&
p->p_dupfd >= 0 &&  /* XXX from fdopen */
(error =
@@ -1070,7 +1067,6 @@ doopenat(struct proc *p, int fd, const c
VOP_UNLOCK(vp);
error = VOP_ADVLOCK(vp, (caddr_t)fp, F_SETLK, , type);
if (error) {
-   fdplock(fdp);
/* closef will vn_close the file for us. */
fdremove(fdp, indx);
closef(fp, p);
@@ -1093,7 +1089,6 @@ doopenat(struct proc *p, int fd, const c
}
if (error) {
VOP_UNLOCK(vp);
-   fdplock(fdp);
/* closef will close the file for us. */
fdremove(fdp, indx);
closef(fp, p);
@@ -1102,7 +1097,6 @@ doopenat(struct proc *p, int fd, const c
}
VOP_UNLOCK(vp);
*retval = indx;
-   fdplock(fdp);
fdinsert(fdp, indx, cloexec, fp);
FRELE(fp, p);
 out:


Kind regards,
Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-25 Thread Tom Murphy
On Thu, Aug 23, 2018 at 08:45:54PM +0900, Tom Murphy wrote:
>  I've narrowed it down. 
>
>Last kernel where adb works:  June 24 09:59:46 MDT 2018
>1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
>
>  I did notice when my phone's booted into LineageOS and I have ADB
>turned on, when I connect the phone via USB I get:
>
>ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 8
>
>  However, I'm not able to actually connect to it with adb shell or
>anything else. It says: Error device offline or something.
>
>  When I boot the phone into recovery mode, the phone shows up like
>this when I plug it in:
>
>  ugen1 at uhub0 port 7 "Motorola Moto G LTE" rev 2.00/2.28 addr 4
>
>  (different name!) and I am able to use adb shell, adb push/pull, etc.
>
>  I think there's some issue with LineageOS' ADB mode, but that's not
>really relevant here (it's a separate issue and outside of OpenBSD
>perhaps though I'll have to test with Linux or some other OS.)
>
>  I'm going to look at the commits next.
>
>-Tom

I can verify that this commit is what makes the kernel panic when adb is
run and an Android device is connected to the machine with ADB enabled:

https://marc.info/?l=openbsd-cvs=152996258723362=2

CVSROOT:     /cvs
Module name: src
Changes by:  v...@cvs.openbsd.org 2018/06/25 10:06:27

Modified files:
sys/kern   : vfs_syscalls.c 
lib/libc/sys   : dup.2 

Log message:
During open(2), release the fdp lock before calling vn_open(9).
This lets other threads of the process modify the file descriptor
table even if the vn_open(9) call blocks.

The change has an effect on dup2(2) and dup3(2). If the new descriptor
is the same as the one reserved by an unfinished open(2), the system
call will fail with error EBUSY. The accept(2) system call already
behaves like this.

Issue pointed out by art@ via mpi@

Tested in a bulk build by ajacoutot@
OK mpi@

* * *

I tested kernels compiled just before that commit and right after, and that
commit makes the kernel panic.
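
The log message above says dup2(2) can now fail with EBUSY while an open(2) on
the same descriptor is unfinished. A minimal sketch of how a userland caller
might cope with that is shown here. The helper name dup2_retry() and its
max_tries parameter are hypothetical, and a bounded retry is only one possible
policy, not something the commit prescribes.

```c
#include <errno.h>
#include <sched.h>
#include <unistd.h>

/*
 * Duplicate oldfd onto newfd, retrying a bounded number of times while
 * the kernel reports EBUSY.  Returns newfd on success, or -1 with errno
 * set on failure.
 */
int
dup2_retry(int oldfd, int newfd, int max_tries)
{
	int i, r;

	for (i = 0; i < max_tries; i++) {
		r = dup2(oldfd, newfd);
		if (r != -1 || errno != EBUSY)
			return r;	/* success, or an unrelated error */
		sched_yield();		/* let the pending open(2) progress */
	}
	errno = EBUSY;			/* still reserved after all tries */
	return -1;
}
```

Errors other than EBUSY, such as EBADF for an invalid oldfd, are passed
straight back to the caller without retrying.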

-Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-25 Thread Tom Murphy
On Thu, Aug 23, 2018 at 08:45:54PM +0900, Bryan Linton wrote:
> So I found some time to try to bisect this, but was hampered by my
> phone being somewhat temperamental.
> 
> Everything up to July 3rd was fine.  No crashes occurred.
> 
> On a July 15th checkout, my system panicked when trying to run adb
> with my phone connected.
> 
> Unfortunately when I tried to bisect this further, my phone began
> refusing to connect to my computer.  I get a generic
>   "uhub0: device problem, disabling port 2"
> error and cannot get my phone to attach to my computer even if I
> reboot it, plug/unplug it, etc.
> 
> I'll see if I can try to bisect this further once I figure out
> what the problem is with my phone, but in the meantime, I wanted
> to at least update the bugs@ list with my findings so far.
> 
> I see a few potential commits in that time-frame that could be
> responsible, so I'm going to see if I can manage to narrow this
> down even further.
> 
> -- 
> Bryan

Hi Bryan,

  I've narrowed it down. 

Last kernel where adb works:  June 24 09:59:46 MDT 2018
1st Kernel where adb panics:  June 25 13:10:32 MDT 2018
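
Narrowing a regression to two adjacent builds like this is a binary search
over builds ordered by date. The sketch below assumes the first build is known
good, the last is known bad, and the failure is reproducible. The function
first_bad() and the predicate panics_from_build_25() are hypothetical; in
practice the predicate means "boot that kernel and run adb".

```c
#include <stddef.h>

/*
 * Return the index of the first bad build, given that builds are ordered
 * by date, build lo is known good and build hi is known bad.
 */
size_t
first_bad(size_t lo, size_t hi, int (*is_bad)(size_t))
{
	while (hi - lo > 1) {
		size_t mid = lo + (hi - lo) / 2;

		if (is_bad(mid))
			hi = mid;	/* regression is at or before mid */
		else
			lo = mid;	/* regression is after mid */
	}
	return hi;
}

/* Example predicate: pretend every build from number 25 on panics. */
int
panics_from_build_25(size_t build)
{
	return build >= 25;
}
```

With 30 builds this needs five tests instead of up to 29, which matters when
every test means compiling and booting a kernel.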

  I did notice when my phone's booted into LineageOS and I have ADB
turned on, when I connect the phone via USB I get:

ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 8

  However, I'm not able to actually connect to it with adb shell or
anything else. It says: Error device offline or something.

  When I boot the phone into recovery mode, the phone shows up like
this when I plug it in:

  ugen1 at uhub0 port 7 "Motorola Moto G LTE" rev 2.00/2.28 addr 4

  (different name!) and I am able to use adb shell, adb push/pull, etc.

  I think there's some issue with LineageOS' ADB mode, but that's not
really relevant here (it's a separate issue and outside of OpenBSD
perhaps though I'll have to test with Linux or some other OS.)

  I'm going to look at the commits next.

-Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-23 Thread Tom Murphy
On Thu, Aug 23, 2018 at 08:45:54PM +0900, Bryan Linton wrote:
> On 2018-08-21 17:00:50, Bryan Linton  wrote:
> > On 2018-08-18 23:02:24, Tom Murphy  wrote:
> > > >Synopsis:Kernel panic in xhci.c when Android phone with ADB
> > >
> > > [...]
> > >
> > > >How-To-Repeat:
> > >   I can reproduce this every time just by enabling ADB on the
> > > phone and connecting it to the OpenBSD machine.
> > > 
> > 
> > I've been seeing the same panics for about a month or two now.
> > 
> > I hadn't filed a bug report because I was hoping I'd be able to
> > find the time to bisect when the panics first started happening.
> > 
> 
> So I found some time to try to bisect this, but was hampered by my
> phone being somewhat temperamental.
> 
> Everything up to July 3rd was fine.  No crashes occurred.
> 
> On a July 15th checkout, my system panicked when trying to run adb
> with my phone connected.
> 
> Unfortunately when I tried to bisect this further, my phone began
> refusing to connect to my computer.  I get a generic
>   "uhub0: device problem, disabling port 2"
> error and cannot get my phone to attach to my computer even if I
> reboot it, plug/unplug it, etc.
> 
> I'll see if I can try to bisect this further once I figure out
> what the problem is with my phone, but in the meantime, I wanted
> to at least update the bugs@ list with my findings so far.
> 
> I see a few potential commits in that time-frame that could be
> responsible, so I'm going to see if I can manage to narrow this
> down even further.
> 
> -- 
> Bryan

Hi Bryan,

  Thanks for having a look at this! I've been up to my eyeballs in
work lately so haven't had a chance to play with it much more. I'll
have a look at the xhci stuff that's gone in and see what could be
doing it this weekend.

  -Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-20 Thread Tom Murphy
Hi,

  I managed to get XHCI_DEBUG working. Here are the messages when I
connect the Android device:

xhci0: xhci_cmd_configure_ep dev 1
xhci0: xhci_cmd_slot_control
xhci0: port=7 change=0x80
xhci0: port=7 change=0x80
xhci0: xhci_cmd_slot_control
xhci0: dev 6, input=0xff00687b9000 slot=0xff00687b9020
ep0=0xff00687b9040
xhci0: dev 6, setting DCBAA to 0x687ba000
xhci_pipe_init: pipe=0x81557000 addr=0 depth=1 port=7 speed=3
dev 6 dci 1 (epAddr=0x0)
xhci0: xhci_cmd_set_address BSR=1
xhci0: xhci_cmd_set_address BSR=0
xhci0: dev 6 addr 6
ugen1 at uhub0 port 7 "motorola XT1039" rev 2.00/2.28 addr 2
xhci0: short xfer 0xff047d8ad960 for 48
xhci0: short xfer 0xff047d8ad960 for 51

Then, when I start adb with 'adb wait-for-devices':

pxahnciic_:p ikpee_init: pipe=0x80c57000 addr=2 depth=1 port=7
speed=3 dev 6 dci 3 (epAddr=0x81)
xhci0: xhci_cmd_configure_ep dev 6

Then the system hangs (doesn't drop into ddb) requiring a hard reset.

When I was in ddb before, I tried 'show struct sc' but that didn't print
anything helpful.

Thanks,
Tom



Re: Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-18 Thread Tom Murphy
Hi,

  I would like to amend the above bug report. When reproducing I 
originally said it could be reproduced when the Android device was
just plugged in.

  After going over my notes, and doing more testing, I found it only
occurs when 'adb start-server' is running AND the device is plugged in
via USB.

  If adb is not started, plugging in the device registers it in the kernel
but does nothing. Starting adb then triggers the panic.

  I recompiled the kernel with option XHCI_DEBUG but could not see any extra
messages.

  Thanks,
  Tom



Plugging in ADB-enabled Android device makes kernel panic with xhci

2018-08-18 Thread Tom Murphy
>Synopsis:  Kernel panic in xhci.c when Android phone with ADB
>   connected
>Category:  kernel
>Environment:
System  : OpenBSD 6.4
Details : OpenBSD 6.4-beta (GENERIC.MP) #224: Fri Aug 17 23:42:30 
MDT 2018
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Plugging an Android phone into USB while it has the
Android Debug Bridge (ADB) enabled causes a kernel panic. No other
application needs to be running (not even devel/adb).


ddb{2}> show panic
kernel diagnostic assertion "timeout == 0 || sc->sc_cmd_trb == NULL"
failed file "/usr/src/sys/dev/usb/xhci.c", line 1643
ddb{2}> trace
db_enter() at db_enter+0x12
panic() at panic+0x120
__assert(817bfa54,800031e1a680,806ab000,800031e1a6e8)
at __assert+0x24
xhci_command_submit(893f1452e29dc378,806ab000,8178a068)
at xhci_command_submit+0x220
xhci_pipe_init(2517ab0354ba223c,8178a000) at
xhci_pipe_init+0x1b9
xhci_pipe_open(c71aa3f3774c08e5) at xhci_pipe_open+0x11b
usbd_setup_pipe(12ef9866b0b08d0e,6,3,38e55c0cf670c5f1) at
usbd_open_pipe+0xa1
ugenopen(fdb4584eb53365ed,800031e1a948,800031bc8968,15a8) at
ugenopen+0x1b4
spec_open(0d8608c87102661b) at spec_open+0xe1
VOP_OPEN(f6c6c903b309846,800031e1aad8,38e55c0cf670c5f1,800031bc8968)
at VOP_OPEN+0x58
vn_open(3e77a72b92ba5b4b,800031bc8968,1) at vn_open+0x28b
doopenat(6e2d8c17e8ec19d2,50,800031bc8968,5,800031e1ad70,426cbc483d8)
at doopenat+0x1d3
syscall(739d180e92889b4d) at syscall+0x32a
Xsyscall(6,5,42690c29c00,5,1,1) at Xsyscall+0x128
end of kernel
end of trace frame: 0x426cbc484c0, count: -15

There appear to be similar bugs reported here:
https://marc.info/?l=openbsd-bugs=147306540522232=2
https://marc.info/?l=openbsd-bugs=143456823814213=2

except I get the above panic on xhci, not ehci. The panic occurs as soon as
the phone is physically connected.

My machine only uses xhci. If I disable xhci(4) in the kernel, USB stops
working, so I cannot test with ehci(4) (there is also no BIOS option to
disable USB3)

>How-To-Repeat:
I can reproduce this every time just by enabling ADB on the
phone and connecting it to the OpenBSD machine.

dmesg:
OpenBSD 6.4-beta (GENERIC.MP) #224: Fri Aug 17 23:42:30 MDT 2018
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17040445440 (16251MB)
avail mem = 16514805760 (15749MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x7b288000 (41 entries)
bios0: vendor American Megatrends Inc. version "1.05.07" date 09/29/2017
bios0: PC Specialist LTD N13xWU
acpi0 at bios0: rev 2
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP APIC FPDT FIDT MCFG SSDT SSDT HPET UEFI SSDT SSDT SSDT 
DBGP DBG2 DMAR BGRT ASF! WSMT
acpi0: wakeup devices PXSX(S4) RP17(S4) PXSX(S4) RP18(S4) PXSX(S4) RP19(S4) 
PXSX(S4) RP20(S4) PXSX(S4) RP21(S4) PXSX(S4) RP22(S4) PXSX(S4) RP23(S4) 
PXSX(S4) RP24(S4) [...]
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.76 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 23MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.04 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,SGX,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz, 1696.04 MHz
cpu2: 

bgpd incorrectly validates updates when peer does not have 4 byte AS capability

2018-03-30 Thread Tom Beard
The commit below to bgpd/rde_attr.c introduced a check for AS0 in the  
AS-PATH received in an UPDATE message:



revision 1.98
date: 2017/05/26 20:55:30;  author: phessler;  state: Exp;  lines: +8  
-2;  commitid: p82SYNnm0pVdbRCn;

AS 0 is special and should be considered an error.

Drop the session if it shows during OPEN or CAPA, or mark as invalid if  
it is part of an Update.


required by RFC 7607


This works fine if the peer has 4-byte AS capability but fails if not.

aspath_verify() happens before aspath_inflate() fixes up the 2-byte AS
path.  As a result, when iterating over the path in aspath_verify(),
aspath_extract() returns incorrect AS number values.  Sometimes it
returns 0, causing aspath_verify() to drop the update with the message
"bad ASPATH, path invalidated and prefix withdrawn".


The patch below flips the order so that aspath_inflate() for a peer without  
4-byte capability happens before aspath_verify().  This also means there is  
no need to pass the 4-byte capability indicator to aspath_verify() as it’ll  
only ever be looking at a path containing 4 byte AS numbers.
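
To illustrate the ordering problem described above, here is a minimal
sketch (not bgpd code) of what happens when a path segment encoded with
2-byte AS numbers is read as if it held 4-byte AS numbers.  The helper
names encode_segment(), extract_4byte() and inflate() are made up for
this example, and the segment layout is simplified to a type byte, a
count byte and the AS numbers.  It shows wrong values being read; it
does not try to reproduce the exact case where a 0 comes back.

```python
import struct

AS_SEQUENCE = 2  # path segment type for an ordered list of AS numbers


def encode_segment(asns, as_size):
    """Encode one segment: type, count, then as_size-byte AS numbers."""
    fmt = "!H" if as_size == 2 else "!I"
    body = b"".join(struct.pack(fmt, asn) for asn in asns)
    return bytes([AS_SEQUENCE, len(asns)]) + body


def extract_4byte(segment, pos):
    """Read the pos-th AS number, always assuming 4-byte encoding."""
    return struct.unpack_from("!I", segment, 2 + 4 * pos)[0]


def inflate(segment):
    """Widen a 2-byte segment to 4-byte AS numbers (simplified stand-in)."""
    seg_type, count = segment[0], segment[1]
    asns = struct.unpack_from("!%dH" % count, segment, 2)
    return bytes([seg_type, count]) + b"".join(
        struct.pack("!I", asn) for asn in asns)


two_byte = encode_segment([64512, 64513], as_size=2)

# Verifying before inflating: two 2-byte AS numbers are fused into one value.
misread = extract_4byte(two_byte, 0)

# Inflating first: every AS number is read back correctly.
fixed = [extract_4byte(inflate(two_byte), i) for i in range(2)]

print(misread, fixed)
```

With the verify step run on the inflated path, the check only ever sees
4-byte AS numbers, which is the reason the capability flag can be dropped
from the verify call.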


Appreciate the work on bgpd. Hope this is of some use.

Index: rde.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.372
diff -u -p -r1.372 rde.c
--- rde.c   14 Sep 2017 18:16:28 -  1.372
+++ rde.c   30 Mar 2018 19:17:14 -
@@ -1465,7 +1465,12 @@ bad_flags:
case ATTR_ASPATH:
if (!CHECK_FLAGS(flags, ATTR_WELL_KNOWN, 0))
goto bad_flags;
-   error = aspath_verify(p, attr_len, rde_as4byte(peer));
+   if (rde_as4byte(peer)) {
+   npath = p;
+   nlen = attr_len;
+   } else
+   npath = aspath_inflate(p, attr_len, &nlen);
+   error = aspath_verify(npath, nlen);
if (error == AS_ERR_SOFT) {
/*
 * soft errors like unexpected segment types are
@@ -1482,11 +1487,6 @@ bad_flags:
}
if (a->flags & F_ATTR_ASPATH)
goto bad_list;
-   if (rde_as4byte(peer)) {
-   npath = p;
-   nlen = attr_len;
-   } else
-   npath = aspath_inflate(p, attr_len, &nlen);
a->flags |= F_ATTR_ASPATH;
a->aspath = aspath_get(npath, nlen);
if (npath != p)
@@ -1687,7 +1687,7 @@ bad_flags:
if (!CHECK_FLAGS(flags, ATTR_OPTIONAL|ATTR_TRANSITIVE,
ATTR_PARTIAL))
goto bad_flags;
-   if ((error = aspath_verify(p, attr_len, 1)) != 0) {
+   if ((error = aspath_verify(p, attr_len)) != 0) {
/*
 * XXX RFC does not specify how to handle errors.
 * XXX Instead of dropping the session because of a
Index: rde.h
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.h,v
retrieving revision 1.162
diff -u -p -r1.162 rde.h
--- rde.h   30 May 2017 18:08:15 -  1.162
+++ rde.h   30 Mar 2018 19:17:14 -
@@ -349,7 +349,7 @@ void attr_free(struct rde_aspath *, st
 #define attr_optlen(x) \
 ((x)->len > 255 ? (x)->len + 4 : (x)->len + 3)

-int aspath_verify(void *, u_int16_t, int);
+int aspath_verify(void *, u_int16_t);
 #define AS_ERR_LEN -1
 #define AS_ERR_TYPE-2
 #define AS_ERR_BAD -3
Index: rde_attr.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde_attr.c,v
retrieving revision 1.100
diff -u -p -r1.100 rde_attr.c
--- rde_attr.c  31 May 2017 10:44:00 -  1.100
+++ rde_attr.c  30 Mar 2018 19:17:14 -
@@ -421,10 +421,10 @@ SIPHASH_KEY astablekey;
[(x) & astable.hashmask]

 int
-aspath_verify(void *data, u_int16_t len, int as4byte)
+aspath_verify(void *data, u_int16_t len)
 {
u_int8_t*seg = data;
-   u_int16_tseg_size, as_size = 2;
+   u_int16_tseg_size;
u_int8_t seg_len, seg_type;
int  i, error = 0;

@@ -432,9 +432,6 @@ aspath_verify(void *data, u_int16_t len,
/* odd length aspath are invalid */
return (AS_ERR_BAD);

-   if (as4byte)
-   as_size = 4;
-
for (; len > 0; len -= seg_size, seg += seg_size) {
if (len < 2)/* header length check */
return (AS_ERR_BAD);
@@ -452,7 +449,7 @@ aspath_verify(void *data, u_int16_t len,
seg_type != AS_CONFED_SEQUENCE && seg_type != AS_CONFED_SET)
return (AS_ERR_TYPE);

-   seg_size = 2 + as_size * seg_len;
+   

/bin/ksh ksh(1) typeset -p output weirdness

2018-02-07 Thread tom
>Synopsis:  ksh(pd) `typeset -p` lists all variables as readonly
>Category:  user
>Environment:
System  : OpenBSD 6.2
Details : OpenBSD 6.2 (GENERIC.MP) #5: Fri Feb  2 23:02:19 CET 2018
 
r...@syspatch-62-amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
The ksh(1) man page describes the `-p` option as being equivalent
to the default of no parameters, but actually using the `-p` option
results in a very different output with the somewhat bizarre
behavior of prefixing each variable with "readonly "
>How-To-Repeat:
From within ksh issue the command `typeset -p`
>Fix:
Within the source code (src/bin/ksh/c_ksh.c), the offending lines
test the pflag variable and, if it is set, test the variable `flag`
instead of `vp->flag`. They further assume that a shell variable
can only be EXPORT or RDONLY.

The following patch will fix that behavior but it still won't be
consistent with the manual. Also, the default printout and the `-p`
format of ksh93 are the reverse of what the fix will do, though that
seems to be what the code intended to do anyway.

Possibly a better fix would be to remove the special handling of
the `-p` format.
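
As a rough model of the difference (a sketch only, not the ksh source),
the two functions below compare the prefix chosen when the flags passed
to the typeset command are tested with the prefix chosen when the
variable's own flags are tested.  The bit values for RDONLY and EXPORT
are hypothetical, and the function names are made up for this example.

```python
# Hypothetical bit values; the real ones are defined in the ksh headers.
RDONLY = 0x1
EXPORT = 0x2


def prefix_before(cmd_flag, var_flag):
    """Old logic: looks at the flags given to typeset, ignores the variable."""
    return "export " if cmd_flag & EXPORT else "readonly "


def prefix_after(cmd_flag, var_flag):
    """Patched logic as I read it: looks at the variable's own flags."""
    if var_flag & EXPORT:
        return "export "
    if var_flag & RDONLY:
        return "readonly "
    return ""


# Plain `typeset -p` passes no EXPORT flag, so every variable, even one
# with no attributes at all, used to be printed as readonly.
print(repr(prefix_before(0, 0)), repr(prefix_after(0, 0)))
```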


Index: c_ksh.c
===
RCS file: /home/tw/src/sys/openbsd-cvs/src/bin/ksh/c_ksh.c,v
retrieving revision 1.58
diff -u -p -r1.58 c_ksh.c
--- c_ksh.c 16 Jan 2018 22:52:32 -  1.58
+++ c_ksh.c 7 Feb 2018 23:09:06 -
@@ -790,10 +790,12 @@ c_typeset(char **wp)
if (vp->flag)
break;
} else {
-   if (pflag)
-   shprintf("%s ",
-   (flag & EXPORT) ?
-   "export" : 
"readonly");
+   if (pflag) {
+   if (vp->flag & EXPORT)
+   
shprintf("export ");
+   else if 
((vp->flag))
+   
shprintf("readonly ");
+   }
if ((vp->flag) && any_set)
shprintf("%s[%d]",
vp->name, 
vp->index);



Re: Openbsd 6.1 and Current Console Freezes and lockup Proxmox PVE5.0

2017-10-27 Thread Tom Smyth
Hello Theo, Mike, All,

@Theo Understood it is important to protect developers and the project goals
... @Mike Thanks for your Generosity in the time you took on this thread,
Yes I want Mike to make VMM more awesome :)  @Mike keep up the good work

I can't disagree with any point that Theo made in his email on this thread.
That said, unfortunately I can't always choose my hypervisor, and I dearly
want to run OpenBSD on Proxmox...

I do think (based on the fact that OpenBSD 6.0-6.2 works on PVE 4.4) that it
is probably a virtual hardware issue, not necessarily an OpenBSD issue.
I will raise this with the PVE support guys (as I have already done since
mid July).

Any further posts on this thread from me will hopefully be for the benefit
of other OpenBSD users (if I make progress), and are certainly not intended
as a request or a distraction for core OpenBSD developers.

All the Best,

Tom Smyth

On 27 October 2017 at 06:37, Theo de Raadt <dera...@openbsd.org> wrote:
> Tom,
>
> A virtual machine setup is an operating system running on an operating
> system on top of an operating system.
>
> OK, not quite.  The middle one, the VM itself, is a bit less
> complex than a full operating system as machine-independent code goes,
> but nevertheless the machine-dependent bat-shit-crazy stuff is far
> more complex with gobs of extremely messy nuances face it on both
> sides because x86 is a fucking minefield
>
> Everyone needs to adjust their expectation that all 3 layers are
> perfect, AND not assume that it is our layer doing the wrong thing
>
> Really the layers should simplify but the current marketplace is still
> gaining more value out of product differentiation than
> simplification+convergence, both sw and hw
>
> Even if our subsystem isn't doing something 'right', it is NOT the
> stated goal of OpenBSD to run well on every garbage VM, because it has
> become impossible for the little guy to be perfect.
>
> Concerted efforts to diagnose and improve these low-level issues uses
> the same crowd of people who are trying to improve other edges which
> may be more important.  do you want our vmm to work well?  or do you
> want us to work better on someone else's vmm?  Sorry, limited
> skillset, pick what you want mlarkin to focus on!  But that is unfair,
> and even if he listened to your wishlist, UNPRODUCTIVE.
>
> Where does this go?  Get ready for monopolies in everything, or
> oligopolies at best... or fight their establishment.
>
>> Just to say the gaps in ping response seem to get worse as the uptime
>> increases, ie
>> with the uptime around 5 minutes the gaps between ping results are around
>> 1 sec (what I consider normal)
>> with the uptime around 2 hrs 45 minutes the gaps between ping results are
>> 13 sec
>> with the uptime 8 hrs 30 minutes the gaps between ping results are
>> 35 seconds
>>
>> Output of sysctl kern.timecounter below
>>
>> kern.timecounter.tick=1
>> kern.timecounter.timestepwarnings=0
>> kern.timecounter.hardware=acpihpet0
>> kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000)
>> dummy(-1000000)
>>
>> I will change the ACPI  now to i8254  and report back later on
>> Thanks
>>
>>
>> On 26 October 2017 at 20:25, Mike Belopuhov <m...@belopuhov.com> wrote:
>> > On Thu, Oct 26, 2017 at 19:05 +0100, Tom Smyth wrote:
>> >> Lads,
>> >>
>> >> Im pleased to say that my testing of OpenBSD 6.1  and OpenBSD 6.2
>> >> Release
>> >> amd64 ,
>> >> appear to work  a little better  in Proxmox PVE5.1 as released this week,
>> >>
>> >> I used iso version 5.1-722cc488-1 from Proxmox
>> >> Updated on 24 October 2017
>> >>
>> >> The Console no longer freezes but after a few hours
>> >> the console (vga console accessed via Proxmox webinterface seems
>> >> to lag a little
>> >> the interval between pings for instance takes up to 13 seconds, which
>> >> is a bit strange...  ie it takes 13 seconds for each line of Ping result
>> >> which is u
>> >> Ill report more feedback later, but at least OpenBSD is not freezing
>> >> as bad in this
>> >> version of Proxmox PVE 5.1
>> >>
>> >
>> > Hi,
>> >
>> > Can you please show us the output of "sysctl kern.timecounter".
>> > If you're currently using an acpihpet0, can you please try
>> > switching to the acpitimer0 (and if that doesn't help, i8254) via
>> >
>> >  sysctl kern.timecounter.hardware=acpitimer0
>> >
>> > and attempt to reproduce the 13 second delay.
>> >
>> > Regards,
>> > Mike
>>
>



Re: Openbsd 6.1 and Current Console Freezes and lockup Proxmox PVE5.0

2017-10-26 Thread Tom Smyth
Hello Mike
just to follow up

The issue still seems to occur with kern.timecounter.hardware
set to i8254:
sysctl kern.timecounter
kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=i8254
kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000)
dummy(-1000000)
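
For reading that output: as far as I understand it, the number in
parentheses after each name in kern.timecounter.choice is the quality
of that timecounter, and higher-quality counters are preferred.  The
small sketch below parses such a line and lists the highest-quality
candidates.  The function name parse_choices() is made up for this
example, and the sample string is the choice list as it appears in the
earlier output on this thread.

```python
import re


def parse_choices(line):
    """Return {name: quality} from a kern.timecounter.choice value."""
    return {name: int(quality)
            for name, quality in re.findall(r"(\w+)\((-?\d+)\)", line)}


choice = "i8254(0) acpihpet0(1000) acpitimer0(1000) dummy(-1000000)"
qualities = parse_choices(choice)

# Names sharing the highest quality value.
best = max(qualities.values())
candidates = sorted(n for n, q in qualities.items() if q == best)

print(qualities["i8254"], candidates)
```

On this list i8254 has a lower quality than acpihpet0 and acpitimer0, so
selecting it by hand is a diagnostic step rather than the default choice.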

When I ping after boot there is the normal 1 second interval
between ping result lines.
However, after 25 minutes of runtime there is an interval of about
4 seconds between the ping result lines.

 Thanks

Tom Smyth

On 27 October 2017 at 03:51, Tom Smyth <tom.sm...@wirelessconnect.eu> wrote:
> Hi Mike
>
> Just to say the gaps in ping response seem to get worse as the uptime
> increases, ie
> with the uptime around 5 minutes the gaps between ping results are around
> 1 sec (what I consider normal)
> with the uptime around 2 hrs 45 minutes the gaps between ping results are
> 13 sec
> with the uptime 8 hrs 30 minutes the gaps between ping results are
> 35 seconds
>
> Output of sysctl kern.timecounter below
>
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=acpihpet0
> kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000)
> dummy(-1000000)
>
> I will change the ACPI  now to i8254  and report back later on
> Thanks
>
>
> On 26 October 2017 at 20:25, Mike Belopuhov <m...@belopuhov.com> wrote:
>> On Thu, Oct 26, 2017 at 19:05 +0100, Tom Smyth wrote:
>>> Lads,
>>>
>>> Im pleased to say that my testing of OpenBSD 6.1  and OpenBSD 6.2
>>> Release
>>> amd64 ,
>>> appear to work  a little better  in Proxmox PVE5.1 as released this week,
>>>
>>> I used iso version 5.1-722cc488-1 from Proxmox
>>> Updated on 24 October 2017
>>>
>>> The Console no longer freezes but after a few hours
>>> the console (vga console accessed via Proxmox webinterface seems
>>> to lag a little
>>> the interval between pings for instance takes up to 13 seconds, which
>>> is a bit strange...  ie it takes 13 seconds for each line of Ping result
>>> which is u
>>> Ill report more feedback later, but at least OpenBSD is not freezing
>>> as bad in this
>>> version of Proxmox PVE 5.1
>>>
>>
>> Hi,
>>
>> Can you please show us the output of "sysctl kern.timecounter".
>> If you're currently using an acpihpet0, can you please try
>> switching to the acpitimer0 (and if that doesn't help, i8254) via
>>
>>  sysctl kern.timecounter.hardware=acpitimer0
>>
>> and attempt to reproduce the 13 second delay.
>>
>> Regards,
>> Mike



-- 
Kindest regards,
Tom Smyth




Re: Openbsd 6.1 and Current Console Freezes and lockup Proxmox PVE5.0

2017-10-26 Thread Tom Smyth
Hi Mike

Just to say, the gaps in ping response seem to get worse as the uptime increases,
ie
with the uptime around 5 minutes the gaps between ping results are around 1 sec
(what I consider normal)
with the uptime around 2 hrs 45 minutes the gaps between ping results are 13 sec
with the uptime around 8 hrs 30 minutes the gaps between ping results are 35 sec
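
A quick back-of-the-envelope check of those three readings (a sketch
using only the numbers above; the variable and function names are made
up for this example) suggests the gap grows by roughly four seconds per
hour of uptime:

```python
# (uptime in minutes, seconds between ping result lines), as reported above
samples = [(5, 1), (165, 13), (510, 35)]


def growth_per_hour(first, second):
    """Increase in the ping gap, in seconds per hour of uptime."""
    (t0, gap0), (t1, gap1) = first, second
    return (gap1 - gap0) / ((t1 - t0) / 60)


rates = [growth_per_hour(samples[i], samples[i + 1])
         for i in range(len(samples) - 1)]

print([round(rate, 2) for rate in rates])
```

Three readings are too few to say whether the growth is really linear,
but they are consistent with a delay that accumulates steadily rather
than appearing at random.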

Output of sysctl kern.timecounter below

kern.timecounter.tick=1
kern.timecounter.timestepwarnings=0
kern.timecounter.hardware=acpihpet0
kern.timecounter.choice=i8254(0) acpihpet0(1000) acpitimer0(1000)
dummy(-1000000)

I will switch the timecounter to i8254 now and report back later on
Thanks


On 26 October 2017 at 20:25, Mike Belopuhov <m...@belopuhov.com> wrote:
> On Thu, Oct 26, 2017 at 19:05 +0100, Tom Smyth wrote:
>> Lads,
>>
>> Im pleased to say that my testing of OpenBSD 6.1  and OpenBSD 6.2
>> Release
>> amd64 ,
>> appear to work  a little better  in Proxmox PVE5.1 as released this week,
>>
>> I used iso version 5.1-722cc488-1 from Proxmox
>> Updated on 24 October 2017
>>
>> The Console no longer freezes but after a few hours
>> the console (vga console accessed via Proxmox webinterface seems
>> to lag a little
>> the interval between pings for instance takes up to 13 seconds, which
>> is a bit strange...  ie it takes 13 seconds for each line of Ping result
>> which is u
>> Ill report more feedback later, but at least OpenBSD is not freezing
>> as bad in this
>> version of Proxmox PVE 5.1
>>
>
> Hi,
>
> Can you please show us the output of "sysctl kern.timecounter".
> If you're currently using an acpihpet0, can you please try
> switching to the acpitimer0 (and if that doesn't help, i8254) via
>
>  sysctl kern.timecounter.hardware=acpitimer0
>
> and attempt to reproduce the 13 second delay.
>
> Regards,
> Mike



Re: Openbsd 6.1 and Current Console Freezes and lockup Proxmox PVE5.0

2017-10-26 Thread Tom Smyth
B revision 1.0
uhub0 at usb0 configuration 1 interface 0 "Intel UHCI root hub" rev
1.00/1.00 addr 1
uhidev0 at uhub0 port 1 configuration 1 interface 0 "QEMU QEMU USB
Tablet" rev 2.00/0.00 addr 2
uhidev0: iclass 3/0
ums0 at uhidev0: 3 buttons, Z dir
wsmouse1 at ums0 mux 0
vscsi0 at root
scsibus3 at vscsi0: 256 targets
softraid0 at root
scsibus4 at softraid0: 256 targets
root on sd0a (9ce3c4cfff12a7d0.a) swap on sd0b dump on sd0b
fd0 at fdc0 drive 1: density unknown





I hope this helps...  I will do more extensive testing
but I got much further with this




no hangs on the Console window seen so far

On 8 October 2017 at 15:55, Tom Smyth <tom.sm...@wirelessconnect.eu> wrote:
> Hello,
>
> I have had this issue in Proxmox 5.0 in all releases
> (beta 1, beta 2, the July 4th release of Proxmox 5 and the
> update in August) on Intel systems with CPUs newer than the X5460.
> Curiously it worked fine as far as I could tell on AMD systems
> (Opteron Gen 2 / Gen 3 systems).
>
> I have posted in Bugs,
> https://marc.info/?l=openbsd-bugs=150097397016837=2
>
> To be fair to OpenBSD it wasn't a bug in 6.1 (as Proxmox 5.0 was not
> released when 6.1 was released);
> however Current 6.1+ didn't work either. I had opened a ticket with
> Proxmox and worked on it for about a month
> and they couldn't repeat it, which is weird
> because it was just too easy for me to crash OpenBSD on Proxmox 5.0.
> Other operating systems running on Proxmox 5.0 seem to be unaffected
>
> so this issue will only become a bug when OpenBSD 6.2 is released :)
>
> OpenBSD 6.1 & 6.0 work fine in proxmox PVE 4.4 on the same hardware
>
> I hope this helps
>
> Tom Smyth



Re: Console freeze OpenBSD 6.1 release and curent on proxmox ve 5.0

2017-07-25 Thread Tom Smyth
Hi Lads,

Just to update you,

OpenBSD 6.1 and Current run just fine as a guest on Proxmox VE 4.4.

OpenBSD 6.1 and Current fail as a guest on Proxmox VE 5.0 on
Intel platforms such as
Ivy Bridge (Xeon E5 2660 V2)
Xeon X5650 based systems

OpenBSD 6.1 and Current seem to work as a guest on Proxmox VE 5.0
on AMD Opteron G2 processors
and on Intel X5550 Xeon processors

The issue, when it occurs, seems to occur regardless of the type of
VGA adapter.
I have tried Spice, QXL, Cirrus, VMware compatible VGA and Standard
VGA.

I'm happy to give access to a test Proxmox 5 infrastructure if
any dev has the time to take a look at the issue.
I have also opened a support case with Proxmox themselves regarding
this issue.

Thanks
Tom Smyth

On 19 July 2017 at 18:47, Tom Smyth <tom.sm...@wirelessconnect.eu> wrote:
> Hello
> Just an Update,
> Proxmox 5.0 running on AMD Opteron G2 2435 based systems
> is NOT affected by the bug.
>
> So the bug seems to only affect Intel systems (well,
> Ivy Bridge Xeon E5 2660-v2 or Xeon X5650 based systems).
>
>
> The OpenBSD 6.1 release and OpenBSD Current systems
> running on Proxmox VE 5.0 run fine without the Standard
> VGA display on Intel systems
> (ie they are operating on serial console only).
>
> I hope this helps



Console freeze OpenBSD 6.1 release and curent on proxmox ve 5.0

2017-07-19 Thread Tom Smyth
Hello
Just an update:
Proxmox 5.0 running on AMD Opteron G2 2435 based systems
is NOT affected by the bug.

So the bug seems to only affect Intel systems (well,
Ivy Bridge Xeon E5 2660-v2 or Xeon X5650 based systems).


The OpenBSD 6.1 release and OpenBSD Current systems
running on Proxmox VE 5.0 run fine without the Standard
VGA display on Intel systems
(ie they are operating on serial console only).

I hope this helps



Console freeze OpenBSD 6.1 release and curent on proxmox ve 5.0

2017-07-18 Thread Tom . smyth
>Synopsis:  VGA Console Freeze on proxmox pve 5.0
>Category:  Operating system Freeze 
>Environment:
System  : OpenBSD 6.1
Details : OpenBSD 6.1-current (GENERIC.MP) #98: Sun Jul 16 17:59:41 
MDT 2017
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
when running OpenBSD 6.1 release or 6.1 current for a few minutes
in Proxmox 5.0 using any emulated storage, network or CPU hardware,
if you hold keys down for a few seconds after about 5-10 minutes
of runtime the console will lock up, one core goes to 100%,
and any ssh sessions that were running are disconnected
>How-To-Repeat:
install OpenBSD on Proxmox VE 5.0
using host CPUs, 4 cores, 4GB RAM (any type of emulated hardware); other than that
use defaults for an OpenBSD headless install (all packages except for games and X
components)
reboot the OpenBSD box after install.
open up the web (VNC) console for the VM in Proxmox.
log into OpenBSD on the VGA console.
hold down any key on the console, Enter for example.
run a few arbitrary commands such as top and enter rubbish text.
hold down Enter for about 30 seconds to a minute.
the screen will freeze and any ssh sessions you had are disconnected.
>Fix:
problem occurs regardless of emulated hardware (storage or networking)
problem occurs on KVM default CPUs (4 cores), Ivy Bridge processors
or host processor emulation.
removing the VGA adapter and setting the "display" in the Proxmox VM to
serial port seems to solve the issue


dmesg:
OpenBSD 6.1-current (GENERIC.MP) #98: Sun Jul 16 17:59:41 MDT 2017
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 4278042624 (4079MB)
avail mem = 4142596096 (3950MB)
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0xf68e0 (10 entries)
bios0: vendor SeaBIOS version "rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org" 
date 04/01/2014
bios0: QEMU Standard PC (i440FX + PIIX, 1996)
acpi0 at bios0: rev 0
acpi0: sleep states S3 S4 S5
acpi0: tables DSDT FACP APIC HPET SRAT
acpi0: wakeup devices
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 443.00 MHz
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,SMEP,ERMS,ARAT
cpu0: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu0: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 1000MHz
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 625.20 MHz
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,SMEP,ERMS,ARAT
cpu1: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu1: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 618.69 MHz
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,SMEP,ERMS,ARAT
cpu2: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu2: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Xeon(R) CPU E5-2660 v2 @ 2.20GHz, 618.47 MHz
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,SS,HTT,SSE3,PCLMUL,SSSE3,CX16,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,OSXSAVE,AVX,F16C,RDRAND,HV,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,FSGSBASE,SMEP,ERMS,ARAT
cpu3: 64KB 64b/line 2-way I-cache, 64KB 64b/line 2-way D-cache, 512KB 64b/line 
16-way L2 cache
cpu3: ITLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: DTLB 255 4KB entries direct-mapped, 255 4MB entries direct-mapped
cpu3: smt 0, 

Re: Bugs in Relayd demote option

2012-01-13 Thread Tom Knienieder
Exactly ...

Am 13.01.2012 um 16:02 schrieb Peter Hessler:

 I would really like the feature to be available, but I agree that the
 current implementation causes problems.

 The correct behaviour is +1/-1, of course.
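
 To make the difference concrete, here is a toy model (not relayd or
 carp code) of a demotion counter shared by several daemons.  The run()
 helper and the lambdas are made up for this example.  Another daemon
 demotes the group first, then relayd shuts down and starts again, once
 with reset semantics (0 on startup, 128 on shutdown) and once with
 +1/-1 semantics.

```python
def run(relayd_start, relayd_stop):
    """Return the shared counter after a relayd stop/start cycle."""
    counter = 0
    counter += 1                      # another daemon demotes the group
    counter = relayd_stop(counter)    # relayd shuts down
    counter = relayd_start(counter)   # relayd starts again
    return counter


# Reset semantics: startup forces 0, shutdown forces 128.
reset_result = run(relayd_start=lambda c: 0, relayd_stop=lambda c: 128)

# Relative semantics: shutdown adds 1, startup takes that 1 back.
delta_result = run(relayd_start=lambda c: c - 1, relayd_stop=lambda c: c + 1)

print(reset_result, delta_result)
```

 With reset semantics the other daemon's demotion is wiped out when
 relayd starts; with +1/-1 it survives the restart.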


 On 2012 Jan 13 (Fri) at 15:51:54 +0100 (+0100), Pierre-Yves Ritschard
wrote:
 :This would make sense to me
 :
 :On Sat, Jan 14, 2012 at 3:33 PM, Camiel Dobbelaar c...@sentia.nl wrote:
 :
 :
 : On Wed, 11 Jan 2012, Henning Brauer wrote:
 :
 : * Camiel Dobbelaar c...@sentia.nl [2012-01-11 19:35]:
 :  Maybe we should take the global demote option out then, I don't think
 :  there is a way that relayd can tell that the demote counter was raised
 :  by a previous relayd.  (besides picking some magic value)
 : 
 :  Here's the manpage description from relayd.conf:
 : 
 :demote group
 :Enable the global carp(4) demotion option, resetting the carp
 :demotion counter for the specified interface group to zero on
 :startup and to 128 on shutdown of the daemon.  For more
 :information on interface groups, see the group keyword in
 :ifconfig(8).
 :
 : ugh. that is pretty damn wrong.
 :
 :
 : How about removing it then?
 :
 :
 : Index: parse.y
 : ===
 : RCS file: /cvs/src/usr.sbin/relayd/parse.y,v
 : retrieving revision 1.159
 : diff -u -p -u -r1.159 parse.y
 : --- parse.y 21 Sep 2011 18:45:40 -  1.159
 : +++ parse.y 13 Jan 2012 14:31:05 -
 : @@ -365,24 +365,6 @@ main   : INTERVAL NUMBER   {
 :}
 :conf->sc_prefork_relay = $2;
 :}
 : -   | DEMOTE STRING {
 : -   if (loadcfg)
 : -   break;
 : -   conf->sc_flags |= F_DEMOTE;
 : -   if (strlcpy(conf->sc_demote_group, $2,
 : -   sizeof(conf->sc_demote_group))
 : -   >= sizeof(conf->sc_demote_group)) {
 : -   yyerror("yyparse: demote group name too long");
 : -   free($2);
 : -   YYERROR;
 : -   }
 : -   free($2);
 : -   if (carp_demote_init(conf->sc_demote_group, 1) == -1) {
 : -   yyerror("yyparse: error initializing group %s",
 : -   conf->sc_demote_group);
 : -   YYERROR;
 : -   }
 : -   }
 :| SEND TRAP {
 :if (loadcfg)
 :break;
 : Index: relayd.c
 : ===
 : RCS file: /cvs/src/usr.sbin/relayd/relayd.c,v
 : retrieving revision 1.104
 : diff -u -p -u -r1.104 relayd.c
 : --- relayd.c 4 Sep 2011 20:26:58 -   1.104
 : +++ relayd.c 13 Jan 2012 14:31:05 -
 : @@ -361,8 +361,6 @@ parent_shutdown(struct relayd *env)
 :         proc_kill(env->sc_ps);
 :         control_cleanup(&env->sc_ps->ps_csock);
 :         carp_demote_shutdown();
 : -       if (env->sc_flags & F_DEMOTE)
 : -               carp_demote_reset(env->sc_demote_group, 128);
 :
 :         free(env->sc_ps);
 :         free(env);
 : Index: relayd.conf.5
 : ===
 : RCS file: /cvs/src/usr.sbin/relayd/relayd.conf.5,v
 : retrieving revision 1.124
 : diff -u -p -u -r1.124 relayd.conf.5
 : --- relayd.conf.5   24 Jun 2011 14:42:36 -  1.124
 : +++ relayd.conf.5   13 Jan 2012 14:31:05 -
 : @@ -115,17 +115,6 @@ table \*(Ltwebhosts\*(Gt {
 :  .Sh GLOBAL CONFIGURATION
 :  Here are the settings that can be set globally:
 :  .Bl -tag -width Ds
 : -.It Ic demote Ar group
 : -Enable the global
 : -.Xr carp 4
 : -demotion option, resetting the carp demotion counter for the
 : -specified interface group to zero on startup and to 128 on shutdown of
 : -the daemon.
 : -For more information on interface groups,
 : -see the
 : -.Ic group
 : -keyword in
 : -.Xr ifconfig 8 .
 :  .It Ic interval Ar number
 :  Set the interval in seconds at which the hosts will be checked.
 :  The default interval is 10 seconds.
 :

 --
 If everybody minded their own business, the world would go
 around a deal faster.
   -- The Duchess, Through the Looking Glass



Re: Bugs in Relayd demote option

2011-11-29 Thread Tom Knienieder
Same problem and requirement here; I'll be looking into this in the next few days.

Regards,
Tom

On 23.11.2011 at 09:09, aniyokshuffle wrote:

 Hi,

 In OpenBSD 5.0 the demote option of relayd.conf no longer works.

 Extract from relayd.conf:

 # Global Options
 demote carp
 timeout 3000
 prefork 15
 interval 5

 If I pkill relayd, the carp interface stays in MASTER.

 In OpenBSD 4.8 it works well.



Re: kernel/6561

2011-03-05 Thread Tom Murphy
I put in printfs and delays and found the system becomes unstable in
cpu_boot_secondary() in arch/amd64/amd64/cpu.c on line 458:

ci->ci_flags |= CPUF_GO; /* XXX atomic */

Whatever acpimadt(4) does with the processor and I/O APICs appears to
have a knock-on effect on that piece of code that makes the system
unstable. I know that disabling acpimadt(4) is not a solution, but
it appears to be the deciding factor in whether the machine can
start up the secondary processor.

If I were to put Debugger(); in before that line and perhaps print
ci->ci_flags, would that possibly give some clues?

Should I contact kettenis@? I know your time is valuable and would
rather not waste it if I can avoid it.

Tom