Re: dwqe ifconfig down panic

2024-03-27 Thread Patrick Wildt
On Fri, Mar 01, 2024 at 12:00:29AM +0100, Alexander Bluhm wrote:
> Hi,
> 
> When doing flood ping transmit from a machine and simultaneously
> ifconfig down/up in a loop, dwqe(4) interface driver crashes.
> 
> dwqe_down() contains an interrupt barrier, but somehow it does not
> work.  Immediately after Xspllower() a transmit interrupt is
> processed.
> 
> bluhm

Unfortunately I can't see it in the dmesg, but I wonder: Is it MSIs?
Maybe the edge-triggered interrupt stays in the controller because it
isn't cleared.  But things you could try are:

* Clear the IRQ status in addition to disabling them.  This might not
  do something in case the MSI is already in the IRQ, there are no
  takebacks.  But then maybe when the interrupt fires, the code path
  sees the cleared status and doesn't run the tx/rx proc.
* Don't run TX/RX proc in case the interface is down?

Cheers,
Patrick

> kernel: protection fault trap, code=0
> Stopped at  m_tag_delete_chain+0x30:movq0(%rsi),%rax
> 
> ddb{0}> trace
> m_tag_delete_chain(fd806bfa5300) at m_tag_delete_chain+0x30
> m_free(fd806bfa5300) at m_free+0x9e
> m_freem(fd806bfa5300) at m_freem+0x38
> dwqe_tx_proc(80304800) at dwqe_tx_proc+0x194
> dwqe_intr(80304800) at dwqe_intr+0x9b
> intr_handler(80003f86e760,805f4f80) at intr_handler+0x72
> Xintr_ioapic_edge36_untramp() at Xintr_ioapic_edge36_untramp+0x18f
> Xspllower() at Xspllower+0x1d
> dwqe_ioctl(80304870,80206910,80003f86e990) at dwqe_ioctl+0x18c
> ifioctl(fd81ffabe1e8,80206910,80003f86e990,80003f94e550) at 
> ifioctl+0x726
> sys_ioctl(80003f94e550,80003f86eb50,80003f86eac0) at 
> sys_ioctl+0x2af
> syscall(80003f86eb50) at syscall+0x55b
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x73ef48509270, count: -13
> 
> ddb{0}> show register
> rdi   0xfd806bfa5300
> rsi   0xdeafbeaddeafbead
> rbp   0x80003f86e5f0
> rbx0xf40
> rdx0
> rcx0
> rax   0xab56__ALIGN_SIZE+0x9b56
> r8  0x90
> r9 0x24634ac__kernel_rodata_phys+0x3624ac
> r10   0xe676ed611cc13e4f
> r11   0xd2619954b795f246
> r12   0x81110f48
> r13   0xfd807282
> r14   0xfd806bfa5300
> r15   0xfd805f6def00
> rip   0x81daae80m_tag_delete_chain+0x30
> cs   0x8
> rflags   0x10282__ALIGN_SIZE+0xf282
> rsp   0x80003f86e5d0
> ss  0x10
> m_tag_delete_chain+0x30:movq0(%rsi),%rax
> 
> ddb{0}> x/s version
> version:OpenBSD 7.5 (GENERIC.MP) #2: Thu Feb 29 23:42:26 CET 2024\012 
>r...@ot50.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP\012
> 
> ddb{0}> ps
>PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> *70039   16536  80360  0  7   0x803ifconfig
>  41531  214934  36719 51  3   0x8100033  netlock   ping
> 
> OpenBSD 7.5 (GENERIC.MP) #2: Thu Feb 29 23:42:26 CET 2024
> r...@ot50.obsd-lab.genua.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 8038207488 (7665MB)
> avail mem = 7773556736 (7413MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.3 @ 0x769c7000 (85 entries)
> bios0: vendor American Megatrends Inc. version "1.02.10" date 06/27/2022
> efi0 at bios0: UEFI 2.7
> efi0: American Megatrends rev 0x50013
> acpi0 at bios0: ACPI 6.2
> acpi0: sleep states S0 S5
> acpi0: tablesfg0: addr 0xc000, bus 0-255
> acpihpet0 at acpi0: 1920 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel Atom(R) x6425RE Processor @ 1.90GHz, 1895.90 MHz, 06-96-01, patch 
> 0017
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,CX16,xTPR,PDCM,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,RDRAND,NXE,RDTSCP,LONG,LAHF,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SMEP,ERMS,RDSEED,SMAP,CLFLUSHOPT,CLWB,PT,SHA,UMIP,WAITPKG,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,IBRS_ALL,SKIP_L1DFL,MDS_NO,IF_PSCHANGE,MISC_PKG_CT,ENERGY_FILT,FB_CLEAR,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 1MB 64b/line 
> 12-way L2 cache, 4MB 64b/line 16-way L3 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 38MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.0.2.2.1.1.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel Atom(R) x6425RE Processor @ 

Re: ARM64 installation with new snapshots not possible any longer

2023-07-07 Thread Patrick Wildt
Root cause has been found, the next snap, whenever it will be out, will have 
the fixes!

Sorry it took so long, reverting wasn‘t a solution as that would have broken 
another machine, and it took some time too find out what was wrong.

Cheers,
Patrick

> 
> On 20. Jun 2023, at 12:35, develo...@robert-palm.de wrote:
> 
> 
> Quoting Mark Kettenis :
> 
>>> Date: Tue, 20 Jun 2023 09:31:58 +0200
>>> From: develo...@robert-palm.de
>>> 
>>> Hi,
>>> 
>>> I noticed that an ARM64 installation with latest snapshots is not
>>> possible any longer in hetzner cloud arm64 instances (ampere altra).
>>> 
>>> Last snapshot working is
>>> https://ftp.hostserver.de/archive/2023-06-18-0105/snapshots/arm64/miniroot73.img
>>> 
>>> Later snapshots get stuck at "scsibus1 at softraid0: 256 targets".
>>> 
>>> So it does not get to "root on rd0a swap on rd0b dump on rd0b" any more.
>>> 
>>> Think there were 2 or 3 changes related to arm64 between (17-Jun-2023
>>> 06:36) and (18-Jun-2023 19:59) that might cause this.
>>> 
>>> Please, can you have a look at it?
>> 
>> The most likely candidate is:
>> 
>>  CVSROOT:/cvs
>>  Module name:src
>>  Changes by: kette...@cvs.openbsd.org2023/06/18 10:25:21
>> 
>>  Modified files:
>>  sys/arch/arm64/dev: agintc.c
>> 
>>  Log message:
>>  Remove spurious comment.
>> 
>>  ok patrick@
>> 
>> Can you try reverting that change and see of the resulting kernel boots?
>> 
>> Also, I'd like to understand why you're hitting this case.  Can you
>> show a dmesg from the last working kernel?
> 
> Thanks for your fast reply.
> 
> Please find the dmesg_log.txt attached.
> 
> Is it possible you revert the change and create a new miniroot.img file for 
> me ?
> 
> Currently I am just using images. Need to figure out how to compile the 
> sources and create the image to try the installation again.
> 
> 
> 
> 
> 



Re: ARM64 installation with new snapshots not possible any longer

2023-07-04 Thread Patrick Wildt
On Sat, Jun 24, 2023 at 05:48:15PM +0200, Volker Schlecht wrote:
> > > Date: Tue, 20 Jun 2023 09:31:58 +0200
> > > From: develo...@robert-palm.de
> > > 
> > > Hi,
> > > 
> > > I noticed that an ARM64 installation with latest snapshots is not
> > > possible any longer in hetzner cloud arm64 instances (ampere altra).
> > > 
> > > Last snapshot working is  
> > > https://ftp.hostserver.de/archive/2023-06-18-0105/snapshots/arm64/miniroot73.img
> [...]
> > The most likely candidate is:
> > 
> >   CVSROOT:/cvs
> >   Module name:src
> >   Changes by: kette...@cvs.openbsd.org2023/06/18 10:25:21
> >   Modified files:
> >   sys/arch/arm64/dev: agintc.c  Log message:
> >   Remove spurious comment.
> >   ok patrick@
> > 
> > Can you try reverting that change and see of the resulting kernel boots?
> 
> I hope that commit didn't have any functional consequences, so I built a
> -current kernel with the previous diff reverted, and it boots fine now:
> 
> CVSROOT:  /cvs
> Module name:  src
> Changes by:   kette...@cvs.openbsd.org2023/06/17 16:10:19
> 
> Modified files:
>   sys/arch/arm64/dev: agintc.c
> 
> Log message:
> Flush the ITS after we disestablish an MSI.  Fixes an issue seen on Ampere
> eMAG with an AMD GPU with an HD audio function where azalia(4) doesn't
> fully attach.
> 
> ok patrick@

It should not have a functional consequence on the VM, but obviously it
does make a difference because it stops booting.  I have tried to come
up with a diff that replaces INVALL with targeted INV, but those still
lead to a hang.

Since backing this diff out breaks another machine, a solution needs to
be found, not a backout.  I'll have a look.



Re: Hetzner arm64 Cloud

2023-04-23 Thread Patrick Wildt
On Tue, Apr 18, 2023 at 04:25:27PM +0200, Patrick Wildt wrote:
> On Sun, Apr 16, 2023 at 11:39:33PM +0200, Patrick Wildt wrote:
> > You can also simply dd the image to /dev/sda and reboot, but that still
> > doesn't solve the problem.  The bootup is hard to debug because the
> > console is KVM and uses viogpu.  As soon as we exit the EFI bootservices
> > the framebuffer is shut down for whatever reason.  Means we can only get
> > access to it again through viogpu, which happens pretty late.  I wish we
> > had a serial console, because Qemu/edk2 can do it, they just don't make
> > it available.  This is gonna be "fun" to debug without serial.
> 
> It actually wasn't too bad.  We will have to land a few diffs in the tree,
> and then snapshots will have to catch up.  I think you'll be able to run
> on Hetzner VMs in a few weeks.  I'll send an update as soon as it works
> with the snapshots.

The diffs to make the Hetzner arm64 VM have been committed.  It might
take a few days for snapshots to appear that contain those diffs.

Note that I still have an issue where sometimes the console believes
that I'm pressing the return key, even though I am not anymore.  I
propose you add an auto_install.conf to the ramdisk, something like:

ftp http://cdn.openbsd.org/pub/OpenBSD/snapshots/arm64/bsd.rd
rdsetroot -x bsd.rd > mr.fs
vnconfig vnd0 mr.fs
mount /dev/vnd0a /mnt
vim /mnt/auto_install.conf
umount /mnt
vnconfig -u vnd0
rdsetroot bsd.rd mr.fs

ftp http://cdn.openbsd.org/pub/OpenBSD/snapshots/arm64/miniroot73.img
vnconfig vnd0 miniroot73.img
mount /dev/vnd0a /mnt
cp bsd.rd /mnt/
umount /mnt
vnconfig -u vnd0

Cheers,
Patrick



Re: Hetzner arm64 Cloud

2023-04-18 Thread Patrick Wildt
On Sun, Apr 16, 2023 at 11:39:33PM +0200, Patrick Wildt wrote:
> You can also simply dd the image to /dev/sda and reboot, but that still
> doesn't solve the problem.  The bootup is hard to debug because the
> console is KVM and uses viogpu.  As soon as we exit the EFI bootservices
> the framebuffer is shut down for whatever reason.  Means we can only get
> access to it again through viogpu, which happens pretty late.  I wish we
> had a serial console, because Qemu/edk2 can do it, they just don't make
> it available.  This is gonna be "fun" to debug without serial.

It actually wasn't too bad.  We will have to land a few diffs in the tree,
and then snapshots will have to catch up.  I think you'll be able to run
on Hetzner VMs in a few weeks.  I'll send an update as soon as it works
with the snapshots.

Cheers,
Patrick



Re: Hetzner arm64 Cloud

2023-04-16 Thread Patrick Wildt
You can also simply dd the image to /dev/sda and reboot, but that still
doesn't solve the problem.  The bootup is hard to debug because the
console is KVM and uses viogpu.  As soon as we exit the EFI bootservices
the framebuffer is shut down for whatever reason.  Means we can only get
access to it again through viogpu, which happens pretty late.  I wish we
had a serial console, because Qemu/edk2 can do it, they just don't make
it available.  This is gonna be "fun" to debug without serial.

On Sat, Apr 15, 2023 at 11:33:39AM +0100, Chris Narkiewicz wrote:
> 
> I asked Hetzner to import install73.img and mounted it as VM CD-ROM,
> but it doesn't boot. I'm not sure if this is a bug either.
> 
> Cheers,
> Chris Narkiewic
> 
> On Thu, 2023-04-13 at 16:16 +, Mikolaj Kucharski wrote:
> > Hi,
> > 
> > I'm not sure does this belong to bugs@
> > 
> > However what I used in the past was Yaifo and I still use it every
> > few
> > years, but it takes too much effort to rebase it to -current, so I
> > didn't touch it for few years now, but for me it worked really
> > nicely.
> > 
> > https://github.com/jedisct1/yaifo
> > 
> > 
> > On Thu, Apr 13, 2023 at 09:00:23AM +0200, Peter J. Philipp wrote:
> > > Hi,
> > > 
> > > Yesterday hetzner.com came out with arm64 cloud instances, I tried
> > > one out.
> > > Here is what I found.  The images they give you a choice of does
> > > not include
> > > OpenBSD, so I had to get a ubuntu OS.  That's fine the EFI
> > > partition was
> > > already mounted.  Through trialing this I found the best way of
> > > getting the
> > > OpenBSD loader to boot was the following way:
> > > 
> > > 1. place miniroot73.img on the EFI partition root (/boot/efi/)
> > > 2. reboot
> > > 3. press escape to get to the BIOS, there is 3 options one is a
> > > configuration
> > >    option under 1, enter it.  I'm working off memory here I didn't
> > > save 
> > >    anything so take it with a grain of salt on exactness.  In this
> > > option is
> > >    an option to create a RAM drive from a file, go there and enter
> > > the
> > >    miniroot73.img (45MB).  The down arrows didn't work in this BIOS
> > > so it was
> > >    great that it wrapped around going up.
> > > 4. next go back into the main bios screen by pressing escape. 
> > > There is option
> > >    3 for boot options, enter it.  There is a boot from file option
> > > enter it.
> > >    Select the RAM drive and manouver your way to the bootaa64.efi
> > > file.  Press
> > >    enter.
> > > 5. OpenBSD loader now loads.  ls displays bsd and bsd.rd, the
> > > console is on
> > >    comcons0 or something like that.  Switching to fb0 works too. 
> > > Then when
> > >    pressing boot a blank screen happens.  Waiting a while no
> > > prompts and I
> > >    didn't try to blind type anything.  Doing this again with fb0
> > > doesn't
> > >    work either.
> > > 6. Full stop, I didn't get further.
> > > 
> > > I then deleted my instance as ubuntu is not good enough for me.  I
> > > guess we'll
> > > have to wait until the pros get to it.  Thanks!
> > > 
> > > Best Regards,
> > > -peter
> > > 
> > 
> 
> -- 
> +44 7502 415 180 (Phone, Signal, WhatsApp)
> @ezaquarii:etacassiopeiae.net (Matrix)



Re: arm64 (rockpro64) regression

2022-09-19 Thread Patrick Wildt
On Sun, Sep 18, 2022 at 10:45:41AM +, Klemens Nanni wrote:
> On Sun, Sep 18, 2022 at 12:13:34PM +0200, Martin Pieuchot wrote:
> > The rockpro64 no longer boots in multi-user on -current.  It hangs after
> > displaying the following lines:
> > 
> > rkiis0 at mainbus0
> > rkiis1 at mainbus0
> > 
> > The 8/09 snapshot works, the next one from 11/09 doesn't.
> 
> Smells like a similar hang in 'rkvop0 at ...' I see on the Pinebook Pro.
> 
> Reverting this sys/dev/ofw/fdt.c fixed it (I mailed them already):
> 
>   revision 1.31
>   date: 2022/09/11 08:33:03;  author: kettenis;  state: Exp;  lines: +21 
> -4;
>   Change OF_getnodebyname() such that lokking up a node using just the 
> name
>   without a unit number (so without the @1234 bit) works as well.
> 
>   ok patrick@, gkoehler@
> 
> > 
> > bsd.rd still boots.
> 
> Same on Pinebook Pro.

Found the issue, see inline.

> commit a4fa68764dee46c3d376031e17913b52c65045a0
> Author: kettenis 
> Date:   Sun Sep 11 08:33:03 2022 +
> 
> Change OF_getnodebyname() such that lokking up a node using just the name
> without a unit number (so without the @1234 bit) works as well.
> 
> ok patrick@, gkoehler@
> 
> diff --git a/sys/dev/ofw/fdt.c b/sys/dev/ofw/fdt.c
> index 9de907d6693..834c83ad55d 100644
> --- a/sys/dev/ofw/fdt.c
> +++ b/sys/dev/ofw/fdt.c
> @@ -1,4 +1,4 @@
> -/*   $OpenBSD: fdt.c,v 1.30 2022/08/21 12:52:10 jasper Exp $ */
> +/*   $OpenBSD: fdt.c,v 1.31 2022/09/11 08:33:03 kettenis Exp $   */
>  
>  /*
>   * Copyright (c) 2009 Dariusz Swiderski 
> @@ -882,16 +882,33 @@ int
>  OF_getnodebyname(int handle, const char *name)
>  {
>   void *node = (char *)tree.header + handle;
> + void *child;
> + char *data;
> + int len;
>  
>   if (handle == 0)
>   node = fdt_find_node("/");
>  
> - for (node = fdt_child_node(node); node; node = fdt_next_node(node)) {
> - if (strcmp(name, fdt_node_name(node)) == 0)
> + for (child = fdt_child_node(node); child;
> +  child = fdt_next_node(child)) {
> + if (strcmp(name, fdt_node_name(child)) == 0)
>   break;
>   }
> + if (child)
> + return (char *)child - (char *)tree.header;
> +
> + len = strlen(name);
> + for (child = fdt_child_node(node); child;
> +  child = fdt_next_node(node)) {

The call to fdt_next_node() needs to happen on child instead of node,
otherwise this will become an endless loop.

> + data = fdt_node_name(child);
> + if (strncmp(name, data, len) == 0 &&
> + strlen(data) > len && data[len] == '@')
> + break;
> + }
> + if (child)
> + return (char *)child - (char *)tree.header;
>  
> - return node ? ((char *)node - (char *)tree.header) : 0;
> + return 0;
>  }
>  
>  int



Re: Possible fix for bwfm driver not working after resume with A/C charger connected

2022-07-19 Thread Patrick Wildt
Am Sat, Jul 02, 2022 at 08:52:57PM -0500 schrieb 
leonardo.moreno.urbi...@gmail.com:
> >Synopsis:bwfm driver sends message "could not set mpc " after resuming 
> >from sleep with A/C charger connected
> >Category:kernel amd64
> >Environment:
>   System  : OpenBSD 7.1
>   Details : OpenBSD 7.1-current (GENERIC.MP) #257: Sat Jul  2 
> 20:25:54 CDT 2022
>
> leomor...@leomacbsd.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Wi-Fi works well at boot time. The problem is when resuming from sleep 
> while the A/C charger is connected.
>   Kernel sends a message "could not set mpc". This problem does not 
> appear when resuming from sleep while
>   the laptop is running from battery.
> >How-To-Repeat:
>   1) Boot normally.
>   2) Close lid and wait for laptop to go to sleep while A/C charger is 
> connected.
>   3) Open lid to resume.
>   4) Kernel message "could not set mpc" appears.
> >Fix:
>   I attach possible fix. Compared it to linux kernel driver and seems to 
> me that the logic in second argument is
>   inverted.
>   Tested the fix and now the device operates normally after resuming with 
> or without A/C charger.

No, it's not inverted. If bwfm_pci_send_mb_data() fails we want to
re-init the device, that's what both Linux and OpenBSD are doing.

Better questions are: why is bwfm_pci_send_mb_data() failing?  Why is
there a pending mb command?

And: why doesn't cleanup and reactivate work?

Cheers,
Patrick

>   P.D I'm a total noob with respect to any kernel modification, cvs and 
> even sending this report. Please do tell me
>   if something needs fixing. Thanks.
> 
> 
> Index: if_bwfm_pci.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_bwfm_pci.c,v
> retrieving revision 1.71
> diff -u -p -u -p -r1.71 if_bwfm_pci.c
> --- if_bwfm_pci.c 21 Mar 2022 19:46:56 -  1.71
> +++ if_bwfm_pci.c 3 Jul 2022 01:29:19 -
> @@ -954,7 +954,7 @@ bwfm_pci_activate(struct device *self, i
>   /* If device can't be resumed, re-init. */
>   if (bwfm_pci_intmask(sc) == 0 ||
>   bwfm_pci_send_mb_data(sc,
> - BWFM_PCI_H2D_HOST_D0_INFORM) != 0) {
> + BWFM_PCI_H2D_HOST_D0_INFORM) == 0) {
>   bwfm_cleanup(bwfm);
>   bwfm_pci_cleanup(sc);
>   }
> 
> 
> dmesg:
> OpenBSD 7.1-current (GENERIC.MP) #257: Sat Jul  2 20:25:54 CDT 2022
> leomor...@leomacbsd.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17032024064 (16243MB)
> avail mem = 16498462720 (15734MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x78f8a000 (34 entries)
> bios0: vendor Apple Inc. version "427.140.8.0.0" date 06/13/2021
> bios0: Apple Inc. MacBookPro11,5
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP HPET APIC SBST ECDT SSDT SSDT SSDT SSDT SSDT SSDT 
> SSDT SSDT SSDT SSDT DMAR MCFG VFCT
> acpi0: wakeup devices PEG0(S3) GFX0(S3) PEG1(S3) PEG2(S3) EC__(S3) GMUX(S3) 
> HDEF(S3) RP03(S4) ARPT(S4) RP04(S4) XHC1(S3) ADP1(S3) LID0(S3)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz, 3492.37 MHz, 06-46-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 
> 64b/line 8-way L2 cache, 6MB 64b/line 12-way L3 cache, 128MB 64b/line 16-way 
> L4 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz, 3491.92 MHz, 06-46-01
> cpu1: 
> 

Re: Bug report for Gigabyte_R152_P30 - Multiprocessor boot fails with kernel panic

2022-06-13 Thread Patrick Wildt
Am Mon, Jun 13, 2022 at 07:44:24PM +0200 schrieb Mark Kettenis:
> > From: kiltz 
> > Date: Mon, 13 Jun 2022 18:12:27 +0200
> > 
> > Dear Mark,
> > first of all, thank you very much for your explainations, the diff  
> > and, indeed, the ultra swift reply!
> > That helps us a lot already.
> > A snapshot with a higher value of max CPUs out of the box, of course,  
> > would be the proverbial icing on the cake.
> > Probably a strange question but I hazard it anyways - should we  
> > monitor the snapshot directory the /pub/OpenBSD/snapshots folder or is  
> > there a quicker way to find out what your fellow developers think?
> > Again, many thanks for your help and best wishes,
> 
> Hi Stefan,
> 
> Theo put that diff in snaphots.  I suspect that tomorrow's snapshot
> will have it.  You can easily tell, since all 80 CPUs should attach
> with that diff.
> 
> Cheers,
> 
> Mark

And it's nice to hear that SP install already worked.  I remember
booting it up on an Oracle machine with an Ampere Altra which led
to messages like

agintcmsi0 at agintc0: unsupported type 0x001700026f31

See http://ix.io/3GEX

While I had a diff somewhere to 'fix that', I never got the timer
interrupt to fire.

That you already had an SP install means all that should be fine.
If this change/new snap works, I'd be interested to read a full
dmesg!

Cheers,
Patrick



Re: kernel: page fault trap, code=0 after zzz on amd64 apu1d

2022-02-14 Thread Patrick Wildt
Am Mon, Feb 14, 2022 at 05:57:44PM -0700 schrieb Theo de Raadt:
> your root filesystem is on a detachable device?

Sure?  It seems to use sd0a for root, and that is coming from ahci0.
There is an sd1 though, which seems detachable, but that is not used
as root device as far as I can see?

> That does not work.  Suspend/resume has been written for
> reasonable laptops which contain a bunch of ACPI tables usually
> curated by vendors to at least work.
> 
> What you are trying to do is silly.
> 
> Tilo Stritzky  wrote:
> 
> > After issuing zzz on a pcengines apu1d running amd64 I found myself at
> > the ddb prompt (full transcript below):
> > 
> > uvm_fault(0x82326b70, 0x0, 0, 1) -> e
> > kernel: page fault trap, code=3D3D0
> > Stopped at  bufq_destroy+0x83:  movq0(%rcx),%rcx
> > TIDPIDUID PRFLAGS PFLAGS  CPU  COMMAND
> >   28358   7357   1000 0x3  01  bash
> > *514894  76393  0 0x14000  0x2000K usbtask
> > bufq_destroy(80a73d10) at bufq_destroy+0x83
> > sddetach(80a73c00,1) at sddetach+0x42
> > config_detach(80a73c00,1) at config_detach+0x142
> > scsi_detach_link(807c6e00,1) at scsi_detach_link+0x4d
> > scsibusdetach(80a86580,1) at scsibusdetach+0x54
> > config_detach(80a86580,1) at config_detach+0x142
> > umass_scsi_detach(80a73a00,1) at umass_scsi_detach+0x35
> > umass_detach(80a73a00,1) at umass_detach+0xd4
> > config_detach(80a73a00,1) at config_detach+0x142
> > usbd_detach(807c6d00,80761a00) at usbd_detach+0x5a
> > uhub_detach(80761a00,1) at uhub_detach+0x85
> > config_detach(80761a00,1) at config_detach+0x142
> > usbd_detach(80749300,80749200) at usbd_detach+0x5a
> > usb_explore(80749200) at usb_explore+0x144
> > end trace frame: 0x800020fcdc80, count: 0
> > https://www.openbsd.org/ddb.html describes the minimum info required in bu=
> > g
> > --db_more--=3D08=3D08=3D08=3D08=3D08=3D08=3D08=3D08=3D08=3D08=3D08=
> >=3D08=3D08=3D08=3D08=3D08=3D08=3D
> > =3D08=3D08=3D08=3D08=3D08reports.  Insufficient info makes it difficult to=
> >  find and f=3D
> > ix bugs.
> > 
> > This is with a snapshot
> > $ cat BUILDINFO
> > Build date: 1643740875 - Tue Feb  1 18:41:15 UTC 2022
> > 
> > 
> > I've got another one with a homebuild from -current as of 2022-02-13,
> > will put traces in another mail.
> > Dunno if this is a regression, the last time I tried suspend with this
> > machine was several years ago and it wouldn't resume at all.
> > 
> > This is mildly reproducible, I see it about 1 in 10 tries I run zzz on
> > this machine. I can't see any correlation with load or io activity.
> > ZZZ works fine on this machine.
> > 
> > Below is the log from console, including dmesg, reboot and a succesful
> > zzz.
> > Crash dump is available.
> > 
> > tilo
> > 
> > Press F10 key now for boot menu, N for PXE boot
> > Booting from Hard Disk...
> > Using drive 0, partition 3.
> > Loading..
> > probing: pc0 com0 com1 com2 com3 mem[639K 2029M a20=3D3Don]=3D20
> > disk: hd0+
> > >> OpenBSD/amd64 BOOT 3.53
> > |=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08sw=
> > itching console to com>> =3D
> > OpenBSD/amd64 BOOT 3.53
> > boot> 0=3D0D
> > 
> > |=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08booting hd0a:/=
> > bsd: -=3D08\=3D08|=3D08/=3D08=3D
> > -=3D08\=3D0815570200|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08=
> > /=3D08-=3D08\=3D08|=3D08/=3D08-=3D
> > =3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D=
> > 08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08=3D
> > /=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D=
> > 08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D
> > =3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D=
> > 08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08=3D
> > -=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D=
> > 08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D
> > =3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D=
> > 08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08=3D
> > \=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D=
> > 08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D
> > =3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D=
> > 08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08=3D
> > |=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D=
> > 08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D
> > =3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D=
> > 08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08=3D
> > /=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D=
> > 08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D
> > =3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08-=3D08\=3D=
> > 08|=3D08/=3D08-=3D08\=3D08|=3D08/=3D08=3D
> > 

Re: arm64 bootaa64 can load a kernel from non-'a' partition, but will only mount the 'a' partition on boot

2021-12-01 Thread Patrick Wildt
Hi,

I was actually wondering why we removed it and it stems from a
discussion with kettenis when I was doing cleanup, he wrote:

"Maybe it is time to retire the boot_file parsing completely.  These
days we use the bootduid.  The device we get from boot_file is only
used if we don't find a match for bootduid or if bootduid is set to
all zeroes."

I think what we forgot was that this feature allowed loading kernels
from another partition, as bootduid only selects the disk and not the
partition.

But yeah, it also looked about like this:

+   /* boot_file is of the format :/bsd we want the device part */
+   if ((p = strchr(boot_file, ':')) != NULL)
+   len = p - boot_file;
+   else
+   len = strlen(boot_file);
+   bootdv = parsedisk(boot_file, len, 0, );
+   if (tmpdev != NODEV)
+   part = DISKPART(tmpdev);

But I realize boot_file is gone and the 'original' boot_args is the
bootargs array.  I think your diff does make sense, but I don't think
we should change the printf.

Patrick

Am Mon, Nov 29, 2021 at 08:39:02AM -0600 schrieb Brian Conway:
> ping. Thanks!
> 
> Brian Conway
> 
> On Mon, Nov 15, 2021 at 4:20 PM Brian Conway  wrote:
> >
> > I noticed that unlike amd64 and i386, arm64's bootaa64 supports
> > loading a kernel from a non-'a' partition (i.e. boot> sd0d:/bsd),
> > however the kernel does not respect that partition when mounting the
> > root filesystem. It looks like this is caused by hard-coding 'part =
> > 0' in arm64's autoconf.c.
> >
> > Digging further, it appears that at one point this behavior was present:
> >
> > https://marc.info/?l=openbsd-cvs=154703994927978
> > (git cfd4bff0313b7a29ea82f9484862326f1c6b2643)
> >
> > It was later removed as part of (I believe) a general clean up:
> >
> > https://marc.info/?l=openbsd-cvs=160466957311120
> > (git 35fd387b3e5263176046603ff3e402ceeef58cf3)
> >
> > My attempt at a minimally-invasive patch to restore this functionality
> > follows. Tested on RPi 3B+ and 3B, as those are the only arm64 devices
> > I have. May not be idiomatic/good code. Thanks.
> >
> > Brian Conway
> >
> > diff --git sys/arch/arm64/arm64/autoconf.c sys/arch/arm64/arm64/autoconf.c
> > index bda3cb3f6b0..5f68d2665e1 100644
> > --- sys/arch/arm64/arm64/autoconf.c
> > +++ sys/arch/arm64/arm64/autoconf.c
> > @@ -75,11 +75,23 @@ diskconf(void)
> >  {
> >  #if defined(NFSCLIENT)
> >  extern uint8_t *bootmac;
> > -dev_t tmpdev = NODEV;
> >  #endif
> > +dev_t tmpdev = NODEV;
> > +extern char bootargs[256];
> > +char *p;
> > +size_t len;
> >  struct device *bootdv = NULL;
> >  int part = 0;
> >
> > +/* Extract part from :/bsd */
> > +if ((p = strchr(bootargs, ':')) != NULL)
> > +len = p - bootargs;
> > +else
> > +len = strlen(bootargs);
> > +bootdv = parsedisk(bootargs, len, 0, );
> > +if (tmpdev != NODEV)
> > +part = DISKPART(tmpdev);
> > +
> >  #if defined(NFSCLIENT)
> >  if (bootmac) {
> >  struct ifnet *ifp;
> > diff --git sys/arch/arm64/arm64/machdep.c sys/arch/arm64/arm64/machdep.c
> > index c84cf6d9b13..2eb5da44409 100644
> > --- sys/arch/arm64/arm64/machdep.c
> > +++ sys/arch/arm64/arm64/machdep.c
> > @@ -1130,7 +1130,7 @@ process_kernel_args(void)
> >
> >  boot_args = cp;
> >
> > -printf("bootargs: %s\n", boot_args);
> > +printf("boot_args: %s\n", boot_args);
> >
> >  /* Setup pointer to boot flags */
> >  while (*cp != '-')
> >
> > dmesg (7.0-stable):
> >
> > OpenBSD 7.0-stable (GENERIC.MP) #6: Mon Nov 15 21:50:51 UTC 2021
> > 
> > bcon...@b70-arm64.int.rcesoftware.com:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > real mem  = 970907648 (925MB)
> > avail mem = 908574720 (866MB)
> > random: good seed from bootblocks
> > mainbus0 at root: Raspberry Pi 3 Model B Plus Rev 1.3
> > cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4
> > cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu0: 512KB 64b/line 16-way L2 cache
> > cpu0: CRC32,ASID16
> > cpu1 at mainbus0 mpidr 1: ARM Cortex-A53 r0p4
> > cpu1: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu1: 512KB 64b/line 16-way L2 cache
> > cpu1: CRC32,ASID16
> > cpu2 at mainbus0 mpidr 2: ARM Cortex-A53 r0p4
> > cpu2: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu2: 512KB 64b/line 16-way L2 cache
> > cpu2: CRC32,ASID16
> > cpu3 at mainbus0 mpidr 3: ARM Cortex-A53 r0p4
> > cpu3: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
> > cpu3: 512KB 64b/line 16-way L2 cache
> > cpu3: CRC32,ASID16
> > efi0 at mainbus0: UEFI 2.8
> > efi0: Das U-Boot rev 0x20210700
> > apm0 at mainbus0
> > simplefb0 at mainbus0: 656x416, 32bpp
> > wsdisplay0 at simplefb0 mux 1
> > wsdisplay0: screen 0-5 added (std, vt100 emulation)
> > "system" at mainbus0 not configured
> > "axi" at mainbus0 not configured
> > simplebus0 at mainbus0: "soc"
> > bcmclock0 at simplebus0
> > bcmmbox0 at simplebus0
> > 

Re: novena panic on boot

2021-08-28 Thread Patrick Wildt
Am Sat, Aug 28, 2021 at 05:22:34PM +1200 schrieb Graeme Neilson:
> >Synopsis:  imxspi causes a kernel panic during boot on novena (armv7)
> >Category:  kernel arm
> >Environment:
> System  : OpenBSD 6.9
> Details : OpenBSD 6.9 (GENERIC) #386: Tue Apr 20 04:06:48 MDT 2021
>  
> dera...@armv7.openbsd.org:/usr/src/sys/arch/armv7/compile/GENERIC
> 
> Architecture: OpenBSD.armv7
> Machine : armv7
> >Description:
> On the Novena bsd.rd boots and the install is successful but bsd fails to 
> boot due to a panic caused by imxpsi.
> 
> (Note: To be able to boot I had to create u-boot-dtb.img from stock u-boot 
> source. 
> The SPL in the dtb pkg looks for u-boot-dtb.img, not u-boot.img which is 
> supplied in the u-boot pkg, and fails to load load u-boot shell.)

That apparently was my fault.  Typically we store output from
OF_getproplen in an int (signed), because OF_getproplen can
return -1, I think.  This should fix it.

Patrick

diff --git a/sys/dev/fdt/imxspi.c b/sys/dev/fdt/imxspi.c
index 05015f878ef..68bfd62dc6c 100644
--- a/sys/dev/fdt/imxspi.c
+++ b/sys/dev/fdt/imxspi.c
@@ -91,7 +91,7 @@ struct imxspi_softc {
int  sc_node;
 
uint32_t*sc_gpio;
-   size_t   sc_gpiolen;
+   int  sc_gpiolen;
 
struct rwlocksc_buslock;
struct spi_controllersc_tag;
@@ -178,7 +178,7 @@ imxspi_attachhook(struct device *self)
clock_enable(sc->sc_node, NULL);
 
sc->sc_gpiolen = OF_getproplen(sc->sc_node, "cs-gpios");
-   if (sc->sc_gpiolen) {
+   if (sc->sc_gpiolen > 0) {
sc->sc_gpio = malloc(sc->sc_gpiolen, M_DEVBUF, M_WAITOK);
OF_getpropintarray(sc->sc_node, "cs-gpios",
sc->sc_gpio, sc->sc_gpiolen);
@@ -218,7 +218,8 @@ imxspi_detach(struct device *self, int flags)
 
HWRITE4(sc, SPI_CONREG, 0);
bus_space_unmap(sc->sc_iot, sc->sc_ioh, sc->sc_ios);
-   free(sc->sc_gpio, M_DEVBUF, sc->sc_gpiolen);
+   if (sc->sc_gpiolen > 0)
+   free(sc->sc_gpio, M_DEVBUF, sc->sc_gpiolen);
return 0;
 }
 



Re: acpi panic

2021-03-16 Thread Patrick Wildt
Am Tue, Mar 16, 2021 at 03:25:46PM -0700 schrieb Rafael Ávila de Espíndola:
> 
> Patrick Wildt  writes:
> 
> > Am Tue, Mar 16, 2021 at 09:24:46AM -0700 schrieb Rafael Ávila de Espíndola:
> >> Patrick Wildt  writes:
> >> 
> >> > Am Tue, Mar 16, 2021 at 08:17:09AM -0700 schrieb Rafael Ávila de 
> >> > Espíndola:
> >> >> Got the same error this morning after updating to MP#408.
> >> >> 
> >> >> Cheers,
> >> >> Rafael
> >> >
> >> > Is this a regression? Did 6.8 work and 6.9-beta broke it?  Were you
> >> > already following snapshots and switching from one snapshot to the
> >> > other broke it?
> >> 
> >> I was following snapshots, but not very frequently. The last one that I
> >> was using without any problems was from Feb 15 (or at least that is when
> >> I got the "upgrade log" email).
> >> 
> >> Cheers,
> >> Rafael
> >
> > Can this be the cause?  There have been only few changes in ACPI, and
> > this definitely does some AML store thing:
> >
> > https://github.com/openbsd/src/commit/c1053d6a5a6ff0c23fa9cda5b6bd2d6feb9d82b6
> 
> Yes, that is the one. A patch with
> c1053d6a5a6ff0c23fa9cda5b6bd2d6feb9d82b6,
> abe2aacc6845fd5ce80896a50e770d4116e9f80c and
> 71a585ed804adbdd9d4eee13175b0812699dc34b reverted boots fine, but one
> with just the last two reverted fails in the same way as the snapshot.
> 
> Thanks,
> Rafael
> 

Wait, so which one is it?



Re: acpi panic

2021-03-16 Thread Patrick Wildt
Am Tue, Mar 16, 2021 at 09:24:46AM -0700 schrieb Rafael Ávila de Espíndola:
> Patrick Wildt  writes:
> 
> > Am Tue, Mar 16, 2021 at 08:17:09AM -0700 schrieb Rafael Ávila de Espíndola:
> >> Got the same error this morning after updating to MP#408.
> >> 
> >> Cheers,
> >> Rafael
> >
> > Is this a regression? Did 6.8 work and 6.9-beta broke it?  Were you
> > already following snapshots and switching from one snapshot to the
> > other broke it?
> 
> I was following snapshots, but not very frequently. The last one that I
> was using without any problems was from Feb 15 (or at least that is when
> I got the "upgrade log" email).
> 
> Cheers,
> Rafael

Can this be the cause?  There have been only few changes in ACPI, and
this definitely does some AML store thing:

https://github.com/openbsd/src/commit/c1053d6a5a6ff0c23fa9cda5b6bd2d6feb9d82b6

The other "bigger" change is

https://github.com/openbsd/src/commit/abe2aacc6845fd5ce80896a50e770d4116e9f80c

but I don't think it's related to that.

Can you build a kernel without that aml_store() change and see if it
fixes something for you?

Patrick



Re: acpi panic

2021-03-16 Thread Patrick Wildt
Am Tue, Mar 16, 2021 at 08:17:09AM -0700 schrieb Rafael Ávila de Espíndola:
> Got the same error this morning after updating to MP#408.
> 
> Cheers,
> Rafael

Is this a regression? Did 6.8 work and 6.9-beta broke it?  Were you
already following snapshots and switching from one snapshot to the
other broke it?



Re: Unresponsive mouse on latest snapshot

2020-08-31 Thread Patrick Wildt
On Sat, Aug 29, 2020 at 08:12:37AM -0700, open...@evantann.com wrote:
> >Synopsis:Mouse is not functional in latest snapshot
> >Category:
> >Environment:
>   System  : OpenBSD 6.7
>   Details : OpenBSD 6.7-current (GENERIC.MP) #48: Fri Aug 28 23:21:33 
> MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   My mouse that formerly worked does not respond at all on the latest
>   snapshot. I cannot move it, click any button, etc. In dmesg I see
>   "uhidev_intr: bad repid 228" each time I reseat the mouse.
> >How-To-Repeat:
>   1. Install latest snapshot
>   2. Plug in mouse
>   3. Mouse does not work
> >Fix:
>   Unknown
> 
> 
> dmesg:
> OpenBSD 6.7-current (GENERIC.MP) #48: Fri Aug 28 23:21:33 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17023057920 (16234MB)
> avail mem = 16492109824 (15728MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 3.0 @ 0x8b1a8000 (94 entries)
> bios0: vendor American Megatrends Inc. version "3801" date 03/14/2018
> bios0: ASUSTeK COMPUTER INC. Z170-AR
> acpi0 at bios0: ACPI 6.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC FPDT DBG2 MCFG SSDT FIDT SSDT SSDT HPET SSDT 
> SSDT UEFI SSDT LPIT WSMT SSDT SSDT DBGP
> acpi0: wakeup devices PEG0(S4) PEGP(S4) PEG1(S4) PEGP(S4) PEG2(S4) PEGP(S4) 
> SIO1(S3) PS2K(S4) PS2M(S4) UAR1(S4) RP09(S4) PXSX(S4) RP10(S4) PXSX(S4) 
> RP11(S4) PXSX(S4) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz, 3510.81 MHz, 06-5e-03
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 24MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1, IBE
> cpu1 at mainbus0: apid 2 (application processor)
> cpu1: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz, 3509.52 MHz, 06-5e-03
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 0, core 1, package 0
> cpu2 at mainbus0: apid 4 (application processor)
> cpu2: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz, 3509.51 MHz, 06-5e-03
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu2: 256KB 64b/line 8-way L2 cache
> cpu2: smt 0, core 2, package 0
> cpu3 at mainbus0: apid 6 (application processor)
> cpu3: Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz, 3509.51 MHz, 06-5e-03
> cpu3: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
> cpu3: 256KB 64b/line 8-way L2 cache
> cpu3: smt 0, core 3, package 0
> ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 120 pins
> acpimcfg0 at acpi0
> acpimcfg0: addr 0xe000, bus 0-255
> acpihpet0 at acpi0: 2399 Hz
> acpiprt0 at acpi0: bus 0 (PCI0)
> acpiprt1 

Re: OpenBSD 6.7 crashes on APU2C4 with LTE modem Huawei E3372s-153 HiLink

2020-06-08 Thread Patrick Wildt
On Mon, Jun 08, 2020 at 05:31:44PM +0200, Gerhard Roth wrote:
> On 2020-05-25 13:19, Martin Pieuchot wrote:
> > On 25/05/20(Mon) 12:56, Gerhard Roth wrote:
> > > On 5/22/20 9:05 PM, Mark Kettenis wrote:
> > > > > From: Łukasz Lejtkowski 
> > > > > Date: Fri, 22 May 2020 20:51:57 +0200
> > > > > 
> > > > > Probably power supply 12 V is broken. Showing 16,87 V(Fluke 179) -
> > > > > too high. Should be 12,25-12,50 V. I replaced to the new one.
> > > > 
> > > > That might be why the device stops responding.  The fact that cleaning
> > > > up from a failed USB transaction leads to this panic is a bug though.
> > > > 
> > > > And somebody just posted a very similar panic with ure(4).  Something
> > > > in the network stack is holding a mutex when it shouldn't.
> > > 
> > > I think that holding the mutex is ok. The bug is calling the stop
> > > routine in case of errors.
> > > 
> > > This is what common foo_start() does:
> > > 
> > >   m_head = ifq_deq_begin(>if_snd);
> > >   if (foo_encap(sc, m_head, 0)) {
> > >   ifq_deq_rollback(>if_snd, m_head);
> > >   ...
> > >   return;
> > >   }
> > >   ifq_deq_commit(>if_snd, m_head);
> > > 
> > > Here, ifq_deq_begin() grabs a mutex and it is held while
> > > calling foo_encap().
> > > 
> > > For USB network interfaces foo_encap() mostly does this:
> > > 
> > >   err = usbd_transfer(sc->sc_xfer);
> > >   if (err != USBD_IN_PROGRESS) {
> > >   foo_stop(sc);
> > >   return EIO;
> > >   }
> > > 
> > > And foo_stop() calls usbd_abort_pipe() -> xhci_command_submit(),
> > > which might sleep.
> > > 
> > > How to fix? We could do the foo_encap() after the ifq_deq_commit(),
> > > possibly dropping the current mbuf if encap fails (who cares
> > > for the packets after foo_stop() anyway).
> > 
> > That's the approach taken by drivers using ifq_dequeue(9) instead of
> > ifq_deq_begin/commit().
> > 
> > > Or change all the drivers to follow the path that if_aue.c takes:
> > > 
> > >   err = usbd_transfer(c->aue_xfer);
> > >   if (err != USBD_IN_PROGRESS) {
> > >   ...
> > >   /* Stop the interface from process context. */
> > >   usb_add_task(sc->aue_udev, >aue_stop_task);
> > >   return (EIO);
> > >   }
> > 
> > That's just trading the current problem for another one with higher
> > complexity.
> > 
> > > Any ideas, what's better? Or alternative proposals?
> > 
> > Using ifq_dequeue(9) would have the advantage of unifying the code base.
> > It introduces a behavior change.  A simpler fix would be to call
> > foo_stop() in the error path after ifq_deq_rollback().
> > 
> 
> Hi,
> 
> two weeks passed any nobody objected Martin's proposal. So I thought,
> we could try to move on this way.
> 
> Gerhard
> 

>From what I remember from various discussions, the goal should be to
check if there's a buffer free in the ring, then dequeue and send, and
it it can't be sent out, then drop it.  With USB apparently those
drivers "always" have an open buffer, so we can just dequeue and send,
like you do in this diff.  And if it gets dropped, that's fine.

That said, I think IFQ_DEQUEUE() is old compat code, and we actually
nowadays prefer:

m_head = ifq_dequeue(>if_snd);

If you look at the define for IFQ_DEQUEUE() you'll see it's marked
as compat code.  If you look at a new driver, like ixl(4), you'll
see that it also uses ifq_dequeue().

Sorry to to give you some work, but with that fixed: ok patrick@

Patrick

> 
> Index: sys/dev/usb/if_axe.c
> ===
> RCS file: /cvs/src/sys/dev/usb/if_axe.c,v
> retrieving revision 1.139
> diff -u -p -u -p -r1.139 if_axe.c
> --- sys/dev/usb/if_axe.c  7 Jul 2019 06:40:10 -   1.139
> +++ sys/dev/usb/if_axe.c  8 Jun 2020 15:13:25 -
> @@ -1223,6 +1223,7 @@ axe_encap(struct axe_softc *sc, struct m
>   /* Transmit */
>   err = usbd_transfer(c->axe_xfer);
>   if (err != USBD_IN_PROGRESS) {
> + c->axe_mbuf = NULL;
>   axe_stop(sc);
>   return(EIO);
>   }
> @@ -1246,16 +1247,15 @@ axe_start(struct ifnet *ifp)
>   if (ifq_is_oactive(>if_snd))
>   return;
> 
> - m_head = ifq_deq_begin(>if_snd);
> + IFQ_DEQUEUE(>if_snd, m_head);
>   if (m_head == NULL)
>   return;
> 
>   if (axe_encap(sc, m_head, 0)) {
> - ifq_deq_rollback(>if_snd, m_head);
> + m_freem(m_head);
>   ifq_set_oactive(>if_snd);
>   return;
>   }
> - ifq_deq_commit(>if_snd, m_head);
> 
>   /*
>* If there's a BPF listener, bounce a copy of this frame
> Index: sys/dev/usb/if_axen.c
> ===
> RCS file: /cvs/src/sys/dev/usb/if_axen.c,v
> retrieving revision 1.27
> diff -u -p -u -p -r1.27 if_axen.c
> --- sys/dev/usb/if_axen.c 7 Jul 2019 06:40:10 -   1.27
> +++ sys/dev/usb/if_axen.c 8 Jun 2020 14:49:26 -
> @@ -1186,6 +1186,7 

Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!

2020-06-02 Thread Patrick Wildt
On Mon, Jun 01, 2020 at 08:46:32AM +, Mikolaj Kucharski wrote:
> On Sat, Apr 18, 2020 at 02:01:14PM +0200, Patrick Wildt wrote:
> > On Tue, Apr 14, 2020 at 06:42:53PM +, Mikolaj Kucharski wrote:
> > > On Tue, Apr 14, 2020 at 11:06:00AM +0200, Patrick Wildt wrote:
> > > > On Mon, Apr 13, 2020 at 09:36:04PM +0200, Mark Kettenis wrote:
> > > > > Can you print the status (full 32 bits) of that particular QTD?
> > > > 
> > > > Yeah, I wish I could see the status field for each sqtd in that
> > > > chain.
> > > 
> > > XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, 
> > > sqtd_count=3
> > > XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, 
> > > status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1
> > > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, 
> > > status=0xc00, qtd_status=0xc00, sqtd_count=2
> > > XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, 
> > > status=0x8d00, qtd_status=0x8d00, sqtd_count=3
> > > panic: _dmamap_sync: ran off map!
> > 
> > Thanks for the information!  I'll try and see if I can find what's
> > happening.  But yeah, that's good information.  This is a control
> > transfers, which has 3 transfer descriptors.
> > 
> > The three TDs should be: SETUP, DATA (because of len=4), and STATUS.
> > 
> > As you can see, the middle one, DATA, has a len of 4, and the BYTES
> > field in the status is 0, indicating that the transfer completed.
> > 
> > Then we have STATUS, which has no len, looks just fine as well.
> > 
> > But the first one, the SETUP, is weird.  The len is fine, 8 bytes,
> > but ehci_qtd_len looks completely off, the BYTES field also says
> > "lots of bytes remaining".  This is where we fuck up, I suppose.
> > 
> > What's weird is that we actually account it, because we have this
> > check:
> > 
> > if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP)
> > actlen += sqtd->len - EHCI_QTD_GET_BYTES(status);
> > 
> > But it looks like the status field on the SETUP descriptor does
> > not say PID_SETUP, but PID_IN.  Did the DMA controller change the
> > bit?  Or did someone else overwrite it?
> > 
> > Patrick
> 
> I'm kinda stuck with this. I added KASSERT()s in places where I think
> condtions should be checked and otherwise kernel should panic. Obviously
> as I can trigger bogus condition I ended up with following ddb trace:
> 
> urtwn0: SCAN -> INIT
> XXX S1 ehci_idone: actlen=4, sqtd_len=0, ehci_qtd_len=16000, status=0x3e808d00
> XXX S2 ehci_idone: actlen=-15996, sqtd_len=0, ehci_qtd_len=16000, 
> status=0x3e808d00
> panic: kernel diagnostic assertion "sqtd_len >= ehci_qtd_len" failed: file 
> "/home/mkucharski/src/sys/dev/usb/ehci.c", line 915
> ...
> db_enter() at panic+0x14c
> panic() at ehci_idone+0x264
> ehci_isoc_idone() at ehci_softintr+0x158
> ehci_softintr() at softintr_biglock_wrap+0x1c
> softintr_biglock_wrap() at softintr_dispatch+0x9c
> softintr_dispatch() at arm_do_pending_intr+0xa8
> arm_do_pending_intr() at ampintc_irq_handler+0x178
> 
> This is retyped by hand from a glass console, be aware of typos.
> 
> I don't understand why ehci_idone() goes to ehci_isoc_idone() in above
> ddb trace output.

Hi,

I haven't gotten to it yet, but mpi@ said we should try to rely less
on the hardware.  I think an idea would be to store in each sqtd whether
or not it carries data (that needs to be counted).  So then you only
read the actlen from the SQTD if it actually is supposed to carry data.
This way we don't even try to read a length from anything else, like
STATUS.

Patrick



Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!

2020-04-18 Thread Patrick Wildt
On Tue, Apr 14, 2020 at 06:42:53PM +, Mikolaj Kucharski wrote:
> On Tue, Apr 14, 2020 at 11:06:00AM +0200, Patrick Wildt wrote:
> > On Mon, Apr 13, 2020 at 09:36:04PM +0200, Mark Kettenis wrote:
> > > Can you print the status (full 32 bits) of that particular QTD?
> > 
> > Yeah, I wish I could see the status field for each sqtd in that
> > chain.
> 
> XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, 
> sqtd_count=3
> XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, 
> status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1
> XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, 
> status=0xc00, qtd_status=0xc00, sqtd_count=2
> XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, 
> status=0x8d00, qtd_status=0x8d00, sqtd_count=3
> panic: _dmamap_sync: ran off map!

Thanks for the information!  I'll try and see if I can find what's
happening.  But yeah, that's good information.  This is a control
transfers, which has 3 transfer descriptors.

The three TDs should be: SETUP, DATA (because of len=4), and STATUS.

As you can see, the middle one, DATA, has a len of 4, and the BYTES
field in the status is 0, indicating that the transfer completed.

Then we have STATUS, which has no len, looks just fine as well.

But the first one, the SETUP, is weird.  The len is fine, 8 bytes,
but ehci_qtd_len looks completely off, the BYTES field also says
"lots of bytes remaining".  This is where we fuck up, I suppose.

What's weird is that we actually account it, because we have this
check:

if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP)
actlen += sqtd->len - EHCI_QTD_GET_BYTES(status);

But it looks like the status field on the SETUP descriptor does
not say PID_SETUP, but PID_IN.  Did the DMA controller change the
bit?  Or did someone else overwrite it?

Patrick

> Above are my debug lines in dmesg generated just before panic.
> 
> The code which prints those lines is below. It's similar for-loop to
> what is currently in CVS in ehci_idone() function. I'm doing this
> for-loop twice. First version of for-loop is executed as is in CVS
> and record condition that panic will happen, then below for-loop is
> executed again with more debug output, otherwise printf() will slow
> the code path so mutch that I will not be able to trigger error
> condition.
> 
> With below code (lines from 1 to 29) I already know panic condition is
> met and I'm executing code with printf()s added.
> 
> sqtd_count=3 means that for-loop from line 6 to 28 was executed 3 times.
> 
> XXX #1# ehci_idone: len=4 actlen=4294951308, xf_actlen=0, status=0x8d00, 
> sqtd_count=3
> 
> Above message is from line 2 of the code listed below.
> 
> Then we enter the for-loop at line 6, which iterates 3 times. Each
> iteration gives following results:
> 
> XXX #2# ehci_idone: x_actlen=4294951304, sqtd_len=8, ehci_qtd_len=16000, 
> status=0xbe808d00, qtd_status=0xbe808d00, sqtd_count=1
> XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=4, ehci_qtd_len=0, 
> status=0xc00, qtd_status=0xc00, sqtd_count=2
> XXX #2# ehci_idone: x_actlen=4294951308, sqtd_len=0, ehci_qtd_len=0, 
> status=0x8d00, qtd_status=0x8d00, sqtd_count=3
> 
> then code proceeds and triggers panic.
> 
>  1if (xfer->length < actlen) {
>  2printf("XXX #1# ehci_idone: len=%u, actlen=%u, 
> xf_actlen=%u, "
>  3"status=0x%x, sqtd_count=%d\n", xfer->length, 
> actlen, xfer->actlen, status, sqtd_count);
>  4x_actlen = 0;
>  5sqtd_count = 0;
>  6for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) {
>  7sqtd_count++;
>  8usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd),
>  9BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
> 10nstatus = letoh32(sqtd->qtd.qtd_status);
> 11qtd_status = nstatus;
> 12if (nstatus & EHCI_QTD_ACTIVE)
> 13break;
> 14
> 15status = nstatus;
> 16/* halt is ok if descriptor is last, and complete */
> 17if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) 
> &&
> 18EHCI_QTD_GET_BYTES(status) == 0)
> 19status &= ~EHCI_QTD_HALTED;
> 20if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP) {
> 21sqtd_len = sqtd->len;
> 22ehci_qtd_len = EHCI_QTD_GET_BYTES(status);
> 23x_actlen += sqtd_len - ehci_qtd_len;
> 24

Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!

2020-04-14 Thread Patrick Wildt
On Mon, Apr 13, 2020 at 09:36:04PM +0200, Mark Kettenis wrote:
> > From: "Theo de Raadt" 
> > cc: Patrick Wildt 
> > Comments: In-reply-to Mikolaj Kucharski 
> >message dated "Mon, 13 Apr 2020 18:55:37 -."
> > Content-Type: text/plain; charset="us-ascii"
> > Content-ID: <61132.158680562...@cvs.openbsd.org>
> > Date: Mon, 13 Apr 2020 13:20:27 -0600
> > 
> > Mikolaj Kucharski  wrote:
> > 
> > > On Mon, Apr 13, 2020 at 12:44:54PM -0600, Theo de Raadt wrote:
> > > > Then you need to look at why this crazy value is composed here in this
> > > > loop:
> > > > 
> > > > actlen = 0;
> > > > for (sqtd = ex->sqtdstart; sqtd != NULL; sqtd = sqtd->nextqtd) {
> > > > usb_syncmem(>dma, sqtd->offs, sizeof(sqtd->qtd),
> > > > BUS_DMASYNC_POSTWRITE | BUS_DMASYNC_POSTREAD);
> > > > nstatus = letoh32(sqtd->qtd.qtd_status);
> > > > if (nstatus & EHCI_QTD_ACTIVE)
> > > > break;
> > > > 
> > > > status = nstatus;
> > > > /* halt is ok if descriptor is last, and complete */
> > > > if (sqtd->qtd.qtd_next == htole32(EHCI_LINK_TERMINATE) 
> > > > &&
> > > > EHCI_QTD_GET_BYTES(status) == 0)
> > > > status &= ~EHCI_QTD_HALTED;
> > > > if (EHCI_QTD_GET_PID(status) != EHCI_QTD_PID_SETUP)
> > > > actlen += sqtd->len - 
> > > > EHCI_QTD_GET_BYTES(status);
> > > > }
> > > 
> > > But I just described that in one of my previous emails few minutes ago.
> > > 
> > > In my diff I'm using second loop of the same code, but I guess I can
> > > provide more surgurucal diff to be more explicit. What I wrote in my
> > > previous email can be written also as follows:
> > > 
> > >   actlen += sqtd->len - EHCI_QTD_GET_BYTES(status);
> > > 
> > > In above equation, on first for loop iteration:
> > > 
> > > actlen is 0
> > > EHCI_QTD_GET_BYTES(status) is 16000
> > > sqtd->len is 8
> > > 
> > > actlen += 8 - 1600;
> > > 
> > > Which results of actlen being 4294951304, from that moment we are
> > > heading to kernel panic.
> > 
> > Then next step is to figure out why the sqtd is incoherent.
> 
> I actually suspect that the 16000 here is bogus.

Exactly.  I looked at it yesterday, without his additional info, and
realized that in this case the calculated actlen for that control
transfer must be broken.

xfer->length was 8, xfer->actlen was 15996, so something was doing
(0 - 4), or maybe (4 - 1600), which actually makes much more sense,
since otherwise sqtd->len would have been 0, which would be very
weird.

So now, why does EHCI_QTD_GET_BYTES return 16000...

> Can you print the status (full 32 bits) of that particular QTD?

Yeah, I wish I could see the status field for each sqtd in that
chain.

On the bright side:  This is not a bug in my DMA diff!  But it
exposed a bug in the actlen calculation.

Patrick



Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!

2020-04-12 Thread Patrick Wildt
On Sun, Apr 12, 2020 at 11:51:52AM +, Mikolaj Kucharski wrote:
> On Sun, Apr 12, 2020 at 10:05:28AM +0200, Patrick Wildt wrote:
> > I'm now trying to reproduce it, running this diff.  I'm sure the printfs
> > will slow the system down, but it's one way to see what's going on.
> > 
> > If you can reproduce it easily, would you mind trying again, with the
> > debug printfs?
> 
> So far I cannot reproduce kernel panic with below diff. At the end of
> this email dmesg output after machine boots up and works for few
> minutes. I will continue rebooting and hope panic will finally
> trigger. Giving how easy it was to trigger before, and as I rebooted few
> times Pinebook already I think with below changes it will not panic :/

Thats weird yhough, since it's the same diff as before, but with a
printf.  Maybe the printf changes the timing and thus the behaviour.  

> > Patrick
> > 
> > diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c
> > index 41851defe4f..00a9af6903f 100644
> > --- a/sys/dev/usb/ehci.c
> > +++ b/sys/dev/usb/ehci.c
> > @@ -912,13 +912,18 @@ ehci_idone(struct usbd_xfer *xfer)
> > xfer->status = USBD_STALLED;
> > else
> > xfer->status = USBD_IOERROR; /* more info XXX */
> > -   } else
> > +   } else {
> > +   if (xfer->actlen) {

Maybe, make that printf conditional on something weird, like the
following.  Since, if it runs off the DMA buffers map, maybe actlen
is broken.

if (xfer->length < xfer->actlen)
printf("ehci_idon: 

> > +   printf("ehci_idone: len=%d, actlen=%d, cerr=%d, "
> > +   "status=0x%x type=0x%x\n", xfer->length, actlen, 
> > cerr,
> > +   status, 
> > UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes));
> > +   usb_syncmem(>dmabuf, 0, xfer->actlen,
> > +   usbd_xfer_isread(xfer) ?
> > +   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
> > +   }
> > xfer->status = USBD_NORMAL_COMPLETION;
> > +   }
> >  
> > -   if (xfer->actlen)
> > -   usb_syncmem(>dmabuf, 0, xfer->actlen,
> > -   usbd_xfer_isread(xfer) ?
> > -   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
> > usb_transfer_complete(xfer);
> > DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex));
> >  }
> 
> 
> ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d08 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2
> ehci_idone: len=1580, actlen=1580, cerr=3, status=0x8c00 type=0x2
> ehci_idone: len=776, actlen=776, cerr=3, status=0x8c00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2
> ehci_idone: len=1580, actlen=1580, cerr=3, status=0x8c00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2
> ehci_idone: len=1208, actlen=1208, cerr=3, status=0x80008c00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2
> ehci_idone: len=1480, actlen=1480, cerr=3, status=0x8c00 type=0x2
> ehci_idone: len=752, actlen=752, cerr=3, status=0x8c00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2
> ehci_idone: len=504, actlen=504, cerr=3, status=0x80008c00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2
> ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2
> ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2
> ehci_idone: len=584, actlen=584, cerr=3, status=0x80008c00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2
> ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2
> ehci_idone: len=1, actlen=1, cerr=3, status=0x8c00 type=0x0
> ehci_idone: len=2, actlen=2, cerr=3, status=0x8d00 type=0x0
> ehci_idone: len=16384, actlen=256, cerr=3, status=0xbf008d00 type=0x2
> ehci_idone: len=4, actlen=4, cerr=3, status=0x8d00 type=0x0
> ehci_idone: len=4, actlen=4, cerr=3, status=0x8d00 type=0x0
> ehci_idone: len=16384, actlen=256, cerr=3, status=0x3f008d00 type=0x2
> ehci_idone: len=384, actlen=384, cerr=3, status=0x8c00 type=0x2
> ehci_idone: len=384, actlen=

Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!

2020-04-12 Thread Patrick Wildt
On Sun, Apr 12, 2020 at 09:19:48AM +0200, Patrick Wildt wrote:
> On Sun, Apr 12, 2020 at 05:41:15AM +, Mikolaj Kucharski wrote:
> > On Sat, Apr 11, 2020 at 09:46:49PM +0200, Patrick Wildt wrote:
> > > > OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020
> > > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > > 
> > > > panic: _dmamap_sync: ran off map!
> > > 
> > > Oh, wow!  That means that the DMA sync is using a longer length than the
> > > DMA buffer has segments for.  It's very plausible that this happens on
> > > the Pinebook, and not on an x86 machine, because on the Pinebook it has
> > > to do DMA syncs, while on x86 that's essentially a no-op.
> > > 
> > > > Stopped at  panic+0x150:
> > > > TID PID UID PRFLAGS PFLAGS  CPU COMMAND
> > > > 242142  87433   0   0x120   3   sha256
> > > > 192114  36379   0   0x14000 0x200   2K  sdmmc2
> > > > db_enter() at panic+0x14c
> > > > panic() at ehci_idone+0x1d4
> > > > ehci_idone() at ehci_softintr+0x158
> > > 
> > > I see!  This is the one.  I think I made a slight error in this
> > > function, since for all the other drivers I made sure to sync only
> > > on successful completions, while for ehci(4) I missed this for one.
> > > 
> > > I think this diff should fix it, can you give it a go?
> > 
> > Unfortunately kernel panics in very similar way:
> 
> Ok, too bad.  I'll reproduce this on my Pinebook and hopefully have an
> update for you soon.
> 
> Patrick

I'm now trying to reproduce it, running this diff.  I'm sure the printfs
will slow the system down, but it's one way to see what's going on.

If you can reproduce it easily, would you mind trying again, with the
debug printfs?

Patrick

diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c
index 41851defe4f..00a9af6903f 100644
--- a/sys/dev/usb/ehci.c
+++ b/sys/dev/usb/ehci.c
@@ -912,13 +912,18 @@ ehci_idone(struct usbd_xfer *xfer)
xfer->status = USBD_STALLED;
else
xfer->status = USBD_IOERROR; /* more info XXX */
-   } else
+   } else {
+   if (xfer->actlen) {
+   printf("ehci_idone: len=%d, actlen=%d, cerr=%d, "
+   "status=0x%x type=0x%x\n", xfer->length, actlen, 
cerr,
+   status, 
UE_GET_XFERTYPE(xfer->pipe->endpoint->edesc->bmAttributes));
+   usb_syncmem(>dmabuf, 0, xfer->actlen,
+   usbd_xfer_isread(xfer) ?
+   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
+   }
xfer->status = USBD_NORMAL_COMPLETION;
+   }
 
-   if (xfer->actlen)
-   usb_syncmem(>dmabuf, 0, xfer->actlen,
-   usbd_xfer_isread(xfer) ?
-   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
usb_transfer_complete(xfer);
DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex));
 }



Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!

2020-04-12 Thread Patrick Wildt
On Sun, Apr 12, 2020 at 05:41:15AM +, Mikolaj Kucharski wrote:
> On Sat, Apr 11, 2020 at 09:46:49PM +0200, Patrick Wildt wrote:
> > > OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020
> > > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > > 
> > > panic: _dmamap_sync: ran off map!
> > 
> > Oh, wow!  That means that the DMA sync is using a longer length than the
> > DMA buffer has segments for.  It's very plausible that this happens on
> > the Pinebook, and not on an x86 machine, because on the Pinebook it has
> > to do DMA syncs, while on x86 that's essentially a no-op.
> > 
> > > Stopped atpanic+0x150:
> > > TID   PID UID PRFLAGS PFLAGS  CPU COMMAND
> > > 24214287433   0   0x120   3   sha256
> > > 19211436379   0   0x14000 0x200   2K  sdmmc2
> > > db_enter() at panic+0x14c
> > > panic() at ehci_idone+0x1d4
> > > ehci_idone() at ehci_softintr+0x158
> > 
> > I see!  This is the one.  I think I made a slight error in this
> > function, since for all the other drivers I made sure to sync only
> > on successful completions, while for ehci(4) I missed this for one.
> > 
> > I think this diff should fix it, can you give it a go?
> 
> Unfortunately kernel panics in very similar way:

Ok, too bad.  I'll reproduce this on my Pinebook and hopefully have an
update for you soon.

Patrick

> ddb{0}> show panic
> _dmamap_sync: ran off map!
> 
> ddb{0}> trace
> db_enter() at panic+0x14c
> panic() at ehci_idone+0x1d8
> ehci_idone() at ehci_softintr+0x158
> ehci_softintr() at softintr_biglock_wrap+0x1c
> softintr_biglock_wrap() at softintr_dispatch+0x9c
> softintr_dispatch() at arm_do_pending_intr+0xa8
> arm_do_pending_intr() at ampintc_irq_handle+0x178
> ampintc_irq_handle() at arm_cpu_intr+0x30
> arm_cpu_intr() at handle_el1h_irq+06c
> handle_el1h_irq() at sched_idle+0x220
> sched_idle() at proc_trampoline+0x10
> 
> > mpi@, kettenis@: ok?
> > 
> > Patrick
> > 
> > diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c
> > index 41851defe4f..249bb073fca 100644
> > --- a/sys/dev/usb/ehci.c
> > +++ b/sys/dev/usb/ehci.c
> > @@ -912,13 +912,14 @@ ehci_idone(struct usbd_xfer *xfer)
> > xfer->status = USBD_STALLED;
> > else
> > xfer->status = USBD_IOERROR; /* more info XXX */
> > -   } else
> > +   } else {
> > +   if (xfer->actlen)
> > +   usb_syncmem(>dmabuf, 0, xfer->actlen,
> > +   usbd_xfer_isread(xfer) ?
> > +   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
> > xfer->status = USBD_NORMAL_COMPLETION;
> > +   }
> >  
> > -   if (xfer->actlen)
> > -   usb_syncmem(>dmabuf, 0, xfer->actlen,
> > -   usbd_xfer_isread(xfer) ?
> > -   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
> > usb_transfer_complete(xfer);
> > DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex));
> >  }
> 
> -- 
> Regards,
>  Mikolaj
> 



Re: OpenBSD 6.7-beta on arm64 with urtwn(4) gets panic: _dmamap_sync: ran off map!

2020-04-11 Thread Patrick Wildt
On Sat, Apr 11, 2020 at 12:41:36PM +, Mikolaj Kucharski wrote:
> On Sat, Apr 11, 2020 at 09:06:57AM +, Mikolaj Kucharski wrote:
> > >Synopsis:  kernel panic with message _dmamap_sync: ran off map!
> > >Category:  kernel
> > >Environment:
> > System  : OpenBSD 6.7
> > Details : OpenBSD 6.7-beta (GENERIC.MP) #546: Mon Apr  6 12:39:22 
> > MDT 2020
> >  
> > dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.arm64
> > Machine : arm64
> > >Description:
> > Kernel panic. It happens after usage of network via wireless adapter
> > connected via USB, urtwn(4):
> > 
> > urtwn0 at uhub4 port 2 configuration 1 interface 0 "Realtek 802.11n WLAN 
> > Adapter" rev 2.00/2.00 addr 5
> > urtwn0: MAC/BB RTL8188CUS, RF 6052 1T1R, address 80:1f:02:4b:6a:6b
> > 
> > Currently don't have trace output from ddb as it doesn't have function 
> > names, but addresses.
> > Will try to update kernel to latest arm64 snapshot.
> > 
> > >How-To-Repeat:
> > Connect urtwn(4) to Pinebook and use network, after a while kernel 
> > panics.
> > For me simplest way is to start `sysupgrade -s -f`
> > 
> > >Fix:
> > Unknown.
> 
> OpenBSD 6.7-beta (GENERIC.MP) #552: Fri Apr 10 20:48:05 MDT 2020
> dera...@arm64.openbsd.org:/usr/src/sys/arch/arm64/compile/GENERIC.MP
> 
> panic: _dmamap_sync: ran off map!

Oh, wow!  That means that the DMA sync is using a longer length than the
DMA buffer has segments for.  It's very plausible that this happens on
the Pinebook, and not on an x86 machine, because on the Pinebook it has
to do DMA syncs, while on x86 that's essentially a no-op.

> Stopped atpanic+0x150:
> TID   PID UID PRFLAGS PFLAGS  CPU COMMAND
> 24214287433   0   0x120   3   sha256
> 19211436379   0   0x14000 0x200   2K  sdmmc2
> db_enter() at panic+0x14c
> panic() at ehci_idone+0x1d4
> ehci_idone() at ehci_softintr+0x158

I see!  This is the one.  I think I made a slight error in this
function, since for all the other drivers I made sure to sync only
on successful completions, while for ehci(4) I missed this for one.

I think this diff should fix it, can you give it a go?

mpi@, kettenis@: ok?

Patrick

diff --git a/sys/dev/usb/ehci.c b/sys/dev/usb/ehci.c
index 41851defe4f..249bb073fca 100644
--- a/sys/dev/usb/ehci.c
+++ b/sys/dev/usb/ehci.c
@@ -912,13 +912,14 @@ ehci_idone(struct usbd_xfer *xfer)
xfer->status = USBD_STALLED;
else
xfer->status = USBD_IOERROR; /* more info XXX */
-   } else
+   } else {
+   if (xfer->actlen)
+   usb_syncmem(>dmabuf, 0, xfer->actlen,
+   usbd_xfer_isread(xfer) ?
+   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
xfer->status = USBD_NORMAL_COMPLETION;
+   }
 
-   if (xfer->actlen)
-   usb_syncmem(>dmabuf, 0, xfer->actlen,
-   usbd_xfer_isread(xfer) ?
-   BUS_DMASYNC_POSTREAD : BUS_DMASYNC_POSTWRITE);
usb_transfer_complete(xfer);
DPRINTFN(/*12*/2, ("ehci_idone: ex=%p done\n", ex));
 }



Re: Unable to make cross-tools for arm64 target

2019-07-05 Thread Patrick Wildt
On Tue, Jul 02, 2019 at 02:07:22PM +0200, Krystian Lewandowski wrote:
> 
> > Wiadomość napisana przez Krystian Lewandowski  w dniu 
> > 01.07.2019, o godz. 21:50:
> > 
> > I thought it would be a good idea to rebuild cross-tools because of LLVM 
> > version bump
> > but - with recent src - I'm unable to do so:
> > $ doas make -f Makefile.cross TARGET=${target} CROSSDIR="${destdir}" 
> > cross-tools
> > fails with:
> > aarch64-unknown-openbsd6: error: unable to execute command: Segmentation 
> > fault
> > (core dumped)
> > 
> > With updated /etc/mk.conf:
> > DEBUG=-gline-tables-only
> > (I think it’s unused for clang but disables stripping during installation?)
> > 
> > and src/gnu/usr.bin/clang/Makefile.inc:
> > CPPFLAGS+=  -DNDEBUG -gline-tables-only
> > (I think it’s similar to what cmake RelWithDebugInfo does.)
> > 
> > I was eventually able to get debug symbols. More details in attached files.
> > I reproduced this crash on other machine.
> > 
> > Is this something for LLVM team to look at?
> > 
> > If anyone would like to give it a quick try, then please just follow 
> > „Setup” section from:
> > https://github.com/elewarr/openbsd-arm64-src-dev
> > 
> > -- 
> > Krystian
> > 
> > 
> > 
> > 
> 
> Reported to LLVM: https://bugs.llvm.org/show_bug.cgi?id=42478
> 
> -- 
> Krystian
> 

I see this as well, what a bummer.  What I can see is that if I use my
make wrapper to compile the files, and -j1, it doesn't crash.  I wonder
what the big difference is.



Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

2018-11-22 Thread Patrick Wildt
On Tue, Nov 20, 2018 at 07:56:31PM -0800, Carlos Cardenas wrote:
> 
> I've been running the updated patch and so far so good...
> 
> If y'all don't mind, I want to keep pounding on it for the next day or
> two before giving a thumbs up.
> 
> Thanks.
> 
> +--+
> Carlos
> 

I had a look into the IRQ controller drivers used on arm64 and armv7
and it seems like the following diff would fix the panic and also
modify mvmpic(4) similarly.  I haven't touched omgpio(4) since we
do no use IRQs there.

diff --git a/sys/arch/arm/cortex/ampintc.c b/sys/arch/arm/cortex/ampintc.c
index bdfc14b621d..19d6eebdd2a 100644
--- a/sys/arch/arm/cortex/ampintc.c
+++ b/sys/arch/arm/cortex/ampintc.c
@@ -161,8 +161,8 @@ struct intrhand {
 
 struct intrq {
TAILQ_HEAD(, intrhand) iq_list; /* handler list */
-   int iq_irq; /* IRQ to mask while handling */
-   int iq_levels;  /* IPL_*'s this IRQ has */
+   int iq_irq_max; /* IRQ to mask while handling */
+   int iq_irq_min; /* lowest IRQ when shared */
int iq_ist; /* share type */
 };
 
@@ -402,10 +402,12 @@ ampintc_calc_mask(void)
min = ih->ih_ipl;
}
 
-   if (sc->sc_ampintc_handler[irq].iq_irq == max) {
+   if (sc->sc_ampintc_handler[irq].iq_irq_max == max &&
+   sc->sc_ampintc_handler[irq].iq_irq_min == min)
continue;
-   }
-   sc->sc_ampintc_handler[irq].iq_irq = max;
+
+   sc->sc_ampintc_handler[irq].iq_irq_max = max;
+   sc->sc_ampintc_handler[irq].iq_irq_min = min;
 
if (max == IPL_NONE)
min = IPL_NONE;
@@ -538,7 +540,7 @@ ampintc_irq_handler(void *frame)
if (irq >= sc->sc_nintr)
return;
 
-   pri = sc->sc_ampintc_handler[irq].iq_irq;
+   pri = sc->sc_ampintc_handler[irq].iq_irq_max;
s = ampintc_splraise(pri);
TAILQ_FOREACH(ih, >sc_ampintc_handler[irq].iq_list, ih_list) {
 #ifdef MULTIPROCESSOR
diff --git a/sys/arch/arm64/dev/agintc.c b/sys/arch/arm64/dev/agintc.c
index 2d9afa9016d..d39530e85a0 100644
--- a/sys/arch/arm64/dev/agintc.c
+++ b/sys/arch/arm64/dev/agintc.c
@@ -171,8 +171,8 @@ struct intrhand {
 
 struct intrq {
TAILQ_HEAD(, intrhand)  iq_list;/* handler list */
-   int iq_irq; /* IRQ to mask while handling */
-   int iq_levels;  /* IPL_*'s this IRQ has */
+   int iq_irq_max; /* IRQ to mask while handling */
+   int iq_irq_min; /* lowest IRQ when shared */
int iq_ist; /* share type */
int iq_route;
 };
@@ -746,13 +746,16 @@ agintc_calc_irq(struct agintc_softc *sc, int irq)
min = ih->ih_ipl;
}
 
-   if (sc->sc_handler[irq].iq_irq == max)
-   return;
-   sc->sc_handler[irq].iq_irq = max;
-
if (max == IPL_NONE)
min = IPL_NONE;
 
+   if (sc->sc_handler[irq].iq_irq_max == max &&
+   sc->sc_handler[irq].iq_irq_min == min)
+   return;
+
+   sc->sc_handler[irq].iq_irq_max = max;
+   sc->sc_handler[irq].iq_irq_min = min;
+
 #ifdef DEBUG_AGINTC
if (min != IPL_NONE)
printf("irq %d to block at %d %d \n", irq, max, min );
@@ -828,7 +831,7 @@ agintc_route_irq(void *v, int enable, struct cpu_info *ci)
 
if (enable) {
agintc_set_priority(sc, ih->ih_irq,
-   sc->sc_handler[ih->ih_irq].iq_irq);
+   sc->sc_handler[ih->ih_irq].iq_irq_min);
agintc_route(sc, ih->ih_irq, IRQ_ENABLE, ci);
agintc_intr_enable(sc, ih->ih_irq);
}
@@ -937,7 +940,7 @@ agintc_irq_handler(void *frame)
return;
}
 
-   pri = sc->sc_handler[irq].iq_irq;
+   pri = sc->sc_handler[irq].iq_irq_max;
s = agintc_splraise(pri);
TAILQ_FOREACH(ih, >sc_handler[irq].iq_list, ih_list) {
agintc_run_handler(ih, frame, s);
diff --git a/sys/arch/arm64/dev/ampintc.c b/sys/arch/arm64/dev/ampintc.c
index 2c4ce63a150..de6ed057f95 100644
--- a/sys/arch/arm64/dev/ampintc.c
+++ b/sys/arch/arm64/dev/ampintc.c
@@ -160,8 +160,8 @@ struct intrhand {
 
 struct intrq {
TAILQ_HEAD(, intrhand) iq_list; /* handler list */
-   int iq_irq; /* IRQ to mask while handling */
-   int iq_levels;  /* IPL_*'s this IRQ has */
+   int iq_irq_max; /* IRQ to mask while handling */
+   int iq_irq_min; /* lowest IRQ when shared */
int iq_ist; /* share type */
 };
 
@@ -469,14 +469,16 @@ ampintc_calc_mask(void)
min = ih->ih_ipl;
}
 
-   if 

Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

2018-11-20 Thread Patrick Wildt
On Tue, Nov 20, 2018 at 04:44:56PM +0100, Mark Kettenis wrote:
> > Date: Mon, 19 Nov 2018 20:35:44 +0100
> > From: Patrick Wildt 
> > 
> > On Mon, Nov 19, 2018 at 08:16:31PM +0100, Patrick Wildt wrote:
> > > On Mon, Nov 19, 2018 at 11:27:54AM -0700, Theo de Raadt wrote:
> > > > > That means bnxt(4) and dwpcie(4) share the same interrupt line, but 
> > > > > one
> > > > > has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> > > > > pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> > > > > interrupt is now allowed to interrupt IPL_VM.
> > > > > 
> > > > > I don't think there's a regression, I think it behaves as implemented.
> > > > > The question that remains is: How should it behave if not like that?
> 
> Admittedly the current INTx implementation in dwpcie(4) is a bit of a
> hack.  The hardware has a register that signals which of the four
> legacy PCI interrupts pin was triggered so the driver really should
> expose itself as a secondary interrupt controller.  But proper support
> for secondary interrupt controllers requires a bit of thought.  For
> one thing we'd need a way to unblock interrupts on decondary interrupt
> controllers in response to an splx(9) call.
> 
> > > > When shared, it should operate at the lowest level not the highest.  
> > > > That
> > > > may cause other difficulties.
> > > 
> > > Actually it seems like we do set the lowest level, not the highest,
> > > sorry.  But I think there might be a bug.  I will have a look.
> > > 
> > 
> > I hope I'm not wrong, but I think there's a bug in the calculation
> > code.  The ampinctc_calc_mask() function sets the priority of an IRQ
> > every time a handler is established or disestablished on the given IRQ.
> > In this case, the IPL_AUDIO is probably established before the IPL_NET,
> > thus when the IPL_NET establish happens, iq_irq is already set to "max"
> > as in IPL_AUDIO.  That means the code that sets the priority to "min" is
> > skipped, as the continue statement will take effect.
> > 
> > I guess each intrq object should have the lowest and highest IPL, and
> > the loop should only continue of both are the same.  The lowest IPL is
> > then set for the IPL priority, and the highest IPL is used to splraise()
> > once the IRQ actually hits and the handler is suppoed to run.
> > 
> > Untested, I hope that someone has a look as well and to spot if I have
> > a mistake in my understanding.
> > 
> > We have this issue in at least 4 files: agintc/ampintc for arm64/armv7.
> > 
> > Patrick
> > 
> > diff --git a/sys/arch/arm64/dev/ampintc.c b/sys/arch/arm64/dev/ampintc.c
> > index 2c4ce63a150..a1b28a20ea7 100644
> > --- a/sys/arch/arm64/dev/ampintc.c
> > +++ b/sys/arch/arm64/dev/ampintc.c
> > @@ -160,8 +160,8 @@ struct intrhand {
> >  
> >  struct intrq {
> > TAILQ_HEAD(, intrhand) iq_list; /* handler list */
> > -   int iq_irq; /* IRQ to mask while handling */
> > -   int iq_levels;  /* IPL_*'s this IRQ has */
> > +   int iq_irq_max; /* IRQ to mask while handling */
> > +   int iq_irq_min; /* lowest IRQ when shared */
> > int iq_ist; /* share type */
> >  };
> >  
> > @@ -469,14 +469,16 @@ ampintc_calc_mask(void)
> > min = ih->ih_ipl;
> > }
> >  
> > -   if (sc->sc_handler[irq].iq_irq == max) {
> > -   continue;
> > -   }
> > -   sc->sc_handler[irq].iq_irq = max;
> > -
> > if (max == IPL_NONE)
> > min = IPL_NONE;
> >  
> > +   if (sc->sc_handler[irq].iq_irq_max == max ||
> > +   sc->sc_handler[irq].iq_irq_min == min)
> > +   continue;
> 
> Shouldn't that be && instead of ||?

Yes, you are right, it should be &&.

> > +
> > +   sc->sc_handler[irq].iq_irq_max = max;
> > +   sc->sc_handler[irq].iq_irq_min = min;
> > +
> > /* Enable interrupts at lower levels, clear -> enable */
> > /* Set interrupt priority/enable */
> > if (min != IPL_NONE) {
> > @@ -604,7 +606,7 @@ ampintc_route_irq(void *v, int enable, struct cpu_info 
> > *ci)
> > bus_space_write_4(sc->sc_iot, sc->sc_d_ioh, ICD_ICRn(ih->ih_irq), 0);
> > if (enable) {
> > ampintc_set_priority(ih->ih_irq,
> > -   sc->sc_handler[ih->ih_irq].iq_irq);
> > +   sc->sc_handler[ih->ih_irq].iq_irq_min);
> > ampintc_intr_enable(ih->ih_irq);
> > }
> >  
> > @@ -646,7 +648,7 @@ ampintc_irq_handler(void *frame)
> > if (irq >= sc->sc_nintr)
> > return;
> >  
> > -   pri = sc->sc_handler[irq].iq_irq;
> > +   pri = sc->sc_handler[irq].iq_irq_max;
> > s = ampintc_splraise(pri);
> > TAILQ_FOREACH(ih, >sc_handler[irq].iq_list, ih_list) {
> >  #ifdef MULTIPROCESSOR
> > 
> > 
> 



Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

2018-11-19 Thread Patrick Wildt
On Mon, Nov 19, 2018 at 08:16:31PM +0100, Patrick Wildt wrote:
> On Mon, Nov 19, 2018 at 11:27:54AM -0700, Theo de Raadt wrote:
> > > That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
> > > has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
> > > pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
> > > interrupt is now allowed to interrupt IPL_VM.
> > > 
> > > I don't think there's a regression, I think it behaves as implemented.
> > > The question that remains is: How should it behave if not like that?
> > 
> > When shared, it should operate at the lowest level not the highest.  That
> > may cause other difficulties.
> 
> Actually it seems like we do set the lowest level, not the highest,
> sorry.  But I think there might be a bug.  I will have a look.
> 

I hope I'm not wrong, but I think there's a bug in the calculation
code.  The ampinctc_calc_mask() function sets the priority of an IRQ
every time a handler is established or disestablished on the given IRQ.
In this case, the IPL_AUDIO is probably established before the IPL_NET,
thus when the IPL_NET establish happens, iq_irq is already set to "max"
as in IPL_AUDIO.  That means the code that sets the priority to "min" is
skipped, as the continue statement will take effect.

I guess each intrq object should have the lowest and highest IPL, and
the loop should only continue of both are the same.  The lowest IPL is
then set for the IPL priority, and the highest IPL is used to splraise()
once the IRQ actually hits and the handler is suppoed to run.

Untested, I hope that someone has a look as well and to spot if I have
a mistake in my understanding.

We have this issue in at least 4 files: agintc/ampintc for arm64/armv7.

Patrick

diff --git a/sys/arch/arm64/dev/ampintc.c b/sys/arch/arm64/dev/ampintc.c
index 2c4ce63a150..a1b28a20ea7 100644
--- a/sys/arch/arm64/dev/ampintc.c
+++ b/sys/arch/arm64/dev/ampintc.c
@@ -160,8 +160,8 @@ struct intrhand {
 
 struct intrq {
TAILQ_HEAD(, intrhand) iq_list; /* handler list */
-   int iq_irq; /* IRQ to mask while handling */
-   int iq_levels;  /* IPL_*'s this IRQ has */
+   int iq_irq_max; /* IRQ to mask while handling */
+   int iq_irq_min; /* lowest IRQ when shared */
int iq_ist; /* share type */
 };
 
@@ -469,14 +469,16 @@ ampintc_calc_mask(void)
min = ih->ih_ipl;
}
 
-   if (sc->sc_handler[irq].iq_irq == max) {
-   continue;
-   }
-   sc->sc_handler[irq].iq_irq = max;
-
if (max == IPL_NONE)
min = IPL_NONE;
 
+   if (sc->sc_handler[irq].iq_irq_max == max ||
+   sc->sc_handler[irq].iq_irq_min == min)
+   continue;
+
+   sc->sc_handler[irq].iq_irq_max = max;
+   sc->sc_handler[irq].iq_irq_min = min;
+
/* Enable interrupts at lower levels, clear -> enable */
/* Set interrupt priority/enable */
if (min != IPL_NONE) {
@@ -604,7 +606,7 @@ ampintc_route_irq(void *v, int enable, struct cpu_info *ci)
bus_space_write_4(sc->sc_iot, sc->sc_d_ioh, ICD_ICRn(ih->ih_irq), 0);
if (enable) {
ampintc_set_priority(ih->ih_irq,
-   sc->sc_handler[ih->ih_irq].iq_irq);
+   sc->sc_handler[ih->ih_irq].iq_irq_min);
ampintc_intr_enable(ih->ih_irq);
}
 
@@ -646,7 +648,7 @@ ampintc_irq_handler(void *frame)
if (irq >= sc->sc_nintr)
return;
 
-   pri = sc->sc_handler[irq].iq_irq;
+   pri = sc->sc_handler[irq].iq_irq_max;
s = ampintc_splraise(pri);
TAILQ_FOREACH(ih, >sc_handler[irq].iq_list, ih_list) {
 #ifdef MULTIPROCESSOR



Re: arm64: bnxt(4) crash running pkg_add -u on mcbin single-shot

2018-11-19 Thread Patrick Wildt
On Mon, Nov 19, 2018 at 02:13:52PM -0200, Martin Pieuchot wrote:
> On 18/11/18(Sun) 14:29, Carlos Cardenas wrote:
> > Howdy.
> > 
> > After an upgrade, my mcbin-ss panic'ed while performing pkg_add -u.
> > 
> > Attached is the dmesg and ddb output(s).
> 
> This seems to be a SPL issue.  In the trace below uvm_pmr_remove_1strange()
> is called with `fpageq' lock, which is a IPL_VM mutex.  So the bnx(4)
> interrupt shouldn't be triggered.

Yeah, I think I can tell you why.  First of all, bnxt(4) does not use
MSI, even though that ARM64 platform supports it, because it's disabled
in bnxt(4).  Thus we have to use legacy interrupts.

The device tree for the PCIe nodes specify an interrupt map:


pcie@f260 {
[...]
interrupt-map-mask = <0x 0x 0x 0x>;
interrupt-map = <0x 0x 0x 0x 
0x0008 0x 0x0016 0x0004>;
interrupts = <0x 0x0016 0x0004>;
[...]
};

The "interrupts" property is used to establish an interrupt handler on
the controller itself.  In this case apparently we have a handler that
ACKs INTx interrupts and returns, because someone else will take care of
the actual interrupt handling (i.e. PCIe device driver).

sc->sc_ih = fdt_intr_establish(sc->sc_node, IPL_AUDIO | IPL_MPSAFE,
dwpcie_armada8k_intr, sc, sc->sc_dev.dv_xname);

The PCIe device driver like bnxt(4) maps the interrupt through the
interrupt-map which apparently has the same interrupt ids.

cookie = fdt_intr_establish_imap(sc->sc_node, reg,
sizeof(reg), level, func, arg, name);

That means bnxt(4) and dwpcie(4) share the same interrupt line, but one
has IPL_AUDIO and the other one IPL_NET.  Since the highest IPL on a
pin counts, this line is now IPL_AUDIO (which is >IPL_VM).  A bnxt(4)
interrupt is now allowed to interrupt IPL_VM.

I don't think there's a regression, I think it behaves as implemented.
The question that remains is: How should it behave if not like that?

Patrick



Re: bwfm fails to load firmware on Asus T100HA

2018-11-07 Thread Patrick Wildt
On Wed, Nov 07, 2018 at 10:26:11PM +0100, Jan Johansson wrote:
> Patrick Wildt  wrote:
> > Oh, that's an interesting hardware.  The second x86 machine I see that
> > has bwfm(4) on SDIO.  So the thing is, in addition to your firmware
> > you need a product-specific NVRAM file, which then needs to be processed
> > into a NVRAM binary.
> > 
> > Apparently, like on the other x86 machines, it is stored in the EFI env.
> > There's a guide how to retrieve it using linux[0].  If you can retrieve
> > it and get back to me we can make it work.
> > 
> > [0] https://wiki.debian.org/InstallingDebianOn/Asus/T100TA#WiFi
> 
> Hello!
> 
> Thank you for the respons.
> 
> Using a Debian Live USB it was easy to extract the nvram. I saved
> it as /etc/firmware/brcmfmac43340-sdio.nvram and now the dmesg
> looks like this

That's why I told you to get back to me.  This text file needs to be
processed by a tool that strips comments and adds some checksum to
the file.

The tool reads the text file and creates an nvram.out that should be
moved to /etc/firmware/brcmfmac43340-sdio.nvram.  Simply compile it
and invoke it as:

./nvram brcmfmac43340-sdio.txt

Patrick

--- /dev/null   2018-11-07 23:14:20.0 +0100
+++ nvram.c 2018-02-13 09:43:37.0 +0100
@@ -0,0 +1,480 @@
+/*
+ * Copyright (c) 2013 Broadcom Corporation
+ *
+ * Permission to use, copy, modify, and/or distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY
+ * SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION
+ * OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN
+ * CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+//#include 
+
+#define BRCMF_FW_MAX_NVRAM_SIZE64000
+#define BRCMF_FW_NVRAM_DEVPATH_LEN 19  /* devpath0=pcie/1/4/ */
+#define BRCMF_FW_NVRAM_PCIEDEV_LEN 10  /* pcie/1/4/ + \0 */
+#define BRCMF_FW_DEFAULT_BOARDREV  "boardrev=0xff"
+
+enum nvram_parser_state {
+   IDLE,
+   KEY,
+   VALUE,
+   COMMENT,
+   END
+};
+
+/**
+ * struct nvram_parser - internal info for parser.
+ *
+ * @state: current parser state.
+ * @data: input buffer being parsed.
+ * @nvram: output buffer with parse result.
+ * @nvram_len: lenght of parse result.
+ * @line: current line.
+ * @column: current column in line.
+ * @pos: byte offset in input buffer.
+ * @entry: start position of key,value entry.
+ * @multi_dev_v1: detect pcie multi device v1 (compressed).
+ * @multi_dev_v2: detect pcie multi device v2.
+ * @boardrev_found: nvram contains boardrev information.
+ */
+struct nvram_parser {
+   enum nvram_parser_state state;
+   char *data;
+   char *nvram;
+   uint32_t nvram_len;
+   uint32_t line;
+   uint32_t column;
+   uint32_t pos;
+   uint32_t entry;
+   int multi_dev_v1;
+   int multi_dev_v2;
+   int boardrev_found;
+};
+
+/**
+ * is_nvram_char() - check if char is a valid one for NVRAM entry
+ *
+ * It accepts all printable ASCII chars except for '#' which opens a comment.
+ * Please note that ' ' (space) while accepted is not a valid key name char.
+ */
+static int is_nvram_char(char c)
+{
+   /* comment marker excluded */
+   if (c == '#')
+   return 0;
+
+   /* key and value may have any other readable character */
+   return (c >= 0x20 && c < 0x7f);
+}
+
+static int is_whitespace(char c)
+{
+   return (c == ' ' || c == '\r' || c == '\n' || c == '\t');
+}
+
+static enum nvram_parser_state brcmf_nvram_handle_idle(struct nvram_parser 
*nvp)
+{
+   char c;
+
+   c = nvp->data[nvp->pos];
+   if (c == '\n')
+   return COMMENT;
+   if (is_whitespace(c) || c == '\0')
+   goto proceed;
+   if (c == '#')
+   return COMMENT;
+   if (is_nvram_char(c)) {
+   nvp->entry = nvp->pos;
+   return KEY;
+   }
+   printf("warning: ln=%d:col=%d: ignoring invalid character\n",
+ nvp->line, nvp->column);
+proceed:
+   nvp->column++;
+   nvp->pos++;
+   return IDLE;
+}
+
+static enum nvram_parser_state brcmf_nvram_handle_key(struct nvram_parser *nvp)
+{
+   enum nvram_parser_state st = nvp->state;
+   char c;
+
+   c = nvp->data[nvp->pos];
+   if (c == 

Re: bwfm fails to load firmware on Asus T100HA

2018-10-29 Thread Patrick Wildt
On Sun, Oct 28, 2018 at 06:36:57PM +0100, Jan Johansson wrote:
> >Synopsis:bwfm fails to load firmware on Asus T100HA
> >Category:amd64
> >Environment:
>   System  : OpenBSD 6.4
>   Details : OpenBSD 6.4 (GENERIC.MP) #0: Sun Oct 28 00:20:45 CEST 2018
>
> j...@dax.wenf.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
>   Firmware package has been installed by fw_update but it
> seems bwfm is not completely happy with it on this system.
> Kernel is build from OPENBSD_6_4 source checkout with "option
> BWFM_DEBUG" as I though it might be helpful. In 6.3 the interface
> was not detected at all.
> 
> bwfm0 at sdmmc1 function 1
> bwfm0: F1 signature read @0x1800=1602a94c
> bwfm0: 0x240:0  base 0x wrap 0x18106000
> bwfm0: 0x135:0  base 0x wrap 0x18107000
> bwfm0: 0x81a:18 base 0x18005000 wrap 0x18105000
> bwfm0: 0x80e:20 base 0x18004000 wrap 0x18104000
> bwfm0: 0x82a:7  base 0x18003000 wrap 0x18103000
> bwfm0: 0x829:19 base 0x18002000 wrap 0x18102000
> bwfm0: 0x812:39 base 0x18001000 wrap 0x18101000
> bwfm0: 0x800:43 base 0x1800 wrap 0x1810
> manufacturer 0x02d0, product 0x4334 at sdmmc1 function 2 not configured
> manufacturer 0x02d0, product 0x4334 at sdmmc1 function 3 not configured
> bwfm0: failed loadfirmware of file brcmfmac43340-sdio.nvram

Oh, that's an interesting hardware.  The second x86 machine I see that
has bwfm(4) on SDIO.  So the thing is, in addition to your firmware
you need a product-specific NVRAM file, which then needs to be processed
into a NVRAM binary.

Apparently, like on the other x86 machines, it is stored in the EFI env.
There's a guide how to retrieve it using linux[0].  If you can retrieve
it and get back to me we can make it work.

All the best,
Patrick

[0] https://wiki.debian.org/InstallingDebianOn/Asus/T100TA#WiFi



Re: cannot mount tmpfs on 6.3

2018-04-05 Thread Patrick Wildt
On Thu, Apr 05, 2018 at 12:20:15PM +0900, KAWAMATA Yoshihiro wrote:
> I built a 6.3 kernel with tmpfs enabled.
> 
> Here's a diff of this kernel with GENERIC:
> 
> # diff -u GENERIC GENERIC_TMPFS
> --- GENERIC Thu Mar 15 03:52:16 2018
> +++ GENERIC_TMPFS   Thu Apr  5 00:56:50 2018
> @@ -11,6 +11,7 @@
>  
>  machineamd64
>  include"../../../conf/GENERIC"
> +option TMPFS   # efficient memory file system
>  maxusers   80  # estimated number of users
>  
>  option USER_PCICONF# user-space PCI configuration
> 
> And I tried to mount tmpfs on this kernel, but it didn't:
> 
> # mount_tmpfs tmpfs /root/mnt_test
> mount_tmpfs: tmpfs on /root/mnt_test: Bad address


I have seen that issue as well.  I think it's because we are now
being passed the data and we don't have to do the copyin ourselves.
When the conversion happened, someone forgot to remove the copyin.

Does this help?

diff --git a/sys/tmpfs/tmpfs_vfsops.c b/sys/tmpfs/tmpfs_vfsops.c
index 7fe10c5f00e..b4d671d1160 100644
--- a/sys/tmpfs/tmpfs_vfsops.c
+++ b/sys/tmpfs/tmpfs_vfsops.c
@@ -121,9 +121,6 @@ tmpfs_mount(struct mount *mp, const char *path, void *data,
if (tmpfs_mem_info(1) < TMPFS_PAGES_RESERVED)
return EINVAL;
 
-   error = copyin(data, args, sizeof(struct tmpfs_args));
-   if (error)
-   return error;
if (args->ta_root_uid == VNOVAL || args->ta_root_gid == VNOVAL ||
args->ta_root_mode == VNOVAL)
return EINVAL;



Re: ikectl ca certificate create fails due to invalid subjectAltName

2017-11-07 Thread Patrick Wildt
On Sun, Nov 05, 2017 at 09:19:04PM +0100, Patrick Wildt wrote:
> On Wed, Oct 25, 2017 at 06:03:44AM -0700, j...@posteo.de wrote:
> > The patched submitted by Andrei fixed it for me.
> > There are some style issues, I fixed the ones I saw and reattached the
> > patch.
> 
> Good find by Andrei, I will have a look!
> 

The diff can be massively reduced.  The expand_string() code looks for
$ENV::CERTFQDN and $ENV::CERTIP and replaces those in the config file.
Since IP: and DNS: is prepended in the config file already, there is
no need to do the dance as the prefix will still be there after the
$ENV::STUFF has been replaced.

The issue though is that due to the rework, we store a pointer to the
stack in the "CA config" struct.  The function returns, and another
function then uses that pointer.  But it's the stack, so it's already
garbage.  The important thing from the diff is the strdup().

diff --git a/usr.sbin/ikectl/ikeca.c b/usr.sbin/ikectl/ikeca.c
index 3dacac9e83e..0bf2bbd5738 100644
--- a/usr.sbin/ikectl/ikeca.c
+++ b/usr.sbin/ikectl/ikeca.c
@@ -85,7 +85,7 @@ struct {
 };
 
 /* explicitly list allowed variables */
-const char *ca_env[][2] = {
+char *ca_env[][2] = {
{ "$ENV::CADB", NULL },
{ "$ENV::CASERIAL", NULL },
{ "$ENV::CERTFQDN", NULL },
@@ -899,20 +899,26 @@ void
 ca_clrenv(void)
 {
int  i;
-   for (i = 0; ca_env[i][0] != NULL; i++)
+   for (i = 0; ca_env[i][0] != NULL; i++) {
+   free(ca_env[i][1]);
ca_env[i][1] = NULL;
+   }
 }
 
 void
 ca_setenv(const char *key, const char *value)
 {
int  i;
+   char*p = NULL;
 
for (i = 0; ca_env[i][0] != NULL; i++) {
if (strcmp(ca_env[i][0], key) == 0) {
if (ca_env[i][1] != NULL)
errx(1, "env %s already set: %s", key, value);
-   ca_env[i][1] = value;
+   p = strdup(value);
+   if (p == NULL)
+   err(1, NULL);
+   ca_env[i][1] = p;
return;
}
}



Re: ikectl ca certificate create fails due to invalid subjectAltName

2017-11-05 Thread Patrick Wildt
On Wed, Oct 25, 2017 at 06:03:44AM -0700, j...@posteo.de wrote:
> The patched submitted by Andrei fixed it for me.
> There are some style issues, I fixed the ones I saw and reattached the
> patch.
> 
> Index: ikeca.c
> ===
> RCS file: /cvs/src/usr.sbin/ikectl/ikeca.c,v
> retrieving revision 1.46
> diff -u -p -r1.46 ikeca.c
> --- ikeca.c   8 Jun 2017 11:45:44 -   1.46
> +++ ikeca.c   25 Oct 2017 12:51:59 -
> @@ -85,11 +85,11 @@ struct {
>  };
>  
>  /* explicitly list allowed variables */
> -const char *ca_env[][2] = {
> +char *ca_env[][2] = {
>   { "$ENV::CADB", NULL },
>   { "$ENV::CASERIAL", NULL },
> - { "$ENV::CERTFQDN", NULL },
> - { "$ENV::CERTIP", NULL },
> + { "DNS:$ENV::CERTFQDN", NULL },
> + { "IP:$ENV::CERTIP", NULL },
>   { "$ENV::CERTPATHLEN", NULL },
>   { "$ENV::CERTUSAGE", NULL },
>   { "$ENV::CERT_C", NULL },
> @@ -202,23 +202,26 @@ ca_request(struct ca *ca, char *keyname,
>  {
>   charcmd[PATH_MAX * 2];
>   charhostname[HOST_NAME_MAX+1];
> - charname[128];
> + charsubjaltname[HOST_NAME_MAX+5];
>   charpath[PATH_MAX];
>  
>   ca_setenv("$ENV::CERT_CN", keyname);
>  
> - strlcpy(name, keyname, sizeof(name));
> -
>   if (type == HOST_IPADDR) {
> - ca_setenv("$ENV::CERTIP", name);
> + snprintf(subjaltname, sizeof(subjaltname), "IP:%s", keyname);
> + ca_setenv("IP:$ENV::CERTIP", subjaltname);
>   ca_setenv("$ENV::REQ_EXT", "x509v3_IPAddr");
>   } else if (type == HOST_FQDN) {
>   if (!strcmp(keyname, "local")) {
>   if (gethostname(hostname, sizeof(hostname)))
>   err(1, "gethostname");
> - strlcpy(name, hostname, sizeof(name));
> + snprintf(subjaltname, sizeof(subjaltname), "DNS:%s",
> + hostname);
> + } else {
> + snprintf(subjaltname, sizeof(subjaltname), "DNS:%s",
> + keyname);
>   }
> - ca_setenv("$ENV::CERTFQDN", name);
> + ca_setenv("DNS:$ENV::CERTFQDN", subjaltname);
>   ca_setenv("$ENV::REQ_EXT", "x509v3_FQDN");
>   } else {
>   errx(1, "unknown host type %d", type);
> @@ -306,6 +309,9 @@ ca_certificate(struct ca *ca, char *keyn
>   ca_request(ca, keyname, type);
>   ca_sign(ca, keyname, type);
>  
> + /* call ca_clrenv again to free the char*'s allocated by ca_setenv */
> + ca_clrenv();
> +
>   return (0);
>  }
>  
> @@ -440,6 +446,9 @@ ca_create(struct ca *ca)
>   /* Create the CRL revocation list */
>   ca_revoke(ca, NULL);
>  
> + /* call ca_clrenv again to free the char*'s allocated by ca_setenv */
> + ca_clrenv();
> +
>   return (0);
>  }
>  
> @@ -892,6 +901,11 @@ ca_revoke(struct ca *ca, char *keyname)
>   ca->passfile, ca->sslpath, ca->sslpath);
>   system(cmd);
>  
> + if (keyname) {
> + /* ca_revoke() called directly from ca_opt() so free char *'s */
> + ca_clrenv();
> + }
> +
>   return (0);
>  }
>  
> @@ -899,20 +913,26 @@ void
>  ca_clrenv(void)
>  {
>   int  i;
> - for (i = 0; ca_env[i][0] != NULL; i++)
> + for (i = 0; ca_env[i][0] != NULL; i++) {
> + free((char *) ca_env[i][1]);
>   ca_env[i][1] = NULL;
> + }
>  }
>  
>  void
>  ca_setenv(const char *key, const char *value)
>  {
>   int  i;
> + char*p = NULL;
>  
>   for (i = 0; ca_env[i][0] != NULL; i++) {
>   if (strcmp(ca_env[i][0], key) == 0) {
>   if (ca_env[i][1] != NULL)
>   errx(1, "env %s already set: %s", key, value);
> - ca_env[i][1] = value;
> + p = strdup(value);
> + if (p == NULL)
> + err(1, NULL);
> + ca_env[i][1] = p;
>   return;
>   }
>   }

Good find by Andrei, I will have a look!



Re: [RPI3] Kernel Relink

2017-08-24 Thread Patrick Wildt
On Wed, Aug 23, 2017 at 08:30:39PM -0300, R0me0 *** wrote:
> I have noticed this just with RPI3, every reboot, kernel relink fails. ( It
> comes from 6.1-current )
> 
> /usr/share/compile/GENERIC/relink.log
> 
> (SHA256) /bsd: OK
> LD="ld" sh makegap.sh 0xd4d4d4d4 gapdummy.o
> ld: error: gap.link:11: unknown command ;
> ld: error: gap.link:11: LONG(0xd4d4d4d4);
> ld: error: gap.link:11:   ^
> ld: error: cannot open gapdummy.o: No such file or directory
> ld: error: target emulation unknown: -m or at least one .o file required
> *** Error 1 in /usr/share/compile/GENERIC (Makefile:531 'newbsd')
> 
> 
> is it expected?
> 
> Thanks in advance,
> 
> 
> dmesg:
> 
> 
> OpenBSD 6.2-beta (GENERIC) #0: Tue Aug 22 17:53:50 AEST 2017
>j...@arm64.jsg.id.au:/usr/src/sys/arch/arm64/compile/GENERIC
> real mem  = 965042176 (920MB)
> avail mem = 909217792 (867MB)
> mainbus0 at root: Raspberry Pi 3 Model B Rev 1.2
> cpu0 at mainbus0: ARM Cortex-A53 r0p4
> simplefb0 at mainbus0: 656x416
> wsdisplay0 at simplefb0 mux 1
> wsdisplay0: screen 0 added (std, vt100 emulation)
> simplebus0 at mainbus0: "soc"
> bcmintc0 at simplebus0
> bcmdog0 at simplebus0
> pluart0 at simplebus0
> bcmaux0 at simplebus0
> com0 at simplebus0: ns16550, no working fifo
> com0: console
> dwctwo0 at simplebus0
> agtimer0 at simplebus0: tick rate 19200 KHz
> simplebus1 at mainbus0: "clocks"
> usb0 at dwctwo0: USB revision 2.0
> uhub0 at usb0 configuration 1 interface 0 "Broadcom DWC2 root hub" rev
> 2.00/1.00 addr 1
> uhub1 at uhub0 port 1 configuration 1 interface 0 "Standard Microsystems
> product 0x9514" rev 2.00/2.00 addr 2
> smsc0 at uhub1 port 1 configuration 1 interface 0 "Standard Microsystems
> SMSC9512/14" rev 2.00/2.00 addr 3
> smsc0: address b8:27:eb:78:27:cd
> ukphy0 at smsc0 phy 1: Generic IEEE 802.3u media interface, rev. 3: OUI
> 0x0001f0, model 0x000c
> umass0 at uhub1 port 3 configuration 1 interface 0 "Kingston DataTraveler
> 2.0" rev 2.00/1.00 addr 4
> umass0: using SCSI over Bulk-Only
> scsibus0 at umass0: 2 targets, initiator 0
> sd0 at scsibus0 targ 1 lun 0:  SCSI2
> 0/direct removable serial.09511665FDB1A9792441
> sd0: 14762MB, 512 bytes/sector, 30233588 sectors
> urtwn0 at uhub1 port 4 configuration 1 interface 0 "802.11n USB WLAN" rev
> 2.00/2.00 addr 5
> urtwn0: MAC/BB RTL8192CU, RF 6052 2T2R, address 98:de:d0:11:ac:a1
> vscsi0 at root
> scsibus1 at vscsi0: 256 targets
> softraid0 at root
> scsibus2 at softraid0: 256 targets
> bootfile: sd0a:/bsd
> boot device: sd0
> root on sd0a (5023b2de782a611f.a) swap on sd0b dump on sd0b
> WARNING: CHECK AND RESET THE DATE!

This is to be expected until LLVM 5.0 hits the tree, which should have
an improved lld/linker that can cope with the magic linker script.



arm: pmap uvm_fault findings

2016-07-26 Thread Patrick Wildt
Hi,

I've been trying to debug a pmap issue on ARM.  I have no solution, but
I would like to share my findings so far.

First of all, the panic/uvm_fault:


login:
uvm_fault(0xca06c858, 0, 1, 0) -> e
Fatal kernel mode data abort: 'Translation Fault (P)'
trapframe: 0xc986bd98
DFSR=0007, DFAR=000c, spsr=6113
r0 =, r1 =ccaa7148, r2 =00136000, r3 =
r4 =0012, r5 =ccaa7148, r6 =3213, r7 =0005
r8 =3213000e, r9 =0021, r10=cc634698, r11=c986be34
r12=cd43a9a0, ssp=c986bde8, slr=c054c1d0, pc =c054c1d0

Stopped at  pmap_enter+0x178:   ldr r6, [r0, #0x00c]
ddb> trace
pmap_enter+0xc
scp=0xc054c064 rlv=0xc04f60c0 (uvm_fault+0xa28)
rsp=0xc986be38 rfp=0xc986bf54
r10=0x r9=0x0001 r8=0x r7=0xc050fe80
r6=0x r5=0xc4a5aad0 r4=0x0001
uvm_fault+0xc
scp=0xc04f56a4 rlv=0xc0547e68 (data_abort_handler+0x264)
rsp=0xc986bf58 rfp=0xc986bfac
r10=0xc986bfb0 r9=0x0007 r8=0xc986a000 r7=0x0001
r6=0xca06c858 r5=0x0001 r4=0x00136000
data_abort_handler+0xc
scp=0xc0547c10 rlv=0xc05477bc (exception_exit)
rsp=0xc986bfb0 rfp=0xbffc2430
r10=0x00da776c r9=0x0001 r8=0x60165dbc r7=0x
r6=0x0200 r5=0x00d0c920 r4=0x4188e9f4

The faulting instruction is in pmap_enter() of pamp7.c:

if (opg) {
/*
 * Replacing an existing mapping with a new one.
 * It is part of our managed memory so we
 * must remove it from the PV list
 */
pve = pmap_remove_pv(opg, pm, va);
pve is NULL ->  oflags = pve->pv_flags;
} else

How did we come here?  pmap_enter() was called to enter mapping for a
given virtual and physical address.  The VA is 0x136000.  Then it looked
if there is already a mapping at the given address.  Turns out there is!
opte (old pagetable entry) is set to 0x429f202c.  This is an "invalid"
(means not active) page entry pointing to the physical page 0x429f2000.
Using the vm_pages struct for that physical address, we can have a look
at its virtual addresses.

vm_pages:
0xc4f86754: 429f1000
0xc4f86758: 0
0xc4f8675c: 0
0xc4f86760: 3
0xc4f86764: 0
0xc4f86768: 0
0xc4f8676c: 0
0xc4f86770: c47263f0
0xc4f86774: c4f86590
0xc4f86778: c4a721d0
0xc4f8677c: c520d200
0xc4f86780: 0
0xc4f86784: 0
0xc4f86788: ca32b230
0xc4f8678c: 0
0xc4f86790: 0
0xc4f86794: 0
0xc4f86798: 14
0xc4f8679c: 41
0xc4f867a0: 0
0xc4f867a4: 429f2000  <-- vm_page.phys_addr
0xc4f867a8: 0
0xc4f867ac: cd43a9a0  <-- vm_page.mdpage.pvh_list
0xc4f867b0: 3
0xc4f867b4: 0
0xc4f867b8: 0
0xc4f867bc: 0
0xc4f867c0: 
0xc4f867c4: 
0xc4f867c8: c4f50760
0xc4f867cc: c4c67990
0xc4f867d0: c50374d0
0xc4f867d4: 0
0xc4f867d8: 0
0xc4f867dc: c0714248
0xc4f867e0: cf2c6000
0xc4f867e4: 0
0xc4f867e8: 4d
0xc4f867ec: 385
0xc4f867f0: 1
0xc4f867f4: 429f3000

vm_page.mdpage.pvh_list:
0xcd43a9a0: 0 <-- next ptr in list
0xcd43a9a4: ccaa7148 <- pmap
0xcd43a9a8: 717c7000 <- va
0xcd43a9ac: b <- flags

This means, the physical page already has a VA.  Oh, and it belongs
to the same pmap.  Have a look at register r5, that's the pmap we want
to enter a mapping for.  But it's not the kernel pmap:

ddb> print kernel_pmap_store
c073b8f4

So wait, the only VA for that physical address is 0x717c7000; but the
VA I used to look the physical page up is 0x136000.  That is weird.

So now we got a pmap_enter() for a physical address and a virtual
address, where the virtual address is already used by the same pmap
for another physical address.  But that physical address thinks a
completely different VA is using it.

The phys address behind the mapping is 0x429f2000.  The phys address
it wanted to map the VA to is in R6: 0x3213.

That's probably not enough to find the exact fault, but it's a writeup
of what I have found so far.

Patrick



Re: ARMv7 Kernel crash (most recent snapshot) on Cubieboard 2

2016-03-19 Thread Patrick Wildt
On Sat, Mar 19, 2016 at 11:36:14AM +0100, Peter Oruba wrote:
> Hello,
> 
> I was testing the most recent ARMv7 snapshot on my Cubieboard2 and got the 
> crash documented in the attached file. Please let me know if I can be of any 
> help.
> 
> Thanks,
> Peter
> 

Hi,

unfortunately that issue is known but not yet resolved.

There's a temporary workaround for it.  I think it would be better to
only flush the TLB for a specified VA, not the entire TLB.  I have not
yet cooked a diff for that though.

Patrick

diff --git sys/arch/arm/include/pmap.h sys/arch/arm/include/pmap.h
index f8a4c9e..50338a9 100644
--- sys/arch/arm/include/pmap.h
+++ sys/arch/arm/include/pmap.h
@@ -328,6 +328,8 @@ do {
\
};  \
cpu_drain_writebuf();   \
}   \
+   asm("mcr p15, 0, r0, c8, c7, 0 /* Invalidate entire unified TLB */");   
\
+   cpu_drain_writebuf();   \
 } while (/*CONSTCOND*/0)
 
 #definePTE_SYNC_RANGE(pte, cnt)
\
@@ -344,6 +346,8 @@ do {
\
};  \
cpu_drain_writebuf();   \
}   \
+   asm("mcr p15, 0, r0, c8, c7, 0 /* Invalidate entire unified TLB */");   
\
+   cpu_drain_writebuf();   \
 } while (/*CONSTCOND*/0)
 
 #definel1pte_valid(pde)(((pde) & L1_TYPE_MASK) != L1_TYPE_INV)