Re: endless loop in tcpdump
dunno why the strange combination of To/Cc headers so I'll keep bugs@ in Cc:

On Sat, Oct 24 2020, p...@centroid.eu wrote:
>>Synopsis: a specially crafted packet can set tcpdump into an endless loop
>>Category: system
>>Environment:
> System : OpenBSD 6.8
> Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct 4 18:13:26 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
>>Description:
> I was (un)fortunate enough to have this treat of a bug from a cracker
> on the 'net. Thanks! They sent me an infinite loop in tcpdump. I bug hunted
> and shared my findings on the misc@ mailing list. I wasn't sure because I'm
> not the best code reader so I crafted a DoS to exploit this infinite loop.
> I have mailed OpenBSD off-list with this proof of concept code.
>>How-To-Repeat:
> before patch:
>
> tcpdump -v -n -i lo0 port
>
> 10:30:54.667969 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok] GTPv1-C (teid
> 0, len 0) [MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support]
> [MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support]
> [MBMS Support] [MBMS Support] [MBMS Support] ...
> 2 packets received by filter
> 0 packets dropped by kernel
>
> This was triggered with netcat:
>
> nc -up 2123 localhost < dos.packet
>
>>Fix:
> after patch:
>
> tcpdump.p: listening on lo0, link-type LOOP
> 10:43:41.005389 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok] GTPv1-C (teid
> 0, len 0) [|GTPv1-C] (ttl 64, id 58060, len 41)
> ^C
> 2 packets received by filter
> 0 packets dropped by kernel
> spica# tcpdump.p -v -n -i lo0 -Xp-sp1500pport8
> tcpdump.p: listening on lo0, link-type LOOP
> 10:44:11.956464 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok] GTPv1-C (teid
> 0, len 0) [|GTPv1-C] (ttl 64, id 18693, len 41)
> : 4500 0029 4905 4011 33bd 7f00 0001  E..)I...@.3.
> 0010: 7f00 0001 084b 22b8 0015 a2bd 3400  .K".4...
> 0020: 0001 00  .
>
> ^C
> 2 packets received by filter
> 0 packets dropped by kernel
>
> The patch looks like follows:
>
> Index: print-gtp.c
> ===
> RCS file: /cvs/src/usr.sbin/tcpdump/print-gtp.c,v
> retrieving revision 1.12
> diff -u -p -u -r1.12 print-gtp.c
> --- print-gtp.c 20 May 2020 01:20:37 - 1.12
> +++ print-gtp.c 24 Oct 2020 08:56:00 -
> @@ -927,6 +927,8 @@ gtp_v1_print(const u_char *cp, u_int len
>
> 	/* Header length is a 4 octet multiplier. */

I've never seen GTP in the wild but indeed a length of zero is invalid.
In case it can help other reviewers:
https://www.etsi.org/deliver/etsi_ts/129000_129099/129060/15.05.00_60/ts_129060v150500p.pdf

> 	hlen = (int)p[0] * 4;
> +	if (hlen == 0)
> +		goto trunc;

LGTM, though I'd suggest printing why we're bailing out.

Index: print-gtp.c
===
RCS file: /d/cvs/src/usr.sbin/tcpdump/print-gtp.c,v
retrieving revision 1.12
diff -u -p -p -u -r1.12 print-gtp.c
--- print-gtp.c 20 May 2020 01:20:37 - 1.12
+++ print-gtp.c 24 Oct 2020 22:02:47 -
@@ -927,6 +927,11 @@ gtp_v1_print(const u_char *cp, u_int len

 	/* Header length is a 4 octet multiplier. */
 	hlen = (int)p[0] * 4;
+	if (hlen == 0) {
+		printf(" [Invalid zero-length header %u]",
+		    nexthdr);
+		goto trunc;
+	}

 	TCHECK2(p[0], hlen);

 	switch (nexthdr) {

-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF DDCC 0DFA 74AE 1524 E7EE
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
> 2020-10-24 16:41 GMT+02:00 Stefan Sperling :
> > On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
> > > Fair enough, but "there's no auto-assembly and it's inefficient and
> > > nothing stops you from messing with the intermediate discipline" is a
> > > different kind of not supported than "you should expect kernel panics".
> > >
> > > If the latter is the case, maybe it should be documented in the
> > > softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
> >
> > Neither Joel's mail nor the word "unsupported" imply a promise
> > that it will work without auto-assembly and with inefficient i/o.
> >
> > Unsupported means unsupported. We don't need to list any reasons
> > for this in user-facing documentation.
>
> I'm not suggesting justifying why, I am saying that softraid(4) is
> documented to assemble sd(4) devices into sd(4) devices. If it's
> actually "sd(4) devices that are not themselves softraid(4) backed",
> that would be worth documenting as it breaks the sd(4) abstraction.
>
> Said another way, how was I supposed to find out this is unsupported?
> It's not like "a mirrored full-disk encrypted device" is an exotic
> configuration that would give me pause.

It's documented in the FAQ:

> Note that "stacking" softraid modes (mirrored drives and encryption,
> for example) is not supported at this time

https://www.openbsd.org/faq/faq14.html#softraidFDE
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
2020-10-24 19:26 GMT+02:00 Theo de Raadt :
> Filippo Valsorda wrote:
>
> > 2020-10-24 19:01 GMT+02:00 Theo de Raadt :
> > > Filippo Valsorda wrote:
> > > > Said another way, how was I supposed to find out this is unsupported?
> > >
> > > The way you just found out.
> > >
> > > > It's not like "a mirrored full-disk encrypted device" is an exotic
> > > > configuration that would give me pause.
> > >
> > > there's a song that goes "You can't always get what you want"
> > >
> > > Nothing is perfect. Do people rail against other groups in the same way?
> >
> > Alright, I'm disengaging.
> >
> > This was a bizarre interaction, I just reported a crash that doesn't
> > even affect me anymore (I was disassembling that system), trying to
> > follow the reporting guidelines as much as possible, for something that
> > I had no way of knowing was unsupported.
>
> You are disengaging... but just have to get ONE MORE snipe in!
>
> Meanwhile, no diff. Not for the kernel, that would be difficult.
>
> But no diff for the manual pages either (it is rather obvious that
> the people who hit this would know what pages they read, and
> where they should have seen a warning, and what form it should take)

Ah, if you're interested in a patch for the manual page, happy to send
one. I'll read the contribution docs and send one tomorrow.

I had suggested both the page and the section where I would have found
a warning, but sending a diff telling you what you support and what you
don't felt more like overstepping. In my own projects, I prefer users
don't do that, as they can't know the boundary of what is supported and
what is not.
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
Filippo Valsorda wrote:

> 2020-10-24 19:01 GMT+02:00 Theo de Raadt :
> > Filippo Valsorda wrote:
> > > Said another way, how was I supposed to find out this is unsupported?
> >
> > The way you just found out.
> >
> > > It's not like "a mirrored full-disk encrypted device" is an exotic
> > > configuration that would give me pause.
> >
> > there's a song that goes "You can't always get what you want"
> >
> > Nothing is perfect. Do people rail against other groups in the same way?
>
> Alright, I'm disengaging.
>
> This was a bizarre interaction, I just reported a crash that doesn't
> even affect me anymore (I was disassembling that system), trying to
> follow the reporting guidelines as much as possible, for something that
> I had no way of knowing was unsupported.

You are disengaging... but just have to get ONE MORE snipe in!

Meanwhile, no diff. Not for the kernel, that would be difficult.

But no diff for the manual pages either (it is rather obvious that
the people who hit this would know what pages they read, and
where they should have seen a warning, and what form it should take)

But no. Either the margin is too narrow for such a diff, or it's easier
to assume that "I am right" commentary will generate results.

Some users really are their own worst enemy.
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
2020-10-24 19:01 GMT+02:00 Theo de Raadt :
> Filippo Valsorda wrote:
> > Said another way, how was I supposed to find out this is unsupported?
>
> The way you just found out.
>
> > It's not like "a mirrored full-disk encrypted device" is an exotic
> > configuration that would give me pause.
>
> there's a song that goes "You can't always get what you want"
>
> Nothing is perfect. Do people rail against other groups in the same way?

Alright, I'm disengaging.

This was a bizarre interaction, I just reported a crash that doesn't
even affect me anymore (I was disassembling that system), trying to
follow the reporting guidelines as much as possible, for something that
I had no way of knowing was unsupported.
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
Filippo Valsorda wrote:

> Said another way, how was I supposed to find out this is unsupported?

The way you just found out.

> It's not like "a mirrored full-disk encrypted device" is an exotic
> configuration that would give me pause.

there's a song that goes "You can't always get what you want"

Nothing is perfect. Do people rail against other groups in the same way?
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
2020-10-24 16:41 GMT+02:00 Stefan Sperling :
> On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
> > Fair enough, but "there's no auto-assembly and it's inefficient and
> > nothing stops you from messing with the intermediate discipline" is a
> > different kind of not supported than "you should expect kernel panics".
> >
> > If the latter is the case, maybe it should be documented in the
> > softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
>
> Neither Joel's mail nor the word "unsupported" imply a promise
> that it will work without auto-assembly and with inefficient i/o.
>
> Unsupported means unsupported. We don't need to list any reasons
> for this in user-facing documentation.

I'm not suggesting justifying why, I am saying that softraid(4) is
documented to assemble sd(4) devices into sd(4) devices. If it's
actually "sd(4) devices that are not themselves softraid(4) backed",
that would be worth documenting as it breaks the sd(4) abstraction.

Said another way, how was I supposed to find out this is unsupported?
It's not like "a mirrored full-disk encrypted device" is an exotic
configuration that would give me pause.
Intel I219-LM Failure
Following on from Message-ID
https://marc.info/?i=3f8cd8d2d3e7b4e8%20()%20p50%20!%20cerberus%20!%20com

This error notification occurs when the laptop is rebooted. If one boots
the device with a Linux distribution (USB) and then powers it off, the
next time OpenBSD boots up the error message is gone. As soon as one
reboots OpenBSD the error message is back. The only way to avoid the
error message is to power off the device ("doas halt -p") and not reboot.

OpenBSD 6.8-current (GENERIC.MP) #128: Tue Oct 20 23:51:05 MDT 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16977334272 (16190MB)
avail mem = 16447705088 (15685MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x56fd5000 (67 entries)
bios0: vendor LENOVO version "N1EET89W (1.62 )" date 06/18/2020
bios0: LENOVO 20ENCTO1WW
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP UEFI SSDT SSDT ECDT HPET APIC MCFG SSDT DBGP DBG2 BOOT BATB SSDT SSDT MSDM SSDT SSDT DMAR ASF!
FPDT BGRT UEFI
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) XHCI(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpiec0 at acpi0
acpihpet0 at acpi0: 2399 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2594.91 MHz, 06-5e-03
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2593.96 MHz, 06-5e-03
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2593.96 MHz, 06-5e-03
cpu2:
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2593.96 MHz, 06-5e-03
cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 1 (application processor)
cpu4: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2593.96 MHz, 06-5e-03
cpu4:
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
Demi M. Obenour wrote:
> On 10/24/20 10:41 AM, Stefan Sperling wrote:
> > On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
> >> Fair enough, but "there's no auto-assembly and it's inefficient and
> >> nothing stops you from messing with the intermediate discipline" is a
> >> different kind of not supported than "you should expect kernel panics".
> >>
> >> If the latter is the case, maybe it should be documented in the
> >> softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
> >
> > Neither Joel's mail nor the word "unsupported" imply a promise
> > that it will work without auto-assembly and with inefficient i/o.
> >
> > Unsupported means unsupported. We don't need to list any reasons
> > for this in user-facing documentation.
>
> One could also argue that the kernel must never panic because userspace
> did something wrong. The only exceptions I am aware of are:
>
> - init dying
> - corrupt kernel image
> - corrupt root filesystem
> - not being able to mount the root filesystem
> - overwriting kernel memory with /dev/mem or DMA
> - hardware fault

Really.

	rm -rf /
	reboot

Oh my god, it panics on reboot.

And hundreds of other possible ways for root to configure a broken
system for the next operation.

Sadly, the margin was too narrow for any solution in the form of source
code or diff; instead we as developers get instructed on What To Do.

If you guys aren't part of the solution, you are part of the precipitate.
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
On 10/24/20 10:41 AM, Stefan Sperling wrote:
> On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
>> Fair enough, but "there's no auto-assembly and it's inefficient and
>> nothing stops you from messing with the intermediate discipline" is a
>> different kind of not supported than "you should expect kernel panics".
>>
>> If the latter is the case, maybe it should be documented in the
>> softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
>
> Neither Joel's mail nor the word "unsupported" imply a promise
> that it will work without auto-assembly and with inefficient i/o.
>
> Unsupported means unsupported. We don't need to list any reasons
> for this in user-facing documentation.

One could also argue that the kernel must never panic because userspace
did something wrong. The only exceptions I am aware of are:

- init dying
- corrupt kernel image
- corrupt root filesystem
- not being able to mount the root filesystem
- overwriting kernel memory with /dev/mem or DMA
- hardware fault

In particular, I would expect that at securelevel 1 or higher, userspace
should not be able to cause a fatal kernel page fault.

Demi
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
> Fair enough, but "there's no auto-assembly and it's inefficient and
> nothing stops you from messing with the intermediate discipline" is a
> different kind of not supported than "you should expect kernel panics".
>
> If the latter is the case, maybe it should be documented in the
> softraid(4) CAVEATS, as it breaks the sd(4) abstraction.

Neither Joel's mail nor the word "unsupported" imply a promise
that it will work without auto-assembly and with inefficient i/o.

Unsupported means unsupported. We don't need to list any reasons
for this in user-facing documentation.
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
2020-10-24 15:37 GMT+02:00 Stefan Sperling :
> On Sat, Oct 24, 2020 at 03:10:05PM +0200, Filippo Valsorda wrote:
> > >Synopsis: kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
> > >Category: kernel
> > >Environment:
> > System : OpenBSD 6.8
> > Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct 4 18:13:26 MDT 2020
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Starting with two RAID 1 arrays with a CRYPTO device on top of each (see
> > "devices" below), I first unmounted the filesystems, then successfully
> > detached the CRYPTO devices, and then tried to detach the first RAID 1.
>
> Stacking softraid volumes is not supported.
>
> That's the best answer at this point in time. It's clear that a
> raid1+crypto solution is needed, but nobody has done the work to make
> it happen.
>
> As Joel explains here:
> https://marc.info/?l=openbsd-misc&m=154349798307366&w=2
> you get to keep the pieces when it breaks.

Fair enough, but "there's no auto-assembly and it's inefficient and
nothing stops you from messing with the intermediate discipline" is a
different kind of not supported than "you should expect kernel panics".

If the latter is the case, maybe it should be documented in the
softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
On Sat, Oct 24, 2020 at 03:10:05PM +0200, Filippo Valsorda wrote:
> >Synopsis: kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
> >Category: kernel
> >Environment:
> System : OpenBSD 6.8
> Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct 4 18:13:26 MDT 2020
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
> Architecture: OpenBSD.amd64
> Machine : amd64
> >Description:
> Starting with two RAID 1 arrays with a CRYPTO device on top of each (see
> "devices" below), I first unmounted the filesystems, then successfully
> detached the CRYPTO devices, and then tried to detach the first RAID 1.

Stacking softraid volumes is not supported.

That's the best answer at this point in time. It's clear that a
raid1+crypto solution is needed, but nobody has done the work to make
it happen.

As Joel explains here:
https://marc.info/?l=openbsd-misc&m=154349798307366&w=2
you get to keep the pieces when it breaks.
Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
>Synopsis: kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
>Category: kernel
>Environment:
System : OpenBSD 6.8
Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct 4 18:13:26 MDT 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Starting with two RAID 1 arrays with a CRYPTO device on top of each (see
"devices" below), I first unmounted the filesystems, then successfully
detached the CRYPTO devices, and then tried to detach the first RAID 1.

# bioctl -d sd8	# CRYPTO on top of sd5a
# bioctl -d sd7	# CRYPTO on top of sd6a
# bioctl -d sd5	# RAID 1 on top of sd2a,sd3a

The machine immediately dropped into ddb, which I reached from the
serial console. I was unable to recover the console output before the
panic.

ddb{0}> show panic
kernel page fault
uvm_fault(0x82153778, 0x0002, 0, 1) -> e
bufq_destroy(80115710) at bufq_destroy+0x83
end trace frame: 0x8000227379a0, count: 0

ddb{0}> trace
bufq_destroy(80115710) at bufq_destroy+0x83
sddetach(80115600,1) at sddetach+0x42
config_detach(80115600,1) at config_detach+0x142
scsi_detach_link(80125800,1) at scsi_detach_link+0x4d
sr_discipline_shutdown(80124000,1,0) at sr_discipline_shutdown+0x13e
sr_bio_handler(8012,80124000,c2d04227,80655c00) at sr_bio_handler+0x1ce
sdioctl(d52,c2d04227,80655c00,3,800022604030) at sdioctl+0x4e9
VOP_IOCTL(fd8234962018,c2d04227,80655c00,3,fd828b7bd300,800022604030) at VOP_IOCTL+0x55
vn_ioctl(fd80edfe2788,c2d04227,80655c00,800022604030) at vn_ioctl+0x75
sys_ioctl(800022604030,800022737e60,800022737ec0) at sys_ioctl+0x2d4
syscall(800022737f30) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7de2e0, count: -12

After rebooting that RAID 1 was gone. I successfully detached the
remaining RAID 1, and generated the sendbug(1) output after wiping
sd0,sd1,sd2,sd3.
devices:

==> sd0 / duid: b27184271af409ec
sd0: , serial WD-WCC4N2HCXC13
  a: 2794.5G       64  RAID
  c: 2794.5G        0  unused

==> sd1 / duid: cbcf06a16339dea8
sd1: , serial WD-WCC4N2KYDERF
  a: 2794.5G       64  RAID
  c: 2794.5G        0  unused

==> sd2 / duid: 612a362a9042bc12
sd2: , serial WD-WCC4N4ARTXUZ
  a: 2794.5G       64  RAID
  c: 2794.5G        0  unused

==> sd3 / duid: 6c4bf1c584cf2ad5
sd3: , serial WD-WCC4N5VRVF3C
  a: 2794.5G       64  RAID
  c: 2794.5G        0  unused

==> sd4 / duid: 3141c7e6e6fc07f5
sd4: , serial (unknown)
  a:    0.3G       64  4.2BSD 2048 16384  5657 # /
  b:    0.5G   730144  swap
  c:   14.6G        0  unused
  d:    0.4G  1739776  4.2BSD 2048 16384  7206 # /mfs/dev
  e:    0.6G  2662144  4.2BSD 2048 16384  9870 # /mfs/var
  f:    1.9G  3925504  4.2BSD 2048 16384 12960 # /usr
  g:    0.5G  7843264  4.2BSD 2048 16384  8061 # /home
  h:    1.6G  8883424  4.2BSD 2048 16384 12960 # /usr/local
  i:    1.4G 12249248  4.2BSD 2048 16384 12960 # /usr/src
  j:    5.2G 15080800  4.2BSD 2048 16384 12960 # /usr/obj

==> sd5 / duid: 7cc6a1f7b9a86bc3
Volume      Status    Size           Device
softraid0 0 Online    3000592646656  sd5 RAID1
          0 Online    3000592646656  0:0.0 noencl
          1 Online    3000592646656  0:1.0 noencl
  a: 2794.5G        0  RAID
  c: 2794.5G        0  unused

==> sd6 / duid: b5e9c71ced61dca9
Volume      Status    Size           Device
softraid0 1 Online    3000592646656  sd6 RAID1
          0 Online    3000592646656  1:0.0 noencl
          1 Online    3000592646656  1:1.0 noencl
  a: 1863.0G        0  RAID
  c: 2794.5G        0  unused
  d:  930.5G 3907021696  4.2BSD 8192 65536 52238 # /array/ct

==> sd7 / duid: d74b6781f7ca7b59
Volume      Status    Size           Device
softraid0 2 Online    2000394827264  sd7 CRYPTO
          0 Online    2000394827264  2:0.0 noencl
  a: 1863.0G       64  4.2BSD 8192 65536 24688 # /array/misc
  c: 1863.0G        0  unused

==> sd8 / duid: 246216bff2633caa
Volume      Status    Size           Device
softraid0 3 Online    3000592376320  sd8 CRYPTO
          0 Online    3000592376320  3:0.0 noencl
  a:  931.5G        0  4.2BSD 8192 65536 52238 # /array/main
  c: 2794.5G
endless loop in tcpdump
>Synopsis: a specially crafted packet can set tcpdump into an endless loop
>Category: system
>Environment:
System : OpenBSD 6.8
Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct 4 18:13:26 MDT 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
I was (un)fortunate enough to have this treat of a bug from a cracker
on the 'net. Thanks! They sent me an infinite loop in tcpdump. I bug
hunted and shared my findings on the misc@ mailing list. I wasn't sure
because I'm not the best code reader so I crafted a DoS to exploit this
infinite loop. I have mailed OpenBSD off-list with this proof of
concept code.
>How-To-Repeat:
before patch:

tcpdump -v -n -i lo0 port

10:30:54.667969 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok] GTPv1-C (teid
0, len 0) [MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support]
[MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support] [MBMS
Support] [MBMS Support] [MBMS Support] [MBMS Support] ...
2 packets received by filter
0 packets dropped by kernel

This was triggered with netcat:

nc -up 2123 localhost < dos.packet

>Fix:
after patch:

tcpdump.p: listening on lo0, link-type LOOP
10:43:41.005389 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok] GTPv1-C (teid
0, len 0) [|GTPv1-C] (ttl 64, id 58060, len 41)
^C
2 packets received by filter
0 packets dropped by kernel
spica# tcpdump.p -v -n -i lo0 -Xp-sp1500pport8
tcpdump.p: listening on lo0, link-type LOOP
10:44:11.956464 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok] GTPv1-C (teid
0, len 0) [|GTPv1-C] (ttl 64, id 18693, len 41)
: 4500 0029 4905 4011 33bd 7f00 0001  E..)I...@.3.
0010: 7f00 0001 084b 22b8 0015 a2bd 3400  .K".4...
0020: 0001 00  .
^C
2 packets received by filter
0 packets dropped by kernel

The patch looks like follows:

Index: print-gtp.c
===
RCS file: /cvs/src/usr.sbin/tcpdump/print-gtp.c,v
retrieving revision 1.12
diff -u -p -u -r1.12 print-gtp.c
--- print-gtp.c 20 May 2020 01:20:37 - 1.12
+++ print-gtp.c 24 Oct 2020 08:56:00 -
@@ -927,6 +927,8 @@ gtp_v1_print(const u_char *cp, u_int len

 	/* Header length is a 4 octet multiplier. */
 	hlen = (int)p[0] * 4;
+	if (hlen == 0)
+		goto trunc;

 	TCHECK2(p[0], hlen);

 	switch (nexthdr) {

dmesg:

OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct 4 18:13:26 MDT 2020
    dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17059287040 (16269MB)
avail mem = 16527237120 (15761MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x8afac000 (32 entries)
bios0: vendor Apple Inc. version "186.0.0.0.0" date 06/14/2019
bios0: Apple Inc. MacBookPro12,1
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP HPET APIC SBST ECDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT DMAR MCFG
acpi0: wakeup devices PEG0(S3) EC__(S3) HDEF(S3) RP01(S3) RP02(S3) RP03(S4) ARPT(S4) RP05(S3) RP06(S3) SPIT(S3) XHC1(S3) ADP1(S3) LID0(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz, 2800.35 MHz, 06-3d-04
cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0,
core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz, 2800.01 MHz, 06-3d-04
cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2
Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta
Now you have one M less in your tree checkout :-)

Thanks for tracking this down.

On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote:
> Honestly, I haven't spent much time to investigate how the curlen = 0 is
> getting generated exactly, because for me it will be very difficult to
> understand that without the hardware on my side re-producing the same.
>
> But I had a look at when the code was introduced to handle curlen == 0
> later in the function:
>
> 	if (iscontrol) {
> 		/*
> 		 * adjust the toggle based on the number of packets
> 		 * in this qtd
> 		 */
> 		if ((((curlen + mps - 1) / mps) & 1) || curlen == 0)
> 			qtdstatus ^= EHCI_QTD_TOGGLE_MASK;
> 	}
>
> This was introduced by revision 1.57 of ehci.c 14 years ago:
>
> ***
>
> If a zero-length bulk or interrupt transfer is requested then assume
> USBD_FORCE_SHORT_XFER to ensure that we actually build and execute
> a transfer.
>
> Based on changes in FreeBSD rev 1.47
>
> ***
>
> While the DIAGNOSTIC code to panic at curlen == 0 was introduced with
> the first commit of ehci.c. I think the revision 1.57 should have
> removed that DIAGNOSTIC code already, since we obviously can cope
> with curlen == 0.
>
> Given that, your below diff would be OK for me.
>
> On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote:
>
> > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote:
> > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote:
> > > > So time of the crash varies and I guess probably there is no pattern
> > > > there. Here is a new panic report, with some kernel printf()s added.
> > > >
> > > > Problem seems to be related that dataphyspage is greater than
> > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000.
> > > > The same is observable in all those previous panics.
> > >
> > > Here is another analysis, without my patches, I think they may be
> > > confusing. I will show the flow of ehci_alloc_sqtd_chain() with the
> > > values of variables which I collected via my debugging effort.
> > > I think I have an understanding of what the flow is, but I am not
> > > sure what the code should do in those circumstances.
> >
> > I didn't get any feedback on this so far. I'm running a custom kernel
> > with additional printf()'s to help me understand the flow.
> >
> > I ended up with the following diff, which I think is safe to do,
> > as the code handles the situation when curlen == 0 further down in
> > the ehci_alloc_sqtd_chain() function. My machine has run for more
> > than two weeks with the below panic() removed; the condition of the
> > panic is triggered multiple times during uptime and I didn't notice
> > any side effects.
> >
> > However, the flow of ehci_alloc_sqtd_chain() is hard to follow. Well,
> > for me anyway. At this stage I don't know how to improve things
> > beyond the below diff.
> >
> > Any feedback would be appreciated. I don't like running custom
> > kernels, so I would like to drop the below M from my source checkout.
> >
> >
> > Index: ehci.c
> > ===
> > RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> > retrieving revision 1.211
> > diff -u -p -u -r1.211 ehci.c
> > --- ehci.c	6 Aug 2020 14:06:12 -	1.211
> > +++ ehci.c	23 Oct 2020 10:00:35 -
> > @@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc
> >  			curlen -= curlen % mps;
> >  			DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, "
> >  			    "curlen=%u\n", curlen));
> > -#ifdef DIAGNOSTIC
> > -			if (curlen == 0)
> > -				panic("ehci_alloc_std: curlen == 0");
> > -#endif
> >  		}
> >
> >  		DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x "
> >
> > --
> > Regards,
> > Mikolaj
> >
>