Re: endless loop in tcpdump

2020-10-24 Thread Jeremie Courreges-Anglas


Dunno why the strange combination of To/Cc headers, so I'll keep bugs@ in Cc:

On Sat, Oct 24 2020, p...@centroid.eu wrote:
>>Synopsis: a specially crafted packet can send tcpdump into an endless loop
>>Category: system
>>Environment:
>   System  : OpenBSD 6.8
>   Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>   Architecture: OpenBSD.amd64
>   Machine : amd64
>>Description:
>   I was (un)fortunate enough to receive this treat of a bug from a cracker
> on the 'net.  Thanks!  They sent me a packet that puts tcpdump into an
> infinite loop.  I bug hunted and shared my findings on the misc@ mailing
> list.  I wasn't sure because I'm not the best code reader, so I crafted a
> DoS to exploit this infinite loop.  I have mailed OpenBSD off-list with
> this proof-of-concept code.
>>How-To-Repeat:
> before patch:
>
> tcpdump -v -n -i lo0 port 
>
> 10:30:54.667969 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok]  GTPv1-C (teid 
> 0, len 0) [MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support] [MBMS 
> Support] [MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support] [MBMS 
> Support] [MBMS Support] [MBMS Support] ...
> 2 packets received by filter
> 0 packets dropped by kernel
>
> This was triggered with netcat:
>
> nc -up 2123 localhost  < dos.packet
>
>>Fix:
> after patch:
>
> tcpdump.p: listening on lo0, link-type LOOP
> 10:43:41.005389 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok]  GTPv1-C (teid 
> 0, len 0) [|GTPv1-C] (ttl 64, id 58060, len 41)
> ^C
> 2 packets received by filter
> 0 packets dropped by kernel
> spica# tcpdump.p -v -n -i lo0 -X -s 1500 port 8
> tcpdump.p: listening on lo0, link-type LOOP
> 10:44:11.956464 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok]  GTPv1-C (teid 
> 0, len 0) [|GTPv1-C] (ttl 64, id 18693, len 41)
>   : 4500 0029 4905  4011 33bd 7f00 0001  E..)I...@.3.
>   0010: 7f00 0001 084b 22b8 0015 a2bd 3400   .K".4...
>   0020:    0001 00   .
>
> ^C
> 2 packets received by filter
> 0 packets dropped by kernel
>
> The patch looks as follows:
>
> Index: print-gtp.c
> ===================================================================
> RCS file: /cvs/src/usr.sbin/tcpdump/print-gtp.c,v
> retrieving revision 1.12
> diff -u -p -u -r1.12 print-gtp.c
> --- print-gtp.c   20 May 2020 01:20:37 -  1.12
> +++ print-gtp.c   24 Oct 2020 08:56:00 -
> @@ -927,6 +927,8 @@ gtp_v1_print(const u_char *cp, u_int len
>  
>   /* Header length is a 4 octet multiplier. */

I've never seen GTP in the wild but indeed a length of zero is invalid.
In case it can help other reviewers:

  
https://www.etsi.org/deliver/etsi_ts/129000_129099/129060/15.05.00_60/ts_129060v150500p.pdf

>   hlen = (int)p[0] * 4;
> + if (hlen == 0)
> + goto trunc;

LGTM, though I'd suggest printing why we're bailing out.
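
For other reviewers, the shape of the bug: gtp_v1_print() walks the GTPv1
extension headers, each of which starts with a length octet counting
4-octet units, and advances by that length.  A minimal sketch of the idea
(invented names, not the actual print-gtp.c loop):

	/*
	 * Each GTPv1 extension header begins with a length octet
	 * (in 4-octet units) and ends with the type of the next
	 * extension header.  An attacker-supplied length of zero
	 * means the cursor below never advances.
	 */
	while (nexthdr != 0) {
		hlen = (int)p[0] * 4;	/* 0 * 4 == 0 */
		if (hlen == 0)		/* the proposed guard */
			goto trunc;
		/* ... print the extension header contents ... */
		nexthdr = p[hlen - 1];	/* last octet: next ext type */
		p += hlen;		/* stalls forever if hlen == 0 */
	}

With hlen == 0 the loop makes no progress, hence the endless
[MBMS Support] output.  Here's a variant of the guard that also says
why we're bailing: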


Index: print-gtp.c
===================================================================
RCS file: /d/cvs/src/usr.sbin/tcpdump/print-gtp.c,v
retrieving revision 1.12
diff -u -p -p -u -r1.12 print-gtp.c
--- print-gtp.c 20 May 2020 01:20:37 -  1.12
+++ print-gtp.c 24 Oct 2020 22:02:47 -
@@ -927,6 +927,11 @@ gtp_v1_print(const u_char *cp, u_int len
 
/* Header length is a 4 octet multiplier. */
hlen = (int)p[0] * 4;
+   if (hlen == 0) {
+   printf(" [Invalid zero-length header %u]",
+   nexthdr);
+   goto trunc;
+   }
TCHECK2(p[0], hlen);
 
switch (nexthdr) {

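For context, the trunc label used here is tcpdump's standard exit path
for short or hostile packets: every read is bounded against the end of
the captured snapshot, and printers bail out with a "[|proto]" marker,
which is why the fixed output above shows [|GTPv1-C].  Roughly the
pattern (simplified from memory, not the exact interface.h macros):

	/*
	 * snapend points one past the last captured byte.  TCHECK2
	 * jumps to the printer's trunc label unless l bytes starting
	 * at var were actually captured.
	 */
	#define TCHECK2(var, l) \
		if ((const u_char *)&(var) + (l) > snapend) \
			goto trunc;

So the new hlen == 0 check reuses an exit path every printer already has.
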
-- 
jca | PGP : 0x1524E7EE / 5135 92C1 AD36 5293 2BDF  DDCC 0DFA 74AE 1524 E7EE



Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread T.J. Townsend
> 2020-10-24 16:41 GMT+02:00 Stefan Sperling :
> > On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
> > > Fair enough, but "there's no auto-assembly and it's inefficient and
> > > nothing stops you from messing with the intermediate discipline" is a
> > > different kind of not supported than "you should expect kernel panics".
> > > 
> > > If the latter is the case, maybe it should be documented in the
> > > softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
> > 
> > Neither Joel's mail nor the word "unsupported" implies a promise
> > that it will work without auto-assembly and with inefficient i/o.
> > 
> > Unsupported means unsupported. We don't need to list any reasons
> > for this in user-facing documentation.
> 
> I'm not suggesting justifying why; I am saying that softraid(4) is
> documented to assemble sd(4) devices into sd(4) devices. If it's
> actually "sd(4) devices that are not themselves softraid(4) backed",
> that would be worth documenting as it breaks the sd(4) abstraction.
> 
> Said another way, how was I supposed to find out this is unsupported?
> It's not like "a mirrored full-disk encrypted device" is an exotic
> configuration that would give me pause.

It's documented in the FAQ:

> Note that "stacking" softraid modes (mirrored drives and encryption,
> for example) is not supported at this time

https://www.openbsd.org/faq/faq14.html#softraidFDE



Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Filippo Valsorda
2020-10-24 19:26 GMT+02:00 Theo de Raadt :
> Filippo Valsorda  wrote:
> 
> > 2020-10-24 19:01 GMT+02:00 Theo de Raadt :
> > 
> >  Filippo Valsorda  wrote:
> > 
> >  > Said another way, how was I supposed to find out this is unsupported?
> > 
> >  The way you just found out.
> > 
> >  > It's not like "a mirrored full-disk encrypted device" is an exotic
> >  > configuration that would give me pause.
> > 
> >  there's a song that goes "You can't always get what you want"
> > 
> >  Nothing is perfect.  Do people rail against other groups in the same way?
> > 
> > Alright, I'm disengaging.
> > 
> > This was a bizarre interaction: I just reported a crash that doesn't
> > even affect me anymore (I was disassembling that system), trying to
> > follow the reporting guidelines as much as possible, for something that
> > I had no way of knowing was unsupported.
> > 
> 
> You are disengaging... but just have to get ONE MORE snipe in!
> 
> Meanwhile, no diff.  Not for the kernel, that would be difficult.
> 
> But no diff for the manual pages either (it is rather obvious that
> the people who hit this would know what pages they read, where they
> should have seen a warning, and what form it should take).

Ah, if you're interested in a patch for the manual page, happy to send
one. I'll read the contribution docs and send one tomorrow.

I had suggested both the page and the section where I would have
found a warning, but sending a diff telling you what you support
and what you don't felt more like overstepping. In my own projects, I
prefer users don't do that, as they can't know the boundary of what is
supported and what is not.


Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Theo de Raadt
Filippo Valsorda  wrote:

> 2020-10-24 19:01 GMT+02:00 Theo de Raadt :
> 
>  Filippo Valsorda  wrote:
> 
>  > Said another way, how was I supposed to find out this is unsupported?
> 
>  The way you just found out.
> 
>  > It's not like "a mirrored full-disk encrypted device" is an exotic
>  > configuration that would give me pause.
> 
>  there's a song that goes "You can't always get what you want"
> 
>  Nothing is perfect.  Do people rail against other groups in the same way?
> 
> Alright, I'm disengaging.
> 
> This was a bizarre interaction: I just reported a crash that doesn't
> even affect me anymore (I was disassembling that system), trying to
> follow the reporting guidelines as much as possible, for something that
> I had no way of knowing was unsupported.
> 

You are disengaging... but just have to get ONE MORE snipe in!

Meanwhile, no diff.  Not for the kernel, that would be difficult.

But no diff for the manual pages either (it is rather obvious that
the people who hit this would know what pages they read, where they
should have seen a warning, and what form it should take).

But no.

Either the margin is too narrow for such a diff, or it's easier to
assume that "I am right" commentary will generate results.

Some users really are their own worst enemy.



Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Filippo Valsorda
2020-10-24 19:01 GMT+02:00 Theo de Raadt :
> Filippo Valsorda  wrote:
> 
> > Said another way, how was I supposed to find out this is unsupported?
> 
> The way you just found out.
> 
> > It's not like "a mirrored full-disk encrypted device" is an exotic
> > configuration that would give me pause.
> 
> there's a song that goes "You can't always get what you want"
> 
> 
> Nothing is perfect.  Do people rail against other groups in the same way?

Alright, I'm disengaging.

This was a bizarre interaction: I just reported a crash that doesn't
even affect me anymore (I was disassembling that system), trying to
follow the reporting guidelines as much as possible, for something that
I had no way of knowing was unsupported.


Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Theo de Raadt
Filippo Valsorda  wrote:

> Said another way, how was I supposed to find out this is unsupported?

The way you just found out.

> It's not like "a mirrored full-disk encrypted device" is an exotic
> configuration that would give me pause.

there's a song that goes "You can't always get what you want"


Nothing is perfect.  Do people rail against other groups in the same way?



Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Filippo Valsorda
2020-10-24 16:41 GMT+02:00 Stefan Sperling :
> On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
> > Fair enough, but "there's no auto-assembly and it's inefficient and
> > nothing stops you from messing with the intermediate discipline" is a
> > different kind of not supported than "you should expect kernel panics".
> > 
> > If the latter is the case, maybe it should be documented in the
> > softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
> 
> Neither Joel's mail nor the word "unsupported" implies a promise
> that it will work without auto-assembly and with inefficient i/o.
> 
> Unsupported means unsupported. We don't need to list any reasons
> for this in user-facing documentation.

I'm not suggesting justifying why; I am saying that softraid(4) is
documented to assemble sd(4) devices into sd(4) devices. If it's
actually "sd(4) devices that are not themselves softraid(4) backed",
that would be worth documenting as it breaks the sd(4) abstraction.

Said another way, how was I supposed to find out this is unsupported?
It's not like "a mirrored full-disk encrypted device" is an exotic
configuration that would give me pause.


Intel I219-LM Failure

2020-10-24 Thread su.root
Following on from Message-ID
https://marc.info/?i=3f8cd8d2d3e7b4e8%20()%20p50%20!%20cerberus%20!%20com

This error notification occurs when the laptop is rebooted. If one
boots the device with a Linux distribution (USB) and then powers it
off, the next time OpenBSD boots the error message is gone. As soon
as one reboots OpenBSD the error message is back.

The only way to avoid the error message is to power off
the device ("doas halt -p") and not reboot.

OpenBSD 6.8-current (GENERIC.MP) #128: Tue Oct 20 23:51:05 MDT 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 16977334272 (16190MB)
avail mem = 16447705088 (15685MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.8 @ 0x56fd5000 (67 entries)
bios0: vendor LENOVO version "N1EET89W (1.62 )" date 06/18/2020
bios0: LENOVO 20ENCTO1WW
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP UEFI SSDT SSDT ECDT HPET APIC MCFG SSDT DBGP DBG2 BOOT 
BATB SSDT SSDT MSDM SSDT SSDT DMAR ASF! FPDT BGRT UEFI
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) PXSX(S4) PXSX(S4) PXSX(S4) 
PXSX(S4) PXSX(S4) PXSX(S4) PXSX(S4) XHCI(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpiec0 at acpi0
acpihpet0 at acpi0: 2399 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2594.91 MHz, 06-5e-03
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 24MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2593.96 MHz, 06-5e-03
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2593.96 MHz, 06-5e-03
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2593.96 MHz, 06-5e-03
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,SGX,BMI1,HLE,AVX2,SMEP,BMI2,ERMS,INVPCID,RTM,MPX,RDSEED,ADX,SMAP,CLFLUSHOPT,PT,SRBDS_CTRL,MD_CLEAR,TSXFA,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,XSAVEC,XGETBV1,XSAVES,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 0, core 3, package 0
cpu4 at mainbus0: apid 1 (application processor)
cpu4: Intel(R) Core(TM) i7-6820HQ CPU @ 2.70GHz, 2593.96 MHz, 06-5e-03
cpu4: 

Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Theo de Raadt
Demi M. Obenour  wrote:

> On 10/24/20 10:41 AM, Stefan Sperling wrote:
> > On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
> >> Fair enough, but "there's no auto-assembly and it's inefficient and
> >> nothing stops you from messing with the intermediate discipline" is a
> >> different kind of not supported than "you should expect kernel panics".
> >>
> >> If the latter is the case, maybe it should be documented in the
> >> softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
> > 
> > Neither Joel's mail nor the word "unsupported" implies a promise
> > that it will work without auto-assembly and with inefficient i/o.
> > 
> > Unsupported means unsupported. We don't need to list any reasons
> > for this in user-facing documentation.
> 
> One could also argue that the kernel must never panic because userspace
> did something wrong.  The only exceptions I am aware of are:
> 
> - init dying
> - corrupt kernel image
> - corrupt root filesystem
> - not being able to mount the root filesystem
> - overwriting kernel memory with /dev/mem or DMA
> - hardware fault

Really.

rm -rf /
reboot

Oh my god, it panics on reboot.  And hundreds of other possible ways for
root to configure a broken system for the next operation.

Sadly, the margin was too narrow for any solution in the form of source code
or a diff; instead, we as developers get instructed on What To Do.

If you guys aren't part of the solution, you are part of the precipitate.



Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Demi M. Obenour
On 10/24/20 10:41 AM, Stefan Sperling wrote:
> On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
>> Fair enough, but "there's no auto-assembly and it's inefficient and
>> nothing stops you from messing with the intermediate discipline" is a
>> different kind of not supported than "you should expect kernel panics".
>>
>> If the latter is the case, maybe it should be documented in the
>> softraid(4) CAVEATS, as it breaks the sd(4) abstraction.
> 
> Neither Joel's mail nor the word "unsupported" implies a promise
> that it will work without auto-assembly and with inefficient i/o.
> 
> Unsupported means unsupported. We don't need to list any reasons
> for this in user-facing documentation.

One could also argue that the kernel must never panic because userspace
did something wrong.  The only exceptions I am aware of are:

- init dying
- corrupt kernel image
- corrupt root filesystem
- not being able to mount the root filesystem
- overwriting kernel memory with /dev/mem or DMA
- hardware fault

In particular, I would expect that at securelevel 1 or higher,
userspace should not be able to cause a fatal kernel page fault.

Demi




Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Stefan Sperling
On Sat, Oct 24, 2020 at 04:11:00PM +0200, Filippo Valsorda wrote:
> Fair enough, but "there's no auto-assembly and it's inefficient and
> nothing stops you from messing with the intermediate discipline" is a
> different kind of not supported than "you should expect kernel panics".
> 
> If the latter is the case, maybe it should be documented in the
> softraid(4) CAVEATS, as it breaks the sd(4) abstraction.

Neither Joel's mail nor the word "unsupported" implies a promise
that it will work without auto-assembly and with inefficient i/o.

Unsupported means unsupported. We don't need to list any reasons
for this in user-facing documentation.



Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Filippo Valsorda
2020-10-24 15:37 GMT+02:00 Stefan Sperling :
> On Sat, Oct 24, 2020 at 03:10:05PM +0200, Filippo Valsorda wrote:
> > >Synopsis: kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
> > >Category: kernel
> > >Environment:
> > System  : OpenBSD 6.8
> > Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
> >  dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > Architecture: OpenBSD.amd64
> > Machine : amd64
> > >Description:
> > Starting with two RAID 1 arrays with a CRYPTO device on top of each (see
> > "devices" below), I first unmounted the filesystems, then successfully
> > detached the CRYPTO devices, and then tried to detach the first RAID 1.
> 
> Stacking softraid volumes is not supported.
> 
> That's the best answer at this point in time. It's clear that a raid1+crypto
> solution is needed, but nobody has done the work to make it happen.
> 
> As Joel explains here: 
> https://marc.info/?l=openbsd-misc&m=154349798307366&w=2
> you get to keep the pieces when it breaks.

Fair enough, but "there's no auto-assembly and it's inefficient and
nothing stops you from messing with the intermediate discipline" is a
different kind of not supported than "you should expect kernel panics".

If the latter is the case, maybe it should be documented in the
softraid(4) CAVEATS, as it breaks the sd(4) abstraction.


Re: Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Stefan Sperling
On Sat, Oct 24, 2020 at 03:10:05PM +0200, Filippo Valsorda wrote:
> >Synopsis: kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
> >Category: kernel
> >Environment:
>   System  : OpenBSD 6.8
>   Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
>
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> 
>   Architecture: OpenBSD.amd64
>   Machine : amd64
> >Description:
> Starting with two RAID 1 arrays with a CRYPTO device on top of each (see
> "devices" below), I first unmounted the filesystems, then successfully
> detached the CRYPTO devices, and then tried to detach the first RAID 1.

Stacking softraid volumes is not supported.

That's the best answer at this point in time. It's clear that a raid1+crypto
solution is needed, but nobody has done the work to make it happen.

As Joel explains here: https://marc.info/?l=openbsd-misc&m=154349798307366&w=2
you get to keep the pieces when it breaks.



Kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"

2020-10-24 Thread Filippo Valsorda
>Synopsis:  kernel page fault in "sddetach -> bufq_destroy" during "bioctl -d"
>Category:  kernel
>Environment:
System  : OpenBSD 6.8
Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
Starting with two RAID 1 arrays with a CRYPTO device on top of each (see
"devices" below), I first unmounted the filesystems, then successfully
detached the CRYPTO devices, and then tried to detach the first RAID 1.

# bioctl -d sd8 # CRYPTO on top of sd5a
# bioctl -d sd7 # CRYPTO on top of sd6a
# bioctl -d sd5 # RAID 1 on top of sd2a,sd3a
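
For anyone trying to reproduce the stack, a layout like this is typically
assembled along the following lines (reconstructed from the device listing
below, not the exact commands I used):

# bioctl -c 1 -l /dev/sd2a,/dev/sd3a softraid0	# RAID 1 -> sd5
# bioctl -c C -l /dev/sd5a softraid0		# CRYPTO on top -> sd8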

The machine immediately dropped into ddb, which I reached from the serial
console. I was unable to recover the console output before the panic.

ddb{0}> show panic
kernel page fault
uvm_fault(0x82153778, 0x0002, 0, 1) -> e
bufq_destroy(80115710) at bufq_destroy+0x83
end trace frame: 0x8000227379a0, count: 0

ddb{0}> trace
bufq_destroy(80115710) at bufq_destroy+0x83
sddetach(80115600,1) at sddetach+0x42
config_detach(80115600,1) at config_detach+0x142
scsi_detach_link(80125800,1) at scsi_detach_link+0x4d
sr_discipline_shutdown(80124000,1,0) at sr_discipline_shutdown+0x13e
sr_bio_handler(8012,80124000,c2d04227,80655c00) 
at sr_bio_handler+0x1ce
sdioctl(d52,c2d04227,80655c00,3,800022604030) at sdioctl+0x4e9

VOP_IOCTL(fd8234962018,c2d04227,80655c00,3,fd828b7bd300,800022604030)
 at VOP_IOCTL+0x55
vn_ioctl(fd80edfe2788,c2d04227,80655c00,800022604030) at 
vn_ioctl+0x75
sys_ioctl(800022604030,800022737e60,800022737ec0) at 
sys_ioctl+0x2d4
syscall(800022737f30) at syscall+0x389
Xsyscall() at Xsyscall+0x128
end of kernel
end trace frame: 0x7f7de2e0, count: -12

After rebooting, that RAID 1 was gone.  I successfully detached the remaining
RAID 1, and generated the sendbug(1) output after wiping sd0,sd1,sd2,sd3.

devices:
==> sd0 / duid: b27184271af409ec
sd0: , serial WD-WCC4N2HCXC13
  a:  2794.5G   64RAID
  c:  2794.5G0  unused
==> sd1 / duid: cbcf06a16339dea8
sd1: , serial WD-WCC4N2KYDERF
  a:  2794.5G   64RAID
  c:  2794.5G0  unused
==> sd2 / duid: 612a362a9042bc12
sd2: , serial WD-WCC4N4ARTXUZ
  a:  2794.5G   64RAID
  c:  2794.5G0  unused
==> sd3 / duid: 6c4bf1c584cf2ad5
sd3: , serial WD-WCC4N5VRVF3C
  a:  2794.5G   64RAID
  c:  2794.5G0  unused
==> sd4 / duid: 3141c7e6e6fc07f5
sd4: , serial (unknown)
  a: 0.3G   64  4.2BSD   2048 16384  5657 # /
  b: 0.5G   730144swap
  c:14.6G0  unused
  d: 0.4G  1739776  4.2BSD   2048 16384  7206 # /mfs/dev
  e: 0.6G  2662144  4.2BSD   2048 16384  9870 # /mfs/var
  f: 1.9G  3925504  4.2BSD   2048 16384 12960 # /usr
  g: 0.5G  7843264  4.2BSD   2048 16384  8061 # /home
  h: 1.6G  8883424  4.2BSD   2048 16384 12960 # /usr/local
  i: 1.4G 12249248  4.2BSD   2048 16384 12960 # /usr/src
  j: 5.2G 15080800  4.2BSD   2048 16384 12960 # /usr/obj
==> sd5 / duid: 7cc6a1f7b9a86bc3
Volume  Status   Size Device
softraid0 0 Online  3000592646656 sd5 RAID1
  0 Online  3000592646656 0:0.0   noencl 
  1 Online  3000592646656 0:1.0   noencl 
  a:  2794.5G0RAID
  c:  2794.5G0  unused
==> sd6 / duid: b5e9c71ced61dca9
Volume  Status   Size Device
softraid0 1 Online  3000592646656 sd6 RAID1
  0 Online  3000592646656 1:0.0   noencl 
  1 Online  3000592646656 1:1.0   noencl 
  a:  1863.0G0RAID
  c:  2794.5G0  unused
  d:   930.5G   3907021696  4.2BSD   8192 65536 52238 # /array/ct
==> sd7 / duid: d74b6781f7ca7b59
Volume  Status   Size Device
softraid0 2 Online  2000394827264 sd7 CRYPTO
  0 Online  2000394827264 2:0.0   noencl 
  a:  1863.0G   64  4.2BSD   8192 65536 24688 # /array/misc
  c:  1863.0G0  unused
==> sd8 / duid: 246216bff2633caa
Volume  Status   Size Device
softraid0 3 Online  3000592376320 sd8 CRYPTO
  0 Online  3000592376320 3:0.0   noencl 
  a:   931.5G0  4.2BSD   8192 65536 52238 # /array/main
  c:  2794.5G 

endless loop in tcpdump

2020-10-24 Thread pjp
>Synopsis:  a specially crafted packet can send tcpdump into an endless loop
>Category:  system
>Environment:
System  : OpenBSD 6.8
Details : OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
 
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP

Architecture: OpenBSD.amd64
Machine : amd64
>Description:
I was (un)fortunate enough to receive this treat of a bug from a cracker
on the 'net.  Thanks!  They sent me a packet that puts tcpdump into an
infinite loop.  I bug hunted and shared my findings on the misc@ mailing
list.  I wasn't sure because I'm not the best code reader, so I crafted a
DoS to exploit this infinite loop.  I have mailed OpenBSD off-list with
this proof-of-concept code.
>How-To-Repeat:
before patch:

tcpdump -v -n -i lo0 port 

10:30:54.667969 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok]  GTPv1-C (teid 0, 
len 0) [MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support] [MBMS 
Support] [MBMS Support] [MBMS Support] [MBMS Support] [MBMS Support] [MBMS 
Support] [MBMS Support] [MBMS Support] ...
2 packets received by filter
0 packets dropped by kernel

This was triggered with netcat:

nc -up 2123 localhost  < dos.packet

>Fix:
after patch:

tcpdump.p: listening on lo0, link-type LOOP
10:43:41.005389 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok]  GTPv1-C (teid 0, 
len 0) [|GTPv1-C] (ttl 64, id 58060, len 41)
^C
2 packets received by filter
0 packets dropped by kernel
spica# tcpdump.p -v -n -i lo0 -X -s 1500 port 8
tcpdump.p: listening on lo0, link-type LOOP
10:44:11.956464 127.0.0.1.2123 > 127.0.0.1.: [udp sum ok]  GTPv1-C (teid 0, 
len 0) [|GTPv1-C] (ttl 64, id 18693, len 41)
  : 4500 0029 4905  4011 33bd 7f00 0001  E..)I...@.3.
  0010: 7f00 0001 084b 22b8 0015 a2bd 3400   .K".4...
  0020:    0001 00   .

^C
2 packets received by filter
0 packets dropped by kernel
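
For what it's worth, my reading of the payload tail in that hex dump (the
archive masks runs of repeated digits, so this is partly inference):

	34	GTPv1 flags: version 1, PT=1, E=1 (extension header present)
	...	(masked: message type, length, TEID, sequence, N-PDU number)
	01	next-extension-header type 1, printed as [MBMS Support]
	00	extension header length in 4-octet units: the zero the
		unpatched printer multiplied by 4 and looped on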

The patch looks as follows:

Index: print-gtp.c
===================================================================
RCS file: /cvs/src/usr.sbin/tcpdump/print-gtp.c,v
retrieving revision 1.12
diff -u -p -u -r1.12 print-gtp.c
--- print-gtp.c 20 May 2020 01:20:37 -  1.12
+++ print-gtp.c 24 Oct 2020 08:56:00 -
@@ -927,6 +927,8 @@ gtp_v1_print(const u_char *cp, u_int len
 
/* Header length is a 4 octet multiplier. */
hlen = (int)p[0] * 4;
+   if (hlen == 0)
+   goto trunc;
TCHECK2(p[0], hlen);
 
switch (nexthdr) {



dmesg:
OpenBSD 6.8 (GENERIC.MP) #98: Sun Oct  4 18:13:26 MDT 2020
dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 17059287040 (16269MB)
avail mem = 16527237120 (15761MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0x8afac000 (32 entries)
bios0: vendor Apple Inc. version "186.0.0.0.0" date 06/14/2019
bios0: Apple Inc. MacBookPro12,1
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP HPET APIC SBST ECDT SSDT SSDT SSDT SSDT SSDT SSDT SSDT 
SSDT DMAR MCFG
acpi0: wakeup devices PEG0(S3) EC__(S3) HDEF(S3) RP01(S3) RP02(S3) RP03(S4) 
ARPT(S4) RP05(S3) RP06(S3) SPIT(S3) XHC1(S3) ADP1(S3) LID0(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz, 2800.35 MHz, 06-3d-04
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 100MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz, 2800.01 MHz, 06-3d-04
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,3DNOWP,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,RDSEED,ADX,SMAP,PT,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 0, core 1, package 0
cpu2 

Re: panic: ehci_alloc_std: curlen == 0 on 6.8-beta

2020-10-24 Thread Marcus Glocker
Now you have one M less in your tree checkout :-)
Thanks for tracking this down.
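
For readers skimming the archive: the reason dropping the panic is safe is
the toggle fixup quoted below.  A zero-length qTD in a control transfer is
a legal case that simply flips the data toggle.  A rough sketch of that
logic, paraphrased from the quoted ehci.c fragment (npackets is my name
for the intermediate value):

	/*
	 * An odd number of max-packet-size packets in this qTD flips
	 * the data toggle, and so does a zero-length qTD; curlen == 0
	 * is a handled case, not one worth a DIAGNOSTIC panic.
	 */
	npackets = (curlen + mps - 1) / mps;
	if ((npackets & 1) || curlen == 0)
		qtdstatus ^= EHCI_QTD_TOGGLE_MASK;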

On Fri, Oct 23, 2020 at 06:50:53PM +0200, Marcus Glocker wrote:

> Honestly, I haven't spent much time investigating how the curlen == 0 is
> getting generated exactly, because it will be very difficult for me to
> understand that without the hardware on my side reproducing the issue.
> 
> But I had a look at when the code that handles curlen == 0 later
> in the function was introduced:
> 
> if (iscontrol) {
>   /*
>* adjust the toggle based on the number of packets
>* in this qtd
>*/
> 	if ((((curlen + mps - 1) / mps) & 1) || curlen == 0)
>   qtdstatus ^= EHCI_QTD_TOGGLE_MASK;
> }
> 
> This was introduced by revision 1.57 of ehci.c 14 years ago:
> 
> ***
> 
> If a zero-length bulk or interrupt transfer is requested then assume
> USBD_FORCE_SHORT_XFER to ensure that we actually build and execute
> a transfer.
> 
> Based on changes in FreeBSD rev1.47
> 
> ***
> 
> The DIAGNOSTIC code to panic at curlen == 0, meanwhile, was introduced
> with the first commit of ehci.c.  I think revision 1.57 should have
> removed that DIAGNOSTIC code already, since we obviously can cope
> with curlen == 0.
> 
> Given that, your below diff would be OK for me.
> 
> On Fri, Oct 23, 2020 at 10:09:14AM +, Mikolaj Kucharski wrote:
> 
> > On Sat, Sep 26, 2020 at 11:00:46PM +, Mikolaj Kucharski wrote:
> > > On Wed, Sep 16, 2020 at 09:19:37PM +, Mikolaj Kucharski wrote:
> > > > So the time of the crash varies, and I guess there is probably no
> > > > pattern there. Here is a new panic report, with some kernel printf()s
> > > > added.
> > > > 
> > > > The problem seems to be that dataphyspage is greater than
> > > > dataphyslastpage: lastpage=0xa0e page=0xa0e1000 phys=0xa0e1000.
> > > > The same is observable in all those previous panics.
> > > > 
> > > 
> > > Here ia nother analysis, without my patches, I think they may be
> > > confusing. I will show flow of ehci_alloc_sqtd_chain() with values
> > > of variables which I collected via my debugging effort.
> > > 
> > > I think I have understanding what is the flow, but not sure what
> > > code should do in those circumstances.
> > > 
> > 
> > I didn't get any feedback on this so far. I'm running a custom kernel
> > with additional printf()'s to help me understand the flow.
> > 
> > I ended up with the following diff, which I think is safe to do,
> > as the code handles the situation where curlen == 0 further down in
> > the ehci_alloc_sqtd_chain() function. My machine has run for more than
> > two weeks with the below panic() removed; the condition for the panic
> > is triggered multiple times during uptime and I didn't notice
> > any side effects.
> > 
> > However, the flow of ehci_alloc_sqtd_chain() is hard to follow. Well, for
> > me anyway. At this stage I don't know how to improve things beyond the
> > below diff.
> > 
> > Any feedback would be appreciated. I don't like running custom kernels,
> > so I would like to drop the below M from my source checkout.
> > 
> > 
> > Index: ehci.c
> > ===================================================================
> > RCS file: /cvs/src/sys/dev/usb/ehci.c,v
> > retrieving revision 1.211
> > diff -u -p -u -r1.211 ehci.c
> > --- ehci.c  6 Aug 2020 14:06:12 -   1.211
> > +++ ehci.c  23 Oct 2020 10:00:35 -
> > @@ -2407,10 +2407,6 @@ ehci_alloc_sqtd_chain(struct ehci_softc 
> > curlen -= curlen % mps;
> > DPRINTFN(1,("ehci_alloc_sqtd_chain: multiple QTDs, "
> > "curlen=%u\n", curlen));
> > -#ifdef DIAGNOSTIC
> > -   if (curlen == 0)
> > -   panic("ehci_alloc_std: curlen == 0");
> > -#endif
> > }
> >  
> > DPRINTFN(4,("ehci_alloc_sqtd_chain: dataphys=0x%08x "
> > 
> > -- 
> > Regards,
> >  Mikolaj
> > 
>