Re: Use SMR_TAILQ for `ps_threads'

2020-12-01 Thread Jonathan Matthew
On Tue, Dec 01, 2020 at 02:35:18PM -0300, Martin Pieuchot wrote:
> On 01/12/20(Tue) 15:30, Claudio Jeker wrote:
> > [...] 
> > Did you run a make build with that smr_barrier() in it and checked that it
> > does not cause a slow down? I am sceptical, smr_barrier() is a very slow
> > construct which introduces large delays and should be avoided whenever
> > possible.
> 
> I did build GENERIC.MP multiple times on a 4CPU sparc64 with the diff
> below, without noticeable difference.
> 
> I'm happy to hear from sceptical performance checkers :o)

On a reasonably fast amd64 box, this increases GENERIC.MP make -j6 build time
from ~3m06s to ~3m44s, which seems a bit much to me.

Replacing smr_barrier() with smr_flush() reduces the overhead to a couple of
seconds, and it seems warranted here.

> 
> diff --git lib/libkvm/kvm_proc2.c lib/libkvm/kvm_proc2.c
> index 96f7dc91b92..1f4f9b914bb 100644
> --- lib/libkvm/kvm_proc2.c
> +++ lib/libkvm/kvm_proc2.c
> @@ -341,8 +341,9 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process *pr,
>   kp.p_pctcpu = 0;
>   kp.p_stat = (process.ps_flags & PS_ZOMBIE) ? SDEAD :
>   SIDL;
> - for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; 
> - p = TAILQ_NEXT(&proc, p_thr_link)) {
> + for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads);
> + p != NULL;
> + p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) {
>   if (KREAD(kd, (u_long)p, &proc)) {
>   _kvm_err(kd, kd->program,
>   "can't read proc at %lx",
> @@ -376,8 +377,8 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process *pr,
>   if (!dothreads)
>   continue;
>  
> - for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; 
> - p = TAILQ_NEXT(&proc, p_thr_link)) {
> + for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads); p != NULL;
> + p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) {
>   if (KREAD(kd, (u_long)p, &proc)) {
>   _kvm_err(kd, kd->program,
>   "can't read proc at %lx",
> diff --git sys/kern/exec_elf.c sys/kern/exec_elf.c
> index 5e455208663..575273b306c 100644
> --- sys/kern/exec_elf.c
> +++ sys/kern/exec_elf.c
> @@ -85,6 +85,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -1360,7 +1361,7 @@ coredump_notes_elf(struct proc *p, void *iocookie, size_t *sizep)
>* threads in the process have been stopped and the list can't
>* change.
>*/
> - TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
> + SMR_TAILQ_FOREACH_LOCKED(q, &pr->ps_threads, p_thr_link) {
>   if (q == p) /* we've taken care of this thread */
>   continue;
>   error = coredump_note_elf(q, iocookie, &notesize);
> diff --git sys/kern/init_main.c sys/kern/init_main.c
> index fed6be19435..2b657ffe328 100644
> --- sys/kern/init_main.c
> +++ sys/kern/init_main.c
> @@ -519,7 +519,7 @@ main(void *framep)
>*/
>   LIST_FOREACH(pr, &allprocess, ps_list) {
>   nanouptime(&pr->ps_start);
> - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
> + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) {
>   nanouptime(&p->p_cpu->ci_schedstate.spc_runtime);
>   timespecclear(&p->p_rtime);
>   }
> diff --git sys/kern/kern_exit.c sys/kern/kern_exit.c
> index a20775419e3..3c526ab83b8 100644
> --- sys/kern/kern_exit.c
> +++ sys/kern/kern_exit.c
> @@ -63,6 +63,7 @@
>  #ifdef SYSVSEM
>  #include 
>  #endif
> +#include 
>  #include 
>  
>  #include 
> @@ -161,7 +162,8 @@ exit1(struct proc *p, int xexit, int xsig, int flags)
>   }
>  
>   /* unlink ourselves from the active threads */
> - TAILQ_REMOVE(&pr->ps_threads, p, p_thr_link);
> + SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);
> + smr_barrier();
>   if ((p->p_flag & P_THREAD) == 0) {
>   /* main thread gotta wait because it has the pid, et al */
>   while (pr->ps_refcnt > 1)
> @@ -724,7 +726,7 @@ process_zap(struct process *pr)
>   if (pr->ps_ptstat != NULL)
>   free(pr->ps_ptstat, M_SUBPROC, sizeof(*pr->ps_ptstat));
>   pool_put(&rusage_pool, pr->ps_ru);
> - KASSERT(TAILQ_EMPTY(&pr->ps_threads));
> + KASSERT(SMR_TAILQ_EMPTY_LOCKED(&pr->ps_threads));
>   lim_free(pr->ps_limit);
>   crfree(pr->ps_ucred);
>   pool_put(&process_pool, pr);
> diff --git sys/kern/kern_fork.c sys/kern/kern_fork.c
> index 9fb239bc8b4..e1cb587b2b8 100644
> --- sys/kern/kern_fork.c
> +++ sys/kern/kern_fork.c
> @@ -52,6 +52,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
> 

Re: ACPI diff that needs wide testing

2020-12-01 Thread Florian Obser
On Tue, Dec 01, 2020 at 10:40:30PM +0100, Mark Kettenis wrote:
> > From: Greg Steuck 
> > Date: Mon, 30 Nov 2020 20:54:59 -0800
> > 
> > Mark Kettenis  writes:
> > 
> > > The diff below fixes the way we handle named references in AML
> > > packages.  This fixes some bugs but I'd like to make sure that this
> > doesn't inadvertently break things.  So tests on a wide variety of
> > > machines are welcome.
> > 
> > I see a prompt crash:
> 
> Does this one work better?

Indeed it does: no panic, suspend/resume still works, no change in dmesg.
X1 Gen 2.

OpenBSD 6.8-current (GENERIC.MP) #2: Wed Dec  2 08:09:03 CET 2020

flor...@openbsd-build.home.narrans.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 8233394176 (7851MB)
avail mem = 7968555008 (7599MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdcd3d000 (61 entries)
bios0: vendor LENOVO version "GRET40WW (1.17 )" date 09/02/2014
bios0: LENOVO 20A7006VUS
acpi0 at bios0: ACPI 5.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SLIC DBGP ECDT HPET APIC MCFG SSDT SSDT SSDT SSDT SSDT 
SSDT SSDT PCCT SSDT UEFI MSDM ASF! BATB FPDT UEFI DMAR
acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3)
acpitimer0 at acpi0: 3579545 Hz, 24 bits
acpiec0 at acpi0
acpihpet0 at acpi0: 14318179 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 798.30 MHz, 06-45-01
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu0: 256KB 64b/line 8-way L2 cache
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
cpu1 at mainbus0: apid 1 (application processor)
cpu1: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 798.16 MHz, 06-45-01
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu1: 256KB 64b/line 8-way L2 cache
cpu1: smt 1, core 0, package 0
cpu2 at mainbus0: apid 2 (application processor)
cpu2: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 798.16 MHz, 06-45-01
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu2: 256KB 64b/line 8-way L2 cache
cpu2: smt 0, core 1, package 0
cpu3 at mainbus0: apid 3 (application processor)
cpu3: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 798.16 MHz, 06-45-01
cpu3: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
cpu3: 256KB 64b/line 8-way L2 cache
cpu3: smt 1, core 1, package 0
ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 40 pins
acpimcfg0 at acpi0
acpimcfg0: addr 0xf800, bus 0-63
acpiprt0 at acpi0: bus 0 (PCI0)
acpiprt1 at acpi0: bus -1 (PEG_)
acpiprt2 at acpi0: bus 2 (EXP1)
acpiprt3 at acpi0: bus 3 (EXP2)
acpiprt4 at acpi0: bus -1 (EXP3)
dwiic0 at acpi0 I2C1 addr 0xfe105000/0x1000 irq 7
iic0 at dwiic0
"CPLM3218" at iic0 addr 0x48 not configured
acpibtn0 at acpi0: LID_
acpibtn1 at acpi0: SLPB
acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001
acpicmos0 at acpi0
acpibat0 at acpi0: BAT0 model "45N1703" serial  3191 type LiP oem "SMP"
acpiac0 at acpi0: AC unit offline
acpithinkpad0 at acpi0: version 2.0
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"PNP0C14" at acpi0 not configured
"INT340F" at acpi0 not configured
acpicpu0 at acpi0: C3(200@506 mwait.1@0x60), C2(200@148 mwait.1@0x33), 
C1(1000@1 mwa

Re: ACPI diff that needs wide testing

2020-12-01 Thread Greg Steuck
Mark Kettenis  writes:

>> From: Greg Steuck 
>> Date: Mon, 30 Nov 2020 20:54:59 -0800
>> 
>> Mark Kettenis  writes:
>> 
>> > The diff below fixes the way we handle named references in AML
>> > packages.  This fixes some bugs but I'd like to make sure that this
>> > doesn't inadvertently break things.  So tests on a wide variety of
>> > machines are welcome.
>> 
>> I see a prompt crash:
>
> Does this one work better?

Much better. No crash and no difference in dmesg with/without
patch. Also survives suspend/wakeup. I'll keep running with this for
now.

Thanks
Greg



Re: Dell XPS 9310 successful install but bootloader can't read disk label

2020-12-01 Thread George Koehler
On Wed, 2 Dec 2020 01:59:00 +0100
Noth  wrote:

> Disk: sd0   Usable LBA: 34 to 4000797326 [4000797360 Sectors]
>     #: type [   start: size ]
> 
>     0: EFI Sys  [    2048: 389120 ]
>     1: e3c9e316-0b5c-4db8-817d-f92df00215ae [  391168: 262144 ]
>     2: FAT12    [  653312: 1071679488 ]
>     3: 516e7cba-6ecf-11d6-8ff8-00022d09712b [  1072332800: 3905536 ]
>     4: Linux files* [  1076238336: 2692558848 ]
>     5: OpenBSD  [  3768797184: 195311616 ]
>     6: Win Recovery [  3964108800: 2027520 ]
>     7: Win Recovery [  3966136320: 31772672 ]
>     8: Win Recovery [  3997911040: 2885632 ]

OpenBSD offset 3768797184 is about 1797G.  I believe that our EFI
bootloader works only if OpenBSD partition 'a' is in the first 1024G
of the drive.  Back in January 2020, I suspected that a daddr32_t
would overflow in /sys/arch/amd64/stand/efiboot/efidev.c, see
https://marc.info/?l=openbsd-bugs&m=158007879212894&w=2

I have no drives larger than 1024G, so I have no way to reproduce the
problem.

--George



Re: Dell XPS 9310 successful install but bootloader can't read disk label

2020-12-01 Thread Noth
Here we go (sd0 is the NVMe, sd0a is an encrypted OpenBSD install, which 
is on sd2). And yes, once decrypted, I can write to the partitions:


Disk: sd0   Usable LBA: 34 to 4000797326 [4000797360 Sectors]
   #: type [   start: size ]

   0: EFI Sys  [    2048: 389120 ]
   1: e3c9e316-0b5c-4db8-817d-f92df00215ae [  391168: 262144 ]
   2: FAT12    [  653312: 1071679488 ]
   3: 516e7cba-6ecf-11d6-8ff8-00022d09712b [  1072332800: 3905536 ]
   4: Linux files* [  1076238336: 2692558848 ]
   5: OpenBSD  [  3768797184: 195311616 ]
   6: Win Recovery [  3964108800: 2027520 ]
   7: Win Recovery [  3966136320: 31772672 ]
   8: Win Recovery [  3997911040: 2885632 ]

Disk: sd2   Usable LBA: 64 to 195311024 [195311088 Sectors]
   #: type [   start: size ]

   1: EFI Sys  [  64: 960 ]
   3: OpenBSD  [    1024: 195310001 ]

# /dev/rsd0c:
type: SCSI
disk: SCSI disk
label: KXG60PNV2T04 NVM
duid: 9bfcd2a2080acb33
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 249038
total sectors: 4000797360
boundstart: 3768797184
boundend: 3964108800
drivedata: 0

16 partitions:
#    size   offset  fstype [fsize bsize   cpg]
  a:    195311616   3768797184    RAID
  c:   4000797360    0  unused
  i:   389120 2048   MSDOS
  j:   262144   391168 unknown
  k:   1071679488   653312   MSDOS
  l:  2027520   3964108800 unknown
  m: 31772672   3966136320 unknown
  n:  2885632   3997911040 unknown
  o:  3905536   1072332800  ext2fs
  p:   2692558848   1076238336  ext2fs

# /dev/rsd2c:
type: SCSI
disk: SCSI disk
label: SR CRYPTO
duid: 72cca61970b173bb
flags:
bytes/sector: 512
sectors/track: 63
tracks/cylinder: 255
sectors/cylinder: 16065
cylinders: 12157
total sectors: 195311088
boundstart: 1024
boundend: 195311025
drivedata: 0

16 partitions:
#    size   offset  fstype [fsize bsize   cpg]
  a:  2097152 1024  4.2BSD   2048 16384 12960
  b: 17486680  2098176    swap
  c:    195311088    0  unused
  d:  8388576 19584864  4.2BSD   2048 16384 12960
  e: 22683520 27973440  4.2BSD   2048 16384 12960
  f: 12582912 50656960  4.2BSD   2048 16384 12960
  g:  2097152 63239872  4.2BSD   2048 16384 12960
  h: 28081408 65337024  4.2BSD   2048 16384 12960
  i:  960   64   MSDOS
  j:  4194304 93418432  4.2BSD   2048 16384 12960
  k: 12582912 97612736  4.2BSD   2048 16384 12960
  l: 85115360    110195648  4.2BSD   2048 16384 12960

I hope this isn't because I'm triple-booting? I guess that is an 
edge case, but all the same...


Cheers,

Noth

On 01/12/2020 21:39, Mark Kettenis wrote:

From: Noth 
Date: Tue, 1 Dec 2020 21:18:14 +0100

Hi,

    As a follow up, I got a usb-c stick and installed -current to that.
It boots the system so I now have a dmesg, pcidump and usbdump for you:

http://casper.nineinchnetworks.ch/images/dmesg9310.txt

http://casper.nineinchnetworks.ch/images/pcidump9310.txt

http://casper.nineinchnetworks.ch/images/usbdump9310.txt

Cheers,

Noth

While booted using the USB stick, can you:

a) Access the NVMe drive?

b) Show the output of disklabel sd0?

c) Show the output of fdisk sd0?

Cheers,

Mark


On 19/11/2020 00:31, Noth wrote:

Hi,

   I've got a brand new Dell XPS 9310 and I've tried to get 6.7, 6.8
and the latest snapshot on it. Installation works fine but neither the
3.50, 3.54 or 3.55 UEFI bootloaders can read the disk label so it's
unbootable. I of course deactivated Secure Boot and anything in the
BIOS that looked like it might be an obstacle. I've got two pictures
of the bsd.rd dmesg for the latest snapshot for you here:

http://casper.nineinchnetworks.ch/images/dmesg1.jpg

http://casper.nineinchnetworks.ch/images/dmesg2.jpg

I have no idea what to do next, hope this is of use to you.

Cheers,

Noth

P.S.: I'm not subscribed to this mailing list, please CC me when
answering.







PF synproxy should act on inbound packets only

2020-12-01 Thread Alexandr Nedvedicky
Hello,

The issue described here was hit by Stuart some time ago.  Feel free to
stop reading if you don't care about or use pf(4) synproxy.
Let's assume there are rules which allow just surfing the web over HTTP:

block all
pass proto tcp from any to any port = 80 synproxy state
pass proto udp from any to any port = 53 

The 'synproxy' rule is bound to no interface and no direction. As such
it matches every forwarded SYN packet to port 80 twice. Had there been
no 'synproxy state' in the rule, a pair of states would be created:

A -> B:80 @ IN  # inbound direction at RX NIC

A -> B:80 @ OUT # outbound direction at TX NIC

The 'synproxy' action changes things a bit. It is handled here in
pf_create_state():


4163 if (pd->proto == IPPROTO_TCP && (th->th_flags & (TH_SYN|TH_ACK)) ==
4164 TH_SYN && r->keep_state == PF_STATE_SYNPROXY) {
4165 int rtid = pd->rdomain;
4166 if (act->rtableid >= 0)
4167 rtid = act->rtableid;
4168 pf_set_protostate(s, PF_PEER_SRC, PF_TCPS_PROXY_SRC);
4169 s->src.seqhi = arc4random();
4170 /* Find mss option */
4171 mss = pf_get_mss(pd);
4172 mss = pf_calc_mss(pd->src, pd->af, rtid, mss);
4173 mss = pf_calc_mss(pd->dst, pd->af, rtid, mss);
4174 s->src.mss = mss;
4175 pf_send_tcp(r, pd->af, pd->dst, pd->src, th->th_dport,
4176 th->th_sport, s->src.seqhi, ntohl(th->th_seq) + 1,
4177 TH_SYN|TH_ACK, 0, s->src.mss, 0, 1, 0, pd->rdomain);
 ^^ 
4178 REASON_SET(&reason, PFRES_SYNPROXY);
4179 return (PF_SYNPROXY_DROP);
4180 }

Note that pf_send_tcp() at line 4175 is called with the `tag` argument set to 1.
`tag` == 1 tells pf_send_tcp() to set the PF_TAG_GENERATED flag on the packet.

2770 struct mbuf *
2771 pf_build_tcp(const struct pf_rule *r, sa_family_t af,
2772 const struct pf_addr *saddr, const struct pf_addr *daddr,
2773 u_int16_t sport, u_int16_t dport, u_int32_t seq, u_int32_t ack,
2774 u_int8_t flags, u_int16_t win, u_int16_t mss, u_int8_t ttl, int tag,
2775 u_int16_t rtag, u_int sack, u_int rdom)
2776 {

2805 
2806 /* create outgoing mbuf */
2807 m = m_gethdr(M_DONTWAIT, MT_HEADER);
2808 if (m == NULL)
2809 return (NULL);
2810 if (tag)
2811 m->m_pkthdr.pf.flags |= PF_TAG_GENERATED;
2812 m->m_pkthdr.pf.tag = rtag;

The PF_TAG_GENERATED flag has an unexpected effect on the packet sent by
pf_send_tcp() at line 4175. That packet is intercepted by pf_test() as
outbound. However, the PF_TAG_GENERATED flag turns pf_test() into a
no-op:

6855 int
6856 pf_test(sa_family_t af, int fwdir, struct ifnet *ifp, struct mbuf **m0)
6857 {

#ifdef DIAGNOSTIC
6890 if (((*m0)->m_flags & M_PKTHDR) == 0)
6891 panic("non-M_PKTHDR is passed to pf_test");
6892 #endif /* DIAGNOSTIC */
6893 
6894 if ((*m0)->m_pkthdr.pf.flags & PF_TAG_GENERATED)
6895 return (PF_PASS);
6896 
6897 if ((*m0)->m_pkthdr.pf.flags & PF_TAG_DIVERTED_PACKET)
6898 return (PF_PASS);
6899 

However, that is not desired in this case. We actually do want PF to create a
state for such a packet, so the response B:80 -> A can enter the firewall.

The fix is to apply the synproxy action to inbound packets only. The diff
below does exactly that. Furthermore, it makes pfctl(8) emit a warning
when synproxy is used in an outbound or direction-unbound rule:

lumpy$ echo 'pass proto tcp from any to any port = 80 synproxy state' | ./pfctl -nf -
warning (stdin:1): synproxy acts on inbound packets only
synproxy action is ignored for outbound packets
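For completeness, with the fix applied, the intent of the ruleset at the
start of this mail would be expressed by binding synproxy to the inbound
direction (em0 is a placeholder interface name here):

```
block all
pass in on em0 proto tcp from any to any port 80 synproxy state
pass proto udp from any to any port 53
```

The plain stateful UDP rule is unaffected; only the synproxy rule needs the
explicit direction, since the proxied handshake only makes sense where the
SYN first enters the firewall.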


OK?

thanks and
regards
sashan

8<---8<---8<--8<
diff --git a/sbin/pfctl/parse.y b/sbin/pfctl/parse.y
index f06171158cb..d052b5b9a0e 100644
--- a/sbin/pfctl/parse.y
+++ b/sbin/pfctl/parse.y
@@ -4042,6 +4042,13 @@ rule_consistent(struct pf_rule *r)
"synproxy state or modulate state");
problems++;
}
+
+   if ((r->keep_state == PF_STATE_SYNPROXY) && (r->direction != PF_IN))
+   fprintf(stderr, "warning (%s:%d): "
+   "synproxy acts on inbound packets only\n"
+   "synproxy action is ignored for outbound packets\n",
+   file->name, yylval.lineno);
+
if ((r->nat.addr.type != PF_ADDR_NONE ||
r->rdr.addr.type != PF_ADDR_NONE) &&
r->action != PF_MATCH && !r->keep_state) {
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 823fdc22133..986ee57bff9 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -4161,7 +4161,7 @@ pf_create_state(struct pf_pdesc *pd, struct pf_rule *r, 

Re: stdio: fclose(3), fopen(3): intended locking hierarchy?

2020-12-01 Thread Scott Cheloha
On Mon, Nov 30, 2020 at 08:44:41PM -0800, Philip Guenther wrote:
> On Mon, Nov 30, 2020 at 6:10 PM Scott Cheloha 
> wrote:
> 
> > On Sat, Nov 28, 2020 at 01:41:50PM -0800, Philip Guenther wrote:
> >
> ...
> 
> > > After thinking through states more, #4 isn't safe: _fwalk() can't take
> > > sfp_mutex because it's called from __srefill() which has its FILE locked,
> > > which would reverse the order: two concurrent calls to __srefill() from
> > > line-buffered FILEs could have one in _fwalk() blocking on locking the
> > > other, while the other blocks on the sfp_mutex for _fwalk().
> >
> > This part in __srefill():
> >
> > /*
> >  * Before reading from a line buffered or unbuffered file,
> >  * flush all line buffered output files, per the ANSI C
> >  * standard.
> >  */
> >
> ...
> 
> > Where in the standard(s) is this behavior required?  I'm not even sure
> > how to look for it.
> >
> 
> Pick a version of the C standard and search for "buffered" until you find
> something like this, which is part of 7.19.3p3 in the draft I'm looking at:
> 
>    <...>  When a stream is line buffered, characters are intended to be
>    transmitted to or from the host environment as a block when a new-line
>    character is encountered. Furthermore, characters are intended to be
>    transmitted as a block to the host environment when a buffer is filled,
>    when input is requested on an unbuffered stream, or when input is
>    requested on a line buffered stream that requires the transmission of
>    characters from the host environment. Support for these characteristics
>    is implementation-defined, and may be affected via the setbuf and
>    setvbuf functions.
> 
> Effectively the same text appears in the POSIX standard in XSH 2.5p6.
> 
> Basically, you're allowed to stop doing that by instead printing out your
> cell-phone number so that everyone who wants to complain that their program
> failed to output its prompt before blocking for input can call and scream
> at you.  :D

If they want that they need to call fflush(3) on the stream the prompt
is printed to if that stream is line- or block-buffered.

Anyway, the key phrase is:

> Furthermore, characters are intended to be transmitted as a block to the host
> environment when a buffer is filled, when input is requested on an unbuffered
> stream, or when input is requested on a line buffered stream that requires the
> transmission of characters from the host environment.

When I read this I get the impression they are only talking about the
FILE being read, not that we should transfer characters as a block for
*every* open FILE when input is requested on any unbuffered or
line-buffered stream.

But I am not certain exactly what is meant by the passage you quoted.
It is pretty vague.

One way around the deadlock in __srefill() is to add a FTRYLOCKFILE()
macro and do that during _fwalk() before acting on each FILE.  If we
can't lock the FILE we just move on to the next one.

That doesn't seem right to me though.

FWIW, musl doesn't flush every file when refilling a line-buffered or
unbuffered FILE.  Then again, maybe they misinterpreted the standard.



Re: ACPI diff that needs wide testing

2020-12-01 Thread Mark Kettenis
> From: Greg Steuck 
> Date: Mon, 30 Nov 2020 20:54:59 -0800
> 
> Mark Kettenis  writes:
> 
> > The diff below fixes the way we handle named references in AML
> > packages.  This fixes some bugs but I'd like to make sure that this
> > doesn't inadvertently break things.  So tests on a wide variety of
> > machines are welcome.
> 
> I see a prompt crash:

Does this one work better?

Index: dev/acpi/acpi.c
===
RCS file: /cvs/src/sys/dev/acpi/acpi.c,v
retrieving revision 1.391
diff -u -p -r1.391 acpi.c
--- dev/acpi/acpi.c 27 Aug 2020 01:08:55 -  1.391
+++ dev/acpi/acpi.c 1 Dec 2020 21:40:03 -
@@ -2980,21 +2980,24 @@ acpi_getprop(struct aml_node *node, cons
/* Check properties. */
for (i = 0; i < dsd.v_package[1]->length; i++) {
struct aml_value *res = dsd.v_package[1]->v_package[i];
+   struct aml_value *val;
int len;
 
if (res->type != AML_OBJTYPE_PACKAGE || res->length != 2 ||
res->v_package[0]->type != AML_OBJTYPE_STRING)
continue;
 
-   len = res->v_package[1]->length;
-   switch (res->v_package[1]->type) {
+   val = res->v_package[1];
+   if (val->type == AML_OBJTYPE_OBJREF)
+   val = val->v_objref.ref;
+
+   len = val->length;
+   switch (val->type) {
case AML_OBJTYPE_BUFFER:
-   memcpy(buf, res->v_package[1]->v_buffer,
-   min(len, buflen));
+   memcpy(buf, val->v_buffer, min(len, buflen));
return len;
case AML_OBJTYPE_STRING:
-   memcpy(buf, res->v_package[1]->v_string,
-   min(len, buflen));
+   memcpy(buf, val->v_string, min(len, buflen));
return len;
}
}
@@ -3031,14 +3034,22 @@ acpi_getpropint(struct aml_node *node, c
/* Check properties. */
for (i = 0; i < dsd.v_package[1]->length; i++) {
struct aml_value *res = dsd.v_package[1]->v_package[i];
+   struct aml_value *val;
 
if (res->type != AML_OBJTYPE_PACKAGE || res->length != 2 ||
-   res->v_package[0]->type != AML_OBJTYPE_STRING ||
-   res->v_package[1]->type != AML_OBJTYPE_INTEGER)
+   res->v_package[0]->type != AML_OBJTYPE_STRING)
+   continue;
+
+   val = res->v_package[1];
+   if (val->type == AML_OBJTYPE_OBJREF)
+   val = val->v_objref.ref;
+
+   if (val->type != AML_OBJTYPE_INTEGER)
continue;
 
-   if (strcmp(res->v_package[0]->v_string, prop) == 0)
-   return res->v_package[1]->v_integer;
+   if (strcmp(res->v_package[0]->v_string, prop) == 0 &&
+   val->type == AML_OBJTYPE_INTEGER)
+   return val->v_integer;
}
 
return defval;
@@ -3130,7 +3141,7 @@ const char *acpi_isa_hids[] = {
 void
 acpi_attach_deps(struct acpi_softc *sc, struct aml_node *node)
 {
-   struct aml_value res;
+   struct aml_value res, *val;
struct aml_node *dep;
int i;
 
@@ -3141,9 +3152,14 @@ acpi_attach_deps(struct acpi_softc *sc, 
return;
 
for (i = 0; i < res.length; i++) {
-   if (res.v_package[i]->type != AML_OBJTYPE_STRING)
+   val = res.v_package[i];
+   if (val->type == AML_OBJTYPE_OBJREF)
+   val = val->v_objref.ref;
+   if (val->type != AML_OBJTYPE_DEVICE) {
+   printf("%s: type %d\n", __func__, val->type);
continue;
-   dep = aml_searchrel(node, res.v_package[i]->v_string);
+   }
+   dep = val->node;
if (dep == NULL || dep->attached)
continue;
dep = aml_searchname(dep, "_HID");
Index: dev/acpi/acpiprt.c
===
RCS file: /cvs/src/sys/dev/acpi/acpiprt.c,v
retrieving revision 1.49
diff -u -p -r1.49 acpiprt.c
--- dev/acpi/acpiprt.c  11 Apr 2020 11:01:18 -  1.49
+++ dev/acpi/acpiprt.c  1 Dec 2020 21:40:03 -
@@ -272,14 +272,6 @@ acpiprt_prt_add(struct acpiprt_softc *sc
}
 
pp = v->v_package[2];
-   if (pp->type == AML_OBJTYPE_STRING) {
-   node = aml_searchrel(sc->sc_devnode, pp->v_string);
-   if (node == NULL) {
-   printf("Invalid device\n");
-   return;
-   }
-   pp = node->value;
-   }
if (pp->type == AML_OBJTYPE_NAMEREF) {
node = aml_searchrel(sc->sc_devnode, pp->v_nameref);
if

Re: Dell XPS 9310 successful install but bootloader can't read disk label

2020-12-01 Thread Mark Kettenis
> From: Noth 
> Date: Tue, 1 Dec 2020 21:18:14 +0100
> 
> Hi,
> 
>    As a follow up, I got a usb-c stick and installed -current to that. 
> It boots the system so I now have a dmesg, pcidump and usbdump for you:
> 
> http://casper.nineinchnetworks.ch/images/dmesg9310.txt
> 
> http://casper.nineinchnetworks.ch/images/pcidump9310.txt
> 
> http://casper.nineinchnetworks.ch/images/usbdump9310.txt
> 
> Cheers,
> 
> Noth

While booted using the USB stick, can you:

a) Access the NVMe drive?

b) Show the output of disklabel sd0?

c) Show the output of fdisk sd0?

Cheers,

Mark

> On 19/11/2020 00:31, Noth wrote:
> > Hi,
> >
> >   I've got a brand new Dell XPS 9310 and I've tried to get 6.7, 6.8 
> > and the latest snapshot on it. Installation works fine but neither the 
> > 3.50, 3.54 or 3.55 UEFI bootloaders can read the disk label so it's 
> > unbootable. I of course deactivated Secure Boot and anything in the 
> > BIOS that looked like it might be an obstacle. I've got two pictures 
> > of the bsd.rd dmesg for the latest snapshot for you here:
> >
> > http://casper.nineinchnetworks.ch/images/dmesg1.jpg
> >
> > http://casper.nineinchnetworks.ch/images/dmesg2.jpg
> >
> > I have no idea what to do next, hope this is of use to you.
> >
> > Cheers,
> >
> > Noth
> >
> > P.S.: I'm not subscribed to this mailing list, please CC me when 
> > answering.
> >
> 
> 



Re: Dell XPS 9310 successful install but bootloader can't read disk label

2020-12-01 Thread Noth

Hi,

  As a follow up, I got a usb-c stick and installed -current to that. 
It boots the system so I now have a dmesg, pcidump and usbdump for you:


http://casper.nineinchnetworks.ch/images/dmesg9310.txt

http://casper.nineinchnetworks.ch/images/pcidump9310.txt

http://casper.nineinchnetworks.ch/images/usbdump9310.txt

Cheers,

Noth

On 19/11/2020 00:31, Noth wrote:

Hi,

  I've got a brand new Dell XPS 9310 and I've tried to get 6.7, 6.8 
and the latest snapshot on it. Installation works fine but neither the 
3.50, 3.54 or 3.55 UEFI bootloaders can read the disk label so it's 
unbootable. I of course deactivated Secure Boot and anything in the 
BIOS that looked like it might be an obstacle. I've got two pictures 
of the bsd.rd dmesg for the latest snapshot for you here:


http://casper.nineinchnetworks.ch/images/dmesg1.jpg

http://casper.nineinchnetworks.ch/images/dmesg2.jpg

I have no idea what to do next, hope this is of use to you.

Cheers,

Noth

P.S.: I'm not subscribed to this mailing list, please CC me when 
answering.






Re: ACPI diff that needs wide testing

2020-12-01 Thread Mark Kettenis
> Date: Tue, 1 Dec 2020 20:17:47 +0100
> From: Theo Buehler 
> 
> On Tue, Dec 01, 2020 at 08:10:33PM +0100, Mark Kettenis wrote:
> > > From: Greg Steuck 
> > > Date: Mon, 30 Nov 2020 20:54:59 -0800
> > > 
> > > Mark Kettenis  writes:
> > > 
> > > > The diff below fixes the way we handle named references in AML
> > > > packages.  This fixes some bugs but I'd like to make sure that this
> > > > doesn't inadvertently break things.  So tests on a wide variety of
> > > > machines are welcome.
> > > 
> > > I see a prompt crash: https://photos.app.goo.gl/5NkwAHA6jxrXFzEV7
> > 
> > Can you send me the acpidump output for that machine?
> 
> gnezdo provided iasl -d output:
> 
> > > The contents of
> > >cp -R /var/db/acpi /tmp && iasl -d /tmp/acpi/*
> > > are here: https://blackgnezdo.github.io/hp-elitebook-840.tgz

Ah crap.  Reading is hard...  Sorry Greg.



Re: ACPI diff that needs wide testing

2020-12-01 Thread Theo Buehler
On Tue, Dec 01, 2020 at 08:10:33PM +0100, Mark Kettenis wrote:
> > From: Greg Steuck 
> > Date: Mon, 30 Nov 2020 20:54:59 -0800
> > 
> > Mark Kettenis  writes:
> > 
> > > The diff below fixes the way we handle named references in AML
> > > packages.  This fixes some bugs but I'd like to make sure that this
> > > doesn't inadvertently break things.  So tests on a wide variety of
> > > machines are welcome.
> > 
> > I see a prompt crash: https://photos.app.goo.gl/5NkwAHA6jxrXFzEV7
> 
> Can you send me the acpidump output for that machine?

gnezdo provided iasl -d output:

> > The contents of
> >cp -R /var/db/acpi /tmp && iasl -d /tmp/acpi/*
> > are here: https://blackgnezdo.github.io/hp-elitebook-840.tgz



Re: ACPI diff that needs wide testing

2020-12-01 Thread Mark Kettenis
> From: Greg Steuck 
> Date: Mon, 30 Nov 2020 20:54:59 -0800
> 
> Mark Kettenis  writes:
> 
> > The diff below fixes the way we handle named references in AML
> > packages.  This fixes some bugs but I'd like to make sure that this
> > doesn't inadvertently break things.  So tests on a wide variety of
> > machines are welcome.
> 
> I see a prompt crash: https://photos.app.goo.gl/5NkwAHA6jxrXFzEV7

Can you send me the acpidump output for that machine?

> OCRed:
> 7b880 size 0x50 previous type ACPI (0xdeaf4151 != 0xdeaf4152)
> Stopped at db_enter+0x10: pop %rbp
> TID PID UID PRFLAGS PFLAGS CPU COMMAND
> 0 0 0 0x1 0x200 0K swapper
> db_enter(10,8251cf10,282,8,81887a30,8251cf18) at db_enter+0x10
> panic(81dd9174,81dd9174,3f,81e23d0e,80080807b888,9) at panic+0x12a
> malloc(50,21,9,50,69ffcbe5966771e0,8010d488) at malloc+0x783
> aml_parse(8810d488,69,8003a810,8010d488,ee7eae2dea48b7a4,8010d488) at aml_parse+0x2a32
> aml_parse(8010d488,54,0,80008818d488,ee7eae2dea4983ef,80880818d488) at aml_parse+0x3bc
> aml_eval(80176f88,808088082108,74,18021,8,800888176f88) at aml_eval+0x317
> aml_parse(80176f88,74,7d,80176f88,ee7eae2dea48b7a4,898000176f88) at aml_parse+0x2df6
> aml_parse(808080176f88,54,88176f88,88176f88,ee7eae2dea4983e,f80176f88) at aml_parse+0x3bc
> aml_eval(0,80081a08,74,0,0,0) at aml_eval+0x317
> aml_evalnode(88008202c,80082008,8,0,8251d648,88080888202c) at aml_evalnode+0xb4
> aml_evalinteger(8002e400,80082c88,81ddbf04,8,8,8251d6f0) at aml_evalinteger+0x4e
> acpitz_gettempreading(80257800,81ddbf04,dad3d43216419857,ffff8251d73b,80257800,aac) at acpitz_gettempreading+0x37
> acpitz_attach(8002e400,80257800,8251d818,8002e408,44b6122d7f8019d2,8002e480) at acpitz_attach+0x1ea
> config_attach(80028400,8210ef70,82510818,81757640,68aa0891922ad23e,80082c88) at config_attach+0x1f4
> end trace frame: 0x8251d950, count: 0
> end trace frame: 0x8251d950, count: 0
> 
> The contents of
>cp -R /var/db/acpi /tmp && iasl -d /tmp/acpi/*
> are here: https://blackgnezdo.github.io/hp-elitebook-840.tgz
> 
> dmesg when not running your patch:
> OpenBSD 6.8-current (GENERIC.MP) #69: Mon Nov 30 19:53:29 PST 2020
> g...@hippy.nest.cx:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 17055608832 (16265MB)
> avail mem = 16523407360 (15757MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbbe3f000 (33 entries)
> bios0: vendor Hewlett-Packard version "L71 Ver. 01.20" date 07/28/2014
> bios0: Hewlett-Packard HP EliteBook 840 G1
> acpi0 at bios0: ACPI 5.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP HPET APIC MCFG TCPA SSDT SSDT FPDT BGRT SSDT SSDT 
> SSDT SSDT SSDT ASF! DMAR
> acpi0: wakeup devices LANC(S5) EHC1(S3) XHC_(S3) PCIB(S5) NIC_(S5) RP04(S5) 
> WNIC(S5)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpihpet0 at acpi0: 14318179 Hz
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 1995.70 MHz, 06-45-01
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: apic clock running at 99MHz
> cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE
> cpu1 at mainbus0: apid 1 (application processor)
> cpu1: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 1995.39 MHz, 06-45-01
> cpu1: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu1: 256KB 64b/line 8-way L2 cache
> cpu1: smt 1, core 0, package 0
> cpu2 at mainbus0: apid 2 (application processor)
> cpu2: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 1995.40 MHz, 06-45-01
> cpu2: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMO

[PATCH] __rec_open() should set the type

2020-12-01 Thread Boudewijn Dijkstra
According to dbopen(3), the 'type' field of struct DB holds the "type of  
the underlying access method (and file format)."  In __bt_open() it is set  
to DB_BTREE and in __hash_open() it is set to DB_HASH, so one might expect  
that in __rec_open() it is set to DB_RECNO.  However, it is left untouched.


Index: rec_open.c
===
RCS file: /cvs/src/lib/libc/db/recno/rec_open.c,v
retrieving revision 1.13
diff -u -p -u -r1.13 rec_open.c
--- rec_open.c  21 Sep 2016 04:38:56 -  1.13
+++ rec_open.c  1 Dec 2020 13:01:37 -
@@ -154,6 +154,7 @@ slow:   if ((t->bt_rfp = fdopen(rfd, "r"
dbp->put = __rec_put;
dbp->seq = __rec_seq;
dbp->sync = __rec_sync;
+   dbp->type = DB_RECNO;

/* If the root page was created, reset the flags. */
if ((h = mpool_get(t->bt_mp, P_ROOT, 0)) == NULL)



--
Made with Opera's e-mail program: http://www.opera.com/mail/



Re: Use SMR_TAILQ for `ps_threads'

2020-12-01 Thread Martin Pieuchot
On 01/12/20(Tue) 15:30, Claudio Jeker wrote:
> [...] 
> Did you run a make build with that smr_barrier() in it and check that it
> does not cause a slowdown? I am sceptical, smr_barrier() is a very slow
> construct which introduces large delays and should be avoided whenever
> possible.

I did build GENERIC.MP multiple times on a 4CPU sparc64 with the diff
below, without noticeable difference.

I'm happy to hear from sceptical performance checkers :o)

diff --git lib/libkvm/kvm_proc2.c lib/libkvm/kvm_proc2.c
index 96f7dc91b92..1f4f9b914bb 100644
--- lib/libkvm/kvm_proc2.c
+++ lib/libkvm/kvm_proc2.c
@@ -341,8 +341,9 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process *pr,
kp.p_pctcpu = 0;
kp.p_stat = (process.ps_flags & PS_ZOMBIE) ? SDEAD :
SIDL;
-   for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; 
-   p = TAILQ_NEXT(&proc, p_thr_link)) {
+   for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads);
+   p != NULL;
+   p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) {
if (KREAD(kd, (u_long)p, &proc)) {
_kvm_err(kd, kd->program,
"can't read proc at %lx",
@@ -376,8 +377,8 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process *pr,
if (!dothreads)
continue;
 
-   for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; 
-   p = TAILQ_NEXT(&proc, p_thr_link)) {
+   for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads); p != NULL;
+   p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) {
if (KREAD(kd, (u_long)p, &proc)) {
_kvm_err(kd, kd->program,
"can't read proc at %lx",
diff --git sys/kern/exec_elf.c sys/kern/exec_elf.c
index 5e455208663..575273b306c 100644
--- sys/kern/exec_elf.c
+++ sys/kern/exec_elf.c
@@ -85,6 +85,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -1360,7 +1361,7 @@ coredump_notes_elf(struct proc *p, void *iocookie, size_t 
*sizep)
 * threads in the process have been stopped and the list can't
 * change.
 */
-   TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
+   SMR_TAILQ_FOREACH_LOCKED(q, &pr->ps_threads, p_thr_link) {
if (q == p) /* we've taken care of this thread */
continue;
error = coredump_note_elf(q, iocookie, ¬esize);
diff --git sys/kern/init_main.c sys/kern/init_main.c
index fed6be19435..2b657ffe328 100644
--- sys/kern/init_main.c
+++ sys/kern/init_main.c
@@ -519,7 +519,7 @@ main(void *framep)
 */
LIST_FOREACH(pr, &allprocess, ps_list) {
nanouptime(&pr->ps_start);
-   TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
+   SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) {
nanouptime(&p->p_cpu->ci_schedstate.spc_runtime);
timespecclear(&p->p_rtime);
}
diff --git sys/kern/kern_exit.c sys/kern/kern_exit.c
index a20775419e3..3c526ab83b8 100644
--- sys/kern/kern_exit.c
+++ sys/kern/kern_exit.c
@@ -63,6 +63,7 @@
 #ifdef SYSVSEM
 #include 
 #endif
+#include 
 #include 
 
 #include 
@@ -161,7 +162,8 @@ exit1(struct proc *p, int xexit, int xsig, int flags)
}
 
/* unlink ourselves from the active threads */
-   TAILQ_REMOVE(&pr->ps_threads, p, p_thr_link);
+   SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);
+   smr_barrier();
if ((p->p_flag & P_THREAD) == 0) {
/* main thread gotta wait because it has the pid, et al */
while (pr->ps_refcnt > 1)
@@ -724,7 +726,7 @@ process_zap(struct process *pr)
if (pr->ps_ptstat != NULL)
free(pr->ps_ptstat, M_SUBPROC, sizeof(*pr->ps_ptstat));
pool_put(&rusage_pool, pr->ps_ru);
-   KASSERT(TAILQ_EMPTY(&pr->ps_threads));
+   KASSERT(SMR_TAILQ_EMPTY_LOCKED(&pr->ps_threads));
lim_free(pr->ps_limit);
crfree(pr->ps_ucred);
pool_put(&process_pool, pr);
diff --git sys/kern/kern_fork.c sys/kern/kern_fork.c
index 9fb239bc8b4..e1cb587b2b8 100644
--- sys/kern/kern_fork.c
+++ sys/kern/kern_fork.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -179,8 +180,8 @@ process_initialize(struct process *pr, struct proc *p)
 {
/* initialize the thread links */
pr->ps_mainproc = p;
-   TAILQ_INIT(&pr->ps_threads);
-   TAILQ_INSERT_TAIL(&pr->ps_threads, p, p_thr_link);
+   SMR_TAILQ_INIT(&pr->ps_threads);
+   SMR_TAILQ_INSERT_TAIL_LOCKED(&pr->ps_threads, p, p_thr_link);
pr->ps_refcnt = 1;
p->p_p = pr;
 
@@ -557,7 +558,7 @@ t

Re: More uvm_pagefree() lock fixes

2020-12-01 Thread Mark Kettenis
> Date: Tue, 1 Dec 2020 13:18:59 -0300
> From: Martin Pieuchot 
> 
> On 01/12/20(Tue) 15:18, Mark Kettenis wrote:
> > > Date: Tue, 1 Dec 2020 11:08:50 -0300
> > > From: Martin Pieuchot 
> > > 
> > > As recently pointed out by jmatthew@ these need the lock as well, ok?
> > 
> > No.  These pages are "unmanaged" and therefore they can not be on any
> > of the page queues and locking those page queues would be pointless.
> > 
> > We probably should adjust the documentation to reflect this.
> 
> Something like that?

Yes, because...

> Btw did you look at the uses of uvm_pagefree(), uvm_pagelookup() & co in
> the various pmap?  Do you have any suggestion about how to handle them?

...the pages in the various pmaps aren't managed either.  However the
use of uvm_pagelookup() in some of the pmaps suggests that we need to
think a bit about PG_TABLED.  It doesn't really make sense to use the
page queue lock for that though.

For the amd64 pmap there are separate objects for each individual
userland pmap.  So the per-pmap lock may already provide the necessary
locking.

I also need to go back to your previous diff.  In hindsight I think
locking the page queues should not be necessary for buffer map pages
either.

> Index: uvm/uvm_page.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_page.c,v
> retrieving revision 1.153
> diff -u -p -r1.153 uvm_page.c
> --- uvm/uvm_page.c1 Dec 2020 13:56:22 -   1.153
> +++ uvm/uvm_page.c1 Dec 2020 16:16:29 -
> @@ -928,7 +928,7 @@ uvm_pagerealloc(struct vm_page *pg, stru
>   * uvm_pageclean: clean page
>   *
>   * => erase page's identity (i.e. remove from object)
> - * => caller must lock page queues
> + * => caller must lock page queues if `pg' is managed
>   * => assumes all valid mappings of pg are gone
>   */
>  void
> @@ -937,7 +937,8 @@ uvm_pageclean(struct vm_page *pg)
>   u_int flags_to_clear = 0;
>  
>  #if all_pmap_are_fixed
> - MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
> + if (pg->pg_flags & (PG_TABLED|PQ_ACTIVE|PQ_INACTIVE))
> + MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
>  #endif
>  
>  #ifdef DEBUG
> @@ -998,14 +999,15 @@ uvm_pageclean(struct vm_page *pg)
>   *
>   * => erase page's identity (i.e. remove from object)
>   * => put page on free list
> - * => caller must lock page queues
> + * => caller must lock page queues if `pg' is managed
>   * => assumes all valid mappings of pg are gone
>   */
>  void
>  uvm_pagefree(struct vm_page *pg)
>  {
>  #if all_pmap_are_fixed
> - MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
> + if (pg->pg_flags & (PG_TABLED|PQ_ACTIVE|PQ_INACTIVE))
> + MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
>  #endif
>  
>   uvm_pageclean(pg);
> 



Re: [PATCH] __rec_open() should set the type

2020-12-01 Thread Todd C . Miller
On Tue, 01 Dec 2020 14:07:18 +0100, "Boudewijn Dijkstra" wrote:

> According to dbopen(3), the 'type' field of struct DB holds the "type of  
> the underlying access method (and file format)."  In __bt_open() it is set  
> to DB_BTREE and in __hash_open() it is set to DB_HASH, so one might expect  
> that in __rec_open() it is set to DB_RECNO.  However, it is left untouched.

Thanks, committed.

 - todd



Re: More uvm_pagefree() lock fixes

2020-12-01 Thread Martin Pieuchot
On 01/12/20(Tue) 15:18, Mark Kettenis wrote:
> > Date: Tue, 1 Dec 2020 11:08:50 -0300
> > From: Martin Pieuchot 
> > 
> > As recently pointed out by jmatthew@ these need the lock as well, ok?
> 
> No.  These pages are "unmanaged" and therefore they can not be on any
> of the page queues and locking those page queues would be pointless.
> 
> We probably should adjust the documentation to reflect this.

Something like that?

Btw did you look at the uses of uvm_pagefree(), uvm_pagelookup() & co in
the various pmap?  Do you have any suggestion about how to handle them?

Index: uvm/uvm_page.c
===
RCS file: /cvs/src/sys/uvm/uvm_page.c,v
retrieving revision 1.153
diff -u -p -r1.153 uvm_page.c
--- uvm/uvm_page.c  1 Dec 2020 13:56:22 -   1.153
+++ uvm/uvm_page.c  1 Dec 2020 16:16:29 -
@@ -928,7 +928,7 @@ uvm_pagerealloc(struct vm_page *pg, stru
  * uvm_pageclean: clean page
  *
  * => erase page's identity (i.e. remove from object)
- * => caller must lock page queues
+ * => caller must lock page queues if `pg' is managed
  * => assumes all valid mappings of pg are gone
  */
 void
@@ -937,7 +937,8 @@ uvm_pageclean(struct vm_page *pg)
u_int flags_to_clear = 0;
 
 #if all_pmap_are_fixed
-   MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
+   if (pg->pg_flags & (PG_TABLED|PQ_ACTIVE|PQ_INACTIVE))
+   MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
 #endif
 
 #ifdef DEBUG
@@ -998,14 +999,15 @@ uvm_pageclean(struct vm_page *pg)
  *
  * => erase page's identity (i.e. remove from object)
  * => put page on free list
- * => caller must lock page queues
+ * => caller must lock page queues if `pg' is managed
  * => assumes all valid mappings of pg are gone
  */
 void
 uvm_pagefree(struct vm_page *pg)
 {
 #if all_pmap_are_fixed
-   MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
+   if (pg->pg_flags & (PG_TABLED|PQ_ACTIVE|PQ_INACTIVE))
+   MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
 #endif
 
uvm_pageclean(pg);



snmpd drop traphandler process

2020-12-01 Thread Martijn van Duren
Hello tech@,

Long story short: the traphandler process in snmpd annoys me a great
deal and is in the way of overhauling the transport mapping section
of snmpe, which is needed for implementing new agentx master support.

The current traphandler process is also a joke, since all it does is
receive a message, perform minimal parsing validation and then pass
the entire buffer to the parent process, which forks the actual
handlers. This means that the current pledgeset is also way too wide
for what traphandler does.

Since snmpe already does similar packet parsing, I see no reason to
keep this initial verification in the traphandler process. The diff
below drops the traphandler process and moves the receiving of the
packets to snmpe, which then forwards the pdu (instead of the
current whole packet) to the parent. I also extended the checking, so
it should be a little more resilient to malformed packets.

While here I also didn't like the fact that listening on a trap-port
always implies that snmp itself is also listening. This diff adds the
trap keyword (for udp) on the listen on line, so that we can setup a
traphandler-only setup. This gives the added bonus that we can now
also choose the port we listen on for traps. Adding a listen trap
statement requires a trap handle and vice versa.
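For illustration, a traphandler-only setup under the new grammar might
look like the fragment below.  The syntax is inferred from the parser
change in the diff that follows, and the address, OID and handler path
are hypothetical:

```
# receive traps only; no regular SNMP listener
listen on udp 127.0.0.1 port 162 trap
trap handle 1.3.6.1.4.1 "/usr/local/bin/traphandler.sh"
```

Per the diff, omitting "port" on a trap listener would default to the
trap port rather than the SNMP port.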

It's not the prettiest code, but it's a stopgap that allows me to move
forward without creating even larger diffs and without (hopefully)
breaking existing setups (apart from the keyword change).

OK?

martijn@

Index: parse.y
===
RCS file: /cvs/src/usr.sbin/snmpd/parse.y,v
retrieving revision 1.62
diff -u -p -r1.62 parse.y
--- parse.y 30 Oct 2020 07:43:48 -  1.62
+++ parse.y 1 Dec 2020 16:12:20 -
@@ -94,11 +94,9 @@ char *symget(const char *);
 struct snmpd   *conf = NULL;
 static int  errors = 0;
 static struct usmuser  *user = NULL;
-static char*snmpd_port = SNMPD_PORT;
 
 int host(const char *, const char *, int,
struct sockaddr_storage *, int);
-int listen_add(struct sockaddr_storage *, int);
 
 typedef struct {
union {
@@ -284,13 +282,16 @@ listenproto   : UDP listen_udp
 listen_udp : STRING port   {
struct sockaddr_storage ss[16];
int nhosts, i;
+   char *port = $2;
 
-   nhosts = host($1, $2, SOCK_DGRAM, ss, nitems(ss));
+   if (port == NULL)
+   port = SNMPD_PORT;
+
+   nhosts = host($1, port, SOCK_DGRAM, ss, nitems(ss));
if (nhosts < 1) {
yyerror("invalid address: %s", $1);
free($1);
-   if ($2 != snmpd_port)
-   free($2);
+   free($2);
YYERROR;
}
if (nhosts > (int)nitems(ss))
@@ -298,10 +299,38 @@ listen_udp: STRING port   {
$1, $2, nitems(ss));
 
free($1);
-   if ($2 != snmpd_port)
+   free($2);
+   for (i = 0; i < nhosts; i++) {
+   if (listen_add(&(ss[i]), SOCK_DGRAM, 0) == -1) {
+   yyerror("calloc");
+   YYERROR;
+   }
+   }
+   }
+   | STRING port TRAP  {
+   struct sockaddr_storage ss[16];
+   int nhosts, i;
+   char *port = $2;
+
+   if (port == NULL)
+   port = SNMPD_TRAPPORT;
+
+   nhosts = host($1, port, SOCK_DGRAM, ss, nitems(ss));
+   if (nhosts < 1) {
+   yyerror("invalid address: %s", $1);
+   free($1);
free($2);
+   YYERROR;
+   }
+   if (nhosts > (int)nitems(ss))
+   log_warn("%s:%s resolves to more than %zu 
hosts",
+   $1, $2, nitems(ss));
+
+   free($1);
+   free($2);
for (i = 0; i < nhosts; i++) {
-   if (listen_add(&(ss[i]), SOCK_DGRAM) == -1) {
+   if (listen_add(&(ss[i]), SOCK_DGRAM,
+   ADDRESS_FLAG_TRAP) == -1) {
yyerror("calloc");
YYERROR;
   

Re: Use SMR_TAILQ for `ps_threads'

2020-12-01 Thread Claudio Jeker
On Tue, Dec 01, 2020 at 10:46:00AM -0300, Martin Pieuchot wrote:
> On 01/12/20(Tue) 21:47, Jonathan Matthew wrote:
> > On Tue, Dec 01, 2020 at 10:31:43AM +0100, Claudio Jeker wrote:
> > > On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote:
> > > > Every multi-threaded process keeps a list of threads in `ps_threads'.
> > > > This list is iterated in interrupt and process context which makes it
> > > > complicated to protect it with a rwlock.
> > > > 
> > > > One of the places where such iteration is done is inside the tsleep(9)
> > > > routines, directly in single_thread_check() or via CURSIG().  In order
> > > > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to
> > > > use SMR_TAILQ.  This has the advantage of not introducing lock
> > > > dependencies and allows us to address every iteration one by one.
> > > > 
> > > > Diff below is a first step into this direction, it replaces the existing
> > > > TAILQ_* macros by the locked version of SMR_TAILQ*.  This is mostly 
> > > > lifted
> > > > from claudio@'s diff and should not introduce any side effect.
> > > > 
> > > > ok?
> > > > 
> > > > diff --git sys/uvm/uvm_glue.c sys/uvm/uvm_glue.c
> > > > index 390307c4c81..40a10e4c1c5 100644
> > > > --- sys/uvm/uvm_glue.c
> > > > +++ sys/uvm/uvm_glue.c
> > > > @@ -369,7 +369,7 @@ uvm_swapout_threads(void)
> > > >  * the smallest p_slptime
> > > >  */
> > > > slpp = NULL;
> > > > -   TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
> > > > +   SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, 
> > > > p_thr_link) {
> > > > switch (p->p_stat) {
> > > > case SRUN:
> > > > case SONPROC:
> > > > 
> > > 
> > > Why did you not include the smr_call() to safely free struct proc in this
> > > diff?
> 
> So we can discuss it separately; I'm happy to do it now :)
> 
> > I was wondering about this too.  Freeing the struct proc is already delayed
> > by some amount since it happens in the reaper or in the parent process, does
> > it make sense to combine that with the SMR wait?
> 
> exit1() for a multi-threaded process does the following:
> 
>   atomic_setbits_int(&p->p_flag, P_WEXIT);
> 
>   single_thread_check(p, 0);  // directly or via single_thread_set()
> 
>   SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);
> 
> 
> Note that exit1() can't be called in parallel.
> 
> If exit1() is called with EXIT_NORMAL, the current thread will start by
> setting `ps_single' then iterating over `ps_threads'.  This is done before
> updating `ps_threads' so there's no race in that case.
> 
> If exit1() is called with EXIT_THREAD, the current thread might end up
> removing itself from `ps_threads' while another one is iterating over it.
> Since we're currently concerned about the iteration in single_thread_set()
> notice that `P_WEXIT' is checked first.  So if my understanding is correct
> it should be enough to do the following:
> 
>   SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);
>   smr_barrier();
> 
> That would delay exit1() a bit but doesn't involve a callback, which is
> IMHO simpler.
> 
> Did I miss anything?

Did you run a make build with that smr_barrier() in it and check that it
does not cause a slowdown? I am sceptical, smr_barrier() is a very slow
construct which introduces large delays and should be avoided whenever
possible.

-- 
:wq Claudio



Re: Prevent race in single_thread_set()

2020-12-01 Thread Claudio Jeker
On Tue, Dec 01, 2020 at 10:27:15AM -0300, Martin Pieuchot wrote:
> On 01/12/20(Tue) 10:21, Claudio Jeker wrote:
> > On Mon, Nov 30, 2020 at 07:19:28PM -0300, Martin Pieuchot wrote:
> > > On 04/11/20(Wed) 11:19, Martin Pieuchot wrote:
> > > > Here's a 3rd approach to solve the TOCTOU race in single_thread_set().
> > > > The issue being that the lock serializing access to `ps_single' is not
> > > > held when calling single_thread_check().
> > > > 
> > > > The approach below is controversial because it extends the scope of the
> > > > SCHED_LOCK().  On the other hand, the two others approaches that both
> > > > add a new lock to avoid this race ignore the fact that accesses to
> > > > `ps_single' are currently not clearly serialized w/o KERNEL_LOCK().
> > > > 
> > > > So the diff below improves the situation in that regard and does not add
> > > > more complexity due to the use of multiple locks.  After having looked
> > > > for a way to split the SCHED_LOCK() I believe this is the simplest
> > > > approach.
> > > > 
> > > > I deliberately used a *_locked() function to avoid grabbing the lock
> > > > recursively as I'm trying to get rid of the recursion, see the other
> > > > thread on tech@.
> > > > 
> > > > That said the uses of `ps_single' in ptrace_ctrl() are not covered by
> > > > this diff and I'd be glad to hear some comments about them.  This is
> > > > fine as long as all the code using `ps_single' runs under KERNEL_LOCK()
> > > > but since we're trying to get the single_thread_* API out of it, this
> > > > needs to be addressed.
> > > > 
> > > > Note that this diff introduces a helper for initializing ps_single*
> > > > values in order to keep all the accesses of those fields in the same
> > > > file.
> > > 
> > > Anyone?  With this only the `ps_threads' iteration must receive some
> > > love in order to take the single_thread_* API out of the KERNEL_LOCK().
> > > For that I just sent a SMR_TAILQ conversion diff.
> > > 
> > > Combined with the diff to remove the recursive attribute of the
> > > SCHED_LOCK() we're ready to split it into multiple mutexes.
> > > 
> > > Does that make any sense?  Comments?  Oks?
> > > 
> > > > Index: kern/kern_fork.c
> > > > ===
> > > > RCS file: /cvs/src/sys/kern/kern_fork.c,v
> > > > retrieving revision 1.226
> > > > diff -u -p -r1.226 kern_fork.c
> > > > --- kern/kern_fork.c25 Oct 2020 01:55:18 -  1.226
> > > > +++ kern/kern_fork.c4 Nov 2020 12:52:54 -
> > > > @@ -563,10 +563,7 @@ thread_fork(struct proc *curp, void *sta
> > > >  * if somebody else wants to take us to single threaded mode,
> > > >  * count ourselves in.
> > > >  */
> > > > -   if (pr->ps_single) {
> > > > -   atomic_inc_int(&pr->ps_singlecount);
> > > > -   atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
> > > > -   }
> > > > +   single_thread_init(p);
> > 
> > This is not the right name for this function. It does not initialize
> > anything. Why is this indirection needed? I would just put the
> > SCHED_LOCK() around this bit. It makes more sense especially with the
> > comment above.
> 
> Updated diff does that.  I introduced a function because it helps me
> keep all the locking in the same place.
> 
> > > > Index: kern/kern_sig.c
> > > > ===
> > > > RCS file: /cvs/src/sys/kern/kern_sig.c,v
> > > > retrieving revision 1.263
> > > > diff -u -p -r1.263 kern_sig.c
> > > > --- kern/kern_sig.c 16 Sep 2020 13:50:42 -  1.263
> > > > +++ kern/kern_sig.c 4 Nov 2020 12:38:35 -
> > > > @@ -1932,11 +1932,27 @@ userret(struct proc *p)
> > > > p->p_cpu->ci_schedstate.spc_curpriority = p->p_usrpri;
> > > >  }
> > > >  
> > > > +void
> > > > +single_thread_init(struct proc *p)
> > > > +{
> > > > +   struct process *pr = p->p_p;
> > > > +   int s;
> > > > +
> > > > +   SCHED_LOCK(s);
> > > > +   if (pr->ps_single) {
> > > > +   atomic_inc_int(&pr->ps_singlecount);
> > > > +   atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
> > > > +   }
> > > > +   SCHED_UNLOCK(s);
> > > > +}
> > > > +
> > > >  int
> > > > -single_thread_check(struct proc *p, int deep)
> > > > +_single_thread_check_locked(struct proc *p, int deep)
> > 
> > Please don't add the leading _ to this function. There is no need for it.
> 
> Done.
> 
> > > > @@ -2014,7 +2043,6 @@ single_thread_set(struct proc *p, enum s
> > > > panic("single_thread_mode = %d", mode);
> > > >  #endif
> > > > }
> > > > -   SCHED_LOCK(s);
> > > > pr->ps_singlecount = 0;
> > > > membar_producer();
> > > > pr->ps_single = p;
> > 
> > I think you can remove the membar_producer() here. It is of no use
> > anymore.
> 
> I can indeed because SCHED_LOCK() is currently taken by sleep_setup().
> I'd argue this is an implementation detail and I hope to change th

Re: More uvm_pagefree() lock fixes

2020-12-01 Thread Mark Kettenis
> Date: Tue, 1 Dec 2020 11:08:50 -0300
> From: Martin Pieuchot 
> 
> As recently pointed out by jmatthew@ these need the lock as well, ok?

No.  These pages are "unmanaged" and therefore they can not be on any
of the page queues and locking those page queues would be pointless.

We probably should adjust the documentation to reflect this.

> Index: uvm/uvm_km.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_km.c,v
> retrieving revision 1.137
> diff -u -p -r1.137 uvm_km.c
> --- uvm/uvm_km.c  23 May 2020 06:15:09 -  1.137
> +++ uvm/uvm_km.c  1 Dec 2020 14:00:38 -
> @@ -291,7 +291,9 @@ uvm_km_pgremove_intrsafe(vaddr_t start, 
>   pg = PHYS_TO_VM_PAGE(pa);
>   if (pg == NULL)
>   panic("uvm_km_pgremove_intrsafe: no page");
> + uvm_lock_pageq();
>   uvm_pagefree(pg);
> + uvm_unlock_pageq();
>   }
>  }
>  
> @@ -799,7 +801,9 @@ uvm_km_doputpage(struct uvm_km_free_page
>   if (freeva)
>   uvm_unmap(kernel_map, va, va + PAGE_SIZE);
>  
> + uvm_lock_pageq();
>   uvm_pagefree(pg);
> + uvm_unlock_pageq();
>   return (nextfp);
>  }
>  #endif   /* !__HAVE_PMAP_DIRECT */
> 
> 



More uvm_pagefree() lock fixes

2020-12-01 Thread Martin Pieuchot
As recently pointed out by jmatthew@ these need the lock as well, ok?

Index: uvm/uvm_km.c
===
RCS file: /cvs/src/sys/uvm/uvm_km.c,v
retrieving revision 1.137
diff -u -p -r1.137 uvm_km.c
--- uvm/uvm_km.c23 May 2020 06:15:09 -  1.137
+++ uvm/uvm_km.c1 Dec 2020 14:00:38 -
@@ -291,7 +291,9 @@ uvm_km_pgremove_intrsafe(vaddr_t start, 
pg = PHYS_TO_VM_PAGE(pa);
if (pg == NULL)
panic("uvm_km_pgremove_intrsafe: no page");
+   uvm_lock_pageq();
uvm_pagefree(pg);
+   uvm_unlock_pageq();
}
 }
 
@@ -799,7 +801,9 @@ uvm_km_doputpage(struct uvm_km_free_page
if (freeva)
uvm_unmap(kernel_map, va, va + PAGE_SIZE);
 
+   uvm_lock_pageq();
uvm_pagefree(pg);
+   uvm_unlock_pageq();
return (nextfp);
 }
 #endif /* !__HAVE_PMAP_DIRECT */



Re: Use SMR_TAILQ for `ps_threads'

2020-12-01 Thread Martin Pieuchot
On 01/12/20(Tue) 21:47, Jonathan Matthew wrote:
> On Tue, Dec 01, 2020 at 10:31:43AM +0100, Claudio Jeker wrote:
> > On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote:
> > > Every multi-threaded process keeps a list of threads in `ps_threads'.
> > > This list is iterated in interrupt and process context which makes it
> > > complicated to protect it with a rwlock.
> > > 
> > > One of the places where such iteration is done is inside the tsleep(9)
> > > routines, directly in single_thread_check() or via CURSIG().  In order
> > > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to
> > > use SMR_TAILQ.  This has the advantage of not introducing lock
> > > dependencies and allows us to address every iteration one by one.
> > > 
> > > Diff below is a first step into this direction, it replaces the existing
> > > TAILQ_* macros by the locked version of SMR_TAILQ*.  This is mostly lifted
> > > from claudio@'s diff and should not introduce any side effect.
> > > 
> > > ok?
> > > 
> > > diff --git sys/uvm/uvm_glue.c sys/uvm/uvm_glue.c
> > > index 390307c4c81..40a10e4c1c5 100644
> > > --- sys/uvm/uvm_glue.c
> > > +++ sys/uvm/uvm_glue.c
> > > @@ -369,7 +369,7 @@ uvm_swapout_threads(void)
> > >* the smallest p_slptime
> > >*/
> > >   slpp = NULL;
> > > - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
> > > + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) {
> > >   switch (p->p_stat) {
> > >   case SRUN:
> > >   case SONPROC:
> > > 
> > 
> > Why did you not include the smr_call() to safely free struct proc in this
> > diff?

So we can discuss it separately; I'm happy to do it now :)

> I was wondering about this too.  Freeing the struct proc is already delayed
> by some amount since it happens in the reaper or in the parent process, does
> it make sense to combine that with the SMR wait?

exit1() for a multi-threaded process does the following:

atomic_setbits_int(&p->p_flag, P_WEXIT);

single_thread_check(p, 0);  // directly or via single_thread_set()

SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);


Note that exit1() can't be called in parallel.

If exit1() is called with EXIT_NORMAL, the current thread will start by
setting `ps_single' then iterating over `ps_threads'.  This is done before
updating `ps_threads' so there's no race in that case.

If exit1() is called with EXIT_THREAD, the current thread might end up
removing itself from `ps_threads' while another one is iterating over it.
Since we're currently concerned about the iteration in single_thread_set()
notice that `P_WEXIT' is checked first.  So if my understanding is correct
it should be enough to do the following:

SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);
smr_barrier();

That would delay exit1() a bit but doesn't involve a callback, which is
IMHO simpler.

Did I miss anything?



Re: Prevent race in single_thread_set()

2020-12-01 Thread Martin Pieuchot
On 01/12/20(Tue) 10:21, Claudio Jeker wrote:
> On Mon, Nov 30, 2020 at 07:19:28PM -0300, Martin Pieuchot wrote:
> > On 04/11/20(Wed) 11:19, Martin Pieuchot wrote:
> > > Here's a 3rd approach to solve the TOCTOU race in single_thread_set().
> > > The issue being that the lock serializing access to `ps_single' is not
> > > held when calling single_thread_check().
> > > 
> > > The approach below is controversial because it extends the scope of the
> > > SCHED_LOCK().  On the other hand, the two others approaches that both
> > > add a new lock to avoid this race ignore the fact that accesses to
> > > `ps_single' are currently not clearly serialized w/o KERNEL_LOCK().
> > > 
> > > So the diff below improves the situation in that regard and does not add
> > > more complexity due to the use of multiple locks.  After having looked
> > > for a way to split the SCHED_LOCK() I believe this is the simplest
> > > approach.
> > > 
> > > I deliberately used a *_locked() function to avoid grabbing the lock
> > > recursively as I'm trying to get rid of the recursion, see the other
> > > thread on tech@.
> > > 
> > > That said the uses of `ps_single' in ptrace_ctrl() are not covered by
> > > this diff and I'd be glad to hear some comments about them.  This is
> > > fine as long as all the code using `ps_single' runs under KERNEL_LOCK()
> > > but since we're trying to get the single_thread_* API out of it, this
> > > needs to be addressed.
> > > 
> > > Note that this diff introduces a helper for initializing ps_single*
> > > values in order to keep all the accesses of those fields in the same
> > > file.
> > 
> > Anyone?  With this only the `ps_threads' iteration must receive some
> > love in order to take the single_thread_* API out of the KERNEL_LOCK().
> > For that I just sent a SMR_TAILQ conversion diff.
> > 
> > Combined with the diff to remove the recursive attribute of the
> > SCHED_LOCK() we're ready to split it into multiple mutexes.
> > 
> > Does that make any sense?  Comments?  Oks?
> > 
> > > Index: kern/kern_fork.c
> > > ===
> > > RCS file: /cvs/src/sys/kern/kern_fork.c,v
> > > retrieving revision 1.226
> > > diff -u -p -r1.226 kern_fork.c
> > > --- kern/kern_fork.c  25 Oct 2020 01:55:18 -  1.226
> > > +++ kern/kern_fork.c  4 Nov 2020 12:52:54 -
> > > @@ -563,10 +563,7 @@ thread_fork(struct proc *curp, void *sta
> > >* if somebody else wants to take us to single threaded mode,
> > >* count ourselves in.
> > >*/
> > > - if (pr->ps_single) {
> > > - atomic_inc_int(&pr->ps_singlecount);
> > > - atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
> > > - }
> > > + single_thread_init(p);
> 
> This is not the right name for this function. It does not initialize
> anything. Why is this indirection needed? I would just put the
> SCHED_LOCK() around this bit. It makes more sense especially with the
> comment above.

Updated diff does that.  I introduced a function because it helps me
keep all the locking in the same place.

> > > Index: kern/kern_sig.c
> > > ===
> > > RCS file: /cvs/src/sys/kern/kern_sig.c,v
> > > retrieving revision 1.263
> > > diff -u -p -r1.263 kern_sig.c
> > > --- kern/kern_sig.c   16 Sep 2020 13:50:42 -  1.263
> > > +++ kern/kern_sig.c   4 Nov 2020 12:38:35 -
> > > @@ -1932,11 +1932,27 @@ userret(struct proc *p)
> > >   p->p_cpu->ci_schedstate.spc_curpriority = p->p_usrpri;
> > >  }
> > >  
> > > +void
> > > +single_thread_init(struct proc *p)
> > > +{
> > > + struct process *pr = p->p_p;
> > > + int s;
> > > +
> > > + SCHED_LOCK(s);
> > > + if (pr->ps_single) {
> > > + atomic_inc_int(&pr->ps_singlecount);
> > > + atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
> > > + }
> > > + SCHED_UNLOCK(s);
> > > +}
> > > +
> > >  int
> > > -single_thread_check(struct proc *p, int deep)
> > > +_single_thread_check_locked(struct proc *p, int deep)
> 
> Please don't add the leading _ to this function. There is no need for it.

Done.

> > > @@ -2014,7 +2043,6 @@ single_thread_set(struct proc *p, enum s
> > >   panic("single_thread_mode = %d", mode);
> > >  #endif
> > >   }
> > > - SCHED_LOCK(s);
> > >   pr->ps_singlecount = 0;
> > >   membar_producer();
> > >   pr->ps_single = p;
> 
> I think you can remove the membar_producer() here. It is of no use
> anymore.

I can indeed because SCHED_LOCK() is currently taken by sleep_setup().
I'd argue this is an implementation detail and I hope to change that
soon.  So unless I'm completely mistaken, I'd keep it.  But again I
don't have a strong opinion.

Index: kern/kern_fork.c
===
RCS file: /cvs/src/sys/kern/kern_fork.c,v
retrieving revision 1.226
diff -u -p -r1.226 kern_fork.c
--- kern/kern_fork.c25 Oct 2020 01:55:18 -  1.226
+++ kern/kern_fork.c1 Dec 2020 13:04:37 -
@@ -516,7 +516,

Re: Use SMR_TAILQ for `ps_threads'

2020-12-01 Thread Claudio Jeker
On Tue, Dec 01, 2020 at 09:47:35PM +1000, Jonathan Matthew wrote:
> On Tue, Dec 01, 2020 at 10:31:43AM +0100, Claudio Jeker wrote:
> > On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote:
> > > Every multi-threaded process keeps a list of threads in `ps_threads'.
> > > This list is iterated in interrupt and process context which makes it
> > > complicated to protect it with a rwlock.
> > > 
> > > One of the places where such iteration is done is inside the tsleep(9)
> > > routines, directly in single_thread_check() or via CURSIG().  In order
> > > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to
> > > use SMR_TAILQ.  This has the advantage of not introducing lock
> > > dependencies and allows us to address every iteration one-by-one.
> > > 
> > > Diff below is a first step in this direction; it replaces the existing
> > > TAILQ_* macros by the locked version of SMR_TAILQ*.  This is mostly lifted
> > > from claudio@'s diff and should not introduce any side effect.
> > > 
> > > ok?
> > > 
> > > diff --git sys/uvm/uvm_glue.c sys/uvm/uvm_glue.c
> > > index 390307c4c81..40a10e4c1c5 100644
> > > --- sys/uvm/uvm_glue.c
> > > +++ sys/uvm/uvm_glue.c
> > > @@ -369,7 +369,7 @@ uvm_swapout_threads(void)
> > >* the smallest p_slptime
> > >*/
> > >   slpp = NULL;
> > > - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
> > > + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) {
> > >   switch (p->p_stat) {
> > >   case SRUN:
> > >   case SONPROC:
> > > 
> > 
> > Why did you not include the smr_call() to safely free struct proc in this
> > diff?
> 
> I was wondering about this too.  Freeing the struct proc is already delayed
by some amount since it happens in the reaper or in the parent process; does
it make sense to combine that with the SMR wait?

My diff actually did that.  My concern was that SMR callbacks can be very
slow, so combining the two seemed worthwhile.

-- 
:wq Claudio



Re: rad(8): rdomain support

2020-12-01 Thread Florian Obser
I copied the (void)strlcpy from somewhere, likely ifconfig originally.
Thanks for pointing it out, I shall remove it before commit.
I don't like that style and don't think it helps anything during review since 
one needs to check anyway that truncation either can't happen or doesn't matter.

On 1 December 2020 12:02:14 CET, Theo Buehler  wrote:
>On Mon, Nov 30, 2020 at 05:46:57PM +0100, Florian Obser wrote:
>> On Sun, Nov 29, 2020 at 12:20:31PM +0100, Florian Obser wrote:
>> > 
>> > Let rad(8) handle all rdomains in a single daemon, similar to
>previous
>> > work in slaacd.
>> > 
>> > Suggested / requested by tb who showed me previous work by reyk
>which
>> > unfortunately predated my work in slaacd and followed a different
>> > pattern.
>> > 
>> > There has already been a fair bit of testing and input from tb on
>> > previous iterations.
>> > 
>> > More tests, OKs?
>> > 
>> 
>> Updated diff after some testing by tb@
>> 
>> - our AF_ROUTE socket needs a RTABLE_ANY ROUTE_TABLEFILTER otherwise
>>   we don't see link-local addresses arriving in rdomains != 0.
>> - check if the arriving link-local address is tentative (or a
>>   duplicate) and ignore it because we can't use it and sendmsg fails.
>>   Monitor RTM_CHGADDRATTR messages to see when the link-local address
>>   becomes valid
>
>Thank you so much for this!
>
>I think I have done all the testing I can for my use case and this now
>looks all good. I think this covers most, if not all, code paths added
>by this diff. I did as thorough a review as I can, being foreign to
>this
>part of the tree.
>
>ok tb
>
>I only have one tiny comment:
>
>> +++ frontend.c
>
>[...]
>
>> @@ -746,15 +743,29 @@ get_link_state(char *if_name)
>>  return ls;
>>  }
>>  
>> +int
>> +get_ifrdomain(char *if_name)
>> +{
>> +struct ifreq ifr;
>> +
>> +(void) strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));
>
>I'm not sure why you explicitly indicate that you don't want to check,
>while you don't do so in interface_has_linklocal_address() (both copy the
>same name to an array of length IFNAMSIZ).

-- 
Sent from a mobile device. Please excuse poor formatting.



Re: Use SMR_TAILQ for `ps_threads'

2020-12-01 Thread Jonathan Matthew
On Tue, Dec 01, 2020 at 10:31:43AM +0100, Claudio Jeker wrote:
> On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote:
> > Every multi-threaded process keeps a list of threads in `ps_threads'.
> > This list is iterated in interrupt and process context which makes it
> > complicated to protect it with a rwlock.
> > 
> > One of the places where such iteration is done is inside the tsleep(9)
> > routines, directly in single_thread_check() or via CURSIG().  In order
> > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to
> > use SMR_TAILQ.  This has the advantage of not introducing lock
> > dependencies and allows us to address every iteration one-by-one.
> > 
> > Diff below is a first step in this direction; it replaces the existing
> > TAILQ_* macros by the locked version of SMR_TAILQ*.  This is mostly lifted
> > from claudio@'s diff and should not introduce any side effect.
> > 
> > ok?
> > 
> > diff --git sys/uvm/uvm_glue.c sys/uvm/uvm_glue.c
> > index 390307c4c81..40a10e4c1c5 100644
> > --- sys/uvm/uvm_glue.c
> > +++ sys/uvm/uvm_glue.c
> > @@ -369,7 +369,7 @@ uvm_swapout_threads(void)
> >  * the smallest p_slptime
> >  */
> > slpp = NULL;
> > -   TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
> > +   SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) {
> > switch (p->p_stat) {
> > case SRUN:
> > case SONPROC:
> > 
> 
> Why did you not include the smr_call() to safely free struct proc in this
> diff?

I was wondering about this too.  Freeing the struct proc is already delayed
by some amount since it happens in the reaper or in the parent process; does
it make sense to combine that with the SMR wait?



Re: wireguard + witness

2020-12-01 Thread Stuart Henderson
On 2020/12/01 21:27, Matt Dunwoodie wrote:
> On Tue, 1 Dec 2020 10:32:29 +0100
> Sebastien Marie  wrote:
> 
> > Jason, Matt,
> > 
> > sthen@ told me that the same lock is reported several times (exactly,
> > two locks are reported several times: lock1, lock2, lock1, lock2...)
> > 
> > witness(4) reports when a lock doesn't have LO_INITIALIZED flag set in
> > internal part of `struct rwlock'. Next it sets it, and skips further
> > analysis for this particular lock.
> > 
> > if the same lock is reported several times, it means the memory is
> > zeroed (and so the flag removed). it could be intended or not. but in
> > all cases, a rwlock should be properly initialized with rw_init() or
> > RWLOCK_INITIALIZER() before use.
> > 
> > I don't have enough experience with wg(4) stuff to understand the
> > issue well. in wg_decap(), the peer is obtained from a `struct wg_tag` (from
> > wg_tag_get()). If I'm not mistaken, the p_remote could come from
> > wg_send_initiation() or wg_send_response(). but for both, it comes
> > from wg_peer_create(), and p_remote is initialized with
> > noise_remote_init() (and so the lock has a proper rw_init()).
> > 
> > do you have any idea on the cause of the loss of the rwlock flag?
> > 
> > if you want to test it with witness(4) enabled, you will need to build
> > a kernel with "option WITNESS" in config file. you could uncomment it
> > from sys/arch/amd64/conf/GENERIC.MP, and run make config, make clean,
> > make
> 
> Hi,
> 
> This certainly isn't intentional.
> 
> I had a quick look and believe the following patch should fix it. The
> rwlock in the antireplay counter is rw_init'ed and then bzero'ed,
> matching the symptoms. It also explains why lock1,lock2,lock1,... was
> seen as there are generally two counters in use (one for the current
> session and one for the previous session). It wouldn't make sense for
> r_keypair_lock to be in two different locations for the same peer.
> 
> I haven't tested it on my side, but I'm quite confident it should fix the
> uninitialised lock issue.
> 
> Cheers,
> Matt

Thanks. Confirmed this works ok and stops witness from triggering;
I'll go ahead and commit it.


> --- net/wg_noise.c
> +++ net/wg_noise.c
> @@ -467,8 +467,8 @@ noise_remote_begin_session(struct noise_remote *r)
>   kp.kp_local_index = hs->hs_local_index;
>   kp.kp_remote_index = hs->hs_remote_index;
>   getnanouptime(&kp.kp_birthdate);
> - rw_init(&kp.kp_ctr.c_lock, "noise_counter");
>   bzero(&kp.kp_ctr, sizeof(kp.kp_ctr));
> + rw_init(&kp.kp_ctr.c_lock, "noise_counter");
>  
>   /* Now we need to add_new_keypair */
>   rw_enter_write(&r->r_keypair_lock);
> 



Re: rad(8): rdomain support

2020-12-01 Thread Theo Buehler
On Mon, Nov 30, 2020 at 05:46:57PM +0100, Florian Obser wrote:
> On Sun, Nov 29, 2020 at 12:20:31PM +0100, Florian Obser wrote:
> > 
> > Let rad(8) handle all rdomains in a single daemon, similar to previous
> > work in slaacd.
> > 
> > Suggested / requested by tb who showed me previous work by reyk which
> > unfortunately predated my work in slaacd and followed a different
> > pattern.
> > 
> > There has already been a fair bit of testing and input from tb on
> > previous iterations.
> > 
> > More tests, OKs?
> > 
> 
> Updated diff after some testing by tb@
> 
> - our AF_ROUTE socket needs a RTABLE_ANY ROUTE_TABLEFILTER otherwise
>   we don't see link-local addresses arriving in rdomains != 0.
> - check if the arriving link-local address is tentative (or a
>   duplicate) and ignore it because we can't use it and sendmsg fails.
>   Monitor RTM_CHGADDRATTR messages to see when the link-local address
>   becomes valid

Thank you so much for this!

I think I have done all the testing I can for my use case and this now
looks all good. I think this covers most, if not all, code paths added
by this diff. I did as thorough a review as I can, being foreign to this
part of the tree.

ok tb

I only have one tiny comment:

> +++ frontend.c

[...]

> @@ -746,15 +743,29 @@ get_link_state(char *if_name)
>   return ls;
>  }
>  
> +int
> +get_ifrdomain(char *if_name)
> +{
> + struct ifreq ifr;
> +
> + (void) strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));

I'm not sure why you explicitly indicate that you don't want to check,
while you don't do so in interface_has_linklocal_address() (both copy the
same name to an array of length IFNAMSIZ).



Re: wireguard + witness

2020-12-01 Thread Matt Dunwoodie
On Tue, 1 Dec 2020 10:32:29 +0100
Sebastien Marie  wrote:

> Jason, Matt,
> 
> sthen@ told me that the same lock is reported several times (exactly,
> two locks are reported several times: lock1, lock2, lock1, lock2...)
> 
> witness(4) reports when a lock doesn't have LO_INITIALIZED flag set in
> internal part of `struct rwlock'. Next it sets it, and skips further
> analysis for this particular lock.
> 
> if the same lock is reported several times, it means the memory is
> zeroed (and so the flag removed). it could be intended or not. but in
> all cases, a rwlock should be properly initialized with rw_init() or
> RWLOCK_INITIALIZER() before use.
> 
> I don't have enough experience with wg(4) stuff to understand the
> issue well. in wg_decap(), the peer is obtained from a `struct wg_tag` (from
> wg_tag_get()). If I'm not mistaken, the p_remote could come from
> wg_send_initiation() or wg_send_response(). but for both, it comes
> from wg_peer_create(), and p_remote is initialized with
> noise_remote_init() (and so the lock has a proper rw_init()).
> 
> do you have any idea on the cause of the loss of the rwlock flag?
> 
> if you want to test it with witness(4) enabled, you will need to build
> a kernel with "option WITNESS" in config file. you could uncomment it
> from sys/arch/amd64/conf/GENERIC.MP, and run make config, make clean,
> make

Hi,

This certainly isn't intentional.

I had a quick look and believe the following patch should fix it. The
rwlock in the antireplay counter is rw_init'ed and then bzero'ed,
matching the symptoms. It also explains why lock1,lock2,lock1,... was
seen as there are generally two counters in use (one for the current
session and one for the previous session). It wouldn't make sense for
r_keypair_lock to be in two different locations for the same peer.

I haven't tested it on my side, but I'm quite confident it should fix the
uninitialised lock issue.

Cheers,
Matt

--- net/wg_noise.c
+++ net/wg_noise.c
@@ -467,8 +467,8 @@ noise_remote_begin_session(struct noise_remote *r)
kp.kp_local_index = hs->hs_local_index;
kp.kp_remote_index = hs->hs_remote_index;
getnanouptime(&kp.kp_birthdate);
-   rw_init(&kp.kp_ctr.c_lock, "noise_counter");
bzero(&kp.kp_ctr, sizeof(kp.kp_ctr));
+   rw_init(&kp.kp_ctr.c_lock, "noise_counter");
 
/* Now we need to add_new_keypair */
rw_enter_write(&r->r_keypair_lock);



Re: wireguard + witness

2020-12-01 Thread Stuart Henderson
On 2020/12/01 10:32, Sebastien Marie wrote:
> On Tue, Dec 01, 2020 at 06:59:22AM +0100, Sebastien Marie wrote:
> > On Mon, Nov 30, 2020 at 11:14:46PM +, Stuart Henderson wrote:
> > > Thought I'd try a WITNESS kernel to see if that gives any clues about
> > > what's going on with my APU crashing all over the place (long shot but
> > > I got bored with trying different older kernels..)
> > > 
> > > I see these from time to time (one during netstart, and another 4 in
> > > 15 mins uptime), anyone know if they're important?
> > 
> > this check ("lock_object uninitialized") was recently added in witness.
> > 
> > it means that the rwlock was uninitialized (the witness flag
> > LO_INITIALIZED isn't present whereas it should) when witness checks the
> > lock.
> > 
> > it could be:
> > - someone omits to call rw_init() or RWLOCK_INITIALIZER() (possible bug if 
> > memory isn't zero)
> > - the struct has been zeroed (possible bug too)
> > 
> > > witness: lock_object uninitialized: 0x80bcf0d8
> > > Starting stack trace...
> > > witness_checkorder(80bcf0d8,9,0) at witness_checkorder+0xab
> > > rw_enter_write(80bcf0c8) at rw_enter_write+0x43
> > > noise_remote_decrypt(80bcea48,978dc3d,0,fd805e22cb7c,10) at 
> > > noise_remote_decrypt+0x135
> > > wg_decap(8054a000,fd8061835200) at wg_decap+0xda
> > > wg_decap_worker(8054a000) at wg_decap_worker+0x7a
> > > taskq_thread(8012d900) at taskq_thread+0x9f
> > > end trace frame: 0x0, count: 251
> > > End of stack trace.
> > 
> > from the trace, the sole rw_enter_write() usage in noise_remote_decrypt() is
> > 
> >  rw_enter_write(&r->r_keypair_lock)
> > 
> > but I am seeing few rw_init() calls on r_keypair_lock. I will see if I can
> > find the source of the problem.
> > 
> 
> Jason, Matt,
> 
> sthen@ told me that the same lock is reported several times (exactly,
> two locks are reported several times: lock1, lock2, lock1, lock2...)
> 
> witness(4) reports when a lock doesn't have LO_INITIALIZED flag set in
> internal part of `struct rwlock'. Next it sets it, and skips further
> analysis for this particular lock.
> 
> if the same lock is reported several times, it means the memory is
> zeroed (and so the flag removed). it could be intended or not. but in
> all cases, a rwlock should be properly initialized with rw_init() or
> RWLOCK_INITIALIZER() before use.
> 
> I don't have enough experience with wg(4) stuff to understand the
> issue well. in wg_decap(), the peer is obtained from a `struct wg_tag` (from
> wg_tag_get()). If I'm not mistaken, the p_remote could come from
> wg_send_initiation() or wg_send_response(). but for both, it comes
> from wg_peer_create(), and p_remote is initialized with
> noise_remote_init() (and so the lock has a proper rw_init()).
> 
> do you have any idea on the cause of the loss of the rwlock flag?
> 
> if you want to test it with witness(4) enabled, you will need to build
> a kernel with "option WITNESS" in config file. you could uncomment it
> from sys/arch/amd64/conf/GENERIC.MP, and run make config, make clean,
> make
> 
> Thanks.
> -- 
> Sebastien Marie
> 

Config of the machine is simple: one wg peer, the peer is on a fixed
internet address, and this machine is NATed but the address won't change.
It isn't routing; wg is only used to reach the machine itself.

Here's a more complete console log but I don't think it adds anything
to the above.


pf enabled
starting network
em0: no link
em0: 192.168.42.220 lease accepted from 192.168.42.21 (00:1b:21:9b:68:10)
add net 192.168.40.0/21: gateway 192.168.43.33
add net 195.95.187.0/24: gateway 192.168.43.33
reordering libraries:witness: lock_object uninitialized: 0x80bcf580
Starting stack trace...
witness_checkorder(80bcf580,9,0) at witness_checkorder+0xab
rw_enter_write(80bcf570) at rw_enter_write+0x43
noise_remote_decrypt(80bcea48,3f71de8d,0,fd80073918fc,10) at 
noise_remote_decrypt+0x135
wg_decap(8054a000,fd8067faf900) at wg_decap+0xda
wg_decap_worker(8054a000) at wg_decap_worker+0x7a
taskq_thread(8012d900) at taskq_thread+0x9f
end trace frame: 0x0, count: 251
End of stack trace.
 done.
starting early daemons: syslogd pflogd ntpd.
starting RPC daemons:.
savecore: no core dump
checking quotas: done.
clearing /tmp
kern.securelevel: 0 -> 1
creating runtime link editor directory cache.
preserving editor files.
starting network daemons: sshd smtpd sndiod.
starting package daemons: conserver lldpd.
starting local daemons: cron.
Mon Nov 30 22:59:18 GMT 2020
reorder_kernel: failed -- see /usr/share/relink/kernel/WITNESS.MP/relink.log

OpenBSD/amd64 (cons1) (tty00)

login: witness: lock_object uninitialized: 0x80bcf0d8
Starting stack trace...
witness_checkorder(80bcf0d8,9,0) at witness_checkorder+0xab
rw_enter_write(80bcf0c8) at rw_enter_write+0x43
noise_remote_decrypt(80bcea48,c4992785,0,fd80073c89bc,10) at 
noise_remote_decrypt+0x135
wg_decap(8054a00

Re: wireguard + witness

2020-12-01 Thread Sebastien Marie
On Tue, Dec 01, 2020 at 06:59:22AM +0100, Sebastien Marie wrote:
> On Mon, Nov 30, 2020 at 11:14:46PM +, Stuart Henderson wrote:
> > Thought I'd try a WITNESS kernel to see if that gives any clues about
> > what's going on with my APU crashing all over the place (long shot but
> > I got bored with trying different older kernels..)
> > 
> > I see these from time to time (one during netstart, and another 4 in
> > 15 mins uptime), anyone know if they're important?
> 
> this check ("lock_object uninitialized") was recently added in witness.
> 
> it means that the rwlock was uninitialized (the witness flag
> LO_INITIALIZED isn't present whereas it should) when witness checks the
> lock.
> 
> it could be:
> - someone omits to call rw_init() or RWLOCK_INITIALIZER() (possible bug if 
> memory isn't zero)
> - the struct has been zeroed (possible bug too)
> 
> > witness: lock_object uninitialized: 0x80bcf0d8
> > Starting stack trace...
> > witness_checkorder(80bcf0d8,9,0) at witness_checkorder+0xab
> > rw_enter_write(80bcf0c8) at rw_enter_write+0x43
> > noise_remote_decrypt(80bcea48,978dc3d,0,fd805e22cb7c,10) at 
> > noise_remote_decrypt+0x135
> > wg_decap(8054a000,fd8061835200) at wg_decap+0xda
> > wg_decap_worker(8054a000) at wg_decap_worker+0x7a
> > taskq_thread(8012d900) at taskq_thread+0x9f
> > end trace frame: 0x0, count: 251
> > End of stack trace.
> 
> from the trace, the sole rw_enter_write() usage in noise_remote_decrypt() is
> 
>  rw_enter_write(&r->r_keypair_lock)
> 
> but I am seeing few rw_init() calls on r_keypair_lock. I will see if I can
> find the source of the problem.
> 

Jason, Matt,

sthen@ told me that the same lock is reported several times (exactly,
two locks are reported several times: lock1, lock2, lock1, lock2...)

witness(4) reports when a lock doesn't have LO_INITIALIZED flag set in
internal part of `struct rwlock'. Next it sets it, and skips further
analysis for this particular lock.

if the same lock is reported several times, it means the memory is
zeroed (and so the flag removed). it could be intended or not. but in
all cases, a rwlock should be properly initialized with rw_init() or
RWLOCK_INITIALIZER() before use.

I don't have enough experience with wg(4) stuff to understand the
issue well. in wg_decap(), the peer is obtained from a `struct wg_tag` (from
wg_tag_get()). If I'm not mistaken, the p_remote could come from
wg_send_initiation() or wg_send_response(). but for both, it comes
from wg_peer_create(), and p_remote is initialized with
noise_remote_init() (and so the lock has a proper rw_init()).

do you have any idea on the cause of the loss of the rwlock flag?

if you want to test it with witness(4) enabled, you will need to build
a kernel with "option WITNESS" in config file. you could uncomment it
from sys/arch/amd64/conf/GENERIC.MP, and run make config, make clean,
make

Thanks.
-- 
Sebastien Marie



Re: Use SMR_TAILQ for `ps_threads'

2020-12-01 Thread Claudio Jeker
On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote:
> Every multi-threaded process keeps a list of threads in `ps_threads'.
> This list is iterated in interrupt and process context which makes it
> complicated to protect it with a rwlock.
> 
> One of the places where such iteration is done is inside the tsleep(9)
> routines, directly in single_thread_check() or via CURSIG().  In order
> to take this code path out of the KERNEL_LOCK(), claudio@ proposed to
> use SMR_TAILQ.  This has the advantage of not introducing lock
> dependencies and allows us to address every iteration one-by-one.
> 
> Diff below is a first step in this direction; it replaces the existing
> TAILQ_* macros by the locked version of SMR_TAILQ*.  This is mostly lifted
> from claudio@'s diff and should not introduce any side effect.
> 
> ok?
> 
> diff --git lib/libkvm/kvm_proc2.c lib/libkvm/kvm_proc2.c
> index 96f7dc91b92..1f4f9b914bb 100644
> --- lib/libkvm/kvm_proc2.c
> +++ lib/libkvm/kvm_proc2.c
> @@ -341,8 +341,9 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process 
> *pr,
>   kp.p_pctcpu = 0;
>   kp.p_stat = (process.ps_flags & PS_ZOMBIE) ? SDEAD :
>   SIDL;
> - for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; 
> - p = TAILQ_NEXT(&proc, p_thr_link)) {
> + for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads);
> + p != NULL;
> + p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) {
>   if (KREAD(kd, (u_long)p, &proc)) {
>   _kvm_err(kd, kd->program,
>   "can't read proc at %lx",
> @@ -376,8 +377,8 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process 
> *pr,
>   if (!dothreads)
>   continue;
>  
> - for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; 
> - p = TAILQ_NEXT(&proc, p_thr_link)) {
> + for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads); p != NULL;
> + p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) {
>   if (KREAD(kd, (u_long)p, &proc)) {
>   _kvm_err(kd, kd->program,
>   "can't read proc at %lx",
> diff --git sys/kern/exec_elf.c sys/kern/exec_elf.c
> index 5e455208663..575273b306c 100644
> --- sys/kern/exec_elf.c
> +++ sys/kern/exec_elf.c
> @@ -85,6 +85,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -1360,7 +1361,7 @@ coredump_notes_elf(struct proc *p, void *iocookie, 
> size_t *sizep)
>* threads in the process have been stopped and the list can't
>* change.
>*/
> - TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
> + SMR_TAILQ_FOREACH_LOCKED(q, &pr->ps_threads, p_thr_link) {
>   if (q == p) /* we've taken care of this thread */
>   continue;
>   error = coredump_note_elf(q, iocookie, ¬esize);
> diff --git sys/kern/init_main.c sys/kern/init_main.c
> index fed6be19435..2b657ffe328 100644
> --- sys/kern/init_main.c
> +++ sys/kern/init_main.c
> @@ -519,7 +519,7 @@ main(void *framep)
>*/
>   LIST_FOREACH(pr, &allprocess, ps_list) {
>   nanouptime(&pr->ps_start);
> - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
> + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) {
>   nanouptime(&p->p_cpu->ci_schedstate.spc_runtime);
>   timespecclear(&p->p_rtime);
>   }
> diff --git sys/kern/kern_exit.c sys/kern/kern_exit.c
> index a20775419e3..ffc0158954c 100644
> --- sys/kern/kern_exit.c
> +++ sys/kern/kern_exit.c
> @@ -63,6 +63,7 @@
>  #ifdef SYSVSEM
>  #include 
>  #endif
> +#include 
>  #include 
>  
>  #include 
> @@ -161,7 +162,7 @@ exit1(struct proc *p, int xexit, int xsig, int flags)
>   }
>  
>   /* unlink ourselves from the active threads */
> - TAILQ_REMOVE(&pr->ps_threads, p, p_thr_link);
> + SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);
>   if ((p->p_flag & P_THREAD) == 0) {
>   /* main thread gotta wait because it has the pid, et al */
>   while (pr->ps_refcnt > 1)
> @@ -724,7 +725,7 @@ process_zap(struct process *pr)
>   if (pr->ps_ptstat != NULL)
>   free(pr->ps_ptstat, M_SUBPROC, sizeof(*pr->ps_ptstat));
>   pool_put(&rusage_pool, pr->ps_ru);
> - KASSERT(TAILQ_EMPTY(&pr->ps_threads));
> + KASSERT(SMR_TAILQ_EMPTY_LOCKED(&pr->ps_threads));
>   lim_free(pr->ps_limit);
>   crfree(pr->ps_ucred);
>   pool_put(&process_pool, pr);
> diff --git sys/kern/kern_fork.c sys/kern/kern_fork.c
> index 9fb239bc8b4..e1cb587b2b8 100644
> --- sys/kern/kern_fork.c
> +++ sys/kern/kern_fork.c
> @@ -52,6 +52,7 @@
>  #include 
>  #include 
>  #include 
> +#include 

Re: Prevent race in single_thread_set()

2020-12-01 Thread Claudio Jeker
On Mon, Nov 30, 2020 at 07:19:28PM -0300, Martin Pieuchot wrote:
> On 04/11/20(Wed) 11:19, Martin Pieuchot wrote:
> > Here's a 3rd approach to solve the TOCTOU race in single_thread_set().
> > The issue being that the lock serializing access to `ps_single' is not
> > held when calling single_thread_check().
> > 
> > The approach below is controversial because it extends the scope of the
> > SCHED_LOCK().  On the other hand, the two other approaches that both
> > add a new lock to avoid this race ignore the fact that accesses to
> > `ps_single' are currently not clearly serialized w/o KERNEL_LOCK().
> > 
> > So the diff below improves the situation in that regard and does not add
> > more complexity due to the use of multiple locks.  After having looked
> > for a way to split the SCHED_LOCK() I believe this is the simplest
> > approach.
> > 
> > I deliberately used a *_locked() function to avoid grabbing the lock
> > recursively as I'm trying to get rid of the recursion, see the other
> > thread on tech@.
> > 
> > That said the uses of `ps_single' in ptrace_ctrl() are not covered by
> > this diff and I'd be glad to hear some comments about them.  This is
> > fine as long as all the code using `ps_single' runs under KERNEL_LOCK()
> > but since we're trying to get the single_thread_* API out of it, this
> > needs to be addressed.
> > 
> > Note that this diff introduces a helper for initializing ps_single*
> > values in order to keep all the accesses of those fields in the same
> > file.
> 
> Anyone?  With this only the `ps_threads' iteration must receive some
> love in order to take the single_thread_* API out of the KERNEL_LOCK().
> For that I just sent a SMR_TAILQ conversion diff.
> 
> Combined with the diff to remove the recursive attribute of the
> SCHED_LOCK() we're ready to split it into multiple mutexes.
> 
> Does that make any sense?  Comments?  Oks?
> 
> > Index: kern/kern_fork.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_fork.c,v
> > retrieving revision 1.226
> > diff -u -p -r1.226 kern_fork.c
> > --- kern/kern_fork.c25 Oct 2020 01:55:18 -  1.226
> > +++ kern/kern_fork.c4 Nov 2020 12:52:54 -
> > @@ -563,10 +563,7 @@ thread_fork(struct proc *curp, void *sta
> >  * if somebody else wants to take us to single threaded mode,
> >  * count ourselves in.
> >  */
> > -   if (pr->ps_single) {
> > -   atomic_inc_int(&pr->ps_singlecount);
> > -   atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
> > -   }
> > +   single_thread_init(p);

This is not the right name for this function. It does not initialize
anything. Why is this indirection needed? I would just put the
SCHED_LOCK() around this bit. It makes more sense especially with the
comment above.
In my opinion a better name would be single_thread_add() or
single_thread_hatch() but even that I'm not too fond of.

> >  
> > /*
> >  * Return tid to parent thread and copy it out to userspace
> > Index: kern/kern_sig.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_sig.c,v
> > retrieving revision 1.263
> > diff -u -p -r1.263 kern_sig.c
> > --- kern/kern_sig.c 16 Sep 2020 13:50:42 -  1.263
> > +++ kern/kern_sig.c 4 Nov 2020 12:38:35 -
> > @@ -1932,11 +1932,27 @@ userret(struct proc *p)
> > p->p_cpu->ci_schedstate.spc_curpriority = p->p_usrpri;
> >  }
> >  
> > +void
> > +single_thread_init(struct proc *p)
> > +{
> > +   struct process *pr = p->p_p;
> > +   int s;
> > +
> > +   SCHED_LOCK(s);
> > +   if (pr->ps_single) {
> > +   atomic_inc_int(&pr->ps_singlecount);
> > +   atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
> > +   }
> > +   SCHED_UNLOCK(s);
> > +}
> > +
> >  int
> > -single_thread_check(struct proc *p, int deep)
> > +_single_thread_check_locked(struct proc *p, int deep)

Please don't add the leading _ to this function. There is no need for it.

> >  {
> > struct process *pr = p->p_p;
> >  
> > +   SCHED_ASSERT_LOCKED();
> > +
> > if (pr->ps_single != NULL && pr->ps_single != p) {
> > do {
> > int s;
> > @@ -1949,14 +1965,12 @@ single_thread_check(struct proc *p, int 
> > return (EINTR);
> > }
> >  
> > -   SCHED_LOCK(s);
> > -   if (pr->ps_single == NULL) {
> > -   SCHED_UNLOCK(s);
> > +   if (pr->ps_single == NULL)
> > continue;
> > -   }
> >  
> > if (atomic_dec_int_nv(&pr->ps_singlecount) == 0)
> > wakeup(&pr->ps_singlecount);
> > +
> > if (pr->ps_flags & PS_SINGLEEXIT) {
> > SCHED_UNLOCK(s);
> > KERNEL_LOCK();
> > @@ -1967,13 +1981,24 @@ single_thread_check(struct proc *p, int 
> > /* n