Re: Use SMR_TAILQ for `ps_threads'
On Tue, Dec 01, 2020 at 02:35:18PM -0300, Martin Pieuchot wrote: > On 01/12/20(Tue) 15:30, Claudio Jeker wrote: > > [...] > > Did you run a make build with that smr_barrier() in it and checked that it > > does not cause a slow down? I am sceptical, smr_barrier() is a very slow > > construct which introduces large delays and should be avoided whenever > > possible. > > I did build GENERIC.MP multiple times on a 4CPU sparc64 with the diff > below, without noticeable difference. > > I'm happy to hear from sceptical performance checkers :o) On a reasonably fast amd64 box, this increases GENERIC.MP make -j6 build time from ~3m06s to ~3m44s, which seems a bit much to me. Replacing smr_barrier() with smr_flush() reduces the overhead to a couple of seconds, and it seems warranted here. > > diff --git lib/libkvm/kvm_proc2.c lib/libkvm/kvm_proc2.c > index 96f7dc91b92..1f4f9b914bb 100644 > --- lib/libkvm/kvm_proc2.c > +++ lib/libkvm/kvm_proc2.c > @@ -341,8 +341,9 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process > *pr, > kp.p_pctcpu = 0; > kp.p_stat = (process.ps_flags & PS_ZOMBIE) ? 
SDEAD : > SIDL; > - for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; > - p = TAILQ_NEXT(&proc, p_thr_link)) { > + for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads); > + p != NULL; > + p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) { > if (KREAD(kd, (u_long)p, &proc)) { > _kvm_err(kd, kd->program, > "can't read proc at %lx", > @@ -376,8 +377,8 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process > *pr, > if (!dothreads) > continue; > > - for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; > - p = TAILQ_NEXT(&proc, p_thr_link)) { > + for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads); p != NULL; > + p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) { > if (KREAD(kd, (u_long)p, &proc)) { > _kvm_err(kd, kd->program, > "can't read proc at %lx", > diff --git sys/kern/exec_elf.c sys/kern/exec_elf.c > index 5e455208663..575273b306c 100644 > --- sys/kern/exec_elf.c > +++ sys/kern/exec_elf.c > @@ -85,6 +85,7 @@ > #include > #include > #include > +#include > > #include > > @@ -1360,7 +1361,7 @@ coredump_notes_elf(struct proc *p, void *iocookie, > size_t *sizep) >* threads in the process have been stopped and the list can't >* change. 
>*/ > - TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) { > + SMR_TAILQ_FOREACH_LOCKED(q, &pr->ps_threads, p_thr_link) { > if (q == p) /* we've taken care of this thread */ > continue; > error = coredump_note_elf(q, iocookie, ¬esize); > diff --git sys/kern/init_main.c sys/kern/init_main.c > index fed6be19435..2b657ffe328 100644 > --- sys/kern/init_main.c > +++ sys/kern/init_main.c > @@ -519,7 +519,7 @@ main(void *framep) >*/ > LIST_FOREACH(pr, &allprocess, ps_list) { > nanouptime(&pr->ps_start); > - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) { > + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) { > nanouptime(&p->p_cpu->ci_schedstate.spc_runtime); > timespecclear(&p->p_rtime); > } > diff --git sys/kern/kern_exit.c sys/kern/kern_exit.c > index a20775419e3..3c526ab83b8 100644 > --- sys/kern/kern_exit.c > +++ sys/kern/kern_exit.c > @@ -63,6 +63,7 @@ > #ifdef SYSVSEM > #include > #endif > +#include > #include > > #include > @@ -161,7 +162,8 @@ exit1(struct proc *p, int xexit, int xsig, int flags) > } > > /* unlink ourselves from the active threads */ > - TAILQ_REMOVE(&pr->ps_threads, p, p_thr_link); > + SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link); > + smr_barrier(); > if ((p->p_flag & P_THREAD) == 0) { > /* main thread gotta wait because it has the pid, et al */ > while (pr->ps_refcnt > 1) > @@ -724,7 +726,7 @@ process_zap(struct process *pr) > if (pr->ps_ptstat != NULL) > free(pr->ps_ptstat, M_SUBPROC, sizeof(*pr->ps_ptstat)); > pool_put(&rusage_pool, pr->ps_ru); > - KASSERT(TAILQ_EMPTY(&pr->ps_threads)); > + KASSERT(SMR_TAILQ_EMPTY_LOCKED(&pr->ps_threads)); > lim_free(pr->ps_limit); > crfree(pr->ps_ucred); > pool_put(&process_pool, pr); > diff --git sys/kern/kern_fork.c sys/kern/kern_fork.c > index 9fb239bc8b4..e1cb587b2b8 100644 > --- sys/kern/kern_fork.c > +++ sys/kern/kern_fork.c > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include > #include >
Re: ACPI diff that needs wide testing
On Tue, Dec 01, 2020 at 10:40:30PM +0100, Mark Kettenis wrote: > > From: Greg Steuck > > Date: Mon, 30 Nov 2020 20:54:59 -0800 > > > > Mark Kettenis writes: > > > > > The diff below fixes the way we handle named references in AML > > > packages. This fixes some bugs but I'd like to make sure that this > > > doesn't inadvertedly break things. So tests on a wide variety of > > > machines are welcome. > > > > I see a prompt crash: > > Does this one work better? Indeed it does, no panic, suspend resume still works, no change in dmesg. X1 Gen 2. OpenBSD 6.8-current (GENERIC.MP) #2: Wed Dec 2 08:09:03 CET 2020 flor...@openbsd-build.home.narrans.de:/usr/src/sys/arch/amd64/compile/GENERIC.MP real mem = 8233394176 (7851MB) avail mem = 7968555008 (7599MB) random: good seed from bootblocks mpath0 at root scsibus0 at mpath0: 256 targets mainbus0 at root bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xdcd3d000 (61 entries) bios0: vendor LENOVO version "GRET40WW (1.17 )" date 09/02/2014 bios0: LENOVO 20A7006VUS acpi0 at bios0: ACPI 5.0 acpi0: sleep states S0 S3 S4 S5 acpi0: tables DSDT FACP SLIC DBGP ECDT HPET APIC MCFG SSDT SSDT SSDT SSDT SSDT SSDT SSDT PCCT SSDT UEFI MSDM ASF! 
BATB FPDT UEFI DMAR acpi0: wakeup devices LID_(S4) SLPB(S3) IGBE(S4) EXP2(S4) XHCI(S3) EHC1(S3) acpitimer0 at acpi0: 3579545 Hz, 24 bits acpiec0 at acpi0 acpihpet0 at acpi0: 14318179 Hz acpimadt0 at acpi0 addr 0xfee0: PC-AT compat cpu0 at mainbus0: apid 0 (boot processor) cpu0: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 798.30 MHz, 06-45-01 cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu0: 256KB 64b/line 8-way L2 cache cpu0: smt 0, core 0, package 0 mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges cpu0: apic clock running at 99MHz cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE cpu1 at mainbus0: apid 1 (application processor) cpu1: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 798.16 MHz, 06-45-01 cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu1: 256KB 64b/line 8-way L2 cache cpu1: smt 1, core 0, package 0 cpu2 at mainbus0: apid 2 (application processor) cpu2: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 798.16 MHz, 06-45-01 cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu2: 256KB 64b/line 8-way L2 cache cpu2: smt 0, core 1, package 0 cpu3 at mainbus0: apid 3 (application processor) cpu3: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 798.16 MHz, 06-45-01 cpu3: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN cpu3: 256KB 64b/line 8-way L2 cache cpu3: smt 1, core 1, package 0 ioapic0 at mainbus0: apid 2 pa 0xfec0, version 20, 40 pins acpimcfg0 at acpi0 acpimcfg0: addr 0xf800, bus 0-63 acpiprt0 at acpi0: bus 0 (PCI0) acpiprt1 at acpi0: bus -1 (PEG_) acpiprt2 at acpi0: bus 2 (EXP1) acpiprt3 at acpi0: bus 3 (EXP2) acpiprt4 at acpi0: bus -1 (EXP3) dwiic0 at acpi0 I2C1 addr 0xfe105000/0x1000 irq 7 iic0 at dwiic0 "CPLM3218" at iic0 addr 0x48 not configured acpibtn0 at acpi0: LID_ acpibtn1 at acpi0: SLPB acpipci0 at acpi0 PCI0: 0x 0x0011 0x0001 acpicmos0 at acpi0 acpibat0 at acpi0: BAT0 model "45N1703" serial 3191 type LiP oem "SMP" acpiac0 at acpi0: AC unit offline acpithinkpad0 at acpi0: version 2.0 "PNP0C14" at acpi0 not configured "PNP0C14" at acpi0 not configured "PNP0C14" at acpi0 not configured "INT340F" at acpi0 not configured acpicpu0 at acpi0: C3(200@506 mwait.1@0x60), C2(200@148 mwait.1@0x33), C1(1000@1 mwa
Re: ACPI diff that needs wide testing
Mark Kettenis writes:

>> From: Greg Steuck
>> Date: Mon, 30 Nov 2020 20:54:59 -0800
>>
>> Mark Kettenis writes:
>>
>> > The diff below fixes the way we handle named references in AML
>> > packages. This fixes some bugs but I'd like to make sure that this
>> > doesn't inadvertedly break things. So tests on a wide variety of
>> > machines are welcome.
>>
>> I see a prompt crash:
>
> Does this one work better?

Much better. No crash and no difference in dmesg with/without patch.
Also survives suspend/wakeup. I'll keep running with this for now.

Thanks
Greg
Re: Dell XPS 9310 successful install but bootloader can't read disk label
On Wed, 2 Dec 2020 01:59:00 +0100 Noth wrote:

> Disk: sd0	Usable LBA: 34 to 4000797326 [4000797360 Sectors]
>    #: type                                 [       start:        size ]
>    0: EFI Sys                              [        2048:      389120 ]
>    1: e3c9e316-0b5c-4db8-817d-f92df00215ae [      391168:      262144 ]
>    2: FAT12                                [      653312:  1071679488 ]
>    3: 516e7cba-6ecf-11d6-8ff8-00022d09712b [  1072332800:     3905536 ]
>    4: Linux files*                         [  1076238336:  2692558848 ]
>    5: OpenBSD                              [  3768797184:   195311616 ]
>    6: Win Recovery                         [  3964108800:     2027520 ]
>    7: Win Recovery                         [  3966136320:    31772672 ]
>    8: Win Recovery                         [  3997911040:     2885632 ]

OpenBSD offset 3768797184 is about 1797G. I believe that our EFI
bootloader works only if OpenBSD partition 'a' is in the first 1024G
of the drive.

Back in January 2020, I suspected that a daddr32_t would overflow in
/sys/arch/amd64/stand/efiboot/efidev.c, see
https://marc.info/?l=openbsd-bugs&m=158007879212894&w=2

I have no drives larger than 1024G, so I have no way to reproduce the
problem.

--George
Re: Dell XPS 9310 successful install but bootloader can't read disk label
Here we go (sd0 is the NVMe, sd0a is an encrypted OpenBSD install, which is on sd2). And yes, once decrypted, I can write to the partitions: Disk: sd0 Usable LBA: 34 to 4000797326 [4000797360 Sectors] #: type [ start: size ] 0: EFI Sys [ 2048: 389120 ] 1: e3c9e316-0b5c-4db8-817d-f92df00215ae [ 391168: 262144 ] 2: FAT12 [ 653312: 1071679488 ] 3: 516e7cba-6ecf-11d6-8ff8-00022d09712b [ 1072332800: 3905536 ] 4: Linux files* [ 1076238336: 2692558848 ] 5: OpenBSD [ 3768797184: 195311616 ] 6: Win Recovery [ 3964108800: 2027520 ] 7: Win Recovery [ 3966136320: 31772672 ] 8: Win Recovery [ 3997911040: 2885632 ] Disk: sd2 Usable LBA: 64 to 195311024 [195311088 Sectors] #: type [ start: size ] 1: EFI Sys [ 64: 960 ] 3: OpenBSD [ 1024: 195310001 ] # /dev/rsd0c: type: SCSI disk: SCSI disk label: KXG60PNV2T04 NVM duid: 9bfcd2a2080acb33 flags: bytes/sector: 512 sectors/track: 63 tracks/cylinder: 255 sectors/cylinder: 16065 cylinders: 249038 total sectors: 4000797360 boundstart: 3768797184 boundend: 3964108800 drivedata: 0 16 partitions: # size offset fstype [fsize bsize cpg] a: 195311616 3768797184 RAID c: 4000797360 0 unused i: 389120 2048 MSDOS j: 262144 391168 unknown k: 1071679488 653312 MSDOS l: 2027520 3964108800 unknown m: 31772672 3966136320 unknown n: 2885632 3997911040 unknown o: 3905536 1072332800 ext2fs p: 2692558848 1076238336 ext2fs # /dev/rsd2c: type: SCSI disk: SCSI disk label: SR CRYPTO duid: 72cca61970b173bb flags: bytes/sector: 512 sectors/track: 63 tracks/cylinder: 255 sectors/cylinder: 16065 cylinders: 12157 total sectors: 195311088 boundstart: 1024 boundend: 195311025 drivedata: 0 16 partitions: # size offset fstype [fsize bsize cpg] a: 2097152 1024 4.2BSD 2048 16384 12960 b: 17486680 2098176 swap c: 195311088 0 unused d: 8388576 19584864 4.2BSD 2048 16384 12960 e: 22683520 27973440 4.2BSD 2048 16384 12960 f: 12582912 50656960 4.2BSD 2048 16384 12960 g: 2097152 63239872 4.2BSD 2048 16384 12960 h: 28081408 65337024 4.2BSD 2048 16384 12960 i: 960 64 MSDOS j: 
4194304 93418432 4.2BSD 2048 16384 12960 k: 12582912 97612736 4.2BSD 2048 16384 12960 l: 85115360 110195648 4.2BSD 2048 16384 12960 I hope this isn't because I'm triplebooting? I guess that is an edgecase, but all the same... Cheers, Noth On 01/12/2020 21:39, Mark Kettenis wrote: From: Noth Date: Tue, 1 Dec 2020 21:18:14 +0100 Hi, As a follow up, I got a usb-c stick and installed -current to that. It boots the system so I now have a dmesg, pcidump and usbdump for you: http://casper.nineinchnetworks.ch/images/dmesg9310.txt http://casper.nineinchnetworks.ch/images/pcidump9310.txt http://casper.nineinchnetworks.ch/images/usbdump9310.txt Cheers, Noth While booted using the USB stick, can you: a) Access the NVMe drive? b) Show the output of disklabel sd0? c) Show the output of fdisk sd0? Cheers, Mark On 19/11/2020 00:31, Noth wrote: Hi, I've got a brand new Dell XPS 9310 and I've tried to get 6.7, 6.8 and the latest snapshot on it. Installation works fine but neither the 3.50, 3.54 or 3.55 UEFI bootloaders can read the disk label so it's unbootable. I of course disactivated Secure Boot and anything in the BIOS that looked like it might be an obstacle. I've got two pictures of the bsd.rd dmesg for the latest snapshot for you here: http://casper.nineinchnetworks.ch/images/dmesg1.jpg http://casper.nineinchnetworks.ch/images/dmesg2.jpg I have no idea what to do next, hope this is of use to you. Cheers, Noth P.S.: I'm not subscribed to this mailing list, please CC me when answering.
PF synproxy should act on inbound packets only
Hello,

the issue described here was hit by Stuart some time ago. Feel free to
stop reading if you don't care about/use pf(4) synproxy.

Let's assume there are rules which allow just surfing the web over http:

    block all
    pass proto tcp from any to any port = 80 synproxy state
    pass proto udp from any to any port = 53

The 'synproxy' rule is not bound to any interface nor any direction. As
such it matches every forwarded SYN packet to port 80 two times. Had
there been no 'synproxy state' in the rule, a pair of states would get
created:

    A -> B:80 @ IN    # inbound direction at RX NIC
    A -> B:80 @ OUT   # outbound direction at TX NIC

The 'synproxy' action changes things a bit. It is handled here in
pf_create_state():

4163		if (pd->proto == IPPROTO_TCP && (th->th_flags & (TH_SYN|TH_ACK)) ==
4164		    TH_SYN && r->keep_state == PF_STATE_SYNPROXY) {
4165			int rtid = pd->rdomain;
4166			if (act->rtableid >= 0)
4167				rtid = act->rtableid;
4168			pf_set_protostate(s, PF_PEER_SRC, PF_TCPS_PROXY_SRC);
4169			s->src.seqhi = arc4random();
4170			/* Find mss option */
4171			mss = pf_get_mss(pd);
4172			mss = pf_calc_mss(pd->src, pd->af, rtid, mss);
4173			mss = pf_calc_mss(pd->dst, pd->af, rtid, mss);
4174			s->src.mss = mss;
4175			pf_send_tcp(r, pd->af, pd->dst, pd->src, th->th_dport,
4176			    th->th_sport, s->src.seqhi, ntohl(th->th_seq) + 1,
4177			    TH_SYN|TH_ACK, 0, s->src.mss, 0, 1, 0, pd->rdomain);
			                                   ^^
4178			REASON_SET(&reason, PFRES_SYNPROXY);
4179			return (PF_SYNPROXY_DROP);
4180		}

Note pf_send_tcp() at line 4175 is being called with the argument `tag`
set to 1. `tag` == 1 tells pf_send_tcp() to set the PF_TAG_GENERATED
flag on the packet.
2770 struct mbuf *
2771 pf_build_tcp(const struct pf_rule *r, sa_family_t af,
2772     const struct pf_addr *saddr, const struct pf_addr *daddr,
2773     u_int16_t sport, u_int16_t dport, u_int32_t seq, u_int32_t ack,
2774     u_int8_t flags, u_int16_t win, u_int16_t mss, u_int8_t ttl, int tag,
2775     u_int16_t rtag, u_int sack, u_int rdom)
2776 {
2805
2806 	/* create outgoing mbuf */
2807 	m = m_gethdr(M_DONTWAIT, MT_HEADER);
2808 	if (m == NULL)
2809 		return (NULL);
2810 	if (tag)
2811 		m->m_pkthdr.pf.flags |= PF_TAG_GENERATED;
2812 	m->m_pkthdr.pf.tag = rtag;

The PF_TAG_GENERATED flag has an unexpected effect on the packet sent
by pf_send_tcp() at line 4175. The packet sent by pf_send_tcp() is
intercepted by pf_test() as outbound. However, the PF_TAG_GENERATED
flag turns pf_test() into a no-op:

6855 int
6856 pf_test(sa_family_t af, int fwdir, struct ifnet *ifp, struct mbuf **m0)
6857 {
     #ifdef DIAGNOSTIC
6890 	if (((*m0)->m_flags & M_PKTHDR) == 0)
6891 		panic("non-M_PKTHDR is passed to pf_test");
6892 #endif /* DIAGNOSTIC */
6893
6894 	if ((*m0)->m_pkthdr.pf.flags & PF_TAG_GENERATED)
6895 		return (PF_PASS);
6896
6897 	if ((*m0)->m_pkthdr.pf.flags & PF_TAG_DIVERTED_PACKET)
6898 		return (PF_PASS);
6899

However, that is not desired in this case. We actually do want PF to
create a state for such a packet, so the response B:80 -> A can enter
the firewall.

The fix is to apply the synproxy action on inbound packets only. The
diff below does exactly that. Furthermore, it also makes pfctl(8) emit
a warning when synproxy is used in an outbound/unbound rule:

    lumpy$ echo 'pass proto tcp from any to any port = 80 synproxy state' | ./pfctl -nf -
    warning (stdin:1): synproxy acts on inbound packets only
    synproxy action is ignored for outbound packets

OK?
thanks and regards
sashan

8<---8<---8<--8<

diff --git a/sbin/pfctl/parse.y b/sbin/pfctl/parse.y
index f06171158cb..d052b5b9a0e 100644
--- a/sbin/pfctl/parse.y
+++ b/sbin/pfctl/parse.y
@@ -4042,6 +4042,13 @@ rule_consistent(struct pf_rule *r)
 		    "synproxy state or modulate state");
 		problems++;
 	}
+
+	if ((r->keep_state == PF_STATE_SYNPROXY) && (r->direction != PF_IN))
+		fprintf(stderr, "warning (%s:%d): "
+		    "synproxy acts on inbound packets only\n"
+		    "synproxy action is ignored for outbound packets\n",
+		    file->name, yylval.lineno);
+
 	if ((r->nat.addr.type != PF_ADDR_NONE ||
 	    r->rdr.addr.type != PF_ADDR_NONE) &&
 	    r->action != PF_MATCH && !r->keep_state) {
diff --git a/sys/net/pf.c b/sys/net/pf.c
index 823fdc22133..986ee57bff9 100644
--- a/sys/net/pf.c
+++ b/sys/net/pf.c
@@ -4161,7 +4161,7 @@ pf_create_state(struct pf_pdesc *pd, struct pf_rule *r,
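With the diff applied, a ruleset like the example above would express the synproxy intent with an explicit direction. One possible rewrite (my illustration, not taken from the thread):

```
block all
pass in  proto tcp from any to any port = 80 synproxy state
pass out proto tcp from any to any port = 80
pass proto udp from any to any port = 53
```

The `pass out` companion rule still creates the ordinary @ OUT state, so forwarded connections keep working; only the proxying itself is confined to the inbound pass through pf_test().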
Re: stdio: fclose(3), fopen(3): intended locking hierarchy?
On Mon, Nov 30, 2020 at 08:44:41PM -0800, Philip Guenther wrote:
> On Mon, Nov 30, 2020 at 6:10 PM Scott Cheloha
> wrote:
>
> > On Sat, Nov 28, 2020 at 01:41:50PM -0800, Philip Guenther wrote:
> > > ...
> > > After thinking through states more, #4 isn't safe: _fwalk() can't take
> > > sfp_mutex because it's called from __srefill() which has its FILE locked,
> > > which would reverse the order: two concurrent calls to __srefill() from
> > > line-buffered FILEs could have one in _fwalk() blocking on locking the
> > > other, while the other blocks on the sfp_mutex for _fwalk().
> >
> > This part in __srefill():
> >
> > 	/*
> > 	 * Before reading from a line buffered or unbuffered file,
> > 	 * flush all line buffered output files, per the ANSI C
> > 	 * standard.
> > 	 */
> ...
> > Where in the standard(s) is this behavior required? I'm not even sure
> > how to look for it.
>
> Pick a version of the C standard and search for "buffered" until you find
> something like this, which is part of 7.19.3p3 in the draft I'm looking at:
>
>    <...> When a stream is line buffered, characters are intended to be
>    transmitted to or from the host environment as a block when a new-line
>    character is encountered. Furthermore, characters are intended to be
>    transmitted as a block to the host environment when a buffer is filled,
>    when input is requested on an unbuffered stream, or when input is
>    requested on a line buffered stream that requires the transmission of
>    characters from the host environment. Support for these characteristics
>    is implementation-defined, and may be affected via the setbuf and
>    setvbuf functions.
>
> Effectively the same text appears in the POSIX standard in XSH 2.5p6.
>
> Basically, you're allowed to stop doing that by instead printing out your
> cell-phone number so that everyone who wants to complain that their program
> failed to output its prompt before blocking for input can call and scream
> at you.
:D

If they want that they need to call fflush(3) on the stream the prompt
is printed to if that stream is line- or block-buffered.

Anyway, the key phrase is:

> Furthermore, characters are intended to be transmitted as a block to
> the host environment when a buffer is filled, when input is requested
> on an unbuffered stream, or when input is requested on a line buffered
> stream that requires the transmission of characters from the host
> environment.

When I read this I get the impression they are only talking about the
FILE being read, not that we should transfer characters as a block for
*every* open FILE when input is requested on any unbuffered or
line-buffered stream.

But I am not certain exactly what is meant by the passage you quoted.
It is pretty vague.

One way around the deadlock in __srefill() is to add a FTRYLOCKFILE()
macro and do that during _fwalk() before acting on each FILE. If we
can't lock the FILE we just move on to the next one. That doesn't seem
right to me though.

FWIW, musl doesn't flush every file when refilling a line- or
unbuffered FILE. Then again, maybe they misinterpreted the standard.
Re: ACPI diff that needs wide testing
> From: Greg Steuck > Date: Mon, 30 Nov 2020 20:54:59 -0800 > > Mark Kettenis writes: > > > The diff below fixes the way we handle named references in AML > > packages. This fixes some bugs but I'd like to make sure that this > > doesn't inadvertedly break things. So tests on a wide variety of > > machines are welcome. > > I see a prompt crash: Does this one work better? Index: dev/acpi/acpi.c === RCS file: /cvs/src/sys/dev/acpi/acpi.c,v retrieving revision 1.391 diff -u -p -r1.391 acpi.c --- dev/acpi/acpi.c 27 Aug 2020 01:08:55 - 1.391 +++ dev/acpi/acpi.c 1 Dec 2020 21:40:03 - @@ -2980,21 +2980,24 @@ acpi_getprop(struct aml_node *node, cons /* Check properties. */ for (i = 0; i < dsd.v_package[1]->length; i++) { struct aml_value *res = dsd.v_package[1]->v_package[i]; + struct aml_value *val; int len; if (res->type != AML_OBJTYPE_PACKAGE || res->length != 2 || res->v_package[0]->type != AML_OBJTYPE_STRING) continue; - len = res->v_package[1]->length; - switch (res->v_package[1]->type) { + val = res->v_package[1]; + if (val->type == AML_OBJTYPE_OBJREF) + val = val->v_objref.ref; + + len = val->length; + switch (val->type) { case AML_OBJTYPE_BUFFER: - memcpy(buf, res->v_package[1]->v_buffer, - min(len, buflen)); + memcpy(buf, val->v_buffer, min(len, buflen)); return len; case AML_OBJTYPE_STRING: - memcpy(buf, res->v_package[1]->v_string, - min(len, buflen)); + memcpy(buf, val->v_string, min(len, buflen)); return len; } } @@ -3031,14 +3034,22 @@ acpi_getpropint(struct aml_node *node, c /* Check properties. 
*/ for (i = 0; i < dsd.v_package[1]->length; i++) { struct aml_value *res = dsd.v_package[1]->v_package[i]; + struct aml_value *val; if (res->type != AML_OBJTYPE_PACKAGE || res->length != 2 || - res->v_package[0]->type != AML_OBJTYPE_STRING || - res->v_package[1]->type != AML_OBJTYPE_INTEGER) + res->v_package[0]->type != AML_OBJTYPE_STRING) + continue; + + val = res->v_package[1]; + if (val->type == AML_OBJTYPE_OBJREF) + val = val->v_objref.ref; + + if (val->type != AML_OBJTYPE_INTEGER) continue; - if (strcmp(res->v_package[0]->v_string, prop) == 0) - return res->v_package[1]->v_integer; + if (strcmp(res->v_package[0]->v_string, prop) == 0 && + val->type == AML_OBJTYPE_INTEGER) + return val->v_integer; } return defval; @@ -3130,7 +3141,7 @@ const char *acpi_isa_hids[] = { void acpi_attach_deps(struct acpi_softc *sc, struct aml_node *node) { - struct aml_value res; + struct aml_value res, *val; struct aml_node *dep; int i; @@ -3141,9 +3152,14 @@ acpi_attach_deps(struct acpi_softc *sc, return; for (i = 0; i < res.length; i++) { - if (res.v_package[i]->type != AML_OBJTYPE_STRING) + val = res.v_package[i]; + if (val->type == AML_OBJTYPE_OBJREF) + val = val->v_objref.ref; + if (val->type != AML_OBJTYPE_DEVICE) { + printf("%s: type %d\n", __func__, val->type); continue; - dep = aml_searchrel(node, res.v_package[i]->v_string); + } + dep = val->node; if (dep == NULL || dep->attached) continue; dep = aml_searchname(dep, "_HID"); Index: dev/acpi/acpiprt.c === RCS file: /cvs/src/sys/dev/acpi/acpiprt.c,v retrieving revision 1.49 diff -u -p -r1.49 acpiprt.c --- dev/acpi/acpiprt.c 11 Apr 2020 11:01:18 - 1.49 +++ dev/acpi/acpiprt.c 1 Dec 2020 21:40:03 - @@ -272,14 +272,6 @@ acpiprt_prt_add(struct acpiprt_softc *sc } pp = v->v_package[2]; - if (pp->type == AML_OBJTYPE_STRING) { - node = aml_searchrel(sc->sc_devnode, pp->v_string); - if (node == NULL) { - printf("Invalid device\n"); - return; - } - pp = node->value; - } if (pp->type == AML_OBJTYPE_NAMEREF) { node = 
aml_searchrel(sc->sc_devnode, pp->v_nameref); if
Re: Dell XPS 9310 successful install but bootloader can't read disk label
> From: Noth > Date: Tue, 1 Dec 2020 21:18:14 +0100 > > Hi, > > As a follow up, I got a usb-c stick and installed -current to that. > It boots the system so I now have a dmesg, pcidump and usbdump for you: > > http://casper.nineinchnetworks.ch/images/dmesg9310.txt > > http://casper.nineinchnetworks.ch/images/pcidump9310.txt > > http://casper.nineinchnetworks.ch/images/usbdump9310.txt > > Cheers, > > Noth While booted using the USB stick, can you: a) Access the NVMe drive? b) Show the output of disklabel sd0? c) Show the output of fdisk sd0? Cheers, Mark > On 19/11/2020 00:31, Noth wrote: > > Hi, > > > > I've got a brand new Dell XPS 9310 and I've tried to get 6.7, 6.8 > > and the latest snapshot on it. Installation works fine but neither the > > 3.50, 3.54 or 3.55 UEFI bootloaders can read the disk label so it's > > unbootable. I of course disactivated Secure Boot and anything in the > > BIOS that looked like it might be an obstacle. I've got two pictures > > of the bsd.rd dmesg for the latest snapshot for you here: > > > > http://casper.nineinchnetworks.ch/images/dmesg1.jpg > > > > http://casper.nineinchnetworks.ch/images/dmesg2.jpg > > > > I have no idea what to do next, hope this is of use to you. > > > > Cheers, > > > > Noth > > > > P.S.: I'm not subscribed to this mailing list, please CC me when > > answering. > > > >
Re: Dell XPS 9310 successful install but bootloader can't read disk label
Hi,

As a follow up, I got a usb-c stick and installed -current to that. It
boots the system so I now have a dmesg, pcidump and usbdump for you:

http://casper.nineinchnetworks.ch/images/dmesg9310.txt
http://casper.nineinchnetworks.ch/images/pcidump9310.txt
http://casper.nineinchnetworks.ch/images/usbdump9310.txt

Cheers,

Noth

On 19/11/2020 00:31, Noth wrote:

Hi,

I've got a brand new Dell XPS 9310 and I've tried to get 6.7, 6.8 and
the latest snapshot on it. Installation works fine but neither the
3.50, 3.54 or 3.55 UEFI bootloaders can read the disk label, so it's
unbootable. I of course deactivated Secure Boot and anything in the
BIOS that looked like it might be an obstacle. I've got two pictures of
the bsd.rd dmesg for the latest snapshot for you here:

http://casper.nineinchnetworks.ch/images/dmesg1.jpg
http://casper.nineinchnetworks.ch/images/dmesg2.jpg

I have no idea what to do next, hope this is of use to you.

Cheers,

Noth

P.S.: I'm not subscribed to this mailing list, please CC me when
answering.
Re: ACPI diff that needs wide testing
> Date: Tue, 1 Dec 2020 20:17:47 +0100 > From: Theo Buehler > > On Tue, Dec 01, 2020 at 08:10:33PM +0100, Mark Kettenis wrote: > > > From: Greg Steuck > > > Date: Mon, 30 Nov 2020 20:54:59 -0800 > > > > > > Mark Kettenis writes: > > > > > > > The diff below fixes the way we handle named references in AML > > > > packages. This fixes some bugs but I'd like to make sure that this > > > > doesn't inadvertedly break things. So tests on a wide variety of > > > > machines are welcome. > > > > > > I see a prompt crash: https://photos.app.goo.gl/5NkwAHA6jxrXFzEV7 > > > > Can you send me the acpidump output for that machine? > > gnezdo provided iasl -d output: > > > > The contents of > > >cp -R /var/db/acpi /tmp && iasl -d /tmp/acpi/* > > > are here: https://blackgnezdo.github.io/hp-elitebook-840.tgz Ah crap. Reading is hard... Sorry Greg.
Re: ACPI diff that needs wide testing
On Tue, Dec 01, 2020 at 08:10:33PM +0100, Mark Kettenis wrote: > > From: Greg Steuck > > Date: Mon, 30 Nov 2020 20:54:59 -0800 > > > > Mark Kettenis writes: > > > > > The diff below fixes the way we handle named references in AML > > > packages. This fixes some bugs but I'd like to make sure that this > > > doesn't inadvertedly break things. So tests on a wide variety of > > > machines are welcome. > > > > I see a prompt crash: https://photos.app.goo.gl/5NkwAHA6jxrXFzEV7 > > Can you send me the acpidump output for that machine? gnezdo provided iasl -d output: > > The contents of > >cp -R /var/db/acpi /tmp && iasl -d /tmp/acpi/* > > are here: https://blackgnezdo.github.io/hp-elitebook-840.tgz
Re: ACPI diff that needs wide testing
> From: Greg Steuck > Date: Mon, 30 Nov 2020 20:54:59 -0800 > > Mark Kettenis writes: > > > The diff below fixes the way we handle named references in AML > > packages. This fixes some bugs but I'd like to make sure that this > > doesn't inadvertedly break things. So tests on a wide variety of > > machines are welcome. > > I see a prompt crash: https://photos.app.goo.gl/5NkwAHA6jxrXFzEV7 Can you send me the acpidump output for that machine? > OCRed: > 7b880 size Ox50 previous type ACPI (0xdeaf4151 = 0xdeaf4152) > Stopped at db_enter+0x10: : pop %rbp > TID PID UID PR FLAGS PFLAGS > 0x1 0x200 CPU COMMAND 0K swapper > db_enter(10, 8251cf10,282,8, 81887a30, 8251cf18) at > db_ enter +0x10 > panic(81dd9174, 81dd9174, 3f, 8le23d0e, > 80080807b888,9) at panic+0x12a > malloc(50,21,9,50,69ffcbe5966771e0, 8010d488) at malloc+0x783 > aml_parse(8810d488,69, 8003a810, 8010d488, > ee7eae2dea48b7a 4, 8010d488) at aml_parse+0x2a32 > aml_parse(8010d488, 54,0,80008818d488, ee?eae2dea > 4983ef,80880818d488) at aml_parse+0x3bc > aml_eval(80176f88, 808088082108, 74, 18021, 8, > 800888176f88) at aml_eval+0x317 > aml_parse(80176f88, 74, 7d, 80176f88, ee?eae2dea 48b7a4, > 89800017 6f88) at aml_parse+0x2df6 > aml_parse(808080176f88,54, 88176f88, 88176f88, > ee7eae2dea4983e, f80176f88) at aml_parse+8x3bc > aml_eval(0, 80081a08, 74,0,0,0) at aml_eval+0x317 > aml_evalnode(88008202c, 80082008,8,0, > 8251d648,8808088 8202c) at aml_evalnode +0xb4 > aml_evalinteger(8002e400, 80082c88, 81ddbf04,8,8, > 8251d6f0) at aml_evalinteger+0x4e > acpitz_gettempreading(80257800, 81ddbf04, dad3d43216419857, > ff ff8251d73b, 80257800, aac) at acpitz_gettempreading +0x37 > acpitz_attach(8002e400, 80257800, 8251d818, > 8002e4 08,44b6122d7f8019d2, 8002e480) at acpitz_attach+Oxlea > config_attach(80028400, 8210ef70, 82510818, > 817576 40,68aa0891922ad23e, 80082c88) at config_attach+8x1f4 > end trace frame: 0x8251d950, count: 0 > > The contents of >cp -R /var/db/acpi /tmp && iasl -d /tmp/acpi/* > are here: 
https://blackgnezdo.github.io/hp-elitebook-840.tgz > > dmesg when not running your patch: > OpenBSD 6.8-current (GENERIC.MP) #69: Mon Nov 30 19:53:29 PST 2020 > g...@hippy.nest.cx:/usr/src/sys/arch/amd64/compile/GENERIC.MP > real mem = 17055608832 (16265MB) > avail mem = 16523407360 (15757MB) > random: good seed from bootblocks > mpath0 at root > scsibus0 at mpath0: 256 targets > mainbus0 at root > bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbbe3f000 (33 entries) > bios0: vendor Hewlett-Packard version "L71 Ver. 01.20" date 07/28/2014 > bios0: Hewlett-Packard HP EliteBook 840 G1 > acpi0 at bios0: ACPI 5.0 > acpi0: sleep states S0 S3 S4 S5 > acpi0: tables DSDT FACP HPET APIC MCFG TCPA SSDT SSDT FPDT BGRT SSDT SSDT > SSDT SSDT SSDT ASF! DMAR > acpi0: wakeup devices LANC(S5) EHC1(S3) XHC_(S3) PCIB(S5) NIC_(S5) RP04(S5) > WNIC(S5) > acpitimer0 at acpi0: 3579545 Hz, 24 bits > acpihpet0 at acpi0: 14318179 Hz > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat > cpu0 at mainbus0: apid 0 (boot processor) > cpu0: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 1995.70 MHz, 06-45-01 > cpu0: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN > cpu0: 256KB 64b/line 8-way L2 cache > cpu0: smt 0, core 0, package 0 > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges > cpu0: apic clock running at 99MHz > cpu0: mwait min=64, max=64, C-substates=0.2.1.2.4.1.1.1, IBE > cpu1 at mainbus0: apid 1 (application processor) > cpu1: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 1995.39 MHz, 06-45-01 > cpu1: > 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,SDBG,FMA3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,MOVBE,POPCNT,DEADLINE,AES,XSAVE,AVX,F16C,RDRAND,NXE,PAGE1GB,RDTSCP,LONG,LAHF,ABM,PERF,ITSC,FSGSBASE,TSC_ADJUST,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,SRBDS_CTRL,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN > cpu1: 256KB 64b/line 8-way L2 cache > cpu1: smt 1, core 0, package 0 > cpu2 at mainbus0: apid 2 (application processor) > cpu2: Intel(R) Core(TM) i7-4600U CPU @ 2.10GHz, 1995.40 MHz, 06-45-01 > cpu2: > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMO
[PATCH] __rec_open() should set the type
According to dbopen(3), the 'type' field of struct DB holds the "type of
the underlying access method (and file format)."  In __bt_open() it is
set to DB_BTREE and in __hash_open() it is set to DB_HASH, so one might
expect that in __rec_open() it is set to DB_RECNO.  However, it is left
untouched.

Index: rec_open.c
===
RCS file: /cvs/src/lib/libc/db/recno/rec_open.c,v
retrieving revision 1.13
diff -u -p -u -r1.13 rec_open.c
--- rec_open.c	21 Sep 2016 04:38:56 -	1.13
+++ rec_open.c	1 Dec 2020 13:01:37 -
@@ -154,6 +154,7 @@ slow:	if ((t->bt_rfp = fdopen(rfd, "r"
 	dbp->put = __rec_put;
 	dbp->seq = __rec_seq;
 	dbp->sync = __rec_sync;
+	dbp->type = DB_RECNO;
 
 	/* If the root page was created, reset the flags. */
 	if ((h = mpool_get(t->bt_mp, P_ROOT, 0)) == NULL)

-- 
Made with Opera's e-mail client: http://www.opera.com/mail/
Re: Use SMR_TAILQ for `ps_threads'
On 01/12/20(Tue) 15:30, Claudio Jeker wrote:
> [...]
> Did you run a make build with that smr_barrier() in it and checked that it
> does not cause a slow down? I am sceptical, smr_barrier() is a very slow
> construct which introduces large delays and should be avoided whenever
> possible.

I did build GENERIC.MP multiple times on a 4CPU sparc64 with the diff
below, without noticeable difference.

I'm happy to hear from sceptical performance checkers :o)

diff --git lib/libkvm/kvm_proc2.c lib/libkvm/kvm_proc2.c
index 96f7dc91b92..1f4f9b914bb 100644
--- lib/libkvm/kvm_proc2.c
+++ lib/libkvm/kvm_proc2.c
@@ -341,8 +341,9 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process *pr,
 			kp.p_pctcpu = 0;
 			kp.p_stat = (process.ps_flags & PS_ZOMBIE) ? SDEAD :
 			    SIDL;
-			for (p = TAILQ_FIRST(&process.ps_threads); p != NULL;
-			    p = TAILQ_NEXT(&proc, p_thr_link)) {
+			for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads);
+			    p != NULL;
+			    p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) {
 				if (KREAD(kd, (u_long)p, &proc)) {
 					_kvm_err(kd, kd->program,
 					    "can't read proc at %lx",
@@ -376,8 +377,8 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process *pr,
 		if (!dothreads)
 			continue;
 
-		for (p = TAILQ_FIRST(&process.ps_threads); p != NULL;
-		    p = TAILQ_NEXT(&proc, p_thr_link)) {
+		for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads); p != NULL;
+		    p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) {
 			if (KREAD(kd, (u_long)p, &proc)) {
 				_kvm_err(kd, kd->program,
 				    "can't read proc at %lx",
diff --git sys/kern/exec_elf.c sys/kern/exec_elf.c
index 5e455208663..575273b306c 100644
--- sys/kern/exec_elf.c
+++ sys/kern/exec_elf.c
@@ -85,6 +85,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1360,7 +1361,7 @@ coredump_notes_elf(struct proc *p, void *iocookie, size_t *sizep)
 	 * threads in the process have been stopped and the list can't
	 * change.
	 */
-	TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) {
+	SMR_TAILQ_FOREACH_LOCKED(q, &pr->ps_threads, p_thr_link) {
 		if (q == p)		/* we've taken care of this thread */
 			continue;
 		error = coredump_note_elf(q, iocookie, &notesize);
diff --git sys/kern/init_main.c sys/kern/init_main.c
index fed6be19435..2b657ffe328 100644
--- sys/kern/init_main.c
+++ sys/kern/init_main.c
@@ -519,7 +519,7 @@ main(void *framep)
 	 */
 	LIST_FOREACH(pr, &allprocess, ps_list) {
 		nanouptime(&pr->ps_start);
-		TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) {
+		SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) {
 			nanouptime(&p->p_cpu->ci_schedstate.spc_runtime);
 			timespecclear(&p->p_rtime);
 		}
diff --git sys/kern/kern_exit.c sys/kern/kern_exit.c
index a20775419e3..3c526ab83b8 100644
--- sys/kern/kern_exit.c
+++ sys/kern/kern_exit.c
@@ -63,6 +63,7 @@
 #ifdef SYSVSEM
 #include
 #endif
+#include
 
 #include
 #include
 
@@ -161,7 +162,8 @@ exit1(struct proc *p, int xexit, int xsig, int flags)
 	}
 
 	/* unlink ourselves from the active threads */
-	TAILQ_REMOVE(&pr->ps_threads, p, p_thr_link);
+	SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);
+	smr_barrier();
 	if ((p->p_flag & P_THREAD) == 0) {
 		/* main thread gotta wait because it has the pid, et al */
 		while (pr->ps_refcnt > 1)
@@ -724,7 +726,7 @@ process_zap(struct process *pr)
 	if (pr->ps_ptstat != NULL)
 		free(pr->ps_ptstat, M_SUBPROC, sizeof(*pr->ps_ptstat));
 	pool_put(&rusage_pool, pr->ps_ru);
-	KASSERT(TAILQ_EMPTY(&pr->ps_threads));
+	KASSERT(SMR_TAILQ_EMPTY_LOCKED(&pr->ps_threads));
 	lim_free(pr->ps_limit);
 	crfree(pr->ps_ucred);
 	pool_put(&process_pool, pr);
diff --git sys/kern/kern_fork.c sys/kern/kern_fork.c
index 9fb239bc8b4..e1cb587b2b8 100644
--- sys/kern/kern_fork.c
+++ sys/kern/kern_fork.c
@@ -52,6 +52,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -179,8 +180,8 @@ process_initialize(struct process *pr, struct proc *p)
 {
 	/* initialize the thread links */
 	pr->ps_mainproc = p;
-	TAILQ_INIT(&pr->ps_threads);
-	TAILQ_INSERT_TAIL(&pr->ps_threads, p, p_thr_link);
+	SMR_TAILQ_INIT(&pr->ps_threads);
+	SMR_TAILQ_INSERT_TAIL_LOCKED(&pr->ps_threads, p, p_thr_link);
 	pr->ps_refcnt = 1;
 	p->p_p = pr;
 
@@ -557,7 +558,7 @@ t
Re: More uvm_pagefree() lock fixes
> Date: Tue, 1 Dec 2020 13:18:59 -0300
> From: Martin Pieuchot
> 
> On 01/12/20(Tue) 15:18, Mark Kettenis wrote:
> > > Date: Tue, 1 Dec 2020 11:08:50 -0300
> > > From: Martin Pieuchot
> > > 
> > > As recently pointed out by jmatthew@ these need the lock as well, ok?
> > 
> > No.  These pages are "unmanaged" and therefore they can not be on any
> > of the page queues and locking those page queues would be pointless.
> > 
> > We probably should adjust the documentation to reflect this.
> 
> Something like that?

Yes, because...

> Btw did you look at the uses of uvm_pagefree(), uvm_pagelookup() & co in
> the various pmap?  Do you have any suggestion about how to handle them?

...the pages in the various pmaps aren't managed either.  However the
use of uvm_pagelookup() in some of the pmaps suggests that we need to
think a bit about PG_TABLED.  It doesn't really make sense to use the
page queue lock for that though.

For the amd64 pmap there are separate objects for each individual
userland pmap.  So the per-pmap lock may already provide the necessary
locking.

I also need to go back to your previous diff.  In hindsight I think
locking the page queues should not be necessary for buffer map pages
either.

> Index: uvm/uvm_page.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_page.c,v
> retrieving revision 1.153
> diff -u -p -r1.153 uvm_page.c
> --- uvm/uvm_page.c	1 Dec 2020 13:56:22 -	1.153
> +++ uvm/uvm_page.c	1 Dec 2020 16:16:29 -
> @@ -928,7 +928,7 @@ uvm_pagerealloc(struct vm_page *pg, stru
>   * uvm_pageclean: clean page
>   *
>   * => erase page's identity (i.e. remove from object)
> - * => caller must lock page queues
> + * => caller must lock page queues if `pg' is managed
>   * => assumes all valid mappings of pg are gone
>   */
>  void
> @@ -937,7 +937,8 @@ uvm_pageclean(struct vm_page *pg)
>  	u_int flags_to_clear = 0;
>  
>  #if all_pmap_are_fixed
> -	MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
> +	if (pg->pg_flags & (PG_TABLED|PQ_ACTIVE|PQ_INACTIVE))
> +		MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
>  #endif
>  
>  #ifdef DEBUG
> @@ -998,14 +999,15 @@ uvm_pageclean(struct vm_page *pg)
>   *
>   * => erase page's identity (i.e. remove from object)
>   * => put page on free list
> - * => caller must lock page queues
> + * => caller must lock page queues if `pg' is managed
>   * => assumes all valid mappings of pg are gone
>   */
>  void
>  uvm_pagefree(struct vm_page *pg)
>  {
>  #if all_pmap_are_fixed
> -	MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
> +	if (pg->pg_flags & (PG_TABLED|PQ_ACTIVE|PQ_INACTIVE))
> +		MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
>  #endif
>  
>  	uvm_pageclean(pg);
>
Re: [PATCH] __rec_open() should set the type
On Tue, 01 Dec 2020 14:07:18 +0100, "Boudewijn Dijkstra" wrote:
> According to dbopen(3), the 'type' field of struct DB holds the "type of
> the underlying access method (and file format)."  In __bt_open() it is set
> to DB_BTREE and in __hash_open() it is set to DB_HASH, so one might expect
> that in __rec_open() it is set to DB_RECNO.  However, it is left untouched.

Thanks, committed.

 - todd
Re: More uvm_pagefree() lock fixes
On 01/12/20(Tue) 15:18, Mark Kettenis wrote:
> > Date: Tue, 1 Dec 2020 11:08:50 -0300
> > From: Martin Pieuchot
> > 
> > As recently pointed out by jmatthew@ these need the lock as well, ok?
> 
> No.  These pages are "unmanaged" and therefore they can not be on any
> of the page queues and locking those page queues would be pointless.
> 
> We probably should adjust the documentation to reflect this.

Something like that?

Btw did you look at the uses of uvm_pagefree(), uvm_pagelookup() & co in
the various pmap?  Do you have any suggestion about how to handle them?

Index: uvm/uvm_page.c
===
RCS file: /cvs/src/sys/uvm/uvm_page.c,v
retrieving revision 1.153
diff -u -p -r1.153 uvm_page.c
--- uvm/uvm_page.c	1 Dec 2020 13:56:22 -	1.153
+++ uvm/uvm_page.c	1 Dec 2020 16:16:29 -
@@ -928,7 +928,7 @@ uvm_pagerealloc(struct vm_page *pg, stru
  * uvm_pageclean: clean page
  *
  * => erase page's identity (i.e. remove from object)
- * => caller must lock page queues
+ * => caller must lock page queues if `pg' is managed
  * => assumes all valid mappings of pg are gone
  */
 void
@@ -937,7 +937,8 @@ uvm_pageclean(struct vm_page *pg)
 	u_int flags_to_clear = 0;
 
 #if all_pmap_are_fixed
-	MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
+	if (pg->pg_flags & (PG_TABLED|PQ_ACTIVE|PQ_INACTIVE))
+		MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
 #endif
 
 #ifdef DEBUG
@@ -998,14 +999,15 @@ uvm_pageclean(struct vm_page *pg)
 *
 * => erase page's identity (i.e. remove from object)
 * => put page on free list
- * => caller must lock page queues
+ * => caller must lock page queues if `pg' is managed
 * => assumes all valid mappings of pg are gone
 */
 void
 uvm_pagefree(struct vm_page *pg)
 {
 #if all_pmap_are_fixed
-	MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
+	if (pg->pg_flags & (PG_TABLED|PQ_ACTIVE|PQ_INACTIVE))
+		MUTEX_ASSERT_LOCKED(&uvm.pageqlock);
 #endif
 
 	uvm_pageclean(pg);
snmpd drop traphandler process
Hello tech@,

Long story short: the traphandler process in snmpd annoys me a great
deal and is in the way for overhauling the transport mapping section of
snmpe, which is needed for implementing new agentx master support.

The current traphandler process is also a joke, since all it does is
receive a message, do minimal parsing validation and then pass the
entire buffer to the parent process, which forks the actual handlers.
This means that the current pledgeset is also way too wide for what
traphandler does.

Since snmpe already does similar packet parsing I see no reason to keep
this initial verification in the traphandler process.  The diff below
drops the traphandler process and moves the receiving of the packets to
snmpe, which then forwards the pdu (instead of the current whole packet)
to the parent.  I also extended the checking, so it should be a little
more resilient to malformed packets.

While here I also didn't like the fact that listening on a trap-port
always implies that snmp itself is also listening.  This diff adds the
trap keyword (for udp) on the listen on line, so that we can set up a
traphandler-only setup.  This gives the added bonus that we can now
also choose the port we listen on for traps.  Adding a listen trap
statement requires a trap handle and vice versa.

It's not the prettiest code, but it's a stopgap that allows me to move
forward without creating even larger diffs and without (hopefully)
breaking existing setups (apart from the keyword change).

OK?
martijn@ Index: parse.y === RCS file: /cvs/src/usr.sbin/snmpd/parse.y,v retrieving revision 1.62 diff -u -p -r1.62 parse.y --- parse.y 30 Oct 2020 07:43:48 - 1.62 +++ parse.y 1 Dec 2020 16:12:20 - @@ -94,11 +94,9 @@ char *symget(const char *); struct snmpd *conf = NULL; static int errors = 0; static struct usmuser *user = NULL; -static char*snmpd_port = SNMPD_PORT; int host(const char *, const char *, int, struct sockaddr_storage *, int); -int listen_add(struct sockaddr_storage *, int); typedef struct { union { @@ -284,13 +282,16 @@ listenproto : UDP listen_udp listen_udp : STRING port { struct sockaddr_storage ss[16]; int nhosts, i; + char *port = $2; - nhosts = host($1, $2, SOCK_DGRAM, ss, nitems(ss)); + if (port == NULL) + port = SNMPD_PORT; + + nhosts = host($1, port, SOCK_DGRAM, ss, nitems(ss)); if (nhosts < 1) { yyerror("invalid address: %s", $1); free($1); - if ($2 != snmpd_port) - free($2); + free($2); YYERROR; } if (nhosts > (int)nitems(ss)) @@ -298,10 +299,38 @@ listen_udp: STRING port { $1, $2, nitems(ss)); free($1); - if ($2 != snmpd_port) + free($2); + for (i = 0; i < nhosts; i++) { + if (listen_add(&(ss[i]), SOCK_DGRAM, 0) == -1) { + yyerror("calloc"); + YYERROR; + } + } + } + | STRING port TRAP { + struct sockaddr_storage ss[16]; + int nhosts, i; + char *port = $2; + + if (port == NULL) + port = SNMPD_TRAPPORT; + + nhosts = host($1, port, SOCK_DGRAM, ss, nitems(ss)); + if (nhosts < 1) { + yyerror("invalid address: %s", $1); + free($1); free($2); + YYERROR; + } + if (nhosts > (int)nitems(ss)) + log_warn("%s:%s resolves to more than %zu hosts", + $1, $2, nitems(ss)); + + free($1); + free($2); for (i = 0; i < nhosts; i++) { - if (listen_add(&(ss[i]), SOCK_DGRAM) == -1) { + if (listen_add(&(ss[i]), SOCK_DGRAM, + ADDRESS_FLAG_TRAP) == -1) { yyerror("calloc"); YYERROR;
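Under the proposed grammar, a traphandler-only setup could then be configured with something like the following snmpd.conf fragment. This is a hypothetical example pieced together from the diff's `listen_udp : STRING port TRAP` rule; the address, port, OID and handler command are made up:

```
# receive traps on a chosen port, without serving SNMP requests here
listen on udp 192.0.2.1 port 10162 trap

# required counterpart (and vice versa): a handler for received traps
trap handle 1.3.6.1.4.1 "/usr/local/bin/trap-logger"
```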
Re: Use SMR_TAILQ for `ps_threads'
On Tue, Dec 01, 2020 at 10:46:00AM -0300, Martin Pieuchot wrote: > On 01/12/20(Tue) 21:47, Jonathan Matthew wrote: > > On Tue, Dec 01, 2020 at 10:31:43AM +0100, Claudio Jeker wrote: > > > On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote: > > > > Every multi-threaded process keeps a list of threads in `ps_threads'. > > > > This list is iterated in interrupt and process context which makes it > > > > complicated to protect it with a rwlock. > > > > > > > > One of the places where such iteration is done is inside the tsleep(9) > > > > routines, directly in single_thread_check() or via CURSIG(). In order > > > > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to > > > > use SMR_TAILQ. This has the advantage of not introducing lock > > > > dependencies and allow us to address every iteration one-by-one. > > > > > > > > Diff below is a first step into this direction, it replaces the existing > > > > TAILQ_* macros by the locked version of SMR_TAILQ*. This is mostly > > > > lifted > > > > from claudio@'s diff and should not introduce any side effect. > > > > > > > > ok? > > > > > > > > diff --git sys/uvm/uvm_glue.c sys/uvm/uvm_glue.c > > > > index 390307c4c81..40a10e4c1c5 100644 > > > > --- sys/uvm/uvm_glue.c > > > > +++ sys/uvm/uvm_glue.c > > > > @@ -369,7 +369,7 @@ uvm_swapout_threads(void) > > > > * the smallest p_slptime > > > > */ > > > > slpp = NULL; > > > > - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) { > > > > + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, > > > > p_thr_link) { > > > > switch (p->p_stat) { > > > > case SRUN: > > > > case SONPROC: > > > > > > > > > > Why did you not include the smr_call() to safely free struct proc in this > > > diff? > > So we can discuss it separately, I'm happy to do it now :) > > > I was wondering about this too. Freeing the struct proc is already delayed > > by some amount since it happens in the reaper or in the parent process, does > > it make sense to combine that with the SMR wait? 
> > exit1() for a multi-threaded process does the following: > > atomic_setbits_int(&p->p_flag, P_WEXIT); > > single_thread_check(p, 0); // directly or via single_thread_set() > > SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link); > > > Note that exit1() can't be called in parallel. > > If exit1() is called with EXIT_NORMAL, the current thread will start by > setting `ps_single' then iterate over `ps_thread'. This is done before > updating `ps_thread' so there's no race in that case. > > If exit1() is called with EXIT_THREAD, the current thread might end up > removing itself from `ps_thread' while another one is iterating over it. > Since we're currently concerned about the iteration in single_thread_set() > notice that `P_WEXIT' is checked first. So if my understanding is correct > is should be enough to do the following: > > SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link); > smr_barrier(); > > That would delays exit1() a bit but doesn't involve callback which is > IMHO simpler. > > Did I miss anything? Did you run a make build with that smr_barrier() in it and checked that it does not cause a slow down? I am sceptical, smr_barrier() is a very slow construct which introduces large delays and should be avoided whenever possible. -- :wq Claudio
Re: Prevent race in single_thread_set()
On Tue, Dec 01, 2020 at 10:27:15AM -0300, Martin Pieuchot wrote: > On 01/12/20(Tue) 10:21, Claudio Jeker wrote: > > On Mon, Nov 30, 2020 at 07:19:28PM -0300, Martin Pieuchot wrote: > > > On 04/11/20(Wed) 11:19, Martin Pieuchot wrote: > > > > Here's a 3rd approach to solve the TOCTOU race in single_thread_set(). > > > > The issue being that the lock serializing access to `ps_single' is not > > > > held when calling single_thread_check(). > > > > > > > > The approach below is controversial because it extends the scope of the > > > > SCHED_LOCK(). On the other hand, the two others approaches that both > > > > add a new lock to avoid this race ignore the fact that accesses to > > > > `ps_single' are currently not clearly serialized w/o KERNEL_LOCK(). > > > > > > > > So the diff below improves the situation in that regard and do not add > > > > more complexity due to the use of multiple locks. After having looked > > > > for a way to split the SCHED_LOCK() I believe this is the simplest > > > > approach. > > > > > > > > I deliberately used a *_locked() function to avoid grabbing the lock > > > > recursively as I'm trying to get rid of the recursion, see the other > > > > thread on tech@. > > > > > > > > That said the uses of `ps_single' in ptrace_ctrl() are not covered by > > > > this diff and I'd be glad to hear some comments about them. This is > > > > fine as long as all the code using `ps_single' runs under KERNEL_LOCK() > > > > but since we're trying to get the single_thread_* API out of it, this > > > > need to be addressed. > > > > > > > > Note that this diff introduces a helper for initializing ps_single* > > > > values in order to keep all the accesses of those fields in the same > > > > file. > > > > > > Anyone? With this only the `ps_threads' iteration must receive some > > > love in order to take the single_thread_* API out of the KERNEL_LOCK(). > > > For that I just sent a SMR_TAILQ conversion diff. 
> > > > > > Combined with the diff to remove the recursive attribute of the > > > SCHED_LOCK() we're ready to split it into multiple mutexes. > > > > > > Does that make any sense? Comments? Oks? > > > > > > > Index: kern/kern_fork.c > > > > === > > > > RCS file: /cvs/src/sys/kern/kern_fork.c,v > > > > retrieving revision 1.226 > > > > diff -u -p -r1.226 kern_fork.c > > > > --- kern/kern_fork.c25 Oct 2020 01:55:18 - 1.226 > > > > +++ kern/kern_fork.c4 Nov 2020 12:52:54 - > > > > @@ -563,10 +563,7 @@ thread_fork(struct proc *curp, void *sta > > > > * if somebody else wants to take us to single threaded mode, > > > > * count ourselves in. > > > > */ > > > > - if (pr->ps_single) { > > > > - atomic_inc_int(&pr->ps_singlecount); > > > > - atomic_setbits_int(&p->p_flag, P_SUSPSINGLE); > > > > - } > > > > + single_thread_init(p); > > > > This is not the right name for this function. It does not initalize > > anything. Why is this indirection needed? I would just put the > > SCHED_LOCK() around this bit. It makes more sense especially with the > > comment above. > > Updated diff does that. I introduced a function because it helps me > having all the locking in the same place. 
> > > > > Index: kern/kern_sig.c > > > > === > > > > RCS file: /cvs/src/sys/kern/kern_sig.c,v > > > > retrieving revision 1.263 > > > > diff -u -p -r1.263 kern_sig.c > > > > --- kern/kern_sig.c 16 Sep 2020 13:50:42 - 1.263 > > > > +++ kern/kern_sig.c 4 Nov 2020 12:38:35 - > > > > @@ -1932,11 +1932,27 @@ userret(struct proc *p) > > > > p->p_cpu->ci_schedstate.spc_curpriority = p->p_usrpri; > > > > } > > > > > > > > +void > > > > +single_thread_init(struct proc *p) > > > > +{ > > > > + struct process *pr = p->p_p; > > > > + int s; > > > > + > > > > + SCHED_LOCK(s); > > > > + if (pr->ps_single) { > > > > + atomic_inc_int(&pr->ps_singlecount); > > > > + atomic_setbits_int(&p->p_flag, P_SUSPSINGLE); > > > > + } > > > > + SCHED_UNLOCK(s); > > > > +} > > > > + > > > > int > > > > -single_thread_check(struct proc *p, int deep) > > > > +_single_thread_check_locked(struct proc *p, int deep) > > > > Please don't add the leading _ to this function. There is no need for it. > > Done. > > > > > @@ -2014,7 +2043,6 @@ single_thread_set(struct proc *p, enum s > > > > panic("single_thread_mode = %d", mode); > > > > #endif > > > > } > > > > - SCHED_LOCK(s); > > > > pr->ps_singlecount = 0; > > > > membar_producer(); > > > > pr->ps_single = p; > > > > I think you can remove the membar_producer() here. It is of no use > > anymore. > > I can indeed because SCHED_LOCK() is currently taken by sleep_setup(). > I'd argue this is an implementation detail and I hope to change th
Re: More uvm_pagefree() lock fixes
> Date: Tue, 1 Dec 2020 11:08:50 -0300
> From: Martin Pieuchot
> 
> As recently pointed out by jmatthew@ these need the lock as well, ok?

No.  These pages are "unmanaged" and therefore they can not be on any
of the page queues and locking those page queues would be pointless.

We probably should adjust the documentation to reflect this.

> Index: uvm/uvm_km.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_km.c,v
> retrieving revision 1.137
> diff -u -p -r1.137 uvm_km.c
> --- uvm/uvm_km.c	23 May 2020 06:15:09 -	1.137
> +++ uvm/uvm_km.c	1 Dec 2020 14:00:38 -
> @@ -291,7 +291,9 @@ uvm_km_pgremove_intrsafe(vaddr_t start,
>  		pg = PHYS_TO_VM_PAGE(pa);
>  		if (pg == NULL)
>  			panic("uvm_km_pgremove_intrsafe: no page");
> +		uvm_lock_pageq();
>  		uvm_pagefree(pg);
> +		uvm_unlock_pageq();
>  	}
>  }
>  
> @@ -799,7 +801,9 @@ uvm_km_doputpage(struct uvm_km_free_page
>  	if (freeva)
>  		uvm_unmap(kernel_map, va, va + PAGE_SIZE);
>  
> +	uvm_lock_pageq();
>  	uvm_pagefree(pg);
> +	uvm_unlock_pageq();
>  	return (nextfp);
>  }
>  #endif /* !__HAVE_PMAP_DIRECT */
> 
More uvm_pagefree() lock fixes
As recently pointed out by jmatthew@ these need the lock as well, ok?

Index: uvm/uvm_km.c
===
RCS file: /cvs/src/sys/uvm/uvm_km.c,v
retrieving revision 1.137
diff -u -p -r1.137 uvm_km.c
--- uvm/uvm_km.c	23 May 2020 06:15:09 -	1.137
+++ uvm/uvm_km.c	1 Dec 2020 14:00:38 -
@@ -291,7 +291,9 @@ uvm_km_pgremove_intrsafe(vaddr_t start,
 		pg = PHYS_TO_VM_PAGE(pa);
 		if (pg == NULL)
 			panic("uvm_km_pgremove_intrsafe: no page");
+		uvm_lock_pageq();
 		uvm_pagefree(pg);
+		uvm_unlock_pageq();
 	}
 }
 
@@ -799,7 +801,9 @@ uvm_km_doputpage(struct uvm_km_free_page
 	if (freeva)
 		uvm_unmap(kernel_map, va, va + PAGE_SIZE);
 
+	uvm_lock_pageq();
 	uvm_pagefree(pg);
+	uvm_unlock_pageq();
 	return (nextfp);
 }
 #endif /* !__HAVE_PMAP_DIRECT */
Re: Use SMR_TAILQ for `ps_threads'
On 01/12/20(Tue) 21:47, Jonathan Matthew wrote: > On Tue, Dec 01, 2020 at 10:31:43AM +0100, Claudio Jeker wrote: > > On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote: > > > Every multi-threaded process keeps a list of threads in `ps_threads'. > > > This list is iterated in interrupt and process context which makes it > > > complicated to protect it with a rwlock. > > > > > > One of the places where such iteration is done is inside the tsleep(9) > > > routines, directly in single_thread_check() or via CURSIG(). In order > > > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to > > > use SMR_TAILQ. This has the advantage of not introducing lock > > > dependencies and allow us to address every iteration one-by-one. > > > > > > Diff below is a first step into this direction, it replaces the existing > > > TAILQ_* macros by the locked version of SMR_TAILQ*. This is mostly lifted > > > from claudio@'s diff and should not introduce any side effect. > > > > > > ok? > > > > > > diff --git sys/uvm/uvm_glue.c sys/uvm/uvm_glue.c > > > index 390307c4c81..40a10e4c1c5 100644 > > > --- sys/uvm/uvm_glue.c > > > +++ sys/uvm/uvm_glue.c > > > @@ -369,7 +369,7 @@ uvm_swapout_threads(void) > > >* the smallest p_slptime > > >*/ > > > slpp = NULL; > > > - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) { > > > + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) { > > > switch (p->p_stat) { > > > case SRUN: > > > case SONPROC: > > > > > > > Why did you not include the smr_call() to safely free struct proc in this > > diff? So we can discuss it separately, I'm happy to do it now :) > I was wondering about this too. Freeing the struct proc is already delayed > by some amount since it happens in the reaper or in the parent process, does > it make sense to combine that with the SMR wait? 
exit1() for a multi-threaded process does the following:

	atomic_setbits_int(&p->p_flag, P_WEXIT);
	single_thread_check(p, 0);	// directly or via single_thread_set()
	SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);

Note that exit1() can't be called in parallel.

If exit1() is called with EXIT_NORMAL, the current thread will start by
setting `ps_single' then iterate over `ps_thread'.  This is done before
updating `ps_thread' so there's no race in that case.

If exit1() is called with EXIT_THREAD, the current thread might end up
removing itself from `ps_thread' while another one is iterating over it.
Since we're currently concerned about the iteration in
single_thread_set() notice that `P_WEXIT' is checked first.  So if my
understanding is correct it should be enough to do the following:

	SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link);
	smr_barrier();

That would delay exit1() a bit but doesn't involve a callback, which is
IMHO simpler.

Did I miss anything?
Re: Prevent race in single_thread_set()
On 01/12/20(Tue) 10:21, Claudio Jeker wrote: > On Mon, Nov 30, 2020 at 07:19:28PM -0300, Martin Pieuchot wrote: > > On 04/11/20(Wed) 11:19, Martin Pieuchot wrote: > > > Here's a 3rd approach to solve the TOCTOU race in single_thread_set(). > > > The issue being that the lock serializing access to `ps_single' is not > > > held when calling single_thread_check(). > > > > > > The approach below is controversial because it extends the scope of the > > > SCHED_LOCK(). On the other hand, the two others approaches that both > > > add a new lock to avoid this race ignore the fact that accesses to > > > `ps_single' are currently not clearly serialized w/o KERNEL_LOCK(). > > > > > > So the diff below improves the situation in that regard and do not add > > > more complexity due to the use of multiple locks. After having looked > > > for a way to split the SCHED_LOCK() I believe this is the simplest > > > approach. > > > > > > I deliberately used a *_locked() function to avoid grabbing the lock > > > recursively as I'm trying to get rid of the recursion, see the other > > > thread on tech@. > > > > > > That said the uses of `ps_single' in ptrace_ctrl() are not covered by > > > this diff and I'd be glad to hear some comments about them. This is > > > fine as long as all the code using `ps_single' runs under KERNEL_LOCK() > > > but since we're trying to get the single_thread_* API out of it, this > > > need to be addressed. > > > > > > Note that this diff introduces a helper for initializing ps_single* > > > values in order to keep all the accesses of those fields in the same > > > file. > > > > Anyone? With this only the `ps_threads' iteration must receive some > > love in order to take the single_thread_* API out of the KERNEL_LOCK(). > > For that I just sent a SMR_TAILQ conversion diff. > > > > Combined with the diff to remove the recursive attribute of the > > SCHED_LOCK() we're ready to split it into multiple mutexes. > > > > Does that make any sense? Comments? Oks? 
> > > > > Index: kern/kern_fork.c > > > === > > > RCS file: /cvs/src/sys/kern/kern_fork.c,v > > > retrieving revision 1.226 > > > diff -u -p -r1.226 kern_fork.c > > > --- kern/kern_fork.c 25 Oct 2020 01:55:18 - 1.226 > > > +++ kern/kern_fork.c 4 Nov 2020 12:52:54 - > > > @@ -563,10 +563,7 @@ thread_fork(struct proc *curp, void *sta > > >* if somebody else wants to take us to single threaded mode, > > >* count ourselves in. > > >*/ > > > - if (pr->ps_single) { > > > - atomic_inc_int(&pr->ps_singlecount); > > > - atomic_setbits_int(&p->p_flag, P_SUSPSINGLE); > > > - } > > > + single_thread_init(p); > > This is not the right name for this function. It does not initalize > anything. Why is this indirection needed? I would just put the > SCHED_LOCK() around this bit. It makes more sense especially with the > comment above. Updated diff does that. I introduced a function because it helps me having all the locking in the same place. > > > Index: kern/kern_sig.c > > > === > > > RCS file: /cvs/src/sys/kern/kern_sig.c,v > > > retrieving revision 1.263 > > > diff -u -p -r1.263 kern_sig.c > > > --- kern/kern_sig.c 16 Sep 2020 13:50:42 - 1.263 > > > +++ kern/kern_sig.c 4 Nov 2020 12:38:35 - > > > @@ -1932,11 +1932,27 @@ userret(struct proc *p) > > > p->p_cpu->ci_schedstate.spc_curpriority = p->p_usrpri; > > > } > > > > > > +void > > > +single_thread_init(struct proc *p) > > > +{ > > > + struct process *pr = p->p_p; > > > + int s; > > > + > > > + SCHED_LOCK(s); > > > + if (pr->ps_single) { > > > + atomic_inc_int(&pr->ps_singlecount); > > > + atomic_setbits_int(&p->p_flag, P_SUSPSINGLE); > > > + } > > > + SCHED_UNLOCK(s); > > > +} > > > + > > > int > > > -single_thread_check(struct proc *p, int deep) > > > +_single_thread_check_locked(struct proc *p, int deep) > > Please don't add the leading _ to this function. There is no need for it. Done. 
> > > @@ -2014,7 +2043,6 @@ single_thread_set(struct proc *p, enum s > > > panic("single_thread_mode = %d", mode); > > > #endif > > > } > > > - SCHED_LOCK(s); > > > pr->ps_singlecount = 0; > > > membar_producer(); > > > pr->ps_single = p; > > I think you can remove the membar_producer() here. It is of no use > anymore. I can indeed because SCHED_LOCK() is currently taken by sleep_setup(). I'd argue this is an implementation detail and I hope to change that soon. So unless I'm completely mistaken, I'd keep it. But again I don't have a strong opinion. Index: kern/kern_fork.c === RCS file: /cvs/src/sys/kern/kern_fork.c,v retrieving revision 1.226 diff -u -p -r1.226 kern_fork.c --- kern/kern_fork.c25 Oct 2020 01:55:18 - 1.226 +++ kern/kern_fork.c1 Dec 2020 13:04:37 - @@ -516,7 +516,
Re: Use SMR_TAILQ for `ps_threads'
On Tue, Dec 01, 2020 at 09:47:35PM +1000, Jonathan Matthew wrote: > On Tue, Dec 01, 2020 at 10:31:43AM +0100, Claudio Jeker wrote: > > On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote: > > > Every multi-threaded process keeps a list of threads in `ps_threads'. > > > This list is iterated in interrupt and process context which makes it > > > complicated to protect it with a rwlock. > > > > > > One of the places where such iteration is done is inside the tsleep(9) > > > routines, directly in single_thread_check() or via CURSIG(). In order > > > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to > > > use SMR_TAILQ. This has the advantage of not introducing lock > > > dependencies and allow us to address every iteration one-by-one. > > > > > > Diff below is a first step into this direction, it replaces the existing > > > TAILQ_* macros by the locked version of SMR_TAILQ*. This is mostly lifted > > > from claudio@'s diff and should not introduce any side effect. > > > > > > ok? > > > > > > diff --git sys/uvm/uvm_glue.c sys/uvm/uvm_glue.c > > > index 390307c4c81..40a10e4c1c5 100644 > > > --- sys/uvm/uvm_glue.c > > > +++ sys/uvm/uvm_glue.c > > > @@ -369,7 +369,7 @@ uvm_swapout_threads(void) > > >* the smallest p_slptime > > >*/ > > > slpp = NULL; > > > - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) { > > > + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) { > > > switch (p->p_stat) { > > > case SRUN: > > > case SONPROC: > > > > > > > Why did you not include the smr_call() to safely free struct proc in this > > diff? > > I was wondering about this too. Freeing the struct proc is already delayed > by some amount since it happens in the reaper or in the parent process, does > it make sense to combine that with the SMR wait? My diff actually did that. My fear was that SMR callbacks can be very slow and so combining the two was worthwhile. -- :wq Claudio
Re: rad(8): rdomain support
I copied the (void)strlcpy from somewhere, likely ifconfig originally. Thanks for pointing it out, I shall remove it before commit. I don't like that style and don't think it helps anything during review since one needs to check anyway that truncation either can't happen or doesn't matter. On 1 December 2020 12:02:14 CET, Theo Buehler wrote: >On Mon, Nov 30, 2020 at 05:46:57PM +0100, Florian Obser wrote: >> On Sun, Nov 29, 2020 at 12:20:31PM +0100, Florian Obser wrote: >> > >> > Let rad(8) handle all rdomains in a single daemon, similar to >previous >> > work in slaacd. >> > >> > Suggested / requested by tb who showed me previous work by reyk >which >> > unfortunately predated my work in slaacd and followed a different >> > pattern. >> > >> > There has already been a fair bit of testing and input from tb on >> > previous iterations. >> > >> > More tests, OKs? >> > >> >> Updated diff after some testing by tb@ >> >> - our AF_ROUTE socket needs a RTABLE_ANY ROUTE_TABLEFILTER otherwise >> we don't see link-local addresses arriving in rdomains != 0. >> - check if the arriving link-local address is tentative (or a >> douplicate) and ignore it because we can't use it and sendmsg >failes. >> Monitor RTM_CHGADDRATTR messages to see when the link-local address >> becomes valid > >Thank you so much for this! > >I think I have done all the testing I can for my use case and this now >looks all good. I think this covers most, if not all, code paths added >by this diff. I did as thorough a review as I can, being foreign to >this >part of the tree. > >ok tb > >I only have one tiny comment: > >> +++ frontend.c > >[...] 
>
>> @@ -746,15 +743,29 @@ get_link_state(char *if_name)
>> 	return ls;
>> }
>>
>> +int
>> +get_ifrdomain(char *if_name)
>> +{
>> +	struct ifreq ifr;
>> +
>> +	(void) strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));
>
>I'm not sure why you explicitly indicate that you don't want to check,
>while you don't do so in interface_has_linklocal_address() (both copy
>the same name to an array of length IFNAMSIZ).

-- 
Sent from a mobile device. Please excuse poor formatting.
Re: Use SMR_TAILQ for `ps_threads'
On Tue, Dec 01, 2020 at 10:31:43AM +0100, Claudio Jeker wrote: > On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote: > > Every multi-threaded process keeps a list of threads in `ps_threads'. > > This list is iterated in interrupt and process context which makes it > > complicated to protect it with a rwlock. > > > > One of the places where such iteration is done is inside the tsleep(9) > > routines, directly in single_thread_check() or via CURSIG(). In order > > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to > > use SMR_TAILQ. This has the advantage of not introducing lock > > dependencies and allow us to address every iteration one-by-one. > > > > Diff below is a first step into this direction, it replaces the existing > > TAILQ_* macros by the locked version of SMR_TAILQ*. This is mostly lifted > > from claudio@'s diff and should not introduce any side effect. > > > > ok? > > > > diff --git sys/uvm/uvm_glue.c sys/uvm/uvm_glue.c > > index 390307c4c81..40a10e4c1c5 100644 > > --- sys/uvm/uvm_glue.c > > +++ sys/uvm/uvm_glue.c > > @@ -369,7 +369,7 @@ uvm_swapout_threads(void) > > * the smallest p_slptime > > */ > > slpp = NULL; > > - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) { > > + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) { > > switch (p->p_stat) { > > case SRUN: > > case SONPROC: > > > > Why did you not include the smr_call() to safely free struct proc in this > diff? I was wondering about this too. Freeing the struct proc is already delayed by some amount since it happens in the reaper or in the parent process, does it make sense to combine that with the SMR wait?
Re: wireguard + witness
On 2020/12/01 21:27, Matt Dunwoodie wrote: > On Tue, 1 Dec 2020 10:32:29 +0100 > Sebastien Marie wrote: > > > Jason, Matt, > > > > sthen@ told me that the same lock is reported several times (exactly, > > two locks are reported several times: lock1, lock2, lock1, lock2...) > > > > witness(4) reports when a lock doesn't have LO_INITIALIZED flag set in > > internal part of `struct rwlock'. Next it sets it, and skip futher > > analysis for this particular lock. > > > > if the same lock is reported several times, it means the memory is > > zeroed (and so the flag removed). it could be intented or not. but in > > all cases, a rwlock should be properly initialized with rw_init() or > > RWLOCK_INITIALIZER() before use. > > > > I don't have enough experiency with wg(4) stuff to well understand the > > issue. in wg_decap(), the peer is get from a `struct wg_tag` (from > > wg_tag_get()). If i didn't mess myself, the p_remote could come from > > wg_send_initiation() or wg_send_response(). but for both, it comes > > from wg_peer_create(), and p_remote is initialized with > > noise_remote_init() (and so lock have proper rw_init()). > > > > do you have idea on the cause of the lost of the rwlock flag ? > > > > if you want to test it with witness(4) enabled, you will need to build > > a kernel with "option WITNESS" in config file. you could uncomment it > > from sys/arch/amd64/conf/GENERIC.MP, and run make config, make clean, > > make > > Hi, > > This certainly isn't intentional. > > I had a quick look and believe the following patch should fix it. The > rwlock in the antireplay counter is rw_init'ed and then bzero'ed, > matching the symptoms. It also explains why lock1,lock2,lock1,... was > seen as there are generally two counters in use (one for the current > session and one for the previous session). It wouldn't make sense for > r_keypair_lock to be in two different locations for the same peer. 
>
> I haven't tested it my side but I'm quite confident it should fix the
> uninitialised lock issue.
>
> Cheers,
> Matt

Thanks. Confirmed this works ok and stops witness from triggering, I'll
go ahead and commit it.

> --- net/wg_noise.c
> +++ net/wg_noise.c
> @@ -467,8 +467,8 @@ noise_remote_begin_session(struct noise_remote *r)
> 	kp.kp_local_index = hs->hs_local_index;
> 	kp.kp_remote_index = hs->hs_remote_index;
> 	getnanouptime(&kp.kp_birthdate);
> -	rw_init(&kp.kp_ctr.c_lock, "noise_counter");
> 	bzero(&kp.kp_ctr, sizeof(kp.kp_ctr));
> +	rw_init(&kp.kp_ctr.c_lock, "noise_counter");
>
> 	/* Now we need to add_new_keypair */
> 	rw_enter_write(&r->r_keypair_lock);
>
Re: rad(8): rdomain support
On Mon, Nov 30, 2020 at 05:46:57PM +0100, Florian Obser wrote:
> On Sun, Nov 29, 2020 at 12:20:31PM +0100, Florian Obser wrote:
> >
> > Let rad(8) handle all rdomains in a single daemon, similar to previous
> > work in slaacd.
> >
> > Suggested / requested by tb who showed me previous work by reyk which
> > unfortunately predated my work in slaacd and followed a different
> > pattern.
> >
> > There has already been a fair bit of testing and input from tb on
> > previous iterations.
> >
> > More tests, OKs?
> >
>
> Updated diff after some testing by tb@
>
> - our AF_ROUTE socket needs a RTABLE_ANY ROUTE_TABLEFILTER otherwise
>   we don't see link-local addresses arriving in rdomains != 0.
> - check if the arriving link-local address is tentative (or a
>   duplicate) and ignore it because we can't use it and sendmsg fails.
>   Monitor RTM_CHGADDRATTR messages to see when the link-local address
>   becomes valid

Thank you so much for this!

I think I have done all the testing I can for my use case and this now
looks all good. I think this covers most, if not all, code paths added
by this diff. I did as thorough a review as I can, being foreign to
this part of the tree.

ok tb

I only have one tiny comment:

> +++ frontend.c

[...]

> @@ -746,15 +743,29 @@ get_link_state(char *if_name)
> 	return ls;
> }
>
> +int
> +get_ifrdomain(char *if_name)
> +{
> +	struct ifreq ifr;
> +
> +	(void) strlcpy(ifr.ifr_name, if_name, sizeof(ifr.ifr_name));

I'm not sure why you explicitly indicate that you don't want to check,
while you don't do so in interface_has_linklocal_address() (both copy
the same name to an array of length IFNAMSIZ).
Re: wireguard + witness
On Tue, 1 Dec 2020 10:32:29 +0100 Sebastien Marie wrote: > Jason, Matt, > > sthen@ told me that the same lock is reported several times (exactly, > two locks are reported several times: lock1, lock2, lock1, lock2...) > > witness(4) reports when a lock doesn't have LO_INITIALIZED flag set in > internal part of `struct rwlock'. Next it sets it, and skip futher > analysis for this particular lock. > > if the same lock is reported several times, it means the memory is > zeroed (and so the flag removed). it could be intented or not. but in > all cases, a rwlock should be properly initialized with rw_init() or > RWLOCK_INITIALIZER() before use. > > I don't have enough experiency with wg(4) stuff to well understand the > issue. in wg_decap(), the peer is get from a `struct wg_tag` (from > wg_tag_get()). If i didn't mess myself, the p_remote could come from > wg_send_initiation() or wg_send_response(). but for both, it comes > from wg_peer_create(), and p_remote is initialized with > noise_remote_init() (and so lock have proper rw_init()). > > do you have idea on the cause of the lost of the rwlock flag ? > > if you want to test it with witness(4) enabled, you will need to build > a kernel with "option WITNESS" in config file. you could uncomment it > from sys/arch/amd64/conf/GENERIC.MP, and run make config, make clean, > make Hi, This certainly isn't intentional. I had a quick look and believe the following patch should fix it. The rwlock in the antireplay counter is rw_init'ed and then bzero'ed, matching the symptoms. It also explains why lock1,lock2,lock1,... was seen as there are generally two counters in use (one for the current session and one for the previous session). It wouldn't make sense for r_keypair_lock to be in two different locations for the same peer. I haven't tested it my side but I'm quite confident it should fix the unitialised lock issue. 
Cheers,
Matt

--- net/wg_noise.c
+++ net/wg_noise.c
@@ -467,8 +467,8 @@ noise_remote_begin_session(struct noise_remote *r)
 	kp.kp_local_index = hs->hs_local_index;
 	kp.kp_remote_index = hs->hs_remote_index;
 	getnanouptime(&kp.kp_birthdate);
-	rw_init(&kp.kp_ctr.c_lock, "noise_counter");
 	bzero(&kp.kp_ctr, sizeof(kp.kp_ctr));
+	rw_init(&kp.kp_ctr.c_lock, "noise_counter");

 	/* Now we need to add_new_keypair */
 	rw_enter_write(&r->r_keypair_lock);
Re: wireguard + witness
On 2020/12/01 10:32, Sebastien Marie wrote: > On Tue, Dec 01, 2020 at 06:59:22AM +0100, Sebastien Marie wrote: > > On Mon, Nov 30, 2020 at 11:14:46PM +, Stuart Henderson wrote: > > > Thought I'd try a WITNESS kernel to see if that gives any clues about > > > what's going on with my APU crashing all over the place (long shot but > > > I got bored with trying different older kernels..) > > > > > > I see these from time to time (one during netstart, and another 4 in > > > 15 mins uptime), anyone know if they're important? > > > > this check ("lock_object uninitialized") was recently added in witness. > > > > it means that the rwlock was uninitialized (the witness flag > > LO_INITIALIZED isn't present whereas it should) when witness check the > > lock. > > > > it could be: > > - someone omits to call rw_init() or RWLOCK_INITIALIZER() (possible bug if > > memory isn't zero) > > - the struct has been zeroed (possible bug too) > > > > > witness: lock_object uninitialized: 0x80bcf0d8 > > > Starting stack trace... > > > witness_checkorder(80bcf0d8,9,0) at witness_checkorder+0xab > > > rw_enter_write(80bcf0c8) at rw_enter_write+0x43 > > > noise_remote_decrypt(80bcea48,978dc3d,0,fd805e22cb7c,10) at > > > noise_remote_decrypt+0x135 > > > wg_decap(8054a000,fd8061835200) at wg_decap+0xda > > > wg_decap_worker(8054a000) at wg_decap_worker+0x7a > > > taskq_thread(8012d900) at taskq_thread+0x9f > > > end trace frame: 0x0, count: 251 > > > End of stack trace. > > > > from the trace, the sole rw_enter_write() usage in noise_remote_decrypt() is > > > > rw_enter_write(&r->r_keypair_lock) > > > > but I am seeing few rw_init() on r_keypair_lock. I will see if I could > > see the source of the problem. > > > > Jason, Matt, > > sthen@ told me that the same lock is reported several times (exactly, > two locks are reported several times: lock1, lock2, lock1, lock2...) > > witness(4) reports when a lock doesn't have LO_INITIALIZED flag set in > internal part of `struct rwlock'. 
Next it sets it, and skip futher > analysis for this particular lock. > > if the same lock is reported several times, it means the memory is > zeroed (and so the flag removed). it could be intented or not. but in > all cases, a rwlock should be properly initialized with rw_init() or > RWLOCK_INITIALIZER() before use. > > I don't have enough experiency with wg(4) stuff to well understand the > issue. in wg_decap(), the peer is get from a `struct wg_tag` (from > wg_tag_get()). If i didn't mess myself, the p_remote could come from > wg_send_initiation() or wg_send_response(). but for both, it comes > from wg_peer_create(), and p_remote is initialized with > noise_remote_init() (and so lock have proper rw_init()). > > do you have idea on the cause of the lost of the rwlock flag ? > > if you want to test it with witness(4) enabled, you will need to build > a kernel with "option WITNESS" in config file. you could uncomment it > from sys/arch/amd64/conf/GENERIC.MP, and run make config, make clean, > make > > Thanks. > -- > Sebastien Marie > Config of the machine is simple; one wg peer, the peer is on a fixed internet address, this machine is natted but the address won't change. It isn't routing, wg is only to reach the machine itself. Here's a more complete console log but I don't think it adds anything to the above. pf enabled starting network em0: no link em0: 192.168.42.220 lease accepted from 192.168.42.21 (00:1b:21:9b:68:10) add net 192.168.40.0/21: gateway 192.168.43.33 add net 195.95.187.0/24: gateway 192.168.43.33 reordering libraries:witness: lock_object uninitialized: 0x80bcf580 Starting stack trace... 
witness_checkorder(80bcf580,9,0) at witness_checkorder+0xab rw_enter_write(80bcf570) at rw_enter_write+0x43 noise_remote_decrypt(80bcea48,3f71de8d,0,fd80073918fc,10) at noise_remote_decrypt+0x135 wg_decap(8054a000,fd8067faf900) at wg_decap+0xda wg_decap_worker(8054a000) at wg_decap_worker+0x7a taskq_thread(8012d900) at taskq_thread+0x9f end trace frame: 0x0, count: 251 End of stack trace. done. starting early daemons: syslogd pflogd ntpd. starting RPC daemons:. savecore: no core dump checking quotas: done. clearing /tmp kern.securelevel: 0 -> 1 creating runtime link editor directory cache. preserving editor files. starting network daemons: sshd smtpd sndiod. starting package daemons: conserver lldpd. starting local daemons: cron. Mon Nov 30 22:59:18 GMT 2020 reorder_kernel: failed -- see /usr/share/relink/kernel/WITNESS.MP/relink.log OpenBSD/amd64 (cons1) (tty00) login: witness: lock_object uninitialized: 0x80bcf0d8 Starting stack trace... witness_checkorder(80bcf0d8,9,0) at witness_checkorder+0xab rw_enter_write(80bcf0c8) at rw_enter_write+0x43 noise_remote_decrypt(80bcea48,c4992785,0,fd80073c89bc,10) at noise_remote_decrypt+0x135 wg_decap(8054a00
Re: wireguard + witness
On Tue, Dec 01, 2020 at 06:59:22AM +0100, Sebastien Marie wrote: > On Mon, Nov 30, 2020 at 11:14:46PM +, Stuart Henderson wrote: > > Thought I'd try a WITNESS kernel to see if that gives any clues about > > what's going on with my APU crashing all over the place (long shot but > > I got bored with trying different older kernels..) > > > > I see these from time to time (one during netstart, and another 4 in > > 15 mins uptime), anyone know if they're important? > > this check ("lock_object uninitialized") was recently added in witness. > > it means that the rwlock was uninitialized (the witness flag > LO_INITIALIZED isn't present whereas it should) when witness check the > lock. > > it could be: > - someone omits to call rw_init() or RWLOCK_INITIALIZER() (possible bug if > memory isn't zero) > - the struct has been zeroed (possible bug too) > > > witness: lock_object uninitialized: 0x80bcf0d8 > > Starting stack trace... > > witness_checkorder(80bcf0d8,9,0) at witness_checkorder+0xab > > rw_enter_write(80bcf0c8) at rw_enter_write+0x43 > > noise_remote_decrypt(80bcea48,978dc3d,0,fd805e22cb7c,10) at > > noise_remote_decrypt+0x135 > > wg_decap(8054a000,fd8061835200) at wg_decap+0xda > > wg_decap_worker(8054a000) at wg_decap_worker+0x7a > > taskq_thread(8012d900) at taskq_thread+0x9f > > end trace frame: 0x0, count: 251 > > End of stack trace. > > from the trace, the sole rw_enter_write() usage in noise_remote_decrypt() is > > rw_enter_write(&r->r_keypair_lock) > > but I am seeing few rw_init() on r_keypair_lock. I will see if I could > see the source of the problem. > Jason, Matt, sthen@ told me that the same lock is reported several times (exactly, two locks are reported several times: lock1, lock2, lock1, lock2...) witness(4) reports when a lock doesn't have LO_INITIALIZED flag set in internal part of `struct rwlock'. Next it sets it, and skip futher analysis for this particular lock. 
If the same lock is reported several times, it means the memory is
zeroed (and so the flag removed). That could be intended or not, but in
all cases a rwlock should be properly initialized with rw_init() or
RWLOCK_INITIALIZER() before use.

I don't have enough experience with the wg(4) code to fully understand
the issue. In wg_decap(), the peer is obtained from a `struct wg_tag'
(from wg_tag_get()). If I didn't mess up, the p_remote could come from
wg_send_initiation() or wg_send_response(), but for both it comes from
wg_peer_create(), and p_remote is initialized with noise_remote_init()
(so the lock gets a proper rw_init()).

Do you have any idea what could cause the loss of the rwlock flag?

If you want to test with witness(4) enabled, you will need to build a
kernel with "option WITNESS" in the config file. You can uncomment it
in sys/arch/amd64/conf/GENERIC.MP, then run make config, make clean,
make.

Thanks.
-- 
Sebastien Marie
Re: Use SMR_TAILQ for `ps_threads'
On Mon, Nov 30, 2020 at 07:10:47PM -0300, Martin Pieuchot wrote: > Every multi-threaded process keeps a list of threads in `ps_threads'. > This list is iterated in interrupt and process context which makes it > complicated to protect it with a rwlock. > > One of the places where such iteration is done is inside the tsleep(9) > routines, directly in single_thread_check() or via CURSIG(). In order > to take this code path out of the KERNEL_LOCK(), claudio@ proposed to > use SMR_TAILQ. This has the advantage of not introducing lock > dependencies and allow us to address every iteration one-by-one. > > Diff below is a first step into this direction, it replaces the existing > TAILQ_* macros by the locked version of SMR_TAILQ*. This is mostly lifted > from claudio@'s diff and should not introduce any side effect. > > ok? > > diff --git lib/libkvm/kvm_proc2.c lib/libkvm/kvm_proc2.c > index 96f7dc91b92..1f4f9b914bb 100644 > --- lib/libkvm/kvm_proc2.c > +++ lib/libkvm/kvm_proc2.c > @@ -341,8 +341,9 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process > *pr, > kp.p_pctcpu = 0; > kp.p_stat = (process.ps_flags & PS_ZOMBIE) ? 
SDEAD : > SIDL; > - for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; > - p = TAILQ_NEXT(&proc, p_thr_link)) { > + for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads); > + p != NULL; > + p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) { > if (KREAD(kd, (u_long)p, &proc)) { > _kvm_err(kd, kd->program, > "can't read proc at %lx", > @@ -376,8 +377,8 @@ kvm_proclist(kvm_t *kd, int op, int arg, struct process > *pr, > if (!dothreads) > continue; > > - for (p = TAILQ_FIRST(&process.ps_threads); p != NULL; > - p = TAILQ_NEXT(&proc, p_thr_link)) { > + for (p = SMR_TAILQ_FIRST_LOCKED(&process.ps_threads); p != NULL; > + p = SMR_TAILQ_NEXT_LOCKED(&proc, p_thr_link)) { > if (KREAD(kd, (u_long)p, &proc)) { > _kvm_err(kd, kd->program, > "can't read proc at %lx", > diff --git sys/kern/exec_elf.c sys/kern/exec_elf.c > index 5e455208663..575273b306c 100644 > --- sys/kern/exec_elf.c > +++ sys/kern/exec_elf.c > @@ -85,6 +85,7 @@ > #include > #include > #include > +#include > > #include > > @@ -1360,7 +1361,7 @@ coredump_notes_elf(struct proc *p, void *iocookie, > size_t *sizep) >* threads in the process have been stopped and the list can't >* change. 
>*/ > - TAILQ_FOREACH(q, &pr->ps_threads, p_thr_link) { > + SMR_TAILQ_FOREACH_LOCKED(q, &pr->ps_threads, p_thr_link) { > if (q == p) /* we've taken care of this thread */ > continue; > error = coredump_note_elf(q, iocookie, ¬esize); > diff --git sys/kern/init_main.c sys/kern/init_main.c > index fed6be19435..2b657ffe328 100644 > --- sys/kern/init_main.c > +++ sys/kern/init_main.c > @@ -519,7 +519,7 @@ main(void *framep) >*/ > LIST_FOREACH(pr, &allprocess, ps_list) { > nanouptime(&pr->ps_start); > - TAILQ_FOREACH(p, &pr->ps_threads, p_thr_link) { > + SMR_TAILQ_FOREACH_LOCKED(p, &pr->ps_threads, p_thr_link) { > nanouptime(&p->p_cpu->ci_schedstate.spc_runtime); > timespecclear(&p->p_rtime); > } > diff --git sys/kern/kern_exit.c sys/kern/kern_exit.c > index a20775419e3..ffc0158954c 100644 > --- sys/kern/kern_exit.c > +++ sys/kern/kern_exit.c > @@ -63,6 +63,7 @@ > #ifdef SYSVSEM > #include > #endif > +#include > #include > > #include > @@ -161,7 +162,7 @@ exit1(struct proc *p, int xexit, int xsig, int flags) > } > > /* unlink ourselves from the active threads */ > - TAILQ_REMOVE(&pr->ps_threads, p, p_thr_link); > + SMR_TAILQ_REMOVE_LOCKED(&pr->ps_threads, p, p_thr_link); > if ((p->p_flag & P_THREAD) == 0) { > /* main thread gotta wait because it has the pid, et al */ > while (pr->ps_refcnt > 1) > @@ -724,7 +725,7 @@ process_zap(struct process *pr) > if (pr->ps_ptstat != NULL) > free(pr->ps_ptstat, M_SUBPROC, sizeof(*pr->ps_ptstat)); > pool_put(&rusage_pool, pr->ps_ru); > - KASSERT(TAILQ_EMPTY(&pr->ps_threads)); > + KASSERT(SMR_TAILQ_EMPTY_LOCKED(&pr->ps_threads)); > lim_free(pr->ps_limit); > crfree(pr->ps_ucred); > pool_put(&process_pool, pr); > diff --git sys/kern/kern_fork.c sys/kern/kern_fork.c > index 9fb239bc8b4..e1cb587b2b8 100644 > --- sys/kern/kern_fork.c > +++ sys/kern/kern_fork.c > @@ -52,6 +52,7 @@ > #include > #include > #include > +#include
Re: Prevent race in single_thread_set()
On Mon, Nov 30, 2020 at 07:19:28PM -0300, Martin Pieuchot wrote: > On 04/11/20(Wed) 11:19, Martin Pieuchot wrote: > > Here's a 3rd approach to solve the TOCTOU race in single_thread_set(). > > The issue being that the lock serializing access to `ps_single' is not > > held when calling single_thread_check(). > > > > The approach below is controversial because it extends the scope of the > > SCHED_LOCK(). On the other hand, the two others approaches that both > > add a new lock to avoid this race ignore the fact that accesses to > > `ps_single' are currently not clearly serialized w/o KERNEL_LOCK(). > > > > So the diff below improves the situation in that regard and do not add > > more complexity due to the use of multiple locks. After having looked > > for a way to split the SCHED_LOCK() I believe this is the simplest > > approach. > > > > I deliberately used a *_locked() function to avoid grabbing the lock > > recursively as I'm trying to get rid of the recursion, see the other > > thread on tech@. > > > > That said the uses of `ps_single' in ptrace_ctrl() are not covered by > > this diff and I'd be glad to hear some comments about them. This is > > fine as long as all the code using `ps_single' runs under KERNEL_LOCK() > > but since we're trying to get the single_thread_* API out of it, this > > need to be addressed. > > > > Note that this diff introduces a helper for initializing ps_single* > > values in order to keep all the accesses of those fields in the same > > file. > > Anyone? With this only the `ps_threads' iteration must receive some > love in order to take the single_thread_* API out of the KERNEL_LOCK(). > For that I just sent a SMR_TAILQ conversion diff. > > Combined with the diff to remove the recursive attribute of the > SCHED_LOCK() we're ready to split it into multiple mutexes. > > Does that make any sense? Comments? Oks? 
>
> > Index: kern/kern_fork.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_fork.c,v
> > retrieving revision 1.226
> > diff -u -p -r1.226 kern_fork.c
> > --- kern/kern_fork.c	25 Oct 2020 01:55:18 -	1.226
> > +++ kern/kern_fork.c	4 Nov 2020 12:52:54 -
> > @@ -563,10 +563,7 @@ thread_fork(struct proc *curp, void *sta
> > 	 * if somebody else wants to take us to single threaded mode,
> > 	 * count ourselves in.
> > 	 */
> > -	if (pr->ps_single) {
> > -		atomic_inc_int(&pr->ps_singlecount);
> > -		atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
> > -	}
> > +	single_thread_init(p);

This is not the right name for this function. It does not initialize
anything. Why is this indirection needed? I would just put the
SCHED_LOCK() around this bit. It makes more sense especially with the
comment above. In my opinion a better name would be single_thread_add()
or single_thread_hatch(), but even that I'm not too fond of.

> >
> > 	/*
> > 	 * Return tid to parent thread and copy it out to userspace
> > Index: kern/kern_sig.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_sig.c,v
> > retrieving revision 1.263
> > diff -u -p -r1.263 kern_sig.c
> > --- kern/kern_sig.c	16 Sep 2020 13:50:42 -	1.263
> > +++ kern/kern_sig.c	4 Nov 2020 12:38:35 -
> > @@ -1932,11 +1932,27 @@ userret(struct proc *p)
> > 	p->p_cpu->ci_schedstate.spc_curpriority = p->p_usrpri;
> > }
> >
> > +void
> > +single_thread_init(struct proc *p)
> > +{
> > +	struct process *pr = p->p_p;
> > +	int s;
> > +
> > +	SCHED_LOCK(s);
> > +	if (pr->ps_single) {
> > +		atomic_inc_int(&pr->ps_singlecount);
> > +		atomic_setbits_int(&p->p_flag, P_SUSPSINGLE);
> > +	}
> > +	SCHED_UNLOCK(s);
> > +}
> > +
> > int
> > -single_thread_check(struct proc *p, int deep)
> > +_single_thread_check_locked(struct proc *p, int deep)

Please don't add the leading _ to this function. There is no need for
it.
> > {
> > 	struct process *pr = p->p_p;
> >
> > +	SCHED_ASSERT_LOCKED();
> > +
> > 	if (pr->ps_single != NULL && pr->ps_single != p) {
> > 		do {
> > 			int s;
> > @@ -1949,14 +1965,12 @@ single_thread_check(struct proc *p, int
> > 			return (EINTR);
> > 		}
> >
> > -		SCHED_LOCK(s);
> > -		if (pr->ps_single == NULL) {
> > -			SCHED_UNLOCK(s);
> > +		if (pr->ps_single == NULL)
> > 			continue;
> > -		}
> >
> > 		if (atomic_dec_int_nv(&pr->ps_singlecount) == 0)
> > 			wakeup(&pr->ps_singlecount);
> > +
> > 		if (pr->ps_flags & PS_SINGLEEXIT) {
> > 			SCHED_UNLOCK(s);
> > 			KERNEL_LOCK();
> > @@ -1967,13 +1981,24 @@ single_thread_check(struct proc *p, int
> > /* n